(***********************************************************************

                    Mathematica-Compatible Notebook

This notebook can be used on any computer system with Mathematica 4.0,
MathReader 4.0, or any compatible application. The data for the notebook 
starts with the line containing stars above.

To get the notebook into a Mathematica-compatible application, do one of 
the following:

* Save the data starting with the line of stars above into a file
  with a name ending in .nb, then open the file inside the application;

* Copy the data starting with the line of stars above to the
  clipboard, then use the Paste menu command inside the application.

Data for notebooks contains only printable 7-bit ASCII and can be
sent directly in email or through ftp in text mode.  Newlines can be
CR, LF or CRLF (Unix, Macintosh or MS-DOS style).

NOTE: If you modify the data for this notebook not in a Mathematica-
compatible application, you must delete the line below containing the 
word CacheID, otherwise Mathematica-compatible applications may try to 
use invalid cache data.

For more information on notebooks and Mathematica-compatible 
applications, contact Wolfram Research:
  web: http://www.wolfram.com
  email: info@wolfram.com
  phone: +1-217-398-0700 (U.S.)

Notebook reader applications are available free of charge from 
Wolfram Research.
***********************************************************************)

(*CacheID: 232*)


(*NotebookFileLineBreakTest
NotebookFileLineBreakTest*)
(*NotebookOptionsPosition[     14287,        460]*)
(*NotebookOutlinePosition[     15072,        486]*)
(*  CellTagsIndexPosition[     15028,        482]*)
(*WindowFrame->Normal*)



Notebook[{

Cell[CellGroupData[{
Cell["Math 384", "Subtitle"],

Cell[CellGroupData[{

Cell["Newton's method applied to minimization", "Subsubsection",
  CellDingbat->None],

Cell["Suppose we have the function ", "Text"],

Cell[BoxData[{
    \(f[x1_, x2_, x3_, x4_] := 
      3*x1^2*x2\  - \ x1*x2*x3\  + \ x2*x4^2\  + \ 2*x1\  - \ 7*x3\  + 
        Exp[x1^2 + x2^4 + x3^2 + x4^2]\), "\[IndentingNewLine]", 
    \(\(vars\  = \ {x1, x2, x3, x4};\)\)}], "Input"],

Cell["\<\
We know from quick inspection that this function is positive coercive (why), \
hence it has a global minimum. We know, moreover, that it is everywhere \
differentiable, hence that global minimum will occur at a critical point.\
\>", "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(gradf\  = \ Map[D[f[x1, x2, x3, x4], #] &, \ vars]\)], "Input"],

Cell[BoxData[
    \({2 + 2\ \[ExponentialE]\^\(x1\^2 + x2\^4 + x3\^2 + x4\^2\)\ x1 + 
        6\ x1\ x2 - x2\ x3, 
      3\ x1\^2 + 4\ \[ExponentialE]\^\(x1\^2 + x2\^4 + x3\^2 + x4\^2\)\ x2\^3 
        - x1\ x3 + x4\^2, 
      \(-7\) - x1\ x2 + 
        2\ \[ExponentialE]\^\(x1\^2 + x2\^4 + x3\^2 + x4\^2\)\ x3, 
      2\ \[ExponentialE]\^\(x1\^2 + x2\^4 + x3\^2 + x4\^2\)\ x4 + 2\ x2\ x4}
      \)], "Output"]
}, Open  ]],

Cell[TextData[{
  "We can attempt to find a critical point by calling the ",
  Cell[BoxData[
      \(FindRoot\)]],
  " function in ",
  StyleBox["Mathematica",
    FontSlant->"Italic"],
  ", feeding in random starting values. I should point out that ",
  Cell[BoxData[
      \(FindRoot\)]],
  " uses some flavor of a Newton's method. But we will also do this \"by hand\
\" presently."
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(FindRoot[Evaluate[Thread[gradf \[Equal] 0]], \ {x1, Random[]}, 
      {x2, Random[]}, {x3, Random[]}, {x4, Random[]}, 
      MaxIterations \[Rule] 100]\)], "Input"],

Cell[BoxData[
    \(FindRoot::"cvnwt" \( : \ \) 
      "Newton's method failed to converge to the prescribed accuracy after \!\
\(100\) iterations."\)], "Message"],

Cell[BoxData[
    \({x1 \[Rule] 0.9452743843737569`, x2 \[Rule] 3.84035035269016`, 
      x3 \[Rule] 1.203141175392998`, x4 \[Rule] \(-0.00007907327949295469`\)}
      \)], "Output"]
}, Open  ]],

Cell[TextData[{
  "I repeat this many times, always I see the \"failed to converge\" warning. \
Let us see what happens when we do it from basic principles. Recall the \
update formula. If we are minimizing ",
  Cell[BoxData[
      \(f\)]],
  "[",
  Cell[BoxData[
      \(\(x\& \[Rule] \)\)]],
  "] then we are finding a root of ",
  Cell[BoxData[
      \(\[Del]\)]],
  Cell[BoxData[
      \(f\)]],
  " \[Equal] ",
  Cell[BoxData[
      \(\(0\& \[Rule] \)\)]],
  " and hence our update formula is\n",
  Cell[BoxData[
      \(\(x\& \[Rule] \)\^\(k + 1\)\)]],
  " =  ",
  Cell[BoxData[
      \(\(x\& \[Rule] \)\^k\)]],
  " - ",
  Cell[BoxData[
      \(\((\(H\_f\) \((\(x\& \[Rule] \))\))\)\^\(-1\)\)]],
  " ",
  Cell[BoxData[
      \(\[Del]\)]],
  Cell[BoxData[
      \(f\)]],
  " [",
  Cell[BoxData[
      \(\(x\& \[Rule] \)\^k\)]],
  "]\nwhere the notation means we evaluate gradient and Hessian at ",
  Cell[BoxData[
      \(\(x\& \[Rule] \)\^k\)]],
  ", we invert the Hessian matrix, and we multiply that inverse by the \
evaluated gradient vector."
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[{
    \(\(hessf\  = \ Outer[D, gradf, vars];\)\), "\[IndentingNewLine]", 
    \(\(invhessf\  = \ Inverse[hessf];\)\), "\[IndentingNewLine]", 
    \(LeafCount[invhessf]\)}], "Input"],

Cell[BoxData[
    \(26179\)], "Output"]
}, Open  ]],

Cell["\<\
This is a VERY ugly matrix. Why? Because we computed a symbolic matrix \
inverse. Lesson 1: Do not do that if you can avoid it.\
\>", "Text"],

Cell[CellGroupData[{

Cell[BoxData[{
    \(\(vals\  = \ Table[Random[], {4}];\)\), "\[IndentingNewLine]", 
    \(\(ruls\  = \ Thread[vars \[Rule] vals];\)\), "\[IndentingNewLine]", 
    \(\(h1\  = \ hessf\  /. \ ruls;\)\), "\[IndentingNewLine]", 
    \(\(h2\  = \ invhessf\  /. \ ruls;\)\), "\[IndentingNewLine]", 
    \(h1\  . \ h2\  // \ Chop\)}], "Input"],

Cell[BoxData[
    \({{1.`, 0, 0, 0}, {0, 1.0000000000000002`, 0, 0}, {0, 0, 1.`, 0}, 
      {0, 0, 0, 1.0000000000000002`}}\)], "Output"]
}, Open  ]],

Cell[TextData[{
  "Not bad, at least we get a numerical inverse matrix on substitution of \
random values for our variables. But we will avoid all this and simply set up \
the correct update formula. That is, we will solve: ",
  Cell[BoxData[
      \(\((\(H\_f\) \((\(x\& \[Rule] \))\))\)\)]],
  " (",
  Cell[BoxData[
      \(\(x\& \[Rule] \)\^\(k + 1\)\)]],
  " -  ",
  Cell[BoxData[
      \(\(x\& \[Rule] \)\^k\)]],
  ") =  -",
  Cell[BoxData[
      \(\[Del]\)]],
  "f (",
  Cell[BoxData[
      \(\(x\& \[Rule] \)\^k\)]],
  ") for ",
  Cell[BoxData[
      \(\(x\& \[Rule] \)\^\(k + 1\)\)]],
  "."
}], "Text"],

Cell[BoxData[
    \(update[{xk_, yk_, zk_, wk_}, \ vars_, \ grad_, hess_]\  := \ 
      With[{reprule\  = \ 
            Thread[vars \[Rule] 
                {xk, yk, zk, wk}]}, \[IndentingNewLine]LinearSolve[
            hess /. reprule, \(-grad\) /. reprule]\  + \ {xk, yk, zk, wk}]
        \)], "Input"],

Cell["\<\
For example, if we take as our initial point {2,3,-1,6}then update will give \
us the next point in the Newton iteration sequence.\
\>", "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(update[{2. , 3. , \(-1. \), 6. }, \ {x1, x2, x3, x4}, gradf, hessf]\)], 
  "Input"],

Cell[BoxData[
    \({1.9895287958115184`, 2.994764397905759`, \(-0.9947643979057592`\), 
      5.968586387434555`}\)], "Output"]
}, Open  ]],

Cell[TextData[{
  "By the way, why do I use decimal values instead of exact integer values \
for my initial points? This is because if I use exact values then everything \
gets evaluated using exact arithmetic, we get a huge but unenlightening \
result, and it takes alot longer to attain because we must use symbolic \
rather than numeric software to do the ",
  Cell[BoxData[
      \(LinearSolve\)]],
  ".\n\nNotice that we did not move very far. Offhand I do not know if that \
is good or not. Also one might want to know if we are embarked on a \
reasonable search. Fom a theorem covered in class we know we are off in a \
direction of descent provided the Hessian matrix, evaluated at the point in \
the Newton sequence, is positive definite. We learn that it is indeed simply \
by checking that all four eigenvalues are positive. I do this using \
high-precision arithmetic to control for monstrous round-off errors in \
machine arithmetic (I first did it using machine arithmetic, saw the large \
exponents, and decided to use the higher precision \"significance \
arithmetic\" for safety)."
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(Eigenvalues[
      hessf\  /. \ Thread[
          {x1, x2, x3, x4} \[Rule] N[{2, 3, \(-1\), 6}, \ 25]]]\)], "Input"],

Cell[BoxData[
    \({1.15009202243558395939712608959964`22.3836*^57, 
      3.331256791662084260450693148331001`22.3836*^53, 
      1.927333134720640254352774602693633`22.3836*^53, 
      1.927333134720640254352774408340166`22.3836*^53}\)], "Output"]
}, Open  ]],

Cell["Let us do a sequence of  10 updates and see where we go.", "Text"],

Cell[CellGroupData[{

Cell[BoxData[{
    \(\({aa, bb, cc, dd}\  = \ {2. , 3. , \(-1. \), 6. };\)\), 
  "\[IndentingNewLine]", 
    \(\(Do[
        {aa, bb, cc, dd}\  = \ 
          update[{aa, bb, cc, dd}, {x1, x2, x3, x4}, gradf, hessf], \ {10}];
      \)\), "\[IndentingNewLine]", 
    \({aa, bb, cc, dd}\)}], "Input"],

Cell[BoxData[
    \({1.8937298756268668`, 2.9459926006718025`, \(-0.9468649378134336`\), 
      5.681189626880601`}\)], "Output"]
}, Open  ]],

Cell[TextData[{
  "Not too far from where we began. Now let's try ",
  Cell[BoxData[
      \(100\)]],
  " updates, beginning from where we left off."
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[{
    \(\(Do[
        {aa, bb, cc, dd}\  = \ 
          update[{aa, bb, cc, dd}, {x1, x2, x3, x4}, gradf, hessf], \ {100}];
      \)\), "\[IndentingNewLine]", 
    \({aa, bb, cc, dd}\)}], "Input"],

Cell[BoxData[
    \({0.46953111415726856`, 1.8564399572999062`, \(-0.23476465558013526`\), 
      1.408597475073986`}\)], "Output"]
}, Open  ]],

Cell[TextData[{
  "We have moved further. This time we will do ",
  Cell[BoxData[
      \(1000\)]],
  " updates, again beginning from where we left off."
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[{
    \(\(Do[
        {aa, bb, cc, dd}\  = \ 
          update[{aa, bb, cc, dd}, {x1, x2, x3, x4}, gradf, hessf], \ {1000}];
      \)\), "\[IndentingNewLine]", 
    \(endposition\  = \ {aa, bb, cc, dd}\)}], "Input"],

Cell[BoxData[
    \({\(-0.5309270755598561`\), \(-0.4537212599048488`\), 
      0.9874185815083382`, 0.`}\)], "Output"]
}, Open  ]],

Cell[TextData[{
  "I will do another ",
  Cell[BoxData[
      \(1000\)]],
  " updates starting from where we left off. This time note that we do not \
move anywhere discernable. In other words, we have arrived (at a root of the \
gradient function, that is)."
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[{
    \(\(Do[
        {aa, bb, cc, dd}\  = \ 
          update[{aa, bb, cc, dd}, {x1, x2, x3, x4}, gradf, hessf], \ {1000}];
      \)\), "\[IndentingNewLine]", 
    \(newendposition\  = \ {aa, bb, cc, dd}\), "\[IndentingNewLine]", 
    \(newendposition - endposition\)}], "Input"],

Cell[BoxData[
    \({\(-0.5309270755598561`\), \(-0.4537212599048488`\), 
      0.9874185815083382`, 0.`}\)], "Output"],

Cell[BoxData[
    \({0.`, 0.`, 0.`, 0.`}\)], "Output"]
}, Open  ]],

Cell[TextData[{
  "Let us check that we actually found roots for our (vector) gradient \
function ",
  Cell[BoxData[
      \(\[Del]f\)]],
  "."
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[{
    \(\(endreprules\  = \ Thread[{x1, x2, x3, x4} \[Rule] endposition];\)\), 
  "\[IndentingNewLine]", 
    \(gradf\  /. \ endreprules\)}], "Input"],

Cell[BoxData[
    \({5.551115123125783`*^-17, 0.`, 0.`, 0.`}\)], "Output"]
}, Open  ]],

Cell[TextData[{
  "Not bad at all. Now let us see if the matrix is positive definite at this \
point. If so, we have encountered a local minimizer to our function ",
  Cell[BoxData[
      \(f\)]],
  "."
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(Eigenvalues[hessf\  /. \ endreprules]\)], "Input"],

Cell[BoxData[
    \({24.928917526055205`, 11.428829885014997`, 6.425711866023045`, 
      3.589577723910394`}\)], "Output"]
}, Open  ]],

Cell[TextData[{
  "It is in fact a local minimizer. Let us see what is the value of ",
  Cell[BoxData[
      \(f\)]],
  " at this point. I'll use the fancy-pants ",
  Cell[BoxData[
      \(Sequence\)]],
  " notation to get ",
  Cell[BoxData[
      \(f\)]],
  " to evaluate at the coordinates given in endposition."
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(f[Sequence@@endposition]\)], "Input"],

Cell[BoxData[
    \(\(-4.92875884736865`\)\)], "Output"]
}, Open  ]],

Cell[TextData[{
  "Now that we found the local minimum, I'll note that had we started with \
better initial values (or perhaps allowed more iterations), then ",
  Cell[BoxData[
      \(FindRoot\)]],
  " would have found it as well."
}], "Text"],

Cell[CellGroupData[{

Cell[BoxData[
    \(FindRoot[Evaluate[Thread[gradf \[Equal] 0]], \ {x1, \(- .7\)}, 
      {x2, \(- .3\)}, {x3, 1}, {x4,  .2}, MaxIterations \[Rule] 100]\)], 
  "Input"],

Cell[BoxData[
    \({x1 \[Rule] \(-0.5309270755604424`\), 
      x2 \[Rule] \(-0.4537212599057029`\), x3 \[Rule] 0.9874185815081897`, 
      x4 \[Rule] 9.854069249300566`*^-17}\)], "Output"]
}, Open  ]],

Cell[TextData[{
  "A very reasonable question to ask is whether we found a global minimum. \
Offhand I do not know the answer, although I suspect that we did. How might \
we attempt to check this? First we might want to find a bounded region inide \
which we know the minimum must live. I think it is not hard to show that the \
four-dimensional hypercube ",
  Cell[BoxData[
      \({\(-2\), 2}\ x\ {\(-2\), 2}\ x\ {\(-2\), 2}\ x\ {\(-2\), 2}\)]],
  " will do, because outside this (one can show that) the positive \
exponential term will dominate. So you might just try alot of random \
evaluations inside this cube to see if any come out below our local minimum. \
Also(as mentioned in a previous handout) you can check the eigenvalues of the \
Hessian at several random locations in this cube to see if it is always \
positive definite. If so then chances are it is everywhere positive definite \
in this cube, and then the theorem from class tells us that we have a global \
minimizer."
}], "Text"]
}, Open  ]]
}, Open  ]]
},
FrontEndVersion->"4.0 for Microsoft Windows",
ScreenRectangle->{{0, 640}, {0, 424}},
WindowSize->{496, 415},
WindowMargins->{{58, Automatic}, {Automatic, 98}},
PrintingCopies->1,
PrintingPageRange->{Automatic, Automatic},
PrintingOptions->{"PrintingMargins"->{{36, 36}, {43.1875, 50.375}}}
]


(***********************************************************************
Cached data follows.  If you edit this Notebook file directly, not using
Mathematica, you must remove the line containing CacheID at the top of 
the file.  The cache data will then be recreated when you save this file 
from within Mathematica.
***********************************************************************)

(*CellTagsOutline
CellTagsIndex->{}
*)

(*CellTagsIndex
CellTagsIndex->{}
*)

(*NotebookFileOutline
Notebook[{

Cell[CellGroupData[{
Cell[1739, 51, 28, 0, 64, "Subtitle"],

Cell[CellGroupData[{
Cell[1792, 55, 85, 1, 43, "Subsubsection"],
Cell[1880, 58, 45, 0, 33, "Text"],
Cell[1928, 60, 237, 4, 90, "Input"],
Cell[2168, 66, 250, 4, 71, "Text"],

Cell[CellGroupData[{
Cell[2443, 74, 83, 1, 30, "Input"],
Cell[2529, 77, 411, 8, 84, "Output"]
}, Open  ]],
Cell[2955, 88, 396, 12, 71, "Text"],

Cell[CellGroupData[{
Cell[3376, 104, 184, 3, 70, "Input"],
Cell[3563, 109, 163, 3, 63, "Message"],
Cell[3729, 114, 182, 3, 48, "Output"]
}, Open  ]],
Cell[3926, 120, 1063, 39, 129, "Text"],

Cell[CellGroupData[{
Cell[5014, 163, 194, 3, 70, "Input"],
Cell[5211, 168, 39, 1, 29, "Output"]
}, Open  ]],
Cell[5265, 172, 151, 3, 52, "Text"],

Cell[CellGroupData[{
Cell[5441, 179, 336, 5, 110, "Input"],
Cell[5780, 186, 137, 2, 29, "Output"]
}, Open  ]],
Cell[5932, 191, 610, 22, 71, "Text"],
Cell[6545, 215, 306, 6, 90, "Input"],
Cell[6854, 223, 154, 3, 52, "Text"],

Cell[CellGroupData[{
Cell[7033, 230, 103, 2, 50, "Input"],
Cell[7139, 234, 128, 2, 29, "Output"]
}, Open  ]],
Cell[7282, 239, 1110, 18, 261, "Text"],

Cell[CellGroupData[{
Cell[8417, 261, 136, 3, 50, "Input"],
Cell[8556, 266, 250, 4, 86, "Output"]
}, Open  ]],
Cell[8821, 273, 72, 0, 33, "Text"],

Cell[CellGroupData[{
Cell[8918, 277, 298, 7, 110, "Input"],
Cell[9219, 286, 129, 2, 29, "Output"]
}, Open  ]],
Cell[9363, 291, 161, 5, 52, "Text"],

Cell[CellGroupData[{
Cell[9549, 300, 209, 5, 90, "Input"],
Cell[9761, 307, 131, 2, 29, "Output"]
}, Open  ]],
Cell[9907, 312, 165, 5, 52, "Text"],

Cell[CellGroupData[{
Cell[10097, 321, 228, 5, 90, "Input"],
Cell[10328, 328, 119, 2, 29, "Output"]
}, Open  ]],
Cell[10462, 333, 271, 7, 71, "Text"],

Cell[CellGroupData[{
Cell[10758, 344, 293, 6, 110, "Input"],
Cell[11054, 352, 119, 2, 29, "Output"],
Cell[11176, 356, 54, 1, 29, "Output"]
}, Open  ]],
Cell[11245, 360, 155, 6, 33, "Text"],

Cell[CellGroupData[{
Cell[11425, 370, 163, 3, 50, "Input"],
Cell[11591, 375, 74, 1, 29, "Output"]
}, Open  ]],
Cell[11680, 379, 214, 6, 52, "Text"],

Cell[CellGroupData[{
Cell[11919, 389, 70, 1, 30, "Input"],
Cell[11992, 392, 123, 2, 29, "Output"]
}, Open  ]],
Cell[12130, 397, 326, 11, 52, "Text"],

Cell[CellGroupData[{
Cell[12481, 412, 57, 1, 30, "Input"],
Cell[12541, 415, 56, 1, 29, "Output"]
}, Open  ]],
Cell[12612, 419, 244, 6, 52, "Text"],

Cell[CellGroupData[{
Cell[12881, 429, 168, 3, 50, "Input"],
Cell[13052, 434, 190, 3, 48, "Output"]
}, Open  ]],
Cell[13257, 440, 1002, 16, 223, "Text"]
}, Open  ]]
}, Open  ]]
}
]
*)




(***********************************************************************
End of Mathematica Notebook file.
***********************************************************************)

