(* :Title: MultiplierMethod *)
(* :Context: Optimization` *)
(* :Author: Jean-Christophe Culioli *)
(* :Email: culioli@cas.ensmp.fr *)
(* :Affiliation: Centre Automatique et Systemes, *)
(* Ecole des Mines de Paris, FRANCE *)
(* :Author: Joseph P. Skudlarek *)
(* :Affiliation: Cypress Semiconductor Corp *)
(* Data Communications Division *)
(* Beaverton, Oregon 97008 *)
(* :Summary:
This package provides the function MultiplierMethod
for minimization of a nonlinear objective function
subject to equality and / or inequality constraints.
*)
(* :Package Version: 1.93 *)
(* NB: update Version in Options if the version changes *)
(* :Mathematica Version: 5.0 *)
(* :Copyright: Copyright 1994, J.-C. Culioli,
Ecole des Mines de Paris, France
This package may be copied in its entirety for nonprofit
purposes only. Sale, other than for the direct cost of the
media, is prohibited. This copyright notice must accompany all
copies. *)
(* :Copyright: Copyright 2000-2002,2004 Joseph P. Skudlarek *)
(**********************************************************************)
(* TODO:
To allow warm/directed start: handle options
EqualityLambda -> initialValue
InequalityLambda -> initialValue
Consider returning history and/or iteration count.
*)
(**********************************************************************)
(* :History: V. 1.93, 13 June 2004, by Jskud : FindMinimum changed in
Mma 5.0, requiring rework. *)
(* :History: V. 1.92, 21 July 2002, by Jskud : add WRIGHT9 problem to
MultiplierMethod.nb, and reformat notebook a little; clarify
TODO task. *)
(* :History: V. 1.91, 17 June 2002, by Jskud : reorder history, so most
recent is first. Make code more robust by Take 3 vice Drop -1
in SameTest for FixedPoint. Clean up some variable names. Add
TODO list. *)
(* :History: V. 1.9, 11 June 2002, by Jskud : correct two of the
example problems -- TP2 p. 25 solution was not optimal;
TP340 p. 161 was not well formed, so add constraints.
Add Nocedal and Wright reference, and update comments in
OptimalityConditions. *)
(* :History: V. 1.8, 08 June 2002, by Jskud : ignore penalty stability
("ck") when computing a fixed point. Define Version::usage to
cause local definition and avoid spell1 warning. Fix typo in
comment *)
(* :History: V. 1.7, 27 February 2001, by Jskud : update copyright
notice; add Version to Options; update discussion to include
corrected summary; roll version number to ensure have
unambiguous release. *)
(* :History: V. 1.6, 24-26 February 2001, by Jskud : fix nasty FixUp
vice fixUp typo! (found while investigating suspect p (lambda)
values); return {nadm, nKKT, nslack} with DualParameters; add
comments justifying Derivative[0, 1][Max][0., theta_] :> 1
replacement; improve parchange messages for invalid parameter
values; add MultiplierMethodNotes; expand CheckConvergence and
MultiplierMethod usage messages. *)
(* :History: V. 1.5, 11 September 2000, by Jskud : Fixed bug: restore
message state to the way it was on entry. Fixed bug: allow
InnerIterations to be passed in as an option (determining the
max inner iteration value and passing it into
MultiplierMethodStep[] seemed to speed things up by about 10%).
Added caveat to usage "result is only a candidate solution",
when I discovered that minimizing x subject to no constraints
silently gave a bogus result. Added more keywords. Added
contact information for myself. Updated the associated
demonstration-and-verification notebook. Made yet another
        editing pass through the file for wording and formatting.  Added
more comments. Verified package works in Mma 2.2, Mma 3.0, and
Mma 4.0. *)
(* :History: V. 1.4, 05 September 2000, by Jskud : Strengthened the
method by allowing Augmentation (the penalty weight, "c") to
optionally increase toward infinity by supporting
AugmentationGrowth. Pulled the computation of the gradient of
the augmented Lagrangian out of the inner loop -- seems to
        reduce runtime by about 5%.  More reformatting.  Add more
        keywords.  Add more verbiage to Limitations. *)
(* :History: V. 1.3, 04 September 2000, by Jskud : fixed bug checking
for satisfaction of inequality constraints, so no longer have
silently erroneous results; fixed bug checking for invalid
inequality-associated Lagrange multipliers ("nonposq"); fixed
bug checking for complementary slackness ("nonslackness");
remove hardwired tolerance setting, and set based on convergence
goal. Reformatted source to make code structure more apparent,
        added comments, and removed redundant routine.  Renamed "Grad"
to "grad" to reduce risk of subsequent collision with
Mathematica provided names. Used "Karush-Kuhn-Tucker"
throughout. *)
(* :History: V. 1.2, 17 January 2000, by Jskud : modified package so it
works in Mathematica 3.0, and so that the associated
demonstration-and-verification notebook works without error by
adding internal fixUp[] routine to KKT generation, and special
casing the gradient passed to FindMinimum. Partially
reformatted file for editing convenience, and added some
explanation of complex Mathematica coding. Note that the
results in the associated notebook have changed for one problem
(which shows up multiple times) -- for "TP217 p. 41", the
initial starting point of {10.,10.} converges to a poorer local
minimum -- use the starting point {1., 1.} to get better
results. *)
(* :History: V. 1.1, 17 September 94, by JCC : changed the use of
FindRoot for inner optimization to FindMinimum with specified
gradient, using the rule: Derivative[0, 1][Max][0., theta_]:> 1.
Suppressed any approximation to the Max[0,x] function. Slightly
slower on some problems but much more robust.*)
(* :History: V. 1.0, 10 September 94, by JCC *)
(* :Keywords:
optimization, objective function, minimize, constraints,
equality constraints, linear constraints, nonlinear constraints,
equalities, inequalities, augmented Lagrangian, Uzawa algorithm,
Primal-Dual method, method of multipliers, multiplier method,
quadratic programming, nonlinear optimization,
constrained optimization. *)
(* :Limitations: (Theoretical) The problem solved is Min f(x) subject to
g(x) = 0, h(x) <= 0 where f and h are convex and g is affine --
no proof of convergence to a solution in most other cases (with
fixed penalty parameter). Other results are in Bertsekas. *)
(* :Limitations: (Practical) This program does not check the returned
        "solution" for optimality, so that result should be considered
        only a "candidate".  Some information is given when
        infeasibility is detected or the KKT conditions are not met.
        Note that the CheckConvergence option only controls
        convergence, not optimality or feasibility. *)
(* :Discussion: The method is a gradient ascent algorithm (Uzawa
algorithm) applied to the dual functional with Augmented
Lagrangian. Or as we say in the MathSource summary:
This is an implementation of the Method of Multipliers (also
known as the Augmented Lagrangian Method) due to Hestenes,
Powell, Rockafellar and others. It solves nonlinear programming
minimization problems with inequality and/or equality
constraints. As such, it is a natural generalization of the
FindMinimum built-in Mathematica function. See for example
D. G. Luenberger, "Linear and Nonlinear Programming" (2nd Ed.),
Addison-Wesley, 1984. See also Dimitri P. Bertsekas,
"Constrained Optimization and Lagrange Multiplier Methods",
Athena Scientific, 1986; and Dimitri P. Bertsekas, "Nonlinear
Programming" (2nd Ed.), Athena Scientific, 1999; and Jorge
Nocedal and Stephen J. Wright, "Numerical Optimization",
Springer-Verlag, 1999.
Starting with version 1.3, we follow Bertsekas; thus, the method
and comments may no longer faithfully represent the Uzawa
algorithm. *)
(* :Implementation: The use of FindMinimum as an internal algorithm led
        us to suppress all its messages and to imitate its usual output
        even when the internal convergence message FindMinimum::fmlim
        prevents the output.  As a complement, one advantage of this is
        that we can set the InnerIterations option to 1, thus mimicking
        the Arrow-Hurwicz algorithm! *)
BeginPackage["Optimization`MultiplierMethod`"];
MultiplierMethod::usage = "MultiplierMethod[f,g,h,x,x0,opt] attempts to
find a local solution to a minimization problem where: f is the
criterion to be minimized, g is the list (possibly empty) of
equality constraints, h is the list (possibly empty) of <= 0
inequality constraints, x = {x1,x2,...} is a list of the
variables, x0 is an initial vector for x, opt is an optional
list of options. It returns a list {f*,{x1 -> x1*, etc}} like
FindMinimum. With the option DualParameter -> True it can also
provide information on feasibility and Lagrange and/or
Karush-Kuhn-Tucker multipliers. NB: This routine does not check
all intermediate steps for success nor check the final result
for optimality; therefore, the final result should be considered
a candidate, not a certain, solution. See also
MultiplierMethodNotes.";
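(* Example call (a hypothetical problem, not part of the package's own
        test suite): minimize x^2 + y^2 subject to x + y - 1 == 0 via
            MultiplierMethod[x^2 + y^2, {x + y - 1}, {}, {x, y}, {0., 0.}]
        which should return approximately {0.5, {x -> 0.5, y -> 0.5}},
        matching the analytic solution x = y = 1/2.  With the option
        DualParameter -> True, feasibility information and the
        associated Lagrange multiplier (about -1, since stationarity of
        f + p.g gives 2 x + p == 0 at x == 1/2) are appended. *)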
MultiplierMethodNotes::usage = "Internally, the Multiplier Method
generates the messy function Derivative[0, 1][Max][0., theta_],
and replaces *all* such instances with the constant 1 (knowing
that it's safe to do so for internally generated functions, and
presuming that no user specified functions use Max[0,.] or
Derivative[0,1][Max][0,.]). Therefore, do not provide objective
or constraint functions which employ either construct. If you
        need something similar, variations on Max[0,0,.] should work.";
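(* For example, a hypothetical user constraint written as Max[0, x1]
        could be corrupted by that internal replacement; writing
        Max[0, 0, x1] instead evaluates to the same values but does not
        match the two-argument Derivative[0, 1][Max][0., theta_] rule. *)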
OuterIterations::usage = "OuterIterations is an option for
MultiplierMethod specifying the maximum number of outer
iterations (gradient ascent for the dual parameters). The
default setting is OuterIterations -> 30.";
InnerIterations::usage = "InnerIterations is an option for
MultiplierMethod specifying the maximum number of inner
iterations. The default setting is InnerIterations -> 5.";
DualParameter::usage = "DualParameter is an option for MultiplierMethod.
With the default setting DualParameter -> False, no information
is provided on the dual parameters (Lagrange and/or
Karush-Kuhn-Tucker multipliers). With DualParameter -> True,
feasibility information and the dual parameters are appended to
the primal solution using this format: { f*, {x -> x*, ...}, g*,
{p -> p*, ...}, h*, {q -> q*, ...}, {admissibleViolation,
KKTViolation, slackViolation} }, where p's and q's are
automatically created variables.";
CheckConvergence::usage = "CheckConvergence is an option for
MultiplierMethod. With the setting CheckConvergence -> False,
convergence of the algorithm is not checked, and control is left
to the user with the OuterIterations option. With the default
CheckConvergence -> True, the stopping test is
Function[{u,v},Max[Abs[u-v]]< 10^-k] where k is the value of the
ConvergenceGoal option. The total number of iterations is
        however limited to OuterIterations.  Note that the
        CheckConvergence option only controls convergence, not
        optimality or feasibility.";
Version::usage = "Version is an option for MultiplierMethod.
        The default value is the current version of the software.";
Augmentation::usage = "Augmentation is an option for MultiplierMethod.
It gives the value of the augmentation (penalty) parameter for
the augmented Lagrangian. The default setting is Augmentation
-> 10. It should be decreased in case of initial explosion and
increased in case of slow convergence.";
(* AugmentationGrowth is a *fixed* growth parameter because the package
was originally written to use FixedPoint[] or Nest[]. Allowing
Augmentation's growth to depend on the change in the constraint
violation amount would require passing even more state info,
something I am not willing to do right now. // Jskud 10Sep00 *)
AugmentationGrowth::usage = "AugmentationGrowth is an option for
MultiplierMethod. It gives the value of the fixed growth factor
for the Augmentation parameter. The default setting is
AugmentationGrowth -> 1 (for compatibility with earlier versions
of MultiplierMethod). It should be increased to speed
convergence or to better enforce constraints.";
ConvergenceGoal::usage = "ConvergenceGoal is an option for
        MultiplierMethod.  It gives the precision of the convergence
        test.  Its default setting is 6, which is quick but gives low
        precision.  It should be increased to 10 or more for better
        precision.";
MultiplierMethod::parchange =
"Inappropriate parameter value: `2` -> `3`; `1`; changed to `4`.";
MultiplierMethod::nonKKT=
"Karush-Kuhn-Tucker conditions not satisfied better than `1`.";
MultiplierMethod::nonadmissible=
        "current point is `1` away from admissible.";
MultiplierMethod::nonposq=
"Internal Error: non admissible Karush-Kuhn-Tucker multipliers.";
MultiplierMethod::nonslackness=
"complementary slackness not satisfied; slack violation is `1`.";
Begin["`private`"];
(* use distinctive names until we can turn off General::spell1 *)
disableOneMessage[msg_] := Module[
(* use ReleaseHold on the result to re-establish message setting *)
(* Unfortunately, we need to depend on undocumented aspects of
Mathematica (e.g., that $Off[] is used to effect Off[]) to
determine if the message was on or off before we disabled it, so
we can determine if we must turn the message back on. *)
{restore = Null},
If[ Head[Evaluate[msg]] =!= $Off, restore = Hold[On[msg]] ];
Off[msg];
restore
];
disableMessages[msgSequence__] :=
disableOneMessage /@ Unevaluated[ {msgSequence} ];
SetAttributes[ { disableOneMessage, disableMessages }, HoldAll ];
spellState = disableMessages[ General::spell1, General::spell ];
grad[function_,var_List]:= D[function,#]& /@ var ;
pos[x_List]:=Map[Max[0.,#]&,x];
(* phi: penalty term for inequality constraints *)
phi[h_,q_,c_]:= Plus @@ ( 1/(2. c) (pos[q + c h]^2 - q^2) );
pvar[i_]:= ToExpression["NLP`p" <> ToString[i]];
qvar[i_]:= ToExpression["NLP`q" <> ToString[i]];
cvar[i_]:= ToExpression["NLP`c" <> ToString[i]];
(* when there are no equality constraints,
grad[{},{x,y}].{}
=>
{{},{}}.{}
=>
{0}
which fails to add up in Mathematica 3.0.
To solve this problem, we fixUp[] the result for this one special case;
in effect, we convert `{0}' into `0', which is "Thread-able" with Plus.
-- Jskud 14Jan00
*)
fixUp[fixUpX_, fixUpN_] := If[fixUpN == 0, 0, fixUpX];
Options[MultiplierMethod] = {
Version -> 1.93,
Augmentation -> 10.,
AugmentationGrowth -> 1,
CheckConvergence -> True,
DualParameter -> False,
InnerIterations -> 5,
OuterIterations -> 30,
ConvergenceGoal -> 6};
(*
This code Bertsekas
Min f(x) s.t. Min f(x) s.t.
g(x) = 0 h(x) = 0
h(x) <= 0 g(x) <= 0
f f
p lambda
g h
c c
h g
q mu
*)
MultiplierMethod[f_,g_List,h_List,x_List,x0_List,opts___]:=
Module[
{xk = x0, (* primal variables *)
p, pk = Table[0.,{Length[g]}], (* "==" Lagrange multipliers *)
q, qk = Table[0.,{Length[h]}], (* "<=" Lagrange multipliers *)
c, ck, (* varying penalty parameter *)
augmLag, (* augmented Lagrangian *)
gradAL, (* Grad augmented Lagrangian *)
msgState, (* incoming state of messages*)
b, maxiter, useFixedPoint, (* option values *)
cvgoal, dualinfo, (* option values *)
ii, (* inner iteration option val*)
repx, repp, repq, (* variable replacements *)
nadm, nKKT, nslack}, (* optimality violations *)
{ck, b, maxiter, ii, useFixedPoint, cvgoal} =
{Augmentation, AugmentationGrowth,
OuterIterations, InnerIterations,
CheckConvergence, ConvergenceGoal}
/. {opts} /. Options[MultiplierMethod];
p = Map[pvar,Range[Length[g]]]; (* create aux "==" variables *)
q = Map[qvar,Range[Length[h]]]; (* create aux "<=" variables *)
c = cvar[0]; (* create varying penalty var*)
augmLag = f + p.g + c/2. g.g + phi[h,q,c];
(* The augmented Lagrangian involves phi which includes summands of
the form (cf. Bertsekas 1999, p. 397)
Max[0,theta]^2.
We use the gradient of the augmented Lagrangian, which we get
by taking the derivative; this results in summands of the form
2 Max[0, theta] * Derivative[0,1][Max][0, theta] * D[theta, x].
That's messy. Derivative[0,1][Max][0,theta] is either 0 or 1,
depending on theta. But because (in this context) the
derivative is always multiplied by Max[0, theta], we can always
set the derivative to 1, knowing that Max[0, theta] will force a
zero if the derivative should have been zero. Nice job,
Jean-Christophe!
The following rule replaces those difficult derivatives of
Max[0,.] with the less-troublesome but valid-in-this-context
constant 1.
*)
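    (* As a worked instance (for illustration only), a single inequality
        constraint h = {x - 1} makes phi contribute
            (1/(2 c)) (Max[0., q + c (x - 1)]^2 - q^2),
        whose x-derivative is
            Max[0., theta] Derivative[0, 1][Max][0., theta]
        with theta = q + c (x - 1).  The rule below rewrites the second
        factor to 1, which is harmless because the Max factor vanishes
        exactly where the true derivative would be 0. *)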
gradAL = grad[augmLag,x] //. Derivative[0, 1][Max][0., theta_]:> 1;
(* Unlike in Mathematica 2.2, Mathematica 3.0's FindMinimum gets
confused if grad is a vector when there is only a single
variable; so we convert a vector of length 1 to a scalar to
handle this quirk. -- Jskud 15Jan00
However, Mma 5.0 complains if the gradient is a scalar -- it
must be a vector; so avoid the rewrite if we're running Mma 5.0.
*)
If[ Length[gradAL] == 1 && TrueQ[$VersionNumber < 5.0], gradAL = gradAL[[1]] ];
If[ !NumberQ[cvgoal] || cvgoal < 0 || cvgoal > $MachinePrecision,
Message[MultiplierMethod::parchange,
"not a number, less than 0, or exceeds machine precision",
ConvergenceGoal, cvgoal,
cvgoal = ConvergenceGoal /. Options[MultiplierMethod]
]
];
If[ !NumberQ[b] || b < 1 || b > 10,
Message[MultiplierMethod::parchange,
"not a number, less than 1, or greater than 10",
AugmentationGrowth, b,
b = AugmentationGrowth /. Options[MultiplierMethod]
]
];
msgState = disableMessages[
FindMinimum::fmcv,
FindMinimum::fmlim,
FindMinimum::cvmit,
FindMinimum::lstol,
FindMinimum::fmgz,
FindMinimum::sdprec
];
Which[
useFixedPoint === False,
{xk, pk, qk, ck} =
Nest[
MultiplierMethodStep[augmLag,gradAL,g,h,b,ii,x,p,q,c,#]&,
{xk,pk,qk,ck}, maxiter],
useFixedPoint === True,
{xk, pk, qk, ck} =
FixedPoint[
MultiplierMethodStep[augmLag,gradAL,g,h,b,ii,x,p,q,c,#]&,
{xk,pk,qk,ck}, maxiter,
SameTest -> Function[{u,v},
Max[Take[Abs[u-v],3]] < 10^-cvgoal]
],
True,
Message[MultiplierMethod::opttf,CheckConvergence,useFixedPoint]
];
ReleaseHold[msgState];
repx =Thread[x -> xk]; repp =Thread[p -> pk]; repq =Thread[q -> qk];
{nadm, nKKT, nslack} =
OptimalityConditions[f,g,h,x,p,q,repx,repp,repq,cvgoal];
dualinfo = DualParameter /. {opts} /. Options[MultiplierMethod];
Which[
dualinfo === False,
{(f /. repx), repx},
dualinfo === True,
{ (f /. repx), repx,
(g /. repx), repp,
(h /. repx), repq,
{nadm, nKKT, nslack}
},
True,
Message[MultiplierMethod::opttf,DualParameter,dualinfo]
]
];
MultiplierMethodStep[
augmLag_,gradAL_,g_,h_,b_,iter_,x_,p_,q_,c_,{xk_,pk_,qk_,ck_}]:=
Module[
(* return {xk, pk, qk, ck} *)
{
lastx, lagk, gradk, repx
},
{lagk, gradk} = {augmLag, gradAL}
/. Thread[p -> pk]
/. Thread[q -> qk]
/. (c -> ck);
(* The "function" to be evaluated by FindMinimum is a compound
statement -- the first statement captures the last value of x
(so we have it available in case FindMinimum exits without
finding a solution), while the second statement computes the
value of the function; `##' is a sequence splicer, and `@@'
is Apply, so the result is that the {variable, value} pairs
(namely, {{x0, x0k}, {x1, x1k}, ...}) are spliced into the
FindMinimum expression as individual arguments. *)
If[TrueQ[$VersionNumber < 5.0],
FindMinimum[lastx = x; lagk, ##, MaxIterations -> iter,
Gradient -> gradk ]& @@ Transpose[{x,xk}],
(* Mma 5.0's FindMinimum changed function evaluation semantics;
so we changed to this to capture the latest value of x. *)
FindMinimum[lagk, ##, MaxIterations -> iter,
EvaluationMonitor :> (lastx = x),
Gradient -> gradk ]& @@ Transpose[{x,xk}]
];
(* force output even if it is prevented by FindMinimum *)
repx = Thread[x -> lastx];
(* Generate our results *)
{x, pk + ck g, pos[qk + ck h], ck b} /. repx
];
OptimalityConditions[f_,g_,h_,x_,p_,q_,repx_,repp_,repq_,cvgoal_]:=
Module[
{
toladm, tolKKT, tolslack,
admissible, nadm, KKT, nKKT, slackness, nslack, posq
},
(* FYI: According to Nocedal and Wright, p. 328, all these first
order necessary conditions are considered the KKT conditions. *)
(* Avoid hardwiring the tolerances since we support a
convergence goal; rather, set the tolerances based on the
cvgoal to avoid generating unexpected messages when given a
low convergence goal. *)
toladm = tolKKT = tolslack = 10^-cvgoal;
admissible = {g , pos[h]} /. repx ;
If[(nadm = Max[Abs[admissible]]) > toladm,
Message[MultiplierMethod::nonadmissible,nadm]];
KKT = grad[f,x]
+ fixUp[grad[g,x].p, Length[p]]
+ fixUp[grad[h,x].q, Length[q]] /. repx /. repp /. repq;
If[(nKKT = Sqrt[KKT.KKT]) > tolKKT,
Message[MultiplierMethod::nonKKT,nKKT]];
slackness = q.h /. repx /. repq ;
If[(nslack = Max[Abs[slackness]]) > tolslack,
Message[MultiplierMethod::nonslackness, nslack]];
posq = Apply[And,NonNegative[q /. repq]];
(* should never happen *)
If[!posq,Message[MultiplierMethod::nonposq]];
{nadm, nKKT, nslack}
];
ReleaseHold[spellState];
End[];
EndPackage[];
(*[]*)