(*:Mathematica Version: 2.2 *)

(*:Package Version: 1.0 *)

(*:Name: Statistics`MixtureEstimation` *)

(*:Context: Statistics`MixtureEstimation` *)

(*:Title: Mixture Estimation via Expectation Maximization   *)

(*:Author: E. C. Martin *)

(*:History:
  E. C. Martin (Wolfram Research), March 1995.
*)

(*:Copyright:
  Copyright 1995, Wolfram Research, Inc.
*)

(*:Summary:
This package uses the EM (expectation maximization) algorithm to 
simultaneously estimate the missing data and unknown parameter(s) associated
with a data set.  Here the missing data are assumed to be the identities of
the observations originating from each of the two distributions
contributing to the mixture.  The parameters describe the component
distributions of the mixture;  the distributions may be continuous or discrete.
*)

(*:Keywords:
EM, expectation maximization, parameter estimation, mixture distribution,
maximum likelihood estimation *)

(*:Requirements: No special system requirements. *)

(*:Warning:
There are problems with comparing the maximized likelihoods for the
simple and mixture models, as a means of testing for the existence of a
mixture.  (1) For small samples, the asymptotic chi-squared distribution 
for the test statistic may not hold satisfactorily.  (2) The asymptotic
chi-squared distribution is really only valid if epsilon is known, which
of course it is not.  See the M. Aitkin & G. Tunnicliffe Wilson
reference for a discussion of this.
*)

(*:Limitations:
This package is only an indication of how the estimation-maximization
algorithm may be used to find the mixture of any two component distributions,
discrete or continuous, underlying the observed data.  For certain component
distributions, the algorithm can be coded more efficiently.  In particular,
the parameter estimates at each stage are often found using weighted
least squares rather than the numerical solve (NSolve) implemented here.
This package  allows completely general mixtures, so some sacrifice is made
in efficiency. 

Currently this package does not provide information on parameter estimates such
as the standard errors.

The EM algorithm is well known to have only linear convergence.
*)

(*:Discussion:
    NOTE that the EM algorithm involves approximating
                ((1-epsilon) D[pdf1, parm] + epsilon D[pdf2, parm])/
                        ((1-epsilon) pdf1 + epsilon pdf2)
       with
                ((1-epsilon) D[pdf1, parm]/pdf1 + epsilon D[pdf2, parm]/pdf2)
       when solving for the maximum likelihood estimates of the parameters. 
*)
 
(*:Sources:
  Elements of Statistical Computing, Thisted, Chapman & Hall, 1988.
  "Mixture Models, Outliers, and the EM Algorithm", M. Aitkin &
	G. Tunnicliffe Wilson, Technometrics, Vol. 22, No. 3, Aug. 1980.
  "Maximum Likelihood from Incomplete Data via the EM Algorithm", 
	A. P. Dempster, N. M. Laird, D. B. Rubin, J. Roy. Statist. Soc.,
	B, 39, pp. 1-38, 1977.
  "Mixture Densities, Maximum Likelihood and the EM Algorithm",
	R. A. Redner and H. F. Walker, SIAM Review, Vol. 26, No. 2, April 1984.
*)


BeginPackage["Statistics`MixtureEstimation`",
	     "Statistics`ContinuousDistributions`",
             "Statistics`NormalDistribution`",
             "Statistics`DiscreteDistributions`",
             "Statistics`DescriptiveStatistics`",
	     "Statistics`Common`DistributionsCommon`"]

MixtureEstimation::usage =
"MixtureEstimation[data, epsilon, {dist1, dist2}, {parm1, parm2, ...}]
gives a list of replacement rules for the distribution parameters
parm1, parm2, ... and the mixing parameter epsilon describing the mixture
distribution ((1-epsilon) dist1 + epsilon dist2)  assumed to underly the data.  
The parameter epsilon and the parameter(s) of the two component
distributions are estimated via the EM (expectation maximization) algorithm.
The unknown vector indicating which observations are due to which distribution
represents missing data.  The initial probability of each observation
deriving from dist1 can be specified using the option InitialProbability1.
In the case of continuous data, data is a vector {d1, d2, ...} giving the
explicit observations.  In the case of discrete data, data is a list of pairs
{{l1, m1}, {l2, m2}, ...}, where mi is number of datums observed to take on
discrete value li.  Initial values p1, p2, ... for the parameters
parm1, parm2, ... may be specified by replacing the list {parm1, parm2, ...}
with the list of pairs {{parm1, p1}, {parm2, p2}, ...}.  An initial value
epsilon0 for the mixing parameter epsilon may be specified by replacing epsilon
with the list {epsilon, epsilon0}."

InitialProbability1::usage =
"InitialProbability1 is an option of MixtureEstimation indicating the initial
probability of each of the observations in data deriving from the first
distribution.  It may be set to a vector having the same length as the data, or
the default Automatic.  In the case of continuous data, Automatic means that
the smallest two observations will be considered to have come from the first
distribution and the remaining observations from the second.  In the case of
discrete data, Automatic means that all the observations of the first discrete
value in data will be assumed to come from the first distribution, while the
remaining observations will be assumed to come from the second.  
InitialProbability1 -> {p1, p2, ...}, pi > 0, p1 + p2 + ... = 1, specifies
a vector of probabilities.  For continuous data, pi is the probability that
the ith datum derives from the first distribution in the mixture.  For discrete
data, pi is the probability that the observations of the ith discrete value
derive from the first distribution." 

FinalProbability1::usage =
"FinalProbability1 is used in the output of MixtureEstimation to indicate the
final estimates of the probability of the data deriving from the first
distribution in the mixture, as determined by the expectation maximization
algorithm."

MixtureModelDiagnostic::usage =
"MixtureModelDiagnostic is used in the output of MixtureEstimation to
indicate whether the null hypothesis (data derives from the first distribution
only) should be rejected in favor of the alternative hypothesis (data derives
from a mixture of the first and second distributions)."

LogLikelihood1::usage =
"LogLikelihood1 is used in the output of MixtureEstimation to 
indicate the log-likelihood achieved using a model based on the first
distribution only.  If there are any parameters in this model, the
log-likelihood is maximized with respect to the parameters."

LogLikelihoodMixed::usage =
"LogLikelihoodMixed is used in the output of MixtureEstimation to 
indicate the log-likelihood achieved using a model based on the 
mixture distribution.  The log-likelihood is maximized with respect
to the parameters."

ShowProgress::usage =
"ShowProgress is an option of MixtureEstimation.  If True, a summary of
intermediate results for each iteration of the search for the best parameters
will be printed."

Begin["`Private`"]

Options[MixtureEstimation] := {AccuracyGoal -> Automatic,
	InitialProbability1 -> Automatic,
	MaxIterations -> 30, PrecisionGoal -> Automatic,
	ShowProgress -> False, WorkingPrecision -> 16}


(* ========================= Utility Functions =========================== *)

validArgumentsQ[data_, dist1_, dist2_, plist_] :=
  Module[{contexts, plist1, plist2, x},
    contexts = Sort[{Context[Evaluate[Head[dist1]]],
		     Context[Evaluate[Head[dist2]]]}];
    If[VectorQ[data],	(* data is continuous-valued *)
       If[MemberQ[contexts, "Statistics`DiscreteDistributions`"],
	  Message[MixtureEstimation::contdata];
	  Return[False]  ];
       If[Length[Union[data]] < 2,
	  Message[MixtureEstimation::data];
	  Return[False]  ]
    ];
    If[MatchQ[Dimensions[data], {_Integer, 2}], (* data is discrete-valued *)
       If[MemberQ[contexts, "Statistics`NormalDistribution`"] ||
	  MemberQ[contexts, "Statistics`ContinuousDistributions`"],
	  Message[MixtureEstimation::discdata];
	  Return[False]  ];
       If[Length[Union[Map[First, data]]] < 2,
	  Message[MixtureEstimation::data];
          Return[False]  ]
    ];  	    
    If[!FreeQ[PDF[dist1, x], PDF],
	Message[MixtureEstimation::pdf, dist1]; Return[False]];
    If[!FreeQ[PDF[dist2, x], PDF],
	Message[MixtureEstimation::pdf, dist2]; Return[False]];
    plist1 = Sort[Map[If[VectorQ[#], First[#], #]&, plist]];
    plist2 = Sort[Select[
		  Union[Apply[List, dist1], Apply[List, dist2]],
			 (!NumberQ[N[#]])&]];
    (* check whether arguments of the distributions contain at least one
	of the specified parameters *)
    If[plist2==={},
        Message[MixtureEstimation::noparm]; Return[False]];
    If[Apply[Or, Map[ListFreeQ[#, plist1]&, plist2]],
	Message[MixtureEstimation::parm, plist1, plist2]; Return[False]];
    True 
] 	

ListFreeQ[expr_, list_] := Apply[And, Map[FreeQ[expr, #]&, list]]

printprogress[iteration_, epsilon_, params_] :=
   (
    Print[StringForm[
        "Iteration:`1`  Epsilon:`2`  Parameters:`3`",
            iteration, NumberForm[N[epsilon], 6], NumberForm[N[params], 6] ]]
   )


(* ============================= Messages ================================= *)

MixtureEstimation::contdata =
"Data is in the form of continuous-valued data so it may not be modeled as
coming from a mixture involving a discrete distribution."

MixtureEstimation::discdata =
"Data is in the form of discrete-valued data so it may not be modeled as
coming from a mixture involving a continuous distribution."

MixtureEstimation::data =
"Data consists of fewer than two distinct values."

MixtureEstimation::pdf =
"The probability density function PDF[``, x] is not defined."

MixtureEstimation::parm =
"One or both of the distributions are specified by parameters not included in 
the parameter list ``."

MixtureEstimation::noparm =
"There are no unknown parameters in the two component distributions."

MixtureEstimation::initial =
"Warning: The initial probabilities that the observations derive from the
first distribution, as specified by InitialProbability1, are invalid.
Using default Automatic."

MixtureEstimation::prob =
"Warning: When the length of continuous data is only 2, the Automatic procedure 
of determining the initial probabilities that the observations derive from 
first distribution is special.  It is assumed that initially the smallest datum 
derives from the first distribution with probability 1, and the largest datum
derives from the first distribution with probability 0."

MixtureEstimation::maxiter = 
"Warning: MaxIterations must be a positive integer.  Using MaxIterations -> ``."

MixtureEstimation::workprec =
"Warning: WorkingPrecision must be an integer greater than or equal to 16.
Using WorkingPrecision -> ``."

MixtureEstimation::accgoal =
"Warning: AccuracyGoal must be a nonnegative integer.  Using AccuracyGoal ->
``."

MixtureEstimation::precgoal =
"Warning: PrecisionGoal must be a nonnegative integer.  Using PrecisionGoal ->
``."

MixtureEstimation::nsolvefail =
"Unable to find the values of the parameters using NSolve on the
maximum likelihood equations." 

MixtureEstimation::polyfail =
"The maximum likelihood method of estimating the initial values of the 
parameters yielded non-polynomial equation(s).  Specify initial value(s)
for the parameter(s) in list ``."

MixtureEstimation::parmfail =
"Unable to find the values of the parameters using FindRoot on the
maximum likelihood equations." 

MixtureEstimation::badprob =
"Expectation step yielded one or more probabilities that were either
complex or did not lie in [0, 1]."

MixtureEstimation::noconv =
"Warning: MaxIterations of `` exceeded before parameter(s) satisfied
AccuracyGoal and PrecisionGoal."

MixtureEstimation::polyexp =
"MLE equations include a polynomial equation in `1` of order `2`.
An initial value should be suggested for `1`."

MixtureEstimation::notmix =
"Warning: The maximized log-likelihood is smaller for the mixture model than 
for the model consisting of simply the first distribution.  This is strong
evidence against the mixture model."

modelreport[p_] :=
  If[p < 0.05,
     StringForm["P-value is ``, so reject null hypothesis of no mixture at the 5 percent significance level:  there is strong evidence to support the mixture model.", p],
     StringForm["P-value is ``, so fail to reject null hypothesis of no mixture at the 5 percent significance level:  there is little evidence to support the mixture model.", p]
  ]


(* =========================== MixtureEstimation =========================== *)

MixtureEstimation[data_List,
	eps:(_Symbol | {_Symbol, _?((NumberQ[N[#]])&)}), {dist1_, dist2_},
        plist:{(_Symbol | {_Symbol, _?((NumberQ[N[#]])&)})..}, opts___] :=
  Module[{initial, maxiter, accgoal, precgoal, workprec, progress, n,
	  sorted, min1, min2, sum, result},
   (
   result 
   ) /; (
	{initial, maxiter, accgoal, precgoal, workprec, progress} = 
	{InitialProbability1, MaxIterations,
	 AccuracyGoal, PrecisionGoal, WorkingPrecision, ShowProgress} /.
		 {opts} /. Options[MixtureEstimation];
	n = Length[data];
        If[!(  (VectorQ[initial] && Length[initial] == n &&
	        Apply[And, Map[(0 <= N[#] <= 1)&, initial]]) || 
	       initial===Automatic  ),
           Message[MixtureEstimation::initial];
           initial = Automatic];	
	If[initial === Automatic,
	   If[VectorQ[data],	(* continuous data *)
	      sorted = Sort[data];
	      {min1, min2} = sorted[[{1, 2}]];
	      If[Length[data] == 2,
		 Message[MixtureEstimation::prob];
		 initial = Map[If[#-min1===0, 1, 0]&, data],
	         initial = Map[If[#-min1===0 || #-min2===0, 1, 0]&, data]
	      ],
	   			(* discrete data *)
	      initial = Join[{1}, Table[0, {n-1}]]
	   ]
	];
	If[initial === $Failed,
	   result = $Failed,
	   If[!(IntegerQ[maxiter] && iter > 0),
	      maxiter = 30;
	      Message[MixtureEstimation::maxiter, maxiter]];
   	   If[!(IntegerQ[workprec] && workprec >= 16), 
	      workprec = 16;
	      Message[MixtureEstimation::workprec, workprec]];
           If[accgoal === Automatic,
	      accgoal = workprec-10,
	      If[!(IntegerQ[accgoal] && accgoal >= 0),
	         accgoal = workprec-10;
	         Message[MixtureEstimation::accgoal, accgoal]
	      ]];
	   If[precgoal === Automatic,
	      precgoal = workprec-10,
	      If[!(IntegerQ[precgoal] && precgoal >= 0),
	         precgoal = workprec-10;
	         Message[MixtureEstimation::precgoal, precgoal]
	      ]];
	   result = EM[data, eps, {dist1, dist2}, plist, initial,
		maxiter, workprec, accgoal, precgoal, progress]
	];
	result =!= $Failed
	)
  ] /; Length[data] > 1 && Apply[And, Map[NumberQ, Flatten[N[data]]]] &&
	FreeQ[N[data], Complex] && validArgumentsQ[data, dist1, dist2, plist]


(* ================================== EM ================================== *)


EM[data_?VectorQ, eps_, {dist1_, dist2_}, plist_, initialprob1_,
	 maxiter_, workprec_, ag_, pg_, prog_] :=
  Module[{n = Length[data], epsilon, epsilonRule,
	  parameters, x, pdf1, pdf2, derivatives, llD, pos, posC, parmRule,
	  llD0, parameters0, parmRule0, lhs,
	  prob1, oldparmRule, probability1, i, delta, 
	  accgoal = N[10^-ag], precgoal = N[10^pg], diagRule,
	  progress = TrueQ[prog]},
    If[VectorQ[eps],
       epsilon = eps[[1]];
       epsilonRule = {epsilon -> N[eps[[2]], workprec]},
       (* find initial value for epsilon *)
       epsilon = eps;
       epsilonRule = {epsilon -> N[Apply[Plus, 1-initialprob1]/n, workprec]}
    ];		

    parameters = Map[If[VectorQ[#], First[#], #]&, plist];

    pdf1[x0_] := PDF[dist1, x0];  pdf2[x0_] := PDF[dist2, x0];
    derivatives = Map[((1-epsilon) Simplify[D[pdf1[x], #]/pdf1[x]] +
		epsilon Simplify[D[pdf2[x], #]/pdf2[x]])&, parameters]; 
    llD[x0_, epsilon0_] := Evaluate[derivatives /. {x->x0, epsilon->epsilon0}]; 

    probability1[x0_] :=
	 (1-epsilon) pdf1[x0]/((1-epsilon) pdf1[x0] + epsilon pdf2[x0]);

    (* Find parmRule.
       Need to deal with a plist that may be neither a vector nor a matrix. *)
    pos = Flatten[Position[plist, _?VectorQ]];
    parmRule = Map[Apply[Rule, #]&, plist[[pos]]];
    If[!MatrixQ[plist],
       (* one or more of the initial parameter values are unspecified *)	
       posC = Complement[Range[Length[plist]], pos];
       parameters0 = parameters[[posC]];
       llD0[x0_, epsilon0_] := Evaluate[derivatives[[posC]] /.
		{x->x0, epsilon->epsilon0}];
       lhs = Apply[Plus, Map[Apply[llD0, #]&,
		 Transpose[{data, 1-initialprob1}] ]];
       parmRule0 = parameterMLE[lhs, parameters0,
			workprec, accgoal, maxiter, 0];
       If[parmRule0 === $Failed, Return[$Failed]];
       parmRule = Join[parmRule, parmRule0]
    ]; (* finished finding initial values for parameters *)		
    (* order parmRule *)
    parmRule = Map[{Position[parameters, First[#]][[1, 1]], #}&, parmRule];
    parmRule = Map[Last, Sort[parmRule, #1[[1]] < #2[[1]] &]]; 


    (* Iterate EXPECTATION and MAXIMIZATION, CONTINUOUS data *)
    i = 1;
    While[i == 1 ||
	  (i <= maxiter &&
	   !Apply[And,
	     Map[Module[{parm = # /. parmRule, oldparm = # /. oldparmRule,
		         delta},
		   delta = Abs[parm-oldparm];
		   delta < accgoal && delta < precgoal Abs[parm+oldparm]/2
	         ]&, parameters] ]),

       If[progress,
	  printprogress[i, epsilonRule[[1, 2]], Map[Last, parmRule]]];
	
       (* EXPECTATION *)
       (* find the expected value of the probability of the observation
		deriving from the first distribution, conditioned on the
		value of the observation *)
       prob1 = Map[(probability1[#] /. epsilonRule /. parmRule)&, data];
       If[Apply[Or, Map[!(FreeQ[#, Complex] && 0 <= # <= 1)&, prob1]],
	  Message[MixtureEstimation::badprob];
	  Return[$Failed]
       ]; 

       (* MAXIMIZATION *)
       (* find the maximum likelihood estimates of the mixing parameter
		and the distribution parameters *)
       epsilonRule = {epsilon -> Apply[Plus, 1-prob1]/n};
       oldparmRule = parmRule;
       lhs = Apply[Plus, Map[Apply[llD, #]&, Transpose[{data, 1-prob1}] ]];
       parmRule = parameterMLE[lhs, 
		Transpose[{parameters, (parameters /. oldparmRule)}],
		workprec, accgoal, maxiter, i];
       If[parmRule === $Failed, Return[$Failed]];
       (* order parmRule *)
       parmRule = Map[{Position[parameters, First[#]][[1, 1]], #}&, parmRule];
       parmRule = Map[Last, Sort[parmRule, #1[[1]] < #2[[1]] &]]; 
       
	
       i++
    ];		(* end cycle of EXPECTATION, MAXIMIZATION *)

    If[i > maxiter &&
       !Apply[And,
             Map[Module[{parm = # /. parmRule, oldparm = # /. oldparmRule,
                         delta},
                   delta = Abs[parm-oldparm];
                   delta < accgoal && delta < precgoal Abs[parm+oldparm]/2
                 ]&, parameters] ],	
       Message[MixtureEstimation::noconv, maxiter]
    ];

    (* find the final expected value of the probabilities that the
		observations derive from the first distribution *) 
    prob1 = Map[(probability1[#] /. epsilonRule /. parmRule)&, data];
    If[Apply[Or, Map[!(FreeQ[#, Complex] && 0 <= # <= 1)&, prob1]],
       Message[MixtureEstimation::badprob];
       Return[$Failed]
    ];

    (* DIAGNOSTICS: CONTINUOUS data:
       Find the reduction in (-2 * loglikelihood) caused by using the model
	   (1-epsilon)*pdf1 + epsilon*pdf2
       over the simple model  pdf1 .
       First need to find the MLE's of the parameters of the simple model. *)
    diagRule = {};
    Module[{plist1 = {}, parameters1 = {}, derivatives, llD1,
	    parmRule1, simpleLL, mixtureLL, teststat, dof, pval},
      Scan[Module[{p=#},
	     If[VectorQ[p], {p, p0} = p];
	     If[!FreeQ[pdf1[x], p],
	        AppendTo[plist1, #];
		AppendTo[parameters1, p]
	     ]  ]&, plist];
      If[parameters1 === {},
	 parmRule1 = {},	(* the simple model has no parameters! *)
         derivatives = Map[Simplify[D[pdf1[x], #]/pdf1[x]]&, parameters1];
         llD1[x0_] := Evaluate[derivatives /. x->x0];
         lhs = Apply[Plus, Map[llD1, data]  ];
         parmRule1 = parameterMLE[lhs, plist1, workprec, accgoal, maxiter, 1]
      ];
      If[parmRule1 =!= $Failed,
         (* figure out reduction in -2*loglikelihood *)
         simpleLL = Apply[Plus, Map[Log[pdf1[#] /. parmRule1]&, data]];
         mixtureLL = Apply[Plus, Map[
		Log[((1-epsilon)pdf1[#] + epsilon pdf2[#]) /. parmRule /.
			epsilonRule]&, data]];
         teststat = -2 (simpleLL - mixtureLL); (* positive when mixtureLL is
					     less negative than simpleLL is *)
         dof = Length[parameters] - Length[parameters1];
	 diagRule = {LogLikelihood1 -> N[simpleLL],
		     LogLikelihoodMixed -> N[mixtureLL]};
	 If[teststat >= 0,
	    pval = 1-CDF[ChiSquareDistribution[dof], teststat];
            AppendTo[diagRule, MixtureModelDiagnostic -> modelreport[pval]],
	    Message[MixtureEstimation::notmix]
	 ]
      ]
    ]; (* end CONTINUOUS data DIAGNOSTICS *)
    	

    (* return the MLE's of the mixing parameter, the component
       distribution parameters, and the final expected values of the
       probabilities that each observation is derived from the first
       distribution *)
    Join[epsilonRule, parmRule, diagRule, {FinalProbability1 -> prob1}]

  ] (* end of EM for CONTINUOUS data *)


EM[data_?MatrixQ, eps_, {dist1_, dist2_}, plist_, initialprob1_,
	maxiter_, workprec_, ag_, pg_, prog_] :=
  Module[{levels, counts, n, domain1only, domain2only, epsilon, epsilonRule,
	  parameters, x, pdf1, pdf2, derivatives, derivatives1, derivatives0,
	  llD, pos, posC, parmRule,
	  llD0, parameters0, parmRule0, lhs,
	  prob1, oldparmRule, probability1, i, delta, 
	  accgoal = N[10^-ag], precgoal = N[10^pg], diagRule,
	  progress = TrueQ[prog]},
    {levels, counts} = Transpose[data];
    n = Apply[Plus, counts];
    domain1only = Select[levels, (pdf1[#] =!= 0 && pdf2[#] === 0)&];
    domain2only = Select[levels, (pdf2[#] =!= 0 && pdf1[#] === 0)&];
    If[VectorQ[eps],
       epsilon = eps[[1]];
       epsilonRule = {epsilon -> N[eps[[2]], workprec]},
       (* find initial value for epsilon *)
       epsilon = eps;
       epsilonRule = {epsilon -> N[counts.(1-initialprob1)/n, workprec]}
    ];

    parameters = Map[If[VectorQ[#], First[#], #]&, plist];

    pdf1[x0_] := PDF[dist1, x0];   pdf2[x0_] := PDF[dist2, x0];
    (* Simplification is not done ahead of time as with continuous
	distributions, because the discrete distributions tend to be piecewise.
	Need to know x0 before evaluating D[pdf1[x0], #]/pdf1[x0]. *)
    llD[x0_, epsilon0_] := Map[
	Which[epsilon0==0, Simplify[D[pdf1[x0], #]/pdf1[x0]],
              epsilon0==1, Simplify[D[pdf2[x0], #]/pdf2[x0]],
              True, (1-epsilon0) Simplify[D[pdf1[x0], #]/pdf1[x0]] +
                    epsilon0 Simplify[D[pdf2[x0], #]/pdf2[x0]]
        ]&, parameters];

    probability1[x0_] :=
	 (1-epsilon) pdf1[x0]/((1-epsilon) pdf1[x0] + epsilon pdf2[x0]);

    (* Find parmRule.
       Need to deal with a plist that may be neither a vector nor a matrix. *)
    pos = Flatten[Position[plist, _?VectorQ]];
    parmRule = Map[Apply[Rule, #]&, plist[[pos]]];
    If[!MatrixQ[plist],
       (* one or more of the initial parameter values are unspecified *)
       posC = Complement[Range[Length[plist]], pos];
       parameters0 = parameters[[posC]];
       llD0[x0_, epsilon0_] := Map[
	    Which[epsilon0==0, Simplify[D[pdf1[x0], #]/pdf1[x0]],
                  epsilon0==1, Simplify[D[pdf2[x0], #]/pdf2[x0]],
                  True, (1-epsilon0) Simplify[D[pdf1[x0], #]/pdf1[x0]] +
				epsilon0 Simplify[D[pdf2[x0], #]/pdf2[x0]]
	    ]&, parameters0];
       lhs = Map[(counts.#)&, Transpose[Map[Apply[llD0, #]&,
		Transpose[{levels, 1-initialprob1}] ]]];	
       parmRule0 = parameterMLE[lhs, parameters0,
                	workprec, accgoal, maxiter, 0];
       If[parmRule0 === $Failed, Return[$Failed]];
       parmRule = Join[parmRule, parmRule0]
    ]; (* finished finding initial values for parameters *)
    (* order parmRule *)
    parmRule = Map[{Position[parameters, First[#]][[1, 1]], #}&, parmRule];
    parmRule = Map[Last, Sort[parmRule, #1[[1]] < #2[[1]] &]]; 

    (* Iterate EXPECTATION and MAXIMIZATION, DISCRETE data *)
    i = 1;
    While[i == 1 ||
          (i <= maxiter &&
           !Apply[And,
             Map[Module[{parm = # /. parmRule, oldparm = # /. oldparmRule,
                         delta},
                   delta = Abs[parm-oldparm];
                   delta < accgoal && delta < precgoal Abs[parm+oldparm]/2
                 ]&, parameters] ]),

       If[progress,
	  printprogress[i, epsilonRule[[1, 2]], Map[Last, parmRule]]];

       (* EXPECTATION *)
       (* find the expected value of the probability of the observation
                deriving from the first distribution, conditioned on the
                value of the observation *)
       prob1 = Map[
	 Which[MemberQ[domain1only, #], 1,
               MemberQ[domain2only, #], 0,
	       True, (probability1[#] /. epsilonRule /. parmRule)]&,
	 levels]; 	
       If[Apply[Or, Map[!(FreeQ[#, Complex] && 0 <= # <= 1)&, prob1]],
          Message[MixtureEstimation::badprob];
          Return[$Failed]
       ];

       (* MAXIMIZATION *)
       (* find the maximum likelihood estimates of the mixing parameter
                and the distribution parameters *)
       epsilonRule = {epsilon -> counts.(1-prob1)/n};
       oldparmRule = parmRule;
       lhs = Map[(counts.#)&, Transpose[Map[Apply[llD, #]&,    
                Transpose[{levels, 1-prob1}] ]]];
       parmRule = parameterMLE[lhs,
                Transpose[{parameters, (parameters /. oldparmRule)}],
                workprec, accgoal, maxiter, i];
       If[parmRule === $Failed, Return[$Failed]];
       (* order parmRule *)
       parmRule = Map[{Position[parameters, First[#]][[1, 1]], #}&, parmRule];
       parmRule = Map[Last, Sort[parmRule, #1[[1]] < #2[[1]] &]]; 


       i++
    ];          (* end cycle of EXPECTATION, MAXIMIZATION *)

    If[i > maxiter &&
       !Apply[And,
             Map[Module[{parm = # /. parmRule, oldparm = # /. oldparmRule,
                         delta},
                   delta = Abs[parm-oldparm];
                   delta < accgoal && delta < precgoal Abs[parm+oldparm]/2
                 ]&, parameters] ],
       Message[MixtureEstimation::noconv, maxiter]
    ];

    (* find the final expected value of the probabilities that the
                observations derive from the first distribution *)
    prob1 = Map[
         Which[MemberQ[domain1only, #], 1,
               MemberQ[domain2only, #], 0,
               True, (probability1[#] /. epsilonRule /. parmRule)]&,    
         levels];
    If[Apply[Or, Map[!(FreeQ[#, Complex] && 0 <= # <= 1)&, prob1]],
          Message[MixtureEstimation::badprob];
          Return[$Failed]
    ];

    (* DIAGNOSTICS: DISCRETE data:
       Find the reduction in (-2 * loglikelihood) caused by using the model
           (1-epsilon)*pdf1 + epsilon*pdf2
       over the simple model  pdf1 .
       First need to find the MLE's of the parameters of the simple model. *)
    diagRule = {};
    Module[{plist1 = {}, parameters1 = {}, llD1, parmRule1, simpleLL,
            mixtureLL, teststat, dof, pval},
      Scan[Module[{p=#},
             If[VectorQ[p], {p, p0} = p];
             If[!FreeQ[pdf1[x], p],
                AppendTo[plist1, #];
                AppendTo[parameters1, p]
             ]  ]&, plist];
      If[parameters1 === {},
	 parmRule1 = {},	(* the simple model has no parameters! *)
         llD1[x0_] := Map[Simplify[D[pdf1[x0], #]/pdf1[x0]]&, parameters1];
	 lhs = Map[(counts.#)&, Transpose[Map[llD1, levels]] ];
         parmRule1 = parameterMLE[lhs, plist1, workprec, accgoal, maxiter, 1]
      ];
      If[parmRule1 =!= $Failed,
         (* figure out reduction in -2*loglikelihood *)
         simpleLL = counts.Map[Log[pdf1[#] /. parmRule1]&, levels];
         mixtureLL = counts.Map[Log[((1-epsilon)pdf1[#] + epsilon pdf2[#]) /.
		parmRule /. epsilonRule]&, levels];
         teststat = -2 (simpleLL - mixtureLL); (* positive when mixtureLL is
                                             less negative than simpleLL is *)
         dof = Length[parameters] - Length[parameters1];
         diagRule = {LogLikelihood1 -> N[simpleLL],
                     LogLikelihoodMixed -> mixtureLL};
         If[teststat >= 0,
            pval = 1-CDF[ChiSquareDistribution[dof], teststat];
            AppendTo[diagRule, MixtureModelDiagnostic -> modelreport[pval]],
            Message[MixtureEstimation::notmix]
         ]
      ]
    ]; (* end DISCRETE data DIAGNOSTICS *)

    (* return the MLE's of the mixing parameter, the component
       distribution parameters, and the final expected values of the
       probabilities that each observation is derived from the first
       distribution *)
    Join[epsilonRule, parmRule, diagRule, {FinalProbability1 -> prob1}]

  ] (* end of EM for DISCRETE data *)


(* =========================== parameterMLE ============================= *)

parameterMLE[lhs0_, plist_, workprec_, accgoal_, maxiter_, iteration_] :=
   Module[{lhs = lhs0, eqns,
	   parameters = Map[If[VectorQ[#], First[#], #]&, plist],
	   scan, UseFindRoot, argumentlist, soln1, ratings, position},
     (* On the first iteration, determine which left-hand-sides are simplified
        by a Numerator[Together[]]. *)
     If[iteration <= 1,
        $simplifypos = {};
        lhs = MapIndexed[Module[{num = Numerator[Together[#1]]},
                           If[LeafCount[num] < LeafCount[#],
                              AppendTo[$simplifypos, #2]; num,
                              #1] ]&,
                         lhs] ];
     (* On later iterations, only simplify select left-hand-sides. *)
     If[iteration > 1,
        lhs = MapAt[Numerator[Together[#]]&, lhs, $simplifypos]];

     eqns = Map[(#==0)&, lhs];

     (* Find parameter estimates using either NSolve or FindRoot. *)
     polypos = {};
     Scan[If[PolynomialQ[lhs[[#]], parameters] ||
	     Module[{p = lhs[[#]], z},
	       And @@ Map[(PolynomialQ[p, #] || PolynomialQ[p /. #->1/z, z])&,
			  parameters]
	     ],
	     AppendTo[polypos, #]
	  ]&,  Range[Length[parameters]] ];

     nonpolypos = Complement[Range[Length[parameters]], polypos];

     If[nonpolypos =!= {},
        (* There are nonpolynomial equations. *)
	If[iteration==0,
           (* Failure because the equations are nonpolynomial and
              no initial values for parameters are provided. *)
           Message[MixtureEstimation::polyfail, parameters[[nonpolypos]] ];
           Return[$Failed]
	];
	UseFindRoot = True,
	(* The equations are polynomial. *)
	scan = Scan[If[(exp = Exponent[#[[1]], #[[2]]]) > 5,
                       Return[$Failed]]&, Transpose[{lhs, parameters}]];
	If[scan === $Failed,
	   If[iteration == 0,
              (* Failure because the equations are polynomial of high
		 degree	and no initial values for parameters are provided. *)
              Message[MixtureEstimation::polyexp, #[[2]], exp],
	      UseFindRoot = True
	   ],
	   UseFindRoot = False
	]
     ];


     If[UseFindRoot,
	
        (* ====================== FindRoot ======================= *)
        argumentlist = Join[{eqns}, plist,
	 	{AccuracyGoal -> accgoal, MaxIterations -> maxiter,
		 WorkingPrecision -> workprec}];
        If[!FreeQ[soln = Apply[FindRoot, argumentlist], FindRoot]
	         || soln === {},
	      Message[MixtureEstimation::parmfail];
	      Return[$Failed]
        ];
	parmRule = soln,

        (* ======================== NSolve ======================== *)
        If[!FreeQ[soln = NSolve[eqns, parameters,
		  WorkingPrecision -> workprec], NSolve]
	   || soln === {},
           Message[MixtureEstimation::nsolvefail];
           Return[$Failed]
        ];
	If[Length[soln] != 1,  (* NSolve returns more than one solution. *)
           (* Use a hack to pick the best estimate.  Really should use
              ValidParameterQ[dist] to determine which of the several
              solutions are valid parameter values for the particular
              distribution in question. *)
           soln1 = parameters /. soln;
           ratings = Map[Count[#, True]&,
                        Map[With[{parm = #},
                                 Map[(FreeQ[#, Complex] && # > 0)&, parm]
                            ]&, soln1]];
           position = Flatten[Position[ratings, Max[ratings]]][[1]];
           parmRule = soln[[position]],
           parmRule = soln[[1]]
        ]
     ];

     parmRule
   ]


End[]


EndPackage[]


(* :Examples:

Swedish Pension Data;
counts indicating the number of widows receiving pensions
having 0, 1, ..., 6 children;
data = {{0, 3062}, {1, 587}, {2, 284}, {3, 103}, {4, 33}, {5, 4}, {6, 2}};

PDF[kroneckerDistribution[k_?IntegerQ], n_] := If[n==k, 1, 0];

(1) MixtureEstimation[data, epsilon,
	{kroneckerDistribution[0], PoissonDistribution[lambda]}, {lambda}]

    MixtureEstimation::noconv: 
    Warning: MaxIterations of 30 exceeded before parameter(s) satisfied
     AccuracyGoal and PrecisionGoal.

    {epsilon -> 0.384908, lambda -> 1.03794, ...}

(2) MixtureEstimation[data, {epsilon, .25}, 
	{kroneckerDistribution[0], PoissonDistribution[lambda]},
	{{lambda, 1.035478}}]

    MixtureEstimation::noconv

    {epsilon -> 0.384918, lambda -> 1.03791, ...}

(3) PDF[myUniformDistribution[k_?IntegerQ], n_] :=
	PDF[DiscreteUniformDistribution[k], n+1];

     MixtureEstimation[data, epsilon,
     {myUniformDistribution[7], PoissonDistribution[lambda]}, {lambda}]

     MixtureEstimation::noconv

     {epsilon -> 0.939505, lambda -> 0.27925, ...}

(4) MixtureEstimation[data, epsilon,
        {GeometricDistribution[p], PoissonDistribution[lambda]},
		 {{p, .5}, lambda}]

    MixtureEstimation::noconv

    {epsilon -> 0.213431, p -> 0.887494, lambda -> 1.40466, ...}

*)

(* :Examples:

Darwin's data on the differences in heights of 15 pairs of self-fertilized
	and cross-fertilized plants grown on the same plot.  

data = {-67, -48, 6, 8, 14, 16, 23, 24, 28, 29, 41, 49, 56, 60, 75};

(1) equal variances

    MixtureEstimation[data, epsilon,
	{NormalDistribution[mu1, sigma], NormalDistribution[mu2, sigma]},
	{mu1, mu2, sigma}]

   {epsilon -> 0.8664891748852062, sigma -> 19.61844867661945, 
    mu1 -> -57.37146512241579, mu2 -> 32.99873305821706, ...   }

   The solution given above agrees with Table 2 in the
   Aitkin & Tunnicliffe Wilson reference.

(2) equal means, unequal variances

    (i) MixtureEstimation[data, epsilon,
	{NormalDistribution[mu, sigma], NormalDistribution[mu, Sqrt[k] sigma]},
	{mu, sigma, k}]

	MixtureEstimation::noconv

	{epsilon -> 0.604576, sigma -> 54.9742, k -> 0.0974945, 
	 mu -> 27.5293,  ... }

    (ii) mu is estimated by Mean[data] = 20.9333
         sigma is estimated by StandardDeviationMLE[{-67, -48}] = 13.435
	 Sqrt[k] sigma is estimated by StandardDeviationMLE[
		{6, 8, 14, 16, 23, 24, 28, 29, 41, 49, 56, 60, 75}] = 20.6956
	 k is estimated by (20.6956/13.435)^2 = 2.37291

	 MixtureEstimation[data, epsilon,
                {NormalDistribution[mu, sigma],
         	 NormalDistribution[mu, Sqrt[k] sigma]},
       		{{mu, 20.9333}, {sigma, 13.435}, {k, 2.37291}}] 

	 MixtureEstimation::noconv: 

         {epsilon -> 0.657528, sigma -> 8.34639, k -> 28.5271, mu -> 19.9968,
		 ... }

    (iii) using values from Aitkin & Tunnicliffe Wilson reference:

          MixtureEstimation[data, epsilon,
		{NormalDistribution[mu, sigma],
	 	 NormalDistribution[mu, Sqrt[k] sigma]},
		{{mu, 26.37}, {sigma, 16.7869}, {k, 8.98}}]

	  MixtureEstimation::noconv: 

	  {epsilon -> 0.65747, sigma -> 8.34736, k -> 28.5228,
			mu -> 19.9971, ... }

(3) unequal means and variances

    MixtureEstimation[data, epsilon,
	{NormalDistribution[mu1, sigma1], NormalDistribution[mu2, sigma2]},
	{mu1, mu2, sigma1, sigma2}]

    {epsilon -> 0.866828, sigma1 -> 9.49999,
		sigma2 -> 20.7234, mu1 -> -57.5111, mu2 -> 32.9849, ...}

(4) testing warning and error messages:

    MixtureEstimation[data, epsilon,
          {NormalDistribution[mu, sigma], CauchyDistribution[a, b]},
	  {mu, a, b}]	
 
    MixtureEstimation::parm: 
    One or both of the distributions are specified by parameters not 
     		included in  the parameter list {a, b, mu}.

(5) testing warning and error messages:

    MixtureEstimation[data, epsilon,
           {NormalDistribution[mu, sigma], CauchyDistribution[a, b]},
	   {mu, sigma, a, b}]	

    MixtureEstimation::polyfail: 
    The maximum likelihood method of estimating the values of the 
    parameters yielded non-polynomial equations which cannot be solved using
    NSolve.

(6) testing warning and error messages:

    MixtureEstimation[data, epsilon,
	{NormalDistribution[0, 1], NormalDistribution[2, 3]},
	{mu}]

    MixtureEstimation::noparm: 
    There are no unknown parameters in the two distributions.     

(6) MixtureEstimation[data, epsilon,
           {NormalDistribution[mu, sigma], CauchyDistribution[a, b]},
           {mu, sigma, {a, .2}, {b, .8}}]

    MixtureEstimation::notmix: 
    Warning: The maximized log-likelihood is smaller for the mixture model than
    for the model consisting of simply the first distribution.  This is
    strong evidence against the mixture model.

    {epsilon -> 0.868422, mu -> -57.5631, sigma -> 9.49979,
	a -> -456.755, b -> 25.6288, ... }

(7) sigma0 = StandardDeviationMLE[data];

    (i)  The standard deviation is fixed at sigma0.

	 MixtureEstimation[data, epsilon,
	   {NormalDistribution[mu1, sigma0], NormalDistribution[mu2, sigma0]},
	   {mu1, mu2}]

        MixtureEstimation::noconv

        {epsilon -> 0.909444, mu1 -> -50.32, mu2 -> 28.0282, ... }

    (ii)  The standard deviation is initially sigma0.

	  MixtureEstimation[data, epsilon,
            {NormalDistribution[mu1, sigma], NormalDistribution[mu2, sigma]},
            {mu1, mu2, {sigma, sigma0}}]

	  {epsilon -> 0.866489, sigma -> 19.6184,
		mu1 -> -57.3715, mu2 -> 32.9987, ...}
*)