(*:Mathematica Version: 2.2 *) (*:Package Version: 1.0 *) (*:Name: Statistics`MixtureEstimation` *) (*:Context: Statistics`MixtureEstimation` *) (*:Title: Mixture Estimation via Expectation Maximization *) (*:Author: E. C. Martin *) (*:History: E. C. Martin (Wolfram Research), March 1995. *) (*:Copyright: Copyright 1995, Wolfram Research, Inc. *) (*:Summary: This package uses the EM (expectation maximization) algorithm to simultaneously estimate the missing data and unknown parameter(s) associated with a data set. Here the missing data are assumed to be the identities of the observations originating from each of the two distributions contributing to the mixture. The parameters describe the component distributions of the mixture; the distributions may be continuous or discrete. *) (*:Keywords: EM, expectation maximization, parameter estimation, mixture distribution, maximum likelihood estimation *) (*:Requirements: No special system requirements. *) (*:Warning: There are problems with comparing the maximized likelihoods for the simple and mixture models, as a means of testing for the existence of a mixture. (1) For small samples, the asymptotic chi-squared distribution for the test statistic may not hold satisfactorily. (2) The asymptotic chi-squared distribution is really only valid if epsilon is known, which of course it is not. See the M. Aitkin & G. Tunnicliffe Wilson reference for a discussion of this. *) (*:Limitations: This package is only an indication of how the estimation-maximization algorithm may be used to find the mixture of any two component distributions, discrete or continuous, underlying the observed data. For certain component distributions, the algorithm can be coded more efficiently. In particular, the parameter estimates at each stage are often found using weighted least squares rather than the numerical solve (NSolve) implemented here. This package allows completely general mixtures, so some sacrifice is made in efficiency. Currently this package does not provide information on parameter estimates such as the standard errors. The EM algorithm is well known to have only linear convergence. *) (*:Discussion: NOTE that the EM algorithm involves approximating ((1-epsilon) D[pdf1, parm] + epsilon D[pdf2, parm])/ ((1-epsilon) pdf1 + epsilon pdf2) with ((1-epsilon) D[pdf1, parm]/pdf1 + epsilon D[pdf2, parm]/pdf2) when solving for the maximum likelihood estimates of the parameters. *) (*:Sources: Elements of Statistical Computing, Thisted, Chapman & Hall, 1988. "Mixture Models, Outliers, and the EM Algorithm", M. Aitkin & G. Tunnicliffe Wilson, Technometrics, Vol. 22, No. 3, Aug. 1980. "Maximum Likelihood from Incomplete Data via the EM Algorithm", A. P. Dempster, N. M. Laird, D. B. Rubin, J. Roy. Statist. Soc., B, 39, pp. 1-38, 1977. "Mixture Densities, Maximum Likelihood and the EM Algorithm", R. A. Redner and H. F. Walker, SIAM Review, Vol. 26, No. 2, April 1984. *) BeginPackage["Statistics`MixtureEstimation`", "Statistics`ContinuousDistributions`", "Statistics`NormalDistribution`", "Statistics`DiscreteDistributions`", "Statistics`DescriptiveStatistics`", "Statistics`Common`DistributionsCommon`"] MixtureEstimation::usage = "MixtureEstimation[data, epsilon, {dist1, dist2}, {parm1, parm2, ...}] gives a list of replacement rules for the distribution parameters parm1, parm2, ... and the mixing parameter epsilon describing the mixture distribution ((1-epsilon) dist1 + epsilon dist2) assumed to underly the data. The parameter epsilon and the parameter(s) of the two component distributions are estimated via the EM (expectation maximization) algorithm. The unknown vector indicating which observations are due to which distribution represents missing data. The initial probability of each observation deriving from dist1 can be specified using the option InitialProbability1. In the case of continuous data, data is a vector {d1, d2, ...} giving the explicit observations. In the case of discrete data, data is a list of pairs {{l1, m1}, {l2, m2}, ...}, where mi is number of datums observed to take on discrete value li. Initial values p1, p2, ... for the parameters parm1, parm2, ... may be specified by replacing the list {parm1, parm2, ...} with the list of pairs {{parm1, p1}, {parm2, p2}, ...}. An initial value epsilon0 for the mixing parameter epsilon may be specified by replacing epsilon with the list {epsilon, epsilon0}." InitialProbability1::usage = "InitialProbability1 is an option of MixtureEstimation indicating the initial probability of each of the observations in data deriving from the first distribution. It may be set to a vector having the same length as the data, or the default Automatic. In the case of continuous data, Automatic means that the smallest two observations will be considered to have come from the first distribution and the remaining observations from the second. In the case of discrete data, Automatic means that all the observations of the first discrete value in data will be assumed to come from the first distribution, while the remaining observations will be assumed to come from the second. InitialProbability1 -> {p1, p2, ...}, pi > 0, p1 + p2 + ... = 1, specifies a vector of probabilities. For continuous data, pi is the probability that the ith datum derives from the first distribution in the mixture. For discrete data, pi is the probability that the observations of the ith discrete value derive from the first distribution." FinalProbability1::usage = "FinalProbability1 is used in the output of MixtureEstimation to indicate the final estimates of the probability of the data deriving from the first distribution in the mixture, as determined by the expectation maximization algorithm." MixtureModelDiagnostic::usage = "MixtureModelDiagnostic is used in the output of MixtureEstimation to indicate whether the null hypothesis (data derives from the first distribution only) should be rejected in favor of the alternative hypothesis (data derives from a mixture of the first and second distributions)." LogLikelihood1::usage = "LogLikelihood1 is used in the output of MixtureEstimation to indicate the log-likelihood achieved using a model based on the first distribution only. If there are any parameters in this model, the log-likelihood is maximized with respect to the parameters." LogLikelihoodMixed::usage = "LogLikelihoodMixed is used in the output of MixtureEstimation to indicate the log-likelihood achieved using a model based on the mixture distribution. The log-likelihood is maximized with respect to the parameters." ShowProgress::usage = "ShowProgress is an option of MixtureEstimation. If True, a summary of intermediate results for each iteration of the search for the best parameters will be printed." Begin["`Private`"] Options[MixtureEstimation] := {AccuracyGoal -> Automatic, InitialProbability1 -> Automatic, MaxIterations -> 30, PrecisionGoal -> Automatic, ShowProgress -> False, WorkingPrecision -> 16} (* ========================= Utility Functions =========================== *) validArgumentsQ[data_, dist1_, dist2_, plist_] := Module[{contexts, plist1, plist2, x}, contexts = Sort[{Context[Evaluate[Head[dist1]]], Context[Evaluate[Head[dist2]]]}]; If[VectorQ[data], (* data is continuous-valued *) If[MemberQ[contexts, "Statistics`DiscreteDistributions`"], Message[MixtureEstimation::contdata]; Return[False] ]; If[Length[Union[data]] < 2, Message[MixtureEstimation::data]; Return[False] ] ]; If[MatchQ[Dimensions[data], {_Integer, 2}], (* data is discrete-valued *) If[MemberQ[contexts, "Statistics`NormalDistribution`"] || MemberQ[contexts, "Statistics`ContinuousDistributions`"], Message[MixtureEstimation::discdata]; Return[False] ]; If[Length[Union[Map[First, data]]] < 2, Message[MixtureEstimation::data]; Return[False] ] ]; If[!FreeQ[PDF[dist1, x], PDF], Message[MixtureEstimation::pdf, dist1]; Return[False]]; If[!FreeQ[PDF[dist2, x], PDF], Message[MixtureEstimation::pdf, dist2]; Return[False]]; plist1 = Sort[Map[If[VectorQ[#], First[#], #]&, plist]]; plist2 = Sort[Select[ Union[Apply[List, dist1], Apply[List, dist2]], (!NumberQ[N[#]])&]]; (* check whether arguments of the distributions contain at least one of the specified parameters *) If[plist2==={}, Message[MixtureEstimation::noparm]; Return[False]]; If[Apply[Or, Map[ListFreeQ[#, plist1]&, plist2]], Message[MixtureEstimation::parm, plist1, plist2]; Return[False]]; True ] ListFreeQ[expr_, list_] := Apply[And, Map[FreeQ[expr, #]&, list]] printprogress[iteration_, epsilon_, params_] := ( Print[StringForm[ "Iteration:`1` Epsilon:`2` Parameters:`3`", iteration, NumberForm[N[epsilon], 6], NumberForm[N[params], 6] ]] ) (* ============================= Messages ================================= *) MixtureEstimation::contdata = "Data is in the form of continuous-valued data so it may not be modeled as coming from a mixture involving a discrete distribution." MixtureEstimation::discdata = "Data is in the form of discrete-valued data so it may not be modeled as coming from a mixture involving a continuous distribution." MixtureEstimation::data = "Data consists of fewer than two distinct values." MixtureEstimation::pdf = "The probability density function PDF[``, x] is not defined." MixtureEstimation::parm = "One or both of the distributions are specified by parameters not included in the parameter list ``." MixtureEstimation::noparm = "There are no unknown parameters in the two component distributions." MixtureEstimation::initial = "Warning: The initial probabilities that the observations derive from the first distribution, as specified by InitialProbability1, are invalid. Using default Automatic." MixtureEstimation::prob = "Warning: When the length of continuous data is only 2, the Automatic procedure of determining the initial probabilities that the observations derive from first distribution is special. It is assumed that initially the smallest datum derives from the first distribution with probability 1, and the largest datum derives from the first distribution with probability 0." MixtureEstimation::maxiter = "Warning: MaxIterations must be a positive integer. Using MaxIterations -> ``." MixtureEstimation::workprec = "Warning: WorkingPrecision must be an integer greater than or equal to 16. Using WorkingPrecision -> ``." MixtureEstimation::accgoal = "Warning: AccuracyGoal must be a nonnegative integer. Using AccuracyGoal -> ``." MixtureEstimation::precgoal = "Warning: PrecisionGoal must be a nonnegative integer. Using PrecisionGoal -> ``." MixtureEstimation::nsolvefail = "Unable to find the values of the parameters using NSolve on the maximum likelihood equations." MixtureEstimation::polyfail = "The maximum likelihood method of estimating the initial values of the parameters yielded non-polynomial equation(s). Specify initial value(s) for the parameter(s) in list ``." MixtureEstimation::parmfail = "Unable to find the values of the parameters using FindRoot on the maximum likelihood equations." MixtureEstimation::badprob = "Expectation step yielded one or more probabilities that were either complex or did not lie in [0, 1]." MixtureEstimation::noconv = "Warning: MaxIterations of `` exceeded before parameter(s) satisfied AccuracyGoal and PrecisionGoal." MixtureEstimation::polyexp = "MLE equations include a polynomial equation in `1` of order `2`. An initial value should be suggested for `1`." MixtureEstimation::notmix = "Warning: The maximized log-likelihood is smaller for the mixture model than for the model consisting of simply the first distribution. This is strong evidence against the mixture model." modelreport[p_] := If[p < 0.05, StringForm["P-value is ``, so reject null hypothesis of no mixture at the 5 percent significance level: there is strong evidence to support the mixture model.", p], StringForm["P-value is ``, so fail to reject null hypothesis of no mixture at the 5 percent significance level: there is little evidence to support the mixture model.", p] ] (* =========================== MixtureEstimation =========================== *) MixtureEstimation[data_List, eps:(_Symbol | {_Symbol, _?((NumberQ[N[#]])&)}), {dist1_, dist2_}, plist:{(_Symbol | {_Symbol, _?((NumberQ[N[#]])&)})..}, opts___] := Module[{initial, maxiter, accgoal, precgoal, workprec, progress, n, sorted, min1, min2, sum, result}, ( result ) /; ( {initial, maxiter, accgoal, precgoal, workprec, progress} = {InitialProbability1, MaxIterations, AccuracyGoal, PrecisionGoal, WorkingPrecision, ShowProgress} /. {opts} /. Options[MixtureEstimation]; n = Length[data]; If[!( (VectorQ[initial] && Length[initial] == n && Apply[And, Map[(0 <= N[#] <= 1)&, initial]]) || initial===Automatic ), Message[MixtureEstimation::initial]; initial = Automatic]; If[initial === Automatic, If[VectorQ[data], (* continuous data *) sorted = Sort[data]; {min1, min2} = sorted[[{1, 2}]]; If[Length[data] == 2, Message[MixtureEstimation::prob]; initial = Map[If[#-min1===0, 1, 0]&, data], initial = Map[If[#-min1===0 || #-min2===0, 1, 0]&, data] ], (* discrete data *) initial = Join[{1}, Table[0, {n-1}]] ] ]; If[initial === $Failed, result = $Failed, If[!(IntegerQ[maxiter] && iter > 0), maxiter = 30; Message[MixtureEstimation::maxiter, maxiter]]; If[!(IntegerQ[workprec] && workprec >= 16), workprec = 16; Message[MixtureEstimation::workprec, workprec]]; If[accgoal === Automatic, accgoal = workprec-10, If[!(IntegerQ[accgoal] && accgoal >= 0), accgoal = workprec-10; Message[MixtureEstimation::accgoal, accgoal] ]]; If[precgoal === Automatic, precgoal = workprec-10, If[!(IntegerQ[precgoal] && precgoal >= 0), precgoal = workprec-10; Message[MixtureEstimation::precgoal, precgoal] ]]; result = EM[data, eps, {dist1, dist2}, plist, initial, maxiter, workprec, accgoal, precgoal, progress] ]; result =!= $Failed ) ] /; Length[data] > 1 && Apply[And, Map[NumberQ, Flatten[N[data]]]] && FreeQ[N[data], Complex] && validArgumentsQ[data, dist1, dist2, plist] (* ================================== EM ================================== *) EM[data_?VectorQ, eps_, {dist1_, dist2_}, plist_, initialprob1_, maxiter_, workprec_, ag_, pg_, prog_] := Module[{n = Length[data], epsilon, epsilonRule, parameters, x, pdf1, pdf2, derivatives, llD, pos, posC, parmRule, llD0, parameters0, parmRule0, lhs, prob1, oldparmRule, probability1, i, delta, accgoal = N[10^-ag], precgoal = N[10^pg], diagRule, progress = TrueQ[prog]}, If[VectorQ[eps], epsilon = eps[[1]]; epsilonRule = {epsilon -> N[eps[[2]], workprec]}, (* find initial value for epsilon *) epsilon = eps; epsilonRule = {epsilon -> N[Apply[Plus, 1-initialprob1]/n, workprec]} ]; parameters = Map[If[VectorQ[#], First[#], #]&, plist]; pdf1[x0_] := PDF[dist1, x0]; pdf2[x0_] := PDF[dist2, x0]; derivatives = Map[((1-epsilon) Simplify[D[pdf1[x], #]/pdf1[x]] + epsilon Simplify[D[pdf2[x], #]/pdf2[x]])&, parameters]; llD[x0_, epsilon0_] := Evaluate[derivatives /. {x->x0, epsilon->epsilon0}]; probability1[x0_] := (1-epsilon) pdf1[x0]/((1-epsilon) pdf1[x0] + epsilon pdf2[x0]); (* Find parmRule. Need to deal with a plist that may be neither a vector nor a matrix. *) pos = Flatten[Position[plist, _?VectorQ]]; parmRule = Map[Apply[Rule, #]&, plist[[pos]]]; If[!MatrixQ[plist], (* one or more of the initial parameter values are unspecified *) posC = Complement[Range[Length[plist]], pos]; parameters0 = parameters[[posC]]; llD0[x0_, epsilon0_] := Evaluate[derivatives[[posC]] /. {x->x0, epsilon->epsilon0}]; lhs = Apply[Plus, Map[Apply[llD0, #]&, Transpose[{data, 1-initialprob1}] ]]; parmRule0 = parameterMLE[lhs, parameters0, workprec, accgoal, maxiter, 0]; If[parmRule0 === $Failed, Return[$Failed]]; parmRule = Join[parmRule, parmRule0] ]; (* finished finding initial values for parameters *) (* order parmRule *) parmRule = Map[{Position[parameters, First[#]][[1, 1]], #}&, parmRule]; parmRule = Map[Last, Sort[parmRule, #1[[1]] < #2[[1]] &]]; (* Iterate EXPECTATION and MAXIMIZATION, CONTINUOUS data *) i = 1; While[i == 1 || (i <= maxiter && !Apply[And, Map[Module[{parm = # /. parmRule, oldparm = # /. oldparmRule, delta}, delta = Abs[parm-oldparm]; delta < accgoal && delta < precgoal Abs[parm+oldparm]/2 ]&, parameters] ]), If[progress, printprogress[i, epsilonRule[[1, 2]], Map[Last, parmRule]]]; (* EXPECTATION *) (* find the expected value of the probability of the observation deriving from the first distribution, conditioned on the value of the observation *) prob1 = Map[(probability1[#] /. epsilonRule /. parmRule)&, data]; If[Apply[Or, Map[!(FreeQ[#, Complex] && 0 <= # <= 1)&, prob1]], Message[MixtureEstimation::badprob]; Return[$Failed] ]; (* MAXIMIZATION *) (* find the maximum likelihood estimates of the mixing parameter and the distribution parameters *) epsilonRule = {epsilon -> Apply[Plus, 1-prob1]/n}; oldparmRule = parmRule; lhs = Apply[Plus, Map[Apply[llD, #]&, Transpose[{data, 1-prob1}] ]]; parmRule = parameterMLE[lhs, Transpose[{parameters, (parameters /. oldparmRule)}], workprec, accgoal, maxiter, i]; If[parmRule === $Failed, Return[$Failed]]; (* order parmRule *) parmRule = Map[{Position[parameters, First[#]][[1, 1]], #}&, parmRule]; parmRule = Map[Last, Sort[parmRule, #1[[1]] < #2[[1]] &]]; i++ ]; (* end cycle of EXPECTATION, MAXIMIZATION *) If[i > maxiter && !Apply[And, Map[Module[{parm = # /. parmRule, oldparm = # /. oldparmRule, delta}, delta = Abs[parm-oldparm]; delta < accgoal && delta < precgoal Abs[parm+oldparm]/2 ]&, parameters] ], Message[MixtureEstimation::noconv, maxiter] ]; (* find the final expected value of the probabilities that the observations derive from the first distribution *) prob1 = Map[(probability1[#] /. epsilonRule /. parmRule)&, data]; If[Apply[Or, Map[!(FreeQ[#, Complex] && 0 <= # <= 1)&, prob1]], Message[MixtureEstimation::badprob]; Return[$Failed] ]; (* DIAGNOSTICS: CONTINUOUS data: Find the reduction in (-2 * loglikelihood) caused by using the model (1-epsilon)*pdf1 + epsilon*pdf2 over the simple model pdf1 . First need to find the MLE's of the parameters of the simple model. *) diagRule = {}; Module[{plist1 = {}, parameters1 = {}, derivatives, llD1, parmRule1, simpleLL, mixtureLL, teststat, dof, pval}, Scan[Module[{p=#}, If[VectorQ[p], {p, p0} = p]; If[!FreeQ[pdf1[x], p], AppendTo[plist1, #]; AppendTo[parameters1, p] ] ]&, plist]; If[parameters1 === {}, parmRule1 = {}, (* the simple model has no parameters! *) derivatives = Map[Simplify[D[pdf1[x], #]/pdf1[x]]&, parameters1]; llD1[x0_] := Evaluate[derivatives /. x->x0]; lhs = Apply[Plus, Map[llD1, data] ]; parmRule1 = parameterMLE[lhs, plist1, workprec, accgoal, maxiter, 1] ]; If[parmRule1 =!= $Failed, (* figure out reduction in -2*loglikelihood *) simpleLL = Apply[Plus, Map[Log[pdf1[#] /. parmRule1]&, data]]; mixtureLL = Apply[Plus, Map[ Log[((1-epsilon)pdf1[#] + epsilon pdf2[#]) /. parmRule /. epsilonRule]&, data]]; teststat = -2 (simpleLL - mixtureLL); (* positive when mixtureLL is less negative than simpleLL is *) dof = Length[parameters] - Length[parameters1]; diagRule = {LogLikelihood1 -> N[simpleLL], LogLikelihoodMixed -> N[mixtureLL]}; If[teststat >= 0, pval = 1-CDF[ChiSquareDistribution[dof], teststat]; AppendTo[diagRule, MixtureModelDiagnostic -> modelreport[pval]], Message[MixtureEstimation::notmix] ] ] ]; (* end CONTINUOUS data DIAGNOSTICS *) (* return the MLE's of the mixing parameter, the component distribution parameters, and the final expected values of the probabilities that each observation is derived from the first distribution *) Join[epsilonRule, parmRule, diagRule, {FinalProbability1 -> prob1}] ] (* end of EM for CONTINUOUS data *) EM[data_?MatrixQ, eps_, {dist1_, dist2_}, plist_, initialprob1_, maxiter_, workprec_, ag_, pg_, prog_] := Module[{levels, counts, n, domain1only, domain2only, epsilon, epsilonRule, parameters, x, pdf1, pdf2, derivatives, derivatives1, derivatives0, llD, pos, posC, parmRule, llD0, parameters0, parmRule0, lhs, prob1, oldparmRule, probability1, i, delta, accgoal = N[10^-ag], precgoal = N[10^pg], diagRule, progress = TrueQ[prog]}, {levels, counts} = Transpose[data]; n = Apply[Plus, counts]; domain1only = Select[levels, (pdf1[#] =!= 0 && pdf2[#] === 0)&]; domain2only = Select[levels, (pdf2[#] =!= 0 && pdf1[#] === 0)&]; If[VectorQ[eps], epsilon = eps[[1]]; epsilonRule = {epsilon -> N[eps[[2]], workprec]}, (* find initial value for epsilon *) epsilon = eps; epsilonRule = {epsilon -> N[counts.(1-initialprob1)/n, workprec]} ]; parameters = Map[If[VectorQ[#], First[#], #]&, plist]; pdf1[x0_] := PDF[dist1, x0]; pdf2[x0_] := PDF[dist2, x0]; (* Simplification is not done ahead of time as with continuous distributions, because the discrete distributions tend to be piecewise. Need to know x0 before evaluating D[pdf1[x0], #]/pdf1[x0]. *) llD[x0_, epsilon0_] := Map[ Which[epsilon0==0, Simplify[D[pdf1[x0], #]/pdf1[x0]], epsilon0==1, Simplify[D[pdf2[x0], #]/pdf2[x0]], True, (1-epsilon0) Simplify[D[pdf1[x0], #]/pdf1[x0]] + epsilon0 Simplify[D[pdf2[x0], #]/pdf2[x0]] ]&, parameters]; probability1[x0_] := (1-epsilon) pdf1[x0]/((1-epsilon) pdf1[x0] + epsilon pdf2[x0]); (* Find parmRule. Need to deal with a plist that may be neither a vector nor a matrix. *) pos = Flatten[Position[plist, _?VectorQ]]; parmRule = Map[Apply[Rule, #]&, plist[[pos]]]; If[!MatrixQ[plist], (* one or more of the initial parameter values are unspecified *) posC = Complement[Range[Length[plist]], pos]; parameters0 = parameters[[posC]]; llD0[x0_, epsilon0_] := Map[ Which[epsilon0==0, Simplify[D[pdf1[x0], #]/pdf1[x0]], epsilon0==1, Simplify[D[pdf2[x0], #]/pdf2[x0]], True, (1-epsilon0) Simplify[D[pdf1[x0], #]/pdf1[x0]] + epsilon0 Simplify[D[pdf2[x0], #]/pdf2[x0]] ]&, parameters0]; lhs = Map[(counts.#)&, Transpose[Map[Apply[llD0, #]&, Transpose[{levels, 1-initialprob1}] ]]]; parmRule0 = parameterMLE[lhs, parameters0, workprec, accgoal, maxiter, 0]; If[parmRule0 === $Failed, Return[$Failed]]; parmRule = Join[parmRule, parmRule0] ]; (* finished finding initial values for parameters *) (* order parmRule *) parmRule = Map[{Position[parameters, First[#]][[1, 1]], #}&, parmRule]; parmRule = Map[Last, Sort[parmRule, #1[[1]] < #2[[1]] &]]; (* Iterate EXPECTATION and MAXIMIZATION, DISCRETE data *) i = 1; While[i == 1 || (i <= maxiter && !Apply[And, Map[Module[{parm = # /. parmRule, oldparm = # /. oldparmRule, delta}, delta = Abs[parm-oldparm]; delta < accgoal && delta < precgoal Abs[parm+oldparm]/2 ]&, parameters] ]), If[progress, printprogress[i, epsilonRule[[1, 2]], Map[Last, parmRule]]]; (* EXPECTATION *) (* find the expected value of the probability of the observation deriving from the first distribution, conditioned on the value of the observation *) prob1 = Map[ Which[MemberQ[domain1only, #], 1, MemberQ[domain2only, #], 0, True, (probability1[#] /. epsilonRule /. parmRule)]&, levels]; If[Apply[Or, Map[!(FreeQ[#, Complex] && 0 <= # <= 1)&, prob1]], Message[MixtureEstimation::badprob]; Return[$Failed] ]; (* MAXIMIZATION *) (* find the maximum likelihood estimates of the mixing parameter and the distribution parameters *) epsilonRule = {epsilon -> counts.(1-prob1)/n}; oldparmRule = parmRule; lhs = Map[(counts.#)&, Transpose[Map[Apply[llD, #]&, Transpose[{levels, 1-prob1}] ]]]; parmRule = parameterMLE[lhs, Transpose[{parameters, (parameters /. oldparmRule)}], workprec, accgoal, maxiter, i]; If[parmRule === $Failed, Return[$Failed]]; (* order parmRule *) parmRule = Map[{Position[parameters, First[#]][[1, 1]], #}&, parmRule]; parmRule = Map[Last, Sort[parmRule, #1[[1]] < #2[[1]] &]]; i++ ]; (* end cycle of EXPECTATION, MAXIMIZATION *) If[i > maxiter && !Apply[And, Map[Module[{parm = # /. parmRule, oldparm = # /. oldparmRule, delta}, delta = Abs[parm-oldparm]; delta < accgoal && delta < precgoal Abs[parm+oldparm]/2 ]&, parameters] ], Message[MixtureEstimation::noconv, maxiter] ]; (* find the final expected value of the probabilities that the observations derive from the first distribution *) prob1 = Map[ Which[MemberQ[domain1only, #], 1, MemberQ[domain2only, #], 0, True, (probability1[#] /. epsilonRule /. parmRule)]&, levels]; If[Apply[Or, Map[!(FreeQ[#, Complex] && 0 <= # <= 1)&, prob1]], Message[MixtureEstimation::badprob]; Return[$Failed] ]; (* DIAGNOSTICS: DISCRETE data: Find the reduction in (-2 * loglikelihood) caused by using the model (1-epsilon)*pdf1 + epsilon*pdf2 over the simple model pdf1 . First need to find the MLE's of the parameters of the simple model. *) diagRule = {}; Module[{plist1 = {}, parameters1 = {}, llD1, parmRule1, simpleLL, mixtureLL, teststat, dof, pval}, Scan[Module[{p=#}, If[VectorQ[p], {p, p0} = p]; If[!FreeQ[pdf1[x], p], AppendTo[plist1, #]; AppendTo[parameters1, p] ] ]&, plist]; If[parameters1 === {}, parmRule1 = {}, (* the simple model has no parameters! *) llD1[x0_] := Map[Simplify[D[pdf1[x0], #]/pdf1[x0]]&, parameters1]; lhs = Map[(counts.#)&, Transpose[Map[llD1, levels]] ]; parmRule1 = parameterMLE[lhs, plist1, workprec, accgoal, maxiter, 1] ]; If[parmRule1 =!= $Failed, (* figure out reduction in -2*loglikelihood *) simpleLL = counts.Map[Log[pdf1[#] /. parmRule1]&, levels]; mixtureLL = counts.Map[Log[((1-epsilon)pdf1[#] + epsilon pdf2[#]) /. parmRule /. epsilonRule]&, levels]; teststat = -2 (simpleLL - mixtureLL); (* positive when mixtureLL is less negative than simpleLL is *) dof = Length[parameters] - Length[parameters1]; diagRule = {LogLikelihood1 -> N[simpleLL], LogLikelihoodMixed -> mixtureLL}; If[teststat >= 0, pval = 1-CDF[ChiSquareDistribution[dof], teststat]; AppendTo[diagRule, MixtureModelDiagnostic -> modelreport[pval]], Message[MixtureEstimation::notmix] ] ] ]; (* end DISCRETE data DIAGNOSTICS *) (* return the MLE's of the mixing parameter, the component distribution parameters, and the final expected values of the probabilities that each observation is derived from the first distribution *) Join[epsilonRule, parmRule, diagRule, {FinalProbability1 -> prob1}] ] (* end of EM for DISCRETE data *) (* =========================== parameterMLE ============================= *) parameterMLE[lhs0_, plist_, workprec_, accgoal_, maxiter_, iteration_] := Module[{lhs = lhs0, eqns, parameters = Map[If[VectorQ[#], First[#], #]&, plist], scan, UseFindRoot, argumentlist, soln1, ratings, position}, (* On the first iteration, determine which left-hand-sides are simplified by a Numerator[Together[]]. *) If[iteration <= 1, $simplifypos = {}; lhs = MapIndexed[Module[{num = Numerator[Together[#1]]}, If[LeafCount[num] < LeafCount[#], AppendTo[$simplifypos, #2]; num, #1] ]&, lhs] ]; (* On later iterations, only simplify select left-hand-sides. *) If[iteration > 1, lhs = MapAt[Numerator[Together[#]]&, lhs, $simplifypos]]; eqns = Map[(#==0)&, lhs]; (* Find parameter estimates using either NSolve or FindRoot. *) polypos = {}; Scan[If[PolynomialQ[lhs[[#]], parameters] || Module[{p = lhs[[#]], z}, And @@ Map[(PolynomialQ[p, #] || PolynomialQ[p /. #->1/z, z])&, parameters] ], AppendTo[polypos, #] ]&, Range[Length[parameters]] ]; nonpolypos = Complement[Range[Length[parameters]], polypos]; If[nonpolypos =!= {}, (* There are nonpolynomial equations. *) If[iteration==0, (* Failure because the equations are nonpolynomial and no initial values for parameters are provided. *) Message[MixtureEstimation::polyfail, parameters[[nonpolypos]] ]; Return[$Failed] ]; UseFindRoot = True, (* The equations are polynomial. *) scan = Scan[If[(exp = Exponent[#[[1]], #[[2]]]) > 5, Return[$Failed]]&, Transpose[{lhs, parameters}]]; If[scan === $Failed, If[iteration == 0, (* Failure because the equations are polynomial of high degree and no initial values for parameters are provided. *) Message[MixtureEstimation::polyexp, #[[2]], exp], UseFindRoot = True ], UseFindRoot = False ] ]; If[UseFindRoot, (* ====================== FindRoot ======================= *) argumentlist = Join[{eqns}, plist, {AccuracyGoal -> accgoal, MaxIterations -> maxiter, WorkingPrecision -> workprec}]; If[!FreeQ[soln = Apply[FindRoot, argumentlist], FindRoot] || soln === {}, Message[MixtureEstimation::parmfail]; Return[$Failed] ]; parmRule = soln, (* ======================== NSolve ======================== *) If[!FreeQ[soln = NSolve[eqns, parameters, WorkingPrecision -> workprec], NSolve] || soln === {}, Message[MixtureEstimation::nsolvefail]; Return[$Failed] ]; If[Length[soln] != 1, (* NSolve returns more than one solution. *) (* Use a hack to pick the best estimate. Really should use ValidParameterQ[dist] to determine which of the several solutions are valid parameter values for the particular distribution in question. *) soln1 = parameters /. soln; ratings = Map[Count[#, True]&, Map[With[{parm = #}, Map[(FreeQ[#, Complex] && # > 0)&, parm] ]&, soln1]]; position = Flatten[Position[ratings, Max[ratings]]][[1]]; parmRule = soln[[position]], parmRule = soln[[1]] ] ]; parmRule ] End[] EndPackage[] (* :Examples: Swedish Pension Data; counts indicating the number of widows receiving pensions having 0, 1, ..., 6 children; data = {{0, 3062}, {1, 587}, {2, 284}, {3, 103}, {4, 33}, {5, 4}, {6, 2}}; PDF[kroneckerDistribution[k_?IntegerQ], n_] := If[n==k, 1, 0]; (1) MixtureEstimation[data, epsilon, {kroneckerDistribution[0], PoissonDistribution[lambda]}, {lambda}] MixtureEstimation::noconv: Warning: MaxIterations of 30 exceeded before parameter(s) satisfied AccuracyGoal and PrecisionGoal. {epsilon -> 0.384908, lambda -> 1.03794, ...} (2) MixtureEstimation[data, {epsilon, .25}, {kroneckerDistribution[0], PoissonDistribution[lambda]}, {{lambda, 1.035478}}] MixtureEstimation::noconv {epsilon -> 0.384918, lambda -> 1.03791, ...} (3) PDF[myUniformDistribution[k_?IntegerQ], n_] := PDF[DiscreteUniformDistribution[k], n+1]; MixtureEstimation[data, epsilon, {myUniformDistribution[7], PoissonDistribution[lambda]}, {lambda}] MixtureEstimation::noconv {epsilon -> 0.939505, lambda -> 0.27925, ...} (4) MixtureEstimation[data, epsilon, {GeometricDistribution[p], PoissonDistribution[lambda]}, {{p, .5}, lambda}] MixtureEstimation::noconv {epsilon -> 0.213431, p -> 0.887494, lambda -> 1.40466, ...} *) (* :Examples: Darwin's data on the differences in heights of 15 pairs of self-fertilized and cross-fertilized plants grown on the same plot. data = {-67, -48, 6, 8, 14, 16, 23, 24, 28, 29, 41, 49, 56, 60, 75}; (1) equal variances MixtureEstimation[data, epsilon, {NormalDistribution[mu1, sigma], NormalDistribution[mu2, sigma]}, {mu1, mu2, sigma}] {epsilon -> 0.8664891748852062, sigma -> 19.61844867661945, mu1 -> -57.37146512241579, mu2 -> 32.99873305821706, ... } The solution given above agrees with Table 2 in the Aitkin & Tunnicliffe Wilson reference. (2) equal means, unequal variances (i) MixtureEstimation[data, epsilon, {NormalDistribution[mu, sigma], NormalDistribution[mu, Sqrt[k] sigma]}, {mu, sigma, k}] MixtureEstimation::noconv {epsilon -> 0.604576, sigma -> 54.9742, k -> 0.0974945, mu -> 27.5293, ... } (ii) mu is estimated by Mean[data] = 20.9333 sigma is estimated by StandardDeviationMLE[{-67, -48}] = 13.435 Sqrt[k] sigma is estimated by StandardDeviationMLE[ {6, 8, 14, 16, 23, 24, 28, 29, 41, 49, 56, 60, 75}] = 20.6956 k is estimated by (20.6956/13.435)^2 = 2.37291 MixtureEstimation[data, epsilon, {NormalDistribution[mu, sigma], NormalDistribution[mu, Sqrt[k] sigma]}, {{mu, 20.9333}, {sigma, 13.435}, {k, 2.37291}}] MixtureEstimation::noconv: {epsilon -> 0.657528, sigma -> 8.34639, k -> 28.5271, mu -> 19.9968, ... } (iii) using values from Aitkin & Tunnicliffe Wilson reference: MixtureEstimation[data, epsilon, {NormalDistribution[mu, sigma], NormalDistribution[mu, Sqrt[k] sigma]}, {{mu, 26.37}, {sigma, 16.7869}, {k, 8.98}}] MixtureEstimation::noconv: {epsilon -> 0.65747, sigma -> 8.34736, k -> 28.5228, mu -> 19.9971, ... } (3) unequal means and variances MixtureEstimation[data, epsilon, {NormalDistribution[mu1, sigma1], NormalDistribution[mu2, sigma2]}, {mu1, mu2, sigma1, sigma2}] {epsilon -> 0.866828, sigma1 -> 9.49999, sigma2 -> 20.7234, mu1 -> -57.5111, mu2 -> 32.9849, ...} (4) testing warning and error messages: MixtureEstimation[data, epsilon, {NormalDistribution[mu, sigma], CauchyDistribution[a, b]}, {mu, a, b}] MixtureEstimation::parm: One or both of the distributions are specified by parameters not included in the parameter list {a, b, mu}. (5) testing warning and error messages: MixtureEstimation[data, epsilon, {NormalDistribution[mu, sigma], CauchyDistribution[a, b]}, {mu, sigma, a, b}] MixtureEstimation::polyfail: The maximum likelihood method of estimating the values of the parameters yielded non-polynomial equations which cannot be solved using NSolve. (6) testing warning and error messages: MixtureEstimation[data, epsilon, {NormalDistribution[0, 1], NormalDistribution[2, 3]}, {mu}] MixtureEstimation::noparm: There are no unknown parameters in the two distributions. (6) MixtureEstimation[data, epsilon, {NormalDistribution[mu, sigma], CauchyDistribution[a, b]}, {mu, sigma, {a, .2}, {b, .8}}] MixtureEstimation::notmix: Warning: The maximized log-likelihood is smaller for the mixture model than for the model consisting of simply the first distribution. This is strong evidence against the mixture model. {epsilon -> 0.868422, mu -> -57.5631, sigma -> 9.49979, a -> -456.755, b -> 25.6288, ... } (7) sigma0 = StandardDeviationMLE[data]; (i) The standard deviation is fixed at sigma0. MixtureEstimation[data, epsilon, {NormalDistribution[mu1, sigma0], NormalDistribution[mu2, sigma0]}, {mu1, mu2}] MixtureEstimation::noconv {epsilon -> 0.909444, mu1 -> -50.32, mu2 -> 28.0282, ... } (ii) The standard deviation is initially sigma0. MixtureEstimation[data, epsilon, {NormalDistribution[mu1, sigma], NormalDistribution[mu2, sigma]}, {mu1, mu2, {sigma, sigma0}}] {epsilon -> 0.866489, sigma -> 19.6184, mu1 -> -57.3715, mu2 -> 32.9987, ...} *)