(*^
::[ Information =
"This is a Mathematica Notebook file. It contains ASCII text, and can be
transferred by email, ftp, or other text-file transfer utility. It should
be read or edited using a copy of Mathematica or MathReader. If you
received this as email, use your mail application or copy/paste to save
everything from the line containing (*^ down to the line containing ^*)
into a plain text file. On some systems you may have to give the file a
name ending with ".ma" to allow Mathematica to recognize it as a Notebook.
The line below identifies what version of Mathematica created this file,
but it can be opened using any other version as well.";
FrontEndVersion = "Macintosh Mathematica Notebook Front End Version 2.2";
MacintoshStandardFontEncoding;
fontset = title, inactive, noPageBreakBelow, nohscroll, preserveAspect, cellOutline, groupLikeTitle, center, M18, O486, R65535, e8, 24, "Calculus";
fontset = subtitle, inactive, noPageBreakBelow, nohscroll, preserveAspect, groupLikeTitle, center, M18, O486, bold, R21845, G21845, B21845, e6, 12, "Calculus";
fontset = subsubtitle, inactive, noPageBreakBelow, nohscroll, preserveAspect, groupLikeTitle, center, M18, O486, R21845, G21845, B21845, e6, 12, "Calculus";
fontset = section, inactive, noPageBreakBelow, nohscroll, preserveAspect, groupLikeSection, grayBox, M18, O486, bold, R21845, G21845, B21845, a10, 12, "Calculus";
fontset = subsection, inactive, noPageBreakBelow, nohscroll, preserveAspect, groupLikeSection, blackBox, M18, O486, bold, R21845, G21845, B21845, a10, 12, "Calculus";
fontset = subsubsection, inactive, noPageBreakBelow, nohscroll, preserveAspect, groupLikeSection, whiteBox, M18, O486, bold, R21845, G21845, B21845, a10, 12, "Calculus";
fontset = text, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M18, O486, 12, "Calculus";
fontset = smalltext, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M18, O486, B65535, 12, "Calculus";
fontset = input, noPageBreakInGroup, nowordwrap, preserveAspect, groupLikeInput, M36, N23, O486, bold, L-5, 12, "Courier";
fontset = output, output, inactive, noPageBreakInGroup, nowordwrap, preserveAspect, groupLikeOutput, M36, N23, O486, L-5, 12, "Courier";
fontset = message, inactive, noPageBreakInGroup, nowordwrap, preserveAspect, groupLikeOutput, M18, N23, O486, R65535, L-5, 12, "Courier";
fontset = print, inactive, noPageBreakInGroup, nowordwrap, preserveAspect, groupLikeOutput, M18, N23, O486, L-5, 12, "Courier";
fontset = info, inactive, noPageBreakInGroup, nowordwrap, preserveAspect, groupLikeOutput, M18, N23, O486, B65535, L-5, 12, "Courier";
fontset = postscript, PostScript, formatAsPostScript, output, inactive, noPageBreakInGroup, nowordwrap, preserveAspect, groupLikeGraphics, M18, O486, l19, o2, w378, h214, 12, "Courier";
fontset = name, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M18, O486, italic, 10, "Geneva";
fontset = header, inactive, noKeepOnOnePage, preserveAspect, M18, O486, 12, "Times";
fontset = leftheader, inactive, M18, O486, L2, 12, "Times";
fontset = footer, inactive, noKeepOnOnePage, preserveAspect, center, M18, O486, 12, "Times";
fontset = leftfooter, inactive, M18, O486, L2, 12, "Times";
fontset = help, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M18, O486, 10, "Times";
fontset = clipboard, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M18, O486, 12, "Times";
fontset = completions, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M18, O486, 12, "Times";
fontset = special1, inactive, nohscroll, noKeepOnOnePage, preserveAspect, whiteBox, M18, O486, bold, R21845, G21845, B21845, 12, "Calculus";
fontset = special2, inactive, nohscroll, noKeepOnOnePage, preserveAspect, center, M18, O486, R21845, G21845, B21845, 12, "Calculus";
fontset = special3, inactive, nohscroll, noKeepOnOnePage, preserveAspect, M18, O486, 12, "Times";
fontset = special4, inactive, nohscroll, noKeepOnOnePage, preserveAspect, center, M18, O486, 10, "Courier";
fontset = special5, inactive, nohscroll, noKeepOnOnePage, preserveAspect, center, M18, O486, 10, "Courier";
currentKernel;
]
:[font = text; inactive; preserveAspect; center]
Miniproject
For The
Mathematica Workshop
;[s]
3:0,1;21,2;32,1;45,-1;
3:0,26,19,Calculus,0,12,0,0,0;2,27,21,New York,0,24,0,0,0;1,27,21,New York,2,24,0,0,0;
:[font = smalltext; inactive; preserveAspect; center; startGroup]
Graph of 4 regression models on a scatter plot of bivariate data
:[font = postscript; PICT; formatAsPICT; output; inactive; preserveAspect; pictureLeft = 19; pictureTop = 2; pictureWidth = 346; pictureHeight = 214; endGroup; pictureID = 8887]
:[font = input; preserveAspect]
:[font = subsection; inactive; preserveAspect]
Carol Castellon
;[s]
2:0,1;15,0;16,-1;
2:1,26,19,Calculus,1,12,21845,21845,21845;1,30,22,Calculus,1,14,21845,21845,21845;
:[font = subsection; inactive; preserveAspect]
"Experimenting With the Regression Packages"
;[s]
1:0,1;45,-1;
2:0,26,19,Calculus,1,12,21845,21845,21845;1,30,22,Calculus,1,14,21845,21845,21845;
:[font = subsection; inactive; preserveAspect; startGroup]
Brief Description
;[s]
2:0,1;17,0;18,-1;
2:1,26,19,Calculus,1,12,21845,21845,21845;1,30,22,Calculus,1,14,21845,21845,21845;
:[font = smalltext; inactive; preserveAspect; endGroup]
This project takes an arbitrary set of bivariate data (paired data) and looks at various ways to "fit" a model to the data. Models which were fitted include linear, quadratic, cubic, trigonometric, logarithmic, and a sum of non-integral-power polynomial-like functions.
Part 1 of this project uses the "Fit" command and allows the user to compare the resulting models on the scatter plot visually. The statistical packages are not used in this section of the project.
Part 2 of this project uses the Linear Regression Package and the "Regress" command to obtain a statistical analysis of the models, and uses this analysis to compare three different models.
Part 3 of this project uses the Nonlinear Regression Package to obtain several non-linear models and alter the method used for the fit.
:[font = title; inactive; preserveAspect; startGroup]
Experimenting with the
Regression Packages
:[font = input; preserveAspect]
:[font = input; initialization; closed; preserveAspect]
*)
<<Graphics`Colors`
points = ListPlot[list,PlotRange->{{-1,7},{-1,9}},
Prolog->PointSize[.015],PlotStyle->VenetianRed];
(*
:[font = smalltext; inactive; preserveAspect]
Here is the scatter plot of the paired data.
:[font = smalltext; inactive; preserveAspect]
The first "fit", a polynomial of degree 0, is really the mean of the dependent variable, y. The mean of y is 5. A horizontal line, y = 5,
"fits" the data when linear regression is not appropriate, i.e., when only a point estimate is needed.
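:[font = smalltext; inactive; preserveAspect]
(A quick check, assuming "list" holds the paired data as in the cells below: the mean of the y-values can be computed directly, and should agree with the degree-0 fit.)
:[font = input; preserveAspect]
Apply[Plus, Map[Last, list]] / Length[list]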
:[font = input; preserveAspect]
poly0 =Fit[list,{1},x]
pt = Plot[poly0,{x,-1,7},PlotStyle->Navy,
DisplayFunction->Identity];
Show[points,pt,Prolog->PointSize[.015]];
:[font = smalltext; inactive; preserveAspect]
In most cases, the line of regression, or "line of best fit"
(using the least-squares criterion) is desired. In this example,
the "line of best fit" is y = 1.94118 + 0.823529 x
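:[font = smalltext; inactive; preserveAspect]
(A sketch, assuming "list" holds the paired data used above: the same two coefficients can be recovered from the least-squares normal equations, obtained by setting the partial derivatives of the sum of squared errors to zero. The symbols b0 and b1 are introduced here only for illustration.)
:[font = input; preserveAspect]
Clear[sse,b0,b1]
sse = Sum[(list[[i,2]] - b0 - b1 list[[i,1]])^2,{i,Length[list]}];
Solve[{D[sse,b0] == 0, D[sse,b1] == 0},{b0,b1}]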
:[font = input; preserveAspect]
poly=Fit[list,{1,x},x]
bestfit = Plot[poly,{x,-1,7},PlotStyle->Navy,
DisplayFunction->Identity];
Show[points,bestfit,Prolog->PointSize[.015]];
:[font = input; preserveAspect]
:[font = smalltext; inactive; preserveAspect]
A second degree polynomial also "fits" this data.
The model is y = 4.08197 - 0.783763 x + 0.231069 x^2
:[font = input; preserveAspect]
poly2 =Fit[list,{1,x,x^2},x]
bestfit2 = Plot[poly2,{x,-1,7},PlotStyle->Green,
DisplayFunction->Identity];
two = Show[points,bestfit2,Prolog->PointSize[.015]];
:[font = smalltext; inactive; preserveAspect]
Look at both linear and quadratic models on the scatter plot.
:[font = input; preserveAspect]
combine = Show[points,bestfit,bestfit2,
Prolog->PointSize[.015]];
:[font = input; preserveAspect]
:[font = smalltext; inactive; preserveAspect]
A third degree polynomial may also "fit" the data.
The model is: y = 8.37688 - 6.09859 x + 1.96411 x^2 - 0.164154 x^3
:[font = input; preserveAspect]
poly3 =Fit[list,{1,x,x^2,x^3},x]
bestfit3 = Plot[poly3,{x,-1,7},PlotStyle->HotPink,
DisplayFunction->Identity];
Show[points,bestfit3,Prolog->PointSize[.015]];
:[font = smalltext; inactive; preserveAspect]
Look at all three models (linear, quadratic, cubic) on the scatter plot.
:[font = input; preserveAspect]
Three = Show[points,bestfit,bestfit2,bestfit3];
:[font = smalltext; inactive; preserveAspect]
One thing is clear: any of the three models will make a "good" prediction within the observed values (range) of x.
:[font = smalltext; inactive; preserveAspect]
Now try a trig "model".
:[font = input; preserveAspect]
Clear[trig,x]
trig[x_] =Fit[list,{1,Cos[(2 Pi/14) x],
Cos[(4 Pi/14) x],Cos[(6 Pi/14) x],Cos[(8 Pi/14) x]},x]
besttrg = Plot[trig[x],{x,-1,7},PlotStyle->Purple,
DisplayFunction->Identity];
Show[points,besttrg,Prolog->PointSize[.015]];
:[font = smalltext; inactive; preserveAspect]
The trig model is:
:[font = smalltext; inactive; preserveAspect]
Visually, it appears the trig model is the best "fit" for this data.
Look at all models on the scatter plot.
:[font = input; preserveAspect]
Four = Show[points,bestfit,bestfit2,bestfit3,besttrg];
:[font = smalltext; inactive; preserveAspect; endGroup]
Once again, any of the models will make a "good" prediction within the observed values (range) of x. Note that the cubic and the trig models are nearly the same. Can you explain why?
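:[font = smalltext; inactive; preserveAspect]
(One way to see how close the cubic and trig models are, assuming poly3 and trig[x] are still defined from the cells above: plot their difference over the observed range. A curve that stays near zero means the two models give nearly the same predictions there.)
:[font = input; preserveAspect]
Plot[trig[x] - poly3,{x,-1,7}];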
:[font = section; active; preserveAspect; startGroup]
B. Regression Package
:[font = smalltext; inactive; Cclosed; preserveAspect; startGroup]
Initialization:
:[font = input; initialization; preserveAspect; endGroup]
*)
<<Statistics`LinearRegression`
regone = Regress[data,{1,x},x,IncludeConstant->False,
OutputList->{BestFit,PredictedResponse,FitResiduals}]
(*
:[font = input; preserveAspect]
predict1 := PredictedResponse /. regone
errors1 := FitResiduals /. regone
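:[font = smalltext; inactive; preserveAspect]
(A sanity check, assuming errors1 holds the residuals extracted above: the residuals of a least-squares fit that includes a constant term sum to zero, up to rounding error.)
:[font = input; preserveAspect]
Chop[Apply[Plus, errors1]]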
:[font = smalltext; inactive; preserveAspect]
This table compares the linear model to a smaller model consisting of only a constant. The F-ratio and the P-value indicate the linear model is a significantly better "fit", i.e., a linear model is better for predicting the dependent variable within the observed range of values of the independent variable.
:[font = smalltext; inactive; preserveAspect]
Recall that the second degree polynomial model was
y = 4.08197 - 0.783763 x + 0.231069 x^2
Now we will look at the table for the quadratic model.
:[font = input; preserveAspect]
regtwo = Regress[data,{1,x,x^2},x,IncludeConstant->False,
OutputList->{BestFit,PredictedResponse,FitResiduals}]
:[font = input; preserveAspect]
predict2 := PredictedResponse /. regtwo
errors2 := FitResiduals /. regtwo
:[font = smalltext; inactive; preserveAspect]
Once again, the table provides evidence that a quadratic model is a better "fit" than a constant. Later, we will try to find out how the linear model compares to the quadratic.
:[font = smalltext; inactive; preserveAspect]
Recall that the third degree polynomial was:
y = 8.37688 - 6.09859 x + 1.96411 x^2 - 0.164154 x^3
:[font = input; preserveAspect]
regthree = Regress[data,{1,x,x^2,x^3},x,IncludeConstant->False,
OutputList->{BestFit,PredictedResponse,FitResiduals}]
predict3 := PredictedResponse /. regthree
errors3 := FitResiduals /. regthree
:[font = smalltext; inactive; preserveAspect]
The ANOVA table provides a comparison of the given model to a smaller model consisting of only a constant. The F-test compares the two models by the ratio of their mean squares. If the value of F is large, the null hypothesis supporting the smaller model is rejected. In all three cases above, the smaller (constant-only) model was rejected, as expected. The coefficient of determination, R^2, is a summary statistic that describes the relationship between the regressors and the response variable. AdjustedRSquared gives an adjusted value that can be used to compare models with different numbers of parameters. The AdjustedRSquared values for the above models were:
linear: 0.937295
quadratic: 0.939489
cubic: 0.944307
which would seem to imply that all three models are nearly the same in their ability to predict the dependent variable within the observed range of the independent variable.
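:[font = smalltext; inactive; preserveAspect]
(For reference: the adjusted statistic is related to R^2 by adjusted = 1 - (1 - R^2)(n - 1)/(n - p), where n is the number of observations and p the number of parameters in the model. The function below is an illustration only; the numeric arguments are hypothetical.)
:[font = input; preserveAspect]
adjustedR2[r2_,n_,p_] := 1 - (1 - r2)(n - 1)/(n - p)
adjustedR2[0.95, 7, 2]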
:[font = smalltext; inactive; preserveAspect]
We might get some insight by looking at the plot of predicted values against residuals for all three models on the same axes.
:[font = input; preserveAspect]
opt1 = ListPlot[Transpose[{predict1,errors1}],
PlotStyle->Navy,
DisplayFunction->Identity];
opt2 = ListPlot[Transpose[{predict2,errors2}],
PlotStyle->{PointSize[0.03],Green},
DisplayFunction->Identity];
opt3 = ListPlot[Transpose[{predict3,errors3}],
PlotStyle->{PointSize[0.05],Thistle},
DisplayFunction->Identity];
Show[opt3,opt2,opt1,DisplayFunction->$DisplayFunction];
:[font = smalltext; inactive; preserveAspect]
What this shows is that all three models are the same in their ability to predict the dependent variable within the observed range of the independent variable!!!
;[s]
3:0,0;50,1;54,0;164,-1;
2:2,26,19,Calculus,0,12,0,0,65535;1,26,19,Calculus,1,12,0,0,65535;
:[font = input; preserveAspect; endGroup]
:[font = section; inactive; preserveAspect; startGroup]
C. Nonlinear Fit Regression Package
:[font = input; initialization; closed; preserveAspect]
*)
<<Statistics`NonlinearFit`
try1 = Plot[nonlin1[x],{x,0,7},PlotStyle->Green,
DisplayFunction->Identity];
Show[points,try1,Prolog->PointSize[.015]];
(*
:[font = input; preserveAspect]
:[font = smalltext; inactive; preserveAspect]
The next model tried is of the form: y = a x^b + c x + d
:[font = input; preserveAspect]
params5:= NonlinearFit[data,a x^b + c x + d,x,{a,b,c,d}]
Clear[nonlin5,x]
nonlin5[x_] = params5[[1,2]] x^params5[[2,2]] + params5[[3,2]] x + params5[[4,2]]
try5 = Plot[nonlin5[x],{x,0,7},PlotStyle->Green,
DisplayFunction->Identity];
Show[points,try5,Prolog->PointSize[.015]];
:[font = input; preserveAspect]
:[font = smalltext; inactive; preserveAspect]
The next model tried is of the form y = b cos(ax) + c
:[font = input; preserveAspect]
params2 := NonlinearFit[data,b Cos[a x] + c,x,{a,b,c}]
Clear[nonlin2,x]
nonlin2[x_] = params2[[2,2]] Cos [params2[[1,2]] x] + params2[[3,2]]
try2 = Plot[nonlin2[x],{x,0,7},PlotStyle->Violet,
DisplayFunction->Identity];
Show[points,try2,Prolog->PointSize[.015]];
:[font = input; preserveAspect]
:[font = smalltext; inactive; preserveAspect]
The next model tried is of the form y = a log(bx) + c
:[font = input; preserveAspect]
params3:= NonlinearFit[data,a Log[b x] + c,x,{a,b,c}]
Clear[nonlin3,x]
nonlin3[x_] = params3[[1,2]] Log[params3[[2,2]] x] + params3[[3,2]]
try3 = Plot[nonlin3[x],{x,.25,7},PlotStyle->Navy,
DisplayFunction->Identity];
Show[points,try3,Prolog->PointSize[.015]];
:[font = input; preserveAspect]
:[font = smalltext; inactive; preserveAspect]
This package is reliable when a linear model y = a + bx is tried.
Recall the "line of best fit" was y = 1.94118 + 0.823529 x
:[font = input; preserveAspect]
params4:= NonlinearFit[data,a+b x,x,{a,b}]
Clear[nonlin4,x]
nonlin4[x_] = params4[[1,2]] + params4[[2,2]] x
try4 = Plot[nonlin4[x],{x,0,7},PlotStyle->Turquoise,
DisplayFunction->Identity];
Show[points,try4,Prolog->PointSize[.015]];
:[font = smalltext; inactive; preserveAspect]
The following is an experiment with the options available in the nonlinear-fit regression package. The first option tried changes the method used in the fit, using the model y = a x^b + c x + d again.
:[font = input; preserveAspect]
params6:= NonlinearFit[data,a x^b + c x + d,x,{a,b,c,d},Method->FindMinimum]
Clear[nonlin6,x]
nonlin6[x_] = params6[[1,2]] x^params6[[2,2]] + params6[[3,2]] x + params6[[4,2]]
try6 = Plot[nonlin6[x],{x,.00001,7},PlotStyle->Magenta,
DisplayFunction->Identity];
Show[points,try6,Prolog->PointSize[.015]];
:[font = smalltext; inactive; preserveAspect]
This gives very different parameters from the earlier result:
y = 2.52558 + 0.127878 x + 0.195419 x^(…)
(Curiously, this is the only model whose parameters changed significantly when a different method was used.) Both graphs are plotted together to show the differences.
:[font = input; preserveAspect]
Show[points,try5,try6,Prolog->PointSize[.015]];
:[font = smalltext; inactive; preserveAspect; endGroup; endGroup]
Conclusions:
Ultimately, to do a good regression analysis, we really need more than seven observations. With a sample size of seven, regression analysis using any model would work. So it is not surprising that any of the models "fit" the data in terms of prediction value!
The value of this project is in producing the code, which could be used for other data sets.
^*)