How Do I Perform Multivariate Linear Regression with
Mathematica?
This howto will provide an example of performing basic ordinary least
squares (OLS) regression on a data set with three independent variables.
There are too many options to demonstrate each of them here, but this
should give you a feel for the flexibility and design of the
Mathematica function Regress.
Importing the Data and Creating Exploratory Plots
First, load the packages that you will need.
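A sketch of this step, assuming a pre-version-7 copy of Mathematica in which Regress lives in the standard add-on package Statistics`LinearRegression`:

```mathematica
(* Standard add-on packages used in this howto. *)
<<Statistics`LinearRegression`    (* Regress *)
<<Statistics`NormalDistribution`  (* NormalDistribution, PDF *)
<<Graphics`Graphics`              (* Histogram *)
```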
Import is used to read data from the file
C:\Windows\Desktop\example.dat. This
particular data set consists of one hundred "observations"
previously manufactured with Mathematica using the following
model:
y_i = beta0 + beta1 x1_i + beta2 x2_i + beta3 x3_i + epsilon_i

where

epsilon_i ~ N(0, 3)
i = 1, ..., 100
beta0 = 8.0
beta1 = 4.8
beta2 = 0.35
beta3 = 1.0
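The import step mentioned above can be sketched as follows; note that backslashes must be doubled inside Mathematica strings:

```mathematica
(* Read the data file named earlier into a list of rows. *)
example = Import["C:\\Windows\\Desktop\\example.dat"];
```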
Taking a look at the first few observations of raw data using
TableForm, you can see that it is organized in the
form {{x1, x2, x3, y}, {x1, x2, x3, y}, ... }.
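For instance, the first five observations can be displayed with:

```mathematica
(* Show the first few rows of the data set in a grid. *)
TableForm[Take[example, 5]]
```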
Individual lists can be extracted by (a) transposing the data set, (b)
selecting a column (in this case 1 through 4), and (c) adding a variable
name to each list.
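Steps (a) through (c) can be collapsed into one parallel assignment; the names x1, x2, x3, and y match the columns of the data set:

```mathematica
(* Transpose the 100 x 4 data matrix and name each column. *)
{x1, x2, x3, y} = Transpose[example];
```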
Now we can visualize the relationship between each of our independent
variables and the dependent variable y.
See graphs.
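One way to produce such plots, assuming the column names used above:

```mathematica
(* Scatter plot of y against each independent variable. *)
ListPlot[Transpose[{x1, y}]]
ListPlot[Transpose[{x2, y}]]
ListPlot[Transpose[{x3, y}]]
```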
Performing Basic OLS
Regress accepts three parameters, as well as a list of report options
given by RegressionReport -> {...}. The first parameter is the data in the
form {{x1, x2, x3, y}, {x1, x2, x3, y}, ... }. The
second parameter is a list specification of the model to be fit. As this
is a linear model with three independent variables and a y-intercept, we
specify {1, x1, x2, x3}. A model with interactions between the
variables, for example, y = beta0 + beta1 x1 + beta2 x2 + beta3 x3 +
beta4 x1 x2 + epsilon, would be specified by
Regress[example, {1, x1, x2, x3, x1 x2}, {x1, x2, x3}].
The third parameter, {x1, x2, x3}, is a list that
simply identifies the columns in the matrix example.
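A call that would produce the report shown below might look like this; the option list matches the items that appear in the output:

```mathematica
(* Fit y on a constant, x1, x2, and x3, requesting the report
   items reproduced below. *)
Regress[example, {1, x1, x2, x3}, {x1, x2, x3},
  RegressionReport -> {ParameterTable, RSquared, AdjustedRSquared,
    EstimatedVariance, ANOVATable}]
```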
{ParameterTable ->
           Estimate    SE          TStat     PValue
  1        6.66618     5.03394     1.32425   0.188565
  x1       4.66691     0.208694    22.3625   0.
  x2       0.360126    0.0222749   16.1673   0.
  x3       1.01063     0.0104686   96.539    0. ,

 RSquared -> 0.991364, AdjustedRSquared -> 0.991094,
 EstimatedVariance -> 7.84618,

 ANOVATable ->
           DF    SumOfSq    MeanSq     FRatio    PValue
  Model    3     86469.     28823.     3673.51   0.
  Error    96    753.233    7.84618
  Total    99    87222.2                              }

Fortunately, the parameter estimates are fairly close to the real
parameters used to manufacture this data.
Exploring the Regression Residuals
There are many report options, but for this example the focus is
exclusively on FitResiduals.
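Since Regress returns a list of rules, the residual vector can be pulled out by rule replacement; this is one way the definition of errors below could be made:

```mathematica
(* Request only the residuals and extract them from the
   returned list of rules. *)
errors = FitResiduals /.
  Regress[example, {1, x1, x2, x3}, {x1, x2, x3},
    RegressionReport -> {FitResiduals}];
```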
errors has now been defined as the vector of residuals from the
linear regression above. The two definitions below create graphics
objects: a histogram of the residuals and a normal distribution probability
density function (PDF). As we see, the errors are
roughly normal and, not surprisingly, the standard deviation is roughly
equal to the standard deviation of epsilon in this manufactured data
set.
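A sketch of the two graphics objects, assuming the Histogram function from the old Graphics`Graphics` package and StandardDeviation from Statistics`DescriptiveStatistics`:

```mathematica
(* Histogram of the residuals, and a normal PDF using the
   sample standard deviation of the residuals. *)
sd = StandardDeviation[errors];
hist = Histogram[errors];
pdf = Plot[PDF[NormalDistribution[0, sd], t], {t, -3 sd, 3 sd}];
```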
As one last illustration, the residuals are visually inspected for
heteroskedasticity or other problems. Formal statistical tests for
this, of course, are possible. In this case, however,
everything looks well-behaved.
See graphs.
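One way to carry out the visual inspection is to plot the residuals against each regressor and look for a fanning or bowed pattern:

```mathematica
(* Residuals against each independent variable; a trumpet-shaped
   spread would suggest heteroskedasticity. *)
ListPlot[Transpose[{x1, errors}]]
ListPlot[Transpose[{x2, errors}]]
ListPlot[Transpose[{x3, errors}]]
```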
Launching Point: Doing It Yourself
The strength of
Mathematica, of course, is that you can write your own algorithms,
create your own tests, and interact with your own data at as low a level
as desired. For instance, you can write your own ordinary least
squares function that returns the parameter vector
beta = (X'X)^(-1) X'y. (See "How
Do I Create Functions with Mathematica?" for rules on defining
functions.)
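A minimal version of such a function, applying the matrix formula directly (the name ols is illustrative):

```mathematica
(* beta = (X'X)^-1 X'y, the OLS estimator in matrix form. *)
ols[X_, y_] := Inverse[Transpose[X].X].Transpose[X].y
```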
Next, the lists of data are turned into a matrix of independent data
points.
When you run the calculations, you should get the same results.
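Putting the pieces together, a self-contained sketch (assuming example holds the raw data read in earlier):

```mathematica
(* Build the design matrix with a column of ones for the
   intercept, then apply beta = (X'X)^-1 X'y. *)
{x1, x2, x3, y} = Transpose[example];
X = Transpose[{Table[1, {Length[y]}], x1, x2, x3}];
Inverse[Transpose[X].X].Transpose[X].y
```

The result should match the Estimate column of the ParameterTable above.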
This leads down the road
to implementing novel and sophisticated statistical tests such as maximum
likelihood estimation (MLE) and generalized method of moments
(GMM). If that is the path for you, Mathematica will
serve you well. Otherwise, you should find everything you
need only one or two commands away in the Mathematica
package
Statistics.