Mathematica 9 is now available

How Do I Perform Multivariate Linear Regression with Mathematica?

This how-to will provide an example of performing basic ordinary least squares (OLS) regression on a data set with three independent variables. There are too many options to demonstrate each of them here, but this should give you a feel for the flexibility and design of the Mathematica function Regress.


Importing the Data and Creating Exploratory Plots

First, load the packages that you will need.

[Graphics:Images/index_gr_1.gif]

Import is used to read data from the file C:\Windows\Desktop\example.dat. This particular data set consists of one hundred "observations" previously manufactured with Mathematica using the following model:

        [Graphics:Images/index_gr_2.gif] = [Graphics:Images/index_gr_3.gif] + [Graphics:Images/index_gr_4.gif][Graphics:Images/index_gr_5.gif] + [Graphics:Images/index_gr_6.gif][Graphics:Images/index_gr_7.gif] + [Graphics:Images/index_gr_8.gif][Graphics:Images/index_gr_9.gif] + [Graphics:Images/index_gr_10.gif]
          where epsilon ~ N(0, 3)
               i  = 1, ... , 100
             [Graphics:Images/index_gr_11.gif]  = -8.0
             [Graphics:Images/index_gr_12.gif]  = 4.8
             [Graphics:Images/index_gr_13.gif]  = 0.35
             [Graphics:Images/index_gr_14.gif]  = 1.0          

[Graphics:Images/index_gr_15.gif]

Taking a look at the first few observations of raw data using TableForm, you can see that it is organized in the form {{[Graphics:Images/index_gr_16.gif], [Graphics:Images/index_gr_17.gif], [Graphics:Images/index_gr_18.gif], [Graphics:Images/index_gr_19.gif]}, {[Graphics:Images/index_gr_20.gif], [Graphics:Images/index_gr_21.gif], [Graphics:Images/index_gr_22.gif], [Graphics:Images/index_gr_23.gif]}, ... }.

[Graphics:Images/index_gr_24.gif]
[Graphics:Images/index_gr_25.gif]

Individual lists can be extracted by (a) transposing the data set, (b) selecting a column (in this case 1 through 4), and (c) adding a variable name to each list.

[Graphics:Images/index_gr_26.gif]

Now we can visualize the relationship between each of our independent variables and the dependent variable y.

[Graphics:Images/index_gr_27.gif]

See graphs.


Performing Basic OLS

Regress accepts three paramaters, as well as a list of options under RegressionReport -> { }. The first parameter is the data in the form {{[Graphics:Images/index_gr_29.gif], [Graphics:Images/index_gr_30.gif], [Graphics:Images/index_gr_31.gif], [Graphics:Images/index_gr_32.gif]}, {[Graphics:Images/index_gr_33.gif], [Graphics:Images/index_gr_34.gif], [Graphics:Images/index_gr_35.gif], [Graphics:Images/index_gr_36.gif]}, ... }. The second parameter is a list specification of the model to be fit. As this is a linear model with three independent variables and a y-intercept, we specify [Graphics:Images/index_gr_37.gif]. A model with interactions between the variables, for example, [Graphics:Images/index_gr_38.gif] = [Graphics:Images/index_gr_39.gif] + [Graphics:Images/index_gr_40.gif][Graphics:Images/index_gr_41.gif] + [Graphics:Images/index_gr_42.gif][Graphics:Images/index_gr_43.gif] + [Graphics:Images/index_gr_44.gif][Graphics:Images/index_gr_45.gif] + [Graphics:Images/index_gr_46.gif][Graphics:Images/index_gr_47.gif] + [Graphics:Images/index_gr_49.gif], would be specified by Regress[example,[Graphics:Images/index_gr_50.gif]]. The third parameter is a list that simply identifies the columns in the matrix example.

[Graphics:Images/index_gr_51.gif]
[Graphics:Images/index_gr_52.gif]
{ParameterTable ->
  Estimate SE TStat PValue
1 -6.66618 5.03394 -1.32425 0.188565
x1 4.66691 0.208694 22.3625 0.
x2 0.360126 0.0222749 16.1673 0.
x3 1.01063 0.0104686 96.539 0.      ,
RSquared -> 0.991364, AdjustedRSquared -> 0.991094, EstimatedVariance -> 7.84618,

ANOVATable ->
  DF SumOfSq MeanSq FRatio PValue
Model 3 86469. 28823. 3673.51 0.
Error 96 753.233 7.84618    
Total 99 87222.2           }

Fortunately, the parameter estimates are fairly close to the real parameters used to manufacture this data.


Exploring the Regression Residuals

There are many report options, but for this example the focus is exclusively on FitResiduals.

[Graphics:Images/index_gr_54.gif]
[Graphics:Images/index_gr_55.gif]

errors has now been defined as the vector of residuals from the linear regression above. The two definitions below create graphics objects: a histogram of the residuals and a normal distribution probability density function (PDF). As we see, the errors are roughly normal and, not surprisingly, the standard deviation is roughly equal to the standard deviation of epsilon in this manufactured data set.

[Graphics:Images/index_gr_56.gif]
[Graphics:Images/index_gr_57.gif]

[Graphics:Images/index_gr_58.gif]

As one last illustration, the residuals are visually inspected for heteroskedacticity or other problems. Mathematical tests for this, of course, are possible. In this case, however, everything looks well behaved.

[Graphics:Images/index_gr_59.gif]

See graphs.


Launching Point: Doing It Yourself

The strength of Mathematica, of course, is that you can write your own algorithms, create your own tests, and interact with your own data at as low a level as desired. For instance, you can write your own ordinary least squares function that returns the parameter vector beta = [Graphics:Images/index_gr_61.gif]. (See "How Do I Create Functions with Mathematica?" for rules on defining functions.)

[Graphics:Images/index_gr_62.gif]

Next, the lists of data are turned into a matrix of independent data points.

[Graphics:Images/index_gr_63.gif]
[Graphics:Images/index_gr_64.gif]

When you run the calculations, you should get the same results.

[Graphics:Images/index_gr_65.gif]
[Graphics:Images/index_gr_66.gif]

This leads down the road to implementing novel and sophisticated statistical tests such as maximum likelihood estimation (MLE) and generalized method of moments (GMM). If this is the path for you, Mathematica will always be useful to you. Otherwise, you should find all you need only one or two commands away in the Mathematica package Statistics.