(*:Version: Mathematica 3.0 *)

(*:Name: Statistics`Common`RegressionCommon` *)

(*:Context: Statistics`Common`RegressionCommon` *)

(*:Title: Options for statistical regression *)

(*:Author:
  ECM (Wolfram Research), September 1992
*)

(*:Copyright: Copyright 1992-2007, Wolfram Research, Inc. *)

(*:Reference: Usage messages only. *)

(*:Summary:
This package defines symbols in a common context for use in statistical
regression packages.
*)

(*:Keywords: weights, regression statistics *)

(*:Requirements: No special system requirements. *)

(*:Warning: None. *)

(*:Sources: Basic statistics texts. *)

BeginPackage["Statistics`Common`RegressionCommon`"]

(* ===================== common regression functions ====================== *)

RegressionReportValues::usage =
"RegressionReportValues[regfcn] gives a list of valid values that may be \
included in the RegressionReport list input as an option to regfcn; \
regfcn specifies a regression function like Regress or NonlinearRegress."


(* ===================== regression function options ====================== *)

Weights::usage =
"Weights is an option to fitting functions and is used to define a vector \
of weights inversely proportional to the variances of measurement error in \
the observed responses. The length of this vector should be equal to the \
length of the data. When weights are specified, the weighted fit is \
calculated. Weights -> Automatic specifies weights equal to unity. \
Weights -> (w[#]&) specifies a pure function that will be applied to the \
responses to obtain a vector of weights."

RegressionReport::usage =
"RegressionReport is an option to regression functions and specifies a \
statistic or a list of statistics to be reported about the fit. \
RegressionReportValues[regfcn] specifies valid statistics for the \
regression function regfcn."

BasisNames::usage =
"BasisNames is an option to regression functions and is used to specify \
headings for each of the basis functions (or predictors) in the output."


(* ==================== RegressionReport option values ===================== *)

SummaryReport
  (* give no usage to SummaryReport,
     as it is given usage by LinearRegression.m and NonlinearFit.m *)

AdjustedRSquared::usage =
"AdjustedRSquared is used in the output of regression functions to identify \
the square of the multiple correlation coefficient adjusted for the number of \
degrees of freedom in the fit."

ANOVATable::usage =
"ANOVATable is used in the output of regression and ANOVA functions to identify \
the analysis of variance table."

BestFit::usage =
"BestFit is used in the output of regression functions to identify the best \
fit."

BestFitParameters::usage =
"BestFitParameters is used in the output of regression functions to identify the \
list of parameter estimates that give the best (least squares) fit."

BestFitParametersDelta::usage =
"BestFitParametersDelta is used in the output of linear regression functions \
to identify a list of parameter estimate influence diagnostics, p associated \
with each of the n data points, where p is the number of parameters in the \
model. The ith p-vector gives the standardized differences in the parameter \
estimates resulting from the omission of the ith data point. If \
PredictedResponseDelta indicates that the ith point is influential, then a \
large absolute value (> 2/Sqrt[n]) for the jth element in the ith p-vector of \
BestFitParametersDelta indicates that the jth parameter is heavily \
influenced by the ith point. (Belsley, Kuh, and Welsch call this diagnostic \
matrix DFBETAS.)"

CatcherMatrix::usage =
"CatcherMatrix is used in the output of linear regression functions to identify \
the so-called `matrix of catchers' C. If y is the response vector and b is the \
estimated parameter vector, then b = C . y. This matrix can be used to \
compute regression diagnostics. Each row of C catches all the information \
the predictors provide about the corresponding element of the parameter \
vector b."
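(* NOTE: an illustrative sketch only, not part of this package. It assumes an
  unweighted linear model with a full-rank design matrix, and the names X, y,
  catcher, and b are hypothetical. Under those assumptions the matrix of
  catchers and the relation b = C . y can be checked directly:

    X = {{1, 1}, {1, 2}, {1, 3}, {1, 4}};
    y = {1, 3, 2, 5};
    catcher = Inverse[Transpose[X] . X] . Transpose[X];
    b = catcher . y

  Here the columns of X are the constant basis function and x, and the last
  line gives the least-squares estimates {0, 11/10}.
*)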

CookD::usage =
"CookD is used in the output of linear regression functions to identify the \
list of Cook's D influence diagnostics, one associated with each of the data \
points. This diagnostic combines a measure of the remoteness of the point in \
the space of basis functions with a measure of the fit at that point, and \
is a squared distance. Values greater than Quantile[FRatioDistribution[p, n-p], \
.5] may indicate influential points, where n is the number of points and p is \
the number of estimated parameters."

CovarianceMatrixDetRatio::usage =
"CovarianceMatrixDetRatio is used in the output of regression functions to \
identify a list of determinant ratio influence diagnostics, one associated with \
each of the data points. The ith diagnostic is given by the ratio of the \
determinant of the parameter covariance matrix obtained by deleting the ith \
row in the original data to the determinant of the parameter covariance matrix \
for the original data. Values outside the interval {1 - 3p/n, 1 + 3p/n} may \
indicate influential points, where n is the number of points and p is the number \
of estimated parameters. (Belsley, Kuh, and Welsch call this diagnostic list \
COVRATIO.)"

DurbinWatsonD::usage =
"DurbinWatsonD is used in the output of regression functions to identify the \
Durbin-Watson d statistic for testing the existence of a first order \
autoregressive process. A value close to 0 indicates positive correlation and \
a value close to 4 indicates negative correlation. To test positive \
correlation, Durbin-Watson tables are entered with d; to test negative \
correlation, Durbin-Watson tables are entered with (4-d)."
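(* NOTE: an illustrative sketch only, not part of this package. It uses the
  textbook definition of the statistic, the sum of squared successive
  differences of the residuals divided by the residual sum of squares, and
  assumes the residuals are listed in time order. The names e, diff, and d
  are hypothetical.

    e = {1, -1, 1, -1};
    diff = Rest[e] - Drop[e, -1];
    d = (diff . diff)/(e . e)

  These alternating residuals give d = 3, toward the upper end of the range
  from 0 to 4, consistent with negative correlation.
*)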

EigenstructureTable::usage =
"EigenstructureTable is used in the output of linear regression functions to \
identify a table of information about the eigenstructure of the correlation \
matrix of the nonconstant predictors (basis functions). The table includes \
eigenvalues listed from largest to smallest, the associated condition indices, \
and for each predictor, the proportion of the variance attributable to each \
eigenvalue. Predictors that indicate a large proportion of variance due to a \
particular eigenvalue may be involved in a collinear relationship. Condition \
indices of 30 to 100 indicate moderate to strong collinearities. When Weights are \
specified, the correlation matrix is based on the weighted observations."

EstimatedVariance::usage =
"EstimatedVariance is used in the output of regression functions to \
identify the estimated error variance, or the residual mean square."

JackknifedVariance::usage =
"JackknifedVariance is used in the output of regression functions to \
identify the jackknifed estimated error variance vector, each element \
giving the estimated error variance resulting from the omission of the \
corresponding data point. It is given by \
((n-p)*EstimatedVariance - FitResiduals^2/(1-HatDiagonal))/(n-p-1), \
where n is the number of observations and p is the number of estimated \
parameters."
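(* NOTE: an illustrative sketch only, not part of this package. It evaluates
  the formula above for an unweighted straight-line fit, assuming a full-rank
  design matrix. The names X, y, n, p, xtxInv, b, e, h, s2, and jack are
  hypothetical.

    X = {{1, 1}, {1, 2}, {1, 3}, {1, 4}};
    y = {1, 3, 2, 5};
    n = Length[y]; p = Length[First[X]];
    xtxInv = Inverse[Transpose[X] . X];
    b = xtxInv . Transpose[X] . y;
    e = y - X . b;
    h = Map[(# . xtxInv . #) &, X];
    s2 = (e . e)/(n - p);
    jack = ((n - p) s2 - e^2/(1 - h))/(n - p - 1)

  The first element of jack is 8/3, which matches the residual mean square
  obtained by refitting the line to the last three points alone.
*)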

FitResiduals::usage =
"FitResiduals is used in the output of regression functions \
to identify the list of differences between the response data and \
the best fit evaluated at the same abscissa points."

HatDiagonal::usage =
"HatDiagonal is used in the output of regression functions to identify \
the diagonal of the projection or `hat' matrix H. If y is the response vector \
and yhat is the predicted response vector, then yhat = H . y. The leverage of \
a data point is given by the associated element in the HatDiagonal vector. A \
leverage of zero indicates a point with no influence on the fit, and a leverage \
of one indicates that a degree of freedom has been lost to fitting that point. \
If n is the number of points and p is the number of parameters, 2*p/n is \
often used as the threshold for determining which points have significant \
leverage. For a linear model, the elements of this vector sum to p."
(* NOTE: from
  D. A. Belsley, E. Kuh, & R. E. Welsch, Regression Diagnostics, 1980, Wiley.
  "Assume the explanatory variables are independently distributed multinormal.
  While these assumptions are often not valid in practice, they allow one to
  show that (n-p)(h[i] - (1/n))/((1-h[i])(p-1)) is distributed F with p-1 and
  n-p degrees of freedom.  For p>10 and n-p>50 the 95% value for F is < 2,
  making 2*p/n a good rough cutoff.  When p/n > .4, there are so few degrees of
  freedom per parameter that all observations become suspect.  For small p,
  2*p/n tends to call a few too many points to our attention."
*)
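(* NOTE: an illustrative sketch only, not part of this package. Assuming an
  unweighted linear model with a full-rank design matrix, the leverages can
  be computed without forming the full hat matrix. The names X, n, p, xtxInv,
  and h are hypothetical.

    X = {{1, 1}, {1, 2}, {1, 3}, {1, 4}};
    n = Length[X]; p = Length[First[X]];
    xtxInv = Inverse[Transpose[X] . X];
    h = Map[(# . xtxInv . #) &, X];
    {h, Apply[Plus, h] == p, Select[h, # > 2 p/n &]}

  The leverages are {7/10, 3/10, 3/10, 7/10}, they sum to p = 2, and none
  exceeds the rough cutoff 2 p/n = 1.
*)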

MeanPredictionCITable::usage =
"MeanPredictionCITable is used in the output of regression functions to \
identify a table of confidence intervals for the mean predicted responses, \
one interval for each row in the data or design matrix. The level of the \
confidence interval is specified using the option ConfidenceLevel. The \
interval is found using StudentTCI."

ParameterCITable::usage =
"ParameterCITable is used in the output of regression functions to \
identify a table of confidence intervals for the parameters. \
The level of the confidence interval is specified using the option \
ConfidenceLevel. The interval is found using StudentTCI."

ParameterConfidenceRegion::usage =
"ParameterConfidenceRegion is used in the output of regression functions \
to specify an elliptically shaped joint confidence region for the parameters. \
It is based on CovarianceMatrix in the case of Regress and \
AsymptoticCovarianceMatrix in the case of NonlinearRegress. The level of the \
confidence region is specified using the option ConfidenceLevel. \
The option ParameterConfidenceRegion alone specifies the joint confidence \
region of all parameters in the case of Regress, and the asymptotic joint \
confidence region in the case of NonlinearRegress. In the case of Regress, the \
option ParameterConfidenceRegion[list] specifies the confidence region of the \
parameters associated with the basis functions in list, conditioned on the rest \
of the model. In the case of NonlinearRegress, the option \
ParameterConfidenceRegion[list] specifies the asymptotic confidence region \
of the parameters in list, conditioned on the rest of the model."

ParameterTable::usage =
"ParameterTable is used in the output of regression functions to identify a \
table of information about the parameter estimates."

PartialSumOfSquares::usage =
"PartialSumOfSquares is used in the output of Regress[data, funs, vars] to \
identify a list giving the increase in the model sum of squares \
due to adding the corresponding (nonconstant) basis function (predictor) in \
funs to a model consisting of the remaining basis functions. Partial sum of \
squares is also referred to as type II sum of squares."

PredictedResponse::usage =
"PredictedResponse is used in the output of regression functions \
to identify the best fit evaluated at the data points."

PredictedResponseDelta::usage =
"PredictedResponseDelta is used in the output of linear regression functions to \
identify a list of predicted response influence diagnostics, one associated \
with each of the data points. The ith diagnostic gives the standardized \
difference in the predicted response for the ith point, resulting from the \
omission of the ith data point. An absolute value greater than 2*Sqrt[p/n] \
may indicate an influential point, where n is the number of points and p is the \
number of estimated parameters. (Belsley, Kuh, and Welsch call this \
diagnostic list DFFITS.)"

RSquared::usage =
"RSquared is used in the output of regression functions to identify the square \
of the multiple correlation coefficient."

SequentialSumOfSquares::usage =
"SequentialSumOfSquares is used in the output of Regress[data, funs, vars] to \
identify a list giving a partitioning of the model sum of squares into \
component sums of squares due to each (nonconstant) basis function (predictor) \
as it is added sequentially to the model, in the order it appears in funs. \
Sequential sum of squares is also referred to as type I sum of squares."

SinglePredictionCITable::usage =
"SinglePredictionCITable is used in the output of regression functions to \
identify a table of confidence intervals for the predicted response of \
single observations, one interval for each row in the data or design matrix. \
The level of the confidence interval is specified using the option \
ConfidenceLevel. The interval is found using StudentTCI."

StandardizedResiduals::usage =
"StandardizedResiduals is used in the output of regression functions to \
identify the list of standardized residuals, where the ith residual is \
divided by the standard error for that residual: \
FitResiduals / Sqrt[EstimatedVariance * (1 - HatDiagonal)]. \
In the case of linear regression with normal errors, each standardized \
residual has unit variance, and its square divided by n-p follows \
BetaDistribution[1/2, (n-p-1)/2]."

StudentizedResiduals::usage =
"StudentizedResiduals is used in the output of linear regression functions to \
identify the list of studentized residuals, where the ith residual is \
divided by the standard error for that residual resulting from the omission of \
the ith data point: \
FitResiduals / Sqrt[JackknifedVariance * (1 - HatDiagonal)]. \
Each studentized residual follows StudentTDistribution[n-p-1]."
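(* NOTE: an illustrative sketch only, not part of this package. It evaluates
  the StandardizedResiduals and StudentizedResiduals formulas side by side
  for an unweighted straight-line fit, assuming a full-rank design matrix.
  The names X, y, n, p, xtxInv, e, h, s2, jack, standardized, and studentized
  are hypothetical.

    X = {{1, 1}, {1, 2}, {1, 3}, {1, 4}};
    y = {1, 3, 2, 5};
    n = Length[y]; p = Length[First[X]];
    xtxInv = Inverse[Transpose[X] . X];
    e = y - X . (xtxInv . Transpose[X] . y);
    h = Map[(# . xtxInv . #) &, X];
    s2 = (e . e)/(n - p);
    jack = ((n - p) s2 - e^2/(1 - h))/(n - p - 1);
    standardized = e/Sqrt[s2 (1 - h)];
    studentized = e/Sqrt[jack (1 - h)];
    {standardized, studentized}

  The two lists differ only in the variance estimate: the first uses the
  residual mean square from all n points, the second the estimate obtained
  with the corresponding point omitted.
*)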

VarianceInflation::usage =
"VarianceInflation is used in the output of linear regression functions to \
identify the list of variance inflation collinearity diagnostics, one associated \
with each of the parameters to be estimated. Values greater than \
1/(1-RSquared) indicate basis functions that may be involved in a collinear \
relationship."


EndPackage[]