Title

Correlation and regression in contingency tables
Author

 Thomas Colignatus
 Organization: Thomas Cool Consultancy & Econometrics
 URL: http://thomascool.eu/
Revision date

2007-06-05
Description

Nominal data currently lack a correlation coefficient of the kind already defined for real-valued data. A measure can be designed using the determinant, with the useful interpretation that the determinant gives the ratio between volumes. For an m × n contingency table M with m ≥ n and A = Normalized[M], A'A is a square n × n matrix and the suggested measure is r = Sqrt[Det[A'A]]. For an a × a × ... × a contingency matrix M in k variables, the pairwise correlations can be collected in a k × k matrix R. A matrix of such pairwise correlations is called an association matrix. If that matrix is also positive semi-definite (PSD), it is a proper correlation matrix. The overall correlation then is R = f[R], where f can be chosen to impose PSD-ness. One option is R = Sqrt[1 - Det[R]]. However, for both nominal and cardinal data the advisable choice is the maximal multiple correlation within R. The resulting measure of “nominal correlation” measures the distance between a main diagonal and the off-diagonal elements, and thus is a measure of strong correlation. Cramer’s V measure for pairwise correlation can be generalized in this manner too. It measures the distance between all diagonals (including cross-diagonals and subdiagonals) and statistical independence, and thus is a measure of weaker correlation. Finally, when variances are also defined, regression coefficients can be determined from the variance-covariance matrix.
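As a rough illustration of two quantities named in the description, the following Python sketch computes the textbook form of Cramer's V for a table of counts, and the option Sqrt[1 - Det[R]] for a small matrix R of pairwise correlations. It is not taken from the package itself: the function names are hypothetical, all row and column sums are assumed to be positive, and the determinant-based measure r = Sqrt[Det[A'A]] is left out because the normalization behind Normalized[M] is not specified in this record.

```python
from math import sqrt


def cramers_v(table):
    """Textbook Cramer's V for an m x n table of counts (a list of rows).

    Assumes every row sum and column sum is positive.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under statistical independence.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_sums), len(col_sums))
    return sqrt(chi2 / (total * (k - 1)))


def det(matrix):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(matrix) == 1:
        return matrix[0][0]
    result = 0.0
    for j, value in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        result += (-1) ** j * value * det(minor)
    return result


def overall_from_determinant(R):
    """Sqrt[1 - Det[R]] for a k x k matrix R of pairwise correlations.

    The clamp only guards against tiny negative values from round-off.
    """
    return sqrt(max(0.0, 1.0 - det(R)))
```

For a purely diagonal table such as [[10, 0], [0, 10]] the first function returns 1, and for a uniform table such as [[5, 5], [5, 5]] it returns 0. With two variables, overall_from_determinant reduces to the absolute value of the single pairwise correlation.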
Subjects

 Business and Economics
 Mathematics > Probability and Statistics
 Wolfram Technology > Application Packages > Additional Applications > Cool Economics
Keywords

association, correlation, contingency table, volume ratio, determinant, nonparametric methods, nominal data, nominal scale, categorical data, Fisher’s exact test, odds ratio, tetrachoric correlation coefficient, phi, Cramer’s V, Pearson, contingency coefficient, uncertainty coefficient, Theil’s U, eta, meta-analysis, Simpson’s paradox, causality, statistical independence, regression