Wolfram Library Archive

Correlation and regression in contingency tables

Thomas Colignatus
Organization: Thomas Cool Consultancy & Econometrics
URL: http://thomascool.eu/

Nominal data currently lack a correlation coefficient of the kind already defined for real (cardinal) data. A measure can be designed using the determinant, which has the useful interpretation of giving the ratio between volumes. With M an m × n contingency table with m ≥ n, and A = Normalized[M], A'A is a square n × n matrix and the suggested measure is r = Sqrt[Det[A'A]].

With M an n1 × n2 × ... × nk contingency matrix, the pairwise correlations can be collected in a k × k matrix R. A matrix of such pairwise correlations is called an association matrix. If that matrix is also positive semi-definite (PSD), then it is a proper correlation matrix. The overall correlation then is f[R], where f can be chosen to impose PSD-ness. One option is f[R] = Sqrt[1 - Det[R]]. However, for both nominal and cardinal data the advisable choice is to take the maximal multiple correlation within R.

The resulting measure of “nominal correlation” measures the distance between a main diagonal and the off-diagonal elements, and thus is a measure of strong correlation. Cramér’s V measure for pairwise correlation can be generalized in this manner too. It measures the distance between all diagonals (including cross-diagonals and subdiagonals) and statistical independence, and thus is a measure of weaker correlation. Finally, when variances are also defined, regression coefficients can be determined from the variance-covariance matrix.
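The two constructions above can be tried on small examples with the following Python sketch. It is not the author's Mathematica package: every function name is hypothetical, and it assumes that Normalized[M] scales each column of M to unit Euclidean length, so that A'A holds the cosines between columns and r = Sqrt[Det[A'A]] is the volume spanned by the normalized columns relative to the unit cube. It also takes the association matrix R as given rather than deriving it from an n1 × ... × nk table, and it assumes R is positive definite so that the minors used for the multiple correlations are nonzero.

```python
import math

# Sketch only: all names are hypothetical, not from the ColignatusCorrelation
# package. Tables and matrices are lists of rows.


def det(mat):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in mat]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-15:
            return 0.0  # (numerically) singular
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return result


def normalized(m):
    """ASSUMPTION: Normalized[M] scales each column to unit Euclidean length.
    No column may be all zero."""
    norms = [math.sqrt(sum(row[j] ** 2 for row in m)) for j in range(len(m[0]))]
    return [[row[j] / norms[j] for j in range(len(row))] for row in m]


def nominal_r(m):
    """r = Sqrt[Det[A'A]] with A = Normalized[M], for an m x n table, m >= n."""
    if len(m) < len(m[0]):
        raise ValueError("expected at least as many rows as columns (m >= n)")
    a = normalized(m)
    n = len(a[0])
    # A'A: inner products of the unit-length columns (ones on the diagonal).
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    return math.sqrt(max(det(ata), 0.0))  # clamp tiny negative rounding error


def minor(mat, i):
    """mat with row i and column i removed."""
    return [[x for j, x in enumerate(row) if j != i]
            for k, row in enumerate(mat) if k != i]


def is_positive_definite(rmat):
    """Sylvester's criterion: all leading principal minors are positive.
    This tests strict definiteness, which the sketch below needs."""
    return all(det([row[:k] for row in rmat[:k]]) > 0
               for k in range(1, len(rmat) + 1))


def overall_sqrt_det(rmat):
    """Option f[R] = Sqrt[1 - Det[R]]; for PSD R with unit diagonal,
    Hadamard's inequality gives 0 <= Det[R] <= 1."""
    return math.sqrt(max(1.0 - det(rmat), 0.0))


def max_multiple_correlation(rmat):
    """Largest multiple correlation of one variable on all the others, via
    1 - R_i^2 = Det[R] / Det[R with row and column i removed]."""
    d = det(rmat)
    return max(math.sqrt(max(1.0 - d / det(minor(rmat, i)), 0.0))
               for i in range(len(rmat)))


print(nominal_r([[3, 1], [1, 3]]))      # about 0.8
print(nominal_r([[10, 20], [30, 60]]))  # 0.0 (proportional columns: independence)
R = [[1, 0.6, 0.6], [0.6, 1, 0.6], [0.6, 0.6, 1]]
print(is_positive_definite(R))          # True
print(overall_sqrt_det(R), max_multiple_correlation(R))  # about 0.805 and 0.671
```

For a square table Det[A'A] = Det[A]^2, so in the 2 × 2 case r reduces to |Det[A]|; in the first example that is |3·3 − 1·1| / (√10 · √10) = 0.8. The last example shows that the two aggregation options need not agree: Sqrt[1 - Det[R]] gives about 0.805, while the maximal multiple correlation gives about 0.671.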

association, correlation, contingency table, volume ratio, determinant, nonparametric methods, nominal data, nominal scale, categorical data, Fisher’s exact test, odds ratio, tetrachoric correlation coefficient, phi, Cramér’s V, Pearson, contingency coefficient, uncertainty coefficient, Theil’s U, eta, meta-analysis, Simpson’s paradox, causality, statistical independence, regression

ColignatusCorrelation.zip (157.5 KB) - ZIP archive