Abstract
Recent research into graphical association models has focussed interest on the conditional Gaussian distribution for analyzing mixtures of categorical and continuous variables. A special case of such models, utilizing the homogeneous conditional Gaussian distribution, has in fact been known since 1961 as the location model, and for the past 30 years it has provided a basis for the multivariate analysis of mixed categorical and continuous variables. Extensive development of this model took place throughout the 1970s and 1980s in the context of discrimination and classification, and comprehensive methodology is now available for such analysis of mixed variables. This paper surveys these developments and summarizes current capabilities in the area. Topics include distances between groups, discriminant analysis, error rates and their estimation, model and feature selection, and the handling of missing data.
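To illustrate the discrimination setting described above, here is a minimal sketch, not taken from the paper, of the likelihood-ratio allocation rule for two groups under the homogeneous location model. It assumes q binary variables whose joint pattern indexes a cell m, cell probabilities p_im, cell means mu_im of the continuous vector x in group i, and a single covariance matrix Sigma shared by all cells and groups. With known parameters (or plug-in estimates), the log-likelihood ratio is log(p_1m / p_2m) + (mu_1m - mu_2m)' Sigma^{-1} {x - (mu_1m + mu_2m)/2}, and x is allocated to group 1 when this exceeds log(prior_2 / prior_1). The function names and the example parameters are hypothetical, and parameter estimation is omitted.

```python
from math import log


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting (A small, nonsingular)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (M[r][n] - s) / M[r][r]
    return y


def cell_index(binary):
    """Map a tuple of 0/1 variables to a cell number m (the pattern read as a binary integer)."""
    m = 0
    for bit in binary:
        m = 2 * m + bit
    return m


def location_ldf(x, binary, p1, p2, mu1, mu2, sigma):
    """Log-likelihood ratio log f1(x, m) - log f2(x, m) under the homogeneous location model."""
    m = cell_index(binary)
    d = [a - b for a, b in zip(mu1[m], mu2[m])]          # mu_1m - mu_2m
    mid = [(a + b) / 2 for a, b in zip(mu1[m], mu2[m])]  # (mu_1m + mu_2m) / 2
    w = solve(sigma, d)                                   # Sigma^{-1} (mu_1m - mu_2m); Sigma symmetric
    linear = sum(wi * (xi - ci) for wi, xi, ci in zip(w, x, mid))
    return log(p1[m] / p2[m]) + linear


def allocate(x, binary, p1, p2, mu1, mu2, sigma, prior1=0.5, prior2=0.5):
    """Return 1 if the log-likelihood ratio exceeds log(prior2 / prior1), else 2."""
    score = location_ldf(x, binary, p1, p2, mu1, mu2, sigma)
    return 1 if score > log(prior2 / prior1) else 2


if __name__ == "__main__":
    # Hypothetical parameters: one binary variable (cells 0 and 1), two continuous variables.
    p1, p2 = [0.7, 0.3], [0.4, 0.6]              # cell probabilities in groups 1 and 2
    mu1 = [[1.0, 0.0], [2.0, 1.0]]                # group 1 means, one row per cell
    mu2 = [[0.0, 0.0], [0.0, 1.0]]                # group 2 means, one row per cell
    sigma = [[1.0, 0.5], [0.5, 1.0]]              # common covariance matrix
    print(allocate([0.5, 0.0], (0,), p1, p2, mu1, mu2, sigma))  # → 1
    print(allocate([0.5, 0.0], (1,), p1, p2, mu1, mu2, sigma))  # → 2
```

Because Sigma is shared, the continuous part of the score is linear in x within each cell; only the cell means and the log ratio of cell probabilities change from cell to cell.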
References
AFIFI, A. A., and ELASHOFF, R. M. (1969), “Multivariate Two-sample Tests with Dichotomous and Continuous Variables. 1. The Location Model,” Annals of Mathematical Statistics, 40, 290–298.
AKAIKE, H. (1973), “Information Theory and an Extension of the Maximum Likelihood Principle,” in Second International Symposium on Information Theory, Eds., B. N. Petrov and F. Csaki, Budapest: Akademiai Kiado, 267–281.
ANDERSON, J. A. (1982), “Logistic Discrimination,” in Handbook of Statistics 2, Classification, Pattern Recognition and Reduction of Dimensionality, Eds., P. R. Krishnaiah and L. N. Kanal, Amsterdam: North Holland, 169–191.
ANDERSON, T. W. (1973), “An Asymptotic Expansion of the Distribution of the Studentized Classification Statistic W,” Annals of Statistics, 1, 964–972.
BALAKRISHNAN, N., and TIKU, M. L. (1988), “Robust Classification Procedures Based on Dichotomous and Continuous Variables,” Journal of Classification, 5, 53–80.
CHANG, P. C., and AFIFI, A. A. (1974), “Classification Based on Dichotomous and Continuous Variables,” Journal of the American Statistical Association, 69, 336–339.
COX, D. R. (1972), “The Analysis of Multivariate Binary Data,” Applied Statistics, 21, 113–120.
CUADRAS, C. M. (1989), “Distance Analysis in Discrimination and Classification Using Both Continuous and Categorical Variables,” in Statistical Data Analysis and Inference, Ed., Y. Dodge, Amsterdam: North Holland, 459–473.
CUADRAS, C. M. (1991), “A Distance-based Approach to Discriminant Analysis and Its Properties,” Mathematics preprint series no. 90, Barcelona University.
DAUDIN, J. J. (1986), “Selection of Variables in Mixed-variable Discriminant Analysis,” Biometrics, 42, 473–481.
DILLON, W. R., and GOLDSTEIN, M. (1978), “On the Performance of Some Multinomial Classification Rules,” Journal of the American Statistical Association, 73, 305–313.
EDWARDS, D. (1990), “Hierarchical Interaction Models,” Journal of the Royal Statistical Society, Series B, 52, 3–20.
GANESHANANDAM, S., and KRZANOWSKI, W. J. (1989), “On Selecting Variables and Assessing Their Performance in Linear Discriminant Analysis,” Australian Journal of Statistics, 31, 433–447.
GOWER, J. C. (1971), “A General Coefficient of Similarity and Some of Its Properties,” Biometrics, 27, 857–871.
HAN, C.-P. (1979), “Alternative Methods of Estimating the Likelihood Ratio in Classification of Multivariate Normal Observations,” American Statistician, 33, 204–206.
KNOKE, J. D. (1982), “Discriminant Analysis with Discrete and Continuous Variables,” Biometrics, 38, 191–200.
KRUSINSKA, E. (1988a), “Variable Selection in Location Model for Mixed Variable Discrimination: A Procedure Based on Total Probability of Misclassification,” EDV in Medizin und Biologie, 19, 14–18.
KRUSINSKA, E. (1988b), “Linear Transformations in Location Model and Their Influence on Classification Results in Mixed Variable Discrimination,” EDV in Medizin und Biologie, 19, 110–114.
KRUSINSKA, E. (1989a), “New Procedure for Selection of Variables in Location Model for Mixed Variable Discrimination,” Biometrical Journal, 31, 511–523.
KRUSINSKA, E. (1989b), “Two Step Semi-optimal Branch and Bound Algorithm for Feature Selection in Mixed Variable Discrimination,” Pattern Recognition, 22, 455–459.
KRUSINSKA, E. (1990), “Suitable Location Model Selection in the Terminology of Graphical Models,” Biometrical Journal, 32, 817–826.
KRZANOWSKI, W. J. (1975), “Discrimination and Classification Using Both Binary and Continuous Variables,” Journal of the American Statistical Association, 70, 782–790.
KRZANOWSKI, W. J. (1976), “Canonical Representation of the Location Model for Discrimination or Classification,” Journal of the American Statistical Association, 71, 845–848.
KRZANOWSKI, W. J. (1977), “The Performance of Fisher’s Linear Discriminant Function Under Non-optimal Conditions,” Technometrics, 19, 191–200.
KRZANOWSKI, W. J. (1979), “Some Linear Transformations for Mixtures of Binary and Continuous Variables, With Particular Reference to Linear Discriminant Analysis,” Biometrika, 66, 33–39.
KRZANOWSKI, W. J. (1980), “Mixtures of Continuous and Categorical Variables in Discriminant Analysis,” Biometrics, 36, 493–499.
KRZANOWSKI, W. J. (1982), “Mixtures of Continuous and Categorical Variables in Discriminant Analysis: A Hypothesis-testing Approach,” Biometrics, 38, 991–1002.
KRZANOWSKI, W. J. (1983a), “Distance Between Populations Using Mixed Continuous and Categorical Variables,” Biometrika, 70, 235–243.
KRZANOWSKI, W. J. (1983b), “Stepwise Location Model Choice in Mixed-variable Discrimination,” Applied Statistics, 32, 260–266.
KRZANOWSKI, W. J. (1984), “On the Null Distribution of Distance Between Two Groups, Using Mixed Continuous and Categorical Variables,” Journal of Classification, 1, 243–253.
KRZANOWSKI, W. J. (1986), “Multiple Discriminant Analysis in the Presence of Mixed Continuous and Categorical Data,” Computers and Mathematics with Applications, 12A(2), 179–185.
KRZANOWSKI, W. J. (1987), “A Comparison Between Two Distance-based Discriminant Principles,” Journal of Classification, 4, 73–84.
LACHENBRUCH, P. A., and MICKEY, M. R. (1968), “Estimation of Error Rates in Discriminant Analysis,” Technometrics, 10, 1–11.
LAURITZEN, S. L., and WERMUTH, N. (1989), “Graphical Models for Association Between Variables, Some of Which Are Qualitative and Some Quantitative,” Annals of Statistics, 17, 31–54.
LERMAN, I. C. (1987), “Construction d’un indice de similarité entre objets décrits par des variables d’un type quelconque. Application au problème du consensus en classification (1),” Revue de Statistique Appliquée, 35, 39–60.
LEUNG, C. Y. (1989), “The Studentized Location Linear Discriminant Function,” Communications in Statistics, Theory and Methods, 18, 3977–3990.
LITTLE, R. J. A., and SCHLUCHTER, M. D. (1985), “Maximum Likelihood Estimation for Mixed Continuous and Categorical Data with Missing Values,” Biometrika, 72, 497–512.
MATUSITA, K. (1956), “Decision Rule, Based on the Distance, for the Classification Problem,” Annals of the Institute of Statistical Mathematics, 8, 67–77.
OKAMOTO, M. (1963), “An Asymptotic Expansion for the Distribution of the Linear Discriminant Function,” Annals of Mathematical Statistics, 34, 1286–1301 (with correction in 39, 1358–1359).
OLKIN, I., and TATE, R. F. (1961), “Multivariate Correlation Models with Mixed Discrete and Continuous Variables,” Annals of Mathematical Statistics, 32, 448–465 (with correction in 36, 343–344).
RAO, C. R. (1982), “Diversity and Dissimilarity Coefficients: A Unified Approach,” Theoretical Population Biology, 21, 24–43.
TAKANE, Y., BOZDOGAN, H., and SHIBAYAMA, T. (1987), “Ideal Point Discriminant Analysis,” Psychometrika, 52, 371–392.
TIKU, M. L., and BALAKRISHNAN, N. (1984), “Robust Multivariate Classification Procedures Based on the MML Estimators,” Communications in Statistics, Theory and Methods, 13, 967–986.
TU, C. T., and HAN, C. P. (1982), “Discriminant Analysis Based on Binary and Continuous Variables,” Journal of the American Statistical Association, 77, 447–454.
VLACHONIKOLIS, I. G. (1985), “On the Asymptotic Distribution of the Location Linear Discriminant Function,” Journal of the Royal Statistical Society, Series B, 47, 498–509.
VLACHONIKOLIS, I. G. (1986), “On the Estimation of the Expected Probability of Misclassification in Discriminant Analysis with Mixed Binary and Continuous Variables,” Computers and Mathematics with Applications, 12A(2), 187–195.
VLACHONIKOLIS, I. G. (1990), “Predictive Discrimination and Classification with Mixed Binary and Continuous Variables,” Biometrika, 77, 657–662.
VLACHONIKOLIS, I. G., and MARRIOTT, F. H. C. (1982), “Discrimination with Mixed Binary and Continuous Data,” Applied Statistics, 31, 23–31.
WERMUTH, N., and LAURITZEN, S. L. (1990), “On Substantive Research Hypotheses, Conditional Independence Graphs and Graphical Chain Models,” Journal of the Royal Statistical Society, Series B, 52, 21–50.
WHITTAKER, J. (1990), Graphical Models in Applied Multivariate Statistics, Chichester: Wiley.
About this article
Krzanowski, W.J. The location model for mixtures of categorical and continuous variables. Journal of Classification 10, 25–49 (1993). https://doi.org/10.1007/BF02638452