Abstract
Compositional data need special treatment prior to correlation analysis. In this paper we argue why standard transformations for compositional data are not suitable for computing correlations, and why using raw or log-transformed data is not meaningful either. As a solution, a procedure based on balances is outlined, leading to sensible correlation measures. The construction of the balances is demonstrated using a real data example from geochemistry. It is shown that the considered correlation measures are invariant with respect to the choice of the binary partitions forming the balances. Robust counterparts to the classical, non-robust correlation measures are introduced and applied. By using appropriate graphical representations, it is shown how the resulting correlation coefficients can be interpreted.
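As a rough illustration of the idea of a balance, the following Python sketch uses the usual definition of a balance as a normalized log-ratio of the geometric means of two groups of parts. It is a minimal sketch under stated assumptions, not the procedure developed in the paper: the function name `balance`, the simulated data, and the choice of groups are all hypothetical, and the correlation shown is the plain Pearson coefficient rather than any of the robust measures.

```python
import numpy as np

def balance(x, num_idx, den_idx):
    """Balance between two groups of parts (hypothetical helper).

    Computes sqrt(r*s/(r+s)) * ln(gm(numerator parts) / gm(denominator parts)),
    where r and s are the group sizes and gm is the geometric mean.
    """
    x = np.asarray(x, dtype=float)
    r, s = len(num_idx), len(den_idx)
    # The mean of logs equals the log of the geometric mean.
    log_gm_num = np.log(x[:, num_idx]).mean(axis=1)
    log_gm_den = np.log(x[:, den_idx]).mean(axis=1)
    return np.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)

# Simulated positive data (an assumption, not the geochemical data set).
rng = np.random.default_rng(0)
raw = rng.lognormal(size=(200, 4))
# Closing each row to a constant sum of 1 turns it into a composition.
comp = raw / raw.sum(axis=1, keepdims=True)

# Two balances built from disjoint groups of parts.
b1 = balance(comp, [0], [1])
b2 = balance(comp, [2], [3])

# Classical (non-robust) Pearson correlation between the two balances.
r = np.corrcoef(b1, b2)[0, 1]
```

Because a balance depends only on ratios of parts, it takes the same value on the closed composition `comp` as on the unclosed data `raw`, which is one reason correlations computed on balances avoid the artifacts introduced by the constant-sum constraint.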
Filzmoser, P., Hron, K. Correlation Analysis for Compositional Data. Math Geosci 41, 905–919 (2009). https://doi.org/10.1007/s11004-008-9196-y
DOI: https://doi.org/10.1007/s11004-008-9196-y