Abstract
Hierarchical attributes appear in taxonomic or ontology-based data (e.g. NACE economic activities, ICD-classified diseases, animal or plant species). Such taxonomic data are often treated as flat nominal data with no hierarchy, which discards substantial information and analytical power. We introduce marginality, a numerical mapping for taxonomic data that makes many of the algorithms and analytical techniques designed for numerical data applicable to them. We show how to compute descriptive statistics such as the mean, the variance and the covariance on marginality-mapped data. We also define a mathematical distance between records that include hierarchical attributes, built on marginality-based variances. This distance paves the way for reusing, on taxonomic data, clustering and anonymization techniques designed for numerical data.
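The abstract does not give the formula for marginality, so the following Python sketch is only one plausible reading, not the paper's definition. It assumes that the marginality of a value is the sum of its tree-path distances to all values in the sample, and that the least marginal value can serve as a mean-like summary. The toy taxonomy and the names `PARENT`, `tree_distance`, `marginality` and `central_value` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy stored as child -> parent; the root maps to None.
PARENT: Dict[str, Optional[str]] = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "eagle": "bird",
}


def path_to_root(node: str) -> List[str]:
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges on the path between a and b in the taxonomy tree."""
    depth_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for j, n in enumerate(path_to_root(b)):
        if n in depth_from_a:  # first common ancestor seen from b
            return depth_from_a[n] + j
    raise ValueError("nodes are not in the same tree")


def marginality(sample: List[str]) -> Dict[str, int]:
    """Assumed definition: for each distinct value, the sum of its
    tree distances to every value in the sample (repeats included)."""
    return {v: sum(tree_distance(v, w) for w in sample) for v in set(sample)}


def central_value(sample: List[str]) -> str:
    """Least marginal value, used here as a mean-like summary (assumption).
    Ties are broken alphabetically to keep the result deterministic."""
    m = marginality(sample)
    return min(sorted(m), key=m.get)


if __name__ == "__main__":
    sample = ["dog", "dog", "cat", "eagle"]
    print(marginality(sample))
    print(central_value(sample))
```

Under this reading, values near the bulk of the sample in the hierarchy get low marginality and outlying values get high marginality, which is what lets numerical tools such as variances and distances be applied to the mapped data.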
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Domingo-Ferrer, J. (2012). Marginality: A Numerical Mapping for Enhanced Exploitation of Taxonomic Attributes. In: Torra, V., Narukawa, Y., López, B., Villaret, M. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2012. Lecture Notes in Computer Science, vol 7647. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34620-0_33
DOI: https://doi.org/10.1007/978-3-642-34620-0_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34619-4
Online ISBN: 978-3-642-34620-0
eBook Packages: Computer Science