Abstract
In outlying aspects mining, given a query object, we aim to identify the features that make the query most outlying. Recent works tackle this problem using two different strategies. (i) Feature selection approaches select the features that best distinguish two classes: the query point vs. the rest of the data. (ii) Score-and-search approaches define an outlyingness score, then search for the subspaces in which the query point exhibits the best score. In this paper, we first present an insightful theoretical result connecting the two types of approaches. Second, we present OARank, a hybrid framework that combines the efficiency of feature-selection-based approaches with the effectiveness and versatility of score-and-search-based methods. Our proposed approach is orders of magnitude faster than previously proposed score-and-search-based approaches while being slightly more effective, making it suitable for mining large data sets.
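To make the two strategies concrete, the sketch below assumes one common style of outlyingness score: the rank of the query's kernel density among all points' densities in a subspace (rank 1 = lowest density = most outlying), using a product Gaussian kernel with Scott's-rule bandwidths. `score_and_search` exhaustively scores every subspace up to `max_dim` features. `rank_then_search` is a hypothetical illustration of the hybrid idea only: it ranks single features cheaply and then searches a short list. It is not OARank, whose ranking step the abstract does not describe; all function names are illustrative.

```python
import math
from itertools import combinations


def scott_bandwidths(data, dims):
    """Per-dimension bandwidth via Scott's rule: h_j = sigma_j * n^(-1/(d+4))."""
    n, d = len(data), len(dims)
    factor = n ** (-1.0 / (d + 4))
    hs = {}
    for j in dims:
        col = [row[j] for row in data]
        mean = sum(col) / n
        sigma = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        hs[j] = max(sigma, 1e-12) * factor  # guard against constant features
    return hs


def kde(data, point, dims, hs):
    """Product-Gaussian kernel density estimate of `point` in subspace `dims`."""
    total = 0.0
    for row in data:
        k = 1.0
        for j in dims:
            u = (point[j] - row[j]) / hs[j]
            k *= math.exp(-0.5 * u * u) / (math.sqrt(2 * math.pi) * hs[j])
        total += k
    return total / len(data)


def density_rank(data, q_idx, dims):
    """Assumed outlyingness score: 1 + number of points with strictly lower density."""
    hs = scott_bandwidths(data, dims)
    dens = [kde(data, p, dims, hs) for p in data]  # O(n^2 * |dims|) per subspace
    return 1 + sum(1 for v in dens if v < dens[q_idx])


def score_and_search(data, q_idx, features, max_dim=2):
    """Exhaustive search: best (rank, subspace); strict '<' keeps smaller subspaces on ties."""
    best = None
    for size in range(1, max_dim + 1):
        for dims in combinations(features, size):
            r = density_rank(data, q_idx, dims)
            if best is None or r < best[0]:
                best = (r, dims)
    return best


def rank_then_search(data, q_idx, top_k=2, max_dim=2):
    """Hypothetical hybrid: score single features, then search only the top_k of them."""
    n_features = len(data[0])
    one_d = sorted((density_rank(data, q_idx, (j,)), j) for j in range(n_features))
    shortlist = sorted(j for _, j in one_d[:top_k])
    return score_and_search(data, q_idx, shortlist, max_dim=min(max_dim, len(shortlist)))


# Toy data: 49 inliers plus a query (last row) that is extreme only in feature 0.
rows = [[(i % 7) * 0.1, i * 0.1, ((3 * i) % 49) * 0.1] for i in range(49)]
rows.append([10.0, 2.4, 2.4])
q = len(rows) - 1

print(score_and_search(rows, q, range(3)))  # -> (1, (0,))
print(rank_then_search(rows, q, top_k=1))   # -> (1, (0,))
```

The exhaustive search scores every subspace of up to `max_dim` features, which grows combinatorially with the number of features and costs O(n^2) density evaluations per subspace; shrinking the candidate set to `top_k` features before searching is where a ranking-first design saves time.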
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Vinh, N.X., Chan, J., Bailey, J., Leckie, C., Ramamohanarao, K., Pei, J. (2015). Scalable Outlying-Inlying Aspects Discovery via Feature Ranking. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_33
DOI: https://doi.org/10.1007/978-3-319-18032-8_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science, Computer Science (R0)