Abstract
This paper describes a classification strategy that can be regarded as a generalization of nearest-neighbor classification. It fuses the concepts of nearest neighbors, linear discriminants, and vantage-point trees, yielding an efficient indexing data structure and classification algorithm. In the learning phase, we define a set of disjoint subspaces of reduced complexity, each separable by a linear discriminant, ending up with an ensemble of simple (weak) classifiers that work locally. At classification time, the centroids closest to the query determine the set of classifiers considered, whose responses are weighted. The algorithm was experimentally validated on datasets widely used in the field, attaining error rates comparable to those of state-of-the-art classification techniques. Finally, the proposed solution has a set of properties attractive for a broad range of applications: 1) it is deterministic; 2) its classification time grows approximately logarithmically with the size of the learning set, making it far more efficient than nearest-neighbor classification in terms of computational cost; and 3) it retains the generalization ability of simple models.
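To make the two-phase procedure concrete, the sketch below illustrates the general idea in Python: a vantage-point-style tree is built by recursively splitting the training set at the median distance to each subset's centroid, and each leaf stores a simple least-squares linear discriminant that acts locally, so queries descend the tree in roughly logarithmic time. All implementation choices here (centroids as vantage points, least-squares discriminants, a single-leaf query rather than the weighted combination of the nearest local classifiers described above) are illustrative assumptions, not the authors' exact algorithm.

import numpy as np

class VPNode:
    def __init__(self, vantage, radius=0.0, inner=None, outer=None, classifier=None):
        self.vantage = vantage          # centroid acting as the vantage point
        self.radius = radius            # median distance separating inner/outer
        self.inner, self.outer = inner, outer
        self.classifier = classifier    # local linear discriminant (leaves only)

def _leaf(X, y):
    # Fit a least-squares linear discriminant on this subset (labels in {-1, +1});
    # a pure leaf degenerates to a constant vote through the bias term.
    vp = X.mean(axis=0)
    if len(np.unique(y)) == 1:
        w = np.zeros(X.shape[1] + 1)
        w[-1] = y[0]
        return VPNode(vp, classifier=w)
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    w, *_ = np.linalg.lstsq(Xb, y.astype(float), rcond=None)
    return VPNode(vp, classifier=w)

def build(X, y, leaf_size=16):
    # Split at the median distance to the centroid; stop on small or
    # degenerate subsets and fit a local classifier instead.
    vp = X.mean(axis=0)
    d = np.linalg.norm(X - vp, axis=1)
    r = np.median(d)
    mask = d <= r
    if len(X) <= leaf_size or mask.all() or not mask.any():
        return _leaf(X, y)
    return VPNode(vp, r,
                  build(X[mask], y[mask], leaf_size),
                  build(X[~mask], y[~mask], leaf_size))

def classify(node, x):
    # Deterministic descent through O(log n) levels, then one inner product.
    while node.classifier is None:
        node = node.inner if np.linalg.norm(x - node.vantage) <= node.radius else node.outer
    return int(np.sign(np.append(x, 1.0) @ node.classifier))

Under these assumptions, a query runs as tree = build(X_train, y_train) followed by classify(tree, x_query), with binary labels encoded in {-1, +1}.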
Cite this article
Proença, H., Neves, J.C. Fusing Vantage Point Trees and Linear Discriminants for Fast Feature Classification. J Classif 34, 85–107 (2017). https://doi.org/10.1007/s00357-017-9223-0