Abstract
The k-nearest neighbour (k-NN) technique is a simple and intuitively appealing method for classification, owing largely to its interpretable nature. However, choosing an appropriate distance function for k-NN can be challenging, and an inferior choice can make the classifier highly vulnerable to noise in the data. In this paper, we propose a new method for determining a good distance function for k-NN. Our method is based on the area under the Receiver Operating Characteristic (ROC) curve, a well-known measure of the quality of binary classifiers. It computes weights for the distance function from the ROC properties of an appropriate neighbourhood around the instances whose distance is being computed. We experimentally compare the effect of our scheme with a number of other well-known k-NN distance metrics, as well as with a range of different classifiers. Experiments show that our method can substantially boost the classification performance of the k-NN algorithm. Furthermore, in a number of cases our technique even delivers better accuracy than state-of-the-art non-k-NN classifiers, such as support vector machines.
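To make the general idea concrete, the following is a minimal sketch of ROC-based feature weighting for k-NN. It is not the paper's exact algorithm: here each feature's weight is derived from its global AUC (via the Mann-Whitney U statistic), whereas the paper computes weights from ROC properties within a neighbourhood of the instances being compared. All function names are illustrative.

```python
def feature_auc(values, labels):
    """AUC of a single feature used as a scorer of the positive class (label 1),
    computed via the Mann-Whitney U statistic with average ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # average 1-based rank for tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos_ranks = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos = len(pos_ranks)
    n_neg = len(labels) - n_pos
    u = sum(pos_ranks) - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def roc_weights(X, y):
    """One weight per feature: |2*AUC - 1|, so an uninformative feature
    (AUC near 0.5) contributes little to the distance."""
    return [abs(2.0 * feature_auc(col, y) - 1.0) for col in zip(*X)]

def knn_predict(X_train, y_train, x, k, w):
    """Weighted-Euclidean k-NN majority vote."""
    def dist(a):
        return sum(wi * (ai - xi) ** 2 for wi, ai, xi in zip(w, a, x)) ** 0.5
    nearest = sorted(zip(X_train, y_train), key=lambda t: dist(t[0]))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)
```

Under this simplification, a feature that cleanly separates the two classes (AUC near 1) dominates the distance, while a noisy feature (AUC near 0.5) is effectively suppressed, which is the intuition behind using ROC properties to build the metric.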
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Hassan, M.R., Hossain, M.M., Bailey, J., Ramamohanarao, K. (2008). Improving k-Nearest Neighbour Classification with Distance Functions Based on Receiver Operating Characteristics. In: Daelemans, W., Goethals, B., Morik, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008. Lecture Notes in Computer Science(), vol 5211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87479-9_50
Print ISBN: 978-3-540-87478-2
Online ISBN: 978-3-540-87479-9