Abstract
In general, support vector machines (SVMs), when applied to text classification, provide excellent precision but poor recall. One means of customizing SVMs to improve recall is to adjust the threshold associated with an SVM. We describe an automatic process for adjusting the thresholds of generic SVMs that incorporates a user utility model, an integral part of an information management system. By using thresholds based on utility models and the ranking properties of classifiers, it is possible to overcome the precision bias of SVMs and ensure robust performance in recall across a wide variety of topics, even when training data are sparse. Evaluations on TREC data show that our proposed threshold-adjusting algorithm boosts the performance of baseline SVMs by at least 20% for standard information retrieval measures.
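The general idea of utility-based threshold adjustment can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes a linear utility of the form gain × (relevant accepted) − cost × (non-relevant accepted), and it assumes that SVM scores and relevance labels are available for a held-out sample. The function name `best_utility_threshold` and the parameters `gain` and `cost` are hypothetical.

```python
def best_utility_threshold(scores, labels, gain=2.0, cost=1.0):
    """Pick a score cutoff that maximizes an assumed linear utility.

    scores: SVM decision values, one per document.
    labels: 1 for relevant, 0 for non-relevant.
    A document counts as accepted when its score >= the returned threshold.
    """
    # Rank documents from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    best_utility = 0.0               # utility of accepting nothing
    best_threshold = float("inf")    # a cutoff that accepts nothing
    utility = 0.0

    for i, (score, label) in enumerate(ranked):
        utility += gain if label else -cost
        next_score = ranked[i + 1][0] if i + 1 < len(ranked) else None
        # Tied scores cannot be separated by a threshold, so only
        # evaluate a cutoff after the last document of a tie group.
        if next_score is not None and next_score == score:
            continue
        if utility > best_utility:
            best_utility = utility
            # Place the cutoff midway to the next lower score, or at
            # the lowest score when every document is accepted.
            if next_score is None:
                best_threshold = score
            else:
                best_threshold = (score + next_score) / 2.0

    return best_threshold, best_utility


# Illustrative data only: two relevant documents fall below the default cutoff of 0.
example_scores = [1.2, 0.4, -0.1, -0.3, -0.9]
example_labels = [1, 0, 1, 1, 0]
threshold, utility = best_utility_threshold(example_scores, example_labels)
print(threshold, utility)
```

In this toy ranking the default cutoff of 0 accepts only the two top-scoring documents, while the utility-maximizing cutoff is lower and also admits the two relevant documents with negative scores. That is the precision-for-recall trade the abstract describes.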
Keywords
- Support Vector Machine
- Linear Support Vector Machine
- Sequential Minimal Optimization
- Threshold Adjustment
- Support Vector Machine Learning
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shanahan, J.G., Roma, N. (2003). Improving SVM Text Classification Performance through Threshold Adjustment. In: Lavrač, N., Gamberger, D., Blockeel, H., Todorovski, L. (eds) Machine Learning: ECML 2003. ECML 2003. Lecture Notes in Computer Science, vol 2837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39857-8_33
DOI: https://doi.org/10.1007/978-3-540-39857-8_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20121-2
Online ISBN: 978-3-540-39857-8
eBook Packages: Springer Book Archive