Multi-classification of Patent Applications with Winnow

  • Conference paper
Perspectives of System Informatics (PSI 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2890)

Abstract

The Winnow family of learning algorithms copes well with large numbers of features and is tolerant of variations in document length, which makes it suitable for classifying large collections of large documents, such as patent applications.
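For readers unfamiliar with Winnow, the following is a minimal sketch of the basic mistake-driven, multiplicative update in the style of Littlestone's original algorithm, assuming sparse binary features. It is not the configuration used in the experiments reported here; the class name `Winnow`, the promotion factor `alpha = 2.0`, the threshold choice `theta = n_features`, and the toy data are all illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


class Winnow:
    """Basic positive Winnow over sparse binary features (illustrative only)."""

    def __init__(self, n_features: int, alpha: float = 2.0) -> None:
        self.alpha = alpha                    # promotion/demotion factor (assumed value)
        self.theta = float(n_features)        # decision threshold (assumed choice)
        self.weights: Dict[int, float] = {}   # features not stored here have weight 1.0

    def score(self, active: Set[int]) -> float:
        # Sum the weights of the features present in the document.
        return sum(self.weights.get(f, 1.0) for f in active)

    def predict(self, active: Set[int]) -> bool:
        return self.score(active) >= self.theta

    def update(self, active: Set[int], label: bool) -> bool:
        """Adjust weights only when the prediction is wrong; return True on a mistake."""
        if self.predict(active) == label:
            return False
        # Missed positive: promote active features. False positive: demote them.
        factor = self.alpha if label else 1.0 / self.alpha
        for f in active:
            self.weights[f] = self.weights.get(f, 1.0) * factor
        return True


def train(model: Winnow, examples: List[Tuple[Set[int], bool]], epochs: int = 10) -> Winnow:
    for _ in range(epochs):
        mistakes = sum(model.update(x, y) for x, y in examples)
        if mistakes == 0:  # stop once a full pass makes no mistakes
            break
    return model


# Hypothetical toy data: a document is positive exactly when feature 0 is present.
examples = [
    ({0, 1}, True), ({1, 2}, False), ({0, 3}, True),
    ({2, 3, 4}, False), ({0}, True), ({5, 6, 7}, False),
]
model = train(Winnow(n_features=8), examples)
print(all(model.predict(x) == y for x, y in examples))
```

Because only the weights of features that occur in a misclassified document are touched, the cost of an update depends on the number of active features rather than on the size of the feature space, which is one reason this family of algorithms scales to very large vocabularies.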

Both the large size of the documents and the large number of available training documents for each class make this classification task qualitatively different from the classification of short documents (newspaper articles or medical abstracts) with few training examples, as exemplified by the TREC evaluations.

This note describes recent experiments with Winnow on two large corpora of patent applications, supplied by the European Patent Office (EPO). It is found that the multi-classification of patent applications is much less accurate than the mono-classification of similar documents. We describe a potential pitfall in multi-classification and show ways to improve the accuracy. We argue that the inherently larger noisiness of multi-class labeling is the reason that multi-classification is harder than mono-classification.
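The distinction between the two tasks can be sketched as follows, under the common reading that mono-classification assigns exactly one class per document while multi-classification assigns every class whose score clears a threshold. The function names, class labels, scores, and threshold values below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def mono_classify(scores: Dict[str, float]) -> str:
    """Mono-classification: return exactly one class, the highest-scoring one."""
    return max(scores, key=lambda c: scores[c])


def multi_classify(scores: Dict[str, float], threshold: float) -> List[str]:
    """Multi-classification: return every class at or above the threshold, best first."""
    accepted = [c for c in scores if scores[c] >= threshold]
    return sorted(accepted, key=lambda c: scores[c], reverse=True)


# Hypothetical per-class scores for a single document.
scores = {"class_a": 1.4, "class_b": 1.1, "class_c": 0.3}

print(mono_classify(scores))                   # always a single label
print(multi_classify(scores, threshold=1.0))   # possibly several labels
print(multi_classify(scores, threshold=2.0))   # possibly none at all
```

In this reading the multi-classification result depends on a threshold as well as on the ranking of the classes, so errors can arise both from the ordering and from the cut-off.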

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Koster, C.H.A., Seutter, M., Beney, J. (2004). Multi-classification of Patent Applications with Winnow. In: Broy, M., Zamulin, A.V. (eds) Perspectives of System Informatics. PSI 2003. Lecture Notes in Computer Science, vol 2890. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39866-0_53

  • DOI: https://doi.org/10.1007/978-3-540-39866-0_53

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20813-6

  • Online ISBN: 978-3-540-39866-0

  • eBook Packages: Springer Book Archive
