Multi-classification of Patent Applications with Winnow

Koster, Cornelis H. A.; Seutter, Marc; Beney, Jean

doi:10.1007/978-3-540-39866-0_53

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2890)

Included in the conference series: International Andrei Ershov Memorial Conference on Perspectives of System Informatics

Abstract

The Winnow family of learning algorithms can cope well with large numbers of features and is tolerant to variations in document length, which makes it suitable for classifying large collections of large documents, like patent applications.

Both the large size of the documents and the large number of available training documents for each class make this classification task qualitatively different from the classification of short documents (newspaper articles or medical abstracts) with few training examples, as exemplified by the TREC evaluations.

This note describes recent experiments with Winnow on two large corpora of patent applications, supplied by the European Patent Office (EPO). It is found that the multi-classification of patent applications is much less accurate than the mono-classification of similar documents. We describe a potential pitfall in multi-classification and show ways to improve the accuracy. We argue that the inherently larger noisiness of multi-class labeling is the reason that multi-classification is harder than mono-classification.
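The distinction between mono- and multi-classification can be made concrete with a small sketch. The code below is not the authors' implementation: it assumes the basic positive Winnow rule from Littlestone (1988), with multiplicative promotion and demotion on mistakes, one binary classifier per class, and a hypothetical toy corpus. The class names, feature names, threshold, and promotion factor `alpha` are all illustrative assumptions. Mono-classification is modelled as picking the single best-scoring class, multi-classification as accepting every class whose score reaches the threshold.

```python
class Winnow:
    """Positive Winnow over sparse binary features (illustrative sketch)."""

    def __init__(self, threshold, alpha=2.0):
        self.threshold = threshold  # decision threshold (assumed value)
        self.alpha = alpha          # promotion/demotion factor (assumed value)
        self.weights = {}           # unseen features implicitly weigh 1.0

    def score(self, features):
        return sum(self.weights.get(f, 1.0) for f in features)

    def predict(self, features):
        return self.score(features) >= self.threshold

    def update(self, features, label):
        # Mistake-driven: weights change only when the prediction is wrong.
        if self.predict(features) == label:
            return
        factor = self.alpha if label else 1.0 / self.alpha
        for f in features:
            self.weights[f] = self.weights.get(f, 1.0) * factor


def train(classifiers, documents, epochs=5):
    # One binary classifier per class; a document is a positive example
    # for every class in its label set and a negative one for the rest.
    for _ in range(epochs):
        for features, labels in documents:
            for cls, clf in classifiers.items():
                clf.update(features, cls in labels)


def mono_classify(classifiers, features):
    # Exactly one class: the one with the highest score.
    return max(classifiers, key=lambda c: classifiers[c].score(features))


def multi_classify(classifiers, features):
    # Zero or more classes: every class whose score reaches its threshold.
    return {c for c, clf in classifiers.items() if clf.predict(features)}


# Hypothetical toy corpus: (set of terms, set of class labels).
documents = [
    ({"engine", "piston"}, {"mech"}),
    ({"circuit", "voltage"}, {"elec"}),
    ({"engine", "circuit"}, {"mech", "elec"}),
]
classifiers = {c: Winnow(threshold=2.0) for c in ("mech", "elec")}
train(classifiers, documents)

print(mono_classify(classifiers, {"circuit", "voltage"}))           # elec
print(sorted(multi_classify(classifiers, {"engine", "circuit"})))   # ['elec', 'mech']
```

In this sketch the multi-label decision depends on a per-class threshold rather than on a comparison between classes, so a wrong or missing label in the training data affects every class it touches. That is one plausible reading of why noisy multi-class labeling would hurt accuracy; it is not a result taken from the paper.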

Author information

Authors and Affiliations

University of Nijmegen, The Netherlands
Cornelis H. A. Koster & Marc Seutter
Dept Informatique, INSA de Lyon, France
Jean Beney

Editor information

Editors and Affiliations

Technische Universität München, Germany
Manfred Broy
A.P. Ershov Institute of Informatics Systems, Siberian Branch of Russian Academy of Sciences, 630090, Novosibirsk, Russia
Alexandre V. Zamulin

Cite this paper

Koster, C.H.A., Seutter, M., Beney, J. (2004). Multi-classification of Patent Applications with Winnow. In: Broy, M., Zamulin, A.V. (eds) Perspectives of System Informatics. PSI 2003. Lecture Notes in Computer Science, vol 2890. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39866-0_53

DOI: https://doi.org/10.1007/978-3-540-39866-0_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20813-6
Online ISBN: 978-3-540-39866-0
eBook Packages: Springer Book Archive
