Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval

Yang, Yiming

doi:10.1007/978-1-4471-2099-5_2

Abstract

Expert Network (ExpNet) is our new approach to automatic categorization and retrieval of natural language texts. We use a training set of texts with expert-assigned categories to construct a network which approximately reflects the conditional probabilities of categories given a text. The input nodes of the network are the words in the training texts, the nodes on the intermediate level are the training texts, and the output nodes are the categories. The links between nodes are computed from statistics of the word distribution and the category distribution over the training set. ExpNet is used for relevance ranking of candidate categories of an arbitrary text in the case of text categorization, and for relevance ranking of documents via categories in the case of text retrieval. We have evaluated ExpNet in categorization and retrieval on a document collection from the MEDLINE database, and observed recall and precision comparable to the Linear Least Squares Fit (LLSF) mapping method and significantly better than the other methods tested. Computationally, ExpNet has O(N log N) time complexity, which is much more efficient than the cubic complexity of the LLSF method. The simplicity of the model, the high recall-precision rates, and the efficient computation together make ExpNet preferable as a practical solution for real-world applications.
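The three-level structure described in the abstract (words, training texts, categories) can be illustrated with a minimal Python sketch. The abstract does not give the link formulas, so the weights here are assumptions chosen for illustration: word-to-text links use length-normalized term frequencies, and text-to-category links use 1 divided by the number of categories assigned to that text. The function names `tokenize`, `build_network`, and `rank_categories` and the toy training texts are hypothetical and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the abstract names none)."""
    return text.lower().split()


def build_network(training_set):
    """Build the two link layers from (text, categories) pairs.

    Word -> text links: length-normalized term frequencies (assumed weighting).
    Text -> category links: 1 / (number of categories on the text), a simple
    stand-in for the conditional probability of a category given the text.
    """
    word_to_docs = defaultdict(dict)  # word -> {doc_id: weight}
    doc_to_cats = {}                  # doc_id -> {category: weight}
    for doc_id, (text, categories) in enumerate(training_set):
        counts = Counter(tokenize(text))
        norm = math.sqrt(sum(c * c for c in counts.values()))
        for word, count in counts.items():
            word_to_docs[word][doc_id] = count / norm
        doc_to_cats[doc_id] = {cat: 1.0 / len(categories) for cat in categories}
    return word_to_docs, doc_to_cats


def rank_categories(text, word_to_docs, doc_to_cats):
    """Rank categories for a new text by propagating scores through the network."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return []
    # Level 1 -> 2: accumulate a similarity score for each training text
    # that shares at least one word with the input text.
    doc_scores = defaultdict(float)
    for word, count in counts.items():
        for doc_id, weight in word_to_docs.get(word, {}).items():
            doc_scores[doc_id] += (count / norm) * weight
    # Level 2 -> 3: pass each text's score on to its expert-assigned categories.
    cat_scores = defaultdict(float)
    for doc_id, similarity in doc_scores.items():
        for cat, weight in doc_to_cats[doc_id].items():
            cat_scores[cat] += similarity * weight
    return sorted(cat_scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy training set (invented for illustration only).
training_set = [
    ("heart attack chest pain", ["cardiology"]),
    ("lung infection cough", ["pulmonology"]),
    ("chest pain cough", ["cardiology", "pulmonology"]),
]
word_to_docs, doc_to_cats = build_network(training_set)
ranked = rank_categories("chest pain", word_to_docs, doc_to_cats)
print(ranked)
```

In this sketch only training texts that share a word with the input receive a score, so the work per query depends on the matching texts rather than on a full matrix computation. This is meant to convey the flavor of the approach, not to reproduce the complexity analysis in the paper.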

Author information

Authors and Affiliations

Section of Medical Information Resources, Mayo Clinic/Foundation, Rochester, Minnesota, 55905, USA
Yiming Yang

Editor information

Editors and Affiliations

Department of Computer Science, University of Massachusetts, Amherst, MA 01003, USA
Bruce W. Croft
Department of Computer Science, University of Glasgow, 8–17 Lilybank Gardens, Glasgow G12 8RZ, Scotland
C. J. van Rijsbergen

About this paper

Cite this paper

Yang, Y. (1994). Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_2

DOI: https://doi.org/10.1007/978-1-4471-2099-5_2
Publisher Name: Springer, London
Print ISBN: 978-3-540-19889-5
Online ISBN: 978-1-4471-2099-5
eBook Packages: Springer Book Archive
