Abstract
We present a method of constructing and using a cascade consisting of a left- and a right-sequential finite-state transducer (FST), T1 and T2, for part-of-speech (POS) disambiguation. Compared to a hidden Markov model (HMM), this FST cascade offers significantly higher processing speed, at the cost of slightly lower accuracy. Applications such as information retrieval, where speed can be more important than accuracy, could benefit from this approach.
In the process of POS tagging, we first assign every word of a sentence a unique ambiguity class c_i that can be looked up in a lexicon encoded by a sequential FST. Every c_i is denoted by a single symbol, e.g. “[ADJ NOUN]”, although it represents the set of alternative tags that a given word can take. The sequence of the c_i of all words of one sentence is the input to our FST cascade (Fig. 1). T1 maps it, from left to right, to a sequence of reduced ambiguity classes r_i. Every r_i is likewise denoted by a single symbol, although it represents a set of alternative tags. Intuitively, T1 eliminates the less likely tags from c_i, thus creating r_i. Finally, T2 maps the sequence of r_i, from right to left, to an output sequence of single POS tags t_i. Intuitively, T2 selects the most likely t_i from every r_i (Fig. 1).
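The two-pass mapping described above can be illustrated with a minimal Python sketch. It assumes each sequential transducer is stored as a table from (state, input symbol) to (next state, output symbol). The state names, the toy transition tables, and the example sentence are invented for illustration and are not taken from the paper; the sketch shows only how the cascade is applied, not how T1 and T2 are constructed.

```python
from typing import Dict, Iterable, List, Tuple

# (state, input symbol) -> (next state, output symbol)
Transitions = Dict[Tuple[str, str], Tuple[str, str]]


def run_sequential(trans: Transitions, start: str, symbols: Iterable[str]) -> List[str]:
    """Apply a deterministic transducer to the symbols in the order given."""
    state = start
    output = []
    for sym in symbols:
        state, emitted = trans[(state, sym)]
        output.append(emitted)
    return output


def run_cascade(t1: Transitions, t2: Transitions, classes: List[str],
                start1: str = "q0", start2: str = "p0") -> List[str]:
    """T1 reads ambiguity classes left to right; T2 reads T1's output right to left."""
    reduced = run_sequential(t1, start1, classes)
    tags_reversed = run_sequential(t2, start2, reversed(reduced))
    return list(reversed(tags_reversed))


# Hypothetical toy tables (assumptions, not the paper's transducers).
# T1 drops VERB from "[ADJ NOUN VERB]" when the left context is a determiner.
T1_TOY: Transitions = {
    ("q0", "[DET]"): ("q_det", "[DET]"),
    ("q_det", "[ADJ NOUN VERB]"): ("q_nom", "[ADJ NOUN]"),
    ("q_nom", "[NOUN VERB]"): ("q_any", "[NOUN VERB]"),
}
# T2 picks one tag per reduced class, using the right context it has already read.
T2_TOY: Transitions = {
    ("p0", "[NOUN VERB]"): ("p_verb", "VERB"),
    ("p_verb", "[ADJ NOUN]"): ("p_noun", "NOUN"),
    ("p_noun", "[DET]"): ("p_det", "DET"),
}

# Ambiguity classes for a three-word sentence such as "the light works".
CLASSES = ["[DET]", "[ADJ NOUN VERB]", "[NOUN VERB]"]

REDUCED = run_sequential(T1_TOY, "q0", CLASSES)
TAGS = run_cascade(T1_TOY, T2_TOY, CLASSES)
print(REDUCED)  # ['[DET]', '[ADJ NOUN]', '[NOUN VERB]']
print(TAGS)     # ['DET', 'NOUN', 'VERB']
```

Each pass visits every symbol once and performs a single table lookup per symbol, which is consistent with the speed advantage the abstract claims for a deterministic cascade.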
Although our approach is related to the concept of bimachines [2] and factorization [1], we proceed differently in that we build two sequential FSTs directly and not by factorization.
References
[1] C.C. Elgot and J.E. Mezei. 1965. On relations defined by generalized finite automata. IBM Journal of Research and Development, pages 47–68, January.
[2] M.P. Schützenberger. 1961. A remark on finite transducers. Information and Control, 4:185–187.
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kempe, A. (2001). Part-of-Speech Tagging with Two Sequential Transducers. In: Yu, S., Păun, A. (eds) Implementation and Application of Automata. CIAA 2000. Lecture Notes in Computer Science, vol 2088. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44674-5_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42491-8
Online ISBN: 978-3-540-44674-3
eBook Packages: Springer Book Archive