6.1 Introduction

Considerable research has been devoted to parsing technology because of the importance of dependency parsing: many natural language applications, such as information retrieval and question-answering systems [10], employ it as a base component, and their performance may rely heavily on the parsing result.

Recent methods of dependency parsing can be divided into two classes: data-driven methods and rule-based methods. Data-driven methods are usually statistical parsers that use machine learning algorithms to capture the statistical features of the data in order to produce the syntactic relations between the words of a sentence.

Most state-of-the-art parsers are statistical parsers whose accuracy relies heavily on the quality and quantity of the treebank [3, 6, 11]. Several Chinese treebanks exist for syntactic parsing, such as the Penn Chinese Treebank (CTB) [8] and the Chinese Dependency Treebank (CDT). The CTB carries constituency annotation and was built from Xinhua newswire, Hong Kong news, Sinorama, and ACE broadcast news, while the CDT is a dependency treebank built from People’s Daily newswire stories. The occurrence frequency of a syntactic substructure (subtree) may vary from one treebank to another: some substructures occur frequently while others do not. The rare syntactic substructures of some sentences are well formed for a human reader, but they tend to be abandoned when common syntactic substructures conflict with them, preventing the parser from handling rare syntactic structures correctly. This partly degrades the label attachment recall rate.

To examine this assumption, we segment the treebank into k parts in one round, train a transition-based parser (subparser) on each part, and then compare the result set generated by the subparsers with the k-best results generated by a parser trained on the full treebank. We find that the label attachment recall rate of the k-segment result set is higher than that of the corresponding k-best list, which confirms the assumption. Building on this phenomenon, we further employ a parser to post-reparse the subparsers’ output. Since transition-based and graph-based parsers have different training and inference algorithms [5, 7] and therefore behave differently, we construct the post-reparser with a constrained Eisner’s algorithm [2, 4] to find maximum spanning trees (MST). The experiments show that the pipeline can improve parsing accuracy while also computing a parsing reliability score.

6.2 Parsing Pipeline

The pipeline in this paper addresses the general structured prediction problem, which maps an input sentence \( x \in {\mathbf{X}} \) to an output dependency structure \( y \in {\mathbf{Y}} \) composed of edges e,

$$ e = \begin{cases} \left\langle i, j, \leftarrow, l \right\rangle \\ \left\langle i, j, \rightarrow, l \right\rangle \end{cases} $$
(6.1)

where i and j are the endpoints of the relation and l is the dependency label from the label set L. We employ a transition-based parser with beam search [9] as the subparser and an MST parser with a conditional random field (CRF) as the post-reparser. In the CRF model, the probability of the output y is

$$ p\left( y|x \right) = \exp\left( f\left( y,x \right) \cdot \boldsymbol{\lambda} \right) / Z\left( x \right) $$
(6.2)

where \( f\left( y,x \right) \) maps y and x to a feature vector, λ is the corresponding weight vector, and \( Z\left( x \right) \) is the normalization factor. For a sentence x, the parsing result y is obtained by finding the result with the highest probability among all possible results,

$$ O\left( x \right) = \arg\max\nolimits_{y \in {\mathbf{PSET}}\left( x \right)} p\left( y|x \right) $$
(6.3)

where \( {\mathbf{PSET}}\left( x \right) \) denotes the set of possible results for the sentence x.
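Equations (6.2) and (6.3) can be sketched in a few lines, assuming feature vectors and weights are plain lists and `candidates` stands in for \( {\mathbf{PSET}}\left( x \right) \); all names here are ours, not the authors’:

```python
def crf_score(feat_vec, weights):
    """Unnormalized log score f(y, x) . lambda (the exponent in Eq. 6.2)."""
    return sum(f * w for f, w in zip(feat_vec, weights))

def best_parse(candidates, feature_fn, weights):
    """Eq. 6.3: pick the candidate y with the highest p(y|x).

    Z(x) is identical for every candidate of the same sentence, so
    comparing the unnormalized scores is sufficient.
    """
    return max(candidates, key=lambda y: crf_score(feature_fn(y), weights))
```

Because the partition function \( Z\left( x \right) \) cancels in the argmax, the decoder never needs to compute it.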

The pipeline is composed of a training procedure and a parsing procedure. The training procedure mainly creates a set of subparsers, each a transition-based parser. In the parsing procedure, we first use this set of subparsers to obtain a result set and then reparse the set to compute the best output.

6.2.1 Training

The training procedure is as follows:

  • Segmenting the treebank into equally sized sub-treebanks \( {\varvec{\Omega}} = \left\{ {b_{i} } \right\}_{i = 1, \ldots ,N} \).

  • Training subparser \( t_{i} \) with sub-treebank \( b_{i} \in\Omega \) to build a set of subparsers \( {\varvec{\Gamma}}_{N} = \left\{ {t_{i} } \right\}_{i = 1, \cdots ,N} \).

  • Training the MST parser T with the whole treebank to calculate the weight vector \( {\varvec{\uplambda}}_{T} \) for features.

Through this procedure, we obtain a trained model \( \left\{ {N,{\varvec{\Gamma}}_{N} ,{\varvec{\uplambda}}_{T} ,f_{T} } \right\} \), where N is the number of subparsers trained on the segmented treebanks and \( f_{T} \) is the feature extraction function of the MST parser, built from feature templates.
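The training steps above can be sketched as follows; `train_subparser` and `train_mst` are placeholders for the transition-based and MST training routines, and the random shuffle before splitting is our own assumption about how the segments are formed:

```python
import random

def segment_treebank(treebank, n, seed=0):
    """Split a treebank into n equally sized sub-treebanks (Omega)."""
    trees = list(treebank)
    random.Random(seed).shuffle(trees)
    size = len(trees) // n
    return [trees[i * size:(i + 1) * size] for i in range(n)]

def train_pipeline(treebank, n, train_subparser, train_mst):
    """Build the model {N, Gamma_N, lambda_T, f_T} of Sect. 6.2.1."""
    omega = segment_treebank(treebank, n)
    gamma = [train_subparser(b) for b in omega]   # Gamma_N, one per segment
    lambda_t, f_t = train_mst(treebank)           # MST weights and feature function
    return {"N": n, "subparsers": gamma, "lambda_T": lambda_t, "f_T": f_t}
```

Note that only the subparsers see segmented data; the MST post-reparser is trained on the whole treebank.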

6.2.2 Post-reparsing

In the parsing step, a result set \( {\mathbf{R}} = \left\{ {r_{i} } \right\}_{i = 1, \ldots ,N} \), which differs from an N-best list, is generated for an input sentence x by the subparsers in \( {\varvec{\Gamma}}_{N} \). We use it to constrain the search space for the sentence (far smaller than the full search space) and then employ the constrained Eisner’s algorithm to extract the best result. The constraint scores are obtained as follows:

$$ s_{c}\left( e, {\mathbf{R}} \right) = \begin{cases} f_{T}\left( e \right) \cdot {\varvec{\uplambda}}_{T} & \text{if } e \in r_{i},\; i = 1, \ldots, N \\ 0 & \text{else} \end{cases} $$
(6.4)

where \( f_{T} \left( e \right) \) maps edge e to a feature vector. The mixture score is as follows:

$$ s_{\text{mixc}} \left( {e,{\mathbf{R}},\alpha } \right) = \alpha \cdot f_{T} \left( e \right) \cdot {\varvec{\uplambda}}_{T} + \left( {1 - \alpha } \right) \cdot s_{c} \left( {e,{\mathbf{R}}} \right) $$
(6.5)

where α is a mixture factor that controls the strength of the constraint from the result set. Given a sentence \( S = w_{0} w_{1} \cdots w_{N} \) and the corresponding R, the post-reparsing procedure is as follows (Table 6.1).
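Equations (6.4) and (6.5) can be sketched as below, assuming edges are hashable tuples \( \left\langle i, j, \text{dir}, l \right\rangle \) and `result_edges` collects every edge appearing in some subparser result \( r_{i} \) (the function names are ours):

```python
def constraint_score(edge, result_edges, f_T, lambda_T):
    """Eq. 6.4: the MST edge score if some subparser proposed the edge, else 0."""
    if edge in result_edges:
        return sum(f * w for f, w in zip(f_T(edge), lambda_T))
    return 0.0

def mixture_score(edge, result_edges, alpha, f_T, lambda_T):
    """Eq. 6.5: interpolate the plain MST score with the constrained score.

    alpha = 1 ignores the subparsers entirely; alpha = 0 scores only
    edges that at least one subparser proposed.
    """
    plain = sum(f * w for f, w in zip(f_T(edge), lambda_T))
    return alpha * plain + (1 - alpha) * constraint_score(edge, result_edges, f_T, lambda_T)
```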

Table 6.1 Pseudo-code for constrained Eisner’s algorithm
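The pseudo-code of Table 6.1 is not reproduced here. As a substitute, the following is a minimal first-order projective Eisner decoder over a precomputed score matrix (our own sketch, not the authors’ exact procedure); in the pipeline, `score[h][m]` would be the mixture score \( s_{\text{mixc}} \) of the edge from head h to modifier m:

```python
NEG = float("-inf")

def eisner(score):
    """First-order projective Eisner decoder.

    score[h][m] is the score of attaching modifier m to head h;
    token 0 is the artificial root.  Returns heads[0..n-1], where
    heads[0] is unused.
    """
    n = len(score)
    # [s][t][d]: d = 1 head on the left (s), d = 0 head on the right (t)
    comp = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    inc = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    comp_bp = [[[0, 0] for _ in range(n)] for _ in range(n)]
    inc_bp = [[[0, 0] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        comp[i][i][0] = comp[i][i][1] = 0.0
    for k in range(1, n):
        for s in range(n - k):
            t = s + k
            # incomplete spans: add the arc t -> s (d = 0) or s -> t (d = 1)
            for r in range(s, t):
                base = comp[s][r][1] + comp[r + 1][t][0]
                if base + score[t][s] > inc[s][t][0]:
                    inc[s][t][0], inc_bp[s][t][0] = base + score[t][s], r
                if base + score[s][t] > inc[s][t][1]:
                    inc[s][t][1], inc_bp[s][t][1] = base + score[s][t], r
            # complete span, head on the right
            for r in range(s, t):
                v = comp[s][r][0] + inc[r][t][0]
                if v > comp[s][t][0]:
                    comp[s][t][0], comp_bp[s][t][0] = v, r
            # complete span, head on the left
            for r in range(s + 1, t + 1):
                v = inc[s][r][1] + comp[r][t][1]
                if v > comp[s][t][1]:
                    comp[s][t][1], comp_bp[s][t][1] = v, r

    heads = [0] * n

    def back(s, t, d, complete):
        if s == t:
            return
        if complete:
            r = comp_bp[s][t][d]
            if d == 1:
                back(s, r, 1, False); back(r, t, 1, True)
            else:
                back(s, r, 0, True); back(r, t, 0, False)
        else:
            if d == 1:
                heads[t] = s
            else:
                heads[s] = t
            r = inc_bp[s][t][d]
            back(s, r, 1, True); back(r + 1, t, 0, True)

    back(0, n - 1, 1, True)
    return heads
```

Restricting or reweighting the entries of `score` with \( s_{c} \) or \( s_{\text{mixc}} \) yields the constrained variant: edges proposed by no subparser simply receive no constraint credit and are rarely chosen.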

6.2.3 Reliability Score of Dependency Relation

With the pipeline, we obtain a set of subparsers \( {\varvec{\Gamma}}_{N} \), in which each subparser is trained on a different part of the training corpus. Because each part of the corpus can be seen as unseen data with respect to the other parts, we can assume that each subparser’s parsing result for a sentence is supported by the corresponding part of the training corpus. Thus, a reliability score can be calculated by a weighted voting scheme [1] as follows:

$$ c\left( e_{i} \right) = \sum\limits_{\varepsilon \in {\mathbf{E}}_{i},\, \varepsilon = e_{i}} \exp\left( \xi \cdot f_{T}\left( \varepsilon \right) \cdot {\varvec{\uplambda}}_{T} \right) \Big/ \sum\limits_{\varepsilon \in {\mathbf{E}}_{i}} \exp\left( \xi \cdot f_{T}\left( \varepsilon \right) \cdot {\varvec{\uplambda}}_{T} \right) $$
(6.6)

where \( {\mathbf{E}}_{i} \) is the set of dependency relations \( \left\langle {i, \cdot , \cdot , \cdot } \right\rangle \), i.e., the ith relation of each dependency structure in the set R, and ξ is an adjusting factor; when \( \xi = 0 \), \( c\left( {e_{i} } \right) \) is a plain voting score for edge \( e_{i} \).
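Equation (6.6) amounts to a softmax-weighted vote among the candidate edges that the subparsers proposed at position i. A sketch under that reading (we assume the denominator ranges over all of \( {\mathbf{E}}_{i} \); the names are ours):

```python
import math

def reliability(edges_at_i, target_edge, f_T, lambda_T, xi=0.0):
    """Eq. 6.6: weighted-vote reliability of target_edge among the
    candidates E_i proposed by the N subparsers at position i.

    With xi = 0 every weight is exp(0) = 1, so the score reduces to
    the plain vote ratio for target_edge.
    """
    def weight(e):
        return math.exp(xi * sum(f * v for f, v in zip(f_T(e), lambda_T)))
    num = sum(weight(e) for e in edges_at_i if e == target_edge)
    den = sum(weight(e) for e in edges_at_i)
    return num / den
```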

6.3 Baseline and Experiments

This section presents the pipeline experiments on segmentation and post-reparsing. Because of the heavy computational cost of CRF training, we evaluate the pipeline only on the Penn Chinese Treebank, without loss of generality. We split the sentences of Chinese Treebank 6.0 into training, development, and test sets as shown in Table 6.2 and then apply head-finding rules to convert them into dependency structures.

Table 6.2 The training, development, and test data for CTB6

The baseline parsers are the ZPar dependency parser, MSTParser, and crfParser, which are open-source projects that have achieved state-of-the-art accuracy. They are trained on the training data in Table 6.2 with their respective default feature templates. The test results are as follows (Table 6.3).

Table 6.3 The test results of the baseline parsers

where MSTParser1 and crfParser are first-order graph-based parsers, MSTParser is a second-order parser, and ZPar is a transition-based dependency parser.

To explore the phenomenon induced by corpus segmentation, we segment the training data into different numbers of parts, namely \( {\varvec{\Theta}} = [2,3,4,6,12] \), and then build the set of subparsers \( {\varvec{\Gamma}}_{K} \) for each \( k \in\Theta \).

6.3.1 Attachment Recall Rate

The labeled/unlabeled attachment recall rate (LAR/UAR) is the ratio of correctly labeled/unlabeled attachments among the dependency structures of a result set,

$$ \begin{cases} {\text{LAR}}\left( \Pi \right) = \sum\limits_{\left( {\mathbf{R}}_{K}, r_{c} \right) \in \Pi} \delta_{L}\left( {\mathbf{R}}_{K}, r_{c} \right) \Big/ \sum\limits_{\left( {\mathbf{R}}_{K}, r_{c} \right) \in \Pi} \left| r_{c} \right| \\ {\text{UAR}}\left( \Pi \right) = \sum\limits_{\left( {\mathbf{R}}_{K}, r_{c} \right) \in \Pi} \delta_{U}\left( {\mathbf{R}}_{K}, r_{c} \right) \Big/ \sum\limits_{\left( {\mathbf{R}}_{K}, r_{c} \right) \in \Pi} \left| r_{c} \right| \end{cases} $$

where \( {\mathbf{R}}_{K} \) and \( r_{c} \) are the set of parsing results and the gold dependency structure for a sentence, \( {\varvec{\Pi}} \), generated from the test data, is a set of pairs \( \left( {{\mathbf{R}}_{K} ,r_{c} } \right) \), \( \left| {r_{c} } \right| \) is the number of relations in the dependency structure \( r_{c} \), the function \( \updelta_{L} ({\mathbf{R}}_{K} ,r_{c} ) \) counts the labeled dependency relations in \( r_{c} \) that also appear in the result set \( {\mathbf{R}}_{K} \), and \( \delta_{U} \) counts likewise without labels.
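The LAR/UAR definition can be sketched as follows, assuming a gold tree is a list of (head, dependent, label) arcs and a result set is a list of such trees; an arc counts as recalled if at least one tree in the result set contains it (our reading of \( \delta_{L} \) and \( \delta_{U} \)):

```python
def attachment_recall(pairs, labeled=True):
    """LAR (labeled=True) or UAR (labeled=False) over a set Pi of
    (result_set, gold_tree) pairs, as in the equation above."""
    hit = total = 0
    for result_set, gold in pairs:
        for head, dep, label in gold:
            total += 1                          # contributes to sum |r_c|
            for tree in result_set:
                arcs = tree if labeled else [(h, d) for h, d, _ in tree]
                probe = (head, dep, label) if labeled else (head, dep)
                if probe in arcs:               # recalled by some subparser
                    hit += 1
                    break
    return hit / total
```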

We calculate the LAR and UAR for the k-segment result set \( {\mathbf{R}}_{K} \) and for the baseline parser’s k-best results. The relationship between the number of parts and the attachment recall rate is as follows.

With data segmentation, we achieve a higher LAR and UAR than with the k-best parsing results; this means that the result set \( {\mathbf{R}}_{K} \) covers more correct dependency relations than the k-best set. This benefits postprocessing steps in the pipeline, such as reparsing and reranking, which operate on a small search space.

6.3.2 Post-reparsing

In the post-reparsing stage, we analyze the result set generated by the set of subparsers \( {\varvec{\Gamma}}_{N} \), or the N-best results produced by the baseline parser, to obtain the final parsing result; the accuracies are as follows.

For each \( k \in {\varvec{\Theta}} \), we search for the best \( \alpha \in \left[ {0,1} \right] \) when calculating the LAS/UAS of each k-segment or k-best result. The highest LAS of 2-segmentation is 83.2112 %, which is 0.7751 % higher than ZPar among the baseline parsers, while the 2-best LAS is 83.1275 %. From Fig. 6.2, we can see that the LAS of both the k-segment and k-best results is lower than ZPar’s when k > 2, and that the k-segment LAS is lower than the k-best LAS. The reason may be that the postparser, a first-order minimum-spanning-tree parser with local features, is not powerful enough to exploit the higher label attachment recall rate; that is also why only the 2-segment LAS exceeds the 2-best LAS. From Fig. 6.1, we find that the k-segment LAR ascends faster than the k-best LAR, while in Fig. 6.2 the k-best LAS descends more slowly than the k-segment LAS for k > 3; this again shows that the postparser needs a finer design. In addition, we employ the reliability score \( c\left( {e_{i} } \right) \) to rerank the result set \( {\mathbf{R}}_{K} \); the result is as follows:

Fig. 6.1

The labeled/unlabeled attachment recall rate (LAR/UAR) versus the number of parts

Fig. 6.2

The LAS/UAS of reparsing for k-segment’s and k-best result set

From Fig. 6.3, we find that using the reliability score \( c\left( {\varvec{e}_{i} } \right) \) to select dependency relations directly is feasible: the resulting LAS/UAS is higher than that of every element of the k-segment result set, although the output may not be a tree and needs further processing. The LAS/UAS of the reranked result is lower than the ZPar baseline, which may likewise be due to the weakness of the postparser.

Fig. 6.3

The LAS/UAS of reranking for the k-segment result set. The dotted line is the maximum LAS/UAS over the elements of the k-segment set

6.4 Conclusions and Future Work

We have built a hybrid parsing pipeline that employs transition-based dependency parsers as subparsers in the front end and a graph-based dependency parser in the next stage, and we have investigated the influence of different k-segment datasets on the pipeline. From the experiments, we found that segmenting the training data largely improves the labeled/unlabeled attachment recall rate at the cost of some final LAS/UAS, and that the result set generated by the subparsers, with its high attachment recall rate, can be used to calculate a reliability score, such as the simple voting scheme used in this paper. Moreover, the hybrid pipeline improves the final LAS/UAS when using 2-segmentation, but it cannot further improve accuracy because of the weak postparser. In future work, we will try a more sophisticated postparser to explore the search space constructed by the subparsers more effectively.