The Construction of a Chinese Semantic Dependency Graph Bank

Shao, Yanqiu; Che, Wanxiang; Liu, Ting; Ding, Yu

doi:10.1007/978-3-031-38913-9_13

Part of the book series: Text, Speech and Language Technology (TLTB, volume 49)

Abstract

Semantic dependency parsing is a deep semantic analysis task that relies on large-scale, canonically annotated corpora. This chapter presents a new Chinese semantic dependency scheme grounded in linguistic knowledge of Chinese. Chinese is a paratactic (meaning-combined) language with flexible syntactic structures and complex modifying relations among words. We therefore used dependency graphs instead of dependency trees as target representations, which allows a node to have more than one incoming arc and allows dependency arcs to cross. Using this scheme, we annotated the dependency structures of 30,161 sentences containing 570,403 words. The chapter describes the scheme in detail, including its specifications and the process of creating the corpus. Measured with Fleiss’ kappa, inter-annotator agreement was 0.835 with unlabeled arcs as assignments and 0.686 with labeled arcs as assignments. The chapter also provides statistics on the annotated corpus.

1 Introduction

Sentence analysis based on dependency grammar has recently become an active topic in natural language processing. This task has been extensively studied and has proven to be useful in several applications, including question answering (Cui et al. 2005; Punyakanok et al. 2004), semantic structure extraction (Johansson and Nugues 2007), and semantic role labeling (Hacioglu 2004; Pradhan et al. 2005).

Much work has focused on constructing dependency parsers. So far, all the dependency parsing technologies have been data driven, and large-scale corpora have been annotated to train automatic dependency parsers. The Prague Dependency Treebank (Böhmová et al. 2003), the first dependency structure annotation work, has been influential. Dependency treebanks have been built for at least 30 languages, on a large or small scale, either by hand or by algorithms that automatically convert available phrase structure treebanks to dependency structure notations (Marimon and Bel 2014); examples include Chatterji et al. (2014), Haverinen et al. (2014), and De Marneffe and Manning (2008). Liu et al. (2006) created a Chinese syntactic dependency treebank (CDT) consisting of 60,000 sentences from the People’s Daily in the 1990s. Several studies on Chinese dependency parsing have used this corpus, such as Niu et al. (2009) and Li et al. (2012). Most studies on dependency analysis have been syntax-oriented. Semantic dependencies were seldom studied until the shared tasks at SemEval-2012 (Che et al. 2012) and SemEval-2014 (Oepen et al. 2014), where semantic dependencies annotated in Chinese and English were provided for participants to build dependency parsing systems.

Unlike English, Chinese is an ideographic language belonging to the Sino-Tibetan family (Lu 2001). It is a paratactic language that organizes sentences through logical connections among lexical meanings and the semantics of sub-sentences, so it has no formal markers or fixed syntactic structures to rely on. Because rich latent information is hidden behind the surface words, the semantic analysis of Chinese requires specialized treatment. By contrast, English is a hypotactic language that organizes sentences through formal linguistic devices, wherein grammar prioritizes syntax and can even be detached from semantics.

Semantic dependency parsing aims to determine all the word pairs that stand in a semantic relation and to connect each such pair with a dependency arc whose label indicates the relation. Semantic dependency has similarities with and differences from syntactic dependency. Both are based on dependency grammar (Robinson 1970) and annotate each word in a sentence. Syntactic dependency gives a transparent encoding of the predicate-argument structure, while semantic dependency explicitly displays the semantics hidden behind predicate-argument structures.

The number of semantic dependency labels is more than five times the number of syntactic dependency labels (see Note 1), which allows them to express different information about sentences. Syntactic dependency analyzes syntactic functions from the perspective of the grammar system (e.g., subject, predicate, and object), and dependency tree structures are sufficient for this task. By contrast, semantic dependency involves semantic relations (e.g., agent, patient, and experiencer) between pairs of words. As the above analysis of Chinese suggests, semantic relations between word pairs do not always form tree structures, and graphs describe semantics better than trees. This observation coincides with the meaning-text theory (MTT), a theoretical framework for the description of natural languages (Žolkovskij and Mel’čuk 1967). MTT holds that trees are in some cases insufficient to express the complete meaning of a sentence, which our corpus annotation practice has confirmed.

In terms of the word pairs connected by dependency arcs, semantic dependency seeks to depict the relations among content words, whereas syntactic dependency mostly relies on function words (e.g., coordinating conjunctions and prepositions). Figure 13.1 presents an example of this difference. In the prepositional phrase 在教室 zai jiaoshi “at the classroom,” the preposition 在 zai “at” is the head word in (a), whereas the head word in (b) is the content word 教室 jiaoshi “classroom.”

An illustration of 2 dependencies in the Chinese phrase, zai jiaoshi kan shu, translated as, at classroom read book and read book at classroom. a. Syntactic dependency. First to second, third to fourth, and third to first. b. Semantic dependency. Second to first, third to first, and third to fourth. — **Fig. 13.1**

The rest of this chapter is organized as follows. Section 13.2 will describe the details of our dependency scheme, while Sect. 13.3 will introduce the origin of our corpus and the design of our annotation tool. Section 13.4 will then evaluate the inter-annotator agreement of our annotated corpus and describe the assessment method. Section 13.5 will present some statistics of our annotated corpus, followed by the conclusion in Sect. 13.6.

2 Annotation Scheme of the Semantic Dependency Graph

Dependency tree structures are traditionally prerequisites for syntactic dependency analysis. However, dependency trees are not well suited to meaning representation, because some dependency arcs must be distorted or omitted to preserve a legal tree structure. In real large-scale corpora, and in keeping with the paratactic character of Chinese, a word may be the argument of more than one predicate, resulting in multiple incoming arcs. Therefore, we extended dependency tree structures to graphs.

2.1 Graph Structure of Semantic Dependency

Semantic dependency graphs (SDGs) are directed acyclic graphs. Nodes represent words, while labeled edges represent semantic relations between words. Exactly one node has no head, and it is the root of the entire graph. Graphs overcome the limitations of dependency trees by allowing certain nodes to have more than one head and by allowing arcs to cross. In Fig. 13.2, the node 杯子 beizi “cup” has semantic relations with both 打 da “break” and 破 po “damaged,” so 杯子 beizi “cup” has two heads; in addition, the arc connecting 杯子 beizi “cup” and 破 po “damaged” crosses the arc connecting 他 ta “he” and 打 da “break.”

An illustration of the S D G scheme for a sample Chinese sentence, ta ba beizi da po le, translated as he broke the cup. It is annotated as He ba cup break damaged le. The semantic units marked include root, function words, and argument. — **Fig. 13.2**

A dependency structure in traditional dependency grammar must be single-headed, connected, acyclic, and projective. Dependency graphs drop the single-headed and projective constraints and keep only the connected and acyclic ones, so they can be regarded as an extension of traditional dependency structures.
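The two constraints that SDGs retain can be checked mechanically. The following is a minimal sketch, not the authors' tooling: it encodes the Fig. 13.2 sentence as (head, dependent, label) triples with an artificial root node at index 0, verifies that the graph is connected and acyclic, and lists the nodes with more than one head. Only the arcs 打→他, 打→杯子, and 破→杯子 are described in the text; the remaining arcs and all label names here are assumptions made for illustration.

```python
from collections import defaultdict

# Tokens of the Fig. 13.2 sentence, indexed from 1; index 0 is an artificial
# root node introduced only for this sketch.
tokens = ["<root>", "他", "把", "杯子", "打", "破", "了"]

# (head, dependent, label) triples. Label names and the arcs for the
# function words are assumed, not taken from the chapter.
arcs = [
    (0, 4, "Root"),
    (4, 1, "Agt"),
    (4, 3, "Pat"),
    (5, 3, "Exp"),
    (4, 5, "eResu"),
    (3, 2, "mPrep"),
    (5, 6, "mAux"),
]


def heads_of(arcs):
    """Map each dependent to the list of its heads."""
    heads = defaultdict(list)
    for head, dep, _ in arcs:
        heads[dep].append(head)
    return heads


def is_acyclic(arcs, n_nodes):
    """Depth-first search that fails when it re-enters a node on the stack."""
    children = defaultdict(list)
    for head, dep, _ in arcs:
        children[head].append(dep)
    state = [0] * n_nodes  # 0 = unvisited, 1 = on stack, 2 = finished

    def visit(node):
        if state[node] == 1:
            return False
        if state[node] == 2:
            return True
        state[node] = 1
        ok = all(visit(child) for child in children[node])
        state[node] = 2
        return ok

    return all(visit(node) for node in range(n_nodes))


def is_connected(arcs, n_nodes):
    """Weak connectivity: every node is reachable from node 0 when arc
    direction is ignored."""
    neighbours = defaultdict(set)
    for head, dep, _ in arcs:
        neighbours[head].add(dep)
        neighbours[dep].add(head)
    seen, stack = {0}, [0]
    while stack:
        node = stack.pop()
        for nb in neighbours[node]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == n_nodes


multi_headed = {
    tokens[dep]: [tokens[h] for h in hs]
    for dep, hs in heads_of(arcs).items()
    if len(hs) > 1
}
print(is_connected(arcs, len(tokens)), is_acyclic(arcs, len(tokens)))
print(multi_headed)
```

Under these assumptions the graph passes both checks even though 杯子 has two heads, which is exactly the configuration a dependency tree would have to reject.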

2.2 Semantic Relation Set

Lu (2001) explained the parataxis network of Chinese grammar. We applied this semantic unit classification and semantic combination, as well as integrated the semantic characteristics, to construct a clear semantic relation scheme. At the same time, we also considered some of the semantic relation tags in HowNet (Dong and Dong 2006).

Semantic units are divided from high to low into event chains, events, arguments, concepts, and marks. Arguments refer to noun phrases related to certain predicates. Concepts are simple elements in basic human thought or content words in syntax. Marks represent the meaning attached to the entity information conveyed by speakers (e.g., speakers’ tones or moods). These semantic units correspond to compound sentences, simple sentences, chunks, content words, and function words. The meanings of sentences are expressed by event chains, which consist of multiple simple sentences. The meanings of simple sentences are expressed by arguments, while arguments are reflected by predicate, referential, or defining concepts. Marks are attached to concepts.

The meaning of a sentence consists of the meanings of the semantic units and their combinations, including semantic relations and attachments. Semantic attachments are marks on semantic units, such as prepositions, mood words, and punctuation; they are listed in Table 13.1 as “semantic marks.” Semantic relations are classified into symmetric and asymmetric types. Symmetric relations include coordination, selection, and equivalence relations, while asymmetric relations include the following:

1. Cooperative relations occur between core and non-core roles. For example, in 工人修理管道 gongren_xiuli_guandao “workers repair the pipeline,” 管道 guandao “pipeline” serves as a non-core role and is the patient of 修理 xiuli “repair,” which is a verb that serves as a core role. Relations between predicates and nouns belong to cooperative relations. Semantic roles usually refer to cooperative relations. Table 13.1 presents the 32 semantic roles we defined, divided into 8 small categories.
2. Additional relations refer to the modifying relations among concepts within an argument, in which all semantic roles are available; for example, in 地下的管道 dixia_de_guandao “underground pipeline,” 地下 dixia “underground” is the modifier of 管道 guandao “pipeline,” which refers to a location relation.
3. Connectional relations are bridging relations between two events that are neither symmetric nor nested relations. For example, for the sentence “如果天气好, 我会去颐和园 ruguo_tianqi_hao, wo_hui_qu_yiheyuan ‘If the weather is good, I will go to the Summer Palace’,” the former event is the hypothesis of the latter. Fifteen event relations were defined by our scheme.

Table 13.1 Label set of semantic relations

We analyzed how the elements of each sentence constitute the entire meaning of the sentence and used the results as the theoretical basis in designing the SDG corpus. Table 13.1 shows the entire semantic relations set, which includes five types of semantic relations, i.e., semantic roles, reverse relations, nested relations, event relations, and semantic marks.

2.3 Special Situations

1. Reverse relations. When a verb modifies a noun, a reverse relation is applied with the label r-XX (where XX is a single-level semantic relation). A reverse relation arises when a word pair with the same semantic relation appears in different sentences with different modifying orders; the reverse label distinguishes the two orders, whose arcs point in opposite directions. For example, in Fig. 13.3a the relation between the head word 男孩 nanhai “boy” and its modifier 打 da “play” is r-agent, whereas in Fig. 13.3b the relation between the kernel word 打 da “play” and its dependent 男孩 nanhai “boy” is agent. The semantic tri-tuple for this word pair is (男孩 nanhai “boy,” 打 da “play,” r-agent) in Fig. 13.3a and (打 da “play,” 男孩 nanhai “boy,” agent) in Fig. 13.3b. Here, the first element of the tri-tuple is the head word, the second is the modifier or dependent word, and the last is the semantic role.
2. Nested events. Two events have a nested relation when one event serves as a grammatical constituent of the other, so that the two events belong to different semantic levels. For example, in the sentence in Fig. 13.4, the event 小孙女在玩计算机 xiao_sunnv_zai_wan_jisuanji “the little granddaughter is playing on the computer” is the content of the action 看见 kanjian “see.” The prefix “d” is added to a single-level semantic relation as a “distinctive” label. The tri-tuple for this sentence is (看见 kanjian “see,” 玩 wan “play,” d-content).
3. Quantitative phrases. Chinese quantifiers such as 个 ge, 本 ben, and 只 zhi have no counterparts in English. Here, a “quantitative word” refers to the combination of one numeral and one quantifier, such as 十个 shi_ge “ten,” and a “quantitative phrase” refers to the combination of a quantitative word and a noun, such as 十个人 shi_ge_ren “ten persons.” Because the numeral can sometimes be omitted, as in 这本书 zhe_ben_shu “this book,” our scheme labels the quantifier as the head word of the quantitative word and the numeral as its dependent, and labels the semantic relation between them “Quan” (quantity), a measurement role. When a quantitative word modifies a noun, the noun is the head word of the whole quantitative phrase and the quantifier is its dependent; the semantic relation between the noun and the quantitative word is labeled “Qp” (quantity phrase). For example, for the quantitative phrase 五本书 wu_ben_shu “five books,” the semantic tri-tuples are (本 ben “ben,” 五 wu “five,” Quan) and (书 shu “book,” 本 ben “ben,” Qp).
4. Serial verb sentences. When several verbs occur in one sentence with neither pause punctuation nor a conjunction between them, the sentence is called a serial verb sentence or compressed sentence; it in fact contains two or more events. In most cases, the first verb is selected as the head word; in rare cases, such as manner serial verb sentences, the head word is the last verb. According to the relations between the verbs, the semantic relations of serial verb sentences are classified as succession, purpose, manner, result, and so on. For instance, the head word of the Chinese sentence “他穿衣服走了。 ta_chuan_yifu_zou_le ‘He put on his clothes and left’.” is the first verb 穿 chuan “wear,” and the relation between the two events is labeled “eSucc” (successor event). The tri-tuple of the two verbs in this sentence is (穿 chuan “wear,” 走 zou “leave,” eSucc). The subject 他 ta “he” has two parent nodes: the verb 穿 chuan “wear” and the verb 走 zou “leave.”
5. “De” structures with the omission of the head word. The Chinese word 的 de “De” is normally used as an auxiliary word, and it is often taken as a dependency mark. However, the head word of the De structure is sometimes omitted. In this situation, 的 de “De” is labeled as the head word in our scheme. For example, in the Chinese sentence “卖菜的走了。 mai_cai_de_zou_le ‘The man who sold vegetables left’.”, the head word 人 ren “person” of the De structure is omitted. Unlike the Abstract Meaning Representation (AMR) semantic labeling system (Li et al. 2016), our scheme does not add the omitted component to the sentence, so the auxiliary word 的 de “De” is treated as the head word of the De structure, and the tri-tuples are (走 zou “leave,” 的 de “De,” agent) and (的 de “De,” 卖 mai “sell,” r-agent). Because 的 de “De” is usually labeled as an auxiliary mark, a 的 de “De” that is not annotated as a mark signals that an omission has occurred.
6. Predicate-complement structures. The semantic relations between verbs in serial verb sentences can also be applied to the predicate-complement structure. For example, for the Chinese sentence “他走累了。 ta_zou_lei_le ‘He got tired of walking’.”, the semantic relation between the predicate 走 zou “walk” and the complement 累 lei “tired” is labeled “eResu” (result event), which means that the complement is the “result” of the verb.
7. Separable words. In Chinese, some words can be separated into two parts; these are called “separable words.” For example, the word 洗澡 xizao “take a bath” can be split into 洗个澡 xi_ge_zao “take a bath” by inserting the quantifier 个 ge “Ge.” In this case, the semantic relation between the two characters 洗 xi “take” and 澡 zao “bath” is labeled “mSepa” (separation mark).
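The tri-tuples and label prefixes used in the special situations above can be handled with a few lines of code. This is an illustrative sketch only: the `Triple` type and the helper names are hypothetical, and the rule that an initial lowercase “e” or “m” followed by a capital letter marks an event relation or a semantic mark is inferred from the examples in this section (eSucc, eResu, mSepa), not from a published specification.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """(head word, dependent word, relation label), as in the chapter."""
    head: str
    dependent: str
    label: str


def label_group(label: str) -> str:
    """Guess the relation group from the label's prefix (assumed convention)."""
    if label.startswith("r-"):
        return "reverse relation"
    if label.startswith("d-"):
        return "nested relation"
    if label[:1] == "e" and label[1:2].isupper():
        return "event relation"
    if label[:1] == "m" and label[1:2].isupper():
        return "semantic mark"
    return "semantic role"


def normalize(triple: Triple) -> Triple:
    """Turn a reverse relation into its direct counterpart by swapping the
    head and the dependent and dropping the r- prefix."""
    if triple.label.startswith("r-"):
        return Triple(triple.dependent, triple.head, triple.label[2:])
    return triple


# Tri-tuples quoted in the text.
examples = [
    Triple("男孩", "打", "r-agent"),   # Fig. 13.3a
    Triple("打", "男孩", "agent"),     # Fig. 13.3b
    Triple("看见", "玩", "d-content"),
    Triple("本", "五", "Quan"),
    Triple("穿", "走", "eSucc"),
    Triple("的", "卖", "r-agent"),
]

for t in examples:
    print(t, "->", label_group(t.label), "->", normalize(t))
```

Normalizing the Fig. 13.3a tri-tuple yields the Fig. 13.3b tri-tuple, which makes explicit that the two annotations encode the same underlying relation with arcs in opposite directions.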

An illustration of 2 reverse relations in 2 Chinese V Ps translated as, the boy who is playing the basketball and the boy is playing the basketball. V P is a modifier in the former and the verb is the kernel in the latter with r-agency from word 4 to 1 and 2 to 1, in order, in the Chinese syntax. — **Fig. 13.3**

An illustration of nested relation in a Chinese sentence of 7 words. It is translated in 2 annotations, grandpa see little granddaughter is play computer, and Grandpa saw that the little granddaughter is playing the computer. An arrow labeled, d-content points from word 2 to 6 in the Chinese syntax. — **Fig. 13.4**

3 Corpus

3.1 Corpus Origin

Our corpus contained more than 30,000 sentences chosen from newspapers, spoken sentences, and Sina Weibo microblogs. We selected 10,068 newspaper sentences and took their word segmentation and part-of-speech (POS) information from Chinese PropBank 6.01 (Xue and Palmer 2003). The remaining 10,038 spoken sentences and 10,055 Sina Weibo sentences had no annotated tags, so we annotated their morphological information before annotating semantic dependencies. We used Chinese Treebank (CTB)-style POS tags, a set of 33 tags derived from the Penn English Treebank and therefore based on the Indo-European word class system.

Table 13.2 presents additional details on our annotated corpus, while Fig. 13.5 shows the number of sentences as a function of sentence length. Spoken sentences come from sources with rich expressions (e.g., dialogues, dialogue sentences, Chinese-English bilingual sentences, and primary school texts). The sentences from the primary school texts were not all colloquial, as some of them used ornate expressions. The diversity of sources resulted in rich linguistic phenomena. Fan (1998) and Huang and Liao (2003) reduced sentence patterns into single and compound sentences from a linguistic perspective. In our annotated corpus, single sentences were categorized into 8 patterns and compound sentences into 12 patterns, and each sentence pattern had corresponding sentences.

Table 13.2 Raw corpus details

A multi-line graph of sentence number versus sentence length for 3 categories. Spoken sentences rise to 1200 at 10 and drop to 0 at 22. Weibo has rising peaks that reach 800 at length 20 and drop to 0 at 40. Newswire varies between 180 and 200 till 25 and drops after. Values are approximated. — **Fig. 13.5**

3.2 Annotation Tool

We developed an online annotation tool to enable annotators to conveniently search, annotate, and revise. Figure 13.6 shows the annotation interface of the tool. On the annotation page, two buttons are used to switch to the word segmentation and POS tagging sub-pages. On the history page, sentences are displayed with dependency labels and relations. Annotators can click on a sentence, which will take them to a page to revise the annotation. On the search page, different keywords and their combinations can be used to search for sentences and corresponding annotation results. When annotators are confused about certain words or relations, they can search and learn from other labeling results. This online tool provides helpful functions for those involved in the annotation process.

A screenshot of the online annotation tool interface. It has a table of 2 rows with several columns with entries in both English and a foreign language. Arrows curve and connect them with labels including root experience, and N O. 4 tabs appear below with text in a foreign language. — **Fig. 13.6**

4 Evaluation of the Corpus

The quality of an annotated corpus is crucial for automatic dependency parsing. To evaluate the quality of our annotated corpus, we measured inter-annotator agreement, that is, the extent to which the same linguistic phenomena were labeled with the same dependency structures and relation labels. Three master’s students in linguistics blindly annotated the same smaller corpus of 422 sentences randomly selected from the 30,000 sentences collected. We evaluated agreement at two levels: dependency arcs only, and dependency arcs together with relation labels. The average agreement among the three pairs of annotators was 88.78% for arcs only and 72.15% for arcs and relations together. The latter result is lower because an item counts as agreed only when both the dependency arc and its relation label are consistent. Given that hundreds of relations were defined, this lower result is understandable. Table 13.3 shows the agreement results.
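A pairwise agreement score of this kind can be computed as follows. The chapter does not give the exact formula, so this sketch assumes one plausible definition: for each annotator pair, twice the number of shared arcs divided by the total number of arcs the two annotators produced. The three toy annotations are invented for illustration and are not taken from the corpus.

```python
from itertools import combinations


def pair_agreement(a, b, labeled):
    """Agreement between two annotations given as (head, dependent, label)
    triples. With labeled=False only the (head, dependent) arc must match."""
    key = (lambda t: t) if labeled else (lambda t: t[:2])
    set_a = {key(t) for t in a}
    set_b = {key(t) for t in b}
    if not set_a and not set_b:
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def mean_pairwise(annotations, labeled):
    """Average the agreement over all annotator pairs."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(pair_agreement(a, b, labeled) for a, b in pairs) / len(pairs)


# Toy data: three hypothetical annotators, one three-word sentence,
# word indices start at 1 and 0 stands for the root.
annotations = {
    "A1": [(0, 2, "Root"), (2, 1, "Agt"), (2, 3, "Pat")],
    "A2": [(0, 2, "Root"), (2, 1, "Exp"), (2, 3, "Pat")],
    "A3": [(0, 2, "Root"), (2, 1, "Agt"), (1, 3, "Pat")],
}

unlabeled = mean_pairwise(annotations, labeled=False)
labeled = mean_pairwise(annotations, labeled=True)
print(f"arcs only: {unlabeled:.4f}, arcs and relations: {labeled:.4f}")
```

Because a labeled match requires an arc match as well, the labeled score can never exceed the unlabeled one, which mirrors the gap between the two averages reported above.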

Table 13.3 Agreement results of three separate annotator pairs

In addition, we evaluated the agreement using Fleiss’ kappa discussed in Fleiss (1971). The degree of agreement between all annotators was computed in terms of Fleiss’ kappa (κ), as shown in Eq. (13.1):

$$ \kappa =\frac{\overline{P}-{\overline{P}}_e}{1-{\overline{P}}_e} \tag{13.1} $$

The proportion of all assignments that were made to the jth assignment type is defined in Eq. (13.2), where N is the total number of words, n is the number of annotators, K is the total number of assignment types used by the annotators, and N × n is the total number of assignments made by all the annotators. The expected chance agreement, the sum of the squared proportions over all assignment types, is defined in Eq. (13.3):

$$ {P}_j=\frac{1}{N\times n}\sum \limits_{i=1}^N{n}_{ij} \tag{13.2} $$

$$ {\overline{P}}_e=\sum \limits_{j=1}^K{P}_j^2 \tag{13.3} $$

The agreement among annotator pairs for the ith word is defined in Eq. (13.4), where the subscript i (1, …, N) indexes the words and the subscript j (1, …, K) indexes the assignment types; thus, n_ij is the number of annotators who assigned the ith word to the jth assignment type, and n(n − 1)/2 is the number of annotator pairs. The mean agreement over all words is defined in Eq. (13.5):

$$ {P}_i=\frac{1}{n\left(n-1\right)}\sum \limits_{j=1}^K{n}_{ij}\left({n}_{ij}-1\right) \tag{13.4} $$

$$ \overline{P}=\frac{1}{N}\sum \limits_{i=1}^N{P}_i \tag{13.5} $$

In this case, n is equal to 3 (i.e., the three annotators who participated in this experiment). The 422 annotated sentences contained 6634 words. We calculated two Fleiss’ kappa scores, one using arcs as assignments and the other using both arcs and relation labels. The two criteria yielded 48 and 1638 assignment types, respectively, and kappa scores of 0.835 and 0.686. If all three annotators agreed on all the assignments, the kappa score would be 1. Generally, a kappa score above 0.7 indicates good agreement, and a score between 0.4 and 0.7 indicates reasonable agreement. The kappa scores indicate that the three annotators mostly agreed when annotating the semantic dependency graph corpus.
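Equations (13.1) to (13.5) translate directly into code. The sketch below is a plain-Python rendering of those formulas, assuming that every annotator gives exactly one assignment to every item; the function name and the toy count matrix are illustrative and are not the chapter's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix where counts[i][j] is n_ij, the number of
    annotators who gave item i the assignment type j."""
    N = len(counts)            # number of items (words)
    n = sum(counts[0])         # number of annotators
    K = len(counts[0])         # number of assignment types
    if any(sum(row) != n for row in counts):
        raise ValueError("every item needs exactly n assignments")

    # Eq. (13.2): share of all assignments that went to type j.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(K)]
    # Eq. (13.3): expected chance agreement.
    p_e = sum(p * p for p in p_j)
    # Eq. (13.4): agreement among annotator pairs on item i.
    p_i = [sum(c * (c - 1) for c in row) / (n * (n - 1)) for row in counts]
    # Eq. (13.5): mean observed agreement.
    p_bar = sum(p_i) / N
    # Eq. (13.1). Undefined when p_e == 1, i.e. only one type is ever used.
    return (p_bar - p_e) / (1 - p_e)


# Toy example: 4 items, 3 annotators, 3 assignment types.
toy_counts = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [0, 0, 3],
]
print(round(fleiss_kappa(toy_counts), 4))
```

For the toy matrix the mean observed agreement is 10/12 and the chance agreement is 50/144, so kappa is 35/47, roughly 0.745. One disagreement among four items is enough to pull the score well below 1 when the number of items is small.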

5 Corpus Statistics

We computed statistics on our annotated corpus. Table 13.4 lists the most and least frequent labels in the annotated corpus. The five labels with the fewest occurrences were reverse or nested relations, which are uncommon linguistic phenomena. The most frequent labels are shown in the third and fourth columns. The mPunc (punctuation) label was excluded: each sentence had at least one punctuation mark, so the total occurrence of mPunc exceeded 30,161. Both Exp (experiencer) and Agt (agent) appeared in the top five because they belong to the subject-predicate structure, which appears frequently at the syntactic level. Two semantic marks, mAux (auxiliary mark) and mMod (modal mark), were also among the most frequent labels. Desc (description) appeared the most frequently, as it was used between most adjectives and nouns.

Table 13.4 Sample of semantic relations with the least and most occurrences

Figure 13.7 shows the number of relations and their frequencies by relation group, with the frequencies within each group summed. We recorded 27 nested relations and 28 reverse relations in our annotated corpus. Reverse relations appeared the least among all groups, followed by nested relations. These two kinds of linguistic phenomena are not common in the Chinese language. The occurrence of event relations was directly related to the number of sub-sentences.

A bar cum line graph of the number and occurrence of labels in 5 categories. In number and occurrence, main role relations have the highest value of 176729 and 32, in order. Reverse relations have the least value of 4332 in number and semantic marks have the least value of 17 in occurrence. — **Fig. 13.7**

Table 13.5 shows the proportions of arcs that caused crossings and of nodes with multiple heads. The statistics were computed over the entire annotated corpus of 30,161 sentences. The proportion of sentences with crossed arcs was 24.31%, while sentences with multiple heads accounted for 30.59%. Figure 13.8a shows an example of a sentence with crossed arcs, and Fig. 13.8b shows an example of a sentence with multiple heads. In example (a), the Agt arc from 哭 ku “cry” to 她 ta “she” crosses the Exp arc from 肿 zhong “swollen” to 眼睛 yanjing “eye,” while in (b) the node 妹妹 meimei “sister” has two parent nodes: 有 you “have” and 能干 nenggan “competent.” As can be seen, quite a few Chinese sentences exceed what dependency trees can express, so semantic dependency graphs are needed to describe their semantic structures.
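Sentence-level proportions like these can be obtained with two simple predicates. The sketch below is an assumption-laden illustration rather than the authors' script: arcs are (head, dependent) pairs of 1-based word positions, root arcs are left out, and the three-sentence toy corpus contains only the arcs of Fig. 13.8 that the text describes (with an assumed tokenization for example (b)) plus one invented tree-shaped sentence.

```python
from collections import Counter


def has_crossing(arcs):
    """True if two arcs, drawn above the sentence, intersect."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    for i, (l1, r1) in enumerate(spans):
        for l2, r2 in spans[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return True
    return False


def has_multiple_heads(arcs):
    """True if some word is the dependent of more than one head."""
    head_counts = Counter(dep for _, dep in arcs)
    return any(count > 1 for count in head_counts.values())


toy_corpus = {
    # 她(1) 眼睛(2) 哭(3) 肿(4) 了(5): 哭→她 and 肿→眼睛, as in Fig. 13.8a
    "crossed": [(3, 1), (4, 2)],
    # 我(1) 有(2) 个(3) 妹妹(4) 很(5) 能干(6): assumed tokenization, Fig. 13.8b
    "multi_head": [(2, 4), (6, 4)],
    # An invented sentence whose arcs form a projective tree
    "plain_tree": [(2, 1), (2, 3)],
}

n_sentences = len(toy_corpus)
crossing_rate = sum(has_crossing(a) for a in toy_corpus.values()) / n_sentences
multi_head_rate = sum(has_multiple_heads(a) for a in toy_corpus.values()) / n_sentences
print(f"crossed arcs: {crossing_rate:.2%}, multiple heads: {multi_head_rate:.2%}")
```

The pairwise crossing test is quadratic in the number of arcs, which is acceptable for sentence-sized inputs; arcs that merely share an endpoint, as in example (b), are not counted as crossing.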

Table 13.5 Proportion of crossed arcs and sentences, including nodes with multiple heads

2 illustrations for 2 Chinese sentences, translated with 2 annotations each. 1. Crossed arc. She eye cry swollen le and her eyes were swollen with tears. 2. Multiple heads. I have a sister very competent and I have a sister who is very competent. The labels include root, tone, and Poss. — **Fig. 13.8**

6 Conclusion

The current chapter proposed a scheme for Chinese semantic dependency in which each label reflects concrete semantic information. The SDG is a human-understandable semantic representation both visually and logically. The semantic relations were designed from the perspective of linguistics to adapt to the characteristics of the Chinese language. Very little abstraction of semantic information exists, which distinguishes this proposed scheme from existing dependency schemes. Because the scheme encodes semantics directly, it employs more relation labels than syntactic dependency schemes. To clarify the boundaries of relation labels, we classified them into several hierarchies that represent different types of information, namely, main semantic roles, event relations, and semantic marks.

We annotated more than 30,000 sentences based on this scheme. The sentences were chosen from spoken sentences, newswires, and Sina Weibo microblogs, covering both the common core of the language and more specialized domains. In constructing this corpus, we made full use of other gold standard information labeled in the sentences to generate pre-annotation results by rules or by machine learning tools. A blind annotation experiment with three annotators was conducted to measure inter-annotator agreement using the widely used Fleiss’ kappa. We achieved kappa scores of 0.835 and 0.686 with unlabeled arcs and labeled arcs as assignments, respectively. These results indicate that the three annotators agreed in the great majority of cases, even though the semantic dependency scheme is fairly complicated.

According to the statistics and analysis of the annotated corpus, most Chinese sentences form projective dependency trees, but non-projective trees and dependency graphs also occur, in a smaller proportion. Thus, using semantic dependency graphs to describe semantic information is both necessary and reasonable.

Notes

1. CDT and Malt syntactic dependency have 13 and 12 labels, respectively. The Malt dependency corpus was acquired via automatic conversion from Penn Chinese Treebank phrase structure trees using Penn2Malt. Semantic dependency labels exceed 50, including those produced by Li et al. (2003) and Chen et al. (1999). Hundreds of labels are available in our BLCU-HIT Semantic Dependency Parsing (BH-SDP) system.

References

Böhmová, Alena, Jan Hajič, Eva Hajičová, and Barbora Hladká. 2003. The Prague dependency treebank: A three-level annotation scenario. In Treebanks: Building and using parsed corpora, ed. Anne Abeillé, Amsterdam: Kluwer, 103–127.
Chatterji, Sanjay, Tanaya Mukherjee Sarkar, Pragati Dhang, Samhita Deb, Sudeshna Sarkar, Jayshree Chakraborty, Anupam Basu. 2014. A dependency annotation scheme for Bangla treebank. Language Resources and Evaluation 48:443–477.
Che, Wanxiang, Meishan Zhang, Yanqiu Shao, and Ting Liu. 2012. SemEval-2012 task 5: Chinese semantic dependency parsing. In Proceedings of the First Joint Conference on Lexical and Computational Semantics (Vol. 1): Proceedings of the main conference and the shared task; (Vol. 2): Proceedings of the sixth international workshop on semantic evaluation, Montréal, Canada, 378–384. Available at https://aclanthology.info/papers/S12-1050/s12-1050. Accessed 8 March 2019.
Chen, Feng-Yi, Pi-Fang Tsai, Keh-jiann Chen, and Chu-Ren Huang 陈凤仪, 蔡碧芳, 陈克健, 黄居仁. 1999. Project Report: Sinica Treebank 中文句结构树资料库的构建. Computational Linguistics and Chinese Language Processing 中文计算语言学期刊 4(2):87–104.
Cui, Hang, Renxu Sun, Keya Li, Min-Yen Kan, and Tat-Seng Chua. 2005. Question answering passage retrieval using dependency relations. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval—SIGIR’05, ACM Press, New York, NY, 400–407. Available at https://www.researchgate.net/publication/221300315_Question_answering_passage_retrieval_using_dependency_relations. Accessed 8 March 2019.
De Marneffe, Marie-Catherine, and Christopher D. Manning. 2008. The Stanford typed dependencies representation. In Proceedings of the COLING 2008 Workshop on Cross-framework and Cross-domain Parser Evaluation, Manchester, United Kingdom, 1–8. Available at https://nlp.stanford.edu/pubs/dependencies-coling08.pdf. Accessed 8 March 2019.
Dong, Qiang, and Zhendong Dong. 2006. HowNet and computation of meaning. World Scientific Publishing Company.
Fan, Xiao 范晓. 1998. The sentence types of Chinese 汉语的句子类型. Shuhai Publishing House.
Fleiss, Joseph L. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin 76(5):378–382.
Hacioglu, Kadri. 2004. Semantic role labeling using dependency trees. In Proceedings of the 20th International Conference on Computational Linguistics—COLING ’04, Geneva, Switzerland, Article number 1273. 1–4. Available at https://dl.acm.org/citation.cfm?doid=1220355.1220541. Accessed 8 March 2019.
Haverinen, Katri, Jenna Nyblom, Timo Viljanen, Veronika Laippala, Samuel Kohonen, Anna Missilä, Stina Ojala, Tapio Salakoski, and Filip Ginter. 2014. Building the essential resources for Finnish: The Turku dependency treebank. Language Resources and Evaluation 48:493–531.
Huang, Bo-rong, and Xu-dong Liao 黄伯荣, 廖旭东. 2003. Contemporary Chinese language 现代汉语. Higher Education Press.
Johansson, Richard, and Pierre Nugues. 2007. LTH: Semantic structure extraction using nonprojective dependency trees. In Proceedings of the 4th International Workshop on Semantic Evaluations, Prague, Czech Republic, 227–230. Available at https://dl.acm.org/citation.cfm?id=1621522. Accessed 8 March 2019.
Li, Mingqin, Juanzi Li, Zhendong Dong, Zuoying Wang, and Dajin Lu. 2003. Building a large Chinese corpus annotated with semantic dependency. In Proceedings of the Second SIGHAN Workshop on Chinese Language Processing (Vol. 17), Sapporo, Japan, 84–91. Available at http://aclweb.org/anthology/W03-1712. Accessed 8 March 2019.
Li, Zhenghua, Ting Liu, and Wanxiang Che. 2012. Exploiting multiple treebanks for parsing with quasi synchronous grammars. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers (Vol. 1), Jeju Island, Korea, 675–684. Available at http://ir.hit.edu.cn/~lzh/papers/zhenghua-P12-multi-treebanks.pdf. Accessed 8 March 2019.
Li, Bin, Lijun Wen, Weiguang Qu, Lijun Bu, and Nianwen Xue. 2016. Annotating the Little Prince with Chinese AMRs. In Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016), Berlin, Germany, 7–15. Available at http://aclweb.org/anthology/W16-1702. Accessed 8 March 2019.
Liu, Ting, Jinshan Ma, and Sheng Li 刘挺, 马金山, 李生. 2006. Chinese dependency parsing model based on lexical governing degree 基于词汇支配度的汉语依存分析模型. Journal of Software 软件学报 17(9):1876–1883.
Lu, Chuan 鲁川. 2001. The parataxis network of the Chinese grammar 汉语语法的意合网络. The Commercial Press.
Marimon, Montserrat, and Núria Bel. 2014. Dependency structure annotation in the IULA Spanish LSP treebank. Language Resources and Evaluation 49(2):433–454.
Niu, Zheng-Yu, Haifeng Wang, and Hua Wu. 2009. Exploiting heterogeneous treebanks for parsing. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1-Volume 1, Association for Computational Linguistics, Suntec, Singapore, 46–54. Available at http://www.aclweb.org/anthology/P09-1006. Accessed 8 March 2019.
Oepen, Stephan, Marco Kuhlmann, Daniel Zeman, Yusuke Miyao, Dan Flickinger, Jan Hajič, Angelina Ivanova, and Yi Zhang. 2014. SemEval-2014 task 8: Broad-coverage semantic dependency parsing. In Proceedings of the Eighth International Workshop on Semantic Evaluation (SemEval-2014), Dublin City University, Dublin, Ireland, 63–72. Available at http://aclweb.org/anthology/S14-2008. Accessed 8 March 2019.
Pradhan, Sameer, Wayne Ward, Kadri Hacioglu, James H. Martin, and Daniel Jurafsky. 2005. Semantic role labeling using different syntactic views. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Ann Arbor, Michigan, 581–588. Available at http://cemantix.org/papers/pradhan-acl-2005.pdf. Accessed 8 March 2019.
Punyakanok, Vasin, Dan Roth, and Wen-tau Yih. 2004. Mapping dependencies trees: An application to question answering. In Proceedings of International Symposium on Artificial Intelligence & Mathematics Fort, 1–10. Available at http://l2r.cs.uiuc.edu/~danr/Papers/PunyakanokRoYi04a.pdf. Accessed 8 March 2019.
Robinson, Jane J. 1970. Dependency structures and transformational rules. Language 46:259–285.
Xue, Nianwen, and Martha Palmer. 2003. Annotating the propositions in the Penn Chinese treebank. In Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Sapporo, Japan, 47–54. Available at http://www.aclweb.org/anthology/W03-1707. Accessed 8 March 2019.
Žolkovskij, Aleksandr, and Igor A. Mel’čuk. 1967. O sisteme semantičeskogo sinteza. II: Pravila preobrazovanija [On a system of semantic synthesis (of texts). II: Paraphrasing rules]. Naučno-texničeskaja informacija 2, Informacionnye processy i sistemy, 17–27.

Acknowledgments

We gratefully acknowledge the support of the National Natural Science Foundation of China (61872402), the Humanities and Social Science Project of the Ministry of Education (17YJAZH068), and the Science Foundation of Beijing Language and Culture University (supported by the Fundamental Research Funds for the Central Universities, 18ZDJ03).

Author information

Authors and Affiliations

College of Information Sciences, Beijing Language and Culture University, Beijing, China
Yanqiu Shao
Computer Science and Technology College, Harbin Institute of Technology, Harbin, China
Wanxiang Che, Ting Liu & Yu Ding

Corresponding author

Correspondence to Yanqiu Shao.

Editor information

Editors and Affiliations

Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Chu-Ren Huang
Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan
Shu-Kai Hsieh
School of Electronic Information and Artificial Intelligence, Leshan Normal University, Leshan City, Sichuan, China
Peng Jin

About this chapter

Cite this chapter

Shao, Y., Che, W., Liu, T., Ding, Y. (2023). The Construction of a Chinese Semantic Dependency Graph Bank. In: Huang, CR., Hsieh, SK., Jin, P. (eds) Chinese Language Resources. Text, Speech and Language Technology, vol 49. Springer, Cham. https://doi.org/10.1007/978-3-031-38913-9_13

DOI: https://doi.org/10.1007/978-3-031-38913-9_13
Published: 19 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38912-2
Online ISBN: 978-3-031-38913-9
eBook Packages: Education, Education (R0)
