
1 Introduction

The most popular method for comparing dependency parsers is the direct measurement of parser output accuracy in terms of metrics such as labeled attachment score (LAS) and unlabeled attachment score (UAS). This assumes the existence of a gold-standard test corpus developed with a specific tagset and a list of dependency names, following specific syntactic criteria. Such an evaluation procedure makes it difficult to evaluate parsing systems developed with syntactic criteria different from those used in the gold standard. Direct evaluation is conceived to compare strategies based on different algorithms but trained on the same treebanks and using the same tokenization. In fact, the strict requirements of direct evaluation prevent us from making fair comparisons among systems based on very different frameworks.

In this paper, we present a task-oriented evaluation of different dependency parsers for Portuguese using the specific task of Open Information Extraction (OIE). This evaluation allows us to compare very different systems under the same conditions, more precisely, parsers trained on treebanks with different linguistic criteria, or even data-driven and rule-based parsers. Other task-oriented evaluation work has focused on measuring parsing accuracy through its influence on the performance of different types of NLP systems, such as sentiment analysis [11].

OIE is an information extraction task that consists of extracting basic propositions from sentences [2]. There are many OIE systems for English, including those based on shallow syntactic information, e.g. TextRunner [2] and ReVerb [6], and those using syntactic dependencies, e.g. OLLIE [14] or ClausIE [4]. There are also some proposals for Portuguese: DepOE [10], Report [17], ArgOE [8], DependentIE [13], and the extractor of open relations between named entities reported in [3]. In order to use OIE systems to evaluate dependency parsers for Portuguese, we need an OIE system for Portuguese that takes dependency trees as input. For the purpose of our indirect evaluation, we use the open-source system described in [8], which takes as input dependency trees in CoNLL-X format.
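As a point of reference, CoNLL-X encodes one token per line with tab-separated fields (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, ...). The following Python sketch, which is not part of any of the evaluated systems, illustrates the kind of minimal reader needed to feed such trees to a dependency-based extractor:

```python
# Minimal sketch of reading CoNLL-X dependency trees (the input format the
# OIE module consumes). Field names follow the CoNLL-X shared-task layout.
from typing import List, Dict

def read_conllx(path: str) -> List[List[Dict[str, str]]]:
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                      # a blank line ends a sentence
                if current:
                    sentences.append(current)
                    current = []
                continue
            cols = line.split("\t")
            current.append({
                "id": cols[0], "form": cols[1], "lemma": cols[2],
                "cpostag": cols[3], "postag": cols[4], "feats": cols[5],
                "head": cols[6], "deprel": cols[7],
            })
    if current:
        sentences.append(current)
    return sentences
```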

2 The Role of Dependency Parsing in OIE

We consider that it is possible to indirectly evaluate a parser by measuring the performance of the OIE system in which the parser is integrated, since many errors made by the OIE system come from the parsing step. Let us take as an example one of the sentences of our evaluation dataset (described in the next section):

A regulação desses processos depende de várias interações de indivíduos com os seus ambientes

The regulation of these processes depends on several interactions of individuals with their environments

One of the evaluated systems extracts the following two basic propositions (for simplicity, we show only the English translation):

[Figure: the two extracted propositions]

The second proposition is not correct since it has been extracted from an odd dependency, as shown in Fig. 1. The dependency between "environments" and "depends" (red arc below the sentence) is incorrect, since "environments" is actually dependent on the noun "interactions". In sum, any odd dependency given by the parser makes the OIE system incorrectly extract at least one odd triple.

Fig. 1. Dependency analysis with Universal Dependencies. The head of "environments" is the noun "interactions" via the nmod dependency, and not the verb "depends".
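To make the error propagation concrete, the following toy sketch (our illustration, not the actual extraction rules of the evaluated system; dependency labels are simplified) shows how a single wrong head produces an extra, spurious argument for the verb, and hence an extra triple:

```python
# Illustrative sketch: how a wrong head turns into a spurious triple.
# Each token is (id, form, head, deprel); a toy rule attaches one argument
# per dependent of the main verb.

def verb_arguments(sent):
    """Collect dependents attached directly to the root verb."""
    verb_ids = {t["id"] for t in sent if t["deprel"] == "root"}
    return [t["form"] for t in sent if t["head"] in verb_ids]

# Correct analysis: "ambientes" (environments) depends on "interações".
correct = [
    {"id": 1, "form": "depende",    "head": 0, "deprel": "root"},
    {"id": 2, "form": "interações", "head": 1, "deprel": "obl"},
    {"id": 3, "form": "ambientes",  "head": 2, "deprel": "nmod"},
]

# Odd analysis: "ambientes" wrongly attached to the verb "depende".
odd = [
    {"id": 1, "form": "depende",    "head": 0, "deprel": "root"},
    {"id": 2, "form": "interações", "head": 1, "deprel": "obl"},
    {"id": 3, "form": "ambientes",  "head": 1, "deprel": "obl"},
]

print(verb_arguments(correct))  # ['interações'] -> one proposition
print(verb_arguments(odd))      # ['interações', 'ambientes'] -> extra, odd triple
```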

Furthermore, the triples extracted by an OIE system are also an excellent way of visualizing the types of errors made by the dependency parser; thus, dependency-based OIE systems can be seen as useful linguistic tools for carrying out error analysis on the parsing step.

3 Experiments

Our objective is to evaluate and compare different Portuguese dependency parsers which can be easily integrated into an open-source OIE system. For this purpose, we use the OIE module of LinguaKit, described in [8], which takes as input any text parsed in CoNLL-X format. We were able to integrate five Portuguese parsers into the OIE module: two rule-based parsers and three data-driven parsers. The rule-based systems are two different versions of DepPattern [7, 9]: the parser used by ArgOE [8] and the one available in LinguaKit.

The three data-driven parsers were trained using MaltParser 1.7.1 and two different algorithms: Nivre eager [15], based on the arc-eager algorithm, and 2-planar [12]. They were trained on two versions of the Floresta Sintá(c)tica treebank: the Portuguese treebank Bosque 8.0 [1] and the Universal Dependencies Portuguese treebank (UD_Portuguese) [16], which aims at full compatibility with the CoNLL UD specifications.
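For reference, the sketch below shows how such models can be trained from Python by calling the MaltParser jar; the option names (-c, -i, -m, -a) follow our reading of the MaltParser 1.7 documentation, and the treebank file names are hypothetical:

```python
# Sketch of training the data-driven models, assuming the standard
# MaltParser 1.7 command-line options (-c config, -i input, -m learn,
# -a parsing algorithm). File names below are placeholders.
import subprocess

MALT_JAR = "maltparser-1.7.1.jar"   # assumed local path to the jar

def train_model(config: str, treebank: str, algorithm: str) -> None:
    """Train one MaltParser model on a CoNLL-formatted treebank."""
    subprocess.run(
        ["java", "-jar", MALT_JAR,
         "-c", config,          # name of the resulting model (.mco)
         "-i", treebank,        # training treebank in CoNLL format
         "-m", "learn",         # learning mode
         "-a", algorithm],      # e.g. "nivreeager" or "2planar"
        check=True,
    )

# Hypothetical file names for the two treebank versions used in the paper.
train_model("bosque_nivre", "bosque8.0_train.conll", "nivreeager")
train_model("ud_2planar", "ud_portuguese_train.conll", "2planar")
```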

In order to adapt the parsers to the OIE system, we implemented some shallow conversion rules that align the tagset and dependency names of Bosque 8.0 and UD_Portuguese with the PoS tags and dependency names used by the OIE system. This is not a full, deep conversion, since the OIE system only uses a small list of PoS tags and dependencies. So, before training a parser on a Portuguese treebank, we must first identify the specific PoS tags and dependencies used by the extraction module, and second, replace them with the corresponding labels. For UD_Portuguese, we also had to change the syntactic criteria on preposition dependencies. Concerning the rule-based parsers, no adaptation is required, since the OIE system is based on the dependency labels of DepPattern. A priori, this could benefit the systems that did not have to be adapted, but we have no way of measuring it.
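A shallow conversion of this kind amounts to a small lookup table over labels. The sketch below is only illustrative; the actual label pairs used for Bosque 8.0 and UD_Portuguese are not reproduced here, so the mappings shown are assumptions:

```python
# Illustrative sketch of a shallow label conversion. Only the handful of
# tags and dependencies the extraction module looks at need to be aligned;
# the label pairs below are assumptions, not the mapping used in the paper.

UD_TO_OIE_DEPREL = {      # hypothetical UD -> DepPattern-style labels
    "nsubj": "SUBJ",
    "obj": "DOBJ",
    "nmod": "CPREP",
    "case": "PREP",
}

UD_TO_OIE_POSTAG = {      # hypothetical coarse PoS alignment
    "NOUN": "NOUN", "PROPN": "NOUN", "VERB": "VERB", "ADP": "PRP",
}

def convert_token(token: dict) -> dict:
    """Relabel one CoNLL token, leaving unknown labels untouched."""
    token = dict(token)
    token["deprel"] = UD_TO_OIE_DEPREL.get(token["deprel"], token["deprel"])
    token["cpostag"] = UD_TO_OIE_POSTAG.get(token["cpostag"], token["cpostag"])
    return token
```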

To evaluate the results of the OIE system with the parsers defined above, five systems were configured, each with a different parser. The OIE evaluation is inspired by those reported in [4, 8]. The dataset consists of 103 sentences from a domain-specific corpus, called CorpusEco [18], containing texts on ecological issues. These sentences were processed by the five extractors, giving rise to 862 triples. Then, each extracted triple was annotated as correct (1) or incorrect (0) according to the following evaluation criteria: triples are not correct if they denote incoherent or uninformative propositions, or if they contain over-specified relations, i.e., relations containing numbers, pronouns, or excessively long phrases. We follow criteria similar to those defined in previous OIE evaluations [4, 5]. Annotation was carried out on the whole set of extracted triples without identifying the system from which each triple had been generated.
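Although the annotation was done manually, the over-specification criterion is concrete enough to sketch as a heuristic; the pronoun list and length threshold below are illustrative assumptions, not the annotation guidelines themselves:

```python
# Rough automatic approximation of the "over-specified relation" criterion
# (in the paper the annotation was manual). Thresholds and the pronoun list
# are assumptions made only for illustration.
import re

PRONOUNS_PT = {"ele", "ela", "eles", "elas", "isso", "isto", "aquilo"}
MAX_RELATION_TOKENS = 6          # assumed cut-off for "excessively long"

def over_specified(relation: str) -> bool:
    tokens = relation.lower().split()
    if re.search(r"\d", relation):             # contains numbers
        return True
    if any(t in PRONOUNS_PT for t in tokens):  # contains pronouns
        return True
    return len(tokens) > MAX_RELATION_TOKENS   # excessively long phrase

print(over_specified("depende de"))                          # False
print(over_specified("depende de 25 tipos diferentes de"))   # True
```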

The results are summarized in Table 1. Precision is defined as the number of correct extractions divided by the number of returned extractions. Recall is estimated by identifying a pool of relevant extractions, namely the total number of distinct correct extractions made by all the systems (this pool is our gold standard). So, recall is the number of correct extractions made by a system divided by the total number of correct extractions in the pool (346 correct triples in total).
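In code, this pooled evaluation reduces to simple set arithmetic; the sketch below (with made-up triples) is only meant to make the two definitions explicit:

```python
# Sketch of the precision/recall computation described above, assuming each
# system's output is a set of triples and 'pool' is the union of all correct
# triples produced by any system (346 in the paper).

def precision_recall(extracted: set, correct: set, pool: set) -> tuple:
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct & pool) / len(pool) if pool else 0.0
    return precision, recall

# Toy usage with made-up triples:
pool = {("regulation", "depends on", "interactions"), ("a", "r", "b")}
extracted = {("regulation", "depends on", "interactions"),
             ("regulation", "depends with", "environments")}
correct = {("regulation", "depends on", "interactions")}
print(precision_recall(extracted, correct, pool))  # (0.5, 0.5)
```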

Table 1. Evaluation of five OIE systems configured with five dependency parsers

The results show that there is no clear difference among the evaluated systems, except in the case of deppattern-Linguakit, which relies on a rule-based parser. However, a deeper analysis allows us to observe that rule-based and data-driven parsers might be complementary, as they share only about 25% of the correct triples. More precisely, the number of correct extractions made by deppattern-Linguakit reaches 125 triples, but only 30 of them are also extracted by maltparser-nivrearc. This means that a voting OIE system consisting of the two best rule-based and data-driven parsers would improve recall in a very significant way without losing precision.
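The complementarity argument is simple set arithmetic over the pool of correct triples; the sketch below uses the figures quoted above plus one assumed count for maltparser-nivrearc, which is not given in the text here:

```python
# Back-of-the-envelope sketch of why the union of a rule-based and a
# data-driven system should raise recall. The 125, 30 and 346 figures come
# from the text above; the count for maltparser-nivrearc (here 110) is a
# placeholder assumption.

POOL = 346                      # distinct correct triples over all systems
deppattern_correct = 125        # correct triples of deppattern-Linguakit
overlap = 30                    # correct triples shared by both systems
malt_correct = 110              # hypothetical value for maltparser-nivrearc

union_correct = deppattern_correct + malt_correct - overlap
print(deppattern_correct / POOL)   # recall of the rule-based system alone
print(union_correct / POOL)        # recall of the union of both systems
```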

4 Conclusions

In this article, we showed that it is possible to use OIE systems to easily compare parsers developed with different strategies, by making use of a coarse-grained and shallow adaptation of tagsets and syntactic criteria. By contrast, comparing very different parsers by means of direct evaluation is a much harder task, since it requires carrying out deep changes on the training corpus (the gold-standard treebank). These changes involve adapting tagsets before training, reconsidering syntactic criteria at all analysis levels, and yielding the same tokenization as the gold-standard treebank. Moreover, the proposed task-oriented evaluation might help linguists carry out deep error analysis of the parsers, since the extraction of basic propositions allows humans to visualize and interpret linguistic mistakes more easily than from obscure syntactic outputs.