1 Introduction

Lexical classes, defined in terms of shared meaning components and similar (morpho-)syntactic behavior of words, have attracted considerable research interest (Pinker 1989; Jackendoff 1990; Levin 1993). These classes are useful for their ability to capture generalizations about a range of (cross-)linguistic properties. For example, verbs which share the meaning component of “manner of motion” (such as travel, run, walk), behave similarly also in terms of subcategorization (I traveled/ran/walked, I traveled/ran/walked to London, I traveled/ran/walked five miles) and usually have zero-related nominals (a run, a walk). Although the correspondence between the syntax and semantics of words is not perfect and these classes do not provide means for full semantic inferencing, their predictive power is nevertheless considerable.

NLP systems can benefit from lexical classes in a number of ways. Such classes define the mapping from surface realization of arguments to predicate-argument structure, and are therefore an important component of any system which needs the latter. As the classes can capture higher level abstractions (e.g., syntactic or semantic features) they can be used as a principled means to abstract away from individual words when required. They are also helpful in many operational contexts where lexical information must be acquired from small application-specific corpora. Their predictive power can help compensate for lack of sufficient data fully exemplifying the behavior of relevant words. Lexical classes have proved helpful in supporting a number of (multilingual) tasks, such as computational lexicography, language generation, machine translation, parsing, word sense disambiguation, semantic role labeling, and subcategorization acquisition (Dorr 1997; Prescher et al. 2000; Korhonen 2002; Shi and Mihalcea 2005). While this work has met with success, it has been small in scale. Large-scale exploitation of the classes has not been possible because no comprehensive classification is available.

The largest and most widely deployed classification in English is Levin’s (1993) classification of verbs. VerbNet (VN; Kipper et al. 2000; Kipper-Schuler 2005; see Footnote 1), an extensive on-line lexicon for English verbs, provides detailed syntactic-semantic descriptions of Levin classes organized into a refined taxonomy. While the original version of VN has proved useful for a variety of natural language tasks (e.g., semantic role labeling, robust semantic parsing, word sense disambiguation), it has mainly dealt with Levin-style verbs (i.e., verbs taking noun phrase (NP) and prepositional phrase (PP) complements) and thus has suffered from limited coverage.

Some experiments have been reported which indicate that it should be possible, in the future, to automatically supplement VN with novel classes and member verbs from corpus data (Brew and Schulte im Walde 2002; Korhonen et al. 2003; Kingsbury 2004). While an automatic approach would avoid the expensive overhead of manual classification and enable application-specific tuning, the very development of the technology capable of large-scale classification requires access to a target gold standard classification more extensive than that available currently.

Korhonen and Briscoe (2004) (K&B) have proposed a substantial extension to Levin’s original classification which incorporates 57 novel classes for verb types not covered (comprehensively) by Levin. Korhonen and Ryant (unpublished) (K&R) have recently supplemented this with another extension including 53 classes. While these novel classes are potentially very useful, their practical use is limited by the fact that no detailed syntactic-semantic descriptions are provided with the classes, and no attempt has been made to organize the classes into a taxonomy or to integrate them into Levin’s taxonomy.

Our article addresses these problems: it describes the integration of these two sets of novel classes into VN (Kipper et al. 2006a, b). Because the classifications differ in many respects, their integration was a major task which had to be conducted largely manually to obtain a reliable result. The outcome is a freely available on-line resource which constitutes the most comprehensive Levin-style verb classification for English. With the two extensions, VN’s coverage of PropBank tokens (Palmer et al. 2005) has increased from 78.45% to 90.86%, making feasible the creation of a substantial training corpus annotated with VN thematic role labels and class membership assignments, to be released in 2007. This will finally enable large-scale experimentation on the utility of the classes for improving the performance of syntactic parsers and semantic role labelers on new domains.

We introduce Levin’s classification in Sect. 2, VN in Sect. 3 and the classes of K&B and K&R in Sect. 4. Section 5 describes the integration of the new classes into VN, and Sect. 6 describes how this integration affected VN and its coverage. Finally, Sect. 7 discusses on-going and future work.

2 Levin’s classification

Levin’s classification (Levin 1993) provides a summary of the variety of theoretical research done on lexical-semantic verb classification over the past decades. Verbs which display the same or a similar set of diathesis alternations in the realization of their argument structure are assumed to share certain meaning components and are organized into a semantically coherent class. Although alternations are chosen as the primary means for identifying verb classes, additional properties related to subcategorization, morphology, and extended meanings of verbs are taken into account as well.

For instance, the Levin class of “Break Verbs” (class 45.1), which refers to actions that bring about a change in the material integrity of some entity, is characterized by its participation (1–3) or non-participation (4–6) in the following alternations and other constructions (7, 8):

  1. Causative/inchoative alternation:
     Tony broke the window ↔ The window broke

  2. Middle alternation:
     Tony broke the window ↔ The window broke easily

  3. Instrument subject alternation:
     Tony broke the window with the hammer ↔ The hammer broke the window

  4. *With/against alternation:
     Tony broke the cup against the wall ↔ *Tony broke the wall with the cup

  5. *Conative alternation:
     Tony broke the window ↔ *Tony broke at the window

  6. *Body-Part possessor ascension alternation:
     *Tony broke herself on the arm ↔ Tony broke her arm

  7. Unintentional interpretation available (some verbs):
     Reflexive object: *Tony broke himself
     Body-part object: Tony broke his finger

  8. Resultative phrase:
     Tony broke the piggy bank open, Tony broke the glass to pieces
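The idea that verbs sharing a set of alternations fall into one class can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alternation labels, the names `JUDGMENTS`, `signature`, and `group_by_signature`, and the rows for "shatter" and "cut" are hypothetical toy data, not entries taken from Levin (1993). Only the "break" row follows items 1–6 above.

```python
from collections import defaultdict

# Alternations 1-6 from the list above, in a fixed order.
ALTERNATIONS = (
    "causative_inchoative",
    "middle",
    "instrument_subject",
    "with_against",
    "conative",
    "body_part_possessor_ascension",
)

# Alternations each verb participates in. The "break" row mirrors items 1-6;
# the other rows are hypothetical toy judgments added for illustration.
JUDGMENTS = {
    "break": {"causative_inchoative", "middle", "instrument_subject"},
    "shatter": {"causative_inchoative", "middle", "instrument_subject"},
    "cut": {"middle", "instrument_subject", "conative",
            "body_part_possessor_ascension"},
}


def signature(verb):
    """Return a tuple of booleans: participation in each alternation."""
    allowed = JUDGMENTS[verb]
    return tuple(alt in allowed for alt in ALTERNATIONS)


def group_by_signature(verbs):
    """Group verbs whose alternation behavior is identical."""
    groups = defaultdict(list)
    for verb in verbs:
        groups[signature(verb)].append(verb)
    return sorted(sorted(group) for group in groups.values())


candidate_classes = group_by_signature(JUDGMENTS)
```

Under these toy judgments, "break" and "shatter" end up in one group and "cut" in another. Real classification also weighs subcategorization, morphology, and extended meanings, which this sketch ignores.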

Levin’s taxonomy classifies 3,024 verbs (4,186 senses) into 192 fine-grained classes according to their participation in 79 alternations involving NP and PP complements. Verbs taking ADJP, ADVP, ADL, particle, predicative, control, and sentential complements are largely excluded, except where they show interesting behavior with respect to NP and PP complementation.

3 Description of VerbNet

VerbNet is a hierarchical, domain-independent, broad-coverage verb lexicon with mappings to several widely used verb resources, including WordNet (Miller 1990; Fellbaum 1998), Xtag (XTAG Research Group 2001), and FrameNet (Baker et al. 1998). It includes syntactic and semantic information for classes of English verbs derived from Levin’s classification which is considerably more detailed than that included in the original classification. Each verb class in VN is completely described by a set of members, thematic roles for the predicate-argument structure of these members, selectional restrictions on the arguments, and frames consisting of a syntactic description and semantic predicates with a temporal function, in a manner similar to the event decomposition of Moens and Steedman (1988) (see Footnote 2). The original Levin classes have been refined and new subclasses added to achieve syntactic and semantic coherence among members. The resulting class taxonomy incorporates different degrees of granularity. This is an important quality given that the desired level of granularity varies from one NLP application to another.

3.1 Syntactic frames

Each VN class contains a set of syntactic descriptions, or syntactic frames, depicting the possible surface realizations of the argument structure for constructions such as transitive, intransitive, prepositional phrases, resultatives, and the set of diathesis alternations listed as part of each Levin class. A syntactic frame consists of thematic roles (e.g., Agent, Theme, Location), the verb, and other lexical items which may be required for a particular construction or alternation. Semantic restrictions (e.g., animate, human, organization) are used to suggest preferences as to the types of thematic roles allowed in the classes. A frame may also be constrained in terms of which prepositions are allowed. Further restrictions may be imposed on thematic roles to indicate the syntactic nature of the constituent likely to be associated with the thematic role. Levin classes are characterized primarily by NP and PP complements. Some classes refer to sentential complementation, but only to the distinction between finite and nonfinite clauses, as in the subclasses of Verbs of Communication. The VN frames for class Tell-37.2 shown in Examples (1) and (2) illustrate how this distinction is implemented.

  (1) Sentential Complement (finite)
      “Susan told Helen that the room was too hot.”
      Agent V Recipient Topic [+sentential -infinitival]

  (2) Sentential Complement (nonfinite)
      “Susan told Helen to avoid the crowd.”
      Agent V Recipient Topic [+infinitival -wh_inf]
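One possible way to hold such frames in memory is sketched below. The `Frame` record, its field names, and the helper `is_finite_complement` are assumptions made for illustration and do not reflect VN's actual file format. The two entries simply transcribe Examples (1) and (2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """A hypothetical in-memory record for one syntactic frame."""
    description: str
    example: str
    syntax: tuple                            # roles and verb in surface order
    restrictions: frozenset = frozenset()    # signed features on the last role


TELL_FRAMES = [
    Frame(
        "Sentential Complement (finite)",
        "Susan told Helen that the room was too hot.",
        ("Agent", "V", "Recipient", "Topic"),
        frozenset({"+sentential", "-infinitival"}),
    ),
    Frame(
        "Sentential Complement (nonfinite)",
        "Susan told Helen to avoid the crowd.",
        ("Agent", "V", "Recipient", "Topic"),
        frozenset({"+infinitival", "-wh_inf"}),
    ),
]


def is_finite_complement(frame):
    """True when the restrictions mark a finite sentential complement."""
    return {"+sentential", "-infinitival"} <= frame.restrictions


finite_flags = [is_finite_complement(f) for f in TELL_FRAMES]
```

Both frames share the same role sequence, so only the restriction set separates the finite from the nonfinite reading.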

3.2 Semantic predicates

VN frames also contain semantic information, expressed as a conjunction of boolean semantic predicates such as “motion,” “contact,” or “cause.” Each predicate is associated with an event variable E that allows predicates to specify when in the event the predicate is true (start(E) for the preparatory stage, during(E) for the culmination stage, and end(E) for the consequent stage). Aspect is captured by this event variable argument present in the predicates. For example, verbs that denote activities or processes (e.g., motion verbs) have predicates referring to the during(E) stage of the event. Relations between verbs or classes such as those present in WordNet (e.g., antonymy and entailment) and FrameNet can be predicted by semantic predicates. For example, classes with change of location of the object have the same predicates cause and location used differently (negated in different places).
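The following sketch shows how predicates with an event-stage argument might be represented, and how two classes can share a predicate while negating it in different places. The `Pred` type, the `render` and `flip` helpers, and the two predicate lists are hypothetical and are not copied from VN. For brevity only the location predicate is shown, and cause is omitted.

```python
from typing import NamedTuple


class Pred(NamedTuple):
    name: str              # e.g. "motion", "location", "cause"
    stage: str             # "start", "during" or "end" of the event E
    args: tuple            # thematic roles the predicate relates
    negated: bool = False


def render(pred):
    """Format a predicate in a VN-like textual notation."""
    body = f"{pred.name}({pred.stage}(E), {', '.join(pred.args)})"
    return f"not({body})" if pred.negated else body


def flip(preds):
    """Swap the polarity of every predicate in a conjunction."""
    return [p._replace(negated=not p.negated) for p in preds]


# Hypothetical semantics: the Theme is not at Place at the start, but is at the end.
ARRIVING = [
    Pred("location", "start", ("Theme", "Place"), negated=True),
    Pred("location", "end", ("Theme", "Place")),
]

# The mirror image: at Place at the start, not at Place at the end.
LEAVING = [
    Pred("location", "start", ("Theme", "Place")),
    Pred("location", "end", ("Theme", "Place"), negated=True),
]

rendered = [render(p) for p in ARRIVING]
```

Here `flip(ARRIVING)` equals `LEAVING`, which is one simple way an antonymy-like relation between two classes could be detected from their predicates.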

3.3 Status of VerbNet

Before integrating the novel classes, VN 1.0 had descriptions for 4,100 verb senses (over 3,000 lemmas) distributed in 191 first-level classes and 74 new subclasses. These descriptions used 21 thematic roles, 36 selectional restriction preferences, 314 syntactic frames, 64 semantic predicates, and a shallow hierarchy of prepositions with 57 entries. The coverage of VN 1.0 has been evaluated through a mapping to almost 50,000 instances in the Proposition Bank’s corpus (Kingsbury and Palmer 2002). VN syntactic frames account for over 78% of the exact matches found to the frames in PropBank. The information in the lexicon has proved useful for various NLP tasks such as word sense disambiguation and semantic role labeling (see Sect. 6). In VN 1.0 Levin’s taxonomy has gained considerably in depth, but not in breadth. Verbs taking ADJP, ADVP, particle, predicative, control, and sentential complements were still largely excluded. Many of these verb types are highly frequent in language and thus important for applications. As the new classes cover these verb types, it made sense to incorporate them into VN.

4 Description of the new classes

4.1 The classes of Korhonen and Briscoe (2004)

The extension of Korhonen and Briscoe (2004) to Levin’s classification includes 57 new classes and 106 new diathesis alternations for verbs. The classes were created using the following semi-automatic approach (see Footnote 3):

Step 1: A set of diathesis alternations was constructed for verbs not covered extensively by Levin. This was done by considering possible alternations between pairs of subcategorization frames (SCFs) in the comprehensive classification of Briscoe (2000), which incorporates 163 SCFs (a superset of those listed in the ANLT (Boguraev et al. 1987) and COMLEX Syntax dictionaries (Grishman et al. 1994)), focusing in particular on those SCFs not covered by Levin. The SCFs define mappings from surface arguments to predicate-argument structure for bounded dependency constructions, but abstract over specific particles and prepositions. 106 new alternations were identified manually, using criteria similar to Levin’s.

Step 2: 102 candidate lexical classes were selected for the verbs from linguistic resources of a suitable style and granularity: (Rudanko 1996, 2000), (Sager 1981), (Levin 1993) and the LCS database (Dorr 2001).

Step 3: Each candidate class was evaluated by examining sets of SCFs taken by its member verbs in syntax dictionaries (e.g., COMLEX) and whether these SCFs could be related in terms of diathesis alternations (from the 106 novel ones or Levin’s original ones). Where one or several alternations were found which captured the sense in question, a new verb class was created.
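Steps 1 and 3 can be pictured with a small sketch. It is an illustration only: the function names, the SCF labels assigned to the two verbs, and the accepted alternations are hypothetical, and the source does not say that every SCF pair was enumerated mechanically.

```python
from itertools import combinations


def candidate_alternations(scfs):
    """Step 1 (sketch): every unordered SCF pair is a potential alternation."""
    return list(combinations(sorted(scfs), 2))


def shared_alternations(member_scfs, accepted):
    """Step 3 (sketch): accepted alternations whose two SCFs are taken
    by every member verb of a candidate class."""
    common = set.intersection(*member_scfs.values())
    return [alt for alt in accepted if set(alt) <= common]


# Hypothetical dictionary lookups for two verbs of one candidate class.
MEMBER_SCFS = {
    "order": {"NP", "NP-TO-INF-OC", "THAT-S"},
    "require": {"NP", "NP-TO-INF-OC", "THAT-S", "ING"},
}

# Hypothetical alternations assumed to have passed manual vetting.
ACCEPTED = [("NP", "NP-TO-INF-OC"), ("ING", "THAT-S")]

supporting = shared_alternations(MEMBER_SCFS, ACCEPTED)
pair_count = len(candidate_alternations(range(163)))
```

With 163 SCFs there are 163 × 162 / 2 = 13,203 unordered pairs, which gives a sense of how selective the manual criteria must be to arrive at 106 alternations. In the toy data only the first accepted alternation is supported by both verbs.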

Identifying relevant alternations helped to identify additional SCFs, which often led to the discovery of additional alternations. For those candidate classes which had an insufficient number of member verbs, new members were searched for in WordNet. These were frequently found among the synonyms, troponyms, hypernyms, coordinate terms and/or antonyms of the extant member verbs. The SCFs and alternations discovered during the identification process were used to create the syntactic-semantic description of each novel class. For example, a new class was created for verbs such as order and require, which share the approximate meaning of “direct somebody to do something.” This class was assigned the description shown in Table 1 (where the SCFs are indicated by number codes from Briscoe’s (2000) classification).

Table 1 Order verbs

The work resulted in accepting, rejecting, combining, and refining the 102 candidate classes and, as a by-product, identifying 5 new classes not included in any of the resources used. In the end, 57 new verb classes were formed, each associated with 2–45 member verbs. Table 2 shows a small sample of these classes along with example verbs.

Table 2 Examples of K&B’s classes

4.2 The classes of Korhonen and Ryant (unpublished)

While integrating K&B classes into VN, Korhonen and Ryant (unpublished) (K&R) uncovered 53 additional verb classes which deal with a wide range of different complements. Many of them cover prepositional and sentential complements. K&R classes also introduce a large number of verb particles. K&R classes were identified using the same methodology as in Sect. 4.1 (Step 3), associated with 2–37 member verbs, and assigned syntactic descriptions similar to those of the K&B classes. Table 3 presents a small sample of these classes along with member verbs.

Table 3 Examples of K&R’s classes

5 Incorporating the new classes into VerbNet

Although the classes of K&B and K&R are similar in style to the Levin classes, their integration into VN proved a major task. The first step was to assign them VN-style syntactic-semantic descriptions. This was not straightforward because the classes lacked explicit semantic descriptions and had syntactic descriptions not directly compatible with VN’s descriptions. Some of VN’s descriptions also had to be enriched for the new classes. The second step was to incorporate the classes into VN. This was complicated by the fact that K&B and K&R are inconsistent in terms of granularity: some classes are broad while others are fine-grained. The comparison of the new classes to Levin’s classes had to be done on a class-by-class basis: some classes are entirely new, some are subclasses of existing classes, while others require reorganization of original Levin classes. These steps had to be conducted manually in order to obtain a reliable result.

5.1 Syntactic-semantic descriptions of classes

Assigning syntactic-semantic descriptions to the new classes involved work on both VN and on the new classifications. The set of SCFs in K&B and K&R is broad in coverage and relies on finer-grained treatment of sentential complementation than present in VN 1.0. Therefore, new VN syntactic descriptions had to be created and existing ones enriched with a more detailed treatment of sentential complementation. On the other hand, prepositional SCFs in K&B and K&R do not provide VN with explicit lists of allowed prepositions as required, so these had to be added to the classes. In addition, no syntactic description of the surface realization of the frames was included in K&B and K&R and had to be created. In some cases, the creation of new syntactic descriptions required extending the inventory of thematic roles. Additional semantic predicates were also created for VN to convey the proper semantics of the new classes.

5.1.1 Syntactic descriptions

Only 44 of VN’s syntactic frames had a counterpart in the SCF classification assumed by K&B and K&R (Briscoe 2000). This is because Briscoe abstracts over prepositions and particles whereas VN differentiates between otherwise identical frames based on the types of prepositions that a given class of verbs subcategorizes for. Additionally, VN may distinguish two syntactic frames depending on thematic roles (e.g., there are two variants of the Material/Product Alternation Transitive frame differing on whether the object is the Material or Product). Regarding sentential complements the opposite occurs, with VN conflating SCFs that Briscoe considers distinct. In integrating the proposed classes into VN it was necessary to greatly enrich the set of possible syntactic restrictions VN allows on clauses. The original hierarchy contained only the valences ±sentential, ±infinitival, and ±wh_inf. The new set of possible syntactic restrictions consists of 57 such features accounting for object control, subject control, and different types of complementation (see the Appendix for a partial list of these features).
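The mismatch caused by abstracting over prepositions can be made concrete with a toy example. The frame patterns, the preposition values, and the function name below are hypothetical and are not drawn from VN or from Briscoe (2000). The sketch only shows how several preposition-specific frames collapse onto a single preposition-neutral SCF.

```python
from collections import defaultdict

# Hypothetical VN-style frames: (surface pattern, required preposition or None).
VN_FRAMES = [
    ("NP V NP PP", "into"),
    ("NP V NP PP", "to"),
    ("NP V NP PP", "against"),
    ("NP V NP", None),
]


def abstract_over_prepositions(frames):
    """Map each preposition-neutral pattern to the prepositions it subsumes."""
    by_scf = defaultdict(list)
    for pattern, preposition in frames:
        by_scf[pattern].append(preposition)
    return dict(by_scf)


collapsed = abstract_over_prepositions(VN_FRAMES)
```

Four frames reduce to two patterns here, so a one-to-one alignment between the two inventories is not available. Going in the other direction, an explicit preposition list has to be supplied for each class.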

Examples (3)–(6) show the VN realizations and the set of constraints for the proposed FORCE class (from K&B) which includes two frames with object control complements.

  (3) Basic Transitive
      “I forced him.”
      Agent V Patient

  (4) NP-P-ING-OC (into-PP)
      “I forced him into coming.”
      Agent V Patient into Proposition [+oc_ing]

  (5) NP-PP (into-PP)
      “I forced John into the chairmanship.”
      Agent V Patient into Proposition [-sentential]

  (6) NP-TO-INF-OC
      “I forced him to come.”
      Agent V Patient Proposition [+oc_to_inf]

K&R classes also required the use of new SCFs not appearing in either VN or in K&B. These concern the classes USE, BASE, and SEEM with examples of new SCFs shown in Table 4.

Table 4 Examples of SCFs for K&R classes

5.1.2 Thematic roles

In integrating the new classes, it was found that none of the 21 original VN thematic roles seemed to appropriately convey the semantics of the arguments for some classes. As an example, the members of the proposed URGE class (K&B) describe events in which one entity exerts psychological pressure on another to perform some action (John urged Maria to go home). While the urger (John) is assigned the role Agent as the volitional agent of the action and the urged entity (Maria) is assigned Patient as the affected participant, it is unclear what thematic role best suits the urged action (of going home). A new Proposition role was included which seemed to more appropriately describe the semantics of the “urge” action. Similar situations arose in the integration of 8 other classes. In the end, two new thematic roles were added to VN: Content and Proposition.

5.1.3 Semantic descriptions

Integrating the new classes also required enriching VN’s set of semantic predicates. Whenever possible, existing VN predicates were reused. However, as many of the incoming classes represent concepts entirely novel to VN, it was necessary to introduce 30 new predicates to adequately provide descriptions of the semantics of these incoming classes. Examples of such predicates include approve, spend, command, and attempt.

5.2 Integrating the K&B classes into VerbNet

After assigning the class descriptions, each K&B class was investigated to determine its feasibility for VN. Of the classes proposed, two were rejected as being either insufficiently semantically homogeneous or too small to be added to the lexicon, with the remaining 55 selected for incorporation. The classes fell into three categories regarding Levin’s classification: (1) classes that could be subclasses of existing Levin classes; (2) classes that would require a reorganization of Levin classes (Levin focused mainly on NP and PP complements, but many verbs classify more naturally in terms of sentential complementation); (3) entirely new classes.

5.2.1 Entirely novel classes

A total of 42 classes could be added to the lexicon as novel classes or subclasses without any restructuring. Some of these overlapped to an extent with existing VN classes semantically but syntactic behavior of the members was sufficiently distinctive to allow them to be added as new classes without restructuring of VN. 35 novel classes were actually added as new classes while 7 others were added as new subclasses (e.g., an additional novel subclass, Continue-55.3, was discovered in the process of subdividing Begin-55.1). The 35 new classes all share the quality of not overlapping to any appreciable extent with a pre-existing VN class from the standpoint of semantics. For instance, K&B’s classes of FORCE, TRY, FORBID, and SUCCEED express entirely new concepts as compared to VN 1.0.

5.2.2 Novel sub-classes

Some of the proposed classes (e.g., CONVERT, SHIFT, INQUIRE, CONFESS) were considered sufficiently similar in meaning to existing classes to be added as their new subclasses. For example, both the proposed classes CONVERT and SHIFT are similar syntactically to the VN class Turn-26.6. However, whereas the members of Turn-26.6 exclusively involve total physical transformations, the members of the proposed class CONVERT invariably exclude physical transformation, instead having a meaning that involves non-physical changes such as changes in the viewpoint of the Theme (I converted the man to Judaism.). Similarly, SHIFT verbs only take the intransitive frames from CONVERT. Consequently, as both SHIFT and CONVERT are semantically similar to, yet still distinct from, the existing VN class Turn-26.6, they were added as subclasses to 26.6, yielding the new classification Turn-26.6.1, Convert-26.6.2, and Shift-26.6.3.
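The resulting numbering can be read as a simple prefix relation, sketched below. The helper names are hypothetical, and the assumption that subclass identifiers always extend the parent's dotted number is made for illustration. It holds for the labels quoted above but is not claimed for every VN class.

```python
def class_number(label):
    """Split a label such as 'Convert-26.6.2' into its numeric part (26, 6, 2)."""
    _name, _sep, number = label.rpartition("-")
    return tuple(int(part) for part in number.split("."))


def is_subclass_of(child, parent):
    """True when the child's number strictly extends the parent's number."""
    child_num, parent_num = class_number(child), class_number(parent)
    return (len(child_num) > len(parent_num)
            and child_num[:len(parent_num)] == parent_num)


NEW_LABELS = ["Turn-26.6.1", "Convert-26.6.2", "Shift-26.6.3"]
under_turn = [is_subclass_of(label, "Turn-26.6") for label in NEW_LABELS]
```

All three new labels test as subclasses of 26.6, while an unrelated label such as Want-32.1 does not.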

5.2.3 Classes where restructuring was necessary

Thirteen of the proposed classes overlapped significantly in some way with existing VN classes (either too close semantically or syntactically) and required restructuring of VN. For example, classes WANT, PAY, and SEE overlapped with existing VN classes Want-32.1, Give-13.1, and See-30.1 both in terms of semantics and syntax. Such classes were added by (1) merging proposed classes with the related VN class; or by (2) adding the proposed class as a novel class but making modifications to existing VN classes.

Cases involving merger of a proposed class and an existing class: In considering these classes for addition to VN, it was observed that semantically their members patterned after an existing class almost exactly. In the cases where the frames from the new classes were a superset of the frames recorded in VN, the existing VN class was restructured by adding the new members and by enriching its syntactic description with the new frames. For example, both K&B’s WANT class and the VN class Want-32.1 relate to the act of an experiencer desiring something. VN class Want-32.1 differs from the proposed WANT class in its membership and in that it considers only alternations in NP and PP complements whereas the proposed class WANT also considered alternations in sentential complements, particularly control cases.

Added as new class but requiring restructuring of classes: K&B’s work is of particular importance when considered in the context of classes of Verbs With Predicative Complements, whose members are frequent in language. These verbs classify more naturally in terms of sentential rather than NP or PP complementation. The proposed class CONSIDER overlaps with four of VN’s classes (Appoint-29.1, Characterize-29.5, Declare-29.4, and Conjecture-29.6), none of which were originally semantically homogeneous (see Fig. 1). The process of adding CONSIDER as another class of verbs with predicative complement gave us the opportunity to revise these four problematic classes making them more semantically homogeneous by using the more detailed coverage of complementation presented in K&B.

Fig. 1 Original classes of predicative complement and the new Consider-29.9 class

5.3 Integrating the K&R classes into VerbNet

Integrating the second set of candidate classes proceeded much as the integration of the first set. Of the 53 suggested classes, seven were omitted as they did not fully meet the requirements of Levin-style classes, 11 were judged to overlap to a reasonable extent with a pre-existing class, and 36 were added as new classes (one candidate class was divided into two new classes).

5.3.1 Novel classes and subclasses

In total, 35 classes from K&R were regarded as sufficiently novel for addition to VN without restructuring of an existing VN class. In addition, one class was divided into two new classes, PROMISE and ENSURE. As with K&B, 10 classes overlapped semantically, but not syntactically with existing VN classes, and hence were added as new subclasses. Examples of such classes include the proposed classes INTERROGATE and BEG, which were added as subclasses of the classes concerning Communication. The remaining 26 classes were added as new classes. Examples include the classes REQUIRE, DOMINATE, SUBJUGATE, and HIRE, all of which express novel concepts.
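The counts reported for the K&R classes fit together as follows. This is a worked check under one reading of the figures in Sect. 5.3, and the variable names are ours.

```python
# Figures as stated in Sect. 5.3 (variable names are illustrative).
proposed = 53
omitted = 7
overlapping_existing = 11

# Candidate classes regarded as sufficiently novel.
novel_candidates = proposed - omitted - overlapping_existing   # 35

# One candidate was divided into two classes (PROMISE and ENSURE).
added_classes = novel_candidates + 1                           # 36

# Of these, 10 were added as subclasses; the rest became new classes.
new_subclasses = 10
new_classes = added_classes - new_subclasses                   # 26
```

On this reading, the 36 classes mentioned at the start of Sect. 5.3 are the 35 novel candidates after the split, and they divide into 10 subclasses and 26 new classes.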

5.3.2 Additions to existing classes

Eleven of the candidate classes overlapped significantly both syntactically and semantically with an existing class. Examples include CLARIFY (overlaps the EXPLAIN class of the first candidate set), DELEGATING POWER (overlaps ALLOW of first candidate set), BEING IN CHARGE OF (overlaps second candidate set DOMINATE). Unlike with K&B classes, very little restructuring was needed for these cases. In each case, the proposed class contained a subset of the SCFs in the class it overlapped with or contained one or two additional SCFs which were compatible with the pre-existing class.

6 The extended VerbNet

A summary of how this integration affected VN and the result of the extended lexicon is shown in Table 5. The figures show that our work enriched and expanded VN considerably. The number of first-level classes grew significantly (from 191 to 274). There was also a significant increase in the number of verb senses and lemmas, along with the set of semantic predicates and the syntactic restrictions on sentential complements.

Table 5 Summary of the lexicon’s extension

We also examined the qualitative contributions of K&B and K&R to VN. The most salient difference between the two candidate sets is in the categories of activities they include. Many of the 42 K&B classes tended to cluster among three broad categories:

  1. Classes describing the interaction of two animate entities: 14 of the classes describe interactions or relationships among entities in some social context (see the following examples). The interaction can be cooperative or non-cooperative, and the two entities may or may not be thought to exist in some power relationship:

     (a) FORCE (e.g., encourage, force, pressure):
         John forced Bill.
         Agent V Patient

     (b) BATTLE (e.g., battle, debate, fight):
         John battled with Bill over the insult.
         Actor1 V with Actor2 over Topic

  2. Classes describing the degree of engagement of an entity with an activity: Eleven classes involve an agent and an activity in which the agent is involved, but differ in how the agent approaches the activity, e.g.

     (a) TRY (e.g., try, attempt, intend):
         John tried the exercise routine.
         Agent V Theme

     (b) NEGLECT (e.g., neglect, fail, manage):
         John neglected the job.
         Agent V Theme

  3. Classes describing the relation of an entity and some abstract idea: Six of the classes describe relations between entities and abstract ideas, such as the entity’s attitude towards some idea, e.g.

     (a) DISCOVER (e.g., ascertain, discover):
         John discovered how to do it.
         Agent V Theme

     (b) WISH (e.g., aim, intend, wish):
         John wishes to go home.
         Experiencer V Theme

The K&R classes address a much broader range of concepts (they also cover a wider range of complementation types) than the K&B classes. There is, again, a group of ten classes that could be considered broadly as describing social interactions among animate entities (e.g., DOMINATE, SUBJUGATE, HIRE). The remaining classes form small clusters of 2–4 classes, or are among the ten completely idiosyncratic classes.

  1. Small clusters: For example, ESTABLISH and PATENT classes describe activities of bringing into existence, but, unlike the existing Create-26.4 class, they relate to the creation of abstractions such as organizations or ideas.

     (a) ESTABLISH (e.g., found, establish, initiate):
         John established the company.
         Agent V Theme

     (b) PATENT (e.g., copyright, patent, register):
         John patented his discovery.
         Agent V Theme

  2. Idiosyncratic classes: Examples of these include classes such as USE and MULTIPLY.

     (a) USE (e.g., apply, employ, use):
         John used the money well.
         Agent V Theme

     (b) MULTIPLY (e.g., add, count, sum):
         John summed the numbers.
         Agent V Theme

With the integration of the new classes, which portray very diverse phenomena, the extended VN is now able to represent a much larger segment of the English language. The extended VN is now incorporated in the Unified Verb Index (UVI) (Trumbo 2006), available at http://www.verbs.colorado.edu/verb-index/, which merges information from four different NLP projects: VN, PropBank, FrameNet, and OntoNotes Sense Groupings. The Appendix includes sample UVI screenshots.

7 Conclusion and future work

Integrating the two recent extensions to Levin classes into VN was an important step in order to address a major limitation of Levin’s verb classification, namely the fact that verbs taking ADJP, ADVP, predicative, control, and sentential complements were not included or addressed in depth in that work. This limitation excludes many verbs that are highly frequent in language.

When evaluating the practical usefulness of the extended VN, the key issue is coverage, given that insufficient coverage has been the main limitation on the use of verb classes in practical NLP so far. We investigated the coverage over PropBank (Palmer et al. 2005), the annotation of the Penn Treebank II with semantic arguments. The list of verbs in VN before the class extensions included 3,445 lemmas which matched 78.45% of the verb tokens in the annotated PropBank data (88,584 occurrences). The extended version of VN contains 3,769 lemmas which greatly increased the coverage of VN to now match 90.86% of the (102,600) PropBank verb occurrences. These numbers reflect only the verb lemmas covered independently of their class or frame membership.
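Lemma-level token coverage of this kind can be computed as sketched below. The function name, the toy token list, and the two toy lexicons are hypothetical. The sketch mirrors only the definition used here (a token counts as covered when its lemma is in the lexicon, regardless of class or frame) and does not reproduce the PropBank figures.

```python
from collections import Counter


def token_coverage(token_lemmas, lexicon_lemmas):
    """Fraction of verb tokens whose lemma appears in the lexicon."""
    counts = Counter(token_lemmas)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for lemma, n in counts.items() if lemma in lexicon_lemmas)
    return covered / total


# Toy corpus of verb tokens, already lemmatized (hypothetical data).
tokens = ["say"] * 4 + ["break"] * 2 + ["seem"] * 2 + ["force", "blog"]

lexicon_before = {"say", "break"}
lexicon_after = lexicon_before | {"seem", "force"}

coverage_before = token_coverage(tokens, lexicon_before)
coverage_after = token_coverage(tokens, lexicon_after)
```

In the toy data, adding two lemmas raises coverage from 0.6 to 0.9, because the added lemmas account for three of the ten tokens. A handful of frequent verbs can therefore move token coverage far more than their share of the lemma inventory would suggest.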

Recently a manual mapping has been created between verb senses in PropBank and those in VN extended with the K&B classes (Loper et al. 2007) (http://www.verbs.colorado.edu/semlink/). This mapping was used to generate VN thematic role labels for the English Lexical Sample Semantic Role Labeling Task 17 for SemEval (http://www.nlp.cs.swarthmore.edu/semeval/) (Pradhan et al. 2007). In this mapping the token coverage of VN is 74.5%. This is lower than in our evaluation because of the filtering of verb lemmas with different senses and because of the focus on the first extension of VN only.

Korhonen and Briscoe (2004) showed that the K&B classes now incorporated in VN can be used to significantly aid subcategorization acquisition and that the extended classification has good coverage over WordNet. We can expect to see similar improved results on many NLP applications in the near future, given the wide use of VN in the research community. Currently, the use of verb classes in VN 1.0 is being attested in a variety of applications such as automatic verb acquisition (Swift 2005), semantic role labeling (Swier and Stevenson 2004; Yi et al. 2007), robust semantic parsing (Shi and Mihalcea 2005), word sense disambiguation (Dang 2004), building conceptual graphs (Hensman and Dunnion 2004), and creating a unified lexical resource for knowledge extraction (Crouch and King 2005), among others.

In the future, we hope to extend VN’s coverage further. We plan to search for additional novel classes and members using automatic methods, e.g., clustering. This is now realistic given the more comprehensive target and gold standard classification provided by VN. In addition, we plan to include in VN statistical information concerning the relative likelihood of different classes, SCFs and alternations for verbs in corpus data, using, e.g., the automatic methods proposed by McCarthy (2001) and Korhonen (2002). Such information can be highly useful for statistical NLP systems utilizing lexical classes.