1 Introduction

NooJ allows linguists to formalize various types of linguistic description: orthography and spelling, lexicons for simple words, multiword units and frozen expressions, inflectional and derivational morphology, and local, structural and transformational syntax. One important characteristic of NooJ is that all its linguistic descriptions are reversible, i.e. they can be used both by a parser (to recognize sentences) and by a generator (to produce sentences). (Silberztein 2011) and (Silberztein 2016) show how, by combining a parser and a generator and applying them to a syntactic grammar, we can build a system that takes one sentence as its input and produces all the sentences that share the same lexical material with the original sentence.

Here are two simple transformations (Footnote 1):

  • [Pron-0] Joe loves Lea = He loves Lea

  • [Passive] Joe loves Lea = Lea is loved by Joe

The second one can be implemented in NooJ via the grammar shown in Fig. 1:

This graph uses three variables: $N0, $V and $N1. When parsing the sentence Joe loves Lea, the variable $N0 stores the word Joe, $V stores the word loves and $N1 stores Lea. The grammar’s output “$N1 is $V_V+PP by $N0” then produces the string Lea is loved by Joe.

Note that morphological operations such as “$V_V+PP” operate on NooJ’s Atomic Linguistic Units (ALUs) rather than on plain strings; in other words, NooJ knows that the word form loves is an instance of the verb to love, and it can produce all the conjugated and derived word forms of this ALU (e.g. “loving”, “lovers”). Here, $V_V+PP takes the value of the variable $V (loves), lemmatizes it (love), produces all its verb forms and selects the ones that have the property +PP (i.e. Past Participle), yielding the result loved.
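As an illustration only, this computation can be sketched in a few lines of Python; the dictionaries, the function name `passive` and the regular-expression parse below are hypothetical stand-ins invented for the example, not NooJ’s actual data structures:

```python
import re

# Toy stand-ins for NooJ's lexical resources (hypothetical data):
# map each conjugated form to its lemma, and each lemma to its +PP form.
LEMMA = {"loves": "love", "loved": "love", "sees": "see"}
PAST_PARTICIPLE = {"love": "loved", "see": "seen"}

def passive(sentence: str) -> str:
    """Mimic the output "$N1 is $V_V+PP by $N0" for a simple N0 V N1 sentence."""
    m = re.fullmatch(r"(\w+) (\w+) (\w+)", sentence)
    if m is None:
        raise ValueError("not a simple N0 V N1 sentence")
    n0, v, n1 = m.groups()          # the $N0, $V and $N1 variables
    pp = PAST_PARTICIPLE[LEMMA[v]]  # $V_V+PP: lemmatize, then select the +PP form
    return f"{n1} is {pp} by {n0}"

print(passive("Joe loves Lea"))     # Lea is loved by Joe
```

The two-step lookup in `passive` mirrors the two steps of $V_V+PP: lemmatization first, then selection of the form carrying the required feature.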

One application of this rewriting system is Machine Translation, in which one grammar recognizes sentences in an input language and produces the corresponding “rewritten” sentences in another language; see for instance (Barreiro 2008) for Portuguese-English translation, (Fehri et al. 2010) for Arabic-French translation and (Ben et al. 2015) for Arabic-English translation.

As (Silberztein 2016) has shown, any serious attempt at describing a significant part of a language will involve the creation of a large number of elementary transformations (Fig. 2):

Previously, each of these transformations had to be described by two NooJ grammars, because each pair of sentences involves two opposite computations: we want to produce not only Lea is loved by Joe from the sentence Joe loves Lea, but also Joe loves Lea from the sentence Lea is loved by Joe, and the grammar that performs the reverse operation is different, as can be seen in the following figure (Fig. 3):

(Silberztein 2016) shows that it is not always possible to combine elementary transformations with each other in order to account for all complex sentences. For instance, it would not be possible to use the elementary graphs [Cleft-1], [Passive] and [Neg] to produce the sentence It is Lea who is not loved by Joe from the original sentence Joe loves Lea without modifying them profoundly, as the graph in Fig. 1 would not parse the intermediary sentence Joe does not love Lea.

Fig. 1.
figure 1

The [Passive] transformation

Fig. 2.
figure 2

Elementary transformations

Fig. 3.
figure 3

The [Passive-inv] transformation

The solution described by (Silberztein 2016) is based on the fact that all of NooJ’s linguistic resources are reversible, i.e. they can be used both to parse texts and to produce them. For instance, NooJ’s morphological grammars can be used to recognize and analyze word forms, but also to produce lists of forms in the form of a dictionary, as shown in Fig. 4.

Fig. 4.
figure 4

Generate all the word forms derived from France

NooJ’s command GRAMMAR > Generate Language automatically constructs the dictionary that contains all the word forms recognized by the morphological grammar, and associates each of the word forms with the corresponding grammar output. In the resulting dictionary, the linguistic information associated with each lexical entry corresponds in a sense to its linguistic analysis: analyzing the word form refranciser as a verb with the +Re (repetition) feature is similar to analyzing the sentence Joe does not love Lea with the +Neg (negation) operator. Our goal is then to construct a syntactic grammar that represents all the sentences transformed from the elementary sentence Joe loves Lea.
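As a rough analogy (the stemming rule, the rule list and all names below are invented for illustration and do not reflect NooJ’s internal format), Generate Language can be thought of as enumerating every path of the morphological grammar and pairing each produced form with that path’s output features:

```python
# Hypothetical toy version of GRAMMAR > Generate Language: enumerate the
# derivational rules and emit each recognized form with its analysis.
def stem(word: str) -> str:
    return word.lower().rstrip("e")         # France -> franc (toy stemming rule)

RULES = [
    # (affixation applied to the stem, features output by that path)
    (lambda s: s + "iser", "V"),            # franciser
    (lambda s: "re" + s + "iser", "V+Re"),  # refranciser (+Re = repetition)
]

def generate_dictionary(entry: str):
    s = stem(entry)
    return [(rule(s), features) for rule, features in RULES]

for form, features in generate_dictionary("France"):
    print(f"{form},{features}")             # one dictionary entry per path
```

Each emitted pair plays the role of a dictionary entry whose associated features constitute its morphological analysis, exactly as +Re marks refranciser.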

2 A Simple Grammar

I have constructed a simple grammar that contains three sub-grammars:

  • a grammar that recognizes declarative sentences, e.g. Joe cannot stop loving Lea

  • a grammar that recognizes interrogative sentences, e.g. Does Joe love Lea?

  • a grammar that recognizes noun phrases, e.g. Joe’s love for Lea.

2.1 Declarative Sentences

The graph Declarative shown in Fig. 5 recognizes the simple sentence Joe loves Lea (in the path at the very top), as well as over 1 million variants.

Fig. 5.
figure 5

Declarative sentences transformed from Joe loves Lea

  • Three nominalizations:

    Joe was in love with Lea (+NomV), Joe did no longer feel any love for Lea (+NomV), Joe is Lea’s lover (+Nom0), Lea has been Joe’s love (+Nom1)

  • Three extractions:

    It is not Joe who loves Lea (+Cleft0), It is Lea that Joe loved (+Cleft1), It was love that Joe felt for Lea (+CleftPred)

  • Three pronouns:

    He loves Lea (+Pron0), Lea is loved by him (+Pron0), Joe is in love with her (+Pron1), She is loved by Joe (+Pron1), Joe feels something for Lea (+PronPred).

  • Three elisions:

    Lea is loved (+Z0), Joe is in love (+Z1), Joe feels for Lea (+ZPred).

  • Five intensive adjectives and adverbs:

    Joe feels little love for Lea (+Intens0), Joe loves Lea a lot (+Intens1), Joe loved Lea very much (+Intens2), Lea was Joe’s passionate love (+Intens3), Joe is madly in love with Lea (+Intens4)

  • Two aspectual adverbs:

    Joe no longer loves Lea (+STOP), Joe still loves Lea (+CONT),

  • In the graph LOVE shown in Fig. 6, the embedded graph loves recognizes all conjugated forms of to love (loves, loved, has loved, was loving) as well as the corresponding negative (doesn’t love), aspectual (is still loving) and emphatic (does love) sequences.

    Fig. 6.
    figure 6

    The verb group

  • The graphs CAN, MIGHT, MUST, MUST2, NEED recognize modal sequences such as could no longer love, must have loved, needs to love, etc. The embedded graphs CONTINUE, KEEP, START and STOP are used to recognize aspectual sequences such as kept on loving, started to love, did not stop loving, etc.

By exploring all the paths in the Declarative graph, NooJ produces over 1 million declarative sentences (Fig. 7).

Fig. 7.
figure 7

The Declarative grammar represents over 1 million sentences

Similarly, the graph NounPhrase represents over 500,000 noun phrases, such as Lea’s lover (+Nom0), Joe’s mad love for Lea (+NomV) and Joe’s love (+Nom1).

The graph Interrogative represents over 3 million questions such as: Who used to love Lea? When did Joe’s love for Lea end? Why could Joe no longer love Lea? How come Joe still loves Lea?, etc.

3 Generalizing the Grammar

If we want to construct a syntactic grammar that recognizes all direct transitive sentences such as My cousin ate an apple, we can start by replacing Joe and Lea with a description of Noun Phrases, and love with any direct transitive verb.

Such a grammar would indeed recognize any sentence of structure NP V NP; however, it would be useless as a transformational grammar because it would not constrain the elements of the sentence. For instance, it would recognize the three sentences:

Lea is seen by Joe. The busy teacher dropped the broken pen. Joe saw Lea.

but it would not be able to tell that the first and the third sentences are linked by a transformation, whereas the second one is unrelated. Moreover, we want the grammar to process the two following sentences as two unrelated sentences:

Joe loves Lea. Joe is loved by Lea.

In order to produce the sentence Lea is loved by Joe (and reject the sentence Joe is loved by Lea) from the first one, we need to index the elements of the sentence:

NP0 V NP1 = NP1 is V-pp by NP0

and then set NP0, V and NP1 respectively to Joe, love and Lea. To do that, we use NooJ’s global variables.
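The effect of this indexing can be sketched as follows (a hypothetical Python illustration, not NooJ code): once the slots NP0, V and NP1 are bound by the active sentence, only the passive sentence whose arguments occupy the swapped positions is accepted:

```python
# Hypothetical illustration of the indexed rule NP0 V NP1 = NP1 is V-pp by NP0.
LEMMA = {"loves": "love"}
PAST_PARTICIPLE = {"love": "loved"}

def linked_by_passive(active: str, candidate: str) -> bool:
    """True iff candidate is the passive counterpart of the active sentence."""
    n0, v, n1 = active.split()                  # bind the NP0, V and NP1 slots
    expected = f"{n1} is {PAST_PARTICIPLE[LEMMA[v]]} by {n0}"
    return candidate == expected

print(linked_by_passive("Joe loves Lea", "Lea is loved by Joe"))  # True
print(linked_by_passive("Joe loves Lea", "Joe is loved by Lea"))  # False
```

Because the slots are filled once and reused, the two unrelated sentences above are never linked, which is exactly what the unindexed NP V NP grammar could not guarantee.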

The new grammar is very similar to the original Joe loves Lea grammar. As a matter of fact, the main Declarative graph in the new grammar is almost identical to the one shown in Fig. 5 (Footnote 2). However, we modified the graphs that contain the actual lexical material, as seen in Figs. 8 and 9.

Fig. 8.
figure 8

A generic grammar to represent the arguments of a direct transitive sentence.

Fig. 9.
figure 9

A generic grammar to represent the main verb of a direct transitive sentence.

The new version of the NP0 graph sets the global variable @N0 and uses a lexical constraint on the gender of @N0 in order to produce the correct pronoun (i.e. He for Joe and She for Lea). The corresponding NP0r graph produces the pronouns him (instead of He) and her (instead of She) to account for object complements.

Similarly, the new version of the LOVE graph sets the global variable @V and replaces all conjugated forms of the verb to love with a syntactic symbol such as <V>, <V+PP>, <V-G>, etc.
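The role of such syntactic symbols can be illustrated with a small hypothetical matcher (the form lexicon and function below are invented for the example): a symbol like <V+PP> matches any word form whose lexical features include V and PP, whatever its lemma, and the matched lemma is what gets stored in @V:

```python
# Toy form lexicon (hypothetical; not NooJ's actual representation):
# word form -> (lemma, set of features)
FORMS = {
    "loved":  ("love", {"V", "PP"}),
    "loving": ("love", {"V", "G"}),
    "seen":   ("see",  {"V", "PP"}),
}

def match_symbol(form: str, symbol: str):
    """Return the lemma (to be stored in @V) if the form matches e.g. '<V+PP>'."""
    required = set(symbol.strip("<>").split("+"))
    lemma, features = FORMS.get(form, (None, set()))
    return lemma if required <= features else None

print(match_symbol("loved", "<V+PP>"))   # love
print(match_symbol("loving", "<V+PP>"))  # None: wrong features
```

Replacing the enumerated forms of to love with such symbols is what turns the Joe loves Lea grammar into a grammar for any direct transitive verb.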

Nominalizations.

Remember that the original grammar accounts for three nominalizations:

Joe was in love with Lea (+NomV), Joe did no longer feel any love for Lea (+NomV), Joe is Lea’s lover (+Nom0), Lea has been Joe’s love (+Nom1)

In the original grammar, these three nominalized forms were described in the graphs love-N, lover and love-0. The same graphs have been modified as follows (Fig. 10):

Fig. 10.
figure 10

Three nominalizations

Note that the variable @V is used to store the predicate of the sentence, whether it is a verb or has been nominalized. In order for each of these graphs to be activated, the initial verb needs to have the corresponding derivational property: +NPred (the predicate is nominalized), +NAgent (the subject is nominalized) and +NObj (the object is nominalized); these three properties are produced by the corresponding derivational (DRV) paradigms applied to the lexical entry, e.g.:

  • love+FLX=LOVE+DRV=PRED_ID+DRV=AGENT_R:TABLE+DRV=OBJECT_ID:TABLE

  • PRED_ID = <E>/N+NPred;

  • AGENT_R = r/N+NAgent;

  • OBJECT_ID = <E>/N+NObj;
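The gating effect of these properties can be sketched as follows (hypothetical Python with an invented two-entry lexicon): a nominalization path produces a sentence only when the verb’s entry carries the corresponding derivational property, otherwise the path stays silent:

```python
# Hypothetical lexicon: each DRV property licenses one nominalization graph.
LEXICON = {
    "love": {"NPred": "love", "NAgent": "lover", "NObj": "love"},
    "see":  {},  # to see accepts none of the three derivations
}

def nominalizations(n0: str, verb: str, n1: str):
    derived = LEXICON[verb]
    out = []
    if "NAgent" in derived:                              # +Nom0
        out.append(f"{n0} is {n1}'s {derived['NAgent']}")
    if "NPred" in derived:                               # +NomV
        out.append(f"{n0} feels {derived['NPred']} for {n1}")
    if "NObj" in derived:                                # +Nom1
        out.append(f"{n1} is {n0}'s {derived['NObj']}")
    return out

print(nominalizations("Joe", "love", "Lea"))    # three nominalized sentences
print(nominalizations("Mary", "see", "Helen"))  # []: these paths never activate
```

No special mechanism blocks the ungrammatical sentences: they are simply never generated, which is the behavior described for Mary sees Helen below.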

Conversely, if a given verb does not accept one of these derivations, then the sentences that contain this derivation will simply not be produced. For instance, if we start from the sentence Mary sees Helen, only 600,000 declarative sentences are produced, because the verb to see does not accept any of the three derivations:

  • Mary sees Helen = *Mary is a seer

  • Mary sees Helen = *Mary (does | feels) some seeing for Helen

  • Mary sees Helen = *Helen is Mary’s see

The list of features produced with each transformed sentence represents its transformational analysis. For instance, the resulting sentence “Mary couldn’t see Helen” is associated with the features S+Declarative+CAN+Preterit+Neg (Fig. 11).

Fig. 11.
figure 11

Produce all the sentences transformed from Mary sees Helen

4 Conclusion

In this paper, I have presented a simple grammar capable of recognizing and producing a large number of sentences that are derived by transformations from the initial elementary sentence Joe loves Lea. This accounts for approximately 4 million sentences, including over 1 million declarative sentences, 500,000 noun phrases and over 3 million questions.

Modifying this grammar so that it can recognize any direct transitive sentence and produce the corresponding transformed sentences is surprisingly easy: it involves replacing word forms with more general syntactic symbols (e.g. replacing loved with <V+PP>) and using the global variables @N0, @V and @N1 to store the three elements of the sentence, which then provide the reference for the arguments wherever they are located in the sentence.

In contrast to the traditional point of view on transformational grammars, the system presented here does not require linguists to implement a new level of linguistic description distinct from the structural syntactic level: there is no need to implement specific transformational operators, nor to design a complex set of mechanisms to process chains of transformations.