
1 Introduction

Originally developed for efficient learning of context-free languages [3, 13], the method of distributional learning under the paradigm of identification in the limit from positive data and membership queries has been successfully applied to a number of more complex grammatical formalisms that derive objects (strings, trees, \(\lambda \)-terms, etc.) through local sets of derivation trees [9, 12, 14]. In these formalisms, a subtree s of a complete derivation tree \(t = c[s]\) contributes a certain “substructure” \(S = \phi (s)\) which is contained in the whole derived object \(T = \phi (t)\), and the remaining part \(c[\,]\) of the derivation tree contributes a function \(C = \phi (c[\,])\) that maps S to \(T = C(S)\). We can think of C as a “context” that surrounds S in T. Fixing a class \(\mathbb {G}\) of grammars fixes the set \(\mathbb {S}\) of possible substructures and the set \(\mathbb {C}\) of possible contexts that may be contributed by parts of possible derivation trees. Each language L generated by a grammar in \(\mathbb {G}\) acts as an arbiter that decides which context \(C \in \mathbb {C}\) should “accept” which substructure \(S \in \mathbb {S}\) (i.e., whether \(C(S) \in L\)).

Distributional learning algorithms come in two broad varieties. In the primal approach, the learner first extracts all substructures and all contexts that are contained in the input data, which is a finite set of elements of the target language \(L_{*}\). The learner then collects all subsets of the extracted substructures whose cardinality does not exceed a certain fixed bound m. These subsets are used as nonterminal symbols of the hypothesized grammar. Out of all possible grammar rules that can be written using these nonterminals, the learner lists those that use operations that may be involved in the generation of the objects in the input data. In the final step of the algorithm, the learner tries to validate each of these rules with the membership oracle, which answers a query “\(C(S) \in L_{*}\)?” in constant time. If a rule has a set \(\mathbf {S}\) of substructures on the left-hand side and sets \(\mathbf {S}_1,\dots ,\mathbf {S}_n\) on the right-hand side, and the grammatical operation associated with the rule is f, then the learner determines whether the following implication holds for all contexts C extracted from the input data:

$$\begin{aligned} C(S) \in L_{*} \text { for all } S \in \mathbf {S} \quad\text {implies}\quad C(f(S_1,\dots ,S_n)) \in L_{*} \text { for all } S_1 \in \mathbf {S}_1,\dots ,S_n \in \mathbf {S}_n. \end{aligned}$$
(1)

The grammar conjectured by the learner includes only those rules that pass this test.
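To make the validation step concrete, here is a sketch of how a single candidate rule could be tested against implication (1) using the membership oracle; substructures, contexts, and the operation f are treated as opaque values combined by caller-supplied functions, which is our own framing rather than part of any particular formalism.

```python
from itertools import product
from typing import Callable, Iterable, Sequence

def validate_rule(
    lhs: Sequence,                     # the set S of substructures on the left-hand side
    rhs: Sequence[Sequence],           # the sets S_1, ..., S_n on the right-hand side
    f: Callable,                       # the grammatical operation associated with the rule
    contexts: Iterable,                # all contexts C extracted from the input data
    apply_context: Callable,           # (C, S) -> the derived object C(S)
    member: Callable[[object], bool],  # the membership oracle "C(S) in L_*?"
) -> bool:
    """Check implication (1) for one candidate rule."""
    for C in contexts:
        # Premise: C accepts every substructure in the left-hand-side set S.
        if all(member(apply_context(C, S)) for S in lhs):
            # Conclusion: C must also accept f(S_1, ..., S_n) for every
            # choice of S_1 from S_1, ..., S_n from S_n.
            if any(not member(apply_context(C, f(*args)))
                   for args in product(*rhs)):
                return False
    return True
```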

The idea behind rule validation is the following: the elements of the nonterminal \(\mathbf {S}\) are required to jointly characterize the set of all substructures that can be derived from \(\mathbf {S}\) by the hypothesized grammar, in the sense that every context \(C \in \mathbb {C}\) that accepts all elements of \(\mathbf {S}\) must accept every substructure derived from \(\mathbf {S}\). Thus, only those rules that are consistent with this requirement are allowed in the hypothesized grammar. A remarkable property of the algorithm is that it successfully learns the language of every grammar in the given class \(\mathbb {G}\) that has the m-finite kernel property, in the sense that each nonterminal is characterized by a set of substructures of cardinality at most m.

In the dual approach to distributional learning, the role of contexts and substructures is switched. The learner uses as nonterminals subsets of the contexts extracted from the input data with cardinality \(\le m\), and uses the extracted substructures to validate candidate rules. The algorithm learns those languages that have a grammar with the m-finite context property in the sense that each nonterminal is characterized by a set of contexts of cardinality \(\le m\).

Whether each of these algorithms runs in polynomial time in the size of the input data D depends on several factors that are all determined by the grammar class \(\mathbb {G}\). The foremost among them is the enumeration of the two sets

$$\begin{aligned} \begin{aligned} \mathbb {S}|_D&= \{\, S \in \mathbb {S} \mid C(S) \in D\text { for some }C \in \mathbb {C} \,\},\\ \mathbb {C}|_D&= \{\, C \in \mathbb {C} \mid C(S) \in D\text { for some }S \in \mathbb {S} \,\}. \end{aligned} \end{aligned}$$

There are two possible difficulties in enumerating each of these sets in polynomial time. First, the sheer number of elements of the set may be super-polynomial, in which case explicit enumeration of the set is not possible in polynomial time. Second, recognizing which substructure/context belongs to the set may be computationally costly. The second problem, even when it arises, can often be dealt with by replacing the set in question by a more easily recognizable superset without disrupting the working of the algorithm. The first problem is the more pressing one.

With all linear grammar formalisms to which distributional learning has been applied, neither of these two difficulties arises. When these formalisms are extended to allow nonlinearity in grammatical operations, however, the problem of super-polynomial cardinality hits hard. Thus, with parallel multiple context-free grammars (PMCFGs), the nonlinear extension of multiple context-free grammars (successfully dealt with in [12]), the set \(\mathbb {C}\) becomes a much larger set, even though \(\mathbb {S}\) stays exactly the same. As a result, the cardinality of \(\mathbb {C}|_D\) is no longer bounded by a polynomial. The situation with IO context-free tree grammars, the nonlinear extension of simple context-free tree grammars (treated in [9]), is even worse: both of the sets \(\mathbb {S}|_D\) and \(\mathbb {C}|_D\) become super-polynomial in cardinality.

When only one of the two sets \(\mathbb {S}|_D\) and \(\mathbb {C}|_D\) is of super-polynomial cardinality, as is the case with PMCFGs, however, there is a way out of this plight [4]. The solution is to restrict the offending set by a certain property, parametrized by a natural number, so that its cardinality will be polynomial. The parametrized restriction leads to an increasing chain of subsets inside \(\mathbb {S}\) or \(\mathbb {C}\). In the case of PMCFGs, we get \(\mathbb {C}_1 \subset \mathbb {C}_2 \subset \mathbb {C}_3 \subset \cdots \subset \mathbb {C} = \bigcup _k \mathbb {C}_k\), where \(\mathbb {C}_k\) is the set of all possible contexts that satisfy the property with respect to the parameter k. The actual property used by [4] was a measure of nonlinearity of the context (“k-copying”), but this specific choice is not crucial for the correct working of the algorithm, as long as \(\mathbb {C}_k|_D\) can be enumerated in polynomial time. The learning algorithm now has two parameters, m and k: the former is a bound on the cardinality of sets of contexts the learner uses as nonterminals as before, and the latter is a restriction on the kind of context allowed in these sets. The class of languages successfully learned by the algorithm includes the languages of all grammars in the target class that have the (k,m)-finite context property in the sense that each nonterminal is characterized by a subset of \(\mathbb {C}_k\) of cardinality \(\le m\).

This algorithm does not learn the class of all grammars with the m-finite context property, but a proper subset of it. Nevertheless, the parametrized restriction has a certain sense of naturalness, and the resulting learnable class properly extends the corresponding linear class, so the weaker result is interesting in its own right.

In this paper, we explore the connection between distributional learning and context/substructure enumerability in the general setting of almost linear second-order abstract categorial grammars generating trees [5, 7] (“almost linear ACGs” for short). This class of grammars properly extends IO context-free tree grammars and is equivalent in tree generating power to tree-valued attribute grammars [1]. In fact, the expressive power of typed lambda calculus makes it possible to faithfully encode most known tree grammars within almost linear ACGs.

Like IO context-free tree grammars and unlike PMCFGs, almost linear ACGs in general do not allow polynomial-time enumerability either on the context side or on the substructure side. Only very special grammars do, and an interesting subclass of them consists of those grammars that allow only a bounded degree of nonlinearity in the contexts (or in the substructures). It is easily decidable whether a given ACG satisfies each of these properties. We show that both of the resulting classes of grammars indeed allow a kind of efficient distributional learning similar to that for PMCFGs.

2 Typed Lambda Terms and Almost Linear ACGs

2.1 Types and Typed Lambda Terms

We assume familiarity with the notion of a simply typed \(\lambda \) -term (à la Church) over a higher-order signature \(\varSigma = (A_{\varSigma }, C_{\varSigma },\tau _{\varSigma })\), where \(A_{\varSigma }\) is the set of atomic types, \(C_{\varSigma }\) is the set of constants, and \(\tau _{\varSigma }\) is a function from \(C_{\varSigma }\) to types over \(A_{\varSigma }\). We use standard abbreviations: \(\alpha _1 \rightarrow \dots \rightarrow \alpha _n \rightarrow p\) means \(\alpha _1 \rightarrow (\dots \rightarrow (\alpha _n \rightarrow p) \dots )\), and \(\lambda x_1^{\alpha _1} \dots x_n^{\alpha _n}.MN_1 \dots N_m\) is short for \(\lambda x_1^{\alpha _1}.\dots .\lambda x_n^{\alpha _n}.((\dots (M N_1) \dots ) N_m)\). The arity of \(\alpha = \alpha _1 \rightarrow \dots \rightarrow \alpha _n \rightarrow p\) with \(p \in A_{\varSigma }\) is \(\mathrm {arity}(\alpha ) = n\). We write \(\beta ^n \rightarrow p\) for the type \(\beta \rightarrow \dots \rightarrow \beta \rightarrow p\) of arity n.

We take for granted such notions as \(\beta \) - and \(\eta \) -reduction, \(\beta \) -normal form, and linear \(\lambda \) -terms. We write \(\mathrel {\twoheadrightarrow _{\beta }}\) and \(\mathrel {\twoheadrightarrow _{\eta }}\) for the relations of \(\beta \)- and \(\eta \)-reduction between \(\lambda \)-terms. Every typed \(\lambda \)-term has a \(\beta \)-normal form, unique up to renaming of bound variables, which we write as \(|M|_{\beta }\).

The set \(\mathrm {LNF}^{\alpha }_X(\varSigma )\) of \(\lambda \)-terms of type \(\alpha \) in \(\eta \) -long \(\beta \) -normal form (with free variables from X) is defined inductively as follows:

  • If \(x^{\alpha _1 \rightarrow \dots \rightarrow \alpha _n \rightarrow p} \in X\), \(M_1 \in \mathrm {LNF}^{\alpha _1}_X(\varSigma ),\dots , M_n \in \mathrm {LNF}^{\alpha _n}_X(\varSigma )\), and \(p \in A_{\varSigma }\), then \(x^{\alpha _1 \rightarrow \dots \rightarrow \alpha _n \rightarrow p} M_1 \dots M_n \in \mathrm {LNF}^p_X(\varSigma )\).

  • If \(c \in C_{\varSigma }\), \(\tau _{\varSigma }(c) = \alpha _1 \rightarrow \dots \rightarrow \alpha _n \rightarrow p\), \(p \in A_{\varSigma }\), and \(M_1 \in \mathrm {LNF}^{\alpha _1}_X(\varSigma ),\dots , M_n \in \mathrm {LNF}^{\alpha _n}_X(\varSigma )\), then \(c M_1 \dots M_n \in \mathrm {LNF}^p_X(\varSigma )\).

  • If \(M \in \mathrm {LNF}^{\beta }_{X \cup \{x^{\alpha }\}}(\varSigma )\), then \(\lambda x^{\alpha }.M \in \mathrm {LNF}^{\alpha \rightarrow \beta }_X(\varSigma )\).

We often suppress the superscript and/or subscript in \(\mathrm {LNF}^{\alpha }_X(\varSigma )\). Note that \(\mathrm {LNF}^{\alpha }_{\varnothing }(\varSigma )\) denotes the set of closed \(\lambda \)-terms of type \(\alpha \) in \(\eta \)-long \(\beta \)-normal form. We note that if \(M \in \mathrm {LNF}^{\alpha \rightarrow \beta }(\varSigma )\) and \(N \in \mathrm {LNF}^{\alpha }(\varSigma )\), then \(|MN|_{\beta } \in \mathrm {LNF}^{\beta }(\varSigma )\).

Henceforth, we often suppress the type superscript on variables. This is just for brevity; each variable in a typed \(\lambda \)-term comes with a fixed type.

We use strings over \(\{ 0,1 \}\) to refer to positions inside a \(\lambda \)-term or a type. We write \(\varepsilon \) for the empty string, and write \(u \le v\) to mean u is a prefix of v. When \(u = u' 0^i\), we refer to \(u'\) as \(u 0^{-i}\).

The shape of a type \(\alpha \), written \([\alpha ]\), is defined by

$$\begin{aligned} \begin{aligned}{} [p]&= \{ \varepsilon \} \text { if }p\text { is atomic,}&[\alpha \rightarrow \beta ] = \{ \varepsilon \} \cup \{\, 1u \mid u \in [\alpha ] \,\} \cup \{\, 0u \mid u \in [\beta ] \,\}. \end{aligned} \end{aligned}$$

The elements of \([\alpha ]\) are the positions of \(\alpha \). A position u is positive if its parity (i.e., the number of 1s in u modulo 2) is 0, and negative if its parity is 1. We write \([\alpha ]^+\) and \([\alpha ]^-\) for the set of positive and negative positions of \(\alpha \), respectively. A position u of \(\alpha \) is a subpremise if \(u = u'1\) for some \(u'\). Such an occurrence is a positive (resp. negative) subpremise if it is positive (resp. negative). We write \([\alpha ]^+_{\mathrm {sp}}\) (resp. \([\alpha ]^-_{\mathrm {sp}}\)) for the set of positive (resp. negative) subpremises of \([\alpha ]\).

If \(u \in [\alpha ]\), the subtype of \(\alpha \) occurring at u, written \(\alpha /u\), is defined by

$$\begin{aligned} \begin{aligned} \alpha /\varepsilon&= \alpha ,&(\alpha \rightarrow \beta )/0u&= \beta /u,&(\alpha \rightarrow \beta )/1u&= \alpha /u. \end{aligned} \end{aligned}$$

If \(\alpha /u = \beta \), we say that \(\beta \) occurs at position u in \(\alpha \).
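These definitions translate directly into code. The following is a minimal sketch (our own encoding of types as nested pairs, not anything from the paper) of positions, polarity, subpremises, and subtypes; it checks a couple of positions of the type \(\alpha \) used in Example 1 below.

```python
# Types over the atomic type o: an atomic type is a string, and an arrow type
# alpha -> beta is encoded as the pair (alpha, beta).

def positions(alpha):
    """The shape [alpha]: all positions of the type, as strings over {0,1}."""
    if not isinstance(alpha, tuple):
        return {''}
    a, b = alpha
    return {''} | {'1' + u for u in positions(a)} | {'0' + u for u in positions(b)}

def is_positive(u):
    """A position is positive iff it contains an even number of 1s."""
    return u.count('1') % 2 == 0

def subpremises(alpha):
    """Positions of the form u'1, i.e., premises of some arrow occurrence."""
    return {u for u in positions(alpha) if u.endswith('1')}

def subtype(alpha, u):
    """The subtype alpha/u occurring at position u."""
    for bit in u:
        a, b = alpha
        alpha = a if bit == '1' else b
    return alpha

# The type alpha of Example 1 below: o -> (o -> (o -> (o -> o) -> o) -> o) -> o.
o = 'o'
gamma = (o, ((o, o), o))               # o -> (o -> o) -> o
alpha = (o, ((o, (gamma, o)), o))
assert subtype(alpha, '0101') == gamma
assert subtype(alpha, '010101') == (o, o)
assert not is_positive('1') and not is_positive('01011')   # negative subpremises
```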

Given a \(\lambda \)-term M, the shape of M, written [M], is defined by

$$\begin{aligned} \begin{aligned}{} [M]&= \{ \varepsilon \}\quad \text {if }M\text { is a variable or a constant,}\\ [MN]&= \{ \varepsilon \} \cup \{\, 0u \mid u \in [M] \,\} \cup \{\, 1u \mid u \in [N] \,\},\\ [\lambda x.M]&= \{ \varepsilon \} \cup \{\, 0u \mid u \in [M] \,\}. \end{aligned} \end{aligned}$$

The elements of [M] are the positions of M.

If \(u \in [M]\), the subterm of M occurring at u, written \(M/u\), is defined by

$$\begin{aligned} \begin{aligned} M/\varepsilon&= M,&(MN)/0u&= M/u,&(MN)/1u&= N/u,&(\lambda x.M)/0u&= M/u. \end{aligned} \end{aligned}$$

When \(N = M/u\), we sometimes call u an occurrence of N (in M).

When \(v \in [M]\) but \(v0 \not \in [M]\), M/v is a variable or a constant. For each \(u \in [M]\), we refer to the unique occurrence of a variable or constant in [M] of the form \(u 0^k\) as the head of u (in M); we also call the variable or constant occurring at the head of u the head of M/u.

A position \(v \in [M]\) binds a position \(u \in [M]\) if M/u is a variable x and v is the longest prefix of u such that M/v is a \(\lambda \)-abstract of the form \(\lambda x.N\). When v binds u in M, we write \(v = b_{M}(u)\). When every occurrence in M of a \(\lambda \)-abstract is the binder of some position, M is called a \(\lambda I\) -term.

Let \(M \in \mathrm {LNF}^{\alpha }_{\varnothing }(\varSigma )\). Note that an occurrence \(v \in [M]\) of a variable or a constant of type \(\beta \) with \(\mathrm {arity}(\beta ) = n\) is always accompanied by n arguments, so that \(v 0^{-i}\) is defined for all \(i \le n\). The set of replaceable occurrences [2] of bound variables in M and the negative subpremise \(\mathrm {nsp}_{M}(u)\) of \(\alpha \) associated with such an occurrence u, are defined as follows:

  (i) If \(b_{M}(u) = 0^{j-1}\) for some \(j \ge 1\) (i.e., \(b_{M}(u)\) is the jth of the leading \(\lambda \)s of M), then u is replaceable and \(\mathrm {nsp}_{M}(u) = 0^{j-1} 1\).

  (ii) If \(b_{M}(u) = v 0^{-i} 1 0^{j-1}\) for some replaceable v and \(i,j \ge 1\) (i.e., \(b_{M}(u)\) is the jth of the leading \(\lambda \)s of the ith argument of v), then u is replaceable and \(\mathrm {nsp}_{M}(u) = \mathrm {nsp}_{M}(v) 0^{i-1} 1 0^{j-1} 1\).

It is easy to see that the following conditions always hold:

  • If u is a replaceable occurrence of a bound variable \(x^{\beta }\), then \(\beta = \alpha /\mathrm {nsp}_{M}(u)\).

  • If M is a \(\lambda I\)-term (in addition to belonging to \(\mathrm {LNF}^{\alpha }_{\varnothing }(\varSigma )\)), then for every \(v \in [\alpha ]^-_{\mathrm {sp}}\), there exists a \(u \in [M]\) such that \(\mathrm {nsp}_{M}(u) = v\).

Example 1

Let

$$\begin{aligned} M = \lambda y_1^o y_2^{o \rightarrow (o \rightarrow (o \rightarrow o) \rightarrow o) \rightarrow o}.y_2(f y_1 a)(\lambda y_3^o y_4^{o \rightarrow o}.f(y_4 (f y_3 y_1))(y_4 (f y_3 y_1))). \end{aligned}$$

Then \(M \in \mathrm {LNF}^{\alpha }_{\varnothing }(\varDelta )\), where \(\varDelta \) contains constants fa of type \(o \rightarrow o \rightarrow o\) and o, respectively, and

$$\begin{aligned} \alpha = {\mathop {o}\limits ^{1} \rightarrow \,(\underbrace{o \rightarrow (\underbrace{\mathop {o}\limits ^{01011} \rightarrow \,(\underbrace{o \rightarrow o}_{010101}) \rightarrow o}_{0101}) \rightarrow o}_{01}) \rightarrow o}. \end{aligned}$$
  • The bound variable \(y_1^o\) occurs in M at three positions, 000101, 001000111, 00100111, whose binder is \(\varepsilon \). These positions are associated with the negative subpremise 1 in \(\alpha \).

  • The bound variable \(y_2^{o \rightarrow (o \rightarrow (o \rightarrow o) \rightarrow o) \rightarrow o}\) occurs in M at one position, 0000, whose binder is 0. This position is associated with the subpremise 01 in \(\alpha \).

  • The bound variable \(y_3^o\) occurs in M at two positions, 0010001101 and 001001101, whose binder is 001. These positions are associated with the negative subpremise 01011.

  • The bound variable \(y_4^{o \rightarrow o}\) occurs in M at two positions, 00100010 and 0010010, whose binder is 0010. These positions are associated with the negative subpremise 010101.
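The recursion defining \(\mathrm {nsp}_{M}\) is easy to mechanize. The following Python sketch (our own encoding of \(\lambda \)-terms as nested tuples; all function names are ours) computes binders and negative subpremises and reproduces the positions listed in Example 1.

```python
# Lambda terms as nested tuples: ('var', x), ('const', c),
# ('app', M, N), ('lam', x, M).  Positions are strings over {0,1}.

def V(x): return ('var', x)
def K(c): return ('const', c)
def lam(x, body): return ('lam', x, body)
def app(*ts):                      # left-nested application M N_1 ... N_m
    t = ts[0]
    for s in ts[1:]:
        t = ('app', t, s)
    return t

def subterm(M, u):
    for bit in u:
        M = (M[1] if bit == '0' else M[2]) if M[0] == 'app' else M[2]
    return M

def binder(M, u):
    """The longest prefix v of u such that M/v is a lambda binding M/u."""
    x = subterm(M, u)[1]           # M/u must be a variable occurrence
    for i in range(len(u), -1, -1):
        N = subterm(M, u[:i])
        if N[0] == 'lam' and N[1] == x:
            return u[:i]
    raise ValueError('free variable')

def head_pos(M, u):
    """The unique position of the form u0^k at which a variable or constant occurs."""
    while subterm(M, u)[0] == 'app':
        u += '0'
    return u

def nsp(M, u):
    """The negative subpremise associated with a replaceable occurrence u."""
    b = binder(M, u)
    if '1' not in b:               # case (i): the (len(b)+1)-th leading lambda of M
        return b + '1'
    cut = b.rfind('1')             # case (ii): b = v 0^{-i} 1 0^{j-1}
    prefix, j_minus_1 = b[:cut], len(b) - cut - 1
    v = head_pos(M, prefix)        # the replaceable occurrence v
    i = len(v) - len(prefix)       # u sits under the i-th argument of v
    return nsp(M, v) + '0' * (i - 1) + '1' + '0' * j_minus_1 + '1'

# The term M of Example 1.
f, a = K('f'), K('a')
inner = lam('y3', lam('y4', app(f, app(V('y4'), app(f, V('y3'), V('y1'))),
                                   app(V('y4'), app(f, V('y3'), V('y1'))))))
M = lam('y1', lam('y2', app(V('y2'), app(f, V('y1'), a), inner)))

assert binder(M, '000101') == '' and nsp(M, '000101') == '1'       # y1
assert binder(M, '0000') == '0' and nsp(M, '0000') == '01'         # y2
assert nsp(M, '0010001101') == nsp(M, '001001101') == '01011'      # y3
assert nsp(M, '00100010') == nsp(M, '0010010') == '010101'         # y4
```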

2.2 Almost Linear Lambda Terms over a Tree Signature

Now we are going to assume that \(\varDelta \) is a tree signature; i.e., every constant of \(\varDelta \) is of type \(o^r \rightarrow o\) for some \(r \ge 0\), where o is the only atomic type of \(\varDelta \). For a closed \(M \in \mathrm {LNF}^{\alpha }_{\varnothing }(\varDelta )\), every occurrence of a bound variable in M is replaceable.

A tree is an element of \(\mathrm {LNF}^o_{\varnothing }(\varDelta )\). A closed \(\lambda \)-term \(M \in \mathrm {LNF}^{o^r \rightarrow o}_{\varnothing }(\varDelta )\) is called a tree context. We say that a tree context \(M = \lambda x_1 \dots x_r.N\) matches a tree T if there are trees \(T_1,\dots ,T_r\) such that \((\lambda x_1 \dots x_r.N) T_1 \dots T_r \mathrel {\twoheadrightarrow _{\beta }}T\). We say that M is contained in T if it matches a subtree of T.

The notion of an almost linear \(\lambda \)-term was introduced by Kanazawa [5, 7]. Briefly, a closed typed \(\lambda \)-term is almost linear if every occurrence of a \(\lambda \)-abstract \(\lambda x^{\alpha }.N\) in it binds a unique occurrence of \(x^{\alpha }\), unless \(\alpha \) is atomic, in which case it may bind more than one occurrence of \(x^{\alpha }\). Almost linear \(\lambda \)-terms share many of the properties of linear \(\lambda \)-terms; see [5, 8] for details.

Almost linear \(\lambda \)-terms are typically not \(\beta \)-normal. For instance, \(\lambda y^{o \rightarrow o}.(\lambda x^o.f xx)(yc)\), where f and c are constants of type \(o \rightarrow o \rightarrow o\) and o, respectively, is almost linear, but its \(\beta \)-normal form, \(\lambda y^{o \rightarrow o}.f(yc)(yc)\), is not. In this paper, we choose to deal with the \(\eta \)-long \(\beta \)-normal forms of almost linear \(\lambda \)-terms directly, rather than through their almost linear \(\beta \)-expanded forms.

We write \(\mathrm {AL}^{\alpha }(\varDelta )\) for the set of closed \(\lambda \)-terms in \(\mathrm {LNF}^{\alpha }_{\varnothing }(\varDelta )\) that \(\beta \)-expand to an almost linear \(\lambda \)-term. (The superscript is often omitted.) The following lemma, which we do not prove here, may be taken as the definition of \(\mathrm {AL}^{\alpha }(\varDelta )\) (see [7, 8] for relevant properties of almost linear \(\lambda \)-terms):

Lemma 1

Let M be a closed \(\lambda I\)-term in \(\mathrm {LNF}^{\alpha }_{\varnothing }(\varDelta )\). Then \(M \in \mathrm {AL}^{\alpha }(\varDelta )\) if and only if the following conditions hold for all bound variable occurrences \(u,v \in [M]\) such that \(\mathrm {nsp}_{M}(u) = \mathrm {nsp}_{M}(v)\), where \(n = \mathrm {arity}(\alpha /\mathrm {nsp}_{M}(u))\):

  (i) \(\{\, w \mid u 0^{-n} w \in [M] \,\} = \{\, w \mid v 0^{-n} w \in [M] \,\}\).

  (ii) If \(M/u 0^{-n} w\) is a constant, then \(M/u 0^{-n} w = M/v 0^{-n} w\).

  (iii) If \(M/u 0^{-n} w\) is a variable, then \(M/v 0^{-n} w\) is also a variable and \(\mathrm {nsp}_{M}(u 0^{-n} w) = \mathrm {nsp}_{M}(v 0^{-n} w)\).

We call \(M \in \mathrm {AL}^{\alpha }(\varDelta )\) a canonical writing if for all bound variable occurrences u, v of M, \(\mathrm {nsp}_{M}(u) = \mathrm {nsp}_{M}(v)\) implies \(M/u = M/v\) and vice versa. For example, \(\lambda y_1^{(o \rightarrow o) \rightarrow o} y_2^{(o \rightarrow o) \rightarrow o}. f(y_1(\lambda z_1^{o}.z_1))(y_1(\lambda z_1^{o}.z_1))(y_2(\lambda z_2^{o}.z_2))\) is a canonical writing, whereas neither \(\lambda y_1^{(o \rightarrow o) \rightarrow o} y_2^{(o \rightarrow o) \rightarrow o}. f(y_1(\lambda z_1^{o}.z_1))(y_1(\lambda z_2^{o}.z_2))(y_2(\lambda z_3^{o}.z_3))\) nor \(\lambda y_1^{(o \rightarrow o) \rightarrow o} y_2^{(o \rightarrow o) \rightarrow o}. f(y_1(\lambda z_1^{o}.z_1))(y_1(\lambda z_1^{o}.z_1))(y_2(\lambda z_1^{o}.z_1))\) is.

Lemma 2

For every \(M \in \mathrm {AL}^{\alpha }(\varDelta )\), there exists a canonical writing \(M' \in \mathrm {AL}^{\alpha }(\varDelta )\) such that \(M' \equiv _{\alpha } M\).

A pure \(\lambda \)-term is a \(\lambda \)-term that contains no constant. We write \(\mathrm {AL}^{\alpha }\) for the subset of \(\mathrm {AL}^{\alpha }(\varDelta )\) consisting of pure \(\lambda \)-terms. An important property of \(\mathrm {AL}^{\alpha }(\varDelta )\) that we heavily rely on in what follows is that every \(M \in \mathrm {AL}^{\alpha }(\varDelta )\) can be expressed in a unique way as an application \({M}^{\circ } {M}^{\bullet }_{1} \dots {M}^{\bullet }_{l}\) of a pure \(\lambda \)-term \({M}^{\circ }\) to a list of tree contexts \({M}^{\bullet }_{1},\dots ,{M}^{\bullet }_{l}\). We call the former the container of M and the latter its stored tree contexts. These \(\lambda \)-terms satisfy the following conditions:

  1. \(l \le |[\alpha ]^+_{\mathrm {sp}}|+1\),

  2. \({M}^{\bullet }_{i} \in \mathrm {AL}^{o^{r_i} \rightarrow o}(\varDelta )\) for some \(r_i \le |[\alpha ]^-_{\mathrm {sp}}|\) for each \(i=1,\dots ,l\),

  3. \({M}^{\circ } \in \mathrm {AL}^{(o^{r_1} \rightarrow o) \rightarrow \dots \rightarrow (o^{r_l} \rightarrow o) \rightarrow \alpha }\),

  4. \({M}^{\circ } {M}^{\bullet }_{1} \dots {M}^{\bullet }_{l} \mathrel {\twoheadrightarrow _{\beta }}M\).

The formal definition of this separation of \(M \in \mathrm {AL}^{\alpha }(\varDelta )\) into its container and stored tree contexts is rather complex, but the intuitive idea is quite simple. The stored tree contexts of M are the maximal tree contexts that can be discerned in the input \(\lambda \)-term.

Example 2

Consider the \(\lambda \)-term M of type \(\alpha = o \rightarrow (o \rightarrow (o \rightarrow (o \rightarrow o) \rightarrow o) \rightarrow o) \rightarrow o\) in Example 1. This \(\lambda \)-term belongs to \(\mathrm {AL}^{\alpha }(\varDelta )\). Its container and stored tree contexts are:

$$\begin{aligned} \begin{gathered} {M}^{\circ } = \lambda z_1^{o \rightarrow o} z_2^{o \rightarrow o} z_3^{o \rightarrow o \rightarrow o} y_1^o y_2^{o \rightarrow (o \rightarrow (o \rightarrow o) \rightarrow o) \rightarrow o}.y_2 (z_1 y_1) (\lambda y_3^o y_4^{o \rightarrow o}.z_2 (y_4 (z_3 y_3 y_1))),\\ \begin{aligned} {M}^{\bullet }_{1}&= \lambda x_1.f x_1 a,&{M}^{\bullet }_{2}&= \lambda x_1.f x_1 x_1,&{M}^{\bullet }_{3}&= \lambda x_1 x_2.f x_1 x_2. \end{aligned} \end{gathered} \end{aligned}$$

Here is the formal definition. Let \(M \in \mathrm {AL}^{\alpha }(\varDelta )\). We assume that M is canonical. Then \(|[\alpha ]^-_{\mathrm {sp}}|\) is exactly the number of distinct bound variables in M. Let \(s_1,\dots ,s_k\) list the elements of \([\alpha ]^-_{\mathrm {sp}}\) in lexicographic order. Let \(y_1,\dots ,y_k\) be the corresponding list of bound variables in M, and let \(n_i = \mathrm {arity}(\alpha /s_i)\) for each \(i=1,\dots , k\). Note that

$$\begin{aligned} \sum _{i=1}^k n_i \le |[\alpha ]^+_{\mathrm {sp}}|. \end{aligned}$$

The canonicity of M implies that every occurrence of \(y_i\) in M is accompanied by the exact same list of arguments \(N_{i,1},\dots , N_{i,n_i}\). The type of \(N_{i,j}\) is \(\alpha /s_i 0^{j-1} 1\).

Let \(x_1,\dots ,x_k\) be fresh variables of type o. For each subterm N of M of type o, define \(N^{\blacktriangle }\) by

$$\begin{aligned} \begin{aligned} (c T_1 \dots T_n)^{\blacktriangle }&= c T_1^{\blacktriangle } \dots T_n^{\blacktriangle },&(y_i N_{i,1} \dots N_{i,n_i})^{\blacktriangle }&= x_i. \end{aligned} \end{aligned}$$

Let \(M'\) be the maximal subterm of M of atomic type; in other words, \(M'\) is the result of stripping M of its leading \(\lambda \)s. Likewise, let \(N_{i,j}'\) be the maximal subterm of \(N_{i,j}\) of atomic type. Let \((M_1,\dots ,M_l)\) be the sublist of

$$\begin{aligned} (M', N_{1,1}',\dots ,N_{1,n_1}',\dots ,N_{k,1}',\dots ,N_{k,n_k}') \end{aligned}$$

consisting of the \(\lambda \)-terms whose head is a constant. (This list will contain duplicates if there exist \(i_1,j_1,i_2,j_2\) such that \((i_1,j_1) \ne (i_2,j_2)\), \(N_{i_1,j_1}' = N_{i_2,j_2}'\), and the head of this \(\lambda \)-term is a constant.) For each \(i = 1,\dots ,l\), let \(x_{m_{i,1}},\dots ,x_{m_{i,r_i}}\) list the variables in \(M_i^{\blacktriangle }\), in the order of their first appearances in \(M_i^{\blacktriangle }\). Define

$$\begin{aligned} \begin{aligned} {M}^{\bullet }_{i}&= \lambda x_{m_{i,1}} \dots x_{m_{i,r_i}}.M_i^{\blacktriangle },&\overrightarrow{M^{\bullet }}&= ({M}^{\bullet }_{1},\dots ,{M}^{\bullet }_{l}). \end{aligned} \end{aligned}$$

These are the stored tree contexts of M.

In order to define the container \({M}^{\circ }\), we first define \(N^{\vartriangle }\) by induction for each subterm N of M that is either (i) some \(M_i\), (ii) a \(\lambda \)-term of atomic type whose head is a variable, or (iii) a \(\lambda \)-abstract. Let \(z_1,\dots ,z_l\) be fresh variables of type \(o^{r_1} \rightarrow o,\dots ,o^{r_l} \rightarrow o\), respectively.

$$\begin{aligned}\begin{gathered} M_i^{\vartriangle } = z_i (y_{m_{i,1}} N_{m_{i,1},1} \dots N_{m_{i,1},n_{m_{i,1}}})^{\vartriangle } \dots (y_{m_{i,r_i}} N_{m_{i,r_i},1} \dots N_{m_{i,r_i},n_{m_{i,r_i}}})^{\vartriangle },\\ \begin{aligned} (y_i N_{i,1} \dots N_{i,n_i})^{\vartriangle }&= y_i N_{i,1}^{\vartriangle } \dots N_{i,n_i}^{\vartriangle },\\ (\lambda y_i.N)^{\vartriangle }&= \lambda y_i.N^{\vartriangle }. \end{aligned} \end{gathered}\end{aligned}$$

Finally, define

$$\begin{aligned} {M}^{\circ } = \lambda z_1 \dots z_l.M^{\vartriangle }. \end{aligned}$$

Lemma 3

\({M}^{\circ }, \overrightarrow{M^{\bullet }}\) satisfy the required conditions.

Lemma 4

Let \(N \in \mathrm {AL}^{\alpha _1 \rightarrow \dots \rightarrow \alpha _n \rightarrow \beta }(\varDelta ), M_i \in \mathrm {AL}^{\alpha _i}(\varDelta )\ (i = 1,\dots ,n)\), and \(P = |NM_1 \dots M_n|_{\beta } \in \mathrm {AL}^{\beta }(\varDelta )\). Suppose

$$\begin{aligned} \begin{gathered} \begin{aligned} \overrightarrow{M_i^{\bullet }}&= ({(M_i)}^{\bullet }_{1},\dots ,{(M_i)}^{\bullet }_{l_i}),&{(M_i)}^{\bullet }_{j}&\in \mathrm {AL}^{o^{r_{i,j}} \rightarrow o}(\varDelta ), \end{aligned} \quad \overrightarrow{P^{\bullet }} = ({P}^{\bullet }_{1},\dots ,{P}^{\bullet }_{m}). \end{gathered} \end{aligned}$$

For \(i=1,\dots ,n\) and \(j=1,\dots ,l_i\), let \(c_{i,j}\) be a fresh constant of type \(o^{r_{i,j}} \rightarrow o\). Let \(\varDelta '\) be the tree signature that extends \(\varDelta \) with the \(c_{i,j}\), and let

$$\begin{aligned} Q = |N ({(M_1)}^{\circ } c_{1,1} \dots c_{1,l_1}) \dots ({(M_n)}^{\circ } c_{n,1} \dots c_{n,l_n})|_{\beta }. \end{aligned}$$

We can compute the container and stored tree contexts of \(Q \in \mathrm {AL}^{\beta }(\varDelta ')\) with respect to \(\varDelta '\). Then we have

$$\begin{aligned} \begin{aligned} {P}^{\circ }&= {Q}^{\circ },&{P}^{\bullet }_{h}&= |({Q}^{\bullet }_{h})[c_{i,j} := {(M_i)}^{\bullet }_{j}]|_{\beta } \quad (h = 1,\dots ,m), \end{aligned} \end{aligned}$$

where \([c_{i,j} := {(M_i)}^{\bullet }_{j}]\) denotes the substitution of \({(M_i)}^{\bullet }_{j}\) for each \(c_{i,j}\).

Definition 1

Let \(M \in \mathrm {AL}^{\alpha }(\varDelta )\).

  (i) The unlimited profile of M is \(\mathrm {prof}_{\infty }(M) = ({M}^{\circ },w_1,\dots ,w_l)\), where l is the length of \(\overrightarrow{M^{\bullet }} = ({M}^{\bullet }_{1},\dots ,{M}^{\bullet }_{l})\) and for each i, \(w_i\) is the \(r_i\)-tuple of positive integers whose jth component is the number of occurrences of the jth bound variable in \({M}^{\bullet }_{i}\).

  (ii) For \(k \ge 1\), the k-threshold profile of M, written \(\mathrm {prof}_k(M)\), is just like its unlimited profile except that any number greater than k is replaced by \(\infty \).

The type of the (unlimited or k-threshold) profile of M is \(\alpha \).

Example 3

The unlimited profile of the \(\lambda \)-term M from Example 1 is \(\mathrm {prof}_{\infty }(M) = ({M}^{\circ },(1),(2),(1,1))\). Its 1-threshold profile is \(\mathrm {prof}_1(M) = ({M}^{\circ },(1),(\infty ),(1,1))\), and its k-threshold profile for \(k \ge 2\) is the same as its unlimited profile.
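The thresholding in Definition 1(ii), and the k-boundedness test introduced just below, amount to simple arithmetic on the multiplicity tuples. A minimal sketch (profiles written as plain tuples, with the container kept as an opaque placeholder string, which is our own simplification):

```python
from math import inf

def k_threshold(profile, k):
    """Definition 1(ii): replace every multiplicity greater than k by infinity.
    A profile is written as (container, w_1, ..., w_l), each w_i a tuple of counts."""
    container, *ws = profile
    return (container,) + tuple(tuple(c if c <= k else inf for c in w) for w in ws)

def is_k_bounded(profile, k):
    """A profile is k-bounded if every multiplicity lies in {1, ..., k}."""
    return all(1 <= c <= k for w in profile[1:] for c in w)

prof_inf_M = ('M_container', (1,), (2,), (1, 1))    # Example 3, container kept opaque
assert k_threshold(prof_inf_M, 1) == ('M_container', (1,), (inf,), (1, 1))
assert k_threshold(prof_inf_M, 2) == prof_inf_M
assert not is_k_bounded(prof_inf_M, 1) and is_k_bounded(prof_inf_M, 2)
```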

Lemma 5

For each \(k \ge 1\) and type \(\alpha \), there are only finitely many k-threshold profiles of type \(\alpha \).

We say that a k-threshold profile \(({M}^{\circ },w_1,\dots ,w_l)\) is k-bounded if \(w_i \in \{ 1,\dots ,k \}^{r_i}\) for \(i=1,\dots ,l\). A \(\lambda \)-term \(M \in \mathrm {AL}(\varDelta )\) that has a k-bounded profile is called k-bounded. We write \(\mathrm {AL}^{\alpha }_k(\varDelta )\) for the set of all k-bounded \(\lambda \)-terms in \(\mathrm {AL}^{\alpha }(\varDelta )\).

Note that \(M \in \mathrm {AL}(\varDelta )\) is linear if and only if it is 1-bounded and has a linear container.

Lemma 6

Let \(N \in \mathrm {AL}^{\alpha _1 \rightarrow \dots \rightarrow \alpha _n \rightarrow \beta }(\varDelta )\), and \(M_i,M_i' \in \mathrm {AL}^{\alpha _i}(\varDelta )\) for each \(i=1,\dots ,n\). Suppose that for each \(i=1,\dots ,n\), \(\mathrm {prof}_k(M_i) = \mathrm {prof}_k(M_i')\). Then \(\mathrm {prof}_k(|N M_1 \dots M_n|_{\beta }) = \mathrm {prof}_k(|N M_1' \dots M_n'|_{\beta })\).

The above lemma justifies the notation \(N \pi _1 \dots \pi _n\) for \(\mathrm {prof}_k(|NM_1 \dots M_n|_{\beta })\) with \(\mathrm {prof}_k(M_i) = \pi _i\), when k is understood from context. When \(N = \lambda x_1 \dots x_n.Q\), we may also write \(Q[x_1 := \pi _1,\dots ,x_n := \pi _n]\) for \(N \pi _1 \dots \pi _n\). In this way, we can freely write profiles in expressions that look like \(\lambda \)-terms, like \(\lambda x. \pi _1(M x \pi _2)\).

Lemma 7

Given a \(\lambda \)-term \(N \in \mathrm {AL}^{\alpha _1 \rightarrow \dots \rightarrow \alpha _n \rightarrow \beta }(\varDelta )\) and k-threshold profiles \(\pi _1,\dots ,\pi _n\) of type \(\alpha _1,\dots ,\alpha _n\), respectively, the k-threshold profile \(N\pi _1 \dots \pi _n\) can be computed in polynomial time.

In what follows, we often speak of “profiles” to mean k-threshold profiles, letting the context determine the value of k.

2.3 Almost Linear Second-Order ACGs on Trees

A (tree-generating) almost linear second-order ACG \(\mathscr {G} = (\varSigma ,\varDelta ,\mathcal {H},\mathscr {I})\) consists of a second-order signature \(\varSigma \) (abstract vocabulary), a tree signature \(\varDelta \) (object vocabulary), a set \(\mathscr {I} \subseteq A_{\varSigma }\) of distinguished types, and a higher-order homomorphism \(\mathcal {H}\) that maps each atomic type \(p \in A_{\varSigma }\) to a type \(\mathcal {H}(p)\) over \(A_{\varDelta }\) and each constant \(c \in C_{\varSigma }\) to its object realization \(\mathcal {H}(c) \in \mathrm {AL}^{\mathcal {H}(\tau _{\varSigma }(c))}(\varDelta )\). It is required that the image of \(\mathscr {I}\) under \(\mathcal {H}\) is \(\{o\}\). That \(\varSigma \) is second-order means that for every \(c \in C_{\varSigma }\), its type \(\tau _{\varSigma }(c)\) is of the form \(p_1 \rightarrow \dots \rightarrow p_n \rightarrow q\) with \(p_1,\dots ,p_n,q \in A_{\varSigma }\); thus, any \(\lambda \)-term in \(\mathrm {LNF}^p_{\varnothing }(\varSigma )\) for \(p \in A_{\varSigma }\) has the form of a tree. A closed abstract term \(P \in \mathrm {LNF}^{\alpha }_{\varnothing }(\varSigma )\) is homomorphically mapped by \(\mathcal {H}\) to its object realization \(|\mathcal {H}(P)|_{\beta } \in \mathrm {AL}^{\mathcal {H}(\alpha )}(\varDelta )\). For \(p \in A_{\varSigma }\), we write \(\mathcal {S}(\mathscr {G},p)\) for \(\{\, |\mathcal {H}(P)|_{\beta } \mid P \in \mathrm {LNF}^p_{\varnothing }(\varSigma ) \,\}\) and \(\mathcal {C}(\mathscr {G},p)\) for \(\{\, |\mathcal {H}(Q)|_{\beta } \mid Q \text { is a closed linear }\lambda \text {-term in }\mathrm {LNF}^{p \rightarrow s}_{\varnothing }(\varSigma )\text { for some }s \in \mathscr {I} \,\}\). The elements of these sets are substructures and contexts of \(\mathscr {G}\), respectively. The tree language generated by \(\mathscr {G}\) is \(\mathcal {O}(\mathscr {G}) = \bigcup _{s \in \mathscr {I}} \mathcal {S}(\mathscr {G},s)\).

An abstract constant \(c \in C_{\varSigma }\) together with its type \(\tau _{\varSigma }(c)\) and its object realization \(\mathcal {H}(c)\) corresponds to a rule in more traditional grammar formalisms. An abstract atomic type \(p \in A_{\varSigma }\) corresponds to a nonterminal. We say that \(\mathscr {G}\) is rule-k-bounded if \(\mathcal {H}(c)\) is k-bounded for every abstract constant \(c \in C_{\varSigma }\).

Definition 2

Let \(\mathscr {G} = (\varSigma ,\varDelta ,\mathcal {H},\mathscr {I})\) be a tree-generating almost linear second-order ACG.

  (i) We say that \(\mathscr {G}\) is substructure-k-bounded if \(\mathcal {S}(\mathscr {G},p) \subseteq \mathrm {AL}^{\mathcal {H}(p)}_k(\varDelta )\) for all atomic types \(p \in A_{\varSigma }\).

  (ii) We say that \(\mathscr {G}\) is context-k-bounded if \(\mathcal {C}(\mathscr {G},p) \subseteq \mathrm {AL}^{\mathcal {H}(p) \rightarrow o}_k(\varDelta )\) for all atomic types \(p \in A_{\varSigma }\).

The set of possible k-threshold profiles of elements of \(\mathcal {S}(\mathscr {G},p)\) or \(\mathcal {C}(\mathscr {G},p)\) can easily be computed thanks to Lemmas 5 and 6, so substructure-k-boundedness and context-k-boundedness are both decidable properties of almost linear second-order ACGs. Conversely, one can design a substructure-k-bounded almost linear ACG by first assigning to each \(p \in A_\varSigma \) a possible profile set \(\varPi _p\) consisting of profiles of type \(\mathcal {H}(p)\); then, as the realization \(\mathcal {H}(c)\) of a constant c of type \(p_1 \rightarrow \dots \rightarrow p_n \rightarrow q\), we admit only \(\lambda \)-terms in \(\mathrm {AL}_k^{\mathcal {H}(p_1 \rightarrow \dots \rightarrow p_n \rightarrow q)}(\varDelta )\) that satisfy

$$\begin{aligned} \mathcal {H}(c) \varPi _{p_1} \dots \varPi _{p_n} \subseteq \varPi _q,\end{aligned}$$
(2)

where \(M \varPi _1 \dots \varPi _n = \{\, M \pi _1 \dots \pi _n \mid \pi _i \in \varPi _i\ (i=1,\dots ,n)\,\}\). To construct a context-k-bounded almost linear ACG, we need to assign a possible context profile set \(\varXi _p\) in addition to \(\varPi _p\) to each \(p \in A_\varSigma \). The realization \(\mathcal {H}(c)\) must satisfy

$$\begin{aligned} \lambda x. \varXi _q( \mathcal {H}(c)\varPi _{p_1} \dots \varPi _{p_{i-1}} x \varPi _{p_{i+1}} \dots \varPi _{p_n}) \subseteq \varXi _{p_i} \end{aligned}$$
(3)

for all \(i=1,\dots ,n\), in addition to (2). Note that (2) and (3) are “local” properties of rules of ACGs. Instead of Definition 2, one may take this local constraint as a definition of substructure/context-k-bounded almost linear ACGs.

Example 4

Let \(\mathscr {G} = (\varSigma ,\varDelta ,\mathcal {H},\mathscr {I})\), where \(A_{\varSigma } = \{ p_1,p_2,s \}\), \(C_{\varSigma } = \{ a,b,c_1,c_2,d_1,d_2 \}\), \(\tau _{\varSigma }(a) = p_1 \rightarrow s\), \(\tau _{\varSigma }(b) = p_2 \rightarrow p_1\), \(\tau _{\varSigma }(c_i) = p_i \rightarrow p_i\), \(\tau _{\varSigma }(d_i) = p_i\), \(A_{\varDelta } = \{ o \}\), \(C_{\varDelta } = \{ e,f \}\), \(\tau _{\varDelta }(f) = o \rightarrow o \rightarrow o\), \(\tau _{\varDelta }(e) = o\), \(\mathscr {I} = \{ s \}\), \(\mathcal {H}(p_i) = (o \rightarrow o) \rightarrow o \rightarrow o\), \(\mathcal {H}(s)=o\) and

$$\begin{aligned} \mathcal {H}(a)&= \lambda x^{(o \rightarrow o) \rightarrow o \rightarrow o}.x (\lambda z^{o}. z) e,\\ \mathcal {H}(b)&= \lambda x^{(o \rightarrow o) \rightarrow o \rightarrow o}y^{o \rightarrow o}{z}^{o}.x(\lambda {w^o}. y({f w w})) {z},\\ \mathcal {H}(c_i)&= \lambda x^{(o \rightarrow o) \rightarrow o \rightarrow o}y^{o \rightarrow o}{z}^{o}.x(\lambda {w}^o. y w) (f {z} {z}),\\ \mathcal {H}(d_i)&= \lambda y^{o \rightarrow o}z^o.y(fzz) \,. \end{aligned}$$

This grammar is rule-2-bounded and generates the set of perfect binary trees of height \(\ge 1\). We have, for example, \(\mathcal {H}({b(c_2d_2)}) \in \mathcal {S}(\mathscr {G},p_1)\) and \(\mathcal {H}({\lambda x^{p_2}.a (c_1 (b (c_2x)))}) \in \mathcal {C}(\mathscr {G},p_2)\), and

$$\begin{aligned} \begin{aligned} |\mathcal {H}(b(c_2d_2))|_{\beta }&= \lambda y^{o \rightarrow o} z^{o}. y (f(f(fzz)(fzz))(f(fzz)(fzz))),\\ |\mathcal {H}(\lambda x^{p_2}.a(c_1(b (c_2 x))))|_{\beta }&= \lambda x^{(o \rightarrow o) \rightarrow o \rightarrow o}.x (\lambda z. f z z )(f(fee)(fee)). \end{aligned} \end{aligned}$$

One can see

$$\begin{aligned} \mathrm {prof}_\infty (\mathcal {S}(\mathscr {G},p_1))=\mathrm {prof}_\infty (\mathcal {S}(\mathscr {G},p_2)) =\{\,(\lambda {z_1}^{o \rightarrow o} y^{o \rightarrow o} {w}^o. y({z_1w}),(2^n)) \mid n \ge 1 \,\} \,,\end{aligned}$$

and

$$\begin{aligned} \mathrm {prof}_\infty (\mathcal {C}(\mathscr {G},p_1))&= \{ (\lambda {z_1}^o x^{(o \rightarrow o) \rightarrow o\rightarrow o}.x(\lambda {w}^o.{w}) {z_1}, ()) \}\,, \\ \mathrm {prof}_\infty (\mathcal {C}(\mathscr {G},p_2))&= \! \begin{aligned} \{&(\lambda {z_1}^o x^{(o \rightarrow o) \rightarrow o \rightarrow o}.x(\lambda {w}^o.{w}){z_1}, ()),\ \\&(\lambda {z_1}^{o \rightarrow o} {z_2}^o x^{(o \rightarrow o) \rightarrow o \rightarrow o}.x(\lambda {w}^o. {z_1} {w}) {z_2}, (2),()) \}\,. \end{aligned} \end{aligned}$$

The grammar is context-2-bounded, but not substructure-k-bounded for any k. If a new constant \(a'\) of type \(p_1 \rightarrow s\) with \(\mathcal {H}(a')=\lambda x^{(o \rightarrow o) \rightarrow o \rightarrow o}.x(\lambda z^o.fzz)e\) is added to \(\mathscr {G}\), the grammar is not context-2-bounded any more, since \(| \mathcal {H}(\lambda x^{p_2}. a' (b x)) |_\beta = \lambda x^{(o \rightarrow o) \rightarrow o \rightarrow o}.x (\lambda z^o. f (f z z)(fzz) )e \in \mathcal {C}(\mathscr {G},p_2)\).
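Since the object realizations in Example 4 are closed \(\lambda \)-terms over the tree signature, they can be transcribed directly as higher-order Python functions. The following sketch is our own transcription (trees as nested tuples, application curried); it checks that a few derivations indeed evaluate to perfect binary trees, in line with the claim about \(\mathcal {O}(\mathscr {G})\) above.

```python
# Trees over the signature Delta of Example 4: the constant e and the binary
# constant f, encoded as 'e' and ('f', t1, t2); application is curried so
# that the Python code mirrors the lambda terms.
e = 'e'
f = lambda t1: lambda t2: ('f', t1, t2)

# Transcriptions of the object realizations given above.
H_a = lambda x: x(lambda z: z)(e)
H_b = lambda x: lambda y: lambda z: x(lambda w: y(f(w)(w)))(z)
H_c = lambda x: lambda y: lambda z: x(lambda w: y(w))(f(z)(z))    # H(c_1) = H(c_2)
H_d = lambda y: lambda z: y(f(z)(z))                              # H(d_1) = H(d_2)

def perfect(height):
    """The perfect binary tree of the given height >= 1."""
    t = e
    for _ in range(height):
        t = f(t)(t)
    return t

# a(d_1) realizes the perfect tree of height 1; each intervening b or c_i
# doubles the number of leaves, so a(b(c_2 d_2)) and a(c_1(b d_2)) give height 3.
assert H_a(H_d) == perfect(1)
assert H_a(H_b(H_c(H_d))) == perfect(3)
assert H_a(H_c(H_b(H_d))) == perfect(3)
```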

3 Extraction of Tree Contexts from Trees

We say that \(M \in \mathrm {AL}^{\alpha }(\varDelta )\) is contained in a tree T if there is an \(N \in \mathrm {AL}^{\alpha \rightarrow o}(\varDelta )\) such that \(NM \mathrel {\twoheadrightarrow _{\beta }}T\). The problem of extracting \(\lambda \)-terms in \(\mathrm {AL}^{\alpha }(\varDelta )\) contained in a given tree reduces to the problem of extracting tree contexts from trees.

Explicitly enumerating all tree contexts of type \(o^r \rightarrow o\) is clearly intractable. A perfect binary tree with n leaves (labeled by the same constant) contains more than \(2^n\) tree contexts of type \(o \rightarrow o\).

It is easy to explicitly enumerate all tree contexts of type \(o^r \rightarrow o\) that are k-copying in the sense that each bound variable occurs at most k times. (Just pick at most \(rk+1\) nodes to determine such a tree context.) Hence it is easy to explicitly enumerate all \(M \in \mathrm {AL}^{\alpha }_k(\varDelta )\) whose stored tree contexts (which are all k-copying) are contained in a given tree T. (Recall that there is a fixed finite set of candidate containers for each \(\alpha \).) Not all these \(\lambda \)-terms are themselves contained in T, but it is harmless and simpler to list them all than to enumerate exactly those \(\lambda \)-terms \(M \in \mathrm {AL}^{\alpha }_k(\varDelta )\) for which there is an \(N \in \mathrm {AL}^{\alpha \rightarrow o}(\varDelta )\) (which may not be k-bounded) such that \(NM \mathrel {\twoheadrightarrow _{\beta }}T\).
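To see concretely why the k-copying restriction makes enumeration easy, the following sketch (our own code, trees encoded as in the previous sketch) lists all 1-copying tree contexts of type \(o \rightarrow o\) contained in a given tree: each such context is determined by choosing a subtree and at most one hole position inside it, so there are only quadratically many.

```python
def positions(t):
    """All node positions of a tree, as tuples of child indices."""
    yield ()
    if isinstance(t, tuple):                 # ('f', child_1, ..., child_r)
        for i, child in enumerate(t[1:]):
            for p in positions(child):
                yield (i,) + p

def subtree(t, p):
    for i in p:
        t = t[1 + i]
    return t

def plug(t, p, s):
    """Replace the subtree of t at position p by s."""
    if not p:
        return s
    i = p[0]
    return t[:1 + i] + (plug(t[1 + i], p[1:], s),) + t[2 + i:]

def one_copying_contexts(t):
    """1-copying tree contexts of type o -> o contained in t, as Python functions.
    Pick a subtree s of t and at most one hole position inside s; the bound
    variable of the context then occurs at most once (r = 1, k = 1)."""
    for u in positions(t):
        s = subtree(t, u)
        yield lambda x, s=s: s                               # hole unused
        for v in positions(s):
            yield lambda x, s=s, v=v: plug(s, v, x)          # hole at v

# Example: contexts contained in the perfect binary tree of height 2.
t = ('f', ('f', 'e', 'e'), ('f', 'e', 'e'))
ctxs = list(one_copying_contexts(t))
assert any(c('*') == ('f', '*', ('f', 'e', 'e')) for c in ctxs)
```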

We consider distributional learners for tree-generating almost linear second-order ACGs that are capable of extracting k-copying tree contexts from trees. Such a learner conjectures rule-k-bounded almost linear ACGs, and uses only k-bounded substructures and k-bounded contexts to form its hypotheses.

4 Distributional Learning of One-Side k-bounded ACGs

We present two distributional learning algorithms, a primal one for the context-k-bounded almost linear ACGs, and a dual one for the substructure-k-bounded almost linear ACGs.

In distributional learning, we often have to fix certain parameters that restrict the class \(\mathbb {G}\) of grammars available to the learner as possible hypotheses, in order to make the universal membership problem solvable in polynomial time. This is necessary since the learner needs to check whether the previous conjecture generates all the positive examples received so far, including the current one. In the case of almost linear ACGs, the parameters are the maximal arity n of the type of abstract constants and the finite set \(\varOmega \) of the possible object images of abstract atomic types. When these parameters are fixed, the universal membership problem “\(T \in \mathcal {O}(\mathscr {G})\)?” is in P [7].

In addition to these two parameters, we also fix a positive integer k so that any hypothesized grammar is rule-k-bounded, for the reason explained in the previous section. The hypothesis space for our learners is thus determined by three parameters, \(\varOmega ,n,k\). We write \(\mathbb {G}(\varOmega ,n,k)\) for the class of grammars determined by these parameters.

In what follows, we often use sets of profiles or \(\lambda \)-terms inside expressions that look like \(\lambda \)-terms, as we did in (2) and (3) in Sect. 2.3.

4.1 Learning Context-k-bounded ACGs with the Finite Kernel Property

For \(\mathbf {T} \subseteq \mathrm {LNF}^o_{\varnothing }(\varDelta )\) and \(\mathbf {R} \subseteq \mathrm {AL}^\alpha (\varDelta )\), we define the k-bounded context set of \(\mathbf {R}\) with respect to \(\mathbf {T}\) by

$$\begin{aligned} \mathrm {Con}_k(\mathbf {T}|\mathbf {R}) = \{\, Q \in \mathrm {AL}_{k}^{\alpha \rightarrow o}(\varDelta ) \mid | QR |_\beta \in \mathbf {T} \text { for all } R \in \mathbf {R} \,\} \,.\end{aligned}$$

Definition 3

A context-k-bounded ACG \(\mathscr {G} = (\varSigma ,\varDelta ,\mathcal {H},\mathscr {I})\) is said to have the profile-insensitive (k,m)-finite kernel property if for every abstract atomic type \(p \in A_\varSigma \), there is a nonempty set \(\mathbf {S}_p \subseteq \mathcal {S}(\mathscr {G},p) \cap \mathrm {AL}_k^{\mathcal {H}(p)}(\varDelta )\) such that \(|\mathbf {S}_p| \le m\) and

$$\begin{aligned} \mathrm {Con}_k(\mathcal {O}(\mathscr {G})|\mathbf {S}_p) = \mathrm {Con}_k(\mathcal {O}(\mathscr {G})|\mathcal {S}(\mathscr {G},p)). \end{aligned}$$

This may be thought of as a primal analogue of the notion of (k,m)-FCP in [4] for the present case. It turns out, however, that designing a distributional learning algorithm targeting grammars satisfying this definition is neither elegant nor quite as straightforward as for existing distributional learning algorithms. One reason is that simply validating hypothesized rules against k-bounded contexts (see (1) in Sect. 1) does not produce a context-k-bounded grammar. Recall that to construct a context-k-bounded grammar, we must fix an assignment of an admissible substructure profile set \(\varPi _p\) and an admissible context profile set \(\varXi _p\) to each atomic type p, which restricts the object realizations of abstract constants of each type. We let our learning algorithm use such an assignment together with finite sets of k-bounded substructures in constructing grammar rules, and make the validation of rules sensitive to the context profile set assigned to the “left-hand side” nonterminal. This naturally leads to the following definition:

Definition 4

A context-k-bounded ACG \(\mathscr {G} = (\varSigma ,\varDelta ,\mathcal {H},\mathscr {I})\) is said to have the profile-sensitive (k,m)-finite kernel property ((k,m)-FKP\(_{\mathrm {prof}}\)) if for every abstract atomic type \(p \in A_\varSigma \), there is a nonempty set \(\mathbf {S}_p \subseteq \mathcal {S}(\mathscr {G},p) \cap \mathrm {AL}_k^{\mathcal {H}(p)}(\varDelta )\) such that \(|\mathbf {S}_p| \le m\) and

$$\begin{aligned} \mathrm {Con}_k(\mathcal {O}(\mathscr {G}) | \mathbf {S}_p) \cap \mathrm {prof}_k^{-1}(\varXi ) = \mathrm {Con}_k(\mathcal {O}(\mathscr {G}) | \mathcal {S}(\mathscr {G},p)) \cap \mathrm {prof}_k^{-1}(\varXi ) \,, \end{aligned}$$
(4)

where \(\varXi = \mathrm {prof}_k(\mathcal {C}(\mathscr {G},p))\). Such a set \(\mathbf {S}_p\) is called a characterizing substructure set of p.

Clearly, if a context-k-bounded grammar satisfies Definition 3, then it satisfies the (k,m)-FKP\(_{\mathrm {prof}}\), so the class of grammars with \((k,m)\text {-FKP}_{\mathrm {prof}}\) is broader than the class given by Definition 3. The notion of \((k,m)\text {-FKP}_{\mathrm {prof}}\) is also monotone in k in the sense that (4) implies

$$\begin{aligned} \mathrm {Con}_{k+1}(\mathcal {O}(\mathscr {G}) | \mathbf {S}_p) \cap \mathrm {prof}_{k+1}^{-1}(\varXi ') = \mathrm {Con}_{k+1}(\mathcal {O}(\mathscr {G}) | \mathcal {S}(\mathscr {G},p)) \cap \mathrm {prof}_{k+1}^{-1}(\varXi ') \,, \end{aligned}$$

where \(\varXi ' = \mathrm {prof}_{k+1}(\mathcal {C}(\mathscr {G},p))= \mathrm {prof}_{k}(\mathcal {C}(\mathscr {G},p))\), as long as \(\mathscr {G}\) is context-k-bounded. This means that as we increase the parameter k, the class of grammars satisfying \((k,m)\text {-FKP}_{\mathrm {prof}}\) monotonically increases. This is another advantage of Definition 4 over Definition 3.

The polynomial enumerability of the k-bounded \(\lambda \)-terms makes an efficient primal distributional learner possible for the class of context-k-bounded grammars in \(\mathbb {G}(\varOmega ,n,k)\) with the (k,m)-FKP\(_{\mathrm {prof}}\).

Algorithm. Hereafter we fix a learning target \(\mathbf {T}_* \subseteq \mathrm {LNF}^o_{\varnothing }(\varDelta )\) which is generated by \(\mathscr {G}_* = (\varSigma ,\varDelta ,\mathcal {H},\mathscr {I}) \in \mathbb {G}(\varOmega ,n,k)\) with the (k,m)-FKP\(_{\mathrm {prof}}\). We write \(\mathbf {S}^{[\varXi ]} = \mathrm {Con}_k(\mathbf {T}_* | \mathbf {S}) \cap \mathrm {prof}_k^{-1}(\varXi )\) for a k-bounded profile set \(\varXi \).

For a tree \(T \in \mathrm {LNF}^o_{\varnothing }(\varDelta )\), let \(\mathrm {Ext}^{\alpha }_k(T) = \{\, M \in \mathrm {AL}^{\alpha }_k(\varDelta ) \mid \text {the elements of } \overrightarrow{M^{\bullet }} \text { are contained in } T \,\}\). Define

$$\begin{aligned} \begin{aligned} \mathrm {Sub}_k^{\varOmega }(\mathbf {D})&= \bigcup \{\, \mathrm {Ext}^{\alpha }_k(T) \mid T \in \mathbf {D},\, \alpha \in \varOmega \,\},\\ \mathrm {Glue}_k^{\varOmega ,n}(\mathbf {D})&= \bigcup \{\, \mathrm {Ext}^{\alpha _1 \rightarrow \dots \rightarrow \alpha _j \rightarrow \alpha _0}_k(T) \mid T \in \mathbf {D},\, \alpha _i \in \varOmega \text { for }i = 0,\dots ,j \text { and } j \le n \,\},\\ \mathrm {Con}_k^{\varOmega }(\mathbf {D})&= \bigcup \{\, \mathrm {Ext}^{\alpha \rightarrow o}_k(T) \mid T \in \mathbf {D},\, \alpha \in \varOmega \,\}. \end{aligned} \end{aligned}$$

It is easy to see that \(\mathcal {H}(c) \in \mathrm {Glue}_k^{\varOmega ,n}(\mathbf {T}_*)\) for all \(c \in C_{\varSigma }\).

(Algorithm 1: the primal learner; pseudocode figure not reproduced here.)

Our learner (Algorithm 1) constructs a context-k-bounded ACG \(\hat{\mathscr {G}} = \mathcal {G}(\mathbf {K},\mathbf {B},\mathbf {F})= (\varGamma ,\varDelta ,\mathcal {J},\mathscr {J})\) from three sets \(\mathbf {K} \subseteq \mathrm {Sub}_k^\varOmega (\mathbf {D})\), \(\mathbf {B} \subseteq \mathrm {Glue}_k^{\varOmega ,n}(\mathbf {D})\) and \(\mathbf {F} \subseteq \mathrm {Con}_k^\varOmega (\mathbf {D})\), where \(\mathbf {D}\) is a finite set of positive examples given to the learner. As with previous primal learning algorithms, whenever we receive a positive example that is not generated by our current conjecture, we expand \(\mathbf {K}\) and \(\mathbf {B}\); in order to suppress incorrect rules, we keep expanding \(\mathbf {F}\) with every example.
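The pseudocode figure for Algorithm 1 is not reproduced above, so the following is a minimal sketch of the learner's main loop as described in this paragraph; the extraction functions, the grammar construction \(\mathcal {G}(\mathbf {K},\mathbf {B},\mathbf {F})\), and the universal membership test are passed in as parameters, and all names are ours.

```python
def primal_learner(examples, Sub_k, Glue_kn, Con_k, build_grammar, generates):
    """Sketch of the main loop of Algorithm 1 (primal learner).

    examples:       stream of positive examples (trees of the target language)
    Sub_k(D):       k-bounded substructures extracted from the data D
    Glue_kn(D):     candidate rule realizations extracted from D
    Con_k(D):       k-bounded contexts extracted from D
    build_grammar:  (K, B, F) -> G(K, B, F), built with membership queries
    generates:      (grammar, tree) -> bool, a universal membership test
    """
    D, K, B, F = [], set(), set(), set()
    grammar = build_grammar(K, B, F)
    for T in examples:
        D.append(T)
        F = Con_k(D)                    # contexts are always accumulated,
                                        # to weed out invalid constants
        if not all(generates(grammar, S) for S in D):
            K = Sub_k(D)                # undergeneration: enlarge the pool of
            B = Glue_kn(D)              # nonterminal seeds and rule realizations
        grammar = build_grammar(K, B, F)
        yield grammar                   # the conjecture after this example
```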

Each abstract atomic type of our grammar is a triple of a subset of \(\mathbf {K}\), a k-threshold profile set, and a k-bounded profile set:

$$\begin{aligned} {A}_\varGamma = \{\, [\![ \mathbf {S},\varPi ,\varXi ]\!] \mid {}&\mathbf {S} \subseteq \mathbf {K} \cap \mathrm {prof}_k^{-1}(\varPi ) \text { with }1 \le |\mathbf {S}| \le m\text {, where for some }\alpha \in \varOmega , \\&~ \varPi \text { is a set of }k\text {-threshold profiles of type }\alpha \text { and} \\&~ \varXi \text { is a set of }k\text {-bounded profiles of type }\alpha \rightarrow o \,\} \,.\end{aligned}$$

We have \(|A_\varGamma | \le 2^{2 \ell } |\mathbf {K}|^m\), where \(\ell \) is the total number of profiles of relevant types, which is a constant.

The set of distinguished types is defined as

$$\begin{aligned} \mathscr {J} = \{\, [\![ \,\mathbf {S},\,\{(\lambda z^o.z)\},\,\{(\lambda y^o.y)\}\, ]\!] \in A_\varGamma \mid \mathbf {S} \subseteq \mathbf {T}_* \,\} \,,\end{aligned}$$

which is determined by membership queries. Define \({\mathcal {J}}([\![ \mathbf {S},\varPi ,\varXi ]\!] )\) to be the type of the profiles in \(\varPi \).

We have an abstract constant \(d \in C_\varGamma \) such that

$$\begin{aligned} \tau _\varGamma (d)&= [\![ \mathbf {S}_1,\varPi _1,\varXi _1 ]\!] \rightarrow \dots \rightarrow [\![ \mathbf {S}_j,\varPi _j,\varXi _j ]\!] \rightarrow [\![ \mathbf {S}_0,\varPi _0,\varXi _0 ]\!] \text { with } j \le n\,, \\ \mathcal {J}(d)&= R \in \mathbf {B}\,, \end{aligned}$$

if

  • \(R \varPi _1 \dots \varPi _j \subseteq \varPi _0\),

  • \(\lambda x. \varXi _0 (R \varPi _1 \dots \varPi _{i-1} x \varPi _{i+1} \dots \varPi _j) \subseteq \varXi _i\) for \(i=1,\dots ,j\),

  • \(| Q (R S_1 \dots S_j) |_\beta \in \mathbf {T}_*\) for all \(Q \in \mathbf {S}_0^{[\varXi _0]} \cap \mathbf {F}\) and \(S_i \in \mathbf {S}_i\) for \(i=1,\dots ,j\).

The last condition is checked with the aid of the membership oracle.
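In code, the admission test for a candidate abstract constant can be phrased as below; this is a sketch under the assumption that helpers for the profile actions of Lemmas 6 and 7, for \(\mathrm {prof}_k\), and for \(\beta \)-normalized application are available (all names are ours).

```python
from itertools import product

def admit_constant(R, lhs, rhs, F, oracle, act_on_profiles,
                   act_on_context_profiles, profile_of, apply_context, apply_rule):
    """Admission test for a candidate abstract constant with realization R in B
    and type [[S_1,Pi_1,Xi_1]] -> ... -> [[S_j,Pi_j,Xi_j]] -> [[S_0,Pi_0,Xi_0]].

    Assumed helpers (names are ours):
      act_on_profiles(R, Pis)                  the set R Pi_1 ... Pi_j  (Lemma 7)
      act_on_context_profiles(Xi0, R, Pis, i)  the set lambda x. Xi_0(R ... x ...)
      profile_of(Q)                            prof_k(Q)
      apply_context(Q, S)                      |Q S|_beta
      apply_rule(R, (S_1, ..., S_j))           |R S_1 ... S_j|_beta
      oracle(T)                                answers "T in T_*?"
    """
    (S0, Pi0, Xi0), j = lhs, len(rhs)
    Pis = [Pi for (_, Pi, _) in rhs]
    if not act_on_profiles(R, Pis) <= Pi0:                       # first condition
        return False
    for i in range(1, j + 1):                                    # second condition
        if not act_on_context_profiles(Xi0, R, Pis, i) <= rhs[i - 1][2]:
            return False
    # third condition: validate against the contexts in S_0^[Xi_0] that were
    # actually extracted from the data (the set F)
    tests = [Q for Q in F
             if profile_of(Q) in Xi0
             and all(oracle(apply_context(Q, S)) for S in S0)]
    return all(oracle(apply_context(Q, apply_rule(R, combo)))
               for Q in tests
               for combo in product(*[S_i for (S_i, _, _) in rhs]))
```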

Lemma 8

We have \(\mathrm {prof}_k(N) \in \varPi \) for all \(N \in \mathcal {S}(\hat{\mathscr {G}},[\![ \mathbf {S},\varPi ,\varXi ]\!] )\), and \(\mathrm {prof}_k(M) \in \varXi \) for all \(M \in \mathcal {C}(\hat{\mathscr {G}},[\![ \mathbf {S},\varPi ,\varXi ]\!] )\). The grammar \(\hat{\mathscr {G}} = \mathcal {G}(\mathbf {K},\mathbf {B},\mathbf {F})\) is context-k-bounded.

Lemma 9

If \(\mathbf {K} \subseteq \mathbf {K}'\), then \(\mathcal {O}(\mathcal {G}(\mathbf {K},\mathbf {B},\mathbf {F})) \subseteq \mathcal {O}(\mathcal {G}(\mathbf {K}',\mathbf {B},\mathbf {F}))\).

If \(\mathbf {B} \subseteq \mathbf {B}'\), then \(\mathcal {O}(\mathcal {G}(\mathbf {K},\mathbf {B},\mathbf {F})) \subseteq \mathcal {O}(\mathcal {G}(\mathbf {K},\mathbf {B}',\mathbf {F}))\).

If \(\mathbf {F} \subseteq \mathbf {F}'\), then \(\mathcal {O}(\mathcal {G}(\mathbf {K},\mathbf {B},\mathbf {F})) \supseteq \mathcal {O}(\mathcal {G}(\mathbf {K},\mathbf {B},\mathbf {F}'))\).

Lemma 10

Let \(\mathbf {S}_p\) be a characterizing set of each atomic type \(p \in A_\varSigma \) of the target grammar \(\mathscr {G}_*\). Then \( \mathbf {S}_p \subseteq \mathrm {Sub}_k^\varOmega (\mathbf {T}_*) \). Moreover, if \(\mathbf {S}_p \subseteq \mathbf {K}\) for all \(p \in A_\varSigma \) and \(\mathcal {H}(c) \in \mathbf {B}\) for all \(c \in C_\varSigma \), then \(\mathbf {T}_* \subseteq \mathcal {O}(\mathcal {G}(\mathbf {K},\mathbf {B},\mathbf {F}))\) for any \(\mathbf {F}\).

We say that an abstract constant d of type \([\![ \mathbf {S}_1,\varPi _1,\varXi _1 ]\!] \rightarrow \dots \rightarrow [\![ \mathbf {S}_j,\varPi _j,\varXi _j ]\!] \rightarrow [\![ \mathbf {S}_0,\varPi _0,\varXi _0 ]\!] \) is invalid if \(| Q (\mathcal {J}(d) {S}_1 \dots {S}_j) |_\beta \notin \mathbf {T}_*\) for some \(Q \in \mathbf {S}_0^{[\varXi _0]}\) and \(S_i \in \mathbf {S}_i\).

Lemma 11

For every \(\mathbf {K}\) and \(\mathbf {B}\), there is a finite set \(\mathbf {F} \subseteq \mathrm {Con}_k^{\varOmega }(\mathbf {T}_*) \) of cardinality at most \(|\mathbf {B}||A_\varGamma |^{n+1}\) such that \(\mathcal {G}(\mathbf {K},\mathbf {B},\mathbf {F})\) has no invalid constant.

Lemma 12

If \(\mathcal {G}(\mathbf {K},\mathbf {B},\mathbf {F})\) has no invalid constant, then \(\mathcal {O}(\mathcal {G}(\mathbf {K},\mathbf {B},\mathbf {F})) \subseteq \mathbf {T}_*\).

Theorem 1

Algorithm 1 successfully learns all grammars in \(\mathbb {G}(\varOmega ,n,k)\) with the (k,m)-FKP\(_{\mathrm {prof}}\).

We remark on the efficiency of our algorithm. It is easy to see that the description sizes of \(\mathbf {K}\) and \(\mathbf {B}\) are polynomially bounded by that of \(\mathbf {D}\), and so is that of \(\varGamma \). We need at most a polynomial number of membership queries to construct a grammar. Thus Algorithm 1 updates its conjecture in polynomial time in \(\Vert \mathbf {D}\Vert \). Moreover, we do not need too much data. To make \(\mathbf {K}\) and \(\mathbf {B}\) satisfy the condition of Lemma 10, \(m |A_\varSigma |+|C_\varSigma |\) examples are enough. To remove invalid constants, polynomially many contexts are enough by Lemma 11.

4.2 Learning Substructure-k-bounded ACGs with the Finite Context Property

For sets \(\mathbf {T} \subseteq \mathrm {LNF}^o_\varnothing (\varDelta )\) and \(\mathbf {Q} \subseteq \mathrm {AL}_k^{\alpha \rightarrow o}(\varDelta )\), we define the k-bounded substructure set of \(\mathbf {Q}\) with respect to \(\mathbf {T}\) by

$$\begin{aligned} \mathrm {Sub}_k(\mathbf {T} | \mathbf {Q}) = \{\, R \in \mathrm {AL}_k^{\alpha }(\varDelta ) \mid | QR |_\beta \in \mathbf {T} \text { for all } Q \in \mathbf {Q} \,\} \,.\end{aligned}$$

Again, we target grammars that satisfy a property sensitive to profile sets assigned to nonterminals:

Definition 5

A substructure-k-bounded ACG \(\mathscr {G} = (\varSigma ,\varDelta ,\mathcal {H},\mathscr {I})\) is said to have the profile-sensitive (k,m)-finite context property ((k,m)-FCP\(_{\mathrm {prof}}\)) if for every abstract atomic type \(p \in A_\varSigma \), there is a nonempty set \(\mathbf {Q}_p \subseteq \mathcal {C}(\mathscr {G},p) \cap \mathrm {AL}_k^{\mathcal {H}(p) \rightarrow o}(\varDelta )\) of k-bounded \(\lambda \)-terms such that \(|\mathbf {Q}_p| \le m\) and

$$\begin{aligned} \mathrm {Sub}_k(\mathcal {O}(\mathscr {G}) | \mathbf {Q}_p) \cap \mathrm {prof}_k^{-1}(\varPi ) = \mathcal {S}(\mathscr {G},p) \,,\end{aligned}$$

where \(\varPi = \mathrm {prof}_k(\mathcal {S}(\mathscr {G},p))\). We call \(\mathbf {Q}_p\) a characterizing context set of p.

Algorithm. Our dual learner turns out to be considerably simpler than its primal cousin. While the primal learner uses two profile sets, the dual learner assigns just a single profile to each nonterminal. This corresponds to the fact that context profiles play no role in constructing a substructure-k-bounded grammar, and that the \((k,m)\text {-FCP}_{\mathrm {prof}}\) is preserved under the normalization that converts a grammar into an equivalent one \(\mathscr {G}'\) in which \(\mathrm {prof}_k(\mathcal {S}(\mathscr {G}',p))\) is a singleton for every abstract atomic type p of \(\mathscr {G}'\); this is not necessarily the case for the \((k,m)\text {-FKP}_{\mathrm {prof}}\).

Hereafter we fix a learning target \(\mathbf {T}_* \subseteq \mathrm {LNF}^o_\varnothing (\varDelta )\) which is generated by \(\mathscr {G}_* = (\varSigma ,\varDelta ,\mathcal {H},\mathscr {I}) \in \mathbb {G}(\varOmega ,n,k)\) with the (k,m)-FCP\(_{\mathrm {prof}}\). We write \(\mathbf {Q}^{[\pi ]} = \mathrm {Sub}_k( \mathbf {T}_* | \mathbf {Q} ) \cap \mathrm {prof}_k^{-1}(\pi )\) for a k-bounded profile \(\pi \).

Our learner (Algorithm 2) constructs a substructure-k-bounded ACG \(\hat{\mathscr {G}} = \mathcal {G}(\mathbf {F},\mathbf {B},\mathbf {K})= (\varGamma ,\varDelta ,\mathcal {J},\mathscr {J})\) from three sets \(\mathbf {F} \subseteq \mathrm {Con}_k^\varOmega (\mathbf {D})\), \(\mathbf {B} \subseteq \mathrm {Glue}_k^{\varOmega ,n}(\mathbf {D})\), and \(\mathbf {K} \subseteq \mathrm {Sub}_k^\varOmega (\mathbf {D})\), where \(\mathbf {D}\) is a finite set of positive examples.

(Algorithm 2: the dual learner; pseudocode figure not reproduced here.)

Each abstract atomic type of our grammar is a pair of a finite subset of \(\mathbf {F} \cap \mathrm {AL}_k^{\alpha \rightarrow o}(\varDelta )\) of cardinality at most m and a profile \(\pi \) whose type is \(\alpha \):

$$\begin{aligned} {A}_\varGamma = \{\, [\![ \mathbf {Q},\pi ]\!] \mid {}&\pi \text { is a }k\text {-bounded profile of type }\alpha \in \varOmega , \\&\mathbf {Q} \subseteq \mathbf {F} \cap \mathrm {AL}_k^{\alpha \rightarrow o}(\varDelta )\text { and } 1 \le |\mathbf {Q}| \le m \,\} \,.\end{aligned}$$

We have \(|A_\varGamma | \le |\mathbf {F}|^m {\ell }\) for \(\ell \) the number of possible profiles. We have only one distinguished type:

$$\begin{aligned} \mathscr {J} = \{\, [\![ \{\lambda y.y\},\, (\lambda z^o.z)\, ]\!] \,\} \,.\end{aligned}$$

We define \(\mathcal {J}([\![ \mathbf {Q},\pi ]\!] )\) to be the type of \(\pi \).

We have an abstract constant \(c \in C_\varGamma \) such that

$$\begin{aligned} \begin{aligned} \tau _\varGamma (c)&= [\![ \mathbf {Q}_1,\pi _1 ]\!] \rightarrow \dots \rightarrow [\![ \mathbf {Q}_j,\pi _j ]\!] \rightarrow [\![ \mathbf {Q}_0,\pi _0 ]\!] \text { with } j \le n\,,&\mathcal {J}(c)&= P \in \mathbf {B}\,, \end{aligned} \end{aligned}$$

if

  • \(\pi _0 = P \pi _1 \dots \pi _j\),

  • \(| Q (P S_1 \dots S_j) |_\beta \in \mathbf {T}_*\) for all \(Q \in \mathbf {Q}_0\) and \(S_i \in \mathbf {Q}_i^{[\pi _i]} \cap \mathbf {K}\).

The second clause is checked with the aid of the membership oracle. By construction, \(\mathrm {prof}_k(|\mathcal {J}(M)|_{\beta }) = \pi \) for every \(M \in \mathrm {LNF}_\varnothing ^{[\![ \mathbf {Q},\pi ]\!] }(\varGamma )\). Thus the grammar \(\hat{\mathscr {G}}\) is substructure-k-bounded.

Lemma 13

If \(\mathbf {F} \subseteq \mathbf {F}'\), then \(\mathcal {O}(\mathcal {G}(\mathbf {F},\mathbf {B},\mathbf {K})) \subseteq \mathcal {O}(\mathcal {G}(\mathbf {F}',\mathbf {B},\mathbf {K}))\).

If \(\mathbf {B} \subseteq \mathbf {B}'\), then \(\mathcal {O}(\mathcal {G}(\mathbf {F},\mathbf {B},\mathbf {K})) \subseteq \mathcal {O}(\mathcal {G}(\mathbf {F},\mathbf {B}',\mathbf {K}))\).

If \(\mathbf {K} \subseteq \mathbf {K}'\), then \(\mathcal {O}(\mathcal {G}(\mathbf {F},\mathbf {B},\mathbf {K})) \supseteq \mathcal {O}(\mathcal {G}(\mathbf {F},\mathbf {B},\mathbf {K}'))\).

Lemma 14

Let \(\mathbf {Q}_p\) be a characterizing set of each atomic type \(p \in A_\varSigma \) of the target grammar \(\mathscr {G}_*\). Then \( \mathbf {Q}_p \subseteq \mathrm {Con}_k^\varOmega (\mathbf {T}_*) \). Moreover, if \(\mathbf {Q}_p \subseteq \mathbf {F}\) for all \(p \in A_\varSigma \) and \(\mathcal {H}(c) \in \mathbf {B}\) for all \(c \in C_\varSigma \), then \(\mathbf {T}_* \subseteq \mathcal {O}(\mathcal {G}(\mathbf {F},\mathbf {B},\mathbf {K}))\) for any \(\mathbf {K}\).

We say that an abstract constant c of type \([\![ \mathbf {Q}_1,\pi _1 ]\!] \rightarrow \dots \rightarrow [\![ \mathbf {Q}_j,\pi _j ]\!] \rightarrow [\![ \mathbf {Q}_0,\pi _0 ]\!] \) is invalid if \(| Q (\mathcal {J}(c) S_1 \dots S_j) |_\beta \notin \mathbf {T}_*\) for some \(S_i \in \mathbf {Q}_i^{[\pi _i]}\) and \(Q \in \mathbf {Q}_0\).

Lemma 15

For every \(\mathbf {F}\) and \(\mathbf {B}\), there is a finite set \(\mathbf {K} \subseteq \mathrm {Sub}_k^{\varOmega }(\mathbf {T}_*) \) of cardinality at most \(n|\mathbf {B}||A_\varGamma |^{n+1}\) such that \(\mathcal {G}(\mathbf {F},\mathbf {B},\mathbf {K})\) has no invalid constant.

Lemma 16

If \(\mathcal {G}(\mathbf {F},\mathbf {B},\mathbf {K})\) has no invalid constant, then \(\mathcal {O}(\mathcal {G}(\mathbf {F},\mathbf {B},\mathbf {K})) \subseteq \mathbf {T}_*\).

Theorem 2

Algorithm 2 successfully learns all grammars in \(\mathbb {G}(\varOmega ,n,k)\) with the (k,m)-FCP\(_{\mathrm {prof}}\).

A remark similar to the one on the efficiency of Algorithm 1 applies to Algorithm 2.