1 Introduction

Context-free grammars are not able to cover all aspects of linguistic phenomena. The membership of a string in a context-sensitive grammar is decided in exponential time. To overcome the limitations of context-free and context-sensitive grammars, the concept of regulated grammar [1,2,3] was introduced. Regulated grammar is a class of grammar in which certain restrictions are imposed on context-free productions, making them more powerful than context-free grammar. Regulated grammar is classified into context and rule-based grammar. In context-based regulated grammar, restrictions are imposed on the context, whereas in rule-based regulated grammar, the restrictions are imposed on the application of rules.

In formal languages, the input string is either accepted (μ(w) = 1) or rejected (μ(w) = 0), where μ denotes the membership of a string, w. To reduce the gap between natural and classical formal languages, fuzziness has been introduced into formal languages, thereby giving rise to fuzzy languages. In recent years, fuzzy languages have been applied in various areas, such as intelligent interface design [4, 5], clinical monitoring [6], neural networks [7, 8] and pattern recognition [9]. Various fuzzy inference methods [10,11,12] are used in the design of fuzzy systems.

Motivated by the prior work of recognition of imperfect strings in context-free grammar [13] and context-sensitive grammar [14], in this paper, the recognition of faulty strings generated by the regulated grammar is described. As well, the problem of single and multiple fault occurrences is discussed. Faulty string refers to some imperfection caused in the string. Single fault refers to imperfection at one position in a string, whereas multiple faults refer to imperfection at more than one position in a string. Here, fuzzy sets are used to compute the membership of strings that do not exactly match with the grammar.

1.1 Previous work

Regularly controlled and matrix grammar were proposed by Ginsburg et al [15] and Cremers et al [16], respectively. In regularly controlled grammar, the control is regulated by the sequence of productions in the regular expression. In matrix grammar, control is regulated by following the same sequence given in the matrix. Solar and Meduna [17] proposed a state translation scheme, which is an extended version of state grammar [18], for producing two output strings in a single derivation. Meduna and Zemek [19] introduced one-sided random context grammars, a variant of random context grammar [20]. Furthermore, Meduna and Zemek [21] introduced one-sided random context grammars, with a limited number of right random context rules. Meduna [22] put forth generalized forbidding grammar. Furthermore, Meduna and Zemek [23] also introduced one-sided forbidding grammars, where each rule is further divided into left and right forbidden rules, and each rule consists of a finite set of forbidden strings. Meduna and Zemek [24] established that the power of one-sided forbidding grammars and selective substitution grammars [25, 26] are equivalent in the case of erasing rule grammar. Kalra and Kumar [27, 28] introduced the concept of deterministic deep pushdown automata and fuzziness in deep pushdown automata. Garhwal and Jiwari [29] introduced the concept of fuzziness in parallel regular expression.

The concept of fuzzy languages, a fuzzy subset of strings over an alphabet, was introduced by Lee and Zadeh [30]. Asveld [31] proposed fuzzy context-free K-grammar for describing correct and erroneous sentences. Asveld [32] devised the modified Cocke–Younger–Kasami (CYK) algorithm for recognizing fuzzy context-free grammar. Additionally, Schinder et al [13] described an approach for recognizing imperfect strings generated by the context-free grammar. Inui et al [14] expanded on the work of Schinder et al [13], describing an approach for identifying imperfect strings generated by the context-sensitive grammar. Schinder et al [13] and Inui et al [14] applied the concept of fuzzy sets to determine the membership values of strings. Here, the concept of errors and fuzzy sets in this context is utilized which was earlier used by Schinder et al [13] and Inui et al [14] regarding context-free grammar and context-sensitive grammar. Recently, Zhanga et al [33] and Bag et al [34] have applied the concept of pattern recognition in intelligent computing systems and prediction of consumer intention.

After introducing preliminary concepts in section 2, the following topics have been explained.

  • Recognition of faulty strings generated by regulated grammar using the different fuzzy sets has been proposed in section 3.

  • The problem of occurrence of multiple faults has been solved by converting the regulated grammar into Chomsky normal form in section 3.

  • The complete procedure is depicted by numerical example in section 4.

  • The proposed algorithm has been applied on the Hindi language to demonstrate its real- life application in section 5 and the conclusions are provided in section 6.

2 Preliminaries

Let T be an alphabet, λ denotes empty string and μ(w) denotes the membership of a string such that 0 ≤ μ(w) ≤ 1. _ represents the symbol that can be replaced by any symbol from T.

Definition 2.1

Regular expression over an alphabet T is defined using following rules:

  1. i)

    ϕ and λ are regular expressions representing the empty set and empty string.

  2. ii)

    If a ∊ T, then a is a regular expression representing the singleton set {a}.

  3. iii)

    If r1 and r2 are regular expressions, then r1r2, r1|r2 and r1* are regular expressions representing concatenation, union, and Kleene closure.

The regulated grammar consists of production rules similar as context-free grammar, but it has a larger generative capacity than context-free grammar. Regulated grammar lies between context-free and context-sensitive grammar. Context-sensitive language (such as ww, anbncn and \( a^{{2^{n} }} \)) involving interleaved dependencies can be represented using regulated grammar.

Definition 2.2

Regulated grammar is GR = (G, R), where G = (N, T, S, P) is a context-free grammar and R is the restriction applied on the derivations of strings. R depends on the type of regulated grammar [1,2,3].

Regularly controlled grammar is a rule-based regulated grammar, where the restriction on rules is applied using a regular expression.

Definition 2.3

Regularly controlled grammar [1,2,3] is GRC = (G, r), where G = (N, T, S, P) is a context-free grammar and r is a regular expression over P for controlling the derivations of strings.

Language L (GRC) represented by regularly controlled grammar GRC consists of all words w ∊ T* generated by the following sequence:

$$ S\mathop \to \limits^{{pr_{1} }} w_{1} \mathop \to \limits^{{pr_{2} }} w_{2} \cdots \mathop \to \limits^{{pr_{n} }} w \,\,{\text{such}}\,\,{\text{that}}\,\,pr_{1} , pr_{2} , \ldots , pr_{n} \in P $$

Example 2.1

Consider the regularly controlled grammar [2] GRC = (G1, r), where G1 = ({S, C, D}, {c, d}, S, P) where the production P is of the following forms:

$$ \begin{array}{*{20}l} {pr_{0} :S \to CD} \hfill & {\quad pr_{1} :C \to cC} \hfill \\ {pr_{2} :D \to cD} \hfill & { \quad pr_{3} :C \to dC} \hfill \\ {pr_{4} :D \to dD} \hfill & { \quad pr_{5} :C \to c} \hfill \\ {pr_{6} :D \to c} \hfill & { \quad pr_{7} :C \to d} \hfill \\ {pr_{8} :D \to d} \hfill & {} \hfill \\ \end{array} $$
$$ r = pr_{0} (pr_{1} pr_{2} ,pr_{3} pr_{4} )^{*} (pr_{5} pr_{6} ,pr_{7} pr_{8} ) $$

Consider w = cdcd

$$ S\mathop \to \limits^{{pr_{0} }} CD\mathop \to \limits^{{pr_{1} }} cCD\mathop \to \limits^{{pr_{2} }} cCcD\mathop \to \limits^{{pr_{7} }} cdcD\mathop \to \limits^{{pr_{8} }} cdcd $$

Hence, L(GRC) = {ww|w ∈ (c, d)+} is a non-context-free language.

Matrix grammar is a rule-based regulated grammar, where restrictions on productions are applied using matrices. Here, we have to apply all productions of a matrix before starting with another matrix.

Definition 2.4

Matrix grammar [1,2,3] is GM = (G, rm), where G = (N, T, S, M) and M = (m1, m2, …, mn), n ≥ 1 is a finite set of matrices, where each mi consists of a finite set of context-free production and rm is a regular expression over the set of matrices mi.

Example 2.2

Consider GM = (G, rm), where G = ({S, C, D}, {c, d}, S, {m0, m1, m2, m3, m4}) be the matrix grammar [2] where

$$ \begin{aligned} m_{0} & = (p_{0} :S \to CD), \\ m_{1} & = (p_{1} :C \to cC,p_{2} :D \to cD), \\ m_{2} & = (p_{3} :C \to dC,p_{4} :D \to dD), \\ m_{3} & = (p_{5} :C \to c,p_{6} :D \to c), \\ m_{4} & = (p_{7} :C \to d,p_{8} :D \to d), \\ \end{aligned} $$

The control set is rm = m0(m1, m2)*(m3, m4).

Consider w = dcdc

$$ S\mathop \to \limits^{{m_{0} :p_{0} }} CD\mathop \to \limits^{{m_{2} :p_{3} }} dCD\mathop \to \limits^{{m_{2} :p_{4} }} dCdD\mathop \to \limits^{{m_{3} :p_{5} }} dcdD\mathop \to \limits^{{m_{3} :p_{6} }} dcdc $$

Hence, L(GM) = {ww|w ∊ (c, d)+} is a non-context-free language.

Definition 2.5

Fuzzy grammar [30] is quadruple Gf = (N, T, P, S), where P is a set of fuzzy productions of the form \( \{ \alpha \mathop{\longrightarrow}\limits^{\mu }\beta |0 \le \mu \le 1 \wedge \alpha , \beta \in (N \cup T)^{*} \} \).

Definition 2.6

Fuzzy language L [30] is a fuzzy set of TL = {(x, μ(x))|x ∊ T* and 0 ≤ μ(x) ≤ 1} where μ(x) denote the degree of membership of string x.

μ(x) is obtained by taking a minimum of all derivation rules from S to derive x if a single derivation exists for the string x. If there exists more than one derivation for a particular string x, then max is taken over the minimum of all the derivation rules from starting from S to derive x.

Example 2.3

[30] Consider L = {(0, 0.8), (00, 0.4), (10, 0.2), (1, 0.2), (11, 0.6)}. Here L describes a fuzzy language over {0, 1}. Zero membership strings are not represented in L.

Definition 2.7

Fuzzy regulated grammar Gfr(GR, E, w), where

  1. i)

    GR is a regulated grammar in Chomsky normal form.

  2. ii)

    E is the error set of productions constructed from the original productions P using the replace operator (replacing a terminal symbol with another terminal in p ∊ P), add operator (adding an extra terminal in p ∊ P) and remove operator (removing a terminal in p ∊ P).

    Errors in the input string are classified into following three types:

    • Replacement Error [13, 14]: In replacement error, a terminal is substituted with another terminal symbol from T.

    • Addition Error [13, 14]: In addition error, an extra new terminal symbol is added from T. This new terminal can be placed before or after the terminal.

    • Removal Error [13, 14]: In removal error, some terminal is deleted.

  3. iii)

    w is a weighted function such that {(p, w(p))|p ∊ P ∧ w(p) ∊ [0, 1]}. It is used to assign the degree of membership to the production rules.

3 Proposed algorithm for the recognition of faulty strings in regulated grammar using fuzzy sets

In classical automata theory, a string is either accepted or rejected. Fuzzy languages close the gap between natural and formal languages. In the fuzzy set, the degree to which an element belongs to the set is referred to as the degree of membership [35]. In this section, an algorithm is proposed for the recognition of faulty strings in regulated grammar using fuzzy sets (RFSRG). Given a regularly controlled grammar GRC = (G, r). For avoiding multiple errors, in step 1, GRC = (G, r) is converted into \( G_{{RC^{\prime } }} = (G^{\prime } ,r^{\prime } ) \) where G′ is a Chomsky normal form of G. Every production \(p_{i}\,\in \,G \) is converted into productions p 1 i , p 2 i , …, p k i |k ≥ 1 of G′. Specifically, G is converted into Chomsky normal form G′ [36].

In step 2, control derivation r of GRC is converted into r′ of \( G_{{RC^{'} }} \) such that L(G′, r′) = L(G, r) − {λ}.

figure a

Example 3.1

Consider example 2.1, where the production rules P is of the following forms:

$$ \begin{array}{*{20}l} {pr_{0} :S \to CD} \hfill & {pr_{1} :C \to cC} \hfill \\ {pr_{2} :D \to cD} \hfill & {pr_{3} :C \to dC} \hfill \\ {pr_{4} :D \to dD} \hfill & {pr_{5} :C \to c} \hfill \\ {pr_{6} :D \to c} \hfill & { pr_{7} :C \to d} \hfill \\ {pr_{8} :D \to d} \hfill & {} \hfill \\ \end{array} $$
$$ r = pr_{0} (pr_{1} pr_{2} ,pr_{3} pr_{4} )^{*} (pr_{5} pr_{6} ,pr_{7} pr_{8} ) $$

Equivalent Chomsky normal form (CNF) for the grammar GRC:

$$ \begin{array}{*{20}l} {pr_{0} :S \to CD} \hfill & {\quad pr_{1}^{1} :C \to A_{1} C} \hfill \\ {pr_{1}^{2} :A_{1} \to c} \hfill & {\quad pr_{2}^{1} :D \to A_{2} D} \hfill \\ {pr_{2}^{2} :A_{2} \to c} \hfill & {\quad pr_{3}^{1} :C \to A_{3} C} \hfill \\ {pr_{3}^{2} :A_{3} \to d} \hfill & { \quad pr_{4}^{1} :D \to A_{4} D} \hfill \\ {pr_{4}^{2} :A_{4} \to d} \hfill & {\quad pr_{5} :C \to c} \hfill \\ {pr_{6} :D \to c} \hfill & { \quad pr_{7} :C \to d} \hfill \\ {pr_{8} :D \to d} \hfill & {} \hfill \\ \end{array} $$

r is converted into r′ = pr0(pr 11 pr 21 pr 12 pr 22 , pr 13 pr 23 pr 14 pr 24 )*(pr5pr6, pr7pr8).

In step 3, we apply replace, add and remove operators to the productions of the form A → a|A ∊ N and a ∊ T. Hence, there is no possibility of occurrence of multiple errors using one production. To accumulate the faulty string, replace, add and remove operators are considered. Similar attempts have been made for the recognition of faulty strings in context-free and context-sensitive grammar [13, 14].

figure b

As regularly controlled grammar is ambiguous, in step 4, different derivations for the string, w ∉ L, were generated. In one derivation of a string, total 2|w| − 1 productions are used. In each derivation of a string, w, the number of productions of the form A → BC|A, B, C ∊ N can be |w| − 1 and the number of productions of the form A → a|A ∊ N ∧ a ∊ T can be |w|.

In step 5, a fuzzy set E is used for determining the number of erroneous substitutions. Membership of a faulty string is determined by \( \mu_{E} (w) = \frac{{|w| - |E_{p} |}}{|w|} \) where |w| represents the length of the string, w, and |Ep| represents the number of error productions used for the generation of string w.

A certainty factor was defined that can be used for the acceptance and rejection of strings. In step 6, a fuzzy set Tf is used for determining the derivation with minimal production rules utilized to derive the faulty string, w. Membership of a faulty string is determined by:

$$ \mu_{{T_{f} }} (w) = 1 - S\,\, {\text{shaped}}\,\,{\text{membership}}\,\,{\text{function}} $$
$$ \mu_{{T_{f} }} (w) = 1 - \left\{ {\begin{array}{*{20}c} {0,} & {x \le a} \\ {2\left( {\frac{x - a}{b - a}} \right)^{2} ,} & {a < x \le (a + b)/2} \\ {1 - 2\left( {\frac{x - b}{b - a}} \right)^{2} ,} & {(a + b)/2 < x < b} \\ {1,} & {x \ge b} \\ \end{array} } \right\} $$
(1)

where a and b represent the minimum and maximum number of productions used to derive the faulty string, w, in all derivations, respectively. x denotes the actual number of productions used in a particular derivation of the faulty string, w. In this a and b plots the extreme points of the sloped curve, therefore making it S-shape. In step 7, the optimal derivation for the string, w, was chosen. The optimal derivation was found by choosing the highest degree of membership obtained from E-set and Tf-set.

figure c

If a matrix grammar GM = (G, rm) is given instead of regularly controlled grammar. During, the conversion of G into Chomsky normal form G′, every production  \(p_{i}\,\in \,G \) is replaced by production p 1 i , p 2 i , …p k i |k ≥ 1 of G′. In step 2 of RFSRG, we call procedure Matrix_Control_derivation (GM, GM′). The rest of the algorithm remains the same.

figure d

Example 3.2

Consider GM = (G, rm), where G = ({S, C, D}, {c, d}, S, {m0, m1, m2, m3, m4}) be the matrix grammar of example 2.2 where

$$ \begin{aligned} m_{0} & = (p_{0} :S \to CD), \\ m_{1} & = (p_{1} :C \to cC,p_{2} :D \to cD), \\ m_{2} & = (p_{3} :C \to dC,p_{4} :D \to dD), \\ m_{3} & = (p_{5} :C \to c,p_{6} :D \to c), \\ m_{4} & = (p_{7} :C \to d,p_{8} :D \to d), \\ \end{aligned} $$

The control set is rm = m0(m1, m2)*(m3, m4).

Equivalent CNF for the grammar GM:

$$ \begin{aligned} m_{0}^{\prime } &= (p_{0} :S \to CD), \hfill \\ m_{1}^{\prime }& = (p_{1}^{1} :C \to A_{1} C,p_{1}^{2} :A_{1} \to c,p_{2}^{1} :D \to A_{2} D,p_{2}^{2} :A_{2} \to c), \hfill \\ m_{2}^{\prime } &= (p_{3}^{1} :C \to A_{3} C,p_{3}^{2} :A_{3} \to d,p_{4}^{1} :D \to A_{4} D,p_{4}^{2} :A_{4} \to d), \hfill \\ m_{3}^{\prime } &= (p_{5} :C \to c,p_{6} :D \to c), \hfill \\ m_{4}^{\prime } &= (p_{7} :C \to d,p_{8} :D \to d) \hfill \\ \end{aligned} $$

The control set is rm′ is m0′(m1′, m2′)*(m3′, m4′).

Theorem 1

Given a regulated grammar GR and string w. The time complexity of the RFSRG algorithm is equal to O(|G 2 R |·|w|), where |GR| represents the size of productions and |w| represents the length of the string respectively.

Proof

In Step1 of RFSRG algorithm, regulated grammar GR is converted into Chomsky Normal form \(G_{R^{\prime}}\) using DEL, TERM, UNIT and BIN operations [36]. Size of \(G_{R^{\prime}} \)i.e. | \(G_{R^{\prime}}\)| depends on the order in which DEL, BIN, UNIT and TERM are carried out [36]. TERM always give a linear increase in the size of the grammar. If we carry out the transformation in BIN → DEL → UNIT, then |\(G_{R^{\prime}}\)| = O(|G 2 R |). In step 2, the control derivation GR is converted into \(G_{R^{\prime}}\) . Each production pi ∊ GR may be converted into a number of productions say pr 1 i , pr 2 i , …, pr k i |k ≥ 1 in \(G_{R^{\prime}}\) and each substitution of pi in control derivation r gives the corresponding control derivation r′. Each substitution can be carried out in O(|1|). Therefore, the control derivation r can be converted into r′ in O(|GR|).

In step 3, we add additional productions using replace, add and remove operators. Thus, the number of productions is of the size O(|G 2 R |). For each production, four faulty productions (one each by replace and remove operation and two productions by add operations) are generated. Addition of each faulty productions can be carried out in O(|1|), thereby giving the overall complexity of the second step is O(|G 2 R |).

In each derivation of a string, w, the number of productions of the form A → BC|A, B, C ∊ N can be |w| − 1 and the number of productions of the form A → a|A ∊ N ∧ a ∊ T can be |w|. Therefore total 2|w| − 1 productions are used in one derivation of a string. Considering at each step of the derivation, we can choose a production from O(|G 2 R |) productions. Hence the overall complexity is O(|G 2 R |·|w|). Step 5 can be carried out in O(|1|). In step 6, for finding a, b and x values, complexity is O(|G 2 R |) and further computation of \( \mu_{{T_{f} }} (w) \) can be carried out in O(|1|). So overall complexity of Step 6 is O(|G 2 R |). Step 7 can be carried out in O(|1|). Thus the overall complexity of the algorithm is O(|G 2 R |·|w|).

The proposed algorithm works on all rule-based regulated grammars with little modifications in applications of rules.

4 Numerical examples

In this section, we apply RFSRG algorithm to regularly controlled grammar.

Example 4.1

Consider the regularly controlled grammar of example 2.1 GRC = (G1, r), where G1 = ({S, C, D}, {c, d}, S, P) where the production P is of the following forms:

$$ \begin{array}{*{20}l} {pr_{0} :S \to CD} \hfill & {\quad pr_{1} :C \to cC} \hfill \\ {pr_{2} :D \to cD} \hfill & {\quad pr_{3} :C \to dC} \hfill \\ {pr_{4} :D \to dD} \hfill & {\quad pr_{5} :C \to c} \hfill \\ {pr_{6} :D \to c} \hfill & {\quad pr_{7} :C \to d} \hfill \\ {pr_{8} :D \to d} \hfill & {} \hfill \\ \end{array} $$
$$ r = pr_{0} (pr_{1} pr_{2} ,pr_{3} pr_{4} )^{*} (pr_{5} pr_{6} ,pr_{7} pr_{8} ) $$

Equivalent Chomsky normal form (CNF) for the grammar GRC:

$$ \begin{array}{*{20}l} {pr_{0} :S \to CD} \hfill & {\quad pr_{1}^{1} :C \to A_{1} C} \hfill \\ {pr_{1}^{2} :A_{1} \to c} \hfill & {\quad pr_{2}^{1} :D \to A_{2} D} \hfill \\ {pr_{2}^{2} :A_{2} \to c} \hfill & {\quad pr_{3}^{1} :C \to A_{3} C} \hfill \\ {pr_{3}^{2} :A_{3} \to d} \hfill & {\quad pr_{4}^{1} :D \to A_{4} D} \hfill \\ {pr_{4}^{2} :A_{4} \to d} \hfill & {\quad pr_{5} :C \to c} \hfill \\ {pr_{6} :D \to c} \hfill & {\quad pr_{7} :C \to d} \hfill \\ {pr_{8} :D \to d} \hfill & \quad \hfill \\ \end{array} $$

r is converted into r′ = pr0(pr 11 pr 21 pr 12 pr 22 , pr 13 pr 23 pr 14 pr 24 )*(pr5pr6, pr7pr8).

Using fuzzy replace operator:

$$ \begin{array}{*{20}l} {pr_{9} :A_{1} \to \_} \hfill & {\quad pr_{10} :A_{2} \to \_} \hfill \\ {pr_{11} :A_{3} \to \_} \hfill & {\quad pr_{12} :A_{4} \to \_} \hfill \\ {pr_{13} :C \to \_} \hfill & {\quad pr_{14} :D \to \_} \hfill \\ {pr_{15} :C \to \_} \hfill & {\quad pr_{16} :D \to \_} \hfill \\ \end{array} $$

Using fuzzy add operator:

$$ \begin{array}{*{20}l} {pr_{17} :A_{1} \to \_c|c\_} \hfill & {\quad pr_{18} :A_{2} \to \_c|c\_} \hfill \\ {pr_{19} :A_{3} \to \_d|d\_} \hfill & {\quad pr_{20} :A_{4} \to \_d|d\_} \hfill \\ {pr_{21} :C \to \_c|c\_} \hfill & {\quad pr_{22} :D \to \_c|c\_} \hfill \\ {pr_{23} :C \to \_d|d\_} \hfill & {\quad pr_{24} :D \to \_d|d\_} \hfill \\ \end{array} $$

Using fuzzy remove operator:

$$ \begin{array}{*{20}l} {pr_{25} :A_{1} \to \lambda } \hfill & {\quad pr_{26} :A_{2} \to \lambda } \hfill \\ {pr_{27} :A_{3} \to \lambda } \hfill & {\quad pr_{28} :A_{4} \to \lambda } \hfill \\ {pr_{29} :C \to \lambda } \hfill & {\quad pr_{30} :D \to \lambda } \hfill \\ {pr_{31} :C \to \lambda } \hfill & {\quad pr_{32} :D \to \lambda } \hfill \\ \end{array} $$

Consider the string \( w = ddddcd \notin L \). Table 1 depicts some ways for deriving the string w.

Table 1 Various derivation for the string w = ddddcd.

Table 2 represents the confidence level for the string w using E-set.

Table 2 Confidence level for the string w using E-set.

Table 3 represents the confidence level for the string w using Tf-set. Table 4 represents the final interpretations with fuzzy confidence for the string \( w = ddddcd \).

The optimal derivation for the string \( w = ddddcd \) is as follows:

$$ \mu (w) = \hbox{max} (0,\;0,\,0.6667,\;0,\;0,\;0,\;0,\;0,\;0.3333) = 0.6667 $$
Table 3 Confidence level for the string w using Tf-set.
Table 4 Final interpretations with fuzzy confidence for the string w = ddddcd.

By choosing a certain level of confidence λc, we say the string is accepted if μ(w) ≥ λc. Application of RFSRG algorithm to matrix grammar (for details refer “Appendix 1”).

5 Real -life application of RFSRG algorithm in the Hindi language

Regulated languages are the most appropriate formal method for representing human languages.

In linguistic topology, the Hindi language contains cross-serial dependency, and its structure is similar to the German and Dutch languages. Consider the following sentence of the Hindi Language spoken in India.

figure e

In the above sentences, sequential noun phrases मोहन (Mohan), सेब (apple), राम (Ram), पेन (pen) and the verb phrases ख़ाया (ate), खराब हो गया (rotten), लिखा (written), उपहार मे मिला था (gifted) both form two separate series of constituents. Let m′ be the morphism which maps to मोहन and ख़ाया to cसेब and खराब हो गया to d and all other words of the sentence to the empty string. Therefore, S1:cdcd. Therefore, sentence S1 is of the form ww, which is a non-context-free language.

Consider m″ be the morphism, which maps राम, पेन, लिखा, उपहार मे मिला था to a, b, c and d, respectively and the dependency relation between राम (Ram) and लिखा (written) is denoted by m and पेन (pen) and उपहार मे मिला था (gifted) by n and other words of the sentence by empty string. Therefore, sentence S2 is of the form ambncmdn, which is a non-context-free language. Cross-serial dependency of the Hindi language can be represented by regulated grammar.

Now consider the sentence S1 and If the users unintentionally write the S1 sentence as

  • S1′: मोहन ने वो सेब ख़ाया जो सेब खराब हो गया|

In this sentence, the user makes an insertion error of one noun phrase सेब (apple). Now the sentence S1′ maps to cdcdd instead of cdcd. In formal grammar, this string is completely rejected. However, in case of fuzzy regulated grammar, we cannot completely reject the string w. By considering the grammar of example 4.1, Erroneous sentence acceptance S1′ is shown using proposed algorithm RFSRG. Table 5 depicts few ways for deriving the string w = cdcdd.

Table 5 Various derivation for the string w = cdcdd.

Table 6 represents the confidence level for the string w using E-set.

Table 6 Confidence level for the string w using E-set.

Table 7 represents the confidence level for the string w using Tf-set. Table 8 represents the final interpretations with fuzzy confidence for the string \( w = cdcdd \).

The optimal derivation for the string \( w = cdcdd \) is as follows:

$$ \mu (w) = \hbox{max} (0.75;\;0.75;\;0.5;\;0.5) = 0.75 $$
Table 7 Confidence level for the string w using Tf-set.
Table 8 Final interpretations with fuzzy confidence for the string w = cdcdd.

By choosing a certain level of confidence λc, we say the string is accepted if μ(w) ≥ λc.

6 Conclusions

In this paper, an algorithm for deriving a string, w ∉ L, with a certain level of confidence is described. Regulated grammar is converted to Chomsky normal form to avoid multiple error elimination. Furthermore, the algorithm is applied to regularly controlled and matrix grammar. The time complexity of the proposed algorithm is O(|G 2 R |·|w|). By way of numerical examples for a string, w ∉ L, the determination of a confidence level for the string, w, is also demonstrated Further, the real-life application of a proposed algorithm for the recognition of faulty strings for the natural language, Hindi, is exhibited.