Error tolerance for the recognition of faulty strings in a regulated grammar using fuzzy sets

Kumar, Ajay; Kalra, Nidhi; Garhwal, Sunita

doi:10.1007/s12046-018-0833-y

Error tolerance for the recognition of faulty strings in a regulated grammar using fuzzy sets

Published: 10 July 2018

Volume 43, article number 134, (2018)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Sādhanā Aims and scope Submit manuscript

Error tolerance for the recognition of faulty strings in a regulated grammar using fuzzy sets

Download PDF

Ajay Kumar¹,
Nidhi Kalra¹ &
Sunita Garhwal¹

152 Accesses
1 Citation
Explore all metrics

Abstract

To overcome the limitations of context-free and context-sensitive grammars, regulated grammars have been proposed. In this paper, an algorithm is proposed for the recognition of faulty strings in regulated grammar. Furthermore, depending on the errors and certainty, it is decided whether the string belongs to the language or not based on string membership value. The time complexity of the proposed algorithm is O(|G ²_R |·|w|), where |G_R| represents the number of production rules and |w| is the length of the input string, w. The reader is provided with numerical examples by applying the algorithm to regularly controlled and matrix grammar. Finally, the proposed algorithm is applied in the Hindi language for the recognition of faulty strings in regulated grammar as a real-life application.

Analysis of Fuzzy String Patterns with the Help of Syntactic Pattern Recognition

Interval type-2 fuzzy automata and Interval type-2 fuzzy grammar

Article 16 June 2021

Automatic recognition of machine English translation errors using fuzzy set algorithm

Article 09 January 2024

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Context-free grammars are not able to cover all aspects of linguistic phenomena. The membership of a string in a context-sensitive grammar is decided in exponential time. To overcome the limitations of context-free and context-sensitive grammars, the concept of regulated grammar [1,2,3] was introduced. Regulated grammar is a class of grammar in which certain restrictions are imposed on context-free productions, making them more powerful than context-free grammar. Regulated grammar is classified into context and rule-based grammar. In context-based regulated grammar, restrictions are imposed on the context, whereas in rule-based regulated grammar, the restrictions are imposed on the application of rules.

In formal languages, the input string is either accepted (μ(w) = 1) or rejected (μ(w) = 0), where μ denotes the membership of a string, w. To reduce the gap between natural and classical formal languages, fuzziness has been introduced into formal languages, thereby giving rise to fuzzy languages. In recent years, fuzzy languages have been applied in various areas, such as intelligent interface design [4, 5], clinical monitoring [6], neural networks [7, 8] and pattern recognition [9]. Various fuzzy inference methods [10,11,12] are used in the design of fuzzy systems.

Motivated by the prior work of recognition of imperfect strings in context-free grammar [13] and context-sensitive grammar [14], in this paper, the recognition of faulty strings generated by the regulated grammar is described. As well, the problem of single and multiple fault occurrences is discussed. Faulty string refers to some imperfection caused in the string. Single fault refers to imperfection at one position in a string, whereas multiple faults refer to imperfection at more than one position in a string. Here, fuzzy sets are used to compute the membership of strings that do not exactly match with the grammar.

1.1 Previous work

Regularly controlled and matrix grammar were proposed by Ginsburg et al [15] and Cremers et al [16], respectively. In regularly controlled grammar, the control is regulated by the sequence of productions in the regular expression. In matrix grammar, control is regulated by following the same sequence given in the matrix. Solar and Meduna [17] proposed a state translation scheme, which is an extended version of state grammar [18], for producing two output strings in a single derivation. Meduna and Zemek [19] introduced one-sided random context grammars, a variant of random context grammar [20]. Furthermore, Meduna and Zemek [21] introduced one-sided random context grammars, with a limited number of right random context rules. Meduna [22] put forth generalized forbidding grammar. Furthermore, Meduna and Zemek [23] also introduced one-sided forbidding grammars, where each rule is further divided into left and right forbidden rules, and each rule consists of a finite set of forbidden strings. Meduna and Zemek [24] established that the power of one-sided forbidding grammars and selective substitution grammars [25, 26] are equivalent in the case of erasing rule grammar. Kalra and Kumar [27, 28] introduced the concept of deterministic deep pushdown automata and fuzziness in deep pushdown automata. Garhwal and Jiwari [29] introduced the concept of fuzziness in parallel regular expression.

The concept of fuzzy languages, a fuzzy subset of strings over an alphabet, was introduced by Lee and Zadeh [30]. Asveld [31] proposed fuzzy context-free K-grammar for describing correct and erroneous sentences. Asveld [32] devised the modified Cocke–Younger–Kasami (CYK) algorithm for recognizing fuzzy context-free grammar. Additionally, Schinder et al [13] described an approach for recognizing imperfect strings generated by the context-free grammar. Inui et al [14] expanded on the work of Schinder et al [13], describing an approach for identifying imperfect strings generated by the context-sensitive grammar. Schinder et al [13] and Inui et al [14] applied the concept of fuzzy sets to determine the membership values of strings. Here, the concept of errors and fuzzy sets in this context is utilized which was earlier used by Schinder et al [13] and Inui et al [14] regarding context-free grammar and context-sensitive grammar. Recently, Zhanga et al [33] and Bag et al [34] have applied the concept of pattern recognition in intelligent computing systems and prediction of consumer intention.

After introducing preliminary concepts in section 2, the following topics have been explained.

Recognition of faulty strings generated by regulated grammar using the different fuzzy sets has been proposed in section 3.
The problem of occurrence of multiple faults has been solved by converting the regulated grammar into Chomsky normal form in section 3.
The complete procedure is depicted by numerical example in section 4.
The proposed algorithm has been applied on the Hindi language to demonstrate its real- life application in section 5 and the conclusions are provided in section 6.

2 Preliminaries

Let T be an alphabet, λ denotes empty string and μ(w) denotes the membership of a string such that 0 ≤ μ(w) ≤ 1. _ represents the symbol that can be replaced by any symbol from T.

Definition 2.1

Regular expression over an alphabet T is defined using following rules:

i)
ϕ and λ are regular expressions representing the empty set and empty string.
ii)
If a ∊ T, then a is a regular expression representing the singleton set {a}.
iii)
If r₁ and r₂ are regular expressions, then r₁r₂, r₁|r₂ and r₁* are regular expressions representing concatenation, union, and Kleene closure.

The regulated grammar consists of production rules similar as context-free grammar, but it has a larger generative capacity than context-free grammar. Regulated grammar lies between context-free and context-sensitive grammar. Context-sensitive language (such as ww, aⁿbⁿcⁿ and $ a^{{2^{n} }} $) involving interleaved dependencies can be represented using regulated grammar.

Definition 2.2

Regulated grammar is G_R = (G, R), where G = (N, T, S, P) is a context-free grammar and R is the restriction applied on the derivations of strings. R depends on the type of regulated grammar [1,2,3].

Regularly controlled grammar is a rule-based regulated grammar, where the restriction on rules is applied using a regular expression.

Definition 2.3

Regularly controlled grammar [1,2,3] is G_RC = (G, r), where G = (N, T, S, P) is a context-free grammar and r is a regular expression over P for controlling the derivations of strings.

Language L (G_RC) represented by regularly controlled grammar G_RC consists of all words w ∊ T* generated by the following sequence:

$$ S\mathop \to \limits^{{pr_{1} }} w_{1} \mathop \to \limits^{{pr_{2} }} w_{2} \cdots \mathop \to \limits^{{pr_{n} }} w \,\,{\text{such}}\,\,{\text{that}}\,\,pr_{1} , pr_{2} , \ldots , pr_{n} \in P $$

Example 2.1

Consider the regularly controlled grammar [2] G_RC = (G₁, r), where G₁ = ({S, C, D}, {c, d}, S, P) where the production P is of the following forms:

$$ \begin{array}{*{20}l} {pr_{0} :S \to CD} \hfill & {\quad pr_{1} :C \to cC} \hfill \\ {pr_{2} :D \to cD} \hfill & { \quad pr_{3} :C \to dC} \hfill \\ {pr_{4} :D \to dD} \hfill & { \quad pr_{5} :C \to c} \hfill \\ {pr_{6} :D \to c} \hfill & { \quad pr_{7} :C \to d} \hfill \\ {pr_{8} :D \to d} \hfill & {} \hfill \\ \end{array} $$

$$ r = pr_{0} (pr_{1} pr_{2} ,pr_{3} pr_{4} )^{*} (pr_{5} pr_{6} ,pr_{7} pr_{8} ) $$

Consider w = cdcd

$$ S\mathop \to \limits^{{pr_{0} }} CD\mathop \to \limits^{{pr_{1} }} cCD\mathop \to \limits^{{pr_{2} }} cCcD\mathop \to \limits^{{pr_{7} }} cdcD\mathop \to \limits^{{pr_{8} }} cdcd $$

Hence, L(G_RC) = {ww|w ∈ (c, d)⁺} is a non-context-free language.

Matrix grammar is a rule-based regulated grammar, where restrictions on productions are applied using matrices. Here, we have to apply all productions of a matrix before starting with another matrix.

Definition 2.4

Matrix grammar [1,2,3] is G_M = (G, r_m), where G = (N, T, S, M) and M = (m₁, m₂, …, m_n), n ≥ 1 is a finite set of matrices, where each m_i consists of a finite set of context-free production and r_m is a regular expression over the set of matrices m_i.

Example 2.2

Consider G_M = (G, r_m), where G = ({S, C, D}, {c, d}, S, {m₀, m₁, m₂, m₃, m₄}) be the matrix grammar [2] where

$$ \begin{aligned} m_{0} & = (p_{0} :S \to CD), \\ m_{1} & = (p_{1} :C \to cC,p_{2} :D \to cD), \\ m_{2} & = (p_{3} :C \to dC,p_{4} :D \to dD), \\ m_{3} & = (p_{5} :C \to c,p_{6} :D \to c), \\ m_{4} & = (p_{7} :C \to d,p_{8} :D \to d), \\ \end{aligned} $$

The control set is r_m = m₀(m₁, m₂)*(m₃, m₄).

Consider w = dcdc

$$ S\mathop \to \limits^{{m_{0} :p_{0} }} CD\mathop \to \limits^{{m_{2} :p_{3} }} dCD\mathop \to \limits^{{m_{2} :p_{4} }} dCdD\mathop \to \limits^{{m_{3} :p_{5} }} dcdD\mathop \to \limits^{{m_{3} :p_{6} }} dcdc $$

Hence, L(G_M) = {ww|w ∊ (c, d)⁺} is a non-context-free language.

Definition 2.5

Fuzzy grammar [30] is quadruple G_f = (N, T, P, S), where P is a set of fuzzy productions of the form $ \{ \alpha \mathop{\longrightarrow}\limits^{\mu }\beta |0 \le \mu \le 1 \wedge \alpha , \beta \in (N \cup T)^{*} \} $.

Definition 2.6

Fuzzy language L [30] is a fuzzy set of T*·L = {(x, μ(x))|x ∊ T* and 0 ≤ μ(x) ≤ 1} where μ(x) denote the degree of membership of string x.

μ(x) is obtained by taking a minimum of all derivation rules from S to derive x if a single derivation exists for the string x. If there exists more than one derivation for a particular string x, then max is taken over the minimum of all the derivation rules from starting from S to derive x.

Example 2.3

[30] Consider L = {(0, 0.8), (00, 0.4), (10, 0.2), (1, 0.2), (11, 0.6)}. Here L describes a fuzzy language over {0, 1}. Zero membership strings are not represented in L.

Definition 2.7

Fuzzy regulated grammar G_fr(G_R, E, w), where

i)
G_R is a regulated grammar in Chomsky normal form.
ii)
E is the error set of productions constructed from the original productions P using the replace operator (replacing a terminal symbol with another terminal in p ∊ P), add operator (adding an extra terminal in p ∊ P) and remove operator (removing a terminal in p ∊ P).

Errors in the input string are classified into following three types:
- Replacement Error [13, 14]: In replacement error, a terminal is substituted with another terminal symbol from T.
- Addition Error [13, 14]: In addition error, an extra new terminal symbol is added from T. This new terminal can be placed before or after the terminal.
- Removal Error [13, 14]: In removal error, some terminal is deleted.
iii)
w is a weighted function such that {(p, w(p))|p ∊ P ∧ w(p) ∊ [0, 1]}. It is used to assign the degree of membership to the production rules.

3 Proposed algorithm for the recognition of faulty strings in regulated grammar using fuzzy sets

In classical automata theory, a string is either accepted or rejected. Fuzzy languages close the gap between natural and formal languages. In the fuzzy set, the degree to which an element belongs to the set is referred to as the degree of membership [35]. In this section, an algorithm is proposed for the recognition of faulty strings in regulated grammar using fuzzy sets (RFSRG). Given a regularly controlled grammar G_RC = (G, r). For avoiding multiple errors, in step 1, G_RC = (G, r) is converted into $ G_{{RC^{\prime } }} = (G^{\prime } ,r^{\prime } ) $ where G′ is a Chomsky normal form of G. Every production $p_{i}\,\in \,G $ is converted into productions p ¹_i , p ²_i , …, p ^k_i |k ≥ 1 of G′. Specifically, G is converted into Chomsky normal form G′ [36].

In step 2, control derivation r of G_RC is converted into r′ of $ G_{{RC^{'} }} $ such that L(G′, r′) = L(G, r) − {λ}.

Example 3.1

Consider example 2.1, where the production rules P is of the following forms:

$$ \begin{array}{*{20}l} {pr_{0} :S \to CD} \hfill & {pr_{1} :C \to cC} \hfill \\ {pr_{2} :D \to cD} \hfill & {pr_{3} :C \to dC} \hfill \\ {pr_{4} :D \to dD} \hfill & {pr_{5} :C \to c} \hfill \\ {pr_{6} :D \to c} \hfill & { pr_{7} :C \to d} \hfill \\ {pr_{8} :D \to d} \hfill & {} \hfill \\ \end{array} $$

$$ r = pr_{0} (pr_{1} pr_{2} ,pr_{3} pr_{4} )^{*} (pr_{5} pr_{6} ,pr_{7} pr_{8} ) $$

Equivalent Chomsky normal form (CNF) for the grammar G_RC:

$$ \begin{array}{*{20}l} {pr_{0} :S \to CD} \hfill & {\quad pr_{1}^{1} :C \to A_{1} C} \hfill \\ {pr_{1}^{2} :A_{1} \to c} \hfill & {\quad pr_{2}^{1} :D \to A_{2} D} \hfill \\ {pr_{2}^{2} :A_{2} \to c} \hfill & {\quad pr_{3}^{1} :C \to A_{3} C} \hfill \\ {pr_{3}^{2} :A_{3} \to d} \hfill & { \quad pr_{4}^{1} :D \to A_{4} D} \hfill \\ {pr_{4}^{2} :A_{4} \to d} \hfill & {\quad pr_{5} :C \to c} \hfill \\ {pr_{6} :D \to c} \hfill & { \quad pr_{7} :C \to d} \hfill \\ {pr_{8} :D \to d} \hfill & {} \hfill \\ \end{array} $$

r is converted into r′ = pr₀(pr ¹₁ pr ²₁ pr ¹₂ pr ²₂ , pr ¹₃ pr ²₃ pr ¹₄ pr ²₄ )*(pr₅pr₆, pr₇pr₈).

In step 3, we apply replace, add and remove operators to the productions of the form A → a|A ∊ N and a ∊ T. Hence, there is no possibility of occurrence of multiple errors using one production. To accumulate the faulty string, replace, add and remove operators are considered. Similar attempts have been made for the recognition of faulty strings in context-free and context-sensitive grammar [13, 14].

As regularly controlled grammar is ambiguous, in step 4, different derivations for the string, w ∉ L, were generated. In one derivation of a string, total 2|w| − 1 productions are used. In each derivation of a string, w, the number of productions of the form A → BC|A, B, C ∊ N can be |w| − 1 and the number of productions of the form A → a|A ∊ N ∧ a ∊ T can be |w|.

In step 5, a fuzzy set E is used for determining the number of erroneous substitutions. Membership of a faulty string is determined by $ \mu_{E} (w) = \frac{{|w| - |E_{p} |}}{|w|} $ where |w| represents the length of the string, w, and |E_p| represents the number of error productions used for the generation of string w.

A certainty factor was defined that can be used for the acceptance and rejection of strings. In step 6, a fuzzy set T_f is used for determining the derivation with minimal production rules utilized to derive the faulty string, w. Membership of a faulty string is determined by:

$$ \mu_{{T_{f} }} (w) = 1 - S\,\, {\text{shaped}}\,\,{\text{membership}}\,\,{\text{function}} $$

$$ \mu_{{T_{f} }} (w) = 1 - \left\{ {\begin{array}{*{20}c} {0,} & {x \le a} \\ {2\left( {\frac{x - a}{b - a}} \right)^{2} ,} & {a < x \le (a + b)/2} \\ {1 - 2\left( {\frac{x - b}{b - a}} \right)^{2} ,} & {(a + b)/2 < x < b} \\ {1,} & {x \ge b} \\ \end{array} } \right\} $$

(1)

where a and b represent the minimum and maximum number of productions used to derive the faulty string, w, in all derivations, respectively. x denotes the actual number of productions used in a particular derivation of the faulty string, w. In this a and b plots the extreme points of the sloped curve, therefore making it S-shape. In step 7, the optimal derivation for the string, w, was chosen. The optimal derivation was found by choosing the highest degree of membership obtained from E-set and T_f-set.

If a matrix grammar G_M = (G, r_m) is given instead of regularly controlled grammar. During, the conversion of G into Chomsky normal form G′, every production $p_{i}\,\in \,G $ is replaced by production p ¹_i , p ²_i , …p ^k_i |k ≥ 1 of G′. In step 2 of RFSRG, we call procedure Matrix_Control_derivation (G_M, G_M′). The rest of the algorithm remains the same.

Example 3.2

Consider G_M = (G, r_m), where G = ({S, C, D}, {c, d}, S, {m₀, m₁, m₂, m₃, m₄}) be the matrix grammar of example 2.2 where

$$ \begin{aligned} m_{0} & = (p_{0} :S \to CD), \\ m_{1} & = (p_{1} :C \to cC,p_{2} :D \to cD), \\ m_{2} & = (p_{3} :C \to dC,p_{4} :D \to dD), \\ m_{3} & = (p_{5} :C \to c,p_{6} :D \to c), \\ m_{4} & = (p_{7} :C \to d,p_{8} :D \to d), \\ \end{aligned} $$

The control set is r_m = m₀(m₁, m₂)*(m₃, m₄).

Equivalent CNF for the grammar G_M:

$$ \begin{aligned} m_{0}^{\prime } &= (p_{0} :S \to CD), \hfill \\ m_{1}^{\prime }& = (p_{1}^{1} :C \to A_{1} C,p_{1}^{2} :A_{1} \to c,p_{2}^{1} :D \to A_{2} D,p_{2}^{2} :A_{2} \to c), \hfill \\ m_{2}^{\prime } &= (p_{3}^{1} :C \to A_{3} C,p_{3}^{2} :A_{3} \to d,p_{4}^{1} :D \to A_{4} D,p_{4}^{2} :A_{4} \to d), \hfill \\ m_{3}^{\prime } &= (p_{5} :C \to c,p_{6} :D \to c), \hfill \\ m_{4}^{\prime } &= (p_{7} :C \to d,p_{8} :D \to d) \hfill \\ \end{aligned} $$

The control set is r_m′ is m₀′(m₁′, m₂′)*(m₃′, m₄′).

Theorem 1

Given a regulated grammar G_R and string w. The time complexity of the RFSRG algorithm is equal to O(|G ²_R |·|w|), where |G_R| represents the size of productions and |w| represents the length of the string respectively.

Proof

In Step1 of RFSRG algorithm, regulated grammar G_R is converted into Chomsky Normal form $G_{R^{\prime}}$ using DEL, TERM, UNIT and BIN operations [36]. Size of $G_{R^{\prime}} $i.e. | $G_{R^{\prime}}$| depends on the order in which DEL, BIN, UNIT and TERM are carried out [36]. TERM always give a linear increase in the size of the grammar. If we carry out the transformation in BIN → DEL → UNIT, then |$G_{R^{\prime}}$| = O(|G ²_R |). In step 2, the control derivation G_R is converted into $G_{R^{\prime}}$ . Each production p_i ∊ G_R may be converted into a number of productions say pr ¹_i , pr ²_i , …, pr ^k_i |k ≥ 1 in $G_{R^{\prime}}$ and each substitution of p_i in control derivation r gives the corresponding control derivation r′. Each substitution can be carried out in O(|1|). Therefore, the control derivation r can be converted into r′ in O(|G_R|).

In step 3, we add additional productions using replace, add and remove operators. Thus, the number of productions is of the size O(|G ²_R |). For each production, four faulty productions (one each by replace and remove operation and two productions by add operations) are generated. Addition of each faulty productions can be carried out in O(|1|), thereby giving the overall complexity of the second step is O(|G ²_R |).

In each derivation of a string, w, the number of productions of the form A → BC|A, B, C ∊ N can be |w| − 1 and the number of productions of the form A → a|A ∊ N ∧ a ∊ T can be |w|. Therefore total 2|w| − 1 productions are used in one derivation of a string. Considering at each step of the derivation, we can choose a production from O(|G ²_R |) productions. Hence the overall complexity is O(|G ²_R |·|w|). Step 5 can be carried out in O(|1|). In step 6, for finding a, b and x values, complexity is O(|G ²_R |) and further computation of $ \mu_{{T_{f} }} (w) $ can be carried out in O(|1|). So overall complexity of Step 6 is O(|G ²_R |). Step 7 can be carried out in O(|1|). Thus the overall complexity of the algorithm is O(|G ²_R |·|w|).

The proposed algorithm works on all rule-based regulated grammars with little modifications in applications of rules.

4 Numerical examples

In this section, we apply RFSRG algorithm to regularly controlled grammar.

Example 4.1

Consider the regularly controlled grammar of example 2.1 G_RC = (G₁, r), where G₁ = ({S, C, D}, {c, d}, S, P) where the production P is of the following forms:

$$ \begin{array}{*{20}l} {pr_{0} :S \to CD} \hfill & {\quad pr_{1} :C \to cC} \hfill \\ {pr_{2} :D \to cD} \hfill & {\quad pr_{3} :C \to dC} \hfill \\ {pr_{4} :D \to dD} \hfill & {\quad pr_{5} :C \to c} \hfill \\ {pr_{6} :D \to c} \hfill & {\quad pr_{7} :C \to d} \hfill \\ {pr_{8} :D \to d} \hfill & {} \hfill \\ \end{array} $$

$$ r = pr_{0} (pr_{1} pr_{2} ,pr_{3} pr_{4} )^{*} (pr_{5} pr_{6} ,pr_{7} pr_{8} ) $$

Equivalent Chomsky normal form (CNF) for the grammar G_RC:

$$ \begin{array}{*{20}l} {pr_{0} :S \to CD} \hfill & {\quad pr_{1}^{1} :C \to A_{1} C} \hfill \\ {pr_{1}^{2} :A_{1} \to c} \hfill & {\quad pr_{2}^{1} :D \to A_{2} D} \hfill \\ {pr_{2}^{2} :A_{2} \to c} \hfill & {\quad pr_{3}^{1} :C \to A_{3} C} \hfill \\ {pr_{3}^{2} :A_{3} \to d} \hfill & {\quad pr_{4}^{1} :D \to A_{4} D} \hfill \\ {pr_{4}^{2} :A_{4} \to d} \hfill & {\quad pr_{5} :C \to c} \hfill \\ {pr_{6} :D \to c} \hfill & {\quad pr_{7} :C \to d} \hfill \\ {pr_{8} :D \to d} \hfill & \quad \hfill \\ \end{array} $$

r is converted into r′ = pr₀(pr ¹₁ pr ²₁ pr ¹₂ pr ²₂ , pr ¹₃ pr ²₃ pr ¹₄ pr ²₄ )*(pr₅pr₆, pr₇pr₈).

Using fuzzy replace operator:

$$ \begin{array}{*{20}l} {pr_{9} :A_{1} \to \_} \hfill & {\quad pr_{10} :A_{2} \to \_} \hfill \\ {pr_{11} :A_{3} \to \_} \hfill & {\quad pr_{12} :A_{4} \to \_} \hfill \\ {pr_{13} :C \to \_} \hfill & {\quad pr_{14} :D \to \_} \hfill \\ {pr_{15} :C \to \_} \hfill & {\quad pr_{16} :D \to \_} \hfill \\ \end{array} $$

Using fuzzy add operator:

$$ \begin{array}{*{20}l} {pr_{17} :A_{1} \to \_c|c\_} \hfill & {\quad pr_{18} :A_{2} \to \_c|c\_} \hfill \\ {pr_{19} :A_{3} \to \_d|d\_} \hfill & {\quad pr_{20} :A_{4} \to \_d|d\_} \hfill \\ {pr_{21} :C \to \_c|c\_} \hfill & {\quad pr_{22} :D \to \_c|c\_} \hfill \\ {pr_{23} :C \to \_d|d\_} \hfill & {\quad pr_{24} :D \to \_d|d\_} \hfill \\ \end{array} $$

Using fuzzy remove operator:

$$ \begin{array}{*{20}l} {pr_{25} :A_{1} \to \lambda } \hfill & {\quad pr_{26} :A_{2} \to \lambda } \hfill \\ {pr_{27} :A_{3} \to \lambda } \hfill & {\quad pr_{28} :A_{4} \to \lambda } \hfill \\ {pr_{29} :C \to \lambda } \hfill & {\quad pr_{30} :D \to \lambda } \hfill \\ {pr_{31} :C \to \lambda } \hfill & {\quad pr_{32} :D \to \lambda } \hfill \\ \end{array} $$

Consider the string $ w = ddddcd \notin L $. Table 1 depicts some ways for deriving the string w.

Table 1 Various derivation for the string w = ddddcd.

Full size table

Table 2 represents the confidence level for the string w using E-set.

Table 2 Confidence level for the string w using E-set.

Full size table

Table 3 represents the confidence level for the string w using T_f-set. Table 4 represents the final interpretations with fuzzy confidence for the string $ w = ddddcd $.

The optimal derivation for the string $ w = ddddcd $ is as follows:

$$ \mu (w) = \hbox{max} (0,\;0,\,0.6667,\;0,\;0,\;0,\;0,\;0,\;0.3333) = 0.6667 $$

Table 3 Confidence level for the string w using T_f-set.

Full size table

Table 4 Final interpretations with fuzzy confidence for the string w = ddddcd.

Full size table

By choosing a certain level of confidence λ_c, we say the string is accepted if μ(w) ≥ λ_c. Application of RFSRG algorithm to matrix grammar (for details refer “Appendix 1”).

5 Real -life application of RFSRG algorithm in the Hindi language

Regulated languages are the most appropriate formal method for representing human languages.

In linguistic topology, the Hindi language contains cross-serial dependency, and its structure is similar to the German and Dutch languages. Consider the following sentence of the Hindi Language spoken in India.

In the above sentences, sequential noun phrases मोहन (Mohan), सेब (apple), राम (Ram), पेन (pen) and the verb phrases ख़ाया (ate), खराब हो गया (rotten), लिखा (written), उपहार मे मिला था (gifted) both form two separate series of constituents. Let m′ be the morphism which maps to मोहन and ख़ाया to c, सेब and खराब हो गया to d and all other words of the sentence to the empty string. Therefore, S₁:cdcd. Therefore, sentence S₁ is of the form ww, which is a non-context-free language.

Consider m″ be the morphism, which maps राम, पेन, लिखा, उपहार मे मिला था to a, b, c and d, respectively and the dependency relation between राम (Ram) and लिखा (written) is denoted by m and पेन (pen) and उपहार मे मिला था (gifted) by n and other words of the sentence by empty string. Therefore, sentence S₂ is of the form a^mbⁿc^mdⁿ, which is a non-context-free language. Cross-serial dependency of the Hindi language can be represented by regulated grammar.

Now consider the sentence S₁ and If the users unintentionally write the S₁ sentence as

S₁′: मोहन ने वो सेब ख़ाया जो सेब खराब हो गया|

In this sentence, the user makes an insertion error of one noun phrase सेब (apple). Now the sentence S₁′ maps to cdcdd instead of cdcd. In formal grammar, this string is completely rejected. However, in case of fuzzy regulated grammar, we cannot completely reject the string w. By considering the grammar of example 4.1, Erroneous sentence acceptance S₁′ is shown using proposed algorithm RFSRG. Table 5 depicts few ways for deriving the string w = cdcdd.

Table 5 Various derivation for the string w = cdcdd.

Full size table

Table 6 represents the confidence level for the string w using E-set.

Table 6 Confidence level for the string w using E-set.

Full size table

Table 7 represents the confidence level for the string w using T_f-set. Table 8 represents the final interpretations with fuzzy confidence for the string $ w = cdcdd $.

The optimal derivation for the string $ w = cdcdd $ is as follows:

$$ \mu (w) = \hbox{max} (0.75;\;0.75;\;0.5;\;0.5) = 0.75 $$

Table 7 Confidence level for the string w using T_f-set.

Full size table

Table 8 Final interpretations with fuzzy confidence for the string w = cdcdd.

Full size table

By choosing a certain level of confidence λ_c, we say the string is accepted if μ(w) ≥ λ_c.

6 Conclusions

In this paper, an algorithm for deriving a string, w ∉ L, with a certain level of confidence is described. Regulated grammar is converted to Chomsky normal form to avoid multiple error elimination. Furthermore, the algorithm is applied to regularly controlled and matrix grammar. The time complexity of the proposed algorithm is O(|G ²_R |·|w|). By way of numerical examples for a string, w ∉ L, the determination of a confidence level for the string, w, is also demonstrated Further, the real-life application of a proposed algorithm for the recognition of faulty strings for the natural language, Hindi, is exhibited.

References

Dassow J and Păun G 1989 Regulated rewriting in Formal Language Theory (1st ed)., Berlin, Heidelberg: Springer
Book MATH Google Scholar
Dassow J 2004 Grammars with regulated rewriting. In: Martin-vide C, Mitrana V and Paun G (Eds.) Formal Languages and Applications. Berlin, Heidelberg: Springer, pp. 249–273
Chapter Google Scholar
Meduna A and Zemek P 2014 Regulated Grammars and Automata (1st ed). New York: Springer
Book MATH Google Scholar
Senay H 1992 Fuzzy command grammars for intelligent interface design. IEEE Transactions on Systems, Man and Cybernetics 22(5): 1124–1131
Article Google Scholar
Hopcroft J E, Motwani R and Ullman J D 2006 Introduction to automata theory, languages and computation (3rd ed). Boston: Addison-Wesley
MATH Google Scholar
Steimann F and Adlassnig K P 1994 Clinical monitoring with fuzzy automata. Fuzzy Sets and Systems 61(1): 37–42
Article MathSciNet Google Scholar
Giles C L, Omlin C W and Thornber K K 1999 Equivalence in knowledge representation: automata, recurrent neural networks, and dynamical fuzzy systems. Proceedings of the IEEE 87(9):1623–1640
Article Google Scholar
Zadeh L A 2004 A note on web intelligence, world knowledge and fuzzy logic. Data and Knowledge Engineering 50(3): 291–304
Article Google Scholar
DePalma G F and Yau S S 1975 Fractionally fuzzy grammars with application to pattern recognition. In: Zadeh L A, Fu K S, Tanaka K and Shimura M (Eds.) Fuzzy Sets and their Application to Cognitive and Decision Processes. New York: Academic Press, pp. 329–351
Qiu D 2007 Automata theory based on quantum logic: Reversibilities and pushdown automata. Theoretical Computer Science 386(1–2): 38–56
Article MathSciNet MATH Google Scholar
Bělohlávek R 2002 Determinism and fuzzy automata. Information Sciences 143(1–4): 205–209
Article MathSciNet MATH Google Scholar
Ignjatović J, Ćirić M and Bogdanović S 2008 Determinization of fuzzy automata with membership values in complete residuated lattices. Information Sciences 178(1): 164–180
Article MathSciNet MATH Google Scholar
Schneider M, Lim H and Shoaff W 1992 The utilization of fuzzy sets in the recognition of imperfect strings. Fuzzy Sets and Systems 49(3): 331–337
Article MathSciNet MATH Google Scholar
Inui M, Shoaff W, Fausett L and Schneider M 1994 The recognition of imperfect strings generated by fuzzy context sensitive grammars. Fuzzy sets and systems 62(1): 21–29
Article MathSciNet Google Scholar
Ginsburg S and Spanier E H 1968 Control sets on grammars. Math. Systems Theory 2(2): 159–177
Article MathSciNet MATH Google Scholar
Cremers A and Mayer O 1973 On matrix languages. Information and Control 23(1): 86–96
Article MathSciNet MATH Google Scholar
Solar P 2014 Deep Pushdown Transducers and State Translation Schemes. In: Proceedings of the 20th Conference STUDENT EEICT, Brno University of Technology, 24 April, pp. 264–268
Kasai T 1970 An hierarchy between context-free and context-sensitive languages. Journal of Computer and System Sciences 4(5): 492–508
Article MathSciNet MATH Google Scholar
Zemek P 2013 One-sided random context grammars: Established results and open problems. In: Proceedings of the 19th Conference STUDENT EEICT, Brno University of Technology, 25 April, pp. 222–226
Van der Walt A P J 1970 Random context grammars. In: Proceedings IFIP Congress. North-Holland, Amsterdam, pp. 66–68
Meduna A and Zemek P 2014 One-sided random context grammars with a limited number of right random context rules. Theoretical Computer Science 516: 127–132
Article MathSciNet MATH Google Scholar
Meduna A 1990 Generalized forbidding grammars. International Journal of Computer Mathematics 36(1–2): 31–38
Article MATH Google Scholar
Meduna A and Zemek P 2013 Generalized one-sided forbidding grammars. International Journal of Computer Mathematics 90(2): 172–182
Article MathSciNet MATH Google Scholar
Meduna A and Zemek P 2012 One-sided forbidding grammars and selective substitution grammars. International Journal of Computer Mathematics 89(5): 586–596
Article MathSciNet MATH Google Scholar
Kleijn H C M 1983 Selective Substitution Grammars Based on Context-Free Productions. Ph.D. Thesis, Leiden University, Netherlands
Kleijn H C M 1987 Basic ideas of selective substitution grammars. In: Kelemenova A and Kelemen J (Eds.) Trends Techniques and Problems in Theoretical Computer Science. Berlin, Germany: Springer, pp. 75–95
Kalra N and Kumar A 2017 Deterministic Deep Pushdown Transducer and its Parallel Version. The Computer Journal 61(1): 63–73
Article MathSciNet Google Scholar
Kalra N and Kumar A 2016 Fuzzy state grammar and fuzzy deep pushdown automaton. Journal of Intelligent and Fuzzy Systems 31(1): 249–258
Article MATH Google Scholar
Garhwal S and Jiwari R 2016 Parallel fuzzy regular expression and its conversion to epsilon-free fuzzy automaton. The Computer Journal 59(9):1383–1391
Article MATH Google Scholar
Lee E T and Zadeh L A 1969 Note on fuzzy languages. Information Sciences 1(4): 421–434
Article MathSciNet Google Scholar
Asveld P R J 2005 Fuzzy context-free languages. Part 1: generalized fuzzy context-free grammars. Theoretical Computer Science 347(1): 167–190
Article MathSciNet MATH Google Scholar
Asveld P R J 2005 Fuzzy context-free languages. Part 2: Recognition and parsing algorithms. Theoretical Computer Science 347(1): 191–213
Article MathSciNet MATH Google Scholar
Zhanga J, Williams S O and Wang H 2017 Intelligent computing system based on pattern recognition and data mining algorithms. Sustainable Computing: Informatics and Systems. https://doi.org/10.1016/j.suscom.2017.10.010
Google Scholar
Bag S, Tiwari M K and Chan F T S 2017 Predicting the consumer’s purchase intention of durable goods: An attribute-level analysis. Journal of Business Research. https://doi.org/10.1016/j.jbusres.2017.11.031
Zadeh L A 1965 Fuzzy sets. Information and Control 8(3): 338–353
Article MathSciNet MATH Google Scholar
Lange M and Leiß H 2009 To CNF or not to CNF? An efficient yet presentable version of the CYK algorithm. Informatica Didactica 8: 2008–2010
Google Scholar

Download references

Acknowledgements

One of the authors, Nidhi Kalra was supported under Visvesvaraya PhD Scheme Fellowship by Ministry of Electronics and Information Technology, Government of India.

Author information

Authors and Affiliations

Computer Science and Engineering Department, Thapar Institute of Engineering & Technology, Patiala, 147004, India
Ajay Kumar, Nidhi Kalra & Sunita Garhwal

Authors

Ajay Kumar
View author publications
You can also search for this author in PubMed Google Scholar
Nidhi Kalra
View author publications
You can also search for this author in PubMed Google Scholar
Sunita Garhwal
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ajay Kumar.

Appendix 1: Numerical example

In this appendix, the proposed algorithm has been applied to a matrix grammar.

Example 4.2

Consider G_M = (G, r_m), where G = {{S, C, D}, {c, d}, S, {m₀, m₁, m₂, m₃, m₄}} be the matrix grammar of example 2.2 where

$$ \begin{aligned} m_{0} & = (p_{0} :S \to CD), \\ m_{1} & = (p_{1} :C \to cC,p_{2} :D \to cD), \\ m_{2} & = (p_{3} :C \to dC,p_{4} :D \to dD), \\ m_{3} & = (p_{5} :C \to c,p_{6} :D \to c), \\ m_{4} & = (p_{7} :C \to d,p_{8} :D \to d), \\ \end{aligned} $$

The control set is r_m = m₀(m₁, m₂) * (m₃, m₄).

Equivalent CNF for the grammar G_M:

$$ \begin{aligned} m_{0}^{'} & = (p_{0} :S \to CD), \\ m_{1}^{'} & = (p_{1}^{1} :C \to A_{1} C,p_{1}^{2} :A_{1} \to c,p_{2}^{1} :D \to A_{2} D,p_{2}^{2} :A_{2} \to c), \\ m_{2}^{'} & = (p_{3}^{1} :C \to A_{3} C,p_{3}^{2} :A_{3} \to d,p_{4}^{1} :D \to A_{4} D,p_{4}^{2} :A_{4} \to d), \\ m_{3}^{'} & = (p_{5} :C \to c,p_{6} :D \to c), \\ m_{4}^{'} & = (p_{7} :C \to d,p_{8} :D \to d) \\ \end{aligned} $$

The control set is r_m′ is m₀′(m₁′, m₂′) * (m₃′, m₄′).

Using fuzzy replace operator:

$$ \begin{array}{*{20}l} {pr_{9} :A_{1} \to \_} \hfill & {pr_{10} :A_{2} \to \_} \hfill \\ {pr_{11} :A_{3} \to \_} \hfill & {pr_{12} :A_{4} \to \_} \hfill \\ {pr_{13} :C \to \_} \hfill & {pr_{14} :D \to \_} \hfill \\ {pr_{15} :C \to \_} \hfill & {pr_{16} :D \to \_} \hfill \\ \end{array} $$

Using fuzzy add operator:

$$ \begin{array}{*{20}l} {pr_{17} :A_{1} \to \_c|c\_} \hfill & {pr_{18} :A_{2} \to \_c|c\_} \hfill \\ {pr_{19} :A_{3} \to \_d|d\_} \hfill & {pr_{20} :A_{4} \to \_d|d\_} \hfill \\ {pr_{21} :C \to \_c|c\_} \hfill & {pr_{22} :D \to \_c|c} \hfill \\ {pr_{23} :C \to \_d|d\_} \hfill & {pr_{24} :D \to \_d|d\_} \hfill \\ \end{array} $$

Using fuzzy remove operator:

$$ \begin{array}{*{20}l} {pr_{25} :A_{1} \to \lambda } \hfill & {pr_{26} :A_{2} \to \lambda } \hfill \\ {pr_{27} :A_{3} \to \lambda } \hfill & {pr_{28} :A_{4} \to \lambda } \hfill \\ {pr_{29} :C \to \lambda } \hfill & {pr_{30} :D \to \lambda } \hfill \\ {pr_{31} :C \to \lambda } \hfill & {pr_{32} :D \to \lambda } \hfill \\ \end{array} $$

On considering string w = ddcd ∉ L. Table A1 dep

Table A1 Various derivation for the string $ w = cdcdcccd $.

Full size table

icts few ways for deriving the string w = ddcd.

Table A2 represents the confidence level for the string wusing E-set.

Table A2 Confidence level for the string w using E-set.

Full size table

Table A3 and A4 represent the confidence level for the string wusing T_f-set and final interpretations with fuzzy confidence for the string w = cdcdcccd respectively.

$$ \mu (w) = \hbox{max} (0,\;0,\;0.75,\;0,\;0,\;0,\;0.5,\;0,\;0,\;0) = 0.75 $$

By choosing a certain level of confidence λ_c, we say string is accepted if μ(w) ≥ λ_c.

Table A3 Confidence level for the string w using T_f-set.

Full size table

Table A4 Final interpretations with fuzzy confidence for the string w = cdcdcccd.

Full size table

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kumar, A., Kalra, N. & Garhwal, S. Error tolerance for the recognition of faulty strings in a regulated grammar using fuzzy sets. Sādhanā 43, 134 (2018). https://doi.org/10.1007/s12046-018-0833-y

Download citation

Received: 25 July 2017
Revised: 05 January 2018
Accepted: 08 January 2018
Published: 10 July 2018
DOI: https://doi.org/10.1007/s12046-018-0833-y

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Error tolerance for the recognition of faulty strings in a regulated grammar using fuzzy sets

Abstract

Similar content being viewed by others

Analysis of Fuzzy String Patterns with the Help of Syntactic Pattern Recognition

Interval type-2 fuzzy automata and Interval type-2 fuzzy grammar

Automatic recognition of machine English translation errors using fuzzy set algorithm