1 Introduction

There is a growing need to design devices and systems with built-in automation. To enable such automation, systems must take decisions independently based on environmental conditions.

Systems should adapt to the external environment and perform tasks according to the observed behaviour, including exceptional tasks. To achieve this, a system must be intelligent enough to take decisions in the absence of manual intervention.

Machine learning is one way of realising these features of Artificial Intelligence, in which systems are trained on an initial data set.

The system forms the initial dataset by collecting information from the external environment through sensors and actuators. The system is then trained to learn from this information and take decisions through iterations over the datasets. The time the system needs to learn and take decisions should be finite and small, so as to avoid failures that could lead to catastrophe.

In this work, a fuzzy rule matrix is constructed and the threshold of the entire rule set is computed. A modified rule set is then formed by selecting the rules whose rule strength exceeds this threshold; these are the rules the system uses to learn the data, and the remaining rules are not considered.

The paper is organised as follows: Sect. 2 discusses the literature on machine learning algorithms. Section 3 explains the proposed fuzzy model, illustrates it with an example and gives the algorithm. Section 4 explains the simulation environment and the results. Section 5 concludes the work and highlights the scope for future work.

The block diagram of the entire procedure is shown in Fig. 1.

Fig. 1 Block diagram of fuzzy learning model

2 Literature Review

The most conventional machine learning strategy is the Naive Bayes classifier. Several studies report inaccuracies arising from the Bayes conditional-independence assumption, owing to the suboptimal nature of the resulting probability estimates, as reported in [1,2,3]. Even when multidimensional tables are used to compute probabilities in real-time scenarios, many errors are observed, as discussed in [2]. The Support Vector method uses a hyperplane to classify the data; based on the classification, a set of support vectors is generated, as explained in [4]. However, when datasets are widely spread and have a large range of deviation, this method does not yield accurate results. The choice of the hyperplane is the key factor in the data classification, as discussed in [5, 6]. Artificial neural networks offer reliability, but this again depends on the choice of the activation function used to train the dataset, as explained in [7, 8]. Linear and logistic regression, as explained in [9,10,11], use a cost function and estimate its gradient; the reduction in the gradient is used to learn the datasets, so the regression analysis depends on the effectiveness of the cost function.

Several deep learning neural network algorithms model a data set as a function using a mathematical model and train it to interpolate or extrapolate the data value at a particular point. The Stochastic Gradient Descent technique, as explained in [12], attempts to minimise the loss of the model by incorporating additional parameters and incrementing them in proportion to the estimated gradient. However, this makes the data noisy, and filters need to be used to eliminate the associated noise. The method of steepest descent, as discussed in [13], may require a very large number of iterations to reach an optimum solution when applied to a very large data set, and an optimum solution may not even exist. Another approach that works in a different dimension is reinforcement learning, as explained in [14]. This method arrives at an optimal solution by trial and error; in an unsupervised scenario the training time is unbounded and the optimal solution cannot be guaranteed. The drawbacks highlighted in the above techniques are overcome in the fuzzy-based analysis. In the fuzzy-based analysis, a precise value for the rule strength is computed, and the optimal solution is reached by training with an activation function in a finite period of time. It requires neither a mathematical model nor an unbounded number of trials to arrive at an optimal solution.

3 Proposed Work

In this work, a fuzzy-based learning approach using Mamdani's inference engine, as discussed by Venkat and Pradheep in [15], is adopted. For each rule, a rule strength is computed based on a set of sub-factors and their corresponding weights. For each linguistic variable used in the rule, the mid-value of its range is computed, and the weighted average over the variables of the rule gives the rule strength. A typical rule is shown later in this section.

The rule strength (RS) is computed using the expression

$${\text{RS}} = \frac{\sum \left( {\text{Weight}} \times {\text{MidValue}} \right)}{\sum {\text{Weight}}}$$
(1)
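As a minimal illustration of Eq. (1), the following Java sketch computes the rule strength from arrays of weights and mid-values; the class, method and array names are illustrative and are not taken from the reported implementation.

```java
// A minimal sketch of Eq. (1): the rule strength is the weighted average of
// the mid-values of the linguistic variables appearing in one rule.
// weights[i] is the weight of sub-factor i and midValues[i] is the mid-value
// of the linguistic variable chosen for that sub-factor in the rule.
class RuleStrengthSketch {
    static double ruleStrength(double[] weights, double[] midValues) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < weights.length; i++) {
            weightedSum += weights[i] * midValues[i];
            weightTotal += weights[i];
        }
        // RS = sum(weight * mid-value) / sum(weight)
        return weightedSum / weightTotal;
    }
}
```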

The rule strengths are computed for all rules in the fuzzy rule set using the above expression. The threshold of the fuzzy rule set is then computed using the following expression

$$T = \frac{\sum\limits_{i = 1}^{n} \left( {\text{RuleNumber}}_{i} \times {\text{RuleStrength}}_{i} \right)}{\sum\limits_{i = 1}^{n} {\text{RuleNumber}}_{i}}$$
(2)

where n is the total number of rules.
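A corresponding sketch of Eq. (2), assuming the rule strengths are held in an array indexed in rule-number order (again, the names are illustrative):

```java
// A minimal sketch of Eq. (2): the threshold is a weighted average of the
// rule strengths, with the rule numbers 1..n acting as the weights.
// ruleStrengths[k] is assumed to hold the strength of rule number (k + 1).
class ThresholdSketch {
    static double threshold(double[] ruleStrengths) {
        double weightedSum = 0.0;
        double numberSum = 0.0;
        for (int k = 0; k < ruleStrengths.length; k++) {
            int ruleNumber = k + 1;
            weightedSum += ruleNumber * ruleStrengths[k];
            numberSum += ruleNumber;
        }
        // T = sum(i * RS_i) / sum(i), for i = 1..n
        return weightedSum / numberSum;
    }
}
```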

The computed threshold serves as an upper bound used until training of the system is complete. The modified rule set is constructed by selecting the rules whose rule strength is greater than the computed threshold. After this, a matrix is constructed whose rows have the form (RS − mT, RS, RS + mT), where m ranges from 1 to n and n is the number of rules in the original rule set. The matrix is formulated only for those values of m for which RS − mT remains positive; this criterion decides the number of columns of the matrix.

The resulting matrix is non-symmetrical, as the number of rows is greater than the number of columns. It is therefore split into a number of symmetrical (square) sub-matrices by matching the number of rows to the number of columns.
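For the simplest case, in which only m = 1 yields a positive RS − mT (the case illustrated in the example that follows), each selected rule contributes the row (RS − T, RS, RS + T). The following sketch builds this matrix and splits it into 3 × 3 blocks with zero padding; the method and array names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of the modified-rule-matrix construction for the case in
// which only m = 1 gives a positive RS - mT.
class RuleMatrixSketch {

    // Each selected rule (RS > T) contributes the row (RS - T, RS, RS + T).
    static double[][] modifiedRuleMatrix(double[] selectedStrengths, double t) {
        double[][] rows = new double[selectedStrengths.length][3];
        for (int i = 0; i < selectedStrengths.length; i++) {
            double rs = selectedStrengths[i];
            rows[i][0] = rs - t;
            rows[i][1] = rs;
            rows[i][2] = rs + t;
        }
        return rows;
    }

    // Split the (rows x 3) matrix into 3 x 3 sub-matrices, padding the last
    // block with zeroes when the row count is not a multiple of 3.
    static List<double[][]> splitIntoBlocks(double[][] rows) {
        List<double[][]> blocks = new ArrayList<>();
        for (int start = 0; start < rows.length; start += 3) {
            double[][] block = new double[3][3];   // zero-initialised by default
            for (int r = 0; r < 3 && start + r < rows.length; r++) {
                block[r] = rows[start + r].clone();
            }
            blocks.add(block);
        }
        return blocks;
    }
}
```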

The procedure is illustrated with an example in which RS − T is positive while RS − 2T, RS − 3T, etc. are negative. The matrix is therefore formulated with only RS − T, RS and RS + T, and the modified rule set is split into 3 × 3 sub-matrices. The actual output matrix is accepted as input, and the error, defined as the difference between the output matrix and the modified rule matrix, is computed. An activation function is used to iteratively train the modified rule set until the error is non-negative. The final value constitutes the optimal rule set.

For computing the trained matrix for antivirus check, the different sub-factors and their corresponding weights are shown in Table 1.

Table 1 Weight for each sub-factor

The different Linguistic variables with their ranges and corresponding mid-values are indicated in Table 2.

Table 2 Linguistic variable with range and mid-value

A typical rule is shown below.

If (Phishing Check is Excellent) and (Spyware Check is Good) and (Malware check is Average) and (Trojan Check is Fair) and (Rootkit scan is Poor) Then (Antivirus software is Average).

The entire rule set comprises 125 rules. A few sample rules from the rule set are shown in Table 3.

Table 3 Sample rule set

The rule strength of the rule "If (Phishing Check is Excellent) and (Spyware Check is Good) and (Malware check is Average) and (Trojan Check is Fair) and (Rootkit scan is Poor), then (Antivirus software is Average)" is 36.2. The rule strengths of the rest of the rule base are computed in the same way, as indicated in Table 4.

Table 4 Rule strengths computed from rule set

Using expression (2), the threshold is computed as 62.02. The modified rule set based on this threshold is shown in Table 5.

Table 5 Modified rule set

Hence, only RS − T is non-negative, and RS − mT is negative for m > 1. From the above discussion, 14 symmetrical 3 × 3 sub-matrices are formulated; the last row of the 14th matrix is padded with zeroes. The modified rule set is trained using the activation function \(E = \left( {\frac{1}{{1 - e^{ - T} }}} \right)\), as shown in Table 6.

Table 6 Trained modified rule set

The modified rule matrix is denoted LRM, the iteration matrix LRM1 and the given output matrix OM. The error is (OM − LRM1), and the iteration is carried out until all elements of (OM − LRM1) become negative.

The algorithm is outlined below; a sketch of the training loop follows the list.

  • Form the fuzzy rule set

  • Compute the rule strengths and the threshold

  • Form the Learning rule matrix (LRM) with selected rules (RS > T)

  • Form modified rule matrix (RS − mT, RS, RS + mT)

  • Accept the output matrix (OM)

  • Start the iteration by computing Error E = (OM − LRM)

  • Use activation function \(E = \left( {\frac{1}{{1 - e^{ - T} }}} \right)\) to minimise the error.

  • LRM1 = LRM + E

  • If (LRM1 <= OM) then

    • Final output O1 = LRM1

      Else

    • L1: LRM1 = LRM + E

  • If (LRM1 > OM)

    • O1 = LRM1

      Else

    • (LRM1 = LRM1 + E)

  • Repeat the iteration until LRM1 > OM.

  • Goto L1.

  • Compute RMSE, RRSE, RAE and MAE for final value of O1.

  • End the procedure.
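The following Java sketch shows one plausible reading of the training loop above, in which the scalar activation value E = 1/(1 − e^(−T)) is repeatedly added to the learning rule matrix element-wise until LRM1 exceeds OM; the class and method names, and the exact stopping test, are illustrative rather than taken from the reported implementation.

```java
// Sketch of the iterative training loop in the algorithm above.
class FuzzyTrainingSketch {

    // Activation function used in the paper: E = 1 / (1 - e^(-T)).
    static double activation(double t) {
        return 1.0 / (1.0 - Math.exp(-t));
    }

    // Train LRM towards OM: repeatedly add the activation value until every
    // element of (OM - LRM1) has become negative, i.e. LRM1 > OM element-wise.
    static double[][] train(double[][] lrm, double[][] om, double t) {
        double e = activation(t);
        double[][] lrm1 = new double[lrm.length][];
        for (int i = 0; i < lrm.length; i++) {
            lrm1[i] = lrm[i].clone();
        }
        while (!exceeds(lrm1, om)) {
            for (int i = 0; i < lrm1.length; i++) {
                for (int j = 0; j < lrm1[i].length; j++) {
                    lrm1[i][j] += e;            // LRM1 = LRM1 + E
                }
            }
        }
        return lrm1;                            // final trained matrix O1
    }

    private static boolean exceeds(double[][] a, double[][] b) {
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                if (a[i][j] <= b[i][j]) {
                    return false;
                }
            }
        }
        return true;
    }
}
```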

4 Simulation Results

The algorithm was simulated on the Java platform for several IT-related scenarios such as antivirus selection, data backup strategies and hardware maintenance. Simulations were carried out on fuzzy rule sets with about 500 rules. The following error metrics were used to evaluate the results:

  • Root Mean Square Error (RMSE)

  • Root Relative Square Error (RRSE)

  • Relative Absolute Error (RAE)

  • Mean Absolute Error (MAE).

4.1 Root Mean Square Error (RMSE)

It is the square root of the mean of the squared differences between the Output Matrix (OM) and the Trained Learning Rule Matrix (LRM1). It is given by the equation

$${\text{RMSE}} = \sqrt{\frac{1}{n}\sum\limits_{i = 1}^{n} \left( {\text{OM}}(i) - {\text{LRM1}}(i) \right)^{2}}$$
(3)

where n is the number of sample values of the data set under consideration.
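A minimal sketch of Eq. (3), assuming OM and LRM1 have been flattened into arrays of the n sample values (the names are illustrative):

```java
// Sketch of Eq. (3): root mean square error between OM and LRM1,
// with both matrices flattened into arrays of the n sample values.
class RmseSketch {
    static double rmse(double[] om, double[] lrm1) {
        double sumSq = 0.0;
        for (int i = 0; i < om.length; i++) {
            double diff = om[i] - lrm1[i];
            sumSq += diff * diff;
        }
        return Math.sqrt(sumSq / om.length);
    }
}
```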

4.2 Root Relative Square Error (RRSE)

It is the square root of the total squared error normalised by the total squared error that would be obtained by always predicting the average of the sample values.

$${\text{RRSE}} = \sqrt{\frac{\sum\limits_{i = 1}^{n} \left( {\text{LRM1}}(i) - {\text{OM}}(i) \right)^{2}}{\sum\limits_{i = 1}^{n} \left( {\text{OM}}(i) - \tilde{A} \right)^{2}}}$$
(4)

where \(\tilde{A}\) is the average of the n sample values.

$$\tilde{A} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\text{OM}}(i)$$
(5)

4.3 Relative Absolute Error (RAE)

It is defined as the total absolute error normalised by the total absolute error that would be obtained by always predicting the average of the sample values.

$${\text{RAE}} = \frac{\sum\limits_{i = 1}^{n} \left| {\text{LRM1}}(i) - {\text{OM}}(i) \right|}{\sum\limits_{i = 1}^{n} \left| {\text{OM}}(i) - \tilde{A} \right|}$$
(6)

where \(\tilde{A}\) is the average of the n sample values.

$$\tilde{A} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\text{OM}}(i)$$
(7)

4.4 Mean Absolute Error (MAE)

It is defined as the mean of the absolute differences between OM and LRM1 over the n samples considered.

$${\text{MAE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| {\text{OM}}(i) - {\text{LRM1}}(i) \right|$$
(8)
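The remaining metrics of Eqs. (4)–(8) can be sketched in the same style over flattened arrays, with the sample mean playing the role of \(\tilde{A}\); the class and method names are illustrative.

```java
// Sketches of Eqs. (4)-(8): RRSE, RAE and MAE over flattened arrays.
class ErrorMetricsSketch {

    // Average of the OM samples, as in Eqs. (5) and (7).
    static double mean(double[] om) {
        double sum = 0.0;
        for (double v : om) {
            sum += v;
        }
        return sum / om.length;
    }

    // Eq. (4): total squared error relative to always predicting the mean.
    static double rrse(double[] om, double[] lrm1) {
        double avg = mean(om);
        double num = 0.0, den = 0.0;
        for (int i = 0; i < om.length; i++) {
            num += (lrm1[i] - om[i]) * (lrm1[i] - om[i]);
            den += (om[i] - avg) * (om[i] - avg);
        }
        return Math.sqrt(num / den);
    }

    // Eq. (6): total absolute error relative to always predicting the mean.
    static double rae(double[] om, double[] lrm1) {
        double avg = mean(om);
        double num = 0.0, den = 0.0;
        for (int i = 0; i < om.length; i++) {
            num += Math.abs(lrm1[i] - om[i]);
            den += Math.abs(om[i] - avg);
        }
        return num / den;
    }

    // Eq. (8): mean absolute error between OM and LRM1.
    static double mae(double[] om, double[] lrm1) {
        double sum = 0.0;
        for (int i = 0; i < om.length; i++) {
            sum += Math.abs(om[i] - lrm1[i]);
        }
        return sum / om.length;
    }
}
```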

The results of the fuzzy-based approach proposed in this work have been compared with the four conventional machine learning algorithms listed below:

  1. Naïve Bayes Classifier

  2. Support Vector Method (SVM)

  3. Regression Analysis

  4. Artificial Neural Networks (ANN).

4.5 Comparison with Naïve Bayes Classifier

The results of the proposed fuzzy-based approach are compared with the Naïve Bayes Classifier. The reductions in error for the parameters RMSE, RRSE, RAE and MAE are 21, 17, 18 and 27%, respectively, as indicated in Table 7.

Table 7 Comparison of parameters (Fuzzy learning model vs. Naïve classifier)

4.6 Comparison with Support Vector Method (SVM)

The reductions in error for the parameters RMSE, RRSE, RAE and MAE for the proposed work, when compared with the Support Vector Method, are 32, 10, 17 and 17%, respectively, as indicated in Table 8.

Table 8 Comparison of parameters (Fuzzy learning model vs. Support vector method)

4.7 Comparison with Regression Analysis

The reductions in error for the parameters RMSE, RRSE, RAE and MAE for the proposed fuzzy-based approach, when compared with the Regression Analysis method, are 13, 4, 7 and 23%, respectively, as indicated in Table 9.

Table 9 Comparison of parameters (Fuzzy learning model vs. Regression analysis method)

4.8 Comparison with Artificial Neural Networks (ANN)

The reductions in error for the parameters RMSE, RRSE, RAE and MAE for the proposed fuzzy-based approach, when compared with the Artificial Neural Networks method, are 34, 21, 17 and 32%, respectively, as indicated in Table 10.

Table 10 Comparison of parameters (Fuzzy learning model vs. Artificial neural networks)

4.9 Summary

The reduction in error for the proposed work compared to conventional approaches is shown in Table 11.

Table 11 Summarisation of results of comparison for error reduction

The average reductions obtained with the fuzzy learning method for RMSE, RRSE, RAE and MAE are 25, 13, 15 and 25%, respectively, as indicated in Table 11. The average of these values is approximately 20%. Hence, the fuzzy learning method reduces the error by about 20% compared to the conventional approaches.

5 Conclusion

A fuzzy-based learning approach has been proposed in this work. The fuzzy learning approach reduces the error by about 20% compared to the conventional approaches considered. The approach computes the rule strength of each rule and selects the rules whose rule strength is greater than the threshold for effective learning. The learning approach uses a predictive range for each linguistic variable, chosen before commencing the training process.

6 Future Work

The work can be extended by making the training process fully unsupervised, generating the rule strengths with a random function and then carrying out the entire procedure. This random, unsupervised learning approach could later be implemented on Raspberry Pi hardware for applications such as healthcare analytics, stock market analysis, etc.