Introduction

The emerging discipline of cognitive computation deals with artificial reasoning systems that interact with humans in complex situations, mimicking human rational processes as autonomously as possible in order to improve people’s productivity in many areas.

To do so, this interdisciplinary research area employs and combines models from various areas, many of them involving computational intelligence developments as natural and biologically inspired methodologies [1,2,3]. These systems use knowledge learned from both people and data by means of machine learning and data mining, among others. Cognitive computing systems entail many challenges [4], some of them closely related to large-scale [5], high dimensional [6] and big data [7] problems.

Fuzziness is an inherent feature of cognitive information, due to the incomplete cognition of human beings [8]. On the other hand, decision-making ability is one of the main characteristics of cognitive computation and in this sense, fuzzy logic helps bring computer reasoning closer to its human counterpart [9].

Evolutionary algorithms [10, 11] are biologically inspired methods that have been widely used, particularly together with fuzzy systems, forming the core of the soft computing area. The combination of both is especially useful because of its good practical results, giving rise to the area named evolutionary fuzzy systems (EFSs). Relevant current reviews in this area can be found in [12,13,14]. Some recent papers also connect EFSs with other current hot topics: [15] addresses explainable artificial intelligence with EFSs; [16] applies EFSs to intrusion detection systems; [17] discusses the fuzzy modeling and control of micro-air vehicles using evolutionary algorithms; and [18] deals with learning TSK fuzzy systems with evolutionary algorithms for high dimensional datasets.

Nonetheless, evolutionary algorithms in general have the drawback of a substantial computational cost, due to the iterative search method employed. Furthermore, when learning fuzzy system rule bases (RBs), a high number of examples (a large-scale problem) implies a large growth in the number of rules, and thus as many defuzzification parameters as rules, significantly expanding the search space and the time required. Moreover, computing the fitness function over an enormous number of training examples takes a lot of computational time.

The philosophies usually followed in the EFS area to handle large amounts of data are basically the following [14]:

  (a) Redesigning the algorithms: the inner mechanisms of the evolutionary algorithms are rebuilt.

  (b) Reducing the amount of data: in this way, the evolutionary procedure requires lower computational effort.

  (c) Using distributed computing: clusters of computers are used to decrease the time required.

Therefore, in the EFS design area, though not exclusively, one important challenge is the development of methods and algorithms for enormous data volumes capable of doing the same work as small-data approaches [19, 20], or of improving quality thanks to the higher computational power available.

Although many papers on EFSs for big data focus on classifiers [21,22,23,24,25,26], only a few proposals are devoted to regression problems [27,28,29,30], and [27, 28] deal with Takagi-Sugeno rather than linguistic models. The recent interest in scalability was preceded by some earlier proposals for sizeable datasets [31,32,33,34,35] based on different kinds of approach, such as reducing the training data or decreasing the search space. They are therefore not really scalable proposals, that is, proposals that keep a wide range of applicability as the dataset grows by increasing the computational power without varying the design of the algorithm.

Focusing on the design of fuzzy rule-based systems (FRBSs), many different methods have been proposed to improve their accuracy [12, 14]. Some of them are based on the use of custom aggregation operators [36, 37]. For regression problems, one of the best known, and normally also compatible with the others, is the use of adaptive fuzzy operators in the inference system [38, 39], and particularly adaptive defuzzification [40]. The accuracy improvements it provides derive from adapting the defuzzification to each particular rule, fine-tuning its specific relevance, i.e., making the rule set more cooperative. This is especially interesting when the RB has been learned using methods based on covering criteria that select the best rule of each area. One of these methodologies is the widely used and well-known WM-method [41]. Therefore, blending this simple RB learning procedure with adaptive defuzzification achieves a good combination, particularly when using evolutionary algorithms [40].

However, the use of this approach when the problems involve huge data volumes is a challenge. In that situation, the first step, RB learning through an adaptation of the WM-method, was conceptually resolved in our conference paper [30]. There, a model that obtains the same RB as the original sequential method was presented and tested with an Apache Hadoop implementation. For the second step, [30] also presented a preliminary study of an evolutionary adaptive defuzzification (EAD) method, but the proposed model suffered from a lack of precision with respect to the accuracy achieved by the equivalent traditional sequential model.

Regarding this second step in particular, in this work, we introduce an original new EAD method which improves upon the approach presented in [30]. Specifically, it clearly enhances the previous work in terms of accuracy and scalability, based on a substantially different distribution scheme. It is now a single evolutionary process with distributed population evaluation, designated the global learning model [42], instead of the multiple distributed evolutionary processes, called the local learning model [42], used in [30]. Also, a specific Apache Spark [43] implementation is proposed in place of the Apache Hadoop used in [30], in order to perform the iteration over the distributed loop needed by the new learning model efficiently.

To verify the behavior of our proposal, we carried out an experimental study. We used 12 regression problems and measured the performance in terms of computational cost and accuracy. Furthermore, to confirm the advantages of the new model, we compared it against the preliminary distributed EAD presented in [30], applying statistical tests [44, 45] in order to confirm our hypothesis.

The rest of this paper is organized as follows: the “Preliminaries” section reviews adaptive defuzzification and the big data computing frameworks employed; the “WM-EAD-Global” section describes the new scalable EAD method and the distributed MapReduce approach developed in this work; the “Experimental Study” section presents the experimental study carried out, where we analyze the accuracy and study the speed-up of our new proposal; finally, some conclusions are presented in the “Conclusions” section.

Preliminaries

This section is devoted, first of all, to reviewing definitions and notation related to adaptive defuzzification methods, followed by an introduction to big data distributed computing frameworks.

Evolutionary Adaptive Defuzzification

Adaptive defuzzification is a simple mechanism to improve the accuracy of linguistic FRBSs for fuzzy modeling, based on using the appropriate individual contribution of each rule to the inference process in order to promote cooperation between the rules [39, 40].

There are many papers devoted to parameterized defuzzifiers. Frequently, they tune the behavior of the defuzzification with a single global parameter or with one parameter for each rule, resulting in improved accuracy.

In this paper, we opted to use the specific expression shown in (1), as it is efficient, easy to implement, and showed good behavior in previous works [40]:

$$ {y}_0 = \frac{\sum \limits_{i=1}^{N} {h}_i \cdot {\alpha}_i \cdot C{G}_i}{\sum \limits_{i=1}^{N} {h}_i \cdot {\alpha}_i}, $$
(1)

where hi is the so-called matching degree, αi is the parameter that tunes each rule Ri, i = 1 to N, and CGi is the center of gravity of the fuzzy set inferred with rule Ri. This is a Mode-B defuzzifier, i.e., it individually converts every inferred fuzzy set into a real value and then computes a weighted average.

Note that the αi parameters are equivalent to rule weighting [46], where values of αi ∈ [1,∞) emphasize the contribution of that rule, whereas values of αi ∈ [0,1] penalize it.
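To make the operation of expression (1) concrete, the following minimal Python sketch computes the adaptively defuzzified output from the per-rule matching degrees, parameters αi, and centers of gravity; the function and variable names are illustrative and not taken from the authors' implementation.

```python
# Minimal sketch of the Mode-B adaptive defuzzifier of expression (1);
# names are illustrative, not the authors' implementation.
def adaptive_defuzzify(matching_degrees, alphas, gravity_centers):
    """Each rule R_i contributes its center of gravity CG_i weighted by h_i * alpha_i."""
    numerator = sum(h * a * cg for h, a, cg in
                    zip(matching_degrees, alphas, gravity_centers))
    denominator = sum(h * a for h, a in zip(matching_degrees, alphas))
    return numerator / denominator if denominator != 0 else 0.0

# Example with three fired rules
y0 = adaptive_defuzzify([0.8, 0.3, 0.1], [1.0, 2.5, 0.4], [10.0, 14.0, 7.0])
```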

The set of defuzzifier parameters is often learned using an evolutionary algorithm with real coding [39, 40], following the scheme of a chromosome comprising all the parameters associated with the rules of the RB. In this way, the learning process obtains a set of rules with improved cooperation among them [39, 40]. The learning process described is therefore particularly interesting as a post-processing step after quick and simple RB learning methods guided by example coverage (e.g., the WM-method). These methods select the best rules individually instead of as a collaborative group, which is finally achieved thanks to the evolutionary process.

Big Data and Cluster Computing Frameworks

In general terms, the term big data denotes volumes of data beyond the capabilities of typical database resources to capture, store, manage, and analyze [47].

The current big data technologies employed to manage such data volumes rest on three pillars [48]:

  • Distributed file systems that store the big files in several distributed servers, e.g., the popular Apache Hadoop Distributed File System (HDFS) [49]

  • Programming paradigms such as MapReduce [50] or Pregel [51] that ease the distributed programming jobs into clusters or servers

  • Frameworks for computing clusters such as Apache Hadoop [49] or Apache Spark [43], which let us efficiently organize and manage these groups of computers as storage (through distributed file systems) and data processing (implementing distributed programming models) structures

The MapReduce programming paradigm was introduced by Google in 2004 [50], and its best-known open-source implementation is Apache Hadoop. Hadoop owes its popularity to being one of the first proposals to include MapReduce together with relatively easy-to-use scalable storage and data processing, as well as high fault tolerance, high availability, automatic data redundancy and recovery, etc. Hadoop is conceived to perform simple one-pass batch processing over data; it is not intended for iterations over data, where it is not efficient, or for interactive data exploration. Some of these drawbacks have recently been addressed by the Apache Spark [43] framework.

Apache Spark is likewise an open-source distributed programming framework, conceived as a step forward in flexibility and efficiency. It incorporates different computational models, MapReduce being one of them, with an implementation significantly faster [43] than those of other frameworks. A particularly interesting advantage is that it allows efficient iterative or multi-pass data processing thanks to one of its key features: in-memory computing. This mechanism is based on a distributed memory abstraction (the resilient distributed dataset (RDD) [43]) that reduces intermediate disk access, dramatically accelerating overall performance. An RDD can be seen as a dataset split across different servers of the cluster, which can be processed in parallel. Programmers can use two categories of operations over RDDs: transformations, which take an RDD and produce a new RDD, and actions, which obtain a value from a computation over a given RDD. The fault-tolerance capability is implemented on top of the RDDs, since their slices can be automatically reconstructed if, for any reason, they get lost.
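As a simple illustration of the transformation/action distinction, the following PySpark fragment builds an RDD from a text file, applies lazy transformations, and only triggers distributed computation with actions; the file path and parsing are placeholders unrelated to the datasets of this paper.

```python
from pyspark import SparkContext

sc = SparkContext(appName="rdd-example")

lines = sc.textFile("hdfs:///data/examples.csv")         # RDD of text lines
outputs = lines.map(lambda s: float(s.split(",")[-1]))   # transformation: lazily builds a new RDD
squared = outputs.map(lambda y: y * y)                    # transformation: still nothing computed
sum_sq = squared.reduce(lambda a, b: a + b)               # action: distributed computation runs here
n = outputs.count()                                       # action: another value returned to the driver
print(sum_sq / n)
```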

A user program in Spark can be seen as a single driver running the main function on the cluster master machine, plus a set of parallel tasks run by the executors, which operate on the RDDs on the slave machines of the cluster and return the results of their computations to the driver.

Finally, we can point out that Spark can also use HDFS distributed storage, but it is independent of the storage file system of the cluster, and it can be used not only through programming but also interactively by using a console command line interpreter.

The EAD proposal developed in this paper is based on the MapReduce paradigm implemented in the Spark framework, due to its capacity for efficient use in data science problems [52,53,54,55].

WM-EAD-Global: a Linguistic Fuzzy System with Evolutionary Adaptive Defuzzification with Spark

Now, we describe the proposal, the WM-EAD-Global FRBS for regression. This entails two sequential phases, which benefit from the distributed approach:

  • WM-Spark: The first phase consists of creating the RB using the scalable version of the WM-method we proposed in [30]. As we code it using Spark this time, it is designated WM-Spark.

  • EAD-Global-Spark: The second phase entails the evolutionary adaptive defuzzification method itself. It uses a scalable global evolutionary learning model, where the defuzzification method parameters are learned with an evolutionary algorithm [40]. This time it is also implemented in Spark, taking advantage of the in-memory data capabilities to implement an iterative global model efficiently.

First Phase: WM-Spark

The WM data-driven method [41] is one of the most referenced algorithms in the FRBS research area for obtaining, in a simple way, the RB from a set of samples. The first phase, a distributed Spark implementation of the WM-method, which we named WM-Spark, involves a conceptually simple idea: dividing the original training dataset into several subsets and applying the WM-method to each subset in a distributed way, with the help of the MapReduce paradigm. Thus, in Spark, the training set is uniformly split into n portions and distributed across the computer cluster. This one-pass MapReduce schema is shown in Fig. 1, highlighting the operations carried out on the single master node by the driver program, and those carried out on the group of slave computers by the executors. The functions are detailed below:

  1. First, the driver program builds the partition of the fuzzy variables (the so-called Data Base (DB)) using a fixed number of uniformly distributed triangular linguistic terms. Beforehand, the training dataset is divided into n disjoint subsets of training data of the same size, which are spread over the worker nodes together with the partitioned fuzzy variables.

  2. Map function: the worker nodes individually perform the classical WM-method, creating a rule for each example in their partition (RBi in Fig. 1 denotes the set of rules of partition i) by using, for every variable, the label with the greatest matching. Additionally, a matching degree is provided in order to later combine these generated rules. Therefore, this function creates a list of key-value pairs (key: labels of the antecedents of the rule; value: consequent of the rule and its matching), which are returned to the driver program to be joined.

  3. Reduce function: one or more reduce processes take the RBi and join them to build the final set of rules (RBF in Fig. 1). When rules with the same antecedents and consequent appear two or more times, only a single copy is kept; if there are rules with the same antecedents but different consequents, the conflict is resolved by keeping only the rule with the highest matching degree. In terms of the key-value pairs managed, this function takes a list of key-value pairs grouped by key/antecedent (key: labels of the antecedents of a rule; value: list of pairs (consequents for those antecedents and their respective matchings)) and produces a rule. The whole set of rules generated is the final RB (RBF), which, together with the DB previously built by the driver program, constitutes the complete Knowledge Base (KB). Lastly, note that the WM-Spark described produces the same result as the sequential WM-method, so the RBs it creates are identical. A minimal sketch of this scheme is shown after Fig. 1.

Fig. 1 First stage of the WM-EAD-Global method in MapReduce: the WM-Spark
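A minimal PySpark sketch of the WM-Spark phase just described might look as follows; `training_rdd` is assumed to be an RDD of (input vector, output) examples, `db_broadcast` a broadcast copy of the partitioned fuzzy variables, and `best_labels_and_matching` a hypothetical helper performing the fuzzification step of the classical WM-method.

```python
# Hedged sketch of the WM-Spark MapReduce scheme (not the authors' code).
def wm_map(examples, data_base):
    # Map: one candidate rule per example, keyed by its antecedent labels
    for x, y in examples:
        antecedents, consequent, matching = best_labels_and_matching(x, y, data_base)
        yield (antecedents, (consequent, matching))

def wm_reduce(rule_a, rule_b):
    # Reduce: rules with the same antecedents keep only the highest matching degree
    return rule_a if rule_a[1] >= rule_b[1] else rule_b

rb_final = (training_rdd
            .mapPartitions(lambda part: wm_map(part, db_broadcast.value))
            .reduceByKey(wm_reduce)
            .collect())   # final rule base RB_F as (antecedents, (consequent, matching)) pairs
```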

Second Phase: The EAD-Global-Spark

This phase performs the evolutionary process devoted to tuning, in a scalable way, the defuzzifier parameter associated with each rule [39, 40].

In the “Introduction” section, we noted that the proposal of this paper significantly improves the approach presented in [30]. There, we employed a model with multiple local distributed evolutionary processes, each with its own training data subset, to learn the defuzzification rule parameters. In fuzzy regression, where each output of the FRBS is computed by aggregating the inference of several fired rules rather than a single one, the values of the learned rule parameters are closely related to each other, creating a cooperation relationship [40]. Therefore, the subsequent complex combination of rule parameters learned in different partitions causes an unavoidable decrease in the accuracy of the model.

This paper proposes to solve the aforementioned drawback by replacing the local evolutionary learning model (single-pass MapReduce schema) with a global one (multi-pass MapReduce schema). In this way, the distributed computational power is not employed to perform distributed learning, but to distribute the heavy chromosome evaluation at every iteration of the evolutionary algorithm. Conceptually, the proposed model behaves in the same way as the sequential model; later, in the experimental study, the differences between the local and global models are analyzed.

In order to describe the proposed method, we begin with the details of the evolutionary scheme, and then explain how we use a MapReduce schema implemented in Spark to make it scalable and efficient.

The evolutionary algorithm implemented is based on the classical CHC evolutionary algorithm [56], so here we briefly describe its essential mechanisms:

Encoding

A real encoding equal to the one employed in our previous paper [40] was used. It consists of N genes in the interval [0,10], corresponding to the parameters αi of the respective rules Ri of the RB:

$$ \mathrm{C} = \left({\upalpha}_1, \dots, {\upalpha}_N\right) \mid {\upalpha}_i \in \left[0, 10\right] $$

Initial Population

The initial population contains a single chromosome with all its genes set to 1, so that one individual represents the whole rule set without weights. The rest of the chromosomes of the population were initialized randomly.
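A small sketch of this encoding and initialization, assuming N rules and a population of pop_size chromosomes (numpy is used here only for convenience and is not implied by the original implementation):

```python
import numpy as np

def initial_population(n_rules, pop_size, seed=0):
    rng = np.random.default_rng(seed)
    # Each chromosome: N real genes alpha_i in [0, 10], one per rule of the RB
    population = rng.uniform(0.0, 10.0, size=(pop_size, n_rules))
    population[0, :] = 1.0   # one individual with all rules unweighted (alpha_i = 1)
    return population
```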

Evaluation

The evaluation process is the element of the evolutionary procedure performed in a distributed way by the Spark executors, as described later. Our approach entails minimizing the mean square error (MSE) in order to maximize the accuracy. The MSE, shown in expression (2), is:

$$ MSE(S) = \frac{\frac{1}{2} \sum \limits_{k=1}^{M} {\left( {y}_k - S\left({x}_k\right) \right)}^2}{M} $$
(2)

where S denotes the fuzzy model considered. We use an evaluation set made up of M pairs of numerical data Zk = (xk,yk), k = 1,…,M, with xk being the values of the input variables and yk the corresponding values of the output variable.

Crossover and Restart

The recombination of the chromosomes was implemented using the BLX-α operator [57], specific to real-coded genetic algorithms (with the parameter α = 0.5). Note that the CHC algorithm only mates chromosomes that exceed the mating threshold, i.e., chromosomes that are sufficiently different, as measured by the Hamming distance after converting the real numbers into bit strings.

The mating threshold is initially set to L/4, L being the number of characters of the string. The mating threshold is reduced by one unit whenever no offspring enters the new population.

When the algorithm restarts, the best chromosome of the population is kept, while the rest of the individuals are randomly reinitialized.
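The following sketch illustrates the BLX-α recombination (with α = 0.5) and the CHC incest-prevention test described above; `to_bits` stands in for the real-to-bit-string conversion and is an assumed helper, not part of the original implementation.

```python
import random

def blx_alpha(parent1, parent2, alpha=0.5, low=0.0, high=10.0):
    # BLX-alpha: each child gene is drawn from the parents' interval enlarged by alpha
    child = []
    for g1, g2 in zip(parent1, parent2):
        c_min, c_max = min(g1, g2), max(g1, g2)
        spread = c_max - c_min
        gene = random.uniform(c_min - alpha * spread, c_max + alpha * spread)
        child.append(min(max(gene, low), high))   # clip to the [0, 10] encoding interval
    return child

def can_mate(parent1, parent2, threshold, to_bits):
    # CHC incest prevention: mate only if the parents are different enough,
    # measured by the Hamming distance between their bit-string representations
    bits1, bits2 = to_bits(parent1), to_bits(parent2)
    hamming = sum(b1 != b2 for b1, b2 in zip(bits1, bits2))
    return hamming > threshold
```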

An Iterative MapReduce Process for Adaptive Defuzzification

Now, we shall describe the MapReduce strategy used to achieve the EAD-Global-Spark. First, we describe it in terms of functions and processes (illustrated in Fig. 2), and then in terms of the computation of the MSE (shown in Fig. 3).

Fig. 2 Second stage of the WM-EAD-Global method in MapReduce: evolutionary adaptive defuzzification

Fig. 3 Detailed schema used to compute the MSE by the proposed WM-EAD-Global approach with the MapReduce programming model

In this second phase, an iterative MapReduce process is performed to obtain the weights of each rule obtained in the first phase. Next, we describe this process as developed with Spark:

  1. The driver program, executed on the master node, runs the evolutionary learning algorithm that learns the RB weights, carrying out the computationally heavy population evaluation in a distributed way across the cluster. To do so, it takes the whole dataset, splits it into as many partitions as worker nodes available in the cluster, and distributes them to the nodes, also giving each a full copy of the KB (DB + RB) obtained in the first phase and, of course, a full copy of the population to be evaluated.

  2. Map function, carried out by the executors: each worker node uses its available data to perform the evaluation process, that is, every chromosome is evaluated using the worker node's set of examples. Each chromosome, which represents the set of rule weights or defuzzification parameters associated with the RB, thus obtains the partial fitness corresponding to its data partition. In terms of key-value pairs, the Map function produces a list of intermediate key-value pairs (key: chromosome, i.e., the weights associated with each rule to be evaluated; value: fitness, a measure of the accumulated error obtained for the RB and DB using the weights of that chromosome), which are transmitted back to the master node, where they are joined. Later, we describe in full how the MSE is computed from this accumulated error.

  3. Reduce function, also carried out by the executors: one or more reduce processes take the sorted outputs of the Map functions and merge them to construct the definitive fitness of each chromosome (denoted Ci with fitness wi in Fig. 2). This is possible because the final fitness of each chromosome of the population, which is the MSE over the whole data, can be computed by combining the accumulated errors computed in each partition with its subset of data examples. In key-value pair terms, the reduce process takes the list of intermediate key-value pairs grouped by key (key: chromosome, i.e., the weights associated with each rule that have been evaluated; value: list of accumulated errors of that chromosome in each partition) and produces a list of chromosomes with their new associated fitness. The fitness values obtained for all chromosomes are then sent to the driver program in order to continue the evolutionary process.

Specifically, the MSE defined in expression (2) for a sequential calculation is computed within the MapReduce schema as follows (see also Fig. 3):

  • The Map functions compute the accumulated squared error (Errorij) shown in expression (3):

$$ Erro{r}_i^j = \sum \limits_{k=1}^{M/n} {\left( {y}_{i_k} - {S}^j\left({x}_{i_k}\right) \right)}^2 $$
(3)

where n is the number of subsets into which the training dataset has been divided, i is the subset considered (with i = 1,…,n), j is the chromosome considered (with j = 1,…,t, t being the number of chromosomes), and M denotes the number of instances or examples of the original dataset. Sj is the fuzzy model using chromosome j, that is, the j-th set of defuzzifier parameters, and, as in (2), (xik,yik) are the numerical pairs of values of the i-th subset of examples, with xik the input values of example k (with k = 1,…,M/n) and yik its expected output.

  • The reduce functions compute the MSE shown in expression (4) using the previously computed accumulated errors Errorij:

$$ MS{E}^j(S) = \frac{\frac{1}{2} \sum \limits_{i=1}^{n} Erro{r}_i^j}{M} $$
(4)

with j being the chromosome considered, so the set of MSEj(S) values computed is returned to the evolutionary algorithm as the fitness of each chromosome.
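A hedged PySpark sketch of this fitness evaluation is given below; `training_rdd`, `kb_broadcast`, and `pop_broadcast` are assumed variables holding the distributed examples, a broadcast KB, and a broadcast population, and `fuzzy_output` is a hypothetical helper performing inference with the adaptive defuzzifier of expression (1).

```python
# Sketch of the distributed computation of expressions (3) and (4) (not the authors' code).
def partial_errors(examples, kb, population):
    # Map: for every chromosome j, accumulate the squared error over this partition (Error_i^j)
    examples = list(examples)
    for j, chromosome in enumerate(population):
        error = sum((y - fuzzy_output(x, kb, chromosome)) ** 2 for x, y in examples)
        yield (j, error)

errors = (training_rdd
          .mapPartitions(lambda part: partial_errors(part, kb_broadcast.value, pop_broadcast.value))
          .reduceByKey(lambda a, b: a + b)   # Reduce: sum the partial errors over all partitions
          .collectAsMap())

M = training_rdd.count()
fitness = {j: 0.5 * err / M for j, err in errors.items()}   # MSE^j of expression (4)
```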

Experimental Study

In this section, we present a study of the behavior of the proposed model in terms of accuracy and scalability. As noted above, both WM-Spark and the new EAD-Global-Spark proposal perform in exactly the same way as the sequential WM and EAD. Therefore, WM-EAD-Global is identical to its sequential ancestor in terms of accuracy. Thus, in the present experimental study, we focus on the scalability of the new model and compare the accuracy of the presented model against its predecessor proposed in [30].

This section is organized as follows: first, we present the datasets selected for the experimental study. Then, the “Experimental Setup” subsection describes how the experimental study was configured and the non-parametric statistical test employed for the comparative study. The “Results and Analysis” subsection is devoted to the examination of the results. Finally, the “Scalability” subsection studies the speed-up achieved by our proposal.

Table 1 shows a summary of the main characteristics of the 12 regression problems selected for our experimental study. They have different complexities and different numbers of instances and variables, and can be found in the KEEL [58] data repository, the UCI Machine Learning Repository, and the complementary material website of the paper. We also included two particularly complex and large datasets (ETHY2 and YPRE) in order to observe the behavior of the proposal with them. ETHY2, in particular, has been synthetically built from its corresponding original: it is one of the two time series of the UCI gas sensor array under dynamic gas mixtures dataset (ethylene_methane), with 4,178,504 instances. It has 19 attributes, including the time and 2 outputs, so we selected the ethylene-methane output and the 17 input variables for 12 h. Overall, the datasets selected go from a smaller to a greater number of variables, ranging from 8192 to 1,044,625 examples and from 6 to 90 input variables.

Table 1 Datasets considered. Available at KEEL (https://sci2s.ugr.es/keel/datasets.php) and UCI (https://archive.ics.uci.edu/ml/datasets.html) repositories, and also YPRE and ETHY2, which can be downloaded from http://www.uhu.es/gisimd/papers/WM-EAD-Global/

Experimental Setup

Regarding the KBs, we used three triangular linguistic terms for each variable in each problem. Both the conjunction and the inference operator were the minimum t-norm. Concerning the defuzzifiers, the WM-Spark models use the center of gravity weighted by the matching degree, while the subsequent global EAD model uses the defuzzifier shown in expression (1).

Datasets were previously split into training and test sets following a 5-fold cross-validation scheme. A total of 30 runs for each problem (5 partitions and 6 different seeds for the random number generator of the genetic algorithm) were carried out. The rest of the evolutionary process setup consisted of a population of 50 chromosomes, a crossover probability of 1, and a maximum of 100,000 evaluations.

An exception to this setup is the YPRE dataset. It was added in order to use a larger and more complex dataset, but due to its features, we exceptionally used a single partition and a single seed. Regarding the evolutionary process, in this particular case, we set the number of evaluations to 30,000 because of the relatively modest testing platform available, described next.

We employed a Spark cluster of 17 virtual servers with 4 cores and 8 GB of RAM each, the first of them being the one that executes the driver program, and the other 16 the worker nodes where the executors act. The host computer hardware is a server with 4 CPUs Intel Xeon E7–4850 with 10 cores per CPU and hyperthreading (thus, capable of 80 threads), and 192 GB of RAM. The number of cores was set in two different ways:

  • Using a 2 core setup to measure the times needed with a basic machine.

  • Using the cluster mode with all 16 servers, with 16 and 32 core setups.

Note that this is not a true HPC cluster for big data problems but a research platform to test and validate algorithms, so the absolute values of the times obtained are not truly representative, although they are interesting from a scalability point of view and for comparison among them. Our objective is not to show the well-known usefulness of EADs for fuzzy regression applications, but to study ways of adapting them to the current MapReduce distributed paradigm in order to take advantage of computer clusters.

Our study has been validated using statistical testing [44, 45]. Specifically, we compared the performance of the approaches using a non-parametric pairwise test, the Wilcoxon signed-rank test [59]. The test first computes the absolute values of the differences between the two FRBSs compared and sorts them in ascending order, assigning a rank to each of them. Then, the sum of ranks R+ is calculated for the cases where the first model outperforms the second, and R− for the opposite cases. Finally, a p value is obtained from the corresponding statistical distribution, and the null hypothesis of equality of means can be rejected if it is below a pre-specified level of significance.
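As an illustration, this pairwise comparison can be reproduced with the Wilcoxon signed-rank test available in SciPy; the MSE values below are placeholders, not results from the paper.

```python
from scipy.stats import wilcoxon

mse_local = [0.231, 0.514, 0.127, 0.389, 0.245]    # hypothetical test MSEs, WM-EAD-Local
mse_global = [0.210, 0.498, 0.119, 0.402, 0.226]   # hypothetical test MSEs, WM-EAD-Global

statistic, p_value = wilcoxon(mse_local, mse_global)
if p_value < 0.1:   # level of significance used in the paper
    print("Null hypothesis of equal performance rejected")
```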

Results and Analysis

The results achieved with each problem by the two models, WM-EAD-Local [30] (named Scalable WM-EAD in that contribution) and the new proposal WM-EAD-Global, are shown in Table 2, where the two main columns show the average MSE obtained by the two models compared, both in training (MSEtra) and test (MSEtst). The table also shows the number of rules of the models, and the best test accuracies are highlighted.

Table 2 Reference values of the average number of rules and MSE of the FRBSs built with WM-EAD-Local and WM-EAD-Global. MSE values in this table must be multiplied by 10^−6, 10^8, 10^9, 10^−6, 10^−8, and 10^−4 for DELV, CAL, HOU, ELV, AIL, and TIC, respectively

In the same sense, Table 3 shows the Wilcoxon test results of both models, where it can be concluded that there are significant differences between the two methodologies compared because the p values are lower than the fixed level of significance of α = 0.1.

Table 3 Wilcoxon test comparing the accuracy of WM-EAD-Local vs. WM-EAD-Global. R+ corresponds to the sum of ranks for WM-EAD-Global and R− to WM-EAD-Local

Consequently, when analyzing Table 2 and the statistical results found in Table 3, we can highlight that:

  • The global learning model presented, WM-EAD-Global, shows better average results in both training and test than the local learning model implemented in WM-EAD-Local, so the new proposal improves on the previous one in terms of accuracy, as it does not suffer any deterioration of the rule cooperation. There are two exceptions, the TIC and YPRE datasets: the test results for TIC are slightly worse due to overfitting, a situation that can always occur when using evolutionary algorithms. In the particular case of the YPRE dataset, we observed that WM-EAD-Local obtains slightly better accuracy than WM-EAD-Global. This is likely due to the lower number of evaluations of the evolutionary algorithm (only 30,000) together with the greater complexity of the dataset, so the evolutionary algorithm did not converge and could not take advantage of the global learning model as with the rest of the datasets.

  • Finally, it is also interesting to point out that the accuracy of the global methodology proposed is independent of the size of the partitions employed (sometimes conditioned by the distributed computing resources available). In other words, this proposal is not only more accurate than the preliminary one presented in [30], WM-EAD-Local, but also independent of the computational resources. Thus, greater computational power only affects the time needed, not the quality of the solution found.

Scalability

In this section, we observe the behavior of the proposed approach in terms of the runtimes obtained and the speed-up achieved as the number of computing resources, in terms of cores, grows. We selected the basic setup of the cluster using 2 cores, and then setups with 16 and 32 cores. Table 4 shows the runtime results, in hours, minutes, and seconds, spent by EAD-Global-Spark (the second phase, which comprises a multi-pass algorithm). It is important to note that, as mentioned before, although many of the times shown are sizeable, they are due to the experimental platform employed, based on virtual servers running inside one big host computer. Nevertheless, the times are not important in absolute terms, but in relative terms (speed-up) with respect to one another. We did not include in Table 4 the measurements of the YPRE dataset for the 2 core setup because its complexity makes it nearly impossible to compute with this specific cluster setup, but the results obtained with 16 and 32 cores show a scalability similar to that obtained with the other datasets.

Table 4 Average runtime elapsed in hours:minutes:seconds and speed-ups for the WM-EAD-Global using 16 and 32 cores

Finally, in order to easily compare the different speed-ups of Table 4, Fig. 4 graphically shows the speed-up of the 16 and 32 core setups for each dataset of the experimental study. The speed-up is defined as the ratio between the time spent by the simple 2 core setup and the time spent by the 16 and 32 core setups, respectively. The speed-up for the YPRE dataset is not computed due to the aforementioned lack of time measurements for the 2 core setup.
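Written explicitly, and denoting by Tc the runtime measured with c cores (notation introduced here only for clarity), the speed-up plotted in Fig. 4 is:

$$ \mathrm{speedup}_c = \frac{T_2}{T_c}, \quad c \in \{16, 32\} $$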

Fig. 4 Speed-up obtained with each dataset (except YPRE)

The time reduction when increasing the number of cores is remarkable, as expected. However, this reduction is not completely proportional to the number of cores, due to the overhead introduced by the Spark framework.

Conclusions

The purpose of this paper is to propose a new, completely linguistic FRBS for regression with adaptive defuzzification in large-scale environments, built using the MapReduce distributed paradigm and a global learning model. Although the paper focuses on the evolutionary adaptive defuzzification proposal, we also include a distributed, scalable version of the well-known Wang and Mendel approach [41] to learn the fuzzy RB from examples. Both models are implemented in Apache Spark. The most remarkable aspect of this paper is that both algorithms produce the same results as their sequential ancestors, which was not achieved in our preliminary approach [30], where the learning model was local. The interest of evolutionary adaptive defuzzification approaches lies in their compatibility with most other methodologies for improving the accuracy of linguistic FRBSs for regression and control.

The proposal presented in this work is interesting not only for large-scale problems (i.e., datasets with a huge volume of examples) but also for smaller datasets, which can be processed in more reasonable execution times thanks to distributed processing, despite the heavy computational cost of the evolutionary technique.