
1 Introduction

Inspired by the idea that multiple opinions are crucial before making a final decision, ensemble methods make predictions by consulting multiple different predictors [31]. Apart from their similarity with some natural decision-making methodologies, ensemble methods have a strong statistical background. Namely, ensemble methods aim to reduce the variance – thus increasing the accuracy – by combining multiple different predictors. Due to their versatility and effectiveness, ensemble methods have been successfully applied to a wide range of problems including classification, regression, and feature selection. As a preliminary study, this paper only addresses ensemble methods for binary classification problems.

Although there is no rigorous definition of an ensemble classifier [23], it can be conceived as a group of individual classifiers, also called weak or base classifiers. As to the construction of an ensemble classifier, we must take into account the diversity of the base classifiers and the rule used to combine them [23, 30]. There are a plethora of ensemble methods in the literature, including bagging, pasting, random subspace, boosting, and stacking [2, 10, 14, 38]. For example, a bagging ensemble classifier is obtained by training copies of a single base classifier using different subsets of the training set [2]. Similarly, a random subspace classifier is obtained by training copies of a classifier using different subsets of features [14]. In both bagging and random subspace ensembles, the base classifiers are then combined using a voting scheme. Random forest is a successful example of an ensemble of decision tree classifiers trained using both the bagging and random subspace ideas [3].

In contrast to the traditional majority voting, in this paper, we propose to combine the base classifiers using an associative memory. Associative memories (AMs) refer to a broad class of mathematical models inspired by the human brain’s ability to store and recall information by association [1, 13, 21]. The Hopfield neural network is a typical example of a recurrent neural network able to implement an associative memory [15]. Despite its many successful applications [16, 32,33,34], the Hopfield neural network suffers from an extremely low storage capacity as an associative memory model [24]. To overcome the low storage capacity of the Hopfield network, many prominent researchers proposed alternative learning schemes [18, 27] as well as improved network architectures. In particular, the recurrent correlation associative memories (RCAMs), proposed by Chiueh and Goodman [5], can be viewed as a kernelized version of the Hopfield neural network [8, 9, 29]. In this paper, we apply the RCAMs to combine binary classifiers in an ensemble method.

At this point, we would like to remark that associative memories have been previously used by Kultur et al. to improve the performance of an ensemble method [22]. Besides addressing a regression problem, Kultur et al. use an associative memory in parallel with an ensemble of multi-layer perceptrons. The resulting model is called ensemble of neural networks with associative memory (ENNA). Our approach, in contrast, uses an associative memory to combine the base classifiers. Moreover, Kultur et al. associate patterns using the k-nearest neighbor algorithm, which is formally a non-parametric method for classification or regression. We, on the other hand, use recurrent correlation associative memories, which are models conceived specifically to implement associative memories.

The paper is organized as follows: The next section reviews the recurrent correlation associative memories. Ensemble methods are presented in Sect. 3. The main contribution of the manuscript, namely the ensemble classifiers based on associative memories, is addressed in Sect. 3.2. Section 4 provides some computational experiments. The paper finishes with some concluding remarks in Sect. 5.

2 A Brief Review on Recurrent Correlation Associative Memories

Recurrent correlation associative memories (RCAMs) have been introduced by Chiueh and Goodman as an improved version of the famous correlation-based Hopfield neural network [5, 15].

Briefly, an RCAM is obtained by decomposing the Hopfield network with Hebbian learning into a two-layer recurrent neural network. The first layer computes the inner product (correlation) between the input and the memorized items, followed by the evaluation of a continuous non-decreasing activation function. The subsequent layer yields a weighted sum of the stored items.

In mathematical terms, an RCAM is defined as follows: Let \(\mathbb {B} = \{-1,+1\}\) and \(f:[-1,+1]\rightarrow \mathbb {R}\) be a continuous non-decreasing real-valued function. Given a fundamental memory set \(\mathcal {U}=\{\mathbf{u}^1,\ldots ,\mathbf{u}^P \} \subset \mathbb {B}^N\), the neurons in the first layer of a bipolar RCAM yield

$$\begin{aligned} w_\xi (t)= f\left( \frac{1}{N} \sum _{i=1}^N z_i(t) u_i^\xi \right) , \quad \forall \xi = 1,\ldots ,P, \end{aligned}$$
(1)

where \(\textit{\textbf{z}}(t)=[z_1(t),z_2(t),\ldots ,z_N(t)]^T \in \mathbb {B}^N\) denotes the current state of the network and \(\mathbf{u}^\xi = [u_1^\xi ,\ldots ,u_N^\xi ]^T\) is the \(\xi \)th fundamental memory. The activation potential \(a_i(t)\) of the \(i\)th output neuron is given by the following weighted sum of the memory items:

$$\begin{aligned} a_i(t)=\sum _{\xi =1}^P w_\xi (t) u_i^\xi , \quad \forall i=1,\ldots ,N.\end{aligned}$$
(2)

Finally, the state of the ith neuron of the RCAM is updated as follows for all \(i=1,\ldots ,N\):

$$\begin{aligned} z_i(t+1) = {\left\{ \begin{array}{ll} {\text {sgn}}\big (a_i(t)\big ), &{} a_i(t) \ne 0, \\ z_i(t), &{} \hbox {otherwise}. \end{array}\right. } \end{aligned}$$
(3)

In view of (2), we refer to \(w_\xi (t)\) as the weight associated with the \(\xi \)th memory item.

In contrast to the Hopfield neural network, the sequence \(\{\textit{\textbf{z}}(t)\}_{t \ge 0}\) produced by an RCAM is convergent in both synchronous and asynchronous update modes independently of the number of fundamental memories and the initial state vector \(\textit{\textbf{z}}(0)\) [5]. In other words, the limit \(\textit{\textbf{y}}= \lim _{t \rightarrow \infty } \textit{\textbf{z}}(t+1)\) of the sequence given by (3) is well defined using either synchronous or asynchronous update.

As an associative memory model, an RCAM designed for the storage and recall of the vectors \(\mathbf{u}^1,\ldots ,\mathbf{u}^P\) proceeds as follows: Given a stimulus (initial state) \(\textit{\textbf{z}}(0)\), the vector recalled by the RCAM is \(\textit{\textbf{y}}= \lim _{t \rightarrow \infty } \textit{\textbf{z}}(t+1)\).

Finally, different choices of the function f yield different RCAM models. For example:

1. The correlation RCAM or identity RCAM is obtained by considering in (1) the identity function \(f_i(x)=x\).

2. The exponential RCAM, which is determined by

$$\begin{aligned} f_e(x;\alpha )= e^{\alpha x}, \quad \alpha >0. \end{aligned}$$
(4)

The identity RCAM corresponds to the traditional Hopfield network with Hebbian learning and self-feedback. In contrast to the Hopfield network and the identity RCAM, the storage capacity of the exponential RCAM scales exponentially with the dimension of the memory space. Apart from the high storage capacity, the exponential RCAM can be easily implemented on very large scale integration (VLSI) devices [5]. Furthermore, the exponential RCAM allows for a Bayesian interpretation [11] and it is closely related to support vector machines and the kernel trick [8, 9, 29]. In this paper, we focus on the exponential RCAM, also known as the exponential correlation associative memory (ECAM).
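To make the recall phase concrete, the following minimal NumPy sketch implements the synchronous dynamics (1)–(3); the function name, the stopping rule, and the default choice \(\alpha = 1\) are illustrative assumptions rather than part of the original model specification. Passing f = lambda x: x recovers the identity RCAM.

```python
import numpy as np

def rcam_recall(U, z0, f=lambda x: np.exp(1.0 * x), max_iter=100):
    """Synchronous recall of a bipolar RCAM following Eqs. (1)-(3).

    U  : (N, P) array whose columns are the fundamental memories u^1, ..., u^P.
    z0 : (N,) initial state z(0); zeros are tolerated as neutral entries.
    f  : continuous non-decreasing activation; the default is the exponential
         RCAM with alpha = 1, while f = lambda x: x gives the identity RCAM.
    """
    N = U.shape[0]
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(max_iter):
        w = f(U.T @ z / N)                       # Eq. (1): weights w_xi(t)
        a = U @ w                                # Eq. (2): activation potentials
        z_new = np.where(a != 0, np.sign(a), z)  # Eq. (3): state update
        if np.array_equal(z_new, z):             # stationary state reached
            break
        z = z_new
    return z
```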

3 Ensemble of Binary Classifiers

An ensemble classifier combines a group of single classifiers, also called weak or base classifiers, in order to provide better classification accuracy than any single one of them [23, 31, 38]. Although this approach is partially inspired by the idea that multiple opinions are crucial before making a final decision, ensemble classifiers have a strong statistical background. Namely, ensemble classifiers reduce the variance by combining the base classifiers. Furthermore, when the amount of training data available is too small compared to the size of the hypothesis space, the ensemble classifier “mixes” the base classifiers, reducing the risk of choosing the wrong single classifier [19].

Formally, let \( \mathcal {T} = \{(\textit{\textbf{t}}_1,d_1),\dots , (\textit{\textbf{t}}_M,d_M)\}\) be a training set where \(\textit{\textbf{t}}_i \in \mathcal {X}\) and \(d_i \in \mathcal {C}\) are respectively the feature sample and the class label of the ith training pair. Here, \(\mathcal {X}\) denotes the feature space and \(\mathcal {C}\) represents the set of all class labels. In a binary classification problem, we can identify \(\mathcal {C}\) with \(\mathbb {B} = \{-1,+1\}\). Moreover, let \(h_1,h_2,\ldots ,h_P:\mathcal {X} \rightarrow \mathcal {C}\) be base classifiers trained using the whole or part of the training set \(\mathcal {T}\).

Usually, the base classifiers are chosen according to their accuracy and diversity. On the one hand, an accurate classifier is one whose error rate on new instances is lower than that of random guessing. On the other hand, two classifiers are diverse if they make different errors on new instances [12, 19].

Bagging and random subspace ensembles are examples of techniques that can be used to ensure the diversity of the base classifiers. The idea of bagging, an acronym for Bootstrap AGGregatING, is to train copies of a certain classifier h on subsets of the training set \(\mathcal {T}\) [2]. The subsets are obtained by sampling the training set \(\mathcal {T}\) with replacement, a methodology known as bootstrap sampling [23]. In a similar fashion, random subspace ensembles are obtained by training copies of a certain classifier h using different subsets of the feature space [14]. Random forest, which is defined as an ensemble of decision tree classifiers, is an example of an ensemble classifier that combines both bagging and random subspace techniques [3].
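As an illustration of these constructions, the following scikit-learn snippet builds bagging, random subspace, and random forest ensembles of decision trees; the synthetic data set and all hyperparameter values are arbitrary choices made only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data used only for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Bagging: copies of a decision tree trained on bootstrap samples of T.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30,
                            random_state=42).fit(X, y)

# Random subspace: copies of a decision tree trained on random subsets of
# the features (no bootstrap sampling of the training samples).
subspace = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30,
                             bootstrap=False, max_features=0.5,
                             random_state=42).fit(X, y)

# Random forest: decision trees combining both ideas.
forest = RandomForestClassifier(n_estimators=30, random_state=42).fit(X, y)
```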

Another important issue that must be addressed in the design of an ensemble classifier is how to combine the base classifiers. In the following, we review the majority voting methodology – one of the oldest and most widely used combination schemes. The methodology based on associative memories is introduced and discussed subsequently.

3.1 Majority Voting Classifier

As remarked by Kuncheva [23], majority voting is one of the oldest strategies for decision making. In a wide sense, a majority voting classifier yields the class label with the highest number of occurrences among the base classifiers [10, 35].

Formally, let \(h_1,h_2,\ldots ,h_P:\mathcal {X} \rightarrow \mathcal {C}\) be the base classifiers. The majority voting classifier, also called hard voting classifier and denoted by \(H_v:\mathcal {X} \rightarrow \mathcal {C}\), is defined by means of the equation

$$\begin{aligned} H_v(\textit{\textbf{x}}) = \underset{c \in \mathcal {C}}{{\text {argmax}}}\, \sum _{\xi =1}^P w_\xi \mathcal {I}[h_\xi (\textit{\textbf{x}})=c], \quad \forall \textit{\textbf{x}}\in \mathcal {X}, \end{aligned}$$
(5)

where \(w_1,\ldots ,w_P\) are the weights of the base classifiers and \(\mathcal {I}\) is the indicator function, that is,

$$\begin{aligned} \mathcal {I}[h_\xi (\textit{\textbf{x}})=c] = {\left\{ \begin{array}{ll} 1, &{} h_\xi (\textit{\textbf{x}})=c,\\ 0, &{} \hbox {otherwise}. \end{array}\right. } \end{aligned}$$
(6)

When \(\mathcal {C}=\{-1,+1\}\), the majority voting ensemble classifier given by (5) can be written alternatively as

$$\begin{aligned} H_v(\textit{\textbf{x}}) = {\text {sgn}}\left( \sum _{\xi =1}^P w_\xi h_\xi (\textit{\textbf{x}})\right) , \quad \forall \textit{\textbf{x}}\in \mathcal {X}, \end{aligned}$$
(7)

whenever \(\sum _{\xi =1}^P w_\xi h_\xi (\textit{\textbf{x}}) \ne 0\) [7].
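A minimal NumPy sketch of the weighted hard voting rule (7) is given below; the helper name and the uniform-weight default are illustrative assumptions.

```python
import numpy as np

def hard_voting(base_predictions, weights=None):
    """Weighted majority voting of Eq. (7) for bipolar labels in {-1, +1}.

    base_predictions : (P, L) array; row xi holds h_xi evaluated on L samples.
    weights          : (P,) array of voting weights (uniform voting if None).
    """
    base_predictions = np.asarray(base_predictions, dtype=float)
    P = base_predictions.shape[0]
    w = np.ones(P) if weights is None else np.asarray(weights, dtype=float)
    # As in Eq. (7), the output is well defined whenever the weighted sum of
    # the votes is non-zero for every sample.
    return np.sign(w @ base_predictions)
```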

3.2 Ensemble Based on Bipolar Associative Memories

Let us now introduce the ensemble classifiers based on the RCAM models. In analogy to the majority voting ensemble classifier, the RCAM-based ensemble classifier is formulated using only the base classifiers \(h_1,\ldots ,h_P:\mathcal {X} \rightarrow \mathbb {B}\). Precisely, consider a training set \(\mathcal {T} = \{(\textit{\textbf{t}}_i,d_i):i=1,\ldots ,M\} \subset \mathcal {X}\times \mathbb {B}\) and let \(X = \{\textit{\textbf{x}}_{1},\ldots ,\textit{\textbf{x}}_{L} \} \subset \mathcal {X}\) be a batch of input samples. We first define the fundamental memories as follows for all \(\xi =1,\ldots ,P\):

$$\begin{aligned} \mathbf{u}^\xi = [h_\xi (\textit{\textbf{t}}_1),\ldots ,{h_\xi (\textit{\textbf{t}}_M)},h_\xi (\textit{\textbf{x}}_1),\ldots ,h_\xi (\textit{\textbf{x}}_L)]^T \in \mathbb {B}^{M+L}. \end{aligned}$$
(8)

In words, the \(\xi \)th fundamental memory is obtained by concatenating the outputs of the \(\xi \)th base classifier evaluated at the M training samples and the L input samples. The bipolar RCAM is synthesized using the fundamental memory set \(\mathcal {U}=\{\mathbf{u}^1,\ldots ,\mathbf{u}^P\}\) and it is initialized at the state vector

$$\begin{aligned} \textit{\textbf{z}}(0) = [d_1,d_2,\ldots ,d_M, \underbrace{0,0,\ldots ,0}_{L-\hbox {times}}]^T. \end{aligned}$$
(9)

Note that the first M components of initial state \(\textit{\textbf{z}}(0)\) correspond to the targets in the training set \(\mathcal {T}\). The last L components of \(\textit{\textbf{z}}(0)\) are zero, a neutral element different from the class labels. The initial state \(\textit{\textbf{z}}(0)\) is presented as input to the associative memory and the last L components of the recalled vector \(\textit{\textbf{y}}\) yield the class label of the batch of input samples \(X = \{\textit{\textbf{x}}_1,\ldots ,\textit{\textbf{x}}_L\}\). In mathematical terms, the RCAM-based ensemble classifier \(H_a:\mathcal {X} \rightarrow \mathbb {B}\) is defined by means of the equation

$$\begin{aligned} H_a(\textit{\textbf{x}}_i) = y_{M+i}, \quad \forall \textit{\textbf{x}}_i \in X, \end{aligned}$$
(10)

where \(\textit{\textbf{y}}= [y_1,\ldots ,y_M,y_{M+1},\ldots ,y_{M+L}]^T\) is the limit of the sequence \(\{\textit{\textbf{z}}(t)\}_{t\ge 0}\) given by (3).
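The following NumPy sketch wires Eqs. (8)–(10) around the recall dynamics (1)–(3); the function name, the convergence test, and the assumption that the base classifiers return bipolar labels directly are ours and not part of the formal definition above.

```python
import numpy as np

def rcam_ensemble_predict(base_classifiers, X_train, d_train, X_batch,
                          f=lambda x: np.exp(1.0 * x), max_iter=100):
    """Sketch of the RCAM-based ensemble classifier H_a of Eqs. (8)-(10).

    base_classifiers : list of P fitted classifiers whose predict method is
                       assumed to return bipolar labels in {-1, +1}.
    X_train, d_train : training features and bipolar targets d_1, ..., d_M.
    X_batch          : batch of L input samples to be classified.
    """
    d_train = np.asarray(d_train, dtype=float)
    M, L = len(d_train), len(X_batch)
    N = M + L
    # Fundamental memories, Eq. (8): one column per base classifier.
    U = np.column_stack(
        [np.concatenate([h.predict(X_train), h.predict(X_batch)])
         for h in base_classifiers]).astype(float)
    # Initial state, Eq. (9): the training targets followed by L zeros.
    z = np.concatenate([d_train, np.zeros(L)])
    for _ in range(max_iter):
        w = f(U.T @ z / N)                       # Eq. (1)
        a = U @ w                                # Eq. (2)
        z_new = np.where(a != 0, np.sign(a), z)  # Eq. (3)
        if np.array_equal(z_new, z):             # stationary state reached
            break
        z = z_new
    return z[M:]                                 # Eq. (10): labels of X_batch
```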

In the following, we point out the relationship between the bipolar RCAM-based ensemble classifier and the majority voting ensemble described by (7). Let \(\textit{\textbf{y}}\) be the vector recalled by the RCAM fed by the input \(\textit{\textbf{z}}(0)\) given by (9), that is, \(\textit{\textbf{y}}\) is a stationary state of the RCAM. From (2), (3), and (8), the output of the RCAM-based ensemble classifier satisfies

$$\begin{aligned} H_a(\textit{\textbf{x}}_i) = {\text {sgn}}\left( \sum _{\xi =1}^P w_\xi h_\xi (\textit{\textbf{x}}_i) \right) , \end{aligned}$$
(11)

where

$$\begin{aligned} w_\xi = f\left( \frac{1}{M+L}\sum _{i=1}^{M+L} y_i u_i^\xi \right) , \quad \forall \xi =1,\ldots ,P. \end{aligned}$$
(12)

From (11), the bipolar RCAM-based ensemble classifier can be viewed as a weighted majority voting classifier. Furthermore, the weight \(w_\xi \) depends on the similarity between the \(\xi \)th base classifier \(h_\xi \) and the ensemble classifier \(H_a\). Precisely, let us define the similarity between two binary classifiers \(H,h:\mathcal {X} \rightarrow \mathbb {B}\) on a set of samples S by means of the equation

$$\begin{aligned} \mathtt {Sim}(H,h) = \frac{1}{\hbox {Card}(S)} \sum _{\textit{\textbf{s}} \in S} \mathcal {I}\big [ h(\textit{\textbf{s}}) = H(\textit{\textbf{s}})\big ]. \end{aligned}$$
(13)

Using (13), we can state the following theorem:

Theorem 1

The weights of the RCAM-based ensemble classifier given by (11) satisfy the following identity for all \(\xi =1,\ldots ,P\):

$$\begin{aligned} w_\xi = f \big (2\cdot \mathtt {Sim}(H_a, h_\xi ) - 1 \big ), \end{aligned}$$
(14)

where the similarity in (14) is evaluated on the union of all training and input samples, that is, on \(S = \{\textit{\textbf{t}}_1,\ldots ,\textit{\textbf{t}}_M\}\cup \{\textit{\textbf{x}}_1,\ldots ,\textit{\textbf{x}}_L\}\).

Proof

Since we are considering a binary classification problem, the similarity between the ensemble \(H_a\) and the base classifier \(h_\xi \) on the set \(S\), with \(N = \text {Card}(S) = M+L\), satisfies the following identities, where, with a slight abuse of notation, \(H_a(\textit{\textbf{s}}_i)\) denotes the \(i\)th component \(y_i\) of the recalled vector while \(h_\xi (\textit{\textbf{s}}_i) = u_i^\xi \) holds by (8):

$$\begin{aligned} \mathtt {Sim}(H_a, h_\xi )&= 1 - \frac{1}{N}\sum _{i=1}^N \mathcal {I}[h_\xi (\textit{\textbf{s}}_i) \ne H_a(\textit{\textbf{s}}_i)] = 1 - \frac{1}{4N}\sum _{i=1}^N \big (h_\xi (\textit{\textbf{s}}_i)- H_a(\textit{\textbf{s}}_i)\big )^2 \\&= 1 - \frac{1}{2N}\sum _{i=1}^N \big (1-H_a(\textit{\textbf{s}}_i) h_\xi (\textit{\textbf{s}}_i)\big ) = \frac{1}{2}\left( 1 + \frac{1}{N} \sum _{i=1}^N H_a(\textit{\textbf{s}}_i) h_\xi (\textit{\textbf{s}}_i) \right) \end{aligned}$$

Equivalently, we have

$$\begin{aligned} \frac{1}{\hbox {Card}(S)} \sum _{\textit{\textbf{s}} \in S} H_a(\textit{\textbf{s}})h_\xi (\textit{\textbf{s}}) = 2 \cdot \mathtt {Sim}(H_a,h_\xi ) - 1. \end{aligned}$$
(15)

Now, from (8), (10), (12), and (15), we obtain the following identities:

$$\begin{aligned} w_\xi&= f\left( \frac{1}{N}\sum _{i=1}^N y_i u_i^\xi \right) = f\big (2\cdot \mathtt {Sim}(H_a,h_\xi )-1\big ), \end{aligned}$$

which concludes the proof.

Theorem 1 shows that the RCAM-based ensemble classifier is a weighted majority voting classifier whose weights depend on the similarity between the base classifiers and the ensemble itself. In fact, in view of the dynamic nature of the RCAM model, \(H_a\) is obtained by a recurrent consult and vote scheme. Moreover, at the first step, the weights depend on the accuracy of the base classifiers on the training set because the first M components of the initial state \(\textit{\textbf{z}}(0)\) are the training targets.
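The identity behind Theorem 1 can be checked numerically: for bipolar vectors, the normalized inner product \((1/N)\sum _{i} y_i u_i^\xi \) equals \(2\cdot \mathtt {Sim}-1\). The short sketch below performs this check with the exponential activation; the random vectors and \(\alpha =1\) are arbitrary choices for illustration.

```python
import numpy as np

def similarity(y_ensemble, y_base):
    """Empirical similarity of Eq. (13) between two bipolar label vectors
    evaluated on the same set of samples."""
    return np.mean(np.asarray(y_ensemble) == np.asarray(y_base))

# Sanity check of Theorem 1 with f(x) = exp(x): the weight of Eq. (12)
# coincides with f(2 * Sim - 1) from Eq. (14).
rng = np.random.default_rng(seed=0)
y = rng.choice([-1, 1], size=50)   # stands for the recalled vector y
u = rng.choice([-1, 1], size=50)   # stands for a fundamental memory u^xi
w_direct = np.exp(y @ u / 50)                  # Eq. (12) with f = exp
w_theorem = np.exp(2 * similarity(y, u) - 1)   # Eq. (14)
assert np.isclose(w_direct, w_theorem)
```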

4 Computational Experiments

In this section, we perform some computational experiments to evaluate the performance of the proposed RCAM-based ensemble classifiers for binary classification tasks. Precisely, we considered the RCAM-based ensembles obtained using either the identity or the exponential function as the activation function f. The parameter \(\alpha \) of the exponential activation function was either set to \(\alpha =1\) or determined using a grid search on the set \(\{10^{-2},10^{-1},0.5,1,5,10,20,50\}\) with 5-fold cross-validation on the training set. The RCAM-based ensemble classifiers have been compared with the AdaBoost, gradient boosting, and random forest ensemble classifiers, all available in Python's scikit-learn library (sklearn) [28].

First of all, we trained the AdaBoost and gradient boosting ensemble classifiers using the default parameters of sklearn. Recall that boosting ensemble classifiers are developed incrementally by adding base classifiers to reduce the number of misclassified samples [23]. Also, we trained the random forest classifier with 30 base classifiers (\(P=30\)) [3]. Recall that the base classifiers of the random forest are decision trees obtained using bagging and random subspace techniques [2, 14]. Then, we used the base classifiers from the trained random forest ensemble to define the RCAM-based ensembles. In other words, the same base classifiers \(h_1,\ldots ,h_{30}\) are used in the random forest and the RCAM-based classifiers; the difference between the ensemble classifiers resides in the combining rule. Recall that the random forest combines the base classifiers using majority voting. From the computational point of view, training the random forest and the RCAM-based ensemble classifiers required similar resources. Moreover, despite the consult and vote scheme of the RCAM-based ensembles, they were not significantly more expensive than the random forest classifier. The grid search used to fine-tune the parameter \(\alpha \) of the exponential RCAM-based ensemble was the main computational burden in this experiment.
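Since the same base classifiers are shared by the random forest and the RCAM-based ensembles, the sketch below shows one way to produce such a pool of decision trees. It is a simplified stand-in (manual bootstrap sampling plus random feature subsets at each split), fitted directly on bipolar targets so that the trees' predictions can be fed to the rcam_ensemble_predict sketch of Sect. 3.2 without label conversion; it is not the exact procedure used by scikit-learn's random forest.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_base_trees(X_train, d_train, P=30, seed=0):
    """Simplified stand-in for the P = 30 base classifiers: decision trees fit
    on bootstrap samples with random feature subsets considered at each split.
    The bipolar targets d_train in {-1, +1} are used directly."""
    rng = np.random.default_rng(seed)
    X_train, d_train = np.asarray(X_train), np.asarray(d_train)
    trees = []
    for p in range(P):
        idx = rng.integers(0, len(d_train), size=len(d_train))  # bootstrap
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=p)
        trees.append(tree.fit(X_train[idx], d_train[idx]))
    return trees

# The very same trees can then be combined by majority voting, as in a random
# forest, or by the RCAM-based rule, e.g., rcam_ensemble_predict(trees, ...).
```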

For the comparison of the ensemble classifiers, we considered 28 binary classification problems from the OpenML repository [36]. These binary classification problems can be obtained using the function fetch_openml from sklearn. We would like to point out that missing data has been handled before splitting the data set into training and test sets using the SimpleImputer class from sklearn. Also, we pre-processed the data using the StandardScaler transform. Therefore, each feature is normalized by subtracting the mean and dividing by the standard deviation, both computed using only the training set. Furthermore, since some data sets are unbalanced, we used the F-measure to evaluate quantitatively the performance of a certain classifier. Table 1 shows the mean and the standard deviation of the F-measure obtained from the ensemble classifiers using stratified 10-fold cross-validation. The largest F-measures for each data set are typeset in boldface. Note that the exponential RCAM-based ensemble classifier with grid search produced the largest F-measures in 11 of the 28 data sets. In particular, the exponential RCAM with grid search produced outstanding F-measures on the “Monks-2” and “EEG-Eye-State” data sets. For a better comparison of the ensemble classifiers, we followed Demšar’s recommendations to compare multiple classifier models using multiple data sets [6]. The Friedman test rejected the hypothesis that there is no difference between the ensemble classifiers. A visual interpretation of the outcome of this computational experiment is provided in Fig. 1 with the Hasse diagram of the non-parametric Wilcoxon signed-rank test at a 95% confidence level [4, 37]. In this diagram, an edge means that the classifier on the top statistically outperformed the classifier on the bottom. The outcome of this analysis confirms that the RCAM-based ensemble classifiers statistically outperformed the other ensemble methods: AdaBoost, gradient boosting, and random forest.
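The evaluation protocol can be reproduced along the following lines; the data set name, the f1_macro scoring variant, and the use of a plain random forest as a stand-in model are illustrative assumptions, and, for simplicity, the imputer and the scaler are fitted inside each fold rather than before splitting as described above.

```python
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One OpenML data set fetched as an example.
X, y = fetch_openml("banknote-authentication", version=1, as_frame=False,
                    return_X_y=True)

# Imputation, standardization, and a 30-tree random forest in a pipeline,
# evaluated with the F-measure and stratified 10-fold cross-validation.
model = make_pipeline(SimpleImputer(strategy="mean"),
                      StandardScaler(),
                      RandomForestClassifier(n_estimators=30, random_state=0))
scores = cross_val_score(model, X, y, scoring="f1_macro",
                         cv=StratifiedKFold(n_splits=10))
print(scores.mean(), scores.std())
```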

Table 1. Mean and standard deviation of the F-measures produced by ensemble classifiers using stratified 10-fold cross-validation.

As to the computational effort, Fig. 2 shows the average time required by the ensemble classifiers for the prediction of a batch of testing samples. Note that the most expensive method is the identity RCAM-based ensemble classifier, while gradient boosting is the cheapest. The exponential RCAM-based ensemble is less expensive than AdaBoost and quite comparable to the random forest classifier.

Finally, note from Table 1 that some problems, such as the “Banknote” and the “MOFN-3-7-10” data sets, are quite easy, while others, such as the “Haberman” and “Hill Valley” data sets, are very hard. In order to account for the different difficulties imposed by each data set, Fig. 3 shows a box-plot with the normalized F-measure values provided in Table 1. Precisely, for each data set (i.e., each row in Table 1), we subtracted the mean and divided by the standard deviation of the score values. The box-plot in Fig. 3 confirms the good performance of the RCAM-based ensemble classifiers, including the exponential RCAM-based ensemble classifier with grid search. In conclusion, the box-plots shown in Figs. 2 and 3 support the potential application of the RCAM models as ensembles of classifiers for binary classification problems.
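The row-wise normalization used for Fig. 3 amounts to the following few lines; the random matrix is only a placeholder for the table of F-measures, and its shape is illustrative.

```python
import numpy as np

# Placeholder for the table of F-measures: one row per data set, one column
# per ensemble classifier (shape chosen only for illustration).
scores = np.random.default_rng(0).random((28, 6))

# Standardize each row by its own mean and standard deviation, as in Fig. 3.
normalized = (scores - scores.mean(axis=1, keepdims=True)) \
             / scores.std(axis=1, keepdims=True)
```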

Fig. 1. Hasse diagram of the Wilcoxon signed-rank test at a 95% confidence level.

Fig. 2. Box-plot of the average time for the prediction of a batch of input samples.

Fig. 3. Box-plot of the normalized F-measures produced by the ensemble classifiers.

5 Concluding Remarks

This paper provides a bridge between ensemble methods and associative memories. In general terms, an ensemble method reduces the variance and improves the accuracy and robustness by combining a group of base predictors [23, 38]. The rule used to combine the base predictors is one important issue in the design of an ensemble method. In this paper, we propose to combine the base predictors using an associative memory. An associative memory is a model designed for the storage and recall of a set of vectors [13]. Furthermore, an associative memory should be able to retrieve a stored item from a corrupted or partial version of it. In our ensemble method, the memory model is designed for the storage of the evaluations of the base classifiers. The associative memory is then fed with a vector containing the targets of the training data together with neutral entries in place of the unknown predictions. The output of the ensemble method is obtained from the vector retrieved by the memory.

Specifically, in this paper, we presented ensemble methods based on the recurrent correlation associative memories (RCAMs) for binary classification problems. RCAMs, proposed by Chiueh and Goodman [5], are high storage capacity associative memories which, besides allowing a Bayesian and a kernel trick interpretation, are particularly suited for VLSI implementation [8, 9, 11, 29]. Theorem 1 shows that the RCAM model yields a majority voting classifier whose weights are obtained by a recurrent consult and vote scheme. Moreover, the weights depend on the similarity between the base classifiers and the resulting ensemble. Computational experiments using decision trees as the base classifiers revealed an outstanding performance of the exponential RCAM-based ensemble classifier combined with a grid search strategy to fine-tune its parameter. The exponential RCAM-based ensemble, in particular, outperformed the traditional AdaBoost, gradient boosting, and random forest classifiers.

In the future, we plan to investigate further associative memory-based ensemble methods. In particular, we plan to extend these ensemble methods to multi-class classification problems using, for instance, multistate associative memory models [17, 20, 25, 26].