Keywords

1 Introduction

One of the most common data mining tasks consists in assigning a set of inputs to a given class or classes, for which it is required a statistical model representing a mapping from input data (normally described by several attributes) to the appropriate category. This model approximates the true mapping from inputs to outputs. A decision of to which category a new, unseen input belongs, can be reached [11].

Classification focuses on methods that establish a dependency within data, concentrating on the so-called target attribute [1]. Classification algorithms are intended for modeling the dependency between input attributes and the target attribute. Each object x is described by its attributes, which in turn define the value of the target attribute Y. Thus, the dataset D from which the model is constructed consists of a finite set of tuples such as \(D = \{(x_i, Y_i),\,i = 1 \cdots n\}\). If the target attribute Y possesses nominal values, we are dealing with a classification problem. If the target attribute is described by continuous numeric values, we face a regression problem. Along this text we employ the term classification referring to both tasks, and classifier to the algorithm that computes classification.

However, the usage of a single classifier might not be the best decision to complete a classification task because its performance is affected by several factors i.e. the initial parameters of the algorithm, the distribution of the training dataset, risk of overtraining, among others. This provokes that the arrival of new objects that do not match the statistical model decreases the performance of the classifier. These problems are aggravated when classifiers learn from real datasets, where the distribution of input objects might be unbalanced.

The performance of classifiers is quantified by using the confusion matrix and derived metrics such as the F-Measure and the area under the ROC curve. The F-Measure is used in multi-class problems, while the area under the ROC curve serves only for binary problems. The Weka platform implements the method proposed by Mann Whitney [15] to compute the area of the ROC curve.

It is thought that Ensembles of Classifiers (EoC) have better performance than single classifiers because they benefit from diversity. In an EoC the final conclusion is obtained by aggregating individual decisions. Information fusion is largely employed to that end. As reported in [11] the concept of EoC’s has been studied at large: Stacking [4], bagging [2], boosting [5], model averaging and forecast combining are classic methods to form ensembles. A survey of Ensembles of Classifiers can be found in [17].

The main challenges regarding EoC’s consist in finding a procedure to employ base classifiers and rules to combine their individual solutions. One of the essential requirements to form ensembles is that two base classifiers do not make the same mistakes on new input data. That is to say, the errors made by the individual predictors must be uncorrelated. For example, if the final solution is obtained by simple majority voting, and if it is assumed that the mistakes made by the classifiers are independent, then the ensemble will misclassify a new input object only if more than half the base classifiers make mistakes. This situation in highly unlikely in heterogeneous EoC.

Thus, the design of EoC’s should include a set of base classifiers so the ensemble yields the highest possible performance [13]. To increase the efficiency of the EoC it has been suggested that each base classifier learn from a subset of the original dataset, without duplicating training subsets [7].

We explore the multi-agent paradigm to form ensembles of classifiers. We call this approach a Multi-Agent Ensemble of Classifiers (MAEoC). A software agent is a computer system that is situated in some environment and possesses a strong notion of agency (self-directed behavior). That is to say, an agent is capable of performing autonomous actions within an environment [14, 16]. A Multi-Agent System (MAS) consists of agents that respond to changes in the environment and interact with other agents by using a communication language, such as the Agent communication Language (ACL).

MAS are suitable for building dynamic ensembles of classifiers: It is feasible to develop an environment with a number of classifier agents where each of them performs its duties (a classification task), communicate to other agents its results (what instances where correctly and incorrectly classified), and learn from what other classifier agents do well. Classifier agents complement each other. To the best of the authors’ knowledge, MAEoC constitute a novel approach to designing Ensembles of Classifiers.

In Sect. 2 we describe the main notions to form EoC with MAS. We test our MASEoC on a demanding classification task: To determine to which emotional tag some well-defined input parameters used to create fractal music belong. The dataset for this task is described in Sect. 3. The experimental results and comparisons with other types of ensembles is given is Sect. 4. We apply our MAEoC to assist in the creation of musical fragments. Conclusions and a roadmap leading to improvements are delineated in Sect. 5.

2 The Multi-agent Ensemble of Classifiers

The algorithm we propose to form an Ensemble of Classifiers relies on two premises: (i) the performance of base classifiers and (ii) the communication of hits (H) and failures (F) obtained by base classifiers. The steps of the algorithm are outlined next:

  1. 1.

    At time t = 0

    • Coordinator agent recruits m classifiers, \(m > 2\), and launches m classifier agents.

    • Coordinator agent broadcasts dataset D to classifier agent\(_i\), \(\forall i, i \cdots m\).

    • Classifier agent\(_i\) performs a ten fold cross-validation. F-Measure\(_i\) is calculated.

    • Classifier agent\(_i\), \(\forall i, i \cdots m\), informs Coordinator Agent two subsets. Subset \(H_i\) contains objects correctly classified; subset \(F_i\) contains objects incorrectly classified.

  2. 2.

    At time t = 1

    • Coordinator agent forms two aggregated sets: AH and AF. \(AH = \cup H_i\); \(AF = \cup F_i\).

    • Coordinator agent launches classifier\(_{m+1}\), based on the highest F-Measure\(_i\) obtained at \(t=0\).

    • Classifier\(_{m+1}\) is trained with set AF. Model \(M_{m+1}\) is added to the ensemble. F-Measure C\(_{m+1}\) is obtained by ten fold cross validation on AF.

    • Classifiers\(_{1 \cdots m}\) are trained with set AH. Models \(M_{1 \cdots m}\) are added to the ensemble. F-Measures\(_{1 \cdots m}\) are obtained by ten fold cross-validation on AH.

  3. 3.

    At time t = 2

    • Classifiers\(_{1 \cdots m+1}\) are given weights according to their updated F-Measure at \(t=1\).

    • Weighted voting is used to reach a final conclusion.

To validate our EoC, we perform a classification task whose objective is to assign an input object \(x_{input}\) to one of the sixteen emotional tags of Russell’s Circumplex Model of Affect [12]. Input object \(x_{input}\) possesses attributes to create fractal music as described in [9]. Target attribute Y is the emotional tag.

3 The Psychoacoustics Dataset

To test our ensemble we employ a psychoacoustic dataset, which contains the emotional responses to fractal music, as reported in [8]. It is a challenging dataset because:

  • It is a multi-class dataset. There are sixteen different emotions as values of the target variable.

  • It is an unbalanced dataset.

  • It is a high-dimension dataset. Each object \(x_i\) is defined by fifteen attributes, on which fractal musical fragments are created.

  • Each object \(x_i\) is defined by mixed attributes i.e. nominal and numeric values.

  • The target attribute \(Y_i\) is an emotional tag associated to each object \(x_i\).

Thus, \(D = \{(x_i, Y_i),\,i = 1 \cdots n\}\) contains diverse combinations of input parameters, and the corresponding emotional tag. Input parameters refer to the chaotic system i.e. Lorenz equations, and musical parameters.

The Lorenz equations [10] are defined by variables x, y, and z, which represent the initial values of the attractor. In this case such variables represent initial notes. Variables sigma, r and b help determine the actual trajectory. Table 1 displays the range of values that were used to compute the Lorenz attractor (as a generator of melodic sequences).

Table 1. Range of Lorenz Parameters
Table 2. Sample of the psychoacoustics dataset

To complete the creation of musical fragments, musical parameters are paired with the sequence yielded b the Lorenz equations. Musical parameters are: Tempos, Notes Durations, Musical Scales, Chords and Instruments.

Values for variable Tempo are in the interval (60, 220) beats per minute. Notes Durations are tied to Tempos, so they are expressed in values such as whole duration (1), half duration (1/2), a quarter (1/4), and so on. For the present study, Notes Durations lie between 1/16 and 1. Instruments are: (i) Grand Piano, Bright Acoustic and Harpsichord; (ii) Acoustic Guitar, Steel String Guitar; (iii) Electric Clean Guitar, Electric Jazz Guitar, Guitar Harmonics, Distorted Guitar, Overdriven Guitar, Electric Muted Guitar; (iv) Violin, Viola, Cello, Tremolo, Pizzicato, Orchestral Strings, String Ensembles; and Acoustic Bass, Electric Bass Finger, Electric Bass Pick, Fretless Bass, Slap Basses, Contrabass.

The Chord variable accepts the following values: Mayor chords; Minor chords; Augmented chords; Diminished Chords; Other chords, and No chords. Musical Scales are: Pentatonic Scales; Harmonic Scales; Natural Scales; Blues Scales; Melodic Scales; No scale.

Some \((x_i, Y_i) \in D\) are shown in Table 2. Altogether, dataset D contains 2312 objects at the time of performing the experiments. Figure 1 illustrates the distribution of objects by emotional label.

Fig. 1.
figure 1

Distribution of evaluations by emotional label

4 Experimental Results

The proposed algorithm is implemented on the JADE platform [3]. Classifiers and performance measures are obtained from Weka [6]. The psychoacoustics dataset is stored on a relational database implemented in the MySQL database management system. We now present the experimental results. To do so, we follow the steps of the proposed algorithm (see Sect. 2).

The Multi-Agent Ensemble of Classifiers (MAEoC) is composed of the following base classifiers C\(_i\): (i) Naive Bayes, (ii) k-Nearest Neighbors, \(k=5\), (iii) Decision Tree J48, (iv) Support Vector Machine, and (v) Multi-Layer Perceptron. This ensemble ensures diversity of algorithms. The main results are summarized in Table 3.

Firstly, the F-Measure was quantified for each classifier C\(_i\) when they were trained with original dataset D. These results are presented in column C\(_{it0}\).

A second experiment consisted in forming ensembles using bagging, boosting, and stacking on every classifier C\(_i\).

Since dataset D is unbalanced, we modify it via Oversampling and Undersampling. Then each classifier C\(_i\) was trained with those modified datasets. The F-Measures are given in columns Oversampling and Undersampling, respectively.

We also present how the MAEoC behaves at time \(t=1\). Column C\(_{it1}\) shows the F-Measure when base classifiers C\(_i\) are trained with dataset AH. We urge reader to remember that AH is the aggregated dataset of the objects correctly classified.

Finally, the performance of the MAEoC is given.

Table 3. Performance of the EoC vs other techniques

The performance of the MAEoC can be observed in Figs. 2 and 3.

Fig. 2.
figure 2

Performance of classifiers at t = 0 and t = 1 (Color figure online)

Fig. 3.
figure 3

Performance of base classifiers vs MAEoC

We normalized the performances based on the highest F-Measure obtained when the base classifiers were trained with original dataset D. In this experiment, that honor corresponds to k-Nearest Neighbors, \(k=5\). F-Measure = 0.201 corresponds to a baseline of 100. In Fig. 2 the blue dots correspond to the normalized F-Measure obtained by the base classifiers when they were trained with the original dataset D. Thus, it can be seen that the Naive - Bayes classifier (NB) has a performance 36 % worse than 5-NN. Conversely, The red dots reflect the normalized F-Measure when the base classifiers were trained with dataset AH, that is to say, the aggregated set of correctly classified instances. Based on this data, we support our claim that communicating hits and fails is a way of reducing the influence of the data distribution in the creation of its statistical model.

Figure 3 presents a comparison of the F-Measures obtained at t = 0 and the final F-Measure obtained with the MAEoC, which improved in the order of 67 %.

4.1 Application to Computer-Assisted Creativity

The application to Computer-Assisted Creativity is presented in the following screenshots. As one of the applications of the MAEoC, we use it to help a creative subject to know what emotions will most likely be evoked by the parameters s/he enters in order to create a musical fragment. The MAEoC is used as follows. Once the MAEoC is launched, the training phase begins. This is show in Fig. 4. As stated before, dataset D contains the emotional responses to fractal music obtained in a psychoacoustics study [8]. In the let frame of the screenshot the communication among classifier agents and coordinator agent is shown.

Fig. 4.
figure 4

Training of the MAEoC

Fig. 5.
figure 5

Voting weights obtained by the members of the ensemble

Fig. 6.
figure 6

Example 1. Classification of a new input object

As soon as the MAEoC is trained, the final weights are given to each classifier. This includes the classifier \(m+1\) contemplated as part of the algorithm. These results are show in Fig. 5.

When the voting weights are assigned to the classifiers, the MAEoC is ready to classify new objects. This is illustrated in Fig. 6. Variables x, y, and z are the initial values of the Lorenz attractor. Variables sigma, r, and b define its trajectory. The remaining of the variables refer to the musical parameters necessary to create a musical fragment: Instrument (Instrumento) is Piano; Chord (acorde) is null; Musical Scales (Escala) is set to be G Major Pentatonic. Variables Tempo and Notes Durations defined the rhythm of the musical fragment. The MAEoC classifies this new input object \(X_i\) belongs to the class of objects that evoke happiness (alegría).

If that were the creative subject’s intention, then those input parameters would be used to render musical fragments; otherwise, the creative subject would be free to change their values. Figure 7 shows a different input object. The MAEoC determines that it belongs to the class of objects that provoke Distress (estres).

Fig. 7.
figure 7

Example 2. Classification of a new input object

When the MAEoC is incorporated into a system that creates music, it guides the creative subject in her/his endeavor. The MAEoC classifies newly input values into one of the sixteen classes of the CMoA, preventing the creative subject from doing educated guesses.

5 Conclusions and Future Work

We propose a Multi-Agent Ensemble of Classifiers (MAEoC). The algorithm employed to construct and exploit such ensemble is based on the dynamic calculation of a performance measure, communication, and information fusion. We tested our proposal on a demanding classification tasks: determine the most likely emotional tag based on a number of musical and fractal parameters on which a musical fragment is created. A psychoacoustics dataset is used as training source. We also compared he MAEoC with aggregation techniques (bagging, boosting, stacking), and with data manipulation techniques (undersampling and oversampling). On our experiments the MAEoC obtained the highest F-Measure.

Future work includes development of different versions of the algorithm. For instance, we can proposed a version where objects in subset AH are weighted according to the number of classifiers that predicted them correctly. We will test the MAEoC with different datasets, taken mostly from the UCI repository. We will continue acquiring data regarding the emotional responses, and train the MAEoC with upgraded versions of the dataset.