1 Introduction

Correct classification rate or accuracy, C, is one of the most common performance metrics for classification tasks. However, this metric is not reliable when the classification of imbalanced datasets is considered [26, 28], because it cannot capture the accuracy level of each class in the results. Imbalanced datasets are those where one or several classes have a much lower prior probability in the training set, and they pose a difficult challenge for Machine Learning (ML) researchers [34], because standard error metrics tend to ignore minority classes, reducing only the error of the majority ones.

As a result, complementary and alternative performance metrics for classification have been proposed to take the prior probabilities into account, based on accuracy rates computed independently for each class, partially compensating for data skewness. Examples of this type of metric are the Geometric Mean (GM) [1, 31] and the adjusted GM [4], or the F1 metric [16, 44] (the harmonic mean of precision and sensitivity or recall) and the adjusted F metric [36]. However, while these metrics can be effective in cases of moderate imbalance, the need for specific metrics arises in the case of heavily skewed data [21].

In this work, we focus on obtaining binary classification models using the C and F1 metrics simultaneously and analysing their behaviour by means of a multi-objective evolutionary algorithm (MOEA) [11], which can alleviate the negative effects that imbalanced datasets have on these metrics. Given that the F1 metric evaluates how balanced the classification is across the two classes of a problem, this methodology can lead to classification models with a trade-off between global accuracy and recall for the minority class (sensitivity).

On the other hand, this theoretical study is supported by an empirical study with 26 datasets from the UCI repository [3] with different degrees of skewness and a complex donor-recipient matching model for a liver transplantation problem [12]. The results are analysed with three metrics: the aforementioned C and F1, the latter calculated on the positive class and denoted F1Pos, plus a third metric, F1Neg, calculated on the negative class. The study shows that using the majority class as the positive class instead of the minority class can result in better performance for both classes, depending on the dataset and the percentage of patterns classified as positive. This can be explained by the fact that, in some cases, the noise of minority classes can mislead the results obtained for metrics such as F1. The classifiers obtained with the pair of metrics (C,F1) are shown to yield performance as balanced as possible in both metrics, without the need to apply resampling techniques.

In summary, the main contributions of this paper are the following:

  • We introduce a theoretical study on the relationship between the C and F1 metrics, which is evaluated both graphically and empirically. This study shows how the constraints presented may limit (or not) an algorithm in the process of finding good classifiers, depending on the shape of the feasible region of solutions. Moreover, it provides graphical information about the space of possible solutions attainable by the algorithm.

  • Given a binary classifier B(α), we address the problem of finding the parameters α which maximise the metrics C and F1. If the C value is evaluated and optimised together with F1, an overall success for both classes of a binary classification problem can be achieved. Unlike previous works, which try to obtain a high global accuracy, this proposal simultaneously analyses the accuracy and the F1 level, which is especially useful when the costs of misclassification are different but not exactly known. In this way, the study tries to alleviate the negative effects that imbalanced datasets have on the F1 metric, the value of F1 being increased by slightly decreasing the value of C.

  • On the other hand, binary classification models, which separate positive and negative patterns, can be built by choosing between two alternatives: taking either the majority class or the minority class as the positive one. We experimentally show that, in certain datasets, considering the majority class as the positive one instead of using the minority class leads to models with better values in C and F1.

Having explained the motivation of this work, the rest of the paper is organised as follows: In Section 2, the C and F1 metrics are presented and their properties are discussed. Section 3 derives the theoretical relation between the values of the contingency matrix and the discussed metrics. Section 4 shows the experiments performed and the results obtained. The conclusions are finally drawn in Section 5.

2 Performance metrics for binary classification

Although ML algorithms can be evaluated using theory (deriving generalized error bounds) [37], empirical evaluation remains the most common approach for algorithm assessment. Evaluation techniques based on multiple experiments are frequently considered in practice [14, 17, 41].

A metric evaluating a classifier must be estimated from all the available samples, which are usually split into training and test sets [25]. The classifier is first designed using training samples, and then it is evaluated based on its classification performance on the test samples. Different performance metrics are used to select the classification model which performs better for a given problem. In this context, class imbalance learning refers to those classification problems where the datasets present a skewed class distribution. In these imbalanced binary classification problems, patterns of a class are highly outnumbered by those of the other class, the minority class being heavily under-represented [2, 7]. Although the minority class is in many problems the class of interest, training a classifier with an imbalanced dataset often produces models biased towards the majority class (sometimes leading to trivial classifiers where all patterns are classified as coming from the majority class), which perform poorly on the minority class [44, 45].

2.1 Traditional metrics

The behaviour of a binary classifier can be described by the number of patterns of the positive class correctly recognised (true positives, TP), the number of patterns of the negative class correctly recognised (true negatives, TN), and by the number of examples which were incorrectly assigned either to the positive class (false positives, FP) or to the negative one (false negatives, FN). These four quantities constitute a confusion matrix, as shown in Table 1.

Table 1 An example of a binary confusion matrix

From the binary confusion matrix, the following metrics can be defined:

  • Global performance, which can be evaluated using the accuracy, C:

    $$ C = \frac{TP+TN}{TP+FP+TN+FN}, $$

    that is, the rate of correct predictions.

  • Precision, P, is a metric of exactness, representing how many of the examples predicted as positive actually belong to the positive class:

    $$ P = \frac{TP}{TP+FP}, $$
  • Recall or sensitivity, R, also known as TP Rate and Positive Precision, is a metric of completeness, and it represents how many positive examples are classified correctly:

    $$ R = \frac{TP}{TP+FN}. $$
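As a quick illustration (a minimal sketch of ours, not taken from the paper), these three quantities can be computed directly from the four confusion-matrix counts; the counts used below are hypothetical:

```python
# Sketch: computing C, P and R from the confusion-matrix counts above.

def accuracy(tp, fp, tn, fn):
    """C: rate of correct predictions over all patterns."""
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    """P: fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """R: fraction of actual positives classified correctly."""
    return tp / (tp + fn)

# Illustrative counts for an imbalanced problem: accuracy looks high
# even though P and R on the positive class are only modest.
tp, fp, tn, fn = 20, 10, 160, 10
print(accuracy(tp, fp, tn, fn))  # 0.9
print(precision(tp, fp))         # ~0.667
print(recall(tp, fn))            # ~0.667
```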

Other traditional metrics for imbalanced problems are Kubat's G-mean [31] and the AUC (Area Under the ROC Curve) [5]. All these metrics have been claimed to be effective in evaluating classification performance in binary imbalanced learning scenarios [18, 26], but they present disadvantages. For example, neither P nor R detects changes in TN, which is a serious problem when the classes have distinct features and both the positive and negative classes are well-defined (e.g. male or female). The retrieval of a positive class, the discrimination between classes, or the balance between retrieval from both classes are problem-dependent tasks.

In this first study, we focus on the C metric, which takes into account the four values of the confusion matrix of a binary classification problem, and also on the F1 metric, which uses three of the four values (it does not detect changes in TN) but considers the values of P and R through their harmonic mean, as will be shown in the next section.

2.2 F1 metric

Another alternative for dealing with imbalanced data is the F-metric (F1) [16, 44]. This metric is especially useful when the cost of misclassification for the different classes is not exactly known.

The F-metric [40] (or balanced F1-score) is given by:

$$ F_{1} = \frac{2}{\frac{1}{P}+\frac{1}{R}}=\frac{2PR}{P+R}, $$

that is, the harmonic mean of P and R. It tends towards the lower of the two and is a summary indicator that can be generalised to Fβ:

$$ F_{\beta} = (1+\beta^{2})\frac{PR}{\beta^{2}P+R}. $$

The Fβ metric allows us to assign β times as much importance to R as to P. The most popular choice is β = 1, i.e. assigning the same importance to P and R, leading to Fβ=1 = F1. F1 reaches its best value at 1 (perfect P and R) and its worst at 0 (P and/or R = 0).
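The following sketch (ours, not from the paper; the input values are arbitrary) makes the weighting role of β explicit:

```python
# Sketch: F-beta from P and R. beta > 1 weights R more heavily,
# beta < 1 weights P more heavily, and beta = 1 recovers F1.

def f_beta(p, r, beta=1.0):
    if p == 0.0 and r == 0.0:
        return 0.0  # convention: F-beta -> 0 when P = R = 0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

p, r = 0.9, 0.3
print(f_beta(p, r, beta=1.0))  # 0.45, pulled towards min(P, R)
print(f_beta(p, r, beta=2.0))  # ~0.346, closer to R
print(f_beta(p, r, beta=0.5))  # ~0.643, closer to P
```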

F1 can be used to obtain optimal classification models with high R and P in class imbalance situations due to the trade-off between both metrics [30]. It is commonly used as a criterion for classifier selection [8]. F1 is invariant to changes in the TN count, so it does not capture a classifier's ability to recognise negative examples. Such a metric is more applicable to domains with a multi-modal negative class taken as "everything not positive" [41]. F1 is also invariant to uniform changes of positive and negative counts [41], that is, it is stable with respect to a uniform increase of the data size (scalar multiplication of the confusion matrix). If we expect that, for different data sizes, the same proportion of examples will exhibit positive and negative characteristics, that is, if the size of a stratified sample does not affect the performance of the classifier, then this invariant metric is a good choice.

Improving a classifier using F1 is generally not easy, as the resulting optimisation problem is non-convex. Therefore, various approximation methods have been proposed. An efficient algorithm to maximise a non-convex approximation to F1 for logistic regression was presented in [27], showing its effectiveness on an information extraction problem; this method can fail if the initial estimate of the expected F1 is unreasonable. In [39], a variant of the standard Support Vector Machine (SVM) that optimises an approximation to F1 was proposed. An efficient algorithm for maximising a convex lower bound of F1 for SVMs was introduced in [29].

Liu et al. [33] proposed a novel method which maximises an approximate F1 including an L1-regulariser. In [10], the authors presented a non-convex loss function for F1 maximisation in the presence of outliers, proposing a formulation based on the elastic net regulariser (combination of L1 and L2 penalties).

Now, we redefine F1 to better study its optimisation. First, we expand the definition of F1 as follows:

$$ \begin{array}{@{}rcl@{}} F_{1} &=& \frac{2(TP)^{2}}{2(TP)^{2}+(TP)(FP)+(TP)(FN)}\\ &=& \frac{2(TP)}{2(TP)+FP+FN}. \end{array} $$

Although it is not common, we will empirically study the performance of the classifier for the negative class, which can be similarly expressed as:

$$ \begin{array}{@{}rcl@{}} F_{1\text{Neg}} &=& \frac{2(TN)^{2}}{2(TN)^{2}+(TN)(FP)+(TN)(FN)}\\ &=& \frac{2(TN)}{2(TN)+FP+FN}. \end{array} $$

Let f be the ratio of patterns of the positive class:

$$ \begin{array}{@{}rcl@{}} f=\frac{TP+FN}{TP+FN+TN+FP}. \end{array} $$

Let z be the ratio of true positive patterns obtained by the classifier:

$$ z=\frac{TP}{TP+FN+TN+FP}. $$

Then, the confusion matrix can be expressed (in ratios) as:

$$ \begin{array}{@{}rcl@{}} \left( \begin{array} {cc} TP & FN \\ FP & TN \end{array}\right) \equiv \left( \begin{array}{cc} z & \quad f-z \\ 1-f-C+z & \quad C-z \end{array}\right), \end{array} $$
(1)

Taking into account the equivalence shown in (1), F1 can be simplified to:

$$ F_{1} = \frac{2z^{2}}{2z^{2}+z(1-C)}=\frac{1}{1+\frac{1-C}{2z}}. $$

F1 can be maximised by minimising \(\frac {1-C}{2z}\), assuming that z≠ 0. In this way, the minimisation problem can be expressed as:

$$ \min \left( (1-C)-\lambda(2z)\right), $$

where λ is a positive constant; thus, F1 is maximised by maximising both C and z. When z → 0, the maximisation of F1 is difficult because, as will be shown, F1 → 0.
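As a numerical sanity check of this reformulation (a sketch of ours, not part of the original experiments), we can sample ratio confusion matrices consistent with (1) and verify that F1 computed from the counts matches 2z/(2z + 1 − C):

```python
import random

# Sanity check (a sketch, not from the paper): draw ratio matrices
# consistent with (1) and verify that F1 = 2z / (2z + 1 - C).
random.seed(0)
for _ in range(10_000):
    f = random.uniform(0.05, 0.95)             # ratio of positive patterns
    C = random.uniform(0.05, 0.95)             # accuracy
    z = random.uniform(max(0.0, f + C - 1.0),  # feasible range of z,
                       min(C, f))              # see Proposition 1 below
    tp, fn, fp, tn = z, f - z, 1 - f - C + z, C - z
    f1_counts = 2 * tp / (2 * tp + fp + fn)
    f1_closed = 2 * z / (2 * z + 1 - C)
    assert abs(f1_counts - f1_closed) < 1e-12
```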

3 Relation between F1 and C metrics

We analyse and obtain the relation between the C and F1 metrics based on the ratio of patterns of the positive class (f). To carry out this analysis, it is necessary to study the boundaries of the values that the C and F1 metrics can take for a binary classification problem.

If we represent C on the abscissa of the plane and F1 on the ordinate, as shown in Fig. 1, a classifier can be represented within the square [0,1] × [0,1] in terms of the values that it could take with respect to both metrics, where (1,1) is the optimal point. We refer to the area (shaded region) containing the values that can be reached by both metrics as the feasible region, while the values outside that area form what we call the infeasible region. Note that the feasible area will be attainable or not depending on the difficulty of the dataset considered.

Fig. 1 Feasible region (shaded region) for different values of f

In the following sections, the boundaries of the infeasible region for the pair of metrics (C,F1), taken as a joint measure of global classification success (C) and priority-class accuracy (F1), are analysed, together with a more formal and detailed explanation of the region and its representation.

3.1 Range of values of F1 for each C

We want to obtain the range of possible values of F1 for each C ∈ [0,1]. Taking into account the above-mentioned terms z and f, the metrics can be expressed as:

$$ R = \frac{z}{f}, $$
$$ P= \frac{z}{1-f-C + 2z}, $$
$$ F_{1} = \frac{2z}{2z + 1-C}. $$
(2)

Firstly, we analyse the extreme cases:

  • Suppose C = 0.

    Then, only if z = 0 (TP = 0), R = P = 0 and F1 → 0.

  • Suppose C = 1.

    Then, only if z = f, R = P = F1 = 1.

Secondly, we analyse the most common case, 0 < C < 1, for which the F1 value is given by (2).

Proposition 1

The following constraint is always fulfilled, ensuring that all entries of the confusion matrix are positive:

$$ \max\{0,f+C-1\}<z<\min\{C,f\}. $$
(3)

Proof

Given that:

  • C,f > 0 ⇒ 0 < min{C,f}.

  • C,f < 1 ⇒ f + C − 1 < min{C,f}.

Then: max{0,f + C − 1} < min{C,f} □
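This proposition can also be checked empirically. The following sketch (ours) draws random confusion matrices with strictly positive entries and confirms that z always falls strictly inside the bounds of (3):

```python
import random

# Empirical check of Proposition 1 (a sketch): for confusion matrices
# with strictly positive entries, max{0, f+C-1} < z < min{C, f} holds.
random.seed(1)
for _ in range(10_000):
    tp, fn, fp, tn = (random.uniform(1, 100) for _ in range(4))
    n = tp + fn + fp + tn
    f, z, C = (tp + fn) / n, tp / n, (tp + tn) / n
    assert max(0.0, f + C - 1.0) < z < min(C, f)
```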

Now we analyse the relation between F1 and C. It will be based on the variation of F1 = F1(z) for each value of C.

Lemma 1

F1(z) is an increasing function.

Proof

Given that the derivative of F1(z) is:

$$ \frac{\partial F_{1}(z )}{\partial z} = \frac{2 (1-C )}{(2z + 1-C )^{2}} >0, $$

it is clear that this derivative is always positive, provided that C is constant. □

Using inequality (3) and given that F1(z) is increasing:

$$ M =\max\{F_{1}(0), F_{1}(f+C-1)\} < F_{1}(z) < \min\{F_{1}(C), F_{1}(f)\}=m. $$
(4)

Substituting these values of z in (2), we obtain the following expressions:

$$ F_{1}(0)= 0, F_{1}(f+C-1) = \frac{2 (f+C-1 )}{2f+C-1}, $$
$$ F_{1}(C) = \frac{2C}{1+C}, F_{1}(f) = \frac{2f}{2f + 1-C}. $$

From these expressions and inequality (4), the following relations can be obtained:

  • If F1(0) < F1(C),

    $$ 0<\frac{2C}{1+C}, \text{ and then, } C>0. $$
  • If F1(0) < F1(f),

    $$ 0<\frac{2f}{2f + 1-C}, \text{ and then, } f>0 \text{ and } 2f + 1-C>0. $$
    (5)

3.2 Boundaries of the infeasible region

We analyse the boundaries according to the relative values of C and F1. The maximum and minimum values in (4) are called M and m, respectively, and based on these values, we determine the functions which limit the infeasible region (boundaries). There are four different cases:

  1. M = 0 and m = F1(C).

    In this case,

    M = 0 is only possible if F1(f + C − 1) < F1(0) = 0, i.e.:

    $$ \frac{2(f+C-1)}{2f+C-1} < 0 \Rightarrow f+C-1<0 \Rightarrow C<1-f, $$
    (6)

    given that 2f + 1 − C > 0 (see (5)).

    On the other hand, if F1(C) < F1(f) and m = F1(C) then:

    $$ \begin{array}{@{}rcl@{}} \frac{2C}{1+C}<\frac{2f}{2f + 1-C},\\ C^{2}-C(f + 1)+f>0,\\ (C-f)(C-1)>0,\\ \left\{\begin{array}{l} C-f<0 \Rightarrow C<f,\\ C-1<0 \Rightarrow C<1. \end{array} \right. \end{array} $$
    (7)

    From (6) and (7), C < min{f,1 − f}, therefore:

    $$ \begin{array}{@{}rcl@{}} 0<F_{1}(z)< \frac{2C}{1+C}. \end{array} $$
  2. M = 0 and m = F1(f).

    In this case,

    if M = 0, from (6), C < 1 − f, and if m = F1(f), then F1(f) < F1(C). Consequently:

    $$ \begin{array}{@{}rcl@{}} \frac{2f}{2f + 1-C}<\frac{2C}{1+C}, \end{array} $$

    and following the same steps as in (7), f < C; since from (6) C < 1 − f, we have f < C < 1 − f, and consequently \(f<\frac{1}{2}\) and:

    $$ \begin{array}{@{}rcl@{}} 0<F_{1}(z)< \frac{2f}{2f + 1-C}. \end{array} $$
  3. M = F1(f + C − 1) and m = F1(C).

    In this case,

    M = F1(f + C − 1) holds if \(F_{1}(f+C-1)=\frac{2(f+C-1)}{2f+C-1}>0\), which requires f + C − 1 > 0.

    On the other hand, m = F1(C) if F1(C) < F1(f), and operating as in case 1, then C < f.

    From both inequalities, we have that 1 − f < C < f, so, in this case:

    $$ \frac{2(f+C-1)}{2f+C-1}<F_{1}(z)<\frac{2C}{1+C}. $$

    This can only happen if \(f>\frac {1}{2}\).

  4. M = F1(f + C − 1) and m = F1(f).

    M = F1(f + C − 1) implies that f + C − 1 > 0, and m = F1(f) implies that f < C. Then, C > max{f,1 − f} and:

    $$ \frac{2(f+C-1)}{2f+C-1}<F_{1}(z)<\frac{2f}{2f + 1-C}. $$

Rearranging the results by using the value of f (if \(f<\frac{1}{2}\), then min{f,1 − f} = f and max{f,1 − f} = 1 − f; if \(f>\frac{1}{2}\), then min{f,1 − f} = 1 − f and max{f,1 − f} = f), and calling the lower limit Lf(C) and the upper one Uf(C), we find that, in both cases, the boundaries are the same:

$$ \begin{array}{@{}rcl@{}} & L_{f}(C) = \left\{ \begin{array}{lc} 0 & 0 \leq C \leq 1-f\\ \frac{2(f+C-1)}{2f+C-1} & 1-f \leq C \leq 1 \end{array}\right., & \\ & U_{f}(C) = \left\{ \begin{array}{lc} \frac{2C}{1+C} & 0 \leq C \leq f\\ \frac{2f}{2f + 1-C} & f \leq C \leq 1 \end{array}\right.. & \end{array} $$

We use non-strict inequalities (≤) for the limits because Lf(C) and Uf(C) are continuous functions (their values coincide at both sides of C = 1 − f and C = f, respectively).
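For reference, the two boundary functions translate directly into code. The sketch below (ours, with illustrative values) evaluates the feasible F1 interval for a given C and f, which is how plots such as Fig. 1 can be reproduced:

```python
# Sketch: boundaries of the feasible region of (C, F1) for a given f.

def lower_bound(c, f):
    """L_f(C): lower limit of the feasible F1 values."""
    return 0.0 if c <= 1 - f else 2 * (f + c - 1) / (2 * f + c - 1)

def upper_bound(c, f):
    """U_f(C): upper limit of the feasible F1 values."""
    return 2 * c / (1 + c) if c <= f else 2 * f / (2 * f + 1 - c)

# Illustrative point: feasible F1 interval at C = 0.8 when f = 0.35.
print(lower_bound(0.8, 0.35), upper_bound(0.8, 0.35))  # 0.6, ~0.778
```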

3.3 Representation and feasibility

Once the limits of the infeasible region have been calculated, the graphical representation of the feasible region in Fig. 1 for different values of f (0.35 and 0.65) can be properly understood.

It is clear that the whole shaded region is feasible. Given any point (x0,y0), it is possible to find a classifier (confusion matrix) such that C = x0 and F1 = y0. This can be done by taking:

$$ z= \frac{y_{0}(1-x_{0})}{2(1-y_{0})}=\frac{F_{1}(1-C)}{2(1-F_{1})}. $$

Note that, for a given value of C (or F1), there are many possible values for F1 (or C). In this way, the performance of a classifier when considering C and F1 can be represented by using this kind of plot to have an idea of the range of possible improvement for a given value of C or F1, according to the feasible region which can still be explored.
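The constructive argument above can be made concrete with a short sketch (ours; the target point and f are illustrative): given a feasible point (x0, y0) and the class ratio f, it recovers z and the full ratio confusion matrix:

```python
# Sketch: build a ratio confusion matrix achieving C = x0 and F1 = y0.

def matrix_for(x0, y0, f):
    """Return (TP, FN, FP, TN) as ratios for the target point (x0, y0)."""
    z = y0 * (1 - x0) / (2 * (1 - y0))
    tp, fn, fp, tn = z, f - z, 1 - f - x0 + z, x0 - z
    # All four ratios must be non-negative for the point to be feasible.
    assert min(tp, fn, fp, tn) >= 0, "point outside the feasible region"
    return tp, fn, fp, tn

tp, fn, fp, tn = matrix_for(x0=0.8, y0=0.7, f=0.35)
print(tp + tn)                      # recovers C  = 0.8
print(2 * tp / (2 * tp + fp + fn))  # recovers F1 = 0.7
```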

4 Experimental validation

This section presents the experiments performed in this paper to study the relation between C and F1, to validate the feasible region derived in the previous section, and to assess the possibility of simultaneously optimising C and F1 in a binary classifier B(α).

Firstly, we present the MOEA considered for the optimization of a population of B(α) classifiers, in this case Artificial Neural Networks (ANNs), where α comprises the structure and weights of the net. We also describe a mono-objective version of the algorithm, implemented for comparison purposes: an Evolutionary Algorithm (EA) for the optimization of ANNs guided by only one of the two proposed metrics, C or F1, and applying the same operators and the same type of ANNs as the MOEA. Finally, we describe the datasets used in the experimentation and the results obtained.

4.1 Algorithms to optimize C and F1

The two proposed metrics, C and F1, are used for guiding a MOEA called MPENSGA2 (Memetic Pareto Evolutionary NSGA2), described in detail in [19] and based on the original algorithm NSGA2 [13]. MPENSGA2 is based on the evolution of ANNs as binary classifiers [20, 43], where both the structure and the weights of the ANNs are optimised.

The pseudocode of MPENSGA2 is shown in Algorithm 1. The population size is established as N = 100. Five mutation operators are used in this algorithm: four structural mutators (add neurons, delete neurons, add links, delete links) and one parametric mutator (adding random noise to the links). The probability of choosing a type of mutator and applying it to an individual is equal to 1/5. With regard to the add and delete link mutations, links are added or deleted first between the input layer and the hidden layer and then between the hidden layer and the output layer. Specifically, we randomly add or delete 30% of the total number of links in the input-hidden layers, and 5% of the total in the hidden-output layers. Weights are assigned using a uniform distribution defined over two intervals, [− 5,5] for connections between the input layer and the hidden layer and [− 10,10] for connections between the hidden layer and the output layer. The number of neurons to be added or deleted is chosen randomly from {1,2}. All these values have been obtained experimentally and are sufficiently robust. For a more detailed description of the parameters of the algorithm, see [19].
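To make the mutation step concrete, the following minimal sketch (ours; the ANN is abstracted as simple bookkeeping, not the authors' implementation) selects one of the five mutators with probability 1/5 and applies the proportions given above:

```python
import random

# Hedged sketch of the mutation step: the network is abstracted as
# neuron/link counts only; the real operators act on an ANN genotype.
rng = random.Random(0)
OPERATORS = ["add_neurons", "delete_neurons",
             "add_links", "delete_links", "parametric"]

def mutate(net):
    op = rng.choice(OPERATORS)  # each operator chosen with p = 1/5
    if op.endswith("links"):
        sign = 1 if op == "add_links" else -1
        net["in_hid_links"] += sign * round(0.30 * net["in_hid_links"])
        net["hid_out_links"] += sign * round(0.05 * net["hid_out_links"])
    elif op.endswith("neurons"):
        sign = 1 if op == "add_neurons" else -1
        net["hidden_neurons"] += sign * rng.randint(1, 2)
    else:
        pass  # parametric mutation: add random noise to existing weights
    return net

print(mutate({"hidden_neurons": 5, "in_hid_links": 40, "hid_out_links": 5}))
```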

Taking into account the feasible region defined above and the concept of Pareto dominance, one point in (C,F1) space dominates another if it has greater C and equal or greater F1, or greater F1 and equal or greater C. Thus, the most competitive classifiers will tend towards the upper right part of the feasible region as the evolutionary process of the MOEA progresses.
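A sketch of this dominance test (ours) in the (C,F1) objective space, where both objectives are maximised:

```python
# Sketch: Pareto dominance in (C, F1); both objectives are maximised.

def dominates(a, b):
    """True if a = (C_a, F1_a) dominates b = (C_b, F1_b)."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b

print(dominates((0.90, 0.70), (0.88, 0.70)))  # True: better C, equal F1
print(dominates((0.90, 0.60), (0.85, 0.70)))  # False: incomparable points
```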

On the other hand, we have adapted the EA described in [38] in order to compare the simultaneous multi-objective optimization of C and F1 with the mono-objective optimization of each of the two functions. Algorithm 2 shows the pseudocode of the EA. The population size is established at N = 100, and the weight intervals for the links are the same as those used in the MOEA, as are the mutation operators: four structural mutators and one parametric mutator. The activation function of the neurons in the hidden layer is the sigmoidal function (as in the MOEA).

Algorithm 1 Pseudocode of the MPENSGA2 algorithm

Therefore, four methodologies appear in the comparison process, two corresponding to the MOEA (attending to the C or F1 extreme of the Pareto front) and two corresponding to the EA (optimizing only C or only F1). Note that the MOEA is run once, while the EA has to be run twice (once for each objective, C or F1).

Algorithm 2 Pseudocode of the EA

In order to check whether the null hypothesis can affect the performance of this pair of metrics, we will consider two approaches for each dataset used in the experimental procedure:

  1. The null hypothesis is that the positive class is the minority one.

  2. The null hypothesis is that the positive class is the majority one.

Although it might seem that the first hypothesis would yield the best results for improving the value of F1 for the minority class, we will see that, in some cases, the second approach is able to improve the classification of the less frequent class.

4.2 Datasets and experimental design

In this work, 26 benchmark datasets obtained from the UCI repository [3] have been considered, presenting different levels of imbalance, together with a complex and interesting dataset named "Madre", corresponding to a donor-recipient matching problem in liver transplantation [6]. This real transplant dataset consists of data from liver transplants performed in 11 Spanish units, including all the transplants performed between January 1, 2007, and December 31, 2008. Recipient and donor characteristics were reported at the time of transplant: 16 recipient characteristics, 16 donor characteristics and 9 operative factors were reported for each donor-recipient pair. The end-point variable is 3-month graft mortality.

All these datasets passed through the following preprocessing steps: categorical attributes were expanded into the corresponding binary vectors, and then each attribute was normalized to the interval [− 1,1]. Multiclass datasets were reduced to binary classification using one of two procedures: 1) choosing one label to represent the positive class and combining the others to form the negative class, e.g., Ecoli 3 ("imU") versus all ("cp", "im", "pp", "om", "omL", "imL", "imS"); or 2) selecting only two labels among all the possible ones, e.g., Yeast (2 versus 4), where the examples labeled as 2 ("me2") and 4 ("cyt") were chosen to represent the positive and negative classes. Both procedures were applied following the suggestions in the literature [9, 46].

The degree of imbalance of a dataset can be indicated by the value of f, previously defined as the ratio of patterns of the class considered as positive, or by the Imbalance Ratio, IR, defined as the ratio of the number of instances in the majority class to the number of instances in the minority class [22]. Datasets with different levels of imbalance were selected in order to better explore the relation between the C and F1 metrics. The values of f are specified depending on the null hypothesis selected: 1) the positive class is the majority one (f-Maj.), or 2) the positive class is the minority one (f-Min.). Table 2 shows the features of each dataset, ordered from highest to lowest imbalance.

Table 2 Characteristics of the datasets

For all datasets, a stratified 10-fold cross-validation was conducted, with 3 repetitions of the complete 10-fold procedure. Each dataset was run using both the majority class and the minority class as the positive label, resulting in 30 runs for each null hypothesis and dataset.
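This protocol is straightforward to reproduce. Below is a sketch using scikit-learn (our stand-in, since the paper's own splitting code is not given):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RepeatedStratifiedKFold

# Sketch of the evaluation protocol: stratified 10-fold CV repeated
# 3 times yields the 30 runs per dataset and null hypothesis.
X, y = load_breast_cancer(return_X_y=True)  # stand-in UCI-style dataset
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
n_runs = 0
for train_idx, test_idx in cv.split(X, y):
    n_runs += 1  # train on train_idx; evaluate C, F1Pos, F1Neg on test_idx
print(n_runs)  # 30
```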

4.3 Results

First of all, we graphically evaluate whether the theoretical constraints derived for the simultaneous optimisation of the C and F1 metrics are fulfilled in the experimentation with the MOEA. Figure 2, divided into two parts, shows the graphical representation for a specific run on the Bands dataset:

  • Figure 2a and b include training and test results when the algorithm is run using the majority class as the positive one.

  • Figure 2c and d include training and test results when the algorithm is run using the minority class as the positive one.

Fig. 2 Graphical representation of the Pareto fronts obtained with the MOEA for the Bands dataset in training, and corresponding test values for the models of the Pareto fronts

Figure 2a and c represent the values obtained in C and F1 for each individual (classification model) of the MOEA population on the training set, showing the non-dominated individuals (red circles) forming a Pareto front. Figure 2b and d show the results obtained by the same models for both metrics, but in this case on the test set. It is important to highlight that there are no Pareto fronts in the test set, since these fronts are always obtained in training. Therefore, these figures show where the models obtained during training are located when the test set is applied to them. Moreover, the individuals that were in the first Pareto front in the training figures may now be located in a worse zone of the (C,F1) feasible region (lower values of C and F1), depending on the generalization capacity of the models. Similar conclusions and subfigures can be found in Fig. 3 for the Liver dataset. As can be seen, the class used as the positive label has a clear influence on the shape of the feasible region and the spread of the members of the Pareto fronts.

Fig. 3 Graphical representation of the Pareto fronts obtained with the MOEA for the Liver dataset in training, and corresponding test values for the models of the Pareto fronts

We evaluate the models of the main Pareto front with maximum training C and with maximum training F1, since the MOEA returns a Pareto front considering both objectives (C and F1); that is, we evaluate the extremes of each front. This is done for each of the 30 runs, for each null hypothesis and for each dataset. To clarify the experimentation, an illustrative example is shown in Fig. 4, where the reader can observe several models of the main Pareto front from one of the 30 runs carried out for the Bands dataset (using the minority class as the positive label).

Fig. 4 Extreme models (C and F1) of the main Pareto front of the MOEA for the Bands dataset in training

Tables 3, 4, 5 and 6 include, for each dataset and methodology used, the average test values of C, F1Pos and F1Neg over the 30 runs when considering the majority or the minority class as the positive label. Regarding the notation used, MOEAC or MOEAF1 refers to the multi-objective algorithm considering the average values obtained at the C or F1 extreme, respectively. MONOC or MONOF1 refers to the mono-objective methodology considering the average values obtained by optimizing only C or only F1, respectively. The best results of the MOEA when comparing the two null hypotheses are marked in boldface. The results obtained with the EA methodologies are compared to the corresponding row of the MOEA (using the majority or minority class as the positive label, depending on the table). If the mono-objective methodology obtains the best results, it is marked with an asterisk (*).

Table 3 Average test results over 30 runs for the datasets where considering the majority class as the positive label (Maj.) with the MOEA leads to better C and F1 for the positive and negative classes (F1Pos and F1Neg) in the C and F1 extremes
Table 4 Average test results over 30 runs for the datasets where considering the majority class as the positive label (Maj.) with the MOEA leads to better C and F1Pos in the C and F1 extremes
Table 5 Average test results over 30 runs for the datasets where considering the minority class as the positive label (Min.) with the MOEA leads to better C and F1 for the positive and negative classes (F1Pos and F1Neg) in the C and F1 extremes
Table 6 Average test results over 30 runs for the datasets where considering the minority class as the positive label (Min.) with the MOEA leads to better C and F1Pos in the C extreme. Nevertheless, in the F1 extreme, considering the majority class as the positive label (Maj.) leads to better C and F1Pos

From now on, we refer to the results obtained in F1 for the class considered as positive as F1Pos. Note that the comparison between considering the majority class or the minority one as positive (Maj. or Min., respectively) with the MOEA is done attending only to the C extreme or the F1 extreme. Therefore, when comparing the values of F1, it is important to take into account which null hypothesis and which extreme, C or F1, are being observed. In this way, the value of F1Pos obtained when the positive class is the majority one (Maj.) should be compared against the value of F1Neg obtained when the positive class is the minority one (Min.). Similarly, the value of F1Neg when the positive class is the majority one (Maj.) should be compared against the value of F1Pos when it is the minority one (Min.). Finally, the value of C must be directly compared with its counterpart (Maj. C vs. Min. C). As a clarifying example, the results shown in Fig. 5 correspond to the Hepatitis dataset, and they are compared as follows: for the extreme with the best value in C, considering the majority class as positive, the value F1Pos = 0.893 is compared against the value F1Neg = 0.869 that corresponds to the same extreme but considering the minority class as positive. In the same way, for this C extreme, the value F1Neg = 0.515 (considering the majority class as positive) is compared against the value F1Pos = 0.474 of the minority class considered as positive. Finally, the value C = 82.74 (considering the majority class as positive) can be directly compared with the value C = 79.51 (minority class considered as positive). The same comparison procedure is applied to the F1 extreme. Regarding the mono-objective methodology, the values obtained optimizing only C or F1 are shown considering the majority class as the positive label, which is the null hypothesis with the best results in the Hepatitis dataset using the MOEA.

Fig. 5 Example of the comparison procedure in the experimental setup

That said, the datasets are grouped in the four tables named above as follows:

  • Table 3 shows the 4 datasets where considering the majority class as the positive label (Maj.) using the MOEA leads to better C and F1 for both the positive and negative classes (F1Pos and F1Neg) in the C and F1 extremes. In this case, the results obtained by the EA methodologies are compared to the MOEA ones using the majority class as the positive label, because this hypothesis obtains better results in these datasets.

  • Table 4 shows the 10 datasets where considering the majority class as the positive label (Maj.) using the MOEA leads to better C and F1 for the positive class (F1Pos) in the C and F1 extremes.

  • Table 5 shows the 11 datasets where considering the minority class as the positive label (Min.) using the MOEA leads to better C and F1 for both the positive and negative classes (F1Pos and F1Neg) in the C and F1 extremes.

  • Finally, Table 6 shows the 2 datasets where considering the minority class as the positive label (Min.) using the MOEA leads to better C and F1 for the positive class (F1Pos) in the C extreme. Nevertheless, in the F1 extreme, the opposite holds: considering the majority class as the positive label (Maj.) leads to better C and F1 for the positive class (F1Pos).

Taking into account this experimental design and comparison process, the following conclusions can be drawn about considering the majority class or the minority one as the positive class:

  • Majority class as positive label: Tables 3 and 4 show that the value of C decreases when the minority class is considered as positive instead of the majority one.

    For this case, Table 3 includes the datasets for which the use of the majority class as the positive label improves C and F1 for both classes (F1Pos and F1Neg) and for both extremes of the Pareto front. In this way, when the minority class is considered as positive, C is positively correlated with F1Pos and F1Neg (i.e. the three values decrease) for the Hepatitis, Ionos, HorseColic and HeartStatlog datasets. For example, in the Hepatitis dataset, the value Maj.-C = 82.74 becomes Min.-C = 79.51, the value Maj.-F1Pos = 0.893 becomes Min.-F1Neg = 0.869, and the value Maj.-F1Neg = 0.515 becomes Min.-F1Pos = 0.474. The IR of these datasets is not too high, from 1.250 in HeartStatlog to 3.844 in Hepatitis. The mono-objective methodologies do not obtain better results than the multi-objective ones when the majority class is considered as the positive label.

    Note that there are cases in which the average values of C, F1Pos and F1Neg are similar or even equal when comparing both extremes. This happens when the values of C and F1 are considerably high; in that case, the models obtained are close to the (1,1) optimal point of the feasible region, which can be narrow (see Fig. 1). This can cause the obtained models to be different in terms of their ANN structure but with similar or equal performance. There could even be cases in which one run provides a main Pareto front with only one individual or model.

    On the other hand, Table 4 includes the datasets where considering the majority class as positive leads to better C and F1Pos in the C and F1 extremes: Ecoli (3 versus all), SaHeart, Madre, Pima, SpectfHeart, Glass (1 versus all), BreastC, Bands, Haberman and Glass (0 versus all). In this case, a positive correlation is shown between C and F1Pos when the minority class is considered as positive. Nevertheless, the correlation between C and F1Neg is negative. For example, in Ecoli (3 versus all), when the value Maj.-C = 91.87 becomes Min.-C = 91.27 (i.e. it decreases), the value Maj.-F1Neg = 0.547 becomes Min.-F1Pos = 0.577 (i.e., it increases). The IR of these datasets is diverse, from 1.704 in Bands to 8.600 in Ecoli (3 versus all).

    Regarding the mono-objective methodologies, only for Ecoli (3 versus all) are their results better than the multi-objective ones in the three metrics considered (C, F1Pos and F1Neg). For SaHeart, Madre, SpectfHeart, Haberman and Glass (0 versus all), they also achieve slightly better results, but only for some of the three metrics, sometimes causing the value of C to decrease.

  • Minority class as positive label: Continuing with the interpretation of the results, in Table 5 the value of C is shown to increase when the minority class is considered as positive (instead of the majority one).

    For the datasets of this table, the C, F1Pos and F1Neg values are improved in the C and F1 extremes, i.e. C is positively correlated with F1Pos and F1Neg. The IR of the datasets varies significantly, from 1.148 in HouseVoting to 15.329 in Sick. In this way, obtaining good results in C and F1 when the minority class is considered as positive seems to be independent of the imbalance level and is probably related to the structure of the dataset. This empirically shows that there is no reason why this null hypothesis (minority class as positive) should be preferred when the imbalance level is high.

    Regarding the mono-objective methodologies, they obtain better results than the multi-objective ones only in three datasets: BreastW-Diagnostic, BreastW-Original and HouseVoting.

  • Majority or minority class as positive label: Finally, Table 6 shows 2 datasets, German and Liver, with behaviours different from those discussed so far: in the C extreme, the value of the C metric increases when considering the minority class as positive instead of the majority one, being positively correlated with F1Neg and negatively with F1Pos. On the other hand, if we observe the F1 extreme, the value of C decreases under the same consideration, C being positively correlated with F1Pos and negatively correlated with F1Neg. The IR of these datasets is not high, from 1.379 in Liver to 2.333 in German.

    The mono-objective methodologies obtain better results than the multi-objective ones only in the F1Neg and F1Pos metrics for the German dataset, when the positive label is assigned to the minority and majority class, respectively.

In some cases, the MOEA may stagnate due to the feasibility constraints derived from the relation between C and F1 and due to the value of f (ratio of patterns of the class considered as positive). Furthermore, the number of non-dominated solutions can be small, and these solutions can even be identical in terms of their C and F1 values. In these cases, the classifier is not sufficiently trained, and the mono-objective methodologies may obtain better results in generalisation, although this depends greatly on the dataset. Figure 6 shows the feasible region for the Glass (0 versus All) and Card datasets when the majority class and the minority one are considered as the positive class, respectively. As can be seen, for C values greater than 0.7, the space of solutions begins to narrow drastically, and, under these circumstances, it would be advisable to also experiment with mono-objective algorithms.

Fig. 6 Graphical representation of the narrowing of the feasible region for the Glass (0 versus All) and Card datasets for C values greater than 0.7

5 Conclusions

This work presents the theoretical constraints associated with the relation between the C and F1 metrics, as a function of the ratio of patterns of the positive class of a binary classifier. We propose representing binary classifiers as points in a two-dimensional plot according to their C and F1 performances. Using this representation, the constraints limit the feasible region in such a way that this region is wider when the values of C and F1 are relatively low. This representation can give us an idea of the range of possible improvement for a given value of C or F1.

The results show that the theoretical constraints are fulfilled. The MOEA is able to optimise a Pareto front of binary classifiers where both the C and F1 values are acceptable, showing high accuracy both globally and for the positive and negative classes. On the other hand, it has also been shown that the mono-objective methodologies generally obtain worse results when optimizing only C or F1 instead of optimising both metrics simultaneously. Their use should only be considered when the feasible region is narrow due to the constraints derived from the relation between both metrics or due to the value of f (ratio of patterns of the class labelled as positive).

For some datasets, using the majority class instead of the minority one as the positive label results in better performance in both C and F1 for both classes, particularly in those datasets where the degree of imbalance is lower. This option should be considered by decision makers when training binary classifiers, given that the positive label is generally assumed to be the minority one, on the understanding that better results will be obtained. This leads to a performance that, in some cases, seems to be independent of the imbalance level and is probably related to the structure of the dataset.

It is also observed that the C and F1Pos metrics are correlated in all datasets tested, except in German and Liver when the C extreme is considered. With respect to C and F1Neg, they are negatively correlated in the 10 datasets of Table 4 and in the German and Liver datasets when the F1 extreme is considered.

Finally, the use of an MOEA leads to acceptable results for both C and F1 in a good number of datasets, according to the experiments performed. The mono-objective methodologies need to be run twice, once for each metric, which increases the computational cost and does not always lead to better results with respect to the multi-objective methodology.

As future research lines, we plan to extend the findings of this paper to a multiclass classification environment by considering, for example, a multi-objective evolutionary algorithm based on decomposition [35]. Moreover, an automatic method to choose the best null hypothesis for a given classification problem could be designed, based on the analysis of the dataset and the classifier.

On the other hand, with the rise in popularity of deep learning methods and their recent promising results in many classification and forecasting applications [15, 24, 42], our methodology could be extended to deep structures. In this direction, recent works have applied evolutionary techniques with simple learning modules [23, 32] in order to simultaneously optimize different objectives.