
1 Introduction

Boosting is an effective machine learning method for producing a very accurate classification rule by combining weak classifiers [7]. A weak classifier is defined as a classifier which is only slightly correlated with the true classification, i.e. it classifies objects better than a random classifier. In boosting, the weak classifier is learned from training examples sampled from the original learning set. The sampling procedure is based on the weight of each example, and these weights change in every iteration. The final decision of the boosting algorithm is based on the ensemble of classifiers built over all iterations of the algorithm. One of the fundamental problems in the development of boosting algorithms is how to choose the weights and how to define the combination rule for the ensemble of classifiers. In recent years many authors have presented various concepts based on the boosting idea [6, 9]. In this article we present a new extension of the AdaBoost [5] algorithm in which a linear modification of the weights is applied.

This paper is organized as follows: Sect. 2 introduces the necessary terms of the AdaBoost algorithm. Section 3 describes our modification of this algorithm. Section 4 presents the results of experiments comparing AdaBoost with our modification. Finally, some conclusions are drawn.

2 AdaBoost Algorithm

In the work [5] weak and strong learning algorithms were discussed. A weak algorithm can classify objects only slightly better than random guessing, whereas a strong algorithm can classify objects accurately. Schapire formulated the first algorithm to “boost” a weak classifier. The main idea of boosting is to improve the predictions of a weak learning algorithm by combining a set of weak classifiers into a single strong classifier. The best-known and most widely applied boosting algorithm is AdaBoost. Its main steps are as follows [2] (Tables 1 and 2):

Table 1. AdaBoost algorithm
Table 2. Notation of the AdaBoost algorithm

One of the main steps of the algorithm is to maintain a distribution over the training set using the weights. Initially, all weights of the training observations are set equal. If an observation is classified incorrectly at the current stage, its weight is increased; similarly, a correctly classified observation receives a smaller weight in the next step. In this way the weak learner is forced to focus on the hard, misclassified examples from the training set in each subsequent step of the algorithm. In each step of the AdaBoost algorithm the best weak classifier according to the current distribution of observation weights is found. The goodness of a weak classifier is measured by its weighted error, and from the value of this error the coefficient \(c_t\) is calculated. The final prediction of the AdaBoost algorithm is a weighted majority vote of all weak classifiers.
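Since Tables 1 and 2 only list the algorithm and its notation, a minimal sketch of the standard AdaBoost loop may help to fix the ideas. The name \(c_t\) follows the notation of this paper; the choice of decision stumps as weak learners and all other implementation details are illustrative assumptions, not part of the algorithm description in Table 1.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=25):
    """Sketch of discrete AdaBoost; y must contain labels -1/+1.

    Decision stumps are used here only as an example of a weak learner.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                      # initially all observation weights are equal
    classifiers, coefficients = [], []
    for t in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)         # best weak classifier for the current weights
        pred = stump.predict(X)
        err = np.sum(w * (pred != y))            # weighted error of the weak classifier
        if err >= 0.5:                           # no better than random guessing: stop
            break
        err = np.clip(err, 1e-10, 1 - 1e-10)     # avoid division by zero for a perfect stump
        c_t = 0.5 * np.log((1.0 - err) / err)    # coefficient computed from the error
        w *= np.exp(-c_t * y * pred)             # increase the weights of misclassified observations
        w /= w.sum()                             # renormalise to keep a distribution
        classifiers.append(stump)
        coefficients.append(c_t)
    return classifiers, coefficients

def predict(classifiers, coefficients, X):
    """Weighted majority vote of all weak classifiers."""
    votes = sum(c * clf.predict(X) for clf, c in zip(classifiers, coefficients))
    return np.sign(votes)
```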

3 AdaBoost Algorithm with Linear Modification of the Weights

One of the main factors affecting the behaviour of the AdaBoost algorithm is the selection of the weights assigned to the individual elements of the learning set. We therefore propose a modification of the AdaBoost algorithm in which a linear modification of the weights is introduced. The value of the factor \(c_t\) is modified in point 4d, and the size of this modification depends on the iteration number \(t\). In the experimental studies we assumed that the value of the coefficient after modification (point 4d) is 1.25, 1.5, 1.75, 2, 2.25 or 2.5 times higher in the first iteration than in the original algorithm (point 4c). The steps of the proposed algorithm are presented in Table 3.

Table 3. Lmw-AdaBoost algorithm
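Because Table 3 is not reproduced here, the following sketch shows one reading of the proposed modification: the coefficient used in the weight update (point 4d) is the original \(c_t\) multiplied by a linear function \(at + b\) of the iteration number, with \(a\) and \(b\) chosen so that the multiplier equals the selected factor (1.25–2.5) in the first iteration and 1 in the last one. This is a hedged reconstruction based on Sect. 4; the authoritative formulation is the one in Table 3.

```python
def lmw_coefficient(c_t, t, T=25, first_factor=2.0):
    """Linearly modified coefficient assumed to be used in the weight update (point 4d).

    The multiplier a*t + b is assumed to decrease linearly from
    `first_factor` at t = 1 down to 1 at t = T, so that the last
    iteration uses the unmodified coefficient c_t.
    """
    a = (1.0 - first_factor) / (T - 1)   # e.g. -1/24 ~ -0.0417 for first_factor=2, T=25
    b = first_factor - a                 # e.g. 49/24 ~ 2.0417
    return (a * t + b) * c_t
```

Under this reading, point 4c of the original algorithm is left unchanged, so the final weighted vote still uses the unmodified \(c_t\); only the weight update is affected.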

In our earlier work [1] we proposed changes of the weights based on interval-valued fuzzy sets, and in [11] a linear combination of the upper and lower values of the weights was applied to a brain-computer interface.

4 Experiments

To test the Lmw-AdaBoost algorithm we performed experiments on the Pima data set. A feature selection process [10] was carried out to indicate the four most informative features of this data set. The final results were obtained with the 10-fold cross-validation method.
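For illustration only, the evaluation protocol might be set up as follows, reusing the `adaboost` and `predict` sketches from Sect. 2. The file name `pima.csv` and the use of `SelectKBest` with an ANOVA F-score are assumptions made here for the sake of a runnable example; the feature selection method actually used is the one described in [10].

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold

# Illustrative path; the Pima Indians Diabetes data are available from the UCI repository.
data = pd.read_csv("pima.csv")
X = data.iloc[:, :-1].values
y = np.where(data.iloc[:, -1].values == 1, 1, -1)   # map class labels to -1/+1

scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    # select the four most informative features on the training fold only
    selector = SelectKBest(f_classif, k=4).fit(X[train_idx], y[train_idx])
    X_tr, X_te = selector.transform(X[train_idx]), selector.transform(X[test_idx])
    clfs, coeffs = adaboost(X_tr, y[train_idx], T=25)   # or the Lmw-AdaBoost variant
    scores.append(np.mean(predict(clfs, coeffs, X_te) == y[test_idx]))

print("mean 10-fold accuracy:", np.mean(scores))
```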

The results for twenty-five iterations of the AdaBoost and the proposed Lmw-AdaBoost algorithms are presented in Table 4.

Table 4. The results of experiments

The best results (from the third iteration onwards) are shown in bold. In general the results for the AdaBoost algorithm are worse than those for the proposed Lmw-AdaBoost modifications. For the first twelve iterations no clear pattern was observed. In iterations 13–19 the best algorithm is the one in which the original coefficient \(c_t\) is increased 1.5 times in the first iteration; in the last iteration this coefficient is unchanged, so the parameters a and b are equal to \(-0.020833333\) and 1.541666667, respectively. In the later iterations the best algorithm is the one in which the parameters a and b are equal to \(-0.041666667\) and 2.041666667, respectively; with these parameters the coefficient \(c_t\) is increased 2 times in the first iteration. The obtained results show an improvement in the quality of the proposed modification of the AdaBoost algorithm with respect to the original one.
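Assuming the multiplier applied to \(c_t\) is the linear function \(at + b\) discussed in Sect. 3, requiring it to equal a chosen factor \(k\) in the first iteration and 1 in the last one (here \(T = 25\)) determines the parameters as

\[
a = \frac{1 - k}{T - 1}, \qquad b = k - a .
\]

For \(k = 2\) and \(T = 25\) this gives \(a = -\tfrac{1}{24} \approx -0.041666667\) and \(b = \tfrac{49}{24} \approx 2.041666667\), matching the values reported above.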

5 Conclusions

In this paper we presented the new Lmw-AdaBoost algorithm. It is a modification of the AdaBoost algorithm in which the coefficient \(c_t\) is changed. Consequently, this change affects the weights assigned to the individual learning objects. The changes with respect to the original algorithm are linear, and the value of the change is greater in the initial iterations.

The experiments were carried out on the Pima data set. The aim of the experiments was to compare the proposed algorithm with the original AdaBoost algorithm. The obtained results show an improvement in the classification quality of the proposed method with respect to the original one.

Future work might include introducing the proposed modification into other boosting algorithms, such as RealAdaBoost or GentleAdaBoost, as well as applying the proposed method to various practical tasks [3, 4, 8] or testing it on other data sets.