1 Introduction

Computer vision techniques in intelligent real-time systems require sophisticated and fast decision-making implementations. The bottleneck in these systems lies in modeling the external variabilities caused by scale, position and illumination conditions during the detection step (Abril et al. 2007; Ho et al. 2003; Mekami et al. 2018). Though these difficulties remain a major challenge, several methods employing different machine learning paradigms have been proposed in the literature and have achieved good performance (Andreopoulos and Tsotsos 2013; Shantaiya et al. 2013; Yilmaz et al. 2006).

One computer vision system for face detection based on ensemble learning was proposed by Viola and Jones (2001b). This framework can be generalized to detect any kind of object (faces, cars, pedestrians...) given as positive examples in the learning database (DB). Their method is based on a cascade of simple classifiers (weak learners) using Haar-like features computed on the image of interest. These weak classifiers are generated using the Adaboost algorithm (Schapire 1999a) and combined into a strong classifier able to detect and track complex objects. The cascade framework in Adaboost allows negative examples to be filtered away quickly.

The Haar-like features are based on the wavelet representation previously introduced by Papageorgiou et al. (1998). The main idea is to compute the difference between two or more rectangular regions of the analyzed input image. A sliding window (in general \(24\times 24\) pixels) is applied at different positions of the image, and at each position the value of a feature is computed as the sum of pixel intensities in the black regions minus the sum of those in the white regions; this value is compared against a threshold learned as a weighted score on the training examples.

Even though the calculation of these features seems simple, a major drawback resides in the computation time during training and detection, since the histogram (variance and mean) of each sliding-window position must be normalized prior to applying the features.

In Abramson and Freund (2005), an improved type of visual learning feature called control points was proposed to cope with the processing time and poor detection rates of complex objects encountered with Viola and Jones’ features. According to the authors, this new kind of feature completely frees the detection system from the histogram normalization of the examined sub-window.

Viola and Jones’ algorithm performs a complete scan of the input image at different scales; it is therefore rather lengthy and computationally expensive. This full-search procedure is due to the poor discriminative power of the weak classifiers obtained from the Haar-like features. In other words, Adaboost tries to combine many weak learners by exploring the entire search space. Moreover, expanding the number and the types of visual features (Viola and Jones or control points) drastically increases the size of the feature set. Consequently, the search space becomes huge, and the processing time of Adaboost during learning and detection grows exponentially.

To overcome these drawbacks, we investigate in this paper the use of metaheuristic search strategies combined with Adaboost for detecting and identifying complex objects learned as positive examples. More specifically, we introduce particle swarm optimization (PSO), genetic algorithms, and a new hybridization technique for exploring the search space of the visual features, instead of the full and exhaustive search used in existing detection frameworks. The obtained results demonstrate that our method explicitly increases the detection rate and decreases the processing time during both training and detection.

2 Chronology of the existing methods for real time object detection and tracking

Machine learning is an important research area in artificial intelligence. It refers to implementations, theorems, and methods that allow a machine to evaluate and perform intelligent tasks through a learning process.

Machine learning algorithms can be categorized into two major classes: supervised and unsupervised methods. In the first category, a learning database of labeled examples is used to create an expert decision maker for classifying future examples into the right class. Neural networks, support vector machines (SVM), and decision trees are typical examples of supervised machine learning methods. In the unsupervised category, no expert is needed: the algorithm must discover inherent structures among the data. K-means, PAM (partitioning around medoids), and hierarchical clustering are typical unsupervised algorithms, which learn the global behavior of the data from the available observations (Mitchell 1997; Bishop 2006; Witten et al. 2011; Marsland 2009; Drew and John 2012; Metidji et al. 2013).

During the past decades there has been an explosion of proposed machine learning methods for object recognition and tracking (Antani et al. 2002; Liu et al. 2013). In fact, real-time object detection has become an entire challenging discipline in computer science, serving multiple objectives such as video surveillance, autonomous robot navigation, intelligent vehicles, drone navigation, and biometrics.

According to Yilmaz et al. (2006), the aim of object detection and tracking is to generate the trajectory of a moving object over time by locating its position. These authors distinguished several categories of tracking methods, as depicted in Fig. 1.

Fig. 1 Different object detection and tracking categories in Yilmaz et al. (2006)

Overall, object detection methods fall into four categories:

  • Knowledge-based methods: use an intuitive representation of the object to be detected.

  • Feature invariance: Extract discriminative attributes of the detected object.

  • Template matching: The detected object is represented with global model templates.

  • Appearance models: learn models from a database of images containing examples of the object to be detected. The images should be captured under different variabilities to enable suitable detection.

In knowledge-based methods (Yang et al. 2002), rules are extracted from images using human knowledge and perception of the objects. For example, in the case of face detection, it is easy to propose simple rules describing the primitives of a face and their relationships; for instance, a human face in an image usually appears with two symmetrical eyes. The relative positions of the various components of the face are studied after being detected. The difficulty in these approaches lies in translating into strict rules the way researchers represent the objects to be detected.

In feature-based methods, the objective is to detect invariant characteristics of the object. Boundary detectors (e.g., the Canny filter), morphological operators (erosion and dilation), and shapelet extractors are usually used to isolate the object from the rest of the image (Yang et al. 2002).

Template matching methods use a manually parameterized model to give a standard representation of the detected object. The main issue with these approaches is the lack of generalization, since it is not possible to represent all the variations an object can exhibit in an image (illumination, scale...) (Brunelli and Poggio 1993).

The final category concerns appearance models, in which statistical analysis and machine learning (SVM, neural networks, deep learning, boosting, KNN...) are used to find discriminating characteristics for classifying the object the user wants to detect. The learned characteristics take the form of probabilistic distributions or discriminant functions, which are then used for detection. In addition, dimensionality reduction is usually carried out for computational speed and detection efficiency (LeCun et al. 2004).

In this paper we focus on boosting-based classification systems, which learn models from a database of images. We will show how efficient, in terms of detection rate and processing time, the combination of Adaboost with metaheuristic models is. The proposed object detection framework was implemented and evaluated on a real car database. The fundamental definitions as well as the experiments are presented in the next sections.

3 I-SwarmBoost: combining metaheuristics, Adaboost and visual features for real time object detection

3.1 Principle of boosting classification

Boosting is a general method for improving the accuracy of a given learning algorithm. The theoretical basis of the method comes from the studies of Ehrenfeucht and Haussler (1989) and Valiant (1984). The problem was to check whether a weak learning algorithm (weak learner) can do better than mere chance. The principle is to combine these weak learners (also called hypotheses), through successive iterations, in order to generate a final strong learner. This concept is called probably approximately correct learning (PAC learning). A weak learner is an algorithm that provides a weak classifier capable of recognizing two classes at least as well as chance. Each provided classifier is weighted by the quality of its classification: the better it classifies, the more important its weight. The misclassified examples are boosted so that they become more important to the weak learner in the next loop of the algorithm.

Several boosting algorithms have been proposed in the literature (Schapire 1999b). For example, Adaboost (Adaptive Boosting) was introduced in Rudin et al. (2004) and Freund et al. (2003) for binary classification problems; Textboost (Schapire and Singer 2000) was designed for textual document classification and categorization; FilterBoost addresses regression analysis (Bradley and Schapire 2007); in Collins et al. (2002) boosting was used for logistic regression and distance learning; and it was finally applied in human-computer spoken-dialogue systems as in Schapire (2002).

In this paper we concentrate on Adaboost to propose our classification model for object detection in images and video streams. Adaboost is one of the most widely used learning algorithms for object detection and tracking tasks (Wu et al. 2015; Schapire 1999a). As a special case of boosting methods, Adaboost generates a strong learner by combining a cascade of weak learners. In each iteration, Adaboost searches for the most discriminative feature by learning a weak classifier that minimizes the classification error on a set of training examples. It has been shown that the training error tends exponentially to zero as the number of classifiers increases. Algorithm 1 gives the pseudo-code of Adaboost. It takes as input a set of labeled training examples \(\mathcal {D} =\{(x_{i},y_{i}),\ i=1 \ldots n,\ y_{i}\in \{-1,1\}\}\), where each example \(x_{i}\) is labeled by a class \(y_{i}\). Adaboost iterates over a prefixed number of rounds \(t = 1 \ldots T\).

Algorithm 1 Adaboost pseudo-code

At the first step, these examples have a uniform initial weight distribution \(D_{1}(i)=\frac{1}{n}\), which is updated according to the results obtained in the previous steps. At each step, a binary classifier \(h_{t}(x)\), which is a classification hypothesis, is generated and its error rate \(\varepsilon _t\) is calculated. A weight \(\alpha _t\) is assigned to each hypothesis to measure its importance. The final classifier (hypothesis) H is computed as a weighted majority vote over the previously generated weak classifiers \(h_{t}(x)\).
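
To make Algorithm 1 concrete, the following minimal Python sketch implements the discrete Adaboost loop exactly as described above; the names (adaboost, weak_learner) and the small constant guarding the logarithm are our own illustrative choices, not the authors' code.

```python
import numpy as np

def adaboost(X, y, weak_learner, T):
    """Discrete Adaboost. X: n training examples, y: labels in {-1, +1},
    weak_learner(X, y, D) -> h, with h(X) returning {-1, +1} predictions,
    T: number of boosting rounds."""
    n = len(y)
    D = np.full(n, 1.0 / n)                  # uniform weights D_1(i) = 1/n
    hypotheses, alphas = [], []
    for t in range(T):
        h = weak_learner(X, y, D)            # weak hypothesis h_t
        pred = h(X)
        eps = D[pred != y].sum()             # weighted error eps_t
        if eps >= 0.5:                       # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / (eps + 1e-12))  # importance alpha_t
        D = D * np.exp(-alpha * y * pred)    # boost misclassified examples
        D = D / D.sum()                      # renormalize to a distribution
        hypotheses.append(h)
        alphas.append(alpha)

    def H(Xq):                               # weighted majority vote H(x)
        return np.sign(sum(a * h(Xq) for a, h in zip(alphas, hypotheses)))
    return H
```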

The interest of Adaboost resides in its ability to reduce the learning error. Freund et al. (2003) proved that if each weak hypothesis has an error slightly better than chance, then the training error of Adaboost decreases exponentially and rapidly with the number of rounds t.

3.2 Visual learning

As reported above, Adaboost is a binary classification algorithm. It has been widely used in many computer vision problems, most notably for classifying face/non-face images. In these systems, Adaboost was combined with visual feature extraction methods for identifying human faces in image sequences. Papageorgiou et al. (1998) were among the pioneers in the use of visual features for object detection in static images of cluttered scenes. Their approach was based on a Haar-like wavelet representation of an object belonging to a class. Figure 2-Top gives an example of the learning database used by these authors, and Fig. 2-Bottom shows the Haar-like wavelets that were applied to the images for visual feature extraction.

Fig. 2 Top: an example of the learning database used in Papageorgiou et al. (1998) for pedestrian detection. Bottom: the results of Haar wavelets on the images for feature extraction

Building on Papageorgiou et al.'s work, Viola and Jones (2001a, b) proposed a new visual object detection framework with the ability to identify an object in an image very rapidly. The main contributions of their work concerned three points: (i) the application of a new image representation called the integral image (associated with the rectangular features, RF, described in the next subsection), similar to the wavelet representation inspired by Papageorgiou et al. (1998); (ii) the use of Adaboost to select a small number of discriminative visual features from a larger set in order to enhance classification; and (iii) the combination of complex classifiers in a cascade, which allows background regions of the image to be quickly eliminated while spending more computation on promising object-like regions. The important results obtained by Viola and Jones helped establish the reputation of their rectangular features and facilitated the production of several libraries implementing this technique. Several subsequent contributions have been proposed to improve this Adaboost-based object recognition system, for instance Lienhart et al. (2003) and Lienhart and Maydt (2002) with their new set of visual features, and Ghimire and Lee (2013) and Wu et al. (2015) with their multi-class Adaboost classifiers aimed at facial expression recognition in image sequences. The following subsections give more details on the visual features used within Adaboost for object detection.

3.3 Rectangular features

Viola and Jones proposed an Adaboost object detection system (Viola and Jones 2001a, b; Lienhart et al. 2003) using a set of basic Haar-like wavelets as rectangular features, arranged in a cascade, as illustrated in Fig. 3a. These features measure the intensity difference between neighboring local rectangular regions. The response of each feature is considered a simple classifier and is calculated as the difference between the sums of pixel intensities in the gray and white regions. The advantage of the rectangular features is the simplicity of their calculation using the integral image representation (see Fig. 3a): \(II(x,y)=\sum _{x' \le x,\, y' \le y} I(x',y')\). Furthermore, Lienhart et al. (2003) proposed an extension of the rectangular features, as displayed in Fig. 4. Considering the positions of the features in the image and their types, we have five degrees of freedom. Consequently, by varying the position of a feature in the analyzed image, its width, its height, and its type, a huge set of features can be generated (260,000 according to the authors, using a \(24 \times 24\) window in the image). This number grows further as new feature types are added, raising the computational complexity. The role of Adaboost is to extract the discriminative features, i.e., those generating low classification errors, the error being the total weight of all misclassified examples. Figure 5 shows an example of good features kept by Adaboost during learning; they represent the selected weak learners.
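
As an illustration of this constant-time evaluation, the following Python sketch builds the integral image and computes the response of a basic two-rectangle feature; the function names are ours, and this is a didactic sketch rather than the original implementation.

```python
import numpy as np

def integral_image(img):
    """II(x, y) = sum of I(x', y') over x' <= x, y' <= y, zero-padded
    on top and left so rectangle sums need no boundary checks."""
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle [x, x+w) x [y, y+h), computed
    with four lookups in the integral image (O(1) per rectangle)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Response of a basic two-rectangle Haar-like feature: sum over
    the left half of the region minus sum over the right half."""
    return (rect_sum(ii, x, y, w // 2, h)
            - rect_sum(ii, x + w // 2, y, w - w // 2, h))
```

Whatever the size of the rectangles, the response costs the same eight lookups, which is what makes scanning all positions and scales tractable.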

Fig. 3 Generalized Haar-like visual features used by Viola and Jones (a): the features are applied to an input image, and each feature response is computed via the integral image as the difference between rectangular regions. Control points features (b) are based on the examination of individual pixel values

Fig. 4 Feature prototypes of simple Haar-like and center-surround features. Black areas have negative and white areas positive weights (Lienhart et al. 2003)

Fig. 5 Example of good features kept by Adaboost during learning. They represent the selected weak learners

3.4 Control points features

To overcome the high processing complexity of Viola and Jones' rectangular features, the authors of Abramson et al. (2007) proposed an improvement to the previous object detection schemes by introducing a new kind of visual features called control points (CP), which are based on sampling individual pixels within the examined sub-window instead of comparing sums of pixel values over rectangular regions. These features have been shown to be faster, requiring neither the preparation of an integral image nor the variance normalization of each sub-window. Figure 3b gives an example of control points features applied to a car learning database. On each image, and at different resolutions, a set of positive points \(\{x_1,x_2,\ldots ,x_n\}\) and negative points \(\{y_1,y_2,\ldots ,y_m\}\) is generated. The feature answers positively if and only if for every control point \(x\in \{x_1,x_2,\ldots ,x_n\}\) and every control point \(y\in \{y_1,y_2,\ldots ,y_m\}\), we have val(x) \(\ge \) val(y).

Each CP feature tests the relations between pixels: it looks for a predefined ordering of pixel values in the image, and if that ordering is not found, it labels the image negatively (Abramson et al. 2007). Thus, CP features are order-based rather than intensity-based: they do not consider the difference between two pixel values, but only the sign of that difference. They are therefore insensitive to variance normalization.
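
A minimal Python sketch of this test, under our reading of the definition above (the multi-resolution handling is omitted and the function name is ours):

```python
def cp_feature_response(img, pos_points, neg_points):
    """Control-points feature: answers positively iff every positive
    control point is at least as bright as every negative one, i.e.
    min over positives of val(x) >= max over negatives of val(y)."""
    pos_vals = [img[r, c] for r, c in pos_points]
    neg_vals = [img[r, c] for r, c in neg_points]
    return 1 if min(pos_vals) >= max(neg_vals) else -1
```

Only the ordering of the sampled pixel values matters, so any monotonic change of the gray levels (in particular variance normalization) leaves the response unchanged.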

3.5 The global architecture of I-SwarmBoost

Motivations We have shown that AdaBoost needs a weak learner, i.e., an algorithm that provides a good feature at each iteration. Doubtlessly, choosing the best simple classifier at each step cannot be done by testing all visual feature possibilities (varying the positions of the features in the image, the resolution, and the class of the features), since there are more than 260,000 rectangular features and about \(10^{32}\) control points features. Therefore, the use of a metaheuristic algorithm becomes evident, to explore the feature search space in an intelligent way, generating simple classifiers and iteratively improving them. We thus propose I-SwarmBoost, an improved boosting classification method that considers both rectangular features (Viola and Jones 2001b) and control points (Abramson et al. 2007) within Adaboost, combined with several metaheuristic methods for exploring the search space of the visual features. The general architecture of our proposed model is illustrated in Fig. 6. As input, we have a learning database containing positive and negative examples. Adaboost uses the database by applying a sliding window, together with the visual features, on the images to extract the best weak learners; in each iteration, it uses a metaheuristic algorithm to explore the search space. The output of I-SwarmBoost is a ROI (region of interest) plotted on the image to identify the detected object.

Fig. 6 An overview of the proposed framework. An input database serves Adaboost with learning examples. Metaheuristic algorithms are then used to explore the search space for the most discriminative visual features

Each category of visual features was combined with a metaheuristic algorithm. For clarity, each optimization method is presented in a separate section.

4 Description of the meta-heuristics boosted weak learners

The application case of this study is the detection of cars, to help an autonomous driving system make decisions by detecting obstacles. The project was carried out at the Paris robotics center (CAOR), with the aim of evaluating previous work from this laboratory (Abramson et al. 2006) and of proposing improvements by developing and implementing other types of weak learners. The obstacles here are cars ahead, seen from behind. To that end, we used a learning database (see Fig. 7) as input to the algorithm, containing positive examples (5000 car images) and negative examples (2500 non-car images). To ensure diversity in the database, we captured the images under different lighting conditions, resolutions, and positions on the road. Note that the detection system can be generalized to other objects; it is only necessary to create positive examples of the class of these objects (human faces, pedestrians, bikes, road signs,...). The video stream should be analyzed in real time to produce a ROI on the positively detected object, as in Fig. 8.

Fig. 7 The learning database used for training the Adaboost classifiers. Top: positive examples of car images. Bottom: negative examples, i.e., objects other than cars

Fig. 8 A region of interest (ROI) is drawn on the image to identify the positively detected object (car)

At each iteration, the Adaboost algorithm tries to identify the most discriminative features; this procedure relies on a weak learner. In the case of Adaboost, the weak learner is an algorithm that explores the space of the visual features. We propose here the evaluation of several exploration methods, using metaheuristics and swarm intelligence optimization. Many techniques have been proposed in the literature for optimization problems; we recommend this excellent book for a deep review (Talbi 2009). The techniques can be classified as either local (typically gradient-based) or global (typically non-gradient-based or evolutionary) algorithms (Venter 2010). Among the existing techniques are: gradient descent (GD) (Bottou 2012), ant colony optimization (Hajjem et al. 2017), genetic algorithms (GA) (Goldberg 1989; Oliver 2017), evolutionary hill climbing (Abramson et al. 2006), particle swarm optimization (PSO) (Poli et al. 2007), and tabu search (Glover and Laguna 1997). All these optimization algorithms can be used to find the values of the parameters (coefficients) of a function f that minimize a cost function. They are best used when the parameters cannot be calculated analytically (e.g., using linear algebra) and must be searched for by an optimization algorithm.

Control points and rectangular features have previously been evaluated with only one optimization algorithm, namely hill climbing, as explained in Abramson et al. (2006). In this paper, we propose a continuation of that work (Abramson et al. 2006) through an exhaustive scheme for evaluating other optimization algorithms for discovering weak learners, with the aim of improving the detection rate.

The evaluation protocol that we propose here involves the comparison of five strategies:

  • Evolutionary hill climbing (Evo-HC-BOOST): we evaluate the model proposed in Abramson et al. (2006) by fine-tuning the parameters of their algorithm. This type of algorithm is used to remedy the problem of exhaustive exploration; it uses a principle close to genetic algorithms, that is to say, it begins with a population of features, selects the best elements among them, i.e., those that reduce the classification error, and finally applies mutations to these best elements.

  • Random walk (RW-BOOST): here we developed a purely random search over the feature space, controlled by the MaxTrials parameter (number of iterations and features); the best features found are then refined by successive mutations.

  • Particle swarm optimization (PSO-BOOST): for this algorithm, we developed and implemented a new technique for exploring the search space using bio-inspired intelligent methods.

  • Hybrid (H-BOOST): it combines both genetic algorithms and PSO in a global model.

  • Full search (FS): it performs an exhaustive exploration, via Adaboost, of all possible rectangular and control points features, at different scales of the input image. It is the most computationally expensive.

Each method was evaluated with respect to its intrinsic parameters (tuning), the processing time during learning, and ROC and Precision/Recall scores for validation. They are presented in the same order in the next subsections.
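
For reference, these validation scores can be computed with scikit-learn as in the short sketch below; this illustrates the protocol and is not the code used in our experiments.

```python
from sklearn.metrics import auc, precision_recall_curve, roc_curve

def evaluate_detector(y_true, scores):
    """ROC curve, its AUC, and the Precision/Recall curve, computed from
    ground-truth labels and the detector's real-valued scores."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    precision, recall, _ = precision_recall_curve(y_true, scores)
    return fpr, tpr, auc(fpr, tpr), precision, recall
```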

4.1 Genetic algorithms and evolutionary hill climbing-boosting classifiers: Evo-HC-BOOST

The objective here is to select, at each iteration of Adaboost, a simple weak learner that classifies better than chance for a given distribution of examples. The selection is made using genetic algorithms, whose common characteristic is the manipulation of an artificial population whose members are elements of the search space (potential solutions to the problem). Figure 9 illustrates the flow chart of a GA. The evolution of the population is simulated by a set of operations repeated in a loop, called generations, until the optimization converges. If the parameters are correctly calibrated, then convergence to a solution can be obtained within a small number of iterations.

Fig. 9 Genetic algorithm simple flow chart

As explained in the pseudo-code of Algorithm 2, the hill-climbing concept consists in allowing a succession of mutations as long as the learning error decreases.

Algorithm 2 Evo-HC pseudo-code

The algorithm starts by generating, for a given number of generations, a random population of features on the input image. The error \(\epsilon (h_i^g)\) is calculated for each feature, and the best features, those with low error, are selected for mutation. The mutation operator aims at diversifying the population and avoiding exploration around a local minimum. The most important parameter in Evo-HC, however, is the hill climbing parameter (noImpr): conversely to gradient descent, it allows the search space to be explored even when the error temporarily increases.

The \(Evo\text{-}HC\) algorithm has four important parameters: \(N_{generation}\), the number of generations of the genetic algorithm; \(N_{best}\), the number of best selected features; \(N_{mut}\), the number of mutations applied to the \(N_{best}\) visual features; and \(N_{Rand}\), the number of randomly added features. Our goal here is to evaluate the impact of each of these parameters on the Evo-HC algorithm and to find the best tuning with respect to the detection rate on our car learning database. Each time, we varied one parameter while keeping the remaining parameters fixed, and then computed the ROC (Receiver Operating Characteristic) and Precision/Recall curves, as well as the learning processing time. Results are illustrated in Sect. 5.1.
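
The following Python sketch summarizes one Evo-HC weak-learner round as we read Algorithm 2; the callables (eval_error, random_feature, mutate), the default values, and the exact acceptance rule around noImpr are illustrative assumptions rather than the original implementation.

```python
def evo_hc_weak_learner(eval_error, random_feature, mutate,
                        n_gen=30, n_best=30, n_mut=40, n_rand=40,
                        pop_size=100, no_impr=5):
    """One Evo-HC round: evolve a population of visual features and
    return the feature with the lowest weighted classification error."""
    pop = [random_feature() for _ in range(pop_size)]
    for _ in range(n_gen):
        pop.sort(key=eval_error)                 # lowest error first
        best = pop[:n_best]
        children = []
        for f in best:
            cur, cur_err, stalls = f, eval_error(f), 0
            for _ in range(n_mut):               # hill climbing on f
                cand = mutate(cur)
                cand_err = eval_error(cand)
                if cand_err < cur_err:
                    cur, cur_err, stalls = cand, cand_err, 0
                else:
                    stalls += 1
                    if stalls >= no_impr:        # tolerate a few failures
                        break
            children.append(cur)
        # next generation: best parents, their refined children, new blood
        pop = best + children + [random_feature() for _ in range(n_rand)]
    return min(pop, key=eval_error)
```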

4.2 Random walk-boosting classifiers: RW-BOOST

The random search (or select and refine) weak learner is the second algorithm combined with Adaboost in our study for object detection purposes. As its name suggests, its general principle is to generate a huge number of random features (MaxTrials), from which the best are retained; a succession of mutations (MaxMut) is then applied only to these selected features. Hence, this algorithm has two parameters that we would like to evaluate: MaxTrials and MaxMut. Algorithm 3 gives an overview of the RW-BOOST pseudo-code, and a minimal sketch follows it below. Following the same evaluation scheme, we show the impact of each parameter in Sect. 5.2.

Algorithm 3 RW-BOOST pseudo-code
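
A minimal sketch of this select-and-refine scheme, with the number of retained features (n_keep) and the callables as our own illustrative assumptions:

```python
def rw_weak_learner(eval_error, random_feature, mutate,
                    max_trials=30000, max_mut=3000, n_keep=10):
    """Random walk weak learner: sample max_trials random features,
    keep the best n_keep, then refine each one with max_mut greedy
    mutations, keeping a mutant only when it lowers the error."""
    trials = [random_feature() for _ in range(max_trials)]
    kept = sorted(trials, key=eval_error)[:n_keep]
    refined = []
    for f in kept:
        cur, cur_err = f, eval_error(f)
        for _ in range(max_mut):
            cand = mutate(cur)
            cand_err = eval_error(cand)
            if cand_err < cur_err:               # keep only improvements
                cur, cur_err = cand, cand_err
        refined.append(cur)
    return min(refined, key=eval_error)
```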

4.3 Particle swarm optimization-boosting classifiers: PSO-BOOST

In this subsection, we introduce a novel weak learner that we have developed and implemented within the boosting paradigm, and which belongs to the family of metaheuristic techniques. More precisely, it is based on the particle swarm optimization (PSO) family of algorithms (Pai and Michel 2017; Tavares et al. 2017).


Definitions Metaheuristics are a family of optimization algorithms aimed at solving difficult problems arising from operational research, for which no conventional method is known to be more effective. Having appeared in the early 1980s, metaheuristics are generally iterative stochastic algorithms that progress towards an optimum by sampling an objective function. Metaheuristic methods try to find the global optimum of a difficult optimization problem (with possible discontinuities) without being trapped by local optima. They manipulate one or more solutions while searching for the optimum, i.e., the best solution to the problem. The successive iterations should make it possible to pass from a solution of poor quality to the optimal solution. Usually, the algorithm stops after reaching a stop criterion, such as a maximum execution time or a requested precision.


Characteristics Each metaheuristic algorithm has a set of characteristics:

  • They do not necessarily require special knowledge of the optimized problem to operate.

  • They are not necessarily reserved for problems that cannot be optimized by mathematical methods.

  • They are often used in combinatorial optimization.

In general, metaheuristics articulate around three main notions:

  • Diversification: It refers to the process of gathering information about the optimized problem.

  • Intensification/exploitation: it aims at using the information already gathered to define and navigate the areas of interest of the search space.

  • Memory and learning: It is the support of learning, which allows the algorithm to take into account only the areas where the global optimum is likely to be, thus avoiding the local optima.

Metaheuristics progress iteratively, alternating at each step between phases of intensification, diversification and learning, or combining these notions more closely. The initial state is often chosen randomly, and the algorithm then proceeds until a stop criterion is reached. Figure 10 illustrates the processes of diversification and intensification.

Fig. 10 The diversification (a) and intensification (b) notions of a metaheuristic algorithm

Metaheuristics are often inspired by natural systems, whether drawn from physics (simulated annealing) or from biology (as with ant colony algorithms or particle swarm optimization).


Swarm intelligence Particle swarm optimization (PSO) is an optimization metaheuristic invented by Russell Eberhart (electrical engineer) and James Kennedy (socio-psychologist) in 1995. The guiding idea of the authors was the simulation of the collective behavior of birds within a flock (Kennedy et al. 2001). Initially, J. Kennedy and R. Eberhart sought to simulate the ability of birds to fly synchronously and to change direction abruptly while remaining in optimal formation. The model they proposed was then extended into a simple and efficient optimization algorithm. The particles are individuals that move through the hyperspace of search. The exploration process is based on two rules: (i) each particle has a memory that allows it to store the best point through which it has already passed, and it tends to return to this point (called Pbest, for personal best); (ii) each particle is informed of the best point known within its neighborhood, and it also tends to move towards this group best point (Gbest).


Neighborhood The neighborhood of the particles defines the structure of the biological social network: the particles within a neighborhood communicate with each other. Different neighborhoods have been studied in the literature.

  • Star topology (Fig. 11, middle): the social network is complete; each particle is attracted to the best group particle, noted gbest, and communicates with all the others. This is the topology we have chosen for our PSO-BOOST weak learner.

  • Ring topology (Fig. 11, left): each particle communicates with its n immediate neighbors and tends to move towards the best position in its local neighborhood, called lbest.

  • Radius topology (Fig. 11, right): a central particle is connected to all the others. Only this central particle adjusts its position towards the best; if this yields an improvement, the information is propagated to the others.

Fig. 11 The different topologies of particle swarm algorithms. Left: ring topology; middle: star topology; right: radius topology

PSO algorithm Each particle represents a potential solution in the feature search space. The new position of a particle is determined according to its own value and those of its neighbors. Let \(\overrightarrow{x_i(t)}\) be the current position of a particle \(P_i\) at time t. This position is modified by adding a velocity vector \(\overrightarrow{v_i(t)}\) to its previous position: \(\overrightarrow{x_i(t)} = \overrightarrow{x_i(t-1)}+\overrightarrow{v_i(t)}\). It is the velocity vector that directs the search process and reflects the “sociability” of the particles. If N particles are considered and each particle compares its new position to the best position it has obtained, this gives the simple PSO algorithm shown as Algorithm 4, where Fitness is the fitness function.

Algorithm 4 Simple PSO pseudo-code

This first algorithm does not take into account the neighborhood structure and the topology of the swarm network, since it only uses the improvement obtained by the particle itself. If we consider a star neighborhood, Algorithm 4 becomes Algorithm 5. This is the version we implemented and tested with Adaboost for object detection.

Algorithm 5 PSO pseudo-code with a star (gbest) neighborhood

The algorithm starts by generating N particles at random positions \(\overrightarrow{x_i}\) in the search space. Each particle (a visual feature in our case) is then evaluated through a fitness function. At each step, the best previous personal fitness as well as the group best fitness are memorized. The farther a particle is from the best position in the swarm, the greater the variation of its speed towards the best solutions; the velocity is typically updated as \(\overrightarrow{v_i(t)} = \overrightarrow{v_i(t-1)} + \rho _1 \,(\overrightarrow{p_i} - \overrightarrow{x_i(t-1)}) + \rho _2 \,(\overrightarrow{g} - \overrightarrow{x_i(t-1)})\), where \(\overrightarrow{p_i}\) is the personal best and \(\overrightarrow{g}\) the group best. The random variables \(\rho _1\) and \(\rho _2\) represent the compromise between the inertia of the particle, its nostalgia, and its imitation of the others. They are chosen rigorously to ensure convergence and can be set as \(\rho _1=r_1 c_1\), \(\rho _2=r_2 c_2\), where \(r_1\) and \(r_2\) follow a uniform distribution in [0,1] and \(c_1\), \(c_2\) are constants representing a positive acceleration. The algorithm executes as long as a convergence criterion has not been reached. In our case, this can happen in three ways (a minimal sketch is given after the list below):

  • Reaching a certain number of iterations, called in our case Epoch.

  • The fitness (optimization function) reaches an acceptable value.

  • The velocity variation tends to 0.
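
The sketch below illustrates a star-topology (gbest) PSO in Python with exactly these three stop criteria; the continuous encoding of the features, the dimension dim, and the default constants are our assumptions for illustration, not the paper's implementation of Algorithm 5.

```python
import numpy as np

def pso_weak_learner(fitness, dim, pop_size=60, epochs=100,
                     c1=2.0, c2=2.0, target=None, tol=1e-6):
    """Star-topology PSO minimizing fitness(x) over a dim-dimensional
    encoding of a visual feature."""
    rng = np.random.default_rng()
    x = rng.uniform(-1.0, 1.0, (pop_size, dim))   # random initial positions
    v = np.zeros((pop_size, dim))                 # initial velocities
    pbest = x.copy()                              # personal bests (Pbest)
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()          # group best (Gbest)
    for _ in range(epochs):                       # stop 1: Epoch reached
        r1 = rng.random((pop_size, 1))            # rho_1 = r1 * c1
        r2 = rng.random((pop_size, 1))            # rho_2 = r2 * c2
        v = v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f                    # update personal bests
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmin(pbest_f)].copy()      # update group best
        if target is not None and pbest_f.min() <= target:
            break                                 # stop 2: acceptable fitness
        if np.abs(v).max() < tol:
            break                                 # stop 3: velocity tends to 0
    return g, pbest_f.min()
```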

Implementation In this paper, each particle represents a visual feature (control point or rectangular feature). As reported above, we followed a star topology, and the different parameters that we evaluate here are:

  • The number of weak learners.

  • The number of iterations of the algorithm (Epoch).

  • The initial number of randomly generated particles (Popsize).

  • Features’ resolutions in the images (full, half, quarter).

The evaluation protocol is the same as for the previous methods. Results are presented in Sect. 5.3.

5 Experimental evaluation

We present here the evaluation results for all the previously described algorithms. Recall that one important parameter of Adaboost is the number of weak learners generated at the end of the learning process: the larger the number of weak learners, the higher the detection rate. This observation is illustrated in Figs. 12 and 13, where the first figure shows decreasing error rates with an increasing number of weak learners, and the second shows increasing detection rates with a high number of weak learners. After a series of experiments, we obtained good results with 500 weak learners for the majority of the algorithms.

Fig. 12 Variation of the training and validation error when varying the number of weak learners: the more the number of weak learners increases, the more the error decreases

Fig. 13 The larger the number of weak learners, the higher the detection rate

5.1 Evo-HC-BOOST evaluation

Here we present the evaluation of the Evo-HC-BOOST algorithm. Table 1 lists the values of the different parameters chosen during the tests, in a way that allows each parameter to be explored separately.

The first observation we can make concerns the processing time. Greater values are observed when the product \(N_{best}\times N_{mut}\) increases. For example, the couples (45,80), (45,50), (30,100) generated about 58, 41 and 31 h respectively. Obviously, large numbers of successive mutation operations on the best features are very time consuming (see Fig. 14 for more details).

Table 1 Evo-HC-BOOST parameters evaluation. Combinations are ranked by the time consumed during the learning process
Fig. 14 Variation of the processing time (in hours) during the learning of EVO-HC-BOOST. Large time values are observed with greater numbers of mutations applied to the best features

To go deeper in our analysis, we evaluated the classifier of each combination against a ground-truth database. Figure 15 depicts the variation of the precision/recall curves for the previously used parameters of Evo-HC-BOOST. For instance, we can see that the good classifiers were generated using large mutation values (e.g., the \(EVO\text{-}HC\_45best25rand80mut50gen\) classifier). As a further analysis, we elaborate evaluations on each parameter separately, in order to obtain a good calibration of Evo-HC-BOOST.

Fig. 15 Precision/Recall curves for different parameters of the EVO-HC-BOOST algorithm


Analyzing the effect of the mutation operator Here we evaluate the effect of the mutations applied to the visual features. More precisely, the implemented mutation operator can either move a feature on the image within a specific radius (from 1 to 5 pixels), randomly add a feature at a random position in the image, delete an existing feature, change the resolution of the image, or change the gray level of the image (see Fig. 16). We fixed the remaining parameters of EVO-HC-BOOST and modified only \(N_{mut}\), with values in (10, 20, 40 and 100), as illustrated in Table 2. For each tuple of values we calculated the AUC (area under the curve) against the ground truth. Firstly, we can confirm the past observations about processing time, which rises with large numbers of mutations. Secondly, we observed that increasing \(N_{mut}\) from 10 to 40 increased the AUC of the generated weak learners, with a maximum value of 0.95 observed for \(N_{mut}=40\). However, we also observed that for \(N_{mut}\ge 100\) the AUC starts diminishing, which could be an over-fitting effect or entrapment in a local minimum. Consequently, it is not recommended to increase \(N_{mut}\) indefinitely, since on the one hand it is very time consuming, and on the other hand it can lead to over-fitting.

Table 2 Evaluation of the mutation parameter of EVO-HC-BOOST
Fig. 16 Example of some mutation operators with control point features in the EVO-HC-BOOST algorithm


Analyzing the effect of the random operator Here we evaluate the effect of the random operator applied to the visual features. This parameter represents the number of individuals (visual features) randomly added by the genetic algorithm during training. It is very important since it allows the search space to be expanded and local optima to be avoided.

In a similar way, we varied the values of \(N_{rand}\), fixed the rest of the parameters, and calculated the AUC of each tuple, as illustrated in Table 3. The first observation concerns the processing time, which varies in a nonlinear way. Secondly, the real impact of this parameter on the AUC is unpredictable for the tested detectors, since for \(N_{rand}=0\) and 50 the AUC is practically the same, which is hard to interpret.

Table 3 Evaluation of the random parameter of EVO-HC-BOOST

Analyzing the effect of the number of generations Here we evaluate the effect of the number of generations. This parameter represents the stop criterion of the algorithm, that is to say, the number of successive generations without a reduction of the minimum error of the population. Keeping the same principle for the other parameters, we performed the runs depicted in Table 4. The first remark concerns the processing time, which is exponentially related to the number of generations. Secondly, large numbers of generations allow high AUC values to be reached. Like \(N_{mut}\), \(N_{gen}\) seems to be a good parameter for the global exploration of the search space.

Table 4 Evaluation of the number of generations parameter of EVO-HC-BOOST

Discussions Throughout these experiments on the EVO-HC algorithm, we obtained some intermediate results on the effect of each parameter:

  • For the \(N_{rand}\): it seems useless to choose a large value.

  • For \(N_{mut}\): it seems very effective, at least for values between 40 and 60.

  • For \(N_{gen}\): it also seems effective when exceeding 20 generations.

It seems, however, that none of these varied parameters has an absolutely systematic impact. This may be an effect hidden by the random variations during training, and/or by the over-fitting problem. It may also be necessary to vary certain parameters together (for example, changing \(N_{mut}\) and \(N_{gen}\) at the same time). Despite these difficulties, we have generated very good detectors, with up to \(95\%\) detection rate. These results are visible on the ROC curves of Fig. 17, which show the new best combination detectors: \((N_{best}\), \(N_{rand}\), \(N_{mut}\), \(N_{gen})= (30,40,40,30)\) and (45, 25, 50, 50).

Fig. 17 Best combinations of parameters during the training of the EVO-HC-BOOST algorithm

5.2 RW-BOOST evaluation

There are two parameters of RW-BOOST that we want to evaluate, namely MaxTrials, the number of iterations, and MaxMut, the number of mutations. Table 5 gives the series of value couples associated with these parameters.


Analyzing the effect of the MaxMut parameter As in the previous algorithm, increasing the number of mutations MaxMut from 3000 to 100,000 tends to increase the training time. The second observation concerns the performance of the generated classifiers. Indeed, as illustrated in Fig. 18, increasing MaxMut directly impacts the detection rate, as reflected in the ROC curves, where the classifier with MaxTrials \(=\) 300,000 and MaxMut \(=\) 100,000 obtained the best detection rate.

Table 5 RW-BOOST parameters evaluation
Fig. 18 Analyzing the impact of the MaxMut parameter through ROC curves during the training of the RW-BOOST algorithm. The MaxTrials parameter is fixed at 300,000 whereas MaxMut varies from 3000 to 100,000


Analyzing the effect of the MaxTrials parameter Reciprocally, to analyze the effect of the MaxTrials parameter, we fixed MaxMut at each step and varied MaxTrials from 30,000 to 300,000. Concerning the impact of this parameter on the learning time, Fig. 19 shows that it is quite considerable (exponential).

Concerning the impact of MaxTrials on the detection rate, it can be observed in Fig. 20 that the ROC curves overlap for a fixed value of \(MaxMut=3000\). Typically, this can be explained as over-fitting during the learning process.

Fig. 19 Exponential training time variation observed when varying the number of MaxTrials (from 30,000 to 300,000) of the RW-BOOST algorithm

Fig. 20 Analyzing the impact of the MaxTrials parameter through ROC curves during the training of the RW-BOOST algorithm. The MaxMut parameter is fixed at 3000 whereas MaxTrials varies from 30,000 to 300,000


Discussions To finalize the study of the random search weak learner, a small comparison with the results obtained in the previous section with EVO-HC-BOOST was made. The results are presented in the precision/recall curves of Fig. 21. They show that even for high values of MaxTrials (300,000) and MaxMut (100,000), with 25 h 46 min of learning time, the detection performance remains very low compared with that obtained with EVO-HC. We can see the considerable discrepancy between the best classifier obtained with EVO-HC (45best25rand50mut50gen) and those of the random search.

Fig. 21 Comparison between EVO-HC-BOOST and RW-BOOST

5.3 PSO-BOOST evaluation

Analyzing the effect of the number of weak learners Evaluating this algorithm first involves checking the impact of the number of weak learners on the global detection rate. Figure 22 illustrates detectors with 100, 200, and 500 weak learners. We can observe that the latter is the most efficient classifier in terms of true positives on the ROC curve. For information, the average convergence time of all the PSO-BOOST classifiers tested here was about 1 h 30 min, which is fast compared to the previous methods.

Fig. 22 ROC curves for multiple numbers of weak learners in the PSO-BOOST algorithm. With a high number of weak learners, the system generates a good classifier


Analyzing the effect of the swarm population size To evaluate the swarm population, we varied Popsize over the values 40, 45, and 60. The ROC curves in Fig. 23 show that Popsize\(\,=\,\)60 gives the best classification rate. Consequently, with a large swarm population the search space is explored more efficiently.

Fig. 23 ROC curves for different swarm population sizes in the PSO-BOOST algorithm. With a large Popsize the system generates a good classifier


Comparison between Evo-HC, RW and PSO-BOOST To strengthen our analysis, we compared the three preceding algorithms: EVO-HC, RW and PSO-BOOST. For each algorithm we kept the best combination of parameters discovered and discussed previously. The resulting ROC curves of this comparison are given in Fig. 24. We can see that PSO-BOOST outperforms the rest of the classifiers.

Fig. 24 ROC curves of the comparison between the EVO-HC-, RW- and PSO-BOOST algorithms. Results show that the swarm-based detector outperforms both the genetic-based and random walk algorithms


Discussions In the three algorithms discussed and evaluated above, one of the key differences resides in the mechanism for generating a new population of solutions via the modification of the solutions obtained from past populations. The three described methods generate a population of visual features with a different equilibrium between intensification and diversification. This dynamic behavior of the population can be deduced from the basic perturbation method used in the creation of new solutions. These facts were observed in the preceding results, which showed the good performance of PSO compared to RW and EVO-HC. This can be interpreted by the intelligent way in which the swarm particles explore the search space, instead of a “random walk”. Moreover, the drawback of EVO-HC and RW was their expensive computational cost compared to PSO. This work confirms results previously claimed in other optimization domains, in which PSO showed better effectiveness (finding the true global optimal solution) than GA, with significantly better computational efficiency (Sidhartha and Padhy 2008). Motivated by these results, we implemented a hybrid combination of PSO and EVO-HC by modifying line 17 of Algorithm 5, applying the previous mutation operator of EVO-HC to each best obtained particle \(\overrightarrow{x_i}\). Our intuitive hypothesis is that the global optimum could lie in the neighborhood of each best particle \(\overrightarrow{x_i}\). The results of H-BOOST (hybrid boosting) are presented in Fig. 25. They show a slight improvement of the detection rate compared to the previously best PSO-BOOST classifier. Further exploration of H-BOOST could yield a better detector than the PSO and EVO-HC algorithms.
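
For illustration, this hybridization can be sketched as an extra local-search step inserted into the PSO loop, right after the personal bests are updated; the number of mutations per particle (n_mut) and the callables are illustrative assumptions under our reading of the modified Algorithm 5.

```python
def hybrid_refine_pbest(pbest, pbest_f, fitness, mutate, n_mut=5):
    """H-BOOST step: apply the Evo-HC mutation operator to each personal
    best particle, keeping a mutant only when it improves the fitness."""
    for i in range(len(pbest)):
        for _ in range(n_mut):
            cand = mutate(pbest[i])
            cand_f = fitness(cand)
            if cand_f < pbest_f[i]:               # local search around pbest
                pbest[i], pbest_f[i] = cand, cand_f
    return pbest, pbest_f
```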

Fig. 25 Comparison of the ROC curves obtained with the PSO-BOOST and H-BOOST (hybrid) algorithms

6 Discussions and conclusion

In this paper we have presented new boosting classifiers implemented with the Adaboost algorithm, in which the weak learners are obtained using challenging metaheuristic methods, for the purpose of object detection and tracking. Firstly, we fine-tuned EVO-HC, an existing Adaboost classifier based on genetic algorithms, by identifying its best parameters. Genetic algorithms have been popular due to their intuitiveness, ease of implementation, and ability to effectively solve highly non-linear, mixed-integer optimization problems. Secondly, we proposed a new classifier based on particle swarm optimization; particle swarms are inspired by the swarming or collaborative behavior of biological populations. The evaluation showed good results compared to the classifier based on genetic algorithms. We then proposed a third classifier, which combines both swarm intelligence and genetic algorithms, and it obtained the best classification rate among the studied techniques. All these approaches are supposed to find a solution to a given objective function but employ different strategies and computational effort; it was therefore appropriate to compare their performance in terms of detection rate using boosting classification. In fact, all the techniques were rigorously evaluated in terms of processing time, detection rates (ROC and Precision/Recall curves), and parameter calibration.

The proposed framework detects objects in new images explicitly faster than the previous approaches.

As perspectives for this work, we would like to compare the boosting-based classifiers with other machine learning techniques such as deep learning (Goodfellow et al. 2016). Indeed, the same visual features can be considered as convolution filters, and multi-resolution analysis can be implemented in the neural networks. In a complementary way, we would like to evaluate the proposed algorithms in a fully parallel way on big data platforms. Indeed, a parallel Adaboost implementation (BoostingPL) has recently been published in Weka (Indranil and Reddy 2012). Our aim is to check whether the learning process can be improved or not.