Abstract
Image analysis is a prolific field of research which has been broadly studied in the last decades and successfully applied to a great number of disciplines. Since the advent of Big Data, the number of digital images has been growing explosively, and a large amount of multimedia data is publicly available. Not only is it necessary to deal with this increasing number of images, but also to know which features to extract from them, and feature selection can help in this scenario. The goal of this paper is to survey the most recent feature selection methods developed and/or applied to image analysis, covering the most popular fields such as image classification, image segmentation, etc. Finally, an experimental evaluation on several popular datasets using well-known feature selection methods is presented, bearing in mind that the aim is not to identify the best feature selection method, but to facilitate comparative studies for the research community.
1 Introduction
One of the most prolific fields of research is image analysis, which has been broadly studied in recent years. Part of its success is due to the fact that it can be applied to different disciplines with satisfactory results. Digital images contain a huge amount of information, which is used for different purposes and extracted accordingly. For example, the manufacturing system of an aircraft factory needs to discard pieces that present imperfections, so the pieces could be categorized as defective or non-defective. Another example could be a collection of images of the Earth acquired from a satellite, which need to be partitioned into several regions, such as water, forest, etc.
Different techniques have been developed to automatically analyze images, not only for general purposes (Haralick et al. 1973; Makadia et al. 2008) but also for very specific problems (Chen et al. 1998; Núñez and Llacer 2003). They usually focus on the image processing side, i.e. on the extraction of relevant image properties, which may include shape, color, texture, spatial information, etc. However, the availability of a large number of images for most real-life problems, together with the high dimensionality of the data extracted from certain types of images, makes the use of feature selection methods necessary.
Feature selection (FS) is a popular preprocessing step that aims at identifying the relevant features while removing the redundant and irrelevant ones. In this way, it is possible to reduce the dimensionality of the data without degrading performance (and in some cases even improving it) (Guyon et al. 2006). These characteristics make FS an active research area, widely applied to real problems. Most of these problems belong to classification, although FS can also be applied to other tasks such as regression.
FS can help in image analysis, and in fact it has gained importance in the last few years, not only to reduce the input dimensionality, but also to alleviate the computational burden of extracting information from the images (Remeseiro et al. 2014) or to understand the causes of disagreement among experts when analyzing images for diagnosing diseases (Bolón-Canedo et al. 2015a).
This work aims to offer the reader a broad review of the existing FS methods that have been applied (and in some cases developed ad hoc) in the last few years to help in image analysis. Moreover, we explain the different categories in this field and the datasets used. We have also designed an experimental study to test the effectiveness of FS on image datasets. We use four popular datasets, some of them binary and others with up to 26 classes, covering image classification and segmentation. We applied four state-of-the-art FS methods combined with five popular classifiers, leading to interesting conclusions and guidelines.
The rest of this manuscript is structured as follows. Section 2 is devoted to providing some necessary background about image analysis, especially for those machine learning experts who are not familiar with certain terms. Section 3 describes some basic concepts related to FS, and some popular techniques. Recent works of image analysis which benefit from FS are presented in Sect. 4. Some image datasets and repositories can be found in Sect. 5, and the experimental study is presented in Sect. 6. Finally, Sect. 7 concludes the manuscript.
2 Image analysis
Digital imaging has revolutionized the processes of image acquisition, storage and transmission. The availability of digital images along with the Internet has noticeably increased the size of digital image collections (Liu et al. 2007). As a result, many general-purpose imaging techniques have been developed over the last decades.
When talking about digital image processing, it is typical to distinguish three types of techniques (Gonzalez and Woods 2008): low-level, mid-level, and high-level. Low-level approaches work at the pixel/voxel level, using primitive operators, with images as both input and output data. Mid-level methods include more elaborate tasks with images as input data, whilst the output data can be a set of characteristics/descriptors derived from the images. High-level techniques concern the interpretation of the image content, trying to emulate the essential functionality of the human vision system. These three types of processes can be associated, respectively, with three different areas: image processing, image analysis, and computer vision (a branch of artificial intelligence). This paper is focused on the field of image analysis, also known as image understanding.
2.1 Terms
There are many different methods to automatically analyze images, each of them useful for a particular task or range of tasks. Therefore, image analysis techniques can be grouped into different fields, among which there are no clear-cut boundaries. After reviewing the most up-to-date methods for FS applied to image analysis, four of these fields have been considered in this review; they are described next.
2.1.1 Image classification
It consists of classifying an image, or a region of it, into one of a set of classes according to the characteristics it contains (see Fig. 1). This categorization can be performed using different approaches: template matching, which compares the image/region with a pattern; tree search algorithms; or more complex classifiers that use a set of features. Note that digital image classification can be viewed as the fundamental prerequisite for other image analysis pursuits (Ng et al. 2007).
2.1.2 Image segmentation
Its goal is to detect boundaries and objects in digital images, and it can be defined as the identification of homogeneous regions (see Fig. 2). That is, an image is divided into multiple regions so that each one is homogeneous according to some characteristic, whilst the union of any two or more segmented areas is not (Cheng et al. 2001).
2.1.3 Image annotation
It consists of automatically assigning keywords or labels to a digital image (see Fig. 3). The idea is to extract some low-level features from the image and then use a classifier to annotate it with concept labels (Zhang et al. 2012a). Therefore, image annotation can also be seen as a multi-class image classification task with a large number of class labels. Note that automatic image annotation can be used as part of an image retrieval process with the aim of locating images in a database.
2.1.4 Image retrieval
Its objective is to search for and retrieve images of interest from a database (see Fig. 4). Two different frameworks can be found in the literature (Liu et al. 2007): one is content-based, i.e. images are retrieved based on their visual content; and the other one is text-based, i.e. images are retrieved based on manually annotated text descriptors. In the second case, image retrieval is closely related to image annotation, as previously stated.
2.2 First attempts
Image analysis techniques began to address different problems several decades ago, and a very large number of approaches can be found in the literature regarding the four fields previously presented. Image classification has been used in different problems with a great range of image features, such as spectral and spatial information (Landgrebe 1980) or texture (Haralick et al. 1973), and by means of several algorithms, such as neural networks (Lee et al. 1990) or decision trees (Mui and Fu 1980). Similarly, different properties have been considered to perform image segmentation, including color (Lim and Lee 1990), texture (du Buf et al. 1990), or even a combination of them (Shafarenko et al. 1997). Regarding image annotation, texture information has also been commonly used (Picard and Minka 1995), as well as ontology-based approaches (Schreiber et al. 2001). Finally, image retrieval techniques are also based on properties obtained from image content, including color and shape (Jain and Vailaya 1996).
2.3 Related works
There are other surveys about image analysis techniques although, to the best of our knowledge, none of them has been devoted specifically to analyzing the existing works on FS in this research field. Lu and Weng (2007) presented a review of image classification techniques aimed at improving their performance, and discussed important issues that might affect it. Zhang et al. (2012a) described the latest developments in image retrieval and provided a complete review of automatic image annotation methods. In 2008, Datta et al. surveyed a large number of relevant contributions, both theoretical and empirical, regarding the automatic processes of image retrieval and annotation. Within the field of image segmentation, there are several works that review the most popular techniques, such as Zhang et al. (2008), which is focused on unsupervised methods, or Raut et al. (2009), devoted to prediction.
3 Feature selection
As mentioned in the Introduction, the advent of Big Data problems, especially those containing a high number of features, has created an important need for methods that can effectively reduce the dimensionality of the data. There are two main approaches to this problem: feature extraction and feature selection (FS). Basically, the main difference between them is that FS methods select a subset of the original features, while feature extraction methods reduce the dimensionality by combining the existing features to create new ones (see Fig. 5). The figure shows an example in which we are trying to guess the race of a given person. An FS system would select those features that can help to determine the race, such as height, hair color, eye color, skin color or name. In contrast, a feature extraction system would generate, for example, two new features, distinct from the original ones, that are likely to combine the information contained in the features selected by the FS system; in this way, however, the original meaning of the features is lost.
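To make the distinction concrete, the following Python/NumPy sketch (an illustration of our own, not taken from the surveyed works) contrasts the two approaches on random data: feature selection keeps a subset of the original columns unchanged, whereas a PCA-style feature extraction produces new columns, each a linear combination of all the original ones. The data and the chosen column indices are arbitrary assumptions.

```python
import numpy as np

def select_features(X, indices):
    """Feature selection: keep a subset of the original columns, unchanged."""
    return X[:, indices]

def pca_extract(X, n_components):
    """Feature extraction: project the centred data onto its top principal axes.

    Each new column mixes *all* original features, so its meaning is lost.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # 100 samples described by 5 original features
X_sel = select_features(X, [0, 3])   # features 0 and 3 kept as they are
X_pca = pca_extract(X, 2)            # two new, synthetic features
print(X_sel.shape, X_pca.shape)      # (100, 2) (100, 2)
```

Both outputs have the same shape, but only the columns of `X_sel` can still be read as the original measurements.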
Another example, more closely related to image analysis, can be seen in Fig. 6. In this case, the left side shows how the pixels (features) relevant for distinguishing between the digits four and nine are selected by an FS method and marked as blue dots, while the right side shows the two principal components obtained after a feature extraction process.
Both techniques have their merits and demerits (Zhao and Liu 2011). On the one hand, feature extraction methods have the power to generate a new set of features, which are usually more compact and of stronger discriminating power. This approach is the preferred one in applications in which model accuracy is very important, at the expense of interpretability. Examples of this type of applications are information retrieval, image analysis or signal processing. On the other hand, FS methods maintain the original features, so they are usually more adequate for data mining problems, such as text mining or genetic analysis, in which the original features are extremely important for model interpretability and knowledge extraction. Moreover, they offer the possibility of gaining speed and reducing costs (because in the future, the non-relevant features do not need to be considered).
Feature extraction is the preferred approach when dealing with image analysis (Thomaz and Giraldi 2010; Remeseiro et al. 2013; Zhao and Du 2016; Yao et al. 2018), and therefore there are plenty of works analyzing the application of these methods to this field (Weinberger and Saul 2006; Guo et al. 2008; Maaten and Hinton 2008; Juan and Gwun 2009; Patil and Mudengudi 2011). However, although not so common, there are also image analysis works that employ FS methods as a preprocessing step, and this article is focused on reviewing them.
Existing FS methods are typically classified into three approaches: filters, wrappers and embedded methods (Guyon et al. 2006; Bolón-Canedo et al. 2015b). Filters rely only on the general characteristics of the data, with no influence of any classification model. For example, there is a huge number of filters based on information theory that use mutual information as a measure of feature relevance (Brown et al. 2012). Many mutual information feature selection methods have been proposed in the last 25 years (Vergara and Estévez 2014), and new information-theoretic feature selection methods are still being proposed every year. Recent examples are a class-specific mutual information variation method for feature selection (Gao et al. 2018a) and another method that integrates two groups of feature evaluation criteria (Gao et al. 2018b), both showing very competitive results.

Unlike filters, wrappers and embedded methods build the subset of relevant features based on a learning algorithm: the former let the prediction model score the quality of the features for the prediction task, whereas the latter perform the selection of the features during the training process of a classifier. Given this relationship with the learner, wrapper and embedded methods are usually more accurate in their selection, but also more costly in computational terms than filters. Moreover, wrappers are known to risk over-fitting when the number of features is greater than the number of samples (Loughrey and Cunningham 2005), so they should be avoided in that situation. One of the most popular embedded methods is SVM-RFE (Recursive Feature Elimination for Support Vector Machines) (Guyon et al. 2002), which computes the importance of the features while training an SVM. As for wrappers, they basically consist of a learning algorithm (such as an SVM, decision tree, neural network, etc.) combined with a search strategy such as forward selection or backward elimination.
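A wrapper can be sketched in a few lines of Python/NumPy. In this minimal sketch, under our own assumptions, the learning algorithm is a 1-nearest-neighbour classifier scored by leave-one-out accuracy and the search strategy is sequential forward selection; the function names and the toy data are hypothetical. Because every candidate subset requires a new evaluation of the classifier, the cost grows quickly with the number of features, which illustrates why wrappers are more expensive than filters.

```python
import numpy as np

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    # Pairwise squared Euclidean distances between all samples.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)          # a sample may not be its own neighbour
    nearest = d.argmin(axis=1)
    return float((y[nearest] == y).mean())

def sequential_forward_selection(X, y, k, score=loo_1nn_accuracy):
    """Greedily add the feature whose inclusion gives the best wrapper score."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda j: score(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 4))
X[:, 2] += 4 * y                 # only feature 2 carries class information
print(sequential_forward_selection(X, y, k=2))   # feature 2 is expected first
```

Backward elimination would follow the same pattern, starting from the full set and greedily removing the feature whose removal hurts the score least.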
Each year, new FS methods belonging to the three categories mentioned above appear. However, this abundance of FS algorithms has not facilitated the choice of a particular method in a given situation; quite the opposite. Nevertheless, despite the large number of FS methods available, some of them have stood out and become very popular among researchers. Some of the most popular ones are described below.
Correlation-based feature selection (CFS) (Hall 1999) is a multivariate filter that chooses subsets of attributes that are uncorrelated among themselves but highly correlated with the class.
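CFS scores a subset S of k features with a merit heuristic, Merit(S) = k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature inter-correlation; the numerator rewards relevance and the denominator penalizes redundancy. The Python/NumPy sketch below is a simplification under stated assumptions: it uses the absolute Pearson correlation for both terms (suitable for numeric data only) and a plain greedy forward search instead of the best-first search usually paired with CFS. Function names and toy data are hypothetical.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset, using |Pearson r| as the correlation measure."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        r_ff = 0.0
    else:
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                        for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(X, y):
    """Greedy forward search that stops as soon as the merit no longer improves."""
    selected, best_merit = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        j, merit = max(((j, cfs_merit(X, y, selected + [j])) for j in remaining),
                       key=lambda t: t[1])
        if merit <= best_merit:
            break
        selected.append(j)
        remaining.remove(j)
        best_merit = merit
    return selected

rng = np.random.default_rng(2)
x0, x2, x3 = rng.normal(size=(3, 500))
x1 = x0 + 0.05 * rng.normal(size=500)    # near-duplicate of x0 (redundant)
y = (x0 + x3 > 0).astype(float)          # binary class depends on x0 and x3
X = np.column_stack([x0, x1, x2, x3])
print(cfs_forward(X, y))   # expected: feature 3 plus one of features 0/1, not both
```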
Consistency-based filter (Dash and Liu 2003) is a multivariate technique that also selects subsets of features but, in this case, according to the level of consistency with the class. It uses an inconsistency criterion to determine the acceptable data reduction rate.
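The inconsistency criterion can be illustrated on discrete data: instances are inconsistent when they share the same values on the selected features but belong to different classes. For each distinct value pattern, the instances outside its majority class are counted, and the inconsistency rate is that count divided by the total number of instances. In this minimal pure-Python sketch (function names hypothetical), the search is an exhaustive enumeration by subset size, feasible only for a handful of features, and it returns the smallest subset whose inconsistency rate does not exceed that of the full feature set. The toy data are an assumption: the class is the XOR of the first two features, while the third is noise.

```python
from collections import Counter, defaultdict
from itertools import combinations

def inconsistency_rate(X, y, subset):
    """Inconsistency rate of a feature subset on discrete data."""
    patterns = defaultdict(Counter)
    for row, label in zip(X, y):
        patterns[tuple(row[j] for j in subset)][label] += 1
    # Instances that do not belong to the majority class of their pattern.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in patterns.values())
    return inconsistent / len(y)

def smallest_consistent_subset(X, y):
    """Exhaustive search: smallest subset as consistent as the full feature set."""
    n_features = len(X[0])
    target = inconsistency_rate(X, y, range(n_features))
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            if inconsistency_rate(X, y, subset) <= target:
                return list(subset)

# Toy data: class = feature 0 XOR feature 1; feature 2 is noise.
X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 0]]
y = [0, 1, 1, 0, 0, 0]
print(inconsistency_rate(X, y, [0]))      # 2 of 6 instances are inconsistent
print(smallest_consistent_subset(X, y))   # [0, 1]
```

Neither feature 0 nor feature 1 is sufficient on its own, which is why a multivariate criterion such as this one can find the pair while a univariate ranking might not.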
Information Gain (Hall and Smith 1998) is a simple univariate filter that computes the mutual information between each attribute and the class, and then provides an ordered ranking of all the attributes.
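For discrete attributes, the information gain of a feature X with respect to the class Y is IG(Y; X) = H(Y) − H(Y | X), which equals their mutual information. A minimal pure-Python sketch of the univariate ranking follows; the helper names and toy data are hypothetical, and continuous features would first need to be discretized, which is not shown.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def information_gain(feature, y):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature."""
    n = len(y)
    h_cond = 0.0
    for v, count in Counter(feature).items():
        y_v = [label for x, label in zip(feature, y) if x == v]
        h_cond += count / n * entropy(y_v)
    return entropy(y) - h_cond

def rank_by_information_gain(X, y):
    """Return feature indices sorted by decreasing gain, plus the gains themselves."""
    columns = list(zip(*X))
    gains = [information_gain(col, y) for col in columns]
    return sorted(range(len(columns)), key=lambda j: gains[j], reverse=True), gains

X = [[0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]
y = [0, 0, 1, 1]
ranking, gains = rank_by_information_gain(X, y)
print(ranking)   # [0, 1, 2]: feature 0 reproduces the class exactly
```

Because each feature is scored in isolation, this filter cannot detect redundancy or feature interactions, which is the main limitation of univariate rankings.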
ReliefF (Kononenko 1994) is a popular multivariate filter that extends a previous algorithm based on nearest neighbors (Relief; Kira and Rendell 1992). The idea is to randomly pick an example from the data and then find its nearest hit (neighbor from the same class) and nearest miss (neighbor from a different class). After comparing the values of the selected instance with those of its hit and miss, the relevance score of each feature is updated. The values of a good feature should be similar in samples from the same class, whilst different in samples from different classes.
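The update rule described above can be sketched in Python/NumPy. For brevity, this minimal sketch implements the original two-class Relief rather than ReliefF, which averages over k nearest hits and misses and handles multiple classes. Min-max scaling of the features, L1 distances, the number of iterations and the seed are assumptions of the sketch, and the toy data are hypothetical.

```python
import numpy as np

def relief(X, y, n_iterations=100, seed=0):
    """Basic two-class Relief: returns one relevance weight per feature."""
    rng = np.random.default_rng(seed)
    # Scale features to [0, 1] so that per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span
    w = np.zeros(X.shape[1])
    for _ in range(n_iterations):
        i = rng.integers(len(X))
        dist = np.abs(Xn - Xn[i]).sum(axis=1)
        dist[i] = np.inf                          # exclude the sample itself
        same = (y == y[i])
        hit = np.where(same, dist, np.inf).argmin()
        miss = np.where(~same, dist, np.inf).argmin()
        # Penalise differences to the nearest hit, reward differences to the nearest miss.
        w += (np.abs(Xn[i] - Xn[miss]) - np.abs(Xn[i] - Xn[hit])) / n_iterations
    return w

rng = np.random.default_rng(3)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 3))
X[:, 1] += 4 * y                 # only feature 1 separates the classes
print(relief(X, y).argmax())     # feature 1 is expected to receive the highest weight
```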
4 Recent works: 2008—present
In the last decade, a considerable number of FS methods have been applied to image analysis, both standard ones and others developed ad hoc to solve specific problems. In this section we review some of these works, classifying them according to the fields defined in Sect. 2: image classification, image segmentation, image annotation, and image retrieval.
4.1 Image classification
Image classification is perhaps one of the most prolific fields of image analysis. Moreover, and probably because of its similarity with regular classification, FS has been extensively applied to preprocess the data, trying to achieve the same level of success as in general data classification.
Among the vast number of works that can be found in the literature, some of them present comparisons of several FS methods employed prior to the image classification step. Laliberte et al. (2012) evaluated three FS methods, based on classification trees, distance and optimization, for their ability to perform object-based vegetation classification. Their experiments concluded that the best option for this problem was classification tree analysis. Another example of comparison can be found in Porebski et al. (2010), in which several FS methods based on sequential search are studied, in this case for supervised color texture classification.
Besides comparing FS methods, the literature has also used different datasets to validate new algorithms. For instance, Barbu et al. (2017) introduced a new FS method with annealing that combines a sequential algorithm with regularization in order to be suitable for big data learning. It was successfully tested on both real and artificial data, and the experimental results were comparable to those of state-of-the-art methods while being computationally very efficient and scalable. An example of unsupervised FS can be found in Li and Tang (2015), in which the authors propose a new unsupervised FS scheme based on redundancy analysis and non-negative spectral clustering. Their experimental study includes nine image datasets with different types of data (faces, objects, handwritten digits), and demonstrates the effectiveness of the proposal. Another example is the novel unsupervised sparse FS method presented in Cong et al. (2016). Experiments on two public UCI datasets and on a new medical endoscopic image dataset showed that, compared with state-of-the-art methods, the proposed method was able to choose the most discriminative features, while also assigning meaningful weights to the useful attributes. In Zhang et al. (2018), the authors used sparsity-inducing regularization to define a supervised FS method that selects the minimum number of features. The experimentation was carried out on different datasets, some of them related to image analysis, and the results demonstrated the effectiveness of their approach. Ma et al. (2017) presented a two-step wrapper comprising both feature ranking and feature subset selection. Several benchmark image datasets were employed in the testing phase, and the mean overall accuracy was significantly higher than that of previous approaches.
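Sparsity-inducing regularization selects features by driving the coefficients of irrelevant ones exactly to zero. The following Python/NumPy sketch uses the plain L1 (Lasso) penalty, fitted by cyclic coordinate descent, purely as a generic illustration of this idea; it is not the specific formulation of Zhang et al. (2018), and the penalty strength `alpha`, the number of sweeps and the toy data are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the L1 norm: shrink z towards zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_select(X, y, alpha, n_sweeps=200):
    """Indices of features with non-zero Lasso coefficients, and the coefficients.

    Minimises (1 / 2n) * ||y - X w||^2 + alpha * ||w||_1 on standardised X and centred y.
    """
    n, d = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    w = np.zeros(d)
    col_sq = (Xs ** 2).sum(axis=0) / n           # equals 1 after standardisation
    for _ in range(n_sweeps):
        for j in range(d):
            # Partial residual with feature j's current contribution added back.
            r = yc - Xs @ w + Xs[:, j] * w[j]
            w[j] = soft_threshold(Xs[:, j] @ r / n, alpha) / col_sq[j]
    return np.flatnonzero(w), w

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 4] + 0.5 * rng.normal(size=200)
selected, w = lasso_select(X, y, alpha=0.3)
print(selected)   # features 0 and 4 are expected to be the only survivors
```

Larger values of `alpha` zero out more coefficients, so the penalty strength directly controls how many features are kept.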
Other interesting approaches are based on ensemble learning, i.e. on combining the outputs of several methods (so-called experts) instead of relying on a single method that might not be appropriate for all scenarios and situations. Wen et al. (2015) proposed a fast ensemble FS method based on AdaBoost, which combines the values of the features with the class label. Their experimental results outperformed state-of-the-art methods and showed that the approach reduces the time needed for FS. In Korytkowski et al. (2016), boosting meta-learning was used to select the most representative subset of features in a novel approach for visual object classification. The experiments demonstrated that both accuracy and training time could be improved.
Incremental FS also appears in the specialized literature, as in Jia et al. (2012), in which a new FS algorithm was shown to beat previously published results on the CIFAR-10 dataset while using a much smaller number of features than other methods. In the field of ant colony optimization, some works are devoted to image FS (Chen et al. 2011, 2013); they were able to improve classification results while using fewer features than other methods and reducing processing times. A novel FS method for general image classification was proposed in Zhou et al. (2017). It considers human factors and leverages eye tracking data to find a subset of relevant input attributes, which are subsequently refined by a hybrid method combining FS based on mutual information (mRMR) and on SVMs. Experimentation carried out on two reference databases demonstrated that eye tracking data are highly relevant for FS when classifying images.
In the past few years, and especially with the arrival of Big Data, researchers face new challenges in dealing with massive volumes of data. This huge amount of data makes classification a more complex and computationally demanding task, requiring advanced Deep Learning techniques (LeCun et al. 2015) and, more specifically for images, Convolutional Neural Network (CNN) approaches (Krizhevsky et al. 2012; Szegedy et al. 2015). Some recent works apply FS based on deep learning to image classification. For example, Zou et al. (2015) presented a new FS method based on deep learning, formulated as a feature reconstruction problem. This new filter selects those attributes that can be reconstructed as relevant, and it was successfully applied to remote sensing scene images. Roffo et al. (2015) presented a new filter called infinite feature selection (Inf-FS), which performs the ranking step in an unsupervised manner. Their approach is based on affinity graphs and constructs cost matrices that take into account pairwise relationships between attributes. They tested it on features extracted with a CNN from object recognition benchmark datasets, achieving results similar to the state of the art.
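The core idea of Inf-FS, as we understand it, is to treat features as nodes of a graph whose edge weights express pairwise affinities, and to score each feature by the total weight of all paths of any length starting from it. With 0 < r < 1/ρ(A), where ρ(A) is the spectral radius of the affinity matrix A, the infinite sum of (rA)^l for l ≥ 1 converges to (I − rA)^−1 − I. The following Python/NumPy sketch is a simplified reading of that idea, not the reference implementation: the affinity mixes the larger spread of each feature pair with one minus their absolute Pearson correlation, and the mixing weight `alpha`, the `decay` factor and the toy data are our own assumptions.

```python
import numpy as np

def inf_fs_scores(X, alpha=0.5, decay=0.9):
    """Simplified Inf-FS: score features by the weight of all paths in an affinity graph."""
    span = X.max(axis=0) - X.min(axis=0)
    Xn = (X - X.min(axis=0)) / span                  # scale each feature to [0, 1]
    sigma = Xn.std(axis=0)
    spread = np.maximum.outer(sigma, sigma)          # max(sigma_i, sigma_j)
    # Assumption: Pearson correlation; the original may use another correlation measure.
    dissimilarity = 1.0 - np.abs(np.corrcoef(Xn, rowvar=False))
    A = alpha * spread + (1.0 - alpha) * dissimilarity
    rho = np.abs(np.linalg.eigvalsh(A)).max()        # spectral radius (A is symmetric)
    r = decay / rho                                  # r < 1/rho makes the series converge
    identity = np.eye(A.shape[0])
    S = np.linalg.inv(identity - r * A) - identity   # sum over l >= 1 of (r A)^l
    return S.sum(axis=1)                             # total path weight from each feature

rng = np.random.default_rng(5)
x0, x2 = rng.normal(size=(2, 300))
x1 = x0 + 0.01 * rng.normal(size=300)    # redundant copy of x0
X = np.column_stack([x0, x1, x2])
scores = inf_fs_scores(X)
print(np.argsort(-scores))   # the non-redundant feature 2 is expected to rank first
```

Since no class labels are used, the ranking is unsupervised: redundant features affine to few others receive lower scores.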
Also in the context of high-dimensional data, Zheng et al. (2018) proposed a novel FS procedure to deal with incomplete datasets. Their idea lies in defining a robust FS framework that takes into account the influence of outliers. Experimentation was carried out on different incomplete datasets, both real and artificial, one of which corresponds to Internet Advertisements and includes, among others, geometrical features computed from advertisement images. The results demonstrated the adequacy of the proposed method compared to other FS methods.
Finally, it is worth noting that there are also works in the literature devoted to specific applications. In particular, we found a significant number of articles promoting the use of FS for the classification of hyperspectral imagery, since it contains enormous amounts of data and the challenge is to obtain higher accuracy without incurring computational inefficiency. In this situation, dimensionality reduction techniques are often applied. Jia et al. (2010) proposed a hybrid approach combining feature extraction (wavelet transform) and feature selection (affinity propagation), and concluded that their method outperformed other approaches that tackle feature extraction or feature selection individually. The same author (Jia et al. 2014) later proposed an FS framework, demonstrating its advantages over standard methods in a complex scenario in which only a few instances per class are labeled. Also with very limited labeled samples, Zhang et al. (2012b) introduced a new FS algorithm based on particle swarm optimization (PSO), showing promising results. Also based on PSO, but in this case integrated with a genetic algorithm, a new FS method was proposed in Ghamisi and Benediktsson (2015) for dealing with this type of data. In Shen et al. (2013), the authors proposed an approach to select useful and non-redundant Gabor features, since their original dimensionality was huge. This method was based on Markov blankets and symmetrical uncertainty. In addition, a correlation coefficient for FS was proposed in Qi et al. (2017), which, used in combination with an optimized SVM, improved on state-of-the-art methods in terms of accuracy and efficiency. Finally, a common trend in this domain seems to be the use of FS schemes based on support vector machines (Pal and Foody 2010; Kuo et al. 2014).
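Symmetrical uncertainty, one of the two ingredients of the method by Shen et al. (2013), normalizes the mutual information between two variables by the sum of their entropies, SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), so that it lies in [0, 1]. A minimal pure-Python sketch for discrete variables follows; the Markov blanket part of that method is not shown, and the helper names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalised to [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    h_xy = entropy(list(zip(x, y)))        # joint entropy H(X, Y)
    mi = h_x + h_y - h_xy                  # mutual information I(X; Y)
    return 2.0 * mi / (h_x + h_y) if h_x + h_y > 0 else 0.0

print(symmetrical_uncertainty([0, 1, 0, 1], [0, 1, 0, 1]))   # 1.0 (identical variables)
print(symmetrical_uncertainty([0, 0, 1, 1], [0, 1, 0, 1]))   # 0.0 (independent variables)
```

The normalization compensates for the bias of raw mutual information towards features with many distinct values, which matters when comparing features of different granularity.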
Regarding filter methods, a common approach is to apply those related to information theory. For instance, Kerroum et al. (2010) proposed a new method with the goal of creating digital thematic maps for cartographic exploitation (in the context of multispectral image classification). The method was based on Gaussian mixture models and computed the mutual information between the attributes and the class. The experimental results showed that the selected textural features were the best option to improve classification performance. Other works apply more sophisticated FS methods, for example through the use of SVMs with different kernels. The authors in Tuia et al. (2010) proposed a method that automatically optimizes a linear combination of kernels, each of them based on a different subset of attributes. The experimental results, obtained in contextual, multi- and hyperspectral, and multisource remote sensing data classification, demonstrated that the method is capable of ranking the features according to their relevance without losing computational efficiency.
Based on fuzzy-rough sets, some works in the literature used fuzzy-rough FS to classify Mars images (Shang et al. 2011; Shang and Barnes 2013), showing promising results that could help in future Mars rover missions, both for ground-based and on-board image classification.
4.2 Image segmentation
There are several algorithms in the literature to deal with image segmentation, both as theoretical methods and applied to different real-life problems. In both cases, FS has been applied in order to improve algorithm performance.
Regarding generic methods, Levin and Weiss (2009) presented a novel framework that combines two different types of segmentation, bottom-up and top-down, into an energy function. The authors used supervised learning and a feature induction method for conditional random fields. From a large set of candidate features, the supervised learning method chooses those relevant to define the energy function. For this task, they proposed their own FS algorithm, an iterative process that selects the features that maximize a conditional log-likelihood. The learned algorithm, tested on three different datasets, achieved the performance of state-of-the-art methods. In Liang et al. (2017), we can find an approach based on genetic programming (GP) for image segmentation. The authors proposed three novel FS methods, covering both single- and multi-objective GP. The experimentation showed that the proposed multi-objective methods provide feature subsets that, in combination with classical classification systems, improve previous results while being less computationally expensive. Cheng et al. (2016) presented a hierarchical FS method used as part of an image segmentation system implemented on GPUs, which also included a fusion strategy with learning. The experimental results demonstrated its main advantage in speed, with no degradation in performance. Additionally, the system can be used in different image segmentation tasks.
Popular methods with built-in feature selection, such as random forests, can also be found in image segmentation problems. They were used for object class segmentation in Schroff et al. (2008). This research demonstrated that random forests make it possible to combine different image features in such a way that pixel-wise segmentation performance is improved.
Different real problems have also been addressed in the area of image segmentation. One representative example is fingerprint analysis, widely used by law enforcement agencies, among others, in which obtaining a correct segmentation is a critical issue. Sankaran et al. (2017) focused on the automatic segmentation of fingerprints with the aim of differentiating two types of patterns: ridge and non-ridge. In this context, they presented an FS technique based on the Relief algorithm to analyze how features of different categories affect the segmentation. Additionally, a random decision forest was used to categorize the local patches into two target classes: background and foreground. The validation was carried out on three public databases and demonstrated the adequacy of the method for the problem at hand.
Another real problem in which FS was used to improve an image segmentation procedure is rock analysis. In Perez et al. (2011), segmentation and classification were carried out to estimate the mineral composition of rock images. The minimum Redundancy Maximum Relevance (mRMR) algorithm was used to select 14 out of the 36 features extracted from the rock images. In mining applications, rock composition must be monitored in real time, which can be achieved thanks to FS methods that reduce both the dimensionality and the computational time.
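The greedy selection performed by mRMR can be sketched as follows. This minimal pure-Python version assumes discrete features and uses the difference criterion (relevance minus mean redundancy, both measured by mutual information); it is only an illustration, and the helper names and toy data are hypothetical. In the toy example, features 0 and 1 are identical copies, so once feature 0 is chosen, mRMR prefers the equally relevant but non-redundant feature 2.

```python
from collections import Counter
from math import log2

def mutual_information(a, b):
    """I(A; B) in bits for two discrete sequences of equal length."""
    n = len(a)
    p_a, p_b, p_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * log2(c * n / (p_a[x] * p_b[y])) for (x, y), c in p_ab.items())

def mrmr(X, y, k):
    """Greedy mRMR with the difference criterion: relevance minus mean redundancy."""
    columns = list(zip(*X))
    relevance = [mutual_information(col, y) for col in columns]
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < k:
        candidates = [j for j in range(len(columns)) if j not in selected]

        def score(j):
            redundancy = sum(mutual_information(columns[j], columns[s])
                             for s in selected) / len(selected)
            return relevance[j] - redundancy

        selected.append(max(candidates, key=score))
    return selected

a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
y = [2 * ai + bi for ai, bi in zip(a, b)]     # class encodes both a and b
X = [[ai, ai, bi] for ai, bi in zip(a, b)]    # features 0 and 1 are identical
print(mrmr(X, y, k=2))   # [0, 2]
```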
Most FS approaches determine features that are appropriate for a given dataset, and overcoming this limitation was the main target of Izadipour et al. (2016). The authors proposed a dataset-independent FS method that predefines the effective feature types based on reasonable facts and selects the appropriate candidate features for each feature type. In this manner, the features selected from a single image can be used for segmenting satellite images. The obtained results improved on those of well-known FS methods, according to different evaluation measures. Also for satellite image segmentation, Chen et al. (2016) introduced a new semi-supervised FS method. It works by generating different feature views (obtained when the attributes are distributed into several disjoint sets). The idea is to evaluate and select features within each view. Experimentation on a dataset of very high resolution satellite images showed that the new method performs better than traditional algorithms.
4.3 Image annotation
Automatic image annotation consists of assigning one or more semantic labels to an input image. It plays a key role in increasing the effectiveness of other image processes such as retrieval or analysis. It can be defined as a special case of image classification and, therefore, FS is commonly used in this context too.
With the main aim of improving the performance of automatic image annotation systems, researchers have to focus their attention on suitable frameworks for several purposes, including image content representation, feature extraction, classification algorithms, and feature selection. Regarding the latter issue, an FS approach was presented in Jin et al. (2015) with the aim of improving image annotation. In this case, the contribution of each image feature was measured by means of mutual information, and its performance was measured by means of a non-linear factor in the evaluation function.
As most of the features used in these systems may be noisy and/or redundant, FS was used in Ma et al. (2012) to represent the images in a more compact and precise way, which improves performance. More specifically, the authors proposed a novel method with two appealing properties: it selects the relevant attributes with a sparsity-based model, and it determines the shared subspace of the original features, which is useful in multi-label problems. Experiments showed that the proposed FS method is robust as well as suitable for web images, which usually have multiple labels. In Shi et al. (2015), a sparse FS framework was presented, and its validation showed that this technique is both effective and efficient, as well as suitable for large-scale image annotation.
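Sparsity-based multi-label FS models commonly learn a weight matrix W (one row per feature, one column per label) regularized with an \(\ell_{2,1}\)-norm, which drives entire rows to zero, so that a feature is kept or discarded for all labels at once. The sketch below assumes W has already been learned (the optimization itself is omitted) and shows only the ranking step. It is a generic illustration, not the method of Ma et al. (2012) or Shi et al. (2015).

```python
from math import sqrt

def row_norms(W):
    """Euclidean (l2) norm of each row; row i holds feature i's weights for all labels."""
    return [sqrt(sum(w * w for w in row)) for row in W]

def l21_norm(W):
    """The l2,1-norm: sum of the row-wise l2 norms (the sparsity-inducing penalty)."""
    return sum(row_norms(W))

def select_shared_features(W, k):
    """Keep the k features whose rows have the largest l2 norm across all labels."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)[:k]

# Toy weight matrix: 4 features x 2 labels; row 1 was zeroed by the penalty.
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
print(l21_norm(W), select_shared_features(W, 2))
```

Ranking by row norms rather than by individual entries is what makes the selected subset shared across labels.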
Genetic algorithms (GAs) mimic natural selection to find optimal values of a function, which makes them suitable for FS. Lu et al. (2008) used color, texture and shape properties to represent low-level features in their image annotation procedure. In order to optimize the weights of the feature vectors, they used a GA whose fitness function was defined in terms of k-nearest neighbor accuracy. In the same context, Li et al. (2010) proposed a method for image annotation using Adaboost and a GA-based FS method. Their idea lies in generating and optimizing a set of feature subsets at each iteration of Adaboost by means of a GA. In this case, two different GA-based FS approaches were analyzed. In addition to genetic algorithms, another evolutionary computation paradigm has been used for FS in image annotation: particle swarm optimization (PSO). A novel image annotation scheme was presented in Jin and Jin (2015), based on an improved quantum PSO method for visual FS. The performance of this approach was further improved by applying a boosting-based ensemble strategy, and the experiments showed that the proposed scheme is adequate for the problem at hand.
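A GA-based wrapper of the kind described above can be sketched as follows. Each individual is a binary mask over the features, and its fitness is the leave-one-out accuracy of a k-nearest-neighbor classifier restricted to the selected features. The operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the parameter values and the function names are assumptions for illustration. The sketch selects features rather than weighting them, and does not reproduce Lu et al. (2008) or the Adaboost coupling of Li et al. (2010).

```python
import random

def knn_loo_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of k-NN restricted to features with mask[j] == 1."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0  # an empty subset gets the worst possible fitness
    correct = 0
    for i in range(len(X)):
        dists = sorted(
            (sum((X[i][j] - X[t][j]) ** 2 for j in cols), y[t])
            for t in range(len(X)) if t != i
        )
        votes = [label for _, label in dists[:k]]
        prediction = max(set(votes), key=votes.count)  # majority vote
        correct += prediction == y[i]
    return correct / len(X)

def ga_feature_selection(X, y, pop_size=20, generations=30, p_mut=0.05, k=1, seed=0):
    """Evolve binary feature masks; fitness is k-NN leave-one-out accuracy."""
    rng = random.Random(seed)
    n = len(X[0])
    # Random initial masks, plus the full feature set as a baseline individual.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    pop.append([1] * n)

    def fitness(mask):
        return knn_loo_accuracy(X, y, mask, k)

    best = max(pop, key=fitness)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: the best mask so far always survives
        while len(next_pop) < pop_size:
            # Tournament selection: the fittest of 3 random individuals.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=fitness)
    return best, fitness(best)

# Toy data: feature 0 separates the classes, feature 1 is large-valued noise.
X = [[0, 57], [0, 3], [0, 91], [0, 12], [1, 44], [1, 88], [1, 7], [1, 65]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(ga_feature_selection(X, y))
```

Because the full feature set is in the initial population and the best individual is always kept, the returned mask is never worse (on this fitness) than using all features.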
Multi-task FS has attracted much attention in the last few years, since it often outperforms single-task FS. In Li et al. (2016b), the authors addressed the problem of semi-supervised multi-task FS applied to social image annotation. For this purpose, they proposed using manifold regularization to handle the large imbalance between labeled and unlabeled instances. The process consists of estimating a FS matrix by integrating the obtained information into a learning framework, resulting in a novel method that outperforms state-of-the-art techniques. In the same line of research, Zeng et al. (2017) presented a novel semi-supervised multi-label FS model and applied it to multimedia annotation. The idea lies in combining semi-supervised and multi-label feature learning into a single framework. They applied the proposed method to both web page and image annotation, using several real-world multimedia datasets, and demonstrated its effectiveness.
4.4 Image retrieval
Image retrieval systems have become a focus of research in the field of image analysis and machine vision, and FS has been successfully applied to them, as the works summarized in this section show.
FS is especially useful here because of the very large number of image features and image classes. Regarding the first issue, Deselaers et al. (2008) presented experiments comparing different features for image retrieval. The paper analyzes a large set of different features and compares them on different tasks, such as photo and building retrieval. In order to determine how different features can be used in combination, the authors analyze the correlation between them and give some recommendations for selecting an appropriate set of features, depending on the type of data.
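A simple way to act on such a correlation analysis is a redundancy filter: compute pairwise Pearson correlations and drop a feature whenever it is highly correlated with one already kept. The sketch below is a generic illustration with an assumed threshold of 0.95 and hypothetical function names; it is not necessarily the procedure of Deselaers et al. (2008).

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # constant feature: treated as uncorrelated
    return cov / sqrt(va * vb)

def drop_correlated(columns, threshold=0.95):
    """Scan features in order, keeping each one only if it is not highly
    correlated (|r| >= threshold) with any feature already kept."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[i])) < threshold for i in kept):
            kept.append(j)
    return kept

# Column 1 is a scaled copy of column 0, so it is dropped.
cols = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
print(drop_correlated(cols))
```

The result depends on the scan order; ranking the features by relevance first would make the filter keep the more relevant member of each correlated pair.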
A common complication that deserves attention in image retrieval is the gap between the features that we humans see in a particular image and the semantic features we use to describe it. In this context, Lotfabadi et al. (2015) presented a method to extract useful features from a feature vector, which combines an SVM classifier with a fuzzy rough set based on mutual information. The method was compared with traditional systems that use other dimensionality reduction techniques, and the experiments demonstrated its adequacy in terms of accuracy and robustness, providing results highly relevant to the content of the query image.
On the other hand, cross-modal retrieval has lately attracted much attention because of the widespread use of multi-modal data. In this problem, relevant objects of one data type are retrieved using another data type as the query. Therefore, researchers have to deal with two main issues: the measure of relevance and coupled FS. Both problems were tackled in Wang et al. (2016a) by means of a novel joint learning framework. The second problem, which is the focus of this survey, was approached by a learning procedure that imposes \(\ell _{2,1}\)-norm penalties on the projection matrices. This procedure makes it possible to select relevant and discriminative features from the coupled feature spaces simultaneously, outperforming the state-of-the-art results.
There are also specific applications in the field of image retrieval. In Li et al. (2016a), a novel content-based image retrieval approach was presented for remote sensing images. The authors proposed a scheme that, inspired by FS, adaptively selects the appropriate vantage point tree index when different feature spaces are considered. In this manner, the system is able to increase both the response speed and the retrieval quality. Traffic vehicle search in large databases was addressed in Zhu et al. (2017), in which the authors defined a local descriptor based on the gradient quantity as well as the spatial gradient distribution of the feature. They then proposed an adaptive FS method combining the feature distinctive degree and prior information. The experiments showed that their proposal outperformed the standard algorithms. Another application found in the literature is Batik image retrieval (Fahmi et al. 2016), Batik being a unique fabric from Indonesia. The authors analyzed the performance of FS and reduction techniques in the Batik retrieval process. In particular, they used sequential forward floating selection (SFFS), which consists of a forward step and a conditional backward step. The experimental results showed that SFFS reduced the processing time, making the method 1800 times faster.
Human motion retrieval plays an important role in many applications based on motion data. Wang et al. (2016b) presented an adaptive multi-view FS technique to deal with this problem. In a first step, they used local linear regression to learn Laplacian graphs based on multiple views, which were then combined to take advantage of the complementary information between different attributes. Then, an objective function was formulated as a trace ratio optimization problem to remove from the original feature representation those feature components that are either irrelevant or redundant. Experiments performed on two publicly available datasets showed that the method is sound and achieves state-of-the-art performance for motion data retrieval, as well as being suitable for other real-world applications.
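The forward step followed by conditional backward steps in SFFS can be sketched as below. The criterion to be maximized is supplied by the caller (in practice, for example, a classifier's accuracy on the candidate subset), and the lookup-table criterion in the usage example is purely hypothetical. A feature is removed only when the reduced subset beats the best subset previously recorded for that size, which also guarantees termination. This is a generic sketch, not the implementation used by Fahmi et al. (2016).

```python
def sffs(n_features, criterion, target_size):
    """Sequential forward floating selection.

    criterion(subset) -> score to maximize; assumes target_size <= n_features.
    Returns the best subset of size target_size found during the search.
    """
    selected = []
    best = {}  # subset size -> (best score seen, subset)

    def record(subset):
        score = criterion(subset)
        if len(subset) not in best or score > best[len(subset)][0]:
            best[len(subset)] = (score, list(subset))

    while len(selected) < target_size:
        # Forward step: add the single feature that maximizes the criterion.
        remaining = [f for f in range(n_features) if f not in selected]
        f_add = max(remaining, key=lambda f: criterion(selected + [f]))
        selected = selected + [f_add]
        record(selected)
        # Conditional backward steps: drop the least useful feature only if the
        # reduced subset beats the best subset previously found at that size.
        while len(selected) > 2:
            f_rem = max(selected,
                        key=lambda f: criterion([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_rem]
            if criterion(reduced) > best[len(reduced)][0]:
                selected = reduced
                record(selected)
            else:
                break
    return best[target_size][1]

# Hypothetical criterion: features 1 and 2 are strong together, feature 0 alone is best.
table = {
    frozenset({0}): 5, frozenset({1}): 4, frozenset({2}): 4, frozenset({3}): 0,
    frozenset({0, 1}): 6, frozenset({0, 2}): 6, frozenset({0, 3}): 5,
    frozenset({1, 2}): 9, frozenset({1, 3}): 4, frozenset({2, 3}): 4,
    frozenset({0, 1, 2}): 10, frozenset({0, 1, 3}): 6,
    frozenset({0, 2, 3}): 6, frozenset({1, 2, 3}): 11,
}
crit = lambda s: table[frozenset(s)]
print(sffs(4, crit, 3))
```

With this table, plain forward selection would stop at {0, 1, 2} (score 10). The backward step discards feature 0 once {1, 2} proves better, and the search ends at {1, 2, 3} (score 11).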
4.5 Summary
We have reviewed the recent works promoting the use of FS in image analysis, grouped according to the field they were applied to. Additionally, we have analyzed the nature of the FS methods usually applied. Table 1 summarizes all the methods reviewed.
Figure 7 illustrates the distribution of the FS methods found in the literature review, according to different aspects. Firstly, we have analyzed whether the FS methods were new methods or classical techniques applied to a given problem (see Fig. 7a). In this regard, most of the works proposed new methods, including generic approaches for a certain image analysis category or ad-hoc frameworks to solve very specific problems. Other works use existing techniques, although in both cases the experimental results show the adequacy of using FS in this domain.
Although many proposed methods are tested on benchmark and state-of-the-art datasets, others are applied to solve real problems. We have seen that analysis of remote sensing images through the use of feature selection methods has been the focus of much attention, especially when it comes to classification but also in image retrieval and image segmentation. Hyperspectral imagery, medical data or advertisements are other real applications that have been found in the field of image classification. Regarding image segmentation, feature selection has been used to deal with fingerprint analysis and rock analysis. For image annotation, the real applications we found are mainly based on multimedia datasets. Finally, feature selection methods have been successfully applied in image retrieval to real problems such as traffic vehicle search or human motion retrieval.
We have also analyzed the nature of the FS methods (see Fig. 7b). We note that methods based on information theory are still in use, even though they appeared several years ago; a good review of these methods is provided by Vergara and Estévez (2014). Ensemble learning is an approach that combines the results of multiple methods (or experts) with the aim of obtaining better performance than that of any single method. This approach has mostly been applied to classification problems, but also to image analysis, with Adaboost and Random Forest among the most used techniques. The evolutionary computing paradigm (Xue et al. 2016), comprising genetic algorithms and particle swarm optimization, is widely employed. Possibly because of the good results of the SVM classifier, some FS methods based on it and its different kernels have also been used. It is also worth mentioning the tendency to combine different methods, known as hybrid methods, in an attempt to improve the performance achieved by classical FS methods. Finally, other approaches found in the literature are sparse FS, wrappers, fuzzy-based approaches, regularization techniques, and methods based on distances.
Finally, we have analyzed the different types of FS methods according to the three approaches detailed in Sect. 3. As can be observed in Fig. 7c, most of the approaches are filters, regardless of the category considered, probably due to their independence of the classification model and their lower risk of overfitting. Embedded methods are also widely used, especially in the field of classification, probably because of their ability to perform feature selection and classification at the same time. Particularly popular are the embedded methods based on the \(\ell _1\)-norm. Wrappers, possibly because of their high computational cost, are the least used approach.
Notice that deep learning algorithms are normally used to extract relevant features: by removing the last (output) layer of a trained network, the output of the new final layer can be taken as a feature vector. These are the so-called deep features (Zhou et al. 2014; Kong et al. 2016; Liu et al. 2017). Since this is a feature extraction procedure rather than feature selection, it is not reviewed in this article.
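To illustrate why \(\ell _1\)-norm embedded methods perform selection and model fitting at the same time, the sketch below fits a Lasso, minimizing \((1/2n)\|y - Xw\|^2 + \lambda \|w\|_1\), by cyclic coordinate descent. The soft-thresholding step sets the coefficients of weak features exactly to zero, so the non-zero coefficients define the selected subset. This is a generic textbook illustration, not any specific method reviewed here; the value of \(\lambda\) (`lam`), the iteration count and the function names are assumptions.

```python
def soft_threshold(a, lam):
    """S(a, lam) = sign(a) * max(|a| - lam, 0), the l1 proximal step."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent.

    X is a list of n rows with p features each; no column may be all zeros.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                      for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    return w

def selected_features(w, tol=1e-12):
    """Features with non-zero coefficients are the ones kept by the l1 penalty."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]

# Toy data: y depends strongly on feature 0 and only weakly on feature 1.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [3.1, -2.9, 2.9, -3.1]
w = lasso_coordinate_descent(X, y, lam=0.5)
print(w, selected_features(w))
```

Raising `lam` removes more features; lowering it towards zero recovers ordinary least squares, in which no coefficient is forced to zero.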
5 Image datasets
When researchers started to work in the field of image analysis, a key point was to find datasets publicly available to test their new approaches. Nowadays, several image databases are commonly used as benchmark datasets in different topics. Table 2 presents some of the most popular ones, and includes a brief description of them as well as some useful information.
Among all these image databases, ImageNet (Deng et al. 2009) stands out as one of the most popular collections. It is organized according to the WordNet hierarchy, and each of its nodes is depicted by hundreds or thousands of images. Currently, it has an average of over five hundred images per node, and more than 14 million images in total. Its popularity grew thanks to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) (Footnote 1), which allows image analysis researchers to test their new algorithms at a large scale. COCO is also well known for its challenges, which include one organized jointly with ILSVRC in 2016 and the COCO + Places challenge in 2017. Other databases are also very popular due to similar challenges, such as PASCAL VOC and the PASCAL Visual Object Classes Challenge (Everingham et al. 2015). Although the VOC challenges have now finished (2005–2012), the PASCAL VOC Evaluation Server as well as the different versions of the PASCAL VOC dataset remain available for benchmarking.
Note that image datasets generally include a set of raw images (samples) and their respective annotations (labels), in contrast to the datasets commonly used in other machine learning problems, in which samples are directly represented by feature vectors. Therefore, when dealing with raw images, some feature extraction is required before FS can be applied. Researchers who want to focus their efforts on the FS procedure may find it more appropriate to work directly with a set of features or image properties. In this sense, some popular image datasets (e.g. ImageNet) already include a set of features computed from the images (e.g. SIFT features). Additionally, the UCI Machine Learning Repository (Blake and Merz 1998) should be highlighted: it contains datasets for general machine learning purposes, some of which are related to image analysis. The datasets in this repository are provided in a form to which machine learning algorithms (such as FS methods) can be directly applied. For this reason, some of the image datasets included in the UCI repository will be used in Sect. 6 for experimentation.
6 An experimental study
Typically, the most important benefits of performing FS in image analysis are improving the learning performance and better understanding which features or pixels are important. If we are interested in class prediction, a supervised machine learning technique, such as a classifier, must then be applied. In contrast, if the goal is data understanding, the classification step is omitted and the selected features have to be individually evaluated. In this section we present experiments focused on class prediction.
6.1 Datasets
We have chosen four image datasets from the UCI repository (see Sect. 5). These datasets were chosen because their features have already been extracted, so we do not introduce another layer of complexity that depends on the method used to extract the features from the images. Table 3 summarizes the main characteristics of these datasets.
Gisette is a handwritten digit recognition problem in which the goal is to separate the highly confusable digits ‘4’ and ‘9’. It is one of the five datasets of the NIPS 2003 feature selection challenge (Guyon et al. 2005). The digits were size-normalized and centered in a fixed-size image of dimension \(28\times 28\). The original data were modified for the purpose of the FS challenge. In particular, pixels were sampled at random in the middle top part of the image, which contains the information necessary to distinguish 4 from 9, and higher-order features were created as products of these pixels to embed the problem in a higher-dimensional feature space. A number of distracting features called ‘probes’, with no predictive power, were also added. Finally, the order of the features and patterns was randomized.
Image Segmentation is an image dataset described by high-level numeric-valued attributes. The instances were drawn randomly from a database of 7 outdoor images. The images were hand-segmented to create a classification for every pixel. Each instance is a \(3\times 3\) region.
Letter Recognition is a database of character image features, in which the objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts and each letter within these 20 fonts was randomly distorted to produce a file of 20000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts), which were then scaled to fit into a range of integer values from 0 through 15.
The Semeion Handwritten Digit dataset was built by collecting 1593 handwritten digits from around 80 people, which were scanned and stretched into a \(16\times 16\) rectangular box with 256 gray levels. Each pixel of each image was then binarized: pixels with a gray value of 127 or less were set to 0, and pixels with a value above 127 were set to 1. Finally, each binary image was scaled again into a \(16\times 16\) square box (the final 256 binary attributes). Each person wrote all the digits from 0 to 9 on paper twice: the first time in the normal way (trying to write each digit accurately) and the second time quickly (with no attention to accuracy).
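The binarization rule above is a fixed threshold at gray level 127. A minimal sketch, assuming each image is given as a list of rows of integer gray levels in [0, 255]; the function names are hypothetical, and the final rescaling into the \(16\times 16\) box is not shown.

```python
def binarize(image, threshold=127):
    """Map gray levels <= threshold to 0 and levels > threshold to 1."""
    return [[1 if px > threshold else 0 for px in row] for row in image]

def flatten(image):
    """Concatenate the rows into one feature vector (256 values for 16x16)."""
    return [px for row in image for px in row]

print(binarize([[0, 127], [128, 255]]))
```

Flattening a \(16\times 16\) binary image row by row yields the 256 attributes that make up each Semeion sample.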
6.2 Results
The goal of this section is to perform an experimental study using four representative image datasets from the UCI Machine Learning Repository and some classical, widely used FS methods, providing readers with baselines for their comparisons. For this purpose, we have chosen four popular FS algorithms that can be considered state of the art and are extensively used by researchers in machine learning (Bolón-Canedo et al. 2015b): CFS, consistency-based, Information Gain and ReliefF. Two of the FS methods (CFS and consistency-based) return a subset of features, whilst the other two (Information Gain and ReliefF) provide an ordered ranking of the features. These methods were selected because they are available in popular tools used by researchers, such as Weka, Matlab, RapidMiner or KEEL. Among them, we have chosen Weka (Footnote 2) since it includes all four methods and is very easy to use, even for inexperienced researchers. Note that for the ranker methods, we report the performance when the top 40% of the features are retained.
In order to evaluate the adequacy of these methods on image data, five well-known classifiers were chosen: C4.5, naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF) and k-NN (with \(k=3\)). Except for the parameter k of the k-NN classifier, all parameters were left at their Weka default values, for easy reproducibility of the experiments. For the Gisette dataset, we used the original division into training and test sets; for the remaining datasets, we performed a holdout validation using 2/3 of the data for training and 1/3 for test.
Figure 8 shows the experimental results obtained by the different classifiers on the four datasets. As can be seen, the Gisette, letter recognition and image segmentation datasets generally benefit from the application of FS methods, which improve their classification accuracy with respect to using all the features. In contrast, for Semeion the best results were obtained when using the whole set of features, which suggests that all the pixels are necessary to correctly determine the class. Analyzing in more detail the behavior of the FS methods and the influence of the classifier on the studied datasets, some interesting conclusions can be drawn:
The best performances are obtained by Random Forest, kNN and, especially, SVM. This is not surprising, since Random Forest and SVM are reported to be powerful classifiers (Fernández-Delgado et al. 2014). Notice, however, the poor results obtained by SVM on the letter recognition dataset compared with the performance of kNN. This may be because this dataset has a large number of classes (26), whereas SVMs are designed for binary problems and must rely on one-versus-rest or one-versus-one decompositions, which can lead to a loss of accuracy.
Focusing on the FS methods, the subset filters (CFS and consistency-based) generally perform remarkably well on all datasets. The poor performance of the ranker methods (except on the Gisette dataset) can be explained by the need to establish a threshold on the number of features to keep, which may not be adequate in some cases. For subset filters, in contrast, the number of selected features is determined by the method itself and is expected to be optimal for the given dataset. Thus, the main disadvantage of rankers is that the threshold must be set a priori, with the risk of choosing too large or too small a number.
All the methods tested except Information Gain are multivariate, and multivariate methods in theory perform better than univariate ones. Indeed, on average over all the datasets and classifiers tested, Information Gain obtains the worst test accuracy. However, in some cases on the Gisette dataset this method leads to the best classification accuracy, suggesting that there are no important interactions between features in this dataset.
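For discrete features, Information Gain is the reduction in class entropy, \(IG(C;F) = H(C) - H(C \mid F)\), and a ranker then keeps the highest-scoring fraction of the features. The sketch below assumes already-discrete features and keeps the top 40% (rounded up, which is our assumption). Any discretization of numeric attributes, as well as Weka's own tie-breaking, is left out, and the function names are hypothetical.

```python
from collections import Counter
from math import ceil, log2

def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature) for a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for v, count in Counter(feature).items():
        subset = [lab for f, lab in zip(feature, labels) if f == v]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional

def top_fraction(columns, labels, fraction=0.4):
    """Rank features by Information Gain and keep the top `fraction` of them."""
    scores = [information_gain(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    keep = ceil(fraction * len(columns))
    return order[:keep]

labels = [0, 0, 1, 1]
cols = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 1], [1, 1, 1, 1], [5, 5, 5, 5]]
print(top_fraction(cols, labels))  # 40% of 5 features -> the 2 best-ranked
```

The fixed fraction is exactly the a priori threshold whose drawbacks are discussed in the results below.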
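The holdout protocol used for the non-Gisette datasets (2/3 of the samples for training, 1/3 for test) and the k-NN classifier with \(k=3\) can be sketched as follows. This is an illustrative stand-in, not Weka's IBk: Weka's defaults (for instance, attribute normalization in the distance and tie-breaking) may differ, and the fixed-seed shuffle and function names are assumptions.

```python
import random
from collections import Counter

def holdout_split(X, y, train_fraction=2 / 3, seed=0):
    """Shuffle the indices and split them into training and test parts."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(idx)))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])

def knn_predict(X_train, y_train, x, k=3):
    """Majority label among the k training samples closest to x (squared Euclidean)."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

def accuracy(X_train, y_train, X_test, y_test, k=3):
    """Fraction of test samples whose predicted label matches the true one."""
    hits = sum(knn_predict(X_train, y_train, x, k) == t for x, t in zip(X_test, y_test))
    return hits / len(X_test)

# Toy data: two well-separated classes of 15 samples each.
X = [[i * 0.01, 0.0] for i in range(15)] + [[10 + i * 0.01, 0.0] for i in range(15)]
y = [0] * 15 + [1] * 15
Xtr, ytr, Xte, yte = holdout_split(X, y)
print(len(Xtr), len(Xte), accuracy(Xtr, ytr, Xte, yte))
```

In a FS experiment, the same split would be reused for every feature subset, so that accuracy differences reflect the selection rather than the partition.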
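Why a univariate ranker can miss useful features is easiest to see with a toy XOR problem (a hypothetical example, not one of the datasets used here): each feature alone has zero Information Gain with respect to the class, yet the pair determines it exactly.

```python
from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    n = len(labels)
    cond = sum((cnt / n) * entropy([l for f, l in zip(feature, labels) if f == v])
               for v, cnt in Counter(feature).items())
    return entropy(labels) - cond

f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(f1, f2)]  # class = f1 XOR f2
joint = list(zip(f1, f2))            # the pair treated as one combined feature

print(information_gain(f1, y))     # 0.0: f1 alone is uninformative
print(information_gain(f2, y))     # 0.0
print(information_gain(joint, y))  # 1.0: together they determine the class
```

A multivariate method that evaluates features jointly can keep f1 and f2, whereas a ranker based on Information Gain would place both at the bottom.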
In light of the above, it can be seen that the results obtained by this experimental study are highly dependent on the classifier, the FS method, and in particular the dataset. Although a detailed analysis of the results is outside the scope of this paper, the authors recommend the use of subset FS methods (in particular, CFS) in combination with SVM or Random Forest classifiers.
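Since CFS is the method recommended here, it may help to recall the heuristic it maximizes (Hall 1999). For a candidate subset \(S\) of \(k\) features, let \(\overline{r}_{cf}\) denote the average feature–class correlation and \(\overline{r}_{ff}\) the average feature–feature intercorrelation (our notation). The merit of \(S\) is

\[
\mathrm{Merit}_S = \frac{k\,\overline{r}_{cf}}{\sqrt{k + k(k-1)\,\overline{r}_{ff}}}
\]

As a worked example with assumed values, a single feature with \(\overline{r}_{cf} = 0.6\) has merit \(0.6/\sqrt{1} = 0.6\). Two such features with \(\overline{r}_{ff} = 0.2\) give \(1.2/\sqrt{2.4} \approx 0.775\), whereas two fully redundant copies (\(\overline{r}_{ff} = 1\)) give \(1.2/\sqrt{4} = 0.6\), no better than a single feature. This trade-off between relevance and redundancy is one reason why a subset filter can settle on its own number of features instead of requiring a threshold a priori.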
7 Conclusions
In this work we have provided an exhaustive review and analysis of the recent contributions of FS as a preprocessing step in the field of image analysis. Nowadays, with the Big Data phenomenon surrounding us, using FS methods is more necessary than ever, although image analysis researchers already recognized decades ago the need to know which features had to be extracted from each pixel (Bolón-Canedo et al. 2015c).
Image analysis covers a wide range of applications and, thus, of specific techniques. This work has focused on those in which FS has been applied most (image classification, image segmentation, image annotation, and image retrieval), providing an extensive description of each of them so as not to confuse interested readers who are not experts in the field. Analogously, basic FS concepts are also explained.
The goal of this work is to explain and review the different image analysis categories and the FS approaches that have been applied to them, bringing together as much up-to-date knowledge as possible. Thus, recent works found in the specialized literature have been exhaustively examined, in an attempt to describe the applications of FS to the different subfields of image analysis. Furthermore, the most popular data repositories in this field have been briefly presented.
Finally, we have performed a practical evaluation of FS methods on image datasets and analyzed the results obtained. We applied four classical FS methods to four widely used datasets, and used five well-known classifiers to obtain the final classification accuracy. This set of experiments also aims to facilitate future comparative studies when researchers propose new methods.
Regarding opportunities for future research, it is essential not to overlook the new scenario of Big Data, in which it is important to deal not only with millions of pixels in a given image, but also with millions of images at the same time. As pointed out in Bolón-Canedo et al. (2015c), “data is being collected at an unprecedented fast pace and, consequently, needs to be processed rapidly”. We live in a society where social media networks are everywhere, especially thanks to portable devices, which generate huge amounts of data every second. Therefore, we need sophisticated methods able to process millions of images in real time. To this end, online FS methods are needed, and they still remain a challenge for researchers. Another way to address this issue is to develop distributed FS methods, which aim to alleviate the computational burden of processing large amounts of images.
Notes
http://www.image-net.org/challenges/LSVRC/.
References
Barbu A, She Y, Ding L, Gramajo G (2017) Feature selection with annealing for computer vision and big data learning. IEEE Trans Pattern Anal Mach Intell 39(2):272–286
Blake CL, Merz CJ (1998) UCI machine learning repository, vol 55. Department of Information and Computer Science, University of California. http://archive.ics.uci.edu/ml/. Accessed August 2019
Bolón-Canedo V, Ataer-Cansizoglu E, Erdogmus D, Kalpathy-Cramer J, Fontenla-Romero O, Alonso-Betanzos A, Chiang M (2015a) Dealing with inter-expert variability in retinopathy of prematurity: a machine learning approach. Comput Methods Programs Biomed 122(1):1–15
Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2015b) Feature selection for high-dimensional data. Springer, Berlin
Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2015c) Recent advances and emerging challenges of feature selection in the context of big data. Knowl Based Syst 86:33–45
Bossard L, Guillaumin M, Van Gool L (2014) Food-101—mining discriminative components with random forests. In: European conference on computer vision, pp 446–461
Brown G, Pocock A, Zhao MJ, Luján M (2012) Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J Mach Learn Res 13:27–66
Chen EL, Chung PC, Chen CL, Tsai HM, Chang CI (1998) An automatic diagnostic system for CT liver image classification. IEEE Trans Biomed Eng 45(6):783–794
Chen L, Chen B, Chen Y (2011) Image feature selection based on ant colony optimization. In: AI 2011: advances in artificial intelligence, pp 580–589
Chen B, Chen L, Chen Y (2013) Efficient ant colony optimization for image feature selection. Signal Process 93(6):1566–1576
Chen X, Liu W, Su F, Shao G (2016) Semi-supervised multiview feature selection with label learning for VHR remote sensing images. In: IEEE international geoscience and remote sensing symposium, pp 2372–2375
Cheng HD, Jiang X, Sun Y, Wang J (2001) Color image segmentation: advances and prospects. Pattern Recognit 34(12):2259–2281
Cheng MM, Liu Y, Hou Q, Bian J, Torr P, Hu SM, Tu Z (2016) HFS: hierarchical feature selection for efficient image segmentation. In: European conference on computer vision, pp 867–882
Chua TS, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In: ACM international conference on image and video retrieval, p 48
Cong Y, Wang S, Fan B, Yang Y, Yu H (2016) UDSFS: unsupervised deep sparse feature selection. Neurocomputing 196:150–158
Dash M, Liu H (2003) Consistency-based search in feature selection. Artif Intell 151(1):155–176
Datta R, Joshi D, Li J, Wang JZ (2008) Image retrieval: ideas, influences, and trends of the new age. ACM Comput Surv 40(2):5
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition, pp 248–255
Deselaers T, Keysers D, Ney H (2008) Features for image retrieval: an experimental comparison. Inf Retr 11(2):77–107
du Buf JMH, Kardan M, Spann M (1990) Texture feature performance for image segmentation. Pattern Recognit 23(3–4):291–309
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The PASCAL visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338
Everingham M, Eslami SA, Van Gool L, Williams CK, Winn J, Zisserman A (2015) The PASCAL visual object classes challenge: a retrospective. Int J Comput Vis 111(1):98–136
Fahmi H, Zen RA, Sanabila HR, Nurhaida I, Arymurthy AM (2016) Feature selection and reduction for Batik image retrieval. In: Proceedings of the fifth international conference on network, communication and computing, pp 47–52
Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181
Gao W, Hu L, Zhang P (2018a) Class-specific mutual information variation for feature selection. Pattern Recognit 79:328–339
Gao W, Hu L, Zhang P, Wang F (2018b) Feature selection by integrating two groups of feature evaluation criteria. Expert Syst Appl 110:11–19
Ghamisi P, Benediktsson JA (2015) Feature selection based on hybridization of genetic algorithm and particle swarm optimization. IEEE Geosci Remote Sens Lett 12(2):309–313
Gonzalez RC, Woods RE (2008) Digital image processing, 3rd edn. Pearson, Prentice Hall, Englewood Cliffs
Griffin G, Holub A, Perona P (2007) Caltech-256 object category dataset
Guo G, Fu Y, Dyer CR, Huang TS (2008) Image-based human age estimation by manifold learning and locally adjusted robust regression. IEEE Trans Image Process 17(7):1178–1188
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1–3):389–422
Guyon I, Gunn S, Ben-Hur A, Dror G (2005) Result analysis of the NIPS 2003 feature selection challenge. In: Advances in neural information processing systems, pp 545–552
Guyon I, Gunn S, Nikravesh M, Zadeh LA (2006) Feature extraction: foundations and applications. Springer, Berlin
Hall MA (1999) Correlation-based feature selection for machine learning. Ph.D. thesis, The University of Waikato
Hall MA, Smith LA (1998) Practical feature subset selection for machine learning. Comput Sci 98:181–191
Haralick RM, Shanmugam K, Dinstein I (1973) Texture features for image classification. IEEE Trans Syst Man Cybern 3:610–621
Izadipour A, Akbari B, Mojaradi B (2016) A feature selection approach for segmentation of very high-resolution satellite images. Photogramm Eng Remote Sens 82(3):213–222
Jain AK, Vailaya A (1996) Image retrieval using color and shape. Pattern Recognit 29(8):1233–1244
Jia S, Qian Y, Li J, Liu W, Ji Z (2010) Feature extraction and selection hybrid algorithm for hyperspectral imagery classification. In: IEEE international geoscience and remote sensing symposium, pp 72–75
Jia Y, Huang C, Darrell T (2012) Beyond spatial pyramids: receptive field learning for pooled image features. In: IEEE conference on computer vision and pattern recognition, pp 3370–3377
Jia S, Zhu Z, Shen L, Li Q (2014) A two-stage feature selection framework for hyperspectral image classification using few labeled samples. IEEE J Sel Top Appl Earth Obs Remote Sens 7(4):1023–1035
Jin C, Jin SW (2015) Automatic image annotation using feature selection based on improving quantum particle swarm optimization. Signal Process 109:172–181
Jin C, Liu J, Guo J (2015) A hybrid model based on mutual information and support vector machine for automatic image annotation. In: Artificial intelligence perspectives and applications, pp 29–38
Juan L, Gwun O (2009) A comparison of SIFT, PCA-SIFT and SURF. Int J Image Process 3(4):143–152
Kerroum MA, Hammouch A, Aboutajdine D (2010) Textural feature selection by joint mutual information based on Gaussian mixture model for multispectral image classification. Pattern Recognit Lett 31(10):1168–1174
Kira K, Rendell LA (1992) The feature selection problem: traditional methods and a new algorithm. In: National conference on artificial intelligence, pp 129–134
Kong T, Yao A, Chen Y, Sun F (2016) HyperNet: towards accurate region proposal generation and joint object detection. In: IEEE conference on computer vision and pattern recognition, pp 845–853
Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: Machine learning: ECML-94, pp 171–182
Korytkowski M, Rutkowski L, Scherer R (2016) Fast image classification by boosting fuzzy classifiers. Inf Sci 327:175–182
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Kuo BC, Ho HH, Li CH, Hung CC, Taur JS (2014) A kernel-based feature selection method for SVM with RBF kernel for hyperspectral image classification. IEEE J Sel Top Appl Earth Obs Remote Sens 7(1):317–326
Laliberte AS, Browning D, Rango A (2012) A comparison of three feature selection methods for object-based classification of sub-decimeter resolution UltraCam-L imagery. Int J Appl Earth Obs Geoinf 15:70–78
Landgrebe DA (1980) The development of a spectral-spatial classifier for earth observational data. Pattern Recognit 12(3):165–175
Learned-Miller E, Huang GB, RoyChowdhury A, Li H, Hua G (2016) Labeled faces in the wild: a survey. In: Advances in face detection and facial image analysis, pp 189–248
LeCun Y, Cortes C, Burges CJ (2010) MNIST handwritten digit database, vol 2. AT&T Labs. http://yann.lecun.com/exdb/mnist. Accessed August 2019
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Lee J, Weger R, Sengupta S, Welch R (1990) A neural network approach to cloud classification. IEEE Trans Geosci Remote Sens 28(5):846–855
Levin A, Weiss Y (2009) Learning to combine bottom-up and top-down segmentation. Int J Comput Vis 81(1):105–118
Li Z, Tang J (2015) Unsupervised feature selection via nonnegative spectral analysis and redundancy control. IEEE Trans Image Process 24(12):5343–5355
Li R, Lu J, Zhang Y, Zhao T (2010) Dynamic adaboost learning with feature selection based on parallel genetic algorithm for image annotation. Knowl Based Syst 23(3):195–201
Li S, Yu H, Yuan L (2016a) A novel approach to remote sensing image retrieval with multi-feature VP-tree indexing and online feature selection. In: IEEE second international conference on multimedia big data, pp 133–136
Li Y, Shi X, Du C, Liu Y, Wen Y (2016b) Manifold regularized multi-view feature selection for social image annotation. Neurocomputing 204:135–141
Liang Y, Zhang M, Browne WN (2017) Image feature selection using genetic programming for figure-ground segmentation. Eng Appl Artif Intell 62:96–108
Lim YW, Lee SU (1990) On the color image segmentation algorithm based on the thresholding and the fuzzy c-means techniques. Pattern Recognit 23(9):935–952
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: European conference on computer vision, pp 740–755
Liu Y, Zhang D, Lu G, Ma WY (2007) A survey of content-based image retrieval with high-level semantics. Pattern Recognit 40(1):262–282
Liu Y, Cheng MM, Hu X, Wang K, Bai X (2017) Richer convolutional features for edge detection. In: IEEE conference on computer vision and pattern recognition, pp 5872–5881
Lotfabadi MS, Shiratuddin MF, Wong KW (2015) Utilising fuzzy rough set based on mutual information decreasing method for feature reduction in an image retrieval system. In: Innovations and advances in computing, informatics, systems sciences, networking and engineering, pp 177–184
Loughrey J, Cunningham P (2005) Overfitting in wrapper-based feature subset selection: the harder you try the worse it gets. In: Research and development in intelligent systems XXI, pp 33–43
Lu D, Weng Q (2007) A survey of image classification methods and techniques for improving classification performance. Int J Remote Sens 28(5):823–870
Lu J, Zhao T, Zhang Y (2008) Feature selection based-on genetic algorithm for image annotation. Knowl Based Syst 21(8):887–891
Ma Z, Nie F, Yang Y, Uijlings JR, Sebe N (2012) Web image annotation via subspace-sparsity collaborated feature selection. IEEE Trans Multimed 14(4):1021–1030
Ma L, Li M, Gao Y, Chen T, Ma X, Qu L (2017) A novel wrapper approach for feature selection in object-based image classification using polygon-based cross-validation. IEEE Geosci Remote Sens Lett 14(3):409–413
Maaten LVD, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
Makadia A, Pavlovic V, Kumar S (2008) A new baseline for image annotation. In: European conference on computer vision, pp 316–329
Mui JK, Fu KS (1980) Automated classification of nucleated blood cells using a binary tree classifier. IEEE Trans Pattern Anal Mach Intell 2(5):429–443
Ng WW, Dorado A, Yeung DS, Pedrycz W, Izquierdo E (2007) Image classification with the use of radial basis function neural networks and the minimization of the localized generalization error. Pattern Recognit 40(1):19–32
Núñez J, Llacer J (2003) Astronomical image segmentation by self-organizing neural networks and wavelets. Neural Netw 16(3):411–417
Pal M, Foody GM (2010) Feature selection for classification of hyperspectral data by SVM. IEEE Trans Geosci Remote Sens 48(5):2297–2307
Patil U, Mudengudi U (2011) Image fusion using hierarchical PCA. In: International conference on image information processing, pp 1–6
Perez CA, Estévez PA, Vera PA, Castillo LE, Aravena CM, Schulz DA, Medina LE (2011) Ore grade estimation by feature selection and voting using boundary detection in digital image analysis. Int J Miner Process 101(1):28–36
Picard RW, Minka TP (1995) Vision texture for annotation. Multimed Syst 3(1):3–14
Porebski A, Vandenbroucke N, Macaire L (2010) Comparison of feature selection schemes for color texture classification. In: International conference on image processing theory tools and applications, pp 32–37
Qi C, Zhou Z, Sun Y, Song H, Hu L, Wang Q (2017) Feature selection and multiple kernel boosting framework based on PSO with mutation mechanism for hyperspectral classification. Neurocomputing 220:181–190
Raut SA, Raghuwanshi M, Dharaskar R, Raut A (2009) Image segmentation–a state-of-art survey for prediction. In: International conference on advanced computer control, pp 420–424
Remeseiro B, Penas M, Barreira N, Mosquera A, Novo J, García-Resúa C (2013) Automatic classification of the interferential tear film lipid layer using colour texture analysis. Comput Methods Programs Biomed 111(1):93–103
Remeseiro B, Bolon-Canedo V, Peteiro-Barral D, Alonso-Betanzos A, Guijarro-Berdinas B, Mosquera A, Penedo MG, Sanchez-Marono N (2014) A methodology for improving tear film lipid layer classification. IEEE J Biomed Health Inform 18(4):1485–1493
Roffo G, Melzi S, Cristani M (2015) Infinite feature selection. In: IEEE international conference on computer vision, pp 4202–4210
Russell BC, Torralba A, Murphy KP, Freeman WT (2008) LabelMe: a database and web-based tool for image annotation. Int J Comput Vis 77(1):157–173
Sankaran A, Jain A, Vashisth T, Vatsa M, Singh R (2017) Adaptive latent fingerprint segmentation using feature selection and random decision forest classification. Inf Fusion 34:1–15
Schreiber AT, Dubbeldam B, Wielemaker J, Wielinga B (2001) Ontology-based photo annotation. IEEE Intell Syst 3:66–74
Schroff F, Criminisi A, Zisserman A (2008) Object class segmentation using random forests. In: British machine vision conference, pp 1–10
Shafarenko L, Petrou M, Kittler J (1997) Automatic watershed segmentation of randomly textured color images. IEEE Trans Image Process 6(11):1530–1544
Shang C, Barnes D (2013) Fuzzy-rough feature selection aided support vector machines for Mars image classification. Comput Vis Image Underst 117(3):202–213
Shang C, Barnes D, Shen Q (2011) Facilitating efficient Mars terrain image classification with fuzzy-rough feature selection. Int J Hybrid Intell Syst 8(1):3–13
Shen L, Zhu Z, Jia S, Zhu J, Sun Y (2013) Discriminative Gabor feature selection for hyperspectral image classification. IEEE Geosci Remote Sens Lett 10(1):29–33
Shi C, Ruan Q, Guo S, Tian Y (2015) Sparse feature selection based on L2,1/2-matrix norm for web image annotation. Neurocomputing 151:424–433
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: IEEE conference on computer vision and pattern recognition, pp 1–9
Thomaz CE, Giraldi GA (2010) A new ranking method for principal components analysis and its application to face image analysis. Image Vis Comput 28(6):902–913
Tuia D, Camps-Valls G, Matasci G, Kanevski M (2010) Learning relevant image features with multiple-kernel classification. IEEE Trans Geosci Remote Sens 48(10):3780–3791
Vergara JR, Estévez PA (2014) A review of feature selection methods based on mutual information. Neural Comput Appl 24(1):175–186
Wang K, He R, Wang L, Wang W, Tan T (2016a) Joint feature selection and subspace learning for cross-modal retrieval. IEEE Trans Pattern Anal Mach Intell 38(10):2010–2023
Wang Z, Feng Y, Qi T, Yang X, Zhang JJ (2016b) Adaptive multi-view feature selection for human motion retrieval. Signal Process 120:691–701
Weinberger KQ, Saul LK (2006) Unsupervised learning of image manifolds by semidefinite programming. Int J Comput Vis 70(1):77–90
Wen X, Shao L, Fang W, Xue Y (2015) Efficient feature selection and classification for vehicle detection. IEEE Trans Circuits Syst Video Technol 25(3):508–517
Xue B, Zhang M, Browne W, Yao X (2016) A survey on evolutionary computation approaches to feature selection. IEEE Trans Evol Comput 20(4):606–626
Yao C, Han J, Nie F, Xiao F, Li X (2018) Local regression and global information-embedded dimension reduction. IEEE Trans Neural Netw Learn Syst 29(10):4882–4893
Zeng Z, Wang X, Chen Y (2017) Multimedia annotation via semi-supervised shared-subspace feature selection. J Vis Commun Image Represent 48:386–395
Zhang H, Fritts JE, Goldman SA (2008) Image segmentation evaluation: a survey of unsupervised methods. Comput Vis Image Underst 110(2):260–280
Zhang D, Islam MM, Lu G (2012a) A review on automatic image annotation techniques. Pattern Recognit 45(1):346–362
Zhang X, Wang W, Li Y, Jiao L (2012b) PSO-based automatic relevance determination and feature selection system for hyperspectral image classification. Electron Lett 48(20):1263–1265
Zhang R, Nie F, Li X (2018) Self-weighted supervised discriminative feature selection. IEEE Trans Neural Netw Learn Syst 29(8):3913–3918
Zhao W, Du S (2016) Spectral-spatial feature extraction for hyperspectral image classification: a dimension reduction and deep learning approach. IEEE Trans Geosci Remote Sens 54(8):4544–4554
Zhao ZA, Liu H (2011) Spectral feature selection for data mining. CRC Press, Boca Raton
Zheng W, Zhu X, Zhu Y, Zhang S (2018) Robust feature selection on incomplete data. In: Proceedings of the twenty-seventh international joint conference on artificial intelligence, pp 3191–3197
Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A (2014) Learning deep features for scene recognition using places database. In: Advances in neural information processing systems, pp 487–495
Zhou X, Gao X, Wang J, Yu H, Wang Z, Chi Z (2017) Eye tracking data guided feature selection for image classification. Pattern Recognit 63:56–70
Zhu C, Jia H, Lu T, Tao L, Song J, Xiang G, Li Y, Xie X (2017) Adaptive feature selection based on local descriptor distinctive degree for vehicle retrieval application. In: IEEE international conference on consumer electronics, pp 66–69
Zou Q, Ni L, Zhang T, Wang Q (2015) Deep learning based feature selection for remote sensing scene classification. IEEE Geosci Remote Sens Lett 12(11):2321–2325
Acknowledgements
This research has been financially supported in part by European Union FEDER funds, by the Spanish Ministerio de Economía y Competitividad (research project TIN2015-65069-C2), by the Consellería de Industria of the Xunta de Galicia (research project GRC2014/035), and by the Principado de Asturias (research project IDI-2018-000176). Financial support from the Xunta de Galicia (Centro singular de investigación de Galicia accreditation 2016–2019) and the European Union (European Regional Development Fund, ERDF) is gratefully acknowledged (research project ED431G/01). We are particularly grateful to Brais Cancela and Amparo Alonso-Betanzos for our stimulating discussions and their comments on the manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Bolón-Canedo, V., Remeseiro, B. Feature selection in image analysis: a survey. Artif Intell Rev 53, 2905–2931 (2020). https://doi.org/10.1007/s10462-019-09750-3