Abstract
A traditional feature selection of filters evaluates the importance of a feature by using a particular metric, deducing unstable performances when the dataset changes. In this paper, a new hybrid feature selection (called MFHFS) based on multi-filter weights and multi-feature weights is proposed. Concretely speaking, MFHFS includes the following three stages: Firstly, all samples are normalized and discretized, and the noises and the outliers are removed based on 10-folder cross validation. Secondly, the vector of multi-filter weights and the matrix of multi-feature weights are calculated and used to combine different feature subsets obtained by the optimal filters. Finally, a Q-range based feature relevance calculation method is proposed to measure the relationship of different features and the greedy searching policy is used to filter the redundant features of the temp feature subset to obtain the final feature subset. Experiments are carried out using two typical classifiers of support vector machine and random forest on six datasets (APS, Madelon, CNAE9, Gisette, DrivFace and Amazon). When the measurements of F1macro and F1micro are used, the experimental results show that the proposed method has great improvement on classification accuracy compared to the traditional filters, and it achieves significant improvements on running speed while guaranteeing the classification accuracy compared to typical hybrid feature selections.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
1 Introduction
In machine learning fields, a sample often contains a large number of features, many of which are highly correlated or even logically redundant, leading to the problems of high computational complexity and low interpretability [1]. As a consequence, many dimension reduction methods such as feature selection, feature reduction and feature extraction have been studied deeply in recent years. Feature selection is concerned with identifying a small subset of relevant features that are sufficient for learning the target concept, while the aim of feature reduction is to eliminate the logically redundant features from the original feature space without sacrificing the classification accuracy [2]. Different with feature reduction and feature selection, feature extraction transforms a high dimensional feature space into a distinct low dimensional feature space through a transformation of the original feature space [3]. Typical feature extraction methods include: AutoEncoder (AE) [4], latent semantic indexing (LSI) [5], partial least square (PLS) [6], multidimensional scaling [7], principal component analysis (PCA) [8] and latent Dirichlet allocation (LDA) [9]. However, compared to the feature selections and feature reductions, feature extractions cannot be easily interpreted since the physical meaning of the original features cannot be retrieved, limiting its application in dimension reduction.
The methods of feature selections can be divided in to two types: filters and wrappers [10]. Filters evaluate the importance of the features separately based on some predefined metrics instead of using the classifiers. Features are measured and ranked according to their importance using simple measurements such as distance, dependency and information. The most widely used filters include: Chi-square (CHI) [11], improved gini index (IMGI) [12], distinguishing feature selector (DFS) [13], odds ratio (OR) [14], document frequency (DF) [11], Darmstadt indexing approach (DIA) [15], information gain (IG) [16], mutual information (MI) [17], F-score [18], orthogonal centroid feature selection (OCFS) [19], feature selection considering the imbalance problem in text categorization (CMFSX) [20], supervised meaning rank (SMR) [21], unsupervised meaning average (UMA) [21], balanced accuracy measure (AAC2) [22], normalized difference measure (NDM) [23], clustering based feature selection (CBFS) [24], improved mutual information feature selection (NDMI) [25], novel feature selection based on normalized mutual information (NNMI) [26], multi-label feature selection based on max-dependency and min-redundancy (MDMR) [27]. Among these methods, NDMI, NNMI and MDMR are MI based filters which use the greedy searching policy to select the best features by calculating the mutual information of feature-feature and feature-class. A wrapper estimates the accuracy of a classifier by adding each unselected feature to the feature subset until the accuracy is less than the estimated accuracy of the feature set already selected [28]. Typical feature selections of wrappers include: forward search (FS), backward search (BS), sequential floating search (SFS) and simplified silhouette filter (SSF) [29]. Compared to the wrapper methods, though filters may provide less precise results, they are particularly efficient when dealing with large datasets.
In order to combine the advantages of filters or wrappers, many hybrid methods are proposed in recent years. Typical hybrid feature selections include: cluster based hybrid feature selection approach (CBHFS) [30], improved global feature selection scheme (IGFSS) [31], variable global feature selection scheme (VGFSS) [32], hybrid dimension reduction by integrating feature selection with feature extraction method (TDPFS) [3], novel feature selection based on harmony search (HFS) [33], hybrid approach of differential evolution and artificial bee colony for feature selection (HDAFS) [34], particle swarm optimization for feature selection (PSOFS) [35], hybrid feature selection based on enhanced genetic algorithm (EGAFS) [36], hybrid feature selection using component co-occurrence based feature relevance measurement (HFSCC) [37], multi-measure multi-weight ranking approach (MMMW) [38], multi-filter based feature selection (EMFFS) [39], two-step based feature selection (TSFS) [40], etc. These methods have good performances on classification accuracy, thus have been widely used in data classification fields.
In this paper, a new hybrid feature selection (called MFHFS) based on the information of multi-filter weights and multi-feature weights is proposed. In our experiments, we applied the proposed hybrid feature selection method on SVM and RF using six benchmark datasets. We show the effectiveness of our method by demonstrating that it significantly outperforms typical existing feature selection methods on the aspects of classification accuracy or running speed. The remainder of this paper is organized as follows. Section 2 reviews the related work on feature selections. Section 3 gives the details of the proposed method. Section 4 shows the experimental results. Section 5 concludes the whole paper.
2 Related work
2.1 The filters
2.1.1 The IMGI method
In order to apply the feature selection to data classification tasks with multiple class labels, Shang [14] improved the traditional Gini index method [41] and proposed the IMGI method. Given a feature ti, its IMGI value is defined as follows:
where p(ti|ck) represents the conditional probability that feature ti occurs in category ck, p(ck|ti) represents the conditional probability that a sample belongs to ck when it contains ti.
2.1.2 The CHI method
Given a category ck and a feature ti, ti has strong category discriminative ability if it has a high CHI value. The CHI method calculates the score of ti as follows [42]:
where N represents the total number of samples, aik represents the frequency that ti and ck co-occur, bik represents the frequency that ti does not occur in ck, eik represents the frequency that class ck occurs and does not contain feature ti, and dik represents the frequency that neither ck nor ti occurs.
2.1.3 The DFS method
DFS is one of the successful feature selections and is also a global feature selection metric [13]. The idea of DFS is to select a set of distinctive features while eliminating the uninformative ones. The formula of DFS is defined as follows [31]:
where \( p\left(\overline{t_i}|{c}_k\right) \) is the conditional probability of absence of term ti given class ck, and \( p\left({t}_i|\overline{c_k}\right) \) is the conditional probability of term ti given all the classes except ck.
2.1.4 The CMFSX method
Yang proposed a new feature selection method (called CMFSX) which can weaken the adverse effect caused by the imbalance factor in the dataset. CMFSX calculates the score of a feature ti as follows [20]:
2.1.5 The SMR method
SMR uses a class of documents as the basic unit or context in order to calculate the meaning scores for words. Assume that a feature ti appears k times in the dataset S, and m times in the documents of class ck, SMR calculates the score of ti as follows [21]:
where L and B are the feature frequency of ti in the dataset and class ck, respectively.
2.1.6 The NDM method
NDM is a modified form of ACC2 measure [22] which solves the problem of class imbalance by normalizing true positive and false positive rates by the respective class size. Mathematically, NDM is defined as follows [23]:
Where tpi is the number of samples belonging to the positive classes and containing the term ti, fni is the number of samples not belonging to the positive classes and not containing the feature ti; fpi is the number of samples not belonging to the positive classes and not containing the feature ti.
2.1.7 The NDMI method
NDMI uses the mutual information of feature-feature and feature-class to determine an optimal set of features [25]. It uses a greedy searching policy to select the most discriminative features and filters the redundant ones. Given the feature ti, its NDMI score is defined as follows:
where S is the set of selected features, |S| is the number of features in S, p(ti) denotes the occurring probability that a sample contains the feature ti, p(ck) denotes the probability of the samples in category ck, p(ti, ck) denotes the probability that ti occurs in ck, p(ti, tj) denotes the probability that ti and tj both occurs in a sample.
2.1.8 The MDMR method
Different from traditional single-label feature selection, MDMR considers not only the redundancies between individual features or the redundancies between class label and the candidate features, but also the conditional dependencies between features and each class label. The objective function is described as follows [27]:
where MI(ti, cj| tl) is the mutual information between the candidate feature ti and all categories when given the selected feature tl.
2.2 The hybrid methods
As different filters use different metrics to evaluate the feature importance, they always output different feature subsets for a particular dataset. In recent years, hybrid methods which combine different feature evaluating metrics altogether have received considerable attentions.
2.2.1 The EGAFS method
This approach combines the advantages of the filters with an enhanced genetic algorithm (EGA) in a wrapper approach to handle the high dimensionality of the feature space [36]. EGA improves the crossover and mutation operators of traditional genetic algorithms. The crossover operation is performed based on feature subset partitioning, while the mutation is performed based on the classifier performance of the original parents and feature importance. Moreover, this method combines six well-known filters with the EGA to obtain the final feature subset. Though EGAFS has high classification accuracy, it uses wrappers and deduces the problem of high computational complexity.
2.2.2 The TDPFS method
Based on two feature selections and a feature extraction method, Bharti and Singh proposed the TDPFS method for text clustering [3]. Firstly, a typical term frequency based feature selection and a typical document frequency based feature selection are used to obtain two feature subsets, respectively. Then, the modified union operation is proposed to merge the features of these feature subsets. Finally, the PCA algorithm is applied to further transform the features in an interpretable feature space to reduce the dimension further without losing much information.
2.2.3 The IGFSS method
Although the features selected by traditional feature selections scheme can represents some of the classes successfully, some of the classes still may not be represented. Uysal proposed an improved global feature selection scheme (IGFSS) where a traditional feature selection is modified to obtain a more representative feature set [31]. In other words, IGFSS aims to improve the performance of traditional feature selections by selecting the features which represents all classes almost equally.
2.2.4 The VGFSS method
Though IGFSS solves the problem that some of the classes may not be represented by selecting an equal number of representative features from all the classes. However, this method is not suitable for an unbalanced dataset which has many classes, deducing the problem that some important features of the class that contains a higher number of features may be ignored. On this basis, Agnihotri and Verma proposed the VGFSS method to select a variable number of features from each class based on the distribution of features [32]. The number of selected features in each class is defined as follows:
Where count(Cj) is the number of features of class Cj, N is the number of final selected features, TFC is the number of all features in the dataset.
2.2.5 The CBHFS method
Jaskowiak and Campello presented a hybrid feature selection (CBHFS) tailored for data classification problems [30]. This method consists of two main stages: in the first stage, it uses a clustering algorithm to identify the best features and remove the redundant ones [18]; in the second stage, a wrapper is used to evaluate different feature subsets produced by the clustering processes, determining the one that produces the best classification performance in terms of accuracy. This method has high classification accuracy but faces the problems of parameter dependency and high computational complexity.
2.2.6 The HFSCC method
Wang and Feng proposed a hybrid feature selection which can achieve the best features effectively and efficiently [37]. HFSCC consists of three steps: firstly, the samples are preprocessed and two feature subsets are obtained by using two different optimal filters; secondly, a feature weight based union operation is proposed to merge the obtained feature subsets; finally, the hierarchical agglomerative clustering algorithm is applied to obtain the final feature subset by combining a component co-occurrence based feature relevance measurement and a predetermined threshold. Experimental results show that this method achieves high classification accuracy and execution speed.
2.2.7 The MMMW method
As the features selected by different feature selections are always different, it is not enough to evaluate the importance of a feature by using only one particular feature selection. Bhattacharya and Selvakumar proposed the MMMW method which combines a filter, a wrapper and a clustering based feature selection to select the best features [38]. Though MMMW uses novel metrics to assign weights to the features, it ignores the weights of different methods on different datasets.
2.2.8 The EMFFS method
Osanaiye and Cai proposed a multi-filter based feature selection that combines the results of four filters to achieve the best features [39]. This method uses a simple majority voting policy to merge the features selected by different filters. Moreover, a threshold is predetermined to select the frequently occurring features among the four filters to construct the set of final features. However, this method ignores the fact that different filters have different performances on different datasets, and the optimal value of the predetermined threshold is hard to tune.
As is shown in Table 1, the main advantages and limitations of the above feature selections can be described as follows: (1) The traditional filters have high running speed, but they cannot filter the redundant features and the output results rely on the feature importance measurements and datasets seriously; (2) The MI based filters and wrappers can filter the redundant features, but they generally have high computation complexities when dealing with the data sets with high numbers of features or samples [40]; (3) The hybrid methods can obtain higher classification accuracy than those of the other methods, but some are very time consuming as they use the wrappers to obtain the best feature subset. Moreover, some existing hybrid methods combine the results of different filters by using a simple majority voting or set union mechanism [37], ignoring the effects of feature weight and filter weight.
For solving these problems, we propose a new hybrid feature selection (called MFHFS) based on the information of multi-filter weights and multi-feature weights. The main contributions of this paper are given as follows: (1) A new hybrid feature selection frame which combines the advantage of traditional filters on running speed and the advantage of greedy searching policy on filtering the redundant features is proposed. (2) In order to avoid the problem that the performance of a filter is unstable when the dataset changes, the vector of multi-filter weights is introduced. Moreover, in order to distinguish the importance of different features from different datasets, the matrix of multi-feature weights is proposed to obtain the most discriminative features. (3) A Q-range based feature relevance calculation method is proposed to improve the running speed of achieving the feature relevance matrix when filtering the redundant features.
3 The proposed method - MFHFS
3.1 Description of MFHFS
Though the traditional filters have high executing speed, they fail to filter the redundant features and their performances rely on the feature importance measurements and the datasets seriously. Moreover, though the MI based filters or the hybrid methods can filter the redundant features, they always face the problem of high computational complexity, decreasing the executing speed when the number of features or the number of samples is high [37]. On this basis, we combine the advantages of the methods with different types, and propose a new multi-filter weights and multi-feature weights based feature selection (called MFHFS) of which the flowchart is shown in Fig. 1. MFHFS can be executed in the following three steps: (1) Data preprocessing: the samples are normalized and discretized by using the equal width interval binning (EWIB) algorithm, and the noises and outliers are filtered by combining the 10-folder cross validation; (2) Feature combination: several feature subsets are obtained by a set of optimal filters, and they are used to obtain a temp feature subset FSt by considering the multi-filter weights and multi-feature weights; (3) Feature refinement: a Q-range based feature relevance calculation method is proposed to obtain the feature relevance matrix of FSt, and the final features are selected from FSt by using the greedy searching policy. The details of MFHFS are described as follows:
-
Stage 1: Data preprocessing
In the pre-processing step, three operations: normalization, discretization and removal of noises and outliers are executed. Normalization aims to adjust values measured on different scales to a notionally common scale. It can reduce the computational cost and solve the problem that the classification accuracy may be unstable when the features have large ranges of values. Discretization is the process of transferring continuous data into discrete counterparts. It is used to reduce the computational complexity and improve the stability of feature selection in the preprocess of the proposed method. Moreover, as the noises and outliers may affect the distribution characteristic of the entire dataset, we will remove them to improve the robustness of the proposed method. The details of data preprocessing are given in Algorithm 1.
In the normalization step (step 1.3), fmin and fmax are set to fmin = 0 and fmax = 6; in the discretization step (step 1.4), NQ is set to NQ = fmax + 1, and the interval width is set to \( \delta =\frac{f_{\mathrm{max}}-{f}_{\mathrm{min}}}{N_{\mathrm{Q}}} \) [37].
Stage 2: Feature combination
Different traditional filters use different metrics to evaluate the feature importance, thus the performance of a filter may be unstable when the dataset changes. Osanaiye et al. proposed an ensemble-based multi-filter feature selection that combines the output of four filter methods to achieve an optimum selection [40]. However, this method uses a simple majority vote to determine the final selected features by combining a predetermined threshold, thus deduces the following problems: (1) It treats different filters equally thus ignores the filter weights, decreasing the classification accuracy when some of the filters have bad performances on particular datasets. (2) The features selected by a filter are treated equally, ignoring the fact that the category discriminative abilities of these features are always different. (3) The predetermined threshold is related to the number of selected features and is hard to tune. On this basis, we propose a novel feature combination method by introducing the vector of multi-filter weights and the matrix of multi-feature weights, and the executing flowchart of this stage is given in Fig. 2. Given the preprocessed entire sample set ES′, we first obtain the vector of multi-filter weights and denote it as A = (α1, α2, …, αL). As is shown in Algorithm 2, the details are given as follows:
Further, we obtain the matrix of multi-feature weights and denote it as B = {Bi} = {(βi1, βi2, …, βij, …, βiM)}. As is shown in Algorithm 3, the details are given as follows:
On this basis, the 1 × M size vector of final feature weights is obtained by calculating the dot product of the 1 × L size vector of multi-filter weights and the L × M size matrix of multi-feature weights, and the temp feature subset FSt is obtained by the steps described in Algorithm 4:
From Algorithms 2–4 we know that we consider the effects of the multi-filter weights and the multi-feature weights by introducing the vector W, thus can avoid the problem that the optimal filters or the selected features are treated equally when searching the best features. For ease of computation, we set the number of optimal filters L = 4 in this paper. Thus, the time complexity of obtaining the final feature subset FSt is O(4 × (M|C| + Mlog2M + 3 N) + Nlog2N).
Stage 3: Feature refinement
In order to filter the redundant features in FSt, TDPFS [3] uses the PCA algorithm to transform the high dimensional feature space into an un-interpreted low dimensional feature space. EGAFS [36] uses the enhanced GA based wrapper method to select the optimal features, thus has the problem of high computation cost. On this basis, we utilize the greedy searching policy [25,26,27] to obtain the optimal final feature subset from FSt of which the feature number is much lower than that of entire dataset. The details of this stage are given as follows:
- (1)
obtain the feature relevance matrix
HFSCC [37] obtains the feature relevance matrix by calculating the feature relevance of each pair of the features in FSt. This method ignores the fact that a feature is only redundant with the ones which have equal or approximate global feature weight values, thus deduces a high computation complexity of O(N2). In this paper, we suppose that the proportion of redundant features in FSt is no higher than r (0 < r < 1), and propose a new feature relevance calculation method which only calculate the relevance between a feature fi and the features which have similar normalized global feature weight values with fi. As is shown in Fig. 3, given the temp feature subset FSt obtained in stage 2, we first normalize the final feature weight values of FSt, then calculate the feature relevance matrix Φ = {ϕij} (1 ≤ i, j ≤ N) using formula (15):
where fi and fj are the ith and jth features in the feature subset FSt; Q is a factor which ranges in the interval (0, 1); WN(fi) and WN(fj) are the normalized final feature weight values of fi and fj, respectively; cij denotes the feature relevance of fi and fj using the CCFR algorithm of HFSCC method; Ω denotes the corresponding component values (possible values) for each feature; fik denotes the kth component value of fi and fjl denotes the lth component value of fj; p(fik| fjl) denotes the conditional probability that fik occurs when fjl occurs; p(fjl| fik) denotes the conditional probability that fjl occurs when fik occurs; p(fik, fjl) is the feature component based normalization coefficient which denotes the probability that fik and fjl occur together in the dataset.
- (2)
select the final feature subset using the greedy searching policy
Though HFSCC [37] and TSFS [40] can filter the redundant features of FSt, they both require a predetermined threshold th which has great influence on the number of selected features. Therefore, we apply the greedy searching policy of traditional MI filters on the sorted feature subset FSt to obtain the final feature subset FS. A greedy algorithm is proposed for solving an optimization problem of finding the solution that maximises the measured fitness [44]. When dealing with an optimization problem, greedy searching policy selects the local optimal solution in each step, and hopes to generate a global optimal solution through a series of local optimal selections. As this policy measures the importance of the features by maximining the relevance between the features and the classes and minimizing the redundancies among the selected features [26, 27], in this paper, the objective function of selecting a feature f from FSt is defined as follows:
Where S is the set of selected features; θ (θ > 0) and φ (φ > 0) are two parameters balancing the feature category discriminative ability and the feature redundancy. In order to emphasize the importance of the feature category discriminative ability and weaken the data loss that may be brought by setting some elements in Φ to zero directly, the parameters of θ and φ are set to θ = 1 and φ = 0.5 in this paper. On this basis, the details of stage 3 are shown in Algorithm 5:
From Algorithm 5 we know that the time complexity of steps 2–3 is O(N + pN2) (0 < p < 1, p is the probability that the expression ∣WN(fi) − WN(fj) ∣ < Q is true). As there are few redundant features in FSt, Q is much lower than 1 in practical situations, which means that the running speed of the proposed Q-range based method is 1/p times of that of the HFSCC method. Moreover, we notice that the time complexity of step 4 is \( O\left(\sum \limits_{i=0}^{N_{\mathrm{s}}-1}\left({N}_1-i\right)\times i\right)=\frac{\left({N}_{\mathrm{s}}-1\right){N}_{\mathrm{s}}{N}_1}{2}-\frac{\left({N}_{\mathrm{s}}-1\right){N}_{\mathrm{s}}\left(2{N}_{\mathrm{s}}-1\right)}{6} \), thus we conclude that the overall time complexity of Algorithm 5 is \( O\left(N+p{N}^2+\frac{\left({N}_{\mathrm{s}}-1\right){N}_{\mathrm{s}}N}{2}-\frac{\left({N}_{\mathrm{s}}-1\right){N}_{\mathrm{s}}\left(2{N}_{\mathrm{s}}-1\right)}{6}\right) \).
For ease of understanding, we give an example to show the executing processes of MFHFS. As is shown in Table 2, the preprocessed samples are denoted as ES′ = {s1, s2, s3, s4, s5, s6, s7, s8} and the set of class labels are denoted as La = {1,2,3}. Moreover, we suppose that the set of optimal filters used in Algorithms 2–4 is OFS = {IG, IMGI, CHI, DFS}, the corresponding vector of multi-filter weights is set to A = (1, 1, 1, 1), the feature number of FSt is set to N = 8, the factor Q is set to Q = 0.1, and the feature number of FS is set to Ns = 6. On this basis, Table 3 shows the feature scores when different methods are used and the corresponding feature orders are given in the brackets. Moreover, Tables 4 and 5 give the feature relevance matrices of the features in FSt when HFSCC and the proposed Q-range based method are used, respectively. Obviously, we know from Table 4 that the features f1 and f2 are redundant with each other as their corresponding feature relevance equals to 1. Further, it can be seen from Table 5 that the feature relevance between f1 and f2 can also be achieved by the proposed Q-range based method accurately and the computational cost of our method is 25% (16 of 64 cases) of that of HFSCC method. Moreover, Table 6 gives the Score values of the features in FSt and the changes of the final feature subset FS when the number of selected feature increases. Obviously, we know from Table 6 that the redundant feature f2 is filtered and the final selected feature subset is FS = {f4, f5, f1, f8, f9, f7}.
3.2 Time complexity analysis of MFHFS
The time complexity of the proposed feature selection consists of three stages:
- (1)
The stage of data processing: we know from Algorithm 1 that the complexity of this stage is T1 = O(MD) + L×(O(flt) + O(clat) + |ES| × O(clap)), where O(flt), O(clat) and O(clap) are the time complexities of the filter flt, the training and the predicting processes of classifier cla. However, as this stage can be applied to all feature selections, we do not consider this part when calculating the time complexity of MFHFS.
- (2)
The stage of feature combination: we know from Algorithms 2–4 that the complexity of this stage is T2 = O(4 × (M| C| +Mlog2M + 3N) + Nlog2N), where M is the number of all features and N is the number of selected features in Algorithm 2.
- (3)
The stage of feature refinement: we know from Algorithm 5 that the time complexity of this stage is \( {T}_3=\mathrm{O}\left(N+p{N}^2+\frac{\left({N}_{\mathrm{s}}-1\right){N}_{\mathrm{s}}N}{2}-\frac{\left({N}_{\mathrm{s}}-1\right){N}_{\mathrm{s}}\left(2{N}_{\mathrm{s}}-1\right)}{6}\right) \), where Ns is the number of features in FS. Generally, there exists Ns ≈ N> > 1, thus we have:
On this basis, we combine the results of T2 and T3 and remove the constant coefficients, then obtain the overall time complexity TMFHFS (constant coefficients are removed) as follows:
When considering the traditional MI based filters (like NDMI and NNMI), according to [35] we know that their computational complexities are all:
When the value of M is large enough, there generally exists M/2> > 1/3, thus we remove the constant coefficients of formula (21) and transform TMI to the following formula:
Then, we have:
As there exists |C| < <Ns2, log2M < <Ns2 and Ns < <M, then we have \( \frac{T_{\mathrm{MFHFS}}}{T_{\mathrm{MI}}}<<1 \), which means that the computational complexity of MFHFS is obviously lower than that of the traditional MI based filters.
4 Experimental results and analysis
We developed our hybrid method on a computer having Windows 10 Home operating system, 8 GB of RAM, and Intel Core i7–7500 processor. In this section, to verify the classification performance of the selected feature subsets, a series of experiments are conducted to compare the proposed method with typical feature selections. All experiments are implemented with matlab 2015a which is popular on machine learning and data mining. Moreover, the 10-folder cross validation is used to test the performances of different methods.
4.1 Datasets
In this section, we select six datasets which contain more than 100 features from UCI machine learning repository [45, 46]. Table 7 represents the characteristics of these datasets including the numbers of features, the numbers of instances, and the numbers of classes. These datasets included APS, Madelon, CNAE9, Gisette, DrivFace and Amazon. As Amazon contains too many classes, for ease of computation, we only consider the top six classes in which each class consists of 30 samples.
4.2 Classifiers
In order to investigate the performance of the proposed algorithm, two well-known classifiers: support vector machine (SVM) [47] and random forest (RF) [48] are used. SVM is a discriminative classifier which is formally defined by a separating hyper-plane. In SVM, a good separation is achieved by the hyper-plane that has the largest distance to the nearest training data point of any class. RF is a classification method which combines multiple tree predictors in a way that each tree depends on a value of randomly chosen vector distributed among all trees in forest in the same way [49]. The parameters of these classifiers are given as follows: the number of trees in RF classifier is set to nt = 5; the libSVM library [50] is used for SVM classifier with the parameters c = 15,000 and gamma = 0.07; the classifier used in Algorithms 1–2 is SVM.
4.3 Evaluation measurements
The macro-averaged F1 measurement and the micro-averaged F1 measurement are used to evaluate the performances of different feature selections. Given class c, the F1 measurement (F1c) which is defined in formula (24) considers both the precision pc and the recall rc [21]:
Where tpc is the number of the samples which are correctly classified into c, fpc is the number of the samples which are wrongly classified into c, fnc is the number of the samples which belong to c but are wrongly classified. Obviously, the F1c measurement is a harmonic mean of precision and recall whose best value is 1 and worst value is 0 [15]. On this basis, the definition of Macro-averaged F1 measurement (F1macro) is given as follows:
Where |C| is the number of classes in the dataset.
Different with Macro-averaged F1 measurement, Micro-averaged F1 measurement sums up all classification decisions for all classes of a dataset to calculate global precision and recall. The calculation for Micro-averaged F1 measurement is shown as follows:
Where p is the global precision for all classes and r is the global recall for all classes. The definitions of p and r are shown in formula (27) [51, 52]:
4.4 Selection of the optimal filters in OFS
From Algorithms 2–4 we know that, a good selection of the optimal filters may deduce a feature subset FSt with high quality. On this basis, eight traditional filters (IG, IMGI, CHI, CMFSX, DIA, DFS, CDM [51] and OR) are used for experiments to select the set of four optimal filters OFS = {of1, of2, of3, of4}. When the classifiers of SVM and RF are used on different datasets, we calculate the average F1macro values (Fmac_a) and average F1micro values (Fmic_a) of different filters as the ratio of selected features ranges from 2% to 10% with a step of 2%. Further, the average values (called Fa) of Fmac_a values and Fmic_a values with respect to each method are calculated and the results are shown in Tables 8 and 9. For ease of understanding, the highest Fa values with respect to each dataset are denoted in bold. We know from Tables 8 and 9 that CHI, CMFSX and DFS perform generally better than the other methods as they obtain the highest Fa values for 6, 4 and 3 times, respectively. Moreover, Fig. 4 give the average Fa values (Faa) of each method with respect to all datasets when SVM and RF are used, respectively. As the performances of CHI, CMFSX, DFS and IMGI are obviously better than those of the other filters, they are selected to form the set of optimal filters OFS = {CHI, CMFSX, DFS, IMGI}. Moreover, according to Algorithm 2, we obtain the vectors of multi-filter weights of different datasets and the results are shown in Table 10 and 11.
4.5 Sensitivity analysis of the parameter Q
From Algorithm 5 we know that the parameter Q affects the running speed of calculating the feature relevance matrix. In this section, we evaluate the performances of the Q-range based method when Q ranges from 0.1 to 1.0 with a step of 0.1. With respect to each value of Q, we calculate the average consuming time (denoted as ta and expressed in seconds) of step 3 in Algorithm 5 when the ratio of selected features ranges from 1% to 10% on each dataset, and the results are shown in Fig. 5. Further, given a predetermined threshold th (th = 0.9 in this paper) which is approaching 1, we calculate the average probability (denoted as pa) that the feature relevance is higher than th when the ratio of selected features ranges from 1% to 10% to test the ability of the Q-range based method on achieving the redundant features, and the results are also shown in Fig. 5. Obviously, when considering the datasets of APS, Madelon, Gisette, DrivFace and Amazon, we can see from these figures that the pa values remain unchanged, but the ta values increase gradually with the highest increment of about 120 s when Gisette is used. When considering the dataset of CNAE9, we notice that the pa values have slight increments but the ta values have significant increments when value of Q increases. Therefore, we conclude that, when Q is greater than 0.1, it has great effect on the running speed of calculating the feature relevance matrix but little effect on the performance of filtering the redundant features. On this basis, in order to improve the efficiency of the proposed method while guaranteeing the classification accuracy, Q is set to Q = 0.1 in this paper.
4.6 Comparisons of MFHFS with typical filters
In order to validate the performances of different methods, we compare typical methods of filters (SMR [21], NDM [23], NDMI [25], MDMR [27], NFS [52], OCFSX [53] and FECS_FR [54]) on the aspects of classification accuracy and running speed. Figure 6, 7, 8, 9, 10 and 11 gives the average values (called F1m) of F1macro values and F1micro values of SVM and RF classifiers when the ratio of selected features ranges from 2% to 10% with a step of 2%. Obviously, when SVM is used, the F1m values of MFHFS are greater than those of the other filters in most of the cases when the ratio of selected features is no less than 0.06. As the ratio of selected features increases, the performances of MFHFS are much more stable than those of the other filters, showing the robustness of the proposed method on different datasets. When RF is used, the performances of SMR and NFS are less stable than those of the other filters as the ratio of selected features increases. MFHFS obtains the highest F1m values in all cases with respect to the datasets of APS, DrivFace and Amazon, with the highest F1m value (0.994) when 0.04 features are selected from the dataset of APS.
Further, Tables 12, 13, 14 and 15 given the average F1macro values (Fmac_a) and average F1micro values (Fmic_a) with the ratio of selected features ranging from 2% to 10% when the classifiers of SVM and RF are used, respectively. In these tables, the highest values of each dataset are denoted in bold. Obviously, we know from Tables 12 and 13 that when SVM is used, MFHFS obtains the highest Fmac_a values except for the DrivFace dataset, and it obtains the highest Fmic_a values except for the Madelon and DrivFace datasets. Moreover, it can be seen from Tables 14 and 15 that, when RF is used, MFHFS outperforms the other methods for six and five times on the measurements of Fmac_a and Fmic_a, respectively, illustrating the effectiveness of the proposed method on selecting the best features. Therefore, we conclude that the performance of MFHFS is generally better than those of the typical filters on classification accuracy. This may due to the following two reasons: 1) MFHFS filters the noises and outliers in the preprocessing process, avoiding the overfit problem and improving the robustness of the proposed method; 2) MFHFS achieves the best features of different optimal filters by combining the multi-filter weights and the multi-feature weights, differentiating the importance of different features and solving the problem that the classification performance is not stable when the dataset changes.
Furthermore, Table 16 shows the average running time (called rta, expressed in seconds) of different methods when the ratio of selected features ranging from 2% to 10% with a step of 2%. It can be seen from Table 16 that the filters of OCFSX, SMR, NFS, FECS_FR and NDM run obviously faster than MFHFS. This is due to the fact that MFHFS combines multiple filters and executes redundant feature filtering process which is time-consuming on calculating the feature relevance matrix. However, when comparing MFHFS with NDMI and MDMR, we notice that the rta values of the former method are obviously lower than those of the later methods. For example, when Gisette dataset is used, the rta values of the NDMI and MDMR are 1260.371 and 4387.189, respectively, while the corresponding rta value of MFHFS is 87.162. Therefore, by combining Tables 12, 13, 14 and 15 we conclude that: 1) for the aspect of classification accuracy, the performances of MFHFS are generally better than those of typical filters; 2) for the aspect of running speed, MFHFS has acceptable decrements compared to the traditional filters (OCFSX, SMR, NFS, FECS_FR and NDM) and has great increments compared to the MI based filters (NDMI and MDMR).
4.7 Comparisons of MFHFS with typical feature extractions and hybrid feature selections
In this section, we conduct a series of experiments to compare MFHFS with several typical feature extractions and hybrid feature selections: AE [4], PCA [8], CBHFS [30], VGFSS [32], EGAFS [36], EMFFS [39], MFHFS1 and MFHFS2. Among these methods, MFHFS1 is modified from MFHFS and calculates the feature relevance matrix using HFSCC method [37]; MFHFS2 is modified from MFHFS without filtering the noises and outliers in the data preprocessing stage. Here, the parameters of each method are given as follows: (1) AE: the maximum epochs: Me = 100; (2) EMFFS: the optimal filters are CMFSX, CHI, IMGI and DFS; (3) EGAFS: number of particles: Np = 20, number of iterations: Nit = 10; (4) CBHFS: number of the K values: Nk = 10, number of repetitions for each K: Nr = 10. Moreover, the classifiers used in the wrappers of these methods are SVM and RF. On this basis, when the ratio of selected features ranges from 2% to 10% with a step of 2%, the average F1macro values (Fmac_a) and average F1micro values (Fmic_a) are shown in Tables 17, 18, 19 and 20, and the average Fmac_a (Fmic_a) values of all datasets for each method when SVM and RF are used are given in Fig. 12.
We can see from Tables 17 and 18 that when SVM is used, MFHFS obtains the highest values for five times, which is obviously higher than those of the other methods. The performances of MFHFS and MFHFS1 are similar and generally better than that of NFHFS2, showing the efficiency of filtering noises and outliers in the preprocessing stage on improving the classification accuracy. Moreover, we can see from Fig. 12a and b that the average Fmac_a values and average Fmic_a values of the proposed method are generally higher than those of the other methods, with the highest improvement of about 0.04 over that of PCA when SVM is used. We know from Tables 19 and 20 that when RF is used, MFHFS obtains the highest Fmac_a value when the datasets of APS, Gisette and DrivFace are used, illustrating the efficacy of MFHFS on dealing with the datasets those have high numbers of features or samples. Moreover, we know from Fig. 12c and d that MFHFS and MFHFS1 outperforms the other method significantly, with the average improvements of about 0.05 and 0.04 over those of EGAFS and EMFFS when the measurements of average Fmac_a values and average Fmic_a values are used, respectively. This may be due to the following reasons: (1) Feature extractions like PCA and AE ignore the category information of the samples, losing some information which may be helpful for data classification; (2) EMFFS and VGFSS evaluate the importance of the features separately, ignoring the effect of the redundant information contained in the selected features; (3) The results of EGAFS and CBHFS rely on some important parameters (such as the numbers of iterations or particles) in the genetic algorithm and the clustering algorithm, deducing unstable results when dealing with high dimensional datasets; (4) The samples of noise or outliers are removed in MFHFS, improving the robustness and generalization of the proposed method.
Table 21 gives the average running time (called rta, expressed in seconds) of these methods on each dataset when SVM and RF are used, respectively. We observe that CBHFS is the slowest one in these methods, and it obtains the highest running time (335,926.365 s) on Gisette dataset. Moreover, EGAFS runs much slower than the other methods except for CBHFS. The reason may be that CBHFS and EGAFS both use the wrapper methods which take a lot of time on training and classification to obtain the best features. In addition, we notice that the increments of MFHFS over PCA, EMFFS and VGFSS on rta are all lower than 70 s, which is acceptable in actual situations. Moreover, compared to the MFHFS1 methods, MFHFS runs much faster, illustrating the efficiency of the Q-range based feature relevance calculation method on improve the running speed while guaranteeing the qualities of selected features.
4.8 Statistical comparisons
The nonparametric tests are widely used in statistical learning. The Wilcoxon signed rank test is usually powerful in detecting the difference between two populations [55]. In this section, the Wilcoxon signed rank test is used, and the null hypothesis is that the two methods are equivalent. If the null hypothesis is rejected (p value is less than or equal to the significance level α), the differences between the methods are significant. Given the significance level α = 0.05, the average F1macro and F1micro values (called Fa) of different typical methods are compared using the Wilcoxon signed rank test when the ratio of selected feature ranges from 0.01 to 0.1 with a step of 0.01. Moreover, experiments are carried out 10 times and the average p values are shown in Tables 22 and 23 when SVM and RF are used, respectively. We can see from Table 22 that the p values are lower than α in 52 of 60 cases when SVM is used, showing that the proposed method outperforms the other methods significantly in 86.7% of the cases on classification accuracy. Moreover, it is obvious that the proposed method outputs the other methods significantly on the datasets of CNAE9, Gisette and DrivFace, though its superiority on Amazon dataset is not remarkable when compared to IMGI, OCFSX, EMFFS and VGFSS. Further, we can see from Table 23 that the proposed method outputs the other methods significantly in 91.7% cases (55 of 60 cases) when RF is used, illustrating the effectiveness of the proposed method on selecting the best features and ensuring the classification accuracy.
5 Conclusions
Many traditional feature selections of filters rely on the datasets and cannot deal with the redundant information effectively. Moreover, many MI based filters or hybrid methods have high time complexities when the numbers of features or the numbers of samples are high. In this paper, a multi-filter weights and multi-feature weights based hybrid feature selection (called MFHFS) is proposed. In the data preprocessing stage, the samples are normalized and discretized by using the equal width interval binning (EWIB) algorithm. Moreover, the 10-folder cross validation is combined to remove the samples of noises and outliers. In the feature combination stage, several optimal filters are chosen and used to obtain the feature subsets, and these feature subsets are merged into a temp feature subset by considering the multi-filter weights and multi-feature weights. In the feature refinement stage, the redundant information of the temp feature subset is filtered to obtain the final feature subset, and a Q-range based feature relevance calculation method is proposed to improve the running speed of redundant information filtering. The efficiency of MFHFS is examined through the classification experiments with SVM and RF classifiers on six datasets: APS, Madelon, CNAE9, Gisette, DrivFace and Amazon. Experimental results show that: (1) When compared to the traditional filters, MFHFS has great improvement on classification accuracy at the cost of acceptable decrement of running speed; (2) MFHFS achieves obvious improvements on classification accuracy over typical feature extractions and hybrid feature selections, and it has significant improvements on running speed over typical methods like AE, EGAFS and CBHFS, illustrating its efficacy on obtaining the best features in data classification field.
In the future, we will study deeper in the following two aspects: (1) Learn from different datasets of multiple fields and investigate the multi-task based feature selection methods; (2) Investigate parallel computing methods to improve the running speed and extend the proposed method to real applications.
References
Hancer E, Xue B, Zhang M (2018) Differential evolution for filter feature selection based on information theory and feature ranking. Knowledge-Based Systems
Rawles S, Flach P (2004) Redundant feature elimination for multi-class problems. International Conference on Machine Learning ACM
Bharti KK, Singh PK (2015) Hybrid dimension reduction by integrating feature selection with feature extraction method for text clustering. Expert Syst Appl 42(6):3105–3114
Zabalza J, Ren J, Zheng J et al (2016) Novel segmented stacked autoencoder for effective dimensionality reduction and feature extraction in hyperspectral imaging. Neurocomputing 185(C):1–10
Quispe O, Ocsa A, Coronado R (2017) Latent semantic indexing and convolutional neural network for multi-label and multi-class text classification. IEEE Latin American Conference on Computational Intelligence. IEEE, 1–6
Marquetti I, Link JV, Lemes ALG et al (2016) Partial least square with discriminant analysis and near infrared spectroscopy for evaluation of geographic and genotypic origin of arabica coffee. Comput Electron Agric 121(C):313–319
Okada K, Lee MD (2016) A Bayesian approach to modeling group and individual differences in multidimensional scaling. J Math Psychol 70:35–44
Fan Z, Xu Y, Zuo W et al (2017) Modified principal component analysis: an integration of multiple similarity subspace models. IEEE Transactions on Neural Networks & Learning Systems 25(8):1538–1552
Prihatini PM, Putra IKGD, Giriantari IAD et al (2017) Fuzzy-Gibbs latent Dirichlet allocation model for feature extraction on Indonesian documents. Contemporary Engineering Sciences 10:403–421
Zhang Y, Zhang Z (2012) Feature subset selection with cumulate conditional mutual information minimization. Expert Syst Appl 39(5):6078–6088
Yang Y, Pedersen J (1997) A comparative study on feature set selection in text categorization. In: Fisher DH (ed) Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, pp 412–420
Shang W, Huang H, Zhu H et al (2007) A novel feature selection algorithm for text classification. Expert Syst Appl 33(1):1–5
Uysal AK, Gunal S A novel probabilistic feature selection for text classification. Knowl-Based Syst 36:226–235
Mengle SSR, Goharian N (2009) Ambiguity measure feature-selection algorithm. J Am Soc Inf Sci Technol 60:1037–1050
Sebastiani F (2002) Machine learning in automated text categorization. ACM Computing Surveys (CSUR) 34(1):1–47
Shi JT, Liu HL, Xu Y et al (2014) Chinese sentiment classifier machine learning based on optimized information gain feature selection. Adv Mater Res 988:511–516
Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
Moradi P, Rostami M (2015) Integration of graph clustering with ant colony optimization for feature selection. Knowl-Based Syst 84(C):144–161
Yan J, Liu N, Zhang B (2009) OCFS: optimal orthogonal centroid feature selection for text categorization. International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM: 122–129
Yang J, Qu Z, Liu Z (2014) Improved feature-selection method considering the imbalance problem in text categorization. Sci World J:1–17
Tutkan M, Ganiz MC, Akyokuş S (2016) Helmholtz principle based supervised and unsupervised feature selection methods for text mining. Inf Process Manag 52(5):885–910
Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305
Rehman A, Javed K, Babri HA (2017) Feature selection based on a normalized difference measure for text classification. Inf Process Manag 53(2):473–489
Zhou X, Hu Y, Guo L (2014) Text categorization based on clustering feature selection. Procedia Computer Science 31(31):398–405
Hoque N, Bhattacharyya DK, Kalita JK (2014) MIFS-ND: A mutual information-based feature selection. Expert Syst Appl 41(14):6371–6385
Vinh LT, Lee S, Park YT et al (2012) A novel feature selection based on normalized mutual information. Appl Intell 37(1):100–120
Lin Y, Hu Q, Liu J et al (2015) Multi-label feature selection based on max-dependency and min-redundancy. Neurocomputing 168:92–103
Das S (2001) Wrappers and a boosting-based hybrid for feature selection. International Conference on Machine Learning 74–81
Es TF, Hruschka ER, Castro LN et al (2009) A cluster-based feature selection approach. Hybrid Artificial Intelligence Systems, International Conference, Salamanca, Spain, Proceedings DBLP: 169–176
Jaskowiak PA, Campello RJGB (2015) A cluster based hybrid feature selection approach. Intelligent Systems. IEEE, 43–48
Uysal AK (2016) An improved global feature selection scheme for text classification. Expert Syst Appl 43:82–92
Agnihotri D (2017) Variable global feature selection scheme for automatic classification of text documents. Expert Syst Appl 81(C):268–281
Wang Y, Liu Y, Feng L et al (2015) Novel feature selection based on harmony search for email classification. Knowl-Based Syst 73(1):311–323
Zorarpacı E, Özel SA (2016) A hybrid approach of differential evolution and artificial bee colony for feature selection. Expert Syst Appl 62:91–103
Xue B, Zhang M, Browne WN (2014) Particle swarm optimization for feature selection in classification: novel initialization and updating mechanisms. Appl Soft Comput 18:261–276
Ghareb AS, Bakar AA, Hamdan AR (2016) Hybrid feature selection based on enhanced genetic algorithm for text categorization. Expert Syst Appl 49:31–47
Wang Y, Feng L (2018) Hybrid feature selection using component co-occurrence based feature relevance measurement. Expert Syst Appl 102:83–99
Bhattacharya S, Selvakumar S (2016) Multi-measure multi-weight ranking approach for the identification of the network features for the detection of DoS and Probe attacks. Comput J 59(6):bxv078
Osanaiye O, Cai H, Choo KKR et al (2016) Ensemble-based multi-filter feature selection for DDoS detection in cloud computing. EURASIP J Wirel Commun Netw 2016(1):130
Wang Y, Feng L, Li Y (2017) Two-step based feature selection for filtering redundant information. J Intell Fuzzy Syst 33(4):2059–2073
Breiman L, Friedman JH, Olshen RA (1984) Classification and regression trees. Wadsworth International Group, Montery
Wang Y, Feng L, Zhu J (2017) Novel artificial bee colony based feature selection for filtering redundant information. Appl Intell 3:1–18
Duda J (1995) Supervised and unsupervised discretization of continuous Features. Machine Learning Proceedings (2):194–202
Paulus J, Klapuri A (2009) Music structure analysis using a probabilistic fitness measure and a greedy search algorithm. IEEE Trans Audio Speech Lang Process 17(6):1159–1170
Dadaneh BZ, Markid HY, Zakerolhosseini A (2016) Unsupervised probabilistic feature selection using ant colony optimization. Expert Syst Appl 53:27–42
Asuncion A, Newman DJ (2007) UCI machine learning repository. University of California, Department of Information and Computer Science, Irvine
Shan S (2016) Support vector machine. Machine Learning Models and Algorithms for Big Data Classification. Springer US, 24–52
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Masetic Z, Subasi A (2016) Congestive heart failure detection using random forest classifier. Comput Methods Prog Biomed 130(C):54–64
Chang CC, Lin CJLIBSVM (2001) A library for support vector machines. ACM Trans Intell Syst Technol 2(27):1–27
Chen J, Huang H, Tian S, Qu Y (2009) Feature selection for text classification with Naïve Bayes. Expert Syst Appl 36(3):5432–5435
Chang F, Guo J, Xu W et al (2015) A feature selection to handle imbalanced data in text classification. J Digit Inf Manag 13(3):169–175
Yang J, Qu Z, Liu Z (2014) Improved feature-selection method considering the imbalance problem in text categorization. Sci World J 3:625342
Liu WS, Chen X, Gu Q (2018) A noise tolerable feature selection framework for software defect prediction. Chinese Journal of Computers 41(3):506–520
Wang YW, Feng LZ (2018) A new feature selection for handling redundant information in text classification. Frontiers of Information Technology & Electronic Engineering 19(2):221–234
Acknowledgements
This research is supported by the Beijing Natural Science Foundation, China (No. 4174105), the Key Projects of National Bureau of Statistics of China (No. 2017LZ05), the National Key R&D Program of China (2017YFB1400700), the Joint Funds of the National Natural Science Foundation of China (No. U1509214).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
ESM 1
(M 5 kb)
ESM 2
(M 2 kb)
ESM 3
(M 1 kb)
ESM 4
(M 2 kb)
ESM 5
(M 527 bytes)
ESM 6
(M 726 bytes)
ESM 7
(M 1 kb)
ESM 8
(M 2 kb)
ESM 9
(M 480 bytes)
ESM 10
(M 1 kb)
ESM 11
(M 386 bytes)
ESM 12
(M 4 kb)
ESM 13
(M 1 kb)
ESM 14
(M 1 kb)
ESM 15
(M 2 kb)
ESM 16
(M 1 kb)
ESM 17
(M 3 kb)
ESM 18
(M 1 kb)
ESM 19
(M 1 kb)
ESM 20
(M 1 kb)
ESM 21
(M 1 kb)
ESM 22
(M 1 kb)
ESM 23
(M 2 kb)
ESM 24
(M 11 kb)
ESM 25
(M 3 kb)
ESM 26
(M 2 kb)
ESM 27
(M 1005 bytes)
ESM 28
(M 2 kb)
ESM 29
(M 1 kb)
ESM 30
(M 2 kb)
ESM 31
(M 834 bytes)
ESM 32
(M 1 kb)
ESM 33
(M 1 kb)
ESM 34
(M 3 kb)
ESM 35
(M 1 kb)
ESM 36
(M 1 kb)
ESM 37
(M 1 kb)
ESM 38
(M 170 bytes)
ESM 39
(M 732 bytes)
ESM 40
(M 2 kb)
ESM 41
(M 1 kb)
ESM 42
(M 2 kb)
ESM 43
(M 1 kb)
ESM 44
(M 1 kb)
ESM 45
(M 1 kb)
Rights and permissions
About this article
Cite this article
Wang, Y., Feng, L. A new hybrid feature selection based on multi-filter weights and multi-feature weights. Appl Intell 49, 4033–4057 (2019). https://doi.org/10.1007/s10489-019-01470-z
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-019-01470-z