
1 Introduction

The gas sensor array is a key component of classical electronic nose systems: it converts different gases into different electrical signals, thus enabling pattern recognition [5]. The time series signals of the sensor responses are usually represented in a low-dimensional way by feature extraction [13]. In practice, however, some sensors in the array are useless for gas detection because they do not respond to the target gas or are heavily disturbed by noise, and the features extracted from them are then redundant for the pattern recognition task. Optimizing the sensor array, i.e., selecting the optimal sensor combination, improves the accuracy of pattern recognition [8] and also reduces the complexity and cost of the subsequent system design.

Sensor array optimization is usually performed with feature selection methods, since the two tasks have similar goals [7, 9, 11]. In [7], three feature selection methods, t-statistics, Fisher's criterion, and minimum redundancy maximum relevance, were used to select the most informative features; the experimental results show that the performance of the electronic nose system was improved by 6–10\(\%\). In [9], five methods were used to optimize the sensor array, among which linear discriminant analysis (LDA) achieved the best results, with a 9.6\(\%\) increase in recognition rate while reducing the number of sensors by 10. Most of these methods obtain a sensor importance ranking by evaluating the importance of individual features, but the best combination of features is not necessarily composed of the individually best features. In many scenarios, especially when multiple features are extracted for each sensor, considering the correlation among features is more valuable for selecting the optimal feature set.

Recently, sparse learning based feature selection methods have received considerable attention due to their good performance and interpretability [2,3,4]. These methods often use \(\ell _{p}\) norm or \(\ell _{p,q}\) norm regularization terms, which force important features to have large coefficients and unimportant features to have small or zero coefficients, thus accomplishing the selection of important features. Specifically, in [4], joint \(\ell _{2,1}\) norm minimization was designed for multi-class classification problems; it selects features across all data points with joint sparsity. In [3], the sparse group LASSO was used to select the most informative features and thus improve the accuracy of a binary classification problem. Group LASSO-based approaches consider the group structure among features, but the \(\ell _{1}\) norm in them is commonly used to constrain variables in vector form. The \(\ell _{2,1}\) norm-based methods eliminate unimportant features by generating row-sparse solutions, but they do not consider the correlation between features.

Motivated by these works, in this paper we propose a robust sensor array optimization method (RSAO) for multi-feature fusion data classification by combining the least squares regression framework with an \(\ell _{F,1}\) norm regularizer. When each sensor is characterized by multiple features, the features exhibit a clear group structure induced by the sensor they come from. The \(\ell _{F,1}\) norm regularizer is therefore designed to force unimportant feature groups to have small or zero coefficients, so that the corresponding sensors can be removed. Compared with traditional methods, the proposed method takes into account the intrinsic relevance of features and selects important sensors by directly scoring the feature groups. Besides, to improve the discriminative ability of the model, we introduce the \(\varepsilon \)-dragging technique proposed in [12] to increase the inter-class distance. An efficient alternating iteration algorithm is presented to solve the resulting convex optimization problem. Experimental results on gas sensor data sets show that the sensor combinations selected by the proposed method yield better classification accuracy than those of other conventional methods.

The rest of the paper is organized as follows. In Sect. 2, we present the proposed method and its optimization algorithm. In Sect. 3, we report experimental results. Finally, conclusions are offered in Sect. 4.

2 The Proposed Method

In this section, we propose a robust sensor array optimization method by combining an \(\ell _{F,1}\) norm regularizer with the least squares regression framework. We also present an efficient iterative algorithm to optimize the convex objective function.

2.1 \(\ell _{F,1}\) Norm Regularization Term

Here we first summarize the common norms for vectors and matrices. For any vector \({\textbf {x}}=[x_{1},x_{2},\ldots ,x_{n}]^{T}\), its \(\ell _{1}\) norm and \(\ell _{2}\) norm are

$$\begin{aligned} \Vert {\textbf {x}}\Vert _{1}=\sum _{i=1}^{n}|x_{i}| \end{aligned}$$
(1)
$$\begin{aligned} \Vert {\textbf {x}}\Vert _{2}=\sqrt{x_{1}^2+x_{2}^2+\cdots +x_{n}^2}. \end{aligned}$$
(2)

For any matrix \({\textbf {X}} \in \mathbb {R}^{n\times d}\), with \(X_{ij}\) denoting the element in its i-th row and j-th column, the Frobenius norm and \(\ell _{2,1}\) norm can be calculated by

$$\begin{aligned} \Vert {\textbf {X}}\Vert _{F}=(\sum _{i=1}^{n}\sum _{j=1}^{d}X_{ij}^{2})^{1/2} \end{aligned}$$
(3)
$$\begin{aligned} \Vert {\textbf {X}}\Vert _{2,1}=\sum _{i=1}^{n}\sqrt{\sum _{j=1}^{d}X_{ij}^{2}}. \end{aligned}$$
(4)

Models with an \(\ell _{1}\) norm regularizer usually generate sparse solutions; by a similar principle, models with an \(\ell _{2,1}\) norm regularizer generate row-sparse solutions. To enable a model to produce group-wise row-sparse solutions, we design the \(\ell _{F,1}\) norm as follows.

Suppose \({\textbf {X}}\in \mathbb {R}^{n\times d}\) is the feature fusion matrix extracted from the sensor array responses, and the d features can be divided into m groups by sensor, that is, \({\textbf {X}}\) is partitioned into m column blocks \({\textbf {X}}=[{\textbf {X}}_{1},{\textbf {X}}_{2},\ldots ,{\textbf {X}}_{m}]\), where n and m are the numbers of samples and sensors, respectively. Correspondingly, the transformation matrix \({\textbf {W}}\) is partitioned into m row blocks \({\textbf {W}}=[{\textbf {W}}_{1}^{T},{\textbf {W}}_{2}^{T},\ldots ,{\textbf {W}}_{m}^{T}]^{T}\), and we define the \(\ell _{F,1}\) norm of a matrix as

$$\begin{aligned} \Vert {\textbf {W}}\Vert _{F,1}=\sum _{s=1}^{m}\Vert {\textbf {W}}_{s}\Vert _{F}. \end{aligned}$$
(5)

Clearly, \(\Vert \cdot \Vert _{F,1}\) is a valid norm, since it satisfies positive definiteness, absolute homogeneity, and the triangle inequality. Note that when every group contains a single row, the \(\ell _{F,1}\) norm reduces to the \(\ell _{2,1}\) norm.
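For concreteness, the following minimal NumPy sketch (not part of the original work) computes the \(\ell _{2,1}\) and \(\ell _{F,1}\) norms of Eqs. (4) and (5); the matrix and group sizes are illustrative.

```python
import numpy as np

def l21_norm(X):
    """Eq. (4): sum of the l2 norms of the rows of X."""
    return np.sum(np.linalg.norm(X, axis=1))

def lF1_norm(W, group_sizes):
    """Eq. (5): sum of the Frobenius norms of the row blocks
    W_1, ..., W_m, where block s has group_sizes[s] rows."""
    total, start = 0.0, 0
    for size in group_sizes:
        total += np.linalg.norm(W[start:start + size], 'fro')
        start += size
    return total

# Toy example: d = 4 features from m = 2 sensors (2 features each), k = 3 classes.
W = np.arange(12, dtype=float).reshape(4, 3)
print(lF1_norm(W, [2, 2]))                                # ||W_1||_F + ||W_2||_F
print(np.isclose(lF1_norm(W, [1, 1, 1, 1]), l21_norm(W)))  # size-1 groups: reduces to l_{2,1}
```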

2.2 Sensor Selection Model

For binary classification problems, \(\ell _{1}\) norm-based models are often used to select important features, such as the LASSO model

$$\begin{aligned} \mathop {\min }\limits _{{\textbf {w}}} \Vert {\textbf {X}}{} {\textbf {w}}+{\textbf {1}}b-{\textbf {y}}\Vert _{2}^{2}+\lambda \Vert {\textbf {w}}\Vert _{1} \end{aligned}$$
(6)

where \({\textbf {1}}\) is a vector of all ones. In multi-class classification problems, the class label vector \({\textbf {y}}_{i}\) is usually transformed into a matrix of 0/1 elements by one-hot encoding. In this case, feature selection can be accomplished with \(\ell _{2,1}\) norm-based models, such as

$$\begin{aligned} \mathop {\min }\limits _{{\textbf {W}}} \Vert {\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-{\textbf {Y}}\Vert _{F}^{2}+\lambda \Vert {\textbf {W}}\Vert _{2,1}. \end{aligned}$$
(7)

Traditional feature selection methods usually obtain the importance ranking of sensors by scoring individual features, without considering the intrinsic structure among features. However, in many sensor array optimization tasks each sensor is represented by multiple features, and it is valuable to consider the group structure among features induced by the sensor type. Therefore, following the least squares regression framework, we propose a novel sensor array optimization method for multi-feature fusion data classification by using the \(\ell _{F,1}\) norm regularizer:

$$\begin{aligned} \mathop {\min }\limits _{{\textbf {W}}} \Vert {\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-{\textbf {Y}}\Vert _{F}^{2}+\lambda \Vert {\textbf {W}}\Vert _{F,1}. \end{aligned}$$
(8)

In Eq. (8), the intrinsic group structure of the features is taken into account, and unimportant feature groups are forced to have small or zero coefficients, so that the corresponding sensors can be removed. In addition, to increase the robustness of the model, the Frobenius norm loss is replaced by the \(\ell _{2,1}\) norm, and the \(\varepsilon \)-dragging trick is introduced to improve the discriminative performance of the model. We thus obtain the robust sensor array optimization method (RSAO):

$$\begin{aligned} \mathop {\min }\limits _{{\textbf {M}}\ge {\textbf {0}},{\textbf {W}},{\textbf {b}}} \Vert {\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-{\textbf {Y}}-{\textbf {E}}\odot {\textbf {M}}\Vert _{2,1}+\lambda \Vert {\textbf {W}}\Vert _{F,1} \end{aligned}$$
(9)

where the symbol \(\odot \) denotes the Hadamard (element-wise) product of matrices, the matrix \({\textbf {M}}\) consists of nonnegative elements, \(\lambda \) is a positive regularization parameter, and the matrix \({\textbf {E}}\) is defined as

$$\begin{aligned} E_{ij}= {\left\{ \begin{array}{ll} +1, &{} \text {if } Y_{ij}=1\\ -1, &{} \text {if } Y_{ij}=0. \end{array}\right. } \end{aligned}$$
(10)

In the sensor selection process, the proposed method uses \(\Vert {\textbf {W}}_{i}\Vert _{F}\) as the score of the feature subset \({\textbf {X}}_{i}\) to obtain the importance ranking of the sensors, which is efficient and easy to interpret.
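As an illustrative sketch under the same notation (the learned \({\textbf {W}}\) and the group sizes below are hypothetical, not from the paper), sensors can be ranked by their block scores as follows:

```python
import numpy as np

def sensor_scores(W, group_sizes):
    """Score sensor s by ||W_s||_F; larger means more important."""
    scores, start = [], 0
    for size in group_sizes:
        scores.append(np.linalg.norm(W[start:start + size], 'fro'))
        start += size
    return np.asarray(scores)

# Hypothetical learned W: m = 3 sensors, 2 features each, k = 2 classes.
W = np.array([[0.9, 0.7], [0.8, 0.6],    # sensor 1: large block norm
              [0.0, 0.0], [0.0, 0.0],    # sensor 2: zeroed out by the regularizer
              [0.2, 0.1], [0.1, 0.2]])   # sensor 3: small block norm
ranking = np.argsort(sensor_scores(W, [2, 2, 2]))[::-1]
print(ranking)   # sensors ordered from most to least important, e.g. [0 2 1]
```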

2.3 Model Optimization

There are three unknown variables to learn in Eq. (9), so we present an alternating iteration algorithm to solve it. First, given \({\textbf {M}}\), let \(\tilde{{\textbf {Y}}}={\textbf {Y}}+{\textbf {E}}\odot {\textbf {M}}\); we then solve the following problem

$$\begin{aligned}&\mathop {\min }\limits _{{\textbf {W}},{\textbf {b}}} \Vert {\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-\tilde{{\textbf {Y}}}\Vert _{2,1}+\lambda \Vert {\textbf {W}}\Vert _{F,1}. \end{aligned}$$
(11)

Since problem (11) has no analytical solution, we present an efficient iterative algorithm to solve for \({\textbf {W}}\) and \({\textbf {b}}\).

Let \(J({\textbf {W}},{\textbf {b}})\) be the objective function of problem (11). Taking the derivative of \(J({\textbf {W}},{\textbf {b}})\) with respect to \({\textbf {W}}\) and \({\textbf {b}}\), we have

$$\begin{aligned} \frac{\partial J({\textbf {W}},{\textbf {b}})}{\partial {\textbf {W}}}={\textbf {X}}^{T}{} {\textbf {U}}_{1}({\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-\tilde{{\textbf {Y}}})+\lambda {\textbf {U}}_{2}{} {\textbf {W}} \end{aligned}$$
(12)
$$\begin{aligned} \frac{\partial J({\textbf {W}},{\textbf {b}})}{\partial {\textbf {b}}^{T}}={\textbf {1}}^{T}{} {\textbf {U}}_{1}({\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-\tilde{{\textbf {Y}}}) \end{aligned}$$
(13)

where \({\textbf {U}}_{1}\) and \({\textbf {U}}_{2}\) are diagonal matrices whose diagonal elements are

$$\begin{aligned} U_{1}^{tt}=\frac{1}{\Vert {\textbf {e}}_{t}\Vert _{2}}, \quad t=1,2,\ldots ,n \end{aligned}$$
(14)
$$\begin{aligned} U_{2}^{ii}=\frac{1}{\Vert {\textbf {W}}_{j}\Vert _{F}}, \quad {\textbf {w}}_{i}\in {\textbf {W}}_{j},\ j=1,2,\ldots ,m \end{aligned}$$
(15)

where \({\textbf {e}}_{t}\) is the t-th row of the matrix \({\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-\tilde{{\textbf {Y}}}\) and \({\textbf {w}}_{i}\) is the i-th row of the matrix \({\textbf {W}}\). Setting Eq. (13) equal to zero, we get

$$\begin{aligned} {\textbf {b}}^{T}=c{\textbf {1}}^{T}{} {\textbf {U}}_{1}(\tilde{{\textbf {Y}}}-{\textbf {X}}{} {\textbf {W}}) \end{aligned}$$
(16)

where \(c=({\textbf {1}}^{T}{} {\textbf {U}}_{1}{} {\textbf {1}})^{-1}\). Then setting Eq. (12) equal to zero and substituting Eq. (16), we obtain

$$\begin{aligned} {\textbf {W}}=({\textbf {X}}^{T}{} {\textbf {L}}{} {\textbf {X}}+\lambda {\textbf {U}}_{2})^{-1}{} {\textbf {X}}^{T}{} {\textbf {L}}\tilde{{\textbf {Y}}} \end{aligned}$$
(17)

where \({\textbf {L}}={\textbf {U}}_{1}-c{\textbf {U}}_{1}{} {\textbf {1}}{} {\textbf {1}}^{T}{} {\textbf {U}}_{1}\). Note that the computation of \({\textbf {U}}_{1}\) and \({\textbf {U}}_{2}\) depends on \({\textbf {W}}\) and \({\textbf {b}}\), so \({\textbf {W}}\) and \({\textbf {b}}\) can be iteratively updated by using \({\textbf {U}}_{1}\) and \({\textbf {U}}_{2}\) from the previous step.
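To make the inner update concrete, here is a minimal NumPy sketch of one iteration of Eqs. (14)–(17). It is illustrative rather than the authors' implementation, and the small constant eps added to the denominators is a numerical safeguard not discussed in the paper.

```python
import numpy as np

def update_W_b(X, Y_tilde, W, b, lam, group_sizes, eps=1e-8):
    """One inner iteration: build U1, U2 from the current (W, b),
    then update W by Eq. (17) and b by Eq. (16)."""
    n, d = X.shape
    R = X @ W + b - Y_tilde                          # rows of R are the e_t
    u1 = 1.0 / (np.linalg.norm(R, axis=1) + eps)     # diagonal of U1, Eq. (14)
    u2 = np.empty(d)                                 # diagonal of U2, Eq. (15)
    start = 0
    for size in group_sizes:
        block_norm = np.linalg.norm(W[start:start + size], 'fro')
        u2[start:start + size] = 1.0 / (block_norm + eps)
        start += size
    c = 1.0 / u1.sum()                               # c = (1^T U1 1)^{-1}
    L = np.diag(u1) - c * np.outer(u1, u1)           # L = U1 - c U1 1 1^T U1
    W = np.linalg.solve(X.T @ L @ X + lam * np.diag(u2),
                        X.T @ L @ Y_tilde)           # Eq. (17)
    b = c * (u1 @ (Y_tilde - X @ W))                 # Eq. (16)
    return W, b
```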

Second, we optimize \({\textbf {M}}\). Given \({\textbf {W}}\) and \({\textbf {b}}\), let \({\textbf {T}}={\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-{\textbf {Y}}\); then we need to solve the following problem

$$\begin{aligned} \mathop {\min }\limits _{{\textbf {M}}\ge {\textbf {0}}}\Vert {\textbf {T}}-{\textbf {E}}\odot {\textbf {M}}\Vert _{2,1} \end{aligned}$$
(18)

This problem can be decomposed row-wise into the subproblems

$$\begin{aligned} \mathop {\min }\limits _{{\textbf {m}}_{i}\ge {\textbf {0}}}\Vert {\textbf {t}}_{i}-{\textbf {e}}_{i} \odot {\textbf {m}}_{i} \Vert _{2},~~~i=1,2,\ldots ,n \end{aligned}$$
(19)

where \({\textbf {t}}_{i}\), \({\textbf {e}}_{i}\), and \({\textbf {m}}_{i}\) are the \(i\)-th rows of the matrices \({\textbf {T}}\), \({\textbf {E}}\), and \({\textbf {M}}\), respectively.

Let \(J({\textbf {m}}_{i})\) be the objective function of problem (19). Taking the derivative of \(J({\textbf {m}}_{i})\) with respect to \(M_{ij}\) (using \(E_{ij}^{2}=1\)), we have

$$\begin{aligned} \frac{\partial J({\textbf {m}}_{i})}{\partial M_{ij}}=\frac{M_{ij}-E_{ij}T_{ij}}{\Vert {\textbf {t}}_{i}-{\textbf {e}}_{i} \odot {\textbf {m}}_{i} \Vert _{2}} \end{aligned}$$
(20)

Setting it equal to zero and projecting onto the constraint \(M_{ij}\ge 0\), we obtain

$$\begin{aligned} M_{ij}=\max \{0,E_{ij}T_{ij}\} \end{aligned}$$
(21)

where \(M_{ij}\), \(E_{ij}\), and \(T_{ij}\) are the elements in the \(i\)-th row and \(j\)-th column of the matrices \({\textbf {M}}\), \({\textbf {E}}\), and \({\textbf {T}}\), respectively. Hence the solution of problem (18) can be written as

$$\begin{aligned} {\textbf {M}}=\max \{{\textbf {0}},{\textbf {E}}\odot {\textbf {T}}\}. \end{aligned}$$
(22)
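The closed form in Eq. (22) is easy to verify numerically. The following sketch (illustrative only, using SciPy's general-purpose optimizer as a brute-force reference) checks one row subproblem of (19) under the constraint \({\textbf {m}}_{i}\ge {\textbf {0}}\):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
t = rng.normal(size=4)                          # one row t_i of T
e = np.where(rng.random(4) > 0.5, 1.0, -1.0)    # the matching row e_i of E

m_closed = np.maximum(0.0, e * t)               # row-wise form of Eq. (22)

# Brute-force reference: minimize ||t - e * m||_2^2 subject to m >= 0
# (squaring the norm leaves the minimizer unchanged and keeps it smooth).
res = minimize(lambda m: np.sum((t - e * m) ** 2),
               x0=np.zeros(4), bounds=[(0.0, None)] * 4)
print(np.allclose(m_closed, res.x, atol=1e-5))  # expected: True
```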

In short, we solve problem (9) by an alternating iterative method with two main steps: first, optimize \({\textbf {b}}\) and \({\textbf {W}}\) with \({\textbf {M}}\) fixed according to Eqs. (16) and (17); second, optimize \({\textbf {M}}\) with \({\textbf {b}}\) and \({\textbf {W}}\) fixed according to Eq. (22). The procedure is summarized as Algorithm 1.

Algorithm 1. Alternating iterative algorithm for solving problem (9).
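Since the pseudocode figure is not reproduced here, the following Python sketch outlines Algorithm 1 as described above, reusing update_W_b from the earlier sketch; the loop counts n_outer and n_inner and the ridge warm start are illustrative choices, not values from the paper.

```python
import numpy as np

def rsao(X, Y, lam, group_sizes, n_outer=20, n_inner=10, eps=1e-8):
    """Sketch of Algorithm 1: alternate between solving problem (11)
    for (W, b) and the closed-form M update of Eq. (22)."""
    n, d = X.shape
    E = np.where(Y == 1, 1.0, -1.0)                          # Eq. (10)
    M = np.zeros_like(Y)
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)  # ridge warm start
    b = np.zeros(Y.shape[1])
    for _ in range(n_outer):
        Y_tilde = Y + E * M                                  # epsilon-dragged targets
        for _ in range(n_inner):                             # inner solver for (11)
            W, b = update_W_b(X, Y_tilde, W, b, lam, group_sizes, eps)
        T = X @ W + b - Y
        M = np.maximum(0.0, E * T)                           # Eq. (22)
    return W, b, M
```

After convergence, the sensor ranking is obtained by scoring the row blocks of the returned \({\textbf {W}}\), e.g. with the sensor_scores sketch given earlier.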

2.4 Complexity Analysis

The computational cost of Algorithm 1 is concentrated in three parts, owing to the matrix inverse and matrix product computations. In step 6, the complexity of computing \({\textbf {b}}\) is \(O(n^{2}k+ndk)\), where n and k are the numbers of samples and classes, and d is the feature dimension. In step 7, the complexity of computing \({\textbf {W}}\) is \(O(n^{2}d+nd^{2}+ndk+d^{3}+d^{2}k)\). The combined complexity of computing \({\textbf {W}}\) and \({\textbf {b}}\) is thus \(O(n^{2}d+nd^{2}+ndk+d^{3}+d^{2}k+n^{2}k)\). In step 9, the complexity of computing \({\textbf {X}}{} {\textbf {W}}\) is \(O(ndk)\). Since the number of classes is much smaller than the number of samples and the feature dimension, neglecting the lower-order terms, the total computational complexity of Algorithm 1 is \(O(\tau (n^{2}d+d^{3}))\), where \(\tau \) is the number of iterations.

3 Experiment

In this section, we evaluate the proposed method on gas sensor array data sets and compare it with other classical methods.

3.1 Data Sets

We briefly describe the data sets used in the experiments below. All three data sets are from the UCI Machine Learning Repository, and the corresponding papers are cited.

Gas sensor array under flow modulation data set (GSAFM) [15]: this data set was collected from an array of 16 metal-oxide gas sensors under gas flow modulation conditions. It contains four categories and a total of 58 samples. Each sample includes 16 time series (one per sensor), and 27 features are extracted from each time series (one maximum feature, 13 high-frequency features, and 13 low-frequency features), so each sample has 432 features.

Gas sensor array drift data set (GSAD) [6, 10]: this data set was collected from an array of 16 chemical sensors exposed to six gases. Its first batch contains six classes and a total of 445 samples. Each sample contains 16 time series; four steady-state features and four dynamic features are extracted from each time series, so each sample is characterized by 128 features.

Gas sensors for home activity monitoring data set (GSHAM) [1]: this data set records the time series signals of eight gas sensors in response to wine, banana, and background activity. It contains three classes and a total of 100 samples. After the time series signals are filtered by an FIR low-pass filter, each sensor signal is represented by three features (minimum, average, and minimum slope), i.e., each sample contains 24 features.

3.2 Experiments Settings

We compare our method, RSAO, with the classical T-test [7], LDA [9], MI [14], and ERFS [4]. All data are normalized by the Z-score method. Classification accuracy is used to evaluate the performance of the selected sensors: a linear SVM classifier is trained on the training set, and its accuracy on the test set is computed with the Matlab function fitcsvm. For all data sets, five-fold cross-validation is repeated ten times, yielding 50 classification accuracies, and we report the average accuracy and standard deviation of each method.

In ERFS and RSAO, the regularization parameter \(\lambda \) needs to be tuned; in addition, the regularization parameter C of the SVM needs to be tuned for all methods. In each experiment, the optimal parameters are selected by grid search with three-fold cross-validation on the training set. The candidate set of log values for \(\lambda \) is \(\{-2,-1,0,1,2,3,4\}\) and that for C is \(\{-3,-2,-1,0,1,2,3\}\).
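As a hedged illustration, the following scikit-learn sketch mirrors this evaluation protocol in Python (the original experiments use Matlab's fitcsvm; the sensor selection step and the \(\lambda \) grid are omitted for brevity, and the function and variable names are hypothetical):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def evaluate(X, y, n_repeats=10, seed=0):
    """10 x 5-fold CV with a linear SVM; C is tuned by 3-fold grid
    search on each training split, matching the described setup."""
    accs = []
    grid = {'linearsvc__C': 10.0 ** np.arange(-3, 4)}   # log C in {-3,...,3}
    for r in range(n_repeats):
        skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + r)
        for tr, te in skf.split(X, y):
            pipe = make_pipeline(StandardScaler(), LinearSVC())  # Z-score + linear SVM
            search = GridSearchCV(pipe, grid, cv=3).fit(X[tr], y[tr])
            accs.append(search.score(X[te], y[te]))
    return np.mean(accs), np.std(accs)                   # 50 accuracies in total
```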

3.3 Comparison of Classification Accuracy

Figure 1 presents the relationship between classification accuracy and the number of sensors selected by the five methods. The proposed method surpasses the other methods at most points. Table 1 presents the classification accuracies of the raw sensor array features and of the optimized sensor array features, together with the corresponding numbers of selected sensors. Classification accuracy is significantly improved after sensor array optimization, and the proposed method achieves better accuracy than the other methods while selecting fewer sensors. In addition, Fig. 2 presents how the values of objective functions (11) and (9) vary with the number of iterations of the proposed optimization algorithm. The objective function values decrease monotonically and converge within a few iterations.

Table 1. Optimal classification accuracy and the corresponding number of selected sensors after sensor array optimization.
Fig. 1. Relationship between classification accuracy and the number of selected sensors on the three gas sensor array data sets.

Fig. 2. Relationship between the objective function value and the number of algorithm iterations.

4 Conclusion and Future Work

In this paper, we propose a novel sensor array optimization method for multi-feature fusion data classification. The intrinsic group structure of the sensor features is considered by combining a least squares regression framework and an \(\ell _{F,1}\) norm regularization design. Experimental results on the gas sensor array data sets demonstrate that the proposed method can effectively improve the classification accuracy while reducing the number of sensors compared to other classical methods.

A limitation of this work is that the method is evaluated only on gas sensor array data sets. In the future, the generalization of the proposed method will be validated on more feature fusion tasks, such as EEG signals, machine fault detection, and radar array signals.