Abstract
In this paper, we propose a robust sensor array optimization method based on sparse learning for multi-feature fusion data classification. The proposed approach has three key characteristics. First, it considers the intrinsic group structure among features by combining an \(\ell _{F,1}\) norm regularizer design with a least squares regression framework. Second, in sensor selection, insignificant feature groups can be eliminated via the grouped row sparse coefficients generated by the model, while the \(\varepsilon \)-dragging trick is introduced to improve the classification ability. Third, an efficient alternating iteration algorithm is presented to optimize the convex objective function. Comparisons with other classical methods on gas sensor array data sets demonstrate that the proposed method can effectively reduce the number of sensors while achieving higher classification accuracy.
This work is funded by the Natural Science Foundation of China (No. 62171066).
1 Introduction
The gas sensor array is an important part of classical electronic nose systems; it converts different gases into different electrical signals and thus enables pattern recognition [5]. The time series signals of the sensor response are usually represented in a low-dimensional way by feature extraction [13]. However, in practice, some sensors in the array are useless for gas detection because they do not respond to the target gas or are heavily disturbed by noise, so the corresponding extracted features are redundant for the pattern recognition task. Optimizing the sensor array, i.e., selecting the optimal sensor combination, improves the accuracy of pattern recognition [8]. In addition, it also reduces the complexity and cost of the subsequent system design.
Sensor array optimization is usually performed by combining feature selection methods due to their similar purpose [7, 9, 11]. In [7], three types of feature selection methods, t-statistics, Fisher’s criterion, and minimum redundancy maximum relevance, were used to select the most informative features; experimental results showed that the performance of the electronic nose system was improved by 6–10\(\%\). In [9], five methods were used to optimize the sensor array; linear discriminant analysis (LDA) achieved the best experimental results, with a 9.6\(\%\) increase in recognition rate while reducing the number of sensors by 10. Most of the above methods obtain the sensor importance ranking by evaluating the importance of individual features, but the best combination of features is not necessarily composed of the best individual features. In many scenarios, especially where multiple features are extracted for each sensor, considering the correlation among features is more valuable for selecting the optimal feature set.
Recently, sparse learning based feature selection methods have received considerable attention due to their good performance and interpretability [2,3,4]. In these methods, \(\ell _{p}\) norm or \(\ell _{p,q}\) norm regularization terms are often used, which force important features to have large coefficients and unimportant features to have small or zero coefficients, thus completing the selection of important features. Specifically, in [4], joint \(\ell _{2,1}\) norms minimization was designed for multi-class classification problems; it selects features across all data points with joint sparsity. In [3], sparse group LASSO was used to select the most informative features and thus improve the accuracy of a binary classification problem. The group structure among features is considered in group LASSO-based approaches, but the \(\ell _{1}\) norm in them is commonly used to constrain variables in vector form. The \(\ell _{2,1}\) norm-based methods eliminate unimportant features by generating row sparse solutions, but do not consider the correlation among features.
Motivated by the previous works, in this paper, we propose a robust sensor array optimization method (RSAO) for multi-feature fusion data classification by combining the least squares regression framework and an \(\ell _{F,1}\) norm regularizer design. When each sensor is characterized by multiple features, there are clear group structures among the features, divided by sensor category. Therefore, the \(\ell _{F,1}\) norm regularizer is designed to force unimportant feature groups to have small or zero coefficients, so that the corresponding sensors can be removed. Compared with traditional methods, the proposed method takes into account the intrinsic relevance of features and selects important sensors by directly scoring the feature groups. Besides, to improve the discriminative ability of the model, we further introduce the \(\varepsilon \)-dragging technique proposed in [12] to increase the inter-class distance. Meanwhile, an efficient alternating iteration algorithm is presented to solve the convex optimization problem. Experimental results on the gas sensor data sets show that the sensor combinations selected by the proposed method achieve better classification accuracy than those selected by other conventional methods.
The rest of the paper is organized as follows. In Sect. 2, we present the proposed method and its optimization algorithm. In Sect. 3, we report experimental results. Finally, conclusions are offered in Sect. 4.
2 The Proposed Method
In this section, we propose a robust sensor array optimization method by combining an \(\ell _{F,1}\) norm regularizer design and least squares regression framework. Meanwhile, an efficient iterative algorithm is presented to optimize the convex objective function.
2.1 \(\ell _{F,1}\) Norm Regularization Term
Here we first summarize the common norms for vectors and matrices. For any vector \({\textbf {x}}=[x_{1},x_{2},\ldots ,x_{n}]^{T}\), its \(\ell _{1}\) norm and \(\ell _{2}\) norm are
$$\Vert {\textbf {x}}\Vert _{1}=\sum _{i=1}^{n}\left| x_{i}\right| , \qquad \qquad (1)$$
$$\Vert {\textbf {x}}\Vert _{2}=\Big (\sum _{i=1}^{n}x_{i}^{2}\Big )^{1/2}. \qquad \qquad (2)$$
For any matrix \({\textbf {X}} \in \mathbb {R}^{n\times d}\), whose element in the i-th row and j-th column is denoted as \(X_{ij}\), its Frobenius norm and \(\ell _{2,1}\) norm can be calculated by
$$\Vert {\textbf {X}}\Vert _{F}=\Big (\sum _{i=1}^{n}\sum _{j=1}^{d}X_{ij}^{2}\Big )^{1/2}, \qquad \qquad (3)$$
$$\Vert {\textbf {X}}\Vert _{2,1}=\sum _{i=1}^{n}\Big (\sum _{j=1}^{d}X_{ij}^{2}\Big )^{1/2}. \qquad \qquad (4)$$
Models based on \(\ell _{1}\) norm regularizer can usually generate sparse solutions, and based on similar principles, row sparse solutions can also be generated for models based on \(\ell _{2,1}\) norm regularizer. To enable a model to produce grouped row sparse solutions, we design the \(\ell _{F,1}\) norm as follows.
Suppose \({\textbf {X}}\in \mathbb {R}^{n\times d}\) is the feature fusion matrix extracted from the sensor array response values, and the d features can be divided into m groups by sensor category, that is, \({\textbf {X}}\) is divided into m block matrices by column \({\textbf {X}}=[{\textbf {X}}_{1},{\textbf {X}}_{2},\ldots ,{\textbf {X}}_{m}]\), where n and m are the number of samples and sensors, respectively. Correspondingly, the transformation matrix \({\textbf {W}}\) can also be divided into m block matrices by row \({\textbf {W}}=[{\textbf {W}}_{1}^{T},{\textbf {W}}_{2}^{T},\ldots ,{\textbf {W}}_{m}^{T}]^{T}\), and then we define the \(\ell _{F,1}\) norm of a matrix as
$$\Vert {\textbf {W}}\Vert _{F,1}=\sum _{i=1}^{m}\Vert {\textbf {W}}_{i}\Vert _{F}. \qquad \qquad (5)$$
Obviously, \(\Vert \cdot \Vert _{F,1}\) is a norm due to satisfying the positive definiteness, absolute homogeneity and triangle inequality.
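As an illustration, the \(\ell _{F,1}\) norm defined above is simply the sum of the Frobenius norms of the row blocks of \({\textbf {W}}\). The following minimal sketch computes it; the function name and the `group_sizes` argument are our own conventions, not from the paper, and assume the features of sensor i occupy `group_sizes[i]` consecutive rows:

```python
import numpy as np

def l_F1_norm(W, group_sizes):
    """Sum of Frobenius norms of the row blocks W_1, ..., W_m of W.

    group_sizes[i] is the number of rows (features) belonging to sensor i,
    so sum(group_sizes) must equal W.shape[0].
    """
    total, start = 0.0, 0
    for g in group_sizes:
        total += np.linalg.norm(W[start:start + g, :], 'fro')
        start += g
    return total
```

With one feature per group the value coincides with the \(\ell _{2,1}\) norm, and with a single group it reduces to the Frobenius norm, consistent with the norms reviewed above.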
2.2 Sensor Selection Model
For binary classification problems, \(\ell _{1}\) norm-based models are often used to select important features, such as the LASSO model
$$\min _{{\textbf {w}},b}\ \Vert {\textbf {X}}{} {\textbf {w}}+b{\textbf {1}}-{\textbf {y}}\Vert _{2}^{2}+\lambda \Vert {\textbf {w}}\Vert _{1}, \qquad \qquad (6)$$
where \({\textbf {1}}\) is a vector with all elements equal to one. In multi-class classification problems, the class label vector \({\textbf {y}}_{i}\) is usually transformed by one-hot coding into a matrix \({\textbf {Y}}\) of 0/1 elements. At this point, feature selection can be accomplished by \(\ell _{2,1}\) norm-based models, such as
$$\min _{{\textbf {W}},{\textbf {b}}}\ \Vert {\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-{\textbf {Y}}\Vert _{F}^{2}+\lambda \Vert {\textbf {W}}\Vert _{2,1}. \qquad \qquad (7)$$
Usually, traditional feature selection methods obtain the importance ranking of sensors by scoring individual features, in which the intrinsic structure among features is not considered. However, in many sensor array optimization tasks, each sensor is represented by multiple features, so it is valuable to consider the group structure among features divided by sensor type. Therefore, following the least squares regression framework, we propose a novel sensor array optimization method for multi-feature fusion data classification by using the \(\ell _{F,1}\) norm regularizer as follows:
$$\min _{{\textbf {W}},{\textbf {b}}}\ \Vert {\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-{\textbf {Y}}\Vert _{F}^{2}+\lambda \Vert {\textbf {W}}\Vert _{F,1}. \qquad \qquad (8)$$
In Eq. (8), the intrinsic group structure of the features is considered and the unimportant feature groups are forced to have small or zero coefficients, so that the corresponding sensors are removed. In addition, to increase the robustness of the model, the Frobenius norm is replaced by the \(\ell _{2,1}\) norm, while the \(\varepsilon \)-dragging trick is introduced to improve the discriminative performance of the model, and finally we obtain a robust sensor array optimization method (RSAO) as follows:
$$\min _{{\textbf {W}},{\textbf {b}},{\textbf {M}}}\ \Vert {\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-({\textbf {Y}}+{\textbf {E}}\odot {\textbf {M}})\Vert _{2,1}+\lambda \Vert {\textbf {W}}\Vert _{F,1}, \quad \text {s.t.}\ {\textbf {M}}\ge 0, \qquad (9)$$
where the symbol \(\odot \) represents the Hadamard product of matrices, the matrix \({\textbf {M}}\) consists of nonnegative elements, \(\lambda \) is a positive regularization parameter, and the matrix \({\textbf {E}}\) is defined as
$$E_{ij}=\left\{ \begin{array}{ll} +1, &{} \text {if}\ Y_{ij}=1, \\ -1, &{} \text {otherwise}. \end{array}\right. \qquad \qquad (10)$$
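For concreteness, the constant matrix of the \(\varepsilon \)-dragging technique can be built directly from the one-hot label matrix. This sketch (the function name is our own) assumes \(E_{ij}=+1\) when sample i belongs to class j (\(Y_{ij}=1\)) and \(-1\) otherwise, following [12]:

```python
import numpy as np

def dragging_matrix(Y):
    """eps-dragging constant matrix: E_ij = +1 where Y_ij = 1 (sample i
    belongs to class j) and -1 elsewhere, so that the learned nonnegative
    M drags regression targets away from the decision boundary."""
    return np.where(Y == 1, 1.0, -1.0)
```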
In the sensor selection process, the proposed method uses \(\Vert {\textbf {W}}_{i}\Vert _{F}\) as the score of the feature subset \({\textbf {X}}_{i}\) to obtain the importance ranking of the sensors, which is more efficient and has better interpretability.
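The group-wise scoring described above can be sketched as follows; the helper (names and the `group_sizes` convention are ours) ranks sensors by the Frobenius norm of their row blocks of the learned \({\textbf {W}}\):

```python
import numpy as np

def rank_sensors(W, group_sizes):
    """Score sensor i by ||W_i||_F (Frobenius norm of its row block of W)
    and return (ranking, scores), most important sensor first."""
    scores, start = [], 0
    for g in group_sizes:
        scores.append(np.linalg.norm(W[start:start + g, :], 'fro'))
        start += g
    scores = np.asarray(scores)
    return np.argsort(scores)[::-1], scores
```

Sensors whose blocks are driven to (near) zero by the \(\ell _{F,1}\) regularizer fall to the bottom of the ranking and can be removed from the array.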
2.3 Model Optimization
Clearly, there are three unknown variables to learn in Eq. (9), so we present an alternating iteration algorithm to solve it. First, given \({\textbf {M}}\), let \(\tilde{{\textbf {Y}}}={\textbf {Y}}+{\textbf {E}}\odot {\textbf {M}}\); then we solve the following problem
$$\min _{{\textbf {W}},{\textbf {b}}}\ \Vert {\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-\tilde{{\textbf {Y}}}\Vert _{2,1}+\lambda \Vert {\textbf {W}}\Vert _{F,1}. \qquad \qquad (11)$$
Since problem (11) has no analytical solution, we present an efficient iterative algorithm to solve for \({\textbf {W}}\) and \({\textbf {b}}\).
Let \(J({\textbf {W}},{\textbf {b}})\) be the objective function of problem (11). Taking the derivative of \(J({\textbf {W}},{\textbf {b}})\) with respect to \({\textbf {W}}\) and \({\textbf {b}}\), we have
$$\frac{\partial J}{\partial {\textbf {W}}}=2{\textbf {X}}^{T}{} {\textbf {U}}_{1}({\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-\tilde{{\textbf {Y}}})+2\lambda {\textbf {U}}_{2}{} {\textbf {W}}, \qquad \qquad (12)$$
$$\frac{\partial J}{\partial {\textbf {b}}}=2({\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-\tilde{{\textbf {Y}}})^{T}{} {\textbf {U}}_{1}{} {\textbf {1}}, \qquad \qquad (13)$$
where \({\textbf {U}}_{1}\) and \({\textbf {U}}_{2}\) are diagonal matrices whose diagonal elements are
$$({\textbf {U}}_{1})_{tt}=\frac{1}{2\Vert {\textbf {e}}_{t}\Vert _{2}}, \qquad \qquad (14)$$
$$({\textbf {U}}_{2})_{ii}=\frac{1}{2\Vert {\textbf {W}}_{j}\Vert _{F}}\ \ \text {for}\ {\textbf {w}}_{i}\in {\textbf {W}}_{j}, \qquad \qquad (15)$$
where \({\textbf {e}}_{t}\) is the t-th row of the matrix \({\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-\tilde{{\textbf {Y}}}\) and \({\textbf {w}}_{i}\) is the i-th row of the matrix \({\textbf {W}}\). Setting Eq. (13) equal to zero, we can get
$${\textbf {b}}=c\,(\tilde{{\textbf {Y}}}-{\textbf {X}}{} {\textbf {W}})^{T}{} {\textbf {U}}_{1}{} {\textbf {1}}, \qquad \qquad (16)$$
where c is equal to \(({\textbf {1}}^{T}{} {\textbf {U}}_{1}{} {\textbf {1}})^{-1}\). Then, setting Eq. (12) equal to zero and using Eq. (16), we can obtain
$${\textbf {W}}=({\textbf {X}}^{T}{} {\textbf {L}}{} {\textbf {X}}+\lambda {\textbf {U}}_{2})^{-1}{\textbf {X}}^{T}{} {\textbf {L}}\tilde{{\textbf {Y}}}, \qquad \qquad (17)$$
where \({\textbf {L}}={\textbf {U}}_{1}-c{\textbf {U}}_{1}{} {\textbf {1}}{} {\textbf {1}}^{T}{} {\textbf {U}}_{1}\). Note that the computation of \({\textbf {U}}_{1}\) and \({\textbf {U}}_{2}\) depends on \({\textbf {W}}\) and \({\textbf {b}}\), so \({\textbf {W}}\) and \({\textbf {b}}\) can be iteratively updated by using \({\textbf {U}}_{1}\) and \({\textbf {U}}_{2}\) from the previous step.
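One step of this inner iteration can be sketched in NumPy as follows. This is our own reading of the update, not the authors' code: \({\textbf {U}}_{1}\) and \({\textbf {U}}_{2}\) are rebuilt from the previous \(({\textbf {W}},{\textbf {b}})\), then \({\textbf {W}}\) and \({\textbf {b}}\) are refreshed in closed form via Eqs. (16) and (17); the small `eps` guard against zero norms is our numerical safeguard, not part of the paper:

```python
import numpy as np

def update_W_b(X, Y_tilde, W, b, lam, group_sizes, eps=1e-8):
    """One reweighted least-squares step for problem (11)."""
    n, d = X.shape
    one = np.ones((n, 1))
    # Eq. (14): U1 from the rows of the current residual
    R = X @ W + one @ b[None, :] - Y_tilde
    U1 = np.diag(1.0 / (2.0 * np.maximum(np.linalg.norm(R, axis=1), eps)))
    # Eq. (15): every row of block W_j shares the weight 1/(2 ||W_j||_F)
    u2, start = np.empty(d), 0
    for g in group_sizes:
        blk = np.linalg.norm(W[start:start + g, :], 'fro')
        u2[start:start + g] = 1.0 / (2.0 * max(blk, eps))
        start += g
    U2 = np.diag(u2)
    c = 1.0 / float(one.T @ U1 @ one)
    L = U1 - c * (U1 @ one) @ (one.T @ U1)
    # Eq. (17): closed-form W, then Eq. (16): closed-form b
    W_new = np.linalg.solve(X.T @ L @ X + lam * U2, X.T @ L @ Y_tilde)
    b_new = c * ((Y_tilde - X @ W_new).T @ U1 @ one).ravel()
    return W_new, b_new
```

Repeating this step yields a monotonically non-increasing objective, in line with standard reweighted arguments for \(\ell _{2,1}\)-type problems.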
Second, we perform the optimization of \({\textbf {M}}\). Given \({\textbf {W}}\) and \({\textbf {b}}\), let \({\textbf {T}}={\textbf {X}}{} {\textbf {W}}+{\textbf {1}}{} {\textbf {b}}^{T}-{\textbf {Y}}\); then we need to solve the following problem
$$\min _{{\textbf {M}}\ge 0}\ \Vert {\textbf {T}}-{\textbf {E}}\odot {\textbf {M}}\Vert _{2,1}. \qquad \qquad (18)$$
This problem can be decomposed into subproblems by row:
$$\min _{{\textbf {m}}_{i}\ge 0}\ \Vert {\textbf {t}}_{i}-{\textbf {e}}_{i}\odot {\textbf {m}}_{i}\Vert _{2}, \qquad \qquad (19)$$
where \({\textbf {t}}_{i}\), \({\textbf {e}}_{i}\), and \({\textbf {m}}_{i}\) are the i-th rows of the matrices \({\textbf {T}}\), \({\textbf {E}}\), and \({\textbf {M}}\), respectively.
Let \(J({\textbf {m}}_{i})\) be the objective function of problem (19). Taking the derivative of \(J({\textbf {m}}_{i})\) with respect to \(M_{ij}\), we have
$$\frac{\partial J({\textbf {m}}_{i})}{\partial M_{ij}}=\frac{(E_{ij}M_{ij}-T_{ij})E_{ij}}{\Vert {\textbf {t}}_{i}-{\textbf {e}}_{i}\odot {\textbf {m}}_{i}\Vert _{2}}, \qquad \qquad (20)$$
and setting it equal to zero (noting that \(E_{ij}^{2}=1\)), we obtain
$$M_{ij}=E_{ij}T_{ij}, \qquad \qquad (21)$$
where \(M_{ij}\), \(E_{ij}\), and \(T_{ij}\) are the elements in the i-th row and j-th column of the matrices \({\textbf {M}}\), \({\textbf {E}}\), and \({\textbf {T}}\), respectively. Considering the nonnegative constraint on \({\textbf {M}}\), the solution of problem (18) can be written as
$${\textbf {M}}=\max ({\textbf {E}}\odot {\textbf {T}},{\textbf {0}}). \qquad \qquad (22)$$
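The closed-form \({\textbf {M}}\) update can be sketched as follows (our own illustration; it assumes \(E_{ij}\in \{+1,-1\}\) so the elementwise clipping \(\max ({\textbf {E}}\odot {\textbf {T}},{\textbf {0}})\) solves the nonnegatively constrained row subproblems):

```python
import numpy as np

def update_M(X, W, b, Y, E):
    """M step: with T = XW + 1 b^T - Y and E_ij in {+1, -1},
    the nonnegative minimizer is M = max(E * T, 0), elementwise."""
    T = X @ W + np.ones((X.shape[0], 1)) @ b[None, :] - Y
    return np.maximum(E * T, 0.0)
```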
In short, we present an alternating iterative method to solve problem (9), which mainly includes two steps. First, with the matrix \({\textbf {M}}\) fixed, optimize the vector \({\textbf {b}}\) and the matrix \({\textbf {W}}\) according to Eq. (16) and Eq. (17); second, with \({\textbf {b}}\) and \({\textbf {W}}\) fixed, optimize \({\textbf {M}}\) according to Eq. (22). The proposed iterative algorithm is summarized as Algorithm 1.
2.4 Complexity Analysis
The computational cost of Algorithm 1 is mainly concentrated in three parts, due to matrix inversion and matrix product calculations. In step 6, the complexity of calculating \({\textbf {b}}\) is \(O(n^{2}k+ndk)\), where n and k are the numbers of samples and classes, and d is the dimension of the features. In step 7, the complexity of calculating \({\textbf {W}}\) is \(O(n^{2}d+nd^{2}+ndk+d^{3}+d^{2}k)\). The total complexity of computing \({\textbf {W}}\) and \({\textbf {b}}\) is thus \(O(n^{2}d+nd^{2}+ndk+d^{3}+d^{2}k+n^{2}k)\). In step 9, the complexity of calculating \({\textbf {X}}{} {\textbf {W}}\) is O(ndk). Since the number of classes is much smaller than the number of samples and the feature dimension, neglecting the lower-order terms, the total computational complexity of Algorithm 1 is \(O(\tau (n^{2}d+d^{3}))\), where \(\tau \) is the number of iterations.
3 Experiment
In this section, we evaluate the proposed method on gas sensor array data sets and compare it with other classical methods.
3.1 Data Sets
We provide a brief description of all the data sets used in the experiments as follows. Note that the three data sets are from the UCI Machine Learning Repository, and the corresponding papers are cited.
Gas sensor array under flow modulation data set (GSAFM) [15]: this data set was collected from an array of 16 metal-oxide gas sensors under gas flow modulation conditions. It contains four categories and a total of 58 samples. Each sample includes 16 time series (one time series per sensor), and 27 features (one maximum feature, 13 high-frequency features, and 13 low-frequency features) are extracted from each time series as the corresponding sensor features, so each sample has 432 features.
Gas sensor array drift data set (GSAD) [6, 10]: this data set was collected from an array of 16 chemical sensors exposed to six gases. Its first batch contains six classes and a total of 445 samples. Each sample contains 16 time series; four steady-state features and four dynamic features are extracted from each time series, so each sample is characterized by 128 features.
Gas sensors for home activity monitoring data set (GSHAM) [1]: this data set records the time series signals of eight gas sensors in response to wine, banana, and background activity. It contains three classes and a total of 100 samples. After the time series signals are filtered by an FIR low-pass filter, each sensor signal is represented by three features: minimum, average, and minimum slope, i.e., each sample contains 24 features.
3.2 Experiments Settings
We compare our method, RSAO, with the classical T-test [7], LDA [9], MI [14], and ERFS [4]. All data are normalized by the Z-score method. Classification accuracy is used to evaluate the performance of the selected sensors. A linear SVM classifier is trained on the training set, and its classification accuracy on the test set is calculated by the function fitcsvm in Matlab. For all data sets, five-fold cross-validation is repeated ten times, i.e., 50 classification accuracies are obtained; finally, we report the average accuracy and standard deviation of the different methods for comparison.
In ERFS and RSAO, the regularization parameter \(\lambda \) needs to be tuned. In addition, the regularization parameter C in the SVM needs to be tuned for all methods. In each experiment, the optimal parameters are selected by grid search, with three-fold cross-validation on the training set as the evaluation criterion. The candidate set for the log value of parameter \(\lambda \) is \(\{-2,-1,0,1,2,3,4\}\), and the candidate set for the log value of parameter C is \(\{-3,-2,-1,0,1,2,3\}\).
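A rough Python equivalent of this tuning protocol is sketched below. The paper uses Matlab's fitcsvm; scikit-learn's LinearSVC is our stand-in here, so solver details differ, and the pipeline/grid names are our own:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Z-score normalization + linear SVM, with log10(C) in {-3, ..., 3}
# tuned by 3-fold cross-validation on the training split, as in the paper.
param_grid = {"linearsvc__C": [10.0 ** p for p in range(-3, 4)]}
pipe = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
search = GridSearchCV(pipe, param_grid, cv=3)
```

The outer evaluation would then wrap `search` in repeated five-fold cross-validation (e.g., via `cross_val_score`) to collect the 50 accuracy values.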
3.3 Comparison of Classification Accuracy
Figure 1 presents the relationship between the classification accuracy and the number of sensors selected by the five methods. It can be seen that the proposed method surpasses the other methods at most points. Table 1 presents the classification accuracies of the raw sensor array features and the optimized sensor array features, as well as the corresponding numbers of selected sensors. The classification accuracy is significantly improved after sensor array optimization, and the proposed method achieves better accuracy than the other methods while selecting fewer sensors. In addition, Fig. 2 presents the variation of the values of objective functions (11) and (9) with the number of iterations in the proposed optimization algorithm. The objective function values decrease monotonically and converge within a few iterations.
4 Conclusion and Future Work
In this paper, we propose a novel sensor array optimization method for multi-feature fusion data classification. The intrinsic group structure of the sensor features is considered by combining a least squares regression framework and an \(\ell _{F,1}\) norm regularization design. Experimental results on the gas sensor array data sets demonstrate that the proposed method can effectively improve the classification accuracy while reducing the number of sensors compared to other classical methods.
However, the work in this paper has one limitation: it is evaluated only on gas sensor array data sets. In the future, the generalization of the proposed method will be validated on more feature fusion tasks, such as EEG signals, machine fault detection, and radar array signals.
References
Huerta, R., Mosqueiro, T., Fonollosa, J., Rulkov, N.F., Rodriguez-Lujan, I.: Online decorrelation of humidity and temperature in chemical sensors for continuous monitoring. Chemometr. Intell. Lab. Syst. 157, 169–176 (2016)
Li, J., et al.: Feature selection: a data perspective. ACM Comput. Surv. (CSUR) 50(6), 1–45 (2017)
Liu, B., et al.: Lung cancer detection via breath by electronic nose enhanced with a sparse group feature selection approach. Sens. Actuators B Chem. 339, 129896 (2021)
Nie, F., Huang, H., Cai, X., Ding, C.: Efficient and robust feature selection via joint \(\ell _{2,1}\)-norms minimization. Adv. Neural Inf. Process. Syst. 23, 1813–1821 (2010)
Röck, F., Barsan, N., Weimar, U.: Electronic nose: current status and future trends. Chem. Rev. 108(2), 705–725 (2008)
Rodriguez-Lujan, I., Fonollosa, J., Vergara, A., Homer, M., Huerta, R.: On the calibration of sensor arrays for pattern recognition using the minimal number of experiments. Chemometr. Intell. Lab. Syst. 130, 123–134 (2014)
Saha, P., Ghorai, S., Tudu, B., Bandyopadhyay, R., Bhattacharyya, N.: Optimization of sensor array in electronic nose by combinational feature selection method. In: Mason, A., Mukhopadhyay, S.C., Jayasundera, K.P., Bhattacharyya, N. (eds.) Sensing Technology: Current Status and Future Trends II. SSMI, vol. 8, pp. 189–205. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-02315-1_9
Scott, S.M., James, D., Ali, Z.: Data analysis for electronic nose systems. Microchim. Acta 156(3), 183–207 (2006). https://doi.org/10.1007/s00604-006-0623-9
Sun, H., et al.: Sensor array optimization of electronic nose for detection of bacteria in wound infection. IEEE Trans. Ind. Electron. 64(9), 7350–7358 (2017)
Vergara, A., Vembu, S., Ayhan, T., Ryan, M.A., Homer, M.L., Huerta, R.: Chemical gas sensor drift compensation using classifier ensembles. Sens. Actuators B Chem. 166, 320–329 (2012)
Wei, G., Zhao, J., Yu, Z., Feng, Y., Li, G., Sun, X.: An effective gas sensor array optimization method based on random forest. In: 2018 IEEE SENSORS, pp. 1–4. IEEE (2018)
Xiang, S., Nie, F., Meng, G., Pan, C., Zhang, C.: Discriminative least squares regression for multiclass classification and feature selection. IEEE Trans. Neural Netw. Learn. Syst. 23(11), 1738–1754 (2012)
Yan, J., et al.: Electronic nose feature extraction methods: a review. Sensors 15(11), 27804–27831 (2015)
Zhou, J., Welling, C.M., Kawadiya, S., Deshusses, M.A., Grego, S., Chakrabarty, K.: Sensor-array optimization based on mutual information for sanitation-related malodor alerts. In: 2019 IEEE Biomedical Circuits and Systems Conference (BioCAS), pp. 1–4. IEEE (2019)
Ziyatdinov, A., Fonollosa, J., Fernandez, L., Gutierrez-Galvez, A., Marco, S., Perera, A.: Bioinspired early detection through gas flow modulation in chemo-sensory systems. Sens. Actuators B Chem. 206, 538–547 (2015)
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Zhao, L., Tian, F., Qian, J., Liu, R., Jiang, A. (2022). Robust Sparse Learning Based Sensor Array Optimization for Multi-feature Fusion Classification. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13532. Springer, Cham. https://doi.org/10.1007/978-3-031-15937-4_15
Print ISBN: 978-3-031-15936-7
Online ISBN: 978-3-031-15937-4