19.1 Introduction

Many chemometric methods are now used in spectral analysis. Spectral analysts can master the basic principles of these methods with relative ease, but turning the algorithms into working applications requires proficiency in mathematics and statistics as well as advanced programming skills. Chemometrics software and toolkits therefore play a crucial role in popularizing analytical techniques, such as spectroscopy, that rely on chemometrics, and mastering such software is enough to solve most problems met in practice. Spectrometer hardware and software (mainly spectrum acquisition software and chemometrics software) together constitute the technical platform of modern spectroscopic analysis. The preceding chapters of this book introduced in detail the common chemometric methods used in modern spectroscopy and their latest developments. This chapter introduces the basic structure and functions of chemometrics software, along with commercial software packages and toolkits.

19.2 Basic Structure and Functions of Software

Chemometric software for spectral analysis is used mainly to establish calibration models and to predict unknown samples. As shown in Fig. 19.1, such software usually consists of three parts: sample set management, calibration, and prediction of unknown samples. Sample set management stacks the spectral data and reference data into a matrix to form a sample set file that can be used for model building and validation. Calibration refers to establishing a quantitative or qualitative calibration model; the commonly used chemometric algorithms, such as spectral preprocessing, multivariate calibration, and classification algorithms, are all concentrated in this module. Prediction uses the built model to calculate the concentration or property values of unknown samples.

Fig. 19.1 Scheme of chemometrics software for spectral analysis

(1) Sample Set Management

The main function of sample set management is to stack the spectra and reference data of a group of samples into a matrix to form a database. The module should therefore be able to recognize and read common spectral file formats and to accept reference data entered in different ways. It usually also offers sample selection, so that a representative calibration set and validation set can be formed. In addition, the spectra can be displayed in real time in this interface, together with a plot of the spatial distribution of the samples, to identify extreme outlier spectra, and the concentration values of the samples can be analyzed statistically. Sample set management should be an open interface in which samples are easy to add and delete.
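The stacking and sample selection steps described above can be sketched as follows. This is a minimal illustration, not the procedure of any particular software package: the text does not name a selection algorithm, so the Kennard-Stone algorithm is used here as one common choice, and the data, array sizes, and variable names are all hypothetical.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select (>= 2) samples chosen by the Kennard-Stone algorithm."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all spectra
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Start with the two most distant samples
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the sample that is farthest from the already selected set
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

# Hypothetical sample set: 20 spectra with 50 wavelength points each
rng = np.random.default_rng(0)
spectra = rng.normal(size=(20, 50))
reference = rng.uniform(10.0, 20.0, size=20)

# One row per sample: spectral variables followed by the reference value
sample_set = np.column_stack([spectra, reference])

# Split into a calibration set and a validation set
cal_idx = kennard_stone(spectra, 14)
val_idx = [k for k in range(len(spectra)) if k not in cal_idx]
```

Kennard-Stone selects samples that span the spectral space uniformly, which is why it is often used to build a calibration set; the samples left over then serve as the validation set.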

(2) Calibration Establishment

Establishing a calibration model is the core function of chemometrics software. Two types of model are built, qualitative and quantitative, and both involve three steps: spectral preprocessing, spectral range selection, and method selection. Once established, the model should be evaluated and optimized with visual tools.

Commonly used spectral preprocessing algorithms include baseline correction (first and second derivatives, subtraction), smoothing, multiplicative scatter correction (MSC), standard normal variate (SNV) transformation, standardization, mean centering, etc. Commonly used quantitative calibration algorithms include MLR, PCR, PLS, SVR, and ANN; qualitative algorithms mainly include cluster analysis, KNN, and SIMCA. Spectral range or interval selection generally uses a visual interactive mode: ranges can be selected directly on the spectra with the mouse, or chosen automatically from parameters such as correlation coefficients.
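Three of the preprocessing steps listed above (SNV, MSC, and mean centering) can be sketched with NumPy alone. This is a minimal sketch under stated assumptions: spectra are stored one per row, the function names are hypothetical, and the synthetic spectra differ only by an offset and a gain so that the effect of each correction is easy to see.

```python
import numpy as np

def snv(X):
    """Standard normal variate: center and scale each spectrum (row) individually."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

def msc(X, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    If no reference is given, the mean spectrum of X is used. The reference is
    returned so that it can be saved and reused for unknown samples.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        # Least-squares fit x ~ intercept + slope * ref
        slope, intercept = np.polyfit(ref, x, 1)
        corrected[i] = (x - intercept) / slope
    return corrected, ref

def mean_center(X, mean=None):
    """Subtract the column means (saved calibration means if provided)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0) if mean is None else mean
    return X - mean, mean

# Hypothetical spectra: the same band shape with different offsets and gains
base = np.sin(np.linspace(0.0, 3.0, 40))
offsets = np.array([0.1, 0.0, -0.2])
gains = np.array([0.8, 1.0, 1.3])
X = offsets[:, None] + gains[:, None] * base

X_snv = snv(X)
X_msc, ref = msc(X)
X_mc, x_mean = mean_center(X)
```

Because the synthetic spectra are exact offset-and-gain copies of one shape, MSC maps every row onto the reference spectrum and SNV makes all rows identical; real spectra would only approach this.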

Inspecting diagnostic plots after modeling is very important for judging whether the model is acceptable and for removing outliers. These plots generally include the PRESS plot, the regression curve, the spectral residual distribution, and score and loading plots. Evaluation statistics such as SEC, SECV, and R² should also be displayed. According to ASTM E1655, three types of outliers in the calibration set, namely Mahalanobis distance outliers, property residual outliers, and spectral residual outliers, should be eliminated during modeling, so the software needs to provide the corresponding diagnostic views.
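The statistics behind a PRESS plot and the SEC, SECV, and R² values can be illustrated with a small principal component regression (PCR) written in NumPy. This is a sketch under assumptions, not the computation of any specific package: the two-component synthetic data and the helper names are hypothetical, leave-one-out cross-validation is assumed, and the degrees-of-freedom conventions used for SEC and SECV are one common choice among several.

```python
import numpy as np

def pcr_fit(X, y, n_factors):
    """Fit PCR on mean-centered data; return (x_mean, y_mean, regression vector)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_factors].T                   # loadings
    T = Xc @ V                             # scores (mutually orthogonal)
    q = (T.T @ yc) / (s[:n_factors] ** 2)  # regression of y on the scores
    return x_mean, y_mean, V @ q

def pcr_predict(model, X):
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean

# Hypothetical data: two overlapping bands, property = concentration of component 1
rng = np.random.default_rng(0)
w = np.arange(40)
pure = np.vstack([np.exp(-((w - 17) / 5.0) ** 2), np.exp(-((w - 23) / 5.0) ** 2)])
conc = rng.uniform(0.0, 1.0, size=(30, 2))
X = conc @ pure + rng.normal(scale=0.001, size=(30, 40))
y = conc[:, 0]
n = len(y)

# PRESS from leave-one-out cross-validation for 1, 2, and 3 factors
press = {}
for k in (1, 2, 3):
    res_cv = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = pcr_fit(X[keep], y[keep], k)
        res_cv[i] = y[i] - pcr_predict(model, X[i:i + 1])[0]
    press[k] = float(res_cv @ res_cv)

# Calibration statistics for the chosen number of factors
k = 2
res = y - pcr_predict(pcr_fit(X, y, k), X)
sec = float(np.sqrt(res @ res / (n - k - 1)))   # standard error of calibration
secv = float(np.sqrt(press[k] / n))             # standard error of cross-validation
r2 = float(1.0 - res @ res / ((y - y.mean()) ** 2).sum())
```

Plotting `press` against the number of factors gives the PRESS plot; for this two-component system the value drops sharply from one factor to two, which is the pattern an analyst looks for when choosing the model size.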

External validation is the main way to test whether a model is sound. It can provide multiple statistical parameters (such as RMSEP, RPD, and the t-test) as well as a comparison of measured and predicted values, so that the strengths and weaknesses of the model can be evaluated.
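The validation statistics mentioned above can be computed in a few lines. The sketch below assumes one common set of definitions (RPD as the standard deviation of the reference values divided by the bias-corrected SEP, and a t statistic that tests whether the bias differs from zero); other conventions exist, and the function name and example values are hypothetical.

```python
import numpy as np

def validation_stats(y_ref, y_pred):
    """Return RMSEP, bias, SEP, RPD, and the t statistic for the bias."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_ref)
    err = y_pred - y_ref
    rmsep = np.sqrt(np.mean(err ** 2))                  # root mean square error of prediction
    bias = err.mean()                                   # mean prediction error
    sep = np.sqrt(((err - bias) ** 2).sum() / (n - 1))  # bias-corrected standard error
    rpd = y_ref.std(ddof=1) / sep                       # ratio of performance to deviation
    t_bias = abs(bias) * np.sqrt(n) / sep               # compare with t(n - 1) critical value
    return {"rmsep": float(rmsep), "bias": float(bias), "sep": float(sep),
            "rpd": float(rpd), "t_bias": float(t_bias)}

# Hypothetical validation set: measured versus predicted values
stats = validation_stats([10, 12, 14, 16, 18], [10.5, 11.5, 14.5, 15.5, 18.5])
```

For these five samples the function gives RMSEP = 0.5, bias = 0.1, SEP ≈ 0.548, RPD ≈ 5.77, and t ≈ 0.408. The t value would then be compared with the critical value of the t distribution with n − 1 degrees of freedom to decide whether the bias is significant.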

Some software can automatically suggest modeling parameters, such as the spectral preprocessing parameters, the number of PLS factors, and the spectral interval. These suggestions are generally for reference only; the final model parameters still need to be determined by the user on the basis of the necessary chemical knowledge.

(3) Prediction

The main function of the prediction module is to analyze unknown samples. As shown in Fig. 19.2, the spectrum of the unknown sample is first preprocessed with the saved preprocessing parameters, and the calibration method is then run with its saved settings to calculate the result. For quantitative models, it is generally necessary to determine whether the unknown sample falls within the range covered by the model, using criteria such as the Mahalanobis distance, the spectral residual, and the nearest-neighbor distance. Prediction results are usually displayed directly or written to a file in the form of a report.
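The prediction steps described above (apply the saved preprocessing, check whether the sample lies within the model range, then calculate the result) can be sketched as follows. This is a minimal illustration that assumes a PCR model with mean centering as the only preprocessing. The helper names are hypothetical, and the limits are simply the largest values found in the calibration set, a placeholder rule rather than the criteria of ASTM E1655 or of any software package.

```python
import numpy as np

def build_model(X, y, n_factors):
    """Fit a PCR model and save everything needed for later prediction."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_factors].T
    T = Xc @ V
    q = (T.T @ (y - y_mean)) / (s[:n_factors] ** 2)
    cov_inv = np.linalg.inv(T.T @ T / (len(y) - 1))
    d2_cal = np.einsum("ij,jk,ik->i", T, cov_inv, T)   # squared Mahalanobis distances
    resid_cal = ((Xc - T @ V.T) ** 2).sum(axis=1)      # spectral residuals
    return {"x_mean": x_mean, "y_mean": y_mean, "V": V, "q": q, "cov_inv": cov_inv,
            "d2_limit": float(d2_cal.max()),           # illustrative limits only
            "resid_limit": float(resid_cal.max())}

def predict(model, spectrum):
    """Return (predicted value, Mahalanobis distance squared, residual, in_range)."""
    xc = np.asarray(spectrum, dtype=float) - model["x_mean"]  # saved preprocessing
    t = xc @ model["V"]                                       # scores of the unknown
    d2 = float(t @ model["cov_inv"] @ t)
    resid = float(((xc - t @ model["V"].T) ** 2).sum())
    value = float(t @ model["q"] + model["y_mean"])
    in_range = bool(d2 <= model["d2_limit"] and resid <= model["resid_limit"])
    return value, d2, resid, in_range

# Hypothetical calibration data: two overlapping bands
rng = np.random.default_rng(0)
w = np.arange(40)
pure = np.vstack([np.exp(-((w - 17) / 5.0) ** 2), np.exp(-((w - 23) / 5.0) ** 2)])
conc = rng.uniform(0.0, 1.0, size=(30, 2))
X_cal = conc @ pure + rng.normal(scale=0.001, size=(30, 40))
y_cal = conc[:, 0]

model = build_model(X_cal, y_cal, n_factors=2)
typical = predict(model, X_cal.mean(axis=0))            # average calibration spectrum
extreme = predict(model, 5.0 * pure[0] + 5.0 * pure[1]) # far outside the calibrated range
```

The average spectrum is reported as within range, while the sample with five times the calibrated concentrations is flagged because its Mahalanobis distance exceeds the limit; in such a case the predicted value should be reported with a warning rather than trusted.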

Fig. 19.2 Basic steps of predictive analysis of unknown samples

19.3 Common Software and Toolkits

Nowadays, almost all major spectrometer manufacturers, especially near-infrared spectroscopy suppliers, have developed dedicated chemometric software, such as FOSS WinISI, Thermo TQ Analyst, Bruker OPUS, Metrohm Vision, and Buchi NIRCal.

General-purpose chemometrics software includes The Unscrambler from CAMO (Norway), Solo from Eigenvector Research (USA) and its MATLAB-based PLS_Toolbox, Pirouette from Infometrix (USA), and SIMCA MVDA from Sartorius (Germany). Chemometrics software has also been developed by universities and research institutes, such as ParLeS from the University of Sydney, Australia [1], Caunir from China Agricultural University, and the RIPP software from the SINOPEC Research Institute of Petroleum Processing.

Commercial chemometrics software can solve most of the problems encountered in daily analysis and plays an important role in popularizing modern spectroscopic technology. However, commercial software tends to be updated relatively slowly, and implementing new algorithms or improving classic ones sometimes requires users to write their own programs. Platforms such as MATLAB, R, and Python make it much more convenient to implement chemometric algorithms. Many commercial or open-access chemometrics software packages and toolkits are available, such as the MATLAB-based PLS_Toolbox, the R package mdatools [2], and the Python toolkit scikit-learn [3].

MATLAB offers many toolboxes that can be used for spectral analysis directly or with slight modification, such as the Statistics and Machine Learning Toolbox, Wavelet Toolbox, Neural Network Toolbox, Deep Learning Toolbox, Global Optimization Toolbox, and Optimization Toolbox.

Table 19.1 lists some MATLAB toolboxes and open-source code for particular algorithms written by chemometrics researchers [4,5,6,7,8,9,10,11,12,13]. The emergence of these toolboxes has greatly promoted applied research on new algorithms in chemometrics [21, 22].

Table 19.1 Some MATLAB toolboxes that can be used for chemometrics