1 Introduction

1.1 Color Fading Ozonation: A Textile Finishing Process

In recent years, textile products with a faded effect, worn look and vintage style have attracted growing attention from young customers and have gained a considerable share of the fashion market [1]. However, the faded effect of these products is achieved by a textile finishing process consisting of a large number of chemical treatments (e.g. bleaching using hydrogen peroxide or chlorine, washing using stone/permanganate). The traditional use of these chemical treatments is not only highly water- and power-consuming but also releases a wide range of toxic substances into the environment. Over the past few decades, rising public concern about environmental issues has led to the rapid development of alternative sustainable approaches, among which ozonation is the most important one in textile finishing.

Ozone is an excellent gaseous oxidant with an environmentally friendly nature: it decomposes rapidly into O\(_2\) after application without emitting additional pollution. It is able to react with a large number of organic and inorganic substances in water because a series of intermediates or by-products, such as hydroxyl radicals (which react without selectivity), may be generated in the reaction between ozone and water [2]. More significantly, ozone can be applied directly in gaseous form, without a water bath, to react with targets that carry a certain water content, which dramatically decreases the water consumption in the sector. Meanwhile, ozonation achieves a color fading effect comparable to that of conventional treatments, with less damage and fewer other negative influences on the target materials [3]. It is therefore regarded as an ideal alternative to traditional oxidizing and bleaching agents and has been applied to a wide range of textile-related domains such as wastewater treatment, dyeing, paper-making and fiber modification. Studies on color fading of dyed textiles using ozone, instead of the conventional processes, have been increasingly reported in recent years, taking advantage of these properties of ozone [4].

The decolorization of dyes in ozonation can, in short, be attributed to the simultaneous oxidation of the unsaturated organic compounds of the chromophoric system (e.g. azo chromophore groups) by ozone directly and, to a greater extent, indirectly by free radicals generated from the decomposition of ozone. However, what actually happens when ozonation is applied to fade the color of dyed textiles is too complicated to be summarized in a few sentences. Color fading ozonation of textiles is affected by many interdependent factors ranging from the properties of the textile material to the settings of the color fading process [5]. How these factors affect the color fading process separately is known; to understand their overall impact, however, the complex and nonlinear relationship between material properties, the technical parameters of ozonation and the color fading effects must be taken into consideration. In previous literature, the simultaneous effects of multiple factors on color fading ozonation of textiles have barely been investigated systematically. Because the factors have extremely nonlinear, poorly understood relations and unclear effects on the target product properties, analytical or mathematical models relying on chemical or physical laws with simplified assumptions are of limited use for understanding the mechanism. To address this issue of numerous input and output parameters, artificial intelligence techniques that can learn from data are more effective and applicable.

1.2 Artificial Intelligent Techniques for Modeling Textile Process

As computing power has increased in recent years, the related intelligent modeling techniques have become well developed; since the correlation between the desired inputs and outputs in textile manufacturing processes can hardly be characterized analytically, the applicable techniques have mostly concentrated on multiple linear or nonlinear regression models, ANN, etc. ANN is a widely used artificial intelligence approach in the textile sector. It is a method inspired by the bionic simulation of the human brain, interconnecting numerous neurons in different hidden layers to process the complex information of a specific input-output relation [6]. In particular, ELM is a novel algorithm for SLFNs that randomly chooses the input weights W and analytically determines the output weights \(\beta \) of SLFNs. ELM tends to achieve good generalization performance at an extremely fast learning speed [7]. Sun et al. successfully applied ELM to forecast sales behavior in fashion retailing, and the experimental results in their study demonstrated that the ELM model performed better than methods based on backpropagation neural networks [8].

SVM is another popular machine learning tool for classification and regression based on statistical learning theory, first introduced by Vladimir Vapnik and his colleagues in 1992 [9]. SVR is the most common application form of SVM; a typical feature of SVR is that it minimizes the generalized error bound instead of the observed training error so as to achieve better generalization. It relies only on a subset of the training data, because the cost function used for building the model ignores any training data that is close (within \(\varepsilon \)) to the model prediction [10]. SVR has been used successfully in the textile industry for predicting yarn properties [11, 12], the quality of PU-coated cotton fabrics [13] and the pilling propensity of wool knitwear [14], which demonstrates its potential for modeling textile processes.

RF is another well-known artificial intelligence based modeling technique, composed of a weighted combination of multiple regression trees. It constructs each tree using a different bootstrap sample of the data and, unlike a decision tree that splits each node using the best split among all variables, RF uses the best split among a subset of predictors randomly chosen at that node [15]. In general, combining multiple regression trees increases predictive performance. RF predicts accurately by exploiting the interaction of variables and evaluating the significance of each variable [16]. Kwon et al. [17] developed a surface defect detection method based on RF to inspect fabric surfaces. Venkatraman and Alsberg [18] predicted important photovoltaic properties of phenothiazine dyes using RF, which paves the way for rapid screening of new potential dyes and computer-aided materials design.

1.3 Modeling Color Fading Ozonation of Reactive-Dyed Cotton

Cotton is the most vital material in the textile industry, and reactive dyes dominate cotton dyeing since optimum dyeing performance on cotton is easy to achieve with environmental advantages, low cost and uncomplicated processes. This has made reactive-dyed cotton one of the most important textile products in the fashion market. Color is the most important property of a textile product and is ultimately determined by the finishing process. According to Kubelka-Munk theory [19], the K/S value indicates the color depth of textile products, while the \(L^*, a^*, b^*\) values (CIELab), an international standard widely used for color measurement, describe the color variation of textile samples. Among these colorimetric values, \(L^*\) (ranging from 0 to 100) is the lightness component, whereas \(a^*\) and \(b^*\) are chromatic components that describe the color variation from green (\(\bigtriangleup a^{*}<0\)) to red (\(\bigtriangleup a^{*}>0\)) and from blue (\(\bigtriangleup b^{*}<0\)) to yellow (\(\bigtriangleup b^{*}>0\)) respectively, on scales from \(-120\) to 120. Normally, the color of the final textile product agreeing with specific K/S and \(L^*, a^*, b^*\) values lies within the tolerance acceptable to the consumer. Therefore, the K/S and \(L^*, a^*, b^*\) values can be used to characterize the color variation caused by color fading ozonation of reactive-dyed cottons.
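For reference (the relation itself is not restated in the text), the standard Kubelka-Munk formula from which the K/S value is obtained computes it from the reflectance R of the fabric measured at the wavelength of maximum absorption:

$$\begin{aligned} \frac{K}{S}=\frac{(1-R)^{2}}{2R} \end{aligned}$$

where K is the absorption coefficient and S the scattering coefficient, so a deeper shade (lower reflectance) yields a higher K/S value.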

In this work, an attempt is made to model color fading ozonation, a textile finishing process, in order to predict the color properties of ozone-faded reactive-dyed cotton using different artificial intelligence techniques. ELM, SVR and RF models were constructed, each with a corresponding optimization process, to comparatively assess their applicability for predicting the color performance of reactive-dyed cotton in color fading ozonation. Part of this work can also be found in Ref. [20].

2 Experimental

2.1 Material

Desized grey cotton fabrics (\(^3\)/\(_1\) twill; 325.7 g/m\(^2\); supplied by Shunfu, Hubei, China) were dyed with three bifunctional fluorotriazine azo reactive dyes, RB-RN, RR-2BL and RY-2RN (provided by Color Root, Hubei, China; commercial quality, dye purity: 92%), respectively. Chemical materials such as sodium hydroxide, hydrogen chloride, sodium metasilicate nonahydrate, 30% hydrogen peroxide, sodium sulfate and sodium carbonate (analytical grade, supplied by Sinopharm Limited, China) and OP-20 (polyoxyethylene octylphenol ether, a nonionic surfactant, chemically pure, supplied by Tianjin Guangfu, China) were used in this study.

2.2 Apparatus

The ozone employed in this work was generated by a corona discharge ozone generator, CF-G50 (Guolin, China), fed with pure, compressed dry oxygen (\(\ge \)99.9%, 1 MPa, 12 L/min) from an oxygen cylinder. Ozone flowed into the reactor (made of glass; the structure is shown in Fig. 1), and in each color fading ozonation experiment the samples were distributed evenly on the sample desk (made of air-permeable steel net). Ozone was supplied at a gas flow of 2 L/min and a dosage of \(137\pm 3\) mg/L min (measured by a UV meter NS-xmd614, Naishi, China) throughout the treatment. The exhaust from the reactor was collected and decomposed by a heater (\({\ge } 230\,^{\circ }\text {C} \)) before being released to the atmosphere.

2.3 Methods

2.3.1 Pretreatment and Dyeing of Cotton Fabrics

The cotton fabric was scoured with 8 g/L sodium hydroxide, 3 g/L OP-20 and 5 g/L sodium metasilicate nonahydrate at 100 \(^{\circ }\text {C}\) for 15 min at a liquor ratio of 20:1, and then bleached with 8 g/L hydrogen peroxide, 3 g/L OP-20 and 5 g/L sodium metasilicate nonahydrate at 90 \(^{\circ }\text {C}\) for 15 min at a liquor ratio of 20:1. Afterwards, the fabrics were rinsed thoroughly before being dyed with 3% o.w.f. (on weight of fabric) dyestuff (RB-RN, RR-2BL and RY-2RN respectively), 70 g/L sodium sulfate, 20 g/L sodium carbonate and 2 g/L OP-20 at a liquor ratio of 20:1, following the profile displayed in Fig. 2.

Fig. 1

The reactor setup of ozonation

Fig. 2

Dyeing profile of the reactive dyes

2.3.2 Ozonation Process

The three dyed cottons in different colors were treated respectively by color fading ozonation as follows: the fabrics were wetted with deionized water (pH = 7, or adjusted with sodium hydroxide or hydrogen chloride when a specific pH was required) to obtain a certain pick-up (water content). After ozone treatment, the samples were rinsed with deionized water and then dried naturally.

Ozonation at different pH values (1, 4, 7, 10, 13) and temperatures (0, 20, 40, 60, 80 \(^{\circ }\text {C}\)), with variable pick-ups (water content of sample: 0, 75%, 150%), for different treating times (0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60 min) was investigated on the three dyed cotton fabrics (blue, red, yellow; 612 fabric samples in total). Apart from pH, which was set by the method mentioned above (using sodium hydroxide or hydrogen chloride in the water pick-up step), the temperature of ozonation was controlled by a water bath around the reactor (including the inlet tubes), and the pick-up of each sample was calculated by Eq. (1).

$$\begin{aligned} Pickup\,(\%)=\frac{W_s-W_0}{W_0} \times 100\%, \end{aligned}$$
(1)

where \(W_s\) and \(W_0\) are the weights of the wet picked-up sample and the corresponding non-wetted original sample respectively.
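As a minimal illustration of Eq. (1), a hypothetical helper (not part of the original study) could be written as follows; the weights are illustrative only.

```python
def pick_up_percent(wet_weight, dry_weight):
    """Water pick-up (%) per Eq. (1): (W_s - W_0) / W_0 * 100."""
    return (wet_weight - dry_weight) / dry_weight * 100.0

print(pick_up_percent(25.0, 10.0))   # illustrative weights (grams) -> 150.0
```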

2.3.3 Analytical

The colorimetric values K/S and \(L^*, a^*, b^*\) were measured with a Datacolor 110 spectrophotometer (Datacolor, USA), taking the average of four measurements located on different parts of the two sides of each sample, within a relative error of 0.3. All samples were conditioned at \(21\,\pm \,1\) \(^{\circ }\text {C}\) and \(65\,\pm \,2\)% relative humidity for over 24 h before each ozonation process and each measurement.

3 Algorithms of Intelligent Techniques and Structure for Modeling

3.1 Extreme Learning Machine

ELM is an algorithm for SLFNs that randomly chooses W and analytically determines \(\beta \). Consider an SLFN with K hidden nodes using the activation function \(f(x)=(f_1(x),f_2(x),...,f_k(x))\) to learn N samples \((X_i,Y_i)\), where \(X_i=[x_{i1}, x_{i2},...,x_{in}]^T \in R^n\) and \(Y_i=[y_{i1}, y_{i2},...,y_{im}]^T \in R^m\). Ideally, the SLFN approximates these samples with zero error, which means that

$$\begin{aligned} \sum _{j=1}^N\Vert \hat{Y}_j - Y_j \Vert = 0 \end{aligned}$$
(2)

where \(\hat{Y}\) is the actual output value of SLFNs. Taking the weights W, \(\beta \) and bias b into consideration, we have

$$\begin{aligned} \sum _{i=1}^K \beta _i \cdot f_i(W_i \cdot X_j + b_i) = Y_j, \quad j=1,...,N \end{aligned}$$
(3)

where \(W_i =[w_{i1}, w_{i2},..., w_{in}]^T\) and \({\beta }_i=[{\beta }_{i1},{\beta }_{i2},...,{\beta }_{im}]^T\), \(i=1,...,K\), are the weight vectors for the inputs and the activated nodes respectively, and \(b_i\) is the threshold of the \(i_{\text {th}}\) hidden node. The compact vectorized expression of Eq. (3) is

$$\begin{aligned} H \beta = Y \end{aligned}$$
(4)

where \(H(W_1,...,W_K,b_1,...,b_K,X_1,...,X_N)\), with entries \(H_{ij}=f_j(W_j \cdot X_i+ b_j)\;(i=1,...,N\), \(j=1,...,K)\), is the hidden layer output matrix of the neural network; its \(j_{\text {th}}\) column is the output of the \(j_{\text {th}}\) hidden node with respect to the inputs \(X_1,...,X_N\). \({\beta }=[{\beta }_1,{\beta _2},...,{\beta }_K]^T\) and \(Y=[Y_1,Y_2,...,Y_N]^T\) are the matrices of output weights and targets respectively. As the input weights W and the biases b are randomly chosen in the ELM algorithm, the output weights \(\beta \), which connect the hidden layer and the output layer, can simply be determined by finding the least-squares solution to the given linear system. According to [21], the smallest-norm least-squares solution of the linear system (4) among all solutions is

$$\begin{aligned} \hat{\beta }=H^\dagger Y \end{aligned}$$
(5)

where \(H^\dagger \) is the Moore-Penrose generalized inverse of the matrix H [22]. In this study, a multi-output ELM regression function developed by Huang's group was used, with an optimal trial over the activation functions (i.e. Sigmoid, Sine and Hardlim) given in Eqs. (6)–(8) and over the number of hidden nodes (from 1 to 200).

$$\begin{aligned} Sigmoid(x)=\frac{1}{1+e^{-x}} \end{aligned}$$
(6)
$$\begin{aligned} Sine(x)=sin(x) \end{aligned}$$
(7)
$$\begin{aligned} Hardlim(x) = {\left\{ \begin{array}{ll} 1 \quad x\ge 0,\\ 0 \quad x<0 \end{array}\right. } \end{aligned}$$
(8)

3.2 Support Vector Machine

Compared with neural networks, SVR provides more generalization on the foundation of structural risk minimization and generally performs better with fewer training samples. Given training data \({(x_1, y_1),..., (x_l, y_l)} \subset \mathbb {R}^n \times \mathbb {R}\) for the SVR model, the target function g(x) should be as flat as possible and deviate by at most \(\varepsilon \) from the actual targets \(y_i\) for all the training data; it takes the form:

$$\begin{aligned} g(x) ={<}w,x{>} + b; \quad w\in \mathbb {R}^n, b\in \mathbb {R} \end{aligned}$$
(9)

where x is the n-dimensional input vector, w is the weight vector and b is the bias term. Flatness in (9) means a small w, and one way to achieve it is to minimize the Euclidean norm, i.e. \(\frac{1}{2}\Vert w\Vert ^2\) [23], which leads to the convex optimization problem:

$$\begin{aligned} minimize \quad \frac{1}{2}\Vert w\Vert ^2 \qquad \ \qquad \qquad \end{aligned}$$
(10)
$$\begin{aligned} subject \ to \left\{ \begin{aligned} y_i-{<}w,x_i{>}-b\le \varepsilon \\ {<}w,x_i{>}+b-y_i\le \varepsilon \\ \end{aligned} \right. \end{aligned}$$
(11)

This optimization problem is feasible when a function g(x) actually exists that approximates all pairs \((x_i, y_i)\) with \(\varepsilon \) precision; the slack variables \(\xi _i, \xi _i^*\) were introduced to cope with otherwise infeasible constraints [9],

$$\begin{aligned} minimize \quad \frac{1}{2}\Vert w\Vert ^2 + C\sum _{i=1}^l (\xi _i + \xi _i^*) \ \ \ \quad \end{aligned}$$
(12)
$$\begin{aligned} subject \ to \left\{ \begin{aligned}&y_i-<w,x_i>-b\le \varepsilon +\xi _i \\&<w,x_i>+b-y_i\le \varepsilon +\xi _i^* \\&\xi _i,\xi _i^* \ge 0 \\ \end{aligned} \right. \end{aligned}$$
(13)

where C is a constant greater than 0 that determines the trade-off between \(\frac{1}{2}\Vert w\Vert ^2 \) and the sum of permitted errors. The dual formulation makes this optimization problem easier to solve [24]; a standard dualization method utilizing Lagrange multipliers gives:

$$\begin{aligned} \begin{aligned}&L=\frac{1}{2}\Vert w\Vert ^2 +C \sum _{i=1}^l (\xi _i + \xi _i^*) - \sum _{i=1}^l \alpha _i(\varepsilon + \xi _i - y_i + {<}w,x_i{>} + b) \\&- \sum _{i=1}^l\alpha _i^*(\varepsilon +\xi _i^*+y_i - {<}w,x_i{>}-b) -\sum _{i=1}^l (\eta _i \xi _i + \eta _i^* \xi _i^*) \\ \end{aligned} \end{aligned}$$
(14)

where \(\eta _i, \eta _i^*, \alpha _i, \alpha _i^*\) have to satisfy positivity constraints, i.e. be \({\ge }0\). The partial derivatives of L with respect to the variables \((w, b,\xi _i, \xi _i^*)\) have to vanish for optimality.

$$\begin{aligned} \partial _bL = \sum _{i=1}^l (\alpha _i^* -\alpha _i)=0 \end{aligned}$$
(15)
$$\begin{aligned} \partial _wL = w - \sum _{i=1}^l (\alpha _i - \alpha _i^*)x_i =0 \end{aligned}$$
(16)
$$\begin{aligned} \partial _{\xi _i^{(*)}}L = C- \alpha _i^{(*)} - \eta _i^{(*)} =0 \end{aligned}$$
(17)

here \(\xi _i^{(*)}\), \(\alpha _i^{(*)}\) and \(\eta _i^{(*)}\) refer to \(\xi _i\) and \(\xi _i^*\), \(\alpha _i\) and \(\alpha _i^*\), and \(\eta _i\) and \(\eta _i^*\), respectively. Substituting (15)–(17) into (14), the dual optimization problem is given by

$$\begin{aligned} maximize \left\{ \begin{aligned} -\frac{1}{2}\sum _{i,j=1}^l (\alpha _i - \alpha _i^*)(\alpha _j -\alpha _j^*){<}x_i, x_j{>} \\ -\varepsilon \sum _{i=1}^l(\alpha _i +\alpha _i^*) + \sum _{i=1}^l y_i (\alpha _i - \alpha _i^*) \\ \end{aligned} \right. \end{aligned}$$
(18)
$$\begin{aligned} subject \ to \sum _{i=1}^l(\alpha _i - \alpha _i^*) = 0 \quad and \quad \alpha _i , \alpha _i^* \in [0,C] \end{aligned}$$
(19)

As the dual variables \(\eta _i, \eta _i^*\) can be eliminated through the condition \(\eta _i^{(*)} = C - \alpha _i^{(*)}\) following from (17), Eq. (16) turns into

$$\begin{aligned} w=\sum _{i=1}^l (\alpha _i - \alpha _i^*) x_i , \quad thus \quad g(x)=\sum _{i=1}^l (\alpha _i - \alpha _i^*){<}x_i,x{>} +b \end{aligned}$$
(20)

This is the so-called \(Support \ Vector \ expansion\). The next necessary step in the SVM training algorithm is to make it nonlinear, which is achieved by a mapping \(\phi (x)\) from \(\mathbb {R}^n\) to a higher-dimensional feature space using a kernel function \(K(x,x_i )={<}\phi (x_i ),\phi (x){>}\); therefore (20) becomes

$$\begin{aligned}&w=\sum \limits _{i=1}^l (\alpha _i - \alpha _i^*)\phi (x_i ), \end{aligned}$$
(21)
$$\begin{aligned}&g(x)=\sum \limits _{i=1}^l (\alpha _i - \alpha _i^*)k(x_i,x)+b \end{aligned}$$
(22)

This differs from the linear case in that w is no longer explicitly given, so flatness cannot be read off directly. In the nonlinear case, the optimization problem amounts to finding the flattest function in feature space rather than in input space. The standard SVR is

$$\begin{aligned} g(x)=\sum _{i=1}^N (\alpha _i - \alpha _i^*)k(x_i,x)+b \end{aligned}$$
(23)

where N (which should be less than the total number of input-output pairs) is the number of input data with nonzero values of \(\alpha _i^{(*)}\). The kernel function \(k (x_i, x)\) corresponds to a linear dot product of the nonlinear mapping. As we are dealing with a process modeling case containing multiple outputs, we applied the multi-output least-squares support vector regression (MLS-SVR) toolbox developed by Xu et al. [25], in this study with an optimal trial over the following kernel functions:

$$\begin{aligned} \textit{Linear:} \quad K(x,x_i) = x^Tx_i + C \end{aligned}$$
(24)
$$\begin{aligned} \textit{Sigmoid:}\quad K(x,x_i) =tanh(\alpha x^Tx_i +C) \end{aligned}$$
(25)
$$\begin{aligned} \textit{Polynomial:}\quad K(x,x_i) = {<}x,x_i{>}^p \end{aligned}$$
(26)
$$\begin{aligned} \textit{Radial basis function:} \quad K(x,x_i) = e^{-\frac{\Vert x-x_i\Vert ^2}{2\sigma ^2}} \end{aligned}$$
(27)
$$\begin{aligned} \textit{Exponential radial basis function:} \quad K(x,x_i) = e^{-\frac{\Vert x-x_i\Vert }{2\sigma ^2}} \end{aligned}$$
(28)

The toolbox also includes an optimization process (leave-one-out, LOO) for the parameters \(\gamma \), \(\lambda \) and p, where \(\gamma \in \lbrace 2^{-5}, 2^{-3},..., 2^{15}\rbrace \), \(\lambda \in \lbrace 2^{-10}, 2^{-8},..., 2^{10}\rbrace \) and \(p \in \lbrace 2^{-15},2^{-13},..., 2^{3}\rbrace \) \(({=}\frac{1}{2\sigma ^2})\) [25].
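As a rough, hedged illustration of this setup (not a reimplementation of the MLS-SVR toolbox), the sketch below fits an \(\varepsilon \)-SVR with an RBF kernel per output using scikit-learn, with C and gamma searched on coarse power-of-two grids loosely mirroring the LOO search above. The data arrays are placeholders standing in for the scaled study data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Placeholder data: 5 process inputs (colour code, pH, temperature, pick-up, time)
# and 4 colour outputs (K/S, L*, a*, b*), already scaled to [0, 1].
X, Y = np.random.rand(459, 5), np.random.rand(459, 4)

# One RBF-kernel epsilon-SVR per output; grid values are assumptions for illustration.
grid = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.01),
    param_grid={"C": [2.0 ** k for k in range(-5, 16, 2)],
                "gamma": [2.0 ** k for k in range(-15, 4, 2)]},
    cv=10, scoring="neg_mean_squared_error")
model = MultiOutputRegressor(grid).fit(X, Y)
print(model.predict(X[:3]))   # predicted K/S, L*, a*, b* for the first three samples
```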

3.3 Random Forest

RF is an ensemble-learning algorithm based on the bagging method that combines multiple independently constructed decision tree predictors to classify or predict certain variables [16]. In RF, successive trees do not depend on earlier trees; each is constructed independently using a bootstrap sample of the dataset, and a simple unweighted average over the collection of grown trees \(h(x,\Theta _k)\) is therefore taken for the final prediction:

$$\begin{aligned} \bar{h}(X)= \frac{1}{K} \sum _{k=1}^K h(X,\Theta _k) \end{aligned}$$
(29)

where \(k=1,...,K\) indexes the trees, X represents the observed input vector, and the \(\Theta _k\) are independent, identically distributed random vectors on which the numerical-valued tree predictors depend. The RF algorithm starts by randomly drawing ntree bootstrap samples from the original data with replacement, and then grows the corresponding number of regression trees from these bootstrap samples. In each node of a regression tree, the best split is sought among mtry variables randomly selected from all variables for the binary partitioning. Selecting the splitting feature from a random subset of features decreases the correlation between different trees, so the average prediction of multiple regression trees is expected to have a lower variance than that of individual regression trees [26]. A regression tree hierarchically imposes specific restrictions or conditions and grows from the root node to the leaf nodes by splitting the data into partitions or branches according to the lowest Gini index:

$$\begin{aligned} I_G(t_{X(X_i)}) = 1 - \sum _{J=1}^M f(t_{X(X_i)}, j)^2 \end{aligned}$$
(30)

where \(f(t_{X(X_i)}, j)\) is the proportion of samples with the value \(x_i\) belonging to leaf j at node t [27]. In the present study, the MRF implementation developed by Rahman et al. [5] was employed, with the topology optimized over the three parameters ntree, minleaf and mtry.
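For orientation, a hedged sketch of such a multi-output random forest regression is given below using scikit-learn's RandomForestRegressor as a stand-in for the MRF implementation used here; the parameter mapping (ntree, mtry, minleaf) and the placeholder data are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder data shaped like the study's: 5 process inputs, 4 colour outputs.
X, Y = np.random.rand(459, 5), np.random.rand(459, 4)

# Rough mapping of the tuned parameters:
# ntree -> n_estimators, mtry -> max_features, minleaf -> min_samples_leaf.
rf = RandomForestRegressor(n_estimators=10, max_features=5,
                           min_samples_leaf=2, random_state=0)
rf.fit(X, Y)                      # scikit-learn trees handle multi-output regression directly
print(rf.predict(X[:3]))          # predicted K/S, L*, a*, b*
print(rf.feature_importances_)    # rough analogue of per-variable significance
```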

Fig. 3

The demonstration of the a front side and b back side of real ozone treated cotton samples

3.4 Modeling Structure

In this work, the constructed model is expected to predict (output) the color qualities of ozone-treated samples in terms of the K/S and \(L^*, a^*, b^*\) values, given 5 variables comprising the specific color of the treated fabric and the process parameters pH, temperature, pick-up and treating time. In other words, the anticipated model of color fading ozonation on reactive-dyed cotton captures the complex and unclear relationship between the parameters of color fading ozonation and its effects on reactive-dyed cotton fabric in these respects.

In particular, taking as an example the real samples treated at pH 7 and 20 \(^{\circ }\)C with 150% pick-up over different times from 0 to 60 min, shown in Fig. 3, the corresponding \(K/S, L^*, a^*, b^* \) values are listed in Table 1. Each treated sample clearly differs from the others in its color properties as a result of the various ozonation processes, which in turn indicates how intricately the process parameters influence the color of dyed cotton fabric in ozonation. Table 2 exhibits the minimum, maximum, average and standard deviation of the dataset used in the process modeling.

A total of 612 sets of data collected in the experiment were divided into two groups, training data and testing data, where 75% were used for training and the remaining 25% for testing, in line with common practice for data division in the machine learning sector. As a result, 459 datasets (75%) were learned by the models while the remaining 153 datasets (25%) were reserved for testing. The correlation of the factors (pH, temperature, pick-up and time only, as the original color of the fabric is not a continuous variable) with the \(K/S, L^*, a^*, b^*\) values was estimated by Spearman rank correlation coefficients (based on (31)) and is listed in Table 3. It was found that pH and treating time are slightly more relevant than temperature and pick-up, and thus play a more important role in the ozonation.

$$\begin{aligned} \rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)} \end{aligned}$$
(31)
Table 1 \(K/S, L^*, a^*, b^*\) values of the samples shown in Fig. 3
Table 2 The maximum, minimum, average and standard deviation of ozonation parameters
Table 3 Spearman rank correlation coefficients of the ozonation parameters with the \(K/S, L^*, a^*, b^*\) values

where n is the number of observations and \(d_i\) is the difference between the ranks of corresponding variables. K-fold cross-validation (k \(=\) 10) was used in the modeling section; it is a popular statistical approach for evaluating predictive models. With k \(=\) 10, as used in this modeling study, the 459 training sets of data were divided randomly and equally into 10 disjoint folds; 9 folds formed the training subset while the remaining fold was used as the validation subset. This procedure was repeated 10 times with different training and validation data each time to validate the trained models. To evaluate the performance of the models in validation, the Mean Square Error (MSE) was used, based on:

$$\begin{aligned} MSE = \frac{1}{n}\sum _{i=1}^n(e_i - p_i)^2 \end{aligned}$$
(32)

where \(e_i\) is the real experimental result, whereas \(p_i\) is the predicted output of the specific model. Additionally, four statistical performance criteria, namely MAE, RMSE, R and MRAE, are used in this study to indicate the predictive performance of the obtained models.

$$\begin{aligned}&MAE=\frac{1}{n}\sum \limits _{i=1}^n\vert e_i - p_i\vert \end{aligned}$$
(33)
$$\begin{aligned}&RMSE =\sqrt{\frac{1}{n}\sum _{i=1}^n(e_i - p_i)^2} \end{aligned}$$
(34)
$$\begin{aligned}&R(e,p) = \frac{\sum _{i=1}^n(e_i-\bar{e})(p_i-\bar{p})}{\sqrt{\sum _{i=1}^n(e_i-\bar{e})^2\cdot \sum _{i=1}^n(p_i-\bar{p})^2}} \end{aligned}$$
(35)
$$\begin{aligned}&MRAE = \frac{1}{n}\sum \limits _{i=1}^n\frac{\vert e_i-p_i \vert }{e_i} \end{aligned}$$
(36)

The models were developed and constructed using MATLAB R2015b for the multi-output ELM and MLS-SVR, and R Studio for MRF, on a laptop (Core i7-4710, 2.5 GHz, 16 GB RAM). All of the original data was normalized to the range [0, 1] before use.
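The data handling described above ([0, 1] scaling, the 75/25 train-test split, the Spearman check of Eq. (31) and 10-fold cross-validation) could be reproduced along these lines; scikit-learn and SciPy are used here as stand-ins for the MATLAB/R tooling, and the data array is a placeholder.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.model_selection import KFold, train_test_split
from sklearn.preprocessing import MinMaxScaler

# Placeholder array standing in for the 612 samples (5 inputs + 4 outputs).
data = MinMaxScaler().fit_transform(np.random.rand(612, 9))   # scale everything to [0, 1]
X, Y = data[:, :5], data[:, 5:]

# 75/25 train-test split, i.e. 459 training and 153 testing sets.
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)

# Spearman rank correlation, Eq. (31), e.g. between pH (assumed column 1) and K/S (output 0).
rho, _ = spearmanr(X_tr[:, 1], Y_tr[:, 0])

# 10-fold cross-validation indices over the training data.
for tr_idx, va_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X_tr):
    pass  # fit on X_tr[tr_idx], validate on X_tr[va_idx]; track the validation MSE
```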

4 Results and Discussion

4.1 Modeling Training

4.1.1 ELM Models

ELM models with 1 to 200 hidden nodes activated by the Sigmoid, Sine and Hardlim functions were investigated respectively (the corresponding validation MSE is illustrated in Fig. 4, with a detailed demonstration of the trained ELM models possessing 1 to 140 nodes). Overfitting of the ELM models activated by Sigmoid and Sine is easily observed, starting at around 100 nodes. More specifically, the Sigmoid-activated ELM models performed similarly to the Sine-activated ones: the MSE of both dropped and reached a minimum at around 50 nodes (MSE \(\approx \) 0.052), followed by a dramatic increase. By contrast, the validation MSE of the Hardlim-activated models remained generally stable as the number of nodes grew, though a minimum of MSE \(\approx \) 0.069 (larger than for Sigmoid and Sine) at 97 nodes can still be discerned in Fig. 4. Similar comparative results for the use of these activation functions in ELM can be found in the work of Singh and Balasundaram [28].

Fig. 4

Validation MSE of ELM models activated by different functions

An activation function in an artificial neural network converts the input signal of a node into an output signal by introducing non-linearity. It is very important for an ELM model to learn and achieve the complicated mapping of input to output data by activating the nodes with a suitable activation function. The graphs of the activation functions we used are given in Fig. 5. Sigmoid and Sine have much in common: both have smooth, S-shaped segments and are infinitely differentiable, which makes them easy to understand and apply. On the other hand, this may also explain their similar behavior and shared disadvantage in the ELM models, as seen in their similar performance variation and overfitting with increasing nodes in Fig. 4. Hardlim performed worst compared with Sigmoid and Sine in terms of their activated ELM models, probably owing to its saturation.

Fig. 5

Activation functions of ELM

4.1.2 SVR Models

Multi-output SVR models with Linear, Sigmoid, Polynomial, RBF and ERBF kernel functions were trained and developed using the MLS-SVR toolbox. The corresponding minimum validation MSEs are 0.05678, 0.00932, 0.08613, 0.00493 and 0.0092 respectively (as demonstrated in Fig. 6). It is worth noting that the models trained with the Linear and Polynomial kernels performed far more poorly than the others. The models with the Sigmoid and ERBF kernels performed very similarly at quite a low level, although their validation MSE is nearly twice that of the SVR model with the RBF kernel, which performed best in this comparison with its parameters optimized to \( \gamma = 32{,}768\), \(\lambda = 9.7656 \times 10^{-4}\) and \(p = 0.125\) (for more information regarding the LOO optimization process used in the toolbox for these kernel parameters, see [25]).

The kernel function transforms the input data into the required form so that a non-linear decision surface becomes a linear equation in a higher-dimensional space, where the computational power of the learning machine is enhanced. The type of kernel function used influences many characteristics of the SVR model. A wide range of kernels exists and it is hard to account for their individual characteristics, but it is well known that the RBF kernel is recommended as the first to try in an SVR model: it shares certain parameters and behaviors with the Linear and Sigmoid kernels yet has fewer hyperparameters than the Polynomial kernel to complicate the model. RBF is also assumed to have computational advantages over other kernels because its kernel values are easier and faster to compute [29]. The lowest MSE achieved in this case confirms its suitability for this study, which can be attributed to the fact that the model has relatively few features but a comparatively large number of observations.

Fig. 6

Validation MSE of SVM models with varied kernel functions

RF models with different mtry (from 1 to 5), minleaf (from 1 to 10) and ntree (from 1 to 100) were trained and developed respectively, and the validation MSE of these models is given in Fig. 7, with a detailed demonstration of the models with \(mtry =1\) and ntree ranging from 1 to 100, excluding those whose validation MSE is higher than 0.026. In Fig. 7, the number of variables tried at each regression tree node (mtry) is found to play a very significant role in the models' accuracy in predicting the color properties of ozone-treated cotton fabrics.

The falling MSE curves with growing mtry may reveal that the five inputs used to construct these RF models, i.e. (1) color of dyed cotton, (2) pH, (3) temperature, (4) pick-up and (5) treating time of the ozonation process, are fairly independent of each other. As a result, RF models with five randomly selected features generally achieve the lowest validation MSE in this comparison. ntree plays another significant role in RF models, as the MSE decreased dramatically when the number of trees in the forest increased from 1 to 30. In general, these models perform steadily once there are more than 30 regression trees in the forest, regardless of the mtry or minleaf employed; however, to save time and cost in the model training process, a 10-tree forest is sufficient and may be recommended for further experiments on predicting color fading ozonation of dyed textile. Unlike mtry and ntree, the minimum number of samples in the leaf node, i.e. minleaf, seems preferably kept small, though it is relatively uninfluential. From the detailed MSE plots of the \(mtry=1\) RF models in Fig. 7, the average MSE of the obtained RF models generally increased as the number of leaves rose from 1 to 10.

Fig. 7

MSE of RF models with varied number of features, leaves and trees

4.2 Prediction Performance

The quality of a model is determined not only by its ability to learn from the data but also by its ability to predict unseen data; these two are the so-called learning capacity and generalization ability of a model, and models are seldom good at both. According to the training results obtained above, the Sine- and Sigmoid-activated ELM models have very similar performance, both optimized at 50 nodes, while the SVR with the RBF kernel and the RF with \(mtry =5\), \(ntree =10\), \(minleaf = 2\) clearly outperform all the others in their respective training processes. In order to further investigate the potential application of these three techniques without losing significant observations, the two ELM models were taken into account together with the RBF-SVR and the optimized RF (\(mtry =5\), \(ntree =10\), \(minleaf =2\)) in this section.

To estimate and compare these optimized models, a prediction test using the testing dataset (which was not used in the training and validation processes) was carried out. Table 4 presents a comparison of the prediction performance of the ELM, SVR and RF models. In general, the ELM models using the Sigmoid (MSE = 0.0172) and Sine (MSE = 0.0173) activation functions show no substantial difference in prediction performance, but both are slightly poorer than the SVR and RF models. However, the ELM models are the fastest to train in the comparison, which means the ELM model is still worth applying in certain resource-limited cases, especially when training time is a concern. The most accurate model, according to the findings in Table 4, is RF, as it leads to the lowest testing error, with a higher R (0.9847) and lower MSE (0.0036), MAE (0.0295), RMSE (0.0601) and MRAE (0.0062). However, training the RF model requires a much longer time than the others (21 s). It is therefore also worth taking the SVR model into account, as it achieved the second lowest error (R = 0.9777, MSE = 0.0043, MAE = 0.0429, RMSE = 0.0656, MRAE = 0.0109) with a more acceptable, shorter training time (0.9360 s).

Table 4 Prediction performance of optimized models
Fig. 8

Predicted data outputted by ELM (trained by Sigmoid and Sine respectively), SVR and RF versus experimental data

Table 4 demonstrates the overall performance of the constructed models in terms of the evaluation indexes, but the details of these predictions are not visible there. As noted, the constructed models possess four outputs, i.e. the \(K/S, L^*, a^*, b^*\) values of reactive-dyed cotton fabrics treated by color fading ozonation, and how these predictive models handle each of them individually is unclear. In order to reflect the real prediction performance (using testing data) of each trained model on each single output separately, the predicted results from output 1 (K/S value) to output 4 (\(b^*\)) versus the real experimental data (targets 1–4) are illustrated in Fig. 8a, b, c, d respectively.

In Fig. 8, the values predicted by the models generally agree with the actual values, though the predictive errors vary in level from model to model. While the gap in overall prediction performance is not that large in Table 4 (taking MSE as an example: Sigmoid-activated ELM = 0.0172, Sine-activated ELM = 0.0173, SVR = 0.0043, RF = 0.0036), the distribution of errors over the individual outputs shows a larger gap in the real application. In Fig. 8a, c, certain data predicted by the ELM models are clearly far from the real target data in certain ranges, a situation that could result in serious mistakes in application, where a good overall performance averaged over multiple outputs may hide a wrong prediction on a specific single output. According to the linear-fit correlation coefficients of the predicted versus real experimental data (shown in Fig. 8) listed in Table 5, the testing results reveal that the SVR model (R\(^2 = 0.9505\)) and the RF model (R\(^2=0.9555\)) are more stable and suitable than the ELM models (R\(^2=0.8025\) and 0.8007 for Sigmoid- and Sine-activated respectively) for modeling color fading ozonation of dyed textile, in terms of overall prediction performance and, more importantly, of predicting multiple outputs without deviation on any single output. This may be attributed to the features of the data concerning color fading ozonation of dyed textile. On the other hand, it could also be attributed to a disadvantage of ELM, namely that it relies entirely on increasing the number of nodes to improve predictive performance, which makes it risky to apply to a complicated problem such as the present investigation. The results also reveal that both SVR and RF can deal well with the interaction of variables and are comparatively more stable in multi-variable nonlinear modeling.

Table 5 Correlation coefficients of the data in Fig. 8

5 Conclusion and Prospective

In this chapter, three artificial intelligence modeling techniques, i.e. ELM, SVR and RF, were used to model the ozonation process for predicting the color properties of ozonated reactive-dyed cottons, and the potential applicability of these models to this textile finishing process was evaluated.

Color fading of dyed textile is a vital process in the textile industry for obtaining certain stylish effects on the product, and it has been used increasingly in recent decades. Ozonation is a novel technology developed in recent years to achieve the color fading effect of textiles with high performance, not only in terms of efficiency and quality but also in terms of environmental sustainability. For the purpose of better understanding and applying color fading ozonation of textiles at industrial scale, the complexity and nonlinearity of the factors and impacts of color fading ozonation on reactive-dyed cotton were investigated by process modeling. The effects of ozonation, in terms of pH, temperature, water pick-up, treating time and the dyed color of the fabric, on the color fading performance, in terms of the \(K/S, L^*, a^*, b^*\) values of reactive-dyed cotton, were modeled using ELM, SVR and RF respectively. The findings indicate that both SVR and RF are potentially applicable candidates for modeling the color fading ozonation process of dyed textile, as their predicted results agreed well with the actual data both overall and for individual outputs. Taking training time and cost into consideration, however, the SVR model would be recommended over RF for real use. By contrast, the ELM models performed more poorly in prediction and were very unstable in predicting certain individual outputs in multi-variable process modeling.