1 Introduction

Froth flotation is a well-established beneficiation process that is widely used to separate valuable minerals from the unwanted gangue in mineral processing plants [1]. Despite extensive research on modeling and simulation of the flotation process and recent developments in instrumentation and process control systems, optimizing control systems (model-based controllers and expert systems) have not been successfully implemented in flotation plants [2].

Previous studies have shown that the visual and textural features of the froth are closely related to the working conditions and performance of the flotation process [3,4,5,6,7,8]. Hence, several soft sensors have been developed over the last few decades to measure the froth visual and textural characteristics in flotation plants. These sensors use machine vision, machine learning algorithms, and process data for modeling and predicting the flotation performance [3, 4, 9,10,11,12,13,14,15].

Advances in machine learning and computer systems have led to the development of deep learning algorithms, which are capable of capturing relevant features from an image at different levels of abstraction, similar to the human brain [16, 17]. The convolutional neural network is a class of deep neural networks that has been successfully applied to image processing of froth flotation [18,19,20,21,22]. In a machine vision system for froth flotation, convolutional neural networks are typically used to directly and continuously extract features from froth images, which often yields better results than traditional approaches. The extracted features can be used for the classification of froth images or the prediction of the flotation conditions and performance [18, 22,23,24].

The predominant advantage of convolutional neural networks over traditional fully connected neural networks is that they are specifically designed for image input, mainly through a series of kernel filters employed to extract fundamental features. The kernels in the first convolutional layer extract low-level features (edges and lines), while the kernels in the subsequent layers capture high-level features [25]. The classification performance of convolutional neural networks improves as the number of network layers increases, but at the expense of greater complexity [20].

Another approach is to use convolutional neural networks trained on a different image database (ImageNet) to extract features from flotation froth images [26]. This is beneficial because the pre-trained model requires less training and less effort to build a model of the process. In addition to being as accurate as, or even more accurate than, a custom-built convolutional neural network, a pre-trained model saves the considerable effort and time required to build the model [15, 21, 24, 25].

To the authors' knowledge, only a few studies have been published to date on the use of deep convolutional neural networks for monitoring and prediction of flotation performance. Horn et al. [27] compared the feature extraction quality of convolutional neural networks and several traditional feature extraction algorithms on platinum froth flotation images. Their findings indicated that the performance of convolutional neural networks was similar to that of the other feature extraction algorithms. Fu and Aldrich [20] used a pre-trained convolutional neural network (AlexNet) to extract features from froth images, which were then used in a random forest model to predict the flotation performance and conditions.

In a study by Wang et al. [28], deep convolutional neural network algorithms were used for feature extraction and classification of froth images of a gold–antimony rougher flotation cell to identify the working conditions of the flotation process. Li et al. [29] used a combination of a convolutional neural network (as a feature extractor) and a support vector machine (as a predictor) for fault detection and recognition of flotation conditions. Zarie et al. [22] developed a 16-layer deep convolutional neural network for the classification of froth images of an industrial coal flotation column. The convolutional neural network model achieved an accuracy of 93.1% for the classification of froth images in terms of concentrate ash content and combustible recovery, much higher than that of the artificial neural network model.

Zhang et al. [14] built a soft sensor model based on a convolutional neural network and memory networks for the prediction of concentrate Zn grade in a zinc rougher flotation bank. Zhang and Gao [15] established a soft sensor based on a hybrid deep neural network for the classification of iron tailings froth images. The proposed algorithm achieved a classification accuracy of 97%. Wen et al. [24] proposed a soft sensor based on convolutional neural networks for the prediction of the concentrate ash content of an industrial coal flotation cell. They used various convolutional neural networks for the classification of froth images with different concentrate ash contents.

In this study, several convolutional neural networks of five different architectures (AlexNet, GoogLeNet, VGGNet, ResNet, and SqueezeNet) pre-trained on the ImageNet database including over 1.2 million images were used to predict the metallurgical performance of two flotation systems. The first database was the image and metallurgical data obtained from a batch flotation cell operated over a wide range of process conditions. The second database was obtained from an industrial flotation column in a coal preparation plant. The pre-trained networks were used to extract features from the images, and these features were used to construct a model for predicting the metallurgical parameters associated with the froth images.

2 Convolutional Neural Networks (CNNs)

Artificial neural networks (ANNs) are fast, reliable, and robust computational tools for modeling complex non-linear systems like froth flotation. In recent times, ANNs have been used extensively for classification, clustering, pattern recognition, and prediction in many applications [6]. A convolutional neural network (CNN) is a class of feed-forward artificial neural networks that has been successfully applied to image analysis [30]. A CNN is a deep learning algorithm that extracts fundamental features from images: several successive convolutional layers extract image features, and the network finally learns to classify the images. CNNs typically consist of an input layer, convolutional layers, pooling layers, and fully connected layers. In addition, rectified linear units (ReLU) can be used for faster training and convergence of the network. The image data are fed to the input layer, which is followed by a series of convolutional layers. The convolutional layers extract features from the image, from low-level features in the early layers to high-level features in the deeper layers. The pooling layers are typically used to reduce the dimensions of the feature maps and to make computation faster because of the reduced number of learning parameters. The features extracted by the previous layers are finally compiled in the fully connected layers to form the final output [20, 21]. A typical architecture of a CNN for the classification of froth images is shown in Fig. 1 [22].

Fig. 1
figure 1

Typical structure of a CNN for classification of froth images [22]
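As a concrete illustration of this generic structure (not the specific network used in any of the cited studies), a minimal PyTorch sketch of a froth-image classifier is given below; the layer sizes, input resolution, and number of froth classes are illustrative assumptions.

```python
# Minimal PyTorch sketch of the generic CNN structure described above
# (conv -> ReLU -> pooling blocks followed by fully connected layers).
# Layer sizes and the number of froth classes are illustrative assumptions.
import torch
import torch.nn as nn

class FrothCNN(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level features (edges, lines)
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                               # reduce feature-map size
            nn.Conv2d(16, 32, kernel_size=3, padding=1),   # higher-level features
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 56 * 56, 128),                  # assumes 224x224 input images
            nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),                   # class scores for froth categories
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example: one batch of four 224x224 RGB froth images
logits = FrothCNN()(torch.randn(4, 3, 224, 224))
print(logits.shape)  # torch.Size([4, 4])
```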

3 Pre-trained Convolutional Neural Networks

The pre-trained CNN algorithms are robust feature extractors that have recently been used for the analysis of froth images in flotation systems [20, 21]. Pre-trained CNNs can be used for transfer learning or as feature extraction algorithms, which speeds up the learning process and eliminates the need to design a network from scratch. In transfer learning, the pre-trained CNN model is the starting point: the input image is presented to the network, and low- and high-level image features are extracted by multiple layers of neurons. The first layers extract low-level features, while the last layers extract high-level features. The parameters of the deep layers can then be further tuned on the new image database. Fine-tuning the parameters improves accuracy because the network learns additional sets of features; the principal drawback of this procedure is the processing time, which increases greatly for very large image datasets. Feature extraction is the second strategy that benefits from the potential of pre-trained CNNs: the new database is simply passed through the pre-trained CNN, and image features are extracted using the fixed, previously learned parameters. In froth flotation systems, these features can subsequently be used to predict the process conditions or performance [31].
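The feature-extraction strategy can be sketched as follows with torchvision; the choice of GoogLeNet, the 1024-dimensional output, and the image file name are assumptions made for illustration, not details of the cited studies.

```python
# Sketch of the feature-extraction strategy: a froth image is passed through a
# CNN pre-trained on ImageNet and the activations of a late layer are used as
# features. Model choice and file name are illustrative assumptions.
import torch
from torchvision import models
from PIL import Image

weights = models.GoogLeNet_Weights.IMAGENET1K_V1
net = models.googlenet(weights=weights)
net.fc = torch.nn.Identity()      # drop the 1000-class head; keep the 1024-D pooled features
net.eval()

preprocess = weights.transforms() # resizing/normalization expected by the pre-trained network

img = Image.open("froth_frame.png").convert("RGB")   # hypothetical froth image file
with torch.no_grad():
    features = net(preprocess(img).unsqueeze(0))     # shape: (1, 1024)
print(features.shape)
```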

The feature maps of the pre-trained networks are summarized in Table 1. A brief description of the pre-trained CNN algorithms utilized to extract features from the current froth image database is given in the following sections.

Table 1 Feature maps of pre-trained CNNs and number of used features in regression models

3.1 AlexNet (2012)

AlexNet is a deep CNN architecture designed by Krizhevsky et al. [25]. AlexNet contains eight layers: the first five are convolutional layers, some of them followed by max-pooling layers, and the last three are fully connected layers. AlexNet uses the dropout technique to effectively prevent overfitting while training the network [32, 33].

3.2 GoogLeNet (2014)

GoogLeNet is a deep CNN developed by researchers at Google. The network is 22 layers deep (27 layers including 5 pooling layers) with 9 inception modules, a global average pooling (GAP) layer after the last inception module, and over 7 million parameters. GoogLeNet is a variant of the inception architecture: convolutional layers with different kernel sizes, along with pooling layers, are applied to the input in parallel, and the resulting outputs are concatenated, so that different types of features are extracted in each inception module (see the sketch below). A dropout layer is also used during training to avoid overfitting. GoogLeNet showed a lower error rate than AlexNet in the classification of the ImageNet database [33]. The architecture of GoogLeNet is shown in Fig. 2.

Fig. 2
figure 2

The architecture of GoogLeNet
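A minimal sketch of an inception-style module is given below; the branch channel counts are arbitrary assumptions rather than GoogLeNet's exact configuration.

```python
# Illustrative inception-style block: parallel 1x1, 3x3, and 5x5 convolutions and
# a pooling branch whose outputs are concatenated along the channel axis, so that
# several feature scales are extracted at once.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 8, kernel_size=1),
                                nn.Conv2d(8, 16, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 8, kernel_size=1),
                                nn.Conv2d(8, 16, kernel_size=5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 16, kernel_size=1))

    def forward(self, x):
        # concatenate the four branches channel-wise (same spatial size)
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

out = InceptionBlock(32)(torch.randn(1, 32, 28, 28))
print(out.shape)  # torch.Size([1, 64, 28, 28])
```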

3.3 VGGNet (2014)

VGG16 is a deep CNN model proposed by Simonyan and Zisserman at the University of Oxford [19]. It is composed of 13 convolutional layers (with 5 max-pooling layers) and 3 fully connected layers. The architecture of VGG16 is similar to that of AlexNet, but it stacks many more convolutional layers with small \(3\times 3\) kernels. VGG16 is a relatively large network with a total of around 138 million parameters. The VGG-19 architecture is the same as VGG-16 except that it has 3 additional convolutional layers [34].

3.4 ResNet (2015)

ResNet is a residual neural network developed by He et al. [35]. Residual networks (ResNets) are built around the key idea of identity connections that can skip one or more layers. These identity shortcut connections between layers make the network more efficient to train than conventional feedforward CNNs. Every ResNet block is two layers deep (ResNet-18 and ResNet-34) or three layers deep (ResNet-50, ResNet-101, and ResNet-152). The ResNet models have lower complexity than VGGNet because fewer filters are employed. The ResNet-18, ResNet-50, and ResNet-101 architectures were used in the current study; a residual block is sketched below.
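A minimal sketch of a two-layer residual block with an identity shortcut (the ResNet-18/34 style) follows; the channel count is an illustrative assumption.

```python
# Minimal sketch of a two-layer residual block with an identity shortcut.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # identity shortcut skips the two conv layers

y = ResidualBlock(64)(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```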

3.5 SqueezeNet (2016)

SqueezeNet is a deep CNN originally introduced in 2016 by researchers at DeepScale, the University of California, Berkeley, and Stanford University. The main goal of the SqueezeNet model was to build a smaller neural network with fewer parameters without sacrificing network performance. A SqueezeNet model is composed of convolutional layers, fire modules, and pooling layers; the number of parameters and connections is significantly reduced by the use of fire modules (sketched below). Another notable feature of SqueezeNet is the absence of fully connected layers, which further reduces the number of parameters [36, 37].
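A minimal sketch of a fire module is given below; the squeeze and expand channel counts are illustrative assumptions.

```python
# Sketch of a SqueezeNet "fire" module: a 1x1 "squeeze" layer reduces the channel
# count, then parallel 1x1 and 3x3 "expand" layers are concatenated.
import torch
import torch.nn as nn

class FireModule(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
        self.expand3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.squeeze(x))                       # fewer channels -> fewer parameters
        return torch.cat([self.relu(self.expand1(s)),
                          self.relu(self.expand3(s))], dim=1)

y = FireModule(96, 16, 64)(torch.randn(1, 96, 55, 55))
print(y.shape)  # torch.Size([1, 128, 55, 55])
```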

Among the pre-trained neural networks, AlexNet, GoogLeNet, and SqueezeNet have relatively similar architecture, which can be classified into the same category, while VGGNet and ResNet have different structures.

4 Case Studies

The first database was the image and metallurgical data obtained from a batch flotation cell operated over a wide range of process conditions [6]. The second database was obtained from an industrial flotation column in a coal preparation plant [18].

4.1 Laboratory Flotation Tests

In the first case study, image and metallurgical data captured from a batch flotation cell treating a copper sulfide ore were considered [6, 38]. The flotation cell was operated under different process conditions, and the froth surface was continuously filmed using a digital video camera (Fig. 3). The video camera and lighting system (a 50-W halogen lamp) were mounted at a distance of 20 cm above the froth surface. The froth images were captured at a rate of 25 frames per second, giving 3000 frames per test. The concentrate samples were collected at 0.5, 2, and 5 min, and the copper recovery (R) at each time was calculated from the following equation:

$$R=\frac{C\cdot c}{F\cdot f}$$
(1)

where f and c are the copper contents of the feed and concentrate samples (%), respectively, and F and C are the masses of the feed and concentrate samples (g), respectively.

Fig. 3
figure 3

Laboratory flotation cell equipped with a video camera [6]

The metallurgical parameters and the froth images were analyzed for each run. For the different runs, the image and process data collected during the first 2 min were considered.

At first, each video recording was split into eight 15-s segments. Typical froth images at different flotation times are presented in Fig. 4. Features of the froth images were then extracted using the pre-trained CNNs, and the mean values of these features were computed for each 15-s segment. The large number of features extracted from the froth images was reduced by principal component analysis (PCA) [18]. The principal components were subsequently used to build a model for the prediction of the metallurgical parameters (copper grade and recovery). The generalization capability of the regression models was assessed by the leave-one-out cross-validation technique, in which the \(n\) data samples are divided into \(n-1\) training samples and 1 testing sample. Each time, the model is trained on \(n-1\) samples and then validated on the remaining testing sample. The hold-out process is repeated \(n\) times so that every sample is tested once, and the errors over all testing data are then computed and used to evaluate the model [39].
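A minimal scikit-learn sketch of this workflow (CNN features reduced by PCA, a linear regression model, and leave-one-out cross-validation) is given below; the array shapes and random data are placeholders, not the study's data.

```python
# Sketch of the modelling workflow: PCA-reduced CNN features feed a linear
# regression model assessed with leave-one-out cross-validation.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(24, 1024))      # e.g. 24 segments x 1024 CNN features (placeholder)
y = rng.normal(size=24)              # e.g. copper recovery per segment (placeholder)

model = make_pipeline(PCA(n_components=0.90), LinearRegression())
y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
print("leave-one-out R^2:", r2_score(y, y_pred))
```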

Fig. 4
figure 4

Typical froth images of laboratory flotation cell at different flotation times

4.1.1 Prediction of Copper Grade and Recovery at \(t=30\) s Using Pre-trained CNNs

The pre-trained CNN algorithms were applied to extract the froth features, which were then used to predict the copper recovery and the concentrate grade for the first batch concentrate (t = 30 s) of the flotation system. The froth features were extracted for the first two 15-s segments, and their mean values were computed. Meanwhile, the copper content of the first 30-s concentrate was determined, and the concentrate grade and copper recovery were calculated. Two linear regression models were then fitted to predict the copper recovery and the concentrate grade at \(t=30\mathrm{ s}\) from the features extracted from the first two 15-s segments. The goodness-of-fit of the developed models was quantified by the coefficient of determination (\({R}^{2}\)), calculated from the following expression:

$${R}^{2}=1-\frac{\sum_{i=1}^{n}{({y}_{i}-{\widehat{y}}_{i})}^{2}}{\sum_{i=1}^{n}{({y}_{i}-\overline{y })}^{2}}$$
(2)

where \({y}_{i}\) and \({\widehat{y}}_{i}\) are the observed and predicted values of the metallurgical parameters, respectively, and \(\overline{y }\) is the mean of the observed values. \({R}^{2}\) values lie between zero and unity: \({R}^{2}=0\) indicates that the regression model does not fit the data points, and \({R}^{2}=1\) denotes a perfect fit.
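For reference, Eq. (2) can be computed directly as follows; the arrays are placeholders.

```python
# Direct implementation of Eq. (2); y_obs and y_pred are placeholder arrays.
import numpy as np

def r_squared(y_obs, y_pred):
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_obs - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))  # close to 1 for a good fit
```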

Figure 5 shows the \({R}^{2}\) values of the regression models obtained from the features extracted during the first (\(t=0-15\mathrm{ s}\)) and second (\(t=15-30\mathrm{ s}\)) segments to predict the copper recovery and the concentrate grade at \(t=30\mathrm{ s}\). The results indicate that the froth visual features extracted at \(t=15-30\mathrm{ s}\) are more strongly correlated with the metallurgical parameters obtained at \(t=30\mathrm{ s}\). In other words, the prediction accuracy of the algorithms improves as the image capture time approaches the froth sampling time.

Fig. 5
figure 5

Comparison of pre-trained CNN algorithms for prediction of concentrate grade (top) and copper recovery (bottom) at \(t=30\mathrm{ s}\)

The accuracy of VGG-19 in predicting the copper recovery and the concentrate grade is superior to that of VGG-16. GoogLeNet significantly outperforms both the AlexNet and SqueezeNet models. Among the ResNet models, ResNet-101 is the most accurate algorithm for estimating the metallurgical performance. Based on the experiments and analysis presented in this paper, the deeper ResNet algorithms with more layers give better predictions than the shallower networks: deeper networks are able to learn more complex, non-linear functions, and the large amount of training data involved enables them to model complicated systems more easily. Of all the tested algorithms, GoogLeNet achieved the best performance and the highest accuracy on the current dataset.

4.1.2 Prediction of Copper Grade and Recovery at \(t=30\) s Using GoogLeNet-ANN

Analysis of the first batch run data indicated that GoogLeNet was the most efficient algorithm for extracting features from the froth images and that the image features extracted from the second 15-s segment showed the greatest correlation with the metallurgical parameters. The froth features extracted by the GoogLeNet algorithm at \(t=15-30\mathrm{ s}\) were therefore used to build an ANN model for predicting the flotation performance. The large number of features extracted from the froth images by the GoogLeNet algorithm was reduced by PCA (retaining components that explain 90% of the total variance). The dataset was randomly partitioned into training (70%), validation (15%), and testing (15%) data. The number of neurons in the hidden layers was determined by trial and error. The structure of the developed ANN model is listed in Table 2.

Table 2 Characteristics of developed ANN model
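A minimal sketch of this GoogLeNet-ANN step is given below, assuming scikit-learn's MLPRegressor as the feed-forward network; the data arrays, hidden-layer size, and split details are illustrative assumptions rather than the configuration in Table 2.

```python
# Sketch of the GoogLeNet-ANN step: PCA components retaining 90% of the variance
# feed a small feed-forward network. Data arrays and network size are placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 1024))     # GoogLeNet features per image segment (placeholder)
y = rng.normal(size=120)             # copper recovery at t = 30 s (placeholder)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)
ann = make_pipeline(
    PCA(n_components=0.90),
    MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                 early_stopping=True, validation_fraction=0.2, random_state=1),
)
ann.fit(X_train, y_train)
print("test R^2:", ann.score(X_test, y_test))
```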

The correlation coefficient (\(R\)) and the root mean square error (\(\mathrm{RMSE}\)) were used to quantify the performance of the ANN model in predicting the metallurgical parameters from the image features.

$$R=\frac{\mathrm{cov}\left({y}_{i},{\widehat{y}}_{i}\right)}{\sqrt{\mathrm{var}({y}_{i})\times \mathrm{var}({\widehat{y}}_{i})}}$$
(3)
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{({y}_{i}-{\widehat{y}}_{i})}^{2}}$$
(4)
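Eqs. (3) and (4) can be computed directly as follows; the arrays are placeholders.

```python
# Direct implementations of Eqs. (3) and (4) used to score the ANN predictions.
import numpy as np

def correlation_coefficient(y_obs, y_pred):
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return np.cov(y_obs, y_pred)[0, 1] / np.sqrt(y_obs.var(ddof=1) * y_pred.var(ddof=1))

def rmse(y_obs, y_pred):
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_obs - y_pred) ** 2))

print(correlation_coefficient([1, 2, 3], [1.1, 1.9, 3.2]),
      rmse([1, 2, 3], [1.1, 1.9, 3.2]))
```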

Figure 6 shows scatter plots of the observed versus model-predicted values of the copper recovery and the concentrate grade obtained by ANN. The solid lines are the best regression lines fitted to the scatter points. The results of the performance evaluation of the developed ANN for predicting the metallurgical parameters are summarized in Table 3. The results demonstrate that the developed GoogLeNet-ANN algorithm can successfully predict the conditions and performance of the batch flotation system from the froth visual features.

Fig. 6
figure 6

Scatter plots of observed versus model-predicted values of copper recovery and concentrate grade at \(t=30\mathrm{ s}\) obtained by GoogLeNet-ANN (testing data)

Table 3 Performance of GoogLeNet-ANN algorithms for predicting the batch flotation performance at \(t=30\mathrm{ s}\)

4.1.3 Prediction of Copper Grade and Recovery at \(t=120\) s Using Pre-trained CNNs

The second concentrate sample was collected at \(t=120\mathrm{ s}\) after the start of the run. The froth features were extracted at \(t=30-45\mathrm{ s}\), 45–60 s, 60–75 s, 75–90 s, 90–105 s, and 105–120 s, and the mean values for each 15-s segment were computed. The froth images were fed to the pre-trained CNN algorithms, and the extracted features were used as predictors in a model to estimate the copper recovery and the concentrate grade at \(t=120\mathrm{ s}\). Six linear regression models were fitted to the extracted features of each 15-s segment to predict the flotation performance at \(t=120\mathrm{ s}\).

Figure 7 depicts the \({R}^{2}\) values of the regression models built with the froth features extracted at different flotation times (\(t=45-120\mathrm{ s}\)). The \({R}^{2}\) values exhibit an ascending trend over time, indicating again that the images from the last 15-s segment are the most representative of the flotation performance. The results shown in Fig. 7 indicate that the ResNet-101 and GoogLeNet algorithms outperform the other architectures in predicting the copper recovery and the concentrate grade at \(t=120\mathrm{ s}\). In general, the deeper networks with more convolutional layers appear to be more successful feature extractors.

Fig. 7
figure 7

Comparison of pre-trained CNN algorithms for prediction of concentrate grade (left curves) and copper recovery (right curves) at \(t=120\mathrm{ s}\)

4.1.4 Prediction of Copper Grade and Recovery at \(t=120\) s Using GoogLeNet-ANN

The results presented in the previous section showed that GoogLeNet was the best pre-trained CNN algorithm for extracting visual features from the froth images. The best results were obtained from the froth images captured at \(t=105-120\mathrm{ s}\), which were closest to the concentrate sampling time (\(t=120\mathrm{ s}\)). An ANN was therefore trained on the froth features extracted by the GoogLeNet algorithm at \(t=105-120\mathrm{ s}\) to estimate the concentrate grade and copper recovery at \(t=120\mathrm{ s}\). The ANN was designed in the same way as the previous one, as shown in Table 2.

The predictions of the copper recovery and the concentrate grade by the pre-trained CNN (GoogLeNet) and ANN models are shown in Fig. 8. The performance indicators of the proposed GoogLeNet-ANN algorithm, namely \(R\) and \(\mathrm{RMSE}\), are summarized in Table 4. The high \(R\) and low \(\mathrm{RMSE}\) values indicate that the model can accurately predict the flotation performance.

Fig. 8
figure 8

Scatter plots of observed versus model-predicted values of copper recovery and concentrate grade at \(t=120\mathrm{ s}\) obtained by GoogLeNet-ANN (testing data)

Table 4 Performance of GoogLeNet-ANN algorithms for predicting the batch flotation performance at \(t=120\mathrm{ s}\)

In an earlier study, the authors developed visual feature extraction (VFE) algorithms to extract froth visual properties, including bubble size distribution, froth color, froth velocity, and bubble collapse rate, from the froth images captured from this laboratory flotation cell. A three-layer feed-forward artificial neural network was then designed to learn the relationship between the froth features and the metallurgical parameters [6]. The predictive accuracy of the developed neural network is summarized in Table 5.

Table 5 Performance of VFE-ANN algorithms for predicting the batch flotation performance [6]

The results presented in Tables 3, 4, and 5 suggest that the prediction accuracies of the ANN models developed from the visual features and from the GoogLeNet features are relatively similar. Thus, even though GoogLeNet is a convolutional neural network pre-trained on a different image database, it is able to extract informative features from the froth images. Furthermore, conventional image processing algorithms for the analysis of froth images face several challenges: (i) the designed algorithms must be tuned and their parameters adjusted for each flotation plant; (ii) the image processing algorithms are sensitive to imaging conditions [27]; and (iii) the time required for processing the images may cause delays in the process control system. The pre-trained CNN algorithms, in contrast, are less sensitive to changes in imaging conditions and can be applied to extract features from froth images of different flotation systems. These results suggest that pre-trained CNN algorithms such as GoogLeNet can lead to significant improvements in the analysis of froth images.

4.2 Industrial Flotation Tests

The second case study is based on image and process data collected from an industrial flotation column in the flotation circuit of a coal preparation plant in Iran [18]. A camera equipped with a lighting system (a 50-W LED lamp) was installed at a distance of 40 cm above the froth surface (Fig. 9). The camera and lighting system were placed in a hood to protect them from ambient light interference. The feed, concentrate, and tailings of the flotation column were sampled simultaneously over a period of 2 h (at 15-min intervals), and their ash contents were determined for each test. The combustible recovery (CR) for the different experiments was calculated from the following expression [15]:

$$\mathrm{CR}=\frac{C\left(100-c\right)}{F\left(100-f\right)}\times100=\left(\frac{t-f}{t-c}\right)\times\left(\frac{100-c}{100-f}\right)\times100$$
(5)

where f, c, and t are the ash contents of the feed, concentrate, and tailings samples (%), respectively. At the time of sampling, continuous video recording and analysis of the froth surface was performed at a rate of 30 frames per second. Typical froth images of the industrial flotation column at different operating conditions are shown in Fig. 10.
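For reference, Eq. (5) can be implemented directly; the ash contents below are placeholder values.

```python
# Direct implementation of Eq. (5): combustible recovery from the ash contents
# of feed (f), concentrate (c), and tailings (t), using the two-product formula
# for the mass split. Ash values are placeholder percentages.
def combustible_recovery(f, c, t):
    mass_yield = (t - f) / (t - c)                      # C/F from the two-product formula
    return mass_yield * (100.0 - c) / (100.0 - f) * 100.0

print(combustible_recovery(f=35.0, c=12.0, t=70.0))     # ~ 81.7 %
```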

Fig. 9
figure 9

Coal flotation column and machine vision system [18]

Fig. 10
figure 10

Typical froth images of the industrial flotation column at different operating conditions

4.2.1 Prediction of Combustible Recovery and Concentrate Ash Content Using Pre-trained CNNs

The pre-trained networks were used as deep learning models to extract features from the froth images taken from the flotation column at different operating conditions, and the mean values of the extracted features were computed. The froth features were then reduced using PCA, and the principal components were used in a linear regression model to predict the metallurgical performance of the flotation column (combustible recovery and concentrate ash content). The predictive ability of the regression models was quantified by the leave-one-out cross-validation technique.

Table 6 shows the predictive accuracy of the regression models built with the different froth features (as predictors) extracted by the pre-trained CNNs. GoogLeNet achieved the highest accuracy (\({R}^{2}\)) and the lowest prediction error (RMSE). As a result, GoogLeNet is the best pre-trained algorithm for extracting features from the froth images of the coal flotation column.

Table 6 Performance of pre-trained CNNs for predicting the coal flotation column performance

4.2.2 Prediction of Combustible Recovery and Concentrate Ash Content Using GoogLeNet-ANN

The previous results revealed that the GoogLeNet algorithm extracts the most informative features from the froth images of the coal flotation column. The features extracted by GoogLeNet were subsequently reduced using PCA (to just two components) and then used to train an ANN to predict the combustible recovery and the concentrate ash content of the coal flotation column at different operating conditions. The k-fold cross-validation technique was used to evaluate the proposed ANN model because of the limited number of data samples. In this method, the dataset is split into k groups; each of the k groups is used in turn as the testing set, while the remaining groups form the training set. The ANN is trained on the training set and evaluated on the testing set, and the process is repeated for all k groups, so k different neural network models are trained and validated. In this way, each sample is used once for testing and k − 1 times for training. To avoid overfitting, a term containing the sum of the squared weights (\(\sum_{j=1}^{n}{{w}_{j}}^{2}\)) was added to the cost function as follows [18]:

$$J(w)=(1-\lambda )\frac{1}{m}\sum_{i=1}^{m}{({y}_{i}-{\widehat{y}}_{i})}^{2}+(\lambda )\frac{1}{n}\sum_{j=1}^{n}{{w}_{j}}^{2}$$
(6)

where \({y}_{i}\) and \({\widehat{y}}_{i}\) are the observed and model-predicted values, \(m\) is the number of training data, \(n\) is the number of weights, and \(\lambda\) is the regularization parameter (\(0\le \lambda \le 1\)). The structure of the developed ANN models is shown in Table 7.

Table 7 Characteristics of developed ANN models
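A minimal sketch of the k-fold evaluation with an L2 weight penalty analogous to Eq. (6) is given below, assuming scikit-learn's MLPRegressor (whose alpha parameter penalizes the sum of squared weights); the data arrays, k = 5, and network size are illustrative assumptions rather than the configuration in Table 7.

```python
# Sketch of k-fold cross-validation of an ANN with an L2 weight penalty.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))         # two principal components of GoogLeNet features (placeholder)
y = rng.normal(size=40)              # combustible recovery (placeholder)

ann = MLPRegressor(hidden_layer_sizes=(8,), alpha=0.1, max_iter=5000, random_state=2)
scores = cross_val_score(ann, X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=2),
                         scoring="r2")
print("mean k-fold R^2:", scores.mean())
```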

The results of prediction of the flotation column performance by use of a combination of GoogLeNet and ANN algorithms are summarized in Table 8. The results suggest that GoogLeNet is an efficient CNN algorithm that can be used in conjunction with different ANNs for predicting the metallurgical performance of flotation systems.

Table 8 Comparison of predictive capacity of GoogLeNet-ANN and VTFE-BRANN algorithms for predicting the coal flotation column performance

In a previous study, the authors proposed visual and textural feature extraction (VTFE) algorithms for measuring the froth properties (bubble size, froth velocity, red color, green color, intensity, energy, entropy, contrast, homogeneity, and correlation) of the tested flotation column [18]. The extracted visual and textural features were subsequently used by different intelligent algorithms (Bayesian regularized artificial neural network (BRANN), adaptive neuro-fuzzy inference system (ANFIS), and support vector machine (SVM)) to estimate the metallurgical performance of the flotation column. BRANN was found to be the best-suited model for predicting the combustible recovery and the concentrate ash content. The results presented in Table 8 show that both the VTFE-BRANN and GoogLeNet-ANN algorithms are capable of providing accurate predictions of the flotation column performance at different operating conditions.

Figures 11 and 12 show scatter plots of the observed versus model-predicted values of the combustible recovery and the concentrate ash content obtained by the GoogLeNet-ANN and VTFE-BRANN algorithms. The results show that the visual and textural features are more reliable predictors of the flotation conditions than the features extracted by GoogLeNet. Nevertheless, it can be concluded that pre-trained networks such as GoogLeNet can be applied to extract features from froth images, and the extracted features may be used to predict the working conditions and performance of the flotation process.

Fig. 11
figure 11

Scatter plots of observed versus model-predicted values of combustible recovery by GoogLeNet-ANN (top figure) and VTFE-BRANN (bottom figure) algorithms

Fig. 12
figure 12

Scatter plots of observed versus model-predicted values of concentrate ash content by GoogLeNet-ANN (top figure) and VTFE-BRANN (bottom figure) algorithms

Transfer learning can be used to improve the performance of pre-trained networks on industrial datasets. This can be done by adapting the structure of the chosen pre-trained neural network to the froth image database and then retraining the adjusted network on a large dataset. In other words, the pre-trained networks can be fine-tuned with flotation plant datasets so that they learn higher-level features from the actual froth images, as sketched below [23, 28, 40].
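A minimal sketch of this fine-tuning idea with torchvision follows; the choice of GoogLeNet, the four froth classes, and the single training step are illustrative assumptions.

```python
# Sketch of fine-tuning: the ImageNet head of a pre-trained GoogLeNet is replaced
# by a layer sized for the froth-image classes, and only this new layer is
# retrained on plant data (deeper layers could optionally be unfrozen as well).
import torch
import torch.nn as nn
from torchvision import models

net = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
for p in net.parameters():
    p.requires_grad = False                    # freeze the pre-trained feature extractor
net.fc = nn.Linear(net.fc.in_features, 4)      # new head for four froth condition classes

optimizer = torch.optim.Adam(net.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# one illustrative training step on a dummy batch of froth images
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 4, (8,))
net.train()
out = net(images)
logits = out.logits if isinstance(out, tuple) else out   # GoogLeNet returns aux outputs in train mode
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```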

Despite their advantages, deep learning algorithms have some drawbacks. The main practical limitations are the large amount of labeled data required and the long training time. The need for massive amounts of data can be mitigated by using active learning, core sets, and data augmentation, while the computing time can be reduced by training the network on reduced data and using compact models.

5 Conclusion

In this investigation, a soft sensor based on pre-trained convolutional neural networks was developed for the prediction of the metallurgical performance of flotation systems. The following conclusions can be drawn from this study.

  • The pre-trained networks effectively extracted features from the flotation froth images in spite of their different training source (ImageNet). This is beneficial because the pre-trained algorithms require less training and less effort to build a model of the process.

  • The metallurgical parameters of a batch flotation system and an industrial flotation column were accurately predicted from the features generated by the pre-trained networks. The predictions were comparable to the previous results obtained with the image processing algorithms developed by the authors for the extraction of visual and textural features from the froth images. In addition, pre-trained neural networks are more robust against several challenges that often arise in feature extraction by image processing algorithms.

  • The difference between the froth sampling and image capturing rates is the main challenge of froth image analysis in a batch flotation system. The sampling rate of the froth images is much higher than that of the metallurgical data, which leaves many froth images unlabeled. The prediction accuracy of the pre-trained algorithms improves as the image capturing time approaches the froth sampling time.

  • The GoogLeNet architecture outperformed all the other pre-trained networks and achieved more accurate estimates of the process conditions and performance on these particular databases.

  • The promising results demonstrate that deep learning algorithms have great potential as feature extractors in froth image analysis. The long computation time and the massive amounts of labeled data required are the main drawbacks of deep learning algorithms for practical applications.