1 Introduction

Agglomeration is a common process in the steel manufacturing industry. Fine iron ore is fed into a disk pelletizer and sprayed with water. As the disk rotates continuously, the fine particles gather and agglomerate into larger pellets, which fall out of the disk as green pellets. The pellet size distribution (PSD) should be monitored to guarantee that the pellets stay within the desired size range of 9–14 mm [1].

Manual measurement of the PSD has many limitations. For instance, the number of pellet samples is limited, and the sampling and sieving processes are time-consuming. Human error also has a significant impact on the measurement results [2]. Optical imaging is well suited to online monitoring of pellet size [3,4,5]. Its advantages are a simple hardware configuration, a wide field of view and an abundant choice of image processing algorithms. A typical optical imaging system is shown in Fig. 1. The images of the pellets are usually taken by a camera and then processed to detect individual pellets and calculate the pellet size.

Fig. 1

Illustration of the optical imaging system (left). An image taken at the discharge of the disk pelletizer (right)

The traditional image processing framework for pellet size measurement includes filtering, thresholding, clustering and segmentation. Pellet detection and segmentation are crucial prerequisites, and their performance greatly affects the accuracy of the measured pellet size [6,7,8]. However, noisy backgrounds, uneven illumination, harsh light reflection at the pellet centers and pellet overlapping are key problems in image segmentation. Traditional algorithms, such as watershed-based algorithms [9, 10] and the multi-threshold segmentation algorithm [11], are limited in solving such problems. For instance, watershed may result in oversegmentation, while multi-threshold algorithms require many thresholds to be set to achieve satisfactory results, and adapting them to the local pixel value distribution can be tedious [12,13,14]. Furthermore, pellet overlapping strongly influences the segmentation performance. Lu et al. [15] developed a technique that combines the wavelet transform and fuzzy C-means clustering (FCM) for particle image segmentation. This method was feasible for separating overlapping particles. Zhang et al. [16] proposed an effective approach for particle segmentation that combines the background difference method and the graph cut-based local threshold method. A background model is built and subtracted from the particle image to eliminate the droplets, and the local threshold method is then applied to eliminate the influence of particle shadows. FogBank [17] is also a marker-controlled watershed-based algorithm for pellet segmentation, which combines the features of the pixel value histogram and the distance transform. However, all of these algorithms are limited to some degree in processing images with uneven illumination.

Deep neural networks could be a powerful solution to the above-mentioned problems in image segmentation [18]. They have been successfully applied in the biological and medical fields to the segmentation of cells [19], a task that has many similarities with the segmentation of iron ore pellets. In particular, the U-net architecture requires relatively few training data, whereas some other deep learning architectures are more demanding to train. With such networks, granule-shaped edges can be segmented even under partially or heavily overlapping conditions. Plissiti and Nikou [20] presented a method for the segmentation of overlapping nuclei, which combines local characteristics of the nuclei boundary and a priori knowledge about the expected shape of the nuclei. Fabijanska [21] used a U-net-based convolutional neural network for cell segmentation. In particular, the network was trained to discriminate pixels located at the borders between cells. The edge probability map output by the network was then binarized and skeletonized to obtain one-pixel-wide edges. Park et al. [22] also presented a U-net-based autoencoder deep neural network for the analysis of particle interaction images, an innovative attempt to bring deep learning into industrial applications. This work benefited from higher parameter efficiency and from robustness to harsh background influences. Artificial neural networks such as backpropagation neural networks (BPNN) have also been applied to the segmentation of shot-peened areas, an instance of round-shaped object segmentation in industrial computer vision [23], to overcome the inaccurate and incorrect object segmentation of traditional techniques.

Inspired by the successful application of deep learning algorithms in the biological and medical fields for cell segmentation, we propose in the present work a lightweight U-net deep neural network for the detection and segmentation of iron ore green pellets, in order to overcome the limitations of traditional image segmentation algorithms. The features of the proposed method are as follows:

  (1) The proposed deep learning framework is a lightweight U-net with far fewer parameters than the classic U-net. Its GPU memory requirement is dramatically reduced, which makes it more suitable for industrial practice where online processing is required;

  (2) Batch normalization (BN) layers are introduced after the deconvolution layer in each unit of the decoder part, so that the generalization ability of the proposed network can be improved;

  (3) A concentric circle model is proposed to automatically detect the centers of the pellets and the contours surrounding them. With this model, it is convenient to perform ellipse fitting and statistical analysis of the physical properties of the pellets.

This paper is organized as follows: The basic idea and the pipeline of the proposed method are given in Sect. 2; the experimental results and performance analysis of the proposed method are given in Sect. 3. The main conclusions are summarized in Sect. 4.

2 Method

The flowchart of the proposed method is shown in Fig. 2. It contains three main steps: (1) pellet detection using a U-net-based deep neural network; (2) separation of clumped pellet contours by a concentric circle model; (3) ellipse fitting of the pellet contours and statistical analysis of the pellet size distribution. Each step is described in detail in the following sections.

Fig. 2

Flowchart of the proposed method

2.1 Architecture of the U-net-based deep neural network

The U-net exhibits an encoder–decoder architecture in which the encoder gradually reduces the spatial dimension of the input data, while the decoder gradually recovers it [24]. Here, we design an architecture that contains nine layers (Fig. 3). A rectified linear unit (ReLU) function A is selected as the activation function after each convolutional layer:

$$ A\left( x \right) = \max \left( {0,x} \right) $$
(1)

where x denotes the input of the ReLU function.

Fig. 3

Architecture of the U-net-based neural network

The size of the convolutional kernel is 3 × 3, and max pooling layers with a window size of 2 × 2 and a stride of 2 pixels are deployed after every even-numbered convolutional layer for downsampling. Additionally, the number of feature channels is doubled after each pooling layer in the encoder and halved at each upsampling step in the decoder. As an image propagates through this pipeline, each pixel is classified separately as edge or non-edge with a certain probability by the Softmax classifier.
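The downsampling step described above can be illustrated with a minimal pure-Python sketch of 2 × 2 max pooling with stride 2. The function name `max_pool_2x2` and the toy input are illustrative assumptions and are not taken from the authors' implementation.

```python
def max_pool_2x2(feature_map):
    """Max pooling with a 2x2 window and stride 2 on a 2-D list of numbers.

    Assumption: a trailing odd row or column is simply dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled.append([
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ])
    return pooled


# Toy 4x4 feature map (hypothetical values).
patch = [[1, 3, 2, 0],
         [4, 2, 1, 1],
         [0, 0, 5, 6],
         [1, 2, 7, 8]]
pooled = max_pool_2x2(patch)  # each spatial dimension is halved
```

Applied to a 256 × 256 patch, one such pooling step yields a 128 × 128 map, which is why the channel count can be doubled at moderate memory cost.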

Compared with the classic U-net, the proposed network introduces batch normalization layers at the end of each fully connected convolutional unit. This could improve the generalization ability of the network, and batch processing speeds up the adaptation of the weights to all images in the training dataset. Moreover, the proposed network is lightweight, with far fewer parameters than the classic U-net. Its GPU memory requirement is dramatically reduced, which makes it more suitable for industrial practice where online processing is required.

2.2 Network training

After the procedures described above, the probability map of the pellet edges can be obtained. More specifically, the U-net architecture is trained with the original pellet images and the corresponding mask images as input. The masks of the raw image dataset are labeled manually as ground truth, with the objects and the background marked in different colors to represent the respective classes.

We captured 500 images of size 1280 × 1024 using the optical imaging system shown in Fig. 1. Of these, 352 images are used for training, and the remaining images form the testing dataset. Both the full-size images (1280 × 1024) and the image patches cropped from them (256 × 256) are used as input data. The full-size images are used for pellet size distribution (PSD) statistics, while the cropped patches are used to test the segmentation performance of our deep neural network. The network was trained for 400 epochs with a batch size of 8 to minimize the loss between the probability map and its ground truth. We employed a weighted cross-entropy function as the loss function in the training procedure:

$$ {\text{Loss}}\left( {y,\hat{y}} \right) = - \frac{1}{N}\sum\limits_{i = 1}^{N} {\sum\limits_{c = 1}^{2} {w_{i}^{c} y_{i}^{c} \log \hat{y}_{i}^{c} } } $$
(2)

where \( \hat{y}_{i}^{c} \) denotes the predicted probability of pixel i belonging to class c (e.g., background), \( w_{i}^{c} \) denotes the corresponding weight and \( y_{i}^{c} \) indicates the ground truth label of pixel i for class c.
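As a minimal sketch of Eq. (2), the loss can be evaluated in pure Python for a list of pixels with one-hot labels. The function name, the list-based data layout and the small constant `eps` (which only guards against log(0)) are assumptions made for illustration and are not part of the original method.

```python
import math


def weighted_cross_entropy(y_true, y_pred, weights, eps=1e-12):
    """Weighted cross-entropy of Eq. (2).

    y_true, y_pred, weights: lists of length N (pixels); each entry is a
    list of length 2 (classes). eps is an assumed numerical safeguard.
    """
    n = len(y_true)
    total = 0.0
    for y_i, p_i, w_i in zip(y_true, y_pred, weights):
        for y_c, p_c, w_c in zip(y_i, p_i, w_i):
            total += w_c * y_c * math.log(max(p_c, eps))
    return -total / n


# Two pixels, uniform predictions and unit weights (toy values).
labels = [[1, 0], [0, 1]]
probs = [[0.5, 0.5], [0.5, 0.5]]
unit_weights = [[1.0, 1.0], [1.0, 1.0]]
loss = weighted_cross_entropy(labels, probs, unit_weights)
```

With unit weights this reduces to the ordinary cross-entropy; larger weights on a class or pixel increase its contribution to the loss.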

The U-net-based architecture used here achieves good segmentation performance after training with a small number of training data. For time efficiency, the network is therefore trained with 20 full-size images for 400 epochs. It then produces a reasonable edge probability map for all the full-size images in the test dataset.

2.3 Prediction of the pellet edge

The prediction of the edge is given in the form of a probability map. The transposed convolution of stride 2 serves two purposes. On the one hand, it provides the probability map of each convolutional computation following the convolutional layers. On the other hand, it keeps the dimensions balanced during propagation through the network by performing two-dimensional upsampling. Finally, the probability map of the binary segmentation task is generated from the output layer, which consists of two units whose activation values are fed into a binary Softmax function and converted into a probability distribution over the class labels. Suppose that \( o_{k} \) is the k-th output of the network for a given input; the probability \( p_{k} \) assigned to the k-th class is then the output of the Softmax function

$$ p_{k} = \frac{{\exp \left( {o_{k} } \right)}}{{\sum\limits_{{h \in \left\{ {0,1} \right\}}} {\exp \left( {o_{h} } \right)} }} $$
(3)

where k = 0 and k = 1 represent non-edge and edge pixels, respectively.
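A minimal sketch of the Softmax in Eq. (3) for the two output units is given below. Subtracting the maximum before exponentiation is a common numerical safeguard that does not change the result; it is an addition for illustration and is not stated in the text.

```python
import math


def softmax(outputs):
    """Softmax over a list of raw network outputs o_k (Eq. (3))."""
    shift = max(outputs)  # assumed stabilisation; cancels in the ratio
    exps = [math.exp(o - shift) for o in outputs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical activations of the two output units for one pixel.
p_non_edge, p_edge = softmax([2.0, -1.0])
```

The two probabilities always sum to one, so thresholding either of them is sufficient for the binary decision.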

To address the problem of overlapping instances [25], the weighted cross entropy shown in Eq. (4) is used to emphasize learning the edges of the pellets and to force the network to learn the small separating edges between touching pellets. The basic idea is to give the edges a larger weight and thereby push the network toward learning the gaps between close pellets. The separation edge is computed using morphological operations. The weight map is then computed as in Eq. (5).

$$ E = \sum\limits_{x \in \varOmega } {w\left( x \right)\log \left( {p_{\ell \left( x \right)} \left( x \right)} \right)} $$
(4)
$$ w\left( x \right) = w_{c} \left( x \right) + w_{0} \cdot \exp \left( { - \frac{{\left( {d_{1} \left( x \right) + d_{2} \left( x \right)} \right)^{2} }}{{2\sigma^{2} }}} \right) $$
(5)

where \( w_{c} :\varOmega \to {\mathbb{R}} \) is the weight map that balances the class frequencies, \( d_{1} :\varOmega \to {\mathbb{R}} \) denotes the distance to the edge of the nearest pellet and \( d_{2} :\varOmega \to {\mathbb{R}} \) denotes the distance to the edge of the second nearest pellet. Here w(x) is the weight at pixel x, \( w_{c}(x) \) is the class weight of the class c to which x belongs, and \( w_{0} \) is a constant rather than a function.
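The behavior of the weight map in Eq. (5) can be checked with a short sketch. The default values `w0 = 10` and `sigma = 5` are those reported in the original U-net paper and serve only as placeholders here, since the values used in the present work are not stated; the function name is likewise an assumption.

```python
import math


def edge_weight(class_weight, d1, d2, w0=10.0, sigma=5.0):
    """Pixel weight of Eq. (5).

    class_weight: w_c(x); d1, d2: distances to the edges of the nearest
    and second nearest pellet. w0 and sigma are placeholder defaults.
    """
    return class_weight + w0 * math.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))


w_gap = edge_weight(1.0, 1.0, 1.0)    # pixel in a narrow gap between pellets
w_far = edge_weight(1.0, 10.0, 10.0)  # pixel far from any second pellet
```

Pixels squeezed between two pellets receive a weight close to `class_weight + w0`, while pixels far from a second pellet fall back to the class weight alone.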

2.4 Contour detection using ellipse fitting

By use of the U-net-based network described above, the probability maps of the pellet edges can be obtained, based on which the pellet contour can be extracted using ellipse fitting and the pellet size can be analyzed.

Hereby, an ellipse observed in an image is described in terms of a quadratic polynomial equation shown in Eq. (6) [26].

$$ Ax^{2} + 2Bxy + Cy^{2} + 2f_{0} \left( {Dx + Ey} \right) + f_{0}^{2} F = 0 $$
(6)

where \( f_{0} \) is a constant scale factor. Theoretically, we can set it to 1. For finite-precision numerical computation, however, \( f_{0} \) should be chosen so that \( x/f_{0} \) and \( y/f_{0} \) are approximately of order 1, which increases the numerical accuracy and avoids loss of significant digits. In view of this, we take the origin of the image XY coordinate system at the center of the image, rather than at the upper-left corner as is conventionally done, and then take \( f_{0} \) as the square error to denote the scale factor of the ellipses to be fitted.

2.4.1 Automatic pellet contour extraction

The preprocessing steps required for fitting edges with an ellipse polynomial are only effective for points distributed along a single round shape. Thus, before the edge fitting step, we design a framework that extracts the pellet contours one by one from a cluster of pellet contours in an image. The flowchart in Fig. 4 shows this pipeline.

Fig. 4

Flowchart of automatic contour detection

2.4.2 Detection of the pellet center

Firstly, we mark the background and the pellets in white and black, respectively, which represent the two classes to be distinguished. Blurring is optional but useful for reducing high-frequency noise and making the contour detection more accurate. Morphological processing such as image dilation and erosion is also used as a preprocessing step to improve the performance of the subsequent steps. Secondly, the distance transform and the complementary distance transform [27] are applied to highlight the central region of each pellet area in the binarized image. Let \( {\mathfrak{A}} \) be a regular grid and \( f:{\mathfrak{A}} \to {\mathbb{R}} \) a function on the grid. The distance transform \( D_{f} \) of f is

$$ D_{f} \left( p \right) = \mathop {\min }\limits_{{q \in {\mathfrak{A}}}} \left( {d\left( {p,q} \right) + f\left( q \right)} \right) $$
(7)

where d(p, q) is some measure of the distance between p and q. Intuitively, for each point p, we find a point q that is close to p and for which f(q) is small. Note that if f has a small value at some location, \( D_{f} \) will have a small value at that location and at any nearby point, where nearness is measured by the distance d(p, q).
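Equation (7) can be evaluated directly by brute force, which is useful for checking intuition on small grids. The sketch below assumes the Euclidean distance for d(p, q) and an indicator-style f that is 0 at seed pixels and infinite elsewhere; both choices are illustrative assumptions, and practical implementations use much faster algorithms.

```python
import math


def distance_transform(f):
    """Brute-force evaluation of Eq. (7) on a 2-D grid given as nested lists.

    Assumes d(p, q) is the Euclidean distance. Cost is quadratic in the
    number of grid cells, so this is only meant for tiny examples.
    """
    rows, cols = len(f), len(f[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    result = [[math.inf] * cols for _ in range(rows)]
    for pr, pc in cells:
        result[pr][pc] = min(
            math.hypot(pr - qr, pc - qc) + f[qr][qc] for qr, qc in cells
        )
    return result


INF = math.inf
# f is 0 at a single seed pixel and infinite elsewhere (toy example).
seed_grid = [[INF, INF, INF],
             [INF, 0.0, INF],
             [INF, INF, INF]]
dist = distance_transform(seed_grid)
```

With this choice of f, \( D_f(p) \) is simply the distance from p to the nearest seed pixel, which is the classical distance transform of a binary image.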

In this way, the regions that certainly belong to pellets are identified. Whether the remaining regions are pellets or background is determined by the watershed algorithm [28]. An algorithmic definition of the watershed transform by simulated immersion was given by Vincent and Soille [29, 30]. The watershed transform is a method for image segmentation from the field of mathematical morphology. Imagine that the landscape is immersed in a lake, with holes pierced in its local minima. Basins fill up with water starting at these local minima, and dams are built at the points where water coming from different basins would meet. When the water level has reached the highest peak in the landscape, the process stops. As a result, the landscape is partitioned into regions or basins separated by dams, called watershed lines or simply watersheds. In the present work, the discrete watershed transform proposed by Roerdink and Meijster [28] is applied to detect the pellet center, as described in Eqs. (8) and (9).

Let \( f:D \to {\mathbb{N}} \) be a digital gray value image, and let \( h_{\min } \) and \( h_{\max } \) be the minimum and maximum values of f, respectively. Define a recursion with the gray level h increasing from \( h_{\min } \) to \( h_{\max } \), in which the basins associated with the minima of f are successively expanded. Let \( X_{h} \) denote the union of the set of basins computed at level h, and let \( {\text{MIN}}_{h} \) denote the union of all regional minima at altitude h. A connected component of the threshold set \( T_{h + 1} \) at level h + 1 can be either a new minimum or an extension of a basin in \( X_{h} \), resulting in an updated value \( X_{h + 1} \):

$$ \left\{ {\begin{array}{*{20}l} {X_{{h_{\min } }} = \left\{ {p \in D \mid f\left( p \right) = h_{\min } } \right\} = T_{{h_{\min } }} } \\ {X_{h + 1} = {\text{MIN}}_{h + 1} \cup IZ_{{T_{h + 1} }} \left( {X_{h} } \right),\quad h \in \left[ {h_{\min } ,h_{\max } } \right)} \\ \end{array} } \right. $$
(8)

where \( IZ_{{T_{h + 1} }} \left( {X_{h} } \right) \) denotes the influence zone of \( X_{h} \) within \( T_{h + 1} \), following [28]. The watershed \( W_{\text{shed}} \left( f \right) \) of f is the complement of \( X_{{h_{\max } }} \) in D:

$$ W_{\text{shed}} \left( f \right) = D \setminus X_{{h_{\max } }} $$
(9)

After the above processing, the area enclosed by the contours is split into separate regions. A novel contour-based object detector using the Hough transform on each separated area is then proposed in the present work, in which each local part casts a vote for the possible locations of the object center. Specifically, the image I is described using the polar parametrization (θ, ρ), where ρ is the distance between the line and the origin and θ is the angle between the normal to the line and the x-axis. If a line passes through point p, then we have

$$ \theta_{p} = \arg \nabla I $$
(10)
$$ \rho_{p} = \frac{{\left| {p \cdot \nabla I} \right|}}{{\left\| {\nabla I} \right\|}} $$
(11)

where θp denotes the direction of the gradient vector and ρp is the distance between the origin and the line passing through p and perpendicular to the gradient vector. The magnitude of gradient is \( \left\| {\nabla I} \right\| = \sqrt {I_{x}^{2} + I_{y}^{2} } \).

Assume that \( C \in {\mathbb{R}}^{2} \) is the center and r is the radius of the circle. If there is a circle passing through point p, then we have

$$ r_{p} = \frac{1}{{\left| {\kappa_{p} } \right|}} $$
(12)
$$ \overrightarrow {pC_{p} } = \frac{\nabla I}{{\kappa_{p} \left\| {\nabla I} \right\|}} $$
(13)
$$ \kappa = - \frac{{I_{xx} I_{y}^{2} - 2I_{xy} I_{x} I_{y} + I_{yy} I_{x}^{2} }}{{\left\| {\nabla I} \right\|^{3} }} $$
(14)

where the radius \( r_{p} \) is the inverse of the absolute curvature \( \kappa_{p} \) at point p (Eq. (12)), and the curvature is computed from Eq. (14). The center \( C_{p} \) is obtained by tracing from p along the vector \( \overrightarrow {pC_{p} } \). The magnitude of this vector equals \( r_{p} \); its direction is parallel to the gradient, with the sense determined by the sign of the curvature.
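Equations (12)–(14) can be verified on an analytic example. The sketch below takes the image derivatives at a point as plain numbers (how they are estimated from pixel data is not specified here) and returns the voted radius and center; the function name and the test image \( I = x^2 + y^2 \), whose level sets are circles around the origin, are illustrative assumptions.

```python
import math


def circle_vote(p, ix, iy, ixx, ixy, iyy):
    """Radius and center voted by point p, following Eqs. (12)-(14).

    p: (x, y); ix, iy: first derivatives of I at p; ixx, ixy, iyy: second
    derivatives. Returns None where the gradient or the curvature vanishes,
    since no circle can be voted for there.
    """
    grad_norm = math.hypot(ix, iy)
    if grad_norm == 0.0:
        return None
    kappa = -(ixx * iy ** 2 - 2.0 * ixy * ix * iy + iyy * ix ** 2) / grad_norm ** 3
    if kappa == 0.0:
        return None
    radius = 1.0 / abs(kappa)                  # Eq. (12)
    cx = p[0] + ix / (kappa * grad_norm)       # Eq. (13), x component
    cy = p[1] + iy / (kappa * grad_norm)       # Eq. (13), y component
    return radius, (cx, cy)


# I(x, y) = x^2 + y^2 at p = (3, 4): Ix = 6, Iy = 8, Ixx = Iyy = 2, Ixy = 0.
radius, center = circle_vote((3.0, 4.0), 6.0, 8.0, 2.0, 0.0, 2.0)
```

For this test image the point (3, 4) lies on the circle of radius 5 around the origin, and the vote recovers exactly that radius and center.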

Under the concept of topological hierarchy contour tracing, the separate contours of each pellet area are extracted by detecting the outermost contour of the pellet area and establishing the complete hierarchical tree, in which each contour holds a list of the contours it directly encloses [31,32,33]. A 3D histogram is used to vote for the matched features in parameter space. As shown in Fig. 5, the vector [x, y, α] represents the position and the pose angle in spatial coordinates. The higher the degree of matching, the higher the vote in the corresponding bins. Consequently, the maximum voting value gives the position and pose of the center.

Fig. 5

Voting scheme

2.4.3 Contour exploring

The contours usually surround their respective centers. We therefore design a concentric circle model to explore the positions of the points that form the contour (see Fig. 6). In this figure, the black curve is the contour of one iron ore pellet, and the white arrows indicate the radii of the circular areas, which are labeled in different colors. The white arrow grows by a certain length after each traversal of the area it covers. During the exploration, according to the shape and size of the detected pellets, we let the exploration radius increase gradually from 1 to 20. The distance between two nearby pellet centers varies from 1 to 11, and the threshold for the number of detected points on one pellet contour is set to 50. The exploration terminates when the number of detected points reaches this threshold.
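One possible reading of the concentric circle exploration is sketched below: starting from a detected center, rings of increasing radius are traversed and the edge points falling into each ring are collected until the point threshold is reached. The function name, the ring width of one unit and the representation of the edge map as a list of coordinates are assumptions made for illustration; only the radius range (1 to 20) and the threshold (50) come from the text.

```python
import math


def explore_contour(center, edge_points, max_radius=20, max_points=50):
    """Collect contour points around a center, ring by ring.

    center: (x, y) of a detected pellet center.
    edge_points: iterable of (x, y) edge coordinates (assumed data layout).
    Ring k covers distances in (k - 1, k]; the search stops as soon as
    max_points points have been collected or max_radius is exceeded.
    """
    cx, cy = center
    found = []
    for radius in range(1, max_radius + 1):
        for x, y in edge_points:
            if radius - 1 < math.hypot(x - cx, y - cy) <= radius:
                found.append((x, y))
                if len(found) >= max_points:
                    return found
    return found


# Synthetic contour: 60 points on a circle of radius 5, plus one far outlier.
ring = [(5.0 * math.cos(2 * math.pi * k / 60), 5.0 * math.sin(2 * math.pi * k / 60))
        for k in range(60)]
contour = explore_contour((0.0, 0.0), ring + [(30.0, 0.0)])
```

Edge points belonging to a neighboring pellet farther away than the maximum radius are never visited, which is what keeps the extracted point set tied to a single center in this reading.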

Fig. 6

Schematic diagram of concentric circle model

2.4.4 Ellipse fitting

The basic concept of curve fitting is to fit the discrete edge points of the pellet with an ellipse function. Given the coordinates \( (x_{i}, y_{i}) \) of each point, Eq. (6) can be written in matrix form:

$$ \begin{aligned} f\left( {x,y} \right) & = Ax^{2} + 2Bxy + Cy^{2} + 2f_{0} \left( {Dx + Ey} \right) + f_{0}^{2} F = X^{T} {\rm P}X \\ & = \left[ {\begin{array}{*{20}c} {x^{2} } & {2xy} & {y^{2} } & {2x} & {2y} & 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} A \\ B \\ C \\ {Df_{0} } \\ {Ef_{0} } \\ {f_{0}^{2} F} \\ \end{array} } \right] = 0 \\ \end{aligned} $$
(15)
$$ X = \left[ {\begin{array}{*{20}c} x \\ y \\ 1 \\ \end{array} } \right],\quad {\rm P} = \left[ {\begin{array}{*{20}c} A & B & {Df_{0} } \\ B & C & {Ef_{0} } \\ {Df_{0} } & {Ef_{0} } & {f_{0}^{2} F} \\ \end{array} } \right] $$
(16)

where X represents the homogeneous coordinate of points and P represents the coefficient matrix of ellipse.

If there are N points to be fitted, we can obtain ellipse function similar to Eqs. (15)–(16)

$$ D = \left\| A \right\|{\rm P} = A^{T} {\rm P}A{\rm P}^{T} $$
(17)
$$ \left\| {\rm P} \right\| = {\rm P}^{\text{T}} {\rm P},\quad {\rm P} = \left[ {\begin{array}{*{20}c} A \\ B \\ C \\ {Df_{0} } \\ {Ef_{0} } \\ {f_{0}^{2} F} \\ \end{array} } \right] $$
(18)

where P represents the vector of ellipse coefficients. The parameter vector P can be estimated by singular value decomposition or eigenvalue decomposition. However, this estimation does not make use of the ellipse constraint

$$ \Delta = \det \left( {\left[ {\begin{array}{*{20}c} A & B \\ B & C \\ \end{array} } \right]} \right) > 0 $$
(19)

Consequently, in the presence of noise the fitted conic may be a hyperbola or a parabola, and the ellipse fit is unstable. To solve this problem, the direct least squares ellipse fitting algorithm proposed by Takashimizu and Iiyoshi [34] is applied in the present work, which forces the fitting result to be an ellipse even when noise exists.
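The constraint in Eq. (19) is easy to check for any fitted coefficient set. The following sketch classifies a conic from the coefficients A, B and C of Eq. (6); the function name and the tolerance `tol` are illustrative assumptions, and degenerate conics are not treated separately.

```python
def conic_type(a, b, c, tol=1e-12):
    """Classify the conic of Eq. (6) by the sign of Delta = A*C - B^2 (Eq. (19)).

    tol is an assumed tolerance for treating Delta as zero.
    """
    delta = a * c - b * b
    if delta > tol:
        return "ellipse"
    if delta < -tol:
        return "hyperbola"
    return "parabola"


kind = conic_type(1.0, 0.0, 1.0)  # x^2 + y^2 + ... = 0 has Delta = 1 > 0
```

An unconstrained least squares fit can return coefficients with a non-positive discriminant on noisy points, which is the failure case that an ellipse-specific fit is meant to rule out.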

3 Experiment and analysis

3.1 Data acquisition

The equipment and imaging system used in the present work are shown schematically in Fig. 1. The disk pelletizer is located in a steel company. It has a diameter of 6 m and an inclination angle of 45°, and is operated at a rotation speed of 8 rpm. An industrial camera (Baumer VCXG-53M with a 50 mm lens, resolution 2048 × 2592) is used to capture images of the pellets falling directly from the outlet of the disk. The images are sampled at an interval of 2 s. Although LED lamps are used to improve illumination, the captured images still suffer from uneven illumination due to the inclination and fast rotation of the pelletizing disk. The light reflection near the pellet center is much stronger than in other areas, and there are shadows between overlapping pellets. Detecting pellets in such images is a big challenge in image processing. These images are used to test the proposed method described in Sect. 2.

3.2 Evaluation benchmarks

The quantitative assessment of the proposed method is twofold. Firstly, the segmentation performance of the trained CNN in recognizing the pellet contours is evaluated by means of the ROC curve and the corresponding area under the curve (AUC) [35]. Denote the mapping functions of the position p in the segmentation results and the ground truth as F(p) and G(p), respectively. The ROC curve function and the AUC are

$$ {\text{ROC}} = G\left\{ {F^{ - 1} \left( p \right)} \right\} $$
(20)
$$ {\text{AUC}} = \int\limits_{{p \in\Omega }} {{\text{ROC}}\left( p \right){\text{d}}p = \int\limits_{{p \in\Omega }} {G\left\{ {F^{ - 1} \left( p \right)} \right\}{\text{d}}p} } $$
(21)

where \( \Omega \subset {\mathbb{R}}^{2} \) stands for the two-dimensional space of the image data. Secondly, the segmentation performance of the proposed method is compared with that of traditional methods and the classic U-net in terms of the Sørensen–Dice coefficient (DICE). The mathematical expression of DICE is

$$ {\text{DICE}} = \frac{{2\left| {\varepsilon \cap \vartheta } \right|}}{{\left| \varepsilon \right| + \left| \vartheta \right|}} $$
(22)

where ε denotes the segmented pellet contours and \( \vartheta \) denotes the ground truth, so that DICE measures the level of alignment between the two.

Additionally, the training loss and the Intersection over Union (IoU) are given to describe the training process of this deep neural network. IoU is an evaluation metric used to measure the accuracy of an object detector on a particular dataset, and is expressed as

$$ {\text{IoU}} = \frac{{\left| {A \cap B} \right|}}{{\left| {A \cup B} \right|}} $$
(23)

where A and B denote the pellet contours in thresholded probability map and ground truth, respectively.
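Both overlap measures in Eqs. (22) and (23) can be computed from sets of pixel coordinates, as in the following minimal sketch. Representing a segmentation as a set of (row, column) tuples and returning 1.0 when both sets are empty are assumptions made for illustration.

```python
def dice(segmented, truth):
    """Sørensen-Dice coefficient of Eq. (22) for two sets of pixel coordinates."""
    segmented, truth = set(segmented), set(truth)
    if not segmented and not truth:
        return 1.0  # assumed convention for two empty masks
    return 2.0 * len(segmented & truth) / (len(segmented) + len(truth))


def iou(segmented, truth):
    """Intersection over Union of Eq. (23) for two sets of pixel coordinates."""
    segmented, truth = set(segmented), set(truth)
    union = segmented | truth
    if not union:
        return 1.0  # assumed convention for two empty masks
    return len(segmented & truth) / len(union)


# Toy masks sharing two of their three pixels each.
seg = {(0, 0), (0, 1), (1, 0)}
ref = {(0, 1), (1, 0), (1, 1)}
d, j = dice(seg, ref), iou(seg, ref)
```

The two measures are monotonically related by DICE = 2 IoU / (1 + IoU), so they rank segmentations identically, although DICE is always at least as large as IoU.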

3.3 Experimental results and analysis

A number of patches cropped from the original full-size images are used to examine the performance of the proposed architecture. In Fig. 7, four pellet patches with uneven illumination are selected as examples to illustrate the training procedure, and the segmentation results are compared with those of traditional methods (marker-controlled watershed segmentation, seed-point segmentation and FogBank segmentation) and of the classic U-net. The DICE index (Table 1) is used to evaluate the segmentation performance of the different methods quantitatively. The segmentation results generated by the traditional methods (see Fig. 7c–e) are strongly affected by the uneven illumination and harsh light reflection, resulting in low DICE values (lower than 0.55). In contrast, the classic U-net (Fig. 7f) and our proposed network (Fig. 7g) perform comparably for Pellets 2 to Pellets 4 (DICE \( \approx 0.9 \)), and both outperform the traditional methods by a wide margin. For Pellets 1, the DICE of the proposed network is much higher than that of the classic U-net (0.8507 vs. 0.6035).

Fig. 7

Segmentation results of test pellets. a Original image of the pellet patches. b Ground truth. c Segmentation results using marker-controlled watershed. d Seed-point segmentation results. e FogBank segmentation results. f Segmentation results using classic U-net. g Segmentation results using the proposed network

Table 1 DICE index of the pellet patches from Pellets 1 to Pellets 4 using different segmentation methods

To illustrate the characteristics and advantages of our proposed network in segmenting iron ore pellets, some more pellet images are added for testing. In Fig. 8 and Table 2, the segmentation results are compared again with those of the classic U-net. The blue overlaid masks denote the successfully segmented pellet area. It can be seen that the proposed network is able to segment almost all the pellets from the background, while the classic U-net fails to segment a larger number of pellets. The good performance of the proposed network is mainly due to the added BN layers, which improve the generalization ability of the network. As given in Table 2, for Pellets 2, Pellets 3 and Pellets 4, the proposed network has DICE values similar to those of the classic U-net. In the remaining seven cases, however, the proposed network achieves much greater DICE values than the classic U-net.

Fig. 8

Overlaid masks of the segmentation results generated by the proposed network and the classic U-net. The blue mask denotes the detected part, and the remaining part denotes the area that is not detected (color figure online)

Table 2 DICE index of the pellet patches from pellets 1 to pellets 10 using different deep neural networks

For further comparison of the proposed network and the classic U-net, we sample the segmented images multiple times. The sampled images have 80% of the original image size. For each sampled image, the ROC curve function and AUC value of the two networks (see Fig. 9 and Table 3) are calculated using the mean value of the respective index. Fig. 9 shows that the proposed network (dashed lines) yields more convex ROC curves than the classic U-net (solid lines). For the ten testing pellet patches, the AUC value of our proposed network is generally higher than that of the classic U-net, except for Pellets 3 (see Table 3).

Fig. 9

ROC curve of the proposed network and the classic U-net for ten testing pellet patches (represented by different colors) (color figure online)

Table 3 Mean AUC value of the pellet patches from Pellets 1 to Pellets 10 processed by classic U-net and the proposed network

After segmentation of the pellet area from the background, the contour of each individual pellet should be extracted so that the pellet size can be measured even when pellets overlap. For this, the proposed center detection algorithm is used to extract the pellet centers and the points on the pellet contours in the thresholded probability map. With the extracted contour of each pellet, ellipse polynomial fitting is then applied to approximate the pellet shape. Examples of the results of this procedure are shown in Figs. 10 and 11. It can be seen that the proposed method is capable of extracting the contours of most pellets in the image, even for strongly overlapping pellets (see Pellets 13 and Pellets 15 in Fig. 10).

Fig. 10

Extraction of pellet contour for pellet patches

Fig. 11

Extraction of pellet contour for full-size pellet image

To investigate the performance of the proposed method on full-size images, the ROC curves, AUC and DICE of the proposed network for the full-size images in Fig. 11 are given in Fig. 12 and Table 4. Each point on the ROC curve represents a sensitivity/specificity pair. Here, we adopt ROC curve analysis in a cross-validation manner: the segmentation results and the respective markers are matched randomly using 80% of their whole size, and the ROC curve and AUC index are calculated ten times. We then select the three candidates with the highest AUC indexes to calculate the mean ROC curve and AUC. Table 4 shows that the AUC values of the full-size pellet images vary between 0.83 and 0.86 and that the DICE indexes are very high (about 0.95). This indicates that the proposed method performs well in segmenting pellets (including overlapping ones) and is advantageous to the subsequent processing for PSD statistics.

Fig. 12
figure 12

ROC curve of the proposed network for a image 1 in Fig. 11, b image 2 in Fig. 11, c image 3 in Fig. 11

Table 4 Mean AUC and DICE index of the segmented full-size images in Fig. 11

3.4 Discussion

3.4.1 Computing time

The proposed method is implemented on an ordinary computing platform with an Intel Xeon E2520 v3 CPU with 16 GB memory and an NVIDIA GTX750i GPU with 2 GB memory. Python V3.5 is employed as the programming language. In the study, 352 patch images with a size of 256 × 256 are used as the training dataset. The batch size and the number of training epochs are set to 8 and 400, respectively. Under such conditions, the whole training procedure described above takes about 3 days. After training, the weight model is obtained, and processing one test patch image (256 × 256) takes only 1.5 s to 3 s.

Considering the practical usage of the proposed network, we adopt 30 full-size raw images (1280 × 1024) for training and 148 full-size images for testing. The experiments show that network training takes about 3 days, whereas segmentation of one full-size testing image takes only about 4–6 s. Such computing time is acceptable for online usage in practice. The computing time of the proposed method can be further reduced by employing a computer with a higher configuration, which makes the method promising for real-time measurement.

3.4.2 Parametric study

Training epochs and batch size are two important parameters that may influence the segmentation performance of the proposed network and are therefore discussed in this section.

Figure 13 shows the influence of training epochs on the segmentation performance (the loss and mean IoU) of the proposed network, using images of pellet patches and full-size images for testing. For the training progress of the pellet patches, the loss of the proposed U-net-based framework converges fast during the first 150 epochs and settles at 0.01 after 350 epochs. The intersection over union (IoU) is almost constant (0.7) when the number of epochs is less than 100 and then increases rapidly to 0.85 at 400 epochs. For the training progress of full-size images, the loss settles at about 0.01 after 300 epochs, while the mean IoU is approximately constant (0.96) after 400 epochs. Therefore, it is reasonable for us to use 400 training epochs in the training of the proposed network.
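For reference, the IoU of two label masks can be computed as in the following sketch. The text does not state how the mean is taken, so the averaging over the two classes (background 0 and pellet 1) is an assumption, and the function names are hypothetical.

```python
def iou(pred, truth, cls=1):
    """Intersection over union of one class for two flat label masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
    # If the class occurs in neither mask, the masks agree trivially.
    return inter / union if union else 1.0

def mean_iou(pred, truth, classes=(0, 1)):
    """Assumed definition: unweighted mean of the per-class IoU values."""
    return sum(iou(pred, truth, c) for c in classes) / len(classes)
```

With this definition a mask that agrees with the marker on every pixel gives a mean IoU of 1, and any missed or spurious pellet pixel lowers both the pellet and the background term.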

Fig. 13
figure 13

Loss and mean IoU in the training process of the proposed network using images of a pellet patches and b full-size images for PSD statistics

Batch size is another important factor in the training procedure. It affects both the segmentation performance and the memory occupation. To clarify the influence of batch size on these two indexes, the values of mean IoU and memory occupation ratio (the percentage of the memory of our GPU that is required) for six different batch sizes (2, 4, 6, 8, 12 and 16) are given in Fig. 14. It can be seen that the memory occupation ratio is proportional to the batch size, and the mean IoU generally increases with increasing batch size. However, the improvement of mean IoU may be limited, while the memory occupation ratio increases at the same time. The GPU memory will be exhausted if the batch size is set to 16 or greater. Therefore, we select a batch size of 8 for our method to achieve good segmentation performance under reasonable memory occupation.

Fig. 14
figure 14

Memory occupation and mean IoU w.r.t. batch size after 400 training epochs

4 Pellet properties statistics

4.1 Pellet radius statistics

With the proposed method described above, the size of each pellet in the image can be described by a fitted ellipse with a short radius $r_{\text{short}}$ and a long radius $r_{\text{long}}$, as shown in Fig. 15a for the pellets in the three full-size images (see Fig. 11). For simplicity, the pellet shape can also be described roughly by a circle with diameter d:

$$ d = (r_{\text{short}} + r_{\text{long}} ) $$
(24)

The pellet size distribution (PSD) defines the relative number of particles present according to pellet size d, as shown in Fig. 15b for example.
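A minimal sketch of Eq. (24) and of the PSD as a relative-frequency histogram is given here. The bin edges are free parameters chosen by the user, and the function names are hypothetical; radii are assumed to be already converted to millimetres.

```python
def pellet_diameter(r_short, r_long):
    """Equivalent circle diameter d = r_short + r_long, as in Eq. (24)."""
    return r_short + r_long

def size_distribution(diameters, bin_edges):
    """Fraction of pellets per size bin.

    Bins are [edge_i, edge_{i+1}); the last bin also includes its upper edge.
    Pellets outside all bins are counted in the total but in no bin.
    """
    counts = [0] * (len(bin_edges) - 1)
    for d in diameters:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            in_bin = bin_edges[i] <= d < bin_edges[i + 1]
            if in_bin or (is_last and d == bin_edges[-1]):
                counts[i] += 1
                break
    total = len(diameters)
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

For example, `size_distribution([pellet_diameter(rs, rl) for rs, rl in radii], [8, 10, 12, 14, 16])` returns the fraction of pellets in each 2 mm class, where `radii` is a list of (short radius, long radius) pairs.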

Fig. 15
figure 15

Analysis of the pellet size for full-size images. a Pellet size measured by the proposed method, b pellet size distribution (PSD)

4.2 Roundness analysis

Roundness is a physical property of the pellets that may affect the discharge behavior [36]. It can be calculated by Eqs. (25)–(27):

$$ {\text{Circularity}} = 4\pi \cdot \frac{\text{Area}}{{{\text{Perimeter}}^{2} }} $$
(25)
$$ {\text{AR}} = \frac{{r_{\text{short}} }}{{r_{\text{long}} }} $$
(26)
$$ {\text{Roundness}} = {\text{Circularity}} + \left( {{\text{Circularity}}_{{{\text{Perfect\_circle}}}} - {\text{AR}}} \right) $$
(27)

where the aspect ratio AR is equal to one for a perfect circle, and $\text{Circularity}_{\text{Perfect\_circle}}$ is the maximum of circularity.
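Eqs. (25)–(27) translate directly into code, as the following sketch shows. The maximum circularity is taken as 1, which is the value of Eq. (25) for a perfect circle. The helper `ellipse_area_perimeter` is an assumption added for illustration: it derives area and perimeter from the fitted ellipse, using Ramanujan's approximation for the perimeter, whereas the text does not state whether these quantities are taken from the fitted ellipse or from the segmented contour.

```python
import math

CIRCULARITY_PERFECT_CIRCLE = 1.0  # value of Eq. (25) for a perfect circle

def circularity(area, perimeter):
    """Eq. (25): 4 * pi * Area / Perimeter^2."""
    return 4.0 * math.pi * area / perimeter ** 2

def aspect_ratio(r_short, r_long):
    """Eq. (26): ratio of short radius to long radius."""
    return r_short / r_long

def roundness(area, perimeter, r_short, r_long):
    """Eq. (27): Circularity + (Circularity_Perfect_circle - AR)."""
    return circularity(area, perimeter) + (
        CIRCULARITY_PERFECT_CIRCLE - aspect_ratio(r_short, r_long)
    )

def ellipse_area_perimeter(r_short, r_long):
    """Hypothetical helper: area and approximate perimeter of an ellipse.

    The perimeter uses Ramanujan's approximation, which is exact for a circle.
    """
    a, b = r_long, r_short
    area = math.pi * a * b
    perimeter = math.pi * (3.0 * (a + b) - math.sqrt((3.0 * a + b) * (a + 3.0 * b)))
    return area, perimeter
```

With this definition a perfect circle has a roundness of 1, and the value grows as the pellet becomes more elongated, because the decrease in AR outweighs the decrease in circularity.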

The roundness distribution of the pellets detected in seven successively captured full-size images is shown in Fig. 16. The average number of pellets in each image is 70. It can be seen that there is a peak for each curve, and 30–50% of the pellets have a roundness of 1.33.

Fig. 16
figure 16

Roundness distribution of pellets detected from seven successively captured full-size images

In the pelletizing process of the local steel company, pellets with roundness 1–1.3 and particle size 9–15 mm are considered as “good quality.” With our proposed method, the change of product quality with time can be monitored, as shown in Fig. 17, where the percentage of pellets with the desired roundness and size is calculated for seven full-size images captured at different times. It can be seen that at least 55% of the pellets have the desired roundness and size (good quality), except at time t = 0 s and t = 2 s, where only about 45% of the pellets have the desired size.
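The quality criterion stated above can be evaluated per image as in the following sketch. The thresholds are those given in the text; treating both ranges as inclusive is an assumption, and the function names and the (diameter, roundness) input format are hypothetical.

```python
def is_good_quality(diameter_mm, roundness,
                    size_range=(9.0, 15.0), roundness_range=(1.0, 1.3)):
    """True if both size and roundness lie in the desired ranges.

    Assumption: both ranges include their end points.
    """
    size_ok = size_range[0] <= diameter_mm <= size_range[1]
    round_ok = roundness_range[0] <= roundness <= roundness_range[1]
    return size_ok and round_ok

def good_quality_percentage(pellets):
    """Percentage of good-quality pellets in one image.

    `pellets` is a list of (diameter in mm, roundness) pairs.
    """
    if not pellets:
        return 0.0
    good = sum(1 for d, r in pellets if is_good_quality(d, r))
    return 100.0 * good / len(pellets)
```

Applying `good_quality_percentage` to the pellets detected in each image of a sequence yields one value per capture time, which is the kind of curve plotted in Fig. 17.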

Fig. 17
figure 17

Percentage of pellets with good quality and its change with time

5 Conclusion

The present work proposed a lightweight U-net for detection and segmentation of iron ore green pellets in images, with the aim of solving the pellet overlapping and uneven illumination problems that are difficult for traditional methods. Compared to the classic U-net, the proposed deep learning framework has two advantages: (1) the network is a lightweight U-net requiring less GPU memory and computing time, and is therefore more suitable for online image processing in practice; (2) by introducing batch normalization (BN) layers after the convolution and deconvolution layers in each unit in the encoder and decoder parts, respectively, the generalization ability of the proposed network can be improved.

The proposed method was tested on images captured at a local steel company. It shows good segmentation performance in terms of DICE and ROC evaluations. Compared with traditional morphological algorithms and the classic U-net, it is much more robust to overlapping, uneven illumination and harsh light reflectance. Tests with static images and temporal image sequences demonstrate that the proposed method is effective in measuring the pellet size distribution as well as the shape evolution. The proposed method has potential usage in online detection of iron ore green pellets and other types of particles.