
1 Introduction

Traditionally, a hand-tuned classification system achieves lower accuracy and needs more development time than machine learning. As technology has advanced, hardware performance is now sufficient for machine learning with complicated algorithms. However, machine learning includes many algorithms, so the chopstick-tube data [1] are used here to compare the accuracy and speed of optimized ANN, RF, and SVM models. In [1], the contour perimeter, the radius from circle fitting, the two radii from ellipse fitting, the maximum inscribed circle, and the minimum circumscribed circle are used to classify a tube as “Normal”, “Large”, “Small”, “Deformed”, or “Empty”, with an accuracy of about 91%; improving on that figure by hand requires considerable extra effort and trial. This paper uses the same training and test data to compare the accuracy and computation speed of optimized ANN, RF, and SVM models, and explains how their parameters are found. In the results, after a grid search [2] for each algorithm's optimal parameters, the test-set accuracy is higher than that of the traditional method. Nevertheless, SVM is the most time-consuming to train, ANN is the slowest at prediction, and RF has the structure that is easiest to observe.

2 Methodologies

2.1 Artificial Neural Network, ANN

This structure simulates the characteristics of the nervous system [3], in which the nerves are connected to each other. Figure 1 shows the basic structure of an ANN, which includes an input layer, hidden layers, and an output layer; the numbers of neurons and layers are chosen mainly according to the application. If the model cannot handle a complicated application, the number of neurons or hidden layers can be increased, but the training time increases accordingly. Nowadays, a graphics processing unit (GPU) can be used to accelerate the parallel computation of a complicated model.

Fig. 1. Artificial Neural Network system architecture

Furthermore, an ANN uses an activation function [3]; this paper uses the sigmoid function, so the output of each layer passes through the sigmoid before the prediction is produced. During training, this paper uses cross-entropy [4] as the cost function and backpropagation [5] to update the weights. The structure of an ANN is simple, but the meaning of the learned weights is hard to interpret.
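To make the forward pass and weight update concrete, here is a minimal NumPy sketch of a single sigmoid layer trained with a cross-entropy-style loss. It is an illustration only, not the paper's TensorFlow implementation; the feature and class counts (6 and 5) are taken from the experiment in Sect. 3.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 6 geometric features, 5 output classes (see Sect. 3).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 6))                 # batch of feature vectors
y = np.eye(5)[rng.integers(0, 5, size=32)]   # one-hot labels

W = rng.normal(scale=0.1, size=(6, 5))
b = np.zeros(5)
eta = 0.01                                    # learning rate, as in Sect. 3.1

for epoch in range(1000):
    # Forward pass through the sigmoid activation.
    y_hat = sigmoid(X @ W + b)
    # Backpropagation for one sigmoid layer under a per-class
    # cross-entropy loss: the error signal is prediction minus target.
    grad = (y_hat - y) / len(X)
    W -= eta * X.T @ grad
    b -= eta * grad.sum(axis=0)
```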

2.2 Decision Tree, DT

This algorithm is a tree structure applied to classification or regression. When training the model, each split is chosen to maximize the information gain (IG) in (1), where \( D_{p} \) and \( D_{j} \) are the datasets of the parent node and the j-th child node, \( f \) is the feature used to perform the split, \( N_{p} \) is the number of samples at the parent node, \( N_{j} \) is the number of samples at the j-th child node, and \( I \) is the impurity measure. The impurity measure changes with the algorithm; variants include Iterative Dichotomiser 3 (ID3), C4.5, C5.0 [6], and classification and regression trees (CART). This paper uses CART, which builds binary splits in a manner similar to ID3. Whereas ID3 measures impurity with the entropy in (2), where \( p(i|t) \) is the proportion of class-\( i \) samples at node \( t \), CART changes the impurity measure to the Gini index in (3) when building the tree.

$$ \mathrm{IG}(D_{p}, f) = I(D_{p}) - \sum_{j=1}^{m} \frac{N_{j}}{N_{p}} I(D_{j}) $$
(1)
$$ I_{H}(t) = -\sum_{i=1}^{c} p(i|t)\,\log_{2} p(i|t) $$
(2)
$$ I_{G}(t) = 1 - \sum_{i=1}^{c} p(i|t)^{2} $$
(3)
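As a hedged illustration, (1)–(3) translate directly into a few lines of Python:

```python
import numpy as np

def entropy(labels):
    # I_H(t) in (2): impurity from the class proportions p(i|t) at a node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # I_G(t) in (3): the impurity measure CART uses when growing the tree.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, children, impurity=gini):
    # IG(D_p, f) in (1): parent impurity minus weighted child impurities.
    n_p = len(parent)
    weighted = sum(len(c) / n_p * impurity(c) for c in children)
    return impurity(parent) - weighted

# Example: a binary split of ten labels.
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
left, right = parent[:4], parent[4:]
print(information_gain(parent, [left, right]))  # 0.48 with Gini
```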

2.3 Random Forest, RF

This algorithm is an ensemble method built on DTs: it trains many DTs and classifies by majority vote, so the performance of an RF is generally better than that of a single DT. Bagging, i.e., training each tree on a random sample of the data, is the method used in this paper. As a result, each DT remains accurate on part of the samples, which mitigates overfitting, and an RF does not need to be pruned.
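The paper does not name its RF implementation; as one plausible sketch, scikit-learn's RandomForestClassifier realizes exactly this bagging-plus-majority-vote scheme. Stand-in data replaces the chopstick-tube features here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data; the experiment in Sect. 3 uses the chopstick-tube features.
X, y = make_classification(n_samples=500, n_features=6, n_classes=5,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Each tree is trained on a bootstrap sample (bagging) and the forest
# predicts by majority vote, so the individual trees need no pruning.
rf = RandomForestClassifier(n_estimators=50, bootstrap=True, random_state=0)
rf.fit(X_tr, y_tr)
print(rf.score(X_te, y_te))
```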

2.4 Support Vector Machine, SVM

The SVM [8, 9] mainly finds the hyperplane (4) that divides the samples into two categories, as in Fig. 2, where \( \vec{x} \) is the input feature vector, \( \vec{\omega} \) is the weight vector, and \( b \) is the bias of the hyperplane. The pink line is the hyperplane that separates the samples into red dots and blue crosses, and the samples closest to it are the support vectors (SVs); moreover, the hyperplane must maximize the distance (5) between itself and the SVs.

Fig. 2. Schematic of SVM classifying the samples (Color figure online)

$$ h(\vec{x}) = \vec{\omega}^{T}\vec{x} + b $$
(4)
$$ r = \frac{h(\vec{x})}{\left\| \vec{\omega} \right\|} $$
(5)
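In code, (4) and (5) reduce to a dot product and a normalization; a minimal NumPy sketch with illustrative values:

```python
import numpy as np

def hyperplane(x, w, b):
    # h(x) in (4): the sign of this value gives the class side.
    return w @ x + b

def margin_distance(x, w, b):
    # r in (5): geometric distance from x to the hyperplane.
    return hyperplane(x, w, b) / np.linalg.norm(w)

w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 1.0])
print(hyperplane(x, w, b), margin_distance(x, w, b))  # 1.5, ~0.67
```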

If a linear SVM cannot classify the application's data, a kernel function can be used to increase the dimensionality. Kernel functions include the radial basis function (RBF) kernel, the polynomial kernel, the sigmoid kernel, the inter kernel, and so on. For example, Fig. 3 uses the polynomial kernel to map two-dimensional samples into three dimensions; a separating hyperplane can then be found in the higher-dimensional space. The best kernel and its related parameters are found by grid search in this paper, as sketched below.

Fig. 3. Using the polynomial kernel to increase the dimensionality
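The kernel-and-parameter search can be expressed compactly. The sketch below uses scikit-learn's GridSearchCV as one possible realization (the paper's tool chain is TensorFlow and OpenCV, so this is illustrative only, and the listed parameter values merely bracket the optima reported in Sect. 3.3):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data with the same shape as the chopstick-tube feature set.
X, y = make_classification(n_samples=500, n_features=6, n_classes=5,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# One sub-grid per kernel family, mirroring the search in Sect. 3.3;
# the cost penalty C stays at its default, as the paper fixes it.
param_grid = [
    {"kernel": ["linear"]},
    {"kernel": ["rbf"], "gamma": [0.1, 0.5, 0.906, 1.0]},
    {"kernel": ["poly"], "degree": [2], "coef0": [0.5, 0.548, 0.6]},
    {"kernel": ["sigmoid"], "coef0": [0.0, 0.5, 1.0]},
]
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X_tr, y_tr)
print(search.best_params_, search.score(X_te, y_te))
```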

3 Experimental Result

In this section, the tube inner-circumference states of [1] are used as samples to find optimized ANN, RF, and SVM models and to improve on the traditional method. The tube states and features [1] are displayed in Fig. 4: the white line is the contour of the chopstick tube after image analysis, the green line is drawn by circle fitting, the red line is drawn by ellipse fitting, the light blue line is the maximum inscribed circle, and the purple line is the minimum circumscribed circle. The ellipse fitting contributes two features, namely its two radii. The classes predicted in [1] are Normal, Large, Small, Deformed, and Empty. This experiment uses the same data, split four to one into training and testing sets. Finally, the models are implemented with TensorFlow and OpenCV and compared on an NVIDIA TX2 platform.

Fig. 4. Schematic diagram of a chopstick tube after image analysis (Color figure online)
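The exact extraction pipeline of [1] is not published; as a hedged sketch, the features in Fig. 4 could be measured with OpenCV calls such as the following, where the circle-fitting approximation in particular is an assumption:

```python
import cv2
import numpy as np

def tube_features(binary_img):
    # Hedged sketch of the feature set in [1]; each call here is only
    # one plausible choice, not the original pipeline.
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    c = max(contours, key=cv2.contourArea)

    perimeter = cv2.arcLength(c, True)            # contour perimeter
    (_, _), (d1, d2), _ = cv2.fitEllipse(c)       # ellipse axes (diameters)
    (_, _), r_circum = cv2.minEnclosingCircle(c)  # min circumscribed circle

    # Circle fitting approximated by the mean ellipse radius (assumption).
    r_fit = (d1 + d2) / 4.0

    # Maximum inscribed circle via the distance transform of the filled contour.
    mask = np.zeros(binary_img.shape, np.uint8)
    cv2.drawContours(mask, [c], -1, 255, cv2.FILLED)
    r_inscribed = cv2.distanceTransform(mask, cv2.DIST_L2, 5).max()

    return [perimeter, r_fit, d1 / 2.0, d2 / 2.0, r_inscribed, r_circum]
```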

3.1 Artificial Neural Network, ANN

The numbers of hidden units and hidden layers are adjusted to optimize the ANN model, with hidden units ranging from 6 to 18 and hidden layers from 1 to 6. Because the initial weights of an ANN are random, each training result is the average of 5 runs, as in [10]. Furthermore, the learning rate is η = 0.01 and the number of epochs is 7000. Table 1 lists the test accuracy of the ANN with different parameters: the optimum is about 16 units and 3 layers, where the maximum test accuracy is 0.96857. A model with 12 units and 4 layers reaches the same optimum, but 4 hidden layers cost more computation time than 3, so the model with 16 units and 3 layers is chosen in this paper.

Table 1. Test accuracy for various numbers of hidden units and layers
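A hedged TensorFlow/Keras sketch of this grid search follows; the paper's actual training script is not given, so the stand-in data, the grid shown (two values per axis), and the shortened epoch budget are illustrative only:

```python
import numpy as np
import tensorflow as tf

def build_ann(n_units, n_layers, n_features=6, n_classes=5):
    # Sigmoid hidden layers as in Sect. 2.1; softmax output over 5 classes.
    inputs = tf.keras.Input(shape=(n_features,))
    x = inputs
    for _ in range(n_layers):
        x = tf.keras.layers.Dense(n_units, activation="sigmoid")(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Stand-in data; the experiment uses the chopstick-tube feature set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)).astype("float32")
y = tf.keras.utils.to_categorical(rng.integers(0, 5, 200), 5)

# Averaged over 5 runs because the weights start random;
# the paper trains for 7000 epochs rather than the 50 used here.
for units in (12, 16):
    for layers in (3, 4):
        accs = [build_ann(units, layers)
                .fit(X, y, epochs=50, verbose=0).history["accuracy"][-1]
                for _ in range(5)]
        print(units, layers, float(np.mean(accs)))
```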

The model with 16 hidden units and 3 hidden layers is then trained for up to 17000 epochs; the result is shown in Fig. 5, where the blue line is the root-mean-square-error loss, the red solid line is the training-accuracy curve, and the red dotted line is the test-accuracy curve. The training and test accuracies reach 0.985612 and 1, respectively, which shows that the model is not overfitting and is the best for this experiment.

Fig. 5. Training curves of the ANN with 16 hidden units and 3 hidden layers

3.2 Random Forest, RF

For RF, the number of DTs and the maximum number of branches per DT are mainly adjusted; this paper varies the maximum branches between 20 and 200 and the number of DTs between 10 and 100. However, an RF easily overfits during training, as shown in Fig. 6, where the red solid line is the test accuracy, the red dotted line is the training accuracy, and the blue line is the average number of branches. Because the test accuracy gradually decreases as training continues, this paper uses early stopping [9] to mitigate the problem. Furthermore, since an RF builds its DTs from random training subsets, as in the bagging algorithm, each reported result is the average of 10 runs.

Fig. 6. The curve of DT training
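As a hedged sketch of this early-stopping procedure, assuming scikit-learn and interpreting the "maximum branch" limit as a cap on leaf nodes (an assumption), the search can stop as soon as the held-out accuracy degrades:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, n_classes=5,
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Grow the branch limit and stop once held-out accuracy starts to fall,
# an early-stopping scheme in the spirit of [9].
best_acc, best_leaves = 0.0, None
for max_leaves in range(20, 201, 20):
    rf = RandomForestClassifier(n_estimators=50, max_leaf_nodes=max_leaves,
                                random_state=0).fit(X_tr, y_tr)
    acc = rf.score(X_te, y_te)
    if acc < best_acc:        # accuracy degraded: stop early
        break
    best_acc, best_leaves = acc, max_leaves
print(best_leaves, best_acc)
```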

The results are shown in Table 2, where the rows and columns are the numbers of DTs and maximum branches. Because the full table is large, only DT counts between 20 and 80 are shown. Three parameter settings reach an optimal accuracy close to 0.99286: 70 DTs with 160 branches, 30 DTs with 180 branches, and 50 DTs with 100 branches. The model with 50 DTs and 100 branches is the simplest of the three, so it is selected to reduce complexity.

Table 2. Test-set record of RF training.

Figure 7 displays the test accuracy for each training run of the selected model with 50 DTs and 100 maximum branches; the x-axis is the training run and the y-axis is the test accuracy. The best model easily reaches 100% test accuracy in this experiment.

Fig. 7. Test accuracy with 50 DTs and 100 branches

3.3 Support Vector Machine, SVM

For SVM, the kernel function and its parameters are mainly adjusted for the application; this paper fixes the cost penalty and the number of iterations while searching for the optimal model. The kernel functions tried are the linear kernel, RBF, polynomial kernel, sigmoid kernel, and inter kernel.

The training and test accuracies of the linear kernel and the inter kernel are shown in Table 3. The linear kernel is an ordinary linear SVM, and the inter kernel is given by (6); neither has parameters to adjust.

Table 3. Linear and inter kernel results
$$ \mathrm{K}(\vec{x}, \vec{x}') = \sum_{k=1}^{n} \min(x_{k}, x'_{k}) $$
(6)
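Most SVM libraries do not ship the inter kernel; in scikit-learn it can be supplied as a callable that returns the Gram matrix, as in this hedged sketch:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def intersection_kernel(X, Y):
    # Gram matrix of (6): K(x, x') = sum_k min(x_k, x'_k).
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

X, y = make_classification(n_samples=300, n_features=6, n_classes=5,
                           n_informative=4, random_state=0)
X = np.abs(X)  # the intersection kernel assumes non-negative features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

svm = SVC(kernel=intersection_kernel).fit(X_tr, y_tr)
print(svm.score(X_te, y_te))
```

SVC accepts any callable that maps two sample matrices to their Gram matrix, so the same pattern works for other custom kernels.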

The polynomial kernel (7) has two adjustable parameters: R, a constant, and d, the degree of the kernel. The degree is set to 2 because a high degree overfits very easily. Table 4 shows the search for R between 0 and 1; the full data set is too large, so only values between 0.545 and 0.55 are listed, and the optimal R is 0.548, which gives the highest test accuracy.

Table 4. Polynomial kernel results
$$ \mathrm{K}(\vec{x}, \vec{x}') = (\vec{x}'^{T}\vec{x} + R)^{d} $$
(7)

The RBF kernel (8) can map a sample into infinitely many dimensions; γ is a constant that scales the Euclidean distance between samples. Because the data range is too large, only γ between 0.903 and 0.908 is shown in Table 5; the optimal γ is 0.906, which gives the highest test accuracy.

Table 5. RBF kernel results
$$ \mathrm{K}(\vec{x}, \vec{x}') = e^{-\gamma \left\| \vec{x} - \vec{x}' \right\|^{2}} $$
(8)
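For reference, (8) is a one-line function; a hedged NumPy transcription using the γ found in Table 5:

```python
import numpy as np

def rbf_kernel(x, x_prime, gamma=0.906):
    # K(x, x') = exp(-gamma * ||x - x'||^2), per (8).
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

x = np.array([1.0, 2.0])
xp = np.array([1.5, 1.0])
print(rbf_kernel(x, xp))  # ~0.32 for these toy vectors
```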

Finally, the sigmoid kernel (9) has one adjustable constant R; part of the result is shown in Table 6. The sigmoid kernel is better suited to regression applications, so its accuracy is lower than that of the other kernels.

Table 6. Sigmoid kernel results
$$ \mathrm{K}(\vec{x}, \vec{x}') = \tanh(\vec{x}'^{T}\vec{x} + R) $$
(9)

In conclusion, the optimal SVM model uses the RBF kernel with γ = 0.906, and its test accuracy reaches 0.982014.

4 Conclusion

The test accuracy can reach 100% when the model structure is sufficiently complicated, better than the roughly 91% of the traditional system. However, complexity is not necessarily proportional to test accuracy: an overly complex model usually overfits the training set. For training, ANN and RF are simple to set up, but ANN is time-consuming at prediction. SVM training takes longer than ANN and RF because the parameters of each kernel must be searched, yet this also makes the hyperparameters straightforward to tune. For prediction, SVM computes faster than the others, so SVM is the most suitable for the tube-detection application; ANN and RF remain viable choices when prediction speed does not matter.

The ratio of training to testing data is 4:1 in this experiment. The noise in the testing data is lower than in the training data, and the test accuracy easily reaches 100% because the samples are few and the noise is not uniform. The evaluation still has room for improvement, such as distributing the noise evenly and further optimizing the hyperparameter values; these are all topics worth discussing.