
1 Introduction

At present, most water meters are still mechanical, and smart water meters are being rolled out slowly. For users whose meters have not yet been replaced, readings are still taken manually every month. This is inefficient, wastes considerable human resources, and carries potential safety hazards, so remote meter reading is needed. One remote reading approach is to capture an image of the meter and then recognize the digits in the image.

Image-based remote meter reading consists of two main steps: detecting the reading area and recognizing the characters. Detecting the reading area is the prerequisite for character recognition. Common methods for detecting the reading area are based either on Canny edge detection [1,2,3] combined with the Hough transform [4,5,6,7,8,9], or on projection [10,11,12].

In the method based on Canny edge detection and the Hough transform, the Canny edge detector is first used to obtain the edges of the image. Hough circle detection then removes the pixel interference between the circular dial area and the circular metal casing, and Hough line detection locates the four sides of the number box that contains the water meter characters. Finally, the reading area is located from the number box and its geometric features. However, this approach has obvious drawbacks: it is easily affected by bubbles on the dial, image tilt, blurring, and similar factors, and its detection accuracy is low.
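To make the Hough line step concrete, the following is a minimal sketch of the standard Hough line transform in pure Python. It assumes the edge pixels have already been extracted (for example by a Canny detector) as a list of (x, y) coordinates; the function name hough_strongest_line and the 1-degree, 1-pixel quantization are illustrative choices, not the configuration used in [4,5,6,7,8,9]. Each edge point votes for every line rho = x cos(theta) + y sin(theta) passing through it, and the most-voted (rho, theta) cell is the dominant line. Circle detection works the same way but votes in a three-parameter (center x, center y, radius) space.

```python
import math
from collections import Counter


def hough_strongest_line(edge_points, theta_steps=180):
    """Return (rho, theta_degrees, votes) for the line with the most votes.

    Each edge point (x, y) votes for every line rho = x*cos(theta) + y*sin(theta)
    through it, with theta sampled uniformly over [0, 180) degrees and rho
    rounded to the nearest integer pixel.
    """
    accumulator = Counter()
    for x, y in edge_points:
        for t in range(theta_steps):
            theta = math.radians(t * 180.0 / theta_steps)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            accumulator[(rho, t)] += 1
    (rho, t), votes = accumulator.most_common(1)[0]
    return rho, t * 180.0 / theta_steps, votes


# Edge pixels of one horizontal side of a hypothetical number box (y = 5).
side = [(x, 5) for x in range(100)]
print(hough_strongest_line(side))
```

Library implementations such as OpenCV's cv2.HoughLines additionally apply a vote threshold and return every line above it; this sketch only returns the single strongest line.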

The projection-based method accumulates the pixel values of the binarized image along a given direction. It counts the black pixels in each row and each column of the reading area, draws the corresponding horizontal and vertical projection profiles from these counts, and then locates the digits on the dial. Specifically, the horizontal projection counts the black pixels in each row of the binarized image, and the vertical projection counts the black pixels in each column. However, this method is also susceptible to many factors, such as light spots, stains, and partial occlusion on the dial, and its accuracy is also low.
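A minimal sketch of the projection step, assuming the binarized image is a list of rows with 1 for a black pixel and 0 for a white one. Locating the region as the span of rows and columns whose counts exceed a threshold is an illustrative assumption; the function names are hypothetical.

```python
def projection_profiles(binary):
    """Count black pixels (value 1) per row and per column of a binarized image."""
    horizontal = [sum(row) for row in binary]      # one count per row
    vertical = [sum(col) for col in zip(*binary)]  # one count per column
    return horizontal, vertical


def dense_span(profile, threshold=0):
    """Return (first, last) index whose count exceeds threshold, or None if none do."""
    hits = [i for i, count in enumerate(profile) if count > threshold]
    return (hits[0], hits[-1]) if hits else None


def locate_by_projection(binary, threshold=0):
    """Return (row_start, row_end, col_start, col_end) of the dense region, or None."""
    horizontal, vertical = projection_profiles(binary)
    rows = dense_span(horizontal, threshold)
    cols = dense_span(vertical, threshold)
    if rows is None or cols is None:
        return None
    return rows[0], rows[1], cols[0], cols[1]


# 6 x 8 image: digits occupy rows 2-3, columns 3-5; one stain pixel at (0, 0).
img = [[0] * 8 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4, 5):
        img[r][c] = 1
img[0][0] = 1
print(locate_by_projection(img, threshold=0))
print(locate_by_projection(img, threshold=1))
```

With threshold 0 the single stain pixel stretches the located box to (0, 3, 0, 5), while threshold 1 recovers (2, 3, 3, 5). This illustrates why stains and light spots degrade projection-based detection.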

In this paper, a convolutional neural network is used to predict the reading area, and the non-maximum suppression (NMS) algorithm is then used to remove highly overlapping results from the predicted regions to obtain the position of the reading area. The IoU of every image in the 1000-image test set is above 0.8, and when combined with a three-layer BP neural network for character recognition, the method achieves an accuracy of 98.0%. Compared with other methods, the readings produced by the proposed method are less sensitive to external interference such as bubbles, light spots, and blur.

The paper is organized as follows. Section 1 introduces the research. Section 2 describes the implementation of the reading-area prediction. Section 3 presents the experimental results and analysis. Section 4 concludes the paper.

2 System Implementation

The network structure of the method is shown in Fig. 1. The implementation consists of the following parts. First, a convolutional neural network predicts the reading area. Second, the non-maximum suppression (NMS) algorithm removes highly overlapping results from the predicted regions. Finally, the accuracy is calculated from the character recognition results and compared with that of other algorithms.

Fig. 1. Network structure for detection of reading area

2.1 Predict the Reading Area with Convolutional Neural Network

To recognize the characters of the reading in a water meter image, the correct reading area must first be detected and characters outside the reading area must be excluded. This step is critical for recognizing the reading.

The specific structure of the convolutional neural network is shown in Fig. 1. It consists of five two-dimensional convolution and pooling stages followed by one separate convolutional layer. The first convolutional layer uses a 5 × 5 kernel, "same" padding, and a stride of 1. The remaining convolutional layers use 3 × 3 kernels, "same" padding, and a stride of 1. The pooling layers have a pool size of 2 × 2 and a stride of 1. During training, the Adam algorithm [13] is used instead of traditional stochastic gradient descent to perform the gradient-based parameter updates.
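For reference, the Adam update rule from [13] can be sketched in pure Python on a one-dimensional problem. This toy example is not the network's training code. The quadratic objective and the learning rate of 0.1 are assumptions chosen for illustration, while beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are the default values suggested for Adam. Unlike plain stochastic gradient descent, each step is scaled by running estimates of the gradient's first and second moments.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize a scalar function, given its gradient, with the Adam update rule."""
    x = x0
    m = 0.0  # running estimate of the gradient mean (first moment)
    v = 0.0  # running estimate of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # correct the bias from zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_min, 4))
```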

The input of the network is the water meter image, treated as two-dimensional data. For an RGB image with three color channels, three convolution kernels are used. For a grayscale image, the gray value is copied into all three color channels. The overall loss function combines a position loss and a confidence loss and is calculated as follows:

$$L(x,c,l,g) = \frac{1}{N}\left( L_{\text{conf}}(x,c) + \alpha L_{\text{loc}}(x,l,g) \right) \tag{1}$$

where L is the overall loss function; x indicates whether the i-th prediction box matches the j-th ground-truth box and takes values in {0, 1}; c is the confidence, l is the prediction box, and g is the ground-truth box. $L_{\text{conf}}$ is the confidence loss and $L_{\text{loc}}$ is the position loss; N is the number of matched default boxes, and α is the weight that balances $L_{\text{loc}}$ against $L_{\text{conf}}$, set to 1.
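Equation (1) leaves the exact forms of the confidence and position losses open. The sketch below assumes, as in SSD-style detectors, a cross-entropy confidence term (here its two-class, binary form) and a Smooth L1 position term, applied directly to box coordinates for simplicity. It also simplifies x to a per-prediction 0/1 match flag. These choices and the function names are assumptions for illustration, not details given in the text.

```python
import math


def smooth_l1(d):
    """Smooth L1 penalty (assumed position loss): quadratic for |d| < 1, linear beyond."""
    a = abs(d)
    return 0.5 * d * d if a < 1.0 else a - 0.5


def detection_loss(matches, confidences, pred_boxes, gt_boxes, alpha=1.0):
    """Eq. (1): (L_conf + alpha * L_loc) / N, with N the number of matched boxes.

    matches[i]: 1 if prediction i is matched to a ground-truth box, else 0
    confidences[i]: predicted confidence in (0, 1) that box i is the reading area
    pred_boxes[i]: predicted box coordinates l
    gt_boxes[i]: matched ground-truth box g (ignored when matches[i] == 0)
    """
    n = sum(matches)
    if n == 0:
        return 0.0  # no matched boxes: the loss is taken as zero in this sketch
    eps = 1e-12  # guards against log(0)
    l_conf = 0.0
    l_loc = 0.0
    for x, c, l, g in zip(matches, confidences, pred_boxes, gt_boxes):
        # assumed binary cross-entropy for the confidence term
        l_conf -= x * math.log(c + eps) + (1 - x) * math.log(1 - c + eps)
        if x:
            # the position error is only penalized for matched boxes
            l_loc += sum(smooth_l1(p - t) for p, t in zip(l, g))
    return (l_conf + alpha * l_loc) / n


loss = detection_loss([1, 0], [0.5, 0.5],
                      [(0, 0, 10, 10), (5, 5, 6, 6)],
                      [(0, 0, 10, 12), None])
print(round(loss, 4))
```

Dividing by N, the number of matched boxes, keeps the loss scale independent of how many default boxes happen to match in a given image.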

2.2 Remove Highly Overlapping Results with NMS

As shown in Fig. 1, the outputs of the fourth, fifth, and sixth convolutional layers are connected to the end of the network, and non-maximum suppression (NMS) [14,15,16] is applied to remove highly overlapping predicted boxes and obtain the optimal result. The purpose of NMS is to eliminate redundant boxes and find the best object location. NMS rests on the premise that distinct object instances do not overlap or overlap only slightly, so heavily overlapping predicted boxes are treated as duplicates of the same instance.

First, the predicted boxes are sorted by confidence, and the box with the highest confidence is selected. Second, an overlap threshold is set and the remaining boxes are traversed: any box whose overlap with the highest-confidence box exceeds the threshold is removed, and any box whose overlap is below the threshold is retained. These steps are then repeated, each time selecting the most confident box among the unprocessed boxes.
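The greedy procedure above can be sketched in pure Python as follows. Boxes are assumed to be axis-aligned (x1, y1, x2, y2) tuples, overlap is measured by IoU, and the default threshold of 0.5 is an illustrative assumption; the text does not state the threshold it uses. A box whose overlap exactly equals the threshold is kept here, a case the text leaves open.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest confidence first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # most confident unprocessed box
        keep.append(best)
        # drop every remaining box that overlaps the selected box above the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

The first two boxes overlap with IoU 81/119 ≈ 0.68, so the less confident one is suppressed and the call returns [0, 2].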

3 Experimental Results

The proposed water meter reading-area detection method is evaluated from two aspects: the overlap between the predicted area and the real area, and the accuracy of reading recognition when combined with the three-layer BP neural network. In terms of accuracy, the proposed method is also compared with other methods. In addition, because there is currently no authoritative public dataset of water meter readings, the evaluation is based on our own dataset.

3.1 Dataset

The acquisition and labeling of image datasets are the basis of neural network training. The size of the dataset and the quality of the annotations directly affect the training and prediction results of the neural network. The images used to train and test the model are randomly divided into a training set, a validation set, and a test set, with a ratio between 4:1:1 and 8:1:1. We photograph the water meter dial directly with a camera, varying the camera position and the meter reading, and annotate each image with the coordinates of the two endpoints of a diagonal of the reading area and the meter reading.
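Because a diagonal can be annotated from either pair of opposite corners, a loader has to normalize the two endpoints into a box. A minimal sketch, assuming a hypothetical comma-separated annotation line of the form x1,y1,x2,y2,reading; the format and function names are illustrative, not the authors' actual annotation files.

```python
def diagonal_to_box(p, q):
    """Convert the two endpoints of a diagonal into (x_min, y_min, x_max, y_max)."""
    (x1, y1), (x2, y2) = p, q
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def parse_annotation(line):
    """Parse a hypothetical 'x1,y1,x2,y2,reading' annotation line."""
    x1, y1, x2, y2, reading = line.strip().split(",")
    box = diagonal_to_box((int(x1), int(y1)), (int(x2), int(y2)))
    return box, reading  # keep the reading as a string so leading zeros survive


print(parse_annotation("120,40,20,80,00645"))
```

Storing the reading as a string rather than an integer preserves leading zeros such as those in 00645.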

Regarding image size, the proposed network has no strict requirements and can accept images of different sizes. Regarding color channels, both RGB and grayscale images can be used. Regarding image tilt, not every image needs to be tilted: only a portion of the images is rotated by an appropriate angle, and these images can be randomly placed into the training, validation, and test sets, which increases the robustness of the model. In addition, to better simulate real application scenarios, images can also be converted to grayscale, blurred, and augmented with noise and light spots, which enhances the generalization of the model.
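The grayscale, blur, noise, and light-spot augmentations mentioned above can be sketched in pure Python on images stored as lists of rows. The ITU-R BT.601 luma weights, the 3 × 3 mean filter, the Gaussian noise level, and the disk-shaped light spot are illustrative assumptions; the text does not specify how its augmentations were implemented. Tilt augmentation (rotation) is omitted for brevity. The sketch also includes the gray-to-three-channel copy described in Section 2.1.

```python
import random


def rgb_to_gray(rgb):
    """Rows of (r, g, b) tuples -> rows of gray ints, using ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]


def gray_to_three_channels(gray):
    """Copy each gray value into all three channels so the network sees 3-channel input."""
    return [[(v, v, v) for v in row] for row in gray]


def clip(v):
    """Round and clamp a pixel value to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(v))))


def add_gaussian_noise(gray, sigma=10.0, seed=None):
    """Add zero-mean Gaussian noise with standard deviation sigma to every pixel."""
    rng = random.Random(seed)
    return [[clip(v + rng.gauss(0.0, sigma)) for v in row] for row in gray]


def add_light_spot(gray, cx, cy, radius, boost=80):
    """Brighten a disk centred at (cx, cy) to imitate a reflection on the dial glass."""
    return [[clip(v + boost) if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else v
             for x, v in enumerate(row)]
            for y, row in enumerate(gray)]


def box_blur(gray):
    """3 x 3 mean filter; border pixels average only the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        new_row = []
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            new_row.append(clip(sum(window) / len(window)))
        out.append(new_row)
    return out


# Augment a flat mid-gray 8 x 8 patch with a light spot, noise, and blur.
patch = [[128] * 8 for _ in range(8)]
augmented = box_blur(add_gaussian_noise(add_light_spot(patch, 3, 3, 2), seed=0))
```

Seeding the noise generator makes an augmented dataset reproducible across training runs.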

We collected 6000 water meter images and randomly divided them in a 4:1:1 ratio: 4000 images for the training set, 1000 for the validation set, and 1000 for the test set.
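A minimal sketch of such a seeded random 4:1:1 split. The function name and the placeholder file names are hypothetical.

```python
import random


def split_dataset(items, ratio=(4, 1, 1), seed=0):
    """Shuffle items and split them into train/validation/test parts by the given ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded, so the split is reproducible
    total = sum(ratio)
    n_train = len(items) * ratio[0] // total
    n_val = len(items) * ratio[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])  # the test set takes the remainder


images = [f"img_{i:04d}.jpg" for i in range(6000)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))
```

With 6000 images and a 4:1:1 ratio this yields 4000, 1000, and 1000 images. Giving the remainder to the test set ensures that no image is dropped when the total is not divisible by the ratio sum.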

3.2 The Overlap Between Predicted Area and Real Area

The degree of overlap between the predicted area and the real area, the intersection over union (IoU), is a criterion for measuring how accurately the corresponding object is detected in a given dataset. It is calculated as the area of the intersection of the two regions divided by the area of their union. The value of IoU lies between 0 and 1; the larger the value, the better the prediction and the better the basis for subsequent character recognition.
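In symbols, for a predicted region A and a real region B, the intersection area is counted once when forming the union:

$$\mathrm{IoU}(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

As a worked example with illustrative boxes, suppose the real reading area spans (0, 0) to (10, 10) and the predicted area spans (0, 0) to (10, 8). Then |A ∩ B| = 80, |A ∪ B| = 100 + 80 - 80 = 100, and IoU = 80/100 = 0.8.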

The test results show that the reading-area IoU of every test image is above 0.8; the specific distribution is shown in Table 1. In the example test result in Fig. 2, the confidence of the predicted box is 0.99, and the prediction fully matches expectations.

Table 1. Distribution of IoU

Fig. 2. Results of the predicted reading area

3.3 Recognition Accuracy with Three-Layer BP Neural Network

Accuracy is the metric used to evaluate character recognition. It is calculated as the number of images in which all five characters are correctly recognized divided by the total number of images. Accuracy ranges from 0 to 1, and a higher value indicates more accurate recognition.

In the character recognition test, all five characters were correctly recognized in 980 images, giving an accuracy of 98.0%, which is sufficient for practical application. Figure 3 shows a test result in an actual scenario: the predicted reading is 00645, which matches expectations.
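The whole-reading accuracy defined above can be sketched as follows. The function name is hypothetical, and readings are assumed to be compared as five-character strings, so a single wrong digit makes the whole image count as incorrect.

```python
def reading_accuracy(predictions, labels):
    """Fraction of images whose predicted reading matches the label in every character."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(1 for p, t in zip(predictions, labels) if p == t)
    return correct / len(labels)


# 1000 test images: 980 read correctly, 20 with one wrong digit.
labels = ["00645"] * 1000
predictions = ["00645"] * 980 + ["00646"] * 20
print(reading_accuracy(predictions, labels))
```

This reproduces the arithmetic in the text: 980 / 1000 = 0.98, that is, 98.0%.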

Fig. 3. Predicted results of readings

To compare the proposed method with other methods, we tested several methods on our dataset. Because the methods follow different approaches, not all of them can be evaluated with IoU; however, all of them produce readings, so reading accuracy is used for the comparison. The test results are shown in Table 2. The accuracy of the method based on projection and a BP neural network is 90.1%, the accuracy of the method based on Canny edge detection, the Hough transform, and template matching is 93.3%, and the accuracy of the proposed method is 98.0%, a clear advantage.

Table 2. Reading accuracy of different methods

4 Conclusion

This paper proposes a convolutional neural network-based method for detecting the water meter reading area, which lays a foundation for reading recognition. A convolutional neural network predicts the reading area, and the NMS algorithm then removes highly overlapping results from the predicted regions to obtain the position of the reading area. The experimental results show that the IoU of all 1000 test images is above 0.8, and when combined with a three-layer BP neural network for character recognition, the accuracy reaches 98.0%. The method detects water meter readings efficiently and accurately. It is insensitive to external environmental factors, generalizes well, and has a simple model structure that requires few computing resources.