
1 Research Background and Significance

As train speeds increase and the railway network expands, driving safety becomes ever more important. An intrusion by people into the high-speed rail perimeter can seriously endanger driving safety and reduce the operating efficiency of the entire network. The “Opinions on Strengthening the Governance of Safety and Environment Along Railways”, issued in 2021, proposes making full use of video monitoring, artificial intelligence, and other means to improve the ability to investigate and manage potential safety and environmental problems along railways. Cross-region target tracking meets the anti-terrorism needs of the high-speed rail perimeter by raising alarms on, and tracking, suspicious persons who repeatedly enter the perimeter.

At present, suspicious persons are identified mainly by manual investigation. This method is time-consuming and labor-intensive, and it cannot effectively track suspicious persons, identify terrorists, or eliminate hidden dangers. Traditional video-based algorithms for perimeter intrusion are easily affected by factors such as lighting and cannot track suspicious persons across regions. Therefore, this paper studies cross-region target tracking for the high-speed rail perimeter based on a deep learning algorithm.

2 Research Status at Home and Abroad

Cross-region target tracking was first raised as a problem by Q. Cai et al. at ICPR in 1996 [1]. In 2006, N. Gheissari et al. first proposed the concept of pedestrian re-identification at CVPR [2]. Research on pedestrian re-identification has continued to develop since then.

Classic cross-region target tracking methods based on hand-crafted features include the following. D. Gray et al. combined color and texture features to represent pedestrian images [3]. Farenzena et al. extracted features from three complementary aspects of human appearance [4]. S. Liao et al. of the Institute of Automation, Chinese Academy of Sciences, proposed the Local Maximal Occurrence (LOMO) feature representation for pedestrian re-identification [5], which reduces the interference of background noise.

In complex application scenarios, cross-region tracking algorithms based on hand-crafted features can no longer meet requirements. Algorithms based on deep learning have brought a leap in the performance of video image processing.

In 2014, W. Li et al. [6] proposed the filter pairing neural network (FPNN). In the same year, Simonyan et al. replaced 5 × 5 convolution filters with 3 × 3 filters and increased the model’s fitting ability by deepening the network [7]. Geng et al. used a CNN to extract global features and combined classification loss with verification loss to address data sparsity [8]. Zhang et al. retained only global features to calculate the similarity between images [9].

Cross-region tracking models based on deep learning are well developed in theory, but their generalization performance becomes a problem in practical application. Therefore, this paper builds a data set of the high-speed rail perimeter, applies the ResNet-50 algorithm to it, and examines how well the model performs.

3 Cross-region Target Tracking Algorithm Based on ResNet-50

Security monitoring of the perimeter along the high-speed rail requires identifying, detecting, and tracking intruders. Criminals who enter the perimeter often obscure their faces, and they may appear in multiple key areas of the perimeter, threatening the safety of high-speed rail traffic. Therefore, this paper proposes a cross-region target tracking algorithm based on ResNet-50 to identify and track intruders.

The research in this paper is divided into three experimental stages: detection of pedestrian targets, feature extraction from target images, and similarity matching of image features. YOLOv3 is a target detection network with a relatively balanced trade-off between speed and accuracy. It handles small objects and can identify targets at 100 m in high-speed rail monitoring footage; its background false detection rate is low, and it is easy to port to other platforms. YOLOv3 is therefore selected for pedestrian target detection.

In the second stage, ResNet-50 serves as the basic feature extraction algorithm. The feature extraction model is trained with semi-supervised learning to learn the characteristics of different pedestrians in the images. After training, the model is tested on target images and their features are extracted. Finally, cosine similarity is used to sort the candidates by the distance between feature vectors in ascending order, which yields the retrieval results. The test results of models with different hyperparameters are then compared and analyzed (Fig. 1).

Fig. 1. Flow chart of cross-region target tracking algorithm based on DCGAN and ResNet-50

4 Experiment and Result Analysis

4.1 Generate Perimeter Data Set

In the first stage, the YOLOv3 algorithm detects the target pedestrians, and pedestrian images are generated from the detections. This experiment uses video data collected on an experimental railway line. A total of 1911 pedestrian images were cropped, including 960 images in the query set and 951 images in the candidate set.

Some of the captured images are shown in the figure below (Fig. 2):

Fig. 2. Part of the perimeter dataset images

The pedestrian target detection results were then annotated, so that each image carries information such as the pedestrian ID and the camera.
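One common way to carry this kind of annotation is in the file name itself. The sketch below parses names in the Market-1501 style (`<person>_c<camera>s<sequence>_<frame>_<box>.jpg`). Whether the perimeter data set uses this convention is an assumption, and the names `Annotation` and `parse_annotation` are hypothetical.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    """Labels recovered from one image file name (hypothetical container)."""
    person_id: int
    camera_id: int
    sequence_id: int
    frame: int


# Market-1501 style name: <person>_c<camera>s<sequence>_<frame>_<box>.jpg
# Assumption: the perimeter images are named the same way.
_NAME_PATTERN = re.compile(r"^(-?\d+)_c(\d+)s(\d+)_(\d+)_(\d+)\.jpg$")


def parse_annotation(filename: str) -> Annotation:
    """Extract pedestrian ID and camera information from a file name."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename!r}")
    person, camera, sequence, frame, _box = match.groups()
    return Annotation(int(person), int(camera), int(sequence), int(frame))
```

For example, `parse_annotation("0001_c1s1_001051_00.jpg")` yields pedestrian 1 seen by camera 1, which is enough to decide later whether a query and a candidate show the same person from different cameras.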

4.2 Perimeter Image Feature Extraction and Matching

After the data set is generated, it is used as the input of the ResNet-50 feature extraction model, and the output is the high-level semantic feature value of the target image.

First, the ResNet-50 feature extraction model must be trained. The training set is the Market-1501 data set, a public data set for pedestrian re-identification collected on the campus of Tsinghua University. Its training set and test set can be configured for single-target or multi-target tracking tests.

The trained ResNet-50 feature extraction model extracts image features from the high-speed rail perimeter data set and outputs them as the matrix shown below. The cosine similarity of the feature values is then calculated to obtain the similarity ranking between target images (Fig. 3).

Fig. 3. Image feature extraction result

Query represents the target pedestrian image. The top 9 similar images retrieved from the candidate set are listed on the right. The green number indicates that the image retrieval result is correct, and the red number indicates that the image retrieval result is wrong.
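The retrieval step described here can be sketched in a few lines. This is a minimal illustration that assumes feature vectors have already been extracted and are stored as plain lists of floats; the function names and the `gallery` dictionary layout are hypothetical, and a real system would use a numerical library on batches of vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    top_k: int = 9,
) -> List[Tuple[str, float]]:
    """Return the top_k candidate names with their similarity, best match first."""
    scored = [(name, cosine_similarity(query, feat)) for name, feat in gallery.items()]
    # Highest similarity first, which is the same order as ascending cosine distance.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

The default `top_k=9` mirrors the nine retrieved images shown per query; marking each returned name as correct or wrong then only requires comparing its pedestrian ID with that of the query.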

The following pictures show part of the experimental results (Fig. 4):

Fig. 4. Visualization results of cross-regional target tracking experiments

On the high-speed rail perimeter data set, the ResNet-50 model reached a Rank-1 of 97.90%, a Rank-10 of 99.78%, and a mean average precision (mAP) of 67.23%, a good application result (Table 1).
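For reference, Rank-k and mean average precision (mAP) are conventionally computed from the ranked retrieval lists as sketched below. This is a simplified illustration, not the evaluation code used in the experiments: each query is represented by a hypothetical list of booleans, one per ranked candidate, and every true match is assumed to appear somewhere in that list.

```python
from typing import Sequence


def rank_k_accuracy(ranked_matches: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of queries with at least one correct match in the top k."""
    hits = sum(1 for matches in ranked_matches if any(matches[:k]))
    return hits / len(ranked_matches)


def average_precision(matches: Sequence[bool]) -> float:
    """Mean of the precision values measured at each correct match."""
    hits = 0
    precision_sum = 0.0
    for position, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_matches: Sequence[Sequence[bool]]) -> float:
    """Average of the per-query average precision."""
    total = sum(average_precision(matches) for matches in ranked_matches)
    return total / len(ranked_matches)
```

Rank-1 only asks whether the best candidate is correct, whereas mAP rewards placing every image of the same pedestrian near the top. A model can therefore score much higher on Rank-1 than on mAP when some true matches are ranked far down the list.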

Table 1. Cross-region target tracking experimental training results

4.3 Results and Comparative Analysis

To further test the accuracy of cross-region tracking of intruding targets in the high-speed rail perimeter, and to analyze the factors that affect it, this paper optimizes the generated perimeter data set. The images of the two regions are separated more strictly and used as the query set and the gallery set, and some interfering images are deleted. For example, different pedestrians wearing yellow vests are easily identified as the same target, which lowers the accuracy of the cross-region tracking algorithm.
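The data set optimization can be pictured as a simple filtering step. The sketch below is only an illustration under stated assumptions: the `Record` fields `region` and `wears_vest` are hypothetical manual labels, and it assumes that one region supplies the query set while the other supplies the gallery set.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Record(NamedTuple):
    """One annotated pedestrian image (hypothetical layout)."""
    filename: str
    person_id: int
    region: str       # assumed label for the monitored region
    wears_vest: bool  # assumed manual flag for yellow-vest images


def build_query_and_gallery(
    records: Sequence[Record],
    query_region: str,
    gallery_region: str,
) -> Tuple[List[Record], List[Record]]:
    """Drop interfering vest images, then split the rest by region."""
    kept = [r for r in records if not r.wears_vest]
    query = [r for r in kept if r.region == query_region]
    gallery = [r for r in kept if r.region == gallery_region]
    return query, gallery
```

Splitting strictly by region means that every correct match is a genuine cross-region match, so the evaluation measures the cross-region behavior directly.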

After the perimeter data set is optimized, the ResNet-50 feature extraction model extracts feature values from the resulting No-vest data set, and cosine similarity is used to evaluate the matching results. The Rank values are slightly lower than on the original data set but remain above 95%, and Rank-10 still reaches 99%, so matching remains highly effective. The mAP of the No-vest data set increases significantly, to 88.84%, which is reported as 21.41 percentage points higher than the mAP on the original data set. The optimized data set therefore has a positive effect on the experimental results (Table 2).

Table 2. Optimized data set ResNet-50 model comparison evaluation results

The following table compares the test results of this algorithm with those of classic pedestrian re-identification methods on the Market-1501 data set (Table 3).

Table 3. Different algorithm results

Compared with these algorithms, the algorithm in this paper ranks high in both mAP and Rank-1 on the public data set, which indicates that it is already well trained. When the ResNet-50 feature extraction model is applied to the high-speed rail perimeter data set, its results are no weaker than those of these methods, and its results on the optimized data set are far better than those of the algorithms listed above.

5 Conclusion

This paper studies a deep learning-based cross-region target tracking algorithm for the high-speed rail perimeter and realizes cross-region tracking of suspicious persons who intrude into the perimeter. Good detection results are obtained on the high-speed rail perimeter data set, which serves the purpose of maintaining driving safety.

In future research, the high-speed rail perimeter data set should first be expanded further, and the optimized network should be trained continuously to improve detection accuracy. Second, the computation should be further optimized to reduce the cost and complexity of the detection process. In addition, advanced techniques such as unsupervised learning and generative adversarial networks should be studied further, and the advantages of various algorithms combined, to make the high-speed rail perimeter monitoring system more efficient and accurate.