
1 Introduction

With social progress and rising living standards, people have begun to pay attention to physical health and to improve their physical fitness through sport. During exercise, standard movement posture not only determines the effect of training to a certain extent, but also protects the athlete from injury as far as possible. In basketball, practising the standard dribbling movement in daily training is both an effective workout and a way to raise competitive level. However, the definition of standard movement posture is mostly based on pictures or oral guidance, so a quantitative standard is lacking. It is therefore of great significance to estimate human posture and recognize human actions from collected images or video sequences [1].

Reference [2] proposes a recognition method based on a pictorial structure model. Heuristic local search is used for optimization, searching for a reasonable initial solution and the global optimum. The human body model is represented by multiple limb components, and human posture recognition and target detection are realized by scoring the confidence of each component. In this method, each component represents a joint or a limb; when the components are connected to each other, they represent different postures of the human body model and qualitatively describe the meaning of the posture. Reference [3] uses a flexible mixture model to capture the relationship between limbs, and at the same time represents the human body with a star structure using the deformable part model (DPM). Through rotation, scaling and size transformation, the position changes of different postures are displayed.

On this basis, a high-precision recognition method for basketball dribble posture based on a lightweight RFID mobile authentication protocol is proposed. The protocol improves the efficiency of dribble posture recognition and provides a theoretical basis for measuring standard movement posture.

2 Design of High Precision Recognition Method for Basketball Dribble Attitude Based on Lightweight RFID Mobile Authentication Protocol

2.1 Image Acquisition of Basketball Dribble Posture

Placing the Camera Based on the Lightweight RFID Mobile Authentication Protocol

A camera is placed on the basketball court and angled to collect images of dribbling posture. A standard basketball court is rectangular. In the actual field test, the camera angle is adjusted so that the camera hangs above the court behind the players at an angle of 45° to 60° to the horizontal, giving an overhead view in which all of the players' actions are visible. The left and right camera images are collected synchronously, yielding two human posture coordinate systems, and the human posture is then transformed into the world coordinate system through the coordinate transformation of binocular vision. The straight-line distance between the camera and the player is about 10.1 m, chosen so that the shooting angle keeps the camera height, parallel to the horizontal plane, at the height of the human body. Reducing the straight-line distance between camera and player preserves the image features of the human posture in the image, so that when a player dribbles, the network can find an elbow occluded by the trunk and trace it to the hand in order to obtain the feature representation [4].

Because the camera is far away, the human body is easily compressed into a very small area and many image features are lost in the collected images. Therefore, UHF RFID equipment combined with the lightweight RFID mobile authentication protocol is used to perceive image features. The RFID reader adopts an imported Kingerton R2000 RFID module and connects four antennas simultaneously for data reception, processing and command sending. The RFID antenna is a Keller 12 dB high-gain linearly polarized UHF antenna, which is directional and can detect long-distance RFID tags. The mobile authentication protocol tag adopts a D68 long-distance electronic tag, a passive tag that is harmless to human health, economical and practical, and can transmit data from the lower computer to the upper computer. Considering rationality and economy, the spacing between RFID mobile authentication protocol tags is half the wavelength of the RFID equipment.

$$ A{ = }\left\{ \begin{gathered} \left[ {\frac{{2{\text{a}}}}{\xi }} \right] \times \left[ {\frac{{2\left( {{\text{b}} - 0.2} \right)}}{\xi }} \right]\quad {\text{b}} \le 2.2\,{\text{m}} \hfill \\ \left[ {\frac{{2{\text{a}}}}{\xi }} \right] \times \left[ {\frac{2 \times 2}{\xi }} \right]\quad \quad \quad {\text{b > }}2.2\,{\text{m}} \hfill \\ \end{gathered} \right. $$
(1)

Where \(A\) is the size of the tag array, \(\xi\) is the wavelength of the RFID equipment, \({\text{b}}\) is the width of the basketball court, and \({\text{a}}\) is its length. RFID tags arranged according to formula (1) sense the changes in tag return information caused by players within the range of the tag array, assist the camera in capturing the players' dribbling state, and retain more image features. This completes the camera placement.
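Formula (1) can be sketched as a small helper that returns the number of tags in the array. The court dimensions (28 m × 15 m for a standard full court) and the ~0.326 m wavelength of 920 MHz UHF RFID are illustrative assumptions, not values given in the text.

```python
import math

def tag_array_size(a, b, wavelength):
    """Number of tags in the RFID array per Eq. (1).

    a: court length (m); b: court width (m); wavelength: RFID wavelength (m).
    Tags are spaced half a wavelength apart, so 2a / wavelength tags fit
    along the length; the width term is capped once b exceeds 2.2 m.
    """
    rows = math.floor(2 * a / wavelength)
    if b <= 2.2:
        cols = math.floor(2 * (b - 0.2) / wavelength)
    else:
        cols = math.floor(2 * 2 / wavelength)
    return rows * cols

# Assumed full-court dimensions and a ~920 MHz UHF wavelength of 0.326 m.
print(tag_array_size(28.0, 15.0, 0.326))
```

Since the court width of 15 m exceeds 2.2 m, the second branch of the piecewise definition applies here.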

Setting the Camera Acquisition Parameters

The camera parameters to be specified include the interface type, pixel count, frame rate, focal length and color channels. USB 3.0 is selected as the interface between camera and computer: its theoretical maximum bandwidth is 5 Gbit/s, in practice about 200 MB/s. With an image size of about 1 MB, USB 3.0 can sustain a transmission rate of 200 frames/s while preserving image quality, and keeping a uniform USB 3.0 interface makes it possible to use the interface layer for adjustment and achieve compatibility across multiple cameras. The pixel is the basic unit of a picture: the image presented by a computer is composed of pixels and can be abstracted as a matrix of such basic units. The number of pixels in the source image determines image quality; the more pixels, the more feature information can be extracted. A high-pixel camera is selected with cost control in mind, after testing cameras with a variety of pixel counts. Frame rate is the number of images collected per second. It does not affect image quality itself, but mainly determines how finely the athletes' posture is sampled. The higher the frame rate, the more posture images are collected per second, so posture changes during dribbling are not missed and no motion blur appears in the collected images. If the frame rate is too low, the limbs may be motion-blurred in the collected image, which manifests as pixels in the blurred area diffusing into the surrounding region. Frame rate is therefore also an important camera parameter; with cost control in mind, cameras with different frame rates were tested and a frame rate meeting the requirements was chosen [5, 6].
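The bandwidth and frame-rate figures above imply two simple back-of-the-envelope checks. The hand speed of 6 m/s used below is purely an illustrative assumption, not a value from the text.

```python
def max_frame_rate(bandwidth_mb_s, frame_size_mb):
    """Upper bound on the sustainable frame rate over the camera link."""
    return bandwidth_mb_s / frame_size_mb

def motion_per_frame_m(hand_speed_m_s, frame_rate):
    """Distance a moving hand travels between two consecutive frames."""
    return hand_speed_m_s / frame_rate

# Figures from the text: ~200 MB/s effective USB 3.0 throughput, ~1 MB images.
fps = max_frame_rate(200.0, 1.0)
print(fps)  # 200.0 frames per second

# Assumed hand speed of 6 m/s during a hard dribble (illustrative only):
print(motion_per_frame_m(6.0, fps))  # 0.03 m of hand travel per frame
```

At 200 frames/s, the hand moves only a few centimetres between frames, which is consistent with the text's claim that this rate avoids visible motion blur.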
Camera lenses are divided into wide-angle, standard and telephoto types, each with its own focal length range. When collecting images of the athletes from the left side of the court, it was found that the focal length affects the acquisition angle. Cameras with different focal lengths were tested, and a camera with a shorter focal length was selected so that the field of view covers the whole half-court, ensuring that the collected images include the basketball player throughout the half-court and capture all dribbling postures. An image has one or more color channels, which store its color information; superimposing the colors of all channels at a given position yields the color at that position in the image. Mainstream cameras currently come in two kinds: RGB three-channel cameras and grayscale single-channel cameras. The grayscale map is expanded from a single channel to three channels, and the RGB three channels are then used for learning. The pixel value \(Q\) at each position is calculated as:

$$ Q = \frac{R \times 30 + G \times 59 + B \times 11 + 50}{{100}} $$
(2)

Among them, \(R\), \(G\) and \(B\) are the pixel values of the camera's RGB channels. Using formula (2), the three RGB channels can be reduced to a single channel without external constraints while the color features of the image are still extracted. With the acquisition parameters specified, the image data acquisition of basketball dribbling posture is realized.
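Formula (2) is the familiar 30/59/11 luma weighting with a +50 term so that integer division rounds to the nearest value. A minimal sketch:

```python
def gray_value(r, g, b):
    """Single-channel pixel per Eq. (2): Q = (30R + 59G + 11B + 50) / 100.

    The +50 term makes the integer division round to the nearest integer
    rather than truncate.
    """
    return (r * 30 + g * 59 + b * 11 + 50) // 100

print(gray_value(255, 255, 255))  # white stays 255
print(gray_value(255, 0, 0))      # pure red maps to 77
```

Because the weights sum to 100, a pixel with equal R, G and B keeps its value, so existing gray images pass through unchanged.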

2.2 Preprocessing the Basketball Dribble Posture Image Data

The collected images are preprocessed to eliminate irrelevant information and ensure image quality. Preprocessing uses image compression, graying, geometric transformation, image enhancement and similar means. First, bilinear interpolation is used to compress the image, recovering useful feature information, enhancing its detectability and simplifying the image data as far as possible, so as to improve the reliability of feature extraction. The side-length ratio between the source image and the compressed target image is calculated, and from this ratio each pixel of the target image is mapped back to the corresponding coordinate \(\left( {X{,}Y} \right)\) in the source image:

$$ \left\{ {\begin{array}{*{20}l} {X{ = }\frac{{\varpi \times {\text{u}}}}{\tau }} \hfill \\ {Y = {\text{n}} \times {\text{m}}} \hfill \\ \end{array} } \right. $$
(3)

Where \(\left( {\varpi {\text{,n}}} \right)\) is the pixel position in the compressed target image, \(\left( {\text{u,m}} \right)\) is the size of the source image, and \(\left( {\tau ,1} \right)\) is the size of the compressed target image [7]. After compression, the region of interest of the source image is preprocessed by noise suppression. The initial frame is used as the background and compared with subsequent images to eliminate background points in the region of interest. The source image is then decomposed with a wavelet-domain Markov tree model: the region of interest is transformed into the wavelet domain and the wavelet coefficients are reconstructed to obtain the denoised image. Mean filtering is used to deal with general noise: for a given pixel in the region of interest, its neighboring pixels are weighted and averaged, and the result replaces the pixel's value. The mean filtering formula is as follows.

$$ B{ = }\sum\limits_{{\alpha { = }1}}^{C} {\sum\limits_{{\beta { = }1}}^{D} {{\text{e}}\left( {\gamma_{\alpha \beta } ,\eta_{\alpha \beta } } \right)} } $$
(4)

Among them, \(B\) is the value of the given pixel after mean filtering, \(\alpha ,\beta\) index the left and right neighborhoods of the given pixel in the region of interest, \(C,D\) are the left and right neighborhood thresholds, \({\text{e}}\) is the normalized weighting coefficient, and \(\gamma_{\alpha \beta }\) and \(\eta_{\alpha \beta }\) are the horizontal and vertical neighborhoods of the given pixel [8]. For discrete noise, k-nearest-neighbor filtering is used: the topological relationship between pixels in the region of interest is established, the pixels to be processed are read in, a KD-tree is built, a reference point is chosen among the pixels, and the mean distance to its k nearest neighbors is calculated. The mean distance \(E\) is calculated as follows:

$$ E{ = }\frac{1}{\zeta }\sum\limits_{{\text{q}} \in H} {\left\| {{\text{q}} - {\text{s}}} \right\|} $$
(5)

Where \({\text{s}}\) is the given reference pixel in the region of interest, \(H\) is the set of its k nearest neighbors, and \(\zeta\) is the KD-tree nearest-neighbor coefficient. A threshold on the k-nearest-neighbor distance is set: when the mean distance \(E\) is below the threshold, the pixel is retained; when \(E\) is greater than or equal to the threshold, the pixel is judged to be a discrete noise point and marked. Finally, the image is converted to grayscale. For the three primary colors \(R,G,B\), each pixel's primary components are multiplied by the corresponding weights and the maximum is taken to obtain the gray value after graying. The gray value \(V\) is calculated as follows:

$$ V{\text{ = max}}\left( {R \times W_{R} + G \times W_{G} + B \times W_{B} } \right) $$
(6)

Among them, \(W_{R} ,W_{G} ,W_{B}\) are the weights of \(R,G,B\); formula (6) converts the image into a gray image with quantization levels 0–255. The gray image is then enhanced to highlight its key information. The piecewise linear enhancement method selects two lines with different slopes, so that different gray ranges of the image are stretched by different amounts. The transformation function \(S\) is as follows.

$$ S{ = }\int_{0}^{{\text{r}}} {P_{{\text{r}}} \left( {\text{w}} \right){\text{dw}}} $$
(7)

Where \({\text{r}}\) is the gray level of the image (0–255), \(P_{{\text{r}}}\) is the distribution function of the image's gray levels, and \({\text{w}}\) is the integration variable. After the transformation, the gray levels of the image follow a uniform probability density, which expands the distribution range of pixel values and further enhances the contrast between the target and the background region. Impulse noise and Gaussian noise are treated as the main noise in preprocessing, and a linear mean filter is selected for denoising: for each noisy pixel to be processed, a 3 × 3 window is placed over it to cover its adjacent pixels. This completes the preprocessing of the basketball dribble posture image data.
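The 3 × 3 mean filtering step can be sketched as follows. Uniform weights (e = 1/9 in the notation of Eq. (4)) and leaving border pixels unchanged are simplifying assumptions not specified in the text.

```python
def mean_filter_3x3(img):
    """3x3 neighborhood mean filter (Eq. (4) with uniform weights e = 1/9).

    img is a 2-D list of integer gray values. Border pixels are copied
    through unchanged, a simplification for this sketch.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            s = sum(img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = s // 9
    return out

# A single bright impulse in a flat region is smoothed toward its neighbors.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 190
print(mean_filter_3x3(noisy)[2][2])  # (8*10 + 190) // 9 = 30
```

The impulse value of 190 collapses to 30 after one pass, illustrating why the text applies this filter to impulse and Gaussian noise.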

2.3 Recognition of Basketball Dribble Attitude

Recognition of Coarse Features of Basketball Dribble Posture

After preprocessing, the coarse features of the basketball dribble posture image are extracted, and the edges and contours of the human posture are preliminarily recognized. The Hessian matrix is applied to the preprocessed image for scale selection. The Hessian at scale \(\sigma\) of image \({\text{h}}\left( {\text{x,y}} \right)\) is defined as:

$$ H{ = }\left[ {\begin{array}{*{20}c} {L_{{{\text{xx}}}} \left( {{\text{h,}}\sigma } \right)} & {L_{{{\text{xy}}}} \left( {{\text{h,}}\sigma } \right)} \\ {L_{{{\text{xy}}}} \left( {{\text{h,}}\sigma } \right)} & {L_{{{\text{yy}}}} \left( {{\text{h,}}\sigma } \right)} \\ \end{array} } \right] $$
(8)

Where \({\kern 1pt} {\kern 1pt} H\) is the Hessian matrix of image \({\text{h}}\left( {\text{x,y}} \right)\), \(L_{{{\text{xx}}}} \left( {{\text{h,}}\sigma } \right)\) is the convolution of the Gaussian second-order partial derivative in \({\text{x}}\) with the image at \({\text{h}}\), \(L_{{{\text{xy}}}} \left( {{\text{h,}}\sigma } \right)\) is the corresponding mixed derivative in \({\text{x}}\) and \({\text{y}}\), and \(L_{{{\text{yy}}}} \left( {{\text{h,}}\sigma } \right)\) is the corresponding derivative in \({\text{y}}\) [9, 10]. A Hessian matrix template is used to compute the convolution, giving all continuous regions of the image the same weight. The integral image at any pixel position is calculated, from which the convolution of the image at that position is obtained. Let the integral image corresponding to \(\left( {\text{x,y}} \right)\) be \(D\left( {\text{x,y}} \right)\), defined as follows:

$$ D\left( {\text{x,y}} \right){ = }\sum\limits_{{{\text{i}} = 1}}^{{\text{x}}} {\sum\limits_{{{\text{j}} = 1}}^{{\text{y}}} {I\left( {\text{i,j}} \right)} } $$
(9)

Where \(I\left( {\text{i,j}} \right)\) is the pixel value at position \(\left( {\text{i,j}} \right)\) of the image after scale selection. The scale space of the integral image is constructed by the pyramid method; the filter size changes with the construction scale so that the image size itself remains unchanged. Target feature points are searched for across the scales of the integral image. By fitting, pixels at the same pyramid scale are compared, local extreme points are obtained, the positions of the target feature points are initially located, and the spatial scale of contours and edges is determined. The SURF algorithm is then used to assign the main orientation of each target feature point. First, a circular neighborhood is delineated and the Haar wavelet responses in the horizontal and vertical directions of all image feature points within a 60° sector are accumulated. The sector is then rotated in steps of 0.2 rad and the Haar responses are accumulated again until the whole circle has been traversed, and the direction of the longest vector is taken as the main orientation of the feature point. This completes the recognition of the coarse features of the basketball dribble posture image.
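The integral image of Eq. (9) can be built in one pass with a running row sum. A minimal sketch on a 2-D list of pixel values:

```python
def integral_image(img):
    """Summed-area table per Eq. (9): D(x, y) = sum of all pixels at or
    above-left of (x, y), built in a single pass with a running row sum."""
    h, w = len(img), len(img[0])
    D = [[0] * w for _ in range(h)]
    for y in range(h):
        run = 0  # cumulative sum of the current row up to column x
        for x in range(w):
            run += img[y][x]
            D[y][x] = run + (D[y - 1][x] if y > 0 else 0)
    return D

img = [[1, 2], [3, 4]]
D = integral_image(img)
print(D[1][1])  # 1 + 2 + 3 + 4 = 10
```

Once D is built, the sum over any axis-aligned box follows from four table look-ups, which is what lets SURF evaluate its box-filter Hessian approximations in constant time per pixel regardless of filter size.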

Recognition of Fine Features of Basketball Dribble Posture

Combined with a convolutional neural network, the fine features of the basketball dribbling posture image are extracted, and the features of each layer are used to identify the human posture. Around each feature point, a square block with side length 20 s (where s is the feature scale) is selected and rotated to the main orientation of the feature point, and the target is fixed with a square mesh so that it is not affected by the size of the spectral image. According to this fixed square grid, convolution is carried out layer by layer. First, the image is weighted with a weighting function; the weighted result \(F\) is:

$$ F{ = }\int {D\left( {\text{x,y}} \right)R\left( {\text{t}} \right){\text{dt}}} $$
(10)

Where \(R\left( {\text{t}} \right)\) is the weighting function of the integral image and \({\text{t}}\) is the time interval from the current moment after training of the convolutional neural network. The integral image is input, convolved in multiple dimensions and max-pooled; a nonlinear transformation is applied to the weighted image, the contour image from coarse target recognition is divided into multiple sub-regions, the average of each sub-region is output, and the local details of the target are searched at small scale [12, 13]. Through the classification layer of the neural network, a multi-dimensional vector of local features is output and mapped into a probability space summing to 1, reducing the data dimension between layers in the connection layer of the network, and the front-end features above the search features are continually extracted. The search feature output by the first convolution layer is \(O_{{\text{z}}}\), where \({\text{z}}\) is the search feature category, and the skip connection is expressed as follows:

$$ O_{{\text{z}}} = {\kern 1pt} {\kern 1pt} I\left( {U{ + }C_{1 - \mu } } \right){ + }C_{1 - \mu } $$
(11)

In the formula, \(\mu\) is the number of layers skipped by the search feature in the neural network, \(I\) is the weighting function of the integral image, and \(C\) is the composite function of the different layers. Finally, for small-scale target features, a convolution kernel with stride 2 is used for zero-padded interpolation expansion, keeping the detail features in the connection layer the same size. The output channels of the different convolution layers are added, and the detail features extracted from each layer are used to output the fine features of the image target. The specific process is: search for the feature points corresponding to the template image, match the fine features of the target image, and recognize the basketball dribbling posture (Fig. 1).
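The skip connection of Eq. (11) adds features from an earlier layer to the output of later layers. The symbols U and C are not fully specified in the text, so the sketch below shows only the generic residual form output = f(x) + x, with a hypothetical `transform` standing in for the skipped convolution layers.

```python
def relu(v):
    """Elementwise rectified linear unit."""
    return [max(0.0, x) for x in v]

def residual_block(x, transform):
    """Skip connection in the spirit of Eq. (11): output = relu(f(x)) + x.

    `transform` is a stand-in for the convolution layers between the
    skipped layers; the identity shortcut carries the front-end features
    forward so deeper layers refine rather than re-learn them.
    """
    return [a + b for a, b in zip(relu(transform(x)), x)]

# Hypothetical transform for illustration: scale each feature by 0.5.
out = residual_block([1.0, -2.0, 3.0], lambda v: [0.5 * a for a in v])
print(out)  # [1.5, -2.0, 4.5]
```

Note how the negative input feature passes through unchanged: the transform's contribution is clipped by the ReLU, but the shortcut preserves the original value, which is the point of adding skip connections between layers.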

Fig. 1. Basketball dribble posture recognition process

Through the convolutional neural network, the feature vectors of the reference image are trained repeatedly to obtain the reference feature vector set, and the fine features extracted from the target are verified against it to realize recognition of the basketball dribbling posture. This completes the recognition of basketball dribble posture based on fine features, and with it the design of the high-precision recognition method for basketball dribble posture based on the lightweight RFID mobile authentication protocol.

3 Experiment and Analysis

The designed method is compared with two conventional high-precision recognition methods for basketball dribble posture, and the accuracy of the three methods is compared.

3.1 Preparation of the Experiment

The Microsoft .NET platform is selected, with MATLAB 2010a as the programming language, the Visual Studio integrated development environment, and Visual C++ 7.0 as the development tool; the hardware configuration is 4 GB of 64-bit LPDDR4 memory and a quad-core ARM® A57 CPU with 2 MB L2 cache and graphics card. The original sequence images of basketball dribbling posture are collected by the camera, and the normalized image size is 50 × 165 pixels. The original images in the sequence are selected as the experimental objects for the three recognition methods. The technical parameters of the camera and RFID equipment are shown in Table 1:

Table 1. Technical parameters of the sensing equipment

The captured images are shown below (Fig. 2):

Fig. 2. Image of basketball dribble

In the original images to be processed, the overall brightness of the sequence is low, and multiple regions of interest are arranged. Within these regions, little detailed information can be obtained, some areas are blurred, and there is a certain degree of color distortion. After repeated random cropping, noise disturbance, color dithering and rotation, the original image data is augmented, giving a data set of 1154 basketball dribbling posture images in total.

3.2 Distance Recognition Accuracy Test

The three methods are used to identify the three-dimensional positions of the main bone points in the basketball dribbling posture, and the distance deviations between the identified and actual positions are compared. First, the athlete's bone points are numbered as follows (Table 2):

Table 2. Number of bones in basketball posture

The spatial positions of the identified bone points are recorded, and the distance deviation between the identified and actual positions of each bone point is measured (Table 3).

Table 3. Comparison of distance deviations between skeletal points (m)

As the table shows, the distance deviation of all three methods in identifying the spatial positions of the bone points is less than 0.03 m, within the control range of the distance accuracy standard. The average distance deviation of the designed method is 0.017 m, while those of conventional methods 1 and 2 are 0.024 m and 0.028 m respectively. Compared with the two conventional methods, the designed method, using the lightweight RFID mobile authentication protocol to identify the spatial positions of the bone points, reduces the distance deviation by 0.007 m and 0.011 m respectively.

3.3 Precision Testing for Angle Recognition

The angle deviations between the bone-point positions identified by the three methods and the actual positions are measured.

Table 4. Comparison of angle deviations of bone points (°)

As Table 4 shows, the angle deviation of all three methods in identifying the spatial positions of the bone points is less than 13°, within the control range of the angle accuracy standard. The average angle deviation of the designed method is 8.07°, while those of conventional methods 1 and 2 are 10.97° and 12.18° respectively, so the designed method reduces the average angle deviation by 2.9° and 4.11°. In summary, by using the lightweight RFID mobile authentication protocol, the designed method can accurately identify the bone points of the basketball dribble posture, reduce the distance and angle deviations of the recognized spatial positions, and improve the accuracy of posture recognition.

4 Conclusion

In basketball, existing training plans depend mainly on the coach's observation and personal experience, which is inevitably subjective. Applying body area network technology to athlete training can assist coaches in decision-making and greatly improve athletes' competitive ability, and accurate recognition of basketball posture plays an important role in competition and training. In summary, the designed method gives full play to the technical advantages of the lightweight RFID mobile authentication protocol and improves the accuracy of basketball dribble posture recognition. However, this study still has shortcomings. Future research will continue to classify the bone points of human posture and consider the influence of high-priority bone points on low-priority ones, so as to further improve the accuracy of posture recognition.