1 Introduction

Target recognition involves two basic problems: the extraction and description of target features, and the recognition and matching of targets. The former is the key to and the foundation of the latter. The appearance geometry is generally the most basic feature of a target, so how to describe it effectively is a critical topic. Traditional description methods include Fourier description, projection description, and statistical description. The edge contour best reflects the shape of an object because it concentrates the main information of the object's shape features, so characterizing the target edge contour effectively and concisely is the key to describing the target shape.

People can easily identify occluded, incomplete, or ambiguous targets because they firmly grasp the typical characteristics of the target during recognition and ignore unimportant details, even when the characteristics of the target differ somewhat from their previous knowledge. People can use the fuzzy reasoning ability of the brain to obtain a satisfactory matching result. Therefore, the theory of missing element matching and fuzzy reasoning should be introduced into the matching process.

Fuzzy neural networks can be used to diagnose operating status from partial discharge detection and to analyze faults in the stator windings of high-voltage motors. Chang used a high-frequency current sensor to measure the partial discharge signal of the stator winding and then used phase-resolved partial discharge technology to convert it into a three-dimensional figure. The analysis process of the system includes using a fractal theory framework to extract fault features from the signal, identify faults, and obtain the fractal dimension and implicitness of the characteristic parameters. The extended theory of similarity function definition is used to build a fault feature database for each fault. Although he used the partial discharge energy value and fault type as the indices of the fuzzy algorithm in order to design and establish the fuzzy membership function and reasoning system for analyzing the running state of the motor, the research is not novel enough (Chang et al. 2016).

Once a reinforced concrete (RC) structure is damaged by a fire, the fire damage should be evaluated so that appropriate post-fire actions can be taken, including a decision on whether the structure can be repaired for reuse. Hae-Chang believes that since the assessment results of current fire damage diagnosis methods are highly dependent on the subjective judgment of inspectors, it is difficult to ensure their objectivity and reliability. His research aims to develop a new type of fire damage diagnosis system (FDDS) based on fuzzy neural networks, which can consider all the damage observed in the inspection of reinforced concrete members exposed to fire. Although his research can provide objective and comprehensive evaluation results, the research process lacks data (Hae-Chang et al. 2017).

Water resources are not only essential to human life but also an important prerequisite for the economic and social development of a country or city. Baohui believes that, due to long-term global overdevelopment, a water crisis has already begun to appear. The rule layer uses a fuzzy neural network to determine the weights. To verify the accuracy of the results, the Analytic Hierarchy Process (AHP) and the entropy method were used to calculate the weight of each indicator, and these weights were included in the above evaluation methods. Although he used the matter-element model to evaluate the system and Spearm, his research lacked experimental comparison (Men et al. 2017).

Mansouri believes that an appropriate task scheduling strategy is necessary to provide cost-effective execution in a cloud environment. He proposed a hybrid task scheduling algorithm, FMPSO, based on a fuzzy system and an improved particle swarm algorithm to improve load balancing and cloud throughput. The FMPSO strategy first considers four improved speed update methods and roulette selection techniques to enhance global search capabilities. Then, it uses crossover and mutation operators to overcome certain shortcomings of PSO, such as local optima. Although his scheme applies a fuzzy neural network to the fitness calculation, the calculation accuracy of the proposed FMPSO is not sufficient (Nma et al. 2019).

This research mainly discusses the pattern recognition of the take-off action of basketball players based on a fuzzy neural network system. In this study, the key points of the human body are enhanced by introducing a 3D posture fuzzy neural network. In order to visualize the depth map, it is processed in pseudo-color. The network architecture is divided into two modules, namely a 2D human body key point detection module and a 3D posture estimation network module. The 2D human body key point detection module detects the positions of the human body key points in a single color image. Each joint point is represented by a score map, and the maximum of each score map predicts where the key point lies in the image; all score maps are merged into a set that uniquely represents the 2D key points of the human body. The 3D pose estimation fuzzy neural network module takes the depth map and the coordinate set of the 2D human body key points from the previous module as input data. Through a series of 3D convolution operations, it outputs the coordinates of the 3D human body key points, which form a complete 3D human pose. The designed fuzzy neural network classifier uses the back-propagation algorithm based on gradient descent to modify its connection weights. In application, it works in two stages, namely a supervised learning stage and a stage in which unknown data are classified. It can also handle data of a fuzzy nature in classification: its output is no longer a single classification result but a set of membership degrees.

2 Pattern recognition of basketball players' take-off action

2.1 Fuzzy neural network

Fuzzy neural networks are widely used in pattern recognition, automatic control, information processing and other fields. A fuzzy neural network is similar to the parallel structure of multi-sensor information fusion (Ning et al. 2019; Yi et al. 2020). Fuzzy neural networks have inherent advantages in describing and processing uncertain events and inexact knowledge. Noisy data and an incomplete rule set reduce the classification accuracy of the algorithm, and reducing the fuzzy rule set according to the degree of support of each rule can limit the influence of noisy data and outliers to a certain extent. If the sample data is large, the following formula can be used to calculate the support of fuzzy rules to simplify the calculation process (Zhu et al. 2018). Human body pose estimation has developed rapidly, especially 2D human body pose fuzzy neural network algorithms. However, variability of appearance, joint changes, background interference and severe occlusion all affect the results of human body pose estimation. In sports, the player's movements change greatly, and 2D pose estimation algorithms are not applicable. In contrast, 3D human posture fuzzy neural network algorithms are more suitable. Most 3D human posture fuzzy neural network algorithms are based on multiple-camera systems. Although their accuracy is high, such systems are more complex and require modifying the basketball court environment (Xue et al. 2018). Assume the relative membership matrix \(S\) of the standard eigenvalues and the relative membership matrix \(B_{{hj}}\) of all levels.

$$ B_{{hj}} = u_{{hj}} \left\{ {\sum\limits_{{i = 1}}^{m} {\left[ {w_{{ij}} \left( {r_{{ij}} - s_{{ih}} } \right)} \right]^{p} } } \right\}^{{\frac{1}{p}}} + \min \sum\limits_{{h = aj}}^{{b_{j} }} {B^{2} _{{hj}} } $$
(1)

By using neural networks to extract effective signals and input them into fuzzy inference, the difficulty of the fuzzy rule process can be greatly reduced (Wei and Huang 2018; Lin et al. 2018). When the neural network is delayed, the fuzzy hierarchical algorithm is used to preprocess the input signal, and then the neural network is used to complete the fault diagnosis function, which can effectively improve the accuracy of the diagnosis result (Yang et al. 2018).

$$ F\left[ {M\left( {S,t + 1} \right)} \right] = M\left( {S,t} \right)\frac{{f\left( {S,t} \right)}}{{f\left( t \right)}}\left[ {1 - P_{c} \frac{{\delta \left( S \right)}}{{L - 1}} - O\left( S \right)P_{m} } \right] $$
(2)

If we set the connection weight between the clarification and the rule layer \(\delta _{i}^{3}\), there are:

$$ \delta _{i} ^{3} = \frac{{\partial E}}{{\partial O^{3} _{i} }} = \sum\limits_{{i = 1}}^{C} {\frac{{\partial E}}{{\partial Q_{i}^{4} }}} \times \frac{{\partial Q_{i}^{4} }}{{\partial Q_{i}^{3} }} = \sum\limits_{{i = 1}}^{C} {\frac{{\partial E}}{{\partial Q_{i}^{4} }}} \times W_{{ij}} \frac{{\sum\limits_{{k = 1}}^{\delta } {Q_{k}^{3} - Q_{j}^{3} } }}{{\sum\limits_{{k = 1}}^{\delta } {Q_{k}^{3} } }} $$
(3)

Here, \(S\) is the number of fuzzy rules generated by \(\delta _{i}^{3}\) (Wang et al. 2018; Yang et al. 2017).

$$ J_{p} \left( {z\left( t \right)} \right) = \frac{{\prod\nolimits_{{m = 1}}^{l} {M_{{pm}} \left( {z\left( t \right)} \right)} }}{{\sum\nolimits_{{p = 1}}^{r} {\prod\nolimits_{{m = 1}}^{l} {M_{{pm}} \left( {z\left( t \right)} \right)} } }} $$
(4)

\(r\) represents the number of defined fuzzy rules. By adjusting the parameter input, the probability of \(\xi _{t} \left( {i,j} \right)\) is maximized (Chen and Ma 2017).

$$ \xi _{t} \left( {i,j} \right) = P\left( {i_{t} = i,i_{{t + 1}} = j\left| {Q,\lambda } \right.} \right) = \frac{{\alpha _{t} \left( i \right)a_{{ij}} B_{j} \left( {Q_{{t + 1}} } \right)\beta _{{t + 1}} \left( j \right)}}{{P\left( {Q\left| \lambda \right.} \right)}} $$
(5)

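\(\xi _{t} \left( {i,j} \right)\) is the probability of being in state \(i\) at time \(t\) and in state \(j\) at time \(t + 1\), given the observation sequence \(Q\) and the model \(\lambda\) [15].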

$$ J = \sum\limits_{{i:yi = 1}} {\left( {C_{\mu }^{1} \left( {x_{i} } \right) - C_{\mu }^{{ - 1}} \left( {x_{i} } \right) - 1} \right)} ^{2} + \sum\limits_{{i:yi = - 1}} {\left( {C_{\mu }^{{ - 1}} \left( {y_{i} } \right) - C_{\mu }^{1} \left( {y_{i} } \right) + 1} \right)} ^{2} $$
(6)

Among them, \(C_{\mu }^{1}\) represents different categories (Li et al. 2017).
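As a concrete reading of Eq. (4) earlier in this section, the following minimal sketch computes the normalized firing strength \(J_{p}\) of each fuzzy rule as the product of its membership grades divided by the sum of those products over all \(r\) rules. The Gaussian shape of the membership functions \(M_{pm}\), the `(center, width)` parameterization and the function names are assumptions made for illustration; the text does not specify them.

```python
import math


def gaussian_membership(x, center, width):
    """Membership grade of x in a fuzzy set (Gaussian shape is an assumption)."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def normalized_firing_strengths(z, rules):
    """Return J_p(z) for every rule p, following Eq. (4).

    z     : list of l premise variables
    rules : list of r rules; each rule is a list of l (center, width) pairs,
            one membership function M_pm per premise variable
    """
    raw = []
    for rule in rules:
        strength = 1.0
        for value, (center, width) in zip(z, rule):
            strength *= gaussian_membership(value, center, width)
        raw.append(strength)
    total = sum(raw)
    if total == 0.0:
        # No rule fires at all: fall back to a uniform distribution.
        return [1.0 / len(rules)] * len(rules)
    return [s / total for s in raw]


# Two hypothetical rules over two premise variables.
example_rules = [[(0.0, 1.0), (0.0, 1.0)], [(1.0, 1.0), (1.0, 1.0)]]
example_strengths = normalized_firing_strengths([0.0, 0.0], example_rules)
```

By construction the returned values are non-negative and sum to 1, so they can be used directly as rule weights in a subsequent defuzzification step.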

2.2 Take-off feature extraction

Feature extraction is usually based on a decision rule used in classification: the extracted features should minimize the classification error under a given criterion. For this reason, it is necessary to examine the statistical relationship between the features and select an appropriate orthogonal transformation to extract the most effective ones. Feature selection also requires a classification criterion, under which the features that contribute significantly to the classification are selected and those with little contribution are deleted. Feature extraction and selection reduce not only the processing time but also the classification error.

$$ R_{S} \left( {r_{x} } \right) = \frac{{\sum\nolimits_{{i = 1}}^{{\overline{{\left| X \right|}} }} {\prod\nolimits_{{j = 1}}^{{Nf}} {\sup _{{B_{j}^{k} \in B_{j} }} \mu B_{j}^{k} \left( {x_{i} ,j} \right)} } }}{{\sum\nolimits_{{i = 1}}^{{\left| X \right|}} {\prod\nolimits_{{j = 1}}^{{Nf}} {\sup _{{B_{j}^{k} \in B_{j} }} \mu B_{j}^{k} \left( {x_{i} ,j} \right)} } }} $$
(7)

Here, \(x_{i}\) is a sample of the training data set \(X\), \(R_{S} \left( {r_{x} } \right)\) is the support of rule \(r_{x}\), and \(\overline{{\left| X \right|}}\) is the size of a subset of \(X\) (Xiao and Jian 2016). On the basis of this definition, samples whose corresponding rules cannot be found in the rule set are classified with the nearest rule. The nearest rule is essentially an approximation of the rule corresponding to the sample to be classified. It relies on an inherent property of classification problems, namely that a sample and its nearest neighbor often belong to the same class.

$$ M\left( {f_{i} ,f_{j} } \right) = \smallint \smallint p\left( {f_{i} ,f_{j} } \right)\log \frac{{p\left( {f_{i} ,f_{j} } \right)}}{{p\left( {f_{i} } \right)p\left( {f_{j} } \right)}}df_{i} df_{j} $$
(8)

Here, \(p\left( {f_{i} } \right)\) and \(p\left( {f_{j} } \right)\) are the probability distributions of the features \(f_{i}\) and \(f_{j}\), and \(p\left( {f_{i} ,f_{j} } \right)\) is their joint distribution. The FCM clustering algorithm divides the data set \(X\) into \(c\) categories, where \(\mu _{{ij}}\) indicates the degree of membership of sample \(x_{j}\) in the i-th cluster center. The objective function of the FCM algorithm is as follows (Rocha and Alvaro 2016).

$$ Minimize{\text{ }}S_{m} \left( {U,V} \right) = \sum\limits_{{i = 1}}^{c} {\sum\limits_{{j = 1}}^{n} {\mu ^{m} _{{ij}} } } \left\| {x_{j} - v_{i} } \right\|^{2} $$
(9)

Here, \(\left\| {x_{j} - v_{i} } \right\|\) is the Euclidean distance between sample \(x_{j}\) and cluster center \(v_{i}\), and \(m\) is the weighting exponent, which affects the clustering effect of FCM. Generally speaking, multi-step (long-term) time series forecasting refers to forecasting multiple data points in the future. As research problems and practical needs deepen, multi-step prediction is more valuable in theory and application than single-step prediction. At the same time, most time series models take the improvement of forecast accuracy as their only goal. In practical problems, however, the model not only needs to predict correctly but should also be transparent, so that the forecast results can be explained. Spatial information is very important in image segmentation, but the traditional algorithm does not consider the spatial correlation of pixels (Wang et al. 2016a, b; Yang et al. 2016).

$$ S_{m} = \sum\limits_{{i = 1}}^{c} {\sum\limits_{{j = 1}}^{N} {u_{{ij}}^{m} } } \left\| {x_{j} - v_{i} } \right\|^{2} + \frac{\alpha }{{N_{R} }}\sum\limits_{{i = 1}}^{c} {\sum\limits_{{j = 1}}^{N} {u_{{ij}}^{m} } } \left\| {x_{r} - v_{i} } \right\|^{2} $$
(10)

Among them, \(N\) is the number of pixels, \(x_{r}\) denotes a pixel in the neighborhood of \(x_{j}\), and \(N_{R}\) is the number of pixels contained in that neighborhood (Marasini et al. 2016).
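The FCM objective of Eq. (9) and its alternating minimization can be sketched as follows. This is the textbook FCM update (memberships from inverse squared distances, centers as membership-weighted means), not necessarily the exact variant used in this study, and it omits the spatial term of Eq. (10). The function names and the small constant `eps` are assumptions added for illustration.

```python
def squared_distance(point, center):
    """Squared Euclidean distance ||x_j - v_i||^2."""
    return sum((a - b) ** 2 for a, b in zip(point, center))


def fcm_objective(points, centers, memberships, m=2.0):
    """S_m(U, V) from Eq. (9): sum over i, j of u_ij^m * ||x_j - v_i||^2."""
    total = 0.0
    for i, center in enumerate(centers):
        for j, point in enumerate(points):
            total += (memberships[i][j] ** m) * squared_distance(point, center)
    return total


def update_memberships(points, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update; memberships of each sample sum to 1."""
    exponent = 1.0 / (m - 1.0)
    memberships = [[0.0] * len(points) for _ in centers]
    for j, point in enumerate(points):
        # eps avoids division by zero when a sample coincides with a center.
        inverse = [(1.0 / (squared_distance(point, c) + eps)) ** exponent
                   for c in centers]
        norm = sum(inverse)
        for i in range(len(centers)):
            memberships[i][j] = inverse[i] / norm
    return memberships


def update_centers(points, memberships, m=2.0):
    """Each center is the mean of the samples weighted by u_ij^m."""
    dim = len(points[0])
    centers = []
    for row in memberships:
        weights = [u ** m for u in row]
        total = sum(weights)
        centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(dim)])
    return centers
```

Calling `update_memberships` and `update_centers` in turn until the value of `fcm_objective` stops decreasing gives the usual FCM iteration; the exponent `m` controls how soft the resulting partition is.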

2.3 Take-off gesture recognition

Human action recognition is essentially a classification process: some action images are used as training samples, and the human action images to be recognized are then classified by a recognition algorithm. The image to be recognized is identified according to how similar it is to the training samples, by comparing the extracted human action features with prior knowledge. Based on this feature information, behavior patterns can be classified. The basketball take-off action is shown in Fig. 1.

Fig. 1 Basketball take-off action (http://alturl.com/wnamn)

A neural network is a matrix of interconnected processor nodes. Each node is a neuron, a simple simulation of a nerve cell in the human brain. Each neuron accepts several inputs, multiplies each by a weight factor, and adds the weighted inputs together to produce an output. Neurons can be arranged hierarchically: the first layer accepts the basic input and passes its output to the second layer, which has its own weights and algebraic sum, and so on until the output of the last layer. The fuzzy membership function of each pixel of the matrix is independent (Jiang and Wang 2016).

$$ \min S\left( {U,V} \right) = \min \left\{ {\sum\limits_{{i = 1}}^{n} {\sum\limits_{{k = 1}}^{c} {B^{m} _{{ik}} d_{{ik}}^{2} } } } \right\} = \sum\limits_{{i = 1}}^{n} {\min \left\{ {\sum\limits_{{k = 1}}^{c} {B_{{ik}}^{m} d^{2} _{{ik}} } } \right\}} $$
(11)

The constraint condition of the above extreme value is \(\sum\limits_{{k = 1}}^{c} {B_{{ik}} } = 1\). In the maximum method, the maximum of the three color components is taken as the pixel value of the picture (Kazmiruk et al. 2018; Wang et al. 2016a, b).

$$ Y(x,y) = \max \left\{ {R\left( {x,y} \right),G\left( {x,y} \right),B\left( {x,y} \right)} \right\} $$
(12)

Converting a color image to the gray image \(Y\left( {x,y} \right)\) requires combining its color components \(R\left( {x,y} \right)\), \(G\left( {x,y} \right)\) and \(B\left( {x,y} \right)\). With the maximum method, the largest of the three components at each pixel is used as the gray value. With the average method, the gray value \(N\left( {x,y} \right)\) is the mean of the three components, and the resulting image is relatively uniform (Xu and Peng 2020).

$$ N(x,y) = \left( {R\left( {x,y} \right) + G\left( {x,y} \right) + B\left( {x,y} \right)} \right)/3 $$
(13)

In the weighted average method, the three components have different weights (Liang et al. 2019; Sun et al. 2018).

$$ M(x,y) = 0.3R\left( {x,y} \right) + 0.59G\left( {x,y} \right) + 0.11B\left( {x,y} \right) $$
(14)

$$ P_{{Ro}} = I^{2} Ro = \left( {\frac{{V_{{in}} }}{{Ri + Ro}}} \right)^{2} \times Ro $$
(15)

The output power \(P_{{Ro}}\) of the system reaches its maximum when the external load \(Ro\) is equal to the internal resistance \(Ri\) (Zhou et al. 2018).
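The three grayscale conversions of Eqs. (12) to (14) can be sketched as below. The image is represented as nested lists of `(R, G, B)` tuples purely for illustration, and the function names are hypothetical. The weights of the weighted method are passed as a parameter; the default `(0.3, 0.59, 0.11)` is the commonly used luminance weighting, which sums to 1, and is an assumption here.

```python
def gray_max(r, g, b):
    """Maximum method, Eq. (12): the largest color component."""
    return max(r, g, b)


def gray_average(r, g, b):
    """Average method, Eq. (13): the mean of the three components."""
    return (r + g + b) / 3


def gray_weighted(r, g, b, weights=(0.3, 0.59, 0.11)):
    """Weighted average method, Eq. (14); default weights are an assumption."""
    wr, wg, wb = weights
    return wr * r + wg * g + wb * b


def to_gray(image, method):
    """Apply one of the conversions to every (R, G, B) pixel of an image."""
    return [[method(*pixel) for pixel in row] for row in image]


# A hypothetical 1 x 2 image converted with the maximum method.
example_gray = to_gray([[(10, 200, 30), (1, 2, 3)]], gray_max)
```

Keeping the weights summed to 1 ensures that a neutral gray pixel keeps its value after conversion and that the result stays within the original intensity range.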

3 Basketball player take-off action pattern recognition experiment

3.1 Overall architecture of fuzzy neural network system

The biggest problem faced by the 3D human pose fuzzy neural network is how to use three-dimensional spatial information to make up for the shortcomings of the 2D human pose fuzzy neural network, and how to train the network structure on a 3D human pose data set. At present, most researchers use semi-supervised training, that is, they introduce a 2D pose data set to constrain the 3D human pose fuzzy neural network and improve the accuracy of the 3D network model.

In this paper, the key points of the human body are enhanced by introducing a 3D pose fuzzy neural network. In order to visualize the depth map, it is processed in pseudo-color. The network architecture is divided into two modules, namely a 2D human body key point detection module and a 3D posture estimation network module. The 2D human body key point detection module detects the positions of the human body key points in a single color image. Each joint point is represented by a score map S2n, and the maximum of each score map predicts where the key point lies in the image. All the score maps are merged into a set that uniquely represents the 2D key points of the human body. The 3D pose estimation fuzzy neural network module takes the depth map and the coordinate set of the 2D human body key points from the previous module as input data. Through a series of 3D convolution operations, it outputs the coordinates of the 3D human body key points, which form a complete 3D human pose. The 3D posture fuzzy neural network system is shown in Fig. 2.

Fig. 2 3D attitude fuzzy neural network system
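The step from score maps to 2D key points described in this section can be sketched as follows, under the assumption that each joint's key point is read off as the location of the maximum of its score map. The nested-list representation, the function name and the `threshold` parameter are illustrative choices that do not come from the paper.

```python
def keypoints_from_score_maps(score_maps, threshold=0.1):
    """Pick one (x, y, score) per joint from its score map.

    score_maps : list of 2D score maps (rows of floats), one per joint
    threshold  : hypothetical minimum score below which a joint is
                 reported as not detected (None)
    """
    keypoints = []
    for score_map in score_maps:
        best_score, best_xy = float("-inf"), None
        for y, row in enumerate(score_map):
            for x, score in enumerate(row):
                if score > best_score:
                    best_score, best_xy = score, (x, y)
        if best_xy is None or best_score < threshold:
            keypoints.append(None)
        else:
            keypoints.append((best_xy[0], best_xy[1], best_score))
    return keypoints


# Two hypothetical 2 x 2 score maps: the second joint is too weak to report.
example_maps = [[[0.0, 0.2], [0.9, 0.1]], [[0.0, 0.0], [0.0, 0.05]]]
example_keypoints = keypoints_from_score_maps(example_maps)
```

The resulting list has one entry per joint, which is the kind of coordinate set that the 3D pose estimation module takes as input together with the depth map.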

3.2 2D overall network architecture

For 2D human body key point detection, several network structures such as CPM and RMPE are available. A suitable network must be selected according to detection efficiency and result accuracy. The 2D human body key point detection results are used as the input of the 3D posture fuzzy neural network module, so the running time of 2D human key point detection should be minimized to ensure the operating efficiency of the 3D module, while the accuracy of the 2D detection results should be as high as possible. This article uses the CPM + PAFs network structure to detect the 2D key points of the human body. A bottom-up part affinity fields (PAFs) association score is used, where PAFs are a set of 2D vector fields in the image domain that encode the position and direction of the limbs. The bottom-up detection representation and the association encoded from global context are used, and the fuzzy algorithm is applied to reduce the computational overhead and improve the detection of key points. The 2D human body key point detection module is built on a cascaded PAF structure, which predicts the confidence maps of body parts and the affinity fields that encode part-to-part association. The network is divided into two branches: branch 1 predicts the confidence maps and branch 2 predicts the affinity fields. Both branches follow an iterative prediction structure, and intermediate supervision is applied at each stage. The picture is first analyzed by a fuzzy neural network to generate a set of feature maps \(F\), which is input into each stage. The first stage produces a set of detection confidence maps \(S^{1} = \rho ^{1} \left( F \right)\) and a set of part affinity fields \(L^{1} = \phi ^{1} \left( F \right)\); \(S^{1}\), \(L^{1}\) and \(F\) are then used for the prediction of the next stage, and so on until stage t. After the detection confidence maps and part affinity fields are obtained, bipartite matching is used to associate the related body parts, and the body posture of each person in the picture is finally output. The network preserves both low-level and high-level features, improving runtime performance and accuracy.
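The association step can be illustrated with a minimal sketch. It scores a candidate limb by sampling the affinity field along the segment between two candidate key points and averaging its projection onto the limb direction, which is how part affinity fields are commonly used. The field is modeled as a plain callable, and the greedy one-to-one assignment is a simplification of bipartite matching; both, along with all function names and parameters, are assumptions for illustration.

```python
import math


def paf_association_score(paf, start, end, num_samples=10):
    """Average alignment between the field and the segment start -> end.

    paf : callable (x, y) -> (vx, vy), a stand-in for one affinity field
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0.0 or num_samples < 2:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector of the candidate limb
    total = 0.0
    for k in range(num_samples):
        t = k / (num_samples - 1)
        vx, vy = paf(start[0] + t * dx, start[1] + t * dy)
        total += vx * ux + vy * uy
    return total / num_samples


def greedy_match(starts, ends, paf, min_score=0.0):
    """Greedy one-to-one pairing by descending score (a simplification)."""
    scored = sorted(
        ((paf_association_score(paf, s, e), i, j)
         for i, s in enumerate(starts) for j, e in enumerate(ends)),
        reverse=True,
    )
    used_starts, used_ends, pairs = set(), set(), []
    for score, i, j in scored:
        if score <= min_score or i in used_starts or j in used_ends:
            continue
        used_starts.add(i)
        used_ends.add(j)
        pairs.append((i, j, score))
    return pairs
```

A candidate pair whose connecting segment runs along the field scores close to 1, while a pair that runs across or against it scores near 0 or below, so wrong associations between different people are penalized.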

3.3 3D attitude estimation module

The 3D pose estimation module receives the score maps of the 2D human body key point detection. This module uses the 2D human body key point detection results and the voxel coordinates in the depth map to calculate the 3D coordinates of each joint of the human body. The 3D pose estimation network structure is similar to the encoder-decoder architecture of the U-Net network, and it uses multiple intermediate loss values while decoding to the full-resolution score map.

When calculating the voxel grid, the depth map is first converted into a point cloud, that is, the depth values are projected into the world coordinate system. The part of the point cloud that falls outside the voxel grid is discarded, and the remaining points are quantized into intervals of the voxel size. When at least one point of the point cloud lies in a given interval, the corresponding voxel is set to 1; otherwise it is set to 0.
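The voxelization described above can be sketched as follows. The pinhole back-projection with intrinsics `fx`, `fy`, `cx`, `cy`, the grid center, and the grid and voxel sizes are all assumptions chosen for illustration; the paper does not state the camera model or the grid dimensions.

```python
import math


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map into 3D points with a pinhole camera model.

    depth : rows of depth values; values <= 0 are treated as missing
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def voxelize(points, center, grid_size=4, voxel_size=0.5):
    """Binary occupancy grid centered on `center`.

    A voxel is 1 if at least one point falls inside it, otherwise 0.
    Points outside the grid are discarded.
    """
    half = grid_size * voxel_size / 2.0
    grid = [[[0] * grid_size for _ in range(grid_size)]
            for _ in range(grid_size)]
    for point in points:
        index = [math.floor((value - (c - half)) / voxel_size)
                 for value, c in zip(point, center)]
        if all(0 <= k < grid_size for k in index):
            grid[index[0]][index[1]][index[2]] = 1
    return grid
```

In practice the grid would be centered on a reference point of the body, for example a detected key point, so that the person stays inside the grid; that choice is likewise an assumption of this sketch.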

3.4 Design of fuzzy neural network classifier

The traditional multi-layer perceptron cannot process input in semantic form (such as high, medium, low). In addition, it is generally only suitable when the input data is ideal. For example, for a classifier, a given sample can belong to only one category and not to several categories at the same time. In real life, however, there are often ill-conditioned data, or a sample can belong to several categories. Fuzzy neural networks, which combine fuzzy concepts with neural networks, can deal with these problems. The fuzzy network classifier introduced here incorporates fuzzy concepts in each layer of the network. Its input is a degree of membership with semantic properties, and its output is also the degree of membership of a fuzzy set. It uses the back-propagation algorithm based on gradient descent to modify its connection weights. In application, it works in two stages, namely a supervised learning stage and a stage in which unknown data are classified. Since its input and output are membership degrees, it can handle semantic input. It can also handle data of a fuzzy nature in classification: its output is no longer a single classification result but a set of membership degrees.
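The idea of membership degrees as both input and output can be illustrated with a small sketch. A crisp value in [0, 1] is mapped to memberships in the linguistic sets low, medium and high, and a single layer with sigmoid outputs returns one membership degree per class, so a sample may belong to several classes at once. The piecewise-linear membership functions, the single-layer structure and the example weights are assumptions for illustration, not the classifier designed in this study.

```python
import math


def fuzzify(x):
    """Membership of a value in [0, 1] in the sets low, medium and high."""
    x = min(1.0, max(0.0, x))
    return {
        "low": max(0.0, 1.0 - 2.0 * x),
        "medium": 1.0 - abs(2.0 * x - 1.0),
        "high": max(0.0, 2.0 * x - 1.0),
    }


def class_memberships(features, weights, biases):
    """One membership degree in (0, 1) per class via independent sigmoids."""
    outputs = []
    for row, bias in zip(weights, biases):
        activation = sum(w * f for w, f in zip(row, features)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-activation)))
    return outputs


# Hypothetical usage: one fuzzified input feeding a two-class output layer.
example_input = list(fuzzify(0.25).values())
example_output = class_memberships(
    example_input, [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]], [0.0, 0.0])
```

Because each output is computed independently, the degrees need not sum to 1, which is what allows one sample to have a high membership in more than one class.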

3.5 Data set selection and training

MPII data set: the MPII human body posture data set is a 2D human body posture annotation data set. It mainly records the human body posture in daily activities, covering 410 daily activities. It contains about 25 K pictures and about 40 K people with annotated body joints. Each picture is provided with an activity category label, and 16 joint points are used to record the human body posture. The image data comes from YouTube videos. The annotations of the human body posture in the test set are more comprehensive, especially for occluded body joints, 3D body joints and head orientation.

COCO key point data set: it consists of a training set, a validation set and a test set, with a total of about 200 K pictures and 250 K human instances with key points. Among them, 150 K instances of the training set and validation set are available. The total number of videos in the data set is 13,320, of which 9537 are training samples and 3783 are test samples. In terms of data preprocessing, in order to increase the training samples, each video sequence is divided into non-overlapping clips of 16 frames with a picture size of 171 × 128, giving a total of 148,856 video clips. In terms of data enhancement, the training pictures are randomly flipped and randomly cropped to a fixed size of 112 × 112, while the test pictures are only cropped at a fixed position. In terms of training and testing, because the training set is large, batch training is adopted with the batch size set to 16, and z-score standardization is applied to each batch so that the processed data has a mean of 0 and a standard deviation of 1. At the same time, batch normalization (BN) is applied to the middle layers of the deep network, which accelerates network training, removes the need for dropout, and further improves the classification accuracy of the model. Stochastic gradient descent (SGD) is used with momentum set to 0.9. The initial learning rate is 0.005, the learning rate is divided by 10 every 4 iterations, and training stops after 16 iterations. The experimental platform is Ubuntu 16.04 LTS + TensorFlow 1.9 + Keras 2.2.
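The training settings above can be made concrete with a short sketch in plain Python, independent of any deep learning framework. It shows z-score standardization of a batch, a step learning-rate schedule, and one SGD update with momentum. Reading the schedule as "divide by 10 every 4 iterations" is an interpretation of the text, and the function names are hypothetical.

```python
import math


def z_score(batch):
    """Standardize a batch of values to mean 0 and standard deviation 1."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    std = math.sqrt(variance) or 1.0  # constant batch: avoid dividing by 0
    return [(x - mean) / std for x in batch]


def step_learning_rate(iteration, initial=0.005, drop_every=4, factor=10.0):
    """Initial rate 0.005, divided by `factor` every `drop_every` iterations."""
    return initial / (factor ** (iteration // drop_every))


def sgd_momentum_step(weight, velocity, gradient, learning_rate, momentum=0.9):
    """One SGD update with momentum; returns the new weight and velocity."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


# Learning rates over the 16 training iterations mentioned in the text.
example_schedule = [step_learning_rate(i) for i in range(16)]
```

Under this reading, the rate is 0.005 for iterations 0 to 3, then drops by a factor of 10 at iterations 4, 8 and 12 before training stops.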

Finally, in the training process of the optimized model, first use the COCO data set for training, and then use the collected basketball player motion images for further training. Training with the COCO data set first can ensure the accuracy of the overall estimation of the optimized model. After training on the COCO data set, use the collected basketball player motion images to train again to ensure the accuracy of the optimized model's estimation of the basketball player's body posture.

4 Results and discussion

4.1 2D detection and analysis of key points of human jump

First, the 2D human body key point detection algorithms are compared. The model in this paper is trained on the MPII and COCO data sets. With the COCO key point output format, 18 human key points can be detected. For take-off motion estimation, the shooting action in the NTU RGB+D 120 data set is used for evaluation, and the color pictures and the 2D coordinates of the human body key points provided by the data set are also used in the 2D human body key point detection and evaluation. Taking the performer in the shooting action as an example, three pictures are selected for the experiment, and 14 of the human key points are selected for visualization. The second picture from the left in the figure shows the 2D human key points marked in the data set, and the third from the left shows the 2D human key points detected using the model in this paper. As can be seen from the figure, the model in this paper detects the 2D human key points well. The key point detection of the MPII data set is shown in Fig. 3. The key point detection of 2D shooting in this study is shown in Fig. 4. The number of test frames for each video is shown in Table 1.

Fig. 3 Key point detection of MPII data set (http://alturl.com/rr28j)

Fig. 4 The key point detection of 2D shooting in this study

Table 1 Number of test frames for each video

When recognizing a complex target, people usually combine prior knowledge with fuzzy reasoning ability, so it is necessary to introduce the concept of fuzzy reasoning into the matching criterion. On the other hand, when people recognize an occluded complex target, they only need to identify its most important features. Therefore, when matching, the number and priority of the symbol feature sequence subsets that need to be matched should be determined according to the requirements, and the subsets should be weighted according to their feature priority. The AP and the detection time of the 2D human body key points are calculated, where mAP represents the average AP over the three pictures, and the 1-stage, 3-stage and 4-stage variants of the 2D human key point detection method are used in the experiments. It can be seen from the table that the 3-stage method performed well. It reached a mAP of 48.2, surpassing the mAP of the 1-stage method; its processing time for a single picture was 0.1 s longer than that of the 1-stage method and 0.3 s shorter than that of the 4-stage method. Taking speed and accuracy together, the 3-stage method is the better choice, and it is used as the 2D human body key point detector in this paper. The comparison results of the multi-stage methods are shown in Table 2, and the analysis of the multi-stage methods is shown in Fig. 5.

Fig. 5 Multi-stage method analysis

Table 2 Comparison results of multi-stage methods

4.2 Fuzzy neural network recognition effect

In the experiment, the tester completes a full shooting action, which is decomposed into multiple action frames. The average Euclidean distance over all key points is calculated in each action frame, and the coordinates are aligned at the hip key point before the calculation. Because the method in this paper uses the COCO key point coordinate format, which has no hip-center key point, the average of the coordinates of the right hip and the left hip is used as the hip key point. Using the 3D skeleton marked in the data set as the ground-truth key point coordinates, the MPJPE (average joint point coordinate error) between the 3D skeleton coordinates estimated by the method in this paper and the ground-truth 3D skeleton coordinates is calculated. The statistical results are shown in Table 3. It can be seen from Table 3 that the 3D key point coordinate error obtained by the method in this paper is relatively small and that the estimate is close to the visualized shooting action in the picture, indicating that the method in this paper achieves good results in the estimation of the jump action. The 3D skeleton coordinate take-off action analysis is shown in Fig. 6.

Table 3 MPJPE (average error of joint point coordinates)

Fig. 6 3D skeleton coordinate take-off action analysis
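The MPJPE evaluation with hip alignment can be sketched as follows. Each pose is a list of `(x, y, z)` joints, the hip key point is taken as the midpoint of the left and right hips, and both poses are translated so that this point lies at the origin before the per-joint Euclidean distances are averaged. The joint indices are passed explicitly because they depend on the key point format; the function names are hypothetical.

```python
import math


def hip_center(pose, left_hip, right_hip):
    """Midpoint of the left and right hip joints."""
    return tuple((a + b) / 2.0 for a, b in zip(pose[left_hip], pose[right_hip]))


def align_to_hip(pose, left_hip, right_hip):
    """Translate the pose so that the hip midpoint is at the origin."""
    root = hip_center(pose, left_hip, right_hip)
    return [tuple(c - r for c, r in zip(joint, root)) for joint in pose]


def mpjpe(predicted, ground_truth, left_hip, right_hip):
    """Mean Euclidean distance between corresponding joints after alignment."""
    pred = align_to_hip(predicted, left_hip, right_hip)
    truth = align_to_hip(ground_truth, left_hip, right_hip)
    distances = [
        math.sqrt(sum((p - t) ** 2 for p, t in zip(joint_p, joint_t)))
        for joint_p, joint_t in zip(pred, truth)
    ]
    return sum(distances) / len(distances)


# Hypothetical 3-joint poses: joints 0 and 1 play the role of the two hips.
example_truth = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
example_pred = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 3.0)]
example_error = mpjpe(example_pred, example_truth, left_hip=0, right_hip=1)
```

Because of the alignment, a prediction that differs from the ground truth only by a global translation has an error of zero, so the metric measures the pose itself rather than the position of the person in space.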

The fuzzy algorithm is used to calculate the measured values of the human posture in the two test actions, and the results are recorded in Table 4. The higher the measured value of the human body posture, the more similar the action is to the standard action. According to the results in the table, the measured value of test action 2 is higher than that of test action 1, so test action 2 is the more similar to the standard action. This is consistent with what is displayed in Fig. 7, which shows the rationality of the chosen distance measurement method. The analysis of the two sets of test actions is shown in Fig. 7.

Table 4 Measured values of human posture in two test actions

Fig. 7 Analysis of the two sets of test actions

4.3 Skeleton extraction analysis of take-off action

The three-dimensional key point coordinates labeled with KinectSDK in the NTU RGB+D 120 data set are used as a reference, and the performer's actions in the A63 shooting action of that data set are used. In each picture, the second and third images from the left are the 3D human body key points marked by KinectSDK in the data set and the results estimated by this method, respectively, displayed from the same viewing angle. 14 of the 25 key points obtained by Kinect are selected for visualization, and the same 14 key points are selected from the 18 key points obtained by this method. As shown in Figure a, there is a certain difference between the two at the elbow and leg key points, and the joint lengths differ. From the comparison of the two methods, the method proposed in this paper estimates the position of the corresponding key points well. The detection accuracy of the right ankle was 93.47%, and the detection accuracy of the right knee was 88.99%. The binary processing of the take-off action is shown in Fig. 8. The skeleton extraction of the take-off action is shown in Fig. 9. The result analysis of this research and optimization is shown in Fig. 10. The estimated accuracy of each bone point is shown in Table 5.

Fig. 8 Binarization of take-off action

Fig. 9 Skeleton extraction of take-off action

Fig. 10 Analysis of the results of this research and optimization

Table 5 Estimated accuracy of each bone point

5 Conclusion

This research mainly discusses the pattern recognition of the take-off action of basketball players based on a fuzzy neural network system. In this study, the key points of the human body are enhanced by introducing a 3D posture fuzzy neural network. In order to visualize the depth map, it is processed in pseudo-color. The network architecture is divided into two modules, namely a 2D human body key point detection module and a 3D posture estimation network module. The 2D human body key point detection module detects the positions of the human body key points in a single color image. Each joint point is represented by a score map, and the maximum of each score map predicts where the key point lies in the image; all the score maps are merged into a set that uniquely represents the 2D key points of the human body. The 3D pose estimation fuzzy neural network module takes the depth map and the coordinate set of the 2D human body key points from the previous module as input data. Through a series of 3D convolution operations, it outputs the coordinates of the 3D human body key points, which form a complete 3D human pose. The designed fuzzy neural network classifier uses the back-propagation algorithm based on gradient descent to modify its connection weights. In application, it works in two stages, namely a supervised learning stage and a stage in which unknown data are classified. It can also handle data of a fuzzy nature in classification: its output is no longer a single classification result but a set of membership degrees. The fuzzy neural network system designed in this research has a good effect on pattern recognition of take-off actions.