
1 Introduction

Statistical posture analysis needs a large volume of varied, high-precision human posture data. However, we found that too little posture data is available, because acquiring it is complicated. There are two major methods for acquiring whole-body posture data as a time series: video motion analysis (manual marking) and motion capture analysis. Video motion analysis is very laborious, because the video must be divided into still images and each pose image annotated by hand. Motion capture requires so many sensors to be attached to the body and limbs that data acquisition becomes complicated. At the CVPR 2017 conference, Carnegie Mellon University (CMU) presented OpenPose [1] (see Footnote 1), which can recognize the skeletons of multiple people in real time using a simple web camera, as shown in Fig. 1. OpenPose uses a bottom-up approach based on deep learning: it detects body parts and then associates them with individual people by means of PAFs (Part Affinity Fields) [2]. As a result, in the videos generated by OpenPose the skeleton marks overlap well with the figures of the people, and the recognition accuracy appears to be very high for various people in various environments.

Although the OpenPose paper [2] reports state-of-the-art results in the COCO 2016 keypoints challenge [3], we decided to evaluate the performance of OpenPose ourselves in two respects. First, we evaluated the correspondence between actual body positions and the output data generated by OpenPose. From the column “actual/openPose rate” in Fig. 2, we found that the CV (coefficient of variation: std/mean) of the ratio between the actual distance from the neck and the distance measured by OpenPose was 0.08. Some measurement error is unavoidable when measuring the actual distance from the neck to each body part, because, for example, the neck itself spans several centimeters. We therefore consider this CV to indicate acceptable accuracy and conclude that OpenPose recognizes body points well in a static pose.
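As an illustration of the CV used above, the following minimal sketch computes std/mean for a list of ratios. The ratio values are hypothetical placeholders, not the measurements of Fig. 2, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """Return std/mean, using the sample standard deviation (an assumption)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical actual/OpenPose distance ratios, one per body point.
ratios = [0.52, 0.48, 0.55, 0.50, 0.46, 0.51]
print(f"CV = {coefficient_of_variation(ratios):.3f}")
```

A small CV means that one scale factor converts OpenPose distances to actual distances almost equally well for every body point.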

However, the accuracy of OpenPose’s skeletal recognition in dynamic motion is not clear. To evaluate OpenPose’s performance in dynamic motion, we next built a basketball shooting prediction model using the real-time skeletal data generated by OpenPose. The “OpenPose’s performance” that we aim to clarify covers both the accuracy of dynamic recognition and the usefulness of OpenPose as a dynamic data generator, as measured by the accuracy of our shooting prediction model.

As a result, the free throw prediction model achieved sufficiently significant accuracy. We therefore conclude that OpenPose is a convenient and practical generator of posture data.

The rest of the paper is organized as follows. In Sect. 2, we briefly review previous dynamic posture analysis models. In Sect. 3, we describe our experimental methods and the selection of the prediction model. In Sect. 4, we show the experimental results and estimate the accuracy of the prediction. In Sect. 5, we conclude the paper. In Sect. 6, we present future work.

Fig. 1. OpenPose

Fig. 2. Comparison of the distance from the neck between OpenPose and actual body positions. Column “Distance from no. 1” is the distance from the neck to each point as measured by OpenPose. Column “actual/openPose rate” is the actual distance divided by “Distance from no. 1”.

2 Previous Research for Posture Analysis

Previous analyses of sports motion have used video motion analysis, in which the video is divided into still images and points are marked manually on each pose image for annotation [4]. Motion capture samples posture data directly from sensors attached to the body and limbs [5]. However, collecting data with these methods is so expensive that statistical models such as regression could not be applied. Microsoft Kinect, which is no longer sold as of early 2018, can easily capture 3D posture data, but its sensing range is very narrow and its recognition accuracy for skeletal points is somewhat lower [6].

On the other hand, as a statistical approach, the prediction of basketball game winners has been modeled by logistic regression using game records that include the win/loss results and the frequencies of shots and steals [7].

As a time-series analysis of motion, there is research that transfers abstracted motions from a human to a robot using hidden variables estimated by a hidden Markov model and, in reverse, predicts the next action of the robot from the estimated hidden variables [8]. However, our prediction for basketball free throws is not a general time-series model that predicts the next action from the preceding time series of motions, because it predicts an outcome (whether the ball goes into the basket or not) rather than an action. In addition, the number of records in this experiment was limited to 51. We therefore adopted logistic regression [9], using features composed of the positions at the start and the end of the throw, their differences, and the velocity and acceleration between the start and end positions, all extracted from the time series of the free throw motion.

3 Method

3.1 Subject of Experiment

To construct the shooting prediction model for basketball free throws, we recorded free throw motions with a full high-definition (Full HD) video camera. As the subject of the experiment, we used 51 records obtained from two or three trials each by 23 people of various skill levels from a high-school basketball team and some members of an exchange-student basketball club. We generated their skeletal data from the recordings with OpenPose. In this experiment, the free throw succeeded in 20 of the 51 records.

3.2 Output of OpenPose

The version of OpenPose (see Footnote 1) used in this paper performs 2-dimensional skeleton recognition. The skeletal coordinates of 18 points (COCO keypoints [2]), shown in Fig. 3, are output to files at about 10 to 20 frames per second, depending on the performance of the computer (see Footnote 2), and the frames are connected as shown in Fig. 4 to form time-series data. Each skeletal coordinate consists of 3 values: x (horizontal), y (vertical) and p (confidence probability). We ignored low-confidence coordinates whose confidence probability was less than 0.7.
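The filtering and frame connection described above can be sketched as follows. This is a minimal illustration, not the code used in the experiment: it assumes that each frame is already available as a flat list [x0, y0, p0, x1, y1, p1, ...] of 18 triples, and the function names are hypothetical. Low-confidence points are replaced by NaN so that later steps can skip them.

```python
import numpy as np

N_POINTS = 18         # COCO keypoints per person
MIN_CONFIDENCE = 0.7  # threshold stated in the text


def frame_to_points(flat_keypoints, min_confidence=MIN_CONFIDENCE):
    """Convert a flat [x, y, p, ...] list into an (18, 2) array of x, y.

    Points whose confidence p is below min_confidence become NaN.
    The flat layout is an assumption made for this sketch.
    """
    kp = np.asarray(flat_keypoints, dtype=float).reshape(N_POINTS, 3)
    xy = kp[:, :2].copy()
    xy[kp[:, 2] < min_confidence] = np.nan
    return xy


def frames_to_series(frames):
    """Connect per-frame arrays into a (T, 18, 2) time series."""
    return np.stack([frame_to_points(f) for f in frames])
```

Keeping NaN placeholders instead of dropping points preserves the fixed (T, 18, 2) shape, so that a given index always refers to the same body point.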

Fig. 3. Skeleton points of OpenPose (cited from [1])

Fig. 4. Connection of frames

Although we used only skeletal data for the shooting prediction model, OpenPose can also recognize hands and faces, as shown in Fig. 5, and outputs each kind of recognized data to a separate file.

Fig. 5. Recognized finger and face points of OpenPose (cited from [1])

3.3 Statistical Model

The shooting prediction model is a binary prediction of whether the ball enters the basket or not. Major binary prediction models include logistic regression, SVM [10] and XGBoost [11]. An SVM using the kernel method is a nonlinear model that may achieve high accuracy, but it does not directly yield the shooting probability, because it maps the data space to a higher-dimensional space. XGBoost, which uses gradient boosting, has a high reputation for accuracy and robustness, but it is not adequate for a diagnostic model because it cannot explicitly indicate the degree of importance of the explanatory variables. We therefore adopted a logistic regression model, which is easy to interpret and commonly used.

The probability given by logistic regression is expressed as follows using \(\alpha \), \(\beta \) and the features. The relationship between the value \(\mathcal {Z}\) and the probability is shown in Fig. 6. The regression intercept \(\alpha \) and coefficients \(\beta \) are estimated by multivariate binary logistic regression using maximum likelihood. The partial derivatives of the log-likelihood are set to zero and solved by the Newton-Raphson method [9]; because the log-likelihood is concave, this optimization has no local optima other than the global one.

$$\mathcal {Z} = \alpha + \sum ^{n}_{i=1} \beta _{i} \cdot feature_{i} \tag{1}$$
$$probability = \frac{1}{1+\exp (- \mathcal {Z})} \tag{2}$$
Fig. 6. Logistic regression
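To make the estimation procedure of this section concrete, the following is a minimal sketch of maximum-likelihood logistic regression solved by Newton-Raphson iterations. It is an illustration under stated assumptions, not the implementation used in the experiment: the function names, the iteration limit, and the tiny ridge term added to the Hessian for numerical stability are all assumptions of this sketch.

```python
import numpy as np


def sigmoid(z):
    """Eq. (2): map Z to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_newton(features, target, n_iter=25, ridge=1e-8):
    """Estimate alpha and beta by Newton-Raphson on the log-likelihood."""
    X = np.column_stack([np.ones(len(features)), features])  # intercept column
    y = np.asarray(target, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        gradient = X.T @ (y - p)                      # first derivatives
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        hessian += ridge * np.eye(X.shape[1])         # assumption: stabilizer
        step = np.linalg.solve(hessian, gradient)
        w += step
        if np.max(np.abs(step)) < 1e-10:              # converged
            break
    return w[0], w[1:]


def predict_probability(alpha, beta, features):
    """Eqs. (1) and (2): Z = alpha + sum(beta_i * feature_i)."""
    return sigmoid(alpha + np.asarray(features, dtype=float) @ beta)
```

Note that when the successful and failed throws can be separated perfectly by the features, the maximum-likelihood coefficients diverge, so a small data set such as the 51 records here should be checked for this case.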

3.4 Variables of Logistic Regression

The explanatory variables of the logistic regression model are features calculated from the time-series data, including the positions of skeletal points, moving speed, acceleration, etc. The target variable is whether the ball enters the basket or not. In the time-series data, as shown in Fig. 7, every player bent the knees at the start of the throw and lifted the hands to the highest level upon completing it. We therefore defined the interval of a free throw as the period from the time when the knees were bent the most to the time when the hands were lifted to their highest point. All coordinate positions were expressed relative to the neck point.

Fig. 7. Posture of start and end
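The interval definition and the neck-relative features of this section can be sketched as follows. Several details are assumptions of this sketch rather than statements of the paper: “knees bent the most” is approximated by the smallest right-knee angle, “hands highest” by the smallest wrist y value (image y is assumed to grow downward), and acceleration by the change in frame-to-frame velocity over the interval. The keypoint indices follow the COCO 18-point ordering, in which the neck is point no. 1.

```python
import numpy as np

# COCO 18-point indices used in this sketch
NECK, R_WRIST, L_WRIST = 1, 4, 7
R_HIP, R_KNEE, R_ANKLE = 8, 9, 10


def knee_angle(series):
    """Right-knee angle in radians per frame, from a (T, 18, 2) array."""
    thigh = series[:, R_HIP] - series[:, R_KNEE]
    shin = series[:, R_ANKLE] - series[:, R_KNEE]
    cos = np.sum(thigh * shin, axis=1) / (
        np.linalg.norm(thigh, axis=1) * np.linalg.norm(shin, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))


def throw_features(series, fps):
    """Start/end frames and neck-relative features of one free throw."""
    rel = series - series[:, NECK:NECK + 1]        # relative to the neck
    start = int(np.nanargmin(knee_angle(series)))  # knees bent the most
    wrist_y = np.fmin(series[:, R_WRIST, 1], series[:, L_WRIST, 1])
    end = start + int(np.nanargmin(wrist_y[start:]))  # hands highest
    if end <= start:
        raise ValueError("no throw interval found")
    duration = (end - start) / fps
    diff = rel[end] - rel[start]
    frame_velocity = np.diff(rel[start:end + 1], axis=0) * fps
    return {
        "start": start,
        "end": end,
        "start_pos": rel[start],
        "end_pos": rel[end],
        "diff": diff,
        "velocity": diff / duration,
        "acceleration": (frame_velocity[-1] - frame_velocity[0]) / duration,
    }
```

Each returned array has shape (18, 2), so a single feature such as the vertical velocity of the right wrist is `features["velocity"][R_WRIST, 1]`.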

4 Results

4.1 Accuracy of Logistic Regression

In general, the accuracy of a logistic regression model is shown in a Pareto diagram such as Fig. 8. In this diagram, the horizontal axis shows the cumulative proportion of all records, sorted in descending order of the shooting probability predicted by the logistic regression, and the vertical axis shows the cumulative proportion of records in which the free throw succeeded. About 40% of the throws succeeded in this experiment. The red dots in Fig. 8 indicate the cumulative proportion of successful shots. For example, if the record at the 5% point of the descending probability order is a success and it is the fourth of 100 successful shots in total, the red dot is marked at (0.05, 0.04). If the model were perfect, the curve would follow the line of the perfect model, and if the predicted shooting probability were uncorrelated with the actual outcome, it would follow the line of the uncorrelated model. The accuracy of the logistic regression model is indicated by the ratio of the area A of the cumulative curve shown in Fig. 9. This figure shows that a sufficiently significant accuracy of AR (Accuracy Ratio) = 41% was obtained. We discuss this AR value in more detail in the Appendix.

Fig. 8. Pareto diagram

Fig. 9. Area of AR value
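The cumulative curve and the AR value can be computed as in the following sketch. The exact definition of AR used in this paper is given in the Appendix; this sketch assumes the common definition in which the area between the model curve and the uncorrelated (diagonal) line is divided by the corresponding area for the perfect model. The function names are hypothetical.

```python
import numpy as np


def cumulative_curve(probabilities, outcomes):
    """Return the x and y values of the cumulative success curve."""
    order = np.argsort(-np.asarray(probabilities, dtype=float), kind="stable")
    hits = np.asarray(outcomes, dtype=float)[order]
    x = np.arange(len(hits) + 1) / len(hits)
    y = np.concatenate([[0.0], np.cumsum(hits)]) / hits.sum()
    return x, y


def area_under(x, y):
    """Trapezoidal area under a piecewise-linear curve."""
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))


def accuracy_ratio(probabilities, outcomes):
    """AR under the assumed definition: model area / perfect-model area."""
    outcomes = np.asarray(outcomes, dtype=float)
    x, y_model = cumulative_curve(probabilities, outcomes)
    _, y_perfect = cumulative_curve(outcomes, outcomes)
    area_model = area_under(x, y_model) - 0.5    # area above the diagonal
    area_perfect = area_under(x, y_perfect) - 0.5
    return area_model / area_perfect
```

Under this definition a perfect ranking gives AR = 1 and an uncorrelated ranking gives a value near 0.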

4.2 Interpretation of Significant Features in Logistic Regression

The significant features that raise the shooting probability in the logistic regression are shown in Fig. 10. The shooting probability becomes higher when the features shown in blue are larger and when the features shown in red are smaller. This result shows that the shooting probability is higher if the knees are bent more deeply and extended quickly while, at the same time, the ball is pulled back and thrown from over the head. This motion uses the force of the knee extension and the centrifugal force created when throwing the ball overhead.

Fig. 10. Relation between shooting probability and features

4.3 Real Time Diagnosis

The diagnostic system using OpenPose can display the shooting probability in real time, as shown in Fig. 11. Even without a basket or a ball, it can judge the skill level directly from the gesture alone.

Fig. 11. Real time diagnosis by OpenPose

4.4 Posture Diagnosis

By comparing feature values between a beginner (a person with low shooting probability) and an expert (a person with high shooting probability), it is possible to diagnose how much the beginner’s posture should be corrected. In the example of Fig. 12, one of the most remarkable differences between the beginner and the expert in this experiment is the position of the arm at the start. The beginner pushed the ball from the chest, whereas the expert held the ball in front of the head and threw it from over the head. In this case, the beginner should be taught the overhead throwing form.

Fig. 12. Comparison of features between the beginner and the expert. The horizontal axis represents the values of the significant features in Fig. 10.

Next, we performed a sensitivity analysis. As shown in Table 1, the shooting probability improved when the beginner’s pose improved by 2.0 (about 1 cm) in each feature.

Table 1. Sensitivity analysis. Column inc(%) shows the increment of the shooting probability
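A sensitivity analysis of this kind can be sketched as follows. The coefficients and feature values below are hypothetical placeholders, not the fitted values of this paper, and it is an assumption of the sketch that “improving” a feature means moving it by the step in the direction that raises the probability, that is, the direction given by the sign of its coefficient.

```python
import math


def shooting_probability(alpha, beta, features):
    """Logistic probability from Z = alpha + sum(beta_i * feature_i)."""
    z = alpha + sum(b * f for b, f in zip(beta, features))
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity(alpha, beta, features, step=2.0):
    """Increment of the probability (in % points) per improved feature."""
    base = shooting_probability(alpha, beta, features)
    increments = []
    for i, b in enumerate(beta):
        moved = list(features)
        moved[i] += step if b > 0 else -step  # assumed favourable direction
        improved = shooting_probability(alpha, beta, moved)
        increments.append(100.0 * (improved - base))
    return increments


# Hypothetical coefficients and beginner feature values.
alpha, beta = -0.4, [0.05, -0.03]
beginner = [10.0, 25.0]
for i, inc in enumerate(sensitivity(alpha, beta, beginner)):
    print(f"feature {i}: inc(%) = {inc:.2f}")
```

Because the logistic curve is nonlinear, the same step gives a different increment depending on the player’s current probability, so the increments are specific to the beginner being diagnosed.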

5 Conclusion

We evaluated the performance of OpenPose. First, we evaluated the correspondence between actual body positions and the output data generated by OpenPose under static conditions. Next, to evaluate OpenPose under dynamic conditions, we built a basketball free throw prediction model using logistic regression. We found the following.

  1. The skeletal data recognized by OpenPose are sufficiently accurate and highly applicable.

  2. In previous posture diagnosis, data were generated by marking picture frames from a video stream or collected from motion capture sensors attached to the body. Data collection with these methods is so expensive that statistical models could not be introduced. In contrast, OpenPose can easily collect accurate data using a simple web camera, which makes it possible to obtain more accurate posture diagnosis by collecting more data.

6 Future Works

The basketball free throw data in this experiment were taken from one side only by a web camera, so they were suitable for analysis with the 2-dimensional data provided by OpenPose. However, 3-dimensional motion data could enable better analysis in many situations in the general field of sports motion analysis. A 3-dimensional version of OpenPose, or a method that expands the 2D data generated by 2D OpenPose into 3D data [12], may therefore be required. Also, instead of the regression model, we would like to attempt motion analysis with time-series models that reflect the correlations between skeleton points, using the large amount of data that OpenPose can generate.