1 Introduction

Gait recognition identifies individuals by their walking posture; medical studies show that human gaits differ and that each person has a unique walking style [1]. Compared to other biometrics (fingerprints, faces, irises, etc.), gait recognition has the advantages of being non-contact, non-invasive, easy to perceive, difficult to hide and difficult to camouflage. Existing methods can be divided into three categories: structural characterization, non-structural characterization and fusion characterization [2]. The structural characterization method, also known as the model-based method, is one of the important directions of gait recognition. Although its accuracy is slightly lower than that of the other two methods, it has great advantages in rapid recognition.

Zhang [3] created a five-link bipedal human model, which is used together with the human body's height vector and the base height vector to characterize gait. Lu [4] proposed a layered deformable body model that divides the body into several blocks, and calculated 22 parameters including body shape and dynamic features as the gait features. Lin [5] performed gait recognition based on the angle and other related parameters extracted from 2D contour images. These are early explorations of human body models for gait recognition. Features are extracted from the kinematic data of some bones, but the features are few and the importance of bone length is neglected, which leads to larger recognition errors.

Hamzaçebi [6] proposed a new MD-SLIP template for leg movements based on the general spring-loaded SLIP template, which enhances the stability of gait tracking. Wang [7] established a skeletal model from joint point coordinates automatically acquired by a Kinect, which can capture infrared images. Wang [8] proposed a new global-optimization-based method for searching joint correspondences, which performs favorably in highly cluttered backgrounds. Zhang [9] introduced the contourlet transform into the gradient vector flow (GVF) Snake model to effectively extract edge features. Zhang [10] created an initial 3D model using an example-oriented radial basis function model that maps a set of 30 measurements to a body shape space established from given examples. Kusakunniran and Wu [11] utilized Procrustes shape analysis and related similarity measures for identification problems caused by speed changes. These studies reflect the process of making the human body model more accurate for identification and tracking, overcoming the influence of walking speed, complex backgrounds, etc. However, the selection of the shoulder and hip joints is not accurate enough because anatomical bone proportions are over-used, and the application of body-model pattern recognition results to gait recognition is still insufficient.

In particular, much research progress has been made in the analysis of human body models based on the HumanEva datasets. Sigal [12] described a baseline algorithm for 3D articulated tracking, which helps establish the current state of the art in human pose estimation and tracking. Poppe [13] presented an example-based approach to pose recovery, using histograms of oriented gradients as image descriptors and testing on the HumanEva-I and HumanEva-II datasets, which provide the 3D error of human joints. Zhang [14] extended the Gaussian process latent variable model (GPLVM) for JGPM learning, where two heuristic topological priors, a torus and a cylinder, are considered and several JGPMs with different degrees of freedom (DoFs) are introduced for comparative analysis to estimate 3D gait kinematics. Jahangiri [15] proposed a method to generate multiple hypotheses for the human 3D pose, all of them consistent with the 2D detection of joints in a monocular RGB image. These studies provide tracking and prediction of human 3D models and analysis of joint errors, but lack spatial position calculations and an extension to gait recognition.

Establishing a model with high-precision bone lengths and angles is the purpose and key of the model-based method, and plays a decisive role in the subsequent feature extraction and recognition process [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]. From the above research, we find that: (1) traditional skeletal models applied to gait recognition lack the use of bone length and a reasonable determination of the hip and shoulder joints; (2) in human pattern recognition, the extraction of certain joints (shoulders, wrists, knees and ankles) has reached a certain degree of precision, but there is a lack of an accurate transform from planar positions to spatial positions and of an extension to gait recognition.

In traditional vision-only spatial positioning, at least two cameras are required. Each camera introduces a calibration error after spatial calibration. When calculating the spatial position, these two errors are not a simple superposition but a randomly coupled error, which is difficult to analyze and grows larger if the sampling moments of the two cameras are not synchronized. Therefore, tactility is introduced in this paper to replace one camera. The positioning error in the tactility is almost negligible due to the force particularity of the second phalangeal end, so only one camera calibration error remains, which is easy to analyze and reduce. Each tactile sensor is a square with a side length of 8.4 mm, so when it is used for spatial positioning, the positioning error of the Z-axis coordinate is guaranteed to be less than 4.2 mm at any distance.

With these considerations in mind, a method for extracting a gait model based on the spatial and temporal fusion of vision and tactility is proposed to rapidly establish a dedicated 3D gait skeleton model for individuals. The method is based on the accurate plane coordinates obtained by human body model pattern recognition and on the equal lengths of the left and right bones in human anatomy. Moreover, the use of tactility lays the foundation for studying gait recognition through the dynamic response of the ground support to human motion.

2 Theory and extraction method of 3D gait skeleton model

The human skeleton is made up of many rigid bones connected by joints. The gait skeleton model presented in this paper (see Fig. 1) simplifies these rigid bones into line segments with joint points as their endpoints. The model mainly considers joints which have relative movement to the center of gravity during walking (S: shoulder, E: elbow, W: wrist, H: hip, K: knee, A: ankle, M: phalangeal end). The midpoints of the hips and shoulders (\(S_\mathrm{C}\): shoulders’ midpoint, \(H_\mathrm{C}\): hips’ midpoint) were introduced for simplifying the model, and reasonable assumptions based on human anatomy were proposed:

  1. 1.

    left body (L: left body) and right body (R: right body) are symmetrical to each other;

  2. 2.

    the same kinds of bones in the left body and right body are equal in length.

Fig. 1
figure 1

The position schematic of the joint points selected in the gait skeleton model

Each joint has multiple degrees of freedom, which gives rise to the complexity of human behavior [4]. In the walking process, these joints generally have a large variation in only one degree of freedom and little variation in the others. Therefore, the following reasonable simplifications were proposed:

  1. 1.

    the oscillating planes of the bones represented by \(K-H\), \(K-A\), \(E-S\), \( E-W\) and \(S_\mathrm{C}-H_\mathrm{C}\) are parallel to the walking direction and perpendicular to the walking plane;

  2. 2.

    the skeletons represented by \(H_{R}-H_{L}\) and \(S_{R}-S_{L}\) are parallel to the walking plane.

According to the different spatial positions of different skeletal joints during walking, as shown in Fig. 2, the joints of this gait skeleton model can be assigned to seven planes (except the joint M):

Fig. 2
figure 2

The plane formed by the swing of each bone of the gait skeleton model in the actual movement, where all the planes are parallel to the walking direction

Plane 1: The oscillating plane of the bones represented by \(S_{R}-E_{R}-W_{R}\); Plane 2: The oscillating plane of the bones represented by \(S_{L}-E_{L}-W_{L}\); Plane 3: The oscillating plane of the bones represented by \(H_{R}-K_{R}-A_{R}\); Plane 4: The oscillating plane of the bones represented by \(H_{L}-K_{L}-A_{L}\); Plane 5: The symmetry plane of Planes 1–4, including \(S_\mathrm{C}\) and \(H_\mathrm{C}\); Plane 6: The plane of rotation of the bones represented by \(H_{R}-H_{L}\); Plane 7: The plane of rotation of the skeleton represented by \(S_{R}-S_{L}\).

2.1 The spatial and temporal fusion

In the experimental process, visual data and tactility data are measured in different ways, so it is difficult to unify the sampling frequencies and the corresponding start–stop periods of the two. In this paper, both data streams are unified in time through the coupling relationship between the plantar pressure and the distance between the human feet. After the time is unified, the gait skeleton model is extracted by spatial fusion.

2.1.1 The temporal fusion

During walking, the distance between the two feet (the number of pixels from the toes of the front foot to the heel of the other foot, see Fig. 3) and the plantar pressure (see Fig. 4) change regularly with time, and these changes are repeatable and stable.

Fig. 3
figure 3

Schematic of the distance between the two feet during the time fusion

Fig. 4
figure 4

Definition of pressure region on the plantar pressure integration map

Fig. 5
figure 5

The distance difference curve of adjacent frames in the visual data (the distance in the present frame minus the distance in the previous frame)

Fig. 6
figure 6

The plantar pressure curve over time in different regions

Fig. 7
figure 7

The fusion curve of width difference and plantar pressure

Figure 5 shows the distance difference curve of the two feet between adjacent frames. The sampling frequency of the camera (\(f_\mathrm{c}\)) is 30 frames per second. A positive or negative difference represents an increase or decrease in the distance between the present frame and the previous frame, respectively, so the positive zero points \(N_{1}\) and \(N_{2}\) are the critical frames where the distance changes from increasing to decreasing, and the distance between the feet in those frames is the largest.
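The detection of these positive zero points can be sketched as follows; this is an illustrative implementation (the function name and the convention of returning the last increasing frame are our own, not from the paper's code):

```python
def positive_zero_frames(diff):
    """Find frames where the distance-difference curve crosses zero
    from positive to negative, i.e. where the inter-feet distance peaks.

    diff[i] = distance(frame i) - distance(frame i - 1).
    Returns the index of the last frame before each sign change.
    """
    frames = []
    for i in range(len(diff) - 1):
        if diff[i] > 0 and diff[i + 1] <= 0:
            frames.append(i)
    return frames
```

Each returned index corresponds to one of the characteristic frames \(N_{1}\), \(N_{2}\), \(\ldots\) used in the temporal fusion below.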

Figure 6 shows the curves of the plantar pressure over time in the three regions, whose starting time does not coincide with that of Fig. 5. The sampling frequency of the plantar pressure test equipment (\(f_{w}\)) is 50 frames per second. In the walking process, the pressures on the left and right feet are equal at moments \(t_{1}\) and \(t_{2}\). When the pressures on the left and right feet are equal, the heel of one foot and the forefoot of the other foot are in contact with the ground at the same time, and the inclinations of the two feet relative to the ground are the smallest. Therefore, moments \(t_{1}\) and \(t_{2}\) can be considered, within the allowable error range, as the moments when the distance between the two feet is the largest.

Compared to the camera, the plantar pressure test equipment has a higher sampling frequency and higher data accuracy. For this reason, the visual data are integrated into the timeline of the plantar pressure data using the characteristic moments of the widest feet distance. Let the plantar pressures of both feet in Fig. 6 be equal at the moments \(t_{1}\), \(t_{2}\), \(t_{3}\)\(\cdots \)\(t_{j}\), j\(\in \)\(N^{+}\), and let the positive zeros in Fig. 5 appear at the frames \(N_{1}\), \(N_{2}\), \(N_{3}\)\(\cdots \)\(N_{i}\), i\(\in \)\(N^{+}\). There are multiple characteristic moments in the entire walking process, so, to improve the precision of the fusion, a matching function is introduced under the assumption that frame \(N_{1}\) in Fig. 5 occurs at the moment \(T_{1}\) on the timeline of Fig. 6:

$$\begin{aligned} f(T_{1})=\frac{1}{n}\sqrt{\sum _{i=j=1}^n(T_{i}-t_{j})^{2}} \end{aligned}$$
(1)

where n is the number of characteristic moments and \(T_{i}=T_{1}+(N_{i}-N_{1})/f_\mathrm{c}\) is the moment of each positive zero point in Fig. 5. When the matching function \(f(T_{1})\) reaches its minimum, the errors between the corresponding characteristic time points of the visual data and the tactility data are the smallest, and the value of \(T_{1}\) at this minimum is set as the moment of frame \(N_{1}\) in the visual data, which completes the temporal fusion. The fusion of the visual data (Fig. 5) and the tactility data (Fig. 6) is shown in Fig. 7.
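Because \(f(T_{1})\) is the root of a quadratic in \(T_{1}\), its minimizer has a closed form: the mean of the residuals \(t_{i}-(N_{i}-N_{1})/f_\mathrm{c}\). A minimal sketch, assuming equal numbers of characteristic moments on both timelines (function and variable names are illustrative):

```python
import math

def fuse_time(frames, times, fc=30.0):
    """Minimize the matching function f(T1) of Eq. (1).

    frames: positive-zero frame numbers N_1..N_n from the visual data
    times:  equal-pressure moments t_1..t_n from the tactility data
    fc:     camera sampling frequency (frames per second)

    With T_i = T1 + (N_i - N_1)/fc, the sum of squared residuals is
    quadratic in T1, so the minimizer is the mean residual.
    Returns (T1, f(T1)).
    """
    n = len(frames)
    offsets = [(N - frames[0]) / fc for N in frames]
    t1 = sum(t - o for t, o in zip(times, offsets)) / n
    f = math.sqrt(sum((t1 + o - t) ** 2
                      for t, o in zip(times, offsets))) / n
    return t1, f
```

When the visual and tactility characteristic moments agree exactly, the residual \(f(T_{1})\) vanishes and \(T_{1}\) recovers the moment of frame \(N_{1}\).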

2.1.2 The spatial fusion

In the visual imaging process, the actual direction of the object is the same as the imaging direction on the camera screen, but opposite to the imaging direction on the camera imaging surface (see Figs. 8, 10, 11). The experimental equipment includes a camera lens and the plantar pressure test equipment (hereinafter referred to as the Walkway).

The visual and tactility data are spatially fused according to the imaging principle of the black rectangle frame of the Walkway in the lens. As shown in Fig. 8, use the optical center \(O_{1}\) of the lens as the origin and the optical axis of the lens as Z-axis to establish the coordinate system \(O_{1}-xyz\). The length of the rectangular frame of Walkway is L, and the width is W. The four vertices of the frame are represented by a, b, c, d, and their coordinates belong to unknown variables. The image points on the imaging plane corresponding to the four vertices are represented by \(a'\), \(b'\), \(c'\), \(d'\), and their coordinates belong to known variables.

Fig. 8
figure 8

The imaging process and principle of the Walkway frame in the lens

Fig. 9
figure 9

The selection principle of the M-joint coordinates in the coordinate system \(O_{2}-xy\) in the tactility data

Fig. 10
figure 10

The imaging process and principle of the lower gait skeleton model in vision, where Planes 3, 4 and 5 and the imaging surface are parallel to each other and to the walking direction, and perpendicular to Plane 6

When the seven planes of the human skeleton model are parallel or perpendicular to the imaging surface, the calculation time and difficulty of the visual geometry can be reduced. Therefore, the experiment requires that the imaging coordinates of the Walkway's frame in the lens satisfy the following relationship:

$$\begin{aligned} y_{a'}/x_{a'}=y_{b'}/x_{b'}=-y_{c'}/x_{c'}=-y_{d'}/x_{d'}. \end{aligned}$$
(2)

Under this condition, the Plane abcd is perpendicular to the Y-axis and the Line bc is parallel to the X-axis. The Line ab and Line cd are symmetric about the Plane \(yO_{1}z\). The walking direction at any time during the experiment is kept parallel to the X-axis. According to the geometric relations, the seven planes can be divided into two categories:

  1. 1.

    Planes 1–5 are parallel to the imaging plane, referred to as parallel plane.

  2. 2.

    Planes 6–7 are perpendicular to the imaging plane, referred to as vertical plane.
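As a small numerical sketch, the alignment condition of formula (2) can be checked on the extracted image coordinates during position adjustment; the tolerance and names below are illustrative assumptions, not part of the paper's procedure:

```python
def alignment_ok(a_img, b_img, c_img, d_img, tol=1e-3):
    """Check formula (2) for the imaged Walkway frame vertices:
    y_a'/x_a' = y_b'/x_b' = -y_c'/x_c' = -y_d'/x_d'.

    Each vertex is an (x, y) image coordinate; returns True when all
    four ratios agree within tol.
    """
    ratios = [a_img[1] / a_img[0], b_img[1] / b_img[0],
              -c_img[1] / c_img[0], -d_img[1] / d_img[0]]
    return max(ratios) - min(ratios) < tol
```

In practice such a check would be repeated after each adjustment of the lens angle and position until it passes.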

The position of a plane is determined by three spatial points, so the six points (a, b, c, \(a'\), \(b'\), \(c'\)) are used for the analysis. The expression of the point imaging principle is:

$$\begin{aligned} (X,Y,Z)=-\gamma (x,y,p) \end{aligned}$$
(3)

where p represents the camera focal length and \(\gamma \) represents the distance ratio, relative to the origin, between the space point (X, Y, Z) and its imaging point (x, y, p). It can be seen from formula (3) that, when the coordinates of points \(a'\), \(b'\), \(c'\) are known, the spatial positions of points a, b, c are determined by the distance ratios \(\gamma _{a}\), \(\gamma _{b}\), \(\gamma _\mathrm{c}\). According to the conditions \(l_{ab}=W\), \(l_{bc}=L\) and \(\overrightarrow{ab}\bot \overrightarrow{bc}\), the expressions of \(\gamma _{a}\), \(\gamma _{b}\), \(\gamma _\mathrm{c}\) are:

$$\begin{aligned} \gamma _{b}= & {} \gamma _\mathrm{c}=L/(x_{b'}-x_{c'}) \end{aligned}$$
(4)
$$\begin{aligned} \gamma _{a}= & {} \gamma _{b}x_{b'}/x_{a'}. \end{aligned}$$
(5)

To locate each square pressure sensor of the Walkway (side length cs), take the Point b as the origin, the Edge bc as the X-axis, the Edge ba as the Y-axis, and the side length cs as the unit length to establish a plane coordinate system \(O_{2}-xy\). The spatial coordinates of any point (i, j) in \(O_{2}-xy\) can then be calculated by the space fusion formula:

$$\begin{aligned} (x,y,z)_{ij}&= \gamma _{b}(x_{b'},y_{b'},p)\nonumber \\&\quad +\, i[\gamma _{a}(x_{a'},y_{a'},p)-\gamma _{b}(x_{b'},y_{b'},p)]cs/W\nonumber \\&\quad +\, j[\gamma _\mathrm{c}(x_{c'},y_{c'},p)-\gamma _{b}(x_{b'},y_{b'},p)]cs/L. \end{aligned}$$
(6)
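Formulas (4)–(6) can be sketched in code as follows. This is an illustrative implementation under the paper's geometry; the function names are hypothetical, and the sign of \(\gamma\) in formula (3) is absorbed as in formula (6):

```python
def gammas(a_img, b_img, c_img, L):
    """Distance ratios of Eqs. (4)-(5) from the imaged frame vertices
    a', b', c' (each an (x, y) image coordinate); L is the frame length."""
    gb = L / (b_img[0] - c_img[0])
    gc = gb
    ga = gb * b_img[0] / a_img[0]
    return ga, gb, gc

def sensor_to_space(i, j, a_img, b_img, c_img, L, W, cs, p):
    """Spatial fusion of Eq. (6): map sensor cell (i, j) in O2-xy to
    spatial coordinates in O1-xyz. W is the frame width, cs the sensor
    side length, p the focal length."""
    ga, gb, gc = gammas(a_img, b_img, c_img, L)
    A = [ga * v for v in (a_img[0], a_img[1], p)]  # spatial point a
    B = [gb * v for v in (b_img[0], b_img[1], p)]  # spatial point b
    C = [gc * v for v in (c_img[0], c_img[1], p)]  # spatial point c
    return tuple(B[k]
                 + i * (A[k] - B[k]) * cs / W
                 + j * (C[k] - B[k]) * cs / L
                 for k in range(3))
```

By construction, cell (0, 0) maps to the spatial point b, and the cells at the far ends of the two edges map to the points a and c.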

2.2 Spatial geometry analysis

To obtain the positions of the relevant joints as accurately as possible and to reduce the error, the skeletal model joints are extracted in order from known to unknown. Unlike the hard-to-handle vision data, the position of the human body is located more accurately and easily by tactility. The joint points of the upper and lower limbs all lie in two parallel planes and one vertical plane, so the spatial coordinates of the joints of the upper and lower limbs can be obtained with the same algorithm. In this paper, the process of obtaining the spatial coordinates of the lower-limb joint points is taken as the example for the spatial geometry analysis.

Fig. 11
figure 11

Imaging of human lower skeleton model in imaging surface

2.2.1 The spatial coordinates of M

In the plantar pressure distribution measured by the Walkway (see Fig. 9), the highest point is the main stress point, which is considered to be the position of the phalangeal end M.

As shown in Fig. 9, the coordinates \((i_{M_{R}}, j_{M_{R}})\), \((i_{M_{L}}, j_{M_{L}})\) of the M joint points in \(O_{2}-xy\) can be obtained by the pressure distribution, which are the position coordinates of the pressure peaks. Through the space fusion formula 6, the two points are transformed from \(O_{2}-xy\) to \(O_{1}-xyz\) and the spatial coordinates of \(M_{R}\) and \(M_{L}\) are obtained. The left and right sides of the skeletal model are symmetrical and Plane 5 is a parallel plane, so the Z-axis coordinate expression at any point on Plane 5 is described as:

$$\begin{aligned} z_{\mathrm{Plane5}}=(z_{M_{R}}+z_{M_{L}})/2 \end{aligned}$$
(7)

where \(z_{M_{R}}\) represents the Z-axis coordinate of \(M_{R}\), \(z_{M_{L}}\) represents the Z-axis coordinate of \(M_{L}\).
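A minimal sketch of locating M from the pressure grid and applying Eq. (7); the peak search and the names are our own illustrations of the step described above:

```python
def peak_cell(pressure):
    """Locate the pressure peak in a 2D plantar-pressure grid (list of
    rows); its (i, j) cell is taken as the phalangeal end M in O2-xy."""
    best = (0, 0)
    for i, row in enumerate(pressure):
        for j, v in enumerate(row):
            if v > pressure[best[0]][best[1]]:
                best = (i, j)
    return best

def z_plane5(z_mr, z_ml):
    """Eq. (7): by left-right symmetry, Plane 5 lies midway between the
    Z coordinates of the two phalangeal ends M_R and M_L."""
    return (z_mr + z_ml) / 2
```

The two peak cells (one per foot) would each be passed through the space fusion formula (6) to obtain \(z_{M_{R}}\) and \(z_{M_{L}}\) before averaging.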

2.2.2 The spatial coordinates of A and K

The imaging principle of the human lower limb is shown in Fig. 10. In \(O_{1}-xyz\), the length of the bone \(K_{R}-A_{R}\) is equal to the length of the bone \(K_{L}-A_{L}\), and Plane 3 and Plane 4 are perpendicular to the Z-axis. During camera imaging, the imaging points of the joints \(A_{R}\), \(A_{L}\), \(K_{R}\), \(K_{L}\) are represented by \(A_{R}'\), \(A_{L}'\), \(K_{R}'\), \(K_{L}'\) on the image surface, and their coordinates can be extracted directly as known variables. According to \(\triangle O_{1} A_{L} K_{L}\)\(\sim \)\(\triangle O_{1} A_{L}' K_{L}'\) and \(\triangle O_{1} A_{R} K_{R}\)\(\sim \)\(\triangle O_{1} A_{R}'K_{R}'\), the Z-axis coordinate of any point on Plane 3 and Plane 4 is described as:

$$\begin{aligned} z_{\mathrm{Plane3}}= & {} 2z_{\mathrm{Plane5}} l_{A_{L}'K_{L}'}/(l_{A_{R}'K_{R}'}+l_{A_{L}'K_{L}'}) \end{aligned}$$
(8)
$$\begin{aligned} z_{\mathrm{Plane4}}= & {} 2z_{\mathrm{Plane5}} l_{A_{R}'K_{R}'}/(l_{A_{R}'K_{R}'}+l_{A_{L}'K_{L}'}) \end{aligned}$$
(9)

where \(l_{A_{R}'K_{R}'}\) represents the length of \(A_{R}'-K_{R}'\), \(l_{A_{L}'K_{L}'}\) represents the length of \(A_{L}'-K_{L}'\). When the values of \(z_{\mathrm{Plane3}}\) and \(z_{\mathrm{Plane4}}\) are calculated, the spatial coordinates of joint points A and K can be obtained by formula 3.
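Eqs. (8)–(9), together with the inversion of the imaging relation for a point on a plane of known Z, can be sketched as follows (the sign of \(\gamma\) in formula (3) is absorbed as in formula (6); the names are illustrative):

```python
def z_planes34(l_al_kl, l_ar_kr, z_p5):
    """Eqs. (8)-(9): Z coordinates of Planes 3 and 4 from the imaged
    shank lengths l_{A_L'K_L'} and l_{A_R'K_R'}, using the
    equal-bone-length assumption and the Plane-5 coordinate z_p5."""
    s = l_ar_kr + l_al_kl
    return 2 * z_p5 * l_al_kl / s, 2 * z_p5 * l_ar_kr / s

def joint_from_image(x_img, y_img, p, z_plane):
    """Recover a joint lying on a plane of known Z from its image point:
    by formula (3), gamma = z_plane / p, so (X, Y) = gamma * (x, y)."""
    g = z_plane / p
    return g * x_img, g * y_img, z_plane
```

Note that equal imaged shank lengths place both planes at the Plane-5 depth, and the two plane depths always average back to \(z_{\mathrm{Plane5}}\), consistent with the symmetry assumption.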

2.2.3 The spatial coordinates of H

Obstructed by the body, the imaging coordinates of the H joints on the imaging surface are difficult to extract directly. To address this problem, a method based on human anatomy is presented through visual geometry analysis in our experimental environment. As shown in Fig. 10, Plane 3, Plane 4 and Plane 5 are perpendicular to the Z-axis, Plane 6 is perpendicular to the Y-axis, and the length of the skeleton \(K_{R}-H_{R}\) is the same as that of the skeleton \(K_{L}-H_{L}\). From the conditions \(\triangle O_{1} H_{L} K_{L}\)\(\sim \)\(\triangle O_{1} H_{L}' K_{L}'\), \(\triangle O_{1} H_{R} K_{R}\)\(\sim \)\(\triangle O_{1} H_{R}' K_{R}'\) and \(Z_{H_{R}}=Z_{H_{L}}\), we can infer that:

$$\begin{aligned}&l_{H_{L}'K_{L}'}{:}l_{H_{R}'K_{R}'}=z_{\mathrm{Plane3}}{:}z_{\mathrm{Plane4}} \end{aligned}$$
(10)
$$\begin{aligned}&y_{H_{L}'}{:}y_{H_{R}'}=z_{\mathrm{Plane3}}{:}z_{\mathrm{Plane4}} \end{aligned}$$
(11)

where \(l_{H_{R}'K_{R}'}\) represents the length of \(H_{R}'-K_{R}'\), \(l_{H_{L}'K_{L}'}\) represents the length of \(H_{L}'-K_{L}'\).

As shown in Fig. 11, the human lower-limb skeleton model forms 5 segments on the imaging surface. To obtain the coordinates of the joints \(H_{R}'\) and \(H_{L}'\), which are blocked by the body, it is necessary to know the coordinates of the joints \(K_{R}'\) and \(K_{L}'\), the slope \(k_{1}\) of the bone \(H_{R}'-K_{R}'\) and the slope \(k_{2}\) of the bone \(H_{L}'-K_{L}'\). The coordinates of the joints \(K_{R}'\) and \(K_{L}'\) can be directly extracted as known variables, but the slopes \(k_{1}\) and \(k_{2}\) are also blocked by the body and difficult to extract directly. To address this problem, a method was proposed by visual geometry analysis (see Fig. 12).

As shown in Fig. 12, there are two oblique truncated cones in the space, representing the left thigh and the right thigh, respectively. Assume that the centerline between the top and bottom circles of each truncated cone represents the thigh bone. When viewed from different angles, the positions of the two thighs in the plane image differ greatly, and three images from different visual angles are taken as examples. In each visual image, the plane projections form a coincident part, an irregular quadrangle. The positions of these quadrilaterals in the visual images rise or fall regularly with the continuous change of the viewing angle, but the slope of the thigh bone does not change.

Fig. 12
figure 12

The principle of selecting the slope of the thigh bone \(H-K\), in which the slopes of the \(l_{13}\) and \(l_{14}\) are selected as the slope of the thigh bone \(H-K\)

In practical applications, the plane lines \(l_{5},l_{6}\) are usually so short that the error is relatively large, and the plane lines \(l_{7},l_{8}\) cannot be directly extracted due to the blocking of the body (as shown in the image of visual angle 3). However, the positions of the intersection q and the plane lines \(l_{9},l_{10}\) are relatively accurate and easy to extract (as shown in the image of visual angle 2). Therefore, the lines \(l_{11},l_{12}\) are drawn through the intersection p, perpendicular to \(l_{9}\) and \(l_{10}\), respectively, and the slopes of the lines \(l_{13},l_{14}\) connecting the midpoints of the opposite sides of the new quadrilateral \(q_{1}q_{2}q_{3}q\) are defined as the slopes \(k_{1}\) and \(k_{2}\). Under the assumption that the centerline between the upper and lower bottom circles represents the thigh bone, when the straight lines \(l_{3},l_{4}\) are parallel to the lines \(l_{1},l_{2}\) (as shown in the image of visual angle 1), the line connecting the midpoints of the opposite sides of the quadrilateral is exactly the thigh bone. In fact, the assumed slope itself does not represent the true value, and the final slope obtained by the above method differs slightly from the assumed slope. Therefore, it is reasonable to represent the slope of the thigh bone by the midpoint line of the new quadrilateral \(q_{1}q_{2}q_{3}q\). The final result shows that the positions of the H joints obtained by this method conform to the actual body structure.
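The midpoint-line construction above can be sketched as follows, assuming the four vertices of the coincident quadrilateral \(q_{1}q_{2}q_{3}q\) have already been extracted (the function name and the vertex ordering convention are our own assumptions):

```python
def midline_slopes(q1, q2, q3, q):
    """Slopes of the two lines joining the midpoints of opposite sides
    of the quadrilateral q1-q2-q3-q, taken here as the thigh-bone
    slopes k1 and k2. Sides are ordered q1q2, q2q3, q3q, qq1."""
    def mid(u, v):
        return ((u[0] + v[0]) / 2, (u[1] + v[1]) / 2)

    def slope(u, v):
        return (v[1] - u[1]) / (v[0] - u[0])

    m12, m34 = mid(q1, q2), mid(q3, q)  # midpoints of sides q1q2 / q3q
    m23, m41 = mid(q2, q3), mid(q, q1)  # midpoints of sides q2q3 / qq1
    return slope(m12, m34), slope(m23, m41)
```

For a parallelogram the midpoint lines pass through the center, which matches the visual-angle-1 case where the midpoint line coincides with the thigh bone.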

Assume that the slope of the line containing \(H_{R}'\) and \(H_{L}'\) is k and that the Y-axis coordinate of the intersection of the Line \(l_{H_{R}'H_{L}'}\) with the Plane \(yO_{1}z\) is e. Through formulas (10) and (11), the analytical expressions of k and e can be described as:

$$\begin{aligned} \frac{P_{1}k^{2}+P_{2}k+P_{3}}{P_{4}k-P_{5}}=\frac{P_{6}k^{2}+P_{7}k}{P_{8}k-P_{9}}=e \end{aligned}$$
(12)

where

$$\begin{aligned} \left\{ \begin{array}{l} P_{1}=-x_{K_{R}'}t-x_{K_{L}'} \lambda \\ P_{2}=x_{K_{R}'}tk_{2}+x_{K_{L}'} \lambda k_{1}+y_{K_{R}'}t+y_{K_{L}'} \lambda \\ P_{3}=-y_{K_{R}'}tk_{2}-y_{K_{L}'} \lambda k_{1} \\ P_{4}=t+\lambda \\ P_{5}=tk_{2}+\lambda k_{1} \\ P_{6}=y_{K_{R}'}-y_{K_{L}'} \lambda -x_{K_{R}'}k_{1}+ \lambda x_{K_{L}'}k_{2} \\ P_{7}=x_{K_{R}'}k_{1}k_{2}-\lambda x_{K_{L}'}k_{1}k_{2}-y_{K_{R}'}k_{2}+\lambda y_{K_{L}'}k_{1} \\ P_{8}=k_{1}+\lambda k_{2} \\ P_{9}=k_{1}k_{2}+\lambda k_{1}k_{2} \\ \lambda = z_{\mathrm{Plane3}}/z_{\mathrm{Plane4}} \\ t=\sqrt{1+k_{1}^{2}}/\sqrt{1+k_{2}^{2}} \end{array} \right. \end{aligned}$$
(13)

When k and e are calculated, the spatial coordinates of the joints H can be calculated by formula (3).
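Eq. (12) can be solved for k by eliminating e, which cross-multiplies into a cubic, and e then follows from either fraction. A minimal numerical sketch using bisection; the paper does not specify its solver, and the root bracket below is an assumption that must be supplied by the caller:

```python
def solve_k(P, k_lo, k_hi, tol=1e-10):
    """Solve Eq. (12) for (k, e) given coefficients P = (P1, ..., P9)
    from Eq. (13). Eliminating e gives the residual
    g(k) = (P1 k^2 + P2 k + P3)(P8 k - P9)
         - (P6 k^2 + P7 k)(P4 k - P5),
    whose root is bracketed by [k_lo, k_hi] (a sign change is required).
    """
    P1, P2, P3, P4, P5, P6, P7, P8, P9 = P

    def g(k):
        return ((P1 * k * k + P2 * k + P3) * (P8 * k - P9)
                - (P6 * k * k + P7 * k) * (P4 * k - P5))

    lo, hi = k_lo, k_hi
    if g(lo) * g(hi) > 0:
        raise ValueError("no sign change in the supplied bracket")
    while hi - lo > tol:
        m = (lo + hi) / 2
        if g(lo) * g(m) <= 0:
            hi = m
        else:
            lo = m
    k = (lo + hi) / 2
    e = (P6 * k * k + P7 * k) / (P8 * k - P9)  # second fraction of Eq. (12)
    return k, e
```

At the returned root both fractions of Eq. (12) agree (provided neither denominator vanishes), so either may be used for e.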

From the above analysis, the spatial coordinates of all joints can be obtained by the method of this paper, which means that the skeleton model of the human body can be reconstructed in \(O_{1}-xyz\). The gait skeleton model obtained by this method is reconstructed according to the actual positions of the joint points, so it is not affected by the visual observation angle and has good versatility.

3 Experiment and result analysis

In this experiment, a white wall is selected as the background, and the experimental equipment includes: 1. a camera with a monocular zoom lens; 2. the Walkway Metric 3150TL foot pressure test equipment (Walkway). The focal length of the monocular zoom lens was 27 mm after calibration. The length of the Walkway's black frame was 2035 mm, and the width was 455 mm. A total of 9152 square sensors with a side length of 8.4 mm are distributed on the Walkway. The sensors of the Walkway use piezoresistive sensing technology with a maximum measurement value of 862 kPa. Its detection frequency can reach 50 Hz and can even be upgraded to 185 Hz if equipped with a dedicated detection module. The matching software can clearly record and display the 2D and 3D foot pressure contours at each moment and provides a variety of fixed and custom foot pressure analysis functions, such as the real-time position of the pressure center, the integral of the pressure-receiving area and the target-area pressure value curve. Although experiments and data processing can be easily carried out with the Walkway, the piezoresistive structure means that a high sensor distribution density and a high limit pressure value cannot coexist: obtaining higher-density pressure data inevitably lowers the limit measurement pressure. When the Walkway is used with other devices whose detection frequencies are inconsistent with its fixed frequency, the data are difficult to put into correspondence, which introduces errors into the processing.

Fig. 13
figure 13

The relative position of the selected 4 samples and the camera lens

Fig. 14
figure 14

The time fusion schematic of visual data and tactility data, in which there are two moments when the distance between the two feet is the largest, and the fusion process is based on the timeline of the tactility data

During the experiment (see Fig. 13), the lens is first placed at a distance (Distance 1) from the background that ensures the relevant bones and joints of the walking sample are completely imaged in the lens. The angle and position of the lens are then adjusted so that the image of the Walkway frame in the lens satisfies formula (2), which is called position adjustment. After the adjustment was completed, the sample walked from left to right on the Walkway while the data collection was performed. Finally, the lens placement distance was changed (Distance 2) and the experiment repeated. Figure 14 shows the process of matching the visual data to the timeline of the tactility data, in which the value of \(T_{1}\), the moment of frame \(N_{1}\), is 1.433 s when \(f(T_{1})\) reaches its minimum.

Fig. 15
figure 15

Visual graphics at right side of distance 1

Fig. 16
figure 16

The tactile images at right side of distance 1

Fig. 17
figure 17

Visual graphics at left side of distance 1

From the experimental results, we selected four groups of samples of one subject, taken from different sides at different distances, for analysis (see Figs. 15, 16, 17, 18, 19, 20, 21, 22); the visual images were background-removed, and the process of extracting the slope of the thigh bone is illustrated. The relevant joint coordinates and the parameters that can be accurately extracted by vision are shown in Tables 1, 2 and 3, and the coordinates of the second phalangeal end in \(O_{2}-xy\) extracted by tactility are shown in Table 4.

Fig. 18
figure 18

The tactile images at left side of distance 1

Fig. 19
figure 19

Visual graphics at right side of distance 2

Fig. 20
figure 20

The tactile images at right side of distance 2

Fig. 21
figure 21

Visual graphics at left side of distance 2

Fig. 22
figure 22

The tactile images at left side of distance 2

Table 1 Joint point coordinates that can be directly extracted from two samples of distance 1 (mm)
Table 2 Joint point coordinates that can be directly extracted from two samples of distance 2 (mm)
Table 3 The \(\gamma \) values of the Walkway frame at different distances
Table 4 M point coordinates in \(O_{2}-xy\) that can be directly extracted from Plantar pressure distribution
Table 5 The random sampling time for solving the spatial skeleton model once
Table 6 The bone length at distance 1 and the measured values (mm)
Table 7 The bone length at distance 2 and the measured values (mm)
Table 8 The average bone length of the same distance and the measured values (mm)
Table 9 The error rate between the average bone length and the measured values

3.1 Results and analysis

The walking postures of the skeleton models in different locations are complex, and the swing angles of the same bone are also different, which makes the characteristics of the model difficult to directly analyze and compare. Taking into account that the length of each bone does not change with the angle, the accuracy of the experimental results is verified by analyzing the bone length.

Table 5 shows the random sampling times for solving the spatial skeleton model once in LabVIEW using the algorithm of this paper and the data in Tables 1, 2, 3 and 4. It can be seen that the maximum time is 57.2 \(\upmu {\hbox {s}}\) and the minimum is 6.7 \(\upmu {\hbox {s}}\); these values will differ slightly depending on the configuration of the computer or further functions of the program. Given the visual acquisition frequency (33 ms per frame) and mature human body model pattern recognition technology, this time fully meets the requirements of fast response.

Tables 6 and 7 show the bone lengths at different locations. It can be seen that the bone lengths of the models extracted from different locations fluctuate slightly above and below the actual measured values. The length errors of the shoulder and hip bones are slightly larger than those of the other bones because the coordinates of the H and S points are obtained from the thigh slopes, which adds one more calculation step, and hence one more error source, than for the other points. At the same distance, the lengths of the left and right bones are consistently larger or smaller, indicating that the error originates from the initial positioning in the experiment and can be reduced by another position adjustment.

The average bone lengths of the left and right sides at the same distance are shown in Table 8, where the average values fit the measured values of the sample well. The errors on the left and right sides complement each other, which again demonstrates that the consistently larger or smaller lengths of the left or right side are caused by the position adjustment during the experiment. The error rates between the average bone lengths and the measured values are shown in Table 9, where each bone length error rate is eventually controlled within \(2\%\). This proves that the model extracted by this method can represent the true skeleton of the human body to some extent.

4 Conclusion and future work

In this paper, the gait skeleton model is rapidly extracted through geometric analysis based on the spatial and temporal fusion of vision and tactility. Through the establishment of an equal-scale model whose bone length error is eventually controlled within \(\pm \,5\,{\hbox {mm}}\), not only is the individualized bone length ensured, but the model also has good stability at different distances and from different angles. Although the joint points were not automatically selected, the blocked shoulder and hip joints were fully analyzed and verified through geometric analysis. The experimental results show that the proposed method can quickly extract a skeleton model in which the bone lengths are representative.

In this research, a high-precision model was obtained through the fusion of vision and tactility, and the tactility played the major role in the spatial positioning accuracy. In future work, we will use this gait skeleton model to study the temporal gait characteristics of tactility and vision. The dynamic coupling relationship between vision and tactility will then be explored to improve the efficiency and accuracy of gait recognition according to the temporal and spatial fusion method of this paper.