
1 Introduction

Three-dimensional models of real-world objects, in particular models of the human body, are used in many areas, such as augmented and virtual reality applications, video games, and reverse engineering. Such models are usually created with expensive specialized hardware and software.

In this work we propose and describe a method for obtaining a three-dimensional model of a human body using two MS Kinect devices, without additional equipment. The Kinect device is much cheaper than specialized scanners and provides a depth stream that can be converted into a set of three-dimensional points and then into a polygonal mesh of the human body.

There are now a large number of systems for reconstructing complex objects with the Kinect device. However, reconstruction of the human body imposes a number of additional constraints on the configuration of the scanning system: to achieve an acceptable level of surface detail, the surface of the scanned body must be captured in separate parts [1, 2]. Some systems use a rotating mechanism to turn the scanned person; others use multiple devices, each scanning a certain region [3]; still others combine these approaches [4].

Another problem is that not all parts of the body surface fall into the view of the cameras. These areas, where the surface is not determined, are called “empty zones” [5, 6]. This problem can be solved in different ways, but most often the empty areas are simply completed manually in a 3D graphics package [7].

2 Description of the System Workflow

As input the system accepts two streams from the two Kinect depth cameras. The system processes each frame \( (R_{k}^{\prime}) \) in the streams, which is an image with a resolution of 640 × 480 pixels. Each pixel encodes the distance, in millimeters, to a region of the scanned surface.

Before processing, the depth frames must be filtered. The problem is that the depth values returned by the device are rather noisy, and this noise is random, additive, and statistically independent of the signal. The noise must be removed from the depth frames while at the same time preserving the sharpness of boundaries, because boundaries in a depth frame express transitions between surface levels; smoothing them is inadmissible and would lead to loss of information. The bilateral filter smooths the image while preserving clear boundaries [8].

Next, the filtered depth frames \( (R_{k} ,R_{k - 1} ) \) are processed to obtain a set of three-dimensional points \( (V_{k} ) \) and their normals \( (N_{k} ) \). Since the resolution of the depth camera is not high, it makes sense to scan the surface of the body in parts. A variation of the ICP algorithm is used to align the partial surface frames. The output of this step is a full surface frame \( (V^{g} ,N^{g} ) \) for the front and back parts of the body.

Next, because there are surface regions that cannot come into the view of either depth camera (“empty zones”), the surface in these zones must be completed. An algorithm for constructing vertices based on quadratic Bezier curves was developed for this purpose. The output of this stage is the overall surface \( (V^{{\prime }} ,N^{{\prime }} ) \) of the human body, in which the added vertices join the front and back surfaces. After the overall surface is computed, a polygonal model \( (M_{x,y,z} ) \) must be obtained. For this, a reconstruction algorithm based on solving the Poisson equation (Poisson Surface Reconstruction) is used [9].

As output the system produces a *.ply file containing a polygonal model of the surface of the human body. The overall operation of the system is shown in Fig. 1.

Fig. 1 Overall system workflow

3 Description of the Method for Constructing a Human Body Surface Model

Before starting the system, the MS Kinect devices must be positioned and space allocated for scanning the scene. The two devices are placed in front of and behind the person, facing in opposite directions. The distance between the two devices depends on the height of the scanned person. The devices are mounted at half the height of the scanned person.

During scanning, the person has to stand motionless exactly in the center (i.e., equally distant from both devices) of the virtual cube bounded by the camera planes of the MS Kinect devices. In general, the user can take any motionless pose; however, no part of the front or back surface should block the camera's view.

The vertical viewing angle of the MS Kinect camera is limited to 43°. This means the distance at which the camera can capture the full surface of the scanned person is constrained. However, if the surface is conceptually divided into three parts (top, middle, and bottom) with fairly large consecutive overlaps, the distance can be reduced. This increases the accuracy of the distance measurements, since each partial frame now covers only a part of the surface at the full 640 × 480 pixel resolution, rather than the whole surface in a single frame.

This completes the preparatory stage of installing and checking the system. The physical placement of the system is shown in Fig. 2.

Fig. 2 Illustration of the system setup

Before the frames of the depth stream are processed, they must be filtered. In this work the bilateral filter [10] is used, with the pixel intensity taken to be the distance to the surface. The new depth value \( ( {\text{R}}_{k} ) \) at a pixel u is given by the following formula, where \( R_{k}^{{\prime }} \) is the measured depth at a pixel of the depth frame, u is the central pixel of the filter window, \( {\mathcal{N}}_{\sigma } \left( t \right) = { \exp }( - t^{2} \sigma^{ - 2} ) \) is a Gaussian function, and \( W_{\text{p}} \) is a normalizing weight.

$$ {\text{R}}_{\text{k}} \left( {\text{u}} \right) = \frac{1}{{{\text{W}}_{\text{p}} }}\sum\nolimits_{{{\text{q}} \in {\mathcal{U}}}} {{\mathcal{N}}_{{\sigma_{\text{s}} }} (||{\text{u}} - {\text{q||}}_{2} ){\mathcal{N}}_{{\sigma_{\text{r}} }} (||{\text{R}}_{\text{k}}^{{\prime }} ({\text{u}}) - {\text{R}}_{\text{k}}^{{\prime }} ({\text{q}})||_{2} ){\text{R}}_{\text{k}}^{{\prime }} ({\text{q}})} $$
(1)
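For illustration, Eq. (1) can be sketched in Python as follows. This is a minimal reference version, not the system's actual implementation; the window radius and the values of σs and σr are illustrative assumptions.

```python
import numpy as np

def bilateral_filter_depth(depth, radius=2, sigma_s=2.0, sigma_r=30.0):
    """Bilateral filter for a depth frame (Eq. 1): smooths additive noise
    while preserving depth discontinuities between surface levels."""
    h, w = depth.shape
    out = np.zeros_like(depth, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch = depth[y0:y1, x0:x1].astype(np.float64)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            # Spatial Gaussian: penalizes distance from the central pixel u.
            spatial = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / sigma_s ** 2)
            # Range Gaussian: penalizes large depth differences, so sharp
            # boundaries between surface levels are not smoothed away.
            range_w = np.exp(-((patch - depth[y, x]) ** 2) / sigma_r ** 2)
            weights = spatial * range_w
            out[y, x] = (weights * patch).sum() / weights.sum()  # 1/Wp factor
    return out
```

A single noisy pixel on a flat surface is pulled back toward its neighbors, while a 500 mm depth step between surface levels remains sharp.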

After the bilateral filter is applied, a smoothed depth frame is obtained. To transform a depth frame into a cloud of three-dimensional points, the first step is to compute the perspective projection matrix of the camera (K). It can be obtained from the internal calibration matrix of the MS Kinect camera.

Each pixel of a depth frame is represented by a vector, where x is the pixel coordinate along the frame width, y is the coordinate along the frame height, and z is the depth value.

$$ {\text{Q}} = \left[ {\begin{array}{*{20}c} {\text{x}} & {\text{y}} & {\text{z}} \\ \end{array} } \right]^{\text{T}} $$
(2)

To obtain the set of three-dimensional vertices \( (V_{k} ) \), the homogeneous coordinate vector of each pixel of the depth frame is multiplied by the inverse intrinsic matrix and scaled by the measured depth \( D_{k} \).

$$ {\text{V}}_{\text{k}} ({\text{q}}) = {\text{D}}_{\text{k}} ({\text{q}}){\text{K}}^{ - 1} \left[ {\begin{array}{*{20}c} {\text{x}} & {\text{y}} & 1 \\ \end{array} } \right]^{\text{T}} $$
(3)
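A minimal sketch of this back-projection in Python (the intrinsic matrix K used in the test below is a made-up example, not calibrated Kinect intrinsics):

```python
import numpy as np

def backproject(depth, K):
    """Back-project a depth frame into a cloud of 3-D vertices (Eq. 3):
    each pixel (x, y) with depth d maps to d * K^{-1} [x, y, 1]^T."""
    h, w = depth.shape
    K_inv = np.linalg.inv(K)
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # homogeneous coords
    rays = pixels @ K_inv.T          # viewing ray per pixel
    return rays * depth[..., None]   # scale each ray by the measured depth
```

A pixel at the principal point maps straight down the optical axis, so with depth 1000 mm it yields the point (0, 0, 1000).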

The next step is to calculate a normal for each vertex. The normal vector \( ( {\text{N}}_{\text{k}} ) \) is calculated from the coordinates of the neighboring vertices.

$$ {\text{N}}_{\text{k}} ({\text{q}}) = \left( {{\text{V}}_{\text{k}} ({\text{x}} + 1,{\text{y}}) - {\text{V}}_{\text{k}} ({\text{x}},{\text{y}})} \right) \times \left( {{\text{V}}_{\text{k}} ({\text{x}},{\text{y}} + 1) - {\text{V}}_{\text{k}} ({\text{x}},{\text{y}})} \right) $$
(4)
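Eq. (4) can be sketched over a whole vertex map at once; this is an illustrative vectorized version with the normals additionally normalized to unit length (an assumption, since Eq. (4) leaves them unnormalized):

```python
import numpy as np

def compute_normals(V):
    """Per-vertex normals via the cross product of differences to
    neighbouring vertices (Eq. 4), then normalised to unit length.
    V has shape (H, W, 3); the result has shape (H-1, W-1, 3)."""
    dx = V[:, 1:, :] - V[:, :-1, :]   # V(x+1, y) - V(x, y)
    dy = V[1:, :, :] - V[:-1, :, :]   # V(x, y+1) - V(x, y)
    n = np.cross(dx[:-1, :, :], dy[:, :-1, :])
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.where(norm > 0, norm, 1.0)  # avoid division by zero
```

For a flat vertex grid at constant depth, every normal comes out as (0, 0, 1), as expected.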

The front and back surfaces have to be constructed by aligning the partial frames. The top, middle, and bottom frames mutually overlap, so a variant of the ICP (Iterative Closest Point) algorithm [11] is used to align them. First the top and middle frames are aligned, then the middle and bottom frames. In the last step the resulting frames are merged into a single surface (front and back). The scheme of the overlaps in the partial frames is shown in Fig. 3.

Fig. 3 Scheme of the overlaps in the partial frames

The first step of the algorithm is to find coinciding vertices in two partial frames. The current frame \( ( {\text{R}}_{\text{k}} ) \) is compared with the previous frame \( ( {\text{R}}_{{{\text{k}} - 1}} ) \). If the transformation matrix of the camera from the previous step (the previous partial depth frame) is known, the vertices can be projected into global coordinates \( ( {\text{V}}_{{{\text{i}} - 1}} ) \). Since the Kinect camera is fixed at one point and has only one degree of freedom (rotation about the X axis), the transformation matrix has the form

$$ {\text{T}}_{{{\text{i}} - 1}} = \left[ {\begin{array}{*{20}c} 1 & 0 & 0 \\ 0 & {\cos \,{\upvarphi }_{{{\text{i}} - 1}} } & { - \sin \,{\upvarphi }_{{{\text{i}} - 1}} } \\ 0 & {\sin \,{\upvarphi }_{{{\text{i}} - 1}} } & {\cos \,{\upvarphi }_{{{\text{i}} - 1}} } \\ \end{array} } \right] $$
(5)
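The single-degree-of-freedom transform of Eq. (5) is easy to verify numerically; the following is an illustrative helper, not part of the described system:

```python
import math

def rotation_about_x(phi):
    """Camera transform of Eq. (5): rotation by angle phi about the X axis,
    the only degree of freedom of the fixed Kinect camera."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0],
            [0.0,   c,  -s],
            [0.0,   s,   c]]

def apply(T, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(T[i][j] * v[j] for j in range(3)) for i in range(3)]
```

Rotating the unit Y vector by 90° about X sends it onto the Z axis, and since the matrix is orthogonal its inverse in Eq. (6) is simply its transpose.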

Thus the formula for projecting vertices onto the depth map (p) has the form

$$ {\text{V}}_{{{\text{i}} - 1}} = {\text{T}}_{{{\text{i}} - 1}}^{ - 1} {\text{V}}_{{{\text{i}} - 1}}^{\text{g}} $$
(6)

Further, if the projection contains vertices (\( {\text{V}}_{\text{i}} \)), new normal values (n) must be calculated for each vertex.

$$ {\text{n}} = {\text{T}}_{\text{i}} {\text{N}}_{\text{i}} ({\text{p}}) $$
(7)

The coordinates of the vertices do not need to be recalculated, since the camera is not displaced along the spatial axes in a new frame. If the following quantity does not exceed the threshold value (ts), the current point is recognized as matched; otherwise the point is considered new (belonging to the new depth frame).

$$ ||{\text{n}} \times {\text{n}}_{{{\text{i}} - 1}}^{\text{g}} || < ts $$
(8)

The transformation T is then estimated by minimizing the point-to-plane error over the matched vertices:

$$ {\text{argmin}}_{T} \sum\nolimits_{u} {||(TV_{i} (u) - V_{i - 1}^{g} (u)) \cdot n_{i - 1}^{g} (u)||^{2} } $$
(9)
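The energy of Eq. (9) can be evaluated with a short sketch (illustrative only; a real ICP step would also solve for the minimizing T rather than just evaluate the error):

```python
import numpy as np

def point_to_plane_error(T, V_cur, V_prev, N_prev):
    """Point-to-plane ICP energy of Eq. (9): sum of squared distances of
    each transformed current vertex to the tangent plane (defined by the
    normal) of its matched vertex in the previous frame."""
    diff = V_cur @ T.T - V_prev               # T*V_i(u) - V^g_{i-1}(u)
    return np.sum(np.sum(diff * N_prev, axis=1) ** 2)  # dot with normals
```

With the identity transform and identical clouds the error is zero; shifting the current cloud by 0.5 along the normals yields an error of 2 × 0.5² = 0.5.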

The output of this stage is the back and front surfaces in the form of clouds of three-dimensional points with their normals.

After the pairwise alignment of the partial frames, separate back and front surfaces of the body are obtained. Between these surfaces there remains a thin strip of “empty zones”. These zones arise because the physical arrangement of the cameras makes them inaccessible to scanning.

The main idea of the method for filling the empty zones is shown in Figs. 4 and 5.

Fig. 4 Illustration of the method of constructing the surface in “empty zones”

Fig. 5 Construction of the quadratic Bezier curve

To find the boundary vertices, depth thresholding with the Sobel operator (3 × 3) is used [12]. A vertex \( (P_{1} ) \) is successively selected from the set of boundary vertices of the front surface \( (P_{f} ) \). For each such vertex, the nearest vertex \( (P_{2} ) \) is selected from the set for the back surface \( (P_{b} ) \). Then the normal lines of both vertices are constructed and their intersection point \( (P_{c} ) \) is computed. From these vertices a quadratic Bezier curve is constructed by the following formula, where B(t) is an auxiliary vertex and t is a parameter in [0, 1].

$$ {\text{B}}\left( {\text{t}} \right) = \left( {1 - {\text{t}}} \right)^{2} {\text{P}}_{1} + 2{\text{t}}\left( {1 - {\text{t}}} \right){\text{P}}_{\text{c}} + {\text{t}}^{2} {\text{P}}_{2} $$
(10)
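Eq. (10) and the filling of one empty-zone strip can be sketched as follows (the sampling count n is an illustrative assumption; the paper does not state how densely the curve is sampled):

```python
def quadratic_bezier(p1, pc, p2, t):
    """Quadratic Bezier point of Eq. (10):
    B(t) = (1-t)^2 P1 + 2t(1-t) Pc + t^2 P2."""
    a, b, c = (1 - t) ** 2, 2 * t * (1 - t), t ** 2
    return tuple(a * u + b * v + c * w for u, v, w in zip(p1, pc, p2))

def fill_empty_zone(p1, pc, p2, n=10):
    """Sample n+1 vertices along the curve bridging a boundary vertex of
    the front surface (P1) and of the back surface (P2) through the
    normal-intersection point (Pc)."""
    return [quadratic_bezier(p1, pc, p2, i / n) for i in range(n + 1)]
```

The curve starts exactly at P1, ends exactly at P2, and is pulled toward Pc in between, which is what joins the front and back boundaries smoothly.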

Thus, the “empty zones” are filled with a set of vertices B(t), as shown in Fig. 5.

The output is the overall model of the human body in the form of mutually aligned clouds of three-dimensional vertices. The last stage is the construction of the mesh by the method based on solving the Poisson equation. The result is a polygon mesh describing the geometry of the human body in the *.ply format.
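The *.ply output format is simple enough to sketch directly; the following minimal ASCII writer is illustrative and not the library code the system uses (PCL handles the actual export):

```python
def write_ply(path, vertices, faces):
    """Write a polygon mesh to an ASCII *.ply file: a header declaring
    the vertex and face counts, then one line per vertex and per face."""
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(vertices)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write(f"element face {len(faces)}\n")
        f.write("property list uchar int vertex_indices\n")
        f.write("end_header\n")
        for x, y, z in vertices:
            f.write(f"{x} {y} {z}\n")
        for face in faces:  # each face: count followed by vertex indices
            f.write(str(len(face)) + " " + " ".join(map(str, face)) + "\n")
```

Writing a single triangle produces a file any standard mesh viewer can open.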

4 Results

On the basis of this method we have developed a program for reconstructing the surface geometry of the human body. The Poisson reconstruction is implemented with the PCL (Point Cloud Library). The alignment of the partial surface frames is implemented on the basis of the Kinect Fusion library. The architecture of the developed program is shown in Fig. 6. In the near future we plan to add functionality for animating the models using a BVH skeleton.

Fig. 6 Architecture of the developed program

The left part of Fig. 7 shows the partial depth frames that are processed by our system. The right part shows the reconstructed body model after filtering and alignment.

Fig. 7 Example of a reconstructed female body