
1 Introduction

Motion capture is the process of recording the real-life gestures and movements of a subject as sequences of Cartesian coordinates in 3D space. Motion capture systems have applications in various domains, such as surveillance [1,2,3,4], assistive human–computer interaction [5, 6], deaf sign word recognition [7,8,9], computational behavioral science [10, 11] and consumer behavior analysis [12]. In these domains, the focus is the detection, recognition and analysis of human movements and behavioral actions.

Motion capture systems are classified as magnetic, mechanical, or optical. In a magnetic system, electromagnetic sensors connected to a computer produce real-time 3D data at a low processing cost, but some movements are restricted by the cabling. A mechanical motion capture system uses suits with integrated sensors that record real-time movements as 3D data.

Optical motion capture uses cameras to reconstruct the body pose of the performer. One approach uses an array of multiple synchronized cameras to track markers placed at key locations on the body.

As described in [13], the computer vision community defines a skeleton as a schematic model of the human body. Skeletal parameters and motion attributes can be used to represent gestures, commonly known as actions; consequently, the pose of the human body is described by the relative locations of the joints in the skeleton. Application domains such as gaming and human–computer interfaces benefit greatly from this technology.
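As a minimal sketch of describing a pose by relative joint locations, the snippet below subtracts a root joint from every joint position so that the description no longer depends on where the subject stands in the capture volume. The joint names, the coordinates and the choice of the wrist as the root are illustrative assumptions, not part of [13].

```python
import numpy as np


def relative_pose(joints, root):
    """Express every joint position relative to the root joint.

    joints: dict mapping a joint name to its 3D position in the global frame.
    Subtracting the root position removes the global translation of the subject.
    """
    origin = np.asarray(joints[root], dtype=float)
    return {name: np.asarray(pos, dtype=float) - origin for name, pos in joints.items()}


# Hypothetical hand joints in millimetres (global coordinates).
pose = {
    "wrist": [100.0, 200.0, 900.0],
    "index_mcp": [110.0, 280.0, 905.0],
    "index_tip": [115.0, 350.0, 900.0],
}
rel = relative_pose(pose, root="wrist")
print(rel["index_tip"])  # index tip relative to the wrist: (15, 150, 0)
```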

In recent years, motion capture technology has been applied to capturing hand motions, which in turn has greatly advanced sign language recognition in 3D environments and improves the efficiency and accuracy of sign language recognition systems. The biomechanical anatomy of the human hand involves many degrees of freedom (DoF), which makes mapping external measurements to functional variables complex. Hence, accurate assessment of human hand kinematics is a difficult task [14].

Current sign language technologies are based primarily on virtual reality and cannot, on their own, produce animations in which the signer associates locations in space with the entities under discussion. Automatically associating signs with spatial locations is closely related to the sign recognition process: without extracting the signature of a sign word through motion analysis, the spatial location of a sign cannot be adjusted when the location of the entities under discussion changes.

Kinematic analysis of human hand movements is increasingly in demand for sign language recognition, because the correctness of the finger movements is critical for identifying the intended sign. Over the past few decades, however, development has been largely confined to human gait analysis in clinical research. In clinical gait analysis, positional changes of the markers attached to the skin greatly affect the analysis. When a sign is performed, in contrast, the small displacements in marker positions caused by skin movement do not affect sign recognition.

The optical motion capture workflow for a subject is shown in Fig. 1.

Fig. 1

3D motion capture model

Hardware Setup: As an initial stage, the capture volume must be defined and the cameras fixed in positions such that every marker is seen by at least two cameras; the region covered by the cameras is known as the field of view. Using the Vicon Nexus software interface, the cameras are calibrated and a global coordinate system is set to produce reliable 3D data. A special calibration wand is used to calibrate the cameras.
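The two-camera requirement can be checked per marker with a simplified visibility test. The sketch below assumes an idealized camera that sees any point within a fixed cone around its optical axis and ignores occlusion, lens distortion and image aspect ratio; the function names, the 30° half-angle and the camera layout in the example are hypothetical and are not taken from Vicon Nexus.

```python
import numpy as np


def cameras_seeing(point, cam_positions, cam_axes, half_fov_deg):
    """Count the cameras whose viewing cone contains `point`.

    Idealized model: a camera sees the point if the angle between its optical
    axis and the ray from the camera to the point is at most half_fov_deg.
    Occlusion, lens distortion and image aspect ratio are ignored.
    """
    point = np.asarray(point, dtype=float)
    threshold = np.cos(np.radians(half_fov_deg))
    count = 0
    for pos, axis in zip(cam_positions, cam_axes):
        ray = point - np.asarray(pos, dtype=float)
        axis = np.asarray(axis, dtype=float)
        cos_angle = ray @ axis / (np.linalg.norm(ray) * np.linalg.norm(axis))
        if cos_angle >= threshold:
            count += 1
    return count


def is_reconstructable(point, cam_positions, cam_axes, half_fov_deg=30.0):
    # 3D reconstruction needs the marker in view of at least two cameras.
    return cameras_seeing(point, cam_positions, cam_axes, half_fov_deg) >= 2


# Hypothetical layout (metres): two cameras aimed at the origin, one facing away.
positions = [(3.0, 0.0, 1.0), (-3.0, 0.0, 1.0), (0.0, 3.0, 1.0)]
axes = [(-3.0, 0.0, -1.0), (3.0, 0.0, -1.0), (0.0, 1.0, 0.0)]
print(is_reconstructable((0.0, 0.0, 0.0), positions, axes))  # True
print(is_reconstructable((0.0, 0.0, 5.0), positions, axes))  # False
```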

Subject Preparation: The signer (the subject) wears a set of passive retroreflective markers attached to the surface of the hand, and a static trial is captured. The skeletal structure of the subject and the marker set are described in Nexus and stored as a Vicon Skeleton Template (.vst) file.

Motion Data Capture: In the dynamic trial, the cameras capture the light reflected from the markers, which appears as blobs at precise image locations. The cameras then use the calibration information to reconstruct the markers as 3D coordinates.
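The reconstruction step can be illustrated with linear triangulation by the Direct Linear Transform (DLT). This is a standard textbook method chosen here for illustration; the paper does not state which algorithm the Vicon software uses internally. The sketch assumes each camera's calibration is available as a 3×4 projection matrix and that blob centroids have already been matched to the same marker across cameras; the camera pair in the example is hypothetical.

```python
import numpy as np


def triangulate_dlt(projections, image_points):
    """Linear (DLT) triangulation of one marker seen by two or more cameras.

    projections: 3x4 projection matrices from camera calibration.
    image_points: matching (u, v) blob centroids, one per camera.
    Each view adds the rows u*P[2] - P[0] and v*P[2] - P[1]; the homogeneous
    3D point is the right singular vector with the smallest singular value.
    """
    rows = []
    for P, (u, v) in zip(projections, image_points):
        P = np.asarray(P, dtype=float)
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.vstack(rows))
    X = vt[-1]
    return X[:3] / X[3]  # back from homogeneous to Cartesian coordinates


def project(P, X):
    # Pinhole projection of a Cartesian 3D point to image coordinates.
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical calibrated pair: identity intrinsics, second camera shifted 1 unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, 0.1, 2.0])
observations = [project(P1, X_true), project(P2, X_true)]
print(triangulate_dlt([P1, P2], observations))  # approximately [0.2, 0.1, 2.0]
```

With more than two cameras the same function simply stacks more rows, so the estimate becomes a least-squares fit over all views that see the marker.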

Subject Calibration: The VST file need not be 100% accurate, since finger lengths vary from subject to subject. Subject calibration resolves this by adapting the template to the individual subject.
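A simple way to picture subject calibration is to measure segment lengths from the static trial and compare them with the template. The sketch below does only that; it is not the Vicon Nexus subject calibration procedure, and the segment names, marker indices and template lengths are hypothetical.

```python
import numpy as np


def segment_lengths(static_trial, segments):
    """Mean marker-to-marker distance for each segment over a static trial.

    static_trial: array of shape (n_frames, n_markers, 3).
    segments: dict mapping a segment name to a pair of marker indices.
    Averaging over frames reduces the effect of marker jitter.
    """
    lengths = {}
    for name, (a, b) in segments.items():
        d = np.linalg.norm(static_trial[:, a, :] - static_trial[:, b, :], axis=1)
        lengths[name] = float(d.mean())
    return lengths


def scale_factors(template, measured):
    # Factor by which each template segment must be scaled to fit this subject.
    return {name: measured[name] / template[name] for name in template}


# Hypothetical index-finger markers (MCP, PIP, DIP) in mm, two identical frames.
frame = [[0.0, 0.0, 0.0], [0.0, 45.0, 0.0], [0.0, 70.0, 0.0]]
static_trial = np.array([frame, frame])
segments = {"index_proximal": (0, 1), "index_middle": (1, 2)}
template = {"index_proximal": 40.0, "index_middle": 25.0}

measured = segment_lengths(static_trial, segments)
print(scale_factors(template, measured))  # {'index_proximal': 1.125, 'index_middle': 1.0}
```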

Kinematic Fitting: All captured trials are labeled, and the kinematic model is fitted to compute the joint angles. These joint angles are the outputs of motion capture and are needed to drive the 3D model.
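Once markers are labeled, a joint angle can be approximated from three markers on adjacent segments. This sketch computes the angle at the middle marker and treats 180° minus that angle as flexion; it is a simplification assumed for illustration, whereas kinematic fitting fits a full model to all markers. The marker coordinates in the example are hypothetical.

```python
import numpy as np


def joint_angle(proximal, joint, distal):
    """Angle in degrees at `joint` between the segments towards `proximal` and `distal`.

    180 degrees means the two segments are collinear (joint fully extended).
    """
    u = np.asarray(proximal, dtype=float) - np.asarray(joint, dtype=float)
    v = np.asarray(distal, dtype=float) - np.asarray(joint, dtype=float)
    cos_a = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))))


def flexion(proximal, joint, distal):
    # Flexion measured here as the deviation from a straight segment chain.
    return 180.0 - joint_angle(proximal, joint, distal)


# Hypothetical MCP, PIP and DIP marker positions (mm) for one frame.
mcp, pip = [0.0, 0.0, 0.0], [0.0, 40.0, 0.0]
print(flexion(mcp, pip, [0.0, 65.0, 0.0]))    # straight finger: about 0
print(flexion(mcp, pip, [0.0, 40.0, -25.0]))  # bent at the PIP joint: about 90
```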

2 Motion Capture Setup

The finger movements of a signer can be recorded using a motion capture setup. We performed motion capture using a six-camera Vicon system at 100 Hz. Each hand is represented by 22 markers of 6.4 mm, and 14 mm markers were used for the head. In total, 54 markers were used to represent a signer. Fig. 2 shows the camera arrangement in a 3D Cartesian coordinate system.
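Under this setup, one trial can be stored as an array of shape (frames, markers, 3). The sketch below assumes that layout (our choice for illustration, not a Vicon export format) and computes per-marker speeds at the 100 Hz capture rate with finite differences; the trial data are hypothetical.

```python
import numpy as np

FRAME_RATE_HZ = 100.0  # capture rate used in this setup
N_MARKERS = 54         # markers per signer in this setup


def marker_speeds(trial, frame_rate=FRAME_RATE_HZ):
    """Speed of every marker in every frame.

    trial: array of shape (n_frames, n_markers, 3) with Cartesian positions,
    e.g. in millimetres. Returns shape (n_frames, n_markers) in units per
    second, using central differences (one-sided at the ends) via np.gradient.
    """
    velocity = np.gradient(trial, 1.0 / frame_rate, axis=0)
    return np.linalg.norm(velocity, axis=2)


# Hypothetical 2 s trial: marker 0 moves 1 mm per frame along x, the rest are still.
n_frames = 200
trial = np.zeros((n_frames, N_MARKERS, 3))
trial[:, 0, 0] = np.arange(n_frames)
speeds = marker_speeds(trial)
print(speeds.shape)    # (200, 54)
print(speeds[100, 0])  # about 100 mm/s (1 mm per frame at 100 Hz)
```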

Fig. 2

Camera positions in a 3D Cartesian coordinate system

2.1 Marker Placement for Hand Motion Capture

Several marker placement methods have been proposed. Miyata [15] introduced a model with 25 markers per hand to measure wrist and finger joint angles. Carpinella [16] developed a hand model with fewer markers, but it does not include wrist movements. In [17], three linear markers were placed at the metacarpophalangeal, proximal interphalangeal and distal interphalangeal joints of each finger.

Although several models have been proposed, sign language requires a more sophisticated hand model to produce accurate data. Figure 4 shows the proposed hand model, with marker placements based on the hand anatomy shown in Fig. 3.

Fig. 3

The hand anatomy

The model in Fig. 4 can capture every sign language character so that it can be recognized accurately. In 2D, by contrast, the sign is viewed from only one direction, so information visible only from other directions can be lost, leading to misclassification. As shown in Fig. 5, blurring also degrades recognition in 2D and leads to false classification.
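The loss of information in a single 2D view can be shown with a minimal projection example. The sketch assumes an idealized orthographic camera that simply discards the depth coordinate; a real camera uses perspective projection, where the ambiguity lies along each viewing ray instead, but one dimension is lost either way. The two hand poses are hypothetical.

```python
import numpy as np


def orthographic_view(markers):
    """Idealized single 2D view: keep (x, y) and discard the depth coordinate z."""
    return np.asarray(markers, dtype=float)[:, :2]


# Two hypothetical poses (mm): a palm marker and a fingertip marker.
# They differ only in how far the fingertip lies along the viewing direction.
pose_a = np.array([[0.0, 0.0, 0.0], [10.0, 60.0, 0.0]])
pose_b = np.array([[0.0, 0.0, 0.0], [10.0, 60.0, -35.0]])

same_in_2d = np.allclose(orthographic_view(pose_a), orthographic_view(pose_b))
same_in_3d = np.allclose(pose_a, pose_b)
print(same_in_2d, same_in_3d)  # True False
```

A classifier given only the 2D view cannot separate these two poses, while the 3D coordinates keep them distinct.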

Fig. 4

Marker placement design and actual markers used for capturing sign language

Fig. 5

a 2D sign with missing finger information, b 2D sign with blur

3 Results and Discussions

To validate our approach, we captured the Indian Sign Language sign ‘Good Morning’ in 3D using motion capture technology and obtained the sign from all directions, revealing information that is missed in 2D. Figure 6 shows the sign at different orientations. In frames 3 and 4, finger information is missing, but the 3D data provide this information because the sign is captured in all orientations. Angle plots and marker position plots were obtained, as shown in Fig. 7; these support a detailed study and accurate classification of the sign.
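Viewing a captured frame from another direction amounts to a rigid rotation of the 3D marker coordinates, which is why a single capture can be inspected at any orientation. The sketch below assumes a z-up global coordinate system and rotates a frame about the vertical axis; the function names and example coordinates are hypothetical.

```python
import numpy as np


def rotate_about_vertical(markers, angle_deg):
    """Rotate an (n_markers, 3) frame about the vertical z axis by angle_deg."""
    t = np.radians(angle_deg)
    R = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t),  np.cos(t), 0.0],
                  [0.0,        0.0,       1.0]])
    # Markers are row vectors, so apply R from the right as its transpose.
    return np.asarray(markers, dtype=float) @ R.T


def views(markers, angles_deg=(0, 90, 180, 270)):
    # One rotated copy of the frame per requested viewing azimuth.
    return {a: rotate_about_vertical(markers, a) for a in angles_deg}


# Hypothetical frame with two markers (mm).
frame = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.5]])
rotated = views(frame)
print(np.round(rotated[90], 6))  # first marker moves from the x axis to the y axis
```

Because the rotation is rigid, distances between markers (and hence finger geometry) are identical in every view.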

Fig. 6

3D image data captured for a sign from Indian Sign Language. The meaning of the sign is ‘good morning’

Fig. 7

3D data plots of signs during motion capture

4 Conclusion

The results show the advantage of 3D motion capture for sign language recognition. Motion capture records data in a Cartesian coordinate system, which provides information from all orientations, whereas in 2D some finger information is missing and blurring also affects recognition. Motion capture is largely unaffected by image blur, lighting and color changes, and its multiple camera views reduce the effect of self- and external occlusions. A 3D hand model was designed for marker placement to capture the signs meaningfully. We therefore conclude that 3D motion capture is best suited for Indian Sign Language recognition. We are continuing to develop algorithms to process and classify the signs of Indian Sign Language.

Declaration: The images used in this work are private images, and due permission has been obtained. The authors of the paper bear full responsibility if any issues arise from this; the publisher is not responsible.