1 Introduction

Among the basic needs in intelligent environments is the supply of personalized information to users through embedded systems with which users can interact naturally. Hand Posture Recognition (HPR) techniques are therefore of interest for facilitating daily life: HPR applications are beginning to be used to control home appliances, to interact with computer games, or to translate sign language. HPR is a further input modality for communicating with ubiquitous systems and achieving intuitive, natural interaction. The purpose of this study was the detection of hand movements in real time. This is not easy because hands appear in many forms and from many viewpoints: showing the palm or the fist, partially hidden, and with a wide variety of finger positions. The real-time detector proposed is based on segmentation by background subtraction and on face and skin-color detection, supported by edge detection and the analysis of geometric shapes.

The main advantage of hand gesture recognition is that the user does not need to touch any input device. In human-computer interaction, there are examples of control by several types of hand movements. This paper presents a definition of basic hand movements and gestures for interaction with a user interface. The system is intended to support user demands in real time. Identifying and following hands requires a robust system that is able to recognize the complex structure of the hand, follow it, and interpret it. Some studies on real-life applications are described in [14, 23].

The application domains of hand gesture recognition [17] include desktop applications, sign language, robotics, virtual reality, home automation, smart TV, and medical environments.

In sign language recognition [4], hand segmentation is a key task in the gesture recognition process, which is why some authors use the HSV [5] or YCbCr [1] color models. Currently, machine learning techniques such as Hidden Markov Models [21] or Neural Networks [9] are used in the recognition process. In fact, some projects integrate devices such as Microsoft Kinect [12], Intel RealSense [8] or Leap Motion [15] in order to identify the different signs of the language.

In home automation, the WiSee project [16] detects a gesture performed anywhere in the home, because the system works through WiFi signals.

In games, it is worth pointing out that educational serious games use gesture recognition to improve learning and physical skills in preschool children [7].

In the field of medicine, gestures are useful for interacting with medical instruments and for managing hospital resources, and they give people with special needs an alternative way to interact with the computer. In addition, the work in [20] had the aim of developing an intelligent operating room. This operating room is composed of four subsystems, one of which is hand gesture recognition. The user is able to move X-ray images, select a patient's history from the database, or write a comment on an image.

The rest of the paper is organized as follows: Sect. 2 describes the methods that have been implemented in order to develop a functional prototype. Section 3 illustrates the prototype developed and the process used to evaluate it. Section 4 summarizes the conclusions and discusses future work.

2 The HPR Proposal for Natural Interaction

The methods used are described below: background subtraction, a cascade classifier for face detection, skin-color detection, shape identification, and feature tracking. The prototype was developed in C++ along with the Qt libraries (for the graphical user interface and event management), OpenGL (for 3D effects when presenting processed images), the standard C++ library, and the OpenCV library, which provides a large number of implementations of the algorithms most widely used in image analysis and processing. Figure 1 shows an overview of our workflow, and the subsections below discuss each technique used in detail.

Fig. 1. The schema of the workflow system

2.1 The Recognition Process

The gesture recognition process is organized in three main parts: Background Subtraction, Calibration and Hand Pose Recognition.

The background subtraction process, complemented by the skin-color detection method, separates the background from the image obtained by the camera. In this way, the image of the user is obtained in order to detect the hands, and the background is ignored.

The calibration process is responsible for identifying the position of the hands. First of all, the face location is detected to make the process faster and more efficient, since the hands will be close to the face and the process therefore does not search in unnecessary regions. Once the face has been identified, it is removed along with the background and the position of the hands is obtained.

Finally, in the HPR process, the Lucas-Kanade algorithm is used to track the hands and identify their position at all times. Then, the geometric feature extraction method identifies the contour of the hand and places a few significant points on the fingers to determine whether the hand is closed or open.
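The control flow just described can be summarized as a loop over the three phases. The following C++ sketch is only an illustrative outline under our own assumptions; the phase names and transitions are placeholders for the steps detailed in the next subsections, not the authors' actual implementation.

```cpp
#include <opencv2/opencv.hpp>

// Hypothetical phase names; the real prototype's internal structure is not published.
enum class Phase { CaptureBackground, Calibrate, Recognize };

int main() {
    cv::VideoCapture cam(0);
    cv::Mat frame;
    Phase phase = Phase::CaptureBackground;

    while (cam.read(frame)) {
        switch (phase) {
        case Phase::CaptureBackground:
            // Sect. 2.2: accumulate frames of the empty scene and build the
            // statistical background model; advance once the model is ready.
            phase = Phase::Calibrate;
            break;
        case Phase::Calibrate:
            // Sect. 2.3: locate the face, restrict the search region, take the
            // initial hand positions, then remove the face from the mask.
            phase = Phase::Recognize;
            break;
        case Phase::Recognize:
            // Sects. 2.4-2.6: skin-color masking, contour/convexity analysis
            // (open vs. closed hand) and Lucas-Kanade tracking on each frame.
            break;
        }
        cv::imshow("frame", frame);
        if (cv::waitKey(30) == 27) break;   // ESC to quit
    }
    return 0;
}
```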

The following sections explain the main methods of the process in more detail.

2.2 Background Subtraction

Background subtraction is a method used in computer vision to separate foreground objects from the background in images captured by a stationary video camera by calculating the differences between frames. The technique saves samples of previous images in memory and generates a background model based on statistical properties of those samples. From there, a binary image is constructed that acts as a mask for segmenting the background from the foreground objects. In addition, a Gaussian blur filter is used to smooth abrupt changes in the images. Figure 1 shows the result of background subtraction.
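The paper does not name the concrete OpenCV background model used. The minimal sketch below assumes OpenCV's BackgroundSubtractorMOG2, which likewise builds a statistical model from a history of previous frames, combined with the Gaussian blur mentioned above; the parameter values are illustrative only.

```cpp
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cam(0);
    // Statistical model built from a history of previous frames (assumption:
    // MOG2; the paper only says the model uses statistics of stored samples).
    cv::Ptr<cv::BackgroundSubtractorMOG2> model =
        cv::createBackgroundSubtractorMOG2(/*history=*/300, /*varThreshold=*/16,
                                           /*detectShadows=*/false);

    cv::Mat frame, blurred, mask;
    while (cam.read(frame)) {
        // Gaussian blur smooths abrupt pixel-level changes before modeling.
        cv::GaussianBlur(frame, blurred, cv::Size(5, 5), 0);
        model->apply(blurred, mask);          // binary foreground mask
        cv::imshow("foreground mask", mask);
        if (cv::waitKey(30) == 27) break;     // ESC to quit
    }
    return 0;
}
```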

2.3 Cascade Classifiers

OpenCV implements face detection by a statistical method based on training samples (images with faces and images without faces) from which information is extracted that distinguishes a face from a non-face. One of the most widely used methods, developed by Viola and Jones [18], which is trained to determine the characteristics of a particular object (face, eyes, hands, etc.), comes from this idea. In this study, the CascadeClassifier class (available in OpenCV) is used for face detection with two purposes: to detect the starting position of the hands with respect to the face at the time of system calibration, and to identify image regions where it is unnecessary to search for hands. Starting conditions for correct use of this prototype are defined, one of which is the starting location of the person at the beginning of the calibration stage.
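A minimal sketch of face detection with OpenCV's CascadeClassifier is shown below; the cascade file and the detection parameters are assumptions, since they are not specified in the text.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    // Pre-trained Haar cascade shipped with OpenCV (assumed file name/location).
    cv::CascadeClassifier face;
    if (!face.load("haarcascade_frontalface_default.xml")) return 1;

    cv::VideoCapture cam(0);
    cv::Mat frame, gray;
    while (cam.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::equalizeHist(gray, gray);

        std::vector<cv::Rect> faces;
        // Scale factor and minNeighbors are typical values, not the authors'.
        face.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(80, 80));

        // The detected rectangle gives the reference for the initial hand
        // position and marks a region that can be excluded from the hand search.
        for (const cv::Rect& r : faces)
            cv::rectangle(frame, r, cv::Scalar(0, 255, 0), 2);

        cv::imshow("faces", frame);
        if (cv::waitKey(30) == 27) break;     // ESC to quit
    }
    return 0;
}
```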

2.4 Skin Color Detection

Segmentation is complemented with skin-color detection to reduce the hand detection region. The Lab color space is used because a change in its values corresponds closely to a change of similar perceptual importance. The three parameters represent lightness (L = 0 is black and L = 100 is white), the position between green and red (a, negative for green and positive for red), and the position between blue and yellow (b, negative for blue and positive for yellow). Good results are obtained by selecting the range 109 to 133 for the a component.
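As a hedged illustration, the sketch below converts each frame to Lab with OpenCV and thresholds the a channel with the reported 109-133 range. Whether that range refers to OpenCV's 8-bit offset encoding, and the almost unconstrained L and b limits, are assumptions on our part.

```cpp
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cam(0);
    cv::Mat frame, lab, skinMask;
    while (cam.read(frame)) {
        // Convert BGR to Lab. In OpenCV's 8-bit Lab, L is scaled to 0-255 and
        // a, b are offset by 128; we assume the paper's 109-133 range for "a"
        // already uses this encoding.
        cv::cvtColor(frame, lab, cv::COLOR_BGR2Lab);

        // Keep pixels whose a channel falls in the reported range; L and b are
        // left essentially unconstrained (another assumption).
        cv::inRange(lab, cv::Scalar(0, 109, 0), cv::Scalar(255, 133, 255), skinMask);

        cv::imshow("skin mask", skinMask);
        if (cv::waitKey(30) == 27) break;     // ESC to quit
    }
    return 0;
}
```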

2.5 Geometric Features Extraction

Hand geometry extraction begins with the OpenCV findContours function, which returns the sets of points forming the contours in the binary input image [6]. Some contours may be contained within others; however, only the outermost contours are retained, and they are filled with a solid color. In this way, a new binary image is created that works as a mask to obtain a delineated image of the hands. The points on the convex hull of this image are likely to be fingertips. However, there will also be other convex points, because part of the arm may appear in the image. One way to identify the convex points corresponding to fingertips is to use convexity defects. The points that make up the convex contour are found with the convexHull function, and the points of the convexity defects are calculated with the convexityDefects function. The convex hull encloses the contour of the hand, and by selecting, in each segment of the hull, the contour point farthest from it, the points forming the bottom of the spaces between the fingers are found. These points are the convexity defects and allow the number of outstretched fingers to be determined very simply. An example is shown in Fig. 2.
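The following sketch illustrates this analysis on an already segmented binary hand mask, combining findContours, convexHull and convexityDefects and applying the three-finger rule described below; the defect-depth threshold is a guessed value, not one reported by the authors.

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Decide whether a binary hand mask (white hand on black background) shows an
// open or a closed hand.
bool isOpenHand(const cv::Mat& mask) {
    cv::Mat work = mask.clone();              // keep the caller's mask intact
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(work, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if (contours.empty()) return false;

    // Keep only the largest outer contour (assumed to be the hand).
    auto hand = *std::max_element(contours.begin(), contours.end(),
        [](const auto& a, const auto& b) { return cv::contourArea(a) < cv::contourArea(b); });

    // Convexity defects need the hull expressed as contour point indices.
    std::vector<int> hullIdx;
    cv::convexHull(hand, hullIdx, false, false);
    if (hullIdx.size() < 3) return false;

    std::vector<cv::Vec4i> defects;
    cv::convexityDefects(hand, hullIdx, defects);

    // Deep defects correspond to the valleys between outstretched fingers.
    // The depth threshold (pixels; d[3] is depth * 256) is a guessed value.
    int fingers = 0;
    for (const cv::Vec4i& d : defects)
        if (d[3] / 256.0 > 20.0) ++fingers;
    ++fingers;                                 // n valleys ~ n + 1 fingers

    // Rule used in this study: three or more fingers means the hand is open.
    return fingers >= 3;
}
```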

Fig. 2. The convex hull and convexity defects

One advantage of this finger detector is that the algorithm is very simple, and its disadvantage is its low precision in determining the number of extended fingers. However, in this study, it was only necessary to differentiate a closed hand from an open one, and this method is good enough for that. The threshold for distinguishing between a closed hand and an open one is three fingers.

Fig. 3. Forward-Backward Error method

2.6 Tracking Hands

The last stage consists of tracking the hands by means of the Lucas-Kanade method, based on patterns of apparent motion or optical flow. This method, developed by Bruce D. Lucas and Takeo Kanade, is a widely used differential method for estimating optical flow [2]. It assumes that the flow is constant in a local neighborhood of the pixel under consideration, and solves the basic optical flow equations for all the pixels in that neighborhood using the least squares criterion. It is assumed that the movement of the hand in the image between two consecutive frames is slight and approximately constant within the vicinity of the point considered. It may then be assumed that the optical flow equation holds for all the pixels within the neighborhood centered on that point. Among the OpenCV tools is the goodFeaturesToTrack function, which finds a group of pixels that are good for tracking. As would be expected, pixels good for tracking are those with texture and edges. This is a problem in images of hands because they have a uniform visual texture. After the groups of points (good features to track) are found, the calcOpticalFlowPyrLK function is used to find the corresponding features, or groups of pixels, in the next frame. Since the result could deliver false positives, the Forward-Backward Error method is used (see Fig. 3): the same procedure is applied from the second frame back to the first to estimate the features in the first frame. The features common to both passes are taken as the correct tracking features in the second frame.
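A sketch of this tracking step, including the Forward-Backward consistency check, might look as follows; the feature parameters and the error threshold are assumptions not given in the paper.

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

int main() {
    cv::VideoCapture cam(0);
    cv::Mat prev, next, prevGray, nextGray;
    if (!cam.read(prev)) return 1;
    cv::cvtColor(prev, prevGray, cv::COLOR_BGR2GRAY);

    while (cam.read(next)) {
        cv::cvtColor(next, nextGray, cv::COLOR_BGR2GRAY);

        // Corners with texture/edges are good candidates for tracking.
        std::vector<cv::Point2f> p0, p1, p0back;
        cv::goodFeaturesToTrack(prevGray, p0, 100, 0.01, 7);
        if (p0.empty()) { prevGray = nextGray.clone(); continue; }

        // Forward pass: previous frame -> current frame.
        std::vector<uchar> stF, stB;
        std::vector<float> err;
        cv::calcOpticalFlowPyrLK(prevGray, nextGray, p0, p1, stF, err);
        // Backward pass: current frame -> previous frame (Forward-Backward Error).
        cv::calcOpticalFlowPyrLK(nextGray, prevGray, p1, p0back, stB, err);

        for (size_t i = 0; i < p0.size(); ++i) {
            // Keep a point only if tracking it back lands close to its start.
            cv::Point2f d = p0[i] - p0back[i];
            double fbError = std::hypot(d.x, d.y);      // threshold below is a guess
            if (stF[i] && stB[i] && fbError < 1.0)
                cv::circle(next, p1[i], 3, cv::Scalar(0, 255, 0), -1);
        }

        cv::imshow("tracking", next);
        prevGray = nextGray.clone();
        if (cv::waitKey(30) == 27) break;     // ESC to quit
    }
    return 0;
}
```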

3 Practice and Experience

This section presents the prototype developed to test the hand posture recognition techniques described in the previous section and the real interface to which it has been applied. Finally, it reports the results of the experiments carried out under different conditions.

3.1 Prototype Interface

The experimental prototype requires an initial adjustment before starting, consisting of two steps: capturing images of the background, and exposing both hands for calibration. Figure 4 shows the graphical user interface for this initial adjustment. When the user presses the Calibration button, a message on the screen tells them to move out of the scene so that background images can be taken. Then the user is told to position their body and hands for calibration.

Fig. 4. Graphical user interface of our application

3.2 Application Domain

The mashup concept [22] in web development refers to Web applications that obtain data from other APIs or sources, thus creating a new service that is more useful to the user. The ENIA project [19, 24] is based on mashup interfaces (see Fig. 5). This kind of interface is the application domain of the gesture recognition flow. The main feature of this project is the development of a dynamic user interface that adapts to the user's habits. The user interface contains COTS interface components, such as widgets, which are managed by an intelligent agent.

Fig. 5. The ENIA interface

The user can perform different actions in the interface, such as opening or closing the menu, creating a component, or moving a component. The opening and closing hand gestures have been included in the ENIA project prototype and associated with the actions of opening and closing the menu.

Gesture recognition has been designed to recognize the gestures regardless of which hand performs them. The aim is that the interface can also be managed by people with hemiplegia, who can only move one side of their body, for instance the right hand or the left one; hence the system recognizes gestures made with either hand. In addition, this feature is useful for both left-handed and right-handed people.

3.3 Experiments

The experimental prototype was evaluated in different environments on an HP ENVY computer with a 2.2 GHz Intel Core i7 and 8 GB DDR3 SDRAM, at a rate of 15 fps with an RGB webcam at a resolution of 640 × 480 pixels. There are no problems in capturing background images as long as the camera remains immobile and there are no changes in that background. The clearest examples of events that cause problems for background subtraction are changes in scene lighting and the shadows a person could cast while using the prototype. Techniques such as those mentioned in [6] can be used to minimize these problems. During the initial calibration, optimum results are obtained in detecting the hand by its geometry and color. The problems that arise during tracking, with a decrease in accuracy rates, are due to the uniform visual texture of the hand, which impedes the detection of good points to track. Nevertheless, the results shown in Table 1 are promising.

Table 1. Accuracy rate in hand posture recognition

The test videos included in the test sessions were recorded with a non-uniform background, and three people performed different movements, each with one hand at a time. The videos show opening and closing hand motions, and vertical and horizontal movements, with the purpose of controlling the interface as a joystick. By closing the hand, the user takes control of the joystick; by opening the hand, the joystick control is released.

The videos for the test have these features:

  • 10 opening and closing hand movements (5 videos with each hand). Posture recognition was tested at two distances: 60 cm and 1 m. The subjects performed 200 opening and closing movements, and 20 videos were recorded in total.

  • 10 vertical motions with the closed hand (5 videos with each hand). Posture recognition was tested at two distances: 60 cm and 1 m. The subjects performed 200 vertical motions, and 20 videos were recorded in total.

  • 10 horizontal motions with the closed hand (5 videos with each hand). Posture recognition was tested at two distances: 60 cm and 1 m. The subjects performed 200 horizontal motions, and 20 videos were recorded in total.

This process was tested with three different people, each using a distinct background. The video dataset contains 180 videos with 1800 movements in total. The information in Table 1 shows a hit rate of 91%. However, it should be noted that the users who performed the tests already knew how the interface worked; therefore, the hit rate was not lowered by inexperienced users.

4 Conclusions and Future Work

This paper has shown evidence that the background subtraction technique, when the camera is kept stationary, is an excellent option for segmenting images. It should also be emphasized that the accuracy rate is increased by combining different techniques to detect the same object, without each individual technique having to reach a high accuracy rate. Customization of hand movements and of their associated effect on the controlled device is very important in natural interaction systems. This is logical, since in daily life different people may perform the same task with different movements; the most common example is right-handed and left-handed people performing their tasks with different hands.

Future work will recognize more gestures and attach them to other actions of the ENIA project, such as moving a component or zooming in and out. Apart from hand gesture recognition, we want to add face recognition so that this system can be included in adaptive Web user interfaces as an interaction mechanism for selecting and managing components. For ambient intelligence, this prototype could also facilitate work on the interaction and modeling of actions and behaviors in an intelligent building.