Abstract
There have been many studies of gesture recognition and posture estimation that combine real-world sensors with machine learning. In such work, it is important to consider the sensor layout, because the measurement results vary with the layout and number of sensors as well as with the motion to be measured. However, prototyping devices multiple times to find a sensor layout with high identification accuracy takes time and effort. Moreover, although learning data must be acquired to recognize gestures, collecting it again every time the sensor layout changes is time-consuming. In this study, we developed software for arranging real-world sensors; at present, it handles distance-measuring sensors. The user places these sensors freely in the software, which measures the distances between the sensors and a mesh created from real-world deformation recorded by a Kinect. A classifier is generated from the time-series distance data recorded by the software. In addition, we built a physical device with the same sensor layout as the one designed in the software and experimentally confirmed that gestures performed in front of the physical device could be recognized using the generated classifier.
1 Introduction
Physical changes in the real world can be measured by deploying real-world sensors in everyday environments. For such measurements, the number and layout of the sensors have to be determined in consideration of the real-world changes to be captured, the associated costs, device sizes, and so on. It is difficult to design an optimal sensor layout under these various constraints. In addition, gesture recognition and posture estimation can be performed by combining real-world sensors and machine learning. For example, Touché [1] identifies touch gestures by applying machine learning to the frequency response measured when the sensor is touched. Complicated actions can thus be identified by learning in advance, but a classifier must be generated from sensor data that is distinctive for each state to be identified. The number, position, and angle of sensors that can acquire such distinctive data are often determined by trial and error, and it takes time and effort to accumulate learning data in the real world every time the number or layout of the sensors is changed.
In this study, we developed software that helps users arrange real-world sensors and supports the collection of learning data. At present, the software handles distance-measuring sensors as real-world sensors. The user places models of the sensors freely in the software, and the software measures the distances between the placed sensors and real-world objects recorded by an RGB-D camera (a Kinect). The obtained distance data are recorded as time-series data to identify real-world changes such as gestures. Furthermore, to confirm that the software can accumulate learning data, we used a classifier generated from gesture data acquired by the software to identify gesture data acquired by sensors placed in the real world.
2 Related Works
2.1 Simulation of Real-World System on a Computer
There have been many studies that reconstruct real-world shapes and conditions on a computer for simulation. Ino et al. produced a CG model of a human hand by using motion-capture data and DhaibaWorks, software that can generate digital models of the human body [2]. They estimated grasping hand poses by training a convolutional neural network on rendered images of the CG model and three-dimensional position data such as hand joints. Kanaya et al. developed a human-mobility sensing-system simulator, called Humans, that uses map data, human-agent information, sensor information, and network information [3]. They simulated the walking behavior of people in a city while varying the kind and arrangement of sensors on the map. Yuan et al. trained on virtual egocentric video and the posture of a humanoid model walking in a virtual world [4]. They estimated the walking postures of people in the real world by combining images shot by a camera mounted on a pedestrian's head with the learning data from the virtual world. Because these studies are computer simulations, they have the advantage that learning data need not be accumulated in the real world, which saves both money and time.
In this study, the target of simulation is a real-world sensor. We developed software that helps users avoid the trial and error involved in setting up physical sensors and that also accumulates learning data.
2.2 Measuring Motion with Photo Sensors
We developed software that can handle distance-measuring sensors, which are a kind of photo sensor. There have been many studies on measuring and identifying human body movements with photo sensors. For instance, AffectiveWear is a device for recognizing facial expressions that measures the distance between the skin and the frame of a pair of glasses by using an array of photo-reflective sensors on the frame [5]. EarTouch recognizes gestures that use the ear as an input surface by measuring skin deformation inside the ear with photo-reflective sensors attached to earphones [6]. iRing measures the distance between the skin of a finger and photo-reflective sensors attached to a ring-shaped device to recognize finger bending [7]. Miyata et al. estimated the grasping posture of a hand by attaching a band-type device with embedded distance-measuring sensors to the grasped object [8].
In this study, we detected and identified hand gestures made in front of distance-measuring sensors.
3 System Implementation
3.1 Overview
We developed software for arranging the placement of distance-measuring sensors, which are often used as real-world sensors. The flow of our system, shown in Fig. 1, is divided into a phase of accumulating learning data with the software and a phase of gesture recognition using data acquired by distance-measuring sensors in the real world. In the learning phase, the user makes gestures in front of a Kinect, and time-series distance data are acquired by the sensors placed in the software. The acquired data are converted into image data, from which a histogram of oriented gradients (HOG) feature is extracted. A classifier is generated by training a support vector machine (SVM) on the obtained HOG features. In the recognition phase, the same kind of gesture is performed in front of distance-measuring sensors arranged in the real world in the same layout as in the software, and sensor data are acquired. The sensor data are converted first into distance values and then into an image, and the HOG feature is extracted. Gestures are then recognized by feeding this HOG feature to the classifier generated from the data acquired in the software.
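As a sketch of this pipeline, the following trains a linear SVM on HOG-style features of synthetic gesture images. The descriptor is a minimal HOG-like feature (per-cell orientation histograms without block normalization), and the image size, cell size, and gesture patterns are illustrative assumptions rather than the system's actual parameters.

```python
import numpy as np
from sklearn.svm import SVC

def hog_feature(img, cell=4, bins=9):
    """Minimal HOG-like descriptor: per-cell histograms of unsigned
    gradient orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i+cell, j:j+cell].ravel()
            a = ang[i:i+cell, j:j+cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    f = np.concatenate(feats)
    n = np.linalg.norm(f)
    return f / n if n > 0 else f

# Synthetic stand-in gesture images: a bright trace moving across a
# 16 x 16 image in one of two directions, plus a little noise.
rng = np.random.default_rng(0)
def make_image(direction):
    img = np.zeros((16, 16))
    for t in range(16):
        row = t if direction == 0 else 15 - t
        img[row, t] = 1.0
    return img + 0.05 * rng.standard_normal((16, 16))

X = np.array([hog_feature(make_image(d)) for d in (0, 1) for _ in range(10)])
y = np.array([0]*10 + [1]*10)
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([hog_feature(make_image(0)), hog_feature(make_image(1))]))
```

In the actual system, the images are generated from the recorded time-series distance data rather than from synthetic patterns.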
3.2 Configuration of Software
We used Unity to create the software. Real-world changes were captured as 3D depth information with a Kinect v2, and the shape of the real world was reconstructed in the virtual world in Unity. First, we converted the Kinect depth coordinate system into the real-world space coordinate system (Unity's coordinate system), and the 3D depth information was converted into a 3D mesh in Unity. To construct a high-resolution mesh, we used depth information from the central 1/4 × 1/4 region of the image taken with the Kinect. The simulated distance-measuring sensors, shown as the black cylindrical objects in Fig. 2, were placed in the virtual world holding the mesh, and the distances between the sensors and the mesh were measured.
The distance value was calculated with a binary search algorithm using Unity's Raycast function, which can detect collisions with a collider (the 3D mesh was set up as a collider). The binary search was refined down to the resolution of the Kinect's depth (1 mm); when there was no 3D mesh within the measurable range of a sensor, the distance value was set to the sensor's maximum measurable distance. The acquired distance values were displayed as graphs on Unity's game screen (Fig. 2) and saved as a CSV file.
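The binary search over ray length can be sketched as follows. Here `hits_mesh` is a hypothetical stand-in for Unity's `Physics.Raycast` collision test (it reports whether a ray of a given length hits the mesh collider); the 0.5 m maximum range and 1 mm resolution follow the description above.

```python
def ray_hit(hits_mesh, max_range=0.5, eps=0.001):
    """Binary-search the distance along a sensor's ray to the mesh surface.

    The search is refined down to eps (1 mm, the Kinect's depth
    resolution); if nothing is hit within max_range, the sensor's
    maximum measurable distance is returned, as in the paper.
    """
    if not hits_mesh(max_range):
        return max_range              # no mesh within the measurable range
    lo, hi = 0.0, max_range           # invariant: surface lies in (lo, hi]
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if hits_mesh(mid):
            hi = mid                  # a ray of length mid reaches the surface
        else:
            lo = mid                  # a ray of length mid falls short
    return hi

# Example: a surface 0.237 m from the sensor.
surface = 0.237
dist = ray_hit(lambda d: d >= surface)
print(round(dist, 3))
```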
3.3 Real-World Sensor Data Acquisition
The distance-measuring sensors in the real world were GP2Y0E02A sensors developed by Sharp Corporation. Each sensor consists of an infrared LED and a position sensitive detector (PSD) and measures the distance to a target object from the incident position of the reflected infrared light. The sensors can detect distances from 4 cm to 50 cm. Since the relationship between the sensor value and the distance value is linear, the distance value was calculated from the sensor value with Eq. (1) and recorded as time-series data.
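The linear sensor-to-distance conversion can be sketched as a two-point calibration along the sensor's 4–50 cm range. The calibration readings below (`adc_near`, `adc_far`) are hypothetical values, not datasheet numbers; in practice they would be measured at two known distances.

```python
def make_converter(adc_near, adc_far, d_near=4.0, d_far=50.0):
    """Return a function mapping a raw ADC reading to distance in cm.

    Because the GP2Y0E02A's output is linear in distance, two
    calibration readings taken at known distances (d_near, d_far)
    fix the line. The result is clamped to the 4-50 cm range.
    """
    slope = (d_far - d_near) / (adc_far - adc_near)
    def convert(adc):
        d = d_near + slope * (adc - adc_near)
        return min(max(d, d_near), d_far)
    return convert

# Hypothetical calibration: reading 560 at 4 cm, reading 140 at 50 cm.
to_cm = make_converter(adc_near=560, adc_far=140)
print(to_cm(560), to_cm(140), to_cm(350))
```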
The sensors were connected to a microcontroller (Arduino Pro Mini, 3.3 V), which sent the sensor data to a PC. Since the time intervals for acquiring distance data differed between the software and the distance-measuring sensors, the sensor data were subsampled to match the software's time interval (the longer of the two) and saved as a CSV file.
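The alignment of the two streams could be sketched as nearest-timestamp subsampling; the paper does not specify the exact matching rule, so this is one plausible reading.

```python
def subsample(sensor, software_times):
    """For each software timestamp, pick the sensor sample closest in time.

    sensor: list of (timestamp, value) pairs from the distance-measuring
    sensor (the faster stream); software_times: timestamps at which the
    software recorded data (the slower stream).
    """
    out = []
    for t in software_times:
        ts, v = min(sensor, key=lambda pair: abs(pair[0] - t))
        out.append(v)
    return out

# Hypothetical rates: sensor at ~50 Hz, software at ~0.30 s intervals.
sensor = [(i * 0.02, 10 + i) for i in range(100)]      # 0.00 .. 1.98 s
software_times = [i * 0.30 for i in range(6)]          # 0.0 .. 1.5 s
print(subsample(sensor, software_times))
```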
3.4 Obtaining Learning Data and Making an Identification
We converted the CSV files acquired by the software and the distance-measuring sensors into 2D grayscale image data.
The start of imaging was taken to be the moment when the distance data of any two sensors fell below 30 cm. Before imaging, we applied a low-pass (RC) filter to the distance data, of the form
y[t] = a·x + (1 − a)·y[t − 1]
(x: present data, y: time-series data, a: smoothing coefficient).
After obtaining the filtered distance data, we converted it into 2D-grayscale image data by means of normalization. In the converted image, each distance value was allocated to the vertical axis, and the time-series data were allocated to the horizontal axis.
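The filtering and imaging steps above can be sketched as follows. The smoothing coefficient `alpha` is an assumption (the paper does not state the filter constant), and the 17-sensor × 15-frame shape matches the evaluation setup described later.

```python
import numpy as np

def rc_filter(series, alpha=0.5):
    """First-order IIR low-pass (RC) filter:
    y[t] = alpha * x[t] + (1 - alpha) * y[t-1].
    alpha is an assumed smoothing coefficient."""
    out = [series[0]]
    for x in series[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return np.array(out)

def to_grayscale_image(frames):
    """frames: array of shape (n_frames, n_sensors) of filtered distances.
    Sensors map to the vertical axis, time to the horizontal axis, and
    values are normalized to the 0-255 grayscale range."""
    img = np.asarray(frames, dtype=float).T           # sensors x time
    lo, hi = img.min(), img.max()
    if hi > lo:
        img = (img - lo) / (hi - lo)
    return (img * 255).astype(np.uint8)

# 15 frames from 17 sensors, filtered per sensor and then imaged:
data = np.random.default_rng(1).uniform(4, 50, size=(15, 17))
filtered = np.stack([rc_filter(data[:, s]) for s in range(17)], axis=1)
img = to_grayscale_image(filtered)
print(img.shape)   # 17 sensors tall, 15 frames wide
```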
The SVM classifier was generated by training on the HOG features of the 2D grayscale images generated from the software data. A gesture performed in front of the distance-measuring sensors in the real world was then identified by applying this classifier to the HOG features of the 2D grayscale image generated from the sensor data.
4 Evaluation
4.1 Overview
We conducted experiments to determine whether the learning data accumulated by our software can be used to identify data measured with distance-measuring sensors in the real world. Seventeen distance-measuring sensors were attached around the perimeter of a Hue Go light developed by Philips, with a spacing of about 2.5 cm between adjacent sensors (Fig. 3, left). The sensors in the software were arranged in the same layout as those in the real world (Fig. 3, right). We then performed several gestures in front of the Kinect and acquired distance data. After that, the same gestures were performed in front of the distance-measuring sensors placed in the real world, and distance data were derived from the sensor readings. Figure 4 shows the types of gesture identified and examples of the generated images. Each gesture was performed ten times in front of the Kinect and ten times in front of the distance-measuring sensors, and 15 frames were acquired and imaged per gesture. Leave-one-out cross-validation was performed on the dataset acquired with the software to calculate the identification rate, and a classifier was generated. Finally, we identified each gesture acquired by the distance-measuring sensors using the classifier generated from the data acquired with the software.
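The leave-one-out evaluation can be sketched with scikit-learn. The feature vectors below are synthetic stand-ins for the per-trial HOG features (four well-separated gesture classes, ten trials each); the class count, dimensionality, and noise level are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

# Hypothetical feature matrix: one vector per gesture trial.
rng = np.random.default_rng(0)
n_gestures, n_trials, n_dims = 4, 10, 36
centers = rng.normal(size=(n_gestures, n_dims))
X = np.vstack([c + 0.1 * rng.normal(size=(n_trials, n_dims)) for c in centers])
y = np.repeat(np.arange(n_gestures), n_trials)

# Leave-one-out cross-validation: train on all trials but one, test on
# the held-out trial, and average the result over every trial.
correct = 0
for train, test in LeaveOneOut().split(X):
    clf = SVC(kernel="linear").fit(X[train], y[train])
    correct += int(clf.predict(X[test])[0] == y[test][0])
rate = correct / len(X)
print(f"identification rate: {rate:.1%}")

# The classifier used on the real-world sensor data is then trained on
# ALL software-acquired samples:
final_clf = SVC(kernel="linear").fit(X, y)
```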
4.2 Results and Discussion
Figure 5 (left) shows the identification results when both learning and identification used data acquired with the software; the average identification accuracy was 100.0%. Figure 5 (right) shows the results of identifying the gesture data acquired with the distance-measuring sensors using the classifier generated from the software data; the average identification accuracy was 81.7%. This rate is high enough to show that data measured with sensors in the real world can be identified using learning data from the software instead of learning data collected in the real world. The misidentification of the right-to-left gesture as the left-to-right gesture in Fig. 5 (right) likely occurred because the gesture was completed before all frames had elapsed (Fig. 4). Increasing the processing speed of the software would therefore increase the number of frames acquired per gesture and should raise the identification rate.
5 Limitations and Future Work
The simulated sensors placed in the software are not affected by ambient light, but the actual distance-measuring sensors are, and the ambient light adds noise to the sensor data. In the experiment, identification was limited to relatively slow gestures because the processing speed of the software was low and distance data were acquired at intervals of about 0.30 s. Since the distance-measuring sensors can acquire data at a higher rate, it should be possible to identify faster gestures by increasing the processing speed of the software. In future work, we will develop a system that suggests a sensor layout that increases the identification rate when the user specifies the number of sensors.
6 Conclusion
We developed software for designing the layout of real-world sensors and verified that data acquired with sensors placed in the real world can be identified using a classifier generated from data acquired with the software. The software acquires distance data by measuring, in the virtual world, the distances between sensors placed there and a mesh built from depth information from a Kinect. The average identification accuracy for the real-world sensor data was 81.7%. In future work, we will increase the processing speed of the software and develop a system that suggests sensor layouts with high identification rates.
References
Sato, M., Poupyrev, I., Harrison, C.: Touché: enhancing touch interaction on humans, screens, liquids, and everyday objects. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2012), pp. 483–492. ACM, New York (2012)
Ino, K., Ienaga, N., Sugiura, Y., Saito, H., Miyata, N., Tada, M.: Grasping hand pose estimation from RGB images using digital human model by convolutional neural network. In: 9th International Conference and Exhibition on 3D Body Scanning and Processing Technologies, pp. 154–160. HOMETRICA CONSULTING, Lugano, Switzerland (2018)
Kanaya, T., Hiromori, A., Yamaguchi, H., Higashino, T.: Humans: a human mobility sensing simulator. In: 2012 5th International Conference on New Technologies, Mobility and Security (NTMS), pp. 1–4. IEEE, Istanbul, Turkey (2012)
Yuan, Y., Kitani, K.: 3D ego-pose estimation via imitation learning. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11220, pp. 763–778. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01270-0_45
Masai, K., Sugiura, Y., Ogata, M., Kunze, K., Inami, M., Sugimoto, M.: Facial expression recognition in daily life by embedded photo reflective sensors on smart eyewear. In: Proceedings of the 21st International Conference on Intelligent User Interfaces (IUI 2016), pp. 317–326. ACM, New York (2016)
Kikuchi, T., Sugiura, Y., Masai, K., Sugimoto, M., Thomas, B.: EarTouch: turning the ear into an input surface. In: Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI 2017), pp. 27:1–27:6. ACM, New York (2017)
Ogata, M., Sugiura, Y., Osawa, H., Imai, M.: iRing: intelligent ring using infrared reflection. In: Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology (UIST 2012), pp. 131–136. ACM, New York (2012)
Miyata, N., Honoki, T., Maeda, Y., Endo, Y., Tada, M., Sugiura, Y.: Wrap sense: grasp capture by a band sensor. In: Proceedings of the 29th Annual Symposium on User Interface Software and Technology (UIST 2016), pp. 87–89. ACM, New York (2016)
Acknowledgments
This work was supported by JST AIP-PRISM JPMJCR18Y2 and JST PRESTO JPMJPR17J4.
Copyright information
© 2019 Springer Nature Switzerland AG
Saito, A., Kawai, W., Sugiura, Y. (2019). Software to Support Layout and Data Collection for Machine-Learning-Based Real-World Sensors. In: Stephanidis, C. (eds) HCI International 2019 - Posters. HCII 2019. Communications in Computer and Information Science, vol 1033. Springer, Cham. https://doi.org/10.1007/978-3-030-23528-4_28