1 Introduction

Object tracking in computer vision can follow either a marker-less or a marker-based approach. Computer vision systems have used fiducial markers for pose estimation in applications such as augmented reality [5] and robot navigation [4]. With the advancements in Augmented Reality (AR), new tools such as Augmented Reality University of Cordoba (ArUco) markers [6] have been introduced to the literature. ArUco markers are used to tackle the localization problem in AR, allowing camera pose estimation to be carried out from a binary matrix. Using a binary matrix not only simplifies the process but also increases its efficiency. As part of our initiative to create a cost-efficient, 24/7 accessible, Virtual Reality (VR) based chemistry lab for underprivileged students, we wanted to create an alternative way of interacting with the virtual scene. In this study, we used ArUco markers to create a low-cost keyboard using only a piece of paper and an off-the-shelf webcam. We believe this type of keyboard will be more beneficial to users because they can see the keys they are typing in the corner of the screen, instead of relying on an inadequate on-screen VR keyboard or on a regular keyboard, where users wearing a VR headset cannot see what they are typing.

Our setup is straightforward and consists of a webcam and a piece of paper with a keyboard-like pattern printed on it, see [4]. The pattern is a numeric keypad with rectangular regions labeled from 0 to 9, and each region contains the ArUco marker that encodes the corresponding key value. When the system is running in “live” mode, users can use this printed paper as a keypad. All “touched” key values are translated to keypress events, so the printed paper acts as a regular keyboard. The system needs both computer vision and smoothing/filtering techniques, which can be fine-tuned for an average user or for a specific user.

In this paper, we propose a real-time OpenCV-based computer vision approach combined with a fast, state-machine based smoothing/filtering algorithm. The filter has a single parameter, N, which represents the filter strength. We first created a dataset of six-digit numbers typed by the same user on the paper-based keyboard. We then varied the filter strength N from 1 to 10 and measured the accuracy of the proposed keyboard. For a specific trained user, and for a specific dataset of size ten, the measured system accuracy is 0.0 for \(N<4\), 0.6 for \(N=4\), 1.0 for \(N=5,6,7\), 0.3 for \(N=8\), 0.1 for \(N=9\), and 0.0 for \(N=10\). The optimal values therefore appear to be \(N=5,6,7\), but if we eliminate \(N=5\) and \(N=7\) as potential boundary cases, we obtain \(N=6\) as the optimal choice for this specific trained user.
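
The selection argument above can be written out as a short Python sketch. The accuracy values are the ones reported in the text. The helper name `pick_interior_optimum` is hypothetical, and the sketch assumes that the best-scoring values of N form one contiguous range.

```python
# Accuracy per filter strength N, as reported in the text above.
accuracy = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.6, 5: 1.0,
            6: 1.0, 7: 1.0, 8: 0.3, 9: 0.1, 10: 0.0}


def pick_interior_optimum(acc):
    """Return the middle N among the best-accuracy settings (hypothetical helper)."""
    best = max(acc.values())
    candidates = sorted(n for n, a in acc.items() if a == best)
    # Taking the middle candidate discards the settings that border a
    # lower-accuracy value, i.e. the "boundary cases" (assumes a contiguous run).
    return candidates[len(candidates) // 2]


optimal_n = pick_interior_optimum(accuracy)
print(optimal_n)
```

With only ten samples from one user, this choice should be read as a tuning heuristic rather than a general result.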

Fig. 1. ArUco keyboard.

The ArUco keyboard used in this study is shown in Fig. 1, and the base system demo is presented in Fig. 4. As potential extensions of the base system, we have also designed and evaluated a stereo camera based system and an IMU sensor based system with various sensor fusion techniques. The stereo camera used for this research was a USB3 ZED camera, see Fig. 2, tested with a GeForce GTX 1050 Ti Max-Q 4 GB laptop running Ubuntu 18 LTS. We observed that the stereo camera reduces occlusion-related issues and results in more robust detection performance. The IMU sensor used in this research is a GY-521 accelerometer and gyroscope board, see Fig. 3, interfaced to an Arduino Uno board over the I2C interface (the MPU-6050 chip on this board provides an I2C interface only). The IMU sensor detects keypress/touch related vibrations and sends this information to the host computer, which simplifies the KeyPress detection problem. Most mobile devices in use today have one or more cameras and an IMU sensor; therefore, the proposed extensions to our base system are quite realistic.

In summary, we observed that the use of any of these additional sensors, i.e. an additional camera and/or an IMU sensor, improves the overall system performance.

Fig. 2. Stereo camera (USB3 ZED camera) based ArUco keyboard system.

Fig. 3. GY-521: InvenSense MPU-6050 based IMU sensor board interfaced to an Arduino Uno over I2C.

2 Base System

Our base system [1], shown in Fig. 4, uses a single webcam. The algorithm used in this base implementation is given in Algorithm 1. In each OpenCV frame, we first detect all visible ArUco markers and then determine which markers are blocked. For each frame, we also determine the highest blocked marker value. If the highest blocked marker has been the same during the past N frames, we generate a KeyPress event. A KeyRelease event is generated in the first frame in which all ArUco markers are visible again.
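
The per-frame logic described above can be sketched in Python as a small state machine. This is a minimal illustration and not the authors' Algorithm 1: the class name `ArucoKeypadFilter` is hypothetical, the set of visible marker IDs is assumed to come from an ArUco detector such as the one in OpenCV, and the sketch assumes that one touch produces exactly one KeyPress until all markers are visible again.

```python
class ArucoKeypadFilter:
    """Turns per-frame sets of visible marker IDs into key events (sketch)."""

    def __init__(self, all_ids, n):
        self.all_ids = frozenset(all_ids)  # IDs printed on the paper keypad
        self.n = n                         # filter strength N, in frames
        self.candidate = None              # highest blocked ID being tracked
        self.count = 0                     # consecutive frames with that ID
        self.pressed_key = None            # key currently held down, if any

    def update(self, visible_ids):
        """Process one frame; return ("KeyPress", id), ("KeyRelease", id) or None."""
        blocked = self.all_ids - set(visible_ids)
        if not blocked:
            # All markers visible: reset the counter and release a held key.
            self.candidate, self.count = None, 0
            if self.pressed_key is not None:
                key, self.pressed_key = self.pressed_key, None
                return ("KeyRelease", key)
            return None
        highest = max(blocked)
        if highest == self.candidate:
            self.count += 1
        else:
            self.candidate, self.count = highest, 1
        # Assumption: no repeated KeyPress while a key is already held.
        if self.pressed_key is None and self.count >= self.n:
            self.pressed_key = highest
            return ("KeyPress", highest)
        return None


all_ids = set(range(10))
keypad = ArucoKeypadFilter(all_ids, n=3)
# One free frame, four frames with markers 2 and 5 blocked, one free frame.
frames = [all_ids] + [all_ids - {2, 5}] * 4 + [all_ids]
events = [e for e in (keypad.update(f) for f in frames) if e is not None]
print(events)
```

In this example the finger covers markers 2 and 5 at the same time, and the priority rule reports the higher value, 5.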

Algorithm 1.

Fig. 4. ArUco keyboard: Base system, https://youtu.be/tnKc6zvXliY

The detection performance of the system depends on the value of N. For a specific trained user, \(N=6\) is found to be optimal for a webcam running at 25 frames/s. In general, the optimal value of N depends on the frame rate and on the user.
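
One way to read the frame-rate dependence is that N corresponds to a dwell time of N divided by the frame rate, i.e. 6/25 = 0.24 s for the reported setting. The sketch below rescales N to another frame rate under the assumption, which is ours and not verified in this study, that the preferred dwell time of a given user stays constant. The function name `scale_filter_strength` is hypothetical.

```python
def scale_filter_strength(n_ref, fps_ref, fps_new):
    """Rescale N so that the dwell time n_ref / fps_ref is preserved."""
    dwell_seconds = n_ref / fps_ref
    # At least one frame is always required before a KeyPress is generated.
    return max(1, round(dwell_seconds * fps_new))


print(scale_filter_strength(6, 25, 30))
print(scale_filter_strength(6, 25, 60))
```

A value obtained this way is only a starting point and would still need to be tuned for each user.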

3 IMU Sensor Based System

The base system presented in the previous section works by detecting blocked ArUco markers in each frame. However, this single-camera system cannot distinguish a marker that is blocked without a touch from a marker that is blocked because of a touch. Because of this limitation, a user must be trained not to keep their hand stationary for a “long” period of time (5/25 s) while it is visible to the camera. Although this is feasible, and the training process was observed not to be difficult, we have developed an alternative approach to overcome this problem.

This new approach [2] uses an IMU sensor, see Fig. 5, to differentiate between markers blocked without a touch and markers blocked because of a touch. IMU sensors contain accelerometers for the x, y, and z directions and can detect even a slight tap on a surface. We used an InvenSense MPU-6050 chip as our IMU sensor. A first-order digital low-pass filter is used for smoothing, and thresholding with hysteresis is used for tap detection. The microcontroller sends the tap events to the host device, and only after receiving a tap event does the host device start executing Algorithm 1. See the full source code given in the appendix for the digital low-pass filter, thresholding, and hysteresis parameters.
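
The combination of a first-order low-pass filter and thresholding with hysteresis can be sketched as follows. This is a Python illustration only, not the microcontroller code from the appendix. The class name `TapDetector` and the values of `alpha`, `high`, and `low` are placeholders chosen for the example, and the input is assumed to be the deviation of the acceleration magnitude from its resting value.

```python
class TapDetector:
    """First-order low-pass filter followed by a hysteresis threshold (sketch)."""

    def __init__(self, alpha=0.5, high=0.3, low=0.1):
        self.alpha = alpha    # smoothing factor, 0 < alpha <= 1 (placeholder)
        self.high = high      # a tap is reported when the signal rises above this
        self.low = low        # the detector re-arms when it falls below this
        self.filtered = 0.0
        self.armed = True

    def update(self, sample):
        """Feed one sample; return True exactly once per detected tap."""
        # First-order IIR low-pass: y[k] = y[k-1] + alpha * (x[k] - y[k-1])
        self.filtered += self.alpha * (sample - self.filtered)
        if self.armed and self.filtered > self.high:
            self.armed = False
            return True
        if not self.armed and self.filtered < self.low:
            self.armed = True
        return False


detector = TapDetector()
samples = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
taps = [i for i, s in enumerate(samples) if detector.update(s)]
print(taps)
```

The gap between the two thresholds prevents a single vibration that hovers around one threshold from being reported as several taps.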

Fig. 5. ArUco keyboard: IMU sensor based version, https://youtu.be/sIuhZQpu0AE

4 Stereo Camera Based System

As a final improvement of the proposed ArUco keyboard system, we have implemented a stereo camera based solution [3], shown in Fig. 6. A stereo camera provides more data that can be used to improve the overall system performance, with or without an IMU sensor. Some ArUco markers may be blocked because of occlusion rather than touch, so even after a touch or tap is detected, multiple ArUco markers may still be blocked. The priority scheme used in Algorithm 1 works for most cases, but the failure rate is non-zero and becomes more noticeable if the ArUco keyboard is rotated significantly. A stereo camera greatly improves detection performance in such cases.

Fig. 6. ArUco keyboard: Stereo camera version (USB3 ZED camera), https://youtu.be/ssbv2NqfAJg

If both cameras report a particular ArUco marker as not detected, then the probability of failure, i.e. the marker being undetected because of occlusion rather than touch, is smaller than in a single-camera system. Therefore, the use of a stereo camera reduces both false KeyPress events and key value errors. However, it requires more processing power and more complex hardware, which may not be practical for all use cases.
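
The fusion rule implied above, where a marker counts as blocked only if neither camera detects it, can be sketched in Python as follows. The function name `blocked_in_both_views` is hypothetical, and the two sets of visible IDs are assumed to come from running marker detection on the left and right images separately.

```python
def blocked_in_both_views(all_ids, visible_left, visible_right):
    """Return the marker IDs that neither camera detected."""
    visible_any = set(visible_left) | set(visible_right)
    return set(all_ids) - visible_any


all_ids = set(range(10))
visible_left = all_ids - {3, 7}    # left view: 3 is occluded, 7 is touched
visible_right = all_ids - {7, 8}   # right view: 8 is occluded, 7 is touched
blocked = blocked_in_both_views(all_ids, visible_left, visible_right)
print(blocked)
```

In this example the right camera alone would report 8 as the highest blocked marker, while the fused result contains only the touched key 7. If occlusion were independent in the two views with probability p per camera, which is a strong simplifying assumption, a marker would be missed by both with probability p squared.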

5 Conclusion

In this paper, we have presented a paper-based numeric keypad using ArUco markers. The full source code is given in the appendix. This system can be useful as a low-cost, disposable keyboard for VR systems and for mobile devices equipped with a camera. We observed that the use of an IMU sensor greatly improves the overall system performance. Since almost all mobile devices, whether phones or tablets, have IMU sensors, the improved IMU-based keyboard can be used without any additional sensor or equipment. We have also implemented a stereo camera based system, but to the best of our knowledge mobile devices with stereo cameras are not widely available. The stereo camera based implementation is, however, a feasible alternative for VR systems.