Keywords

1 Introduction

With the development of science and technology, recreational and assistive robots have become a fast-growing field. Some robots can move, robotic arms can pick up objects, some robots can talk, and others combine these abilities. Wheeled robots can move quickly, while robotic arms are designed to hold, lift, and move objects. Combining the two makes a robot more functional.

Krofitsch et al. [1] introduced intelligent robots for education and research. Their smartphone-based approach makes controlling the robot more flexible than traditional methods; however, the system does not yet provide live video or audio playback.

López-Rodríguez et al. [2] presented a low-cost robot module design based on Android and Arduino with Internet and Local Area Network (LAN) connectivity. The design has the disadvantages of lag and limited functionality.

Vanitha et al. [3] designed a surveillance robot controlled over the Internet using a Raspberry Pi board. The robot uses a PIR sensor to detect when a person or object enters the monitored area and a smoke sensor to detect fire. The authors also successfully designed a website to monitor and control the robot over the Internet.

Sun et al. [4] designed a surveillance robot capable of recording real-time images, video, and audio of a specific area or person. The robot is controlled over a ZigBee network. In addition, the system includes a face recognition feature with a maximum matching accuracy of 70%.

Pahuja and Kumar [5] designed an Android smartphone-controlled robot using an HC-06 Bluetooth module and an 89C2051 microcontroller. The data received by the Bluetooth module from the Android smartphone is provided as input to the controller, and the system was able to live-stream ambient video.

Bokade and Ratnaparkhe [6] designed an Android-based application to control a robot wirelessly. The Android app opens a web page with a video screen for monitoring and buttons for controlling the robot and camera. Test results show fast, clear, good-quality video at up to 15 frames per second.

Alli et al. [7] designed an obstacle detection and avoidance system for unmanned lawn mowers. Ultrasonic and infrared sensor modules placed at the front of the robot transmit signals to an Arduino microcontroller, which calculates the distance and steers the robot to avoid obstacles. The system achieves 85% accuracy with a 0.18% failure probability.

Pedre et al. [8] proposed the design of a versatile, low-cost mobile robot for research and education. However, the robot is complicated for pre-teens and teenagers, and its design focuses on vision-based autonomous navigation without audio capabilities.

This study aims to design and control a mobile robot with a 3-degree-of-freedom robotic arm that can be controlled via smartphone, hand gestures, and voice.

2 System Overview

Figure 1 shows the robot model. The robot consists of a mobile base and a robotic arm with a gripper, a camera, and a microphone. The robot is controlled by voice, hand gestures, and smartphone. The smartphone connects to the robot via the HC-05 Bluetooth module to send and receive the signals used to control the robot. The camera captures images and detects and marks landmark points on the hand so that the robot can be controlled by hand gestures. Voice commands are received through the microphone. The robot uses these inputs to recognize and fulfill requests.

Fig. 1. Smart Joybot: diagram and photo of the robot, consisting of chassis, sensors, actuators, microcontrollers, and power source.

3 Control the Robot

In this robot, an Arduino Mega is selected to control the servo motors and to send and receive signals from the HC-05 Bluetooth module and the Raspberry Pi. The Bluetooth module is used to connect to the smartphone and the Raspberry Pi. The camera and microphone are used for hand gesture and voice recognition. The control system of the robot is shown in Fig. 2. The power supply consists of three 3.7 V batteries that power the system through the L298 driver circuit, providing 12 V for the DC motors and 5 V for the Raspberry Pi, DC control board, ultrasonic sensor, servo motors, camera module, and USB microphone (each with an operating voltage between 3 and 7 V).

Fig. 2. Hardware connection diagram of the system: a Raspberry Pi with attached microphone and camera for data processing, and a smartphone, connect to an Arduino via Bluetooth; the Arduino controls four servo motors and two encoder-feedback motors through the L298N motor driver.

The robot's servos are controlled via the Arduino. The arm servomotors are position-controlled precisely, and the two wheel motors are speed-controlled using pulse-width modulation with encoder feedback for precise control of the robot's speed.
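Conceptually, the wheel-speed loop is simple: the encoder tick count over a fixed interval gives the measured speed, and the error against the target speed adjusts the PWM duty cycle sent to the L298 driver. The following Python sketch illustrates this proportional loop; on the actual robot the loop runs on the Arduino, and read_encoder/set_pwm are hypothetical placeholders for the hardware interface.

# Minimal sketch of PWM wheel-speed control with encoder feedback.
# read_encoder() and set_pwm() are hypothetical hardware-interface helpers;
# the encoder resolution, loop period, and gain are assumed values.
import time

TICKS_PER_REV = 360     # assumed encoder resolution (ticks per wheel revolution)
LOOP_DT = 0.05          # control period in seconds
KP = 2.0                # proportional gain, tuned experimentally

def speed_control(target_rps, read_encoder, set_pwm):
    """Hold one wheel at target_rps revolutions per second."""
    duty = 0.0
    last_count = read_encoder()
    while True:
        time.sleep(LOOP_DT)
        count = read_encoder()
        measured_rps = (count - last_count) / TICKS_PER_REV / LOOP_DT
        last_count = count
        duty += KP * (target_rps - measured_rps)     # proportional correction
        duty = max(0.0, min(255.0, duty))            # clamp to the 8-bit PWM range
        set_pwm(int(duty))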

3.1 Control by Smartphone

To move the robot and pick up objects at desired positions, the robot is connected to the phone via the HC-05 Bluetooth module and controlled through an app programmed in MIT App Inventor. We designed a Scratch-like GUI for elementary and middle school students, as shown in Figs. 3 and 4. Students can drag blocks to control the wheels of the mobile robot, the camera angle, and the servos of the robot arm.
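Each block in the app ultimately sends a short command over the Bluetooth serial link, which the robot decodes into a wheel, camera, or arm action. The Python sketch below illustrates one possible single-character protocol; the command letters, the serial port name, and the pyserial-based decoding are illustrative assumptions (on the actual robot the Arduino performs this decoding).

# Sketch of decoding single-character drive commands from the Bluetooth serial link.
# The command letters and port name are illustrative assumptions; requires pyserial.
import serial

COMMANDS = {
    b"F": "forward",
    b"B": "backward",
    b"L": "turn_left",
    b"R": "turn_right",
    b"S": "stop",
}

def listen_for_commands(on_command, port="/dev/rfcomm0", baud=9600):
    with serial.Serial(port, baud, timeout=1) as link:
        while True:
            byte = link.read(1)                # one command character per block action
            if byte in COMMANDS:
                on_command(COMMANDS[byte])     # e.g. drive the wheels or move a servo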

Fig. 3. Smartphone control with drag-and-drop GUI: a screenshot of the mobile app showing a block program (with non-English labels) that activates servo 3 to rotate.

Fig. 4. Object picking task: the robot picking up an object from the floor with its arm.

3.2 Voice Recognition

The voice recognition system allows humans to interact with the robot more flexibly. In this study, the SpeechRecognition library [9] was used to simplify speech recognition and issue commands for the robot to perform. The system is simple: the sound received by the smartphone is recognized as words and phrases using Google speech recognition, and utterances matching predefined cases are mapped to tasks for the robot to perform.
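A minimal sketch of this step with the SpeechRecognition library and the Google recognizer is shown below; the command phrases and the handle_command callback are illustrative assumptions rather than the robot's exact command set.

# Minimal sketch: recognizing a predefined voice command with SpeechRecognition.
# The command phrases and handle_command() are illustrative assumptions.
import speech_recognition as sr

COMMANDS = {"go straight", "turn left", "turn right", "stop"}

def listen_once(handle_command):
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)   # compensate for background noise
        audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        return                                        # speech was unintelligible
    if text in COMMANDS:
        handle_command(text)                          # e.g. forward the command to the Arduino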

3.3 Hand Gesture Recognition

The ability to perceive hand shapes and movements can be a key component in improving the user experience. Robust real-time hand perception is a decidedly challenging computer vision task. In this robot, the MediaPipe Hands library [10] is used to make hand gesture recognition straightforward.

MediaPipe Hands is a high-fidelity hand and finger tracking solution. It employs machine learning (ML) to infer 21 3D landmarks of a hand from just a single frame.

MediaPipe Hands utilizes an ML pipeline consisting of multiple models working together: a palm detection model that operates on the full image and returns an oriented hand bounding box, and a hand landmark model that operates on the cropped image region defined by the palm detector and returns high-fidelity 3D hand keypoints.
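A minimal sketch of how the 21 landmarks can be obtained per camera frame with MediaPipe Hands and OpenCV is shown below; the camera index and confidence thresholds are assumed values.

# Minimal sketch: extracting 21 hand landmarks per frame with MediaPipe Hands.
# The camera index and confidence thresholds are assumed values.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def landmark_stream(camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    with mp_hands.Hands(max_num_hands=1,
                        min_detection_confidence=0.7,
                        min_tracking_confidence=0.5) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB input
            results = hands.process(rgb)
            if results.multi_hand_landmarks:
                # 21 landmarks, each with normalized x, y, z coordinates
                yield results.multi_hand_landmarks[0].landmark
    cap.release()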

3.4 Palm Detection Model

To detect initial hand locations, the palm detection model uses a single-shot detector optimized for real-time mobile use. Detecting hands is a decidedly complex task: the lite and full models must handle a large range of hand sizes (a scale span of ~20×) relative to the image frame and detect occluded and self-occluded hands. Unlike faces, hands lack high-contrast features, which makes them comparatively difficult to detect reliably from their visual features alone; providing additional context, such as arm, body, or person features, therefore aids accurate hand localization.

3.5 Hand Landmark Model

After palm detection over the whole image, the hand landmark model performs precise keypoint localization of 21 3D hand-knuckle coordinates inside the detected hand regions via regression, that is, direct coordinate prediction. The model learns a consistent internal hand pose representation and is robust even to partially visible hands and self-occlusions.

To obtain ground truth data, approximately 30,000 real-world images were manually annotated with 21 3D coordinates, as shown below (the Z-value is taken from the image depth map, if it exists for the corresponding coordinate). To better cover the possible hand poses and provide additional supervision on the nature of hand geometry, a high-quality synthetic hand model is also rendered over various backgrounds and mapped to the corresponding 3D coordinates (Figs. 5 and 6).

Fig. 5. Marked points of the hand: the hand model with the wrist/palm landmark numbered 0 and the finger joint landmarks numbered 1 to 20, from the thumb to the little finger.
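Once the landmarks of Fig. 5 are available, simple gestures can be classified from the relative positions of fingertip and joint landmarks. The sketch below counts extended fingers and maps the count to a drive command; the fingertip and PIP-joint indices follow the MediaPipe numbering, while the gesture-to-command mapping itself is an illustrative assumption.

# Sketch: classifying a simple gesture from the 21 hand landmarks of Fig. 5.
# Landmark indices follow MediaPipe numbering; the command mapping is an assumption.
FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, and little fingertips
FINGER_PIPS = [6, 10, 14, 18]   # the corresponding PIP joints

def count_extended_fingers(landmarks):
    # A finger counts as extended when its tip lies above its PIP joint
    # (image y grows downward, so "above" means a smaller y value).
    return sum(1 for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)
               if landmarks[tip].y < landmarks[pip].y)

def gesture_to_command(landmarks):
    fingers = count_extended_fingers(landmarks)
    return {0: "stop",
            1: "turn_left",
            2: "turn_right",
            4: "go_straight"}.get(fingers)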

Fig. 6. Hand follow task: the robot on the table responding to movements of the human hand.

4 Results

After programming and testing control on the actual model, we obtained the following results:

The robot can be precisely controlled by smartphone, for example moving, picking up and dropping objects, and saving and re-executing sequences of actions in turn.

The robot can recognize hand gestures with high accuracy, performing simple operations such as going straight, turning left, turning right, and stopping.

The robot identifies and successfully fulfills voice requests. In some cases, due to environmental noise and pronunciation, the robot may not be able to recognize the voice command.

5 Conclusions

We successfully designed Smart JOYBOT, a robot capable of manual control and automatic movement and of picking up and releasing small objects. By using the front-mounted smartphone as a camera and voice receiver, Smart JOYBOT can display images and receive control commands while being remotely controlled, manually or automatically, via an app installed on an Android smartphone over the Internet.

In addition, the robot can be controlled by hand gestures or voice, and can be programmed to move simply through the drag-and-drop Scratch-like interface. It can therefore be applied to teaching STEAM at all levels, from primary school to university.