1 Introduction

Gestures have served as a universal language for communication since the dawn of time. Facial gestures aid in comprehending emotions, whereas body movements indicate human action and reaction. Significant work has already been poured into facial gesture detection [1,2,3,4] as well as human motion detection [5, 6], particularly in sports, deciphering body language, and transportation surveillance, for which various video processing techniques have been utilized [7]. Because of the sheer number of permutations and combinations of motions performed by the fingers and thumb, hand gesture identification is a unique task. It is essential in Human Machine Interaction (HMI) applications such as interactive gaming, augmented reality, virtual reality, smart homes, and industrial automation [8, 9].

Gesture detection is a fundamental component of computer vision applications [10]. The recent pandemic has underscored the need for human–machine interaction. Sign language has proven to be a lifeline for persons with special needs in communicating with the rest of the world, and many researchers have investigated various strategies for detecting sign language, which naturally incorporates hand motions [11,12,13]. Gesture detection technology is also widely used in smart homes and gadgets. As a result, developing a superior gesture detection system becomes imperative.

1.1 Dataset specifications summary

Subject: Artificial intelligence, Computer vision and pattern recognition, Human–computer interaction
Specific subject area: Gesture detection, Human–machine interaction, Depth estimation
Type of data: Image. Resized images (480 × 360 pixels, RGB color space); FGBG-separated and depth-estimated images (640 × 480 pixels)
How the data were acquired: RGB camera, ResNet-101, MiDaS model, graphics processing unit (GPU)
Data format: RAW
Parameters for data collection: FGBG-separated and depth-estimated images
Description of data collection: 4200 images of hand gestures in three different formats
Data accessibility: Mendeley Data, https://data.mendeley.com/datasets/vvdy2x5vpr/1

2 Literature review

An effective dataset unlocks efficient applications of technology; modern technology and decision making are highly reliant on data. As previously stated, several datasets for diverse sign languages have been created owing to the importance of gestures in sign language recognition, and various dataset versions are available for hand gesture detection as well [14, 15]. These datasets are mostly focused on a single application, and each dataset's subject matter is peculiar to it: some exhibit gestures in the wild, while others have a clear background or only display depth images of the gestures. Moreover, the shape and size of the gestures have never been emphasised, although such variation in hand gestures undoubtedly enhances the accuracy of a gesture detection system. Further, such a dataset finds application as a building block for dynamic hand gesture detection through real-time 3D reconstruction of hand gestures [16].

Gesture detection is the process of transforming an image, or a sequence of images, into a systematic representation. Several methodologies, datasets, and techniques are used for human gesture detection [17]; however, many of them are bulky [18, 19] because they incorporate videos of gestures, overwhelming the system. Hand gesture detection is one of the most active research areas in machine vision and human–machine communication, with several potential applications including video games, healthcare systems, wearable technology and video processing [20, 21]. Several image processing-based approaches are available in the literature [22]. As shown in Table 1, multiple forms of datasets on hand gesture detection exist, but their specifications vary depending on their application. IHMG [23] depicts hand motions from two different viewpoints captured via an RGB-D camera, whereas the HGM-4 dataset [24] depicts letters of Vietnamese sign language in gesture form from four distinct angles. The GesHome dataset [25] consists of 18 hand gestures recorded using an embedded accelerometer and gyroscope from 20 subjects of various ages, whilst the Holoscopic micro-gesture recognition (HoMG) dataset [26] was recorded using a holoscopic 3D camera and includes only 3 mainstream gestures from 40 participants in various settings and conditions.

Table 1 Comparison between various datasets

When it comes to high-resolution image data, current datasets are rather large, at the expense of data quality and clarity. A few datasets, such as the Leapgesture dataset [27], address this issue by including depth images, which enhances data accuracy and validity: with depth perception, data size may be considerably decreased without sacrificing dataset transparency. FabDepth I not only provides depth-estimated images of 21 distinct gestures for use in human–machine interaction, but also produces efficient FGBG-separated images of the same gestures. In addition, a separate resized folder holds scaled and normalised images with low light and noisy backgrounds.

3 Methodology

The experimental setup requires a basic monocular RGB camera as well as a computer with a high-speed broadband internet connection. A minimum of 8 GB of RAM is necessary, along with a recent GPU; the GPU accelerates image acquisition and processing, which speeds up the dataset building process. Avoiding an expensive stereo camera reduces the system's cost and ensures excellent repeatability. The size of the data plays a vital role in its application: valid and clean but small data lets researchers augment it whenever necessary, hence the resolution of the depth-estimated and FGBG-separated images is kept at 640 × 480 pixels while the resized gestures are 480 × 360 pixels.
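To make the acquisition step concrete, the following Python sketch captures frames from a basic monocular camera with OpenCV and stores them at the stated resolution; the camera index, output folder name and file naming scheme are illustrative assumptions, not the exact settings used for FabDepth I.

```python
# A minimal acquisition-and-resizing sketch (assumed details: camera
# index 0, output folder, PNG naming scheme).
import os
import cv2

out_dir = "Resized-Gestures/on"          # hypothetical folder for one gesture
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(0)                # basic monocular RGB camera
saved = 0
while saved < 100:                       # 100 resized samples per gesture
    ok, frame = cap.read()
    if not ok:
        break
    small = cv2.resize(frame, None, fx=0.75, fy=0.75)   # 75% downsizing
    resized = cv2.resize(small, (480, 360))             # width x height
    cv2.imwrite(os.path.join(out_dir, f"gesture_{saved:03d}.png"), resized)
    saved += 1
cap.release()
```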

Figure 1 depicts the dataset generation procedure. For consistency, the images are acquired from a live video and downsized to 75% of their original size. A folder of scaled images is created, containing 100 images of the same gesture. For FGBG separation of the acquired and scaled gesture images, a semantic segmentation approach is adopted: the DeepLab ResNet-101 algorithm [28] is highly effective for semantic segmentation, keeping the background black while highlighting the maximum number of pixels belonging to a gesture. Accordingly, a pre-trained ResNet-101 is employed to obtain FGBG-separated images in real time. The FGBG-separated images are then fed into Intel's MiDaS model [29], which allows accurate depth estimation of the various gestures, yielding three different folders for all 21 gestures. The dataset is publicly available in the Mendeley repository [30]. Researchers can use the three formats of the 21 gestures individually or combine them for training and testing as per their requirements.
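The two model stages can be sketched in Python as follows, assuming torchvision's pre-trained DeepLabV3 with a ResNet-101 backbone for the FGBG separation and the MiDaS model published by Intel ISL on torch.hub for depth estimation; the model variants, background masking rule and pre-processing shown here are illustrative choices, not necessarily the authors' exact configuration.

```python
# FGBG separation and depth estimation, sketched with torchvision's
# DeepLabV3-ResNet101 and Intel ISL's MiDaS (both pre-trained).
import cv2
import numpy as np
import torch
from torchvision import models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pre-trained semantic segmentation network (DeepLab with ResNet-101)
seg = models.segmentation.deeplabv3_resnet101(pretrained=True).to(device).eval()

# MiDaS monocular depth model and its matching input transform
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").to(device).eval()
midas_tf = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def fgbg_separate(img_bgr: np.ndarray) -> np.ndarray:
    """Keep foreground pixels and paint the background black."""
    rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    x = normalize(torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0)
    with torch.no_grad():
        out = seg(x.unsqueeze(0).to(device))["out"][0]
    mask = (out.argmax(0) != 0).cpu().numpy()     # class 0 is background
    return img_bgr * mask[..., None].astype(np.uint8)

def estimate_depth(img_bgr: np.ndarray) -> np.ndarray:
    """Return a dense depth map for an FGBG-separated gesture image."""
    rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        pred = midas(midas_tf(rgb).to(device))
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=rgb.shape[:2],
            mode="bicubic", align_corners=False).squeeze()
    return pred.cpu().numpy()
```

In the VOC-style label map used by this segmentation network a hand falls under the person class, so keeping every non-background pixel is enough to isolate the gesture against a black background.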

Fig. 1 Hierarchical process of inception of the dataset

4 Results and discussion

This particular data can help train a machine to recognise gestures in the wild. As seen in Fig. 2, a gesture captured in low light is first resized, then background noise is totally removed using the FGBG separation technique, and finally a depth map of the picture is estimated in which the gesture is automatically highlighted. This method provides researchers with new perspectives and ideas to work with.

Fig. 2 a A low-light, noisy-background gesture is scaled, b FGBG-separated, cropped and centered image, c depth-estimated image of the same gesture
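The depth panel in Fig. 2c corresponds to a float-valued map; a simple normalisation step, such as the assumed helper below, converts it to an 8-bit image in which the nearby gesture stands out from the background.

```python
# Hypothetical helper: scale a raw float depth map to 0-255 for saving.
import numpy as np

def depth_to_image(depth: np.ndarray) -> np.ndarray:
    d = depth - depth.min()
    d = d / (d.max() + 1e-8)          # guard against a constant map
    return (d * 255).astype(np.uint8)
```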

A meaningful set of gestures is chosen on the basis of their use in HMI and in one-on-one communication amongst persons with special needs. A total of 21 gesture folders are organised into three groups: Depth-Gestures, FGBGS-Gestures, and Resized-Gestures. Each folder in the first two groups holds 50 images. Depth-gesture images are unique in that they are not only resistant to poor light and background noise but also allow users to comprehend disparity within the given frame. With hand gestures there is a considerable risk of occlusion, owing to finger proximity and intertwining, which can impair the precision of FGBG gesture separation; having a depth map estimate of the same gesture allows for substantially better investigation. The Resized-Gestures folder contains 100 samples of each gesture that have been scaled and normalised but still provide a realistic setting for training a machine to recognise gestures in the wild. The FabDepth I dataset is summarised in Table 2, and a minimal loading sketch follows the table.

Table 2 Key points of FabDepth I dataset
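As an illustration of how the three modalities can be consumed, the sketch below loads one modality folder with torchvision's ImageFolder; the root path, batch size and transform are assumptions, and two modalities could likewise be combined with ConcatDataset.

```python
# Loading one FabDepth I modality for training (paths are illustrative).
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

tf = transforms.Compose([
    transforms.Resize((360, 480)),    # height x width of the resized gestures
    transforms.ToTensor(),
])

# One sub-folder per gesture class, as in the layout described above
depth_set = datasets.ImageFolder("FabDepthI/Depth-Gestures", transform=tf)
loader = DataLoader(depth_set, batch_size=32, shuffle=True)

for images, labels in loader:         # labels index the 21 gesture classes
    pass                              # feed into a classifier of choice
```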

Aside from gestures indicating the digits 0 through 9, other unconventional hand gestures are provided that might readily be employed in HMI or other computer vision solutions. Figure 3 outlines a few of them.

Fig. 3 A few of the gestures, shown as captured-resized, center-cropped, FGBG-separated and depth-estimated. a On, b Right, c Stop, d Ok

5 Conclusion and future scope

The FabDepth I dataset is designed for hand gesture identification and includes 21 distinct gestures that may be utilised in HMI. In contrast to other public datasets, it is the first to combine hand gestures in the wild, with low-light backgrounds and noise, with FGBG-separated and depth-estimated versions of the same gestures. No special camera is used to measure depth, and no additional software or manual processing is needed to obtain FGBG-separated images of the various gestures. The dataset may be used to investigate hand gesture detection challenges from numerous perspectives and to improve a system's depth accuracy. Potential applications include sign language comprehension, contactless intelligent systems, augmented reality, virtual reality and interactive gaming. Researchers may approach the dataset through its three modalities and combine any two of the three in accordance with their requirements. Owing to this multi-format design, FabDepth I stands apart as a benchmark dataset for hand gesture detection.

In future work, samples from a more diverse population, including women, children, and senior citizens, can be employed [31]. A dataset that focuses entirely on HMI and incorporates gestures performed with two hands can also be created.