1 Introduction

Replicating natural capabilities such as dexterity and touch in a prosthetic hand has long been a challenge for the rehabilitation robotics community. One of the primary objectives of an intelligent prosthetic hand is the inherent ability to recognize objects upon touch or during a grasp. Implementing object recognition on robotic grippers or prosthetic hands requires prior information on the extrinsic structural properties of objects and on the gripper's position and orientation (Martinez-Hernandez et al. 2017; Watanabe et al. 2017). Vision-based learning methods have by far dominated object recognition for robotic hands. Concrete vision-based object recognition techniques for robots and robotic hands have been discussed recently by Yu et al. (2013), Petković et al. (2016), Bandou et al. (2017), Cognolato et al. (2017), Asif et al. (2017) and Martinez-Martin and Del Pobil (2017). Still, vision alone does not satisfy the perceptual requirements of a robot in many situations: visual approaches are affected by self-occlusions, lighting conditions, and limitations in the field of view (Papazov et al. 2012; Chakraborty et al. 2018). To overcome these limitations, deep learning-based strategies for object reconstruction (Schmidt et al. 2018; Mahler et al. 2017; Tian et al. 2019) and for predicting object shapes from hand properties through temporal (Ansuini et al. 2015; Säfström and Edin 2008), tactile (Dang et al. 2011) and parametric (Spiers et al. 2016) aspects have been tried and tested. Yet, achieving an effective prediction rate for learning to grasp unknown objects remains a challenge (Murali et al. 2018). In a recent work by Falco et al. (2019), a multi-modal perception approach combining visual and tactile information with advanced learning algorithms achieved a high prediction rate for object recognition. This confirms that, along with visual percepts, hand kinematics is equally crucial for realizing an object's dimensional and structural properties. The significance of kinematic information for object recognition, which primarily includes grasping force and the joint angle shifts of finger phalanges from reference frames, has also been discussed by Boruah et al. (2019), Cotugno et al. (2018), and Boruah et al. (2018).

Achieving human-level abstraction for object recognition with today's image processing techniques is a challenging task (Ayzenberg and Lourenco 2019). Preliminary kinematic information for object recognition can always be extracted from a form closure (Lakshiminarayana 1978) on the object. A form closure confirms the completeness of a grasp, as the contacts can resist any wrench applied to the grasped object (Asada and Kitagawa 1989). Such an enclosure also enables the human brain to recognize and memorize objects through their structural information and the enclosure configuration, transmitted to the brain by the numerous mechanoreceptors of the human hand (Lederman and Klatzky 1987). The possibility of such exploratory-procedure-based object recognition by the human hand was stated by Lederman and Klatzky (1993). Shape is among the critical cues that help the human brain recognize an object on grasp (Vàsquez and Perdereau 2017).

Grasping requires knowledge of the structural information of objects, followed by mapping of the hand kinematics to a grip aperture (Buxbaum et al. 2006). Analyzing an object's intrinsic structural properties leads to identification of the object based on the shape enclosed by the grasp. Such shape-based object recognition approaches primarily rely on the joint kinematic values of the hand.

An attempt to manipulate unknown objects using kinematics and tactile data was carried out by Montaño and Suárez (2019). The authors did not report using any kinematic model; hence, they implemented manipulation without object recognition. Ansuini et al. (2015) attempted to predict the size of objects before grasping using Support Vector Classification (SVC). They used a motion capture system to record extrinsic kinematic information of the hand, such as wrist velocity and grip aperture, while reaching for the objects. Their results explicitly displayed variation of features with varying object sizes. Gorges et al. (2010) used haptic key features, a combination of kinematic and tactile information, to recognize objects on grasp. They trained their dataset on Self-Organizing Maps; however, the recognition rates showed significant differences when classifying single objects. Gorges et al. (2011) discussed an object recognition method using tactile sensor data and hand kinematic information. They generated a set of point clouds through haptic exploration, and the classification was carried out using a K-nearest neighbour (KNN) classifier. The procedure successfully recognized most of the test objects, with the highest prediction accuracy for spherical objects. Vàsquez and Perdereau (2017) reported the generation of size-invariant signatures of objects based on proprioceptive data of a robotic hand. They encoded the kinematic information of joint angle rotations into signatures to represent and detect an object's shape based on simulated and real-time data. Their results showed a remarkable correlation between real objects and their corresponding signatures. A tactile-based method for recognizing and manipulating grasped elastic objects has been presented by Delgado et al. (2017), focusing on the contact point forces that change according to the deformative characteristics of objects. Vásquez et al. (2017) reported an object shape recognition approach using machine learning. The authors tried to increase the system's accuracy by including multiple modalities such as proprioceptive signatures and contact normal features. They implemented multiclass classification using sequential training on a Neural Forest and achieved remarkable recognition accuracies for all the test shapes involved in the experimental simulations.

This review evidences that both tactile and kinematic information play a vital role in the recognition of grasped objects. Most of the works involving real-time grasp experiments were implemented on state-of-the-art but expensive apparatus. In this work, we attempt a comprehensive approach with a customized low-cost data glove, focusing on generating kinematic grasp data for object recognition using contour-dependent grasp pattern extraction.

A kinematic model based on the finger joint angles contains shape signatures that assist human proprioception, as information about the object's shape does not vary with its pose and size in a grasp (Vàsquez and Perdereau 2017). Hence, in this work, models of the human hand solving the forward kinematics with Denavit–Hartenberg (DH) parameters (Hartenberg and Denavit 1955) have been studied, along with an analysis of finger kinematic values and interphalangeal relationships. A data glove embedded with flex sensors has been designed for the acquisition of a grasp dataset. Experiments have been performed in which subjects grasped selected objects of daily use for the extraction of real-time finger joint deflection information. Considering only circular grasps with natural curls, seven objects belonging to two shape categories, cylinder and sphere, were used in this work. The data is used to train state-of-the-art classification algorithms, and a comparative analysis of the individual classifier results is presented to show their performance in recognizing the objects.

This paper focuses on a visionless method for object recognition that uses shape entities and kinematic information. The kinematic information is encoded as a shape primitive feature and is used to recognize objects used during activities of daily living. The work contributes a grasp polyhedron visualization method relative to an object's shape and claims that the area of the grasp polyhedron is a significant feature for visionless object recognition. Samples from the grasp dataset were used to generate 3D grasp polyhedrons corresponding to the grasp contours of the objects. These polyhedrons revealed critical structural observations while mapping the spatial information of an object to its individual shape. Works by Kamper et al. (2003), Sartori et al. (2011), and Allen and Michelman (1990) suggested using fingertip coordinates for realizing an object's shape during a grasp. Shape recognition works by Yoshikawa et al. (2008), Kucherhan et al. (2018), and Liu et al. (2016) were performed by robotic hands equipped with tactile sensors at the fingertips, as fingertips provide the maximum amount of information related to the grasp enclosure and the shape and size of the grasped object. Kimoto and Matsue (2011) suggested that fingertips are the prime locations for extracting crucial object properties such as spatial features, roughness, and friction. Consequently, information from the other joint coordinates would be redundant compared to the fingertip coordinates. This work therefore focuses on the fingertip coordinates when deducing the grasp polyhedron. The areas of these polyhedrons were used as an additional feature by the classifiers while recognizing objects. Classification results revealed that this new area feature improved the accuracy of all classifiers compared to classifiers trained with the earlier kinematic features alone.

The rest of this article is organized as follows: the deduction of the DH parameter values of the hand from interphalangeal relationships is detailed in Sect. 2. Section 3 presents the experimental set-up, including the prototype description of the data glove, the grasp protocols and the experimental objects. Practical outcomes, including a comparative evaluation of the recognition accuracies of learning with plain kinematics versus shape primitives, are discussed in Sect. 4. The work is concluded in Sect. 5 with a deliberation on how the approach is useful for object recognition, based on the composition of the generated grasp data and the results of the experimental evaluations.

2 Estimation of fingertip coordinates

The human hand is a complex kinematic system with 21 controllable DoFs (Bullock et al. 2013). Perceiving an object's shape during a grasp from the fingertip contact points requires deducing the forward kinematics of the hand. The forward kinematics of a physical system with revolute and prismatic joints refers to the calculation of the end-effector's coordinate position and orientation. There are many standard ways of calculating the forward kinematics, the most popular being the DH notation. The DH notation provides four DH parameters related to the position and orientation for translation, rotation, or both between two frames of reference (Saha 2014).

Scope for enhancing the thumb's existing forward kinematic models was identified during the initial investigation in this work. Earlier, the thumb's abduction-adduction motion was assumed to originate from the same plane as that of the other four fingers, with the wrist as the base frame of reference. This has been reported in previous works by Parasuraman and Zhen (2009), Cobos et al. (2010) and Cordella et al. (2014), where the trapeziometacarpal (TMC) joint's abduction-adduction (ab/ad) angle (i.e., the DH parameter representing the angle around the common normal between the previous Z-axis and the current Z-axis, with notations as in Saha (2014)) is +90/-90 or 0 degrees, depending on the initial orientation of the base frame. Usually, the thumb appears to reside on the same plane. On careful observation of the subjects during the experiments conducted in this work, it was realized that at the initiation of a grasp the metacarpal link starts with a small deviation from this plane along with the usual TMC flexion-extension (f/e) motion. Fig. 1 provides a visual explanation of this deviation. The deviation occurs because the rotational DoF of the TMC joint shifts the axis of rotation for abduction-adduction motion by a certain degree as soon as the thumb starts moving for a grasp. Works by Ma’touq et al. (2018) and Lenarčič et al. (2013) also mention this auxiliary rotational degree of freedom (DoF) of the TMC joint.

Fig. 1 Change of thumb initiation plane during a grasp. a Initial position with \(X_iY_iZ_i\) as the frame of reference at the wrist and \(X_{i+1}Y_{i+1}Z_{i+1}\) as the frame at the TMC joint for ab/ad motion. b Existence of the \(\alpha\) DH parameter between the two frames during the initiation of a grasp. c The \(\alpha\) DH parameter while implementing a grasp on a circular object

This angular displacement has been incorporated in this work by adding an initiation angle (30 degrees) to the Z-axis of the ab/ad movement of the thumb's TMC, i.e. the \(\alpha\) DH parameter. This angle was manually recorded using a goniometer during the grasp experiments discussed in Sect. 4. The remaining f/e motions of the metacarpophalangeal (MCP) and interphalangeal (IP) joints remain the same as in the existing literature. Table 1 presents the modified DH parameter values for the thumb. \(l_{mc}\), \(l_{pp}\) and \(l_{dp}\) are the lengths of the metacarpal, proximal and distal phalanges respectively.

Table 1 Modified DH parameter values of the thumb

As per the anatomical investigation carried out by Alexander and Viktor (2010), the relationship between the phalanges and metacarpal bones has been established as in Table 2. Figure 2 presents a hand model abiding by these kinematics, where \(d_{th}\), \(d_i\), \(d_m\), \(d_r\) and \(d_l\) are the distal phalanx lengths of the thumb, index, middle, ring and little fingers, and \(W_l\) is the wrist breadth. A dark joint in Fig. 2 represents two DoFs (abduction-adduction and flexion-extension), whereas a white joint represents a single DoF (flexion-extension only).

Table 2 Dependency ratios between distal and rest of the phalanges of a human hand
Fig. 2 The kinematic model of the hand with interphalangeal relationships

As reflected in Fig. 2, 20 DoFs have been considered in this model with 19 links, the maximum required for power and precision grasp analysis during a circular grasp. The DoF of wrist movement, which is mainly required for object manipulation tasks, has been ignored in this work.

Fig. 3 Metacarpal-to-wrist angle (\(\omega _i\)) for the index finger

Figure 2 introduces four triangles formed from the base frame of reference to the metacarpal phalanges of the fingers. The triangle formed by the metacarpal phalange of the index finger is shown in Fig. 3. The four angles formed at the base frame on the wrist are defined as the metacarpal-to-wrist angles (\(\omega\)). \(\omega\) is used to deduce the \(\alpha\) DH parameter values of the metacarpal phalanges' abduction-adduction motion. The metacarpal-to-wrist angles of the middle, ring and little fingers are diagrammatically similar to Fig. 3, with different dimensions.

From Fig. 3, the metacarpal-to-wrist angle of the index finger \(\omega _i\) is calculated as:

$$\begin{aligned} \omega _i = \arctan \left( \frac{8.6*d_i}{W_l}\right) . \end{aligned}$$
(1)

Similarly, the metacarpal-to-wrist angles of the middle, ring and little fingers are represented in equations 2-4, respectively, by the terms \(\omega _m\), \(\omega _r\) and \(\omega _l\).

$$\begin{aligned} \omega _m= & {} \arctan \left( \frac{22.2*d_m}{W_l}\right) \end{aligned}$$
(2)
$$\begin{aligned} \omega _r= & {} \arctan \left( \frac{20.4*d_r}{W_l}\right) \end{aligned}$$
(3)
$$\begin{aligned} \omega _l= & {} \arctan \left( \frac{6.8*d_l}{W_l}\right) . \end{aligned}$$
(4)

Accordingly, Tables 3, 4, 5, 6 and 7 present the DH parameters and their values used to extract the fingertip coordinates with the approach presented in this work. \(J_{<x> <i>ab/ad}\) represents the abduction-adduction motion at the \(i^{th}\) joint of finger x, where x is: th for thumb, i for index, m for middle, r for ring and l for little. Similarly, \(J_{<x> <i>f/e}\) represents the flexion-extension motion at the \(i^{th}\) joint of finger x. For convenience, Table 11 (Appendix) lists the definitions of the notations used in Tables 3, 4, 5, 6 and 7 to represent motions in the finger joints.

Table 3 DH parameters for thumb
Table 4 DH parameters for index finger
Table 5 DH parameters for middle finger
Table 6 DH parameters for ring finger
Table 7 DH parameters for little finger

The overall transformation matrix for the \(i^{th}\) finger was deduced by multiplying the transformation matrices from the base frame of reference, i.e., \(i\_0\), to the end-effector (ee), i.e., \(i\_ee\), as follows:

$$\begin{aligned} T^{\text {i}\_\text {0}}_{\text {i}\_\text {ee}} = T^{\text {i}\_\text {0}}_{\text {i}\_\text {0}\_\text {ab/ad}} * T^{\text {i}\_\text {0}\_\text {ab/ad}}_{\text {i}\_\text {0}\_\text {f/e}} * T^{\text {i}\_\text {0}\_\text {f/e}}_{\text {i}\_\text {1}\_\text {f/e}} *T^{\text {i}\_\text {1}\_\text {f/e}}_{\text {i}\_\text {2}\_\text {f/e}} *T^{\text {i}\_\text {2}\_\text {f/e}}_{\text {i}\_\text {ee}} \end{aligned}$$
(5)

where a single transformation matrix is of the form:

$$\begin{aligned} T^{\text {n-1}}_{\text {n}} = \begin{bmatrix} \cos \theta & -\sin \theta \cos \alpha & \sin \theta \sin \alpha & a\cos \theta \\ \sin \theta & \cos \theta \cos \alpha & -\cos \theta \sin \alpha & a\sin \theta \\ 0 & \sin \alpha & \cos \alpha & d \\ 0 & 0 & 0 & 1 \end{bmatrix} \end{aligned}$$
(6)

Equations 5 and 6 have been derived from the homogeneous matrix transformations of serial links as discussed in Saha (2014). In Sect. 4, we present how the end-effector coordinates are accessed from the above transformation matrices to trace the contact polyhedrons during a force closure grasp.
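As an illustration, a minimal Python sketch of Eqs. 5 and 6 is given below. The DH rows in the example are placeholder values for a single finger; the actual values come from Tables 3–7 and the subject's hand dimensions, so the printed coordinates are for illustration only.

```python
import numpy as np

def dh_transform(theta, alpha, a, d):
    """Single DH transformation matrix between consecutive frames (Eq. 6)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def fingertip_position(dh_rows):
    """Chain the per-joint DH matrices (Eq. 5) and return the fingertip coordinates."""
    T = np.eye(4)
    for theta, alpha, a, d in dh_rows:
        T = T @ dh_transform(theta, alpha, a, d)
    return T[:3, 3]

# Placeholder DH rows (theta, alpha, a, d) for one finger; real values come from Tables 3-7
example_rows = [
    (np.radians(10), np.radians(90), 0.0, 0.0),   # ab/ad at the base joint
    (np.radians(40), 0.0, 45.0, 0.0),             # MCP flexion, proximal phalanx
    (np.radians(30), 0.0, 25.0, 0.0),             # PIP flexion, middle phalanx
    (np.radians(20), 0.0, 20.0, 0.0),             # DIP flexion, distal phalanx
]
print(fingertip_position(example_rows))
```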

3 Experimental setup

3.1 Development of the data-glove prototype

Data-gloves are widely used wearable haptic devices for experiments related to the kinematic analysis of the human hand. They are well-suited platforms for studying the spatial and temporal aspects of the dexterity of human hands (Pacchierotti et al. 2017). Data-gloves can either be custom-made for specific experiments, as in Temoche et al. (2012), or be commercial products such as the CyberGlove II and III (CyberGlove Systems LLC), which have been used by Jarque-Bou et al. (2019) and Stival et al. (2019). We designed a low-cost data-glove in this work to attain the following objectives:

  • To provide real-time information of phalanges’ motion from the embedded joint and tactile elements.

  • To build a dataset to analyze the hand-object kinematics and establish its significance for object recognition.

  • To provide kinematic inputs for a 3D grasp polyhedron generation algorithm.

Based on the categories of kinaesthetic gloves discussed by Pacchierotti et al. (2017), a dorsal system has been adopted for this work. The dorsal surface of the glove has been embedded with five flex sensors (approximately 55 mm active length) over the metacarpophalangeal (MCP) joints to record the flexion-extension movements. Figure 4a shows the data glove prototype, and Fig. 4b shows the experimental platform.
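The sensor-to-angle conversion is not detailed above; purely as an illustration, the following hypothetical sketch assumes a linear calibration between two reference postures. The function name, ADC values and calibration procedure are assumptions, not the authors' implementation.

```python
def flex_to_angle(adc_value, adc_flat, adc_bent, angle_bent=90.0):
    """Linearly map a raw flex-sensor ADC reading to an MCP flexion angle (degrees).

    adc_flat and adc_bent are per-sensor calibration readings taken with the
    finger fully extended and flexed to angle_bent, respectively.
    """
    span = adc_bent - adc_flat
    if span == 0:
        return 0.0
    return (adc_value - adc_flat) / span * angle_bent

# Hypothetical calibration readings and one live reading for a single sensor
print(flex_to_angle(adc_value=612, adc_flat=480, adc_bent=760))
```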

Fig. 4 a Prototype of the glove design. b Experimental data collection with the glove

No additional sensors were installed on the glove to record the distal interphalangeal (DIP), proximal interphalangeal (PIP), interphalangeal (IP), and TMC joint deflections. The following constraints have been considered while selecting the sensor positions on the glove (a sketch of the resulting joint-angle derivation follows the list):

  • The inter-finger constraints for the index, middle, ring and little fingers, proposed by Rijpkema and Girard (1991):

    $$\begin{aligned} \theta _{DIP} \approx \frac{2}{3} \theta _{PIP} , \theta _{PIP} \approx \frac{3}{4}\theta _{MCP(f/e)}. \end{aligned}$$
    (7)
  • The inter-finger constraints of the thumb proposed by Chen Chen et al. (2013):

    $$\begin{aligned} \theta _{IP} \approx \frac{1}{2}\theta _{MCP(f/e)} , \theta _{MCP(f/e)} \approx \frac{5}{4} \theta _{TMC(f/e)}. \end{aligned}$$
    (8)
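A minimal sketch of how the unmeasured joint angles could be derived from a measured MCP flexion reading using Eqs. 7 and 8 (angles in degrees; the function names are illustrative):

```python
def finger_joint_angles(theta_mcp_fe):
    """Derive PIP and DIP flexion from a measured MCP flexion (Eq. 7)."""
    theta_pip = 0.75 * theta_mcp_fe          # theta_PIP ~ 3/4 * theta_MCP(f/e)
    theta_dip = (2.0 / 3.0) * theta_pip      # theta_DIP ~ 2/3 * theta_PIP
    return theta_pip, theta_dip

def thumb_joint_angles(theta_mcp_fe):
    """Derive IP and TMC flexion from a measured thumb MCP flexion (Eq. 8)."""
    theta_ip = 0.5 * theta_mcp_fe            # theta_IP ~ 1/2 * theta_MCP(f/e)
    theta_tmc = theta_mcp_fe / 1.25          # theta_MCP(f/e) ~ 5/4 * theta_TMC(f/e)
    return theta_ip, theta_tmc

print(finger_joint_angles(60.0))  # e.g., a 60-degree MCP flexion
print(thumb_joint_angles(40.0))
```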

3.2 Objects

Extensive experimental work by Bullock et al. (2013) revealed that circular power grasps with natural curls are the most frequent in daily activities. With a focus on circular grasps, two categories of shapes, sphere and cylinder, were considered while selecting objects of everyday use. Objects with irregular or organic shapes exhibit high morphological variation on their surfaces and mainly require contour analysis of object images for their recognition (Iivarinen and Visa 1996; Iivarinen et al. 1997); as a result, recognizing irregular-shaped objects is less effective with haptic features alone. The work reported in this paper therefore covers the recognition of regularly shaped objects using finger kinematics during grasping.

Objects for the experiment were selected from the Yale-CMU-Berkeley (YCB) object set (Calli et al. 2017) to ensure the inclusion of objects with varying sizes but similar shapes within the two categories, sphere and cylinder. The YCB object and model set is a standard set of objects defined to facilitate benchmarking in robotic grasping and manipulation applications, and it incorporates objects of daily use with varying shapes, sizes, weights, and textures. The selected objects are a rubber ball, a tennis ball and a table tennis ball in the spherical category, and a small pipe, a medium rod, a coffee bottle and a coffee mug in the cylindrical category. The objects used in the experiments are displayed in Fig. 5, and their dimensions are given in Table 8.

Fig. 5 Objects considered for the experiments. From left to right: 1. small pipe, 2. medium rod, 3. coffee bottle, 4. coffee mug, 5. rubber ball, 6. lawn tennis ball and 7. table tennis (TT) ball

Table 8 Dimensions of the experimental objects
Fig. 6 Grasp types implemented with their respective objects: a power sphere, b sphere 4 finger, c sphere 3 finger, d tripod, e large diameter, f sphere 4 finger, g medium wrap, h adducted thumb, i large diameter

A few of the grasp types adopted from Feix et al. (2015) and implemented by the subjects are shown in Fig. 6. All grasps that are possible and practically plausible on each object have been implemented.

4 Experiments and results

4.1 Experimental procedure

Finger joint angle data were collected from four healthy subjects aged between 22 and 25 years. Two subjects were male and two were female, and all were right-handed. The heights of the subjects (S1–S4) are as follows: S1—170.18 cm, S2—173 cm, S3—175.26 cm, and S4—175.26 cm. Every participant provided written consent before the experiments were conducted.

The definition of a grasp for the experiment adopted from Feix et al. (2015) is as follows:

“A grasp is a static hand posture with which an object can be held securely, irrespective of the hand configurations”.

To reduce arm muscle fatigue, subjects sat on a chair with an armrest at the same level as the table's surface during the experiments, and the objects were placed on the surface of the table. Each subject performed the experiments for a single object in a single session. Subjects performed three repetitions of a given grasp type on each object to include temporal grasp variations, where a temporal grasp variation is defined as a person grasping the same object in different postures at different times. The subjects performed all the grasp types shown in Fig. 6. All movements were initiated from a rest position with muscles relaxed, hands open, the lower arm on the armrest, and the wrist on the table surface.

We followed a trial-based procedure in which cues for the grasp type to be performed in a specific trial were displayed as images on a computer screen in front of the subject, and a stopwatch was used to keep track of time. At the 0th second, the subject is instructed to gradually move the hand from rest to a grasp posture and to complete the grasp before the 5th second. From the 5th second, kinematic data are recorded for 10 seconds, until the 15th second. At the end of each trial, the subject releases the object and returns the hand to rest. Twenty samples were recorded in each trial at a rate of 2 samples per second. The sequence of one trial is shown in Fig. 7. The abduction-adduction angles at the MCP joints for a grasp on an object are manually recorded with a goniometer in a separate trial immediately after the sensor readings for that object are completed. This trial is not time-constrained and depends on the completion of the measurements; participants are supervised to maintain a configuration similar to the earlier one. Post-grasp manipulations (lifting and releasing) are not considered in this work.
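For clarity, the trial timing can be summarized in the following schematic sketch. Only the timing figures (5 s grasp formation, 10 s recording, 2 samples per second) are taken from the protocol above; the `read_sensors` callable and the software realization are assumptions.

```python
import time

def run_trial(read_sensors, grasp_time_s=5.0, record_time_s=10.0, rate_hz=2.0):
    """Schematic of one trial: 5 s to form the grasp, then 20 samples at 2 Hz."""
    time.sleep(grasp_time_s)                     # subject moves from rest to the grasp posture
    samples = []
    n_samples = int(record_time_s * rate_hz)     # 20 samples per trial
    for _ in range(n_samples):
        samples.append(read_sensors())           # one reading of the five MCP flex sensors
        time.sleep(1.0 / rate_hz)
    return samples                               # subject then releases the object
```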

Fig. 7 Sequence of a trial performed in the experiment

4.2 Kinematic data acquisition

The kinematics of the five fingers were extracted through the flex sensors placed over the MCP joints, which record their flexion-extension movements. The following grasp types were implemented on the objects:

  • Spherical objects: power sphere, sphere three finger, sphere four finger, quadpod and tripod, with abducted thumb position.

  • Cylindrical objects: large diameter, small diameter and medium wrap with thumb abducted position; fixed hook, adducted thumb with the thumb in the adducted position.

Angular movements of the DIP, PIP and IP joints of the fingers were generated using Eqs. 7 and 8. Every sample was recorded with a label specifying the type of object being grasped. A dataset with twenty numerical features, one for each finger joint movement, was obtained. These features are:

  • Thumb: TMC (ab/ad), TMC (f/e), MCP (f/e), IP (f/e).

  • Index finger: MCP(ab/ad), MCP(f/e), PIP(f/e), DIP(f/e).

  • Middle finger: MCP(ab/ad), MCP(f/e), PIP(f/e), DIP(f/e).

  • Ring finger: MCP(ab/ad), MCP(f/e), PIP(f/e), DIP(f/e).

  • Little finger: MCP(ab/ad), MCP(f/e), PIP(f/e), DIP(f/e).

Every sample in the dataset was labelled with a categorical target label i.e., the Object Label. The number of samples for each of the object types in the dataset is presented in Fig. 8.

Fig. 8 Statistical distribution of the objects used by the subjects during the grasp experiments

4.3 Feature selection from kinematic data

Feature selection procedures strongly influence the performance of machine learning algorithms (Kira and Rendell 1992; Cai et al. 2018). Redundant features, i.e., features with little or no relevance to the target variable, lower the performance of the models (Cai et al. 2018). We used two methods to select relevant features for the task of object recognition: (a) cross-checking the correlation matrices of the joint angles of the five fingers using Pearson's correlation coefficient and (b) deducing the importance of all the features using a random forest. The correlation matrices representing the degree of interdependence of the features with the target Object Label for the little, ring, middle and index fingers and the thumb are presented in Figs. 9, 10, 11, 12 and 13, respectively.

Fig. 9 Correlation matrix of little finger

Fig. 10 Correlation matrix of ring finger

Fig. 11 Correlation matrix of middle finger

Fig. 12 Correlation matrix of index finger

Fig. 13 Correlation matrix of thumb

Fig. 14 Important feature variables bar plot using Random Forest

High correlation values in a correlation matrix indicate a strong statistical and predictive relationship among the variables, and the correlation of features with the target variable can be used for feature selection to enhance the performance of classification algorithms. From the correlation matrices, this work selects the features with a good correlation value (\(\ge 0.4\)) with the target variable 'object label'. Four features, viz. Little_MCP(ab/ad), Ring_MCP(ab/ad), Middle_MCP(ab/ad) and Index_MCP(ab/ad), show an acceptable correlation with the prediction of object types. Consequently, these four features are considered relevant for the task of object classification. The random forest feature selection method discussed next also confirms the relevance of these features.

The random forest classifier (Ho 1995; Breiman 2001) is a popular method for selecting essential features from a dataset (Genuer et al. 2010). A random forest ranks features by the impurity decrease they produce across the nodes of its decision trees, from which a subset of the most important features can be selected. The kinematic data generated in this work were fitted to a random forest classifier to find the most important features with respect to the target variable. Figure 14 presents a plot of the features in decreasing order of importance for predicting the object types. It was observed that Thumb_TMC(ab/ad), which shows little correlation with the object types in the correlation matrix, appears as the most important feature in the random forest ranking. To reduce information loss, we followed a hybrid approach that combines the features identified as important by the two techniques. Accordingly, the finally selected features are: Little_MCP(ab/ad), Ring_MCP(ab/ad), Middle_MCP(ab/ad), Index_MCP(ab/ad) and Thumb_TMC(ab/ad).
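A minimal sketch of this hybrid selection, assuming the grasp samples are held in a pandas DataFrame with an `object_label` column; the column names, the cut-off of five top-ranked features and the use of scikit-learn are illustrative assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def select_features(df, target="object_label", corr_threshold=0.4, top_k=5):
    """Hybrid feature selection: combine correlation-based and RF-importance-based picks."""
    X = df.drop(columns=[target])
    y = df[target].astype("category").cat.codes   # encode the object label numerically

    # (a) Pearson correlation of each feature with the encoded target label
    corr = X.apply(lambda col: col.corr(y)).abs()
    corr_picks = set(corr[corr >= corr_threshold].index)

    # (b) impurity-based feature importances from a random forest
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    importances = pd.Series(rf.feature_importances_, index=X.columns)
    rf_picks = set(importances.nlargest(top_k).index)

    return sorted(corr_picks | rf_picks)          # features flagged by either criterion
```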

4.4 Hybrid shape primitive as feature

Shape is an inherent property used by the human brain for the cognitive perception of objects (Loncaric 1998). A shape-based grasp quality index for multi-fingered hands was initially proposed by Kim et al. (2004), who used a three-fingered robotic hand to define a two-dimensional grasp polygon and extracted its area as a grasp quality measure. A similar approach was put forward by León et al. (2012), where the authors defined a grasp quality measure from a two-dimensional grasp polygon formed by the three fingers that are most effective during a grasp. The authors suggest that the grasp polygon area is an essential grasp quality index, categorized based on the force-dependent contact point locations. Their work considered only the thumb, index and middle fingers for the generation of the two-dimensional grasp polygon, as they contemplated that the ring and little fingers play a less critical role during a grasp. However, information from all five fingers during a complete hand enclosure of an object is crucial for predicting the object's shape (Lederman and Klatzky 1993).


Hence, we put forward a hybrid shape primitive called the area of grasp polyhedron, deduced from the kinematic data of the five fingers, and use it as an additional feature for object recognition. Our approach interprets the area of a three-dimensional grasp polygon (i.e., a polyhedron) as a grasp contour-based shape primitive. We considered all the fingertips, as features from all the fingers were statistically found to be significant during feature selection. The new grasp index is calculated as:

$$\begin{aligned} Q_{AGP} = Area(Polyhedron(p_1,p_2,p_3,p_4,p_5)) \end{aligned}$$
(9)

where \(p_1\) to \(p_5\) are the three-dimensional fingertip coordinates. Based on the kinematic model discussed in Sect. 2, the fingertip coordinates are calculated and grasp polyhedrons are generated for every sample of the dataset. Apart from providing the grasp quality index, a grasp polyhedron can also be useful for extracting parametric information such as dimension estimation, pose estimation, and grasp stability (Spiers et al. 2016).

We developed an algorithm (Algorithm 1) to generate grasp polyhedrons for objects of different shapes. The algorithm takes the \(\theta\) values and 'a' values from the dataset and executes the forward kinematics to calculate the fingertip coordinates during a grasp. The grasp quality index is calculated alongside, to compare the polyhedrons of different objects.
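Algorithm 1 itself is not reproduced here; the sketch below shows one plausible way to obtain \(Q_{AGP}\) from the five fingertip coordinates produced by the forward kinematics of Sect. 2, approximating the polyhedron area with a triangle fan around the fingertip centroid. The fan construction and the example coordinates are assumptions, not the authors' exact procedure.

```python
import numpy as np

def grasp_polyhedron_area(fingertips):
    """Approximate the grasp polyhedron area from the five fingertip points
    (triangle fan around their centroid; one possible interpretation of Q_AGP)."""
    pts = np.asarray(fingertips, dtype=float)     # shape (5, 3): thumb..little fingertips
    centroid = pts.mean(axis=0)
    area = 0.0
    for i in range(len(pts)):
        a = pts[i] - centroid
        b = pts[(i + 1) % len(pts)] - centroid
        area += 0.5 * np.linalg.norm(np.cross(a, b))
    return area

# Hypothetical fingertip coordinates (mm), e.g. from the forward kinematics of Sect. 2
fingertips = [(95, 20, 5), (90, 60, 30), (85, 70, 35), (80, 65, 30), (70, 55, 20)]
print(grasp_polyhedron_area(fingertips))
```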

Fig. 15 Grasp polyhedrons generated for a small pipe, b big pipe, c coffee bottle, d cup, e rubber ball, f lawn tennis ball, g TT ball

Figure 15 presents sample polyhedrons for each object generated by Algorithm 1. It is observed that all the spherical objects form approximately a pentagon, as all the fingertips fall nearly on the same plane during a power sphere or tripod grasp. The cylindrical objects display a proper 3D polyhedron, as most of the fingertips lie on separate planes during their grasp. Similar observations are reflected in the surface plots of the grasp polyhedrons presented in Fig. 16.

Fig. 16 Surface plots of the grasp polyhedrons for a small pipe, b big pipe, c coffee bottle, d cup, e rubber ball, f tennis ball, g TT ball

To incorporate a hybrid shape primitive learning approach, we included the area of grasp polyhedron (\(Q_{AGP}\)) as a determinant feature for the object recognition process.

4.5 Discussion of the classification results

4.5.1 Training on kinematic data

State-of-the-art non-linear classifiers were trained on the dataset with the selected features for a comparative analysis of their classification accuracies. The classification algorithms considered in this work are: non-linear support vector machine (SVM), Gaussian Naive Bayes (GNB), decision tree, K-nearest neighbour (KNN) and random forest (RF). As suggested by Borra and Di Ciaccio (2010), we used a 10-fold cross-validation technique while training the algorithms to avoid over-fitting.

Fig. 17 Confusion matrix of the classifiers trained with kinematic data: a GNB, b KNN, c SVM, d decision tree, e RF (with 10 trees), f RF (with 50 trees). Objects: 0—big pipe, 1—coffee bottle, 2—cup, 3—rubber ball, 4—small pipe, 5—TT Ball, 6—lawn tennis ball

We implemented the SVM with a polynomial kernel, as suggested by Hussain et al. (2011), whose work reported that a polynomial kernel displays better accuracy than radial basis function, linear and sigmoid kernels for multi-class classification problems. The RF classifier was used twice, once with ten trees and once with fifty trees, but no difference in classification accuracy was observed. 25% of the dataset was used as test data for all the classifiers. Table 9 shows that the SVM and both RF classifiers outperform the rest by a small margin while distinguishing between the objects.
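A minimal scikit-learn sketch of this evaluation, assuming the selected features are in `X` and the object labels in `y`; hyper-parameters other than those stated above (e.g., random seeds) are assumptions.

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def evaluate_classifiers(X, y):
    """Comparative evaluation: 10-fold CV on the training split plus a 25% held-out test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y)

    models = {
        "GNB": GaussianNB(),
        "KNN": KNeighborsClassifier(),
        "SVM (poly kernel)": SVC(kernel="poly"),
        "Decision tree": DecisionTreeClassifier(random_state=0),
        "RF (10 trees)": RandomForestClassifier(n_estimators=10, random_state=0),
        "RF (50 trees)": RandomForestClassifier(n_estimators=50, random_state=0),
    }
    for name, model in models.items():
        cv_acc = cross_val_score(model, X_train, y_train, cv=10).mean()
        test_acc = model.fit(X_train, y_train).score(X_test, y_test)
        print(f"{name}: 10-fold CV accuracy {cv_acc:.3f}, test accuracy {test_acc:.3f}")
```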

The confusion matrices of the classifiers are shown in Fig. 17. The values reflect a few classification errors among Big Pipe, Coffee Bottle, Cup and Small Pipe. The matrices of GNB, KNN and SVM show similar classification errors between Rubber Ball and Tennis Ball. This observation is plausible considering the structural similarities among these objects; Gerlach (2017) has verified and statistically analyzed this misclassification issue for objects with close structural resemblance. The GNB classifier performed worst, with not a single correct recognition of Coffee Bottle. There were significantly fewer classification errors between cylindrical and spherical objects, owing to their structural dissimilarities. The confusion matrices of both RF classifiers display better classification accuracies with minimal false negatives and false positives.

We further used the dataset to train an artificial neural network (ANN) with one hidden layer, over 10 repetitions. The configuration of the ANN is as follows:

  • Input layer: input size: 5, nodes: 12, activation: ReLU.

  • Hidden layer: nodes: 10, activation: ReLU.

  • Output layer: nodes: 7, activation: Softmax.

We compiled this multiclass classification model with a sparse categorical cross-entropy loss function and fitted it with a validation split of 33%. Over 100 epochs with a batch size of 10 and 10 repetitions, we achieved an average training accuracy of 81.29% and a test accuracy of 81.07%.
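A Keras sketch of the described network is given below; the optimizer is not stated above, so Adam is an assumption.

```python
from tensorflow import keras

def build_ann(n_features=5, n_classes=7):
    """ANN with the configuration listed above: 12 and 10 ReLU units, softmax output."""
    model = keras.Sequential([
        keras.layers.Dense(12, activation="relu", input_shape=(n_features,)),
        keras.layers.Dense(10, activation="relu"),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",                        # optimizer choice is an assumption
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training as described: 100 epochs, batch size 10, 33% validation split
# model = build_ann()
# history = model.fit(X, y, epochs=100, batch_size=10, validation_split=0.33)
```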

4.5.2 Training on kinematic data with hybrid shape primitive

To verify the usefulness of the shape primitive, we trained the classifiers on the kinematic data combined with the area of grasp polyhedron deduced from the dataset. The results revealed that the new shape primitive increased classification accuracy significantly compared to training with the kinematic features alone. Table 9 presents the comparison of classification results for recognizing objects using kinematic data versus the inclusion of \(Q_{AGP}\). Confusion matrices of the classifiers trained with the shape attribute are shown in Fig. 18. Comparing the confusion matrices in Fig. 17 with those in Fig. 18 confirms that the inclusion of the \(Q_{AGP}\) feature significantly improves the classification accuracies.

Fig. 18 Confusion matrix of the classifiers trained with the combination of kinematic data and \(Q_{AGP}\): a GNB, b KNN, c SVM, d decision tree, e RF (with 10 trees), f RF (with 50 trees). Objects: 0—big pipe, 1—coffee bottle, 2—cup, 3—rubber ball, 4—small pipe, 5—TT Ball, 6—lawn tennis ball

Table 9 Comparison of classification results for training with kinematic data vs training with inclusion of hybrid shape primitive

Furthermore, we trained the ANN over 10 repetitions using the dataset with the shape primitive and achieved an average training accuracy of 87.62% and a test accuracy of 87.67%, which is better than using the kinematic data alone. The accuracy curves over 100 epochs for the ANN trained with kinematic data and the ANN trained with the combination of kinematic data and \(Q_{AGP}\) are shown in Figs. 19 and 20, respectively. It is observed that the accuracy of the network increases and becomes nearly constant around the 10th epoch. Changing the network's hyper-parameters may yield comparable results and will be analyzed in future work.

Fig. 19 Accuracy of the ANN trained with kinematic data

Fig. 20 Accuracy of the ANN trained with the combination of kinematic data and \(Q_{AGP}\)

Table 10 presents a detailed comparison of the evaluation indices for the ANN classification using kinematic features versus the combination of kinematic and shape primitive features. The bold values in the table represent the average accuracies of the ANN over the ten trials in both classification phases. The comparison shows that accuracy increases significantly when the combination of the shape attribute and kinematic features is used, with a drastic decrease in validation loss.

Table 10 Comparison of ANN classification with kinematic vis-a-vis kinematic and shape features

5 Conclusion

Humans can recognize, by grasp alone, almost any object they have encountered before, without a visual percept. The prime motive of this research is to show that preliminary finger joint kinematics is crucial for this task of object recognition on grasp. Using only finger kinematics as features for object recognition stems from the fact that visual approaches depend on lighting conditions and are hindered by obstacles in practical situations.

This work focuses on tactile-based object recognition using kinematic features extracted from grasping experiments with a sensorized glove. Experimental results using the proposed shape primitive feature, the Area of Grasp Polyhedron, showed an increase in accuracy of 3–6% for all state-of-the-art classifiers compared to the general kinematic features. Since the accuracies of most of the classifiers using only the kinematic features are above 85%, even a small increase in accuracy is considerable. Moreover, using sensors with higher accuracy, precision, resolution, and sensitivity than the flex sensors used in the experiments may achieve better results.

Another important point about the classification results is that they may vary with more experimental objects. More objects of varying sizes will change the feasible grasp types and lead to higher variance among the finger kinematics; for example, grasping two rubber balls with different diameters will affect the classification accuracy. In such a scenario, the features selected in this work may not suffice. Exploring the geometric relationships between the grasp polyhedrons of similar objects under different grasp types will also be an exciting aspect. Although this work does not consider other tactile features, in reality force feedback plays a significant role in differentiating between objects of the same shape and size; for example, a rubber ball and a tennis ball with the same diameter might be confused with each other, reducing accuracy by a critical amount. Future work will focus on solving these open issues to develop effective object recognition modalities. Rehabilitation robotics researchers can then use this visionless object recognition module, with a network trained on multiple objects of daily use, to develop inexpensive prosthetic hand solutions for amputees.