1 Introduction

A handwritten signature is a universally acknowledged biometric trait whose distinctive characteristics point toward a person's identity. Other biometric properties, such as fingerprints or the iris, show only minor intra-personal variation compared with signatures. The signature is perhaps the most common means of attesting a person's identity, monetary consent, participation in office, and so forth. In addition, this biometric is applied in a huge number of fields, for example, medical care, security, government offices, schools, and banks (Sundararajan and Woodard 2018). Attempts at signature falsification, also known as signature forgery, have been surging in recent years in different areas such as bank cheques, investment banking, the share market, and corporate sectors. This has prompted researchers to build models that can automatically distinguish original from forged signatures.

Intra-personal signature variation is one of the most challenging problems, as it is difficult to differentiate between a skilled forgery and the intra-personal variation of a signature. This variation may happen due to stress, old age, the habit of using multiple signatures, shaky handwriting, or an unconscious mind. Handwritten signature forgeries can be divided into two types: random signature forgeries (Chang and Shin 2008) and skilled signature forgeries (Ferrer et al. 2019). In the first type, the forged signature has little to no resemblance to the original signature, which may be due to a lack of knowledge about the original signature. Random signature forgeries are easy to detect.

The second type of forgery poses the real challenge and is the most complex type of forgery to detect (Malik et al. 2015). These signatures are made by criminals who have spent a lot of time practising and are able to replicate real signatures in a way that appears accurate and relatively fluid to the naked eye.

Depending on the input format, signature verification systems can be divided into two types. The first is offline signature verification, in which the input signature is captured through a scanner or another imaging device from official documents, bank cheques, legal documents, etc., producing two-dimensional signature images. Online signature verification systems, on the other hand, obtain signatures from electronic devices such as smartphones, tablets, and digital writing pads, which mainly record the coordinate sequence of the electronic pen tip during signing. In addition to the writing coordinates, these devices can also capture information such as writing speed and pressure for use in the online verification process.

Moreover, online signature verification systems are more robust in terms of performance than their offline counterparts due to the availability of dynamic behavioural components such as stroke order, speed, and pressure (Munich and Perona 2003).

However, this performance improvement comes at the cost of requiring special hardware to record the trajectory of the pen tip, increasing the cost of the system and limiting its application scenarios. In many cases, such as transaction verification and document verification, offline signature verification is the only option. Due to its broader areas of application, in this article we focus on the more challenging task of offline automatic signature verification. Our goal is to deliver a less complex model that trains quickly on a GPU or even a CPU, is memory efficient, requires little pre-processing, uses fewer lines of Python code, takes less time to set up, is easily compatible with different kinds of software, is more readable, and can still achieve competitive results.

On the basis of functionality, offline signature verification systems can be addressed with writer-independent and writer-dependent approaches. In a writer-independent system, a generic model is built for all users, and the signers are divided into training and test sets. A writer-dependent system, in contrast, builds an individual model for each user. The novelty and major contributions of the present work are as follows:

1. The proposed article uses a transfer learning approach to analyse genuine and forged signatures by extracting features from a pretrained model and classifying them with a Random Forest. The proposed hybrid VGG-19 model performs well with respect to existing models. To the best of our knowledge, the present work applies transfer learning to the BHSig260 dataset for the first time.

2. Several performance metrics are reported in support of the claims made in the present paper. The performance of the proposed model is also compared with different existing methods, and the framework is analysed from multiple perspectives.

The composition of the present manuscript is as follows: Sect. 2 describes the related work, whereas Sect. 3 summarises the paper's proposed methodology. Results are presented in Sect. 4 and conclusions in Sect. 5.

2 Related work

Offline signature verification has explored a large variety of verification methods. Research on signature verification has been pursued vigorously over the last few decades, yet open challenges remain. The main aim of an offline signature verification system is to discriminate between original and forged signatures, and the present work likewise addresses the verification of offline signatures. Anamika Jain et al. suggested a shallow convolutional neural network model to authenticate genuine signatures (Jain et al. 2020). They used a CNN as their base model, with three convolutional layers and a fully connected layer. Rajib Ghosh presented a Recurrent Neural Network-based deep learning model for verifying offline signatures (Ghosh 2021).

The R-Signet architecture is discussed in the work by Danilo Avola et al. (Avola et al. 2021). The authors used a multitask approach for writer-independent (WI) signature verification. They also extract compact generic features, enabling a support vector machine (SVM) for training and validation in an offline writer-dependent (WD) mode.

For signature verification, a unique feature set based on the quasi-straightness of boundary pixel runs is introduced by Md Ajij et al. (Ajij et al. 2023). To obtain the feature set, they first extract the quasi-straight line segments from the signature border pixels. The quasi-straight line segments combine straightness with tiny curvatures to produce a robust feature set for signature verification. Maryam Houtinezhad et al. improved the discriminative characteristics of signature images by employing canonical correlation analysis (CCA), extracting parametric features and local binary patterns (LBP) from the signature images (Houtinezhad and Ghaffary 2020). In the method by Muhammad Shehzad Hanif et al., a unique global representation of signatures is employed in combination with a Mahalanobis-distance-based dissimilarity score to discern between genuine signatures and their sophisticated counterfeits (Hanif and Bilal 2020).

The authors demonstrated that Convolutional Neural Networks (CNNs) are capable of extracting micro-deformations through the use of max-pooling (Zheng et al. 2021): the position coordinates of the maximum values in the pooling windows may be utilised to detect micro-deformations. BHSig260 is used as a case study for the transfer learning approach that we employ in this study (Pal et al. 2016). Sounak Dey et al. used a convolutional Siamese network to model the offline writer-independent signature verification problem (Dey et al. 2017). Siamese networks are twin networks with shared weights that can be used to learn a feature space in which comparable observations are clustered together. This is accomplished by exposing the network to pairs of similar and dissimilar observations and lowering the Euclidean distance between similar pairs while increasing it between dissimilar pairs.

Soumitri Chattopadhyay et al. describe a two-stage deep learning system for writer-independent OSV that combines self-supervised representation learning with metric learning (Chattopadhyay et al. 2022). Siladittya Manna et al. implemented a self-supervised learning model (Manna et al. 2022). The goal of self-supervised representation learning from signature images is to minimise the cross-covariance between two random variables signifying distinct feature directions while maintaining a positive cross-covariance between random variables denoting the same feature direction. This ensures that the features are linearly decorrelated and that redundant information is removed.

Table 5 presents the current state of the art on BHSig260, which summarises the contribution of this study in a nutshell. It should be mentioned that the values for some of the approaches shown in Table 5 are omitted since their assessment settings differed from those of the other techniques presented in the table. As a result, numbers are given only for methodologies that used the same data in conjunction with the most frequently used assessment criteria. The present manuscript explores whether classification accuracy can be significantly increased by combining transfer learning models with a traditional learning approach. More specifically, our proposed model uses VGG-19 with a Random Forest: features are taken from every block of VGG-19 and passed to a Random Forest for classification. Another unique aspect of the present work is the dataset used. Transfer learning is applied for the first time to signatures in the Bengali language: the present model has been trained on BHSig260, and results are presented for the signatures in the Bengali language.

3 Materials and methods

This section describes the proposed model, the performed experiments, and the datasets used to validate the proposed model.

3.1 Proposed methodology

3.1.1 Transfer learning

Transfer learning is the process of developing a model in the target domain (TD) using the knowledge gained from the source domain (SD). The source and target domains are both labelled. Consider the domain SD to grasp the goal of transfer learning. Here, SD has the following data points: $\{(a_1,b_1),(a_2,b_2),\ldots,(a_n,b_n)\}$, where $a_i \in A$ are the data used for training. In our experiment, $a_i$ represents the input image and $b_i$ represents the classification label in $B$. Binary classification is the focus of this paper. The goal is to automate the learning of the conditional probability function $p(b_t \mid a_t)$. To estimate $p_S(b_t \mid a_t)$, the current distribution function $f_S(\cdot)$ must be applied; to learn the function $f_S(\cdot)$, a network or model is trained in the source domain. Transfer learning can occur in a variety of situations. The input set of attributes of the SD may differ from that of the TD, i.e. $\forall i\, a_i^S \neq a_i^T$ and $p_T(b_t \mid a_t) \neq p_S(b_s \mid a_s)$. In other situations the source domain's feature set matches the target domain's, i.e. $\forall i\, a_i^S = a_i^T$, yet $p_T(b_t \mid a_t) \neq p_S(b_s \mid a_s)$. Finally, the SD and the TD may be the same, i.e. $\forall i\, a_i^S = a_i^T$ and $p_T(b_t \mid a_t) = p_S(b_s \mid a_s)$; this is the classic machine learning situation that has been extensively researched (Pan and Yang 2009).

Two questions must be answered to apply transfer learning in practice:

1. What needs to be transferred?

2. How is the information to be transferred?

For the first question, researchers must rely on their own judgment as well as substantial feature engineering. The second question concerns model selection and how to increase predictive accuracy; this is accomplished through the use of VGG-19 (Simonyan and Zisserman 2014).
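As a minimal sketch of the second question (assuming a TensorFlow/Keras environment; the 224×224 input size is the standard VGG-19 configuration, not a value reported in this paper), a pretrained VGG-19 can be loaded as a frozen source-domain model:

```python
from tensorflow.keras.applications import VGG19

# Load VGG-19 with weights learned on the ImageNet source domain,
# dropping the dense/softmax head so only the convolutional blocks remain.
base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the transferred layers: the source-domain function f_S(.) is kept
# fixed and only a new classifier is trained on the target domain.
base.trainable = False
base.summary()
```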

3.1.2 CNN: convolutional neural network

This section provides a general overview of CNNs. In this study, we use a convolutional neural network framework (Visual Geometry Group: VGG-19) as the foundation for our analysis.

Recent years have seen a resurgence of interest in convolutional models among researchers (Li et al. 2017; Gao et al. 2018). Layers are split into three sections: the (i) input layer, (ii) convolutional layer, and (iii) max-pooling layer. Together they form a feed-forward network, which may be depicted as follows:

$$\theta (A)=\theta_U(\ldots\theta_2(\theta_1(A,\delta^{(1)}),\delta^{(2)})\ldots,\delta^{(U)})$$
(1)

Here $\delta^{(u)}$ represents the learnable parameters of stage $u$, $U$ represents the number of stages, and $\theta_u$ represents a step-wise operation. The convolution takes in images and outputs a feature map, denoted by $fm$. It should be noted that this is a significant component of the convolutional layer and is essential for generating the feature kernels. The attributes are then passed into the next layer, represented as:

$$A_i^{(l)}=\theta\left(\sum\limits_{j=1}^{N} v_{ij}^{(l)} * A_j^{(l-1)}+q_i^{(l)}\right)$$
(2)

where $*$ denotes the convolution operator, $v_{ij}^{(l)}$ the kernel, $q_i^{(l)}$ the bias, and $A_i^{(l)}$ the $i$th feature map at layer $l$; each convolution is followed by a ReLU, which is defined as:

$$u_{abc}=\max(0,\,a_{abc})$$
(3)

where $u_{abc}$ represents the value at position $(a,b)$ of the $c$th feature map, computed from the input $a_{abc}$.

The pooling layer is the final sub-component of each stage and is responsible for spatial invariance. Downsampling of the feature map $fm$ is prevalent: the dimensionality of the input is reduced by combining adjacent values (pooling), and one common approach is max-pooling.

The dense layers receive the acquired feature-map values, which are passed to a softmax for classification. The softmax is denoted by:

$$\Upsilon (B)=\frac{e^{a_i}}{\sum\nolimits_{j=1}^{N} e^{a_j}}$$
(4)

As a consequence of this strategy, CNNs are used for classification in multiple domains. Furthermore, the efficiency of CNNs has been extensively researched in the scientific literature (Li et al. 2017; Gao et al. 2018; Udmale et al. 2020; Singh et al. 2020; Singh et al. 2021).
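As an illustration (not the VGG-19 configuration itself; the filter counts and input size here are arbitrary toy values), the stack described by Eqs. (1)-(4) can be written as a minimal Keras model:

```python
from tensorflow.keras import layers, models

# A toy feed-forward stack mirroring Eqs. (1)-(4): convolution (Eq. 2)
# with ReLU (Eq. 3), max-pooling for spatial invariance, and a final
# softmax (Eq. 4) over two classes (genuine vs. forged).
model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),          # input layer
    layers.Conv2D(16, 3, activation="relu"),  # convolution + ReLU
    layers.MaxPooling2D(2),                   # max-pooling
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),    # softmax classifier
])
model.summary()
```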

Fig. 1 VGG-19 architecture

Fig. 2 Proposed VGG-19 architectural framework

3.1.3 VGG-19 framework for extraction of features and classification

The proposed framework is discussed here. Since the literature suggests that transfer learning may increase system performance, we adopt the notion of transfer learning (VGG-19) (Li et al. 2017; Gao et al. 2018). We also evaluate several classification algorithms to optimise overall performance.

Figures 1 and 2 show how the general structure of the work is laid out. We adopted the architecture described in (Simonyan and Zisserman 2014), since we implement the transfer learning technique described there. Several versions of this framework exist, usually referred to collectively as VGG-19. VGG-19, a 19-layer deep neural network, has frequently been used in the literature (Canziani et al. 2016; Dumoulin 2016). Figure 1 of this paper displays the VGG-19 base network. Since this network was trained on around 1.2 million images from the ImageNet database, ImageNet was deemed the source domain ($D_s$), and the collection of offline signatures we aimed to classify was termed the target domain ($D_t$). The popularity of the VGG-19 network, as well as its broad usage in transfer learning, led us to choose it as the pretrained framework for this study; the features extracted from its layers are presented in Table 1.

Table 1 Features extracted from the various layers of our proposed model

Furthermore, we introduced a few changes, which are described in further detail in the next paragraph.

The present VGG-19 (Fig. 1) has five blocks, each responsible for extracting a collection of attributes (in sequence). We removed the dense layers and softmax and introduced a new classification method (Fig. 2). Each block of VGG-19 efficiently extracts features from the data, and this new collection of features is provided to another classifier, as sketched below. We also tried several classification approaches (Sect. 4.1). Although it was anticipated that one of the models would provide good outcomes across the board, the study revealed that this was not the case. The findings are detailed in Sects. 4.1 and 4.2.
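The modification can be sketched as follows (a sketch only: the layer name "block5_pool" is a standard Keras VGG-19 identifier, and the random arrays stand in for preprocessed signature images and genuine/forged labels, which are not part of this paper):

```python
import numpy as np
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input
from tensorflow.keras.models import Model
from sklearn.ensemble import RandomForestClassifier

# Truncate VGG-19 at a chosen block: the dense layers and softmax are removed.
base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
extractor = Model(base.input, base.get_layer("block5_pool").output)

def extract_features(images):
    """Run images through the frozen convolutional blocks and flatten the maps."""
    maps = extractor.predict(preprocess_input(images), verbose=0)
    return maps.reshape(len(images), -1)

# Dummy stand-ins for preprocessed signature images and genuine/forged labels.
X_train = np.random.rand(8, 224, 224, 3).astype("float32") * 255.0
y_train = np.random.randint(0, 2, size=8)

# The new classification method: a Random Forest on the extracted features.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(extract_features(X_train), y_train)
```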

3.2 Dataset description

The Bangla signature (BHSig260) dataset contains 100 sets of offline handwritten signatures in Bangla script. Handwritten offline signatures were collected from 100 different people of various educational backgrounds and ages. From each individual, 24 genuine and 30 skilled forged signatures were collected. Signatures were collected over the course of two sessions: the genuine signatures were acquired in the first session, while the skilled forgeries were obtained in the second session, with a genuine signature being shown to a person to practise and produce the forgeries. In total, the 100 people provided 2400 genuine and 3000 skilled forged signatures. The data was captured on a flatbed scanner at a resolution of 300 DPI in grey scale and saved in TIFF (Tagged Image File Format). Binarisation was performed using a histogram-based threshold approach to transform the digitised grey-level images into two-tone images. Because the skilled forged signatures obtained are so similar to the genuine signatures, the dataset is particularly difficult to work with. Some binarised genuine signature samples from the BHSig260 dataset, together with their associated forgeries, are included in Fig. 3 to demonstrate the complexity of the forged signatures: genuine and skilled forged signature samples differ only very slightly, which makes the dataset more challenging.
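As an illustration of such binarisation (the text does not name the exact thresholding algorithm, so Otsu's histogram-based method and the file name are assumptions for this sketch), one could write:

```python
import cv2

# Read a grey-level signature scan (hypothetical file name) and binarise it
# with Otsu's histogram-based threshold, yielding a two-tone image.
grey = cv2.imread("signature.tif", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("signature_binary.tif", binary)
```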

Fig. 3 BHSig260: sample signatures

3.3 Experimental analysis

We implemented the convolutional neural network using TensorFlow together with machine learning techniques. For feature extraction, we used the transfer learning model pretrained on ImageNet. Traditional machine learning techniques such as Random Forest, linear SVM, and logistic regression were used to perform classification. The simulation was performed in Python on a machine with a 3.2 GHz processor and 32 GB of memory. The testing and validation of the proposed model were done on the BHSig260 dataset. The dataset contains two sets of images, genuine and forged; a total of 5400 images from both classes were used. The images were divided into training and testing sets: 70% of the data was used for training and the remaining 30% for testing, as sketched below. A detailed description of the data used for the experiment is listed in Table 2.
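The split and classifier comparison can be sketched as follows (the feature matrix is a random stand-in for the extracted VGG-19 features, and the hyperparameters are illustrative assumptions, not values reported in this paper):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

# Dummy stand-ins for the extracted feature vectors and genuine/forged labels.
features = np.random.rand(200, 512)
labels = np.random.randint(0, 2, size=200)

# 70% of the data for training, 30% for testing.
X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.30, stratify=labels, random_state=0)

for name, clf in [("Random Forest", RandomForestClassifier(random_state=0)),
                  ("SVM Linear", LinearSVC()),
                  ("Logistic Regression", LogisticRegression(max_iter=1000))]:
    clf.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {clf.score(X_te, y_te):.3f}")
```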

Table 2 Number of images used in the proposed method

3.4 Performance evaluation

A total of 5400 images were used to assess the performance of the proposed technique. Genuine and forged are the two classes in the dataset. The dataset is divided into a training set and a testing set; the split was done with the train_test_split method. Accuracy, Sensitivity, Specificity, False Rejection Rate (FRR), False Acceptance Rate (FAR), and Equal Error Rate (EER) were calculated to validate the proposed model:

$$Accuracy=\frac{{TP+TN}}{{TP+FP+FN+TN}}$$
(5)
$$EER=\frac{{(FAR+FRR)}}{2}$$
(6)
$$FAR=\frac{{FP}}{{(FP+TN)}}$$
(7)
$$FRR=\frac{{FN}}{{(FN+TP)}}$$
(8)
$$Sensitivity=\frac{{TP}}{{TP+FN}}$$
(9)
$$Specificity=\frac{{TN}}{{TN+FP}}$$
(10)

where FN, FP, TN, and TP denote false negatives, false positives, true negatives, and true positives, respectively. We utilised the above measures as common evaluation parameters for signature verification (Pal et al. 2016).
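Under these definitions, Eqs. (5)-(10) can be computed from a confusion matrix as in the following sketch (the labels and predictions are dummy values for illustration only):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # dummy labels: 1 = genuine, 0 = forged
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # dummy classifier predictions

# For labels {0, 1}, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + fp + fn + tn)  # Eq. (5)
far = fp / (fp + tn)                        # Eq. (7)
frr = fn / (fn + tp)                        # Eq. (8)
eer = (far + frr) / 2                       # Eq. (6)
sensitivity = tp / (tp + fn)                # Eq. (9)
specificity = tn / (tn + fp)                # Eq. (10)
print(accuracy, far, frr, eer, sensitivity, specificity)
```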

Table 3 Evaluation of the proposed method’s performance

4 Results and discussion

This section compares the model's performance to other frameworks. As stated in Sect. 1, we used the dataset containing 5400 images from 100 users (Pal et al. 2016). Several articles have attempted to classify the (Pal et al. 2016) dataset. The findings are strengthened by the scale of the trials, i.e., all 5400 images. The improvement of classification accuracy is the main purpose of the proposed model. In the proposed model, VGG-19 with Random Forest gives the highest accuracy, and the proposed hybrid model for offline signature verification performs better than other traditional transfer learning approaches. The performance of the proposed hybrid model has been evaluated by calculating Accuracy, Recall, Sensitivity, Specificity, FAR, FRR, and EER. Since this is a binary classification problem, the values of recall and sensitivity coincide. The performance evaluation of the hybrid model is tabulated in Table 3. The major advantage of the proposed model lies in its ability to extract features on its own, which are then classified by Random Forest, linear SVM, or logistic regression. It is clear from our experiments that the modified VGG-19 with Random Forest gives the highest accuracy, 97.8%. The experiment was also performed with linear SVM and logistic regression to compare against the proposed model. It is also observed from Table 4 that the FAR is 0.06 in the case of Random Forest, which is much lower than for the other classifiers.

Table 5 shows the comparative outcomes of the proposed approach. The information in Table 5 shows that the work in this study performed well on many evaluation criteria. Some of the comparison strategies were discussed and applied in (Pal et al. 2016). As can be observed, the suggested model outperforms current techniques on various metrics, achieving better performance on three out of four evaluation parameters. The findings indicate the model's superiority over previous efforts. The scale of the experiment also revealed important insights about offline signature verification, which are detailed in Sect. 4.2.

Table 3 contains the complete findings of the experiment described in Sect. 3.1.3. Random Forest was chosen for classification since it is one of the best algorithms for high-dimensional data: it resists overfitting and has simple parametrisation. Table 3 shows the results of classification using Random Forest as the basic framework and thus the methodology's overall performance. The suggested study outperformed earlier research on several aspects. The findings also show the framework's capacity to discriminate between the two classes.

Table 4 Comparison: Classification Models

4.1 Ablation analysis

Original transfer learning methods do not necessarily give better accuracy. As a result of the study provided here, the numbers in Table 3 demonstrate that adding more blocks can degrade performance: specifically, after conv1 of block5, the performance degrades. Similar scenarios exist for FRR and EER (after conv2 of block5). We attribute this to the problem of negative knowledge transfer, a well-established notion in the scientific literature. This is acceptable, since performance on the dataset can still be improved.

The number of features may also be a factor in poor performance. Table 3 shows that conv1 of block1 had the most attributes while the pooling layer of block5 had the least. The best performance was not observed at either extreme; rather, the best figures came from an intermediate layer. The best results for FAR, FRR, EER, etc., were achieved at various layers. This involves keeping a balance between performance, number of features, and computation time, as in the sketch below.
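The layer-wise ablation can be sketched as a loop over candidate VGG-19 layer outputs (the layer names are the standard Keras identifiers; the small input size and random data are toy assumptions to keep the sketch light):

```python
import numpy as np
from tensorflow.keras.applications import VGG19
from tensorflow.keras.models import Model
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

base = VGG19(weights="imagenet", include_top=False, input_shape=(64, 64, 3))

# Dummy stand-ins for preprocessed signature images and labels.
X = np.random.rand(40, 64, 64, 3).astype("float32")
y = np.random.randint(0, 2, size=40)

# Evaluate the same classifier on features taken from successive layers,
# trading off feature count against verification accuracy.
for name in ["block1_conv1", "block3_pool", "block5_conv1", "block5_pool"]:
    probe = Model(base.input, base.get_layer(name).output)
    feats = probe.predict(X, verbose=0).reshape(len(X), -1)
    score = cross_val_score(RandomForestClassifier(random_state=0),
                            feats, y, cv=3).mean()
    print(f"{name}: {feats.shape[1]} features, CV accuracy {score:.3f}")
```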

Transfer learning was used as the base model and was supplemented with a Random Forest model, as described in the previous section. Additional experimentation with a variety of methodologies is carried out in this subsection. In order to improve classification accuracy, the model was modified to include SVM and logistic regression; aside from that, nothing changed in the configuration. Table 4 shows the results of the study with this architecture.

Table 5 Comparison of the results of the proposed approach and the current methods

4.2 Discussion

Specifically, in this study we employed VGG-19 and evaluated several classification algorithms to determine the most effective method of improving performance when dealing with imbalanced offline signature images. We discovered, after conducting extensive numerical experiments, that retraining a classifier on top of VGG-19 produced the expected outcomes. As previously stated, this research examines one of the more complex difficulties in the literature, and the concepts given here should not be seen as a definitive answer to the problem. The model has been examined from a variety of perspectives, and we have done our best to give a comprehensive examination of it. A few recommendations are also presented in this article, as well as a prospective road map that may be used to develop a better and more effective system in the future. In light of the experimental findings, we review some limitations and present a few proposals to supplement the literature, along with prospective recommendations that may help scale the testing being conducted.

1. Developing a deep neural network is often regarded as one of the most difficult issues in computational science. This article employed 5400 images, so it is easy to see why training a deep learning model on such a small amount of data was a difficult challenge. We acknowledge that standard machine learning methodologies might not obtain the intended results. Consequently, transfer learning was incorporated into the proposed model; however, the kind of transfer learning to use depends on the application, and to obtain the desired outcomes it is necessary to consider the context and the scope of the issue.

2. Our experiments in this research included many distinct strategies; nevertheless, no single strategy met all of the assessment criteria. This conclusion was reached after extensive experimentation with VGG-19 combined with Random Forest, SVM, and logistic regression models. Table 4 demonstrates that no single strategy outperformed the others on all of the assessment criteria (see Sect. 4.1). This was expected, since we were unable to create a generalised model that outperformed the other classification models in every situation under consideration. To substantiate this conclusion, we appeal to the "no free lunch" theorem of machine learning (Wolpert and Macready 1997). Scholars should therefore perform ablation analysis in order to obtain the best possible outcome in each case. In summary, there is no single offline signature approach that is both universally efficient and effective; we therefore urge that alternative ideas be tested before coming to a final decision.

The findings reported in Sect. 4 illustrate that, despite the shortcomings of the proposed work, the framework exhibits competitive performance compared to state-of-the-art methodologies. Finding a unique solution in the context of the topic addressed here is a difficult task.

5 Conclusion

On the basis of automated machine-driven techniques, we investigated how to classify offline signature images in this study. The dataset comprised only 100 individuals with 24 genuine and 30 forged signatures each, making it too small to train a deep learning model from scratch. To solve this, we employed transfer learning: the existing VGG-19, trained on over a million ImageNet images, was applied to the offline signature dataset, and the output layer of the model was replaced with additional classification algorithms. As a result of this research, current competence in classifying BHSig260 has been expanded in various ways. The technique's classification performance was also compared with current frameworks. Given the scope of the studies, it is likely that additional work, whether based on transfer learning or not, will improve the baseline performance. Additional study is required to determine the impact of augmentation, stain normalisation, and other classifiers on the efficiency of the suggested system. After establishing the method's baseline performance, further research should focus on improving it and testing its efficiency on different datasets.