
1 Introduction

The signature is one of the most widely accepted methods of personal verification and identification. Signature verification is important for banking and legal documents, and it remains an active area of research in machine learning and deep learning. Typically, signatures are of two types: (1) handwritten and (2) digital. Capturing a handwritten signature requires paper and pen or an electronic pad with a stylus. Apart from the ink impression on the paper, signature verification may also need to consider writing speed, pressure, etc.

In this paper, we focus on feature extraction and classification on an image dataset of handwritten signatures stored in PNG format. We make contributions to both feature extraction and classification. First, we perform feature extraction independently with several CNN architectures. Second, we present supervised learning algorithms that are well suited to the classification step. Data augmentation is another aspect of our paper: even a small dataset can be enlarged for the feature extraction task (Figs. 1 and 2).

Fig. 1 Signature of the same subject (left: real, right: morphed)

Fig. 2 Augmentation of the same image by rotation and zoom
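As a concrete illustration of the augmentation step, the sketch below applies random rotation and zoom (as in Fig. 2) to a signature image, assuming Keras' ImageDataGenerator; the file name and parameter ranges are illustrative, not values from our pipeline.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

# Rotation range (degrees) and zoom range are illustrative values.
datagen = ImageDataGenerator(rotation_range=15, zoom_range=0.2)

img = img_to_array(load_img("signature.png"))   # hypothetical file; shape (H, W, 3)
batch = img.reshape((1,) + img.shape)           # the generator expects a batch axis
augmented = next(datagen.flow(batch, batch_size=1))[0]
```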

The remainder of this paper is organized as follows. Section 2 discusses the problem and surveys the literature. Section 3 introduces the algorithms and architectures used to extract features efficiently and the algorithms and methods used to classify those extracted features. In Sect. 4, we perform a set of experiments on various architectures using transfer learning and data augmentation. Finally, Sect. 5 concludes our work and suggests directions for future work.

2 Literature Survey

The objective is to develop a handwritten signature verification system using the latest advances in deep learning. The input to the system is a pair of signatures in portable network graphics (PNG) format, and the output is a Boolean value (1 or 0). This paper focuses on experiments with convolutional neural network architectures and different classification techniques. Deep learning-based methods have emerged as successful tools for computer vision and pattern recognition applications [17]. Verifying a digital signature is far easier than verifying a handwritten signature, which counts among the most challenging areas of pattern recognition. Signature verification is nevertheless a well-researched problem with many contributors.

SVC2004 [1], "The First International Signature Verification Competition", consisted of two tasks: the first with 13 teams (EER 2.84%) and the second with 8 teams (EER 2.89%).

Several ways of deriving a suitable decision threshold from the reference signatures have been investigated. The best result yields a false rejection rate of 2.8% and a false acceptance rate of about 1.5%. A test on a database containing more than 1200 signatures from more than 100 people shows that writer-dependent thresholds provide better results than a single common threshold [3]. A "Siamese" neural network approach is used.

The authenticity of a test signature is established by aligning it with the user's reference signatures using dynamic time warping. The authors [4] compared the test signature with the corresponding mean values found in the reference set, forming a three-dimensional feature vector. This feature vector is then assigned to one of two categories (genuine or forged). Using principal component analysis, an error rate of 1.4% was obtained on 619 test signatures from 94 people.

In Ref. [5], an online signature verification methodology is introduced. The system models temporal sequences with Hidden Markov Models (HMMs). Development and evaluation experiments are reported on a subcorpus of the MCYT bimodal biometric database containing more than 6500 signatures from a total of 145 subjects [6]. The forgers were familiar with the verification process and did their best to defraud the system. The acceptance rate for random forgeries, i.e., the accidental similarity of two different signatures, was 0.16%.

Classifiers based on feed-forward neural networks have also been used. The discriminating features are based on projections and on the upper and lower envelopes of the signature. The outputs of the three classifiers are combined using a connectionist scheme. The combination of these classifiers for signature verification is a distinctive feature of this work. Test results show that combining the classifiers increases the reliability of the verification results.

In Ref. [7], experiments were performed on the CEDAR, MCYT and GPDS datasets. The performance of the proposed algorithm is reported with three accuracy measures: FAR, FRR and AER [9]. Compared with the standard system, an error of 20% was found. The SVC2004 database was selected for signature verification in [10]: "We tested our approach on the GPDSSynthetic, MCYT, SigComp11 and CEDAR databases, which demonstrates the generality of our proposal." The review [11] covers the performance of state-of-the-art systems on selected subjects across five public databases.

The authors of [8] used signal processing for the signature verification task. Vector representations of words have been used for sentiment analysis [16]. The authors of [12,13,14,15] used GoogLeNet (Inception-v1), Inception-v3, DAG-CNN and other architectures for signature verification. In this paper, we study the VGG16, ResNet-50, Inception-v3 and Xception architectures in depth.

3 The Proposed Methodology

A. Dataset

The data used are from the ICDAR 2011 Signature Dataset, which consists of multiple genuine and forged signatures from 69 subjects.

B. Proposed system

The proposed system is divided into two major steps: (1) feature extraction and (2) classification. For these tasks, CNN models are used for feature extraction and supervised models for classification.

C. Feature extraction

A convolutional neural network (CNN) is a popular neural network architecture for working with image datasets. It consists of a number of layers such that the output of each layer is fed to the next layer as input. Images are fed in as 3D arrays rather than flattened 1D arrays because CNN architectures are designed to process images in a manner loosely inspired by the human visual cortex. The architecture of a CNN determines the function of each layer and the connections between layers. Four CNN architectures are used in the experiments in this paper: VGG16, Inception-v3, ResNet-50 and Xception. Choosing an architecture suitable for the dataset is crucial to the first step, i.e., the feature extraction stage of our handwritten signature verification system.

Convolution Layers: The convolution layer is the building block of a convolutional network and does most of the computational heavy lifting. The parameters of a CONV layer consist of a set of learnable filters. Each filter is small spatially (in terms of width and height) but extends through the full depth of the input volume. During the forward pass, we slide (more precisely, convolve) each filter across the width and height of the input volume and compute dot products between the filter entries and the input. As we slide the filter over the dimensions of the input, we produce an activation map that gives the response of that filter at every spatial position.

Max-pooling Layers: Max pooling is a pooling operation that selects the maximum, i.e., largest, value in each patch of each feature map. The result is a downsampled feature map that highlights the most prominent feature in each patch, rather than the average presence of the feature as in average pooling. Max pooling has been found to work better in practice than average pooling for computer vision tasks such as image classification. We can make the pooling operation concrete by applying it to the output feature map of a line detector and manually computing the first row of the pooled map.
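The following minimal NumPy sketch makes this concrete: it applies 2 × 2 max pooling with stride 2 to a toy activation map (the values are invented for illustration) and computes the pooled map by hand.

```python
import numpy as np

# Toy 4x4 activation map (e.g., the output of a line-detector filter).
fmap = np.array([[0, 3, 1, 0],
                 [2, 8, 0, 1],
                 [1, 0, 4, 2],
                 [0, 1, 2, 7]], dtype=float)

def max_pool2d(x, size=2, stride=2):
    """Max pooling: keep the largest value in each non-overlapping patch."""
    h, w = x.shape
    out = np.empty((h // stride, w // stride))
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            out[i // stride, j // stride] = x[i:i + size, j:j + size].max()
    return out

print(max_pool2d(fmap))  # [[8. 1.], [1. 7.]] -- each entry is a patch maximum
```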

All the architectures used in this paper are modified by removing the fully connected layers together with the output layer, so that the pretrained model can be fine-tuned and used to extract features.
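A minimal sketch of this step using the Keras applications API is shown below; the choice of global average pooling and the 224 × 224 input size are assumptions for illustration, and the file name is hypothetical.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# include_top=False drops the fully connected and output layers;
# global average pooling turns the last conv volume into a 512-d vector.
base = VGG16(weights="imagenet", include_top=False, pooling="avg",
             input_shape=(224, 224, 3))

img = img_to_array(load_img("signature.png", target_size=(224, 224)))  # hypothetical file
features = base.predict(preprocess_input(img[np.newaxis]))             # shape (1, 512)
```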

Architecture 1: VGG16

VGGNet-16 has 16 weight layers and is attractive due to its uniform architecture, similar to AlexNet but with many more filters. It is currently one of the most widely used networks for extracting features from images. Pretrained VGG16 weights are publicly available and are used in many other applications and challenges as a baseline feature extractor. However, VGG16 has 138 million parameters, which makes it challenging to train from scratch. Instead, we can start from the pretrained parameter values and update them on our dataset for increased accuracy.
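A hedged sketch of this transfer-learning idea: the pretrained convolutional base is frozen and a small new classification head is trained on top. The head size, optimizer and loss here are illustrative choices, not the exact configuration used in our experiments.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # keep the pretrained parameters fixed

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # illustrative head size
    layers.Dense(1, activation="sigmoid"),  # genuine vs. forged
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```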

Architecture 2: Inception-v3

In 2014, Google researchers introduced the first Inception network, which won that year's ImageNet (ILSVRC 2014) classification and detection challenges.

The model is made up of a basic unit called the "Inception cell", in which a series of convolutions at different scales is performed and the results are aggregated. Inception-v3 has about 24 M parameters.

Architecture 3: ResNet50

ResNet50 is a variant of the ResNet model with 48 convolution layers. It is the most popular and widely used ResNet model, and we have studied the ResNet50 design in depth. ResNet was originally introduced for image recognition, but as stated in the original paper, the framework can also be applied to non-vision tasks with good accuracy. ResNet50 has about 23 M parameters.

Architecture 4: Xception

Xception, known as "Extreme Inception", is based entirely on depthwise separable convolution layers. The Xception architecture has 36 convolutional layers that form the feature extraction base of the network. These 36 layers are organized into 14 modules, all of which have residual connections around them except for the first and last modules. In short, the Xception architecture is a linear stack of depthwise separable convolution layers with residual connections. Xception is an adaptation of the Inception model and has about 23 M parameters.

The optimizers used are (i) Stochastic Gradient Descent (SGD), (ii) Root Mean Square Propagation (RMSprop), (iii) Adaptive Gradient Algorithm (Adagrad) and (iv) Adaptive Moment Estimation (Adam).
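A brief sketch of how the four optimizers can be swapped in Keras; the tiny placeholder model and the default learning rates are assumptions, not the settings used in our experiments.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import SGD, RMSprop, Adagrad, Adam

def build_model(n_features=512):
    """Tiny placeholder classifier over extracted features."""
    return models.Sequential([
        layers.Dense(64, activation="relu", input_shape=(n_features,)),
        layers.Dense(1, activation="sigmoid"),
    ])

# One freshly built model per optimizer so the runs are comparable.
for name, opt in {"SGD": SGD(), "RMSprop": RMSprop(),
                  "Adagrad": Adagrad(), "Adam": Adam()}.items():
    model = build_model()
    model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(...) would follow for each configuration
```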

D. Classification

After feature extraction, the features are stored in comma-separated value (CSV) files. We obtained a different number of features from each architecture, as described in Table 1. Returning to our dataset, we prepared pairwise data for each subject. The first column is a genuine signature, and the second column is either a genuine signature or a forged signature associated with the same subject. The label is 0 if both signatures are genuine and 1 otherwise.

Table 1 Basic info about architecture
Table 2 The random sample of five points

The file format used for the signatures is PNG, and the naming schema is XXX/YY_XXX for genuine signatures and XXX_forg/YY_ZZZXXX for forged signatures, where XXX denotes the subject ID, YY the signature number, and ZZZ the ID of the person who produced the forgery.
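A small sketch of how pairs and labels could be generated from this layout; the directory structure and file extension are inferred from the naming schema above, so treat the helper as illustrative rather than our exact preprocessing code.

```python
import glob
import itertools
import os

def build_pairs(root, subject_id):
    """Return (reference, candidate, label) triples for one subject.
    Label 0 = both genuine, 1 = candidate forged, matching the convention above."""
    genuine = sorted(glob.glob(os.path.join(root, subject_id, "*.png")))
    forged = sorted(glob.glob(os.path.join(root, subject_id + "_forg", "*.png")))
    pairs = [(a, b, 0) for a, b in itertools.combinations(genuine, 2)]
    pairs += [(g, f, 1) for g in genuine for f in forged]
    return pairs

# Hypothetical usage: pairs = build_pairs("data", "001")
```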

For classification, we used (i) Euclidean distance, (ii) cosine similarity, (iii) linear SVM, (iv) RBF SVM, (v) sigmoid SVM, (vi) polynomial SVM, (vii) logistic regression and (viii) random forest. We first computed the pairwise similarity between column 1 and column 2 using Euclidean distance and cosine similarity. Pairs with similarity greater than a tunable threshold hyperparameter are classified as not forged.

$${\bf{Euclidean}}\,{\bf{Distance}}: d(p,q) = \sqrt{\sum_{i=1}^{n} \left( q_i - p_i \right)^2}$$
$${\bf{Cosine}}\,{\bf{Similarity}}: sim\left( p,q \right) = \frac{p \cdot q}{\left\| p \right\| \left\| q \right\|} = \frac{\sum_{i=1}^{n} p_i q_i}{\sqrt{\sum_{i=1}^{n} p_i^2} \, \sqrt{\sum_{i=1}^{n} q_i^2}}$$

where p, q are feature vectors; \(p_i\), \(q_i\) are their i-th components; and n is the dimension of the vectors.
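Both similarity measures are straightforward to implement; the sketch below mirrors the formulas above, with an illustrative (not tuned) threshold.

```python
import numpy as np

def euclidean_distance(p, q):
    """d(p, q) = sqrt(sum_i (q_i - p_i)^2), as in the formula above."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sqrt(np.sum((q - p) ** 2))

def cosine_similarity(p, q):
    """sim(p, q) = (p . q) / (||p|| ||q||)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return p.dot(q) / (np.linalg.norm(p) * np.linalg.norm(q))

THRESHOLD = 0.9  # illustrative value; in practice tuned on training pairs

def is_genuine(p, q):
    """Pairs more similar than the threshold are taken as not forged."""
    return cosine_similarity(p, q) > THRESHOLD
```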

We also apply a Support Vector Machine, Logistic Regression and Random Forest directly to the features; the polynomial SVM uses the kernel \(K(X_i, X_j) = (X_i \cdot X_j + 1)^d\).

The features of image 1 and image 2 are concatenated, so the total number of features per pair is twice the number of features of a single image. The reason for choosing SVM is our previous experience with it and its results on high-dimensional data (Table 2).
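A minimal scikit-learn sketch of this classification step on concatenated pair features; the random stand-in data, feature dimension and kernel degree are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Random stand-ins for the CNN features of each signature in a pair;
# labels follow the convention above (0 = genuine pair, 1 = forged).
rng = np.random.default_rng(0)
feats1 = rng.normal(size=(100, 512))
feats2 = rng.normal(size=(100, 512))
labels = rng.integers(0, 2, size=100)

X = np.hstack([feats1, feats2])      # 2 * 512 features per pair
clf = SVC(kernel="poly", degree=3)   # scikit-learn kernel: (gamma x.y + r)^degree
clf.fit(X, labels)
```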

4 Results

The results obtained for feature extraction are presented in Tables 3, 4, 5 and 6 and Graphs 1 and 2. The results obtained for classification are presented in Tables 7, 8, 9 and 10. Bold values represent the best results.

Table 3 Training accuracy (3-fold)
Table 4 Training loss (3-fold)
Table 5 Validation accuracy (3-fold)
Table 6 Validation loss (3-fold)
Table 7 Model i—VGG16-Adam
Table 8 Model ii—VGG16-RMSprop
Table 9 Model iii—Inception-v3—Adam
Table 10 Model iv—Inception-v3—Adagrad

The first observation from the above tables is that the VGG16 architecture outperformed all other architectures; the models whose features we carry forward to classification achieve at least 95% training accuracy and 60% validation accuracy. The four models we chose for testing our classification algorithms are (i) VGG16-Adam, (ii) VGG16-RMSprop, (iii) Inception-v3-Adam and (iv) Inception-v3-Adagrad.

For the classification results, we refer to the selected feature extraction models by the Roman numerals assigned above: i, ii, iii and iv for VGG16-Adam, VGG16-RMSprop, Inception-v3-Adam and Inception-v3-Adagrad, respectively.

Graph 1 Feature extraction results for a selected model of VGG16

Graph 2 Feature extraction results for a selected model of Inception-v3

For the classification tasks, the models selected for feature extraction are used with supervised learning algorithms, reporting supporting performance metrics such as accuracy for each model and a confusion matrix where necessary. All models were trained on an Intel Core i5-7200U CPU with 8 GB of RAM. We ran the tests with 3-fold cross-validation.
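A minimal sketch of the 3-fold protocol with scikit-learn; the random stand-in data replaces our extracted pair features, which are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-ins for the concatenated pair features and their labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1024))   # 2 * 512 features per pair
y = rng.integers(0, 2, size=100)   # 0 = genuine pair, 1 = forged

scores = cross_val_score(SVC(kernel="poly"), X, y, cv=3, scoring="accuracy")
print("3-fold accuracies:", scores, "mean:", scores.mean())
```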

From our observations, Euclidean distance and cosine similarity did not perform well, while SVM outperformed all other models. The Euclidean distance and cosine similarity-based classification methods tend to overfit on our features. The average training time with the Inception-v3-Adagrad features is significantly lower than with the other architectures. Models with accuracy greater than 99% are marked in bold in Tables 7, 8, 9 and 10. The best performing model is the polynomial SVM with an accuracy of 99.28%. Further metrics for evaluating this model are given in Fig. 3. Figures 4, 5, 6 and 7 show sample outputs of our final handwritten signature verification system.

Fig. 3 Confusion matrix for the best performing model

Fig. 4 Sample example 1 after classification

Fig. 5 Sample example 2 after classification

Fig. 6 Sample example 3 after classification

Fig. 7 Sample example 4 after classification

5 Conclusion

In this paper, we have presented methods for feature extraction and classification on a signature dataset. The paper does not focus on hand-crafted features; instead, feature extraction is performed using CNN architectures. Experiments conducted on the ICDAR 2011 Signature Dataset showed that the features extracted with VGG16 outperformed all other architectures by small margins. Other architectures may perform better on other datasets, since the feature extraction experiments involve randomness. In addition, the best classification results are obtained with the polynomial SVM on features extracted from Inception-v3 trained with the Adagrad optimizer.

Although work to date has demonstrated excellent results on the recognition of handwritten digits (MNIST), performance on signature verification is not yet as strong. As future work, we will focus on developing a more purpose-specific neural network model. Furthermore, other classification techniques specific to signature data can be explored.