
1 Introduction

Biometrics has become increasingly important for automatically verifying individuals and for the security of enterprises. Among the different biometric modalities, the signature remains a highly trusted and legally and socially accepted modality for verifying identities [1].

To design an Automatic Signature Verifier (ASV), the literature proposes two approaches: Writer-Dependent and Writer-Independent [1]. In the first approach, a separate classifier is trained on the samples of each individual, whereas in the Writer-Independent approach a single classifier is trained on the signatures of all writers. In both cases, the aim is to verify whether a questioned signature is genuine or forged.

Since texture remains one of the main discriminative characteristics for extracting useful information from images, many ASV systems rely on textural features for signature image analysis and pattern recognition. In this work, we propose a novel handcrafted feature for off-line signature verification based on both textural properties and run-length distributions.

Run-length features have been widely used in several fields of image processing, including image classification [2, 3], writer identification [4] and, in our case, off-line signature verification. In the latter domain, they provide a powerful description of the spatial distribution of pixels in terms of runs [2]. Typically, this spatial distribution is captured by counting runs in four directions: horizontal, vertical, and the two diagonals.

However, a major problem is the well-known high intra-class variability of a user's signature. It is mainly due to changes in shape, size, or other visual aspects, which distort the spatial distribution of pixels within the signature images of a user. This limits the performance of classical run-length features.

The main contribution of this paper is the definition of a new direction in the framework of run-length features. This new direction, named the spiral direction, provides a new representation of the image. Moreover, we combine this new direction with the four classical directions to improve the representation of the run-length features. Our work aims to study the efficiency of run-length features for off-line ASV when the spiral direction is added.

We expect this new direction to overcome some limitations of run-length features thanks to its flexibility in the orientation and length of the scanned lines. This should increase its robustness to intra-class variability and compensate for the rigidity of each classical direction, which traverses the image line by line in a single fixed direction.

The paper is organized as follows: Sect. 2 reviews related work on run-length features in off-line signature verification. Section 3 defines the classical run-length features, and the proposed spiral run-length feature is presented in Sect. 4. Section 5 is devoted to the experiments and results, and Sect. 6 concludes the paper.

2 Related Works on Run-Length Features

Many techniques have been used for image texture analysis in signature verification [1, 2]. Run-length features are textural descriptors based on the lengths of runs. A run is a set of consecutive pixels having the same value in a given direction [3]. The length of a run is the number of pixels it contains.

From these runs, we compute run-length histograms, which count the number of runs of each length. This process is repeated for the four principal directions: horizontal (0°), vertical (90°), right-diagonal (45°) and left-diagonal (135°). The result is four feature vectors, one per direction.
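The counting step can be illustrated on a single scan line. The sketch below is a minimal illustration, assuming a binary encoding where 1 is a black (ink) pixel and 0 a white (background) pixel; the function names and the example line are hypothetical, and discarding runs longer than `max_len` is an assumption rather than a documented choice.

```python
from itertools import groupby


def run_lengths(line, value):
    """Return the lengths of the maximal runs of `value` in a 1D sequence."""
    return [len(list(group)) for v, group in groupby(line) if v == value]


def run_length_histogram(line, value, max_len):
    """Bin i-1 holds the number of runs of `value` with length i.

    Runs longer than `max_len` are ignored here (an assumption of this sketch).
    """
    hist = [0] * max_len
    for length in run_lengths(line, value):
        if length <= max_len:
            hist[length - 1] += 1
    return hist


# Hypothetical scan line: 1 = black (ink), 0 = white (background).
line = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1]
print(run_length_histogram(line, 1, 5))  # [0, 2, 1, 1, 0]
print(run_length_histogram(line, 0, 5))  # [2, 1, 0, 0, 0]
```

Applying the same counting to the lines of each of the four directions yields one black and one white histogram per direction.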

In 1975, Galloway [3] applied run-length features to a set of textures representing nine terrain types, each with six samples. He introduced two refinements to the run-length technique to obtain numerical texture measures: first, all diagonal run lengths are multiplied by \(\sqrt{2}\); second, a short-run emphasis function is used. The classification results were promising.
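As an illustration of these two refinements, the sketch below computes a short-run emphasis on a single run-length histogram, dividing each count by the squared run length and normalizing by the total number of runs. This is a simplification assumed for a binary image and one pixel color: Galloway defines the measure over a full grey-level by run-length matrix. The function name is hypothetical.

```python
import math


def short_run_emphasis(hist, diagonal=False):
    """Short-run emphasis of a run-length histogram (bin j-1 = count of runs of length j).

    Diagonal run lengths are multiplied by sqrt(2) before weighting, so that they are
    measured in the same units as horizontal and vertical runs. Short runs get the
    largest weights (1 / length**2).
    """
    scale = math.sqrt(2) if diagonal else 1.0
    total = sum(hist)
    if total == 0:
        return 0.0
    weighted = sum(count / (scale * length) ** 2
                   for length, count in enumerate(hist, start=1))
    return weighted / total


hist = [0, 2, 1, 1]  # hypothetical: two runs of length 2, one of 3, one of 4
print(short_run_emphasis(hist))
print(short_run_emphasis(hist, diagonal=True))  # half of the value above
```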

Run-length features have since spread widely in texture analysis. They have also been adopted for handwriting-related tasks, such as writing and writer identification and verification and, more specifically, the verification of off-line handwritten signatures.

Djeddi et al. [5] applied run-length and 2D autoregressive coefficient features to signature verification. They used 521 writers from the GPDS960 dataset and a Support Vector Machine (SVM) classifier. They computed the run lengths on black pixels, which correspond to the ink trace of the signatures, and considered only runs of at most 100 pixels in each direction (0°, 45°, 90°, 135°). The final feature vector therefore had 400 values (100 per direction).

Serdouk et al. [6] proposed a combination of two feature sets: the orthogonal combination of local binary patterns and the Longest Run Features (LRF). The LRF are computed from connected pixels along the four principal directions: horizontal, vertical, right-diagonal and left-diagonal. For each direction, the longest run of signature pixels in each scan line is selected, and the sum of these lengths constitutes the LRF value for that direction. Repeating this procedure for the remaining directions gives four LRF features, which are then combined with the other features to describe each signature image. The proposed features were evaluated on the GPDS300 and CEDAR databases, using SVM classifiers for the automatic verification task.
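The LRF computation described above can be sketched as follows, assuming a binary image with 1 for signature pixels and assuming that the longest run is taken per scan line before summing. The helper names and direction labels are hypothetical and do not reproduce Serdouk et al.'s implementation.

```python
import numpy as np


def longest_run(line):
    """Length of the longest run of ink pixels (value 1) in a 1D array."""
    best = current = 0
    for pixel in line:
        current = current + 1 if pixel else 0
        best = max(best, current)
    return best


def direction_lines(img):
    """Scan lines of a 2D binary image: rows, columns, diagonals, anti-diagonals."""
    h, w = img.shape
    flipped = np.fliplr(img)
    return {
        "horizontal": list(img),
        "vertical": list(img.T),
        "diagonal": [img.diagonal(k) for k in range(-(h - 1), w)],
        "anti_diagonal": [flipped.diagonal(k) for k in range(-(h - 1), w)],
    }


def longest_run_features(img):
    """One LRF value per direction: sum of the longest ink run of every scan line."""
    return {name: sum(longest_run(line) for line in lines)
            for name, lines in direction_lines(img).items()}


img = np.array([[1, 1, 0],
                [0, 1, 1],
                [1, 0, 1]])
print(longest_run_features(img))
```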

In Bouamra et al. [2], a new off-line ASV was designed using run-length features computed on both black and white pixels, which correspond to the signature and the background, respectively. For each color, the four directional run-length vectors together contained 400 values, so the combined black and white feature vector had 800 values. They used only genuine signatures for training and employed 881 writers of GPDS960 (281 users for generating signature models and choosing the optimal threshold, and 600 for evaluation). A One-Class Support Vector Machine (OC-SVM) was used for classification. Standard metrics were used to quantify the performance of the system, which obtained competitive results.

In a follow-up work, Bouamra et al. [8] introduced multidirectional run-length features for automatic signature verification. These features build on the standard run-length features [2] by adding four supplementary angles to the four primary directions (horizontal, vertical, left-diagonal and right-diagonal). Each angle is extended with its neighborhood to form a composite angle made of three adjacent angles, which yields eight composite angles used as explicit orientations for scanning the signature image. The authors used an OC-SVM classifier and evaluated their features on the GPDS960 database.

Run-length features were also used for off-line ASV by Ghanim and Nabil [7]. They combined several features, including run-length, slant distribution, entropy, histogram of gradients and geometric features, and applied machine learning techniques such as bagging trees, random forests and support vector machines to the computed features. The study aimed to compare the accuracy of these approaches and to design an accurate system for signature verification and forgery detection. The system was evaluated on the Persian Offline Signature Dataset, with satisfactory results.

3 Classical Run-Length Features

Consider a binarized signature image. In the run-length histograms, \(N_b \left( {i|\theta } \right)\) is the \(i\)th element, i.e. the number of black runs of length \(i\) occurring in the image along angle \(\theta\). Similarly, \(N_w \left( {j|\theta } \right)\) is the number of white runs of length \(j\) occurring in the image along angle \(\theta\).

We use the following notation:

  • \(RL_b\) is the number of black run lengths in the image.

  • \(RL_w\) is the number of white run lengths in the image.

  • \(RL_B\) is the black run-length histogram for the four directions.

  • \(RL_W\) is the white run-length histogram for the four directions.

  • \(RL_{4D}\) is the global black and white run-length histogram for the four directions.

The black and white run-length histograms are defined, respectively, as:

$$ RL_b \left( \theta \right) = \sum\nolimits_{i = 1}^{RL_b } {N_b \left( {i|\theta } \right)} $$
(1)
$$ RL_w \left( \theta \right) = \sum\nolimits_{j = 1}^{RL_w } {N_w \left( {j|\theta } \right)} $$
(2)
$$ \forall \;1 \le i \le RL_b \;and\;1 \le j \le RL_w . $$

The black and white run-length histograms for a given direction are concatenated as

$$ RL\left( \theta \right) = \left[ {RL_b \left( \theta \right),RL_w \left( \theta \right)} \right] $$
(3)

Grouping by pixel color, the black and white run-length histograms of the four directions are concatenated as:

$$ RL_B = \left[ {RL_b \left( {0^{\circ} } \right),RL_b \left( {45^{\circ} } \right),RL_b \left( {90^{\circ} } \right),RL_b \left( {135^{\circ} } \right)} \right] $$
(4)
$$ RL_W = \left[ {RL_w \left( {0^{\circ} } \right),RL_w \left( {45^{\circ} } \right),RL_w \left( {90^{\circ} } \right),RL_w \left( {135^{\circ} } \right)} \right] $$
(5)

The final feature vector is obtained by concatenating these run-length histograms [2]:

$$ {\begin{array}{*{20}l} {RL_{4D} = \left[ {RL_B ,RL_W } \right] = [RL_b \left( {0^{\circ} } \right),RL_b \left( {45^{\circ} } \right),RL_b \left( {90^{\circ} } \right),RL_b \left( {135^{\circ} } \right),RL_w \left( {0^{\circ} } \right),} \hfill \\ {\quad \quad \quad \quad \quad \quad \quad \quad RL_w \left( {45^{\circ} } \right),RL_w \left( {90^{\circ} } \right),RL_w \left( {135^{\circ} } \right)]} \hfill \\ \end{array} } $$
(6)

In our implementation, the 2D image is first vectorized into a single long line. For the horizontal direction, the rows of the image are juxtaposed one after another to form a single vector, which is a different representation of the image, and the run lengths of both black and white pixels are computed on this vector. The same procedure is applied to the other three directions (vertical, right-diagonal and left-diagonal): the lines of the image in the desired direction are juxtaposed into a single vector, and the same algorithm computes the run-length distributions for that direction.
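The vectorization step can be sketched with NumPy as below. This is a minimal illustration, not the authors' code: it assumes 1 = black and 0 = white, assumes that the main diagonals (top-left to bottom-right) correspond to 135° and the anti-diagonals to 45°, and follows the literal reading that lines are concatenated, so a run may continue across the join between two consecutive lines. Whether the authors break runs at line boundaries is not stated. The function names are hypothetical.

```python
import numpy as np


def directional_vector(img, direction):
    """Juxtapose the scan lines of `img` along `direction` into one long 1D vector."""
    h, w = img.shape
    if direction == 0:        # horizontal: rows, top to bottom
        lines = list(img)
    elif direction == 90:     # vertical: columns, left to right
        lines = list(img.T)
    elif direction == 135:    # main diagonals (top-left to bottom-right), assumed 135°
        lines = [img.diagonal(k) for k in range(-(h - 1), w)]
    elif direction == 45:     # anti-diagonals, assumed 45°
        flipped = np.fliplr(img)
        lines = [flipped.diagonal(k) for k in range(-(h - 1), w)]
    else:
        raise ValueError(f"unsupported direction: {direction}")
    return np.concatenate(lines)


def run_lengths(vector, value):
    """Lengths of maximal runs of `value` in a 1D array (runs may cross line joins)."""
    padded = np.concatenate(([False], vector == value, [False])).astype(np.int8)
    edges = np.diff(padded)                  # +1 at run starts, -1 just after run ends
    return np.flatnonzero(edges == -1) - np.flatnonzero(edges == 1)


img = np.array([[1, 1, 0],
                [0, 1, 1],
                [1, 0, 1]])
v0 = directional_vector(img, 0)
print(v0.tolist())                  # [1, 1, 0, 0, 1, 1, 1, 0, 1]
print(run_lengths(v0, 1).tolist())  # [2, 3, 1]: the run of 3 spans rows 2 and 3
```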

Fig. 1. Run-length distribution for the horizontal direction

Fig. 1 illustrates this procedure on a toy example for the horizontal direction. For the black pixels, there is no run of length one, two runs of length two, one run of length three and one run of length four, as indicated in the first row. A similar count is made for the white pixels. In our experiments, each histogram has 400 values, so the horizontal vector has 800 values (400 for black and 400 for white pixels). The procedure is repeated for the remaining directions, and concatenating the four directions yields a run-length feature vector of 3200 values.
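The feature sizes follow directly from fixed-length histograms. The sketch below assumes 400 bins per color and direction, clips runs longer than 400 pixels into the last bin (an assumption, since the text does not say how longer runs are handled), and orders the concatenation as in Eq. (6). The input vectors are hypothetical stand-ins for the four directional vectors.

```python
import numpy as np

MAX_LEN = 400  # bins per color and direction, as reported in the text


def histogram(vector, value, max_len=MAX_LEN):
    """Fixed-size run-length histogram: bin i-1 counts runs of `value` with length i.

    Runs longer than max_len are clipped into the last bin (assumption).
    """
    padded = np.concatenate(([False], vector == value, [False])).astype(np.int8)
    edges = np.diff(padded)
    lengths = np.flatnonzero(edges == -1) - np.flatnonzero(edges == 1)
    return np.bincount(np.minimum(lengths, max_len) - 1, minlength=max_len)


def rl_4d(directional_vectors):
    """Eq. (6) ordering: black histograms for 0°, 45°, 90°, 135°, then white ones."""
    black = [histogram(v, 1) for v in directional_vectors]
    white = [histogram(v, 0) for v in directional_vectors]
    return np.concatenate(black + white)


rng = np.random.default_rng(0)
vectors = [rng.integers(0, 2, size=1000) for _ in range(4)]  # hypothetical vectors
features = rl_4d(vectors)
print(features.shape)  # (3200,) = 4 directions x 2 colors x 400 bins
```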

4 Spiral Run-Length Features

In this section, we first describe the proposed spiral run-length feature and then propose two ways of fusing it with the four classical directions.

4.1 Spiral Feature Vector

The spiral path is obtained by a uniform displacement along a rotating line until a final central point is reached. The spiral run-length feature thus traverses the entire image along an inward spiral that starts from the first pixel at the upper-left corner and moves progressively toward a final central point, alternating between the horizontal and vertical directions. The procedure is shown in Fig. 2.

The spiral feature thus combines four orthogonal scanning directions. The traversal starts in the horizontal direction with angle \(\theta_1 = 0^{\circ}\), followed by a descending vertical scan with angle \(\theta_2 = - 90^{\circ}\). On reaching the end of the column, the direction changes again to horizontal, but opposite to the first one, with angle \(\theta_3 = 180^{\circ}\). The last direction is vertically upward, exploring the column from bottom to top with angle \(\theta_4 = 90^{\circ}\). This round of four directions is repeated until the entire signature image has been traversed.
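The traversal order can be sketched with four shrinking bounds. This is a minimal illustration, not the authors' Algorithm 1: it visits every pixel exactly once in the order 0°, -90°, 180°, 90° described above, and the function name is hypothetical.

```python
import numpy as np


def spiral_vector(img):
    """Flatten a 2D array along an inward spiral starting at the top-left pixel."""
    top, bottom, left, right = 0, img.shape[0] - 1, 0, img.shape[1] - 1
    out = []
    while top <= bottom and left <= right:
        out.extend(img[top, left:right + 1])                # theta_1 = 0°: left to right
        top += 1
        out.extend(img[top:bottom + 1, right])              # theta_2 = -90°: downward
        right -= 1
        if top <= bottom:                                   # a bottom row remains
            out.extend(img[bottom, left:right + 1][::-1])   # theta_3 = 180°: right to left
            bottom -= 1
        if left <= right:                                   # a left column remains
            out.extend(img[top:bottom + 1, left][::-1])     # theta_4 = 90°: upward
            left += 1
    return np.array(out)


print(spiral_vector(np.arange(12).reshape(3, 4)).tolist())
# [0, 1, 2, 3, 7, 11, 10, 9, 8, 4, 5, 6]
```

The guards before the 180° and 90° movements prevent a single remaining row or column from being scanned twice.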

Fig. 2. New run-length direction: spiral-based feature.

To count the run lengths, the procedure described in Sect. 3 is applied to the vector produced by the spiral function. The final spiral feature vector therefore contains 800 values (400 for black pixels and 400 for white ones).
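Combining the spiral flattening with the histogram step gives the 800-value spiral feature. The sketch below uses a compact equivalent of the spiral traversal (take the top row, rotate the remainder 90° counterclockwise with `np.rot90`, repeat), assumes 1 = black and 0 = white, and clips runs longer than 400 pixels into the last bin (an assumption). Runs are counted on the single spiral vector, so a run may continue around a corner of the spiral; the function names are hypothetical.

```python
import numpy as np

MAX_LEN = 400  # bins per color, giving the 800-value spiral vector described in the text


def spiral_vector(img):
    """Inward spiral flattening: take the top row, rotate the rest 90° CCW, repeat."""
    out, m = [], np.asarray(img)
    while m.size:
        out.extend(m[0])
        m = np.rot90(m[1:])
    return np.array(out)


def run_length_histogram(vector, value, max_len=MAX_LEN):
    """Bin i-1 counts runs of `value` of length i; longer runs are clipped (assumption)."""
    padded = np.concatenate(([False], vector == value, [False])).astype(np.int8)
    edges = np.diff(padded)
    lengths = np.flatnonzero(edges == -1) - np.flatnonzero(edges == 1)
    return np.bincount(np.minimum(lengths, max_len) - 1, minlength=max_len)


def spiral_features(img):
    """SP = [SP_B, SP_W]: black (ink = 1) then white (background = 0) histograms."""
    vector = spiral_vector(img)
    return np.concatenate([run_length_histogram(vector, 1),
                           run_length_histogram(vector, 0)])


img = np.array([[1, 1, 0],
                [0, 1, 1],
                [1, 0, 1]])
print(spiral_vector(img).tolist())   # [1, 1, 0, 1, 1, 0, 1, 0, 1]
print(spiral_features(img).shape)    # (800,)
```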

We use the following notation:

  • \(SP_B\) is the number of black run lengths in the image.

  • \(SP_W\) is the number of white run lengths in the image.

  • \(N_b\) is the black run-length histogram in the spiral direction.

  • \(N_w\) is the white run-length histogram in the spiral direction.

  • \(SP\) is the global black and white run-length histogram in the spiral direction.

  • \(\theta_k\) is the spiral scanning angle:

    $$ \theta_1 = 0^{\circ} ,\theta_2 = - 90^{\circ} ,\theta_3 = 180^{\circ} ,\theta_4 = 90^{\circ}. $$

The black and white run-length histograms are defined, respectively, as follows:

$$ SP_B = \sum\nolimits_{i = 1}^{SP_B } {\sum\nolimits_{k = 1}^4 {N_b \left( {i|\theta_k } \right)} } $$
(7)
$$ SP_W = \sum\nolimits_{j = 1}^{SP_W } {\sum\nolimits_{k = 1}^4 {N_w \left( {j|\theta_k } \right)} } $$
(8)
$$ \forall \quad 1 \le i \le SP_B \;and\;1 \le j \le SP_W . $$

The global spiral run-length histogram is then obtained by concatenation:

$$ SP = \left[ {SP_B ,SP_W } \right] $$
(9)

The spiral transformation of the image is therefore dynamic in direction (alternating between vertical and horizontal) and in orientation (two orientations per direction: \(\left( { \to , \leftarrow } \right)\) and \(\left( { \uparrow , \downarrow } \right)\)). It is also dynamic in size: each new movement starts from the second pixel of its line, because the first one is the last pixel of the previous movement and has already been counted, so the scanned segments become shorter as the spiral moves inward. Each round of the transformation consists of four changes of direction.
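The shrinking segment sizes can be made concrete. For an image of height h and width w, the successive horizontal movements have lengths w, w - 1, w - 2, ... and the vertical ones h - 1, h - 2, ..., interleaved, until a length reaches zero. The helper below is a hypothetical illustration of this arithmetic; the lengths always sum to h·w, confirming that each pixel is visited exactly once.

```python
def spiral_segment_lengths(h, w):
    """Lengths of the successive straight movements of an inward spiral on an h x w image."""
    lengths, k = [], 0
    while True:
        horizontal = w - k            # k-th horizontal movement (0° or 180°)
        if horizontal <= 0:
            break
        lengths.append(horizontal)
        vertical = h - 1 - k          # k-th vertical movement (-90° or 90°)
        if vertical <= 0:
            break
        lengths.append(vertical)
        k += 1
    return lengths


print(spiral_segment_lengths(3, 4))  # [4, 2, 3, 1, 2], which sums to 12 = 3 x 4
```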

The spiral feature combines the horizontal and vertical directions in a single scan. It therefore adds complementary information to the four classical run-length directions and can be regarded as a fifth direction.


The steps of the proposed feature are summarized in Algorithms 1 and 2, which describe the spiral vector extraction and the spiral run-length feature computation, respectively.


4.2 Combining Spiral with the Previous Directions

We propose two ways of combining the spiral feature with the classical run-length features: at the feature level and at the score level.

At the feature level, the four run-length features and the spiral feature are concatenated: the five black run-length histograms are concatenated on the one hand, and the five white run-length histograms on the other. The combined histograms thus cover the five directions. Let \(RL_{5D}\) denote the combined run-length histograms, defined as follows:

$$ RL_{5D} = \left[ {RL_B ,SP_B ,RL_W ,SP_W } \right] $$
$$ \begin{array}{*{20}l} {RL_{5D} = [RL_b \left( {0^{\circ} } \right),RL_b \left( {45^{\circ} } \right),RL_b \left( {90^{\circ} } \right),RL_b \left( {135^{\circ} } \right),SP_B ,} \hfill \\ {RL_w \left( {0^{\circ} } \right),RL_w \left( {45^{\circ} } \right),RL_w \left( {90^{\circ} } \right),RL_w \left( {135^{\circ} } \right),SP_W ]} \hfill \\ \end{array} $$
(10)
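With 400 bins per histogram, Eq. (10) gives 5 × 400 black values followed by 5 × 400 white values, i.e. 4000 values. The sketch below only illustrates this ordering; the 400-bin size is carried over from Sects. 3 and 4, and the function name is hypothetical.

```python
import numpy as np

BINS = 400  # histogram length per direction and color (as in Sects. 3 and 4)


def rl_5d(rl_b, sp_b, rl_w, sp_w):
    """Feature-level fusion, Eq. (10): [RL_B, SP_B, RL_W, SP_W].

    rl_b, rl_w: lists of the four classical black/white histograms (0°, 45°, 90°, 135°);
    sp_b, sp_w: spiral black/white histograms.
    """
    return np.concatenate([*rl_b, sp_b, *rl_w, sp_w])


# Hypothetical histograms filled with constants to make the ordering visible.
fused = rl_5d([np.full(BINS, d) for d in range(4)], np.full(BINS, 4),
              [np.full(BINS, 10 + d) for d in range(4)], np.full(BINS, 14))
print(fused.shape)  # (4000,)
```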

At the score level, the fusion operates on the scores produced by the classifiers. The global score combines the score obtained with the four classical run-length directions and the score obtained with the spiral feature through a weighted sum:

$$ Sc = \alpha \cdot Sc_1 + \left( {1 - \alpha } \right) \cdot Sc_2 $$
(11)

where \(Sc\) is the final score, \({Sc}_{1}\) is the score of the four-direction run-length features and \({Sc}_{2}\) is the score of the spiral feature. We set \(\alpha\) heuristically to 0.5. For both features, the black and white pixel distributions are processed.
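Eq. (11) amounts to a one-line weighted sum. The sketch below assumes both scores are already on comparable scales; the text does not state whether they are normalized before fusion, so any normalization would be an extra assumption. The function name `fuse_scores` is hypothetical, and it works equally on scalars and NumPy arrays of scores.

```python
def fuse_scores(sc1, sc2, alpha=0.5):
    """Weighted-sum score fusion, Eq. (11): Sc = alpha * Sc1 + (1 - alpha) * Sc2.

    sc1: score from the four-direction run-length model;
    sc2: score from the spiral run-length model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sc1 + (1.0 - alpha) * sc2


print(fuse_scores(0.4, -0.2))  # equal weights with the default alpha = 0.5
```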

Experiments are carried out for both combination levels, as detailed in the next section.

5 Experiments

In this section, we present the databases, the experimental protocol and the experiments with the two types of combination of run-length features for ASV: at the feature level and at the score level.

5.1 Database

We used the following two databases to evaluate our system:

GPDS75 Database.

This database, introduced by Ferrer et al. [9], contains the first 75 writers; each writer has 24 genuine signatures and 30 skilled forgeries.

CEDAR Database.

It is one of the most frequently used databases for off-line ASV [10]. It contains signatures from 55 different signers: each individual provided 24 genuine signatures and has 24 forged specimens.

5.2 Preprocessing

A preprocessing phase was necessary because both GPDS75 and CEDAR contain greyscale signatures, whereas our system operates on binary signatures.

The signatures were first extracted from the datasets and then binarized using Otsu's method [17, 23], which determines a global threshold from the greyscale signature image by minimizing the intra-class variance of the two resulting pixel classes. This threshold is then used to convert the greyscale signature into a binary signature.
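A minimal NumPy sketch of Otsu's global threshold for 8-bit images is given below. It maximizes the between-class variance, which is equivalent to minimizing the weighted intra-class variance. It assumes dark ink on a light background when mapping pixels to ink (1) or background (0); the function names are hypothetical, and the authors' exact implementation is not specified.

```python
import numpy as np


def otsu_threshold(gray):
    """Global Otsu threshold t for a uint8 image; the classes are '<= t' and '> t'."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                     # probability of the class '<= t'
    mu = np.cumsum(prob * np.arange(256))       # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Between-class variance; empty classes (denom == 0) get 0.
        between = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(between))


def binarize(gray):
    """Ink (dark pixels) -> 1, background -> 0, assuming dark ink on a light background."""
    return (gray <= otsu_threshold(gray)).astype(np.uint8)


gray = np.array([[30, 30, 200],
                 [200, 30, 200]], dtype=np.uint8)
print(binarize(gray).tolist())  # [[1, 1, 0], [0, 1, 0]]
```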

5.3 One-Class Support Vector Machine

A classic Support Vector Machine (SVM) classifier requires both positive and negative training examples.

The OC-SVM classifier, in contrast, uses only genuine signatures for training: the target class is discriminated from all other classes using only training data from the target class. The objective is to find a boundary that separates the target-class examples from the rest of the space while enclosing as many target examples as possible [2, 11]. This boundary is defined by a decision function that is positive inside a class S and negative outside it (in \(\overline{S}\)), as illustrated in Fig. 3.

$$ f\left( x \right) = \left\{ {\begin{array}{*{20}l} { + 1} \hfill & {if\;x \in S} \hfill \\ { - 1} \hfill & {if\;x \in \overline{S}} \hfill \\ \end{array} } \right. $$
Fig. 3. One-class SVM classification

The parameters to be determined for the OC-SVM are the proportion of outliers (ϑ ∈ [0, 1]) and the radial basis function (RBF) kernel parameter (γ ∈ [0, 1]). The RBF kernel was chosen after experimenting with several kernel functions [2].
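For readers who want to experiment, a comparable one-class setup is available in scikit-learn's `OneClassSVM`, whose `nu` parameter plays the role of the outlier proportion and `gamma` that of the RBF kernel parameter. This is a sketch under stated assumptions, not the authors' implementation: the feature vectors are random stand-ins and the parameter values are arbitrary examples.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical feature vectors: 10 reference genuine signatures of one writer.
genuine_train = rng.normal(0.0, 1.0, size=(10, 5))
questioned = np.vstack([rng.normal(0.0, 1.0, size=(3, 5)),    # genuine-like samples
                        rng.normal(10.0, 1.0, size=(3, 5))])  # far from the model

model = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.1)  # example parameter values
model.fit(genuine_train)                              # trained on genuine samples only
labels = model.predict(questioned)                    # +1 inside S, -1 outside (f(x))
scores = model.decision_function(questioned)          # signed scores for thresholding
print(labels, scores.round(3))
```

In a verification pipeline, the continuous `scores` rather than the hard labels would typically be compared with the decision threshold.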

5.4 Experimental Protocol

Our signature verification system comprises four steps: selecting a set of signers, building the signature models, determining the optimal decision threshold and, finally, performing the classification.

For each selected signer, the first five (R5) or ten (R10) genuine signatures are kept as reference signatures for the training stage. The testing stage then uses the next ten genuine samples (g6…g15 for R5 and g11…g20 for R10) and the first ten skilled forgeries (f1…f10), in both databases.
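The per-writer split can be written down explicitly. The helper below is a hypothetical illustration of the R5/R10 protocol using 1-based sample indices, matching the g and f numbering in the text.

```python
def split_protocol(n_refs):
    """1-based indices of reference and test samples for one writer (R5 or R10)."""
    if n_refs not in (5, 10):
        raise ValueError("the protocol uses R5 or R10")
    train_genuine = list(range(1, n_refs + 1))            # g1..g5 or g1..g10
    test_genuine = list(range(n_refs + 1, n_refs + 11))   # g6..g15 or g11..g20
    test_forgeries = list(range(1, 11))                   # f1..f10 in both cases
    return train_genuine, test_genuine, test_forgeries


train, test_g, test_f = split_protocol(5)
print(train, test_g, test_f)
```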

The optimal decision threshold is derived from the false rejection rate (FRR) and false acceptance rate (FAR) curves using the equal error rate (EER) [24, 25], as illustrated in Fig. 4. The EER, defined as the error rate at which FRR = FAR [24], was chosen because it is widely used in related studies.
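A simple way to read the EER from a finite set of scores is to sweep the observed scores as candidate thresholds and pick the one where FAR and FRR are closest. The sketch below assumes that higher scores mean "more genuine" and that a signature is accepted when its score is at least the threshold. It is a discrete approximation, and the function name is hypothetical. Whether the threshold is set per writer or globally is not specified in the text.

```python
import numpy as np


def equal_error_rate(genuine_scores, forgery_scores):
    """Return (EER, threshold), accepting a signature when score >= threshold.

    FRR(t): fraction of genuine scores below t; FAR(t): fraction of forgery scores at
    or above t. The EER is taken where |FAR - FRR| is smallest (mean of the two rates).
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    forgery = np.asarray(forgery_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, forgery]))
    frr = np.array([(genuine < t).mean() for t in thresholds])
    far = np.array([(forgery >= t).mean() for t in thresholds])
    best = int(np.argmin(np.abs(far - frr)))
    return (far[best] + frr[best]) / 2.0, thresholds[best]


eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.65, 0.3, 0.2, 0.1])
print(eer, threshold)  # 0.25 0.65
```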

Fig. 4. EER performance measure

5.5 Results

We discuss here the combination at two levels: feature level and score level. The results of such fusions on GPDS75 and CEDAR databases are shown in Table 1 and Table 2, respectively.

With feature-level combination on GPDS75, we obtained EER = 9.24% and EER = 8.26% using 5 and 10 reference samples, respectively. With score-level combination and the same numbers of references, we obtained EER = 7.98% and EER = 6.86%, respectively, the latter being our best result on this database.

Table 2 reports the results on CEDAR. Feature-level fusion achieved EER = 0.55% and EER = 0.36% with 5 and 10 reference samples, respectively, while score-level fusion achieved EER = 0.73% and EER = 0.18%. The latter (EER = 0.18%, score-level combination with 10 reference samples) is the best value obtained.

For both types of fusion, the results in Tables 1 and 2 show that fusing the features improves the performance of the system.

Table 1. Results in EER (%) on GPDS75 by combining at feature and score level.
Table 2. Results in EER (%) on CEDAR by combining at feature and score level.

We also compare our results with previous works. Table 3 lists works that used the GPDS75 database, and our performance is in line with the state of the art. For instance, Maergner et al. obtained the best EER (6.49%), and an EER of 6.84% in another work. When we combine the five run-length features at the score level, our best performance is 6.86% on GPDS75.

Table 4 shows that our results are competitive with previous works on the CEDAR database. Our two lowest rates (EER = 0.18% and EER = 0.36%) are followed by Hamadene et al. (AER = 2.10%), Hafemann et al. (EER = 4.63%) and Sharif et al. (EER = 4.67%). Overall, our system performs better on CEDAR than on GPDS75.

Table 3. Results on GPDS75 – comparison between the state-of-the-art and our system.
Table 4. Results on CEDAR – comparison between the state-of-the-art and our system.

5.6 Spiral Run-Length Features in External Competition

Evaluating the spiral run-length features against other handwritten signature verification systems was a critical step. We therefore submitted our features to the international competition on Short answer Assessment and Thai Student Signature and Name Components Recognition and Verification (SASIGCOM 2020) [19], organized in conjunction with the 17th International Conference on Frontiers in Handwriting Recognition (ICFHR 2020).

The competition comprised six tasks, including a signature verification task, for which the Thai student signature dataset shown in Table 5 was used. Three types of forgery were considered: simple, skilled and random forgeries. The Equal Error Rate (EER) was used to judge the performance of the participating systems.

Table 5. Signature verification dataset (SASIGCOM 2020).

Our system based on the spiral run-length feature obtained EER = 0.1108% for random forgeries, EER = 0.2045% for skilled forgeries and EER = 0.1459% for simple forgeries, i.e. an average of 0.1537%. Table 6 also shows that the classical run-length features obtained EER = 0.1308%, EER = 0.2145% and EER = 0.1599% for random, skilled and simple forgeries, respectively, while the multidirectional run-length feature obtained an average of 0.1415%. The first-ranked system, a learned system, obtained EER = 0.0019%, EER = 0.0710% and EER = 0.0090% for the same forgery types, respectively, with an average of 0.0273%.

Table 6. Results of the signature verification task (SASIGCOM 2020).

Tables 3, 4 and 6 show that the error rates of the different systems on GPDS75 are higher than those obtained on the CEDAR and SASIGCOM databases.

In particular, our system reached very low EER values on CEDAR. This difference can be attributed to the interplay between the system and each dataset: how the system analyzes the signature, the characteristics of each database, and how the signatures were preprocessed before being included in the database. For instance, the GPDS75 signatures are greyscale, whereas the SASIGCOM signatures are already binarized. Also, the background of the GPDS signatures is nearly uniform, whereas in CEDAR the background differs between the genuine and the forged signatures.

6 Conclusion

In this work, we proposed a new direction for run-length features based on a spiral path through the signature image. We observed performance improvements when combining the well-known four run-length directions with the proposed spiral direction. The spiral run-length feature can thus be understood as a fifth direction, which is more robust to intra-class variability and gives better results than the four run-length directions alone. We reported results for combining the run-length features at the feature and score levels, with the best performances obtained by score-level combination.

In future work, we plan to improve the performance of automatic signature verification by exploring other fusion and combination techniques. We also plan to study other ways of processing run-length features and to extend their use to on-line signatures.