Abstract
This work presents a baseline system for automatic handwriting identification based only on graphometric features. Initially, a set of 12 features was defined and its extraction process demonstrated. To evaluate the efficiency of these features, a selection process was applied, and a smaller group of only 4 features (GS = Goodness Subset) presented the best writer identification rates. Experiments were conducted to evaluate the performance of the graphometric features, individually and in groups, and to identify the number of writers at which the accuracy of the system stabilizes. The accuracy of the system applied to 100 different writers, taking into account the GS feature set, was 84% (TOP1), 96% (TOP5) and 98% (TOP10). These results are comparable to others in the literature on graphometric features. The relation between the number of writers and accuracy gradually stabilizes, and with 200 writers the results are maintained.
1 Introduction
In legal disputes, questions about the authenticity of documents presented as evidence can be raised in court. The problem is greater for handwritten documents, since fraud and forgery are easier to attempt: no sophisticated technology is needed. In most cases, a writer and a pen are sufficient to commit a fraud or a forgery.
Currently, forensic handwriting identification is performed by experts using optical and/or chemical methods. According to Sheikholeslami [1], the manual process of feature extraction and observation is tedious and may leave doubts about the writer's identity. In addition, different Forensic Document Examiners (FDEs) may extract the same features from a document in different ways. Semi-automatic identification systems can therefore be helpful to experts.
According to Sreeraj and Idicula [2], automatic handwriting-based writer identification is an active research area. As one of the most difficult problems in digital image processing and pattern recognition, it involves several subproblems, such as designing algorithms to identify the handwriting of different individuals, identifying and representing relevant features of the handwriting, and evaluating the performance of automatic methods.
Although different approaches have been presented in works such as [3,4,5,6,7], the principal difference between them is the feature set used to represent the handwriting. In this work, we present a baseline system for automatic handwriting identification based only on graphometric features, i.e., on the same principles used by FDEs during their analysis. Initially, a set of 12 features was defined and its extraction process developed. To evaluate the efficiency of these features, a selection process was applied, and a smaller group of only 4 features presented the best writer identification rates (in the experiments performed).
It is worth mentioning that in a previous work [8] the feature selection process was applied to a group of 8 features. In this work we extend this group with features that capture the writer's loop habits, and better results were obtained.
In addition, experiments were conducted to determine the number of writers required to validate the baseline system. All experiments were performed considering TOP1, TOP5 and TOP10 choices. Under these classifications, the baseline system returns a group of possible writers of a questioned document (one for TOP1, five for TOP5 and ten for TOP10), and a result counts as correct when the true writer is present in this group.
The paper is divided into the following sections. Section 2 presents the principles of forensic handwriting analysis used to define the proposed baseline system. Section 3 summarizes the baseline system, including the feature set defined and the feature selection process used to obtain the best feature set. Section 4 presents the experimental results and a brief discussion based on results obtained. Finally, Sect. 5 provides some considerations and indicates future investigations.
2 Forensic Handwriting Analysis
This work presents a baseline system for automatic handwriting identification based only on forensic features. Thus, this section discusses forensic handwriting analysis.
2.1 Forensic Principles and Concepts
According to Morris [9], forensic handwriting identification is part of criminology, and its analysis considers a great number of elements that affect a person's writing. This area also recognizes the relevance of writing systems and how they influence the writer from childhood until graphic maturity.
According to Schomaker [10], contrary to biometrics with a purely physical or biophysical basis, the biometric analysis of handwriting requires a very broad knowledge at multiple levels of observation. For the identification of a writer in a large collection of known samples of handwriting, multi-level knowledge must be considered. In forensic practice, many aspects are considered, ranging from the physics of ink deposition [11] to knowledge on the cultural influences in a population [12].
Bensefia [4] points out that each writer can be characterized by his or her own handwriting, through the reproduction of details and unconscious practices. Handwriting identification is based on the principle that there are individual features that distinguish one person's writing from that of another.
According to Bensefia [4], the writer identification task concerns the retrieval of handwritten samples from a database using the handwritten sample under study as a graphical query. It provides a subset of relevant candidate documents, on which the expert then carries out complementary analysis. The writer verification task, in contrast, must decide whether two handwriting samples were written by the same writer or not.
2.2 Graphometry and Other Approaches
Based on Sreeraj and Idicula [2], approaches related to the feature extraction for writer identification can be divided into: global (extracted from paragraphs, lines, or just pieces of the text image); and local (extracted from characters and words).
Different approaches for handwriting identification have been presented in the literature. Many of them apply features extracted from the document image, such as texture approaches [13,14,15,16,17] or codebook approaches [6, 18]. These features are not considered graphometric, because they consist of complex computational transformations and procedures on the document image and do not consider the same principles used by the FDEs. The approach presented in this work uses specifically graphometric features as presented in [19,20,21,22,23]. These features are those observed by FDEs during their analyses.
3 Forensic Handwriting Identification Based on Graphometry
In this work, we propose a baseline system for handwriting-based writer identification. To conduct the experiments that validate the system, we use documents from 200 different writers from the Brazilian Forensic Letter Database [24]. This database, which is text dependent, is composed of three copies of the same letter for each writer.
During the first stage (training stage), a model must be built for each writer randomly selected from the forensic database. Two letters from each writer were used in this stage. In the second stage (testing stage), the baseline system compares a specific writer against the models established in the training stage, using the third letter of each writer. The next sections describe the preprocessing, feature extraction, classification and feature selection steps.
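The two-stage protocol can be illustrated with a minimal sketch. It assumes each writer's three letters are held in an ordered list; the function name `split_letters` and the file names are hypothetical and only serve the illustration. With 200 writers, the split yields 400 training letters and 200 test letters.

```python
def split_letters(letters_by_writer):
    """Split each writer's three letters into training and test samples.

    The first two letters go to the training stage and the third letter
    is kept as the questioned document for the testing stage.
    """
    train, test = [], []
    for writer, letters in letters_by_writer.items():
        if len(letters) != 3:
            raise ValueError(f"expected 3 letters for {writer}, got {len(letters)}")
        train.extend((writer, letter) for letter in letters[:2])
        test.append((writer, letters[2]))
    return train, test


# Hypothetical file names, used only to exercise the function.
corpus = {
    f"writer_{i:03d}": [f"w{i:03d}_letter{j}.png" for j in (1, 2, 3)]
    for i in range(200)
}
train, test = split_letters(corpus)
print(len(train), len(test))  # 400 200
```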
3.1 Preprocessing
The preprocessing consists of five tasks: thresholding, which converts the 256-level grayscale image into a binary image using the Otsu algorithm; line segmentation, which finds and isolates the text lines in the forensic letter; word segmentation, which separates the words of each line for further processing; contour extraction, in which the stroke contours are obtained by applying morphological filters; and document image segmentation, which splits the image into 24 segments (6 × 4).
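The first and last preprocessing tasks can be sketched as follows. This is an illustrative pure-Python version and not the implementation used in the experiments: images are assumed to be lists of rows of gray levels (0 to 255), ink is assumed to be darker than paper, and the 6 × 4 grid is assumed to mean 6 rows by 4 columns. The function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no pixels at or below t yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image, threshold):
    """Mark pixels at or below the threshold as ink (1), the rest as paper (0)."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]


def split_grid(image, rows=6, cols=4):
    """Split an image into rows * cols rectangular segments, row by row."""
    height, width = len(image), len(image[0])
    segments = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            segments.append([line[left:right] for line in image[top:bottom]])
    return segments


# Tiny synthetic page: alternating dark (10) and light (200) pixels.
page = [[10, 200] * 4 for _ in range(12)]
t = otsu_threshold([p for row in page for p in row])
segments = split_grid(binarize(page, t))
print(t, len(segments))  # 10 24
```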
3.2 Feature Extraction
Based on the study of graphometry, the feature set used for forensic handwriting identification in the current work is: relative placement habits (f1, f3, f4, f5, f6), relative relationship between individual word heights (f2 and f7), axial slant (f8) and relative loop habits (f9, f10, f11, f12), as presented in Table 1.
An important feature related to handwriting individuality is relative placement habits [9]. Writers can make better use of the sheet of paper and write up to its physical limit.
Another important feature is related to the size of the first word of each handwritten line. To compute this feature, the first word of each line was enclosed in a bounding box, and the height of the box and the proportion of black pixels inside it were computed.
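A minimal sketch of the first-word measurement, assuming the word has already been segmented into a binary image (a list of rows where 1 marks an ink pixel). The function name `word_box_features` is hypothetical.

```python
def word_box_features(word):
    """Return (box height, proportion of ink pixels inside the bounding box)."""
    ink_rows = [r for r, line in enumerate(word) if any(line)]
    if not ink_rows:
        return 0, 0.0  # blank image: no bounding box
    ink_cols = [c for c in range(len(word[0])) if any(line[c] for line in word)]
    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    height = bottom - top + 1
    width = right - left + 1
    ink = sum(word[r][c]
              for r in range(top, bottom + 1)
              for c in range(left, right + 1))
    return height, ink / (height * width)


word = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(word_box_features(word))  # (2, 0.75)
```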
The axial slant is a graphometric feature extensively used in approaches to automatic writer identification. In fact, it represents the general angle of the handwriting and has the best individual performance in the baseline system.
The relative loop habits are a set of graphometric features extracted from words and characters. These features present information about the upward and downward loops of the words (height, width, number of pixels and axial slant).
Figures 1 and 2 present an overview of the extraction process from a letter image of the Brazilian Forensic Letter Database [24]. The result of the extraction process is a vector containing 85 primitives (as can be observed in Table 1).
This vector is submitted to the SVM classifier in the training and testing stages. All features were normalized to improve the classification process.
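The text does not state which normalization was used. As one common possibility, the sketch below applies min-max scaling, with the ranges estimated on the training vectors only and then reused for the test vectors. The choice of min-max scaling and both function names are assumptions made for illustration.

```python
def fit_min_max(vectors):
    """Compute the per-feature minimum and maximum over the training vectors."""
    lows = [min(column) for column in zip(*vectors)]
    highs = [max(column) for column in zip(*vectors)]
    return lows, highs


def apply_min_max(vector, lows, highs):
    """Scale each feature using the training range; constant features map to 0."""
    return [
        (value - low) / (high - low) if high > low else 0.0
        for value, low, high in zip(vector, lows, highs)
    ]


training_vectors = [[0.0, 10.0, 5.0], [4.0, 30.0, 5.0]]
lows, highs = fit_min_max(training_vectors)
print(apply_min_max([2.0, 20.0, 5.0], lows, highs))  # [0.5, 0.5, 0.0]
```

Values outside the training range fall outside [0, 1] under this scheme, which is usually acceptable for an SVM but worth keeping in mind.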
3.3 Classification
The classification task consists of submitting the vectors of primitives extracted from the forensic letters to the SVM classifier. We selected the SVM classifier based on the literature and on some tests with other classifiers. In this stage, the questioned document (forensic letter) is compared with the models generated for each writer (all-against-all), and a confusion matrix is generated as a result. This matrix contains the probability of each writer being the author of the questioned document. These probabilities make it possible to identify not only the correct classification (TOP1), but also the five and ten most likely candidates (TOP5 and TOP10, respectively) for authorship of the questioned document.
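The TOP-k evaluation can be sketched as below, assuming the classifier output for each questioned document is available as a mapping from writer identifier to probability. The function name, the writer labels and the probabilities are made up for the example.

```python
def top_k_accuracy(probabilities, true_writers, k):
    """Fraction of questioned documents whose true writer is among the k most probable."""
    hits = 0
    for probs, truth in zip(probabilities, true_writers):
        ranked = sorted(probs, key=probs.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_writers)


# Two questioned documents scored against three hypothetical writers.
scores = [
    {"A": 0.6, "B": 0.3, "C": 0.1},  # true writer A is ranked first
    {"A": 0.5, "B": 0.2, "C": 0.3},  # true writer C is ranked second
]
truth = ["A", "C"]
print(top_k_accuracy(scores, truth, 1))  # 0.5
print(top_k_accuracy(scores, truth, 2))  # 1.0
```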
3.4 Feature Selection
In order to validate our feature set, a feature selection process was applied to the entire set (f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12), and a group composed only of the features f1, f6, f8 and f12 presented the best writer identification rates. This selection process was reported in [8], where the initial group was composed of the features (f1, f2, f3, f4, f5, f6, f7, f8) and the selected group was composed of (f1, f6, f8).
According to Dy and Broadley [25], feature selection is a process that selects a subset of original features. A general feature selection process comprises four steps: subset generation, subset evaluation, stopping criterion, and result validation.
4 Experimental Results and Discussion
To validate the baseline system, experiments were performed with two goals: to analyze the group of features resulting from the feature selection process, and to find the number of writers beyond which the accuracy of the system is no longer significantly affected.
In the first experiment, the feature selection process was used to obtain the best group of features, as described in Sect. 3.4. Using a sequential forward search and an evaluation criterion based on dependency, the goodness set (GS) obtained was composed of features f1, f6, f8 and f12. To ensure that the feature set resulting from the selection process was good, other empirically defined sets of features were also evaluated (Table 2). The results of these experiments were also analyzed under TOP5 and TOP10 match classifications (as shown in Table 3), reaching writer identification rates close to 100%.
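A sequential forward search can be sketched generically as follows. The search starts from an empty subset, adds at each step the feature that most improves an evaluation score, and stops when no addition improves it. The `score` callable is a placeholder for the subset evaluation criterion; the toy scoring function, feature names and values below are invented solely to exercise the loop. They do not reproduce the dependency-based criterion or any result of this work.

```python
def sequential_forward_selection(features, score):
    """Greedy forward search: grow the subset while the score keeps improving."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        # Subset generation and evaluation: try adding each remaining feature.
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        # Stopping criterion: no candidate improves the current subset.
        if candidate_score <= best_score:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Toy criterion (an assumption): additive gains minus a penalty per feature.
toy_gain = {"a": 0.40, "b": 0.30, "c": 0.02, "d": 0.20}


def toy_score(subset):
    return sum(toy_gain[f] for f in subset) - 0.05 * len(subset)


subset, value = sequential_forward_selection(toy_gain, toy_score)
print(subset)  # ['a', 'b', 'd']
```

In practice the selected subset would then be checked on held-out data, which corresponds to the result validation step of the general selection process.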
It is important to highlight that with TOP5 and TOP10 match classification FDEs become more productive, since the number of handwriting samples that must be manually analyzed is reduced (to 5 or 10).
As mentioned before, another group of experiments was conducted to determine the number of writers at which the baseline system stabilizes. To perform this task, writers randomly selected from the Brazilian Forensic Letter Database [24] were progressively added to the group of writers evaluated in the baseline system, from 40 to 300 writers. The experiments were run with all the features, with the best group of features (GS) and with ensembles of features (Table 2), and the writer identification performance was computed.
The relation between the number of writers and accuracy gradually stabilizes, and with 200 writers the results are maintained. It is important to note that using 200 different writers means considering 400 letters in the training stage and 200 letters in the test stage, totaling 600 letters. Furthermore, the writer identification rate with the larger group (200 writers) was 71%.
Table 4 presents a brief comparison of the results obtained with the baseline system and others reported in the literature. Considering the number of writers and the accuracy, our results are very promising: with 160 writers our identification rate is 76%, while another work with a similar sample size [22] reports 58%.
5 Conclusion and Future Works
This paper discussed the efficiency of a graphometric feature set that can be applied to writer identification. First, we described the main features based on graphometric principles and the research related to them. Thereafter, we presented the baseline system. We demonstrated, based on experimental results and a feature selection process, that these features achieve promising results for forensic handwriting analysis. Results improved in TOP1 classification when the GS was applied, and they were comparable to others in the literature (Table 4) that consider graphometric features. Considering TOP5 and TOP10 classifications, the writer identification rates achieved were close to 100%. It is important to highlight the productivity gain obtained for forensic handwriting analysis when the number of handwriting samples that must be manually analyzed is reduced (to 5 or 10).
In addition, experiments were conducted to determine the number of writers at which the baseline system performance stabilizes, and with 200 different writers no significant gain or loss was perceived in the results. As future work, new features will be studied and included in the baseline system in an attempt to improve the results, and tests with other classifiers will be carried out.
References
1. Sheikholeslami, G., Srihari, S.N., Govindaraju, V.: Computer aided graphology. In: Proceedings of the 5th International Workshop on Frontiers in Handwriting Recognition, Essex, England, pp. 457–460 (1996)
2. Sreeraj, M., Idicula, S.M.: A survey on writer identification schemas. Int. J. Comput. Appl. 26(2), 23–33 (2011)
3. Plamondon, R., Lorette, G.: Automatic signature verification and writer identification: the state of the art. Pattern Recogn. 22(2), 107–131 (1989)
4. Bensefia, A., Paquet, T., Heutte, L.: A writer identification and verification system. Pattern Recogn. Lett. 26(13), 2080–2092 (2005)
5. Blankers, V., Niels, R., Vuurpijl, L.: Writer identification by means of explainable features: shapes of loops and lead-in strokes. In: Proceedings of the 19th Belgian-Dutch Conference on Artificial Intelligence, pp. 17–24 (2007)
6. Schomaker, L., Franke, K., Bulacu, M.: Using codebooks of fragmented connected-component contours in forensic and historic writer identification. Pattern Recogn. Lett. 28, 719–727 (2007)
7. Luna, E.C.H., Riveron, E.M.F., Calderon, S.G.: A supervised algorithm with a new differentiated-weighting scheme for identifying the author of a handwritten text. Pattern Recogn. Lett. 32, 1139–1144 (2011)
8. Amaral, A.M.M.M., Freitas, C.O.A., Bortolozzi, F.: Feature selection for forensic handwriting identification. In: Proceedings of 12th International Conference on Document Analysis and Recognition. Washington: (IAPR), vol. 1, pp. 10–15 (2013)
9. Morris, R.N.: Forensic Handwriting Identification: Fundamental Concepts and Principles. Academic Press, Cambridge (2000)
10. Schomaker, L.: Writer identification and verification. In: Ratha, N.K., Govindaraju, V. (eds.) Advances in Biometrics: Sensor, Algorithms and Systems, pp. 247–264. Springer, London (2008). https://doi.org/10.1007/978-1-84628-921-7_13
11. Franke, K., Rose, S.: Ink-deposition model: the relation of writing and ink deposition processes. In: Proceedings of the Ninth International Workshop on Frontiers in Handwriting Recognition (IWFHR 2004), pp. 173–178. IEEE Computer Society (2004)
12. Schomaker, L., Bulacu, M.: Automatic writer identification using connected-component contours and edge based features of upper-case western script. IEEE Trans. Pattern Anal. Mach. Intell. 26(6), 787–798 (2004)
13. Said, H.E.S., Tan, T., Baker, K.: Writer identification based on handwritings. Pattern Recogn. 33(1), 133–148 (2000)
14. Bulacu, M., Schomaker, L., Brink, A.: Text-independent writer identification and verification on offline Arabic handwriting. In: Proceedings of the 9th Conference on Document Analysis and Recognition (ICDAR) (2007)
15. He, Z., You, X., Tang, Y.: Writer identification of Chinese handwriting documents using hidden Markov tree model. Pattern Recogn. 41, 1295–1307 (2008)
16. Helli, B., Moghaddam, E.: A text-independent Persian writer identification based on feature relation graph (FRG). Pattern Recogn. 43, 2199–2209 (2010)
17. Hanusiak, R.K., Oliveira, L.S., Justino, E., Sabourin, R.: Writer verification using texture-based features. Int. J. Doc. Anal. Recogn. 15, 213–226 (2011)
18. Siddiqi, I., Vincent, N.: Combining global and local features for writer identification. In: Proceedings of the 11th International Conference on Frontiers in Handwriting Recognition, pp. 48–53 (2008)
19. Zois, E., Anastassopoulos, V.: Morphological waveform coding for writer identification. Pattern Recogn. 33(3), 385–398 (2000)
20. Hertel, C., Bunke, H.: A set of novel features for writer identification. In: Proceedings of the 4th International Conference on Audio- and Video-Based Biometric Person Authentication, pp. 679–687 (2003)
21. Schlapbach, A., Bunke, H.: Off-line handwriting identification using HMM based recognizers. In: Proceedings of 17th International Conference of the Pattern Recognition – ICPR 2004, vol. 2 (2004)
22. Pervouchine, V., Leedham, G.: Extraction and analysis of forensic document examiner features used for writer identification. Pattern Recogn. 40, 1004–1013 (2007)
23. Chen, J., Lopresti, D., Kavallieratou, E.: The impact of ruling lines on writer identification. In: 12th International Conference on Frontiers in Handwriting Recognition (2010)
24. Freitas, C.O.A., Oliveira, L.S., Bortolozzi, F., Sabourin, R.: Brazilian forensic letter database. In: Proceedings of the 11th International Workshop on Frontiers on Handwriting Recognition (2008)
25. Dy, J.G., Broadley, C.E.: Feature subset selection and order identification for unsupervised learning. In: Proceedings of 17th International Conference on Machine Learning, pp. 247–254 (2000)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Amaral, A.M.M.M., de Almendra Freitas, C.O., Bortolozzi, F., Maldonado e Gomes da Costa, Y. (2018). Forensic Document Examination: Who Is the Writer?. In: Mendoza, M., Velastín, S. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2017. Lecture Notes in Computer Science, vol 10657. Springer, Cham. https://doi.org/10.1007/978-3-319-75193-1_7
DOI: https://doi.org/10.1007/978-3-319-75193-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75192-4
Online ISBN: 978-3-319-75193-1
eBook Packages: Computer Science (R0)