1 Introduction

Document forgery is a ubiquitous and noteworthy problem in today's age. It is the process of fabricating, transforming or imitating writings, objects, or documents [16]. False making entails fraudulently writing on an official document or making subtle changes to the original document, and it brings the authenticity of the document into question. More often than not it involves falsifying or forging signatures [28], a type of forgery that directly correlates with the theft of an individual's identity. Forgery can also include backdating, overwriting, and adding addendums to a document. In such cases, a different ink is used to modify an already existing document, resulting in ink forgery.

Broadly speaking, there are two ways to pinpoint or identify forgery: destructive methods and non-destructive methods. Destructive methods analyse the ink itself, usually in laboratories [1] or similar settings. However, these methods do not always abide by the ethical code of conduct [11], because they are intrusive by default: they involve taking a sample of the document, such as a punch from the paper, and analysing it separately, or otherwise disturbing the ink across the entire document [10]. Such analysis can damage or alter the document permanently. The most common destructive ink testing method is thin layer chromatography (TLC) [1].

In contrast, non-destructive methods are non-intrusive and do not alter the document fundamentally [35]. The document is left intact and can be re-evaluated in the future, and the analysis is faster and cheaper [32]. The most prevalent non-destructive technique is spectrophotometry [6]. Spectroscopy, the study of the interaction between matter and radiation, is an umbrella term that covers spectrophotometry [5].

Hyperspectral imaging (HSI) is a type of spectral imaging capable of leveraging both spatial and spectral data from materials or objects [18]. With the ability to analyse a wide range of the electromagnetic spectrum [7, 14], HSI was first utilised in remote sensing [29]. Unlike conventional imaging, it can extend the number of scanned bands to hundreds or even thousands [4, 34]. As the analysed bands are narrow and contiguous, HSI effectively records subtle variations in reflected energy.

2 Literature Survey

This section explains some of the previous work done in detecting forgery in documents using hyperspectral technology.

Khan et al. [25] created the first publicly available dataset for document forgery detection using hyperspectral imaging, known as the “UWA Writing Ink Hyperspectral Image Database”. To reduce dimensionality and extract crucial features from the dataset, they used Joint Sparse Principal Component Analysis (JSPCA). Similarly, to identify different inks, the Joint Sparse Band Selection (JSBS) technique was implemented. This technique achieved an accuracy of around 85%.

Devassy et al. [12] created their own document dataset using hyperspectral technology. The data was normalised using the standard reference target technique and then fed into a one-dimensional convolutional neural network (CNN), achieving 91% accuracy.

In [22], six different CNNs were implemented using the UWA Writing Ink Hyperspectral Image Database, achieving 99.2% accuracy for blue inks and 99.6% for black inks. However, this approach is limited as it requires prior knowledge for training the neural networks.

Luo et al. [30] also utilised the UWA database, carrying out ink distinction in a local area using anomaly detection. The study concluded that the Influenced Outlierness (INFLO) anomaly detection method, integrated with a point-to-point (P2P) criterion for feature selection, gave the best results. This technique also tackles the a priori problem via clustering.

The UWA database is also used in [33], where an orthogonal and graph-regularised Non-negative Matrix Factorisation model is applied. It achieves an accuracy of around 85%, but falls short in finding the optimal hyperparameter selection.

In [20], hyperspectral data was extracted from the spectral responses of neighbouring pixels, again using the WIHSI database. The responses were fed to convolutional neural networks (CNNs) to determine the writer’s identity, with a highest accuracy of 71%. However, a major limitation of this work is that the dataset was not large enough to confirm the model’s reliability.

The use of Least Squares Support Vector Machines (LS-SVM) for ink detection using hyperspectral data was proposed by Morales et al. [31]. They built a dataset from scratch, creating hyperspectral curves by removing the background. Smoothing procedures were applied to the curves to extract 21 crucial pixels, and feature pairs were created from the area and slope features. Finally, the SVM algorithm was applied to this data, with a highest accuracy of 80%.

Another work that created its own dataset is that of Wang et al. [36]. They focused on noise removal and dimensionality reduction by applying Principal Component Analysis (PCA), after which the dataset underwent pseudo-colour synthesis. This produced a technique to detect the writer’s identity, although the accuracy varied with changing ink patterns.

3 Dataset Description

For this work, the UWA Writing Inks Hyperspectral Images (WIHSI) database [26] was selected. This database consists of 70 hyperspectral images, each captured across 33 bands in the visible spectrum. It contains ten different types of inks, five blue and five black, along with seven different subjects. On one page, the sentence ‘The quick brown fox jumps over the lazy dog’ was written by each subject using five different inks from varying manufacturers. This was done to ensure the inks appeared visually similar, being the same colour (blue or black), yet remained unique.

4 Methodology

The methodology followed to do the work is depicted in Fig. 1.

Fig. 1. Methodology followed in this work

4.1 Preprocessing

On the dataset, the background was removed and Sauvola thresholding was applied. This method was chosen as it accounts for unequal illumination and can efficiently extract crucial data from hyperspectral information. The five sentences in each document were then decomposed into individual sentences for easier analysis during classification. These sentences were transformed into hypercubes, i.e., 3-dimensional matrices that reflect the 33 spectral bands. Finally, spectral response vectors of dimensions 1 × 33 are created.
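As a rough illustration of this step, the sketch below (in Python, using NumPy and scikit-image) shows how the ink pixels of a hypercube could be separated from the background with Sauvola thresholding and turned into 1 × 33 spectral response vectors. It assumes each hypercube is stored as an (H, W, 33) array; the band used to build the mask and the window size are illustrative assumptions, not values reported here.

```python
import numpy as np
from skimage.filters import threshold_sauvola

def extract_spectral_responses(hypercube, mask_band=16, window_size=25):
    """Separate ink pixels from the paper background and return their
    1 x 33 spectral response vectors.

    hypercube   : ndarray of shape (H, W, 33) -- one visible-band hypercube
    mask_band   : band used to build the foreground mask (illustrative choice)
    window_size : Sauvola window size (illustrative choice)
    """
    grey = hypercube[:, :, mask_band]
    # Sauvola thresholding adapts to unequal illumination across the page.
    thresh = threshold_sauvola(grey, window_size=window_size)
    ink_mask = grey < thresh                 # ink is darker than the paper
    # Every foreground pixel yields one 1 x 33 spectral response vector.
    responses = hypercube[ink_mask]          # shape: (n_ink_pixels, 33)
    return responses, ink_mask
```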

To carry out the testing of ink detection, hypercubes were created with different inks mixed in various ratios, as shown in Table 1.

Table 1. Ratios in which inks were mixed
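The actual ratios are those listed in Table 1; purely as a hypothetical sketch, the snippet below shows one way pixels from two inks could be combined in a chosen ratio. The total pixel count, the sampling scheme, and the binary label encoding are assumptions for illustration only.

```python
import numpy as np

def mix_inks(resp_a, resp_b, ratio=(1, 32), n_total=3300, seed=0):
    """Sample spectral responses from two inks in a given ratio (e.g. 1:32)
    and return the mixed pixel set with per-pixel ink labels."""
    rng = np.random.default_rng(seed)
    n_a = n_total * ratio[0] // (ratio[0] + ratio[1])
    n_b = n_total - n_a
    # Assumes each ink has at least n_a / n_b available pixels.
    idx_a = rng.choice(len(resp_a), size=n_a, replace=False)
    idx_b = rng.choice(len(resp_b), size=n_b, replace=False)
    X = np.vstack([resp_a[idx_a], resp_b[idx_b]])        # (n_total, 33)
    y = np.concatenate([np.zeros(n_a), np.ones(n_b)])    # 0 = ink A, 1 = ink B
    return X, y
```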

4.2 Dimension Reduction

Dimensionality reduction is a process that creates subsets of important features which then act as new attributes. Its significance is that less information is lost in comparison to feature selection, though it can require more space. The following elaborates on the various dimension reduction algorithms applied:

Principal Component Analysis. Principal Component Analysis (PCA) is a technique for deriving significant factors (in the form of components) from a large set of variables available in a dataset [37]. It extracts a low-dimensional set of features by projecting out inessential dimensions of a high-dimensional dataset, with the aim of retaining as much information as possible. In essence, it takes the crucial information from the overall dataset and represents it as principal components, which are a set of new orthogonal variables [3]. These showcase the similarity pattern and create a map of point variables. PCA is a widely used tool, especially in unsupervised applications [13].
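As a minimal illustration, PCA can be applied to the matrix of spectral response vectors with scikit-learn as follows; the number of retained components is an illustrative assumption.

```python
from sklearn.decomposition import PCA

# X is the (n_pixels, 33) matrix of spectral response vectors.
pca = PCA(n_components=10)               # component count is illustrative
X_pca = pca.fit_transform(X)
# Cumulative explained variance shows how much spectral information the
# retained components preserve.
print(pca.explained_variance_ratio_.cumsum())
```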

Factor Analysis. In factor analysis, correlations are used to group the variables, i.e., all variables in a specific group have a high correlation among themselves but a low correlation with variables of other group(s). The scale of the variables does not come into play, unlike the orthogonal rotations applied to the factors [17]. According to [21], there are two steps in this process: first an initial solution is found, and then it is rotated. Factor analysis is not limited to dimension reduction; it is also used to understand the underlying dimensions and to test hypotheses [27].
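A corresponding sketch with scikit-learn's FactorAnalysis is shown below; the number of factors is illustrative, and the varimax rotation (available in scikit-learn 0.24 and later) mirrors the rotation step described above.

```python
from sklearn.decomposition import FactorAnalysis

# Factor analysis models each 33-band spectrum as a linear combination of a
# few latent factors plus per-band noise; the solution is then rotated.
fa = FactorAnalysis(n_components=10, rotation="varimax")  # values illustrative
X_fa = fa.fit_transform(X)
# fa.components_ holds the factor loadings (n_factors x 33 bands).
```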

Independent Component Analysis. Independent Component Analysis (ICA) is based on information theory and is also one of the most widely used dimensionality reduction techniques [19]. The significant difference between PCA and ICA is that PCA searches for uncorrelated components while ICA searches for independent ones. If two variables are uncorrelated, there is no linear relationship between them; if they are independent, they do not depend on other variables at all [9]. For instance, the age of an individual is independent of what that individual eats or how much TV they watch. The algorithm assumes that the given variables are linear combinations of some unknown latent variables. It also assumes that these latent variables are mutually independent, i.e., they are not dependent on other variables, and hence they are known as the independent components of the observed data [8].
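Analogously, a minimal FastICA sketch is given below; again the number of components is an illustrative assumption.

```python
from sklearn.decomposition import FastICA

# FastICA searches for statistically independent (not merely uncorrelated)
# components by maximising the non-Gaussianity of the projections.
ica = FastICA(n_components=10, random_state=0)  # component count illustrative
X_ica = ica.fit_transform(X)
```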

4.3 Machine Learning

In this method, the 10 different inks used are known beforehand, so supervised machine learning classifiers are applied. Decision Trees and Random Forests were chosen to check how crucial the different bands are for detecting inks. K-Nearest Neighbours was chosen to capture document forgery inspection by leveraging neighbours in the feature space. Support Vector Machines (SVM) were also selected; they are more powerful as they use soft margins and complex hyperplanes. A sketch of this classification stage is given below.
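The following hedged sketch uses scikit-learn; the train/test split, the hyperparameters (such as k = 15 and 100 trees), and the use of plain hold-out accuracy via score() are illustrative simplifications, whereas the reported results use the segmentation accuracy of Eq. 1.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# X: spectral responses (optionally dimension-reduced), y: ink labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

classifiers = {
    "SVM (RBF)": SVC(kernel="rbf"),
    "SVM (linear)": SVC(kernel="linear"),
    "SVM (poly)": SVC(kernel="poly"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=15),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
```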

5 Results

Tables 2 and 3 summarise the quantitative results obtained by applying various machine learning classifiers and dimension reduction algorithms to the HSI data. The accuracy used is the segmentation accuracy given by Eq. 1; it is the intersection-over-union metric, which computes the number of correctly labelled pixels of an ink divided by the number of pixels labelled with that ink in either the ground truth or the prediction [15]. Overall, better results were obtained for blue inks than for black inks when keeping the processes the same.

$$\begin{aligned} Accuracy = \frac{\text {True Positives}}{\text {True Positives} + \text {False Positives} + \text {False Negatives}} \end{aligned}$$
(1)
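For clarity, Eq. 1 corresponds directly to the following per-ink computation; the helper below assumes integer ink labels for the ground-truth and predicted pixel maps.

```python
import numpy as np

def segmentation_accuracy(y_true, y_pred, ink_label):
    """Per-ink segmentation accuracy of Eq. 1 (intersection over union):
    TP / (TP + FP + FN) for the given ink label."""
    true_mask = (y_true == ink_label)
    pred_mask = (y_pred == ink_label)
    tp = np.logical_and(true_mask, pred_mask).sum()
    fp = np.logical_and(~true_mask, pred_mask).sum()
    fn = np.logical_and(true_mask, ~pred_mask).sum()
    return tp / (tp + fp + fn)
```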
Table 2. Black ink accuracy when applied with different processes.
Fig. 2. Blue inks at the 1:32 ratio, showing that dimension reduction helped to increase the accuracy in this study. (Color figure online)

Fig. 3. Black inks at the 1:1 ratio, showing that dimension reduction helped to increase the accuracy in this study.

5.1 Machine Learning Without Dimension Reduction

After investigating the classification accuracy of the classifiers for both inks without dimension reduction, the following insights were gained:

  1. Out of the three SVM kernels implemented, the one with the best performance was the ‘RBF’ kernel for both blue and black inks.

  2. After the ‘RBF’ kernel, the next best kernel in terms of accuracy was the ‘linear’ kernel for both blue and black inks. The kernel with the weakest performance was the ‘poly’ kernel.

  3. Random Forest outperformed Decision Trees for both inks in terms of accuracy. This can be attributed to the combination of many decision trees in a random forest.

  4. In K-Nearest Neighbours, the best performance for black inks was obtained when ‘k’ was set to 15.

  5. For blue inks, the KNN performance varied: for some ratios ‘k’ was best set to 5, and for other ratios to 10.

  6. Across all of the ML classifiers applied, the global trend was that blue inks achieved higher accuracy than black inks.

  7. Another global trend across all accuracies compared was that the 1:32 ratio had an overall lower accuracy than the other ratios. This can be due to the fact that, at this ratio, the first ink has a very small presence, making it difficult to classify.

5.2 Machine Learning with Dimension Reduction

After applying dimension reduction to the data, the classifiers’ accuracy showed the following:

  1. Out of the three dimension reduction algorithms applied, the best performance was achieved by Factor Analysis.

  2. After factor analysis, the next best algorithm was Independent Component Analysis; in fact, their performances were neck and neck. This could be because the two algorithms are very similar: both focus on finding basis vectors, but ICA additionally evaluates the kurtosis of the projections to obtain independent components.

  3. PCA performed the worst of all the methods applied and could not even match the performance of the machine-learning-only pipeline. This could be because PCA does not work with specific latent vectors, nor is it generative like factor analysis.

  4. Factor analysis and ICA helped to increase the performance, especially for the 1:32 ratio and overall for black inks. This is depicted in Figs. 2 and 3 respectively.

5.3 Comparison with Existing Methods

Table 4 showcases this method’s accuracy in contrast with existing approaches that use the same dataset for ink detection. The proposed technique outperforms the previous methods by achieving the highest accuracy on the mixed ink combinations containing unequal proportions.

Table 3. Blue ink accuracy when applied with different processes.
Table 4. Comparative study of proposed method

6 Conclusions and Future Work

From the investigations carried out, one can safely conclude that spectral data is a good discriminator of inks and plays an eminent role in document forgery detection. It can also be concluded that the machine learning classifiers achieved high accuracy in identifying ink mismatch.

In the future, more complex models and pre-analysis techniques, such as deep learning, should be explored for classification. The present approach uses only spectral data, hence the performance may improve by using both spatial and spectral data. Another drawback that can be worked on is the disparity in performance between black and blue inks. To further improve document forgery detection, writer identification can also be studied in the future.