Abstract
Hyperspectral images (HSIs) are captured over continuous wavebands. It includes the object's spatial as well as spectral information. Different materials have different reflection and absorption properties, which allows obtaining the underlying sample's physical structure and chemical composition. Therefore, hyperspectral imaging technology gives comprehensive information about the sample. This paper uses a hyperspectral imaging approach to detect document forgery in the documents through ink mismatch detection. The proposed approach utilizes PCA to handle multiple dimensions in HSIs. It captures the spectral features which act as an input to a convolutional neural network for image classification. The method is applied to the UWA Writing Ink HSI (WIHSI) database. The results are compared with the state-of-art results that prove the proposed approach's potential.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
Hyperspectral Imaging is an advanced technique based on spectroscopy. It collects images over a wide and continuous range of wavelengths for the same spatial area. It captures the spatial and spectral information from the object under analysis. It divides the spectrum into many more bands as illustrated in Fig. 1. It helps to determine the composition and assigns a unique spectral signature of the underlying material. It is being used in forensics, agriculture, archeology, security, medical diagnosis and surgery, remote sensing, and many more [1]. The unique spectral signature of each material in hyperspectral imaging (HSI) gives it tremendous power in identification and verification purposes [2].
Document forgery involves creating, falsifying, modifying the document with the wrong intention. In the present world of tools and techniques, detecting document forgery is a mysterious task that a naked eye cannot identify. In Document Analysis, the material's unique spectral signature helps us identify ink mismatches, the timeline of manuscripts, and recovering the degraded scripts. Cutting-edge technology like deep learning approaches has attained noticeable results in many domains [3].
In the present study, we utilized PCA for dimensionality reduction, and the extracted spectral features are fed to CNN to detect document forgery. The paper has been segregated into various sections. Section 2 illustrates the applications of hyperspectral imaging in various domains. Related work is elaborated in Sect. 3. The experimental setup and results are discussed in Sects. 4 and 5. Conclusion and Future work are depicted in Sect. 6.
2 Applications
Hyperspectral imaging is an emerging approach, gaining popularity for solving problems in many domains. This section highlights the application areas for hyperspectral imaging in the document analysis domain.
2.1 Document Analysis
Ink Mismatch Detection. The document forgery has always been of great concern and is frequently seen in fraudulent bank cheques, tampering with historical manuscripts and forensic evidence, and many more. If a document consists of more than one ink of similar color that may indicate a possible forgery. Currently, there are two approaches for identifying the ink mismatching-destructive and non-destructive. Non-destructive methods such as hyperspectral imaging are very useful in identifying the homogeneity of the ink.
Recovering Degraded Documents. The historical manuscripts, sometimes with time or due to some external factors, get degraded and unreadable. The unique spectral signature assigned to every element in an HSI can help to trace the text by applying image classification models [4].
Writer Identification. HSI is used to recognize the handwriting in a document. This approach helps to identify the writer or the owner of the document. It assists in identifying the modification and tampering done in the documents.
2.2 Supply Chain Management
Inventory Management. Supply chain management requires innovations and developments that may help it work in a more efficient and faster way in managing the inventory. The crucial role is to identify and also verify the articles. The problem usually comes in the verification of the article. It is a tedious task to separate articles manually and verify them. The customarily used cameras give us the RGB image, short of the subtle differences immune to naked eyes [5]. Here comes the beauty of HSIs, which allows capturing minute differences and the properties of the article's material. It allows us to exploit this and use it in varied applications.
One important application can be calculating the applicable charges on import and export according to the quality of the material. For example, there is a big difference in the import duty on varied fabrics. Still, they all may seem identical to a non-expert, or it usually becomes cumbersome to segregate the items manually. The materials appearing to be the same in the RGB camera should be charged differently according to the material's composition [6]. Here hyperspectral imaging can be used to facilitate knowledge of the properties of the material. Another practical application can be in the verification of the quality of the material. In businesses, there is a bulk sale and purchase of goods and ensuring the desired quality is their utmost priority. The use of HSI of the received articles will help to detect and match with the properties of the article promised.
Determine the quality of Food Products. The hyperspectral images of food products give essential spectral information about it. The commonly used colored image (RGB) or seen by the naked eye only helps to determine the outer health [7]. While using hyperspectral images helps determine the complete health, i.e., inner and outer health [8]. It can be helpful in the management of the supply chain to separate unhealthy food like rotten vegetables or fruits. It can be used to effectively decide the product's cost based on their health.
3 Related Work
This paper focused on applying hyperspectral imaging for document forgery detection. We proposed to detect ink mismatch in HSIs to detect document forgery.
This section elaborates the related work for document forgery using machine learning and deep learning approaches in Table 1.
4 Experimental Setup
This section elaborates the experimental setup for detecting document forgery using hyperspectral document images. We proposed a supervised neural network algorithm to detect document forgery in HSI that uses spectral information to classify the image’s pixels (Fig. 2).
4.1 Database and Preprocessing
The WIHSI database [13] contains images of seven subjects. A single image in the database comprises five lines, all with same color (blue/black) but distinct ink. They are written in English by the subject. So, a total of 14 HSIs having 752 * 480 pixels, spanning across of 33 bands from 400 to 720 nm at a step of 10 nm, were captured [18]. The illumination in the images is non-uniform. Each hyperspectral image is exposed to preprocessing for further experimentation.
4.2 Preprocessing
We aim to process the data in such a way that it is ready for further experimentation. The first step is to separate each line to extract the background pixels via image thresholding. The global thresholding techniques fail to give satisfactory results as the image has non-uniform illumination. Sauvola’s binarization method is used [9, 10] in this case as it threshold the image locally instead of globally. After this step, we have five hyperspectral data cubes, each containing an English phrase from every image along with their binary masks.
The objective of this work is to detect different inks in the same document with their unique spectral signature. To accomplish this, inks of the same subject were mixed in varying proportions [9, 10, 13]. No two different colored inks were mixed together, as it can be distinguished visually. Samples were generated using two, three, four, and five inks in equal and unequal proportions.
4.3 Proposed Approach
Principal Component Analysis is implemented before passing the image to the neural network to reduce the dimensions. With PCA, we can get rid of some of the features and map our dataset into a reduced subspace without losing essential information about the original dataset.
The objective is to extract the spatial features by preserving the spectral information of the HSI. The number of features to preserve has to be decided in this step. The number of principal components was varied from (3, 4, 5, 8, 9, 11, 13, 15, 17, 19, and 21). Its impact on accuracy was noted. It was concluded that with a count of 9 or above, the accuracy hardly changed. After the analysis, we selected 9 as the count.
Deep Learning is a cutting-edge technology that automatically captures features from a large dataset [19, 20]. A CNN consists of various layers: convolutional, activation, and pooling layers, followed by a connected layer that produces the output. The extracted spectral features were passed through a CNN model, as illustrated in Fig. 3 for classification.
5 Results
To analyze the proposed approach, we investigated the accuracy of ink mixing proportions for blue and black inks. The computed results are compared with the state-of-art approaches as illustrated in Table 2. The results depicted that black inks were challenging to identify compared to blue ink [19].
6 Future Prospects
The present study proposed a supervised deep learning-based method combined with PCA in HSIs for forgery detection. We evaluated the proposed approach by combining different inks in various proportions. The results clearly stated the effectiveness of the proposed approach. The present work may be extended using hybrid spectral and spatial features for forgery detection. Moreover, the supervised deep learning approach demands to know the count of the inks mixed in proportion in advance, which imposes a constraint for practical application. Unsupervised deep learning techniques may be examined to overcome this limitation.
References
Jaiswal G, Sharma A, Yadav SK (2021) Critical insights into modern hyperspectral image applications through deep learning. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, e1426
Jaiswal G, Sharma A, Yadav SK (2022) Deep feature extraction for document forgery detection with convolutional autoencoders. Comput Electr Eng 99(107770)
Jaiswal G, Sharma A, Yadav SK (2019) Analytical approach for predicting dropouts in higher education. Int J Inf Commun Technol Educ (IJICTE) 15(3):89–102
Jyothsnaa S, Gandhe A, Deshpande A, Bodas S (2010, November) Automated inventory management and security surveillance system using image processing techniques. In: TENCON 2010–2010 IEEE Region 10 Conference. IEEE, New York, pp 2318–2321
Siche R, Vejarano R, Aredo V, Velasquez L, Saldana E, Quevedo R (2016) Evaluation of food quality and safety with hyperspectral imaging (HSI). Food Eng Rev 8(3):306–322
Lorente D, Blasco J, Serrano AJ, Soria-Olivas E, Aleixos N, Gomez-Sanchis J (2013) Comparison of ROC feature selection method for the detection of decay in citrus fruit using hyperspectral images. Food Bioprocess Technol 6(12):3613–3619
Khan Z, Shafait F, Mian A (2015) Automatic ink mismatch detection for forensic document analysis. Pattern Recogn 48(11):3615–3626
Abbas A, Khurshid K, Shafait F (2017, November) Towards automated ink mismatch detection in hyperspectral document images. In: 2017 14th IAPR International conference on document analysis and recognition (ICDAR), vol 1. IEEE, New York, pp 1229–1236
Khan MJ, Yousaf A, Abbas A, Khurshid K (2018) Deep learning for automated forgery detection in hyperspectral document images. J Electron Imaging 27(5):053001
Khan MJ, Khurshid K, Shafait F (2019, September) A spatio-spectral hybrid convolutional architecture for hyperspectral document authentication. In: 2019 International conference on document analysis and recognition (ICDAR). IEEE, New York, pp 1097–1102
Islam AU, Khan MJ, Khurshid K, Shafait F (2019, December) Hyperspectral image analysis for writer identification using deep learning. In: 2019 Digital image computing: techniques and applications (DICTA). IEEE, New York, pp 1–7
Devassy BM, George S (2019) Ink classification using convolutional neural network. NISK J 12:1–16
Qureshi R, Uzair M, Khurshid K, Yan H (2019) Hyperspectral document image processing: applications, challenges and future prospects. Pattern Recogn 90:12–22
MelitDevassy B, George S, Nussbaum P (2020) Unsupervised clustering of hyperspectral paper data using t-SNE. J Imaging 6(5):29
Silva CS, Pimentel MF, Honorato RS, Pasquini C, Prats-Montalbán JM, Ferrer A (2014) Near infrared hyperspectral imaging for forensic analysis of document forgery. Analyst 139(20):5176–5184
Luo Z, Shafait F, Mian A (2015, August) Localized forgery detection in hyperspectral document images. In: 2015 13th International conference on document analysis and recognition (ICDAR). IEEE, New York, pp 496–500
Lian Y, Liang L, Li B (2017) Hyperspectral imaging technology for revealing the original handwritings covered by the same inks. J Forensic Sci Med 3(4):210
Khan Z, Shafait F, Mian A (2013, August) Hyperspectral imaging for ink mismatch detection. In: 2013 12th International conference on document analysis and recognition. IEEE, New York, pp 877–881
Jaiswal G, Sharma A, Yadav SK (2021) Efficient ink mismatch detection using supervised approach. In: Singh M, Tyagi V, Gupta PK, Flusser J, Ören T, Sonawane VR (eds) Advances in computing and data sciences. ICACDS 2021. Communications in computer and information science, vol 1440. Springer, Cham. https://doi.org/10.1007/978-3-030-81462-5_65
Tomar A et al (2020) Machine learning, advances in computing, renewable energy and communication, LNEE vol 768. Springer Nature, Berlin, 659 p. https://doi.org/10.1007/978-981-16-2354-7. ISBN 978-981-16-2354-7
Khan Z, Shafait F, Mian A (2013, August) Hyperspectral imaging for ink mismatch detection. In: 2013 12th International conference on document analysis and recognition. IEEE, pp 877-881
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Singh, R., Jaiswal, G., Jain, A., Shrama, A. (2022). Classification and Feature Extraction for Document Forgery Images. In: Tomar, A., Malik, H., Kumar, P., Iqbal, A. (eds) Proceedings of 3rd International Conference on Machine Learning, Advances in Computing, Renewable Energy and Communication. Lecture Notes in Electrical Engineering, vol 915. Springer, Singapore. https://doi.org/10.1007/978-981-19-2828-4_68
Download citation
DOI: https://doi.org/10.1007/978-981-19-2828-4_68
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2827-7
Online ISBN: 978-981-19-2828-4
eBook Packages: Computer ScienceComputer Science (R0)