1 Introduction

Hyperspectral imaging (HSI) is an advanced technique based on spectroscopy. It collects images of the same spatial area over a wide, continuous range of wavelengths, capturing both spatial and spectral information from the object under analysis. Compared with conventional color imaging, it divides the spectrum into many more bands, as illustrated in Fig. 1. This helps to determine the composition of the underlying material, which exhibits a unique spectral signature. HSI is used in forensics, agriculture, archeology, security, medical diagnosis and surgery, remote sensing, and many other fields [1]. The unique spectral signature of each material gives HSI tremendous power for identification and verification [2].

Fig. 1 Capturing images at multiple bands

Document forgery involves creating, falsifying, or modifying a document with malicious intent. With the tools and techniques available today, forgeries are difficult to detect and often cannot be identified by the naked eye. In document analysis, a material's unique spectral signature helps to identify ink mismatches, to date manuscripts, and to recover degraded scripts. Deep learning approaches have attained noticeable results in many domains [3].

In the present study, we use principal component analysis (PCA) for dimensionality reduction, and the extracted spectral features are fed to a convolutional neural network (CNN) to detect document forgery. The paper is organized as follows. Section 2 illustrates the applications of hyperspectral imaging in various domains. Related work is reviewed in Sect. 3. The experimental setup and results are discussed in Sects. 4 and 5. The conclusion and future work are presented in Sect. 6.

2 Applications

Hyperspectral imaging is an emerging approach that is gaining popularity for solving problems in many domains. This section highlights its application areas in document analysis and supply chain management.

2.1 Document Analysis

Ink Mismatch Detection. Document forgery has always been of great concern and is frequently seen in fraudulent bank cheques, tampered historical manuscripts, forged forensic evidence, and many other cases. If a document contains more than one ink of a similar color, this may indicate a possible forgery. Currently, there are two approaches for identifying ink mismatch: destructive and non-destructive. Non-destructive methods such as hyperspectral imaging are very useful for assessing the homogeneity of the ink.

Recovering Degraded Documents. Historical manuscripts sometimes degrade and become unreadable over time or due to external factors. The unique spectral signature of every material in an HSI can help to trace the text by applying image classification models [4].

Writer Identification. HSI is used to recognize the handwriting in a document. This approach helps to identify the writer or the owner of the document. It also assists in identifying modifications and tampering in documents.

2.2 Supply Chain Management

Inventory Management. Supply chain management requires innovations that help it manage inventory faster and more efficiently. A crucial task is to identify and verify articles, and the difficulty usually lies in verification: separating and verifying articles manually is tedious. Conventional cameras produce RGB images, which miss the subtle differences that are invisible to the naked eye [5]. HSIs capture these minute differences and the properties of an article's material, which can be exploited in varied applications.

One important application is calculating the applicable import and export charges according to the quality of the material. For example, import duty differs considerably between fabrics, yet they may all seem identical to a non-expert, and segregating the items manually is cumbersome. Materials that appear the same to an RGB camera should be charged differently according to their composition [6]. Here, hyperspectral imaging can be used to determine the properties of the material. Another practical application is verifying the quality of the material. Businesses buy and sell goods in bulk, and ensuring the desired quality is their utmost priority. Capturing HSIs of the received articles helps to check whether their properties match those promised.

Determining the Quality of Food Products. Hyperspectral images of food products provide essential spectral information about them. A conventional color (RGB) image, like inspection by the naked eye, only helps to determine the outer health of a product [7], whereas hyperspectral images help to determine its complete health, i.e., both inner and outer health [8]. This can be helpful in supply chain management for separating unhealthy food, such as rotten vegetables or fruits. It can also be used to set a product's cost based on its health.

3 Related Work

This paper focuses on applying hyperspectral imaging to document forgery detection. We propose to detect ink mismatch in HSIs as an indicator of document forgery.

Table 1 summarizes the related work on document forgery detection using machine learning and deep learning approaches.

Table 1 Key points of the related work

4 Experimental Setup

This section describes the experimental setup for detecting document forgery using hyperspectral document images. We propose a supervised neural network algorithm to detect document forgery in HSIs that uses spectral information to classify the image's pixels (Fig. 2).

Fig. 2 Experimental setup

4.1 Database

The WIHSI database [13] contains images of seven subjects. A single image in the database comprises five lines written in English by the subject, all in the same color (blue or black) but each in a distinct ink. In total, 14 HSIs (one blue and one black image per subject) of 752 × 480 pixels were captured, spanning 33 bands from 400 to 720 nm at a step of 10 nm [18]. The illumination in the images is non-uniform. Each hyperspectral image is preprocessed before further experimentation.
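To make the data layout concrete, the following is a minimal sketch of how one such image could be held in memory as a data cube. The dimensions and wavelength range are taken from the description above; the (rows, columns, bands) axis order, the zero-filled placeholder cube, and the helper names `spectral_signature` and `band_image` are assumptions for illustration and are not part of the database.

```python
import numpy as np

# Values quoted in the text: 752 x 480 pixels, 400-720 nm in 10 nm steps.
# Treating 480 as rows and 752 as columns is an assumption.
ROWS, COLS = 480, 752
wavelengths_nm = np.arange(400, 721, 10)  # 400, 410, ..., 720
N_BANDS = wavelengths_nm.size

# Placeholder cube standing in for one hyperspectral image.
cube = np.zeros((ROWS, COLS, N_BANDS), dtype=np.float32)


def spectral_signature(cube, row, col):
    """Return the response across all bands at a single pixel."""
    return cube[row, col, :]


def band_image(cube, wavelengths_nm, target_nm):
    """Return the 2-D image of the band whose wavelength is closest to target_nm."""
    idx = int(np.argmin(np.abs(wavelengths_nm - target_nm)))
    return cube[:, :, idx]
```

Each pixel therefore carries one value per band, and it is this per-pixel vector that serves as the spectral signature of the ink or paper at that location.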

4.2 Preprocessing

We aim to process the data so that it is ready for further experimentation. The first step is to separate each line and to extract the background pixels via image thresholding. Global thresholding techniques fail to give satisfactory results because the images have non-uniform illumination. Sauvola's binarization method [9, 10] is used in this case, as it thresholds the image locally instead of globally. After this step, each image yields five hyperspectral data cubes, one per written line, along with their binary masks.
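The local thresholding step can be sketched as follows. This is a minimal NumPy illustration of the standard Sauvola rule, T = m · (1 + k · (s / R − 1)), where m and s are the mean and standard deviation in a window around each pixel. The window size, k, and R used here are illustrative assumptions, not the settings of the present study, and the function is assumed to operate on a single grayscale image (for example, one band or a band average) scaled to [0, 1].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sauvola_threshold(gray, window=15, k=0.2, R=0.5):
    """Per-pixel Sauvola threshold for a 2-D image.

    window must be odd. R is the assumed dynamic range of the local
    standard deviation (0.5 for intensities in [0, 1]; 128 is the
    usual choice for 8-bit images).
    """
    gray = np.asarray(gray, dtype=np.float64)
    pad = window // 2
    # Reflect-pad so that every pixel has a full neighbourhood.
    padded = np.pad(gray, pad, mode="reflect")
    windows = sliding_window_view(padded, (window, window))
    local_mean = windows.mean(axis=(-2, -1))
    local_std = windows.std(axis=(-2, -1))
    return local_mean * (1.0 + k * (local_std / R - 1.0))


def ink_mask(gray, **kwargs):
    """Binary mask that is True where a pixel is darker than its local threshold."""
    return np.asarray(gray) < sauvola_threshold(gray, **kwargs)
```

Because the threshold follows the local mean, a dark stroke is still separated from the paper in regions where the paper itself is darker than ink elsewhere in the image, which is the case a single global threshold cannot handle.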

The objective of this work is to detect different inks in the same document using their unique spectral signatures. To accomplish this, inks of the same subject were mixed in varying proportions [9, 10, 13]. Inks of different colors were not mixed, as they can be distinguished visually. Samples were generated using two, three, four, and five inks in equal and unequal proportions.
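One possible way to assemble such mixed samples is sketched below. It assumes that the ink pixels of each line have already been extracted as arrays of spectra and that a mixture is formed by drawing pixels from each ink according to a ratio. The function name `mix_ink_pixels` and this sampling scheme are hypothetical; the exact mixing procedure follows [9, 10, 13] and may differ.

```python
import numpy as np


def mix_ink_pixels(ink_pixels, proportions, n_samples, seed=0):
    """Draw a labelled set of spectra from several inks in a given ratio.

    ink_pixels:  list of arrays, one per ink, each of shape (n_i, bands)
    proportions: relative weights, e.g. [1, 1] or [1, 3]
    Returns (X, y): spectra of shape (n_samples, bands) and ink labels.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(proportions, dtype=float)
    weights = weights / weights.sum()
    counts = np.floor(weights * n_samples).astype(int)
    counts[0] += n_samples - counts.sum()  # rounding remainder goes to the first ink

    spectra, labels = [], []
    for label, (pixels, count) in enumerate(zip(ink_pixels, counts)):
        # Sample with replacement only if the ink has too few pixels.
        replace = bool(count > len(pixels))
        idx = rng.choice(len(pixels), size=int(count), replace=replace)
        spectra.append(pixels[idx])
        labels.append(np.full(int(count), label))

    X = np.concatenate(spectra)
    y = np.concatenate(labels)
    order = rng.permutation(n_samples)  # shuffle so inks are interleaved
    return X[order], y[order]
```

For example, `mix_ink_pixels([ink_a, ink_b], [1, 3], 4000)` would produce a two-ink sample in an unequal 1:3 proportion, where `ink_a` and `ink_b` are hypothetical arrays of ink spectra.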

4.3 Proposed Approach

Principal component analysis (PCA) is applied to the image before it is passed to the neural network in order to reduce its dimensionality. With PCA, we can discard some of the features and map the dataset into a reduced subspace while retaining most of the essential information in the original dataset.

The objective is to extract the spatial features while preserving the spectral information of the HSI. The number of features to preserve has to be decided in this step. The number of principal components was varied over the values 3, 4, 5, 8, 9, 11, 13, 15, 17, 19, and 21, and its impact on accuracy was noted. With a count of 9 or above, the accuracy hardly changed, so we selected 9 principal components.
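The reduction step can be illustrated with a short NumPy sketch that projects every pixel spectrum onto the leading principal components. It assumes a cube with the bands on the last axis and uses an eigen-decomposition of the band covariance matrix; the function name `pca_reduce` is hypothetical and the code is not the implementation used in the experiments.

```python
import numpy as np


def pca_reduce(cube, n_components=9):
    """Project a (rows, cols, bands) cube onto its leading principal components.

    Returns the reduced cube of shape (rows, cols, n_components) and the
    fraction of total variance explained by each retained component.
    """
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(np.float64)  # one spectrum per row
    X_centered = X - X.mean(axis=0)

    # Covariance between bands; the matrix is only bands x bands.
    cov = (X_centered.T @ X_centered) / (X_centered.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

    order = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
    components = eigvecs[:, order]

    reduced = (X_centered @ components).reshape(rows, cols, n_components)
    explained = eigvals[order] / eigvals.sum()
    return reduced, explained
```

Inspecting the cumulative sum of `explained` for different values of `n_components` is one way to see where additional components stop contributing, which is consistent with the plateau in accuracy reported above.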

Deep learning automatically captures features from a large dataset [19, 20]. A CNN consists of various layers: convolutional, activation, and pooling layers, followed by a fully connected layer that produces the output. The extracted spectral features were passed through a CNN model for classification, as illustrated in Fig. 3.
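The sequence of layer types named above can be sketched as a forward pass over a single feature vector. The layer sizes, the use of a 1-D convolution over the nine PCA features, and the random untrained weights are all assumptions for illustration; the text does not specify the architecture of Fig. 3, and this sketch is not that model.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv1d(x, kernels, bias):
    """Valid 1-D convolution. x: (length,), kernels: (n_filters, width)."""
    windows = sliding_window_view(x, kernels.shape[1])  # (out_len, width)
    return windows @ kernels.T + bias                   # (out_len, n_filters)


def relu(x):
    return np.maximum(x, 0.0)


def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the first axis."""
    out_len = x.shape[0] // size
    return x[: out_len * size].reshape(out_len, size, -1).max(axis=1)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def init_params(n_features=9, n_filters=4, width=3, n_classes=2, seed=0):
    """Random, untrained weights with hypothetical layer sizes."""
    rng = np.random.default_rng(seed)
    pooled_len = (n_features - width + 1) // 2
    return {
        "kernels": rng.normal(scale=0.1, size=(n_filters, width)),
        "conv_bias": np.zeros(n_filters),
        "dense_w": rng.normal(scale=0.1, size=(pooled_len * n_filters, n_classes)),
        "dense_b": np.zeros(n_classes),
    }


def forward(features, params):
    """Convolution -> activation -> pooling -> fully connected -> softmax."""
    h = relu(conv1d(features, params["kernels"], params["conv_bias"]))
    h = max_pool1d(h, size=2)
    logits = h.reshape(-1) @ params["dense_w"] + params["dense_b"]
    return softmax(logits)
```

The output is one probability per ink class for the given pixel. In a supervised setting, the number of classes equals the number of inks in the mixture, and the weights would be learned from labelled pixels rather than drawn at random.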

Fig. 3 CNN architecture

5 Results

To analyze the proposed approach, we measured the classification accuracy across ink mixing proportions for blue and black inks. The computed results are compared with state-of-the-art approaches in Table 2. The results indicate that black inks were more difficult to distinguish than blue inks [19].

Table 2 Comparison of the proposed approach with state-of-the-art results

6 Future Prospects

The present study proposed a supervised deep learning method combined with PCA for forgery detection in HSIs. We evaluated the proposed approach by combining different inks in various proportions, and the results demonstrate its effectiveness. The present work may be extended by using hybrid spectral and spatial features for forgery detection. Moreover, the supervised deep learning approach requires the number of mixed inks to be known in advance, which constrains its practical application. Unsupervised deep learning techniques may be examined to overcome this limitation.