1 INTRODUCTION

A hyperspectral image is an image comprising a wide spectrum of light instead of the classic red, green, and blue channels that make up common RGB images. A visualization of a hyperspectral image is shown in Fig. 1. In a hyperspectral image, each band (layer or channel) depicts the intensity of light at a certain wavelength. Thus, hyperspectral images carry both high spatial and high spectral information. Taking the infrared part of the spectrum into account opens new opportunities in data analysis and helps solve problems that were previously unsolvable. For example, hyperspectral data can help in agriculture, since the near-infrared bands contain the information required for computing the normalized difference vegetation index (NDVI) [1]. NDVI is computed using the following formula:

$${\text{NDVI}} = \frac{{{\text{NIR}} - {\text{RED}}}}{{{\text{NIR}} + {\text{RED}}}},$$
Fig. 1. A visualization of a hyperspectral image.

where NIR is the hyperspectral band corresponding to the near-infrared wavelength, and RED is the band corresponding to the red wavelength.
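As a minimal sketch, assuming the hyperspectral cube is stored as a NumPy array of shape (bands, height, width) and that the indices of the red and near-infrared bands are known for the sensor at hand, NDVI can be computed per pixel as follows:

```python
import numpy as np

def ndvi(cube: np.ndarray, red_band: int, nir_band: int) -> np.ndarray:
    """Compute NDVI from a hyperspectral cube of shape (bands, height, width).

    `red_band` and `nir_band` are the indices of the bands closest to the
    red (~660 nm) and near-infrared (~850 nm) wavelengths; the exact indices
    depend on the sensor's band layout.
    """
    red = cube[red_band].astype(np.float64)
    nir = cube[nir_band].astype(np.float64)
    # A small epsilon avoids division by zero on dark pixels.
    return (nir - red) / (nir + red + 1e-12)
```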

A visualization of NDVI is shown in Fig. 2, and Fig. 3 illustrates how the index works. This index is widely used to measure crop health [2]. Hyperspectral images are also useful in medicine: they can help to determine perfusion parameters of tissue and wounds [3], to detect brain tumors [4], and to detect head and neck cancer [5]. In the food industry, for example, hyperspectral imaging can help to assess the chemical composition and texture of meat [6]. Another field where hyperspectral imaging can improve quality of life is geometallurgy, where it can substitute entire laboratories, since it can be used for orebody investigation [7]. This paper focuses mainly on the agricultural field.

Fig. 2. NDVI visualization. Green areas stand for healthy crops, red for unhealthy ones.

Fig. 3. Illustration of how NDVI works.

Classic approaches are usually used to analyze hyperspectral data. Algorithms such as logistic regression [8-10], random forest [11], clustering [12], dimensionality reduction [13], and discriminant analysis [14] can perform well in various hyperspectral classification and segmentation tasks. Recently, neural networks have become very popular in many fields of computer vision, but their application to hyperspectral data analysis has not been sufficiently studied. There are several reasons for this. The first is the lack of labeled data: labeling hyperspectral data requires special expertise, and the most popular HSI (hyperspectral imaging) benchmark consists of just four hyperspectral images [15], whereas RGB datasets usually comprise millions of labeled images [16]. Another issue is the specificity of hyperspectral data. Hyperspectral images have high spatial resolution and hundreds of bands, so powerful algorithms are needed that can generalize over such an amount of data, along with preprocessing algorithms that take the nature of hyperspectral images into account. The last issue we mention is the capture process itself. Capturing a hyperspectral image can take a significant amount of time, during which the weather conditions may change, and weather can significantly affect the result. Preprocessing and analysis algorithms should therefore take this issue into account.

2 RELATED WORKS

The most popular approaches to hyperspectral data analysis include classic algorithms such as logistic regression, random forest, and discriminant analysis.

Logistic regression can be applied naively, i.e., to each pixel of a hyperspectral image, and it will work. In most cases, for multi-class tasks, multinomial logistic regression (MLR) is used. Most recent studies combine MLR models with additional algorithms or use MLR as a preprocessing step. For instance, [17] considered a subspace-based MLR algorithm that enhances class separability by using class-dependent subspace feature vectors. These feature vectors help to handle nonlinearities and to better characterize noise and mixed pixels.

Random forest (RF) can also be used naively to classify each pixel of a hyperspectral image, giving results similar to those of MLR. In recent studies, RF is used for data preprocessing. For example, in [13] the RF algorithm is applied to hyperspectral images to estimate feature importance; the features with the highest importance can then be used for image segmentation, so the RF algorithm effectively serves as a dimensionality reduction step. In [18] the authors propose several algorithms based on RF ensembles.

In [19] different variations of linear discriminant analysis (LDA) were studied. Through exhaustive tests, the authors showed the effectiveness of LDA, especially their modification named regularized LDA, for ill-posed hyperspectral image classification tasks. These tasks are ill-posed because of the small number of training samples relative to the number of spectral features (bands). In [20] the authors suggest a combination of local linear feature extraction methods and LDA; their framework is designed to incorporate information inferred from unlabeled samples while simultaneously maximizing class discrimination inferred from the labeled samples. In [14] the authors suggest using independent component discriminant analysis (ICDA) to find a transform matrix that makes the components as independent as possible. The transformed components can then be used to apply the Bayes rule for classification.

Another specific type of algorithm for hyperspectral data analysis is indices, such as vegetation indices. An index combines hyperspectral bands to enhance the target properties of the image. Creating an index requires solid knowledge of physics and extensive experience in the hyperspectral field, so index creation is a complicated task. Nevertheless, there are methods that allow building custom indices for specific cases; one of them builds the Informative Normalized Difference Index (INDI) [21].

Recently, neural networks have been studied as well. The most common application of neural networks follows a pixel-wise (1D) strategy. In [22-24] the authors applied convolutional neural networks to classify single pixels of a hyperspectral image, using architectures such as AlexNet [25], VGG [26], and LeNet [27], which end with dense layers. These architectures are rather old nowadays. Moreover, because the authors use a pixel-wise strategy, the neural networks miss information from neighboring pixels.

In the most recent papers, the authors use fully convolutional architectures such as Unet [28], which combines down-sampling and up-sampling paths. In [29] the authors suggest feeding the network a special multi-feature fusion block instead of a raw hyperspectral image. According to the results of their study, this block improves the overall accuracy of fully convolutional networks in hyperspectral analysis.

In [30] the authors compared 1D-CNN (pixel-wise strategy), 2D-CNN (fully convolutional networks like Unet), and 3D-CNN (the same but with 3D convolutions). According to their study, there is no big difference between the architectures, although the 2D-CNN and 3D-CNN produce slightly better results in most cases.

In [31] the authors also compare the 1D-CNN and 2D-CNN approaches. In addition, they study the influence of a special feature selection layer that discards non-informative and noisy bands of a hyperspectral image, as well as the influence of processing the visual and near-infrared spectra in separate streams. According to the results of their study, the 2D-CNN has better accuracy in all the experiments, and using a dual-stream neural network makes no big difference.

In this work, different architectures are compared with classic approaches, and the influence of the CNN input size and of data preprocessing methods is studied.

3 EXPERIMENTS SETUP

3.1 Dataset

A dataset of labeled hyperspectral images [32] was used for the experiments. The dataset comprises 385 hyperspectral images with 236 bands covering wavelengths from 420 to 979 nm. Each image is labeled with a segmentation mask containing 16 classes, excluding the background class. The following classes are used: apple cucumber (I), beet (II), cabbage (III), carrot (IV), corn (V), cucumber (VI), eggplant (VII), grass (VIII), milkweed (IX), oats (X), pepper (XI), potato (XII), amaranth (XIII), strawberry (XIV), soy (XV), and tomato (XVI). The dataset was manually divided into train and test sets. Figure 4 shows the class distribution of the train set; the distribution is unbalanced. Hyperspectral images were normalized using standardization.

Fig. 4. Class distribution among images.
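Regarding the normalization step mentioned above: the paper does not specify whether the standardization statistics were computed per band or globally, so the following is a minimal sketch assuming per-band statistics over the spatial dimensions:

```python
import numpy as np

def standardize(cube: np.ndarray) -> np.ndarray:
    """Standardize each band to zero mean and unit variance.

    Assumes the cube has shape (bands, height, width); statistics are
    computed per band over the spatial dimensions.
    """
    mean = cube.mean(axis=(1, 2), keepdims=True)
    std = cube.std(axis=(1, 2), keepdims=True)
    return (cube - mean) / (std + 1e-12)
```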

3.2 Classic Algorithms Setup

For the experiments, we used the following algorithms: logistic regression, discriminant analysis, and random forest. All the algorithms were trained on 1% of the pixels of every hyperspectral image, and all results were validated using five-fold cross-validation.

In the experiments, multinomial logistic regression (MLR) was used; it was trained with the cross-entropy loss and an L2 penalty.

For discriminant analysis, quadratic discriminant analysis (QDA) was used.

The random forest classifier was used with the Gini criterion and 100 trees; the maximum tree depth was selected automatically, expanding nodes until all leaves were pure or contained fewer samples than the minimum split threshold.
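A sketch of this setup using scikit-learn is given below. The hyperparameters stated above are set explicitly; everything else, including the pixel-sampling helper and the omitted cross-validation loop, is an illustrative assumption rather than the authors' exact code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def sample_pixels(cube, mask, fraction=0.01, rng=None):
    """Draw a random fraction of labeled pixels from one hyperspectral image.

    `cube` has shape (bands, height, width); `mask` has shape (height, width).
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    pixels = cube.reshape(cube.shape[0], -1).T            # (H*W, bands)
    labels = mask.ravel()
    idx = rng.choice(len(labels), size=int(fraction * len(labels)), replace=False)
    return pixels[idx], labels[idx]

# Settings stated in the paper; everything else is left at library defaults.
mlr = LogisticRegression(penalty="l2", max_iter=1000)     # multinomial by default
rfc = RandomForestClassifier(n_estimators=100, criterion="gini", max_depth=None)
qda = QuadraticDiscriminantAnalysis()
```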

3.3 Neural Networks Setup

All the neural networks were trained using focal loss [33] with the gamma parameter set to 5.5. This loss function was chosen because it was developed for cases with class imbalance; the gamma value was selected through a separate set of experiments. The neural networks were trained with different numbers of epochs and batch sizes, depending on the network's input size. The learning rate had an initial value of 0.001 and was scheduled using cosine annealing with warm restarts [34] with parameters T_0 = 2 and T_mult = 1. Hyperspectral images were dynamically augmented using rotation by a random angle and horizontal/vertical flips. The neural networks were implemented and trained using the PyTorch framework [35].
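The following PyTorch sketch illustrates this setup. The focal loss follows the standard multi-class formulation from [33] with the stated gamma; the optimizer (Adam here) is an assumption, since the paper does not name it:

```python
import torch
import torch.nn.functional as F
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

def focal_loss(logits, target, gamma=5.5):
    """Multi-class focal loss for semantic segmentation.

    logits: (B, C, H, W) raw network outputs; target: (B, H, W) class indices.
    """
    log_p = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_p, target, reduction="none")   # per-pixel cross-entropy
    p_t = torch.exp(-ce)                               # probability of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()

# Scheduler with the parameters stated in the paper; Adam is an assumption.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=2, T_mult=1)
```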

4 ARCHITECTURE OF NEURAL NETWORK

4.1 Used Architectures

In this experiment, two distinct architectures were used. Unet is a classic neural network for semantic segmentation. Its defining feature is its fully convolutional nature: Unet comprises a down-sampling path, which extracts features from an image, and an up-sampling path, which localizes the classes. Many experiments show the predominance of Unet-like architectures in segmentation tasks on RGB images. The second architecture is inspired by L2Net [36]. We developed the architecture shown in Fig. 5 with two goals. First, we wanted a small neural network with few weights, which trains faster and generalizes better. Second, we wanted to try a completely different design: our architecture has no bottleneck and no down-sampling or up-sampling paths, which is quite unusual nowadays.

Fig. 5. Proposed architecture of the neural network. B is the batch size, N is the number of bands.
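Since Fig. 5 describes the architecture only at a high level, the sketch below is purely illustrative: it captures the general idea of an L2Net-style stack of same-resolution convolutions with no bottleneck and no down- or up-sampling paths, but the layer count and channel widths are placeholders, not the authors' configuration.

```python
import torch.nn as nn

class PlainSegNet(nn.Module):
    """Illustrative L2Net-style segmenter: a stack of stride-1 convolutions
    that keeps the spatial resolution unchanged end to end. Depth and
    channel widths here are placeholders, not the paper's exact values."""

    def __init__(self, in_bands: int, num_classes: int, width: int = 32, depth: int = 6):
        super().__init__()
        layers = []
        channels = in_bands
        for _ in range(depth):
            layers += [
                nn.Conv2d(channels, width, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(width),
                nn.ReLU(inplace=True),
            ]
            channels = width
        layers.append(nn.Conv2d(channels, num_classes, kernel_size=1))  # per-pixel logits
        self.net = nn.Sequential(*layers)

    def forward(self, x):          # x: (B, N, H, W)
        return self.net(x)         # (B, num_classes, H, W)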

4.2 Experiment Description

In this experiment, different neural networks are compared with classic approaches: multinomial logistic regression (MLR), a random forest classifier (RFC), and quadratic discriminant analysis (QDA).

As the input to the neural networks, PCA-processed [37] hyperspectral images were used. We chose 17 principal components because there are 17 classes, including the background class. PCA is a commonly used dimensionality reduction algorithm, and hyperspectral images are usually preprocessed with it to significantly decrease the number of bands. The classic algorithms were trained on the original hyperspectral images using 1% of the total pixels of each image.
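A minimal sketch of this preprocessing step is shown below. It assumes PCA is fitted per image, with each pixel treated as a sample and each band as a feature; the paper does not state whether PCA was fitted per image or over the whole training set:

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_reduce(cube: np.ndarray, n_components: int = 17) -> np.ndarray:
    """Project a (bands, height, width) cube onto its first principal components.

    PCA treats every pixel as a sample and every band as a feature.
    """
    bands, h, w = cube.shape
    pixels = cube.reshape(bands, -1).T                   # (H*W, bands)
    reduced = PCA(n_components=n_components).fit_transform(pixels)
    return reduced.T.reshape(n_components, h, w)
```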

4.3 Result of the Experiment

Table 1 shows the results of the experiment in terms of the F1 metric. The neural network with our architecture showed the best result. Because of its small number of trainable parameters, it learns more general features and does not tend to overfit. Unet showed a worse result, even worse than the classic approaches, because it has many trainable parameters and tends to overfit. The RFC and MLR models showed similar performance, worse than the neural network with the proposed architecture. For some classes, the classic approaches perform better than the neural network: for classes IV, V, VI, VII, IX, and XVI the neural network has a lower F1 value. However, for other classes the neural network performs much better; for example, on classes I, VIII, and X the network with the proposed architecture performs significantly better than the other algorithms.

Table 1. Detailed results of the experiment comparing the neural networks and classical approaches in terms of F1 score

From this study, we can conclude that the neural network performs better than the classical algorithms. We can also say that the choice of architecture significantly affects the result: the Unet architecture may not be suitable for our task, while the proposed architecture showed better results and may be more suitable for our case.

5 DIMENSIONALITY REDUCTION

5.1 Experiment Description

In this experiment, different approaches to hyperspectral image preprocessing in terms of dimensionality reduction were compared. We studied different neural network inputs: hyperspectral images preprocessed with the PCA algorithm, the RGB components of the images, and raw hyperspectral images. The PCA algorithm was used with 17 principal components, as in the previous experiment. The experiment was performed for both Unet and the proposed architecture.

5.2 Results of the Experiment

Table 2 shows the results of the experiment. The main conclusion is that PCA impacts the neural networks significantly; hence, dimensionality reduction is extremely important for hyperspectral images. In contrast, the neural networks that take RGB components as input show substantially worse results, so we can conclude that the task cannot be solved with RGB images alone. Likewise, the neural networks trained on hyperspectral images with all bands show the worst result in the experiment. The main reason is the large amount of information stored in the 236 original bands: in this case, the neural network fails to generalize the knowledge.

Table 2. Detailed results of the experiment comparing different approaches to dimensionality reduction in terms of F1 score

6 HYPERSPECTER EQUALIZATION

6.1 Experiment Description

In this experiment, the influence of equalization on hyperspectral images was studied. The results of training neural networks on images preprocessed with CLAHE and with histogram equalization were compared to the result of a neural network trained on non-equalized images. The prerequisites for using equalization are the non-uniform brightness of hyperspectral images and the capture process, which is susceptible to weather influence. Only two architectures were used in the experiment: Unet and the proposed one. Before equalization, the PCA algorithm was applied.
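A sketch of per-band CLAHE using OpenCV is shown below. Since OpenCV's CLAHE operates on 8-bit (or 16-bit) single-channel images, each band is rescaled to uint8 first; the clip limit and tile grid are illustrative defaults, not the paper's settings:

```python
import cv2
import numpy as np

def clahe_per_band(cube: np.ndarray, clip_limit: float = 2.0,
                   grid: tuple = (8, 8)) -> np.ndarray:
    """Apply CLAHE independently to each band of a (bands, H, W) cube.

    Each band is min-max rescaled to uint8 before equalization, since
    PCA components may be negative; clip_limit and grid are illustrative.
    """
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=grid)
    out = np.empty(cube.shape, dtype=np.float32)
    for i, band in enumerate(cube):
        lo, hi = band.min(), band.max()
        band8 = ((band - lo) / (hi - lo + 1e-12) * 255).astype(np.uint8)
        out[i] = clahe.apply(band8).astype(np.float32) / 255.0
    return out
```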

6.2 Results of the Experiment

Table 3 shows the results of the experiment. According to the results, histogram equalization makes the neural networks perform worse than no equalization at all. On the other hand, CLAHE allows training neural networks with the same performance as networks trained on the original hyperspectral images, or slightly better.

Table 3. Total results of the experiment comparing different approaches to hyperspectral image preprocessing

7 SPATIAL SHAPE OF THE NEURAL NETWORK’S INPUT

7.1 Experiment Description

In this experiment, the input shape of the neural network was studied. This is an important design decision in which researchers should balance computing power against neural network performance. Different shapes were tested: 1 × 1 (pixel-wise), 2 × 2, 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64, 128 × 128, 256 × 256, and 512 × 512 (the original shape). To match the input shape of the neural network, the original hyperspectral images were cropped using a sliding-window method. This experiment sheds light on the difference between the pixel-wise strategy and common semantic segmentation. As in the previous experiments, the images were preprocessed with the PCA algorithm, and two architectures were used: Unet and the proposed one. The main hypothesis is that images of the original size may provide substantial semantic information, leading to better accuracy.
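A minimal sketch of such sliding-window cropping is shown below. It assumes non-overlapping windows, i.e., a stride equal to the window size, since the paper does not state the stride:

```python
import numpy as np

def sliding_window_crops(cube: np.ndarray, mask: np.ndarray, size: int):
    """Cut a (C, H, W) cube and its (H, W) mask into size x size tiles.

    A stride equal to the window size (non-overlapping tiles) is assumed;
    edge regions that do not fill a whole tile are dropped.
    """
    _, h, w = cube.shape
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            yield cube[:, y:y + size, x:x + size], mask[y:y + size, x:x + size]
```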

7.2 Results of the Experiment

The results of the experiment are shown in Table 4. In this study, the best results were obtained with a 128 × 128 input shape. Notably, the difference between the pixel-wise strategy and full-size images is not as big as expected. Shapes from 2 × 2 up to 128 × 128 achieve the best results. Nonetheless, the training processes differ strikingly: training on large images, even at 128 × 128 resolution, requires much more time than at lower resolutions. On the other hand, dealing with cropped images requires modifying the training pipeline, i.e., additional time may be spent on data preparation.

Table 4. Detailed results of the experiment comparing different input shapes

8 CONCLUSIONS

This paper presents a set of experiments studying the use of neural networks for hyperspectral image segmentation. The experiments show that neural networks can achieve good results, but the specificity of hyperspectral images must be considered. Architecture and dimensionality reduction play the most important roles in hyperspectral segmentation. As the experiments show, the classic Unet architecture could not outperform algorithms such as logistic regression and discriminant analysis, whereas the architecture proposed in this paper achieved the best results among all the models. According to the study, dimensionality reduction with PCA is a significant step in hyperspectral preprocessing, while band equalization has no significant impact on model accuracy. The spatial resolution of the neural network's input is also worth mentioning: different input shapes affect the training process and accuracy significantly, and the experiments show a tradeoff between training time and data preparation time. They also show that neither the pixel-wise strategy nor semantic segmentation of the full hyperspectral image achieves the best result: the pixel-wise strategy lacks information about neighboring pixels, while full images require a neural network with a sufficiently large receptive field. Based on the experiments, a good starting point for hyperspectral semantic segmentation tasks might be a small neural network with a simple architecture, a small input shape from 32 × 32 up to 128 × 128, and hyperspectral images preprocessed with the PCA algorithm.