Abstract
Classification of remotely sensed data requires suitable image processing algorithms. The rise of machine learning has improved the viability of satellite image applications. Using Convolutional Neural Networks (CNN), benchmark classification accuracy can be achieved for land cover classification. Motivated by the concept of the Normalized Difference Vegetation Index (NDVI), this paper uses only the red and near-infrared (NIR) band information to classify the publicly available SAT-4 and SAT-6 datasets. This is done because NDVI computation requires only these two bands (red and NIR) and the classes involved in the datasets are types of vegetation. In this work, new deep learning architectures for three networks (AlexNet, ConvNet, VGG) are proposed by tuning the hyperparameters of each network and taking two-band data as input. The modified architectures, with two-band input and a reduced number of filters, were trained, and the tested models classify the images into the different classes. The proposed architectures are compared against the existing architectures in terms of accuracy, precision and trainable parameters. The proposed architectures are found to perform comparably, retaining high accuracy with fewer trainable parameters than the benchmark deep learning architectures for satellite image classification.
1 Introduction
An image is regarded as a function of two real variables, for instance a(x, y), with a as the amplitude (e.g. brightness) of the image at the real coordinate position (x, y). An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs). This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system, it should be possible to apply specific image processing operations to selected regions. Images acquired through modern sensors may be contaminated by a variety of noise sources.
Image classification algorithms find applications in a variety of fields. Numerous factors have to be taken into account in the image classification process. This complex process may include determination of a suitable classification system, selection of training samples, image preprocessing, feature extraction, selection of suitable classification approaches, post-classification processing, and accuracy assessment [15]. The design of the classification procedure and the quality of the classification results are also major aspects of image classification. The major factors involved in image classification are explained below:
1.1 Sensor data selection
For image classification, proper selection and analysis of remotely sensed data is essential. Hence, a good understanding of the strengths and weaknesses of the sensor data plays an important role in image classification. Atmospheric conditions, the nature of the classification and the scale of the study area play a vital role in selecting suitable sensor data. Different classification scales are needed to classify objects at different levels [3, 12, 15]. For example, a fine-scale classification system is needed at the local level, a medium-scale system at the regional level and a coarse-scale system at the global level.
1.2 Selection of mapping approach
Image processing is a method of performing operations on an image by manipulating and analyzing its properties. Processing aims to obtain an enhanced version of an image or to extract useful information from it. It can be regarded as a type of signal processing in which the input is an image and the output may be an image or characteristics/features associated with that image [5, 27]. Images are of different types, one of which is the satellite image. Using different mapping approaches, the data from a multispectral dataset can be transformed into the information needed by the user [4]. Using thematic remote sensing and quantitative remote sensing, both image classification and modelling of land information can be carried out on a satellite image dataset [11, 13]. Recent advances in satellite image classification range from parametric to non-parametric classifiers, from pixel-based to object-based classifiers, and from hard to soft classifiers [16]. This involves:
1. Improvement of components of the classification algorithm, including training and learning.
2. Development of new system-level approaches that augment the underlying classifier algorithms.
3. Exploitation of multiple types of data or ancillary information in the classification procedure.
These factors have to be considered while defining the mapping approach.
1.3 Feature extraction
Satellite images contain a rich source of significant information, which creates the need to classify them. A variety of techniques exist to group these images [2]. A well-known parameter for grouping land cover is the Normalized Difference Vegetation Index (NDVI) [19]. It uses the red and NIR band data to assess the presence or absence of live green vegetation [6, 21]. NDVI is calculated as
NDVI = (NIR − Red) / (NIR + Red)
where NIR and Red are the values of the near-infrared and red bands, respectively.
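As an illustration of the index, the following minimal sketch computes NDVI = (NIR − Red) / (NIR + Red) per pixel using only the Python standard library. The function names `ndvi` and `ndvi_map`, the small `eps` guard against division by zero, and the toy pixel values are assumptions made for this example; they are not taken from the paper.

```python
def ndvi(red, nir, eps=1e-12):
    """Return NDVI = (NIR - Red) / (NIR + Red) for a single pixel.

    eps avoids division by zero when both bands are 0. It is an
    illustrative choice, not something specified in the paper.
    """
    return (nir - red) / (nir + red + eps)


def ndvi_map(red_band, nir_band):
    """Apply ndvi() element-wise to two equally sized 2-D lists."""
    return [
        [ndvi(r, n) for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red_band, nir_band)
    ]


# Toy 2 x 2 patch (made-up values): high NIR with low red gives a value
# close to +1, which is the typical signature of live green vegetation.
red = [[50, 120], [30, 200]]
nir = [[200, 130], [210, 190]]
for row in ndvi_map(red, nir):
    print([round(v, 3) for v in row])
```

Values range from −1 to +1; because only two of the four bands enter the formula, the same two bands are a natural candidate input for a reduced network.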
Recent research has reported that the method for thematic information extraction can be redefined using several new classification algorithms. By defining a mapping procedure, thematic information can be derived from satellite images [10, 28]. The mapping approach needs to consider factors such as the characteristics of the satellite data to be used, the technical specifications of the final map, the characteristics of the geographical area to be mapped, and the availability of ancillary data [14, 29]. Using convolutional neural networks, high classification accuracy can be achieved in land cover classification. Convolutional Neural Networks (CNNs) have gained popularity over the years because they are able to learn expressive descriptors [1, 22]. Land cover classification and scene understanding in aerial images increasingly rely on deep networks to achieve new state-of-the-art results. CNNs have become a prominent learning machine and find applications in natural language processing, hyperspectral image classification, medical image analysis, clothing matching and micro-video understanding for venue categorization [8, 17, 20, 24,25,26]. The principal strength of a CNN lies in its deep architecture, which allows a set of discriminative features to be extracted at different levels of abstraction [9, 18].
The existing architecture, which uses all four bands (Red, Green, Blue and Near-Infrared (NIR)), requires a lot of computation in terms of trainable parameters [23]. However, the Normalized Difference Vegetation Index (NDVI) uses only the red and NIR bands to assess the presence or absence of live green vegetation [6, 21]. Motivated by the concept of NDVI, the present work tunes the hyperparameters of (i.e. modifies) the standard architectures to classify the vegetation land cover present in the SAT-4 and SAT-6 datasets using only the red and NIR band information, instead of all four bands.
The major contributions of the present work are as follows:
- Modification of three standard architectures (AlexNet, ConvNet and VGG) to classify the classes of the SAT-4 and SAT-6 datasets using only the two-band (Red (R) and Near-Infrared (NIR)) information of the data.
- Hyperparameter tuning (filter size and number of filters in all layers) of all three two-band networks to classify the land cover classes of the SAT-4 and SAT-6 datasets.
2 Architecture
Nowadays, a wide variety of architectures and algorithms are used in deep learning systems. One of them is the Convolutional Neural Network (CNN), which plays a significant role in image processing, natural language processing and video recognition [7]. The deep learning networks used in this study are AlexNet, ConvNet and VGG.
2.1 AlexNet
The AlexNet architecture is a deep convolutional neural network composed of 22 layers. The network starts with a convolution layer followed by a rectified linear unit (ReLU) transfer function. Next is a max-pooling layer, which progressively reduces the image size. This limits the amount of network computation and the number of parameters, and guards against overfitting. The same pattern (CONV-RELU-POOL) is then repeated. The seventh layer is again a convolution layer, followed by a ReLU transfer function. The next four layers follow the pattern of convolution followed by ReLU (CONV-RELU). Then comes a max-pooling layer, after which come the fully connected, dropout and threshold layers. Finally, a softmax classifier assigns the images to the various classes [23]. The architecture, with the output dimension of each layer, is detailed below.
An input patch of size 28 × 28 × 4 is convolved with 16 filters of dimension 4 × 3 × 3 with a stride of 1 pixel, generating an output of dimension 16 × 26 × 26. This is computed using the formula ((W − F + 2P)/S) + 1, where W is the input size, F is the filter size, S is the stride and P is the zero padding. In the above computation, W is 28, F is 3, S is 1 and P is 0. This is followed by a ReLU activation function, which leaves the dimension unchanged. Next comes a 2 × 2 max-pool layer in which the image is downsampled by 2, yielding an output dimension of 16 × 13 × 13. This is convolved with 48 filters of size 3 × 3 with a stride and padding of 1, producing 48 × 13 × 13. This is followed by an activation and a max pool with a filter size of 3 × 3 and a stride of 2, which produces an output of 48 × 6 × 6. Then, convolution with 96 filters of size 3 × 3 with a stride and padding of 1 results in 96 × 6 × 6 and is followed by an activation layer. The 96 × 6 × 6 volume is convolved with 64 filters of size 3 × 3 with a stride and padding of 1, producing 64 × 6 × 6 as output. The image is then downsampled by 2, yielding 64 × 3 × 3. Next is a simple linear layer that converts 64 × 3 × 3 to (64 ∗ 3 ∗ 3) × 1 = 576 × 1. This is followed by a dropout with a probability of 0.5 and a linear layer that transforms 576 × 1 to 200 × 1. Then come dropout, linear and threshold layers with similar parameters. The final layer classifies the image patches into their corresponding classes using a 4-way softmax classifier.
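The dimension bookkeeping above can be checked with a short sketch that applies ((W − F + 2P)/S) + 1 layer by layer. The layer table, the names `out_size`, `ALEXNET_LAYERS` and `trace`, and the stride of 2 for the 2 × 2 pooling layers are assumptions chosen to reproduce the sizes quoted in the text; integer division is used, and every division in this sequence happens to be exact.

```python
def out_size(w, f, s=1, p=0):
    """Output width of a conv/pool layer: ((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1


# (name, output channels, filter size F, stride S, padding P). Activation
# layers are omitted because they leave the dimensions unchanged.
ALEXNET_LAYERS = [
    ("conv1", 16, 3, 1, 0),
    ("pool1", 16, 2, 2, 0),
    ("conv2", 48, 3, 1, 1),
    ("pool2", 48, 3, 2, 0),
    ("conv3", 96, 3, 1, 1),
    ("conv4", 64, 3, 1, 1),
    ("pool3", 64, 2, 2, 0),
]


def trace(layers, w=28):
    """Return [(name, channels, width)] after each layer for a w x w input."""
    shapes = []
    for name, channels, f, s, p in layers:
        w = out_size(w, f, s, p)
        shapes.append((name, channels, w))
    return shapes


shapes = trace(ALEXNET_LAYERS)
for name, channels, w in shapes:
    print(f"{name}: {channels} x {w} x {w}")

# Size of the vector handed to the first fully connected layer.
_, channels, w = shapes[-1]
flattened = channels * w * w  # 64 * 3 * 3 = 576
print("flattened:", flattened)
```

Note that the number of input bands does not appear in the formula: the spatial sizes depend only on W, F, S and P, so the same trace holds for a two-band input.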
2.2 ConvNet
A ConvNet design is made up of three stages, and each stage comprises three types of layers: a filter bank layer, a non-linearity layer and a pooling layer. The architecture used here has a total of 10 layers, including two convolution layers and two fully connected layers. The first is a convolution layer, in which most of the computation occurs, followed by a tanh activation function and a max pool. The same pattern is repeated in the next stage. A reshape layer is used to transform the output volume into a 1D tensor. This is followed by a fully connected layer and a tanh layer. Finally, a linear layer maps the output to the given number of classes, and a softmax layer classifies the images into the different classes.
The input image of size 28 × 28 × 4, where 4 is the total number of bands, is convolved with 32 filters of size 5 × 5, producing an output volume of size 32 × 24 × 24. This is followed by a tanh activation function. The image is downsampled by a factor of 3 in the max-pooling layer, which produces an output volume of size 32 × 8 × 8. Next, a second convolution layer with 64 filters of size 5 × 5 is applied to the 32 × 8 × 8 volume to produce an output of 64 × 4 × 4. The dimension is again reduced by a factor of 2, resulting in a volume of size 64 × 2 × 2. The reshape layer converts this 64 × 2 × 2 volume to an output of size (64 ∗ 2 ∗ 2) × 1 = 256 × 1. This is followed by two fully connected layers, in which the 256 inputs are mapped to 200 hidden units. Finally, a softmax classifier classifies the image into 4 classes.
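A minimal sketch of the ConvNet sizes described above, ending with the weight count of the 256-to-200 fully connected layer. It assumes unpadded stride-1 convolutions and non-overlapping pooling (stride equal to the pool size), which is what the quoted dimensions imply; `out_size` and `linear_params` are hypothetical helper names, and the fully connected count uses the usual weights-plus-biases rule rather than a figure reported in the paper.

```python
def out_size(w, f, s=1, p=0):
    """Output width of a conv/pool layer: ((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return (n_in + 1) * n_out


w = 28
w = out_size(w, f=5)        # conv1, 5 x 5, stride 1, no padding
w = out_size(w, f=3, s=3)   # max pool 3 x 3, assumed stride 3
w = out_size(w, f=5)        # conv2, 5 x 5, stride 1, no padding
w = out_size(w, f=2, s=2)   # max pool 2 x 2, assumed stride 2

flat = 64 * w * w           # reshape layer: 64 feature maps of size w x w
fc1 = linear_params(flat, 200)

print("final width:", w)
print("flattened:", flat)
print("parameters in the 256 -> 200 layer:", fc1)
```

With only 256 values reaching the fully connected layers, most of the remaining parameters sit in the second convolution layer, which is why reducing the filter counts is an effective way to shrink this network.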
2.3 VGG
The model comprises a total of 59 layers. In this network, dropout and batch normalization layers are used frequently in order to speed up the whole process. The first stage consists of 4 layers: convolution, batch normalization, ReLU and dropout. The input image of size 28 × 28 × 4 is convolved with 64 filters of dimension 3 × 3, producing an output of 64 × 28 × 28. This is followed by a batch normalization of 0.001 and an activation layer. The fourth layer is a dropout with a probability of 0.3. The next four stages follow the same pattern (Convolution-Batch Normalization-ReLU-Dropout) but with different numbers of filters: the second stage has 64 input filters and 128 output filters, the third stage has 128 input filters and 256 output filters, the fourth stage has 256 input filters and 512 output filters, and the last stage has 512 input and 512 output filters. Finally, some fully connected layers and a softmax classifier classify the images into the different classes.
3 Dataset
The experiments are performed on the multispectral SAT-4 and SAT-6 datasets, both of which are obtained from the NAIP dataset [23]. Each image is of size 28 × 28 with four bands, namely Red, Green, Blue and NIR. The SAT-4 dataset contains a total of 500,000 images, of which 400,000 are used for training and 100,000 for testing. The SAT-6 dataset contains a total of 405,000 image patches, of which 324,000 are used for training and the remaining 81,000 for testing. The four classes in the SAT-4 dataset are barren land, trees, grassland, and a class covering all other land cover. The SAT-6 dataset has six classes: barren land, trees, grassland, roads, buildings and water bodies [2, 23]. Sample images from the SAT-4 and SAT-6 datasets are shown in Fig. 1.
4 Proposed experimental framework
Motivated by the concept of the Normalized Difference Vegetation Index, the experiments in this study use only the red and NIR band information. The two-band information (red and NIR) is first extracted from the multispectral SAT-4 and SAT-6 datasets, and the remaining steps, described below, are then performed. The two-band information obtained from the SAT-4 and SAT-6 datasets is fed into each of the three networks explained above, which are trained with hyperparameters similar to those of the existing architectures; the resulting models are able to classify the images into the different (4 or 6) classes. As a further contribution, the two-band information from both datasets is given to each of the three networks with a reduced number of filters, and the networks are trained. The tested models classify the images into the different (4 or 6) classes.
The proposed methodology contains the following steps:
1. Input: set of training and testing images with ground-truth class labels.
2. Extract the red and the NIR band from the training images.
3. Change the depth of the filter in the first layer of the architecture from 4 (red, green, blue and NIR) to 2 (red and NIR only).
4. Change the number of neurons in the output layer to 4 (the number of land cover classes of the input dataset).
5. Train the architecture and obtain the model.
6. Evaluate the obtained model on the testing images and compute the performance assessment metrics, accuracy and precision.
7. Tune the hyperparameters of the network, namely the size and the number of filters in every layer of the architecture, based on the training and validation accuracy.
8. Obtain the model for the hyperparameter-tuned architecture.
9. Repeat the experiment for the SAT-6 dataset by changing the number of neurons in the output layer to 6.
10. Repeat the entire set of experiments for all three standard architectures (AlexNet, ConvNet and VGG) for satellite image classification.
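Steps 2 to 4 of the procedure above can be sketched as follows. The channel order (Red, Green, Blue, NIR at indices 0 to 3), the nested-list patch layout and the helper names `extract_red_nir` and `two_band_config` are assumptions made for illustration; the actual storage layout of the datasets should be checked before reusing the indices.

```python
# Assumed channel order of a patch; verify against the dataset files.
RED, GREEN, BLUE, NIR = 0, 1, 2, 3


def extract_red_nir(patch):
    """Step 2: reduce an H x W x 4 patch to H x W x 2, keeping [red, nir]."""
    return [[[px[RED], px[NIR]] for px in row] for row in patch]


def two_band_config(num_classes):
    """Steps 3 and 4: first-layer filter depth and output-layer width.

    A hypothetical settings dictionary, not an API of any framework.
    """
    return {"input_depth": 2, "num_outputs": num_classes}


# Synthetic 28 x 28 x 4 patch with made-up constant pixel values.
patch = [[[10, 20, 30, 40] for _ in range(28)] for _ in range(28)]
two_band = extract_red_nir(patch)

print(len(two_band), len(two_band[0]), len(two_band[0][0]))
print(two_band_config(4))   # SAT-4
print(two_band_config(6))   # SAT-6 (step 9)
```

In this sketch, the same extraction would be applied to the test images before evaluation, since the network input depth is fixed at 2 after the change in step 3.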
The proposed architectures for the multispectral datasets are described below:
4.1 AlexNet with two band information for satellite image classification
In this proposed method, 16 filters of dimension 2 × 3 × 3 are convolved with the input image (with two-band information) of size 28 × 28 × 2. A stride of one pixel is used, which generates an output of dimension 16 × 26 × 26. The output dimension can be calculated using the formula ((W − F + 2P)/S) + 1, where W is the input size, F is the filter size, S is the stride and P is the zero padding. In the above computation, W, F, S and P are 28, 3, 1 and 0 respectively. Then come a ReLU non-linearity layer and a max pool of dimension 2 × 2, which generates an output dimension of 16 × 13 × 13. An output dimension of 48 × 13 × 13 is obtained by convolution with 48 filters of size 3 × 3, with both stride and padding equal to 1. The ReLU-max pool pattern follows again, with a filter size of 3 × 3 and a stride of 2, producing an output of 48 × 6 × 6. A 96 × 6 × 6 volume is generated by convolving the 48 × 6 × 6 volume with 96 filters of size 3 × 3 with a stride and padding of 1. This is followed by an activation layer. The resulting volume is again convolved with 64 filters of size 3 × 3 with a stride and padding of 1, producing 64 × 6 × 6 as output. A max pool by a factor of 2 yields 64 × 3 × 3. Next is a simple linear layer that converts 64 × 3 × 3 to (64 ∗ 3 ∗ 3) × 1 = 576 × 1. Next come a dropout with a probability of 0.5 and a linear layer, which transforms 576 × 1 to 200 × 1. The last three layers are dropout, linear and threshold layers. Finally, the softmax classifier groups the images into different classes according to the dataset. The architecture explained above, applied to the SAT-6 dataset, is depicted in Fig. 2.
4.2 Hyperparameter tuned AlexNet with two band information
In this proposed framework, the number of filters in each of the convolution layers is tuned; all the other layers remain similar to the existing architecture. The layers in which tuning is done are explained below:
1. 4 filters of dimension 2 × 3 × 3 are convolved with the input image (with two-band information) of size 28 × 28 × 2. A stride of one pixel is used, which generates an output of dimension 4 × 26 × 26.
2. 12 filters of size 3 × 3 are convolved with the output of the max-pooling layer, resulting in an output dimension of 12 × 13 × 13.
3. A 24 × 6 × 6 volume is generated by convolving the 12 × 6 × 6 volume with 24 filters of size 3 × 3 with a stride and padding of 1.
4. The resulting volume is again convolved with 16 filters of size 3 × 3 with a stride and padding of 1, producing 16 × 6 × 6 as output.
5. The 16 × 6 × 6 volume is downsampled by 2 and is followed by a simple linear layer that converts 16 × 3 × 3 to (16 ∗ 3 ∗ 3) × 1 = 144 × 1.
The application of AlexNet to the SAT-4 and SAT-6 datasets is depicted in Fig. 3.
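To give a feel for how much the filter reduction shrinks the convolutional part of the two-band AlexNet, the sketch below sums the per-layer count ((F × F × D) + 1) × K over the four 3 × 3 convolution layers for the original filter counts (16, 48, 96, 64) and the tuned ones (4, 12, 24, 16). The helper names are hypothetical, and the totals follow only from that formula and the filter counts listed above; they cover the convolution layers alone and are not figures copied from the paper's tables.

```python
def conv_params(f, d, k):
    """Weights plus biases of one conv layer: ((F * F * D) + 1) * K."""
    return (f * f * d + 1) * k


def total_conv_params(in_depth, filter_counts, f=3):
    """Sum conv_params over a stack of f x f convolution layers.

    The depth D of each layer equals the number of filters K of the
    previous layer (or the number of input bands for the first layer).
    """
    total, depth = 0, in_depth
    for k in filter_counts:
        total += conv_params(f, depth, k)
        depth = k
    return total


two_band = total_conv_params(2, [16, 48, 96, 64])
tuned = total_conv_params(2, [4, 12, 24, 16])

print("two-band AlexNet, conv layers:", two_band)   # 104192
print("tuned AlexNet, conv layers:", tuned)         # 6608
print("reduction factor:", round(two_band / tuned, 1))
```

Because each layer's depth is the previous layer's filter count, cutting every filter count by a factor of four reduces the weights of the inner layers by roughly a factor of sixteen. The input to the first fully connected layer also shrinks from 576 to 144 values.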
4.3 ConvNet with two band information for satellite image classification
In this architecture, 32 filters of dimension 2 × 5 × 5 are convolved with the input patch of size 28 × 28 × 2, producing an output volume of dimension 32 × 24 × 24. This is followed by a tanh non-linearity layer and a max pool of dimension 3 × 3. The resulting output is of dimension 32 × 8 × 8, as a result of downsampling by a factor of 3. The next three layers follow the same pattern. A 64 × 4 × 4 volume is obtained by convolution with 64 filters of size 5 × 5. Next is a tanh activation. The third layer of this second stage is a max pool of dimension 2 × 2, which produces an output of size 64 × 2 × 2. The output volume of size 64 × 2 × 2 is reshaped into (64 ∗ 2 ∗ 2) × 1 = 256 × 1. Two fully connected layers follow, in which the 256 inputs are mapped to 200 hidden units. Finally, a softmax classifier groups the images into classes depending on the dataset. The architecture explained above is depicted in Fig. 4.
4.4 Hyperparameter tuned ConvNet with two band information
The layers in which tuning is done are explained below:
1. 8 filters of dimension 2 × 5 × 5 are convolved with the input patch of size 28 × 28 × 2, producing an output volume of dimension 8 × 24 × 24.
2. A 16 × 4 × 4 volume is obtained by convolution with 16 filters of size 5 × 5.
The remaining layers are similar to those of the existing architecture. This is shown in Fig. 5.
4.5 VGG with two band information for satellite image classification
In this architecture, the first stage consists of 4 layers following the pattern convolution, batch normalization, ReLU and dropout. The input image of size 28 × 28 × 2 is convolved with 64 filters of dimension 3 × 3 with a padding and stride of 1, producing 64 × 28 × 28 as output. Next come a batch normalization of 0.001 and a ReLU non-linearity. The last layer in the first stage is a dropout with a probability of 0.3. Then comes a 2 × 2 max-pool layer. The next stage takes the 64 × 14 × 14 input and performs convolution with 128 filters of dimension 3 × 3. This is followed by batch normalization, ReLU and a dropout with a probability of 0.4. The third stage has a convolution layer with 128 input filters and 256 output filters; the other layers in this stage follow the same pattern (batch normalization-ReLU-dropout). The next stage has a convolution layer with 256 input filters and 512 output filters, and the remaining layers stay the same. The final stage has a convolution layer with 512 input and 512 output filters. Last come some fully connected layers, in which every neuron in one layer is connected to every neuron in the next layer. Finally, a softmax classifier classifies the images depending on the dataset. The architecture is shown in Fig. 6.
4.6 Hypertuned VGG with two band information
The layers in which tuning is done are explained below:
1. The input image of size 28 × 28 × 2 is convolved with 16 filters of dimension 3 × 3 with a padding and stride of 1, producing 16 × 28 × 28 as output.
2. The 16 × 28 × 28 output is convolved with 32 filters of dimension 3 × 3.
3. The third stage has a convolution layer with 32 input filters and 64 output filters.
4. The next stage has a convolution layer with 64 input filters and 128 output filters.
The remaining layers are similar to those of the existing architecture, and the architecture is shown in Fig. 7.
5 Results and discussions
This section presents the analysis of the proposed methods in terms of accuracy, precision and total number of trainable parameters.
A comparative study of the performance of the three networks (AlexNet, ConvNet and VGG) on the publicly available SAT-4 and SAT-6 datasets is explained in detail.
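For reference, the two performance metrics can be computed from predicted and ground-truth labels as in the minimal sketch below. The paper does not spell out how precision is aggregated over classes, so the per-class definition used here (correct predictions of a class divided by all predictions of that class) is an assumption, as are the function names and the toy labels.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """m[t][p] counts samples of true class t predicted as class p."""
    m = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def accuracy(m):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total


def per_class_precision(m):
    """For each class c: true positives / all samples predicted as c."""
    n = len(m)
    result = []
    for c in range(n):
        predicted = sum(m[r][c] for r in range(n))
        result.append(m[c][c] / predicted if predicted else 0.0)
    return result


# Toy example with 4 classes (as in SAT-4); the labels are made up.
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 2, 3]
m = confusion_matrix(y_true, y_pred, num_classes=4)

print("accuracy:", round(accuracy(m), 3))
print("precision per class:", [round(p, 3) for p in per_class_precision(m)])
```

Averaging the per-class values gives a single macro-averaged precision figure; whether the reported numbers use that or another averaging scheme is not stated in the text.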
5.1 SAT-4 experimental results
For the experiments performed on the SAT-4 dataset, the accuracy and precision computed for the different deep learning networks are summarized in Tables 1, 2 and 3. A performance comparison of the proposed architectures against the benchmark for the SAT-4 dataset is also presented.
Comparing the results (accuracy and precision) obtained with the existing and proposed architectures, it can be seen that the proposed architectures for the SAT-4 dataset maintain almost the same accuracy and precision as the existing architectures.
5.2 SAT-6 experimental results
For the experiments performed on the SAT-6 dataset, the accuracy and precision computed for the different deep learning networks are summarized in Tables 4, 5 and 6.
Tables 4, 5 and 6 compare the existing architectures (AlexNet, ConvNet and VGG) against the proposed architectures.
Comparing the results (accuracy and precision) obtained with the existing and proposed architectures, it can be seen that the proposed architectures for the SAT-6 dataset maintain almost the same accuracy and precision as the existing architectures.
5.3 Results based on comparison of trainable parameters
This section describes the results obtained in terms of the total number of trainable parameters. The total trainable parameters are calculated for both the existing and the proposed architectures. The comparison between the existing and hyperparameter-tuned architectures is shown in Tables 7, 8 and 9.
The proposed architecture with two-band input alone and no hyperparameter tuning has a similar number of trainable parameters to the existing architecture, except at the first convolution layer, where the input changes from 4 bands to 2 bands. Hence, the number of trainable parameters for the first convolution layer in the two-band AlexNet architecture is 304. This is calculated as ((F × F × D) + 1)K, where F is the filter dimension, D is the depth of the filter and K is the number of filters; with F = 3, D = 2 and K = 16, this gives (3 × 3 × 2 + 1) × 16 = 304. The comparison of trainable parameters for the existing and proposed (Modified 2 Band) architectures of the three networks is summarized in Tables 7, 8 and 9.
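The first-layer count can be reproduced directly from ((F × F × D) + 1)K. The sketch below evaluates it for the first convolution layer of each network with D = 4 and D = 2, using the filter sizes and counts given in the architecture descriptions (3 × 3 with 16 filters for AlexNet, 5 × 5 with 32 for ConvNet, 3 × 3 with 64 for VGG). The function and dictionary names are assumptions, and the printed values are computed from the formula, not quoted from Tables 7, 8 and 9.

```python
def conv_params(f, d, k):
    """Trainable parameters of a conv layer: ((F * F * D) + 1) * K.

    The +1 accounts for one bias per filter.
    """
    return (f * f * d + 1) * k


# (filter size F, number of filters K) of the first convolution layer.
FIRST_LAYER = {"AlexNet": (3, 16), "ConvNet": (5, 32), "VGG": (3, 64)}

for name, (f, k) in FIRST_LAYER.items():
    four_band = conv_params(f, 4, k)
    two_band = conv_params(f, 2, k)
    print(f"{name}: 4 bands -> {four_band}, 2 bands -> {two_band}")

# AlexNet with two bands: (3 * 3 * 2 + 1) * 16 = 304, as stated in the text.
print(conv_params(3, 2, 16))
```

Dropping two input bands roughly halves the first layer but leaves every later layer untouched, which is consistent with the text's observation that the two-band networks without tuning have a similar overall parameter count.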
From an analysis of the estimated performance against the total trainable parameters in the existing and proposed architectures, it is inferred that the proposed architectures achieve the same performance with fewer trainable parameters. Hence, the proposed architectures retain high accuracy and precision with a reduced number of trainable parameters compared with the existing ones.
6 Conclusion
In this paper, the performance of three deep learning networks for the classification of multispectral datasets was presented. The experiments were performed on the publicly available SAT-4 and SAT-6 datasets. The proposed architectures contain fewer trainable parameters yet are as effective as the existing architectures. This work also highlights the capability of NDVI, which can serve as a single parameter for land cover classification. The proposed architectures are sufficient for classifying the DeepSat datasets with fewer trainable parameters, having changed the existing four-band convolutional neural networks to two-band networks with a reduced number of filters for satellite image classification. As future work, further reduction in the number of trainable parameters can be investigated.
Change history
27 July 2024
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s11042-024-19943-w
References
1. Audebert N, Le Saux B, Lefèvre S (2017) Segment-before-detect: vehicle detection and classification through semantic segmentation of aerial images. Remote Sens 9(4):368
2. Basu S, Ganguly S, Mukhopadhyay S, DiBiano R, Karki M, Nemani R (2015) Deepsat: a learning framework for satellite imagery. In: Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems, p 37
3. Bragilevsky L, Bajić IV (2017) Deep learning for Amazon satellite image analysis. Commun Comput Signal Process (PACRIM), 1–5
4. Chen H, Wang Y, Gao S (2017) Assessing relationship of air quality index and vegetation type using hyperspectral remote sensing. In: Geoscience and remote sensing symposium (IGARSS), pp 4878–4881
5. Chippy J, Jacob NV, Renu RK, Sowmya V, Soman K (2017) Least square denoising in spectral domain for hyperspectral images. Procedia Comput Sci 115:399–406
6. Dahigamuwa T, Yu Q, Gunaratne M (2016) Feasibility study of land cover classification based on normalized difference vegetation index for landslide risk assessment. Geosciences 6(4):45
7. Dev S, Wen B, Lee YH, Winkler S (2016) Ground-based image analysis: a tutorial on machine-learning techniques and applications. IEEE Geosci Remote Sens Mag, 79–93
8. Dixon K, Deepa M, Ajay A, Sowmya V, Soman KP (2016) Aerial and satellite image denoising using least square weighted regularization method. Ind J Sci Technol 9(30):1–10
9. Dutta S, Manideep BC, Rai S, Vijayarajan V (2017) A comparative study of deep learning models for medical image classification. IOP Conf Series: Mater Sci Eng 263(4):042097
10. Haridas N, Aswathy C, Sowmya V, Soman K (2016) Hyperspectral image denoising using legendre-fenchel transform for improved sparsity based classification. Intell Syst Technol Appl, 521–528
11. Jeevalakshmi D, Reddy SN, Manikiam B (2016) Land cover classification based on NDVI using LANDSAT8 time series: a case study Tirupati region. Commun Signal Process (ICCSP), 1332–1335
12. Kaiser P, Wegner JD, Lucchi A, Jaggi M, Hofmann T, Schindler K (2017) Learning aerial image segmentation from online maps. IEEE Trans Geosci Remote Sens 55(11):6054–6068
13. Li H, Tao C, Wu Z, Chen J, Gong J, Deng M (2017) RSI-CB: a large scale remote sensing image classification benchmark via crowdsource data. arXiv:1705.10450
14. Liu Q, Hang R, Song H, Li Z (2018) Learning multiscale deep features for high-resolution satellite image scene classification. IEEE Trans Geosci Remote Sens 56(1):117–126
15. Lu D, Weng Q (2007) A survey of image classification methods and techniques for improving classification performance. Int J Remote Sens 28(5):823–870
16. Lunga D, Yang HL, Reith A, Weaver J, Yuan J, Bhaduri B (2018) Domain-adapted convolutional networks for satellite image classification: a large-scale interactive learning workflow. IEEE J Selected Topics Appl Earth Observ Remote Sens 11(3):962–977
17. Makantasis K, Karantzalos K, Doulamis A, Doulamis N (2015) Deep supervised learning for hyperspectral data classification through convolutional neural networks. Geosci Remote Sens Symp (IGARSS), 4959–4962
18. Moorthi SM, Misra I, Kaur R, Darji NP, Ramakrishnan R (2011) Kernel based learning approach for satellite image classification using support vector machine. Recent Adv Intell Comput Syst (RAICS), 107–110
19. Nath SS, Mishra G, Kar J, Chakraborty S, Dey N (2014) A survey of image classification methods and techniques. Control, Instrum, Commun Comput Technol (ICCICCT), 554–557
20. Nie L, et al (2017) Enhancing micro-video understanding by harnessing external sounds. ACM Int conf on multimedia, pp 1192–1200
21. Özbay B, Çimtay Y, Kandaz F (2017) Calculation of vegetation index for short wave infrared hyperspectral images. In: Signal processing and communications applications conference (SIU), pp 1–3
22. Paisitkriangkrai S, Sherrah J, Janney P, van den Hengel A (2016) Semantic labeling of aerial and satellite imagery. IEEE J Selected Topics Appl Earth Observ Remote Sens 9(7):2868–2881
23. Papadomanolaki M, Vakalopoulou M, Zagoruyko S, Karantzalos K (2016) Benchmarking deep learning frameworks for the classification of very high resolution satellite multispectral data. ISPRS Ann Photogram Remote Sens Spat Inf Sci 3(7):83–88
24. Sachin Rajan, Sowmya V, Govind D, Soman KP (2017) Dependency of various color and intensity planes on CNN based image classification. In: International symposium on signal processing and intelligent recognition systems, pp 167–177
25. Song S, et al (2017) NeuroStylist: neural compatibility modeling for clothing matching. ACM Int conf on multimedia, pp 753–761
26. Song S, et al (2018) Neural compatibility modeling with attentive knowledge distillation. In: Int ACM SIGIR conference on research & development in information retrieval, pp 5–14
27. Srivatsa S, Sowmya V, Soman K (2016) Least square based fast denoising approach to hyperspectral imagery. Intell Comput Techniques: Theory Practice Appl, 22–24
28. Xu D, Sun L, Luo J, Liu Z (2013) Analysis and denoising of hyperspectral remote sensing image in the curvelet domain. Mathematical Problems in Engineering
29. Zhang C, Hao X, Bai J, Dai M (2014) Improving hyperspectral data classification of satellite imagery by using a sparse based new model with learning dictionary. Hyperspectral Image Signal Process Evol Remote Sens, 1–4
About this article
Cite this article
Unnikrishnan, A., Sowmya, V. & Soman, K.P. RETRACTED ARTICLE: Deep learning architectures for land cover classification using red and near-infrared satellite images. Multimed Tools Appl 78, 18379–18394 (2019). https://doi.org/10.1007/s11042-019-7179-2
DOI: https://doi.org/10.1007/s11042-019-7179-2