
11.1 Introduction

Satellite data requirements and advancements in sensor technology have led to a large number of operational satellites and a huge volume of satellite data. A number of sensors operate in the optical, microwave, and infrared bands, with different spectral, spatial, and radiometric resolutions. These data are used in a large number of applications in the fields of defense intelligence, policy decision-making, urban planning, vegetation monitoring, natural resource monitoring, climate change studies, geo-hazard monitoring, ocean monitoring, and many more. Current satellite data processing techniques are based on physical retrieval, probabilistic approaches, and statistical models and are associated with a wide range of challenges [1], which include high dimensionality, uncertainty, nonlinearity, spatial and spectral redundancy, etc. Despite the available automated techniques for satellite data processing, a large number of applications still require manual intervention and human intelligence for decision-making. Artificial intelligence and machine learning techniques attempt to address these issues by enabling machines (computers) to process data the way humans do. The foundations of artificial intelligence (AI) trace back to Alan Turing, a mathematician who laid the groundwork for modern computing. Around the 1950s, his work entered broad popular awareness and gave birth to the idea of “General AI,” the notion that a computer should also possess the characteristics of human intelligence, including reasoning, interaction, understanding, and thinking. In practice, the scope of AI technologies was narrowed to task-specific systems such as game-playing software, recommendation systems, spam email identification, etc. [2]. Over the last two decades, machine learning has drawn the attention of researchers, as all of these tasks exhibit a certain portion of human intelligence.

In general, machine learning comprises a set of algorithms or models that use application-specific data (training data) for training, so that inferences can be drawn from the observed patterns (feature trends) obtained during training. We take a generous amount of (cleaned or uncleaned) data, with the features defined manually (e.g., “weight,” “color,” “spam email,” etc.). Predictions are then made on new data (testing data) from the inference drawn during training on those features. For example, if we have an image and need to determine whether it contains vegetation or a water body, we take a large amount of image data containing vegetation and water bodies (this is termed “labeled data”). The algorithm identifies the features common to all of those images for both vegetation and water. The same algorithm is then applied to the unlabeled data, the provided image, to predict whether it contains vegetation or water. In the field of satellite image analysis, machine learning plays a vital role, and AI-driven satellite data applications are in demand for a number of important reasons:

  • The rapid advancements in machine vision in the last few years have made it possible for machines to perform challenging tasks such as identifying cars, buildings, or changes in a scene over time.

  • With the proliferation of satellites, enhanced camera technology, and improved data storage and transfer capabilities, there is an exponential increase in the amount of data being produced by satellites.

  • Satellite image analysis and interpretation performed by human imagery analysts is a costly affair.

Machine learning models are able to recognize man-made structures in a satellite image as well as airplanes parked at an airport. Despite these abilities, a lot of human intervention is still required in ML, such as manually choosing the features (e.g., shape, size, color, texture, etc.).
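As a minimal illustration of this workflow, the sketch below (Python, assuming scikit-learn and NumPy are available) trains a conventional classifier on hand-crafted pixel features; the arrays, feature choices, and class labels are hypothetical stand-ins for features extracted from real imagery.

```python
# Minimal sketch: hand-crafted features (e.g., band means, a texture score, an index)
# feeding a conventional classifier. Array contents and labels are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical labeled samples: rows are pixels/objects, columns are manually
# chosen features (e.g., mean reflectance per band, texture, vegetation index).
X = rng.random((500, 4))                         # stand-in for extracted features
y = rng.integers(0, 2, size=500)                 # 0 = water, 1 = vegetation (example labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)                        # training on labeled data
print("test accuracy:", clf.score(X_test, y_test))  # inference on unseen data
```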

Limitations of traditional machine learning models in remote sensing are:

  • Inability to represent large and complex models for systems that are nonlinear in nature

  • Inability to handle large volumes of data

  • Difficulty in learning hierarchical features and generalizing

  • Lack of training data and limited use of domain knowledge

  • Poor compatibility with high-performance computing architectures

  • Limited accuracy in prediction, forecasting, and classification

  • Difficulty in addressing optimization, scalability, and portability

At present, more than 180 remote sensing satellites have been launched between 2006 and 2018 and are operational. This number is expected to double or triple in the next few decades [1]. Deep learning (DL) is a subset of ML, which in turn is a subset of AI. It is based on neural networks, a conceptual model of the brain that has been around since the 1950s but was largely ignored earlier because it is very computationally expensive. In recent times, processing has become sufficiently cheap and powerful through graphics processing units (GPUs) and field-programmable gate arrays (FPGAs), and there has been enough data to feed DL algorithms, so DL has become popular. DL handles almost every limitation of traditional machine learning with its advanced algorithms and computational power. Though deep learning also faces the challenges posed by satellite data, it is able to overcome most of the challenges faced by traditional machine learning algorithms. In this chapter, we discuss both traditional and new trends in machine learning for satellite data processing, with a focus on the present status and future directions of both.

11.2 Machine Learning in Satellite Data Processing

Several machine learning techniques have contributed to revealing insights from the data provided by satellites orbiting Earth. Machine learning has become an integral part of every step involved in processing satellite data, from preprocessing to decision-making. The stages of processing satellite data are shown in Fig. 11.1. It shows the expansion of the information domain, i.e., the revealing of facts hidden inside the data, as the data moves from acquisition through analytics to decision and policy making.

Fig. 11.1 Stages involved in processing satellite data

The three primary applications, namely, classification (grouping similar pixels together), segmentation (dividing the image into different regions to detect objects), and denoising (making an estimate of the underlying image), have been studied in [3]. Application areas utilizing satellite image data such as change detection, land use land cover, vegetation monitoring, etc. require classification of the satellite image under investigation, whereas segmentation is required for urban growth monitoring, road extraction, building extraction and detection, etc., and denoising is a preprocessing stage that is equally important to both classification and segmentation. To accomplish these tasks, various algorithms have been introduced that require a number of features to be selected by the investigator. Traditional manual features are the Histogram of Oriented Gradients (HOG), the Scale-Invariant Feature Transform (SIFT) and its variants, Speeded-Up Robust Features (SURF) [4], color histograms, etc. Principal component analysis (PCA) is one of the most commonly used feature space (dimensionality) reduction techniques. Regularized linear discriminant analysis (LDA) is used for classification of hyperspectral images. K-means clustering is popular for clustering satellite data. Sparse coding is used for sparse representation of data. A large body of research has already been registered in the areas of hyperspectral image (HSI) data analysis, kernel-based methods, statistical learning methods for HSI, spectral distance functions for classification, hidden Markov random fields (HMRF), generalized least squares, multi-classifier systems, fuzzy-based and spectral-spatial classification, change detection, vertex component analysis (VCA), orthogonal subspace projection, support vector domain description (SVDD), Gaussian processes (GP), genetic algorithms, manifold learning, graph-based methods, transfer learning, endmember extraction, and spectral unmixing [5]. Combinations of these algorithms with a few others, such as feature selection through genetic algorithms combined with fuzzy logic-based classification to improve accuracy on high-dimensional HSI data, are also found in the literature. We discuss a few recent advancements in these fields, with their applications and benefits, and elaborate on them in subsequent sections (Table 11.1).

Table 11.1 Various traditional machine learning algorithms with applications
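To make the classical pipeline concrete, the following minimal sketch applies PCA for dimensionality reduction and K-means for unsupervised clustering of hyperspectral pixels, assuming scikit-learn and NumPy; the data cube is synthetic and stands in for a real scene that would be loaded from file.

```python
# Minimal sketch: PCA feature-space reduction followed by K-means clustering of
# hyperspectral pixels. The cube is synthetic; a real scene would be read from disk.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rows, cols, bands = 100, 100, 50
cube = np.random.rand(rows, cols, bands)         # stand-in for a hyperspectral cube
pixels = cube.reshape(-1, bands)                 # one spectrum per row

pca = PCA(n_components=5)                        # keep the leading components
reduced = pca.fit_transform(pixels)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(reduced)

cluster_map = labels.reshape(rows, cols)         # unsupervised classification map
print("explained variance kept:", pca.explained_variance_ratio_.sum())
```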

11.2.1 Satellite Image Classification

Machine learning in general is broadly divided into two categories: supervised and unsupervised. Supervised learning deals with training data that has labels associated with it. Neural networks [6] and SVMs have been the most successful models in this category for processing satellite data. In unsupervised learning, the data itself is used for learning the patterns (features). Fuzzy-based clustering [7, 8] and multiobjective optimization [9] are used for developing spatial membership relations. Fusion of information from multiple sources has been introduced using graph cuts [10], projection-based methods [11], hierarchical clustering [12], hidden Markov random fields and Markov random fields (MRF) [13] for contextual regularization, self-organizing maps (SOM), and hybrid genetic algorithms [20], and has been reviewed in [1]. Segmentation and region extraction in an image with multiple components using an ANN- and genetic algorithm-based approach is introduced in [14]. All the methods mentioned above work well for spatial data, i.e., they are considered pixel-based classification, but certain applications call for consideration of the pixel location over a period of time along with the captured scene, termed spatiotemporal analysis. Researchers have also worked on dynamic clustering strategies for spatiotemporal reasoning [15] and visualization [16]. In applications such as climate change studies and prediction, time series analysis techniques are a necessary and critical part of the analysis. Linear regression and the autoregressive integrated moving average (ARIMA) model are used for time series analysis in [17] to study rainfall and temperature trends in Bangladesh. In [18], the authors used data from the Moderate-Resolution Imaging Spectroradiometer (MODIS) with a spatial resolution of 500 m to produce a global urban extent map, using supervised decision trees to classify images taken over a period of one year.
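As a hedged illustration of such ARIMA-style trend analysis, the sketch below fits a seasonal ARIMA model to a synthetic monthly rainfall series using statsmodels; the series, model order, and seasonal period are illustrative and not those of the cited study.

```python
# Minimal sketch of an ARIMA fit for a rainfall-style monthly time series.
# The data and the (p, d, q)(P, D, Q, s) orders are illustrative placeholders.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical monthly rainfall series (mm); a real study would load station data.
idx = pd.date_range("2000-01", periods=240, freq="MS")
rainfall = pd.Series(100 + 30 * np.sin(np.arange(240) * 2 * np.pi / 12)
                     + np.random.normal(0, 10, 240), index=idx)

model = ARIMA(rainfall, order=(1, 0, 1), seasonal_order=(1, 0, 1, 12))
result = model.fit()
forecast = result.forecast(steps=12)             # one-year-ahead forecast
print(forecast.head())
```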

Change detection is one of the applications that uses spatiotemporal data to observe changes in trends at a particular location over a period of time. A machine learning model using decision trees has been employed to observe change in multiscale imagery [19]. Object-based change classification combining object correlation images (OCIs), object-based change classification integrating neighborhood correlation images (NCIs), object-based change classification without contextual features, per-pixel change classification with NCIs, and traditional per-pixel change classification using only bitemporal image data have all been carried out. Along with this, machine learning decision trees and nearest-neighbor classifiers were also investigated, and a comparison between the OCI and NCI variables was evaluated. Object-based change classifications incorporating OCIs or NCIs resulted in more accurate change detection classes [20].

Image transformation, subtraction, rotation, change vector analysis, and cross-correlation analysis are also utilized as change detection approaches [1]. In the past few decades, neural networks and kernel methods have also been widely used. Composite kernels have been specifically designed for the combination of multitemporal, multisensor, and multisource information [21, 22]. The focus is now on eliminating or greatly reducing human intervention by utilizing completely unsupervised or semisupervised approaches, respectively [23, 24].
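A minimal sketch of the differencing and change vector analysis idea, assuming two co-registered multiband acquisitions (synthetic arrays here) and a simple statistical threshold; real studies often derive the threshold from the magnitude statistics or with Otsu's method.

```python
# Minimal sketch of change detection by per-band differencing and change vector
# analysis (CVA) on two co-registered multiband images; arrays are synthetic.
import numpy as np

bands, rows, cols = 4, 256, 256
img_t1 = np.random.rand(bands, rows, cols)       # acquisition at time 1
img_t2 = np.random.rand(bands, rows, cols)       # acquisition at time 2

diff = img_t2 - img_t1                           # per-band difference image
magnitude = np.sqrt((diff ** 2).sum(axis=0))     # change vector magnitude per pixel

# Simple illustrative threshold: mean + 2 * standard deviation of the magnitude.
threshold = magnitude.mean() + 2 * magnitude.std()
change_mask = magnitude > threshold              # binary change / no-change map
print("changed pixels:", int(change_mask.sum()))
```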

11.2.2 Kernel-Based Extraction

Working with hyperspectral images is challenging: their high dimensionality and nonlinear nature result in higher computational time, and the presence of high collinearity and noisy bands degrades model quality. Spectral bands represent the characteristics or features of the elements being modeled. Various techniques have been proposed for feature selection in satellite images, from classical discriminative criteria [25] to more advanced machine learning approaches such as genetic algorithms [26] and SVM-based recursive feature elimination [27, 28]. Recently, more attention has been paid to feature extraction methods. PCA is one of the most widely used linear methods. Later on, multivariate kernel machines were proposed to deal with nonlinearities in the data [29]. In [30], Gaussian process-based classification has also been performed for hyperspectral images.
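The following minimal sketch illustrates nonlinear (kernel) feature extraction with kernel PCA on synthetic pixel spectra, as a stand-in for the kernel machines mentioned above; the kernel choice and parameters are illustrative.

```python
# Minimal sketch of nonlinear feature extraction with kernel PCA on hyperspectral
# pixel spectra, as an alternative to linear PCA. The spectra here are synthetic.
import numpy as np
from sklearn.decomposition import KernelPCA

pixels = np.random.rand(2000, 100)               # 2000 pixels x 100 spectral bands

kpca = KernelPCA(n_components=10, kernel="rbf", gamma=0.1)
features = kpca.fit_transform(pixels)            # nonlinear features for a classifier

print(features.shape)                            # (2000, 10)
```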

11.2.3 Pure Pixel-Based Extraction

A pixel in a satellite image is a blend of the features (spectral signatures) of the various objects or materials found within its spatial extent in the scene. An automatic mechanism to extract the pure spectral pixels, termed endmembers, directly from the image is therefore required; examples include the N-FINDR algorithm, vertex component analysis (VCA), orthogonal subspace projection, and support vector domain description (SVDD) [1]. Subpixel composition can then be identified, since the pure pixels serve as a basis to represent all other pixels as a linear (or nonlinear) combination of them, e.g., for mineral mapping. Support vector domain description (SVDD) has also been used for pure pixel selection [31].
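Once endmembers are available, per-pixel abundances can be estimated by constrained least squares. The sketch below assumes the endmember matrix has already been extracted (e.g., by N-FINDR or VCA, which are not implemented here) and uses nonnegative least squares from SciPy on synthetic spectra.

```python
# Minimal sketch of linear spectral unmixing: each pixel spectrum is modeled as a
# nonnegative combination of known endmember spectra. All spectra are synthetic.
import numpy as np
from scipy.optimize import nnls

bands, n_endmembers = 50, 3
E = np.abs(np.random.rand(bands, n_endmembers))  # columns = endmember spectra
true_abund = np.array([0.6, 0.3, 0.1])
pixel = E @ true_abund + 0.01 * np.random.rand(bands)   # mixed pixel with noise

abundances, residual = nnls(E, pixel)            # nonnegativity-constrained fit
abundances /= abundances.sum()                   # optional sum-to-one rescaling
print("estimated abundances:", abundances.round(3))
```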

11.2.4 Regression

Prediction models are based on establishing and learning the relationship between the observed values and the ground truth available to validate that relationship. Empirical models are tuned to learn the relationship between the obtained spectra and actual measurements made on the ground. Due to certain drawbacks, parametric models suffer from inaccurate prediction on new data, which has led to nonparametric and nonlinear regression techniques such as neural networks, support vector regression (SVR) [33], semisupervised SVMs for parameter estimation [34], relevance vector machines (RVM) [35], and Gaussian processes (GP) [30, 36]. However, a lack of interpretability and dependence on training data limit the utilization even of models with better accuracy.
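As a minimal illustration of such nonparametric regression, the sketch below fits an SVR and a Gaussian process (which also provides predictive uncertainty) to synthetic spectra-to-variable data using scikit-learn; kernels and hyperparameters are illustrative.

```python
# Minimal sketch of nonparametric regression (SVR and a Gaussian process) for
# estimating a biophysical variable from spectra; data and kernels are illustrative.
import numpy as np
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.random.rand(300, 10)                      # e.g., band reflectances or indices
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.05 * np.random.randn(300)   # target variable

svr = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, y)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True).fit(X, y)
mean, std = gpr.predict(X[:5], return_std=True)  # GP also gives predictive uncertainty

print("SVR R^2 on training data:", svr.score(X, y))
print("GP mean/std for first samples:", mean.round(2), std.round(2))
```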

11.3 Recent Trends in Machine Learning for Satellite Data Processing

11.3.1 Manifold Learning

Manifold learning deals with dimensionality reduction and nonlinear feature extraction. The field spans computer science, machine learning, image processing, etc. Manifold learning focuses on projecting high-dimensional data into a lower dimension, enabling better analysis while preserving the main features of the original data; visualization and understanding of high-dimensional data thus become viable. The intrinsic structure of satellite data cannot be described using traditional linear dimensionality reduction methods. Isomap [37, 38] and Laplacian methods, such as the unnormalized graph Laplacian, are used as regularization techniques for SVMs [39]. Manifold regularization techniques also include Laplacian regularization (LapR and HLapR) [40]. Graph-based Laplacian energy [41] is used for hierarchical image analysis. Locally linear embedding (LLE) transforms images embedded in a very high-dimensional space into two dimensions, making visualization and analysis of the whole dataset much simpler [42]. A nonlinear dimensionality reduction using LLE is performed in [43]. An extension of LLE has been proposed in [44] to provide a supervised feature extraction technique. Image denoising has also been done using LLE [45]. Efforts have also been made to strengthen the discrimination capability and generalization ability of the embedded data representation [46]. Some algorithms that analyze the intrinsic dimensionality of hyperspectral images are discussed in [47].
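A minimal sketch of manifold learning for visualization, projecting synthetic high-dimensional pixel spectra to two dimensions with Isomap and LLE via scikit-learn; neighborhood sizes are illustrative.

```python
# Minimal sketch of manifold learning (Isomap and LLE) projecting high-dimensional
# pixel spectra to two dimensions for visualization; the spectra are synthetic.
import numpy as np
from sklearn.manifold import Isomap, LocallyLinearEmbedding

pixels = np.random.rand(1000, 60)                # 1000 pixels x 60 spectral bands

iso = Isomap(n_neighbors=10, n_components=2)
emb_iso = iso.fit_transform(pixels)

lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
emb_lle = lle.fit_transform(pixels)

print(emb_iso.shape, emb_lle.shape)              # (1000, 2) each, ready for plotting
```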

11.3.2 Semisupervised Learning

An area similar to manifold learning is semisupervised learning, which makes use of both labeled data and the wealth of unlabeled samples to develop a model using the manifold structure of the data. For remote sensing data, a variety of methods, both generative and discriminative, have been developed. Conditional density estimation with generative models has been presented [48]. A number of graph-based methods have been developed for classification [49, 50], regression, and target detection. An adapted graph-based SVM classification technique for time series data has been used for analysis and classification of satellite images, with a new graph kernel designed for this purpose [51]. The design of cluster and bagged kernels has been successfully presented [52]. A technique based on SVMs, known as the transductive SVM (TSVM), has been developed [54, 55] and applied to image classification and change detection [53]. A modified TSVM has also been proposed in [56] to address ill-posed problems in remote sensing. In [57], a semisupervised kernel Fisher discriminant classifier was proposed. The problem faced by these methods is their inability to handle large-scale datasets: if the number of unlabeled samples is very large, these methods cannot be applied directly.
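As a small-scale illustration (not one of the cited graph methods), the sketch below uses scikit-learn's label spreading to propagate a handful of labels over a synthetic pixel feature set; on truly large unlabeled pools this direct approach runs into the scalability limits noted above.

```python
# Minimal sketch of graph-based semisupervised classification with label spreading:
# a handful of labeled pixels plus many unlabeled ones (marked -1); data is synthetic.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

X = np.random.rand(1000, 20)                     # pixel feature vectors
y = -1 * np.ones(1000, dtype=int)                # -1 marks unlabeled samples
y[:30] = np.random.randint(0, 3, size=30)        # only 30 labeled samples, 3 classes

model = LabelSpreading(kernel="rbf", gamma=5.0)
model.fit(X, y)

predicted = model.transduction_                  # labels inferred for every sample
print("predicted label counts:", np.bincount(predicted))
```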

11.3.3 Transfer Learning

Transfer learning, or domain adaptation, in the context of remote sensing addresses the problem that arises when training samples are available for a particular time and we need to classify time series data to update land cover maps. A few methods, such as NN and the domain adaptation SVM (DASVM), have been used [1]. Another problem arises in the classification of an image using samples taken from different images, which results in sample selection bias or covariate shift. This has been addressed by defining proper kernel machines [58]. Recently, in [59], the authors proposed maximum margin-based clustering that uses features common to both domains (target and source) rather than relying on samples from the source domain only.
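A common practical form of transfer learning is to reuse a network pretrained on a source domain as a feature extractor and train only a small head on the target domain. The sketch below assumes TensorFlow/Keras and an ImageNet-pretrained ResNet50; the class count, input size, and training data are hypothetical and do not correspond to the cited methods.

```python
# Minimal sketch of transfer learning: a frozen pretrained CNN backbone plus a small
# trainable classification head for a satellite-scene task. Sizes are illustrative.
import tensorflow as tf

num_classes = 5
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
base.trainable = False                           # freeze source-domain weights

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(target_patches, target_labels, epochs=5)   # with target-domain data
```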

11.3.4 Active Learning

Recently, the selection of the most relevant samples for training the model has been introduced, termed active learning. In [60], the authors discuss object-oriented classification using SVMs alongside pixel-based classification using maximum likelihood classifiers. An extension of this approach iteratively uses boosting to weight a few selected pixels. An information-based active learning method has also been introduced for target detection. For very high-resolution satellite images, a model-independent active learning method is reviewed in [1]. To achieve improved classification accuracy, an improved training sample selection strategy has been proposed within the active learning approach, utilizing a two-staged spatial computation [61]. There is a lot of scope in the field of active learning, since selection of the most informative sample is difficult. To address this issue, the authors of [62] employ a method that considers both classification and localization of the detected object. Two metrics have been proposed: one evaluates the overlap between the object and the predicted bounding box, and the other measures the stability of detection in the presence of noise.
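A minimal sketch of pool-based active learning with least-confidence (uncertainty) sampling, which is one common query strategy rather than the specific methods cited above; the data, classifier, and simulated oracle are illustrative.

```python
# Minimal sketch of pool-based active learning: each round queries the least
# confident unlabeled samples and adds them to the training set. The "oracle" is
# simulated by known labels; data and query size are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((2000, 12))
y_true = (X[:, 0] + X[:, 1] > 1).astype(int)     # hidden ground truth (the oracle)

idx0 = np.where(y_true == 0)[0][:10]
idx1 = np.where(y_true == 1)[0][:10]
labeled = list(idx0) + list(idx1)                # small, balanced initial labeled set
pool = [i for i in range(2000) if i not in set(labeled)]

clf = LogisticRegression(max_iter=1000)
for _ in range(10):                              # 10 query rounds
    clf.fit(X[labeled], y_true[labeled])
    proba = clf.predict_proba(X[pool])
    uncertainty = 1.0 - proba.max(axis=1)        # least-confidence score
    query = np.argsort(uncertainty)[-10:]        # 10 most uncertain pool samples
    labeled += [pool[i] for i in query]
    query_set = set(query.tolist())
    pool = [s for i, s in enumerate(pool) if i not in query_set]

print("final training set size:", len(labeled))
```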

11.3.5 Structured Learning

The majority of the methods mentioned in the previous sections assume a binary output with two classes. However, most applications of satellite data deal with multiclass outputs, which further increases the complexity of both classification and prediction. This is addressed by structured learning, where multiple labels can be predicted simultaneously; the structured SVM (SSVM) has been used in computer vision [63], but very few applications of structured learning have been presented for remote sensing [64].

Kernel-based image classification using structured learning is introduced in [64] to overcome issues of noise in very high-resolution images. An efficient multiclass classification approach based on hierarchical spectral clustering, with enhanced performance, has been introduced very recently to incorporate the scalability aspect of processing large satellite image data [65].

The issues with almost all of the methods mentioned in the previous sections can be addressed by making use of many intermediate layers that take care of signature (feature) extraction automatically and are capable of learning from large datasets. Second, to deal with time series data, a memory-based architecture is required. These issues have drawn researchers' attention to the area known as deep learning, which eliminates human intervention in feature extraction, making it robust and easy to use for complex applications in satellite data processing.

11.4 Deep Learning

Deep learning refers to a deeper network, with many strata (layers), typically at least four or five layers deep, of (usually) nonlinear transformations. Its ability to learn representations from labeled data and to generalize over unlabeled data has made it popular in the field of satellite data processing. Deep neural networks (DNNs) are not only a solution to image classification or prediction; they can also be used for feature-based image registration [66] to obtain a robust and accurate match in the presence of noise in synthetic aperture radar (SAR) data. Change detection is also being done using deep learning architectures [67].

11.4.1 Convolutional Neural Network (CNN)

CNNs are well-known deep learning networks with a stratified architecture comprising convolutional layers, nonlinear layers, and pooling layers. A CNN transforms the input into an output class (prediction) and recognizes patterns hierarchically. PCANet, proposed in [68], is an unsupervised convolutional deep learning architecture formed by cascading principal component analysis to learn multistage filter banks, followed by binary hashing used for indexing and block-wise histograms serving as the pooling layer. For crop yield estimation [69], CNNs have been utilized through the Convolutional Architecture for Fast Feature Embedding (Caffe) deep learning framework in two different ways: a model with two inner product layers, and another using a single inner product layer with a rectified linear unit (ReLU) activation function.
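As a minimal sketch of the convolution, nonlinearity, and pooling stack described above, the following builds a small patch classifier in TensorFlow/Keras (rather than Caffe, which the cited work uses); the input size and number of classes are illustrative.

```python
# Minimal sketch of a small CNN (convolution -> ReLU -> pooling -> dense) for
# classifying image patches into land-cover classes; shapes are illustrative.
import tensorflow as tf

num_classes = 6
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),                    # e.g., RGB patches
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(patches, labels, epochs=10)   # with labeled training patches
```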

In [70], the authors review recent advancements in the use of CNNs for classification of hyperspectral images. Supervised 1-D, 2-D, and 3-D CNNs, along with a comparison of their performance, have been reviewed. Unsupervised CNNs have also been used for hyperspectral image classification to learn spectral-spatial features. Videos obtained from cameras mounted on satellites do not have sufficient spatial resolution to interpret the scene, due to the motion of objects on Earth as well as the motion of the camera; spatiotemporal analysis by fusion of multispectral images and space videos has therefore been done using CNNs [71]. A pretrained model transferred to a supervised CNN [72] has been used to handle high-dimensional data with a simple and computationally efficient approach. In [73], a summary of almost all types of CNNs and the modifications made to them is presented along with applications and data. Beyond Earth, convolutional neural networks have also been used to automatically detect geological landforms on Mars [74].

11.4.2 Recurrent Neural Network

The applications of standard neural networks (and also convolutional networks) are limited because they only accept a fixed-size vector as input (e.g., an image) and produce a fixed-size vector as output (e.g., probabilities of different classes). Also, these models use a fixed amount of computational steps (e.g., the number of layers in the model). Recurrent neural networks are unique as they allow us to operate over sequences of vectors—sequences in the input, the output, or in the most general case both.

A novel approach for ocean and weather prediction was proposed in a doctoral thesis referenced in [73]. A modified RCNN is used to learn the equations of a mesoscale meteorological model. A recurrent neural network (RNN) for tracking multiple objects in the presence of occlusion in the data is also mentioned in [73].

11.4.3 Recursive Neural Network

A recursive NN can be seen as a generalization of the recurrent NN, which is in fact a recursive neural network with the structure of a linear chain. Recursive NNs operate on hierarchical structures, whereas recurrent NNs operate on the progression of time. Long short-term memory (LSTM), a recurrent architecture, has been combined with convolution to form the convolutional LSTM (ConvLSTM), which is used for precipitation nowcasting [75]. The ConvLSTM is an extension of the fully connected LSTM (FC-LSTM).
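A minimal ConvLSTM sketch in TensorFlow/Keras in the spirit of nowcasting: a short sequence of frames is mapped to a predicted next frame. The sequence length, frame size, and loss are illustrative and not those of [75].

```python
# Minimal sketch of a ConvLSTM that predicts the next frame of a radar/cloud
# sequence from the previous frames; shapes and layer sizes are illustrative.
import tensorflow as tf

timesteps, height, width, channels = 10, 64, 64, 1
model = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, height, width, channels)),
    tf.keras.layers.ConvLSTM2D(32, kernel_size=3, padding="same",
                               return_sequences=False),   # spatiotemporal memory
    tf.keras.layers.Conv2D(channels, kernel_size=3, padding="same",
                           activation="sigmoid"),         # predicted next frame
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
# model.fit(frame_sequences, next_frames, epochs=10)   # with real sequences
```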

11.4.4 Deep Belief Network

A deep belief network (DBN) is a probabilistic, generative model made up of multiple layers of hidden units. It is a composition of simple learning modules, one contributing to each layer. A DBN can be used generatively to pretrain a DNN by using the learned DBN weights as the initial DNN weights; back-propagation or other discriminative algorithms can then be applied to fine-tune these weights. DBNs are particularly helpful when limited training data are available, since the pretrained weights are closer to the optimal weights than randomly chosen initial weights. DBNs have been used [76] for urban planning to effectively extract features and improve classification performance. Detection of aircraft in high-resolution satellite imagery, object recognition, traffic flow prediction, urban land use and land cover (LULC) mapping, vehicle detection, nighttime vehicle sensing in the far infrared (IR), and drought index prediction also utilize DBNs. Classification of polarimetric SAR and HSI data has made use of restricted Boltzmann machines (RBMs) and DBNs to perform spectral information-based classification [73].
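As a rough, single-layer analogue of DBN-style pretraining (scikit-learn provides a single Bernoulli RBM rather than a full stacked DBN), the sketch below learns unsupervised RBM features and then trains a discriminative classifier on top; data and layer sizes are illustrative.

```python
# Rough single-layer analogue of DBN pretraining: an unsupervised RBM learns
# features, then a discriminative classifier is trained on them. A full DBN would
# stack several RBMs; inputs are scaled to [0, 1] as the Bernoulli RBM expects.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.random((1000, 64))                       # e.g., flattened image patches
y = rng.integers(0, 4, size=1000)                # illustrative class labels

model = Pipeline([
    ("scale", MinMaxScaler()),
    ("rbm", BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=10,
                         random_state=0)),       # unsupervised feature learning
    ("clf", LogisticRegression(max_iter=1000)),  # discriminative fine-tuning stage
])
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```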

There are a few other classes of DNNs as well, one of which is the stacked autoencoder (SAE), which can be used for HSI classification and dimensionality reduction. All of the methods mentioned above are summarized in Table 11.2 with their applications and advantages.

Table 11.2 Comparison of various deep learning algorithms with applications and advantages

11.5 Case Studies

This section presents particular use cases of artificial intelligence. Depending on how the information obtained from processing satellite data is utilized, these fall into two categories:

  (a) Direct applications: Satellite imagery is processed directly through machine learning techniques to provide insights into the scene, e.g., object detection, vehicle tracking, urban boundary delineation, road segmentation, building detection, change detection, etc.

  (b) Derived applications: Complex and sophisticated models use the extracted features, along with data other than that obtained from satellites, to derive conclusions or support decision policies, e.g., decision policies for farmers, profit prediction for retailers, etc.

    (i) Object detection in a high-resolution image

      Many challenging applications of satellite data processing have already been discussed in the previous sections. Object detection in particular is a challenging task in satellite imagery for the following reasons:

      • Within a satellite image of hundreds of megapixels, only a tiny part (approximately 10–20 pixels to a few hundred pixels) constitutes the object being searched for.

      • Lack of availability of training data.

      • Optimization of algorithms to detect small objects.

      • Adaptation of algorithms to work well with different scales and objects.

      A segmented satellite image with several identified objects is shown in Fig. 11.2. The different colors used for different objects are indicated in the legend.

      Fig. 11.2 Demonstration of time series analysis and change detection algorithm. The changes in buildings and roads are highlighted red (before) to green (after) [78]

      The major issues in object detection, change detection, and time series analysis-based applications have been addressed by a few popular deep learning architectures such as Faster R-CNN and You Only Look Once (YOLO) [77]. The presence of unsolved challenges keeps this research area at an immature stage.

    (ii) Change detection is another area of application that uses the direct output produced by processing satellite imagery. It represents the difference between images taken of a particular geographical location at two different times (temporal change), where significant changes in the area (region) are expected to occur, as shown in Fig. 11.2. There can be a simple binary change, where a pixel belongs to either of two expected classes, or a multiclass change, where a pixel can be assigned a label from a set of class labels.

      Multiclass classification requires supervised techniques, which use training data (ground truth observations) for classification. Developing a solution to multiclass classification is extremely difficult and complex due to the following issues:

      • Complex and expensive ground truth data collection.

      • Data normalization.

      • Effects of lenses, climate changes, and other natural changes.

      • Thorough knowledge of remote sensing is required to choose a machine learning technique.

    (iii) Profit earned by retailers: Unlike the direct applications mentioned previously, derived applications extract information from satellite imagery that serves as base features for more complex systems. In the retail sector, the number of cars parked in a parking lot can give an estimate of the profit that can be made by the retailer. The predicted profit can then be reported in monthly, quarterly, or annual reports. The count of cars present in the parking area of a scene captured in a satellite image is presented in Fig. 11.3.

    (iv) Crop yield estimation and price prediction: The normalized difference vegetation index (NDVI) is a derived product from various satellites such as Landsat, MODIS, OCM, Sentinel, etc. (a minimal NDVI computation is sketched after this list). NDVI provides crucial information used for crop yield estimation and price prediction. Farmers, commodity traders, insurance policy makers, government agribusiness policy makers, and many others utilize agriculture-related intelligence to meet their needs. AI techniques are used to identify higher-yield areas and help farmers choose the best times and places for farming a particular crop. Figure 11.4 shows the domination of sugar plantations on the island.

    (v) Economic growth monitoring: Satellite images can uncover economic activity in countries or regions that are hard to reach. The factors indicating the economic status of a country include the number of high-rise buildings and the rate of construction, electricity consumption (which can be measured at night from luminosity), the number of cars, roads, etc. Poverty prediction can also be done by measuring these factors and building a per capita income map, which in turn contributes to policies for sustainable development (Fig. 11.5).
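As referenced in item (iv), a minimal sketch of the NDVI computation, NDVI = (NIR - Red) / (NIR + Red), on synthetic red and near-infrared bands; with real data the bands would be read from the product (e.g., with rasterio) and the vegetation threshold tuned to the sensor.

```python
# Minimal sketch of computing NDVI from red and near-infrared bands; the arrays
# stand in for bands read from a real product, and the 0.4 threshold is illustrative.
import numpy as np

red = np.random.rand(512, 512)                   # red band reflectance
nir = np.random.rand(512, 512)                   # near-infrared band reflectance

ndvi = (nir - red) / (nir + red + 1e-10)         # small epsilon avoids division by zero
vegetated = ndvi > 0.4                           # crude vegetation mask
print("vegetated fraction:", vegetated.mean().round(3))
```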

Fig. 11.3 Car-counting algorithm [79]

Fig. 11.4 In this infrared view of Maui presented in false color, the large sugar fields that dominate the agriculture of that island can be seen in bright green at the narrowest part of the island, wedged in the lowlands between the mountains of Pu’u Kukui on the west and Haleakala on the east [80]

Fig. 11.5 Construction rates monitoring (urbanization) using shadow detection [81]

11.6 Conclusion

Artificial intelligence, which includes neural networks, deep learning, and other algorithms that make the computer perform tasks that would be hard to program with conventional methods, can be a more efficient and easier way to work with satellite data. Data processing of satellite images and the evaluation of extremely large datasets are two common use cases of AI in space applications. The future of satellite data processing lies in the use of deep architectures as well as big data technology. There are a number of open challenges that need to be addressed in this area, and big data is one of them. Storage and archival of the huge volume of satellite data is another problem, one that can be addressed by utilizing cloud computing. We have reviewed many AI algorithms that have been utilized to address some of the problems faced by the satellite image processing community, but these are restricted by the type of data, the type of sensors, the limited availability of labeled datasets, etc. There is now a need for a common platform wherein the data, irrespective of source, type, and inadequacy, can be processed automatically with little or no human intervention. The extension of machine learning and deep learning architectures to big data technologies, in combination with cloud computing architectures, remains an open issue.