1 Introduction

Open data policies for satellite imagery have attracted extensive attention in the field of remote sensing since 2008, when the Landsat archive of NASA and the USGS was made freely available; the European Copernicus programme later adopted a similar open policy for its Sentinel data. The aim is to combine satellite data with analytic tools in order to stimulate economic growth. However, in spite of the extensive software now available to pre-process satellite data, cloud-covered regions often hamper the analysis [73]. Cloud detection is therefore an important first step in obtaining information for climate studies and on the properties of the earth and atmosphere, because changes in cloud cover affect the energy budget between earth and atmosphere and hence the water exchange that governs the climate [12, 27, 108]. Moreover, clouds and their shadows make parts of a scene brighter and darker, respectively, which affects tasks such as crop monitoring, classification of marine habitats, land-use monitoring, environmental monitoring, geographic mapping and target recognition. Correctly and efficiently classifying clouds is thus a vital pre-processing step before satellite imagery is used for further analysis. For this reason, cloud statistics have been used in atmospheric research since the first remote sensing imagery became available [4, 180].

Most methods based on cloud statistics focus on the spectral signature and process the image pixel by pixel. The distinct spectral features of clouds in satellite imagery stem from the fact that clouds are brighter and colder than most surfaces. This leads to misclassification of cloud with pixels that have similar spectral signatures, for example desert sand and snow/ice. Moreover, the diverse shapes of clouds also hamper detection accuracy [31]. Various studies have therefore used physical parameters of clouds such as cloud density, shape, optical properties and brightness temperature. However, the high spectral heterogeneity of clouds and their large temperature variability relative to the underlying surfaces make it difficult to extract cloud-contaminated pixels from satellite images. The main reason for this difficulty is the use of threshold-based approaches, which are sensitive to atmospheric conditions, for detecting and removing the distinguishing characteristics of clouds. Clouds are divided into low-level, mid-level and high-level types according to their height above ground level, as shown in Fig. 1.

Fig. 1 Cloud chart based on their heights from the ground level [119]

Cirrocumulus, cirrus and cirrostratus are high-level clouds. Cirrocumulus clouds are seen in winter and indicate fair but cold weather. They appear as small rounded puffs in long rows and are usually white or gray. Similarly, cirrus clouds indicate fair weather and appear as long, thin, wispy white streamers, whereas cirrostratus clouds are found 12 to 24 hours before rainfall or a snowstorm. Mid-level clouds include altocumulus and altostratus, which generally extend over the entire sky like a thin sheet of cloud. Altocumulus clouds form in groups, are grayish-white and about 1 km thick, and have one portion darker than the other. They indicate a warm or humid morning and a thunderstorm in the late afternoon. Altostratus clouds indicate an approaching storm with rain or possibly snow; they appear as gray or blue-gray clouds and generally cover the entire sky. Cumulonimbus clouds, with vertical growth up to 10 km, indicate thunderstorms associated with heavy rain, lightning, snow and sometimes tornadoes. In the presence of cumulonimbus cloud, if rainfall reaches the earth’s surface, the cloud converts into nimbostratus. Stratus, cumulus and stratocumulus are low-level clouds. Stratus clouds appear gray like fog, cover almost the entire sky and do not reach the earth’s surface; they indicate light mist or drizzle. Cumulus clouds resemble cotton balls with sharp outlines, a flat base and vertical growth, and indicate fair or stormy weather. Stratocumulus clouds, which are low, lumpy and gray, indicate light rainfall [67].

The main problem in detecting the distinguishing characteristics of clouds is that their responses to climate change are complicated, which makes the energy and water balance hard to compute [72, 108, 139]. Cloud mechanisms are therefore an essential area of study for representing cloud behavior more precisely. A number of analyses have been carried out on seasonal cloud patterns and weather conditions, but their relevance to earth radiation budget (ERB) calculation is limited [194]. An accurate understanding of the mechanics of cloud formation and growth is still lacking, although many theories have been suggested that explain cloud structure using the micro-physics of single droplets. Over some sixty years, satellite data have provided an abundance of knowledge for understanding the atmosphere from space, and weather satellite technology has increased the possibility of studying clouds accurately. In addition to the division into types, clouds can be further classified into species and sometimes into varieties, which define special characteristics of the clouds and are related to cloud transparency and the arrangement of the macroscopic cloud elements. Thus, with knowledge of cloud type, size, motion and development, one can calculate and predict the presence of cloud over the underlying earth surface.

The literature reveals a large number of cloud detection methods, which fall into two categories: threshold-based algorithms and classification-based approaches. Threshold-based approaches rely on the spectral and textural information of clouds captured through the available sensor channels. However, threshold methods fail to distinguish the various types of clouds accurately over complex underlying surfaces. On the other hand, following breakthroughs in pattern recognition and machine learning, classification-based approaches began to be applied to cloudy satellite images. These algorithms select favorable cloud features and optimize the model parameters using training data. However, these features have to be extracted manually, at considerable computational cost, to obtain good detection accuracy. Neural network approaches were therefore introduced to extract features automatically in order to distinguish cloud-contaminated pixels in satellite images. Researchers then focused on differentiating thin clouds, because the information of thin clouds mixes with that of the underlying regions. Hence, the concept of the super-pixel was combined with the deep neural network approach [143], but this approach ignores multi-stage contextual features during classification. Thereafter, multi-scale deep neural networks were developed to extract both local and global features of clouds and to segment cloudy regions.

In response to the developments and limitations of the cloud detection algorithms discussed above, the present work reviews the various approaches to cloud detection, with the following major contributions.

  • Algorithmic advancement of cloud detection is discussed based on spectral and textural parameters.

  • Experimental simulations are carried out on existing neural network approaches that use spectral and textural parameters as classifier inputs for cloud detection.

  • The discussion of neural network approaches is extended to deep neural networks for cloud detection in satellite images.

Within the scope of research on cloud detection approaches, this paper traces the growth of cloud detection algorithms by synthesizing knowledge from the literature and classifying articles from 1970 to 2020. This period was selected because its information and communication technology has played a vital role, not only in remote sensing but also in making data available from online databases. The literature review started in October 2016 and was built on a search of the keyword index, abstract and conclusion of articles in IEEE Transactions on Geoscience and Remote Sensing, Royal Meteorological Society, Monthly Weather Review, AGU Journals, Journal of Applied Meteorology, International Journal of Remote Sensing, Remote Sensing of Environment, Journal of Geophysical Research, Journal of Climate and Applied Meteorology, ISPRS Journal of Photogrammetry and Remote Sensing, Journal of the Atmospheric Sciences and Journal of Climate. Only a limited number of research articles addressed the theme directly, so additional sources were added to this work. Approximately 200 selected journal articles, books and theses are covered, representing a wide range of methodologies in algorithmic advancement and experimental analysis of cloud detection. Suitable papers are summarized, enabling us to derive conclusions and recommendations for further investigation while evaluating and comparing different methods.

The remainder of this article is arranged as follows. Section 2 presents the algorithmic advancement on cloud detection in detail. Section 3 conducts a critical experimental analysis with existing techniques on cloud detection. Section 4 presents the simulation results on existing neural network approaches to detect clouds using various supervised learning rules. Section 5 discusses the algorithms that demonstrate the growth of the neural network approach towards deep neural networks for cloud detection. Lastly, concluding remarks are given in Section 6.

2 Algorithmic advancement on cloud detection

This section presents studies whose algorithms were not highly customized; instead, the focus is on cloud detection algorithms and related work evaluated under given parameters. The result of this literature survey, based on web knowledge, is shown graphically in Fig. 2.

Fig. 2 Growth of popularity of methods in cloud classification over the past decades

2.1 Spectral parameters

The spectral parameters of clouds, such as irradiance and reflectance, indicate cloud thickness, height and amount [32, 114]. Bowker et al. [14] studied the spectral reflectance of 156 targets, on the basis of which classification algorithms were designed to differentiate cloud from various earth surfaces. Similarly, several studies consider an individual earth surface against cloud. For example, Greaves and Chang; Liljas; Crane and Anderson [29, 46, 100] assessed the reflectivity at 3.7 μm to distinguish water cloud from ice crystal cloud. Raschke et al. [132] used spectral information to differentiate cloud from snow/ice in the polar region with AVHRR data. Many other efforts have been made to exploit spectral parameters, such as:

  • Ackerman et al. [1] analyzed the cloud mask algorithm for MODIS data and found that the 0.86 μm channel is sensitive to all types of cloud in the absence of sun-glint. The algorithm had difficulty detecting cloud at night in the polar region.

  • Lo and Johnson; Kuhn [84, 105] evaluated the water vapor absorption region using the Nimbus II satellite. The analysis found that the 6.4-6.9 μm wavelength range can be used to recognize water vapor in a clear atmosphere.

  • The ratio of the near-infrared (NIR) band to the visible (VIS) band has been used to discriminate land areas, with a threshold value of 1.6, and sea surfaces, with a threshold value of 0.75. This is because the VIS band is more reflective than the NIR band. However, this ratio technique was found ineffective in the presence of sun-glint [11, 100, 137].

  • Raschke et al. [29] observed that cloud and snow/ice have similar reflectance in the VIS band, which makes it challenging to extract cloud-contaminated regions in this range. However, cloud reflects more in the water absorption band (1.5 μm), whereas snow absorbs more in this band. Hence the 1.5 μm band can be used to differentiate cloud from snow.

  • Grant and Hunt [45] found that ice cloud can be discriminated from water regions in the 10-11 μm band owing to a 10% difference in refractive index.

  • Crane and Anderson [29] used the 1.6 μm channel of the AVHRR instrument to discriminate cloudy regions over snow/ice.

  • Hulley and Hook [65, 99] introduced the new ASTER cloud mask algorithm (NACMA) to differentiate cloud from snow/ice, deserts and thin cloud in Landsat-7, MODIS and AVHRR data. To check the validity of the algorithm, the normalized difference snow index (NDSI) was used; an NDSI value greater than 0.4 together with a reflectance greater than 0.11 indicates the presence of snow.

  • Jedlovec et al. [71] applied the bispectral composite threshold (BCT) algorithm to GOES-12 data using the 11 μm and 3.9 μm channels. The algorithm performed better for daytime cloud detection during the summer season, because sun-glint produces better contrast between cloud and the earth’s surface in the infrared channel.

  • Shenk et al. [145] found that 2.3 μm is a sensitive channel for cloud detection and that cirrus cloud in particular can be discriminated in the 6.5-7.0 μm wavelength range.

  • Szejwach [153] observed that cirrus cloud is colder in the 5.7-7.1 μm and 10.5-12.5 μm ranges.

  • Yang et al. [174] developed an automated ground-based cirrus cloud detection scheme to cope with the non-uniformity of cloud brightness in the presence of sun-glint. A background subtraction method, named the background subtraction adaptive threshold (BSAT) method, is used to detect cirrus cloud and eliminate the illumination effect.

  • The band ratio of the red channel to the blue channel gives good results for thick cirrus cloud but fails for thin cirrus. Hence the BSAT and adaptive threshold methods are combined with the band ratio method to extract cirrus cloud accurately.

  • Bell and Wong; Liljas [11, 100] found that thin cirrus cloud can be detected from the brightness temperature (BT) difference between 11 μm and 12 μm.

  • The BT difference between 11 μm and 3.7 μm is used to detect low stratus cloud at night [11, 100, 137].

  • Li et al. [96] proposed the hybrid thresholding algorithm (HYTA), which applies the blue-to-red band ratio with minimum cross entropy thresholding. The challenge for the technique is to discriminate cloud in the presence of sun-glint, which requires additional thresholding techniques.

  • For the same application, daytime [55] and night-time [56] cloud detection algorithms were developed for FengYun (FY-3A)/VIRR polar-orbiting meteorological satellite data using the BT differences between 1.38 and 1.6 μm and between 3.7 and 12 μm to differentiate high and low cloud, respectively.

  • Gupta and Panchal [52] implemented a daytime cloud detection algorithm using a decision tree approach in the VIS band. Compared with the official FY-3A/VIRR cloud mask, it achieved a more accurate detection rate.
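Several of the spectral tests listed above reduce to simple per-pixel arithmetic. The following is a minimal sketch, not the implementation of any cited algorithm. It assumes the commonly used NDSI definition from green and shortwave-infrared reflectance; the band to which the 0.11 reflectance test applies is not stated in the text, so it is passed in as a separate array. The function names and the 2 K brightness temperature threshold in the usage lines are hypothetical.

```python
import numpy as np

def ndsi(green, swir):
    """Normalized difference snow index from green and shortwave-infrared reflectance."""
    green = np.asarray(green, dtype=float)
    swir = np.asarray(swir, dtype=float)
    denom = green + swir
    # Guard against division by zero where both bands are zero.
    safe = np.where(denom == 0, 1.0, denom)
    return np.where(denom == 0, 0.0, (green - swir) / safe)

def snow_mask(green, swir, reflectance, ndsi_min=0.4, refl_min=0.11):
    """Snow test quoted in the text: NDSI > 0.4 and reflectance > 0.11.

    Which band `reflectance` comes from is an assumption left to the caller.
    """
    return (ndsi(green, swir) > ndsi_min) & (np.asarray(reflectance) > refl_min)

def bt_difference_mask(bt_a, bt_b, threshold):
    """Flag pixels whose brightness temperature difference bt_a - bt_b (kelvin)
    exceeds a threshold, e.g. 11 um minus 12 um for thin cirrus."""
    diff = np.asarray(bt_a, dtype=float) - np.asarray(bt_b, dtype=float)
    return diff > threshold

# Usage with made-up pixel values (threshold of 2 K is a placeholder).
green = np.array([0.80, 0.30, 0.05])
swir = np.array([0.10, 0.25, 0.04])
refl = np.array([0.70, 0.30, 0.05])
snow = snow_mask(green, swir, refl)
cirrus = bt_difference_mask(np.array([270.0, 285.0]), np.array([266.0, 284.5]), 2.0)
```

In practice the thresholds depend on sensor, season and surface type, which is the sensitivity of threshold-based methods noted in the Introduction.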

Surface albedo is the ratio of the radiation reflected by a surface to the radiation incident on it. The amount of reflected radiation not only characterizes the surface but also reflects the spectral and angular distribution of the irradiation [26]. Albedo affects the climate because it determines how much radiation the planet absorbs. The variation in albedo caused by unequal heating of land and ocean drives climatic effects. Welch et al. [165] related albedo to the cosine of the solar zenith angle measured at each pixel and scaled the gray levels 0-255 to correspond to albedo values of 0-100%. Pixels appear brighter for higher albedo values and darker for lower values [151]. A few contributions to cloud detection based on this parameter are:

  • The climate in tropical regions is hot and humid with heavy rainfall. Kazantzidis et al. [78] therefore estimated the effect of cirrus cloud on solar irradiance at different solar zenith angles in order to predict tropical cyclogenesis. This improves the accuracy of tropical measurements and the understanding of the impact of cirrus cloud on climate.

  • The automatic cloud cover assessment (ACCA) was proposed [54] for Landsat data, but it gives false results for cloud edges and transparent clouds owing to the absence of a channel around 1.375 μm. The ACCA algorithm was therefore improved by changing its threshold parameter to minimize the error in the cloud mask and to improve the detection of optically thin cirrus cloud and its edges [159].

  • Zhong et al. [189] developed the object-oriented cloud and cloud-shadow matching (OCM) algorithm for charge-coupled device (CCD) sensors, for which too few channels are available. The OCM method uses a modified ACCA approach for Landsat 7 image data to initialize the cloud map, with an omission error of 14.74%.
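The albedo definition and the gray-level scaling given before this list can be sketched as follows. This is an illustrative assumption about how the quantities combine, not the procedure of [165]: the incident flux on a horizontal surface is taken as the irradiance multiplied by the cosine of the solar zenith angle, and albedo 0-100% is mapped linearly onto gray levels 0-255. Function and parameter names are hypothetical.

```python
import numpy as np

def albedo(reflected_flux, irradiance, solar_zenith_deg):
    """Ratio of reflected flux to the flux incident on a horizontal surface."""
    mu0 = np.cos(np.radians(solar_zenith_deg))  # cosine of the solar zenith angle
    incident = np.asarray(irradiance, dtype=float) * mu0
    return np.asarray(reflected_flux, dtype=float) / incident

def albedo_to_gray(albedo_fraction):
    """Map albedo in [0, 1] (0-100%) linearly onto gray levels 0-255."""
    clipped = np.clip(albedo_fraction, 0.0, 1.0)
    return np.round(clipped * 255).astype(np.uint8)

# Usage: brighter pixels correspond to higher albedo.
gray = albedo_to_gray(np.array([0.0, 0.2, 1.0]))
```

The sketch is only meaningful for daytime pixels; as the solar zenith angle approaches 90 degrees the incident flux tends to zero and the ratio becomes unstable.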

2.2 Textural parameters

Spectral features evaluate the average brightness variation between bands, whereas textural features describe spatial patterns (such as rough, chunky, striped, ruffled, thin and many other contextual attributes) within a single band in the spatial domain.

Pickett [131] showed that the color, appearance, orientation and size of a region can be used to distinguish various textures within it. Gutman et al. [53] therefore proposed extending spatial domain information to the radiance information of the VIS band in order to extract cloudy regions. Moreover, Knottenberg and Raschke [79] reported many applications, such as the detection of cloud and its shadow in visible imagery, that rely on the textural properties of clouds. However, Seze and Rossow [142] observed situations, such as dust or cirrus cloud over desert, in which the VIS channel carries very little spatial variance information. Other analyses of cloud detection using textural parameters include:

  • Cheng et al. [24] developed a cloud removal algorithm based on similar-pixel replacement from multi-temporal images using a Markov random field (MRF) [20, 34]. Good results were obtained on SPOT/HRV images.

  • Shenk et al. [145] studied 13 different features, such as height, appearance, albedo and Fourier power spectrum (FPS), to derive climatic conditions by discriminating various types of cloud over five years of low-spatial-resolution data from the Nimbus-3 medium resolution infrared radiometer (MRIR).

  • Stratocumulus, cumulus and cirrus clouds have been classified using textural features of Landsat multi-spectral data [126]. However, stratocumulus was misclassified as cumulus because the two differ little in size, position, number of gaps and the gaseous elements present along the radiation path.

  • Chen and Pavlidis [138] found that the magnitude spectrum is large for smooth textures with high spatial frequency and small for rough textures. Kuenning et al.; Rossow [83, 133] therefore focused on extracting rough and smooth textural features using the entropy of a particular range of wavelengths at low altitude [43].

  • Seze and Rossow [142] observed difficulty in distinguishing various types of cloud over small inhomogeneous surfaces with low contrast, because an inhomogeneous surface has large spatial variability and is therefore misclassified as cloudy. Various cloud detection techniques (e.g., [115, 134, 142, 150, 162]) have therefore been developed that apply thresholds to low-contrast regions by computing various cloud statistics.

  • The gray-level difference vector (GLDV) method was employed in [163] for texture analysis in order to generate multi-spectral signatures of various cloud characteristics. Likewise, Welch et al. [164] used GLDV to extract 9 texture features for unsupervised classification. The analysis shows a decline in classification accuracy, while run time and storage capacity improve by 40% and 87%, respectively.

  • Welch et al. [164] focused on the gray level co-occurrence matrix (GLCM) [58] as a texture analysis method for classifying different types of cloud and increasing the accuracy of satellite image analysis. It was found that GLCM saves 50% of storage capacity and 30% of run time. Further, sum and difference histogram (SADH) and gray-level difference vector (GLDV) textural features, which are derived from the GLCM approach, were used for classification. In comparison, SADH and GLDV do not improve the accuracy of cirrus cloud classification, but they do improve the discrimination of stratocumulus and cumulus cloud.
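The GLDV mentioned in the last two items can be illustrated with a short sketch. It builds the normalized histogram of absolute gray-level differences between pixel pairs at a fixed displacement and derives four commonly used statistics from it. The choice of these four statistics, the function names and the default displacement are assumptions for illustration; they are not necessarily the nine features used in [164].

```python
import numpy as np

def gldv(image, dr=0, dc=1, levels=256):
    """Normalized gray-level difference vector for a displacement (dr, dc) >= 0.

    Entry k is the fraction of pixel pairs whose gray levels differ by exactly k.
    """
    img = np.asarray(image, dtype=np.int64)
    rows, cols = img.shape
    first = img[:rows - dr, :cols - dc]
    second = img[dr:, dc:]
    diff = np.abs(first - second).ravel()
    hist = np.bincount(diff, minlength=levels).astype(float)
    return hist / hist.sum()

def gldv_features(p):
    """Mean, contrast, angular second moment and entropy of a difference vector."""
    k = np.arange(p.size)
    nonzero = p[p > 0]  # skip empty bins so that log2 is defined
    return {
        "mean": float(np.sum(k * p)),
        "contrast": float(np.sum(k ** 2 * p)),
        "asm": float(np.sum(p ** 2)),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
    }

# Usage on a tiny 4-level image with horizontal neighbours.
patch = np.array([[0, 0, 2],
                  [1, 3, 3]])
features = gldv_features(gldv(patch, levels=4))
```

Because the vector has only as many entries as there are gray levels, it needs far less storage than a full co-occurrence matrix, which is consistent with the storage savings reported above.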

2.3 Other parameters

Studies based on a few other parameters are:

  • Emissivity in the NIR band, between 0.1 and 0.95, is used to measure variation in cloud thickness [41, 66, 68, 101, 170]. Hunt [66] studied the emissivity of cloudy regions and found it more sensitive in the 8.5-13 μm range than in other SWIR channels such as 2.3, 3.5 and 3.8 μm. For night-time analysis, the difference between the 12 μm and 3.7 μm channels is used for mid-level cloud detection [11, 124, 137].

  • Water cloud contains comparatively fewer water droplets than ice cloud. This information helps to distinguish water cloud from ice cloud using the cloud scattering parameter (cos 𝜃).

  • A contrast parameter is calculated from the blocked image to obtain multiple optimal threshold values and remove thin cloud [190].

  • Shen et al. [144] observed that thin cloud consists of low-frequency information and hence developed a frequency-domain algorithm based on a homomorphic filter. The algorithm is applied to cloudy regions of Landsat ETM+ and GaoFen-1 images in order to reduce false detection of clear pixels by removing thin clouds and recovering the ground information.
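The idea behind homomorphic filtering in the last item can be sketched generically: take the logarithm of the image so that a multiplicative haze component becomes additive, attenuate low spatial frequencies in the Fourier domain, and transform back. The sketch below is a textbook-style filter with a Gaussian high-emphasis transfer function, not the algorithm of [144]; the cutoff and gain values are hypothetical.

```python
import numpy as np

def homomorphic_filter(image, cutoff=0.05, low_gain=0.5, high_gain=1.0):
    """Attenuate low-frequency (thin cloud like) content of a non-negative image.

    cutoff is in cycles per pixel; low_gain applies at zero frequency and
    high_gain at high frequencies.
    """
    img = np.asarray(image, dtype=float)
    log_img = np.log1p(img)                      # log(1 + x) avoids log(0)
    spectrum = np.fft.fft2(log_img)
    fy = np.fft.fftfreq(img.shape[0])[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(img.shape[1])[None, :]   # horizontal frequencies
    radius2 = fx ** 2 + fy ** 2
    # Gaussian high-emphasis transfer function.
    transfer = low_gain + (high_gain - low_gain) * (
        1.0 - np.exp(-radius2 / (2.0 * cutoff ** 2)))
    filtered = np.real(np.fft.ifft2(spectrum * transfer))
    return np.expm1(filtered)                    # inverse of log1p

# Usage on a synthetic 32 x 32 scene.
rng = np.random.default_rng(0)
scene = rng.uniform(0.1, 0.9, size=(32, 32))
dehazed = homomorphic_filter(scene)
```

Setting both gains to 1 leaves the image unchanged, which is a convenient sanity check. Lowering `low_gain` also darkens the overall scene, so a brightness rescaling step would normally follow.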

3 Experimental analysis with existing techniques on cloud detection

This section presents work that focuses on evaluating and comparing the results of different techniques.

Table 1 summarizes the experimental analysis of various cloud classification algorithms. Its columns show: (i) the name of the satellite/imager used for the analysis, (ii) the classification method, (iii) the publication year of the paper, (iv) the average accuracy of the classification method over all classes, (v) the maximum accuracy obtained, (vi) the minimum accuracy achieved and (vii) the number of classes. The entries in the table are grouped by classifier and accuracy. Here accuracy is subject to human factors because the results are based on visual comparison. One parameter that affects classifier accuracy is the number of output classes, which varies from 2 to 10 distinct regions. The accessible channels and the cost of the equipment used for the final measurements can also affect classifier accuracy. Many strategies for cloud classification have been developed on the basis of the neural network (NN) method. The growth in the use of the NN method for cloud classification is shown in Fig. 3. The next section therefore presents a detailed study of the NN approach to improving the accuracy of cloud classification for multi-spectral satellite images.

Table 1 Experimental results based on maximum used classifier on cloud classification

Fig. 3 Frequency growth of NN method in cloud classification

4 Neural network approach

The neural network approach has been used to improve classification accuracy for geophysical data [91]. It can solve complicated problems by training the network on feature values. NN is therefore described as a possible alternative way to discriminate cloud using simple components that operate in parallel [128].

4.1 Basic concept

An NN consists of neurons, which are represented as nodes. Nodes hold real values that represent the activity of the neurons and are connected directly by weighted paths. The output of a neural network layer is thus defined as the weighted sum of the input nodes passed through the activation function. The architecture of the neural network is shown in Fig. 4.

Fig. 4 Architecture of artificial neural networks to extract cloud contaminated pixels

Generally, an NN consists of three or more layers, i.e. an input layer, an output layer and one or more hidden layers between them. Each node of the network consists of an adder and a non-linear function. For the l-th layer, the input (x_l) and the output (a_l) are related by (1)

$$ \begin{array}{@{}rcl@{}} \begin{bmatrix} n_{1l} & n_{2l} & n_{3l} & {\dots} & n_{sl} \end{bmatrix}^{T} &=& \begin{bmatrix} w_{11l} & w_{12l} & w_{13l} & {\dots} & w_{1Rl} \\ w_{21l} & w_{22l} & w_{23l} & {\dots} & w_{2Rl} \\ {\vdots} & {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ w_{s1l} & w_{s2l} & w_{s3l} & {\dots} & w_{sRl} \end{bmatrix} \begin{bmatrix} x_{1l} \\ x_{2l} \\ x_{3l} \\ {\vdots} \\ x_{Rl} \end{bmatrix} + \begin{bmatrix} b_{1l} \\ b_{2l} \\ b_{3l} \\ {\vdots} \\ b_{sl} \end{bmatrix} \\ a_{l} &=& f_{l}(n_{l}) \quad \text{with} \quad n_{l}=W_{l}x_{l}+b_{l} \end{array} $$
(1)

where f_l denotes the activation function of the l-th layer, which introduces non-linearity and allows the network to solve nontrivial problems.
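The layer computation in (1) can be sketched in a few lines. The example below assumes a sigmoid activation, which the text does not prescribe, and takes the input of each layer to be the output of the previous one; the function names are hypothetical.

```python
import numpy as np

def sigmoid(n):
    """Logistic activation, one possible choice for f_l."""
    return 1.0 / (1.0 + np.exp(-n))

def layer_forward(W, x, b, f=sigmoid):
    """Return a_l = f_l(W_l x_l + b_l) for a layer with s neurons and R inputs.

    W has shape (s, R), x has shape (R,), b has shape (s,).
    """
    n = W @ x + b          # weighted sum plus bias, as in (1)
    return f(n)

def network_forward(x, layers):
    """Feed x through a list of (W, b, f) tuples, layer by layer."""
    a = np.asarray(x, dtype=float)
    for W, b, f in layers:
        a = layer_forward(W, a, b, f)
    return a

# Usage: two inputs, a hidden layer of three neurons, one output neuron.
hidden = (np.zeros((3, 2)), np.zeros(3), sigmoid)
output = (np.ones((1, 3)), np.array([-1.5]), sigmoid)
y = network_forward([0.2, 0.7], [hidden, output])
```

With all hidden weights set to zero, every hidden neuron outputs 0.5 regardless of the input, which makes the example easy to verify by hand.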

4.2 Advancement in cloud classification algorithms

Cloud classification in remote sensing has recently seen numerous works based on NN. For this review, we found approximately 55 relevant articles published in the last 20 years (Fig. 2). This section summarizes NN advancements in papers that either compare NN performance with other algorithms or discuss specific applications.

The probabilistic neural network (PNN) is a supervised classifier that classifies cloudy regions by estimating the probability function of the input information. Bankert [8] focused on a two-layer PNN with spectral, textural and physical input features and a hold-one-out training and testing method to differentiate the classes cirrus, cirrocumulus, cirrostratus, altostratus, nimbostratus, stratocumulus, stratus, cumulus, cumulonimbus and clear [149]. Lee et al. [91] focused on a feed-forward NN classifier trained by back propagation (BP) with cloud textural input features. This algorithm discriminated cirrus, stratocumulus and cumulus clouds, which are radiatively important cloud types. Miller et al. [113] proposed a different classification scheme, the CMAC algorithm, as an alternative to BP. The CMAC algorithm can improve the understanding of the interaction between cloud and climate using prior knowledge of contextual information. Xie et al. [168] focused on a multilevel cloud detection algorithm using deep learning, in which a CNN differentiates thick cloud, thin cloud and clear pixels after simple linear iterative clustering (SLIC). This approach is extended in [104], where a combination of SLIC and SEEDS divides the entire image into super-pixels, which are classified into thick cloud, thin cloud, building and other culture. Lee et al. [91] presented a similar approach with a three-layer NN to detect cirrus cloud and obtained 85% accuracy; however, cirrus cloud was misclassified as cumulus cloud. A four-layer NN with nine texture features in the input layer was therefore proposed, which detected cirrus cloud with an accuracy of 96%, stratocumulus cloud with 92% and cumulus cloud with 90%. In a different approach, Walder and MacLaren [156] studied ANN to design an automatic cloud classification system using textural and spectral features extracted from AVHRR data.
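To make the PNN and the hold-one-out procedure mentioned at the start of this paragraph concrete, the following minimal sketch scores each class by the average Gaussian kernel response of its training patterns and evaluates the classifier by leaving out one sample at a time. It is a generic PNN, not the configuration of [8]; the kernel width `sigma`, the function names and the toy feature vectors are assumptions.

```python
import numpy as np

def pnn_scores(x, train_X, train_y, classes, sigma=1.0):
    """Average Gaussian kernel response of each class's training patterns to x."""
    d2 = np.sum((train_X - x) ** 2, axis=1)        # squared distances to x
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))      # pattern-layer outputs
    return np.array([kernel[train_y == c].mean() for c in classes])

def pnn_predict(x, train_X, train_y, sigma=1.0):
    """Assign x to the class with the largest estimated density."""
    classes = np.unique(train_y)
    return classes[np.argmax(pnn_scores(x, train_X, train_y, classes, sigma))]

def hold_one_out_accuracy(X, y, sigma=1.0):
    """Classify each sample using all remaining samples as training data."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        hits += int(pnn_predict(X[i], X[keep], y[keep], sigma) == y[i])
    return hits / len(X)

# Toy two-class example with two features per sample (values are made up).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
accuracy = hold_one_out_accuracy(X, y)
```

A PNN has no iterative training phase; the only tunable quantity in this sketch is the kernel width, which is usually chosen by a validation procedure such as the hold-one-out loop shown here.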

Lewis et al. [92] integrated the Hopfield NN method to track cloud in the visible and infrared channels of Meteosat 5 using shape and spectral parameters. The shape parameters include the area, perimeter, area-to-perimeter ratio, length, width and length-to-width ratio of the cloudy region, while the visible spectral parameters include the average and maximum intensities of cloud-contaminated pixels in the satellite images.

The self-organizing map (SOM) is an unsupervised NN method that reduces the dimensionality of data by mapping it onto a two-dimensional space. Murao et al. [118] evaluated a classification scheme combining SOM, used to extract texture features, with a three-layer feed-forward NN trained by back propagation. This method, named hybrid NN, was applied to GMS infrared temporal data to estimate rainfall rate. Peak and Tag [129] obtained training data based on geographical information and processed it with Kohonen’s self-organizing feature map (SOFM). Using the Kohonen SOFM method, nine patterns were differentiated, corresponding to dark forest, bare land, road, grassy place, water region, farm, cloud, cloud shadow and densely inhabited district. The neural network classifier was then compared with the Bayesian method; the average accuracy was almost the same for both methods, but the NN gave better overall accuracy using nine neurons in the hidden layer.

Feijt et al. [38] designed a neural network with two hidden layers, named the mapping neural network (MNN), to retrieve the cloud parameters mean optical depth, effective radius, relative cloud inhomogeneity and fractional cloud cover from multispectral reflectance data (e.g. MODIS) using NIR channels. The same authors further used MNN [36, 37] to measure the radiant flux (i.e. reflectance, transmittance and absorption) of inhomogeneous cloud using the Monte Carlo method. The method was evaluated using horizontal changes in optical depth, but not in the radius of cloud particles, at two solar zenith angles (0° and 60°). The results were better for reflectance than for transmittance and absorption. The shadowing effect of neighboring pixels appears in the transmittance at 60°, while the transmittance at 0° reflects greater photon penetration into optically thin cloudy pixels.

Jang et al. [70] observed difficulty in cloud detection with the SPOT database because of the absence of thermal bands. An ANN with a single sigmoid layer was therefore used, with a varying number of neurons, trained by the Levenberg-Marquardt back-propagation (LM-BP) method. Good performance was obtained for both thick and thin clouds. Bankert and Wade [10] improved the neural network cloud classifier by reducing the training database using the fast condensed nearest neighbor (FCNN) rule discussed by Angiulli [3].

4.3 Classification learning rules for cloud detection

At the heart of the weight update method is an expression for the partial derivative of the cost function with respect to each weight of the network. This expression gives the change in the cost function value corresponding to a change in the weight and bias values. The method is termed back propagation (BP). BP is thus not only a fast learning method for the network but also gives detailed insight into how changes in the weights and biases alter the behavior of the network. The discussion in this section is only an introduction to the various equations used by researchers, which are listed in Table 2.

Table 2 Mathematical model of learning rules in ANN model
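The partial derivatives on which back propagation relies can be illustrated for the simplest case. The sketch below assumes a single sigmoid layer, a squared-error cost and plain gradient descent; these choices, the learning rate and the function names are assumptions for illustration and do not reproduce any specific learning rule from Table 2.

```python
import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def cost(W, b, x, t):
    """Squared-error cost C = 0.5 * ||a - t||^2 with a = sigmoid(W x + b)."""
    a = sigmoid(W @ x + b)
    return 0.5 * np.sum((a - t) ** 2)

def gradients(W, b, x, t):
    """Partial derivatives of C with respect to the weights and biases."""
    a = sigmoid(W @ x + b)
    delta = (a - t) * a * (1.0 - a)    # dC/dn via the chain rule
    return np.outer(delta, x), delta   # dC/dW, dC/db

def gradient_step(W, b, x, t, lr=0.5):
    """One gradient-descent update of the weights and biases."""
    dW, db = gradients(W, b, x, t)
    return W - lr * dW, b - lr * db

# Usage: one update on a single training pair lowers the cost.
W0, b0 = np.zeros((1, 2)), np.zeros(1)
x, t = np.array([1.0, 2.0]), np.array([1.0])
W1, b1 = gradient_step(W0, b0, x, t)
before, after = cost(W0, b0, x, t), cost(W1, b1, x, t)
```

In a multi-layer network the same `delta` term is propagated backwards through the weight matrices of the later layers, which is what gives the method its name.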

4.4 Performance comparison of learning rules in NN model

Neural networks have attracted many researchers over the last couple of decades to tackle cloud-related problems. This section therefore provides a detailed discussion of binary cloud classification using the NN approach with different texture features (or parameters). The core idea of the method is to compute the GLCM for each pixel of the multispectral databases. Texture features are then calculated from the computed GLCM and fed as input to the NN.

The NN approach includes single-layer perceptron (SLP), two-layer perceptron (TLP) and multi-layer perceptron (MLP) schemes. The MLP is computationally complex; it is typically used for deep learning applications and for solving complex classification problems. For binary classification, however, it needs a long training time and yields a classification accuracy similar to that of the TLP network. Hence, the performance and execution time of SLP and TLP networks with different numbers of neurons are compared, and the MLP is excluded, in order to investigate the effectiveness of the NN learning rules. The number of neurons in the hidden layers is selected based on past work, as listed in Table 3.
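As a rough illustration of the network sizes being compared, the following sketch builds a perceptron with two hidden layers and evaluates one forward pass. The sigmoid activation, the random initial weights, the two-feature input and the helper names `init_tlp` and `forward` are assumptions made for illustration; the hidden-layer sizes are a free parameter here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_tlp(n_inputs, hidden=(6, 6), seed=0):
    """Random weights and zero biases for input -> hidden layers -> one output."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden, 1]
    return [(rng.normal(0.0, 0.5, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Propagate one feature vector through every layer; returns a score in (0, 1)."""
    a = np.asarray(x, dtype=float)
    for W, b in params:
        a = sigmoid(W @ a + b)
    return a[0]

# Hypothetical input: two texture features of one pixel.
params = init_tlp(n_inputs=2)
p_cloud = forward(params, [0.8, 0.3])
```

An untrained network like this one returns an arbitrary score; the learning rule is what adjusts `params` so that the score separates cloudy from clear pixels.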

Table 3 Selection of number of hidden layer neurons in NN architecture

Various feature selection methods have been discussed for selecting a subset of features from the textural and spectral feature bank. In the evaluation here, principal component analysis (PCA) is used to reduce the dimensionality of the input and to identify the most important features. The selected features aim at more accurate daytime cloud detection and at adaptability to different satellites. The following are the important textural features identified by the PCA scheme to design the input of the network:

$$ \begin{array}{@{}rcl@{}} Correlation={\sum}^{N_{g}-1}_{i=0}{\sum}^{N_{g}-1}_{j=0}\frac{(i-\mu_{i})(j-\mu_{j})p(i,j)}{\sigma_{i}\sigma_{j}} \end{array} $$
(2)
$$ \begin{array}{@{}rcl@{}} Contrast={\sum}^{N_{g}-1}_{i=0}{\sum}^{N_{g}-1}_{j=0}(i-j)^{2}p(i,j) \end{array} $$
(3)

Equation (2) gives the mathematical formulation of correlation in terms of the joint probability of occurrence p(.,.), which returns the value of the specified pair (i, j) of the GLCM. The mean and standard deviation of the ith row of the matrix are denoted by μi and σi, respectively. Similarly, μj and σj are the mean and standard deviation of the corresponding jth column. Contrast in (3) measures the local variations in luminance that distinguish an object. To validate the performance of the various learning rules (provided in Table 2), the investigation is carried out on the Landsat8 OLI/TIRS database. Table 4 lists the real-life multi-spectral databases with detailed information on their training and testing samples. All images of these databases are converted to 256 gray-level intensity values. The classification learning algorithms are simulated in MATLAB R2015a on an Intel(R) Core(TM) i5 processor (2.30 GHz CPU) with 4 GB of RAM running Windows 7 (64-bit).
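A minimal sketch of how (2) and (3) can be evaluated is given below. It assumes a non-symmetric GLCM built from horizontally adjacent pixel pairs, and it computes the means and standard deviations from the row and column marginals of p(i, j), which is the common textbook convention; the function names, the offset and the 4-level toy patch are illustrative assumptions rather than the exact configuration used in the experiments.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix p(i, j) for pixel pairs offset by (dy, dx).

    Assumes non-negative offsets and integer gray levels in [0, levels).
    """
    img = np.asarray(image)
    counts = np.zeros((levels, levels), dtype=float)
    rows, cols = img.shape
    for r in range(rows - dy):
        for c in range(cols - dx):
            counts[img[r, c], img[r + dy, c + dx]] += 1.0
    return counts / counts.sum()

def contrast(p):
    """Eq. (3): sum over (i, j) of (i - j)^2 p(i, j)."""
    i, j = np.indices(p.shape)
    return float(np.sum((i - j) ** 2 * p))

def correlation(p):
    """Eq. (2), with means and standard deviations taken from the marginals of p."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sigma_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sigma_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    return float(np.sum((i - mu_i) * (j - mu_j) * p) / (sigma_i * sigma_j))

# Toy 4-level patch (hypothetical data); real images here use 256 gray levels.
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
p = glcm(patch, levels=4)
features = (correlation(p), contrast(p))
```

For a per-pixel feature, the same computation would be applied to a small window centered on each pixel, and the resulting values form the input vector of the network.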

Table 4 Multi-spectral data analysis with training and testing approaches in NN model

To check the validity of the different learning methods over the different databases, shown in Fig. 5 and described in Table 4, binary cloud classification is considered for comparison because of the differences in output classes, hypotheses and considered parameters. Table 5 summarizes the effect of different training sample sizes on the classification accuracy, performance and execution time of the classifier. An empirical comparison on the Landsat8 OLI/TIRS data with different percentages of training samples and different numbers of neurons is presented in Table 3. We have intentionally selected the CNN learning rule with the Levenberg-Marquardt weight updating technique because it is the most widely used technique in practical applications.

Fig. 5 Multi-spectral databases used to analyze the cloud detection performance with existing methods: (a) Landsat8 OLI/TIRS (b) AVHRR (c) GOES (d) NOAA

Table 5 Arranged Levenberg-Marquardt training sets with inserted neurons in NN model

To validate the performance of an NN as a classifier or predictor, cross-entropy is found to be a better measure than other classification errors (e.g. mean square error (MSE)). Cross-entropy measures the network performance using each pair of target (T) and output (y) elements. Minimizing the cross-entropy leads to a good classifier. Equation (4) gives the cross-entropy for each T-y pair. K = 1 is a special case in which the classifier behaves as a binary classifier whose target element takes the value 0 or 1. The results show that the NN approach with the 10% training condition in case-e achieves approximately 69% accuracy with less execution time, which compares favorably with all other cases. Meanwhile, the extracted features have a positive influence on the performance (here, cross-entropy) in the case of the TLP approach.

$$ \begin{array}{@{}rcl@{}} Cross \ entropy & =& -T\log(y); \quad K=1 \\ & =& -T\log(y) - (1-T)\log(1-y); \quad K>1 \end{array} $$
(4)
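The two expressions in Eq. (4) can be evaluated per target-output pair as in the following sketch. The function names, the clipping constant `eps` (added only to avoid taking the logarithm of zero) and the sample values are illustrative assumptions.

```python
import numpy as np

def ce_single_term(T, y, eps=1e-12):
    """First expression in Eq. (4): -T log(y)."""
    y = np.clip(y, eps, 1.0 - eps)
    return -np.asarray(T, dtype=float) * np.log(y)

def ce_two_term(T, y, eps=1e-12):
    """Second expression in Eq. (4): -T log(y) - (1 - T) log(1 - y)."""
    y = np.clip(y, eps, 1.0 - eps)
    T = np.asarray(T, dtype=float)
    return -T * np.log(y) - (1.0 - T) * np.log(1.0 - y)

# Hypothetical targets (1 = cloud, 0 = clear) and network outputs.
targets = np.array([1.0, 0.0, 1.0])
outputs = np.array([0.9, 0.2, 0.6])
per_pair = ce_two_term(targets, outputs)
mean_ce = float(per_pair.mean())
```

The single-term form penalizes only the output associated with the true class, whereas the two-term form also penalizes confident predictions for a target of 0; both decrease as the output approaches the target.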

Two sets of comparison experiments with the selected TLP scheme are reported: one using supervised NN rules and the other using an unsupervised NN rule. Table 6 presents the results of comparing the different supervised learning rules (discussed in Table 2) using a TLP NN with (6,6) neurons in the hidden layers and 10% training samples. The cloud detection accuracy shows only minor degradation across the rules, while the minimum error rate is obtained with the Levenberg-Marquardt learning rule.

Table 6 Comparison of supervised learning rules in NN model

In addition, it is observed that the combination of the TLP NN with the Levenberg-Marquardt learning rule gives the maximum true positive rate (TPR) and true negative rate (TNR) together with the minimum false negative rate (FNR) and false positive rate (FPR). Hence, TLP with the Levenberg-Marquardt learning rule is selected for comparison with the unsupervised SOFM learning rule (refer to Table 2) on all four databases to check the comparative performance of each algorithm for binary cloud classification. The results in Table 7 favor SOFM in terms of accuracy and TLP in terms of elapsed time, which reflects the time complexity of the training and testing computations of the algorithm.
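For reference, the four rates can be derived from the counts of a binary confusion matrix as sketched below. The labels are made-up values for illustration (1 = cloud, 0 = clear) and `binary_rates` is a hypothetical helper, not part of the reported experiments. Because TPR + FNR = 1 and TNR + FPR = 1, a higher TPR always comes with a lower FNR, and a higher TNR with a lower FPR.

```python
import numpy as np

def binary_rates(y_true, y_pred):
    """TPR, FNR, TNR, FPR and accuracy from binary labels (1 = cloud, 0 = clear)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))    # cloud predicted as cloud
    fn = int(np.sum(y_true & ~y_pred))   # cloud predicted as clear
    tn = int(np.sum(~y_true & ~y_pred))  # clear predicted as clear
    fp = int(np.sum(~y_true & y_pred))   # clear predicted as cloud
    return {
        "TPR": tp / (tp + fn),
        "FNR": fn / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (tn + fp),
        "accuracy": (tp + tn) / y_true.size,
    }

# Made-up ground truth and predictions for ten pixels.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
rates = binary_rates(truth, pred)
```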

Table 7 Comparison of supervised and unsupervised learning rules in NN model

In a different approach, a double-branch PCA network (PCANet) architecture is combined with a support vector machine (SVM) classifier (linear function of NN): bright and thick clouds are first separated by a threshold method, and this information is then combined with the SVM classifier to extract high-level cloud information from multi-spectral satellite images [23, 193]. Further, the neural network approach has been used to identify the class membership of cloud, cloud shadow, water, snow/ice and clear pixels in a Landsat image [64]. Following a similar approach, cloud detection algorithms for optical satellite imagery have moved towards deep learning models. A few works have studied models based on the U-net architecture that discriminate cloud from snow regions in the visible range of the electromagnetic spectrum with minimal multi-spectral training samples [73, 136].

A deep learning model for cloud detection in optical satellite imagery, named the remote sensing network (RS-Net), is based on the U-net architecture. The model is trained and validated on Landsat 8 and SPARCS images. The results indicate better discrimination on hard-to-distinguish surfaces, such as cloud over snowy and icy regions. In particular, the deep learning model with visible bands shows promising cloud detection results on multi-spectral satellite images [73, 141, 152]. Further, Xie et al. [169] designed a two-layer deep convolutional neural network (CNN) to classify regions as thick cloud, thin cloud or non-cloudy, but the network has difficulty distinguishing clouds in the presence of snow.

Further, Cerdena et al. [17] used an ANN algorithm with a back-propagation MLP network to retrieve cloud parameters such as cloud temperature, optical depth and effective droplet radius. However, ANNs have some important disadvantages, such as the dependence of the network on its architecture, characteristic parameters and training algorithm, because of which the solution can become stuck in a local minimum of the error surface. Genetic algorithms (GA) have therefore been used to overcome this problem and reach the global optimum. This points to cloud detection implementations using artificial evolutionary algorithms (EA), a promising class of machine learning algorithms. EAs are particularly appealing for cloud detection because of their ability to reach the optimal solution with little computation time and limited training samples. Accordingly, Kaminsky et al. [77] suggested an algorithm named the optimized cloud detection index (OCDI) using GA. The analysis first applies PCA to the MODIS database to obtain the optimal channels, and GA is then applied to optimize the OCDI parameters. The probability of reaching the global solution on the error surface thus increases, overcoming the limitation of the ANN approach. Earlier, Lisens et al. [103] described an algorithm that uses GA to optimize the NN parameters in order to design a cloud mask. It has therefore been observed that combining NN and GA decreases the computation time and increases the capability of the NN. Cloud detection has been further extended towards a multi-dimensional objective space. Recently, Gupta et al. developed a multi-objective social spider optimization technique to obtain label data for a single band of multi-spectral Landsat 8 satellite images. This label data is then used to train the NN to obtain label data for the remaining bands in the visible range. By combining the decisions of the visible-range bands, the cloud-contaminated region is extracted against various surfaces of the earth [51].

5 Deep neural networks for cloud detection

The rapid growth of machine learning technology for solving complex problems has drawn attention to deep learning methods in the past few years. Conventional machine learning approaches require a domain expert to extract appropriate features for a targeted problem. Deep learning approaches overcome this problem by extracting significant features [122, 123] from the raw data [88]. A deep learning model comprises a stack of processing layers that learn distinguishing features of the data at multiple levels of abstraction [87]. Deep learning algorithms have therefore emerged in various applications such as face recognition [86], speech recognition [57], image segmentation [147], signal processing [181], bio-activity prediction [157] and so on.

Deep learning algorithms are modeled with different types of architectures, such as convolutional neural networks (CNN), deep belief networks and recurrent neural networks. The CNN, often called ConvNet, has gained wide attention because of its remarkable ability in classification applications based on contextual information [107].

5.1 Convolutional neural network (CNN)

The CNN structure is shown in Fig. 6. The functionality of its four processing components (convolutional layer, pooling layer, fully connected layer and activation function) is discussed below.

Fig. 6 Architecture of convolutional neural network for cloud detection

Convolution layer

The convolution layer connects each neuron in the next layer to a local group of neurons in the previous layer, which is technically termed the receptive field [121]. The receptive field of a neuron extracts local features associated with a particular location of the input image [88] and forms a weight vector [86]. The receptive fields share the same weights towards the next layer, so the same feature can be detected at different locations of an image, as illustrated in Fig. 7.

Fig. 7 Neuron receptive field towards next layer in NN architecture

Here, the weight vector acts as a kernel or filter that slides over the target image in order to map its significant features. This process is termed the convolution operation; a number of filters are applied together to extract their corresponding features from the input image [125], and the weight sharing reduces the number of training parameters [183]. The output in the next layer at a particular location is then computed as discussed in (1).
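The sliding-kernel operation can be sketched as follows. This is a minimal single-channel example without padding, stride, bias or activation; as in most CNN implementations the kernel is applied without flipping (cross-correlation). The function name `conv2d_valid`, the toy image and the edge kernel are illustrative assumptions.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image and sum the element-wise products."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # The receptive field at (r, c) is the kh-by-kw window under the kernel.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy image with a vertical dark-to-bright edge, and a horizontal-difference kernel.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
edge_kernel = np.array([[-1.0, 1.0]])
feature_map = conv2d_valid(image, edge_kernel)
```

Because the same two kernel weights are reused at every location, the edge is detected in every row while only two parameters would need to be learned.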

Pooling layer

The pooling layer summarizes the extracted features over patches. The main issue with the output feature map of the convolution layer is that it records the precise location of each detected feature, so small shifts in the input change the feature map [88]. One way to handle this sensitivity is to down-sample the feature map so that the result is robust to changes in the location of features in the image, a property usually known as local translation invariance [183]. The pooling layer is therefore incorporated to down-sample the feature maps. Average pooling and max pooling are the most common approaches used to significantly reduce the map size [89].
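A minimal sketch of max and average pooling over non-overlapping windows is given below. The 2 × 2 window, the function name `pool2d` and the toy feature maps are assumptions made for illustration.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Down-sample a 2-D feature map over non-overlapping size-by-size windows."""
    fm = np.asarray(feature_map, dtype=float)
    h, w = fm.shape[0] // size, fm.shape[1] // size
    # Split into (h, size, w, size) blocks; rows or columns that do not fill a window are dropped.
    blocks = fm[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

fm = np.array([[1, 2, 0, 0],
               [3, 4, 0, 8],
               [5, 0, 1, 1],
               [0, 0, 1, 1]], dtype=float)
pooled_max = pool2d(fm, mode="max")
pooled_avg = pool2d(fm, mode="average")

# A feature shifted by one pixel inside the same window gives the same pooled output.
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[1, 1] = 1.0
same_after_pooling = np.array_equal(pool2d(a), pool2d(b))
```

The 4 × 4 map is reduced to 2 × 2 in both modes, which is the map-size reduction referred to above.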

Fully connected layer

The fully connected layer is equivalent to the layer concept used to describe the artificial neural network (ANN). The output of the convolution and subsequent pooling layers is forwarded to the fully connected layer. The output of the fully connected layer is computed as formulated in Eq. 1. The limitation of this layer is its high computational time complexity when training on big data [183].

Activation function

The activation function is introduced to incorporate non-linearity into the deep learning model. The rectified linear unit (ReLU) is the most frequently used activation function; it returns 0 for a negative input and returns a positive input unchanged. This simple approach enables the model to operate in a non-linear environment [82].
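The rectified linear unit described above amounts to an element-wise maximum with zero, as in this short sketch (the sample inputs are arbitrary).

```python
import numpy as np

def relu(x):
    """Return 0 for negative inputs and the input itself otherwise."""
    return np.maximum(0.0, np.asarray(x, dtype=float))

activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])

# Non-linearity: the response to a sum differs from the sum of the responses.
lhs = float(relu(-1.0 + 2.0))
rhs = float(relu(-1.0) + relu(2.0))
```

Here `lhs` and `rhs` differ, which is the property that lets stacked layers represent more than a single linear mapping.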

5.2 Deep learning models

Cloud detection is a challenging pre-processing task in any application of remote sensing images with a limited number of spectral bands. Meanwhile, deep learning has achieved an immense breakthrough in the field of computer vision as an artificial intelligence technique [30]. The convolutional neural network (CNN), as a deep learning algorithm, has raised the performance of applications such as image segmentation [75, 82, 184], object detection, semantic segmentation and so on. The works in [106, 172] proposed an intensive predictor based on a fully convolutional neural network (FCNN), which does not fully connect the entire layers. Moreover, this structure produces a segmented image of the required size and improves the processing speed over the conventional CNN method. The FCNN approach has further been used to discriminate cloud from snow in satellite images, which is difficult because of their similar low-level features such as color distribution and textural pattern [106]. Conventional methods based on low-level features [188] overlook the local textural pattern [185] and high-level semantic information, which are important elements for discriminating cloud from snow regions [13, 158]. Moreover, conventional local-feature methods have difficulty modeling semantic information, such as a cloud being accompanied by its shadow or snow extending along hills. The FCNN deep learning algorithm, however, has been used to learn such abstract patterns from its underlying layers [186]. Thereafter, a bidirectional independent recurrent neural network was introduced to extract deep features of images and perform semantic segmentation; however, the approach needs to be modified for multi-modal scenarios [179].

Based on the principle of FCNN, a modified structure, namely the multi-scale features convolutional neural network (MF-CNN), obtains multi-scale global features to discriminate thick and thin clouds in Landsat 8 satellite images. MF-CNN integrates spatial information at the low levels of its structure and semantic information at the high levels. This increases the accuracy of detecting individual clouds in the combined image [81, 117, 143]. Normally, thin cloud must be treated differently from thick cloud, for example in cloud removal and target detection applications, for which multi-level cloud detection needs to be addressed [19, 167]. Therefore, a deep CNN with two branches has been designed to obtain multi-scale features and to predict one of three classes: thick cloud, thin cloud and clear pixels [33, 169]. The work in [182] proposed a dual-branch CNN, named the multi-scale fusion gated network (MFGNet), to fuse features at various depths and scales for cloud detection in the Gaofen-5 (GF-5) database. The work in [155] used a deep neural network with feature fusion to extract cloud information from temporal data of the Meteosat Second Generation (MSG) satellite. Following a similar approach, [18] used SegNet with 13 convolutional layers and 13 deconvolution layers to distinguish thick cloud, thin cloud, cloud shadow and clear pixels in Landsat satellite images with multi-level spatial and spectral features. Further, the multilevel feature fused segmentation network (MFFSNet) [171] was proposed to discriminate cloud and cloud shadow using a pyramid pooling module. Similarly, [97, 98, 173] developed multi-scale convolutional feature fusion (MSCFF) to map both local and global features of cloud and cloud shadow by incorporating an encoder-decoder module. An identical encoder-decoder structure is used with a CNN architecture in [47], named the Cloud-AttU model, to fuse multi-scale features for cloud detection. Similarly, an encoder-decoder architecture named CDnetV2 was proposed in [48]; it incorporates an adaptive feature fusing model (AFFM) on the encoder side, whereas the decoder contains high-level semantic information guidance flows (HSIGFs) to obtain cloud location information.

Further, the Cloud-Net algorithm was designed by training a fully convolutional neural network with local and global features of various blocks of the Landsat 8 database [74, 109, 116, 176]. Following a similar approach, [146] modified the CNN model so that the input image is divided into super-pixels [161] as sub-regions, and features are then extracted from each sub-region with four convolutional layers and two fully connected layers to obtain deep features of thick and thin cloudy pixels. Likewise, [94] proposed a weakly supervised deep learning method for cloud detection (WDCD) on each image block by incorporating a modified global convolutional pooling (GCP) operation. Most of the algorithms are designed to handle multi-spectral satellite images. However, a few deep learning algorithms extend the methodology to hyper-spectral satellite images. The triple-attention guided residual dense and BiLSTM network (TARDB-Net) was introduced to select essential spatial and spectral features for classification in hyper-spectral satellite images [16, 49, 50]. Further, the CNN-based CloudScout algorithm has been deployed on a nano-satellite payload, where it detects cloudy images directly on board the satellite and sends only the cloud-free images to the ground station, enabling a low-power embedded application [44].

The major advantage of the deep CNN approach compared with conventional machine learning algorithms is that it can automatically detect significant features in high-dimensional data. This study has discussed the basic model of the deep convolutional neural network and its extensive ability to act as a classifier for detecting clouds in satellite images. Moreover, the CNN approach is widely used in various applications such as high-resolution data, medical images, speech recognition and so on. The present study is therefore expected to add value by extending the knowledge of deep learning and to provide a broad understanding for those who would like to venture further into these applications.

6 Conclusion

The effects of cloud on weather prediction make its detection an important pre-processing task in remote sensing. This article discusses the important contribution of spectral and textural parameters to cloud detection, cloud shadow detection, cloud removal and the classification of different cloud types from multi-spectral satellite images. The major issue is to optimize the enhanced observational capacity of different radiometers using retrieved cloud parameters. To investigate this problem, the parameters associated with cloud are sorted into spectral, texture, albedo and other features. From the algorithmic perspective, feature selection and weight optimization in the neural network (NN) architecture are discussed in detail. Among NN-based techniques, the supervised Levenberg-Marquardt learning rule is found to be better in terms of elapsed time, while SOFM, an unsupervised learning algorithm, is found to be better in terms of accuracy. Simultaneous night-time and day-time cloud detection remains an important and challenging issue for NN-based techniques. Genetic algorithms are observed to address the limitations of these techniques. Hence, the combination of artificial neural networks with evolutionary algorithms was chosen to increase the generalization ability with low computation time and better accuracy. Further, the discussion of cloud detection algorithms has been extended to deep neural network approaches, which have emerged as a prominent approach to classification with contextual features. It is expected that this study will provide a broad understanding for various other classification-based applications in this field.