1 Introduction

Remote sensing images play a vital role in natural disaster detection, agricultural resource management, environmental monitoring, urbanization surveys and other research fields. However, a factor that cannot be ignored in optical satellite images is cloud cover. Clouds interfere with remote sensing data by reflecting and absorbing electromagnetic radiation, which makes data interpretation difficult. Consequently, accurately identifying the cloud coverage of images is a crucial task in the remote sensing field for subsequent applications [1].

Cloud detection methods can be roughly grouped into classical methods and pattern recognition methods [2]. Threshold-based methods are the earliest classical methods. They mainly analyze individual pixels, as in the automatic cloud cover assessment [3] and Fmask [4], and segment cloud from images using multiple fixed thresholds. Notably, Sun et al. [5] proposed a general dynamic-threshold cloud detection algorithm to address the difficulty of selecting fixed thresholds. Since threshold-based methods are easily restricted by the spectrum, Bayesian methods [6] and texture-based methods [7], which exploit the spectral and geometric properties of cloud, were proposed to leverage more features. Moreover, some methods based on statistical characteristics [8] were proposed for thin cloud detection. These approaches mainly exploit the physical properties of clouds, so results can be obtained quickly, but the high-level characteristics of the images are ignored, which causes detection difficulties in complex surface environments and with ever-changing clouds.

With the development of computer hardware, pattern recognition technology has attracted much attention, and many advanced machine learning methods for cloud identification have been proposed. Among them, early clustering [9], fuzzy clustering [10, 11] and SVM [12,13,14] methods have formed a mature system; however, their detection accuracy is limited by poor performance on large-scale training sets. In recent years, artificial neural networks have emerged as a promising approach for cloud detection owing to their ability to learn complex patterns and feature representations from large amounts of labeled training data. For example, U-net [15, 16] uses a fully symmetrical network structure and skip connections to improve cloud detection accuracy with fewer training samples. MS-UNet [17] combines convolutions of different sizes to extract multi-scale features and thereby identify clouds of different sizes and shapes. Cloud-Net [18], proposed by Mohajerani et al., adds a residual structure to U-Net and achieves superior results on Landsat 8 images. More advanced networks have since been proposed: UNet 3+ [19] uses full-scale skip connections to preserve spatial information and fuse features across layers; Li et al. proposed the global context-dense block U-Net (GCDB-UNet) [20] to enhance thin cloud detection; and Lu et al. designed a mutual guidance module (MGM) [21] to address rough segmentation boundaries. Although these methods can detect most clouds in remote sensing images, their thin cloud recognition and boundary identification capabilities still need to be strengthened, especially for medium- and high-resolution images such as Landsat 8.

To better capture complex semantic features and precisely segment clouds in remote sensing images, a model with two cascaded U-shape attention networks (CUA-Net) is proposed. Its innovations are as follows: (1) it enhances the connections between network layers to preserve as much information as possible; (2) it uses an attention module to focus on relevant cloud features and ignore irrelevant ones, which improves the network's ability to identify clouds in complex scenes with varying cloud and background noise; (3) a second U-shape network is designed to correct inaccurate information from the previous steps. Through these structures, the features extracted by the convolution blocks can be used effectively to recover sophisticated cloud masks and achieve higher accuracy.

2 Algorithm

The architecture consists of two cascaded U-shape networks, as shown in Fig. 1. The first network performs a preliminary segmentation by identifying the possibly cloudy regions of the image. Its output \({X}_{Up}^{1}\) is then fed into the second network, which refines edges and details by further segmenting the cloudy regions and removing false detections. After that, the preliminary result \({X}_{Up}^{1}\) and the supplementary information \({X}_{De}^{1}\) are added and convolved once to obtain the final cloud detection result. The components of the proposed CUA-Net are introduced below.

Fig. 1. The proposed Cascaded U-shape Attention Networks (CUA-Net).
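To make the data flow concrete, the following minimal Keras sketch wires the two sub-networks together (Keras is the framework stated in Sect. 3.1). The callables `first_unet` and `second_unet` are placeholders for the networks of Sects. 2.1 and 2.2, and the exact placement of activations is simplified; this illustrates the cascade rather than reproducing the authors' code.

```python
from tensorflow.keras import layers, Model

def build_cua_net(first_unet, second_unet, input_shape=(384, 384, 4)):
    """Cascade wiring of CUA-Net (cf. Fig. 1); sub-networks are placeholders."""
    inputs = layers.Input(shape=input_shape)
    x_up1 = first_unet(inputs)            # preliminary cloud probabilities
    x_de1 = second_unet(x_up1)            # refined / corrected supplement
    fused = layers.Add()([x_up1, x_de1])  # add preliminary result + supplement
    # One final convolution produces the cloud mask, as described in Sect. 2.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(fused)
    return Model(inputs, outputs, name="cua_net")
```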

2.1 The First U-shape Network

The first U-shape network consists of a contraction path for feature extraction and an expansion path for image recovery. The two paths are linked by attention-based skip connections, which transfer deep features from the contraction path to the expansion path to preserve spatial information.

Down-sampling Layer in Contraction Path.

The down-sampling layer mainly uses the residual structure shown in Fig. 2. The upper branch contains two \(3\times 3\) convolutions to extract features from the input. The lower branch uses a small-scale skip connection, where the input first goes through a \(1\times 1\) convolution and is then connected with the original input. Finally, the results of the two branches are summed and passed to a max-pooling operation. This structure avoids the vanishing gradients caused by deep networks and makes the encoder converge faster. At the same time, it allows the network to learn the residual mapping between the input and output feature maps, which helps preserve the low-level features from earlier layers.

Fig. 2. Down-sampling layer in contraction path.
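A minimal Keras sketch of this down-sampling layer follows. The filter count, padding and ReLU activations are assumptions, and "connected with the original input" is read here as an additive shortcut:

```python
from tensorflow.keras import layers

def down_sampling_layer(x, filters):
    """Residual down-sampling block (cf. Fig. 2); details are assumptions."""
    # Upper branch: two 3x3 convolutions for feature extraction.
    upper = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    upper = layers.Conv2D(filters, 3, padding="same", activation="relu")(upper)
    # Lower branch: a 1x1 convolution forming a small-scale skip connection,
    # added back to the input when the channel counts allow it.
    lower = layers.Conv2D(filters, 1, padding="same")(x)
    if int(x.shape[-1]) == filters:
        lower = layers.Add()([lower, x])
    # Sum the two branches (residual mapping), then max-pool to down-sample.
    summed = layers.Add()([upper, lower])
    pooled = layers.MaxPooling2D(pool_size=2)(summed)
    return pooled, summed  # `summed` is X_Res^i, reused by skip connections
```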

Attention-Based Skip Connection.

In U-Net, skip connections are used to preserve the features learned in the contraction path and improve segmentation accuracy. However, only layers at the same depth are connected in the original U-Net architecture. To address this limitation, the modified skip connection shown in Fig. 3 is proposed: the features from all previous layers in the contraction path are concatenated and sent to the expansion path. To make the outputs of layers \({X}_{Res}^{1}, \cdots , {X}_{Res}^{i-1}, {X}_{Res}^{i}\) compatible, multiple self-connections are used to match the dimensions of \({X}_{Res}^{1}, \cdots , {X}_{Res}^{i-1}\) to that of \({X}_{Res}^{i}\), and the feature-map size is then unified by max pooling. After that, all \(i\) layers are added and fed into the subsequent attention module. This modified skip connection allows the network to capture more fine-grained details and improves cloud detection accuracy.

The convolutional block attention module (CBAM) [22] is a lightweight attention architecture composed of a channel attention module (CAM) and a spatial attention module (SAM). CAM focuses on category information: the input first goes through parallel MaxPool and AvgPool layers and then passes through a shared MLP to extract more comprehensive high-level features. SAM pays more attention to the spatial location of the target: it applies average pooling and max pooling along the channel axis, which effectively strengthens the spatial information.
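A compact Keras sketch of CBAM is given below. It follows the original formulation in [22]; the reduction ratio of 8 and the \(7\times 7\) spatial kernel come from that paper, not from this one, and should be treated as assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def cbam(x, reduction=8):
    """CBAM [22]: channel attention (CAM) followed by spatial attention (SAM)."""
    channels = int(x.shape[-1])
    # CAM: shared MLP over average- and max-pooled channel descriptors.
    mlp_1 = layers.Dense(channels // reduction, activation="relu")
    mlp_2 = layers.Dense(channels)
    avg = mlp_2(mlp_1(layers.GlobalAveragePooling2D()(x)))
    mx = mlp_2(mlp_1(layers.GlobalMaxPooling2D()(x)))
    ca = layers.Reshape((1, 1, channels))(
        layers.Activation("sigmoid")(layers.Add()([avg, mx])))
    x = layers.Multiply()([x, ca])
    # SAM: average and max pooling along the channel axis, then one conv.
    avg_sp = layers.Lambda(lambda t: tf.reduce_mean(t, axis=-1, keepdims=True))(x)
    max_sp = layers.Lambda(lambda t: tf.reduce_max(t, axis=-1, keepdims=True))(x)
    sa = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(
        layers.Concatenate()([avg_sp, max_sp]))
    return layers.Multiply()([x, sa])
```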

The attention-based skip connection preserves the features extracted from all layers in the contraction path and directs effective attention to the channel and spatial characteristics of the target. Moreover, the number of parameters in this structure is small, so it does not place an additional burden on the network.

Fig. 3. Attention-based skip connection.
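The following sketch shows one way to assemble this attention-based skip connection for depth \(i\), reusing the `cbam` helper above. Reading "self-connection" as repeated channel-wise self-concatenation is one interpretation of the text, and the sketch assumes the usual channel doubling between encoder depths:

```python
from tensorflow.keras import layers

def attention_skip(enc_feats, i):
    """Fuse X_Res^1..X_Res^i into AM^i (Fig. 3); dimension matching is
    an interpretation of the text, not a confirmed implementation."""
    target_ch = int(enc_feats[i].shape[-1])
    fused = []
    for j, f in enumerate(enc_feats[: i + 1]):
        ch = int(f.shape[-1])
        while ch < target_ch:            # self-concatenate to match channels
            f = layers.Concatenate()([f, f])
            ch *= 2
        if i - j > 0:                    # max-pool to unify the spatial size
            f = layers.MaxPooling2D(pool_size=2 ** (i - j))(f)
        fused.append(f)
    return cbam(layers.Add()(fused))     # attention module from the sketch above
```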

Up-sampling Layer in Expansion Path.

The up-sampling layer in the expansion path increases the resolution of the feature maps while reducing the number of channels, as shown in Fig. 4. The input \({X}_{Up}^{i+1}\) is first up-sampled by a deconvolution, then combined with \({AM}^{i}\) from the corresponding skip connection and with \({X}_{Up}^{i+2}, {X}_{Up}^{i+3}, \cdots , {X}_{Up}^{5}\) from the lower up-sampling layers. In this way, not only the feature maps from the contraction path but also those from the preceding layers of the expansion path are used. The combination goes through two convolutions to recover semantic details and is added to the deconvolved \({X}_{Up}^{i+1}\). Thanks to this full use of multi-scale information, more complex and detailed cloud properties can be recovered from the deep feature maps.

Fig. 4. Up-sampling layer in expansion path.
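A possible Keras realization of this up-sampling layer is sketched below; the ordering of `deeper_ups` (shallowest first) and the resize factors are assumptions made for illustration:

```python
from tensorflow.keras import layers

def up_sampling_layer(x_below, am_i, deeper_ups, filters):
    """Up-sampling layer (cf. Fig. 4): deconvolve X_Up^{i+1} (`x_below`) and
    fuse it with AM^i and the resized deeper outputs X_Up^{i+2}..X_Up^5."""
    up = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x_below)
    # X_Up^{i+2+k} sits k+2 levels below level i, hence the 2**(k+2) factor.
    resized = [layers.UpSampling2D(size=2 ** (k + 2))(f)
               for k, f in enumerate(deeper_ups)]
    merged = layers.Concatenate()([up, am_i] + resized)
    # Two convolutions recover semantic detail; the deconvolved input is
    # then added back, giving X_Up^i.
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(merged)
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(y)
    return layers.Add()([y, up])
```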

2.2 The Second U-shape Network

The second U-shape network is mainly used to refine the segmentation mask generated by the first network. Although most cloud information can be extracted by the first network, thin and fragmentary clouds are easily missed, and some highly reflective surfaces can be mistaken for cloud. The second U-shape network is therefore designed to revise these incorrect detections. It consists of an encoder-decoder structure with skip connections between them, similar to a four-layer U-Net. The difference is that the bridge layer in the middle uses dropout to prevent the model from overfitting. No extra structures are added to the second network because of its complementary role and the desire to keep network complexity low.
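A sketch of such a four-layer refinement U-Net with a dropout bridge is shown below; the base filter count and the dropout rate are assumptions:

```python
from tensorflow.keras import layers

def second_unet(x, base=32):
    """Four-layer refinement U-Net with a dropout bridge (Sect. 2.2);
    filter counts and dropout rate are assumptions."""
    skips, y = [], x
    for d in range(4):                   # encoder
        y = layers.Conv2D(base * 2 ** d, 3, padding="same", activation="relu")(y)
        skips.append(y)
        y = layers.MaxPooling2D(2)(y)
    y = layers.Conv2D(base * 16, 3, padding="same", activation="relu")(y)
    y = layers.Dropout(0.5)(y)           # bridge layer with dropout
    for d in reversed(range(4)):         # decoder with skip connections
        y = layers.Conv2DTranspose(base * 2 ** d, 2, strides=2, padding="same")(y)
        y = layers.Concatenate()([y, skips[d]])
        y = layers.Conv2D(base * 2 ** d, 3, padding="same", activation="relu")(y)
    return layers.Conv2D(1, 1, activation="sigmoid")(y)  # X_De^1
```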

2.3 Activation Function and Loss Function

ReLU is used as the activation function everywhere except in the last layers of the two U-shape networks and inside the attention module, which has its own fixed activations. ReLU is a piecewise linear function that outputs zero for negative inputs and passes positive inputs through linearly. By introducing non-linearity at low cost, it helps protect the network from vanishing gradients and overfitting. Sigmoid is used as the activation function after \({X}_{Up}^{1}\) and \({X}_{De}^{1}\) to map the outputs into the range 0 to 1, thus giving the probability that each pixel is cloudy.

Denoting the ground-truth value as \(t\), the predicted value as \(p\), and the total number of pixels as \(N\), the loss function is given in Eq. (1).

$$Loss\left(t, p\right)=1-\frac{\left(1+{\beta }^{2}\right)\times \sum_{i=1}^{N}t\left(i\right)p\left(i\right)+\epsilon }{{\beta }^{2}\times \sum_{i=1}^{N}t\left(i\right)+\sum_{i=1}^{N}p\left(i\right)+\epsilon }$$
(1)

where \(i\) denotes the \(i\)-th pixel in the image and \(\beta\) is a constant that controls the weight of recall relative to precision. In the experiments, \(\beta\) is set to 2 to give more weight to recall, which suits cloud detection datasets where the positive class is smaller than the negative class. \(\epsilon\) is set to \({10}^{-7}\) to avoid division by zero.
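Eq. (1) translates directly into a Keras loss function, as in the following sketch (`beta` and `eps` follow the values stated above):

```python
import tensorflow.keras.backend as K

def f_beta_loss(beta=2.0, eps=1e-7):
    """Soft F-beta loss of Eq. (1); beta = 2 emphasizes recall."""
    def loss(t, p):
        tp = K.sum(t * p)                               # soft true positives
        numerator = (1.0 + beta ** 2) * tp + eps
        denominator = beta ** 2 * K.sum(t) + K.sum(p) + eps
        return 1.0 - numerator / denominator
    return loss
```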

3 Data and Experiments

3.1 Data and Environment

The experimental dataset is the 38-Cloud dataset [18] created by Sorour Mohajerani, which includes 18 scenes for training and 20 scenes for testing; each scene is cut into 384 \(\times\) 384 patches. The dataset is derived from Landsat 8 images with a resolution of 30 m, and their red, green, blue and near-infrared bands are chosen for cloud detection.
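For illustration, the following sketch stacks the four band patches of one 38-Cloud sample into a single array; the directory and file naming follow the public dataset layout, and the 16-bit normalization is an assumption:

```python
import numpy as np
from skimage import io  # skimage is part of the paper's stated environment

def load_patch(patch_id, root="38-Cloud/train"):
    """Stack R, G, B, NIR patches into a (384, 384, 4) float array.
    File layout follows the public 38-Cloud release -- an assumption."""
    bands = []
    for b in ("red", "green", "blue", "nir"):
        path = f"{root}/train_{b}/{b}_{patch_id}.TIF"
        bands.append(io.imread(path).astype(np.float32) / 65535.0)  # 16-bit
    return np.stack(bands, axis=-1)
```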

The experiments were performed on a Linux system with Python 3.6, configured with GPU versions of TensorFlow 1.12.0, Keras 2.2.4 and scikit-image 0.15.0. A Quadro RTX 5000 graphics card was used for training and prediction. The Adam optimizer with an initial learning rate of \(1\times {10}^{-4}\) was used during training, and training was stopped once the learning rate had decayed to \(1\times {10}^{-8}\).
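The paper states only the initial and final learning rates, so the following training-setup sketch assumes a plateau-based decay schedule as the mechanism in between:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.optimizers import Adam

# Decay trigger (plateau on validation loss), factor and patience values
# are assumptions; only the 1e-4 start and 1e-8 floor come from the paper.
optimizer = Adam(learning_rate=1e-4)
callbacks = [
    ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=5, min_lr=1e-8),
    EarlyStopping(monitor="val_loss", patience=15, restore_best_weights=True),
]
```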

3.2 Experiments Results

To verify the capability of the proposed CUA-Net, comparison experiments and ablation experiments were conducted. The comparison experiments evaluate CUA-Net against state-of-the-art networks, while the ablation experiments assess the effectiveness of the second U-shape network and of CBAM in the skip connections.

Comparison Experiments.

U-net [16], MS-UNet [17], Cloud-Net [18] and UNet 3+ [19] are selected for comparison, and the experimental results are shown in Fig. 5, where black and white denote correctly identified clear and cloudy areas, respectively, red marks cloudy pixels falsely detected as clear, and blue marks clear pixels falsely detected as cloudy.

Fig. 5. Visual results of cloud detection in comparison experiments.

The visual results of cloud detection on a whole scene by the different methods are shown in Fig. 5(a). All methods detect the majority of the cloudy area, but U-net, MS-UNet and Cloud-Net make more mistakes, especially in the highlighted regions in the lower right corner. Although UNet 3+ achieves better results, its boundary performance is still inferior and it misses more cloud than CUA-Net. Figures 5(b)–5(e) show local details for four different types of land cover: bare land, ice, vegetation and mountains. The results indicate that CUA-Net achieves better visual quality, with less confusion and clearer boundaries, under the different surface conditions. For example, in Fig. 5(b) and Fig. 5(d), which contain both thin and thick cloud, all methods accurately detect the main cloud body, but at edges and in fine details the results of CUA-Net are the most consistent with the ground truth. In Fig. 5(c), which is covered with ice and snow, U-net and MS-UNet show many omissions along the boundary; Cloud-Net and UNet 3+ perform better but their detail extraction still needs strengthening, whereas CUA-Net accurately distinguishes ice from cloud thanks to its advantageous structures. For the highlighted ground shown at the top of Fig. 5(e), the four competing methods all misclassify it as cloud to some extent, while CUA-Net does not. This visual interpretation confirms that CUA-Net delivers more detailed edges and superior cloud detection results compared with the other methods.

To evaluate cloud detection accuracy more objectively, Precision, Recall, Specificity, Intersection over Union (IoU), Overall Accuracy (OA) and the F1 score are selected for quantitative evaluation. High precision indicates that the detected cloud pixels are mostly true cloud, while high recall means that the model detects most of the cloud. Specificity measures the accuracy of negative predictions, IoU the overlap between the prediction and the ground truth, and OA the proportion of correctly classified pixels. The F1 score is the harmonic mean of precision and recall and measures their balance. They are defined in Eqs. (2)–(7).

$$Precision=\frac{TP}{TP+FP}$$
(2)
$$Recall=\frac{TP}{TP+FN}$$
(3)
$$Specificity=\frac{TN}{TN+FP}$$
(4)
$$IoU=\frac{TP}{TP+FP+FN}$$
(5)
$$OA=\frac{TP+TN}{TP+FP+FN+TN}$$
(6)
$$F1=2\times \frac{Precision\times Recall}{Precision+Recall}$$
(7)

where \(TP\) (true positive) is the number of correctly detected cloud pixels, \(TN\) (true negative) the number of correctly detected clear pixels, \(FP\) (false positive) the number of clear pixels incorrectly detected as cloud, and \(FN\) (false negative) the number of cloud pixels incorrectly detected as clear. The quantitative evaluation results are shown in Table 1.
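These definitions can be computed directly from binary prediction and ground-truth masks, as in this NumPy sketch (it assumes both classes are present, so no denominator is zero):

```python
import numpy as np

def evaluate(pred, truth):
    """Pixel-wise metrics of Eqs. (2)-(7); pred/truth are binary arrays."""
    tp = np.sum((pred == 1) & (truth == 1))  # cloud detected as cloud
    tn = np.sum((pred == 0) & (truth == 0))  # clear detected as clear
    fp = np.sum((pred == 1) & (truth == 0))  # clear detected as cloud
    fn = np.sum((pred == 0) & (truth == 1))  # cloud detected as clear
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "Precision": precision,
        "Recall": recall,
        "Specificity": tn / (tn + fp),
        "IoU": tp / (tp + fp + fn),
        "OA": (tp + tn) / (tp + tn + fp + fn),
        "F1": 2 * precision * recall / (precision + recall),
    }
```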

Table 1. Accuracy evaluation results in comparison experiments (%).

Table 1 shows that the proposed method achieves higher accuracy than the other four networks in Precision, Recall, Specificity, IoU, OA and F1, which is consistent with the visual interpretation and indicates that the proposed method performs better in most remote sensing scenes.

Ablation Experiments.

To verify the effect of the second U-shape network (denoted as S-UNet) and of CBAM in the skip connections, four ablation experiments were designed: (1) only the first U-shape network (denoted as F-UNet only), (2) the second U-shape network without CBAM (denoted as +S-UNet), (3) CBAM without the second U-shape network (denoted as +CBAM), and (4) both the second U-shape network and CBAM (CUA-Net). Their visual results and accuracy evaluations are shown in Fig. 6 and Table 2, respectively.

Fig. 6. Cloud detection visual results of ablation experiments.

Table 2. Accuracy evaluation results of ablation experiments (%).

Comparing F-UNet only with +S-UNet, and +CBAM with CUA-Net, it can be seen that S-UNet leads to a slight decrease in Recall, but the Specificity, IoU and F1 scores are higher than in the experiments without S-UNet, and the Precision is remarkably improved. The visual interpretation also shows that adding S-UNet yields results closer to the ground truth, as it complements cloud edges and details well. Comparing F-UNet only with +CBAM, and +S-UNet with CUA-Net, it can be confirmed that CBAM focuses well on the attributes and locations of cloud, which improves detection accuracy across the board and reduces the probability of confusing cloudy and clear areas. Overall, the best cloud detection results are achieved when both S-UNet and CBAM are used.

4 Conclusion

In conclusion, the proposed CUA-Net for cloud detection has shown promising results. The second U-shape network supplements details and cloud boundaries, yielding more refined results that are closer to the ground truth. The dense connections and the attention module help the network preserve and focus on important features while suppressing irrelevant ones, contributing to higher accuracy. CUA-Net has been evaluated on the 38-Cloud dataset against four representative networks, and the results show that it outperforms the other methods in both quantitative evaluation and visual quality. Overall, the proposed method has potential in remote sensing applications where cloud detection is essential, and further research can be conducted to optimize the model for better performance.