Introduction

Different land cover types have distinct and substantial impacts on the urban environment, including local climate and air quality, because of their physical properties or their role as indicators of human activity (Fan et al. 2019). For instance, deforestation and afforestation can alter the local climate by strongly modifying evapotranspiration, and urbanization directly affects local air pollution through the increasing number of emission sources (Zhao et al. 2020a). High spatial resolution urban land cover information is required for investigating the influence of urban surfaces in urban climate studies, transportation, and urban planning applications (Oke et al. 2017; Sha et al. 2018; Xu et al. 2019), urban ventilation and pollutant dispersion modeling (Yang et al. 2020; Zhao et al. 2020b), and carbon emission estimation (Zhang et al. 2014; Gao et al. 2014).

Previous studies (Du et al. 2016; Zheng et al. 2018; Xu et al. 2020) have investigated the influence of urban indicators on neighborhood-scale urban climate. These studies often relied on one or two case studies because high-resolution land cover data are scarce and typically have limited spatial and temporal coverage. Acquiring urban land cover data is challenging due to the intricate composition of urban surfaces, which comprise various elements such as buildings, roads constructed from different materials, and diverse impervious surfaces (Fan et al. 2021). While these data can be manually labeled or mapped, this process is labor-intensive and time-consuming. In recent years, deep learning methods, such as convolutional neural networks (CNNs), have been demonstrated to be an effective automated tool for complex urban surface classification at the pixel level (Volpi and Tuia 2017; Chen et al. 2020; Fan et al. 2021). They provide a promising tool for large-scale land cover extraction, which will be of great value for urban environment studies at the city, regional, or even global scale. However, deep learning methods are inherently data-driven and require a substantial volume of training examples, i.e., images and the corresponding manually labeled or annotated ground truth (Li et al. 2020). An additional challenge lies in achieving domain generalization, as models must acquire generalized and precise feature representations from a limited amount of training data originating from the source domain to perform effectively on unseen target domains (Xu et al. 2022). The primary issue arises from data discrepancies between the training samples (source domain) and the target images (target domain), where domains often refer to different datasets or data distributions. It is common for the imagery used to differ in acquisition time, atmospheric conditions, or even origin (different satellite sensors), resulting in discrepancies between the source and target domains (Fan et al. 2021). Poor domain generalization ability has become a major hindrance to obtaining high-resolution land cover data at the regional, national, or even global scale.

Studies have examined the domain generalization ability of deep learning methods and their reliance on training samples (Hoffman et al. 2016; Scott et al. 2017; Li et al. 2019b). Transfer learning has emerged as a valuable tool for enhancing model generalization and has found application in image mapping tasks (Pan and Yang 2010; Fu et al. 2023b, a; Liu et al. 2023). A specific subset of transfer learning is domain adaptation (DA), which enables models trained on one domain (source domain) to perform well on a different domain (target domain) (Chen et al. 2018; Toldo et al. 2020; Wilson and Cook 2020; Peng et al. 2022). Translating images from different domains into a similar appearance or style, for example with the Cycle Generative Adversarial Network (Toldo et al. 2020), is one effective way to improve model generalization ability (Xu et al. 2022). To enhance the performance, a pretrained semantic model can be used as a supervision model to encourage consistent semantic representations during translation (Hoffman et al. 2018). While DA models have been used in high-resolution image classification (Ji et al. 2021; Luo and Ji 2022), the primary focus has been on improving image translation performance. However, the translated images may still exhibit disparities with the target images, and this error can persist and propagate throughout the subsequent training process if the translated images are used as new training data. To address this problem, we employed the semi-supervised learning (SSL) technique (van Engelen and Hoos 2020; Jing and Tian 2021), which leverages the labeled data to guide the model’s learning process while also benefiting from the vast amount of unlabeled data. SSL can help reduce the influence of errors resulting from DA by extracting additional information from unlabeled data in the target domain, while also serving as an effective means to mitigate overfitting. Therefore, combining the above-mentioned approaches is expected to yield better results than using any single method alone (Zhang et al. 2022; Aryal and Neupane 2023).

In this study, our objective is to propose a domain adaptive land cover classification model for extracting land cover from urban surface imagery from different domains. The model combines the DA and SSL techniques, so it can be trained with existing annotated datasets and applied to different images from the Google Earth platform [Google Inc., California, USA]. The urban surface information, specifically the surface land cover types in urban areas, around weather stations and air quality monitoring stations was then extracted from Google Earth images. This information was used to study the correlation between the urban surface types and the environmental parameters, i.e., air temperature and pollutant concentration.

The rest of the paper is structured as follows. The “Data preparation” section presents the preparation of the source training dataset and the target images within the study areas used for training and testing, as well as the meteorological and air pollutant data for the correlation analysis. The proposed domain adaptive urban surface recognition methods are described in the “Methods” section. The “Results and discussion” section presents and discusses the results of the domain adaptive image classification model and the correlation analysis with local environmental parameters. Finally, conclusions are drawn in the “Conclusions” section.

Data preparation

Source dataset

The GF2 dataset, which was built in our previous study (Fan et al. 2021), was chosen as the source dataset (XS, LS). It is a high-quality 1-m resolution land cover dataset built with 41.9 km² of Gaofen2 (GF2) satellite imagery. The Gaofen2 satellite is a sub-meter level optical Earth observation satellite equipped with two high-resolution sensors (PMS) with a combined swath width of 45 km. The effective spatial resolutions of raw GF2 imagery are 4 m for the multispectral bands (MSS) and 1 m for the panchromatic band (PAN). The raw GF2 imagery has four spectral bands: BLUE (450–520 nm), GREEN (520–590 nm), RED (630–690 nm), and near-infrared (NIR, 770–890 nm). The Gram–Schmidt pan-sharpening method (Klonus and Ehlers 2009) was used to fuse the MSS imagery (4 m) and PAN imagery (1 m) and produce 4-band multispectral images (RED, GREEN, BLUE, and NIR) with 1 m spatial resolution.

There are 10 sample images in the GF2 dataset covering both urban and sub-urban areas. Each sample image has a fixed size of approximately 4.2 km² (2048 m × 2048 m), and four of them are illustrated in Fig. 1(a1–d1). Eight of the samples were taken in Hangzhou, China, while the remaining two were taken in Beijing, China. In this study, both the Beijing and Hangzhou samples were used as the source training dataset to enhance generalization ability. All images were fully annotated with eight common land cover categories at the pixel level using the OSM-OBIA method proposed in Fan et al. (2021). Because the actual land cover behind “Shadow” is not clear, the remaining seven categories were grouped into two larger categories for the correlation analysis, i.e., the “Artificial” surface (including “Building”, “Roads”, and “Other impervious”) and the “Natural” surface (including “Tree”, “Low vegetation”, “Bare land”, and “Water”). The definition and fraction of each land cover category are listed in Table 5 in Appendix A.
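The grouping described above can be illustrated with a minimal Python sketch. The category names follow the text; the function name `surface_fractions`, the flat list of per-pixel labels, and the choice to exclude “Shadow” pixels from the denominator are assumptions made for illustration and are not confirmed by the source.

```python
from collections import Counter

# Grouping of the seven non-shadow categories, as described in the text.
ARTIFICIAL = {"Building", "Roads", "Other impervious"}
NATURAL = {"Tree", "Low vegetation", "Bare land", "Water"}


def surface_fractions(labels):
    """Return (artificial, natural) fractions for a flat sequence of
    per-pixel category names.

    Assumption: "Shadow" pixels are ignored, so the two fractions sum to 1.
    """
    counts = Counter(labels)
    artificial = sum(counts[c] for c in ARTIFICIAL)
    natural = sum(counts[c] for c in NATURAL)
    total = artificial + natural
    if total == 0:
        raise ValueError("no non-shadow pixels to evaluate")
    return artificial / total, natural / total


# Hypothetical toy label map (12 pixels, 2 of them shadow).
pixels = (["Building"] * 3 + ["Roads"] * 2 + ["Tree"] * 4
          + ["Water"] + ["Shadow"] * 2)
art, nat = surface_fractions(pixels)
print(f"Artificial: {art:.2f}, Natural: {nat:.2f}")
```

If shadow pixels were instead kept in the denominator, the two fractions would no longer sum to one; which convention applies here is not stated in the text.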

Fig. 1 Demonstration of the GF2 training dataset. Four sample images (2048 m × 2048 m) from the source dataset were taken in Hangzhou (a1, b1) and Beijing (c1, d1). Corresponding ground truth (a2–d2)

Study area and target images

We investigated the impact of urban land cover in China’s three most developed urban agglomerations: Beijing, Shanghai, and the Guangdong-Hong Kong-Macao Greater Bay Area (GBA). All of these areas have an urban population of over 20 million (China Statistical Yearbook 2015; National Bureau of Statistics of China 2015). The urbanization rate has exceeded 85% in Beijing and Shanghai, as well as in Guangzhou and Shenzhen in the GBA. We collected land surface images from Google Earth around 33 meteorological monitoring sites and 107 air quality monitoring sites in those three regions, as listed in Table 1. The distribution of meteorological and air quality observation stations is shown in Figs. 9 and 10 in Appendix B.

Table 1 Details of the environmental monitoring stations and observation data. (MT: meteorology, AQ: air quality)

To validate the model and obtain the urban surface information, the land surface images around meteorological and air quality stations were first downloaded from Google Earth as target images, referred to as Google Earth images hereafter. Google Earth is a computer program that maps the Earth by overlaying satellite images, aerial photographs, and other Geographic Information System (GIS) data. Consequently, it serves as a repository of land surface images, typically acquired by diverse sensors under varying weather conditions and at different times. Following existing studies (Li et al. 2019a; Liu et al. 2021), a square buffer zone of 2 km × 2 km was selected for the urban surface recognition and the correlation analysis with environmental parameters. The corresponding meteorological or air quality monitoring site is located at the center of each buffer zone. The target dataset thus consists of 556 km² of unlabeled Google Earth images.
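As a rough illustration of how a 2 km × 2 km buffer centered on a station could be expressed as a geographic bounding box, the sketch below uses a local spherical approximation. The function name `square_buffer`, the mean Earth radius, and the example coordinates are assumptions for illustration; the source does not describe how the buffers were actually delineated.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def square_buffer(lat, lon, side_km=2.0):
    """Return (south, west, north, east) in degrees for a square of side
    `side_km` centered on (lat, lon).

    The longitude extent is widened by 1 / cos(latitude) so that the box
    spans roughly the same ground distance east-west as north-south.
    """
    half = side_km / 2.0
    dlat = math.degrees(half / EARTH_RADIUS_KM)
    dlon = math.degrees(half / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return lat - dlat, lon - dlon, lat + dlat, lon + dlon


# Hypothetical station location (not taken from the paper).
south, west, north, east = square_buffer(39.9, 116.4)
print(f"lat {south:.4f} to {north:.4f}, lon {west:.4f} to {east:.4f}")
```

This approximation is adequate for boxes of a few kilometers away from the poles; a projected coordinate system would be the more careful choice for production work.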

The selected images were acquired after 2017 to minimize land surface differences relative to the source dataset. Images heavily obscured by cloud cover or mist were then excluded manually. While the Google Earth platform can offer images with a spatial resolution as fine as 0.3 m, such high-resolution images were limited in availability. Therefore, images with a resolution of 0.6 m were chosen for our research and subsequently down-sampled to 1 m to be consistent with the source images. The original GF2 imagery has four spectral bands, while images from Google Earth are RGB images that contain only three bands, i.e., the RED, GREEN, and BLUE bands. For consistency, the NIR band of the GF2 imagery in the training dataset was excluded. As a result, both source and target images are RGB images with 1 m resolution. Detailed information on the source and target images is listed in Table 2. To further evaluate the model performance on Google Earth images, an additional set of Google Earth images corresponding to the same ground truth as the GF2 dataset was obtained, hereafter referred to as the Google dataset, as shown in Fig. 11 (a1–a3) in Appendix C. Misalignments exist between these Google Earth images and the ground truth of the GF2 dataset; their impact is discussed in detail in Appendix C.

Table 2 Detailed information on the source and target images

Environmental data

To collect as many samples as possible, meteorological and air quality data were collected over a two-season period in 2021 (Mar. 21, 2021, to Sep. 21, 2021), when a significant number of new air monitoring sites had been built and put into use. Meteorological data were collected from the China National Meteorological Information Centre (http://data.cma.cn/). Mean values of air temperature (AT), relative humidity (RH), and wind speed (WS) over the selected period were used for analysis in this study. Air quality monitoring data were obtained from the China National Environmental Monitoring Center (CNEMC, http://106.37.208.233:20035/). For air quality data, six pollutants (PM2.5, PM10, SO2, NO2, CO, and O3) were measured and recorded hourly according to China Environmental Protection Standards HJ 193–2013 and HJ 655–2013. Detailed information about the data can be found in our previous study (Fan et al. 2020).

Methods

In this study, we first combined the DA and SSL techniques to train a land cover classification model that can be applied to Google Earth images, as shown in Fig. 2. Subsequently, the urban surface information around weather stations and air quality monitoring stations was extracted and used to study the correlation between the land surface types and the environmental parameters, e.g., air temperature.

Fig. 2 Overview of the proposed workflow

Semi-supervised domain adaptive urban surface classification model

The overall workflow can be separated into two steps. In the first step, the source dataset (XS, LS) and target images (XT) were used in the domain adaptation (DA) process to generate a translated GF2 dataset (XS’, LS), as described in the “Domain adaptation (DA)” section. The translated GF2 dataset is treated as a training dataset in the style of the target domain images. In the second step, the classification model is trained with the translated GF2 dataset (XS’, LS) and the unlabeled target images (XT) in a semi-supervised learning (SSL) manner, as described in the “Semi-supervised learning (SSL)” section. As a result, the trained classification model can be applied to target Google Earth images.

Domain adaptation (DA)

As shown in Fig. 3, the DA process contains two steps. The first step is to use the source dataset to train an initial classification model (M0). DeepLab V3 (Chen et al. 2017), which has an encoder-decoder architecture (Ronneberger et al. 2015; Badrinarayanan et al. 2017), was used for the training process. ResNet101 (He et al. 2016) was used as the backbone network for feature extraction, as recommended in a previous study (Fan et al. 2021). The decoder part, which recovers spatial details from the extracted feature map, remains the same as in Li et al. (2019b). In the second step, an image-to-image translation model, i.e., CycleGAN (Zhu et al. 2017), whose task is to translate the source images into the same style as the target images, is trained with the source dataset and the target data. This training is conducted under the supervision of the initial model M0 to improve the translation performance (Hoffman et al. 2018). With the trained CycleGAN model, a translated GF2 dataset (XS’, LS) is produced, which can be regarded as a labeled dataset in the style of the target images (XT).

Fig. 3 Illustration of the domain adaptation (DA) process

Semi-supervised learning (SSL)

After the domain adaptation, SSL was used to enhance the model performance on target images. A new classification model M1 was trained with the translated GF2 dataset (XS’, LS) (Fig. 4). Then, the M1 model was applied to the target data (XT) to generate segmentation results as pseudo labels. Only the pseudo labels (Lpl) with high quality, i.e., high prediction confidence, were selected for the next procedure. To obtain high-quality pseudo labels, the prediction confidence threshold was set to 0.9 (Li et al. 2019b). Following that, the translated GF2 dataset (XS’, LS) and the target data with its corresponding pseudo labels (XT, Lpl) were used to train the next generation of the segmentation model M2(i), where i denotes the ith loop. When a better M2(i) model is achieved, the quality of the pseudo labels (Lpl) improves and more high-quality pseudo labels can be obtained for the next training loop. Therefore, a better-performing model is expected after each iteration. More detailed information on the models and configurations can be found in Appendix D.
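The pseudo-label selection step can be sketched as follows. This is a minimal illustration, assuming the 0.9 threshold is applied per pixel to the maximum class probability; the function name `select_pseudo_labels`, the `IGNORE` marker, and the list-of-lists input format are hypothetical and not taken from the source.

```python
CONFIDENCE_THRESHOLD = 0.9  # value stated in the text
IGNORE = -1  # hypothetical marker for pixels excluded from training


def select_pseudo_labels(probabilities, threshold=CONFIDENCE_THRESHOLD):
    """Map per-pixel class probabilities to pseudo labels.

    `probabilities` holds one list of class probabilities per pixel.
    A pixel keeps its most likely class only if that probability reaches
    the threshold; otherwise it is marked IGNORE.
    """
    labels = []
    for probs in probabilities:
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best if probs[best] >= threshold else IGNORE)
    return labels


softmax_out = [
    [0.95, 0.03, 0.02],  # confident: class 0
    [0.50, 0.30, 0.20],  # uncertain: ignored
    [0.05, 0.04, 0.91],  # confident: class 2
]
pseudo = select_pseudo_labels(softmax_out)
print(pseudo)  # [0, -1, 2]
```

In each SSL loop, the retained labels would be combined with the translated GF2 dataset to train the next model, whose predictions then replace the previous pseudo labels.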

Fig. 4 Illustration of the semi-supervised learning (SSL) process

Evaluation metric

In this study, we assess the model's performance in the target domain, specifically on both the translated GF2 dataset and the Google dataset. We employ two evaluation metrics: the F1 score and overall accuracy (OA), as defined in Eq. (1) and Eq. (2), respectively. The F1 score combines Precision (Eq. 3) and Recall (Eq. 4) into a single value. Precision is the proportion of pixels predicted as a specific category that actually belong to that category. Recall, conversely, is the proportion of pixels actually belonging to a category that are correctly identified. Good classification performance is characterized by high scores in both Precision and Recall, making the F1 score, which is the harmonic mean of Precision and Recall, a more comprehensive evaluation metric. OA is computed by dividing the total count of correctly predicted pixels across all categories by the total number of pixels. We assessed per-category performance using the F1 score and overall performance using OA.

$$F1\;Score = 2 \times (Precision \times Recall) / (Precision + Recall)$$
(1)
$$OA=(TP+TN)/{N}_{all}$$
(2)
$$Precision = TP / (TP + FP)$$
(3)
$$Recall = TP / (TP + FN)$$
(4)

where True Positives (\(TP\))/True Negatives (\(TN\)) are the numbers of pixels correctly classified as positive/negative; False Positives (\(FP\))/False Negatives (\(FN\)) are the numbers of pixels wrongly classified as positive/negative; \({N}_{all}\) is the total number of pixels.
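Equations (1) to (4) can be evaluated from a confusion matrix, as in the minimal sketch below. The function names and the toy matrix are illustrative assumptions. For the multi-class case, OA is computed here as the sum of the diagonal divided by the total pixel count, which matches the verbal definition in the text.

```python
def per_class_scores(confusion):
    """Return one (precision, recall, f1) tuple per class.

    confusion[i][j] is the number of pixels of true class i that were
    predicted as class j.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, wrong
        fn = sum(confusion[k]) - tp                       # true k, missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append((precision, recall, f1))
    return scores


def overall_accuracy(confusion):
    """Correctly classified pixels divided by all pixels."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical two-class confusion matrix (20 pixels).
cm = [[8, 2],
      [1, 9]]
scores = per_class_scores(cm)
oa = overall_accuracy(cm)
print(f"OA = {oa:.2f}, F1 (class 0) = {scores[0][2]:.3f}")
```

For class 0 in this toy matrix, Precision is 8/9 and Recall is 8/10, so the F1 score is 16/19, and OA is 17/20.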

Correlation analysis

After the land cover information around each weather station and air pollutant monitoring station was derived, bivariate correlation analysis was conducted between the Artificial/Natural surface proportions and the environmental data, i.e., meteorological and air quality data, over a two-season period (Mar. 21 to Sep. 20) in 2021. Pearson’s correlation coefficient (− 1 ≤ r ≤ 1), calculated by Eq. (5), is used to measure the linear correlation between land surface proportions and seasonally averaged environmental values at each site. Following Dugord et al. (2014), the strength of correlation was classified into three levels: strong correlation (|r|≥ 0.6), correlation (0.3 ≤|r|< 0.6), and weak correlation (|r|< 0.3). A two-tailed t-test was conducted to evaluate the significance of the correlation coefficient.

$$r=\frac{\sum \left({x}_{i}-\overline{x }\right)\left({y}_{i}-\overline{y }\right)}{\sqrt{\sum {\left({x}_{i}-\overline{x }\right)}^{2}\sum {\left({y}_{i}-\overline{y }\right)}^{2}}}$$
(5)

where \({x}_{i}\), \({y}_{i}\) denote the values of the Artificial/Natural surface proportion and the environmental parameter at monitoring station \(i\), respectively; \(\overline{x }\), \(\overline{y }\) denote their average values within the city/region.
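A minimal sketch of this analysis is given below: Pearson’s r computed from its standard definition, the usual t statistic for testing r = 0 with n − 2 degrees of freedom, and the three strength levels described above. The function names and the sample values are hypothetical, and converting the t statistic to a p-value (for example with a statistics library) is left out.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the null hypothesis r = 0 (n - 2 degrees of freedom).

    Undefined for |r| = 1, where the denominator vanishes.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))


def strength(r):
    """Classify |r| into the three levels used in the text."""
    magnitude = abs(r)
    if magnitude >= 0.6:
        return "strong correlation"
    if magnitude >= 0.3:
        return "correlation"
    return "weak correlation"


# Hypothetical values: artificial surface fraction and mean air temperature.
fraction = [0.20, 0.35, 0.50, 0.65, 0.80]
temperature = [24.1, 24.6, 24.9, 25.7, 26.0]
r = pearson_r(fraction, temperature)
print(f"r = {r:.3f} ({strength(r)}), t = {t_statistic(r, len(fraction)):.2f}")
```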

Results and discussion

Urban surface classification

The overall accuracy of the final classification model on each dataset is presented in Table 3. The performance of the initial classification model M0, which is trained on the original GF2 dataset (XS, LS) serves as the benchmark. Our objective is to achieve performance as close to the benchmark as possible for the final model on the target domain, i.e., the translated GF2 dataset and Google dataset. The “M1” row represents the performance of the model trained on the translated GF2 dataset without using the SSL technique. The following rows show the performance of the model trained on the translated GF2 dataset at each iteration of SSL. The model’s performance on each land cover category is assessed with the F1 score, as depicted in Fig. 5.

Table 3 The overall accuracy of each trained model. M0: the model trained with the original GF2 dataset. M1: the model trained with the translated GF2 dataset. M2(i): the model in the ith iteration of semi-supervised learning using the translated GF2 dataset and the pseudo-labeled target images

Fig. 5 Model’s performance on each land cover category measured by F1 score

As listed in Table 3, the model M0 achieved an OA of 79.3% on the source GF2 dataset but failed on the translated GF2 and Google datasets (OA < 30%), which means that M0 cannot be directly applied to land surface images from Google Earth. With the domain adaptation technique, the model trained on the translated GF2 dataset (M1) achieved an OA of 72.3%. When both the DA and SSL techniques were used, the OA on the translated GF2 dataset increased from 72.3% to 74.2% in the first loop (Table 3). The improvement then became smaller, and the highest OA (75.2%) on the translated GF2 dataset was reached at the third SSL loop, i.e., M2(3). This indicates that the quality of the extracted pseudo labels stops improving and the semi-supervised training process converges, leaving limited room for further improvement with more SSL loops (Li et al. 2019b). In addition, the achieved accuracy is close to the score attained by M0 on the source dataset, indicating that the model is not suffering from underfitting. The final classification model [M2(3)] achieved a significant improvement on the Google dataset, from 23.1% to 62.3% (Table 3). This score is lower than that on the translated GF2 dataset (75.2%), possibly because the DA process does not entirely eliminate the domain differences. It could also be partially influenced by misalignments with the ground truth due to changes in land cover and variations in camera angles, as illustrated in Fig. 12 in Appendix C.

The models exhibit performance imbalances across land cover categories, as shown in Fig. 5. The performance of model M0 is relatively low for the “Low vegetation” and “Other impervious” categories, mainly due to the limited sample size of these two categories in the GF2 dataset (Table 5 in Appendix A) and their high intra-class variances and interclass similarities (Liu et al. 2020). However, this issue is significantly mitigated when using M2(3), which has additionally been trained on a substantial amount of unlabeled data, as shown in Fig. 6. Its F1 scores on the translated GF2 dataset are higher than those of M1, and also higher than those of M0 on the GF2 dataset. This indicates that the SSL technique can alleviate performance imbalances across land cover categories resulting from an uneven distribution of labeled training data (Yang and Xu 2020). When tested on the Google dataset, the model experiences a general decline in performance across all categories, with the exception of “Water” and “Bare land”. This phenomenon may be attributed to the fact that these two categories have smaller interclass differences compared to others, such as buildings and roads. Additionally, they are less susceptible to image misalignment caused by variations in view angles. The most significant performance difference is observed in the “Shadow” category, which is likely to be affected by the weather conditions and the solar angle at the time of image acquisition.

Fig. 6 The classification results on (a) the target image using (b) the M1 model and (c) the M2(3) model

The trained model has demonstrated performance improvement when tested on the translated GF2 dataset and the Google dataset. This method effectively enhances the model's generalizability across the two domains, making it suitable for application to satellite images exhibiting data discrepancies. However, the model’s performance may remain constrained by the limited diversity of the original training dataset, particularly concerning regional variations and spatial resolutions (Fan et al. 2021). The original GF2 dataset was built with satellite images captured in Chinese cities, where modern urban construction practices have led to similarities in building and road materials in recent decades. Consequently, the method may be applied to other similar cities such as Guangzhou, China, as shown in Fig. 7. However, the model may not perform as well in areas with different building characteristics, such as European regions with historic structures.

Fig. 7 The land cover map derived from Google Earth image. The blue solid line represents the administrative districts in Guangzhou, China

Correlation analysis

The classification model [M2(3)] was used to derive the land cover information around each environmental monitoring station, including weather stations and air pollutant monitoring stations. The correlation analysis results are listed in Table 4 and in Table 7 in Appendix F. The correlations between the land cover fraction of a certain type and the mean values of the weather data and air pollutant data are shown in Fig. 8 and in Figs. 13 and 14 in Appendix F. Meteorological data from 2020 were also analyzed, and the results are shown in Table 6 in Appendix E.

Fig. 8 Relationship between urban land cover and meteorological indicators (two-season average values) in three regions

Correlation of land cover with meteorological parameters

As shown in Fig. 8(a, d) and Table 4, there is a strong positive correlation (0.60 to 0.73) between artificial surfaces and air temperature in all three areas. All correlation coefficients are significant except for those in GBA. This result is in line with previous case studies (Yan and Dong 2015). Zhang et al. (2020) found that 70% of the variance in daytime air temperature can be explained by the building footprint ratio. Air temperature can be affected by changes in physical land surface properties and by anthropogenic heat emissions (Stewart and Oke 2012). A larger proportion of artificial surfaces means that more natural land cover has been converted into buildings, roads, and other impervious surfaces. The resulting increase in heat storage and anthropogenic heat emissions, together with the decrease in water storage capacity, leads to a warmer environment with reduced relative humidity (Oke 1982). These correlations are found in all three areas and show small seasonal variations, which is also supported by data from the year 2020, as shown in Table 6 in Appendix E.

Table 4 Pearson correlation coefficients between land cover and meteorological factors for each season in 2021. Bold values (with |r|≥ 0.3) indicate a correlation. (AT: air temperature, RH: relative humidity, WS: wind speed)

There is a negative correlation (− 0.32 to − 0.43) between relative humidity and impervious land cover in Shanghai and GBA (Fig. 8(b, e) and Table 4). Lin et al. (2020) also found a negative correlation between urbanization and humidity in the urban areas of Guangdong, which is the so-called urban dry island effect (Lokoshchenko 2017). Given the definition of relative humidity, the urban–rural difference in RH is governed by thermal and moisture differences (Oke et al. 2017). The thermal difference between artificial and natural areas contributes significantly to the RH difference. However, the emission of water vapor from industrial sources and transpiration may offset this phenomenon. This likely explains the weak correlation between artificial surface and RH in Beijing, except for summer (Table 4). This can also be seen in the analysis results for 2020 (Table 6 in Appendix E). Compared with Shanghai and GBA, Beijing has a drier climate with less rainfall, except for its rainy season (summer) (Liu et al. 2009). In dry seasons, the irrigation of urban green spaces, additional water vapor from human activities, and the evaporation of water bodies become the dominant contributors to air humidity in urban areas, causing a similar or even higher RH than in rural areas, i.e., an urban wet island effect (Liu et al. 2021). In this scenario, the correlation between RH and the fraction of artificial surface can be weakened significantly.

There is a negative correlation (− 0.40 to − 0.60, Beijing) and a weak negative correlation (− 0.24 to − 0.28, Shanghai) between wind speed and the artificial surface fraction, as high-rise buildings and their complex layouts reduce mean wind speed (Fan et al. 2019; Zhao et al. 2020b). This phenomenon has been widely observed in previous studies (Du et al. 2016; Tao et al. 2018). This relationship is not obvious in GBA. The meteorological stations in GBA are more scattered than in the other two cities (Fig. 10 in Appendix B), so the different background wind conditions at these stations could be the dominant influencing factor. For instance, the wind speed in an urban area near the coast may be higher than in rural areas inland due to sea breezes.

Correlation of land cover with air pollutant concentrations

The results of the correlation with air pollutant concentrations are detailed in Appendix F, with the corresponding discussion presented here. The correlations between most air pollutants and land cover are not as strong as those of meteorological indicators, as shown in Table 7 in Appendix F. One reason is that the emission or generation of each air pollutant is complex and can be affected by multiple factors ranging from the building scale and neighborhood scale to the city and regional scales (Tao et al. 2018; Fan et al. 2020). Most air pollutants are related to human activities, e.g., traffic, agriculture, and industry, and have different formation mechanisms (Fan et al. 2020). For instance, traffic is a major emission source for NO2 (Lin and Cheng 2007) and CO (Hrebtov and Hanjalić 2019). The NO2 concentration is also affected by industrial emissions and photochemical reactions with VOCs (Liu et al. 2013). Together, these factors cause a heterogeneous distribution of air pollution within a region. Land cover information alone does not contain land use information, traffic density, or the locations of emission sources. For example, residential and industrial land use have distinct impacts on local air quality, but such information cannot be recognized by this model at present. In addition, the impervious land cover or the recognized road area cannot reflect the traffic volume: a large proportion of road cover in a sub-urban industrial area may carry only small traffic flows. Therefore, the impact of human activity on air pollution may be more clearly revealed if more impact factors can be included in the analysis. Another limitation of this analysis is that all monitoring stations in Shanghai are located in areas with a similar impervious surface fraction, ranging from 25% to 55%. In contrast, the range for Beijing is 0% to 70%, and for the Greater Bay Area (GBA) it is 5% to 80%. The absence of data points in rural and very densely built areas may introduce deviations into the current results. Therefore, our future work will prioritize including a more diverse set of sample points.

Conclusions

In this study, we combined the domain adaptation (DA) and semi-supervised learning (SSL) techniques to achieve domain adaptive land cover classification. With a small labeled dataset built from Gaofen2 (GF2) satellite imagery, the trained model can be applied to images from Google Earth. The model was jointly trained with a significantly larger amount of unlabeled data. Compared with using only DA, the SSL technique further improves the model’s performance and mitigates the performance imbalances across land cover categories that often arise from an uneven distribution of labeled training data in a small dataset. As a result, the model’s performance on Google Earth images was improved significantly. The best performance was achieved after three SSL iterations, represented by the M2(3) model. With this model, the overall classification accuracy on the translated GF2 dataset was improved from 19.5% to 75.2%. Although the performance on the Google dataset is underestimated due to misalignments between the images and the ground truth, the overall accuracy was improved from 21.3% to 62.3%.

The classification model [M2(3)] was used to derive land cover data for analyzing the relationship between urban surface information and environmental parameters in the three most developed cities/areas in China: Beijing, Shanghai, and GBA. The results indicate a strong positive correlation (|r|≥ 0.6) between air temperature and artificial land surface. The relative humidity is negatively related (|r|≥ 0.3) to the artificial land surface except in Beijing. The wind speed in Beijing is negatively related (|r|≥ 0.3) to the artificial land surface, but the correlation is weak in Shanghai and GBA. In terms of air pollutants, we found that most correlations between air quality parameters and land cover are weaker than those for meteorological parameters and show regional differences. Detailed land use information, traffic volume, and the locations of emission sources are needed for investigating the distribution of air pollutants in future studies.

This study contributes to both high-resolution land surface classification in complex urban areas and the understanding of the influence of land surface on urban climate. However, the model’s performance remains constrained by the original training dataset, which only contains training samples from Chinese cities. In future studies, diverse sources of training samples should be integrated into the scheme to build a more generalized model. The correlation analysis also provides useful insights for parameterizing the impact of land cover in urban environment modelling. The high-resolution land cover information will be important data for assisting urban climate studies. Coupling high-resolution land cover data with other impact factors and more sophisticated urban climate models will be of great interest in the future.