Introduction

Land use and land cover (LULC) maps are essential sources of information for land management and planning (Esfandeh et al. 2022; Yao et al. 2022). Earth science and remote sensing organizations have traditionally been interested in accurate and current LULC monitoring (Qian and Zhang 2022; Viana et al. 2019), primarily because it provides useful data for understanding the relationships between humans and their environment (Praticò et al. 2021). LULC mapping was traditionally based on single-source, single-date satellite images (Steinhausen et al. 2018). An accurate and timely collection of LULC maps helps in understanding the development of human society and improves the modelling of climate and environmental change; such maps play an increasingly important role in contemporary society (Kuang et al. 2018).

Spatial mapping of land use and land cover, as well as other earth surface features, can now be done quickly and more accurately using Google Earth Engine (GEE), remote sensing, GIS technology (Pande 2022), Google Earth, and machine learning classifiers. The GEE platform has made it easier to process and analyse large volumes of satellite data and provides time-series data on land cover and land use (Yuan et al. 2005). Remote sensing imagery is the most popular means of assessing land cover and monitoring its changes over time (Gómez et al. 2016; Wulder et al. 2016). Owing to population growth and the need to develop additional areas to satisfy demand, the hydrological and water resource modelling community (Roy et al. 2014) is eager to incorporate and assess shifting land use and its effects on food production, water budgets, water security, and energy production (Sridhar and Wedin 2009).

For large areas, high-resolution land cover mapping with traditional geospatial techniques requires a massive amount of data for classification, so enormous storage capacity, powerful processing, and the ability to apply a variety of techniques are all necessary (Xie et al. 2019). Google Earth Engine meets these demands, makes it simple to classify an entire large region, and has made this kind of technology available to everyone free of charge (Gorelick et al. 2017; Kolli et al. 2020; Sidhu et al. 2018).

Over the last few decades, numerous researchers have employed various types of remote sensing imagery for LULC classification. Zhang et al. (2020) proposed an innovative combined classification of SAR and optical remote sensing images to classify and extract cultivated land, scrubland, agriculture, bare soil, and water bodies, with an overall accuracy of more than 85%. Zhang and Roy (2017) proposed a feature-based fusion classification approach using SAR and optical images to extract and categorize agriculture, scrubland, farmland, water bodies, and bare soil, with a total accuracy greater than 85% (Hu and Nacun 2018; Pan et al. 2022). Although many studies have addressed LULC, standard approaches extract only a small number of land cover categories, and the resulting classifications are frequently coarse. In reality, fields contain many different and intricate land cover types, so a new strategy targeting these specific forms of land cover would be quite useful (Batunacun et al. 2018). How to perform LULC classification rapidly and reliably is one of the important research questions for remotely sensed satellite imagery (Batunacun et al. 2018). The classic LULC method relies on computer-aided and visual interpretation of remote sensing imagery, supported by elements such as colour, texture, shape, natural geography, and landforms (Petit and Lambin 2001; Singh and Singh 2018). However, typical supervised classification requires substantial expertise and a time-consuming procedure for obtaining a large number of features (Chen et al. 2018).
In addition, traditional supervised classification based on manually designed spatial features frequently suffers from small sample sizes and weak generalization (Zhao and Du 2016), and the small number of training samples also limits the reliability and accuracy of the classification outcome (Chen et al. 2018). In summary, manually selecting training samples is a lengthy, laborious, and unpredictable process. To supply sufficient and effective samples in a timely way, substantial effort should therefore be put into selecting and extracting training samples for supervised classification (Attarchi and Gloaguen 2014). Machine learning (ML) algorithms have also proven beneficial in vulnerability studies such as flood, drought, and landslide assessment (Chowdhuri et al. 2022; Saha et al. 2022a; Wen et al. 2022). In this regard, remote sensing and geographic information systems (RS-GIS) have been employed extensively over the past few decades (Saha et al. 2022b), and RS-GIS techniques are frequently used to assess the impact of geohazards (Islam et al. 2022; Saha et al. 2022a, 2022b).

Thorough and automatic LULC classification within study regions is possible because of large storage capacity, powerful processing, and self-programming classification algorithms (Zhao and Du 2016). Google Earth Engine (GEE) fulfils all of these criteria, and it is free and available to everyone (Stromann et al. 2020). GEE can quickly process multi-source satellite images without time-consuming conversion, mosaicking, resampling, projection, and registration procedures (Attarchi and Gloaguen 2014; Chowdhuri et al. 2022; Saha et al. 2022a, 2022b). In the current study, LULC classification maps were generated using remote sensing, GIS, and three machine learning approaches: support vector machine (SVM), random forest (RF), and classification and regression trees (CART). The effectiveness of an automated approach to detailed LULC classification was assessed by applying these machine learning techniques in supervised classification. Except for cartography, all procedures in this study were carried out on the GEE cloud platform. The goal of this research is to propose a LULC classification technique that can quickly and efficiently supply training samples for detailed LULC categories with minimal manual effort. The main objective of the study was to classify LULC using multispectral Landsat-8 and Sentinel-2 images, compare existing machine learning approaches on the GEE platform, and thereby determine the combination of satellite image source and machine learning technique that yields the most accurate classification.

Study area

The total area of Karnataka is 191,761 km². The state lies between latitudes 11°31′ N and 18°45′ N and longitudes 74°12′ E and 78°40′ E. Karnataka is divided into 31 districts and 176 taluks. The Krishna, Cauvery, and Godavari are the three major rivers of Karnataka (Harishnaika et al. 2022). Physiographically, the state has four zones, comprising mountainous areas, the coastal region west of the Western Ghats, and the northern and southern plains (Harishnaika and Kumar 2022).

Figure 1 shows the Köppen–Geiger climate classification, which uses 30 types grouped into 5 major classes to categorize global climate patterns (Moghbeli et al. 2020; Yang et al. 2017). Under this classification, Karnataka falls into 4 groups: Aw (equatorial, dry winter), which predominates in the region, is distinguished from BSh (hot arid steppe) and Am (tropical monsoon) by differences in monthly rainfall and temperature (Harishnaika and Ahmed 2023; Harishnaika and Ahmed 2022; Wu et al. 2019). The average annual rainfall of Karnataka is about 1248 mm, and the state's average annual temperature is about 27 °C. Karnataka lies within agro-climatic zones X and XII, which comprise the southern plateau and hills and the west coast plains and Ghats region (Harishnaika and Ahmed 2022). Different parts of the state have slightly different climate and weather conditions depending on their environment and morphology, which govern the tropical climate of the area; the region offers one of the clearest examples of monsoon seasonality in the world (Kumar et al. 2022). The Western Ghats possess a high degree of ecological endemism and diversity, making them one of the 8 "hottest hotspots" of biodiversity on Earth. However, the Western Ghats and surrounding regions periodically receive little rainfall, which can cause forest fires in this area (Harishnaika and Ahmed 2022).

Fig. 1

Location and climatic classification of the Karnataka state

Dataset

Google Earth Engine (GEE) hosts a sizable archive of data from widely used Earth observation systems such as MODIS, Landsat, and Sentinel, as well as other geospatial datasets, including demographic and climate data (Xie et al. 2019; Zhang and Roy 2017). Landsat and Sentinel data are accessible in GEE, with the Landsat archive provided through the USGS (United States Geological Survey). Landsat-8 OLI (Operational Land Imager) data for 2015 and Sentinel-2 data for 2016–2021 were used to generate the LULC maps; the band information is given in Table 1. Sentinel-2 Level-1C data and Landsat-8 data atmospherically corrected with the Land Surface Reflectance Code (LaSRC) were employed in the current investigation. Scenes with less than 5–10% cloud cover were selected, and the images for each period were combined into a single composite. Five Landsat-8 bands and eight Sentinel-2 bands were used for classification. Seven Landsat-8 images were used for 2015; for Sentinel-2, six images were used for 2016, nine for each of 2017 and 2018, and ten for each year from 2019 to 2021.

Table 1 Sentinel-2 and Landsat-8 band information used for LULC classification

Each Landsat-8 pixel (30 m × 30 m) and Sentinel-2 pixel (10 m × 10 m) was used as the statistical unit. LULC was mapped into the following classes: agricultural land, built-up land, forest land, fallow land, wasteland, water bodies, and others. Forest and agricultural areas were represented as vegetation, while ponds and rivers were represented as water. The analysis used Landsat-8 spectral bands 2 to 7 and Sentinel-2 bands 2 to 12 (Table 1).

Methodology

Our proposed approach, shown in Fig. 2, was applied to the entire state of Karnataka and relied wholly on the GEE cloud computing environment. We built a consistent time series from the images available in the GEE catalogue in order to identify the input image that offered the highest accuracy. Images were classified according to their period, and finally multiple classifiers, namely random forest, support vector machine, and classification and regression trees, were tested to determine which gives the best classification accuracy. In summary, the proposed approach comprises image pre-processing (image selection, filtering, and time-series construction); image processing (computation of annual composites and image reduction); classification (acquisition of training and validation points, selection of the best input image, machine learning methods, and parameter optimization); accuracy evaluation and comparison; and mapping.

Fig. 2

Location and training samples selected for LULC classification

CART (classification and regression trees)

CART is a single decision-tree classifier (RF, by contrast, combines many such trees). At each node, a single attribute defines the split, dividing the data into subgroups according to the normalized information gain (Loh 2011). Breiman (1984) developed CART, a binary decision tree that supports straightforward decision-making through logical if-then rules. CART recursively splits nodes at predetermined thresholds until it reaches the terminal nodes. The method divides the input data into groups; trees are built using all but one of the groups and then evaluated on the excluded group, and the pruned tree with the lowest deviance is selected. CART is strongly affected by the number of samples in each class, and high-dimensional data in particular hinder its efficacy. At each decision, the attribute with the highest normalized information gain is chosen. In GEE, the only parameters that can be changed are the minimum leaf population and the maximum number of nodes.
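As a minimal sketch of the split rule described above, the Python below scores every candidate threshold on one feature by normalized information gain (the gain ratio) and keeps the best one. It is illustrative only: Breiman's CART is usually described with the Gini impurity rather than the gain ratio, the function names and toy values are hypothetical, and GEE's own classifier (ee.Classifier.smileCart, tuned through maxNodes and minLeafPopulation) is not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, gain_ratio) of the best binary split `x <= t` on one feature.

    Hypothetical helper: the gain ratio is the information gain divided by the
    entropy of the split itself, i.e. a normalized information gain.
    """
    parent = entropy(labels)
    n = len(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = parent - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        split_info = entropy(["L"] * len(left) + ["R"] * len(right))  # > 0: both sides non-empty
        ratio = gain / split_info
        if ratio > best[1]:
            best = (t, ratio)
    return best


# Toy NDVI values for two classes (invented for illustration).
threshold, ratio = best_split([0.1, 0.2, 0.6, 0.7], ["water", "water", "forest", "forest"])
print(threshold, ratio)
```

Applying best_split recursively to each resulting subset, until a node is pure or too small (the role of the minimum leaf population), would grow a full tree.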

RF (random forest)

Ensemble classification combines the outputs of several classifiers to predict outcomes (Tumer and Ghosh 1996). The random forest model assigns a label by aggregating the results of many decision trees and selecting the class with the most votes (Loukika et al. 2021). Each tree is grown from a randomly selected subset of samples: in this bagging procedure, every tree is trained on a sample drawn from the full training set (Breiman 1996). RF also randomly chooses a subset of features when determining the best split for each tree. Although this can weaken individual trees, it reduces the correlation between trees, which lowers the overall error. Random forest has gained popularity because of its robustness to noise and outliers (Eisavi et al. 2015), and it has been shown to classify various types of land use and land cover effectively (Waske and Braun 2009). Bagging and random feature selection are the two techniques that form the method's "powerhouse". The R 'randomForest' package was used in our study to create the LULC map.
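The bagging and random-feature ideas above can be sketched in a few lines of Python. This is a hypothetical, simplified forest: each "tree" is a single-split stump rather than a fully grown tree, and the two-band toy data are invented. It is not the R randomForest package or GEE's ee.Classifier.smileRandomForest; it only illustrates bootstrap sampling combined with majority voting.

```python
import random
from collections import Counter


def train_stump(X, y, feature_ids):
    """Fit a one-split tree: pick the feature/threshold with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def train_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Bagging: each tree sees a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        feats = rng.sample(range(d), n_features)     # random feature subset
        trees.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def forest_predict(trees, row):
    """Majority vote over all trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]


# Invented two-band pixels (e.g. NDVI, NIR reflectance) for two classes.
X = [[0.10, 0.05], [0.15, 0.08], [0.20, 0.10], [0.70, 0.40], [0.75, 0.45], [0.80, 0.50]]
y = ["water"] * 3 + ["forest"] * 3
forest = train_forest(X, y)
print(forest_predict(forest, [0.05, 0.02]), forest_predict(forest, [0.9, 0.6]))
```

Individual stumps trained on unlucky bootstrap samples can be wrong, but the vote across many of them is stable, which is the behaviour the paragraph attributes to RF.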

SVM (support vector machine)

A support vector machine (SVM) is a supervised learning algorithm used for both classification and regression. During training, an SVM classifier constructs an optimal hyperplane that separates the classes with the fewest misclassified pixels. The extreme points, or vectors, used to build the hyperplane are called support vectors. The cost parameter C and the kernel parameter gamma are the main parameters governing the selection of support vectors (Hepaǧuşlar et al. 2004); they were set with a grid search to produce accurate predictions. The cost parameter C strongly affects SVM performance, and the simple linear kernel is preferred for training on large datasets. The goal of SVM is to find the best separating hyperplane among all possible hyperplanes: given a dataset, the original SVM method seeks the hyperplane that separates the data into classes. The SVM algorithm also needs an appropriate kernel function to establish the hyperplanes precisely and reduce classification errors, so the choice of kernel is a crucial component of the approach. SVM performance depends primarily on the kernel width, which also largely determines how smooth the resulting decision surface is.
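The linear soft-margin SVM described above can be sketched by minimizing 0.5·‖w‖² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)) with full-batch subgradient descent. This is a minimal illustration under stated assumptions (two classes labelled +1/−1, linear kernel, hypothetical function names, invented toy data), not the libsvm implementation behind GEE's ee.Classifier.libsvm. A grid search over C (and over gamma for an RBF kernel) would wrap the training call in a loop over candidate values and keep the one with the best validation accuracy.

```python
def train_linear_svm(X, y, C=1.0, epochs=2000, lr=0.1):
    """Minimise 0.5*||w||^2 + C*sum(hinge) by subgradient descent.

    Labels y must be +1 or -1. The bias b is not regularised.
    Returns (weights, bias).
    """
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for epoch in range(1, epochs + 1):
        step = lr / epoch ** 0.5            # diminishing step size
        grad_w, grad_b = list(w), 0.0       # gradient of the 0.5*||w||^2 term
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:              # hinge loss is active for this sample
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - step * g for wj, g in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def svm_predict(w, b, x):
    """Side of the separating hyperplane: +1 or -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Invented two-band pixels: -1 = water, +1 = vegetation.
X = [[0.10, 0.05], [0.15, 0.08], [0.20, 0.10], [0.70, 0.40], [0.75, 0.45], [0.80, 0.50]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y, C=1.0)
print(w, b, svm_predict(w, b, [0.05, 0.02]), svm_predict(w, b, [0.9, 0.6]))
```

Larger C penalizes margin violations more heavily and pushes the solution towards a hard margin; smaller C tolerates violations in exchange for a smaller ‖w‖.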

Accuracy assessment

After classification with the machine learning techniques, the accuracy of the classified images was assessed using new, independent validation samples extracted from high-quality imagery of the same period on the GEE platform. To clarify the spatial distribution of each land cover type, site data for terrestrial environment parameterization from Google Earth were also used.

To evaluate accuracy quantitatively, producer's accuracy (PA), user's accuracy (UA), and overall accuracy (OA) were derived from the standard confusion matrix. Although frequently used, the kappa (κ) coefficient was not included in the present investigation because it is strongly correlated with overall accuracy (Olofsson et al. 2014). A 95% confidence interval is reported for each accuracy index after adjustment.
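A minimal sketch of these measures, assuming the common remote-sensing layout in which rows are mapped (classified) classes and columns are reference classes. The 95% interval uses the simple normal approximation for overall accuracy, which is an assumption: the stratified, area-weighted estimators of Olofsson et al. (2014) give different intervals. Kappa is computed as well because it is reported later in Table 5. In GEE, ee.ConfusionMatrix offers accuracy(), kappa(), producersAccuracy() and consumersAccuracy() for the same quantities; the function below is hypothetical.

```python
import math


def accuracy_report(cm, labels):
    """cm[i][j] = number of samples mapped as class i whose reference class is j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]                    # mapped totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # reference totals
    diag = [cm[i][i] for i in range(k)]
    oa = sum(diag) / n
    ua = {labels[i]: diag[i] / row_tot[i] for i in range(k)}   # user's accuracy
    pa = {labels[j]: diag[j] / col_tot[j] for j in range(k)}   # producer's accuracy
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    half = 1.96 * math.sqrt(oa * (1 - oa) / n)                  # normal-approx. 95% CI
    return {"OA": oa, "OA_CI95": (oa - half, oa + half),
            "UA": ua, "PA": pa, "kappa": kappa}


# Invented two-class example: 100 validation samples.
report = accuracy_report([[45, 5], [5, 45]], ["forest", "agriculture"])
print(report["OA"], report["kappa"], report["OA_CI95"])
```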

Results and discussion

Land use/land cover classification using GEE

The present research uses Sentinel-2 Level-1C (10 m) and Landsat-8 Level-1 (30 m) data to assess the efficacy of several machine learning approaches for LULC and NDVI classification. Figures 2 and 3 show how Sentinel-2 and Landsat-8 images in GEE were used as the primary input for the RF, CART, and SVM classifiers for LULC from 2015 to 2021. Pixels contaminated by clouds were removed from all available data using the cloud-masking technique available in GEE, and temporal aggregation (median, mean, minimum/maximum) was used to fill the gaps in cloudy data. For the study period, the Sentinel-2 and Landsat-8 images were composited using the median. The Normalized Difference Vegetation Index (NDVI), a commonly used index, was then computed and used as an additional input for LULC classification to represent the properties of vegetation and water bodies.
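The cloud masking, median compositing and NDVI steps can be sketched for a single pixel. The example assumes Sentinel-2 Level-1C inputs whose QA60 band flags opaque clouds in bit 10 and cirrus in bit 11, with NIR in B8 and red in B4 (for Landsat-8 OLI, NDVI uses B5 and B4 instead). Function names and sample values are hypothetical; in GEE the same logic would be written with image collections (for example with ee.Image.normalizedDifference) rather than Python lists.

```python
from statistics import median

CLOUD_BIT = 1 << 10   # QA60 bit 10: opaque clouds
CIRRUS_BIT = 1 << 11  # QA60 bit 11: cirrus clouds


def is_clear(qa60):
    """True when neither the opaque-cloud nor the cirrus bit is set."""
    return (qa60 & CLOUD_BIT) == 0 and (qa60 & CIRRUS_BIT) == 0


def ndvi(nir, red):
    return (nir - red) / (nir + red)


def median_composite_ndvi(observations):
    """observations: (B8, B4, QA60) tuples for one pixel over a year.

    Masks cloudy observations, takes the per-band median of the clear ones,
    then computes NDVI from the composite. Returns None if nothing is clear.
    """
    clear = [(b8, b4) for b8, b4, qa in observations if is_clear(qa)]
    if not clear:
        return None
    nir = median(b8 for b8, _ in clear)
    red = median(b4 for _, b4 in clear)
    return ndvi(nir, red)


obs = [(3000, 500, 0), (3200, 600, 0), (9000, 8000, 1024), (2800, 400, 2048), (3100, 550, 0)]
print(median_composite_ndvi(obs))
```

The median is robust to residual haze or shadow that the bitmask misses, which is one reason it is a common compositing choice.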

Fig. 3

LULC and NDVI for different machine learning techniques

The training and validation datasets were generated from image interpretation. A total of 1617 training samples were used for classification. According to a rule of thumb (Loukika et al. 2021), each class should have at least 567 (RF), 521 (CART), and 527 (SVM) training samples for classification. Each class received 66 to 81 validation samples and 81 to 97 training samples (Fig. 2). The same training and validation data were classified with the SVM, RF, and CART algorithms. The LULC categories were agricultural land, forest, water bodies, built-up land, fallow land, wasteland, and others. Experiments showed that the optimal number of cross-validation folds was 5–10, and this value was used as an input for CART (Belgiu and Drăguţ 2016). LULC maps for 2015 to 2021 were created with the RF, CART, and SVM models, and the random forest strategy proved the most effective in terms of results and accuracy for identifying LULC classes. Table 1 gives the Sentinel-2 and Landsat-8 band information, and Table 3 gives the resulting LULC areas in km².
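The cross-validation folds mentioned above can be built so that every class is spread evenly across folds. The sketch below is an assumption about how such folds might be formed (stratified assignment, hypothetical function name, invented labels); the study does not state how its folds were drawn.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=5, seed=42):
    """Assign each sample index to one of k folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)  # deal indices round-robin into folds
    return folds


labels = ["water"] * 10 + ["forest"] * 10
folds = stratified_k_folds(labels, k=5)
for held_out in folds:
    train_idx = [i for f in folds if f is not held_out for i in f]
    # train a classifier on train_idx and evaluate it on held_out here
print([len(f) for f in folds])
```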

The NDVI method for time-series classification

This study also demonstrated the suitability of the NDVI approach for time-series classification using the Landsat time series. NDVI derived from MODIS imagery had previously achieved highly precise values for satellite time series (Fig. 3), demonstrating the classifier's ability to handle multidimensional analysis to some extent. The assessment in this study showed a good overall accuracy of roughly 74%. The accuracy values for the agricultural land categories (agricultural classes predominate in the study area, covering roughly 78% of the total area) were significant, although the water body and forest categories also contributed to this total value. To improve the recognition of each LULC type and boost classification accuracy, we calculated the NDVI time series and performed a cluster analysis using the spectral characteristics of the training data for all LULC classes; the relationship between NDVI and the LULC classes is shown in Fig. 4. Surprisingly, the spectral characteristics of the different vegetation types within the agriculture classes caused confusion in the agricultural land classification. This is particularly true for short-term crops in agroforestry settings, which have a complex agricultural pattern of seasonal production and harvest of short-term irrigated and rain-fed crops, followed by bare soil that is readily mistaken for long-term farmland and permanent pastures.
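The text does not name the clustering algorithm, so the sketch below assumes plain k-means (Lloyd's algorithm) on per-pixel NDVI profiles, with a deterministic farthest-first initialization standing in for random seeding. The profiles are invented: a stable forest signal, a seasonal crop signal, and a near-zero water signal.

```python
def _sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Lloyd's k-means on equal-length NDVI profiles (lists of floats)."""
    # Farthest-first initialisation: start at points[0], then repeatedly add the
    # point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: _sqdist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


profiles = [
    [0.80, 0.82, 0.81, 0.79], [0.78, 0.80, 0.79, 0.80],        # forest-like
    [0.20, 0.65, 0.70, 0.25], [0.22, 0.60, 0.72, 0.20],        # seasonal crop
    [-0.10, -0.05, -0.08, -0.12], [-0.05, -0.10, -0.07, -0.09],  # water-like
]
labels, centroids = kmeans(profiles, k=3)
print(labels)
```

A seasonal crop profile that ends the year as bare soil sits between the other clusters in this space, which is consistent with the confusion within agricultural classes described above.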

Fig. 4

Spatial pattern of NDVI from 2015 to 2021

Linear regression model for NDVI

The Pearson correlation coefficient (r) measures how closely two datasets are associated; for a simple linear regression, the coefficient of determination R² equals r². Figure 5 and Table 2 show the temporal pattern of NDVI with respect to LULC. In the current investigation, R² was used to express the degree of correlation between the modelled and observed data sets, and the root mean square error (RMSE) was used to examine the discrepancy between predictions and observations, that is, the relative NDVI variability and the prediction error (Table 2). Together, the RMSE, mean squared error (MSE), mean absolute percentage error (MAPE), and R² values in Table 2 describe the spatial variation of the modelled NDVI in detail. From 2015 to 2021, R² reached a maximum of 0.22 in 2017 and a minimum of 0.05 in 2018. Figure 5 shows the annual mean, maximum, and minimum NDVI for the entire state. In 2015 and 2020, linear trends with R² values of 0.12 and 0.11, respectively, were observed, while 2016 and 2021 showed a linear trend with a coefficient of determination of 0.08, comparable to 2018. For Karnataka, the NDVI maximum, minimum, and mean showed the strongest correlation (R² = 0.169, RMSE = 0.101, MSE = 0.010, P = 0.001). A weaker NDVI relationship (R² = 0.014, RMSE = 0.086, MSE = 0.007, P = 0.001) shows only a modest correlation but nevertheless satisfies the model indices in Table 2.
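The goodness-of-fit statistics listed above can be computed as follows. This is a minimal sketch assuming NDVI (y) is regressed on a single predictor (x), such as observation time or a modelled value; the function name and the toy numbers are hypothetical, not values from Table 2.

```python
import math


def linear_fit_metrics(x, y):
    """Ordinary least-squares fit y = intercept + slope*x with R2, MSE, RMSE and MAPE."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    mse = ss_res / n
    return {
        "slope": slope,
        "intercept": intercept,
        "R2": 1 - ss_res / ss_tot,    # equals Pearson r squared for a simple linear fit
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAPE": 100 * sum(abs(r / yi) for r, yi in zip(resid, y)) / n,  # y must be non-zero
    }


print(linear_fit_metrics([1, 2, 3], [1, 3, 2]))
```

For the toy data the Pearson r is 0.5 and the fitted R² is 0.25, illustrating R² = r².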

Fig. 5

Temporal pattern of NDVI (mean, maximum and minimum) from 2015 to 2021

Table 2 Results of the simple linear regression (SLR) for NDVI

LULC pattern from 2015 to 2021

Figure 6 shows the LULC maps derived from Landsat-8 and Sentinel-2 for 2015–2021, together with a table giving the percentages of the land use classes and, for 2015, the producer's and overall accuracy, the kappa coefficient, and the area of each land type in square kilometres. Table 3 gives the LULC areas in square kilometres. Agricultural land (122,789.4 km², or 64.03% of the study area) is the largest LULC class, followed by forest land (37,678.56 km², or 19.65%). Wasteland (13,897.63 km², or 7.25%), water bodies (10,060.7 km², or 5.25%), and built-up area (4023.46 km²) make up the other three land use categories; Table 4 gives the LULC change percentages. Throughout the study period, agriculture and forest predominate; however, the area of water bodies increased owing to the construction of new canals and water bodies under government plans, with values of about 10,060.7, 10,530.8, 10,450.7, 11,230.7, 11,654.7, and 11,789.7 km² from 2015 to 2021, respectively (Table 3). Built-up and agricultural land expanded, while wasteland decreased from 2015 to 2021. LULC maps were produced annually for the whole of Karnataka from 2015 to 2021. Nearly 7.5% of Karnataka's land area changed LULC class, which characterizes the pattern of change (Table 4). As these case studies demonstrate, time-series LULC maps created from RS images can reveal the changes that the land surface has undergone. Figure 6 shows the spatial pattern of LULC for CART, RF, and SVM.
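Class areas and change figures like those above can be derived from classified-pixel counts. This sketch assumes areas come from pixel counts (30 m Landsat-8 or 10 m Sentinel-2 pixels) and that shares are expressed against the 191,761 km² state area; the helper names are hypothetical, and since the text does not say whether Table 4 reports percentage-point or relative change, both are shown.

```python
STATE_AREA_KM2 = 191_761  # total area of Karnataka (Study area section)


def pixels_to_km2(n_pixels, pixel_size_m):
    """Area covered by n square pixels of side pixel_size_m metres."""
    return n_pixels * pixel_size_m ** 2 / 1e6


def share_percent(area_km2, total_km2=STATE_AREA_KM2):
    """Share of the total area, in %."""
    return 100 * area_km2 / total_km2


def change_points(share_start, share_end):
    """Change in percentage points, e.g. 64.03 -> 67.81 gives 3.78."""
    return share_end - share_start


def change_relative(area_start, area_end):
    """Relative change in % of the starting area."""
    return 100 * (area_end - area_start) / area_start


print(round(share_percent(122_789.4), 2))   # agricultural land, 2015
print(round(share_percent(37_678.56), 2))   # forest land, 2015
print(round(change_relative(10_060.7, 11_789.7), 2))  # water bodies, first vs last value
```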

Fig. 6

Spatial pattern of LULC with respect to CART, RF and SVM

Table 3 Results of the LULC in km²
Table 4 Results of the LULC change percentage (%)

The accuracy assessment

In this investigation, stratified random sampling was used and applied separately to each LULC class. For the RF, CART, and SVM classifications, totals of 567, 521, and 527 validation samples and 221, 203, and 199 training samples, respectively, were extracted from the districts of Karnataka. For the Landsat-8 data (2015–2021), the CART model achieved an overall accuracy of 84.01% and a kappa coefficient of 0.704; for the Sentinel-2 images (Fig. 7 and Table 5), the overall accuracy was about 85.17% and the kappa coefficient 0.723. The RF model achieved an average overall accuracy of 93.98% (kappa 0.902) for Landsat-8 and about 95.157% (kappa 0.916) for Sentinel-2. The SVM model achieved a mean overall accuracy of 89.00% (kappa 0.842) for Landsat-8 and approximately 92.13% (kappa 0.88) for Sentinel-2. The overall accuracy (%) and kappa coefficient of each classifier are shown in Fig. 7; for Karnataka, the overall accuracy of the RF classifier is consistently higher than that of CART and SVM. Figure 7 shows the temporal variation in accuracy (%), and Table 5 gives the overall accuracy and kappa coefficient of Sentinel-2 and Landsat-8 for each machine learning method from 2015 to 2021.
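Stratified random sampling of the kind described above can be sketched on a small classified grid: pixels are grouped by class (the strata) and a fixed number is drawn at random from each. The equal-allocation rule, function name and toy map are assumptions; the study does not state its per-class allocation, and in GEE a comparable operation is available as ee.Image.stratifiedSample.

```python
import random
from collections import defaultdict


def stratified_sample(class_map, n_per_class, seed=0):
    """class_map: 2-D list of class labels.

    Returns {label: [(row, col), ...]} with up to n_per_class randomly chosen
    pixel positions per class (all pixels if the class is smaller).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r, row in enumerate(class_map):
        for c, label in enumerate(row):
            strata[label].append((r, c))
    return {label: rng.sample(cells, min(n_per_class, len(cells)))
            for label, cells in strata.items()}


class_map = [
    ["water", "water", "forest"],
    ["forest", "forest", "forest"],
    ["forest", "forest", "forest"],
]
samples = stratified_sample(class_map, n_per_class=3)
print(samples)
```

Sampling each class separately guarantees that rare classes such as water bodies are represented in the validation set, which simple random sampling over the whole map would not.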

Fig. 7

Accuracy Assessment of LULC (SVM, RF and CART) in %

Table 5 Overall accuracy and kappa coefficient of Sentinel-2 and Landsat-8 for the machine learning methods

For effective land use management and planning, which RS (remote sensing) and GIS technologies can support, a better understanding of the many connections between changing land use/cover and the livelihoods of rural inhabitants is essential. Scientists and researchers are working to improve existing methods, cloud platforms, and GIS tools. GEE provides a valuable environment for advanced analysis and classification procedures and for producing multi-temporal composite images. NDVI and land use maps for different periods were estimated for the region using GEE algorithms, and change detection maps were created using the GEE machine learning system, ERDAS IMAGINE 2014, and ArcGIS 10.8. Change detection mapping is a very popular and effective application of GIS software. Many LULC classes have changed significantly over the past five years.

Conclusions

LULC class change in Karnataka over six years was detected using remote sensing and GIS technology. The study classified LULC across the state of Karnataka using machine learning techniques on the GEE platform.

  1. Landsat-8 and Sentinel-2 data covering six years were used, and the effectiveness of RF, SVM, and CART for LULC classification on the GEE platform was examined. The reliability of LULC classification from satellite images depended on the classifier employed, and the efficacy of each classifier for each class can be assessed through per-class accuracy evaluation.

  2. Multitemporal satellite data are needed to capture both temporal and spatial phenomena. The results show that over six years agricultural land increased from 64.03% to 67.81% of the area (3.78 percentage points), while water bodies increased from 5.25% to 6.3%. The increase in built-up land implies extraordinary spatial development in the study region. A total of 567, 521, and 527 validation samples were extracted from Karnataka districts for the RF, CART, and SVM classification algorithms, respectively.

  3. According to the findings of this study, agricultural land (122,789.4 km², or 64.03%) is the largest LULC class, followed by forest land (37,678.56 km², or 19.65%).

  4. Reflecting the government's new development strategies and plans, the LULC and NDVI images show that forest and agricultural land increased between 2015 and 2021. From 2015 to 2021, the NDVI coefficient of determination (R²) reached a maximum of 0.22 in 2017 and a minimum of 0.05 in 2018.

  5. Linear NDVI trends with R² values of 0.12 and 0.11 were detected in 2015 and 2020, respectively. In addition, 2016 and 2021 exhibited a linear trend with a coefficient of determination of 0.08, comparable to 2018.

For LULC classification, the GEE cloud platform is convenient and adaptable, especially when there are many input features. By using GEE, we overcame the limitations of desktop systems and accelerated the collection and processing of large geospatial datasets.