
1 Introduction

Satellites gather an enormous amount of data every day, and this data becomes useful once it is processed to extract insights. The essence of satellite imagery is that it provides access to remotely sensed data [1]. Remotely sensed data provides relevant land cover information that can be accessed irrespective of time and location, without any need to visit a particular site. Data gathered in this way can be used to analyze vegetation cover over a period of time, which helps predict trends such as an increase or depletion of vegetation. Cases where vegetation cover is depleted can then be studied to research the resulting impacts on nature.

Change detection techniques are not limited to vegetation changes; the selected spectral bands determine what is being detected. Other changes that can be explored with change detection techniques include changes in land cover and in water bodies such as rivers, lakes, and oceans. Computer-based algorithms make the process of finding change faster and more efficient, and they can easily accommodate improvements and enhancements. A significant level of vegetation and forest cover keeps the ecosystem balanced.

Over 26% of the vegetation area was lost to habitation between 2005 and 2020. If these patterns of land use change persist, the coming years will clearly be unfavorable. The majority of the land is covered by built-up areas (industries, commercial areas) rather than agricultural areas; however, agricultural areas should not be considered for conversion to built-up areas [2]. Digital agricultural applications are crucial for remotely tracking harvests and assessing the state of farmlands. Because of their effectiveness in identifying land cover components, high-resolution images have drawn much more attention [3]. Vegetation change estimation applies mainly to urban areas, where the vegetation index decreases as the population increases, but it is not limited to any particular region.

The NDVI approach is commonly used to detect changes in the amount and distribution of vegetation, as well as in land use and land cover. Advances in the spatial and spectral resolution of remote sensing data (from broad bands to narrow bands) now make it possible to work at the micro level [4].

1.1 Literature Study

Sumanta Bid [4] observed that the NDVI technique with specific thresholds can detect changes in the vegetation of a place using remote sensing. You, Cao, and Zhou [5] focused on the different kinds of change detection techniques that can be used for urban change detection. Song, Choi, Han, and Kim [6, 7] performed change detection in hyperspectral imagery using convolutional neural networks. Previous research on vegetation change detection in multispectral images has either focused primarily on a single algorithm such as NDVI, or has focused on urban changes and hyperspectral images. There is therefore a need to detect vegetation change accurately in satellite images and to compare techniques in order to identify the best one.

The proposed study extends this research by comparing K-means and NDVI for vegetation change detection on multispectral satellite images, and determines how accurately each algorithm detects changes in vegetation over the years.

2 Proposed Work

Change detection identifies the spatial changes induced by man-made or natural processes in multi-temporal satellite images. Remote sensing for change detection can support urban planning and expansion by considerably improving land utilization. The objective of this project is to gather Landsat 8 satellite images of the Vijayawada area at two timestamps, apply NDVI, and compare the results with those of PCA and K-means.

2.1 Datasets

The dataset is obtained from EarthExplorer [8] using USGS credentials. Images are located by a specified set of coordinates enclosed in a circular area. The radius considered for this project is 250 m, centred near Vijayawada, as shown in Fig. 1. The available datasets are displayed as search results, and each dataset has variants that differ in year, cloud cover, area covered, and temperature constraints.

Fig. 1 Image location view in EarthExplorer

The footprint of the selected dataset is marked on the map, as shown in Fig. 2, and its specification is given in Table 1. The image is displayed as a tile on the map. When the dataset is downloaded, the bands available for the Landsat 8 satellite are listed [9]; a USGS account is required for the download. The data collected is Landsat Collection 2 Level-2 data, specified as Landsat 8–9 OLI (Operational Land Imager)/TIRS C2 L2. Images with vegetation as the feature class and minimal cloud cover are selected to obtain accurate results.

Fig. 2 Image location tile view in EarthExplorer

Table 1 Feature details of the dataset

2.2 Data Extraction and Pre-processing

QGIS is Geographic Information System (GIS) software that allows users to explore and edit spatial data, as well as to compose and export various kinds of maps. The images are selected and pre-processed using the QGIS Semi-automatic Classification Plugin (SCP) [10], and are cropped to the required coordinates to map a specific location.
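In this work, cropping is done interactively in QGIS. For readers who want to reproduce the step in code, the sketch below crops a raster array to a bounding box given in map coordinates. It assumes a north-up raster georeferenced with GDAL's six-element geotransform convention; the function name `crop_to_bbox` is hypothetical and is not part of the SCP plugin.

```python
import numpy as np


def crop_to_bbox(array, geotransform, xmin, ymin, xmax, ymax):
    """Crop a north-up raster array to a bounding box in map coordinates.

    `geotransform` follows GDAL's convention:
    (origin_x, pixel_width, row_rotation, origin_y, column_rotation, pixel_height),
    where pixel_height is negative for north-up images.
    Returns the cropped array and the geotransform of the cropped window.
    """
    origin_x, pixel_w, rot_x, origin_y, rot_y, pixel_h = geotransform
    if rot_x != 0 or rot_y != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    # Map coordinates -> pixel column/row offsets (rows grow southwards).
    col0 = int(np.floor((xmin - origin_x) / pixel_w))
    col1 = int(np.ceil((xmax - origin_x) / pixel_w))
    row0 = int(np.floor((ymax - origin_y) / pixel_h))
    row1 = int(np.ceil((ymin - origin_y) / pixel_h))
    # Clamp the window to the raster extent.
    n_rows, n_cols = array.shape[:2]
    col0, col1 = max(col0, 0), min(col1, n_cols)
    row0, row1 = max(row0, 0), min(row1, n_rows)
    cropped = array[row0:row1, col0:col1]
    new_geotransform = (origin_x + col0 * pixel_w, pixel_w, 0.0,
                        origin_y + row0 * pixel_h, 0.0, pixel_h)
    return cropped, new_geotransform
```

Because only the first two axes are sliced, the same call works for a single band or for a stacked (rows, cols, bands) cube; any coordinates passed in are up to the user and none are taken from the study area.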

2.3 Design Methodology

Multispectral images of the Vijayawada area at two timestamps, 2013 and 2021, are collected as eleven individual bands of Landsat Collection 2 Level 1 data, shown in Fig. 3. The Level 1 data is specified as Landsat 8–9 OLI/TIRS C2L1. The bands relevant to different kinds of change detection are listed in Table 2.

Fig. 3 Image bands of two timestamps

Table 2 Landsat 8 band designations

For vegetation analysis, bands 4, 5, and 6 (Red, Near Infrared, and Short-wave Infrared 1) are considered. These bands are stacked using QGIS and pre-processed to remove noise. The first change detection approach is based on calculating NDVI, and the second combines PCA and K-means.

2.4 Procedure

NDTS

The first step is importing the libraries needed to open GeoTIFF images. The required libraries are osgeo (GDAL) with its gdalconst module, NumPy, SciPy, IPython, and matplotlib; these are third-party Python libraries, not built-in modules. The stacked images produced by pre-processing with the QGIS SCP plugin are then loaded and assigned names so that the library functions can be applied to them. Editing spatial data in QGIS can be enabled using its digitizing tools [11].
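A minimal sketch of the loading step, assuming GDAL's Python bindings (`osgeo.gdal`) are installed. `read_geotiff` reads every band of a stacked GeoTIFF into one NumPy array, and `stack_bands` stacks separately loaded band arrays (for example, bands 4, 5, and 6) in the same layout. Both function names are illustrative and are not taken from the original code.

```python
import numpy as np


def stack_bands(bands):
    """Stack equally shaped 2-D band arrays into a (rows, cols, n_bands) cube."""
    shapes = {band.shape for band in bands}
    if len(shapes) != 1:
        raise ValueError(f"all bands must have the same shape, got {shapes}")
    return np.dstack(bands)


def read_geotiff(path):
    """Read every band of a (stacked) GeoTIFF; returns (cube, geotransform).

    Requires GDAL's Python bindings. The import is local so that
    stack_bands can be used without GDAL installed.
    """
    from osgeo import gdal

    dataset = gdal.Open(path)
    if dataset is None:
        raise FileNotFoundError(path)
    # GDAL band indices are 1-based.
    bands = [dataset.GetRasterBand(i).ReadAsArray().astype(np.float32)
             for i in range(1, dataset.RasterCount + 1)]
    return stack_bands(bands), dataset.GetGeoTransform()
```

For example, `cube, geotransform = read_geotiff("stacked_2013.tif")` would load one stacked image; the file name is hypothetical.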

Next, histograms are plotted for both stacked images. A histogram condenses the image data into an easily understood visual by grouping the pixel values into logical ranges, or bins. For vegetation, NDTS is computed in the same way as NDVI, with 0.1 as the threshold value.

$$\text{NDTS} = \frac{I_{t2} - I_{t1}}{I_{t2} + I_{t1}}$$
(1)

Here, I_t1 represents the image at timestamp t1 (2013) with red reflectance, and I_t2 represents the image at timestamp t2 (2021) with near-infrared reflectance.
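Equation (1) translates directly into array arithmetic. The sketch below assumes the two inputs are co-registered arrays of equal shape; pixels where I_t1 + I_t2 is zero (for example, no-data areas) are set to 0, which is an assumption rather than a rule stated in the text. The function name `ndts` is illustrative.

```python
import numpy as np


def ndts(image_t1, image_t2, eps=1e-12):
    """Normalized difference of two co-registered images, Eq. (1).

    Computes (I_t2 - I_t1) / (I_t2 + I_t1); pixels whose sum is
    (almost) zero are left at 0 instead of producing inf/NaN.
    """
    i1 = np.asarray(image_t1, dtype=np.float64)
    i2 = np.asarray(image_t2, dtype=np.float64)
    total = i2 + i1
    out = np.zeros_like(total)
    valid = np.abs(total) > eps
    np.divide(i2 - i1, total, out=out, where=valid)
    return out
```

For non-negative reflectances the result lies in [-1, 1], with 0 meaning no difference between the two timestamps.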

NDVI is a simple arithmetic indicator that can be applied to remote sensing measurements to determine whether the target being examined contains significant vegetation. The array values of the two images are used to compute the index, with 0.1 as the threshold value for differentiating the satellite images. The NDVI (Normalized Difference Vegetation Index) has been used extensively to study the relationship between fluctuations in vegetation growth and the rate of spectral variation. It is useful both for determining the growth of green vegetation and for spotting changes in vegetation [12].
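For reference, the standard per-image NDVI is (NIR − Red) / (NIR + Red). The sketch below computes it for each date and, as one possible reading of the 0.1 threshold mentioned in the text, treats pixels with NDVI above 0.1 as vegetation and flags pixels that are vegetation at t1 but not at t2. This use of the threshold and the helper names are assumptions for illustration.

```python
import numpy as np

NDVI_THRESHOLD = 0.1  # threshold value quoted in the text


def ndvi(red, nir):
    """Standard NDVI = (NIR - Red) / (NIR + Red); 0 where NIR + Red == 0."""
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    total = nir + red
    out = np.zeros_like(total)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


def vegetation_loss(red_t1, nir_t1, red_t2, nir_t2, threshold=NDVI_THRESHOLD):
    """Boolean mask of pixels that are vegetation (NDVI > threshold) at t1 but not at t2."""
    veg_t1 = ndvi(red_t1, nir_t1) > threshold
    veg_t2 = ndvi(red_t2, nir_t2) > threshold
    return veg_t1 & ~veg_t2
```

Swapping the two dates in `vegetation_loss` gives the complementary mask of vegetation gain.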

Calculating NDTS produces image files as output. The NDTS values are then squared for comparison with the original image; squaring is intended to increase the scope for detecting change among the marginal values estimated during the NDTS calculation. Noise is then removed using a 3 × 3 mode filter. These steps improve the accuracy and clarity with which the changes are displayed.
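The squaring and 3 × 3 mode filtering can be sketched as follows. The text does not say exactly where the 0.1 threshold is applied, so this sketch assumes it is applied to the squared NDTS. For a binary change mask, a 3 × 3 mode filter reduces to a majority vote over the nine pixels in each window; replicating the image edges is also an assumption. The function names are illustrative.

```python
import numpy as np


def mode_filter_3x3(mask):
    """3 x 3 mode (majority) filter for a binary mask; image edges are replicated."""
    m = np.asarray(mask, dtype=np.uint8)
    padded = np.pad(m, 1, mode="edge")
    rows, cols = m.shape
    votes = np.zeros((rows, cols), dtype=np.uint8)
    for dr in range(3):
        for dc in range(3):
            votes += padded[dr:dr + rows, dc:dc + cols]
    # With 9 binary votes, the mode is 1 exactly when at least 5 votes are 1.
    return votes >= 5


def change_mask(ndts_values, threshold=0.1):
    """Square NDTS, threshold it, and clean the result with a 3 x 3 mode filter."""
    squared = np.square(np.asarray(ndts_values, dtype=np.float64))
    return mode_filter_3x3(squared > threshold)
```

An isolated changed pixel is removed by the filter, while compact changed regions survive (their corners may be trimmed).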

The detected changes are displayed as a histogram and a change map. The change map indicates change using two colors: black signifies no change, and white indicates observed change. In this way, NDVI indicates vegetation change between the two images.
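The two outputs described above, a pair of histograms and a black-and-white change map, can be produced with NumPy alone, leaving plotting to matplotlib (for instance `plt.imshow(img, cmap="gray")`). Using one shared set of bins for both images is a design choice so that the two curves are directly comparable; the helper names are illustrative.

```python
import numpy as np


def shared_histograms(image1, image2, bins=50):
    """Histogram two images over one common set of bins (NaNs are ignored)."""
    values1 = image1[np.isfinite(image1)]
    values2 = image2[np.isfinite(image2)]
    lo = min(values1.min(), values2.min())
    hi = max(values1.max(), values2.max())
    edges = np.linspace(lo, hi, bins + 1)
    counts1, _ = np.histogram(values1, bins=edges)
    counts2, _ = np.histogram(values2, bins=edges)
    return edges, counts1, counts2


def change_map_image(mask):
    """8-bit change map: white (255) = change, black (0) = no change."""
    return np.where(np.asarray(mask, dtype=bool), 255, 0).astype(np.uint8)


def percent_changed(mask):
    """Share of pixels flagged as changed, in percent."""
    m = np.asarray(mask, dtype=bool)
    return 100.0 * m.sum() / m.size
```

The two count arrays can be drawn with `plt.plot(edges[:-1], counts1)` and `plt.plot(edges[:-1], counts2)`; matplotlib's default color cycle renders them blue and orange.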

Principal Component Analysis (PCA)

The primary purpose of PCA is dimensionality reduction: a large number of variables is condensed into a smaller number, reducing the dimensionality of large datasets while retaining most of the information.

PCA includes the following operations:

The first step is standardization, which normalizes the range of the continuous initial variables so that each one contributes equally to the analysis. Mathematically, this is done by subtracting the mean of each variable from each of its values and dividing the result by the variable's standard deviation:

$$Z = \frac{{{\text{value}} - {\text{mean}}}}{{{\text{standard }}\,{\text{deviation}}}}$$
(2)

The second step is computing the covariance matrix. This step determines how the variables in the input dataset vary from the mean with respect to each other, that is, whether there is any relationship between them. For three variables a, b, and c, the covariance matrix is:

$$\left( {\begin{array}{*{20}c} {{\text{cov}} \left( {a,a} \right)} & {{\text{cov}} \left( {a,b} \right)} & {{\text{cov}} \left( {a,c} \right)} \\ {{\text{cov}} \left( {b,a} \right)} & {{\text{cov}} \left( {b,b} \right)} & {{\text{cov}} \left( {b,c} \right)} \\ {{\text{cov}} \left( {c,a} \right)} & {{\text{cov}} \left( {c,b} \right)} & {{\text{cov}} \left( {c,c} \right)} \\ \end{array} } \right)$$
(3)

The third step is to compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components. Principal components are new variables formed as linear combinations of the initial variables. n-dimensional data yields n principal components; for example, 10-dimensional data yields 10. Because PCA concentrates as much information as possible in the first component, the second component carries less information than the first, and so on.

The last step is to select the feature vector, which is formed from the most important principal components; components of low importance may be discarded. The components are ranked by importance by computing the eigenvectors and sorting them by their eigenvalues in descending order.
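The four PCA steps map onto a few NumPy calls. This is a generic sketch (standardize, covariance, eigen-decomposition, sort and project), not the authors' code; `np.linalg.eigh` is used because a covariance matrix is symmetric, and the function name `pca` is illustrative.

```python
import numpy as np


def pca(data, n_components):
    """Principal component analysis following the four steps above.

    data: array of shape (n_samples, n_features) with n_features >= 2.
    Returns (scores, eigenvalues sorted in descending order, feature vector).
    """
    x = np.asarray(data, dtype=np.float64)
    # Step 1: standardization, Eq. (2).
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant variables stay at 0 instead of dividing by 0
    z = (x - x.mean(axis=0)) / std
    # Step 2: covariance matrix of the variables, Eq. (3).
    cov = np.cov(z, rowvar=False)
    # Step 3: eigenvalues and eigenvectors of the symmetric covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Step 4: sort by eigenvalue (descending) and keep the leading components.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    feature_vector = eigenvectors[:, order][:, :n_components]
    scores = z @ feature_vector  # data expressed in the principal components
    return scores, eigenvalues, feature_vector
```

The share of information kept by each component can be read off as `eigenvalues / eigenvalues.sum()`.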

K-Means Clustering

The K-means clustering method partitions a dataset into K distinct, non-overlapping clusters, and it can be used for change detection [13]. The number of clusters K must be specified before K-means is applied; the algorithm then assigns each observation to one of the K clusters [14].

K-means chooses the centroids that minimize the inertia, that is, the sum of squared distances between each point x_i and its nearest centroid μ_j, as given by the equation below:

$$\sum_{i=0}^{n} \min_{j} \left( \left\| x_{i} - \mu_{j} \right\|^{2} \right)$$
(4)

First, choose the number of clusters K and randomly initialize K centroid locations. Next, compute the Euclidean distance between each point and each centroid. Assign each point to its closest centroid, then compute the mean of each cluster as its new centroid; these steps are repeated until the centroids no longer change. For a cluster containing the n points (x_1, y_1), …, (x_n, y_n), the new centroid position (X, Y) is:

$$X = \frac{x_{1} + x_{2} + x_{3} + x_{4} + \cdots + x_{n-1} + x_{n}}{n}$$
(5)
$$Y = \frac{y_{1} + y_{2} + y_{3} + y_{4} + \cdots + y_{n-1} + y_{n}}{n}$$
(6)
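The K-means procedure (random initialization, Euclidean assignment, and centroid update by Eqs. (5) and (6), iterated until the centroids stop moving) can be sketched in NumPy as follows. The function works for points of any dimension, not only (x, y); the fixed random seed and the iteration cap are illustrative choices, and the returned inertia is the objective in Eq. (4).

```python
import numpy as np


def _distances(x, centroids):
    """Euclidean distance from every point to every centroid, shape (n, k)."""
    return np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-means. Returns (labels, centroids, inertia)."""
    x = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Randomly pick k distinct points as the initial centroids.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its closest centroid.
        labels = _distances(x, centroids).argmin(axis=1)
        # New centroid = mean of its assigned points, Eqs. (5)-(6);
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment and inertia against the final centroids.
    distances = _distances(x, centroids)
    labels = distances.argmin(axis=1)
    inertia = float(np.sum(distances.min(axis=1) ** 2))  # objective of Eq. (4)
    return labels, centroids, inertia
```

Because the result depends on the random initialization, practical implementations usually run several initializations and keep the one with the lowest inertia.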

2.5 System Architecture

The architecture of the entire change detection system is shown in Fig. 4. Images of the individual bands are downloaded for timestamps t1 and t2, three bands are selected and stacked for each timestamp, and the change detection algorithms are applied to the two stacked images to detect the changes.

Fig. 4 System architecture diagram
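The text does not spell out how PCA and K-means are combined into the change map of Fig. 7. One widely used scheme, sketched here purely as an assumption, takes the absolute difference of the two images, describes each pixel by its 3 × 3 neighbourhood in that difference image, projects these vectors onto their leading principal components, and splits them into two clusters with K-means (K = 2); the cluster with the larger mean difference is labelled as change. The function name and parameters are hypothetical.

```python
import numpy as np


def pca_kmeans_change_map(image_t1, image_t2, n_components=3, n_iter=50):
    """Hypothetical PCA + K-means (K = 2) change map for two co-registered images.

    Returns a boolean array, True where change is detected.
    """
    diff = np.abs(np.asarray(image_t2, dtype=np.float64)
                  - np.asarray(image_t1, dtype=np.float64))
    if diff.max() == diff.min():
        return np.zeros(diff.shape, dtype=bool)  # no contrast to split into two clusters
    rows, cols = diff.shape
    padded = np.pad(diff, 1, mode="edge")
    # Feature vector per pixel: its 3 x 3 neighbourhood in the difference image.
    feats = np.stack([padded[r:r + rows, c:c + cols].ravel()
                      for r in range(3) for c in range(3)], axis=1)
    # PCA: centre the features and project onto the leading eigenvectors.
    centred = feats - feats.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    proj = centred @ top
    # K-means with K = 2, seeded with the pixels of least and greatest mean difference.
    mean_diff = feats.mean(axis=1)
    centroids = proj[[mean_diff.argmin(), mean_diff.argmax()]]
    for _ in range(n_iter):
        dist = np.linalg.norm(proj[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([proj[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(2)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # The cluster whose pixels have the larger mean difference is "change".
    cluster_means = [mean_diff[labels == j].mean() if np.any(labels == j) else -np.inf
                     for j in range(2)]
    return (labels == int(np.argmax(cluster_means))).reshape(rows, cols)
```

Seeding the two centroids deterministically from the extremes of the difference image avoids the run-to-run variation of random initialization for this two-class case.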

3 Results and Observations

Different band combinations are listed in Table 3. The bands selected for vegetation change detection are Red, Near Infrared, and Short-wave Infrared 1, which together cover wavelengths from 0.64 to 1.65 µm. The spatial resolution of these Landsat 8 image bands is about 30 m [15].

Table 3 Bands associated with landscape

In the histogram, the image values form a dense peak spanning a range from negative to positive values. A change map indicates the change between two images taken at different timestamps. The analysis of the vegetation index in the Vijayawada area shows a significant decrease in the vegetation of the city. Reports suggest that this decrease is due to urbanization and city development driven by population growth over the past few years, which increased the removal of agricultural areas and deforestation and thereby lowered the vegetation index. The change map and the associated graph show where the change occurs, giving an estimate of how the vegetation index has changed over the past few years.

The root mean square error (RMSE) is very helpful for measuring a model's performance during training, cross-validation, or even monitoring after deployment, and it is one of the most popular metrics for this purpose. It is a reasonable rating scale that is consistent with common statistical assumptions and is simple to understand. In Eq. (7), y(i) is the ith measurement, ŷ(i) is its corresponding prediction, and N is the total number of data points.

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left\| y(i) - \hat{y}(i) \right\|^{2}}{N}}$$
(7)
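Equation (7) in code, as a minimal sketch for two arrays of equal shape (for example, a reference image and an obtained image); the function name is illustrative.

```python
import numpy as np


def rmse(reference, estimate):
    """Root mean square error between two equally shaped arrays, Eq. (7)."""
    y = np.asarray(reference, dtype=np.float64)
    y_hat = np.asarray(estimate, dtype=np.float64)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))
```

For example, `rmse([0, 0], [3, 4])` is sqrt((9 + 16) / 2) ≈ 3.536.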

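Image quality is also compared using the mean square error (MSE) and the peak signal-to-noise ratio (PSNR), two measures commonly used to assess image compression. PSNR is a measure of the peak error, whereas MSE represents the average squared error between the original and compressed images. The lower the MSE, the lower the error; because PSNR is inversely related to MSE, a lower MSE yields a higher PSNR. In Eq. (8), R is the maximum possible pixel value of the input image (for example, 255 for 8-bit images).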

$${\text{PSNR}} = 10\log_{10} \left( {\frac{{R^{2} }}{{{\text{MSE}}}}} \right)$$
(8)
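Equation (8) in code. Since MSE = RMSE², PSNR can equivalently be written as 10 log10(R² / RMSE²). Assuming pixel values are scaled to [0, 1] so that R = 1, this reproduces the PSNR values reported in this section from the corresponding RMSE values (about 47.32 dB for NDTS and 36.41 dB for K-means); the value R = 1 is inferred from that agreement, not stated in the text. The function names are illustrative.

```python
import math

import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB, Eq. (8): 10 * log10(R**2 / MSE)."""
    y = np.asarray(reference, dtype=np.float64)
    y_hat = np.asarray(estimate, dtype=np.float64)
    mse = float(np.mean((y - y_hat) ** 2))
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_from_rmse(rmse_value, peak=1.0):
    """The same quantity expressed through RMSE, since MSE = RMSE**2."""
    return 10.0 * math.log10(peak ** 2 / rmse_value ** 2)
```

With `peak=255` the same functions apply to 8-bit images.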

The histogram in Fig. 5 shows how the two images differ: the blue curve represents stacked image 1 and the orange curve represents stacked image 2. The change maps in Figs. 6 and 7 show precisely where change is detected; white indicates observed change and black indicates no change. The RMSE (Root Mean Square Error), which measures the variation between the source image and the obtained image, is 0.0043071182 for NDTS and 0.0151108205 for K-means. The PSNR (Peak Signal-to-Noise Ratio), shown in Table 4, is 47.31626752546268 for NDTS and 36.41423705747562 for K-means. A higher PSNR indicates a higher-quality image, whereas a higher RMSE indicates a lower-quality image. Hence, NDTS shows better change detection results than K-means.

Fig. 5 Histogram depicting change between two images

Fig. 6 Change map obtained by NDTS

Fig. 7 Change map obtained by PCA K-means

Table 4 Metrics obtained

4 Conclusion

This project deals with change in vegetation. The analysis uses remote sensing techniques with satellite imagery, which give accurate results in less time. Areas where vegetation has expanded or been depleted show how land cover has been used over the years. Because change detection algorithms help identify locations where planting is required, they can help various organizations take steps toward the growth and management of vegetation. From this study of vegetation change detection in multispectral images, it can be concluded that NDTS performs better than PCA K-means.

Future work can extend this study to land cover change detection and water body change detection. Change detection algorithms could, for example, help assess how water levels have risen over the years due to global warming. The analysis could also be made clearer by extending the work to hyperspectral images. Newer algorithms based on complex deep neural networks may further improve accuracy and reduce processing time.