Introduction

There has been a rise in agricultural activity in arid and semi-arid regions throughout the globe, driven primarily by the growing population’s demand for agricultural products. The terraforming of deserts and wastelands into arable land using modern technologies (Marasco et al., 2012; Muniasamy, 2020; Patil et al., 2015; Singh, 1998; Tanwar et al., 2018) has its consequences. Arid lands were never suited to sustain heavy agricultural activity, and human terraforming abilities are in their infancy. Consequently, the drylands face new challenges, such as unprecedented farming activity and environmental change (El-Beltagy & Madkour, 2012; Sivakumar et al., 2005). An effective monitoring strategy must be implemented to collect data and observe these phenomena for better planning and management. Field surveys of these vast wastelands are one solution, but they are slow, labor-intensive, and costly. On the other hand, remote sensing satellites (Hazaymeh & Hassan, 2017; Johansen et al., 2008; Tueller, 1987) and unmanned aerial systems (UAS) (Ahmad et al., 2022; Sankey et al., 2018) provide a cost-efficient and rapid alternative for mapping and surveying such surfaces.

Even with the high temporal frequency and spatial resolution of remote sensing data from satellite constellations such as the Dove constellation, extracting useful information from the enormous volume of available data remains a challenge. Information from large collections of raster datasets can be extracted autonomously with statistical and machine learning classification algorithms. Once properly trained, these algorithms efficiently delineate the phenomena or classes of interest by segregating information from noise with minimal human interaction. The use of machine learning to extract features in agricultural fields is an ongoing research area, as an algorithm’s ability to delineate features varies from case to case and must be calibrated uniquely for each feature and region. A few studies have used remote sensing classification for these arid regions (Chen et al., 2020; Jabal et al., 2022; Krisnayanti et al., 2021; Ram & Kolarkar, 1993; Saha et al., 2018; Sharma et al., 2015), relying on moderate-resolution satellite data to map various natural and anthropogenic activities. With modern technology, high-resolution satellites like Worldview-2 (Alabi et al., 2016; Nouri et al., 2013; Rahman et al., 2018; Upadhyay et al., 2012; Zhang et al., 2019) provide data for measuring these parameters with better accuracy. However, at such high resolution, small objects that were previously invisible behave as noise and hinder classification accuracy (Adhikari et al., 2021). Hence, more refined automated machine learning (ML)-based classification methods and better algorithms are needed to provide enhanced results.

The proper selection of ML algorithms plays a vital role in the process of feature extraction. The random forest (RF) (Breiman, 2001) is a widely used and well-tested ML algorithm for feature extraction and classification in remote sensing (Belgiu & Drăguţ, 2016; Pal, 2005). The RF algorithm is an ensemble-based ML algorithm consisting of many independent decision tree classifiers trained on subsets of the training samples; the outputs of all trees are aggregated into a final prediction. It can handle large, high-dimensional datasets, provides variable importance scores that indicate the impact of each variable on output quality, is relatively resistant to overfitting, and processes bulk quantities of variables efficiently. Hence, it is used as the primary ML model in the current case study. Another major factor that decides a model’s classification accuracy is the set of input parameters (or variables) and training samples used for classification. Image textures (Bharati et al., 2004) are visual-interpretation features derived from the pixels themselves: spatial patterns, or specific arrangements of pixels, that distinguish a feature from the background or from other features. Textural information takes the surrounding pixels into consideration, thus adding more context to the features. In the current study, the textures of the various classes are visually distinct, and using a textural mathematical descriptor can further enhance the classification accuracy. Here, one such approach is discussed for delineating arid cropland from similar barren land using textural information.

Materials and methods

Study area and datasets

Northwestern Rajasthan in India is a dryland; most of it is covered in sand and xerophytic vegetation. The region suffers from severe erosion, low rainfall, and severe droughts. In the late 1960s, the green revolution influenced agricultural practices throughout the nation, and the geography of northwestern Rajasthan has undergone a dynamic shift in its land use and land cover since then, a transformation that is still ongoing. The combination of vast uninhabitable areas, extreme climatic conditions, and regional data complexities has resulted in poor monitoring of these regions. The study area (as shown in Fig. 1) is situated at Bhaluri village in Bikaner district, Rajasthan, India. It has an arid climate and depends on canals or seasonal rainfall for water. The region experiences extreme temperature variations and poor soil quality in terms of nutrients, and the terrain is highly erosive. The soil taxonomy of the area falls under Torripsamments (aridic Psamments): unconsolidated sand deposits, often found near shifting sand dunes, with no distinct soil horizons, low water-holding capacity, and a texture consisting entirely of loamy sand or coarser material. Seasonal (Kharif) farming is practised in small patches in the region.

Fig. 1 Study area in Bhaluri village

The image (shown in Fig. 1) is a subset of satellite imagery taken in October 2019 from Worldview-2, a high-resolution remote sensing satellite. Worldview-2 is a commercial Earth observation satellite by DigitalGlobe (launched 8 Oct 2009). The PAN band has a ground sample distance (GSD) of 0.46 m, the eight-band multispectral (MUS) imagery has a GSD of 1.84 m, and the radiometric resolution is 11 bits/pixel. It is a sun-synchronous satellite with an altitude of 770 km and an orbital period of 100 min. For this study, the Worldview-2 eight-band multispectral dataset was available. The bands are shown in Table 1.

Table 1 Worldview-2 satellite data MUS band details

Most of the data analysis was done using open-source libraries: in R/RStudio, raster (Hijmans et al., 2021), rgdal (Bivand et al., 2021), ggplot2 (Wickham, 2016), GGally (Schloerke et al., 2021), caret (Kuhn, 2021), randomForest (Liaw & Wiener, 2002), and RStoolbox (Leutner et al., 2021); in Python, GDAL (GDAL/OGR Contributors, 2021), cv2 (Bradski, 2000), NumPy (Harris et al., 2020), Matplotlib (Hunter, 2007), seaborn (Waskom, 2021), pandas, and scikit-image (van der Walt et al., 2014). For map publishing and data visualization, ArcGIS and ArcGIS Pro were used.

Reference data generation

Furthermore, to assess the validity of the classified results, the cropland was manually digitized (as shown in Fig. 2) and used as a reference for the automated random forest classification approach. This provides insight into the similarity between manual delineation and the automated strategies discussed above. Note that, in the manual delineation approach, small green patches and tree canopies were not marked as separate croplands; each distinctly visible parcel in the image was digitized as a whole.

Fig. 2 Manually delineated croplands

Methodology

The study area selected is a subset of an arid region that depicts the current scenario of arid croplands. The study was designed to analyze automatic cropland feature extraction for arid lands. Figure 3 shows the methodology followed in this paper. First, the atmospherically and terrain-corrected Worldview-2 data was collected, checked for radiometric and geometric corrections, and cropped to the desired study area. OSAVI (optimized soil-adjusted vegetation index) (Rondeaux et al., 1996) was derived; it is similar to NDVI (normalized difference vegetation index) (Tucker, 1979), but the influence of background soil is effectively reduced (Xue & Su, 2017). Here, band B7 and band B5 were taken as the NIR and red bands, respectively, to calculate OSAVI, as shown in Eq. (1).

$$\mathrm{OSAVI}\ (\mathrm{B9})=\frac{\rho_{\mathrm{NIR}}-\rho_{\mathrm{RED}}}{\rho_{\mathrm{NIR}}+\rho_{\mathrm{RED}}+0.16}$$
(1)

where ρ is reflectance in the respective band.
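Eq. (1) translates directly into code. The following is a minimal sketch assuming the B7 (NIR) and B5 (red) reflectance arrays have already been read into NumPy; the function name is illustrative:

```python
import numpy as np

def osavi(nir, red, L=0.16):
    """Optimized soil-adjusted vegetation index (Rondeaux et al., 1996), Eq. (1).

    nir, red: reflectance arrays (here, Worldview-2 bands B7 and B5).
    The constant L = 0.16 dampens the background-soil signal.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + L)
```

The inputs are expected as reflectance values; reading them from the raster with, e.g., GDAL or rasterio is left out of the sketch.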

Fig. 3 Methodological flowchart

The GLCM (gray level co-occurrence matrix) is a textural mathematical descriptor that differentiates various textures and has been widely used in remote sensing applications (Dhumal et al., 2019; Iqbal et al., 2021; Jia et al., 2012; Wickham, 2016). The GLCM is created from a grayscale image by taking a pixel and building a window of a given size around it. Within the window, the intensity of each pixel is compared with the central pixel, and the different combinations of gray levels that co-occur within the window are recorded in a matrix, which measures the variation of gray-level intensity with respect to the central pixel. Finally, as discussed below, various statistical measures are applied to derive the different GLCM variables. Thus, the GLCM might provide helpful information for separating crop patterns (dense) from wild vegetation (sparse) when combined with random forests. In a few previous works (Adhikari et al., 2021; Wickham, 2016), the GLCM did improve accuracy, but the present work investigates its ability to delineate croplands in an arid ecosystem. This highlights its application to semi-arid automatic cropland delineation, which assists in better and more convenient monitoring of the region.

The OSAVI band was used to calculate the GLCM (Hall-Beyer, 2017; Haralick et al., 1973) and the seven textural bands (mean, variance, homogeneity, contrast, dissimilarity, entropy, and second moment) shown in Eqs. (2) to (8), which were used as input parameters for classification. The following GLCM bands were used in the study:

  1. Mean (B10):

     $$M=\sum_{i=1}^{N}\sum_{j=1}^{N}i\,P(i,j)$$
     (2)

  2. Variance (B11):

     $$VAR=\sum_{i=1}^{N}\sum_{j=1}^{N}\left(i-\mu\right)^{2}P(i,j)$$
     (3)

  3. Homogeneity (B12):

     $$HOM=\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{P(i,j)}{1+\left(i-j\right)^{2}}$$
     (4)

  4. Contrast (B13):

     $$CON=\sum_{i=1}^{N}\sum_{j=1}^{N}\left(i-j\right)^{2}P(i,j)$$
     (5)

  5. Dissimilarity (B14):

     $$DIS=\sum_{i=1}^{N}\sum_{j=1}^{N}\left|i-j\right|\,P(i,j)$$
     (6)

  6. Entropy (B15):

     $$ENT=-\sum_{i=1}^{N}\sum_{j=1}^{N}P\left(i,j\right)\,\mathrm{log}\,P(i,j)$$
     (7)

  7. Second Moment (B16):

     $$SM=\sum_{i=1}^{N}\sum_{j=1}^{N}P\left(i,j\right)^{2}$$
     (8)

where,

\(P(i,j)\) = GLCM value on element (i, j)

N = number of gray levels used

\(\mu = \sum_{i=1}^{N}\sum_{j=1}^{N}i\,P(i,j)\), the GLCM mean
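The study derived these textures with existing R and Python libraries; purely to make Eqs. (2) to (8) concrete, a library-free sketch for a single window might look like the following (the function name and implementation details are illustrative; the GLCM here is symmetric and direction-averaged, with 16 gray levels as in the text):

```python
import numpy as np

def glcm_features(win, levels=16):
    """Build a symmetric, direction-averaged GLCM for one window and derive
    the seven descriptors of Eqs. (2)-(8). Illustrative sketch only."""
    win = np.asarray(win, dtype=float)
    # quantize the window to `levels` gray levels
    q = np.floor((win - win.min()) / (np.ptp(win) + 1e-12) * (levels - 1)).astype(int)
    P = np.zeros((levels, levels))
    offsets = [(0, 1), (1, 0), (1, 1), (1, -1)]   # 0, 90, 45, 135 degrees
    for dr, dc in offsets:
        r0, r1 = max(0, -dr), q.shape[0] - max(0, dr)
        c0, c1 = max(0, -dc), q.shape[1] - max(0, dc)
        a = q[r0:r1, c0:c1]                        # reference pixels
        b = q[r0 + dr:r1 + dr, c0 + dc:c1 + dc]    # neighbour pixels
        np.add.at(P, (a.ravel(), b.ravel()), 1)    # accumulate co-occurrences
    P = P + P.T                                    # make symmetric
    P = P / P.sum()                                # normalize to probabilities
    i, j = np.indices(P.shape)
    mean = (i * P).sum()                                       # Eq. (2)
    var = ((i - mean) ** 2 * P).sum()                          # Eq. (3)
    hom = (P / (1 + (i - j) ** 2)).sum()                       # Eq. (4)
    con = ((i - j) ** 2 * P).sum()                             # Eq. (5)
    dis = (np.abs(i - j) * P).sum()                            # Eq. (6)
    ent = -(P[P > 0] * np.log(P[P > 0])).sum()                 # Eq. (7)
    sm = (P ** 2).sum()                                        # Eq. (8)
    return dict(mean=mean, variance=var, homogeneity=hom, contrast=con,
                dissimilarity=dis, entropy=ent, second_moment=sm)
```

Sliding this function over the OSAVI raster with the chosen window size would produce the seven textural bands; in practice, optimized library routines are preferable.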

These GLCM features statistically describe the texture of the objects in the image. The gray levels were quantized to 16 levels, and co-occurrences were accumulated over all directions; the remaining free parameter is the window size, which determines the range over which gray-level co-occurrence is observed. Hence, multiple GLCM datasets were generated by varying the bin (or window) size. Defining the number of samples for the RF machine learning algorithm has been an issue. Sample sizes from 25 to 2000 samples per class (with four classes in total) were tested, keeping all other parameters constant, and the model accuracy increased with sample size but the accuracy curve became almost flat as it approached 2000 samples per class. Hence, 1500 samples per class (six thousand in total) were used for the current model. All 16 bands were normalized, and the six thousand samples (fifteen hundred from each of the four classes) were drawn from the population using stratified random sampling, with the pixels manually validated through visual inspection. From the literature (Corcoran et al., 2013; Izquierdo-Verdiguier & Zurita-Milla, 2020; Rodriguez-Galiano et al., 2012), a roughly 1:2 testing-to-training split is common, and an empirical comparison of accuracies with different split ratios (for the 6000 samples) confirmed that 1:2 worked best here. Approximately two-thirds (66.67 %) of the samples were taken as training data and the rest as testing data.
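The stratified sampling and two-thirds/one-third split described above can be sketched with scikit-learn (the study performed these steps in R with caret; the band values below are random stand-ins for the 16 normalized bands):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# hypothetical stand-in for the 16-band pixel samples: 1500 pixels per class
X = rng.random((6000, 16))
y = np.repeat([0, 1, 2, 3], 1500)          # classes a (crop), b, c, d

# min-max normalize each band to [0, 1]
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# two-thirds training, one-third testing, stratified by class
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0)
```

Stratification keeps the per-class proportions identical in the training and testing sets.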

The random forest (Breiman, 2001) classifiers are run multiple times with varying GLCM textural bin sizes in order to analyze and optimize the model with respect to bin size. The window sizes used were 0, 3, 5, 7, 9, 11, 13, 15, 19, and 23, and the corresponding classifiers are labeled no-GLCM (only the nine bands B1 to B9 are used as input parameters), GLCM-3, GLCM-5, GLCM-7, GLCM-9, GLCM-11, GLCM-13, GLCM-15, GLCM-19, and GLCM-23, respectively. Finally, the testing data is used to validate the classification. The final crop delineations (from all classifiers) are then compared with the manual classification to observe the similarity between the manual and automated approaches.
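One run of this experiment might be sketched as follows with scikit-learn's RandomForestClassifier (the study used R's randomForest; the tree count and loop structure here are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score

def run_classifier(X_tr, y_tr, X_te, y_te, n_trees=500):
    """Fit one RF on a band stack and report overall accuracy and kappa
    on the testing data (n_trees = 500 mirrors randomForest's default)."""
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    rf.fit(X_tr, y_tr)
    pred = rf.predict(X_te)
    return accuracy_score(y_te, pred), cohen_kappa_score(y_te, pred)

# hypothetical outer loop over the band stacks built for each bin size:
# results = {name: run_classifier(stacks_tr[name], y_tr, stacks_te[name], y_te)
#            for name in ["no-GLCM", "GLCM-3", "GLCM-5", "GLCM-7", "GLCM-9",
#                         "GLCM-11", "GLCM-13", "GLCM-15", "GLCM-19", "GLCM-23"]}
```

Each stack pairs the nine spectral bands with the seven textural bands computed at that bin size, so only the inputs change between runs.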

The study area is broadly divided into two major categories: (a) croplands, where active agriculture is practised, and barrenlands. The barrenlands are further divided into three sub-categories: (b) dense xerophytic vegetation (wildland), i.e., undisturbed sandy regions with major vegetation patches; (c) sandy-textured land with minor vegetation patches (sparse vegetation land); and (d) actively disturbed bright soil patches, most likely formed by past human agricultural activity and currently unused, as shown in Fig. 4. A few pixels of cropland and wildland share similar vegetation cover and spectra, and bright patches and sparse-vegetation wildland pixels are also very similar in pixel values. Such similar spectral properties lead to a higher chance of misclassification. The cropland vegetation and tree canopies are denser in the OSAVI band than the rest of the region, but specific patches within cropland boundaries are bright due to soil degradation or poor crop health.

Fig. 4 a Cropland, b dense xerophytic vegetation (wildland), c sandy-textured soil with minor vegetation patches, and d bright soil patches

Improper agricultural practices might have removed wild vegetation and left the soil surface prone to erosion, further degrading the barrenlands. Most barrenlands of class (d) were probably created by removing natural vegetation for cropland and are now abandoned with minimal vegetation growth, leading to more land degradation and sparser vegetation. The spectral signature of tree canopies behaves quite similarly to croplands, which adds complexity, as tree canopies are small features that create ambiguity at the borders. In the present Worldview-2 data, delineating these regions becomes challenging because most bands are highly positively or negatively correlated (as shown in Fig. 5); hence, the information they offer overlaps considerably. Bands B11 and B16 showed the least (positive) correlation, whereas B7 and B8 have minimal correlation.

Fig. 5 Correlation table between different bands (GLCM bin size is 13)

The correlogram was plotted using a small sample (twenty thousand samples selected with stratified random sampling from the four classes) from the manually classified dataset for all sixteen bands, but due to its large size (16 bands), only a subset (bands B1, B6, and B9) is presented in Fig. 6. In band B1, class (d) (in purple) shows a bimodal histogram, and as the wavelength (band number) increases toward the longer wavelengths, the two modes move closer together. For class (d), which represents the bright patches, the histogram shows twin peaks, one slightly brighter than the other, that converge as the wavelength increases. This suggests two possible subtypes of bright patches, or noise (mislabelling) within the labelled samples. OSAVI (B9) is a vegetation index; it highlights class (a) (in peach), which is rich in chlorophyll and represents healthy plants. Noticeably, the OSAVI distribution for class (a) is quite spread out and bimodal, which indicates the varying condition of the crops overall and could also be due to the minor presence of tree canopies among the class (a) samples.

Fig. 6 Correlogram of the four class samples (peach: crop/class a; lime green: poor vegetation/class b; cyan: sparse wild vegetation/class c; purple: bright patches/class d) against bands B1 (coastal), B6 (red edge), and B9 (OSAVI)

It would appear that, for the sample, delineation between the barrenlands (b, c, d) and cropland is possible using B9 alone. Still, it is only a sample and does not capture small spatial classes such as tree canopies at the delineation borders and at various random locations, which distort accuracy. Furthermore, a better study of land degradation also requires delineation between the different barrenland classes, so the RF machine learning algorithm was implemented for a better analysis. In Fig. 4, the texture of each barrenland class and the cropland differs due to vegetation pattern; thus, using the GLCM (of OSAVI) might improve the classification accuracy.

The seven normalized GLCM-13 (window size of 13) images, generated from the OSAVI band, are shown in Fig. 7 as grayscale images. The GLCM quantifies heterogeneous surface patterns and roughness. Since OSAVI was used for GLCM generation, the soil-adjusted vegetation surfaces are highlighted against the rest of the scene. The seven GLCM textural images (mean, variance, homogeneity, contrast, dissimilarity, entropy, and second moment) each cover an individual aspect of the matrix-based statistical texture features. The mean and variance act as low-pass filters: elements that vary from the average value are re-adjusted, giving the image an averaged look. This averaging effect tends to eliminate the speckles generated by tree canopies. Homogeneity, or the inverse difference moment (IDM), appraises the tightness of the distribution of the elements.

Fig. 7 GLCM-13 textural images: a mean, b variance, c homogeneity, d contrast, e dissimilarity, f entropy, g second moment, and h OSAVI of the study area

Thus, the IDM tends to be significant for images with constant or near-constant patches. The contrast is more prominent for a GLCM with larger off-diagonal values, and its weights are the opposite of the IDM weights. The GLCM entropy is higher for heterogeneous scenes; it highlights zones of sudden change in pixel values, which are mostly boundary lines with an abrupt change of class. Thus, a near-random or noisy image has larger entropy. In the dataset, the boundaries are visually distinct in homogeneity, contrast, entropy, and dissimilarity, although only entropy suppresses the tree canopy information to a certain extent. The second moment (SM) measures homogeneity as the sum of squares of the entries; a near-random or noisy image has an evenly distributed GLCM with a low SM. In the SM image, large uniform regions appear brighter, while complex regions with smaller field patches appear darker. Overall, the mean and variance highlight the significant agricultural croplands with vegetation, entropy highlights the boundaries between different classes with minimal interference from tree canopies, and SM emphasizes image uniformity. This visual inspection provides a glimpse of the dataset parameters; further statistical analysis follows.

Similarly, the rasters were analyzed for each of the seven GLCM parameters and are represented in Fig. 8 as density plots. The density plots, or kernel density estimates (KDE), are smoothed versions of the histogram representing a continuous variation in the values. The samples from the normalized GLCM-13 bands were extracted and plotted for analysis. The KDEs of B11 (mean) and B12 (variance) tend to segregate cropland conveniently from the rest of the classes. Since these are random samples from the population and do not represent the exact distribution of the classes, exact boundary delineation of the croplands might not be attained, but the samples provide approximate information on class behaviour. In the rest of the bands, the delineation of each class is visible, but the overlap between classes b, c, and d is dominant, showing the similarity of these classes in those bands.

Fig. 8 Density plots of the GLCM-13 textural bands (B10 to B16) with the four class samples (a crop, b poor vegetation, c sparse wild vegetation, and d bright patches)
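A KDE like the ones in Fig. 8 is straightforward to reproduce. A sketch with SciPy's gaussian_kde, using synthetic stand-ins for two of the class samples (the study plotted these with ggplot2/seaborn), is:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# hypothetical per-class samples of one normalized GLCM band
crop = rng.normal(0.8, 0.05, 1500)      # class a (crop): high GLCM mean
bright = rng.normal(0.2, 0.05, 1500)    # class d (bright patches): low GLCM mean

# evaluate both smoothed densities on a common grid
grid = np.linspace(0, 1, 200)
kde_crop = gaussian_kde(crop)(grid)
kde_bright = gaussian_kde(bright)(grid)
```

Well-separated density peaks, as seen for the GLCM mean and variance bands, are exactly what makes a band useful for delineating the crop class.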

On further analysis, the mean-GLCM KDEs at different bin sizes (3, 5, 7, 9, 11, 13, 15, 17, and 19) are represented in Fig. 9 for all four class samples. The KDEs from the sample provide deeper insight into the behaviour of the classes with respect to bin size and the seven GLCM parameters. The mean-GLCM KDEs for different bin sizes highlight shifting values across the four classes. Classes (a) cropland and (d) bright patches lie on distinctly opposite sides of the graphs but are mostly adjacent in the image. The plots also highlight the average OSAVI values of each class; classes b, c, and d have similar OSAVI values. Overall, as the bin size increases, the mean-GLCM KDE tends to flatten, and most are multimodal.

Fig. 9 Density plots of the GLCM mean textural band with the four class samples (a crop, b poor vegetation, c sparse wild vegetation, and d bright patches) for bin sizes 3, 5, 7, 9, 11, 13, 15, 17, and 19

Overall, the band ratios could not fully delineate all classes optimally. To improve the results, machine learning classifiers were implemented. RF classifiers achieve high accuracy with limited training samples and were therefore used in this study (Belgiu & Drăguţ, 2016; Feng et al., 2015). RF uses multiple parallel decision trees with input samples selected through bootstrap aggregation. The output classified image combines the results of multiple decision trees fed with slightly different training samples, which results in higher classification accuracy. The multiple samples are obtained by bootstrap aggregation (bagging). Suppose the input training samples are X = x1, x2, x3, …, xn and the outputs are Y = y1, y2, y3, …, yn. The bagging process is repeated N times with changing training sets using the fitting function fn, as given in Eq. (9). An unseen x′ is fitted by each fn, and the results are averaged over N. This process reduces variance, and the strong cumulative predictor formed by the trees has a higher chance of assigning accurate classes to the input datasets.

$$\widetilde{f}=\frac{1}{N}\sum_{n=1}^{N}f_{n}\left(x^{\prime}\right)$$
(9)
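Eq. (9) can be illustrated with a toy bagging routine; the fitting function below (a 1-nearest-neighbour predictor) is purely illustrative, standing in for the decision trees of an RF:

```python
import numpy as np

rng = np.random.default_rng(42)

def bagged_predict(fit_fn, X, y, x_new, N=100):
    """Eq. (9): average the outputs of N predictors, each fit
    on a bootstrap resample of the training set."""
    n = len(y)
    preds = []
    for _ in range(N):
        idx = rng.integers(0, n, size=n)   # bootstrap sample with replacement
        f_n = fit_fn(X[idx], y[idx])       # fit one predictor on the resample
        preds.append(f_n(x_new))           # predict the unseen x'
    return np.mean(preds)                  # average over the N predictors

def fit_1nn(X, y):
    """Toy fitting function: predict the y of the nearest training x."""
    return lambda x: y[np.argmin(np.abs(X - x))]
```

Averaging over many bootstrap fits smooths out the idiosyncrasies of any single resample, which is the variance-reduction argument behind bagging.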

Results and discussion

The major results of the RF classification for all bin sizes and textural parameters are collectively discussed in this section. Table 2 presents the confusion matrix of the RF classification using the nine bands only (no-GLCM). The overall accuracy and kappa coefficient observed on the testing dataset were 96.3 % and 94.8 %, respectively. Overall, the distinction between crop and the other classes was achieved with high accuracy. Classes b, c, and d show more significant overlap because all three are barrenlands differing only in vegetation cover, and the OSAVI band used for the GLCM highlights vegetation. Hence, using the OSAVI band for generating the GLCM texture bands highlights vegetation-barren textures and improves classification accuracy with additional textural detail. The final map is shown in Fig. 10. Most of the misclassifications are due to the presence of chlorophyll-rich tree canopies in the midst of barrenlands, which lie on the higher end of OSAVI, closer to the cropland class (a), reducing classification accuracy.
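The overall accuracy and kappa coefficient reported here follow directly from a confusion matrix; a minimal sketch (the small matrix in the usage check is illustrative, not the paper's) is:

```python
import numpy as np

def overall_accuracy_kappa(cm):
    """Overall accuracy and Cohen's kappa from a confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    po = np.trace(cm) / total                                   # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2   # chance agreement
    return po, (po - pe) / (1 - pe)
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside the raw overall accuracy.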

Table 2 Confusion matrix for classified result with first nine bands
Fig. 10 RF no-GLCM cropland classification of the study area

Table 3 presents the confusion matrix of the classified results with the GLCM-13 approach (inputs are the nine bands plus the seven GLCM-13 textural bands). The training and testing data were the same in all approaches. Clearly, GLCM-13 yields higher accuracy than the no-GLCM approach: an overall accuracy of 99.5 % and a kappa coefficient of 99.3 % were observed. The GLCM improved the accuracy by adding textural information. The accuracy also depends on the properties of the training and testing samples taken. Although the no-GLCM accuracy is already high, it is lower than that of GLCM-13; hence, the textural bands improved the classification model. Figure 11 shows the classified map with the GLCM-13 approach. The presence of fewer speckles is an added benefit of using textural information.

Table 3 Confusion matrix for classified result with GLCM-13 approach (16 bands)
Fig. 11 RF GLCM-13 classification of the study area

The random nature of the classifiers comes from row and column subsampling: each tree takes random sub-samples, which are split based on a random subset of the input variables. The split-criterion importance is measured across all classification and regression trees (CART) for each input variable; this variable importance was calculated for the GLCM-13 and no-GLCM classifiers, as shown in Table 4. The more a variable contributes to the classification, the higher it is ranked. Comparing the top variables for the no-GLCM approach, B2 (blue) appears to be the most significant variable, followed by B9 (OSAVI) and B7 (NIR-1), while bands B6 (red edge) and B4 (yellow) had a minimal role in splitting the data.

Table 4 Variable importance table for GLCM-13 and no-GLCM

Similarly, the most significant variable for the GLCM-13 classification was B12 (variance), followed by B11 (mean) and B7 (NIR-1), while B3 (green), B4 (yellow), and B6 (red edge) contributed minimally. The variable importance may increase or decrease with different input combinations. Because most bands are highly correlated (as seen in Fig. 5), bands B3, B4, B5, and B6 have a lower contribution. At the same time, the NIR- and OSAVI-based bands play a vital role in classification, as the texture is primarily due to vegetation patterns. OSAVI, NIR, and most of the GLCM bands distinguish vegetation from background objects, thus improving the accuracy.
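In scikit-learn, an analogous impurity-based variable importance (comparable to randomForest's mean decrease in Gini) can be read from feature_importances_; the data and band labels below are illustrative stand-ins, not the study's:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# hypothetical stand-in: two informative "bands" (columns 0, 1) and two noise bands
X = rng.random((400, 4))
y = (X[:, 0] + X[:, 1] > 1).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

band_names = ["B7", "B9", "B11", "B4"]    # illustrative labels only
ranking = sorted(zip(band_names, rf.feature_importances_),
                 key=lambda t: -t[1])     # most important band first
```

A table like Table 4 is then just this ranking printed for each classifier.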

In further analysis, the cropland was delineated, and a morphological operator (erosion) was applied to remove noise. The same filter was used in both cases, the border areas were trimmed, and the results were finally compared with the manual delineation. Figure 12 shows the binary image output of the no-GLCM and GLCM-13 approaches. The noise is reduced, but not eliminated, by the morphological operators. The output is then compared with the manual delineation (Fig. 2); the result is presented in Table 5 (here, the reference is the manual delineation and the prediction is the output of the GLCM-13 RF).
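The erosion step can be sketched with SciPy (the study used OpenCV/scikit-image, which provide equivalent operators; the mask below is a synthetic stand-in for the binary crop map):

```python
import numpy as np
from scipy.ndimage import binary_erosion

rng = np.random.default_rng(0)
# hypothetical binary crop mask: one solid field plus isolated speckle pixels
mask = np.zeros((60, 60), dtype=bool)
mask[10:40, 10:40] = True
speckled = mask | (rng.random(mask.shape) < 0.01)   # add ~1 % salt noise

# one erosion pass with a 3x3 structuring element: a pixel survives only if
# its whole 3x3 neighbourhood is crop, so isolated speckles are removed
cleaned = binary_erosion(speckled, structure=np.ones((3, 3)))
```

Note that erosion also shrinks the field by one pixel along its border, which is one reason the border areas were trimmed before comparison.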

Fig. 12 a Delineation of the crop from the no-GLCM classifier, b delineation of the crop from the GLCM-13 classifier

Table 5 Confusion matrix for comparison of classified data with manual cropland delineation

An overall accuracy of 96.3 % and a kappa coefficient of 94.8 % were observed for the no-GLCM RF classifier; for the GLCM-13 RF classifier, an accuracy of 99.5 % and a kappa coefficient of 99.3 % were observed. As the GLCM bin size increases, the error of reference crop predicted as barrenland tends to increase, while the error of predicted crop against reference barrenland decreases. Also, minor barren regions within the cropland become obscured as the GLCM bin size increases. A maximum likelihood classifier (MLC) model (no-GLCM MLC) was also applied to the nine bands under similar conditions; although its classification accuracy was almost the same as that of no-GLCM RF, the delineation accuracy of no-GLCM MLC was 89 %. Hence, RF-based models were preferred over it.

As the GLCM bin size increases, the overall accuracy tends to rise until it stagnates at bin size 19, after which no further improvement is observed, as seen in Fig. 13. Figure 13 also compares bin sizes against both accuracies (the RF-GLCM classification accuracy and the delineation accuracy against the manual reference), which both increase with increasing bin size. The RF-GLCM classification accuracy tends to stagnate from bin size 13 onward, while the delineation accuracy levels off around bin size nine and then slowly declines after bin size 19. The lower delineation accuracy of no-GLCM is caused by tree canopies and green vegetation within the barren-region classes. In the GLCM RF-based approach, the influence of such object noise (tree canopies in barrenlands) is somewhat reduced, but the approach also nullifies the small patches of barrenland present between the croplands.

Fig. 13 Comparison of accuracy with different bin sizes

The vegetation texture was distinctly visible for the different classes at the current very high resolution of the satellite data. The work supports using GLCM-based RF classification for automated mapping of arid regions by showing improved classification accuracy over non-GLCM datasets. The OSAVI-based GLCM-13 gave the optimal results when delineating crops in the present study. Delineating croplands using texture, however, tends to blur minor barren patches inside the cropland while also removing small vegetation patches in the barren fields. Tree canopies at the edges of cropland and barrenland contribute to the delineation error; as the bin size increases, the classifier can partly compensate. Hence, depending on the application, GLCM-based classification must be used cautiously.

Conclusions

The current paper explored the utility of advanced high-resolution remote sensing satellite datasets for monitoring and characterizing arid cropland activities against their background. The addition of textural properties enhanced the delineation process overall, improving the boundary-delineation accuracy with 16 parameters by 3.5 %. Monitoring the ever-changing agricultural patches to estimate barren land, cropland, and natural vegetation areas is a cumbersome process, yet it is essential for arid-zone environmental monitoring and assessment, with applications including LU/LC (land use/land cover) mapping, biomass estimation, mapping of anthropogenic activities, cereal harvest prediction, and modeling of desertification, deforestation, soil erosion, and desert greening, to name a few. Using textural information and ML processes for automatic delineation of these classes will save a great deal of time and provide better results, assisting environmental management decisions. Furthermore, the present work improves monitoring accuracy by emphasizing the potential of high-spatial-resolution satellite imagery, artificial intelligence, and textural information. Automating the classification process to delineate croplands with high accuracy will enable much better and faster analysis of the region, eventually making environmental and agricultural remote sensing surveys faster and more economical for vast dry-zone areas. More frequent and cost-efficient surveys with better accuracy will support better environmental policies and planning, leading to more sustainable development (Pathak et al., 2013; Ram & Kolarkar, 1993; Shakoor et al., 2011).

Segmentation and object-based image analysis approaches can be implemented to further address boundary-delineation challenges in arid zones for high-resolution remote sensing monitoring. Further research on data dimensionality, the development of hyperspectral indices, data integration from multiple sensors, and resolving data complexity at higher spatial resolutions might improve the model further, leading to better analysis and hence better mapping. The delineated zones may also be used with time-series data to study vegetation trends. The application of high-resolution RS satellites and RS-UAS (unmanned aerial systems) with automatic class-delineation procedures opens multiple avenues, from tracking fauna and anthropogenic activities to precision mapping of vegetation dynamics, and might transform environmental mapping paradigms. The application of artificial intelligence to the classification and delineation of high-resolution remote sensing datasets for environmental monitoring has huge potential.