1 Introduction

Agriculture plays an important role in the economy of India. Sustainable agricultural practice requires accurate crop mapping. Satellite imagery provides timely, accurate, and detailed spatial information about an agro-ecological environment [1]. Crop mapping from satellite imagery supplies essential and accurate information about crops, which is useful for managing many agricultural resources [2]. However, crop classification using remote sensing data is challenging because of crop heterogeneity and similar reflectance among fields. Various machine learning (ML) algorithms have been successfully investigated for cropland mapping from single-date to time-series remote sensing images. Cropland mapping techniques applied to time-series images have been shown to outperform single-date mapping techniques [3, 4]. For example, Muller [5] successfully differentiated cropland and pasture fields from Landsat time series, and Zheng [6] applied a support vector machine (SVM) model to time-series Landsat Normalized Difference Vegetation Index (NDVI) data to identify crop types. Time-series Landsat data have also been explored with an ensemble classifier and with other ML methods such as SVM, neural networks, logistic regression, and extreme gradient boosting for land cover classification [7]. Senf [8] used Landsat time-series imagery and multi-seasonal MODIS data to classify crops from Savannah. Jia [9] examined the suitability of phenological features derived from the MODIS NDVI time series fused with NDVI data obtained from Landsat 8 for cropland mapping. MODIS-Terra Enhanced Vegetation Index (EVI) time series have been used effectively to derive phenological patterns for the classification of cotton, maize, soybean, and noncommercial crops in Brazil [10]. MODIS-Terra EVI has also been used to detect phenological stages, and MODIS NDVI to extract phenological information such as the start, peak, and end of the season of the rice crop [11].
Maus [12] mapped double cropping, single cropping, forest, and pasture using patterns of vegetation dynamics identified from MODIS EVI data. Landsat, MODIS, and Chinese HJ-1 time series have been successfully explored for sugarcane crop classification. Time-series Landsat 8 [13] and time-series Chinese HJ-1 CCD images [14] were used to map sugarcane automatically over large areas by applying object-based image analysis and data mining techniques. Mulianga et al. [15] mapped sugarcane cropping practices, including crop type and harvest mode, using Landsat 8 NDVI time series. To detect sugarcane harvest, El Hajj et al. [16] integrated time series of SPOT 5 images with a crop growth model and expert knowledge to handle missing acquisitions and uncertain radiometric values.

Many studies have investigated the potential of a single-date Sentinel-2 image to classify crops, including sugarcane, using random forest (RF), SVM, and decision tree (DT) machine learning methods. Furthermore, applying RF and dynamic time warping (DTW) algorithms to Sentinel-2 time series produced the best results for cropland mapping [17], but this approach has not yet been explored for sugarcane classification. Therefore, considering the free availability and high spatial-temporal resolution of Sentinel-2 data and the potential of RF and SVM, this study evaluates the effectiveness of time-series Sentinel-2 images, and of RF and SVM applied to them, for separating sugarcane from other land covers. The rest of the paper is organized as follows: Sect. 2 describes the study area and the data; Sect. 3 presents the proposed methodology; Sect. 4 discusses the results; and Sect. 5 concludes.

2 Study Area and Data

2.1 Study Area

The study area is located near Sameerwadi, Karnataka, India, at 16.3898° N and 75.0371° E (Fig. 1). Sameerwadi is a village in Mudhol taluka, Bagalkot district, Karnataka state. The study area covers four talukas (Mudhol, Jamkhandi, Raibag, and Gokak) and around 8 lakh acres (800,000 acres) of land. The area lies at an altitude of 541 m above sea level and receives annual precipitation of around 545 mm. The climate is generally dry, and the temperature ranges between 16.2 and 38.7 °C. Sugarcane is the main crop cultivated in this region, alongside maize, turmeric, and banana.

Fig. 1. FCC image of the study area

2.2 Sugarcane Crop Cycle

The phenology of sugarcane may provide valuable information for remote sensing classification in the study area. The phenological dynamics of sugarcane throughout its biological cycle must be well understood in order to interpret its spectral behavior, which strongly affects classification accuracy. Depending on the planting date, sugarcane has three growth cycles in the study region: 12 months (early season), 14 months (mid-late season), and 18 months (late season). The 12-month crop is planted in January and February, the 14-month crop in November and December, and the 18-month crop in July and August. After the first harvest, the crop is regrown 3–4 times and harvested every 12 months; this practice is referred to as ‘ratoon’. The growth stages and varieties of sugarcane must also be taken into account in the classification task. Sugarcane has four growth stages (germination, tillering, grand growth, and maturity), and the varieties grown in the study region are CO 86032, CO 91010, SNK 2005, 265. Consequently, a satellite image acquired on a particular date contains variation among fields arising from different growth stages, plant cane and ratoon cane, different sugarcane varieties, and other crops cultivated for crop rotation. This necessitates the use of multi-temporal images to achieve the best classification accuracy. When time-series remote sensing images are used appropriately, the phenology of sugarcane can separate sugarcane fields from other land covers, reduce confusion with spectrally similar vegetation in the area, and thereby increase classification accuracy.

2.3 Data

Sentinel-2, launched on June 23, 2015, is an Earth Observation (EO) mission of the EU Copernicus program that captures optical imagery at spatial resolutions of 10–60 m for services and applications such as agriculture monitoring, land cover classification, water quality, and emergency management. It has 13 spectral bands, of which the red band (band 4, one of the three visible bands) and the near-infrared band (band 8) were used in our study. The images were downloaded from the European Space Agency’s (ESA) Sentinel Scientific Data Hub, which is freely accessible. Five satellite tiles per month are needed to cover the study area; they were obtained from January 20, 2019, to May 07, 2019, as listed in Table 1. The selected temporal images were cloud-free and of good quality. The images were geo-referenced to the WGS 1984 UTM zone 43 N projection system. The EU Copernicus program provides images with geometric and radiometric corrections. All the images were atmospherically corrected using the Semi-Automatic Classification Plugin (SCP) available in QGIS 2.18, which is distributed under the GNU GPL license.

Table 1. Sentinel-2 images used in the study

3 Methodology

The proposed methodology is depicted in Fig. 2 and consists of the following steps: (i) acquisition of Sentinel-2 temporal data, (ii) atmospheric correction of all the images, (iii) NDVI computation, (iv) preparation of an input image, (v) selection of training samples and generation of Region of Interest (ROI) files, (vi) classification using RF, (vii) classification using SVM, and (viii) classification accuracy assessment.

Fig. 2. The proposed methodology

3.1 Data Acquisition and Preprocessing

As listed in Table 1, Sentinel-2 images were obtained free of cost from the Copernicus website. All images were atmospherically corrected to reduce atmospheric effects and produce surface reflectance values, which improves the usability and interpretability of the images.

3.2 Data Collection and Preparation of Training Set

The classification was based on the NDVI values of the crops from January 2019 to May 2019. We selected NDVI because the literature reports it as the best-performing Vegetation Index (VI) for crop mapping [6, 7, 9, 11, 15]. NDVI was computed for every preprocessed image to obtain the NDVI time series. The study area was then extracted from these NDVI images, which were layer-stacked to generate a multi-band input image for sugarcane classification. Every pixel of the stacked image is thus a vector of NDVI values, one for each acquisition date.
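
The NDVI computation and layer stacking described above can be illustrated with a minimal Python sketch. It is not the workflow used in this study (which relied on QGIS, R, and ENVI); it only shows the standard formula NDVI = (NIR − Red) / (NIR + Red), with band 8 as NIR and band 4 as Red. The function names, the tiny reflectance arrays, and the choice to return 0 where both bands are zero are assumptions made for illustration.

```python
def ndvi(red, nir):
    """Return NDVI = (NIR - Red) / (NIR + Red) for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: pixels with zero reflectance in both bands get NDVI 0
    return (nir - red) / total


def ndvi_image(red_band, nir_band):
    """Compute NDVI for every pixel of two equally sized 2-D bands."""
    return [
        [ndvi(r, n) for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red_band, nir_band)
    ]


def stack_time_series(ndvi_images):
    """Layer-stack per-date NDVI images into one NDVI vector per pixel."""
    rows = len(ndvi_images[0])
    cols = len(ndvi_images[0][0])
    return [
        [[image[i][j] for image in ndvi_images] for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 1 x 2 pixel reflectance values for two acquisition dates.
dates = [
    {"red": [[0.10, 0.30]], "nir": [[0.50, 0.30]]},
    {"red": [[0.05, 0.20]], "nir": [[0.45, 0.20]]},
]
series = [ndvi_image(d["red"], d["nir"]) for d in dates]
stacked = stack_time_series(series)  # stacked[row][col] is the NDVI time series of a pixel
```

Each entry of `stacked` then plays the role of the per-pixel feature vector passed to the classifiers.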

Training Dataset: The training dataset was created through a field survey conducted from January 2019 to May 2019. In this field campaign, ground truth data for sugarcane and maize crops were recorded with a Global Positioning System (GPS) device (Montana 680). Samples for the other classes were generated by visual interpretation based on expert knowledge. In total, 14 sugarcane polygons and 6 maize polygons surveyed in the field were used for training, and 40 polygons were generated for all other classes. In the study area, sugarcane fields at various phenological stages may coexist on the same date, ranging from harvested fields through sugarcane in different growth phases to mature sugarcane ready for harvest. We therefore attempted to collect samples of all sugarcane phenological stages so that all the significant subclasses would be represented. The testing polygons are distinct from the training polygons. The polygons were selected from different agricultural parcels to account for factors such as soil, water source, climate, and cultivation practices.

3.3 Classification

Random Forest: Random forest (RF) is a nonparametric ensemble method [18] based on Classification and Regression Trees (CART). Each classification tree iteratively splits a bootstrap sample of the training data into increasingly pure subsets. Many such independent trees are generated, controlled by the hyperparameters ntree (the number of trees) and mtry (the number of variables tried at each split). The ensemble’s final decision is the majority vote of the predictions of all the trees. RF has shown strong performance in remote sensing applications [19,20,21,22] because it can handle a large number of input variables, run on large datasets, tolerate outliers, and report the importance of predictive variables for the final model [23, 24]. RF has also achieved high accuracy in sugarcane classification [1, 2].
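
To make the bootstrap and majority-vote steps concrete, the following Python sketch may help. It deliberately omits tree fitting: the three "trees" are hypothetical threshold rules on single NDVI dates, standing in for fitted CART trees, and all names and thresholds are assumptions rather than values from this study.

```python
import random
from collections import Counter


def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement (one bootstrap replicate)."""
    return [rng.choice(samples) for _ in samples]


def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, pixel):
    """Classify one NDVI vector by majority vote over all trees."""
    return majority_vote([tree(pixel) for tree in trees])


# Each tree would normally be fitted on its own bootstrap replicate.
training_ids = list(range(10))  # hypothetical training sample indices
replicate = bootstrap_sample(training_ids, random.Random(0))

# Hypothetical stand-ins for fitted trees: each thresholds one date's NDVI.
trees = [
    lambda p: "sugarcane" if p[0] > 0.5 else "fallow land",
    lambda p: "sugarcane" if p[1] > 0.6 else "fallow land",
    lambda p: "sugarcane" if p[2] > 0.7 else "fallow land",
]

label = forest_predict(trees, [0.62, 0.71, 0.66])  # two of three trees vote "sugarcane"
```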

Support Vector Machine: SVM is a statistical learning method used for solving classification as well as regression problems. It makes no assumption about the distribution of the data and finds an optimal hyperplane separating the two classes to be classified. It is fundamentally a two-class classification method but can be extended to multiclass problems [25, 26]. The ability of SVM to achieve high accuracy even with few training samples makes it very useful in remote sensing applications [6, 27]. SVM has been reported as one of the best ML methods in various remote sensing applications, including crop classification [26], biotic stress detection [28], yield estimation, and Land Use and Land Cover (LULC) classification [25, 29, 30].

Sugarcane has a varying crop cycle and diverse planting and harvesting dates, which make classification complex. We first performed a supervised classification of the Sentinel-2 NDVI time series into seven classes using the ground truth data and the RF and SVM classifiers. The classes are sugarcane (sugarcane more than six months old), early sugarcane (sugarcane less than six months old), maize, water body, fallow land, built-up, and bare land. Both models, which are widely used in crop classification, were trained on the training dataset and tuned through their hyperparameters to achieve maximum accuracy. Open-source R software was used to implement the RF and SVM classifiers. The assigned classes were then recoded in post-classification using ENVI software, which yielded an early sugarcane class and a grown-up sugarcane class. This formed the sugarcane map for the region covering the four talukas.
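
The post-classification recoding step can be pictured with a short Python sketch. It is not the ENVI procedure used in this study; the integer class codes, the background value, and the function name are assumptions chosen for illustration. Only the seven class names come from the text.

```python
# Class names from the text; the integer codes are assumed for illustration.
CLASS_CODES = {
    "sugarcane": 1,
    "early sugarcane": 2,
    "maize": 3,
    "water body": 4,
    "fallow land": 5,
    "built-up": 6,
    "bare land": 7,
}
SUGARCANE_CODES = {CLASS_CODES["sugarcane"], CLASS_CODES["early sugarcane"]}
BACKGROUND = 0  # assumed code for every non-sugarcane class


def recode_to_sugarcane_map(classified):
    """Keep the two sugarcane classes and set all other pixels to BACKGROUND."""
    return [
        [code if code in SUGARCANE_CODES else BACKGROUND for code in row]
        for row in classified
    ]


# Hypothetical 2 x 3 classified raster.
classified = [[1, 3, 2], [5, 1, 7]]
sugarcane_map = recode_to_sugarcane_map(classified)
```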

3.4 Accuracy Measures

In remote sensing, accuracy assessment validates the correctness and quality of the generated classification maps. Overall performance is evaluated through the overall accuracy and the kappa coefficient, and the accuracy of an individual class is measured through the producer’s and user’s accuracies. Sometimes, the F1 score is used to determine class-wise accuracy [2]. In this work, accuracy was determined with the overall accuracy and kappa coefficient, and class-wise producer’s and user’s accuracies are reported in Tables 3 and 4.
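
As a worked illustration of these measures, the Python sketch below computes overall accuracy, the kappa coefficient, and producer’s and user’s accuracies from a confusion matrix using their standard definitions. The two-class matrix is hypothetical and is not taken from this study; the sketch also assumes that every class has at least one reference and one mapped sample, so no division by zero occurs.

```python
def accuracy_measures(matrix):
    """Compute overall accuracy, kappa, and per-class producer's/user's accuracy.

    matrix[i][j] is the number of reference samples of class i mapped as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]  # reference totals per class
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # mapped totals
    correct = sum(matrix[i][i] for i in range(n))

    overall = correct / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    producers = [matrix[i][i] / row_totals[i] for i in range(n)]
    users = [matrix[i][i] / col_totals[i] for i in range(n)]
    return overall, kappa, producers, users


# Hypothetical two-class confusion matrix (not the data of this study).
matrix = [[45, 5],
          [10, 40]]
overall, kappa, producers, users = accuracy_measures(matrix)
```

For this hypothetical matrix, 85 of 100 samples lie on the diagonal and the chance agreement is 0.5, so the overall accuracy is 0.85 and kappa is 0.70.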

4 Results and Discussion

Five Sentinel-2 tiles covering the study area were obtained for each of the five months and mosaicked, and the ROI was cropped from each mosaic. The NDVI images were then layer-stacked, and the resulting image was used to classify sugarcane and the other land covers. The two well-known ML classifiers, RF and SVM, both discriminated sugarcane from the other classes well. RF achieved an overall accuracy of 88.61% and a kappa coefficient of 0.8387 (Table 2).

Table 2. Accuracy assessment of RF and SVM

The image classified using RF is shown in Fig. 3. The optimum accuracy was achieved by setting the parameter mtry to 2. SVM achieved an overall accuracy of 81.86% and a kappa coefficient of 0.7623 (Table 2); its classified image is given in Fig. 4. Tables 3 and 4 show that RF classified the sugarcane, early sugarcane, built-up, and bare land classes more accurately than SVM. The fallow land class had the lowest producer’s accuracy with RF, and maize was classified less accurately by SVM.

Fig. 3. The classified image by RF

Fig. 4. The classified image by SVM

Table 3. Producer’s and user’s accuracy for RF
Table 4. Producer’s and user’s accuracy for SVM

The total area classified into each of the classes by RF and SVM is presented in Fig. 5.

Fig. 5. Class-wise coverage of the total area in hectares
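
Class-wise areas of this kind can be derived from a classified raster by counting pixels per class and multiplying by the pixel area. The Python sketch below assumes 10 m pixels, which is the native resolution of Sentinel-2 bands 4 and 8; the resolution of the final classified image is not stated in the text, so this value, the class codes, and the function name are assumptions.

```python
from collections import Counter

PIXEL_SIZE_M = 10.0        # assumed: native resolution of Sentinel-2 bands 4 and 8
M2_PER_HECTARE = 10_000.0


def class_areas_ha(classified, pixel_size_m=PIXEL_SIZE_M):
    """Return {class code: area in hectares} for a classified raster."""
    counts = Counter(code for row in classified for code in row)
    pixel_area_ha = pixel_size_m ** 2 / M2_PER_HECTARE
    return {code: count * pixel_area_ha for code, count in counts.items()}


# Hypothetical 2 x 3 classified raster with assumed class codes.
classified = [[1, 1, 3], [1, 2, 3]]
areas = class_areas_ha(classified)  # one 10 m pixel covers 0.01 ha
```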

After the time-series image was classified into seven classes, reclassification yielded two sugarcane classes (early sugarcane and grown-up sugarcane), and the resulting sugarcane maps are shown in Figs. 6 and 7.

Fig. 6. Sugarcane map on RF-classified image

Fig. 7. Sugarcane map on SVM-classified image

5 Conclusion

In this study, we evaluated the potential of RF and SVM to discriminate sugarcane from other land covers using Sentinel-2 NDVI time-series images and a limited number of training polygons. We used Sentinel-2 images from five months, January to May 2019, which cover two main phenological stages of sugarcane, i.e. tillering and grand growth; full January–December temporal coverage is required for precise crop classification. The producer’s and user’s accuracies reach 97.70% and 92.39%, respectively. The RF classifier achieved an overall accuracy of 88.61%, whereas SVM reached 81.86%, which indicates the superiority of RF for sugarcane classification in the study area. From these results, we conclude that our spectral-temporal approach gives reliable discrimination between sugarcane and other land covers. Future work will evaluate other vegetation indices, such as GNDVI and EVI, derived from time-series data to discriminate all four phenological stages of sugarcane in the study area.