Introduction

Tea (Camellia sinensis (L.) Kuntze) is a perennial evergreen cash crop grown under the canopy of primarily leguminous trees that provide partial shade. The tea agroforestry system includes nitrogen-fixing shade trees as companion trees in the tea plantations (Barrios et al. 2018). Shade trees increase understorey productivity by providing favourable microclimatic conditions to the understorey crop (Mukherjee and Sarkar 2016): they decrease the incident radiation, temperature, and wind speed and increase the relative humidity (Mohotti et al. 2020). Tea is cultivated in a dense population (12,000–18,000 plants ha−1) under tropical and subtropical humid conditions. Due to the dense population of green canopy, the area of the tea plantation resembles an agroforestry ecosystem (Pramanik and Phukan 2020). Tea plants are intensively managed as bushes through periodic pruning to obtain a continuous flush of young leaves and to ease plucking (Baruah 1989; Kalita et al. 2015). Tea is the most commonly consumed beverage globally (Gramza-Michałowska 2014; Fu et al. 2018; Jayasinghe and Kumar 2019). Though tea is grown commercially in many parts of the world, it is mainly grown in Asia, Africa, and the Near East, with China, India, Kenya, and Sri Lanka contributing the major shares (FAO 2013). Due to the increasing demand in the beverage market, the area under tea plantations has expanded rapidly in recent decades (Su et al. 2014; Zhang et al. 2017). With a tea production of 6.34 million tons worldwide in 2015, the area under tea cultivation stretched to 3 million hectares (FAO 2015). With 25% of global production, India has emerged as one of the leading tea-producing countries, and tea contributes to the country's Gross National Product (GNP) (Kalita et al. 2015). India is the second-largest producer of tea in the world and accounts for the highest tea consumption globally (http://www.teaboard.gov.in/). Tea production in India for the financial year 2018–19 was 1.35 million tons (http://www.teaboard.gov.in/). Tea is indigenous to India, and the tea plantations in India are mainly distributed in Assam, West Bengal, Himachal Pradesh, Kerala, Karnataka, and Tamil Nadu. The state of Assam, with agro-climatic conditions suitable for tea plantations, is well known for its dominant role in tea production. It is the largest tea-producing state in India and supplies some of the best-quality tea to the world. With a production of over 0.63 million tons of tea per year, the state is the largest tea-growing region in the world (http://www.teaboard.gov.in).

The tea agroforestry system plays a significant role in the economy, as it supports livelihoods and food security, mainly in developing countries (FAO 2015). Besides economic and social benefits, this agroforestry system has ecological significance as well (Ahmed et al. 2014; Li et al. 2011; Shankar Raman et al. 2021). Globally, tea is considered an important cash crop (FAO 2015). Although tea agroforestry systems are mainly managed for tea production, their long rotational cycles (40–90 years) (Han et al. 2007) indicate their potential to sequester and store large amounts of carbon (Li et al. 2011). Biomass storage of 48–61 Mg ha−1 was reported for different aged tea plantations in northeast India (Kalita et al. 2020). The biomass stock recorded in their study was close to the median range of data (32–123 Mg ha−1) reported for agroforestry systems from India and elsewhere (Negash and Starr 2015; Nascimento Ramos et al. 2018; Brahma et al. 2018; Reang et al. 2021). Phukan et al. (2018) found that tea bushes can assimilate 1243.8–2526.7 kg CO2 ha−1 year−1. Pramanik and Phukan (2020) reported that tea plants assimilated 31.82–249.22 g CO2 plant−1 year−1, with an average of 92.06 g CO2 plant−1 year−1. Very few studies have assessed the carbon stock of tea agroforestry systems in India based on ground inventory (Kalita et al. 2015, 2017). Carbon stocks of tea plantations in Sri Lanka and China have also been studied (Wijeratne et al. 2014; Li et al. 2011). Unlike the intensively studied carbon stocks of forest systems, studies on the carbon stock of tea agroforestry systems are scarce, and hence these systems are poorly understood. An accurate and timely assessment of the spatial distribution of biomass in tea agroforestry systems is important for their better management. Assessing this spatial distribution using traditional surveys is expensive and laborious. Geospatial technology, however, facilitates efficient spatiotemporal assessment of biomass in tea agroforestry systems.

A variety of satellite data and classification methods have been used for mapping and monitoring tea plantations all over the globe (Ghosh et al. 1992; Su et al. 2017; Fauziana et al. 2016; Dihkan et al. 2013; Xu et al. 2016; Xu 2016; Chuang and Shiu 2016; Yang 2017; Zhu et al. 2019; Wang et al. 2019). Due to its high spectral, spatial, and temporal resolution and its free availability, Sentinel-2 satellite imagery can be considered very effective in mapping and monitoring tea plantations (Zhu et al. 2019). The availability of remote sensing (RS) data has led to the development of various modelling techniques to estimate the spatio-temporal distribution of different vegetation characteristics, including the leaf area index (LAI) (Srinet et al. 2019) and biomass (Nelson et al. 2000; Lu 2005; Sales et al. 2007; Kushwaha et al. 2014; Manna et al. 2014; Yadav and Nandy 2015; Nandy and Kushwaha 2021). In recent times, machine learning algorithms (MLAs), such as random forest (RF), artificial neural network (ANN), and support vector machine (SVM), are increasingly being used to estimate forest biomass by integrating RS data and in-situ measurements (Wu et al. 2016; Dhanda et al. 2017; Nandy et al. 2017; Pandit et al. 2018; Dang et al. 2019). Being fast and relatively insensitive to overfitting, RF can successfully handle high data dimensionality and multicollinearity, which makes it suitable for RS-based estimation of vegetation attributes (Zhao et al. 2019).

Considering the spatial distribution and area under tea agroforestry systems worldwide, these systems are expected to sequester a large amount of atmospheric CO2 in their biomass. However, no study has yet addressed the spatial distribution of carbon stock in tea agroforestry systems in India. Hence, the present study aims to map the spatial distribution pattern of the aboveground biomass (AGB) of the tea agroforestry system in Barak valley, northeast India, using Sentinel-2 satellite imagery and RF, a machine learning algorithm.

Materials and methods

Study area

The present study was carried out in the Barak valley region of southern Assam, northeast India. Barak valley consists of three districts, viz., Cachar, Hailakandi, and Karimganj, covering a geographical area of about 6922 km2. The area lies between 24°08′ and 25°07′ N and between 92°09′ and 93°23′ E (Fig. 1). The area experiences a tropical, warm, and humid climate with an average annual rainfall of 2390 mm, most of which is received during the southwest monsoon season (May–September). The mean annual minimum and maximum temperatures were recorded as 18.75 °C and 29.55 °C, respectively. The mean relative humidity ranges between 60.5% (April) and 83% (June) (NEDFCL 2020). The terrain is undulating, characterized by hills, hillocks, wide plains, and low-lying waterlogged areas. The soil of this region is mainly composed of silty clay and loamy soil, while coarse sandy loamy soil is found on hillocks. The forests of the study area mainly consist of tropical evergreen, semi-evergreen, and moist deciduous forests; bamboo and cane brakes are also widely found (Champion and Seth 1968). Agriculture is the economic backbone of the area, with land-use systems varying from smallholder agroforestry (home gardens) to cash crop-oriented farms of bamboo (Bambusa sp.) and rubber (Hevea brasiliensis). Rice and tea are the major cash crops, while vegetables are grown mainly for subsistence. Tea is grown under the canopy of shade trees, forming the tea agroforestry system (Kalita et al. 2014), which is an integral part of the economy of this region. Shade trees including Albizia odoratissima, A. lebbeck, A. chinensis, A. procera, Dalbergia sissoo, Derris robusta, and Senna siamea are planted systematically in tea agroforestry systems. A. lebbeck and A. odoratissima are the dominant shade trees in the tea gardens of Barak valley (Kalita et al. 2020).

Fig. 1 Location of the study area

Methodology

The methodology (Fig. 2) for the present study involved mapping tea gardens using Sentinel-2 satellite data and collecting field inventory data for tea bushes and shade trees to calculate the tea agroforestry AGB. The spatial distribution of tea agroforestry system AGB was then mapped by integrating spectral information from Sentinel-2 data with the field inventory data using the RF algorithm.

Fig. 2 Methodology for estimation of aboveground biomass in tea agroforestry systems

Satellite data processing and variable extraction

The present study used Sentinel-2 satellite imagery of 10 December 2016. Sentinel-2 provides optical satellite data in 13 spectral bands, from the visible and the near-infrared to the shortwave infrared. In the present study 10 spectral bands, excluding bands 1 (Coastal aerosol), 9 (Water vapour) and 10 (SWIR-Cirrus), were used (Table 1). Orthorectified satellite data was downloaded (https://earthexplorer.usgs.gov/) and a subset image was extracted from the acquired reflectance image using the study area boundary. Spectral variables including band reflectance (10 bands) and 14 spectral indices (Table 1) were extracted from the reflectance image of the study area.
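As an illustration of the variable extraction step, the sketch below computes the five indices that the Results later identify as most important, for a single pixel of surface reflectance. It is not the authors' processing chain. The formulas are the commonly used normalized-difference forms; the band roles (blue, green, red, NIR, SWIR1, SWIR2), the pairing of NDWI with the second SWIR band, and the ARVI weighting factor of 1 are assumptions made here, and Table 1 remains the reference for the definitions actually used. The reflectance values in the usage note are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_indices(blue, green, red, nir, swir1, swir2):
    """Compute five vegetation indices from one pixel's reflectance (0 to 1).

    Assumed definitions (check against Table 1 of the study):
      NDVI  = (NIR - Red)   / (NIR + Red)
      GNDVI = (NIR - Green) / (NIR + Green)
      NDII  = (NIR - SWIR1) / (NIR + SWIR1)
      NDWI  = (NIR - SWIR2) / (NIR + SWIR2)   # band pairing is an assumption
      ARVI  = (NIR - RB)    / (NIR + RB), with RB = 2 * Red - Blue (gamma = 1)
    """
    red_blue = 2.0 * red - blue  # atmospheric correction term used by ARVI
    return {
        "NDVI": normalized_difference(nir, red),
        "GNDVI": normalized_difference(nir, green),
        "NDII": normalized_difference(nir, swir1),
        "NDWI": normalized_difference(nir, swir2),
        "ARVI": normalized_difference(nir, red_blue),
    }
```

For example, `spectral_indices(0.03, 0.06, 0.04, 0.40, 0.20, 0.10)` returns one value per index for a hypothetical vegetated pixel; in practice the same arithmetic would be applied band-wise to the whole reflectance image.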

Table 1 Spectral variables extracted from Sentinel-2 satellite imagery, used for developing tea agroforestry aboveground biomass prediction model

Tea agroforestry system mapping and field data collection

Tea agroforestry system mapping was carried out using on-screen visual interpretation of Sentinel-2 imagery to delineate the boundary of the tea agroforestry systems. These boundaries were refined using Google Earth to resolve the confusion arising from the similar spectral signatures of forest and tea plantation areas. Image interpretation elements such as tone, texture, pattern, and shape were taken into consideration for identifying and mapping the tea-growing areas. The accuracy of the tea agroforestry system map was assessed using 124 ground control points.
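The accuracy assessment reduces to a simple ratio, sketched below under the assumption that accuracy means the share of ground control points falling in the correctly mapped class. The function name is hypothetical. The count of 115 correct points in the usage note is not stated in the text; it is only the value implied by the 92.74% accuracy reported in the Results for 124 points.

```python
def overall_accuracy(n_correct, n_total):
    """Return the percentage of reference points that were mapped correctly."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_correct <= n_total:
        raise ValueError("n_correct must lie between 0 and n_total")
    return 100.0 * n_correct / n_total
```

With 124 points, `overall_accuracy(115, 124)` evaluates to about 92.74, which matches the reported figure.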

For collecting the field inventory data, sample plots of 0.1 ha (31.62 m × 31.62 m) were laid across the tea agroforestry system. The field inventory was carried out in 2016. A total of 80 sample plots were randomly laid in the tea agroforestry system of the study area (Fig. 3). The GPS coordinates at the centre of each sample plot were recorded. At each sample plot, the diameter of each tea plant was measured at 5 cm above the ground, and the diameter of each shade tree was measured at 1.37 m above the ground. Tree density (per hectare) was calculated by multiplying the stem count of each 0.1 ha plot by 10 and averaging across plots. The basal area (m2 ha−1) of each plot was calculated by expanding the stem basal areas to a per-hectare basis using the stem density, and then averaged (Brahma et al. 2018). The AGB of tea bushes was calculated using the allometric equation developed by Kalita et al. (2015). Species- and site-specific volumetric equations (FSI 1996) were used to calculate the volume of each shade tree. Volume multiplied by the wood density (FRI 2002) gave the AGB of shade trees. To calculate the AGB of tea agroforestry systems, the AGB of tea bushes and shade trees in a sample plot was summed to obtain the total AGB per 0.1 ha, which was then converted to Mg ha−1. Using the GPS coordinates of each sample plot, a point layer was generated in which each point carries the plot AGB value. The point layer was overlaid on the false colour composite to visualize the spatial distribution of sample plots across the study area (Fig. 3). Of the 80 plots, 50 were randomly selected for developing the RF-based AGB model, and the remaining 30 were used for validation.
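The plot-level arithmetic described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the allometric equation of Kalita et al. (2015) and the FSI volume equations are not reproduced here (per-stem AGB and volume are taken as given inputs), and the numbers in the usage note are made up for the example.

```python
import math

PLOT_AREA_HA = 0.1                # plot size used in the study
EXPANSION = 1.0 / PLOT_AREA_HA    # plot-to-hectare expansion factor (= 10)


def per_hectare(plot_total):
    """Scale a per-plot total (count, m2, Mg) to a per-hectare value."""
    return plot_total * EXPANSION


def basal_area_m2(diameter_cm):
    """Cross-sectional area of one stem from its diameter (cm), in m2."""
    diameter_m = diameter_cm / 100.0
    return math.pi * diameter_m ** 2 / 4.0


def shade_tree_agb_kg(volume_m3, wood_density_kg_m3):
    """AGB of one shade tree as stem volume times wood density."""
    return volume_m3 * wood_density_kg_m3


def plot_agb_mg_ha(tea_agb_kg, tree_agb_kg):
    """Sum per-stem AGB (kg) of tea bushes and shade trees, return Mg ha-1."""
    plot_total_mg = (sum(tea_agb_kg) + sum(tree_agb_kg)) / 1000.0  # kg to Mg
    return per_hectare(plot_total_mg)
```

For instance, a hypothetical plot with 20 shade trees gives `per_hectare(20)` = 200 stems ha−1, and 1500 tea bushes of 2 kg each plus 20 shade trees of 250 kg each give `plot_agb_mg_ha([2.0] * 1500, [250.0] * 20)` = 80 Mg ha−1.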

Fig. 3 Locations of the sample plots for aboveground biomass assessment of tea agroforestry system

RF-based modelling for predicting biomass

In the present study, the RF regression algorithm (Breiman 2001) was used to determine the optimum independent variables and predict the AGB of tea agroforestry systems. The randomForest package (version 4.6-14) (Liaw and Wiener 2002) in R was used. Spectral variables (Table 1), comprising band reflectance (10 bands) and spectral indices (14 in number), were considered as independent variables, whereas the field-measured AGB of the tea agroforestry system was used as the dependent variable. The RF algorithm requires two input parameters, Mtry and Ntree. Mtry is the number of independent variables randomly sampled as candidates at each node split. Ntree is the number of regression trees, each grown on a bootstrap sample of the observations. For the parameterization of the RF algorithm, Mtry values were obtained using the tuneRF function from the randomForest package. Once the optimum value of Mtry was obtained, various values of Ntree were tested, and the optimum Ntree value was selected based on the accuracy of the RF model. The algorithm uses about two-thirds of the samples (in-bag samples) to train each tree and the remaining one-third (out-of-bag (OOB) samples) to estimate the OOB error (Belgiu and Drăguţ 2016).
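The in-bag and out-of-bag split follows directly from bootstrap sampling, and a short simulation makes the proportions concrete. The sketch below is written in Python rather than R and does not call the randomForest package; the function names are hypothetical. It draws a bootstrap sample of plot indices for each tree and measures the share of plots left out. For n observations the expected out-of-bag share is (1 − 1/n)^n, roughly 0.36 for the 50 training plots, so "one-third" is an approximation.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample; return (in-bag indices, OOB indices)."""
    # Sample n indices with replacement: some plots repeat, others are missed.
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    oob = [i for i in range(n_samples) if i not in drawn]
    return in_bag, oob


def mean_oob_fraction(n_samples, n_trees, seed=0):
    """Average share of observations left out of the bag across all trees."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        _, oob = bootstrap_split(n_samples, rng)
        total += len(oob) / n_samples
    return total / n_trees
```

Calling `mean_oob_fraction(50, 500)` mimics 500 trees grown on 50 plots and returns a value close to the theoretical share. Each tree's OOB plots act as an internal test set, which is why the OOB error can guide the choice of Mtry and Ntree without a separate hold-out.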

The independent variable importance derived from RF is vital in separating more useful variables from less useful ones and hence in reducing the dimensionality of data. It is represented by the increase in mean square error (MSE) of predictions as a result of the variable being permuted and the overall reduction in node purity for that variable over all the trees (%IncMSE and IncNodePurity). For the selection of fewer predictors, which can offer the best prediction, a recursive feature elimination with cross-validation, based on predictor importance ranking, was implemented. Using these selected sets of variables, RF-based models were built for the prediction of AGB, and its spatial distribution was mapped. The developed models were validated by comparing the predicted AGB with the field-measured AGB using data from 30 validation plots.
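The permutation idea behind %IncMSE can be sketched in a few lines. This is a simplified illustration, not the randomForest implementation: the package computes the increase on each tree's OOB samples and normalizes it, whereas the sketch below permutes one column of a single data set and uses any callable as the model. The function names and the toy model in the usage note are hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model predictions over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, column, seed=0):
    """Increase in MSE after shuffling one predictor column.

    rows is a list of lists; shuffling a column breaks its link to the
    target, so a large increase means the model relied on that variable.
    """
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mse(model, permuted, targets) - baseline
```

With a toy model that depends only on the first column, for example `lambda r: 100.0 * r[0]`, permuting column 0 raises the error while permuting any other column leaves it unchanged. Ranking variables by this increase and repeatedly dropping the weakest ones is the basis of the recursive feature elimination described above.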

Results

Stand attributes, tea agroforestry system mapping and field measurements for AGB calculation

In the tea agroforestry system of the study area, shade tree density varied from 182 to 230 stems ha−1 for the different aged plantations. In contrast, the tea bush density varied from 11,400 to 18,400 stems ha−1 for the different aged plantations (Table 2). The basal area of shade trees and tea bushes ranged from 6.98 to 7.29 m2 ha−1 and 42.06 to 51.08 m2 ha−1 respectively for the different aged plantations (Table 2). The spatial distribution of tea agroforestry systems was mapped (Fig. 4a). The mapping accuracy was found to be 92.74%. The tea agroforestry systems were extracted from the Sentinel-2 imagery using the mapped boundaries (Fig. 4b, c). The area under tea agroforestry systems was found to be 330.69 km2. The tea bush AGB ranged from 9 to 38.49 Mg ha−1 with a mean of 21.78 Mg ha−1 whereas the shade trees AGB ranged from 18.69 to 128.75 Mg ha−1 with a mean of 50.66 Mg ha−1. The total AGB (sum of plot-wise tea bush AGB and shade trees AGB) in these tea agroforestry systems ranged from 39 to 149.47 Mg ha−1. The mean AGB was found to be 72.43 Mg ha−1.

Table 2 Stem density and basal area of shade trees and tea bushes in the study area for different aged tea plantations in Barak valley, northeast India

Fig. 4 (a) Tea agroforestry system map of Barak Valley; (b) zoomed view of a tea agroforestry system; (c) extracted tea agroforestry system; and (d) field view of a tea agroforestry system

RF based modelling for AGB estimation

To optimize the independent variables, the RF algorithm was run repeatedly using various Mtry and Ntree values. Ntree and Mtry values of 500 and 18, respectively, were chosen, as they gave the lowest OOB error (Fig. 5a). Using all 24 variables, the observed and predicted AGB values showed an R2 of 0.94 and a % RMSE of 8.47%. Variable importance based on the increase in mean square error (%IncMSE) as well as node purity (IncNodePurity) was obtained (Fig. 5b). Recursive feature elimination with cross-validation was done to optimize the variables. The minimum RMSE was attained for a set of 5 variables (Fig. 5c): Normalized Difference Vegetation Index (NDVI), Green Normalized Difference Vegetation Index (GNDVI), Normalized Difference Infrared Index (NDII), Normalized Difference Water Index (NDWI), and Atmospherically Resistant Vegetation Index (ARVI). The final model was developed using these 5 variables to estimate AGB in tea agroforestry systems. The spatial distribution of AGB in tea agroforestry systems (Fig. 6) was mapped at 20 m spatial resolution using RF. The AGB values ranged from 30 to 124 Mg ha−1, with a mean AGB of 75 Mg ha−1. The spatial distribution of AGB was validated using the AGB values of the 30 validation plots. The RF algorithm predicted AGB with an R2 of 0.86 (Fig. 7), an RMSE of 7.50 Mg ha−1, and a % RMSE of 9.94%.
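The validation statistics quoted above can be computed as sketched below. The definitions are assumptions, since the text does not spell them out: R2 is taken as the squared Pearson correlation between observed and predicted AGB, and % RMSE as the RMSE divided by the mean observed AGB. The function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def percent_rmse(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


def r_squared(observed, predicted):
    """Squared Pearson correlation of observed vs. predicted (assumed definition)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)
```

Under this definition of % RMSE, the reported RMSE of 7.50 Mg ha−1 and % RMSE of 9.94% would correspond to a mean observed AGB of roughly 75.5 Mg ha−1 across the validation plots, which is in line with the mapped mean of 75 Mg ha−1.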

Fig. 5 Random forest-based modelling: (a) average squared error vs. number of trees, (b) predicted variable importance, and (c) number of variables vs. RMSE

Fig. 6 (a) Spatial distribution of aboveground biomass (Mg ha−1) in tea agroforestry systems; zoomed view of (b) false colour composite and (c) aboveground biomass (Mg ha−1) distribution

Fig. 7 Observed vs. predicted aboveground biomass (AGB) (Mg ha−1)

Discussion

The variations in stem density and basal area of shade trees and tea bushes between the different aged tea plantations are attributed to differences in space management, shade tree species composition, tree size, and growth pattern (Condes and Del Rio 2015; Kalita et al. 2020). The basal area reported in the present study is much higher than the 23.8 and 29.9 m2 ha−1 found in natural forests and Piper betle agroforestry systems, respectively, from a similar geographical region (Brahma et al. 2018). However, the basal area of the tea agroforestry systems of the present study area is consistent with the 41.60–74.05 m2 ha−1 and 40.50–68.75 m2 ha−1 reported for paan (Piper betle) jhum agroforestry systems and natural forests, respectively, of the same study area (Nandy and Das 2013). The stand density of the tea bushes reported in this study is much higher than that in paan jhum agroforestry systems (763–934 stems ha−1) and natural forests (522–865 stems ha−1) of the same study area (Nandy and Das 2013).

The estimated biomass storage of 75 Mg ha−1 in the tea agroforestry system is slightly higher than the 71.81 Mg ha−1 reported for the traditional pineapple agroforestry systems of the Barak valley region of northeast India (Reang et al. 2021). However, the estimated biomass storage under tea agroforestry was lower than the 167.4 Mg ha−1 reported for a natural forest from a similar geographical area (Brahma et al. 2018). Nevertheless, according to IPCC (2007), agroforestry systems provide significant carbon absorption capacity, with a mitigation potential of 1.1–2.2 PgC (1 Pg = 10^15 g) in terrestrial ecosystems over the next 50 years. Although agroforestry systems are not primarily designed for carbon sequestration, they can play a major role in storing carbon in the above- and belowground biomass and in soil (Sathaye et al. 2001; Montagnini and Nair 2004). In the present study, the AGB of the tea agroforestry system was mapped. For this purpose, the tea bushes and shade trees were not considered separate entities but a whole system. Hence, while mapping the spatial extent of the tea agroforestry system, tea bushes and shade trees were mapped together.

RS plays a significant role in the mapping and monitoring of tea agroforestry systems compared to traditional surveys. It is time- and cost-effective in providing information about the spatial distribution of any land use/land cover (Navalgund et al. 2019). Sentinel-2 satellite data, with better spectral and spatial resolution than contemporary medium-resolution satellite imagery, proved to be very effective in delineating the tea agroforestry systems in the present study. Moreover, visual image interpretation yielded high mapping accuracy, as also observed by Nandy and Kushwaha (2010, 2011) and Bagaria et al. (2021). High-resolution satellite imagery could have been adequate to distinguish shade trees from the tea bushes and map these components separately. However, high-resolution satellite imagery is expensive and unable to provide high temporal resolution, making it challenging to identify and monitor tea plantations at high temporal frequencies over large areas. In this context, Sentinel-2 satellite imagery could be well suited for regular mapping and monitoring of tea agroforestry systems due to its high spectral and spatial resolution, continuity, affordability, and accessibility, and its history of successful application to multi-temporal classification research for tea plantations (Zhu et al. 2019; Li et al. 2019) and other vegetation (Erinjery et al. 2018; Manna and Raychaudhuri 2020; Waśniewski et al. 2020). Devi et al. (2012) used IRS P6 AWiFS data to identify the major land use/land cover classes in Barak Valley, including tea plantations, using the unsupervised digital classification technique.

Tea agroforestry systems contribute to a large terrestrial carbon stock in India. However, the potential of RS-based approaches for tea AGB estimation has not been demonstrated by any study in India. Hence, the approach used in the present study can contribute enormously to mapping and monitoring the AGB of tea agroforestry systems in the country. In the present study, the spatial distribution of AGB in tea agroforestry systems was mapped using spectral information from Sentinel-2 and RF algorithm. The utility of spectral information from Sentinel-2 has been widely explored for the estimation of biophysical and biochemical variables in tropical areas (Pham and Brabyn 2017; Dang et al. 2019; Nandy et al. 2021; Vasudeva et al. 2021). The potential of machine learning algorithms to predict AGB with high accuracy has also been much discussed (Dhanda et al. 2017; Pandit et al. 2018; Dang et al. 2019; Nandy et al. 2021). For AGB estimation, machine learning algorithms, like RF, have been reported to perform better than linear modelling (Dube and Mutanga 2015). RF algorithm has the capability to handle non-linearity between AGB and RS data (Liu et al. 2017). It has been widely used to rank the explanatory variables based on their importance which makes it very effective in predicting AGB (Pandit et al. 2018; Dang et al. 2019; Yadav et al. 2019).

Some studies have used Sentinel-2 data, which offer better spectral resolution than other medium-resolution sensors as well as additional SWIR and red-edge bands, in combination with the RF algorithm to predict AGB in various ecological settings with high accuracy (Chrysafis et al. 2017; Khan et al. 2020; Pandit et al. 2020; Fassnacht et al. 2021; Pham and Brabyn 2017; Nandy et al. 2021; Li et al. 2022). The findings of the present study reaffirm that Sentinel-2 data and the RF algorithm are effective for accurately mapping the spatial distribution of AGB, even in tea agroforestry areas.

In the present study, the most important variables for tea agroforestry AGB prediction were found to be NDVI, GNDVI, NDII, NDWI, and ARVI. The results corroborate the findings of previous studies. A strong correlation between NDVI and woody biomass was observed by Kushwaha et al. (2014). Nandy et al. (2017) also found NDVI to be the most important variable for predicting AGB. Vegetation indices show a better relationship with AGB because of their capability to reduce the effect of shadows and environmental conditions on the reflectance (Adam et al. 2014). The near-infrared (NIR) band was found to be common to all the important variables, which emphasizes the importance of the NIR band in the estimation of AGB. NDVI, which uses the NIR and red bands (the highest reflectance and absorption regions of chlorophyll, respectively) to give a measure of healthy and green vegetation, was found to be the most important variable. It was followed by GNDVI, which is also sensitive to the chlorophyll content of the vegetation. The results demonstrated that indices using SWIR bands were also suitable for estimating AGB. The SWIR band can differentiate the moisture content of vegetation and soil (Brown et al. 2016). Hence, the SWIR band could capture the variation in the spatial distribution of AGB efficiently. Yadav and Nandy (2015), Molinier et al. (2016), Chrysafis et al. (2017), Nandy et al. (2017), Dang et al. (2019), and Nandy et al. (2021) also found that the SWIR band-derived spectral indices showed a stronger relation with AGB irrespective of data and environmental settings. Pham and Brabyn (2017) identified NDII as one of the top variables with a strong relationship with AGB. Pandit et al. (2018) observed that NDVI and NDII are among the most important variables for AGB prediction. Dang et al. (2019) found that NDII, GNDVI, and NDWI showed a strong relationship with forest AGB. Nandy et al. (2021) also found NDWI and GNDVI to be the most important variables for AGB prediction. ARVI is considered to be less sensitive to atmospheric effects and robust in areas with topographic effects (Kaufman and Tanre 1992). It has also been reported to be sensitive to AGB (Bayaraa et al. 2021).

As the agroforestry systems have been recognized for high carbon storage potentials (Kumar and Nair 2011; Jose 2009; Nath et al. 2021), it is imperative to develop a methodology for mapping and monitoring the AGB and carbon stock in these systems. This study presented a comprehensive methodology for mapping tea gardens and spatial distribution of tea agroforestry AGB. It highlighted the utility of Sentinel-2 data and RF algorithm in mapping and monitoring the AGB and carbon stock of tea agroforestry systems in India. The developed methodology can be used to monitor the carbon stocks present in this important agroforestry system for better management and account for their share in mitigating climate change by absorbing the atmospheric CO2.

Conclusions

The present study presents an approach for AGB mapping in tea agroforestry systems by the synergistic use of RS and field inventory data using a machine learning algorithm. The methodology can be adopted for mapping and monitoring the carbon stock of tea agroforestry systems. Information on the carbon stock of the tea agroforestry systems can facilitate its better management. The study also highlighted the utility of freely available Sentinel-2 data for mapping and monitoring the spatial distribution of carbon stock of tea agroforestry systems. It was observed that RF algorithm using a combination of five optimized spectral indices can effectively predict the AGB of tea agroforestry systems. The study was conducted using freely available satellite data (Sentinel-2), algorithm (RF), and software (R). Hence, this cost-effective methodology can be applied in other areas too for effectively mapping the carbon stock of the tea agroforestry systems or similar agroforestry systems. In this study, the tea bushes and the shade trees could not be mapped separately using Sentinel-2 satellite data. However, shade trees form an integral part of the tea agroforestry systems, and hence while mapping the spatial distribution of AGB or the carbon stock both the components should be considered together forming an agroforestry system.