Introduction

Greenhouse gases in the atmosphere play an important role in climate change. Among them, carbon dioxide (CO2) is of vital importance and contributes significantly to global warming (Hansen and DeFries 2004). Growing trees remove CO2 from the atmosphere through photosynthesis and store the carbon in plant structures (Dixon et al. 1994). Different forest stands store different amounts of carbon, and their storage capacity is tightly linked to tree cover. Tree cover is the percentage of the ground surface area covered by a vertical projection of the outermost perimeter of the natural spread of the plants’ foliage (Rokhmatuloh et al. 2005). Changes in forest cover affect the delivery of important ecosystem services, including biodiversity richness, climate regulation, carbon storage, and water supplies (Hansen et al. 2013). They are also highly relevant to the global carbon cycle, to changes in the hydrological cycle, and to understanding the causes of changes in biodiversity and the rates and causes of land use change (Townshend et al. 2012). However, spatially and temporally detailed information on local and global-scale forest cover is limited, and comprehensive information is needed for better decision-making.

Annual changes in carbon are calculated from changes in tree cover using different modelling techniques. In the past decade, several efforts have been made to estimate percent tree cover as a continuous variable using multiple linear regression (MLR) (Zhu and Evans 1994; DeFries et al. 2000), linear mixture modelling (LMM) (Iverson et al. 1989) and regression trees (RT) (Herold 2003; Sá et al. 2003; Hansen et al. 2003, 2005). Among these techniques, the regression tree is well suited to percent tree cover mapping because, as a non-parametric method, it requires no prior assumptions about the distribution of the training data (Berberoglu et al. 2009).

Recently, there has been increasing emphasis on the need to integrate products derived from Landsat resolution data into sophisticated modelling techniques for tree cover estimation (Townshend et al. 2012). Classifications at Landsat resolution are essential for detecting tree cover change because many such changes occur at a fine scale, especially those resulting from anthropogenic factors. A substantial proportion of the variability of land cover change has been shown to occur at resolutions below 250 m (Townshend et al. 2010).

Previously, global-scale analysis using Landsat data was generally regarded as infeasible because of the absence of multi-temporal data sets and of measurement data for accuracy assessment, and because of the large computational and storage demands of the analysis (Hansen et al. 2013).

Landsat data have primarily been used at relatively local scales for tree cover estimation. DeFries et al. (2000) calculated global tropical forest change based on advanced very high resolution radiometer (AVHRR) data along with regional rates of change estimated from Landsat data (Townshend et al. 2012). More recently, Landsat samples have been used to provide estimates of forest loss and change at regional scale (Hansen et al. 2008) and, subsequently, for the globe (Hansen et al. 2010; Townshend et al. 2012). Although recent studies used selected Landsat imagery, the imagery was not integrated into numerical modelling techniques at local scale. Integrating Landsat and other high resolution data into the modelling process therefore remains a research need for estimating tree cover. Combining fine resolution data and numerical models will help reveal the spatial distribution of local forest species and their changes at local scale, providing comprehensive information for decision-making. This is especially true for Mediterranean ecosystems, where the vegetation species show considerable variation.

The objective of this study was to estimate the percent tree cover of various forest stands in the Goksu Watershed, located in the Eastern Mediterranean part of Turkey, based on an empirical relationship between tree coverage and fine-scale remotely sensed data established by the RT technique. Modelling percent tree cover is important for decision-making in such complex Mediterranean ecosystems, where monitoring the effects of climate change is essential because of the diversity in vegetation and topography.

The RT algorithm has been used to estimate percent tree cover at global and regional scales using coarse spatial resolution remotely sensed data derived from different sensors, with resolutions ranging between 250 m and 1 km (Berberoglu et al. 2007). In this study, Landsat TM/ETM images with 30 m spatial resolution provided greater spectral and spatial resolution. This data set was integrated into the RT algorithm, and the capability of the algorithm to derive percent tree cover maps from high resolution data was evaluated in a complex Mediterranean environment.

Study area and data

The Goksu River Watershed is located in the Central Eastern Mediterranean Basin in Turkey and was selected to model percent tree cover. The location of the study region is shown in Fig. 1. The basin covers approximately 10,500 km² and has very high local variability in forest species. It comprises pure and mixed conifer forests, including Pinus nigra, Cedrus libani, Abies cilicica, Pinus brutia, Juniperus excelsa and Quercus cerris. The prevailing climate is Mediterranean, with mild, rainy winters and hot, dry summers, and a mean annual precipitation of approximately 800 mm. The mean annual temperature is 19 °C (Donmez et al. 2013).

Fig. 1 Study area

Data acquisition

Twenty scenes of multi-spectral LANDSAT TM/ETM imagery, representing five different dates of the study area between October 1999 and June 2007, were used to estimate the percent tree cover. The LANDSAT images used in the regression tree model are listed in Table 1.

Table 1 The list of the LANDSAT images used in the Regression Tree Model

These images were obtained from the US Geological Survey (USGS) Earth Resource Observation Systems (EROS) data centre. The selected images were relatively free of haze and cloud. The model results were evaluated using high resolution GeoEye-1 scenes. The GeoEye-1 sensor provides high resolution images at 0.41 m (panchromatic) and 1.84 m (multi-spectral). It has four spectral bands that range between 450 and 920 nm in its standard band settings (Digital Globe 2013). The multi-spectral GeoEye-1 scenes used in this study were recorded in June 2012 with 1.84 m spatial resolution.

Other data utilized in the analysis included 1:25,000 scale Government Forestry Department and topographic maps and aerial photographs.

Data processing

The Landsat TM/ETM images used in this study (paths 176–177, rows 34–35) were geometrically registered to the Universal Transverse Mercator (UTM) projection system with the WGS 84 datum. These images were processed to develop a percentage tree cover grid layer. Normalised difference vegetation index (NDVI) maps were produced from the spectral data to derive additional metrics for the RT model. NDVI is computed from bands 3 (red) and 4 (near-infrared). Its values for vegetated surfaces range between 0 and 1, and higher values indicate a greater amount of green leaf vegetation (Koy et al. 2005).
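
As an illustration of the NDVI metric described above, the following is a minimal sketch that assumes the standard definition, (NIR − red) / (NIR + red), applied per pixel. The function name `ndvi` and the small reflectance arrays are hypothetical and are not taken from the study's data.

```python
import numpy as np

def ndvi(red, nir):
    """Per-pixel NDVI = (NIR - red) / (NIR + red)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    total = nir + red
    out = np.zeros_like(total)
    # Pixels with zero total reflectance are left at 0 to avoid dividing by zero.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Hypothetical 2 x 2 reflectance values for band 3 (red) and band 4 (NIR).
band3_red = np.array([[0.10, 0.20], [0.05, 0.00]])
band4_nir = np.array([[0.50, 0.20], [0.45, 0.00]])
ndvi_map = ndvi(band3_red, band4_nir)
```

Dense green vegetation reflects strongly in the near-infrared and weakly in the red, so such pixels yield values closer to 1.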

Testing data were produced using very high resolution GeoEye-1 scenes. These scenes were classified using supervised classification with the maximum likelihood method. The images were classified into “tree” and “non-tree” areas in combination with the ground data. The classified GeoEye-1 images were used for training the model and testing its results.

The spectral characteristics of tree canopies deviate considerably from those of the shadowed neighbourhood in forest areas. Thus, the amount of shadow is important for such modelling studies. The shadow index (SI) technique was used to extract the low visible-band radiance associated with shadow, which increases as the vegetation quantity increases. It is formulated as (Rikimaru 2000; Tateishi et al. 2008):

$$ \mathrm{SI} = {\left[\left(255-B\right) \times \left(255-G\right) \times \left(255-R\right)\right]}^{1/3} $$

where B, G and R are the blue, green and red band responses, respectively.
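
The shadow index formula can be sketched in a few lines. This is a minimal illustration that assumes 8-bit band responses (0 to 255), as implied by the constant 255 in the formula. The function name `shadow_index` is hypothetical.

```python
import numpy as np

def shadow_index(blue, green, red):
    """SI = cube root of (255 - B) * (255 - G) * (255 - R), per pixel."""
    b = np.asarray(blue, dtype=float)
    g = np.asarray(green, dtype=float)
    r = np.asarray(red, dtype=float)
    return np.cbrt((255.0 - b) * (255.0 - g) * (255.0 - r))

# Dark pixels in all three visible bands give a high SI; bright pixels give a low SI.
si_dark = float(shadow_index(0, 0, 0))
si_bright = float(shadow_index(255, 255, 255))
```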

Methods

Regression tree

This study utilized a commonly applied technique, the RT model, to predict percent tree cover within a Mediterranean-type forest using LANDSAT TM/ETM data. The regression tree algorithm produces a rule-based model for predicting a single continuous response variable from one or more explanatory variables. An RT is a piecewise constant or piecewise linear estimate of a regression function constructed by recursively partitioning the data (Loh 2002). Regression trees are built through binary recursive partitioning, an iterative process of splitting the data into subsets called nodes. At each node, the algorithm investigates all possible splits of all explanatory variables (Tottrup et al. 2007). Partitioning is based on reducing the deviance of the target variable from its mean, where \( {Y}_i \) is the target value of the ith training case and \( \bar{Y} \) (Y bar) is the mean of the target values. A search is conducted over all predictors and possible split points such that the reduction in the total deviance, D(total), is maximized (Breiman et al. 1984).

$$ D\left(\mathrm{total}\right)=\sum_i {\left({Y}_i - \bar{Y}\right)}^2 $$

The cut point, or value, splits the data into two mutually exclusive subsets, left and right subsets. The reduction in deviance is expressed as follows (Rokhmatuloh et al. 2005):

$$ {\Delta}_{j,\mathrm{total}} = D\left(\mathrm{total}\right) - \left(D\left(L\right) + D\left(R\right)\right) $$

where D(L) and D(R) are the deviances of the left and right subsets. The algorithm searches for the maximum of \( {\Delta}_{j,\mathrm{total}} \) over all predictor variables and possible cut points, subject to the constraint that the number of members in the left and right subsets is larger than some minimum or user-defined value (Borel and Gerstl 1994; Rokhmatuloh et al. 2005).
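
The split search described above can be sketched for a single predictor. This is a minimal illustration, not the software used in the study: the names `deviance` and `best_split`, the `min_members` parameter and the toy data are all hypothetical. A full regression tree would repeat this search over every predictor and then recurse into the two subsets.

```python
import numpy as np

def deviance(y):
    """Sum of squared deviations of y from its mean."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def best_split(x, y, min_members=1):
    """Return (cut point, deviance reduction) for one predictor x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d_total = deviance(y)
    best_cut, best_delta = None, 0.0
    # Every distinct value except the largest is a candidate cut point.
    for cut in np.unique(x)[:-1]:
        left, right = y[x <= cut], y[x > cut]
        # Each subset must hold more than min_members cases.
        if left.size <= min_members or right.size <= min_members:
            continue
        delta = d_total - (deviance(left) + deviance(right))
        if delta > best_delta:
            best_cut, best_delta = float(cut), delta
    return best_cut, best_delta

# Toy data: tree cover jumps once the predictor exceeds 3.
x = [1, 2, 3, 4, 5, 6]
y = [10, 12, 11, 50, 52, 51]
cut, delta = best_split(x, y)
```

For these toy values the total deviance is 2404 and the split at 3 leaves a deviance of 2 in each subset, so the reduction is 2400.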

Modelling

The methodology for deriving percent tree cover with RT consisted of five steps for this study (Donmez et al. 2011):

  i) Generate reference percentage tree cover data
  ii) Derive metrics from LANDSAT data
  iii) Select predictor variables
  iv) Fit RT models
  v) Accuracy assessment and final model and map production

  i) Modelling percent tree cover relies on the quality of training and testing data. Digital multi-spectral GeoEye-1 images with a spatial resolution of 1.84 m were used to derive the reference percentage tree cover data needed to train the model.

  ii) The derived normalised difference vegetation index (NDVI) is the ratio of the difference to the sum of the reflectances in the near-infrared (band 4) and visible red (band 3) of the LANDSAT standard band setting.

  iii) Predictor variable selection involved choosing the most relevant input variables for percent tree cover modelling. This was accomplished using the stepwise linear regression (SLR) method from S-PLUS (Insightful Corp 2001), which also provides classification and regression tree software. The SLR method selects the best subset of predictor variables for regression tree modelling using a stepwise procedure, which repeatedly alters the model from the previous step by adding or removing predictor variables (Helsel and Hirsch 2002). The Cp statistic used to compare the candidate models is expressed as:

    $$ {C}_p = p+\frac{\left(n-p\right)\left({s}_p^2-{\hat{\sigma}}^2\right)}{{\hat{\sigma}}^2} $$

    where n is the number of observations (number of training data), p is the number of coefficients (number of predictor variables plus one), \( {s}_p^2 \) is the mean square error (MSE) of the prediction model and \( {\hat{\sigma}}^2 \) is the minimum MSE among the possible models (Rokhmatuloh et al. 2005). The Cp statistic was examined for each variable. It provides a convenient criterion for determining whether adding or removing a predictor variable makes the model more accurate, and it indicates which predictor variables are significantly related to percentage tree cover prediction.

  iv) Validation of the model results is one of the most important steps in the modelling process. The results of the RT algorithm were evaluated through a cross-validation (CV) technique. Sample data were divided into complementary subsets: the analysis was performed on one subset, called the training set, and validated on the other, called the validation or testing set. For each split, the model is fitted to the training data, and predictive accuracy is assessed using the validation data. The results are then averaged over the splits. Cross-validation estimates the expected level of fit of a model to a data set that is independent of the data used to train the RT model. With the most relevant input variables selected using the SLR method and the training data taken from the reference data derived from high resolution images, the relationships between tree cover density and LANDSAT spectral values were modelled using the RT technique. A total of 20 % of the cells were set aside from the predictor variables in order to validate the model. A summary of the percentage tree cover estimation using the regression tree method is shown in Fig. 2.

    Fig. 2 Summary of percentage tree estimates using regression tree method
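
The Cp criterion from step iii) can be illustrated with a short worked example. This is a sketch only: the function name `cp_statistic`, the candidate models and their MSE values are hypothetical and are not results from the study. The minimum MSE among the candidates plays the role of the denominator term in the formula.

```python
def cp_statistic(n, p, s_p2, min_mse):
    """Cp = p + (n - p) * (s_p^2 - min_mse) / min_mse."""
    return p + (n - p) * (s_p2 - min_mse) / min_mse

# Hypothetical candidates: name -> (number of coefficients p, model MSE).
n_obs = 100
candidates = {
    "NDVI": (2, 120.0),
    "NDVI + red": (3, 90.0),
    "NDVI + red + NIR": (4, 88.0),
}
min_mse = min(mse for _, mse in candidates.values())
cp_values = {
    name: cp_statistic(n_obs, p, mse, min_mse)
    for name, (p, mse) in candidates.items()
}
# A lower Cp indicates a better trade-off between fit and model size.
best_model = min(cp_values, key=cp_values.get)
```

In this toy case the one-predictor model is penalised heavily for its large MSE, while the model with the minimum MSE has Cp equal to its number of coefficients.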

Results

The results of this study comprise three parts: generating the predictor variables, model validation and the spatial composition of the percent tree cover.

Predictor variables from LANDSAT data

NDVI bands were used as a biophysical variable in addition to the LANDSAT standard bands to increase the accuracy of the model results. A total of five NDVI maps covering different months were produced. These NDVI images were combined with the spectral bands of the LANDSAT TM/ETM images to run the RT model. The predictor variables derived from the images are shown in Table 2.

Table 2 Predictor variables derived from LANDSAT ETM+ images (band 1: blue (0.45–0.52 μm), band 2: green (0.52–0.60 μm), band 3: red (0.63–0.69 μm), band 4: near-infrared (0.77–0.90 μm), band 5: short wave infrared (1.55–1.75 μm), band 6: short wave infrared (2.09–2.35 μm))

In the RT modelling process, a total of 30 spectral bands and five NDVI images were utilized as predictor variables to derive the percentage tree cover map of the study area. The RT model comprised approximately 100 rules, each covering a different number of training cases. The most relevant predictor variables, which contributed most to the RT model for estimating percentage tree cover, are shown in Table 3.

Table 3 The four most contributing predictor variables in each rule for estimating percent tree cover

Among all the predictor variables, the RT required only a few bands as critical inputs in the production rules. The four most contributing predictors in each rule came from the red, near-infrared and NDVI bands of the LANDSAT TM/ETM images. The maximum tree cover range for those predictors varied between 74 and 92.

Model validation

Validation of the regression tree map derived from the RT model was carried out using a total of 1654 validation pixels derived from high resolution images. Sixteen scenes of the GeoEye-1 data, acquired in the greenest season for the area and containing representative tree stands such as Evergreen Needle Leaf, were selected as testing data. Pixels were drawn from the test data and ground truth using a random sampling method. The sample contained percent tree cover values ranging from 0 to 100 %. The model derived from SLR-selected variables produced a reasonable prediction error of 5.50 %, and the modelled and observed values had a high correlation coefficient of 0.80 (Fig. 3).

Fig. 3 Correlation between modelled and observed tree cover
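
The kind of comparison summarized in Fig. 3 can be sketched as follows. The source does not state how its prediction error was computed, so the mean absolute error and RMSE used here are assumptions, and the function name `validation_summary` and the five sample values are hypothetical.

```python
import numpy as np

def validation_summary(observed, modelled):
    """Return (Pearson r, mean absolute error, RMSE) for paired values."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    diff = modelled - observed
    r = float(np.corrcoef(observed, modelled)[0, 1])
    mean_abs_error = float(np.mean(np.abs(diff)))
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return r, mean_abs_error, rmse

# Hypothetical percent tree cover at five validation pixels.
observed = [0, 25, 50, 75, 100]
modelled = [5, 20, 55, 70, 95]
r, mae, rmse = validation_summary(observed, modelled)
```

Reporting an error measure alongside the correlation coefficient matters because a model can correlate well with the observations while still being biased.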

Two sub-scenes of GeoEye-1 images representing different forest cover types were classified and recoded into tree and non-tree pixels at 1.84 m spatial resolution. The location of these images in the study region is shown in Fig. 4. This data set covered an area of 210 km². The classification results were then resampled to estimate percentage tree cover at the LANDSAT TM/ETM spatial resolution (Fig. 5).

Fig. 4 Location of the test sites in study region

Fig. 5 Extracting percent tree cover from very high-resolution GeoEye images. (a) RGB colour composite images, (b) tree extraction as tree/non-tree using maximum likelihood algorithm
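
The resampling step, which turns a fine tree/non-tree classification into percent tree cover on a coarser grid, can be sketched as a block average. This is a simplified illustration under the assumption that each coarse cell contains a whole number of fine pixels. In practice, 30 m divided by 1.84 m is about 16.3, so a real workflow would need a proper resampling tool. The function name `percent_tree_cover` and the toy mask are hypothetical.

```python
import numpy as np

def percent_tree_cover(tree_mask, block):
    """Percent of tree pixels (1 = tree, 0 = non-tree) in each block x block window."""
    tree_mask = np.asarray(tree_mask, dtype=float)
    rows, cols = tree_mask.shape
    # Trim the edges so the grid divides evenly into windows.
    rows -= rows % block
    cols -= cols % block
    trimmed = tree_mask[:rows, :cols]
    windows = trimmed.reshape(rows // block, block, cols // block, block)
    return 100.0 * windows.mean(axis=(1, 3))

# Toy 4 x 4 classification aggregated to a 2 x 2 percent cover grid.
mask = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
])
cover = percent_tree_cover(mask, block=2)
```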

The maximum likelihood algorithm captured the tree-covered areas used as testing and training data. These data sets included highland areas in order to capture the high variability of the vegetation cover within the area.

Accuracy of the RT model was evaluated by means of correlation coefficient for training and testing data sets (Table 4). The model application showed a strong agreement between modelled and observed tree cover data. There is a strong relationship between the predicted values and the validation cases within each terminal node.

Table 4 Accuracy of the RT model

In total, the RT model was based on 108,679 training cases. The average error varied between 7.9 and 8.3 for the training and testing data across the 100 rules. The standard deviation (STD) and root mean square error (RMSE) in the various tree cover strata are shown in Table 5.

Table 5 Standard deviation and RMSE in tree cover strata

STD values varied between 4 and 10 across the tree cover strata. In terms of STD and RMSE, the model performed well in estimating tree cover below 30 %. It also agreed well where tree cover exceeded 76 %.

Spatial composition of percentage tree cover

A percentage tree cover map layer was produced for the Goksu Watershed by integrating remotely sensed data into the RT model (Fig. 6). The SLR-selected variables produced tree cover estimates spanning the range from 0 to 100 %.

Fig. 6 Percent tree cover map of Goksu watershed (map projection: UTM, WGS 1984)

This map represented the tree cover of the Mediterranean forest in greater spatial detail, emphasizing the tree cover distribution of the different forest stands. Areas with zero percent tree cover are widespread in the central parts of the region. In contrast, areas with 80–100 % tree cover occurred in the south-eastern and south-western parts.

The tree coverage of each forest stand was also derived using the land cover map and the percent tree cover map derived from the RT model (Fig. 7). The tree cover grid cells of each forest stand were extracted from the percent tree cover map and are shown in Fig. 8. J. excelsa, P. brutia, P. nigra, C. libani and Q. cerris stands were mapped for the study area.

Fig. 7 Percent tree cover maps of different forest stands in Goksu Watershed

Fig. 8 Percent tree cover comparison of the forest stands in Goksu Watershed (Juniperus excelsa, Abies cilicica, Pinus nigra, Pinus brutia, Quercus cerris, Cedrus libani)
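
Extracting the tree cover cells of each forest stand amounts to masking the percent tree cover grid with the land cover map. The sketch below assumes both grids are already aligned cell by cell. The function name `cover_by_stand`, the class codes and the grid values are hypothetical.

```python
import numpy as np

def cover_by_stand(tree_cover, stand_map, stand_codes):
    """Mean percent tree cover of the cells belonging to each stand."""
    tree_cover = np.asarray(tree_cover, dtype=float)
    stand_map = np.asarray(stand_map)
    summary = {}
    for name, code in stand_codes.items():
        cells = tree_cover[stand_map == code]
        # A stand with no cells in the grid gets NaN rather than a misleading 0.
        summary[name] = float(cells.mean()) if cells.size else float("nan")
    return summary

# Toy aligned grids: percent tree cover and land cover class codes.
tree_cover = [[10, 40], [80, 60]]
stand_map = [[1, 1], [2, 0]]
stand_codes = {"Pinus brutia": 1, "Cedrus libani": 2}
stand_means = cover_by_stand(tree_cover, stand_map, stand_codes)
```

The same mask could feed a histogram per stand instead of a mean when the distribution of cover classes is of interest.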

The maps of forest stands showed the locations of the forests and their coverage. The forest distribution is strongly oriented towards the valley regions. Significant amounts of forest are located in the eastern part of the region towards the Goksu River Delta. A small forest patch is located in the lowland plateaus.

The major forest stands in the region showed great variation in their percent tree cover distribution. The tree cover of Juniper and Turkish Pine stands mostly ranged between 20 and 60 %. Turkish Pine also has a significant amount of tree cover over 80 %.

Discussions

A comprehensive percent tree cover map showing the forest distribution of the Goksu Watershed is the main output of the RT modelling process applied in this study. The percent tree cover map is a key component that can be combined with various data sets in spatially distributed models for estimating carbon distribution and potential for a subsequent year. The approach of this study includes the coupled analysis of high resolution images and ground information to derive a spatially explicit and internally consistent forest cover map.

This study contains a large amount of uncertainty due to the various data sources and model structures. The results produced by the RT model should not be considered precise predictions because of various factors in the evaluation process. The accuracy of the estimated percent tree cover map was evaluated by incorporating ground-based information, the land cover map and the model outputs. The comparison of these maps at randomly sampled sites revealed that the percent tree cover map agreed relatively well with the land cover data set. However, existing grasslands were assigned tree cover in the output, so tree cover was overestimated in open forests and croplands.

The large degrees of uncertainty in the different parts of the modelling process limit the final output. This study indicated that a precise estimation is not possible due to the limited capability of the models at local scale. To cope with this issue, simplified versions of the spatially distributed models should be developed to estimate existing forest cover at local scale. Additionally, using various high resolution spatial data for training and testing the model performance will tend to reduce uncertainty. This will enable more accurate determination of forest extent where the species diversity is high.

Conclusions

A new approach for mapping percentage tree cover across the Eastern Mediterranean part of Turkey using a regression tree model and multi-temporal LANDSAT TM/ETM data with a 30-m spatial resolution was presented in this study. The results were reasonable, with a correlation coefficient of 0.80 and a prediction error of approximately 5.5 %. This approach shows significant potential for forest cover mapping and monitoring at watershed scale by integrating high resolution remote sensing data into the modelling process.

The use of high resolution remotely sensed data in the RT algorithm demanded high computing power. With respect to the data processing steps, some refinements are recommended to facilitate the simulation process by sub-dividing the data inputs into smaller portions. Hence, the algorithm was adapted to divide the data set into 100 parts, and the simulation was carried out on those portions. The outputs of these subsets were combined, and the percent tree cover map of the Goksu Watershed was derived.
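
The sub-division strategy can be sketched as follows. This is a minimal illustration that assumes the predictors are stored as one row per grid cell and that predictions for different cells are independent. The function name `predict_in_parts` and the stand-in model `toy_predict` are hypothetical and do not represent the RT model itself.

```python
import numpy as np

def predict_in_parts(predictors, predict, n_parts=100):
    """Apply `predict` to row-wise portions of `predictors` and recombine."""
    parts = np.array_split(predictors, n_parts, axis=0)
    # Skip empty portions, which occur when there are fewer rows than parts.
    outputs = [predict(part) for part in parts if len(part)]
    return np.concatenate(outputs)

def toy_predict(rows):
    """Stand-in for a fitted model: one output value per input row."""
    return rows.mean(axis=1)

# Hypothetical predictor table: 2500 cells with 3 predictor variables each.
predictors = np.arange(7500, dtype=float).reshape(2500, 3)
chunked = predict_in_parts(predictors, toy_predict, n_parts=100)
```

Because each portion is processed separately, peak memory use depends on the portion size rather than on the full data set.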

The spatial composition of the forest stands was also delineated by integrating the percent tree cover and land cover maps of the region. Fractional maps of Juniper, Taurus Fir, Turkish Pine, Crimean Pine, Cedar and Oak were produced, which revealed a significant improvement in the spatial representation of the landscape. These maps clearly showed that the lower lands of the watershed are still forested, although the natural vegetation is threatened by anthropogenic effects in the form of large spatial patterns of agricultural areas and settlements.

The spatially distributed results showed that most of the forests are located in central highlands of semi-natural areas where strong linkage to ecosystem variability exists. High spatial resolution output maps for each forest stand are especially important to reveal the level of human disturbances on forest areas that can guide the decision-makers at local scale.

Single-year results were presented in this paper. However, the approach used in this study might also provide an opportunity to simulate tree cover dynamics over a longer monitoring period. Annual time series of input data could provide a significant option for local and regional monitoring of vegetation patterns. Thus, our understanding of complex vegetation dynamics and their interactions with humans will be improved by combining high resolution remotely sensed material and geo-spatial modelling techniques.

LANDSAT data showed great potential for deriving percent tree cover, given their spatial resolution and spectral variability. In combination with the RT model, very high resolution GeoEye-1 images performed well as training and testing data sets. Cloud cover and the partial shadows it cast over the region were a problem. This was resolved by a simple technique based on surface reflectance values from other cloud-free months. This technique provides an advantage for the temporal modelling of percent tree cover in complex regions in forest change monitoring studies with RT modelling.
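
The source gives no details of the cloud-filling technique, so the following is only one possible reading, stated as an assumption: pixels flagged as cloud or cloud shadow take the surface reflectance of the same location from a cloud-free date. The function name `fill_clouds`, the mask and the reflectance values are hypothetical.

```python
import numpy as np

def fill_clouds(target, cloud_mask, fallback):
    """Replace masked pixels of `target` with values from `fallback`."""
    target = np.asarray(target, dtype=float)
    fallback = np.asarray(fallback, dtype=float)
    cloud_mask = np.asarray(cloud_mask, dtype=bool)
    # Assumption: both images are co-registered and share the same grid.
    return np.where(cloud_mask, fallback, target)

# Toy reflectance grids: the right-hand column of the target is cloudy.
target = [[0.20, 0.90], [0.30, 0.80]]
cloud_mask = [[False, True], [False, True]]
fallback = [[0.25, 0.35], [0.28, 0.40]]
filled = fill_clouds(target, cloud_mask, fallback)
```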