Introduction

Forests contain a large share of terrestrial biodiversity and are important components of terrestrial ecosystems that provide several crucial ecosystem goods and services to humanity (Gamfeldt et al. 2013). Forest biodiversity encompasses not only trees but also all life forms found within forest ecosystems. However, trees are essential elements of forests that cannot be replaced by any other element. Trees, along with various other ecological variables, create ecological and physical variation in forests and provide food, shelter and living space for numerous organisms (Jones et al. 1994). They also provide a humid environment for understory plants, and their fallen leaves affect soil conditions and protect seeds, seedlings and soil organisms (Callaway and Walker 1997). Tree species diversity promotes timber production (Morin et al. 2011), influences carbon storage and provides more resistance to disturbance (Pedro et al. 2015) and insect herbivores (Castagneyrol et al. 2014). Gamfeldt et al. (2013) reported that tree species richness in boreal and temperate forests was positively related to multiple ecosystem functions, such as tree growth, topsoil carbon storage, berry production, game production potential, the presence of deadwood and biodiversity in the understory. Therefore, knowing and conserving the variety of tree species in a forest is crucial to ensure a high future potential for multiple ecosystem services (Gamfeldt et al. 2013).

Plant diversity maps play an important role in effective management and decision-making for vegetated landscapes. Their significance is especially high in forested areas because of the difficulty forest managers face in managing and appreciating the diversity of forest trees (Foody and Cutler 2006). In particular, biodiversity should be considered a primary principle during protection and conservation planning, management and utilization (Foody and Cutler 2006; Kiran and Mudaliar 2012). Enhancing, determining, assessing and monitoring forest biodiversity have become important issues in protecting habitats and a large number of species (Wenting et al. 2004; Kiran and Mudaliar 2012). Therefore, to plan and manage forests in an effective and sustainable way, it is essential to understand the tree species diversity and composition of forests (Fallah et al. 2012; Kiran and Mudaliar 2012). Satellite remote sensing images have contributed to the identification of biodiversity. These data are updatable, inexpensive and available at various temporal and spatial scales (Foody and Cutler 2003, 2006; Wenting et al. 2004; Gillespie et al. 2008; Oldeland et al. 2010; Boyd and Foody 2011; Pettorelli 2013; Duro et al. 2014; Rocchini et al. 2015). Vegetation indices derived from satellite data have generated information directly related to tree species diversity. Collecting data in field-based studies is difficult when the study area is large. Therefore, remote sensing data sets are valuable and might be recognized as a convenient way to obtain the distribution and status of biodiversity over large areas (Turner et al. 2003). Accuracy increases with the acquisition of ground truth samples, although field-based studies are time consuming, expensive and limited on a large scale, especially in forest biodiversity inventories and forest planning (Buhk et al. 2007; Fallah et al. 2012; Dalmayne et al. 2013). Remote sensing data can be used for carrying out and updating such studies (Fallah et al. 2012).

There are numerous remote sensing data sets that could reliably be used in biodiversity studies (Carlson et al. 2007; Kalacska et al. 2007; Nagendra and Rocchini 2008; Oldeland et al. 2010; Ghiyamat and Shafri 2010; Nagendra et al. 2010, 2013; Getzin et al. 2012; Ghahramany et al. 2012; Ewijk et al. 2014; Rocchini et al. 2014; Warren et al. 2014; Hernández-Stefanoni et al. 2014; Somers et al. 2015). Nonetheless, Landsat data are commonly used because they are freely available and images have been acquired since 1972. Landsat data also have appropriate geometric, spectral, spatial and temporal resolution over large areas (Rocchini 2007a; Nagendra et al. 2013; Pettorelli 2013).

The utility of remote sensing data has been increasing for determining species richness and supporting long-term preservation decision-making (Carroll 1998; Gould 2000; Nagendra et al. 2013), identifying prominent conservation areas (Carlson et al. 2007) and guiding ecological restoration efforts (Champagne et al. 2004). Studies on the relationship between species richness and remote sensing data have increasingly used vegetation indices (VIs) and band combinations, and the use of NDVI in estimating species richness has increased. A previous study found a positive correlation between plant species richness and NDVI in both temperate and tropical ecosystems (Gillespie et al. 2008). It has also become possible to calculate the species turnover, or beta diversity, of an area using the spectral heterogeneity of remote sensing data (Rocchini et al. 2009b). In this method, beta diversity is obtained from compositional and spectral similarity matrices among sampling plots (Schmidtlein and Sassin 2004; Rocchini et al. 2009b, 2015). In most studies, the similarity in species composition between pairs of sampling plots has been characterized using the Sørensen index or the Jaccard index, and the spectral distance has been computed as the Euclidean distance (Rocchini et al. 2009a, 2015). A large decay in species similarity among sites corresponds to high beta diversity in terms of species composition (Rocchini et al. 2015).

There are more tree diversity studies in tropical forests (Gillespie et al. 2009) than in temperate forests (Fairbanks and McGwire 2004; Levin et al. 2007; Meng et al. 2016). Temperate forest ecosystems have high plant biodiversity, and changes in species composition due to climate change have been reported in high-latitude ecosystems (Kirschbaum et al. 1995; Solomon 2007). Few studies have considered the impact of seasonal differences on tree species diversity in temperate forests. Thus, tree diversity research in temperate forests is essential. The aim of the present study was to analyze the relationship between alpha diversity of trees and spectral variables derived from freely available Landsat data in a temperate forest. We used richness and the Shannon and Simpson indices to quantify tree alpha diversity. The Shannon index is recommended for landscape management within an ecological framework because it is sensitive to the presence of rare species, while the Simpson index can be used if the dominant species are more important (Nagendra 2001). We also investigated the relationship between beta diversity and remotely sensed data by relating the species composition similarity of sampling plots to their spectral distance via quantile regression.

Material and methods

Study area

The study was conducted in the Gönen dam watershed area, which is located in the northeastern part of Kazdağı Mountain in Turkey (26.960–27.540° E, 39.640–40.100° N) (Fig. 1). The watershed covers approximately 113,700 ha, and the elevation ranges from 90 to 1400 m a.s.l. The mean annual precipitation is 847.3 mm, and the mean annual temperature is 12.8 °C according to long-term data from the nearest meteorological station located in the Yenice Province. The forests are composed of pure or mixed conifer and broadleaf trees. The following 27 tree species are found within the study area: Pinus nigra J. F. Arnold subsp. pallasiana (Lamb.) Holmboe, Pinus brutia Ten., Pinus pinea L., Abies nordmanniana (Steven) Spach subsp. equi-trojani (Asch. & Sint. ex Boiss.) Coode & Cullen, Quercus cerris L., Q. petraea (Matt.) Liebl. subsp. iberica (Steven ex M.Bieb.) Krassiln., Q. frainetto Ten., Quercus pubescens Willd., Quercus infectoria Oliv., Fagus orientalis Lipsky, Carpinus betulus L., Castanea sativa Mill., Sorbus torminalis (L.) Crantz, Tilia tomentosa Moench., Tilia platyphyllos Scop., Salix caprea L., Populus tremula L., Acer campestre L., Acer platanoides L., Alnus glutinosa, Arbutus andrachne, Arbutus unedo, Cornus mas, Phillyrea latifolia L., Prunus sp., Ilex colchica Pojark., Robinia pseudoacacia L.

Fig. 1 Location of the study area in Turkey. The size of vegetation sampling plot symbols corresponds to tree richness number (red spots indicate the distribution of used sampling plots in the study)

Data gathered

Field data collection and diversity variables

The watershed boundary was determined from a digital elevation model (DEM) using GIS, and the watershed was systematically divided into 3 km × 3 km grids. A total of 99 sampling units, each 20 m × 20 m, were selected using geographically stratified random sampling within these grids (plantation areas, maquis and areas with canopy cover of less than 50% were excluded). The field data were collected in May, June and July 2013. Within each plot, the tree species were identified, and all trees with a diameter at breast height (dbh) larger than 7 cm were measured. Presence/absence and abundance data (tree species number and tree species basal area) were used to relate richness and the Shannon and Simpson diversity indices to the spectral variables (SVs) derived from Landsat images. The Shannon index was computed twice, once based on tree species number and once based on tree species basal area.
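As an illustration of the diversity variables described above, the following minimal sketch computes richness, the Shannon index and the Simpson index for a single plot. It assumes the natural-log Shannon form and the Gini-Simpson form (1 minus the sum of squared proportions); the function names and the example plot values are hypothetical, and "tree species number" is interpreted here as the number of stems per species, which is an assumption.

```python
from math import log


def proportions(abundances):
    """Convert abundances (stem counts or basal areas) to proportions."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("plot has no recorded abundance")
    return [a / total for a in abundances if a > 0]


def richness(abundances):
    """Number of species present in the plot."""
    return sum(1 for a in abundances if a > 0)


def shannon(abundances):
    """Shannon index H' = -sum(p * ln p)."""
    return -sum(p * log(p) for p in proportions(abundances))


def simpson(abundances):
    """Gini-Simpson index 1 - sum(p^2) (assumed form)."""
    return 1.0 - sum(p * p for p in proportions(abundances))


# Hypothetical plot: the same three species measured two ways.
stem_counts = [6, 3, 1]            # stems per species (assumed meaning)
basal_area_m2 = [0.90, 0.08, 0.02]  # basal area per species, illustrative

shannon_by_number = shannon(stem_counts)
shannon_by_basal_area = shannon(basal_area_m2)  # analogous to SIBA
```

Because one species dominates the basal area in this made-up plot, the basal-area-based Shannon value is lower than the count-based value, which shows why the two abundance measures can rank plots differently.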

Spectral variables

The normalized difference vegetation index (NDVI), greenness, the difference vegetation index (DVI) and the enhanced vegetation index (EVI) are the most widely used vegetation indices and are calculated primarily from two regions of the electromagnetic spectrum (the red band and the near-infrared [NIR] band) (Tucker 1979; Cabacinha and Castro 2009; Boyd and Foody 2011; Baig et al. 2014; Ahmed et al. 2015). These wavelength regions provide valuable information for vegetation studies (Broge and Leblanc 2001). Most recent studies have shown relatively high correlations between NDVI and plant richness (Gould 2000; Levin et al. 2007; Gillespie et al. 2008).

We used NDVI, DVI, EVI and greenness in our study. In addition, the near-infrared band (OLI B5) was evaluated because of its great potential for recognizing plant species (Rocchini et al. 2007). In this paper, we refer to NDVI, DVI, EVI, VI, greenness and OLI B5 as SVs (spectral variables), and tree richness and the Shannon and Simpson indices as the TDVs (tree diversity variables) (Table 1). Several forest studies have illustrated the consistency of derived SVs between Landsat OLI and ETM+, which are complementary data sets (Li et al. 2013; Xu and Guo 2014; She et al. 2015).
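The band arithmetic behind three of these indices can be sketched as follows. The formulas are the commonly published definitions of NDVI, DVI and EVI and are assumed here, since the text does not state them; the EVI coefficients (2.5, 6, 7.5, 1) are the usual defaults. Greenness and VI are omitted because their exact definitions are not given in this section. Function names and reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    if denom == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / denom


def dvi(nir, red):
    """Difference vegetation index: NIR - Red."""
    return nir - red


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, soil=1.0):
    """Enhanced vegetation index with the commonly used default coefficients."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + soil)


# Hypothetical surface reflectances for one pixel
# (Landsat 8 OLI: B2 = blue, B4 = red, B5 = near-infrared).
blue, red, nir = 0.05, 0.10, 0.50

pixel_indices = {
    "NDVI": ndvi(nir, red),
    "DVI": dvi(nir, red),
    "EVI": evi(nir, red, blue),
}
```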

Table 1 Variables used in this study

Image data and processing

Landsat 8 OLI images, which have a spatial resolution of 30 m, were acquired from the US Geological Survey for six dates in 2013 (18 May, 19 June, 21 July, 7 September, 9 October and 10 November) (http://earthexplorer.usgs.gov/). Images from different seasons were selected because the seasonality of vegetation vigor likely changes the vegetation indices (Freitas et al. 2005; Cabacinha and Castro 2009). All of the images were Level 1T, precision-geocoded and terrain-corrected products. The digital number (DN) values of all of the images were converted to top-of-atmosphere (TOA) reflectance, and atmospheric correction was performed using the Dark Object Subtraction 1 (DOS1) method (Chavez 1996; Congedo 2016).
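The conversion described above can be sketched as follows. This is a simplified illustration, not the processing chain of the plugin used in the study: it assumes the standard Landsat 8 rescaling (reflectance = multiplier × DN + offset, divided by the sine of the sun elevation), takes the scene minimum DN as the dark object, and assumes a dark-object reflectance of 1%. The coefficient values shown are typical of the per-band reflectance rescaling factors in scene metadata but should be read from each scene; all names are hypothetical.

```python
from math import radians, sin


def toa_reflectance(dn, mult, add, sun_elevation_deg):
    """Convert a digital number to sun-angle-corrected TOA reflectance."""
    return (mult * dn + add) / sin(radians(sun_elevation_deg))


def dos1_reflectance(dns, mult, add, sun_elevation_deg, dark_reflectance=0.01):
    """Simplified dark-object subtraction in reflectance space.

    Assumption: the darkest pixel should have `dark_reflectance`; any excess
    is treated as atmospheric haze and removed from every pixel.
    """
    dark_dn = min(dns)  # assumed choice of dark object
    haze = toa_reflectance(dark_dn, mult, add, sun_elevation_deg) - dark_reflectance
    haze = max(haze, 0.0)  # never add reflectance
    return [toa_reflectance(dn, mult, add, sun_elevation_deg) - haze for dn in dns]


# Illustrative values only; real coefficients come from the scene metadata.
REFLECTANCE_MULT = 2.0e-5
REFLECTANCE_ADD = -0.1
SUN_ELEVATION_DEG = 60.0

band_dns = [7000, 8500, 10000, 12000]
corrected = dos1_reflectance(band_dns, REFLECTANCE_MULT, REFLECTANCE_ADD, SUN_ELEVATION_DEG)
```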

An existing local road map conformed well to all of the satellite images, demonstrating appropriate geometric correction of the satellite data. However, to increase the accuracy, the image taken in May was used as a reference image, and the other images were georeferenced to it. Topographical correction was not applied because band-ratio vegetation indices significantly reduce the noise caused by topographical variation. The SVs were then generated for all six dates. The vector layer of the sampling plots was overlaid on each SV. For all SVs, the value for each sampling plot was obtained as the mean DN of 2 × 2 and 3 × 3 pixel sampling windows to reduce errors caused by both image registration and plot georeferencing between field data and satellite data (Rocchini et al. 2009b). All image processing and GIS analyses were performed with the “semi-automatic classification” plugin of Quantum GIS (QGIS Development Team 2015), GRASS GIS (GRASS Development Team 2012) and SAGA GIS (SAGA Development Team 2015). The applied methodology is shown in Fig. 2.
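The window averaging described above can be sketched with a small helper. How each window was positioned relative to the plot location is not stated in the text, so the sketch simply takes the window's top-left pixel as an input; the raster values and names are hypothetical.

```python
def window_mean(raster, top, left, size):
    """Mean of a size x size pixel window whose top-left corner is (top, left).

    `raster` is a list of rows; the caller decides where the window sits
    relative to the plot (an assumption left open here).
    """
    if top < 0 or left < 0:
        raise ValueError("window origin must be inside the raster")
    rows = raster[top:top + size]
    values = [v for row in rows for v in row[left:left + size]]
    if len(values) != size * size:
        raise ValueError("window extends beyond the raster")
    return sum(values) / len(values)


# Hypothetical NDVI raster around one sampling plot.
ndvi_raster = [
    [0.61, 0.64, 0.66, 0.60],
    [0.63, 0.70, 0.72, 0.65],
    [0.62, 0.69, 0.71, 0.64],
    [0.58, 0.60, 0.63, 0.59],
]

mean_3x3 = window_mean(ndvi_raster, 0, 0, 3)  # 3 x 3 window
mean_2x2 = window_mean(ndvi_raster, 1, 1, 2)  # 2 x 2 window
```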

Fig. 2 Flow chart of the methodology

Data analysis

TDVs were calculated for the 99 plots using the “vegan” package (Oksanen et al. 2012) in the R-software (R Development Core Team 2015). All SV values were imported into the R environment. The Kolmogorov–Smirnov test was used to check data normality. The Spearman correlation test was used to assess the relationships between the SVs and the TDVs for all dates. We then used NDVI, which had statistically high correlations among dates, to identify outliers using boxplots (Fig. 3). All outliers that were repeated three or more times were interpreted. Heterogeneous variation is typical of ecological data, and ordinary least squares (OLS) regression is usually inadequate for explaining the relationship between variables under such conditions (Cade and Noon 2003). Quantile regression was therefore used to determine the relationship between beta diversity and spectral values, as proposed by Rocchini et al. (2009a). OLS regression was also fitted to compare classical and quantile regression. The quantile regression analysis was performed with the “quantreg” package (Koenker and Koenker 2007) in the R-software (R Development Core Team 2015). To quantify beta diversity, species composition similarity matrices were calculated using the Jaccard coefficient, and spectral distance matrices were calculated using the Euclidean distance between sampling plots. Pairwise species composition similarity was computed in two distinct ways. First, we used tree species number in each sampling plot. Then, we used tree species basal area in each sampling plot because it better represents tree abundance, composition and structure.
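The pairwise quantities behind the distance-decay analysis can be sketched as follows. This minimal example uses the presence/absence form of the Jaccard coefficient and the Euclidean distance between per-plot spectral vectors; the abundance-weighted variant used for basal area is not reproduced here. Plot identifiers, species lists and spectral values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def jaccard_similarity(species_a, species_b):
    """Shared species divided by all species found in either plot."""
    a, b = set(species_a), set(species_b)
    union = a | b
    if not union:
        raise ValueError("both plots are empty")
    return len(a & b) / len(union)


def euclidean_distance(spectra_a, spectra_b):
    """Euclidean distance between two equal-length spectral vectors."""
    if len(spectra_a) != len(spectra_b):
        raise ValueError("spectral vectors must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(spectra_a, spectra_b)))


def pairwise_table(plots):
    """Return (plot_i, plot_j, spectral distance, species similarity) rows."""
    rows = []
    for p, q in combinations(sorted(plots), 2):
        species_p, spectra_p = plots[p]
        species_q, spectra_q = plots[q]
        rows.append((p, q,
                     euclidean_distance(spectra_p, spectra_q),
                     jaccard_similarity(species_p, species_q)))
    return rows


# Hypothetical plots: (species present, spectral values such as NDVI and NIR).
plots = {
    1: (["Fagus orientalis", "Quercus cerris"], (0.72, 0.41)),
    2: (["Quercus cerris", "Pinus nigra"], (0.65, 0.33)),
    3: (["Pinus nigra"], (0.55, 0.28)),
}

decay_rows = pairwise_table(plots)
```

Each row pairs a spectral distance (x) with a species similarity (y), which is the form of data that a distance-decay regression is fitted to.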

Fig. 3 Boxplots of NDVI values by tree richness, with their outliers, in May, June, July and September

Results

Descriptive statistics of the variables

The results of the Kolmogorov–Smirnov test demonstrated that the SVs and TDVs were not normally distributed. Tree richness and the Shannon and Simpson indices ranged from 1 to 6, 0.6 to 1.6, and 0.5 to 0.75, respectively. Of the 99 sampling plots, 22, 28, 26, 9, 9 and 5 plots contained 1, 2, 3, 4, 5 and 6 tree species, respectively.

Correlation between spectral and tree alpha diversity variables

A significant positive correlation at the 0.001 level was observed between the selected tree alpha diversity variables (richness and the Shannon and Simpson indices) and the SVs in May, June and July (Table 2). The Shannon index calculated from the basal area of trees (SIBA, Shannon Index Basal Area) in the 3 × 3 window had the highest positive correlation with NDVI (r = 0.685) in June. In addition, the correlation values of all of the SVs were close to 0.6 for the 2 × 2 SIBA and 3 × 3 SIBA variables in May, June and July. In September, October and November, the correlations were lower and less significant for all tree diversity variables (TDVs). Overall, the SVs appeared to be most sensitive to SIBA, followed by the Shannon index, the Simpson index and tree richness. Generally, the correlations between the SVs and the TDVs were moderate and statistically significant in May, June and July. However, the highest significant correlation between tree richness (2 × 2) and NDVI (r = 0.594) was in June (Table 2). OLI B5 (r = 0.561) was the most significant in June, followed by EVI (r = 0.550) in July and greenness (r = 0.538) in June. The Shannon index was most strongly correlated with the June values of both NDVI and OLI B5 (r = 0.608 in both analyses). VI was the SV least correlated with the TDVs. All SVs had slightly lower correlations in October than in the other months, and NDVI had the lowest correlation with all TDVs among the SVs in October. There was no significant correlation in November (Table 2). The SVs, especially NDVI, and all TDVs (except the Simpson index) were highly positively correlated in June.
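For reference, a Spearman coefficient such as those in Table 2 is the Pearson correlation of the ranks of the two variables, with tied values given their average rank. The sketch below shows this calculation only; significance testing is not included, and the example values are hypothetical rather than taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical plot-level values: NDVI and tree richness.
ndvi_values = [0.55, 0.61, 0.64, 0.70, 0.72, 0.68]
tree_richness = [1, 2, 2, 4, 5, 3]
rho = spearman(ndvi_values, tree_richness)
```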

Table 2 Spearman correlation coefficients between species diversity variables and SVs (SIBA: Shannon Index Basal Area)

The boxplots of tree richness for the 4 months showed a statistically positive relationship between NDVI and tree richness in the study area (Fig. 3). Inspection of the boxplots showed that most of the outlying sampling plots were repeated two or more times. Plot numbers 54, 71, 11 and 65 had values greater than normal, and plot numbers 132, 20 and 106 had values lower than normal. In particular, plot numbers 54 and 65, with one and four tree species, respectively, were outliers in May, June, July and September. Plot numbers 70 and 2, with three and four species, respectively, were outliers in May, June and July. Additionally, plot numbers 11, 132 and 106, with 2, 3 and 5 plant species, respectively, were outliers in June, July and September. After all outliers had been determined, those repeated three or more times were investigated in detail using information from the field, such as tree species, visually estimated canopy closure, basal area of the dominant tree, total basal area, diameter class, number of suppressed trees and total number of trees (Table 3).
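The screening for repeated outliers can be sketched as follows. The sketch assumes the common boxplot convention of flagging values beyond 1.5 times the interquartile range from the quartiles; the exact quartile rule of the plotting software used in the study is not stated, so the fences may differ slightly. It would be applied to one box (one richness class in one month) at a time. All names and values are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def tukey_outliers(values_by_plot, k=1.5):
    """Return (upper, lower) sets of plot ids outside the boxplot fences."""
    q1, _, q3 = quantiles(list(values_by_plot.values()), n=4, method="inclusive")
    iqr = q3 - q1
    upper = {p for p, v in values_by_plot.items() if v > q3 + k * iqr}
    lower = {p for p, v in values_by_plot.items() if v < q1 - k * iqr}
    return upper, lower


def repeated_outliers(outliers_by_month, min_repeats=3):
    """Plot ids flagged as outliers in at least `min_repeats` months."""
    counts = Counter(p for plots in outliers_by_month.values() for p in set(plots))
    return sorted(p for p, c in counts.items() if c >= min_repeats)


# Hypothetical NDVI values for plots in one richness class and one month.
june_ndvi = {"p1": 0.60, "p2": 0.62, "p3": 0.63, "p4": 0.65, "p5": 0.95}
june_upper, june_lower = tukey_outliers(june_ndvi)

# Hypothetical outlier sets collected month by month.
flagged = {
    "May": {"p5", "p9"},
    "June": {"p5", "p9", "p2"},
    "July": {"p5", "p2"},
    "September": {"p9"},
}
to_inspect = repeated_outliers(flagged)
```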

Table 3 Canopy closure, basal area of dominant tree, total basal area, diameter class, number of dominant trees and total number of tree species for the upper and lower outlier plots in 4 months (M: May; J: June; Jy: July; S: September)

The decay rates of the curves for both the tree species number and tree species basal area methods (Fig. 4) were statistically significant for OLS and quantile regression at all τ values (Table 4). For the tree species number method, the decay rate of the ordinary least squares (OLS) regression was almost half of the quantile regression decay rate at all τ values. For the tree species basal area method, the decay rate of the OLS regression was half of the quantile regression decay rates at τ = 0.90, 0.95 and 0.99. For both methods, the intercept of the OLS regression was lower (0.27) than all quantile regression intercepts. The tree species basal area method had higher intercepts than the tree species number method at all τ levels except 0.99 (0.956).
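To make the contrast between OLS and upper-quantile fits concrete, the sketch below fits a straight line at a chosen quantile τ by minimizing the asymmetric absolute ("pinball") loss. It relies on the property that, for a line with two parameters, an optimal quantile fit passes through at least two data points, so trying every pair is exact. This brute-force search is only practical for small samples and is not the algorithm of the R package used in the study; names and data are hypothetical.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Asymmetric absolute loss minimized by quantile regression."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def quantile_line(x, y, tau):
    """Return (intercept, slope) of the tau-quantile line by pair enumeration."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical pair cannot define the line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        residuals = [yk - (intercept + slope * xk) for xk, yk in zip(x, y)]
        loss = pinball_loss(residuals, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def ols_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


# Hypothetical pairs: spectral distance (x) and species similarity (y).
spectral_distance = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
species_similarity = [1.0, 0.4, 0.8, 0.3, 0.5, 0.1]

ols_fit = ols_line(spectral_distance, species_similarity)
upper_fit = quantile_line(spectral_distance, species_similarity, 0.9)
```

In this made-up data set the upper-quantile line traces the upper edge of the point cloud, while the OLS line runs through its middle with a lower intercept.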

Fig. 4 Distance decay of species similarity calculated with (a) tree species numbers in sampling plots and (b) tree species basal areas in sampling plots. Ordinary least squares (red line) and quantile regressions (dashed lines) for four different τ values (from upper to lower lines: 0.99, 0.95, 0.9, 0.75)

Table 4 Linear models for both ordinary least squares and quantile regressions at different quantiles τ (0.99, 0.95, 0.9, 0.75)

Discussion

This study was conducted to determine the relationship between field-based alpha and beta diversity indices and vegetation indices derived from remote sensing data. We used medium-resolution remote sensing data (30 m spatial resolution), which are freely accessible and span more than four decades. This provides an opportunity for time series analysis of forest tree biodiversity. In addition, we investigated the effects of ecological variation on spectral values. We reduced the geometric error between the field data and satellite data using the mean DN derived from 2 × 2 pixel and 3 × 3 pixel windows. Generally, as in other studies (Rocchini et al. 2009b; Meng et al. 2016), the 3 × 3 pixel correlation analysis produced higher correlation values than the 2 × 2 pixel correlation analysis, although the increase was not substantial.

Across all six dates, the Shannon-Wiener species diversity index calculated from two different abundance data sets generally had the best correlation results of all TDVs. Our results complement previous studies in different vegetation types, including tropical dry forest (Fairbanks and McGwire 2004; Kalacska et al. 2007), savannah (Oldeland et al. 2010) and temperate forest (Meng et al. 2016). The Shannon index takes into account species abundance as well as the structural and compositional variety of the plant community (Foody and Cutler 2003; Dogan and Dogan 2006; Oldeland et al. 2010). We also used the Shannon index based on the basal area of trees (SIBA) to improve on the correlation results obtained from the Shannon index based on the number of tree species. Another forest study also found correlations between the Shannon index and SVs: Meng et al. (2016) found correlation coefficients of r = 0.532 for NDVI and r = 0.824 for VI calculated from 9 × 9 pixel windows of SPOT data. In our study, for the 2 × 2 pixel windows, the correlation values between SIBA and the SVs were similar (June had the highest values, between 0.622 and 0.667), except for VI, which had a lower correlation (r = 0.399). For the 3 × 3 pixel windows, the correlation values between the SVs and SIBA were slightly higher (0.611–0.685) than those found in the 2 × 2 pixel windows, and VI had a lower correlation (r = 0.466). These results are similar to those of other studies (Meng et al. 2016). The Simpson index had a slightly better correlation than tree richness. Only a limited number of studies have characterized the correlation between the Simpson index and SVs in temperate forests using Landsat TM (Meng et al. 2016).

Generally, NDVI had the best correlation with the TDVs among the SVs on all six dates, which is similar to other studies that have conducted correlation and regression analyses in temperate forests (Lassau and Hochuli 2007; Levin et al. 2007; Mohammadi and Shataee 2010; Meng et al. 2016) and tropical regions (Carlson et al. 2007). Other authors have reported r = 0.85 using AVIRIS (Gould 2000), r² = 0.788 (Fairbanks and McGwire 2004), r = 0.79 and r² = 0.86 in subtropical forest (Gillespie et al. 2009), r² = 39% in tropical forests using Landsat ETM+ (Foody and Cutler 2006), and r = 0.69 in tropical forests. EVI had the highest correlation, with r = 0.672 (John et al. 2008) and with r² = 0.16 in Inner Mongolia (Mediterranean region). EVI has fewer saturation problems (Scaggs 2007), is less sensitive to background reflectance (Jiang et al. 2008; Rocha and Shaver 2009) and is less influenced by atmospheric conditions (Jiang et al. 2008). The near-infrared band also had positive correlations on all dates, which can be explained by its ability to discriminate plant species. This finding is consistent with previous studies (Nagendra 2001; Hernández-Stefanoni and Dupuy 2007; Rocchini 2007b; Mohammadi and Shataee 2010; Mohammadi et al. 2011; Meng et al. 2016). The near-infrared band can be used as a good proxy for estimating and assessing alpha diversity at the regional and local scales.

The highest correlations between the SVs and TDVs were in June, followed by May and then July, which are growing season months. In June, leaves are fully developed and have high photosynthetic activity (Volcani et al. 2005; Wang et al. 2016). The correlation values were low in September due to decreasing photosynthetic activity and short-term water stress (Volcani et al. 2005; Wang et al. 2016). This well-known pattern is emphasized in our study and in similar studies that have investigated what drives better correlations between SVs and TDVs (Carlson et al. 2007). These studies were conducted in tropical forest systems using AVIRIS (Gould 2000), where r² = 0.788, in subtropical forests (Fairbanks and McGwire 2004; Gillespie et al. 2008), and especially in temperate forests (Levin et al. 2007; Mohammadi and Shataee 2010; Mohammadi et al. 2011; Simonson et al. 2012; Viedma et al. 2012; Warren et al. 2014; Ceballos et al. 2015). Among the SVs, VI had almost the lowest correlation with the TDVs, which is inconsistent with Meng et al. (2016).

Although we obtained moderate correlation results, we also investigated the unexplained variation in the correlations between the SVs and TDVs. In all outliers, it was observed that the canopy closures were 70% higher than in other plots. We identified the sampling plots that were outliers and then investigated those that were repeated three or more times. Surprisingly, in most of the upper outliers, except for sampling plot 65, the basal area of F. orientalis accounted for most of the total basal area (more than 50%), and in the lower outliers, Quercus sp. had small diameters. The outliers with F. orientalis had a higher reflectance in the NIR band, which is likely due to the leaf structure and higher reflectance of beech trees (Tittebrand et al. 2009; Vorovencii 2011). In contrast, the basal area of suppressed trees was less than 6% of the total basal area in the lower outliers. In most cases, the upper outliers had nearly 50% fewer trees and 200% more basal area than the lower outliers. These differences are likely due to differences in the development stages of the trees in the sampling plots, which affected the correlation results. However, the correlation results were likely also affected by the existing forest understory and by trees under 10 cm.

Many studies have characterized the relationship between SVs and alpha diversity, but few have assessed beta diversity (Rocchini 2007b; Rocchini et al. 2009a, 2009b). We used quantile regression, alongside OLS regression, to determine the relationship between spectral diversity and beta diversity, following Rocchini et al. (2009c) and other ecological work (Rocchini 2007b). In our study, the slope of the OLS regression was low, while in the upper quantile regressions it was about twice as high. This result can be explained by the greater similarity between sampling plots that are located close together (Nekola and White 1999; Palmer 2005; He et al. 2009). The Jaccard index values obtained were close to one, with little difference between the two beta diversity methods. Jaccard values close to one indicate high similarity between sampling plots, whereas values approaching zero indicate increasing dissimilarity. The intercepts differed because, in the tree species basal area method, they were strongly related to the development stage of the trees in a number of sampling plots. The tree species basal area method represented this similarity better than the tree species number method. In tree diversity studies, the data continuity of Landsat products has great advantages for long-term mapping and monitoring of forest ecosystems. Additionally, it is possible to compare Landsat 4, 5 and 7 data using cross-calibration methods (Markham and Helder 2012; Morton et al. 2012), and NDVI can be obtained from these data sets.

Conclusion

In conclusion, SVs, especially NDVI, can be considered useful tools for evaluating the physiological changes of forests in different seasons and for estimating plant alpha diversity over large areas during the peak of the growing season. The near-infrared band can also be used as a quick way to estimate plant alpha diversity. In tree diversity studies, differences in NDVI among tree species should be considered because each tree species has specific reflectance patterns at red and near-infrared wavelengths (i.e., different spectral signatures) depending on the season, leaf structure and other variables in both temperate and tropical forests. Beta diversity was assessed through the spectral similarity of remote sensing data. In general, regions that are close together are more similar, and their similarity decreases with increasing distance from each other. Additionally, remote sensing data allow the assessment of habitat diversity and the identification of areas that are likely to support relatively high or low levels of species diversity. Further investigations could consider ESA’s Sentinel-2, which provides free optical data with a higher resolution than Landsat, in forest tree diversity studies.