1 Introduction

The analysis of farmers' data is an important opportunity to advance our knowledge of the ecology of agrosystems. One of the most promising ways to synthesize these data is the data-driven approach, in which the data drive the construction of models for predictive purposes [1]. Though this approach is powerful for prediction, we believe that such large datasets can also be used successfully in traditional hypothesis-confirmation studies. Here we seek confirmation in the data for the hypothesis that within-field NDVI variability is driven, among other factors, by micro-topography (elevation). We do this using a large set of data obtained from “Akkerweb”, a geoplatform popular with Dutch farmers [2] (akkerweb.eu).

Our main working hypothesis is that the lowest portions of a field are generally wetter and that the crop growing there is therefore less water-limited. This hypothesis is not novel. For example, Kravchenko et al. [3, 4] found that within-field yield variability in corn and soybean fields in the Midwest of the US was associated with topography. Timlin et al. [5] found, in a rainfed maize experiment, that in dry years the concave parts of the field were more productive than the convex parts, whereas the effect of terrain curvature disappeared in wet years. Da Silva et al. [6, 7] found that in a Portuguese irrigated maize field with a high elevation range, topography drove within-field yield variability, and showed that maize yield was higher near the flow lines. These findings on the yield-topography correlation were extended by Maestrini and Basso [8], who observed that not only the average yield but also its temporal variability (stability) was influenced by topography: the more concave portions of the field showed a higher variability because they were either waterlogged in wet years or relatively wetter in dry years.

Here we attempt to show that elevation is a driver of within-field crop-growth variability even in the Netherlands, a notoriously flat country. Further, we extend these findings by examining how soil type and cultivar earliness influence this correlation. We believe that the correlation between NDVI and elevation, once sufficiently corroborated with data, could be used as an indicator of drought stress, and that deviations from this correlation may help us understand which management strategies (e.g. cultivar choice) alleviate drought stress.

In this study we focus on potatoes, a major crop in the Netherlands that is well represented in our farmers’ dataset. Potatoes have generally been regarded as a crop with a shallow root system [9, 10], which makes them vulnerable to drought stress. However, there is empirical evidence that their roots can reach one meter [11] and beyond [12].

Therefore, we hypothesize that in the Netherlands, too, we can establish a correlation between relative within-field elevation and crop growth, and that this correlation is influenced by both soil type and cultivar earliness. We expect that potato crops on sandy soils, as well as late cultivars, are more sensitive to dry spells: sandy soils because they have a low water holding capacity, and late cultivars because they are exposed to summer dry spells for longer periods. We test our hypothesis on a dataset of Dutch farms for which the location of the field, the cultivar, the soil type (clay or sandy) and the elevation are known, and for which biomass throughout the season is proxied by the normalized difference vegetation index (NDVI) retrieved from satellite images.

2 Materials and Methods

2.1 Data from Akkerweb

Data were collected from farmers’ data input to the geo-platform Akkerweb [2] (akkerweb.eu). Through this platform farmers can enter the information about their fields (polygon, crop and cultivar, and sowing dates) and receive information to support their decision making on—among others—in-season fertilization, crop protection, pests like plant-parasitic nematodes and variable rate applications like haulm killing for potatoes.

We received the anonymized data, supplied by the farmers to the Akkerweb geoplatform, through a query that returned the dataset records for which the cultivar, the start and end date of the management plan, and the polygon were available. The query returned the following fields: a key identifier for each field, the field polygon (in well-known text format), the area, the cultivar, the purpose (e.g. for potato consumption, starch, table or seed potatoes), the start and end date of the management plan, and the soil type. The total number of unique records returned was over 100,000. We produced a subset of the dataset where the crop was potato, the purpose was not the production of seed potatoes, and the field area was larger than 2 ha and smaller than 100 ha. Fields smaller than 2 ha were excluded because, after the removal of a 10 m buffer, there would often not be enough pixels left to perform the analysis (particularly if the field had an elongated shape). The fields were cultivated in the years 2015 to 2019. Here we focus on the years 2016, 2017 and 2018 because they represent very different weather conditions: 2016 was a wet year, 2017 had a dry spring and a wet summer, and 2018 was characterized by a very dry summer.

For this analysis we used only a randomly selected subset of records of 3249 fields located on soils classified either as sandy or clay, cultivated with potato (excluding seed potatoes) in the years 2016, 2017 and 2018.

2.2 Publicly Available Data

For each field we retrieved the images from the satellites Landsat 8 and Sentinel 2. For the years before 2017 we retrieved images from Landsat 8 only. For each field we also used information from the Dutch digital elevation model (DEM) Actueel Hoogtebestand. This DEM was produced using airborne lidar data collected between 2007 and 2012 (AHN2). We used the version of the dataset interpolated at a resolution of 0.5 m. This DEM has “an accuracy of 20 cm for 99.7% of the points. The average point density for AHN2 is between 6 and 10 points per square meter” [13].

For each cultivar we tried to retrieve data on cultivar performance, e.g. earliness score or susceptibility to late blight, as reported by the cultivar vendor and available on the Akkerweb platform. The performance of individual cultivars is usually scored on a number of indicators on a scale from 1 to 9 (for earliness, 1 corresponds to the latest varieties and 9 to the earliest). We used the earliness score to make inferences about the influence of cultivar earliness on the elevation-NDVI correlation.

We used Google Earth Engine [14] to perform calculations on satellite images (Landsat 8 and Sentinel 2) and the Dutch DEM (AHN2 interpolated to 0.5 m). For each polygon we applied the following algorithm (pseudocode):

  1. Import the polygon to Google Earth Engine. The polygon was passed as a list of coordinates.

  2. Remove a 10 m buffer from the border to make sure that we did not include pixels with heterogeneous land cover (e.g. half road and half field).

  3. Check the images available in the Landsat 8 and Sentinel 2 surface reflectance collections.

  4. Remove the clouds and cirrus from Sentinel 2 and the clouds and cloud shadows from Landsat 8 using the respective pixel quality bands.

  5. Clip the rasters to the clip polygon, setting the pixels outside the field to “not available”.

  6. For each image, calculate the following quantities through a reducer function:

    a. Standard deviation of elevation.

    b. Spearman correlation between NDVI and elevation. To calculate the correlation, the elevation dataset was resampled to match the resolution of the vegetation index dataset, which was coarser than the DEM (30 m for Landsat 8 and 10 m for Sentinel 2, versus 0.5 m for the DEM).
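Step 6b can be illustrated outside Earth Engine with a minimal, standard-library sketch. It assumes that the DEM is aggregated to the NDVI grid by simple block averaging (the text does not state which resampling method was used) and that masked pixels are stored as NaN. The function names `block_mean`, `average_ranks` and `spearman` are hypothetical helpers, not part of the original workflow.

```python
from math import isnan, sqrt

NA = float("nan")  # stands in for masked ("not available") pixels


def block_mean(dem, factor):
    """Aggregate a fine grid to a coarser one by averaging factor x factor blocks.

    Block averaging is an assumption made for this sketch.
    """
    rows, cols = len(dem) // factor, len(dem[0]) // factor
    coarse = []
    for r in range(rows):
        row = []
        for c in range(cols):
            cells = [dem[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            valid = [v for v in cells if not isnan(v)]
            row.append(sum(valid) / len(valid) if valid else NA)
        coarse.append(row)
    return coarse


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(ndvi, elevation):
    """Spearman correlation over pixels valid in both grids; NaN if undefined."""
    pairs = [(n, e) for n_row, e_row in zip(ndvi, elevation)
             for n, e in zip(n_row, e_row) if not (isnan(n) or isnan(e))]
    if len(pairs) < 3:
        return NA
    rx = average_ranks([p[0] for p in pairs])
    ry = average_ranks([p[1] for p in pairs])
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else NA


# Toy field: a 4 x 4 DEM aggregated to a 2 x 2 NDVI grid with one masked pixel.
dem_fine = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
ndvi = [[0.8, 0.7], [0.6, NA]]
rho = spearman(ndvi, block_mean(dem_fine, 2))
print(rho)  # -1.0: NDVI decreases monotonically with elevation in this toy field
```

Working on ranks rather than raw values makes the statistic insensitive to the units of elevation and to any monotonic rescaling of NDVI, which is convenient when pooling fields with very different elevation ranges.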

If an image was completely covered by clouds, cirrus (for Sentinel 2) or cloud shadows (for Landsat 8), the calculated quantities were set to “not available”. The cloud coverage in the images was calculated from the quality band pixels of the two satellites (namely pixel_qa for Landsat 8 and QA60 for Sentinel 2).
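The quality-band test described above amounts to checking flag bits pixel by pixel. The sketch below is only an illustration in plain Python: the bit positions are those documented for the QA60 band of Sentinel 2 and for the pixel_qa band of Landsat 8 surface reflectance products, and are assumed here to match the collections used in the study. The helper names are hypothetical.

```python
# Flag bits, assumed to match the quality bands named in the text.
S2_QA60_BITS = (10, 11)    # Sentinel 2 QA60: bit 10 = opaque cloud, bit 11 = cirrus
L8_PIXEL_QA_BITS = (3, 5)  # Landsat 8 pixel_qa: bit 3 = cloud shadow, bit 5 = cloud


def is_contaminated(qa_value, bits):
    """True if any of the given flag bits is set in the QA value."""
    return any(qa_value & (1 << b) for b in bits)


def clear_fraction(qa_pixels, bits):
    """Share of the field's pixels with none of the flag bits set."""
    if not qa_pixels:
        return 0.0
    clear = sum(1 for q in qa_pixels if not is_contaminated(q, bits))
    return clear / len(qa_pixels)


def usable(qa_pixels, bits):
    """An image is kept unless every pixel in the field is flagged."""
    return clear_fraction(qa_pixels, bits) > 0.0


# Illustrative QA values for the pixels of one field.
s2_field = [0, 1 << 10, 1 << 11, 0]   # two clear pixels, one cloud, one cirrus
l8_field = [352, 328]                 # bit 5 set (cloud) and bit 3 set (shadow)
print(clear_fraction(s2_field, S2_QA60_BITS))  # 0.5
print(usable(l8_field, L8_PIXEL_QA_BITS))      # False
```

A fully flagged image returns `False` from `usable`, which corresponds to setting the calculated quantities to “not available”.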

We retrieved cumulative rainfall from the weather stations of the Royal Netherlands Meteorological Institute (KNMI, 45 stations). For the purpose of this study we calculated an average weather for the Netherlands by averaging the values of all the weather stations (Fig. 1 left).
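As a minimal sketch of this averaging, the function below takes daily rainfall per station, averages across stations for each day, and accumulates the result. The station names and values are invented for illustration, and the handling of missing observations (skipped in the daily mean, with a day lacking any observation counted as 0 mm) is an assumption that the text does not specify.

```python
from itertools import accumulate


def national_cumulative_rain(daily_rain_by_station):
    """Average daily rainfall (mm) over stations, then accumulate over days.

    None marks a missing observation and is skipped in that day's mean.
    All station series are assumed to cover the same days.
    """
    series_list = list(daily_rain_by_station.values())
    n_days = len(series_list[0])
    daily_mean = []
    for day in range(n_days):
        values = [s[day] for s in series_list if s[day] is not None]
        # Assumption: a day without any observation contributes 0 mm.
        daily_mean.append(sum(values) / len(values) if values else 0.0)
    return list(accumulate(daily_mean))


# Hypothetical three-day record for two stations.
example = {"station_a": [0, 2, 4], "station_b": [2, None, 0]}
print(national_cumulative_rain(example))  # [1.0, 3.0, 5.0]
```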

Fig. 1. Correlation between NDVI and elevation over the growing season for the years 2016 to 2018 in contrasting soil types (clay vs sandy). The continuous lines represent the average measured across all the fields on a given date. The dashed lines and the colored areas represent the prediction and the 95% confidence interval of the mean, calculated using a third-degree polynomial equation. (Color figure online)

To approximate how the NDVI-elevation correlation varied with the day of the year (DOY), we fitted a third-degree polynomial model to the data. This was intended as a tool to visually interpolate the data rather than as a predictive or inferential tool.
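A third-degree polynomial smoother of this kind can be sketched with ordinary least squares. The version below uses only the standard library and solves the normal equations directly; it is an illustration, not the fitting routine used in the study, and it does not compute the confidence interval of the mean shown in the figures. The name `fit_cubic` is hypothetical.

```python
def fit_cubic(doy, rho):
    """Least-squares fit of rho = b0 + b1*t + b2*t^2 + b3*t^3, with t a scaled DOY.

    Needs at least four distinct days; returns a function mapping DOY to the fit.
    """
    centre = sum(doy) / len(doy)
    scale = max(abs(d - centre) for d in doy) or 1.0
    t = [(d - centre) / scale for d in doy]  # scaling keeps the powers well conditioned
    # Normal equations (X^T X) b = X^T y for the design matrix [1, t, t^2, t^3].
    a = [[sum(ti ** (i + j) for ti in t) for j in range(4)] for i in range(4)]
    y = [sum((ti ** i) * ri for ti, ri in zip(t, rho)) for i in range(4)]
    # Gaussian elimination with partial pivoting.
    for col in range(4):
        pivot = max(range(col, 4), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        y[col], y[pivot] = y[pivot], y[col]
        for r in range(col + 1, 4):
            f = a[r][col] / a[col][col]
            a[r] = [x - f * p for x, p in zip(a[r], a[col])]
            y[r] -= f * y[col]
    # Back substitution.
    coef = [0.0] * 4
    for i in reversed(range(4)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, 4))
        coef[i] = (y[i] - tail) / a[i][i]

    def predict(d):
        u = (d - centre) / scale
        return sum(c * u ** k for k, c in enumerate(coef))

    return predict


# Synthetic correlations generated from a known cubic, for illustration only.
days = list(range(100, 280, 10))
values = [0.1 - 0.002 * (d - 180) + 1e-5 * (d - 180) ** 2 for d in days]
curve = fit_cubic(days, values)
print(round(curve(180), 6))
```

Centering and scaling the day of the year before raising it to the third power avoids the large numbers (on the order of 10^7 for raw DOY cubed) that would otherwise make the normal equations poorly conditioned.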

2.3 Descriptive Statistics on the Retrieved Dataset

The cultivars present in our subset for which we performed calculations on satellite images are presented—anonymized—in Fig. 2 along with the earliness score.

Fig. 2. Number of fields for the cultivars (anonymized) by year used in this study. The total number of fields analyzed was 3249.

We analyzed the influence of earliness scores on the NDVI-elevation correlation. In sandy soils the distribution of cultivar earliness was as follows: score 3, 335 fields; score 4, 180 fields; score 5, 548 fields; score 6, 449 fields. The other earliness classes (1, 2, 7, 8, 9) had fewer than 30 fields. To represent the late and early classes as evenly as possible, we defined as late the cultivars with an earliness score lower than or equal to 5 (6 cultivars with more than 10 fields) and as early the others (3 cultivars with more than 10 fields).
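The grouping rule can be written as a one-line classifier: scores up to 5 are treated as late and higher scores as early, in line with the caption of Fig. 5. The sketch below applies it to the four field counts reported for sandy soils; the function name is hypothetical, and the remaining classes are left out because their counts are not reported.

```python
def earliness_class(score):
    """Map a vendor earliness score (1 = latest, 9 = earliest) to 'late' or 'early'."""
    if not 1 <= score <= 9:
        raise ValueError("earliness score must be between 1 and 9")
    return "late" if score <= 5 else "early"


# Field counts in sandy soils for the four classes reported in the text.
fields_by_score = {3: 335, 4: 180, 5: 548, 6: 449}

totals = {"late": 0, "early": 0}
for score, n_fields in fields_by_score.items():
    totals[earliness_class(score)] += n_fields
print(totals)  # {'late': 1063, 'early': 449}
```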

The soil types of the investigated fields are presented in Fig. 3. The main soil types are sandy (in the eastern part of the country) and clay (in the western part of the country). For the sake of simplicity, we excluded the other soil types from the analysis.

Fig. 3. Geographical distribution of clay and sandy soils in the fields represented in this study.

Figure 4 shows the distribution of the within-field variability of elevation in the Netherlands. It is well known that fields in the Netherlands are generally very flat. The most uneven fields are located in the western part of the country and in the southern province of Limburg (median standard deviation of elevation of 20 cm), whereas, as expected, the fields in the region of Flevoland, reclaimed from the sea in the sixties and seventies of the last century, were the flattest (median standard deviation of elevation typically < 10 cm).

Fig. 4. Median of the within-field standard deviation of elevation across the Netherlands. The grey dots represent the fields randomly sampled in this study.

3 Results

The correlations that we found were generally weak, as the absolute value of the Spearman correlation was on average well below 0.2, but thanks to the high number of observations we were able to observe ecologically meaningful trends in them. We found that NDVI was negatively correlated with elevation in sandy soils during dry periods (spring 2017, summer 2018), whereas no such negative correlation was observed for clay soils in any year or for sandy soils in 2016 (a wet year, Fig. 1).

In clay soils the correlation was positive at the beginning of the season whereas no correlation was observed on average for the rest of the season.

The earliness score of the cultivar also influenced the correlation. The early varieties showed a neutral or positive correlation in wet years (indicating that higher biomass was observed at relatively higher locations), whereas the late-developing varieties showed a predominance of negative correlations in the dry years (Fig. 5).

Fig. 5. NDVI-elevation correlation observed for late cultivars (earliness score 1 to 5) and early cultivars (earliness score 6 to 9) over the course of the season in three different years (2016, wet; 2017, dry in the spring; 2018, dry in the summer). The colored areas indicate the confidence interval of the mean predicted using a third-degree polynomial. The black line indicates cumulative rainfall (referred to the first right axis) and the dashed lines represent the median NDVI over the season (referred to the left axis). (Color figure online)

4 Discussion

The correlations that we observed between the Normalized Difference Vegetation Index (NDVI, a good proxy for biomass) and elevation were generally weak, suggesting that their use for prediction at within-field scale is limited. Nonetheless, on a larger scale the elevation-NDVI correlation may be a useful indicator of cultivar sensitivity to drought stress. We observed that the correlation was negative in dry periods in sandy soils, whereas we hardly observed a negative correlation between within-field elevation and NDVI in clay soils (Fig. 1). The negative correlation for sandy soils may be explained by within-field water rerouting and/or distance from the water table, and has been reported before for different crops (see Introduction). The lack of correlation in clay soils may be explained by their geographical position. In our dataset, clay soils exhibit the lowest within-field elevation variability because they are mostly located in reclaimed areas (Flevoland province, Fig. 4), which are notoriously flat. Moreover, clay soil has a higher water holding capacity and is therefore less prone to induce drought stress in the crops, and it has been shown that roots of potatoes in clay soils may grow as deep as one meter, possibly because their growth is facilitated by the presence of cracks in the subsoil [11].

Interestingly, we observed a weak positive correlation in clay soils at the beginning of the season, irrespective of the year. This could be because the higher portions of the fields are less wet and thus warmer at emergence, so that the crop there develops more rapidly.

Our data on the within-field correlation between elevation and NDVI also offer a first insight into how large datasets from farmers could be used to evaluate differences between cultivars. An evaluation of this correlation for individual cultivars goes beyond the scope of this small study, as validation would require an assessment of the performance of the different cultivars under drought. Such scoring is not available to us at the present stage. However, we were able to evaluate how the cultivar earliness score influences the correlation in sandy soils. We found that late cultivars had a stronger negative correlation between NDVI and elevation than early ones. We suggest that early cultivars may escape dry spells because in the summer they have already developed deeper roots; nonetheless, we do not yet have data to validate this hypothesis.

5 Limitations of This Study

An important factor that we did not consider in this study was irrigation. It is likely that irrigation strongly influenced the correlation between NDVI and elevation, possibly also in unexpected directions. It could be that higher water availability levels out the differences between the different parts of the field (as the difference in the correlation between wet and dry years would suggest), or it could be that surface water rerouting exacerbates such differences. We were not able to investigate this aspect with the current dataset, but we believe that information on elevation could be useful to drive precision irrigation.

As noted above, this methodology has the potential to provide information about cultivar differences in drought sensitivity. However, to be usefully deployed, such a capability should be made available to breeders, whose plots are too small to be sensed using Landsat or Sentinel. Nonetheless, the deployment of new satellites with finer resolution (e.g. Worldview) can open important opportunities in this direction.

An important aspect is that the accuracy of the AHN2 dataset (20 cm) is in many cases coarser than the within-field elevation differences. Such measurement error would weaken, rather than inflate, the correlation that we observe between vegetation indices and elevation. Our correlation estimates are therefore conservative, and their validity is not impaired.
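The direction of this effect follows from the classical attenuation result for a Pearson correlation: if the elevation error is independent of both the true elevation and NDVI, the observed correlation equals the true one multiplied by sd_true / sqrt(sd_true^2 + sd_error^2). The sketch below evaluates that factor. It is an illustration only: for the Spearman correlation used in this study the relation holds only approximately, and the input values are hypothetical rather than estimates of the actual AHN2 error.

```python
from math import sqrt


def attenuation_factor(sd_elevation, sd_error):
    """Factor by which independent noise in elevation shrinks a Pearson correlation.

    sd_elevation: standard deviation of true within-field elevation.
    sd_error: standard deviation of the measurement error, in the same units.
    """
    if sd_elevation <= 0:
        raise ValueError("sd_elevation must be positive")
    return sd_elevation / sqrt(sd_elevation ** 2 + sd_error ** 2)


# Hypothetical values in cm: the flatter the field, the stronger the shrinkage.
for sd_true in (5.0, 10.0, 20.0):
    print(sd_true, round(attenuation_factor(sd_true, 5.0), 3))
```

Because the factor is always between 0 and 1, noise in the elevation data can only pull the observed correlation toward zero under these assumptions.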

6 Conclusion

Using farmers’ data we were able to observe a negative, although weak, correlation between NDVI and elevation, and to show that this correlation is stronger during dry periods. Because the correlation between the two variables was generally low, elevation has limited predictive power for within-field variability of growth in the Netherlands. Nonetheless, we were able to show how cultivar earliness influences this correlation. The correlation between elevation and vegetation is a parameter that can be measured for virtually every crop, and deviations from it, when measured over sufficiently large samples, may carry important information about agroecological processes such as cultivar sensitivity to drought stress.