1 Introduction

In arid and semiarid environments, precipitation is the limiting factor for plant growth (Al-Bakri and Suleiman 2004), and thus, knowledge of the relationship between precipitation and vegetation is important for efficient resource management (e.g. food security, water management and trade aspects such as the export of high-value agricultural goods).

The relationship between precipitation and the Normalized Difference Vegetation Index (NDVI; Sellers 1985) as an indicator of plant growth (Fang et al. 2005) has been thoroughly investigated in arid and semiarid regions (e.g. Jobbagy et al. 2002; Chu et al. 2007; Iwasaki 2009). Regarding the African continent, most studies related to this topic focus on the Sahelian zone (e.g. Hielkema et al. 1986; Nicholson et al. 1990; Proud and Rasmussen 2011) or southern Africa (e.g. Gaughan et al. 2012). Other studies were done for Somalia (e.g. Omuto et al. 2010) or across larger regions, namely the African continent (Sahel, southern Africa and east Africa, Zhang et al. 2005), sub-Saharan Africa (e.g. Vanacker et al. 2005) and the 200 to 600 mm annual rainfall belt (Martiny et al. 2005). A good summary of Africa-related studies concerning the relationship between NDVI and precipitation can be found in Chamaille-Jammes et al. (2006).

Hielkema et al. (1986) state that in the Mediterranean zone, the relationship between primary production and precipitation is quite different from that in the Sudano-Sahelian zone. While precipitation occurs in the Sudano-Sahelian zone in the high-temperature season (summer rainfall), it occurs in the Mediterranean zone during the low-temperature season (winter rainfall). Besides, the Mediterranean climate has a smaller annual temperature range, and photosynthetic activity, as represented by the NDVI, follows precipitation rather than temperature (Richard and Poccard 1998).

Semiarid regions of winter rainfall and Mediterranean climate as present in Morocco have been investigated for the western Mediterranean (e.g. Puigdefabregas and Mendizabal 1998), Jordan (Al-Bakri and Suleiman 2004), Israel (Schmidt and Gitelson 2000; Penuelas et al. 2004), the Iberian Peninsula (Udelhoven et al. 2009) and for southwest Australia (Timbal and Arblaster 2006). Focussing on Morocco, Balaghi et al. (2008) used precipitation data, NDVI and temperature to assess wheat grain yields. NDVI data (annual average) were also applied in Morocco by Sobrino and Raissouni (2000) for evaluating the regional response of soil-vegetation systems to climate and to monitor land degradation. Jarlan et al. (2013) found statistical relations between NDVI and climate information within the study region, which they propose to apply for a seasonal prediction model as a contribution to the implementation of an agricultural early warning system.

Response of NDVI to precipitation depends on the distribution of precipitation throughout the growing season and the intensity of individual precipitation events (Hielkema et al. 1986). It also depends on vegetation types, e.g. with different water storage capacities (Fang et al. 2005; Gaughan et al. 2012), and varies depending on the geographical region and topography such as valleys, slopes or hillsides (Tanaka et al. 2000). Further factors are soil type (Nicholson and Farrar 1994), soil fertility, water retention and management practices such as burning and stocking (Hielkema et al. 1986; Wang et al. 2001).

High explained variances between NDVI and precipitation appear for the Sahelian region with a time-lag of 1–3 months (Malo and Nicholson 1990; Nicholson et al. 1990; Davenport and Nicholson 1993), for southern Africa with a time-lag of 1–2 months (Richard and Poccard 1998; Chamaille-Jammes et al. 2006) and for Spain with a time-lag of 1–3 months (Udelhoven et al. 2009). A linear relationship between NDVI and precipitation has been shown for Namibia (du Plessis 1999) and for the western Sahel where the annual precipitation lies approximately between 150 and 1000 mm (Nicholson et al. 1990). For the eastern Sahel/east Africa, the relationship has been shown to be log-linear (Davenport and Nicholson 1993). Eklundh (1998) detected no strong relation between precipitation and NDVI in Kenya, while Nicholson and Farrar (1994) showed a linear relationship for the Kalahari of Botswana up to a saturation level of ~500 mm/year (for further thresholds, see Richard and Poccard 1998). For the study region, Höpfner and Scherer (2011) detected a time-lag of 1.5 months by means of lagged correlations.

For comparisons with precipitation, NDVI values have been calculated over different time periods such as annual or growing season NDVI (e.g. Li et al. 2004), monthly NDVI (e.g. Herrmann et al. 2005) or 10-day NDVI (e.g. Eklundh 1998) using different calculation rules like mean, maximum, range or integrated NDVI.

Analogously, different precipitation values such as the ratio between the annual and the growing-season precipitation (e.g. Richard and Poccard 1998), trimonthly precipitation (e.g. Nicholson et al. 1990), bimonthly precipitation (e.g. Richard and Poccard 1998), monthly precipitation or 10-day precipitation (e.g. Al-Bakri and Suleiman 2004) were used either as totals or means. Some authors also consider precipitation frequency (e.g. Fang et al. 2005). Precipitation values in most of the cited studies originate from station measurements (e.g. Kerr et al. 1989; Hess et al. 1996), while more and more recent studies apply estimates based on remote sensing (e.g. Proud and Rasmussen 2011; Gaughan et al. 2012). The highest spatial resolution of the remotely sensed estimates was 0.25° (e.g. Iwasaki 2009; Gaughan et al. 2012).

On the one hand, remote sensing for monitoring vegetation is becoming more and more sophisticated with regard to spatial and temporal resolution, time length and availability. On the other hand, it remains difficult to obtain precipitation data of similar quality, in particular for data-sparse regions such as northwest Africa. Here, rain gauge data, for example, are only available for a relatively small number of locations, which are mostly located close to cities, airports and along coastlines (Schneider et al. 2013) and therefore are not representative of inland areas. This study is conducted for a winter rainfall region located in a semiarid African region north of the Sahel and south of the Mediterranean Sea that is rarely in the focus of other related studies due to the above-stated shortcomings in observational data. The constantly improving capabilities of numerical weather prediction (NWP) models offer the opportunity to reduce this problem by providing precipitation fields and other meteorological variables as gridded data sets of high spatial and temporal resolution. Longer time periods of years to decades can be simulated by NWP models through successive model runs of shorter periods driven by large-scale atmospheric datasets (e.g. Bromwich et al. 2005; Maussion et al. 2011). For example, Maussion et al. (2014) used the Weather Research and Forecasting Model (WRF) to dynamically downscale global final analysis data with a daily reinitialisation strategy to produce the High Asia Reanalysis at spatial resolutions of 30 and 10 km.

The high spatiotemporal resolution of the NWP data represents a major step forward compared to the precipitation data used in other comparable studies. Data derived from NWP enable us to investigate the relationship between precipitation and the NDVI for almost any timeframe (e.g. 16 days to seasonal) and region (e.g. NW-Morocco), at high spatial resolution (e.g. 2 km) and with a thematically focused scope (e.g. a single type of land cover). Thus, corresponding to Gaughan et al. (2012), we are able to address in detail the following research questions:

  1. What is the relationship between NDVI and precipitation in the study region during the decade 2001–2010?

  2. How much variance of the NDVI is explained by variance in different timeframes of precipitation within the decade 2001–2010?

  3. Which relationships exist between different land cover types and precipitation?

The manuscript is structured as follows. First, we provide a detailed description of the materials applied including basic characteristics of the study region, data pre-processing and the regression analyses. This is followed by a presentation of results and a discussion. Finally, conclusions are drawn.

2 Materials

2.1 Study region

The study region (Fig. 1) includes the cities of Casablanca, Rabat and Meknès and covers a land area of ~38,000 km² from N 32° 37′ 50″ to N 34° 18′ 56″ latitude and from W 5° 30′ 6″ to W 8° 21′ 56″ longitude. The Atlantic Ocean to the north-west and the Middle Atlas to the south-east are natural borders. The mean elevation is about 484 m a. s. l., and the maximum elevation is about 1767 m a. s. l. in the Atlas Mountains. Precipitation normally occurs during the winter months with roughly 75 % occurring in November to March. The hydrological year starts in September of the previous year and ends in August. The total mean of annual precipitation in Casablanca (Nouasseur) is 369 (289) mm, ranging between 66 (50) mm and 665 (563) mm (NCDC data September 1980–August 2010). The climate near the coast is moderate due to the Canary current. The 30-year mean annual air temperature in Casablanca is 18.1 °C (NCDC data, September 1980–August 2010, Casablanca station).

Fig. 1

Study region in northwest Morocco: cities (squares), water mask (blue areas), major river basins (grey lines) and stations (triangles) of the “Global Summary of the Day” (GSOD) provided by the National Climatic Data Center (NCDC). GSOD stations are used for validation

2.2 Data sets and pre-processing

Annual data sets cover the time period from September 2000 to August 2010, i.e. the hydrological years 2001–2010.
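The naming convention for hydrological years used here (September 2000 to August 2001 is the hydrological year 2001) can be written as a short helper. This is only an illustrative sketch; the function name is hypothetical and not part of the original processing chain.

```python
from datetime import date


def hydrological_year(d: date) -> int:
    """Return the hydrological year (September to August) that contains d."""
    # September-December are counted towards the following calendar year,
    # so that September 2000 belongs to the hydrological year 2001.
    return d.year + 1 if d.month >= 9 else d.year


print(hydrological_year(date(2000, 9, 1)))   # first day of hydrological year 2001
print(hydrological_year(date(2010, 8, 31)))  # last day of hydrological year 2010
```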

2.2.1 NDVI time series data

The gapless NDVI time series dataset of the MOD13Q1 product (collection 5) from the Moderate Resolution Imaging Spectroradiometer (MODIS) has been acquired from the Warehouse Inventory Search Tool (WIST; available at: https://wist.echo.nasa.gov/~wist/api/imswelcome/, accessed 21 February 2011). It consists of 230 single NDVI 16-day composites covering a time period of ten years from 2001 to 2010. The National Aeronautics and Space Administration (NASA) provides accurate, cloud-free, continuous and consistent NDVI data of high quality (Huete et al. 1999). We re-project the NDVI raster data into a resized WRF Lambert conformal projection (WGS 84, spatial resolution of 250 m × 250 m, hereafter “MODIS-grid”). Hence, one pixel of the NDVI raster data represents the NDVI value for an area of 250 m × 250 m. This is different from the applied precipitation data set, which has grid points representing values only for single points and not areas. However, we speak hereafter of grid points for both data sets to improve readability.

Grid points in the Atlantic Ocean near the coast are eliminated applying a water mask derived from the ASTER global digital elevation model data. The water mask also includes reservoirs ashore, which were masked using Landsat data of 2004.

We apply the algorithm of Chen and Dudhia (2001) as described in Höpfner and Scherer (2011) to smooth the NDVI data of each grid point on the temporal scale. This smoothing algorithm assumes that NDVI is always depressed but never overestimated by noise. Its application reduces the impact of single contaminated data points and keeps the upper envelope of the NDVI data.
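The idea of keeping the upper envelope can be illustrated with a deliberately simple filter. The sketch below is not the cited algorithm; it is a hypothetical iterative moving-average scheme (function name, window size and iteration count are assumptions) that only shares the stated premise that noise lowers NDVI and never raises it.

```python
def upper_envelope_smooth(ndvi, window=3, iterations=5):
    """Illustrative upper-envelope filter (NOT the algorithm cited in the text).

    In each iteration the series is smoothed with a centred moving average,
    and every value lying below the smoothed curve is lifted onto it.
    Values are therefore never lowered, only depressed points are raised.
    """
    half = window // 2
    current = list(ndvi)
    n = len(current)
    for _ in range(iterations):
        smoothed = []
        for i in range(n):
            lo, hi = max(0, i - half), min(n, i + half + 1)
            smoothed.append(sum(current[lo:hi]) / (hi - lo))
        current = [max(c, s) for c, s in zip(current, smoothed)]
    return current


# A single contaminated composite (0.1) within an otherwise stable series
print(upper_envelope_smooth([0.5, 0.5, 0.1, 0.5, 0.5]))
```

The depressed value is pulled towards its neighbours, while uncontaminated values are left unchanged.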

After smoothing, we deduce five phenologic metrics (Reed et al. 1994) from NDVI data for each grid point and year (Fig. 2). The phenologic metrics describe the annual phenology of vegetation by its intra-annual NDVI characteristics. We use the phenologic metrics to run annual land cover classifications for the entire study region. The applied land cover scheme (Table 1) is particularly helpful for describing vegetation in regions where the majority of the land cover is unknown and ground truth data are scarce. The multi-temporal classification compensates for this disadvantage and allows partitioning of the land cover into five different land cover types based on intra-annual time series of NDVI data. However, the scheme is developed for remote-sensing data (MODIS-NDVI) and does not allow for a distinction between specific land-use categories although the phenologic metrics of green vegetation might be very similar to e.g. stocking. Thus, areas with similar phenologic metrics are summarised in one land cover type even if their land use might be different (see Table 1, more details in Höpfner and Scherer 2011). Figure 3 shows the mean land cover of the study region from 2001 to 2010.

Fig. 2

Derived phenologic metrics according to Reed et al. (1994): MaxV (maximum NDVI value within the vegetation period), MeanV (mean NDVI value of the vegetation period), OnV (NDVI value at the beginning of the vegetation period), EndV (NDVI value at the end of the vegetation period), RanV (range between maximum value within the vegetation period and minimum of OnV and EndV, Höpfner and Scherer 2011)
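The five metrics defined in the caption above can be computed from one annual NDVI series as sketched below. How the onset and end of the vegetation period are detected is not specified here, so both are passed in as given indices; the function name and this interface are assumptions made for illustration.

```python
def phenologic_metrics(ndvi, onset, end):
    """Compute MaxV, MeanV, OnV, EndV and RanV for one year of NDVI composites.

    ndvi  : list of smoothed NDVI values (one per 16-day composite)
    onset : index of the first composite of the vegetation period (assumed given)
    end   : index of the last composite of the vegetation period (assumed given)
    """
    period = ndvi[onset:end + 1]
    max_v = max(period)
    on_v = ndvi[onset]
    end_v = ndvi[end]
    return {
        "MaxV": max_v,                      # maximum NDVI within the period
        "MeanV": sum(period) / len(period), # mean NDVI of the period
        "OnV": on_v,                        # NDVI at the beginning of the period
        "EndV": end_v,                      # NDVI at the end of the period
        "RanV": max_v - min(on_v, end_v),   # range above the lower of OnV/EndV
    }


metrics = phenologic_metrics([0.2, 0.3, 0.5, 0.7, 0.6, 0.4, 0.25], onset=1, end=5)
print(metrics)
```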

Table 1 Land cover types corresponding to Höpfner and Scherer (2011)
Fig. 3

Mean land cover of the study region based on thresholds derived from NDVI data of the Moderate Resolution Imaging Spectroradiometer (MODIS) between 2001 and 2010

Nevertheless, land use can be evaluated indirectly by an evaluation of the vegetation response to precipitation (e.g. degradation in Li et al. 2004). For response analyses, we apply primarily the phenologic metric MeanV (mean NDVI value between onset and offset of the growing season, Fig. 2). For better readability, we speak hereafter of mean NDVI, where the mean NDVI of 2001, for instance, describes the mean NDVI value of the growing season between September 2000 and August 2001. Figure 4 (top left) shows the spatially distributed mean NDVI for 2001–2010. The highest values occur not only for forested areas due to their perennial vegetation but also for some high-productive vegetated areas, e.g. south of Casablanca. The respective coefficients of variation of mean NDVI (Fig. 4, top right) are lowest near the coast, in cities, and in most of the areas showing a high mean NDVI.

Fig. 4

10-year mean of annual mean NDVI and 10-year mean of annual mean precipitation (left) and coefficients of variation of the corresponding ten annual values (right). The marked area covers the political district of the Wilaya of Grand Casablanca. NDVI-related data refer to the MODIS-grid and precipitation data to the WRF-grid

2.2.2 Precipitation data

In this study, we used precipitation data from a new data set covering ten hydrological years between 2001 and 2010: the Northwest Africa Reanalysis (NwAR), generated by dynamical downscaling of a large-scale meteorological dataset following the methodology presented by Maussion et al. (2014). The NWP model used to generate the NwAR is the WRF (Skamarock and Klemp 2008), version 3.2.1. The model configurations are summarised in Table 2. The data set consists of consecutively reinitialised model runs of 36 h time integration. Each run starts at 12:00 UTC, and the first 12 h from each run are discarded for spin-up. The remaining 24 h of model output provide 1 day of the 10-year-long time series of meteorological variables (e.g. precipitation, temperature, pressure, geopotential, wind direction, wind speed). Using a very short integration time ensures that the model output remains constrained by observations, while the conditions at the Earth’s surface influencing atmospheric processes (e.g. topography, land cover), particularly in the boundary layer, are described by the model at higher spatial detail. Thus, our methodology provides a re-analysed state of the atmosphere at high spatial (2 km) and temporal (hourly) resolution. For model initialisation and definition of boundary conditions, we have used data from the operational model global tropospheric analyses (final analyses, FNL; data set ds083.2), which are available every 6 h and have a spatial resolution of 1°. Three domains (Fig. 5) are defined with spatial resolutions of 30, 10 and 2 km. The northwest Africa domain is the largest domain in which the Morocco domain is nested as second-level domain. The smallest domain covering the region of Casablanca-Rabat is nested in the Morocco domain as a third-level domain. We use the two-way nesting cascading approach defined by Maussion et al. (2011), such that every child domain of higher resolution benefits from the two-way nesting option (the information exchange between parent and child domain is bidirectional) while avoiding inconsistencies in the parent domains occurring in the presence of the child domain. First, the large northwest Africa domain is computed alone. Then, the Morocco domain is computed using a two-level, two-way nesting within the northwest Africa domain. Finally, the Casablanca-Rabat domain is computed using a three-level, two-way nesting within the two larger domains of the NwAR. Further details for the modelling strategy as well as for sensitivity analyses can be found in Maussion et al. (2011, 2014).
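The timing of the daily reinitialisation described above (36 h runs starting at 12:00 UTC, 12 h spin-up, 24 h retained) can be made explicit with a small scheduling sketch. It only illustrates the bookkeeping of start and retention times; the function name and return layout are hypothetical and do not reflect the actual WRF run scripts.

```python
from datetime import datetime, timedelta


def run_schedule(first_day, n_days, run_hours=36, spinup_hours=12):
    """List (run_start, keep_from, keep_until) for consecutive daily runs.

    first_day is 00:00 UTC of the first day to be covered by retained output.
    """
    runs = []
    for k in range(n_days):
        keep_from = first_day + timedelta(days=k)               # 00:00 UTC of the target day
        run_start = keep_from - timedelta(hours=spinup_hours)   # 12:00 UTC of the previous day
        keep_until = run_start + timedelta(hours=run_hours)     # 00:00 UTC of the following day
        runs.append((run_start, keep_from, keep_until))
    return runs


for start, keep_from, keep_until in run_schedule(datetime(2000, 9, 1), 3):
    print(start, "->", keep_from, "to", keep_until)
```

Because each retained window ends exactly where the next one begins, the concatenated 24 h segments form a continuous hourly time series.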

Table 2 Overview of the configuration of the Weather Research and Forecasting (WRF) model used for the computation of the Northwest Africa Reanalysis (NwAR) consisting of three different nested domains, i.e. the northwest Africa, Morocco and Casablanca-Rabat domains
Fig. 5

Overview of the configuration of the Weather Research and Forecasting Model (WRF) used for the computation of the Northwest Africa Reanalysis (NwAR) consisting of three different nested domains (the largest domain corresponds to the entire map as shown)

For the present study, we use precipitation data (NwAR precipitation) from the smallest domain (hereafter “WRF-grid”). To ensure that NwAR precipitation data are of suitable accuracy for the purpose of our study, we compared them with rain gauge precipitation records (NCDC data) from the “Global Summary of the Day” (GSOD) provided by the National Climatic Data Center (NCDC) as described in “Section 3”.

The WRF-grid has a size of 140 × 100 grid points and a grid spacing of 2 km. We first remove four grid points from each border as artefacts because of possible border effects from the multilevel approach. Then, we transfer the already applied water mask of the MODIS-grid to the spatial resolution of the WRF-grid (2-km grid spacing, i.e. spatial up-scaling). Each grid point of the WRF-grid contains 64 grid points of the MODIS-grid due to the different spatial resolutions. We mask all grid points of the WRF-grid that have fewer than 58 (90 %) valid grid points in the MODIS-grid. The threshold of 90 % is applied to account for variations of water level, especially in the reservoirs within the study region. By doing so, 9401 grid points (~37,600 km²) remain as input for the analyses in the study region.
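The transfer of the water mask from the MODIS-grid to the WRF-grid can be sketched as a block count: 2 km / 250 m gives 8 × 8 = 64 MODIS-grid points per WRF-grid point, of which at least 58 must be valid. The sketch assumes that both grids are perfectly aligned and that the mask is available as a nested list of booleans; the function name is hypothetical.

```python
def wrf_valid_mask(modis_valid, block=8, min_valid=58):
    """Up-scale a MODIS-grid validity mask to the WRF-grid.

    modis_valid : 2-D list of booleans (True = valid land point, False = water)
    block       : MODIS-grid points per WRF-grid point along each axis (2 km / 250 m)
    min_valid   : minimum number of valid MODIS-grid points (58 of 64, about 90 %)
    """
    n_rows = len(modis_valid) // block
    n_cols = len(modis_valid[0]) // block
    mask = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # count valid MODIS-grid points inside this WRF-grid cell
            count = sum(
                modis_valid[r * block + i][c * block + j]
                for i in range(block)
                for j in range(block)
            )
            row.append(count >= min_valid)
        mask.append(row)
    return mask


# Two WRF-grid cells side by side: 6 water points (kept) and 7 water points (masked)
grid = [[True] * 16 for _ in range(8)]
for j in range(6):
    grid[0][j] = False
for j in range(7):
    grid[1][8 + j] = False
print(wrf_valid_mask(grid))
```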

For the analyses, we use mean NwAR precipitation per day as unit (mm/day) for all aggregations of precipitation over different timeframes. To avoid misunderstanding, we speak hereafter of mean precipitation where this terminology always refers to a specific timeframe of temporal aggregations (e.g. mean precipitation per month expressed in millimetres per day).

A temporal up-scaling of the NwAR precipitation data is realised through aggregation from an hourly to a daily temporal resolution and, in a second step, to the 230 timeframes of the NDVI 16-day composites. Höpfner and Scherer (2011) used lagged correlations and showed that vegetation response in the study region has a time-lag of about 1.5 months. Therefore, it is necessary to calculate the mean precipitation for the two 16-day composites covering August 2000, which are missing in the NwAR precipitation data set. To get both values, we use in each case the mean value of the ten corresponding composites of August 2001–2010.
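The two aggregation steps and the filling of the missing August 2000 composites can be sketched as follows. The example treats composites as consecutive blocks of 16 days and lets a trailing block be shorter; the exact alignment with the MODIS composite calendar, as well as all function names, are assumptions made for illustration.

```python
def daily_totals(hourly_mm):
    """Sum consecutive blocks of 24 hourly values into daily totals (mm/day)."""
    return [sum(hourly_mm[i:i + 24]) for i in range(0, len(hourly_mm) - 23, 24)]


def composite_means(daily_mm, length=16):
    """Mean daily precipitation (mm/day) per composite of `length` days.

    A trailing block may contain fewer days; its mean uses the days present.
    """
    means = []
    for i in range(0, len(daily_mm), length):
        block = daily_mm[i:i + length]
        means.append(sum(block) / len(block))
    return means


def fill_from_other_years(same_composite_in_other_years):
    """Stand-in value for a missing composite: mean of that composite in the other years."""
    return sum(same_composite_in_other_years) / len(same_composite_in_other_years)


hourly = [0.5] * 48                                 # two days of hourly precipitation
print(daily_totals(hourly))                         # daily totals in mm/day
daily = [1.0] * 16 + [3.0] * 16 + [2.0] * 5
print(composite_means(daily))                       # one value per composite
print(fill_from_other_years([0.0, 0.2, 0.1, 0.1]))  # climatological stand-in
```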

3 Methods

For the analyses of the vegetation response to precipitation, we first validate the NwAR precipitation data (see results). Here, we describe the methods applied for the response analyses. For all analyses (validation and response analyses), we use the 5 % significance level.

For the response analyses, we first upscale mean NDVI from the MODIS-grid to the WRF-grid using spatial averaging. When speaking hereafter of grid points, we always refer to the WRF-grid, and exceptions will be marked explicitly. In the same way, precipitation data hereafter always refer to NwAR precipitation data. The workflow for the response analyses is shown in Fig. 6.

Fig. 6

Workflow applied for response analyses

In a first step, we run multi-temporal linear regression analyses. For this, we use the ten annual values of mean NDVI (2001–2010) as a proxy of vegetation response. As precipitation input for the 10 years, we systematically apply mean precipitation calculated for different input timeframes. These timeframes range from 32 days up to the entire hydrological year and are shifted and extended in steps of 16 days. By doing so, it is possible to extract the timeframe of precipitation whose variance best explains the variance in mean NDVI during the decade. The step of 16 days (hereafter composite) corresponds to the timeframe of an NDVI composite. For the aggregation of precipitation data, a minimum length of two composites (32 days, approx. 1 month) is applied as the shortest input timeframe to minimise noise and the attested uncertainties in the precipitation data when using shorter aggregation periods (see results of the validation).

Each timeframe of precipitation input for regression analyses is defined by a start composite and the length of the aggregation period (number of composites). If both are set, the mean daily precipitation for the corresponding ten annual timeframes can be computed at each grid point.

Defining the start composites, it is additionally necessary to include precipitation that occurs just before the beginning of each hydrological year because of the time-lag of nearly 1.5 months (~3 composites) between vegetation response and precipitation in the study region (Höpfner and Scherer 2011). Thus, regarding each year as a time series of 23 composites (keys 0–22), we define the first start composite at key “−2” which covers the beginning of August, i.e. 1 month before the start of the respective hydrological year (September). We use the two August composites of each previous hydrological year and the first 14 composites of the current hydrological year as different starting composites of the aggregation periods. Thus, the last starting composite is at the end of March. Once the start composite is set, we stepwise add one composite to enlarge the length for the calculation of the mean precipitation until the end of the hydrological year is included (end of August). By doing so, the maximum length of the aggregation period of mean precipitation is 25 composites.
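The set of input timeframes described above can be enumerated explicitly. The sketch below assumes that every start composite (keys −2 to 13) is combined with every length from two composites up to the end of the hydrological year (key 22); the function name and the tuple layout are hypothetical.

```python
def precipitation_timeframes(first_start=-2, last_start=13, last_key=22, min_length=2):
    """Enumerate (start_key, length) pairs of precipitation input timeframes.

    Keys 0-22 are the 23 composites of a hydrological year; keys -2 and -1
    are the two August composites of the previous hydrological year.
    """
    frames = []
    for start in range(first_start, last_start + 1):
        max_length = last_key - start + 1   # extend until the end of August
        for length in range(min_length, max_length + 1):
            frames.append((start, length))
    return frames


frames = precipitation_timeframes()
print(len(frames))                               # number of timeframes under these assumptions
print(max(length for _, length in frames))       # longest aggregation period in composites
```

Under these assumptions the longest timeframe spans 25 composites, in agreement with the text, and each (start, length) pair corresponds to one rectangle of a start-versus-length diagram.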

Focussing on research questions one and two, a sequence of linear regression analyses is conducted based on ten mean NDVI values and ten mean precipitation values of the different input timeframes at each grid point between 2001 and 2010. To extract the mean response of vegetation to precipitation in the study region, we run the linear regression analyses first with spatial means of NDVI and precipitation data. Then, we run linear regression analyses for each single grid point to get spatially distributed response information.
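A single regression of this sequence, i.e. ten annual mean NDVI values against ten annual mean precipitation values, can be sketched in plain Python. The significance check uses the tabulated two-sided 5 % critical value of Student's t for 8 degrees of freedom (2.306), which is only valid for n = 10; function names are hypothetical, and the original analysis may have used a different implementation.

```python
import math

T_CRIT_DF8 = 2.306  # two-sided 5 % critical t value for n - 2 = 8 degrees of freedom


def linear_regression(x, y):
    """Ordinary least squares of y on x; returns slope, intercept and r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)
    return slope, intercept, r2


def is_significant(r2, n=10, t_crit=T_CRIT_DF8):
    """Test the slope against zero at the 5 % level (t_crit must match n - 2)."""
    if r2 >= 1.0:
        return True
    t = math.sqrt(r2 * (n - 2) / (1.0 - r2))
    return t > t_crit


# Made-up example values: mean precipitation (mm/day) and mean NDVI for ten years
precip = [1.2, 0.8, 1.5, 1.1, 0.6, 1.9, 1.0, 0.7, 1.4, 1.6]
ndvi = [0.41, 0.35, 0.47, 0.40, 0.31, 0.50, 0.39, 0.34, 0.44, 0.46]
slope, intercept, r2 = linear_regression(precip, ndvi)
print(slope, intercept, r2, is_significant(r2))
```

With ten value pairs, an r² of roughly 0.4 or more is required for significance at the 5 % level, which is why lower values cannot appear among the significant results.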

In a second step, we focus on research question three. We first extract the maximum of significantly (5 % significance level) explained variance (r²) in mean NDVI for each grid point from spatially distributed response information. This allows locating areas of high (0.6 ≤ r² < 0.8), medium (0.4 ≤ r² < 0.6) or low (0.2 ≤ r² < 0.4) explained variance. Then, we focus on all grid points having a maximum r² greater than or equal to a defined threshold. This threshold is changed stepwise to examine different clusters of grid points and their land cover composition. The results of the ten annual land cover classifications are the basis for determining shares of land cover types for each respective cluster of grid points. This is possible because of the different spatial resolutions of both grids (250 m and 2 km). Thus, we gain evidence of the land cover composition of grid points with high, medium or low explained variance in mean NDVI.

In Morocco, considerable agricultural areas are irrigated. Since irrigation decouples vegetation response from precipitation, we assume that these areas do not show high explained variances of mean NDVI. Further, we assume that irrigation is mainly related to agricultural land use. To extract such areas, we use the classification thresholds of the land cover type ‘high-productive vegetation’, which is designed to denote agricultural lands showing a high range value of NDVI within the vegetation period (RanV ≥ 0.4, Höpfner and Scherer 2011). We extract all grid points with a low to medium explained variance (r² < 0.6) that contain a high mean range value of NDVI within the 10 years (RanV ≥ 0.4) to map potentially irrigated agriculture.
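The selection rule for potentially irrigated agriculture combines the two thresholds named above. A minimal sketch, assuming that the maximum explained variance and the 10-year mean RanV are available per grid point as plain lists (names hypothetical):

```python
def potentially_irrigated(max_r2, mean_ranv, r2_limit=0.6, ranv_limit=0.4):
    """True where explained variance is low to medium but the NDVI range is high."""
    return max_r2 < r2_limit and mean_ranv >= ranv_limit


# Made-up values for four grid points
max_r2_values = [0.75, 0.45, 0.30, 0.55]
mean_ranv_values = [0.50, 0.42, 0.20, 0.40]
flags = [potentially_irrigated(r2, rv) for r2, rv in zip(max_r2_values, mean_ranv_values)]
print(flags)
```

How grid points without any statistically significant r² are treated is not specified in the text and is therefore left out of the sketch.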

In addition, we run a sequence of linear correlations using annual spatial means of precipitation input and annual spatial means of mean NDVI based on each single land cover type. We only use MODIS-grid points of the specific land cover to upscale mean NDVI to the WRF-grid. Grid points without values (e.g. no MODIS-grid point has this land cover type, or a WRF-grid point is masked) are excluded from analysis. Through this procedure, we can specify the mean relation of each land cover type to precipitation in the study region. A first impression of the mean seasonal cycle of precipitation and NDVI of the different land covers is depicted in Fig. 7. Corresponding to Al-Bakri and Suleiman (2004), we use re-analysed precipitation data as the independent variable and mean NDVI as the dependent variable for all linear regression analyses.

Fig. 7

Monthly means (hydrological year: SEP–AUG) of mean daily precipitation (bars) and spatial mean NDVI of different land cover types (lines)

4 Results

4.1 Validation of NwAR precipitation

The spatial distribution of mean annual precipitation during the decade indicates a gradient of decreasing values towards the south and increasing values towards the eastern higher altitudes of the Atlas (Fig. 4, bottom left). The general pattern of the coefficient of variation of the mean annual precipitation (Fig. 4, bottom right) shows that the variability of mean annual precipitation is higher in mountainous areas than in coastal areas of the study region. However, the results of linear regression analyses between measured and NwAR precipitation are statistically significant for all four GSOD stations (Table 3).

Table 3 Validation results of re-analysed precipitation data compared to station measurements (GSOD) at different aggregations (base) between 2001 and 2010

First, we extracted the NwAR precipitation time series of the WRF-grid point nearest to the location of the corresponding NCDC station. Then, we applied several predefined sets of temporal aggregation to compute the mean precipitation before running the corresponding linear regression analyses:

  • Daily base (no temporal aggregation),

  • Two-day, 5-day and 10-day base (mean daily precipitation for these time-steps),

  • Sixteen-day base (mean daily precipitation corresponding to composite length of NDVI data),

  • Monthly base (mean daily precipitation per month),

  • Seasonal base (mean daily precipitation corresponding to the four seasons September–November (autumn), December–February (winter), March–May (spring), June–August (summer)),

  • Annual base (mean daily precipitation per hydrological year).

For linear regression analyses, we used NwAR precipitation data as independent variable and the measured precipitation data as dependent variable.
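The effect of the aggregation base on the explained variance can be illustrated with a toy example. The sketch assumes non-overlapping n-day blocks and uses made-up series in which the modelled rain falls one day earlier than the measured rain; function names are hypothetical.

```python
def block_means(series, n):
    """Mean of consecutive non-overlapping n-day blocks (incomplete tail dropped)."""
    return [sum(series[i:i + n]) / n for i in range(0, len(series) - n + 1, n)]


def r_squared(x, y):
    """Explained variance of a simple linear regression of y on x."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


# Made-up daily precipitation (mm/day): same amounts, shifted by one day
measured = [0.0, 8.0, 0.0, 0.0, 0.0, 6.0, 0.0, 0.0]
modelled = [8.0, 0.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0]

r2_daily = r_squared(modelled, measured)
r2_2day = r_squared(block_means(modelled, 2), block_means(measured, 2))
print(r2_daily, r2_2day)
```

In this constructed case the one-day timing error strongly lowers the daily r², whereas the 2-day means agree, which is the kind of behaviour the aggregation bases are meant to reveal.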

Generally, the validation results show that r² values increase the more days are aggregated into the mean precipitation, except on an annual base (Table 3). Focussing on the results from the four stations, on average 50 % of the variance in measured precipitation can be explained by variance in NwAR precipitation using a base of 10 days for regression analysis. This average explained variance increases to 56 % for a base of 16 days, 65 % for a monthly base and 78 % for a seasonal base, and it drops to 56 % for a base of hydrological years.

The increase of r² values is explained by the general framework of modelling or re-analysing precipitation: On the one hand, it is possible to assess precipitation as one of several variables of the climate system in a quite similar temporal pattern, but on the other hand, the congruity of precipitation data also depends on the exact point in time and the exact amount of precipitation. Therefore, it is possible that precipitation occurs in another amount than measured or with a temporal shift (e.g. 1 day later than assessed). This assumption is underlined by the almost doubled r² values when switching from a daily base (16 % explained variance on average) to a 2-day base (30 % explained variance on average) and the roughly tripled values when switching from a daily base to a 10-day base (50 % explained variance on average). This effect decreases the more days are aggregated and saturates at a certain level, which is indicated at the seasonal base (78 % explained variance on average). Following this hypothesis, the final drop of r² values when applying an annual base is surprising at first. We explain this with a second effect influencing the results, i.e. heavy rain events that were not re-analysed correctly in terms of intensity. If such a heavy rain event occurs, one of the input values for the linear regression analysis is strongly out of line. This is not crucial if the number of values is high enough, but if the number is rather small, as on an annual base (10 values in all), one value out of line will have a strong impact on the explained variance. As a result, r² values decrease again beyond a certain aggregation length.

In all, the validation shows that the NwAR precipitation reproduces measured precipitation well but not in every exact detail in terms of extent and timing of precipitation. This becomes apparent when the validation is conducted for seasonal subsets used as annual precipitation input. Here, no correlation at all can be found between NwAR and GSOD for mean precipitation in summer (JJA in Table 4), whereas correlations are high for all other (wetter) seasons (e.g. DJF in Table 4). However, precipitation in summer seasons is, in general, low (Fig. 7), and precipitation in winter is known to be of higher significance for vegetation response (Höpfner and Scherer 2011).

Table 4 Validation results of re-analysed precipitation data compared to station measurements (GSOD) for different seasons of the hydrological year (September–August) from 2001 to 2010

4.2 Vegetation response analyses

The results of multiple regression analyses using annual spatial means show that the variance in mean NDVI is best explained (r² = 0.73) by the variance in mean precipitation between the middle of November and the end of December (Fig. 8). Using other starting composites and keeping the end of December for precipitation input, r² values are slightly lower but also high (0.65 < r² < 0.73). This is indicated by the diagonal of orange and red rectangles in Fig. 8 and is similar for input timeframes ending in the middle of January (diagonal above). Results become insignificant when using a starting composite after the end of November, which results in a gap after the end of November (remaining white rectangles in Fig. 8).

Fig. 8

Results of multiple regression analyses using annual spatial means of the study region and systematically changing timeframes for calculation of mean precipitation. The x-axis describes the first input composite and the y-axis the number of input composites that define the length of the timeframe for calculation of mean precipitation. For each input timeframe of precipitation, the explained variance in mean NDVI is displayed as long as the result is significant at the 5 % significance level. Vegetation input for analyses was the spatial mean NDVI of each year. The diagonals represent identical end composites having different start composites and varying input lengths. The maximum r² is marked (x)

By applying spatially distributed data and running multiple regression analyses for each grid point individually, we find statistically significant high or very high explained variance in mean NDVI (r² ≥ 0.6) for approximately 61 % of the grid points within the study region (Fig. 9).

Fig. 9 Highest explained variance (r²) in mean NDVI at the 5 % significance level considering all examined timeframes

Examining the composition of land cover types at grid points with r² values above defined thresholds, we find that the spatial percentage of ‘forest’ and ‘low-productive vegetation’ decreases with increasing thresholds (Fig. 10). In contrast, the percentage of the land cover type ‘high-productive vegetation’ increases with increasing thresholds. The percentage of the land cover type ‘sparsely vegetated’ increases up to a threshold of approximately 0.83 and decreases at higher thresholds.

Fig. 10 Spatial shares of land cover types focussing only on grid points above a defined threshold of explained variance in mean NDVI
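The threshold analysis shown in Fig. 10 can be sketched as follows. This is a hedged illustration only: the function name is hypothetical, and it assumes that each grid point carries one r² value (or `None` if no timeframe was significant) and one land cover label.

```python
from collections import Counter


def land_cover_shares(r2_values, land_cover, threshold):
    """Percentage share of each land cover type among grid points
    whose explained variance reaches the given threshold."""
    selected = [cover for r2, cover in zip(r2_values, land_cover)
                if r2 is not None and r2 >= threshold]
    if not selected:
        return {}
    counts = Counter(selected)
    return {cover: 100.0 * n / len(selected) for cover, n in counts.items()}
```

Calling the function for a sequence of rising thresholds gives one curve per land cover type. Because the shares at each threshold sum to 100 %, a rising share of one type necessarily lowers the shares of the others.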

The approach for identifying potentially irrigated areas reveals a cluster of grid points covering approximately 14.1 % (~5300 km²) of the study region (Fig. 11). The spatial shares of land cover types of this cluster are presented in Table 5.

Fig. 11 Grid points which are not very sensitive to precipitation (r² < 0.6) but have a high mean range of NDVI within the growing period (RanV ≥ 0.4)
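The two criteria for potentially irrigated grid points (r² < 0.6 and RanV ≥ 0.4) amount to a simple mask, sketched below. The function names are hypothetical. Two assumptions are made for illustration: grid points without a significant regression are treated as r² = 0, and each grid point covers 4 km², following the 2 km resolution of the precipitation data. As a plausibility check of the reported numbers, 14.1 % of the ~37,600 km² study region is about 5300 km².

```python
def potentially_irrigated(r2_values, ndvi_ranges, r2_max=0.6, ranv_min=0.4):
    """Flag grid points with r2 < r2_max and RanV >= ranv_min.

    Assumption: grid points without a significant regression are passed as
    None and treated as r2 = 0, i.e. as not sensitive to precipitation.
    """
    return [(0.0 if r2 is None else r2) < r2_max and ranv >= ranv_min
            for r2, ranv in zip(r2_values, ndvi_ranges)]


def area_and_share(mask, cell_area_km2=4.0):
    """Area (km2) and percentage share of the flagged grid points.

    The default cell area assumes a regular 2 km x 2 km grid.
    """
    n_flagged = sum(mask)
    return n_flagged * cell_area_km2, 100.0 * n_flagged / len(mask)
```

The thresholds are keyword arguments so that the sensitivity of the resulting cluster to the chosen limits can be examined.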

Table 5 Spatial shares (%) of land cover types 2001–2010

Land cover type specific analyses show no statistically significant result for the land cover type ‘very sparsely vegetated’. The results of the other four land cover types are displayed in Fig. 12. The highest explained variance in mean NDVI for the two land cover types high-productive vegetation and low-productive vegetation occurred for precipitation between the beginning of September and the end of December (r² = 0.75 and r² = 0.67, respectively). For the land cover type sparsely vegetated, the variance in mean NDVI is explained best by the variance in precipitation between the beginning of November and the middle of March (r² = 0.62). For the land cover type forest, the variance in mean NDVI is best explained by the variance in precipitation between the beginning of October and the end of the hydrological year (r² = 0.48). Common to all four land cover types, the explained variance in mean NDVI becomes statistically insignificant when the input timeframes start after the end of November (Fig. 12). The overlap of the land cover type specific timeframes of highest explained variance in mean NDVI emphasises precipitation between November and December.

Fig. 12 Linear regression results applying spatial means and systematically changing input timeframes for the calculation of mean precipitation. Results applying spatial means of mean NDVI for the land cover types ‘sparsely vegetated’ (a), ‘forest’ (b), ‘high-productive vegetation’ (c) and ‘low-productive vegetation’ (d). Only results that are significant at the 5 % significance level are displayed. The maximum r² is marked (x) for each land cover type
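The overlap of the land cover type specific timeframes can be reproduced with a simple interval intersection. The sketch below is illustrative only: it encodes the timeframes reported in the text on a half-month grid counted from 1 September, which is a simplifying assumption (the analysis itself is based on 16-day composites), and all names are hypothetical.

```python
# Months of the hydrological year (September to August).
MONTHS = ["Sep", "Oct", "Nov", "Dec", "Jan", "Feb",
          "Mar", "Apr", "May", "Jun", "Jul", "Aug"]

# Timeframes of highest explained variance as (start, end) in half-month
# steps since 1 September; the end is exclusive. The half-month grid is an
# assumption made for illustration.
BEST_TIMEFRAMES = {
    "high-productive vegetation": (0, 8),   # beginning of Sep to end of Dec
    "low-productive vegetation": (0, 8),    # beginning of Sep to end of Dec
    "sparsely vegetated": (4, 13),          # beginning of Nov to middle of Mar
    "forest": (2, 24),                      # beginning of Oct to end of Aug
}


def overlap(timeframes):
    """Intersection of (start, end) intervals, or None if there is none."""
    timeframes = list(timeframes)
    start = max(s for s, _ in timeframes)
    end = min(e for _, e in timeframes)
    return (start, end) if start < end else None


def months_covered(timeframe):
    """Names of the months touched by a (start, end) interval."""
    start, end = timeframe
    return [MONTHS[m] for m in sorted({h // 2 for h in range(start, end)})]


common = overlap(BEST_TIMEFRAMES.values())
print(common, months_covered(common))
```

With these inputs, the latest start is the beginning of November and the earliest end is the end of December, so the common timeframe covers November and December.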

In addition, given the high influence of precipitation in autumn, we analysed whether precipitation is autocorrelated, i.e. whether precipitation from September to December triggers the precipitation from January to April. We therefore compared the ten annual spatial means of precipitation from September to December with those from January to April using a linear regression analysis. We found no statistically significant relationship. This means that precipitation between January and April is not triggered by precipitation between September and December.
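A test of this kind can be sketched as a simple linear regression between the two seasonal series, judged with a t test on the correlation. The sketch assumes ten annual values (8 degrees of freedom, two-sided 5 % critical t value of 2.306). The precipitation values are synthetic and serve only to make the example runnable; the function name is hypothetical.

```python
from math import sqrt

# Two-sided 5 % critical t value for 8 degrees of freedom (ten annual values).
T_CRIT_DF8 = 2.306


def regression_r2_and_t(x, y):
    """Return (r2, t) for a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # a constant series carries no relationship
    r2 = sxy ** 2 / (sxx * syy)
    if r2 >= 1.0:
        return 1.0, float("inf")  # perfect linear relationship
    return r2, sqrt(r2 * (n - 2) / (1.0 - r2))


# Synthetic seasonal precipitation sums (mm) for ten hydrological years;
# these values are invented for illustration and are not study data.
sep_dec = [100, 150, 80, 200, 120, 90, 170, 130, 110, 160]
jan_apr = [140, 90, 120, 110, 150, 100, 130, 95, 125, 140]

r2, t = regression_r2_and_t(sep_dec, jan_apr)
is_significant = t > T_CRIT_DF8
```

If `is_significant` is false, the regression gives no evidence that the later season depends on the earlier one. With only ten values the test has limited power, so a non-significant result does not rule out a weak relationship.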

5 Discussion

Spatial information is aggregated to one value for each year by spatial averaging. Thus, local effects are reduced, and it becomes possible to describe the mean condition of the study region. Using spatial means as input for linear regression analyses, the general relationship between mean precipitation and mean NDVI shows higher values of the explained variance (r²) in mean NDVI by the variance in mean precipitation up to the end of December (Fig. 8). If later precipitation is included, the explained variance is lower and not significant. This and the fact that results become insignificant when using a start composite after the end of November underline the high influence of variances in the mean precipitation between November and December on mean NDVI in the study region.

The land cover type specific results of response studies using spatial means have to be interpreted in light of the applied land cover scheme. The vegetation and the land use of the five land cover types can be quite heterogeneous, as explained. In particular, the land cover type low-productive vegetation is the most heterogeneous of all land cover types, because it consists of vegetation of agricultural land use on the one hand (high human influence) and of vegetation of grasslands or shrublands on the other hand (low human influence, Höpfner and Scherer 2011). Vanacker et al. (2005) found land cover products highly sensitive to short-term rainfall variability, especially for grass and shrub savannahs. Thus, areas often swing between the two land cover types low-productive vegetation and sparsely vegetated when running annual land cover classifications. This also happens for other vegetation formations at the border between two of the five land cover types (e.g. rain-fed agriculture swings between the land cover types low-productive vegetation and high-productive vegetation, depending on its location and water supply). As a consequence, the patterns of correlation results are similar (Figs. 8 and 12).

The land cover types high-productive vegetation and low-productive vegetation both show the highest explained variances in mean NDVI towards the end of December. This underlines the results above but differs from sparsely vegetated land cover, which shows the highest explained variance when mean precipitation up to the middle of March is included (Fig. 12). We identify the location of the areas belonging to each of the three land cover types as the main difference. Sparsely vegetated areas are mainly located in towns and at more extreme sites of higher altitudes, which facilitate surface run-off or hamper infiltration and retention. Thus, water availability is lower, and vegetation needs a longer period of precipitation input to produce green biomass. The locations of low-productive vegetation are more or less near cities, in uneven areas, towards drier regions in the south, and towards higher altitudes. Most of the areas classified as low-productive vegetation form transitional zones around forested areas or between areas classified as sparsely vegetated and areas classified as high-productive vegetation. Areas classified as high-productive vegetation are mainly in the plains of the study region. It is reasonable that vegetation of higher plant density (e.g. croplands) produces biomass faster than vegetation of lower plant density (e.g. shrublands), which is additionally reinforced when plant locations are less extreme in terms of water availability (e.g. higher rates of infiltration and retention). Thus, vegetation of agricultural land use is privileged and quickly reaches higher NDVI values compared to other potential land uses.

The additional emphasis of precipitation between the beginning of October and beginning of June or later in correlation results of land cover types high-productive vegetation and low-productive vegetation (orange/yellow vertical line in Fig. 12c, d) leads to the assumption that areas not agriculturally used are well classified. This refers to the limits of the classification scheme (Höpfner and Scherer 2011) and concerns vegetation formations at the border to the land cover type forest. Here, we assume an impact of not heavily harvested crops of agricultural land use (e.g. viniculture). Another possibility of not-harvested vegetation could be shares of deciduous shrubs or trees which have a high NDVI range due to a well-developed system of roots.

In all, the results of the land cover types high-productive vegetation and low-productive vegetation confirm earlier results that found the period between the beginning of October and middle of December to be critical using only one reference point of precipitation measurements as an input (Höpfner and Scherer 2011). By applying spatially distributed precipitation data, we can now specify this timeframe more precisely as the period between the beginning of September and end of December.

Results from the land cover type forest are different. Its vegetation has a perennial character and does not rest during the summer like the vegetation of the other land cover types. Thus, precipitation is always important to overcome the dry, hot season, e.g. through infiltration, retention and refilling of additional water sources at deeper levels. Having longer roots that reach these deeper resources, vegetation of the land cover type forest is able to overcome the dry, hot summers when water in upper soil levels becomes scarce. This mitigates the effects of variance in precipitation and leads to lower r² values. Similar ideas were stated by Udelhoven et al. (2009), who found that crops and grassland in the semiarid regions of Spain are more sensitive to water stress than perennial woody species.

The nearly complete absence of vegetation in areas assigned to the land cover type very sparsely vegetated explains why no statistically significant relation between mean NDVI and mean precipitation can be found for this land cover type. This confirms the robustness of the methods applied.

In general, we can derive an importance of variation in precipitation especially in November from the land cover type specific results because the explained variance in mean NDVI becomes statistically insignificant if precipitation input from the beginning of December or later is applied.

Results from spatially distributed analyses show a high influence of mean precipitation (r² > 0.6) on mean NDVI for more than 61 % of the study region. This means that there is a high explained variance in mean NDVI for the majority of the areas using mean precipitation data. Results from examinations of spatial shares of land cover types confirm the results discussed above when applying different thresholds of r² (Fig. 10). From a mathematical point of view, shares of land cover types with a high overall explained variance in mean NDVI must increase when using higher thresholds for input. Therefore, it is consistent that shares of the land cover type forest decrease with rising thresholds because this land cover type has the lowest explained variance in mean NDVI of all land cover types. Conversely, it is also consistent that shares of the land cover types high-productive vegetation and low-productive vegetation increase using higher thresholds because these land cover types have the highest overall explained variance in mean NDVI. Nevertheless, shares of low-productive vegetation finally decrease because the overall explained variance in mean NDVI of the land cover type high-productive vegetation is even higher.

Three quarters of potentially irrigated agriculture are covered with high-productive vegetation during the 10 years (Table 5). On the one hand, a high share of high-productive vegetation on potentially irrigated agriculture is expected due to the applied threshold (RanV ≥ 0.4). On the other hand, it is surprising because high-productive vegetation has the highest r² values of all land cover types (Fig. 12). However, this is not contradictory because the applied thresholds yield a considerable number of grid points that fulfil the assumptions made for being potentially irrigated (r² < 0.6, RanV ≥ 0.4). Thus, we assume that areas of the land cover type high-productive vegetation with low to medium r² are potentially irrigated and that areas with high r² values are rain-fed. Together with shares of the land cover type low-productive vegetation, a span of 75–96 % of irrigated land is conceivable. Determining the exact proportion of the combined shares (“low-” and “high-productive” land cover types) is difficult. This is due to a lack of information about which areas of the low-productive vegetation are indeed irrigated croplands and which are only land cover fragments due to e.g. different grid spacings. The shares of the other two land cover types in the cluster of potentially irrigated lands are regarded as such fragments. As an example, validations with an available ground truth map of the Wilaya of Grand Casablanca did not disagree with our results. A visual validation using data of Google Maps for the subcluster of potentially irrigated lands at the northern border of the study region reveals structures of plantations similar to viniculture. Altogether, we found no evidence contradicting our assumptions.

In accordance with Balaghi et al. (2008), our results suggest a high forecast potential of up to 1.5 months for land management, e.g. regarding crop yield. Also, Bolton and Friedl (2013) state that NDVI (among other VIs) at 65 to 80 days after green up (start of growing season) correlates best with crop yields (maize, soybeans) in semiarid regions. However, crop yield is not only a consequence of photosynthetically active and green vegetation as represented by the NDVI (Jackson and Huete 1991). Crop yield has to be seen as an end-of-season observation that also integrates the cumulative effect of processes such as nutrient deficiency, insect infestation or disease over the entire season (Pinter et al. 2003). These processes are often not well related to the mean NDVI when senescence of greenness sets in naturally and e.g. leaves begin to lose chlorophyll. Thus, we cannot assume, based on the regressions, that higher than normal precipitation at the beginning of the growing season, followed by relatively high mean NDVI, is directly linked to higher crop yields. Precipitation at the beginning of the growing season is therefore a necessity for high crop yields, but it cannot sufficiently explain the total crop yield at the end of the growing season. An assumption of this kind is feasible only in the opposite case, when precipitation at the beginning of the growing season (green up) is below normal. In this case, a decrease in crop yields is more likely. Keeping this in mind, our results have a high relevance for land use management, e.g. to decrease the vulnerability of agricultural land in the case of droughts. The forecast potential for crop yields within the study region should be further investigated applying NDVI, re-analysed precipitation and other re-analysed weather data for better parameterisation of crop yield models (Pinter et al. 2003; Seiler et al. 2007).

6 Conclusions

This study used re-analysed precipitation data for a region north of the Sahel and south of the Mediterranean Sea, which is rarely the focus of other studies in this context. We applied a new comprehensive dataset (NwAR) at a very high spatiotemporal resolution (2 km, hourly). This is a major step ahead of other studies, which apply data from station measurements, estimates based on remote sensing or data based on interpolation at lower spatial resolutions.

The applied continuous NDVI data allowed us to produce comprehensive and consistent land cover products for each year over a large region (~37,600 km²). These products are of high value for vegetation analyses, because land cover/land use products are normally limited in their spatial or temporal dimension. The application of 16-day NDVI data ensures continuous input and the quality of products.

First of all, we conclude that the NwAR precipitation reproduces the measured precipitation of four validation stations sufficiently accurately over a decade (r² = 0.78 on a 3-monthly, seasonal basis), although not every exact detail in terms of extent and timing is reproduced.

Secondly, we show that a statistical association exists between NwAR precipitation and mean NDVI in northwest Morocco within the decade 2001–2010. However, the influence of precipitation on mean NDVI depends upon the temporal sequence in which precipitation occurs. This confirms the findings of Wang et al. (2001). We show that for vegetation in general (independently of land cover types), 73 % of the variance in mean NDVI can be explained by the variance in precipitation between the middle of November and end of December. Thus, the explained variance based on a specific timeframe (November–December) of precipitation tends to be higher than the explained variance over the entire hydrological year.

Thirdly, land cover type specific results show that 75 % of the variance in mean NDVI of the land cover type high-productive vegetation can be explained by the variance in precipitation between the beginning of September and end of December. This is especially important for agricultural lands. The variance in precipitation in the same timeframe explains 66 % of the variance in mean NDVI of the land cover type low-productive vegetation. Moreover, the variance in precipitation between the beginning of November and end of December has to be emphasised for these two land cover types. For the land cover type sparsely vegetated, 62 % of the variance in mean NDVI can be explained by the variance in precipitation between the beginning of November and middle of March. Results from the land cover type forest show the best explained variance in mean NDVI by the variance in precipitation between the beginning of October and end of the hydrological year (r² = 0.48). This is the lowest significant explained variance in mean NDVI of all land cover types and can be explained by the perennial character of vegetation in forests.

Fourthly, we conclude that the variance in precipitation in November is especially critical for mean NDVI. Lower ranking but also of high influence is the variance in precipitation in December.

Fifthly, we conclude that spatially distributed precipitation data allow potentially irrigated areas to be identified. We find roughly 14 % (~5300 km²) of the study region to be potentially irrigated. The applied approach therefore opens up potential for monitoring and should be explored in more depth in further analyses.

Sixthly, spatially distributed results show that 61 % of the study region has a high explained variance in mean NDVI (r² > 0.6). The higher the explained variance, the higher is the share of the land cover type high-productive vegetation in these areas. We can conclude that, despite the irrigated areas, a considerable part of the agricultural areas in the study region is rain-fed. With respect to efficient resource management strategies (e.g. food security, water management), this point is of central importance.

Seventhly, a wide-ranging applicability is possible since the method of our study can be applied independently of our study region and of the usual land use/land cover types.

In all, the re-analysed data of high spatiotemporal resolution enable a new quality of analysis that is valuable, e.g. for monitoring, policy decisions, regulatory actions and land-use activities (Lunetta et al. 2006). In particular, the identification of land cover that is very sensitive to precipitation has great value for risk management (e.g. floods, droughts, fire). In terms of agricultural management and export goods, our methods are valuable because the sensitivities of land areas are better understood.