Introduction

Global food demand is increasing as a result of demographic growth, economic progress, and increasing urbanization (Van Dijk et al., 2021); thus, boosting the production of quality crops is a major challenge for producers worldwide. Crop micronutrient and water balance is one of the most significant determinants of the success of any farming business. Water content is closely tied to the physiological activity of plants. Efficient water use by plants can greatly enhance the economic returns and environmental sustainability of agricultural businesses by achieving optimum yields and increasing profits while reducing unnecessary input costs.

Remote sensing is the science of acquiring information about surfaces and/or materials from a distance, typically via remote sensors, either ground-based or on-board satellites and aircraft, which measure reflected or emitted energy. Remote sensing at the canopy scale is currently employed to measure vegetation greenness and health under specific climatic conditions via several indices: Ratio Vegetation Index (RVI) (Tian et al., 2014), Optimized Soil Adjusted Vegetation Index (OSAVI) (Fern et al., 2018), Normalized Difference Vegetation Index (NDVI) (Matas-Granados et al., 2022), etc.

To avoid or minimize the damaging impacts of drought, a number of techniques have been introduced for drought monitoring and forecasting. Among these, remote sensing is the most promising technology (AghaKouchak et al., 2015). In remote sensing, the indicators used in drought surveillance can be categorized into four types: greenness anomaly, temperature anomaly, fluorescence anomaly, and moisture anomaly (Sun et al., 2023).

Imaging sensors are at the center of any remote sensing application and have variable temporal, spectral, and spatial resolutions; moreover, they are categorized according to their spectral response and provide the key remote sensing bands (Abdullah et al., 2023). Hyperspectral remote sensing can accurately detect spectral variations in plant cover, increasing the precision of identification of vegetation units and minimizing errors in observations of vegetation cover (Yang & Du, 2021).

Spectroscopy techniques, including those in the visible and NIR range, are high-precision analytical methods with applications in fields as diverse as mineralogy (Baliyan et al., 2021), biomedicine (Chaudhary et al., 2022), agriculture (Zahir et al., 2022), and ecology, to name a few. They are also used to validate data from drones, aircraft, and satellites. Spectral data of vegetation thus provide a wide range of information about how plants react to nutrients, light, and water (Schweiger, 2020).

Previous studies have predicted the leaf water content of different plants from spectral reflectance (Dzikiti et al., 2010; Gao et al., 2022; Wang et al., 2020). Furthermore, several studies (Neto et al., 2017; Rallo et al., 2014) have taken into account all the spectral information using multivariate statistical analysis methods, such as partial least squares regression (PLSR), in order to take full advantage of a greater number of spectral bands and better predict plant parameters.

Spectral measurements of vegetation using the sun as a light source can be problematic (Anderson et al., 2006), especially under unstable atmospheric conditions, requiring frequent collection of reference data by the same or another spectrometer (Herrman et al., 2019). Indeed, the spectral signatures can be subject to errors due to surrounding factors. Wendel and Underwood (2017) have reported that collecting data under natural light and cloud cover has remarkable effects on the spectral response. The study conducted by Upadhyay et al. (2020) has revised and optimized techniques for collecting spectral data of vegetation using a handheld spectroscopic sensor and, thus, has concluded that the spectral data can be impacted by many factors such as light intensity, number of spectral shots per plant, height measurement between the sensor and the sample, and influence of temperature due to sun, wind, and plant humidity. For this reason, the quality of the spectral data is decisive for the efficiency of the quantitative analysis of the spectra.

Leaf water content has been widely studied by subjecting vegetation to water stress or by studying the effects of drought. In the present study, the water content was predicted from plants under progressive watering and correlated with key wavelengths using PLSR models.

Problem Statement

Drought poses a serious threat to both the food and environmental security of numerous regions around the world. Various types of drought exist, such as ecological, hydrological, agricultural, and meteorological drought (Crausbay et al., 2020). In Mediterranean ecosystems, under long dry periods and limited water availability, the water deficit between the soil and the plant is the principal environmental factor impacting farming yields. Water stress mostly occurs during drought and is expressed by visible changes in the color and structure of plants. One method of assessing a plant’s water status is to determine its leaf water content (LWC). The reference method involves measuring the mass of leaves in different hydration states, which requires cutting the leaf and increases the complexity of the procedure; thus, the leaf water content cannot be quantified in vivo (Serrano-Finetti et al., 2023).

Irrigation scheduling based on plants has important implications for the precise management of water resources in agriculture. The first approach to vegetation water management is the development of a non-invasive method that saves time and provides reliable scientific information. Our objective in this paper is to monitor and assess the spectral behavior of Rosmarinus officinalis under progressive watering conditions deploying a non-destructive approach combining hyperspectral remote sensing with PLSR and, consequently, to develop a PLSR model that is able to estimate the water status of rosemary leaves in specific wavelengths based on chemometric techniques.

Materials and Methods

Field experiments were conducted during September 2023 at the Faculty of Sciences and Technology, Tangier (43° 18′ 8″ N, 89° 20′ 8″ W). The species studied is the medicinal and aromatic plant Rosmarinus officinalis (Table 1).

Table 1 Rosmarinus officinalis description in the field

In this experiment, we explored changes in the reflectance pattern that could be caused by water applied 1 and 24 h before each spectral response recording. Reflectance was recorded from rosemary leaves watered uniformly under natural conditions. The hypothesis of this experiment is that the spectral response of rosemary would be similar for each configuration, if watering does not affect the plant’s reflectance properties. The spectral signatures of the plant were collected in five watering conditions (Table 2). Data were acquired for the 5 conditions at 11:30 in the morning. Rosemary is one of the most drought-tolerant plants, with an annual water consumption of 4500 to 5500 m3/ha during the growing season (Abbaszadeh et al., 2020). On this basis, we estimated the quantity of water required for each watering at 110–130 ml, bearing in mind that the plant surface area considered was 900 cm2. The climatic conditions on the data collection days are detailed in Table 3. Our theory is that any meaningful difference in rosemary leaf spectral signature, taken under different watering conditions, is a consequence of water absorption.
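The per-watering volume quoted above can be checked with a short unit conversion. The sketch below is only a plausibility check: it assumes that the annual demand is spread evenly over 365 daily waterings, which the text does not state, and the function name and constants are illustrative.

```python
# Convert an annual per-hectare water demand into a per-watering volume
# for a small plot. ASSUMPTION: the annual demand is spread evenly over
# 365 daily waterings; the text does not say how it was apportioned.

CM2_PER_HA = 1e8   # 1 ha = 10,000 m^2 = 1e8 cm^2
ML_PER_M3 = 1e6    # 1 m^3 = 1,000 L = 1e6 mL


def volume_per_watering_ml(annual_m3_per_ha, area_cm2, waterings_per_year=365):
    """Return the water volume (mL) per watering for a plot of area_cm2."""
    area_ha = area_cm2 / CM2_PER_HA
    annual_ml = annual_m3_per_ha * area_ha * ML_PER_M3
    return annual_ml / waterings_per_year


low = volume_per_watering_ml(4500, 900)   # lower bound of annual demand
high = volume_per_watering_ml(5500, 900)  # upper bound of annual demand
print(f"{low:.0f} to {high:.0f} mL per watering")
```

Under this assumption the result is about 111 to 136 mL, which is of the same order as the 110–130 ml range used in the experiment.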

Table 2 Description of the five watering conditions
Table 3 Weather conditions on the five days of the study

Spectral Data Acquisition and Leaf Humidity Measurements

All spectral data were collected using the ASD FieldSpec HandHeld spectrometer (Analytical Spectral Devices, Inc., Boulder, CO, USA), which has a spectral resolution in the visible and near-infrared (VNIR) of approximately 3 nm (at around 700 nm), defined as the full width at half maximum of a single emission line.

A white reference standard with a reflectance of around 100% was used for calibration prior to the measurement of each sample, so that the sample and white reference spectra shared the same illumination geometry. The dark current (DC) is the electric current generated by thermal electrons, which is always added to that generated by the incoming photons. To obtain accurate data, the DC of each channel must be subtracted from the total signal; for this reason, the subtraction was done prior to each sampling.
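The role of the white reference and the dark current can be illustrated with the usual ratio of dark-corrected signals. This is a generic sketch, not the instrument software's actual implementation; the function name, the `white_reflectance` parameter, and the raw counts are hypothetical.

```python
def reflectance(sample_counts, white_counts, dark_counts, white_reflectance=1.0):
    """Per-channel reflectance from raw detector counts.

    The dark current is subtracted from both the sample and the
    white-reference readings before taking their ratio.
    """
    if not (len(sample_counts) == len(white_counts) == len(dark_counts)):
        raise ValueError("all spectra must have the same number of channels")
    result = []
    for s, w, d in zip(sample_counts, white_counts, dark_counts):
        denom = w - d
        if denom <= 0:
            raise ValueError("white reference must exceed dark current")
        result.append(white_reflectance * (s - d) / denom)
    return result


# Hypothetical raw counts for three channels (illustrative values only)
sample = [1100.0, 2600.0, 4100.0]
white = [5100.0, 5100.0, 5100.0]
dark = [100.0, 100.0, 100.0]
refl = reflectance(sample, white, dark)
print(refl)
```

If the dark current were left in, every channel would be biased toward the white reference, and the bias would be largest where the sample signal is weakest.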

All data were collected and processed using the FieldSpec® Dual data collection software RS3 provided with the spectrometer by Analytical Spectral Devices, Inc., Boulder, CO, USA. Upadhyay et al. (2020) reported an optimum of 30 spectral readings for the same plant sample. Thus, in order to optimize the measurements obtained, the average of 30 reflectance spectra was calculated for each sample.

Finally, 4200 spectra (28 samples × 5 conditions × 30 spectral readings) were collected, and 140 spectral signatures representative of 140 samples (28 samples × 5 conditions) were used for the regression analysis. The spectral range was between 325 and 1075 nm.
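The averaging step and the spectrum counts can be sketched as follows. The toy readings are hypothetical, and the helper name `mean_spectrum` is an illustration rather than part of the acquisition software.

```python
def mean_spectrum(readings):
    """Average replicate spectra (equal-length lists) channel by channel."""
    if not readings:
        raise ValueError("need at least one reading")
    n_channels = len(readings[0])
    if any(len(r) != n_channels for r in readings):
        raise ValueError("all readings must have the same number of channels")
    n = len(readings)
    return [sum(r[i] for r in readings) / n for i in range(n_channels)]


# Counts taken from the text: 28 samples, 5 conditions, 30 readings each
N_SAMPLES, N_CONDITIONS, N_READINGS = 28, 5, 30
total_spectra = N_SAMPLES * N_CONDITIONS * N_READINGS   # raw spectra
total_signatures = N_SAMPLES * N_CONDITIONS             # averaged signatures

# Hypothetical toy example: 3 readings of a 4-channel spectrum
toy = [
    [0.10, 0.20, 0.40, 0.50],
    [0.12, 0.22, 0.38, 0.52],
    [0.08, 0.18, 0.42, 0.48],
]
avg = mean_spectrum(toy)
print(total_spectra, total_signatures, avg)
```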

To measure the LWC, the leaves were cut at the base of the petiole and weighed on an analytical scale (Kern Serie 770 14). The fresh leaves were then placed in dark paper bags and dried for 24 h at 70 °C in a drying oven (manufactured by Binder: model E 28/drying and heating ovens with mechanical adjustment) until a constant mass was obtained (Fig. 1). Each leaf sample was re-weighed after drying, and the water content of the leaf was calculated according to Eq. (1) (Neto et al., 2017):

$$\text{LWC}= \frac{{L}_{\text{fresh}} - {L}_{\text{dry}}}{{L}_{\text{dry}}}$$
(1)

where \({L}_{\text{dry}}\) is the leaf dry mass and \({L}_{\text{fresh}}\) is the leaf fresh mass, both in grams.
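As a quick illustration of Eq. (1), the following sketch computes the dry-basis water content for one leaf sample. The masses are hypothetical values chosen only to show the arithmetic.

```python
def leaf_water_content(fresh_mass_g, dry_mass_g):
    """Leaf water content on a dry-mass basis, Eq. (1): (fresh - dry) / dry."""
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    if fresh_mass_g < dry_mass_g:
        raise ValueError("fresh mass cannot be smaller than dry mass")
    return (fresh_mass_g - dry_mass_g) / dry_mass_g


# Hypothetical masses in grams (not measured values from the study)
lwc = leaf_water_content(fresh_mass_g=0.50, dry_mass_g=0.20)
print(round(lwc, 3))
```

Because the denominator is the dry mass, the value is not bounded by 1: a leaf holding more water than dry matter gives an LWC above 1.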

Fig. 1 Data collection steps in the experiment

Preprocessing Techniques

In visible and NIR spectroscopy, the ideal conditions of the Beer-Lambert law are not met, because the measured spectrum is distorted by many phenomena such as scattering, temperature, and sample absorption. Spectra also generally contain random noise. Chemometrics was therefore needed to enhance the signal differences and retrieve valuable information. For this reason, data were preprocessed with the Savitzky-Golay (SG) filter, spectral first derivative (SFD), and standard normal variate (SNV).

All preprocessing techniques were applied using the “prospectr” package in R version 4.3 (released under the GNU General Public License). This package is very practical for chemometrics and signal processing, as it contains a variety of tools for the preprocessing and sample selection of spectral data.
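To make the three preprocessing steps concrete, here is a minimal standard-library Python sketch. It is not the prospectr implementation used in the study. The 5-point window and quadratic polynomial are assumed settings, since the filter parameters are not reported, and the derivative is expressed per band step.

```python
import math

# Savitzky-Golay coefficients for a 5-point window and a quadratic
# polynomial (ASSUMED settings; the text does not report them).
SG_SMOOTH_5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]
SG_DERIV1_5 = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]


def apply_filter(spectrum, coeffs):
    """Apply a centred filter; edge points without a full window are dropped."""
    half = len(coeffs) // 2
    return [
        sum(c * spectrum[i + k - half] for k, c in enumerate(coeffs))
        for i in range(half, len(spectrum) - half)
    ]


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it by its own SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    if sd == 0:
        raise ValueError("constant spectrum cannot be SNV-scaled")
    return [(x - mean) / sd for x in spectrum]


# Hypothetical spectrum: a pure quadratic, which a quadratic SG filter
# should reproduce exactly (smoothing) and differentiate exactly (SFD).
toy = [float(x * x) for x in range(7)]
smoothed = apply_filter(toy, SG_SMOOTH_5)
first_derivative = apply_filter(toy, SG_DERIV1_5)
snv_then_sg = apply_filter(snv(toy), SG_SMOOTH_5)  # SNV followed by SG
print(smoothed, first_derivative)
```

SNV works on each spectrum independently to reduce multiplicative scatter effects, whereas the SG filter works along the wavelength axis, so the two steps address different sources of distortion.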

Building the Regression Model

PLSR analysis was performed to relate the rosemary leaf spectral signatures to the water content. The models extract latent variables from the independent (predictor) variables, represented here by the rosemary leaf spectra, that best predict the dependent (response) variable, which is the leaf water content. Seven PLSR models were developed, one based on raw data and six on different preprocessing techniques. After preprocessing, the spectral data for each condition were divided into calibration and cross-validation sets: 75% of the spectral data were randomly selected as the training set (105 samples) and 25% for cross-validation (35 samples). It is essential to include a validation dataset, evaluated through the RMSE, as a supervised criterion for choosing the optimal number of latent variables in the model.
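The random 75/25 partition can be sketched as below. The seed and the function name are arbitrary illustrative choices. For simplicity the sketch splits all 140 signatures at once, whereas the study describes the selection per condition.

```python
import random


def split_calibration_validation(n_samples, calibration_fraction=0.75, seed=0):
    """Randomly split sample indices into calibration and validation sets."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(indices)
    n_cal = round(n_samples * calibration_fraction)
    return sorted(indices[:n_cal]), sorted(indices[n_cal:])


cal_idx, val_idx = split_calibration_validation(140)
print(len(cal_idx), len(val_idx))
```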

Tian et al. (2001) have reported that the full spectrum gave better results for PLS models than a reduced spectrum; for this reason, the models were developed using the full spectral range. Spectral data present collinearity and replicates; thus, leave-one-out cross-validation was carried out to assess the performance of the PLSR models. The performance of the PLSR models was assessed using the following statistical indicators: coefficient of determination (R2), root-mean-square error of calibration (RMSEC), root-mean-square error of cross-validation (RMSECV), the RPD, and bias. The RPD is the ratio of performance to deviation, that is, the ratio between the standard deviation of the variable and the standard error of prediction of this variable by the PLSR model; it is used to assess the accuracy of the prediction in NIR spectroscopy models (Nie et al., 2009). Bias is an error occurring when certain characteristics or aspects of a dataset are accorded more importance within the model than others. The best models should exhibit lower RMSE values and higher R2 values. According to Cheng et al. (2017), the model is excellent when the R2 value is greater than 0.91.
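The indicators listed above can be computed from reference and predicted values as in the following sketch. Two conventions are assumptions here, since the formulas are not given in the text: bias is taken as the mean of predicted minus reference values, and the RPD uses the RMSE as an approximation of the standard error of prediction. The data are hypothetical.

```python
import math


def rmse(y_ref, y_pred):
    """Root-mean-square error between reference and predicted values."""
    n = len(y_ref)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(y_ref, y_pred)) / n)


def bias(y_ref, y_pred):
    """Mean signed error (ASSUMED convention: predicted minus reference)."""
    return sum(p - r for r, p in zip(y_ref, y_pred)) / len(y_ref)


def r_squared(y_ref, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(y_ref) / len(y_ref)
    ss_res = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    ss_tot = sum((r - mean_ref) ** 2 for r in y_ref)
    return 1.0 - ss_res / ss_tot


def rpd(y_ref, y_pred):
    """Ratio of the reference SD to the prediction error (RMSE used here)."""
    n = len(y_ref)
    mean_ref = sum(y_ref) / n
    sd = math.sqrt(sum((r - mean_ref) ** 2 for r in y_ref) / (n - 1))
    return sd / rmse(y_ref, y_pred)


# Hypothetical reference and predicted values (not data from the study)
y_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
print(rmse(y_ref, y_pred), bias(y_ref, y_pred),
      r_squared(y_ref, y_pred), rpd(y_ref, y_pred))
```

Applying the same functions to the calibration predictions gives RMSEC, and applying them to the cross-validated predictions gives RMSECV.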

The variable importance in projection (VIP) extraction method was used to select key wavelengths in LWC prediction. VIP scores are computed following Eq. (2) (Farrés et al., 2015). Each score is calculated as the weighted sum of the squared weights between the PLSR latent variables and the original variable, taking into consideration the amount of variance of “y” explained by each extracted component.

$${\text{VIP}}_{j}= \sqrt{\frac{{\sum }_{f=1}^{F}{w}_{jf}^{2}\cdot {\text{SSY}}_{f}\cdot J}{{\text{SSY}}_{\text{total}}\cdot F}}$$
(2)

where \({w}_{jf}\) is the weight value for variable j and component f, \({\text{SSY}}_{f}\) is the sum of squares of explained variance for the fth component, J is the number of X variables, \({\text{SSY}}_{\text{total}}\) is the total sum of squares explained of the dependent variable, and F is the total number of components.
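A literal transcription of Eq. (2) is sketched below. It assumes that SSY_total is the sum of the per-component SSY_f values, and the weights and SSY values are hypothetical. Some formulations in the literature omit the factor F in the denominator, so scores from different implementations may not be directly comparable.

```python
import math


def vip_scores(weights, ssy):
    """VIP scores following Eq. (2) as written in the text.

    weights[j][f] is the weight of variable j on component f;
    ssy[f] is the explained sum of squares of y for component f.
    ASSUMPTION: SSY_total is the sum of the per-component SSY_f values.
    """
    n_vars = len(weights)   # J in Eq. (2)
    n_comp = len(ssy)       # F in Eq. (2)
    ssy_total = sum(ssy)
    scores = []
    for w_j in weights:
        if len(w_j) != n_comp:
            raise ValueError("each variable needs one weight per component")
        numerator = sum(w_j[f] ** 2 * ssy[f] for f in range(n_comp)) * n_vars
        scores.append(math.sqrt(numerator / (ssy_total * n_comp)))
    return scores


# Hypothetical model with 2 variables and 2 components
weights = [[0.6, 0.8], [0.8, -0.6]]
ssy = [8.0, 2.0]
vip = vip_scores(weights, ssy)
print(vip)
```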

Results and Discussion

For NIR spectroscopy, Beer-Lambert’s law states that there is a multilinear relationship between the concentrations of chemical compounds and the molecular absorptions at various wavelengths of the spectrum. In other words, there is a correlation between the degree of attenuation of light passing through a material and the chemical properties of that material.
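For reference, the multilinear relationship mentioned above is commonly written as follows. The symbols are not defined in the text and are introduced here only for illustration: \(A(\lambda)\) is the absorbance at wavelength \(\lambda\), \({I}_{0}\) and \(I\) are the incident and transmitted intensities, \({\varepsilon }_{i}(\lambda)\) is the absorptivity of compound i, \({c}_{i}\) is its concentration, and \(l\) is the optical path length.

$$A(\lambda )={\text{log}}_{10}\frac{{I}_{0}(\lambda )}{I(\lambda )}={\sum }_{i=1}^{n}{\varepsilon }_{i}(\lambda )\cdot {c}_{i}\cdot l$$

The absorbance at each wavelength is thus a linear combination of the concentrations, which is why linear latent-variable methods such as PLSR are a natural fit when the law holds approximately.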

Figure 2 represents mean raw spectral data for the 5 watering conditions considered in this study. For condition 2—1 h after watering—the spectral curve exhibits a higher reflectance in NIR but not in the visible in comparison with the initial condition without watering. The visible region can be influenced by the presence of pigments, so the first watering probably had no effect on the production of these metabolites, but it did help to increase the plant’s water potential.

Fig. 2 Rosemary leaf spectral signatures for the five watering conditions

For condition 3, 24 h after the initial watering, we could see a clear increase in reflectance in the visible and NIR. In condition 4, reflectance increased throughout the visible and NIR wavelength range, indicating a higher concentration of pigments, especially chlorophyll, which reflects in the green (550 nm), and a higher water potential after the two waterings.

In condition 5, reflectance did not increase across the entire spectrum. Two hypotheses can explain this. First, the plant may be in a state of saturation or full turgidity, bearing in mind that rosemary does not tolerate large quantities of water because its strategy is to conserve available water by reducing stomatal conductance (Nicolás et al., 2008); in addition, once the amount of water in the soil reaches a certain limit, soil aeration becomes poor, which slows the rate of water absorption (Jain, 2018). Second, heat and wind may have caused evapotranspiration: the day corresponding to condition 4 had the highest temperature (29 °C), and we can therefore assume that the plant lost some of its water potential.

Water stress reduces chlorophyll levels, stomatal conductance, and photosynthesis. Water supply therefore impacts not only the water potential of leaves but also the production of pigments, particularly chlorophyll, which produces reflectance peaks in the green (550 nm) part of the visible spectrum. Water is a key factor in the production of chlorophyll in vegetation, playing a crucial role in several physiological functions. In this context, Liu et al. (2024) have concluded that water had a significant effect on chlorophyll and heme, which involve a shared enzymatic biosynthesis path, by reducing the precursor molecules in chlorophyll metabolism and/or increasing the degradation products. Several studies have correlated plant water potential with the near-infrared part of the spectrum (Clevers et al., 2008; Wijewardana et al., 2019). However, in this study, the results obtained showed that water supply could influence reflectance, not only in the NIR, but also in the visible, while also having an impact on pigment production in the plant.

The PLSR model has been built based on the LWC calculated following Eq. (1). Table 4 shows statistical results of leaf fresh and dry mass and LWC for each watering condition. Calibration and validation sets were chosen randomly among the 28 rosemary samples for every condition.

Table 4 Leaf water content according to watering conditions

Table 5 shows the statistical values of R2, RMSE, RPD, and the optimal number of latent variables for predicting the water content of rosemary leaves from the PLSR models. Cozzolino et al. (2011) have stated that if more than the optimal number of components is selected, overfitting can occur, and the model becomes highly dependent on the calibration data, which distorts the prediction. Beyond a certain number of latent variables, the PLSR models here showed overfitting. The RMSE curve showed clearly that after the optimal number of components the model begins to overfit, as the error for the test set rises. For this reason, only the optimal number of components, showing appropriate RMSE results, was used in the statistical calculations. Based on the model, the water content of rosemary leaves under the 5 watering conditions affected specific wavelengths.
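The selection rule described above, keeping the number of components at which the validation error is lowest, can be sketched as follows. The RMSECV curve is hypothetical and only mimics the typical shape: the error falls, reaches a minimum, then rises as the model overfits.

```python
def optimal_components(rmsecv_by_component):
    """Return the 1-based number of components with the lowest RMSECV.

    Ties are resolved in favour of the smaller (more parsimonious) model.
    """
    if not rmsecv_by_component:
        raise ValueError("need at least one RMSECV value")
    best_index = min(
        range(len(rmsecv_by_component)),
        key=lambda i: (rmsecv_by_component[i], i),
    )
    return best_index + 1


# Hypothetical RMSECV values for models with 1..8 components
# (illustrative only, not the study's results)
rmsecv_curve = [0.30, 0.22, 0.15, 0.11, 0.12, 0.14, 0.17, 0.21]
n_opt = optimal_components(rmsecv_curve)
print(n_opt)
```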

Table 5 PLSR model results for raw data and different preprocessing techniques (cal: calibration; val: validation)

In general, the leaf water prediction models showed reliable and excellent results with RPD levels above 2.5 (Zhang et al., 2015). From Table 5, we can observe that, on the one hand, the data preprocessed with a combination of Savitzky-Golay filter and spectral first derivative show the highest R2 = 0.99 and RPD = 10.09 for calibration, the lowest RMSEC = 0.018, and a low bias = −0.0016 for validation. On the other hand, the model based on SNV-SG preprocessing gives the highest R2 = 0.959 and RPD = 4.97, a low bias = −0.005 for validation, and the lowest RMSECV = 0.036. This means that the PLSR model using the SNV-SG preprocessed dataset has the highest validation performance in predicting water content in rosemary leaves. For this reason, we have considered this model for further analysis. Furthermore, the RMSEC/RMSECV plot (Fig. 3) shows that the model with 7 components has the lowest RMSE for cross-validation.
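As a consistency check on the reported figures, R2 and RPD are approximately linked when the bias is negligible and the sample is large enough that the factor n/(n − 1) can be ignored. This is a general approximation, not a formula stated by the authors. Here RMSE denotes the prediction error and SD the standard deviation of the reference values:

$${R}^{2}=1-\frac{{\text{SS}}_{\text{res}}}{{\text{SS}}_{\text{tot}}}\approx 1-{\left(\frac{\text{RMSE}}{\text{SD}}\right)}^{2}=1-\frac{1}{{\text{RPD}}^{2}}$$

For the SNV-SG validation results, RPD = 4.97 gives \(1-1/{4.97}^{2}\approx 0.9595\), in line with the reported R2 = 0.959. For the SG-SFD calibration results, RPD = 10.09 gives \(1-1/{10.09}^{2}\approx 0.990\), in line with the reported R2 = 0.99.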

Fig. 3 RMSEC/RMSECV ratio for the PLSR model

Models with an R2 in the range of 0.80 to 0.90 are particularly suitable for applications in the agricultural sector (Williams, 2001). In the external validation, the PLSR model based on SNV-SG preprocessing showed R2 = 0.959 and RPD = 4.97, suggesting that the model’s ability to predict water content was adequate and satisfactory. Indeed, Nie et al. (2009) have indicated that when the RPD is greater than 3, the prediction performance is successful for analytical objectives. In the same way, Neto et al. (2017) obtained R2 = 0.8386 and RPD = 2.35 using PLSR for leaf water content prediction in sunflower, whereas Zhang et al. (2015) used full spectra to predict soluble protein in oilseed rape leaves, and their PLSR model achieved R2 = 0.9441 and RPD = 2.98. The scatterplot of the PLSR prediction model selected for the rosemary leaf water content on the basis of its spectra is illustrated in Fig. 4.

Fig. 4 Scatter plot between predicted and reference values for rosemary leaf water content (cal: calibration; cv: cross-validation)

VIP scores above the threshold of 1 highlighted the importance of the 530–574, 606–612, 621–633, 656–693, 700–799, and 898–1028 nm spectral regions for leaf water content estimation (Fig. 5). Consistently, the lowest regression coefficients are observed at 623–632 and 918–928 nm (Fig. 6). More specifically, VIP maxima and regression coefficient minima indicate that the most sensitive narrow bands were at 628, 629, 785, 927, and 928 nm. This confirms the spectral results obtained above: the water supply caused a progressive increase in reflectance in the red and the NIR regions.
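Spectral regions such as those listed above can be derived from a VIP curve by grouping consecutive wavelengths whose score exceeds the threshold. The sketch below uses hypothetical wavelengths and VIP values; the function name is illustrative.

```python
def vip_regions(wavelengths, vip, threshold=1.0):
    """Group consecutive wavelengths with VIP above threshold into regions.

    Returns a list of (start_nm, end_nm) tuples, in spectral order.
    """
    regions = []
    start = prev = None
    for wl, score in zip(wavelengths, vip):
        if score > threshold:
            if start is None:
                start = wl      # a new region opens here
            prev = wl           # extend the current region
        elif start is not None:
            regions.append((start, prev))
            start = None
    if start is not None:       # close a region that runs to the last band
        regions.append((start, prev))
    return regions


# Hypothetical VIP scores for ten 1-nm bands (not the study's values)
bands = list(range(525, 535))
scores = [0.8, 0.9, 0.95, 0.99, 1.0, 1.2, 1.3, 1.1, 0.9, 1.05]
regions = vip_regions(bands, scores)
print(regions)
```

The comparison is strict, so a band with a score exactly equal to the threshold is not counted as important.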

Fig. 5 VIP from rosemary leaf water content PLSR model

Fig. 6 Regression coefficients from rosemary leaf water content PLSR model

These results are in line with the study of Neto et al. (2017) who have concluded that the wavelengths related to water content are in the NIR region, 990 nm specifically, whereas Rallo et al. (2014) have found that absorbance spectral bands around 1450 and 1900 nm are associated with leaf water content. Furthermore, Zahir et al. (2022) observed that spectral bands at 740 nm, 840 nm, 970 nm, 1200 nm, 1460 nm, and 1850 nm reveal information about water molecular structure and leaf water content.

Conclusion

Combining hyperspectral remote sensing with partial least squares regression and chemometrics proved highly effective in predicting and assessing the water status of rosemary leaves subjected to progressive watering. The approach can be used non-destructively to study rosemary, owing to its ability to detect parameters influencing the plant’s spectral response. Indeed, progressive watering leads to variations in spectral reflectance. Using NIR spectroscopy at the leaf level enables direct assessment of the water status, eliminating the need for costly and time-consuming plant water analysis and enabling growers to use this information for water management.

From the statistical results of the PLSR model in external validation, the preprocessing based on the combination of standard normal variate and Savitzky-Golay gave the best performance in predicting leaf water content, with R2 = 0.959, RPD = 4.97, and RMSECV = 0.036. In addition, the study demonstrated that the narrow spectral bands at 628, 629, 785, 927, and 928 nm are the most sensitive to water content.