Introduction

Hyperspectral sensing is the measurement of the spectral characteristics of materials using sensing systems with more than 60 spectral bands and spectral resolutions of less than 10 nm. This resolution can produce a continuous portion of the light spectrum that defines the chemical composition of an object through its spectral signatures (Gomez 2020). With substantial developments in recording spectral bands of electromagnetic waves, hyperspectral sensors can provide data with a large number of spectral bands because of their high resolution in the range of 350 to 2500 nm; these bands are acquired by passive optical sensors. Spectral data can be detected from any surface that reflects, absorbs, or transmits electromagnetic radiation (Hruška et al. 2018).

Hyperspectral imaging makes it possible to perform reflectance or fluorescence spectroscopy on every spatial pixel of a spectral image, thereby discerning characteristics that cannot be seen by the human eye (Robila 2004; Gomez 2020). The basic shape of a curve over the spectral range is characteristic of the parent material of the object being analyzed by spectroscopy (Liang 2004). In the visible (VIS) to near infrared (NIR) spectrum (approximately between 400 and 1100 nm), characteristics of water, soil, or plant canopy give rise to specific curvatures in the reflectance spectrum, which makes them recognizable (Liang 2004; Robila 2004).

Perhaps the biggest advantage of hyperspectral data over simpler red–green–blue (RGB) imagery and multispectral data is that hyperspectral data capture more detailed information about an object because more spectral bands are recorded. Hyperspectral acquisition devices, including sensor types, acquisition modes, and unmanned aerial vehicle (UAV)-compatible sensors, provide information that is used for both research and commercial purposes (Adão et al. 2017). Hyperspectral sensors and UAVs have been useful in many areas of study including material identification, precision agriculture (vegetative coverage, nutrition deficiencies, foliar water content, physiological disorders, etc.), environmental aspects (wetlands, hydrology, etc.), health care (medical diagnoses, food safety, food quality assessment, etc.), and many more applied fields (Adão et al. 2017; Gomez 2020).

A vegetative index (VI) is an equation that processes spectral data for the purpose of determining information about plant health. Vegetation indices (VIs) derived from hyperspectral signatures can provide an estimation and analysis of several plant characteristics, such as biophysical, physiological, or even biochemical parameters in crops, including leaf chlorophyll content (LCC), leaf water content (LWC), leaf area index (LAI), fractional photosynthetically active radiation (FPAR) absorbed by a canopy, surface roughness, and phenology, which are some of the most important inputs to land surface process models (Liang 2004; Adão et al. 2017; Morcillo-Pallarés et al. 2019). These VIs can be applied in regression models to help estimate plant status, such as foliar mineral contents.

Given the importance of nitrogen for yield efficiency and crop health, the application of hyperspectral signatures to prevent nitrogen deficiencies in the field has become widespread. Hence, much research has been conducted using remote sensing and applying hyperspectral signatures to determine crop nitrogen deficiency, required rates of fertilizers to increase crop production, or even the amount of nitrogen uptake by plants to improve agricultural production and yield efficacy (Maes and Steppe 2019). DeOliveira et al. (2017) applied selected vegetation indices to estimate foliar N concentration in three Eucalyptus tree clones grown in the field. Liu et al. (2016) applied multiple linear regression and neural network analysis to find a relationship between the leaf nitrogen content of field grown winter wheat and vegetative indices in narrow bands. Other studies have used hyperspectral indices to check the nutrition status of sodium and potassium content in grass (Capolupo et al. 2015), potassium deficiency level in canola (Severtson et al. 2016), nitrogen concentration in field grown oat (Van Der Meij et al. 2017), corn (Gabriel et al. 2017), rice (Wen et al. 2018), and wheat (Zhu et al. 2018), and leaf N, P, K, Ca, Mg, and a few micronutrients of corn and soybean plants (Pandey et al. 2017).

Little-leaf mockorange (Philadelphus microphyllus A. Gray) is a species from the Hydrangeaceae family. This species is a shrub native to the western United States (California, Colorado, Utah, Nevada, Wyoming, Arizona, Texas, and New Mexico) and grows on arid rocky slopes, cliffs, or in pinyon-juniper to coniferous woods (Gardenia 2019; Lady Bird Johnson Wildflower Center 2015; Khajehyar et al. 2024). Species within the mockorange genus have historically been propagated by seeds, summer softwood cuttings, hardwood cuttings, and layering (Dirr and Heuser 2006), but little-leaf mockorange can be difficult to propagate as ex vitro cuttings and fails to breed true from seed (Khajehyar et al. 2024; Steve Love, University of Idaho, personal communication), meaning a more efficacious propagation system, such as micropropagation, would be advantageous. To our knowledge, no other Philadelphus species has been put into tissue culture to date, which makes little-leaf mockorange the first member of the genus to be cultured for asexual plant production. The species was chosen because a nursery in Idaho was interested in mass production of the selected plant. Axillary shoot proliferation is easier to complete than most other in vitro procedures for rapid clonal reproduction of this species, particularly since the technique takes advantage of axillary bud production on its stems. For these reasons, this species served as a test case to determine whether hyperspectral analysis could help identify the proper nutrient levels for the culture medium of a species being put into culture for the first time. If hyperspectral analysis can be used successfully for this species, then other plant species that have higher economic production in tissue culture could be studied.

Establishing axillary shoot cultures in vitro may require adjusting the nutrient medium components to optimize desirable shoot growth of the new species. Finding the optimum concentration of each component is critical and requires time and money. Estimating an explant’s foliar mineral status to check its health is important for attaining optimal in vitro growth. Usually, destructive methods are applied to estimate foliar mineral contents, especially for tissue cultured plants. Nondestructive methods, such as applying hyperspectral signatures, can help growers reduce their production costs and save time.

To date, reports on using hyperspectral devices and hyperspectral vegetation indices in tissue culture environments are lacking. To check the feasibility of using this technology to evaluate the mineral content of tissue cultured little-leaf mockorange shoots, we used a spectroradiometer during the shoot proliferation stage of micropropagation to determine if hyperspectral imaging could help in estimating the nutrition status of the explants. If hyperspectral imaging is successful, it can help tissue culture plant producers save money by avoiding destructive sampling for foliar nutrient analysis and save the time spent waiting for nutrient analyses to be completed.

Materials and methods

Plant materials and tissue culture

Stems from the selected little-leaf mockorange (Philadelphus microphyllus A. Gray) plant were established as axillary shoot cultures as described elsewhere (Khajehyar et al. 2024). Shoot cultures were subcultured monthly for 6 months until the shoots were acclimated to in vitro conditions. Stable shoot cultures were used in all experiments.

Philadelphus microphyllus stems from stable shoot cultures were subcultured and grown on half-strength Murashige and Skoog (½ MS) medium (Murashige and Skoog 1962) supplemented with different cytokinins (all purchased from PhytoTech Laboratories, Inc., Lenexa, KS), such as zeatin (Zea, product ID: Z125), kinetin (Kin, Product ID: K750), benzylaminopurine (BA, Product ID: B800), meta-Topolin (MT, product ID: T841), thidiazuron (TDZ, Product ID: T888), or dimethylallylamino purine (2iP, Product ID: D217) (each used at concentrations of 0, 1.1, 2.2, 4.4, or 8.8 µM in separate experiments), or different concentrations of minerals such as N (0, 15, 22.5, 30, 37.5, 45, or 60 mM), or Fe (0, 0.5, 5, 25, 50, 75, 100, or 500 µM). Iron was tested in the culture media since it is an essential and often limiting micronutrient. The cytokinin applied in the culture media for the mineral experiments was 1.1 mM zeatin. Six stem explants (per jar) were placed on the culture medium in 195 ml culture vessels (baby food jars) filled with 40 ml ½ MS medium containing 0.5 mg·L−1 thiamine-HCl, 0.25 mg·L−1 nicotinic acid, 0.25 mg·L−1 pyridoxine–HCl, 1 mg·L−1 glycine, and 0.05 g·L−1 myo-inositol, with pH = 5.6. Four replicate jars were used per treatment (different PGRs or minerals at each concentration used). Cultures were incubated in a SG-30S germinator (Hoffman Manufacturing Inc., Albany, OR) at 25 ± 1 °C under a 16-h photoperiod (cool-white fluorescent lamps), with 38 μmol·m−2·s−1 photosynthetic photon flux (PPF), for 8 weeks with one subculture onto the fresh media after the 4th week. The fresh media contained the same concentrations of cytokinin, N, or Fe and were made 1 day before subculturing. At the end of week eight, explants were harvested for collection of growth data and measurement of hyperspectral signatures.

Preparing the spectroradiometer and taking readings

For this research, we used either an Analytical Spectral Devices FieldSpec 4 High-Resolution spectroradiometer (Malvern Panalytical Ltd., Westborough, MA, USA) or an Analytical Spectral Devices FieldSpec HandHeld-2 spectroradiometer (Analytical Spectral Devices Company, Boulder, CO, USA) (Supplementary Fig. 1). After 30 min of spectroradiometer warm up, the device was optimized and calibrated with a Spectralon® 99% white reference panel (Labsphere Inc., North Sutton, NH, USA). During calibration, 100 dark current measurements were averaged, and an average of 50 scans of the Spectralon® white reference was measured every two minutes (Beck 2019). Target reference recordings displayed an average of 20 scans at an optimized integration time of approximately 1 s.

Shoots from all cytokinin, N, and Fe experiments at the various concentrations (Supplementary Fig. 2) were analyzed for their hyperspectral reflectance and then for their shoot mineral contents.

Reflectance readings of mockorange shoots were made immediately (within 2 min) after the shoots were taken out of the jar (Supplementary Fig. 3). Measurements were completed in a dark room on a black-colored bench to exclude external light. The probe was held about 5 to 10 cm over the explants, and measurements were taken on all six shoots that were grown within each culture jar. Three replicate readings were recorded for the shoots grown in each jar in order to reduce error effects, and four jars were read per treatment (PGR or mineral at each concentration), which resulted in a total of 12 hyperspectral readings for each treatment. After every 10 to 12 readings, a new calibration was completed to reduce the error from external white light. All measurements were acquired using RS3 software version 6.4 (Malvern Panalytical Ltd., Westborough, MA, USA).

Reflectance spectral data represented the full range of VIS, NIR, and short wave infrared (SWIR) light between 350 and 2500 nm, with a resolution of 1 nm. The spectral sampling interval was automatically interpolated from 1.4 nm to 1 nm at the time of each individual measurement by RS3 software, so a single value for each wavelength from 350 to 2500 nm was recorded (Beck 2019). Data were exported by the ViewSpec Pro software version 6.2 (Malvern Panalytical Ltd., Westborough, MA, USA). The average of three readings of the reflectance from the group of six explants (per jar) was used to create a single treatment reflectance spectrum for each jar of shoots.
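The averaging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure performed by the RS3 or ViewSpec Pro software: the function name `jar_spectrum` and the synthetic readings are hypothetical, and only the 350 to 2500 nm grid at 1 nm spacing (2151 values) comes from the text.

```python
import numpy as np

# Wavelength grid from the text: 350-2500 nm at 1 nm steps gives 2151 values.
wavelengths = np.arange(350, 2501)

def jar_spectrum(readings):
    """Average repeated readings (one per row) into a single spectrum per jar."""
    readings = np.asarray(readings, dtype=float)
    if readings.ndim != 2 or readings.shape[1] != wavelengths.size:
        raise ValueError("expected shape (n_readings, 2151)")
    return readings.mean(axis=0)

# Synthetic stand-in for three readings of one jar (not measured data).
rng = np.random.default_rng(0)
readings = 0.4 + 0.01 * rng.standard_normal((3, wavelengths.size))
spectrum = jar_spectrum(readings)
```

Averaging along the reading axis keeps one value per wavelength, so each jar contributes a single row to the later regression data set.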

Tissue analysis for mineral content

After the hyperspectral reflectance was recorded, the shoots were separated from the agar medium, placed in an envelope, and dried at 70 °C for 72 h. Dried shoots were ground using a mortar and pestle. Dried tissues were sent to a tissue analysis lab (Brookside Laboratories, Inc., New Bremen, OH) for foliar nutrient analysis. Total N content was estimated by a combustion method using a Carlo Erba 1500 C/N analyzer (method B2.20, Miller et al. 2013). For Ca, samples were digested with nitric acid and hydrogen peroxide in a closed Teflon vessel in a CEM Mars Microwave and analyzed on a Thermo 6500 Duo ICP (method B4.30, Miller et al. 2013). Results from foliar analyses were used for correlation model training with the hyperspectral signatures (Supplementary Tables 1 and 2).

Hyperspectral data analysis

Preprocessing the spectral signatures was the first step in hyperspectral dataset analysis, particularly for spectra collected by the spectrometer. To further reduce noise, spectra were preprocessed with a Savitzky–Golay smoothing filter (window size = 5 and polynomial order = 4) (Ge et al. 2019). The order and window size were selected by trial and error, with the goal of smoothing out only the large fluctuations in a signature.
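A Savitzky–Golay filter fits a low-order polynomial to each sliding window by least squares and keeps the fitted value at the window centre. The sketch below is a NumPy-only illustration, not the authors' implementation: the function names, the edge padding, and the synthetic curve are assumptions (`scipy.signal.savgol_filter` with `window_length` and `polyorder` offers a library version). One property is worth knowing when choosing settings: a fourth-order polynomial passes exactly through any five points, so with window = 5 and order = 4 this sketch returns the input unchanged, and a wider window or lower order (hypothetical values shown) is needed for visible smoothing.

```python
import numpy as np

def savgol_kernel(window, order):
    """Weights that return the fitted polynomial's value at the window centre."""
    if window % 2 == 0 or order >= window:
        raise ValueError("window must be odd and larger than the polynomial order")
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    design = np.vander(offsets, order + 1, increasing=True)  # columns 1, x, x^2, ...
    # Row 0 of the pseudo-inverse maps the window values to the constant term,
    # which is the polynomial evaluated at offset 0 (the window centre).
    return np.linalg.pinv(design)[0]

def savgol_smooth(reflectance, window=5, order=4):
    y = np.asarray(reflectance, dtype=float)
    half = window // 2
    padded = np.pad(y, half, mode="edge")  # simplification: repeat the end values
    kernel = savgol_kernel(window, order)
    # np.convolve flips its second argument, so pass the reversed kernel.
    return np.convolve(padded, kernel[::-1], mode="valid")

# Synthetic noisy curve standing in for a spectrum (not measured data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 200)
clean = np.sin(x)
noisy = clean + 0.05 * rng.standard_normal(x.size)

as_reported = savgol_smooth(noisy, window=5, order=4)  # settings stated in the text
wider = savgol_smooth(noisy, window=11, order=2)       # hypothetical alternative
```

Comparing `as_reported` and `wider` against `clean` shows how strongly the window and order control the amount of smoothing.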

The success of developing regression models is contingent upon the number of features assigned to the feature space (Zhao et al. 2019). Apart from the reflectance value at each wavelength, the hyperspectral dataset was used to extract spectral indices and geometric features from continuum removal regions. The number of features used for regression becomes even more critical when hyperspectral datasets are used, because the large number of spectral bands makes it difficult to determine whether spectral bands, spectral vegetation indices generated from those bands, or both are associated with foliar chemical or physiological status, in this case leaf mineral content. To address this question, related features (explained in the following sections) were extracted from the spectra, and feature selection approaches were then applied to train the model with fewer but more informative features.

Spectral indices

Spectral indices, defined by mathematical operations between two or more spectral bands, are also widely used for feature extraction in remote sensing (Lu et al. 2020). Many spectral indices used in agricultural applications are suited to the specific purpose of plant monitoring. In this study, some commonly used spectral indices for mineral estimation were selected (Table 1).

Table 1 The highest correlated vegetation indices determined by using hyperspectral imaging in this study. Formula calculations were obtained from Anonymous, Index Database 2011
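As an illustration of how an index is computed from band reflectances, the sketch below evaluates a generic normalized difference and the Cellulose Absorption Index on a spectrum sampled at 1 nm from 350 to 2500 nm. The CAI expression is the commonly cited form, 0.5 × (R2000 + R2200) − R2100, and is an assumption here because the formulas of Table 1 are not reproduced in this text; the helper names, the 800/670 nm band pair, and the synthetic spectrum are hypothetical.

```python
import numpy as np

WAVELENGTHS = np.arange(350, 2501)  # nm, 1 nm spacing

def band(spectrum, nm):
    """Reflectance at wavelength `nm` on the 350-2500 nm grid."""
    if not 350 <= nm <= 2500:
        raise ValueError("wavelength outside 350-2500 nm")
    return float(spectrum[nm - 350])

def normalized_difference(spectrum, nm_a, nm_b):
    """Generic two-band index: (Ra - Rb) / (Ra + Rb)."""
    a, b = band(spectrum, nm_a), band(spectrum, nm_b)
    return (a - b) / (a + b)

def cellulose_absorption_index(spectrum):
    # Commonly cited form (assumed): 0.5 * (R2000 + R2200) - R2100
    return 0.5 * (band(spectrum, 2000) + band(spectrum, 2200)) - band(spectrum, 2100)

# Synthetic spectrum: flat 0.40 reflectance with dips at 670 nm and 2100 nm.
spectrum = np.full(WAVELENGTHS.size, 0.40)
spectrum[670 - 350] = 0.10
spectrum[2100 - 350] = 0.30

cai = cellulose_absorption_index(spectrum)       # 0.5 * 0.80 - 0.30
nd = normalized_difference(spectrum, 800, 670)   # (0.40 - 0.10) / 0.50
```

Each index collapses a few bands into one number, which is what allows it to enter a regression model as a single feature.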

Continuum removal

The absorption bands of the electromagnetic spectrum contain valuable information about the minerals or chemical compounds present in the target. This information has been used in various studies. Huang et al. (2004) and Gomez et al. (2008) used absorption features to estimate the amount of clay and calcium in the soil and the nitrogen concentration in a tree's canopy leaf surface, respectively.

The presence of organic components on the surfaces of plant leaves results in absorptions in the VIS and NIR wavelength ranges. The chemical bonds involved include C-N, N-H, and O-H (Hunt 1980), which occur in significant biochemical substances found on plant surfaces, such as lignin and starch, as well as in nitrogen-containing components found in plants, such as protein and chlorophyll. These chemical and organic compounds may produce absorption in the spectral signature of plants due to the electron transfer phenomenon in the VIS region of the electromagnetic spectrum. On the other hand, specific absorptions in the SWIR region of the plant’s spectral signature may be connected to the cellulose, glucose, and water content of the plant’s leaf structure.

To compare absorption regions by their geometry, spectra need to be transformed into numerical features. To extract numerical information from an absorption region, the overall shape of the spectrum (the continuum) must first be removed. This approach to normalization is referred to as “continuum removal” and is based on the convex hull of the spectrum; it enables the comparison of spectra acquired with different equipment or under varying lighting conditions (Sowmya and Giridhar 2017).

The continuum removal, spectral signatures, and convex hulls of spectra can be shown graphically (Supplementary Fig. 4). Three characteristics are defined in this study by the geometry of the spectral signature following continuum removal. The depth, area, and asymmetry features in Supplementary Fig. 4 correspond to the continuum values at the lowest point of absorption, the area under the continuum curve in an absorption region, and the ratio of the left to right area. In this study, fifteen ranges in spectral signatures were selected (Table 2). To choose these spectral ranges, the spectral signature was carefully examined, and the absorption regions were selected based on a visual comparison between the absorption regions and the surrounding (left and right extremum) wavelengths.

Table 2 Fifteen wavelength ranges taken from spectra extracted from little-leaf mockorange shoot cultures by using a spectroradiometer

With \({Area}_{Left}\) defined as the area between the continuum line and the continuum removed spectrum to the left of the absorption minimum, and \({Area}_{Right}\) as the corresponding area to the right, the features are defined as follows (Aspinall et al. 2002):

  • D = The absorption depth (the lowest point in continuum region)

  • \(Area={Area}_{Left}+{Area}_{Right}\)

  • \(Asymmetry= \frac{{Area}_{Left}}{{Area}_{Right}}=Asy\)

For example, Asy 2 means the Asymmetry in the second wavelength range.
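The three features defined above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the continuum is the upper convex hull computed within the selected wavelength range, that the left and right areas are split at the absorption minimum, and that depth is reported as the continuum removed value at that minimum, as stated in the list. All function names and the synthetic absorption band are hypothetical.

```python
import numpy as np

def upper_hull(x, y):
    """Indices of the points on the upper convex hull (x must be increasing)."""
    hull = []
    for i in range(len(x)):
        # Drop the last hull point while it lies on or below the line
        # joining the point before it to the new point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.array(hull)

def continuum_removed(x, y):
    idx = upper_hull(x, y)
    continuum = np.interp(x, x[idx], y[idx])  # hull interpolated to every band
    return y / continuum                      # equals 1 wherever the hull touches

def _trapezoid(values, x):
    return float(np.sum((values[1:] + values[:-1]) * np.diff(x)) / 2.0)

def absorption_features(x, y, lo, hi):
    """Depth (Min), Area and Asymmetry for the absorption region lo..hi (nm)."""
    mask = (x >= lo) & (x <= hi)
    xs = x[mask]
    cr = continuum_removed(xs, y[mask])
    k = int(np.argmin(cr))
    gap = 1.0 - cr                            # space between continuum line and spectrum
    area_left = _trapezoid(gap[:k + 1], xs[:k + 1])
    area_right = _trapezoid(gap[k:], xs[k:])
    asym = area_left / area_right if area_right > 0 else float("nan")
    return {"depth": float(cr[k]), "area": area_left + area_right, "asymmetry": asym}

# Synthetic symmetric absorption band centred at 550 nm (not measured data).
x = np.arange(500.0, 601.0)
y = 0.5 - 0.2 * np.exp(-((x - 550.0) / 10.0) ** 2)
features = absorption_features(x, y, 500, 600)
```

For this symmetric band the asymmetry is close to 1; a band that is wider on its left side would give a value above 1.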

Model development

From the feature selection section, relevant features from spectral signatures were identified for tissue cultured shoots. The next steps were to 1) fit the regression model by using machine learning methods and 2) validate their significance using test data. Linear, Random Forest and Support Vector Machine were three regression models used in this research and are briefly explained below.

  • Linear Regression: This is a model that assumes a linear relationship between the input variable (x) and the single output variable (y). A correlation test was used to select the relevant features (independent variables) for the linear model from the defined features, such as reflectance values, continuum removal features, and spectral indices. Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations (Freedman et al. 2007). Features with high Pearson correlation values were recognized first and selected from the list of defined features. In addition to a single-variable linear model, a multivariable linear model was also examined to determine how different combinations of spectral features affected the estimation results.

  • Random Forest Regression (RF): This type of regression is a supervised learning algorithm that uses an ensemble learning method and is constructed from a set of decision trees. Combining the predictions of multiple decision trees, rather than relying on a single regression model, enables RF to obtain satisfactory results, ideally with an R-square (R2) value close to 1 or a root mean square error (RMSE) close to zero. For this reason, RF has been widely used by researchers in regression and classification problems. The performance of a random forest model depends on the number of trees and the input variables. Therefore, in this paper, different random forest regression models were trained to achieve the best model.

  • Support-Vector Machines (SVM): This type of regression is a supervised learning model with associated learning algorithms that analyze data for classification and regression analysis. Different models can be produced by changing the SVM parameters, including the kernel type and the penalty term C, which balances maximizing the separator margin in feature space (for example, a two-dimensional space constructed from reflectance values at two wavelengths) against fitting errors. Kernels are functions, commonly built into SVM implementations, that transfer the values of a variable to another space. In this study, to reach an optimal model, parameter tuning was carried out first with the Radial Basis Function (RBF), linear, and polynomial kernels and with C values of 10, 100, and 1000, and the models with satisfactory results were then selected.
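The correlation test mentioned for the linear model can be sketched as a screening step that ranks every candidate feature by its Pearson correlation with the measured mineral content. This is an illustrative sketch only: the function names, feature names, and synthetic data are hypothetical, and ranking by absolute correlation is an assumption about how "high correlation" was interpreted.

```python
import numpy as np

def pearson_by_feature(X, y):
    """Pearson r of each column of X with y: cov(x, y) / (sd(x) * sd(y))."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = (X - X.mean(axis=0)).T @ (y - y.mean()) / len(y)
    return cov / (X.std(axis=0) * y.std())

def top_features(X, y, names, k=3):
    """Names and r values of the k features with the largest |r|."""
    r = pearson_by_feature(X, y)
    order = np.argsort(-np.abs(r))[:k]
    return [(names[i], float(r[i])) for i in order]

# Synthetic data: 56 samples, three candidate features (not measured data).
rng = np.random.default_rng(1)
y = rng.normal(2.5, 0.5, size=56)
X = np.column_stack([
    y + 0.1 * rng.standard_normal(56),   # strongly related to y
    -y + 0.3 * rng.standard_normal(56),  # related with opposite sign
    rng.standard_normal(56),             # unrelated
])
names = ["feat_a", "feat_b", "feat_c"]
ranked = top_features(X, y, names, k=2)
```

Because the same function handles reflectance bands, indices, and continuum removal features alike, all three feature families can be screened in one pass.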

To compare the models, variables were first added to the model separately, and the coefficient of determination (R2), RMSE, and the correlation coefficient (Corr) were calculated. Next, a combination of variables was added to the model (multiple inputs), and R2, RMSE, and Corr were calculated again. The best model was chosen by comparing the results, using the best R2 and Corr values together with error bar plots and scatter plots. The error bar plots showed the error between observed and predicted values, and the scatter plots showed the correlation between observed and estimated values.

$${R}^{2}={\left[\frac{1}{N}\frac{\sum_{i=1}^{N}\left[({P}_{i}-\overline{P })({O}_{i}-\overline{O })\right]}{{\sigma }_{p}{\sigma }_{o}}\right]}^{2}$$
$$RMSE= {\left(\frac{1}{N}\sum\limits_{i=1}^{N}{\left[{P}_{i}-{O}_{i}\right]}^{2}\right)}^{1/2}$$

where N is the number of observations, \({O}_{i}\) are the observed values, \({P}_{i}\) are the estimated values, \(\overline{O }\) is the mean of the observed values, \(\overline{P }\) is the mean of the estimated values, \({\sigma }_{o}\) is the standard deviation of the observations, and \({\sigma }_{p}\) is the standard deviation of the estimated values.
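The two criteria translate directly into code. The sketch below follows the formulas above, with R2 computed as the squared Pearson correlation between estimated and observed values; the function names and example numbers are hypothetical.

```python
import numpy as np

def r_squared(pred, obs):
    """Squared Pearson correlation between estimated and observed values."""
    p = np.asarray(pred, dtype=float)
    o = np.asarray(obs, dtype=float)
    cov = np.mean((p - p.mean()) * (o - o.mean()))
    return float((cov / (p.std() * o.std())) ** 2)

def rmse(pred, obs):
    """Root mean square error between estimated and observed values."""
    p = np.asarray(pred, dtype=float)
    o = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((p - o) ** 2)))

# Hypothetical observations and estimates offset by a constant 0.5.
observed = np.array([1.0, 2.0, 3.0, 4.0])
estimated = observed + 0.5

r2_value = r_squared(estimated, observed)   # perfect correlation despite the offset
rmse_value = rmse(estimated, observed)      # the offset shows up here
```

With this definition a constant bias leaves R2 at 1 while RMSE equals the bias, which is one reason to report the two criteria together.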

Data partitioning

Data sets were divided into model training and model test groups for generating the optimum regression model. Partitioning the data sets (hyperspectral recorded samples) into training and test groups was one of the crucial steps in regression. In our case, 39 of the 56 samples (reflectance spectra), or 70%, were used for model training, and the remaining 17 samples were used for model testing. The training data set was then used to develop a regression model relating the wavelengths in the spectral signature, the vegetation indices calculated from those spectral signatures, and the generated features obtained from those signatures to the foliar nutrient content from lab analysis. The developed model was validated and evaluated by using the test data set.
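A 70/30 partition of 56 samples can be sketched as below. The text does not say how samples were assigned to the two groups, so the random shuffle, the seed, and the function name are assumptions made only for illustration.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and cut them into training and test groups."""
    rng = np.random.default_rng(seed)        # seed is an arbitrary, assumed value
    order = rng.permutation(n_samples)
    n_train = int(n_samples * train_fraction)  # 56 * 0.7 = 39.2, truncated to 39
    return order[:n_train], order[n_train:]

train_idx, test_idx = split_indices(56)
```

The index arrays can then be used to slice both the feature matrix and the vector of lab-measured mineral contents, which keeps spectra and measurements aligned.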

Model evaluation criteria (Statistical criteria for numerical evaluation of the developed model)

A schematic diagram of the methods used for developing a regression model from the hyperspectral bands and the mineral content in little-leaf mockorange shoots is shown in Fig. 1. The evaluation criteria were calculated separately for foliar N and Ca contents.

Fig. 1 Schematic diagram of model development from hyperspectral bands and foliar mineral analysis for little-leaf mockorange shoots grown in tissue culture

The flowchart can be divided into the following steps:

  • Step 1: Separately adding variables into model and calculation of R-Square (R2), Root Mean Squared Error (RMSE), and Correlation.

  • Step 2: Adding a combination of variables (multiple inputs) into the model and then calculation of R2, RMSE and Correlation.

  • Step 3: Comparing the results and choosing the best models according to their performance on the evaluation criteria.

  • Step 4: Plotting the best results in an error bar plot (showing the error between observed and estimated values) and a scatter plot (showing observed versus predicted values).
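Steps 1 to 3 can be sketched for the linear case as a loop over feature subsets. This is a minimal illustration rather than the authors' workflow: ordinary least squares via `numpy.linalg.lstsq`, the subset size limit, ranking by test RMSE, and all names and synthetic data are assumptions.

```python
from itertools import combinations

import numpy as np

def fit_linear(X, y):
    """Least-squares coefficients, with the intercept as the last entry."""
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def score(pred, obs):
    corr = float(np.corrcoef(pred, obs)[0, 1])
    return {"R2": corr ** 2,
            "RMSE": float(np.sqrt(np.mean((pred - obs) ** 2))),
            "Corr": corr}

def evaluate_subsets(X_tr, y_tr, X_te, y_te, names, max_size=2):
    results = []
    for size in range(1, max_size + 1):   # size 1 = Step 1, larger sizes = Step 2
        for cols in combinations(range(len(names)), size):
            cols = list(cols)
            coef = fit_linear(X_tr[:, cols], y_tr)
            pred = predict_linear(coef, X_te[:, cols])
            results.append(([names[c] for c in cols], score(pred, y_te)))
    return sorted(results, key=lambda item: item[1]["RMSE"])  # Step 3

# Synthetic data: only f0 carries signal (not measured data).
rng = np.random.default_rng(2)
X = rng.standard_normal((56, 3))
y = 2.0 * X[:, 0] + 1.0 + 0.05 * rng.standard_normal(56)
names = ["f0", "f1", "f2"]
results = evaluate_subsets(X[:39], y[:39], X[39:], y[39:], names)
best_features, best_scores = results[0]
```

The sorted list makes the comparison in Step 3 explicit; the top entries are the candidates that would go on to the plots of Step 4.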

Results

Correlations between leaf N content and the spectral features, including spectral bands, spectral indices, and continuum removal features, were calculated. Spectral bands with higher correlation to leaf N content were used in regression model training (Figs. 2 and 3). The wavelengths from 648 to 651 nm had a comparatively high correlation with %N, with a correlation value of 0.30 (Fig. 2). In general, leaf reflectance between 505 and 670 nm had the highest correlation with the N content of microshoots and was used for developing a linear model for N estimation (Fig. 2).

Fig. 2 Correlation between leaf %N and the hyperspectral signatures acquired by a spectroradiometer from tissue cultured little-leaf mockorange shoots. The boxes show the wavelength of the peak in the spectrum and the correlation value

Fig. 3 Correlation value between features and VIs with leaf N content of little-leaf mockorange shoots produced in tissue culture. The ovals show the features with highest correlation with the leaf N content

Model development

Results showed that the reflectance value at 648 nm, the asymmetry feature in the range from 1819 to 2150 nm (Asy 11), and the area from 559 to 772 nm (Area 3) had correlation values of 0.30, 0.31, and 0.37 with %N content, respectively (Fig. 3). These spectral features provided the information needed to generate a linear model for predicting %N. The best single-variable linear model was obtained with the asymmetry feature in range 11, as shown below.

$$\%Nitrogen=4.47\times \left(Asy\ 11\right)-3.45$$

With this linear model, N content was estimated with R2 = 0.21, RMSE = 0.54, and Corr = -0.45 (Supplementary Fig. 5).

Random Forest regression was used in the next model. One of the main advantages of RF regression is that the number of input variables has little effect on the model (Horning 2010). The algorithm identifies the most effective variables according to their entropy values and then develops the regression model from those variables, meaning that the RF algorithm can also serve as a feature selection method. All the spectral bands selected by the correlation test and all the spectral features (indices and continuum removal) were added to the RF model. The RF regression model revealed that the asymmetry from 1819 to 2150 nm (Asy 11), the asymmetry from 559 to 772 nm (Asy 3), the reflectance values at 2480 nm and 525 nm, and the Double Peak Index (DPI) were the most effective features for generating a nonparametric (non-linear) model (Fig. 3).

To develop an RF model, besides selecting optimal features as inputs, the number of trees in the model must be determined. After testing various models with different combinations of the mentioned features and/or indices, the most accurate model was selected (Table 3). The model fitted with the DPI index and reflectance at 525 nm, using five trees, was the most accurate RF regression model, with R2 = 0.72, RMSE = 0.30, and correlation = 0.84 (Supplementary Fig. 6).
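The search over tree numbers can be sketched with scikit-learn's `RandomForestRegressor`. This is an assumed toolchain, since the text does not name the software used for RF; the candidate tree counts other than 5, the seed, the helper name, and the synthetic data are hypothetical. In scikit-learn, the importances reported for regression forests are impurity based.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def tune_trees(X_tr, y_tr, X_te, y_te, tree_counts=(5, 10, 50, 100), seed=0):
    """Fit one forest per tree count and report test RMSE and importances."""
    rows = []
    for n_trees in tree_counts:
        model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        test_rmse = float(np.sqrt(np.mean((pred - y_te) ** 2)))
        rows.append((n_trees, test_rmse, model.feature_importances_))
    best = min(rows, key=lambda row: row[1])  # lowest test RMSE
    return best, rows

# Synthetic two-feature data set with a non-linear response (not measured data).
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(56, 2))
y = 1.5 + np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.standard_normal(56)
best, rows = tune_trees(X[:39], y[:39], X[39:], y[39:])
```

The importances returned with each row indicate which inputs the forest relied on, which is how an RF fit can double as a feature selection step.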

Table 3 Various models developed for %N estimation in little-leaf mockorange shoots produced in tissue culture. Each model has different feature combinations and a different number of trees via the Random Forest algorithm

Support vector machine, one of the most commonly used regression methods, was used to develop another regression model. Two main objectives were considered for the SVM model. First, the features selected by the correlation test and by the RF method were added to an SVM model. Second, the parameters of the SVM model, including the kernel type and the penalty term, were evaluated by trial and error to find the SVM model with the lowest RMSE.
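The trial-and-error search over kernels and penalty terms can be sketched as a small grid using scikit-learn's `SVR`. The library, the feature standardization step, the helper name, and the synthetic data are assumptions not stated in the text; the kernels and the C values of 10, 100, and 1000 follow the Model development section.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def svr_grid(X_tr, y_tr, X_te, y_te,
             kernels=("linear", "poly", "rbf"), costs=(10, 100, 1000)):
    """Fit one SVR per (kernel, C) pair and sort the results by test RMSE."""
    rows = []
    for kernel in kernels:
        for cost in costs:
            # Standardizing the inputs is an assumption made for this sketch.
            model = make_pipeline(StandardScaler(), SVR(kernel=kernel, C=cost))
            model.fit(X_tr, y_tr)
            pred = model.predict(X_te)
            test_rmse = float(np.sqrt(np.mean((pred - y_te) ** 2)))
            rows.append((kernel, cost, test_rmse))
    return sorted(rows, key=lambda row: row[2])

# Synthetic data with a linear response (not measured data).
rng = np.random.default_rng(4)
X = rng.standard_normal((56, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 0.05 * rng.standard_normal(56)
grid = svr_grid(X[:39], y[:39], X[39:], y[39:])
best_kernel, best_cost, best_rmse = grid[0]
```

Sorting by test RMSE mirrors the selection rule in the text; with only 17 test samples, small differences between rows of the grid should be read cautiously.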

The models generated by SVM regression provided estimates of foliar %N content that improved on the linear model. The fitted SVM model combining the Double Peak Index (DPI) with the asymmetry from 1819 to 2150 nm (Asy 11) and the model combining DPI with the asymmetry from 559 to 772 nm (Asy 3) provided reasonably accurate estimates of foliar N content for little-leaf mockorange shoots produced in tissue culture, with R2 = 0.58 and RMSE = 0.32, and R2 = 0.61 and RMSE = 0.33, respectively (Table 4, Supplementary Fig. 7).

Table 4 Various models developed for %N estimation of little-leaf mockorange shoots produced in tissue culture. The models had different feature combinations and different penalty terms via SVM algorithm. RMSE, R2 and correlation values resulted from the SVM model tuned by the linear kernel and three different values of penalty term

Foliar calcium content

After analysis of the hyperspectral bands and checking for their correlation with the Ca content of the shoots received from the tissue analysis, the bands with higher correlations were selected, and those were 721 nm, 541 nm, 1293 nm, 1805 nm, and 2209 nm, with correlation values of 0.35, 0.33, 0.30, 0.28, and 0.26, respectively (Fig. 4).

Fig. 4 Correlation between leaf %Ca and the hyperspectral signatures acquired by the spectroradiometer from tissue cultured little-leaf mockorange shoots. The numbers in the rectangles represent wavelength (in nm) and correlation values, respectively

Examining the correlation values between %Ca and the different features and VIs showed that the minimum (depth) in the wavelength range from 1819 to 2150 nm (Min 11) and the minimum (depth) in the range from 1287 to 1670 nm (Min 8) had the highest correlation values with Ca, 0.59 and 0.45, respectively (Fig. 5).

Fig. 5 Correlation value between features and VIs with leaf calcium content of little-leaf mockorange shoots produced in tissue culture. The abbreviations and numbers in the ovals represent the features and their correlation with the leaf Ca content

Model development

Model development showed that a linear model consisting of the minimum (depth) in the range from 1819 to 2150 nm (Min 11) and the area from 559 to 772 nm (Area 3) could estimate Ca content with R2 = 0.83 and RMSE = 0.09. However, the coefficient of Area 3 was small enough to be ignored when drawing the error bar graph (Supplementary Fig. 8).

$$\%Calcium=1.13\times \left(Min\ 11\right)+0.08$$

The Random Forest algorithm provided a successful model to estimate the %Ca of little-leaf mockorange shoots. After examining several models with different feature combinations and tree numbers, the most effective nonparametric (non-linear) model used five trees and four features: the minimum (depth) from 838 to 843 nm (Min 4), the area from 2428 to 2490 nm (Area 15), the asymmetry from 1670 to 1714 nm (Asy 9), and the Cellulose Absorption Index (CAI) (Fig. 6, Table 5). This model yielded R2 = 0.99, RMSE = 0.03, and a correlation value of 0.99 (Supplementary Fig. 9, right). The error bar plot in Supplementary Fig. 9 (left) reveals only slight differences between observed and estimated Ca among test samples, supporting the success of the developed RF model for shoot Ca estimation.

Fig. 6

The importance value of generated features and selected VIs regarding leaf %Ca in little-leaf mockorange shoots produced in tissue culture as determined via Random Forest algorithm. The abbreviations and numbers in the rectangles represent the features and their correlation with the leaf Ca content via RF regression model

Table 5 Various models developed for %Ca estimation in little-leaf mockorange shoots produced in tissue culture. The models contain different features combinations and different number of trees via the Random Forest algorithm

The specific spectral features and the selected index (CAI) identified by the RF algorithm as the best variables were also used to develop a fitted SVM regression model. After developing and running several models with different penalty terms (cost = 10, 50, or 100) and different kernels (linear, polynomial, or radial) (Table 6), the selected model used a linear kernel and all four features: minimum (depth) reflectance from 838 to 843 nm (Min 4), area from 2428 to 2490 nm (Area 15), asymmetric point from 1670 to 1714 nm (Asy 9), and CAI. This model had an R2 = 0.59 and RMSE = 0.16 and was determined to be the best SVM model, regardless of the penalty term (cost value) (Table 6, Supplementary Fig. 10).

Table 6 Various models developed for %Ca estimation of little-leaf mockorange shoots produced with tissue culture. The models used different feature combinations and different penalty terms via the SVM algorithm
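The model comparison described in this section (an RF with five trees against linear-kernel SVM regressions at costs of 10, 50, and 100) can be sketched as follows. This is only an illustration under stated assumptions: scikit-learn stands in for whatever software the authors used, the train/test split is arbitrary, and the data are synthetic stand-ins for the four predictors, so the printed scores say nothing about the study's results.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for four predictors (Min 4, Area 15, Asy 9, CAI); not study data.
X = rng.uniform(0.0, 1.0, size=(80, 4))
y = 0.3 + 0.5 * X[:, 3] ** 2 + 0.2 * np.sin(3 * X[:, 0]) + rng.normal(0, 0.02, 80)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def evaluate(model):
    """Fit on the training split and return (R2, RMSE) on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return r2_score(y_test, pred), math.sqrt(mean_squared_error(y_test, pred))

results = {
    "RF (5 trees)": evaluate(RandomForestRegressor(n_estimators=5, random_state=0))
}
for cost in (10, 50, 100):
    results[f"SVR linear, C={cost}"] = evaluate(SVR(kernel="linear", C=cost))

for name, (r2, rmse_value) in results.items():
    print(f"{name}: R2 = {r2:.2f}, RMSE = {rmse_value:.3f}")
```

Scoring every model on the same held-out split keeps the R2 and RMSE values comparable across algorithms and cost settings.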

Discussion

Regression modeling plays an important role in estimating various plant characteristics, such as mineral content and water content. Accurate prediction of these parameters can support a better understanding of plant growth and development and help improve agricultural practices. In this context, several regression models have been developed for hyperspectral data analysis, including the Random Forest (RF) and Support Vector Machine (SVM) models. In this study, we compared the performance of linear, RF, and SVM regression models in predicting the nitrogen (%N) and calcium (%Ca) content of tissue-cultured shoots. Additionally, we evaluated the importance of selecting the best features and wavelengths from the hyperspectral bands for accurate prediction. Our findings indicated that the RF model outperformed the SVM model in predicting %N, and that %Ca was likewise better predicted by the RF model, with higher R2 and lower RMSE values. These results demonstrated the importance of selecting the appropriate regression model and optimal features for hyperspectral data analysis in predicting plant characteristics.

This research demonstrated that hyperspectral imaging can be used to predict the percentages of N and Ca in little-leaf mockorange shoots produced in tissue culture. Linear, RF, and SVM regression procedures were used to obtain an accurate model for estimating %N and %Ca in these shoots. Among the three regression models developed to estimate foliar nitrogen content, the RF and SVM regressions estimated %N more accurately than the linear regression model. Nevertheless, the models developed to predict %N were slightly less accurate than those developed for predicting %Ca in the tissue cultured shoots.

The RF model (tree number = 5) estimated %N better than the SVM model, regardless of the cost (penalty term) used for the SVM regression. For %Ca, the RF model had a higher R2 (0.99) and a lower RMSE (0.03) than the SVM model (R2 = 0.59, RMSE = 0.16) and was therefore the better model. Finding the best regression model, the best features or indices, and the best wavelengths across the hyperspectral bands is highly important for predicting a specific mineral content or other plant characteristics, such as water content.

Although the linear regression model provided an acceptable R2 value, the model failed to predict %Ca. Hence, RF and SVM regression models were considered as alternatives. Based on the results obtained from this research, foliar %Ca content could best be estimated using a non-linear regression model rather than a linear model. Although the features used in the model (including the Cellulose Absorption Index) worked for both the RF and SVM regression models, the RF regression had a stronger R2 and correlation, and therefore was the better model to estimate the %Ca of tissue cultured shoots of little-leaf mockorange. Cellulose is an important component in the structure of primary cell walls of green plants (Khajehyar 2021; Khajehyar et al. 2024). Calcium interacts with cellulose as a cellular structural component. A high correlation between %Ca and CAI is likely due to this relationship, and more detailed experiments can be conducted in the future to determine any possible relationship between %Ca and CAI.
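For readers unfamiliar with CAI, the sketch below uses the commonly cited three-band form of the index, in which reflectance at the 2100 nm absorption center is compared with the mean of the shoulders near 2000 and 2200 nm. This section does not state which bands or scaling the authors used, so the band centers here are an assumption, and the reflectance values are hypothetical.

```python
def cellulose_absorption_index(r2000, r2100, r2200):
    """Three-band CAI: mean of the two shoulder reflectances minus the center.

    Band centers (2000, 2100, 2200 nm) follow the commonly cited definition
    and are an assumption here; some authors also multiply the result by 100.
    """
    return 0.5 * (r2000 + r2200) - r2100

def nearest_band(spectrum, target_nm):
    """Return the reflectance at the measured wavelength closest to target_nm."""
    closest = min(spectrum, key=lambda wavelength: abs(wavelength - target_nm))
    return spectrum[closest]

# Hypothetical reflectance spectrum keyed by wavelength in nm.
spectrum = {1998: 0.40, 2101: 0.30, 2203: 0.42}
cai = cellulose_absorption_index(
    nearest_band(spectrum, 2000),
    nearest_band(spectrum, 2100),
    nearest_band(spectrum, 2200),
)
print(round(cai, 3))
```

A positive value indicates that reflectance dips at the center band relative to its shoulders, which is the absorption feature the index is designed to capture.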

To date, research using hyperspectral images to estimate mineral contents of shoots or plantlets produced in tissue culture (in vitro) is lacking. Studies, however, have been conducted to estimate N content of agronomic field crops. These include estimating N in winter wheat at different growth stages, based on NIR wavelengths via multivariate linear regression and a Back Propagation (BP) neural network using vegetation indices (Liu et al. 2016), estimating leaf N content of winter wheat via selected spectral indices around NIR wavelengths (Zhu et al. 2018), estimating N content in potato plants in the NIR (Clevers and Kooistra 2012), N estimation in maize via VIs such as NDVI, Renormalized Difference Vegetation Index (RDVI), or Optimized Soil-Adjusted Vegetation Index (OSAVI) (Gabriel et al. 2017), N estimation in rice with a Gaussian process regression (GPR) model (Wen et al. 2018), N estimation of eucalyptus using red-edge NDVI and modified red-edge NDVI (DeOliveira et al. 2017), and estimation of macro- and micronutrients such as N and Ca in soybean and maize via partial least squares regression (PLSR) models (Pandey et al. 2017).

Although some reports describe the use of NIR or lower short wave infrared (SWIR) wavelengths to provide effective estimates of N, almost all of these studies have used only vegetation indices such as NDVI or other VIs. This study differed from other hyperspectral studies in its application of geometric features generated from continuum removal, such as minimum reflectance (depth), area under the spectrum, and asymmetric point of the spectrum, alongside the reflectance spectrum acquired from little-leaf mockorange shoots produced in tissue culture. Applying these geometric features to plants grown in an in vitro environment resulted in satisfactory R2 and RMSE values from the regression models used to predict N and Ca contents in the shoots.
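The geometric features named above can be illustrated with a simplified sketch. It assumes a straight-line continuum joining the two endpoints of a wavelength range (full continuum removal typically fits a convex hull over the whole spectrum), and it assumes an asymmetry measure defined as the ratio of absorbed area left and right of the minimum. The authors' exact definition of the asymmetric point is not given in this section, and the spectrum below is hypothetical.

```python
def continuum_removed(wavelengths, reflectance):
    """Divide reflectance by a straight line joining the first and last points."""
    w0, w1 = wavelengths[0], wavelengths[-1]
    r0, r1 = reflectance[0], reflectance[-1]
    removed = []
    for w, r in zip(wavelengths, reflectance):
        continuum = r0 + (r1 - r0) * (w - w0) / (w1 - w0)
        removed.append(r / continuum)
    return removed

def trapezoid_area(x, y):
    """Trapezoidal integral of y over x (0.0 for fewer than two points)."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))

def absorption_features(wavelengths, reflectance):
    """Depth, area, and an assumed left/right asymmetry of one absorption range."""
    cr = continuum_removed(wavelengths, reflectance)
    absorbed = [1.0 - value for value in cr]
    i_min = min(range(len(cr)), key=cr.__getitem__)
    left = trapezoid_area(wavelengths[: i_min + 1], absorbed[: i_min + 1])
    right = trapezoid_area(wavelengths[i_min:], absorbed[i_min:])
    return {
        "min_wavelength": wavelengths[i_min],
        "depth": 1.0 - cr[i_min],
        "area": left + right,
        "asymmetry": left / right if right else float("inf"),
    }

# Hypothetical reflectance over the 1819 to 2150 nm range (Range 11 in the text).
wl = [1819, 1900, 1950, 2050, 2150]
refl = [0.40, 0.30, 0.20, 0.32, 0.40]
features = absorption_features(wl, refl)
print(features)
```

Dividing by the continuum normalizes each range to its own baseline, so depth and area describe the shape of the absorption rather than overall brightness.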

An interesting aspect of %N and %Ca estimation was that both were predictable in the spectral ranges from 1819 to 2150 nm (Range 11) and from 559 to 772 nm (Range 3). Using different features of these ranges provided information for each of these two minerals in little-leaf mockorange shoots. In addition, correlation plots of estimated and measured values for N and Ca concentrations revealed a small gap between higher and lower concentrations of these two minerals, probably due to the limited number of samples (fewer than 100) used for predicting their concentrations. Another possible explanation for the gap is that hyperspectral images could estimate N or Ca only at higher concentrations because the tiny leaves and stems of the shoot cultures provided less reflectance information.

A closer look at the %Ca results obtained from the RF algorithm (Supplementary Fig. 9, error bar plot) showed that samples with CAI values higher than 0.15 had much smaller differences between measured and estimated values than samples with CAI values less than 0.15. This result indicated that, for a more accurate prediction, features with higher correlation values must be selected. In addition, except for two samples (error bars shown in Supplementary Fig. 10, left), the developed model either accurately estimated or slightly over-estimated %Ca.

Most earlier foliar nutrient content studies have relied on vegetation indices to estimate canopy minerals, especially N. Unfamiliarity with hyperspectral features for predicting foliar mineral status may limit the use of this technique in comparison with vegetation indices. Recruiting a team of plant scientists, plant nutritionists, and hyperspectral scientists may provide an opportunity to apply these features more effectively. This study illustrated the potential for success of such a team of plant scientists and hyperspectral scientists.

All these results were obtained from a specific selected mockorange genotype. Application of hyperspectral imaging was successfully completed for shoots from this little-leaf mockorange grown in vitro, but the success of this method for other mockorange species as well as other plant species still needs to be tested.

This study showed that hyperspectral imaging could help to predict foliar nutrient contents (N and Ca particularly) of little-leaf mockorange shoots produced in tissue culture and could help to avoid destructive methods of foliar mineral analysis. This nondestructive method can save tissue culture producers the time needed for drying, grinding, sending samples to a tissue analysis lab, and waiting for the results, as well as the cost of shipping and foliar tissue analyses.

However, it is highly recommended to employ hyperspectral imaging on a larger number of samples to enhance data collection and minimize potential errors. A larger dataset would make the observed correlations more reliable. Moreover, conducting additional experiments analyzing nitrogen content in shoot cultures and incorporating these findings into modeling experiments can refine the random forest model. Further data analysis is needed to validate the model's efficacy.

Additionally, it is advisable to extend the application of imaging techniques to a broader range of plant species, particularly those that produce substantial foliage in tissue culture.

Conclusion

This study demonstrated that strong regression models could be developed to predict N and Ca contents of tissue cultured little-leaf mockorange shoots. The best features to estimate %N were the reflectance values at 648 nm and 1919 nm, the asymmetric point from 1819 to 2150 nm (Asy 11), and the area from 559 to 772 nm (Area 3). These features were used in a nonparametric (non-linear) model, with RF regression providing the best model for estimation of foliar %N content. The best features to estimate %Ca in the shoots were the minimum reflectance from 838 to 843 nm (Min 4), the area from 2428 to 2490 nm (Area 15), the asymmetric point from 1670 to 1714 nm (Asy 9), and the Cellulose Absorption Index (CAI). Random forest regression provided a more accurate model to estimate %Ca than the other regression models. The best RF regression model for %N in little-leaf mockorange shoots resulted in an R2 = 0.72 and correlation = 0.84. Likewise, the best RF model for %Ca estimation resulted in an R2 = 0.99 and correlation = 0.99. These strong statistical values demonstrated that hyperspectral imaging can be used to accurately predict %N and %Ca in tissue cultured shoots from one selected little-leaf mockorange genotype. Other mockorange species, as well as other plant species produced in tissue culture, would need to be tested to validate the use of hyperspectral imaging to predict N and Ca contents of their shoots.