
1 Vibrational Spectroscopy Defined

The region of the electromagnetic spectrum that draws our interest lies between the ultraviolet (lower end, at a wavelength of 400 nm) and the far-infrared (upper end, at 50,000 nm). This region encompasses the visible (400–780 nm), near-infrared (780–2,500 nm), and mid-IR (2,500–25,000 nm) regions. Flanking this large swath of wavelengths are gamma rays (~0.001 nm) and X-rays (~0.01 nm) on the short end, and microwaves (~10⁷ nm) and radio waves (~10¹⁰ nm) on the long end (Fig. 3.1). The fact that information on molecular structure is contained in this region, particularly in the mid-IR, can be deduced from the wave-particle principles of quantum theory, starting with the expression for the energy of a photon,

Fig. 3.1 Electromagnetic spectrum, highlighting the region used in hyperspectral imaging (400–1,700 nm)

$$ E=hv $$
(3.1)

where E is the photon’s energy, v is the frequency of the wave, and h is Planck’s constant. The energy of a photon is therefore directly proportional to its frequency.

We also recall that the wavelength (λ) and frequency (v) are inversely related to each other, with their product being the speed of light (c) in the medium that the light is passing through,

$$ c=\lambda v. $$
(3.2)

Because of the underlying quantum theory of band vibrations, spectroscopists typically identify band locations in terms of a modified form of frequency defined as the number of wave cycles within a fixed distance. By convention, the distance is a centimeter, so that the wavenumber, having units of cm⁻¹, can be thought of as the number of complete wave cycles in a 1 cm thickness. Physicists and engineers, on the other hand, typically speak in terms of wavelength, and the unit of choice for the visible and near-infrared region is the nanometer, which is one billionth (10⁻⁹) of a meter. Because of their reciprocal relationship, conversion from wavelength (in nm) to wavenumber (in cm⁻¹), or vice versa, is a matter of multiplying the reciprocal by 1 × 10⁷. Because the popularization of near-infrared measurement and analysis arose from the physics and engineering community, whereas qualitative analysis using the mid-IR region arose from the spectroscopy community, this dichotomy in absorption band assignment continues today. Although the conversion between the two scales is routine, it is important to remember that if the increment between neighboring readings from an instrument is uniform in one scale, it will not be uniform in the other. With the growing popularity of Fourier transform (FT) near-infrared spectrometers, whose scale is based on wavenumbers, it is especially important to keep this in mind when comparing FT spectra with those from conventional monochromator-based dispersive spectrometers, whose basis is uniform spacing in the wavelength domain.
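The reciprocal conversion and the resulting nonuniform spacing can be illustrated with a short sketch. The function names and the 2 nm step size are hypothetical choices made for illustration only; they are not taken from any instrument described here.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm^-1."""
    return 1.0e7 / wavelength_nm


def wavenumber_to_nm(wavenumber_cm):
    """Convert a wavenumber in cm^-1 to a wavelength in nm."""
    return 1.0e7 / wavenumber_cm


# Assumed example: a dispersive instrument reading every 2 nm.
# The same 2 nm step spans very different wavenumber intervals
# at the short and long ends of the near-IR region.
STEP_NM = 2.0
step_short = nm_to_wavenumber(1000.0) - nm_to_wavenumber(1000.0 + STEP_NM)
step_long = nm_to_wavenumber(2498.0) - nm_to_wavenumber(2498.0 + STEP_NM)

print(f"2 nm step near 1,000 nm = {step_short:.2f} cm^-1")
print(f"2 nm step near 2,500 nm = {step_long:.2f} cm^-1")
```

A spectrum sampled uniformly in one scale therefore has to be interpolated before it can be compared point by point with a spectrum sampled uniformly in the other.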

Quantum theory dictates that the absorption of light by a molecule comes about by discrete changes in energy levels (quantum levels) that, for the mid-IR region, happen when an inter-atomic bond within the molecule absorbs energy that equals the difference between two adjacent quantum levels. Taking a diatomic (two-atom) molecule such as carbon monoxide as an example, the vibrational frequency at which the bond expands and contracts is set by the selection rules of quantum theory. These rules also apply to more complex, polyatomic molecules.

2 Inter Atomic Bond

2.1 Theory

The starting point for modeling atomic bond vibrations is usually the harmonic oscillator described by classical mechanics. In this model, two atoms are bonded by a restoring force that is linearly related to their bond distance. In its simplest form, a bond between two atoms is modeled as a spring connecting two spherical masses, \( {m}_1 \) and \( {m}_2 \). The potential energy of this two-ball assembly, V, depends on the displacement of the masses with respect to their rest positions, caused by either compression or elongation of the spring,

$$ V = \frac{1}{2}k{\left(x-{x}_{rest}\right)}^2 $$
(3.3)

where \( \left(x-{x}_{rest}\right) \) is the displacement of the distance between the centers of mass from its rest value and k is the force constant of the spring. In this simple model, potential energy varies quadratically with distance to form a parabola, as demonstrated in Fig. 3.2. Two problems become readily apparent when using this model to approximate molecular behavior. First, limits must be placed on the distance of compression: because atoms have physical mass and dimension, the distance between them cannot be compressed to zero. Second, a bond between atoms may only elongate so far before the atoms dissociate.

Fig. 3.2 Potential function for two bonded atoms

A third problem, which was not adequately addressed until the introduction of quantum mechanical theory in the 1920s, is explained by first considering the total energy of the system, which is the sum of the potential energy (V) and kinetic energy. With the latter written in terms of momentum (p), the total energy (E) of the system is

$$ E = \frac{p^2}{2m} + V $$
(3.4)

where m is the total mass of the system. Classical mechanics allows the energy to take on a continuum of values, but this turns out to be impermissible in nature. This is explained by the Heisenberg uncertainty principle, part of which states that for a given direction it is not possible to know position and momentum simultaneously. Related to this is the restriction that energy is quantized, which means that at a specific frequency the energy of the oscillator is limited to discrete, i.e., quantum, levels, υ. Solution of the wavefunction form of the harmonic oscillator becomes,

$$ {E}_{\upsilon }=hv\left(\upsilon +\frac{1}{2}\right) $$
(3.5)

in which \( {E}_{\upsilon } \) is the energy of the υth quantum level \( \left(\upsilon = 0,1,2,\dots \right) \) and v is the fundamental frequency of the vibration, which is related to the force constant of the bond (k) and the reduced mass (μ) by

$$ v=\frac{1}{2\pi}\sqrt{\frac{k}{\mu }} $$
(3.6)

recalling that the reduced mass of a diatomic molecule is defined as \( 1/\mu =1/{m}_1+1/{m}_2, \) where \( {m}_1 \) and \( {m}_2 \) are the masses of the atoms. Classical mechanics theory produces the result that like atomic bonds within a molecule vibrate in phase at these fundamental or normal frequencies, with the number of unique vibrational frequencies related to the size of the molecule (i.e., number of atoms N) by the relation 3N − 6 for a nonlinear molecule. Taking a triatomic molecule such as water as an example (Fig. 3.3), three modes of vibration are possible: symmetric stretching (both hydrogen atoms moving toward and away from the central oxygen atom in tandem), asymmetric stretching (one hydrogen moving away from the oxygen at the same time as the other moves closer), and bending (hydrogen atoms moving toward and away from each other). Actual vibrational behavior of water is far more complicated, as we shall see below.
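As a rough numerical check of Eq. 3.6, the sketch below estimates the fundamental vibration of carbon monoxide. The force constant of 1,860 N/m is an assumed, approximate illustrative value that does not come from this chapter, and the function names are hypothetical; the atomic masses are rounded.

```python
import math

AMU_KG = 1.66053906660e-27   # atomic mass constant, kg
C_CM_PER_S = 2.99792458e10   # speed of light in vacuum, cm/s


def reduced_mass_kg(m1_amu, m2_amu):
    """Reduced mass from 1/mu = 1/m1 + 1/m2, returned in kg."""
    return (m1_amu * m2_amu) / (m1_amu + m2_amu) * AMU_KG


def fundamental_frequency_hz(k_n_per_m, mu_kg):
    """Eq. 3.6: v = (1 / 2 pi) * sqrt(k / mu)."""
    return math.sqrt(k_n_per_m / mu_kg) / (2.0 * math.pi)


# Assumed inputs: 12C and 16O masses (amu) and an approximate force constant.
mu_co = reduced_mass_kg(12.000, 15.995)
nu_co = fundamental_frequency_hz(1860.0, mu_co)

wavenumber_co = nu_co / C_CM_PER_S        # cm^-1
wavelength_co_nm = 1.0e7 / wavenumber_co  # nm

print(f"frequency  = {nu_co:.3e} Hz")
print(f"wavenumber = {wavenumber_co:.0f} cm^-1")
print(f"wavelength = {wavelength_co_nm:.0f} nm")
```

Under these assumptions the band falls within the mid-IR range (2,500–25,000 nm), which is where fundamental vibrations are expected.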

Fig. 3.3 Modes of vibration for a single water molecule

Vibrational transitions between bonded atoms occur when the energy of the photon matches the difference in energy between two sequential quantum levels of the bond. For the electrical field to impart its energy into the molecule, a polar distribution of charge, or dipole, must exist or be induced to exist across the bond. The jump between the ground state \( \left(\upsilon =0\right) \) and the first level of excitation \( \left(\upsilon = 1\right) \) characterizes the fundamental vibrations across the mid-infrared region, this being from 4,000 cm⁻¹ to 400 cm⁻¹ (2,500–25,000 nm).

It turns out that the energy relationship of Eq. 3.5 can be used to describe bond behavior for small values of the vibrational quantum number, corresponding to the bottom region of the energy curve (Fig. 3.2) where there is near symmetry between left and right sides. For larger quantum levels, the energy relation is more complicated, such that the nonsymmetrical Morse-type function \( \left[{\left(1-{e}^{-c\left(x-{x}_{rest}\right)}\right)}^2\right] \), also shown in Fig. 3.2, is used to incorporate the features of mechanical and electrical anharmonicity. Mechanical anharmonicity arises from the fact that, because of the atoms’ dimensions and mass, there are physical limits to the separation distance between bonded atoms that preclude them from being too close (overlapping) or too distant (dissociating). Electrical anharmonicity arises from a nonuniform change in dipole moment with change in distance between bonded atoms. Unlike the parabolic function of the mechanical model, the Morse function allows for dissociation of the two atoms as the energy level is increased. The solution to the wavefunction becomes

$$ {E}_{\upsilon }=hv\left(\upsilon +\frac{1}{2}\right)-xhv{\left(\upsilon +\frac{1}{2}\right)}^2 $$
(3.7)

The variable x is the anharmonicity constant. The presence of anharmonicity allows for (1) overtone transitions, which arise from a change between nonadjacent vibrational quantum levels (e.g., ∣Δυ∣ > 1); (2) combination bands, which occur when the energy from one photon produces simultaneous changes in quantum levels of two or more different vibrational modes; and (3) unequal differences between energy levels of the quantum states as described by Miller (2001). These allowances would otherwise be forbidden under a set of conditions known as selection rules that arise from group theory in quantum mechanics (Wilson et al. 1985). The significance of these allowances becomes apparent when we shift away from the fundamental vibrations of the mid-IR region to the overtone and combination vibrations of the near-IR. To a first approximation, the frequencies of the overtone bands are integer multiples of the corresponding fundamental frequency, with each higher overtone (first, second, …) being weaker than the preceding. Thus, absorptions from overtone vibrations of the same bond become progressively weaker as wavelength is decreased. Combination bands involving CH, NH, and OH tend to be longer in wavelength than overtones, though with overlap between the two band types. Taking water in the liquid state as an example, its two most prominent bands in the near-IR region are a combination band (\( {\upsilon}_2+{\upsilon}_3 \) = asymmetric stretch + bending) occurring at ~1,910 nm and a first overtone of OH (\( {\upsilon}_1+{\upsilon}_3 \)) occurring at 1,460 nm, where it is noted that the location of these bands as well as the other combination bands and higher overtones are strongly influenced by temperature through changes in hydrogen bonding. Further complication arises with water absorbed in biological matrices whereby hydrogen bonding can occur between water, polysaccharide, lipid, and protein molecules. 
Considering wheat starch and microcrystalline cellulose separately equilibrated to 53 % RH as examples (Delwiche et al. 1992), the wavelength location of the prominent water combination band peak near 1,900 nm decreases by ~17 and 11 nm, respectively, as temperature increases from −80 to 60 °C, which causes a reduction in the hydrogen bond strength between water and the matrix (Fig. 3.4).
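The effect of the anharmonicity constant on overtone positions can be sketched directly from Eq. 3.7. Dividing the energies by hc gives level positions in wavenumbers, and the 0 → n transition reduces to ṽ·n·[1 − x(n + 1)]. The harmonic wavenumber of 3,000 cm⁻¹ and x = 0.02 used below are hypothetical values chosen only to resemble a stretch involving hydrogen; they are not assignments from the text.

```python
def level_wavenumber(nu_tilde, x, v):
    """Eq. 3.7 divided by hc: energy of quantum level v in cm^-1."""
    return nu_tilde * (v + 0.5) - x * nu_tilde * (v + 0.5) ** 2


def transition_wavenumber(nu_tilde, x, upper_level):
    """Wavenumber (cm^-1) of the transition from level 0 to upper_level."""
    return level_wavenumber(nu_tilde, x, upper_level) - level_wavenumber(nu_tilde, x, 0)


NU_TILDE = 3000.0  # assumed harmonic wavenumber, cm^-1 (illustrative)
X_ANH = 0.02       # assumed anharmonicity constant (illustrative)

for n, name in ((1, "fundamental"), (2, "first overtone"), (3, "second overtone")):
    wn = transition_wavenumber(NU_TILDE, X_ANH, n)
    print(f"{name:16s} {wn:7.0f} cm^-1  {1.0e7 / wn:6.0f} nm")
```

With these inputs the first overtone lies slightly below twice the fundamental wavenumber, which is the sense in which overtones are integer multiples of the fundamental only to a first approximation.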

Fig. 3.4 Temperature effect on the 1,940 nm water combination band in moisture containing starch and cellulose

2.2 Practical Ramifications for the Near-IR Region

Because of the departures from the ideal case of the harmonic oscillator, we are left with inherent complexities that are also the fortunate opportunities of near-IR spectroscopy. Four general statements are made to underscore the limits and power of the near-IR:

1. Compared to the fundamental absorption bands of the mid-IR region, absorption bands of the near-IR region are weak. What appears at first glance to be a detriment for the NIR analyst is actually a boon, because materials that are examined in either transmission or reflection modes do not have to be diluted beforehand, as is the typical procedure for mid-IR analysis. At most, especially for reflectance measurement, the material is ground into fine particles, thereby reducing the heterogeneity of the sample caused by spatial differences in either chemical or physical structure.

2. The near-IR region is primarily composed of the overtone and combination bands arising from bonds involving the lightest atom, hydrogen. Typically, these include the bonds C–H, O–H, and N–H, all of which are prevalent in organic molecules. Thus, near-IR analysis is especially well suited for studies involving agricultural, biomedical, pharmaceutical, and petrochemical materials.

3. Because hydrogen is much lighter than any other atom, its bonds with carbon, oxygen, and nitrogen produce vibrational movements that cause the largest motion for the hydrogen atoms, thus making vibrational movements localized to the functional group. Intrachain bond vibrations such as C–C are not active in the near-IR region.

4. Due to the multitude of overlapping bands in the near-IR region, the exact assignment of a bond’s vibration to a wavelength or frequency is seldom possible, thus making near-IR spectroscopy a poor candidate for qualitative analysis. By the same token, however, quantitative analysis, such as determining the concentration of a chemical group, is possible through the power of advanced regression algorithms.

The exact location and magnitude of overtone and combination bond vibrations for functional groups are very difficult to determine because of the effects from anharmonicity and dipole moment changes. As a rule of thumb, bond strength and reduced mass determine the band frequency location, while the dipole moment and anharmonicity affect the band’s magnitude. Other factors such as hydrogen bonding and neighboring groups will have secondary effects on location and magnitude.

3 Light Absorption in a Scattering Medium

The realization that electromagnetic radiation behaves as both a corpuscle (particle) and a wave has led to several theoretical models to describe the infrared spectral response. We will consider some of these briefly, with the reader directed to particular seminal texts for more detail.

3.1 Light Without Scattering

This is the most common model used in calculating the concentration of a solute in gases and clear liquids, in other words media with negligible scattering. The theory was originally developed by Pierre Bouguer (1729) and Johann Lambert (1760) independently, and later expanded by August Beer (1852) to include substances of varying concentrations within the media. In the translated words (from Latin) of Lambert’s Theorem 68, “The logarithm of the remaining light, when it is weakened in a less transparent medium, is related by a ratio to the maximum of all the intermediary obstructing material of that medium, which it encounters along its path, and to whatever manner in which the obstructing material may be disseminated in the medium and whatever the curvature of the path is.” (p. 391). With inclusion of Beer’s contribution, the law states that the intensity I of light decreases exponentially with penetration distance d and the concentration of the compound of interest [J],

$$ \frac{I(d)}{I\left(d=0\right)} = {e}^{-k\left[J\right]d} $$
(3.8)

where k is an absorption coefficient (formerly called the extinction coefficient when dealing with molar concentrations). Equation 3.8 is universally referred to as the Beer-Lambert Law, secondarily as Beer’s law, and lastly, but seldom, the Beer-Lambert-Bouguer Law. A more familiar format appears when the base 10 logarithm of each side is taken,

$$ A= \log \left(\frac{1}{T}\right)={k}^{\prime}\left[J\right]d $$
(3.9)

in which we write transmittance (T) as the ratio of the intensity of the light at depth d to that at the surface. In practice, the concentration [J] may be written in terms of molarity (\( {n}_{solute}/{volume}_{solution} \)), mole fraction (\( {n}_{solute}/{n}_{solvent} \)), or mass fraction (\( {mass}_{solute}/{mass}_{solution} \)), with the units for k′ selected accordingly so that the right-hand side product is dimensionless. Direct application of Eq. 3.9 occurs with the use of simple spectrophotometers in which cuvettes of precise dimension are used to measure the intensity of transmitted light through a clear solution at a single wavelength, typically in the UV region. Upon the development of the calibration curve, absorbance is directly related to the concentration of a conjugate from a biochemical assay. The complexities of natural materials, such as plant and animal tissues, lead to a stretching of the rules for the Beer-Lambert equation, yet this is done conveniently and frequently with success.
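The link between Eqs. 3.8 and 3.9 can be shown numerically: taking the base 10 logarithm rescales the coefficient, so that k′ = k / ln(10). The sketch below uses hypothetical values for k, [J], and d (in arbitrary but mutually consistent units), and the function names are illustrative.

```python
import math


def transmittance(k, conc, d):
    """Eq. 3.8: I(d) / I(0) = exp(-k [J] d)."""
    return math.exp(-k * conc * d)


def absorbance(t):
    """Eq. 3.9: A = log10(1 / T)."""
    return math.log10(1.0 / t)


def concentration_from_absorbance(a, k_prime, d):
    """Invert Eq. 3.9 for [J], given a base 10 coefficient k' and pathlength d."""
    return a / (k_prime * d)


# Assumed illustrative inputs; units must make k * conc * d dimensionless.
K = 2.0
CONC = 0.15
D = 1.0

T = transmittance(K, CONC, D)
A = absorbance(T)
K_PRIME = K / math.log(10.0)  # base e coefficient rescaled to base 10

print(f"T = {T:.4f}, A = {A:.4f}")
print(f"recovered [J] = {concentration_from_absorbance(A, K_PRIME, D):.4f}")
```

In a real assay k′ is not known beforehand; it is the slope of the calibration curve obtained from standards of known concentration.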

In diffuse reflectance analysis, liberties are taken with the Beer-Lambert law to allow the substitution of reflected or, in Dahm and Dahm’s (2007) terminology, remitted energy for transmitted energy. A simplified representation of diffuse reflectance is shown in Fig. 3.5. In this case, white or monochromatic light is collimated and then directed onto the surface of a sample, whereupon the light may (1) be directly reflected from the surface of the first particle that it encounters; (2) penetrate the first surface, followed by additional internal reflections and transmissions with other particles; (3) be remitted from the surface that it entered; or (4) be absorbed by an atomic bond. Mathematical modeling of these phenomena is the subject of ongoing investigation (Dahm and Dahm 2007). In practice, terms on the right side of Eq. 3.9 are commonly lumped together and collectively referred to as the concentration of the compound of interest. Hence,

Fig. 3.5 Schematic of light remittance

$$ \log \left(\frac{1}{R}\right)\approx \left[ analyte\right] $$
(3.10)

with square brackets indicating concentration. The implementation of this relationship implies that pathlength is constant across samples and that other compounds or analytes are not interfering. Using ground wheat as an example, typical uncorrected log(1/R) spectra are shown in Fig. 3.6a, b for the mid-IR and NIR regions, respectively. What is obvious from either plot is the lack of a clear baseline response despite the existence of wavenumber or wavelength regions of low spectral absorption. The non-horizontal behavior of these spectra, especially noticeable in the NIR region, is caused by scatter. Because pathlength in a scattering medium is extremely difficult, if not impossible, to determine, the assumption of constant pathlength is favored when samples have the same particle size distribution. A workaround to the particle size problem is to apply a mathematical correction to the log(1/R) spectrum, typically a multiplicative scatter (signal) correction (Martens and Næs 1989), a standard normal variate transformation (Barnes et al. 1989), or a first or higher order derivative, as explained in Sect. 3.4. The problem of interfering absorbers is addressed by considering responses at more than one wavelength, whereupon by using linear modeling methods (multiple linear regression, principal component, partial least squares) or nonlinear methods (artificial neural networks, support vector machines) quantitative models of sufficient accuracy are possible. Known collectively as chemometrics, such algorithms for quantitative and qualitative analysis of spectra are detailed extensively in several texts (Mark and Workman 2007; Naes et al. 2002; Varmuza and Filzmoser 2009; Jolliffe 2002; Cristianini and Shawe-Taylor 2000). 
One further simplification that is often employed in remote sensing hyperspectral analysis is the use of reflectance R directly, in which it is assumed that the degree of nonlinearity between R and its log reciprocal transform is negligible [for example, over the reflectance range 0.2–0.8, the coefficient of determination with log(1/R) is 0.97].
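The near-linearity between R and log(1/R) over a restricted range can be checked with a few lines of code. This sketch assumes reflectance values evenly spaced over 0.2–0.8, which is one plausible reading of the statement above rather than a documented procedure; the helper name `r_squared` is hypothetical.

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination for a simple linear fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Assumed sampling: 601 evenly spaced reflectance values from 0.2 to 0.8.
reflectance = [0.2 + 0.001 * i for i in range(601)]
log_inverse = [math.log10(1.0 / r) for r in reflectance]

r2 = r_squared(reflectance, log_inverse)
print(f"r^2 between R and log(1/R) over 0.2-0.8: {r2:.3f}")
```

Widening the range toward low reflectance, where log(1/R) rises steeply, would be expected to lower this value.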

Fig. 3.6 Mid-IR and near-IR spectra of ground wheat

3.2 Kubelka-Munk

Unlike the Beer-Lambert-Bouguer theory that was based on transmission and adopted for diffuse reflectance, the theory known as Kubelka-Munk (K-M) is fundamentally based on reflectance from a scattering medium, with its application primarily relegated to the paper and paints industries. As with Beer-Lambert-Bouguer, K-M theory is most appropriate for media and analytes of low absorption (Olinger et al. 2001). Originally developed by Kubelka and Munk (1931), the theory assumed light to be traveling through a continuum, a medium with no internal boundaries such as particle surfaces. Further, the light is modeled as having a forward flux and a backward flux. Diffuse illumination at the surface is also assumed and the medium scatters the radiation isotropically. The behavior of radiation is written as a combination of two constants, K and S, which have analogies to absorption and scatter. With solution of coupled differential equations for forward and backward radiations, the well known Kubelka-Munk function was derived (Kortüm 1969)

$$ \mathrm{F}\left({R}_{\infty}\right)=\frac{K}{S}=\frac{{\left(1-{R}_{\infty}\right)}^2}{2{R}_{\infty }} $$
(3.11)

In this equation, \( {R}_{\infty } \) is the remitted radiation from a medium of infinite thickness, which may be experimentally determined by observing when the addition of depth to a sample produces no further change in R. For the mid-IR and near-IR regions, infinite thickness is a reasonable assumption for thicknesses greater than several millimeters. As explained by Dahm and Dahm (2007), the problem with the Kubelka-Munk equation in practice arises with attempts at disentangling K and S. Ideally, one would like to treat K as a pure absorption coefficient in the same manner as Beer-Lambert so that concentrations of absorbing compounds can be accurately modeled. In reality, the equation falls short because (1) a two-flux model oversimplifies light direction; (2) specimens are not diffusely illuminated, but instead illuminated with collimated light; (3) the medium is not a continuum but instead consists of discrete particles that individually reflect and refract light; and (4) given that the instrument measures remitted energy, the expression does not inherently provide a means to separate absorption, hence concentration of a component, from scatter. Compared with log(1/R), \( \mathrm{F}\left({R}_{\infty}\right) \) is more strongly affected by baseline errors (Griffiths 1995). Also, from experimental measurements of reflectance from a three-component mixture of varying proportions (NaCl as the nonabsorbing matrix, graphite as a general absorbing compound, and carbazole as a typical organic analyte possessing both C–H and N–H bonds), Olinger and Griffiths (1988) reasoned that the linearity of \( \mathrm{F}\left({R}_{\infty}\right) \) with concentration of an absorbing compound (carbazole in this case) is highest when the matrix is nonabsorbing because the photons have more opportunity to undergo interactions with many particles before leaving the sample surface. Linearity drops off as the matrix becomes more absorbing, as they observed when 5 % graphite by weight was added to the NaCl matrix. 
Conversely, the theoretical lack of a dependency of the linearity of log(1/R) with many particle interactions for a photon is the reasoning behind the general better performance of log(1/R) in diffuse reflectance spectroscopy of powdered materials. Because of these limitations, the Kubelka-Munk theory is not commonly applied to near-IR diffuse reflectance spectroscopy of biological and agricultural materials.
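For comparison of the two transforms discussed in this section, the Kubelka-Munk function of Eq. 3.11 and log(1/R) can be evaluated side by side. This is a minimal sketch with hypothetical function names and arbitrary reflectance values; it computes only the transforms and makes no attempt to separate K from S.

```python
import math


def kubelka_munk(r_inf):
    """Eq. 3.11: F(R_inf) = (1 - R_inf)^2 / (2 R_inf), for 0 < R_inf <= 1."""
    if not 0.0 < r_inf <= 1.0:
        raise ValueError("reflectance must lie in the interval (0, 1]")
    return (1.0 - r_inf) ** 2 / (2.0 * r_inf)


def log_inverse_reflectance(r_inf):
    """log10(1 / R), the transform most commonly used in NIR reflectance."""
    return math.log10(1.0 / r_inf)


# Arbitrary example reflectance values, from weakly to strongly absorbing.
for r in (0.9, 0.5, 0.2, 0.05):
    print(f"R = {r:4.2f}  F(R) = {kubelka_munk(r):7.3f}  "
          f"log(1/R) = {log_inverse_reflectance(r):5.3f}")
```

The printout shows how much faster F(R) grows than log(1/R) as reflectance falls.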

3.3 Diffusion Theory

This is also a continuum approach that has gained use in modeling the decay of light in biological tissue and uses a mathematical model to derive an expression for remitted light as a function of a coefficient due to absorption (\( {\mu}_a \)) and another coefficient due to scattering, called the transport scattering coefficient (\( {\mu}_s^{\prime } \)), in which scattering is assumed to be isotropic (Farrell et al. 1992). An assumption of a highly scattering matrix allows for the development of a diffusion equation for photon propagation, as derived from the Boltzmann radiative transport equation. For the special condition of considering light as a point source directed onto a semi-infinite medium at a direction normal to the surface, the remitted radiation R(r), where r is the radial distance from the point of entry, can be derived (Farrell et al. 1992),

$$ R(r)=\frac{a^{\prime }}{4\pi}\left[\frac{1}{\mu_t^{\prime }}\left({\mu}_{\mathrm{eff}}+\frac{1}{r_1}\right)\frac{e^{-{\mu}_{\mathrm{eff}}{r}_1}}{r_1^2}+\left(\frac{1}{\mu_t^{\prime }}+\frac{4A}{3{\mu}_t^{\prime }}\right)\left({\mu}_{\mathrm{eff}}+\frac{1}{r_2}\right)\frac{e^{-{\mu}_{\mathrm{eff}}{r}_2}}{r_2^2}\right] $$
(3.12)

where the transport albedo, \( {a}^{\prime }={\mu}_s^{\prime }/\left({\mu}_a+{\mu}_s^{\prime}\right) \), the effective attenuation coefficient, \( {\mu}_{\mathrm{eff}}={\left[3{\mu}_a\left({\mu}_a+{\mu}_s^{\prime}\right)\right]}^{1/2} \), \( {r}_1 = {\left[{\left(1/{\mu}_t^{\prime}\right)}^2+{r}^2\right]}^{1/2} \), and \( {r}_2 = {\left[{\left(1/{\mu}_t^{\prime }+4A/3{\mu}_t^{\prime}\right)}^2+{r}^2\right]}^{1/2} \). Further, the total interaction coefficient, \( {\mu}_t^{\prime }={\mu}_a+{\mu}_s^{\prime } \), and A is a parameter that is related to the internal reflection and is derived from the Fresnel reflection coefficients. In practice, A may be determined empirically as a function of the relative refractive index (Groenhuis et al. 1983), and with additional simplification it may be treated as a constant. With this assumption the right side of Eq. 3.12 becomes an expression in only two parameters, the absorption coefficient \( {\mu}_a \) and the transport scattering coefficient \( {\mu}_s^{\prime } \). Among other features, the diffusion theory approach differs from K-M in that absorption and scattering are mathematically decoupled.

From experimental measurements of R(r) at various radial positions and inverse application of Eq. 3.12, separate values for \( {\mu}_a \) and \( {\mu}_s^{\prime } \) are determined over the wavelength range of interest, thus producing separate absorption spectra and scattering spectra. Lu and coworkers (Qin and Lu 2008; Lu et al. 2010) developed this approach using line scan hyperspectral imaging (λ = 500–1,000 nm) to nondestructively examine ripeness in tomatoes (Qin and Lu 2008), bruising in apples (Lu et al. 2010), and mechanical damage in pickling cucumbers (Lu et al. 2011).
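The forward calculation of Eq. 3.12 can be sketched as follows, using the total interaction coefficient μt′ = μa + μs′ and the definitions of a′, μeff, r1, and r2 given above. The coefficient values, the units (mm⁻¹ and mm), and the default A = 1.0 are hypothetical placeholders for illustration; in practice A depends on the relative refractive index. Recovering μa and μs′ from measured profiles would require a nonlinear fit, which is not shown.

```python
import math


def diffuse_reflectance(r, mu_a, mu_s_prime, a_param=1.0):
    """Eq. 3.12: remitted radiation at radial distance r from the entry point.

    mu_a and mu_s_prime share the same inverse length unit as r.
    a_param is the internal reflection parameter A (assumed constant here).
    """
    mu_t = mu_a + mu_s_prime                   # total interaction coefficient
    albedo = mu_s_prime / mu_t                 # transport albedo a'
    mu_eff = math.sqrt(3.0 * mu_a * mu_t)      # effective attenuation coefficient
    depth_1 = 1.0 / mu_t
    depth_2 = 1.0 / mu_t + 4.0 * a_param / (3.0 * mu_t)
    r1 = math.sqrt(depth_1 ** 2 + r ** 2)
    r2 = math.sqrt(depth_2 ** 2 + r ** 2)
    term_1 = depth_1 * (mu_eff + 1.0 / r1) * math.exp(-mu_eff * r1) / r1 ** 2
    term_2 = depth_2 * (mu_eff + 1.0 / r2) * math.exp(-mu_eff * r2) / r2 ** 2
    return albedo / (4.0 * math.pi) * (term_1 + term_2)


# Assumed illustrative coefficients (mm^-1) for a highly scattering medium.
MU_A = 0.01
MU_S_PRIME = 1.0

profile = [diffuse_reflectance(r, MU_A, MU_S_PRIME) for r in (1.0, 2.0, 4.0, 8.0)]
for r, value in zip((1.0, 2.0, 4.0, 8.0), profile):
    print(f"r = {r:3.0f} mm  R(r) = {value:.3e}")
```

The rate at which the profile decays with r is what carries the information used to separate absorption from scattering.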

4 Practical Outcomes for Near Infrared Reflectance

Because log(1/R) is most commonly used in NIR spectroscopy the following discussion will assume this format, though many of the transformations may be applied to the other formats of reflectance data mentioned in the previous section. Broadly termed as spectra preprocessing, these transformations are performed for improving signal-to-noise and minimizing the effect of scatter often with the expectation that band intensities become more linearly related to the concentration of the absorbing compound. Improvement in signal-to-noise is typically performed by a smoothing operation, such as a running mean,

$$ {\overline{A}}_j = \frac{A_{j-l} + \dots + {A}_{j-1}+{A}_j+{A}_{j+1}+\dots +{A}_{j+l}}{2l+1} $$
(3.13)

where \( {A}_j \) is the original spectral value at wavelength j and \( {\overline{A}}_j \) is the mean value as determined from the original value as well as the neighboring l points on the left and the same number of points on the right. Selection of the size for l should be based on the inherent bandpass of the spectrometer (typically 10 nm for dispersive scanning monochromators) and the size of the absorption band of interest. In practice, the value is selected by trial and error, with too small a value yielding insufficient noise reduction and too large a value attenuating the higher frequency absorption bands. Using simulated spectral data, Brown and Wentzell (1999) warned of the deleterious effect of smoothing on multivariate calibrations such as principal component (PC) regression. Also, they noted that smoothing has the greatest chance of being beneficial under conditions of high measurement noise and wavelength-to-wavelength correlation, in which case the improvement in characterization of the spectral subspace through PC reduction offsets the losses caused by spectral distortion.
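A running mean of this kind averages the 2l + 1 points centered on each wavelength. The sketch below is one minimal way to write it; leaving the first and last l points unsmoothed is an assumption made here for simplicity, since the treatment of the spectrum ends is not specified in the text.

```python
def running_mean(values, l):
    """Centered running mean over 2l + 1 points (Eq. 3.13).

    Points within l of either end are returned unchanged because a full
    window does not fit there (an assumed edge convention).
    """
    n = len(values)
    smoothed = list(values)
    for j in range(l, n - l):
        window = values[j - l : j + l + 1]
        smoothed[j] = sum(window) / (2 * l + 1)
    return smoothed


# A single-point spike is spread over the window and reduced in height,
# which is the same mechanism that attenuates narrow absorption bands.
print(running_mean([0, 0, 3, 0, 0], 1))
```

The spike example makes the trade-off concrete: the larger l is, the more both noise and genuine narrow features are flattened.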

Spectral derivatives, or more accurately stated, spectral differences, usually of the first and second order, are applied for removal of vertical offset and slope effects. In simplest form, these are two-point (first) and three-point (second) central finite difference expressions. Although the points need not be consecutive (in which case the difference becomes a poor approximation to the true derivative but nevertheless may produce a better calibration), the intervals between the end points and the central point should be equal. With these Δy/Δx and Δ²y/Δx² difference expressions, it is common to omit the denominator term when chemometric modeling is the goal. The omission is not a problem unless one is trying to accurately show the values for the derivative spectrum for its own sake or one is attempting to compare derivative spectra possessing different values for Δx.
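The two-point and three-point central differences, with an adjustable gap between points and an optional denominator, can be sketched as below. The function names and the `gap` and `dx` parameters are hypothetical; passing `dx=None` mimics the common practice of omitting the denominator.

```python
def first_difference(y, gap=1, dx=None):
    """Two-point central difference y[i+gap] - y[i-gap] for interior points.

    If dx (the spacing between consecutive readings) is given, the result
    is divided by 2 * gap * dx to approximate the first derivative.
    """
    scale = 1.0 if dx is None else 2.0 * gap * dx
    return [(y[i + gap] - y[i - gap]) / scale for i in range(gap, len(y) - gap)]


def second_difference(y, gap=1, dx=None):
    """Three-point central difference y[i+gap] - 2 y[i] + y[i-gap].

    If dx is given, the result is divided by (gap * dx)^2 to approximate
    the second derivative.
    """
    scale = 1.0 if dx is None else (gap * dx) ** 2
    return [(y[i + gap] - 2.0 * y[i] + y[i - gap]) / scale
            for i in range(gap, len(y) - gap)]


# A pure offset plus slope: removed entirely by the second difference,
# reduced to a constant by the first difference.
baseline = [5.0 + 3.0 * x for x in range(11)]
print(first_difference(baseline, gap=2, dx=1.0))
print(second_difference(baseline, gap=2, dx=1.0))
```

Because both differences use points at equal distances on either side of the central point, the result stays aligned with the original wavelength axis.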

A more common form of spectral differentiation is the Savitzky-Golay polynomial approximation procedure, as first popularized by the authors (Savitzky and Golay 1964). Using a sliding window along the wavelength axis, and assuming a constant wavelength spacing, the window of points (typically an odd number between 5 and 25) is fitted using least squares regression to a polynomial of second through sixth order, whereupon the analytical derivative of the polynomial function is evaluated at the central point of the window. The procedure can be computationally simplified to a convolution operation using the same number of points in the window and using published coefficient values, as reported in their original paper and later corrected by Steinier et al. (1972).

Spectral differentiation is shown by example in Fig. 3.7. A ‘spectrum’ has been created by adding two Gaussian functions, with one being twice the magnitude and twice the width of the other, to a sloped line and then adding random noise. In this simple example, the vertical offset in the absorbance curve is replaced by a small offset (=1/1,500 on average, i.e., the slope of the upwardly trending line in the absorbance curve) in the first derivative curve, which disappears altogether in the second derivative. The two local absorbance maxima become zero crossing points in the first derivative curve but reappear as local minima in the second derivative curve. This is typical behavior and explains why the second derivative is generally easier to interpret than the first derivative. However, interpretation of the second derivative is tricky even without considering the complexities of vibrational physics. First, the magnitude of a band in the second derivative depends on its width as well as its height (for a Gaussian, the second derivative at the band center scales as the height divided by the square of the width). In the simple example of Fig. 3.7, the Gaussian band at the lower wavelength, despite having half the magnitude of the other Gaussian band, has the larger absolute value in the second derivative: because the lower wavelength band has half the width of the upper band, its curvature, and hence its second derivative magnitude, is greater. Second, derivatives tend to amplify noise, as seen in the progression from a smooth absorbance curve (Fig. 3.7a) to a first derivative curve with noise of pen width in magnitude (Fig. 3.7b), and on to a second derivative curve with very noticeable noise (Fig. 3.7c). In practice, the noise amplification effect is not as pronounced because the ‘noise’ in a real spectrum is not entirely random, as it is in the artificial spectrum of Fig. 3.7, but instead may largely consist of baseline drift, which is dominated by low frequencies in the noise power spectrum. The low-frequency nature means that noise levels at neighboring wavelengths are not fully independent. Brown et al. (2000) investigated drift noise through simulation and found that derivatives may reduce drift noise, but at the same time spectra can be distorted with respect to the underlying chemical constituents, thus making it difficult to predict the benefit of this preprocessing technique in multivariate calibrations.

To illustrate such distortion, if the left Gaussian band in Fig. 3.7 were centered at 1,750 nm rather than 1,700 nm, the resulting absorbance spectrum (Fig. 3.8a) would appear as one broad but asymmetrical band superimposed on the upwardly trending baseline. Upon first differentiation using the same Savitzky-Golay (SG) convolution function, the two zero crossing points are replaced by one located at approximately 1,770 nm (Fig. 3.8b), which is between the absorbance peak positions of 1,750 and 1,800 nm. The second derivative (again with the same function as originally used) has local minima at 1,750 nm (the same as the original band position) and 1,814 nm (Fig. 3.8c), the latter being 14 nm longer than the absorbance peak position. This helps to explain why NIR calibrations are often trial and error operations that are enhanced by a priori knowledge of the locations and magnitudes of bands associated with the analyte of interest.
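
The sliding-window least squares fit can be sketched in a few lines of Python. What follows is a minimal illustration under stated assumptions, not the published Savitzky-Golay tables or the code used to produce Fig. 3.7: the function names (`savgol_coeffs`, `savgol_derivative`, `gaussian`) are hypothetical, the 1 nm wavelength spacing and the random seed are assumed, and the artificial spectrum is rebuilt from the values given in the caption of Fig. 3.7. The convolution weights are obtained directly from the pseudo-inverse of the polynomial design matrix rather than from a lookup table.

```python
import numpy as np
from math import factorial


def savgol_coeffs(window, polyorder, deriv, delta=1.0):
    """Convolution weights that return the deriv-th derivative at the window centre."""
    half = window // 2
    z = np.arange(-half, half + 1, dtype=float)             # point offsets from the centre
    design = np.vander(z, polyorder + 1, increasing=True)   # columns z**0 ... z**polyorder
    # Row k of the pseudo-inverse maps the window values to the fitted coefficient a_k;
    # the deriv-th derivative of the fitted polynomial at z = 0 is deriv! * a_deriv.
    return factorial(deriv) * np.linalg.pinv(design)[deriv] / delta**deriv


def savgol_derivative(x, y, window=11, polyorder=3, deriv=1):
    """Return (x, derivative) at the points where the full window fits."""
    half = window // 2
    delta = x[1] - x[0]                                      # assumes constant spacing
    weights = savgol_coeffs(window, polyorder, deriv, delta)
    # np.convolve reverses its kernel, so the weights are reversed first.
    return x[half:-half], np.convolve(y, weights[::-1], mode="valid")


def gaussian(x, centre, fwhm, height):
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return height * np.exp(-0.5 * ((x - centre) / sigma) ** 2)


# Artificial spectrum following the caption of Fig. 3.7 (1 nm spacing is an assumption).
rng = np.random.default_rng(0)
wl = np.arange(1000.0, 2501.0, 1.0)
baseline = 0.2 + (wl - 1000.0) / 1500.0
absorbance = (baseline
              + gaussian(wl, 1700.0, 50.0, 0.25)
              + gaussian(wl, 1800.0, 100.0, 0.50)
              + rng.uniform(-0.0004, 0.0004, wl.size))

# Cubic polynomial, 11 point window, as in Fig. 3.7b and c.
wl_d, d1 = savgol_derivative(wl, absorbance, deriv=1)
_, d2 = savgol_derivative(wl, absorbance, deriv=2)

i1700 = int(np.argmin(np.abs(wl_d - 1700.0)))
i1800 = int(np.argmin(np.abs(wl_d - 1800.0)))
print("mean first derivative:", d1.mean())
print("second derivative at 1,700 and 1,800 nm:", d2[i1700], d2[i1800])
```

Under these assumptions the mean of the first derivative should sit close to the baseline slope of 1/1,500, and the narrower band at 1,700 nm should show the larger second derivative magnitude. Library routines such as `scipy.signal.savgol_filter` offer the same operation with additional options for handling the window edges.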

Fig. 3.7

Demonstration of spectral differentiation. (a) Artificial spectrum created by adding two Gaussian bands (full widths at half maximum of 50 and 100 nm, with peak values of 0.25 and 0.50, respectively, and centered at 1,700 and 1,800 nm, respectively) to a sloped line [y(1,000 nm) = 0.2, y(2,500 nm) = 1.2], then adding random noise (−0.0004 to 0.0004 peak to peak, uniformly distributed). (b) Savitzky-Golay first derivative (cubic polynomial, 11 point convolution window). (c) Savitzky-Golay second derivative (cubic polynomial, 11 point convolution window)

Fig. 3.8

Demonstration of spectral distortion by differentiation. The conditions that produced graphs (a–c) are identical to those of Fig. 3.7, with the exception that the low wavelength absorption peak is located at 1,750 nm and the dashed lines are moved to this location

Because of the complexities of the scatter-absorption effect, separation of these components through theoretical means as described earlier is often replaced by working corrections as part of spectral preprocessing. The two most common full spectrum approaches are the multiplicative signal (scatter) correction (MSC) and the standard normal variate (SNV) transformation. With MSC, as popularized by Martens (see Geladi et al. 1985), a sample’s reflectance spectrum is corrected to have roughly the same degree of scatter as the other samples within the calibration set. The usual procedure is to calculate the mean spectrum of the calibration set and then, for every sample within the set, to develop a least squares correction (most often a first order polynomial, though it can be of higher order) by regressing the spectrum’s points onto those of the mean spectrum. The regression coefficients are then used to ‘correct’ the spectrum to the mean spectrum; in the first order case, the fitted intercept is subtracted from the spectrum and the result is divided by the fitted slope. This has the noticeable effect of collapsing spectra together so that, under ideal conditions, all remaining sample-to-sample spectral differences are attributable to chemical absorption. This transformation requires the retention of the reference (mean) spectrum in order to correct future spectra before the calibration equation is applied. An example of this transformation is shown in Fig. 3.9, which consists of spectra of 198 samples of ground wheat, first with no transform (Fig. 3.9a) and next with MSC (Fig. 3.9b).

Alternatively, one may conduct a scatter correction that is based on each spectrum independently. Known as the standard normal variate (SNV) transformation, this correction has a form similar to that of a standard score (z-score) in statistics: within each spectrum, the mean value (over the wavelength region) is subtracted from each spectral value, and this difference is then divided by the standard deviation of the spectrum’s values (Barnes et al. 1989), as shown by example in Fig. 3.9c. This results in each transformed spectrum having a mean of zero and a standard deviation of unity. As with MSC, the intention is that the benefits of the SNV correction through reduction of variation from scatter outweigh whatever losses in chemical information result from spectral distortion.
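
Both corrections are short enough to sketch directly. The Python below is an illustrative sketch under stated assumptions, not the procedure used to produce Fig. 3.9: the function names `msc` and `snv` are hypothetical, the spectra are synthetic (one ‘chemical’ band distorted by an additive offset and a multiplicative gain per sample), only the first order form of MSC is shown, and the n − 1 divisor for the standard deviation is an assumption.

```python
import numpy as np


def msc(spectra, reference=None):
    """First order MSC; spectra has shape (n_samples, n_wavelengths)."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)        # mean spectrum of the calibration set
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        # Least squares fit of: spectrum = intercept + slope * reference
        slope, intercept = np.polyfit(reference, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected, reference                  # keep the reference for future spectra


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum on its own values."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)   # n - 1 divisor (assumed)
    return (spectra - mean) / std


# Synthetic calibration set: the same band seen through different scatter conditions.
wl = np.linspace(1100.0, 2500.0, 141)
chem = 0.3 + 0.2 * np.exp(-0.5 * ((wl - 1940.0) / 40.0) ** 2)
offsets = np.array([0.00, 0.05, 0.10, 0.20])
gains = np.array([0.8, 1.0, 1.1, 1.3])
raw = offsets[:, None] + gains[:, None] * chem[None, :]

msc_spectra, mean_spectrum = msc(raw)
snv_spectra = snv(raw)

# A future spectrum is corrected against the stored reference, not a new mean.
future = 0.15 + 0.9 * chem[None, :]
future_corrected, _ = msc(future, reference=mean_spectrum)
```

Because the synthetic spectra differ only by offset and gain, both transformations collapse them onto a single curve. Real spectra also differ chemically, so the collapse is only partial.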

Fig. 3.9

Example of spectral scatter removal techniques using a set of 198 ground wheat samples. (a) Raw log(1/R) spectra. (b) With multiplicative scatter correction (MSC). (c) With standard normal variate (SNV)

5 Application to Imaging

The principles of NIR spectroscopy carry over to NIR hyperspectral imaging. With the latter, the measurement of energy remitted from a broad surface, as read by one or a set of detectors, is replaced by image measurement from a camera sensor array in which each element or pixel captures energy from a small region of the sample surface. The spectral dimension arises from one of two general formats: a liquid crystal tunable filter employed to capture two-dimensional images at a series of tuned wavelengths, or a dispersion device called a spectrograph that is placed between the lens and body of the camera. In the latter format, a slit between the lens and the spectrograph reduces the focused image to a narrow line. The line of light is passed on to the spectrograph, which then disperses the light from each ‘point’ along the line into a series of wavelengths. By methodically advancing the location of the line on the object, either by moving the camera or moving the object, additional lines are imaged until the entire object has been scanned and reproduced as a mosaic of lines.

In its simplest and most common form, camera array readings are referenced to a highly reflecting Lambertian material such as ‘Spectralon’ (Labsphere, North Sutton, NH) while also being corrected for the dark current of the sensor. In such a case, reflectance (R) becomes

$$ R=\frac{E_{sample}-{E}_{dark}}{E_{reference}-{E}_{dark}} $$
(3.14)

where E_x is the energy measured for component x (sample, reference, or dark). The reference material is treated as being 100% reflective, and Eq. 3.14 assumes a linear response for sample reflectance. Alternatively, sample reflectance may be determined using a higher order polynomial to describe the response (Burger and Geladi 2005). In such cases, a set of reflectance standards with traceable reflectance values (typically 3–8 samples of Spectralon doped with carbon black) whose reflectances span the anticipated range of the samples is used to develop a calibration equation. For example, assuming a quadratic response, reflectance is written as (Burger and Geladi 2005),

$$ R = {b}_0+{b}_1E+{b}_2{E}^2. $$
(3.15)

During calibration, the left side values of Eq. 3.15 are known for the reflectance standards, E is measured for each standard, and the coefficients b_0, b_1, and b_2 are determined by least squares regression. The regression procedure is performed at each wavelength, which may be done on a pixel-by-pixel basis or globally using the median spectrum, as determined from pixels within a region of interest.
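
The two calibration routes of Eqs. 3.14 and 3.15 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the detector counts and the ‘true’ response coefficients are invented for demonstration, and a single wavelength is treated (in practice the fit would be repeated at each wavelength, per pixel or on a median spectrum as described above).

```python
import numpy as np


def flat_field_reflectance(e_sample, e_reference, e_dark):
    """Eq. 3.14: linear response, reference treated as 100% reflective."""
    e_sample = np.asarray(e_sample, dtype=float)
    e_reference = np.asarray(e_reference, dtype=float)
    return (e_sample - e_dark) / (e_reference - e_dark)


def fit_quadratic_response(e_standards, r_standards):
    """Least squares estimates of b0, b1, b2 in Eq. 3.15 at one wavelength."""
    b2, b1, b0 = np.polyfit(e_standards, r_standards, 2)   # highest power first
    return b0, b1, b2


def apply_quadratic_response(e, coeffs):
    b0, b1, b2 = coeffs
    e = np.asarray(e, dtype=float)
    return b0 + b1 * e + b2 * e**2


# Linear route: hypothetical detector counts.
dark, reference = 100.0, 4100.0
sample_counts = np.array([100.0, 2100.0, 4100.0])
r_linear = flat_field_reflectance(sample_counts, reference, dark)

# Quadratic route: six hypothetical standards whose reflectances are 'known'.
true_coeffs = (-0.02, 2.0e-4, 1.0e-8)                      # invented for illustration
e_std = np.array([200.0, 600.0, 1300.0, 2400.0, 3400.0, 4300.0])
r_std = apply_quadratic_response(e_std, true_coeffs)
fitted = fit_quadratic_response(e_std, r_std)
r_sample = apply_quadratic_response(sample_counts, fitted)
```

With noise-free synthetic standards the fitted coefficients reproduce the invented ones. With real standards, the residuals of the fit indicate whether a quadratic is adequate.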

5.1 Collection of the Hypercube

Hyperspectral systems fall into two broad categories depending on the method of light dispersion. As illustrated in Fig. 3.10a, the tunable filter system collects two-dimensional spatial images at each ‘tuned’ wavelength. This results in a stack of spatial images, with each page in the stack representing a separate wavelength. The push broom system (Fig. 3.10b) instead builds up a stack of pages that each have one spatial and one spectral dimension, with each page representing a separate line position on the object. Regardless of the mode of instrument operation, the stored data, known as a hypercube, consists of one spectral and two spatial dimensions.
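
The equivalence of the two collection modes can be shown with a small array example. This is a sketch under assumed conventions: the axis order (x, y, λ), the array sizes, and the variable names are all hypothetical, and the ‘scene’ is simply a block of numbered values standing in for real measurements.

```python
import numpy as np

n_x, n_y, n_bands = 6, 5, 4                       # hypothetical sizes

# Stand-in for the true scene, indexed as (x, y, wavelength band).
scene = np.arange(n_x * n_y * n_bands, dtype=float).reshape(n_x, n_y, n_bands)

# Tunable filter mode: one (x, y) image is recorded per passband.
band_images = [scene[:, :, k] for k in range(n_bands)]
cube_from_filter = np.stack(band_images, axis=2)

# Push broom mode: one (y, wavelength) frame is recorded per line position x.
line_frames = [scene[i, :, :] for i in range(n_x)]
cube_from_pushbroom = np.stack(line_frames, axis=0)

# Either way, the spectrum of a single pixel is a slice along the last axis.
pixel_spectrum = cube_from_pushbroom[2, 3, :]
```

Only the order in which the pages are acquired differs. Once stacked along the appropriate axis, the two hypercubes are identical.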

Fig. 3.10

Schematic of the two modes for hyperspectral image collection. (a) Stacked wavelength—at a given time instant the camera records two spatial dimensions (x and y) at one passband (λ) of a liquid crystal tunable filter. Recording continues with the next passband. (b) Pushbroom—at a given time instant the camera records one spatial (y) and one spectral (λ) dimension, where the spectral component is created by radiation dispersion through a spectrograph located between a slit and the camera body. Recording continues with relative movement of the object in a direction (x) perpendicular to the other spatial direction

By way of example, a set of 81 spectra collected from an approximately square (9 pixel × 9 pixel) region of a wheat kernel using a push broom hyperspectral imaging system is shown in Fig. 3.11. (Details of the system and settings are found in Delwiche et al. 2012.) A digital photograph of a wheat kernel is added as an inset in Fig. 3.11 for the purpose of showing the approximate location and size of the square region. The 9-element width is just a small portion of the line, which consists of 320 elements. Individual pixel spectra have a much higher level of noise than those from a conventional spectrometer, as seen by comparing Figs. 3.11 and 3.9a. Averaging of all pixel spectra within the square region results in a reduction in noise (Fig. 3.11, solid black curve), albeit at the expense of the fine feature detail of the individual pixels. It should be noted that, whether on an individual pixel level or on a regional level, the principles of spectroscopy, as well as the mathematical transformations leading up to and including qualitative and quantitative modeling, hold true for hyperspectral imaging.
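
The effect of regional averaging can be sketched with a synthetic hypercube. This is an illustration under stated assumptions, not the data of Fig. 3.11: the spectrum shape, noise level, number of bands, number of scanned lines, and region position are all invented, and the pixel noise is assumed to be independent and Gaussian, which real sensor noise need not be.

```python
import numpy as np

rng = np.random.default_rng(1)
n_lines, n_elements, n_bands = 20, 320, 150        # 320 elements per line; rest assumed

# Every pixel sees the same underlying spectrum plus independent noise (assumption).
true_spectrum = 0.4 + 0.1 * np.sin(np.linspace(0.0, 3.0, n_bands))
noise_sd = 0.02
cube = true_spectrum + rng.normal(0.0, noise_sd, size=(n_lines, n_elements, n_bands))

# Extract a 9 x 9 pixel region and flatten it to 81 individual spectra.
roi = cube[5:14, 100:109, :]
pixel_spectra = roi.reshape(-1, n_bands)
mean_spectrum = pixel_spectra.mean(axis=0)

# Root mean square deviation from the underlying spectrum, before and after averaging.
pixel_rms = np.sqrt(np.mean((pixel_spectra - true_spectrum) ** 2))
mean_rms = np.sqrt(np.mean((mean_spectrum - true_spectrum) ** 2))
print("noise reduction factor:", pixel_rms / mean_rms)
```

For independent noise the reduction factor approaches the square root of the number of pixels averaged (here 9). Correlated noise or genuine pixel-to-pixel differences in the sample would lower this figure.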

Fig. 3.11

Pixel reflectance spectra and average spectrum from a 9 × 9 pixel square region of a wheat kernel

6 Raman

Raman spectroscopy is based on the behavior of photons from light of a very narrow frequency band (e.g., a laser) striking a sample. For most photons, the energy momentarily raises the energy state of a molecule but is then released unchanged as the molecule returns to its ground state (elastic, or Rayleigh, scattering). A small fraction of photons, however, release a portion of their energy to the molecule, whereby the bond does not return to its ground state and the photon emerges at a lower energy and hence lower frequency. This phenomenon is known as Stokes scattering. The Stokes shift is a measure of the difference in frequency between the incident and emergent photons. Conversely, photons may pick up energy from molecules in the matrix that are already at a higher than ground state as these molecules return to a lower state. In this case, the frequency of the photon released from the medium after scattering is greater than the incident frequency. This is known as anti-Stokes scattering and occurs at an even lower prevalence than Stokes scattering because there are relatively few molecules already in an excited state. As with infrared spectroscopy, selection rules exist for a Raman transition, these being based on the requirement that the polarizability of the molecule changes as the molecule vibrates. Traditionally, Raman spectroscopy has been used in determining force constants, dissociation energies, and bond lengths. The change in polarizability determines the intensity of the bands in the Raman spectrum. Further, the intensity is proportional to the fourth power of the frequency of the scattered light, that is, the frequency of the monochromatic excitation light after shifting,

$$ {I}_{Raman}\propto {\left({v}_0\pm {v}_j\right)}^4{\left(\frac{d\alpha }{dQ}\right)}^2 $$
(3.16)

where v_0 is the frequency of the incident light, v_j is the frequency of the molecular vibration (so that v_0 ± v_j is the frequency of the scattered light), and the squared term is the change in polarizability (α) with respect to the vibrational coordinate (Q) that occurs during vibration. Knowledge of this relationship is useful for two reasons. First, it shows that the frequency of monochromatic light is not fixed by the Raman effect, but rather Raman spectra may be obtained at any number of frequencies. In practice, the monochromatic sources are supplied by lasers, of which the two most popular are the infrared diode laser at 785 nm (12,740 cm−1) and the Nd:YAG laser at 1,064 nm (9,400 cm−1). Second, Raman intensity diminishes with the fourth power of the wavelength of the laser source. Hence, without complicating factors, a shorter wavelength source would be preferable. In reality, fluorescence, which is often prevalent at short excitation wavelengths, becomes the complicating factor. Because Raman signals are inherently weak, fluorescence emission can at times overpower the Stokes lines. This is particularly problematic with botanical samples. Conversely, fluorescence can be avoided by exciting at longer wavelengths, such as with a Nd:YAG laser, but at the expense of reduced Raman intensity.
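
The trade-off between the two laser sources can be put into numbers with the frequency term of Eq. 3.16. The sketch below is illustrative only: the function names are hypothetical, the 1,000 cm−1 Stokes shift is an arbitrary example value, and the polarizability term is assumed to be the same at both excitation wavelengths so that it cancels in the ratio. Wavenumbers are used in place of frequencies, which is permissible here because the two are proportional.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert wavelength in nm to wavenumber in cm^-1 (reciprocal times 1e7)."""
    return 1.0e7 / wavelength_nm


def wavenumber_to_nm(wavenumber_cm1):
    """Convert wavenumber in cm^-1 to wavelength in nm."""
    return 1.0e7 / wavenumber_cm1


def stokes_frequency_term(laser_nm, shift_cm1=0.0):
    """(v0 - vj)^4 for Stokes scattering; the polarizability term is omitted."""
    return (nm_to_wavenumber(laser_nm) - shift_cm1) ** 4


diode_cm1 = nm_to_wavenumber(785.0)      # infrared diode laser
ndyag_cm1 = nm_to_wavenumber(1064.0)     # Nd:YAG laser

# Wavelength at which a 1,000 cm^-1 Stokes band appears under 785 nm excitation.
shift_cm1 = 1000.0                       # arbitrary example shift
stokes_nm = wavenumber_to_nm(diode_cm1 - shift_cm1)

# Relative Raman intensity, 785 nm versus 1,064 nm excitation.
ratio_unshifted = stokes_frequency_term(785.0) / stokes_frequency_term(1064.0)
ratio_shifted = (stokes_frequency_term(785.0, shift_cm1)
                 / stokes_frequency_term(1064.0, shift_cm1))

print(round(diode_cm1), round(ndyag_cm1))
print(round(stokes_nm, 1), round(ratio_unshifted, 2), round(ratio_shifted, 2))
```

Under these assumptions the 785 nm source yields roughly three to four times the Raman intensity of the 1,064 nm source, which is the price paid for the reduced fluorescence at the longer wavelength.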

Raman spectroscopy and infrared spectroscopy, though both based on molecular vibration, are complementary to one another. Bonds that exhibit strong absorption in the infrared, such as those of water, will typically be weak in the Raman, and vice versa. Thus for biological samples, which typically have more than 50% water by mass, Raman spectroscopy offers a means to examine molecular structure in situ.