Introduction

Aroma compounds, which are widely used in food, perfume, medicine, and tobacco, among other products, are economically important. Accurate identification of aroma compounds is a prerequisite for quality control and product development [1–3]. In recent decades, gas chromatography–mass spectrometry (GC–MS) has proved to be a useful tool for identification of volatile and semi-volatile organic compounds because of its high separation performance and accurate identification capability [4–7]. The main purpose of GC–MS analysis is to resolve mixtures of compounds into less complex mixtures or, ultimately, pure components. Usually, compound identification is achieved by library search, i.e. by comparison of experimental mass spectra with those stored in a suitable library (e.g., that of the National Institute of Standards and Technology, NIST) [8]. However, mass spectra cannot provide enough structural information to distinguish all compounds easily. In addition to matching of mass spectra, chromatographic retention data can serve as complementary information for positive identification of resolved components. The retention index (RI), a useful tool for qualitative identification, has been used by many analysts [9–13]. The RI was first proposed by Kováts in 1958 for isothermal chromatography and further developed by van den Dool and Kratz into the linear temperature-programmed retention index (LTPRI) for linear temperature programming. Because it is independent of operating conditions, except for stationary phase polarity [14, 15], the RI is very suitable for interlaboratory comparison and provides a feasible way of investigating potential mechanisms of chromatographic retention behavior. When RI are used for qualitative identification, both homemade and published RI databases can be used as reference. This approach avoids the time-consuming and expensive procedure of identification by injection of pure compounds, especially for complex samples containing hundreds of components for which the corresponding pure standards may not be available.

Nowadays, linear temperature-programmed GC is used for a wide range of applications, and LTPRI has thus become increasingly important for compound identification. Applications of this technique have increased substantially, because non-volatile or thermally unstable compounds can be converted into volatile and fairly stable derivatives. The use of RI in temperature-programmed GC has been reviewed by Gonzales [16]. LTPRI databases for GC–MS analysis are much needed, especially because mass spectral data are far more readily available than retention index data. In fact, since the early days of GC much effort has been devoted to standardization of the methods used to determine retention data, especially the retention index, the main purpose being wider and more reliable use of collated published results for identification. Several retention index databases are available [17, 18]. Notable among these are compilations published as books and widely used as reference works, for example those authored by Jennings and Shibamoto [19] and by Adams [20]. However, most of these databases were compiled from different published studies. Discrepancies may therefore arise, because the retention index data were reported by many different laboratories and the reported operating conditions are not always reliable. Certainly, the best RI database is one developed in a single laboratory under fixed experimental conditions, but this is not an easy task because it is time-consuming and expensive. For aroma compounds, the retention index database at http://www.flavornet.org/f_kovats.html has been widely used. However, the retention indices in this database were collected from many papers published in recent decades, and only Kováts RI for isothermal conditions are included.
To the best of our knowledge, there are very few comprehensive databases of LTPRI of aroma compounds measured on different stationary phases by a single laboratory. One exception is the study of Federica Bianchi in which an RI database of 250 food aroma volatiles on a polar stationary phase was created [21]. However, only one polar stationary phase was used in that study. In addition, mixtures rather than pure standard compounds were used for measurement of retention indices, and the volatile compounds were identified by comparing mass spectra with those stored in the NIST library, so the accuracy of the identifications is uncertain. LTPRI in gas chromatographic analysis of flavors and fragrances has been reviewed by Zellner et al. [22].

In this context, the purpose of this study was to develop an LTPRI database of over 300 aroma compounds and to use it for analysis of tobacco flavor. Alcohols, esters, aldehydes, ketones, ethers, organic acids, and phenols, among other compounds, were included in the database. Retention times of these aroma compounds on three capillary columns (HP-1, HP-5MS, and DB-225MS) of different polarity were determined; the C7–C30 n-alkanes were used as reference substances for LTPRI calculation. To evaluate the stability of the LTPRI, experiments were performed under different experimental conditions: heating rate (4, 6, 8 or 10 °C min−1), flow rate (1 or 2 mL min−1), and initial temperature (50, 60, 70, 80, or 90 °C). The results showed the LTPRI were stable under the different experimental conditions, differences being <10 retention index units for most compounds. Twenty-two compounds for which differences were >10 retention index units are discussed on the basis of molecular structure. We also studied the relationship between retention index and experimental conditions, and retention index normalization. Finally, the developed LTPRI database was used for analysis of tobacco flavor. The results obtained show that many chemical components could be identified by RI matching using this database.

Experimental

Instruments and Samples

GC–MS analysis was performed with an Agilent (USA) 7890A gas chromatograph equipped with an Agilent 5975C mass spectrometer. Three commonly used stationary phases of different polarity were used:

  • HP-5MS capillary column (30 m × 0.25 mm i.d., film thickness 0.25 μm; Agilent);

  • HP-1 capillary column (30 m × 0.25 mm i.d., film thickness 0.25 μm; Agilent); and

  • DB-225MS capillary column (30 m × 0.25 mm i.d., film thickness 0.25 μm; Agilent).

Helium (99.99 %, Changsha Industrial Gas) was used as carrier gas at a flow rate of 1.0 mL min−1. Split injection of 1 μL was conducted with a split ratio of 15:1. The oven temperature program was: initial temperature 50 °C for 1 min, then increased at 6 °C min−1 to 280 °C, which was maintained for 2 min. The precision of temperature control was ±0.1 °C. Mass spectrometry was performed in electron-impact (EI) mode, scan range m/z 33–350, ionization energy 70 eV. The injector and ion source were kept at 250 and 230 °C, respectively. The MS was set to acquire data in “Raw” and “Scan” modes. The volatile compounds were identified by comparing their mass spectra with those stored in the National Institute of Standards and Technology (NIST) library.

All standard aroma compounds were purchased from Shanghai H&K Flavor. C7–C30 n-alkanes were purchased from Sigma–Aldrich, Milan, Italy.

RI Calculation

Retention indices, with C7–C30 n-alkanes as reference substances, were calculated by use of the expression:

$$ \text{RI}(x) = 100 \times z + 100 \times \frac{\text{RT}(x) - \text{RT}(z)}{\text{RT}(z + 1) - \text{RT}(z)} $$
(1)

where RI(x) is the retention index of unknown compound x, z is the number of carbon atoms of the n-alkane eluting before unknown compound x, z + 1 is the number of carbon atoms of the n-alkane eluting after unknown compound x, RT(x) is the retention time of unknown compound x, RT(z) is the retention time of the n-alkane eluting before unknown compound x, and RT(z + 1) is the retention time of the n-alkane eluting after unknown compound x. All the indices were calculated for three replicate injections of the pure compounds.
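
As an illustration of Eq. 1, the following Python sketch locates the two bracketing n-alkanes for a compound and interpolates its retention index. The function name, the dictionary layout, and the retention times in the example are assumptions made for this illustration; they are not data or software from this study.

```python
from bisect import bisect_right


def linear_retention_index(rt_x, alkane_rts):
    """Compute RI(x) from Eq. 1.

    rt_x       -- retention time of the compound (same unit as the alkanes)
    alkane_rts -- dict mapping carbon number z to the n-alkane retention time
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("n-alkane retention times must increase with carbon number")
    if rt_x == times[-1]:
        # Co-elution with the last reference alkane
        return 100.0 * carbons[-1]
    # Index of the first n-alkane eluting strictly after the compound
    i = bisect_right(times, rt_x)
    if i == 0 or i == len(times):
        raise ValueError("compound elutes outside the n-alkane range")
    z = carbons[i - 1]
    if carbons[i] != z + 1:
        raise ValueError("bracketing n-alkanes must differ by one carbon atom")
    rt_z, rt_z1 = times[i - 1], times[i]
    return 100 * z + 100 * (rt_x - rt_z) / (rt_z1 - rt_z)


# Hypothetical retention times in minutes (illustrative values only)
example_alkanes = {9: 8.0, 10: 10.0, 11: 12.0}
print(round(linear_retention_index(10.5, example_alkanes)))  # 1025
```

The result is rounded to an integer only at the end, so that replicate injections can be averaged before rounding if desired.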

Results and Discussion

Using the experimental conditions given in the section “Instruments and Samples”, GC–MS analysis was performed for the aroma compounds and the C7–C30 n-alkane reference compounds. On the basis of previous experience, the aroma compounds were divided into 50 groups, i.e. approximately six aroma compounds were injected at a time, to avoid peak overlapping. To assess the reproducibility of retention index values, each aroma compound was analyzed three times under the same experimental conditions. On the basis of the known mass spectra of the compounds analyzed, mass spectral matching was also used for further confirmation of the identity of the aroma compounds. The retention times of the aroma compounds and the C7–C30 reference compounds were then recorded and used to calculate retention indices. As suggested by Ettre, retention indices are reported as integers [23]. Finally, a retention index database of over 300 aroma compounds on three stationary phases was created. Also, taking into account that different GC experimental conditions may be used for identification of aroma compounds in other laboratories, we determined the retention indices under different conditions, e.g. gas flow rate, initial temperature, and heating rate, to evaluate the stability of the retention indices.

Effect of Stationary Phase Polarity

Gas chromatographic retention arises from interaction between the solute and the stationary phase; RI therefore varies with stationary phase. The effect of stationary phase polarity on retention index has been studied for a long time, and retention index conversion methods have been proposed [24–26]. However, standardization of the retention index remains an unsolved problem, especially conversion of retention index values between non-polar and strongly polar stationary phases. Therefore, retention data obtained on stationary phases of different polarity are quite valuable for the study of retention index standardization. Data mining studies of quantitative structure–retention relationships (QSRR) for the retention index on different stationary phases have also been reported [27, 28], but retention index databases of this kind are quite uncommon, especially for structurally diverse aroma compounds. Much research has revealed large differences between RI values for the same compound obtained on stationary phases of different polarity. Furthermore, because the interaction between solutes and polar stationary phases is more complicated than that between solutes and non-polar stationary phases, the retention index on a polar stationary phase is more difficult to predict [29, 30].

In this work, three frequently used stationary phases of different polarity were used: HP-1, HP-5MS, and DB-225MS are non-polar, weakly polar, and polar stationary phases, respectively. Retention indices on the three stationary phases are listed in Table 1. The precision of the retention indices was assessed by performing triplicate injections under fixed experimental conditions; differences of <1 retention index unit were observed for all the compounds. As expected, retention indices on HP-1 and HP-5MS were similar, because these two stationary phases are of similar polarity, although retention indices on HP-5MS were always slightly larger than those on HP-1. In contrast, retention indices on DB-225MS were much larger than those on HP-1 and HP-5MS, because of larger intermolecular forces between the solutes and the polar stationary phase. On polar stationary phases, interaction between solute and stationary phase depends not only on dispersion force, induction force, and steric effect but also on dipole–dipole interactions, which are directional [29]. The relationship between molecular structure and differences between RI will be examined in a future study.

Table 1 Retention indices of aroma compounds on three columns

A ΔRI of approximately 100 was usually obtained for homologous compounds differing by one carbon atom, confirming the reliability of the data [21]. On the basis of this relationship we estimated the reliability of the created RI database. For instance, the retention indices of nonanal, decanal, undecanal, and dodecanal were, respectively, 1,082, 1,183, 1,285 and 1,387 on HP-1, 1,104, 1,205, 1,306 and 1,408 on HP-5MS, and 1,283, 1,394, 1,475 and 1,613 on DB-225MS. The relationship between number of carbon atoms and the corresponding retention index was highly regular for HP-1 and HP-5MS but less so for DB-225MS, because factors other than the number of carbon atoms are important in the retention process on the polar phase. ΔRI for homologous compounds under different experimental conditions were also checked, and similar results were obtained.
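
The homologous-series check described above can be reproduced from the aldehyde values quoted in the text. The sketch below is only an illustration; the helper names are our own and do not correspond to software used in this study.

```python
# RI of nonanal, decanal, undecanal, and dodecanal, as quoted in the text
ALDEHYDE_RI = {
    "HP-1": [1082, 1183, 1285, 1387],
    "HP-5MS": [1104, 1205, 1306, 1408],
    "DB-225MS": [1283, 1394, 1475, 1613],
}


def increments(values):
    """RI difference between successive members of a homologous series."""
    return [b - a for a, b in zip(values, values[1:])]


def max_deviation_from_100(values):
    """Largest departure of any increment from the ideal value of 100."""
    return max(abs(d - 100) for d in increments(values))


for column, ri in ALDEHYDE_RI.items():
    print(column, increments(ri), max_deviation_from_100(ri))
# HP-1 [101, 102, 102] 2
# HP-5MS [101, 101, 102] 2
# DB-225MS [111, 81, 138] 38
```

On HP-1 and HP-5MS the increments stay within 2 units of 100, whereas on DB-225MS they deviate by up to 38 units, which is consistent with the less regular behavior noted for the polar phase.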

Effect of Experimental Conditions

Although RI are regarded as being independent of operating conditions, except for stationary phase polarity [21], there are reports of RI varying with column geometry, carrier gas flow-rate, initial temperature, and heating rate [24]. In fact, despite substantial improvements in instrumentation, acquisition of reliable retention indices may still be difficult. Consequently, misidentification may occur when retention indices are used as reference values, especially retention indices from different published studies. To overcome this problem, some authors have proposed use of techniques for conversion of retention indices obtained under different conditions to furnish standardized RI for interlaboratory comparison [24]. However, the problem of RI standardization has not yet been resolved, although many theories have been proposed on the basis of thermodynamics or quantitative structure–retention relationships.

In this work, because different experimental conditions may be required when different samples are analyzed, three conditions that are often varied (gas flow rate, heating rate, and initial temperature) were considered during development of the retention index database. The HP-5MS capillary column was used for this experiment. RI of some representative aroma compounds under different experimental conditions are listed in Tables 2 and 3.

Table 2 Retention indices on HP-5MS, obtained by use of different heating rates
Table 3 Retention indices on HP-5MS, obtained by use of different initial temperatures

From Table 2 it is apparent that the maximum difference between RI values for heating rates of 4, 6, 8, and 10 °C min−1 does not exceed 10 retention index units for most of the compounds. This indicates that the retention indices are very stable when different heating rates are used, which makes identification by RI matching reliable. However, there are 22 compounds, including 2-naphthyl methyl ketone, benzophenone, humulene, and oxacyclohexadecan-2-one, for which ΔRI is >10 retention index units. All 22 molecules contain a benzene ring or a macrocyclic substructure, but the reason for the difference is unclear. Details about molecular structure and the corresponding retention indices can be found in the supporting information (supplementary Table 5).

For most of the compounds, RI values increase with increasing heating rate. In contrast, RI values for acids decrease with increasing heating rate. For example, the RI values for octanoic acid are 1,180, 1,179, 1,176, and 1,175 for heating rates of 4, 6, 8, and 10 °C min−1, respectively. These observations offer scope for further data mining of retention indices.
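
A stability check of this kind can be sketched as follows, using the octanoic acid values quoted above. The function names are illustrative assumptions, and the default limit of 10 index units simply mirrors the threshold used in the text.

```python
# RI of octanoic acid on HP-5MS by heating rate (°C per min), from the text
OCTANOIC_ACID_RI = {4: 1180, 6: 1179, 8: 1176, 10: 1175}


def ri_spread(ri_by_rate):
    """Maximum minus minimum RI across all heating rates."""
    values = list(ri_by_rate.values())
    return max(values) - min(values)


def trend(ri_by_rate):
    """Direction in which RI changes as the heating rate increases."""
    values = [ri_by_rate[rate] for rate in sorted(ri_by_rate)]
    steps = [b - a for a, b in zip(values, values[1:])]
    if all(s >= 0 for s in steps) and any(s > 0 for s in steps):
        return "increasing"
    if all(s <= 0 for s in steps) and any(s < 0 for s in steps):
        return "decreasing"
    return "no clear trend"


def exceeds_limit(ri_by_rate, limit=10):
    """True if the compound would be flagged as unstable (spread > limit)."""
    return ri_spread(ri_by_rate) > limit


print(ri_spread(OCTANOIC_ACID_RI), trend(OCTANOIC_ACID_RI), exceeds_limit(OCTANOIC_ACID_RI))
# 5 decreasing False
```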

Table 3 gives RI values for 51 representative compounds measured with initial temperatures of 50, 60, 70, 80, and 90 °C. Changes in the retention indices are very small: the difference is below 10 retention index units for all the compounds. For most of the compounds, RI values increase with increasing initial temperature. However, the study of Boswell shows that the isothermal retention index of anthracene is highly dependent on the temperature at which it is measured, because of a large difference between the enthalpy of transfer of anthracene and that of the bracketing n-alkanes. Use of linear temperature-programmed retention indices assumes that a given compound always elutes at the same position between the two bracketing n-alkanes, but this is clearly not the case when experimental conditions are changed. Therefore, to investigate the effect of initial temperature on elution order, changes in elution order for different initial temperatures were studied by use of a simple ranking method. The 51 compounds and the n-alkanes were ranked by retention time; details can be found in the supporting information (supplementary Table 6). Two kinds of change in elution order were observed. The first is a change of position between the same two bracketing n-alkanes; for example, the relative positions of phenetole and ethyl caproate between C9 and C10 changed. The second is that a given compound elutes in a different n-alkane interval; for example, 3-hexenoic acid elutes between C10 and C11 when the initial temperature is 50 or 60 °C but between C9 and C10 when the initial temperature is 70, 80, or 90 °C. Similar results were obtained for undecylenic aldehyde and isoamyl phenylacetate.
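
The second kind of change can occur even when RI shifts by only a few units, because by Eq. 1 the bracketing n-alkanes follow directly from the hundreds part of the index. The short sketch below illustrates this; the two RI values are hypothetical and were chosen only to lie close to 1,000, they are not measurements from Table 3.

```python
def bracketing_alkanes(ri):
    """Carbon numbers (z, z + 1) of the n-alkanes bracketing a compound.

    Follows from Eq. 1: a compound eluting between C_z and C_(z+1)
    has 100 * z <= RI < 100 * (z + 1).
    """
    z = int(ri // 100)
    return z, z + 1


# Hypothetical RI of one compound at a low and a high initial temperature
ri_low_temperature = 1003
ri_high_temperature = 997

print(bracketing_alkanes(ri_low_temperature))   # (10, 11)
print(bracketing_alkanes(ri_high_temperature))  # (9, 10)
print(abs(ri_low_temperature - ri_high_temperature))  # 6
```

A shift of 6 index units, well within the stability observed here, is thus enough to move a compound from the C10–C11 interval to the C9–C10 interval when its RI is close to a multiple of 100.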

Finally, the effect on RI of carrier gas flow rate was also investigated. The results showed that absolute differences between RI values measured at flow rates of 1.0 and 2.0 mL min−1 were between 0 and 8 retention index units for all compounds. Details can be found in the supporting information (supplementary Table 7).

Normalization of Retention Indices

Our previous study showed that RI obtained at different initial temperatures can be transformed to those at a standard temperature by multiple linear regression including molecular structural descriptors [24]. In this work, we attempted to normalize retention indices for different heating rates by use of partial least-squares (PLS) regression. The aim was to transform retention indices obtained at different heating rates (6, 8, and 10 °C min−1) to those at a standard heating rate (4 °C min−1). Because changes of retention indices under different conditions are essentially caused by differences in molecular structure, it is reasonable to take structure descriptors into consideration for normalization. Accordingly, 59 commonly used descriptors were calculated by use of Dragon 6.0 software. The retention index at 4 °C min−1 was the dependent variable y, and the retention index at 6 °C min−1 (or 8 or 10 °C min−1) plus the 59 descriptors were the independent variables X; PLS regression was then performed. The results of normalization with descriptors, normalization without descriptors, and no normalization can be found in the supporting information (supplementary Tables 8 and 9). The error distribution is shown in Fig. 1. From the results it is apparent that normalization with descriptors performs best.
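
To make the idea of normalization concrete, the sketch below fits a straight line that maps RI measured at 6 °C min−1 to RI at the standard rate of 4 °C min−1. This is a deliberately simplified stand-in: it uses ordinary least squares with the measured RI as the only predictor, not the PLS model with 59 Dragon descriptors described above, and the calibration values are synthetic numbers invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalize(ri, slope, intercept):
    """Map an RI measured at the non-standard rate to the standard rate."""
    return slope * ri + intercept


# Synthetic calibration set (NOT data from this study)
ri_at_6 = [1000, 1200, 1400, 1600]  # measured at 6 °C per min
ri_at_4 = [998, 1197, 1396, 1595]   # reference values at 4 °C per min

slope, intercept = fit_line(ri_at_6, ri_at_4)
print(round(normalize(1300, slope, intercept), 1))  # 1296.5
```

Adding structural descriptors as further predictors turns this into a multivariate problem with many correlated variables, which is the situation PLS regression is designed for.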

Fig. 1 Comparison of errors for different normalization methods: 1, PLS normalization with descriptors; 2, PLS normalization without descriptors; 3, no normalization

Use of the RI Database for Essence Oil Analysis

Another purpose of this study was to use the created RI database for qualitative identification of aroma compounds, so further experiments were performed to analyze the volatile fraction from a real matrix, tobacco flavor. GC–MS is often used for tobacco flavor analysis. However, a mass spectrum cannot provide enough structural information to distinguish all compounds easily, especially when isomers are present, which makes quick and accurate identification more difficult. Previous studies have shown that some compounds that cannot be distinguished by MS can be distinguished by use of retention indices. For instance, the mass spectra of isoborneol and borneol are almost identical, but there is an obvious difference between their retention indices.

Although RI have been used for identification for a long time, the procedures used are rarely reported. In general, the procedure commonly used for identification by use of RI is:

  1. decide which retention index will be used, e.g. Kováts index, linear retention index, or Lee index;

  2. design the experiment for RI determination so that conditions are as similar as possible to those used for the reference RI collection; and

  3. for an ambiguous MS matching result, compare the RI calculated for a given compound with the reference RI from the reference database; the candidate whose reference RI is closest to the calculated RI is taken as the most reliable identification.
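
Step 3 of this procedure can be sketched in a few lines of Python. The reference values for isoborneol and borneol are the HP-5MS indices reported in this work; the function name, the measured RI values, and the tolerance of 10 index units are assumptions made for illustration.

```python
# Reference RI on HP-5MS for two candidates with near-identical mass spectra
REFERENCE_RI = {"isoborneol": 1160, "borneol": 1168}


def pick_by_ri(measured_ri, candidates, tolerance=10):
    """Return (name, |delta RI|) for the candidate closest in RI.

    The name is None when even the closest candidate lies outside the
    tolerance. If two candidates are equally close, the first one listed
    is returned, so ties should be inspected manually.
    """
    best = min(candidates, key=lambda name: abs(candidates[name] - measured_ri))
    delta = abs(candidates[best] - measured_ri)
    if delta > tolerance:
        return None, delta
    return best, delta


# Hypothetical measured RI values for two unresolved MS matches
print(pick_by_ri(1161, REFERENCE_RI))  # ('isoborneol', 1)
print(pick_by_ri(1167, REFERENCE_RI))  # ('borneol', 1)
print(pick_by_ri(1200, REFERENCE_RI))  # (None, 32)
```

The tolerance should reflect how closely the experimental conditions match those of the reference database.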

In this work, an industrial tobacco flavor provided by the Technology Center of China Tobacco, Hunan Industrial, was used to evaluate and validate the RI database. Analysis of the tobacco flavor was performed under experimental conditions similar to those used for creation of the RI database, in accordance with the above procedure. The results are shown in Table 4. Nineteen compounds from this tobacco flavor were identified by use of RI comparison. The difference between the calculated RI and the reference RI for a given compound is quite small; absolute ΔRI ranges from 0 to 6 RI units. The results obtained by RI comparison are in good agreement with those from matching of mass spectra.

Table 4 Identification of aroma compounds in tobacco flavor by RI and MS

It should be noted that the mass spectra of the two isomers isoborneol and borneol are quite similar (Fig. 2). It is very difficult to assign these peaks accurately on the basis of MS match alone. However, the two isomers elute at different times, so we can distinguish them on the basis of order of elution or retention index. On the HP-5MS column the retention indices of isoborneol and borneol were 1,160 and 1,168, respectively. So, the first peak is isoborneol and the second is borneol.

Fig. 2 Mass spectra of isoborneol (top) and borneol (bottom)

Conclusion

A database containing RI of over 300 compounds was created and used for identification of aroma compounds. Different experimental conditions (stationary phase, flow rate, heating rate, and initial temperature) were used to study their effects on RI. Stability and reproducibility of the retention index database were evaluated by replicate injection and by analysis of retention regularities. The data collected here may complement previously published RI collections. In addition, the created database was successfully used to identify aroma compounds in tobacco flavor. Further study of quantitative structure–retention relationships and data mining on the basis of this database will be conducted in the future.