Introduction

Herbs and other aromatic plants are popular ingredients in strongly flavored alcoholic beverages (Tonutti and Liddle 2010), and the combination of these and other products into cocktails is a foundational aspect of mixology (Regan 2003; Haigh 2009). One of the primary uses of aromatic bitters is their addition to mixed drinks to accent flavors and increase aromatic complexity (Clarke 2010; Parsons and Anderson 2011). What we call an “Old-Fashioned” cocktail today is the simplest and oldest style of cocktail (Grimes 2002; Simonson and Krieger 2014), and in its most essential form is whiskey, water, bitters, and a small amount of sugar, combined and served over ice. This closely mirrors the earliest print definition of what was then called simply, “Cocktail”: “Cock Tail, then, is a stimulating liquor, composed of spirits of any kind, sugar, water and Bitters” (Sampson et al. 1806). As drinks-mixing became more elaborate through the nineteenth century, this drink had Old-Fashioned appended to its name by the 1890s (Wondrich and DeGroff 2007).

In a previous study, the flavor chemistry of 16 commercial bitters was characterized (Johnson et al. 2015), and based on this work, the current study focuses on the interactive effects of bitters in a whiskey matrix, representing the simplest type of cocktail.

In previous studies of wines, additive, masking, and synergistic effects on aroma were observed when different wine varieties were blended together (Hopfer et al. 2012). Here, the complex aromas of the whiskeys and the bitters, together with the high-alcohol whiskey matrix, provide a unique model system for further evaluating the effects of both the alcohol matrix and aroma-aroma interactions on aroma perception, partitioning, and release.

Four typical whiskeys (two bourbons and two ryes) and four common types of bitters were combined factorially into all 16 possible pairs of whiskey and bitters, which were then subjected to sensory analysis by aroma, and volatile analysis by GC-MS. Results of both data sets were compared to each other, and interactive effects between bitters and whiskeys were studied.

Materials and Methods

Materials

Four whiskeys and four bitters were used in this study (Table 1). Two bourbons (B1 and B2) and two rye whiskeys (R1 and R2), commercially available, were purchased in Davis, CA, USA, and were chosen for being commonly used in making Old-Fashioneds. Two whiskeys were premium whiskeys (B1 and R1, more than $25/750 mL), while the other two were of lower price (B2 and R2, less than or around $20/750 mL).

Table 1 Whiskeys and bitters used in this study

The four bitters were those most commonly used in Old-Fashioneds and other whiskey-based cocktails, and were purchased in Davis, CA, USA, and through online vendors (Table 1). The four bitters represented four different styles, including “A,” a typical aromatic bitters; “M,” a mole-style bitters, a new variety incorporating the chocolate, chili, and spice flavors of Mexican Mole Poblano; “NO”, an anise-heavy, New Orleans style bitters; and “O,” an orange bitters.

Determining the Dilution Factor of an Old-Fashioned

An Old-Fashioned cocktail is typically made by stirring room-temperature whiskey, sugar, and bitters over ice. The ice melts as it chills the drink, which dilutes the cocktail. To estimate the dilution encountered when consuming an Old-Fashioned, one was made by stirring 5 g of spring water (Crystal Geyser, Calistoga, CA), 50 g of whiskey, and 3 dashes of bitters with 100 g of cracked ice for 60 s in a chilled (4 °C) mixing glass. The mixture was strained, and the resulting cocktail and the leftover ice were weighed separately. Measurements were made in triplicate and showed that the final mixture was approximately 50% water and 50% whiskey. Besides providing an accurate model of the alcohol content of a stirred Old-Fashioned, diluting the spirit by half with water has previously been reported in descriptive analyses of gin and tequila for purposes of panelist safety (Heymann and Ebeler 2017). To control for dilution and temperature effects, samples used in the subsequent analyses were diluted to this measured level with water, but served at room temperature rather than chilled or over ice.
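The dilution estimate is a simple mass balance: whatever mass the liquid gains during stirring is ice melt. A minimal Python sketch follows. The function name is illustrative, and the bitters mass and strained-cocktail mass in the example are hypothetical placeholders, not measurements from this study.

```python
def water_share(initial_water_g, whiskey_g, bitters_g, strained_cocktail_g):
    """Share of added water (initial water plus ice melt) relative to water plus whiskey."""
    # Ice melt is the mass gained by the liquid during stirring.
    melt_g = strained_cocktail_g - (initial_water_g + whiskey_g + bitters_g)
    added_water_g = initial_water_g + melt_g
    return added_water_g / (added_water_g + whiskey_g)


# Recipe masses from the text: 5 g water, 50 g whiskey.
# Hypothetical values: 2 g for three dashes of bitters, 102 g strained cocktail.
share = water_share(5.0, 50.0, 2.0, 102.0)
print(f"water share of the water-whiskey mixture: {share:.0%}")
```

With these placeholder masses the melt is 45 g, so added water and whiskey are present in equal amounts.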

Preparation of the 16 Samples Made from the Four Whiskeys and Bitters

Samples were made by diluting whiskeys 1:1 with spring water. Fifteen milliliters of each diluted whiskey sample was dispensed into a black tulip-shaped ISO wineglass (International Organization for Standardization 1977) and 200 μL bitters were added, mimicking the composition and dilution of the experimental model Old-Fashioned, as described above. A full factorial design was used, i.e., each of the four whiskeys was paired with each of the four bitters, resulting in the 16 samples used in this study.
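The full factorial pairing can be enumerated directly. The sketch below is illustrative only; the sample codes follow the "whiskey-bitters" pattern used elsewhere in this paper (e.g., R2-O), and the variable names are assumptions.

```python
from itertools import product

whiskeys = ["B1", "B2", "R1", "R2"]   # two bourbons, two ryes
bitters = ["A", "M", "NO", "O"]       # aromatic, mole, New Orleans, orange

# Full factorial design: every whiskey paired with every bitters.
samples = [f"{w}-{b}" for w, b in product(whiskeys, bitters)]

# Per-glass recipe from the text: 15 mL of 1:1 diluted whiskey plus 200 uL bitters.
DILUTED_WHISKEY_ML = 15.0
BITTERS_UL = 200.0
bitters_ul_per_ml = BITTERS_UL / DILUTED_WHISKEY_ML

print(len(samples), "samples:", ", ".join(samples))
```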

Sensory Analysis

A generic descriptive analysis (DA), as described by Lawless and Heymann (2010), was used to profile the sensory aroma characteristics of the 16 model Old-Fashioned cocktails. Fourteen volunteer panelists (4 females; aged 21 to 43) were recruited by email from a pool of students and employees from various departments at the University of California, Davis, CA, USA (IRB protocol 351687-1). Panelists gave oral consent to participate in the sensory study, which consisted of six training sessions and six evaluation sessions.

A total of six training sessions were held with the panelists in groups. During the first four training sessions, panelists smelled the samples blindly, and then generated, discussed, and refined descriptors by consensus until an agreed-upon list of 26 terms was determined (Table 2). In the first training session, 4 of the 16 model Old-Fashioneds, each made with whiskey B2 and one of the four bitters, were smelled and discussed. Over the next three training sessions, the 16 samples were presented in a random order, with four in session 2 and six each in sessions 3 and 4, so that all samples were smelled at least once during training. References were prepared for each descriptor and these were smelled and discussed by the panelists, and changed and adjusted if necessary, over the second, third, and fourth training sessions.

Table 2 List of aroma descriptors and reference standards used in the descriptive analysis. All reference standards were presented in black tulip-shaped tasting glasses, covered with a plastic lid

During the last two training sessions, panelists were trained in the use and rating of the references: each panelist smelled eight samples per session and rated the intensity of each descriptor in each sample. Panelist performance was checked by having panelists blindly identify the reference standards and by confirming agreement on the intensity ratings of descriptors in the samples.

The assessment of all 16 samples was performed in triplicate over six sessions, with eight samples served in each session. Samples were served at room temperature (25 °C) in black tulip-shaped ISO wineglasses (International Organization for Standardization 1977), covered with a plastic lid. In each session, prior to the assessment, all panelists smelled the 26 references to refresh their memory, before they proceeded with the sample assessment in an individual, temperature-controlled (25 °C) tasting booth under red light to mask any potential color differences. Each descriptor was rated on a computer screen on unstructured line scales, anchored by “low intensity” and “high intensity.” Samples were presented monadically, in a complete Williams Latin Square design, as provided by FIZZ software (version 2.47B, Biosystemes, Couternon, France), which was also used to capture all data. All samples were blinded with random three-digit codes, also provided by FIZZ software.
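For illustration, a Williams Latin square for an even number of treatments can be built with the standard cyclic construction sketched below. This is a textbook construction, not necessarily the algorithm FIZZ uses, and it does not show how serving orders were split into sessions of eight samples, which is not described here. The function name is hypothetical.

```python
def williams_square(n):
    """Cyclic Williams Latin square for an even number of treatments n.

    Each row is one serving order; treatments are numbered 0..n-1.
    """
    if n % 2 != 0:
        raise ValueError("this single-square construction requires an even n")
    # First row interleaves low and high indices: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Every later row shifts each treatment index by one (mod n).
    return [[(t + shift) % n for t in first] for shift in range(n)]


square = williams_square(16)
# Ordered pairs of treatments served back to back, collected over all rows.
pairs = [(row[i], row[i + 1]) for row in square for i in range(len(row) - 1)]
print(len(pairs), "adjacent pairs,", len(set(pairs)), "distinct")
```

In such a square every sample appears once in each serving position, and every sample is immediately preceded by every other sample exactly once, which balances first-order carryover effects.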

Headspace Solid-Phase Microextraction-Gas Chromatography-Mass Spectrometry (HS-SPME-GC-MS)

Model Old-Fashioneds were prepared for volatile analysis in the same proportions used for sensory analysis. A mixture of 10 mL whiskey diluted 1:1 with deionized water, 130 μL of bitters, 50 μg/L 2-undecanone (Sigma-Aldrich, St. Louis, MO, USA) as an internal standard (IS), and 3 g of sodium chloride (Fisher Scientific, Pittsburgh, PA, USA), added to improve volatile partitioning into the headspace and increase analysis sensitivity, was placed in 20 mL amber glass headspace vials (Agilent Technologies, Santa Clara, CA, USA) and capped with magnetic, PTFE-lined silicone septa headspace caps (Supelco, Bellefonte, PA, USA). The extraction protocol was adapted from Johnson et al. (2015) and was as follows: Samples were warmed to 40 °C and agitated at 500 rpm for 5 min before extraction. A conditioned, 2-cm long 50/30-μm-thick Polydimethylsiloxane/Divinylbenzene/Carboxen (PDMS/DVB/Car) SPME fiber (Supelco) was introduced into the headspace of the vial for 45 min at 40 °C with rotational shaking at 250 rpm. A Gerstel MPS2 autosampler (Gerstel, Inc., Linthicum Heights, MD, USA) performed the extraction and the injection. The fiber was thermally desorbed in a SPME 0.7 mm straight inlet liner (Supelco), heated to 250 °C. Analysis was carried out with an Agilent 6890 GC-single quadrupole-MS (Agilent Technologies), equipped with a DB-WAX column (30 m × 0.25 mm ID × 0.25 μm film thickness, J&W Scientific, Folsom, CA, USA). Samples were introduced in split mode (10:1) and separated using an oven program that started at 40 °C, held for 3 min, followed by a 2 °C/min ramp to 180 °C and then a 30 °C/min ramp to 250 °C, which was held for 3 min. The MS was run in electron ionization mode at 70 eV after a 1.5-min solvent delay, scanning m/z 40 to 300. Each sample was analyzed in triplicate in random order.
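The length of one chromatographic run follows from the oven program by simple arithmetic. The sketch below adds up holds and ramps; the step encoding and function name are illustrative, and the result excludes extraction, cool-down, and equilibration time.

```python
def oven_program_minutes(start_c, steps):
    """Total oven program time.

    steps is a list of ("hold", minutes) or ("ramp", rate_c_per_min, target_c).
    """
    total_min, temp_c = 0.0, float(start_c)
    for step in steps:
        if step[0] == "hold":
            total_min += step[1]
        else:
            _, rate, target = step
            total_min += abs(target - temp_c) / rate
            temp_c = float(target)
    return total_min


# Program from the text: 40 C for 3 min, 2 C/min to 180 C, 30 C/min to 250 C, hold 3 min.
program = [("hold", 3), ("ramp", 2, 180), ("ramp", 30, 250), ("hold", 3)]
run_time = oven_program_minutes(40, program)
print(f"oven program: {run_time:.1f} min per injection")
```

The slow 2 °C/min ramp accounts for 70 of the roughly 78 min per injection.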

Peaks were identified by matching the background-subtracted average mass spectrum across half peak height to the NIST mass spectral database (National Institute of Standards and Technology 2008), followed by verification by retention index and authentic standards, where available (Table 3). Retention indices (RI) were calculated (Kováts 1958; Van den Dool and Kratz 1963) using an alkane standard mix (n-C8 – C20, Sigma-Aldrich) that was analyzed under the same conditions as the samples. Following identification, GC peaks were manually integrated and normalized to the IS peak area to yield relative quantification of headspace concentrations.
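As a sketch of the two calculations described above, the code below computes a linear retention index in the form of Van den Dool and Kratz, which is the form usually applied to temperature-programmed runs, and normalizes a peak area to the IS. Which form was applied to each compound is an assumption here, and the alkane retention times and peak areas are hypothetical.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Linear retention index of a peak eluting at rt (min).

    alkane_rts maps carbon number -> retention time (min) of that n-alkane,
    with retention times increasing with carbon number.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the alkane ladder")
    # Index of the alkane eluting at or just before the analyte.
    i = min(bisect_right(times, rt) - 1, len(times) - 2)
    n, n_next = carbons[i], carbons[i + 1]
    return 100 * (n + (n_next - n) * (rt - times[i]) / (times[i + 1] - times[i]))


def is_equivalents(peak_area, is_peak_area):
    """Peak area normalized to the internal standard peak area."""
    return peak_area / is_peak_area


# Hypothetical alkane retention times (min) and peak areas.
ladder = {10: 8.0, 11: 12.0, 12: 17.0}
ri = linear_retention_index(10.0, ladder)
rel = is_equivalents(peak_area=2.5e6, is_peak_area=1.0e7)
print(ri, rel)
```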

Table 3 List of identified volatile compounds, together with CAS numbers, retention time (RT), calculated retention index (CRI) on a DB-WAX column, literature RI (LRI), observed concentration ranges in IS equivalents (ISE), and whether a significant whiskey (W) or bitters (B) effect was observed (P < 0.05). Fisher’s least significant differences (LSD) are presented for significant effects when observed

Statistical Analysis

Analyses were carried out in R (R Core Team 2013), using the RStudio editor (version 0.99.484, Boston, MA, USA), with the additional packages agricolae (de Mendiburu 2013), pls (Mevik and Wehrens 2007), and SensoMineR (Lê and Husson 2008).

Missing data (i.e., one panelist missed one of the six evaluation sessions) were imputed with the average of the two other product replicates of that panelist. A multivariate analysis of variance (MANOVA) on the product effect revealed significant differences among the 16 model Old-Fashioned samples (P < 0.005). Subsequent univariate analyses of variance (ANOVA) were performed with the main effects Judge, Replication, Whiskey, and Bitters and the two-way interactions Judge-by-Whiskey, Judge-by-Bitters, and Whiskey-by-Bitters, to evaluate the main effects and the interactive effects among judges, whiskeys, and bitters.
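The imputation rule can be sketched as follows. The analyses in this study were run in R; this Python version only illustrates the rule, and the function name and ratings are hypothetical.

```python
from statistics import mean


def impute_replicates(ratings):
    """Fill missing replicate ratings with the mean of the observed replicates.

    ratings holds the replicate ratings of one panelist for one product and
    one descriptor; None marks a missing replicate.
    """
    observed = [r for r in ratings if r is not None]
    if not observed:
        raise ValueError("no observed replicate to impute from")
    fill = mean(observed)
    return [fill if r is None else r for r in ratings]


# Hypothetical ratings: the second replicate was missed.
completed = impute_replicates([4.0, None, 6.0])
print(completed)
```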

For descriptors with a significant Judge-by-Whiskey or Judge-by-Bitters interaction, a pseudo-mixed model was used (Gay 1998), in which the F-ratio of the Whiskey or Bitters effect is calculated with the mean square of the interaction term instead of the mean square of the error term. Descriptors were considered significantly different among samples for P < 0.05 by ANOVA. For descriptors on which samples differed significantly by ANOVA, Fisher’s least significant differences (LSD) were calculated.
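A minimal sketch of the pseudo-mixed test is given below. The degrees of freedom follow from the design described above (4 whiskeys, 14 judges); the mean squares are hypothetical, and the function name is illustrative.

```python
def pseudo_mixed_f(ms_effect, df_effect, ms_interaction, df_interaction):
    """F-ratio of a product effect tested against its Judge-by-product interaction.

    Returns the F-ratio and its (numerator, denominator) degrees of freedom.
    """
    return ms_effect / ms_interaction, (df_effect, df_interaction)


# Degrees of freedom implied by the design: 4 whiskeys and 14 judges.
df_whiskey = 4 - 1
df_judge_by_whiskey = (14 - 1) * (4 - 1)

# Hypothetical mean squares for one descriptor.
f_ratio, dfs = pseudo_mixed_f(
    ms_effect=12.0,
    df_effect=df_whiskey,
    ms_interaction=3.0,
    df_interaction=df_judge_by_whiskey,
)
print(f_ratio, dfs)
```

Because the interaction mean square is at least as large as the error mean square whenever judges disagree about products, this test is more conservative than the fixed-effects F-test. The P value is then read from the F distribution with the returned degrees of freedom.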

Mean values for aroma descriptors found to differ significantly by ANOVA were used in subsequent principal component analysis (PCA) and partial least squares regression (PLSR). For the sensory data set the PCA was conducted using the covariance matrix.

Volatile compounds were tested for significant differences due to Whiskey and Bitters by ANOVA (P < 0.05), and Fisher’s LSD values were calculated for volatiles with significant effects. PLSR was used to correlate sensory qualities to the headspace volatiles. Mean peak areas of volatiles for each sample (normalized to peak area of 2-undecanone in each sample) and aroma intensity ratings were standardized by dividing by their standard deviation.
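The scaling step before PLSR can be sketched as below. Each variable is divided by its standard deviation across the samples without mean-centering, as described above; the use of the sample (n − 1) standard deviation is an assumption, and the values are hypothetical.

```python
from statistics import stdev


def scale_by_sd(values):
    """Divide each value of one variable by that variable's sample standard deviation."""
    sd = stdev(values)
    if sd == 0:
        raise ValueError("variable is constant across samples and cannot be scaled")
    return [v / sd for v in values]


# Hypothetical IS-normalized peak areas of one volatile in three samples.
scaled = scale_by_sd([2.0, 4.0, 6.0])
print(scaled)
```

After scaling, every variable has unit variance, so volatiles with large peak areas do not dominate the regression simply because of their magnitude.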

Results and Discussion

Understanding the Sensory Space of Model Old-Fashioneds Through Descriptive Analysis

The sensory panel agreed upon 26 aroma descriptors, which are summarized along with their references in Table 2. Of these 26 aroma descriptors, 15 showed a significant Bitters effect in the ANOVA (P < 0.05), indicating that the different bitters imparted similar aroma attributes across all model Old-Fashioned mixtures. The 15 descriptors that varied depending on which bitters was used in the Old-Fashioneds were anise, black pepper, cardamom, chocolate, cinnamon, clove, coffee, cola, earthy, ginger, hay, herbal, nutmeg, orange, and pencil shavings, although in the post hoc LSD test the coffee and hay descriptors did not differ significantly among the bitters samples (Table 4a). These descriptors align with the findings of Johnson et al. (2015), where individual bitters (without the addition of whiskey) were characterized by a trained sensory panel. In that study, the aromatic bitters “A” was characterized by ginger, cardamom, cola, vanilla, black pepper, nutmeg, and clove aromas, and these aromas were also perceived by our trained panel when bitters “A” was used in Old-Fashioneds. The New Orleans style bitters “NO” was described individually as high in anise, green, caraway, alfalfa hay, and earthy aromas, and these descriptors were likewise used by our panel for Old-Fashioneds that contained “NO” bitters. The orange bitters “O” also showed alignment in aroma descriptors across the two panels and the two modes of assessment (individually in Johnson et al. (2015) and as part of an Old-Fashioned in our study), with characteristic orange candy, orange peel, grapefruit, and cardamom aromas. Last, the Mole Poblano bitters “M” was rated highest in chocolate, brown sugar, and cinnamon aromas both when assessed as part of a model Old-Fashioned and by itself (Johnson et al. 2015). Three additional descriptors (herbal, pencil shavings, and coffee) were not used by the panel in Johnson et al. (2015). Herbal aroma was rated highest in mixtures containing the aromatic (A) and orange (O) bitters, and pencil shavings aroma was rated highest in mixtures containing mole (M) bitters, while coffee aroma did not differ across the four bitters (Table 4a).

Table 4 Mean values of aroma descriptors that differed significantly in the ANOVA for all 16 mixtures (four whiskeys mixed with four bitters). ANOVAs were run for the overall (a) bitters and (b) whiskey effect, as well as for (c) the 16 mixtures. Letters in the same column for each sub-table are not significantly different by Fisher’s LSD test (P < 0.05)

Seven of the 26 descriptors showed a significant Whiskey effect in the ANOVA (P < 0.05), indicating that these descriptors resulted from the use of four different whiskeys in the Old-Fashioneds. They are anise, banana, coconut, earthy, nutmeg, oak, and vanilla aromas. Looking at the means listed in Table 4b, anise and earthy aromas were rated highest in the two rye whiskey samples, while the two bourbons, B1 and B2, were lower in these aromas. Banana, coconut, and vanilla aromas, on the other hand, were rated highest in the bourbon whiskeys B1 and B2, and aroma ratings for B1 mixes were significantly higher than the ratings for rye R2 mixes in all three cases. The remaining two aroma intensities, nutmeg and oak, did not differ significantly in the post hoc comparison, but followed the two trends described above: mean oak aroma was rated higher in both bourbons than in the rye samples, while nutmeg aroma was higher in rye R1 than in rye R2 and the bourbon samples.

While the results above concern the overall differences in aroma means among the four whiskeys and among the four bitters, we also looked at the sensory differences among the 16 model Old-Fashioneds. Fifteen of the 26 aroma descriptors differed significantly among the Old-Fashioneds (P < 0.05), indicating that mixing different bitters with different whiskeys changed the overall Old-Fashioneds in different directions (Table 4c). Of these 15 significant aroma attributes, nine were rated highest in the rye mixtures, while three were highest in the bourbon mixtures. The ginger and oak aroma means did not differ significantly in the post hoc comparison; however, the highest rating for ginger aroma was found in one of the rye mixtures (R2-O), and oak aroma was highest in one of the bourbon mixtures (B2-A). Generally, model Old-Fashioneds showed greater differences in specific aroma intensities when the mixtures contained rye whiskey; for example, black pepper, chocolate, cinnamon, and hay aromas had both their highest and their lowest intensities in rye whiskey mixtures (Table 4c). Only herbal aroma showed its greatest difference in intensity in the bourbon whiskey mixtures. Interestingly, some aromas that were present in either the whiskeys or the bitters alone, namely banana, coconut, coffee, and pencil shavings, were not perceived when the mixtures were evaluated. The complex mixtures apparently masked these aromas.

Three of the sensory descriptors, cola, nutmeg, and oak, were found to have a significant Whiskey-by-Bitters interaction effect (P < 0.05). Significant interactions between whiskey and bitters mean that mixing one or more of the bitters with one or more of the whiskeys caused the sensory qualities of cola, nutmeg, and oak to be either heightened or dampened, on average, compared to other pairings of the same whiskey with other bitters, or of the same bitters with other whiskeys (Fig. 1). In all three cases, model Old-Fashioneds that contained rye whiskey R1 were most affected, as R1 Old-Fashioneds showed larger differences among each other than the mixtures made with the other whiskeys. In the case of cola aroma, the R1 mixture with orange “O” bitters was significantly different from the other three R1 Old-Fashioneds, while the mixtures made with the other whiskeys did not differ significantly in cola aroma (Fig. 1a). In fact, adding the “O” bitters to the other three whiskeys did not change the cola aroma significantly compared to the other bitters. For nutmeg aroma (Fig. 1b), the addition of the four bitters to the four whiskeys led to different effects, but the effects differed significantly only across the bitters, not across the different whiskeys. For example, adding mole (M) bitters to bourbon B1 significantly decreased the nutmeg aroma compared to adding the aromatic (A) bitters to B1. Another example is rye whiskey R1, which showed significantly enhanced nutmeg aroma in the aromatic (A) bitters mixture compared to the mixtures containing mole (M) or New Orleans (NO) bitters. Interestingly, although the effect was not statistically significant, oak aroma was reduced for all four bitters when added to rye whiskey R1 (Fig. 1c).

Fig. 1

Interaction plots for the three aroma attributes—a cola, b nutmeg, and c oak—that showed a significant whiskey-by-bitters interaction in the ANOVA (P < 0.05). Same letters within one aroma attribute are not significantly different by Fisher’s LSD test. Bitters are shown with different symbols and line types (A—filled triangles with dot-dashed line; O—small filled circle with solid line; M—filled diamond with dotted line; NO—open circles with dashed line)

Using all descriptors that differed significantly among the 16 model Old-Fashioneds, a principal component analysis (PCA) was carried out (Fig. 2). A marked drop in the percentage of explained variation was observed in the scree plot after three principal components (PCs), and within the first three PCs over 71% of the total variance was explained, with PC 1 contributing 32.5%, PC 2 an additional 24.6% and PC 3 another 14.1%.

Fig. 2

Principal component analysis biplots of model old-fashioneds, characterized by descriptive analysis. a PC 1 vs. PC 2. b PC 1 vs. PC 3. All 16 samples are shown in bold all caps (codes in Table 1). Aroma descriptors, differing significantly among the samples (P < 0.05), are shown in italic font

Looking at the first two PCs (Fig. 2a), 32.5% of the total variance is explained along PC 1, and samples are separated by the type of bitters, with aromatic (A) and orange (O) bitters on the right hand side of PC 1, separated from samples containing mole-style (M) or New Orleans style (NO) bitters on the negative PC 1 axis. Earthy and chocolate aroma descriptors contributed strongly to PC 1: their loadings vectors have less than a 45° angle with PC 1 and they show significant correlations to PC 1 (P < 0.05) in the negative direction. These attributes correlated positively to model Old-Fashioneds with added mole (M) and New Orleans (NO) bitters. Herbal, cardamom, ginger, cola and clove aromas contributed strongly to PC 1 in the positive direction, where samples containing orange (O) and aromatic (A) bitters are positioned. Along PC 2, clusters separated by PC 1 are sorted differently depending on bitters type—on the right side, PC 2 separates samples containing aromatic (A) bitters, characterized by nutmeg and cinnamon aromas, from those containing orange (O) bitters, in the negative direction. Within both aromatic and orange bitters-containing sample groups, the samples cluster by whiskey type, with the two bourbons B1 and B2 being close to each other, and the two ryes R1 and R2 grouping together. In both of the clusters by bitters type, the ryes plot higher on PC 2 than the bourbons. For the A and O bitters types, the mixtures with rye whiskeys are more associated with the aroma descriptors positively correlated to PC 2, including clove, black pepper, cinnamon, nutmeg, and anise relative to the bourbons in the same cluster. Bourbons within a cluster, on the other hand, plot lower on PC 2, thus, are more influenced by the aroma descriptors oak, vanilla, and coconut, which are negatively correlated to PC 2.

On the left hand side of Fig. 2a, PC 2 separates samples by whiskey type, with the premium rye (R1) and the premium bourbon (B1) at its most extreme ends, while the “basic” rye (R2) and bourbon (B2) are clustered together and slightly closer to the middle of PC 2. Rye is often considered to be a “spicier” whiskey than bourbon (Hellmich 2006; Maclean 2007; Buglass 2011; Stewart 2013), and this is apparent in this dataset, i.e., the rye-containing mixtures associate more highly with spice terms, such as clove, black pepper and earthy.

Along PC 3 (Fig. 2b), mixtures containing aromatic (A) and orange (O) bitters, which were separated along PC 2 (Fig. 2a), show some overlap between bitters types; however, for mole (M) and New Orleans (NO) bitters on the negative side of PC 1, a clearer separation by type is apparent. Samples containing the mole (M) bitters are on the positive PC 3 axis and are significantly correlated to chocolate aroma. Samples containing New Orleans (NO) bitters are on the negative axis of PC 3, showing a correlation to anise, hay, and earthy aromas. Interestingly, the lower-cost rye-mole bitters mixture (R2-M) is positioned closer to the samples containing New Orleans style bitters, because it received the lowest rating for chocolate aroma, which is characteristic of all other samples containing mole (M) bitters.

In summary, as explained by the separations plotted in the PCA, the strongest driver of perceived flavor differences in the model Old-Fashioneds (i.e., the main source of separation along PC 1) is the bitters type. Explaining most of the sample separation (32.5%), PC 1 distinguishes between samples containing orange or aromatic bitters and those containing New Orleans style or mole bitters, regardless of whiskey type. For samples containing New Orleans style or mole bitters, the type of whiskey is a stronger separator along PC 2 than type of bitters, and the effect is more pronounced for the premium bourbon (B1) and rye (R1) than for the basic bourbon (B2) and rye (R2), which were more similar to each other when mixed with New Orleans (NO) or mole (M) bitters.

Analyzing the Volatile Changes of Model Old-Fashioneds with HS-SPME-GC-MS

Sixty aroma volatiles were detected using the method described above. Of these volatile compounds, two, (Z)-4-decenol and whiskey lactone, showed both significant Whiskey and Bitters effects (P < 0.05), eight compounds showed significant differences due to the different whiskeys in the mixtures, and the majority of compounds (41) differed significantly across the bitters (Table 3). The two compounds that showed differences due to both the whiskey and the bitters used (Fig. 3a, b) behave similarly: for both (Z)-4-decenol and whiskey lactone, the addition of mole “M” and orange “O” bitters increases the headspace concentration across all whiskeys, with the highest concentrations in the B1 mixtures and the lowest in the R1 mixtures. In both cases, these enhancement effects are most pronounced in bourbon B1 and least apparent in rye R1. Such behavior points towards chemical mixture effects: the enhancement in both compounds cannot be attributed solely to the addition of a particular bitters, because the effects differ among the whiskeys, with suppression or no difference between the bitters for rye R1.

Fig. 3

Examples of observed effects on volatiles in model old-fashioneds. a (Z)-4-Decenol. b Whiskey lactone. c β-Pinene. d Ethyl (E)-4-decenoate. e p-Cymene. f Anethole. g Eucalyptol. Within each compound, bitters sharing small letters and whiskeys sharing capital letters are not significantly different from each other (P < 0.05)

Eight compounds (β-Pinene, isoamyl caproate, isoamyl caprylate, ethyl myristate, ethyl hexadecanoate, ethyl (E)-4-decenoate, phenylethyl alcohol, and guaiacol) differed significantly across the whiskeys, independent of the bitters added (P < 0.05), and they could be divided into two groups based on their behavior in the Old-Fashioned mixtures (Suppl. Fig. 1). The first group, consisting of β-Pinene and all esters except ethyl (E)-4-decenoate, showed the behavior depicted in Fig. 3c, where the compounds were detected only in the headspace of the bourbon B1 mixtures. The second group, consisting of phenylethyl alcohol, guaiacol, and ethyl (E)-4-decenoate, showed the highest headspace concentrations in the rye R2 mixtures, followed by mixtures with bourbon B2. These volatiles were lowest in Old-Fashioneds containing bourbon B1. As an example, ethyl (E)-4-decenoate is shown in Fig. 3d.

The majority of volatile compounds differed across the bitters, indicating that bitters contribute most of the aroma compounds that distinguish Old-Fashioned cocktails. The 41 compounds that showed significant differences across the four bitters followed three major trends (Suppl. Figs. 2, 3, 4). First, seventeen compounds originated from the aromatic bitters “A,” as they were detected in the headspace of Old-Fashioneds that contained the bitters “A.” As an example, p-Cymene is shown in Fig. 3e. Some of these 17 compounds did show some differentiation due to the whiskey used in the Old-Fashioneds; for example, Sabinene, α-Phellandrene, and Elemicin were detected in the headspace of all “A” mixtures except for Old-Fashioneds that were made with bourbon B1, rye R1, and rye R1 and R2, respectively (Suppl. Fig. 2), indicating a masking effect by the whiskey.

As a second trend, six volatiles were identified that showed their highest concentrations in Old-Fashioned mixtures that contained the New Orleans “NO” bitters (Suppl. Fig. 3), but were not detected or barely detected in all other mixtures. The six compounds are estragole, carvone, methyleugenol, 2-tridecanone, myristicin, and anethole, the last of which is shown as an example in Fig. 3f. In a study characterizing the chemical composition of different bitters (Johnson et al. 2015), all of these compounds were detected at high concentrations in the same New Orleans bitters as used in this work, confirming that they originate from the “NO” bitters.

Another 14 volatile compounds could be grouped based on their suppression behavior in Old-Fashioneds that contained either the aromatic bitters “A” or the New Orleans bitters “NO.” These volatiles probably originate from either the orange “O” or the Mole Poblano “M” bitters (Suppl. Fig. 4), as all of them were detected in the bitters “O” and “M” themselves (Johnson et al. 2015); eucalyptol is shown as an example in Fig. 3g.

Four compounds (Hexanal, Geranyl acetate, γ-Eudesmol and an unidentified compound with RI 1734) differed significantly across the bitters (P < 0.05), but did not follow any of the three trends described above (Suppl. Fig. 5).

A total of nine compounds (α-Pinene, α-Terpinene, β-Phellandrene, Caryophyllene, Ethyl caprate, Ethyl dodecanoate, Isopentyl dodecanoate, Isobutyl decanoate, and Cinnamaldehyde) did not differ significantly across the whiskeys or bitters (P > 0.05). Their interaction plots are shown in Suppl. Fig. 6, and it becomes apparent that bourbon B1 affected the partitioning of eight of the nine compounds differently than the other whiskeys did. When B1 was mixed with the orange “O” bitters, headspace concentrations of three terpenes (α-Pinene, α-Terpinene, β-Phellandrene) increased, while none of these terpenes was detected in mixtures with the other whiskeys. A similar pattern is found for mixing B1 with the aromatic bitters “A,” where Caryophyllene and the three esters increased in concentration, an increase that was not found in “A” mixtures with the three other whiskeys.

Correlating Sensory Aroma Descriptors of Model Old-Fashioneds to Volatile Compounds

In a last step, partial least squares regression (PLS, Fig. 4) was performed on standardized significant sensory descriptors and volatile compounds to correlate sensory to chemical composition. The first three latent variables (LVs) explained 38.1, 20.7, and 13.5%, respectively, of the variance in the volatile data, and 23.4, 18.7, and 10.2%, respectively, of the variance in the sensory data. LV 1 primarily separated the samples into groups of those mixtures containing aromatic bitters and those containing other types of bitters, with LV 2 separating the latter group into clusters by type of bitters used (Fig. 4a).

Fig. 4

Partial least squares (PLS) regression analysis of the significantly different sensory attributes and volatile compounds in the 16 model old-fashioneds (code in Table 1). a LV 1 vs. LV 2 score plot. b LV 1 vs. LV 2 correlation plot showing predicting (volatiles, smaller font) and predicted (aroma attributes, larger font) variables. c LV 1 vs. LV 3 score plot. d LV 1 vs. LV 3 correlation plot

In the (sensory data only) PCA (Fig. 2), some spatial groupings were more dependent on type of whiskey than type of bitters. By contrast, taking into account chemical differences between the samples in the PLS (Fig. 4a), the type of bitters used is the primary driver of spatial separation and grouping. While this effect dominates overall separation along LV 1 and LV 2, within each group of Old-Fashioneds (separated by type of bitters), the B1 (premium bourbon) sample plots highest along LV 2 compared to the other samples, and the R1 (premium rye) sample plots lowest, with B2 and R2 somewhere in the middle. This mirrors the tendency in the sensory PCA (Fig. 2) for R1 and B1 mixtures to plot furthest away from each other within mixtures containing the same type of bitters. These results suggest that, when mixed with any given type of bitters, latent flavor differences between bourbon and rye are expressed most obviously when comparing more premium whiskeys. The spatial position of the R1-containing mixtures in both the PCA and the PLS ties them to the descriptors anise, hay, and earthy.

Much of the separation in the PLS, as noted above, derives from differences between the aromatic bitters (A) and the other types of bitters (Fig. 4b). Many of the compounds contributing strongly to the separation in the PLS (indicated by their position farther out along one or more of the axes of the plot) are terpenoids, which are highly associated with the Old-Fashioneds containing the aromatic type bitters and are also correlated positively with nutmeg, cinnamon, black pepper, clove, and herbal aromas. Across all types of whiskey-bitters mixtures, the samples with aromatic bitters (A) were rated highest for each of these descriptors, and were significantly higher than at least one other type of bitters. Individual compounds associated to the greatest extent with these aromas of the aromatic (A) bitters-containing Old-Fashioneds were elemicin, caryophyllene, geranyl acetate, α-p-dimethylstyrene, β-eudesmol, camphor, γ-terpinene, eugenol, camphene, limonene, myrcene, α-terpinene, p-cymene, sabinene, and α-phellandrene, as well as several unidentified compounds.

The second dimension, LV 2, primarily separates orange, cardamom, and cola aromas and their associated volatiles at one extreme from earthy, hay, and anise aromas and their associated volatiles at the other (Fig. 4b). Additionally, descriptors that differed significantly among the whiskeys but not the bitters all plot exclusively in the upper half of LV 2, as do nearly all of the non-terpenic esters, which are often associated with yeast fermentation (Vianna and Ebeler 2001; Swiegers et al. 2005). A number of volatile compounds and aromas load strongly onto LV 2 but not onto LV 1; this may be because they describe relationships shared between samples that are separated along LV 1. Most dominant among these, plotting positively along LV 2, are orange, cardamom, and cola aromas and their associated volatiles eucalyptol, linalool, α-terpineol acetate, terpinolene, and α-pinene. These aroma descriptors differed significantly in intensity between the types of bitters, with the aromatic and orange bitters rated most highly in cardamom and cola aromas, and orange aroma rated significantly higher in orange bitters than in all three of the other types. Orange (O) bitters-containing mixtures showed the highest ratings in orange aroma, and were significantly different from all other Old-Fashioneds, made with aromatic (A), New Orleans (NO), or mole (M) bitters.

Conversely, the earthy, anise, and hay aromas plot negatively on LV 2, and are associated with a cluster of phenylpropenoid compounds (estragole, myristicin, anethole, and methyleugenol), as well as the terpenoid carvone and 2-tridecanone (Fig. 4b). In isolation, estragole (Luebke 2014a) and anethole (Luebke 2014b) are both described as having sweet and anise-like aromas; myristicin (Luebke 2014c) is described as spicy and woody; carvone (Luebke 2014d) as minty and licorice; and 2-tridecanone as waxy, dairy, herbal, and earthy (Luebke 2014e). All Old-Fashioneds containing New Orleans (NO) bitters show a high positive correlation with these aromas and compounds, which is also evident from their having the highest ratings in earthy, anise, and hay (Table 2b).
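The reading of the correlation plot used in the preceding paragraphs can be illustrated with a short sketch. One common way to build such a plot is to compute, for every variable, its Pearson correlation with the sample scores on each latent variable; variables lying far out along an axis are those that track that LV closely. Whether Fig. 4b was computed in exactly this way is an assumption. All numbers below are synthetic, and the variable names `terpenoid_like` and `anise_like` are hypothetical placeholders, not measured compounds or descriptors.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# 4 bitters x 4 whiskeys = 16 samples, using the codes from the text.
bitters = ["A", "O", "M", "NO"]
whiskeys = ["B1", "B2", "R1", "R2"]
samples = [(w, b) for b in bitters for w in whiskeys]

# Synthetic LV scores (NOT the study's values): LV 1 splits aromatic bitters
# from the rest; LV 2 orders the other bitters with a small whiskey offset.
lv2_bitters = {"A": 0.0, "O": 2.0, "M": 0.5, "NO": -2.5}
lv2_whiskey = {"B1": 0.6, "B2": 0.1, "R1": -0.6, "R2": -0.1}
lv1 = [3.0 if b == "A" else -1.0 for (w, b) in samples]
lv2 = [lv2_bitters[b] + lv2_whiskey[w] for (w, b) in samples]

# Synthetic variables: one abundant in A samples, one elevated in NO samples.
small_offset = {"B1": 0.2, "B2": 0.1, "R1": -0.2, "R2": -0.1}
rye_offset = {"B1": -0.3, "B2": 0.0, "R1": 0.3, "R2": 0.0}
variables = {
    "terpenoid_like": [(5.0 if b == "A" else 1.0) + small_offset[w] for (w, b) in samples],
    "anise_like": [(4.0 if b == "NO" else 1.0) + rye_offset[w] for (w, b) in samples],
}

# Coordinates of each variable in an LV 1 vs. LV 2 correlation plot.
loadings = {name: (pearson(col, lv1), pearson(col, lv2)) for name, col in variables.items()}
for name, (r1, r2) in loadings.items():
    print(f"{name}: r(LV 1) = {r1:+.2f}, r(LV 2) = {r2:+.2f}")
```

In this toy layout the first variable sits far out on the positive side of LV 1, and the second on the negative side of LV 2, mirroring the qualitative pattern described for terpenoids and for the anise-associated cluster.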

Taking into account the third dimension of the PLS model (Fig. 4c, d), the model Old-Fashioneds show trends similar to those in the first two dimensions, but flipped: the separation of the New Orleans (NO) bitters mixtures from the mole (M) and orange (O) bitters mixtures seen along LV 2 is no longer as clear along LV 3. However, the effect of the whiskey used in the mixtures becomes more apparent, with samples containing the same type of whiskey clustering more closely. This clustering by whiskey along LV 3 also holds true for the aromatic (A) bitters mixtures, which show the same order as the other bitters mixtures: the mixtures of both rye whiskeys (R1 and R2) load positively along LV 3, followed by the bourbon B2-containing Old-Fashioneds, while the bourbon B1 mixtures are located on the negative side of LV 3.

In summary, separation using both sensory and volatile composition data in the PLS is driven mainly by bitters type: LV 1 separates the aromatic (A) bitters from the three others, which are in turn resolved along LV 2. Along LV 3, the model Old-Fashioneds separate by whiskey type.

Mixing bitters and whiskeys into Old-Fashioned cocktails results in identifiable differences in flavor arising from both the bitters and the whiskey used for the cocktail. In other words, this type of mixing does not mask differences in either ingredient, and in fact, the premium whiskeys differ more from each other upon mixing into an Old-Fashioned than the lower-priced whiskeys do. This suggests that the commonly held wisdom that more expensive or more carefully crafted spirits should not be mixed because their flavor will be lost is not necessarily true, and that aroma characteristics of premium whiskeys continue to come through in careful mixology.

From a holistic standpoint, the sensory data analyzed in tandem with the volatile data suggest that the differences in Old-Fashioned type cocktails are driven more strongly by the type of bitters used than by the type of whiskey used, though this depends on the type of bitters used. Multivariate analysis with PCA using the sensory descriptive data suggests that adding aromatic or orange bitters to the different whiskeys leads to a greater differentiation of the whiskeys; in other words, adding those two types of bitters accentuates the individual differences between the whiskeys. The opposite is observed for mole and New Orleans style bitters: adding these bitters to the four different whiskeys did not lead to very different Old-Fashioneds, as the qualities of the bitters override any potential differences coming from the whiskeys. While positions in the PCA plot suggest that aromatic bitters emphasize the spicy qualities of rye whiskeys and other types of bitters emphasize the softer, oakier qualities of bourbons, it should be emphasized that this relationship was not found to be statistically significant. In flavor-chemical terms, while the bitters type was the stronger overall spatial separator of samples, within each cluster of samples grouped by type of bitters, a conserved spatial pattern separating bourbon- and rye-based samples along LV 2 is evident. Although the sensory qualities of the aromatic bitters and the orange bitters show a relative similarity (Johnson et al. 2015), it appears that when mixed with whiskeys the differences become more pronounced.

A number of aroma descriptors generated in the present study (cardamom, hay, ginger, nutmeg, clove, orange, vanilla, anise, cinnamon, chocolate, earthy, black pepper, dried fruit, and cola) were also independently generated in the work by Johnson et al. (2015), which focused on profiling bitters without the addition of whiskey. The terms pencil shavings, coriander, coffee, oak, coconut, caramel, caraway, banana, smoky, and vinegar were unique to this experiment. Of the terms shared with the bitters-only dataset, black pepper, vanilla, anise, and earthy differed significantly in intensity between both the whiskeys and the bitters when made into model Old-Fashioneds. The oak, nutmeg, and cola aromas showed significant interactive effects between bitters and whiskeys. For example, a synergistic effect was observed for the cola-like aroma, with a significantly higher cola intensity in the premium rye (R1)-orange bitters mixture than in the other mixtures. Addition of orange bitters also appeared to suppress oak aroma in the premium rye (R1) model Old-Fashioned. Adding mole bitters to the premium bourbon (B1) significantly suppressed nutmeg aroma.
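What an interactive effect means in a two-factor layout such as this one can be sketched numerically. In a whiskey-by-bitters table of mean intensities, the interaction residual of a cell is the part of its mean that is not accounted for by the whiskey average and the bitters average together: cell mean minus row mean minus column mean plus grand mean. A strongly positive residual is consistent with synergy and a strongly negative one with suppression. The sketch below is descriptive only and does not test significance; the intensity values are synthetic, chosen so that one cell (R1 with orange bitters) carries an extra bump, and they are not the ratings from this study.

```python
whiskeys = ["B1", "B2", "R1", "R2"]
bitters = ["A", "O", "M", "NO"]

# Synthetic, purely additive "cola" intensities (NOT the study's ratings)...
whiskey_effect = {"B1": 0.2, "B2": 0.0, "R1": 0.4, "R2": 0.1}
bitters_effect = {"A": 1.0, "O": 1.2, "M": 0.3, "NO": 0.2}
cell_mean = {
    (w, b): 2.0 + whiskey_effect[w] + bitters_effect[b]
    for w in whiskeys
    for b in bitters
}
# ...plus a hypothetical synergistic bump in a single combination.
cell_mean[("R1", "O")] += 1.6

def interaction_residuals(cells, rows, cols):
    """Cell mean - row mean - column mean + grand mean, for every cell."""
    grand = sum(cells.values()) / len(cells)
    row_mean = {r: sum(cells[(r, c)] for c in cols) / len(cols) for r in rows}
    col_mean = {c: sum(cells[(r, c)] for r in rows) / len(rows) for c in cols}
    return {
        (r, c): cells[(r, c)] - row_mean[r] - col_mean[c] + grand
        for r in rows
        for c in cols
    }

residuals = interaction_residuals(cell_mean, whiskeys, bitters)
largest = max(residuals, key=residuals.get)
print("Largest positive interaction:", largest, round(residuals[largest], 3))
```

With a purely additive table every residual would be zero, so any cell that stands out indicates a combination-specific effect. In practice, whether such an effect is significant would be judged with an interaction term in an analysis of variance on the replicated panel data rather than from the cell means alone.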

The presence of interactive sensory effects suggests further questions of interest about the inherent sensory complexity of aroma mixtures. If sensory qualities in even simple cocktails only exist upon mixing and for specific combinations, further unique interactions could be envisioned for more complex mixtures. This study points to the critical need to evaluate not only physico-chemical effects of the sample matrix on volatile release and partitioning, but also the complex interactive effects of sample matrices on perception of aroma mixtures.