Quantitative genetic studies and their industrial applications

The budding yeast Saccharomyces cerevisiae is an outstanding model for studies in evolution, molecular biology and genetics, mostly owing to its ease of cultivation and laboratory manipulation, alongside its prominent role as a cornerstone of the genomics era (Liti 2015; Voordeckers et al. 2015). The importance of yeast became clear more than 80 years ago, with the beginning of genetic studies in brewing. Indeed, beer production aided the birth of yeast genetics by pioneering crosses between different strains to enhance the quality, flavour and stability of the final fermented product (Barnett 2007; Gibbons and Rinker 2015). Since then, yeast has become a key player in many industrial fermentation processes, such as the production of wine and sake, as well as of fermented foods like bread (Legras et al. 2007; Querol et al. 2003).

The close association between S. cerevisiae and humans might suggest that budding yeast is a domesticated species; however, re-sequencing studies demonstrated that it is not necessarily domesticated, and many new wild isolates are continually being described (Bergstrom et al. 2014; Cromie et al. 2013; Liti et al. 2009; Skelly et al. 2013; Wang et al. 2012). In this context, field surveys in habitats remote from human activity have demonstrated that China harbours an important pool of natural isolates with extensive genetic variation (Wang et al. 2012). This standing natural diversity among S. cerevisiae strains provides a unique resource for industry. Consequently, hundreds to thousands of strains are being used for molecular and quantitative genetic studies in laboratories around the world, with the aim of, on the one hand, understanding yeast biology and, on the other, providing appropriate strains for industrial use (Steensels et al. 2014). In an industrial setting, exploiting new genetic variants allows fine-tuning of the final product by, for example, increasing substrate utilisation for the production of desirable secondary metabolites or modifying metabolic fluxes in a particular genetic background (Marsit and Dequin 2015). Thus, deciphering the genetics underlying traits of industrial interest has enormous potential to identify new alleles that could improve fermentation processes, representing an opportunity to cope with the demands of increased productivity.

Throughout the last two decades, several linkage mapping studies have been conducted in recombinant populations in order to elucidate the inheritance patterns of complex traits in yeast and to identify quantitative trait loci (QTLs) (Ehrenreich et al. 2009; Liti and Louis 2012). Most of these studies were performed using laboratory strains, such as S288c and its derivatives (Ehrenreich et al. 2009), providing little information about the allelic variants that explain the phenotypic diversity of the species in the wild (Liti 2015) and limiting their industrial use. With the advent of the genomic revolution and new sequencing technologies, the genome sequences of hundreds of strains are now available, allowing the study of a broader genetic panel (Peter and Schacherer 2016). Thus, the latest quantitative genetic studies exploiting yeast allelic diversity have expanded our knowledge of the ecology and evolutionary biology of this simple model organism by unveiling many new variants that could be useful in applied studies (Ambroset et al. 2011; Cubillos et al. 2013; Gutierrez et al. 2013; Hou et al. 2016; Jara et al. 2014; Parts et al. 2011; Salinas et al. 2012; Steyer et al. 2012; Tesniere et al. 2015; Wilkening et al. 2014).

Beer and wine are the two main fermented beverages for which yeast has been utilised for centuries (Sicard and Legras 2011). The mapping of QTLs in S. cerevisiae during wine must fermentation has uncovered genetic variants for many traits that have likely been exposed to selection, such as nitrogen consumption (Brice et al. 2014; Gutierrez et al. 2013; Jara et al. 2014), ethanol tolerance and production (Duitama et al. 2014; Snoek et al. 2016; Swinnen et al. 2012; Tilloy et al. 2014), fermentation kinetics (Kessi-Perez et al. 2016), sulphite resistance (Zimmer et al. 2014) and aroma production (Steyer et al. 2012). For example, bulk segregant analysis coupled with whole-genome sequencing led to the discovery of four genetic variants that can improve nitrogen uptake in wine strains (Brice et al. 2014). Similarly, consumers and the wine industry are demanding lower alcohol levels, and populations subjected to directed evolution in the laboratory under limiting conditions have provided new artificial genetic variants to meet this demand (Tilloy et al. 2014). Furthermore, linkage analysis in a cross between CBS6412 and ER7A (an industrial strain) revealed SSK1, an osmosensor member of the HOG pathway, as responsible for ethanol yield differences between these two strains. The CBS6412 allele affected growth and volumetric productivity, and gave a low glycerol/high ethanol production ratio (Hubmann et al. 2013). One of the advantages of this type of study is that two independent genetic variants with completely opposite phenotypes may be equally useful for industry. For example, an allele responsible for the production of greater ethanol levels may not be desirable in the wine industry, but it would be highly prized in the bioethanol business (Pais et al. 2013). In this way, deciphering genetic variants can have multiple applications in different industrial processes with high economic impact.

What drives natural phenotypic variation?

Originally, the majority of genetic studies focused their efforts on finding differences between strains in coding regions. This bias is based on the premise that most allelic variants within ORFs will significantly affect protein structure, and therefore non-synonymous mutations represent key targets in the search for causal polymorphisms (Ehrenreich et al. 2009; Liti and Louis 2012). These variants were thought for years to be the principal force behind natural phenotypic variation. However, it has recently been argued that these types of mutations are usually deleterious within natural populations and that most protein coding sequences are conserved even between different species (Wray 2007). Alternatively, the output of coding regions could be finely modulated through gene expression. In the past decades, experimental evidence has been gathered, predominantly in higher eukaryotes, demonstrating that natural gene expression variation represents a key factor shaping an individual’s phenotype, where polymorphisms within, e.g., transcription factor binding sites, can lead to phenotypic differences between strains (Chidi et al. 2016; Cubillos et al. 2012; Fraser et al. 2012; Salinas et al. 2016; Wittkopp and Kalay 2012). Moreover, in recent years gene expression regulation has been at the forefront of genetic and evolutionary studies in multiple species. Through mutations in both portions of the genome (coding and regulatory regions), organisms can exhibit distinct phenotypes and adapt to stressful conditions (Salinas et al. 2016). By modifying gene expression patterns, individuals can exhibit an extraordinary regulatory elasticity, allowing them to withstand unfavourable environmental conditions. A well-known approach to quantify the effect of natural variants acting in cis (that is, near the encoded transcript) is to study allele-specific expression (ASE) through massive RNA sequencing (McManus et al. 2010; Salinas et al. 2016). ASE is the difference in expression levels between the two parental alleles within a hybrid and reflects the effect of cis-acting polymorphisms, typically located upstream of the ORF. Therefore, ASE is a highly useful tool for understanding the impact of genetic variation within regulatory regions (Fig. 1).

Fig. 1

Allele-specific expression pipeline. F1 hybrids generated from two genetically diverged individuals (Strain A in red and Strain B in blue) can be utilised to estimate allele-specific expression (ASE) and quantify expression divergence between genetic backgrounds. To accomplish this, the F1 hybrid is subjected to RNA-seq, and the reads that can be assigned specifically to Strain A or Strain B are then counted for each parental background. Significant deviations from a 50/50 distribution between the two alleles are labelled as ASE between strains.
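The final step of this pipeline, testing whether the allelic read counts of a gene deviate from a 50/50 distribution, can be sketched as follows. This is a minimal illustration rather than the method of the cited studies: it assumes an exact two-sided binomial test, and the function names, the pseudocount and the read counts are hypothetical.

```python
from math import comb, log2


def ase_binomial_pvalue(reads_a: int, reads_b: int) -> float:
    """Two-sided exact binomial p-value against a 50/50 allelic ratio."""
    n = reads_a + reads_b
    if n == 0:
        raise ValueError("no allele-specific reads for this gene")
    k = min(reads_a, reads_b)
    # With p = 0.5 the binomial distribution is symmetric, so the two-sided
    # p-value is twice the probability of the smaller tail (capped at 1).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def allelic_log2_ratio(reads_a: int, reads_b: int, pseudocount: float = 1.0) -> float:
    """log2 ratio of Strain A over Strain B reads; the pseudocount (an
    assumption of this sketch) avoids division by zero."""
    return log2((reads_a + pseudocount) / (reads_b + pseudocount))


# Hypothetical read counts (Strain A, Strain B), for illustration only.
counts = {"GENE_X": (180, 95), "GENE_Y": (120, 118)}
for gene, (a, b) in counts.items():
    print(gene, round(allelic_log2_ratio(a, b), 2), ase_binomial_pvalue(a, b))
```

Because thousands of genes are tested in a single hybrid, the resulting p-values would need a multiple-testing correction before any gene is labelled as showing ASE.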

ASE in natural yeast isolates

Previously, we demonstrated that ASE is widespread among natural S. cerevisiae isolates by quantifying the effect of cis-variants in a grid of six F1 hybrids derived from crosses among four divergent strains (Salinas et al. 2016). Furthermore, several studies have demonstrated that allelic expression differences can directly impact a phenotype (Fay et al. 2004; Gerke et al. 2009; Salinas et al. 2016). By estimating the consumption of aspartic acid and glutamic acid during wine must fermentation in two strains of different geographic origin, we were able to show that polymorphisms in both portions (coding and regulatory) of ASN1, which encodes an asparagine synthetase that catalyses the synthesis of asparagine from aspartic acid, were partly responsible for nitrogen assimilation differences between these two genetic backgrounds (Salinas et al. 2016). Interestingly, ASN1 was not the only case where we could directly correlate allele-specific expression and phenotype. Among the thousands of differentially expressed alleles, we also found GDB1, which encodes a glycogen debranching enzyme required for glycogen degradation and relevant in the fermentation process (Apweiler et al. 2012). In our ASE strategy, we found that the GDB1 allele of a Wine/European isolate (DBVPG6765, named WE) exhibited greater expression levels than that of any other surveyed strain (Fig. 2a). To determine the influence of dissimilar GDB1 allelic variants on the fermentation process and their potential industrial application, we performed a reciprocal hemizygosity assay between WE and a North American (NA) strain, since they showed the greatest allelic expression differences, and characterised the hemizygotes for fermentation kinetics in synthetic wine must. We observed significant differences in the total CO2 output between reciprocal hemizygotes, with the NA-gdb1Δ/WE-GDB1 hybrid having a greater rate of CO2 production than the NA-GDB1/WE-gdb1Δ hemizygote, in agreement with greater expression levels in the WE background (Fig. 2b). These strains differ by several polymorphisms in both the regulatory and coding regions of GDB1. In order to determine whether the phenotypic differences were due to either category of polymorphic changes, we performed an allele swap assay in which we swapped, on the one hand, the promoters of the two strains and, on the other, the coding portion (Salinas et al. 2016) (Fig. 2c). Based on this approach, we reconstructed all the possible reciprocal hemizygote combinations in the parental backgrounds by varying either the ORF or the regulatory region. After 21 days of fermentation, we observed in the NA background that CO2 production increased by 14 % when the GDB1 WE promoter was introduced (Fig. 2d(i)), in agreement with the greater expression levels of the GDB1 WE allele. Surprisingly, when we introduced the NA promoter into the WE background, the NA promoter allele did not change the fermentation kinetics, suggesting a background-dependent effect on the ORF being expressed (Fig. 2d(ii)). Subsequently, when we performed the corresponding experiment replacing the ORF region, we observed an 18 % lower CO2 output when we introduced the GDB1 NA allele into the WE background (Fig. 2d(iv)). These results demonstrate that, just as in ASN1, polymorphisms located in regulatory and coding regions of GDB1 explain the phenotypic differences observed in reciprocal hemizygotes and, ultimately, between the NA and WE strains. Based on these results, GDB1 alleles (and many others so far described in the literature) could represent potential genetic variants for applied yeast studies.

Fig. 2

Phenotypic validation of ASE in GDB1. a Diagram representing the three F1 hybrids and ASE levels for GDB1 between the Wine/European (WE) isolate crossed against the Y12 (Sake, SA), DBVPG6044 (West African, WA) and YPS128 (North American, NA) isolates. b CO2 output (g/L) for GDB1 reciprocal hemizygotes. NA-GDB1/WE-gdb1Δ (green) denotes hemizygotes carrying the NA allele, while NA-gdb1Δ/WE-GDB1 (orange) denotes hemizygotes carrying the WE allele. c Selected promoter and ORF swap strategies are shown. The WE/NA promoters and ORFs were swapped into the opposite genetic backgrounds, generating four different combinations. d CO2 output (g/L) for GDB1 reciprocal hemizygotes carrying swapped promoters (graphs i and ii) or ORFs (graphs iii and iv). p denotes the promoter and o denotes the ORF that the reciprocal hemizygote carries in each case. The relative percentage difference in CO2 output is indicated in the two cases where significant differences were found between hemizygotes carrying different combinations of promoter and ORF.
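The relative percentage differences in CO2 output reported for the swapped hemizygotes follow a simple calculation, sketched below. The function name and the replicate values are hypothetical and chosen only to illustrate the arithmetic; they are not data from the study.

```python
from statistics import mean


def relative_change_percent(reference: list[float], modified: list[float]) -> float:
    """Percent change in mean CO2 output of the modified hemizygote
    relative to the reference hemizygote."""
    ref_mean = mean(reference)
    if ref_mean <= 0:
        raise ValueError("reference mean must be positive")
    return 100.0 * (mean(modified) - ref_mean) / ref_mean


# Hypothetical replicate CO2 outputs, for illustration only.
native_promoter = [79.0, 80.0, 81.0]
swapped_promoter = [90.2, 91.2, 92.2]
print(round(relative_change_percent(native_promoter, swapped_promoter), 1))
```

A negative value would correspond to a lower CO2 output in the modified hemizygote, as described for the ORF swap into the WE background.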

Perspectives

Quantitative genetic studies have provided a wide set of natural allelic variants that can be used to tackle the needs of the fermentation industry. In the near future, low sequencing costs will expand the repertoire of sequenced strains, revealing an even larger number of genetic variants to explore. However, the phenotypic contribution of these alleles may vary when they are placed in different genetic backgrounds, and therefore large screens to estimate gene–gene (G × G) interactions should be performed before results can be extrapolated to other strains. In this context, establishing not only how these allelic variants interact with other genes from different backgrounds, but also how they interact with the environment (G × E), is a milestone that has not yet been reached. Screens in offspring derived from dozens of parental pairs (Hou et al. 2016) or the utilisation of recombinant hybrids (Hallin et al. 2016) grown under an array of environments will provide the means for dissecting and understanding complex genetic interactions. These screens can be complemented by determining the effects of polymorphisms upon gene expression or protein structure. So far, although many studies have described differences in expression levels, the molecular mechanisms underlying transcript abundance variation are still not clear and represent a current challenge in modern genetics. Deciphering G × G and G × E interactions will therefore help to explain how the effects of allelic variants depend on genetic background and environment, and to generate better models for their application in industry.
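As a rough illustration of what a G × G or G × E interaction means quantitatively, the sketch below computes a simple two-by-two interaction contrast: the effect of swapping an allele in one context (a genetic background or an environment) minus its effect in another. The function name and the phenotype values are hypothetical, and a real screen would estimate this term with replicates and a statistical model.

```python
def interaction_contrast(ref_ctx1: float, alt_ctx1: float,
                         ref_ctx2: float, alt_ctx2: float) -> float:
    """Difference between the allele effect in context 2 and in context 1.

    A context can be a genetic background (G x G) or an environment (G x E).
    A value of zero means the allele effect is purely additive.
    """
    effect_ctx1 = alt_ctx1 - ref_ctx1
    effect_ctx2 = alt_ctx2 - ref_ctx2
    return effect_ctx2 - effect_ctx1


# Hypothetical phenotype values: the alternative allele raises the trait
# in context 1 but has no effect in context 2.
print(interaction_contrast(ref_ctx1=80.0, alt_ctx1=91.0,
                           ref_ctx2=95.0, alt_ctx2=95.0))
```

A non-zero contrast indicates that the effect of the allele cannot be extrapolated directly from one background or environment to another.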