Introduction

Lactic acid bacteria (LAB) are a nontaxonomic group of Gram-positive, low-GC, nonmotile bacteria characterized by their ability to ferment sugars to lactic acid. In the food industry, LAB are used for food and beverage fermentation, flavor formation (Urbach 1995), preservation (Stiles 1996), and the production of food ingredients (Hugenholtz et al. 2002), bacteriocins (De Vuyst and Leroy 2007), and exopolysaccharides (Cerning 1990; Welman and Maddox 2003). LAB can also be used to produce bulk and fine chemicals, including lactic acid (Kwon et al. 2001), polyols (Wisselink et al. 2002), and B vitamins (Burgess et al. 2004; Taranto et al. 2003). Other potential applications of LAB are summarized in Table 1. In 2001, the first LAB genome (Lactococcus lactis ssp. lactis IL1403) was sequenced and published (Bolotin et al. 2001). To date, 25 LAB genomes (15 Lactobacillus, three Lactococcus, three Streptococcus, two Leuconostoc, one Pediococcus, and one Oenococcus) have been sequenced and published (Table 2). LAB have many traits of industrial importance. Over the years, molecular microbiologists have elucidated the mechanisms underlying these traits using single-gene approaches. Although these findings greatly increased our knowledge of LAB, they did not give an overall picture of their functionality. Genomics and functional genomics provide an unprecedented opportunity to gain global insight into the physiological and metabolic capabilities of LAB. The wealth of new knowledge generated through genomics can help to discover novel applications of LAB worthy of future exploration.

Table 1 Primary applications of LAB
Table 2 General features of the sequenced LAB genomes (data were collected from genome database of the National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome)

This review aims to address how genomics has increased our understanding of the application potential of LAB. To this end, we first summarize the general features of LAB genomes and then illustrate how gene functions can be identified and characterized through genomics and functional genomics. We further extend the genomics approach to genome-scale stoichiometric modeling strategies and present the industrially relevant features of LAB revealed by this modeling approach.

Overall industrially relevant features revealed by genomics

The most up-to-date genome sequencing information (GOLD, Genomes OnLine Database, http://www.genomesonline.org/) shows that 25 LAB genomes have been sequenced and annotated, while 67 projects are in progress (59 Lactobacillus, three Lactococcus, three Leuconostoc, one Oenococcus, and one Streptococcus). Most LAB genomes are relatively small (1.8–3.3 Mb; Table 2). The number of protein-encoding genes ranges from 1,700 to 3,200, indicating substantial gene loss or gain during evolution. Comparative genomic analysis revealed that many biosynthesis-related genes were lost, while the ability to use external nutrients was enhanced by acquiring genes through horizontal gene transfer or gene duplication; this reflects a prevailing trend of reductive evolution driven by adaptation to nutrient-rich niches (Makarova et al. 2006). In the genomes of the dairy LAB Streptococcus thermophilus (Bolotin et al. 2004), Lactobacillus delbrueckii ssp. bulgaricus (van de Guchte et al. 2006), and Lactobacillus helveticus (Callanan et al. 2008), more than 10% of the coding genes have lost their function and are present as pseudogenes. In S. thermophilus in particular, the degeneration of virulence-associated genes, such as those related to antibiotic resistance and adhesion, distinguishes it from its pathogenic streptococcal relatives. This ongoing genome decay suggests that the gene content of LAB genomes changes readily, which might benefit engineering approaches such as genome shuffling (Patnaik et al. 2002; Stephanopoulos 2002) or domestication under controlled industrial conditions.

Various mobile genetic elements (MGEs) have been found in LAB genomes, including plasmids, prophages (Desiere et al. 2002), insertion sequence elements, transposons, and group II introns (Shearman et al. 1996). MGEs contribute to genome plasticity, host competitiveness, and environmental adaptation (Frost et al. 2005; Top and Springael 2003). One of the most important newly discovered MGEs in LAB is the 242-kb plasmid pMP118 present in Lactobacillus salivarius UCC118 (Claesson et al. 2006; Li et al. 2007). This led to the discovery that megaplasmids with replication origins similar to that of pMP118 are widespread, occurring in 33 strains of L. salivarius (Li et al. 2007). In addition, megaplasmids ranging in size from 120 to 490 kb were found in six other Lactobacillus species: Lactobacillus hamsteri, Lactobacillus intestinalis, Lactobacillus kalixensis, Lactobacillus ingluviei, Lactobacillus acidophilus, and Lactobacillus equi (Li et al. 2007). Interestingly, none of the megaplasmids in these six species shares a replication origin similar to that of pMP118 (Li et al. 2007).

Little is known about the biology of LAB megaplasmids. pMP118 is the largest sequenced plasmid in LAB, whereas the chromosome of L. salivarius UCC118 is the smallest among the sequenced Lactobacillus genomes to date. L. salivarius was traditionally classified as a homofermentative bacterium. Genome sequencing revealed that the two key enzymes needed to complete the pentose phosphate pathway, transketolase and transaldolase, are encoded on pMP118. Further experiments showed that some L. salivarius strains can indeed ferment xylose (Li et al. 2006), which led to the reclassification of L. salivarius as a facultatively heterofermentative bacterium (Li et al. 2006). This case also provides an elegant example of pentose fermentation through cooperation between megaplasmid- and chromosome-encoded genes. Plasmids are common vehicles for rapid genetic transfer. In the last decade, food-grade gene cloning and expression systems for LAB have been successfully developed (Table 3), such as the nisin-controlled gene expression system in L. lactis (Mierau and Kleerebezem 2005). The megaplasmids mentioned above could potentially be developed into novel genetic tools for cloning large DNA fragments in Gram-positive bacteria, analogous to bacterial artificial chromosome (BAC) vectors in Escherichia coli.

Table 3 Genetic tools for LAB

Understanding application-related physiological features through genomics

LAB are exploited for many industrial applications because of relevant physiological features, which include substrate utilization, stress response, metabolic capabilities, population interaction, and probiotic properties. The mechanisms underlying these features are rather complex. Genomics and functional genomics approaches, characterized by high throughput, large scale, and the combination of experimental and computational methodologies, are used to discover novel genes, signaling pathways, metabolic routes, and regulatory circuits. Here, we show how “omics” analyses have increased our knowledge of these industrially relevant physiological features.

Substrate utilization

LAB are generally considered nutritionally fastidious, meaning that they require many nutrients for growth. This is because LAB are usually isolated from nutrient-rich niches such as plants, fermented foods, and the gastrointestinal (GI) tract (Holzapfel and Wood 1998). Genome sequencing and annotation revealed that most LAB lack biosynthetic pathways for essential amino acids, nucleotides, and vitamins (Altermann et al. 2005; Azcarate-Peril et al. 2008; Bolotin et al. 2004, 2001; Callanan et al. 2008; Makarova et al. 2006; Pridmore et al. 2004; van de Guchte et al. 2006). Genome analysis also revealed many saccharide uptake systems, proteolytic systems, and amino acid transporters that help the cells take up nutrients from milk, plants, or the mammalian GI tract.

Genomics revealed that the improved substrate utilization ability of LAB was mainly achieved through either gene duplication or horizontal gene transfer (Kleerebezem et al. 2003; Makarova et al. 2006). In the Lactobacillales, gene duplication events occurred in phosphotransferase system, amino acid transporter, and peptidase genes after divergence from the common ancestor, increasing adaptation to nutrient-rich environments (Makarova et al. 2006). Lactobacillus plantarum (3.3 Mb) is the most versatile and flexible LAB species. Its diverse sugar utilization is achieved by clustering the relevant transporters, metabolic enzymes, and regulatory proteins in a region of lower GC content, named the “lifestyle adaptation island”. Because the genes on this island were acquired by horizontal gene transfer (Kleerebezem et al. 2003), it is considered a region of high plasticity, as evidenced by a DNA microarray-based comparison of 20 L. plantarum strains (Molenaar et al. 2005). It would therefore be desirable for nutritionally fastidious industrial strains to acquire such islands, for example through genome shuffling, and thereby gain a more free-living lifestyle.
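Because islands such as the one in L. plantarum stand out by their lower GC content, a simple first-pass screen is a sliding-window GC scan. The sketch below is only an illustration under stated assumptions, not the method used by Kleerebezem et al. (2003): the window size, step, and the 0.03 GC threshold below the genome-wide mean are arbitrary, and real island prediction usually also weighs codon usage, gene content, and flanking mobile elements.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def low_gc_regions(genome: str, window: int = 10_000, step: int = 5_000,
                   delta: float = 0.03) -> list[tuple[int, int]]:
    """Flag windows whose GC fraction is more than `delta` below the
    genome-wide mean, then merge overlapping flagged windows into regions.
    Window size, step and delta are illustrative assumptions."""
    genome = genome.upper()
    mean_gc = gc_fraction(genome)
    flagged = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, len(genome))
        if gc_fraction(genome[start:end]) < mean_gc - delta:
            flagged.append((start, end))
    # Merge overlapping flagged windows into contiguous candidate regions.
    merged: list[tuple[int, int]] = []
    for start, end in flagged:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy example: a GC-rich background with an AT-rich 3-kb insert at 6,000-9,000.
background = "GCGCAT" * 1000
island = "ATATAT" * 500
genome = background + island + background
regions = low_gc_regions(genome, window=1000, step=500)
print(regions)
```

On this synthetic sequence the scan reports one region that slightly overhangs the true insert, because windows straddling its borders are also flagged; the boundary resolution is therefore limited by the window and step sizes.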

LAB can use a variety of carbon sources, but a comprehensive understanding of how carbon sources are taken up and metabolized has not yet been achieved. Whole-genome transcriptome profiling and comparative analysis of the probiotic L. acidophilus NCFM grown on different sugars identified the genetic elements responsible for carbohydrate metabolism (Barrangou et al. 2006). Three classes of transporters (ATP-binding cassette, phosphoenolpyruvate phosphotransferase system, and galactoside pentose hexuronide permease) and the related hydrolases were specifically induced by their substrates but repressed by glucose, suggesting that sugar metabolism in L. acidophilus is subject to carbon catabolite repression (Barrangou et al. 2006). Sugar availability is often limited during biopreservation and meat fermentation, so the ability to use alternative carbon sources is required to cope with glucose starvation. One example is Lactobacillus sakei 23K, which can catabolize extracellular ATP breakdown intermediates for energy production through a purine nucleoside scavenging pathway predicted from its genome sequence (Chaillou et al. 2005). Another example is the metabolism of casein, which many LAB can use as a nitrogen and carbon source. Transcriptome and proteome profiling of L. lactis, S. thermophilus, and L. helveticus showed that proteolysis genes are upregulated during growth in milk (Derzelle et al. 2005; Gitton et al. 2005; Smeianov et al. 2007). Such insights can assist strain selection and the definition of optimal growth conditions that increase the growth yield of LAB strains.

Stress response

Adverse conditions, including extreme temperature, acid, oxygen, and osmotic stresses, are often encountered during food fermentation processes. An ideal LAB strain with industrial potential should resist these harsh conditions; for example, LAB strains used as starter cultures or probiotics are expected to survive spray drying (Mauriello et al. 1999; van de Guchte et al. 2002). Stress response and defense mechanisms in LAB have been studied for years (Rallu et al. 1996; van de Guchte et al. 2002). With the assistance of genomics, the complex molecular mechanisms of stress responses can be understood at a global level (Bron et al. 2006; Budin-Verneuil et al. 2005; Serrano et al. 2007; Xie et al. 2004). This may lead to new strategies for improving the robustness of LAB for applied purposes.

The acid stress response is important for LAB because lactic acid, their main catabolic product, acidifies the medium and arrests cell multiplication. In L. plantarum, comparative transcriptome profiling revealed that a novel group of cell surface proteins was specifically induced by lactic acid stress (Pieterse et al. 2005). Scanning electron microscopy showed that the surfaces of stressed cells were rougher and more uneven than those of unstressed cells, which the authors speculated was caused by the induced expression of these surface proteins (Pieterse et al. 2005). Regarding oxidative stress, LAB are facultatively anaerobic, and reactive oxygen species such as superoxide and hydroxyl radicals can damage cell components. An early report suggested that L. lactis could respire when heme was present in the culture (Sijpesteijn 1970). Genomic analysis revealed that L. lactis has the genes necessary for respiration but lacks a complete heme biosynthesis pathway and citric acid cycle (Bolotin et al. 2001). When heme and oxygen are available, the sugar metabolism of L. lactis can shift to respiration, which markedly reduces the effects of oxidative and acid stress and improves the long-term survival of L. lactis. A series of genome-scale analyses confirmed this metabolic shift and also uncovered some of the regulatory mechanisms of respiration (Gaudu et al. 2002; Pedersen et al. 2008; Vido et al. 2004). Engineered LAB strains capable of aerobic respiration are expected to have an increased growth yield.

Metabolites of industrial potential

LAB can be used as cell factories for the production of bulk and fine chemicals, including pyruvate-dissipating end products, exopolysaccharides, bacteriocins, vitamins, low-calorie sugars, complex flavor compounds, and polylactic acid or polylactide (Taguchi et al. 2008). Sorbitol is a popular low-calorie sweetener valued for its health-promoting properties. Genome sequencing of L. plantarum WCFS1 revealed two putative sorbitol-6-phosphate dehydrogenase genes (srlD1 and srlD2). By reversing the sorbitol catabolic pathway through overexpression of the srlD genes in a mutant deficient in both L- and D-lactate dehydrogenase activities, a high yield of sorbitol from fructose 6-phosphate was achieved (Ladero et al. 2007). Another example, enhanced folate biosynthesis in L. lactis ssp. cremoris MG1363, demonstrated that secondary metabolic pathways of LAB can be engineered with the help of genome annotation information from L. lactis ssp. lactis IL1403 (Sybesma et al. 2003). Flavor formation is an important trait for the industrial application of LAB, and improved gene annotation leads to better prediction of flavor-forming pathways. By comparing published LAB genomes, the original annotations were improved and flavor-forming genes were identified in various LAB strains, providing a starting point for the direct selection of strains for flavor production (Liu et al. 2008). Compared with the genome of the closely related strain Lactobacillus fermentum IFO 3956, Lactobacillus reuteri JCM 1112T was found to harbor a unique genomic island for the biosynthesis of reuterin, an antimicrobial compound, and cobalamin, suggesting a health-promoting role in the GI tract (Morita et al. 2008). These studies suggest that comparative genomics can be an efficient tool for discovering novel biochemical pathways leading to products of industrial potential (Piskur et al. 2007).

Population interactions and probiotic properties

Natural food fermentations are typically mixed-culture processes, so the species living in such environments become interdependent. For example, yogurt is made by fermenting milk with two LAB species, L. bulgaricus and S. thermophilus (Tamime and Robinson 1999). The molecular mechanisms of intra- and interspecies interactions are difficult to elucidate. Using fluorescence in situ hybridization, researchers have inferred interactions between strains from their population dynamics (Sakai and Ezaki 2006). The availability of the genome sequences of both species enables an integrated analysis of their interactions and metabolic activity in milk. Both genomes show signs of rapid adaptation to the milk environment, as evidenced by their numbers of inactivated genes (Bolotin et al. 2004; van de Guchte et al. 2006). Interestingly, metabolic functions inactivated in L. bulgaricus can be complemented by S. thermophilus and vice versa. First, proteolytic L. bulgaricus provides nonproteolytic S. thermophilus with amino acids and peptides produced by its cell-wall-anchored extracellular protease. Second, L. bulgaricus lacks pyruvate formate-lyase and the ability to synthesize folate, whereas S. thermophilus can provide formic acid, folic acid, and carbon dioxide to L. bulgaricus. Third, other compounds such as long-chain fatty acids, putrescine, and ornithine could also contribute to the mutualistic interaction. On the one hand, S. thermophilus may supply L. bulgaricus with several unsaturated long-chain fatty acids, as the de novo biosynthesis pathways for long-chain fatty acids are incomplete in the latter. On the other hand, compounds such as putrescine and ornithine can be produced by both species, and their exchange increases the resistance of both species to oxidative stress (Sieuwerts et al. 2008). Recently, S. thermophilus cultured with L. bulgaricus in milk was characterized with proteomic and transcriptomic methods. Besides supporting the mutual-benefit hypothesis, this study revealed further effects on S. thermophilus, such as the acquisition of amino acids and purines from L. bulgaricus and adaptation to hydrogen peroxide (H2O2), providing more clues for understanding their interactions in the dairy environment (Herve-Jimenez et al. 2009). A major challenge in mixed-culture biotechnology is determining the species composition. Genomics and high-throughput technologies will accelerate innovation by uncovering intra- and interspecies interactions and the regulatory responses to different substrates and processing conditions, as recent reviews indicate (Pastink et al. 2008; Sieuwerts et al. 2008).

Some LAB species are known for their use in probiotic foods. Beyond food fermentation, probiotic functionality mainly involves host–microbe interactions. Probiotic LAB have to survive the GI tract by possessing specific features such as acid and bile resistance, adherence, bacteriocin production, and growth on prebiotics (Klaenhammer 2000). Some of these features have been validated at the genetic level with the aid of genomics and functional genomics (Klaenhammer et al. 2008). One common feature of probiotics is bile tolerance. Comparative transcriptome profiling revealed that the responses of L. acidophilus and L. reuteri to bile stress share common mechanisms, including cell envelope reorganization, degradation of denatured proteins, and DNA damage repair (Pfeiler et al. 2007; Whitehead et al. 2008). Probiotic LAB can utilize complex carbohydrates that are indigestible by humans and other members of the microbiota, and these sugars can selectively stimulate the growth of probiotic LAB (Gibson and Roberfroid 1995). Characterization of the metabolism of these carbohydrates in L. acidophilus and L. plantarum identified specific transporters and hydrolases for fructooligosaccharides (Barrangou et al. 2003, 2006; Saulnier et al. 2007). Adhesion markedly prolongs the persistence of probiotic LAB in the GI tract and contributes to pathogen inhibition, because LAB surface proteins can competitively bind glycosyl residues on intestinal cell surfaces (Pretzer et al. 2005). To identify mannose-specific adhesin genes, L. plantarum strains were screened for adhesion ability and genotyped with DNA microarrays. Candidate genes were selected by matching genotypes to adhesion phenotypes, and mutational analysis confirmed that only one of them encoded an adhesin (Pretzer et al. 2005). This study provides a useful “matching” strategy for identifying genes responsible for a given function through comparative genomics.
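The “matching” idea can be illustrated with a strict presence/absence rule: keep genes detected in every adhesive strain and in no non-adhesive strain. This is a hypothetical sketch, not the statistical procedure of Pretzer et al. (2005); strain names, gene names, and the microarray calls below are invented, and real data usually call for a graded association score because a strict rule is easily broken by a single misclassified strain. Whatever the scoring, candidates still need mutational confirmation, as in the original study.

```python
def candidate_genes(presence: dict[str, set[str]],
                    adhesive: dict[str, bool]) -> list[str]:
    """Return genes detected in every adhesive strain and in no
    non-adhesive strain (a strict genotype-phenotype 'match')."""
    positives = [presence[s] for s, is_adh in adhesive.items() if is_adh]
    negatives = [presence[s] for s, is_adh in adhesive.items() if not is_adh]
    if not positives:
        return []
    shared = set.intersection(*positives)          # present in all adhesive strains
    return sorted(shared - set().union(*negatives))  # absent from all non-adhesive ones


# Hypothetical microarray presence calls and adhesion phenotypes.
presence = {
    "strain1": {"geneA", "geneB", "geneC"},
    "strain2": {"geneA", "geneC", "geneD"},
    "strain3": {"geneB", "geneC"},
    "strain4": {"geneB", "geneD"},
}
adhesive = {"strain1": True, "strain2": True, "strain3": False, "strain4": False}
print(candidate_genes(presence, adhesive))
```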

Genome-scale modeling strategies

Mathematical models have been widely used in fermentation process design and control, but they are mostly empirical and limited in scale (Hoefnagel et al. 2002). Changes in the energy charge, the NAD+/NADH (or NADP+/NADPH) ratio, and coenzyme concentrations can dramatically affect the bacterial metabolic network, since these metabolites participate in many biochemical reactions and are highly connected hubs in the network (Jeong et al. 2000). These biochemical reactions can be predicted from genome annotation. Recently, genome-scale stoichiometric modeling techniques have been developed. After the metabolic network has been reconstructed from the annotation, a genome-scale stoichiometric model can be built to investigate cellular metabolic capacities, either by calculating metabolic flux distributions or by analyzing the topological features of the network (Borodina and Nielsen 2005; Feist et al. 2009; Teusink and Smid 2006; Trinh et al. 2009). However, of the 30 genome-scale models (26 species) available so far (Feist et al. 2009), only two are stoichiometric models of LAB strains (Oliveira et al. 2005b; Teusink et al. 2006).
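Flux distributions in such models are commonly computed by flux balance analysis: maximize an objective flux c·v subject to the steady-state mass balance S·v = 0 and lower/upper bounds on each flux. The sketch below solves that linear program with SciPy's `linprog` on a deliberately tiny, hypothetical network (not taken from either LAB model): glucose uptake is capped at 10 units, glycolysis yields 2 pyruvate and 2 ATP per glucose, pyruvate can be reduced to lactate, and the biomass reaction is assumed to consume 1 pyruvate and 10 ATP.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network; stoichiometries are illustrative assumptions.
reactions = ["EX_glc", "GLYC", "LDH", "BIOMASS"]
metabolites = ["glc", "pyr", "atp"]
S = np.array([
    [1, -1, 0, 0],    # glc: uptake in, glycolysis out
    [0, 2, -1, -1],   # pyr: 2 from glycolysis, consumed by LDH and biomass
    [0, 2, 0, -10],   # atp: 2 from glycolysis, 10 consumed per unit biomass
])
bounds = [(0, 10), (0, None), (0, None), (0, None)]  # uptake capped at 10


def fba(S, bounds, objective_index):
    """Maximize one flux subject to S v = 0 and the flux bounds."""
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimizes, so negate to maximize
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


fluxes = fba(S, bounds, reactions.index("BIOMASS"))
for name, value in zip(reactions, fluxes):
    print(f"{name}: {value:.2f}")
```

In this toy case growth is ATP-limited (2 units of biomass from 10 units of glucose), and the surplus pyruvate (18 units) leaves as lactate, a crude caricature of the overflow-type behavior discussed for the L. plantarum model.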

One genome-scale model describes L. lactis ssp. lactis IL1403, with 621 reactions and 509 metabolites (Oliveira et al. 2005a). With appropriate constraints, the model proved useful for predicting gene essentiality and substrate preference. The model was also used to predict metabolic engineering strategies for improving diacetyl yield, identifying three gene knockout targets that had not been reported before (Oliveira et al. 2005a). Another genome-scale stoichiometric model, for L. plantarum WCFS1, was developed after reconstruction of its metabolic network and extensive curation (Teusink et al. 2006). Interestingly, this model predicted that the amino acid catabolic pathways related to flavor formation also act as ATP producers (Teusink et al. 2006). Many futile cycles and parallel pathways were found, which considerably increase the metabolic flexibility of L. plantarum. However, how they are regulated at the metabolic level remains unknown, and experimental validation is needed (Teusink et al. 2006).
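Gene essentiality is typically predicted in silico by closing the reactions catalyzed by a gene and re-optimizing growth. The sketch below shows that loop on a hypothetical four-reaction network in which two parallel, isozyme-like reactions convert A to B; it is not the procedure or data of Oliveira et al. (2005a). The gene names and gene–reaction associations are invented, and the rule that a deletion is “essential” when growth falls below 5% of wild type is an assumed cutoff.

```python
import numpy as np
from scipy.optimize import linprog


def max_growth(S, bounds, objective_index):
    """Maximize one flux subject to S v = 0 and flux bounds; 0.0 if infeasible."""
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimizes, so negate to maximize
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    return -res.fun if res.success else 0.0


def essential_genes(S, bounds, objective_index, reactions, gene_to_reactions,
                    threshold=0.05):
    """Simulate single-gene deletions by closing each gene's reactions and
    report genes whose deletion drops growth below threshold x wild type."""
    wild_type = max_growth(S, bounds, objective_index)
    essential = []
    for gene, rxns in gene_to_reactions.items():
        ko_bounds = list(bounds)
        for rxn in rxns:
            ko_bounds[reactions.index(rxn)] = (0, 0)  # reaction switched off
        if max_growth(S, ko_bounds, objective_index) < threshold * wild_type:
            essential.append(gene)
    return essential


# Toy network: A is taken up and converted to B by either of two parallel
# reactions; B drains into biomass.
reactions = ["EX_A", "R_alt1", "R_alt2", "BIOMASS"]
S = np.array([
    [1, -1, -1, 0],   # A
    [0, 1, 1, -1],    # B
])
bounds = [(0, 10), (0, None), (0, None), (0, None)]
# Hypothetical gene-reaction associations.
gene_to_reactions = {"geneU": ["EX_A"], "geneP1": ["R_alt1"], "geneP2": ["R_alt2"]}

print(essential_genes(S, bounds, reactions.index("BIOMASS"),
                      reactions, gene_to_reactions))
```

Only the uptake gene comes out as essential here: deleting either parallel route is buffered by the other, which is the same kind of redundancy that makes parallel pathways raise metabolic flexibility.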

In general, stoichiometric modeling is still in its infancy. First, because it rests on the steady-state assumption, it requires continuous culture, whereas batch culture is more common in industrial applications. Researchers have tried to describe batch processes as a series of steady states at successive time points (Luo et al. 2006; Mahadevan et al. 2002). Ultimately, however, novel global kinetic modeling theories and techniques that handle continuous variation are needed, because cellular metabolism is in most situations not at steady state (Jamshidi and Palsson 2008). As most kinetic parameters in enzyme rate equations are difficult to obtain, a trade-off between tractability and complexity must be made (Bulik et al. 2009; Hoppe et al. 2007; Nikerel et al. 2006; Smallbone et al. 2007; Voit 2008). Second, the objective function may need to be adapted to network features. For example, in modeling the solventogenic phase of Clostridium acetobutylicum ATCC 824, a nonlinear objective function was successfully introduced to avoid the multiple solutions arising from a cyclic pathway and to better characterize the stationary phase of growth (Lee et al. 2008). In the L. plantarum model, flux balance analysis (FBA) failed to reproduce the observed metabolism, and the authors suggested that overflow metabolism (i.e., energetically inefficient metabolic behavior) should be taken into account in any novel objective function (Teusink et al. 2006). Third, constraints on specific fluxes have to be set manually to restrict the solution space, and the result is only a static representation of cellular metabolism. Methods for integrating transcriptomic data with metabolic models, together with worked examples, have recently been described (Covert and Palsson 2002, 2003; Cox et al. 2005). In this approach, gene regulation is represented as a binary system, for example as if-then rules in Boolean logic layered on FBA; the additional constraints reduce the solution space and yield more accurate predictions (Covert and Palsson 2003). It should therefore be possible to use multiple omics data sets to build integrated genome-scale models of LAB that incorporate multiple cellular processes, similar to those built for E. coli and Saccharomyces cerevisiae (Covert et al. 2008; Min Lee et al. 2008).
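The Boolean-rule idea can be sketched as a preprocessing step that converts environmental or regulatory signals into extra flux constraints before FBA is run. This is a simplified illustration in the spirit of regulatory FBA, not the implementation of Covert and Palsson; the reaction names, signals, and both rules (a catabolite-repression-like rule for lactose uptake and a heme-plus-oxygen rule for respiration, loosely echoing the L. lactis example above) are hypothetical. The constrained bounds would then be handed to an FBA solver such as `scipy.optimize.linprog`.

```python
from typing import Callable, Dict, Tuple

Bounds = Tuple[float, float]
Rule = Callable[[Dict[str, bool]], bool]


def apply_regulation(bounds: Dict[str, Bounds], rules: Dict[str, Rule],
                     state: Dict[str, bool]) -> Dict[str, Bounds]:
    """Return a copy of `bounds` in which every reaction whose Boolean rule
    evaluates to False (gene assumed OFF) is closed to zero flux."""
    constrained = dict(bounds)
    for rxn, rule in rules.items():
        if not rule(state):
            constrained[rxn] = (0.0, 0.0)
    return constrained


# Hypothetical reactions and rules, for illustration only.
bounds = {"GLC_uptake": (0.0, 10.0), "LAC_uptake": (0.0, 10.0), "RESP": (0.0, 20.0)}
rules = {
    # IF lactose present AND NOT glucose present THEN lactose uptake ON
    "LAC_uptake": lambda s: s["lactose"] and not s["glucose"],
    # IF heme present AND oxygen present THEN respiratory chain ON
    "RESP": lambda s: s["heme"] and s["oxygen"],
}

state = {"glucose": True, "lactose": True, "heme": False, "oxygen": True}
constrained = apply_regulation(bounds, rules, state)
print(constrained)
```

Because the rules only tighten bounds, each regulatory state yields a smaller solution space than the unregulated model, which is the mechanism by which such constraints sharpen FBA predictions.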

Concluding remarks and perspectives

In the near future, the number of available LAB genomes will approach 100. This is a sizable group within microbial genomics, compared with the total number of sequenced bacteria (792 finished, 2,392 ongoing). With the rapid development of next-generation sequencing techniques, it would not be surprising if every LAB researcher were soon working with a favorite bacterium whose genome is available. Genomics has driven significant advances in the genetics, physiology, and application of LAB, and deep insights into complex physiological features, including substrate uptake and utilization, stress responses, and metabolite biosynthesis, have been obtained. Systems biology approaches, which integrate transcriptomics, proteomics, metabolomics, and modeling techniques, are emerging to interpret the “dark” part of the sequence. Based on “dry” genome annotation and “wet” high-throughput analyses, global networks will be developed to quantitatively predict system behavior in real fermentation situations, for example in a complex food matrix. Understanding the interactions among individual LAB strains is essential for exploiting LAB populations as mixed cultures, as illustrated by the simple model system of yogurt fermentation, in which L. bulgaricus and S. thermophilus interact. Quorum sensing is believed to be one of the means by which LAB communicate. Future genomics studies are expected to reveal the mechanisms of communication within LAB, which could further improve LAB functionality and industrial application.