1 Introduction

Phylogenomics is a widely adopted approach for assessing the evolutionary histories among organismal lineages based on comparative analysis of genome-scale data. Extending from phylogenetic analysis at the gene level, phylogenomic inference is commonly based on gene-by-gene (Beiko et al. 2005; Puigbò et al. 2010), concatenated multi-gene (Nozaki et al. 2007; Baurain et al. 2010) or whole-genome (Rannala and Yang 2008) comparisons. The current standard for phylogenomics involves identifying homologous gene/protein sequences and aligning them in a multiple sequence alignment framework, from which phylogenies are then inferred. Using phylogenomics, we can gain a better understanding of how a genome has evolved relative to those of other species.
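To make the concatenated multi-gene strategy concrete, the following is a minimal sketch (not taken from any of the cited studies) of how per-gene alignments might be joined into a single supermatrix before tree inference. The function name `concatenate_alignments`, the taxon labels and the toy sequences are all hypothetical, and the sketch assumes that each gene has already been aligned and that taxa missing from a gene are simply padded with gap characters.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix.

    Each alignment maps a taxon name to its aligned sequence. Taxa that are
    absent from a gene are padded with gaps ("-") of that gene's width.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy, made-up aligned protein fragments for illustration only.
gene_a = {"red_alga": "MKT-L", "green_alga": "MKTAL", "glaucophyte": "MRTAL"}
gene_b = {"red_alga": "GGSV", "green_alga": "GASV"}  # glaucophyte sequence missing

matrix = concatenate_alignments([gene_a, gene_b])
for taxon, seq in matrix.items():
    print(taxon, seq)
```

In a gene-by-gene analysis, by contrast, each alignment would be analysed separately and the resulting trees compared, so no padding of missing taxa is needed.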

The power of phylogenomics relies on the availability of high-quality genome data. The earlier phylogenomic studies focused on prokaryotes (Beiko et al. 2005; Puigbò et al. 2010; Bansal et al. 2013). These typically small, simple genomes (mostly <10 Mb in size; little intergenic sequence) can be obtained at lower cost than eukaryote genomes. As of 25 April 2014, there were 24,349 prokaryote genomes available on NCBI (http://ncbi.nlm.nih.gov/genome), compared to 2775 eukaryote genomes. Moreover, sequencing decisions have long been biased towards species of economic and medical importance. As sequencing costs have decreased in recent years, other biological aspects, e.g. evolution and phyletic position, can now drive sequencing decisions, enabling sequences from taxa that are evolutionarily important (but not necessarily of medical or economic importance) to be generated at a scale that was previously unimaginable.

The following sections highlight the importance of phylogenomics in algal research, what we have learned from algal phylogenomics, and its limitations. The future perspectives of algal phylogenomics are discussed in light of the on-going deluge of sequencing data.

2 Why Do We Need Phylogenomics in Algal Research?

For decades, algal research has largely been driven by (a) biotechnology and fisheries, particularly the production of biomass and secondary metabolites (De Ruiter and Rudolph 1997; Chopin and Swahney 2009; Bixler and Porse 2011), and (b) taxonomy and systematics, in which identification of a species, particularly of seaweeds, is complicated by the presence of multiple physiological appearances and ploidies across different life history stages (Blouin et al. 2011). The algal hydrocolloids (e.g. carrageenan, alginate, agar and agarose) are key thickening, gelling and emulsifying agents that are widely used in the food, animal feed, pharmaceutical and cosmetics industries. The value of seaweed hydrocolloids is estimated at between US$ 0.65 and 1.02 billion (Chopin and Swahney 2009; Bixler and Porse 2011), and the global seaweed industry is valued at about US$ 6 billion (Chopin and Swahney 2009; Soto 2009). Lipid production has also been highlighted due to the worldwide attention on biofuel as an alternative to petroleum (Dismukes et al. 2008; Mata et al. 2010). Beyond these applied interests, the availability of genome data allows us to address more-fundamental biological questions from the evolutionary perspective.

2.1 Algal Diversity

Algae are a diverse group of simple photosynthetic organisms. Growing almost exclusively in aquatic environments ranging from fresh waters and estuaries to the ocean surface and coral reefs, algae are the most important primary producers on Earth (Amante and Eakins 2009). The diversity of algae has been conservatively estimated at about 300,000 species, with a rough estimate of >1 million species (Guiry 2012). Figure 1 shows the evolutionary relationships among eukaryote lineages based on current systematics (Adl et al. 2012). Photosynthetic lineages in eukaryotes are broadly distributed across different supergroups within Diaphoretickes. These photosynthetic eukaryotes, except plants, are loosely defined as algae (note that colloquially the prokaryotic cyanobacteria are known as the blue-green algae). The supergroup Archaeplastida (Cavalier-Smith 1981; Rodríguez-Ezpeleta et al. 2005), also known as Plantae, represents the most primitive lineages of photosynthetic eukaryotes, which include Glaucophyta (glaucophyte algae), Rhodophyta (red algae), and Chloroplastida (green algae and plants; also known as Viridiplantae). Well-known examples of these taxa include the biotechnologically important red seaweeds, e.g. Porphyra and Gracilaria, and the green alga Chlamydomonas reinhardtii. These algae possess the simple, two-membrane-bound primary plastids (Cavalier-Smith 1981; Rodríguez-Ezpeleta et al. 2005). In comparison, the other eukaryotic algae possess the more structurally complex, secondary (or tertiary) plastids, bound by three or four membranes. These taxa are sometimes loosely grouped as the “chromalveolates”, which include the stramenopiles (e.g. diatoms, brown algae), alveolates (e.g. dinoflagellates), and the haptophytes (e.g. Emiliania). Some dinoflagellates cause “red tides”, which have a huge impact on the global economy and human health (Hallegraeff 1993; Anderson et al. 2008).

Fig. 1 Current classification of eukaryote lineages and their evolutionary relationships based on Adl et al. (2012)

2.2 Algal Evolution and Plastid Origin

The origins of algae and plastids (hence photosynthesis) among eukaryotes are critical to our understanding of the geological and atmospheric histories of planet Earth, e.g. the Great Oxygenation Event ca. 2.4 billion years ago (Scott et al. 2008). Current understanding of plastid origins has been extensively reviewed (Reyes-Prieto et al. 2007; Howe et al. 2008; Keeling 2010; Chan et al. 2011a) and is summarized for eukaryotic algae in Figure 2. The origin of primary plastids among the Archaeplastida lineages (and the known example of the rhizarian Paulinella) traces back to a cyanobacterial source, in which a cyanobacterium was engulfed by and retained within a heterotrophic host (i.e. primary endosymbiosis) (Margulis 1970), estimated to have occurred around 1–1.5 billion years ago (Douzery et al. 2004; Yoon et al. 2004). This process induced genetic transfer from the endosymbiont to the host nucleus, and the engulfed endosymbiont gradually became the extant plastid.

Fig. 2 Current understanding of plastid evolution in photosynthetic eukaryotes, for those with primary plastids (Archaeplastida/Plantae), and others with plastids that are more structurally complex, based on three major hypotheses (a, b and c) in plastid evolution

On the other hand, the evolutionary history of the more-complex plastids (e.g. in brown algae, diatoms and dinoflagellates) is complicated by multiple, serial events of endosymbiosis involving already plastid-bearing endosymbionts (Yoon et al. 2005; Reyes-Prieto et al. 2007). Published studies suggest three possible paths from which the secondary/tertiary plastids could have arisen: (a) secondary endosymbiosis involving an ancestral red algal cell, i.e. the chromalveolate hypothesis (Cavalier-Smith 1998, 1999) (Fig. 2A); (b) secondary red algal endosymbiosis followed by tertiary endosymbiosis involving an ancestral haptophyte-like cell, as postulated for fucoxanthin-containing dinoflagellates (Ishida and Green 2002; Yoon et al. 2005) (Fig. 2B); and (c) a secondary endosymbiosis involving both ancestral red and green algal cells (Moustafa et al. 2009), or other eukaryote-eukaryote endosymbioses (Archibald 2009; Bodył et al. 2009; Baurain et al. 2010; Stiller et al. 2014) (Fig. 2C). These hypotheses remain to be investigated further as more genome data become available.

3 What Have We Learned from Algal Phylogenomics?

Table 1 shows a number of key published algal genomes as of 1 January 2014. For years, biased taxon sampling in algal phylogenomics has been attributed to inadequacy of red algal genome data. The availability of red algal genomes in recent years therefore represents a significant milestone in algal research. Given the important role of red algal lineages in algal evolution (Fig. 2), these genomes provide an excellent analysis platform for addressing many outstanding questions in algal evolution and endosymbiosis.

Table 1 Non-exhaustive list of key published algal genomes as of 1 January 2014, sorted by estimated genome size in an ascending order

3.1 Origin of Photosynthetic Eukaryotes

The Archaeplastida supergroup represents the primitive lineages of photosynthetic eukaryotes. These taxa bear the primary plastids, and are expected to share a common ancestry. However, the initial phylogenetic support for this hypothesis had been limited to a handful of genes (Rodríguez-Ezpeleta et al. 2005; Nozaki et al. 2007). This is partly due to the lack of gene repertoires for glaucophytes (Glaucophyta) and red algae (Rhodophyta), which are scarce in comparison to those available for green algae and plants (Chloroplastida). Until recently, no glaucophyte genome was available, and the only available red algal genome was the highly reduced genome from the hyperthermophile Cyanidioschyzon merolae. Enriching the available data with novel data from mesophilic red algal species, i.e. Porphyridium purpureum and Calliarthron tuberculosum, an earlier study (Chan et al. 2011c) demonstrated strong support for Archaeplastida (by proxy of strongly supported clades of reds and greens) across hundreds (~50 %) of the analyzed protein phylogenies. These findings are further reinforced by a later study incorporating the novel genome data of Cyanophora paradoxa (Price et al. 2012), the first from any glaucophyte alga. This work provides the missing link that unifies all three major groups under Archaeplastida, and thus evidence for a single origin of all primary plastids in eukaryotes. These studies also demonstrate that the earlier difficulty in resolving the supergroup using phylogenetic approaches is likely due to the extent of lateral genetic transfer among microbial lineages.
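The idea of counting how many gene phylogenies support a grouping can be illustrated with a small sketch. This is not the method of the cited studies: it assumes rooted trees written as nested Python tuples, ignores branch support values (which real analyses would threshold, e.g. by bootstrap), and uses made-up taxon labels. A tree is taken to "support" the group when all members present in that tree form an exclusive clade that excludes at least one other taxon.

```python
def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every subtree, including leaves and the root."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def supports_group(tree, group):
    """True if the group members present in the tree form an exclusive clade."""
    all_leaves = leaf_set(tree)
    present = all_leaves & frozenset(group)
    # Require at least two members and at least one non-member (an outgroup).
    if len(present) < 2 or present == all_leaves:
        return False
    return present in set(clades(tree))


# Hypothetical gene trees; labels are illustrative, not real data.
reds_and_greens = {"red1", "red2", "green1", "green2"}
gene_trees = [
    ((("red1", "red2"), ("green1", "green2")), ("bact1", "bact2")),
    (("red1", "bact1"), (("green1", "green2"), "red2")),
    (("red1", "green1"), "bact1"),
]

flags = [supports_group(tree, reds_and_greens) for tree in gene_trees]
supported = sum(flags)
print(f"{supported} of {len(gene_trees)} trees support the group")
```

In the second toy tree a bacterial sequence groups with a red alga, which is the kind of pattern that lateral genetic transfer could produce and that lowers the proportion of supporting trees.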

3.2 Endosymbiosis and Algal Evolution

Owing to endosymbiosis (Fig. 2), the complication of genetic transfer in algal evolution is expected, especially among taxa that possess secondary (and tertiary) plastids, e.g. the “chromalveolates”. The positions of these lineages on the eukaryote tree of life are far from being resolved, as demonstrated in a number of studies based on phylogenies of select genes (Burki et al. 2007, 2012b; Baurain et al. 2010; Parfrey et al. 2010). Key examples of these taxa include the ubiquitous diatoms (stramenopiles) and dinoflagellates (alveolates). In a phylogenomic analysis (Moustafa et al. 2009) using two completely sequenced diatom genomes (Armbrust et al. 2004; Bowler et al. 2008), hundreds of diatom genes were found to be of red or green algal origin, suggesting a putative cryptic endosymbiosis involving an ancestral (prasinophyte-like) green alga in the course of diatom evolution. Later studies of algal genes encoding functions of membrane transport (Chan et al. 2011b) and fatty acid biosynthesis (Chan et al. 2013; Wang et al. 2014) revealed red and/or green algal prominence in these genes, demonstrating that algal genetic transfer is a key factor in the environmental adaptation of microbial eukaryotes.

Genetic transfer in prokaryotes is known to be rampant (Beiko et al. 2005; Zhaxybayeva et al. 2006; Dagan and Martin 2007; Puigbò et al. 2010). In a recent transcriptome analysis of the dinoflagellate Alexandrium tamarense (Chan et al. 2012b), the extent of genetic transfer in microbial eukaryotes was shown to be comparable to that in prokaryotes, despite the more-complex coding capacity of eukaryotes. The dinoflagellates can be considered the worst-case scenario in terms of the complexity of algal evolution, because tertiary (and likely quaternary) endosymbiosis events involving other eukaryotic (e.g. haptophyte-like) cells have been postulated (Hackett et al. 2004; Yoon et al. 2005; Wisecaver and Hackett 2011), in addition to the presence of bacterial-derived genes (Nosenko and Bhattacharya 2007; Slamovits et al. 2011). Such evolutionary complexity is set against the backdrop of mysteriously immense genome sizes, with the largest dinoflagellate genome (of Prorocentrum micans) estimated to exceed 210 Gbp (Hackett et al. 2004; LaJeunesse et al. 2005).

Some have argued that these findings could in part be an artifact due to inadequacy of red algal genes at the time (Burki et al. 2012a; Deschamps and Moreira 2012) and to technical biases (Dagan et al. 2013). Nevertheless, all these studies demonstrate algal and bacterial genetic transfer as key contributing factors to the adaptation and survival of microbial species in fluctuating marine environments.

3.3 Algal Biology and Physiology

Recently available algal genomes also provide an interesting analysis platform for assessing biological features that would inform us about genome innovation relative to physiological and/or environmental changes. The red algal genomes, for instance, are found to be highly compact with few intronic regions across unicellular (Bhattacharya et al. 2013) and multicellular species (Collén et al. 2013; Nakamura et al. 2013), with only about 0.3 introns per gene.

In cases where genome data are not yet available, e.g. for the economically important Porphyra (Gantt et al. 2010; Blouin et al. 2011), studies of transcriptomes are already providing clues to key physiological characteristics in red algae, e.g. new fatty acid biosynthesis and trafficking pathways (Chan et al. 2012a), and differential expression of genes involved in key developmental processes (Stiller et al. 2012). Studies of other red algal genomes (Bhattacharya et al. 2013; Collén et al. 2013) are generating novel insights into the origin and evolution of carbohydrate metabolism and biosynthesis of secondary metabolites, e.g. starch and isoprenoid compounds.

Algal epigenomes (Zhao et al. 2007; Gross et al. 2013) are providing first clues about genetic regulation in these organisms by non-coding elements. Other studies have demonstrated that green algal-derived genes in microbial eukaryotes are important for the function of the light-harvesting complex superfamily (Peers et al. 2009), and for protection from oxidative damage (Frommolt et al. 2008). Genetic transfer has recently been demonstrated in the cryptophyte Guillardia theta and the chlorarachniophyte Bigelowiella natans (Curtis et al. 2012), implicating their respective relict endosymbiont nucleus within the cell, i.e. the nucleomorph (Archibald 2007). All these findings are merely the tip of the iceberg in algal biology.

Recent phylogenomic studies clearly demonstrate the critical role of lateral genetic transfer and endosymbiosis in shaping genomes of algae and all other microbial eukaryotes. Although many of the implicated genes and/or pathways remain to be experimentally validated, findings from these studies provide a knowledgebase of interesting biological and ecophysiological aspects that one could home in on, e.g. the development of multicellularity in algae (Cock et al. 2010).

3.4 Uncovering Hidden Biodiversity

Phylogenomic methods have recently been used to uncover hidden biodiversity and physiological stages of unculturable microbes. This approach plays to the strength of single-cell genomics (Lasken 2007; Woyke et al. 2009), which allows for capturing snapshots of genomes from individual cells, and of genomic variation within a population. In an analysis of three single-cell genomes of an unculturable marine “algal” group, the picobiliphytes (Yoon et al. 2011), the content of each genome revealed a distinct physiological stage for each cell: normal, actively feeding, and severely infected by a marine virus. These cells were collected from the same 50-mL seawater sample from a single location, suggesting that marine biodiversity is greater than one would expect, and that it extends beyond the conventional scope at the species level.

Interestingly, the authors described a complete absence of chloroplast- or photosynthesis-related genes across these genomes, suggesting that picobiliphytes, previously described as algae (Not et al. 2007), are more likely heterotrophs than photoautotrophs. Therefore, the plastid nucleomorph observed in these cells could be a result of kleptoplasty, whereby the plastid could have been “stolen” from another algal source (Trench 1969), or could simply derive from an ingested cell, i.e. the plastid was within an algal cell that was engulfed by the picobiliphyte. In this case, phylogenomics using single-cell genome data has uncovered hidden marine biodiversity that would have been overlooked using the conventional genomic approaches based on cultured cells. Incorporating this approach into other means of capturing genome snapshots in situ (Bhattacharya et al. 2012), e.g. across multiple time points via experimental evolution (Sniegowski et al. 1997; Ebert 1998), allows for systematic assessment of diverse aspects of ecology and evolution for specific organisms/cells, as well as their interactions with one another and with the environment.

4 Limitations of Algal Phylogenomics

Given that most algal genomes available to date are sequenced de novo, the quality of the genome assembly and annotation remains to be improved as more data become available. In addition to large genome sizes (e.g. for dinoflagellates) (LaJeunesse et al. 2005; Hackett and Bhattacharya 2008), yet-to-be-identified genome features could be a hurdle for data assembly in algal genome projects. Phylogenomics (and phylogenetics) yields a working hypothesis, and one needs to be aware that such an approach provides only clues, not the absolute truth, about how genomes have evolved. Over- and under-interpreting phylogenomic results could yield biased, inaccurate conclusions.

The technical limitations of phylogenetic approaches have been reviewed extensively in the literature (Philippe et al. 2004, 2005, 2011; Stiller 2011). The quality of sequence data vis-à-vis stochastic sequence variation, convergence, long-branch attraction and incomplete sequence data (e.g. transcriptome, gene fragments), in addition to contamination, represents the biggest hurdle in phylogenomics, as in any sequence analysis. Perhaps the most relevant, longstanding issue in the study of algal phylogenomics is biased or unbalanced taxon sampling, which has a significant impact on any phylogenetic inference and on how the results are interpreted. For instance, the opposing views of the algal contribution to the evolution of diatoms vis-à-vis endosymbiosis (Moustafa et al. 2009; Burki et al. 2012a; Deschamps and Moreira 2012) have been largely attributed to the limited red algal gene repertoire. Certainly, the biases of taxon sampling will diminish as more genome data become available. Given the vast diversity of algal species, however, the extent to which such biases of taxon sampling will be tolerable remains an open question.

5 Future Perspectives and Conclusions

As phylogenomics becomes more common in algal research with the growing availability of data, a key question remains: are current state-of-the-art phylogenomic approaches sufficient, or should we spend more time developing better ones? In other words, where is the balance between extracting as much information as we can from the rapidly growing data using our current know-how, versus exploring approaches that would take us perhaps closer to the truth? This question has no easy answer.

Given the on-going deluge of sequencing data, the limitation of computational and human resources in data management, interpretation and analysis cannot be overstated. Where genome data are unavailable, the use of transcriptome data in phylogenomic analysis is not uncommon (Struck et al. 2011). However, assembled transcriptome data, e.g. mostly of expressed sequence tags, contain partial gene transcripts and could be biased by the environmental conditions under which the genetic material was harvested. Multiple sequence alignment of these sequences alongside other (putatively homologous) full-length sequences inevitably creates undesirable “gappy” aligned positions (i.e. phylogenetically non-informative sites) that would affect subsequent phylogenetic inferences, against the backdrop of genome rearrangement, genetic recombination, and lateral genetic transfer. An alternative strategy is to use the so-called alignment-free methods in calculating sequence distances (e.g. using k-mers) (Vinga and Almeida 2003; Höhl and Ragan 2007; Reinert et al. 2009; Chan and Ragan 2013; Bonham-Carter et al. 2014; Ragan et al. 2014), which do not require contiguity of homologous sequences to be conserved. However, the application of these approaches and their scalability in phylogenomics remain to be systematically investigated (Posada 2013; Ragan and Chan 2013; Chan et al. 2014). In addition, alternative phylogenetic representations that are independent of the tree-like structure, e.g. networks (Dagan 2011; Huson and Scornavacca 2011), provide a fresh perspective on genome evolution.
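As a rough illustration of how a k-mer based, alignment-free distance can be computed, the sketch below uses the Jaccard distance between the sets of k-mers found in two sequences. This simple measure is chosen here for brevity and is not necessarily one of the specific statistics evaluated in the works cited above; the function names, the choice of k = 3 and the toy sequences are assumptions made for illustration.

```python
def kmer_set(seq: str, k: int) -> set:
    """Return the set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Jaccard distance between the k-mer sets of two sequences.

    0.0 means identical k-mer content; 1.0 means no shared k-mers.
    No alignment is computed, so partial or rearranged sequences can
    still be compared directly.
    """
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("both sequences are shorter than k")
    return 1.0 - len(a & b) / len(union)


# Toy, made-up nucleotide sequences for illustration only.
full_length = "ATGGCCAAGTTCGA"
fragment = "GCCAAGTT"          # mimics a partial transcript of the same gene
unrelated = "TTTTTTTTCCCCCC"

d_fragment = kmer_distance(full_length, fragment)
d_unrelated = kmer_distance(full_length, unrelated)
print(f"full vs fragment:  {d_fragment:.3f}")
print(f"full vs unrelated: {d_unrelated:.3f}")
```

A matrix of such pairwise distances could then be passed to a distance-based tree-building method such as neighbor joining; how well this scales and performs on real genome-scale data is the open question raised above.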

Given current limitations in phylogenomics (as highlighted in the above section), one could argue that current approaches would yield biased inferences that would be of little use. One could always improve the phylogenetic framework, e.g. by “perfecting” sequence alignments (Edgar 2004; Sievers et al. 2011), the identification of homologous groups (Li et al. 2001; Harlow et al. 2004) or phylogenetic algorithms (Neuwald 2009; Liu et al. 2012; Nelesen et al. 2012), to reduce inaccurate inferences. At the other end of the spectrum, with more-efficient scalability, higher computing capacity, and better implementations and sampling strategies for existing data, phylogenomic studies could yield valuable insights into algal biology and evolution. A common ground between the two schools of thought is crucial for the field to move forward.

Phylogenomics is a powerful tool for delineating organismal evolution from the genomic perspective, yielding novel, high-level biological hypotheses that would guide experimental designs for further genetic, biochemical or physiological studies at greater depth and with a more-refined focus. As algal research shifts towards a multidisciplinary framework, projects involving large international collaborative networks that combine expertise from various research areas are desirable, as demonstrated in a number of recent studies (Chan et al. 2012a; Price et al. 2012; Collén et al. 2013). With a positive outlook on the forthcoming algal genome data, phylogenomics remains a highly powerful and relevant tool in algal research, especially now that we are at a juncture where we can address interesting biological questions at a scale that was unimaginable just a few years ago.