Abstract
Adaptive radiation is the likely source of much of the ecological and morphological diversity of life1,2,3,4. How adaptive radiations proceed and what determines their extent remain unclear in most cases1,4. Here we report the in-depth examination of the spectacular adaptive radiation of cichlid fishes in Lake Tanganyika. On the basis of whole-genome phylogenetic analyses, multivariate morphological measurements of three ecologically relevant trait complexes (body shape, upper oral jaw morphology and lower pharyngeal jaw shape), scoring of pigmentation patterns and approximations of the ecology of nearly all of the approximately 240 cichlid species endemic to Lake Tanganyika, we show that the radiation occurred within the confines of the lake and that morphological diversification proceeded in consecutive trait-specific pulses of rapid morphospace expansion. We provide empirical support for two theoretical predictions of how adaptive radiations proceed, the ‘early-burst’ scenario1,5 (for body shape) and the stages model1,6,7 (for all traits investigated). Through the analysis of two genomes per species and by taking advantage of the uneven distribution of species in subclades of the radiation, we further show that species richness scales positively with per-individual heterozygosity, but is not correlated with transposable element content, number of gene duplications or genome-wide levels of selection in coding sequences.
Main
At the macroevolutionary level, the diversity of life has been shaped mainly by two antagonistic processes: evolutionary radiations increase, and extinction events decrease, organismal diversity over time5,8,9. Evolutionary radiations are referred to as adaptive radiations if new lifeforms evolve rapidly through adaptive diversification into a variety of ecological niches, which typically presupposes ecological opportunity1,2,3,10. Whether or not an adaptive radiation occurs depends on a variety of extrinsic and intrinsic factors as well as on contingency, whereas the magnitude of an adaptive radiation is determined by the interplay between its main components, speciation (minus extinction) and adaptation to distinct ecological niches1,2,4,11. Despite considerable scientific interest in the phenomenon of adaptive radiation as the cradle of organismal diversity1,2,10,12,13, many predictions regarding its drivers and dynamics remain untested, particularly in exceptionally species-rich instances. Here, we examine what some consider as the “most outstanding example of adaptive radiation”14, the species flock of cichlid fishes in Lake Tanganyika. This cichlid assemblage comprises about 240 species15, which together feature an extraordinary degree of morphological, ecological and behavioural diversity14,15,16,17. We construct a species tree of Lake Tanganyika’s cichlid fauna on the basis of genome-wide data, demonstrate the adaptive nature of the radiation, reconstruct eco-morphological diversification along the species tree, and test general and cichlid-specific predictions related to adaptive radiation.
In situ radiation in Lake Tanganyika
To establish the phylogenetic context of cichlid evolution in Lake Tanganyika, we estimated the age of the radiation through divergence time analyses based on cichlid and other teleost fossils18, and constructed time-calibrated species trees using 547 newly sequenced cichlid genomes (Supplementary Table 1). Our new phylogenetic hypotheses (Fig. 1, Extended Data Figs. 1–4, Supplementary Figs. 1, 2) support the assignment of the Tanganyikan cichlid fauna into 16 subclades—corresponding to the taxonomic grouping of species into tribes15—and confirm that the Tanganyikan representatives of the tribes Coptodonini, Oreochromini and Tylochromini belong to more ancestral and widespread lineages that have colonized the lake secondarily12,15,19 (Supplementary Discussion). It has been under debate whether all endemic Tanganyikan cichlid tribes evolved within the confines of Lake Tanganyika or whether some of them evolved elsewhere before the formation of the lake20,21,22. Our time calibrations establish that the most recent common ancestor of the cichlid radiation in Lake Tanganyika lived around 9.7 million years ago (Ma) (95% highest-posterior-density age interval: 10.1–9.1 Ma) (Fig. 1), which coincides with the appearance of lacustrine conditions in the Tanganyikan Rift23. This suggests that the radiation commenced shortly after the lake had formed and that all endemic cichlid tribes have evolved and diversified in situ, that is, within the temporal and geographical context of Lake Tanganyika.
Phenotypes correlate with environments
Because—in the case of adaptive radiation—diversification occurs via niche specialization, a strong association is expected in the extant fauna between the environment occupied by a species and the specific morphological features used to exploit it2,3. To quantify eco-morphological diversification across the radiation, we investigated three trait complexes through landmark-based morphometric analyses. Specifically, we quantified body shape and upper oral jaw morphology using 2D landmarks acquired from X-ray images and the shape of the lower pharyngeal jaw bone based on 3D landmarks derived from micro-computed tomography (μCT) scans (Extended Data Fig. 5). To approximate the ecological niche of each species, we used the carbon and nitrogen stable-isotope composition of muscle tissue, which provides information about the relative position along the benthic–pelagic axis (δ13C value) and the relative trophic level (δ15N value), respectively16,24—a pattern that we corroborate here for Lake Tanganyika (Extended Data Fig. 6a, Supplementary Discussion). The major axes of shape variation for each trait complex were identified through a principal component analysis (PCA). To test for phenotype–environment correlations and to identify the ecologically most relevant components of each of these trait complexes, we performed a two-block partial least-square analysis (PLS) with the stable-isotope measurements, and applied a phylogenetic generalized least-square analysis (pGLS) to account for phylogenetic dependence.
The quantification of variation in body shape revealed that principal component 1 (PC1) represented mainly differences in aspect ratio, whereas PC2 was loaded with changes in head morphology (Fig. 2a). The changes in aspect ratio (comparable to PC1) were correlated with the δ13C and δ15N values (PLS: Pearson’s r = 0.69, R2 = 0.48, P = 0.001; pGLS: R2 = 0.12, P < 0.001, λpGLS = 1.007). PC1 of upper oral jaw morphology mainly represented changes in the orientation and relative size of the premaxilla, which was also the main correlate to the stable C and N isotope composition (PLS: Pearson’s r = 0.62, R2 = 0.38, P = 0.001; pGLS: R2 = 0.09, P < 0.001, λpGLS = 1.023), whereas PC2 was defined by changes in the ratio of the rostral versus the lateral part of the bone (Fig. 2b). For lower pharyngeal jaw shape, we found that PC1 reflected mainly changes in the aspect ratio of the jaw bone in combination with an increased posterior thickness, whereas PC2 involved similar shifts in thickness, yet in this case in combination with changes in the lengths of the postero-lateral horns that act as muscle-attachment structures25 (Fig. 2c). The PLS revealed that shape changes similar to PC2 are best associated with stable-isotope values (PLS: Pearson’s r = 0.67, R2 = 0.45, P = 0.001; pGLS: R2 = 0.16, P < 0.001, λpGLS = 1.018). The PCAs further revealed that the occupied area of the morphospace and ecospace scales with the number of species in the tribes (Extended Data Figs. 6, 7; ecospace: Pearson’s r = 0.88, d.f. = 9, P < 0.001; body shape: Pearson’s r = 0.91, d.f. = 9, P < 0.001; upper oral jaw morphology: Pearson’s r = 0.88, d.f. = 9, P < 0.001; lower pharyngeal jaw shape: Pearson’s r = 0.83, d.f. = 9, P = 0.002), a pattern that is not driven by sample size only (Supplementary Discussion).
Overall, the significant association between each of the three traits and the stable C and N isotope composition underpins their adaptive value (Extended Data Fig. 8a–c). A joint consideration points out that deep-bodied cichlids with inferior mouths and thick lower pharyngeal jaws with short horns are associated with higher stable-isotope projections (high δ13C and low δ15N values), indicating that such fishes occur predominantly in the benthic/littoral zone of the lake and feed on plants and algae, whereas more elongated species with more superior mouths and longer and thinner lower pharyngeal jaws are generally associated with lower stable-isotope projections (low δ13C and high δ15N values), suggesting a more pelagic lifestyle and a higher position in the food chain.
Pulses of morphological diversification
Next, we investigated the temporal dynamics of how the observed eco-morphological disparity emerged over the course of the radiation. In addition to the three eco-morphological traits, we also scored male pigmentation patterns to approximate disparity along the signalling axis—another potentially important component of diversification in adaptive radiations1,6,7,26. For all four traits, we estimated morphospace expansion through time using ancestral-state reconstructions along the time-calibrated species tree and applying a variable-rates model of trait evolution27,28 (Extended Data Fig. 8d, e). We calculated morphological disparity as the extent of occupied morphospace in time intervals of 0.15 million years (Myr) in comparison to a null model that assumes Brownian motion. Likewise, evolutionary rates through time were calculated as mean evolutionary rates derived from the variable-rates model, sampled at the same time points along the phylogeny.
Our analyses uncovered a pattern of discrete pulses in morphospace expansion, which were followed, in most cases, by morphospace packing (Fig. 3). The timing of these pulses differed among the traits. For body shape, we found a pulse of rapid morphospace expansion early in the radiation, alongside the first pulse of lower pharyngeal jaw shape diversification (Fig. 3b, c); this early phase of the radiation also features the highest evolutionary rates for body shape (Fig. 3d). The pulse in upper oral jaw diversification occurred in the middle phase of the radiation. Evolutionary rates were increased during this period, and were even higher at a later phase that was dominated by packing of the upper oral jaw morphospace rather than its expansion (Fig. 3b–d). This suggests that, in that later phase, rapidly evolving lineages diverged into pre-occupied regions of the morphospace, ultimately resulting in convergent forms16. The second pulse in lower pharyngeal jaw morphospace expansion happened late in the radiation when evolutionary rates were also highest for this trait (Fig. 3b–d). Thus, the theoretical prediction that eco-morphological diversification is rapid early in an adaptive radiation and slows down through time as the available niche space becomes filled1,5 applies only to body shape. Yet, this early burst in body shape diversification was not connected to a substantial increase in lineage accumulation (Fig. 3c).
Pigmentation patterns showed a single pulse of diversification and increased evolutionary rates late in the radiation—a signature unlikely to be caused by a high turnover rate in this trait (Supplementary Discussion). This late pulse of diversification in pigmentation patterns, together with the consecutive pulses of morphospace expansion in the eco-morphological traits, is in agreement with the prediction that diversification in an adaptive radiation proceeds in discrete temporal stages—first in macrohabitat use, then by trophic specialization, followed by a final stage of divergence along the signalling axes1,6,7. However, in contrast to the conventional stages model, the most recent stage of the cichlid adaptive radiation in Lake Tanganyika, which coincides with a large number of speciation events (Fig. 3c), is characterized by temporally overlapping pulses of diversification in both a putative signalling trait and in an ecologically relevant trait. The lower pharyngeal jaw shape is the only trait complex showing two discrete pulses of morphospace expansion—one early in the radiation and one late when niche space already became limited. This later pulse suggests that diversification in the pharyngeal jaw apparatus facilitated fine-scaled resource partitioning after body shape and upper oral jaw morphospaces had been explored, resulting in the densely packed niche space observed today (Figs. 2, 3b).
Genomic features and species richness
Finally, we examined whether the diversity patterns arising over the course of the radiation are linked with particular genomic features. It has previously been suggested—on the basis of five reference cichlid genomes—that the radiating African cichlid lineages are characterized by increased transposable element counts, increased levels of gene duplications, and genome-wide accelerated coding-sequence evolution13. Because of the phylogenetic substructure of Lake Tanganyika’s cichlid fauna and the widely differing species numbers among tribes, our data offered the opportunity to examine genomic features for an association with per-tribe species richness within a large-scale radiation. We did not find evidence that members of species-rich tribes exhibit greater numbers of transposable elements (Fig. 4a) or more duplicated genes in their genomes (Fig. 4b), nor do they feature elevated genome-wide signatures of selection in coding sequences (Fig. 4c) (see also Extended Data Fig. 9). However, we found that a tribe’s species richness scales positively with a common measure of genetic diversity: genome-wide heterozygosity (Fig. 4d). That genetic diversity is linked to species richness has been previously suspected, although the nature of this relationship and the determinants of genetic diversity are under debate29,30.
Elevated levels of heterozygosity could potentially result from hybridization31, which has itself been suggested as a trigger of cichlid radiations22,32,33. In Tanganyikan cichlids, the level of gene flow within tribes (estimated using f4-ratio values34) does not correlate with a tribe’s species richness (Fig. 4e, Extended Data Fig. 10). Nevertheless, much of the variation in heterozygosity as well as its correlation with species richness can be explained by the observed levels of gene flow within tribes in combination with the reduced gene flow among them: through coalescent simulations of genome evolution along the species tree we show that variation in migration rates, sampled from the empirical f4-ratio estimates, can produce levels of heterozygosity that are similar to the ones observed in nature (Fig. 4f). Hence, the correlation between species richness and heterozygosity can be explained by gene flow and phylogenetic structure, which is consistent with the expectation that the effect of gene flow scales positively with the number of hybridizing species and the divergence among these. In the cichlid radiation in Lake Malawi, which is an order of magnitude younger than the one in Lake Tanganyika, heterozygosity levels vary much less among lineages and do not scale with species richness, which—according to our findings—can be explained by the much lower levels of genetic differentiation between the hybridizing species33.
Conclusion
On the basis of a comprehensive dataset on cichlid fishes from African Lake Tanganyika, we tested predictions related to the phenomenon of adaptive radiation. We establish that the Tanganyikan cichlid radiation unfolded within the temporal and spatial confines of the lake, giving rise to an endemic fauna consisting of about 240 species in 52 genera and 13 tribes in less than 10 Myr. Although the ancestors of these tribes initially found comparable ecological opportunity, present-day species numbers differ by two orders of magnitude among these phylogenetic sublineages. Our analyses of morphological, ecological and genomic information revealed that, taken as a whole, species-rich tribes occupy larger fractions of the morphospace and ecospace and contain species that are, at the per-genome level, genetically more diverse, which appears to be linked to gene flow. We demonstrate a phenotype–environment association in three trait complexes (body shape, upper oral jaw morphology and lower pharyngeal jaw shape) and pinpoint their most relevant adaptive components. We show that eco-morphological diversification was not gradual over the course of the radiation. Instead, we identified trait-specific pulses of accelerated phenotypic evolution, whereby only diversification in body shape shows an early burst1,5. The sequence of the trait-specific pulses essentially follows the pattern postulated in the stages model of adaptive radiation1,6,7, with the extension that the most recent stage of the cichlid adaptive radiation in Lake Tanganyika, which is characterized by a large number of speciation events, is defined by increased diversification in both an ecological (lower pharyngeal jaw) and a signalling (pigmentation) trait. To what extent the observed diversity and disparity patterns were shaped by past environmental fluctuations and extinction dynamics cannot be answered conclusively through the investigation of the extant fauna alone.
Methods
No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
Sampling
Sampling was conducted between 2014 and 2017 at 130 locations at Lake Tanganyika. To maximise taxon coverage, we included additional specimens from previous expeditions (4.9% of the samples) as well as from other collections (0.8%). The final dataset (301 taxa; n = 2,723 specimens) contained an almost complete taxon sampling of the cichlid fauna of Lake Tanganyika, as well as 18 representative cichlid species from nearby waterbodies, and 32 outgroup species. All analyses described below are based on the same set of typically 10 specimens per species, or subsets thereof (Supplementary Tables 1, 2, Supplementary Methods).
Whole-genome sequencing
Genomic DNA of typically one male and one female specimen per species (n = 547) was extracted from fin clips preserved in ethanol using the E.Z.N.A. Tissue DNA Kit (Omega Bio-Tek) and sheared on a Covaris E220 (60 μl with 10% duty factor, 175 W, 200 cycles for 65 s). Individual libraries were prepared using TruSeq DNA PCR-Free Sample Preparation kit (Illumina; low sample protocol) for 350-bp insert size, pooled (six per lane), and sequenced at 126-bp paired-end on an Illumina HiSeq 2500 (Supplementary Table 1 contains information on read depths).
Assessing genomic variation
After adaptor removal with Trimmomatic35 (v.0.36), reads of 528 genomes (all species belonging to the cichlid radiation in Lake Tanganyika plus additional species nested within this radiation and some selected outgroup species; Supplementary Table 1) were mapped to the Nile tilapia reference genome (RefSeq accession GCF_001858045.1; ref. 36) using BWA-MEM37 (v.0.7.12). Variant calling was performed with the HaplotypeCaller and GenotypeGVCFs tools of GATK38 (v.3.7), applying a minimum base quality score of 30. Variant calls were filtered with BCFtools39 (v.1.6; FS < 20, QD > 2, MQ > 20, DP > 4,000, DP < 8,000, ReadPosRankSum > −0.5, MQRankSum > −0.5). We applied a filter to sites in proximity to indels with a minor allele count greater than 2, depending on the size of the indel. With SNPable (http://lh3lh3.users.sourceforge.net/snpable.shtml), we determined all sites within regions of the Nile tilapia reference genome in which read mapping could be ambiguous and masked these sites. Using VCFtools40 (v.0.1.14) we further masked, per individual, genotypes with a read depth below 4 or a genotype quality below 20. Sites that were no longer polymorphic after the filtering steps were excluded, resulting in a dataset of 57,751,375 SNPs. Called variants were phased with the software beagle41 (v.4.1). The phasing of the Neolamprologus cancellatus specimens, which appeared to be F1 hybrids, was further improved with a custom script. Further details are provided in the Supplementary Methods.
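The site-level and genotype-level thresholds listed above can be illustrated with a minimal Python sketch. It is not the BCFtools or VCFtools invocation used in the study. Two points are assumptions made here: the listed thresholds are read as conditions that a site must satisfy to be retained, and an annotation that is absent from a record is treated as passing. The names SITE_RULES, keep_site and mask_genotype are hypothetical.

```python
# Thresholds quoted in the text. Reading them as "retain the site if all
# conditions hold" is an assumption of this sketch.
SITE_RULES = {
    "FS": lambda v: v < 20,
    "QD": lambda v: v > 2,
    "MQ": lambda v: v > 20,
    "DP": lambda v: 4000 < v < 8000,
    "ReadPosRankSum": lambda v: v > -0.5,
    "MQRankSum": lambda v: v > -0.5,
}


def keep_site(info):
    """Return True if every annotation present in `info` passes its rule.

    Annotations missing from the record are skipped (assumption).
    """
    return all(rule(info[key]) for key, rule in SITE_RULES.items() if key in info)


def mask_genotype(depth, quality, min_depth=4, min_quality=20):
    """Return True if a genotype has read depth below 4 or quality below 20."""
    return depth < min_depth or quality < min_quality
```

A site record is represented here as a plain dictionary of annotation values, and a genotype by its read depth and genotype quality.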
De novo genome assemblies
De novo genome assemblies were generated from the raw-read data for each individual following an approach described previously42,43, using CeleraAssembler44 (v.8.3) and FLASH45 (v.1.2.11). Eight genomes repeatedly failed to assemble and were therefore excluded from further analyses (specimen vouchers: A188, IRF6, IZC5, JWE7, JWG1, JWG2, LJD3 and LJE8). Assembly quality was assessed with QUAST46 (v.4.5) and completeness was determined with BUSCO47 (v.3). Assembly statistics summarized with MultiQC48 (v.1.7) are available on Dryad.
Determining the age of the radiation
To determine the age of the cichlid radiation in Lake Tanganyika, we applied phylogenomic molecular-clock analyses for representatives of all cichlid subfamilies and the most divergent tribes, together with non-cichlid outgroups (44 species; Extended Data Fig. 1). Following Matschiner et al.18 we identified and filtered orthologue sequences from genome assemblies and compiled ‘strict’ and ‘permissive’ datasets that contained alignments for 510 and 1,161 genes and had total alignment lengths of 542,922 and 1,353,747 bp, respectively. We first analysed the topology of the species with the multi-species coalescent model implemented in ASTRAL49 (v.5.6.3), based on gene trees that we estimated for both datasets with BEAST2 (ref. 50; v.2.5.0). As undetected past introgression can influence divergence-time estimates in molecular-clock analyses, we further tested for signals of introgression in the form of asymmetric species relationships in gene trees and excluded five species (Fundulus heteroclitus, Tilapia brevimanus, Pelmatolapia mariae, Tilapia sparrmanii, and Steatocranus sp. ‘ultraslender’) potentially affected by introgression from all subsequent molecular-clock analyses. We then estimated divergence times among the most divergent cichlid tribes and the age of the cichlid radiation in Lake Tanganyika with the multi-species coalescent model in StarBEAST2 (ref. 51; v.0.15.5), using the ‘strict’ set of gene alignments (Extended Data Fig. 1). Further details are provided in the Supplementary Methods.
Phylogenetic inference
To infer a complete phylogeny of the cichlid radiation in Lake Tanganyika (the Tanganyikan representatives of the more ancestral tribes Coptodonini, Oreochromini and Tylochromini were excluded) from genome-wide SNPs we applied additional filters, retaining only SNPs with <40% missing data and between-SNP distances of at least 100 bp. The remaining 3,630,997 SNPs were used to infer a maximum-likelihood phylogeny with RAxML52 (v.8.2.4; Fig. 1, Extended Data Fig. 2, Supplementary Fig. 1). The species-tree topology was further estimated under the multi-species coalescent model from a set of local phylogenies with ASTRAL (Extended Data Fig. 3); these local phylogenies were inferred with IQ-TREE53 (v.1.7-beta7) from alignments for 1,272 genomic regions determined to be particularly suitable for phylogenetic analysis (see Supplementary Methods). We also applied the multi-species coalescent model implemented in SNAPP54 (v.1.4.2) to the dataset of genome-wide SNPs (Extended Data Fig. 4). Species-level phylogenies resulting from these different approaches were used as topological constraints in subsequent relaxed-clock analyses of divergence times (see below). In addition, we estimated the mitochondrial phylogeny based on maximum-likelihood with RAxML (Supplementary Fig. 2). Further details are provided in the Supplementary Methods.
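The two additional SNP filters (less than 40% missing data, at least 100 bp between retained SNPs) can be sketched as follows. The greedy left-to-right selection within each chromosome is an assumption of this sketch, as the text does not state how the distance criterion was enforced. The function name thin_snps and the input layout are hypothetical.

```python
def thin_snps(snps, max_missing=0.40, min_distance=100):
    """Filter SNPs by missingness and by distance to the previously kept SNP.

    `snps` is an iterable of (chromosome, position, missing_fraction) tuples,
    assumed to be sorted by position within each chromosome.
    """
    kept = []
    last_kept = {}  # chromosome -> position of the last retained SNP
    for chrom, pos, missing in snps:
        if missing >= max_missing:
            continue  # too much missing data
        prev = last_kept.get(chrom)
        if prev is not None and pos - prev < min_distance:
            continue  # closer than min_distance to the last retained SNP
        kept.append((chrom, pos))
        last_kept[chrom] = pos
    return kept
```

Because the selection is greedy, which SNPs survive depends on the scan direction; other thinning strategies would retain a different but equally valid subset.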
Divergence time estimates within the radiation
For relaxed-clock analyses, the 1,272 alignments were further filtered by applying stricter thresholds on the proportion of missing data and the strength of recombination signals. Ten remaining alignments with a length greater than 2,500 bp and fewer than 130 hemiplasies (total length: 30,738 bp; completeness: 95.8%) were then used jointly to estimate divergence times with the uncorrelated-lognormal relaxed-clock model implemented in BEAST2. To account for phylogenetic uncertainty in downstream phylogenetic comparative analyses, we performed three separate sets of relaxed-clock analyses, in which the topology was either fixed to the species-level phylogeny inferred with RAxML (Fig. 1, Extended Data Fig. 2), the species tree inferred with ASTRAL (Extended Data Fig. 3) or the Bayesian species tree inferred with SNAPP (Extended Data Fig. 4). Further details are provided in the Supplementary Methods.
Morphometrics
To quantify body shape and upper oral jaw morphology, we applied a landmark-based geometric morphometric approach to digital X-ray images (for the full set of 10 specimens per species whenever possible; n = 2,197). We selected 21 landmarks, of which 17 were distributed across the skeleton and four defined the premaxilla (Extended Data Fig. 5a). Landmark coordinates were digitized using FIJI55 (v2.0.0-rc-68/1.521i). To extract overall body shape information, we excluded landmark 16, which marks the lateral end of the premaxilla, hence minimizing the impact of the orientation of the upper oral jaw. We then applied a Procrustes superimposition to remove the effect of size, orientation, and translational position of the coordinates.
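What a Procrustes superimposition removes can be illustrated with a minimal 2D sketch that aligns one landmark configuration to a reference. This is an ordinary (pairwise) Procrustes fit, not the generalized Procrustes analysis performed in geomorph, which iterates over all specimens and a mean shape. Reflections are not handled, and the function names are hypothetical.

```python
import math


def center_and_scale(points):
    """Translate landmarks to their centroid and scale to unit centroid size."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    size = math.sqrt(sum(x * x + y * y for x, y in centered))  # centroid size
    return [(x / size, y / size) for x, y in centered]


def align_to_reference(points, reference):
    """Remove position, size and orientation of `points` relative to `reference`."""
    a = center_and_scale(points)
    b = center_and_scale(reference)
    # Closed-form optimal rotation angle for two centred 2D configurations.
    num = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    den = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in a]
```

After alignment, any remaining coordinate differences between the two configurations reflect shape only.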
For upper oral jaw morphology, we used a subset of four landmarks. A crucial feature of the oral jaw morphology is the orientation of the mouth relative to the body axes. However, this component of the upper oral jaw morphology would be lost in a classical geometric morphometric analysis, in which only pure shape information is retained. To overcome this, we extracted the premaxilla-specific landmarks (1, 2, 16 and 21) after Procrustes superimposition of the entire set of landmarks and subsequently recentred the landmarks to align the specimens without rotation. Thus, the resulting landmark coordinates do not represent the pure shape of the premaxilla but additionally contain information on its orientation and size in relation to body axes and body size, respectively.
To quantify lower pharyngeal jaw bone shape in 3D, a landmark-based geometric morphometric approach was applied on μCT scans of the head region of five specimens per species (n = 1,168). To capture all potential functionally important structures of the lower pharyngeal jaw bone, we selected a set of 27 landmarks (10 true landmarks and 17 sliding semi-landmarks) well distributed across the left side of the bone (Extended Data Fig. 5b). Landmark coordinates were acquired using TINA56 (v.6.0). To retain the lateral symmetric properties of the shape data during superimposition, we reconstructed the right side of the lower pharyngeal jaw bone by mirroring the landmark coordinates across the plane of bilateral symmetry fitted through all landmarks theoretically lying on this plane. We then superimposed the resulting 42 landmarks while sliding the semi-landmarks along the curves by minimizing Procrustes distances and retained the symmetric component only.
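The mirroring step can be sketched as a reflection of 3D landmarks across a plane. In this minimal example the plane is supplied directly as a point and a normal vector, whereas in the study it was fitted through the landmarks lying on the plane of bilateral symmetry; that fitting step is not shown. The function name and the tolerance parameter are hypothetical.

```python
def mirror_landmarks(landmarks, plane_point, plane_normal, tol=1e-9):
    """Reflect 3D landmarks across a plane and return only the mirrored copies.

    Landmarks lying on the plane (within `tol`) are their own mirror image
    and are therefore skipped rather than duplicated.
    """
    norm = sum(c * c for c in plane_normal) ** 0.5
    n = [c / norm for c in plane_normal]  # unit normal
    mirrored = []
    for p in landmarks:
        # Signed distance of the landmark from the plane.
        d = sum((pc - oc) * nc for pc, oc, nc in zip(p, plane_point, n))
        if abs(d) <= tol:
            continue
        mirrored.append(tuple(pc - 2 * d * nc for pc, nc in zip(p, n)))
    return mirrored
```

Concatenating the original and the mirrored landmarks yields a bilaterally symmetric configuration in which midline landmarks appear only once.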
To identify the major axes of shape variation across the multivariate datasets we performed a PCA for each trait. We also calculated morphospace size per tribe as the square root of the convex hull area spanned by species means of the PC1 and PC2 scores. We then tested for a correlation between morphospace size and estimated species richness of a tribe15 (log-transformed to obtain normal distribution). To account for phylogenetic non-independence, we calculated phylogenetic independent contrasts with the R package ape57 (v.5.2) using the species tree (Fig. 1) pruned to the tribe level. We then calculated Pearson’s correlation coefficients for independent contrasts using the function cor.table of the R package picante58 (v.1.8).
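The morphospace-size measure described above (square root of the convex hull area spanned by species means on PC1 and PC2) can be sketched in plain Python. The hull algorithm used here (Andrew's monotone chain) is chosen for illustration only; the text does not state which implementation was used. Function names are hypothetical.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def morphospace_size(species_means):
    """Square root of the convex hull area of (PC1, PC2) species means."""
    hull = convex_hull(species_means)
    twice_area = 0.0
    # Shoelace formula over consecutive hull vertices.
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        twice_area += x1 * y2 - x2 * y1
    return (abs(twice_area) / 2) ** 0.5
```

Taking the square root puts the measure on the same linear scale as the PC axes; a tribe with fewer than three distinct species means has a size of zero under this definition.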
All landmark coordinates for geometric morphometric analyses were processed and analysed in R59 (v.3.5.2) using the packages geomorph60 (v.3.0.7) and Morpho61 (v.2.6). Further details are provided in the Supplementary Methods.
Stable-isotope analysis
To approximate ecology for each species, we measured the stable carbon (C) and nitrogen (N) isotope composition of all available specimens from Lake Tanganyika (n = 2,259). We analysed a small (0.5–1 mg) dried muscle sample of each specimen with a Flash 2000 elemental analyser coupled to a Delta Plus XP continuous-flow isotope ratio mass spectrometer (IRMS) via a Conflo IV interface (Thermo Fisher Scientific). Carbon and nitrogen isotope data were normalized to the VPDB (Vienna Pee Dee Belemnite) and Air-N2 scales, respectively, using laboratory standards which were calibrated against international standards. Values are reported in standard per-mil notation (‰), and long-term analytical precision was 0.2‰ for δ13C values and 0.1‰ for δ15N values. Note that we have used some of these stable-isotope values in a previous study62.
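For readers unfamiliar with the per-mil notation, the standard definition of a delta value is the relative deviation of the heavy-to-light isotope ratio of a sample from that of the reference standard, multiplied by 1,000. The sketch below shows this definition only; it does not reproduce the normalization against laboratory standards described above, and the function name is hypothetical.

```python
def delta_per_mil(ratio_sample, ratio_standard):
    """Delta value in per mil: (R_sample / R_standard - 1) * 1000.

    R is the ratio of heavy to light isotope (for example 13C/12C or 15N/14N).
    """
    return (ratio_sample / ratio_standard - 1.0) * 1000.0
```

A sample whose isotope ratio is 1% lower than that of the standard therefore has a delta value of about −10‰.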
To confirm interpretability of the δ13C and δ15N values, we additionally collected and analysed baseline samples covering several trophic levels from the northern and the southern basin of Lake Tanganyika (Supplementary Methods, Supplementary Discussion).
To test for a correlation of ecospace size with species richness of the tribes, we applied the same approach as described above to the δ13C and δ15N values.
Phenotype–environment association
For each trait (body shape, upper oral jaw, lower pharyngeal jaw) we performed a two-block PLS analysis based on species means of the Procrustes aligned landmark coordinates and the stable C and N isotope compositions using the function two.b.pls in geomorph. To account for phylogenetic dependence of the data we applied a pGLS as implemented in the R package caper63 (v.1.0.1) across the two sets of PLS scores (each morphological axis and the stable-isotope projection) using the time-calibrated species tree based on the maximum-likelihood topology. The strength of phylogenetic signal in the data was accounted for by optimising the branch length transformation parameter lambda using a maximum-likelihood approach.
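The core of a two-block PLS, namely finding the pair of axes that maximizes covariance between a block of shape variables and the two isotope values, can be sketched in plain Python. This is not the geomorph implementation: only the first pair of axes is computed, no significance test is performed, and the closed-form solution relies on the second block having exactly two columns. All function names are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract the column mean from every value."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def two_block_pls_first_axis(shape_rows, isotope_rows):
    """First pair of PLS scores and their correlation.

    shape_rows: one row of shape variables per species.
    isotope_rows: one (d13C, d15N) pair per species.
    """
    X = center_columns(shape_rows)
    Y = center_columns(isotope_rows)
    n, p = len(X), len(X[0])
    # Cross-covariance matrix between the two blocks (p x 2).
    C = [[sum(X[i][j] * Y[i][k] for i in range(n)) / (n - 1) for k in range(2)]
         for j in range(p)]
    # Leading right singular vector of C = leading eigenvector of C^T C (2 x 2).
    a = sum(row[0] * row[0] for row in C)
    b = sum(row[0] * row[1] for row in C)
    d = sum(row[1] * row[1] for row in C)
    theta = 0.5 * math.atan2(2 * b, a - d)
    v = (math.cos(theta), math.sin(theta))
    # Matching left singular vector.
    u = [row[0] * v[0] + row[1] * v[1] for row in C]
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]
    shape_scores = [sum(x * w for x, w in zip(row, u)) for row in X]
    isotope_scores = [row[0] * v[0] + row[1] * v[1] for row in Y]
    return shape_scores, isotope_scores, pearson(shape_scores, isotope_scores)
```

The two score vectors returned here correspond to the morphological axis and the stable-isotope projection that a subsequent phylogenetically informed regression would relate to one another.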
Scoring pigmentation patterns
To quantify a putative signalling trait in cichlids, we scored the pigmentation patterns in typically five male specimens per species (n = 1,016), on the basis of standardized images taken in the field after capture of the specimens (see Supplementary Methods). Following the strategy described in Seehausen et al.64, the presence or absence of 20 pigmentation features was recorded, whereby we extended the number of scored features to include additional body and fin pigmentation patterns (Extended Data Fig. 5c). We then applied a logistic PCA implemented in the R package logisticPCA65 (v.0.2) and used the PC1 scores as a univariate proxy for differentiation along the signalling axes in further analyses.
Trait evolution modelling and disparity estimates
To investigate the temporal dynamics of morphological diversification over the course of the radiation we essentially followed the strategy of Cooney et al.28 (which is based on measurements on extant taxa and assumes constant niche space and no or constant extinction over the course of the radiation), using the PLS scores of body shape, upper oral jaw morphology, and lower pharyngeal jaw shape and the PC1 scores of pigmentation patterns as well as the time-calibrated maximum-likelihood species tree topology. For each trait we assessed the phylogenetic signal in the data by calculating Pagel’s lambda and Blomberg’s K with the R package phytools66 (v.0.6-60). We then tested the fit of four models of trait evolution for each of the four traits. We applied a white noise model, a Brownian motion model, a single-optimum Ornstein–Uhlenbeck model and an early burst model of trait evolution using the function fitContinuous of the R package geiger67 (v.2.0.6.1). Additionally, we fitted a variable-rates model (a Brownian motion model which allows for rate shifts on branches and nodes) using the software BayesTraits (http://www.evolution.rdg.ac.uk/; v.3) with uniform prior distributions adjusted to our dataset (alpha: −1 to 1, sigma: 0 to 0.001 for morphometric traits; alpha: 0 to 10, sigma: 0 to 10 for pigmentation pattern) and applying single-chain Markov-chain Monte Carlo runs with one billion iterations. We sampled parameters every 100,000th iteration, after a pre-set burn-in of 10,000,000 iterations. For each trait, we then tested the chain for convergence using a Cramér–von Mises statistic implemented in the R package coda68 (v.0.19-3). The models were compared by calculating their log-likelihood and Akaike information criterion (AIC) difference (Extended Data Fig. 8d). Based on differences in AIC, the variable-rates model was best supported for all traits except body shape, which showed a strong signal of an early burst of trait evolution (Extended Data Fig. 8d; note that the variable-rates model has the highest log-likelihood for body shape as well). We nevertheless focused on the variable-rates model for further analyses of all traits to be able to compare temporal patterns of trait evolution among the traits.
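The AIC-based comparison can be illustrated with a minimal Python sketch. It is not the code used in this study (the model fits were obtained with geiger and BayesTraits); the log-likelihoods and parameter counts below are hypothetical values chosen for illustration, and the function names are our own.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def delta_aic(fits: Dict[str, Tuple[float, int]]) -> Dict[str, float]:
    """Return AIC differences relative to the best (lowest-AIC) model."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


# Hypothetical (log-likelihood, number of parameters) pairs, NOT values
# from the study; they only show how the comparison works.
example_fits = {
    "white_noise": (100.0, 2),
    "brownian_motion": (150.0, 2),
    "ornstein_uhlenbeck": (150.5, 3),
    "early_burst": (156.0, 3),
}
example_delta = delta_aic(example_fits)
```

In this toy example the early burst model has an AIC difference of zero and is therefore the best supported; the small gain in log-likelihood of the Ornstein–Uhlenbeck model over Brownian motion does not offset its additional parameter.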
To estimate morphospace expansion through time we used a maximum-likelihood ancestral-state reconstruction implemented in phytools. To account for differences in the rate of trait evolution along the phylogeny, we reconstructed ancestral states using the mean rate-transformed tree derived from the variable-rates model. We then projected the ancestral states onto the original species tree and calculated the morphospace extent (that is, the range of trait values) in time intervals of 0.15 million years (note that this is an arbitrary value; however, differently sized time intervals had no effect on the interpretation of the results). For each time point we extracted the branches existing at that time and predicted the trait value linearly between nodes. We then compared the resulting morphospace expansion over time with a null model of trait evolution. To this end, we simulated 500 datasets (PLS and PC1 scores) under Brownian motion given the original species tree with parameters derived from the Brownian motion model fit to the original data. For each simulated dataset we produced morphospace-expansion curves using the same approach as described above. We then compared the slopes of our observed data with each of the null models by calculating the difference of slopes through time (Fig. 3) using linear models fitted for each time interval with the two subsequent time intervals. Note that for body shape we also estimated morphospace expansion through time using the early burst model for ancestral-state reconstruction, which resulted in a very similar pattern of trait diversification.
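The calculation of morphospace extent at a given time point can be sketched as follows. This Python sketch only illustrates the linear interpolation along branches and the range calculation; the branch representation (start time, end time, trait value at each end) and all function names are assumptions made for this example, and the ancestral-state reconstruction itself is taken as given.

```python
from typing import List, Optional, Tuple

# A branch is (start_time, end_time, start_value, end_value), with time
# measured forward from the root and trait values taken at the two nodes.
Branch = Tuple[float, float, float, float]


def value_at(branch: Branch, t: float) -> float:
    """Linearly interpolate the trait value on a branch at time t."""
    t0, t1, v0, v1 = branch
    if t1 == t0:
        return v0
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def extent_at(branches: List[Branch], t: float) -> Optional[float]:
    """Range of trait values over all branches existing at time t."""
    values = [value_at(b, t) for b in branches if b[0] <= t <= b[1]]
    if not values:
        return None
    return max(values) - min(values)


def extent_through_time(
    branches: List[Branch], t_max: float, step: float
) -> List[Tuple[float, Optional[float]]]:
    """Morphospace extent at regular time points from the root to t_max."""
    n_steps = int(t_max / step + 1e-9)
    return [(i * step, extent_at(branches, i * step)) for i in range(n_steps + 1)]


# Toy tree (hypothetical): a root with trait value 0 splits into two
# lineages that reach trait values +1 and -1 after one time unit.
toy_tree = [(0.0, 1.0, 0.0, 1.0), (0.0, 1.0, 0.0, -1.0)]
toy_curve = extent_through_time(toy_tree, t_max=1.0, step=0.5)
```

Applying the same function to each dataset simulated under Brownian motion would give the null curves against which the observed curve is compared.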
Unlike other metrics of disparity (for example, variance or mean pairwise distances), morphospace extent is not sensitive to the density distribution of measurements within the morphospace and captures its full range69. Hence, comparing the extent of morphospace between observed data and the null model directly unveils the contribution of morphospace expansion relative to the null model; and because the increase in lineages over time is identical in the observed and the simulated data, this comparison also provides an estimate for morphospace packing.
To summarize evolutionary rates we calculated the mean rate of trait evolution inferred by the variable-rates model in the same 0.15-million-year intervals along the phylogeny.
To account for phylogenetic uncertainty in the tree topology we repeated the analyses of trait evolution using the time-calibrated trees based on tree topologies estimated with ASTRAL and SNAPP (Extended Data Figs. 3, 4; Supplementary Methods; Supplementary Discussion). Furthermore, to also account for uncertainty in branch lengths, we repeated the analysis on 100 trees from the Bayesian posterior distribution for each of the three trees (Extended Data Fig. 8d, e, results are provided on Dryad).
Further details can be found in the Supplementary Methods.
Characterization of repeat content
For the repeat content analysis, we randomly selected one de novo genome assembly per species of the radiation (n = 245). We performed a de novo identification of repeat families using RepeatModeler (v.1.0.11; http://www.repeatmasker.org). We then combined the RepeatModeler output library with the available cichlid-specific libraries (Dfam and RepBase; v.27.01.2017; http://www.repeatmasker.org; 258 ancestral and ubiquitous sequences, 161 cichlid-specific repeats, and 6 lineage-specific sequences; 65,118, 273,530 and 6,667 bp in total, respectively) and used the software RepeatMasker (v.4.0.7; http://www.repeatmasker.org) (-xsmall -s -e ncbi -lib combined_libraries.fa) to identify and soft-mask interspersed repeats and low complexity DNA sequences in each assembly. The reported summary statistics were obtained using RepeatMasker’s buildSummary.pl script (Fig. 4a, Extended Data Fig. 9a, results per genome are provided on Dryad).
Gene duplication estimates
Per genome, gene duplication events were identified with the structural variant identification pipeline smoove (population calling method; https://github.com/brentp/smoove, docker image cloned 20/12/2018), which builds upon lumpy70, svtyper71 and svtools (https://github.com/hall-lab/svtools). Variants were called per sample (n = 488 genomes, 246 taxa of the Tanganyika radiation) from the initial mapping files against the Nile tilapia reference genome with the function ‘call’. The union of sites across all samples was obtained with the function ‘merge’, then all samples were genotyped at those sites with the function ‘genotype’, and depth information was added with --duphold. Genotypes were combined with the function ‘paste’ and annotated with ‘annotate’ and the reference genome annotation file. The obtained VCF file was filtered with BCFtools to keep only duplications longer than 1 kb and of high quality (MSHQ >3 or MSHQ = −1, FMT/DHFFC[0] > 1.3, QUAL >100). The resulting file was loaded into R (v.3.6.0) with vcfR72 (v.1.8.0) and filtered to keep only duplications with less than 20% missing genotypes. Next, we removed duplication events with a length more than 1.5 times the interquartile range above the upper quartile of all duplication lengths, resulting in a final dataset of 476 duplications (Fig. 4b).
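The last two filtering steps (missing genotypes and length outliers) can be sketched in Python as below. The original filtering was done in R with vcfR; the record layout (a dictionary with a length and a list of genotypes, with None marking a missing call), the function names and the quartile definition (the "inclusive" method, which corresponds to the default in R) are assumptions made for this illustration.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def missing_fraction(genotypes: Sequence[Optional[str]]) -> float:
    """Fraction of samples without a called genotype (None = missing)."""
    return sum(g is None for g in genotypes) / len(genotypes)


def upper_fence(lengths: Sequence[float], k: float = 1.5) -> float:
    """Upper quartile plus k times the interquartile range."""
    q1, _, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def filter_duplications(records: List[Dict], max_missing: float = 0.2) -> List[Dict]:
    """Drop records with too many missing genotypes, then length outliers."""
    kept = [r for r in records if missing_fraction(r["genotypes"]) < max_missing]
    fence = upper_fence([r["length"] for r in kept])
    return [r for r in kept if r["length"] <= fence]


# Hypothetical records for five samples; lengths are in bp.
called = ["0/1", "0/0", "1/1", "0/0", "0/1"]
toy_records = [
    {"length": n, "genotypes": called}
    for n in (1000, 2000, 3000, 4000, 5000, 100000)
]
# One record with 2 of 5 genotypes missing (40%), which fails the 20% filter.
toy_records.append({"length": 2500, "genotypes": ["0/1", None, None, "0/0", "0/0"]})
toy_filtered = filter_duplications(toy_records)
```

In the toy data the upper fence lies at 8,500 bp, so the 100,000 bp event is removed as a length outlier.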
Analyses of selection on coding sequence
To predict genes within the de novo genome assemblies, we used AUGUSTUS73 (v.3.2.3) with default parameters and ‘zebrafish’ as species parameter (n = 485 genomes, 245 taxa). For each prediction we inferred orthology to Nile tilapia genes (GCF_001858045.1_ASM185804v2) with GMAP (GMAP-GSNAP74; v.2017-08-15) applying a minimum trimmed coverage of 0.5 and a minimum identity of 0.8. We excluded specimens with fewer than 18,000 tilapia orthologous genes detected (resulting in n = 471 genomes, 243 taxa). Next, we kept only those tilapia protein coding sequences that had at least one of their exons present in at least 80% of the assemblies (260,335 exons were retained, representing 34,793 protein coding sequences). Based on the Nile tilapia reference genome annotation file, we reconstructed for each assembly the orthologous coding sequences. Missing exon sequences were set to Ns. We then kept a single protein coding sequence per gene (the one being present in the maximum number of species with the highest percentage of sequence length), resulting in 15,294 protein coding sequences. Per gene, a multiple sequence alignment was then produced using MACSE75 (v.2.01). We calculated for each specimen and each gene the number of synonymous (S) and non-synonymous (N) substitutions by pairwise comparison to the orthologous tilapia sequence using codeml with runmode = -2 within PAML76 (v.4.9e). To obtain an estimate of the genome-wide sequence evolution rate that is independent of filtering thresholds, we calculated the genome-wide dN/dS ratio for each specimen based on the sum of dS and dN across all genes (Fig. 4c, Extended Data Fig. 9b).
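The difference between a genome-wide ratio based on sums and an average of per-gene ratios can be illustrated with a short Python sketch. The exact summation used in the study is not reproduced here; the input layout (one (dN, dS) pair per gene) and the numerical values are hypothetical.

```python
from typing import Sequence, Tuple


def genome_wide_dnds(per_gene: Sequence[Tuple[float, float]]) -> float:
    """Ratio of summed dN to summed dS across genes for one specimen.

    per_gene holds one (dN, dS) pair per gene. Genes with dS = 0 still
    contribute to the sums, so no per-gene threshold has to be chosen.
    """
    total_dn = sum(dn for dn, _ in per_gene)
    total_ds = sum(ds for _, ds in per_gene)
    if total_ds == 0:
        raise ValueError("summed dS is zero; the ratio is undefined")
    return total_dn / total_ds


# Hypothetical per-gene estimates: the second gene has dS = 0, so its own
# dN/dS is undefined, yet it can still be included in the genome-wide ratio.
toy_genes = [(0.01, 0.05), (0.02, 0.0), (0.0, 0.05)]
toy_ratio = genome_wide_dnds(toy_genes)
```

Averaging per-gene ratios would instead require excluding or capping genes with very small dS, which is the kind of filtering threshold the sum-based estimate avoids.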
Signals of past introgression
We used the f4-ratio statistic34 to assess genomic evidence for interspecific gene exchange. We calculated the f4-ratio for all combinations of trios of species on the filtered VCF files using the software Dsuite77 (v.0.2 r20), with T. sparrmanii as outgroup species (we excluded N. cancellatus as all specimens of this species appeared to be F1 hybrids; Supplementary Methods). The f4-ratio statistic estimates the admixture proportion, that is, the proportion of the genome affected by gene flow. The results presented in this study (Fig. 4e, Extended Data Fig. 10) are based on the ‘tree’ output of the Dsuite function Dtrios, with each trio arranged according to the species tree on the basis of the maximum-likelihood topology. The per-tribe analyses (Fig. 4e) were based only on comparisons where all species within a trio belong to the same tribe (n = 243 taxa).
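The logic of the f4-ratio as an estimator of the admixture proportion can be illustrated with a minimal Python sketch based on per-site allele frequencies, following the general definition of Patterson et al.34. This is not the Dsuite implementation, whose computation on species trios is not reproduced here; the population labels (A, O, X, B, C), the function names and the frequencies are assumptions made for this example.

```python
from typing import Sequence


def f4(p_a: Sequence[float], p_b: Sequence[float],
       p_c: Sequence[float], p_d: Sequence[float]) -> float:
    """f4(A, B; C, D): mean over sites of (pA - pB) * (pC - pD)."""
    products = [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]
    return sum(products) / len(products)


def f4_ratio(p_a: Sequence[float], p_o: Sequence[float], p_x: Sequence[float],
             p_b: Sequence[float], p_c: Sequence[float]) -> float:
    """alpha = f4(A, O; X, C) / f4(A, O; B, C).

    Assumed setting: X is a mixture of lineages related to B and C,
    A is related to B, and O is an outgroup.
    """
    return f4(p_a, p_o, p_x, p_c) / f4(p_a, p_o, p_b, p_c)


# Hypothetical allele frequencies at four sites.
freq_a = [0.9, 0.2, 0.6, 0.1]
freq_o = [0.1, 0.1, 0.1, 0.1]
freq_b = [0.8, 0.3, 0.7, 0.2]
freq_c = [0.2, 0.6, 0.1, 0.5]
# X is constructed with 25% of its allele frequencies derived from B.
true_alpha = 0.25
freq_x = [true_alpha * b + (1 - true_alpha) * c for b, c in zip(freq_b, freq_c)]
estimated_alpha = f4_ratio(freq_a, freq_o, freq_x, freq_b, freq_c)
```

Because X is constructed as an exact mixture in this toy example, the ratio recovers the mixing proportion; with real data, drift and sampling make the estimate approximate.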
In addition to the f4-ratio we also identified signals of past introgression among species using a phylogenetic approach by testing for asymmetry in the relationships of species trios in 1,272 local maximum-likelihood trees generated using IQ-TREE (Supplementary Methods; Extended Data Fig. 10).
Heterozygosity
We calculated the number of heterozygous sites per genome (n = 488 genomes, 246 taxa from the Tanganyika radiation) from the VCF files using the BCFtools function stats and then quantified the percentage of heterozygous sites among the number of callable sites per genome (see above) (Fig. 4d).
To explore if the observed levels of heterozygosity per tribe can be explained by the levels of gene flow within tribes we performed coalescent simulations with msprime78 (v.0.7.4). We simulated genome evolution of all species of the radiation following the time-calibrated species tree (Fig. 1), assuming a generation time of 3 years79 and a constant effective population size of 20,000 individuals. Species divergences were implemented as mass migration events and introgression within tribes as migration between species pairs with rates set according to their introgression (f4-ratio) signals inferred with Dsuite. To convert the f4-ratio values into migration rates, we applied a scaling factor of 5 × 10⁻⁶, which results in a close correspondence in magnitude of the simulated introgression signals to those observed empirically (Fig. 4, Extended Data Fig. 9c). In each of 20 separate simulations, we randomly sampled one pairwise f4-ratio value for each pair of species (there are many f4-ratio values per species pair, one for each possible third species added to the test trio; the maximum values per pair are shown in Extended Data Fig. 10). The simulated data consisted of one chromosome of 100 kb (mutation rate: 3.5 × 10⁻⁹ per bp per generation33, recombination rate: 2.2 × 10⁻⁸ per bp per generation; see Supplementary Methods). Levels of heterozygosity were calculated for all simulated datasets as described for the empirical data.
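The parameter conversions behind these simulations can be sketched in plain Python. The sketch does not call msprime and is not the simulation code used in this study; the function names, the dictionary layout of the f4-ratio values and the example numbers are assumptions. The last function gives the standard equilibrium expectation for an isolated population, as a baseline without any gene flow.

```python
import random
from typing import Dict, List, Tuple

GENERATION_TIME_YEARS = 3        # generation time assumed in the text
EFFECTIVE_POP_SIZE = 20_000      # constant effective population size
MUTATION_RATE = 3.5e-9           # per bp per generation
F4_TO_MIGRATION_SCALE = 5e-6     # scaling factor applied to f4-ratio values


def years_to_generations(years: float) -> float:
    """Convert a divergence time in years into generations."""
    return years / GENERATION_TIME_YEARS


def migration_rate(f4_ratio_value: float) -> float:
    """Scale an f4-ratio value into a migration rate for the simulator."""
    return f4_ratio_value * F4_TO_MIGRATION_SCALE


def sample_pairwise_f4(
    f4_by_pair: Dict[Tuple[str, str], List[float]], rng: random.Random
) -> Dict[Tuple[str, str], float]:
    """Draw one f4-ratio value per species pair for a single replicate."""
    return {pair: rng.choice(values) for pair, values in f4_by_pair.items()}


def expected_heterozygosity_without_gene_flow() -> float:
    """theta = 4 * Ne * mu for an isolated diploid population."""
    return 4 * EFFECTIVE_POP_SIZE * MUTATION_RATE


# Hypothetical f4-ratio values for one species pair (one per third species).
toy_f4 = {("species_1", "species_2"): [0.02, 0.05, 0.10]}
toy_draw = sample_pairwise_f4(toy_f4, random.Random(1))
toy_rates = {pair: migration_rate(v) for pair, v in toy_draw.items()}
```

With the stated parameters the expectation without gene flow is 2.8 × 10⁻⁴ heterozygous sites per bp, so in simulations with a constant effective population size, differences among tribes would be expected to arise mainly from the migration rates.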
To account for between-tribe gene flow we further performed simulations in which migration between tribes was also sampled from the empirical f4-ratio distribution. For simplicity in setting up the simulation model, we assumed that gene flow between tribes is ongoing until present day, which is clearly an overestimate (see Supplementary Discussion). Nevertheless, the results of these simulations support our hypothesized scenario, confirming that much of the variation in heterozygosity as well as its correlation with species richness can be explained by the observed levels of gene flow.
Correlation of genome-wide statistics with species richness
We tested for a correlation between tribe means (based on species means) of each genomic summary statistic (transposable element counts, number of gene duplications, genome-wide dN/dS ratio, per-genome heterozygosity, and f4-ratio, as well as the heterozygosity and f4-ratio statistics derived from simulated genome evolution) and species richness of the tribes, applying the same approach as described above for tests of correlation between morpho- and ecospace size and species richness.
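As described in the legend of Extended Data Fig. 9, the correlation tests are based on phylogenetic independent contrasts and Pearson’s r through the origin. The second step can be sketched in Python as below; the contrasts are taken as given (their calculation from the tree is not shown), and the function name and example values are our own.

```python
import math
from typing import Sequence


def pearson_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation forced through the origin.

    Unlike the ordinary coefficient, the values are not centred on their
    means, because independent contrasts have an expected mean of zero and
    an arbitrary sign.
    """
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical contrasts for two variables (for example, log species
# richness and a genomic summary statistic).
contrasts_x = [1.0, -2.0, 3.0]
contrasts_y = [2.0, -4.0, 6.0]
toy_r = pearson_through_origin(contrasts_x, contrasts_y)
```

Because the sign of each contrast depends on the arbitrary order in which the two daughter lineages are subtracted, flipping the sign of a contrast pair leaves this coefficient unchanged, which would not hold for the mean-centred version.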
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Data availability
All newly sequenced genomes for this study and their raw reads are available from NCBI under the BioProject accession number PRJNA550295 (https://www.ncbi.nlm.nih.gov/bioproject/). The VCF file, tree files, summary statistics of the assembled genomes and phenotypic datasets generated and analysed during this study are available as downloadable files on Dryad (https://doi.org/10.5061/dryad.9w0vt4bbf). The Nile tilapia reference genome used is available under RefSeq accession GCF_001858045.1. All X-ray data are available on MorphoSource under the project number P1093. Source data are provided with this paper.
Code availability
Code used to analyse the data is available on GitHub (https://github.com/cichlidx/ronco_et_al), except for analyses where single commands from publicly available software were used and where all settings are fully reported in the Methods and/or Supplementary Methods.
References
Gavrilets, S. & Losos, J. B. Adaptive radiation: contrasting theory with data. Science 323, 732–737 (2009).
Schluter, D. The Ecology of Adaptive Radiation (Oxford Univ. Press, 2000).
Simpson, G. G. The Major Features of Evolution (Columbia Univ. Press, 1953).
Glor, R. E. Phylogenetic insights on adaptive radiation. Annu. Rev. Ecol. Evol. Syst. 41, 251–270 (2010).
Foote, M. The evolution of morphological diversity. Annu. Rev. Ecol. Syst. 28, 129–152 (1997).
Danley, P. D. & Kocher, T. D. Speciation in rapidly diverging systems: lessons from Lake Malawi. Mol. Ecol. 10, 1075–1086 (2001).
Streelman, J. T. & Danley, P. D. The stages of vertebrate evolutionary radiation. Trends Ecol. Evol. 18, 126–131 (2003).
Benton, M. J. Diversification and extinction in the history of life. Science 268, 52–58 (1995).
Sepkoski, J. J., Jr. Rates of speciation in the fossil record. Phil. Trans. R. Soc. Lond. B 353, 315–326 (1998).
Berner, D. & Salzburger, W. The genomics of organismal diversification illuminated by adaptive radiations. Trends Genet. 31, 491–499 (2015).
Wagner, C. E., Harmon, L. J. & Seehausen, O. Ecological opportunity and sexual selection together predict adaptive radiation. Nature 487, 366–369 (2012).
Salzburger, W. Understanding explosive diversification through cichlid fish genomics. Nat. Rev. Genet. 19, 705–717 (2018).
Brawand, D. et al. The genomic substrate for adaptive radiation in African cichlid fish. Nature 513, 375–381 (2014).
Fryer, G. & Iles, T. D. The Cichlid Fishes of the Great Lakes of Africa (T.F.H. Publications, 1972).
Ronco, F., Büscher, H. H., Indermaur, A. & Salzburger, W. The taxonomic diversity of the cichlid fish fauna of ancient Lake Tanganyika, East Africa. J. Gt. Lakes Res. 46, 1067–1078 (2020).
Muschick, M., Indermaur, A. & Salzburger, W. Convergent evolution within an adaptive radiation of cichlid fishes. Curr. Biol. 22, 2362–2368 (2012).
Salzburger, W., Van Bocxlaer, B. & Cohen, A. S. Ecology and evolution of the African Great Lakes and their faunas. Annu. Rev. Ecol. Evol. Syst. 45, 519–545 (2014).
Matschiner, M., Böhne, A., Ronco, F. & Salzburger, W. The genomic timeline of cichlid diversification across continents. Nat. Commun. https://doi.org/10.1038/s41467-020-17827-9 (2020).
Koch, M. et al. Evolutionary history of the endemic Lake Tanganyika cichlid fish Tylochromis polylepis: A recent intruder to a mature adaptive radiation. J. Zool. Syst. Evol. Res. 45, 64–71 (2007).
Salzburger, W., Meyer, A., Baric, S., Verheyen, E. & Sturmbauer, C. Phylogeny of the Lake Tanganyika cichlid species flock and its relationship to the Central and East African haplochromine cichlid fish faunas. Syst. Biol. 51, 113–135 (2002).
Schedel, F. D. B., Musilova, Z. & Schliewen, U. K. East African cichlid lineages (Teleostei: Cichlidae) might be older than their ancient host lakes: new divergence estimates for the east African cichlid radiation. BMC Evol. Biol. 19, 94 (2019).
Irisarri, I. et al. Phylogenomics uncovers early hybridization and adaptive loci shaping the radiation of Lake Tanganyika cichlid fishes. Nat. Commun. 9, 3159 (2018).
Cohen, A. S., Soreghan, M. J. & Scholz, C. A. Estimating the age of formation of lakes: an example from Lake Tanganyika, East African Rift system. Geology 21, 511–514 (1993).
Post, D. M. Using stable isotopes to estimate trophic position: models, methods, and assumptions. Ecology 83, 703–718 (2002).
Liem, K. F. Evolutionary strategies and morphological innovations: cichlid pharyngeal jaws. Syst. Zool. 22, 425–441 (1973).
Salzburger, W. The interaction of sexually and naturally selected traits in the adaptive radiations of cichlid fishes. Mol. Ecol. 18, 169–185 (2009).
Venditti, C., Meade, A. & Pagel, M. Multiple routes to mammalian diversity. Nature 479, 393–396 (2011).
Cooney, C. R. et al. Mega-evolutionary dynamics of the adaptive radiation of birds. Nature 542, 344–347 (2017).
Ellegren, H. & Galtier, N. Determinants of genetic diversity. Nat. Rev. Genet. 17, 422–433 (2016).
Schluter, D. & Pennell, M. W. Speciation gradients and the distribution of biodiversity. Nature 546, 48–55 (2017).
Grant, P. R. & Grant, B. R. 40 Years of Evolution: Darwin’s Finches on Daphne Major Island (Princeton Univ. Press, 2014).
Meier, J. I. et al. Ancient hybridization fuels rapid cichlid fish adaptive radiations. Nat. Commun. 8, 14363 (2017).
Malinsky, M. et al. Whole-genome sequences of Malawi cichlids reveal multiple radiations interconnected by gene flow. Nat. Ecol. Evol. 2, 1940–1955 (2018).
Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Conte, M. A., Gammerdinger, W. J., Bartie, K. L., Penman, D. J. & Kocher, T. D. A high quality assembly of the Nile Tilapia (Oreochromis niloticus) genome reveals the structure of two sex determination regions. BMC Genomics 18, 341 (2017).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).
Böhne, A. et al. Repeated evolution versus common ancestry: Sex chromosome evolution in the haplochromine Pseudocrenilabrus philander. Genome Biol. Evol. 11, 439–458 (2019).
Malmstrøm, M., Matschiner, M., Tørresen, O. K., Jakobsen, K. S. & Jentoft, S. Data descriptor: Whole genome sequencing data and de novo draft assemblies for 66 teleost species. Sci. Data 4, 1–13 (2017).
Myers, E. W. et al. A whole-genome assembly of Drosophila. Science 287, 2196–2204 (2000).
Magoč, T. & Salzberg, S. L. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957–2963 (2011).
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
Zhang, C., Rabiee, M., Sayyari, E. & Mirarab, S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics 19 (Suppl 6), 153 (2018).
Bouckaert, R. et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLOS Comput. Biol. 15, e1006650 (2019).
Ogilvie, H. A., Bouckaert, R. R. & Drummond, A. J. StarBEAST2 brings faster species tree inference and accurate estimates of substitution rates. Mol. Biol. Evol. 34, 2101–2114 (2017).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N. A. & RoyChoudhury, A. Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Mol. Biol. Evol. 29, 1917–1932 (2012).
Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682 (2012).
Schunke, A. C., Bromiley, P. A., Tautz, D. & Thacker, N. A. TINA manual landmarking tool: software for the precise digitization of 3D landmarks. Front. Zool. 9, 6 (2012).
Paradis, E., Claude, J. & Strimmer, K. APE: Analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290 (2004).
Kembel, S. W. et al. Picante: R tools for integrating phylogenies and ecology. Bioinformatics 26, 1463–1464 (2010).
R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing (2018).
Adams, D. C. & Otárola-Castillo, E. Geomorph: An R package for the collection and analysis of geometric morphometric shape data. Methods Ecol. Evol. 4, 393–399 (2013).
Schlager, S. in Statistical Shape and Deformation Analysis (eds Zheng, G., Li, S. & Szekely, G.) 217–256 (Academic Press, 2017).
Ronco, F., Roesti, M. & Salzburger, W. A functional trade-off between trophic adaptation and parental care predicts sexual dimorphism in cichlid fish. Proc. R. Soc. Lond. B 286, 20191050 (2019).
Orme, D. The Caper Package: Comparative Analysis of Phylogenetics and Evolution in R https://cran.r-project.org/web/packages/caper/vignettes/caper.pdf (2018).
Seehausen, O., Mayhew, P. J. & Van Alphen, J. J. M. Evolution of colour patterns in East African cichlid fish. J. Evol. Biol. 12, 514–534 (1999).
Landgraf, A. J. & Lee, Y. Dimensionality reduction for binary data through the projection of natural parameters. J. Multivar. Anal. 104668 (2020).
Revell, L. J. phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217–223 (2012).
Harmon, L. J., Weir, J. T., Brock, C. D., Glor, R. E. & Challenger, W. GEIGER: investigating evolutionary radiations. Bioinformatics 24, 129–131 (2008).
Plummer, M., Best, N., Cowles, K. & Vines, K. CODA: convergence diagnosis and output analysis for MCMC. R News 6, 7–11 (2005).
Ciampaglio, C. N., Kemp, M. & McShea, D. W. Detecting changes in morphospace occupation patterns in the fossil record: characterization and analysis of measures of disparity. Paleobiology 27, 695–715 (2001).
Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).
Chiang, C. et al. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat. Methods 12, 966–968 (2015).
Knaus, B. J. & Grünwald, N. J. vcfr: a package to manipulate and visualize variant call format data in R. Mol. Ecol. Resour. 17, 44–53 (2017).
Stanke, M., Schöffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7, 62 (2006).
Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Ranwez, V., Douzery, E. J. P., Cambon, C., Chantret, N. & Delsuc, F. MACSE v2: Toolkit for the alignment of coding sequences accounting for frameshifts and stop codons. Mol. Biol. Evol. 35, 2582–2584 (2018).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Malinsky, M., Matschiner, M. & Svardal, H. Dsuite–fast D-statistics and related admixture evidence from VCF files. Mol. Ecol. Resour. https://doi.org/10.1111/1755-0998.13265 (2020).
Kelleher, J., Etheridge, A. M. & McVean, G. Efficient coalescent simulation and genealogical analysis for large sample sizes. PLOS Comput. Biol. 12, e1004842 (2016).
Malinsky, M. et al. Genomic islands of speciation separate cichlid ecomorphs in an East African crater lake. Science 350, 1493–1498 (2015).
Acknowledgements
We thank the University of Burundi, the Ministère de l'Eau, de l'Environnement, de l'Amenagement du Territoire et de l'Urbanisme, Republic of Burundi, the Centre de Recherche en Hydrobiologie (CRH), Uvira, DR Congo, the Tanzania Commission for Science and Technology (COSTECH), the Tanzania Fisheries Research Institute (TAFIRI), the Tanzania National Parks Authority (TANAPA), the Tanzania Wildlife Research Institute (TAWIRI), the Lake Tanganyika Research Unit, Department of Fisheries, Republic of Zambia, and the Zambian Department for Immigration for research permits; G. Banyankimbona, H. Mwima, G. Hakizimana, N. Muderhwa, P. Masilya, I. Kimirei, M. Mukuli Wa-Teba, G. Moshi, A. Mwakatobe, C. Katongo, T. Banda and L. Makasa for assistance with obtaining research permits; the boat crews of the Chomba (D. Mwanakulya, J. Sichilima, H. D. Sichilima Jr and G. Katai) and the Maji Makubwa II (G. Kazumbe and family) for navigation, guidance and company; the boat drivers M. Katumba and T. Musisha; the car drivers A. Irakoze and J. Leonard; M. Schreyen-Brichard, M. Mukuli Wa-Teba, G. Kazumbe, I. Kimirei, D. Schlatter, R. Schlatter, M. K. Dominico, H. Sichilima Sr, C. Zytkow, P. Lassen and V. Huwiler for logistic support; G. Banyankimbona, N. Boileau, B. Egger, Y. Fermon, G. Kazumbe, G. Katai, R. Lusoma, K. Smailus, L. Widmer and numerous fishermen at Lake Tanganyika for help during sampling; V. Huwiler, Charity, O. Mangwangwa and the Zytkow family for lodging; people of innumerable villages on the shores of Lake Tanganyika for providing workspace, shelter for night-camps and access to village infrastructure; M. Barluenga, H. Gante, Z. Musilová, F. Schedel, J. Snoeks, M. Stiassny, H. Tanaka, G. Turner and M. Van Steenberge for providing additional samples and/or specimens; M. Sánchez, A. Schweizer and A. Wegmann for assistance with the μCT scanning of large specimens; C. Moes for help with radiographs; V. Evrard for help with stable isotopes; I. Nissen and E. Burcklen for assistance with DNA shearing; M. Conte and T. D. Kocher for sharing the RepeatMasker annotations for Nile tilapia; C. Klingenberg and M. Sánchez for discussions on the morphometric approach; A. Tooming-Klunderud and team at the Norwegian Sequencing Centre and C. Beisel and team at the Genomics Facility Basel at the ETH Zurich Department of Biosystems Science and Engineering (D-BSSE), Basel, for assistance with next-generation sequencing; M. Jacquot, E. Pujades and T. Sengstag for the setup and assistance with the collection database system (LabKey); and J. Johnson and A. Viertler for fish illustrations in Fig. 1 and Extended Data Fig. 5, respectively. Calculations were performed at sciCORE (http://scicore.unibas.ch/) scientific computing centre at University of Basel (with support by the SIB/Swiss Institute of Bioinformatics) and the Abel computer cluster, University of Oslo. This work was funded by the European Research Council (ERC, Consolidator Grant Nr. 617585 ‘CICHLID~X’ jointly hosted by the University of Basel and the University of Oslo) and the Swiss National Science Foundation (SNSF, grants 156405 and 176039) to W.S. A. Böhne was supported by the SNSF (Ambizione grant 161462).
Author information
Contributions
F.R., A.I. and W.S. designed this study (with input from H.H.B., A.K. and S.J.). F.R., A.I., H.H.B. and W.S. collected the specimens in the field. F.R. and A. Böhne extracted DNA and prepared the libraries for sequencing. S.J. coordinated sequencing. M. Matschiner performed the mapping, variant calling, phylogenetic analyses and coalescent simulations. M. Malinsky contributed to the variant calling pipelines and performed the f4-ratio statistics. A. Böhne assembled the genomes and quantified gene duplications, A.E.T. conducted the dN/dS analyses and V.R. analysed transposable elements. A. Boila assessed stable-isotope compositions, H.H.B. radiographed the specimens and W.S. scored pigmentation patterns. F.R. curated the samples and performed μCT scanning, geometric morphometric analyses, and all analyses incorporating morphological and ecological data as well as correlations with species richness. F.R. and W.S. wrote the manuscript with contributions and/or feedback from all authors. All authors read and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Age of the adaptive radiation of cichlid fishes in African Lake Tanganyika.
Time-calibrated species tree of species representing divergent tribes and subfamilies within cichlids as well as closely-related non-cichlid outgroups, generated with the multi-species coalescent model in StarBEAST2. Nodes marked with a black dot were constrained according to species-tree analyses with ASTRAL. Node bars indicate 95% highest-posterior density age intervals. Outgroup divergence times are not drawn to scale. Insets visualize the prior distribution applied for the age of African cichlids according to Matschiner et al.18, as well as posterior age estimates for Oreochromini and the cichlid adaptive radiation in Lake Tanganyika (LT).
Extended Data Fig. 2 Time-calibrated species tree of the cichlid adaptive radiation in Lake Tanganyika.
The species tree is based on the maximum-likelihood topology estimated with RAxML (Fig. 1) and was time-calibrated using a relaxed-clock model in BEAST2, applied to a selected set of alignments.
Extended Data Fig. 3 Alternative time-calibrated species tree of the cichlid adaptive radiation in Lake Tanganyika.
The species tree is based on the topology estimated with ASTRAL and was time-calibrated using a relaxed-clock model in BEAST2, applied to a selected set of alignments.
Extended Data Fig. 4 Alternative time-calibrated species tree of the cichlid adaptive radiation in Lake Tanganyika.
The species tree is based on the topology estimated with SNAPP and was time-calibrated using a relaxed-clock model in BEAST2, applied to a selected set of alignments.
Extended Data Fig. 5 Phenotyping of the specimens.
a, Two-dimensional landmarks placed on X-ray images of the specimens. To quantify overall body shape we excluded landmark 16 (to minimise the effect of the orientation of the oral jaw). To analyse upper oral jaw morphology we used landmarks 1, 2, 16 and 21. b, Three-dimensional landmarks used to analyse lower pharyngeal jaw shape on μCT scans of the heads. True landmarks are indicated in red, sliding semi-landmarks are indicated in blue. c, Body regions scored for presence/absence of pigmentation patterns.
Extended Data Fig. 6 Ecospace and morphospace occupation of the cichlid adaptive radiation in Lake Tanganyika.
Scatter plots for each focal tribe (indicated with colours, see Fig. 1 for colour key) against the total eco- and morphospace (grey). Species ranges are indicated with convex hulls. a, Stable N and C isotope compositions (δ15N and δ13C values). The additional plot shows δ15N and δ13C values of a baseline dataset which confirms the interpretability of the stable N and C isotope composition in Lake Tanganyika (see Supplementary Methods and Discussion). b, PC1 and PC2 of body shape (for shape changes associated with the PC axes see Fig. 2). The last plot for each trait shows the size of the traitspace per tribe in relation to species numbers (stable isotopes: Pearson’s r = 0.88, d.f. = 9, P = 0.0004; body shape: Pearson’s r = 0.91, d.f. = 9, P = 0.0001). Traitspace size was calculated as the square root of the convex hull area spanned by species means.
Extended Data Fig. 7 Morphospace occupation of the cichlid adaptive radiation in Lake Tanganyika.
a, b, Scatter plots of PC1 and PC2 for upper oral jaw morphology (a) and lower pharyngeal jaw shape (b) per tribe (indicated with colours, see Fig. 1 for colour key) against the total morphospace (grey). Species ranges are indicated with convex hulls. For shape changes associated with the respective PC axes see Fig. 2. The last plot for each trait shows the size of the morphospace per tribe in relation to species numbers (upper oral jaw morphology: Pearson’s r = 0.88, d.f. = 9, P = 0.0003; lower pharyngeal jaw shape: Pearson’s r = 0.83, d.f. = 9, P = 0.0017). Morphospace size was calculated as the square root of the convex hull area spanned by species means.
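The correlations reported in these captions relate morphospace size per tribe to species numbers. A standard Pearson correlation has n − 2 degrees of freedom, so d.f. = 9 is consistent with 11 data points. The sketch below computes r and the associated t statistic; the function names are hypothetical and the snippet is a generic illustration rather than the analysis code used in the study.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic and degrees of freedom for testing r against zero."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, df

# r and n taken from the caption (r = 0.88 with d.f. = 9, hence n = 11).
t_stat, df = correlation_t(0.88, 11)
print(round(t_stat, 2), df)
```

The P value then follows from comparing the t statistic with a Student's t distribution with the stated degrees of freedom.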
Extended Data Fig. 8 PLS fit for each multivariate trait against the stable N and C isotope compositions (δ15N and δ13C values) and models of trait evolution.
a–c, PLS fits for body shape (a), upper oral jaw morphology (b) and lower pharyngeal jaw shape (c). Associated shape changes and loadings of the respective stable isotope projection are indicated next to the axes. Data points represent species means and are coloured according to tribe. d, Comparison of model fits for different models of trait evolution and phylogenetic signal for each trait complex using three time-calibrated species trees with alternative topologies. e, Overview of the model fits and phylogenetic signal inferred using 100 trees sampled from the posterior distributions of the time calibrations for each of the three alternative tree topologies.
Extended Data Fig. 9 Genome-wide statistical analyses.
a, Proportion of the different classes of transposable elements (TE) among all TE for each tribe (one genome per species, n = 245). b, Species means of dN (left) and dS (right) values over alignment length for each tribe (n = 243 taxa, 471 genomes). Box centre lines show the median, box limits show the first and third quartiles, and whiskers show 1.5 × the interquartile range. c, f4-ratio statistics among species within each tribe in simulated data (tribe means are based on the mean across 20 simulations of each species triplet). Data points are coloured according to tribes; large points are tribe means shown with 95% confidence intervals, small points represent species means and are only shown for group sizes <40 species. To test for a correlation with species richness per tribe (log-transformed), we calculated phylogenetically independent contrasts for each variable and inferred Pearson’s r through the origin.
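A correlation "through the origin" differs from the ordinary Pearson correlation in that the values are not centred on their means before the sums of products are taken. This is the usual choice for independent contrasts, because the sign of each contrast is arbitrary. A minimal sketch, assuming the contrasts have already been computed elsewhere (the function name and the example values are hypothetical):

```python
import math

def pearson_r_through_origin(contrasts_x, contrasts_y):
    """Correlation forced through the origin: no mean-centring of the contrasts."""
    if len(contrasts_x) != len(contrasts_y) or not contrasts_x:
        raise ValueError("need two non-empty sequences of equal length")
    sxy = sum(a * b for a, b in zip(contrasts_x, contrasts_y))
    sxx = sum(a * a for a in contrasts_x)
    syy = sum(b * b for b in contrasts_y)
    if sxx == 0 or syy == 0:
        raise ValueError("contrasts must not all be zero")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical contrasts, e.g. log species richness versus a genomic variable.
cx = [0.4, -0.1, 0.7, 0.2]
cy = [0.3, -0.2, 0.5, 0.1]
print(pearson_r_through_origin(cx, cy))

# Flipping the direction of one contrast in both variables leaves r unchanged.
cx_flipped = [-0.4, -0.1, 0.7, 0.2]
cy_flipped = [-0.3, -0.2, 0.5, 0.1]
print(pearson_r_through_origin(cx_flipped, cy_flipped))
```

The invariance shown in the last lines is the reason for omitting the centring step: an ordinary Pearson correlation would change with the arbitrary orientation of each contrast.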
Extended Data Fig. 10 Signals of introgression among Lake Tanganyika cichlid species.
Upper matrix: maximum values of the f4-ratio statistics between all pairs of species, derived from calculations across all combinations of species trios with T. sparrmanii fixed as the outgroup. The f4-ratio estimates the proportion of the genome affected by gene flow; all presented values are statistically significant (one-sided block-jackknife tests: P < 5 × 10⁻⁵ after Benjamini–Hochberg correction for multiple testing). Lower matrix: Dtree statistics (hue) with corresponding P values (two-tailed binomial test, not adjusted for multiple testing; log-transformed; saturation) based on a phylogenetic approach testing for asymmetry in the relationships of species trios in 1,272 local maximum-likelihood trees (see Supplementary Methods). The two different approaches uncovered little gene flow among the tribes (see Supplementary Discussion).
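The Benjamini–Hochberg correction mentioned for the f4-ratio tests controls the false discovery rate across many simultaneous tests. The sketch below shows the standard step-up adjustment only; it does not reproduce the block-jackknife tests themselves, and the function name and example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values (not results from the study).
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A test is then called significant if its adjusted P value falls below the chosen threshold.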
Supplementary information
Supplementary Information
This file contains Supplementary Methods with a detailed description of the methods used to collect and analyse the data presented in the manuscript, a Supplementary Discussion, and Supplementary Figures 1 and 2.
Supplementary Tables
This file contains Supplementary Tables 1 and 2 with sample size information per species for each data set, read depth of the genomes, and information on each of the specimens used in the study, including taxonomic information and sampling locations. The title and caption of each Supplementary Table can be found in the Supplementary Information file.
About this article
Cite this article
Ronco, F., Matschiner, M., Böhne, A. et al. Drivers and dynamics of a massive adaptive radiation in cichlid fishes. Nature 589, 76–81 (2021). https://doi.org/10.1038/s41586-020-2930-4