Introduction

Most of the largest vertebrate genomes are found in salamanders, a clade of amphibians that includes 686 recognized species (AmphibiaWeb 2015). Salamander genomes range in size from 14 to 120 Gb. These sizes are an order of magnitude larger than bird, mammal, reptile, and frog genomes, as well as all fish genomes with the exception of the six species of lungfish (Gregory 2016). The phenotypic correlates of genomic gigantism in salamanders include reduced neural, visual, and skeletal complexity, unusual blood cell morphology, slow developmental rate, and low metabolic rate (Jockusch 1997; Mueller et al. 2008; Roth et al. 1997; Szarski 1983).

Paleohistological analyses of fossil cell size in extinct tetrapods confirm that the gigantic genomes of living salamanders are a derived trait (Organ et al. 2011). All species of extant crown salamanders examined to date, as well as one of the earliest known stem salamanders, have large genomes, suggesting that genome sizes in the clade have been gigantic for at least 150 to 200 million years (Gregory 2016; Laurin et al. 2015; Marjanović and Laurin 2007; Mueller 2006). Salamander genomes are diploid; their enormous sizes do not reflect increases in ploidy (Sessions 2008; Sessions and Kezer 1991). Instead, they reflect the accumulation of unusually large numbers of transposable element (TE) sequences (Sun and Mueller 2014; Sun et al. 2012b).

Within the salamander clade, genome size has both increased and decreased across lineages, with increases outnumbering decreases by a factor of at least three (Sessions 2008; Sessions and Larson 1987). Additionally, the TE landscape varies across salamander lineages (Sun and Mueller 2014). These patterns demonstrate that the substantial repetitive portion of salamander genomes is dynamic; these genomes have been large for hundreds of millions of years, but not because the same repetitive sequences have persisted for hundreds of millions of years. Rather, each salamander lineage has experienced accumulation and removal of TE sequences, as is the case for all eukaryotes. In salamanders, however, the long-term balance between these two processes has been struck at a different point than in other vertebrates, producing unusually high—albeit variable—TE loads across the clade.

What explains the accumulation and persistence of such high TE levels in salamanders? DNA loss rate has been proposed to be a major determinant of genome size (Petrov 2002). Under this model, the mutational spectrum drives genome size towards an equilibrium value set, in large part, by the deletion rate. DNA loss rates from salamander genomes through small deletions (i.e., <30 bp) are slower than from vertebrate genomes of typical size, suggesting that slow rates of DNA loss through small deletions contribute to genomic gigantism in salamanders (Sun et al. 2012a; Sun and Mueller 2014). However, slow rates of DNA loss through small indels are likely insufficient, in and of themselves, to explain salamanders’ enormous genomes; this result is consistent with analyses of DNA loss rates and genome sizes across other taxa (Gregory 2004). LTR retrotransposon deletion through ectopic recombination also appears to be less common in salamanders than in other vertebrates (Frahry et al. 2015). Thus, slow rates of DNA loss through large deletions also contribute to genomic gigantism in salamanders, although the total contribution cannot be quantified with existing data. However, given that as much as 75 % of salamander genomes consist of relatively recently active repeat elements (i.e., repeats that can be identified by sequence similarity-based methods) (Sun and Mueller 2014; Sun et al. 2012b), it is likely that slow rates of DNA loss, through both small and large deletions, cannot fully explain the TE accumulation underlying salamanders’ gigantic genomes.

The strength of genetic drift, reflecting the effective population size (Ne), has also been hypothesized to be a major determinant of genome size (Lynch 2007; Lynch and Conery 2003). Under this model, the deleterious fitness consequences of non-coding DNA (including TEs) are weak enough that, in lineages with low Ne, the power of genetic drift overwhelms selection’s ability to purge non-coding sequences from the population. To date, there is no evidence that crown salamanders have experienced stronger genetic drift throughout their evolutionary history than related amphibians with more typically sized vertebrate genomes (Mohlhenrich and Mueller 2016). This result suggests that strong persistent genetic drift cannot explain the TE accumulation underlying salamanders’ gigantic genomes.

Taken together, the results from the analyses of deletion and drift suggest that high levels of TE insertion are required to explain salamanders’ high TE loads. Despite the fact that neither deletion nor drift alone appears sufficient to explain persistent genomic gigantism in salamanders, these two processes interact in an important way. Deletions involving TEs are a major determinant of their mutational hazard and, by extension, their negative impacts on fitness. Thus, TE sequences that mediate/sustain fewer deletions are less deleterious than those that mediate/sustain more deletions. For example, TE insertions more likely to mediate ectopic recombination events—which can produce large deletions as well as duplications—are stronger targets of purifying selection than TEs less likely to mediate such events (Barrón et al. 2014). Additionally, TEs that sustain more small deletions have a higher probability of mutating to a harmful gain-of-function allele. Because of their low rates of both large and small deletions, TEs in salamander genomes likely have a smaller negative impact on fitness—based solely on mutational hazard—than TEs in other vertebrate genomes. This low mutational hazard is consistent with high levels of TE insertion being tolerated in salamander genomes.

What might facilitate high levels of TE insertion in salamander genomes? Across the Tree of Life, novel TE insertions are suppressed by several pathways involving small RNA molecules (Malone and Hannon 2009; Siomi et al. 2011). Differences in these pathways are apparent among different model organisms, both within and among major eukaryotic clades (Dumesic and Madhani 2014). In most animals, TE activity in the germline (i.e., the activity that directly impacts genome evolution) is primarily regulated by the Piwi-interacting RNA (piRNA) pathway (Grimson et al. 2008; Siomi et al. 2011). Relative to other endogenous small RNA classes (e.g., microRNAs and endogenous small interfering RNAs), piRNAs are the largest and most diverse class of non-coding RNAs, and they are produced by a distinct biogenesis pathway (Iwasaki et al. 2015). piRNAs are bound by proteins in the Piwi clade of the Argonaute family and guide their suppression of TE activity (Clark and Lau 2014; Dumesic and Madhani 2014). piRNAs also regulate germline gene expression (Castel and Martienssen 2013). Although much about piRNA biology remains incompletely understood (Clark and Lau 2014), models of transcriptional and post-transcriptional TE suppression are becoming established (Yamanaka et al. 2014).

In this study, we test the hypothesis that salamanders’ unusually high TE loads reflect the loss of the ancestral piRNA-mediated TE-silencing machinery. We deeply sequenced small RNA molecules in the female and male adult gonads in order to identify sequences that bear the characteristics of TE-targeting piRNAs: a length of 27–31 nt, a bias towards 5′ U, base complementarity to transposable element sequences, and, in some cases, an antisense binding partner with a 10-nucleotide overlap. We also examined the amino acid sequences of 12 piRNA pathway proteins from salamanders and other vertebrates, testing whether the overall patterns of sequence divergence are consistent with conserved pathway function across the vertebrate clade. Our results do not support the hypothesis of piRNA pathway loss; instead, they suggest that the piRNA pathway is expressed in salamanders. Given these results, we propose hypotheses to explain how the extraordinary TE loads in salamander genomes could have accumulated, despite the expression of TE-silencing machinery.

Materials and Methods

Sample Information

We obtained two adult Desmognathus fuscus (one female, one male) from Wilkes County, North Carolina on May 13–15, 2012. GPS coordinates for the samples are 36.116072, −81.128333 for the female and 36.07151, −81.176845 for the male. Desmognathine salamanders have the smallest genomes among salamanders (i.e., ~15 Gb); thus, they are a reasonable system to begin exploring small RNA-mediated TE suppression in salamanders because future research incorporating genomic information (e.g., on piRNA clusters) will be as tractable as possible. Animals were euthanized by immersion in chloretone or benzocaine, decapitated, and dissected immediately in accordance with the Colorado State University Institutional Animal Care and Use Committee (IACUC) protocol #11-2775A and the Brandeis University IACUC protocol #13008. Ovaries and testes were dissected from the female and male, respectively, flash frozen, and stored at −80 °C. Based on visual inspection, the majority of the ovarian tissue was composed of late-stage eggs; thus, the bulk of RNA from the female sample likely reflects maternal deposition into eggs. However, our dataset will also include a small fraction of piRNAs produced in the somatic gonadal tissue of females. RNA was extracted using TRIzol® according to the manufacturer’s protocols. RNA quality was assessed on a Bioanalyzer (Agilent).

Library Construction and Sequencing

Small RNA libraries and standard RNA-Seq libraries were constructed for both samples. All samples were Turbo-DNased (Ambion) prior to library construction. For small RNA libraries, library construction was performed with the IntegenX PrepX RNA library preparation kit with the PrepX small RNA 8 protocol. For standard RNA-Seq libraries, samples were treated with Invitrogen’s RiboMinus Eukaryote Kit for RNA-Seq. Double-stranded cDNA was synthesized using both poly-A enrichment (i.e., oligo-dT priming) and non-poly-A enrichment (i.e., random priming) because of our interest in capturing diverse transcripts to aid in identifying piRNA targets (e.g., transposons) and precursor loci (e.g., piRNA clusters, which may or may not be polyadenylated). The resulting cDNA was used to make shotgun libraries with the IntegenX PrepX DNA library preparation kit; the Chip-Seq library prep method was used, rather than the standard shotgun method, to avoid biasing the library based on insert length. RNA-Seq libraries and small RNA libraries were sequenced on the Illumina MiSeq (2 × 300 PE) platform. Small RNA libraries were then sequenced to greater depth of coverage on an Illumina HiSeq 2000 (50 SR). RNA quality assessment, library construction, and sequencing were all performed by the Genomics Resources Core of the Institute for Bioinformatics and Evolutionary Studies (IBEST) at the University of Idaho.

Transcriptome Assembly and Annotation

For each individual (i.e., female, male), low-quality reads were eliminated with Trimmomatic (Bolger et al. 2014), and the remaining shotgun reads were assembled with Trinity using default parameters (Grabherr et al. 2011). For the resulting female and male assemblies, we annotated contigs containing TEs using BLASTx against TE-encoded proteins (http://www.repeatmasker.org/RepeatProteinMask.html#database), with an e-value cutoff of 1e−5. We annotated contigs containing non-TE-derived protein-coding genes using BLASTx against the Swiss-Prot protein database (http://www.uniprot.org/uniprot/), with an e-value cutoff of 1e−5.
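
The e-value filtering in this annotation step can be illustrated with a minimal sketch. It assumes the BLASTx results were written in the standard 12-column tabular format, which is not stated above; the function name and the choice to keep one best hit per contig are likewise assumptions made for illustration and are not the pipeline used in this study.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # cutoff stated in the text


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {contig: (subject, evalue)} for the lowest e-value hit per contig.

    Assumes 12-column tabular BLAST output: query in column 1,
    subject in column 2, e-value in column 11. Hits with an e-value
    above the cutoff are ignored (treating the cutoff as inclusive
    is an assumption).
    """
    hits = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue
        if query not in hits or evalue < hits[query][1]:
            hits[query] = (subject, evalue)
    return hits
```

Contigs absent from the returned dictionary would remain unannotated under this scheme.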

Small RNA Pool Characterization

Following quality filtering and adapter trimming, we sorted all small RNAs by size and focused our analyses on those between 18 and 36 nt in length. We plotted the length distribution of these sequences in the female and male samples and looked for a peak at the expected size of piRNAs (27–31 nt). We also looked for a peak at the expected size of siRNAs (22–23 nt), as this class of small RNAs has also been shown to target TEs in Xenopus tropicalis (Armisen et al. 2009). We used Bowtie 1.1.2, allowing 0 mismatches, to map the 22–23 nt RNAs from each sample against miRBase to identify microRNAs (Langmead et al. 2009). We used the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) to determine the number of unique small RNA sequences.
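
The size-based sorting described above can be sketched as follows. This is a minimal illustration that assumes the trimmed reads are already available as plain sequence strings; the function names are hypothetical.

```python
from collections import Counter


def length_distribution(reads, min_len=18, max_len=36):
    """Count reads by length, keeping only lengths within [min_len, max_len]."""
    return Counter(len(r) for r in reads if min_len <= len(r) <= max_len)


def fraction_in_window(dist, lo, hi):
    """Fraction of the counted reads whose length falls within [lo, hi]."""
    total = sum(dist.values())
    if total == 0:
        return 0.0
    return sum(n for length, n in dist.items() if lo <= length <= hi) / total
```

Calling `fraction_in_window(dist, 27, 31)` and `fraction_in_window(dist, 22, 23)` would give the share of reads in the expected piRNA and siRNA/miRNA size windows, respectively.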

Characterization of Putative piRNAs

We tested whether the sequences comprising the peaks between 27 and 31 nt (i.e., the putative piRNAs) in the female and male samples show the characteristic piRNA bias towards having a uracil in the 5′ position by calculating the proportion of sequences with a 5′ U using in-house Perl scripts.
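
The 5′ U calculation is simple enough to sketch. The version below is a Python stand-in, not the in-house Perl scripts used here, and it assumes that reads are plain sequence strings in which uracil may appear as T (as in cDNA-derived reads).

```python
def five_prime_u_fraction(reads, min_len=27, max_len=31):
    """Proportion of reads in the size window that begin with U.

    T is counted as U because sequenced cDNA reports thymine.
    """
    window = [r for r in reads if min_len <= len(r) <= max_len]
    if not window:
        return 0.0
    with_u = sum(1 for r in window if r[0].upper() in ("T", "U"))
    return with_u / len(window)
```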

Identification of Small RNA Targets

Small RNAs guide effector proteins to their target sequences by base complementarity. Accordingly, we mapped the four relevant subsets of the total small RNA dataset (i.e., the female and male putative piRNAs, and the female and male putative siRNAs) to several other datasets to identify their targets using Bowtie (version: bowtie-0.12.9; command: bowtie -a -v 1 --best --strata). First, we mapped each small RNA dataset to its respective reference transcriptome. Next, we mapped each small RNA dataset to the reference transcriptome from the other sex. Finally, we mapped each small RNA dataset to the TE-containing contigs we assembled from our previously published 454 genomic shotgun sequence data from the related species Desmognathus ochrophaeus. Our goal in mapping to the latter two datasets was to identify piRNA targets and precursor transcripts that were not present in the reference transcriptome dataset derived from the same individual as the small RNA dataset. This absence could reflect either methodology (i.e., the transcript was not sequenced and/or assembled, despite the locus being expressed) or true biological reality (i.e., the transcript was not present in the tissue because the locus was transcriptionally silenced). We allowed up to one mismatch when mapping small RNAs to the two reference transcriptomes. When mapping to the D. ochrophaeus genomic shotgun contigs, however, we allowed up to two mismatches to reflect the evolutionary distance between the two species. We suggest that annotations based on these heterospecific contigs should be interpreted with less confidence. For each small RNA dataset, we calculated the proportion of small RNAs mapping to the three classes of transposable elements: LTR retrotransposons, non-LTR retrotransposons, and DNA transposons. Within each class, we ranked transposable element superfamilies by the density of mapped small RNAs. We also calculated the proportion of small RNAs mapping to protein-coding genes.
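
The per-class proportions described above can be sketched as a simple tally. This illustration assumes the alignments have been reduced to (read, contig) pairs and that each contig carries a TE class label. How reads that hit several contigs were counted is not stated above, so the rule used here (a read counts once for every class it hits) is an assumption, as are all names.

```python
from collections import defaultdict

TE_CLASSES = ("LTR retrotransposon", "non-LTR retrotransposon", "DNA transposon")


def class_proportions(hits, contig_class, total_reads):
    """Proportion of reads with at least one alignment to each TE class.

    hits: iterable of (read_id, contig_id) alignments.
    contig_class: {contig_id: TE class label}.
    total_reads: number of reads in the small RNA dataset.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    reads_by_class = defaultdict(set)
    for read_id, contig_id in hits:
        te_class = contig_class.get(contig_id)
        if te_class in TE_CLASSES:
            # A set ensures each read is counted once per class.
            reads_by_class[te_class].add(read_id)
    return {c: len(reads_by_class[c]) / total_reads for c in TE_CLASSES}
```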

Measurement of Ping-Pong Cycle Activity in Putative piRNAs

In certain gonadal cell developmental stages and compartments, piRNAs can be generated by a ping-pong cycle that yields TE-matching sense and antisense piRNAs. The piRNAs generated by this mechanism exhibit a signature: two piRNAs of opposite polarity that display a 10-nucleotide overlap in their genomic position (Aravin et al. 2007; Brennecke et al. 2007; Gunawardane et al. 2007; Wang et al. 2014; Zhang et al. 2011). In male mice, ping-pong amplification is most prevalent during embryogenesis (Aravin et al. 2007, 2008), but a ping-pong signature is present in adult male and female gonads of other mammals, zebrafish, and Xenopus (Armisen et al. 2009; Houwing et al. 2007; Robine et al. 2009; Roovers et al. 2015). Accordingly, we calculated the overall ping-pong fraction in the female and male putative piRNA datasets. We used RepeatMasker (version 3.2.9; http://www.repeatmasker.org), with the TE sequences annotated from the RNA-Seq datasets as a custom repeat library, to identify TE-mapping piRNAs and determine their sense/antisense orientation relative to the TE transcripts. We used the intersectBed function in BEDTools (Quinlan and Hall 2010) to identify sense/antisense piRNA pairs with a 10-nucleotide overlap. We then identified the individual TE families targeted by ping-pong pairs in both samples.
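
The ping-pong signature can be made concrete with a short sketch. It is a stand-in for the RepeatMasker and intersectBed steps described above, not a reproduction of them. It assumes each piRNA has already been placed on a TE sequence with 0-based, half-open coordinates and a known orientation; the function name and data layout are hypothetical.

```python
def ping_pong_pairs(sense, antisense, overlap=10):
    """Find sense/antisense read pairs whose 5' ends overlap by `overlap` nt.

    sense, antisense: lists of (te_id, start, end) with 0-based,
    half-open coordinates on the TE sequence.
    An antisense read's 5' end is its rightmost base, so a partner
    of a sense read starting at `start` must end at `start + overlap`.
    """
    by_end = {}
    for te_id, start, end in antisense:
        by_end.setdefault((te_id, end), []).append((te_id, start, end))
    pairs = []
    for te_id, start, end in sense:
        for partner in by_end.get((te_id, start + overlap), []):
            pairs.append(((te_id, start, end), partner))
    return pairs
```

Indexing the antisense reads by their end coordinate keeps the search linear in the number of reads, which matters for pools containing millions of sequences.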

Analysis of piRNA Pathway Proteins

To complement our analysis of the piRNA pool in salamanders, we also analyzed the amino acid sequences of proteins known to be involved in the piRNA pathway. We chose 12 proteins with diverse roles in piRNA-mediated TE silencing whose patterns of molecular evolution were previously analyzed in diverse vertebrates—ASZ1/GASZ, DDX4, FKBP6, HEN1, KIF17, MOV10L1, PIWIL1, PIWIL2, PLD6, PRMT5, TDRD1, and TDRKH (Yi et al. 2014). We used reciprocal best hit BLAST searches to identify the contigs containing orthologs of each gene from transcriptome datasets for three salamander species: D. fuscus (present study), Ensatina eschscholtzii (Mohlhenrich and Mueller 2016), and Cryptobranchus alleganiensis (Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.kv57r) using the human protein sequences as queries. We obtained the amino acid sequences of orthologs from other non-salamander vertebrates (Anolis carolinensis, Bos taurus, Danio rerio, Gallus gallus, Gasterosteus aculeatus, Homo sapiens, Monodelphis domestica, Monopterus albus, Mus musculus, Oreochromis niloticus, Oryzias latipes, Petromyzon marinus, Python bivittatus, Taeniopygia guttata, Takifugu rubripes, Tetraodon nigroviridis, Thamnophis sirtalis, Xenopus (Silurana) tropicalis, and Xiphophorus maculatus) as well as the salamander Ambystoma mexicanum from GenBank and EMBL. For each protein-coding gene, we performed multiple sequence alignments of amino acid sequences using PSI-coffee, an aligner in the T-coffee package that aligns distantly related sequences based on homology extension. We trimmed alignments to positions receiving a T-coffee alignment score of “good” or better and estimated phylogenetic trees from each alignment in MrBayes 3.2, using a mixed model of amino acid substitution (Notredame et al. 2000; Ronquist et al. 2012). We ran each analysis for 10,000,000 generations, sampling every 1000, with three heated chains. 
We discarded 25 % of the sampled trees as burn-in and verified convergence by comparison of the average deviation of split frequencies between two independent runs. Our goal in estimating these phylogenies was to determine whether overall patterns of sequence divergence are consistent with conserved pathway function across the vertebrate clade. To this end, we asked whether salamander branch lengths differ from those in other taxa. For unrooted trees (i.e., proteins for which we were unable to obtain an ortholog from P. marinus to use as outgroup), we restricted this comparison to the tetrapod clade. To complement the phylogenetic analyses, we also tested for the presence of relevant functional domains in each salamander protein. We used the amino acid sequences of the twelve proteins from salamanders, as well as from Mus musculus and Xenopus (Silurana) tropicalis, to search against the NCBI conserved domain database using RPS-BLAST (http://www.ncbi.nlm.nih.gov/cdd).
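
The reciprocal best hit criterion used for ortholog identification can be sketched as follows. The sketch assumes the best hit in each search direction has already been extracted from the BLAST output; the function name and the identifiers in the usage note are hypothetical.

```python
def reciprocal_best_hits(forward, reverse):
    """Return {query_protein: contig} for pairs that are each other's best hit.

    forward: {query_protein: best-matching contig} from searching the
             query proteins against a transcriptome.
    reverse: {contig: best-matching protein} from searching the contigs
             back against the query proteome.
    """
    return {q: c for q, c in forward.items() if reverse.get(c) == q}
```

For example, if `forward` maps PIWIL2 to a contig whose own best hit in `reverse` is PIWIL1, that contig is not accepted as the PIWIL2 ortholog.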

Results

Small RNA-Seq and RNA-Seq Datasets

Summary statistics for the small RNA-Seq and RNA-Seq datasets are presented in Table 1. The small RNA-Seq dataset includes 9,364,709 reads and 13,903,326 reads for the female and male samples, respectively, in the 18–36 nt size range. In the assembled RNA-Seq datasets, we annotated contigs as representing 19,100 and 13,611 protein-coding genes in the female and male samples, respectively. We annotated 11,376 and 15,727 contigs as transposable elements in the female and male samples, respectively. All small RNA-Seq and RNA-Seq datasets are deposited in the NCBI short read archive (SRA) under BioSample accessions SAMN05785596-99.

Table 1 Summary statistics for the small RNA-Seq and RNA-Seq datasets

Small RNA Pool Characterization

The length distributions for small RNAs between 18 and 36 nt for both female and male samples are shown in Fig. 1. There is a large peak at 27–31 nt in both samples, corresponding to the expected size of piRNAs; for brevity, we refer to these putative piRNAs as piRNAs hereafter. There are 3,859,744 total piRNA sequences (1,119,179 unique) in the female and 6,624,981 total piRNA sequences (1,006,994 unique) in the male. In the female, 67 % of piRNAs have a 5′ U. In the male, 72 % of piRNAs have a 5′ U. These data suggest that piRNAs are (1) transcribed in the adult male gonad (germline and/or somatic tissue) and (2) transcribed in the female and likely deposited into the egg. There is a smaller peak at 22–23 nt in both samples, corresponding to the expected size of siRNAs and miRNAs. In the female, 43,517 of the 22–23 nt sequences are miRNAs. In the male, 49,795 of the 22–23 nt sequences are miRNAs. For brevity, we refer to the remaining 22–23 nt RNAs as siRNAs hereafter. In the female, 41 % of the small RNA reads between 18 and 36 nt in length are piRNAs and 13 % are siRNAs. In the male, 48 % of the small RNA reads between 18 and 36 nt in length are piRNAs and 11 % are siRNAs (Table 1). Consistent with other vertebrates, piRNAs are the most abundant class of small RNAs in both female and male D. fuscus gonads.

Fig. 1 The length distribution of small RNAs between 18 and 36 nt in a male and female D. fuscus. The peaks at 27–31 nt likely correspond to piRNAs, whereas the peaks at 22–23 nt likely correspond to siRNAs and miRNAs

Small RNA Targets or Precursors

The success rates of mapping the gonadal small RNAs to the two D. fuscus transcriptomes and the heterospecific genomic TE contigs are summarized in Table 2. The proportions of small RNAs mapping to protein-coding transcripts and transposable element classes are summarized in Table 3. We were able to map 25.1 and 31.2 % of female and male piRNAs, respectively, and 27.0 and 22.0 % of female and male siRNAs, respectively. Overall, a higher percentage of siRNAs than piRNAs mapped to protein-coding genes, and a higher percentage of piRNAs than siRNAs mapped to transposable elements. TE superfamilies with mapped small RNAs in each dataset are ranked by density of reads in Electronic Supplementary Material 1. For all small RNA datasets, Gypsy and L2 are the most frequently mapped superfamilies; these are the most abundant superfamilies in the genomes of salamanders of the family Plethodontidae (which includes D. fuscus) and among the top three most abundant superfamilies in other salamanders (Sun and Mueller 2014; Sun et al. 2012b).

Table 2 Overall small RNA target identification success rates based on mapping to the two D. fuscus transcriptomes and the heterospecific genomic TE contigs
Table 3 Overall summary of piRNA and siRNA mapping

Detection of the Ping-Pong Signature in Putative piRNAs

Results from the analysis of piRNA ping-pong cycle signatures are summarized in Table 4. We detected ping-pong pairs mapping to TEs in both the female and male samples. In the female, we identified 796 unique piRNAs that were a part of ping-pong pairs. These piRNAs comprised 0.07 % of the total unique piRNAs and 0.12 % of the total piRNA pool; they mapped to 87 different families representing all three TE classes. In the male, we identified 902 unique piRNAs that were a part of ping-pong pairs. These piRNAs comprised 0.09 % of the total unique piRNAs and 0.43 % of the total piRNA pool; they mapped to 60 families representing all three TE classes. Taken together, these results suggest that the ping-pong cycle occurs at low levels in the adult salamander gonads, contributing to the silencing of all three classes of transposable elements.
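
As a check on the arithmetic, the share of unique piRNAs participating in ping-pong pairs can be recomputed from the counts reported in the text (796 of 1,119,179 unique piRNAs in the female; 902 of 1,006,994 in the male). The helper name is hypothetical.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts taken from the Results text.
female_unique = percent(796, 1_119_179)
male_unique = percent(902, 1_006_994)
```

Both values round to the reported 0.07 % and 0.09 %.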

Table 4 Summary of piRNAs involved in ping-pong pairs

Analysis of piRNA Pathway Proteins

Phylogenies estimated for the twelve selected piRNA pathway proteins are shown in Fig. 2 (PIWIL1) and Electronic Supplementary Material 2 (all other proteins). All alignments are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.kv57r. All proteins were identified in at least one salamander dataset, and all have intact ORFs. For 11 of these proteins, the branch lengths estimated for salamanders fall within the range of lengths estimated for other vertebrates; for FKBP6, all salamander branch lengths are shorter than those of other lineages. Although this is a conservative test, none of these patterns of sequence evolution suggests unusual rates of amino acid replacement in salamanders. Functional domains identified in each salamander protein are summarized in Table 5. For all proteins examined, all functional domains were identified with significant e-values.

Fig. 2 Phylogeny estimated for PIWIL1 amino acid sequences. Numbers at nodes are Bayesian posterior probabilities. PIWIL1 salamander branch lengths are among the shortest on the tree; however, for the majority of piRNA pathway proteins, branch lengths for salamanders fall within those estimated for the other vertebrates, consistent with conserved pathway function across the vertebrate clade

Table 5 Conserved domains identified in salamander protein sequences

Discussion

Our data suggest that the piRNA pathway is expressed in the salamander germline, and that the piRNA pool includes molecules complementary to TE transcripts of all three classes. Although putative piRNAs mapping to TEs have been identified in the model salamander Ambystoma mexicanum in the blastema, a dedifferentiated structure formed at the onset of limb regeneration (Zhu et al. 2012), to our knowledge, no previous studies have reported putative TE-mapping piRNAs in salamander gonads. Given our results, how is it that such high TE levels accumulate in salamander genomes? Below, we briefly summarize piRNA-mediated TE suppression and pose several hypotheses for the evolution of high TE loads, despite the expression of TE-silencing machinery.

piRNA-Mediated TE Suppression

TEs and the suppression machinery of their hosts are engaged in an arms race, resulting in a dynamic relationship that oscillates between higher and lower activity levels of individual TE families (Blumenstiel 2011). When a novel TE appears in a naïve host genome, its initial activity level can be high. Novel TEs can result from horizontal transfer or from sequence divergence during vertical transmission; in salamanders, the most abundant TE superfamily (Gypsy/Ty3) shows evidence of the latter (Sun and Mueller 2014). After this initial appearance, the host’s piRNA-mediated silencing pathway may adapt to target the novel TE through the following steps: First, the novel TE may transpose into an existing piRNA cluster locus. piRNA cluster loci are genomic regions transcribed into long RNA molecules that are processed into mature piRNAs; this processing pathway is called primary piRNA biogenesis (Brennecke et al. 2007; Girard et al. 2006; Lau et al. 2006; Malone et al. 2009; Robine et al. 2009; Vagin et al. 2006). Some piRNA cluster loci contain fragments of active and previously active TEs; a subset of the piRNAs produced from such loci is therefore complementary to TE sequences. Second, the PIWI/piRNA complex can enter the nucleus of gonadal cells and guide transcriptional silencing of complementary genomic TE loci through epigenetic modification (Huang et al. 2013; Le Thomas et al. 2013; Rozhkov et al. 2013; Sienski et al. 2012; Sytnikova et al. 2014). piRNAs bound by AUB or AGO3 proteins can also act in the cytoplasm to guide destruction of TE transcripts (Brennecke et al. 2007; Gunawardane et al. 2007; Li et al. 2009). The piRNAs formed by primary biogenesis can feed into the production of additional piRNAs through the ping-pong cycle (i.e., secondary amplification) (Brennecke et al. 2007; Gunawardane et al. 2007; Han et al. 2015; Senti et al. 2015; Wang et al. 2015). 
These so-called secondary piRNAs also guide TE suppression through associations with Piwi proteins. In addition, secondary piRNAs initiate phased production of diverse primary piRNAs from cleaved TE transcripts (Han et al. 2015; Mohn et al. 2015). Once piRNAs are activated against a TE family through these processes, transposition of the TE is suppressed, and its activity level in the host decreases.

How Might piRNA-Mediated TE Suppression Differ in Salamanders?

In salamanders, although LTR retrotransposons are overrepresented relative to the composition of other vertebrate genomes, all three classes of TEs are abundant (Keinath et al. 2015; Sun and Mueller 2014; Sun et al. 2012b). This suggests that salamanders’ gigantic genomes do not reflect the “escape” of a few TE families from detection/targeting by the host silencing machinery. Rather, it suggests that salamanders differ from other vertebrates in global (i.e., genome wide) TE suppression. Our analysis of gonadal gene and small RNA expression suggests the presence of the Piwi pathway in salamanders, allowing us to infer that complete loss of the Piwi pathway is an unlikely mechanism for global TE expansion in their gigantic genomes.

The global level of piRNA-mediated TE suppression reflects the extent to which all individual TE families in a genome are suppressed by the pathway. More specifically, it reflects (1) the proportion of novel TEs that becomes targeted, (2) the speed with which novel TEs become targeted, and/or (3) the extent to which transposition of targeted TEs is suppressed. Based on our results, we hypothesize that salamanders have evolved less comprehensive TE suppression through changes in one or more of these variables; relative to other vertebrates, we would predict that (1) proportionally fewer TEs ultimately become targeted in salamanders, (2) TEs that do become targeted remain untargeted for a longer period in salamanders, and/or (3) residual transposition levels of targeted TEs are higher in salamanders. Such differences, in turn, could result from many evolved changes to the piRNA pathway machinery. For example, (1) and (2) could result from piRNA clusters in salamanders being smaller in size, fewer in number, or different in some other way that similarly reduces their efficacy as TE “traps.” Our third prediction of higher residual transposition levels in salamanders could result from the guide RNA/effector protein complexes being less likely to interact with and suppress their target TE loci/transcripts in salamanders. Future efforts aimed at (1) recovering genomic piRNA cluster sequences, despite the inherent challenges of assembling large repetitive genomes; and (2) assessing function of all piRNA pathway proteins, through both sequence-based and functional analyses, will allow refinement and testing of these hypotheses. However, consistent with less comprehensive TE suppression, the proportion of piRNAs relative to siRNA/miRNAs (Fig. 1) is lower in D. fuscus than in other vertebrate gonads (Chirn et al. 2015). 
In addition, the levels of TE-directed piRNAs in salamanders (14 to 18 %; Table 3) are low compared with the levels in other species that have much less substantial genomic TE loads; for example, in Xenopus and some mammals, ~20 to 25 % of the piRNAs map to TEs, whereas TEs comprise only ~30 to 45 % of the genome (Girard et al. 2006; Lau et al. 2006, 2009). In Drosophila melanogaster, ~70 % of the piRNAs map to TEs, whereas TEs comprise only ~12 % of the genome (Adams et al. 2000; Brennecke et al. 2007; dos Santos et al. 2015). Additionally, within the salamander genome, the most abundant class of TEs—LTR retrotransposons—is not disproportionately targeted in the piRNA pool. These patterns suggest that expansion of TE load in salamanders has not been accompanied by an increase in TE-targeting piRNA levels; however, we interpret these numbers with extreme caution because our piRNA mapping was necessarily completed without a reference genome.

With the evolution of less comprehensive TE suppression through any means, overall TE transposition rate increases, resulting in the accumulation of more new TE insertions in populations. For neutral/effectively neutral TE insertions, this translates into an increased rate of fixation of new TE loci. However, the fitness disadvantages of a TE insertion reflect, in part, whether or not it is transpositionally active; silenced loci are less disadvantageous, all else being equal. Because of this, less comprehensive TE suppression decreases the proportion of TE insertions that are effectively neutral, allowing a smaller proportion of novel TE insertions to drift to fixation (Lu and Clark 2010). Despite this lower proportion of fixation, less comprehensive TE suppression is predicted to increase the number of new TE insertions accumulating over evolutionary time (Lu and Clark 2010). Based on our results, we suggest that less comprehensive TE suppression by an expressed piRNA pathway has contributed to genomic gigantism in salamanders.