Transposable elements (TEs) are diverse sequences that move from one genomic locus to another using mechanisms that include the generation of double-strand DNA breaks, sequence integration, and, for TEs with an RNA intermediate, reverse transcription (Craig et al. 2002). Novel TE insertions can disrupt protein function and gene expression, and TE sequences can mediate ectopic recombination. TEs can also acquire gain-of-function mutations that create novel functional sequences (De Gobbi et al. 2006). Our understanding of TE diversity lags behind our understanding of other portions of the genome, in part because TEs were long thought to be functionless (albeit mutagenic) “junk” with neither protein-coding nor regulatory relevance (Doolittle and Sapienza 1980). TE-derived sequences are now known to form critical parts of genes and gene expression networks in many organisms (Feschotte 2008). In addition, TEs are the main determinant of genome size, which has profound effects on nucleus and cell size, cell cycle duration, and the rates of cell differentiation, metabolism, embryonic development, and regeneration (Gregory 2005).

Animals vary tremendously in their TE loads, producing a >6000-fold difference in genome size across species (Gregory 2017). Why do genomes differ so widely in how they suppress and eliminate TEs?

In the past decade, major discoveries have revealed that small RNA pathways regulate TE proliferation (Siomi et al. 2011). Before these discoveries, natural selection, genetic drift, and deletion bias were each proposed to explain interspecific differences in TE load (Cavalier-Smith 1991; Lynch 2007; Petrov 2002). Today, these explanations appear overly simplistic; they should be revisited to incorporate evolved differences in TE control pathways across species. To date, however, relatively little research has focused on the evolution of small RNA-based mechanisms of TE control (for examples, see Blumenstiel et al. 2016; Kelleher and Barbash 2013; Madison-Villar et al. 2016).

What challenges and opportunities exist in this emerging area? Many model organisms were chosen for their small, non-repetitive, and “tractable” genomes, but these genomes provide an incomplete picture of genome biology. For example, TEs and TE silencing pathways in Drosophila melanogaster have been extensively characterized, but TEs make up only ~12% of the fly genome (Adams et al. 2000). Studying evolved differences in TE control requires comparisons across animals that vary dramatically in genome size, including species with very large genomes such as lungfishes and salamanders (Gregory 2017). The challenges are real: enormous repetitive genomes remain difficult to assemble and annotate (Keinath et al. 2015), which complicates both the mapping of small RNAs and the identification of their precursor loci. The opportunities, however, are just as real: enormous TE-rich genomes hold the key to understanding the mechanisms that make some species so permissive to TE activity.

To understand natural diversity in TE control in animals, research should begin by focusing on the Piwi-interacting RNA (piRNA) pathway, a small RNA genome defense system that suppresses TE activity in the animal germline. piRNAs are a diverse class of small RNA molecules that are bound by Piwi proteins and guide transcriptional and post-transcriptional silencing of TEs through base complementarity. When a novel TE invades a naïve host genome (e.g., by horizontal transfer), it typically comes under the control of the host's piRNA pathway in one of two ways. The novel TE may transpose into a piRNA cluster, a genomic region transcribed into a long RNA molecule that is processed into mature piRNAs; once a TE is thus “trapped” in a piRNA cluster, piRNAs complementary to its sequence are produced. Alternatively, a novel TE may insert outside existing piRNA clusters but initiate the formation of a new cluster (Shpiz et al. 2014). Each piRNA is bound by one of several Piwi proteins. In Drosophila, piRNAs bound by Piwi enter the nucleus and guide transcriptional silencing of complementary genomic TE loci through epigenetic modification, whereas piRNAs bound by Aub or Ago3 remain in the cytoplasm and guide post-transcriptional silencing by destroying complementary TE transcripts (Dumesic and Madhani 2014; Iwasaki et al. 2015; Siomi et al. 2011). piRNAs can also amplify their own production through a feed-forward pathway called the ping-pong cycle, in which piRNA-guided cleavage of a target transcript generates the precursor of a new piRNA from the opposite strand (Brennecke et al. 2007). These secondary piRNAs guide TE suppression through associations with Piwi proteins and initiate phased production of even more piRNAs from cleaved TE transcripts (Han et al. 2015; Mohn et al. 2015).
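The ping-pong cycle leaves a characteristic footprint in small RNA sequencing data: pairs of piRNAs mapping to opposite strands whose 5′ ends overlap by exactly 10 nt (Brennecke et al. 2007). Comparative studies of large and small genomes could quantify this signature species by species. The minimal Python sketch below assumes reads have already been aligned and are supplied as hypothetical (chromosome, strand, start, end) tuples in 0-based, half-open coordinates; scoring the 10-nt peak as a z-score against overlaps of 1–20 nt is one common convention, not the only one.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical input format (an assumption): aligned piRNA reads as
# (chromosome, strand, start, end) in 0-based, half-open coordinates.

def five_prime_end(strand, start, end):
    """Return the genomic position of a read's 5' end."""
    return start if strand == "+" else end - 1

def overlap_counts(reads, max_overlap=20):
    """Count plus/minus read pairs by 5'-to-5' overlap length (1..max_overlap nt)."""
    plus, minus = Counter(), Counter()
    for chrom, strand, start, end in reads:
        key = (chrom, five_prime_end(strand, start, end))
        if strand == "+":
            plus[key] += 1
        else:
            minus[key] += 1
    counts = {n: 0 for n in range(1, max_overlap + 1)}
    for (chrom, pos), n_plus in plus.items():
        for n in counts:
            # A minus-strand 5' end at pos + n - 1 overlaps this plus-strand 5' end by n nt.
            counts[n] += n_plus * minus.get((chrom, pos + n - 1), 0)
    return counts

def ping_pong_z(counts, signal=10):
    """Z-score of the 10-nt overlap count against all other overlap lengths."""
    background = [c for n, c in counts.items() if n != signal]
    sd = pstdev(background)
    if sd == 0:
        return float("nan")  # background has no variance; z-score undefined
    return (counts[signal] - mean(background)) / sd

# Toy example: two 10-nt "ping-pong" pairs plus one 4-nt overlap.
reads = [
    ("chr1", "+", 100, 126), ("chr1", "-", 85, 110),
    ("chr1", "+", 300, 325), ("chr1", "-", 285, 310),
    ("chr1", "-", 290, 304),
]
counts = overlap_counts(reads)
z = ping_pong_z(counts)
print(counts[10], round(z, 2))  # prints: 2 8.72
```

Indexing minus-strand 5′ ends in a hash table keeps the search linear in the number of distinct plus-strand positions times the overlap window, which matters for the very large read sets expected from TE-rich genomes.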

Studies of large animal genomes show accumulation of many types of TEs, not just a few that have managed to evade piRNA detection and silencing (Sun and Mueller 2014). This pattern suggests global differences in piRNA biogenesis and pathway function that make the cellular environment more permissive to TE activity, through (1) fewer novel TEs becoming targeted, (2) a longer lag before novel TEs become targeted, and/or (3) less effective suppression of targeted TEs. Understanding these global differences is key to incorporating piRNA biology into models of genome size evolution.
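How the three scenarios translate into TE load can be illustrated with a deliberately simplified toy model, which is an assumption for illustration and not drawn from the cited studies: a TE family's copy number grows by a constant fraction r per generation until piRNA targeting begins after a lag, after which growth is reduced by a suppression efficiency. Excision, deletion, and selection are ignored, and all parameter values are arbitrary.

```python
# Toy model (hypothetical, for illustration only): constant per-generation
# growth rate r, reduced by `efficiency` once piRNA targeting begins at `lag`.

def te_copy_number(generations, lag, efficiency, r=0.05, n0=1.0):
    """Copy number of one TE family after `generations` generations."""
    n = n0
    for g in range(generations):
        rate = r if g < lag else r * (1.0 - efficiency)
        n *= 1.0 + rate
    return n

scenarios = {
    "never targeted": dict(lag=10**9, efficiency=0.0),   # scenario (1)
    "short lag, strong": dict(lag=10, efficiency=0.95),  # reference case
    "long lag, strong": dict(lag=50, efficiency=0.95),   # scenario (2)
    "short lag, weak": dict(lag=10, efficiency=0.5),     # scenario (3)
}
loads = {name: te_copy_number(100, **p) for name, p in scenarios.items()}

# Print scenarios from lowest to highest final copy number.
for name, n in sorted(loads.items(), key=lambda kv: kv[1]):
    print(f"{name:>18}: {n:8.1f} copies")
```

Under these assumptions each of the three mechanisms raises the final load relative to fast, efficient targeting, and applying them across many TE families at once would mimic the genome-wide permissiveness described above.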

What mechanisms underlie the more permissive TE environment of large genomes? Several possibilities should be tested through comparative analyses of genomes of different sizes. (1) Does the piRNA profile differ between large and small genomes? More specifically, does the piRNA pool target a smaller proportion of the active TE landscape in large genomes, and what differences in the molecular mechanisms of piRNA production underlie any differences in the piRNA pool? (2) Does transcriptional silencing of TE loci through epigenetic modification differ between large and small genomes? More specifically, is a smaller proportion of the active TE landscape silenced by trimethylation of histone H3 at lysine 9 (H3K9me3) and/or CpG methylation of DNA in large genomes, and what differences in the molecular mechanisms of epigenetic mark deposition underlie any differences in the epigenome? (3) Does post-transcriptional silencing of TEs through transcript cleavage differ between large and small genomes? More specifically, do piRNA/Piwi protein complexes destroy a smaller proportion of TE transcripts in the large cells of species with large genomes, and what differences in the molecular mechanisms of piRNA/Piwi complex formation and function underlie any differences in transcript destruction? Finally, which evolutionary forces (mutation, selection, drift) have driven piRNA pathway evolution?
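Question (1) implies a concrete comparative metric: the fraction of actively transcribed TE families that are matched by antisense piRNAs. The Python sketch below assumes per-family TE expression and antisense piRNA abundance have already been estimated and normalized (e.g., per million mapped reads); the function name, both thresholds, and all family names and values are hypothetical placeholders rather than an established pipeline.

```python
from typing import Dict

def targeted_fraction(te_expression: Dict[str, float],
                      antisense_pirnas: Dict[str, float],
                      min_expression: float = 1.0,
                      min_pirnas: float = 10.0) -> float:
    """Fraction of expressed ("active") TE families with antisense piRNA support.

    Both thresholds are arbitrary placeholders; inputs are assumed to be
    already normalized abundances keyed by TE family name.
    """
    active = [fam for fam, expr in te_expression.items() if expr >= min_expression]
    if not active:
        return float("nan")  # no active families, so the fraction is undefined
    targeted = [fam for fam in active if antisense_pirnas.get(fam, 0.0) >= min_pirnas]
    return len(targeted) / len(active)

# Hypothetical family names and values, for illustration only.
small_genome = targeted_fraction(
    {"Gypsy-1": 5.0, "Copia-2": 3.0, "LINE-3": 0.2},
    {"Gypsy-1": 120.0, "Copia-2": 40.0},
)
large_genome = targeted_fraction(
    {"Gypsy-1": 8.0, "Copia-2": 6.0, "LINE-3": 4.0, "DIRS-4": 2.5},
    {"Gypsy-1": 15.0, "LINE-3": 3.0},
)
print(small_genome, large_genome)  # prints: 1.0 0.25
```

Restricting the denominator to expressed families keeps ancient, inactive copies (abundant in large genomes) from diluting the measure; in practice the result would need to be checked for sensitivity to the chosen thresholds.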

Studies such as these would advance the field of genome evolution by showing how evolved differences in ancient genome defense pathways have shaped evolutionary trajectories in genome size and content. Gigantic genomes remain understudied, given the technical challenges they pose. However, I argue that they are worth the challenge because they offer a unique perspective on how the core machinery that maintains genome integrity evolves, producing diversity across the Tree of Life.