Introduction

Gene clustering, which occurs for both protein-coding and noncoding genes, can reflect evolutionary pressures to define a functional genomic landscape for suites of genes involved in similar biological functions, i.e., a co-regulatory gene complex (Chopra 2011). Among the noncoding RNA genes, miRNAs have emerged as key regulators of a broad spectrum of cellular functions (Berezikov et al. 2006; Bushati and Cohen 2007; Huang et al. 2010; Kozomara and Griffiths-Jones 2011; Li et al. 2010; Xiong et al. 2009). Many such miRNAs are located in miRNA gene clusters, where they are often expressed as polycistronic transcripts (Lee et al. 2002; Saini et al. 2008). It is estimated that ~37 % of the known miRNAs in humans form clusters, suggesting evolutionary and functional significance for the clustering phenomenon (Altuvia et al. 2005).

The miR-71/2 cluster (including miR-13 as a member of the miR-2 family) is conserved across insects and other invertebrates (Marco et al. 2012; Marco et al. 2010); miR-2 is predicted to play an important role in neural development (Marco et al. 2012). We previously demonstrated the duplication of the miR-71/2 cluster in the Lophotrochozoan parasite Schistosoma mansoni, with a high level of conservation observed across the clustered sma-miR-71 and sma-miR-71b genes (de Souza et al. 2011). The miRNA database miRBase (version 19.0) contains 171 pre-miRNAs that belong to the miR-2 family and 33 pre-miRNAs that belong to the miR-71 family.

To elucidate the evolutionary history of the miR-71 and miR-2 families across the animal kingdom, we screened animal genomes for new members of these families. In this study, we identify 19 novel miRNAs that are not reported in the microRNA database miRBase, version 19.0 (http://www.mirbase.org) (Table 1). These novel miRNAs were identified by screening the candidate regions that surround the known precursors from the same species, or by using the known orthologous precursors. In total, we have identified 12 novel miR-2, four novel miR-13, and three novel miR-71 genes (Table 1) (these newly predicted miRNAs are named after their respective orthologs).

Table 1 Mature sequences of predicted miRNA miR-2, miR-13, and miR-71 not present in miRBase (version 19.0) and their respective orthologs

Our pipeline also identified an additional nine miRNAs, which have been shown in previous studies to be miR-71 or miR-2 family members (Chopra 2011; de Souza et al. 2011) (Table 1). We analyzed the structural characteristics and thermodynamic features by comparing the novel pre-miRNAs with the known miR-71, miR-2, and miR-13 genes deposited in miRBase (Supplementary Figure 1 and Supplementary Table 1). All the novel predicted miRNAs were highly conserved in terms of both primary and secondary structures, although this partly reflects the conservative prediction criteria used to identify them (Supplementary Figures 1, 2a/b/c, Supplementary Tables 1, 2). Furthermore, all the novel predicted miRNAs form stable hairpin structures (minimum free energy <−25 kcal/mol), which is essential for the processing of pre-miRNA transcripts into mature miRNAs (Supplementary Table 1). Our analyses of fifteen different pre-miRNA parameters yielded values similar to those of the set of known miRNAs tested (Supplementary Table 1).

To determine whether the miR-71/2 cluster is evolutionarily conserved in Protostome and Deuterostome species, we retrieved the genomic regions at the boundaries of the miR-71, miR-2, and/or miR-13 miRNAs for these species. We considered miRNAs to be clustered if members of both miRNA families were within 10 kb of one another. The miR-71/2 cluster is conserved only in Protostome species (including Ecdysozoan and Lophotrochozoan) (Fig. 1), except for the genus Drosophila, which lacks miR-71 as previously described (Marco et al. 2010). The majority of miR-2 copies found within a miR-71/2 cluster lie within 5 kb of the nearest miR-71 copy. However, in some species, such as those in the Drosophila genus, miR-2/miR-13 genes were found to cluster together in the absence of miR-71. In Caenorhabditis briggsae, miR-2 was found ~11 kb from the miR-71 member in the genome, ~1 kb beyond our bioinformatic cut-off. In our study, we nevertheless considered cbr-miR-2 as clustered with cbr-miR-71 owing to the high conservation of cbr-miR-2 relative to cel-miR-2. The mature sequences of cel-miR-2 and cbr-miR-2 displayed 100 % identity at the nucleotide level (Supplementary Figure 3).
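The 10 kb clustering criterion described above can be illustrated with a minimal sketch. The record type, function names, and all coordinates below are hypothetical and chosen only for illustration; they are not taken from the genomes analyzed, and the actual screening was performed with the tools listed in the Methods.

```python
from typing import NamedTuple

class MiRNA(NamedTuple):
    name: str
    family: str    # "miR-71", "miR-2", or "miR-13"
    scaffold: str
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive

MAX_GAP = 10_000   # 10 kb cut-off stated in the text

def gap(a: MiRNA, b: MiRNA) -> int:
    """Number of nt separating two loci on one scaffold (0 if they overlap)."""
    first, second = sorted((a, b), key=lambda m: m.start)
    return max(0, second.start - first.end - 1)

def is_mir71_2_cluster(loci, max_gap: int = MAX_GAP) -> bool:
    """True if a miR-71 locus lies within max_gap of a miR-2/miR-13 locus."""
    mir71 = [m for m in loci if m.family == "miR-71"]
    mir2 = [m for m in loci if m.family in ("miR-2", "miR-13")]
    return any(
        a.scaffold == b.scaffold and gap(a, b) <= max_gap
        for a in mir71
        for b in mir2
    )

# Invented coordinates: the first pair is ~4 kb apart, the second ~11 kb apart.
near = [MiRNA("miR-71", "miR-71", "scaf1", 1000, 1090),
        MiRNA("miR-2a", "miR-2", "scaf1", 5000, 5080)]
far = [MiRNA("miR-71", "miR-71", "scaf1", 1000, 1090),
       MiRNA("miR-2", "miR-2", "scaf1", 12100, 12180)]
print(is_mir71_2_cluster(near), is_mir71_2_cluster(far))
```

A strict distance test of this kind would exclude a pair separated by ~11 kb, which is why the C. briggsae case was assessed separately on the basis of sequence conservation.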

Fig. 1

Phylogenetic tree of life for animals indicating the miR-71/2 cluster distribution among different animal lineages. The shaded rectangles highlight the clustered miRNAs miR-71, miR-2, and miR-13 identified in this study. Each arrow represents a miRNA gene on a chromosomal region (horizontal bar) and illustrates the distribution of miR-71 or the miR-71/2 cluster in genomic regions of species within the Deuterostomia or Protostomia clades. Cnidaria and Porifera, which lack miR-71 and miR-2, are used as outgroup clades. The presence or absence of the clustered miRNAs in each clade is indicated in the text on the right side of the panel. The divergence between the clades is indicated in millions of years ago (Mya) (Ayala and Rzhetsky 1998; Douzery et al. 2004). w/o, without

Whereas the miR-2/miR-13 family was found to be restricted to Protostomes, miR-71 was found in Deuterostome species from the clades Echinodermata, Hemichordata, and Cephalochordata, but not in Vertebrata and Urochordata (Fig. 1; Marco et al. 2012). Indeed, Urochordata and Vertebrata genomes lack both miR-71 and miR-2 family members (Fig. 1). In the Deuterostome species Strongylocentrotus purpuratus and Saccoglossus kowalevskii, miR-71 was found in isolation; this is consistent with the presence of nonclustered copies of the miR-71 family in Protostome species. In contrast, in Branchiostoma floridae, miR-71 is clustered with miR-4890, which is unrelated to the miR-2/miR-13 family (Fig. 1). The presence of different miRNAs within a miR-71/2 cluster is also observed in Protostome species, such as Schmidtea mediterranea (Fig. 1, miRBase; Palakodeti et al. 2006).

The duplication of the miR-71/2 cluster has previously been reported in S. mansoni (de Souza et al. 2011) and Schistosoma japonicum (Huang et al. 2009). Here, we identify an extra miR-2 copy in S. japonicum (sja-miR-2f) and S. mansoni (sma-miR-2f) (Fig. 1). This additional miRNA clearly illustrates that both sets of miR-71/2 are conserved between Schistosoma species. In S. mansoni and S. japonicum, the miR-71/2 cluster exists as two separate clusters, each containing four miRNAs (one miR-71 and three miR-2 genes) found in the same order in each cluster (Fig. 1). The miR-71/2 cluster is also duplicated in Schmidtea mediterranea, although the characteristics of the clusters in S. mediterranea differ (Fig. 1). The duplicated miR-71/2 clusters in S. mediterranea occur as five separate clusters (miRBase 19.0). Three of these clusters consist of one miR-71 and one miR-2 gene. The other two clusters contain either three or four miRNA genes, including miRNAs from other families. An isolated (nonclustered) copy of miR-2 is also present in S. mediterranea (miRBase 19.0). Although the functional significance of the miR-71/2 clusters is currently unclear, the comparison of S. mediterranea with S. mansoni and S. japonicum highlights that the duplication of this cluster is conserved across these Platyhelminthes species, and may indicate that miR-71/2 cluster(s) have played significant roles in the evolution of the Schistosoma lineage.

The emergence of miR-2 in Protostomes, coupled with its absence in Deuterostomes, suggests that this miRNA gene could have a functionally significant role in Protostome species, possibly in neural development, as the predicted target genes of the miR-2 cluster in Protostomes are enriched for neural development genes (Marco et al. 2012). The copy-number amplification of miR-2 genes in the Arthropoda, Annelida, and Platyhelminthes lineages could reflect a divergent neofunctionalization of miR-2 activity between miR-2 homologs and/or a gene dosage effect on miR-2 activity in these lineages. Furthermore, the conserved clustering pattern of miR-2 with miR-71 to form the miR-71/2 cluster could reflect a functional co-evolutionary relationship between these two miRNAs within Protostome species. Finally, the segmental duplication of the miRNA cluster in specific lineages may also have functional significance; for example, in S. mansoni, one of the segmentally duplicated miR-71/2 clusters is located on the female W chromosome, where it has the potential to play a role in sexual differentiation (de Souza et al. 2011).

Methods

miRNAs and Genomic Regions

The mature and precursor miRNA sequences were retrieved from miRBase (Wellcome Trust Sanger Institute’s miRBase, http://microrna.sanger.ac.uk, release 19.0; Kozomara and Griffiths-Jones 2011). The cluster region in each genome was identified using BLASTN (default parameters) against the available animal genomes. For the Acyrthosiphon pisum and Rhodnius prolixus genomes, we ran BLASTN using known orthologous clustered miRNAs from Ecdysozoan species as queries. The new miRNAs were identified in the following species: Ixodes scapularis (VectorBase, http://www.vectorbase.org, Ixodes scapularis annotation IscalW1; Lawson et al. 2009), Daphnia pulex (Colbourne et al. 2011), Acyrthosiphon pisum (Human Genome Sequencing Center at the Baylor College of Medicine, http://www.hgsc.bcm.tmc.edu/) (Consortium 2010), Anopheles gambiae (VectorBase, http://www.vectorbase.org, Anopheles gambiae annotation AgamP3.5) (Lawson et al. 2009), Rhodnius prolixus (VectorBase, http://www.vectorbase.org, Rhodnius prolixus annotation RproC1, SuperContig.feb11) (Lawson et al. 2009), Pediculus humanus (VectorBase, http://www.vectorbase.org, Pediculus humanus annotation PhumU1) (Lawson et al. 2009), Culex quinquefasciatus (VectorBase, http://www.vectorbase.org, Culex quinquefasciatus annotation CpipJ1) (Lawson et al. 2009), C. briggsae (WormBase database, http://www.wormbase.org/) (Yook et al. 2012), S. mansoni (GeneDB, www.genedb.org) (Berriman et al. 2009; Protasio et al. 2012), and S. japonicum (GeneDB, www.genedb.org) (Consortium 2009). The genomic DNA fragments containing the miRNA precursors flanked by 10,000 nt on each side were retrieved from the genome data. A distance of ~10 kb or less between consecutive miRNA genes has been used to define miRNA genes as clustered; for instance, miRBase (version 19.0) treats miRNAs within 10,000 nt of one another as clustered. Hence, we adopted the same cut-off distance for our analysis. The sequences were stored in multi-FASTA format for further analysis.
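As a minimal sketch of the retrieval step described above, the following shows how a precursor locus might be extended by 10,000 nt on each side and written out in multi-FASTA format. The function names, the 60-column line width, and the example coordinates are assumptions made for illustration; the text does not state how windows at scaffold ends were handled, so clipping to the scaffold boundaries is also an assumption.

```python
def flanked_window(start: int, end: int, scaffold_len: int, flank: int = 10_000):
    """Return a 0-based, half-open (lo, hi) slice covering the 1-based inclusive
    locus [start, end] plus `flank` nt on each side, clipped to the scaffold."""
    lo = max(0, start - 1 - flank)
    hi = min(scaffold_len, end + flank)
    return lo, hi

def to_multifasta(records, width: int = 60) -> str:
    """Format (header, sequence) pairs as multi-FASTA text."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

# Hypothetical precursor at positions 15,000-15,090 on a 100,000-nt scaffold.
lo, hi = flanked_window(15_000, 15_090, 100_000)
print(lo, hi, hi - lo)  # window length = 91-nt precursor + 2 x 10,000 nt
```

With a scaffold held as a Python string `genome`, the fragment would then be `genome[lo:hi]`, and a list of such fragments can be passed to `to_multifasta`.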

Computational Prediction of Clustered microRNA Genes

Hairpin-like structures were identified from the DNA fragments using the EMBOSS einverted and BLASTN tools. The parameters used for the einverted program were minimum score threshold 20, gap penalty 4, match score 2, mismatch score 2, and maximum extent of repeats 110, collecting sequences with lengths between 50 and 110 nt. BLASTN was used to find matches to known pre-miRNA structures. These hairpin-like sequences were filtered by minimum free energy (MFE), GC content, homology to known mature sequences, and similarity to other noncoding RNAs. MFE calculations for the RNA secondary structures were performed with RNAfold from the Vienna RNA Package using the following parameters: RNA secondary folding energy threshold −20 kcal/mol and the options “-p -d2 -noLP” (Hofacker 2009). These structures were filtered for GC content ranging from 30 to 65 %. In addition, the animal mature miRNAs were aligned against these sequences, and no more than six mismatches were accepted across the whole mature miRNA and one in the seed region (nucleotides 2–8). To remove the other classes of noncoding RNAs (i.e., rRNA, snRNA, SL RNA, SRP, tRNAs, and RNase P), the putative hairpin-like sequences near the putative cluster members were compared against the Rfam microRNA Registry (version 10.0) (Gardner et al. 2009).
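The numerical filters described above (length 50 to 110 nt, MFE threshold of −20 kcal/mol, GC content of 30 to 65 %, and the mismatch limits for the mature sequence and seed) can be sketched as follows. This is an illustrative simplification rather than the pipeline itself: the function names are hypothetical, the MFE is taken as a precomputed input (for example, from RNAfold output), the mature-sequence comparison assumes an ungapped, equal-length alignment, and the seed limit is read as at most one mismatch. The sequences in the example are toy strings, not real miRNAs.

```python
def gc_percent(seq: str) -> float:
    """GC content of a nucleotide sequence, as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def passes_hairpin_filters(seq: str, mfe: float,
                           min_len: int = 50, max_len: int = 110,
                           mfe_max: float = -20.0,
                           gc_min: float = 30.0, gc_max: float = 65.0) -> bool:
    """Apply the length, MFE (kcal/mol, precomputed), and GC-content filters."""
    return (min_len <= len(seq) <= max_len
            and mfe <= mfe_max
            and gc_min <= gc_percent(seq) <= gc_max)

def matches_known_mature(candidate: str, known: str,
                         max_total: int = 6, max_seed: int = 1) -> bool:
    """Ungapped comparison: at most max_total mismatches overall and at most
    max_seed mismatches in the seed (positions 2-8, 1-based)."""
    c = candidate.upper().replace("T", "U")
    k = known.upper().replace("T", "U")
    if len(c) != len(k):
        raise ValueError("sequences must have equal length (ungapped comparison)")
    diffs = [x != y for x, y in zip(c, k)]
    return sum(diffs) <= max_total and sum(diffs[1:8]) <= max_seed

# Toy data for illustration only.
hairpin = "GC" * 15 + "AT" * 15            # 60 nt, 50 % GC
known = "ACGUACGUACGUACGUACGUAC"           # 22 nt
seed_variant = "AGCUACGUACGUACGUACGUAC"    # two mismatches inside the seed
print(passes_hairpin_filters(hairpin, -25.0),
      matches_known_mature(known, known),
      matches_known_mature(seed_variant, known))
```

Taking the MFE as an input keeps the sketch independent of any particular folding library; in practice the value would be parsed from the folding program's output for each candidate hairpin.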