Introduction

Several species of Neotropical freshwater catfish have high commercial value and are of growing importance for production in aquaculture systems. Among these is the cachara (Pseudoplatystoma reticulatum, Eigenmann & Eigenmann, 1889). The biology of this species is largely unknown, despite its relative economic importance (Bignotto et al. 2009), and generation of basic biological information and tools for developing essential knowledge for rearing the species will greatly facilitate the expansion of commercial aquaculture operations.

Species of the genus Pseudoplatystoma (family Pimelodidae, order Siluriformes) are migratory and are distributed throughout the main South American drainage basins. P. reticulatum has been found exclusively in the Paraná-Paraguay Basin. However, after a recent taxonomic review, its occurrence in the Amazon Basin has been under discussion (Buitrago-Suárez and Burr 2007; Torrico et al. 2009). Publicly available genetic information for the cachara is mostly restricted to the characterization of genetic diversity of wild populations with microsatellites (Prado et al. 2014) and partial mitochondrial and nuclear gene sequences (Bignotto et al. 2009; Torrico et al. 2009; Carvalho et al. 2011, 2012; Carvalho-Costa et al. 2011; Prado et al. 2011; Hashimoto et al. 2013; Pereira et al. 2013).

The generation of extensive genomic information for the species, made possible by recent developments in the field of genomics and in particular by high-throughput next-generation sequencing (NGS), will help the development of solid tools for use in phylogenetics, phylogeography, and population structure and diversity studies of wild populations. Moreover, the incorporation of genomic tools in breeding programs under development for the species will help accelerate genetic gains in traits associated with productivity, quality and profitability (Yáñez et al. 2015).

Mitochondrial DNA (mtDNA) sequence data has been widely used in studies on phylogenetics, evolution, phylogeography, and population and conservation genetics of different fish species (Avise 1994; Sosa et al. 2012; Borba et al. 2013a, b; Hahn et al. 2013). Most studies have been performed with sequencing data from fragments of a combination of mitochondrial genes. More recently, as whole mtDNA sequences become ever more available, methods to analyze complete mtDNA sequences have been proposed and are likely to yield better-supported results (Yu et al. 2007; Powell et al. 2013; Alam et al. 2014).

Thousands of species have already had their mitogenome completely sequenced and made available in public databases (Hahn et al. 2013). Out of 95 Siluriformes mitogenomes published as of October 2015, only four belong to species from the family Pimelodidae. The current study is the first to report the complete P. reticulatum mitochondrial genome sequence, in addition to performing the first phylogenetic analysis with previously published complete mitochondrial sequences from other Siluriformes.

Materials and methods

Sampling, sample processing, and NGS sequencing

Seven distinct tissues were sampled (white and red muscle, gills, liver, kidney, pituitary gland, and gonad) from 12 P. reticulatum captured at different locations within the Paraguay River Basin (Fig. S1, Supplementary material). Each individual was tested for species purity with a SNaPshot© assay (Villela et al. 2017a) developed based on previously published species-specific markers (Prado et al. 2011; Hashimoto et al. 2013).

Tissues were processed for total RNA extraction using Trizol® (Ambion), following manufacturer’s instructions. Total RNA samples were combined in equimolar amounts into eight different pools: one total RNA pool for each of the seven tissues sampled and a pool containing total RNA from all sampled tissues.

cDNA libraries were constructed for each of the seven tissue-specific RNA pools using TruSeq Stranded RNA Sample Preparation Kits (Illumina, San Diego, CA, USA) following manufacturer’s protocols. Average size of cDNA fragments was 250 bp, ranging from 100 to 580 bp. The seven tissue-specific libraries were sequenced on two lanes with an Illumina HiSeq 2000 platform with a protocol to produce 100 bp paired-end reads (SBS Sequencing Kit, v3). A cDNA library was also constructed with an RNA pool from all seven tissues and sequenced in a single run with an Illumina MiSeq platform, with a protocol to produce 300 bp paired-end reads (SBS Sequencing Kit, v3). Sequencing and library construction were performed at the W. M. Keck Center for Comparative and Functional Genomics at the University of Illinois, Urbana-Champaign (P. reticulatum RNA samples were sent to the United States under export license No. 15BR016615/DF).

RNA-Seq and assembly of P. reticulatum mitochondrial genome

All obtained sequences were processed with Trimmomatic (Bolger et al. 2014) to remove adapters and to trim read ends, removing end bases with quality scores <25. For the remaining positions, a sliding window with the same minimum quality requirement was applied. The MiSeq paired-end reads >300 bp were merged to obtain super-reads with the FLASh program (Magoc and Salzberg 2011).
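The two trimming steps described above can be illustrated with a minimal Python sketch. It is a simplification for illustration only and does not reproduce Trimmomatic's implementation: the function name, the window width of 4 bases, and the decision to trim both read ends are assumptions; only the quality threshold of 25 comes from the text.

```python
def trim_read(seq, quals, min_q=25, window=4):
    """Trim low-quality read ends, then cut at the first low-quality window.

    seq    : read bases (str)
    quals  : per-base Phred scores (list of int), same length as seq
    min_q  : minimum quality (25, as stated in the text)
    window : sliding-window width (assumed value, not given in the text)
    """
    # Step 1: drop leading and trailing bases with quality below min_q.
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1

    # Step 2: scan windows left to right and cut the read at the start
    # of the first window whose mean quality falls below min_q.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_q:
            end = i
            break
    return seq[start:end], quals[start:end]
```

With the hypothetical read `ACGTACGTAC` and scores `[10, 30, 30, 30, 30, 30, 10, 10, 30, 30]`, the first base is removed by end trimming and the read is then cut where the window mean first drops below 25.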

De novo transcriptome assembly was performed using Trinity (Haas et al. 2013). Three assembled transcripts derived from the mtDNA were identified within the assembled transcriptome using BLAST searches with the P. corruscans mtDNA. These three transcripts were further processed, along with super-reads generated with MiSeq sequences, using the SSAKE program (Warren et al. 2007) to generate a final consensus mtDNA sequence.

Quality-trimmed reads were mapped to the P. reticulatum complete mitogenome using BWA software (Li and Durbin 2010) and SNP (Single Nucleotide Polymorphism) detection was performed using Freebayes (Garrison and Marth 2012). SNPs with minor allele frequencies (MAF) >0.09 in positions with Read Depth (RD) >522 were considered for further analysis and annotation.
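The SNP filtering criteria can be expressed as a short Python sketch. The `Variant` record and its field names are hypothetical and stand in for whatever per-site allele counts the variant caller reports; only the two thresholds (MAF >0.09 and RD >522) are taken from the text, and the sketch assumes biallelic sites.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-site record for a biallelic SNP."""
    pos: int
    ref: str
    alt: str
    ref_count: int  # reads supporting the reference allele
    alt_count: int  # reads supporting the alternate allele

    @property
    def depth(self):
        # Read depth (RD) taken here as the sum of both allele counts.
        return self.ref_count + self.alt_count

    @property
    def maf(self):
        # Minor allele frequency: the rarer allele's share of the depth.
        if self.depth == 0:
            return 0.0
        return min(self.ref_count, self.alt_count) / self.depth


def passes_filters(v, min_maf=0.09, min_depth=522):
    """Keep SNPs with MAF > 0.09 at positions with RD > 522."""
    return v.maf > min_maf and v.depth > min_depth
```

Both conditions are strict inequalities, matching the cutoffs as written; a site with high depth but a rare minor allele, or a balanced site with shallow coverage, is discarded.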

D-Loop sequencing

Part of the P. reticulatum D-loop (control) region sequence was obtained by sequencing a 481 bp PCR fragment amplified from each of the same 12 individuals used for RNA sampling. Amplicons were generated with primers CTCTTCCTGCACCCTACCAA and ATTGAGGGCATTCTCACAGG under the following conditions: 30 ng of genomic DNA, 1 pmol of each primer, 5.0 µL of Qiagen Multiplex PCR Plus solution, and 1.0 µL of Q-Solution, in a final volume of 10 µL. Thermal cycling conditions were: 95 °C (15 min), followed by 35 cycles of 94 °C (30 s), 50 °C (1 min) and 72 °C (1 min), with a final step of 72 °C (30 min). PCR products purified with ExoSAP-IT were sequenced in both directions with an ABI Prism 3100 automated sequencer with BigDye Terminator chemistry, following manufacturer’s instructions. Obtained sequences were assembled using Phred and Phrap (Ewing and Green 1998; Ewing et al. 1998). Individual sequences were aligned using ClustalW to identify polymorphisms and obtain a consensus sequence that was integrated with the mtDNA sequence derived from the transcriptome.

mtDNA sequence composition and annotation

The P. reticulatum mtDNA protein-coding genes were first annotated manually using the obtained sequence in comparisons with the P. magdaleniatum (NC_026526.1) and P. corruscans (NC_026846.1) mitochondrial genomes, using BLASTx at the NCBI site. tRNA genes and their secondary structures were identified with tRNAScan-SE (v1.21) (Lowe and Eddy 1997) using default parameters. P. reticulatum rRNA and tRNASer(GCT) genes were compared with P. magdaleniatum and P. corruscans mitogenomes. In addition, automated annotations of the P. reticulatum mitogenome were performed with MitoAnnotator (Iwasaki et al. 2013), DOGMA (Wyman et al. 2004) and MITOS (Bernt et al. 2013).

Mitochondrial genome organization

Nucleotide composition of the mitogenome and its protein-coding genes was determined using BioEdit (v. 7.2.5). Codon usage and relative synonymous codon usage (RSCU) values were calculated with MEGA 6.0 software (Tamura et al. 2013). Codon usage bias (CUB) and its statistical significance were characterized by Codon Deviation Coefficients (CDC). CDC, GC and purine (AG) contents were estimated in 13 protein-coding genes with the Composition Analysis Toolkit (CAT; http://cbrc.kaust.edu.sa/CAT), using the algorithm developed by Zhang et al. (2012). Strand asymmetry (AT and GC skew) was calculated according to Perna and Kocher (1995). The L-strand replication origin (OL) and H-strand replication origin (OH) observed in the non-coding control region (D-Loop) of the P. reticulatum mitogenome were identified through alignments with previously annotated sequences from P. reticulatum and P. magdaleniatum. Regulatory elements were identified through alignments with D-Loop sequences from Tropheus moorii (Fischer et al. 2013), Petrochromis trewavasae (Fischer et al. 2013), Macropodus opercularis (Mu et al. 2015), Oreochromis niloticus (He et al. 2011), Oreochromis aureus (He et al. 2011) and Danio rerio (Broughton et al. 2001).

Sequence comparisons with mitogenomes from Pimelodidae catfish species

The P. reticulatum mitogenome was compared with mitogenomes from four other catfish species from the family Pimelodidae: P. magdaleniatum (GenBank acc. #: NC_026526.1), P. corruscans (NC_026846.1), Pimelodus pictus (NC_015797.1), and Sorubim cuspicaudus (NC_026211.1), through the alignment of the sequences using BioEdit (v. 7.2.5) and MEGA 6.0 (Tamura et al. 2013).

A 460 bp fragment of cytochrome b gene (CYTB) from 55 previously published reference sequences from eight different Pseudoplatystoma species (Carvalho-Costa et al. 2011; Lundberg et al. 2011; Torrico et al. 2009) was analyzed along with the obtained P. reticulatum mitochondrial sequence. The optimal nucleotide substitution model was identified with JModeltest 2 (Darriba et al. 2012). BEAST v.1.8.0 (Drummond et al. 2012) was used to run the Yule speciation model, considering a strict molecular clock, through 100 million MCMC iterations. Obtained results were visualized with FigTree v. 1.4.2 (tree.bio.ed.ac.uk/software/figtree/).

Phylogenetic analysis of Siluriformes with complete mitogenome sequences

The complete P. reticulatum mitochondrial sequence obtained here was analyzed with all 94 publicly available complete Siluriformes mitogenomes, including samples from 29 of 36 described families (Ferraris 2007). The optimal evolutionary model selected was GTR + G + I: General Time Reversible, with rate heterogeneity among sites modeled by a discrete Gamma distribution (+G) with 5 rate categories and a fraction of sites assumed to be evolutionarily invariable (+I). The phylogenetic tree was estimated by maximum likelihood, with bootstrap analysis (10,000 replications), performed with MEGA v6 (Tamura et al. 2013).

Results and discussion

RNA-Seq and assembly of P. reticulatum mitochondrial genome

Reciprocal interspecific hybrids between P. reticulatum and P. corruscans (pintado) are commonly produced for commercial rearing in aquaculture operations (Porto-Foresti et al. 2008). Hybrids are fertile (Prado et al. 2012) and can naturally backcross to purebred individuals in the wild or in captive conditions. Considering that visual evaluation is not fully precise for identifying hybrids, DNA testing of each individual was used to confirm that sampled individuals were all pure P. reticulatum.

De novo assembly of the transcriptome generated 312,766 contigs. Three of these contigs, 9069, 3798, and 1692 bp in length, were observed to be highly homologous to P. corruscans mtDNA sequences (NC_026846.1). Further assembly of these fragments with merged cDNA sequences produced a single 16,281 bp sequence. Because sequencing was performed using cDNAs, the complete D-loop region could not be recovered. Gap filling with Sanger sequencing of PCR products amplified across the region revealed that only 295 bp of the P. reticulatum mtDNA had not been sequenced and assembled. The complete P. reticulatum mtDNA sequence was 16,576 bp in length, representing the first report of the complete mitogenome for the species (GenBank acc. # KU291530).

Sequencing of pooled samples allowed for the identification of mitogenome polymorphisms. A total of 77 SNPs with MAF >0.09 and RD >522, composed of 71 transitions (92.2%) and 6 transversions (7.8%), were observed (Table 1). Eight SNPs were observed in the 12S and 16S ribosomal RNA genes, two in tRNA genes, 30 in non-coding regions, and the remaining SNPs in protein-coding genes. A total of 13% (n = 10) of SNPs observed in coding regions result in amino acid changes (Table S1, Supplementary material). MAF and RD cutoff values were used to eliminate SNPs resulting from sequencing artifacts, observed especially at 3′ ends of reads, from the final annotated sequence (data not shown).
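The split of SNPs into transitions and transversions follows from the chemical class of the two alleles: a substitution within purines (A, G) or within pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. The Python sketch below illustrates this rule and reproduces the reported percentages from the counts given in the text; the function names are chosen here for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Classify a single-nucleotide substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Same chemical class on both sides means a transition.
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def percentages(n_transitions, n_transversions):
    """Return both classes as percentages of all SNPs, to one decimal."""
    total = n_transitions + n_transversions
    return (round(100 * n_transitions / total, 1),
            round(100 * n_transversions / total, 1))


# Counts reported in the text: 71 transitions and 6 transversions.
ts_pct, tv_pct = percentages(71, 6)
```

The strong excess of transitions over transversions seen here is the pattern generally expected for animal mtDNA.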

Table 1 Substitution types of 77 SNPs observed in the P. reticulatum mitogenome

Complete mitochondrial genomes have been typically obtained through sequencing of PCR-amplified overlapping fragments (e.g., He et al. 2011; Hrbek and Farias 2008; Mu et al. 2014; Wang et al. 2014). The rapid progress observed in different NGS technologies has greatly increased the number of complete mtDNA sequences made publicly available for different fish species, including tropical Siluriformes, such as the trans-Andean shovelnose catfish (Sorubim cuspicaudus, Restrepo-Escobar et al. 2014) and Pseudoplatystoma magdaleniatum (Rangel-Medrano et al. 2015). However, only two previous reports of studies using cDNA sequences to generate complete mitogenome de novo assemblies were found (Heidtmann 2014; Mu et al. 2015). The analyzed NGS sequences were originally generated for characterizing the P. reticulatum transcriptome (Villela et al. 2017b), and although the applied strategy to assemble a complete mitogenome for the species required supplemental sequencing to recover the untranscribed portion of the mtDNA control region, the final result was highly satisfactory when compared to other published complete mitogenome sequences.

mtDNA sequence composition and annotation

Manual and automated annotations of the complete P. reticulatum mtDNA, the latter obtained with DOGMA (Wyman et al. 2004), MITOS (Bernt et al. 2013) and the MitoAnnotator database (Iwasaki et al. 2013), are detailed in Table S2 (Supplementary material). Comparisons of the obtained results revealed differences of up to 23 bp in rRNA gene sizes (12S and 16S) and of up to 99 bp in protein-coding gene sizes. Consequently, the initial and/or final positions of these genes also differed depending on the applied annotation method (Table S2, Supplementary material).

There were no differences in start/end positions for tRNA genes annotated manually (with tRNAScan) or automatically with MITOS, DOGMA and MitoAnnotator (Table S2, Supplementary material). The tRNASer(GCT) gene was not identified in the manual annotation, and tRNAPhe(GAA) and tRNAMet(TGG) genes were only identified with DOGMA. The tRNASer(TGA) and tRNALys(TTT) genes were mislabeled as tRNAPhe(AAA) and tRNAAsp(AGG) with DOGMA, respectively, but were correctly identified by other methods. Start and end positions for protein-coding genes were equally identified with MitoAnnotator and DOGMA, while observed differences resulted from the omission of stop codon bases from the final annotation provided by MitoAnnotator. Considering all obtained results, the annotation information obtained with MitoAnnotator was chosen for submission to GenBank (Table 2; Fig. 1). This annotation was compared to publicly available mtDNA annotations from four different Pimelodidae species. Results can be observed in Table S3 (Supplementary material).

Table 2 Complete automated annotation of the P. reticulatum mitochondrial genome (16,576 bp) generated with MitoAnnotator (Iwasaki et al. 2013)
Fig. 1

P. reticulatum mtDNA map generated with MitoAnnotator. Protein-coding genes, tRNAs, rRNAs, and non-coding regions are shown in different colors. Genes located within the outer circle are coded on the H-strand whereas the remaining genes are coded on the L-strand

Mitochondrial genome composition and organization

The observed structure of the P. reticulatum mitogenome is highly similar to that of other vertebrate mitogenomes. All 37 mitochondrial structural genes (13 respiratory chain complex protein-coding genes, 2 rRNAs, and 22 tRNAs) in addition to the control region were observed in the assembled mtDNA sequence (Table 2; Fig. 1). The same Heavy (H) and Light (L) strand coding pattern previously reported for most vertebrates was also observed in the P. reticulatum mitogenome. Only NADH dehydrogenase subunit 6 (NAD6) and eight tRNA genes (tRNAGln(TTG), tRNAAla(TGC), tRNAAsn(GTT), tRNACys(GCA), tRNATyr(GTA), tRNASer(TGA), tRNAGlu(TTC), and tRNAPro(TGG)) were observed to be coded on the L-strand, whereas the remaining genes are coded on the H-strand.

An atypical start codon (GTG) was observed only in COX1, while incomplete stop codons were observed in several protein-coding genes. The presence of incomplete stop codons is a common feature of mitochondrial genes of several vertebrates, including numerous species of fish (Kartavtsev et al. 2007; Alam et al. 2014; Jiang et al. 2014; Mu et al. 2015). Seven genes were observed to have incomplete stop codons: NAD2, COX2, COX3, NAD3, NAD4 and CYTB presented T as stop codon and ATP6 presented TA. These incomplete codons are completed to TAA via post-transcriptional polyadenylation of the mRNA 3′ end (Ojala et al. 1981). All remaining genes were found to have the same complete (TAA) stop codon.

Overlaps between adjacent protein-coding genes are common within vertebrate mitogenomes and have been reported for several fish species (Broughton et al. 2001; Kim and Lee 2004; Guo et al. 2008; Hrbek and Farias 2008; Prosdocimi et al. 2011; Fischer et al. 2013; Jiang et al. 2014; Mu et al. 2015). Overlap lengths between ATP8 and ATP6 (10 nucleotides), NAD4L and NAD4 (seven nucleotides), and NAD5 and NAD6 (four nucleotides) observed in P. reticulatum were within expected sizes. Intergenic spacers were found to total 48 bp across 11 regions (Table 2).

Nucleotide composition of the complete P. reticulatum mitochondrial genome (Table 3) was as follows: A: 31.45%; C: 28.47%; T: 24.63%; and G: 15.44%. GC content (43.92%) was lower than the AT content (56.08%), confirming the existing bias against G in the nucleotide composition of most vertebrate mitogenomes (Broughton et al. 2001; Guo et al. 2008; Fischer et al. 2013). Heavy (H) and Light (L) strand compositional asymmetries have been reported for vertebrate mitochondrial DNA, where the major coding strand (H) has been observed to be relatively rich in A and C in comparison to the L strand (Perna and Kocher 1995; Min and Hickey 2007). Similar trends of strand compositional bias were observed in P. reticulatum mtDNA (Table 3), where the AT and GC skews (Perna and Kocher 1995) were 0.12 and −0.30, representing an excess of A over T and a deficit of G relative to C in the H strand, respectively.
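The skew values can be checked directly from the reported base composition using the definitions of Perna and Kocher (1995): AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C). The following Python sketch applies them to the whole-mitogenome percentages given in the text; the variable and function names are illustrative.

```python
def skew(x, y):
    """Strand asymmetry of two bases: (x - y) / (x + y)."""
    return (x - y) / (x + y)


# Whole-mitogenome base composition (%) as reported in the text.
composition = {"A": 31.45, "C": 28.47, "T": 24.63, "G": 15.44}

at_skew = skew(composition["A"], composition["T"])  # positive: more A than T
gc_skew = skew(composition["G"], composition["C"])  # negative: less G than C

print(f"AT skew = {at_skew:.2f}, GC skew = {gc_skew:.2f}")
```

A positive AT skew and a negative GC skew together indicate that the analyzed strand is enriched in A and C, as described above.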

Table 3 Base composition of P. reticulatum mitogenome protein-coding genes, rRNAs, tRNAs, and the control region

The observed nucleotide composition in P. reticulatum mitochondrial protein-coding genes (Table 3) was A: 30.26%, C: 30.12%, T: 25.36%, G: 14.26%. AT content was greater than GC content. Average AT and GC skew values of the 12 protein-coding genes were 0.09 and −0.36, while for NAD6 (L-encoded) values were −0.49 and 0.59, respectively. Protein-coding genes also showed overall anti-G bias (14.26% of G). In the third codon position, the anti-G bias was larger (5.12% of sites were G), which has been similarly reported in other vertebrate species (Clayton 1991; Meyer 1993; Fonseca et al. 2008). Anti-G biases in third codon positions of mitochondrial protein-coding genes have been reported for Danio rerio (Broughton et al. 2001), where 7% of sites were G; Arapaima gigas (Hrbek and Farias 2008), with anti-G bias of 3.8% G in the third position of the 12 heavy-strand encoded genes; Macropodus opercularis (Mu et al. 2015), which presented 14.8% G in the third codon position; and Oreochromis niloticus and O. aureus (He et al. 2011).
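Third-codon-position composition, the quantity behind the anti-G bias discussed above, can be computed with a few lines of Python. This is a minimal sketch under stated assumptions: the input is an in-frame coding sequence, any incomplete trailing codon (such as a T or TA stop) is ignored, and the example sequence is hypothetical rather than taken from the P. reticulatum mitogenome.

```python
from collections import Counter


def third_position_composition(cds):
    """Return the percentage of A, C, G and T at third codon positions."""
    cds = cds.upper()
    # Ignore an incomplete trailing codon (e.g. a T or TA stop).
    usable = len(cds) - len(cds) % 3
    thirds = cds[2:usable:3]
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    counts = Counter(thirds)
    return {base: 100 * counts[base] / len(thirds) for base in "ACGT"}


# Hypothetical 13-nt example: codons ATG CCA GGC TTA plus a trailing T.
example = third_position_composition("ATGCCAGGCTTAT")
```

Applied gene by gene and pooled, the `G` entry of the returned dictionary corresponds to the third-position G content reported for the protein-coding genes.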

Min and Hickey (2007) showed the impact nucleotide skews have on the amino acid composition of mitochondrial proteins. Mitogenomes with negative GC and positive AT skews encode proteins highly enriched in amino acids encoded by CA-rich codons. The most frequently used amino acids observed in the P. reticulatum mitochondrial genome are leucine, threonine, proline and isoleucine (Table 4; Fig. 2). Threonine and proline are amino acids encoded by CA-rich codons and account for 16.57% of encoded amino acids.

Table 4 Pattern of codon usage in the P. reticulatum mitogenome
Fig. 2

P. reticulatum predicted mitochondrial amino acid usage pattern

Preferred codons for each amino acid in protein-coding gene sequences were identified with the highest estimated RSCU values and were matched to all 22 identified tRNAs (Table 4), with the exception of methionine, alanine and threonine. The CUA codon (leucine, RSCU = 2.39) was the most frequently used, followed by AAC (lysine, RSCU = 1.42), AAU (asparagine, RSCU = 1.17) and AUC (isoleucine, RSCU = 1.01). GCG codon (alanine, RSCU = 0.16) was the least frequently used (Table 4).
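RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 mark preferred codons and the values within a family sum to the number of synonymous codons. The Python sketch below illustrates the calculation for a single synonymous family; the codon counts are hypothetical and are not the values in Table 4.

```python
def rscu(family_counts):
    """Relative synonymous codon usage for one synonymous codon family.

    family_counts maps each synonymous codon of one amino acid to its
    observed count across the analyzed genes.
    """
    n_codons = len(family_counts)
    total = sum(family_counts.values())
    if total == 0:
        # Amino acid absent from the data: no usage to report.
        return {codon: 0.0 for codon in family_counts}
    expected = total / n_codons  # count per codon under equal usage
    return {codon: count / expected
            for codon, count in family_counts.items()}


# Hypothetical counts for the four alanine codons (GCN).
alanine = rscu({"GCU": 20, "GCC": 50, "GCA": 28, "GCG": 2})
preferred = max(alanine, key=alanine.get)
```

In this made-up example the G-ending codon has by far the lowest RSCU, which is the kind of pattern produced by a bias against G at third codon positions.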

Considering only amino acids with fourfold degenerate third positions (valine, serine, proline, threonine, alanine, arginine and glycine), codons complementary to the tRNAs ending in A were the most frequently observed in the P. reticulatum mitogenome. Among the twofold degenerate amino acids, C was the most frequently used nucleotide in the third codon position (Table 4). These results were consistent with the anti-G bias identified in the P. reticulatum mitogenome, since the G nucleotide was the least frequent in the third codon position. This pattern was similarly reported for Danio rerio (Broughton et al. 2001) and is observed in most vertebrates.

Codon usage bias (CUB) has been widely observed in different organisms (Ikemura 1985; Bulmer 1991; Hershberg and Petrov 2008) and is stronger in highly expressed genes used for production of large amounts of protein (Bulmer 1991; Hershberg and Petrov 2008; Zhang et al. 2012; Ma et al. 2014). Codon composition has been reported to have an important role in the regulation of gene expression (Paul et al. 2014), as genes that use codons recognized by more abundant tRNAs may be translated more efficiently (Bulmer 1991; Zhang et al. 2012; Ma et al. 2014). Therefore, accurate quantification of CUB in protein-coding genes may be important for unraveling mechanisms involved in gene evolution, function and expression regulation. Several methods have been proposed for estimating CUB. The codon deviation coefficient (CDC) accounts for both GC and purine (AG) contents, addressing heterogeneous background nucleotide composition (BNC) in all three codon positions, and is considered to be a better estimate to support hypotheses of evolutionary pressures on gene function, independently of underlying reference data sets (Zhang et al. 2012; Paul et al. 2014). Obtained CDC estimates ranged from 0.18 to 0.37. NAD4L was the gene with the highest significant CDC value (0.37, P = 0.00), which supports its function as a highly expressed respiratory chain Complex I gene (Ma et al. 2014). Conversely, COX2, which is part of the cytochrome c oxidase complex and also has a key role in mitochondrial function, was the gene with the lowest significant CDC value (0.18, P = 0.00). In addition, estimated CUB for ATP8 (0.31), the shortest gene analyzed (153 bp), was not significant, which is consistent with simulations reported by Zhang et al. (2012), showing that short sequences tend to generate unreliable CUB estimates (Table 5).

Table 5 Estimates of codon usage bias (CUB) for 13 protein-coding genes of P. reticulatum mitogenome

rRNA and tRNA genes

All P. reticulatum mitochondrial tRNA genes were observed to have anticodons matching the vertebrate mitochondrial genetic code. The two possible anticodons for serine (TGA and GCT) and leucine (TAA and TAG) were observed, following the pattern reported for most vertebrates. Sequence overlaps were observed for tRNAIle and tRNAGln (1 bp), tRNAGln and tRNAMet (1 bp), and tRNAThr and tRNAPro (2 bp). With the exception of tRNASer(GCT), all of the tRNA genes were predicted to fold into classic cloverleaf secondary structures (Table S4, Supplementary material). The amino acid arm and anticodon loop are commonly 7 bp in length, while the TΨC and anticodon arms are 5 bp in length and the DHU arm is 4 bp in length (Broughton et al. 2001; Kim and Lee 2004; Hrbek and Farias 2008). However, variations in size, position and nucleotide composition, as well as several non-complementary pairings, were identified in most of the predicted P. reticulatum tRNA secondary structures (Table S4, Supplementary material).

Non-coding sequences

Two major non-coding regions were observed. A short region with a length of 29 bp (5′-CTT TCC CCG CCG CCT TAA AAA GGC GGG GA-3′), which corresponds to the L-strand replication origin (OL), was observed between tRNAAsn and tRNACys. In addition, one large non-coding region (922 bp), corresponding to the control region (D-Loop), was observed between tRNAPro and tRNAPhe. Base composition analysis revealed that the control region is richer in AT (64.97%, Table 3) than the complete mitogenome (56.08%), as previously reported (Mu et al. 2014, 2015).

Important regulatory elements for the replication and expression of the mitogenome were found in three domains of the D-Loop (Fig. 3), showing high identity in most sequence comparisons with other species (Table 6). The conserved GGGGG-box, which functions as the origin of H-strand replication (OH), was observed in the central domain of the D-Loop region. The highly conserved central domains (CSB-E and CSB-F) were observed to be flanked by Termination Associated Sequences (TAS) and the conserved sequence blocks (CSB-1, CSB-2 and CSB-3), at the 5′ and 3′ ends, respectively. TAS has been reported to be associated with termination of newly synthesized H-strands during replication (Sbisà et al. 1997; Broughton et al. 2001). CSB-1, CSB-2 and CSB-3 contain the origin of H-strand replication (OH) and two promoters for transcription of the Light (L-) and H-strands (LSP and HSP, respectively) (Walberg and Clayton 1981; Clayton 1991; Saccone et al. 1991; Taberlet 1996; Jemt et al. 2015).

Fig. 3

Variable regions and conserved blocks in the non-coding region (D-Loop) of the P. reticulatum mitochondrial genome

Table 6 Similarity of P. reticulatum control region regulatory elements with those of other species

Sequence comparisons with mitogenomes from Pimelodidae catfish species

The structure, content, and order of the mitochondrial structural genes were observed to be similar in all five mitogenomes analyzed (Table S3, Supplementary material). Overall sequence identities between P. reticulatum and P. corruscans, P. magdaleniatum, Sorubim cuspicaudus and Pimelodus pictus were 95, 93, 88 and 87%, respectively (Table 7). Sequence length varied from 16,544 bp to 16,576 bp, with the exception of P. corruscans (NC_026846.1, 16,123 bp), which has a smaller 12S rRNA (515 bp) when compared to the average of the other species (959 bp). An alternative published P. corruscans 12S sequence (Lundberg et al. 2011) is 944 bp in length, indicating that the complete P. corruscans mitogenome sequence (NC_026846.1) is likely to be misassembled.
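Overall identity between two aligned mitogenomes can be computed as the share of matching columns in the pairwise alignment. The Python sketch below is one possible formulation and not necessarily the one used by BioEdit or MEGA: it assumes the sequences are already aligned to equal length and that columns containing a gap in either sequence are excluded from the comparison. The example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two aligned sequences, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not scored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical 10-column alignment: one gapped column, one mismatch.
identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
```

Whether gapped columns are skipped or counted as mismatches changes the result, so the convention should be stated whenever identities from different tools are compared.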

Table 7 Size and structure of the complete mitogenomes of P. reticulatum and four previously published catfish species from family Pimelodidae

Analysis of a cytochrome b fragment (CYTB) derived from the full P. reticulatum mitochondrial consensus sequence, along with previously published CYTB sequences from eight different Pseudoplatystoma species, placed the samples within the generated P. reticulatum group (Fig. 4), confirming that all sampled specimens were from this species. P. reticulatum is part of the P. fasciatum species complex, which has been recently separated into eight species based on their morphological characteristics and geographic distribution (Buitrago-Suárez and Burr 2007). Available molecular data based on nuclear and partial mtDNA sequences (Torrico et al. 2009; Carvalho-Costa et al. 2011) do not fully corroborate the proposed classification, but are not enough to fully support alternative phylogenetic proposals for the Pseudoplatystoma clade. Analysis of the complete mitochondrial genome of all species in the clade will help establish the most solid phylogenetic proposal for the group.

Fig. 4

Phylogenetic relationships of Pseudoplatystoma based on Bayesian analysis of a fragment of cytochrome b (CYTB) using a Yule model in BEAST. Branch lengths were fitted to a strict molecular clock. P. reticulatum highlighted in red indicates the generated sequence. Zungaro and Brachyplatystoma sequences were used as outgroup. Numbers above the branches indicate posterior probabilities. Colored bars in the genealogy represent South American river basins marked on the map

Phylogenetic analysis of Siluriformes with complete mitogenome sequences

A total of 14,652 bp of the complete P. reticulatum mitogenome sequence, analyzed along with 94 publicly available Siluriformes complete mitogenomes from 29 families, revealed 8,573 variable sites (1,288 singletons and 7,285 parsimony informative). Thirteen of all analyzed families had two or more represented species, and 10 of these were placed in monophyletic groups (Fig. 5). The Bagridae group was observed to be monophyletic with the exception of Rita rita, which was closer to Clariidae and Heteropneustidae, with low statistical support. Schilbidae appears to be monophyletic only when species of African (e.g. Claroteidae, without statistical support) and Asian (Pangasiidae, with high statistical support) origin were considered. In the Claroteidae family, a single species of each of the two main subfamilies (Claroteinae and Auchenoglanidinae) was sampled and a high similarity with the African group Schilbidae was observed.
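The distinction between singleton and parsimony-informative sites can be made explicit with a small Python sketch. A site is variable when it shows at least two different nucleotides; it is parsimony informative when at least two of those nucleotides each occur in two or more sequences, and a singleton otherwise. The function name, the treatment of gaps and ambiguous bases (ignored within a column), and the toy alignment are assumptions made for illustration.

```python
from collections import Counter


def classify_sites(alignment, ignore="-N?"):
    """Count constant, singleton and parsimony-informative columns."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    tally = Counter()
    for column in zip(*alignment):
        # Assumption: gaps and ambiguous bases are ignored in each column.
        counts = Counter(b for b in (c.upper() for c in column)
                         if b not in ignore)
        if len(counts) < 2:
            tally["constant"] += 1
        elif sum(1 for n in counts.values() if n >= 2) >= 2:
            tally["parsimony_informative"] += 1
        else:
            tally["singleton"] += 1
    return tally


# Toy alignment of four sequences and five columns.
sites = classify_sites(["ACGTA", "ACGTC", "ATGAC", "ATGAG"])
variable = sites["singleton"] + sites["parsimony_informative"]
```

Under these definitions the two classes partition the variable sites, which matches the reported totals (1,288 + 7,285 = 8,573).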

Fig. 5

Phylogeny of Siluriformes families based on the P. reticulatum obtained sequence (in red) and 94 publicly available full mitogenome sequences. The tree was obtained by Maximum likelihood analysis and branch support was estimated after 10,000 bootstrap runs (values above 50% are shown). Main lineages from Diplomystidae, Loricarioidei and Siluroidei are highlighted

When the Diplomystidae family was considered as the basal group, the expected division between the two major groups Loricarioidei and Siluroidei was observed, confirming previous phylogeny proposals based on sequence analysis from nuclear gene fragments (Sullivan et al. 2006) and morphological data (Diogo 2007). Within Loricarioidei, Callichthyidae was the most differentiated group rather than Trichomycteridae, based on only one species of each family. The group termed Big Asia, containing Bagridae (except Rita rita), Sisoridae, Erethistidae and Amblycipitidae (Sullivan et al. 2006), was observed to have the largest statistical support in the phylogenetic tree among the main groups, which may reflect the larger number of representative species analyzed within each family.

Close analysis of the groups from South America showed that the families contained in Pimelodoidea (Pimelodidae and Pseudopimelodidae) were highly supported as monophyletic, while the genus Pseudoplatystoma was also monophyletic within Pimelodidae. Therefore, most of the observed results are in agreement with other studies (Sullivan et al. 2006; Jondeung et al. 2007) and reinforce the importance of periodic revisions of molecular phylogenies as additional data are generated and made publicly available. Further analysis considering multiple molecular clocks (Drummond and Suchard 2010), and especially including full mtDNA sequences from additional Pseudoplatystoma spp., will contribute to refining the proposed Siluriformes phylogeny.

Conclusions

The first report of the complete Pseudoplatystoma reticulatum mitochondrial genome sequence revealed general gene organization, structure, content, and order similar to most vertebrates and other published Pimelodidae mitogenomes. Specific sequence and content differences were observed and may have functional relevance that warrants further investigation. Additional phylogenetic analyses with multiple molecular clocks should be considered for improving the current phylogenetic proposal for the Pseudoplatystoma clade.