Introduction

The classical transfer RNA molecules, or tRNAs, are the most abundant and highly conserved class of non-coding RNAs, existing in all forms of life. tRNAs with a length ranging from approximately 70–100 nucleotides (nt) can be represented as a cloverleaf with four stems, and the three-dimensional structure as an ‘L’ shape. These distinctly structured molecules act as a connecting bridge between the genetic code and the corresponding translation product (protein), and can be said to be genetic decoders of the code (Barciszewska et al. 2016). Apart from their conventional role as adapters in protein translation, tRNAs play a diverse role in cellular functioning (Raina and Ibba 2014). In recent years, a burgeoning of studies related to the extra-translational roles of tRNAs has been performed, revealing new dimensions and perspectives in tRNA biology. tRNA can now be regarded as a multi-functional molecular entity governing different aspects of cellular physiology and metabolism in both prokaryotes and eukaryotes (Goodenbour and Pan 2006). Several extra-translational functions of both charged, as well as uncharged tRNAs in cell signaling, stress regulation, and ribosomal stability have been established (Giegé 2008; Orioli 2017). These are also involved in the apoptotic regulation process in the mammalian system (Schimmel 2017). tRNAs upon endonucleolytic cleavage give rise to tRNA halves as well as transfer RNA-derived fragments (tRFs) which are associated with pathogenesis, and stress signaling along with other functions in a wide range of organisms, including plants (Keam and Hutvagner 2015; Li et al. 2018; Sablok et al. 2017). tRNA-related short interspersed elements (SINES) with unknown functions are also detected in various organisms (Sun et al. 2007). The existence of isoacceptors and isodecoders adds another layer of intricacy to the tRNA world (Geslain and Pan 2010; Pan 2015). Although protein translation is their most studied role, innumerable complexities associated with tRNA biogenesis, processing, intron sequences, pseudogenes, suppressor tRNAs, modifications, interactions with different proteins, diversity, and the differential expression of different isoacceptors and isodecoders highlight another level of complexity in the tRNA world (Herring and Blattner 2004; Lorenz et al. 2017; Pan 2018; Tutar 2012; Yoshihisa 2014; Zahra et al. 2021). In plants, translation occurs not just in the cytosol but also in mitochondria and chloroplast. Thus, there is a higher level of complexity associated with tRNA synthesis, the enzymatic machinery involved, localization, and their intricate interactions and functioning within the different compartments of the plant cell (Marechal-Drouard et al. 1993). The gain and loss of introns amongst diverse plant species reveal their significance during various stages of evolution, and aids in studying comparative genomics across the plant kingdom (Stange, Beier and Beier 1992). The current era of robust Next Generation Sequencing (NGS) technology has provided a massive opportunity to invade and explore the genomics and extensive transcriptomic repertoire. The field of tRNA biology, particularly related to the structural and functional complications of tRNAs, has long piqued the interest of scientists and this curiosity continues to thrive. To explore the tRNAomic world in a systematic and specific way from a genetics and functional genomics perspective, web repositories have been developed from time to time. Some of the famous and currently available web portals for tRNA-related information include Rfam (Griffiths-Jones et al. 2003), tRNAdb (Abe et al. 2009), GtRNAdb (Chan and Lowe 2016), tRNADB-CE (Abe et al. 2009), and plantRNA (Cognat et al. 2013). The genomic tRNA database (GtRNAdb) harbors tRNA gene predictions by tRNAscan-SE (Lowe and Eddy 1996) from three kingdoms of life including 17 plant species. However, this database contains predicted tRNA majorly from archaebacteria, bacteria, and higher eukaryotes. The first plant-exclusive web repository (e.g., plantRNA) was developed in the year 2013. This knowledge base contains annotated around 4350 tRNA gene sequences, and their functions from complete nuclear and organellar genomes belonging to 11 evolutionarily diverse plants (recently updated to 48 photosynthetic species and 35,000 tRNA genes) (Cognat, Pawlak, Pflieger, and Drouard, 2021). Due to the availability of a large number of sequenced genomes, there is still a lacuna that requires filling for the exploration of the vast plant tRNAome in a wider spectrum. Here, we have introduced PtRNAdb (www.nipgr.ac.in/PtRNAdb), the plant-exclusive web repository that brings together tRNA gene sequences and other related information from 3577 different nuclear and organellar genomes (i.e., mitochondrial and plastidial genomes) from lower plants like algae, bryophytes and higher plants including monocots and dicots currently available at the National Center for Biotechnology Information (NCBI). The tRNA gene prediction was done using the robust tRNAscan-SE program and each sequence predicted has undergone precise manual screening and filtering. The manually curated set of 113,849 tRNA gene sequences has been incorporated for enhancing the quality of information related to plant tRNAs. Further, for making it more plant specific, we have built infernal models for isoacceptor and isodecoder molecules for each plant species. This will be beneficial for studying plant-specific tRNA biological features as tRNAscan-SE prediction alone predicts plant tRNA based on the eukaryotic model. Also, we have performed the tRNA consensus study for each isoacceptor and isodecoder using sequence and structure-based alignment, which will be highly useful for studying the phylogeny of tRNAs as they can be the potential markers for evolution of plants, and tRNAs as well. Although tRNAs are believed to be among the most ancient and highly conserved molecules on earth, there is a need to utilize them in studying the diversification of tRNA species and organelles in planta.

Additionally, biological details relevant to the tRNA molecules like A and B box sequence elements, intron sequences along with their position, length, GC content, and secondary structures of both precursor and mature tRNAs are also integrated into this database. To the best of our knowledge, PtRNAdb is the best platform for exploring plant tRNA genes, for addressing questions related to their functions beyond their role as adapters, and will help in advancing the field of plant tRNAomics. The overall representation and major sections/features of PtRNAdb are shown in Fig. 1.

Fig. 1
figure 1

Overview of the web interface, basic functions and features of PtRNAdb

Methodology

Data retrieval

The plant genomes with chromosomal-level genome assembly were downloaded from the National Center for Biotechnology Information (NCBI) database genome browser. Those plant genomes that had scaffold or contigs level of information were excluded to avoid unambiguity in the study. The organellar genomes of only those plants were retrieved for which nuclear data at the chromosomal level was also available. Furthermore, any extra sequences present in nuclear genomes were manually removed. To download the organellar genomes, i.e., mitochondrial and plastidial, the ‘esearch’ and ‘efetch’ utilities of NCBI were employed. The ‘E-utilities’ use a fixed URL syntax that translates a standard set of input parameters into the values necessary for various NCBI software components to search and retrieve the requested data.

tRNA gene prediction

The tRNAscan-SE tool developed by Low and Eddy in 1997 (Lowe and Eddy, 1996) was used to predict the tRNA sequences from the respective genome. It offers high accuracy and reliable predictions and has been utilized for tRNA gene prediction in tRNA research (Lowe and Chan 2016). In this study, tRNAscan-SE (Version 2.0.3) was used with eukaryotic (-E) and organellar (-O) modes for nuclear [tRNAscan-SE -detail -H -E -y -f# -s# -m# -b# -a# -l# -d] and organellar genomes [tRNAscan-SE -detail -H -O -y -f# -m# -b# -a# -l# -d], respectively. The complete command-line options are elaborately discussed in the ‘Methods’ section of the PtRNAdb database.

tRNA sequence filtration

For reporting only reliable and accurate tRNA sequence predictions, various filters were applied to the tRNAscan-SE output. The filtration of tRNAs encoded from the nuclear genomes was done by filtering out the pseudogenes, undetermined tRNAs (i.e., having unambiguous isoacceptors and isodecoders), removing tRNA sequences with more than 2% unambiguous nucleotide composition (i.e., N/n), removing large intronic sequences from tRNAs, and removing organellar (i.e., mitochondrial and plastidial) imported nuclear tRNAs (Michaud et al. 2011). In the organellar-encoded tRNAs, the undetermined tRNA sequences (i.e., those having unambiguous isoacceptors and isodecoders) were filtered out, and the tRNA sequences with more than 2% of unambiguous nucleotide composition (i.e., N/n) were also discarded. After these filtering steps, the total number of tRNA sequences was reduced, but the quality of the overall dataset improved significantly, which is proved by further analysis.

tRNA consensus sequence-based study

Following the filtering steps, a consensus sequence-based study was performed on the tRNA sequences of each plant. The consensus study was performed on the isoacceptors and isodecoders of each plant, and the output was a consensus alignment and consensus structure using RNAalifold (Bernhart et al. 2008). For this consensus sequence-based study, the tRNA sequences from each amino acid were grouped into a single fasta file, and similarly, tRNA sequences from each anticodon/isodecoder were collected into a single fasta file. As a result of this grouping, each plant species now has 20 fasta files organized according to the isoacceptors and 64 organized by isodecoder. After this, all the fasta files were provided as input to the LocARNA (Will et al. 2012) software for consensus sequence study via an in-house shell script. The LocARNA uses Clustal W (Thompson et al. 1994) to align the tRNA sequences. The consensus alignment and consensus-based tRNA structure are available for all the plants present in PtRNAdb and can be accessed from the database detailed page by clicking on the respective isoacceptor or isodecoder. Moreover, a consensus-based phylogenetic tree was constructed for each isoacceptor and isodecoder of all plants using the Environment for Tree Exploration (ETE) (Ghosh et al. 2010). This consensus-based phylogenetic tree is also available for all the plants along with the consensus alignment and structure. However, the isoacceptor and isodecoder groups, which contain only one tRNA sequence, do not have any consensus-based alignment or consensus-based structure since it is only a single sequence. For those single tRNA sequence entries, RNAfold was used, which is a utility of the ViennaRNA package 2.0 (R. Lorenz et al. 2011) to model the tRNA structure.

Building tRNA infernal covariance models

Apart from this, probabilistic models were built for each amino acid and anticodon group using various infernal utilities. Probabilistic profiles of the sequence and secondary structure of the isoacceptors and isodecoders were built, known as covariance models (CMs). Infernal (‘INFERence of RNA ALignment’) (Nawrocki et al. 2009) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence. To build these infernal models, first, the Stockholm files of each isoacceptor and isodecoder group were provided to the ‘cmbuild’ utility of infernal. The Stockholm files were obtained from the previous step in which the consensus study was performed using ‘mlocarna’. The ‘cmbuild’ utility builds the initial covariance model from an input multiple alignment or Stockholm file. After this, the ‘cmcalibrate’ utility is used, which calibrates the E-value parameters for the covariance model. In this last step, we need to integrate our models into the database, therefore, the ‘cmpress’ utility was used to format a CM database into a binary format for ‘cmscan’ that is integrated into the PtRNAdb. Users can provide their query sequence in the ‘analyze’ module of the PtRNAdb database and select the respective plant and isodecoder or isoacceptors model to run cmscan on the query sequence. The output can be downloaded from the result page of the infernal module.

PtRNAdb web interface development

We have developed PtRNAdb using the Apache HTTP server (version 2.4.6) integrated with PHP (version 7.3.3) and MySQL (version 8.0.15) on a server machine with Linux as the operating system. PHP and JavaScript were used to develop the user-friendly web interface of the database, while MySQL (version 8.0.15) was used to process the data at the back-end. The graphical representation of the overall architecture of PtRNAdb along with the features is displayed in Fig. 1. CSS and HTML were used to make the template responsive. Perl and shell scripts were also integrated at the back-end of the database for multiple file handling and data manipulation.

Features incorporated in PtRNAdb

This database was developed for providing a user-friendly and simple interface. To accomplish this aim, several options are provided on the database webpage.

SEARCH

On the ‘SEARCH’ page of PtRNAdb, users can make complex search queries by selecting multiple options and refining the number of outputs to very specific results, or else select fewer options for broad results. It is designed for users to search this database using different combinations of search terms or a single query also. Various search options include Genome type, i.e., nuclear, plastidial, or mitochondrial, Plant category or Plant Name, Amino acid, Anticodons, and Infernal Score Range. By default, the ‘Mitochondrial-Dicot’ category in the ‘Search by Genome’ option is selected and the ‘Infernal score’ is set to maximum.

BROWSE

The ‘BROWSE’ module provides two options to browse through all the entries present in the database: ‘Browse by tRNA sequence length range’ and ‘Browse by plant family’. This module is very useful in cases where users do not have any pre-handed information about tRNAs or the database and just want to understand the features of the database. By selecting a single plant in the ‘Browse by plant family’ module, the viewer can visualize the tRNA summary table, i.e., the number of isoacceptors and isodecoders displayed in tabular format for both nuclear and organellar genomes. Apart from the modules, the output tables of the search and browse modules are highly dynamic. In the output search table, users can further refine the output according to their requirements by using the search box and also shuffle through a large number of output entries easily by using the pagination links. It is expected that these additional features will make the user’s experience more pleasant and make it easier to understand the database.

BLAST

Apart from the ‘SEARCH’ and ‘BROWSE’ modules, the ‘BLAST’ module was also incorporated. In this module, we have integrated the ‘blastn’ option from NCBI-BLAST software package (v2.11.0) (Altschul et al. 1990). This module is very useful for finding regions of similarity between the user input FASTA sequences and PtRNAdb database sequences or PlantRNA database sequences, NCBI reported tRNA sequences. A user can also manipulate the E-value by using the drop-down option on the ‘blast module’ which is set at 0.01 by default.

ANALYZE

This module will run cmscan on the user input sequence to find hits against the PtRNAdb database infernal models built of the tRNA sequences. cmscan will structurally align the user query sequence with the plant-specific tRNA infernal models (isoacceptor or isodecoder model) selected by the user. The user can select different models according to the input sequence.

Results and discussion

Altogether, 109,902 tRNA genes are registered from the nuclear genome belonging to 106 photosynthetic organisms. We also analyzed 127 organellar genomes, out of which 89 were plastid genomes, and 38 were of mitochondrial origin. A total of 2930 and 1017 tRNA genes were identified from plastid and mitochondrial genomes, respectively. The majority of the nuclear and organellar genomes are from dicots, i.e., 69.8% (nuclear) and 71.7% (organellar) genomes, followed by monocots viz. 27.4% (nuclear) and 23.6% (organellar) genomes. Both algae and bryophyte samples were present in low numbers (less than 5%), while no pteridophytes and gymnosperms are present in this study. Additionally, the ‘statistics’ page shows important statistical information related to PtRNAdb. It provides an overview of plant-wise PtRNAdb data analyzed using dynamic charts.

A total of 113,849 tRNA entries are available in the current version of PtRNAdb. The relationship between the genome size of the plants and the number of tRNAs, pseudogenes, and introns is shown in Fig. 2 for all the three levels of genomes (i.e., nuclear, plastidial, and mitochondrial). It is quite clear that with the increase in the nuclear genome size, the number of tRNAs, the number of pseudogenes, as well as introns is increasing (Fig. 2A). Similarly, in Fig. 2B, C, a similar trend is visible except there are no pseudogenes. This is because the tRNAscan-SE algorithm cannot perform pseudogene prediction for the organellar genome. Therefore, only the genome size (in Kb), number of tRNAs, and number of pseudogenes are shown for plastid and mitochondrial genomes. The initial number of tRNAs and the number of true tRNAs obtained after filtration steps are available in Supplementary Table 1. As a result of the filtration steps performed, the quality of the tRNA sequences substantially improved, which was further observed in the consensus sequence study using the LocARNA. All the results and data available on the database can be retrieved easily, and to understand each module of the database comprehensively, users can refer to the ‘Help’ page of PtRNAdb, where the usability of all the modules is explained in detail.

Fig. 2
figure 2

Relation between genome size, number of tRNAs, number of pseudogenes and number of introns of nuclear, plastid and mitochondrial genomes. A Shows relation between genome size in Mb, number of tRNAs, number of pseudogenes and number of introns in nuclear genome samples. B Shows relation between genome size in Kb, number of tRNAs and number of introns in plastid genome samples. C Shows relation between genome size in Kb, number of tRNAs and number of introns in mitochondrial samples

To check the tRNA sequences in our database, we also cross-validated our tRNA sequences with those of the PlantRNA database. Therefore, all the tRNA sequences in the PlantRNA database were retrieved and a database was developed using the ‘makeblastdb’ utility of NCBI-BLAST. All the true tRNA sequences from the PtRNAdb database were provided as query input sequence data and ‘blastn’ was run using PlantRNA as the database. As a result of this analysis, we found 1346 tRNA sequences from the PlantRNA database that were identical to the 53,315 tRNA sequences in the PtRNAdb database. Moreover, there were 1640 tRNA sequences from the PlantRNA database with more than 75% identity with the 90,022 tRNA sequences in the PtRNAdb. Although there is a huge difference in the number of tRNA sequences reported in the PlantRNA and PtRNAdb, since the PlantRNAv1 database reports tRNAs from only 12 plant species (at the time of developing PtRNAdb), and PtRNAdb reports tRNAs from more than 100 plant species, the evident number of tRNA matches indicates that the data from PtRNAdb is reliable.

We have built covariance models (CMs) using various utilities of infernal like ‘cmbuild’, ‘cmcalibrate’, and ‘cmpress’ so that users can structurally align their query sequences with different isotype, and isodecoder models from various plants in PtRNAdb. The consensus RNA secondary structure profiles (i.e., the CMs) consider both primary sequence, and secondary structure conservation that are very useful for finding statistically significant homologous tRNAs (Nawrocki et al. 2009). This feature is available through the ‘ANALYZE’ module of PtRNAdb, which is one of the key features of PtRNAdb. A user have to select a plant among one of the three genome levels, followed by selecting the isoacceptors or isodecoder. Results of this module are available to download in two different formats, i.e., in tabular format as well as a detailed output format as exactly shown on the result page of this module. To structurally align the user’s query sequence with the covariance model selected by the user, the ‘cmscan’ utility of Infernal is used. Since ‘cmscan’ or ‘cmalign’ of the ‘Infernal’ tool is always considered best for proper structural alignments, this module will provide the user with accurate alignments. Furthermore, the consensus-based study shows promising tRNA sequence consensus alignments and consensus tRNA structures. Although some of the consensus structures of tRNA are not of very good quality in terms of their clover-leaf structure, in the future this issue can be resolved by developing a tRNA loop, and stem-specific alignment algorithm. The consensus study aims to take into account whole groups of tRNAs based on isoacceptors and isodecoders, which is more accurate than individual tRNA structures. However, the tRNA sequences that had only a single isoacceptor or isodecoder did not have any consensus-based alignments or structures. In such cases, RNAfold, which is a utility of the ViennaRNA package (R. Lorenz et al. 2011), was used to build the tRNA structures and is shown on the detailed output page along with the RNA Dot Plot generated by RNAfold.

The genomic tRNA evolution is still an interesting domain for researchers as it allows interpretation of their molecular evolution in the plant kingdom (Mohanta et al. 2019). The phylogenetic analysis will be advantageous to understand how nuclear and organellar-encoded tRNAs have evolved from common ancestors. We have incorporated phylogenetic trees based on consensus sequences built for each tRNA entry (both anticodon-wise and amino acid-wise). This will be beneficial in revealing the evolutionary relationship and structural variation among the diverse tRNAs. This will further summarize the similarity between the synonymous codons coding for a specific amino acid. This will also highlight the similarities and the deviations in the sequence composition of isoacceptors and isodecoders.

Besides their fundamental role in protein translation, the structural as well as functional tRNA complexity has intrigued tRNA biologists over the years and is ever-evolving. New functions indicate that tRNA production and modification affect overall plant growth as well as the defense response to various stresses (Ma et al. 2020; Soprano, Smetana, and Benedetti 2018). Studying the organization of tRNA genes and the regulation of their transcription will provide clues about their unknown functions in plants. The existence of a large number of isodecoders brings into question their role within the plant cell. Moreover, the relation between codon usage and tRNA gene copy number can give novel insights into the relevance and impact of favored codons in the plant system. Studying phylogenetic trees will decipher the evolutionary history of tRNA isoacceptors and isodecoders. Thereby, this plant tRNA knowledgebase is aimed at providing large-scale accurate information about plant tRNAs and aiming to aid in experimental tRNA research.

Future directions

The PtRNAdb will be updated continuously with new and more accurate information, along with better annotation of the existing tRNA sequences. We will continue to upgrade the quality of the web interface and offer new search possibilities. A long-term aim of this work will be to enrich the biological information content of the databases, with profiles, the description of tRNA gene expression profiles, the description of occurring tRNA-derived fragments (tRFs), and 3D structure models/clover-leaf models of all the tRNAs.