1 Introduction

The emergence of fast, accurate, and cost-effective high-throughput Next Generation Sequencing (NGS) technology has significantly accelerated the exploration of the immense repertoire of small non-coding RNA molecules (Vickers et al. 2015). A rich variety of small RNA molecules with diverse biological activities is produced in different organisms, in various tissues, under different conditions, and at different stages of development. Among them, microRNAs (miRNAs) and small interfering RNAs (siRNAs) have been studied extensively and shown to play a significant role in different aspects of gene expression and its regulation. Transfer RNA-derived fragments (tRFs), which are 15 to 28 nucleotides (nt) long and resemble miRNAs both structurally and functionally, have been detected in diverse species ranging from archaebacteria to humans (Keam and Hutvagner 2015). tRFs are reported to be abundant, second only to miRNAs in the small RNA pool.

Apart from their canonical role as adapter molecules during protein translation, tRNAs have also been shown to be involved in the regulation of cellular functions and metabolism (Orioli 2017). This has paved the way for exploring new dimensions of tRNA biology and their implications in cellular physiology and disease. With the advent of the omics era, the decreased cost of sequencing, and the abundance of sequenced transcriptomic data, insight into tRNA dynamics has revealed that tRNAs also give rise to a novel class of small non-coding RNAs, called tRNA-derived fragments or tRFs, through endonucleolytic cleavage at specific positions (Lee et al. 2009). These cleavage products, 15 to 28 nucleotides in length, have been detected in both prokaryotes and eukaryotes (Sablok et al. 2017). Overexpression of specific tRFs in humans has revealed their role in cancer progression, and tRFs have also been shown to alter cellular dynamics in other organisms under different stress conditions (Sun et al. 2018). Although tRFs are best studied in humans and other organisms, reports in plants confirm that they are associated with Argonaute (AGO) proteins and are involved in regulating gene expression under various abiotic and biotic stresses (Loss-Morais et al. 2013). Because tRFs are functionally similar to microRNAs, and because some have been shown to be differentially over-expressed during abiotic and biotic stresses, their study has recently gained momentum in plants.

In diverse organisms, including plants, tRFs are produced by position-specific enzymatic cleavage of tRNAs and, depending on the cleavage position, are classified into three types: tRF-5, tRF-3, and tRF-1 (Lee et al. 2009). tRF-5s and tRF-3s are generated from the 5′ and 3′ ends of mature tRNAs, respectively, while tRF-1s are derived from the 3′-trailer sequences of pre-mature tRNAs (Fig. 27.1). This cleavage can be mediated by Dicer-like (DCL) proteins or by a DCL-independent process that is yet to be elucidated. The ribonucleases involved in tRF generation have not yet been established, but RNS1 is speculated to be responsible for tRF synthesis in the model plant Arabidopsis thaliana (Alves et al. 2017). The literature also provides evidence for organellar (mitochondrial and plastidial) tRFs, but their potential functions have not yet been revealed (Cognat et al. 2017).

Fig. 27.1 Cleavage of pre-mature and mature tRNA for tRFs generation
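
The three-way classification described above can be expressed as a simple positional rule. The following Python sketch is illustrative only: the function name `classify_trf`, the 1-based coordinates, and the exact rules (a tRF-5 starts at the first base of the mature tRNA, a tRF-3 ends at its last base, and a tRF-1 lies entirely within the 3′ trailer of a pre-tRNA built with 40 nt flanks, as in the Materials section) are assumptions made for this example and are not taken from the PtRFdb pipeline.

```python
from typing import Optional

FLANK = 40  # flank length (nt) used for pre-tRNAs in the Materials section


def classify_trf(ref_kind: str, start: int, end: int, ref_len: int) -> Optional[str]:
    """Classify a fragment from its 1-based, inclusive mapping position.

    ref_kind is "mature" (mature tRNA with CCA) or "pre" (gene plus flanks).
    The rules below are assumptions for illustration, not the authors' code.
    """
    if ref_kind == "mature":
        if start == 1 and end < ref_len:
            return "tRF-5"   # begins at the 5' end of the mature tRNA
        if end == ref_len and start > 1:
            return "tRF-3"   # ends at the 3' end of the mature tRNA
        return None          # internal fragment or full-length tRNA
    if ref_kind == "pre":
        trailer_start = ref_len - FLANK + 1  # first base of the 3' trailer
        if start >= trailer_start:
            return "tRF-1"   # fragment lies entirely within the 3' trailer
        return None
    raise ValueError(f"unknown reference kind: {ref_kind!r}")
```

For a 76-nt mature tRNA, a read mapping to positions 1–20 would be labeled tRF-5 and one mapping to positions 55–76 would be labeled tRF-3 under these assumed rules.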

It has now been established that tRFs accumulate differentially in different plants under various abiotic and biotic stresses, and they are believed to act as regulators of stress-related gene expression. Specific tRFs, viz. AlaAGC, ArgCCT, ArgTCG, and GlyTCC, were reported to be overexpressed under drought and salt stress; ValCAC, TyrUGU, ThrGUA, and SerUGA during heat stress; AspGTC and GlyTCC during phosphate deficiency; ArgCCT under cold stress; ArgTCG and TyrGTA during oxidative stress; and IleAAT, ArgACG, and AlaCGC during pathogen infection (Hsieh et al. 2009; Loss-Morais et al. 2013; Wang et al. 2016a, b; Alves et al. 2017).

tRFs have also been associated with AGO proteins such as AGO1, AGO2, AGO4, and AGO7 during stress conditions (Sablok et al. 2017). It has been discovered that tRFs also interfere with ribosomal proteins and affect translational activity. Apart from translational repression, tRFs affect genome stability by governing the post-transcriptional activity of retrotransposons (Martinez et al. 2017). Recent research suggests that tRFs facilitate root nodule formation and aid arbuscular mycorrhiza growth in leguminous plants (Jin et al. 2018).

For further exploration of tRFs, the currently available web-based portals include MINTbase (Pliatsika et al. 2016), tRFdb (Kumar et al. 2015), tRex (Thompson et al. 2018), and tRF2Cancer (Zheng et al. 2016). However, very little information is available for plant tRFs, with the exception of tRex, a web portal holding information about tRFs detected in Arabidopsis only. This chapter describes our recently developed database, ‘PtRFdb’ (www.nipgr.res.in/PtRFdb), which contains comprehensive information on tRFs detected in 10 evolutionarily diverse plant species (Gupta et al. 2018). The database is intended as a resource for information about the different tRF types in diverse plant species. PtRFdb should help to elucidate new pathways of gene expression regulation in plants and to build a more comprehensive understanding of the cross-talk between small non-coding RNAs and their downstream target molecules.

2 Materials

The tRNA genes of ten plant species, viz. Physcomitrella patens (Version 1.1), Brachypodium distachyon (JGI v1.0 8X), Populus trichocarpa (January 2010 Version 2.0), Oryza sativa (v7.0), Sorghum bicolor (Version 1.0), Medicago truncatula (March 2009 Version 3.0), Arabidopsis thaliana (TAIR10 February 2011), Glycine max (Wm82.a2), Vitis vinifera (Grapevine 12X), and Zea mays (Version 5b.60), were downloaded from GtRNAdb (Chan and Lowe 2009). The reference genomes of these plants were fetched from their respective genomic portals. The tRNA gene sequences of each plant species were then extracted in FASTA format according to the strand information. To generate pre-tRNAs, we additionally extracted the 40 nt upstream and the 40 nt downstream of each tRNA gene. As mature tRNA carries ‘CCA’ at the 3′-end, CCA was added to the tRNA sequences obtained from tRNAscan-SE (Lowe and Eddy 1997). The pre-mature and mature tRNA sequences were combined, and a reference database was created for each plant species using the ‘makeblastdb’ program of BLAST (Basic Local Alignment Search Tool) (Altschul et al. 1990). This reference database was used for the prediction of the three tRF types (tRF-5s, tRF-3s, and tRF-1s).
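
The construction of the reference sequences can be sketched in Python as follows. This is a minimal illustration rather than the scripts used for PtRFdb: the function names, the 1-based inclusive coordinates, and the clipping of flanks at chromosome ends are assumptions, and intron removal is not handled.

```python
# Translation table for complementing DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_pre_trna(chrom_seq: str, start: int, end: int,
                     strand: str, flank: int = 40) -> str:
    """Return the tRNA gene plus `flank` nt on each side, in tRNA orientation.

    `start` and `end` are 1-based, inclusive gene coordinates on the plus
    strand (an assumed convention). Flanks are clipped at chromosome ends.
    """
    lo = max(start - 1 - flank, 0)
    hi = min(end + flank, len(chrom_seq))
    region = chrom_seq[lo:hi]
    return reverse_complement(region) if strand == "-" else region


def mature_trna(gene_seq: str) -> str:
    """Append the 3' CCA tail to a tRNA gene sequence given in tRNA orientation."""
    return gene_seq + "CCA"


def to_fasta(records: dict) -> str:
    """Format {name: sequence} as FASTA text, one sequence per record."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())
```

The FASTA text produced by `to_fasta` could then be written to a file and indexed with a command such as `makeblastdb -in reference.fa -dbtype nucl`, where the file name is a placeholder.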

3 Methods

3.1 Data Procurement

Datasets of small RNA sequencing reads, and small RNA/microRNA data comprising unique sequences with their clonal frequencies, were downloaded from NCBI-SRA (https://www.ncbi.nlm.nih.gov/sra) and NCBI-GEO (http://www.ncbi.nlm.nih.gov/geo/), respectively. These data were further processed for the identification of tRFs.

3.2 tRFs Identification

Fragments 15 to 28 nt in length with a clonal frequency greater than 9 were selected, and BLASTN was performed against the reference database described in Materials. Only reads that mapped to the database along 100% of their length were considered for further study. Similarly, raw reads were processed using stringent filters as described in our published work (Gupta et al. 2018). Reads were also filtered using the tDRmapper software (Selitsky and Sethupathy 2015), and only reads with a quality score of >28 were accepted. Further, to eliminate false positives, only reads with 100% identity and no gaps (0%) were selected. Reads 15 to 28 nt in length were selected for incorporation into PtRFdb.
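
The selection criteria above can be illustrated with a short Python sketch. The cutoffs (15 to 28 nt, clonal frequency greater than 9, 100% identity, no gaps, full-length alignment) come from the text; the function names, the use of BLAST's default tabular output (`-outfmt 6`), and the restriction to sense-strand hits are assumptions made for this example.

```python
import csv
from typing import Dict, Iterable, Iterator, Tuple

MIN_LEN, MAX_LEN = 15, 28   # accepted fragment length range (nt)
MIN_FREQ = 10               # clonal frequency must be greater than 9


def select_candidates(counts: Dict[str, int]) -> Dict[str, int]:
    """Keep unique sequences that pass the length and frequency cutoffs."""
    return {seq: n for seq, n in counts.items()
            if MIN_LEN <= len(seq) <= MAX_LEN and n >= MIN_FREQ}


def perfect_hits(blast_lines: Iterable[str],
                 read_lengths: Dict[str, int]) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (read, reference, start, end) for perfect, full-length hits.

    Expects the 12 default tab-separated columns of BLAST -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid = row[0], row[1]
        pident = float(row[2])
        aln_len, mismatch, gapopen = int(row[3]), int(row[4]), int(row[5])
        sstart, send = int(row[8]), int(row[9])
        full_length = aln_len == read_lengths[qseqid]
        sense = sstart <= send  # assumption: keep only sense-strand hits
        if pident == 100.0 and mismatch == 0 and gapopen == 0 and full_length and sense:
            yield qseqid, sseqid, sstart, send
```

The coordinates returned by `perfect_hits` are positions on the reference sequence and could feed a later step that assigns tRF types.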

3.3 The Web Interface of PtRFdb

After the collection and compilation of all the information, PtRFdb was developed on an Apache Hypertext Transfer Protocol (HTTP) Server with MySQL at the back end for data storage and retrieval. Hypertext Markup Language (HTML), JavaScript, and PHP (Hypertext Preprocessor) were used to design the front end of the web interface. PHP and Perl were used for writing the in-built scripts. The home page of PtRFdb is shown in Fig. 27.2.

Fig. 27.2 Homepage of PtRFdb (http://www.nipgr.res.in/PtRFdb/)

3.4 PtRFdb Features and Tools

PtRFdb holds detailed information on tRFs identified in different plants, organized at two levels: primary and secondary. At the primary level, basic information for each tRF entry is provided, such as tRF type, tRNA name, gene coordinates, plant, tissue, PubMed ID, anticodon of the corresponding tRNA, and GSM number. At the secondary level, sequence length, mapping position, relevant frequency, publication, and sequencing study are provided. For ease of information retrieval, we have provided user-friendly search modules: ‘Basic search’, ‘Advanced search’, and ‘Browse’ (Figs. 27.3, 27.4, and 27.5). For each query, up to ten different fields can be displayed. The ‘GSM number’, ‘Sequence’, and ‘PMID’ columns are linked to the corresponding experimental details, sequence details, and research publications, respectively, as highlighted in the search result in Fig. 27.4.
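
The two-level record described above could be modeled as follows. This Python sketch is illustrative only: the class name, attribute names, and types are hypothetical and do not reflect the actual MySQL schema of PtRFdb. Only the list of fields and the limit of ten displayed fields are taken from the text.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List

MAX_DISPLAY_FIELDS = 10  # up to ten fields can be displayed per query


@dataclass(frozen=True)
class TrfEntry:
    """One tRF record; attribute names are hypothetical."""
    # Primary level
    trf_type: str
    trna_name: str
    gene_coordinates: str
    plant: str
    tissue: str
    pubmed_id: str
    anticodon: str
    gsm_number: str
    # Secondary level
    sequence: str
    mapping_start: int
    mapping_end: int
    frequency: int
    publication: str
    sequencing_study: str

    @property
    def sequence_length(self) -> int:
        """Length of the tRF sequence in nucleotides."""
        return len(self.sequence)


def select_fields(entry: TrfEntry, fields: List[str]) -> Dict[str, object]:
    """Return only the requested fields, enforcing the ten-field display limit."""
    if len(fields) > MAX_DISPLAY_FIELDS:
        raise ValueError(f"at most {MAX_DISPLAY_FIELDS} fields can be displayed")
    record = asdict(entry)
    unknown = [name for name in fields if name not in record]
    if unknown:
        raise KeyError(f"unknown field(s): {unknown}")
    return {name: record[name] for name in fields}
```

Keeping the sequence length as a derived property rather than a stored value avoids inconsistencies between the sequence and its recorded length.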

Fig. 27.3 Basic search page of PtRFdb (http://www.nipgr.res.in/PtRFdb/search.php)

Fig. 27.4 Advanced search and its corresponding result page in PtRFdb (http://www.nipgr.res.in/PtRFdb/cond.php)

Fig. 27.5 Browse section of PtRFdb (http://www.nipgr.res.in/PtRFdb/browse.php)

Advanced search supports conditional and Boolean operators for user-built customized searches. The ‘Browse’ section of PtRFdb lets the user browse in three ways: by individual plant, by tRF type (i.e. tRF-5, tRF-3, and tRF-1), and by anticodon type.
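
A search of this kind, combining the conditional operators ‘=’ and ‘Like’ with ‘AND’ and ‘OR’ (see Notes), might be translated into a parameterized SQL query as sketched below. This is not the PtRFdb implementation, which uses PHP and MySQL: the table name `trf`, the column names, and the toy rows are hypothetical, and SQLite stands in for MySQL so that the example is self-contained.

```python
import sqlite3
from typing import List, Sequence, Tuple

ALLOWED_FIELDS = {"plant", "trf_type", "anticodon", "tissue"}  # hypothetical columns
ALLOWED_OPS = {"=": "=", "like": "LIKE"}
ALLOWED_JOINERS = {"AND", "OR"}


def build_where(conditions: Sequence[Tuple[str, str, str]],
                joiners: Sequence[str]) -> Tuple[str, List[str]]:
    """Build a WHERE clause with placeholders from (field, operator, value) triples.

    `joiners` holds the AND/OR keyword placed between consecutive conditions.
    Field and operator names are checked against whitelists; values are bound
    as parameters and are never interpolated into the SQL text.
    """
    if not conditions or len(joiners) != len(conditions) - 1:
        raise ValueError("need n conditions and n - 1 joiners")
    parts: List[str] = []
    params: List[str] = []
    for i, (field, op, value) in enumerate(conditions):
        op_key = op.lower()
        if field not in ALLOWED_FIELDS or op_key not in ALLOWED_OPS:
            raise ValueError(f"unsupported condition: {field} {op}")
        if i > 0:
            joiner = joiners[i - 1].upper()
            if joiner not in ALLOWED_JOINERS:
                raise ValueError(f"unsupported joiner: {joiner}")
            parts.append(joiner)
        parts.append(f"{field} {ALLOWED_OPS[op_key]} ?")
        params.append(f"%{value}%" if op_key == "like" else value)
    return " ".join(parts), params


def demo_rows() -> list:
    """Run one example query against an in-memory table of made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trf (plant TEXT, trf_type TEXT, anticodon TEXT, tissue TEXT)")
    conn.executemany("INSERT INTO trf VALUES (?, ?, ?, ?)", [
        ("Arabidopsis thaliana", "tRF-5", "AlaAGC", "leaf"),
        ("Arabidopsis thaliana", "tRF-3", "GlyTCC", "root"),
        ("Oryza sativa", "tRF-5", "ArgCCT", "leaf"),
    ])
    where, params = build_where(
        [("plant", "Like", "thaliana"), ("trf_type", "=", "tRF-5")], ["AND"])
    rows = conn.execute(
        f"SELECT plant, trf_type, anticodon FROM trf WHERE {where}", params).fetchall()
    conn.close()
    return rows
```

Note that SQL evaluates AND before OR, so a query mixing both operators without parentheses may group conditions differently from the order in which they were entered.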

On the ‘BLAST’ page of PtRFdb (Fig. 27.6), BLASTN of any query nucleotide sequence can be performed against a particular plant species or against all available datasets, as selected by the user.

Fig. 27.6 Blast page of PtRFdb (http://www.nipgr.res.in/PtRFdb/blast.php)

To assess the significance of a BLAST match, an ‘Expect value’ (E value) ranging from 0.001 to 100 can be selected. The ‘Method’ section outlines the steps used to identify tRFs in our study, from downloading the raw datasets to tRF prediction. The ‘Statistics’ page provides graphical and tabular representations of the overall distribution of the different tRF types in individual plant species. Lastly, the ‘Help’ section guides users in understanding and navigating the different modules of PtRFdb.

4 Notes

  1. A total of 1344 sequencing datasets from ten plant species were used for the identification of tRFs.

  2. Information associated with all analyzed GEO samples was fetched using the ‘SRAdb’ and ‘GEOmetadb’ packages of the Bioconductor project (http://www.bioconductor.org) and merged into each entry of PtRFdb.

  3. In-house Perl scripts were used to extract the mapping coordinates.

  4. The database holds 487,765 tRF entries (258,439 tRF-5s, 225,380 tRF-3s, and 3946 tRF-1s).

  5. A total of 5607 unique tRF sequences are incorporated in PtRFdb (2580 tRF-5s, 2269 tRF-3s, and 758 tRF-1s).

  6. The majority of the tRFs were 18–24 nt in length, and tRF-5s were the most abundant of all tRFs.

  7. In the advanced search option, the conditional operators ‘=’ and ‘Like’, coupled with the logical operators ‘AND’ and ‘OR’, were incorporated for user-built customized searches.

  8. To provide flexibility in searching, ‘containing’ and ‘exact’ options have been provided.

  9. BLAST version 2.6.0 was used in PtRFdb.

  10. Apache, PHP, and MySQL were preferred for database development because they are free, open-source, and platform-independent.

  11. In the future, the database will be updated with additional data.

  12. For more details on PtRFdb, refer to our published paper (Gupta, N., Singh, A., Zahra, S., and Kumar S. PtRFdb: a database for plant transfer RNA-derived fragments. Database (Oxford). 2018 Jan 1; 2018. doi: 10.1093/database/bay063. PMID: 29939244).