1 Introduction

The development of high-throughput transcriptome sequencing techniques led to the discovery of an intriguing class of circular RNA (circRNA) molecules without free ends (Jeck et al. 2013). Several reports established that circRNAs are an abundant and ubiquitously expressed RNA species in eukaryotes with regulatory potential (Hansen et al. 2013; Jeck et al. 2013). Furthermore, it is well established that most circRNAs are generated via backsplicing of canonical mRNA precursors, in which one or multiple exons undergo head-to-tail joining to produce a circRNA (Jeck et al. 2013). The unique chimeric backsplice junction sequence helps in circRNA identification, while the lack of free ends makes circRNAs resistant to exonucleases. Over the last few years, thousands of circRNAs have been identified in diverse biological samples (Vromman et al. 2021).

The identification of a large number of circRNAs in recent years has raised questions about their biological relevance in health and disease. It has since been demonstrated that circRNAs play a critical role in gene regulation through different mechanisms of action, including acting as miRNA sponges, associating with RNA-binding proteins (RBPs), and serving as templates for protein translation (Hansen et al. 2013; Panda 2018; Das et al. 2021; Sinha et al. 2022b). Therefore, various computational tools have been developed to systematically explore the possible functions of circRNAs before experimental validation. This chapter describes the detailed analysis of circRNA functions using computational tools (Fig. 1).

Fig. 1 Flowchart depicting computational tools for the analysis of circRNA functions. The steps are raw data collection, circRNA identification and annotation, retrieval of the mature sequence, and computational analysis of circRNA functions such as circRNA-miRNA interaction, sponging, and translation

2 Tools

  • Computer with Windows/Mac/Linux operating system and Internet browser such as Internet Explorer/Mozilla Firefox/Google Chrome

  • Microsoft Excel to plot the bar graphs for data visualization

  • BEDtools

  • Python

  • Cytoscape

  • IRESfinder

3 Method

3.1 Finding the Mature CircRNA Sequence

Endogenous RNA molecules such as protein-coding mRNAs and noncoding RNAs like lncRNAs are transcribed from the genome and spliced to generate the mature functional RNA molecules. Backsplicing of the precursor mRNAs generates circRNAs comprising head-to-tail joined backsplice junction sequences and the intervening exons or introns (Jeck et al. 2013). Finding the composition of the mature circRNA sequence is a vital step toward predicting the downstream functions. High-throughput RNA sequencing followed by analysis using specialized tools such as CIRCexplorer identifies circRNAs present in total RNA sequencing data (Zhang et al. 2016). These tools identify circRNAs based on the chimeric backsplice junction and report the genomic coordinates (start and end) of the circRNA, exon information, gene name, etc. For example, circAkt3, a circRNA generated from the Akt3 gene, has the chromosomal coordinate ID chr1|176,930,527|176,937,304|. The UCSC Genome Browser can be used to obtain the mature sequence of a circRNA from its genomic coordinates by joining the exonic or intronic sequences present between the backsplice coordinates (Lee et al. 2022). Furthermore, the mature sequence of previously annotated circRNAs can be retrieved from circRNA databases such as circInteractome, circAtlas, PanCircBase, and others (Panda et al. 2018; Vromman et al. 2021; Wu et al. 2020; Sinha et al. 2022a). However, these resources do not provide the mature sequence of novel circRNAs identified by a circRNA annotation tool. BEDtools is one of the most popular software suites for the retrieval and manipulation of genomic and spliced sequences of genes expressed in cells (Quinlan and Hall 2010).

  1. 3.1.1

    In order to find the mature sequence of circRNAs using BEDtools (https://bedtools.readthedocs.io/en/latest/), prepare an appropriate BED12 input file containing all the 12 requisite fields such as “chrom”, “chromStart”, “chromEnd”, “name”, “score”, “strand”, “thickStart”, “thickEnd”, “itemRgb”, “exonCount”, “exonSizes” and “exonOffsets” (https://genome.ucsc.edu/FAQ/FAQformat.html#format1). [Note 4.1]

  2. 3.1.2

    Create the input.bed file for circAkt3 using the information from the circRNA annotation file generated from the circRNA analysis tool, CIRCexplorer2 or similar tools (Szabo and Salzman 2016; Zhang et al. 2016).

  3. 3.1.3

    Once the input.bed file is prepared, the bedtools getfasta operation can be used to get the mature spliced sequences of circRNAs using the following one-line command in UNIX: bedtools getfasta -nameOnly -tab -s -split -bed <filedirectory>/input.bed -fi <path to genome file>/genome.fa -fo <filedirectory>/prefix_output.txt

  4. 3.1.4

    Open the tab-delimited getfasta output file (prefix_output.txt) using Microsoft Excel. The mature sequence of mouse circAkt3 is 257 nucleotides long, as shown below:

    >mmu_circAkt3_257

    AATGTCAGTTAATGAAAACAGAACGACCAAAGCCAAATACATTTATTATCAGATGTCTTCAGTGGACCACTGTTATAGAGAGAACATTTCATGTAGATACACCAGAGGAAAGAGAAGAGTGGACGGAAGCTATCCAAGCCGTAGCCGACCGATTGCAGAGGCAAGAGGAGGAGAGGATGAATTGTAGCCCAACCTCACAGATTGATAATATAGGAGAAGAAGAGATGGATGCGTCTACAACCCATCATAAAAGAAAG.
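The BED12 preparation and the spliced-sequence extraction described in steps 3.1.1 to 3.1.3 can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the protocol: the helper names `bed12_line` and `spliced_sequence`, the toy chromosome, and the exon coordinates are hypothetical, and the second function merely mimics what a strand-aware, exon-joining extraction such as `bedtools getfasta -s -split` is expected to return.

```python
def bed12_line(chrom, name, strand, exons):
    """Build one BED12 line from (start, end) exon intervals.

    Coordinates are 0-based and half-open, as in the BED format.
    """
    exons = sorted(exons)
    chrom_start = exons[0][0]
    chrom_end = exons[-1][1]
    sizes = ",".join(str(end - start) for start, end in exons)
    offsets = ",".join(str(start - chrom_start) for start, _ in exons)
    fields = [chrom, chrom_start, chrom_end, name, 0, strand,
              chrom_start, chrom_end, "0,0,0", len(exons), sizes, offsets]
    return "\t".join(str(f) for f in fields)


def spliced_sequence(bed_line, chrom_seq):
    """Join the exon blocks of a BED12 line; reverse-complement minus-strand records."""
    f = bed_line.split("\t")
    start, strand = int(f[1]), f[5]
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    offsets = [int(x) for x in f[11].rstrip(",").split(",")]
    seq = "".join(chrom_seq[start + o:start + o + s] for o, s in zip(offsets, sizes))
    if strand == "-":
        seq = seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]
    return seq


# Hypothetical two-exon circRNA on a toy 40-nt "chromosome" (not real coordinates).
toy_chrom = "TTTTT" + "AAACCC" + "G" * 10 + "TTTAAA" + "G" * 10 + "TTT"
line = bed12_line("chrToy", "toy_circ", "+", [(5, 11), (21, 27)])
print(line)
print(spliced_sequence(line, toy_chrom))
```

Writing one such line per circRNA, without a header, gives an input.bed file of the kind described in Note 4.1.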

3.2 Analysis of RBPs Interacting with circAkt3

Several reports highlight the importance of circRNA-protein interactions in cell physiology and disease, where circRNAs acting as protein decoys, protein sponges, or scaffolds lead to differential regulation of target gene expression. Thus, it is paramount to discover the proteins interacting with circRNAs and to study the functional outcome of such associations in cells. A few web servers predict RBP binding to circRNAs. For example, the circInteractome database (https://circinteractome.nia.nih.gov/index.html) predicts RBP binding to human circRNAs with circBase IDs and to the flanking sequences of their circularizing exons (Glazar et al. 2014; Panda et al. 2018). Another database, ENCORI/starBase v2.0 (https://starbase.sysu.edu.cn/index.php), is useful for identifying circRNA-RBP interactions in human samples based on large-scale CLIP-Seq datasets (Li et al. 2014). The circAtlas v2.0 database (http://159.226.67.237:8080/new/index.php) lists interacting RBPs of vertebrate circRNAs along with binding site counts in the upstream and downstream flanking sequences (Wu et al. 2020). However, these databases only predict circRNA-RBP interactions for the circRNAs they already contain. Here, we describe the prediction of RBP binding sites on a circRNA sequence using the RBPmap web server (Paz et al. 2014).

  1. 3.2.1

    Open the RBPmap home page http://rbpmap.technion.ac.il/index.html.

  2. 3.2.2

    Under the Input (mandatory) tab, select the “Genome” type and “Database assembly” version from the dropdown list. Select “Mouse” and “Dec. 2011 (GRCm38/mm10)” for mouse circAkt3 (Fig. 2).

    Fig. 2 Screenshots of the RBPmap tool for predicting RBP binding to circAkt3. Arrows mark the genome and database assembly fields, the mandatory motif selection, and the links to view and download the binding site prediction summary; the motif selection and the prediction summary are shown expanded

  3. 3.2.3

    Paste the mature splice sequence of the circAkt3 in the “Query sequences/coordinates” box. [Note 4.2]

  4. 3.2.4

    Under the Motif selection (mandatory) tab, click “Click here to select motifs from RBPmap full list” and select the “Human/Mouse motifs” in the pop-up window.

  5. 3.2.5

    As shown in Fig. 2, clicking the “Submit” button will open a result page. The result page appears as a prediction summary list showing interacting protein names along with Position, Motif, Occurrence, Z-score, and P-value. The full result list can be downloaded as a text file.

3.3 Analysis of circRNA-miRNA-mRNA Regulatory Networks

3.3.1 Identification of miRNAs Interacting with circAkt3

As mentioned in the previous sections, circRNAs regulate both transcription and translation by binding RNA-binding proteins and miRNAs. miRNAs are known to suppress target mRNA stability and translation by binding to the 3′ UTR of the target gene. Cytoplasmic circRNAs with miRNA binding sites serve as miRNA sponges that inhibit miRNA activity through a competing endogenous RNA (ceRNA) network, thereby regulating downstream target gene expression (Hansen et al. 2013; Panda 2018). Several databases and standalone tools have been developed to predict miRNA target sites, such as miRDB, TargetScan, and miRanda among many others (Enright et al. 2003; Agarwal et al. 2015; Riffo-Campos et al. 2016; Chen and Wang 2020). This section describes the method to predict miRNA binding sites on circAkt3 using miRDB (Fig. 3).

Fig. 3 Screenshots of the miRDB web tool for predicting miRNA binding sites on circAkt3. Arrows mark the custom prediction tab, the species and submission type fields, the go button, and the retrieved prediction result, which lists the 11 miRNAs predicted to target the submitted 257-nt sequence

  1. 3.3.1.1

    Open the miRDB web tool (http://www.mirdb.org/). [Note 4.3]

  2. 3.3.1.2

    Click the “custom prediction” tab and select “mouse” as the species and “mRNA target sequence” as the submission type.

  3. 3.3.1.3

    Paste the circAkt3 sequence in the sequence input box and click “go” for custom miRNA prediction.

  4. 3.3.1.4

    After the completion of the prediction, click “Retrieve prediction results” to find the final list of miRNA binding sites in circAkt3 sequence. Here, miRDB predicted 11 miRNAs that bind to circAkt3 (Fig. 3).

3.3.2 Identification of Target Genes of miRNAs Associated with circAkt3

Hundreds of studies have highlighted the role of the circRNA-miRNA-mRNA regulatory axis in gene regulation (Panda 2018). Numerous tools and web servers have been developed to predict miRNA target genes computationally. However, since not all predicted miRNA targets are functionally validated, we used the mouse (mmu) miRTarBase 8.0 to find the experimentally validated mRNA targets of the miRNAs associated with circAkt3 as follows (Huang et al. 2022).

  1. 3.3.2.1

    Download the complete list of experimentally validated mouse miRNAs and their targets in miRTarBase-Release 8.0.

  2. 3.3.2.2

    Find the miRNAs that are both predicted to bind circAkt3 and present in miRTarBase. For circAkt3, four miRNAs were found to have experimentally validated mRNA targets in miRTarBase (Fig. 4).

    Fig. 4 The circAkt3-miRNA-mRNA regulatory network constructed in Cytoscape. The hexagon nodes represent miRNAs associated with circAkt3 (labeled nodes include miR-3059-5p, miR-7116-5p, miR-12202-3p, miR-6934-3p, miR-6385, and miR-203b-5p). The yellow hexagons represent the miRTarBase miRNAs, and the blue circle nodes represent experimentally validated target genes in miRTarBase

  3. 3.3.2.3

    Extract the target mRNAs of these four miRNAs in miRTarBase.

  4. 3.3.2.4

    Construct the circAkt3-miRNA-mRNA regulatory network using Cytoscape software.
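Steps 3.3.2.2 to 3.3.2.4 amount to intersecting two miRNA lists and writing an edge table that Cytoscape can import. The sketch below shows one way this could be done in Python. It assumes the miRTarBase table has been exported as tab-delimited text with columns named "miRNA" and "Target Gene"; the function name `build_network_edges`, the interaction labels, and all miRNA and gene names in the toy table are hypothetical placeholders, not results for circAkt3.

```python
import csv
import io


def build_network_edges(circ_name, predicted_mirnas, mirtarbase_rows):
    """Return (source, interaction, target) edges for a circRNA-miRNA-mRNA network."""
    predicted = set(predicted_mirnas)
    edges, seen = [], set()
    for row in mirtarbase_rows:
        mirna, gene = row["miRNA"], row["Target Gene"]
        if mirna not in predicted:
            continue  # keep only miRNAs also predicted to bind the circRNA
        for edge in ((circ_name, "circRNA-miRNA", mirna),
                     (mirna, "miRNA-mRNA", gene)):
            if edge not in seen:  # avoid duplicate edges
                seen.add(edge)
                edges.append(edge)
    return edges


# Hypothetical stand-in for the downloaded miRTarBase table (placeholder names).
toy_table = io.StringIO(
    "miRNA\tTarget Gene\n"
    "mmu-miR-X\tGeneA\n"
    "mmu-miR-X\tGeneB\n"
    "mmu-miR-Y\tGeneC\n"
    "mmu-miR-Z\tGeneD\n"
)
rows = list(csv.DictReader(toy_table, delimiter="\t"))
edges = build_network_edges("circAkt3", ["mmu-miR-X", "mmu-miR-Z", "mmu-miR-W"], rows)
for source, interaction, target in edges:
    print(f"{source}\t{interaction}\t{target}")
```

The printed three-column table (source, interaction, target) can be saved as a text file and loaded through Cytoscape's network import from file.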

3.3.3 Analysis of Gene Ontology and Pathways Regulated by circAkt3-miRNA-mRNA Network

As shown in Fig. 4, circAkt3 may act as a sponge for four miRNAs that have experimentally validated downstream targets in miRTarBase. Therefore, the mRNA targets from the circAkt3-miRNA-mRNA regulatory network can be analyzed to predict the biological functions of circAkt3. The association of target genes with different gene ontology terms and pathways can be predicted with various databases and web tools, including Ingenuity Pathway Analysis (IPA), Genomatix Software Suite (GSS), Gene Ontology database (GO) and KEGG pathways, PANTHER, DAVID, KOBAS, and STRING (Dennis et al. 2003; Kanehisa et al. 2016; The Gene Ontology 2019; Bu et al. 2021; Szklarczyk et al. 2021; Thomas et al. 2022). Statistical significance was calculated by Student’s t-test, and a p-value of <0.05 was considered significant. Here, we used the PANTHER database (v17) for the analysis of circAkt3 target genes (Fig. 5) (Thomas et al. 2022).

Fig. 5 Enrichment analysis of mRNA targets in the circAkt3-miRNA-mRNA regulatory network. Screenshot of the PANTHER classification system page highlighting the entered IDs, the organism selection, the analysis selection (functional classification, pie chart), the submit button, and pie charts for biological processes, cellular components, molecular functions, and pathways

  1. 3.3.3.1

    To perform the gene enrichment analysis, open http://www.pantherdb.org/.

  2. 3.3.3.2

    Paste the mRNA target gene list in the input box and select “Mus musculus” in the dropdown menu for “Organism”.

  3. 3.3.3.3

    Select the “Functional classification viewed in graphic charts” and “Pie chart”.

  4. 3.3.3.4

    Click the “submit” button.

  5. 3.3.3.5

    A result page will open with the pie chart results.

  6. 3.3.3.6

    Select the Gene Ontology terms in the dropdown menu to show the classification of genes under different GO terms, including Biological Process, Molecular Function, Cellular Component, and Panther Pathways (Fig. 5). [Note 4.4]
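The statistical overrepresentation test mentioned in Note 4.4 asks whether a GO term or pathway contains more target genes than expected by chance. A minimal sketch of the underlying idea, using a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), is given below. It is not PANTHER's implementation, it applies no multiple-testing correction, and the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes.

    N: genes in the reference (background) set
    K: reference genes annotated with the term
    n: genes in the query list (e.g., network target genes)
    k: query genes annotated with the term
    """
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical counts: 20,000 background genes, 200 annotated with a term,
# 50 target genes in the network, 5 of which carry the annotation.
p_value = hypergeom_enrichment_p(k=5, n=50, K=200, N=20000)
print(f"one-sided p-value: {p_value:.3g}")
```

When many GO terms are tested at once, the resulting p-values should be corrected for multiple testing before interpretation.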

3.4 Analysis of the Protein-Coding Potential of circAkt3

Recent years have witnessed a rise in research investigating the protein-coding potential of circRNAs, although only a handful of reports are available on the functional role of translating circRNAs and their peptide products (Fan et al. 2022; Sinha et al. 2022b). A few tools and databases have recently been developed to assess the coding potential of circRNAs, such as CPAT, CircPro, CircCode, CircPrimer, riboCIRC, and TransCirc among many others (Wang et al. 2013; Meng et al. 2017; Sun and Li 2019; Huang et al. 2021; Li et al. 2021; Zhong and Feng 2022). In particular, the presence of open reading frames (ORFs), internal ribosome entry sites (IRES), and m6A modification, together with the association of circRNAs with translating polyribosomes, suggests the possible cap-independent translation of circRNAs into peptides (Yang et al. 2017; Ye et al. 2021; Fan et al. 2022). Here, we used the circAkt3 sequence to study its protein-coding potential by analyzing the presence of ORFs, IRES, and m6A sites.

3.4.1 Analysis of ORFs Spanning Backsplice Junction

We used the NCBI ORFfinder to identify the ORFs in circAkt3 (Fig. 6). [Note 4.5]

Fig. 6 ORFfinder interface showing the ORF search result for the 3× sequence of circAkt3. Highlighted fields are the query box (accession number, gi, or nucleotide sequence in FASTA format), the minimal ORF length, the “ATG and alternative initiation codons” option, and the submit button; the open reading frame viewer below lists ORF1

  1. 3.4.1.1

    Open https://www.ncbi.nlm.nih.gov/orffinder/ and paste the circAkt3 sequence three times in tandem (3× sequence) so that ORFs spanning the backsplice junction can be detected.

  2. 3.4.1.2

    Click “ATG and alternative initiation codons” and submit the form.

  3. 3.4.1.3

    The new page will open with a list of ORFs present on circAkt3.

  4. 3.4.1.4

    The ORFs spanning the backsplice junction should be identified manually. As shown in Fig. 6, the selected ORF1 spans the backsplice junction sequence.
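The manual check in step 3.4.1.4 can be approximated programmatically. The sketch below tiles the circRNA sequence three times, as in step 3.4.1.1, and keeps only ORFs that start in the first copy and end beyond it, that is, ORFs that cross the backsplice junction. It is a simplified stand-in for ORFfinder, not a reimplementation: it considers only ATG start codons on the plus strand, ignores ORFs without an in-frame stop codon, and uses a hypothetical 14-nt toy sequence and a hypothetical function name.

```python
STOPS = {"TAA", "TAG", "TGA"}


def junction_spanning_orfs(circ_seq, min_nt=30, copies=3):
    """Return (start, end, sequence) for ORFs crossing the backsplice junction.

    Positions are 0-based, half-open, on the tiled (copies x) sequence; the
    first junction lies between positions len(circ_seq) - 1 and len(circ_seq).
    """
    seq = circ_seq.upper().replace("U", "T")
    n = len(seq)
    tiled = seq * copies
    orfs = []
    for frame in range(3):
        pos = frame
        while pos < n:  # only consider start codons within the first copy
            if tiled[pos:pos + 3] != "ATG":
                pos += 3
                continue
            end = pos + 3
            while end + 3 <= len(tiled) and tiled[end:end + 3] not in STOPS:
                end += 3
            if end + 3 > len(tiled):
                break  # no in-frame stop codon within the tiled sequence
            stop_end = end + 3  # ORF end, stop codon included
            if stop_end > n and stop_end - pos >= min_nt:
                orfs.append((pos, stop_end, tiled[pos:stop_end]))
            pos = stop_end
    return orfs


# Hypothetical toy circRNA (14 nt); real circRNAs need a larger min_nt.
toy_orfs = junction_spanning_orfs("CCTAAGGGATGAAA", min_nt=9)
print(toy_orfs)
```

In the toy example, the ORF starts at the ATG near the end of the first copy and terminates in the second copy, so its coding sequence includes the junction.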

3.4.2 Analysis of IRES

  1. 3.4.2.1

    The presence of an IRES in a circRNA may help drive its translation. IRESfinder (https://github.com/xiaofengsong/IRESfinder) is a standalone Python script for screening IRES sequences in eukaryotic cells based on a logit model and k-mer features (Zhao et al. 2018). It can be run on the circRNA sequences with the following command: IRESfinder/IRESfinder.py -f circRNA_2x_sequences.fa -o circRNA_IRESfinder.result

  2. 3.4.2.2

    The IRESfinder output classifies each circRNA sequence as an IRES or a non-IRES sequence.

  3. 3.4.2.3

    Here, circAkt3 was predicted as a non-IRES sequence, suggesting that it may not be translated into protein.
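The input file name circRNA_2x_sequences.fa suggests that each circRNA sequence is duplicated in tandem before screening, so that elements overlapping the backsplice junction are represented. A minimal sketch for preparing such a FASTA file is shown below. The function name `tiled_fasta`, the record name, and the toy sequence are hypothetical, and the "_2x" header suffix is only a naming choice for this example.

```python
def tiled_fasta(records, copies=2, width=60):
    """Format (name, sequence) pairs as FASTA with each sequence repeated in tandem."""
    lines = []
    for name, seq in records:
        tiled = seq.upper() * copies
        lines.append(f">{name}_{copies}x")
        # Wrap the sequence at a fixed line width
        lines.extend(tiled[i:i + width] for i in range(0, len(tiled), width))
    return "\n".join(lines) + "\n"


# Hypothetical toy record; replace with the mature circRNA sequences.
fasta_text = tiled_fasta([("toy_circ", "ACGT")], copies=2)
print(fasta_text, end="")

# To write the file used as IRESfinder input:
# with open("circRNA_2x_sequences.fa", "w") as handle:
#     handle.write(fasta_text)
```

Calling the same helper with copies=3 gives the 3× sequences used for the ORF search.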

3.4.3 Analysis of m6A Sites

  1. 3.4.3.1

    The m6A sites on the circRNA sequence can be predicted using the SRAMP web tool (Zhou et al. 2016).

  2. 3.4.3.2

    Open the “prediction” tab on the https://www.cuilab.cn/sramp webpage.

  3. 3.4.3.3

    Paste the circAkt3 sequence in the input box for “Mature mRNA mode”.

  4. 3.4.3.4

    The new window shows the predicted m6A sites.

  5. 3.4.3.5

    As shown in Fig. 7, circAkt3 does not have any m6A sites.

    Fig. 7 Prediction of m6A sites on circAkt3 using the SRAMP web server. The screenshot shows the FASTA sequence input box, the option to show the query sequence as RNA, the submit button, and a plot of the prediction score distribution along the query sequence
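As a quick sanity check alongside the SRAMP prediction, a sequence can be scanned for the DRACH consensus motif (D = A/G/U, R = A/G, H = A/C/U) in which m6A is commonly found. The sketch below is a plain motif search, a much simpler technique than SRAMP and not a substitute for it: a DRACH match does not imply methylation. The function name and the toy sequences are hypothetical.

```python
import re

# DRACH in the DNA alphabet (T for U); the lookahead allows overlapping matches.
DRACH = re.compile(r"(?=([AGT][AG]AC[ACT]))")


def drach_sites(seq):
    """Return (position of the central A, motif) for each DRACH match (0-based)."""
    seq = seq.upper().replace("U", "T")
    return [(m.start(1) + 2, m.group(1)) for m in DRACH.finditer(seq)]


# Hypothetical toy sequences
print(drach_sites("CCGGACTCC"))
print(drach_sites("CCCCCCCC"))
```

For a circRNA, scanning the 2× sequence would also cover motifs that overlap the backsplice junction.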

In addition to the above analysis, the association of circAkt3 with polyribosomes must be tested to study its translation potential (Panda et al. 2017; Ye et al. 2021). Since circAkt3 does not contain IRES or m6A sites, we assume it may not be translated into proteins.

4 Technical Notes

  1. 4.1

    The sequences of multiple circRNAs can be extracted at once by providing the BED12 information for each circRNA on a separate line. The input.bed file used in BEDtools must not contain a header line.

  2. 4.2

    The circRNA sequences in Fasta format can be uploaded as input for predicting RBPs for multiple circRNAs.

  3. 4.3

    Any other sequence-based miRNA prediction webserver or standalone tool can be used to predict miRNAs targeting circRNAs. Examples include miRanda, TargetScan, miRWalk, and miRNet.

  4. 4.4

    Statistical overrepresentation test or gene set enrichment test can also be performed to check specific enrichment of genes for specific GO terms and pathways.

  5. 4.5

    The standalone ORFfinder can predict the ORFs present in multiple circRNAs using the sequences in Fasta format with the following command line in UNIX.

    ORFfinder -in circRNA_3x_sequences.fa -s 1 -strand plus -outfmt 1 -out circRNA_ORF.txt