Abstract
Gene regulation is essential to cell homeostasis, so its dysregulation often leads to disease. MicroRNAs (miRNAs) are involved in posttranscriptional gene regulation, and consequently their dysregulation has been associated with many diseases.
miRBase version 21 contains microRNAs from about 200 species organized into about 70 clades. It has been shown that not all miRNAs collected in the database are likely to be real; therefore, novel ways to distinguish between true and false miRNAs should be explored. We introduce a novel approach based on k-mer frequencies and machine learning that assigns an unknown or unlabeled miRNA to its most likely clade or species of origin. A simple way to filter new data would be to check that a novel miRNA is categorized into, or close to, the species it is said to originate from. To this end, an ensemble classifier consisting of multiple two-class random forest classifiers was designed, where each random forest was trained on one species–clade pair. The approach was tested with different sampling methods on a dataset taken from miRBase version 21 and evaluated using a hierarchical F-measure. Depending on the sampling method, it predicted 81% to 94% of the test data correctly. This is the first classifier that can classify miRNAs to their species of origin. The method will aid in evaluating miRNA database integrity and in analyzing noisy miRNA samples.
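The two building blocks named in the abstract, k-mer frequency features and a hierarchical F-measure, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the function names, the default k = 2, the RNA alphabet, the toy species/clade hierarchy, and the example labels. The hierarchical F-measure follows the common ancestor-set formulation (each label is extended by its ancestors, excluding the root); the random forest ensemble itself is omitted.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Relative frequency of every k-mer over `alphabet` in `seq`.

    k-mers containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def with_ancestors(label, parent):
    """Return `label` plus all of its ancestors, excluding the root."""
    nodes = set()
    while label is not None and parent.get(label) is not None:
        nodes.add(label)
        label = parent[label]
    return nodes


def hierarchical_f1(true_labels, pred_labels, parent):
    """Hierarchical F-measure computed from ancestor-extended label sets."""
    overlap = pred_size = true_size = 0
    for t, p in zip(true_labels, pred_labels):
        t_set = with_ancestors(t, parent)
        p_set = with_ancestors(p, parent)
        overlap += len(t_set & p_set)
        pred_size += len(p_set)
        true_size += len(t_set)
    if overlap == 0:
        return 0.0
    h_precision = overlap / pred_size
    h_recall = overlap / true_size
    return 2 * h_precision * h_recall / (h_precision + h_recall)


# Hypothetical toy hierarchy: child -> parent, the root has parent None.
PARENT = {
    "root": None,
    "Mammalia": "root",
    "Aves": "root",
    "hsa": "Mammalia",
    "mmu": "Mammalia",
    "gga": "Aves",
}

if __name__ == "__main__":
    features = kmer_frequencies("UGAGGUAGUAGGUUGUAUAGUU", k=2)
    print(len(features), "k-mer features")
    # One of three predictions lands in the right clade but the wrong species.
    print(hierarchical_f1(["hsa", "hsa", "gga"], ["hsa", "mmu", "gga"], PARENT))
```

With a hierarchical measure, a prediction that misses the species but hits the correct clade still earns partial credit, which is why such a measure suits evaluation over a species/clade tree better than flat accuracy.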
Copyright information
© 2022 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Odenthal, L., Allmer, J., Yousef, M. (2022). Ensemble Classifiers for Multiclass MicroRNA Classification. In: Allmer, J., Yousef, M. (eds) miRNomics. Methods in Molecular Biology, vol 2257. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1170-8_12
DOI: https://doi.org/10.1007/978-1-0716-1170-8_12
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1169-2
Online ISBN: 978-1-0716-1170-8
eBook Packages: Springer Protocols