Influence of Sequence Length in Promoter Prediction Performance

Carvalho, Sávio G.; Guerra-Sá, Renata; de C. Merschmann, Luiz H.

doi:10.1007/978-3-319-12418-6_6

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8826)

Included in the following conference series:

Brazilian Symposium on Bioinformatics

Abstract

The rapid growth in genome sequencing capacity has highlighted the need to automate data analysis in order to speed up genomic annotation and reduce its cost. Because promoter identification is an important step in functional genomic annotation, several studies have proposed computational approaches to predict promoters, using different classifiers and different characteristics of the promoter sequences. However, several works in the literature have addressed the problem using datasets containing sequences of 250 nucleotides or more. Since the sequence length determines the number of dataset attributes, datasets with a high number of attributes are generated for training classifiers even when only a limited number of properties is used to characterize the sequences. Because high-dimensional datasets can degrade a classifier's predictive performance or even require an infeasible processing time, training classifiers on datasets with a reduced number of attributes is essential to obtain good predictive performance at low computational cost. To the best of our knowledge, no work in the literature has systematically verified the relation between sequence length and the predictive performance of classifiers. Thus, in this work, sixteen datasets composed of sequences of different lengths are built and evaluated using the SVM and k-NN classifiers. The experimental results show that several datasets composed of shorter sequences achieved better predictive performance than datasets composed of longer sequences, while consuming significantly less processing time.
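
The link between sequence length and attribute count can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the encoding (one value per dinucleotide position per property), the centered window, the function names, and the property values in `TOY_PROPERTY` are all assumptions made up for illustration. It only shows how the feature vector grows with the window length before any classifier such as SVM or k-NN is trained.

```python
from itertools import product

# Hypothetical property table: one made-up numeric value per dinucleotide.
# A real structural scale would replace these placeholder numbers.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
TOY_PROPERTY = {dn: float(i) for i, dn in enumerate(DINUCLEOTIDES)}


def center_window(sequence, length):
    """Return the central `length` nucleotides of `sequence` (assumed windowing)."""
    if length > len(sequence):
        raise ValueError("window longer than sequence")
    start = (len(sequence) - length) // 2
    return sequence[start:start + length]


def encode(sequence, properties):
    """Emit one attribute per (property, dinucleotide position) pair."""
    features = []
    for table in properties:
        for i in range(len(sequence) - 1):
            features.append(table[sequence[i:i + 2]])
    return features


def attribute_count(length, n_properties):
    """Number of attributes `encode` yields for a sequence of `length` nucleotides."""
    return (length - 1) * n_properties


if __name__ == "__main__":
    full = "ACGT" * 75  # toy 300-nucleotide sequence, not real promoter data
    for length in (20, 50, 100, 250, 300):
        vector = encode(center_window(full, length), [TOY_PROPERTY, TOY_PROPERTY])
        print(length, len(vector))
```

Under this assumed encoding the attribute count grows linearly with the window length and with the number of properties, which is why shorter sequences give lower-dimensional training sets.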

This research was partially supported by CNPq, FAPEMIG and UFOP.

Author information

Authors and Affiliations

Federal University of Ouro Preto (UFOP), Ouro Preto, MG, Brazil
Sávio G. Carvalho, Renata Guerra-Sá & Luiz H. de C. Merschmann

Editor information

Editors and Affiliations

Departamento de Ciência de Computação, Universidade Federal de Minas Gerais, Av. Antonio Carlos 6627, Pampulha, 31270-010, Belo Horizonte, MG, Brazil
Sérgio Campos

Cite this paper

Carvalho, S.G., Guerra-Sá, R., de C. Merschmann, L.H. (2014). Influence of Sequence Length in Promoter Prediction Performance. In: Campos, S. (ed.) Advances in Bioinformatics and Computational Biology. BSB 2014. Lecture Notes in Computer Science, vol 8826. Springer, Cham. https://doi.org/10.1007/978-3-319-12418-6_6

DOI: https://doi.org/10.1007/978-3-319-12418-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12417-9
Online ISBN: 978-3-319-12418-6
eBook Packages: Computer Science, Computer Science (R0)
