Abstract
The promoter regions of around 70% of all genes in the human genome are overlapped by a CpG island (CGI). CGIs have known functions in transcription initiation and distinctive compositional features, such as high G+C content and high CpG ratios, compared with bulk DNA. We have shown previously that CGIs manifest as clusters of CpGs in mammalian genomes and can therefore be detected using clustering methods. These techniques have several advantages over sliding-window approaches, which apply compositional properties as thresholds. In this protocol, we show how to determine local (CpG islands) and global (distance distribution) clustering properties of CG dinucleotides and how to generalize this analysis to any k-mer or combination of k-mers. In addition, we illustrate how to cross the output of a CpG island prediction algorithm with our methylation database to detect differentially methylated CGIs. The analysis is given as a step-by-step protocol, and all necessary programs are provided in a virtual machine; alternatively, the software can be downloaded and installed locally.
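To make the idea of distance-based clustering concrete, the following minimal Python sketch locates every occurrence of a k-mer (CG by default), computes the distances between neighboring occurrences (the global distance distribution), and groups occurrences whose gaps do not exceed a threshold (the local clusters). It is an illustration only and not the software distributed with the protocol: the function names, the `max_dist` and `min_count` parameters, and the toy sequence are assumptions, and no statistical filtering of the resulting clusters is attempted.

```python
import re

def kmer_positions(seq, kmer="CG"):
    """Return 0-based start positions of all (possibly overlapping) k-mer hits."""
    # A lookahead keeps matches zero-width, so overlapping hits are reported too.
    pattern = re.compile("(?=" + re.escape(kmer) + ")", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(seq)]

def neighbor_distances(positions):
    """Distances between consecutive occurrences (global clustering property)."""
    return [b - a for a, b in zip(positions, positions[1:])]

def distance_clusters(positions, max_dist, min_count=2):
    """Group consecutive occurrences separated by at most max_dist.

    Returns (first_start, last_start, count) tuples; max_dist and min_count
    are hypothetical parameters chosen for this illustration.
    """
    clusters = []
    if not positions:
        return clusters
    run = [positions[0]]
    for pos in positions[1:]:
        if pos - run[-1] <= max_dist:
            run.append(pos)
        else:
            if len(run) >= min_count:
                clusters.append((run[0], run[-1], len(run)))
            run = [pos]
    if len(run) >= min_count:
        clusters.append((run[0], run[-1], len(run)))
    return clusters

# Toy sequence (made up for the example): two CpG-dense stretches around a T run.
seq = "ATCGCGACG" + "T" * 20 + "CGAATTCG"
positions = kmer_positions(seq)              # [2, 4, 7, 29, 35]
distances = neighbor_distances(positions)    # [2, 3, 22, 6]
clusters = distance_clusters(positions, max_dist=6)
print(clusters)                              # [(2, 7, 3), (29, 35, 2)]
```

Passing a different `kmer` argument (for example `kmer_positions(seq, "CGA")`) applies the same analysis to any other DNA word. In practice the distance threshold would be derived from the observed distance distribution rather than fixed by hand.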
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Gómez-Martín, C., Lebrón, R., Oliver, J.L., Hackenberg, M. (2018). Prediction of CpG Islands as an Intrinsic Clustering Property Found in Many Eukaryotic DNA Sequences and Its Relation to DNA Methylation. In: Vavouri, T., Peinado, M. (eds) CpG Islands. Methods in Molecular Biology, vol 1766. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7768-0_3
DOI: https://doi.org/10.1007/978-1-4939-7768-0_3
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7767-3
Online ISBN: 978-1-4939-7768-0
eBook Packages: Springer Protocols