Abstract
The DNA sequencing technology is evolving day by day. It results in the creation of large repository of sequence data. One important aspect of analyzing these sequence data is sequence alignment. In this research paper, a method of DNA sequence alignment with the help of a dynamic window algorithm has been discussed. The dynamic window will help in producing the comparative score of different alignment scheme resulting into the best acceptable alignments efficiently.
Access provided by CONRICYT-eBooks. Download conference paper PDF
Similar content being viewed by others
Keywords
Introduction
In the past decade, a lot of research work has been done in the field of DNA sequence alignment. As a result, a large number of alignment algorithms and related software are available at present [1]. Basically, a sequence alignment problem deals with matching the nitrogen base of a sequence with a reference sequence. The processes help in finding out the similar functional regions that have great importance in the field of bioinformatics and its related field.
DNA Sequencing and Alignment
DNA is generally a long sequence of base pairs, polymer of repeating subunits called nucleotide. Each nucleotide contains five-carbon sugar, a phosphate group, and a base. Different bases are adenine (A), guanine (G), thymine (T), cytosine (C), and uracil (U). Nitrogen base combines with the five-carbon sugar to form nucleoside. Nucleoside in turn combines with phosphate groups to form nucleotide [2]. During the sequencing, DNA is represented as sequence of these nitrogen bases. Various methods are there for sequencing the DNA like Sanger’s method, Maxam & Gilbert method, Pyro-sequencing, etc. [3]. Apart from these conventional methods, there are many other interdisciplinary approaches like graphical representation techniques [4], Voss representation, 2-bit binary, the 4-bit binary, the paired nucleotide, the 12-letter alphabet, the digital Z-signals, and the phase-specific Z-curve [5].
Sequence alignment is the simultaneous matching of two or more nucleotides. There are different algorithms available for this purpose. Broadly, those can be classified as algorithms based on hash tables, algorithms based on suffix trees and algorithms based on merge sorting [6]. Optimal alignment methods aim to maximize the score or minimize the cost of the process. But in case of dynamic programming approach, the time complexity of the algorithm is proportional to length of the sequence in exponential terms [7, 8]. In such approach, if the gap cost is nonlinear then complexity is extremely more [9].
Proposed Method
In this research paper, two DNA sequences have been aligned using a dynamic window algorithm. The major functions associated with this algorithm are LengthEqualiser (X, Y), dynamic window (p, q, X, Y), Modify-Y (i, j, X, Y), Modify-X (i, j, X, Y), and score (hit, c).
The function Length Equalizer takes the two sequences as arguments and tries to make them the same length by introducing the gaps at proper place. NIG represents the number of introductory gaps. X and Y are the two sequences,| | represents their cardinality.
The dynamic window function tries to match the possible common sequences. The dynamic nature of the window is maintained with the help of sub-functions—Modify-Y (i, j, X, Y) and Modify-X (i, j, X, Y). Apart from this, the function sends the hits calculated to the score function to calculate the score of the process and keeps the different common sub-sequences obtained.
These two Modify functions are accountable for the dynamic nature of the window. First of all, these functions search the right position for the shifting of a base by calculating the number of gaps required which is represented by “c”. Then after introducing the gaps in the either sequence X OR Y, they revoke the Length_Equaliser (X, Y) and Dynamic_Window (p, q, X, Y) which work recursively. They also return the c-value to Score function.
The Score () function takes hit from Dynamic_Window (p, q, X, Y) and c from Modify-X (i, j, X, Y) or Modify-Y (i, j, X, Y) and calculates the score. Here, the cost of hit is calculated using the cost of each base in the sequence. Similarly, the penalty is calculated using the cost of a single gap and cumulative number of gaps introduced. Figure 7.1 represents the schematic view of the working of the algorithm.
Result and Discussion
The performance of the dynamic window algorithm has been analyzed by different combinations of the following DNA sequences of β-globin genes of eight species given in Table 7.1[10]. A part of the algorithm functioning is given in Fig. 7.2.
As a result of the working of the Dynamic_Window algorithm on a sub-sequences of Human and Opossum β-globin genes, it has been found that the hits are 11, gaps are 7 using Modify-Y (i, j, X, Y) function whereas the hits are 12, gaps are 7 using Modify-X (i, j, X, Y) function. By taking the cost of a base as +1 and that of a gap as −1, the score % in first case is 13.79 (left side in Fig. 7.2) whereas in second case, it is 17.29 (right side in Fig. 7.2). The common sequences in first case in the List are {ATGGTGCAC, CT, GA} and in the second case are {ATGGTGCAC, T, GA}.
Conclusion
In this study, the common sequence alignment using dynamic window algorithm has been proposed. The structure and the working of the algorithm have been discussed. The algorithm has been used for different sets of subsequences of β-globin genes and the score % has been obtained. The dynamic nature is clear from the comparative analysis of the score percentages.
References
Bioinformatics: sequence and genome analysis, David W. Mount, Cold Spring Harbor Laboratory Press.
Fundamental Molecular Biology, Lizabeth A. Allison, Blackwell Publishing Ltd.
A review of DNA sequencing techniques, Lilian T. C. Franca et al, Quarterly Reviews of Biophysics 35, 2 (2002), pp. 169–200. 2002 Cambridge University Press.
DB-Curve: a novel 2D method of DNA sequence visualization and representation, Chemical Physics Letters 367 (2003) 170–176.
Advanced Numerical Representation of DNA Sequences, 2012 International Conference on Bioscience, Biochemistry and Bioinformatics, IPCBEE vol. 3 1(2012) © (2012) IACSIT Press, Singapore.
A survey of sequence alignment algorithms for next-generation sequencing, Heng Li and Nils Homer, Briefings in bioinformatics.vol 11. no 5. 473–483, 2010.
Bioinformatics: sequence, structure & databanks, A practical approach, D. Higgins, W. Taylor, Oxford university press.
Sequence alignment and Markov’s Model, K.R. Sharma, The Mc-Graw Hill.
Mind the Gaps: Evidence of Bias in Estimates of Multiple Sequence Alignments, Tanya Golubchik etal, Mol. Biol. Evol. 24(11):2433–2442. 2007.
Directed graphs of DNA sequences and their numerical characterization, Chun Li, Journal of Theoretical Biology 241 (2006) 173–177.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Behera, L.K. (2018). Global Common Sequence Alignment Using Dynamic Window Algorithm. In: Mandal, J., Saha, G., Kandar, D., Maji, A. (eds) Proceedings of the International Conference on Computing and Communication Systems. Lecture Notes in Networks and Systems, vol 24. Springer, Singapore. https://doi.org/10.1007/978-981-10-6890-4_7
Download citation
DOI: https://doi.org/10.1007/978-981-10-6890-4_7
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6889-8
Online ISBN: 978-981-10-6890-4
eBook Packages: EngineeringEngineering (R0)