Abstract
We investigate combinatorial enumeration problems related to subsequences of strings; in contrast to substrings, subsequences need not be contiguous. For a finite alphabet Σ, the following three problems are solved. (1) Number of distinct subsequences: Given a sequence s ∈Σn and a nonnegative integer k ≤n, how many distinct subsequences of length k does s contain? A previous result by Chase states that this number is maximized by choosing s as a repeated permutation of the alphabet. This has applications in DNA microarray production. (2) Number of ρ -restricted ρ -generated sequences: Given s ∈Σn and integers k ≥1 and ρ≥1, how many distinct sequences in Σk contain no single nucleotide repeat longer than ρ and can be written as \(s_1^{r_1}\dots s_n^{r_n}\) with 0≤r i ≤ρ for all i? For ρ= ∞, the question becomes how many length-k sequences match the regular expression s 1 * s 2 * ...s n *. These considerations allow a detailed analysis of a new DNA sequencing technology (“454 sequencing”). (3) Exact length distribution of the longest increasing subsequence: Given Σ= {1,...,K} and an integer n ≥1, determine the number of sequences in Σn whose longest strictly increasing subsequence has length k, where 0 ≤k ≤K. This has applications to significance computations for chaining algorithms.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Margulies, M., et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057), 376–380 (2005), Corrigendum in Nature 439(7075), 502 (2006)
Aldous, D., Diaconis, P.: Longest increasing subsequences: From patience sorting to the Baik-Deift-Johansson theorem. Bulletin of the American Mathematical Society 36(4), 413–432 (1999)
Niedermeier, R.: Invitation to Fixed Parameter Algorithms. Oxford University Press, Oxford (2006)
Rahmann, S.: The shortest common supersequence problem in a microarray production setting. In: Proceedings of the 2nd European Conference in Computational Biology (ECCB 2003), pp. ii156–ii161, vol. 19(suppl. 2) of Bioinformatics (2003)
Chase, P.: Subsequence numbers and logarithmic concavity. Discrete Math. 16, 123–140 (1976)
Skiena, S.S.: The Algorithm Design Manual. Springer, Heidelberg (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rahmann, S. (2006). Subsequence Combinatorics and Applications to Microarray Production, DNA Sequencing and Chaining Algorithms. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_15
Download citation
DOI: https://doi.org/10.1007/11780441_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35455-0
Online ISBN: 978-3-540-35461-1
eBook Packages: Computer ScienceComputer Science (R0)