Abstract
Given a set S ={s 1,s 2,...,s n } of strings each of length m, and an integer L, we study the following two problems.
k -Closest Substring problem: find k center strings c 1,c 2 ,...,c k of length L minimizing d such that for each s j ∈ S, there is a length-L substring t j (closest substring) of s j with min 1 ≤ i ≤ k d(c i ,t j ) ≤ d. We give a PTAS for this problem, for k=O(1).
k -Consensus Pattern problem: find k median strings c 1,c 2 ,...,c k of length L and a substring t j (consensus pattern) of length L from each s j minimizing the total cost \(w= \sum \limits_{j=1}^n \min \limits _{1 \le i \le k} d(c_i,t_j)\). We give a PTAS for this problem, for k=O(1).
Our results improve recent results of [10] and [16] both of which depended on the random linear transformation technique in [16]. As for general k case, we give an alternative and direct proof of the NP-hardness of (2-ε)-approximation of the Hamming radius k -clustering problem, a special case of the k -Closest Substring problem restricted to L=m.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Blum, T., Jiang, M., Li, J.: Linear Approximation of Shortest Superstrings. Journal of the ACM 41(4), 630–647 (1994)
Buhler, J.: Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics 17, 419–428 (2001)
Feder, T., Greene, D.H.: Optimal algorithms for approximate clustering. In: Proceedings of the 20th Annual ACM Symposium on Theory of Computing, pp. 434–444 (1988)
Fowler, R.J., Paterson, M.S., Tanimoto, S.L.: Optimal packing and covering in the plane are NP-complete. Inf. Proc. Letters 12(3), 133–137 (1981)
Garey, M.R., Johnson, D.S.: The rectilinear Steiner tree problem is NPcomplete. SIAM J. Appl. Math. 32, 826–834 (1977)
Gasieniec, L., Jansson, J., Lingas, A.: Efficient approximation algorithms for the Hamming center problem. In: Proc. 10th ACM-SIAM Symp. on Discrete Algorithms (1999),pp.S905–S906 (1999)
Gasieniec, L., Jansson, J., Lingas, A.: Approximation Algorithms for Hamming Clustering Problems. In: Proc. 11th Symp. CPM (2000), pp. 108–118 (2000)
Gonzalez, T.F.: Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38, 293–306 (1985)
Indyk, P., Motwani, R.: Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In: Proc. 30th Annual ACM Symp. Theory Comput., pp. 604–613 (1998)
Jansson, J.: Consensus Algorithms for Trees and Strings. Doctoral dissertation (2003)
Lanctot, K., Li, M., Ma, B., Wang, L., Zhang, L.: Distinguishing string selection problems. In: Proc. 10th ACM-SIAM Symp. Discrete Algorithms, pp. 633–642 (1999)
Li, M., Ma, B., Wang, L.: Finding similar regions in many strings. In: Proceedings of the 31st Annual ACM Symposium on Theory of Computing (1999), pp. 473–482 (1999)
Li, M., Ma, B., Wang, L.: On the Closest String and Substring Problems. Journal of ACM 49(2), 157–171 (2002)
Ma, B.: A polynomial time approximation scheme for the Closest Substring problem. In: Proc. 11th Annual Symposium on Combinatorial Pattern Matching, pp. 99–107 (2000)
Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (1995)
Ostrovsky, R., Rabani, Y.: Polynomial-Time Approximation Schemes for Geometric Min-Sum Median Clustering. Journal of ACM 49(2), 139–156 (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiao, Y., Xu, J., Li, M. (2004). On the k-Closest Substring and k-Consensus Pattern Problems. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_10
Download citation
DOI: https://doi.org/10.1007/978-3-540-27801-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22341-2
Online ISBN: 978-3-540-27801-6
eBook Packages: Springer Book Archive