Abstract
Regularities in strings are often related to periods and covers, which have extensively been studied, and algorithms for their efficient computation have broad application. In this paper we concentrate on computing cyclic regularities of strings, in particular, we propose several efficient algorithms for computing: (i) cyclic periodicity; (ii) all cyclic periodicity; (iii) maximal local cyclic periodicity; (iv) cyclic covers.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction and Related Work
A fundamental concept of repeating patterns or regularities is that of periods (also known as powers). A period of order k is defined by a concatenation of k identical blocks of symbols. The study of periods can be traced to as far back as the early 1900s with the work of [9], who researched a set of strings that do not contain any substrings that are periods. Periods in diverse forms gained prominence, when they became key structures in computational biology, where they are associated with various regulatory mechanisms and play an important role in genomic fingerprinting [6].
In regularities in strings, one of the most general notions is related to period or power, for instance, given a string x of length n, a period k of a string x is a sub string of x, if it can be decomposed into equal-length blocks of symbols, such that x = \(u^k u'\), where \(u'\) is a prefix of u. However, for simplicity we will discard \(u'\) and only consider \(u^k\).
So far, regularities in strings related with periods and powers, which have been extensively studied, [3,4,5, 8] and algorithms for their efficient computation have broad applications. In this paper, we study cyclic factors of strings. The motivation of cyclic factors comes from viruses. The viruses are circular strings, for example Escherichia coli (E.coli) has 154 bases and it is circular [12] (Fig. 1). Formally, the viruses break up at any point of the circle, for example, that can appear in the DNA sequence as \(x_\delta \dots x_nx_1 \dots x_{\delta -1}\) breaking up at position \(\delta \) (Fig. 2). Now, we propose several efficient algorithms for computing: (i) cyclic periodicity; (ii) all cyclic periodicity; (iii) maximal local cyclic periodicity; (iv) cyclic covers.
2 Preliminaries
A string x of length \(|x|=n\) over an alphabet \(\varSigma \) of size \(\sigma \) can be denoted as \(x[1 \mathinner {.\,.}n]=x[1]x[2] \dots x[i] \dots x[n]\), where \(1 \le i \le n\) and the i-th letter of x is denoted by x[i] \(\in \) \(\varSigma \). The empty string \(\epsilon \) is the string of length 0. The string \(x^R\) is the reverse of string x. And \(x[i \mathinner {.\,.}j]\), \(1 \le i \le j \le n\), denotes the contiguous substring (or factor) of letters, such as \(x[i]x[i+1]x[i+2] \dots x[j]\). A substring \(x[i \mathinner {.\,.}j]\) is a suffix of x, if \(j=n\) and a substring \(x[i \mathinner {.\,.}j]\) is a prefix of x, if \(i=1\). Given a cyclic factor u of length k, \(1 \le k \le n\), we denote by c(u), for instance, \(u=ababc\), c(u) is one of the following rotations: ababc, babca, abcab, bcaba, cabab. Moreover, we say \(u=u_1 u_2 \dots u_n\), \(c_{\delta }(u)=u_{\delta } \dots u_n u_1 \dots u_{\delta -1}\).
In this paper, suffix trees are used extensively as computation tools. For a general introduction to suffix trees, see [2, 7, 10, 11].
Example 1
Consider a string \(x=aaabaabaabaabaaa=:{u_1}{u_2}{u_3}{u_4}\), where \(u_1=aaab\), \(u_2=aaba\), \(u_3=abaa\), \(u_4=baaa\) and \(k=4\), \(\ell =4\). Therefore x has a period of length \(\ell \).
Definition 1
A cyclic periodic array A of a string x of length n is defined to be as follows: \(A[i]:=\ell \), \(1 \le i \le n\), if and only if \(x[1 \mathinner {.\,.}i]\) has cyclic periodicity \(\ell \) by a string u and there no \(u'\), with \(|u'|\le |u|\) that is a cyclic period of \(x[1 \mathinner {.\,.}i]\).
Example 2
Consider a string \(x=aababa\) of length 6, a cyclic periodic array A as follows:
\(x[1]=a \implies A[1]:=1\) \(x[1 \mathinner {.\,.}4]=aaba \implies A[4]:=1\)
\(x[1 \mathinner {.\,.}2]=a\) \(a \implies A[2]:=2\) \(x[1 \mathinner {.\,.}5]=aabab \implies A[5]:=1\)
\(x[1 \mathinner {.\,.}3]=aab \implies A[3]:=1\) \(x[1 \mathinner {.\,.}6]=aab\) \(aba \implies A[6]:=2\)
Definition 2
We define maximal local k-cyclic periodicity of a string x, if a substring y is cyclic periodic and y is not a substring of another cyclic periodic strings.
Example 3
Consider a string \(x=aaaababaaab\), \(\varSigma =\{a,b\}\), and a substring \(y=aabababaa\) is 3-cyclic periodic and substring \(y\alpha =aabababaaa\), \(\alpha \in \varSigma \), is not cyclic periodic, and substring \(\beta y=aaabababaa\), \(\beta \in \varSigma \) is not cyclic periodic. Therefore, the substring \(y=aabababaa\) is maximal local 3-cyclic periodic in string \(x=aaaabababaaa\).
Definition 3
We say that a string x of length n is cyclic-coverable by a string u of length \(k'\), if and only if, for every position i of x, the following condition holds \(x[\beta \mathinner {.\,.}\gamma ]=c(u)\), \(1 \le \beta \le i \le \gamma \le n\).
Example 4
Consider a string \(x=aababaa={u_1}{u_2}\), \(u_1=aaba\), \(u_2=abaa\), \(k'=4\), \(\gamma =2\), is cyclic coverable by a string u, for every position i of x, \(x[1 \mathinner {.\,.}4]=x[4 \mathinner {.\,.}7]=c(u)\).
Definition 4
Compute all cyclic covers of a given string x, that is for all possible length cyclic covers.
Example 5
Consider a string \(x=ababbaba\), then ab, abab, ababb, ababbab are cyclic covers of x.
3 Computing k-cyclic Periodicity
Theorem 1
Given a string x of length n and an integer k, \(1 \le k \le n\), test whether it is k-cyclic periodic; this can be determined in \(\mathcal {O}(n/k)\) time and \(\mathcal {O}(n)\) space.
Proof
We construct the suffix tree of x (see [7, 10, 11]). We let \(u=x[1\dots k]\), then let \(\ell _m\) denote the depth of the lowest common ancestor of \(x[1 \mathinner {.\,.}n]\) (see [1]), and \(x[i_m \mathinner {.\,.}n]\). We compute the LCA \(\ell _m\) of \(x[1 \mathinner {.\,.}n]\) and \(x[i_m \mathinner {.\,.}n]\) for \(i_m=2k, 3k, \dots \), and \(\ell _k=n-k\), if \(\ell _m=1\) for some m, then x is not k-cyclic periodic string. Now consider \(C^{rignt}_m=(u_{\ell _m+1} \dots u_k)^R\), compute the \(\ell _m^{'}\) the LCA of \(u^R\) and \(C^{right}_m\). If \(\ell _m^{'} \ge \ell _m\) for all m, then x is k-cyclic periodic. \(\square \)
4 Computing All Cyclic Periodicities
Theorem 2
Given a string x of length n, test whether it is k-cyclic periodicity for all \(1 \le k \le n\), this can be determined in \(\mathcal {O}(n \log n)\) time and \(\mathcal {O}(n)\) space.
Proof
We apply the algorithm of Theorem 1 for \(k=1,2,\dots ,n\) and we test all cyclic periods of length k. The construction of the suffix tree of string x and \(x^R\) is done once costing \(\mathcal {O}(n)\). The total cost is
\(\square \)
Lemma 1
Compute the cyclic period of x.
Proof
The smallest cyclic-period of x is the cyclic-period of x. \(\square \)
5 Computing Maximal Local k-cyclic Periodicity
Theorem 3
We can compute all k-cyclic periodicity of x in \(\mathcal {O}(n \log n)\) time.
Proof
We apply the algorithm for \(k=1,2,\dots ,n\) and in this case, extend it to cyclic periods of length \(k+1\), where |y| = m is cyclic periodic and \(y\alpha \) = \(m+1\) is not cyclic periodic. Next, we perform this algorithm on string \(x^R\) as \(\mathcal {T}(x^R)\), where |y| = m, again is cyclic periodic and \(\beta y\) = \(m+1\) is not cyclic periodic.
The construction of the suffix tree of string x is done once. The total cost is
\(\square \)
Lemma 2
Compute maximal local k-cyclic periodicity of x.
Proof
We compute and merge the arrays for \(y\alpha \) and \(\beta y\) of x. That is the maximal local k-cyclic periodicity of x. \(\square \)
6 Computing \(k'\)-cyclic Coverability
Theorem 4
Given a string x of length n and an integer \(k'\), \(1 \le k' \le n\), test whether it is \(k'\)-cyclic coverable, this can be determined in \(\mathcal {O}(n)\) time and \(\mathcal {O}(n)\) space.
Proof
We compute the suffix tree of string x as \(\mathcal {T}(x)\), and also we compute the suffix tree of string \(x^R\) as \(\mathcal {T}(x^R)\).
Then we check \(x[1,k']\) with each one of \(x[n-k'+1,n]\), \(x[n-k',n-1]\), \(x[n-k'-1,n-2]\) ...\(x[2,k'+1]\), together with the reverse pairs in \(T(x^R)\). This way we build a collection of cyclic covers if there is one.
The construction of the suffix tree costs \(\mathcal {O}(n)\); checking of equality costs \(\mathcal {O}(1)\) and there are n factors. The total time is \(\mathcal {O}(n)\). \(\square \)
7 Computing All Cyclic Coverability
Theorem 5
Given a string x of length n, test whether it is \(k'\)-cyclic coverable for \(1 \le k' \le n\), this can be determined in \(\mathcal {O}(n^2)\) time and \(\mathcal {O}(n)\) space.
Proof
We apply the algorithm for \(k'=1,2,\dots ,n\) and we compare all cyclic coverable of length \(k'\). The construction of the suffix tree of string x is done once. The total cost is
\(\square \)
Lemma 3
Compute the cyclic coverability of x.
Proof
The smallest cyclic coverable of x is the all the cyclic coverable of x. \(\square \)
8 Conclusions and Open Problems
In this paper, we defined k-cyclic periodicity, we presented several efficient algorithms for computing: (i) cyclic periodicity; (ii) all cyclic periodicity; (iii) maximal local cyclic periodicity; (iv) cyclic covers.
Future work will be focused on computing the cyclic-periodic array, that is the cyclic periodicity of every prefix of string x and computing the cyclic-coverability array, that is testing each prefix of x, for cyclic-coverability. Finally, we will extend the cyclic periodicity to cover the case \(u_1u_2u_2\dots u_k u^1\), where \(u_i\)=\(c(u_j)\) \(\forall \) i, j and \(u^1\) is a substring of some \(u_i\).
References
Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000). https://doi.org/10.1007/10719839_9
Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, New York (2007)
Defant, C.: Anti-power prefixes of the Thue-Morse word. arXiv preprint arXiv:1607.05825 (2016)
Erdős, P., et al.: Anti-ramsey theorems (1973)
Fujita, S., Magnant, C., Ozeki, K.: Rainbow generalizations of ramsey theory: a survey. Graphs Comb. 26(1), 1–30 (2010)
Kolpakov, R., Bana, G., Kucherov, G.: mreps: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res. 31(13), 3672–3678 (2003)
McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM (JACM) 23(2), 262–272 (1976)
Narayanan, S.: Functions on antipower prefix lengths of the Thue-Morse word. arXiv preprint arXiv:1705.06310 (2017)
Thue, A.: Uber unendliche zeichenreihen. Norske Vid Selsk. Skr. I Mat-Nat Kl. (Christiana) 7, 1–22 (1906)
Ukkonen, E.: Constructing suffix trees on-line in linear time. In: Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software, Architecture-Information Processing 1992, Volume 1-Volume I, pp. 484–492. North-Holland Publishing Co. (1992)
Weiner, P.: Linear pattern matching algorithms. In: 14th Annual Symposium on Switching and Automata Theory, SWAT 1973, pp. 1–11. IEEE (1973)
Wurpel, D.J., Beatson, S.A., Totsika, M., Petty, N.K., Schembri, M.A.: Chaperone-usher fimbriae of Escherichia coli. PloS one 8(1), e52835 (2013)
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 IFIP International Federation for Information Processing
About this paper
Cite this paper
Ajala, O. et al. (2019). On the Cyclic Regularities of Strings. In: MacIntyre, J., Maglogiannis, I., Iliadis, L., Pimenidis, E. (eds) Artificial Intelligence Applications and Innovations. AIAI 2019. IFIP Advances in Information and Communication Technology, vol 560. Springer, Cham. https://doi.org/10.1007/978-3-030-19909-8_19
Download citation
DOI: https://doi.org/10.1007/978-3-030-19909-8_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19908-1
Online ISBN: 978-3-030-19909-8
eBook Packages: Computer ScienceComputer Science (R0)