Efficient Recognition of Abelian Palindromic Factors and Associated Results

Iliopoulos, Costas S.; Watts, Steven

doi:10.1007/978-3-319-92016-0_20

Costas S. Iliopoulos¹⁸ &
Steven Watts¹⁸

Part of the book series: IFIP Advances in Information and Communication Technology ((IFIPAICT,volume 520))

Included in the following conference series:

IFIP International Conference on Artificial Intelligence Applications and Innovations

958 Accesses

Abstract

A string is called a palindrome if it reads the same from left to right. In this paper we define the new concept of an abelian palindrome which satisfies the property of being abelian equivalent to some palindrome of the same length. The identification of abelian palindromes presents a novel combinatorial problem, with potential applications in filtering strings for palindromic factors. We present an algorithm to efficiently identify abelian palindromes, and additionally generate an abelian palindromic array, indicating the longest abelian palindrome at each location. Specifically, for an alphabet of size $|\varSigma | \le \log _2(n)$ and after $\mathcal {O}(n)$ time preprocessing using $\mathcal {O}(n + |\varSigma |)$ space, we may determine if any factor is abelian palindromic in $\mathcal {O}(1)$ time. Additionally, we may determine the abelian palindromic array in $\mathcal {O}(|\varSigma |n)$ time. We further specify the algorithmic complexity when this condition on alphabet size $|\varSigma |$ is relaxed.

You have full access to this open access chapter, Download conference paper PDF

Block Palindromes: A New Generalization of Palindromes

Counting Palindromes in Substrings

Computing Longest Single-arm-gapped Palindromes in a String

1 Introduction

The identification of palindromic factors in strings, has been a much studied area of stringology, due to the interesting combinatorial aspects and the strong ties with genetic analysis, where palindromes often correspond to significant structures in DNA [3].

Variations of the palindrome identification problem have been frequently introduced, for example Karhumäki and Puzynina presented results on k-abelian palindromes on rich and poor words [2]. Holub and Saari considered the problem as applied to binary words, investigating the properties of palindromic factors of binary strings [4].

We introduce our own simple modification to the problem, yet to be explored, namely abelian palindromes. Though an interesting combinatorial problem in it itself, an efficient method of detecting abelian palindromes can potentially provide a filter by which ordinary palindromic factors may be deduced. This follows from the fact that an ordinary palindrome must necessarily also be an abelian palindrome, and therefore the search space may be reduced if non abelian palindromes can be efficiently dismissed.

Likewise, the abelian palindromic array may potentially be used to assist in the calculation of the ordinary palindromic array, for the purpose of performing a greedy factorisation of a string into ordinary palindromes. This follows from the fact that the abelian palindromic array provides an upper bound for the equivalent value in the ordinary palindromic array.

The rest of this paper is organised as follows. In Sect. 2, we present basic definitions and notation on strings as well as definitions and results on abelian palindromes. Section 3 presents various data structures and algorithmic tools used in our final algorithms. In Sect. 4, a detailed implementation of our algorithms are presented, the first being identification of abelian palindromes, the second being the generation of the abelian palindromic array. Our concluding remarks are noted in Sect. 5. Additionally, the pseudocode of our implementation may be found in Sect. 6.

2 Preliminaries

2.1 Basic Terminology

We begin with basic definitions and notation from [1]. Let $x=x[0]x[1] \dots x[n\,-\,1]$ be a string of length $|x|=n$ over a finite ordered alphabet. We consider the case of strings over an integer alphabet $\varSigma $: each character may be replaced by its lexicographical rank in such a way that the resulting string consists of integers in the range $\{0,\ldots ,n-1\}$. We use $\textsc {ord}(x[i])$ to refer to the lexicographical rank of the character x[i]. We use $\varSigma [i]$ to refer to the ith character of $\varSigma $, i.e. $\varSigma [\textsc {ord}(x[i])]=x[i]$. For example, for the alphabet $\varSigma = \{\texttt {a}, \texttt {c}, \texttt {g}, \texttt {t}\}$ we have $\textsc {ord}(\texttt {g}) = 2$ and $\varSigma [2]=\texttt {g}.$

For two positions i and j on x, we denote by $x[i \mathinner {.\,.}j]=x[i]\dots x[j]$ the factor (sometimes called substring) of x that starts at position i and ends at position j (it is of length 0 if $j<i$), and by $\varepsilon $ the empty string of length 0. We recall that a prefix of x is a factor that starts at position 0 ($x[0 \mathinner {.\,.}j]$) and a suffix of x is a factor that ends at position $n-1$ ($x[i \mathinner {.\,.}n-1]$).

Let y be a string of length m with $0<m\le n$. We say that there exists an occurrence of y in x, or, more simply, that y occurs in x, when y is a factor of x. Every occurrence of y can be characterised by a starting position in x. Thus we say that y occurs at the starting position i in x when $y=x[i \mathinner {.\,.}i + m - 1]$.

We denote the reverse string of x by $x^R$ as the string obtained when reading x from right to left, i.e. $x^R=x[n-1]x[n-2]\ldots x[1]x[0]$. We say a string x is a palindrome when $x = x^R$.

We make use of the bit-wise exclusive or (XOR) operation between two binary strings x and y of the same length $|x|=|y|$, denoted $x \oplus y$. This adheres to the standard definition of XOR on two binary strings, i.e. $z[i] = (x[i] + y[i])\pmod 2$ where $z = x \oplus y$. We may similarly apply the XOR operation to integers, $x, y \in \mathbb {Z}$ by converting x and y to their respective binary equivalents, performing the XOR operation, and converting the binary result into an integer. For example, given $x=5, y=11$ we have $x\,\oplus \,y = 5\,\oplus \,11 = \texttt {0101}\,\oplus \,\texttt {1011} = \texttt {1110} = 14$.

2.2 Abelian Palindromes

The concept of abelian strings relates to the idea of disregarding the order of appearance of characters in a string, and concerning ourselves only with the number of occurrences of each character within the string. With this in mind, we wish to define the concept of an abelian palindrome. To facilitate this, we must first recall the definition of a Parikh Vector.

Definition 1

The Parikh vector $\mathcal {P}(T)$ of a string T over the alphabet $\varSigma $, is a vector of size $|\varSigma |$ which enumerates the number of occurrences of each character of the alphabet in T. If the character $c \in \varSigma $ has ordinality $i=\textsc {ord}(c)$ in the lexicographical ordering of the alphabet $\varSigma $, then $\mathcal {P}(T)[i]$ stores the number of occurrences of c in T.

We say that two strings $T_1$ and $T_2$ are abelian equivalent denoted $T_1 \approx _p T_2$ if and only if they have the same Parikh vector, i.e. are permutations of each other.

For example, the string $T_1=\texttt {accgta}$ has the Parikh vector $\mathcal {P}(T_1)=(2, 2, 1, 1)$. The string $T_2=\texttt {gactcac}$ has the same Parikh vector and thus $T_1 \approx _p T_2$. We may now define the concept of an abelian palindrome.

Definition 2

A string T is an abelian palindrome if and only if there exists some palindrome P such that $P \approx _p T$.

Note that in general, a string T will more easily satisfy the abelian palindromic property over the palindromic property. This comes as a direct result of Lemma 1, which follows clearly from Definition 2.

Lemma 1

$$\begin{aligned} T \,\,{ palindromic }&\implies T \,\,{ abelian \,\,palindromic} \end{aligned}$$

(1)

$$\begin{aligned} T \,\,{ not \, abelian \, palindromic }&\implies T \,\,{ not\,\, palindromic} \end{aligned}$$

(2)

Proof

Assume T is palindromic, choose $P=T$. Therefore we have $P=T \approx _p T$. Thus T is abelian palindromic and Statement 1 is proven. Statement 2 follows as the contrapositive of Statement 1. $\square $

3 Tools

3.1 Initial Observations

We wish to efficiently identify abelian palindromic factors within a string. To enable this, we define some further concepts and auxiliary data structures.

From Definition 2, it is clear that whether a string T is an abelian palindrome is dependant on the values in its Parikh vector $\mathcal {P}(T)$, specifically the number of values that are odd or even. We use $|\mathcal {P}(T)|$ to refer to the total number of values in $\mathcal {P}(T)$, and further use $|(\mathcal {P}(T))|_\texttt {odd}$ and $|(\mathcal {P}(T))|_\texttt {even}$ to refer to the number of odd and even values in $\mathcal {P}(T)$ respectively. This notation allows us to succinctly describe the defining quality of an abelian palindrome in Lemma 2.

Lemma 2

T abelian palindromic $\iff $ $0 \le {|(\mathcal {P}(T))|_\texttt {odd}} \le 1$.

Proof

We refer to the length of T as n. We call $T_l = T[0 \mathinner {.\,.}\lfloor \frac{n\,-\,1}{2}-1 \rfloor ]$ the left half of T and $T_r=T[\lceil \frac{n\,-\,1}{2}+1 \rceil \mathinner {.\,.}n-1]$ the right half of T. Note that if n is even, $T=T_l\;T_r$. If n is odd, $T=T_l\,c\,T_r$ where $c=T[\frac{n\,-\,1}{2}]$. For an ordinary palindrome P, it is clear that if a character s occurs m times in $P_l$ it must correspondingly occur m times in $P_r$, to preserve the palindromic property of P.

We first show that if T is an abelian palindrome, the number of odd values in the Parikh vector $|(\mathcal {P}(T))|_\texttt {odd}$ can not exceed 1 by contradiction. Let us assume that $|(\mathcal {P}(T))|_\texttt {odd}> 1$. In this case, we have at least 2 different characters $s_1, s_2 \in \varSigma $ with an odd number of occurrences in T. For any permutation of the characters in T, at least one of these two characters must have all its occurrences contained entirely within $T_l$ and $T_r$, and we call this character s. The character s therefore occurs 2m times in T where m is the number of occurrences of s in $T_l$. Therefore s has an even number of occurrences, which leads us to a contradiction. Therefore we conclude that $0 \le |(\mathcal {P}(T))|_\texttt {odd}\le 1$.

We now show that we can always form a palindrome from a permutation of T when $0 \le |(\mathcal {P}(T))|_\texttt {odd}\le 1$. In the notation below we use $P_l^R$ to represent the reversal of $P_l$.

Let us assume $|(\mathcal {P}(T))|_\texttt {odd}=0$. In this case, $\mathcal {P}(T)$ contains only even values. We distribute the characters evenly to form an even length palindrome P such that $P \approx _p T$ as follows (with braces under characters indicating the number of repetitions of that character):

$$ P_l = \underbrace{\varSigma [0]}_{\frac{1}{2}\mathcal {P}(T)[0]} \; \underbrace{\varSigma [1]}_{\frac{1}{2}\mathcal {P}(T)[1]} \; \dots \; \underbrace{\varSigma [|\varSigma |-1]}_{\frac{1}{2}\mathcal {P}(T)[|\varSigma |-1]} $$

$$ P = P_l^{} \; P_l^R $$

Now let us assume $|(\mathcal {P}(T))|_\texttt {odd}=1$. In this case, $\mathcal {P}(T)$ contains a single odd entry corresponding to some character $c = \varSigma [i]$. We distribute the characters evenly, placing the character c at the centre, to form an odd length palindrome P such that $P \approx _p T$ as follows:

$$ P_l = \underbrace{\varSigma [0]}_{\frac{1}{2}\mathcal {P}(T)[0]} \; \dots \; \underbrace{\varSigma [i-1]}_{\frac{1}{2}\mathcal {P}(T)[i-1]} \; \underbrace{\varSigma [i+1]}_{\frac{1}{2}\mathcal {P}(T)[i+1]} \; \dots \;\underbrace{\varSigma [|\varSigma |-1]}_{\frac{1}{2}\mathcal {P}(T)[|\varSigma |-1]} \; \underbrace{\varSigma [i]}_{\frac{1}{2}(\mathcal {P}(T)[i]-1)} $$

$$ P = P_l^{} \; \varSigma [i] \; P_l^R $$

Thus we have shown that $0 \le |(\mathcal {P}(T))|_\texttt {odd}\le 1$ is both a necessary and sufficient condition for T to be an abelian palindrome. Therefore Lemma 2 follows.

$\square $

3.2 Prefix Parity Integer Array

We aim to describe a new data structure that will prove useful in recognising palindromic factors, beginning with some new definitions.

Lemma 2 provides us with a useful criterion by which we can seek longest abelian palindromes. We first provide some additional definitions which will prove useful.

Definition 3

A prefix Parikh vector $\mathcal {P}_i(T)$ of a string T is the Parikh vector of the ith prefix of T:

$$\begin{aligned} \mathcal {P}_i(T)=\mathcal {P}(T[0 \mathinner {.\,.}i]) \,\,{ for }\,\, 0 \le i \le n-1 \end{aligned}$$

Definition 4

A parity vector $\mathbb {P}(T)$ of a string T over the alphabet $\varSigma $ is a bit vector of length $\varSigma $ which indicates the parity (even or odd) of the number of occurrences of each character of $\varSigma $ in T (0 indicates even, 1 indicates odd):

$$\begin{aligned} \mathbb {P}(T)[i]=\mathcal {P}(T)[i]\pmod 2 \end{aligned}$$

Definition 5

A prefix parity vector $\mathbb {P}_i(T)$ of a string T is the parity vector of the ith prefix of T:

$$\begin{aligned} \mathbb {P}_i(T)=\mathbb {P}(T[0 \mathinner {.\,.}i]) \,\,{ for }\,\, 0 \le i \le n-1 \end{aligned}$$

Definition 6

A parity integer $\hat{\mathbb {P}}(T)$ of a string T over the alphabet $\varSigma $ is a decimal integer representing the value of the parity vector $\mathbb {P}(T)$ when interpreted as a binary number, with the order of magnitude of each bit determined by the lexicographical order of the alphabet $\varSigma $:

$$\begin{aligned} \hat{\mathbb {P}}(T)= \sum _{i=0}^{|\varSigma |-1} 2^i \times \mathbb {P}(T)[i] \end{aligned}$$

Definition 7

A prefix parity integer $\hat{\mathbb {P}}_i(T)$ of a string T is the parity integer of the ith prefix of T:

$$\begin{aligned} \hat{\mathbb {P}}_i(T)=\hat{\mathbb {P}}(T[0 \mathinner {.\,.}i]) \,\,{ for }\,\, 0 \le i \le n-1 \end{aligned}$$

Definition 8

The prefix parity integer array $\hat{\mathbb {P}}_A(T)$ of a string T of length n is an integer array of length n, which contains the value of $\hat{\mathbb {P}}_i(T)$ at each position i:

$$\begin{aligned} \hat{\mathbb {P}}_A(T)[i] = \hat{\mathbb {P}}_i(T)\end{aligned}$$

The prefix parity integer array $\hat{\mathbb {P}}_A(T)$ (example: see bottom of Fig. 1) is the key to identifying longest abelian palindromes in a string T. To observe this, we note that the Parikh vector of a factor of T can be determined by evaluating the difference between the two prefix Parikh vectors at the start and end indexes of the factor. The parity vector and parity integer of a factor can also be determined in a similar way, by employing the bit-wise exclusive or (XOR) operation. We summarise these observations in Lemma 3.

Lemma 3

Given a string T:

$$\begin{aligned} \mathcal {P}(T[i \mathinner {.\,.}j])&= \mathcal {P}_j(T)- \mathcal {P}_{i-1}(T) \end{aligned}$$

(1)

$$\begin{aligned} \mathbb {P}(T[i \mathinner {.\,.}j])&=\mathbb {P}_j(T) \oplus \mathbb {P}_{i-1}(T) \end{aligned}$$

(2)

$$\begin{aligned} \hat{\mathbb {P}}(T[i \mathinner {.\,.}j])&= \hat{\mathbb {P}}_j(T) \oplus \hat{\mathbb {P}}_{i-1}(T) \end{aligned}$$

(3)

Proof

Given a factor $F=T[i \mathinner {.\,.}j]$ we have $T[0 \mathinner {.\,.}j] = T[0 \mathinner {.\,.}i-1]\,F$. Therefore it follows that $\mathcal {P}(T[0 \mathinner {.\,.}j]) = \mathcal {P}(T[0 \mathinner {.\,.}i-1]) + \mathcal {P}(F) \implies \mathcal {P}(F) = \mathcal {P}(T[0 \mathinner {.\,.}j]) - \mathcal {P}(T[0 \mathinner {.\,.}i-1])$. Thus Statement 1 is proven.

Statement 2 follows from Statement 1 by observing that the truth table for the XOR operator is analogous to the parity table for the subtraction operator (when we interpret 0 as even, 1 as odd):

Note that subtraction $\ (\mathrm {mod}\ 2)$ and XOR are both commutative operations, and therefore the order of operations is unimportant for both tables.

Statement 3 is simply an alternative formulation of Statement 2 in the form of parity integers instead of parity vectors. $\square $

Given $\hat{\mathbb {P}}_A(T)$, it now becomes simple to verify whether a factor $T[i \mathinner {.\,.}j]$ is an abelian palindrome, i.e. $0 \le |\mathcal {P}(T[i \mathinner {.\,.}j])|_\texttt {odd} \le 1$.

Lemma 4

Given a text T over the alphabet $\varSigma $, the following holds:

$$\begin{aligned} T[i \mathinner {.\,.}j] \,\,{ abelian }\,{palindromic}\,\, \iff \hat{\mathbb {P}}_j(T) \oplus \hat{\mathbb {P}}_{i-1}(T) \in \{0\} \cup \{2^0, 2^1, \dots , 2^{|\varSigma | - 1}\} \end{aligned}$$

Proof

The lemma follows from the application of previously defined lemmas. We use brackets under runs of characters to indicate the length of that run of characters:

$\square $

Lemma 4 immediately leads us to Lemma 5, which allows us to identify the longest factor of a string T starting at i which is abelian palindromic.

Lemma 5

Given a text T over the alphabet $\varSigma $, the longest abelian palindromic factor of T occurring at position i is $T[i \mathinner {.\,.}j] $ where j satisfies the following:

$$\begin{aligned} j=\max \{j' : \hat{\mathbb {P}}_{j'}(T) \in M(T, i)\} \end{aligned}$$

$$\begin{aligned} M(T, i) = \{\hat{\mathbb {P}}_{i-1}(T) \oplus k : k \in \{0\} \cup \{2^0, 2^1, \dots , 2^{|\varSigma | - 1}\}\} \end{aligned}$$

For a given string T and position i, we call M(T, i) the match set.

Proof

For a fixed i, the longest $T[i \mathinner {.\,.}j]$ which is abelian palindromic is found by determining the largest j, such that i and j satisfy the condition in Lemma 4. We may derive the match set M(T, i) from this condition by employing the fact that XOR is commutative:

Thus for a given i, the largest $j'$ satisfying the above condition gives us the j corresponding to the largest abelian palindromic factor $T[i \mathinner {.\,.}j]$. $\square $

3.3 Rightmost Array

We describe a simple data structure that will prove useful for identifying longest abelian palindromes (Fig. 2).

Definition 9

The rightmost array $\mathcal {R}(A)$ of an integer array A of length n over the alphabet $\{0, \dots , n-1\}$ stores at position i the index of the rightmost occurrence of the integer i in A. If there is no occurrence of i in A then $A[i]=-1$. Formally stated:

4 Algorithms

4.1 Abelian Palindromic Factor Recognition

Our algorithm to generate a function recognising abelian palindromic factors, relies on the construction of the prefix parity integer array $\hat{\mathbb {P}}_A(T)$. As shown in Lemma 4, we are able to determine if $T[i \mathinner {.\,.}j]$ is abelian palindromic by evaluating the truthfulness of the expression $\hat{\mathbb {P}}_j(T) \oplus \hat{\mathbb {P}}_{i-1}(T) \in \{0\} \cup \{2^0, 2^1, \dots , 2^{|\varSigma | - 1}\}$.

Given $\hat{\mathbb {P}}_A(T)$, this expression may be evaluated in $\mathcal {O}(1)$ time for a given i and j. This follows from the fact that XOR is a constant time operation. Additionally, we may check if an integer is a power of 2 in constant time by employing the $\ (\mathrm {mod}\ 2)$ operation.

It is important to note, that these operations are constant time under the assumption that their arguments do not exceed the maximum word size w of the computer implementation used. If we assume that the alphabet size is bounded by the logarithm of n, then this assumption holds, i.e. $|\varSigma | \le \log _2(n)$. Alternatively, the limitation on $|\varSigma |$ need not depend on n, and may instead be expressed in terms of the word size w of a machine. If the word size is w, we may assume these operations are constant for an alphabet size $|\varSigma | \le w$. For a larger $|\varSigma |$, the expression in Lemma 4 may be evaluated in $\mathcal {O}(\frac{|\varSigma |}{w})$ time.

We now consider the construction of $\hat{\mathbb {P}}_A(T)$. It is possible to construct the array directly while maintaining a single instance of $\hat{\mathbb {P}}_i(T)$, by Lemma 6.

Lemma 6

$$\begin{aligned}&\hat{\mathbb {P}}_A(T)[0] \quad = 2^{\textsc {ord}(T[0])} \\&\hat{\mathbb {P}}_A(T)[i] \quad \, = \hat{\mathbb {P}}_A(T)[i-1]+(2\, \mathbb {P}_{i}(T)[\textsc {ord}(T[i])]-1) \times 2^{\textsc {ord}(T[i])} \;\; 0 < i \le n-1 \end{aligned}$$

Proof

The case for $\hat{\mathbb {P}}_A(T)[0]$ is trivially true. We note that $\hat{\mathbb {P}}_A(T)[i] = \hat{\mathbb {P}}_i(T)$ is an integer representation of $\mathbb {P}_i(T)$ interpreted as a binary string. $\mathbb {P}_{i}(T)$ and $\mathbb {P}_{i-1}(T)$ differ by a single bit flip, corresponding to the character encountered at T[i]. Therefore by Definitions 6 and 7, $\hat{\mathbb {P}}_i(T)$ and $\hat{\mathbb {P}}_{i-1}(T)$ will accordingly differ by a single power of 2, specifically $2^{\textsc {ord}(T[i])}$.

Whether $2^{\textsc {ord}(T[i])}$ should be added or subtracted is dependant on the current parity of the character T[i]. This is determined by $\mathbb {P}_{i}(T)[\textsc {ord}(T[i])]$, with 1 corresponding to addition ($+1$) and 0 corresponding to subtraction ($-1$).

Thus the mapping $2b-1$ where $b \in \{0, 1\}$ is the most recently flipped bit $b=\mathbb {P}_{i}(T)[\textsc {ord}(T[i])]$, indicates the appropriate addition ($+1$) or subtraction ($-1$). $\square $

With this iterative equation for $\hat{\mathbb {P}}_A(T)$, we now have all the tools necessary to efficiently determine abelian palindromic factors and solve the problem as stated. We formalise the result in Theorem 1.

Theorem 1

Given a string T of length n over the alphabet $\varSigma $, after $\mathcal {O}(n)$ time preprocessing and $\mathcal {O}(n + |\varSigma |)$ space, we may perform queries to determine if $T[i \mathinner {.\,.}j]$ is abelian palindromic in $\mathcal {O}(1)$ time when $|\varSigma | \le \log _2(n)$.

Additionally with no constraint on the size of $\varSigma $, with $\mathcal {O}(\frac{|\varSigma |}{w}n)$ time preprocessing we may perform such queries in $\mathcal {O}(\frac{|\varSigma |}{w})$ time, where w is the computer word size.

Proof

By using Lemma 6 we may iteratively construct $\hat{\mathbb {P}}_A(T)[i]$ from $\hat{\mathbb {P}}_A(T)[i-1]$ in $\mathcal {O}(1)$ time at each step, while maintaining the $\varSigma $-sized data structure $\mathbb {P}_{i}(T)$, resulting in a total time complexity $\mathcal {O}(n)$ and space complexity $\mathcal {O}(n + |\varSigma |)$ to construct $\hat{\mathbb {P}}_A(T)$.

By evaluating the expression on $\hat{\mathbb {P}}_A(T)$ in Lemma 4, we may then determine if $T[i \mathinner {.\,.}j]$ is abelian palindromic. Evaluating this expression may be performed in $\mathcal {O}(1)$ time when the number of bits required to store $\hat{\mathbb {P}}_i(T)$ is no larger than a single computer word w, i.e. when $|\varSigma | \le \log _2(n) \le w$. In general, $\hat{\mathbb {P}}_i(T)$ may be stored in $|\varSigma |$ bits, requiring $\lceil \frac{|\varSigma |}{w} \rceil $ words to store, and thus a multiplying factor of $\mathcal {O}(\frac{|\varSigma |}{w})$ time is required for all operations involving $\hat{\mathbb {P}}_i(T)$, both when constructing $\hat{\mathbb {P}}_A(T)$ and when evaluating the expression in Lemma 4, corresponding to a single query. $\square $

4.2 Abelian Palindromic Array Algorithm

Our algorithm to generate the abelian palindromic array makes use of Theorem 1 and the rightmost array described in Definition 9. We also make use of Lemma 7.

Lemma 7

The abelian palindromic array P of a string T satisfies:

$$\begin{aligned} P[i] = \max \{\mathcal {R}(j):j \in M(T, \hat{\mathbb {P}}_A(T)[i])\} \end{aligned}$$

Where M is the match set as described in Lemma 5.

Proof

Lemma 5 indicates that the longest abelian palindromic factor occurring at i is $T[i \mathinner {.\,.}j]$ where j is the index of the rightmost prefix parity integer with a value contained in the match set M(T, i).

By Definition 9, this rightmost j can be found by taking the largest value obtained when querying the rightmost array with every member of the match set. $\square $

Theorem 2

Given a string T of length n over the alphabet $\varSigma $, we may determine the abelian palindromic array of T in $\mathcal {O}(|\varSigma |n)$ time and $\mathcal {O}(n + |\varSigma |)$ space, when $|\varSigma | \le \log _2(n)$.

Proof

Via the proof in Theorem 1 we are able to calculate the prefix parity integer array $\hat{\mathbb {P}}_A(T)$ in $\mathcal {O}(n)$ time and with $\mathcal {O}(n + |\varSigma |)$ space.

Since $|\varSigma | \le \log _2(n)$, we know all values of $\mathcal {R}(A)[i] \in \{-1, 0, \dots , n-1\}$. Therefore the rightmost array $\mathcal {R}(\hat{\mathbb {P}}_A(T))$ may be calculated in $\mathcal {O}(n)$ time, by parsing $\hat{\mathbb {P}}_A(T)$ from right to left and storing any new values encountered. Full details are available in the pseudocode in Sect. 6.

We now apply Lemma 7, which enables us to determine the longest abelian palindromic factor occurring at i by performing $|\varSigma |$ constant time queries. Thus a total of $\mathcal {O}(|\varSigma |n)$ constant time queries are required, and the total time complexity to generate the abelian palindromic array is $\mathcal {O}(|\varSigma |n)$. $\square $

5 Conclusion

We have presented two algorithms, the first for recognising whether or not a factor is abelian palindromic, and the second for generating an array which provides the length of the longest abelian palindromic factor at each position in a string.

The proposed algorithms are both dependant on a new data structure called the prefix parity integer array, requiring $\mathcal {O}(n)$ time to compute for a string with an alphabet size $|\varSigma | \le \log _2(n)$. Additional complexity is required to determine the longest abelian palindromic factor for each position, namely $\mathcal {O}(|\varSigma |n)$ time.

The main improvement in this work, would be to remove the need for the current requirement that $|\varSigma | \le \log _2(n)$, in order to obtain our current best complexity time. This appears to be a reasonable goal.

6 Pseudocode

References

Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, Cambridge (2007)
Book Google Scholar
Karhumäki, J., Puzynina, S.: On k-abelian palindromic rich and poor words. In: Shur, A.M., Volkov, M.V. (eds.) DLT 2014. LNCS, vol. 8633, pp. 191–202. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-09698-8_17
Chapter Google Scholar
Subramanian, S., Chaparala, S., Avali, V., Ganapathiraju, M.K.: A pilot study on the prevalence of DNA palindromes in breast cancer genomes. BMC Med. Genomics 9(3), 73 (2016)
Article Google Scholar
Holub, Š., Saari, K.: On highly palindromic words. Discrete Appl. Math. 157(5), 953–959 (2009)
Article MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

Department of Informatics, King’s College London, London, UK
Costas S. Iliopoulos & Steven Watts

Authors

Costas S. Iliopoulos
View author publications
You can also search for this author in PubMed Google Scholar
Steven Watts
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Steven Watts .

Editor information

Editors and Affiliations

School of Engineering, Democritus University of Thrace, Xanthi, Greece
Lazaros Iliadis
University of Piraeus, Piraeus, Greece
Ilias Maglogiannis
University of Thessaly, Lamia, Greece
Vassilis Plagianakos

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Iliopoulos, C.S., Watts, S. (2018). Efficient Recognition of Abelian Palindromic Factors and Associated Results. In: Iliadis, L., Maglogiannis, I., Plagianakos, V. (eds) Artificial Intelligence Applications and Innovations. AIAI 2018. IFIP Advances in Information and Communication Technology, vol 520. Springer, Cham. https://doi.org/10.1007/978-3-319-92016-0_20

Download citation

DOI: https://doi.org/10.1007/978-3-319-92016-0_20
Published: 22 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92015-3
Online ISBN: 978-3-319-92016-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Federation for Information Processing (opens in a new tab)

Efficient Recognition of Abelian Palindromic Factors and Associated Results

Abstract

Similar content being viewed by others

Block Palindromes: A New Generalization of Palindromes

Counting Palindromes in Substrings

Computing Longest Single-arm-gapped Palindromes in a String

1 Introduction

2 Preliminaries

2.1 Basic Terminology

2.2 Abelian Palindromes

Definition 1

Definition 2

Lemma 1

Proof

3 Tools

3.1 Initial Observations

Lemma 2

Proof

3.2 Prefix Parity Integer Array

Definition 3

Definition 4

Definition 5

Definition 6

Definition 7

Definition 8

Lemma 3

Proof

Lemma 4

Proof

Lemma 5

Proof

3.3 Rightmost Array

Definition 9

4 Algorithms

4.1 Abelian Palindromic Factor Recognition

Lemma 6

Proof

Theorem 1

Proof

4.2 Abelian Palindromic Array Algorithm

Lemma 7

Proof

Theorem 2

Proof

5 Conclusion

6 Pseudocode

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Societies and partnerships

Search

Navigation