
1 Introduction

Formal Languages and Complexity Theory have a long (common) history. Both fields can be seen as two of the major backbones of Theoretical Computer Science. Both fields are often taught together in undergraduate courses, mostly obligatory in Computer Science curricula. This is also evidenced by classical textbooks like [50]. Yet, modern developments in complexity and algorithmics are barely mirrored in Formal Languages. We want to argue in this paper that this is a situation that needs to change.

We will work through six case studies to explain several findings of recent years. We will also expose a number of open problems in each of these cases. This should motivate researchers working in Formal Languages to look deeper into recent developments in Complexity Theory, but also researchers more oriented towards these modern aspects of Complexity Theory to look into (possibly seemingly old) problems in Formal Languages, to see if they could offer solutions or at least new approaches to these problems.

In most cases, the problems we discuss in the area of Formal Languages can be easily understood with the background knowledge every computer scientist acquires during the corresponding bachelor courses. Therefore, we will only fix notation and assume that no further explanations are necessary. By way of contrast, we will spend some more time explaining at least some of the concepts discussed nowadays in Complexity Theory and Algorithms, so that readers with a background rooted in Formal Languages can easily follow this introductory exposition. In any case, we want to make clear why these modern approaches are particularly suited for attacking decision problems in Formal Languages.

2 Some Modern Concepts of Complexity Theory

As most of our case studies deal with multivariate analysis, let us first delve into the general scheme behind this idea. This is intimately linked to the basic ideas of Parameterized Complexity, which could be paraphrased as follows. If we can show that some decision problem is computationally complicated, which is typically formalized by proving that it is \(\mathsf {NP}\)-hard or PSPACE-hard, what can we do about it, assuming that we still want to solve it with the help of computers? More traditionally, help was expected from approximation algorithms or, more practically, from heuristics, and in fact the proof of computational hardness often served as a sort of excuse for using these rules of thumb called heuristics. However, using heuristics with no success guarantees is not very satisfying, at least from a theoretical perspective. What can be done about such a situation?

It is exactly here where Parameterized Complexity, and also its more practical side, namely Parameterized Algorithms, steps in. The basic idea is to not merely look at the bitstring length as the only source of information that an instance of a computational problem might offer for defining or measuring its complexity, but to also look at other aspects of the instance, formalized in terms of a so-called parameter. What happens if this parameter is always small in the concrete instances that practitioners look at? Does this still mean that the problem is computationally hard? Or can we possibly solve these practically relevant instances efficiently, although the problem is \(\mathsf {NP}\)-hard when considering all possible instances? Interestingly, one of the most successful approaches within Parameterized Algorithms can be viewed as an analysis of natural heuristics, which means, in more formal terms, the use of reduction rules that provide means to (repeatedly) move from an instance to an equivalent instance of smaller size.

What could these parameters be? Let us look at various examples.

  • The most classical \(\mathsf {NP}\)-hard problem is arguably Satisfiability, or SAT for short. Given a Boolean formula \(\varphi \) in conjunctive normal form (CNF), decide if \(\varphi \) is satisfiable. A possibly natural choice of a parameter could be a bound k on the number of literals that may appear in a clause. Fixing this upper bound k, we arrive at problems like k-SAT. This parameter choice is not that helpful, because it is well-known that even 3-SAT remains \(\mathsf {NP}\)-hard, and in fact there are quite a number of much more restricted variants of SAT that cannot be solved in polynomial time, assuming that \(\mathsf {P}\) does not equal \(\mathsf {NP}\); see [57, 61, 93] for several such restrictions. Only 2-SAT is still solvable in polynomial time.

  • Looking back at the proof of the theorem of Cook, phrased in (nowadays) non-standard terminology in [21], one can argue that the more basic \(\mathsf {NP}\)-complete problem is in fact the following one, which can be considered as a problem arising in the field of Formal Languages: Given a nondeterministic Turing machine M, an input string x and an integer k, does M halt within k steps accepting x? Now, if k is fixed to a small constant, this problem looks more amenable than the previous one. More precisely, let us focus on one-tape machines for now. Assume that we have \(\ell \) symbols in the work alphabet of M and assume that M has t states. Then at each time step, being in a specific state, upon reading a symbol, M has at most \(3\ell t\) choices to continue its computation, the factor three entering because of the different choices for head moves. Moreover, after k steps, at most \({\mathcal O}(tk\ell ^{k})\) many configurations are possible, simply because at most k tape cells could have been visited within k steps, and also the current state and head position have to be memorized. The difference to Turing machines with a dedicated read-only input tape is of no importance here. It can be easily checked if an accepting configuration can be reached from the initial configuration within the directed graph implicitly given by the configurations and the reachability relation. Instead of writing \(\ell \) and t, we can also consider the size of M, with reasonable ways to measure this; the size of the state transition table should always be captured here, and this also bounds the number of choices. So, we might write the running time of this algorithm as \({\mathcal O}(|M|^k)\), which is polynomial when k is really fixed, but if k is considered as part of the input, this algorithm is clearly exponential. Similar considerations are valid for multi-tape nondeterministic Turing machines. For future reference, let us call these problems Short NTM Acceptance (when referring to single tapes) and Short Multi-Tape NTM Acceptance.

  • Recall that a hypergraph can be specified as \(G=(V,E)\) with \(E\subseteq 2^V\). Elements of V are called vertices, while elements of E are called (hyper-)edges. In different terminology, V is the ground set or universe, and E is a set system. \(C\subseteq V\) is called a hitting set if \(e\cap C\ne \emptyset \) for all \(e\in E\). In the decision problem Hitting Set, we are given a hypergraph \(G=(V,E)\) and an integer k, and the question is if one can find a hitting set of cardinality at most k for G. This question is well-known to be \(\mathsf {NP}\)-hard. There is a simple relation to SAT: one can view the hyperedges as clauses. Now, a hitting set corresponds to the set of variables set to true. It is also evident why we need the bound k here: setting all variables to true corresponds to selecting V as a hitting set, and this is clearly a satisfying assignment, because none of the clauses contains any negation. This relation also motivates to study d-Hitting Set, where d upper-bounds the number of vertices in each hyperedge. In contrast to SAT, this restriction looks quite promising, considering the following algorithm: As long as there are hyperedges in G and as long as the integer k is positive, do the following: pick such a hyperedge e and branch according to the at most d possibilities for hitting e. In each branch, i.e., for each chosen vertex \(v\in e\), delete all hyperedges from G that contain v, decrement k and continue. The original instance is a YES-instance if and only if this loop is ever exited with \(E=\emptyset \). As at most \(d^k\) many possibilities are tested, the running time of this algorithm can be estimated as \({\mathcal O}(d^kp(|G|))\), where p is some polynomial and |G| gives the size of the original input; a sketch of this branching algorithm is given after this list. Without this additional accounting of d, it is not clear how to solve this problem better than \({\mathcal O}(|G|^k)\).

  • Observe that hypergraph instances of 2-Hitting Set can also be viewed as undirected simple graphs, because loops (i.e., singletons in the set of hyperedges) can be easily removed, as their constituent elements must be put into the hitting set. 2-Hitting Set is also known as Vertex Cover. An alternative parameterization might then be the dual parameter \(k_d=|V|-k\). With this parameter, the problem is also known as Independent Set, as it can be rephrased as asking for a set I of \(k_d\) many vertices such that no edge contains two vertices from I. Such a set is also known as an independent set. A related question asks, given a graph G and an integer \(\ell \), if there is a clique of size \(\ell \) in G, i.e., a set K of \(\ell \) vertices such that each pair of vertices from K is adjacent. All these decision problems are also known to be \(\mathsf {NP}\)-complete.
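The branching algorithm for d-Hitting Set sketched in the list above can be made concrete in a few lines. The following is a minimal Python sketch under our own encoding of hypergraphs as lists of vertex sets; it is meant as an illustration, not as the implementation from any of the cited papers.

```python
def d_hitting_set(edges, k):
    """Bounded search tree for d-Hitting Set: decide whether a hitting
    set of size at most k exists; runs in O(d^k * poly) time when every
    hyperedge has at most d vertices."""
    if not edges:              # every hyperedge has been hit
        return True
    if k == 0:                 # budget exhausted, but hyperedges remain
        return False
    e = edges[0]               # pick any remaining hyperedge ...
    for v in e:                # ... and branch on the <= d ways to hit it
        rest = [f for f in edges if v not in f]
        if d_hitting_set(rest, k - 1):
            return True
    return False

# Example: the first instance can be hit by {2, 4}; the second consists
# of three disjoint hyperedges and thus needs three vertices.
assert d_hitting_set([{1, 2}, {2, 3}, {4, 5}], 2)
assert not d_hitting_set([{1, 2}, {3, 4}, {5, 6}], 2)
```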

With these problems from different fields in mind, it makes sense to consider a decision problem \(\mathcal P\) equipped with a computable parameterization function \(\kappa \) that maps instances to integers. Two ways in which algorithms can behave nicely on instances x might be considered. (a) There is an algorithm that solves instances x of \(\mathcal P\) in time \({\mathcal O}(|x|^{\kappa (x)})\). (b) There is an algorithm that solves instances x of \(\mathcal P\) in time \({\mathcal O}(|x|^df({\kappa (x)}))\) for some constant degree d and some (computable) function f. Both definitions imply that \(\mathcal P\) can be solved in polynomial time when restricted to instances whose parameter is smaller than some constant c. Yet, possibility (b) means that, assuming d to be reasonably small and f not to behave too badly, \(\mathcal {P}\) can be solved not only for instances where the parameter \(\kappa (x)\) is bounded by a constant, but also for instances where it grows moderately; hence, (b) is a far more desirable property. (Parameterized) problems \((\mathcal {P},\kappa )\) that satisfy (a) are also said to belong to \(\mathsf {XP}\), while if \((\mathcal {P},\kappa )\) satisfies (b), it belongs to \(\mathsf {FPT}\) (fixed-parameter tractable).

Let us look at our example problems again.

  • Let \(\kappa _1\) map a Boolean formula \(\varphi \) to the number of variables that occur in \(\varphi \). Then, \((\mathrm {SAT},\kappa _1)\) is in \(\mathsf {FPT}\), because there are only \(2^{\kappa _1(\varphi )}\) many assignments one has to check to see if \(\varphi \) is satisfiable (a brute-force sketch follows after this list). Let \(\kappa _2\) map a Boolean formula \(\varphi \) in CNF to the maximum number of literals appearing in any clause of \(\varphi \). Then, unless \(\mathsf {P}\) equals \(\mathsf {NP}\), \((\mathrm {SAT},\kappa _2)\) does not belong to \(\mathsf {XP}\).

  • For an instance \(I=(M,x,k)\) of Short NTM Acceptance, let \(\kappa _3(I)=k\). As argued above, (Short NTM Acceptance\(,\kappa _3)\) belongs to \(\mathsf {XP}\). However, it seems to be hard to put it into \(\mathsf {FPT}\). If we consider the size of M as an additional parameter, i.e., \(\kappa _4(I)=|M|+k\), then we have also seen that (Short NTM Acceptance\(,\kappa _4)\) belongs to \(\mathsf {FPT}\). Similar considerations hold true for the multi-tape case. Possibly more interestingly, our considerations show that also with the parameterization \(\kappa _5\) that adds k and the size of the overall alphabet, we end up in \(\mathsf {FPT}\). A reader knowledgeable about the early days of Descriptional Complexity (within FL) might ponder for a moment if there might be a possibility to put (Short NTM Acceptance\(,\kappa _3)\) into \(\mathsf {FPT}\) by resorting to a theorem of Shannon [84] that states that Turing machines with an arbitrary number of working tape symbols can be simulated by Turing machines with binary tapes. However, this idea is problematic for at least two reasons: (a) the simulating machine (with binary working tape) needs a considerable amount of time for the simulation, thereby changing the upper bound k on the number of steps; (b) in Shannon's simulation, the order of magnitude of the product of alphabet size and number of states stays the same. In combination, both effects counteract this idea.

  • Let us study various parameterizations for Hitting Set. Let \(G=(V,E)\) with \(E\subseteq 2^V\) and k form an instance and define \(\kappa _6:(G,k)\mapsto k\), \(\kappa _7:(G,k)\mapsto |V|\), \(\kappa _8:(G,k)\mapsto |E|\) and \(\kappa _9:(G,k)\mapsto \max \{|e |:e\in E\}\). As \(\left( {\begin{array}{c}n\\ k\end{array}}\right) \in {\mathcal O}(n^k)\), the instance (G, k) can be solved in roughly \({\mathcal O}(|V|^k)\) steps by testing, for all k-element subsets of V, whether they form a hitting set, putting (Hitting Set\(,\kappa _6\)) in \(\mathsf {XP}\). With the same idea, (Hitting Set\(,\kappa _7\)) is in \(\mathsf {FPT}\). By using dynamic programming as explained in [30, 41], also (Hitting Set\(,\kappa _8\)) is in \(\mathsf {FPT}\). Due to the \(\mathsf {NP}\)-hardness of Vertex Cover, if (Hitting Set\(,\kappa _9\)) is in \(\mathsf {XP}\), then \(\mathsf {P}\) equals \(\mathsf {NP}\). When combining parameters, \(\kappa _{10}:=\kappa _6+\kappa _9\), we conclude with the reasoning given above that (Hitting Set\(,\kappa _{10}\)) is in \(\mathsf {FPT}\). This is quite instructive, because it shows that even by combining relatively weak parameters, one can obtain relatively nice algorithmic results. More details, also for special cases, can be found in [30,31,32, 99].

  • Reconsider \(\kappa _6:(G,k)\mapsto k\) for Vertex Cover. Then, by the equivalence to 2-Hitting Set, (Vertex Cover\(,\kappa _6\)) belongs to \(\mathsf {FPT}\). However, reparameterizing by \(\kappa _{11}: (G,k)\mapsto |V|- k\), we only know membership in \(\mathsf {XP}\) for (Vertex Cover\(,\kappa _{11}\)). Recall that this is equivalent to considering (Independent Set\(,\kappa _6\)). By moving over from G to the graph complement \(\overline{G}\), with \(\overline{(V,E)}=(V,\left( {\begin{array}{c}V\\ 2\end{array}}\right) \setminus E)\), one understands that also (Clique\(,\kappa _6\)) has the same complexity status as (Vertex Cover\(,\kappa _{11}\)).
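To illustrate the first item of the list above: the \(\mathsf {FPT}\)-membership of \((\mathrm {SAT},\kappa _1)\) is witnessed by plain enumeration of all \(2^{\kappa _1(\varphi )}\) assignments. Here is a minimal sketch, with our own literal encoding ("x" for a positive and "!x" for a negated literal); it is only meant to make the trivial algorithm tangible.

```python
from itertools import product

def sat_brute_force(clauses, variables):
    """Decide SAT by testing all 2^n assignments, n = |variables|;
    this is FPT when parameterized by the number of variables."""
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        def holds(lit):        # evaluate a single literal
            return not val[lit[1:]] if lit.startswith("!") else val[lit]
        if all(any(holds(lit) for lit in clause) for clause in clauses):
            return True
    return False

# (x or y) and (!x or y) and (!y) is unsatisfiable:
assert not sat_brute_force([{"x", "y"}, {"!x", "y"}, {"!y"}], ["x", "y"])
```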

So far, we introduced the classes \(\mathsf {FPT}\) and \(\mathsf {XP}\) of parameterized problems. Clearly, \(\mathsf {FPT}\subseteq \mathsf {XP}\). As often in Complexity Theory, it is unknown if this inclusion is strict, but it is generally assumed that this is the case. Moreover, we have seen examples of parameterized problems that are not in \(\mathsf {XP}\), assuming that \(\mathsf {P}\) is not equal to \(\mathsf {NP}\). In order to have a more refined picture of the world of parameterized problems, it is a good idea to define appropriate reductions. It should be clear what properties such a many-one \(\mathsf {FPT}\) reduction relating problem \((\mathcal {P},\kappa )\) to \((\mathcal {P}',\kappa ')\) should satisfy: (a) it should translate an instance I of \(\mathcal P\) to an instance \(I'\) of \(\mathcal {P}'\) within time \(f(\kappa (I))|I|^{{\mathcal O}(1)}\) for some computable function f; (b) it should preserve the parameterization in the sense that there is a function g such that \(\kappa '(I')\le g(\kappa (I))\); (c) I is a YES-instance of \(\mathcal {P}\) if and only if \(I'\) is a YES-instance of \(\mathcal {P}'\). Such a notion allows us to define further complexity classes, based on the idea of interreducibility with respect to \(\mathsf {FPT}\) reductions, i.e., of being \(\mathsf {FPT}\)-equivalent. Observe that the classes \(\mathsf {FPT}\) and \(\mathsf {XP}\) studied so far are closed under \(\mathsf {FPT}\) reductions. Let \(\mathsf {W}\)[1] be the class of parameterized problems that are \(\mathsf {FPT}\)-equivalent to (Short NTM Acceptance\(,\kappa _3)\), and let \(\mathsf {W}\)[2] be the class of parameterized problems that are \(\mathsf {FPT}\)-equivalent to (Short Multi-Tape NTM Acceptance\(,\kappa _3)\). In fact, there is a whole (presumably infinite) hierarchy of complexity classes captured in the following inclusion chain:

$$ \mathsf {FPT}\subseteq \mathsf {W}[1] \subseteq \mathsf {W}[2] \subseteq \cdots \subseteq \mathsf {XP}. $$

Looking back at our examples, it is known that (Independent Set\(,\kappa _6\)) and (Clique\(,\kappa _6\)) are complete for \(\mathsf {W}[1]\), while (Hitting Set\(,\kappa _6\)) is complete for \(\mathsf {W}[2]\). As we will see in the following sections, many natural parameterizations of computational problems stemming from Formal Languages lead to \(\mathsf {W}[1]\)-hardness results. If we still want to employ the idea of getting \(\mathsf {FPT}\) algorithms, we need to find different, often multiple parameterizations. This approach is also called multivariate analysis. We refer to [12, 28, 71] for further discussions. At this point, we only recall that we also used this idea when looking at (Hitting Set\(,\kappa _{10}\)).

There is a nice characterization of \(\mathsf {FPT}\) based on the idea of the existence of a polynomial-time many-one reduction termed kernelization that maps instances I of \((\mathcal {P},\kappa )\) to instances \(I'\), also of \((\mathcal {P},\kappa )\), satisfying \(|I'|\le f(\kappa (I))\) for some computable function f, where |I| yields the size of instances I in the classical sense. \(I'\) is also called the kernel of I. The existence of kernelizations is often shown by providing a collection of so-called reduction rules that should be applied exhaustively in order to produce the kernel. As an example, the following two reduction rules together provide a kernelization for (Vertex Cover\(,\kappa _6\)): (a) delete vertices of degree zero, or, more formally, \(((V,E),k)\mapsto ((V\setminus \{v\mid v\notin \bigcup _{e\in E}e\},E),k)\), and (b) \((G,k)\mapsto (G-v,k-1)\) if there is some \(v\in V(G)\) with more than k neighbors. In fact, it is not hard to see that the resulting kernels \((G',k')\) even satisfy a polynomial bound on the size of \(G'\) (with some reasonable size measure) with respect to \(\kappa _6(G',k')=k'\). We also say that (Vertex Cover\(,\kappa _6\)) admits a polynomial kernel.
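A minimal sketch of this kernelization, applying the two rules exhaustively; graphs are assumed to be given as vertex and edge sets, and the names are illustrative only.

```python
def vc_kernelize(vertices, edges, k):
    """Kernelization for (Vertex Cover, k): exhaustively
    (a) delete vertices of degree zero and
    (b) take any vertex with more than k neighbors into the cover."""
    vertices = set(vertices)
    edges = {frozenset(e) for e in edges}
    changed = True
    while changed and k >= 0:
        changed = False
        deg = {v: 0 for v in vertices}
        for e in edges:
            for v in e:
                deg[v] += 1
        isolated = {v for v in vertices if deg[v] == 0}
        if isolated:                         # rule (a)
            vertices -= isolated
            changed = True
        big = next((v for v in vertices if deg[v] > k), None)
        if big is not None:                  # rule (b): big must belong
            vertices.discard(big)            # to any cover of size <= k
            edges = {e for e in edges if big not in e}
            k -= 1
            changed = True
    # In the kernel every vertex has degree <= k, so any YES-instance
    # retains at most k*k edges: a polynomial kernel.
    return vertices, edges, k
```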

With this notion at hand, the question is if one can always produce kernels of polynomial size for parameterized problems in \(\mathsf {FPT}\). This is not the case unless the polynomial-time hierarchy collapses to the third level, which is considered to be unlikely. For instance, under this condition, it can be shown that (Hitting Set\(,\kappa _7\)) and (Hitting Set\(,\kappa _8\)), and hence also \((\mathrm {SAT},\kappa _1)\), have no polynomial-size kernels.

Another avenue that one could follow is to refine questions about the optimality of existing algorithms beyond the question of whether the decision problem at hand belongs to \(\mathsf {P}\) or is \(\mathsf {NP}\)-hard. This line of research is nowadays captured under the umbrella of Fine-Grained Complexity.

For instance, as discussed, there is a trivial algorithm that solves a SAT instance by testing all assignments. Neglecting polynomial factors, as is standard by the very definition of \(\mathsf {FPT}\), we can also state that SAT instances on n variables can be solved in time \(O^*(2^n)\), where the \(O^*\)-notation was invented just to suppress polynomial factors. Now, one can ask if there is any algorithm that solves SAT instances in time \(O^*((2-\varepsilon )^n)\) for some \(\varepsilon >0\). No such algorithm is known today. The hypothesis that no such algorithm exists is also called the Strong Exponential-Time Hypothesis, or SETH for short. A weaker assumption is to believe that there is no function \(f(n)\in o(n)\) such that SAT, or in this case equivalently also 3-SAT, can be solved in time \(O^*(2^{f(n)})\). This hypothesis is also known as the Exponential-Time Hypothesis, or ETH for short. The outcome of the famous sparsification lemma is sometimes good to know: under ETH, there is also no \(O^*(2^{o(n)})\)-time algorithm for 3-SAT on instances that have \({\mathcal O}(n)\) many clauses. For instance, it is known that under ETH, no \(2^{o(n)}\) algorithm exists for solving Hitting Set on instances with n vertices. We also refer to [22]. Further consequences of this approach for the (non-)existence of certain types of \(\mathsf {FPT}\) algorithms are also discussed in the survey paper [62]. For instance, while there are quite a number of algorithms that solve (Vertex Cover\(,\kappa _6\)) in time \(O^*(2^{{\mathcal O}(\kappa _6(G))})\), under ETH there is no algorithm for doing this in time \(O^*(2^{o(\kappa _6(G))})\). It should be clear that for obtaining such results, another form of reduction is needed. We do not explain any details here, but just observe that many well-known reductions suffice for showing some basic ETH-based results. For instance, the typical textbook reductions for showing \(\mathsf {NP}\)-hardness of Vertex Cover start from 3-SAT and then introduce gadgets with two or three vertices for variables or clauses, respectively. Hence, by the outcome of the sparsification lemma, under ETH there is no \(O^*(2^{o(|V|)})\)-time algorithm for computing a minimum vertex cover of \(G=(V,E)\). However, not all lower bounds of this type that one assumes to hold can be shown to be rooted in ETH. For instance, the Set Cover Conjecture claims (using the vocabulary of this paper) that (Hitting Set\(,\kappa _7\)) cannot be solved in time \(O^*(2^{o(\kappa _7(G))})\). As discussed in [22], it is not clear how this (plausible) conjecture relates to ETH.

Finally, one might wonder how to attack \(\mathsf {XP}\)-problems, trying to understand how the parameter(s) influence the running time. For instance, consider the problem of finding a clique of size k in a graph with n vertices. Using brute force, this problem can be solved in time \(O^*(\left( {\begin{array}{c}n\\ k\end{array}}\right) )={\mathcal O}(n^{k+c})\) for some small constant c. Nešetřil and Poljak showed in [66] that this can be improved to \({\mathcal O}(n^{k\omega /3+c})\), where \(\omega \) is the exponent such that \(n\times n\) matrices can be multiplied in time \({\mathcal O}(n^\omega )\). The underlying idea is that triangles can be found by multiplying adjacency matrices. Nowadays, it is believed that this is indeed the correct bound for detecting k-cliques. The hypothesis that no better algorithms are possible than those intimately linked to matrix multiplication is known as the k-Clique Conjecture. This is one of various examples of conjectures within the realm of polynomial-time algorithms on which several hardness assertions are based. In this context, it is also worth mentioning that there is a common belief that \(\omega >2\); this also links to interesting combinatorial conjectures as exhibited in [5]. We will re-encounter this conjecture in our last case study. Virginia Vassilevska Williams wrote several surveys on Fine-Grained Complexity, the most recent published one being [104]. When dealing with distinguishing problems within \(\mathsf {P}\), adapted notions of reductions also have to be introduced. For the sake of space and because this is not that central to this paper, we are not going to present them here, but refer to the mentioned survey papers. Also due to the nature of these reductions, in this part of the complexity world, polylogarithmic factors in the running time are often ignored, leading to notations like \(\widetilde{O}(n^2)\) for denoting quadratic running times up to terms like \((\log (n))^3\). Vassilevska Williams put the central question of Fine-Grained Complexity as follows in the survey to appear in [85]: For each of the problems of interest with textbook running time \({\mathcal O}(t(n))\) and nothing much better known, is there a barrier to obtaining an \({\mathcal O}(t(n)^{1-\varepsilon })\) time algorithm for \(\varepsilon > 0\)? Notice that this formulation ignores polylogarithmic factors. Also, SETH perfectly fits into this line of questions.
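The triangle-based idea mentioned above is easy to make concrete: a graph contains a triangle iff the trace of the third power of its adjacency matrix is positive, so triangle detection inherits the \({\mathcal O}(n^\omega )\) bound; the k-clique algorithm of [66] applies the same trick to an auxiliary graph whose vertices are the \((k/3)\)-cliques. A minimal sketch of the base case:

```python
import numpy as np

def has_triangle(adjacency_matrix):
    """Triangle detection via matrix multiplication: trace(A^3) counts
    each triangle six times, so it is positive iff a triangle exists."""
    a = np.array(adjacency_matrix, dtype=np.int64)
    return int(np.trace(a @ a @ a)) > 0

# A 4-cycle has no triangle; adding a chord creates one.
c4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
assert not has_triangle(c4)
c4[0][2] = c4[2][0] = 1
assert has_triangle(c4)
```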

Many more details can be found in the textbooks that have appeared in the meantime in the context of Parameterized Complexity, often also capturing aspects of ETH and also of SETH. We refer to [23,24,25, 40, 42, 70].

In the following, we present six case studies, focusing on typical effects that show up when dealing with computational problems in Formal Languages. We start with a problem dealing with strings only, continuing with problems involving grammars and automata. There are some common themes throughout all these studies, for instance, the (sometimes surprising) role played by the size of the alphabet concerning the complexity status of the problems. Another recurring theme is that rather simple algorithms can be shown to be optimal.

3 First Case Study: String-to-String Correction S2S

The edit distance is a measure introduced to quantify the distance between two strings \(S,T\in \varSigma ^*\), where S is the source string and T is the target string, by counting the number of elementary edit operations that are necessary to turn S into T. The complexity of this problem depends on the permitted operations. Let O be the set of permitted operations, \({O\subseteq \{\mathbf{C },\mathbf{D },\mathbf{I },\mathbf{S }\}}\), with:

  • C Change: replace/substitute a letter

  • D Delete a letter

  • I Insert a letter

  • S Swap: transpose neighboring letters

For each O, define the problem O-String-to-String Correction, or O-S2S for short, with input \(\varSigma \), \(S,T\in \varSigma ^*\), \(k\in \mathbb {N}\), to be the following question: Is it possible to turn S into T with a sequence of at most k operations from O?

Wagner [97] obtained a by now classical dichotomy result that can be stated as follows.

Theorem 1

Consider \({O\subseteq \{\mathbf{C },\mathbf{D },\mathbf{I },\mathbf{S }\}}\).

  • If \({O\in \{\{\mathbf{S },\mathbf{D }\},\{\mathbf{S },\mathbf{I }\}\}}\), then O-S2S is \(\mathsf {NP}\)-complete.

  • If \({O\notin \{\{\mathbf{S },\mathbf{D }\},\{\mathbf{S },\mathbf{I }\}\}}\), then O-S2S is solvable in polynomial time.

Note: \({\{\mathbf{S },\mathbf{D }\}}\)-S2S is equivalent to \({\{\mathbf{S },\mathbf{I }\}}\)-S2S.

How is this dichotomy result obtained? What is the source of \(\mathsf {NP}\)-hardness? Conversely, how do the algorithms work? Here, dynamic programming is the key, and the corresponding algorithms (or variants thereof) have made their way into textbooks on algorithms.
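For the polynomial-time cases, it is worth recalling the classical dynamic program. The following minimal sketch computes the \({\{\mathbf{C },\mathbf{D },\mathbf{I }\}}\)-S2S distance (the Levenshtein distance), which we will meet again in Theorem 8:

```python
def edit_distance(s, t):
    """Textbook dynamic program for the Levenshtein distance,
    quadratic in the string lengths."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                    # delete the first i letters of s
    for j in range(n + 1):
        d[0][j] = j                    # insert the first j letters of t
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,                       # delete
                          d[i][j - 1] + 1,                       # insert
                          d[i - 1][j - 1] + (s[i - 1] != t[j - 1]))  # change
    return d[m][n]

assert edit_distance("kitten", "sitting") == 3
```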

Let us first study the \(\mathsf {NP}\)-hard variant, focusing on \({\{\mathbf{S },\mathbf{D }\}}\)-S2S. What are natural parameters of an instance I defined by \(\varSigma \), \(S,T\in \varSigma ^*\), \(k\in \mathbb {N}\)? From the viewpoint of now traditional parameterized complexity, \(\kappa _1(I)=k\) is a first pick. This has been considered in [3]. Its main result can be stated as follows.

Theorem 2

\({\{\mathbf{S },\mathbf{D }\}}\)-S2S with parameter \(\kappa _1(\varSigma ,S,T,k)=k\) is in \(\mathsf {FPT}\). More precisely, an instance \(I=(\varSigma ,S,T,k)\) can be solved in time \({\mathcal O}(\varphi ^k\cdot |S|\cdot \log (|\varSigma |))\), where \(\varphi <1.62\) is the golden ratio.

In addition, a polynomial kernel was obtained in [101]. One of the important observations is that we can assume that all \(k_1= {|S|-|T|}\) deletions are executed prior to the swaps. Moreover, each of the at most \(k-k_1\) swaps can be described by one position in the string. Hence, k is upper-bounded by a function in |S|, and we can use the previously mentioned algorithm to prove:

Proposition 3

 \({\{\mathbf{S },\mathbf{D }\}}\)-S2S with parameter \(\kappa _2(\varSigma ,S,T,k)=|S|\) is in \(\mathsf {FPT}\).

With quite a similar reasoning, one can obtain the next result.

Proposition 4

 \({\{\mathbf{S },\mathbf{D }\}}\)-S2S with parameter \(\kappa _3(\varSigma ,S,T,k)=|T|\) is in \(\mathsf {FPT}\).

The previous two results seem to be a bit boring, because |S| and |T| appear to be the natural choice to describe the overall size of the input. Also, these string lengths would be rather big in practice, while one could assume k to be rather small if one compares strings that are somehow similar to each other.

There is one last choice of a parameter that one might tend to overlook when first being confronted with this problem, namely the size of the alphabet over which the strings S and T are formed. Yet, studying Wagner's \(\mathsf {NP}\)-hardness proof, one is led to the conclusion that \(|\varSigma |\) is crucial for this proof, which simply does not work if \(|\varSigma |\) is bounded, for instance, if we consider binary alphabets only. This might look a bit surprising, as it might be hard to imagine that the alphabet size itself could be of such importance, given the fact that the alphabet carries no visible or obvious structure. Yet, the consideration of the alphabet size will be a recurring theme in this paper, and we will see various situations where this is in fact a crucial parameter. This problem was first resolved in [36] for binary alphabets, showing the following result.

Proposition 5

 \({\{\mathbf{S },\mathbf{D }\}}\)-S2S on binary alphabets can be solved in cubic time.

This was soon superseded by a more general result by Meister [64], showing that indeed the size of the alphabet was the crucial source of hardness for this problem. This was later improved by Barbay and Pérez-Lantero [10] concerning the dependence of the degree of the polynomial describing the running time on the alphabet size parameter.

Theorem 6

 \({\{\mathbf{S },\mathbf{D }\}}\)-S2S with parameter \(\kappa _4(\varSigma ,S,T,k)=|\varSigma |\) is in \(\mathsf {XP}\). More precisely, it can be solved in time \({\mathcal O}(|S|^{|\varSigma |+1})\).

Still, this result is likely practical only for very small alphabet sizes. Also, it is still open if one can put the problem in \(\mathsf {FPT}\). From a more practical perspective, as the parameters k and \(\varSigma \) are rather unrelated and also because the algorithmic approaches leading to Theorems 2 and 6 are quite different, it would be interesting to see if one can combine both ideas to produce an algorithm that is really useful for instances with a small alphabet size and a moderate number of permitted edit operations. Of course, this should also be checked by computational experiments, which seem to be lacking so far.

It would also be interesting to study further parameters for this problem. In [10], several such suggestions have been considered. For instance, Barbay and Pérez-Lantero [10] have shown the following consequence of their algorithm. This relates to the number of deletions \(|S|-|T|\) already studied above.

Proposition 7

\({\{\mathbf{S },\mathbf{D }\}}\)-S2S with parameter \(\kappa _5(\varSigma ,S,T,k)=|\varSigma |+(|S|-|T|)\) is in \(\mathsf {FPT}\). More precisely, it can be solved in time \({\mathcal O}^*((|S|-|T|)^{|\varSigma |})\).

Let us also discuss some fine-grained complexity results for \({\{\mathbf{C },\mathbf{D },\mathbf{I }\}}\)-S2S. This problem is also known as computing the edit distance (in a more restricted sense) or as computing the Levenshtein distance between two strings. The already mentioned textbook algorithms, often based on [98], take quadratic time. Whether or not these are optimal was actually a question already investigated 40 years ago. In those days, however, lower bound results depended on particular models of computation, while more modern approaches to lower bounds are independent of such a choice. For instance, Wong and Chandra [106] showed a quadratic lower bound assuming that only equality tests of single symbols are permitted as the basic operation of comparison. Masek and Paterson [63] managed to shave off a logarithmic factor by making some clever use of matrix multiplication tricks. Yet, whether essentially better algorithms are possible remained an open question until recently. Backurs and Indyk [8] finally proved that, assuming SETH, no \({\mathcal O}(n^{2-\varepsilon })\)-time algorithm can be expected for any \(\varepsilon >0\). Interestingly enough, their proof worked only for alphabets of size at least seven. This result has since been improved by Bringmann and Künnemann [14] to binary alphabets. Let us summarize these results in the following statement.

Theorem 8

\({\{\mathbf{C },\mathbf{D },\mathbf{I }\}}\)-S2S can be solved in quadratic time. Assuming SETH, no \({\mathcal O}(|S|^{2-\varepsilon })\)-time algorithm exists for any \(\varepsilon >0\), even on binary alphabets.

Suggestions for Further Studies. (a) Does \({\{\mathbf{S },\mathbf{D }\}}\)-S2S with parameterization \(\kappa _4(\varSigma ,S,T,k)=|\varSigma |\) belong to \(\mathsf {FPT}\), or is it hard or even complete for some level of the \(\mathsf {W}\)-hierarchy? (b) Assuming that \({\{\mathbf{S },\mathbf{D }\}}\)-S2S with parameter \(\kappa _4(\varSigma ,S,T,k)=|\varSigma |\) belongs to \(\mathsf {FPT}\), is it possible to give, e.g., SETH-based lower bounds showing that existing algorithms are (close to) optimal? (c) Assuming that existing algorithms for \({\{\mathbf{S },\mathbf{D }\}}\)-S2S with parameter \(\kappa _4(\varSigma ,S,T,k)=|\varSigma |\) are optimal, it might make sense to study \(\mathsf {FPT}\)-approaches even for fixed alphabets, because the running times that are obtainable at present are impractical for, say, the ASCII alphabet, not to speak of Unicode. (d) We are not aware of any studies concerning the polynomial-time approximability of the minimization problem related to \({\{\mathbf{S },\mathbf{D }\}}\)-S2S.

4 Second Case Study: Grammar-Based Compression

One of the main ideas behind data compression algorithms is to use regularities found in an input to find representations that are much smaller than the original data. Although data compression algorithms usually come with their own special data structures, there are some common schemes to be found in the algorithms of Lempel, Ziv, Storer, Szymanski and Welch from the 1970s and 1980s [90, 103, 108] that easily generalize to the idea of using context-free grammars producing singleton languages for data compression purposes. Such context-free grammars are also called straight-line programs in the literature. Another perspective on this question is offered by Grammatical Inference, a perspective that can be traced back to the work of Nevill-Manning and Witten [67,68,69] and Kieffer and Yang [55]. We refer to [86] for quite a number of other papers that link to Grammatical Inference and applications thereof.

This leads us to consider the following decision problem, called Grammar-Based Compression, or GBC for short: Given a word w over some alphabet \(\varSigma \) and an integer k, decide if there exists a context-free grammar G of size at most k such that \(L(G)=\{w\}\).

In a sense, this question is still ill-posed, because we did not explain how to measure the size of a grammar. In fact, there are several natural ways to do this. The main results that we are citing in the following do not really depend on the choice. Hence, we follow the common definition that the size of a context-free grammar G is computed by summing up the lengths of all right-hand sides of rules of G.
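To illustrate the size measure just fixed, here is a small sketch with a hypothetical example grammar: a straight-line program for the word \((ab)^8\) of length 16 has size 8, and more generally a grammar of size \(2(n+1)\) suffices for \((ab)^{2^n}\), an exponential gap between grammar size and word length.

```python
def grammar_size(rules):
    """Size of a context-free grammar: the sum of the lengths of all
    right-hand sides of its rules (the measure adopted above)."""
    return sum(len(rhs) for rhs in rules.values())

def expand(rules, symbol):
    """Expand a straight-line program (one rule per nonterminal, no
    cycles) into the single word it generates."""
    if symbol not in rules:                  # a terminal symbol
        return symbol
    return "".join(expand(rules, x) for x in rules[symbol])

# S => AA => BBBB => CCCCCCCC => (ab)^8
slp = {"S": "AA", "A": "BB", "B": "CC", "C": "ab"}
assert expand(slp, "S") == "ab" * 8
assert grammar_size(slp) == 8
```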

Based on reductions due to Storer [89], Charikar et al. [19] showed the following complexity result.

Theorem 9

GBC is \(\mathsf {NP}\)-complete (on unbounded terminal alphabets).

Although occasionally there were claims in the literature that such a hardness result would also hold for bounded alphabet sizes (see [7]), this question was in fact open until recently. In the journal version of [17], the authors showed the following result.

Theorem 10

GBC is \(\mathsf {NP}\)-complete (on terminal alphabets of size at least 17).

Let us now again study this problem with a multivariate analysis. With grammars, we have some natural choices of parameters, given as input I an alphabet \(\varSigma \), a word \(w\in \varSigma ^*\) and an integer k: \(\kappa _1(I)=|\varSigma |\), \(\kappa _2(I)=|w|\), \(\kappa _3(I)=k\). Observe that we can assume (after some straightforward reductions) that \(\kappa _1(I)\le \kappa _3(I)\le \kappa _2(I)\). This indicates that \(\kappa _1\) is the most challenging parameterization, while finding \(\mathsf {FPT}\)-results should be easiest for \(\kappa _2\). We now look at these parameters in the chosen order. Our intuition on the strength of the parameters will be confirmed.

From Theorem 10, we can conclude:

Corollary 11

GBC with parameterization \(\kappa _1\) is not in \(\mathsf {XP}\), unless \(\mathsf {P}=\mathsf {NP}\).

Theorem 12

[17]. GBC with parameterization \(\kappa _2\) belongs to \(\mathsf {FPT}\). More precisely, instance \(I=(\varSigma ,w,k)\) can be solved in time \({\mathcal O}^*(3^{\kappa _2(I)})\).

We now demonstrate that such an \(\mathsf {FPT}\)-result also holds for \(\kappa _3\).

Theorem 13

GBC with parameterization \(\kappa _3\) belongs to \(\mathsf {FPT}\).

Proof

A context-free grammar \(G=(N,\varSigma ,R,S)\) generating a singleton language is called an F-grammar if \(F=\{u\in \varSigma ^+\mid \exists A\in N: A\Rightarrow ^* u\}\). As exhibited in [17], a related Independent Dominating Set problem can be used to compute, for a given finite set F with \(w\in F\), the smallest F-grammar that generates \(\{w\}\) in polynomial time. For each \(F\subseteq \varSigma ^+\) with \(\sum _{u\in F}|u|\le \kappa _3(\varSigma ,w,k)=k\), we can find the smallest F-grammar for w in polynomial time. As \(|\varSigma |\le k\), only f(k) many sets F have to be considered.    \(\square \)

In [17], the idea of a multivariate analysis has been taken further by considering additional properties of the grammars we are looking for. For instance, is there a shortest grammar for w that uses at most r rules? The paper shows that this question is \(\mathsf {NP}\)-hard. Considering r as a parameter, this problem is in \(\mathsf {XP}\) and \(\mathsf {W}\)[1]-hard.

There are also some studies on the approximability of the related minimization problem Min-GBC, see [7, 17, 19, 51, 59, 81]. We now summarize the main results in this direction. Notice the strange role that the size of the alphabet plays again.

Theorem 14

If \(m^*(w)\) denotes the size of the smallest context-free grammar for string w, then Min-GBC can be approximated in polynomial time up to a factor of \({\mathcal O}\left( \log \left( \frac{|w|}{m^*(w)}\right) \right) \). Conversely, there is no polynomial-time algorithm achieving an approximation ratio better than \(\frac{8569}{8568}\) unless \(\mathsf {P}=\mathsf {NP}\) (unbounded terminal alphabets). Furthermore, if there were a polynomial-time constant-factor approximation algorithm for Min-GBC on binary alphabets, there would also be a polynomial-time constant-factor approximation algorithm for Min-GBC on unbounded alphabets. Min-GBC is \(\textsf {APX}\)-hard on bounded terminal alphabets.

Charikar et al. [19] also proved an interesting connection to a long-standing open problem on approximating so-called addition chains. This approach might be interesting from a Fine-Grained Complexity perspective.

Suggestions for Further Studies. (A) The most natural complexity task is to further reduce the size of the terminal alphabet in Theorem 10. More specifically: Is GBC still \(\mathsf {NP}\)-hard for binary terminal alphabets? From a practical point of view, i.e., when applying this technique to compressing data files, this is crucial to know. (B) Is there a polynomial-time constant-factor approximation algorithm for the smallest grammar problem? (C) Storer and Szymanski [89, 90] studied macro schemes that can be viewed as extensions of context-free grammars. No multivariate analysis of the related \(\mathsf {NP}\)-hard compression problems has been undertaken so far. Notice that recent experimental studies [60] show the potential of these ideas. (D) We also mention generalizations of GBC to finite languages described by context-free grammars (and their use to encode specific words), as proposed in [86], which have not yet been studied from a more theoretical perspective.

5 Third Case Study: Synchronizing Words

A word \(x\in \varSigma ^*\) is called synchronizing for a deterministic finite automaton A, or DFA for short, with \(A=(S,\varSigma ,\delta ,s_0,F)\) if there is a state \(s_f\) such that for all states s, \(\delta ^*(s,x)=s_f\). An automaton is called synchronizable if it possesses a synchronizing word. It is known that A is synchronizable iff for every pair \((s,s')\) of states, there exists a word \(x_{s,s'}\) such that \(\delta ^*(s,x_{s,s'})=\delta ^*(s',x_{s,s'})\). This notion relates to the best-known open combinatorial problem in Formal Languages, namely Černý's Conjecture: any synchronizable DFA with t states has a synchronizing word of length \(\le (t-1)^2\). We are not going to give further details on this famous combinatorial problem, but only refer to the original paper by Černý [18], to two survey articles [82, 95] that also describe a couple of applications, and to one very recent paper [92] that describes the best upper bound of \((85059t^3 + 90024t^2 + 196504t - 10648)/511104\) on the length of a synchronizing word for a t-state synchronizable DFA.

Rather, we will now turn to the related decision problem Synchronizing Words, or DFA-SW for short. The input consists of a DFA A and an integer k. The question is if there is a synchronizing word w for A with \(|w|\le k\). In [26], Eppstein has shown the following complexity result:

Theorem 15

DFA-SW is \(\mathsf {NP}\)-complete.

What could a multivariate analysis of this problem look like? Natural parameterizations of an instance \(I=(A,k)\) with \(A=(S,\varSigma ,\delta ,s_0,F)\) include: \(\kappa _1(I)=|\varSigma |\), \(\kappa _2(I)=|S|\), and \(\kappa _3(I)=k\). Clearly, one could also study combined parameters, like \(\kappa _4(I)=|\varSigma |+k\). Also, notice that \(\kappa _5(I)=|\delta |\) corresponds to \(|\varSigma |\cdot |S|\) (the size of the transition table), which is therefore again a combined parameter. We are going to report on results from [33, 96].

Theorem 16

DFA-SW with parameterization \(\kappa _1(I)=|\varSigma |\) does not belong to \(\mathsf {XP}\), unless \(\mathsf {P}=\mathsf {NP}\).

In fact, the reduction from [26] can be used to show the previous result, as it shows \(\mathsf {NP}\)-hardness for binary input alphabets.

Theorem 17

DFA-SW with parameterization \(\kappa _2(I)=|S|\) lies in \(\mathsf {FPT}\). More precisely, it can be solved in time \({\mathcal O}^*(2^{\kappa _2(I)})\). Yet, it does not admit polynomial kernels unless the polynomial-time hierarchy collapses to the third level.

The \(\mathsf {FPT}\)-algorithm is actually quite simple. It is based on the well-known subset construction, reducing the problem to a path-finding problem in an exponentially large graph; a minimal sketch is given below. Yet, as Proposition 18 shows, this algorithm is close to optimal.
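The following sketch makes this concrete; the DFA is assumed to be given as a (total) transition dictionary, and the names are illustrative only.

```python
from collections import deque

def shortest_synchronizing_word(states, alphabet, delta):
    """Breadth-first search in the subset construction: start from the
    set of all states and look for a singleton; explores at most 2^|S|
    subsets, matching the O*(2^{kappa_2}) bound of Theorem 17."""
    start = frozenset(states)
    word = {start: ""}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if len(current) == 1:
            return word[current]     # a shortest synchronizing word
        for a in alphabet:
            successor = frozenset(delta[(s, a)] for s in current)
            if successor not in word:
                word[successor] = word[current] + a
                queue.append(successor)
    return None                      # the DFA is not synchronizable
```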

Proposition 18

Assuming ETH, DFA-SW is not solvable in time \({\mathcal O}^*(2^{o(\kappa _2(I))})\).

This result can be easily obtained by re-analyzing the reduction from [26] that lies behind Theorems 15 and 16.

Theorem 19

DFA-SW with parameterization \(\kappa _3(I)=k\) is \(\mathsf {W}\)[2]-hard.

We provide a sketch of this hardness result in Fig. 1, also to give an example of a parameterized reduction from Hitting Set as introduced above. Observe that now the size of the input alphabet of the resulting DFA is unbounded, as it is the vertex set of the given hypergraph. Recently, Montoya and Nolasco [65] showed that (even) for planar automata, DFA-SW with parameterization \(\kappa _3(I)=k\) is complete for the parameterized space complexity class \(\mathsf {NWL}\), which embraces the whole \(\mathsf {W}\)-hierarchy. Also, strong inapproximability results are known for the corresponding minimization problem; see [43].

Fig. 1. An example showing how an \(\mathsf {FPT}\) reduction works.

For the combined parameterization \(\kappa _4\), we can state:

Theorem 20

DFA-SW with parameterization \(\kappa _4(I)=|\varSigma |+k\) lies in \(\mathsf {FPT}\). More precisely, it can be solved in time \({\mathcal O}^*(|\varSigma |^{k})\). Yet, it does not admit polynomial kernels unless the polynomial-time hierarchy collapses to the third level. Moreover, there is no \({\mathcal O}^*((|\varSigma |-\varepsilon )^k)\)-time algorithm, unless SETH fails.

Suggestions for Further Studies. (A) Although the proof sketch in Fig. 1 indicates that DFA-SW remains \(\mathsf {NP}\)-hard for rather restricted forms of automata, it might be interesting to study classes of subregular languages regarding the question if DFA-SW might become simpler when restricted to these classes. (B) There are quite a number of notions similar to synchronizing words that have been introduced over the years, also due to the practical motivation of this notion, see [82]. No systematic study of computational complexity aspects has been undertaken for all these notions. (C) In view of the \(\mathsf {FPT}\) result concerning parameterization \(\kappa _2\), the number of states, the parameterization \(\kappa _5\) might not look that interesting. Yet, as \(\kappa _2\) does not allow for polynomial kernels, this question could be of interest for the variation \(\kappa _5'(I)=|S|+|\varSigma |\).

6 Fourth Case Study: Consistency Problem for DFAs

The problem DFA-Consistency takes as input an alphabet \(\varSigma \), two disjoint finite sets of words \(X^+,X^-\subseteq \varSigma ^*\), and some integer t. The question is if there is a DFA A with \(\le t\) states that is consistent with \(X^+,X^-\), i.e., \(L(A)\supseteq X^+\) and \(L(A)\cap X^-=\emptyset \). This problem arises in various contexts, for instance, also in connection with Grammatical Inference; see [47]. Its classical complexity status was settled four decades ago.

Theorem 21

[6, 44]. DFA-Consistency is \(\mathsf {NP}\)-complete.

Let us explore the possible natural choices of parameterizations for an instance \(I=(\varSigma ,X^+,X^-,t)\). We could look at \(\kappa _1(I)=|\varSigma |\), \(\kappa _2(I)=|X^+\cup X^-|\), \(\kappa _3(I)=\max \{|w| \mid w\in X^+\cup X^-\}\), \(\kappa _4(I)=t\), and there are quite a number of further ways to parameterize with respect to the sets \(X^+\) and \(X^-\). The \(\mathsf {NP}\)-hardness results (with different constructions) extend to the situations where \(\kappa _1(I)=2\), \(\kappa _3(I)=2\) or \(\kappa _4(I)=2\). This immediately entails the following results.

Theorem 22

DFA-Consistency with parameterization \(\kappa _1(I)=|\varSigma |\) does not belong to \(\mathsf {XP}\), unless \(\mathsf {P}=\mathsf {NP}\).

Theorem 23

DFA-Consistency with parameterization \(\kappa _3(I)=\max \{|w| \mid w\in X^+\cup X^-\}\) does not belong to \(\mathsf {XP}\), unless \(\mathsf {P}=\mathsf {NP}\).

Theorem 24

DFA-Consistency with parameterization \(\kappa _4(I)=t\) does not belong to \(\mathsf {XP}\), unless \(\mathsf {P}=\mathsf {NP}\).

Intuitively, this last result might appear most surprising, because there are only four ways a letter can act with respect to two states. The literature situation was also a bit odd for some time. The result was mentioned in Sect. 1.2 of [75], as well as in [47], referring to an unpublished work of Angluin. However, no proof was given in these two references. Therefore, in the process of writing [33], we contacted Angluin, also to see how our solution compared to hers. We were quite impressed to receive an email from Angluin within a couple of days, sending us a scanned copy of her proof, dating back to August \(2^\text {nd}\), 1988. An \(\mathsf {NP}\)-hardness proof can now be found in [33].

Interestingly, it is open if DFA-Consistency is \(\mathsf {NP}\)-hard for any constant value of \(\kappa _2(I)\). In the sense of multivariate analysis, we should continue to look into combined parameters. Let \(\kappa _{i,j}\) for \(1\le i<j\le 4\) be the parameterization given by \(\kappa _{i,j}(I)=\kappa _i(I)+\kappa _j(I)\).

It was shown in [33] that DFA-Consistency is \(\mathsf {NP}\)-hard even for 3-state DFAs with word lengths at most two in \(X^+\cup X^-\). This implies:

Theorem 25

DFA-Consistency with parameterization \(\kappa _{3,4}(I)\) does not belong to \(\mathsf {XP}\), unless \(\mathsf {P}=\mathsf {NP}\).

Conversely, by a trivial algorithm one can show the following (positive) result. The ETH hardness follows from a construction in [33].

Theorem 26

DFA-Consistency with parameterization \(\kappa _{1,4}(I)=|\varSigma |+t\) belongs to \(\mathsf {FPT}\), namely in time \({\mathcal O}^*(t^{|\varSigma |t})\). Assuming ETH, there is no \({\mathcal O}^*(t^{o(|\varSigma |t)})\)-time algorithm for DFA-Consistency.
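The trivial algorithm behind the \(\mathsf {FPT}\) part can be sketched as follows (a hypothetical encoding; samples are lists of strings): enumerate all \(t^{|\varSigma |t}\) transition functions and observe that, once \(\delta \) is fixed, a consistent choice of final states exists iff no state is reached by both a positive and a negative sample.

```python
from itertools import product

def dfa_consistent(sigma, x_pos, x_neg, t):
    """Brute force over all t^(|sigma|*t) transition functions on the
    state set {0, ..., t-1} with initial state 0."""
    cells = [(q, a) for q in range(t) for a in sigma]
    for values in product(range(t), repeat=len(cells)):
        delta = dict(zip(cells, values))
        def run(word):
            q = 0
            for a in word:
                q = delta[(q, a)]
            return q
        # Consistent iff positive and negative words end in disjoint
        # state sets; then make exactly the 'positive' states final.
        if {run(w) for w in x_pos}.isdisjoint({run(w) for w in x_neg}):
            return True
    return False

# Words ending in 'a' vs. ending in 'b' are separable with t = 2 states.
assert dfa_consistent("ab", ["a", "ba"], ["b", "ab"], 2)
```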

The remaining parameter combinations seem to be open. Only one three-parameter combination was found in [33] that admitted a further \(\mathsf {FPT}\)-result, combining \(\kappa _2\), \(\kappa _3\) and \(\kappa _4\).

Suggestions for Further Studies. (A) Quite a number of parameter combinations are still open regarding their complexity status. Also, there are more parameters that could be related to \(X^+\) and \(X^-\). One potentially interesting scenario (pondering practical applications) would be to see what happens if there are far fewer negative than positive samples. (B) It might be an idea to look into classes of subregular languages and find some that allow for efficient consistency checks. (C) There are related questions that have not yet been studied from a multivariate perspective, for instance, what about Regular Expression Consistency?

7 Fifth Case Study: Lower Bounds for Universality

Possibly, the reader would have expected that we focus on problems like DFA Intersection Emptiness and similar problems traditionally studied (with respect to their complexity) in textbooks on Formal Languages. This will be partially rectified in this section. We will mainly concentrate on Universality, which is the following problem: Given a finite automaton A with input alphabet \(\varSigma \), is \(L(A)= \varSigma ^*\)? Clearly, this only makes sense for nondeterministic automata, as the problem can be solved in linear time for DFAs. Natural parameters are the number t of states of A and the size of \(\varSigma \); again, \(|\varSigma |\) plays an important role. Notice that the problem is \(\mathsf {PSPACE}\)-complete in general, but co-\(\mathsf {NP}\)-complete for unary input alphabets; see [56, 87, 88] and, possibly more explicitly, [34]. Natural parameterizations are \(\kappa _1(A)=|\varSigma |\) and \(\kappa _2(A)=|S|\), where \(A=(S,\varSigma ,\delta ,s_0,F)\) is an NFA. By the classical hardness results, we see:

Proposition 27

Universality, parameterized by \(\kappa _1\), is not in \(\mathsf {XP}\), unless \(\mathsf {P}=\mathsf {NP}\).

Conversely, by the classical subset construction to produce a DFA, followed by final state complementation and a simple emptiness check, one sees:

Proposition 28

Universality, parameterized by \(\kappa _2\), belongs to \(\mathsf {FPT}\).
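A minimal sketch of this procedure, with the NFA assumed to be encoded as a dictionary mapping (state, letter) to a set of successor states: the NFA is universal iff every subset of states reachable in the subset construction contains a final state.

```python
from collections import deque

def nfa_universal(alphabet, delta, initial, final):
    """Universality check in time O*(2^|S|) via the subset
    construction, as in Proposition 28."""
    start = frozenset(initial)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if not current & final:
            return False     # the word leading to this subset is rejected
        for a in alphabet:
            successor = frozenset(q for s in current
                                    for q in delta.get((s, a), ()))
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return True
```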

We are now going to study complexity aspects under the ETH perspective.

Theorem 29

[34]. Unless ETH fails, there is no \({\mathcal O}^*(2^{o(t^{1/3})})\)-time algorithm for deciding Universality on t-state NFAs with unary inputs.

There is a slight gap to the known upper bound by Chrobak [20], who showed:

Theorem 30

Universality on t-state NFAs with unary inputs can be solved in time \(2^{\varTheta ((t\log t)^{1/2})}\).

For larger alphabets, the situation looks a bit different for Universality.

Theorem 31

[34]. Unless ETH fails, there is no \({\mathcal O}^*(2^{o(t)})\)-time algorithm for deciding Universality on t-state NFAs with binary (or larger) input alphabets.

The result is obtained by a parsimonious reduction from 3-Colorability. This is the correct bound, because the power-set construction shows that Universality on t-state NFAs can be solved in time \({\mathcal O}^*(2^{t})\).

Also, to overcome the fact that Universality is \(\mathsf {PSPACE}\)-complete, a length-bounded variant has been introduced. LB-Universality: Given an NFA A and a length bound \(\ell \), does A accept all words up to length \(\ell \)? This length bound puts the problem into \(\mathsf {NP}\); in fact, it is \(\mathsf {NP}\)-complete. From a multivariate perspective, this introduces a natural third parameter, \(\kappa _3(A,\ell )=\ell \).

Theorem 32

LB-Universality, parameterized by \(\kappa _3\), is \(\mathsf {W}\)[2]-hard.

As there is no formal proof of this result in the literature, we provide an explicit construction. In fact, it is quite similar to the construction illustrated in Fig. 1, which the reader might want to consult.

Proof

We show how to solve any instance \(G=(V,E)\) and k of Hitting Set, parameterized by the size k of the solution, with the help of an instance of LB-Universality, parameterized by \(\kappa _3\). Set \(\varSigma =V\), \(S=\{s_0,s_f\}\cup E\). Let \(s_0\) be the initial and \(E\cup \{s_0\}\) be the set of final states. We include the following transitions in the transition relation.

  • \((s_0,a,e)\) for any \(a\in \varSigma \) and any \(e\in E\);

  • \((e,a,e)\) for any \(e\in E\) and \(a\notin e\);

  • \((e,a,s_f)\) for any \(e\in E\) and \(a\in e\);

  • \((s_f,a,s_f)\) for any \(a\in \varSigma \).

Furthermore, we set \(\ell =k+1\). Now, we claim that there is a hitting set of size at most k in G if and only if there is a word of length at most \(k+1\) that is not accepted by the constructed automaton. Namely, the only way not to accept a word is that every run ends in \(s_f\), irrespective of which state \(e\in E\) was entered after reading the first letter. Hence, the letters read after the first one hit every hyperedge, i.e., they encode a hitting set of the hypergraph; conversely, prefixing a hitting set (written as a word) with an arbitrary letter yields a rejected word of length at most \(k+1\).    \(\square \)
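For concreteness, the construction from the proof can be written down explicitly. The following sketch outputs the NFA in the same (hypothetical) encoding used for the universality checks above, together with the length bound \(k+1\):

```python
def hitting_set_to_lb_universality(vertices, edges, k):
    """The parameterized reduction from the proof of Theorem 32.
    States: s_0 (initial, final), one final state per hyperedge,
    and a rejecting sink s_f."""
    E = [frozenset(e) for e in edges]
    delta = {}
    for a in vertices:
        delta[("s_0", a)] = set(E)           # guess a hyperedge to watch
        delta[("s_f", a)] = {"s_f"}          # the sink absorbs
    for e in E:
        for a in vertices:
            # a letter hitting e moves to the sink; other letters loop
            delta[(e, a)] = {"s_f"} if a in e else {e}
    initial = {"s_0"}
    final = {"s_0"} | set(E)
    return set(vertices), delta, initial, final, k + 1
```

By the claim in the proof, the hypergraph has a hitting set of size at most k iff the returned automaton is not universal up to the returned length bound.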

Membership in \(\mathsf {W}\)[2] is unknown. As the parameterized complexity results for the other two parameters transfer, we get a rather diverse picture of what could happen in a multivariate analysis. As parameters \(\kappa _1\) and \(\kappa _3\) yield intractability results, the following (straightforward) result is interesting for the combined parameter \(\kappa _1+\kappa _3\).

Proposition 33

LB-Universality can be solved in time \({\mathcal O}^*(|\varSigma |^\ell )\).
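Namely, just enumerate and test all strings up to length \(\ell \); a minimal sketch, reusing the hypothetical NFA encoding of the previous sketches:

```python
from itertools import product

def lb_universal(alphabet, delta, initial, final, ell):
    """Brute force for LB-Universality in time O*(|alphabet|^ell)."""
    def accepts(word):
        current = set(initial)
        for a in word:
            current = {q for s in current for q in delta.get((s, a), ())}
        return bool(current & final)
    return all(accepts(w)
               for n in range(ell + 1)
               for w in product(alphabet, repeat=n))
```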

This simple enumeration has been complemented by the following result, which proves its conditional optimality.

Theorem 34

[34]. There is no algorithm that solves LB-Universality in time \({\mathcal O}^*\left( (|\varSigma |-\varepsilon )^\ell \right) \) for any \(\varepsilon > 0\), unless SETH fails.

Let us finally discuss the issue of kernelization for this problem. The size of an instance is vastly dominated by the size of the transition table, which, measured in terms of the number of states and the input alphabet size, can be as large as \({\mathcal O}(|\varSigma |\cdot |S|^2)\) for an NFA. Is there any hope to bring this down to a size only polynomial in |S|, a result that would complement Proposition 28? Interestingly, this question seems to be open, while it is possible to show the non-existence of polynomial kernels for the length-bounded variant. We can even show this result for the combined parameter \(\kappa _2+\kappa _3\).

Theorem 35

LB-Universality, parameterized by \(\kappa _2+\kappa _3\), does not admit polynomial kernels, unless the polynomial-time hierarchy collapses to the third level.

Proof

It is known that under the stated complexity assumptions, Hitting Set, parameterized by the number of hyperedges plus an upper bound k on the size of the solution, does not admit polynomial kernels; see [23]. Now, assume that LB-Universality, parameterized by \(\kappa _2+\kappa _3\), had a kernelization algorithm K that produces polynomial kernels. Start with an instance (G, k) of Hitting Set, where \(G=(V,E)\), and first translate it to an equivalent instance \((A,\ell )\) of LB-Universality, using the construction from Theorem 32. Observe that \(\kappa _2(A,\ell )=|E|+2\) and \(\kappa _3(A,\ell )=k+1\). Next, run algorithm K, yielding an instance \((A',\ell ')\) of size polynomial in \(\kappa _2(A,\ell )+\kappa _3(A,\ell )\) and hence polynomial in \(|E|+k\). Finally, observe that as LB-Universality is in \(\mathsf {NP}\), while Hitting Set is \(\mathsf {NP}\)-hard, there is a polynomial-time transformation of \((A',\ell ')\) into an equivalent instance \((G',k')\) of Hitting Set. Clearly, \((G',k')\) would also be of polynomial size, measured in \(|E|+k\), which contradicts the non-existence of polynomial kernels.    \(\square \)

Let us mention one further application of the construction of Theorem 32.

Theorem 36

Under the Set Cover Conjecture, there is no \({\mathcal O}^*(2^{o(\kappa _2(A))})\)-time algorithm for solving instances A of Universality.

Suggestions for Further Studies. (A) For the simple \(\mathsf {FPT}\)-results for this (and similar) automata problems, polynomial kernel questions have barely been studied. The same holds for all the related classical automata problems. (B) There are slight but noticeable gaps between lower and upper bounds on running times (assuming ETH). More gaps can be found in related automata problems, as discussed in [34]. Is it possible to close these gaps, possibly using hypotheses different from ETH? (C) Contrary to what this section might suggest, among the classical algorithmic questions about finite automata, most work has gone into studying automata intersection problems; see [34, 73, 91, 100, 102]. Relatively little effort has been put into related questions or into other automata models; we only mention here [37, 38] and the references given in these papers.

8 Sixth Case Study: Parsing Theory

Coming from FL theory courses, where the Chomsky hierarchy is often taught with a nod to its relevance to areas like Compiler Construction or (Computational) Linguistics, one might be disappointed when actually encountering these two areas. In the former, the regular languages and some parts of the context-free languages seem to be relevant, but not much more. In the latter, the situation is even more disillusioning: there, the most interesting formal language classes are more expressive than the context-free ones, yet not much harder to parse.

Even the typical parsing algorithms like CYK or Earley's, mostly taught in FL undergraduate courses, are not really relevant, as their cubic complexity is too much for typical applications in Compiler Construction. Rather, one resorts to deterministic context-free languages, also because they allow for unambiguous interpretations in the sense of unique parse trees. But is it really necessary to spend cubic time on parsing context-free languages?
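For concreteness, here is the classical cubic algorithm as a brief Python sketch (a plain CYK recognizer; the grammar encoding is our own choice, and we assume Chomsky normal form and a nonempty input):

```python
def cyk(word, start, unary, binary):
    """O(n^3)-time CYK recognition for a grammar in Chomsky normal form.
    `unary` maps a terminal to the nonterminals deriving it; `binary`
    lists the rules A -> B C as triples (A, B, C)."""
    n = len(word)
    # T[i][j] holds the nonterminals deriving the substring word[i:j]
    T = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, a in enumerate(word):
        T[i][i + 1] = set(unary.get(a, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):       # split point: the cubic bottleneck
                for A, B, C in binary:
                    if B in T[i][k] and C in T[k][j]:
                        T[i][j].add(A)
    return start in T[0][n]

# Balanced parentheses in CNF: S -> LR | LA | SS, A -> SR, L -> '(', R -> ')'
unary = {'(': ['L'], ')': ['R']}
binary = [('S', 'L', 'R'), ('S', 'L', 'A'), ('S', 'S', 'S'), ('A', 'S', 'R')]
print(cyk('(())()', 'S', unary, binary))  # True
print(cyk('(()', 'S', unary, binary))     # False
```

It is exactly the innermost loop over the split point k that faster methods reorganize into matrix products.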

In a positive (algorithmic) sense, this question was answered already by Valiant [94], in a paper entitled Parsing (general context-free recognition) in time \({\mathcal O}(n^{\omega })\). Here, n is the length of the string to be parsed, and \(\omega \) is the exponent of multiplying two square matrices. At the time of that paper, the best bound came from Strassen's multiplication, i.e., \(\omega \approx 2.81\); for practical implementations, this might still be the method of choice. In theory, however, \(\omega \) has been improved to 2.3727, as shown by Vassilevska Williams in [105]. Whether it can be improved further, as well as its relations to other problems, is discussed in a recent FOCS paper of Alman and Vassilevska Williams [4]. Alternatives to Valiant's original algorithm are discussed in [45, 80]. Actually, Rytter called Valiant's algorithm probably the most interesting algorithm related to formal languages, which is a good reason to study it further here.
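To see where matrix multiplication enters, consider the following minimal numpy sketch (ours, not Valiant's algorithm itself): if, for every nonterminal X, a Boolean matrix records which substrings X derives, then one combination step for a rule \(A\rightarrow BC\) is exactly a Boolean matrix product; Valiant's insight is to organize the computation of the complete parse table as a divide-and-conquer closure whose bottleneck is this product.

```python
import numpy as np

def bool_mult(P, Q):
    """Boolean matrix product, computed via an ordinary integer product,
    so that any fast matrix multiplication algorithm can be plugged in."""
    return (P.astype(np.int64) @ Q.astype(np.int64)) > 0

# M[X][i, j] records X =>* w[i:j]; a rule A -> B C then contributes
# exactly bool_mult(M[B], M[C]) to the table of A:
n = 4
B = np.zeros((n + 1, n + 1), dtype=bool); B[0, 2] = True  # B =>* w[0:2]
C = np.zeros((n + 1, n + 1), dtype=bool); C[2, 4] = True  # C =>* w[2:4]
print(bool_mult(B, C)[0, 4])  # True: A =>* w[0:4], splitting at position 2
```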

A natural question in our context is: Can we parse context-free grammars faster than multiplying matrices? This question was first addressed in a paper of Lee [58] with the title Fast context-free parsing requires fast Boolean matrix multiplication. The drawback of the underlying construction is that the lower bound only holds for grammars whose size grows with \(n^6\), where n is the length of the string to be parsed, which is not a very realistic scenario. Abboud, Backurs, and Vassilevska Williams fixed this issue in [1]: their fine-grained reduction works for one specific context-free grammar, so that the previous dependence between grammar size and string length no longer arises. To give an idea of how these results look on a more technical level, we cite the following theorem.

Theorem 37

There is a context-free grammar \(G_{fix}\) of constant size such that, if we can determine in T(n) time whether a string of length n belongs to \(L(G_{fix})\), then k-Clique on n-vertex graphs can be solved in \({\mathcal O}(T (n^{k/3+1}))\) time, for any \(k \ge 3\).

Hence, under the mentioned k-Clique Conjecture, context-free parsing cannot be faster than \({\mathcal O}(n^\omega )\).

We remark that there are extensions of context-free grammars, like Boolean grammars [72], that still admit parsers in the style of Valiant's; as they subsume the context-free grammars, the lower bounds transfer to them immediately. For several related problems in computational biology, we refer to [13, 74, 107].

We now report on one more problem directly related to parsing, namely parsing tree-adjoining grammars, which are quite important for Computational Linguistics. We are not going to give a detailed introduction to tree-adjoining grammars, but rather refer to the textbook [54], which covers this and similar mechanisms from a linguistic yet mathematically profound perspective. Tree-adjoining grammars extend context-free ones in a way that allows for representing several linguistically relevant features beyond context-free languages; they yield one of the basic examples of mildly context-sensitive languages. Parsing is still possible in polynomial time; more precisely, the textbook algorithm runs in \({\mathcal O}(n^6)\) time, where n is the length of the string to be parsed, while Rajasekaran and Yooseph's parser [76, 77] runs in \({\mathcal O}(n^{2\omega })\) time. While the latter solved a previously well-known open problem in Computational Linguistics, it is interesting that a negative result pre-dated this algorithmic one: Satta [83] showed a reverse relation, actually inspiring Lee's work. Not surprisingly, it comes with a similar drawback: the lower bound only holds for grammars whose size grows with \(n^6\). Bringmann and Wellnitz [15] have improved this result as follows.

Theorem 38

There is a tree-adjoining grammar \(G_{fix}\) of constant size such that if we can decide in time T(n) whether a given string of length n can be generated from \(G_{fix}\), then 6k-Clique can be solved in time \({\mathcal O}(T(n^{k+1} \log n))\), for any fixed \(k\ge 1\).

A consequence would be: An \({\mathcal O}(n^{2\omega -\varepsilon })\)-algorithm for TAL recognition would prove that 6k-Clique can be solved in time \(\widetilde{\mathcal O}(n^{(2\omega -\varepsilon )(k+1)})\subseteq {\mathcal O}(n^{(\omega /3-\delta )6k})\), contradicting the k-Clique Conjecture.
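To spell out the exponent arithmetic behind this containment (a back-of-the-envelope check of our own; the choice \(\delta =\varepsilon /12\) is just one possibility):
\[
(2\omega -\varepsilon )(k+1)=2\omega k-\varepsilon k+(2\omega -\varepsilon )\le 2\omega k-\frac{\varepsilon }{2}\,k=\Bigl(\frac{\omega }{3}-\frac{\varepsilon }{12}\Bigr)\cdot 6k
\qquad \text{whenever } k\ge \frac{2(2\omega -\varepsilon )}{\varepsilon },
\]
so that, for all sufficiently large k, the assumed recognizer would solve 6k-Clique too fast; the polylogarithmic factors hidden in \(\widetilde{\mathcal O}\) are absorbed by the exponent gap.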

Suggestions for Further Studies. (A) Tree-adjoining grammars (TAGs) were quite popular in Computational Linguistics in the 1990s, but this popularity has calmed down a bit due to various shortcomings, both regarding parsing complexity (as discussed above) and regarding the expressiveness of the formalism. Possibly, Formal Languages could help with the second issue by coming up with grammatical mechanisms that are more powerful than TAGs but do not need more computational resources for parsing. For instance, can the ideas underlying Boolean grammars be extended towards TAGs? (B) The whole topic of parsing has been somewhat neglected by the Formal Language community. This is something that should change, in the best interest of the FL community. Whoever would like to start working in this direction should not overlook the rich annotated bibliography with nearly 2000 entries by Grune and Jacobs [46], available at https://dickgrune.com/Books/PTAPG_2nd_Edition/Additional.html. (C) For four decades, it has been open whether EDT0L systems can be parsed in polynomial time [79, Page 315]. Weakening this question, one could also ask [29] whether there is some \({\mathcal O}^*(f(|N|))\)-time parsing algorithm, where N is the set of nonterminal symbols.

9 Conclusions

With this survey, we could only highlight some of the many results that have been obtained in the meantime regarding multivariate analysis, but also regarding fine-grained complexity. Yet, there are some common themes, such as the role of the alphabet size or the richness of natural parameter choices. Another typical observation is that the simple parameterized algorithms often cannot be improved under certain complexity assumptions. All this gives these problems a flavor different from, say, graph problems.

We preferred to focus on six problems rather than trying to discuss the whole breadth of results. Yet, in these conclusions, we mention at least some further papers.

For instance, there is a vast body of literature on string problems. In fact, string problems were among the first where a true multivariate analysis was undertaken (without calling it that); see [27]. For a survey of these types of analyses for string problems, we refer to [16]. String problems have also been further investigated from the viewpoint of fine-grained complexity; see [2].

The related area of pattern matching would also have deserved a closer look. Let us suggest [35, 39] and the literature cited therein for further reading. Readers otherwise more interested in graph-theoretic problems might find it interesting to learn that the parameter treewidth, well-known from graph algorithms, has also been introduced in the context of patterns in [78].

String problems also have tight connections to several problems arising in computational biology. We refrain from giving further references here, as this would finally surpass any reasonable length for the list of citations, but it should be clear that there are scores of papers on the parameterized and also on the fine-grained complexity of such problems.

In the context of stochastic automata, the Viterbi algorithm is central; its optimality is considered in [9].

Finally, let us discuss possible connections to Descriptional Complexity (within FL). One question of this sort asks for smallest representations (within certain formalisms). One example is grammar-based compression; another one is the minimization of automata or expressions, see [11, 53]. Furthermore, one could consider questions like Given an automaton, is there an equivalent representation obeying certain additional restrictions?, which are typical for this area but have not yet been considered from a multivariate or fine-grained angle. We only refer to two survey papers of Holzer and Kutrib [48, 49].