Keywords

1 Introduction

Primitive recursion and general recursion (or \(\mu \)-recursion) are well-known and addressed in every textbook on computability. They are based on Peano’s axiomatisation of natural numbers and form a neat definition of computable functions over numbers. They have been studied for a century and are the topic of innumerable articles. Nowadays, computability is not anymore considered to be just about numbers but to be about any kind of information that can be represented and manipulated through textual/symbolic representations. In recent decades, the term recursive has been shifting to be replaced by computable [10] to reflect the preeminence of the computer age and to stress on operational models rather than conceptual definitions.

The present paper advocates an alternative definition of primitive recursion grounded on sequences of symbols, i.e. words, instead of numbers. Although the definition is natural with more than one successor, it has not been much studied. Or rather it has been proved that all the main properties coincide, so that there is little interest in a less refined design.

We feel otherwise for at least two epistemological reasons. The first one is that many articles addressing recursion on words first provide an encoding of words as numbers then work on numbers. It certainly proves that words can be represented as numbers and worked upon this way (at the cost of complexity). Shouldn’t it be the other way round? Numbers are represented by words and all our basic arithmetical algorithms (e.g. multiplication) are taught for decimal representation and implemented with binary-based representations. The second reason is meeting colleagues not keen on proving by induction, and instead, they introduce some numerical measure (e.g. depth of a formula) and then make a (numerical) recursion. When dealing with words, we should use induction to manipulate them (and numbers and recursion when we need counting)Footnote 1.

There are also more practical reasons for word recursion: connections to complexity theory with a natural measure of evaluations and to language theory. The first point is to note that the time complexity naturally corresponds to the size of trace of the evaluations when nothing is reevaluated (dynamic programming) and evaluated only if needed (delayed evaluation).

The connection with language theory is developed in the paper by considering recursion without concatenation/successor (preventing encoding between numbers and words as well as pairing). It already exhibits the ability to decide some non-algebraic languages and do some arithmetic checking. In the rest of the introduction, we present a brief state of the art on recursion on words, then, the complexity connection, some results on language decision without concatenation, and finally, the outline of the paper.

In the literature, the topic is referred to as recursion on string, recursion on word, recursive word/string-functions or recursion on representation. The last denomination often means a representation of natural numbers by words enumerated in shortlex/military order (length then alphabetically) leading to a non-trivial successor word-function. The literature is rather ancient (for computer science) with a peak in the 1960’s. Most of the literature deal with hierarchies and, like almost everything in the field in those days, is number-centric. We concentrate on overviews and more recent and accessible papers (and in English).

The transcription by B. Kapron of notes on a course of S. Cook in Berkeley in 1967 [5] contains the m-adic notation of numbers (digits does not include 0) and relations on weak classes (including polynomial time functions, FP, from [4]). This paper does not exactly use word-functions: it has primitives \(\{n\mapsto 10n+i\}_{0\le i\le 9}\) emulating concatenation on words together with number ordering.

In [6], the authors provide a survey on counterparts on words of classical results for primitive number recursion (Ackermann function, limited recursion, Grzegorczyk and loop hierarchies). They prove that everything coincides.

There exists research on infinite alphabet (not the case here) like [11]. Up to some encoding with numbers, it corresponds to computation over finite sequences of numbers encoded by numbers.

Variations on base functions and operators exist in mentioned papers (e.g. limited recursion for Grzegorczyk hierarchy) as well as others. A restriction to unitary word-functions is considered in [1, 3, 9]. To mention a more recent work, the nowhere defined function is added to primitive recursive word-functions in [7].

As expected, as soon as a class is powerful enough to provide functions to encode and decode from one setting to another, the hierarchies correspond with the number setting. This is a motivation to investigate restrained classes. As far as we know, recursion without concatenation was not investigated.

Comparing to primitive recursion on numbers, the successor function is replaced by left concatenation for every symbol and the recursion operator has to consider every possible first symbol. Various examples of word-functions are provided, some have no counterpart in the number setting like reverting a word or testing whether a word is a palindrome. An encoding of tuples of words on any alphabet as a word in a 2-symbol alphabet is provided; thus multiple recursion is not adding any power to primitive recursion and the number of symbols does not change the hierarchies and complexity classes when there are at least two symbols.

The numbers in Peanos’s axiomatisation are identified with words of a 1-symbol alphabet, and so are the functions. Proving that primitive recursive functions on integers coincide is quite straightforward with the following encoding. Let \({\varSigma }=\{\texttt {a}_1,\texttt {a}_2,\cdots ,\texttt {a}_{{r}}\}\) be the alphabet, the r-adic encoding function of words into natural numbers: \( \langle {\varepsilon }\rangle = 0\) and \({\texttt {a}_k \boldsymbol{\cdot }{w}, \langle \texttt {a}_k \boldsymbol{\cdot }{w}\rangle = k + {r}\boldsymbol{\cdot }\langle {w}\rangle }\). This encoding is onto and corresponds to the ranking number (starting from 0) of the reverse of \({w}\) in the shortlex orderFootnote 2. For example \({\langle \texttt {a}_{i+1} \boldsymbol{\cdot }{w}\rangle =\langle \texttt {a}_{i}\boldsymbol{\cdot }{w}\rangle +1}\) and \({\langle \texttt {a}_{j}\texttt {a}_{i+1}\boldsymbol{\cdot }{w}\rangle =\langle \texttt {a}_{j}\texttt {a}_{i}\boldsymbol{\cdot }{w}\rangle +{r}}\).

The natural evaluation scheme of primitive recursion functions is not very efficient (especially for numbers in unary notations), so a different scheme is used to show the proximity with the Turing machine model. The delayed dynamic evaluation scheme of word-functions is when the functions are called by name (not value) and only the needed expressions are evaluated (delayed) and all the evaluation results are stored so that nothing is re-evaluated (dynamic programming). An evaluation is represented by a direct acyclic graph (DAG) whose nodes are calls to function evaluations. Each node is labelled with the call: expressions of the function and of the arguments and its value (if computed). The DAG-complexity of an evaluation of a function is the number of nodes in the DAG. The size of the input is defined by the sum of the lengths of the input words. Given the expression of the initial function, the out-degrees of the nodes are bounded by present arities; the number of edges is thus linearly bounded in the number of nodes. Nodes are atomic operations, the length of any value is bounded by the input size and the DAG depth. The whole description of the labelled DAG is bounded by a polynomial in the size of its complexity.

The class of polynomial-time functions (from classical complexity theory) corresponds to the class of word-functions such that the DAG-complexity of any evaluation is bounded by a polynomial in the input size. One way, given the expression of the function, it is possible to generate an algorithm that, for any input, builds the DAG and outputs the result in polynomial time. The other way, consider a Turing machine implantation of any polynomial-time algorithm together with a polynomial that bounds its execution time. It is possible to evaluate the entry size and then the polynomial, to get the result in unary and then to do a recursion on the TM simulation. The DAG-complexity is linear in the polynomial value that bounds the iteration time of the Turing machine. Although we are using unary representation, it is still polynomial in the size of the input.

This proof can be adapted to NP (with polynomial-size certificates) and to exponential time. Please note that there also exists syntactic characterisation of FP in the number setting [2].

The paper focuses on primitive recursion without concatenation. Recursion can be used to chop off initial symbols and only suffix of the input can be output preventing the existence of any pairing or encoding function. As functions, they look rather bland; but, as language deciders (as pre-images of the empty word) they prove quite rich. Some algebraic (\(\texttt {a}^n\texttt {b}^n\), palindromes) and non-algebraic (\(\texttt {a}^n\texttt {b}^n\texttt {c}^n\)) languages are decidable. It is also possible to check arithmetical constrains like \(\texttt {a}^n\texttt {b}^m\texttt {c}^{P(n,m)}\) with P polynomial with positive coefficients in two (or more) variables. As a side results, this provides non-trivial examples of unary languages.

Multiple recursion allows to define various functions in one recursion. Usually, this operator is synthesised from single recursion using some pairing function, but no such function is available without concatenation. If multiple recursion is available, any regular language can be decided. Basically, each function corresponds to a state of a finite deterministic automaton.

A rough companion python3 library was developed to manipulate primitive recursive word-functions and check our constructions. It is available at

https://www.univ-orleans.fr/lifo/Members/Jerome.Durand-Lose/Recherche/Companion/2022_DCFS.tgz.

Section 2 collects all the definitions while Sect. 3 provides the expression of various usual functions. Section 4 investigates the concatenation-less primitive recursion functions as language deciders. Section 5 shows that adding multiple recursion to the concatenation-less primitive recursion functions allows to decide all the recursive languages. Concluding remarks and perspectives are gathered in Sect. 6.

2 Definitions

An alphabet, \({\varSigma }\), is a non-empty finite set: \(\{\texttt {a}_1,\texttt {a}_2,\cdots ,\texttt {a}_{{r}}\}\) where \({r}\) is its size. Unless otherwise specified, its size is least 2; The set of words are defined by the free monoid \({\varSigma }^{*}\). Let \(\boldsymbol{\cdot }\) denote the concatenation operator and \({\varepsilon }\) denote the empty word. Teletype fonts are used to denote symbols from \({\varSigma }\) and math fonts to denote words. To ease notation, the concatenation symbol is often omitted, e.g. \(\texttt {a}\texttt {a}\texttt {a}\) stands for \({\texttt {a}\boldsymbol{\cdot }\texttt {a}\boldsymbol{\cdot }\texttt {a}}\).

For any number k, a k-ary (word-)function is a function from \(\left( {\varSigma }^{*}\right) ^{k}\) to \({\varSigma }^{*}\).

The projection of the ith component of a tuple of size n (\(1 \le i \le n\)) is denoted \(\pi _{n}^{i}\). The identity function is denoted id (=\(\pi _{1}^{1}\)). The constant \({\varepsilon }\) function is denoted \({\widehat{\varepsilon }}\) (formally there is one of arity 1, others are generated with compositions and projections). For any symbol a, the 1-ary left concatenation function associated with a, is defined by: \({{}_{\texttt {a}}\boldsymbol{\cdot }({w})= \texttt {a}\boldsymbol{\cdot }{w}= \texttt {a}{w}}\). The notation \(\vec {x}\) denotes a vector of arguments. Sans serif fonts are used to denote functions (in lower case) and operators (capitalised).

Numbers correspond to a 1-symbol alphabet (0 corresponds to \({\varepsilon }\)). The successor of n is denoted S(n) (the only available left concatenation).

Composition Operator. Let jk be positive numbers. Let g be a k-ary function and \(\left( h_i\right) _{1\le i\le k}\) be j-ary functions. The j-ary function \(f={\mathbf {\mathsf{{Comp}}}}(g,(h_{i})_{1 \le i \le k})\) is uniquely defined by:

$$\begin{aligned} f(\vec {x})= & {} g\left( h_{1}(\vec {x}),\cdots ,h_{k}(\vec {x})\right) \end{aligned}$$

where \(\vec {x}\) represents j arguments.

(Single) Recursion Operator on \({\varSigma }\) . Let k be a positive number. Let g be a k-ary function and, for each a of \({\varSigma }\), \(h_{\texttt {a}}\) be a \(k{+}2\)-ary function. The \(k{+}1\)-ary function \(f={\mathbf {\mathsf{{Rec}}}}(g,(h_{\texttt {a}})_{\texttt {a}\in \varSigma })\) is uniquely defined by:

$$\begin{aligned} f(\varepsilon , \vec {y})= & {} g(\vec {y}) \quad \text {and} \\ \forall \texttt {a}\in \varSigma , f(\texttt {a}\boldsymbol{\cdot }w, \vec {y})= & {} h_{\texttt {a}}(w, f(w, \vec {y}), \vec {y}) \end{aligned}$$

where \(\vec {y}\) represents k arguments and \({w}\) is any word in \({\varSigma }^{*}\).

To increase readability, vertical displays of function vectors are often used for composition and recursion.

The set of primitive recursive functions is the smallest set of functions containing the empty-word function, left concatenation for every symbol, all the projections, and closed for the composition and the recursion operators.

From functions, relations are defined as the pre-image of the \({\varepsilon }\). A unary relation represents a subset of \({\varSigma }^{*}\), i.e., a language.

3 First Constructions

In the spirit of the next section, concatenations are avoided as much as possible. Expressions are provided for an alphabet of size 3 (or 2 when the expression is large). The generalisation to larger alphabets is straightforward.

A test is a function that returns \({\varepsilon }\) if and only if the condition is satisfied. It is a membership test for languages and relations.

3.1 Word Manipulations

By composition, it is possible to get any function concatenating a fixed word on the left, e.g. \({}_{\texttt {a}_{1} \texttt {a}_{2} \texttt {a}_{3}} \boldsymbol{\cdot }={\mathbf {\mathsf{{Comp}}}}({}_{\texttt {a}_{1}} \boldsymbol{\cdot },({\mathbf {\mathsf{{Comp}}}}({}_{\texttt {a}_{2}} \boldsymbol{\cdot },({}_{\texttt {a}_{3}} \boldsymbol{\cdot }))))\). By composing with constant empty-word function, it is possible to get any constant function, e.g. \(\widehat{\texttt {a}_{1} \texttt {a}_{2} \texttt {a}_{3}}={\mathbf {\mathsf{{Comp}}}}({}_{\texttt {a}_{1}} \boldsymbol{\cdot },({\mathbf {\mathsf{{Comp}}}}({}_{\texttt {a}_{2}} \boldsymbol{\cdot },({\mathbf {\mathsf{{Comp}}}}({}_{\texttt {a}_{3}} \boldsymbol{\cdot },(\widehat{\varepsilon }))))))\).

Basic operations on words are straightforward. The 2-ary concatenation operator can be generated from composition and recursion:

$$\begin{aligned} \boldsymbol{\cdot }={\mathbf {\mathsf{{Rec}}}}\left( \textsf {id},({\mathbf {\mathsf{{Comp}}}}\left( {}_{\texttt {a}_{1}} \boldsymbol{\cdot },(\pi _{3}^{2})\right) , {\mathbf {\mathsf{{Comp}}}}\left( {}_{\texttt {a}_{2}} \boldsymbol{\cdot },(\pi _{3}^{2})\right) , {\mathbf {\mathsf{{Comp}}}}\left( {}_{\texttt {a}_{3}} \boldsymbol{\cdot },(\pi _{3}^{2})\right) )\right) . \end{aligned}$$

Right concatenation functions can be generated as in:

$$\begin{aligned} \boldsymbol{\cdot }_{\texttt {a}_{1}}={\mathbf {\mathsf{{Rec}}}}\left( \widehat{\texttt {a}_{1}},({\mathbf {\mathsf{{Comp}}}}\left( {}_{\texttt {a}_{1}} \boldsymbol{\cdot },(\pi _{2}^{2})\right) , {\mathbf {\mathsf{{Comp}}}}\left( {}_{\texttt {a}_{2}} \boldsymbol{\cdot },(\pi _{2}^{2})\right) , {\mathbf {\mathsf{{Comp}}}}\left( {}_{\texttt {a}_{3}} \boldsymbol{\cdot },(\pi _{2}^{2})\right) )\right) . \end{aligned}$$

It is possible to manipulate a word as a stack/list with functions to extract the first symbol and the rest of a word:

$$\begin{aligned} \textsf {head}={\mathbf {\mathsf{{Rec}}}}(\widehat{\varepsilon },(\widehat{{\texttt {a}_{1}}}, \widehat{{\texttt {a}_{2}}}, \widehat{{\texttt {a}_{3}}})) \quad \text {and} \quad \textsf {tail}={\mathbf {\mathsf{{Rec}}}}(\widehat{\varepsilon },(\pi _{2}^{1}, \pi _{2}^{1}, \pi _{2}^{1})) \end{aligned}$$

Please note that for head, the first symbol is consumed by the recursion so that it has to be generated again using a concatenation. This phenomenon makes more involving if not prevent the expression of functions without concatenation. In the following, we avoid constant functions (to avoid concatenation), so that needed constants have to be provided as arguments.

The following functions act depending on the presence of \(\texttt {a}_1\) at the beginning of the first argument. The first function returns the rest of the first argument if present, the second argument otherwise. The second function returns its argument with leading \(\texttt {a}_1\) removed (if any).

$$\begin{aligned} \begin{aligned}&\textsf {suppress}_{\texttt {a}_{1}}^{\mathsf {else}}={\mathbf {\mathsf{{Rec}}}}\left( \mathsf {id},\left( \pi _{3}^{1}, \pi _{3}^{3}, \pi _{3}^{3}\right) \right) ,\\&\textsf {suppress}_{\texttt {a}_{1} ?}={\mathbf {\mathsf{{Comp}}}}(\textsf {suppress}_{\texttt {a}_{1}}^{\textsf {else}},(\textsf {id,id})). \end{aligned} \end{aligned}$$

The usual test for equality over numbers does not yield a test for equality but a test to decide whether one word is the reverse of the other. This is because computation in the recursion is done after the recursive call. This is invisible with numbers since in unary notation all words are palindrome.

$$\begin{aligned} \textsf {test}_{\textsf {reverse}}={\mathbf {\mathsf{{Comp}}}}\left( {\mathbf {\mathsf{{Rec}}}}\left( \pi _1^2\left| \begin{array}{l}{\mathbf {\mathsf{{Comp}}}}\left( {\mathbf {\mathsf{{Rec}}}}\left( \textsf {id}\left| \begin{array}{l}\pi _{3}^{1} \\ \pi _{3}^{3}\end{array}\right. \right) \left| \begin{array}{l}\pi _4^2\\ \pi _4^4 \end{array}\right. \right) \\ {\mathbf {\mathsf{{Comp}}}}\left( {\mathbf {\mathsf{{Rec}}}}\left( \textsf {id}\left| \begin{array}{l}\pi _{3}^{3} \\ \pi _{3}^{1}\end{array}\right. \right) \left| \begin{array}{l}\pi _4^2\\ \pi _4^4 \end{array}\right. \right) \end{array}\right) \right| \begin{array}{l}\pi _2^1 \\ \pi _2^2\\ \pi _2^1 \end{array}\right) . \end{aligned}$$

This can be used to test if a word is a palindrome:

$$\begin{aligned} \textsf {test}_{\textsf {palindrome}}={\mathbf {\mathsf{{Comp}}}}\left( \textsf {test}_{\textsf {reverse}} \left| \begin{array}{l} \textsf {id} \\ \textsf {id} \end{array}\right. \right) . \end{aligned}$$

It is possible to reverse a word and then test for equality:

$$\begin{aligned} {\textsf {reverse}}={\mathbf {\mathsf{{Rec}}}}\left( \widehat{\varepsilon }\left| \begin{array}{l}{\mathbf {\mathsf{{Comp}}}}\left( {\mathbf {\mathsf{{Rec}}}}\left. \left( \widehat{\texttt {a}_1}\left| \begin{array}{l} {\mathbf {\mathsf{{Comp}}}} \left( {}_{\texttt {a}_1} \boldsymbol{\cdot }\left| \pi _{2}^{2}\right. \right) \\ {\mathbf {\mathsf{{Comp}}}} \left( {}_{\texttt {a}_2} \boldsymbol{\cdot }\left| \pi _{2}^{2}\right. \right) \end{array}\right. \right) \right| \pi _2^2\right) \\ {\mathbf {\mathsf{{Comp}}}}\left( {\mathbf {\mathsf{{Rec}}}}\left. \left( \widehat{\texttt {a}_2}\left| \begin{array}{l} {\mathbf {\mathsf{{Comp}}}} \left( {}_{\texttt {a}_1} \boldsymbol{\cdot }\left| \pi _{2}^{2}\right. \right) \\ {\mathbf {\mathsf{{Comp}}}} \left( {}_{\texttt {a}_2} \boldsymbol{\cdot }\left| \pi _{2}^{2}\right. \right) \end{array}\right. \right) \right| \pi _2^2\right) \end{array}\right. \right) , \end{aligned}$$
$$\begin{aligned} \textsf {test}_{\textsf {equality}}={\mathbf {\mathsf{{Comp}}}}\left( \textsf {test}_{\textsf {reverse}} \left| \begin{array}{l} \pi _{2}^{1} \\ {\mathbf {\mathsf{{Comp}}}}\left( \textsf {reverse} \left| \pi _{2}^{2}\right. \right) \end{array}\right. \right) . \end{aligned}$$

3.2 Logical Functions

Each of these if functions works like a ternary operator with a condition/test on the first argument returning the second argument if the test succeeds, otherwise the third argument. A test succeeds if it evaluates to the empty word. The most basic function just tests whether the first argument is the empty word (\(\textsf {if}({\varepsilon },y,z)=y\), and \(\forall x\ne {\varepsilon },\textsf {if}(x,y,z)=z\) ). It is defined by:

$$\begin{aligned} \textsf {if}_{\varepsilon }={\mathbf {\mathsf{{Rec}}}}\left( \pi _{2}^{1},(\pi _{4}^{4}, \pi _{4}^{4})\right) . \end{aligned}$$

Conjunction and disjunction operators are defined as 2-ary functions:

$$\begin{aligned} \textsf {and}_{\varepsilon }={\mathbf {\mathsf{{Comp}}}}\left( \textsf {if}_{\varepsilon },(\pi _{2}^{1}, \pi _{2}^{2}, \pi _{2}^{1})\right) \qquad \textsf {and} \qquad \textsf {or}_{\varepsilon }={\mathbf {\mathsf{{Comp}}}}\left( \textsf {if}_{\varepsilon },(\pi _{2}^{1}, \widehat{\varepsilon }, \pi _{2}^{2})\right) . \end{aligned}$$

If a non-\({\varepsilon }\) constant is provided, the negation function can be defined by \({\mathbf {\mathsf{{Comp}}}}\left( \textsf {if}_{\varepsilon },(\pi _{2}^{1}, \pi _{2}^{2}, \widehat{\varepsilon })\right) \). This function has arity 2 (for the constant).

The following functions use the conditions: to start with \(\texttt {a}_{1}\), to belong to the regular language \(\texttt {a}_{1}^{*}\), and to the language \(\texttt {a}_{1}^{+}\):

$$\begin{aligned} \textsf {if}_{{\texttt {a}_{1}} \varSigma ^{*}}= & {} {\mathbf {\mathsf{{Rec}}}}\left( \pi _{2}^{2},(\pi _{4}^{3}, \pi _{4}^{4}, \pi _{4}^{4})\right) ,\\ \textsf {if}_{{\texttt {a}_{1}^*}}= & {} {\mathbf {\mathsf{{Rec}}}}\left( \pi _{2}^{1},(\pi _{4}^{2}, \pi _{4}^{4}, \pi _{4}^{4})\right) , \text {and}\\ \textsf {if}_{\texttt {a}_{1}^{+}}= & {} {\mathbf {\mathsf{{Rec}}}}\left( \pi _{2}^{2},({\mathbf {\mathsf{{Comp}}}}\left( \mathsf {if}_{\texttt {a}_{1}^*},(\pi _{4}^{1}, \pi _{4}^{3}, \pi _{4}^{4})\right) , \pi _{4}^{4}, \pi _{4}^{4})\right) . \end{aligned}$$

3.3 Encoding and Pairing

Any word on any finite alphabet can be encoded on 2-symbol alphabet by:

$$\begin{aligned} \varepsilon\mapsto & {} \texttt {a}_1\texttt {a}_1, \text {and} \\ \texttt {a}_{i_1}\boldsymbol{\cdot }\texttt {a}_{i_2}\boldsymbol{\cdot }\cdots \boldsymbol{\cdot }\texttt {a}_{i_k}\mapsto & {} \texttt {a}_1 \texttt {a}_2^{i_1} \texttt {a}_1 \texttt {a}_2^{i_2}\texttt {a}_1 \cdots \texttt {a}_2^{i_k}\texttt {a}_1. \end{aligned}$$

This function is primitive recursive like its decoding function as constructed below. The special value for \({\varepsilon }\) has to be taken into account both in coding and decoding. The encoding is constructed by concatenating all \(\texttt {a}_1 \texttt {a}_2^{i}\) to a final \(\texttt {a}_1\).

$$\begin{aligned} \textsf {encode}= {\mathbf {\mathsf{{Comp}}}}\left( \textsf {if}_{\varepsilon }\left| \begin{array}{l}\textsf {id}\\ \widehat{\texttt {a}_1 \texttt {a}_2}\\ {\mathbf {\mathsf{{Rec}}}}\left( \widehat{\texttt {a}_1}\left| \begin{array}{l} {\mathbf {\mathsf{{Comp}}}}\big ({}_{\texttt {a}_1 \texttt {a}_2}\boldsymbol{\cdot }\left| \pi _2^2\right. \big )\\ {\mathbf {\mathsf{{Comp}}}}\big ({}_{\texttt {a}_1 \texttt {a}_{2}^{2}}\boldsymbol{\cdot }\left| \pi _2^2\right. \big )\\ {\mathbf {\mathsf{{Comp}}}}\big ({}_{\texttt {a}_1 \texttt {a}_{2}^{3}}\boldsymbol{\cdot }\left| \pi _2^2\right. \big )\\ \end{array}\right. \right) \end{array}\right. \right) . \end{aligned}$$

For decoding, a new \(\texttt {a}_{|{\varSigma }|}\) to be rotated is concatenated on the left for each \(\texttt {a}_1\) but the first. For each \(\texttt {a}_2\) the function \(\textsf {rot}_{\textsf {first}}\) rotates the first symbol of its argument.

$$\begin{aligned} \begin{array}{cc} \begin{array}{rcl} \varepsilon &{} \mapsto &{}\varepsilon \\ \texttt {a}_{k} \boldsymbol{\cdot }w &{}\mapsto &{} \texttt {a}_{k \, \bmod \,\textsf {r}+1} \boldsymbol{\cdot }w \end{array} &{} \qquad \text { and } \textsf {rot}_{\textsf {first }}={\mathbf {\mathsf{{Rec}}}}\left( \widehat{\varepsilon } \left| \begin{array}{l} {\mathbf {\mathsf{{Comp}}}}\left( \texttt {a}_{2} \boldsymbol{\cdot }\left| \pi _{2}^{1}\right. \right) \\ {\mathbf {\mathsf{{Comp}}}}\left( \texttt {a}_{3} \boldsymbol{\cdot }\left| \pi _{2}^{1}\right. \right) \\ {\mathbf {\mathsf{{Comp}}}}\left( \texttt {a}_{1}\boldsymbol{\cdot }\left| \pi _{2}^{1}\right. \right) \\ \end{array}\right. \right) \end{array}. \end{aligned}$$
$$\begin{aligned} \textsf {decode}= {\mathbf {\mathsf{{Comp}}}}\left( \textsf {if}_{\varepsilon }\left| \begin{array}{l}{\mathbf {\mathsf{{Comp}}}}\left( \textsf {tail}\left| \textsf {tail}\right. \right) \\ \widehat{\varepsilon }\\ {\mathbf {\mathsf{{Comp}}}}\left( {\mathbf {\mathsf{{Rec}}}}\left. \left( \widehat{\varepsilon }\left| \begin{array}{l} {\mathbf {\mathsf{{Comp}}}}\left( {}_{\texttt {a}_3}\boldsymbol{\cdot }\left| \pi _2^2\right. \right) \\ {\mathbf {\mathsf{{Comp}}}}\left( \textsf {rot}_{\textsf {first}} \left| \pi _2^2\right. \right) \\ \widehat{\varepsilon }\\ \end{array}\right. \right) \right| \textsf {tail}\right) \end{array}\right. \right) . \end{aligned}$$

The special value for \({\varepsilon }\) allows a simple pairing by concatenation.

$$\begin{aligned} \textsf {pair}={\mathbf {\mathsf{{Comp}}}}\left( \boldsymbol{\cdot }\left| \begin{array}{l} {\mathbf {\mathsf{{Comp}}}}\left( \textsf {encode} \left| \pi _{2}^{1}\right. \right) \\ {\mathbf {\mathsf{{Comp}}}}\left( \textsf {encode} \left| \pi _{2}^{2}\right. \right) \end{array}\right. \right) . \end{aligned}$$

To recover the first and second values of the pair, the middle \(\texttt {a}_1\texttt {a}_1\) should be found while potential leading or ending \(\texttt {a}_1\texttt {a}_1\) encoding \({\varepsilon }\) are treated correctly. To recover the first value, the first \(\texttt {a}_1\) is discarded and \(\texttt {a}_1\texttt {a}_1\) is searched for, preserving only what is crossed.

$$\begin{aligned} \textsf {pair}_{\textsf {first}}= {\mathbf {\mathsf{{Comp}}}}\left( \textsf {decode}\left| {\mathbf {\mathsf{{Rec}}}}\left( \widehat{\varepsilon }\left| \begin{array}{l}{\mathbf {\mathsf{{Comp}}}}\left( \textsf {if}_{\texttt {a}_1\varSigma ^*}\left| \begin{array}{l} \pi _2^1 \\ \widehat{\texttt {a}_1}\\ {\mathbf {\mathsf{{Comp}}}}\left( {}_{\texttt {a}_1}\boldsymbol{\cdot }\left| \pi _2^2\right. \right) \end{array}\right. \right) \\ {\mathbf {\mathsf{{Comp}}}}\left( {}_{\texttt {a}_2}\boldsymbol{\cdot }\left| \pi _2^2\right. \right) \\ {\mathbf {\mathsf{{Comp}}}}\left( {}_{\texttt {a}_3}\boldsymbol{\cdot }\left| \pi _2^2\right. \right) \end{array}\right. \right) \right. \right) . \end{aligned}$$

To recover the second value, the first \(\texttt {a}_1\) is discarded and \(\texttt {a}_1\texttt {a}_1\) is searched for, discarding what is crossed.

$$\begin{aligned} \textsf {pair}_{\textsf {second}}= {\mathbf {\mathsf{{Comp}}}}\left. \left( {\mathbf {\mathsf{{Rec}}}}\left( \widehat{\varepsilon }\left| \begin{array}{l}{\mathbf {\mathsf{{Comp}}}}\left( \textsf {if}_{\texttt {a}_1\varSigma ^*}\left| \begin{array}{l} \pi _2^1 \\ {\mathbf {\mathsf{{Comp}}}}\left( \textsf {pair}_{\textsf {first}}\left| \pi _2^1\right. \right) \\ \pi _2^2 \end{array}\right. \right) \\ \pi _2^2\\ \pi _2^2 \end{array}\right. \right) \right| \textsf {suppress}_{\texttt {a}_{1}?}\right) .\end{aligned}$$

This paring scheme extends straightforwardly to encode any tuple.

4 Primitive Recursion Without Concatenation

Let \({\varSigma }\text {-}\textsf {CL}\text {-}\textsf {PRec}\) be the smallest set of functions containing the empty-word function, all the projections, and closed by the composition and the primitive recursion operators on \({\varSigma }^*\). A direct induction shows that:

Lemma 1

The output of any word-function in \({\varSigma }\text {-}\textsf {CL}\text {-}\textsf {PRec}\) must be a suffix of a word in the input.

In particular, if the input is composed only of \({\varepsilon }\), then the output is \({\varepsilon }\). This limits the computing power and even constrains language recognition: unless a non-\({\varepsilon }\) constant is provided, \({\varepsilon }\) is accepted. This means that if \({\varepsilon }\) is not in the language, a non-empty constant has to be provided in the input.

Since logical operators do not use concatenation, the set of decidable languages/relations is closed under union, intersection and complement (with a constant).

4.1 Some Algebraic Languages Decided in \({\varSigma }\text {-}\textsf {CL}\text {-}\textsf {PRec}\)

Palindromes. Test for palindrome p. 6 does not use concatenation. This language is algebraic, non-ambiguous but not deterministic (it cannot be recognised by deterministic push-down automata, DPDA: it has to guess when the middle of the \({w}\) is read).

Language \(\boldsymbol{\texttt {a}_1^n\texttt {a}_2^n}\). Function \(\textsf {test}_{\texttt {a}_1^n\texttt {a}_2^n}\) first considers the case of input \({\varepsilon }\) (accepted). Otherwise, the input is not \({\varepsilon }\) and is stored as a fail value. The first symbol has to be \(\texttt {a}_1\) (otherwise fail) and then for each discarded \(\texttt {a}_1\), a function that removes one \(\texttt {a}_2\) (or fail) is used on the output.

Technical detail: \(\textsf {test}_{{\texttt {a}_{1}^{n} \texttt {a}_{2}^{n+1}}}^{\textsf {fail}}\) consumes the first \(\texttt {a}_2\) to know when \(\texttt {a}_2^n\) starts; to keep balance \(\textsf {test}_{\texttt {a}_1^n\texttt {a}_2^n}\) consumes the first \(\texttt {a}_1\) before handling the rest of the word to \(\textsf {test}_{{\texttt {a}_{1}^{n} \texttt {a}_{2}^{n+1}}}^{\textsf {fail}}\). The label fail in the name means that a fail value has to be provided as second argument. It differs from the meaning of else since the fail value might not be used to indicate failure.

$$\begin{aligned} \begin{aligned}&\textsf {test}_{{\texttt {a}_{1}^{n} \texttt {a}_{2}^{n+1}}}^{\textsf {fail}}={\mathbf {\mathsf{{Rec}}}}\left( \textsf {id}\left| \begin{array}{l} {\mathbf {\mathsf{{Comp}}}}\left( \textsf {suppress}_{\texttt {a}_{2}}^{\textsf {else}} \left| \begin{array}{l} \pi _{3}^{2} \\ \pi _{3}^{3} \end{array}\right. \right) \\ \pi _{3}^{1} \\ \pi _{3}^{3} \end{array}\right. \right) ,\\&\textsf {test}_{\texttt {a}_{1}^{n} \texttt {a}_{2}^{n}}={\mathbf {\mathsf{{Comp}}}}\left( {\mathbf {\mathsf{{Rec}}}}\left. \left( \widehat{\varepsilon } \left| \begin{array}{l} {\mathbf {\mathsf{{Comp}}}}(\textsf {test}_{\texttt {a}_{1}^{n} \texttt {a}_{2}^{n+1}}^{\textsf {fail}} \left| \begin{array}{l} \pi _{3}^{1} \\ \pi _{3}^{3} \end{array}\right) \\ \pi _{3}^{3} \\ \pi _{3}^{3} \end{array}\right. \right) \right| \begin{array}{l} \textsf {id} \\ \textsf {id} \end{array}\right) . \end{aligned} \end{aligned}$$

If the word is not in \(\texttt {a}_1^n\texttt {a}_2^n\), then either the fail value is used or a \(\texttt {a}_1^n\texttt {a}_2^n\) prefix is removed leaving a non-\({\varepsilon }\) word.

This language is deterministic algebraic (can be recognised by DPDA).

Language \(\boldsymbol{\texttt {a}_1^n\texttt {a}_2^n\texttt {a}_1^m \cup \texttt {a}_1^n\texttt {a}_2^m\texttt {a}_1^m}\). On a word from \(\texttt {a}_1^n\texttt {a}_2^n\texttt {a}_1^m\), \(\textsf {test}_{\texttt {a}_1^n\texttt {a}_2^n}\) should return \(\texttt {a}_1^m\). So that the end of the test is carried out by removing remaining \(\texttt {a}_1\). Removing leading \(\texttt {a}_1^{*}\) is done with \(\textsf {suppress}_{\texttt {a}_{1}^*}\). To avoid consuming one extra symbol (the first \(\texttt {a}_{{\ne }1}\)), one \(\textsf {suppress}_{\texttt {a}_{1} ?}\) is stacked for each \(\texttt {a}_1\) and then the composition is used on a copy of the input.

$$\begin{aligned} \begin{aligned}&\textsf {suppress}_{\texttt {a}_{1}^*}={\mathbf {\mathsf{{Comp}}}}\left. \left( {\mathbf {\mathsf{{Rec}}}}\left( \widehat{\varepsilon } \left| \begin{array}{l} {{\mathbf {\mathsf{{Comp}}}}}\left( \textsf {suppress}_{\texttt {a}_{1} ?} \left| \pi _{3}^{2}\right. \right) \\ \pi _{3}^{3} \\ \pi _{3}^{3} \end{array}\right. \right) \right| \begin{array}{l} \textsf {id} \\ \textsf {id} \end{array}\right) , \\&\textsf {test}_{\texttt {a}_{1}^{n} \texttt {a}_{2}^{n} \texttt {a}_{1}^{m}}={\mathbf {\mathsf{{Comp}}}}\big (\textsf {suppress}_{\texttt {a}_{1}^{*}} \left| \textsf {test}_{\texttt {a}_{1}^{n} \texttt {a}_{2}^{n}}\right. \big ). \end{aligned} \end{aligned}$$

The language \(\texttt {a}_1^n\texttt {a}_2^m\texttt {a}_1^m\) is decided by removing all leading \(\texttt {a}_1\) and then using previous test (swapping \(\texttt {a}_1\) and \(\texttt {a}_2\)): \(\textsf {test}_{\texttt {a}_{1}^{n} \texttt {a}_{2}^{m} \texttt {a}_{1}^{m}}={\mathbf {\mathsf{{Comp}}}}\big (\left. \textsf {test}_{\texttt {a}_{1}^{n} \texttt {a}_{2}^{n}} \right| \textsf {suppress}_{\texttt {a}_{1}^{*}} \big )\).

Since union of decidable languages is decidable, the algebraic language \(\texttt {a}_1^n\texttt {a}_2^n\texttt {a}_1^m \cup \texttt {a}_1^n\texttt {a}_2^m\texttt {a}_1^m\) is decidable. This language is ambiguous.

4.2 Some Non-algebraic Languages Decided in \({\varSigma }\text {-}\textsf {CL}\text {-}\textsf {PRec}\)

Languages \(\boldsymbol{\texttt {a}_1^n\texttt {a}_2^n\texttt {a}_1^n}\). Since intersection of decidable languages is decidable, the language \(\texttt {a}_1^n\texttt {a}_2^n\texttt {a}_1^n=\texttt {a}_1^n\texttt {a}_2^n\texttt {a}_1^m \cap \texttt {a}_1^n\texttt {a}_2^m\texttt {a}_1^m\) is decidable. This language is not algebraic. Similarly, it is possible to prove that the languages \(\texttt {a}_1^n\texttt {a}_2^n\texttt {a}_1^n\cdots \texttt {a}_1^n\) are all decidable.

Languages \(\boldsymbol{\texttt {a}_1^n\texttt {a}_2^{P(n)}}\) with \(\boldsymbol{P}\) Polynomial with Positive Coefficients. The idea is to deal with functions that discard (or fail) the right amount of \(\texttt {a}_2\) according to the number of \(\texttt {a}_1\) for each monomial. So that the result is empty only if the sum matches.

For each monomial, a ternary function is defined. The first argument starts with \(\texttt {a}_1^n\texttt {a}_{\ne 1}\) to provide the value for n. The second argument is the one to remove the \(\texttt {a}_2\) from. The third argument is returned if removing is not possible.

For constant monomial 3, the function is

$$\begin{aligned} \begin{aligned} \textsf {remove}_{\texttt {a}_{2}^{3}}^{\textsf {else}}&={\mathbf {\mathsf{{Comp}}}}\left( \textsf {suppress}_{\texttt {a}_{2}}^{\textsf {else}} \left| \begin{array}{l}{\mathbf {\mathsf{{Comp}}}} \left( \textsf {suppress}_{\texttt {a}_{2}}^{\textsf {else}}\left| \begin{array}{l} \textsf {suppress}_{\texttt {a}_{2}}^{\textsf {else}} \\ \pi _{2}^{2} \\ \end{array}\right. \right) \\ \pi _{2}^{2} \end{array}\right. \right) \\ \textsf {remove}_{\texttt {a}_{2}^{3}}&={\mathbf {\mathsf{{Comp}}}}\left( \textsf {remove}_{\texttt {a}_{2}^{3}}^{\textsf {else}}\left| \begin{array}{l} \pi _{3}^{2}\\ \pi _{3}^{3} \end{array}\right. \right) . \end{aligned} \end{aligned}$$

For the monomial 3x, this is done x times:

$$\begin{aligned} \textsf {remove}_{3\texttt {a}_{2}^{x}}&={\mathbf {\mathsf{{Comp}}}}\left( {\mathbf {\mathsf{{Rec}}}}\left. \left( \pi _3^2\left| \begin{array}{l} {\mathbf {\mathsf{{Comp}}}}\left( \textsf {remove}_{\texttt {a}_{2}^{3}}^{\textsf {else}}\left| \begin{array}{l} \pi _5^2 \\ \pi _5^5 \\ \end{array}\right. \right) \\ \pi _5^4 \\ \pi _5^4 \end{array}\right. \right) \right| \begin{array}{l} \pi _3^1 \\ \pi _3^1 \\ \pi _3^2 \\ \pi _3^3 \\ \end{array}\right) . \end{aligned}$$

For the monomial \(3x^2\), it is done x times again. The function \(\textsf {remove}_{3\texttt {a}_{2}^{x^2}}\) is:

$$\begin{aligned} {\mathbf {\mathsf{{Comp}}}}\left( {\mathbf {\mathsf{{Rec}}}}\left. \left( \pi _3^2 \left| \begin{array}{l} {\mathbf {\mathsf{{Comp}}}}\left( {\mathbf {\mathsf{{Rec}}}}\left. \left( \pi _3^2 \left| \begin{array}{l} {\mathbf {\mathsf{{Comp}}}}\left( \textsf {remove}_{\texttt {a}_{2}^{3}}^{\textsf {else}}\left| \begin{array}{l} \pi _2^5 \\ \pi _5^5 \\ \end{array}\right. \right) \\ \pi _5^4 \\ \pi _5^4 \end{array}\right. \right) \right| \begin{array}{l} \pi _5^3 \\ \pi _5^3 \\ \pi _5^2 \\ \pi _5^5 \\ \end{array}\right) \\ \pi _5^4\\ \pi _5^4\\ \end{array} \right. \right) \right| \begin{array}{l} \pi _3^1 \\ \pi _3^1 \\ \pi _3^2 \\ \pi _3^3 \\ \end{array}\right) . \end{aligned}$$

Even though the definition looks involving, these are just nested for loops.

It is possible to design concatenation-less functions that yield each maximal suffixes of \(\texttt {a}_1^+\texttt {a}_2^+\texttt {a}_3^+\cdots \texttt {a}_m^+\) of the form \(\texttt {a}_k^+\cdots \texttt {a}_m^+\). Hence, all the languages \(\texttt {a}_1^+\texttt {a}_2^+\cdots \texttt {a}_i^{n}\cdots \texttt {a}_j^{P(n)}\cdots \texttt {a}_m^+\) (for given \(i\ne j\) and P) are all decidable.

Using the same tools, it is also possible to test such languages as \(\texttt {a}_1^n\texttt {a}_2^m\texttt {a}_2^{P(n,m)}\) with P polynomial in 2 variables with positive coefficients. More than two variables is similarly possible.

5 Regular Languages are Decidable in \({\varSigma }\text {-}\textsf {CL}\text {-}\textsf {PRec}\) with Multiple Recursion

The multiple recursion operator is usually synthesised with the use of a pairing function, i.e. a one-to-one function from \({\varSigma }^{*}\times {\varSigma }^{*}\) to \({\varSigma }^{*}\). Yet, no such function is available without concatenation since any pairing function would have to map \(\{(\texttt {a}_1^i,\texttt {a}_1^j)\}_{0\le i,j<2}\) to four distinct values, but the only possible outputs are in \(\{{\varepsilon },\texttt {a}_1\}\) (the suffixes). (Adding constants would no work for \(\{(\texttt {a}_1^i,\texttt {a}_1^j)\}_{0\le i,j<k}\) for every k.)

Lemma 2

There is no pairing function in \({\varSigma }\text {-}\textsf {CL}\text {-}\textsf {PRec}\).

Multiple Recursion Operator on \({\varSigma }\) . Let m and k be any positive numbers. Let \(\left( g_i\right) _{1\le i\le m}\) be k-ary functions and, for each a of \({\varSigma }\), \(\left( h_{\texttt {a},i}\right) _{1\le i\le m}\) be \((k{+}m{+}1)\)-ary functions. The \((k{+}1)\)-ary functions

$$\begin{aligned} (f_{i})_{1 \le i \le m}={\mathbf {\mathsf{{Rec}}}}^{m}\Big ((g_{i})_{1 \le i \le m},(h_{\texttt {a}, i})_{\texttt {a}\in \varSigma , 1 \le i \le m}\Big ) \end{aligned}$$

are uniquely defined by \(\forall i,1\le i\le m\):

$$\begin{aligned} \begin{aligned} f_{i}(\varepsilon , \vec {y})&=g_{i}(\vec {y}) \quad \text {and} \\ \forall \texttt {a}\in \varSigma , \quad f_{i}(\texttt {a}\boldsymbol{\cdot }w, \vec {y})&=h_{\texttt {a}, i}(w, f_{1}(w, \vec {y}), \cdots , f_{m}(w, \vec {y}), \vec {y}) \end{aligned} \end{aligned}$$

where \(\vec {y}\) represents k arguments.

The set \({\varSigma }\text {-}\textsf {CL}\text {-}\textsf {PRec}^{*}\) is defined like \({\varSigma }\text {-}\textsf {CL}\text {-}\textsf {PRec}\), but with the addition of the closure by the recursion operators of every arity. Lemma 1 extends to \({\varSigma }\text {-}\textsf {CL}\text {-}\textsf {PRec}^{*}\): the output has to be a suffix of an input.

Regular Languages are Decidable in \(\boldsymbol{\varSigma }\)-CL-PRec\(^{\boldsymbol{*}}\). Let L be a regular language. It is decided by some deterministic finite automaton \((Q,\delta ,q_{0},A)\) where Q is finite set of state, \(\delta \) is the transition table, \(q_{0}\) is the initial state, and Ais the set of accepting states. We suppose that \({\varepsilon }\in L\) (otherwise add a constant to the input and complement).

Let the 2-ary functions \(\left( f_{q}\right) _{q\in Q}\) be defined by multiple recursion from projections by:

$$\begin{aligned} \begin{aligned} \forall \forall q \in A, \quad f_{q}(\varepsilon , w_{1})&= \widehat{\varepsilon }(w_{1})=\varepsilon \\ \forall q \in Q \backslash A, \quad f_{q}(\varepsilon , w_{1})&= \pi _{1}^{1}(w_{1})=w_{1} \\ \forall q \in Q, \quad \forall \texttt {a}\in \varSigma , \quad f_{q}(\texttt {a}\boldsymbol{\cdot }w, w_{1})&= \pi _{|Q|+2}^{i}\Big (w,(f_{s}(w, w_{1}))_{s \in Q}, w_{1}\Big ) \\&= f_{r}(w, w_{1}) \\&\ \ \ \text {where}\,\delta (q, \texttt {a})=r\,\text {and}\,i\,\text {suitably chosen} \end{aligned} \end{aligned}$$

The transition table is encoded in the recursion. The following function decides L.

$$\begin{aligned} \textsf {test}_{L}={\mathbf {\mathsf{{Comp}}}}\left( f_{q_{0}},(\pi _{1}^{1}, \pi _{1}^{1})\right) \end{aligned}$$

6 Conclusion

Word-recursion is a rich context allowing to address words directly and to relate to complexity theory. Although forbidding concatenation seems limiting, it allows to decide non trivial languages. It is open whether all algebraic languages are decidable, and if not, which of them are not and why. More generally, a condition for a function to be (un)computable without concatenation that would rule out functions (e.g., equality) and languages is to be found.

Without concatenation it is still possible to check constrains expressed with a polynomial with positive coefficients. Although we advocate recursion on words, the range of integer languages decidable is also wide; e.g. by testing all possible splitting in two terms, the language \(\{n+n^2|n\in \mathbb {N}\}\) can be decided.

We conjecture that even though this class is restricted, there should be some undecidable properties. For example, emptiness of accepted language might be undecidable (using diophantine equations [8]).

Any function defined without concatenation, f, satisfies \(|f(x_1,\cdots ,x_k)|\le \max (|x_1|,\cdots ,|x_k|)\), so that this class is included in the level \(E_0\) of the Grzegorczyk hierarchy (see [6] for definitions). Relatively to the relations/languages theses classes defined, we lack an example to show that the inclusion is strict. We conjecture that the height of recursion in the function definition provides a proper hierarchy inside the class.

Some of provided constructions rely on duplicating the input. We are wondering whether forbidding duplication leads to a non-trivial class. Otherwise, how can it be characterised?

We would like to close this article by addressing minimisation. The few operators for words in the literature are usually number representation based (related to the shortlex order) in settings where the successor is not a base function but a non-trivial word-function. We want to avoid the influence of numbers and refuse to impose a non-trivial order on words. In the number setting, one can consider the successor function to be just a function to provide from the current one the next value to test. We propose to take that point of view: that the minimisation operator requires another word-function to generate from the current one the next word to try (starting from the empty word), without any constraint on this function (does not have to onto, one-to-one or total, as long as it is in the class). Although it seems more complex, it corresponds to the update of variables in while loops.