1 Introduction

Robots have long been used for tasks such as retrieving an object of interest from a site or inspecting a remote location. Modern robots are expected to take instructions from users and execute complex tasks, such as collecting a variety of items spread across a home or office environment, or cooking complex recipes. All these operations require the robots to perform multiple operations in the correct sequence; this is called the mission planning problem. The mission is specified by the user in a formal language that the robots can understand. In this paper, we envision a team of robots operating in facilities like homes or offices with several users who continuously give new instructions to the robots to carry out tasks. The robotic team must plan and distribute the work among themselves while optimally solving the entire mission, including adapting the solution as requirements are continuously added or changed by the users.

Classic task planning approaches can specify many missions using atomic propositions. Linear Temporal Logic (LTL) [1, 2] allows specifying missions in which the atomic propositions carry temporal constraints, such as one proposition remaining true until another becomes true. Classic planning techniques, including those based on formal languages, typically have exponential complexity, and heuristics do little to make the solutions scalable. This is an ideal setting for evolutionary computation as a planning system, which ensures probabilistic completeness, meaning the probability of finding a solution, if one exists, tends to one as time tends to infinity; and probabilistic optimality, meaning the solution tends to the optimal solution as time tends to infinity. Evolutionary computation is used to search for an optimal sequence that solves the mission. For this, a polynomial-time verification of a candidate solution must be possible, and hence the solver can only be used for suitably constrained languages. This contrasts with model verification-based solvers, for which even verification has exponential complexity. Many real-life applications are now solved using evolutionary computation: the problem is encoded with problem-specific heuristics such that a candidate is always valid or can be checked for validity in polynomial time. For example, in the Travelling Salesman Problem any permutation of the places is a valid solution, and in classic robot motion planning a path can be traversed in linear time to check for collisions.

It must be noted that the paper takes only general inspiration from LTL: a formal language specifies the mission, and a computational method solves the synthesis problem. The paper itself does not use LTL or any of its variants, nor the LTL operators, for mission specification. LTL considers infinite traces, while the problem under consideration involves finite strings only. The paper allows a mission to be specified using the AND, OR, and sequencing operations only; a mission specified with these operations is polynomial-time verifiable and considers only finite strings. The mission planning problem is then the generation of a finite-length string that is accepted by the verification system, preferably with the smallest path cost. A mission specified using only AND, OR, and sequencing may appear to have very little expressive power. However, the paper only considers service robots performing everyday chores for humans, and a large share of the complex operations that humans may require of such robots is covered by these three operators. The language has not been checked for utility in other domains.
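As an illustration, an everyday chore expressible with these three operators can be written as a small expression tree. The operator names and atomic operations below are hypothetical placeholders for illustration, not the paper's actual syntax:

```python
# Hypothetical chore: "fetch a cup, THEN fill it with coffee OR tea,
# AND (in some order) water the plant."  Nested tuples encode the
# three permitted operators: T (sequencing), AND, OR.
task = ("AND",
        ("T", "fetch_cup", ("OR", "fill_coffee", "fill_tea")),
        "water_plant")

def atoms(spec):
    """Collect the atomic operations mentioned in a task tree."""
    if isinstance(spec, str):          # a leaf is an atomic operation
        return {spec}
    _, left, right = spec              # every operator is binary here
    return atoms(left) | atoms(right)
```

A verifier over such trees only needs to recurse over the (polynomially many) substring splits, which is what keeps the language polynomial-time verifiable.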

In this paper, the robot mission planning problem is solved. A task is defined as an instruction given by a user, consisting of Boolean and temporal constraints, that must be satisfied by the robot. Numerous tasks coming from the same or different users constitute a mission. The use case is an office building with numerous users and a fleet of robots catering to their needs. A robot solving a task sub-optimally may take hours for a task that could otherwise be completed in a few minutes.

The incremental nature of the problem means that the users keep adding task specifications as new requirements arise. A user may at any time give a task (or a collection of tasks) to the robot, while the robot may already be performing other tasks for the same or other users. The robot has no advance knowledge of a prospective task before a user gives it, nor of the nature and timing of future tasks. Upon the addition of a new task, complete re-planning may not be possible due to the large computation times. Incremental synthesis approaches save time by reusing the models; however, the approaches must ensure computational efficiency using problem-specific heuristics.

The incremental nature of the solution means that the solution improves iteratively. After a solution is computed, the robot or robot fleet executes it. If the solver is iterative or anytime in nature, the significant amount of time during which the robot is physically performing an operation can be used by the solver to further optimize the rest of the solution. The assumption is that an optimal search has exponential complexity even with heuristics, while the robot cannot be asked to wait indefinitely for a solution; the robot therefore needs an approximation algorithm. Greedy heuristics can compute a reasonably good solution in a short time. In addition, the robot has an unknown travel time that can be used to improve the approximate solution. The aim is hence to use greedy heuristics (like Dynamic Programming) where possible, while also allowing the robot to benefit from the unknown time it takes to travel from one place to another to improve the solution quality. Many anytime algorithms exist in the literature; the Anytime A* algorithm, for example, continuously improves the path over time. However, the Genetic Algorithm is specifically chosen as the solver because it allows easy integration of heuristics like Dynamic Programming into the proposed work.
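The interplay between acting and optimizing can be sketched as a simple anytime loop: while the robot travels, the solver keeps improving the remaining plan until the travel-time budget runs out. The function names and the improvement operator below are illustrative assumptions, not the paper's implementation:

```python
import random
import time

def anytime_optimize(evaluate, improve, plan, budget_s):
    """Keep improving `plan` until the time budget (e.g. the robot's
    travel time) expires.  `evaluate` returns a cost to minimize;
    `improve` proposes a perturbed candidate plan."""
    best, best_cost = plan, evaluate(plan)
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        candidate = improve(best)
        cost = evaluate(candidate)
        if cost < best_cost:           # accept only improvements
            best, best_cost = candidate, cost
    return best
```

Because only improvements are accepted, the returned plan is never worse than the one the robot started executing, regardless of how much time was available.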

The paper uses Dynamic Programming to integrate the solutions of different tasks into the solution of a mission, which happens continuously during the optimization process. Adding a task hence means retaining the old computations while optimally incorporating the solution of the new task. A more difficult problem, however, arises in re-wiring the solution when the robot has partially solved a task. The evolutionary solver iteratively optimizes the task solutions; however, once a task is partially executed, neither the robot assigned to the task nor the already-executed part of the task can be changed. It may initially appear that deleting the solved components is an easy option; however, partial solutions cannot be verified by the verification system. A further problem is that the path traversed by the robot is linear, while the evolutionary individual is a complex encoding that is compiled by Dynamic Programming into a linear sequence; hence, the evolutionary individual representation domain is not the same as the final solution domain. This requires back-tracing the solution to the evolutionary individual domain.

Since the mission specification is continuously altered, both when a new task is added and when a task is partially solved, the evolutionary algorithm is executed with diversity preservation. The addition or deletion of a task changes the mission specification, which changes the objective measure used in the evolution. A local optimum may then become the global optimum and vice versa. Diversity preservation techniques ensure that a diverse set of individuals is always available at different places in the fitness landscape to cope with such changes. A peculiarity here is that diversity preservation requires a distance function, while the evolutionary individual is of variable length, so an off-the-shelf distance function cannot be used. A Dynamic Programming-based edit distance is hence used for diversity preservation.
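As a sketch of the kind of Dynamic Programming-based distance involved, a standard Levenshtein edit distance over variable-length genomes (lists of atomic operations) could look as follows. This is the generic edit-distance recurrence, not necessarily the exact distance function the paper proposes:

```python
def edit_distance(a, b):
    """Levenshtein distance between two variable-length sequences,
    computed by dynamic programming over a (len(a)+1) x (len(b)+1) table."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                    # delete all of a's prefix
    for j in range(n + 1):
        d[0][j] = j                    # insert all of b's prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[m][n]
```

Such a distance is well defined for genomes of different lengths, which is precisely what an off-the-shelf Euclidean distance cannot handle.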

To the best of the author's knowledge, this is one of the first approaches to design a complete solution that incrementally solves the mission planning problem with multiple robots, wherein the solution and the individual representation form two different domains, and deleting the traced path is not possible due to verifiability. At the same time, the paper stresses that model verification tools for mission planning cannot practically be used for the massively large problems possible in the domain of service robotics, due to their exponential computational complexity and lack of optimality.

The main contributions of the paper are:

  1.

    An incremental approach to mission planning using evolutionary computation is proposed. The evolutionary technique continuously optimizes the plan while the robot acts on the best plan available. There is thus a variable amount of time, lasting while the robot executes its current operation, within which the next operation must be finalized as per the optimality criterion. This is a new approach to the problem, while the literature uses exponential-time solutions that cannot adjust to the variable time available. Tasks can be added at any time, while tasks are deleted as the robot moves. The optimizer adjusts to the changing tasks. The algorithm ensures probabilistic optimality.

  2.

    The solver continuously optimizes the current best plan as the robot acts. Thus, at any time, a part of the tasks has already been executed while the remaining part is scheduled for execution. The model verification engine can check the validity of the string representing a complete task's solution, but not a partial string representing only the portion of the task yet to be executed. The evolutionary approach is thus made freeze-aware, wherein the already-executed parts of the tasks are frozen and must be kept fixed by the evolutionary approach. The problem is unconventional since the solved string may be scattered across different parts of different genetic individuals.

  3.

    Diversity preservation is added. Since tasks are continuously added, deleted, or altered, the problem specification and thus the fitness landscape continuously change. Diversity preservation helps to escape local optima upon a change in the fitness landscape. The genetic individual has a variable-length encoding, so Euclidean and other standard distance functions cannot be used directly for diversity preservation. A Dynamic Programming-based distance function is proposed.

  4.

    The solution is extended to multiple robots where the solver simultaneously optimizes the robot assignment to different tasks and the trajectory for each participating robot.

2 Background

2.1 Genetic algorithms

The Genetic Algorithm is an evolutionary algorithm widely used to solve optimization problems. It is inspired by the natural evolutionary process, wherein a population of individuals mates to produce offspring for the new generation. The process follows Darwin's principle of survival of the fittest: fitter individuals are stronger and mate more often, while weaker individuals typically die without mating. The new generation is generally fitter than the parent population, and the evolutionary process shows the adaptation of individuals in response to a changing environment.

The Genetic Algorithm models an individual as a solution to the problem in an encoded format, such as a linear string, called the genotype. The population is a collection of such individuals. The goodness of an individual is measured by a problem-specific objective called the fitness function. The algorithm iteratively improves the fitness of the individuals; each iteration is called a generation. To create the population pool of the next generation, the algorithm first selects a proportion of fitter individuals through a process called selection. Selection is stochastic: fitter individuals have a higher probability of selection and may be selected multiple times, while weaker individuals have a lower probability and may not be selected at all. The selected individuals mate and exchange characteristics to produce children through an operation known as crossover. In a real-coded genetic algorithm, the individuals are encoded as strings of real numbers; the arithmetic crossover then produces children as a weighted sum of the parents. The children also undergo a mutation operation, signifying the errors incurred in their generation. For a real-coded genetic algorithm, a Gaussian mutation can be used, which adds Gaussian noise to the real-coded genes. The parents and newly produced children compete, and the fittest individuals form the new population pool.
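The steps above can be sketched as one generation of a real-coded GA that maximizes a fitness function. This is a minimal illustration assuming tournament selection and elitist replacement; the paper's own operator choices may differ:

```python
import random

def ga_generation(pop, fitness, sigma=0.1):
    """One generation of a real-coded GA: tournament selection,
    arithmetic crossover, Gaussian mutation, elitist replacement.
    `pop` is a list of real-valued lists; higher fitness is better."""
    def select():
        # binary tournament: the fitter of two random individuals wins
        a, b = random.sample(pop, 2)
        return a if fitness(a) > fitness(b) else b

    children = []
    while len(children) < len(pop):
        p1, p2 = select(), select()
        w = random.random()
        # arithmetic crossover: child is a weighted sum of the parents
        child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
        # Gaussian mutation: add noise to every gene
        child = [g + random.gauss(0, sigma) for g in child]
        children.append(child)

    # survivor selection: parents and children compete, fittest survive
    return sorted(pop + children, key=fitness, reverse=True)[:len(pop)]
```

Because parents and children compete for survival, the best fitness in the population can never decrease from one generation to the next.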

The initial population of the Genetic Algorithm may be generated by a greedy method, seeding potentially good candidates, or it may be random. The individuals are initially scattered across the entire fitness landscape and converge towards the global optimum over time under the genetic operators. The algorithm attempts to maintain a tradeoff between exploration, which expands the search horizon of the individuals and delays convergence, and exploitation, which aims to quickly converge the individuals to the current optimum.

2.2 Dynamic programming

It is common to recursively divide a problem into smaller parts and to integrate the solutions of the smaller problems into a solution of the larger problem. It is assumed that the solutions of all unit problems are available (or can be computed in constant time). Such divisions, however, can generate an exponential number of sub-problems that may take an extremely long time to solve. Dynamic Programming uses a technique called memoization to store the solutions of all sub-problems as they are solved. If a sub-problem is generated again, its memoized solution is used directly rather than being recomputed. In this manner, Dynamic Programming significantly reduces the work by ensuring every sub-problem is solved only once.
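A minimal illustration of memoization is the classic Fibonacci recursion: without a cache the recursion tree is exponential in n, while with the cache each sub-problem is solved once, making the cost linear:

```python
from functools import lru_cache

@lru_cache(maxsize=None)          # memoization: each n is solved once
def fib(n):
    """Naive recursion fib(n) = fib(n-1) + fib(n-2); the cache turns
    the exponential recursion tree into a linear number of calls."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

The same idea underlies the super-sequence recurrence below: the table of l(i,j) values plays the role of the cache.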

Consider the problem of finding the shortest common super-sequence of two strings s1 and s2. Let s1(i) denote the prefix up to the ith character, and similarly let s2(j) denote the prefix up to the jth character. Let l(i,j) be the length of the shortest common super-sequence of s1(i) and s2(j). If the characters s1(i) and s2(j) are the same, then we can take the solution of l(i − 1,j − 1), i.e. the shortest common super-sequence of s1(i − 1) and s2(j − 1), and append the character s1(i), making l(i,j) equal to l(i − 1,j − 1) + 1. If the characters s1(i) and s2(j) differ, then we can either take the solution of l(i − 1,j) and append the character s1(i), or take the solution of l(i,j − 1) and append the character s2(j), whichever is smaller; in either case one character is appended. The base case is when either string is empty (size 0), whereupon the solution is the other string. The solution is thus given by Eq. (1)

$$ l\left( {i,j} \right) = \left\{ {\begin{array}{*{20}l} i \hfill & {{\text{if}}\;j = 0 \; \left( {{\text{case}}\;1\;{\text{base}}} \right)} \hfill \\ j \hfill & {{\text{if}}\;i = 0 \; \left( {{\text{case}}\;2\;{\text{base}}} \right)} \hfill \\ {l\left( {i - 1,j - 1} \right) + 1} \hfill & {{\text{if}}\;s_{1} \left( i \right) = s_{2} \left( j \right) \wedge i > 0 \wedge j > 0 \; \left( {{\text{case}}\;3} \right)} \hfill \\ {l\left( {i - 1,j} \right) + 1} \hfill & {{\text{if}}\;s_{1} \left( i \right) \ne s_{2} \left( j \right) \wedge l\left( {i - 1,j} \right) \le l\left( {i,j - 1} \right) \wedge i > 0 \wedge j > 0 \; \left( {{\text{case}}\;1} \right)} \hfill \\ {l\left( {i,j - 1} \right) + 1} \hfill & {{\text{if}}\;s_{1} \left( i \right) \ne s_{2} \left( j \right) \wedge l\left( {i - 1,j} \right) > l\left( {i,j - 1} \right) \wedge i > 0 \wedge j > 0 \; \left( {{\text{case}}\;2} \right)} \hfill \\ \end{array} } \right. $$
(1)

Instead of solving the equation by plain recursion, Dynamic Programming uses memoization to store all solutions l(i,j), so that every unique (i,j) pair is solved only once. Equation (1) gives the length of the super-sequence. To trace the super-sequence itself, additional information is stored in a similar data structure called the parent (say π), which records which case was used to compute each entry, given by Eq. (2).

$$ \pi \left( {i,j} \right) = \left\{ {\begin{array}{*{20}l} 1 \hfill & {{\text{if}}\;j = 0 \; \left( {{\text{case}}\;1\;{\text{base}}} \right)} \hfill \\ 2 \hfill & {{\text{if}}\;i = 0 \; \left( {{\text{case}}\;2\;{\text{base}}} \right)} \hfill \\ 3 \hfill & {{\text{if}}\;s_{1} \left( i \right) = s_{2} \left( j \right) \wedge i > 0 \wedge j > 0 \; \left( {{\text{case}}\;3} \right)} \hfill \\ 1 \hfill & {{\text{if}}\;s_{1} \left( i \right) \ne s_{2} \left( j \right) \wedge l\left( {i - 1,j} \right) \le l\left( {i,j - 1} \right) \wedge i > 0 \wedge j > 0 \; \left( {{\text{case}}\;1} \right)} \hfill \\ 2 \hfill & {{\text{if}}\;s_{1} \left( i \right) \ne s_{2} \left( j \right) \wedge l\left( {i - 1,j} \right) > l\left( {i,j - 1} \right) \wedge i > 0 \wedge j > 0 \; \left( {{\text{case}}\;2} \right)} \hfill \\ \end{array} } \right. $$
(2)
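Equations (1) and (2) translate directly into a table-filling routine with a traceback. The sketch below fills the tables bottom-up, which is equivalent to recursion with memoization; each case appends exactly one character, so the lengths carry a "+1" outside the base cases:

```python
def scs(s1, s2):
    """Shortest common super-sequence of s1 and s2.
    l[i][j] is the SCS length of the prefixes s1[:i] and s2[:j];
    parent[i][j] records which case produced it, for the traceback."""
    m, n = len(s1), len(s2)
    l = [[0] * (n + 1) for _ in range(m + 1)]
    parent = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        l[i][0], parent[i][0] = i, 1          # case 1 base: copy s1
    for j in range(n + 1):
        l[0][j], parent[0][j] = j, 2          # case 2 base: copy s2
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:        # case 3: shared character
                l[i][j], parent[i][j] = l[i - 1][j - 1] + 1, 3
            elif l[i - 1][j] <= l[i][j - 1]:  # case 1: append s1's char
                l[i][j], parent[i][j] = l[i - 1][j] + 1, 1
            else:                             # case 2: append s2's char
                l[i][j], parent[i][j] = l[i][j - 1] + 1, 2
    # traceback through the parent table to recover the super-sequence
    out, i, j = [], m, n
    while i > 0 or j > 0:
        case = parent[i][j]
        if case == 3:
            out.append(s1[i - 1]); i, j = i - 1, j - 1
        elif case == 1:
            out.append(s1[i - 1]); i -= 1
        else:
            out.append(s2[j - 1]); j -= 1
    return l[m][n], "".join(reversed(out))
```

Each (i,j) pair is filled exactly once, so the running time is O(mn), in contrast to the exponential cost of the naive recursion.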

3 Related works

A large body of work uses model verification tools for robot motion planning with LTL. In a pioneering work, Kress-Gazit et al. [3] used triangulation to decompose the map into a transition system that was fed into a Linear Temporal Logic solver, generating a robot controller for the discrete and continuous domains. In the same vein, Lahijanian et al. [4] added the notion of probabilities to guarantee that the returned solution satisfies the minimum specified probabilities; the actual probabilities were learned using Reinforcement Learning. In related work, Bhatia et al. [5, 6] presented the same problem as an integration of a low-level controller and a high-level mission planner, wherein the low-level controller used a sampling-based approach incorporating robot dynamics. Heuristics can be used to obtain a faster solution, as shown in the work of McMahon and Plaku [7], where the heuristics aim at getting solutions early as well as at low cost. In the same vein, Svorenova et al. [8] incorporated preferences and rewards, intensifying the search towards the goal while maximizing rewards. These approaches represent the LTL formula as an automaton, a construction of exponential complexity that limits their use to smaller problem sizes. They further take a product of the transition system with the automaton so that only valid transitions are represented, which is again an operation of exponential complexity. The integration with low-level planners and the incorporation of uncertainties further complicate the search over such representations. The aim of this paper is thus to use a restricted language that can be verified in polynomial time. The search for an optimal solution still has exponential complexity, so an evolutionary algorithm is used to obtain an iterative method that continuously optimizes the solution.
It must be noted that the proposed approach does not solve the full LTL; only the AND, OR, and sequence operators can be used in the proposed language. The argument is that, since existing approaches to full LTL cannot handle exceptionally large problem sizes, research effort is warranted on custom-designed languages, even with restricted representational ability, that can be verified in polynomial time. Uncertainty cannot currently be handled in the proposed approach, since the uncertainty metrics cannot be computed in time polynomial in the number of variables.

Some recent approaches are optimality-conscious [9, 10]. They present the problem as an LTL formula that is converted into an automaton; to model only the feasible transitions, a product of the automaton is taken with the transition system. The conversion of the temporal logic into the automaton is exponential in time and space with respect to the number of variables used, and therefore the product with the transition system is also exponential. This, however, converts the problem into a graph search over the product automaton. Svorenova et al. [11] additionally considered probabilities, over which an exhaustive search finds the optimal solution. In another recent work, Ulusoy et al. [12] used a product automaton to plan optimally for multiple robots; the authors further synchronized the motion of the robots and corrected the plans when the robots fell out of synchronization. Fu et al. [13] added heuristics to the search for the optimal plan, focussing on the paths more likely to reach the terminal stage. These approaches take a generic LTL formula as input, with system models and optimization objectives suited to the specifics of each application. Since the input is a generic formula, its verification, and hence the search, is over a space of exponential complexity; the approaches also require the computation of the product automaton, which is likewise exponential. The proposed work, on the contrary, attacks problems where exponential complexity is infeasible because of the large number of proposition variables. It restricts the language to the AND, OR, and sequence operations only, making it polynomial-time verifiable and thus amenable to evolutionary computation. Limiting the expressive power in this way is important to avoid exponential complexities.

Lacerda et al. [14] considered co-safe LTL, in which the language has only those operators that generate a good prefix, i.e. a finite prefix that, appended to any infinite action sequence, guarantees the satisfiability of the formula. Practically, the language reduces the search from infinite sequences to finding the finite good prefix. The authors specifically allowed tasks to be added to the mission, with tasks deleted on completion; the system planned dynamically and adjusted to the changes. Schillinger et al. [15] assumed that the mission consists of several tasks and jointly performed robot allocation and task planning, like the proposed approach. They used co-safe LTL, computing the finite prefix to generate optimal trajectories as per the cost function, and handled constraints like resources in the LTL formulation. Faruq et al. [16] further incorporated uncertainties in their modelling, computing a policy that maximizes the probability of completing the mission; to keep the approach computationally efficient, the problem was solved through local Markov Decision Processes for the individual robots with a switching mechanism for task allocation. These approaches still suffer from the generic problems associated with LTL, namely a high computational time and exponential complexity, worsened by the uncertainties, and are not scalable to many mission sites and robots. The paper thus approaches the problem using optimization while restricting the language to be polynomial-time verifiable. Even though uncertainty makes a solution more realistic, it currently cannot be computed in time linear in the number of variables and is thus omitted. For the same reasons, only the AND, OR, and sequence operators are used.

Torres and Baier [17] enabled the translation of an LTL expression into an alternate representation in polynomial time, using the primitives of alternating automata and additional fluents, which also facilitated the addition of new constraints over time. However, the translation does not guarantee the representation of all possible transitions, which can severely affect optimality, even though obtaining a feasible solution may still be possible. Camacho et al. [18] also aimed to represent a general LTL formula as a PDDL, a conversion that can theoretically be exponential and that still requires a search for an optimal solution without good heuristics, which can itself be exponential. Menghi and Garcia [19] considered sub-missions solved by the different robots in a decentralized manner in a partially known environment. The robots could plan to meet, which was also associated with uncertainty; the authors performed optimistic and pessimistic planning for every robot while handling the meeting constraints. Decentralization and decomposition reduce the computation time; however, a classic search over the problem itself can be exponential, especially when aiming to compute plans with optimal cost.

Sampling is a natural response to systems with massive complexity and has also been applied to mission planning. A good approach [20] is to sample trees in the transition system, which are used to synthesize a feasible and optimal path that solves the mission. A random configuration is used to grow the tree, similar to the Rapidly-exploring Random Tree [21, 22] formulation, and only feasible sequences are expanded. In similar work, Kantaros and Zavlanos [23, 24] proposed strategies to generate the samples and grow the trees while searching for a solution that satisfies the temporal logic specification. Sampling can significantly prune the search and yield a much faster algorithm. However, verification still requires the conversion of the temporal logic into an automaton, which can be exponential in the number of variables used to represent the system. Here, the focus is on problems where the conversion of the mission formula into an automaton is itself infeasible in the first place. The proposed work restricts the language to AND, OR, and sequence only, which is polynomial-time verifiable; the restriction is needed because a problem of exponential complexity otherwise cannot be solved optimally for very large problem sizes.

The mission planning problem has rarely been solved by evolutionary computation. A notable work is that of Xidias and Azariadis [25], wherein the mission was to visit several sites with multiple robots using a centralized Genetic Algorithm. Lu and Yue [26] solved the multi-robot Travelling Salesman Problem using a decentralized technique for allocating sites, optimized by Ant Colony Optimization. For the same problem, the notion of preference was added by Kala [27], solving with decentralized evolutionary computation. Senol [28] accounted for human-robot interactions while optimizing a mission using a mixed-integer programming technique, without Boolean and temporal constraints. The problem with all these approaches is that temporal operators are rarely modelled and accounted for, which restricts the generality of the solvers to a few limited scenarios. Using a single robot, prior work by the author [29] incorporated the specification of Boolean and temporal constraints, inculcating a sequencing temporal constraint into a generic Travelling Salesman Problem. That paper used no heuristic like the Dynamic Programming included in the proposed work, had no incremental nature, and could not incorporate multiple tasks; it forms very early work by the author in this domain.

It is also worth noting the classic planning techniques, which form another school of thought for the same problem. Classic planning is relevant because many mission problems can also be specified as classic planning problems. Traditionally, costs were not incorporated and optimality was not considered in these approaches; recently, especially with the increased use of PDDL3 [30, 31], costs are being actively taken into consideration. The problem with these approaches is again the absence of temporal operators: handling temporal constraints in classic planning is quite different from doing so in temporal logic. A naive user can write an LTL formula denoting the mission; however, converting that formula into a format that can be fed directly into a classic planning library is still beyond a naive user. This makes classic planning unattractive for robot mission planning, where the mission must come from an end-user. The proposed approach considers the cost of travelling from one place to the place where an action must be performed, and additionally the cost of the action itself. The domain also faces the problem that the robot's actions are continuous and geometric but are abstracted to Boolean states in classic planning. The integration of task and geometric planning [32,33,34] is another good field of study that has recently gained momentum; the motion obtained by any geometric planning algorithm can be used by a low-level reactive planner or controller for the motion of a team of robots. In the proposed approach, the cost between every pair of consecutive actions is eventually queried by the planner for feasibility and cost; since the number of actions is small and finite, a cost matrix is computed at the start instead of being computed on the fly during planning.
Artificial Potential Field [35, 36] is a popular choice for moving robotic teams. A similar approach is also used in the proposed work.

Lagoudakis et al. [37] considered a mission where a team of robots had to visit all the mentioned sites. The authors implemented a multi-round auction in which the robots cast bids using Prim's heuristic; the method is dynamic, and changes are rectified in future rounds of the auction. That specific problem has a good heuristic available, in the form of Prim's algorithm, that gives an optimality guarantee, while the proposed approach has no such heuristic and uses optimization instead. For the same reason, instead of a heuristic assignment of robots to tasks, the robot allocation is also performed as a part of the same optimization problem.

4 Problem definition

The problem is that several users give instructions, called tasks, to a team of robots. The tasks are written in a formal language and specify the operations to be performed by the robots, with orderings and choices that affect optimality. The robots must distribute the tasks among themselves optimally and plan each task optimally. The solver requires a verification system that checks whether a string satisfies the constraints of the language; the problem is then to generate the optimal sequence, as per the objective criterion, that adheres to all task constraints. The challenge is that tasks are continuously added by different users, while an accomplished task is deleted from the mission specification, so the solutions must be continuously adapted to the latest specification. The robots continuously work on and accomplish different tasks, while the operating time of the robotic team is used to further optimize the mission plans. At any stage, the robotic team will have partially completed tasks, and the optimizer must not alter the part of the plan that has already been executed. The different aspects of the problem are given in the following sub-sections.

4.1 Overall problem

The problem is to solve a mission (χ), which is a collection of tasks (\(\psi\)), that is χ = {ψ1, ψ2, ψ3, … ψn}, where n is the number of tasks. It is assumed that the mission is dynamic and changes with time, including the addition of a task ψ by a user (χ ← χ ∪ {ψ}) or the completion of a task (χ ← χ − {ψ}). The task (ψ) consists of atomic operations with some Boolean and temporal constraints. An atomic operation is the unit operation that the robot can be asked to perform, like visiting a place, picking and placing an item, or operating a machine. The unit operations are non-divisible, meaning the robot cannot execute any other action till the current atomic operation is complete. A string s consists of an ordered list of atomic operations [σ1, σ2, σ3, … σlen(s)], where len(s) is the size of the string. The set of all atomic operations Σ = {σ1, σ2, σ3, … σm} is a part of the problem specification, where m is the number of possible atomic operations. It is assumed that a verification engine exists such that whether a string s satisfies the requirements of the task ψ, or s ⊨ ψ, can be ascertained in polynomial time. The approach is generic to the use of any Boolean and temporal operator, provided that the resultant language is polynomially verifiable. However, for this paper, we restrict the operators to sequencing (then or T), the AND operator (∧), and the OR operator (∨). The task ψ can be recursively defined over these operators using the Backus–Naur Form given by Eq. (3)

$$ < \psi > : = \left( { < \psi > } \right)| < \psi > T < \psi > | < \psi > \wedge < \psi > | < \psi > \vee < \psi > |\sigma_{1} |\sigma_{2} | \, \sigma_{3} | \ldots |\sigma_{m} $$
(3)

The semantics of the language that checks if the string s satisfies the task ψ (s ⊨ ψ) is given by Eqs. (4)–(7)

$$ s \vDash \psi_{1} T \psi_{2} \;{\text{iff}}\;\exists_{i} s\left( {1:i} \right) \vDash \psi_{1} \wedge s\left( {i + 1:{\text{end}}} \right) \vDash \psi_{2} $$
(4)
$$ s \vDash \psi_{1} \wedge \psi_{2} \;{\text{iff}}\;s \vDash \psi_{1} \wedge s \vDash \psi_{2} $$
(5)
$$ s \vDash \psi_{1} \vee \psi_{2} \;{\text{iff}}\;s \vDash \psi_{1} \vee s \vDash \psi_{2} $$
(6)
$$ s \vDash \sigma_{i} \;{\text{iff}}\;\sigma_{i} \in s $$
(7)

Equations (4)–(7) break down a task ψ recursively into smaller sub-tasks ψ1 and ψ2. Equation (4) states that a string satisfies ψ1 then ψ2 if a prefix till the ith character s(1:i) satisfies ψ1 and the remaining string after the ith character s(i + 1:end) satisfies the task ψ2, which coincides with the literal definition of then. Equation (5) states that a string s satisfies the task ψ1 and ψ2 if the string simultaneously satisfies both ψ1 and ψ2. Equation (6) states that a string s satisfies the task ψ1 or ψ2 if it satisfies ψ1, ψ2, or both. Equation (7) states that a string satisfies the unit operation σi if the string at some position asks the robot to perform the operation σi.

A verification system checks the satisfiability of any general string against a formula specified using these operators. The verification system checks the feasibility of a sequence, while the optimizer attempts to find the action sequences for the tasks, whose Dynamic Programming based fusion gives the robot trajectories used for cost computation.
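As an illustration, the semantics of Eqs. (4)–(7) can be checked by a short recursive procedure. The sketch below is a minimal Python rendering, not the paper's implementation; tasks are encoded as nested tuples, and memoization over substring bounds keeps the check polynomial for a fixed formula.

```python
from functools import lru_cache

def satisfies(s, psi):
    """Check s ⊨ ψ per Eqs. (4)-(7). ψ is a nested tuple:
    ('T', a, b), ('AND', a, b), ('OR', a, b), or ('op', sigma)."""
    s = tuple(s)

    @lru_cache(maxsize=None)
    def sat(lo, hi, node):
        if node[0] == 'op':
            return node[1] in s[lo:hi]                 # Eq. (7): sigma occurs somewhere
        kind, a, b = node
        if kind == 'T':                                # Eq. (4): try every split point
            return any(sat(lo, i, a) and sat(i, hi, b)
                       for i in range(lo, hi + 1))
        if kind == 'AND':                              # Eq. (5): both on the whole string
            return sat(lo, hi, a) and sat(lo, hi, b)
        if kind == 'OR':                               # Eq. (6): either on the whole string
            return sat(lo, hi, a) or sat(lo, hi, b)
        raise ValueError(kind)

    return sat(0, len(s), psi)
```

For example, `['a', 'b', 'c']` satisfies `a T c` (the prefix `['a']` satisfies `a`, the suffix `['b', 'c']` satisfies `c`), while `['c', 'b', 'a']` does not.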

It is assumed that there are several robots R = {r1, r2, r3, …, rz} available that collectively solve a mission, where z is the number of robots. However, a task may require a robot to physically carry objects, which cannot be transferred across robots on the fly. Therefore, the entire task ψ must be solved by a single robot (any robot, but only one). Tasks that do not have such constraints can be supplied as multiple tasks. Let ξ: χ → R be the function that maps a task (ψ) to a robot r = ξ(ψ).

Many practical scenarios were considered before imposing the restriction that multiple robots cannot partly solve the same task. If a task can be divided among multiple robots, the system would accept it as multiple tasks rather than a single task. Consider that a person wants 5 items (say A, B, C, D, and E) from 5 different places. It is possible to say that this is 1 task that can easily be divided among the robots for the sake of efficiency. However, the system will interpret it as 5 tasks (get A and deliver to the user; get B and deliver to the user; get C and deliver to the user; get D and deliver to the user; and get E and deliver to the user). The optimizer will consider several possible task assignments among robots, including assigning some of the items to one robot and the others to another robot.

Let Τ′ = {τ′1, τ′2, τ′3, … τ′z} be the orders of performance of atomic operations by all robots, where the subscript denotes the robot number (say r). \(\tau_{r}^{\prime} = \left[ {\sigma_{1}^{r} , \sigma_{2}^{r} , \ldots \sigma_{{{\text{len}}\left( {\tau_{r}^{\prime} } \right)}}^{r} } \right]\) specifies the order of atomic operations to be performed by robot r, where len(τ′r) is the number of operations. The ′ is intentionally added to the representation to annotate that the trajectory is at the symbolic level and not the real continuous trajectory traced by the robot. Let c: Σ2 → \({\mathbb{R}}^{+}\) be the cost function such that c(loc(σi),σj) is the cost of physically going from the mission site of σi (called loc(σi)) to the mission site of σj and performing the operation σj. Let Sr be the current location of robot r; c(Sr,σj) denotes the cost of going from the current location of robot r to the mission site of σj and performing the operation σj. Here, the cost is modelled as time. The total cost C(τ′r) incurred by robot r is given by Eq. (8)

$$ C\left( {\tau_{r}^{\prime } } \right) = c\left( {S_{r} ,\sigma_{1}^{r} } \right) + \mathop \sum \limits_{i = 1}^{{{\text{len}}\left( {\tau_{r}^{\prime } } \right) - 1}} c\left( {{\text{loc}}\left( {\sigma_{i}^{r} } \right),\sigma_{i + 1}^{r} } \right) $$
(8)

Here len() denotes the length of the solution. Since the cost is taken as the time, the aim is to solve all tasks in as little time as possible. The optimization objective is the makespan, or the total duration of time till the last working robot completes the last operation completely. The total cost of all robots combined is hence given by Eq. (9).

$$ {\text{cost}}\left( {{\rm T}^{\prime } } \right) = \max_{r \in R} \left( {C\left( {\tau_{r}^{\prime } } \right)} \right) $$
(9)
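Equations (8) and (9) can be computed directly once a cost matrix is available. The sketch below is illustrative (the names are not from the paper); it assumes cost(a, b) already includes the operation cost at b, as in the definition of c above.

```python
def robot_cost(start, ops, cost):
    """C(tau'_r) of Eq. (8): travel-and-operate cost along one robot's sequence."""
    total, here = 0.0, start
    for op in ops:                 # first leg starts from the robot's source S_r
        total += cost(here, op)
        here = op                  # the robot ends at the site of op, i.e. loc(op)
    return total

def makespan(starts, plans, cost):
    """cost(T') of Eq. (9): the slowest robot defines the mission cost."""
    return max(robot_cost(s, ops, cost) for s, ops in zip(starts, plans))
```

For instance, with sites on a line and cost(a, b) = |a − b|, a robot at 0 executing [2, 5] incurs cost 5, and a robot at 10 executing [11] incurs cost 1, so the makespan is 5.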

Let τloc(σi),loc(σj): [0,1] → ςfree be the trajectory from the mission site of σi (loc(σi) or source) to the mission site of σj (loc(σj)). Here, ςfree is the free configuration space of the robot, or the set of configurations such that the robot collides neither with the obstacles nor with itself. In our case, the robot’s configuration is defined by its position and orientation, or (x,y,θ). The configuration space is the set of all possible configurations that the robot can take, or all (x,y,θ) possible, commonly referred to as SE(2) or the Special Euclidean Group of order 2. However, a subset of these configurations will involve a collision between the robot and some obstacle, detected by collision checking libraries, called the obstacle-prone configuration space or ςobs. The remaining configurations constitute the free configuration space ςfree, where the robot does not collide with any obstacle. ςfree is the set complement of ςobs over the super-set ς, or ςfree = ς\ςobs. Computing the trajectory is a classic robot motion planning problem [38, 39], wherein the problem is to compute a trajectory from the origin τloc(σi),loc(σj)(0) = loc(σi) to the destination τloc(σi),loc(σj)(1) = loc(σj), such that all intermediate points are collision-free, τloc(σi),loc(σj)(e) ∈ ςfree, 0 ≤ e ≤ 1. The cost function is taken as the time required to trace the trajectory and operate, or c(loc(σi),σj) = ||τloc(σi),loc(σj)||/v + op(σj), where ||τloc(σi),loc(σj)|| is the total distance of the trajectory, v is the average speed of the robot, and op(σj) is the cost of performing operation σj. It may be noted that this trajectory and these costs are computed assuming a static map only, and the actual time may be larger if there are humans and other dynamic obstacles. The trajectories and costs for all mission sites can be computed in advance with the static map of the place.

The problem is hence to find the optimal sequence for every robot and the optimal robot assignment to every task, or Eq. (10)

$$ T^{*} , \xi^{*} = \arg \min_{{{\rm T}^{\prime } ,\xi }} {\text{cost}}\left( {{\rm T}^{\prime } } \right) $$
(10)

such that all tasks in the mission are satisfied, or Eq. (11)

$$ \tau_{r}^{\prime } \vDash \psi ,\quad \forall r \in R, \;\tau_{r}^{\prime } \in {\rm T}^{\prime } ,\;\psi \in \chi :\xi \left( \psi \right) = r $$
(11)

4.2 Incremental nature

Since tasks are continuously added by the users and deleted, the evolutionary optimization must run in parallel. The addition of a new task by a user (χ ← χ ∪ {ψ}) and the deletion of an old task completed by the robot (χ ← χ − {ψ}) are not difficult and do not change the problem formulation. However, suppose the robot trajectory τr solves the task ψ, where r = ξ(ψ), and the robot starts executing the trajectory. Suppose at time t, the part τr(1:φr(t)) of the trajectory has already been executed. Here, φr: \({\mathbb{R}}\) → [0,len(τr)] is the function that maps the time to the index of the planned trajectory at the symbol level. So, at time t, the robot will be executing the operation τr(φr(t)). The mapping can be understood in the ideal case wherein the robot operates at the assumed average speed v. Hence, at time t, the robot must be executing the operation given by Eq. (12).

$$ \varphi_{r} \left( t \right) = \left\{ {\begin{array}{*{20}l} 1 \hfill & {t \le c\left( {S_{r} ,\sigma_{1}^{r} } \right)} \hfill \\ {\min \left\{ {i + 1: c\left( {S_{r} ,\sigma_{1}^{r} } \right) + \mathop \sum \limits_{j = 1}^{i} c\left( {{\text{loc}}\left( {\sigma_{j}^{r} } \right),\sigma_{j + 1}^{r} } \right) > t} \right\}} \hfill & {t > c\left( {S_{r} ,\sigma_{1}^{r} } \right)} \hfill \\ \end{array} } \right. $$
(12)

The iterator j iterates over the complete trajectory of robot r, present as a sequence of operations \(\left[ {\sigma_{1}^{r} ,\sigma_{2}^{r} , \ldots \sigma_{{{\text{len}}\left( {\tau_{r}^{^{\prime}} } \right)}}^{r} } \right]\) with the source of the robot as Sr. The cost of the trajectory till the (i + 1)th item of the sequence is thus \(c\left( {S_{r} ,\sigma_{1}^{r} } \right) + \mathop \sum \nolimits_{j = 1}^{i} c\left( {{\text{loc}}\left( {\sigma_{j}^{r} } \right),\sigma_{j + 1}^{r} } \right)\). The smallest i for which this cost exceeds the current time t indicates that the (i + 1)th operation is being executed, which is the value of \(\varphi_{r} \left( t \right)\).
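Under the ideal-speed assumption, φr(t) of Eq. (12) reduces to a scan over cumulative leg costs. A minimal sketch with illustrative names:

```python
from itertools import accumulate

def executing_index(t, leg_costs):
    """phi_r(t) of Eq. (12). leg_costs[0] = c(S_r, sigma_1^r);
    leg_costs[k] = c(loc(sigma_k^r), sigma_{k+1}^r) for k >= 1."""
    for idx, finish_time in enumerate(accumulate(leg_costs), start=1):
        if t <= finish_time:       # operation idx is still in progress at time t
            return idx
    return len(leg_costs)          # past the planned horizon: the last operation
```

For leg costs [3, 2, 4], the robot is executing operation 1 at t = 1, operation 2 at t = 3.5, and operation 3 at t = 6. As noted below, the deployed system relies on completion signals rather than this time estimate.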

It is worth noting that the index updates when the robot starts executing an operation, not when it completes it. The trajectory τr(1:φr(t)) has already been executed and therefore cannot be altered. This part of the trajectory is said to be frozen for the optimization process. Furthermore, once the task execution has started (τr(t0:t) ≠ ∅), the robot assignment r = ξ(ψ) cannot change, which is an added constraint for the solver. It must be stressed that Eq. (12) assumes that the actual time taken by the robot to complete every operation is known. However, the cost function only gives an estimate, and the actual robot may take a longer or shorter time. As an example, the estimated time to take a signature from a person may be 1 min, while it may take longer if the person is talking on the phone. Hence, Eq. (12) is not used to estimate the index of the operation currently being executed. Instead, the index is updated by making the robot signal the successful completion of each operation.

The mechanism of freezing makes the operations non-preemptive. That is, once a robot starts executing an operation, even if a better sequence is later found in which it would be preferable to leave the current operation unfinished and start a new one, the robot does not abandon the operation. This is because the user may have been notified of the start of the operation, and cancellation may harm user acceptance. Further, for many robots, such termination of operations may not be feasible. A preemptive version can also be implemented with a small change to the code.

4.3 Solving by decomposition

If optimization is carried out directly on the problem, it becomes a centralized evolutionary computation, which is known not to work unless the number of robots, operations, and tasks constituting the mission is very small. The absence of good heuristics, like clustering for a Travelling Salesman Problem, further makes it hard to decompose the mission into simpler sub-missions to be solved by a decentralized technique. Using centralized evolutionary computation further means that the addition of a task does not allow the past genetic individuals to be easily modified to accommodate it, nor does the deletion of a task mean that the relevant variables can be deleted from the corresponding individuals, since they may be widely distributed across the entire individual and cannot be traced.

To best capture the heuristic that a mission consists of multiple independent tasks, and to aid the incremental addition and deletion of tasks, the problem is decomposed into two components that work hand in hand: one solves all tasks independently, and the other integrates the task solutions into a solution of the mission.

In the reformulated problem, let \(\tau_{\psi }^{{{\text{task}}}}\) be the sequence of operations to be performed by the robot r = ξ(ψ) for the task ψ, such that \(\tau_{\psi }^{{{\text{task}}}} \vDash \psi\). The path to be taken to solve the task is hence given by \(\tau_{\psi }^{{{\text{task}}}} = \left[ {\sigma_{1}^{\psi } , \sigma_{2}^{\psi } , \ldots \sigma_{{{\text{len}}\left( {\tau_{\psi }^{{{\text{task}}}} } \right)}}^{\psi } } \right]\), where \({\text{len}}\left( {\tau_{\psi }^{{{\text{task}}}} } \right)\) is the length of the task solution. Let the sequence of operations to be performed by the robot be given by \(\tau_{r}^{\prime } = \left[ {\sigma_{1}^{r} , \sigma_{2}^{r} , \ldots \sigma_{{{\text{len}}\left( {\tau_{r}^{\prime } } \right)}}^{r} } \right]\). The only constraint on τ′r is that it must satisfy all tasks assigned to the robot, that is Eq. (13)

$$ \tau_{r}^{\prime } \vDash \psi \forall \psi \in \chi : \xi \left( \psi \right) = r $$
(13)

Many realistic applications including the operation of robots in home and office environments have the property that tasks are given to the robot, instead of asking the robot not to do things, meaning that the language is negation-free. The primitive negations like avoiding obstacles, not moving to a hazardous region, etc. can be handled by the path planner. For such languages, if a string satisfies a task, any super-sequence of that string also satisfies the task. More formally, consider a general string s which is a subsequence of a string P. If s satisfies the task ψ, then P also satisfies the task ψ. Let s|P denote s as a subsequence of P, or Eq. (14) holds

$$ s \vDash \psi \Rightarrow P \vDash \psi ,\quad {\text{if}}\;s|P $$
(14)
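The subsequence relation s|P used in Eq. (14) can be tested greedily in linear time. A short sketch:

```python
def is_subsequence(s, P):
    """True iff the operations of s appear in P in order,
    though not necessarily contiguously (the relation s|P)."""
    it = iter(P)
    # each operation of s must be found in P after the previous match;
    # `op in it` advances the iterator past the matched element
    return all(op in it for op in s)
```

For example, ['a', 'c'] is a subsequence of ['a', 'b', 'c'], while ['c', 'a'] is not, since the order must be preserved.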

This is formally given in the "Appendix" section as Theorems 1–5. If a robot r has a set of tasks assigned to it, the robot trajectory (say, \(\tau_{r}^{\prime }\)) that is a super-sequence of all task trajectories \(\left( {\tau_{\psi }^{{{\text{task}}}} } \right)\) satisfies all tasks given to the robot as per theorem 5. In other words, Eq. (15)

$$ \tau_{\psi }^{{{\text{task}}}} \vDash \psi \Rightarrow \tau_{r}^{\prime } \vDash \psi , \quad {\text{if}}\;\tau_{\psi }^{{{\text{task}}}} |\tau_{r}^{\prime } \quad \forall \psi \in \chi :\xi \left( \psi \right) = r $$
(15)

The equation reads: if the string \(\tau_{\psi }^{{{\text{task}}}}\) satisfies the task ψ, then a super-sequence of it, say \(\tau_{r}^{\prime}\) (denoted \(\tau_{\psi }^{{{\text{task}}}} |\tau_{r}^{\prime}\)), also satisfies ψ, for all tasks ψ assigned to the robot r. It is given that all task trajectories satisfy their tasks, or \(\tau_{\psi }^{{{\text{task}}}} \vDash \psi\), and only the subset of tasks assigned to robot r (those with \(\xi \left( \psi \right) = r\)) is considered; Theorem 5 then gives \(\tau_{r}^{\prime } \vDash \psi\). Further, Theorem 6 in the "Appendix" section shows that adding terms to \(\tau_{r}^{\prime}\) beyond those required by the definition of a super-sequence is unnecessary.

The implied constraint on the construction of the robot trajectory is hence to construct a super-sequence of all task trajectories, with no extra elements added. This trajectory and these constraints are substituted into Eqs. (10) and (11) to get the optimal plan. In other words, instead of directly computing the robot sequences τ′r, the optimizer calculates the task sequences \(\tau_{\psi }^{{{\text{task}}}}\) in one place and simultaneously fuses them to produce the robot sequences τ′r in another. This makes the addition of a task (χ ← χ ∪ {ψ}) simple, since the optimizer adds a separate sequence for \(\tau_{\psi }^{{{\text{task}}}}\). Similarly, deleting a task (χ ← χ − {ψ}) is simple: the corresponding \(\tau_{\psi }^{{{\text{task}}}}\) is simply deleted.

However, a larger problem arises when the robot has partly solved a task by executing the trajectory τr(1:φr(t)). Now freezing τr(1:φr(t)) directly is not an option, since the optimization happens on \(\tau_{\psi }^{{{\text{task}}}}\). This requires a mechanism to link every σ ∈ τr to the corresponding σ ∈ \(\tau_{\psi }^{{{\text{task}}}}\).

The negation-free constraint (or the constraint that the robot can only be given operations to perform that can be specified without using negation) has an important implication: it makes the tasks non-conflicting. Therefore, achieving one task does not make another task unsatisfiable. If a future task is added and, correspondingly, some operations are added to achieve it, the negation-free constraint guarantees that the operations do not make any of the previous tasks unsatisfiable. The solver incrementally adapts to the tasks as they come. If the tasks were not conflict-free, a future task could make a previous task infeasible, requiring the solution to be re-computed. In such a case, the incremental approach would not work effectively, and the solution could be as poor as complete re-planning.

5 Solution design

5.1 Overall approach

The overall methodology is given in Fig. 1. The core of the solution is a mission-solving system that runs in the background and constantly adapts the current mission solution to the altered mission specifications, the addition of tasks by the users, and the completion of (part-)tasks by the robots. Since evolutionary computation is used, the solution keeps improving. This does not cause a computational overhead, since the optimization for the next operation happens while the robot is physically executing the current operation. Consider that the robot fetches its next atomic operation from the plan. Even before the robot starts executing it, the operation is marked as frozen, meaning the optimizer cannot change it. The robot takes some time to physically reach the site where the operation needs to be done and to perform it, sometimes alongside a human; the time required is unknown. As the robot operates, the optimizer keeps optimizing the plan, keeping the frozen parts fixed. When the robot completes the current operation, it fetches the next operation as per the current best plan, freezes it, and executes it while the rest of the plan is still being optimized. Since the mission specification changes often, diversity preservation mechanisms are added to the evolutionary algorithm. Also, the mission may have heuristics specific to the application domain, in which case the initial population generation needs to be strategized to exploit them.

Fig. 1

Overall working methodology. The non-shaded region is the simplified program flow

The tasks are added to the system by interrupts generated in the software. Upon receiving such an interrupt, the mission specification variables and all individuals in the population are altered to account for the newly added task. Similarly, an interrupt is generated when a robot completes an operation in a task. In such a case, the variables of the mission specification and the population pool are traced from the completed operation and frozen against alteration. A special case is when an entire task completes, wherein the mission specification variables and the population pool are altered to delete the relevant variables.

The individual representation encodes all tasks in the mission. The solution is needed as the trajectory of each robot, and the fitness is itself calculated in terms of the robot trajectories. These are two different domains; therefore, the task-encoded solutions need to be compiled into the robot trajectories. This compilation is done optimally using Dynamic Programming. Dynamic Programming only ensures an optimal fusion given some task trajectories; it does not ascertain optimal task trajectories in the first place, which is done by the evolutionary computation. Hence, Dynamic Programming is called within the fitness evaluation of the Evolutionary Computation. A similar Dynamic Programming is also used as the distance function for diversity preservation, which is otherwise difficult since different individuals have different lengths representing the task solutions.

The output of the Evolutionary Algorithm is the plan or the sequence of steps to be taken by every robot. The algorithm sequentially gives the next operation of the plan to the robot for execution, while the rest of the plan is further optimized as the robot works. The execution of the operation (like going to a place and picking a book, taking a signature, etc.) is a capability that is assumed to be present with the robots and can be invoked as a library function, referred to as the robot controller.

The algorithm requires a cost function between every pair of possible operation sites to compute the total cost of the plan being optimized. The cost between every possible pair of operation sites is stored as a cost matrix. For this, first, the map of the arena is built using SLAM techniques. Thereafter, on the static map, the Probabilistic Roadmap [40] technique is used to compute the cost matrix between every possible pair of mission sites. The operational cost of physically performing an operation is currently kept as 0.

5.2 Individual representation

The genetic individual encodes the solutions to the different tasks \(\left( {\tau_{\psi }^{{{\text{task}}}} } \right)\), where the solution of each task consists of a string. The complete individual is given in Fig. 2. Some of the operations may have already been performed by the robot, and therefore the associated variables need to be frozen, which is also specified in the individual representation. The frozen components are non-optimizable. The individual also encodes the robot assignment for the tasks (ξ), which is also frozen for any task for which even a single step has been performed. The individual in such a genotypic form needs to be compiled into the trajectory of each robot in the phenotypic part, which is done by using Dynamic Programming. However, a direct compilation would again give the algorithm exponential complexity, and hence the fusion is done by taking two tasks at a time in an ordered sequence. Let O: [1,n] → χ be the function that maps every index (order) to a task, O(i) = ψ, which means that ψ will be the ith task to be presented to Dynamic Programming. This will be explained when dealing with Dynamic Programming. The ordering is also optimized in the evolutionary process and represented in the individual.

Fig. 2

Individual representation and conversion from genotype to phenotype. Shaded regions are frozen for optimization

The genotype consists of a prospective solution as a sequence of operations for every task with the completed operations shown in grey, along with the order of fusion and robot assignment. Each robot needs a linear sequence of operations (phenotype), based on which the goodness of an individual is evaluated. The conversion of genotype to phenotype happens by noting the tasks assigned to every robot and sorting the tasks as per the order values. The solution trajectory for the task (from the genotype) is then fused by a Dynamic Programming technique in the sorted order of tasks. After fusion for all tasks, the fused trajectory becomes the phenotype that can be used for fitness evaluation.
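The genotype-to-phenotype conversion described above can be sketched as follows. The fuse argument stands in for the Dynamic Programming fusion of the next sub-section; in the test, plain concatenation is used purely as a placeholder, and all names are illustrative rather than the paper's.

```python
def genotype_to_phenotype(task_solutions, assignment, order, fuse):
    """task_solutions: {task: [op, ...]}  (tau_psi^task from the genotype)
    assignment: {task: robot}             (the function xi)
    order: {task: rank}                   (the fusion order O)
    fuse(a, b): pairwise fusion of two operation sequences."""
    phenotype = {robot: [] for robot in set(assignment.values())}
    for robot in phenotype:
        # tasks assigned to this robot, sorted by their encoded order values
        assigned = sorted((t for t, r in assignment.items() if r == robot),
                          key=order.__getitem__)
        seq = []
        for t in assigned:
            seq = fuse(seq, task_solutions[t])   # left-to-right pairwise fusion
        phenotype[robot] = seq
    return phenotype
```

With concatenation as the stand-in fusion, tasks assigned to the same robot simply appear one after another in their sorted order.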

5.3 Dynamic programming based fitness evaluation

Let us first discuss the fusion of the solutions of two tasks \(\tau_{{\psi_{1} }}^{{{\text{task}}}}\) and \(\tau_{{\psi_{2} }}^{{{\text{task}}}}\), to produce a superstring that satisfies both tasks. Let the Dynamic Programming fusion be denoted by \(\tau_{{\psi_{1} }}^{{{\text{task}}}} \oplus_{D} \tau_{{\psi_{2} }}^{{{\text{task}}}}\). Let d(i,j) denote the cost incurred in the fusion of \(\tau_{{\psi_{1} }}^{{{\text{task}}}}\) till ith position and \(\tau_{{\psi_{2} }}^{{{\text{task}}}}\) till jth position, that is the optimal fusion of \(\tau_{{\psi_{1} }}^{{{\text{task}}}} \left( {1:i} \right)\) and \(\tau_{{\psi_{2} }}^{{{\text{task}}}} \left( {1:j} \right)\). The calculation of d(i,j) can be done by using Dynamic Programming; however, the application also needs the fused string. The string fused by Dynamic Programming is calculated by storing the sub-problem that leads to a solution of d(i,j) in a new data structure called the parent or π(i,j). More specifically, π(i,j) is 1 if the last operation in the optimal calculation of d(i,j) came from \(\tau_{{\psi_{1} }}^{{{\text{task}}}}\) and 2 if the last operation instead came from \(\tau_{{\psi_{2} }}^{{{\text{task}}}}\). For distance computation reasons, the specific operation that contributed at the end also needs to be stored specifically and hence let δ(i,j) be the specific site where the robot is located at the end in the optimal calculation of d(i,j). Using the principles of Dynamic Programming, the optimal fusion is given by Algorithm 1.

Lines 1–7 initialize the Dynamic Programming memoization table for the unit cases of the cost (d), parents (π), and last operation (δ). In this algorithm, the Dynamic Programming is 2-dimensional, since it is parametrized by two variables (say i and j). The unit case of the 2-dimensional Dynamic Programming is hence when i = 0 (and similarly when j = 0), which is a 1-dimensional Dynamic Programming problem. This is referred to as initialization because the 1-dimensional Dynamic Programming initializes the main 2-dimensional problem. If the second input string is empty, the fusion till the ith character of the first input string is the first string till the ith character (with the corresponding parent), whose cost is computed and stored; similarly for the case when the first input string is empty.

Lines 8–15 form the main logic of the Dynamic Programming. All sub-problems are solved from smaller to larger. Line 10 decides, for the solution of d(i,j), whether the sub-problem d(i − 1,j) or the sub-problem d(i,j − 1) is the better candidate, given that both decompositions are possible. Leaving aside the frozen conditions, the line states that if adding the character \(\tau_{{\psi_{1} }}^{{{\text{task}}}} \left( i \right)\) to the sub-problem d(i − 1,j) is better than adding the character \(\tau_{{\psi_{2} }}^{{{\text{task}}}} \left( j \right)\) to the sub-problem d(i,j − 1), the first sub-problem will be used, and vice versa. The frozen constraints additionally ensure that if one of the two characters (\(\tau_{{\psi_{1} }}^{{{\text{task}}}} \left( i \right)\) or \(\tau_{{\psi_{2} }}^{{{\text{task}}}} \left( j \right)\)) is frozen while the other is not, the frozen character takes priority. Lines 11–12 store the result when the sub-problem d(i − 1,j) is better, and lines 14–15 store the result when the sub-problem d(i,j − 1) is better.

In lines 16–21, p iterates over the sub-problems that lead to the solution of the overall problem d(len(\(\tau_{{\psi_{1} }}^{{{\text{task}}}}\)), len(\(\tau_{{\psi_{2} }}^{{{\text{task}}}}\))). As it iterates over the sub-problems, the solution is stored in τ. If the parent points to the first sub-problem, the corresponding character from the first input string is added at the beginning of the solution τ, and vice versa.

Algorithm 1

The unit operations (e.g. collecting an item, getting a document signed, etc.) are exclusive, and the robot cannot do two unit operations at the same time. The solution to a task performs multiple unit operations sequentially. There are synergies between the solutions of two tasks, that is, the robot may perform the steps of different tasks in an interleaved manner. This interleaving is primarily computed by using Dynamic Programming.

As an example, consider that sequentially solving [σ1, σ2, σ3] is a valid solution of the first task, while sequentially solving [σa, σb] is a valid solution of the second task. Here, all σ are unit operations, and these are two of the many tasks for a robot. The valid solutions for simultaneously solving the two tasks are the ten interleavings [σa, σb, σ1, σ2, σ3], [σa, σ1, σb, σ2, σ3], [σa, σ1, σ2, σb, σ3], [σa, σ1, σ2, σ3, σb], [σ1, σa, σb, σ2, σ3], [σ1, σa, σ2, σb, σ3], [σ1, σa, σ2, σ3, σb], [σ1, σ2, σa, σb, σ3], [σ1, σ2, σa, σ3, σb], and [σ1, σ2, σ3, σa, σb]. In all of these, the robot switches between the solution of the first task [σ1, σ2, σ3] and the second task [σa, σb]. Dynamic Programming selects the best of these combinations, while the Genetic Algorithm selects good task solutions ([σ1, σ2, σ3] and [σa, σb]) that, after the application of Dynamic Programming, become the best solution.
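The pairwise fusion can be sketched as below, in the spirit of Algorithm 1 but omitting the frozen-variable handling; the tables d, pi, and delta mirror d(i,j), π(i,j), and δ(i,j) described above, the cost function and start location are supplied by the caller, and all names are illustrative.

```python
def dp_fuse(t1, t2, cost, start):
    """Fuse two task sequences into one interleaving of minimal cost.
    cost(a, b): travel-and-operate cost from site a to operation b.
    start: the robot's source S_r. Returns (fused sequence, its cost)."""
    n, m = len(t1), len(t2)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]       # d(i,j): cost of fusion
    pi = [[0] * (m + 1) for _ in range(n + 1)]        # parent: 1 from t1, 2 from t2
    delta = [[start] * (m + 1) for _ in range(n + 1)]  # site occupied after d(i,j)
    for i in range(1, n + 1):                          # unit case: t2 empty
        d[i][0] = d[i - 1][0] + cost(delta[i - 1][0], t1[i - 1])
        pi[i][0], delta[i][0] = 1, t1[i - 1]
    for j in range(1, m + 1):                          # unit case: t1 empty
        d[0][j] = d[0][j - 1] + cost(delta[0][j - 1], t2[j - 1])
        pi[0][j], delta[0][j] = 2, t2[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a = d[i - 1][j] + cost(delta[i - 1][j], t1[i - 1])  # append t1(i)
            b = d[i][j - 1] + cost(delta[i][j - 1], t2[j - 1])  # append t2(j)
            if a <= b:
                d[i][j], pi[i][j], delta[i][j] = a, 1, t1[i - 1]
            else:
                d[i][j], pi[i][j], delta[i][j] = b, 2, t2[j - 1]
    fused, i, j = [], n, m                             # backtrack via the parents
    while i > 0 or j > 0:
        if pi[i][j] == 1:
            fused.append(t1[i - 1]); i -= 1
        else:
            fused.append(t2[j - 1]); j -= 1
    return fused[::-1], d[n][m]
```

For sites on a line with cost(a, b) = |a − b| and start 0, fusing [1, 2, 3] with [1.5, 2.5] yields the interleaving [1, 1.5, 2, 2.5, 3] at cost 3.0.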

The frozen variables have already been operationalized and therefore must be used before any non-frozen variable in the Dynamic Programming fusion. In other words, a new task cannot come in between the solution of a task that has already been performed and frozen. A requirement is that the linear plan Τ′ = {τ′1, τ′2, τ′3, … τ′z} should facilitate mapping to every detail available in the genotype; hence every element in the plan, and thus every variable used in the Dynamic Programming, including those coming from the task plans \(\tau_{\psi }^{{{\text{task}}}}\), is annotated with its frozen status, the robot assigned, and the task that the variable is a part of.

Care must be taken in the implementation of Sr as the source of the robot. It cannot be taken as the current position, since the individual also encodes the operations already performed. It cannot be the initial source either, since no tasks are available at the time the robot starts, and tasks are incrementally added. While a robot has some frozen operations, it does not matter which source is taken, since the frozen parts are the same across all individuals and produce the same fusion, whose cost is a constant addition. The source for a task is the robot's position at the end of its current operation at the time the task is inserted, which needs to be updated if a robot has no operations frozen.

Let \({\rm T}_{{{\text{task}}}} = \left\{ {\tau_{1}^{{{\text{task}}}} , \tau_{2}^{{{\text{task}}}} , \tau_{3}^{{{\text{task}}}} \ldots ,\tau_{n}^{{{\text{task}}}} } \right\}\) be the trajectory corresponding to the tasks as stored in the genotype. The fusion is between multiple tasks for a robot and hence the fused sequence is given by Eqs. (16) and (17), and the cost of fusion is given by Eq. (18)

$$ \tau_{r}^{\prime } = \tau_{O\left( 1 \right)}^{{{\text{task}}}} \oplus_{D} \tau_{O\left( 2 \right)}^{{{\text{task}}}} \oplus_{D} \tau_{O\left( 3 \right)}^{{{\text{task}}}} \cdots \oplus_{D} \tau_{{O\left( {{\text{len}}\left( {\tau_{r}^{\prime } } \right)} \right)}}^{{{\text{task}}}} $$
(16)
$$ \tau_{r}^{\prime } = \oplus_{D, i = 1}^{n} \left( {\tau_{O\left( i \right)}^{{{\text{task}}}} } \right), \;\xi \left( {\psi = O\left( i \right)} \right) = r $$
(17)
$$ {\text{cost}}_{r}^{{{\text{task}}}} \left( {{\rm T}_{{{\text{task}}}} ,O,\xi } \right) = C\left( {\tau_{r}^{\prime } } \right) $$
(18)

\({\text{cost}}_{r}^{{{\text{task}}}} \left( {{\rm T}_{{{\text{task}}}} ,O,\xi } \right)\) denotes the cost of the task plans Τtask with the order of fusion O and the robot assignment function ξ for the robot r. C(τ′r) is the cost of the sequence of operations τ′r. The associativity is strictly from left to right, and the tasks are taken in the order specified in the evolutionary individual, denoted by the O() subscripts in the notation. For each robot, a penalty is added in proportion to the number of tasks that are assigned to the robot but not solved by it as per the evolutionary individual. The magnitude by which the individual misses solving the mission is also encoded if the verification engine reports it. The robots have problems doing too many operations, and therefore, a small penalty is added to favour shorter solutions (in terms of length, not cost). Since optimality is of concern, the individual is also assessed, and if a task is satisfied by the initial part of the individual alone, so that not all genes are needed to satisfy the task, the string representing the solution for the task is trimmed. This acts as a local delete evolutionary operator. The fitness function is the maximum cost among all robots, given by Eq. (19).
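The fusion ⊕_D of Eqs. (16)–(18) can be sketched as a Dynamic Programming merge of two operation sequences that preserves the internal order of each sequence and minimises the total transition cost. The `transition_cost` function and the 1-D site positions below are illustrative assumptions; the paper's cost model is the robots' travel cost.

```python
from functools import lru_cache

def dp_fuse(seq_a, seq_b, transition_cost):
    """Merge seq_a and seq_b preserving each internal order, choosing
    the interleaving with minimum total transition cost — a sketch of
    the oplus_D operator. transition_cost(prev, nxt) is an assumed
    callable; prev is None at the start position."""
    @lru_cache(maxsize=None)
    def best(i, j, prev):
        if i == len(seq_a) and j == len(seq_b):
            return 0.0, ()
        options = []
        if i < len(seq_a):                      # take next op of seq_a
            c, rest = best(i + 1, j, seq_a[i])
            options.append((transition_cost(prev, seq_a[i]) + c,
                            (seq_a[i],) + rest))
        if j < len(seq_b):                      # take next op of seq_b
            c, rest = best(i, j + 1, seq_b[j])
            options.append((transition_cost(prev, seq_b[j]) + c,
                            (seq_b[j],) + rest))
        return min(options)
    return best(0, 0, None)

# Illustrative 1-D sites: the robot starts at 0.
pos = {None: 0, 's1': 1, 's2': 2, 'sa': 3}
cost, fused = dp_fuse(('s1', 's2'), ('sa',),
                      lambda p, n: abs(pos[n] - pos[p]))
```

Here the best interleaving visits the sites in the order s1, s2, sa with cost 3, rather than detouring to sa first. Left-to-right associativity of the fusion follows by calling `dp_fuse` iteratively on the accumulated result.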

$$ {\text{Fitness}}\left( {{\rm T}_{{{\text{task}}}} ,O,\xi } \right) = \mathop {\max }\limits_{r} \;{\text{cost}}_{r}^{{{\text{task}}}} \left( {{\rm T}_{{{\text{task}}}} ,O,\xi } \right) + \alpha \;\Sigma_{r} {\text{cost}}_{r}^{{{\text{task}}}} \left( {{\rm T}_{{{\text{task}}}} ,O,\xi } \right) + \beta \;\Sigma_{\psi } \;{\text{penalty}}\left( {\tau_{\psi }^{{{\text{task}}}} } \right) + \gamma \;\Sigma_{\psi } {\text{len}}\left( {\tau_{\psi }^{{{\text{task}}}} } \right) $$
(19)

Here α is a small constant for the summation factor. β ≫ 1 is the penalty constant, and the \({\text{penalty}}\left( {\tau_{\psi }^{{{\text{task}}}} } \right)\) function returns the magnitude by which \(\tau_{\psi }^{{{\text{task}}}}\) violates ψ. The penalty function can be adapted as per the language. As an example, if the task ψ asks the robot to sequentially do several complex operations (atomic operations with Boolean constraints), but the robot performs only a few of them, then \({\text{penalty}}\left( {\tau_{\psi }^{{{\text{task}}}} } \right)\) is the number of operations in the sequence that were not performed by the robot. γ is a small constant to penalize too many operations, and \({\text{len}}\left( {\tau_{\psi }^{{{\text{task}}}} } \right)\) is the length of the solution of the task.

The fitness function can thus be seen as the cost of the robot working for the maximum time, with a small contribution coming from the other robots. There are penalties for not completely solving a task and for having a robot do too many operations. The fitness function is primarily optimized by changing the sequence of operations that solves a task \(\left( {\tau_{\psi }^{{{\text{task}}}} } \right)\) and by changing the robot assigned to the tasks (ξ). Trying all combinations of all robot operations can be computationally expensive. Hence, the planner considers every robot separately. The planner first notes the list of tasks assigned to the robot and then considers all the operations associated with every task. Every task gives one sequence of operations. The sequences of all tasks are integrated into one sequence by using Dynamic Programming. This gives the sequence of operations (along with its cost) of every robot, used for the fitness calculation. The evolutionary algorithm optimizes this fitness function.
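The per-robot procedure above can be sketched as follows. The genotype layout (`task_plans`, `order`, `assignment`), the `fuse_cost` and `penalty` callables, and the weight values are illustrative assumptions, not from the paper.

```python
def fitness(task_plans, order, assignment, fuse_cost, penalty,
            alpha=0.01, beta=1000.0, gamma=0.001):
    """Sketch of Eq. (19): the maximum robot cost, plus a small sum
    over all robots, penalties for unsatisfied tasks, and a small
    length penalty. fuse_cost(plans) stands in for the Dynamic
    Programming fusion of one robot's task plans into a single
    sequence, returning its cost."""
    robot_cost = {}
    for r in set(assignment.values()):
        # tasks assigned to robot r, in the order encoded by the individual
        tasks_r = sorted((t for t in task_plans if assignment[t] == r),
                         key=lambda t: order[t])
        robot_cost[r] = fuse_cost([task_plans[t] for t in tasks_r])
    return (max(robot_cost.values())
            + alpha * sum(robot_cost.values())
            + beta * sum(penalty(p) for p in task_plans.values())
            + gamma * sum(len(p) for p in task_plans.values()))
```

With a mock `fuse_cost` that just sums plan lengths and a zero penalty, the function reduces to the makespan term plus the small α and γ contributions.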

This also points to the relation between evolution and Dynamic Programming. Dynamic Programming is an integrator that integrates the task sequence of operations to produce a sequence of operations to be performed by the robot. The evolution hence optimizes the sequence of operations to solve a task and not the sequence of operations to be solved by the robot, which is an easier problem. Informally, Dynamic Programming may be thought of as a local search applied to every individual in every generation.

5.4 Dynamic programming based diversity preservation

Since tasks are continuously inserted, frozen, and deleted from the mission specification, diversity preservation is important. More generally, the optimization is against a dynamically changing fitness landscape: the fitness function keeps changing due to changes in the mission specification. A conventional optimization algorithm aims for convergence over time, ideally into a global optimum. Diversity preservation techniques, however, add constraints that make it less likely to delete an individual that contributes to diversity by being a member of an under-represented area, which significantly delays or avoids convergence. If the fitness function is dynamic, a global optimum may later become a local optimum, and a local optimum may later become a global optimum. For example, a sub-optimal plan may later become optimal due to the insertion of a competing task by a user. Convergence into the current global optimum is thus not preferred, since a change in the fitness function may leave the population converged into what later turns out to be a local optimum. The aim is instead to keep enough individuals spread across the fitness landscape, so that when the fitness function changes, the nearby individuals can quickly adapt and change the evolutionary pressures to attract more individuals towards the new global optimum. Diversity preservation is hence useful to maintain a diverse spread of individuals in anticipation of a change in the fitness landscape.

Crowding-based diversity preservation is used, wherein the child is generated using the genetic operators, but the child replaces the closest of a sampled set of parents, which prevents the population diversity from dropping significantly.

The implementation of diversity preservation is straightforward. The associated problem is the absence of a distance measure, since two individuals of the population may have different sizes. Let dplan(X,Y) be the distance function, with X and Y as two individuals of the population pool. Dynamic Programming is used to compute the distance between the solutions of X and Y for a task ψ, and the results are added over all tasks. A problem is that the distance between the solution of a task and itself should be zero, while the Dynamic Programming returns the cost of the task itself as the cost of fusion; hence the maximum of the two task costs is subtracted. The Dynamic Programming-based distance function is hence given by Eq. (20).

$$ \begin{aligned}d^{{\text{plan}}}\left( {X,Y} \right) &= \Sigma_{\psi }\Big( C\left( \tau_{\psi ,X}^{{\text{task}}}\oplus_{D} \tau_{\psi,Y}^{{\text{task}}}\right) \\ &\quad - \max \left( C\left({\tau_{\psi,X}^{{\text{task}}}}\right),C\left(\tau_{\psi,Y}^{{\text{task}}}\right)\right) \Big)\end{aligned} $$
(20)

Here \(\tau_{\psi ,X}^{{{\text{task}}}}\) and \(\tau_{\psi ,Y}^{{{\text{task}}}}\) are the solutions (sequences of operations) of task ψ in X and Y, respectively. \(\tau_{\psi ,X}^{{{\text{task}}}} \oplus_{D} \tau_{\psi ,Y}^{{{\text{task}}}}\) is the fusion of the two solutions using Dynamic Programming, and the C() function returns the cost of a solution (sequence of operations).
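A minimal sketch of Eq. (20) follows, assuming `fuse_cost(a, b)` computes the cost of the DP fusion a ⊕_D b and `cost(a)` the cost of a single sequence; both names are illustrative stand-ins.

```python
def plan_distance(X, Y, fuse_cost, cost):
    """DP-fusion based distance between two individuals. X and Y map
    each task to its solution sequence; subtracting the larger of the
    two task costs makes the distance of an individual to itself zero."""
    d = 0.0
    for task in X:
        a, b = X[task], Y[task]
        d += fuse_cost(a, b) - max(cost(a), cost(b))
    return d
```

With a mock fusion whose cost is the combined length for distinct solutions (and the plain length for identical ones), the self-distance is 0 as required.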

5.5 Freeze aware genetic operators

The genetic operators consist of crossover and mutation, and they need to be carried out for all three variables used: the task solutions \(\left( {\tau_{\psi }^{{{\text{task}}}} } \right)\), the order (O), and the robot assignment (ξ). A particularity is that the operators must only alter the variables that are not frozen, which is a constraint easy to encode in both crossover and mutation. Further, the crossover of the task solutions \(\left( {\tau_{\psi }^{{{\text{task}}}} } \right)\) is interesting since the parents will be of non-uniform size. However, crossover between parents of different sizes can be done heuristically as per the available literature.

Two crossover techniques are used. The first is a scattered crossover applied only to the non-frozen parts. The second exploits the (non-strictly) combinatorial optimization characteristics of the algorithm: out of the unfrozen parts, a random crossover point is taken, as in a 1-point crossover. Up to the crossover point, the strings are swapped directly, and an operation is inserted into the other part only if the count of that operation does not exceed the original count. The underlying principle is to keep the operation counts of the children the same as those of the corresponding parents; therefore, the leftover operations are randomly inserted to maintain the counts.
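The count-preserving variant can be sketched as below. The flat list representation, the `frozen_len` parameter, and the `build_child` helper are illustrative; the paper's genotype carries additional annotations (frozen status, task, robot) on every element.

```python
import random
from collections import Counter

def freeze_aware_crossover(p1, p2, frozen_len):
    """1-point-style crossover on the unfrozen parts only. The frozen
    prefixes are copied untouched; each child keeps exactly its own
    parent's multiset of unfrozen operations."""
    u1, u2 = p1[frozen_len:], p2[frozen_len:]
    cut = random.randint(0, min(len(u1), len(u2)))
    c1 = p1[:frozen_len] + build_child(u2[:cut], u1)
    c2 = p2[:frozen_len] + build_child(u1[:cut], u2)
    return c1, c2

def build_child(prefix, parent_unfrozen):
    """Take the swapped prefix, inserting an operation only while its
    count does not exceed the parent's count; then randomly insert the
    leftover operations so the counts match the parent exactly."""
    remaining = Counter(parent_unfrozen)
    child = []
    for op in prefix:
        if remaining[op] > 0:
            child.append(op)
            remaining[op] -= 1
    for op in list(remaining.elements()):
        child.insert(random.randint(0, len(child)), op)
    return child
```

The invariant checked in practice is that each child's unfrozen part is a permutation of its parent's unfrozen part, with the frozen prefix untouched.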

Mutations are also of two types. The structural mutation operations consist of deleting an operation or adding a new operation from var(ψ). Here, var(ψ) is the collection of atomic operations that appear in the specification of the task ψ. The parametric mutation operations consist of swapping two operations and of replacing an operation with a random one from var(ψ). The crossover and mutation of the order variable (O) is a simple combinatorial optimization, and that of the robot assignment (ξ) is a simple parametric optimization.
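The four mutation moves can be sketched as follows; the function names and the 0.5 branch probability are illustrative choices, not from the paper.

```python
import random

def structural_mutation(sol, var_ops):
    """Delete a random operation, or add a new one drawn from var(psi)."""
    sol = list(sol)
    if sol and random.random() < 0.5:
        del sol[random.randrange(len(sol))]          # delete move
    else:
        sol.insert(random.randint(0, len(sol)),      # add move
                   random.choice(var_ops))
    return sol

def parametric_mutation(sol, var_ops):
    """Swap two operations, or replace one with a random var(psi) op."""
    sol = list(sol)
    if len(sol) >= 2 and random.random() < 0.5:
        i, j = random.sample(range(len(sol)), 2)     # swap move
        sol[i], sol[j] = sol[j], sol[i]
    else:
        if sol:                                      # replace move
            sol[random.randrange(len(sol))] = random.choice(var_ops)
    return sol
```

Structural moves change the solution length by one; parametric moves keep the length fixed, which matches the distinction drawn in the text.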

Local optimization in a master–slave memetic architecture is also applied, since many times a good solution should be readily available. Every individual, after every few generations, is subjected to local optimization. If an individual is reasonably near an optimum, local optimization can place it very close to that optimum, which would otherwise take the evolutionary algorithm quite some time. Local optimization thus helps in getting good individuals earlier in the optimization process. The evolutionary algorithm benefits from the improved fitness of the individuals, which makes it easier to distinguish between the global and local optima for guiding convergence. The fitness values are additionally stored in a hash table so that the re-occurrence of the same individual does not trigger a fresh fitness computation with a costly Dynamic Programming fusion; the stored value can be obtained immediately.
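The hash-table reuse can be sketched as simple memoisation. The `canonical_key` helper and the dict-of-plans genotype layout are assumptions for illustration.

```python
fitness_cache = {}

def canonical_key(individual):
    """Hashable encoding of a genotype given as a dict mapping each
    task to its solution sequence (an assumed layout)."""
    return tuple(sorted((task, tuple(plan))
                        for task, plan in individual.items()))

def cached_fitness(individual, compute_fitness):
    """Return the cached fitness when the same individual re-occurs,
    avoiding a repeated (costly) Dynamic Programming fusion."""
    key = canonical_key(individual)
    if key not in fitness_cache:
        fitness_cache[key] = compute_fitness(individual)
    return fitness_cache[key]
```

Note that the cache must be invalidated whenever the mission specification (and hence the fitness function) changes.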

5.6 Reaching a mission site

Suppose a robot is operating on a plan τ′r and has just completed an operation τ′r(i) in the plan. Now the robot needs to benefit from the optimizations carried out and incorporate any tasks added while it was completing the operation τ′r(i). Hence, the best-optimized plan at the time of completion of τ′r(i) is loaded to the robot for performing the next task. The optimal sequence loaded by the optimizer consists of the entire list of operations, including those that are already over; therefore, the first unfrozen operation in the sequence is selected, given by Eq. (21).

$$ {\text{nextOp}}\left( r \right) = \tau_{r}^{\prime } \left( j \right):\neg {\text{frozen}}\left( {\tau_{r}^{\prime } \left( j \right)} \right) \wedge \forall_{k < j} \;{\text{frozen}}\left( {\tau_{r}^{\prime } \left( k \right)} \right) $$
(21)

Here frozen(τ′r(j)) returns whether τ′r(j) is frozen or not. It may be emphasized that each operation τ′r(j) is assumed to carry a trace of whether it was frozen, the task that it came from, and the robot associated with that task in the genetic individual. This is possible since all this information is available when integrating using Dynamic Programming.
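Eq. (21) simply selects the first unfrozen operation of the loaded plan. A sketch, with `frozen` as an assumed parallel list of annotations:

```python
def next_op(plan, frozen):
    """Return the first operation whose frozen flag is False; all
    earlier operations are frozen (already executed). Returns None
    when the whole plan has been executed."""
    for op, is_frozen in zip(plan, frozen):
        if not is_frozen:
            return op
    return None
```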

Since the next operation (nextOp) is transmitted to the robot and cannot be preempted, it needs to be frozen for optimization in all individuals. The same calculation is also performed at the task level. The robot is asked to perform the next operation of the task ψ = task(nextOp(r)). The best individual (best) has a task trajectory \(\tau_{{\psi ,{\text{best}}}}^{{{\text{task}}}}\). Suppose the task trajectory has been traced till the index nextTaskOp(ψ), given by Eq. (22).

$$ {\text{nextTaskOp}}\left( \psi \right) = j:\neg {\text{frozen}}\left( {\tau_{{\psi ,{\text{best}}}}^{{{\text{task}}}} \left( j \right)} \right) \wedge \forall_{k < j} \;{\text{frozen}}\left( {\tau_{{\psi ,{\text{best}}}}^{{{\text{task}}}} \left( k \right)} \right) $$
(22)

Here task(nextOp(r)) returns the task corresponding to the action nextOp(r), and nextTaskOp(ψ) is the index of the first unfrozen operation in the same task. The trajectory \(\tau_{{\psi ,{\text{best}}}}^{{{\text{task}}}} \left( {1:{\text{nextTaskOp}}\left( \psi \right)} \right)\) is frozen and is therefore appended to all the individuals in the population. This ensures that every individual in the population, on compilation by Dynamic Programming, produces the same trajectory that was traced by the robot. Similarly, the robot assigned to the task is also frozen with the currently assigned robot (r), if not done already. The task population for any individual X is hence given by Eqs. (23–26).

$$ {\text{frozen}}\left( {\tau_{{\psi ,{\text{best}}}}^{{{\text{task}}}} \left( {{\text{nextTaskOp}}\left( \psi \right)} \right)} \right) = {\text{true}} $$
(23)
$$ \tau_{\psi ,X}^{{{\text{task}}}} = \left[ {\tau_{{\psi ,{\text{best}}}}^{{{\text{task}}}} \left( {1:{\text{nextTaskOp}}\left( \psi \right)} \right),\tau_{\psi ,X}^{{{\text{task}}}} \left( {k:{\text{end}}} \right)} \right] $$
(24)
$$ k:\neg {\text{frozen}}\left( {\tau_{\psi ,X}^{{{\text{task}}}} \left( k \right)} \right) \wedge \forall_{l < k} \left( {{\text{frozen}}\left( {\tau_{\psi ,X}^{{{\text{task}}}} \left( l \right)} \right) \vee \tau_{\psi ,X}^{{{\text{task}}}} \left( l \right) = {\text{nextOp}}\left( r \right)} \right) $$
(25)
$$ \xi \left( {\psi ,x} \right) = r,\;{\text{frozen}}\left( {\xi \left( {\psi ,x} \right)} \right) = {\text{true}} $$
(26)

Here frozen(ξ(ψ,X)) denotes whether the robot assignment of task ψ in individual X is frozen. An assumption in Eqs. (23–26) is that a next operation for the robot is indeed available; the equations do not hold when the robot has just performed the last operation of the task. In such a case, since the entire task is done, the task needs to be deleted from the mission (χ ← χ − {ψ}), and the task populations need to be deleted for all individuals X, that is, \({\rm T}_{{{\text{task}},X}} \leftarrow {\rm T}_{{{\text{task}},X}} - \left\{ {\tau_{\psi ,X}^{{{\text{task}}}} } \right\}\), along with the order (OX ← OX − {OX(ψ)}) and the robot assignment (ξX ← ξX − {ξX(ψ)}). In the current implementation, the ordering is defined as integers from 1 to the number of tasks; therefore, the ordering variables have to be re-indexed to maintain this consistency.
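Deleting a finished task from an individual, and re-indexing the integer order variables so they remain contiguous from 1, can be sketched as follows; the dict-based genotype layout is an assumption for illustration.

```python
def remove_task(individual, task):
    """individual = {'plans': task -> op list, 'order': task -> int,
    'assign': task -> robot} (an assumed genotype layout). Removes
    the finished task and shifts down every order value above it so
    the orders stay the integers 1..n."""
    removed = individual['order'].pop(task)
    del individual['plans'][task]
    del individual['assign'][task]
    for t, o in individual['order'].items():
        if o > removed:
            individual['order'][t] = o - 1
```

In the full algorithm this is applied to every individual of the population, together with removing ψ from the mission χ.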

5.7 Inserting tasks

An easier operation is when a user adds a new task to the mission (χ ← χ ∪ {ψ}). This is easy because the population is already in the format of solutions per task, and insertion means adding, in all individuals of the population, a new random string representing the solution of the new task. The language considered in this paper allows constructing feasible solutions by a simulation process described as follows; hence, all initial individuals are always feasible. Let var(ψ) be the operations that occur in the specification of the task ψ. Since there is no negation, all operations in the initial population generation and thereafter are sampled from var(ψ). Further, the structure of the task specification is exploited to generate the initial population: a random option is taken in the case of an OR operator; all options are taken in the case of an AND operator, and a random permutation of them is generated; and temporally constrained operations are taken in the specified order for adherence. Note that no initial population is ever generated other than a blank one; this simulation is the only mechanism by which random individuals are generated and added as solutions to the tasks incrementally.
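The simulation that generates feasible task solutions can be sketched over an assumed tree encoding of the specification, with nodes ('op', name), ('AND', children), ('OR', children), and ('SEQ', children); this encoding is illustrative, not the paper's.

```python
import random

def random_solution(spec):
    """Generate a feasible operation sequence from a specification
    tree: OR picks one random option, AND takes all options in a
    random permutation, SEQ keeps the specified temporal order."""
    kind = spec[0]
    if kind == 'op':
        return [spec[1]]
    children = spec[1]
    if kind == 'OR':                          # one random option
        return random_solution(random.choice(children))
    parts = [random_solution(c) for c in children]
    if kind == 'AND':                         # all options, random order
        random.shuffle(parts)
    # 'SEQ': keep the specified order for temporal adherence
    return [op for part in parts for op in part]
```

Every sequence produced this way satisfies the specification by construction, which is why the incremental insertion never needs a repair step.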

5.8 Complexity

Since the overall algorithm is evolutionary, the complexity for the algorithm with a population of N individuals is given by a summation of each of the terms:

  • O(N log N) for sorting all individuals as per their expectation values, used as a part of the selection operation.

  • O(N × (fitness function complexity)) for the calculation of the fitness values of all individuals. Since the approach is not for any specific language, suppose O(verifier(s,ψ)) is the complexity of verifying that the string s satisfies the task ψ in the language chosen by the user. The verification of all tasks requires O(|ψ|× verifier(s,ψ)) complexity, where |ψ| is the number of tasks. The fusion of the solutions of two tasks requires an additional complexity of O(\(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\)), where |\(\tau_{\psi }^{{{\text{task}}}}\)| is the (maximum) length of a task solution. An iterative fusion of |ψ| tasks sees an increase in the fused string length and hence has a complexity of O(|ψ| ×|\(\tau_{r}\)|× \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|\)), where |\(\tau_{r}\)| is the (maximum) length of a robot plan. The maximum length of the robot plan is |ψ|× \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|\), the sum of the lengths of all task solutions, making the complexity of fusion O(|ψ|^2 × \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\)). The total complexity of computing all fitness values is thus O(N ×|ψ|× verifier(s,ψ) + N ×|ψ|^2 × \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\)). In our case, the verification algorithm consists of the AND, OR, and sequence operators with a complexity O(\(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|\)× len(ψ) + \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\)), where len(ψ) is the length of the task specification string. This gives the complexity as O(N ×|ψ|× \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|\)× len(ψ) + N ×|ψ|× \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\) + N ×|ψ|^2 × \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\)). Considering len(ψ) ≈ \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|\), the complexity is O(N ×|ψ|^2 × \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\)).

  • O(N × CF ×|ψ|× \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\)) for diversity preservation where CF (crowding factor) is the number of individuals considered for replacement of the child population, |ψ| is the number of tasks, \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|\) is the (maximum) length of the task solution. The additional factor of \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\) is for Dynamic Programming fusion and |ψ| is to fuse all the corresponding task solutions.

The total complexity is O(N log N + N ×|ψ|× verifier(s,ψ) + N ×|ψ|^2 × \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\) + N × CF ×|ψ|× \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\)) for the general case, and O(N log N + N ×|ψ|^2 × \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\) + N × CF ×|ψ|× \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\)) for the specific verification system considered. Practically, considering CF > log N and CF >|ψ|, the complexity is O(N × CF ×|ψ|× \(\left| {\tau_{\psi }^{{{\text{task}}}} } \right|^{2}\)).

It may seem curious that the number of robots does not appear in the complexity calculations. This is because the greater the number of robots, the smaller the sequence length per robot, and the smaller the complexity; the worst case thus appears for the smallest number of robots, which is used for the calculations. Even though the complexity reduces with an increase in the number of robots, adding robots creates new dimensions in the hypervolume of the search space of the evolutionary algorithm, which makes the optimization much harder.

The approach is probabilistically optimal, like most techniques that use evolution, meaning that the probability of finding the optimal solution tends to one as time tends to infinity. This happens because the mutation operator has a non-zero probability of suitably (and iteratively) modifying any individual into an optimal solution, while the probability of survival of that individual under the selection operator is also non-zero. It is not possible to compute a theoretical bound on the solution quality with respect to time from the working of the algorithm.

6 Results

6.1 Comparative analysis

It is imperative to compare the algorithm with state-of-the-art approaches. The first comparisons were aimed at optimal planners. A naïve implementation of the planner could not solve a moderately sized problem because of the exponential time complexity: the Uniform Cost Search exhausted memory, while the Iterative Lengthening Algorithm ran out of time. Both searches are explained in [41]. An optimal search could only solve problems of up to 13 variables. Thereafter, for comparisons, the optimality criterion was removed. Model verification approaches were tried using the NuSMV library. The library could give non-optimal results up to a problem size of 15 variables; beyond that, the approach could not convert the LTL into an automaton for model verification because of the exponential complexity. These sizes are much smaller than the 24 tasks with up to 15 variables per task, totalling problems with 360 variables. Thereafter, heuristic searches were attempted that exploited the nature of the problem. Again, the search was primarily exponential in the number of variables, and thus the approach could be used only for problems with a small number of mission sites. The observations are not surprising. One of the simplest cases of the proposed algorithm is the Travelling Salesman Problem, for which meta-heuristic evolutionary algorithms have already gained popularity over all classic approaches. Unfortunately, there is no work on evolutionary algorithms for generic mission planning that can be used for comparisons.

The term ‘number of variables’ used in our approach is not the same as that used in LTL. Assume a problem asks the robot to visit 5 places, say A, B, C, D, and E. In our methodology, the problem has 5 variables that were used to specify it. Let visit(A) be a propositional variable of the equivalent classic planning problem that can have a value of true or false at any point of time, and similarly for the other variables. The state space consists of all possible values that all variables can take. Since visit(A) can take 2 values, and similarly for all the other variables, the total state space has a size of \(2^{5}\) = 32 (instead of 5). So, for n variables as per our definition, the equivalent state-space size for a classic planning system can go up to \(2^{n}\). LTL in the worst case can generate as many states as classic planning, with additional states for handling the temporal constraints. Hence, directly comparing the term ‘number of variables’ between the proposed approach and LTL or classic planning systems is not valid. Different operators affect the working of the algorithm in different ways. The OR operator makes the problem a lot easier compared to the sequence operator, which in turn is reasonably easier than the AND operator. The results are with minimal use of the OR and sequence operators, and with maximal use of the AND operator.

The algorithm is hence compared with a baseline Genetic Algorithm that is the same as the proposed approach, but optimizes the fusion to produce a linear path for the robots instead of using Dynamic Programming. This algorithm is henceforth called “without Dynamic Programming”, short for the Genetic Algorithm optimization without Dynamic Programming. A non-incremental variant of the proposed algorithm is also tested, henceforth called “with Dynamic Programming”, short for the non-incremental Genetic Algorithm optimization using Dynamic Programming. The proposed algorithm is an incremental Genetic Algorithm optimization also using Dynamic Programming. The problem with comparing these algorithms against the proposed approach is that in the proposed approach the robot keeps travelling and deletes the completed tasks, which changes the fitness function, while in the other approaches the robot stays at the source until the optimal sequence is computed. As a result, for the analysis, the robot was not allowed to move in the proposed approach either. The results were compared on random problems with different numbers of tasks. Each task specification used 5 to 15 atomic operations. The tasks were specified using three operators: AND (∧), OR (∨), and sequencing (T). Since all operators are binary, the number of operators is 1 less than the number of operands (atomic operations). The toughest operator is the AND operator, which was generated with a probability of 0.7. Sequencing is simpler to optimize, as the elements of the sequence may be solved independently and the results appended give a reasonably good solution of the overall sequence; the probability of sequencing was thus 0.2. OR is the simplest operator, since it rules out some atomic operations, and hence its probability was kept at 0.1.
To avoid the OR operator making the overall task very simple (say, making the task a simple operation OR a high-complexity task), the OR operator was used within parentheses with a very small number of operands, which, in the absence of the sequence operator, represents the conjunctive normal form. The problems involve 5 robots.

The comparative results for small and large problem sizes are shown in Fig. 3. Figure 3 also shows the costs relative to the optimal value. The proposed approach performed significantly better than the baseline Genetic Algorithm. Under this setting, the incremental and non-incremental variants of the proposed algorithm become the same, since the robot is not allowed to move. In the incremental version, the tasks are added one after the other. Notably, the incremental version consistently performed marginally better, meaning that it is better to optimize every task one by one and then add it to the population pool representing the entire mission, which is another advantage of the work. For the case with 4 tasks, the non-incremental version outperforms the incremental version, since the problem is not complex and the non-incremental version allows the generation of a new solution considering all the interactions between all tasks, while the incremental version attempts to initially build over the solutions of the existent tasks, which can be somewhat restrictive. The total execution time may initially seem large. However, the tasks are complex, and the optimization happens as the robot performs. It was observed that every unit action involved the robot approaching and interacting with a human, which took an extremely long time in contrast to the times used for the experiments.

Fig. 3
figure 3

Comparative results

It is imperative to see how the algorithm behaves as the number of tasks is increased. The convergence obtained for all the discussed algorithms is shown in Fig. 4. As the number of tasks increases, the convergence gets slower. The larger the number of tasks, the higher the overall cost; therefore, the cost is also plotted relative to the converged optimal value. The proposed algorithm adds tasks one after the other. This gives it a very good starting point and an overall early convergence. Even though increasing the number of tasks delays the convergence, it may also be seen that a major proportion of the convergence happens very quickly, even for a reasonably high number of tasks. It is unlikely that the robot will have to handle too many tasks simultaneously.

Fig. 4
figure 4

Convergence with increasing the number of tasks

A better way of understanding the scalability of the algorithm is to plot the convergence time against an increasing number of tasks. The plots for obtaining 80% and 85% of the convergence value are shown in Fig. 5. Generally, increasing the number of tasks increases the convergence time. However, sometimes adding a task biases the solver towards a solution that also simultaneously solves another task; therefore, the time sometimes decreases. It takes noticeably more time to reach 85% convergence in contrast to 80% convergence. The plots for the approach without Dynamic Programming are shown only for a smaller number of tasks, since it is unable to reach the set desirable value within the set timeout.

Fig. 5
figure 5

Time to converge for different algorithms for an increasing number of tasks

Finally, the convergent values reached by the different algorithms are shown in Fig. 6a. As the number of tasks increases, the robots must invest more traversal time. Figure 6b specifically shows the convergent costs relative to the converged optimal value. A value of 0 is achieved by the best algorithm, and the closer the value is to 0, the better the algorithm.

Fig. 6
figure 6

Increased complexity with the number of tasks

To understand the approach better, the algorithm is further compared against 7 different algorithms representing some greedy heuristics.

  1. 1.

    Random Genetic Algorithm: The first algorithm is a naïve implementation of a Genetic Algorithm that generates random strings, tests the validity of the strings, and computes the fitness function. The algorithm does not use any language-specific heuristic. After prolonged runs, it was observed that the algorithm could not generate any feasible solution for the mission.

  2. 2.

    Random Search: Another algorithm was used for the comparisons. The algorithm generates feasible solutions of tasks using the language-specific heuristics. The task solutions are randomly fused into a mission solution. There is no implementation of a Genetic Algorithm like an optimization algorithm. The algorithm only does random searches. The solutions generated using the approach were feasible due to the language-specific heuristics, but the solutions were extremely poor in terms of cost.

  3. 3.

    Random Search with Dynamic Programming: An attempt was made to improve the above algorithm by using Dynamic Programming for fusing the task solutions into the mission solutions. As expected, the results were significantly better in contrast to a random fusion. However, the solutions were still found to be reasonably poor in contrast to the proposed algorithm.

  4. 4.

    Greedy Robot Assignment: One of the aspects of the algorithm is a robot assignment problem that assigns the robot to the different tasks. An auctioning-based system is implemented. Upon getting a new task, the system asks for bids from all the robots and the best bid is given the task. The robots bid with the time of completion of their currently assigned tasks and the robot that is expected to complete all its currently assigned tasks at the earliest wins the bids. Genetic Algorithm is used for optimizing the task solutions and Dynamic Programming is used for the fusion of the task solutions. The approach performs well, however not as good as the proposed approach that shows the limitation of the heuristic.

  5. Sequentially Solving Tasks: Many humans solve such missions by completing one person’s order before accepting another’s. This heuristic is implemented so that a robot does not start a new task until its current task is over; the individual task solutions are optimized. The heuristic is sub-optimal, and the approach therefore generates very poor solutions.

  6. Greedy Solver: A greedy solution to the problem is constructed. If the robot is asked to perform several operations joined by the AND operator, it performs them in the order specified. If the operations are joined by the OR operator, it chooses the lowest-cost operation from its last location, without regard for the future. Across tasks, the robot starts a new task only after finishing the earlier one; the robot assignment is optimized. Essentially, the approach performs a greedy traversal. The heuristic is sub-optimal, resulting in paths as poor as a random traversal.

  7. Greedy Solver with Dynamic Programming: The above algorithm was improved by using Dynamic Programming to fuse the task solutions. The approach yielded better costs, which were still considerably poor due to the poor choice of greedy heuristic.
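The auction heuristic of the Greedy Robot Assignment baseline can be summarized in a few lines. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' implementation: robot names, the load bookkeeping, and the exact bid (each robot's current expected completion time) are all assumptions for the example.

```python
# Sketch of the auction-based greedy assignment baseline.
# All names and the bid computation are illustrative assumptions.

def auction_assign(robots, task_duration):
    """robots: dict robot_id -> expected completion time of its
    currently assigned tasks. task_duration: time the new task adds.
    Each robot bids its current completion time; the lowest bid
    (earliest-free robot) wins, and its load is updated."""
    winner = min(robots, key=robots.get)   # best (lowest) bid wins
    robots[winner] += task_duration        # task joins the winner's queue
    return winner

# Usage: three robots with current completion times; two new tasks arrive.
loads = {"R1": 5.0, "R2": 2.0, "R3": 7.0}
first = auction_assign(loads, 4.0)   # R2 is free earliest and wins
second = auction_assign(loads, 1.0)  # R1 now has the lowest load
```

The greediness is visible in the bid: each task is awarded in isolation, without considering how it constrains future assignments, which is exactly the limitation noted above.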
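Several of the baselines above fuse per-task solutions into a mission solution via Dynamic Programming. A minimal sketch of such a fusion for two tasks is given below: each task contributes an ordered sequence of sites, the fusion searches over interleavings that preserve each task's internal order, and memoization keeps the search polynomial in the sequence lengths. The site names, coordinates, and distance function are illustrative assumptions, not taken from the paper.

```python
# Sketch of fusing two per-task site sequences into one mission route
# via Dynamic Programming, preserving each task's internal order.
# An illustrative reconstruction, not the authors' exact code.
from functools import lru_cache

def fuse(seq_a, seq_b, dist, start):
    """Minimum-cost interleaving of seq_a and seq_b.
    dist(u, v): travel cost between sites; start: initial site."""
    @lru_cache(maxsize=None)
    def best(i, j, at):
        # i, j: how many sites of each task are already done; at: current site
        if i == len(seq_a) and j == len(seq_b):
            return 0.0
        cost = float("inf")
        if i < len(seq_a):  # option: do the next site of task A
            cost = min(cost, dist(at, seq_a[i]) + best(i + 1, j, seq_a[i]))
        if j < len(seq_b):  # option: do the next site of task B
            cost = min(cost, dist(at, seq_b[j]) + best(i, j + 1, seq_b[j]))
        return cost
    return best(0, 0, start)

# Hypothetical 1-D workspace: sites are positions on a line.
coords = {"S": 0, "A": 1, "B": 5, "C": 2, "D": 6}
d = lambda u, v: abs(coords[u] - coords[v])
mission_cost = fuse(("A", "B"), ("C", "D"), d, "S")
```

In this toy instance the DP finds the interleaving A, C, B, D with cost 6, whereas solving the tasks strictly one after the other (A, B, then C, D) costs 12, which mirrors the gap observed between the sequential and DP-based baselines.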

The results of the above methods are given in Table 1. Except for very small problem sizes, the proposed approach outperforms all the other approaches.

Table 1 Comparative analysis of different approaches for different problem sizes

6.2 Experiments with different numbers of robots

The mission planning problem is solved using several robots. Workplaces will normally deploy many robots to efficiently serve the requirements of all users, so the effect of increasing the number of robots is explicitly studied. The aim is also to assess the improvement of the proposed algorithm over a naïve evolutionary algorithm that does not use dynamic programming. The results for the different algorithms, from few to many robots, are shown in Fig. 7. To make the problem challenging, the experiments use the maximum number of tasks investigated earlier, that is, 24 tasks. The proposed approach outperforms the evolutionary approach without dynamic programming for all numbers of robots. It is further observed that giving the tasks one after the other has a slight advantage over giving all tasks at once.

Fig. 7
figure 7

Comparative results for different numbers of robots

The effects of increasing the number of robots in the different algorithms are further studied by plotting the convergence profiles, shown in Fig. 8. In terms of absolute values, increasing the number of robots decreases the overall cost, and the associated curves therefore reach lower values. The convergence profiles, however, show no major change as the number of robots increases. The relative costs show the effects more clearly: increasing the number of robots generally makes it harder for the algorithms to converge, although, unlike increasing the number of tasks, the change is far less pronounced.

Fig. 8
figure 8

Convergence with increasing the number of robots

The times to converge to the 85% and 80% cost values are shown in Fig. 9. Initially, increasing the number of robots appreciably changes the convergence time; later, the convergence time becomes invariant to further increases. This happens once there are enough robots for the tasks, so adding more robots yields a negligible improvement and hence a negligible change in the convergence time. The case without Dynamic Programming is not shown, as that method does not converge to the specified limit value even after a prolonged time.

Fig. 9
figure 9

Time to converge for different algorithms for an increasing number of robots

The effects of increasing the number of robots are further shown in Fig. 10. In terms of absolute values, increasing the number of robots from 1 to 2 gives a significantly smaller cost; further increases yield much smaller reductions. The relative costs show the relative merit of the different algorithms, with the proposed algorithm being the best and hence having a relative cost of 0.

Fig. 10
figure 10

Complexity with the number of robots

6.3 Experimental results

The results are obtained on a Pioneer LX robot, a differential wheel drive robot equipped with a lidar sensor for 2-dimensional perception and high-accuracy wheel encoders for odometry. The robot has manufacturer-supported libraries that enable it to build a map of the workplace using the lidar, encoders, and an onboard IMU. The robot was first driven with a joystick around the robotics facility at the Centre of Intelligent Robotics at the host institute to make a detailed map, during which the different places of interest were marked. The map was then loaded into a manufacturer-provided map editor to mark the remaining places of interest. These places let the robot convert a symbolic place name into real-world coordinates.

The map was used to create a cost matrix between every pair of places. A Probabilistic Roadmap technique was used to fit a roadmap, the distance between every pair of places was computed using a graph search algorithm, and the resulting cost matrix was loaded into the algorithm setup.

The Robot Operating System (ROS) was used as a platform to allow easy integration between the different modules. The robot is ROS compatible and integrated with the ROS navigation stack: coordinates of a place are passed to the robot on the goal channel, the robot autonomously moves to the place using the navigation stack, and an acknowledgment is received on the channel. The robot uses the lidar, map, and wheel encoders to continuously localize, while using a fusion of deliberative and reactive planning techniques for the motion. The physical motion happens through a manufacturer-specific control module. The proposed algorithm computes an operation, consisting of the place to be visited by the robot and the action to perform (currently limited to saying a few statements). The goal location is sent to the goal channel and, once an acknowledgment is received, the operation is performed. Since an audio message is the only operation at present, the message is played on the robot’s speaker, followed by a pre-specified wait.
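The cost matrix construction described above can be sketched as a shortest-path computation over the roadmap graph. The roadmap below, the place names, and the edge weights are illustrative assumptions; a real system would take the graph from the Probabilistic Roadmap fitted on the lidar map. Dijkstra's algorithm stands in for the unspecified graph search.

```python
# Sketch of building the all-pairs cost matrix between named places
# from a roadmap graph. Graph, names, and weights are hypothetical.
import heapq

def dijkstra(graph, src):
    """Shortest-path costs from src over an undirected weighted graph
    given as {node: [(neighbour, weight), ...]}."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def cost_matrix(graph, places):
    """Pairwise travel costs between the marked places."""
    return {p: dijkstra(graph, p) for p in places}

# Hypothetical roadmap: nodes are marked places, edges are roadmap links.
roadmap = {
    "lab":  [("door", 2.0), ("desk", 5.0)],
    "door": [("lab", 2.0), ("desk", 2.0)],
    "desk": [("lab", 5.0), ("door", 2.0)],
}
costs = cost_matrix(roadmap, ["lab", "door", "desk"])
```

Note that the graph search can return a cost lower than any direct edge (here, lab to desk via the door costs 4 rather than 5), which is why the matrix is computed over the roadmap rather than from straight-line distances.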

Consider a mission performed by 3 robots, initially placed at R1, R2, and R3. The initial positions of the robots along with the mission sites are shown in Fig. 11. Box 1 shows the different tasks given by different users in the order of input, and Box 2 shows the output log of the robots as they complete all tasks step by step. Snapshots of the execution are given in Fig. 12. The robots chose optimal sequences and completed all the given tasks. The tasks are added one after the other at pre-specified times in the simulator. Since only one robot was available in the laboratory, the results were first recorded in the simulator under live settings, and the sequences were then transferred to the robot. Further, since the actual robot was slow, there was ample time to highly optimize the plans, and such results would not have transferred to faster robots; hence, the simulations purposefully assumed faster robots. The manipulation tasks were not executed by the robot, since it has no manipulator, and may be assumed to have been performed. Due to space constraints, only one result is discussed.

Fig. 11
figure 11

Map of the robotics and machine intelligence laboratory along with the mission sites

Box 1 Input mission statement
Box 2 Steps that were taken to solve the mission by the robots
Fig. 12
figure 12

Experimental results. The first row has the execution of robot 1, the second row of robot 2, and the last row of robot 3

7 Conclusions

In this paper, the problem of complex mission planning with multiple robots was addressed. Classic approaches work on generic temporal logics, typically have an exponential complexity, and do not scale to many variables. The proposed approach uses Boolean and sequencing specifications to specify a mission in a language that is generic enough to represent many problems of service robotics while guaranteeing polynomial-time verification. The synthesis of a trajectory is done using an evolutionary approach. The proposed solution is hence significantly better than the model verification approaches, which have exponential complexity and are solvable for only a small number of mission sites.

The focus was on working incrementally, wherein the robot keeps optimizing the mission solution as it operates; this acts as a natural stopping criterion for the evolutionary mission solver. Even though much work has been done on planning using evolutionary computation, most evolutionary approaches do not consider the incremental nature of the problem. The research has highlighted new issues with incremental evolution, which were handled in the algorithm by freezing some genes during optimization and accounting for the frozen variables in all aspects of the optimization. The problem is harder because the genotype is compiled into a different domain to give the physically traced trajectory. In non-incremental implementations, if a sub-optimal solution is found early, the robot does nothing to optimize the plan while performing the mission; further, the robot may have to wait a prolonged time for a mission solution to be computed in the first place. The incremental implementation solves both limitations.
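The idea of freezing genes during incremental optimization can be illustrated with a minimal sketch: variation operators are restricted to the suffix of the plan that has not yet been executed. The representation, operator, and parameter names below are illustrative assumptions, not the authors' operator code.

```python
# Minimal sketch of "freezing" already-executed genes during
# incremental optimization: mutation only touches the unfrozen
# suffix of the plan. Names and representation are hypothetical.
import random

def mutate(individual, frozen_upto, alphabet, rng=random.Random(0)):
    """Return a copy of individual with one gene at or after position
    frozen_upto resampled; positions before it are never changed."""
    child = list(individual)
    if frozen_upto < len(child):
        idx = rng.randrange(frozen_upto, len(child))  # unfrozen gene only
        child[idx] = rng.choice(alphabet)
    return child

# The robot has already executed the first two operations of the plan,
# so they are frozen; only the remaining two may be re-optimized.
plan = ["A", "B", "C", "D"]
child = mutate(plan, frozen_upto=2, alphabet=["A", "B", "C", "D", "E"])
```

Crossover and repair operators would need the same guard, which is what "being conscious of the frozen variables in all aspects of the optimization" amounts to.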

It is important to understand that the verification-based approaches solve the complete class of LTL, that is, all systems that can be specified using an LTL formulation. This paper does not solve that problem: it cannot take a general LTL formula as input and approximate a solution. The approach is only for polynomially-verifiable languages, and the restriction that a string can be verified in polynomial time already means that many constraints must be imposed. The experiments use a language with only AND, OR, and sequencing operators, a small subset of the full representational capability of LTL. While this is a major disadvantage, the paper argues that the language covers many scenarios specific to service robotics, where a robotic team services the everyday needs of humans. Attempts were made to quantify the approximation against optimal solvers by running the optimal solvers for a prolonged time. However, the optimal solvers, having an exponential complexity, could only give results for small problems, and their exponential complexity means that waiting for a prolonged duration does not ensure a solution.

Since the approach marks a paradigm shift in the solution class for the mission planning problem, no directly comparable methods were available in the literature. Comparisons were hence made against a Genetic Algorithm baseline; to make the baseline as strong as possible, every heuristic available in the problem and used by the solver was also added to the baseline Genetic Algorithm. Comparisons were also made with a non-iterative version of the proposed algorithm that took all tasks at once instead of one by one as in the proposed algorithm, and against several greedy heuristics that may be conceived for solving the problem. The proposed algorithm beats all these approaches. The system was tested on a real robot, a Pioneer LX.