1 1 Introduction

Natural computing (NC) [1] is a term used to describe those methods inspired by nature to create innovative problem-solving techniques or those focused on the use of computer systems to imitate natural phenomena or those which use bio-inspired materials for processing. The main research fields that involve NC are artificial neural networks (ANN) [2], swarm intelligence [3], evolutionary algorithms [4], quantum computing [5] and membrane computing (MC) [6]. It might seem that the relationship between computer science and biology is boosting at the moment, with positive impacts for both fields. Biology is a rich source of ideas for creating new solutions and paradigms for computationally hard problems, generating new concepts and hypotheses that help to describe and evaluate different systems. By the end of 1990s, membrane computing (MC) had emerged as a branch of natural computing. The models pertaining to this computational framework, called P systems, were considered as distributed devices inspired by cell structure. Some of these models were also taken off the structure and function of more complex entities such as tissues and organs. MC began as a new experimental paradigm that created a wide range of basic research in theoretical computer science. Membrane systems have been used in recent years as modeling vehicles for different biological systems or as design languages in graphics or to represent a wide variety of parallel systems. Because of its non-determinism and maximum parallelism features, many engineering design optimization problems are solved with MC at a reasonable time [7]. Membrane structure classifies MC systems into cell-like [8], tissue-like [9], and neural-like [10] structures. The latter group is MC's new branch, which is integrated into the P systems (SN P systems) by spiking neurons.

Spiking neural P systems (SN P systems) are bio-inspired neural computation models that are designed by abstracting the way the neurons spike. This means the coordination of biological neurons is accomplished by spikes in central nervous systems. SN P systems come under artificial neural network models of the third generation.

An SN P system can be represented by a direct graph with nodes referring to spiking neurons and edges shape synapses between neurons that can represent SN P systems. SN P systems with the division of neurons [11], budding [12] and spikes and anti-spikes [13] are implemented in many applications and have proven to be efficient and reliable. SN P systems have been viewed as word generative devices of 0’s and 1’s [14]. To solve knapsack problems, an optimization SN P system was introduced to find solutions to the unconstrained single objective optimization problem [15]. It was applied to solve knapsack instances and proved to be efficient. Also, the use of SN P systems can be seen in many smart and expert fields, such as the semantics of deductive database systems [16] and multiples in parallel [17]. It is therefore concluded that SN P systems are more powerful in both theoretical and practical aspects of intelligent and expert systems [18].

The Vehicle Routing Problem (VRP) [19] is the problem of constructing suitable routes from a depot to a range of destinations each with business-specific restrictions, such as vehicle limits, cost controls, time periods and resources limitations related to the depot loading process. If all the services start and end in some fixed time interval, then it is called VRP with time windows (VRPTW) [20]. VRP is known for its computational complexity for which many exact and heuristic algorithms have been proposed, but arriving optimal solutions is still a challenging task. Researchers almost define some basic information about the locations and demands of customers, available vehicles, and so on. In most VRP studies, these are fully known before the service plan is carried out. However, VRP is dynamic (DVRP) [21] in actual service processes; that is, the demands and arrangements of customers are gradually changing over time, although some of the demands of customers may be known beforehand. Also, since 2014, most of the researchers explore green VRP (GVRP) [22], a new variant of VRP, which focuses on overcoming difficulties of limited refueling infrastructure and minimizing CO2 emissions.

A meta-heuristic suggested by Yang in 2008 is inspired by nature is based on fireflies ' flashing actions, which serves as a signal system for attracting other fireflies. As can be seen in several surveys [2325], since its proposal, the FA has been applied successfully to many different fields of optimization problems. Nevertheless, in the current scientific community, it still attracts a lot of attention. Since FA has been designed for continuous optimization problems, there are only very few applications of FA in VRP instances. This lack of work, together with the minimal computational overhead and the good performance that the FA has demonstrated since its proposal, has motivated its use in this study.

The aim of this paper is to develop an efficient spiking neural P system with modified rules, potentials and learning capabilities to solve the combination of all of the above-mentioned variants, called dynamic green VRP with time windows (DGVRPTW), which is more complex because of the additional constraints and objectives added. The suggested SN P system here is a multilayer SN P system used for the region prediction of dynamic customers followed by an FA for route optimization.

The main contributions of the proposed work are:

  1. 1.

    A mathematical model of the DGVRPTW has been formulated with some added constraints which optimize the refueling capacity of vehicles and CO2 emissions.

  2. 2.

    A multilayer SN P system with learning and training has been designed with activation rules on synapses for tackling dynamism.

  3. 3.

    An ideal arrangement of neurons in the SN P system is presented which carries information and does the processing accurately.

  4. 4.

    A new way of encoding fireflies and distance calculation is adopted for optimizing routes inside each cluster.

The remaining sections are methodized as follows. Section 2 carries out a detailed literature survey on the problem. Section 3 talks about classical FA. Section 4 presents details about DGVRPTW, dynamism followed by the mathematical formulation of the problem. Section 5 gives the design of SN P systems with potentials. Section 6 gives a detailed description of different modules, structure, and functioning of the proposed system. Experimental analysis and results are given in Sect. 7. Finally, the conclusion and future scope of the work are presented in Sect. 8.

2 Literature review

2.1 Dynamic vehicle routing problem: survey

Technology innovation in various fields has fostered the development of intelligent transportation systems (ITS), using the advent of geo-location techniques with geographic information systems [26]. The object of vehicle routing is an administrative function linked to the competence of dispatch as well as the cost of optimization directly dependent on the size of the fleet [27]. The state-of-the-art schemes in ITS use an optimization technique coupled with some heuristic search techniques to attain a solution.

Additional constraints such as regulations on waiting, service and travel times characterize the transportation of people. Cab system is the most common individual online transportation system where the requests from the customers are made up of pickup time and location coupled with the drop location. There are many variants available, such as advance booking, sharing, etc. [28,29,30,31].

Owing to the highly variable travel time, factors such as congestion, rivalry, and cooperation between transport companies should be taken into consideration. All these applications fall under the logistics of the region. Effective scheduling of real-time traffic and dynamic routing of a fleet of vehicles require additional modules and attributes. For better trade-off between dynamic travel and service times, a decision support system (DSS) can be used [32,33,34].

In courier transport systems, not only request locations, time windows and capacity constraints should be taken into account, but also traffic data and prohibited paths should also be considered. The Automatic Fleet Management System (AFMS) enabled optimization has been considered to solve these types of problems [35,36,37].

Customer satisfaction is an important factor in FMS. Delivery of newspaper is an example of such a type of domains where complaints will be filed in the event of delivery delay. Centralized applications were proposed to reduce costs and improve customer satisfaction, using cell phones to communicate continuously with drivers at the same time as routing [38, 39].

The retailer designs different customers in grocery delivery systems that can be considered within a fixed time span. At the same time, when the capacity constraints are violated, it is made inaccessible to the customers. A customer's time slots are actively designed based on future in-home delivery request issues [40]. Greedy randomized adaptive search procedure (GRASP) and adaptive large neighborhood search (ALNS) have been proposed that consider uncertainty by introducing scenarios for possible realizations of demand. Problems with dynamic stacker cranes [41, 42] which considered container carriers with loading and unloading vessels are undergoing operational study. However, factories and hospitals are other areas of operation where goods or medical devices have to be moved, respectively [43].

The capacity constraints can be exceeded in the service domain. The dynamic traveling salesman problem [44] is one of the most illustrative models in this domain. Certain fields provide activities for repair and maintenance, winter gritting, etc. A daily visit to customer locations is mandatory in the first groups. A customer can also order for a new service in the event of an emergency. A context is that the French non-profit organization operates under the call of patients with a crew of physicians supported by certain emergency services. A subset of streets or roads that are impacted by the storm must be gritted in winter gritting applications. Depending on the path of the storms, the new sections have to be gritted [45].

2.2 Green vehicle routing problem: survey

Green-VRP (G-VRP) research focuses on optimizing transport energy consumption. The cost of gas accounts for a large part of the total transportation cost of petroleum [46]. The most obvious course of action would be to reduce fuel consumption and improve transportation performance at the operational level. It is also important that a reduction in fuel consumption dependent on petroleum will substantially reduce greenhouse gas emissions [47]. To include fuel consumption in the routing model, it is important to formulate computing fuel consumption in relation to the condition of a moving vehicle. In [48], the authors considered a more practical transport cost impacting both the load of the vehicle and the length of the traveled arc. They describe the Energy Minimizing Vehicle Routing Problem (EMVRP) as the CVRP with a new cost objective, where the cost function is a combination of the total load (including the empty vehicle weight) and the arc duration. A model of fuel consumption is given in [49]. They suggested a Fuel Consumption Rate (FCR) known to be CVRP (FCVRP), which would extend CVRP to reduce fuel consumption. In [50], the authors applied the transport speed to the fuel consumption measurement model in time-dependent VRPs in addition to the transport range and the charging weight discussed in the above two articles. Other VRP-related studies that aim at minimizing total fuel consumption are also been designed [51,52,53,54]. This design seeks to eliminate the possibility of running out of fuel with the goal of reducing the total distance. Stegner [55] extended time-windowed G-VRP. The above papers on GVRP state that fuel consumption merely provides the formula for calculating fuel consumption, assuming that the fuel is sufficient to cover the entire tour.

State-of-the-art schemes have only considered the DVRP without any additional constraints such as CO2 emissions, fuel consumptions etc. Also, it lacks concrete learning and training methods to predict the dynamic regions for customers. Instead they make use of schedulers to convert a dynamic requests to static requests over time periods and then send to the optimization procedures for optimal solutions.

The proposed system is no way similar to the schemes in literature. The authors try to approach the problem from a different perceptive. Here, the maximum parallelism feature of SN P systems are coupled with learning and training methods in order to predict the dynamic customers on arrival rather than converting it to static requests by schedulers. Thus, the proposed system replaces the schedulers and predicts the region of customers based on the trained data. Moreover, it is for the first time, the SN P systems are employed for such specific tasks with learning and training functions in VRP domain, which highlights the novelty of the system.

3 Classical firefly algorithm

CFA was first introduced in [56], which was based on the flashing behavior of fireflies for drawing in prey and mates. The main rules used in FA are:

  1. 1.

    Fireflies will be attracted to each other regardless of their gender.

  2. 2.

    As the separation between the two fireflies’ increases, their attraction decreases. For any two fireflies, the less bright one will move towards the brighter one. In the event that the brightness is the same or less, it will move arbitrarily.

  3. 3.

    The landscape of the objective function is dependent on the brightness of the firefly.

The attraction of a firefly depends upon the light intensity and this relation can be expressed by the following formula:

$$ \beta \left( r \right) = \beta_{0} D, \;{\text{where}} \;D = e^{{ - \gamma r^{2} }} $$
(1)

where \(\gamma\) is the light absorption coefficient.

Also, the distance between two fireflies at positions i and j is calculated by:

$$ r_{ij} = \left| {\left| {X_{i} - X_{j} } \right|} \right| $$
(2)

The movement of a less bright firefly i towards a brighter one j can be calculated by:

$$ X_{i}^{{{\text{new}}}} = X_{i}^{{{\text{old}}}} + \beta \left( {r_{ij} } \right)\left( {X_{j} - X_{i} } \right) + \alpha \left( {{\text{rand}} - 0.5} \right) $$
(3)

where \(\alpha\) a randomization parameter and rand is a random number within the uniform distribution over [0, 1].

4 Problem description and formulation

4.1 Dynamic green vehicle routing problem with time windows

Given a complete graph G with nodes representing client locations, depot and fuel stations, the DGVRPTW finds a set of routes, each of which starts and ends at depot, visit a set of clients within the specified time window without violating the vehicle capacity and fuel tank capacity with minimum total distance by each of the vehicles. Because of the recent improvements in real-world communications, ‘dynamism’ would be very significant. In dynamic routing, not all pieces of information on the routing process are available when it begins, and the information may change even after the routes have been constructed. Figures 1 and 2 show the route planning procedure in the static and dynamic environment, respectively. The dotted lines in Fig. 2 show the new path constructed based on the dynamic request from the customer D. The F1 and F2 are fuel stations.

Fig. 1
figure 1

Static environment route planning (at t = t0)

Fig. 2
figure 2

Dynamic environment route planning (at t = t1)

4.2 Degree of dynamism

The performance of dynamic routing is calculated based on the number of dynamic demands and the time when these demands occur, while static VRP depends on the number and distribution of clients. If we examine the performance of a specific algorithm under constraints, the measure of' dynamism' would be important. The reaction time of 2 customers with time windows under dynamic situation is given in Fig. 3, where r1 and r2 are reaction times of customer 1 and customer 2, respectively.

Fig. 3
figure 3

Reaction time of two customers in dynamic scenarios with a time window

In case of time windows (\({t}_{w})\), the effective dod (edod) can be defined as:

$$ \begin{gathered} e_{{{\text{dod}}}} = \frac{1}{n}\mathop \sum \limits_{k = 1}^{n} \left( {\frac{{T - \left( {l_{k} - t_{k} } \right)}}{T}} \right) + t_{w} \hfill \\ \quad \;\; = \frac{1}{n}\mathop \sum \limits_{k = 1}^{n} \left( {1 - \frac{{r_{k} }}{T}} \right) + t_{w} \hfill \\ \end{gathered} $$
(4)

where 0 \(\le {e}_{dod}-{t}_{w}\le 1\), \({l}_{k}-{t}_{k}\le T, k=\mathrm{1,2},..n\), where n is the total number of requests and the planning horizon is defined in the interval l[0, T].

4.3 The mathematical formulation of the problem

DGVRPTW is defined based on a complete graph G = (V, E), where V = {1, 2, M, M + 1} denotes the vertex set and E = {{(i, j)| i, j \(\in \) M, i ≠ j} denotes the edge set. Here, Vcust \(=\hspace{0.17em}\{\mathrm{1,2},\dots M\}\) is the customer set of vertices. M + 1 is the depot vertex. The method of delivery of the commodity to the set of customers is carried out using the same set of M vertices with a fixed maximum capacity Q for each vehicle. The depot D is fitted with a collection of n vehicles.

Each vertex i \(\in \) V is associated with several measures such as traveling cost/distance denoted as \({C}_{ij ,}, \left(i,j\right)\in E\) and \(i \in M, j \in M\), a request\({r}_{i}\), time of service by vehicle k, \({t}_{ik}\) and the time window [\({e}_{i }, {l}_{i}\)], where \({e}_{i}\) represents the earliest time and \({l}_{i}\) represents the latest time when the service at vertex i can starts. Also, the speed function f is defined by p non decreasing levels \({p}^{f}\) over set S = {1, 2, p}. So, the problem is to find the minimum total traveling cost subject to vehicle capacity, time window, and fleet size that visit each customer in the set \({V}_{cust}\). But, while servicing the customers, each vehicle emits carbon dioxide (CO2) which is proportional to the total fuel used by vehicles.

So, the main objectives of DGVRPTW are to find routes with minimal total traveling cost, total fuel consumption and CO2 emissions subject to the following conditions:

The vehicle leaves the depot D and returns to D after completion of customer service.

  1. 1.

    The number of vehicles used at the depot cannot exceed the size of the fleet.

  2. 2.

    Each customer is served once by one vehicle.

  3. 3.

    The service to the customers must start with the time window [\({e}_{i }, {l}_{i}\)]

  4. 4.

    The total capacity of each vehicle cannot exceed Q

  5. 5.

    The vehicle’s early arrival is allowed, but the vehicle must wait for the service to be delayed to time \({e}_{i}\) but not later than \({l}_{i}\)

  6. 6.

    Emissions of CO2 are proportional to the overall vehicle fuel consumption

The mathematical model of DGVRPTW involves a group of decision variables that model the sequence of vehicles visiting customers and the following are defined:

$$ y_{ijk} = \left\{ {\begin{array}{*{20}l} {1,\quad {\text{if}}\;{\text{vehilce}}\;k\;{\text{travels}}\;{\text{from}} \;i\;{\text{to}}\;j} \hfill \\ {0,\quad {\text{otherwise}}} \hfill \\ \end{array} } \right. $$
(5)
$$ x_{ijk} = \left\{ {\begin{array}{*{20}l} {1, \quad {\text{if}}\;{\text{vehicle}}\;k\;{\text{traverses}} \;{\text{edge}}\;\left( {i,\;j} \right)\;{\text{with}}\;{\text{speed}}\;f} \hfill \\ {0, \quad {\text{otherwise}}} \hfill \\ \end{array} } \right. $$
(6)

\(P_{ijk } \in R +\): the total amount of flow on each edge \(\left( {i,j} \right) \in V\).

\(Z_{i} \in R +\): the time which service starts at node \(i \in V\).

The mathematical model of the problem can be formulated as:

Minimize.

  • The total traveling cost performed by each of the vehicles.

    $$ \mathop \sum \limits_{i \in V } \mathop \sum \limits_{j \in V, i \ne j } \mathop \sum \limits_{k \in N} C_{ij} y_{ijk} $$
    (7)
  • Minimize the total fuel usage and CO2 emissions

  • The total fuel usage of each of the vehicles is directly proportional to the CO2 emissions. The fuel rate [57] can be expressed by:

    $$ {\text{Fuel}}\;{\text{rate}} = \xi \left( {\omega GV + {\raise0.7ex\hbox{$P$} \!\mathord{\left/ {\vphantom {P \psi }}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$\psi $}}} \right)/k, $$
    (8)

where \(\xi \)-fuel to air ratio;

\(\omega -\) engine friction factor.

G-engine speed.

V-engine displacement.

\(\psi \) and k are constants.

P-engine power output per second which is given by:

$$ P = P_{{{\text{total}}}} /\varphi_{tf} + P_{re} ,\;{\text{where}}, $$
(9)

\(P_{re}\)—engine power needed for running losses of the engine.

\(\varphi_{tf}\)—vehicle drive train efficiency.

\(P_{{{\text{total}}}}\)—total tractive power in kilowatts

$$ \begin{gathered} \mathop \sum \limits_{i\smallint V} \mathop \sum \limits_{j\smallint V, i \ne j} \mathop \sum \limits_{k \in N} \omega GV\lambda C_{ij} \mathop \sum \limits_{f \in S} {\raise0.7ex\hbox{${x_{ijk}^{f} }$} \!\mathord{\left/ {\vphantom {{x_{ijk}^{f} } {r^{p} }}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{${r^{p} }$}} + \mathop \sum \limits_{i\smallint V} \mathop \sum \limits_{j\smallint V, i \ne j} \mathop \sum \limits_{k \in N} \omega \lambda \alpha_{ij} C_{ij} \delta y_{ijk} + \mathop \sum \limits_{i\smallint V} \mathop \sum \limits_{j\smallint V, i \ne j} \mathop \sum \limits_{k \in N} \lambda \alpha_{ij} C_{ij} \delta P_{ijk} \hfill \\ + \mathop \sum \limits_{{i\smallint V_{{{\text{cust}}}} }} \mathop \sum \limits_{{j\smallint V_{{{\text{cust}}}} , i \ne j}} \mathop \sum \limits_{k \in N} \beta \delta \lambda C_{ij} \mathop \sum \limits_{f\smallint S} x_{ijk}^{f} \left( {r^{f} } \right)^{2} \delta y_{ijk} \hfill \\ \end{gathered} $$
(10)

subject to:

  1. 1.
    $$ \sum\limits_{{j \in V_{{{\text{cust}}}} }} {\sum\limits_{k \in N} {y_{ijk} } } \le \left| N \right| $$
    (11)
  2. 2.
    $$ \sum\limits_{j \in V} {\sum\limits_{K \in N} {y_{ijk} = 1,\;i \in V_{{{\text{cust}}}} } } $$
    (12)
  3. 3.
    $$ \sum\limits_{i \in V} {\sum\limits_{k \in N} {y_{ijk} } } = 1,\;j \in V_{{{\text{cust}}}} $$
    (13)
  4. 4.
    $$ \sum\limits_{{d\int {V_{M + 1} } }} {\sum\limits_{{j\int {V_{{{\text{cust}}}} } }} {y_{djk} } } \le 1,\;k \in N $$
    (14)
  5. 5.
    $$ \sum\limits_{{d \in V_{M + 1} }} {\sum\limits_{{j \in V_{{{\text{cust}}}} }} {y_{idk} } } \le 1,\;k \in N $$
    (15)
  6. 6.
    $$ \sum\limits_{{i \in V_{M + 1} }} {\sum\limits_{{j \in V_{M + 1} }} {y_{ijk} } } = 0,\;k \in N $$
    (16)
  7. 7.
    $$ \sum\limits_{j \in V} {P_{jik} } - \sum\limits_{j \in V} {P_{ijk} } = t_{i} ,\;i \in V_{cust} ,\;k \in N $$
    (17)
  8. 8.
    $$ \sum\limits_{f \in S} {x_{ijk}^{f} } = y_{ijk} ,\;i,\;j \in V,\;k \in N $$
    (18)
  9. 9.
    $$ y_{ijk} \in \left\{ {0,\;1} \right\},\;i,\;j \in V,\;K \in N $$
    (19)
  10. 10.
    $$ P_{ijk} \in R^{ + } ,\;i,\;j \in \left\{ {V,\;k \in N} \right\} $$
    (20)
  11. 11.
    $$ x_{ijk}^{f} \in \left\{ {0,1} \right\}, i,j \in V,k \in N,f \in S $$
    (21)

5 Design of spiking neural P systems with potentials

A variation in the SN P system is designed to select rules in a simple manner appropriately as in [58]. A small rule modification of the rule has been done in order to introduce the firing process in the proposed system. To this extent, we do not include spikes as the regular SN P systems, yet it is believed that each neuron contains a potential that can be communicated as a rule. If the potential is less than or equal to its threshold potential, each neuron fires. At that point, the neuron will devour a fraction of the potential and the remaining potential is transmitted to the neighboring neurons by the neurotransmitters, which may increase or decrease with a weight in the positive and negative form, respectively.

$$ \Pi \, = \, \left( {A, \, \sigma_{1} \ldots ..\sigma_{m} , \, Sn, \, fn, \, In, \, Ot} \right) $$

Where

  1. 1.

    A is the alphabet set.

  2. 2.

    σ 1…..σ m, are neurons of the tuple σ i = ( Pi, γi, Ri) 1 ≤ i ≤ m, where:

    1. (1)

      Piis the potential that is initially given to neuron σi

    2. (2)

      γi is a real number called output weight of neuron σ i;

    3. (3)

      Ri is the rule set in σ i of the form:

      • \(E/ a^{k} \to a;d,\) where E is a regular expression over the alphabet A, k ≥ 1 and d ≥ 0. At some point in time, t, if neuron σi contains c spikes and ∈ \({a}^{c}\)L (E), c ≥ k, then this rule will be activated, consuming k spikes, sending one spike to each of the neighboring neurons after d time steps. At the step t + d, the neuron again becomes active and can receive spikes and is able to apply the rule at t + d + 1.

      • Thi/dj → 1, f, where j = 1, 2… ri, for some ri ≥ 1, where Thi ∈ Rc, and f ≥ 0, Thi ≥ 0 is the potential of neuron σi and dj ∈ Rc with the constraint 0 < dj < Thi.

  3. 3.

    Snt is the synapses at step t of the form (j, k, γjk). γjk is the weight of the synapse connecting the neurons σ j and σ k and is used to amplify the signals. If the synapse gets only one spike, then the weight γjk is multiplied by 1 and sends to the receptor neuron. Generally Sn ⊆ {1,2,….m} x {1,2,….m} x Rc are neurotransmitters between the neurons of the system where i ≠ j, k ≠ 0, for (i, j, k) ∈ Sn.

  4. 4.

    In ∈ {σ 1…σ m} is the input set of neurons.

  5. 5.

    Ot ∈ {σ 1… σ m} is the output set of neurons.

    The information in the form of spike trains is read by the input neurons from the environment. Based on the spike train configurations it spikes and passes information to the synapses. The output neurons are responsible for emitting spikes to the environment and In ∩ Ot = ɸ.

  6. 6.

    Learning function fn is used to amplify or weaken the synapses during the activations.

Note that there are no forgetting rules inside each neuron yet it is controlled by adjusting the neuron’s potentials.

6 Proposed system design

State-of-the-art algorithms use either ANN to group customers into different clusters using x and y coordinate positions that represent customers’ geo locations or they use any clustering technique to group customers based on locations. In ANN, the neural synapses are weighted by weight vectors. In all those cases, the dynamic requests from customers are converted into static requests by an event scheduler. Density-based models have also been used to identify the nearest neighbor based on the Euclidean distance. But, the degree of dynamism was very low when compared to the real-time routing scenario. In order to fix this problem, a multilayer spiking neural P system is designed, which will predict the new customer coordinates based on the trained dataset. The trained model computes the results and maps the dynamic customers to appropriate regions. The route optimization in each region is performed by FA. The outline of the proposed system is illustrated in Fig. 4. It contains 2 modules:

  • Customer coordinate agent

  • Route planning agent

Fig. 4
figure 4

Agent outline of the proposed system

The customer coordinate agent handles the dynamic request from customers. It invokes the SN P system to predict the dynamic coordinate region and this information will be transferred to the route planning agent. The route planning agent performs optimization of routes using FA and then the routes are combined. That is, the function of the route planning agent is to perform optimal routing inside each cluster/region in order to optimize the objectives described in Sect. 4.3. If we use a multi-agent system including vehicle agent, dynamic agent, etc., the performance can be greatly improved. A detailed block diagram of the proposed system is given in Fig. 5. The inputs of the system are customer coordinates, demands of the customers and time windows. After passing this information through all the modules, finally, we get the optimized routes for each of the vehicles as output. A step by step description of the same has been given in the following sections.

Fig. 5
figure 5

Block diagram of the proposed system

6.1 Static clustering of customer coordinates

Here, initially, k means clustering is used to map the known customer coordinates into regions. The travel distance and travel time are also considered. The k value is evaluated for each dataset using the elbow method [59]. After obtaining the result of the elbow method for each dataset, the location of the customer region is obtained by the k-mean method.

6.2 Spiking neural P system modeling

A set of SN P systems with modified rules and training are designed to handle the problem. There are mainly two modules in the proposed SN P system:

  • Read module

  • Cluster module

Read module: The read module consists of a total of 42 neurons that read the input spike trains to the system. Representation of geo-location coordinates by spike trains is detailed in Sect. 6.3.1. The structure of the SN P system read module is given in Fig. 6. It consists of 42 neurons (40 neurons in general and two neurons, one for reading the input spike train and another for spike generation). The process of inputting spike train s = s1s2…s20, where, si \(\in \) {0, 1} is as follows:

Fig. 6
figure 6

Spiking neural P system for reading module

Initially, all the neurons potential is set to 0 and has no spike, except the σspike − generator, which is having one spike at the start and the potential is set to 0.5. The threshold potential for all the neurons in the input module is 1 and is maintained throughout the reading process so that it will be activated whenever the spiking rule is satisfied. So, at t1, σspike − generator fires, passing one spike to \({\sigma }_{r1}\). Simultaneously σinput fires and reads s (s1) from the input spike train. If s1 = 1, then the σinput spikes, passing one spike to the neurons \({\sigma }_{vi, }, i\in \) {1, 2, …., 20}. Also, \({\sigma }_{r1}\) fires, passing on a spike to \({\sigma }_{r2}\) and \({\sigma }_{v1}\), enabling the spiking rule in \({\sigma }_{v1}\). Then \({\sigma }_{v1}\) fires and store a spike in \({\sigma }_{z1}\). In order to clear the previous spikes, we set the potential of neurons to be equal to one until the next activation by σinput. Now, \({\sigma }_{z1}=1\), representing s1 = 1.

If s1 = 0 σinput is disabled at step 1. Since it is disabled, \({\sigma }_{r2}\) is deactivated by setting the potential equal to 1. So. \({\sigma }_{z1}=0\). The spike train s can be read by bits in a step by step and keep it in \({\sigma }_{zi, i}\in \) {1, 2, …., 20}. When the reading finishes, a spike from \({\sigma }_{r20}\) goes back to σspike − generator by passing all the \({\sigma }_{ri, }i\in \) {1, 2, …., 20} returns to the initial configuration and is enabled for reading the next spike train. The weight values are set to 1 in the reading stage.

Cluster module: The structure of the cluster module is shown in Fig. 7. The \({\sigma }_{zi, i}\in \){1, 2, …, 20} is used for the design of SN P systems for clustering. The 20 \({\sigma }_{zi, i}\in \){1, 2, 20} neurons from the output of the reading module together with the k output neurons (here k = 4), a total of 24 neurons are used in the cluster module. The spiking rule is used to process the information, whereas the spiking activity is controlled by the potential rule. If a neuron contains n spikes and the potential is enabled, it spikes for n times in n steps. A three-layer structure is designed for passing and processing information. The input layer, the intermediate layer, and the output layer. The input layer and the intermediate layer are fully connected. The connections between the intermediate layer and the output layer are specially formulated.

Fig. 7
figure 7

Spiking neural P system for cluster module

The \({\sigma }_{z1}\) and \({\sigma }_{z10}\) are chosen as two input neurons, \({\sigma }_{zi, }\) i = {3, 9, 12, 13, 15, 18} as intermediate neurons and the remaining neurons as output neurons. The neurons in the intermediate layer and the outer layer are divided into 2 and 4 groups respectively. There are 12 neurons in the output layer in which each of the 3 neurons is grouped and colored by red, green, blue and yellow respectively. Finally, 4 output neurons are designed to take the information from the 4 groups of output layers. Neuron \({\sigma }_{zi, i}\in \) {1, 2,…, 20} will fire for n times if there are n spikes, passing one spike each time. The weight of the synapses in the cluster module is initialized by 1 and updated by the function \({f}_{reLu}\) (Eq. 22). Thus the 4 output neurons σouti, i = {1, 2, 3, 4} have spikes and are being counted. A 4-dimensional vector of a number of spikes (NS) is obtained after finishing the firing process. Also, the potential values of the 4 output neurons are recorded. So, we get a 4 dimensional ordered pair (\({NS}_{i}, {P}_{i})\) vector, where I \(\in \) {1, 2, 3, 4} for each of the input spike train combinations.

For each of the synapses in the cluster module, an activation rule is defined in the form:

(f, d) where,

  1. 1.

    f- activation function as defined in Eq. 22.

  2. 2.

    d = delay

    $$ = \left\{ \begin{gathered} \hfill 0,\quad {\text{if}}\;{\text{the}}\;{\text{spikes}}\;{\text{are}}\;{\text{emitted}}\;{\text{immediately}} \\ \hfill 1,\quad {\text{if}}\;{\text{the}}\;{\text{spikes}}\;{\text{are}}\;{\text{emitted}}\;{\text{immediately}} \\ \end{gathered} \right. $$
$$ f_{{{\text{reLu}}}} \left( x \right) = \left\{ {\begin{array}{*{20}c} {0, {\text{if}} \;x \le 0} \\ {x, {\text{if}} \;x > 0} \\ \end{array} } \right. $$
(22)

6.3 Mapping dynamic coordinates using spiking neural P system

The processes involved in mapping dynamic coordinates are described below.

6.3.1 Encoding customer coordinates by spike trains

In this section, the way to encode customer coordinate values by spike trains is discussed. For each customer coordinate (x, y), 10 binary digits are used for each of the x and y values. If the coordinates involve fractions, a round to next integer calculation is performed for easier processing. With 10 bits, a maximum of 512 km in both x and y axes can be considered. If one needs to cover more distance, more bits can be added. For example, if a customer is located at (52, 128) then the corresponding spike train will be encoded as:

$$ \left( {52, \, 128} \right)\, = \,0,000,110,1000,010,000,000 $$

The most significant 10 digits represent the x coordinate value and the least significant 10 digits represent the y coordinate value of a customer.

6.3.2 The process of mapping

The process of mapping coordinates involves mainly 4 steps. It is depicted in Fig. 8.

Fig. 8
figure 8

Block diagram of mapping dynamic requests

Coordinate input: The customer coordinates x and y are given as spike trains to the input neuron σinput, σx_inputandσy_input from the environment. Initially, the potential of the input neuron is set to 0.5 and the threshold potential is set to 1. So, it is always been enabled and spike. After consuming a fraction of the potential, the remaining potential is emitted to the neighboring neurons with the spike, there it may get added or subtracted. In this way, the input information will be passed from the input layer to the output layer.

Training: When the input module reads the spikes, the \({\sigma }_{zi, i}\in \) {1, 2, 20} neurons will contain a certain number of spikes. The potential of the neuron is managed by a potential adapter [60]. So, when, \({\sigma }_{zi, i}\in \) {1, 2, 20} has spikes, it fires if the potential is less than that of the threshold potential. During the firing process, the weight of each of the synapses will be updated by the learning function as defined in Eq. 22. Thus the spikes will be passed through the synapses and thus the potential of all of the participant neurons will be changed. Finally, it reaches the output layer. The ‘k’ neurons (for each cluster evaluated by elbow method) in the output layer is associated with a potential and can emit spikes to the environment. At the end of the computation, we obtain a topological structure, which is a set of values in the form of (\({NS}_{i}\), \({P}_{i}\)) where i \(\in \) {1, 2,…k}. We run each coordinate input 10 times and compute the average. Thus for k clusters, k trained SN P systems will be obtained. These trained systems will be used to cluster unknown coordinates. Since the size of the training set of each cluster used in the proposed work is within a finite limit w, the bound to the weights on each synapse is not more than w. For SN P systems, the synapse weight should be limited. So the worst-case bound to the weight of the synapse is O (w3). The difference between known and unknown coordinates inputs are simplified using a variance.

Generating standard output combinations: For an input (x–y coordinates of the customer), the set of spikes trains are read into the k trained SN P systems. With this coordinate input, each of the systems starts computing. Finally, we will get a k dimensional vectors of the form (NSi, \({P}_{i}\)), where i \(\in \) {1, 2… k}. We run the coordinates for a maximum of 10 iterations and computed the combination of an average number of spikes generated and the average potential for each of the k output neurons. It is given by:

$$ N_{{{\text{avg}}}} = \mathop \sum \limits_{j = 1}^{{\max - {\text{iter}}}} \mathop \sum \limits_{i = 1}^{k} {\raise0.7ex\hbox{${{\text{NS}}_{ji} }$} \!\mathord{\left/ {\vphantom {{{\text{NS}}_{ji} } {\max - {\text{iter}}}}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{${\max - {\text{iter}}}$}} $$
(23)
$$ P_{{{\text{avg}}}} = \mathop \sum \limits_{j = 1}^{{\max - {\text{iter}}}} \mathop \sum \limits_{i = 1}^{k} {\raise0.7ex\hbox{${P_{ji} }$} \!\mathord{\left/ {\vphantom {{P_{ji} } {\max - {\text{iter}}}}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{${\max - {\text{iter}}}$}} $$
(24)

where max-iter = 10 is used in the proposed system.

Clustering unknown coordinates: If a new customer with a dynamic request arrives, its coordinate values are read into each of the k trained SN P systems and allow them to start processing. It will produce a k dimensional vector (NSi, \({P}_{i}\)) where i \(\in \) {1, 2, k} after the computation halts for each of the k output neurons which are the number of spikes and the corresponding potential respectively. Then it calculates the variance between the output vectors of unknown coordinates and known output vectors using the formula,

$$ {\text{Var}}_{{{\text{NS}}}} = \sqrt {\mathop \sum \limits_{i = 1}^{k} {\text{NS}}_{{{\text{unknown}}\left( i \right)}} - N_{{{\text{avg}}\left( i \right)}} } $$
(25)
$$ {\text{Var}}_{P} = \sqrt {\mathop \sum \limits_{i = 1}^{k} P_{{{\text{unknown}}\left( i \right)}} - P_{{{\text{avg}}\left( i \right)}} } $$
(26)

where \({\text{NS}}_{{{\text{unknown}}\left( i \right)}}\), i \(\in \) {1, 2… k} is the k output vectors of unknown coordinates and \(N_{{{\text{avg}}\left( i \right)}}\), i \(\in \) {1, 2… k} is the average output vector of known trained coordinates. The one with the lowest value of variance is considered as the final result.

6.4 Firefly encoding

The route optimization is done by FA. The main processes are described as follows.

6.4.1 Motivation

Though Most of the existing methods generally perform well in almost all circumstances where they are implemented, there is still room for further modifications of new methods since current methods have some disadvantages in terms of computational complexity or in applications. An optimization approach that is well suited for VRP and its variants is ACS [61]. Authors have tried many combinations with ACS such as ACS with TS [62], PSO [63], and variable neighborhood search (VNS) [64], etc. But major limitations of ACS include premature convergence and getting trapped into local optima.

On the other hand, FA has acquired high popularity; it has been rarely applied in VRP. The process of encoding fireflies in discrete optimization problems is a major issue. But, due to its multimodality nature, more diversified solution space can be obtained that is very useful for routing problems. So, it was felt that appropriate initialization and distance calculation of FA might be more effective in solving routing problems. Keeping this in mind, we adopted a different firefly encoding and distance calculation suitable for the application.

6.4.2 Overview

Although FA was designed for continuous optimization problems, it has been applied to many discrete optimization problems after some modifications. But the quality of solutions depends on the initial pool of feasible solutions. Also in VRP, the duplication of customers may result in the position update strategy of FA. So it involves the maintenance overhead of infeasible solutions.

The initial feasible solution is defined by Clarke and wright algorithm [65]. Also, the fireflies are encoded as a set of customers with index denoting the visiting order. The shaded structures are used to denote the depot. Figures 9 and 10 show different firefly representations of two different tours.

Fig. 9
figure 9

Representation of tour 1

Fig. 10
figure 10

Representation of tour 2

The connection distance Cij between two fireflies is defined by:

$$ C_{ij} = \left\{ \begin{gathered} 0,\quad {\text{if}}\;{\text{node}}\;i\;{\text{is}}\;{\text{connected}}\;{\text{to}}\;{\text{node}}\;j\;{\text{in}}\;F1\;{\text{then}}\;{\text{in}}\;F2\;{\text{node}}\;i\;{\text{should}}\;{\text{be}}\;{\text{connected}}\;{\text{to}}\;{\text{node}}\;j \hfill \\ 1,\quad {\text{Otherwise}} \hfill \\ \end{gathered} \right. $$
(27)

The total distance between two fireflies is the sum of the connection distances between the two fireflies F1 and F2.

Figure 11 shows the two different firefly representations F1 and F2. The distance between F1 and F2 is 9 which is the sum of distances of all the connections which are connected to different connections in two fireflies. It is depicted in Fig. 12.

Fig. 11
figure 11

Firefly F1 and firefly F2

Fig. 12
figure 12

Distance calculation of two fireflies (individual distance is represented by blue lines)

7 Experimental results and discussions

In this section, the performance of the proposed system to solve DGVRPTW will be rigorously assessed. The system is tested on a benchmark dataset of DVRPTW instances and GVRPTW instances separately and the results are analyzed. Various parameter settings are given in Table 1. All the experiments are simulated using MATLAB with Intel xenon 2.93 GHz processor, 12 GB RAM and Windows 10 OS.

Table 1 Parameter settings

7.1 Case study 1: DVRPTW dataset

The proposed system is tested by an average of 40 runs on each of the benchmark dataset, which are open and available at http://neo.lcc.uma.es/vrp/. A comparison of the solution quality in terms of the best and average values of the proposed system is done with state-of-the-art schemes: ACO, k-ACO and E-ACO [20], VNS [66], GA-DVRP [67], and is tabulated in Table 2. The best results are in boldface. The proposed system achieves 17 out of 20 best values and 15 out of 20 average values compared with other best-known ACO variants. The K-ACO attains the worst performance compared to the other algorithms in terms of best and average solutions.

Table 2 Comparison of the best and the average values

The solution improvement percentage of the proposed system is compared with other schemes considering k-ACO as the benchmark scheme. It is noted that the proposed system achieves the highest solution improvement. It is given in Fig. 13.

Fig. 13
figure 13

Comparison of solution improvement percentage of various schemes (K-ACO as the benchmark scheme)

The effect of dynamism with different dod (1/3,1/2 and 2/3) under time windows is analyzed. The best and average values under different dod are given in Figs. 14 and 15, respectively. It is noted that the distribution of best and average results is relatively balanced and stable with different edod. So, we concluded that the proposed system is stable and robust.

Fig. 14
figure 14

Comparison of best solutions based on different degrees of dynamism

Fig. 15
figure 15

Comparison of average solutions based on different degrees of dynamism

7.2 Case study 2: GVRPTW dataset

The GVRPTW benchmark dataset is taken from EI bouzekri [68], which is having a range of customers from 10 to 300. The depot is identified as [0, 0] and the customer coordinates are included in the region [0,100] km. The capacity of the vehicle is set as 25000 kg. The load volumes of customers randomly belong to [500 kg, 2500 kg] and the service time of customers is fixed at 15 min. The service period should range in [8 h, 18 h] and the average speed of the vehicle is fixed as 80 km/hr.

To make the static GVRPTW benchmark dataset dynamic, we performed the following:

  • Appearance time- for each order, the time of working day, when the request becomes known to the SN P system.

  • The number of vehicles is set to 30 for each instance.

We performed a comparison of the influence of dynamism on minimizing CO2 emissions by vehicles, and the solution quality is compared by minimizing the total emissions at the static and dynamic environment. From the analysis of the result, it is found that the CO2 emissions and fuel consumption are high in dynamic requests due to the dynamic allocation of vehicles to corresponding customers within the time frame regardless of the location. In static case, the routes have already been known to the depots so no re-rerouting need to be performed once starts serving customers. It is detailed in Fig. 16. The quality of solutions versus CO2 emissions is analyzed in a static and dynamic environment and is tabulated in Table 3. It is noted that the solution quality is less for dynamic instances compared to static instances. A comparison of the vehicle utilization rate of various schemes is plotted in Fig. 17.

Fig. 16
figure 16

Analysis of the influence of dynamism in CO2 emissions

Table 3 Analysis of solution quality versus CO2 emissions under static and dynamic conditions
Fig. 17
figure 17

Comparison of the vehicle utilization rate of different schemes

7.3 Case study 2: large-scale dataset

Most of the state-of-the-art algorithms use small and medium datasets to test the performance of DVRP. The authors found only one scheme, EACO, which has used the Kelly data, having a scale of 225–480. The SFO [69] algorithm has been simulated on Kelly dataset to compare the performance. Then the proposed system is compared with the ACO variants and SFO on Kelly dataset on best and average solution quality and is given in Fig. 18. From the analysis of the result, it is noted that the proposed scheme is superior in arriving best and average solution values compared to other systems.

Fig. 18
figure 18

Comparison of best and average solution values of various schemes on Kelly dataset

7.4 Nature of convergence of proposed system

The convergence rate of the proposed system has been examined by taking two instances from Kelly Dataset, namely "Kelly09" and "Kelly10," and is given in Fig. 19a and b, respectively. The green line and blue line, respectively, indicate the best and average solution values with the number of iterations. For Kelly 09, the proposed scheme started to converge in 20 iterations while 30 in EACO and SFO [70] schemes. For Kelly 10 instance also the same has noticed. From the analysis of solution values, it has found that the proposed system is superior to other best-known schemes in finding the best and average solutions. So, it is concluded that the proposed system arrives at the optimal solution in a short time.

  1. 1.

    For problems having around 5001000 scale datasets, it is found that the proposed scheme can find optimal solutions in 30 iterations.

  2. 2.

    After analyzing the results, the following conclusions have been gained:

  3. 3.

    The performance of the proposed system is superior to the schemes ACO, KACO, EACO, VNS, and GADVRP in terms of solution values (Table 2).

  4. 4.

    The proposed system is having the best vehicle utilization over other schemes (Fig. 17).

  5. 5.

    The solution improvement percentage of the proposed method is almost 30%, while other schemes have below 20 improvements (Fig. 13).

  6. 6.

    The proposed scheme works well for large-scale datasets as well compared to existing schemes in terms of solution values (Fig. 18).

  7. 7.

    It is noted that the distribution of best and average results is relatively balanced and stable with different edod. So, we concluded that the proposed system is durable and robust (Figs. 14, 15).

  8. 8.

    It is scalable since it shows excellent performance in large-scale datasets. Also, the convergence rate is high (20 iterations) compared to existing schemes (Fig. 19).

Fig. 19
figure 19

Nature of convergence of proposed system ((a) Kelly 09 and (b) Kelly 10)

7.5 Statistical analysis

A statistical analysis has been performed over the algorithms considered for DVRPTW datasets. A pairwise t-test has been done to check the significance of the proposed scheme. The null hypothesis for the paired t-test is assumed as the true mean difference between paired samples is zero. On the other hand, the alternative hypothesis is that the true mean difference between paired samples is not zero. The level of significance is 0.05. Since all pairs have significantly less than 0.05, the null hypothesis is rejected. The results of the pairwise t-test are given in Table 4. The results show that there are significant differences between these pairs of algorithms.

Table 4 Pairwise T-test statistics

After analyzing the results so far, the algorithms are ranked on the basis of the average solution quality obtained over 20 test instances. A score is assigned for each of the problem instances as per the Holm-Bonferroni [70] ranking scheme. The p values are compared with the calculated values and analyzed the system performance. The results are given in Table 5. The significance is given as ‘Accepted’ if the null hypothesis (no difference in performance between the considered algorithms) is true otherwise ‘Rejected’ if the null hypothesis is false.

Table 5 Holm–Bonferroni ranking (proposed scheme as the reference with rank 5.15)

The results summarized above appear competitive with respect to state-of-the-art algorithms.

8 Conclusions and future scope

As spiking neural P systems are computing models, having maximum exploration and exploitation capabilities, they are adequate for arriving global optimal solutions in a feasible time. In this paper, we proposed a multilayer SN P system with potentials and activation functions, which does region mapping of customers in DGVRPTW, followed by a route optimization by FA. So, the major concern is to focus on improving dynamic measures which is well tackled by the proposed scheme. Apparently, other measures such as convergence rate and vehicle utilization have also been improved. It is tested on benchmark instances and proved the efficiency and reliability of the system.

The advantages of proposed system are as follows:

  1. 1.

    The performance of the proposed system is competitive to the other schemes in terms of best and average solution quality.

  2. 2.

    The solution improvement percentage of the proposed system is high compared to state-of-the-art schemes.

  3. 3.

    The quality of solutions is high under static conditions, and the amount of CO2 emissions is found to be less.

  4. 4.

    The distribution of best and average results is the same under different dod, and thereby, the system is stable and robust.

  5. 5.

    A balance between solution quality and time impediments has been found on large-scale dataset, which highlights the scalability of the system

Even though the system works better than state-of-the-art schemes, there is still room for further improvements in terms of number of vehicles and distance metric. As future research directions, we may focus on improving the solution quality further along with other optimization constraints as well.