1 Introduction

In distributed systems in which autonomous agents interact on behalf of their owners, negotiation activities are essential for resolving differences and conflicting goals [11] and for controlling and managing resources [30, 34, 38]. Automated negotiation among agents has been widely used to support e-commerce and is also becoming increasingly important for managing massive distributed computational systems such as Grid/Cloud computing systems because interactions between participating agents can occur in many different contexts. Whereas there are many existing negotiation agents for e-commerce (e.g., [4, 22]), Grid resource management (e.g., [2, 20, 30, 31, 34, 38]) and Cloud resource management (e.g., [33, 35, 36]), (1) most of these negotiation agents are designed to reach an agreement consisting of coinciding proposals of participating agents, and (2) each agent’s decision to reach an agreement focuses on optimizing the value of the proposal (typically price) only, without consideration of reaching a consensus more rapidly (i.e., the participating agents do not consider optimizing negotiation speed). However, there are practical negotiation applications with time constraints in which both obtaining the cheapest possible resources and obtaining them rapidly are essential (e.g., negotiations for Grid or Cloud resources). In such applications, obtaining resources more rapidly is one of the most desirable properties, depending on the negotiation participants’ preferences, because any delay incurred while waiting for negotiations as well as for resources can be perceived as an overhead. Even though there is much existing research (e.g., [14, 16, 27]) that focuses on developing multi-attribute negotiation mechanisms to deal with different attributes (i.e., issues of negotiation such as price, quality, quantity, delivery time, etc.) of participating negotiation agents, there is little research that considers the duration of a negotiation (i.e., negotiation speed) as a factor affecting performance for time-constrained negotiations [32]. Whereas the (negotiation) success rate (i.e., the chance of successfully finding a mutually acceptable agreement) is the main consideration for negotiation agents that operate in domains without very stringent constraints on time, negotiation speed (as well as success rate) is an important consideration for agents that operate in domains with very stringent constraints on time. This is because negotiation agents that consider enhancing negotiation speed can make agreements quickly (by sacrificing expected utility on issues of negotiation) and can therefore also obtain higher success rates in negotiations in such time-constrained domains. In this regard, designing negotiation agents that consider negotiation speed and finding efficient (or optimal) negotiation strategies for such agents are the main focuses of this work. Even though this work currently deals with negotiation considering negotiation speed based on a single issue (i.e., price), it can be extended to deal with multi-attribute negotiation considering negotiation speed.

In this work, the agents focusing on optimizing price only and those optimizing both price and negotiation speed are denoted as price optimizing (P-optimizing) and price and speed optimizing (PS-optimizing) agents, respectively. PS-optimizing agents were first proposed and considered in [32]. To illustrate the targeted negotiation applications with examples, consider the following negotiation scenarios in which: (1) there are two types of self-interested PS-optimizing negotiation agents (i.e., they act so as to maximize their own outcomes) called a seller (or provider) and a buyer (or consumer) and (2) each seller and buyer has different preference criteria for optimizing both price and negotiation speed. The preference criteria of the seller can be classified into the following two cases.

  1. (1)

    The seller prefers to sell (or provide) a resource/service at a higher price than the given (expected) agreement price, at the expense of having to wait longer than the given (expected) agreement time. Such a seller is denoted as more price optimizing (more-P-optimizing).

  2. (2)

    The seller prefers to sell (or provide) a resource/service more rapidly than the given (expected) agreement time, perhaps by providing its resource/service at a lower price than the given (expected) agreement price at an earlier negotiation round. Such a seller is denoted as more speed optimizing (more-S-optimizing).

Similarly, the preference criteria of the buyer can also be classified into the following two cases.

  1. (1)

    The buyer prefers to acquire cheaper resource/service alternatives than the given (expected) agreement price, at the expense of having to wait longer than the given (expected) agreement time. Such a buyer is denoted as more-P-optimizing.

  2. (2)

    The buyer prefers to acquire a resource/service more rapidly than the given (expected) agreement time, perhaps by paying a higher price than the given (expected) agreement price at an earlier negotiation round. Such a buyer is denoted as more-S-optimizing.

To adequately address such negotiation problems, negotiation agents called PS-optimizing agents should be designed to: (1) determine the solution space SS PS-opt for PS-optimizing negotiation consisting of (i) the solution space SS NP for optimizing price (in which different possible preference criteria of price can be represented) and (ii) the solution space SS NS for optimizing negotiation speed (in which different possible preference criteria of negotiation speed can be represented), (2) appropriately optimize both price and negotiation speed in SS PS-opt for the given various combinations of possible preference criteria (of agents), and (3) make successful agreements in various negotiation situations. To this end, the impetus of this work is to devise mechanisms for finding effective PS-optimizing negotiation strategies (of agents) which result in reasonable PS-optimizing negotiation outcomes.

Based on the information that agents have about their opponents (i.e., the other participating agents), negotiation (parameter) settings can generally be classified into two types: (1) a complete information setting, in which (participating) agents share their private information with their opponents, and (2) an incomplete information setting, in which agents do not share their private information with their opponents. We denote the agent operating under a complete information setting (respectively, an incomplete information setting) as the agent with complete information (respectively, the agent with incomplete information). Following the above definitions, PS-optimizing agents are also divided into two categories based on the negotiation settings that they adopt. That is, a PS-optimizing agent under a complete information setting knows its opponent’s private information while a PS-optimizing agent under an incomplete information setting does not. Further details of the negotiation models for P-optimizing negotiation and PS-optimizing negotiation will be described and compared in Sect. 2.

The existing preliminary works in [32] and [9] attempted to find negotiation strategies for PS-optimizing agents with incomplete information using coevolutionary learning based on evolutionary algorithms (EAs). Nevertheless, the results in [32] and [9] showed that: (1) there are possibilities of coevolution failure using the fitness function defined in [32] due to the ambiguity in the utility space, and (2) converged coevolution results cannot be achieved in some cases using the conventional EAs adopted in [32] and [9]. Furthermore, in [32] and [9], there was no theory explaining and supporting the optimality of the achieved results. To overcome these drawbacks and to complement and enhance the existing PS-optimizing agents, this work will design: (1) PS-optimizing agents for performing effective PS-optimizing negotiations under a complete information setting (Sect. 4.1) and (2) mechanisms for finding effective negotiation strategies of PS-optimizing agents with incomplete information (Sect. 4.2) by using coevolutionary learning (described in Sect. 3) adopting estimation of distribution algorithms (EDAs).

A series of experiments (see Sect. 5) was carried out to: (1) show the effectiveness of coevolutionary learning for finding effective negotiation strategies of PS-optimizing agents with incomplete information and (2) compare the performance of coevolutionary learning adopting S-EDAs against that adopting ID2C-EDAs. Empirical results in Sect. 5 show that ID2C-EDAs can coevolve effective converged negotiation strategies that are close to the optimum for both PS-optimizing agents in most of the cases. While Sect. 6 compares this work with existing works, Sect. 7 concludes this paper by summarizing a list of contributions and future work.

2 Negotiation models

This work considers a bilateral negotiation model between two self-interested agents with conflicting interests: the seller (S) wishes to provide a good or service at the highest possible price while the buyer (B) wishes to purchase the good or service at the cheapest possible price. We first investigate one of the most widely used P-optimizing negotiation models for optimizing price only. Then, the P-optimizing negotiation model will be extended to the PS-optimizing negotiation model that is capable of optimizing both price and negotiation speed using preferences of price and negotiation speed.

2.1 Price optimizing negotiation model

In the P-optimizing negotiation model, there are three key elements of negotiation [15]: (1) the negotiation protocol, (2) the negotiation strategies that the agents adopt during the negotiation process, and (3) the utility functions for the agents. The agents adopt Rubinstein’s alternating offers protocol [26] and negotiate by exchanging proposals with their negotiation partners. The alternating offers protocol is simple but it is the most influential general negotiation protocol. Furthermore, it has been applied to many existing works (e.g., see [6, 21, 41]). At each alternate round, an agent makes and sends a proposal. Then, the other agent evaluates the proposal and takes one of the following actions: (1) accepting the proposal, (2) rejecting the proposal, or (3) making a counter proposal. Negotiation between the two agents terminates with an agreement when an offer or a counter-offer is accepted or with a conflict if no agreement is reached when one of the two agents’ deadlines is reached. An agreement is reached when one agent proposes a deal that matches or exceeds what another agent asks for.

The agent x∈{B,S} generates a proposal at a negotiation round t, 0≤t≤τ x , as follows:

$$ P_{t}^{x} = \mathit{IP}_{x} + ( - 1)^{\alpha} \biggl( \frac{t}{\tau_{x}} \biggr)^{\lambda _{x}}| \mathit{RP}_{x} - \mathit{IP}_{x} |, $$
(1)

where α=1 for S and α=0 for B. IP x is the initial price of x that is the most favorable price for x, and RP x is the reserve price that is the least favorable price for x; τ x is the deadline and λ x , 0≤λ x ≤∞, is the time-dependent strategy of x. During the negotiation process, starting from the initial prices, successive proposals of S are monotonically decreasing while successive proposals of B are monotonically increasing.
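
As an illustration of (1), the following sketch (with our own illustrative function and variable names, not code from any referenced system) computes an agent’s proposal at round t; a conservative λ (>1) keeps proposals close to IP x for most of the negotiation.

def propose(role, t, tau, ip, rp, lam):
    # Time-dependent proposal of Eq. (1): role is 'S' (seller) or 'B' (buyer),
    # t is the current round (0 <= t <= tau), ip/rp are IP_x/RP_x and lam is lambda_x.
    alpha = 1 if role == 'S' else 0        # alpha = 1 for S, alpha = 0 for B
    return ip + (-1) ** alpha * (t / tau) ** lam * abs(rp - ip)

# Example: a conservative seller (lambda = 2) with IP_S = 95, RP_S = 15, tau_S = 50
# proposes propose('S', 10, 50, 95, 15, 2.0) = 95 - (10/50)**2 * 80 = 91.8 at round 10.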

As shown in Fig. 1, for each agent x, the possible range of price, [IP x ,RP x ], is denoted as the acceptability zone for price of x, \(\mathit{AccZ}_{x}^{\mathit{NP}}\), and the possible range of negotiation time, [0,τ x ], is denoted as the acceptability zone for negotiation time of x, \(\mathit{AccZ}_{x}^{\mathit{NT}}\). The negotiation solution space (NSS) for the negotiation between B and S consists of: (1) the agreement zone of price (AgZ NP), or sometimes called the price-surplus, which is the overlapping region between \(\mathit{AccZ}_{B}^{\mathit{NP}}\) and \(\mathit{AccZ}_{S}^{\mathit{NP}}\), and (2) the agreement zone of negotiation time (AgZ NT) which is the overlapping region between \(\mathit{AccZ}_{B}^{\mathit{NT}}\) and \(\mathit{AccZ}_{S}^{\mathit{NT}}\). In Fig. 1, AgZ NP is [RP S ,RP B ] and AgZ NT is [0,min{τ B ,τ S }].

Fig. 1 Example of negotiation solution space between B and S

Time-dependent negotiation strategies are adopted in which the negotiation agents make successive proposals depending on the remaining negotiation time. The concession behavior of x is determined by the values of the time-dependent strategy and is classified as follows [28, 29, 37]:

  1. (1)

    Conciliatory (0<λ x <1): x makes larger concessions in earlier negotiation rounds and smaller concessions in later negotiation rounds.

  2. (2)

    Linear (λ x =1): x makes a constant rate of concession.

  3. (3)

    Conservative (1<λ x <∞): x makes smaller concessions in earlier negotiation rounds and larger concessions in later negotiation rounds.

Let D be the event in which x fails to reach an agreement. The utility function of x is defined as U x :[IP x ,RP x ]∪D→[0,1] such that U x (D)=0 and for any \(P_{t}^{x} \in [\mathit{IP}_{x},\mathit{RP}_{x}]\), \(U_{x}(P_{t}^{x}) > U_{x}(D)\) in which \(U_{x}(P_{t}^{x})\) is given as follows:

$$ U_{x}\bigl(P_{t}^{x}\bigr) = u_{\min} + (1 - u_{\min} ) \biggl( \frac{\mathit{RP}_{x} - P_{t}^{x}}{\mathit{RP}_{x} - \mathit{IP}_{x}} \biggr), $$
(2)

where u min is the minimum utility that x receives for reaching an agreement at RP x , and the value of u min is set larger than 0. u min is set to 0.001 in this work for experimental purposes. Then, at \(P_{t}^{x} = \mathit{RP}_{x}\), U x (RP x )=0.001>U x (D)=0.
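
For concreteness, a minimal sketch of (2) follows (the helper name is ours); it applies to both B and S because the numerator and denominator change sign together.

def price_utility(p, ip, rp, u_min=0.001):
    # Utility of agent x for an agreement price p in [IP_x, RP_x]; Eq. (2).
    # Failure (the event D) is handled outside this helper and yields utility 0.
    return u_min + (1.0 - u_min) * (rp - p) / (rp - ip)

# price_utility(rp, ip, rp) == u_min == 0.001 > 0 == U_x(D), as stated above.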

Definition 1

(P-optimizing Agent)

For a given negotiation setting, a P-optimizing agent is designed to optimize the price only by maximizing the utility in (2).

A negotiation between P-optimizing agents is denoted as the P-optimizing negotiation. Self-interested P-optimizing agents B and S favor an agreement that maximizes their own (price) utilities given in (2) at an agreement price.

In P-optimizing negotiations between B and S, finding their optimal negotiation strategies plays an important role in the sense that by adopting optimal negotiation strategies, both achieve optimal negotiation outcomes (i.e., optimal agreement prices). In determining optimal negotiation strategies for P-optimizing negotiations under complete information settings, the deadline effect is the most important factor. This is because if one P-optimizing agent has a longer deadline than the other, the agent with the longer deadline will dominate the whole negotiation. Since the strategy of the agent with the longer deadline determines whether both agents can reach an agreement before their deadlines, that agent has a (significant) bargaining advantage in terms of time over the other agent.

For a P-optimizing negotiation under a complete information setting, an agent knows the other agent’s private information such as RP and deadline. Therefore, the optimal agreement price (\(P_{c}^{P\text{-}\mathit{opt}}\)) and agreement time (\(T_{c}^{P\text{-}\mathit{opt}}\)) for the P-optimizing negotiation between B and S can be analyzed by the following theorems.

Theorem 1

[40, pp. 199–200]

If the P-optimizing agent B has a longer deadline than the P-optimizing agent S, \(P_{c}^{P\text{-}\mathit{opt}}\) is RP S and \(T_{c}^{P\text{-}\mathit{opt}}\) is τ S .

Proof

Since the minimal possible agreement price for B is RP S , at which B obtains its maximal utility, \(P_{c}^{P\text{-}\mathit{opt}}\) is made at RP S . Whatever strategy S adopts, S concedes to RP S at τ S following (1). Before reaching τ S , the utility of S’s proposals for B will be lower than the utility at τ S . Furthermore, B fails to reach an agreement after τ S . Hence, \(T_{c}^{P\text{-}\mathit{opt}}\) is made at τ S . □

Theorem 2

[40, pp. 199–200]

If the P-optimizing agent S has a longer deadline than the P-optimizing agent B, \(P_{c}^{P\text{-}\mathit{opt}}\) is RP B and \(T_{c}^{P\text{-}\mathit{opt}}\) is τ B .

Proof

Symmetrically, \(P_{c}^{P\text{-}\mathit{opt}}\) is made at RP B and \(T_{c}^{P\text{-}\mathit{opt}}\) is made at τ B because at these values S obtains its maximal utility. □

Finally, using the obtained \(P_{c}^{P\text{-}\mathit{opt}}\) and \(T_{c}^{P\text{-}\mathit{opt}}\) from Theorems 1 and 2, the optimal P-optimizing negotiation strategy of B (\(\lambda_{B}^{P\text{-}\mathit{opt}}\)) and the optimal P-optimizing negotiation strategy of S (\(\lambda_{S}^{P\text{-}\mathit{opt}}\)) are derived from (1) by substituting \(P_{c}^{P\text{-}\mathit{opt}}\) for \(P_{t}^{x}\) and \(T_{c}^{P\text{-}\mathit{opt}}\) for t, respectively, as follows:

$$ \lambda_{B}^{P\text{-}\mathit{opt}} = \frac{\ln ( ( P_{c}^{P\text{-}\mathit{opt}} - \mathit{IP}_{B} ) / ( \mathit{RP}_{B} - \mathit{IP}_{B} ) )}{\ln ( T_{c}^{P\text{-}\mathit{opt}} / \tau_{B} )}, $$
(3)
$$ \lambda_{S}^{P\text{-}\mathit{opt}} = \frac{\ln ( ( \mathit{IP}_{S} - P_{c}^{P\text{-}\mathit{opt}} ) / ( \mathit{IP}_{S} - \mathit{RP}_{S} ) )}{\ln ( T_{c}^{P\text{-}\mathit{opt}} / \tau_{S} )}. $$
(4)

Theorems 1 and 2 are based on Theorems 1 and 2 in [40, pp. 199–200].
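
A small sketch of how (3) and (4) can be evaluated is shown below; it simply solves (1) for λ, and the numeric values reuse the example parameters of Sect. 4.1 purely for illustration (function name ours).

import math

def optimal_lambda(p_c, t_c, tau, ip, rp):
    # Solve Eq. (1) for lambda given a target agreement price p_c and time t_c:
    # (t_c / tau)**lambda = |p_c - ip| / |rp - ip|.
    return math.log(abs(p_c - ip) / abs(rp - ip)) / math.log(t_c / tau)

# Theorem 1 case (B outlives S): P_c = RP_S, T_c = tau_S. With IP_B = 5, RP_B = 80,
# tau_B = 100, RP_S = 15 and tau_S = 50:
# optimal_lambda(15, 50, 100, 5, 80) = log(10/75) / log(0.5) ≈ 2.91.
# (For the agent whose own deadline equals T_c, (t_c/tau)**lambda = 1 for any lambda,
# so this substitution does not constrain that agent's strategy.)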

2.2 Price and speed optimizing negotiation model

The proposed PS-optimizing negotiation model also considers price as a negotiation issue similarly to the P-optimizing negotiation model in Sect. 2.1. Therefore, the agents participating in PS-optimizing negotiations exchange offers or counter-offers that consist of price proposals only—but not time proposals or both price and time proposals—during the negotiation process. However, compared to the P-optimizing negotiation model designed to take only price into consideration in optimizing negotiation outcomes, the PS-optimizing negotiation model is designed to take both price and negotiation speed (in terms of the number of negotiation rounds) into consideration in optimizing negotiation outcomes.

Definition 2

(PS-optimizing Agent)

For a given negotiation setting, a PS-optimizing agent is designed to optimize both price and negotiation speed (by maximizing the total utility consisting of both price and speed utilities) using its given preferences of price and negotiation speed.

A negotiation between PS-optimizing agents is denoted as the PS-optimizing negotiation.

The PS-optimizing negotiation model also has three key elements similarly to the P-optimizing negotiation model. The PS-optimizing agents also adopt Rubinstein’s alternating offers protocol and time-dependent negotiation strategies as in P-optimizing agents. However, for each PS-optimizing agent, there are two types of utility functions: (1) one designed to measure the degree of satisfaction for price and (2) the other designed to measure the degree of satisfaction for negotiation speed. Depending on the strategies that a PS-optimizing agent adopts, there can be a variety of possible PS-optimizing negotiation outcomes. Hence, this research will focus on designing PS-optimizing agents B and S and finding their optimal PS-optimizing negotiation strategies for achieving optimal PS-optimizing negotiation outcomes for their given preferences of price and negotiation speed.

In addition to the three key elements, the PS-optimizing negotiation model has one more key element: the preferences of price and negotiation speed (for each PS-optimizing agent). With regard to the preferences of price and negotiation speed of a PS-optimizing agent, different preference criteria such as optimizing price and optimizing negotiation speed are individually modeled as corresponding weightings of price and negotiation time, respectively; for a PS-optimizing agent x, the preference for optimizing price is denoted as \(w_{\mathit{NP}}^{x}\) and the preference for optimizing negotiation speed is denoted as \(w_{\mathit{NS}}^{x}\). Based on the user’s preferences for optimizing price and optimizing speed, \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\) are provided by the user with the constraint \(w_{\mathit{NP}}^{x} + w_{\mathit{NS}}^{x} = 1.0\) where \(w_{\mathit{NP}}^{x} \ge 0\) and \(w_{\mathit{NS}}^{x} \ge 0\). \(w_{\mathit{NP}}^{x} + w_{\mathit{NS}}^{x}\) is set to 1.0 because the preference criteria of x are interdependent and conflict with each other; if x prefers to achieve negotiation outcomes that are more P-optimizing at the expense of waiting longer, then x will put more emphasis on \(w_{\mathit{NP}}^{x}\) and less emphasis on \(w_{\mathit{NS}}^{x}\). Conversely, if x prefers to achieve its negotiation outcome more rapidly at the expense of conceding more in price, then x will put more emphasis on \(w_{\mathit{NS}}^{x}\) and less emphasis on \(w_{\mathit{NP}}^{x}\). Depending on different preference criteria, agents can be summarized into the following three representative groups [32]:

  1. (1)

    (Totally) P-optimizing agents in which total emphasis is given for optimizing price such as \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = (1.0, 0.0)\).

  2. (2)

    (Totally) S-optimizing agents in which total emphasis is given for optimizing negotiation speed such as \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = (0.0, 1.0)\).

  3. (3)

    PS-optimizing agents in which emphases are given for optimizing both price and negotiation speed such as \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = \{\text{the weightings except }(1.0, 0.0)\text{ and }\allowbreak (0.0, 1.0)\}\).

However, (totally) S-optimizing agents are not considered because such S-optimizing agents model the situation in which agents totally optimize negotiation speed without consideration of optimizing price. This negotiation situation is not realistic in practice because such S-optimizing agents generally reach an agreement without any negotiation by simply accepting their opponent’s first proposal. Hence, the possible region of preferences of price and negotiation speed is set as \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x})=[(1.0, 0.0), (0.0, 1.0))\).

With regard to the various possible combinations of preference criteria of PS-optimizing agents, there are three representative groups of PS-optimizing agents: (1) agents placing equal emphasis on optimizing price and negotiation speed, such as \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = (0.5, 0.5)\), are denoted as exact-PS-optimizing agents, (2) agents placing more emphasis on optimizing price than exact-PS-optimizing agents are denoted as more-P-optimizing agents, and (3) agents placing more emphasis on optimizing negotiation speed than exact-PS-optimizing agents are denoted as more-S-optimizing agents.

Compared with P-optimizing agents, the PS-optimizing agent x requires two types of utility functions: (1) a price utility function for measuring the degree of satisfaction in terms of price and (2) a speed utility function for measuring the degree of satisfaction in terms of negotiation speed. The price utility function \(U_{\mathit{NP}}^{x}\) for the given input P x (for price) and the speed utility function \(U_{\mathit{NS}}^{x}\) for the given input T x (for negotiation time) are defined as follows:

$$ U_{\mathit{NP}}^{x}(P_{x}) = u_{\min}^{P} + \bigl(1 - u_{\min}^{P}\bigr) \biggl( \frac{\mathit{RP}_{x} - P_{x}}{\mathit{RP}_{x} - \mathit{IP}_{x}} \biggr), $$
(5)
$$ U_{\mathit{NS}}^{x}(T_{x}) = u_{\min}^{S} + \bigl(1 - u_{\min}^{S}\bigr) \biggl( \frac{\tau_{x} - T_{x}}{\tau_{x}} \biggr), $$
(6)

where \(U_{\mathit{NP}}^{x}(P_{x}) \in [0, 1]\) and \(U_{\mathit{NS}}^{x}(T_{x}) \in [0, 1]\). \(u_{\min}^{P}\) is the minimum utility that x receives for a deal at its RP, and \(u_{\min}^{S}\) is the minimum utility that x receives for a deal at its deadline. For experimental purposes, the values of \(u_{\min}^{P}\) and \(u_{\min}^{S}\) are set to 0.0001. Next, to obtain the composite utility of x consisting of both \(U_{\mathit{NP}}^{x}\) and \(U_{\mathit{NS}}^{x}\), the following total utility function \(U_{\mathit{Total}}^{x}\) is used:

$$ U_{\mathit{Total}}^{x}( P_{x},T_{x} ) = w_{\mathit{NP}}^{x} \times U_{\mathit{NP}}^{x}( P_{x} ) + w_{\mathit{NS}}^{x} \times U_{\mathit{NS}}^{x}( T_{x} ) $$
(7)

where P x ∈{0,P c } and T x ∈{0,T c }. If x does not reach an agreement before its deadline, then \(U_{\mathit{Total}}^{x} = 0\) because \(U_{\mathit{NP}}^{x} = U_{\mathit{NS}}^{x} = 0\). If x reaches an agreement at P x =P c and T x =T c , then \(U_{\mathit{Total}}^{x}(P_{c},T_{c}) > 0\) because \(U_{\mathit{NP}}^{x} > 0\) and \(U_{\mathit{NS}}^{x} > 0\).
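
A minimal sketch of (5)–(7) follows (helper names ours, assuming the linear utility forms above):

def total_utility(p, t, ip, rp, tau, w_np, w_ns, u_min_p=0.0001, u_min_s=0.0001):
    # Composite utility of Eq. (7) for an agreement at price p and time t.
    u_np = u_min_p + (1.0 - u_min_p) * (rp - p) / (rp - ip)   # price utility, Eq. (5)
    u_ns = u_min_s + (1.0 - u_min_s) * (tau - t) / tau        # speed utility, Eq. (6)
    return w_np * u_np + w_ns * u_ns

# If no agreement is reached before the deadline, U_NP = U_NS = 0 and hence
# U_Total = 0; that failure case is handled outside this helper.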

The remaining task is to design PS-optimizing agents and to find their optimal negotiation strategies to achieve optimal PS-optimizing negotiation outcomes under both complete and incomplete information settings. In designing PS-optimizing agents, the design goal is as follows:

Design goal

The ultimate design goal of PS-optimizing agents is to achieve optimal PS-optimizing negotiation outcomes satisfying the preferences of price and negotiation speed under given negotiation settings. Even when a PS-optimizing negotiation cannot achieve optimal negotiation outcomes, the performance of the PS-optimizing negotiation in optimizing the preferences should be superior to or (at least) equal to that of the P-optimizing negotiation. The latter case is denoted as the minimum performance requirement in this work.

The similarities and differences between P-optimizing agents (Sect. 2.1) and PS-optimizing agents (Sect. 2.2) are as follows. (1) Both negotiation models adopt Rubinstein’s alternating offers protocol as the negotiation protocol. Furthermore, agents in the two negotiation models exchange offers or counter-offers that consist of price proposals—but not time proposals or both price and time proposals—for making a mutual agreement. (2) To evaluate the price and negotiation time (of the proposals), a speed utility function as well as a price utility function is required. Accordingly, the total utility function consisting of both price and speed utility functions, associated with the preferences of price and negotiation speed, respectively, is adopted for PS-optimizing agents. However, for P-optimizing agents, the sole utility function equivalent to the price utility function in (2) is used. (3) As the name denotes, the PS-optimizing negotiation model requires an optimization procedure (denoted as the PS-optimization) for optimizing both price and negotiation speed, while the (original) P-optimizing negotiation model in itself has an optimization procedure (denoted as the P-optimization) for optimizing price only. Hence, a PS-optimizing negotiation mechanism that enables rational PS-optimization is required to find effective negotiation strategies of PS-optimizing agents.

In designing the PS-optimizing negotiation mechanism, it is assumed that PS-optimizing agents do not change their preferences of price and negotiation speed upon learning their opponent’s information. This means that the PS-optimizing agents, with their given preferences of price and negotiation speed, are cooperative in optimizing both price and negotiation speed. This assumption makes sense in that PS-optimizing negotiations will not operate well if negotiating agents do not cooperate at all. For instance, if one agent having a bargaining advantage in terms of time tries to achieve higher speed utility without conceding any price utility, there is no reason for its opponent to make an earlier agreement by conceding its speed utility. Furthermore, PS-optimizing agents should be trustworthy in the sense that they cooperate in optimizing negotiation speed without changing their initial preferences of price and negotiation speed during the negotiation process. Another issue in designing PS-optimizing agents is to determine: (1) the value(s) of the preferable agreement price in SS NP using \(w_{\mathit{NP}}^{x}\) and (2) the value(s) of the preferable agreement time in SS NS using \(w_{\mathit{NS}}^{x}\). Determining such a range of values within an effective SS PS-opt (consisting of SS NP and SS NS ) is essential because there can be different realizations of PS-optimizing negotiations depending on these values. The specific details for designing PS-optimizing agents to find optimal negotiation strategies will be described in Sect. 4.

3 Overview of EDAs for coevolutionary learning

EDAs, sometimes called probabilistic model building genetic algorithms (PMBGAs), have become one of the new paradigms within genetic and evolutionary computation research [17, 42]. Like other EAs based on ideas borrowed from genetics and natural selection, such as genetic algorithms (GAs), evolutionary strategies (ESs) and evolutionary programming (EP), EDAs also use selection to choose good candidate solutions and successively evolve a population of the selected solutions until some termination criteria are satisfied. However, to evolve a population of promising solutions, EDAs build probabilistic models of the selected solutions and sample useful genetic information (i.e., good offspring) from the probabilistic models instead of using variation operators such as crossover and mutation. From the perspective of the fitness landscape (i.e., the distribution of fitness values, with its peaks and valleys, over the solution space), while EAs such as GAs, ESs and EP search promising regions (i.e., solutions) of the fitness landscape with both exploitation and exploration using genetic operators (i.e., selection and variation operators), EDAs search these regions by exploiting feasible probabilistic models and efficiently traversing the solution space [1, 25].

This section demonstrates the application of EDAs to solve the coevolutionary problem of finding optimal negotiation strategies of PS-optimizing agents operating under an incomplete information setting. First, S-EDA is presented. Then, ID2C-EDA incorporating S-EDA with a novel diversity controlling technique is presented. Table 1 shows symbols used for the EDAs in this work.

Table 1 Symbols used for the EDAs

3.1 Description of S-EDA

The S-EDA is based on the continuous (i.e., real-coded) univariate marginal distribution algorithm (UMDAc) [17]. The pseudocode of S-EDA is presented as follows.

Step 1. :

Initialization

Generate the initial population P 0 with n P individuals at random;

g←0; cnt←0.

Step 2. :

Selection

g++;

Select a set of promising candidates S g−1 with n S (<n P ) individuals from P g−1.

Step 3. :

Building Model

Estimate the probability distribution \(f_{\mathbf{X}^{g}}(\mathbf{x}^{g})\) from S g−1.

Step 4. :

Sampling Model

Generate offspring O g with n O individuals by sampling \(f_{\mathbf{X}^{g}}(\mathbf{x}^{g})\).

Step 5. :

Replacement

Create a new population P g by replacing some individuals of P g−1 with O g .

Step 6. :

Reinitializing Population and Restarting Evolution

If cnt<CNT max and an inappropriate configuration is detected,

 initialize P g at random;

g←0; cnt++;

Go to Step 2.

Step 7. :

Termination

If the termination criteria are not satisfied,

 go to Step 2.

Else return the best solution found so far.

The main distinguishing feature of S-EDA (compared to other EAs adopting genetic operators) is building a probabilistic model (Step 3) and sampling the model to generate new solutions, i.e., offspring (Step 4). A continuous optimization problem with n variables is considered. The corresponding n-dimensional random variable and one of its possible instances at each generation g are denoted as \(\mathbf{X}^{g} = (X_{1}^{g},X_{2}^{g},\ldots,X_{n}^{g})\) and \(\mathbf{x}^{g} = (x_{1}^{g},x_{2}^{g},\ldots,x_{n}^{g})\). Following UMDAc, S-EDA assumes marginal independence among the variables. Hence, the joint probability distribution of X g follows an n-dimensional normal distribution which is factorized as a product of n independent univariate marginal distributions as follows.

$$f_{\mathbf{X}^{g}}\bigl(\mathbf{x}^{g}\bigr) = \prod _{i = 1}^{n} f_{X_{i}^{g}}\bigl(x_{i}^{g} \bigr). $$

Each variable of X g follows a univariate normal distribution with mean \(\mu_{i}^{g}\) and the standard deviation \(\sigma_{i}^{g}\) as follows:

$$f_{X_{i}^{g}}\bigl(x_{i}^{g}\bigr) = \frac{1}{\sqrt{2\pi} \sigma_{i}^{g}}e^{ - \frac{(x_{i}^{g} - \mu _{i}^{g})^{2}}{2(\sigma _{i}^{g})^{2}}},\quad \mbox{with } i = 1,2,\ldots,n. $$

\(\mu_{i}^{g}\) and \(\sigma_{i}^{g}\) are estimated using maximum likelihood estimation from S g−1 as follows:

$$\mu_{i}^{g} = \frac{1}{n_{S}}\sum_{j = 1}^{n_{S}} \bigl(x_{i}^{g - 1}\bigr)_{j},\qquad \sigma_{i}^{g} = \sqrt{\frac{1}{n_{S}}\sum_{j = 1}^{n_{S}} \bigl( \bigl(x_{i}^{g - 1}\bigr)_{j} - \mu_{i}^{g} \bigr)^{2}}, $$

where \((x_{i}^{g - 1})_{j}\) is the j-th individual in S g−1.

Then, offspring O g are randomly generated by sampling normal random variables from \(f_{\mathbf{X}^{g}}(\mathbf{x}^{g})\).

Another distinguishing feature of S-EDA (compared to UMDAc, as well as other conventional EAs) is reinitializing the population and restarting the evolution (Step 6). This is to escape an inappropriate configuration of the population (where S-EDA can no longer evolve a population containing promising solutions for the future evolution process) through simply restarting the evolution process with a randomly initialized population. In the coevolutionary learning problem (described in Sect. 4.2 in detail), there can be two types of inappropriate population configurations: (1) Type-I error: P g cannot converge to a certain value until G max is reached (i.e., very slow convergence or non-convergence is observed) and (2) Type-II error: premature convergence occurring at early generations (generally, \(g \le G^{\mathit{max\_infeasible\_band}}\)) caused by the domination of inappropriate individuals in P g having fitness values of all 0s or all 1s (which cannot occur in the coevolution problem described in Sect. 4.2), arising from repeated inappropriate random pairing of individuals [25]. If an inappropriate population configuration is detected, S-EDA initializes P g and restarts its evolution procedures. Step 6 is incorporated before testing the termination criteria in Step 7. The S-EDA stops its evolution process and returns the best solution found so far when either of the following conditions is satisfied (Step 7): (1) g=G max and cnt=CNT max, or (2) \(| f_{\mathit{best}}^{g} - f_{\mathit{best}}^{g-1} | < \delta_{\mathit{fit}}\) and \(\operatorname {Var}(P_{g}) < \delta_{\mathit{var}}\).
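
The model-building and sampling core of S-EDA (Steps 2–5) can be sketched as follows; this is a simplified illustration in which the restart logic of Step 6 and the exact replacement and termination rules are omitted, and all names are ours.

import numpy as np

def s_eda_core(fitness, dim, low, high, n_p=100, n_s=50, g_max=200):
    pop = np.random.uniform(low, high, size=(n_p, dim))            # Step 1: random init
    best = None
    for g in range(g_max):
        fit = np.apply_along_axis(fitness, 1, pop)
        order = np.argsort(fit)
        best = pop[order[-1]]
        selected = pop[order[-n_s:]]                               # Step 2: truncation selection
        mu, sigma = selected.mean(axis=0), selected.std(axis=0)    # Step 3: ML estimates (UMDAc)
        pop = np.random.normal(mu, sigma + 1e-12, size=(n_p, dim)) # Step 4: sample offspring
        pop = np.clip(pop, low, high)                              # Step 5: (full) replacement
        if sigma.max() < 1e-6:                                     # crude convergence check
            break
    return best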

3.2 Description of ID2C-EDA

In the authors’ previous works [8] and [10], the novel diversity controlling GA and EDA called ID2C-GA and ID2C-EDA were developed based on a real-coded GA (called S-GA) and S-EDA, respectively. In [8], ID2C-GAs and ID2C-EDAs were used for coevolutionary learning in which the objective is to find optimal (P-optimizing) negotiation strategies for interacting agents with incomplete information. Although this work is similar to [8] and [10] in the sense that both adopt EAs for (a similar type of) coevolutionary learning, they mainly differ in two ways: (1) [8] and [10] focused only on finding negotiation strategies for optimizing price, whereas this work deals with the more difficult problem of finding negotiation strategies that can optimize both price and negotiation speed, and (2) while the fitness functions of [8] and [10] are directly related to the (price) utility function, the fitness functions of this work have an indirect relationship with the (price and speed) utility functions. It is noted that it has been proved theoretically and demonstrated empirically that the performances of GAs and EDAs are very close to each other although they adopt quite different search strategies [17, 25]. Furthermore, it is empirically observed in [10] that: (1) ID2C-GAs and ID2C-EDAs outperform S-GAs and S-EDAs for the coevolutionary learning because ID2C-GAs and ID2C-EDAs have enough capability to overcome premature convergence and to achieve non-biased coevolution results for both populations and (2) ID2C-EDAs ensure better efficacy and reliability in achieving good solutions than ID2C-GAs if the search space is (very) large. For these reasons, this work adopts ID2C-EDAs and S-EDAs for the coevolutionary learning to carry out comparative studies on their coevolution performance. ID2C-EDA adopts a subspace-based dynamic (i.e., adaptive) diversity controlling technique called modified (i.e., improved) diversification and refinement (mDR) and two local improvement methods, namely population repair (PR) and local neighborhood search (LNS). The pseudocode of ID2C-EDA is presented as follows.

Step 1. :

Initialization

Generate the initial population P 0 with n P individuals at random;

g←0; cnt←0.

Step 2. :

Selection

g++;

Select a set of promising candidates S g−1 with n S (<n P ) individuals from P g−1.

Step 3. :

Building Model

Estimate the probability distribution \(f_{\mathbf{X}^{g}}(\mathbf{x}^{g})\) from S g−1.

Step 4. :

Sampling Model

Generate offspring O g with n O individuals by sampling \(f_{\mathbf{X}^{g}}(\mathbf{x}^{g})\).

Step 5. :

Replacement

Create a new population P g by replacing some individuals of P g−1 with O g .

Step 6. :

Diversification and Refinement (DR)

Calculate \(\operatorname {Div}(P_{g})\);

If \(\operatorname {Div}(P_{g}) < \delta_{\mathit{low}}\) or \(\operatorname {Div}(P_{g}) > \delta_{\mathit{high}}\), conduct the following procedures of DR

  1. A.

    Pre-ordering individuals in P g in both fitness and solution spaces;

  2. B.

    Eliminating redundant individuals in P g using the similarity of individuals;

  3. C.

    Calculating BOF i and updating C i (1≤i≤n band );

  4. D.

    Eliminating infeasible bands if \(g > G^{\mathit{max\_infeasible\_band}}\);

  5. E.

    Refining the population using diversified artificial individuals (DAIs)

    1. (a)

      Generating DAIs using BOFs based on population diversity;

    2. (b)

      Injecting the generated DAIs into the population.

Else calculate BOF i and update C i (1≤i≤n band ).

Step 7. :

Population Repair (PR)

Replace some infeasible individuals consisting of an inappropriate population configuration with new individuals randomly generated using the feasible individual list (FI_List).

Step 8. :

Local Neighborhood Search (LNS)

Replace some less feasible individuals (having lower fitness) by the neighborhoods generated from the locally best solution in the population.

Step 9. :

Reinitializing Population and Restarting Evolution

If cnt<CNT max and an inappropriate configuration is detected,

 initialize P g at random;

g←0; cnt++;

Go to Step 2.

Step 10. :

Termination

If the termination criteria are not satisfied,

 go to Step 2.

Else return the best solution found so far.

mDR (Step 6) is the main part of ID2C-EDA and its objective is to achieve individuals (of a population) that are both feasible and diversified through dynamic diversity control of the population. mDR utilizes the robustness of bands (ROBs), in which a band is defined as a distinct (small) fraction of the solution space and the solution space is mapped into bands, each of fixed size Band_Size. A band i is defined as more robust than another band j (i≠j and 1≤i,j≤n band ) if more individuals belonging to i have survived for more generations than those belonging to j. For measuring ROBs of the solution space at each generation g, each band i: (1) counts the band-occupying frequency (BOF i ), which is the accumulated frequency of individuals belonging to i until g is reached, and (2) stores BOF i into the global counter variable C i . mDR operates selectively depending on the population diversity of \(P_{g}\), \(\operatorname {Div}(P_{g})\). mDR operates if \(\operatorname {Div}(P_{g})\) is below the given lowest possible threshold δ low (i.e., \(\operatorname {Div}(P_{g}) < \delta_{\mathit{low}}\)) or is above the given highest possible threshold δ high (i.e., \(\operatorname {Div}(P_{g}) > \delta_{\mathit{high}}\)). Otherwise (i.e., \(\delta_{\mathit{low}} \le \operatorname {Div}(P_{g}) \le\delta_{\mathit{high}}\)), mDR does not operate; in that case, only BOF i is calculated and C i is updated, 1≤i≤n band . mDR has two main functionalities: (1) diversification ensures achieving a sufficiently high population diversity (from A and B in Step 6), and (2) refinement guarantees achieving a refined population consisting of more promising (i.e., feasible and diversified) solutions (from D and E in Step 6) using ROB information (from C in Step 6). The details of mDR are as follows:

  1. A.

    Pre-ordering the population.

    Before eliminating redundant individuals, ordering individuals in P g is carried out in both the fitness and solution spaces. First, individuals in P g are sorted according to their fitness values in decreasing order. Next, if there are some individuals with the same fitness values (or with fitness differences less than the predefined threshold δ fit ), then those individuals are sorted according to their solution values in decreasing order. Then, the ordered population \(P_{g}^{\mathit{Ordered}}\) is obtained.

  2. B.

    Eliminating redundant individuals.

    Domination of redundant (or duplicate) individuals can lead to premature convergence. This is because redundant individuals with a similar structure reduce population diversity; parents with a similar structure can often reproduce offspring with the same (or very similar) structure in the next generation. To avoid such premature convergence, redundant individuals in \(P_{g}^{\mathit{Ordered}}\) are eliminated from \(P_{g}^{\mathit{Ordered}}\) after testing the similarity among individuals at both the fitness and solution levels as follows. If the i-th and j-th individuals (1≤i≤n P and i+1≤j≤n P ) in \(P_{g}^{\mathit{Ordered}}\) have very close fitness values (i.e., their difference is less than the given threshold α) and also very close solution values (i.e., their difference is less than the given threshold β), the j-th individual is considered as the redundant individual and is eliminated from \(P_{g}^{\mathit{Ordered}}\). This similarity test is carried out from i=1 to n P −1. Then, the population \(P_{g}^{\mathit{Eliminated}}\) with high population diversity can be achieved.

  3. C.

    Calculating BOFs for all bands.

    For each band i (1≤i≤n band ), BOF i is calculated from \(P_{g}^{\mathit{Eliminated}}\) and the corresponding C i is updated. By calculating BOFs from \(P_{g}^{\mathit{Eliminated}}\)—not from P g or \(P_{g}^{\mathit{Ordered}}\), both of which can contain redundant individuals—only the individuals ensuring the higher population diversity contribute to calculating BOFs. Therefore, reliable BOFs without redundant information are guaranteed.

  4. D.

    Eliminating infeasible bands.

    If C i of the band i is not updated (i.e., there is no individual belonging to i) during a certain number of generations (\(G^{\mathit{max\_infeasible\_band}}\)), i is considered as an infeasible band and is removed from the feasible band list (FB_List) containing all feasible bands. This procedure operates only if \(g > G^{\mathit{max\_infeasible\_band}}\) because at least \(G^{\mathit{max\_infeasible\_band}}\) generations are required to collect the infeasible band information needed for carrying out such elimination.

  5. E.

    Refining the population.

    Since redundant individuals were eliminated (in step B), diversified artificial individuals (DAIs) are generated and injected into \(P_{g}^{\mathit{Eliminated}}\) in a number equal to the number of eliminated individuals. For generating feasible DAIs using the achieved reliable BOF information, bands belonging to FB_List and having high values of BOFs (i.e., representing high robustness) are considered as promising solution regions for the future evolutionary search. As evolution progresses, more effective BOFs will be achieved because more reliable ROB information will be gathered over all bands at later generations.

    1. (a)

      Each promising DAI is generated randomly in the selected band based on the two modes: (1) In exploration mode (\(\operatorname {Div}(P_{g}^{\mathit{Eliminated}}) < \delta_{\mathit{low}}\)), the bands with the lower BOFs have a higher probability to be selected for generating DAIs and (2) In exploitation mode (\(\operatorname {Div}(P_{g}^{\mathit{Eliminated}}) > \delta_{\mathit{high}}\)), the bands with the higher BOFs have a higher probability to be selected for generating DAIs.

    2. (b)

      The generated DAIs are injected into \(P_{g}^{\mathit{Eliminated}}\).

Finally, in \(P_{g}^{\mathit{Refined}}\), all individuals are ensured to be both feasible and diversified.
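
To make steps A and B above concrete, the following sketch (our own simplification, assuming scalar solutions and a list of (fitness, solution) pairs) sorts the population and drops near-duplicates; the emptied slots would then be refilled with DAIs in step E.

def eliminate_redundant(pop, alpha, beta):
    # pop: list of (fitness, solution) pairs; alpha/beta are the similarity thresholds.
    ordered = sorted(pop, key=lambda ind: (ind[0], ind[1]), reverse=True)   # step A
    kept = []
    for fit, sol in ordered:                                                # step B
        is_redundant = any(abs(fit - f) < alpha and abs(sol - s) < beta
                           for f, s in kept)
        if not is_redundant:
            kept.append((fit, sol))
    return kept    # P_g^Eliminated; len(pop) - len(kept) DAIs are injected later (step E)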

Although reinitializing the population and restarting the evolution (Step 6 of S-EDA and Step 9 of ID2C-EDA) can be a solution for both the Type-I and Type-II errors resulting in domination of infeasible individuals, extremely large overheads (in terms of both computation and time) are inevitable with this procedure. Therefore, PR and LNS are devised to overcome such drawbacks.

PR (Step 7) is devised to prevent domination of infeasible individuals (having fitness values of all 0s or all 1s) due to the Type-II error by replacing infeasible individuals with feasible individuals. Using the feasible individual list (FI_List) consisting of feasible individuals stored in the previous evolution step, PR replaces infeasible individuals with randomly selected individuals from FI_List.

LNS (Step 8) is devised to solve non-convergence or very slow convergence due to the Type-I error by compensating for the degradation of the S-EDA’s search efficiency during the coevolution process. First, LNS generates effective solution candidates from the neighboring bands of the band containing the current local optimal solution. Then, LNS replaces a small number of individuals (denoted as LNS_SIZE) having the lowest fitness values with the generated solution candidates. The injected solution candidates can contribute to accelerating the convergence of the population.
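
A possible realization of LNS is sketched below (our own simplification for scalar solutions; the actual candidate-generation rule in ID2C-EDA may differ):

import random

def local_neighborhood_search(pop, fitness, low, high, band_size, lns_size):
    # Replace the lns_size least-fit individuals with candidates drawn from the
    # band of the current local best solution and its immediate neighbors.
    fit = [fitness(ind) for ind in pop]
    n_band = int((high - low) / band_size)
    best_band = min(n_band - 1, int((pop[fit.index(max(fit))] - low) / band_size))
    candidates = []
    for _ in range(lns_size):
        band = min(n_band - 1, max(0, best_band + random.choice([-1, 0, 1])))
        candidates.append(low + (band + random.random()) * band_size)
    worst = sorted(range(len(pop)), key=lambda i: fit[i])[:lns_size]
    for idx, cand in zip(worst, candidates):
        pop[idx] = cand
    return pop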

Since PR and LNS replace some infeasible and less feasible individuals with more promising solutions, they can be considered as replacement techniques. Whereas the replacement (in Step 5) is the technique that simply replaces some individuals of P g−1 with some individuals from O g to create P g , PR and LNS are adaptive techniques that replace some infeasible and less feasible individuals in P g (after Step 5) or \(P_{g}^{\mathit{Refined}}\) (after Step 6) with feasible and more promising solution candidates.

4 PS-optimizing agents using acceptability zones

In a complete information setting, any adoption of SS PS-opt can be allowed because it is possible for a PS-optimizing agent to calculate optimal PS-optimizing negotiation outcomes and the corresponding negotiation strategies using its opponent’s known information. However, in practical situations, it may be difficult for agents to obtain complete information about their opponents because agents generally do not expose their private information, strategies and preferences for strategic reasons. Nevertheless, one of the greatest challenges in designing agents for PS-optimizing negotiations is the mathematical formulation of optimal PS-optimizing negotiation outcomes and negotiation strategies under a complete information setting (Sect. 4.1). This is because they can be directly used to verify the effectiveness (or correctness) of the coevolved solutions under an incomplete information setting (Sect. 4.2). Here, PS-optimization under an incomplete information setting largely depends on the choice of SS PS-opt . Hence, for designing PS-optimizing agents that operate properly in both complete and incomplete information settings, it is crucial to choose an effective SS PS-opt for both PS-optimizing agents.

In determining SS PS-opt , the two solution spaces SS NP and SS NS are considered independently. That is, an agent x adopts \(\mathit{AccZ}_{x}^{\mathit{NP}}\) as SS NP for optimizing price and \(\mathit{AccZ}_{x}^{\mathit{NT}}\) as SS NS for optimizing negotiation speed. Such an SS PS-opt adoption provides a significant advantage for executing (population-based) PS-optimizing negotiations and finding effective negotiation strategies for both PS-optimizing agents B and S using a coevolutionary learning approach under an incomplete information setting (Sect. 4.2). This is because B and S do not require each other’s private negotiation parameters for establishing SS PS-opt before conducting a PS-optimizing negotiation, as \(\mathit{AccZ}_{x}^{\mathit{NP}}\) and \(\mathit{AccZ}_{x}^{\mathit{NT}}\) of x are considered independently from those of its opponent in the PS-optimization process. To demonstrate the effectiveness of such an SS PS-opt , consider a counter-example in which AgZ NP and AgZ NT are adopted as SS NP and SS NS , respectively, for SS PS-opt . Then, before carrying out a PS-optimizing negotiation, each agent needs to obtain its opponent’s private information (such as RP and deadline) for establishing SS PS-opt ; however, estimating the opponent’s accurate private information is itself a very difficult (and sometimes impossible) problem under an incomplete information setting.

4.1 PS-optimizing agents with complete information

Given \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\), each PS-optimizing agent x with complete information is designed to use: (1) \(\mathit{AccZ}_{x}^{\mathit{NP}}\) (i.e., [IP x ,RP x ]) for optimizing price using \(w_{\mathit{NP}}^{x}\) and (2) \(\mathit{AccZ}_{x}^{\mathit{NT}}\) (i.e., [0,τ x ]) for optimizing negotiation speed using \(w_{\mathit{NS}}^{x}\). Although x still adopts the price utility \(U_{\mathit{NP}}^{x}\) in (5), the speed utility \(U_{\mathit{NS}}^{x}\) in (6) needs to be further modified to reflect characteristics of the preference of negotiation speed of x.

Using \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\), x decides the desired agreement price (\(\mathit{dP}_{c}^{x}\)) in \(\mathit{AccZ}_{x}^{\mathit{NP}}\) and the desired agreement time (\(\mathit{dT}_{c}^{x}\)) in \(\mathit{AccZ}_{x}^{\mathit{NT}}\), respectively. First, x determines \(\mathit{dP}_{c}^{x}\) satisfying the preference criterion of price in \(\mathit{AccZ}_{x}^{\mathit{NP}}\) using \(w_{\mathit{NP}}^{x}\):

$$ \mathit{dP}_{c}^{x} = \mathit{IP}_{x} + ( - 1)^{\alpha} \bigl( 1 - w_{\mathit{NP}}^{x} \bigr) | \mathit{RP}_{x} - \mathit{IP}_{x} |, $$
(8)

where α=1 for S and α=0 for B, as in (1).

x treats \(\mathit{dP}_{c}^{x}\) as the most favorable possible agreement price because it is the price at which x maximizes \(U_{\mathit{NP}}^{x}\) in (5) while satisfying its preference criterion of price; hence, \(U_{\mathit{NP}}^{x}(\mathit{dP}_{c}^{x})\) is the upper bound of \(U_{\mathit{NP}}^{x}\) under that criterion: (1) if x concedes less price utility than \(U_{\mathit{NP}}^{x}(\mathit{dP}_{c}^{x})\), the price utility of its opponent will be decreased, and conversely, (2) if x concedes more price utility than \(U_{\mathit{NP}}^{x}(\mathit{dP}_{c}^{x})\), \(U_{\mathit{NP}}^{x}\) will be decreased. Second, x determines \(\mathit{dT}_{c}^{x}\) satisfying the preference criterion of negotiation speed in \(\mathit{AccZ}_{x}^{\mathit{NT}}\) using \(w_{\mathit{NS}}^{x}\):

$$ \mathit{dT}_{c}^{x} = \tau_{x} \cdot\bigl(1 - w_{\mathit{NS}}^{x}\bigr). $$
(9)

x treats the range of negotiation time in \([0, \mathit{dT}_{c}^{x}]\) as the favorable possible agreement times because: (1) all the agreement times shorter than \(\mathit{dT}_{c}^{x}\) satisfy the preference criterion of negotiation speed and (2) when the negotiation begins, all the negotiation times in \([0, \mathit{dT}_{c}^{x}]\) can be mutually favorable possible agreement times satisfying the preference criterion of negotiation speed for both x and its opponent. To incorporate \(U_{\mathit{NS}}^{x}\) in (6) with the favorable possible agreement times, we define the new speed utility function \(U_{\mathit{NS}\text{-}\mathit{mapped}}^{x}(T_{c}^{x})\) for an agreement time \(T_{c}^{x}\) as follows:

$$ U_{\mathit{NS}\text{-}\mathit{mapped}}^{x}\bigl(T_{c}^{x}\bigr) = \begin{cases} U_{\mathit{NS}}^{x}(\mathit{dT}_{c}^{x}), & \mbox{if }T_{c}^{x} \in [0, \mathit{dT}_{c}^{x}], \\ U_{\mathit{NS}}^{x}(T_{c}^{x}), & \mathrm{otherwise}. \end{cases} $$
(10)

Then, the total utility function \(U_{\mathit{Total}}^{x}\) in (7) is modified as follows:

$$ \begin{aligned}[b] U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}( P_{x},T_{x} ) &= w_{\mathit{NP}}^{x} \times U_{\mathit{NP}}^{x}( P_{x} ) \\ &\quad {}+ w_{\mathit{NS}}^{x} \times U_{\mathit{NS}\text{-}\mathit{mapped}}^{x} \bigl(T_{c}^{x}\bigr). \end{aligned} $$
(11)

Compared to P-optimizing agents, each PS-optimizing agent x: (1) makes concessions in the range of prices from IP x up to the price less than \(\mathit{dP}_{c}^{x}\), which corresponds to (at most) the amount of the price utility \(w_{\mathit{NP}}^{x}(|\mathit{IP}_{x} - \mathit{dP}_{c}^{x}|)\), and (2) aims to achieve a faster agreement time that is equal to or less than \(\mathit{dT}_{c}^{x}\) in the hope of achieving (at least) the amount of speed utility \(U_{\mathit{NS}}^{x}(|\tau_{x} - \mathit{dT}_{c}^{x}|)\). In a P-optimizing negotiation under a complete information setting, \(P_{c}^{P\text{-}\mathit{opt}}\) and \(T_{c}^{P\text{-}\mathit{opt}}\) can be specified by either of Theorems 1 and 2 (Sect. 2.1) depending on a bargaining advantage in terms of time. For a PS-optimizing negotiation under a complete information setting, a similar analysis based on a bargaining advantage in terms of time can be applied to determine the optimal agreement price and negotiation time.

First, we define the possible AgZ NP and AgZ NT between the PS-optimizing agents B and S in the NSS. The possible AgZ NP determined by \(\mathit{dP}_{c}^{B}\) and \(\mathit{dP}_{c}^{S}\) is given as \([\min(\mathit{dP}_{c}^{B}, \mathit{dP}_{c}^{S}), \max(\mathit{dP}_{c}^{B}, \mathit{dP}_{c}^{S})]\) and the possible AgZ NT determined by \(\mathit{dT}_{c}^{B}\) and \(\mathit{dT}_{c}^{S}\) is given as \([0, \min\{ \mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S}\}]\)—which is the overlapping region of \([0, \mathit{dT}_{c}^{B}]\) and \([0, \mathit{dT}_{c}^{S}]\). Then, following Definition 2, PS-optimizing agents are designed to optimize price and speed utilities in the possible AgZ NP and AgZ NT, respectively, to maximize the total utility in (11). Next, the optimal PS-optimizing negotiation outcomes consisting of the optimal agreement price (\(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\)) and the optimal agreement time (\(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\)) will be determined based on a bargaining advantage in terms of time and are defined as follows.

Definition 3

If a PS-optimizing agent x has a bargaining advantage in terms of time over its opponent, (1) \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is the price that maximizes \(U_{\mathit{NP}}^{x}\) in the possible AgZ NP and (2) \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is the range of negotiation times satisfying both agents’ preference criteria of negotiation speed in the possible AgZ NT.

Following Definition 3, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) are obtained from the following Theorems 3 and 4 depending on a bargaining advantage in terms of time.

Theorem 3

If the PS-optimizing agent B has a longer deadline than the PS-optimizing agent S, (1) \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is made at \(\min(\mathit{dP}_{c}^{B}, \mathit{dP}_{c}^{S})\) and (2) any agreement time in \([0, \min(\mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S})]\) is \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\).

Proof

Since B has a longer deadline than S, B has a bargaining advantage over S in terms of time; hence, the final agreement price and agreement time will be completely determined by B. From Definition 3, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is \(\min(\mathit{dP}_{c}^{B}, \mathit{dP}_{c}^{S})\), at which \(U_{\mathit{NP}}^{B}\) is maximized. Since the range of favorable agreement times of B is \([0, \mathit{dT}_{c}^{B}]\) and that of S is \([0, \mathit{dT}_{c}^{S}]\), \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is determined as \([0, \min\{ \mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S}\}]\)—the overlapping region of the two ranges, in which both B and S satisfy their preference criteria of negotiation speed. □

Theorem 4

If the PS-optimizing agent S has a longer deadline than the PS-optimizing agent B, (1) \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is made at \(\max(\mathit{dP}_{c}^{B}, \mathit{dP}_{c}^{S})\) and (2) any agreement time in \([0, \min(\mathit{dT}_{c}^{B},\allowbreak \mathit{dT}_{c}^{S})]\) is \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\).

Proof

Symmetrically, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is \(\max(\mathit{dP}_{c}^{B}, \mathit{dP}_{c}^{S})\) at which \(U_{\mathit{NP}}^{S}\) is maximized; \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is the overlapping region \([0,\allowbreak \min\{ \mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S}\}]\) at which both B and S satisfy their preference criteria of negotiation speed. □

Figure 2 shows an example of the agreement behavior between PS-optimizing agents B and S when B has a longer deadline than S. The negotiation parameters for B and S are as follows: (1) IP B =5, RP B =80, τ B =100 and \((w_{\mathit{NP}}^{B}, w_{\mathit{NS}}^{B}) = (0.7, 0.3)\) for B; (2) IP S =95, RP S =15, τ S =50 and \((w_{\mathit{NP}}^{S}, w_{\mathit{NS}}^{S}) = (0.5, 0.5)\) for S. The NSS is [15,80] for AgZ NP and [0,50] for AgZ NT. \(\mathit{dP}_{c}^{x}\) and \(\mathit{dT}_{c}^{x}\) are determined by (8) and (9), respectively: (1) for the given \((w_{\mathit{NP}}^{B}, w_{\mathit{NS}}^{B}) = (0.7, 0.3)\), \(\mathit{dP}_{c}^{B} = 27.5\) and \(\mathit{dT}_{c}^{B} = 70\), and (2) for the given \((w_{\mathit{NP}}^{S}, w_{\mathit{NS}}^{S}) = (0.5, 0.5)\), \(\mathit{dP}_{c}^{S} = 55\) and \(\mathit{dT}_{c}^{S} = 25\). Then, the possible AgZ NP is determined as [27.5, 55] and the possible AgZ NT is determined as [0, 25]. Finally, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) will be determined by Theorem 3 (because B has the bargaining advantage over S); hence, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is 27.5 and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is [0, 25].

Fig. 2 Agreement behavior of PS-optimizing negotiation when sufficient AgZ NP is provided

We have so far determined \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) using Theorems 3 and 4 under the assumption that \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) belong to NSS (e.g., Fig. 2). However, if \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) lie outside NSS, no agreement can be made within NSS; therefore, Theorems 3 and 4 can no longer be applied to determine \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\). In general, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) may lie outside NSS depending on the input parameter values of the agent d having the bargaining advantage in terms of time. This is because d determines \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) by optimizing over \(\mathit{AccZ}_{d}^{\mathit{NP}}\) and \(\mathit{AccZ}_{d}^{\mathit{NT}}\) using \(w_{\mathit{NP}}^{d}\) and \(w_{\mathit{NS}}^{d}\), respectively, without considering the agreement zones between d and its opponent (i.e., d does not consider AgZ NP and AgZ NT in carrying out the PS-optimizing negotiation). \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) can fall outside AgZ NP if either: (1) d is B and \(\mathit{dP}_{c}^{B}\) is less than RP S or (2) d is S and \(\mathit{dP}_{c}^{S}\) is larger than RP B (e.g., see Fig. 3). In the case of \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\), however, it always belongs to AgZ NT because \(\mathit{dT}_{c}^{B}\) and \(\mathit{dT}_{c}^{S}\) are always less than or equal to τ B and τ S , respectively. In summary, for x: (1) if sufficient AgZ NP is provided for carrying out a PS-optimizing negotiation, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) can be determined using either Theorem 3 or Theorem 4; however, (2) if insufficient AgZ NP is provided for carrying out a PS-optimizing negotiation, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) cannot be determined using Theorems 3 and 4.

Fig. 3 Agreement behavior of PS-optimizing negotiation when insufficient AgZ NP is provided

Given that insufficient AgZ NP is provided for carrying out a PS-optimizing negotiation, the following definition is adopted for designing agreement behaviors of PS-optimizing agents.

Definition 4

If insufficient AgZ NP is provided for carrying out a PS-optimizing negotiation, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) are made at the points in NSS nearest to \(\mathit{dP}_{c}^{d}\) and \(\mathit{dT}_{c}^{d}\), respectively, of the agent d having the bargaining advantage in terms of time.

Definition 4 leads to Theorem 5 showing that the proposed PS-optimizing agents satisfy (at least) the minimum performance requirement for PS-optimizing agents (defined in Sect. 2.2) even under the given condition that insufficient AgZ NP is provided.

Theorem 5

If insufficient AgZ NP is provided, PS-optimizing negotiation outcomes are equal to P-optimizing negotiation outcomes for the same negotiation settings.

Proof

If B has a bargaining advantage in terms of time, the price in NSS nearest to \(\mathit{dP}_{c}^{B}\) is RP S (at which \(U_{\mathit{NP}}^{B}\) is maximized) and the nearest negotiation time is τ S . Hence, following Definition 4, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) are made at RP S and τ S , respectively, which is the same result as that of the P-optimizing negotiation given by Theorem 1. Similarly, if S has a bargaining advantage in terms of time, the price in NSS nearest to \(\mathit{dP}_{c}^{S}\) is RP B (at which the price utility \(U_{\mathit{NP}}^{S}\) is maximized) and the nearest negotiation time is τ B . Hence, following Definition 4, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) are made at RP B and τ B , respectively, which is the same result as that of the P-optimizing negotiation given by Theorem 2. □

Figure 3 shows an example of the agreement behavior between PS-optimizing agents B and S when (1) B has a longer deadline than S and (2) insufficient AgZ NP is provided. The negotiation parameters for B and S are as follows: (1) IP B =15, RP B =60, τ B =100 and \((w_{\mathit{NP}}^{B}, w_{\mathit{NS}}^{B}) = (0.7, 0.3)\) for B; (2) IP S =85, RP S =40, τ S =50 and \((w_{\mathit{NP}}^{S}, w_{\mathit{NS}}^{S}) = (0.5, 0.5)\) for S. The NSS is [40,60] for AgZ NP and [0,50] for AgZ NT. \(\mathit{dP}_{c}^{x}\) and \(\mathit{dT}_{c}^{x}\) are determined by (8) and (9), respectively: (1) for the given \((w_{\mathit{NP}}^{B}, w_{\mathit{NS}}^{B}) = (0.7, 0.3)\), \(\mathit{dP}_{c}^{B} = 28.5\) and \(\mathit{dT}_{c}^{B} = 70\), and (2) for the given \((w_{\mathit{NP}}^{S}, w_{\mathit{NS}}^{S}) = (0.5, 0.5)\), \(\mathit{dP}_{c}^{S} = 62.5\) and \(\mathit{dT}_{c}^{S} = 25\). Then, the possible AgZ NP is determined as [40,60] and the possible AgZ NT is determined as [0,25]. Since B has a bargaining advantage over S in terms of time, Theorem 3 would give \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}} = 28.5\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}} = [0,25]\). However, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is not in AgZ NP although \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is in AgZ NT; the agreement point given by Theorem 3 does not lie in NSS (i.e., a PS-optimizing negotiation insisting on it would fail without reaching an agreement). Therefore, from Theorem 5, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) are made at RP S =40 and τ S =50, respectively, which are equal to \(P_{c}^{P\text{-}\mathit{opt}}\) and \(T_{c}^{P\text{-}\mathit{opt}}\) achieved from Theorem 1 (i.e., \(P_{c}^{P\text{-}\mathit{opt}} = 40\) and \(T_{c}^{P\text{-}\mathit{opt}} = 50\)).
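Under the same assumed forms of (8) and (9) as in the previous sketch, the fallback of Definition 4 amounts to clipping the advantaged agent's desired point onto NSS; a minimal illustrative sketch for the Fig. 3 parameters is given below.

def nearest_in_interval(value, lower, upper):
    # Nearest point of the interval [lower, upper] to a given value (Definition 4).
    return min(max(value, lower), upper)

# Fig. 3 parameters: B is the advantaged agent d.
RP_B, RP_S, tau_S = 60, 40, 50
dP_B = 60 + 0.7 * (15 - 60)                        # 28.5, outside AgZ NP = [40, 60]
dT_B = (1.0 - 0.3) * 100                           # 70, outside the NSS time range [0, 50]

P_ps_opt = nearest_in_interval(dP_B, RP_S, RP_B)   # 40 (= RP_S)
T_ps_opt = nearest_in_interval(dT_B, 0, tau_S)     # 50 (= tau_S), i.e., the P-optimizing outcome (Theorem 5)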

Finally, from the achieved \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) (using one of Theorems 3 to 5), the optimal negotiation strategies of PS-optimizing agents B and S (for carrying out the optimal PS-optimizing negotiation) can be obtained. Specifically, the optimal PS-optimizing negotiation strategies of B and S are derived from (1) by substituting \(P_{t}^{x}\) and t in (1) with \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\), respectively, as follows:

(12)
(13)
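Since (1), (12) and (13) are not reproduced here, the following sketch only illustrates the substitution step under a commonly used polynomial time-dependent proposal function \(P_{t}^{x} = \mathit{IP}_{x} + (t/\tau_{x})^{\lambda_{x}}(\mathit{RP}_{x} - \mathit{IP}_{x})\); this form is an assumption, and the resulting closed form for \(\lambda_{x}\) should not be read as a restatement of (12) and (13).

import math

def optimal_lambda(IP, RP, tau, P_c, T_c):
    # Solve P_{T_c}^x = P_c for lambda_x under the assumed proposal function
    # P_t^x = IP_x + (t / tau_x)^lambda_x * (RP_x - IP_x).
    return math.log((P_c - IP) / (RP - IP)) / math.log(T_c / tau)

# Fig. 2 example: agreement at price 27.5 no later than time 25.
lambda_B = optimal_lambda(IP=5, RP=80, tau=100, P_c=27.5, T_c=25)    # approx. 0.87
lambda_S = optimal_lambda(IP=95, RP=15, tau=50, P_c=27.5, T_c=25)    # approx. 0.25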

4.2 PS-optimizing agents with incomplete information

Given \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\), each PS-optimizing agent x with incomplete information also adopts: (1) \(\mathit{AccZ}_{x}^{\mathit{NP}}\) for optimizing price using \(w_{\mathit{NP}}^{x}\) and (2) \(\mathit{AccZ}_{x}^{\mathit{NT}}\) for optimizing negotiation speed using \(w_{\mathit{NS}}^{x}\). Since PS-optimizing agents with incomplete information do not know their opponents’ private information (such as RP, deadline and preferences for price and negotiation time), their opponents’ desired agreement points are unknown. Therefore, the agents cannot directly apply Theorems 3 to 5 to determine their optimal negotiation outcomes. Owing to this lack of information, this research adopts a coevolutionary learning approach to find effective PS-optimizing negotiation strategies for both PS-optimizing agents B and S. Coevolutionary learning approaches have long been used to model competitive coevolution problems (e.g., the iterated prisoner’s dilemma [3]).

Given PS-optimizing agents B (with IP B , RP B , τ B and \((w_{\mathit{NP}}^{B}, w_{\mathit{NS}}^{B})\)) and S (with IP S , RP S , τ S and \((w_{\mathit{NP}}^{S}, w_{\mathit{NS}}^{S})\)), the following three key components are required for the coevolutionary learning:

  1. (C1)

    Creating populations: Two heterogeneous populations with size n P are created: POP B where individuals consist of Bs and POP S where individuals consist of Ss. Throughout the rest of this paper, we use: (1) the term “individual” interchangeably with “agent” and (2) the term “solution” interchangeably with “(PS-optimizing) negotiation strategy” depending on the context.

  2. (C2)

    Initialization: Individuals of POP B and POP S are initialized. All individuals in POP B are initialized with IP B , RP B , τ B and \((w_{\mathit{NP}}^{B}, w_{\mathit{NS}}^{B})\); however, the negotiation strategy of each individual in POP B is randomly determined in the possible strategy range [λ lower ,λ upper ] where λ lower is the lower bound of possible strategies and λ upper is the upper bound of possible strategies. In the same manner, all individuals in POP S are initialized with IP S , RP S , τ S and \((w_{\mathit{NP}}^{S}, w_{\mathit{NS}}^{S})\); however, the negotiation strategy of each individual in POP S is randomly determined in [λ lower ,λ upper ]. Hence, individuals are mainly characterized by the negotiation strategies within their respective populations.

  3. (C3)

    Making interactions between populations: Coevolutionary interactions between POP B and POP S are carried out as follows:

    1. a.

      Individuals in POP B and POP S are randomly chosen and matched in a one-to-one manner.

    2. b.

      Each matched pair of POP B and POP S conducts a PS-optimizing negotiation without the knowledge of private information of its opponent.

    As a result of the coevolutionary interaction, individuals in POP B and POP S obtain negotiation outcomes.

The coevolutionary learning procedure using the same type of EDAs for both POP B and POP S is as follows. Two EDAs (i.e., either two S-EDAs or two ID2C-EDAs) are adopted for coevolving POP B and POP S , respectively: one EDA for POP B and the other EDA for POP S . Here, Step 1 of S-EDAs and ID2C-EDAs is substituted by the above C1 and C2 to create populations and initialize them. Throughout its evolution procedure described in Sect. 3, each EDA evolves solutions (of individuals) in its population. Each EDA evaluates fitness of individuals from the negotiation outcomes of individuals obtained from C3. Therefore, C3 is executed in the fitness evaluation stage before applying selection. In summary, the negotiation strategies of agents in both POP B and POP S are coevolved from: (1) the coevolutionary interaction between POP B and POP S in C3 and (2) the evolution procedure for evolving solutions in each population.
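A compact sketch of this procedure is given below (in Python, illustrative only); negotiate(), fit_B(), fit_S() and the EDA step() are placeholders standing in for the bilateral negotiation protocol, the fitness function (17), and the S-EDA/ID2C-EDA generation step described in Sect. 3.

import random

def initialize_population(n_P, lam_lower, lam_upper):
    # (C1)/(C2): every individual shares the public parameters of its agent type;
    # only its negotiation strategy is sampled from the possible strategy range.
    return [random.uniform(lam_lower, lam_upper) for _ in range(n_P)]

def coevolve(eda_B, eda_S, n_P, lam_lower, lam_upper, max_gen, negotiate, fit_B, fit_S):
    pop_B = initialize_population(n_P, lam_lower, lam_upper)
    pop_S = initialize_population(n_P, lam_lower, lam_upper)
    for _ in range(max_gen):
        # (C3a): random one-to-one matching between POP_B and POP_S.
        pairs = list(zip(random.sample(pop_B, n_P), random.sample(pop_S, n_P)))
        # (C3b): each matched pair negotiates without knowing its opponent's private information.
        outcomes = [negotiate(lam_b, lam_s) for lam_b, lam_s in pairs]   # e.g., (P_c, T_c) or None
        # Fitness evaluation precedes selection in each EDA.
        fitness_B = [fit_B(o) for o in outcomes]
        fitness_S = [fit_S(o) for o in outcomes]
        # Each EDA estimates its model from the fitter individuals and samples the next population.
        pop_B = eda_B.step([b for b, _ in pairs], fitness_B)
        pop_S = eda_S.step([s for _, s in pairs], fitness_S)
    return pop_B, pop_S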

In the coevolutionary learning procedure, the main issues for finding (or coevolving) effective PS-optimizing negotiation strategies of B and S are: (1) adopting (or developing) EAs suitable for the coevolution, and (2) given \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x})\), designing an appropriate fitness function that can achieve good candidate solutions for a PS-optimizing negotiation.

First, for coevolving optimal PS-optimizing strategies between B and S, special EAs called ID2C-EDAs were adopted for both POP B and POP S owing to their effectiveness in coevolutionary learning [8, 10], while S-EDAs were used for comparative studies of the coevolution performance. The coevolutionary learning approach adopting ID2C-EDAs allows us to achieve an approximation to the optimal PS-optimizing negotiation strategies obtained in the complete information setting in Sect. 4.1.

We then need to consider how a PS-optimizing agent x represents the preference for price using \(w_{\mathit{NP}}^{x}\) in \(\mathit{AccZ}_{x}^{\mathit{NP}}\) and the preference for negotiation speed using \(w_{\mathit{NS}}^{x}\) in \(\mathit{AccZ}_{x}^{\mathit{NT}}\) in an incomplete information setting. The same definitions of \(\mathit{dP}_{c}^{x}\) in (8) and \(\mathit{dT}_{c}^{x}\) in (9) were adopted for x with incomplete information. Accordingly, x with incomplete information treats \(\mathit{dP}_{c}^{x}\) as the most favorable agreement price and \(\mathit{dT}_{c}^{x}\) as one of the favorable agreement times. Then, a fitness function that is maximized at \(\mathit{dP}_{c}^{x}\) and \(\mathit{dT}_{c}^{x}\) for the given \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x})\) is required. We will briefly examine some drawbacks of the fitness functions in previous studies and then describe the details of the proposed fitness function.

In most of the previous studies (e.g., [8-10, 23, 32] and [19], which mostly dealt with P-optimizing negotiations), fitness functions in the form of utility functions were widely adopted. Therefore, the total utility functions \(U_{\mathit{Total}}^{x}\) in (7) and \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\) in (11) can also be adopted as fitness functions for the coevolutionary learning. To do this, we need to evaluate the suitability (or effectiveness) of the fitness functions adopting \(U_{\mathit{Total}}^{x}\) and \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\). \(U_{\mathit{Total}}^{x}\) is calculated by linearly combining: (1) \(U_{\mathit{NP}}^{x}\) in (5) multiplied by \(w_{\mathit{NP}}^{x}\) and (2) \(U_{\mathit{NS}}^{x}\) in (6) multiplied by \(w_{\mathit{NS}}^{x}\). Similarly, \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\) is calculated by linearly combining: (1) \(U_{\mathit{NP}}^{x}\) in (5) multiplied by \(w_{\mathit{NP}}^{x}\) and (2) \(U_{\mathit{NS}\text{-}\mathit{mapped}}^{x}\) in (10) multiplied by \(w_{\mathit{NS}}^{x}\). The difference between the two fitness functions is that the fitness function adopting \(U_{\mathit{Total}}^{x}\) places different emphases on agreement times in \([0, \mathit{dT}_{c}^{x}]\) by giving a higher value of \(U_{\mathit{NS}}^{x}\) to a smaller agreement time, whereas the fitness function adopting \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\) assigns the same value \(U_{\mathit{NS}}^{x}(\mathit{dT}_{c}^{x})\) to all agreement times in \([0, \mathit{dT}_{c}^{x}]\). Using either \(U_{\mathit{Total}}^{x}\) or \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\) as the fitness function, the coevolution performance in terms of intensification capability for coevolving converged solutions can deteriorate severely. As a result, the EDAs for POP B and POP S generally cannot evolve effective PS-optimizing negotiation strategies (within a reasonable number of generations). In the case of the fitness function adopting \(U_{\mathit{Total}}^{x}\), this is mainly because the fitness at different \(P_{c}^{x}\) and \(T_{c}^{x}\) can have the same value as \(U_{\mathit{Total}}^{x}( \mathit{dP}_{c}^{x},\mathit{dT}_{c}^{x} )\). For example, consider the case that: (1) an individual i obtained a negotiation outcome at the agreement price \(P_{c}^{i}\) and the agreement time \(T_{c}^{i}\) where \(U_{\mathit{Total}}^{i}\) consists of \(U_{\mathit{NP}}^{i}(P_{c}^{i})\) (\(<\nobreak U_{\mathit{NP}}^{i}(\mathit{dP}_{c}^{i})\)) and \(U_{\mathit{NS}}^{i}(T_{c}^{i})\) (\(>\nobreak U_{\mathit{NS}}^{i}(\mathit{dT}_{c}^{i})\)) and (2) another individual j obtained a negotiation outcome at the agreement price \(P_{c}^{j}\) and the agreement time \(T_{c}^{j}\) where \(U_{\mathit{Total}}^{j}\) consists of \(U_{\mathit{NP}}^{j}(P_{c}^{j})\) (\(> U_{\mathit{NP}}^{j}(\mathit{dP}_{c}^{j})\)) and \(U_{\mathit{NS}}^{j}(T_{c}^{j})\) (\(< U_{\mathit{NS}}^{j}(\mathit{dT}_{c}^{j})\)).
Then, there are many possible combinations of values for \((P_{c}^{i},T_{c}^{i})\) and \((P_{c}^{j},T_{c}^{j})\) in which the fitness \(\mathit{fit}(P_{c}^{i},T_{c}^{i})\) (set as \(U_{\mathit{Total}}^{i}(P_{c}^{i},T_{c}^{i}) = w_{\mathit{NP}}^{x} \times U_{\mathit{NP}}^{i}(P_{c}^{i}) + w_{\mathit{NS}}^{x} \times U_{\mathit{NS}}^{i}(T_{c}^{i})\)) is equal to \(\mathit{fit}(P_{c}^{j},T_{c}^{j})\) (set as \(U_{\mathit{Total}}^{j}(P_{c}^{j},T_{c}^{j}) = w_{\mathit{NP}}^{x} \times U_{\mathit{NP}}^{j}(P_{c}^{j}) + w_{\mathit{NS}}^{x} \times U_{\mathit{NS}}^{j}(T_{c}^{j})\)), and both are equal to \(U_{\mathit{Total}}^{x}(\mathit{dP}_{c}^{x},\mathit{dT}_{c}^{x})\). Hence, the ambiguity (in representing \(\mathit{fit}(P_{c}^{x},T_{c}^{x})\), which needs to be maximized at \(U_{\mathit{Total}}^{x}(\mathit{dP}_{c}^{x},\mathit{dT}_{c}^{x})\)) makes the coupled fitness landscape for the coevolutionary learning more complicated (for evolving effective solutions), which leads to the deterioration of the intensification capability of both EDAs. In the case of the fitness function adopting \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\), this ambiguity may be resolved to some extent compared with the fitness function adopting \(U_{\mathit{Total}}^{x}\). This is because fitness values can be determined solely by price utility if the agreement times of all individuals are made in \([0, \mathit{dT}_{c}^{x}]\), in which the speed utilities of all individuals are the same. However, we cannot always guarantee that all agreement times fall within \([0, \mathit{dT}_{c}^{x}]\) because they depend on the negotiation parameter settings and the evolution of the EDAs. Hence, although the fitness function using \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\) is more robust than the fitness function adopting \(U_{\mathit{Total}}^{x}\) in characterizing more promising solutions, neither can completely resolve the ambiguity or prevent the deterioration of intensification capability. From this analysis, we conclude that fitness functions in the form of the total utility functions \(U_{\mathit{Total}}^{x}\) and \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\) are not appropriate for coevolving effective PS-optimizing negotiation strategies (within a reasonable number of generations).

For designing an effective fitness function, this work uses price and speed likelihood functions instead of using the price and speed utility functions directly. Given \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\), the price likelihood function (\(\mathit{Lh}_{\mathit{NP}}^{x}\)) measures the closeness of P x to \(\mathit{dP}_{c}^{x}\) and the speed likelihood function (\(\mathit{Lh}_{\mathit{NS}}^{x}\)) measures the closeness of T x to \(\mathit{dT}_{c}^{x}\).

\(\mathit{Lh}_{\mathit{NP}}^{x}\) for a price P x is defined as follows:

$$ \mathit{Lh}_{\mathit{NP}}^{x}(P_{x}) = \begin{cases} \frac{1}{\sqrt{\pi \cdot \rho_{\mathit{NP}}}} \exp\Bigl( - \frac{\bigl( \frac{| \mathit{RP}_{x} - P_{x} |}{| \mathit{RP}_{x} - \mathit{IP}_{x} |} - w_{\mathit{NP}}^{x} \bigr)^{2}}{\rho_{\mathit{NP}}} \Bigr),\\ \quad \mbox{if an agreement is reached}, \\ 0, \quad \mbox{otherwise}, \\ \end{cases} $$
(14)

where |RP x −P x |/|RP x −IP x | is the relative position of P x in \(\mathit{AccZ}_{x}^{\mathit{NP}}\). As P x gets closer to \(\mathit{dP}_{c}^{x}\), \(| \mathit{RP}_{x} - P_{x} |/| \mathit{RP}_{x} - \mathit{IP}_{x} | - w_{\mathit{NP}}^{x}\) has a smaller magnitude; hence, a higher value of \(\mathit{Lh}_{\mathit{NP}}^{x}\) is obtained. Through empirical studies, the deviation of \(\mathit{Lh}_{\mathit{NP}}^{x}\) (i.e., the shape of \(\mathit{Lh}_{\mathit{NP}}^{x}\)) is designed to be very narrow by normalizing it with \(\rho_{\mathit{NP}} = w_{\mathit{NP}}^{x}/100\), in which ρ NP can be considered as a weighting factor that puts more emphasis on P x (i.e., yields a higher value of \(\mathit{Lh}_{\mathit{NP}}^{x}\)) the closer it is to \(\mathit{dP}_{c}^{x}\). Instead of using the price utility in (5), (14) measures the closeness of P x to \(\mathit{dP}_{c}^{x}\) using a Gaussian likelihood function designed to maximize \(\mathit{Lh}_{\mathit{NP}}^{x}(P_{x})\) at \(P_{x} = \mathit{dP}_{c}^{x}\).

\(\mathit{Lh}_{\mathit{NS}}^{x}\) for a negotiation time T x is defined as follows:

$$ \mathit{Lh}_{\mathit{NS}}^{x}(T_{x}) = \begin{cases} \frac{1}{\sqrt{\pi \cdot \rho_{\mathit{NS}}}} \exp\Bigl( - \frac{\bigl( ( 1.0 - \frac{T_{x}}{\tau_{x}} ) - w_{\mathit{NS}}^{x} \bigr)^{2}}{\rho_{\mathit{NS}}} \Bigr), \\ \quad \mbox{if an agreement is reached}, \\ 0, \quad \mbox{otherwise}, \\ \end{cases} $$
(15)

where (1.0−T x /τ x ) is the relative position of T x in \(\mathit{AccZ}_{x}^{\mathit{NT}}\). As T x gets closer to \(\mathit{dT}_{c}^{x}\), \((1.0 - T_{x}/\tau_{x}) - w_{\mathit{NS}}^{x}\) has a smaller magnitude; hence, a higher value of \(\mathit{Lh}_{\mathit{NS}}^{x}\) is obtained. Through empirical studies, the deviation of \(\mathit{Lh}_{\mathit{NS}}^{x}\) is designed to be relatively wide (compared with the deviation of \(\mathit{Lh}_{\mathit{NP}}^{x}\)) by normalizing it with \(\rho_{\mathit{NS}} = w_{\mathit{NS}}^{x}\). ρ NS is given a large value to put less emphasis on \(\mathit{dT}_{c}^{x}\) because a shorter agreement time is better for both B and S; however, setting ρ NS too large slows down the coevolution because the set of promising candidate solutions becomes too large, which affects the intensification capability of the EDAs. Instead of using the speed utility in (6) or (10), (15) measures the closeness of T x to \(\mathit{dT}_{c}^{x}\) using a Gaussian likelihood function designed to maximize \(\mathit{Lh}_{\mathit{NS}}^{x}(T_{x})\) at \(T_{x} = \mathit{dT}_{c}^{x}\). In regard to optimizing negotiation speed, a special mapping function \(f_{t\text{-}\mathit{map}}^{x}(T_{c}^{x})\) for negotiation time was adopted and is defined as follows:

$$ f_{t\text{-}\mathit{map}}^{x}\bigl(T_{c}^{x}\bigr) = \begin{cases} \mathit{dT}_{c}^{x}, & \mbox{if }T_{c}^{x} \in [0, \mathit{dT}_{c}^{x}] ,\\ T_{c}^{x}, & \mbox{otherwise}. \\\end{cases} $$
(16)

This mapping is essential for realizing the coevolution of effective PS-optimizing negotiation strategies for both B and S because: (1) all \(T_{c}^{x}\) in \([0, \mathit{dT}_{c}^{x}]\) satisfy the preference of negotiation speed and (2) in general, a shorter negotiation time is better than a longer one.

Finally, using \(\mathit{Lh}_{\mathit{NP}}^{x}\) in (14) and \(\mathit{Lh}_{\mathit{NS}}^{x}\) in (15) together with \(f_{t\text{-}\mathit{map}}^{x}\) in (16), the final proposed fitness function for EDAs is defined as follows:

(17)

The closer \(\mathit{Lh}_{\mathit{NP}}^{x}(P_{c}^{x})\) is to \(\mathit{Lh}_{\mathit{NP}}^{x}(\mathit{dP}_{c}^{x})\) and the larger \(w_{\mathit{NP}}^{x}\) is, the larger the value of the exponential function for price in (17). Similarly, the closer \(\mathit{Lh}_{\mathit{NS}}^{x}(T_{c}^{x})\) is to \(\mathit{Lh}_{\mathit{NS}}^{x}(\mathit{dT}_{c}^{x})\) and the larger \(w_{\mathit{NS}}^{x}\) is, the larger the value of the exponential function for negotiation speed in (17). Therefore, \(\mathit{fit}(P_{c}^{x},T_{c}^{x})\) emphasizes the exponential functions for price and negotiation speed by linearly combining them with \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\), respectively.
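The components (14)-(16) can be written compactly as below (Python, illustrative only). Since (17) is not reproduced above, fit() simply combines exponentials of the two likelihoods linearly with \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\) as the surrounding text describes; this particular combination is an assumption, not a restatement of (17).

import math

def lh_np(P, IP, RP, w_np, agreed=True):
    # Price likelihood (14): Gaussian-shaped closeness of P to dP_c^x, with rho_NP = w_NP^x / 100.
    if not agreed:
        return 0.0
    rho = w_np / 100.0
    rel = abs(RP - P) / abs(RP - IP)
    return math.exp(-((rel - w_np) ** 2) / rho) / math.sqrt(math.pi * rho)

def lh_ns(T, tau, w_ns, agreed=True):
    # Speed likelihood (15): Gaussian-shaped closeness of T to dT_c^x, with rho_NS = w_NS^x.
    if not agreed:
        return 0.0
    rho = w_ns
    rel = 1.0 - T / tau
    return math.exp(-((rel - w_ns) ** 2) / rho) / math.sqrt(math.pi * rho)

def t_map(T, dT):
    # Mapping (16): every agreement time in [0, dT_c^x] is treated as dT_c^x.
    return dT if 0 <= T <= dT else T

def fit(P, T, IP, RP, tau, dT, w_np, w_ns, agreed=True):
    # Assumed combination in the spirit of (17): weighted exponentials of the two likelihoods.
    return (w_np * math.exp(lh_np(P, IP, RP, w_np, agreed))
            + w_ns * math.exp(lh_ns(t_map(T, dT), tau, w_ns, agreed)))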

5 Empirical evaluation and analysis

In this section, we first detail the methodology for analyzing the performance of coevolved PS-optimizing negotiation strategies (of PS-optimizing agents B and S) using S-EDAs or ID2C-EDAs under an incomplete information setting. We then proceed to the actual empirical study of the coevolved PS-optimizing negotiation strategies by comparing them with the optimal negotiation strategies obtained under a complete information setting. Furthermore, we compare and analyze the empirical results of S-EDAs and ID2C-EDAs obtained from coevolutionary learning.

5.1 Methodology

For PS-optimizing agents B and S, optimal PS-optimizing negotiation outcomes and the corresponding negotiation strategies are subject to the size of AgZ NP as described in Sect. 4.1. Hence, two groups of experiments were designed to evaluate the performance of PS-optimizing negotiations for the different negotiation situations: (1) Type-I experiments for the negotiation settings with sufficient AgZ NP, and (2) Type-II experiments for the negotiations with insufficient AgZ NP. In each group of experiments, the coevolution results using ID2C-EDAs are compared with those of S-EDAs to evaluate their coevolution performance in finding effective PS-optimizing negotiation strategies.

5.1.1 Testbed

To evaluate the performance of the coevolved PS-optimizing negotiation strategies of the proposed PS-optimizing agents, a simulation testbed consisting of a virtual negotiation environment for supporting population-based negotiations in an incomplete information setting using EDAs was implemented using C++. Both POP B and POP S were evolved using either S-EDAs or ID2C-EDAs as described in Sect. 3. Coevolutionary interaction is achieved from the (population-based) negotiations between POP B and POP S as described in Sect. 4.2. In addition, each POP B and POP S has a controller that: (1) generates agents and initializes their negotiation parameters (such as preferences of price and negotiation speed, IP, RP, deadlines and negotiation strategies), (2) manages the information of the matched pairs of agents between POP B and POP S , (3) monitors the termination status of its EDA and shares the information with the controller of the opponent population to check the termination conditions for the coevolution, which is to terminate both EDAs simultaneously, (4) synchronizes PS-optimizing negotiations and handles message passing and payment transfer between all matched agents, and (5) reinitializes its population and restarts evolution of EDAs when the CNT max is reached.

The experiments were conducted on a computer with Windows XP (32-bit) service pack 3, Intel® Core™2 Duo CPU E8500 @ 3.16 GHz & 3.17 GHz and 4 GB RAM.

5.1.2 Experimental settings

The input parameters for two types of PS-optimizing negotiation agents B and S are described in Table 2.

Table 2 Parameter setting for negotiation agents B and S

The price ranges (determined by the IPs and RPs) and the strategy ranges for B and S were adopted for the purpose of the Type-I and Type-II experiments. The deadline range of agents was grouped empirically into three categories: Short when negotiation rounds are in [16,30]; Moderate (denoted as Mid in Table 2) when negotiation rounds are in [31,60]; and Long when negotiation rounds are in [61,120]. The deadline ranges [1,15] and [121,∞] were not considered because repeated experimental tuning showed that the average success rate of PS-optimizing negotiations is very low when the agents adopt deadlines in those ranges. Due to space limitations, one representative value was chosen from each of the three categories: 20 for Short, 50 for Mid, and 100 for Long. Based on the bargaining advantage in terms of time, there are six representative deadline combinations between B and S as follows:

  1. (1)

    (Long, Mid), (Mid, Short) and (Long, Short) for the case that B has a longer deadline than S,

  2. (2)

    (Mid, Long), (Short, Mid) and (Short, Long) for the case that S has a longer deadline than B.

However, since cases 1 and 2 are symmetric and a similar analysis can be applied to both, we only describe the results for case 1. In the negotiations between B and S, the experiments were set up such that S starts the negotiation by making the first proposal to B.

Different emphases on price and negotiation speed (i.e., different weightings between \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\)) lead to different groups of preference criteria. Each PS-optimizing agent x has three representative preference criteria as follows:

  1. (1)

    more-P-optimizing case: \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = (0.7, 0.3)\)

  2. (2)

    exact-PS-optimizing case: \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = (0.5, 0.5)\)

  3. (3)

    more-S-optimizing case: \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = (0.3, 0.7)\)

Hence, the following nine combinations are possible between B and S as described in Table 3.

Table 3 Combinations of preference criteria of B and S

The experimental parameter settings for S-EDAs and ID2C-EDAs are described in Table 4. We used the experimentally tuned parameters from [8] and [10].

Table 4 Parameter settings for S-EDAs and ID2C-EDAs

5.1.3 Description of results

Even though extensive simulations were carried out for all the situations, only representative results are presented in this section due to space limitations. Empirical results for the Type-I and Type-II experiments are shown in Tables 6 to 8 and Tables 9 to 11, respectively. All the values in the experimental tables were averaged over more than \(10^{3}\) runs. The symbols for the results in Tables 6 to 11 and their descriptions are summarized in Table 5. In Tables 6 to 11, the rows for the performance measures (see Sect. 5.1.4) are shaded; in addition, the rows for the results achieved from ID2C-EDAs are in boldface to distinguish them from those achieved from S-EDAs.

Table 5 Summary of notation for the results
Table 6 Results of Type-I experiments in (Long, Mid)

5.1.4 Performance measure

Under a complete information setting, optimal PS-optimizing negotiation outcomes and negotiation strategies are achieved as follows: (1) \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) are obtained from equilibrium analyses using Theorems 3 to 5 and (2) from the obtained \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\), \(\lambda_{B}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(\lambda_{S}^{\mathit{PS}\text{-}\mathit{opt}}\) are calculated using (12) and (13), respectively.

Under an incomplete information setting, if EDAs carry out balanced coevolution (for both POP B and POP S ), the agreement price and agreement time obtained for B will be very close to the agreement price and agreement time obtained for S, respectively. This is because the optimal agreement price and agreement time of B should be equal to those of S, respectively. To verify the effectiveness of the coevolved PS-optimizing negotiation strategies of both B and S, we compare the coevolution results (obtained from coevolutionary learning using either S-EDAs or ID2C-EDAs) with the optimal results (obtained under a complete information setting) by examining the following two conditions: (1) closeness to the optimum: the obtained PS-optimizing negotiation outcomes and coevolved PS-optimizing negotiation strategies should be close to the optimal results and (2) balanced coevolution: the obtained negotiation outcomes of B should be the same as those of S as a result of coevolutionary learning.

First, for measuring closeness between the coevolution results and optimal results, the following three types of closeness metric are devised for each type of EDA:

  1. (1)

    \(\delta_{\mathit{dist}}^{P_{c}^{x}}\) measures the closeness between \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and the agreement price \(\bar{P}_{c}^{x\text{ (S-EDA)}}\) (respectively, \(\bar{P}_{c}^{x\text{ (ID$^{2}$C-EDA)}}\)) obtained from coevolutionary learning as follows:

    \(\delta_{\mathit{dist}}^{P_{c}^{x}}\text{(S-EDA)} =|\bar{P}_{c}^{x\text{ (S-EDA)}} - P_{c}^{\mathit{PS}\text{-}\mathit{opt}}|\) for the coevolution using S-EDA,

    \(\delta_{\mathit{dist}}^{P_{c}^{x}}\text{(ID$^{2}$C-EDA)} =|\bar{P}_{c}^{x\text{ (ID$^{2}$C-EDA)}} - P_{c}^{\mathit{PS}\text{-}\mathit{opt}}|\) for the coevolution using ID2C-EDA

where if S-EDA (respectively, ID2C-EDA) has obtained \(\bar{P}_{c}^{x\text{ (S-EDA)}}\) (respectively, \(\bar{P}_{c}^{x\text{ (ID$^{2}$C-EDA)}}\)) that is the same as \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) from coevolutionary learning, then \(\delta_{\mathit{dist}}^{P_{c}^{x}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{P_{c}^{x}}\text{(ID$^{2}$C-EDA)}\)) will be 0; otherwise, \(\delta_{\mathit{dist}}^{P_{c}^{x}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{P_{c}^{x}}\text{(ID$^{2}$C-EDA)}\)) will be larger than 0.

  2. (2)

    \(\delta_{\mathit{dist}}^{T_{c}^{x}}\) measures the closeness between \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and the agreement time \(\bar{T}_{c}^{x\text{ (S-EDA)}}\) (respectively, \(\bar{T}_{c}^{x\text{ (ID$^{2}$C-EDA)}}\)) obtained from coevolutionary learning as follows:

    \(\delta_{\mathit{dist}}^{T_{c}^{x}}\text{(S-EDA)} = \bar{T}_{c}^{x\text{ (S-EDA)}} - \max(T_{c}^{\mathit{PS}\text{-}\mathit{opt}})\) for the coevolution using S-EDA,

    \(\delta_{\mathit{dist}}^{T_{c}^{x}}\text{(ID$^{2}$C-EDA)} = \bar{T}_{c}^{x\text{ (ID$^{2}$C-EDA)}} - \max(T_{c}^{\mathit{PS}\text{-}\mathit{opt}})\) for the coevolution using ID2C-EDA

where we consider its maximum value \(\max(T_{c}^{\mathit{PS}\text{-}\mathit{opt}})\) as the basis of closeness because \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) can be represented as a range of negotiation time. Hence, if S-EDA (respectively, ID2C-EDA) has obtained \(\bar{T}_{c}^{x\text{ (S-EDA)}}\) (respectively, \(\bar{T}_{c}^{x\text{ (ID$^{2}$C-EDA)}}\)) belonging to \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\), then \(\delta_{\mathit{dist}}^{T_{c}^{x}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{T_{c}^{x}}\text{(ID$^{2}$C-EDA)}\)) will be less than or equal to 0 (i.e., negative real numbers or 0); otherwise, \(\delta_{\mathit{dist}}^{T_{c}^{x}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{T_{c}^{x}}\text{(ID$^{2}$C-EDA)}\)) will have positive real numbers.

  3. (3)

    \(\delta_{\mathit{dist}}^{\lambda _{x}}\) measures the closeness between \(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}}\) and coevolved PS-optimizing negotiation strategy \(\bar{\lambda}_{x}^{\mathit{PS}\text{-}\mathit{opt}\text{ (S-EDA)}}\) (respectively, \(\bar{\lambda}_{x}^{\mathit{PS}\text{-}\mathit{opt}\text{ (ID$^{2}$C-EDA)}}\)) from coevolutionary learning as follows:

    \(\delta_{\mathit{dist}}^{\lambda _{x}}\text{(S-EDA)} = \bar{\lambda}_{x}^{\mathit{PS}\text{-}\mathit{opt}\text{ (S-EDA)}} - \max(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}})\) for the coevolution using S-EDA,

    \(\delta_{\mathit{dist}}^{\lambda _{x}}\text{(ID$^{2}$C-EDA)} = \bar{\lambda}_{x}^{\mathit{PS}\text{-}\mathit{opt}\text{ (ID$^{2}$C-EDA)}} - \max(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}})\) for the coevolution using ID2C-EDA

where we consider the maximum value \(\max(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}})\) as the basis of closeness because \(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}}\) can be represented as a range of strategy. Hence, if S-EDA (respectively, ID2C-EDA) has coevolved \(\bar{\lambda}_{x}^{\mathit{PS}\text{-}\mathit{opt}\text{ (S-EDA)}}\) (respectively, \(\bar{\lambda}_{x}^{\mathit{PS}\text{-}\mathit{opt}\text{ (ID$^{2}$C-EDA)}}\)) belonging to \(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}}\), then \(\delta_{\mathit{dist}}^{\lambda _{x}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{\lambda _{x}}\text{(ID$^{2}$C-EDA)}\)) will be less than or equal to 0 (i.e., negative real numbers or 0); otherwise, \(\delta_{\mathit{dist}}^{\lambda _{x}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{\lambda _{x}}\text{(ID$^{2}$C-EDA)}\)) will have positive real numbers.

Second, for checking whether balanced negotiation outcomes were achieved, we will simply compare the closeness metric of B with that of S for the obtained agreement prices and agreement times, respectively, as follows:

  1. (1)

    \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) are compared for the coevolution using S-EDA,

    \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) are compared for the coevolution using ID2C-EDA

where if \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\)) have the same values, then it is determined that S-EDAs (respectively, ID2C-EDAs) achieved balanced agreement prices (for both B and S); otherwise, it is determined that S-EDAs (respectively, ID2C-EDAs) achieved biased agreement prices (for both B and S).

  2. (2)

    \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) are compared for the coevolution using S-EDA,

    \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) are compared for the coevolution using ID2C-EDA

where if \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\)) have the same values, then it is determined that S-EDAs (respectively, ID2C-EDAs) achieved balanced agreement times (for both B and S); otherwise, it is determined that S-EDAs (respectively, ID2C-EDAs) achieved biased agreement times (for both B and S).

In addition, for measuring and comparing the coevolution performance between S-EDAs and ID2C-EDAs in terms of the number of generations, we use N Gen and \(N^{\mathit{Re\_init}}\) together as a performance measure. This is because the total average number of generations for coevolutionary learning is determined as \((N^{\mathit{Re\_init}} \times G^{\max} ) + N^{\mathit{Gen}}\). Furthermore, \(N^{\mathit{Re\_init}}\) gives additional information about the coevolution capability of S-EDAs and ID2C-EDAs. For example, there can be cases in which \(\bar{N}^{\mathit{Re\_init}(\text{S-EDA})}\) (respectively, \(\bar{N}^{\mathit{Re\_init}(\mathrm{ID}^{2}\text{C-EDA})}\)) has reached CNT max, which means that the S-EDAs (respectively, ID2C-EDAs) do not have enough coevolution capability for achieving converged solutions. A higher value of \(\bar{N}^{\mathit{Re\_init}(\text{S-EDA})}\) (respectively, \(\bar{N}^{\mathit{Re\_init}(\mathrm{ID}^{2}\text{C-EDA})}\)) indicates insufficient coevolution capability of the S-EDAs (respectively, ID2C-EDAs) for achieving converged solutions.
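The performance measures above reduce to a few one-line computations; the sketch below (illustrative) assumes \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}}\) are given as (min, max) ranges and \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) as a single value.

def delta_price(P_bar, P_opt):
    # Closeness of the coevolved agreement price to P_c^PS-opt.
    return abs(P_bar - P_opt)

def delta_to_range_max(value_bar, opt_range):
    # Closeness of a coevolved agreement time (or strategy) to the maximum of its optimal range;
    # a value <= 0 means the coevolved value lies within (or below) the optimal range.
    return value_bar - max(opt_range)

def balanced(delta_B, delta_S, tol=0.0):
    # Balanced coevolution check: B's and S's closeness metrics should coincide.
    return abs(delta_B - delta_S) <= tol

def total_generations(n_reinit, g_max, n_gen):
    # Total average number of generations spent: (N^Re_init x G^max) + N^Gen.
    return n_reinit * g_max + n_gen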

5.2 Observations and analyses

5.2.1 Results of Type-I experiments

For (Long, Mid) in Table 6, (Mid, Short) in Table 7 and (Long, Short) in Table 8, B has a bargaining advantage over S in terms of time. Furthermore, in (Long, Mid), (Mid, Short) and (Long, Short), \(\mathit{dP}_{c}^{B}\) and \(\min(\mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S})\)—which determine the optimum agreement points of all optimization modes 1 to 9—are in the range of NSS. Hence, we calculated both \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) from Theorem 3, \(\lambda_{B}^{\mathit{PS}\text{-}\mathit{opt}}\) from (12) and \(\lambda_{S}^{\mathit{PS}\text{-}\mathit{opt}}\) from (13). From the results in Tables 6 to 8, the following two observations were drawn.

Table 7 Results of Type-I experiments in (Mid, Short)
Table 8 Results of Type-I experiments in (Long, Short)

Observation 1

While S-EDAs generally could not obtain effective PS-optimizing negotiation outcomes and coevolve effective PS-optimizing negotiation strategies in most of the optimization modes, ID2C-EDAs generally obtained effective PS-optimizing negotiation outcomes and coevolved effective PS-optimizing negotiation strategies in all the optimization modes.

Analysis

It can be observed from Tables 6 to 8 that: (1) the values of \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) were large (especially for S), and the difference between \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) was too large for the agreement prices to be effective in most of the optimization modes, (2) the values of \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) were large positive real numbers and the difference between \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) was too large for the agreement times to be effective in most of the optimization modes, and (3) the values of \(\delta_{\mathit{dist}}^{\lambda _{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{\lambda _{S}}\text{(S-EDA)}\) were large positive real numbers (especially for S), so the coevolved PS-optimizing negotiation strategies were not effective in most of the optimization modes. These indicate that S-EDAs generally: (1) achieved ineffective agreement prices (especially for S) and agreement times and (2) coevolved ineffective PS-optimizing negotiation strategies in terms of both closeness to the optimum and balanced coevolution. This was mainly because of the different coevolution speeds of POP B and POP S ; since POP B converged to a near-optimal value more rapidly than POP S , POP S could not have sufficient diversity (of opponents) for optimizing its solutions through coevolutionary interactions, and hence S-EDAs could not achieve effective and balanced solutions. This phenomenon can be considered as premature convergence in the coevolution situation.

It can be also observed from Tables 6 to 8 that: (1) both the values of \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) were either small or optimal, and the difference between \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) was also small in all the optimization modes; (2) the values of \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) were small or in the range of the optimum and the difference between \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) was also small in most of the optimization modes; and (3) the values of \(\delta_{\mathit{dist}}^{\lambda _{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{\lambda _{S}}\text{(ID$^{2}$C-EDA)}\) were small in most of the optimization modes. These indicate that ID2C-EDAs generally could: (1) achieve both effective agreement prices and agreement times and (2) coevolve effective PS-optimizing negotiation strategies in terms of both closeness to the optimum and balanced coevolution. This was because ID2C-EDAs have sufficient capability for achieving close to optimal and balanced solutions for both B and S by dynamically adjusting the degree of intensification and diversification of POP B and POP S using DR. Furthermore, by adopting LNS and PR, it is possible for ID2C-EDAs to improve solution accuracy and to avoid inappropriate population configurations, respectively.

Observation 2

While S-EDAs could not coevolve converged populations, ID2C-EDAs generally coevolved converged populations.

Analysis

It can be observed from Tables 6 to 8 that the values of \(\bar{N}^{\mathit{Re\_init}(\text{S-EDA})}\) reached CNT max in most of the optimization modes while the values of \(\bar{N}^{\mathit{Re\_init}(\mathrm{ID}^{2}\text{C-EDA})}\) were much smaller than \(\bar{N}^{\mathit{Re\_init}(\text{S-EDA})}\) (and were often close to 0) in all the optimization modes. Hence, compared to ID2C-EDAs, S-EDAs required an extremely large number of (total) generations for the coevolution. This indicates that S-EDAs generally did not have enough capability for coevolving converged populations, while ID2C-EDAs had enough capability for coevolving converged populations within a reasonable number of generations. The reason that ID2C-EDAs outperformed S-EDAs is mainly the DR procedure (in the coevolution process), which allows ID2C-EDAs to search for promising solutions adaptively. In DR, the diversification procedure helps to avoid premature convergence in a population by maintaining population diversity at a certain level, and the refinement procedure helps to achieve optimal solutions by generating promising solutions using regional population history information and replacing less feasible solutions with the generated promising solutions. Furthermore, LNS contributes to resolving the problem of late convergence of populations, and PR helps to avoid configuring inappropriate populations in early generations. Hence, by adopting DR together with LNS and PR, each ID2C-EDA is more likely to escape premature convergence and maintain enough population diversity, which enables ID2C-EDAs to coevolve converged populations in the coevolutionary learning.

From these Observations 1 and 2, we can draw the following conclusion for the Type-I experiments.

Conclusion 1

When a negotiation setting with sufficiently large AgZ NP is provided for PS-optimizing agents, ID2C-EDAs generally coevolve effective (converged) PS-optimizing negotiation strategies for both B and S, while S-EDAs generally fail to coevolve such negotiation strategies within a reasonable number of generations in most of the cases.

5.2.2 Results of Type-II experiments

For (Long, Mid) in Table 9, (Mid, Short) in Table 10 and (Long, Short) in Table 11, B has a bargaining advantage over S in terms of time. In addition, in (Long, Mid), (Mid, Short) and (Long, Short), \(\mathit{dP}_{c}^{B}\) and min(\(\mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S}\)) of the optimization modes 1 to 6 are not in the range of NSS. Hence, we calculated both \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) from Theorem 5, \(\lambda_{B}^{\mathit{PS}\text{-}\mathit{opt}}\) from (12) and \(\lambda_{S}^{\mathit{PS}\text{-}\mathit{opt}}\) from (13). In contrast, in (Long, Mid), (Mid, Short) and (Long, Short), \(\mathit{dP}_{c}^{B}\) and \(\min(\mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S})\) of the optimization modes 7 to 9 are in the range of NSS. Hence, we calculated both \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) from Theorem 3, \(\lambda_{B}^{\mathit{PS}\text{-}\mathit{opt}}\) from (12) and \(\lambda_{S}^{\mathit{PS}\text{-}\mathit{opt}}\) from (13). From the results in Tables 9 to 11, the following three observations were drawn.

Table 9 Results of Type-II experiments in (Long, Mid)
Table 10 Results of Type-II experiments in (Mid, Short)
Table 11 Results of Type-II experiments in (Long, Short)

Observation 3

In optimization modes 1 to 6, both S-EDAs and ID2C-EDAs generally achieved effective PS-optimizing negotiation outcomes and coevolved effective PS-optimizing negotiation strategies.

Analysis

From the results of the optimization modes 1 to 6 in Tables 9 to 11, it can be seen that: (1) the values of \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) were small and the difference between \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) was also small, (2) the values of \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) were in the range of the optimum and the difference between \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) was small, and (3) the values of \(\delta_{\mathit{dist}}^{\lambda _{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{\lambda _{S}}\text{(S-EDA)}\) were in the range of the optimum. These indicate that S-EDAs generally could: (1) obtain effective agreement prices and agreement times and (2) coevolve effective PS-optimizing negotiation strategies in terms of both closeness to the optimum and balanced coevolution. Similarly, from all the results of the optimization modes 1 to 6 in Tables 9 to 11, it can be seen that: (1) the values of \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) were 0 and the difference between \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) was also 0; (2) the values of \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) were 0, and the difference between \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) was also 0; and (3) the values of \(\delta_{\mathit{dist}}^{\lambda _{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{\lambda _{S}}\text{(ID$^{2}$C-EDA)}\) were in the range of the optimum. In summary, both S-EDAs and ID2C-EDAs could obtain optimal PS-optimizing negotiation outcomes and coevolve optimal PS-optimizing negotiation strategies for these modes. In the results, the coevolved negotiation agreements were made at RP S and τ S . This is because B, adopting the time-dependent negotiation strategy, obtains all of its payoff at τ S by accepting S's final proposal RP S . Since the fitness function puts more emphasis on optimizing price than on optimizing speed by setting \(\rho_{\mathit{NP}}=w_{\mathit{NP}}^{x}/100\) and \(\rho_{\mathit{NS}} = w_{\mathit{NS}}^{x}\), S-EDAs and ID2C-EDAs are less likely to make price concessions for achieving rapid agreements and strictly hold RP S as the optimal agreement price. This explains the successful coevolution performance of S-EDAs in these cases (compared with the Type-I experiments).

Observation 4

In the optimization modes 7 to 9, ID2C-EDAs generally obtained effective PS-optimizing negotiation outcomes and coevolved effective PS-optimizing negotiation strategies while S-EDAs did not.

Analysis

From the results of the optimization modes 7 to 9 in Tables 9 to 11, it can be seen that: (1) the values of \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) were large and the difference between \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) was too large for the agreement prices to be effective; (2) the values of \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) were large positive real numbers and the difference between \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) was too large for the agreement times to be effective; and (3) the values of \(\delta_{\mathit{dist}}^{\lambda _{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{\lambda _{S}}\text{(S-EDA)}\) were large positive real numbers, so the coevolved PS-optimizing negotiation strategies were not effective. These indicate that S-EDAs generally: (1) obtained ineffective agreement prices and agreement times and (2) coevolved ineffective PS-optimizing negotiation strategies in terms of both closeness to the optimum and balanced coevolution. In contrast, from all the results of the optimization modes 7 to 9 in Tables 9 to 11, it can also be observed that: (1) the values of \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) were small and the difference between \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) was also small; (2) the values of \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) were large positive real numbers but the difference between \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) was small; and (3) the values of \(\delta_{\mathit{dist}}^{\lambda _{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{\lambda _{S}}\text{(ID$^{2}$C-EDA)}\) were small. These indicate that ID2C-EDAs generally could: (1) achieve effective agreement prices and agreement times and (2) coevolve effective PS-optimizing negotiation strategies in terms of both closeness to the optimum and balanced coevolution.

Observation 5

S-EDAs could not coevolve converged populations in some cases while ID2C-EDAs generally coevolved converged populations in most of the cases.

Analysis

From the results of optimization modes 1 to 6 in Tables 9 to 11, it can be observed that the values of \(\bar{N}^{\mathit{Re\_init}(\text{S-EDA})}\) were 0 in (Long, Mid) and (Mid, Short) but were very high (>6) in (Long, Short). Hence, S-EDAs required an extremely large number of generations for coevolving converged populations in (Long, Short). This is because (Long, Short) has a large search space compared to (Long, Mid) and (Mid, Short). This indicates that the search capability of S-EDAs deteriorated in the large search space of (Long, Short). In contrast, it can also be observed from the results of optimization modes 1 to 6 in Tables 9 to 11 that the values of \(\bar{N}^{\mathit{Re\_init}(\mathrm{ID}^{2}\text{C-EDA})}\) were all 0. This indicates that ID2C-EDAs have enough search capability even in the large search space. From most of the results of the optimization modes 7 to 9 in Tables 9 to 11, it can be observed that the values of \(\bar{N}^{\mathit{Re\_init}(\text{S-EDA})}\) were 10. This indicates that S-EDAs did not have enough capability for coevolving converged populations within a reasonable number of generations for these modes. In contrast, from all the results of the optimization modes 7 to 9 in Tables 9 to 11, it can be observed that the values of \(\bar{N}^{\mathit{Re\_init}(\mathrm{ID}^{2}\text{C-EDA})}\) were very small. This indicates that ID2C-EDAs have enough capability for coevolving converged populations within a reasonable number of generations for these modes. A similar analysis to that used in Observation 2 explains why ID2C-EDAs outperformed S-EDAs.

From the Observations 3 to 5, we can draw the following conclusion for the Type-II experiments.

Conclusion 2

When negotiation settings with insufficient AgZ NP are provided for PS-optimizing agents, ID2C-EDAs generally coevolve effective converged PS-optimizing negotiation strategies for both B and S within a reasonable number of generations, while S-EDAs have a high possibility of failing to coevolve such negotiation strategies.

6 Related works

Since this work mainly focuses on finding effective negotiation strategies for PS-optimizing agents with incomplete information using coevolutionary learning, the related works discussed here are approaches that use EAs for evolving negotiation strategies.

There are some existing works that use EAs as a decision-making component to determine an agent’s optimal negotiation strategy (one that ensures reaching an agreement successfully and achieving higher utilities) under incomplete information settings by generating adaptive proposals at every negotiation round (e.g., [18, 39, 40]). In this work, however, EAs were used to find effective negotiation strategies for both agents through coevolutionary learning under an incomplete information setting. Hence, this section only introduces and discusses related works on applying EAs to learn effective negotiation strategies.

In [24], Oliver utilized standard GAs [7] for learning (simple) strategies of agents in which the agents use quite simple threshold rules for bargaining, and showed that by adopting GAs, agents can learn strategies for simple negotiation games. Whereas the empirical studies seem to indicate that the agents in [24] are generally successful in learning effective strategies, the research is limited to learning simple threshold rules; offers are accepted if they yield a utility above a predefined threshold.

In [23], Matos et al. also utilized a GA for learning the most successful strategies against different types of opponents in different negotiation situations, in which the service-oriented negotiation model in [4] was adopted to determine successful strategies for different types of environment by coevolving negotiation strategies and tactics. The empirical studies in [23] were carried out for bilateral negotiations with two issues and showed that the agents adopting the GA are generally effective in evolving effective strategies for different negotiation circumstances. Nevertheless, the approach used in [23] has a serious limitation in that it requires a centralized coevolution model where complete information about each agent is assumed for evolving the populations using one GA. Furthermore, such an assumption is not realistic in many practical negotiation systems in which agents generally have incomplete information about each other.

In [13], Jin and Tsang utilized genetic programming (GP) to compare the evolved results with the sub-game perfect equilibrium (SPE) solutions obtained from game-theoretic analysis of complete-information bargaining problems and showed that GP achieved approximations of the SPE solutions. Later, in [12], Jin extended the simple bargaining problems to incomplete-information bargaining problems and showed that GP was capable of achieving reasonably good solutions. Similar to [23], the works [13] and [12] also rely on a centralized coevolution model, since the coevolutionary learning uses one GP.

This paper significantly extends the previous works reported in [8, 9, 32] and [10].

In [32], Sim utilized an EDA (specifically, UMDAc) for coevolving effective negotiation strategies of agents having different preference criteria for optimizing price and optimizing negotiation speed, and the preliminary empirical results in [32] showed that the EDA was capable of coevolving PS-optimizing negotiation strategies of agents for P-optimizing negotiation. The fitness function used in [32] was the (total) utility function, similar to (7) in this work, consisting of both price and speed utility functions with a weighting factor attached to each utility function. However, using the coevolved results in [32], it is difficult to investigate the convergence of the EDA because the results contain no information about the number of generations required to achieve converged populations. Later, in the extended work [9], Gwak and Sim identified the problem with the fitness function in [32] and devised new fitness functions based on measuring the difference between: (1) the ratio of the price weighting factor to the time weighting factor and (2) the corresponding ratio of price utility to speed utility. Although the fitness function in [32] has some ambiguity in defining better negotiation solutions with higher fitness in the composite utility space (as described in Sect. 4.1), the fitness functions in [9] are more effective in defining better negotiation solutions because they can differentiate negotiation strategies matching the given ratio of the price weighting to the speed weighting from other strategies. Empirical results in [9] showed that the devised fitness functions outperform the fitness function used in [32]. Furthermore, comparing the coevolution performance of the conventional GA and EDA for finding effective negotiation strategies showed that both have limited performance for coevolving effective negotiation strategies.
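As a rough sketch of the idea behind the fitness functions in [9] (the exact functional form is given in that paper and is not reproduced here), fitness improves as the ratio of achieved price utility to speed utility approaches the ratio of the price weighting to the speed weighting:

    def ratio_matching_fitness(u_price, u_speed, w_price, w_speed, eps=1e-9):
        """Illustrative sketch in the spirit of [9]: reward strategies whose
        achieved utility ratio matches the preference-weight ratio. The exact
        form (and scaling) used in [9] differs."""
        target_ratio = w_price / (w_speed + eps)
        achieved_ratio = u_price / (u_speed + eps)
        return -abs(target_ratio - achieved_ratio)   # closer to 0 is better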

In [8] and [10], Gwak and Sim empirically showed that the conventional GA and EDA generally could not achieve effective (or near-optimal) coevolution results for finding optimal P-optimizing negotiation solutions. In addition, under the assumption that dynamic diversity controlling methods can assist EAs in coevolving optimal solutions for both populations (where both agents adopt different EAs and a decentralized coevolution model is assumed), the DR procedure was devised, and two local improvement methods called LNS and PR were devised to further improve coevolution performance. When DR together with LNS and PR is incorporated into the conventional GA (respectively, the conventional EDA), we call the result ID2C-GA (respectively, ID2C-EDA). Empirical results showed that ID2C-GA and ID2C-EDA have complementary performance in that each achieved better performance than the other for some cases. Furthermore, since it was also shown that ID2C-EDA has better performance than ID2C-GA for coevolving optimal strategies in the larger solution space, we adopted ID2C-EDA as the EA model for the coevolutionary learning.
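Structurally, an ID2C-EDA can be viewed as a standard EDA generation loop augmented with three hooks, as the following high-level Python sketch suggests; the internals of DR, LNS, and PR are defined in [8, 10] and are represented here only as opaque callables.

    def id2c_eda_generation(population, model, dr, lns, pr, fitness):
        """One generation of an ID2C-EDA, sketched at a high level.
        `model` stands for the EDA's probabilistic model (e.g., per-gene
        Gaussians for UMDAc); `dr`, `lns` and `pr` stand for the dynamic
        diversity controlling and local improvement procedures of [8, 10]."""
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[: max(1, len(population) // 2)]

        model.estimate(elites)                        # standard EDA model update
        offspring = model.sample(len(population))

        offspring = dr(offspring)                     # keep population diversity under control
        offspring = [lns(ind) for ind in offspring]   # local neighbourhood search
        offspring = pr(offspring, elites)             # path relinking toward elite solutions
        return offspring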

Finally, this work significantly extends [32] and [9] and improves upon the closely related works [12, 13, 23, 24] as follows:

  1. (1)

    In [32] and [9], theoretical results exist only for the optimal negotiation solutions of P-optimizing negotiations, given as Theorems 1 and 2; there is no such theory for the other types of negotiations (e.g., PS-optimizing negotiations). To this end, this work provides the theoretical background of optimal negotiation solutions for PS-optimizing negotiations, given as Theorems 3 to 5, by designing optimal PS-optimizing agents with complete information. Hence, it is possible (i) to calculate optimal negotiation solutions for each PS-optimization mode and (ii) to evaluate the optimality of coevolved negotiation solutions (under an incomplete information setting) by comparing them with the optimal negotiation solutions.

  2. (2)

    Since the fitness function in [9] (showing better performance than the fitness function in [32]) simply measures the difference of ratios between weightings and utilities, higher fitness values only indicate that the ratios are closer. Furthermore, using the fitness function in [25], it is hard to find a direct relationship between fitness and the optimal PS-optimizing negotiation solutions obtained from Theorems 3 to 5. Hence, it was necessary to develop the new fitness function in (17), which is based on the composite likelihoods of the agreement price in (14) and the agreement time in (15) (see the illustrative sketch after this list).

  3. (3)

    Although the previous works [12, 13, 23, 24] and [32] assumed a fully centralized coevolution model using one EA, in which complete information about each agent is available for evolutionary learning, this work provides the fully decentralized coevolution model in Sect. 4.2, which uses two EDAs (for the populations of B and S, respectively) and a coordinator to share and determine the conditions for coevolution termination and inappropriate coevolution. The decentralized coevolution model is more realistic in simulating negotiations between agents having fully incomplete information about each other.

  4. (4)

    Although the experiments in both [32] and [9] were carried out under the assumption that both agents have the same negotiation mode (e.g., if B is exact-PS-optimizing, then S is also exact-PS-optimizing), this work carries out extensive experiments considering all possible combinations of (representative) PS-optimizing negotiation modes between B and S (see Table 3).

  5. (5)

    As shown in [8] and [10], conventional EAs such as S-GAs and S-EDAs have some drawbacks for coevolutionary learning due to premature convergence and biased coevolution effects, which can also be observed in the results of [32] and [9]. In contrast, ID2C-GAs and ID2C-EDAs have sufficient coevolution capability because they are augmented with DR together with LNS and PR. The empirical results in Observations 1 to 5 demonstrated that effective (sometimes optimal) PS-optimizing negotiation solutions can be achieved using ID2C-EDAs for the coevolutionary learning, whereas such solutions cannot be achieved using S-EDAs. Hence, this paper can be seen as an extension of [8] and [10] providing empirical evidence of the effectiveness of ID2C-EDAs for coevolutionary learning.
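To make the composite-likelihood fitness mentioned in item (2) above concrete, the following sketch shows the general shape of a fitness built from Gaussian likelihoods of the achieved agreement price and time relative to the desired ones; the precise forms of (14), (15) and (17), including the widths and the exact magnification by the preference weights, are specified in Sect. 4.2 and are not reproduced here.

    import math

    def gaussian_similarity(achieved, desired, width):
        """Likelihood-style closeness in (0, 1]; equals 1 when achieved == desired."""
        return math.exp(-((achieved - desired) ** 2) / (2.0 * width ** 2))

    def ps_fitness(agr_price, agr_time, desired_price, desired_time,
                   w_price, w_speed, price_width, time_width):
        """Sketch of a composite likelihood-based PS-optimizing fitness in the
        spirit of (14), (15) and (17); the widths and the weighting scheme here
        are assumptions for illustration."""
        sim_price = gaussian_similarity(agr_price, desired_price, price_width)
        sim_speed = gaussian_similarity(agr_time, desired_time, time_width)
        return w_price * sim_price + w_speed * sim_speed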

7 Conclusion and future work

Based on the theoretical results obtained in Sect. 4.1 for finding optimal negotiation strategies of PS-optimizing agents with complete information, this work has developed an effective coevolutionary learning mechanism (by adopting ID2C-EDAs) for finding effective negotiation strategies for PS-optimizing agents with incomplete information. The novel feature and significance of this research therefore lie in designing and developing negotiation mechanisms that can: (1) optimize both price and negotiation speed of PS-optimizing agents with complete information and (2) coevolve effective, or (near-)optimal, negotiation strategies for PS-optimizing agents with incomplete information.

The contributions of this work are detailed as follows.

  1. (1)

    For PS-optimizing negotiation under a complete information setting, this work determines \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) (Theorems 3 to 5 in Sect. 4.1), which lead to optimal negotiation strategies for both PS-optimizing agents [(12) and (13) in Sect. 4.1]. Whereas Theorems 1 and 2 are based on Theorems 1 and 2 in [40, pp. 199–200], this research is, to the best of the authors' knowledge, the earliest work suggesting the optimality of agreements between PS-optimizing agents with incomplete information.

  2. (2)

    This contribution distinguishes this work from [40] and [5] in that (i) [5] only showed that there are three classes of optimal strategy, namely Boulware, Linear and Conceder, depending on the negotiation scenario, and (ii) [40] only focused on showing that there are optimal negotiation strategies for both P-optimizing agents in which one agent (having a bargaining advantage over its opponent in terms of time) maximizes its price utility and guarantees that an agreement is reached. The following summarizes the optimality of agreements for P-optimizing and PS-optimizing agents.

    Optimal agreement point of P-optimizing agents:

    When \(\tau_{B} > \tau_{S}\):

    \(\begin{cases} P_{c}^{P\text{-}\mathit{opt}} = \mathit{RP}_{S} \\ T_{c}^{P\text{-}\mathit{opt}} = \tau_{S} \end{cases}\)

    When \(\tau_{B} < \tau_{S}\):

    \(\begin{cases} P_{c}^{P\text{-}\mathit{opt}} = \mathit{RP}_{B} \\ T_{c}^{P\text{-}\mathit{opt}} = \tau_{B} \end{cases}\)

    Optimal agreement points of PS-optimizing agents:

    When \(\tau_{B} > \tau_{S}\):

    \(\begin{cases} P_{c}^{\mathit{PS}\text{-}\mathit{opt}} = \max_{\mathit{dP}_{c}^{x}} \{ U_{\mathit{NP}}^{B}(\mathit{dP}_{c}^{B}), U_{\mathit{NP}}^{B}(\mathit{dP}_{c}^{S})\} \\[4pt] T_{c}^{\mathit{PS}\text{-}\mathit{opt}} = [0, \max_{\mathit{dT}_{c}^{x}} \{ U_{\mathit{NS}}^{B}(\mathit{dT}_{c}^{B}), U_{\mathit{NS}}^{B}(\mathit{dT}_{c}^{S})\} ] \end{cases}\)

    When \(\tau_{B} < \tau_{S}\):

    \(\begin{cases} P_{c}^{\mathit{PS}\text{-}\mathit{opt}} = \max_{\mathit{dP}_{c}^{x}} \{ U_{\mathit{NP}}^{B}(\mathit{dP}_{c}^{B}), U_{\mathit{NP}}^{B}(\mathit{dP}_{c}^{S})\} \\[4pt] T_{c}^{\mathit{PS}\text{-}\mathit{opt}} = [0, \max_{\mathit{dT}_{c}^{x}} \{ U_{\mathit{NS}}^{B}(\mathit{dT}_{c}^{B}), U_{\mathit{NS}}^{B}(\mathit{dT}_{c}^{S})\} ] \end{cases}\)

  3. (3)

    Whereas several existing works (discussed in Sect. 6) adopt EAs for evolving successful negotiation strategies for agents under different negotiation situations, these works are limited in that: (1) agents mostly did not consider optimization of both price and negotiation speed and (2) centralized coevolution models were used for coevolutionary learning in which complete information settings for agents are generally assumed. However, agents in this work are designed to optimize both price and negotiation speed using coevolutionary learning for an incomplete information setting. Furthermore, we adopted a decentralized coevolution model in which incomplete information settings can be generally assumed.

  4. (4)

    In comparison with authors’ previous works [32] and [9], this paper has provided much more detailed and enhanced designs for PS-optimizing agents for both complete and incomplete information settings (Sect. 4).

  5. (5)

    A new fitness function was designed and implemented for the S-EDAs and ID2C-EDAs; it has the following novel features.

    1. (a)

      A likelihood-based (Gaussian) distance metric was formulated and applied to measure the closeness between the achieved agreement price and time and the desired agreement price and time, respectively.

    2. (b)

      The fitness function is a weighted linear combination of the individual similarities for price and negotiation speed, in which each similarity is weighted by the corresponding preference weight; this weighting also magnifies it for further discrimination from the other similarity.

    3. (c)

      Whereas the previous fitness functions in [32] and [9], which take the form of utility functions, deteriorate the intensification capability of both EDAs (Sect. 4.2), the proposed fitness function is effective in coevolving effective negotiation strategies for both PS-optimizing agents.

  6. (6)

    Empirical results (Observations 1 to 5) show that (i) ID2C-EDAs significantly outperform S-EDAs in terms of coevolution performance for achieving close-to-optimal and balanced solutions and (ii) throughout the (decentralized) coevolutionary learning, ID2C-EDAs adopting the proposed fitness function generally achieved effective, or (near-)optimal, negotiation strategies for both PS-optimizing agents for various combinations of preference criteria under negotiation settings with both sufficient and insufficient \(\mathit{AgZ}_{\mathit{NP}}\) (a minimal sketch of how such optimality is assessed follows this list). From these results, we conclude that ID2C-EDAs are more suitable than S-EDAs for the coevolutionary learning in achieving effective PS-optimizing negotiation strategies. Hence, this work together with [10] also provides evidence of the effectiveness of ID2C-EDAs for competitive coevolution of heterogeneous populations.
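As a minimal illustration of the optimality assessment referred to in item (6), the following sketch measures how far a coevolved agreement lies from the theoretical optimum of Theorems 3 to 5; the quantities, the interval representation of the optimal time, and the gap measures are illustrative, not the exact evaluation protocol of the experiments.

    def optimality_gap(coevolved_price, coevolved_time, opt_price, opt_time_range):
        """Sketch: deviation of a coevolved agreement from the theoretical optimum.
        `opt_time_range` is the interval of acceptable agreement times, e.g.
        [0, T_c^{PS-opt}] from Theorems 3 to 5; the representation is illustrative."""
        price_gap = abs(coevolved_price - opt_price)
        lo, hi = opt_time_range
        if lo <= coevolved_time <= hi:
            time_gap = 0.0
        else:
            time_gap = min(abs(coevolved_time - lo), abs(coevolved_time - hi))
        return price_gap, time_gap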

Finally, the authors acknowledge that although this work develops a coevolutionary learning approach for finding effective PS-optimizing negotiation strategies under an incomplete information setting in which one agent has a bargaining advantage over the other in terms of time, this work in its present form does not deal with the negotiation situation in which neither agent has a bargaining advantage in terms of time. Hence, extending this work to the design of PS-optimizing agents in which neither agent has a bargaining advantage is on the agenda for future work. Since the focus of this work is designing a PS-optimizing negotiation mechanism, for simplicity, this work only considers bilateral single-issue negotiations between PS-optimizing agents. In addition, ID2C-EDAs were adopted for coevolutionary learning because they showed better performance than ID2C-GAs in searching the larger solution space for competitive coevolution [10]. Therefore, other possible enhancements of this work include: (1) extending this work to deal with multi-issue negotiations and (2) adopting other types of EAs with the dynamic diversity controlling method of this work to compare coevolution performance and find a more effective and efficient EA model for the coevolutionary learning.