1 Introduction

In distributed systems in which autonomous agents interact on behalf of their owners, negotiation activities are essential for resolving differences and conflicting goals [11] and for controlling and managing resources [30, 34, 38]. Automated negotiation among agents has been widely used to support e-commerce and is also becoming increasingly important for managing massive distributed computational systems such as Grid/Cloud computing systems because interactions between participating agents can occur in many different contexts. Whereas there are many existing negotiation agents for e-commerce (e.g., [4, 22]), Grid resource management (e.g., [2, 20, 30, 31, 34, 38]) and Cloud resource management (e.g., [33, 35, 36]), (1) most of these negotiation agents are designed to reach an agreement consisting of coinciding proposals of participating agents, and (2) each agent’s decision to reach an agreement focuses on optimizing the value of the proposal (typically price) only, without consideration of reaching a consensus more rapidly (i.e., the participating agents do not consider optimizing negotiation speed). However, there are practical negotiation applications with time constraints in which both obtaining the cheapest possible resources and obtaining them rapidly are essential (e.g., negotiations for Grid or Cloud resources). In such applications, obtaining resources more rapidly is one of the most desirable properties, depending on the negotiation participants’ preferences, because any delay incurred while waiting for negotiations as well as for resources can be perceived as an overhead. Even though there is much existing research (e.g., [14, 16, 27]) that focuses on developing multi-attribute negotiation mechanisms to deal with different attributes (i.e., issues of negotiation such as price, quality, quantity, delivery time, etc.) of participating negotiation agents, there is little research that considers the duration of a negotiation (i.e., negotiation speed) as a factor affecting performance for time-constrained negotiations [32]. Whereas the (negotiation) success rate (i.e., the chance of successfully finding a mutually acceptable agreement) is the main consideration for negotiation agents that operate in domains without very stringent constraints on time, negotiation speed (as well as success rate) is an important consideration for agents that operate in domains with very stringent constraints on time. This is because negotiation agents that consider enhancing negotiation speed can make agreements quickly (by sacrificing expected utility on issues of negotiation) and can therefore also obtain higher success rates in negotiations in such time-constrained domains. In this regard, designing negotiation agents that consider negotiation speed and finding efficient (or optimal) negotiation strategies for such agents are the main focuses of this work. Even though this work currently deals with negotiation considering negotiation speed based on a single issue (i.e., price), it can be extended to deal with multi-attribute negotiation considering negotiation speed.

In this work, the agents focusing on optimizing price only and those optimizing both price and negotiation speed are denoted as price optimizing (P-optimizing) and price and speed optimizing (PS-optimizing) agents, respectively. PS-optimizing agents were first proposed and considered in [32]. To illustrate the targeted negotiation applications with examples, consider the following negotiation scenarios in which: (1) there are two types of self-interested PS-optimizing negotiation agents (i.e., they act so as to maximize their own outcomes) called a seller (or provider) and a buyer (or consumer) and (2) each seller and buyer has different preference criteria for optimizing both price and negotiation speed. The preference criteria of the seller can be classified into the following two cases.

  1. (1)

    The seller prefers to sell (or provide) a resource/service at a higher price than the given (expected) agreement price, at the expense of having to wait longer than the given (expected) agreement time. Such a seller is denoted as more price optimizing (more-P-optimizing).

  2. (2)

    The seller prefers to sell (or provide) a resource/service more rapidly than the given (expected) agreement time, perhaps by providing its resource/service at a lower price than the given (expected) agreement price at an earlier negotiation round. Such a seller is denoted as more speed optimizing (more-S-optimizing).

Similarly, the preference criteria of the buyer can also be classified into the following two cases.

  1. (1)

    The buyer prefers to acquire cheaper resource/service alternatives than the given (expected) agreement price, at the expense of having to wait longer than the given (expected) agreement time. Such a buyer is denoted as more-P-optimizing.

  2. (2)

    The buyer prefers to acquire a resource/service more rapidly than the given (expected) agreement time, perhaps by paying a higher price than the given (expected) agreement price at an earlier negotiation round. Such a buyer is denoted as more-S-optimizing.

To adequately address such negotiation problems, negotiation agents called PS-optimizing agents should be designed to: (1) determine the solution space SS PS-opt for PS-optimizing negotiation consisting of (i) the solution space SS NP for optimizing price (in which different possible preference criteria of price can be represented) and (ii) the solution space SS NS for optimizing negotiation speed (in which different possible preference criteria of negotiation speed can be represented), (2) appropriately optimize both price and negotiation speed in SS PS-opt for the given various combinations of possible preference criteria (of agents), and (3) make successful agreements in various negotiation situations. To this end, the impetus of this work is to devise mechanisms for finding effective PS-optimizing negotiation strategies (of agents) which result in reasonable PS-optimizing negotiation outcomes.

Based on the information that agents have about their opponents (i.e., the other participating agents), negotiation (parameter) settings can generally be classified into two types: (1) a complete information setting, in which (participating) agents share their private information with their opponents, and (2) an incomplete information setting, in which agents do not share their private information with their opponents. We denote the agent operating under a complete information setting (respectively, an incomplete information setting) as the agent with complete information (respectively, the agent with incomplete information). Following the above definitions, PS-optimizing agents are also divided into two categories based on the negotiation settings that they adopt. That is, a PS-optimizing agent under a complete information setting knows its opponent’s private information while a PS-optimizing agent under an incomplete information setting does not. Further details of the negotiation models for P-optimizing negotiation and PS-optimizing negotiation will be described and compared in Sect. 2.

The existing preliminary works in [32] and [9] attempted to find negotiation strategies for PS-optimizing agents with incomplete information using coevolutionary learning based on evolutionary algorithms (EAs). Nevertheless, the results in [32] and [9] showed that: (1) there are possibilities of coevolution failure using the fitness function defined in [32] due to the ambiguity in the utility space, and (2) converged coevolution results cannot be achieved in some cases using the conventional EAs adopted in [32] and [9]. Furthermore, in [32] and [9], there was no theory explaining and supporting the optimality of the achieved results. To overcome these drawbacks and to complement and enhance the existing PS-optimizing agents, this work will design: (1) PS-optimizing agents for performing effective PS-optimizing negotiations under a complete information setting (Sect. 4.1) and (2) mechanisms for finding effective negotiation strategies of PS-optimizing agents with incomplete information (Sect. 4.2) by using coevolutionary learning (described in Sect. 3) adopting estimation of distribution algorithms (EDAs).

A series of experiments (see Sect. 5) was carried out to: (1) show the effectiveness of coevolutionary learning for finding effective negotiation strategies of PS-optimizing agents with incomplete information and (2) compare the performance of coevolutionary learning adopting S-EDAs against that adopting ID2C-EDAs. Empirical results in Sect. 5 show that ID2C-EDAs can coevolve effective converged negotiation strategies that are close to the optimum for both PS-optimizing agents in most of the cases. While Sect. 6 compares this work with existing works, Sect. 7 concludes this paper by summarizing a list of contributions and future work.

2 Negotiation models

This work considers a bilateral negotiation model between two self-interested agents with conflicting interests: the seller (S) wishes to provide a good or service at the highest possible price while the buyer (B) wishes to purchase the good or service at the cheapest possible price. We first investigate one of the most widely used P-optimizing negotiation models for optimizing price only. Then, the P-optimizing negotiation model will be extended to the PS-optimizing negotiation model that is capable of optimizing both price and negotiation speed using preferences of price and negotiation speed.

2.1 Price optimizing negotiation model

In the P-optimizing negotiation model, there are three key elements of negotiation [15]: (1) the negotiation protocol, (2) the negotiation strategies that the agents adopt during the negotiation process, and (3) the utility functions for the agents. The agents adopt Rubinstein’s alternating offers protocol [26] and negotiate by exchanging proposals with their negotiation partners. The alternating offers protocol is simple but it is the most influential general negotiation protocol. Furthermore, it has been applied to many existing works (e.g., see [6, 21, 41]). At each alternate round, an agent makes and sends a proposal. Then, the other agent evaluates the proposal and takes one of the following actions: (1) accepting the proposal, (2) rejecting the proposal, or (3) making a counter proposal. Negotiation between the two agents terminates with an agreement when an offer or a counter-offer is accepted or with a conflict if no agreement is reached when one of the two agents’ deadlines is reached. An agreement is reached when one agent proposes a deal that matches or exceeds what another agent asks for.

The agent x∈{B,S} generates a proposal at a negotiation round t, 0≤t≤τ x , as follows:

$$ P_{t}^{x} = \mathit{IP}_{x} + ( - 1)^{\alpha} \biggl( \frac{t}{\tau_{x}} \biggr)^{\lambda _{x}}| \mathit{RP}_{x} - \mathit{IP}_{x} |, $$
(1)

where α=1 for S and α=0 for B. IP x is the initial price of x that is the most favorable price for x, and RP x is the reserve price that is the least favorable price for x; τ x is the deadline and λ x , 0≤λ x ≤∞, is the time-dependent strategy of x. During the negotiation process, starting from the initial prices, successive proposals of S are monotonically decreasing while successive proposals of B are monotonically increasing.
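
As an illustration of (1), the following sketch (with our own illustrative function and variable names, not code from any referenced system) computes an agent’s proposal at round t; a conservative λ (>1) keeps proposals close to IP x for most of the negotiation.

def propose(role, t, tau, ip, rp, lam):
    # Time-dependent proposal of Eq. (1): role is 'S' (seller) or 'B' (buyer),
    # t is the current round (0 <= t <= tau), ip/rp are IP_x/RP_x and lam is lambda_x.
    alpha = 1 if role == 'S' else 0        # alpha = 1 for S, alpha = 0 for B
    return ip + (-1) ** alpha * (t / tau) ** lam * abs(rp - ip)

# Example: a conservative seller (lambda = 2) with IP_S = 95, RP_S = 15, tau_S = 50
# proposes propose('S', 10, 50, 95, 15, 2.0) = 95 - (10/50)**2 * 80 = 91.8 at round 10.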

As shown in Fig. 1, for each agent x, the possible range of price, [IP x ,RP x ], is denoted as the acceptability zone for price of x, \(\mathit{AccZ}_{x}^{\mathit{NP}}\), and the possible range of negotiation time, [0,τ x ], is denoted as the acceptability zone for negotiation time of x, \(\mathit{AccZ}_{x}^{\mathit{NT}}\). The negotiation solution space (NSS) for the negotiation between B and S consists of: (1) the agreement zone of price (AgZ NP), or sometimes called the price-surplus, which is the overlapping region between \(\mathit{AccZ}_{B}^{\mathit{NP}}\) and \(\mathit{AccZ}_{S}^{\mathit{NP}}\), and (2) the agreement zone of negotiation time (AgZ NT) which is the overlapping region between \(\mathit{AccZ}_{B}^{\mathit{NT}}\) and \(\mathit{AccZ}_{S}^{\mathit{NT}}\). In Fig. 1, AgZ NP is [RP S ,RP B ] and AgZ NT is [0,min{τ B ,τ S }].

Fig. 1 Example of negotiation solution space between B and S

Time-dependent negotiation strategies are adopted in which the negotiation agents make successive proposals depending on the remaining negotiation time. The concession behavior of x is determined by the values of the time-dependent strategy and is classified as follows [28, 29, 37]:

  1. (1)

    Conciliatory (0<λ x <1): x makes larger concessions in earlier negotiation rounds and smaller concessions in later negotiation rounds.

  2. (2)

    Linear (λ x =1): x makes a constant rate of concession.

  3. (3)

    Conservative (1<λ x <∞): x makes smaller concessions in earlier negotiation rounds and larger concessions in later negotiation rounds.

Let D be the event in which x fails to reach an agreement. The utility function of x is defined as U x :[IP x ,RP x ]∪D→[0,1] such that U x (D)=0 and for any \(P_{t}^{x} \in [\mathit{IP}_{x},\mathit{RP}_{x}]\), \(U_{x}(P_{t}^{x}) > U_{x}(D)\) in which \(U_{x}(P_{t}^{x})\) is given as follows:

$$ U_{x}\bigl(P_{t}^{x}\bigr) = u_{\min} + (1 - u_{\min} ) \biggl( \frac{\mathit{RP}_{x} - P_{t}^{x}}{\mathit{RP}_{x} - \mathit{IP}_{x}} \biggr), $$
(2)

where u min is the minimum utility that x receives for reaching an agreement at RP x , and the value of u min is set larger than 0. u min is set to 0.001 in this work for experimental purposes. Then, at \(P_{t}^{x} = \mathit{RP}_{x}\), U x (RP x )=0.001>U x (D)=0.
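
For concreteness, a minimal sketch of (2) follows (the helper name is ours); it applies to both B and S because the numerator and denominator change sign together.

def price_utility(p, ip, rp, u_min=0.001):
    # Utility of agent x for an agreement price p in [IP_x, RP_x]; Eq. (2).
    # Failure (the event D) is handled outside this helper and yields utility 0.
    return u_min + (1.0 - u_min) * (rp - p) / (rp - ip)

# price_utility(rp, ip, rp) == u_min == 0.001 > 0 == U_x(D), as stated above.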

Definition 1

(P-optimizing Agent)

For a given negotiation setting, a P-optimizing agent is designed to optimize the price only by maximizing the utility in (2).

A negotiation between P-optimizing agents is denoted as the P-optimizing negotiation. Self-interested P-optimizing agents B and S favor an agreement that maximizes their own (price) utilities given in (2) at an agreement price.

In P-optimizing negotiations between B and S, finding their optimal negotiation strategies plays an important role in the sense that by adopting optimal negotiation strategies, both achieve optimal negotiation outcomes (i.e., optimal agreement prices). In determining optimal negotiation strategies for P-optimizing negotiations under complete information settings, the deadline effect is the most important factor. This is because if one P-optimizing agent has a longer deadline than the other, the agent with the longer deadline will dominate the whole negotiation. Since the strategy of the agent with the longer deadline determines whether both agents can reach an agreement before their deadlines, that agent has a (significant) bargaining advantage in terms of time over the other agent.

For a P-optimizing negotiation under a complete information setting, an agent knows the other agent’s private information such as RP and deadline. Therefore, the optimal agreement price (\(P_{c}^{P\text{-}\mathit{opt}}\)) and agreement time (\(T_{c}^{P\text{-}\mathit{opt}}\)) for the P-optimizing negotiation between B and S can be analyzed by the following theorems.

Theorem 1

[40, pp. 199–200]

If the P-optimizing agent B has a longer deadline than the P-optimizing agent S, \(P_{c}^{P\text{-}\mathit{opt}}\) is RP S and \(T_{c}^{P\text{-}\mathit{opt}}\) is τ S .

Proof

Since the minimal possible agreement price for B is RP S , at which B obtains its maximal utility, \(P_{c}^{P\text{-}\mathit{opt}}\) is made at RP S . Whatever strategy S adopts, S concedes to RP S at τ S following (1). Before reaching τ S , the utility of S’s proposals for B will be lower than the utility at τ S . Furthermore, B fails to reach an agreement after τ S . Hence, \(T_{c}^{P\text{-}\mathit{opt}}\) is made at τ S . □

Theorem 2

[40, pp. 199–200]

If the P-optimizing agent S has a longer deadline than the P-optimizing agent B, \(P_{c}^{P\text{-}\mathit{opt}}\) is RP B and \(T_{c}^{P\text{-}\mathit{opt}}\) is τ B .

Proof

Symmetrically, \(P_{c}^{P\text{-}\mathit{opt}}\) is made at RP B and \(T_{c}^{P\text{-}\mathit{opt}}\) is made at τ B because at these values S obtains its maximal utility. □

Finally, using the obtained \(P_{c}^{P\text{-}\mathit{opt}}\) and \(T_{c}^{P\text{-}\mathit{opt}}\) from Theorems 1 and 2, the optimal P-optimizing negotiation strategy of B (\(\lambda_{B}^{P\text{-}\mathit{opt}}\)) and the optimal P-optimizing negotiation strategy of S (\(\lambda_{S}^{P\text{-}\mathit{opt}}\)) are derived from (1) by substituting \(P_{c}^{P\text{-}\mathit{opt}}\) for \(P_{t}^{x}\) and \(T_{c}^{P\text{-}\mathit{opt}}\) for t, respectively, as follows:

$$ \lambda_{B}^{P\text{-}\mathit{opt}} = \frac{\ln ( ( P_{c}^{P\text{-}\mathit{opt}} - \mathit{IP}_{B} ) / ( \mathit{RP}_{B} - \mathit{IP}_{B} ) )}{\ln ( T_{c}^{P\text{-}\mathit{opt}} / \tau_{B} )}, $$
(3)
$$ \lambda_{S}^{P\text{-}\mathit{opt}} = \frac{\ln ( ( \mathit{IP}_{S} - P_{c}^{P\text{-}\mathit{opt}} ) / ( \mathit{IP}_{S} - \mathit{RP}_{S} ) )}{\ln ( T_{c}^{P\text{-}\mathit{opt}} / \tau_{S} )}. $$
(4)

Theorems 1 and 2 are based on Theorems 1 and 2 in [40, pp. 199–200].
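
A small sketch of how (3) and (4) can be evaluated is shown below; it simply solves (1) for λ, and the numeric values reuse the example parameters of Sect. 4.1 purely for illustration (function name ours).

import math

def optimal_lambda(p_c, t_c, tau, ip, rp):
    # Solve Eq. (1) for lambda given a target agreement price p_c and time t_c:
    # (t_c / tau)**lambda = |p_c - ip| / |rp - ip|.
    return math.log(abs(p_c - ip) / abs(rp - ip)) / math.log(t_c / tau)

# Theorem 1 case (B outlives S): P_c = RP_S, T_c = tau_S. With IP_B = 5, RP_B = 80,
# tau_B = 100, RP_S = 15 and tau_S = 50:
# optimal_lambda(15, 50, 100, 5, 80) = log(10/75) / log(0.5) ≈ 2.91.
# (For the agent whose own deadline equals T_c, (t_c/tau)**lambda = 1 for any lambda,
# so this substitution does not constrain that agent's strategy.)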

2.2 Price and speed optimizing negotiation model

The proposed PS-optimizing negotiation model also considers price as a negotiation issue similarly to the P-optimizing negotiation model in Sect. 2.1. Therefore, the agents participating in PS-optimizing negotiations exchange offers or counter-offers that consist of price proposals only—but not time proposals or both price and time proposals—during the negotiation process. However, compared to the P-optimizing negotiation model designed to take only price into consideration in optimizing negotiation outcomes, the PS-optimizing negotiation model is designed to take both price and negotiation speed (in terms of the number of negotiation rounds) into consideration in optimizing negotiation outcomes.

Definition 2

(PS-optimizing Agent)

For a given negotiation setting, a PS-optimizing agent is designed to optimize both price and negotiation speed (by maximizing the total utility consisting of both price and speed utilities) using its given preferences of price and negotiation speed.

A negotiation between PS-optimizing agents is denoted as the PS-optimizing negotiation.

The PS-optimizing negotiation model also has three key elements similarly to the P-optimizing negotiation model. The PS-optimizing agents also adopt Rubinstein’s alternating offers protocol and time-dependent negotiation strategies as in P-optimizing agents. However, for each PS-optimizing agent, there are two types of utility functions: (1) one designed to measure the degree of satisfaction for price and (2) the other designed to measure the degree of satisfaction for negotiation speed. Depending on the strategies that a PS-optimizing agent adopts, there can be a variety of possible PS-optimizing negotiation outcomes. Hence, this research will focus on designing PS-optimizing agents B and S and finding their optimal PS-optimizing negotiation strategies for achieving optimal PS-optimizing negotiation outcomes for their given preferences of price and negotiation speed.

In addition to the three key elements, the PS-optimizing negotiation model has one more key element: the preferences of price and negotiation speed (for each PS-optimizing agent). With regard to the preferences of price and negotiation speed of a PS-optimizing agent, different preference criteria such as optimizing price and optimizing negotiation speed are individually modeled as corresponding weightings of price and negotiation time, respectively; for a PS-optimizing agent x, the preference for optimizing price is denoted as \(w_{\mathit{NP}}^{x}\) and the preference for optimizing negotiation speed is denoted as \(w_{\mathit{NS}}^{x}\). Based on the user’s preferences for optimizing price and optimizing speed, \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\) are provided by the user with the constraint \(w_{\mathit{NP}}^{x} + w_{\mathit{NS}}^{x} = 1.0\) where \(w_{\mathit{NP}}^{x} \ge 0\) and \(w_{\mathit{NS}}^{x} \ge 0\). \(w_{\mathit{NP}}^{x} + w_{\mathit{NS}}^{x}\) is set to 1.0 because the preference criteria of x are interdependent and conflict with each other; if x prefers to achieve negotiation outcomes that are more P-optimizing at the expense of waiting longer, then x will put more emphasis on \(w_{\mathit{NP}}^{x}\) and less emphasis on \(w_{\mathit{NS}}^{x}\). Conversely, if x prefers to achieve its negotiation outcome more rapidly at the expense of conceding more in price, then x will put more emphasis on \(w_{\mathit{NS}}^{x}\) and less emphasis on \(w_{\mathit{NP}}^{x}\). Depending on different preference criteria, agents can be summarized into the following three representative groups [32]:

  1. (1)

    (Totally) P-optimizing agents in which total emphasis is given for optimizing price such as \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = (1.0, 0.0)\).

  2. (2)

    (Totally) S-optimizing agents in which total emphasis is given for optimizing negotiation speed such as \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = (0.0, 1.0)\).

  3. (3)

    PS-optimizing agents in which emphases are given for optimizing both price and negotiation speed such as \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = \{\text{the weightings except }(1.0, 0.0)\text{ and }\allowbreak (0.0, 1.0)\}\).

However, (totally) S-optimizing agents are not considered because such S-optimizing agents model the situation in which agents totally optimize negotiation speed without consideration of optimizing price. This negotiation situation is not realistic in practice because such S-optimizing agents generally reach an agreement without any negotiation by simply accepting their opponent’s first proposal. Hence, the possible region of preferences of price and negotiation speed is set as \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x})=[(1.0, 0.0), (0.0, 1.0))\).

With regard to the various possible combinations of preference criteria of PS-optimizing agents, there are three representative groups of PS-optimizing agents: (1) agents placing equal emphasis on optimizing price and negotiation speed, such as \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = (0.5, 0.5)\), are denoted as exact-PS-optimizing agents, (2) agents placing more emphasis on optimizing price than exact-PS-optimizing agents are denoted as more-P-optimizing agents, and (3) agents placing more emphasis on optimizing negotiation speed than exact-PS-optimizing agents are denoted as more-S-optimizing agents.

Compared with P-optimizing agents, the PS-optimizing agent x requires two types of utility functions: (1) a price utility function for measuring the degree of satisfaction in terms of price and (2) a speed utility function for measuring the degree of satisfaction in terms of negotiation speed. The price utility function \(U_{\mathit{NP}}^{x}\) for the given input P x (for price) and the speed utility function \(U_{\mathit{NS}}^{x}\) for the given input T x (for negotiation time) are defined as follows:

$$ U_{\mathit{NP}}^{x}(P_{x}) = u_{\min}^{P} + \bigl(1 - u_{\min}^{P}\bigr) \biggl( \frac{\mathit{RP}_{x} - P_{x}}{\mathit{RP}_{x} - \mathit{IP}_{x}} \biggr), $$
(5)
$$ U_{\mathit{NS}}^{x}(T_{x}) = u_{\min}^{S} + \bigl(1 - u_{\min}^{S}\bigr) \biggl( \frac{\tau_{x} - T_{x}}{\tau_{x}} \biggr), $$
(6)

where \(U_{\mathit{NP}}^{x}(P_{x}) \in [0, 1]\) and \(U_{\mathit{NS}}^{x}(T_{x}) \in [0, 1]\). \(u_{\min}^{P}\) is the minimum utility that x receives for a deal at its RP, and \(u_{\min}^{S}\) is the minimum utility that x receives for a deal at its deadline. For experimental purposes, the values of \(u_{\min}^{P}\) and \(u_{\min}^{S}\) are set to 0.0001. Next, to obtain the composite utility of x consisting of both \(U_{\mathit{NP}}^{x}\) and \(U_{\mathit{NS}}^{x}\), the following total utility function \(U_{\mathit{Total}}^{x}\) is used:

$$ U_{\mathit{Total}}^{x}( P_{x},T_{x} ) = w_{\mathit{NP}}^{x} \times U_{\mathit{NP}}^{x}( P_{x} ) + w_{\mathit{NS}}^{x} \times U_{\mathit{NS}}^{x}( T_{x} ) $$
(7)

where P x ∈{0,P c } and T x ∈{0,T c }. If x does not reach an agreement before its deadline, then \(U_{\mathit{Total}}^{x} = 0\) because \(U_{\mathit{NP}}^{x} = U_{\mathit{NS}}^{x} = 0\). If x reaches an agreement at P x =P c and T x =T c , then \(U_{\mathit{Total}}^{x}(P_{c},T_{c}) > 0\) because \(U_{\mathit{NP}}^{x} > 0\) and \(U_{\mathit{NS}}^{x} > 0\).
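
A minimal sketch of (5)–(7) follows (helper names ours, assuming the linear utility forms above):

def total_utility(p, t, ip, rp, tau, w_np, w_ns, u_min_p=0.0001, u_min_s=0.0001):
    # Composite utility of Eq. (7) for an agreement at price p and time t.
    u_np = u_min_p + (1.0 - u_min_p) * (rp - p) / (rp - ip)   # price utility, Eq. (5)
    u_ns = u_min_s + (1.0 - u_min_s) * (tau - t) / tau        # speed utility, Eq. (6)
    return w_np * u_np + w_ns * u_ns

# If no agreement is reached before the deadline, U_NP = U_NS = 0 and hence
# U_Total = 0; that failure case is handled outside this helper.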

The remaining task is to design PS-optimizing agents and to find their optimal negotiation strategies to achieve optimal PS-optimizing negotiation outcomes under both complete and incomplete information settings. In designing PS-optimizing agents, the design goal is as follows:

Design goal

The ultimate design goal of PS-optimizing agents is to achieve optimal PS-optimizing negotiation outcomes satisfying the preferences of price and negotiation speed under given negotiation settings. Even when a PS-optimizing negotiation cannot achieve optimal negotiation outcomes, the performance of the PS-optimizing negotiation in optimizing the preferences should be superior to or (at least) equal to that of the P-optimizing negotiation. The latter case is denoted as the minimum performance requirement in this work.

The similarities and differences between P-optimizing agents (Sect. 2.1) and PS-optimizing agents (Sect. 2.2) are as follows. (1) Both negotiation models adopt Rubinstein’s alternating offers protocol as the negotiation protocol. Furthermore, agents in the two negotiation models exchange offers or counter-offers that consist of price proposals—but not time proposals or both price and time proposals—for making a mutual agreement. (2) To evaluate the price and negotiation time (of the proposals), a speed utility function as well as a price utility function is required. Accordingly, the total utility function consisting of both price and speed utility functions, associated with the preferences of price and negotiation speed, respectively, is adopted for PS-optimizing agents. However, for P-optimizing agents, the sole utility function equivalent to the price utility function in (2) is used. (3) As the name denotes, the PS-optimizing negotiation model requires an optimization procedure (denoted as the PS-optimization) for optimizing both price and negotiation speed, while the (original) P-optimizing negotiation model in itself has an optimization procedure (denoted as the P-optimization) for optimizing price only. Hence, a PS-optimizing negotiation mechanism that enables rational PS-optimization is required to find effective negotiation strategies of PS-optimizing agents.

In designing the PS-optimizing negotiation mechanism, it is assumed that PS-optimizing agents do not change their preferences of price and negotiation speed upon learning their opponent’s information. This means that the PS-optimizing agents, with their given preferences of price and negotiation speed, are cooperative in optimizing both price and negotiation speed. This assumption makes sense in that PS-optimizing negotiations will not operate well if negotiating agents do not cooperate at all. For instance, if one agent having a bargaining advantage in terms of time tries to achieve higher speed utility without conceding any price utility, there is no reason for its opponent to make an earlier agreement by conceding its speed utility. Furthermore, PS-optimizing agents should be trustworthy in the sense that they cooperate in optimizing negotiation speed without changing their initial preferences of price and negotiation speed during the negotiation process. Another issue in designing PS-optimizing agents is to determine: (1) the value(s) of the preferable agreement price in SS NP using \(w_{\mathit{NP}}^{x}\) and (2) the value(s) of the preferable agreement time in SS NS using \(w_{\mathit{NS}}^{x}\). Determining such a range of values within an effective SS PS-opt (consisting of SS NP and SS NS ) is essential because there can be different realizations of PS-optimizing negotiations depending on these values. The specific details for designing PS-optimizing agents to find optimal negotiation strategies will be described in Sect. 4.

3 Overview of EDAs for coevolutionary learning

EDAs, sometimes called probabilistic model building genetic algorithms (PMBGAs), have become one of the new paradigms within genetic and evolutionary computation research [17, 42]. Like other EAs based on ideas borrowed from genetics and natural selection, such as genetic algorithms (GAs), evolutionary strategies (ESs) and evolutionary programming (EP), EDAs also use selection to choose good candidate solutions and successively evolve a population of the selected solutions until some termination criteria are satisfied. However, to evolve a population of promising solutions, EDAs build probabilistic models of the selected solutions and sample useful genetic information (i.e., good offspring) from the probabilistic models instead of using variation operators such as crossover and mutation. From the perspective of the fitness landscape (i.e., the distribution of fitness values, with its peaks and valleys, over the solution space), while EAs such as GAs, ESs and EP search promising regions (i.e., solutions) of the fitness landscape with both exploitation and exploration using genetic operators (i.e., selection and variation operators), EDAs search these regions by exploiting feasible probabilistic models and efficiently traversing the solution space [1, 25].

This section demonstrates the application of EDAs to solve the coevolutionary problem of finding optimal negotiation strategies of PS-optimizing agents operating under an incomplete information setting. First, S-EDA is presented. Then, ID2C-EDA incorporating S-EDA with a novel diversity controlling technique is presented. Table 1 shows symbols used for the EDAs in this work.

Table 1 Symbols used for the EDAs

3.1 Description of S-EDA

The S-EDA is based on the continuous (i.e., real-coded) univariate marginal distribution algorithm (UMDAc) [17]. The pseudocode of S-EDA is presented as follows.

Step 1. :

Initialization

Generate the initial population P 0 with n P individuals at random;

g←0; cnt←0.

Step 2. :

Selection

g++;

Select a set of promising candidates S g−1 with n S (<n P ) individuals from P g−1.

Step 3. :

Building Model

Estimate the probability distribution \(f_{\mathbf{X}^{g}}(\mathbf{x}^{g})\) from S g−1.

Step 4. :

Sampling Model

Generate offspring O g with n O individuals by sampling \(f_{\mathbf{X}^{g}}(\mathbf{x}^{g})\).

Step 5. :

Replacement

Create a new population P g by replacing some individuals of P g−1 with O g .

Step 6. :

Reinitializing Population and Restarting Evolution

If cnt<CNT max and an inappropriate configuration is detected,

 initialize P g at random;

g←0; cnt++;

Go to Step 2.

Step 7. :

Termination

If the termination criteria are not satisfied,

 go to Step 2.

Else return the best solution found so far.

The main distinguishing feature of S-EDA (compared to other EAs adopting genetic operators) is building a probabilistic model (Step 3) and sampling the model to generate new solutions, i.e., offspring (Step 4). A continuous optimization problem with n variables is considered. The corresponding n-dimensional random variable and one of its possible instances at each generation g are denoted as \(\mathbf{X}^{g} = (X_{1}^{g},X_{2}^{g},\ldots,X_{n}^{g})\) and \(\mathbf{x}^{g} = (x_{1}^{g},x_{2}^{g},\ldots,x_{n}^{g})\). Following UMDAc, S-EDA assumes marginal independence among the variables. Hence, the joint probability distribution of X g follows an n-dimensional normal distribution which is factorized as a product of n independent univariate marginal distributions as follows.

$$f_{\mathbf{X}^{g}}\bigl(\mathbf{x}^{g}\bigr) = \prod _{i = 1}^{n} f_{X_{i}^{g}}\bigl(x_{i}^{g} \bigr). $$

Each variable of X g follows a univariate normal distribution with mean \(\mu_{i}^{g}\) and the standard deviation \(\sigma_{i}^{g}\) as follows:

$$f_{X_{i}^{g}}\bigl(x_{i}^{g}\bigr) = \frac{1}{\sqrt{2\pi} \sigma_{i}^{g}}e^{ - \frac{(x_{i}^{g} - \mu _{i}^{g})^{2}}{2(\sigma _{i}^{g})^{2}}},\quad \mbox{with } i = 1,2,\ldots,n. $$

\(\mu_{i}^{g}\) and \(\sigma_{i}^{g}\) are estimated using maximum likelihood estimation from S g−1 as follows:

$$\mu_{i}^{g} = \frac{1}{n_{S}}\sum_{j = 1}^{n_{S}} \bigl(x_{i}^{g - 1}\bigr)_{j},\qquad \sigma_{i}^{g} = \sqrt{\frac{1}{n_{S}}\sum_{j = 1}^{n_{S}} \bigl( \bigl(x_{i}^{g - 1}\bigr)_{j} - \mu_{i}^{g} \bigr)^{2}}, $$

where \((x_{i}^{g - 1})_{j}\) is the j-th individual in S g−1.

Then, offspring O g are randomly generated by sampling normal random variables from \(f_{\mathbf{X}^{g}}(\mathbf{x}^{g})\).

Another distinguishing feature of S-EDA (compared to UMDAc, as well as other conventional EAs) is reinitializing the population and restarting the evolution (Step 6). This is to escape an inappropriate configuration of the population (where S-EDA can no longer evolve a population containing promising solutions for the future evolution process) through simply restarting the evolution process with a randomly initialized population. In the coevolutionary learning problem (described in Sect. 4.2 in detail), there can be two types of inappropriate population configurations: (1) Type-I error: P g cannot converge to a certain value until G max is reached (i.e., very slow convergence or non-convergence is observed) and (2) Type-II error: premature convergence occurring at early generations (generally, \(g \le G^{\mathit{max\_infeasible\_band}}\)) caused by the domination of inappropriate individuals in P g having fitness values of all 0s or all 1s (which cannot occur in the coevolution problem described in Sect. 4.2), arising from repeated inappropriate random pairing of individuals [25]. If an inappropriate population configuration is detected, S-EDA initializes P g and restarts its evolution procedures. Step 6 is incorporated before testing the termination criteria in Step 7. The S-EDA stops its evolution process and returns the best solution found so far when either of the following conditions is satisfied (Step 7): (1) g=G max and cnt=CNT max, or (2) \(| f_{\mathit{best}}^{g} - f_{\mathit{best}}^{g-1} | < \delta_{\mathit{fit}}\) and \(\operatorname {Var}(P_{g}) < \delta_{\mathit{var}}\).
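
The model-building and sampling core of S-EDA (Steps 2–5) can be sketched as follows; this is a simplified illustration in which the restart logic of Step 6 and the exact replacement and termination rules are omitted, and all names are ours.

import numpy as np

def s_eda_core(fitness, dim, low, high, n_p=100, n_s=50, g_max=200):
    pop = np.random.uniform(low, high, size=(n_p, dim))            # Step 1: random init
    best = None
    for g in range(g_max):
        fit = np.apply_along_axis(fitness, 1, pop)
        order = np.argsort(fit)
        best = pop[order[-1]]
        selected = pop[order[-n_s:]]                               # Step 2: truncation selection
        mu, sigma = selected.mean(axis=0), selected.std(axis=0)    # Step 3: ML estimates (UMDAc)
        pop = np.random.normal(mu, sigma + 1e-12, size=(n_p, dim)) # Step 4: sample offspring
        pop = np.clip(pop, low, high)                              # Step 5: (full) replacement
        if sigma.max() < 1e-6:                                     # crude convergence check
            break
    return best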

3.2 Description of ID2C-EDA

In the authors’ previous works [8] and [10], the novel diversity controlling GA and EDA called ID2C-GA and ID2C-EDA were developed based on a real-coded GA (called S-GA) and S-EDA, respectively. In [8], ID2C-GAs and ID2C-EDAs were used for coevolutionary learning in which the objective is to find optimal (P-optimizing) negotiation strategies for interacting agents with incomplete information. Although this work is similar to [8] and [10] in the sense that both adopt EAs for (a similar type of) coevolutionary learning, they mainly differ in two ways: (1) [8] and [10] focused only on finding negotiation strategies for optimizing price, whereas this work deals with the more difficult problem of finding negotiation strategies that can optimize both price and negotiation speed, and (2) while the fitness functions of [8] and [10] are directly related to the (price) utility function, the fitness functions of this work have an indirect relationship with the (price and speed) utility functions. It is noted that it has been proved theoretically and demonstrated empirically that the performances of GAs and EDAs are very close to each other although they adopt quite different search strategies [17, 25]. Furthermore, it is empirically observed in [10] that: (1) ID2C-GAs and ID2C-EDAs outperform S-GAs and S-EDAs for the coevolutionary learning because ID2C-GAs and ID2C-EDAs have enough capability to overcome premature convergence and to achieve non-biased coevolution results for both populations and (2) ID2C-EDAs ensure better efficacy and reliability in achieving good solutions than ID2C-GAs if the search space is (very) large. For these reasons, this work adopts ID2C-EDAs and S-EDAs for the coevolutionary learning to carry out comparative studies on their coevolution performance. ID2C-EDA adopts a subspace-based dynamic (i.e., adaptive) diversity controlling technique called modified (i.e., improved) diversification and refinement (mDR) and two local improvement methods, namely population repair (PR) and local neighborhood search (LNS). The pseudocode of ID2C-EDA is presented as follows.

Step 1. :

Initialization

Generate the initial population P 0 with n P individuals at random;

g←0; cnt←0.

Step 2. :

Selection

g++;

Select a set of promising candidates S g−1 with n S (<n P ) individuals from P g−1.

Step 3. :

Building Model

Estimate the probability distribution \(f_{\mathbf{X}^{g}}(\mathbf{x}^{g})\) from S g−1.

Step 4. :

Sampling Model

Generate offspring O g with n O individuals by sampling \(f_{\mathbf{X}^{g}}(\mathbf{x}^{g})\).

Step 5. :

Replacement

Create a new population P g by replacing some individuals of P g−1 with O g .

Step 6. :

Diversification and Refinement (DR)

Calculate \(\operatorname {Div}(P_{g})\);

If \(\operatorname {Div}(P_{g}) < \delta_{\mathit{low}}\) or \(\operatorname {Div}(P_{g}) > \delta_{\mathit{high}}\), conduct the following procedures of DR

  1. A.

    Pre-ordering individuals in P g in both fitness and solution spaces;

  2. B.

    Eliminating redundant individuals in P g using the similarity of individuals;

  3. C.

    Calculating BOF i and updating C i (1≤i≤n band );

  4. D.

    Eliminating infeasible bands if \(g > G^{\mathit{max\_infeasible\_band}}\);

  5. E.

    Refining the population using diversified artificial individuals (DAIs)

    1. (a)

      Generating DAIs using BOFs based on population diversity;

    2. (b)

      Injecting the generated DAIs into the population.

Else calculate BOF i and update C i (1≤i≤n band ).

Step 7. :

Population Repair (PR)

Replace some infeasible individuals consisting of an inappropriate population configuration with new individuals randomly generated using the feasible individual list (FI_List).

Step 8. :

Local Neighborhood Search (LNS)

Replace some less feasible individuals (having lower fitness) by the neighborhoods generated from the locally best solution in the population.

Step 9. :

Reinitializing Population and Restarting Evolution

If cnt<CNT max and an inappropriate configuration is detected,

 initialize P g at random;

g←0; cnt++;

Go to Step 2.

Step 10. :

Termination

If the termination criteria are not satisfied,

 go to Step 2.

Else return the best solution found so far.

mDR (Step 6) is the main part of ID2C-EDA and its objective is to achieve individuals (of a population) that are both feasible and diversified through dynamic diversity control of the population. mDR utilizes the robustness of bands (ROBs), in which a band is defined as a distinct (small) fraction of the solution space and the solution space is mapped into bands, each of fixed size Band_Size. A band i is defined as more robust than another band j (i≠j and 1≤i,j≤n band ) if more individuals belonging to i have survived for more generations than those belonging to j. For measuring ROBs of the solution space at each generation g, each band i: (1) counts the band-occupying frequency (BOF i ), which is the accumulated frequency of individuals belonging to i until g is reached, and (2) stores BOF i into the global counter variable C i . mDR operates selectively depending on the population diversity of \(P_{g}\), \(\operatorname {Div}(P_{g})\). mDR operates if \(\operatorname {Div}(P_{g})\) is below the given lowest possible threshold δ low (i.e., \(\operatorname {Div}(P_{g}) < \delta_{\mathit{low}}\)) or is above the given highest possible threshold δ high (i.e., \(\operatorname {Div}(P_{g}) > \delta_{\mathit{high}}\)). Otherwise (i.e., \(\delta_{\mathit{low}} \le \operatorname {Div}(P_{g}) \le\delta_{\mathit{high}}\)), mDR does not operate; in that case, only BOF i is calculated and C i is updated, 1≤i≤n band . mDR has two main functionalities: (1) diversification ensures achieving a sufficiently high population diversity (from A and B in Step 6), and (2) refinement guarantees achieving a refined population consisting of more promising (i.e., feasible and diversified) solutions (from D and E in Step 6) using ROB information (from C in Step 6). The details of mDR are as follows:

  1. A.

    Pre-ordering the population.

    Before eliminating redundant individuals, ordering individuals in P g is carried out in both the fitness and solution spaces. First, individuals in P g are sorted according to their fitness values in decreasing order. Next, if there are some individuals with the same fitness values (or with fitness differences less than the predefined threshold δ fit ), then those individuals are sorted according to their solution values in decreasing order. Then, the ordered population \(P_{g}^{\mathit{Ordered}}\) is obtained.

  2. B.

    Eliminating redundant individuals.

    Domination of redundant (or duplicate) individuals can lead to premature convergence. This is because redundant individuals with a similar structure reduce population diversity; parents with a similar structure can often reproduce offspring with the same (or very similar) structure in the next generation. To avoid such premature convergence, redundant individuals in \(P_{g}^{\mathit{Ordered}}\) are eliminated from \(P_{g}^{\mathit{Ordered}}\) after testing the similarity among individuals at both the fitness and solution levels as follows. If the i-th and j-th individuals (1≤i≤n P and i+1≤j≤n P ) in \(P_{g}^{\mathit{Ordered}}\) have very close fitness values (i.e., their difference is less than the given threshold α) and also very close solution values (i.e., their difference is less than the given threshold β), the j-th individual is considered as the redundant individual and is eliminated from \(P_{g}^{\mathit{Ordered}}\). This similarity test is carried out from i=1 to n P −1. Then, the population \(P_{g}^{\mathit{Eliminated}}\) with high population diversity can be achieved.

  3. C.

    Calculating BOFs for all bands.

    For each band i (1≤i≤n band ), BOF i is calculated from \(P_{g}^{\mathit{Eliminated}}\) and the corresponding C i is updated. By calculating BOFs from \(P_{g}^{\mathit{Eliminated}}\)—not from P g or \(P_{g}^{\mathit{Ordered}}\), both of which can contain redundant individuals—only the individuals ensuring the higher population diversity contribute to calculating BOFs. Therefore, reliable BOFs without redundant information are guaranteed.

  4. D.

    Eliminating infeasible bands.

    If C i of the band i is not updated (i.e., there is no individual belonging to i) during a certain number of generations (\(G^{\mathit{max\_infeasible\_band}}\)), i is considered as an infeasible band and is removed from the feasible band list (FB_List) containing all feasible bands. This procedure operates only if \(g > G^{\mathit{max\_infeasible\_band}}\) because at least \(G^{\mathit{max\_infeasible\_band}}\) generations are required to collect the infeasible band information needed for carrying out such elimination.

  5. E.

    Refining the population.

    Since redundant individuals were eliminated (in step B), diversified artificial individuals (DAIs) are generated and injected into \(P_{g}^{\mathit{Eliminated}}\) in a number equal to the number of eliminated individuals. For generating feasible DAIs using the achieved reliable BOF information, bands belonging to FB_List and having high values of BOFs (i.e., representing high robustness) are considered as promising solution regions for the future evolutionary search. As evolution progresses, more effective BOFs will be achieved because more reliable ROB information will be gathered over all bands at later generations.

    1. (a)

      Each promising DAI is generated randomly in the selected band based on the two modes: (1) In exploration mode (\(\operatorname {Div}(P_{g}^{\mathit{Eliminated}}) < \delta_{\mathit{low}}\)), the bands with the lower BOFs have a higher probability to be selected for generating DAIs and (2) In exploitation mode (\(\operatorname {Div}(P_{g}^{\mathit{Eliminated}}) > \delta_{\mathit{high}}\)), the bands with the higher BOFs have a higher probability to be selected for generating DAIs.

    2. (b)

      The generated DAIs are injected into \(P_{g}^{\mathit{Eliminated}}\).

Finally, in \(P_{g}^{\mathit{Refined}}\), all individuals are ensured to be both feasible and diversified.
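
To make steps A and B above concrete, the following sketch (our own simplification, assuming scalar solutions and a list of (fitness, solution) pairs) sorts the population and drops near-duplicates; the emptied slots would then be refilled with DAIs in step E.

def eliminate_redundant(pop, alpha, beta):
    # pop: list of (fitness, solution) pairs; alpha/beta are the similarity thresholds.
    ordered = sorted(pop, key=lambda ind: (ind[0], ind[1]), reverse=True)   # step A
    kept = []
    for fit, sol in ordered:                                                # step B
        is_redundant = any(abs(fit - f) < alpha and abs(sol - s) < beta
                           for f, s in kept)
        if not is_redundant:
            kept.append((fit, sol))
    return kept    # P_g^Eliminated; len(pop) - len(kept) DAIs are injected later (step E)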

Although reinitializing the population and restarting the evolution (Step 6 of S-EDA and Step 9 of ID2C-EDA) can be a solution for both the Type-I and Type-II errors resulting in domination of infeasible individuals, extremely large overheads (in terms of both computation and time) are inevitable with this procedure. Therefore, PR and LNS are devised to overcome such drawbacks.

PR (Step 7) is devised to prevent domination of infeasible individuals (having fitness values of all 0s or all 1s) due to the Type-II error by replacing infeasible individuals with feasible individuals. Using the feasible individual list (FI_List) consisting of feasible individuals stored in the previous evolution step, PR replaces infeasible individuals with randomly selected individuals from FI_List.

LNS (Step 8) is devised to solve non-convergence or very slow convergence due to the Type-I error by compensating for the degradation of the S-EDA’s search efficiency during the coevolution process. First, LNS generates effective solution candidates from the neighboring bands of the band containing the current local optimal solution. Then, LNS replaces a small number of individuals (denoted as LNS_SIZE) having the lowest fitness values with the generated solution candidates. The injected solution candidates can contribute to accelerating the convergence of the population.
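
A possible realization of LNS is sketched below (our own simplification for scalar solutions; the actual candidate-generation rule in ID2C-EDA may differ):

import random

def local_neighborhood_search(pop, fitness, low, high, band_size, lns_size):
    # Replace the lns_size least-fit individuals with candidates drawn from the
    # band of the current local best solution and its immediate neighbors.
    fit = [fitness(ind) for ind in pop]
    n_band = int((high - low) / band_size)
    best_band = min(n_band - 1, int((pop[fit.index(max(fit))] - low) / band_size))
    candidates = []
    for _ in range(lns_size):
        band = min(n_band - 1, max(0, best_band + random.choice([-1, 0, 1])))
        candidates.append(low + (band + random.random()) * band_size)
    worst = sorted(range(len(pop)), key=lambda i: fit[i])[:lns_size]
    for idx, cand in zip(worst, candidates):
        pop[idx] = cand
    return pop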

Since PR and LNS replace some infeasible and less feasible individuals with more promising solutions, they can be considered as replacement techniques. Whereas the replacement (in Step 5) is the technique that simply replaces some individuals of P g−1 with some individuals from O g to create P g , PR and LNS are adaptive techniques that replace some infeasible and less feasible individuals in P g (after Step 5) or \(P_{g}^{\mathit{Refined}}\) (after Step 6) with feasible and more promising solution candidates.

4 PS-optimizing agents using acceptability zones

In a complete information setting, any adoption of SS PS-opt can be allowed because it is possible for a PS-optimizing agent to calculate optimal PS-optimizing negotiation outcomes and the corresponding negotiation strategies using its opponent’s known information. However, in practical situations, it may be difficult for agents to obtain complete information about their opponents because agents generally do not expose their private information, strategies and preferences for strategic reasons. Nevertheless, one of the greatest challenges in designing agents for PS-optimizing negotiations is the mathematical formulation of optimal PS-optimizing negotiation outcomes and negotiation strategies under a complete information setting (Sect. 4.1). This is because they can be directly used to verify the effectiveness (or correctness) of the coevolved solutions under an incomplete information setting (Sect. 4.2). Here, PS-optimization under an incomplete information setting largely depends on the choice of SS PS-opt . Hence, for designing PS-optimizing agents that operate properly in both complete and incomplete information settings, it is crucial to choose an effective SS PS-opt for both PS-optimizing agents.

In determining SS PS-opt , the two solution spaces SS NP and SS NS are considered independently. That is, an agent x adopts \(\mathit{AccZ}_{x}^{\mathit{NP}}\) as SS NP for optimizing price and \(\mathit{AccZ}_{x}^{\mathit{NT}}\) as SS NS for optimizing negotiation speed. Such an SS PS-opt adoption provides a significant advantage for executing (population-based) PS-optimizing negotiations and finding effective negotiation strategies for both PS-optimizing agents B and S using a coevolutionary learning approach under an incomplete information setting (Sect. 4.2). This is because B and S do not require each other’s private negotiation parameters for establishing SS PS-opt before conducting a PS-optimizing negotiation, as \(\mathit{AccZ}_{x}^{\mathit{NP}}\) and \(\mathit{AccZ}_{x}^{\mathit{NT}}\) of x are considered independently from those of its opponent in the PS-optimization process. To demonstrate the effectiveness of such an SS PS-opt , consider a counter-example in which AgZ NP and AgZ NT are adopted as SS NP and SS NS , respectively, for SS PS-opt . Then, before carrying out a PS-optimizing negotiation, each agent needs to obtain its opponent’s private information (such as RP and deadline) for establishing SS PS-opt ; however, estimating the opponent’s accurate private information is itself a very difficult (and sometimes impossible) problem under an incomplete information setting.

4.1 PS-optimizing agents with complete information

Given \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\), each PS-optimizing agent x with complete information is designed to use: (1) \(\mathit{AccZ}_{x}^{\mathit{NP}}\) (i.e., [IP x ,RP x ]) for optimizing price using \(w_{\mathit{NP}}^{x}\) and (2) \(\mathit{AccZ}_{x}^{\mathit{NT}}\) (i.e., [0,τ x ]) for optimizing negotiation speed using \(w_{\mathit{NS}}^{x}\). Although x still adopts the price utility \(U_{\mathit{NP}}^{x}\) in (5), the speed utility \(U_{\mathit{NS}}^{x}\) in (6) needs to be further modified to reflect characteristics of the preference of negotiation speed of x.

Using \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\), x decides the desired agreement price (\(\mathit{dP}_{c}^{x}\)) in \(\mathit{AccZ}_{x}^{\mathit{NP}}\) and the desired agreement time (\(\mathit{dT}_{c}^{x}\)) in \(\mathit{AccZ}_{x}^{\mathit{NT}}\), respectively. First, x determines \(\mathit{dP}_{c}^{x}\) satisfying the preference criterion of price in \(\mathit{AccZ}_{x}^{\mathit{NP}}\) using \(w_{\mathit{NP}}^{x}\):

$$ \mathit{dP}_{c}^{x} = \mathit{IP}_{x} + ( - 1)^{\alpha} \bigl( 1 - w_{\mathit{NP}}^{x} \bigr) | \mathit{RP}_{x} - \mathit{IP}_{x} |, $$
(8)

where α=1 for S and α=0 for B, as in (1).

x treats \(\mathit{dP}_{c}^{x}\) as the most favorable possible agreement price because it is the price at which x maximizes \(U_{\mathit{NP}}^{x}\) in (5) while satisfying its preference criterion of price; hence, \(U_{\mathit{NP}}^{x}(\mathit{dP}_{c}^{x})\) is the upper bound of \(U_{\mathit{NP}}^{x}\) under that criterion: (1) if x concedes less price utility than \(U_{\mathit{NP}}^{x}(\mathit{dP}_{c}^{x})\), the price utility of its opponent will be decreased, and conversely, (2) if x concedes more price utility than \(U_{\mathit{NP}}^{x}(\mathit{dP}_{c}^{x})\), \(U_{\mathit{NP}}^{x}\) will be decreased. Second, x determines \(\mathit{dT}_{c}^{x}\) satisfying the preference criterion of negotiation speed in \(\mathit{AccZ}_{x}^{\mathit{NT}}\) using \(w_{\mathit{NS}}^{x}\):

$$ \mathit{dT}_{c}^{x} = \tau_{x} \cdot\bigl(1 - w_{\mathit{NS}}^{x}\bigr). $$
(9)

x treats the range of negotiation time in \([0, \mathit{dT}_{c}^{x}]\) as the favorable possible agreement times because: (1) all the agreement times shorter than \(\mathit{dT}_{c}^{x}\) satisfy the preference criterion of negotiation speed and (2) when the negotiation begins, all the negotiation times in \([0, \mathit{dT}_{c}^{x}]\) can be mutually favorable possible agreement times satisfying the preference criterion of negotiation speed for both x and its opponent. To incorporate \(U_{\mathit{NS}}^{x}\) in (6) with the favorable possible agreement times, we define the new speed utility function \(U_{\mathit{NS}\text{-}\mathit{mapped}}^{x}(T_{c}^{x})\) for an agreement time \(T_{c}^{x}\) as follows:

$$ U_{\mathit{NS}\text{-}\mathit{mapped}}^{x}\bigl(T_{c}^{x}\bigr) = \begin{cases} U_{\mathit{NS}}^{x}(\mathit{dT}_{c}^{x}), & \mbox{if }T_{c}^{x} \in [0, \mathit{dT}_{c}^{x}], \\ U_{\mathit{NS}}^{x}(T_{c}^{x}), & \mathrm{otherwise}. \end{cases} $$
(10)

Then, the total utility function \(U_{\mathit{Total}}^{x}\) in (7) is modified as follows:

$$ \begin{aligned}[b] U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}( P_{x},T_{x} ) &= w_{\mathit{NP}}^{x} \times U_{\mathit{NP}}^{x}( P_{x} ) \\ &\quad {}+ w_{\mathit{NS}}^{x} \times U_{\mathit{NS}\text{-}\mathit{mapped}}^{x} \bigl(T_{c}^{x}\bigr). \end{aligned} $$
(11)

Compared to P-optimizing agents, each PS-optimizing agent x: (1) makes concessions in the range of prices from IP x up to the price less than \(\mathit{dP}_{c}^{x}\), which corresponds to (at most) the amount of the price utility \(w_{\mathit{NP}}^{x}(|\mathit{IP}_{x} - \mathit{dP}_{c}^{x}|)\), and (2) aims to achieve a faster agreement time that is equal to or less than \(\mathit{dT}_{c}^{x}\) in the hope of achieving (at least) the amount of speed utility \(U_{\mathit{NS}}^{x}(|\tau_{x} - \mathit{dT}_{c}^{x}|)\). In a P-optimizing negotiation under a complete information setting, \(P_{c}^{P\text{-}\mathit{opt}}\) and \(T_{c}^{P\text{-}\mathit{opt}}\) can be specified by either of Theorems 1 and 2 (Sect. 2.1) depending on a bargaining advantage in terms of time. For a PS-optimizing negotiation under a complete information setting, a similar analysis based on a bargaining advantage in terms of time can be applied to determine the optimal agreement price and negotiation time.

First, we define the possible AgZ NP and AgZ NT between the PS-optimizing agents B and S in the NSS. The possible AgZ NP determined by \(\mathit{dP}_{c}^{B}\) and \(\mathit{dP}_{c}^{S}\) is given as \([\min(\mathit{dP}_{c}^{B}, \mathit{dP}_{c}^{S}), \max(\mathit{dP}_{c}^{B}, \mathit{dP}_{c}^{S})]\) and the possible AgZ NT determined by \(\mathit{dT}_{c}^{B}\) and \(\mathit{dT}_{c}^{S}\) is given as \([0, \min\{ \mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S}\}]\)—which is the overlapping region of \([0, \mathit{dT}_{c}^{B}]\) and \([0, \mathit{dT}_{c}^{S}]\). Then, following Definition 2, PS-optimizing agents are designed to optimize price and speed utilities in the possible AgZ NP and AgZ NT, respectively, to maximize the total utility in (11). Next, the optimal PS-optimizing negotiation outcomes consisting of the optimal agreement price (\(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\)) and the optimal agreement time (\(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\)) will be determined based on a bargaining advantage in terms of time and are defined as follows.

Definition 3

If a PS-optimizing agent x has a bargaining advantage in terms of time over its opponent, (1) \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is the price that maximizes \(U_{\mathit{NP}}^{x}\) in the possible AgZ NP and (2) \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is the range of negotiation times satisfying both agents’ preference criteria of negotiation speed in the possible AgZ NT.

Following Definition 3, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) are obtained from the following Theorems 3 and 4 depending on a bargaining advantage in terms of time.

Theorem 3

If the PS-optimizing agent B has a longer deadline than the PS-optimizing agent S, (1) \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is made at \(\min(\mathit{dP}_{c}^{B}, \mathit{dP}_{c}^{S})\) and (2) any agreement time in \([0, \min(\mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S})]\) is \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\).

Proof

Since B has a longer deadline than S, B has a bargaining advantage over S in terms of time; hence, the final agreement price and agreement time will be completely determined by B. From Definition 3, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is \(\min(\mathit{dP}_{c}^{B}, \mathit{dP}_{c}^{S})\), at which \(U_{\mathit{NP}}^{B}\) is maximized. Since the range of favorable agreement times of B is \([0, \mathit{dT}_{c}^{B}]\) and that of S is \([0, \mathit{dT}_{c}^{S}]\), \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is determined as \([0, \min\{ \mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S}\}]\)—the overlapping region of the two ranges, in which both B and S satisfy their preference criteria of negotiation speed. □

Theorem 4

If the PS-optimizing agent S has a longer deadline than the PS-optimizing agent B, (1) \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is made at \(\max(\mathit{dP}_{c}^{B}, \mathit{dP}_{c}^{S})\) and (2) any agreement time in \([0, \min(\mathit{dT}_{c}^{B},\allowbreak \mathit{dT}_{c}^{S})]\) is \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\).

Proof

Symmetrically, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is \(\max(\mathit{dP}_{c}^{B}, \mathit{dP}_{c}^{S})\) at which \(U_{\mathit{NP}}^{S}\) is maximized; \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is the overlapping region \([0,\allowbreak \min\{ \mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S}\}]\) at which both B and S satisfy their preference criteria of negotiation speed. □

Figure 2 shows an example of the agreement behavior between PS-optimizing agents B and S when B has a longer deadline than S. The negotiation parameters for B and S are as follows: (1) IP B =5, RP B =80, τ B =100 and \((w_{\mathit{NP}}^{B}, w_{\mathit{NS}}^{B}) = (0.7, 0.3)\) for B; (2) IP S =95, RP S =15, τ S =50 and \((w_{\mathit{NP}}^{S}, w_{\mathit{NS}}^{S}) = (0.5, 0.5)\) for S. The NSS is [15,80] for AgZ NP and [0,50] for AgZ NT. \(\mathit{dP}_{c}^{x}\) and \(\mathit{dT}_{c}^{x}\) are determined by (8) and (9), respectively: (1) for the given \((w_{\mathit{NP}}^{B}, w_{\mathit{NS}}^{B}) = (0.7, 0.3)\), \(\mathit{dP}_{c}^{B} = 27.5\) and \(\mathit{dT}_{c}^{B} = 70\), and (2) for the given \((w_{\mathit{NP}}^{S}, w_{\mathit{NS}}^{S}) = (0.5, 0.5)\), \(\mathit{dP}_{c}^{S} = 55\) and \(\mathit{dT}_{c}^{S} = 25\). Then, the possible AgZ NP is determined as [27.5, 55] and the possible AgZ NT is determined as [0, 25]. Finally, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) will be determined by Theorem 3 (because B has the bargaining advantage over S); hence, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is 27.5 and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is [0, 25].

Fig. 2 Agreement behavior of PS-optimizing negotiation when sufficient AgZ NP is provided

We have so far determined \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) using Theorems 3 and 4 under the assumption that \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) belong to NSS (e.g., Fig. 2). However, if \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) lie outside NSS, no agreement can be made within NSS; therefore, Theorems 3 and 4 can no longer be applied to determine \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\). In general, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) may lie outside NSS depending on the input parameter values of the agent d having the bargaining advantage in terms of time. This is because d determines \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) by optimizing over \(\mathit{AccZ}_{d}^{\mathit{NP}}\) and \(\mathit{AccZ}_{d}^{\mathit{NT}}\) using \(w_{\mathit{NP}}^{d}\) and \(w_{\mathit{NS}}^{d}\), respectively, without considering the agreement zones between d and its opponent (i.e., d does not consider AgZ NP and AgZ NT in carrying out the PS-optimizing negotiation). \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) can fall outside AgZ NP if either: (1) d is B and \(\mathit{dP}_{c}^{B}\) is less than RP S or (2) d is S and \(\mathit{dP}_{c}^{S}\) is larger than RP B (e.g., see Fig. 3). In the case of \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\), however, it always belongs to AgZ NT because \(\mathit{dT}_{c}^{B}\) and \(\mathit{dT}_{c}^{S}\) are always less than or equal to τ B and τ S , respectively. In summary, for x: (1) if sufficient AgZ NP is provided for carrying out a PS-optimizing negotiation, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) can be determined using either Theorem 3 or Theorem 4; however, (2) if insufficient AgZ NP is provided for carrying out a PS-optimizing negotiation, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) cannot be determined using Theorems 3 and 4.

Fig. 3 Agreement behavior of PS-optimizing negotiation when insufficient AgZ NP is provided

Given that insufficient AgZ NP is provided for carrying out a PS-optimizing negotiation, the following definition is adopted for designing agreement behaviors of PS-optimizing agents.

Definition 4

If insufficient AgZ NP is provided for carrying out a PS-optimizing negotiation, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) are made at the points in NSS nearest to \(\mathit{dP}_{c}^{d}\) and \(\mathit{dT}_{c}^{d}\), respectively, of the agent d having the bargaining advantage in terms of time.

Definition 4 leads to Theorem 5 showing that the proposed PS-optimizing agents satisfy (at least) the minimum performance requirement for PS-optimizing agents (defined in Sect. 2.2) even under the given condition that insufficient AgZ NP is provided.

Theorem 5

If insufficient AgZ NP is provided, PS-optimizing negotiation outcomes are equal to P-optimizing negotiation outcomes for the same negotiation settings.

Proof

If B has a bargaining advantage in terms of time, the price in NSS nearest to \(\mathit{dP}_{c}^{B}\) is RP S (at which \(U_{\mathit{NP}}^{B}\) is maximized) and the nearest negotiation time is τ S . Hence, following Definition 4, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) are made at RP S and τ S , respectively, which is the same result as that of the P-optimizing negotiation given by Theorem 1. Similarly, if S has a bargaining advantage in terms of time, the price in NSS nearest to \(\mathit{dP}_{c}^{S}\) is RP B (at which the price utility \(U_{\mathit{NP}}^{S}\) is maximized) and the nearest negotiation time is τ B . Hence, following Definition 4, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) are made at RP B and τ B , respectively, which is the same result as that of the P-optimizing negotiation given by Theorem 2. □

Figure 3 shows an example of the agreement behavior between PS-optimizing agents B and S when (1) B has a longer deadline than S and (2) insufficient AgZ NP is provided. The negotiation parameters for B and S are as follows: (1) IP B =15, RP B =60, τ B =100 and \((w_{\mathit{NP}}^{B}, w_{\mathit{NS}}^{B}) = (0.7, 0.3)\) for B; (2) IP S =85, RP S =40, τ S =50 and \((w_{\mathit{NP}}^{S}, w_{\mathit{NS}}^{S}) = (0.5, 0.5)\) for S. The NSS is [40,60] for AgZ NP and [0,50] for AgZ NT. \(\mathit{dP}_{c}^{x}\) and \(\mathit{dT}_{c}^{x}\) are determined by (8) and (9), respectively: (1) for the given \((w_{\mathit{NP}}^{B}, w_{\mathit{NS}}^{B}) = (0.7, 0.3)\), \(\mathit{dP}_{c}^{B} = 28.5\) and \(\mathit{dT}_{c}^{B} = 70\), and (2) for the given \((w_{\mathit{NP}}^{S}, w_{\mathit{NS}}^{S}) = (0.5, 0.5)\), \(\mathit{dP}_{c}^{S} = 62.5\) and \(\mathit{dT}_{c}^{S} = 25\). Then, the possible AgZ NP is determined as [40,60] and the possible AgZ NT is determined as [0,25]. Since B has a bargaining advantage over S in terms of time, Theorem 3 would give \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}} = 28.5\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}} = [0,25]\). However, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is not in AgZ NP although \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) is in AgZ NT; the agreement point given by Theorem 3 does not lie in NSS (i.e., a PS-optimizing negotiation insisting on it would fail without reaching an agreement). Therefore, from Theorem 5, \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) are made at RP S =40 and τ S =50, respectively, which are equal to \(P_{c}^{P\text{-}\mathit{opt}}\) and \(T_{c}^{P\text{-}\mathit{opt}}\) achieved from Theorem 1 (i.e., \(P_{c}^{P\text{-}\mathit{opt}} = 40\) and \(T_{c}^{P\text{-}\mathit{opt}} = 50\)).
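Under the same assumed forms of (8) and (9) as in the previous sketch, the fallback of Definition 4 amounts to clipping the advantaged agent's desired point onto NSS; a minimal illustrative sketch for the Fig. 3 parameters is given below.

def nearest_in_interval(value, lower, upper):
    # Nearest point of the interval [lower, upper] to a given value (Definition 4).
    return min(max(value, lower), upper)

# Fig. 3 parameters: B is the advantaged agent d.
RP_B, RP_S, tau_S = 60, 40, 50
dP_B = 60 + 0.7 * (15 - 60)                        # 28.5, outside AgZ NP = [40, 60]
dT_B = (1.0 - 0.3) * 100                           # 70, outside the NSS time range [0, 50]

P_ps_opt = nearest_in_interval(dP_B, RP_S, RP_B)   # 40 (= RP_S)
T_ps_opt = nearest_in_interval(dT_B, 0, tau_S)     # 50 (= tau_S), i.e., the P-optimizing outcome (Theorem 5)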

Finally, from the achieved \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) (using one of Theorems 3 to 5), the optimal negotiation strategies of PS-optimizing agents B and S (for carrying out the optimal PS-optimizing negotiation) can be obtained. Specifically, the optimal PS-optimizing negotiation strategies of B and S are derived from (1) by substituting \(P_{t}^{x}\) and t in (1) with \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\), respectively, as follows:

(12)
(13)
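Since (1), (12) and (13) are not reproduced here, the following sketch only illustrates the substitution step under a commonly used polynomial time-dependent proposal function \(P_{t}^{x} = \mathit{IP}_{x} + (t/\tau_{x})^{\lambda_{x}}(\mathit{RP}_{x} - \mathit{IP}_{x})\); this form is an assumption, and the resulting closed form for \(\lambda_{x}\) should not be read as a restatement of (12) and (13).

import math

def optimal_lambda(IP, RP, tau, P_c, T_c):
    # Solve P_{T_c}^x = P_c for lambda_x under the assumed proposal function
    # P_t^x = IP_x + (t / tau_x)^lambda_x * (RP_x - IP_x).
    return math.log((P_c - IP) / (RP - IP)) / math.log(T_c / tau)

# Fig. 2 example: agreement at price 27.5 no later than time 25.
lambda_B = optimal_lambda(IP=5, RP=80, tau=100, P_c=27.5, T_c=25)    # approx. 0.87
lambda_S = optimal_lambda(IP=95, RP=15, tau=50, P_c=27.5, T_c=25)    # approx. 0.25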

4.2 PS-optimizing agents with incomplete information

Given \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\), each PS-optimizing agent x with incomplete information also adopts: (1) \(\mathit{AccZ}_{x}^{\mathit{NP}}\) for optimizing price using \(w_{\mathit{NP}}^{x}\) and (2) \(\mathit{AccZ}_{x}^{\mathit{NT}}\) for optimizing negotiation speed using \(w_{\mathit{NS}}^{x}\). Since PS-optimizing agents with incomplete information do not know their opponents’ private information (such as RP, deadline and preferences for price and negotiation time), their opponents’ desired agreement points are unknown. Therefore, the agents cannot directly apply Theorems 3 to 5 to determine their optimal negotiation outcomes. Owing to this lack of information, this research adopts a coevolutionary learning approach to find effective PS-optimizing negotiation strategies for both PS-optimizing agents B and S. Coevolutionary learning approaches have long been used to model competitive coevolution problems (e.g., the iterated prisoner’s dilemma [3]).

Given PS-optimizing agents B (with IP B , RP B , τ B and \((w_{\mathit{NP}}^{B}, w_{\mathit{NS}}^{B})\)) and S (with IP S , RP S , τ S and \((w_{\mathit{NP}}^{S}, w_{\mathit{NS}}^{S})\)), the following three key components are required for the coevolutionary learning:

  1. (C1)

    Creating populations: Two heterogeneous populations with size n P are created: POP B where individuals consist of Bs and POP S where individuals consist of Ss. Throughout the rest of this paper, we use: (1) the term “individual” interchangeably with “agent” and (2) the term “solution” interchangeably with “(PS-optimizing) negotiation strategy” depending on the context.

  2. (C2)

    Initialization: Individuals of POP B and POP S are initialized. All individuals in POP B are initialized with IP B , RP B , τ B and \((w_{\mathit{NP}}^{B}, w_{\mathit{NS}}^{B})\); however, the negotiation strategy of each individual in POP B is randomly determined in the possible strategy range [λ lower ,λ upper ] where λ lower is the lower bound of possible strategies and λ upper is the upper bound of possible strategies. In the same manner, all individuals in POP S are initialized with IP S , RP S , τ S and \((w_{\mathit{NP}}^{S}, w_{\mathit{NS}}^{S})\); however, the negotiation strategy of each individual in POP S is randomly determined in [λ lower ,λ upper ]. Hence, individuals are mainly characterized by the negotiation strategies within their respective populations.

  3. (C3)

    Making interactions between populations: Coevolutionary interactions between POP B and POP S are carried out as follows:

    1. a.

      Individuals in POP B and POP S are randomly chosen and matched in a one-to-one manner.

    2. b.

      Each matched pair of POP B and POP S conducts a PS-optimizing negotiation without the knowledge of private information of its opponent.

    As a result of the coevolutionary interaction, individuals in POP B and POP S obtain negotiation outcomes.

The coevolutionary learning procedure using the same type of EDAs for both POP B and POP S is as follows. Two EDAs (i.e., either two S-EDAs or two ID2C-EDAs) are adopted for coevolving POP B and POP S , respectively: one EDA for POP B and the other EDA for POP S . Here, Step 1 of S-EDAs and ID2C-EDAs is substituted by the above C1 and C2 to create populations and initialize them. Throughout its evolution procedure described in Sect. 3, each EDA evolves solutions (of individuals) in its population. Each EDA evaluates fitness of individuals from the negotiation outcomes of individuals obtained from C3. Therefore, C3 is executed in the fitness evaluation stage before applying selection. In summary, the negotiation strategies of agents in both POP B and POP S are coevolved from: (1) the coevolutionary interaction between POP B and POP S in C3 and (2) the evolution procedure for evolving solutions in each population.
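A compact sketch of this procedure is given below (in Python, illustrative only); negotiate(), fit_B(), fit_S() and the EDA step() are placeholders standing in for the bilateral negotiation protocol, the fitness function (17), and the S-EDA/ID2C-EDA generation step described in Sect. 3.

import random

def initialize_population(n_P, lam_lower, lam_upper):
    # (C1)/(C2): every individual shares the public parameters of its agent type;
    # only its negotiation strategy is sampled from the possible strategy range.
    return [random.uniform(lam_lower, lam_upper) for _ in range(n_P)]

def coevolve(eda_B, eda_S, n_P, lam_lower, lam_upper, max_gen, negotiate, fit_B, fit_S):
    pop_B = initialize_population(n_P, lam_lower, lam_upper)
    pop_S = initialize_population(n_P, lam_lower, lam_upper)
    for _ in range(max_gen):
        # (C3a): random one-to-one matching between POP_B and POP_S.
        pairs = list(zip(random.sample(pop_B, n_P), random.sample(pop_S, n_P)))
        # (C3b): each matched pair negotiates without knowing its opponent's private information.
        outcomes = [negotiate(lam_b, lam_s) for lam_b, lam_s in pairs]   # e.g., (P_c, T_c) or None
        # Fitness evaluation precedes selection in each EDA.
        fitness_B = [fit_B(o) for o in outcomes]
        fitness_S = [fit_S(o) for o in outcomes]
        # Each EDA estimates its model from the fitter individuals and samples the next population.
        pop_B = eda_B.step([b for b, _ in pairs], fitness_B)
        pop_S = eda_S.step([s for _, s in pairs], fitness_S)
    return pop_B, pop_S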

In the coevolutionary learning procedure, the main issues for finding (or coevolving) effective PS-optimizing negotiation strategies of B and S are: (1) adopting (or developing) EAs suitable for the coevolution, and (2) given \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x})\), designing an appropriate fitness function that can achieve good candidate solutions for a PS-optimizing negotiation.

First, for coevolving optimal PS-optimizing strategies between B and S, special EAs called ID2C-EDAs were adopted for both POP B and POP S owing to their effectiveness in coevolutionary learning [8, 10], while S-EDAs were used for comparative studies of the coevolution performance. The coevolutionary learning approach adopting ID2C-EDAs allows us to achieve an approximation to the optimal PS-optimizing negotiation strategies obtained in the complete information setting in Sect. 4.1.

We then need to consider how a PS-optimizing agent x represents the preference for price using \(w_{\mathit{NP}}^{x}\) in \(\mathit{AccZ}_{x}^{\mathit{NP}}\) and the preference for negotiation speed using \(w_{\mathit{NS}}^{x}\) in \(\mathit{AccZ}_{x}^{\mathit{NT}}\) in an incomplete information setting. The same definitions of \(\mathit{dP}_{c}^{x}\) in (8) and \(\mathit{dT}_{c}^{x}\) in (9) were adopted for x with incomplete information. Accordingly, x with incomplete information treats \(\mathit{dP}_{c}^{x}\) as the most favorable agreement price and \(\mathit{dT}_{c}^{x}\) as one of the favorable agreement times. Then, a fitness function that is maximized at \(\mathit{dP}_{c}^{x}\) and \(\mathit{dT}_{c}^{x}\) for the given \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x})\) is required. We will briefly examine some drawbacks of the fitness functions in previous studies and then describe the details of the proposed fitness function.

In most of the previous studies (e.g., [8-10, 23, 32] and [19], which mostly dealt with P-optimizing negotiations), fitness functions in the form of utility functions were widely adopted. Therefore, the total utility functions \(U_{\mathit{Total}}^{x}\) in (7) and \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\) in (11) can also be adopted as fitness functions for the coevolutionary learning. To do this, we need to evaluate the suitability (or effectiveness) of the fitness functions adopting \(U_{\mathit{Total}}^{x}\) and \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\). \(U_{\mathit{Total}}^{x}\) is calculated by linearly combining: (1) \(U_{\mathit{NP}}^{x}\) in (5) multiplied by \(w_{\mathit{NP}}^{x}\) and (2) \(U_{\mathit{NS}}^{x}\) in (6) multiplied by \(w_{\mathit{NS}}^{x}\). Similarly, \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\) is calculated by linearly combining: (1) \(U_{\mathit{NP}}^{x}\) in (5) multiplied by \(w_{\mathit{NP}}^{x}\) and (2) \(U_{\mathit{NS}\text{-}\mathit{mapped}}^{x}\) in (10) multiplied by \(w_{\mathit{NS}}^{x}\). The difference between the two fitness functions is that the fitness function adopting \(U_{\mathit{Total}}^{x}\) places different emphases on agreement times in \([0, \mathit{dT}_{c}^{x}]\) by giving a higher value of \(U_{\mathit{NS}}^{x}\) to a smaller agreement time, whereas the fitness function adopting \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\) assigns the same value \(U_{\mathit{NS}}^{x}(\mathit{dT}_{c}^{x})\) to all agreement times in \([0, \mathit{dT}_{c}^{x}]\). Using either \(U_{\mathit{Total}}^{x}\) or \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\) as the fitness function, the coevolution performance in terms of intensification capability for coevolving converged solutions can deteriorate severely. As a result, the EDAs for POP B and POP S generally cannot evolve effective PS-optimizing negotiation strategies (within a reasonable number of generations). In the case of the fitness function adopting \(U_{\mathit{Total}}^{x}\), this is mainly because the fitness at different \(P_{c}^{x}\) and \(T_{c}^{x}\) can have the same value as \(U_{\mathit{Total}}^{x}( \mathit{dP}_{c}^{x},\mathit{dT}_{c}^{x} )\). For example, consider the case that: (1) an individual i obtained a negotiation outcome at the agreement price \(P_{c}^{i}\) and the agreement time \(T_{c}^{i}\) where \(U_{\mathit{Total}}^{i}\) consists of \(U_{\mathit{NP}}^{i}(P_{c}^{i})\) (\(<\nobreak U_{\mathit{NP}}^{i}(\mathit{dP}_{c}^{i})\)) and \(U_{\mathit{NS}}^{i}(T_{c}^{i})\) (\(>\nobreak U_{\mathit{NS}}^{i}(\mathit{dT}_{c}^{i})\)) and (2) another individual j obtained a negotiation outcome at the agreement price \(P_{c}^{j}\) and the agreement time \(T_{c}^{j}\) where \(U_{\mathit{Total}}^{j}\) consists of \(U_{\mathit{NP}}^{j}(P_{c}^{j})\) (\(> U_{\mathit{NP}}^{j}(\mathit{dP}_{c}^{j})\)) and \(U_{\mathit{NS}}^{j}(T_{c}^{j})\) (\(< U_{\mathit{NS}}^{j}(\mathit{dT}_{c}^{j})\)).
Then, there are many possible combinations of values for \((P_{c}^{i},T_{c}^{i})\) and \((P_{c}^{j},T_{c}^{j})\) in which the fitness \(\mathit{fit}(P_{c}^{i},T_{c}^{i})\) (set as \(U_{\mathit{Total}}^{i}(P_{c}^{i},T_{c}^{i}) = w_{\mathit{NP}}^{x} \times U_{\mathit{NP}}^{i}(P_{c}^{i}) + w_{\mathit{NS}}^{x} \times U_{\mathit{NS}}^{i}(T_{c}^{i})\)) is equal to \(\mathit{fit}(P_{c}^{j},T_{c}^{j})\) (set as \(U_{\mathit{Total}}^{j}(P_{c}^{j},T_{c}^{j}) = w_{\mathit{NP}}^{x} \times U_{\mathit{NP}}^{j}(P_{c}^{j}) + w_{\mathit{NS}}^{x} \times U_{\mathit{NS}}^{j}(T_{c}^{j})\)), and both are equal to \(U_{\mathit{Total}}^{x}(\mathit{dP}_{c}^{x},\mathit{dT}_{c}^{x})\). Hence, the ambiguity (in representing \(\mathit{fit}(P_{c}^{x},T_{c}^{x})\), which needs to be maximized at \(U_{\mathit{Total}}^{x}(\mathit{dP}_{c}^{x},\mathit{dT}_{c}^{x})\)) makes the coupled fitness landscape for the coevolutionary learning more complicated (for evolving effective solutions), which leads to the deterioration of the intensification capability of both EDAs. In the case of the fitness function adopting \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\), this ambiguity may be resolved to some extent compared with the fitness function adopting \(U_{\mathit{Total}}^{x}\). This is because fitness values can be determined solely by price utility if the agreement times of all individuals are made in \([0, \mathit{dT}_{c}^{x}]\), in which the speed utilities of all individuals are the same. However, we cannot always guarantee that all agreement times fall within \([0, \mathit{dT}_{c}^{x}]\) because they depend on the negotiation parameter settings and the evolution of the EDAs. Hence, although the fitness function using \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\) is more robust than the fitness function adopting \(U_{\mathit{Total}}^{x}\) in characterizing more promising solutions, neither can completely resolve the ambiguity or prevent the deterioration of intensification capability. From this analysis, we conclude that fitness functions in the form of the total utility functions \(U_{\mathit{Total}}^{x}\) and \(U_{\mathit{Total}\text{-}\mathit{mapped}}^{x}\) are not appropriate for coevolving effective PS-optimizing negotiation strategies (within a reasonable number of generations).

For designing an effective fitness function, this work uses price and speed likelihood functions instead of using the price and speed utility functions directly. Given \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\), the price likelihood function (\(\mathit{Lh}_{\mathit{NP}}^{x}\)) measures the closeness of P x to \(\mathit{dP}_{c}^{x}\) and the speed likelihood function (\(\mathit{Lh}_{\mathit{NS}}^{x}\)) measures the closeness of T x to \(\mathit{dT}_{c}^{x}\).

\(\mathit{Lh}_{\mathit{NP}}^{x}\) for a price P x is defined as follows:

$$ \mathit{Lh}_{\mathit{NP}}^{x}(P_{x}) = \begin{cases} \frac{1}{\sqrt{\pi \cdot \rho_{\mathit{NP}}}} \exp\Bigl( - \frac{\bigl( \frac{| \mathit{RP}_{x} - P_{x} |}{| \mathit{RP}_{x} - \mathit{IP}_{x} |} - w_{\mathit{NP}}^{x} \bigr)^{2}}{\rho_{\mathit{NP}}} \Bigr),\\ \quad \mbox{if an agreement is reached}, \\ 0, \quad \mbox{otherwise}, \\ \end{cases} $$
(14)

where |RP x −P x |/|RP x −IP x | is the relative position of P x in \(\mathit{AccZ}_{x}^{\mathit{NP}}\). As P x gets closer to \(\mathit{dP}_{c}^{x}\), \(| \mathit{RP}_{x} - P_{x} |/| \mathit{RP}_{x} - \mathit{IP}_{x} | - w_{\mathit{NP}}^{x}\) has a smaller magnitude; hence, a higher value of \(\mathit{Lh}_{\mathit{NP}}^{x}\) is obtained. Through empirical studies, the deviation of \(\mathit{Lh}_{\mathit{NP}}^{x}\) (i.e., the shape of \(\mathit{Lh}_{\mathit{NP}}^{x}\)) is designed to be very narrow by normalizing it with \(\rho_{\mathit{NP}} = w_{\mathit{NP}}^{x}/100\), in which ρ NP can be considered as a weighting factor that puts more emphasis on P x (i.e., yields a higher value of \(\mathit{Lh}_{\mathit{NP}}^{x}\)) the closer it is to \(\mathit{dP}_{c}^{x}\). Instead of using the price utility in (5), (14) measures the closeness of P x to \(\mathit{dP}_{c}^{x}\) using a Gaussian likelihood function designed to maximize \(\mathit{Lh}_{\mathit{NP}}^{x}(P_{x})\) at \(P_{x} = \mathit{dP}_{c}^{x}\).

\(\mathit{Lh}_{\mathit{NS}}^{x}\) for a negotiation time T x is defined as follows:

$$ \mathit{Lh}_{\mathit{NS}}^{x}(T_{x}) = \begin{cases} \frac{1}{\sqrt{\pi \cdot \rho_{\mathit{NS}}}} \exp\Bigl( - \frac{\bigl( ( 1.0 - \frac{T_{x}}{\tau_{x}} ) - w_{\mathit{NS}}^{x} \bigr)^{2}}{\rho_{\mathit{NS}}} \Bigr), \\ \quad \mbox{if an agreement is reached}, \\ 0, \quad \mbox{otherwise}, \\ \end{cases} $$
(15)

where (1.0−T x /τ x ) is the relative position of T x in \(\mathit{AccZ}_{x}^{\mathit{NT}}\). As T x gets closer to \(\mathit{dT}_{c}^{x}\), \((1.0 - T_{x}/\tau_{x}) - w_{\mathit{NS}}^{x}\) has a smaller magnitude; hence, a higher value of \(\mathit{Lh}_{\mathit{NS}}^{x}\) is obtained. Through empirical studies, the deviation of \(\mathit{Lh}_{\mathit{NS}}^{x}\) is designed to be relatively wide (compared with the deviation of \(\mathit{Lh}_{\mathit{NP}}^{x}\)) by normalizing it with \(\rho_{\mathit{NS}} = w_{\mathit{NS}}^{x}\). ρ NS is given a large value to put less emphasis on \(\mathit{dT}_{c}^{x}\) because a shorter agreement time is better for both B and S; however, setting ρ NS too large slows down the coevolution because the set of promising candidate solutions becomes too large, which affects the intensification capability of the EDAs. Instead of using the speed utility in (6) or (10), (15) measures the closeness of T x to \(\mathit{dT}_{c}^{x}\) using a Gaussian likelihood function designed to maximize \(\mathit{Lh}_{\mathit{NS}}^{x}(T_{x})\) at \(T_{x} = \mathit{dT}_{c}^{x}\). In regard to optimizing negotiation speed, a special mapping function \(f_{t\text{-}\mathit{map}}^{x}(T_{c}^{x})\) for negotiation time was adopted and is defined as follows:

$$ f_{t\text{-}\mathit{map}}^{x}\bigl(T_{c}^{x}\bigr) = \begin{cases} \mathit{dT}_{c}^{x}, & \mbox{if }T_{c}^{x} \in [0, \mathit{dT}_{c}^{x}] ,\\ T_{c}^{x}, & \mbox{otherwise}. \\\end{cases} $$
(16)

This mapping is essential for realizing the coevolution of effective PS-optimizing negotiation strategies for both B and S because: (1) all \(T_{c}^{x}\) in \([0, \mathit{dT}_{c}^{x}]\) satisfy the preference of negotiation speed and (2) in general, a shorter negotiation time is better than a longer one.

Finally, using \(\mathit{Lh}_{\mathit{NP}}^{x}\) in (14) and \(\mathit{Lh}_{\mathit{NS}}^{x}\) in (15) together with \(f_{t\text{-}\mathit{map}}^{x}\) in (16), the final proposed fitness function for EDAs is defined as follows:

(17)

The closer \(\mathit{Lh}_{\mathit{NP}}^{x}(P_{c}^{x})\) is to \(\mathit{Lh}_{\mathit{NP}}^{x}(\mathit{dP}_{c}^{x})\) and the larger \(w_{\mathit{NP}}^{x}\) is, the larger the value of the exponential function for price in (17). Similarly, the closer \(\mathit{Lh}_{\mathit{NS}}^{x}(T_{c}^{x})\) is to \(\mathit{Lh}_{\mathit{NS}}^{x}(\mathit{dT}_{c}^{x})\) and the larger \(w_{\mathit{NS}}^{x}\) is, the larger the value of the exponential function for negotiation speed in (17). Therefore, \(\mathit{fit}(P_{c}^{x},T_{c}^{x})\) emphasizes the exponential functions for price and negotiation speed by linearly combining them with \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\), respectively.
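The components (14)-(16) can be written compactly as below (Python, illustrative only). Since (17) is not reproduced above, fit() simply combines exponentials of the two likelihoods linearly with \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\) as the surrounding text describes; this particular combination is an assumption, not a restatement of (17).

import math

def lh_np(P, IP, RP, w_np, agreed=True):
    # Price likelihood (14): Gaussian-shaped closeness of P to dP_c^x, with rho_NP = w_NP^x / 100.
    if not agreed:
        return 0.0
    rho = w_np / 100.0
    rel = abs(RP - P) / abs(RP - IP)
    return math.exp(-((rel - w_np) ** 2) / rho) / math.sqrt(math.pi * rho)

def lh_ns(T, tau, w_ns, agreed=True):
    # Speed likelihood (15): Gaussian-shaped closeness of T to dT_c^x, with rho_NS = w_NS^x.
    if not agreed:
        return 0.0
    rho = w_ns
    rel = 1.0 - T / tau
    return math.exp(-((rel - w_ns) ** 2) / rho) / math.sqrt(math.pi * rho)

def t_map(T, dT):
    # Mapping (16): every agreement time in [0, dT_c^x] is treated as dT_c^x.
    return dT if 0 <= T <= dT else T

def fit(P, T, IP, RP, tau, dT, w_np, w_ns, agreed=True):
    # Assumed combination in the spirit of (17): weighted exponentials of the two likelihoods.
    return (w_np * math.exp(lh_np(P, IP, RP, w_np, agreed))
            + w_ns * math.exp(lh_ns(t_map(T, dT), tau, w_ns, agreed)))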

5 Empirical evaluation and analysis

In this section, we first detail the methodology for analyzing the performance of coevolved PS-optimizing negotiation strategies (of PS-optimizing agents B and S) using S-EDAs or ID2C-EDAs under an incomplete information setting. We then proceed to the actual empirical study of the coevolved PS-optimizing negotiation strategies by comparing them with the optimal negotiation strategies obtained under a complete information setting. Furthermore, we compare and analyze the empirical results of S-EDAs and ID2C-EDAs obtained from coevolutionary learning.

5.1 Methodology

For PS-optimizing agents B and S, optimal PS-optimizing negotiation outcomes and the corresponding negotiation strategies are subject to the size of AgZ NP as described in Sect. 4.1. Hence, two groups of experiments were designed to evaluate the performance of PS-optimizing negotiations for the different negotiation situations: (1) Type-I experiments for the negotiation settings with sufficient AgZ NP, and (2) Type-II experiments for the negotiations with insufficient AgZ NP. In each group of experiments, the coevolution results using ID2C-EDAs are compared with those of S-EDAs to evaluate their coevolution performance in finding effective PS-optimizing negotiation strategies.

5.1.1 Testbed

To evaluate the performance of the coevolved PS-optimizing negotiation strategies of the proposed PS-optimizing agents, a simulation testbed consisting of a virtual negotiation environment for supporting population-based negotiations in an incomplete information setting using EDAs was implemented using C++. Both POP B and POP S were evolved using either S-EDAs or ID2C-EDAs as described in Sect. 3. Coevolutionary interaction is achieved from the (population-based) negotiations between POP B and POP S as described in Sect. 4.2. In addition, each POP B and POP S has a controller that: (1) generates agents and initializes their negotiation parameters (such as preferences of price and negotiation speed, IP, RP, deadlines and negotiation strategies), (2) manages the information of the matched pairs of agents between POP B and POP S , (3) monitors the termination status of its EDA and shares the information with the controller of the opponent population to check the termination conditions for the coevolution, which is to terminate both EDAs simultaneously, (4) synchronizes PS-optimizing negotiations and handles message passing and payment transfer between all matched agents, and (5) reinitializes its population and restarts evolution of EDAs when the CNT max is reached.

The experiments were conducted on a computer with Windows XP (32-bit) service pack 3, Intel® Core™2 Duo CPU E8500 @ 3.16 GHz & 3.17 GHz and 4 GB RAM.

5.1.2 Experimental settings

The input parameters for two types of PS-optimizing negotiation agents B and S are described in Table 2.

Table 2 Parameter setting for negotiation agents B and S

The price ranges (determined by the IPs and RPs) and the strategy ranges for B and S were adopted for the purpose of the Type-I and Type-II experiments. The deadline range of agents was grouped empirically into three categories: Short when negotiation rounds are in [16,30]; Moderate (denoted as Mid in Table 2) when negotiation rounds are in [31,60]; and Long when negotiation rounds are in [61,120]. The deadline ranges [1,15] and [121,∞] were not considered because repeated experimental tuning showed that the average success rate of PS-optimizing negotiations is very low when the agents adopt deadlines in those ranges. Due to space limitations, one representative value was chosen from each of the three categories: 20 for Short, 50 for Mid, and 100 for Long. Based on the bargaining advantage in terms of time, there are six representative deadline combinations between B and S as follows:

  1. (1)

    (Long, Mid), (Mid, Short) and (Long, Short) for the case that B has a longer deadline than S,

  2. (2)

    (Mid, Long), (Short, Mid) and (Short, Long) for the case that S has a longer deadline than B.

However, since cases 1 and 2 are symmetric and a similar analysis can be applied to both, we only describe the results for case 1. In the negotiations between B and S, the experiments were set up such that S starts the negotiation by making the first proposal to B.

Different emphases on price and negotiation speed (i.e., different weightings between \(w_{\mathit{NP}}^{x}\) and \(w_{\mathit{NS}}^{x}\)) lead to different groups of preference criteria. Each PS-optimizing agent x has three representative preference criteria as follows:

  1. (1)

    more-P-optimizing case: \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = (0.7, 0.3)\)

  2. (2)

    exact-PS-optimizing case: \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = (0.5, 0.5)\)

  3. (3)

    more-S-optimizing case: \((w_{\mathit{NP}}^{x}, w_{\mathit{NS}}^{x}) = (0.3, 0.7)\)

Hence, the following nine combinations are possible between B and S as described in Table 3.

Table 3 Combinations of preference criteria of B and S

The experimental parameter settings for S-EDAs and ID2C-EDAs are described in Table 4. We used the experimentally tuned parameters from [8] and [10].

Table 4 Parameter settings for S-EDAs and ID2C-EDAs

5.1.3 Description of results

Even though extensive simulations were carried out for all the situations, only representative results are presented in this section due to space limitations. Empirical results for the Type-I and Type-II experiments are shown in Tables 6 to 8 and Tables 9 to 11, respectively. All the values in the experimental tables were averaged over more than \(10^{3}\) runs. The symbols for the results in Tables 6 to 11 and their descriptions are summarized in Table 5. In Tables 6 to 11, the rows for the performance measures (see Sect. 5.1.4) are shaded; in addition, the rows for the results achieved from ID2C-EDAs are in boldface to distinguish them from those achieved from S-EDAs.

Table 5 Summary of notation for the results
Table 6 Results of Type-I experiments in (Long, Mid)

5.1.4 Performance measure

Under a complete information setting, optimal PS-optimizing negotiation outcomes and negotiation strategies are achieved as follows: (1) \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) are obtained from equilibrium analyses using Theorems 3 to 5 and (2) from the obtained \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\), \(\lambda_{B}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(\lambda_{S}^{\mathit{PS}\text{-}\mathit{opt}}\) are calculated using (12) and (13), respectively.

Under an incomplete information setting, if EDAs carry out balanced coevolution (for both POP B and POP S ), the agreement price and agreement time obtained for B will be very close to the agreement price and agreement time obtained for S, respectively. This is because the optimal agreement price and agreement time of B should be equal to those of S, respectively. To verify the effectiveness of the coevolved PS-optimizing negotiation strategies of both B and S, we compare the coevolution results (obtained from coevolutionary learning using either S-EDAs or ID2C-EDAs) with the optimal results (obtained under a complete information setting) by examining the following two conditions: (1) closeness to the optimum: the obtained PS-optimizing negotiation outcomes and coevolved PS-optimizing negotiation strategies should be close to the optimal results and (2) balanced coevolution: the obtained negotiation outcomes of B should be the same as those of S as a result of coevolutionary learning.

First, for measuring closeness between the coevolution results and optimal results, the following three types of closeness metric are devised for each type of EDA:

  1. (1)

    \(\delta_{\mathit{dist}}^{P_{c}^{x}}\) measures the closeness between \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and the agreement price \(\bar{P}_{c}^{x\text{ (S-EDA)}}\) (respectively, \(\bar{P}_{c}^{x\text{ (ID$^{2}$C-EDA)}}\)) obtained from coevolutionary learning as follows:

    \(\delta_{\mathit{dist}}^{P_{c}^{x}}\text{(S-EDA)} =|\bar{P}_{c}^{x\text{ (S-EDA)}} - P_{c}^{\mathit{PS}\text{-}\mathit{opt}}|\) for the coevolution using S-EDA,

    \(\delta_{\mathit{dist}}^{P_{c}^{x}}\text{(ID$^{2}$C-EDA)} =|\bar{P}_{c}^{x\text{ (ID$^{2}$C-EDA)}} - P_{c}^{\mathit{PS}\text{-}\mathit{opt}}|\) for the coevolution using ID2C-EDA

where if S-EDA (respectively, ID2C-EDA) has obtained \(\bar{P}_{c}^{x\text{ (S-EDA)}}\) (respectively, \(\bar{P}_{c}^{x\text{ (ID$^{2}$C-EDA)}}\)) that is the same as \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) from coevolutionary learning, then \(\delta_{\mathit{dist}}^{P_{c}^{x}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{P_{c}^{x}}\text{(ID$^{2}$C-EDA)}\)) will be 0; otherwise, \(\delta_{\mathit{dist}}^{P_{c}^{x}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{P_{c}^{x}}\text{(ID$^{2}$C-EDA)}\)) will be larger than 0.

  2. (2)

    \(\delta_{\mathit{dist}}^{T_{c}^{x}}\) measures the closeness between \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and the agreement time \(\bar{T}_{c}^{x\text{ (S-EDA)}}\) (respectively, \(\bar{T}_{c}^{x\text{ (ID$^{2}$C-EDA)}}\)) obtained from coevolutionary learning as follows:

    \(\delta_{\mathit{dist}}^{T_{c}^{x}}\text{(S-EDA)} = \bar{T}_{c}^{x\text{ (S-EDA)}} - \max(T_{c}^{\mathit{PS}\text{-}\mathit{opt}})\) for the coevolution using S-EDA,

    \(\delta_{\mathit{dist}}^{T_{c}^{x}}\text{(ID$^{2}$C-EDA)} = \bar{T}_{c}^{x\text{ (ID$^{2}$C-EDA)}} - \max(T_{c}^{\mathit{PS}\text{-}\mathit{opt}})\) for the coevolution using ID2C-EDA

where we consider its maximum value \(\max(T_{c}^{\mathit{PS}\text{-}\mathit{opt}})\) as the basis of closeness because \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) can be represented as a range of negotiation time. Hence, if S-EDA (respectively, ID2C-EDA) has obtained \(\bar{T}_{c}^{x\text{ (S-EDA)}}\) (respectively, \(\bar{T}_{c}^{x\text{ (ID$^{2}$C-EDA)}}\)) belonging to \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\), then \(\delta_{\mathit{dist}}^{T_{c}^{x}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{T_{c}^{x}}\text{(ID$^{2}$C-EDA)}\)) will be less than or equal to 0 (i.e., negative real numbers or 0); otherwise, \(\delta_{\mathit{dist}}^{T_{c}^{x}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{T_{c}^{x}}\text{(ID$^{2}$C-EDA)}\)) will have positive real numbers.

  3. (3)

    \(\delta_{\mathit{dist}}^{\lambda _{x}}\) measures the closeness between \(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}}\) and coevolved PS-optimizing negotiation strategy \(\bar{\lambda}_{x}^{\mathit{PS}\text{-}\mathit{opt}\text{ (S-EDA)}}\) (respectively, \(\bar{\lambda}_{x}^{\mathit{PS}\text{-}\mathit{opt}\text{ (ID$^{2}$C-EDA)}}\)) from coevolutionary learning as follows:

    \(\delta_{\mathit{dist}}^{\lambda _{x}}\text{(S-EDA)} = \bar{\lambda}_{x}^{\mathit{PS}\text{-}\mathit{opt}\text{ (S-EDA)}} - \max(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}})\) for the coevolution using S-EDA,

    \(\delta_{\mathit{dist}}^{\lambda _{x}}\text{(ID$^{2}$C-EDA)} = \bar{\lambda}_{x}^{\mathit{PS}\text{-}\mathit{opt}\text{ (ID$^{2}$C-EDA)}} - \max(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}})\) for the coevolution using ID2C-EDA

where we consider the maximum value \(\max(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}})\) as the basis of closeness because \(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}}\) can be represented as a range of strategy. Hence, if S-EDA (respectively, ID2C-EDA) has coevolved \(\bar{\lambda}_{x}^{\mathit{PS}\text{-}\mathit{opt}\text{ (S-EDA)}}\) (respectively, \(\bar{\lambda}_{x}^{\mathit{PS}\text{-}\mathit{opt}\text{ (ID$^{2}$C-EDA)}}\)) belonging to \(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}}\), then \(\delta_{\mathit{dist}}^{\lambda _{x}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{\lambda _{x}}\text{(ID$^{2}$C-EDA)}\)) will be less than or equal to 0 (i.e., negative real numbers or 0); otherwise, \(\delta_{\mathit{dist}}^{\lambda _{x}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{\lambda _{x}}\text{(ID$^{2}$C-EDA)}\)) will have positive real numbers.

Second, for checking whether balanced negotiation outcomes were achieved, we will simply compare the closeness metric of B with that of S for the obtained agreement prices and agreement times, respectively, as follows:

  1. (1)

    \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) are compared for the coevolution using S-EDA,

    \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) are compared for the coevolution using ID2C-EDA

where if \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\)) have the same values, then it is determined that S-EDAs (respectively, ID2C-EDAs) achieved balanced agreement prices (for both B and S); otherwise, it is determined that S-EDAs (respectively, ID2C-EDAs) achieved biased agreement prices (for both B and S).

  2. (2)

    \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) are compared for the coevolution using S-EDA,

    \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) are compared for the coevolution using ID2C-EDA

where if \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) (respectively, \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\)) have the same values, then it is determined that S-EDAs (respectively, ID2C-EDAs) achieved balanced agreement times (for both B and S); otherwise, it is determined that S-EDAs (respectively, ID2C-EDAs) achieved biased agreement times (for both B and S).

In addition, for measuring and comparing the coevolution performance between S-EDAs and ID2C-EDAs in terms of the number of generations, we use N Gen and \(N^{\mathit{Re\_init}}\) together as a performance measure. This is because the total average number of generations for coevolutionary learning is determined as \((N^{\mathit{Re\_init}} \times G^{\max} ) + N^{\mathit{Gen}}\). Furthermore, \(N^{\mathit{Re\_init}}\) gives additional information about the coevolution capability of S-EDAs and ID2C-EDAs. For example, there can be cases in which \(\bar{N}^{\mathit{Re\_init}(\text{S-EDA})}\) (respectively, \(\bar{N}^{\mathit{Re\_init}(\mathrm{ID}^{2}\text{C-EDA})}\)) has reached CNT max, which means that the S-EDAs (respectively, ID2C-EDAs) do not have enough coevolution capability for achieving converged solutions. A higher value of \(\bar{N}^{\mathit{Re\_init}(\text{S-EDA})}\) (respectively, \(\bar{N}^{\mathit{Re\_init}(\mathrm{ID}^{2}\text{C-EDA})}\)) indicates insufficient coevolution capability of the S-EDAs (respectively, ID2C-EDAs) for achieving converged solutions.
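The performance measures above reduce to a few one-line computations; the sketch below (illustrative) assumes \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(\lambda_{x}^{\mathit{PS}\text{-}\mathit{opt}}\) are given as (min, max) ranges and \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) as a single value.

def delta_price(P_bar, P_opt):
    # Closeness of the coevolved agreement price to P_c^PS-opt.
    return abs(P_bar - P_opt)

def delta_to_range_max(value_bar, opt_range):
    # Closeness of a coevolved agreement time (or strategy) to the maximum of its optimal range;
    # a value <= 0 means the coevolved value lies within (or below) the optimal range.
    return value_bar - max(opt_range)

def balanced(delta_B, delta_S, tol=0.0):
    # Balanced coevolution check: B's and S's closeness metrics should coincide.
    return abs(delta_B - delta_S) <= tol

def total_generations(n_reinit, g_max, n_gen):
    # Total average number of generations spent: (N^Re_init x G^max) + N^Gen.
    return n_reinit * g_max + n_gen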

5.2 Observations and analyses

5.2.1 Results of Type-I experiments

For (Long, Mid) in Table 6, (Mid, Short) in Table 7 and (Long, Short) in Table 8, B has a bargaining advantage over S in terms of time. Furthermore, in (Long, Mid), (Mid, Short) and (Long, Short), \(\mathit{dP}_{c}^{B}\) and \(\min(\mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S})\)—which determine the optimum agreement points of all optimization modes 1 to 9—are in the range of NSS. Hence, we calculated both \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) from Theorem 3, \(\lambda_{B}^{\mathit{PS}\text{-}\mathit{opt}}\) from (12) and \(\lambda_{S}^{\mathit{PS}\text{-}\mathit{opt}}\) from (13). From the results in Tables 6 to 8, the following two observations were drawn.

Table 7 Results of Type-I experiments in (Mid, Short)
Table 8 Results of Type-I experiments in (Long, Short)

Observation 1

While S-EDAs generally could not obtain effective PS-optimizing negotiation outcomes and coevolve effective PS-optimizing negotiation strategies in most of the optimization modes, ID2C-EDAs generally obtained effective PS-optimizing negotiation outcomes and coevolved effective PS-optimizing negotiation strategies in all the optimization modes.

Analysis

It can be observed from Tables 6 to 8 that: (1) the values of \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) were large (especially for S), and the difference between \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) was too large for the agreement prices to be effective in most of the optimization modes, (2) the values of \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) were large positive real numbers and the difference between \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) was too large for the agreement times to be effective in most of the optimization modes, and (3) the values of \(\delta_{\mathit{dist}}^{\lambda _{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{\lambda _{S}}\text{(S-EDA)}\) were large positive real numbers (especially for S), so the coevolved PS-optimizing negotiation strategies were not effective in most of the optimization modes. These indicate that S-EDAs generally: (1) achieved ineffective agreement prices (especially for S) and agreement times and (2) coevolved ineffective PS-optimizing negotiation strategies in terms of both closeness to the optimum and balanced coevolution. This was mainly because of the different coevolution speeds of POP B and POP S ; since POP B converged to a near-optimal value more rapidly than POP S , POP S could not have sufficient diversity (of opponents) for optimizing its solutions through coevolutionary interactions, and hence S-EDAs could not achieve effective and balanced solutions. This phenomenon can be considered as premature convergence in the coevolution situation.

It can be also observed from Tables 6 to 8 that: (1) both the values of \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) were either small or optimal, and the difference between \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) was also small in all the optimization modes; (2) the values of \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) were small or in the range of the optimum and the difference between \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) was also small in most of the optimization modes; and (3) the values of \(\delta_{\mathit{dist}}^{\lambda _{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{\lambda _{S}}\text{(ID$^{2}$C-EDA)}\) were small in most of the optimization modes. These indicate that ID2C-EDAs generally could: (1) achieve both effective agreement prices and agreement times and (2) coevolve effective PS-optimizing negotiation strategies in terms of both closeness to the optimum and balanced coevolution. This was because ID2C-EDAs have sufficient capability for achieving close to optimal and balanced solutions for both B and S by dynamically adjusting the degree of intensification and diversification of POP B and POP S using DR. Furthermore, by adopting LNS and PR, it is possible for ID2C-EDAs to improve solution accuracy and to avoid inappropriate population configurations, respectively.

Observation 2

While S-EDAs could not coevolve converged populations, ID2C-EDAs generally coevolved converged populations.

Analysis

It can be observed from Tables 6 to 8 that the values of \(\bar{N}^{\mathit{Re\_init}(\text{S-EDA})}\) reached CNT max in most of the optimization modes while the values of \(\bar{N}^{\mathit{Re\_init}(\mathrm{ID}^{2}\text{C-EDA})}\) were much smaller than \(\bar{N}^{\mathit{Re\_init}(\text{S-EDA})}\) (and were often close to 0) in all the optimization modes. Hence, compared to ID2C-EDAs, S-EDAs required an extremely large number of (total) generations for the coevolution. This indicates that S-EDAs generally did not have enough capability for coevolving converged populations, while ID2C-EDAs had enough capability for coevolving converged populations within a reasonable number of generations. The reason that ID2C-EDAs outperformed S-EDAs is mainly the DR procedure (in the coevolution process), which allows ID2C-EDAs to search for promising solutions adaptively. In DR, the diversification procedure helps to avoid premature convergence in a population by maintaining population diversity at a certain level, and the refinement procedure helps to achieve optimal solutions by generating promising solutions using regional population history information and replacing less feasible solutions with the generated promising solutions. Furthermore, LNS contributes to resolving the problem of late convergence of populations, and PR helps to avoid configuring inappropriate populations in early generations. Hence, by adopting DR together with LNS and PR, each ID2C-EDA is more likely to escape premature convergence and maintain enough population diversity, which enables ID2C-EDAs to coevolve converged populations in the coevolutionary learning.

From these Observations 1 and 2, we can draw the following conclusion for the Type-I experiments.

Conclusion 1

When a negotiation setting with sufficiently large AgZ NP is provided for PS-optimizing agents, ID2C-EDAs generally coevolve effective (converged) PS-optimizing negotiation strategies for both B and S, while S-EDAs generally fail to coevolve such negotiation strategies within a reasonable number of generations in most of the cases.

5.2.2 Results of Type-II experiments

For (Long, Mid) in Table 9, (Mid, Short) in Table 10 and (Long, Short) in Table 11, B has a bargaining advantage over S in terms of time. In addition, in (Long, Mid), (Mid, Short) and (Long, Short), \(\mathit{dP}_{c}^{B}\) and min(\(\mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S}\)) of the optimization modes 1 to 6 are not in the range of NSS. Hence, we calculated both \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) from Theorem 5, \(\lambda_{B}^{\mathit{PS}\text{-}\mathit{opt}}\) from (12) and \(\lambda_{S}^{\mathit{PS}\text{-}\mathit{opt}}\) from (13). In contrast, in (Long, Mid), (Mid, Short) and (Long, Short), \(\mathit{dP}_{c}^{B}\) and \(\min(\mathit{dT}_{c}^{B}, \mathit{dT}_{c}^{S})\) of the optimization modes 7 to 9 are in the range of NSS. Hence, we calculated both \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) from Theorem 3, \(\lambda_{B}^{\mathit{PS}\text{-}\mathit{opt}}\) from (12) and \(\lambda_{S}^{\mathit{PS}\text{-}\mathit{opt}}\) from (13). From the results in Tables 9 to 11, the following three observations were drawn.

Table 9 Results of Type-II experiments in (Long, Mid)
Table 10 Results of Type-II experiments in (Mid, Short)
Table 11 Results of Type-II experiments in (Long, Short)

Observation 3

In optimization modes 1 to 6, both S-EDAs and ID2C-EDAs generally achieved effective PS-optimizing negotiation outcomes and coevolved effective PS-optimizing negotiation strategies.

Analysis

From the results of the optimization modes 1 to 6 in Tables 9 to 11, it can be seen that: (1) the values of \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) were small and the difference between \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) was also small, (2) the values of \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) were in the range of the optimum and the difference between \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) was small, and (3) the values of \(\delta_{\mathit{dist}}^{\lambda _{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{\lambda _{S}}\text{(S-EDA)}\) were in the range of the optimum. These indicate that S-EDAs generally could: (1) obtain effective agreement prices and agreement times and (2) coevolve effective PS-optimizing negotiation strategies in terms of both closeness to the optimum and balanced coevolution. Similarly, from all the results of the optimization modes 1 to 6 in Tables 9 to 11, it can be seen that: (1) the values of \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) were 0 and the difference between \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) was also 0; (2) the values of \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) were 0, and the difference between \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) was also 0; and (3) the values of \(\delta_{\mathit{dist}}^{\lambda _{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{\lambda _{S}}\text{(ID$^{2}$C-EDA)}\) were in the range of the optimum. In summary, both S-EDAs and ID2C-EDAs could obtain optimal PS-optimizing negotiation outcomes and coevolve optimal PS-optimizing negotiation strategies for these modes. In the results, the coevolved negotiation agreements were made at RP S and τ S . This is because B, adopting the time-dependent negotiation strategy, obtains all of its payoff at τ S by accepting S's final proposal RP S . Since the fitness function puts more emphasis on optimizing price than on optimizing speed by setting \(\rho_{\mathit{NP}}=w_{\mathit{NP}}^{x}/100\) and \(\rho_{\mathit{NS}} = w_{\mathit{NS}}^{x}\), S-EDAs and ID2C-EDAs are less likely to make price concessions for achieving rapid agreements and strictly hold RP S as the optimal agreement price. This explains the successful coevolution performance of S-EDAs in these cases (compared with the Type-I experiments).

Observation 4

In the optimization modes 7 to 9, ID2C-EDAs generally obtained effective PS-optimizing negotiation outcomes and coevolved effective PS-optimizing negotiation strategies while S-EDAs did not.

Analysis

From the results of the optimization modes 7 to 9 in Tables 9 to 11, it can be seen that: (1) the values of \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) were large and the difference between \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(S-EDA)}\) was too large for the agreement prices to be effective; (2) the values of \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) were large positive real numbers and the difference between \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(S-EDA)}\) was too large for the agreement times to be effective; and (3) the values of \(\delta_{\mathit{dist}}^{\lambda _{B}}\text{(S-EDA)}\) and \(\delta_{\mathit{dist}}^{\lambda _{S}}\text{(S-EDA)}\) were large positive real numbers, so the coevolved PS-optimizing negotiation strategies were not effective. These indicate that S-EDAs generally: (1) obtained ineffective agreement prices and agreement times and (2) coevolved ineffective PS-optimizing negotiation strategies in terms of both closeness to the optimum and balanced coevolution. In contrast, from all the results of the optimization modes 7 to 9 in Tables 9 to 11, it can also be observed that: (1) the values of \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) were small and the difference between \(\delta_{\mathit{dist}}^{P_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{P_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) was also small; (2) the values of \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) were large positive real numbers but the difference between \(\delta_{\mathit{dist}}^{T_{c}^{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{T_{c}^{S}}\text{(ID$^{2}$C-EDA)}\) was small; and (3) the values of \(\delta_{\mathit{dist}}^{\lambda _{B}}\text{(ID$^{2}$C-EDA)}\) and \(\delta_{\mathit{dist}}^{\lambda _{S}}\text{(ID$^{2}$C-EDA)}\) were small. These indicate that ID2C-EDAs generally could: (1) achieve effective agreement prices and agreement times and (2) coevolve effective PS-optimizing negotiation strategies in terms of both closeness to the optimum and balanced coevolution.

Observation 5

S-EDAs could not coevolve converged populations in some cases while ID2C-EDAs generally coevolved converged populations in most of the cases.

Analysis

From the results of optimization modes 1 to 6 in Tables 9 to 11, it can be observed that the values of \(\bar{N}^{\mathit{Re\_init}(\text{S-EDA})}\) were 0 in (Long, Mid) and (Mid, Short) but were very high (>6) in (Long, Short). Hence, S-EDAs required an extremely large number of generations for coevolving converged populations in (Long, Short). This is because (Long, Short) has a large search space compared to (Long, Mid) and (Mid, Short). This indicates that the search capability of S-EDAs deteriorated in the large search space of (Long, Short). In contrast, it can also be observed from the results of optimization modes 1 to 6 in Tables 9 to 11 that the values of \(\bar{N}^{\mathit{Re\_init}(\mathrm{ID}^{2}\text{C-EDA})}\) were all 0. This indicates that ID2C-EDAs have enough search capability even in the large search space. From most of the results of the optimization modes 7 to 9 in Tables 9 to 11, it can be observed that the values of \(\bar{N}^{\mathit{Re\_init}(\text{S-EDA})}\) were 10. This indicates that S-EDAs did not have enough capability for coevolving converged populations within a reasonable number of generations for these modes. In contrast, from all the results of the optimization modes 7 to 9 in Tables 9 to 11, it can be observed that the values of \(\bar{N}^{\mathit{Re\_init}(\mathrm{ID}^{2}\text{C-EDA})}\) were very small. This indicates that ID2C-EDAs have enough capability for coevolving converged populations within a reasonable number of generations for these modes. A similar analysis to that used in Observation 2 explains why ID2C-EDAs outperformed S-EDAs.

From the Observations 3 to 5, we can draw the following conclusion for the Type-II experiments.

Conclusion 2

When negotiation settings with insufficient AgZ NP are provided for PS-optimizing agents, ID2C-EDAs generally coevolve effective converged PS-optimizing negotiation strategies for both B and S within a reasonable number of generations, while S-EDAs have a high possibility of failing to coevolve such negotiation strategies.

6 Related works

Since this work mainly focuses on finding effective negotiation strategies for PS-optimizing agents with incomplete information using coevolutionary learning, the related works discussed here are approaches that use EAs for evolving negotiation strategies.

There are some existing works that use EAs as a decision-making component to determine an agent’s optimal negotiation strategy (one that ensures reaching an agreement successfully and achieving higher utilities) under incomplete information settings by generating adaptive proposals at every negotiation round (e.g., [18, 39, 40]). In this work, however, EAs were used to find effective negotiation strategies for both agents through coevolutionary learning under an incomplete information setting. Hence, this section only introduces and discusses related works on applying EAs to learn effective negotiation strategies.

In [24], Oliver utilized standard GAs [7] for learning (simple) strategies of agents in which the agents use quite simple threshold rules for bargaining, and showed that by adopting GAs, agents can learn strategies for simple negotiation games. Whereas the empirical studies seem to indicate that the agents in [24] are generally successful in learning effective strategies, the research is limited to learning simple threshold rules; offers are accepted if they yield a utility above a predefined threshold.

In [23], Matos et al. also utilized a GA for learning the most successful strategies against different types of opponents in different negotiation situations, in which the service-oriented negotiation model in [4] was adopted to determine successful strategies for different types of environment by coevolving negotiation strategies and tactics. The empirical studies in [23] were carried out for bilateral negotiations with two issues and showed that the agents adopting the GA are generally effective in evolving effective strategies for different negotiation circumstances. Nevertheless, the approach used in [23] has a serious limitation in that it requires a centralized coevolution model where complete information about each agent is assumed for evolving the populations using one GA. Furthermore, such an assumption is not realistic in many practical negotiation systems in which agents generally have incomplete information about each other.

In [13], Jin and Tsang utilized genetic programming (GP) to compare the evolved results with the sub-game perfect equilibrium (SPE) solutions obtained from game-theoretic analysis of complete-information bargaining problems and showed that GP achieved approximations of the SPE solutions. Later, in [12], Jin extended the simple bargaining problems to incomplete-information bargaining problems and showed that GP was capable of achieving reasonably good solutions. Similar to [23], the works [13] and [12] also rely on a centralized coevolution model, since the coevolutionary learning uses one GP.

This paper significantly extends the previous works reported in [8, 9, 32] and [10].

In [32], Sim utilized an EDA (specifically, UMDAc) for coevolving effective negotiation strategies of agents having different preference criteria for optimizing price and optimizing negotiation speed, and the preliminary empirical results in [32] showed that the EDA was capable of coevolving PS-optimizing negotiation strategies of agents for P-optimizing negotiation. The fitness function used in [32] was the (total) utility function, similar to (7) in this work, consisting of both price and speed utility functions with a weighting factor attached to each utility function. However, using the coevolved results in [32], it is difficult to investigate the convergence of the EDA because the results contain no information about the number of generations required to achieve converged populations. Later, in the extended work [9], Gwak and Sim identified the problem with the fitness function in [32] and devised new fitness functions based on measuring the difference between: (1) the ratio of the price weighting factor to the time weighting factor and (2) the corresponding ratio of price utility to speed utility. Although the fitness function in [32] has some ambiguity in defining better negotiation solutions with higher fitness in the composite utility space (as described in Sect. 4.1), the fitness functions in [9] are more effective in defining better negotiation solutions because they can differentiate negotiation strategies matching the given ratio of the price weighting to the speed weighting from other strategies. Empirical results in [9] showed that the devised fitness functions outperform the fitness function used in [32]. Furthermore, comparing the coevolution performance of the conventional GA and EDA for finding effective negotiation strategies showed that both have limited performance for coevolving effective negotiation strategies.
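As a rough sketch of the idea behind the fitness functions in [9] (the exact functional form is given in that paper and is not reproduced here), fitness improves as the ratio of achieved price utility to speed utility approaches the ratio of the price weighting to the speed weighting:

    def ratio_matching_fitness(u_price, u_speed, w_price, w_speed, eps=1e-9):
        """Illustrative sketch in the spirit of [9]: reward strategies whose
        achieved utility ratio matches the preference-weight ratio. The exact
        form (and scaling) used in [9] differs."""
        target_ratio = w_price / (w_speed + eps)
        achieved_ratio = u_price / (u_speed + eps)
        return -abs(target_ratio - achieved_ratio)   # closer to 0 is better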

In [8] and [10], Gwak and Sim empirically showed that the conventional GA and EDA generally could not achieve effective (or near-optimal) coevolution results for finding optimal P-optimizing negotiation solutions. In addition, under the assumption that dynamic diversity controlling methods can assist EAs in coevolving optimal solutions for both populations (where both agents adopt different EAs and a decentralized coevolution model is assumed), the DR procedure was devised, and two local improvement methods called LNS and PR were devised to further improve coevolution performance. When DR together with LNS and PR is incorporated into the conventional GA (respectively, the conventional EDA), we call the result ID2C-GA (respectively, ID2C-EDA). Empirical results showed that ID2C-GA and ID2C-EDA have complementary performance in that each achieved better performance than the other for some cases. Furthermore, since it was also shown that ID2C-EDA has better performance than ID2C-GA for coevolving optimal strategies in the larger solution space, we adopted ID2C-EDA as the EA model for the coevolutionary learning.
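Structurally, an ID2C-EDA can be viewed as a standard EDA generation loop augmented with three hooks, as the following high-level Python sketch suggests; the internals of DR, LNS, and PR are defined in [8, 10] and are represented here only as opaque callables.

    def id2c_eda_generation(population, model, dr, lns, pr, fitness):
        """One generation of an ID2C-EDA, sketched at a high level.
        `model` stands for the EDA's probabilistic model (e.g., per-gene
        Gaussians for UMDAc); `dr`, `lns` and `pr` stand for the dynamic
        diversity controlling and local improvement procedures of [8, 10]."""
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[: max(1, len(population) // 2)]

        model.estimate(elites)                        # standard EDA model update
        offspring = model.sample(len(population))

        offspring = dr(offspring)                     # keep population diversity under control
        offspring = [lns(ind) for ind in offspring]   # local neighbourhood search
        offspring = pr(offspring, elites)             # path relinking toward elite solutions
        return offspring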

Finally, this work significantly extends [32] and [9] and improves upon the closely related works [12, 13, 23, 24] as follows:

  1. (1)

    In [32] and [9], theoretical results exist only for the optimal negotiation solutions of P-optimizing negotiations, given as Theorems 1 and 2; there is no such theory for the other types of negotiations (e.g., PS-optimizing negotiations). To this end, this work provides the theoretical background of optimal negotiation solutions for PS-optimizing negotiations, given as Theorems 3 to 5, by designing optimal PS-optimizing agents with complete information. Hence, it is possible (i) to calculate optimal negotiation solutions for each PS-optimization mode and (ii) to evaluate the optimality of coevolved negotiation solutions (under an incomplete information setting) by comparing them with the optimal negotiation solutions.

  2. (2)

    Since the fitness function in [9] (showing better performance than the fitness function in [32]) simply measures the difference of ratios between weightings and utilities, higher fitness values only indicate that the ratios are closer. Furthermore, using the fitness function in [25], it is hard to find a direct relationship between fitness and the optimal PS-optimizing negotiation solutions obtained from Theorems 3 to 5. Hence, it was necessary to develop the new fitness function in (17), which is based on the composite likelihoods of the agreement price in (14) and the agreement time in (15) (see the illustrative sketch after this list).

  3. (3)

    Although the previous works [12, 13, 23, 24] and [32] assumed a fully centralized coevolution model using one EA, in which complete information about each agent is available for evolutionary learning, this work provides the fully decentralized coevolution model in Sect. 4.2, which uses two EDAs (for the populations of B and S, respectively) and a coordinator to share and determine the conditions for coevolution termination and inappropriate coevolution. The decentralized coevolution model is more realistic in simulating negotiations between agents having fully incomplete information about each other.

  4. (4)

    Although the experiments in both [32] and [9] were carried out under the assumption that both agents have the same negotiation mode (e.g., if B is exact-PS-optimizing, then S is also exact-PS-optimizing), this work carries out extensive experiments considering all possible combinations of (representative) PS-optimizing negotiation modes between B and S (see Table 3).

  5. (5)

    As shown in [8] and [10], conventional EAs such as S-GAs and S-EDAs have some drawbacks for coevolutionary learning due to premature convergence and biased coevolution effects, which can also be observed in the results of [32] and [9]. In contrast, ID2C-GAs and ID2C-EDAs have sufficient coevolution capability because they are augmented with DR together with LNS and PR. The empirical results in Observations 1 to 5 demonstrated that effective (sometimes optimal) PS-optimizing negotiation solutions can be achieved using ID2C-EDAs for the coevolutionary learning, whereas such solutions cannot be achieved using S-EDAs. Hence, this paper can be seen as an extension of [8] and [10] providing empirical evidence of the effectiveness of ID2C-EDAs for coevolutionary learning.
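To make the composite-likelihood fitness mentioned in item (2) above concrete, the following sketch shows the general shape of a fitness built from Gaussian likelihoods of the achieved agreement price and time relative to the desired ones; the precise forms of (14), (15) and (17), including the widths and the exact magnification by the preference weights, are specified in Sect. 4.2 and are not reproduced here.

    import math

    def gaussian_similarity(achieved, desired, width):
        """Likelihood-style closeness in (0, 1]; equals 1 when achieved == desired."""
        return math.exp(-((achieved - desired) ** 2) / (2.0 * width ** 2))

    def ps_fitness(agr_price, agr_time, desired_price, desired_time,
                   w_price, w_speed, price_width, time_width):
        """Sketch of a composite likelihood-based PS-optimizing fitness in the
        spirit of (14), (15) and (17); the widths and the weighting scheme here
        are assumptions for illustration."""
        sim_price = gaussian_similarity(agr_price, desired_price, price_width)
        sim_speed = gaussian_similarity(agr_time, desired_time, time_width)
        return w_price * sim_price + w_speed * sim_speed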

7 Conclusion and future work

Based on the theoretical results obtained in Sect. 4.1 for finding optimal negotiation strategies of PS-optimizing agents with complete information, this work has developed an effective coevolutionary learning mechanism (by adopting ID2C-EDAs) for finding effective negotiation strategies for PS-optimizing agents with incomplete information. The novel feature and significance of this research therefore lie in designing and developing negotiation mechanisms that can: (1) optimize both price and negotiation speed of PS-optimizing agents with complete information and (2) coevolve effective, or (near-)optimal, negotiation strategies for PS-optimizing agents with incomplete information.

The contributions of this work are detailed as follows.

  1. (1)

    For PS-optimizing negotiation under a complete information setting, this work determines \(P_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) and \(T_{c}^{\mathit{PS}\text{-}\mathit{opt}}\) (Theorems 3 to 5 in Sect. 4.1), which lead to optimal negotiation strategies for both PS-optimizing agents [(12) and (13) in Sect. 4.1]. Whereas Theorems 1 and 2 are based on Theorems 1 and 2 in [40, pp. 199–200], this research is, to the best of the authors' knowledge, the earliest work suggesting the optimality of agreements between PS-optimizing agents with incomplete information.

  2. (2)

    This contribution distinguishes this work from [40] and [5] in that (i) [5] only showed that there are three classes of optimal strategy, namely Boulware, Linear and Conceder, depending on the negotiation scenario, and (ii) [40] only focused on showing that there are optimal negotiation strategies for both P-optimizing agents in which one agent (having a bargaining advantage over its opponent in terms of time) maximizes its price utility and guarantees that an agreement is reached. The following summarizes the optimality of agreements for P-optimizing and PS-optimizing agents.

    Optimal agreement point of P-optimizing agents:

    When \(\tau_{B} > \tau_{S}\):

    \(\begin{cases} P_{c}^{P\text{-}\mathit{opt}} = \mathit{RP}_{S} \\ T_{c}^{P\text{-}\mathit{opt}} = \tau_{S} \end{cases}\)

    When \(\tau_{B} < \tau_{S}\):

    \(\begin{cases} P_{c}^{P\text{-}\mathit{opt}} = \mathit{RP}_{B} \\ T_{c}^{P\text{-}\mathit{opt}} = \tau_{B} \end{cases}\)

    Optimal agreement points of PS-optimizing agents:

    When \(\tau_{B} > \tau_{S}\):

    \(\begin{cases} P_{c}^{\mathit{PS}\text{-}\mathit{opt}} = \max_{\mathit{dP}_{c}^{x}} \{ U_{\mathit{NP}}^{B}(\mathit{dP}_{c}^{B}), U_{\mathit{NP}}^{B}(\mathit{dP}_{c}^{S})\} \\[4pt] T_{c}^{\mathit{PS}\text{-}\mathit{opt}} = [0, \max_{\mathit{dT}_{c}^{x}} \{ U_{\mathit{NS}}^{B}(\mathit{dT}_{c}^{B}), U_{\mathit{NS}}^{B}(\mathit{dT}_{c}^{S})\} ] \end{cases}\)

    When \(\tau_{B} < \tau_{S}\):

    \(\begin{cases} P_{c}^{\mathit{PS}\text{-}\mathit{opt}} = \max_{\mathit{dP}_{c}^{x}} \{ U_{\mathit{NP}}^{B}(\mathit{dP}_{c}^{B}), U_{\mathit{NP}}^{B}(\mathit{dP}_{c}^{S})\} \\[4pt] T_{c}^{\mathit{PS}\text{-}\mathit{opt}} = [0, \max_{\mathit{dT}_{c}^{x}} \{ U_{\mathit{NS}}^{B}(\mathit{dT}_{c}^{B}), U_{\mathit{NS}}^{B}(\mathit{dT}_{c}^{S})\} ] \end{cases}\)

  3. (3)

    Whereas several existing works (discussed in Sect. 6) adopt EAs for evolving successful negotiation strategies for agents under different negotiation situations, these works are limited in that: (1) agents mostly did not consider optimization of both price and negotiation speed and (2) centralized coevolution models were used for coevolutionary learning in which complete information settings for agents are generally assumed. However, agents in this work are designed to optimize both price and negotiation speed using coevolutionary learning for an incomplete information setting. Furthermore, we adopted a decentralized coevolution model in which incomplete information settings can be generally assumed.

  4. (4)

    In comparison with authors’ previous works [32] and [9], this paper has provided much more detailed and enhanced designs for PS-optimizing agents for both complete and incomplete information settings (Sect. 4).

  5. (5)

    A new fitness function was designed and implemented for the S-EDAs and ID2C-EDAs; it has the following novel features.

    1. (a)

      A likelihood-based (Gaussian) distance metric was formulated and applied to measure the closeness between the achieved agreement price and time and the desired agreement price and time, respectively.

    2. (b)

      The fitness function is a weighted linear combination of the individual similarities for price and negotiation speed, in which each similarity is weighted by the corresponding preference weight; this weighting also magnifies it for further discrimination from the other similarity.

    3. (c)

      Whereas the previous fitness functions in [32] and [9], which take the form of utility functions, deteriorate the intensification capability of both EDAs (Sect. 4.2), the proposed fitness function is effective in coevolving effective negotiation strategies for both PS-optimizing agents.

  6. (6)

    Empirical results (Observations 1 to 5) show that (i) ID2C-EDAs significantly outperform S-EDAs in terms of coevolution performance for achieving close-to-optimal and balanced solutions and (ii) throughout the (decentralized) coevolutionary learning, ID2C-EDAs adopting the proposed fitness function generally achieved effective, or (near-)optimal, negotiation strategies for both PS-optimizing agents for various combinations of preference criteria under negotiation settings with both sufficient and insufficient \(\mathit{AgZ}_{\mathit{NP}}\) (a minimal sketch of how such optimality is assessed follows this list). From these results, we conclude that ID2C-EDAs are more suitable than S-EDAs for the coevolutionary learning in achieving effective PS-optimizing negotiation strategies. Hence, this work together with [10] also provides evidence of the effectiveness of ID2C-EDAs for competitive coevolution of heterogeneous populations.
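As a minimal illustration of the optimality assessment referred to in item (6), the following sketch measures how far a coevolved agreement lies from the theoretical optimum of Theorems 3 to 5; the quantities, the interval representation of the optimal time, and the gap measures are illustrative, not the exact evaluation protocol of the experiments.

    def optimality_gap(coevolved_price, coevolved_time, opt_price, opt_time_range):
        """Sketch: deviation of a coevolved agreement from the theoretical optimum.
        `opt_time_range` is the interval of acceptable agreement times, e.g.
        [0, T_c^{PS-opt}] from Theorems 3 to 5; the representation is illustrative."""
        price_gap = abs(coevolved_price - opt_price)
        lo, hi = opt_time_range
        if lo <= coevolved_time <= hi:
            time_gap = 0.0
        else:
            time_gap = min(abs(coevolved_time - lo), abs(coevolved_time - hi))
        return price_gap, time_gap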

Finally, the authors acknowledge that although this work develops a coevolutionary learning approach for finding effective PS-optimizing negotiation strategies under an incomplete information setting in which one agent has a bargaining advantage over the other in terms of time, this work in its present form does not deal with the negotiation situation in which neither agent has a bargaining advantage in terms of time. Hence, extending this work to the design of PS-optimizing agents in which neither agent has a bargaining advantage is on the agenda for future work. Since the focus of this work is designing a PS-optimizing negotiation mechanism, for simplicity, this work only considers bilateral single-issue negotiations between PS-optimizing agents. In addition, ID2C-EDAs were adopted for coevolutionary learning because they showed better performance than ID2C-GAs in searching the larger solution space for competitive coevolution [10]. Therefore, other possible enhancements of this work include: (1) extending this work to deal with multi-issue negotiations and (2) adopting other types of EAs with the dynamic diversity controlling method of this work to compare coevolution performance and find a more effective and efficient EA model for the coevolutionary learning.