1 Introduction

In peer-to-peer networks, all peers have equal responsibilities: a peer acts as a client and as a server at the same time [1]. When all peers have the same responsibilities and functions, network performance is reduced. A peer-to-peer network in which, instead of giving the same responsibilities to all peers, more responsibility is given to the peers with higher capability is called a super peer based network [2]. Peers with higher capabilities are called super peers [3]. Super peer based networks have many advantages, such as scalability, load balancing, and improved performance [4, 5]. Therefore, one of the challenges posed in super peer based peer-to-peer networks is the selection of super peers [6]. There are four approaches to selecting super peers: simple selection of super peers [7-9], group-based selection of super peers [5, 10-12], selection of super peers based on Distributed Hash Tables [13-15], and adaptive selection of super peers [16-25].

A lot of research has been done on the selection of super peers. Because of the dynamism of these networks, adaptive methods are important. In adaptive super peer selection, the management algorithm continually tries to select appropriate super peers. In this method, super peers are selected according to peer utility criteria such as session time, storage space, processing power, bandwidth, peer capacity, and workload [20].

In [16], the criterion for selecting a super peer is the capacity of the peers. In this algorithm, the ratio of super peers to clients remains constant, and the number of super peers is calculated based on the network characteristics. In [17, 18], the criterion for selecting a super peer is also the capacity of the peers; the capacity of a super peer must be greater than or equal to that of all of its clients, and the capacity of a peer remains unchanged during the evolution of the network.

In [19], the criteria for selecting a super peer in SG-2 are the capacity of the peer and the distance among the peers. The SG-2 algorithm creates an overlay network with a small set of super peers relative to the size of the network. SG-2 is robust to removal of super peers but generates many control messages. In [20], the Myconet overlay model is presented, which is based on super peers. This network is self-organizing, resilient, and consistent in the face of churn, and quickly reaches high levels of capacity utilization. In Myconet, the capacity of a super peer is the number of peers it can handle. In this network, super peers can be in different modes depending on the number of peers attached to them. In [21], the PPT algorithm, a super peer selection algorithm, has been proposed. In PPT, the gossip model is used to exchange peer information in order to select super peers with high capacities.

Gradient topologies, based on [22, 23], are built so that the utility of the peers is of special importance. The peer utility criteria are determined by the particular application. Peers with high utility are located in the core of the topology, while peers with low utility are located around the core. Neighbor selection in a gradient topology is based on peer utility. GT algorithms in gradient topology use a fundamental threshold property to select super peers, provided in one-threshold and two-threshold variants. With one threshold, all peers whose utility exceeds the threshold become super peers and the others become normal peers. Due to continuous changes of the threshold and of the role of each node from normal peer to super peer and vice versa, a large overhead is imposed on the network. To reduce this overhead, two separate thresholds are used to select super peers: peers whose utility is higher than the high threshold become super peers, and peers whose utility is lower than the low threshold become normal peers. In order to set an appropriate threshold for selecting super peers, global information about the utility of the peers is needed; however, gathering this information and calculating the threshold from it is too costly. To maintain the characteristics of the system, this algorithm is based on the gossip model.

In this paper, a delay-aware super peer selection algorithm based on learning automata is proposed for gradient topology. Note that the existing adaptive algorithms reported for super peer selection in gradient topology are not aware of the delays among the peers. This paper is organized as follows. Section 2 gives an overview of learning automata, which are used as the main learning strategy in the proposed algorithm; Sect. 3 states the problem; Sect. 4 presents the proposed algorithm; Sect. 5 reports the simulation results; and Sect. 6 concludes the paper.

2 Overview of the Learning Automata

Learning automata [26-29] are machines that can perform a finite number of actions. An environment evaluates every selected action, the outcome of the evaluation is returned to the automaton as a positive or negative signal, and the response of the environment affects the automaton's selection of the next action. The ultimate goal is for the automaton to learn to choose the best action among its actions. The best action is the one that maximizes the probability of receiving a reward from the environment.

The environment can be represented by E ≡ {α, β, c}, in which α ≡ {α1, α2, …, αr} is the set of inputs, β ≡ {β1, β2, …, βm} is the set of outputs of the environment, and c ≡ {c1, c2, …, cr} is the set of penalty probabilities. When βi takes only two values, the environment is called the P-model. In this environment, βi(n) = 1 is interpreted as a negative response or failure and βi(n) = 0 as a positive response or success. In the Q-model, βi can take one of a finite set of discrete values in the interval [0, 1]. In the S-model, βi is a random variable between zero and one, βi(n) ∈ [0, 1].

Learning automata are divided into two groups: fixed structure and variable structure. In what follows, automata with variable structure are introduced. A learning automaton with variable structure can be represented by the quadruple {α, β, p, T}, where α ≡ {α1, α2, …, αr} is the set of automaton actions, β ≡ {β1, β2, …, βm} is the set of automaton inputs, p ≡ {p1, p2, …, pr} is the action probability vector, and p(n + 1) = T[α(n), β(n), p(n)] is the learning algorithm. In this type of learning automaton, if action αi is performed at stage n and receives a favorable response from the environment, the probability pi(n) increases and the other probabilities decrease. The changes are made such that the sum of the probabilities always remains constant and equal to one (Fig. 1).

Fig. 1 Relationship between the learning automaton and the environment

Fig. 2 Selecting the super peer using two thresholds [22]

Fig. 3 LAGT.2 algorithm with awareness of delay

In some applications, an automaton with a varying number of actions is required [28]. At instant n, such an automaton selects its action only from a non-empty subset V(n) of its actions, called the active actions. The subset V(n) is chosen randomly by an external agent. The automaton operates as follows: to choose an action at time n, it first calculates the total probability of the active actions, K(n), and then computes the scaled probability vector p̂(n) according to Eq. (1). The automaton then randomly selects an action from the set of active actions according to p̂(n) and applies it to the environment; suppose the selected action is αi.

$$\hat{p}_{i}(n) = \mathrm{Prob}\left[\alpha(n) = \alpha_{i} \mid V(n)\ \text{is the set of active actions},\ \alpha_{i} \in V(n)\right] = p_{i}(n)/K(n)$$
(1)

After receiving the response of the environment, the automaton updates the scaled probability vector p̂(n) according to Eq. (2) in the case of a reward and Eq. (3) in the case of a penalty [28].

$$\begin{aligned} \text{Favorable response:}\quad \hat{p}_{i}(n + 1) & = \hat{p}_{i}(n) + a\left(1 - \hat{p}_{i}(n)\right) \\ \hat{p}_{j}(n + 1) & = (1 - a)\,\hat{p}_{j}(n) \quad \forall j \neq i \end{aligned}$$
(2)
$$\begin{aligned} \text{Unfavorable response:}\quad \hat{p}_{i}(n + 1) & = (1 - b)\,\hat{p}_{i}(n) \\ \hat{p}_{j}(n + 1) & = \frac{b}{r - 1} + (1 - b)\,\hat{p}_{j}(n) \quad \forall j \neq i \end{aligned}$$
(3)

Then, using the vector p̂(n + 1), the automaton updates its action probability vector p(n + 1) in the following way.

$$\begin{aligned} p_{j}(n + 1) & = \hat{p}_{j}(n + 1)\,K(n) \quad \text{for all } j,\ \alpha_{j} \in V(n) \\ p_{j}(n + 1) & = p_{j}(n) \quad \text{for all } j,\ \alpha_{j} \notin V(n) \end{aligned}$$
(4)

In the above equations, a is the reward parameter and b is the penalty parameter, applied for favorable and unfavorable responses respectively. Based on the values of a and b, three schemes can be considered: when a and b are equal, the algorithm is called LR-P; when b is much smaller than a, it is called LRεP; and when b equals zero, it is called LRI.
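To make the update rules concrete, the following is a minimal Python sketch of a learning automaton with a variable action set following Eqs. (1)-(4); the class name, method names, and parameter values are illustrative assumptions, not taken from the cited works.

```python
import random

class VariableActionAutomaton:
    """Minimal LR-P automaton with a variable action set (Eqs. 1-4) - a sketch."""

    def __init__(self, r, a=0.1, b=0.1):
        self.r = r                      # total number of actions
        self.a = a                      # reward parameter
        self.b = b                      # penalty parameter
        self.p = [1.0 / r] * r          # action probability vector p(n)

    def choose(self, active):
        """Select an action from the active subset V(n) using Eq. (1)."""
        k = sum(self.p[i] for i in active)          # K(n)
        scaled = [self.p[i] / k for i in active]    # p_hat_i(n) = p_i(n) / K(n)
        chosen = random.choices(active, weights=scaled, k=1)[0]
        return chosen, k

    def update(self, chosen, active, k, reward):
        """Apply Eq. (2) or (3) to the scaled vector, then rescale with Eq. (4)."""
        p_hat = {i: self.p[i] / k for i in active}
        m = len(active)
        # Note: the paper's Eq. (3) divides by r - 1 (total actions); using the
        # number of active actions here keeps the scaled vector summing to one,
        # and the two coincide when all actions are active.
        for j in active:
            if reward:                                 # Eq. (2)
                p_hat[j] = (p_hat[j] + self.a * (1 - p_hat[j])
                            if j == chosen else (1 - self.a) * p_hat[j])
            else:                                      # Eq. (3)
                p_hat[j] = ((1 - self.b) * p_hat[j]
                            if j == chosen
                            else self.b / (m - 1) + (1 - self.b) * p_hat[j])
        for j in active:                               # Eq. (4): rescale
            self.p[j] = p_hat[j] * k
        # probabilities of inactive actions are left unchanged
```

For example, `chosen, k = automaton.choose([0, 2, 3])` followed by `automaton.update(chosen, [0, 2, 3], k, reward=True)` performs one interaction with the environment over a three-action active subset.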

In the S-model, the LR-P learning algorithm updates the action probability vector of the automaton as follows [28]. In an automaton with r actions, if action αi is selected at the n-th iteration and the environment's response to it is βi(n), the probability vector is updated according to the following equation.

$$\begin{aligned} p_{i}(n + 1) & = p_{i}(n) + a\,(1 - \beta_{i}(n))\,(1 - p_{i}(n)) - a\,\beta_{i}(n)\,p_{i}(n) \\ p_{j}(n + 1) & = p_{j}(n) - a\,(1 - \beta_{i}(n))\,p_{j}(n) + a\,\beta_{i}(n)\left[\frac{1}{r - 1} - p_{j}(n)\right] \quad \forall j \neq i \end{aligned}$$
(5)
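A possible Python rendering of the S-model LR-P update in Eq. (5); the function name and the example values are illustrative, and β is taken as a value in [0, 1] with 0 the most favorable response, as defined above.

```python
def s_lrp_update(p, i, beta, a):
    """S-model LR-P update of the action probability vector (Eq. 5) - a sketch.

    p    : action probability vector (list of floats summing to 1)
    i    : index of the action selected at step n
    beta : environment response in [0, 1] (0 = most favorable)
    a    : learning rate
    """
    r = len(p)
    new_p = p[:]
    # selected action: pushed up in proportion to (1 - beta), down by beta
    new_p[i] = p[i] + a * (1 - beta) * (1 - p[i]) - a * beta * p[i]
    for j in range(r):
        if j != i:
            new_p[j] = (p[j]
                        - a * (1 - beta) * p[j]
                        + a * beta * (1.0 / (r - 1) - p[j]))
    return new_p

# e.g. action 2 of four actions received a favorable response (beta = 0.1)
p_next = s_lrp_update([0.25, 0.25, 0.25, 0.25], i=2, beta=0.1, a=0.05)
```

The update keeps the vector summing to one, so it can be applied repeatedly over the interaction loop.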

3 Statement of the Problem

Recently, gradient topology has been used in many applications, compared to the other adaptive methods mentioned [30-33] (Fig. 2). That is why it is important to pay attention to this issue. In a gradient topology, for any two peers p and q, if the utility of p is greater than the utility of q, i.e. U(p) ≥ U(q), then dist(p, p0) ≤ dist(q, p0), where dist(x, y) is defined as a distance measure, i.e. the shortest path between x and y, and p0 is the peer in the network with the highest utility. In this case, the distance between peers is defined based on utility [22, 23].
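As a small illustration of this property, the following Python check uses hypothetical utilities and hop distances (not data from the paper) and verifies that higher utility never implies a larger distance to p0.

```python
# Hypothetical example: utilities and hop distances to p0, the highest-utility peer.
utility = {"p0": 10, "p": 7, "q": 3}
dist_to_p0 = {"p0": 0, "p": 1, "q": 2}

def is_gradient(u, d):
    """True if U(x) >= U(y) always implies dist(x, p0) <= dist(y, p0)."""
    peers = list(u)
    return all(d[x] <= d[y]
               for x in peers for y in peers
               if u[x] >= u[y])

print(is_gradient(utility, dist_to_p0))  # True for this toy configuration
```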

The issue with the GT algorithm in gradient topology is that super peer selection is done only on the basis of utility; it is not aware of the delay between super peers and has no mechanism to take it into account when selecting new super peers. Moreover, the adaptive threshold for super peer selection has to be recalculated periodically. This calculation imposes overhead on the network, caused by gossiping the network characteristics and estimating the super peer selection threshold. In each period, every peer compares its utility with the estimated threshold to decide whether to become a super peer, which is very costly. In GT algorithms that select super peers with two thresholds, the high threshold is denoted tu and the low threshold is denoted tl. The distance between the two thresholds is denoted Δ, where Δ = tu − tl.

The adaptive threshold calculation algorithm ultimately increases or decreases the super peer selection threshold, a change that affects the distance between the two thresholds, i.e. Δ. As Δ increases, the error in selecting appropriate super peers also increases.

With fixed thresholds, there is no overhead for threshold calculation and estimation, but the network must then be able to learn adaptively which peers within the Δ region should become super peers, taking the delay criterion into account so that their delay is lower than that of the current super peers.
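A minimal sketch of role assignment with two fixed thresholds; the threshold values and function name are illustrative assumptions, and the Δ region is where the delay-aware selection of the next section operates.

```python
T_U, T_L = 0.8, 0.6          # fixed high and low thresholds (illustrative values)
DELTA = T_U - T_L            # width of the region between the two thresholds

def assign_role(utility, current_role):
    """Two-threshold role assignment in the spirit of GT [22, 23] - a sketch."""
    if utility >= T_U:
        return "super_peer"      # above the high threshold: always a super peer
    if utility <= T_L:
        return "normal_peer"     # below the low threshold: always a normal peer
    # utility lies inside the Delta region: keep the current role; in the
    # proposed algorithm, learning automata decide which of these peers
    # should be promoted, based on delay
    return current_role
```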

4 The Proposed Algorithm

Our focus is on a peer-to-peer network based on gradient topology with two thresholds, using learning automata to select super peers. In this paper, the learning automata use the LR-P scheme and the environment is of the S-model. First, the network topology and data structures are described; then the proposed algorithm is explained (Fig. 3).

1. Create a network based on gradient topology with two thresholds. In the first round, the thresholds are estimated adaptively based on [22, 23]; after the first round, instead of recalculating the adaptive thresholds, the initial adaptive thresholds remain fixed.

2. The utility of each peer is considered according to its capacity. For a peer p, the utility U(p) equals C(p), the number of clients to which it provides service.

3. It is assumed that the capacity of each peer is unique, i.e. U(p) ≠ U(q) [22, 23].

4. We consider the relationships between all neighbors to be symmetric, with each peer distinguishing between its own links and the links of its neighbors [22, 23].

5. All peers in the super peer community have a learning automaton with a variable action set [28] that operates with r actions {α1, α2, …, αr}, where r is the number of neighboring super peers located within the Δ distance; each action of the automaton corresponds to one of these neighbors.

6. pvector is the action selection probability vector of the learning automaton, where pi is the probability of selecting action αi, i.e. the probability of selecting the i-th neighbor as a super peer.

7. Each super peer keeps, in its neighborhood table, the number of attached super peers, the number of connected clients, and the probability vector of its learning automaton.

The steps of the proposed algorithm LAGT.2 are as follows (a combined code sketch is given after the list):

  • Step 1 Each super peer p, in each period, measures its delay to each of its neighbors and stores it in its neighborhood table.

  • Step 2 It then calculates its average end-to-end delay to its neighboring super peers, Averagep(delay).

  • Step 3 If Averagep(delay) is greater than the optimal end-to-end delay Average*p(delay) for super peer p, the following steps are carried out:

  • Step 4 Each super peer detects its r neighbors that are located within the Δ distance and are only one hop away.

  • Step 5 The automaton corresponding to each super peer initializes its action selection probability vector to \(p_{i} = \frac{1}{r}\).

  • Step 6 The learning automaton of each super peer p, using pvector, randomly selects one of the neighbors q and, as shown in the following formula, calculates the response of the environment, i.e. the β of the selected neighbor. Here dis is the end-to-end delay, the average round-trip time between the super peer and its selected neighbor.

    $$\beta = \frac{1}{\text{dis}}$$
    (6)
  • Step 7 The more the calculated β of the selected neighbor exceeds 1/Average*p(delay), i.e. the closer it is to 1, the greater the reward it receives and the more its selection probability increases.

  • Step 8 The automaton corresponding to the super peer again revises the probability of selecting each of its neighbors within Δ to become a super peer. This step is repeated until one of the neighbors has been chosen in θ consecutive iterations.

  • Step 9 The peer that has been chosen in the previous θ iterations is selected as a super peer and replaces the neighboring super peer that had the maximum delay in the neighborhood table of Step 1.

All clients of the previous super peer connect to the new super peer, and the new super peer connects to all super peers to which the previous super peer was connected.
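As referenced above, the following Python sketch ties Steps 1-9 together for a single super peer p in one period. All names (the peer object, delay_to, peers_in_delta, replace_super_peer) are illustrative assumptions, and the automaton is assumed to expose select()/learn() methods, for example a wrapper around the variable-action automaton sketched in Sect. 2.

```python
def lagt2_period(p, automaton, theta, avg_opt_delay):
    """One LAGT.2 period for super peer p (Steps 1-9) - a sketch.

    p.neighbors        : neighboring super peers of p
    p.delay_to(q)      : measured round-trip delay between p and q
    p.peers_in_delta() : the r candidate peers within the Delta distance, one hop away
    automaton          : learning automaton of p, one action per candidate
    theta              : number of consecutive selections required (Step 8)
    avg_opt_delay      : Average*_p(delay), the target end-to-end delay
    """
    # Steps 1-2: measure delays to neighbors and compute the average
    delays = {q: p.delay_to(q) for q in p.neighbors}
    avg_delay = sum(delays.values()) / len(delays)

    # Step 3: act only if the average delay is worse than the target
    if avg_delay <= avg_opt_delay:
        return None

    # Steps 4-8: let the automaton converge on one candidate within Delta
    candidates = p.peers_in_delta()
    streak, last = 0, None
    while streak < theta:
        i = automaton.select()                 # Step 6: pick a candidate index
        q = candidates[i]
        beta = 1.0 / p.delay_to(q)             # Eq. (6): environment response
        reward = beta > 1.0 / avg_opt_delay    # Step 7: lower delay => reward
        automaton.learn(i, reward)
        streak = streak + 1 if q is last else 1
        last = q

    # Step 9: promote the converged candidate; it replaces the neighboring
    # super peer with the maximum measured delay
    worst = max(delays, key=delays.get)
    p.replace_super_peer(old=worst, new=last)
    return last
```

After the replacement, the clients of the removed super peer attach to the new one and its super peer links are transferred, as described above.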

5 Simulation Results

In order to simulate the proposed super peer selection algorithm, the PeerSim [34] simulator is used. The proposed algorithm LAGT.2 is compared with the GT algorithm [22, 23]. Table 1 summarizes the simulation assumptions with their initial values.

Table 1 Simulation parameters

The simulation time is 30 rounds, each round equal to one second. Each test is performed several times and the mean is reported.

5.1 Experiment 1

The purpose of this test is to evaluate the proposed algorithm in terms of the end-to-end delay among super peers. Since the gradient topology is not sensitive to delay and super peers are selected only on the basis of utility, we examine the average end-to-end delay in the super peer community in each round for the two algorithms, LAGT-2 and GT.

According to the simulation results shown in Fig. 4, the GT algorithm shows no consistent trend of improvement in the average end-to-end delay of the super peer community. The proposed algorithm, being aware of delay and selecting appropriate super peers with respect to it, performs far better than the GT algorithm. The vertical axis is in seconds.

Fig. 4 The average end-to-end delay in the super peer community

5.2 Experiment 2

In this experiment, an optimization function of the average end-to-end delay in the super peer community, called Opt, is examined as the number of simulation rounds increases. It is assumed that the network has high churn. Opt is calculated from the following equation.

$$\mathrm{Opt}(t = \mathrm{current}) = \frac{\text{end-to-end delay}(t = 0) - \text{end-to-end delay}(t = \mathrm{current})}{\text{end-to-end delay}(t = 0)}$$
(7)
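A small helper for Eq. (7); the function name and the example delay values are illustrative.

```python
def opt(delay_at_t0, delay_now):
    """Relative improvement of the average end-to-end delay, Eq. (7)."""
    return (delay_at_t0 - delay_now) / delay_at_t0

# e.g. a drop from 120 ms at round 0 to 90 ms in the current round gives Opt = 0.25
print(opt(120.0, 90.0))
```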

The proposed LAGT.2 algorithm performs better than the GT algorithm. In LAGT.2, as the number of rounds increases, the optimization of the average end-to-end delay in the super peer community increases, whereas in GT no steady progression is visible, because super peer selection in that algorithm is based only on utility. The following diagram shows the average of the calculated results (Fig. 5).

Fig. 5 Opt function of the average end-to-end delay

Fig. 6 Super peer selection error with increasing Δ distance

Fig. 7 The super peers' capacity after removal of a percentage of super peers

Fig. 8 The number of super peers after removal of a percentage of super peers

5.3 Experiment 3

Two thresholds were used in order to reduce role changes of peers between normal peer and super peer, but the tests indicate that in the GT algorithm the super peer selection error rate increases as the Δ distance increases (Fig. 6). In the proposed LAGT-2 algorithm, however, the error rate is much lower than in the GT algorithm. The reason for the error reduction is that the super peers, using their learning automata, consider the delay of the peers within the Δ distance and, according to formula (6), a peer with a lower delay than the current super peer is more likely to be selected. The super peer selection error based on delay is obtained from the following formula:

$$\mathrm{Err}_{d} = \frac{\left| D_{t} - D_{t}^{*} \right|}{D_{t}^{*}}$$
(8)
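Equation (8) as a small helper, where D_t is the measured delay and D*_t the optimal delay; the values in the example are illustrative.

```python
def selection_error(d_t, d_t_star):
    """Delay-based super peer selection error, Eq. (8)."""
    return abs(d_t - d_t_star) / d_t_star

print(selection_error(110.0, 100.0))  # 0.1
```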

Notably, since the GT algorithm is not aware of delay, increasing or decreasing the Δ distance has no effect on its delay-based super peer selection error (Table 2).

Table 2 Delay parameters

5.4 Experiment 4

In this experiment, we study the capacity of the super peers after removing 10% of the super peers in round 10 (Fig. 7); apart from round 10, there is no churn. We examine whether the LAGT.2 and GT algorithms reach their maximum super peer capacity, and how many rounds each algorithm needs to reach the maximum number of super peers. In the GT algorithm, it takes about 14 rounds for the super peers to reach the maximum capacity. The LAGT-2 algorithm is aware of the delay but not of the removal of super peers: if the average end-to-end delay in the super peer community exceeds the appropriate value, the algorithm starts selecting super peers; otherwise it does nothing. However, because super peers are selected from the peers within the Δ distance, and because the network topology is based on utility, super peer selection in the proposed algorithm indirectly takes peer capacity into account as well; still, it cannot be clearly concluded that the proposed algorithm reaches the maximum super peer capacity.

5.5 Experiment 5

In this experiment, we study the number of super peers after removing 10% of the super peers in round 10 (Fig. 8). We assume that the network has up to 3500 super peers and that in round 10 this percentage of super peers is removed; apart from round 10, there is no churn. The threshold in the GT algorithm is set based on the appropriate threshold Q. Does the LAGT.2 algorithm reach the maximum number of super peers? In the GT-Q algorithm, after the removal of 10% of the super peers in round 10, it takes about 14 rounds for the super peers to return to their previous maximum number. The LAGT-2 algorithm, however, does not reach the maximum number of super peers, because a super peer decides to choose a new super peer only on account of end-to-end delay, and the newly selected super peer simply replaces the previous one.

6 Conclusions

In this paper, an adaptive super peer selection algorithm called LAGT-2, which utilizes learning automata, is proposed to manage the gradient topology. Unlike the GT algorithm in gradient topology, which is not aware of super peers' delay, LAGT-2 is the first algorithm that is aware of the delay among super peers. Simulation results show that the proposed algorithm performs better than GT in terms of the average end-to-end delay in the super peer community and in reducing the super peer selection error.