
1 Introduction

A learning automaton (LA) can be seen as an adaptive decision-making unit that constantly interacts with a random environment and adjusts its choices in order to maximize the probability of being rewarded. The process of the LA interacting with the environment is shown in Fig. 1 [1]. At each instant t, the LA chooses an action α(t) to apply to the random environment and receives the environment's feedback β(t), which can be either a reward or a penalty. The automaton then updates its state probability vector according to the received feedback. Because of its simple algorithm, strong noise tolerance and strong optimization ability, the LA has received extensive attention and has been applied in many fields, such as random function optimization, QoS optimization and certificate authentication.

Fig. 1. LA interacts with the random environment

In the LA field, the most classical discretized pursuit algorithm with a deterministic estimator is the DPRI algorithm given by Oommen in [2]. The main idea of DPRI is to increase the state probability of the action with the maximum value in the running estimates when the environment rewards the current action, and to decrease the others; otherwise, the automaton changes nothing. Many other classical pursuit algorithms, such as DGPA [3] and SERI [4], also use estimators and discretization to improve the convergence speed of the automaton. In [5], an algorithm named last-position elimination-based LA (LELA), which works contrary to the classical pursuit algorithms, is proposed. Instead of greedily increasing the probability of the action with the best estimate, this algorithm reduces the probability of choosing the action with the currently worst estimate. Experiments show that LELA converges faster than DGPA.

However, the estimator used by LELA has some innate defects. One typical flaw is the cold-start and initialization problem: since the maximum likelihood estimator has no information at the beginning, each action has to interact with the environment a certain number of times before its estimate becomes meaningful, which increases the cost of gathering information in some complicated situations. Moreover, the update strategy of the LELA algorithm simply makes all active actions share the penalized state probability of the last-position action equally, without considering the difference between the optimal action and the other actions. Thus, in this paper, we propose a weight assignment LELA (WLELA) algorithm with the following three changes: 1. improved initialization of the parameters; 2. an improved estimator; 3. a modified probability vector update strategy.

In Sect. 2, a brief introduction to the LELA algorithm is given. In Sect. 3 we describe our algorithm WLELA in detail. In Sect. 4 we give the simulation results of the WLELA algorithm, compared with LELA and other classic algorithms such as DPRI and DGPA. Finally, Sect. 5 concludes the paper.

2 Related Work

An LA can be represented by a four-tuple \( \left\langle {\text{A, B, Q, T}} \right\rangle \) model. Its components are explained as follows.

A is the set of actions. B is the set of feedbacks from the random environment; when B = {0, 1}, β = 0 represents that the LA has been penalized and β = 1 means the LA has been rewarded. Q = <P, E>, where E represents the estimator; it contains all the historical information gathered by each action's interactions with the environment. The most commonly used estimator is the maximum likelihood estimator. P is the state probability vector governing the choice of an action α(t) at any instant t, and it satisfies ∑pi(t) = 1. T is the state transition function of the LA, which determines how the LA moves to the state at time t + 1 according to its output, input and state at time t.

The random environment can also be described by a triple <A, B, C> mathematical model, where A and B are defined in the same way as above, and C is defined as C = {cij = Pr{β(t) = βj | α(t) = αi}}.
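As a concrete illustration of these definitions, the following Python sketch models a stationary random environment and the state Q = <P, E> kept by an estimator-based LA. The class and attribute names (RandomEnvironment, AutomatonState) are our own illustrative choices and are not part of the cited work.

```python
import numpy as np


class RandomEnvironment:
    """Stationary random environment <A, B, C>: action i is rewarded
    (beta = 1) with probability c[i] and penalized (beta = 0) otherwise."""

    def __init__(self, reward_probs, seed=0):
        self.c = np.asarray(reward_probs, dtype=float)
        self.rng = np.random.default_rng(seed)

    def feedback(self, action):
        return 1 if self.rng.random() < self.c[action] else 0


class AutomatonState:
    """State Q = <P, E> of an estimator-based LA with r allowable actions."""

    def __init__(self, r):
        self.p = np.full(r, 1.0 / r)  # state probability vector P, sums to 1
        self.W = np.zeros(r)          # accumulated rewards per action (part of E)
        self.Z = np.zeros(r)          # number of times each action was selected
```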

The original LELA algorithm [5] uses the maximum likelihood estimator to record the historical information of all actions, according to the following formula

$$ d_{i} \left( t \right) = \frac{{W_{i} \left( t \right)}}{{Z_{i} \left( t \right)}} $$
(1)

where Zi(t) is the number of times the action αi was selected up to time instant t and Wi(t) is the sum of the environmental feedbacks it received up to time t. When an action is rewarded, the automaton selects the worst-performing action according to the estimator vector and decreases its state probability by a step ∆ = 1/rn, where r is the number of allowable actions and n is a resolution parameter. If some action's state probability is reduced to zero during this process, the action is removed from the set of selectable actions, while the remaining actions evenly share the probability mass removed at each decrease. The update scheme is described as follows (a minimal code sketch is given after the scheme):

  • If β(t) = 1 then

    Find m ∈ Nr such that

    $$ d_{m} \left( t \right) = \hbox{min} \left\{ { d_{i} \left( t \right) |p_{i} \left( t \right) \ne 0 } \right\}, i \in N_{r} $$
    $$ p_{m} \left( {t + 1} \right) = { \hbox{max} }\left\{ {p_{m} \left( t \right) - \Delta ,0} \right\} $$

    If \( p_{m} \left( {t + 1} \right) = 0 \) Then \( k\left( t \right) = k\left( t \right) - 1 \) Endif

    $$ p_{j} \left( {t + 1} \right) = \hbox{min} \left\{ {p_{j} \left( t \right) + \frac{{p_{m} \left( t \right) - p_{m} \left( {t + 1} \right)}}{{k\left( t \right)}},1} \right\}, \forall j \in N_{r}\,{\text{such that}}\,p_{j} \left( t \right) > 0. $$
  • Else

    $$ p_{i} \left( {t + 1} \right) = p_{i} \left( t \right) \forall i \in N_{r} $$
  • Endif.

k(t) denotes the number of currently active actions and is initialized to r.
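For illustration, the following Python sketch implements the LELA update scheme above for a single interaction. It assumes NumPy arrays p and d holding P(t) and the estimates d_i(t), and it follows the order of the pseudocode: k is decremented before the freed probability mass is shared among the actions that remain active.

```python
import numpy as np


def lela_update(p, d, k, delta, beta):
    """One LELA state-probability update.
    p: state probability vector P(t) (modified in place),
    d: maximum likelihood estimates d_i(t),
    k: number of active actions, delta: step size 1/(r*n),
    beta: feedback for the chosen action (1 = reward, 0 = penalty)."""
    if beta != 1:
        return p, k                      # on a penalty, P(t) is left unchanged
    active = np.flatnonzero(p > 0)
    m = active[np.argmin(d[active])]     # last-position action: worst active estimate
    new_pm = max(p[m] - delta, 0.0)
    freed = p[m] - new_pm                # probability mass taken from action m
    p[m] = new_pm
    if new_pm == 0.0:
        k -= 1                           # action m is eliminated
    remaining = p > 0                    # actions that stay in the game
    p[remaining] = np.minimum(p[remaining] + freed / k, 1.0)
    return p, k
```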

LELA has been proved to be ε-optimal in every stationary random environment.

3 Proposed Learning Automata

In order to overcome the shortcomings of the LELA algorithm, we propose the following improvements.

Firstly, we use the Bayesian estimator introduced in [6] to solve the cold-start and initialization problems. However, if the Bayesian estimator is used directly in the LA algorithm, the convergence speed is adversely affected. Therefore, in WLELA we replace it with the mean of the posterior distribution, which amounts to setting di(0) = 0.5 for all actions, thereby improving convergence efficiency while still overcoming the cold-start and initialization problems.

Secondly, in order to obtain more information, the WLELA algorithm uses the confidence interval estimator proposed in [7], which is

$$ d_{i} \left( t \right) = \left\{ {1 + \frac{{Z_{i} \left( t \right) - W_{i} \left( t \right)}}{{\left( {W_{i} \left( t \right) + 1} \right)F_{{2\left( {W_{i} \left( t \right) + 1} \right),2\left( {Z_{i} \left( t \right) - W_{i} \left( t \right)} \right),0.005}} }}} \right\}^{ - 1} ,\forall i \in N_{r} $$
(2)

where \( F_{{2\left( {W_{i} \left( t \right) + 1} \right),2\left( {Z_{i} \left( t \right) - W_{i} \left( t \right)} \right),0.005}} \) is the 0.005 right-tail critical value of the F distribution with \( 2\left( {W_{i} \left( t \right) + 1} \right) \) and \( 2\left( {Z_{i} \left( t \right) - W_{i} \left( t \right)} \right) \) degrees of freedom.
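For concreteness, this estimate can be computed with a standard statistics library. The sketch below uses SciPy's F-distribution inverse survival function; the library choice and the helper name confidence_interval_estimate are our own and are not prescribed by [7].

```python
from scipy.stats import f as f_dist


def confidence_interval_estimate(W, Z, alpha=0.005):
    """Eq. (2): confidence-interval estimate d_i(t) for an action that
    received W = W_i(t) rewards in Z = Z_i(t) selections."""
    failures = Z - W
    if failures == 0:
        return 1.0  # no penalties observed; the estimate saturates at 1
    # 0.005 right-tail critical value of F with 2(W+1) and 2(Z-W) degrees of freedom
    f_crit = f_dist.isf(alpha, 2 * (W + 1), 2 * failures)
    return 1.0 / (1.0 + failures / ((W + 1) * f_crit))


# Example: a hypothetical action with W = 6 rewards out of Z = 10 selections
print(confidence_interval_estimate(6, 10))
```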

Last, since the components of the state probability vector always sum to 1, the value of each component can be regarded as the weight of the corresponding action in the vector set. WLELA therefore increases the probabilities of the active actions according to these weights. In this way, similarly to the idea of pursuing the optimal action in the classical pursuit algorithms, the probability of the optimal action gets more attention during the update, so that a larger value is added to the optimal action's state probability at each step. Assigning the added value by weight is more in line with the learning automaton's goal of selecting the optimal action.

A detailed description of WLELA is as follows.

Algorithm WLELA

Initialize \( p_{i} \left( 0 \right) = \frac{1}{r};W_{i} \left( 0 \right) = 1,Z_{i} \left( 0 \right) = 2,\forall i \in N_{r} \)

Initialize \( d_{i} \left( 0 \right) = \left\{ {1 + \frac{{Z_{i} \left( 0 \right) - W_{i} \left( 0 \right)}}{{\left( {W_{i} \left( 0 \right) + 1} \right)F_{{2\left( {W_{i} \left( 0 \right) + 1} \right),2\left( {Z_{i} \left( 0 \right) - W_{i} \left( 0 \right)} \right),0.005}} }}} \right\}^{ - 1} ,\forall i \in N_{r} \)

Step 1: At time t, pick α(t) = αi according to the state probability vector P(t);

Step 2: Receive the feedback βi(t) ∈ {0, 1}. Update the estimate values

$$ W_{i} \left( t \right) = W_{i} \left( {t - 1} \right) + \beta_{i} \left( t \right), $$
$$ Z_{i} \left( t \right) = Z_{i} \left( {t - 1} \right) + 1 $$
$$ d_{i} \left( t \right) = \left\{ {1 + \frac{{Z_{i} \left( t \right) - W_{i} \left( t \right)}}{{\left( {W_{i} \left( t \right) + 1} \right)F_{{2\left( {W_{i} \left( t \right) + 1} \right),2\left( {Z_{i} \left( t \right) - W_{i} \left( t \right)} \right),0.005}} }}} \right\}^{ - 1} $$

Step 3: If βi(t) = 1 Then

Find m ∈ Nr such that

$$ d_{m} \left( t \right) = \hbox{min} \left\{ { d_{i} \left( t \right) |p_{i} \left( t \right) \ne 0 } \right\}, i \in N_{r} $$
$$ p_{m} \left( {t + 1} \right) = { \hbox{max} }\left\{ {p_{m} \left( t \right) - \Delta ,0} \right\} $$

If \( p_{m} \left( {t + 1} \right) = 0 \) Then \( k\left( t \right) = k\left( t \right) - 1 \) Endif

$$ p_{j} \left( {t + 1} \right) = \hbox{min} \left\{ {p_{j} \left( t \right) + \left( {p_{m} \left( t \right) - p_{m} \left( {t + 1} \right)} \right) \times \left( {p_{j} \left( t \right) + \frac{{p_{m} \left( t \right) - p_{m} \left( {t + 1} \right)}}{{k\left( t \right)}}} \right),1} \right\}, \forall j \in N_{r}\,{\text{such that}}\,p_{j} \left( t \right) > 0 $$

Endif

Step 4: If βi(t) = 0 Then

$$ p_{i} \left( {t + 1} \right) = p_{i} \left( t \right) \forall i \in N_{r} $$

Goto Step 1.

Endif

Step 5: If \( \hbox{max} \left\{ {{\text{P}}\left( {\text{t}} \right)} \right\} = 1, \)

Then CONVERGE to the action whose p = \( \hbox{max} \left\{ {{\text{P}}\left( {\text{t}} \right)} \right\} \).

Else

Goto Step 1.

Endif

END

The parameter k(t) has the same meaning as in the LELA algorithm.
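To make the steps above concrete, the following Python sketch implements one WLELA interaction, combining the estimator of Eq. (2) with the weighted probability update of Step 3 as written above. It is a minimal sketch: wlela_step is our own name, and env is assumed to expose a feedback(action) method like the RandomEnvironment sketch in Sect. 2.

```python
import numpy as np
from scipy.stats import f as f_dist


def wlela_step(p, W, Z, k, delta, env, rng):
    """One WLELA interaction (Steps 1-4): select an action, query the
    environment, update the estimator and the state probability vector."""
    r = len(p)
    i = rng.choice(r, p=p / p.sum())             # Step 1: sample an action from P(t)
    beta = env.feedback(i)                       # Step 2: environment feedback in {0, 1}
    W[i] += beta
    Z[i] += 1
    # Confidence-interval estimates d_i(t), Eq. (2); with W(0) = 1, Z(0) = 2
    # we always have Z - W >= 1, so the F quantile is well defined.
    d = 1.0 / (1.0 + (Z - W) / ((W + 1) * f_dist.isf(0.005, 2 * (W + 1), 2 * (Z - W))))
    if beta == 1:                                # Step 3: reward -> update P(t)
        active = np.flatnonzero(p > 0)
        m = active[np.argmin(d[active])]         # worst-estimate active action
        new_pm = max(p[m] - delta, 0.0)
        freed = p[m] - new_pm
        p[m] = new_pm
        if new_pm == 0.0:
            k -= 1                               # eliminate action m
        j = p > 0                                # actions that remain active
        # Weighted redistribution of the freed mass, following Step 3 as written
        p[j] = np.minimum(p[j] + freed * (p[j] + freed / k), 1.0)
    return p, W, Z, k                            # Step 4: on a penalty P(t) is unchanged
```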

4 Simulation Results

In this section we compare the relative performance of the proposed WLELA with LELA and the classical pursuit algorithms DPRI and DGPA, presenting their accuracy and convergence speed. The random environments used are the commonly adopted benchmark environments E1-E4 with 10 allowable actions, as shown in Table 1.

Table 1. Benchmark environments

In the LA simulation experiments, if the state probability of a certain action exceeds a set threshold T (0 < T ≤ 1), the algorithm is considered to have converged; if the converged action has the highest reward probability in the environment, the learning automaton is considered to have converged correctly.

All algorithms in the experiment are simulated with their best parameters, defined as the values that yield the fastest convergence while guaranteeing that the automaton converges to the optimal action in a sequence of NE experiments. Specifically, we use the same threshold T and NE as in [2, 3, 5], that is, T = 0.999 and NE = 750. After tuning to the best parameters, we carried out 250,000 experiments to evaluate the average convergence speed and accuracy.
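The evaluation loop can be sketched as follows. The reward probabilities shown are a hypothetical stand-in for a benchmark environment (the actual E1-E4 values are those of Table 1), and run_experiment reuses the wlela_step and RandomEnvironment sketches from the previous sections.

```python
import numpy as np

# Hypothetical 10-action reward probabilities; the real E1-E4 values are in Table 1.
example_c = [0.65, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10]


def run_experiment(c, n, T=0.999, seed=0, max_iters=200_000):
    """Run WLELA once in environment c with resolution n;
    return (converged_correctly, iterations_used)."""
    rng = np.random.default_rng(seed)
    env = RandomEnvironment(c, seed=seed + 1)
    r = len(c)
    p = np.full(r, 1.0 / r)
    W, Z = np.ones(r), np.full(r, 2.0)    # WLELA initialization W_i(0) = 1, Z_i(0) = 2
    k, delta = r, 1.0 / (r * n)
    for t in range(1, max_iters + 1):
        p, W, Z, k = wlela_step(p, W, Z, k, delta, env, rng)
        if p.max() >= T:                  # convergence criterion of Sect. 4
            return int(np.argmax(p) == int(np.argmax(c))), t
    return 0, max_iters
```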

Accuracy is an indicator of the performance of an automaton; it is defined as the probability that the learning automaton converges to the optimal action in a given environment. As can be seen from Table 2, in which “Res” denotes the best resolution parameter, all algorithms converge with high accuracy, while WLELA achieves higher accuracy than the other algorithms, although the differences are small.

Table 2. Accuracy (number of correct convergences/number of experiments)
A. Average convergence times

Convergence speed is one of the most critical performance indicators of learning automata. The convergence speed comparison is shown in Table 3, where “Ite” denotes the average number of iterations needed to converge.

From Table 3, we can see that the WLELA algorithm outperforms the other algorithms in terms of convergence speed. Compared with LELA, the convergence-speed improvement in the four environments is {6.93, 19.76, 3.13, 12.49}%. Compared with the traditional DGPA and DPRI algorithms, the improvement is {27.91, 40.43, 18.01, 28.28}% and {51.45, 72.24, 21.65, 56.02}%, respectively. Thus WLELA converges faster than the other three algorithms, and even in E2, the most complex of the environments, WLELA still performs best.

Table 3. Convergence speed

5 Conclusion

This paper proposes an improved algorithm, WLELA. Bayesian-style initialization eliminates the cold-start problem, the confidence interval estimator extracts more information from each interaction, and the weight allocation strategy realizes the classical LA idea of pursuing the best action. These three improvements allow the WLELA algorithm to achieve high accuracy and fast convergence in the simulation experiments; the results show that WLELA not only has the highest accuracy but also the fastest convergence speed, and it still performs very well in the most complex environment. In future work, we will consider using a stochastic estimator instead of the deterministic estimator in WLELA, and the WLELA algorithm can be applied in the many applications that require learning automata.