Abstract
Poker is a game of imperfect information (Rubin and Watson, Artif Intell, 175(5–6), [1]), in which each player holds cards hidden from the other players. In previous research (Billings et al., in American Association for Artificial Intelligence Press, [2]; Popyack, in Blackjack-playing agents in an advanced AI course, [3]), the Markov model has served as a powerful framework for designing a Poker agent. In this paper, we propose a new approach, named intuitive searching, to build an exploitative agent that plays against a Markov-model agent in the game of Blackjack. The correctness of our approach is proved by analyzing the predictive distribution of the winning rate. Our agent prevails over the Markov agent in a 10000-round experiment. The performance of the intuitive policy searching approach in general cases, together with its limitations, is also discussed.
References
Rubin J, Watson I (2011) Computer poker: a review. Artif Intell 175(5–6):958–987. https://doi.org/10.1016/j.artint.2010.12.005
Billings D, Castillo L, Schaeffer J, Szafron D (1999) Using probabilistic knowledge and simulation to play Poker. American Association for Artificial Intelligence Press, pp 697–703. http://poker.cs.ualberta.ca/publications/AAAI99.pdf
Popyack JL (2009) Blackjack-playing agents in an advanced AI course. In: ITiCSE ’09: proceedings of the 14th annual ACM SIGCSE conference on innovation and technology in computer science education, pp 208–212. https://doi.org/10.1145/1562877.1562944
Thorp EO (1966) Beat the dealer: a winning strategy for the game of twenty-one. Vintage Books. ISBN 0394703103
Bishop CM (2006) Pattern recognition and machine learning (Information science and statistics). Springer-Verlag New York, Inc., Secaucus, NJ
Qazaz CS, Williams CKI, Bishop CM (1997) An upper bound on the Bayesian error bars for generalized linear regression. In: Ellacott SW, Mason JC, Anderson IJ (eds) Mathematics of neural networks. Operations research/computer science interfaces series, vol 8. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-6099-9_51
Spiegelhalter DJ, Lauritzen SL (1990) Sequential updating of conditional probabilities on directed graphical structures. Networks 20(5):579–605. https://doi.org/10.1002/net.3230200507
Appendix 1
1.1 Missing Proofs
1.1.1 Proof of Lemma 2
For any \(0 \le R\le 2T\), we have
Case 1: \(0 \le R \le T\)
Case 2: \(T \le R \le 2T\)
1.1.2 Proof of Lemma 3
When \(0\le R\le 2T\), we have
1.1.3 Proof of Lemma 4
We have
Using \(R=2Tx\) to replace R, we have
Similarly,
and
Using \(R=2Tx\) to replace R, we then have
As a result,
1.1.4 Proof of Theorem 1
Part 1: \(\displaystyle \frac{1}{n}\bigg (\sum _{1\le i\le n} f(\phi (y_i),\theta )\bigg ) \xrightarrow {P} \displaystyle \frac{1}{n}\bigg (\sum _{1\le i\le n} f(\phi (x_{k_i}),\theta )\bigg )\)
Proof
Let P be the probability that y and its nearest neighbor belong to different classes. Then we have
Because the random events \(f(\phi (y_i),\theta _x) \ne f(\phi (x_{k_i}),\theta _x)\) and \(|f(\phi (y_i),\theta _x) - f(\phi (x_{k_i}),\theta _x)|=1\) are identical, the conditional expectation given the training data \(x_1,\ldots ,x_n\) is
and the conditional variance given the training data \(x_1,\ldots ,x_n\) is
Denote by \(\varGamma _{\theta _x}\) the decision boundary of the classification function f with parameter \(\theta _x\), and by \(\Vert y\varGamma _{\theta _x} \Vert \) the distance from y to the decision boundary. Then we have
Because the decision boundary of the classification function f depends only on the training data \(x_1,\dots ,x_n\), which are independent of y, we have
We remark that \(\Vert \varGamma _{\theta _x} \Vert \) is affected by the complexity of the model.
Part 2: \(\frac{1}{n}\bigg (\sum _{1\le i\le n} f(\phi (x_{k_i}),\theta _x)\bigg )\xrightarrow {P} \frac{1}{n}\bigg (\sum _{1\le i\le n} f(\phi (x_i),\theta _x)\bigg )\)
Proof
Let \(\xi _i\) be the frequency of \(x_i\) in the sequence \(x_{k_1},\dots ,x_{k_n}\). Then we have
Applying the law of large numbers, we have
which completes the proof.
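Part 2 can be illustrated numerically. The following is a minimal sketch under assumptions not stated in the paper: a one-dimensional feature space with \(\phi \) taken as the identity, training points and fresh samples drawn uniformly, and a hypothetical threshold classifier f. The average of f over the nearest training neighbors of the \(y_i\) then matches the average of f over the training points themselves up to sampling noise:

```python
import bisect
import random

random.seed(0)

def f(x, theta=0.5):
    """Toy classification function: 1 if x lies above the threshold theta."""
    return 1 if x > theta else 0

n = 20000
xs = sorted(random.random() for _ in range(n))   # training points x_1..x_n
ys = [random.random() for _ in range(n)]         # fresh samples y_1..y_n

def nearest(y, pts):
    """Nearest training point to y (pts must be sorted)."""
    j = bisect.bisect_left(pts, y)
    cands = [pts[k] for k in (j - 1, j) if 0 <= k < len(pts)]
    return min(cands, key=lambda p: abs(p - y))

# (1/n) sum_i f(phi(x_{k_i})): average of f over nearest neighbors of the y_i
mean_nn = sum(f(nearest(y, xs)) for y in ys) / n
# (1/n) sum_i f(phi(x_i)): average of f over the training points themselves
mean_tr = sum(f(x) for x in xs) / n

print(abs(mean_nn - mean_tr))  # close to 0 for large n
```

The two means disagree only through samples landing near cell boundaries and through the frequencies \(\xi _i\) deviating from their expectations, both of which vanish as n grows.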
1.1.5 Proof of Lemma 8
Proof
Let \(\mathbf {v}^{*}= {<}v_1^{*},\dots ,v_n^{*}{>}^{T}\) be the optimal solution. If \(v_1^{*} \ne 0\) and \(v_1^{*} \ne 1\), i.e., \(v_1^{*}\in (0,1)\), then at the point \(\mathbf {v}^{*}= {<}v_1^{*},\dots ,v_n^{*}{>}^T\) the partial derivative of the cost function is equal to zero, i.e.,
We can divide the terms of \(f_C(\mathbf {v})\) into three groups: the terms ending with \((1-v_1)\), the terms beginning with \(v_1\), and the terms not containing \(v_1\), i.e.,
Let \(k=|\{c \mid c=(1-v_1) \wedge c\in C\}|\). At the point \({<}v_1^{*},\dots ,v_n^{*}{>}^T\), we have the equation
Therefore,
Note that the optimal value of the cost function at the point \(\mathbf {v}^{*}= {<}v_1^{*},\dots ,v_n^{*}{>}^T\) is
Hence, \(\mathbf {v}^{'}= {<}0,v_2^{*},\ldots ,v_n^{*}{>}^{T}\) is another optimal solution. Repeating this operation, we eventually obtain an optimal solution \(\mathbf {v}^{'}\) in which each \(v_i\) is either 0 or 1.
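The vertex-rounding argument can be checked numerically, assuming (as the grouping by \(v_1\) and \(1-v_1\) suggests) that \(f_C(\mathbf {v})\) is a sum of product terms over literals \(v_i\) and \(1-v_i\), and is therefore linear in each coordinate. A sketch with a randomly generated cost of this hypothetical form:

```python
import itertools
import random

random.seed(1)
n = 4

# Hypothetical cost: a random weighted sum of product terms, each term a
# product of literals v_i or (1 - v_i) -- multilinear in v.
terms = []
for _ in range(8):
    idx = random.sample(range(n), random.randint(1, n))
    signs = [random.choice([True, False]) for _ in idx]
    terms.append((random.uniform(-1, 1), list(zip(idx, signs))))

def cost(v):
    total = 0.0
    for w, lits in terms:
        p = w
        for i, pos in lits:
            p *= v[i] if pos else (1 - v[i])
        total += p
    return total

# Minimum over the 0/1 vertices of the hypercube.
best_vertex = min(cost(v) for v in itertools.product([0, 1], repeat=n))
# No interior point of [0,1]^n does better than the best vertex.
best_interior = min(cost([random.random() for _ in range(n)])
                    for _ in range(20000))
print(best_vertex <= best_interior + 1e-9)
```

Because a multilinear function restricted to any single coordinate is affine, sliding that coordinate to 0 or 1 never increases the minimum, which is exactly the repeated operation used in the proof.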
1.1.6 Proof of Lemma 12
Lemma 12
When \(m\ge 1\), \(g(x)=\frac{\ln (1-x^{m})}{\ln (1-x)}\) is a monotonically increasing function on the interval \(x\in (0,1)\).
Proof
For any increment dx and the functions \(g_1(x)=\ln (1-x^{m})\) and \(g_2(x)=\ln (1-x)\), we have
Note that
Furthermore, we have \(\xi (0)=1\) and \(\xi (1)=0\), where \(\xi (x)=1+(m-1)x^{m}-mx^{m-1}\). Besides, we have
Therefore,
Let \(c=-\frac{1}{1-x}\). We have
Because
We can find an \(\epsilon >0\) such that \(\forall dx( 0 \le dx\le \epsilon \rightarrow d<0)\) and
where \(d=cdx+o(dx)\).
1.1.7 Proof of Lemma 10
We have
When \(x \in (0,1)\), we have \((1-x)^k>(1-x^m)\) iff \(k < \frac{\ln (1-x^m)}{\ln (1-x)} \le 1\). For any \(y \in (0,1)\),
Let \(k=\frac{\ln (1-y^m)}{\ln (1-y)}\). Notice that
Therefore,
According to Lemma 12, we further have
When \(y=n^{-\frac{1}{m}}\), we have \(\frac{1}{2kn+1}\in O(n^{-\frac{1}{m}})\). Because \(\lim _{y\rightarrow 0}\frac{1-\sqrt{1-y}}{y}=\frac{1}{2}\), we have \(1-\sqrt{1-y}\in O(y) = O(n^{-\frac{1}{m}})\). Therefore,
Similarly, we have
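The inequality used at the start of this proof, \((1-x)^k>(1-x^m)\) iff \(k < \frac{\ln (1-x^m)}{\ln (1-x)}\), can be sanity-checked numerically (taking logs flips the inequality because \(\ln (1-x)<0\)):

```python
import math

def ratio(x, m):
    # the threshold exponent ln(1 - x^m) / ln(1 - x)
    return math.log(1 - x**m) / math.log(1 - x)

m = 3
for x in (0.1, 0.5, 0.9):
    r = ratio(x, m)
    # test one exponent below and one above the threshold
    for k in (0.5 * r, 1.5 * r):
        lhs = (1 - x) ** k
        rhs = 1 - x ** m
        assert (lhs > rhs) == (k < r)
print("inequality direction matches k vs ln(1-x^m)/ln(1-x)")
```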
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Pan, Z., Xue, J., Ge, T. (2022). Intuitive Searching: An Approach to Search the Decision Policy of a Blackjack Agent. In: Yang, XS., Sherratt, S., Dey, N., Joshi, A. (eds) Proceedings of Sixth International Congress on Information and Communication Technology. Lecture Notes in Networks and Systems, vol 236. Springer, Singapore. https://doi.org/10.1007/978-981-16-2380-6_77
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2379-0
Online ISBN: 978-981-16-2380-6
eBook Packages: Intelligent Technologies and Robotics (R0)