1 Introduction

It is widely acknowledged that game theory is a classic research domain [36]. Depending on the information available within a game, most games can be roughly divided into two categories: complete information games and incomplete information games. In a complete information game, every participant shares the same available information (e.g., features, strategies, etc.); typical examples include Go and chess. In an incomplete information game, the information available to each individual participant is not equivalent. For example, bridge (a popular card game) is a classic incomplete information game: the information one participant possesses is acquired from her/his own cards, so the amount of available information varies with each participant's hand. Typical incomplete information games include, but are not limited to, poker games, Shikoku chess, Mahjong, StarCraft, etc. It is worth pointing out that competition strategies for playing incomplete information games keep receiving much research attention nowadays, since many decision-making problems in daily life actually arise as incomplete information games. For instance, business negotiations in many economic scenarios are in fact incomplete information games, as the information possessed by each side of the negotiation cannot be truly symmetric [16]. Given this importance, it is easy to perceive that competition strategies for playing incomplete information games are worthy of comprehensive and thorough investigation.

In recent years, it has been noticed that the development of competition strategies for playing either complete or incomplete information games often tracks the progress of machine learning techniques. For instance, in the game of Go, the AlphaGo series (i.e., AlphaGo, AlphaGo Zero, AlphaZero) [29,30,31] proposed by Google DeepMind has become well known in recent years, and its main technical foundation is sophisticated deep learning. In the game of Texas Hold'em, the DeepStack system was introduced by the University of Alberta; it employs the counterfactual regret minimization (CFR) algorithm as well as a multi-layer deep neural network (DNN) [3]. Also, another notable work named Libratus was recently proposed by Carnegie Mellon University to handle the same Texas Hold'em game [4, 5]. For computing Nash equilibria [28], popular machine learning techniques, such as supervised learning and reinforcement learning, are both adopted to enable self-play convergence toward the Nash equilibrium [15]. In the game of Japanese Mahjong, researchers from the University of Tokyo developed a competition system incorporating specific Japanese Mahjong rules, and it was reported that good performance was obtained on their Phoenix platform [26].

In this study, the famous Chinese Mahjong, a typical incomplete information game, is the focus, and a novel competition strategy built upon new deep residual networks is proposed for the first time. Generally speaking, the deep residual network is a popular deep discriminant model and is still considered one of the state-of-the-art deep discriminant models today (the original model received the best paper award at CVPR 2016) [13]. The deep residual network remains popular because it properly addresses the notorious degradation problem in deep learning, whereby the accuracy of a deep network saturates and then degrades rapidly as the model depth increases. In this study, the merit of the original deep residual network is retained in the newly proposed deep residual network-based competition strategy, and it is worth mentioning that this study is also the first attempt to solve the problem of incomplete information games based on a deep residual network.

The organization of this paper is as follows. In Section 2, a comprehensive review of recent developments in deep learning techniques is first given in Section 2.1, followed by a thorough review of recent developments in competition strategies for playing incomplete information games in Section 2.2. In Section 3, details of the newly introduced deep residual network-based competition strategy for playing the Chinese Mahjong game are described. In Section 4, comprehensive experiments are conducted and the superiority of the newly proposed competition strategy is revealed through statistical comparisons with other popular competition strategies. In Section 5, the conclusion of this study is drawn and future directions are suggested.

2 Related works

Since a novel competition strategy motivated by up-to-date deep learning techniques is proposed for playing incomplete information games in this study, two reviews are provided in this section. The first review emphasizes recent developments in deep learning techniques and is described in Section 2.1. The second review focuses on recent developments in competition strategies for playing incomplete information games and is presented in Section 2.2.

2.1 Recent developments in deep learning techniques

It is widely acknowledged that deep learning techniques represent the current trend of machine learning, and the early paper published by Hinton et al. can be viewed as the modern symbol that revived large-scale neural network research, i.e., deep learning [17]. Nowadays, deep learning models are characterized by both deeper layers and sophisticated model structures. Specifically, in 2012, a parallel multi-channel convolutional neural network named AlexNet was proposed [24]. This network has only 8 layers and is rather shallow compared with the deep learning models widely utilized today. In 2014, another well-known deep learning model, VGG, was introduced. This model has a deeper structure, reaching 16 or 19 layers depending on the version [32]. In the same year, Google proposed GoogLeNet (i.e., Inception V1 to V3 [19, 33, 34]), whose structure is more sophisticated than its deep learning predecessors. Inception V1 utilizes a large number of sparsely connected sub-networks while still achieving high computational performance on dense matrices. It also uses convolution kernels of three different sizes, 1 × 1, 3 × 3, and 5 × 5, which makes the fusion of different initial features possible. In Inception V2, the famous batch normalization (BN) technique was proposed, and the network connections were modified to avoid excessive parameters and the information loss caused by large convolution kernels [19]. Also, n × 1 and 1 × n convolutions were used in this version instead of the original ones. In Inception V3, the initial convolutional layer was further replaced with small convolution kernels, and its energy equation was also updated. The depth of the original Inception model is 22 layers, which is significantly deeper than AlexNet and VGG.
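As a brief illustration of this multi-kernel fusion idea, the following is a minimal Keras sketch of an Inception-V1-style block; the branch widths here are illustrative, not the published GoogLeNet configuration.

```python
# A minimal sketch of Inception-style multi-kernel fusion (Keras functional
# API); the branch widths are illustrative, not the published ones.
from tensorflow.keras import layers, Input, Model

def naive_inception_block(x, filters=64):
    # Three parallel branches with different receptive fields.
    b1 = layers.Conv2D(filters, (1, 1), padding='same', activation='relu')(x)
    b3 = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    b5 = layers.Conv2D(filters, (5, 5), padding='same', activation='relu')(x)
    # Channel-wise concatenation fuses features from all kernel sizes.
    return layers.Concatenate(axis=-1)([b1, b3, b5])

inputs = Input(shape=(32, 32, 192))
outputs = naive_inception_block(inputs)
model = Model(inputs, outputs)  # output shape: (None, 32, 32, 192)
```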

When the deep residual network (ResNet) was initially proposed in 2015 [13], it soon became one of the dominant deep learning models, and it still influences the design of many deep discriminant models to date [10, 20, 40, 42]. The essence of ResNet is illustrated in Fig. 1: identity mappings are employed as shortcut connections to effectively avoid the vanishing gradient problem caused by excessively enlarging the number of layers, thereby boosting generalization capability. As a result, ResNet can be as deep as 1001 layers [14] while its generalization performance is still guaranteed. After ResNet was introduced, a great number of ResNet variants, including DenseNet [18], ResNeXt [41], and the Dual Path Network [6], have appeared in recent years. DenseNet continues the ResNet idea of creating short paths among different layers [18]. ResNeXt introduces a new concept named cardinality, and it incorporates the stacked structure of VGG as well as Inception's split-transform-merge strategy in its construction [41]. The Dual Path Network combines ResNeXt and DenseNet within one single network, and its generalization capability has also been verified [6]. In addition, an increasing number of studies and applications of the RNN (i.e., Recurrent Neural Network) have appeared. There are many variants of the RNN model, including the classic LSTM (i.e., Long Short-Term Memory) structure [11], the GRU (i.e., Gated Recurrent Unit) [7], and the recent research hotspot, the attention mechanism [1]. These typical RNN models have also been combined with convolutional neural networks and even ResNet [38], making remarkable progress in image recognition, image semantic analysis, and other areas [37,38,39].

Fig. 1

An illustration of the identity mapping in the ResNet model [13]
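For concreteness, the following minimal Keras sketch shows the identity-shortcut pattern of Fig. 1: the block learns the residual F(x) and outputs H(x) = F(x) + x. The layer widths are illustrative, and the input is assumed to already have the matching number of channels.

```python
# A minimal sketch of the identity-shortcut idea in Fig. 1 (Keras
# functional API); widths are illustrative.
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                        # identity mapping (assumes the
                                        # input already has `filters` channels)
    y = layers.Conv2D(filters, (3, 3), padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, (3, 3), padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])     # H(x) = F(x) + x
    return layers.ReLU()(y)
```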

In this study, the idea of ResNet is also incorporated into the newly introduced competition strategy, and a new ResNet-based deep discriminant learning model is proposed for the first time to handle the problem of incomplete information games. As the above review shows, the new model follows the current trend of building deep discriminant learning models. It is also necessary to mention that deep generative models, such as the variational auto-encoder (VAE) [21], the generative adversarial network (GAN) [12], and Glow [23], also receive much attention in contemporary deep learning studies. However, since this study is based on deep discriminant learning, deep generative learning models are not reviewed in this subsection.

2.2 Recent developments in competition strategies of playing incomplete information games

As described in Section 1, developments in competition strategies for playing complete / incomplete information games track the recent advances in machine learning techniques. In the era of shallow learning, considerable research effort had to be spent constructing semantic feature spaces for each individual complete / incomplete information game, and the performance of competition strategies could be heavily influenced by those semantic features. For instance, in [26], researchers at the University of Tokyo spent great effort incorporating prior human knowledge into the construction of semantic features for the Japanese Mahjong game. In the era of deep learning, such burdensome hand-crafted semantic features are largely replaced by latent semantic features automatically learned by deep learning models. For example, in the famous AlphaGo, a classic convolutional neural network (CNN) is adopted to construct a fast rollout integrated strategy network (named the "policy net") via supervised learning. Meanwhile, another self-play component named the "value evaluation net" is built up through reinforcement learning methods. Rules of the Go game are implemented via the classic Monte-Carlo idea, and a conventional Monte Carlo tree search is employed for the optimization and deduction of the whole Go system. For its successors, AlphaGo Zero and AlphaZero, ResNets are incorporated to replace the CNNs, and the capability of the whole system is improved via self-play. It can be perceived from the above that the advance from AlphaGo to AlphaGo Zero / AlphaZero also complies well with the trend in deep discriminant learning models.

Among recent studies on competition strategies for playing incomplete information games, Texas Hold'em and Mahjong have become representative subjects of investigation. For the Texas Hold'em game, in [3], researchers incorporated CNNs and a virtual self-play strategy to fulfill the supervised learning of a Texas Hold'em deep learning model. The results show that the learned system is capable of defeating three top ACPC (i.e., Annual Computer Poker Competition) computer poker programs in limit Texas Hold'em [3]. In [27], it is reported that the DeepStack system utilizing the counterfactual regret minimization algorithm became the first Texas Hold'em artificial intelligence system to defeat human professionals. In [5], the Libratus system is introduced, in which a safe subgame-solving algorithm as well as an improved counterfactual regret minimization algorithm are incorporated. For the Mahjong game, recent studies show that only the Japanese Mahjong game has been considered as an incomplete information game, with shallow learning techniques (i.e., linear regression and logistic regression methods) adopted to realize its competition strategy [26]. Also, in multi-player non-cooperative games like StarCraft II or Defense of the Ancients (Dota), many remarkable works have been done [2, 9, 35, 43], but these games remain a hard problem.

In this study, a new unbalanced ResNet-based deep discriminant learning model is proposed for the first time to handle the problem of incomplete information games. The contributions of this study can be summarized as follows. First, it is the first attempt to handle the Chinese Mahjong game as an incomplete information game problem. Second, it is also the first attempt to tackle the Mahjong game from the deep learning perspective. Third, the ResNet-based deep discriminant learning model is novel in its construction, and its superiority is revealed via comprehensive experiments. Details of the new model are introduced in Section 3.

3 Methodology

In this section, details of the unbalanced ResNet-based deep discriminant learning model for handling the Chinese Mahjong game are elaborated. Our approach differs significantly from deep learning studies that utilize raw data as model input (e.g., in computer vision, images themselves are often fed directly into deep learning models, and the semantic gap between raw images at the input and their semantic understanding at the output is expected to be bridged by the generalization capability of the models). Here, low-level semantic features inspired by prior knowledge closely related to the rules and status of the Chinese Mahjong game are selected as the input of the unbalanced ResNet-based deep discriminant learning model. The reason is that, in every decision situation in the Chinese Mahjong game, it is necessary to connect the decision-making situation to a decision-making mapping. Specifically, the significant information related to the Mahjong game, including the information on the board, the behavior of the opponents, our own hand, as well as the existing rules and winning methods, is extracted and semantically segmented to generate pseudo images. It is assumed that, with the help of this prior knowledge, the generalization capability of the new deep learning model can be improved by learning the probability distribution related to each decision, thereby solving the Chinese Mahjong game. Therefore, the low-level semantic features inspired by prior knowledge and the new unbalanced ResNet-based deep discriminant learning model are emphasized in this section; they are elaborated in Sections 3.1 and 3.2, respectively.

Figure 2 shows the main flowchart of the new unbalanced ResNet-based deep discriminant learning model for handling the Chinese Mahjong game. The semantic features described in Section 3.1 are employed as the input of the whole model. The new model introduced in Section 3.2 is made up of a series of "GoBlock" units, a novel deep learning structure proposed in this study. Details are described in the following.

Fig. 2

The main flowchart of the unbalanced ResNet-based deep discriminant learning model for handling the Chinese Mahjong game

3.1 Low-level semantic features based on compressed prior knowledge

3.1.1 Basic rules of the Chinese Mahjong game

The Chinese Mahjong game is a table game in which four players start with several tiles (the analogue of cards in Poker) and compete to achieve the highest score. Generally speaking, one ordinary Chinese Mahjong game consists of four or eight rounds. In each round, each player gets 13 tiles as the "initial hand", and one of the four players is determined to be the dealer, who holds one more tile and plays first. At each turn within a round, a player draws a tile from the wall (i.e., a set of invisible tiles arranged randomly at the start). After that, the player should either discard a tile or apply one of the actions Chow, Pong, and Kong. Play continues until one player declares a win. A player is eligible to win only with a "winning hand" consisting of 14 tiles in a particular combination. There are several ways to win. For example, when a player picks up a winning tile from the wall, it is named winning from the wall; if a player wins when another player discards the winning tile, it is called winning by a discard. It is necessary to mention that, in the Chinese Mahjong game considered in this study, winning by a discard is not allowed, which is significantly different from the Japanese Mahjong game.

More specifically, in the Chinese Mahjong game, there are 3 suits of "number tiles", namely characters (also simply called numbers), balls, and sticks, and each suit has 9 ranks (i.e., from 1 to 9). Besides these 3 suits of number tiles, the other kind of tiles comprises the Winds and Dragons: East Wind, South Wind, West Wind, North Wind, Red Dragon, Green Dragon, and White Dragon. During each round, a player can steal the discarded tile of another player, and the steal action takes three forms. The first is Chow, which applies when one player discards exactly the tile that a second player needs to complete a 3-tile sequence (e.g., the second player holds ball tiles 1 and 3, the first player discards ball tile 2, and the second player may pick it up). The second is Pong, by which a player picks up a tile to complete a triplet of identical tiles (e.g., three ball tiles 1). The third is Kong, by which a player picks up a tile to complete four identical tiles. Furthermore, when a player declares Chow, Pong, or Kong, the combined tiles must be placed in front of the player's hand; such a combination is called a suit, and those tiles cannot be used again. Regarding winning, when a player needs only one more tile to win, the player is said to be waiting. In the Chinese Mahjong game, there are four types of winning combinations: the common type, the seven pairs type, the thirteen different type, and the nine-one type. Detailed explanations of the 3 "steal" actions and 4 winning combinations are elaborated in Table 1. It is important to note that the complexity of the four-player Mahjong game exceeds 3.4 × 10²⁸² decision points.

Table 1 The 3 "steal" actions and 4 main winning combinations of tiles in the Chinese Mahjong game
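For concreteness, the 34 distinct tile kinds described above (3 suits of 9 ranks plus 7 honor tiles) can be indexed as in the following sketch; this encoding is hypothetical and is not the paper's actual data format.

```python
# A hypothetical indexing of the 34 distinct Chinese Mahjong tile kinds:
# 3 numbered suits of 9 ranks each, plus 4 winds and 3 dragons.
SUITS = ['character', 'ball', 'stick']
HONORS = ['east', 'south', 'west', 'north', 'red', 'green', 'white']

TILE_INDEX = {}
for s, suit in enumerate(SUITS):
    for rank in range(1, 10):
        TILE_INDEX[f'{suit}-{rank}'] = s * 9 + (rank - 1)  # indices 0..26
for h, honor in enumerate(HONORS):
    TILE_INDEX[honor] = 27 + h                             # indices 27..33

assert len(TILE_INDEX) == 34  # 27 number tiles + 7 honor tiles
```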

3.1.2 Compressed low-level semantic features of the Chinese Mahjong game

The low-level semantic features that are later fed into the unbalanced ResNet-based deep discriminant learning model as its input are mainly inspired by the basic rules of the Chinese Mahjong game introduced above. Before constructing these features, it is necessary to understand the main challenge of the Chinese Mahjong game: the hands of other players are invisible, so their tactics are totally hidden, and a player can only assess the current situation by observing historical events in the game. Because of the unpredictable randomness of each Chinese Mahjong game, making correct decisions becomes a challenging issue for players.

In order to tackle the above challenge, low-level semantic features are constructed based on the basic rules and valuable prior knowledge of the Chinese Mahjong game. Since the basic rules have already been elaborated, the prior knowledge is emphasized here. In this study, the prior knowledge is collected from the player's perspective, and it consists of 1) the hand information, 2) the field information of tiles discarded at each step of the game, and 3) the action information of each player at each step of the game. To be specific, the hand information comes from the initial tiles obtained by the player as well as the tiles that have changed during the game. As shown in Table 2, the hand information includes tiles of numbers, balls, sticks, winds, etc. The field information, on the other hand, describes the type and number of tiles that have been discarded. This kind of information results from the actions that all players have performed, including Chow, Pong, and Kong, which are listed in Table 1. The action information, however, is more sophisticated to represent. Generally speaking, in a Chinese Mahjong game, most winning combinations require a variety of suits; in other words, operations such as Chow, Pong, and Kong can speed up the victory. It follows that constructing tiles that can quickly form suits is highly important. To make up a suit, the essence lies in keeping neighboring tiles, or non-neighboring tiles with equal intervals. For example, Number tiles 1 and 3 are more valuable than Number tiles 1 and 7. Besides the above three kinds of information incorporated as prior knowledge, other semantic features are also added. One is the waiting number. Generally speaking, the waiting number is the number of tiles that a player still needs to achieve a win, and it is mathematically defined in (1).

$$ N_{waiting} = N_{MAX} - N_{current} $$
(1)

where Nwaiting represents the waiting number; NMAX denotes the maximum waiting number of the chosen winning type (e.g., the max waiting number of the common type is 13, and that of the seven pairs type is 7); and Ncurrent denotes the number of tiles the player already holds toward that winning type. Table 3 elaborates all the action information used to construct semantic features in this study.
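As a worked example of (1), the following short sketch assumes Ncurrent counts the tiles already in place toward the chosen winning type; the values are illustrative.

```python
# A worked example of Eq. (1): the waiting number is the gap between the
# maximum tile count of the target winning type and the tiles already held.
N_MAX = {'common': 13, 'seven_pairs': 7}

def waiting_number(winning_type, n_current):
    return N_MAX[winning_type] - n_current

# A player chasing Seven Pairs who already holds 5 completed pairs:
print(waiting_number('seven_pairs', 5))  # -> 2 tiles still needed
```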

Table 2 Semantic features of the hand information in the Chinese Mahjong game
Table 3 Semantic features of the action information in the Chinese Mahjong game

3.2 The unbalanced ResNet-based deep discriminant learning model

As shown in Fig. 2, the unbalanced ResNet-based deep discriminant learning model is made up of a series of "GoBlock" units, a new deep learning structure introduced in this part. Inside each individual GoBlock, there are three "Inception+" sub-structures. Details are described in the following.

It is widely acknowledged that, in ResNet, the conventional output H(x) of a deep learning model is changed into H(x) = F(x) + x (illustrated in Fig. 1), which ensures that the gradients calculated at each individual layer can propagate well. Another interesting point is that, in GoogLeNet Inception V3, a variety of convolution kernels of different sizes are employed, and a flexible and efficient network is obtained by splicing different numbers of channels. Since the decision-making process in the Chinese Mahjong game is complicated, the deep learning model used to tackle the game requires richer parameters and deeper networks to achieve sufficient generalization capability. Therefore, ResNet and GoogLeNet Inception V3 serve as the motivations for building the unbalanced deep discriminant learning model in this study. Specifically, the new model consists of a stack of residual blocks named "GoBlocks". These GoBlocks share the same topology, and the basic structure of a GoBlock is illustrated in Fig. 3.

Fig. 3

The basic structure of a GoBlock unit

Based on Fig. 3, it can be observed that the basic structure of a GoBlock contains a channel copy layer comprising two twin parallel layers, a channel splitting layer that splits the channels into two different parts, and three multi-kernel multi-channel fusion layers, each organized similarly to a Google Inception V3 block. The splitting layer and the fusion layers together are named the Inception+ structure, and the number of channels entering the Inception+ structure is 192. Several key issues should be addressed here. First, the channel copy layer simply duplicates the input channels: our semantic segmentation features carry less information than a real image, so more feature channels are needed for the convolution operations. The copied channels are fed directly into twin parallel convolution layers. Significantly different from the copy layer, the later splitting layer does not duplicate the channels; instead, the original channels are divided symmetrically. After the convolution operations on the twin parallel layers, a linear splicing operation is utilized for back-propagation, with the convolution kernel size set to 3 × 3. This operation realizes the cardinality concept and helps further increase the width of the network. Second, the splitting layer divides the input features and passes them into two parallel convolutional layers, again with 3 × 3 kernels. The purpose of this layer is to take full advantage of the different channels, which also increases the width of the whole model for better generalization capability and improved efficiency. We tried several plans to enlarge the channels further, but they all led to catastrophic overfitting, so only two channel-splitting operations are kept for the best performance.

Then, the classic structure of Google Inception V3 is directly connected [34], with a small change: the 1 × 1 and 3 × 3 convolution layers (without any 5 × 5 convolution layers) mainly determine the number of feature maps, so that the entire Inception structure not only has good feature extraction capability but also guarantees good compression of the number of parameters. Considering the input size of our feature maps, we abandoned the 5 × 5 kernel to keep enough key information without waste. Different from Inception V3, we divide the original 192 channels into 3 groups of 64 channels for processing in different convolutional layers. Note that the size of the feature maps passing through the convolutional layers should remain unchanged; because the inputs are pre-processed pseudo images, there is not much rich information to lose. In this study, the Inception+ structure is applied to increase the number of channels, in order to improve the fitting capability of the unbalanced deep learning model as well. To be specific, there are three parallel branches in each Inception structure: a 1 × 1 convolution layer feeding the concatenation, a 1 × 1 convolution layer connecting to a 3 × 3 convolution layer, and a 3 × 3 max-pooling layer connecting to a 1 × 1 convolution layer. Additionally, two parallel convolutional layers are connected to an Inception structure to form the new Inception+ layer, whose main structure is illustrated in Fig. 4. It can also be observed from Fig. 3 that three Inception+ structures are concatenated to form the GoBlock structure. One thing to mention is that, in order to make the data distribution more reasonable and reduce over-fitting, a BN (i.e., Batch Normalization) layer is added in front of each convolutional layer [19].

Fig. 4

The main structure of the Inception+ network
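To make the above wiring concrete, the following is a hedged Keras sketch of one Inception+ structure and a GoBlock stacking three of them, assuming a 192-channel input as stated above. The exact widths and the placement of the shortcut follow our reading of Figs. 3 and 4 and are not taken from the authors' code.

```python
# A hedged sketch of the Inception+ layout described above, assuming a
# 192-channel input; widths and wiring are our reading of Figs. 3-4.
from tensorflow.keras import layers

def bn_conv(x, filters, kernel):
    # BN is placed in front of each convolution, as stated in the text.
    x = layers.BatchNormalization()(x)
    return layers.Conv2D(filters, kernel, padding='same', activation='relu')(x)

def inception_plus(x):
    # Channel splitting: 192 input channels divided symmetrically into two
    # parallel 3x3 convolution paths, then linearly spliced back together.
    left = bn_conv(layers.Lambda(lambda t: t[..., :96])(x), 96, (3, 3))
    right = bn_conv(layers.Lambda(lambda t: t[..., 96:])(x), 96, (3, 3))
    x = layers.Concatenate()([left, right])
    # Inception-style fusion with three 64-channel branches (no 5x5 kernel).
    b1 = bn_conv(x, 64, (1, 1))
    b2 = bn_conv(bn_conv(x, 64, (1, 1)), 64, (3, 3))
    pooled = layers.MaxPooling2D((3, 3), strides=1, padding='same')(x)
    b3 = bn_conv(pooled, 64, (1, 1))
    return layers.Concatenate()([b1, b2, b3])  # back to 192 channels

def go_block(x):
    # Three Inception+ structures in series, wrapped with a residual
    # shortcut (our reading of the GoBlock in Fig. 3).
    shortcut = x
    for _ in range(3):
        x = inception_plus(x)
    return layers.Add()([x, shortcut])
```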

It is worth noting that, in the design of the whole discriminant model, in order to better fit the overall network to the low-level semantic features, we tested parameter settings with multiple numbers of GoBlocks. When the number of GoBlocks is less than 5, the network performance does not exceed the original ResNet-1001 structure on the same dataset, probably because, with few layers, our shortcut and unbalanced multiple convolution kernels do not bring much benefit. When the number of GoBlocks is greater than 8, the network performance does not change significantly: its test loss remains around 0.64 ± 0.002, while the parameter count increases by about 407k compared with 7 GoBlocks (specifically, 2,887,372 total parameters with 7 GoBlocks versus 3,295,116 with 8 GoBlocks). When the number of GoBlocks exceeds 20, obvious overfitting occurs.

4 Experiments and analysis

4.1 Data description and data pre-processing

All implementations in this study are realized with the TensorFlow and Keras open-source frameworks. The deep learning platform utilized in this study is a GTX 1080 GPU card with 8 GB of video memory and 16 GB of RAM. In order to verify the effectiveness of the new deep discriminant learning model, a great amount of data was collected from a Chinese Mahjong game company that collaborates with the researchers of this study. The data collection span ranges from November 2017 to April 2018, and the data come from real anonymous players who played the company's online Chinese Mahjong during that span. After collection, all data underwent a rigorous pre-processing operation. First, to ensure the high quality of the constructed Chinese Mahjong dataset, only data from master players were included; the criterion for a master player is a game score in the top 1000 of all players together with winning records of at least 500 games. Second, to make the data samples well distributed, all kinds of winning patterns in diverse situations were taken into consideration. After the above rigorous pre-processing, a Chinese Mahjong game dataset composed of 2.40 million hand-made decision scenes from 1,000,000 matches was constructed. Inside the dataset, there are about 548,690 matches of the common type, 89,121 of the seven pairs type, 339,120 of the nine-one type, and 178,910 of the thirteen different type. For the data related to operating decisions, there are about 311,891 instances of Chow, 350,190 of Pong, 70,332 of Kong, and 503,878 of passing; the key actions Chow, Pong, and Kong form the dataset of the action decision model. It is also worth mentioning that, after composing the data features following Section 3.1, the data need to be reshaped into a 2D matrix for the ease of 2D convolution in the new deep learning model. To be specific, zero padding is utilized: the 331-D feature of the comprehensive decision model is padded to 361 dimensions, i.e., a Mahjong decision feature map with a dimension of 19 × 19 is generated. For the action decision model, the 253-D column feature is reshaped into a 16 × 16 input matrix for supervised training. Finally, the whole dataset is randomly divided into a training set, a validation set, and a test set with a ratio of 3:1:1.
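A minimal sketch of this padding and reshaping step is given below (using NumPy; the helper name is ours): the 331-D comprehensive decision feature is zero-padded to 361 entries and reshaped to 19 × 19, and the 253-D action feature is padded to 256 entries and reshaped to 16 × 16.

```python
# A minimal sketch of the zero-padding/reshaping step described above.
import numpy as np

def to_pseudo_image(features, side):
    padded = np.zeros(side * side, dtype=np.float32)
    padded[:features.size] = features        # pad the tail with zeros
    return padded.reshape(side, side, 1)     # add a channel axis for Conv2D

decision_map = to_pseudo_image(np.random.rand(331).astype(np.float32), 19)
action_map = to_pseudo_image(np.random.rand(253).astype(np.float32), 16)
print(decision_map.shape, action_map.shape)  # (19, 19, 1) (16, 16, 1)
```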

4.2 Experimental analysis

In the experiments, a total of 10 methods are compared to analyze the competing capability of playing the Chinese Mahjong game: 3 shallow learning-based methods and 7 deep learning-based methods. The shallow learning-based methods comprise linear classification (Linear), the support vector machine (SVM) [8], and the gradient boosting decision tree (GBDT) [25]. The deep learning-based methods comprise a fully connected network (FC), ResNet-34 [13], ResNet-1001 [14], DenseNet [18], a simple LSTM [11], bidirectional LSTMs with attention [1], and our method. Details of the parameter settings are elaborated in Table 4. The shallow learning-based methods are compared to highlight the superiority of deep learning-based methods. Moreover, the simple LSTM and the bidirectional LSTMs with attention are compared because these models are based on the recurrent neural network, which differs from the CNN structure, while ResNet-34, ResNet-1001, and DenseNet are compared because our method is a new ResNet-based model. All experiments were conducted on the same comprehensive decision dataset and action decision dataset, and all models were trained for 50 epochs for fairness in the learning stage. A special note is needed for the bidirectional LSTM with attention method. First, in order to use the recurrent neural network with the attention mechanism, we modified the original input of single pseudo images appropriately, because with such a structure the number of recurrent steps is large and training is very difficult, manifesting as slow speed and serious overfitting. Moreover, in order to apply the time dimension properly, we changed the input according to the structure of attention: we treat the data of a whole game as a "sentence" and each decision status as a "word", with the word embedding vector replaced by the low-level semantic pictures, so as to reproduce the original structure. We set the step length to 22, because in four-player Chinese Mahjong the maximum number of rounds does not exceed 22, and we performed the loss calculation once per step. In the experiments, we found that training could only be carried out after reducing the data to 1/10 of the original dataset. After 50 epochs, we were surprised to find that the average test accuracy was over 97%, which may be because the loss calculation differs from that of the image-based discriminant models. In further tests, when we put all the decision models together to play against each other, this method achieved a winning percentage of only about 7% against our method.
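As a rough illustration of this sequence framing, the following hedged Keras sketch treats a game as a sequence of at most 22 pseudo-image steps, each embedded by a small CNN before a bidirectional LSTM. All layer widths, the pooling used as a crude stand-in for the attention mechanism, and the 34-way output are illustrative assumptions, not the configuration actually used in the experiments.

```python
# A hedged sketch of the "game as sentence, decision state as word"
# framing; widths and the output size are illustrative assumptions.
from tensorflow.keras import layers, Input, Model

steps, side = 22, 19
frames = Input(shape=(steps, side, side, 1))
x = layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation='relu'))(frames)
x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)  # (None, 22, 32)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
x = layers.GlobalAveragePooling1D()(x)   # crude stand-in for attention pooling
outputs = layers.Dense(34, activation='softmax')(x)  # e.g., tile-discard classes
model = Model(frames, outputs)
```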

Table 4 Comparison experiment parameter settings

The accuracies of the shallow learning-based methods are 22.72%, 31.15%, and 69.47% for Linear, SVM, and GBDT, respectively. For comparisons among the deep learning-based methods, the curves of model accuracy and loss are depicted in Figs. 5 and 6, while the convergence curve of our model over 150 epochs and the precision-recall curves are depicted in Figs. 7 and 8, respectively.

Fig. 5

Testing curves of the models' accuracy

Fig. 6

Testing curves of the models' loss

Fig. 7

Training and testing curves showing that our model converges within 150 epochs

Fig. 8

Precision and recall curves of model performance

In order to verify the correctness of the decision-making outcomes provided by the competition strategy introduced in this paper, a challenging competition between one AI and three real senior human players is also conducted. Before that, a competition among 4 models, including FC, DenseNet, bidirectional LSTMs with attention, and our method, is carried out: each model plays 2-on-2 against FC models for 1000 matches under the real four-player Mahjong game rules. Table 5 shows the results of the battles between the different decision models. The overall Mahjong decision strategy is shown in Algorithm 1, and a hypothetical outline of this decision flow is sketched below. The following parts, Figs. 9, 10, 11 and 12, show how our Agent plays Chinese Mahjong against three human players and wins in the different winning types, including the common type, the thirteen different type, the nine-one type, and the seven pairs type. The Agent is placed at the forefront in each figure.
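The following hypothetical Python outline illustrates how such a decision loop could combine the two trained networks; the game-state helpers (can_win, pending_steal, etc.) and the model names are assumptions for illustration, not the actual interfaces of Algorithm 1.

```python
# A hypothetical outline of the overall decision loop; the state helpers
# and model names are illustrative assumptions, not the paper's code.
def play_turn(state, discard_model, action_model):
    if state.can_win():
        return 'win'                        # agreed rule: win immediately if possible
    if state.pending_steal():               # an opponent just discarded a tile
        features = state.action_features()  # 16x16 action pseudo image
        choice = action_model.predict(features[None, ...]).argmax()
        return ['pass', 'chow', 'pong', 'kong'][choice]
    features = state.decision_features()    # 19x19 comprehensive pseudo image
    tile = discard_model.predict(features[None, ...]).argmax()
    return ('discard', tile)
```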

Algorithm 1 The overall Mahjong decision strategy
Table 5 Decision models playing against each other (the FC model is set as the benchmark)
Fig. 9

Our Agent plays with three human players and wins the game in the common type

Fig. 10

Our Agent plays with three human players and wins the game in the thirteen different type

Fig. 11

Our Agent plays with three human players and wins the game in the nine-one type

Fig. 12

Our Agent plays with three human players and wins the game in the seven pairs type

In Fig. 9a, the red box area contains the hand of 13 tiles, the blue box area contains the tile the player draws from the wall each turn, and the yellow box area contains the discarded tiles. In Fig. 9d, the grey box area contains the suits made by stealing. In Fig. 9h, the purple box area shows the action options a player can choose; we stipulated that if any player could win, he should choose to win immediately. In Fig. 9a, the Agent's hand is more likely to reach the common type, so "he" discarded the Red Dragon, the White Dragon, and so on (Fig. 9b, c). When the last player discarded number tile 8, the Agent chose to Pong to accelerate toward the win (Fig. 9d). The Agent is smart enough to keep suits and reach the common type. In Fig. 10a, according to the hand, it is obviously better to aim for the thirteen different type or the nine-one type rather than the common type. In Fig. 10b, the Agent reasonably discards the White Dragon tile, which is inconsistent with the direction of the winning type. In Fig. 10f, after the Agent obtains number tiles 7, 8, and 9, it decisively discards number tile 5, which is generally very effective. In Mahjong, whichever suit it belongs to, tile 5 is a valuable tile that many experts tend to keep; however, the Agent judges it to be useless here, which proves to be the case. Our Agent resolutely abandoned pairs and discarded number tile 3 and stick tile 3 to keep the wind and dragon tiles (Fig. 10b-g). In Fig. 10f, the Agent gives up ball tile 4 and keeps ball tiles 5 and 8; since ball tile 4 has already appeared three times, the Agent decides that its value is smaller than the other option, which is also very reasonable.

Figure 11 presents our Agent achieving victory in the nine-one type. In the situation of Fig. 11a, it is natural to shape the hand toward the nine-one type or the thirteen different type. But by rule, one should not hold any suits for these types, so when the last player discarded an East Wind tile, the Agent chose to skip without taking any action (Fig. 11c). The same situation happened in Fig. 11d-g, but with a difference: a suit of number tile 9 was exactly what the Agent wanted for the nine-one type (based on the rule). The seven pairs type, in turn, is very rare in Chinese Mahjong; when the situation of Fig. 12a was given to our Agent, "he" seemed to keep looking for options. From Fig. 12a to c, it is obvious that the Agent first chooses to play toward the thirteen different type under that uncertain circumstance. However, upon obtaining a ball tile 7, it immediately decides to work toward the seven pairs type and the nine-one type. Our Agent kept ball tile 1 and many dragon tiles to remain open to the nine-one type instead of only waiting for the common type (Fig. 12b-e). It was brave to consecutively discard a pair of ball tile 5 (Fig. 12f-g); because of these choices, our Agent could wait for the seven pairs type and the nine-one type at the same time.

Four measures, including winning rates, scores, numbers of actions taken until winning, and winning types, are incorporated for quantitative analysis. The outcomes are summarized in Table 6 (winning rates, scores, and numbers of actions taken until winning) and Table 7 (winning types). From Table 6, it can be observed that the average winning rate obtained by the new competition strategy (26.471%) is higher than that of the real senior human players. The scores obtained from playing the Chinese Mahjong game also suggest that the new competition strategy is superior (21.333 > 19.000). It is also promising to observe that the new competition strategy takes fewer actions than the real senior human players to win the game (8.333 < 12.539), a strong indicator that the new competition strategy is more efficient. Table 7 provides more details regarding the winning types obtained by the new competition strategy and the real senior human players. It is interesting to observe that the real senior human players are more likely to win the Chinese Mahjong game via the ordinary way (i.e., the "common type", 68.0%).

Table 6 The performance in winning rates, scores, and numbers of actions taken until winning
Table 7 Winning types obtained by the new competition strategy and real senior human players

Also, the new competition strategy demonstrates superior capability to win the Chinese Mahjong game via advanced ways (i.e., the "thirteen different type" for 22.2% and the "nine-one type" for another 22.2%). Since those advanced ways generate more points than the ordinary way, this also substantiates the statistics in Table 6 showing that the new competition strategy is capable of winning more points than real senior human players.

In addition, beyond the confrontation results between decision agents after model training, we deployed the trained model in an online game app of the collaborating company. In particular, it was deployed in the four-player Chinese Mahjong game with 8 parallel servers providing the decision-making service; these decision services appear irregularly among real players, ensuring that exactly 1 Agent is present in each four-player game. The agents use names similar to those of the other human players, so the players do not know that a robot is present in the game. A total of about 10,000 games were played, and the statistical results show that our decision model achieved an average win rate of 28.38%. It is worth noting that more than 70% of the players on this game platform have played more than 10 matches, among whom there are many masters. Therefore, the effectiveness of the new competition strategy is also suggested by this deployment.

5 Conclusion

It is widely acknowledged that game theory benefits greatly from recent advances in deep learning, and competition strategies have been proposed for both complete information games and incomplete information games. In this paper, the four-player Chinese Mahjong game, a typical incomplete information game, is the focus: low-level semantic pseudo images generated from game-related prior knowledge and a new deep residual network-based model are introduced to realize its competition strategy. The contributions of this study can be summarized as follows. First, it is the first attempt to tackle the Mahjong game from the perspective of deep learning. Specifically, competition strategies inspired by shallow learning techniques have already been introduced for playing the Japanese Mahjong game, but to the best of our knowledge, no existing work incorporates deep learning techniques for playing the Chinese Mahjong game. Second, a new deep residual network is proposed to realize the competition strategy. This network is composed of a series of "GoBlock" units, a new deep learning structure introduced in this paper, and each "GoBlock" is further made up of "Inception+" sub-structures, which are inspired by the Google Inception model and are novel as well. Comprehensive experiments are conducted to reveal the superiority of this new competition strategy. A great amount of Chinese Mahjong game data was collected from an online Chinese Mahjong company to construct the dataset, and the newly proposed competition strategy is compared with several shallow learning-based as well as deep learning-based methods. Both qualitative and quantitative analyses are conducted on the outcomes of all compared methods, and the superiority of the new competition strategy over the others is suggested. Furthermore, an interesting competition between the new AI competition strategy and three real senior players is also conducted; the effectiveness and efficiency of the new competition strategy over real senior human players are revealed by quantitative analysis based on four measures, from the statistical point of view. Since this is the first attempt to incorporate deep learning techniques in playing the Chinese Mahjong game, future efforts will be devoted to more sophisticated deep learning models to further boost the performance of the competition strategy.