1 Introduction

It is widely acknowledged that game theory is a classic research domain [36]. Depending on the information available within a game, most games can be roughly divided into two categories: complete information games and incomplete information games. In a complete information game, every participant shares the same available information (e.g., features, strategies, etc.); typical examples include Go and chess. In an incomplete information game, the information available to each individual participant is not equivalent. For example, bridge (a popular card game) is a classic incomplete information game: the information one participant possesses is acquired from her/his own cards, so the amount of available information varies with each participant's hand. Typical incomplete information games include, but are not limited to, poker games, Shikoku chess, Mahjong, StarCraft, etc. It is worth pointing out that competition strategies for playing incomplete information games keep receiving much research attention nowadays, since many decision-making problems in daily life actually arise as incomplete information games. For instance, business negotiations in many economic scenarios are in fact incomplete information games, as the information possessed by each side of the negotiation cannot be truly symmetric [16]. Given this importance, it is easy to perceive that competition strategies for playing incomplete information games are worthy of comprehensive and thorough investigation.

In recent years, it has been noticed that the development of competition strategies for playing either complete or incomplete information games often tracks the progress of machine learning techniques. For instance, in the game of Go, the AlphaGo series (i.e., AlphaGo, AlphaGo Zero, AlphaZero) [29,30,31] proposed by Google DeepMind has become well known in recent years, and its main technical foundation is sophisticated deep learning. In the game of Texas Hold'em, the DeepStack system was introduced by the University of Alberta; it employs the counterfactual regret minimization (CFR) algorithm as well as a multi-layer deep neural network (DNN) [3]. Also, another notable work named Libratus was recently proposed by Carnegie Mellon University to handle the same Texas Hold'em game [4, 5]. For computing Nash equilibria [28], popular machine learning techniques, such as supervised learning and reinforcement learning, are both adopted to enable self-play convergence toward the Nash equilibrium [15]. In the game of Japanese Mahjong, researchers from the University of Tokyo developed a competition system incorporating specific Japanese Mahjong rules, and it was reported that good performance was obtained on their Phoenix platform [26].

In this study, the famous Chinese Mahjong, a typical incomplete information game, is the focus, and a novel competition strategy built upon new deep residual networks is proposed for the first time. Generally speaking, the deep residual network is a popular deep discriminant model and is still considered one of the state-of-the-art deep discriminant models today (the original model received the best paper award at CVPR 2016) [13]. The deep residual network remains popular because it properly addresses the notorious degradation problem in deep learning, whereby the accuracy of a deep network saturates and then degrades rapidly as the model depth increases. In this study, the merit of the original deep residual network is retained in the newly proposed deep residual network-based competition strategy, and it is worth mentioning that this study is also the first attempt to solve the problem of incomplete information games based on a deep residual network.

The organization of this paper is as follows. In Section 2, a comprehensive review of recent developments in deep learning techniques is first given in Section 2.1, followed by a thorough review of recent developments in competition strategies for playing incomplete information games in Section 2.2. In Section 3, details of the newly introduced deep residual network-based competition strategy for playing the Chinese Mahjong game are described. In Section 4, comprehensive experiments are conducted and the superiority of the newly proposed competition strategy is revealed through statistical comparisons with other popular competition strategies. In Section 5, the conclusion of this study is drawn and future directions are suggested.

2 Related works

Since a novel competition strategy motivated by up-to-date deep learning techniques is proposed for playing incomplete information games in this study, two reviews are provided in this section. The first review emphasizes recent developments in deep learning techniques and is described in Section 2.1. The second review focuses on recent developments in competition strategies for playing incomplete information games and is presented in Section 2.2.

2.1 Recent developments in deep learning techniques

It is widely acknowledged that deep learning techniques represent the current trend of machine learning, and the early paper published by Hinton et al. can be viewed as the modern symbol that revived large-scale neural network research, i.e., deep learning [17]. Nowadays, deep learning models are characterized by both deeper layers and sophisticated model structures. Specifically, in 2012, a parallel multi-channel convolutional neural network named AlexNet was proposed [24]. This network has only 8 layers and is rather shallow compared with the deep learning models widely utilized today. In 2014, another well-known deep learning model, VGG, was introduced. This model has a deeper structure, reaching 16 or 19 layers depending on the version [32]. In the same year, Google proposed GoogLeNet (i.e., Inception V1 to V3 [19, 33, 34]), whose structure is more sophisticated than its deep learning predecessors. Inception V1 utilizes a large number of sparsely connected sub-networks while still achieving high computational performance on dense matrices. It also uses convolution kernels of three different sizes, 1 × 1, 3 × 3, and 5 × 5, which makes the fusion of different initial features possible. In Inception V2, the famous batch normalization (BN) technique was proposed, and the network connections were modified to avoid excessive parameters and the information loss caused by large convolution kernels [19]. Also, n × 1 and 1 × n convolutions were used in this version instead of the original ones. In Inception V3, the initial convolutional layer was further replaced with small convolution kernels, and its energy equation was also updated. The depth of the original Inception model is 22 layers, which is significantly deeper than AlexNet and VGG.
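As a brief illustration of this multi-kernel fusion idea, the following is a minimal Keras sketch of an Inception-V1-style block; the branch widths here are illustrative, not the published GoogLeNet configuration.

```python
# A minimal sketch of Inception-style multi-kernel fusion (Keras functional
# API); the branch widths are illustrative, not the published ones.
from tensorflow.keras import layers, Input, Model

def naive_inception_block(x, filters=64):
    # Three parallel branches with different receptive fields.
    b1 = layers.Conv2D(filters, (1, 1), padding='same', activation='relu')(x)
    b3 = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    b5 = layers.Conv2D(filters, (5, 5), padding='same', activation='relu')(x)
    # Channel-wise concatenation fuses features from all kernel sizes.
    return layers.Concatenate(axis=-1)([b1, b3, b5])

inputs = Input(shape=(32, 32, 192))
outputs = naive_inception_block(inputs)
model = Model(inputs, outputs)  # output shape: (None, 32, 32, 192)
```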

When the deep residual network (ResNet) was initially proposed in 2015 [13], it soon became one of the dominant deep learning models, and it still influences the design of many deep discriminant models to date [10, 20, 40, 42]. The essence of ResNet is illustrated in Fig. 1: identity mappings are employed as shortcut connections to effectively avoid the vanishing gradient problem caused by excessively enlarging the number of layers, thereby boosting generalization capability. As a result, ResNet can be as deep as 1001 layers [14] while its generalization performance is still guaranteed. After ResNet was introduced, a great number of ResNet variants, including DenseNet [18], ResNeXt [41], and the Dual Path Network [6], have appeared in recent years. DenseNet continues the ResNet idea of creating short paths among different layers [18]. ResNeXt introduces a new concept named cardinality, and it incorporates the stacked structure of VGG as well as Inception's split-transform-merge strategy in its construction [41]. The Dual Path Network combines ResNeXt and DenseNet within one single network, and its generalization capability has also been verified [6]. In addition, an increasing number of studies and applications of the RNN (i.e., Recurrent Neural Network) have appeared. There are many variants of the RNN model, including the classic LSTM (i.e., Long Short-Term Memory) structure [11], the GRU (i.e., Gated Recurrent Unit) [7], and the recent research hotspot, the attention mechanism [1]. These typical RNN models have also been combined with convolutional neural networks and even ResNet [38], making remarkable progress in image recognition, image semantic analysis, and other areas [37,38,39].

Fig. 1

An illustration of the identity mapping in the ResNet model [13]
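For concreteness, the following minimal Keras sketch shows the identity-shortcut pattern of Fig. 1: the block learns the residual F(x) and outputs H(x) = F(x) + x. The layer widths are illustrative, and the input is assumed to already have the matching number of channels.

```python
# A minimal sketch of the identity-shortcut idea in Fig. 1 (Keras
# functional API); widths are illustrative.
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                        # identity mapping (assumes the
                                        # input already has `filters` channels)
    y = layers.Conv2D(filters, (3, 3), padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, (3, 3), padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])     # H(x) = F(x) + x
    return layers.ReLU()(y)
```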

In this study, the idea of ResNet is also incorporated into the newly introduced competition strategy, and a new ResNet-based deep discriminant learning model is proposed for the first time to handle the problem of incomplete information games. As the above review shows, the new model follows the current trend of building deep discriminant learning models. It is also necessary to mention that deep generative models, such as the variational auto-encoder (VAE) [21], the generative adversarial network (GAN) [12], and Glow [23], also receive much attention in contemporary deep learning studies. However, since this study is based on deep discriminant learning, deep generative learning models are not reviewed in this subsection.

2.2 Recent developments in competition strategies of playing incomplete information games

As described in Section 1, developments in competition strategies for playing complete / incomplete information games track the recent advances in machine learning techniques. In the era of shallow learning, considerable research effort had to be spent constructing semantic feature spaces for each individual complete / incomplete information game, and the performance of competition strategies could be heavily influenced by those semantic features. For instance, in [26], researchers at the University of Tokyo spent great effort incorporating prior human knowledge into the construction of semantic features for the Japanese Mahjong game. In the era of deep learning, such burdensome hand-crafted semantic features are largely replaced by latent semantic features automatically learned by deep learning models. For example, in the famous AlphaGo, a classic convolutional neural network (CNN) is adopted to construct a fast rollout integrated strategy network (named the "policy net") via supervised learning. Meanwhile, another self-play component named the "value evaluation net" is built up through reinforcement learning methods. Rules of the Go game are implemented via the classic Monte-Carlo idea, and a conventional Monte Carlo tree search is employed for the optimization and deduction of the whole Go system. For its successors, AlphaGo Zero and AlphaZero, ResNets are incorporated to replace the CNNs, and the capability of the whole system is improved via self-play. It can be perceived from the above that the advance from AlphaGo to AlphaGo Zero / AlphaZero also complies well with the trend in deep discriminant learning models.

Among recent studies on competition strategies for playing incomplete information games, Texas Hold'em and Mahjong have become representative subjects of investigation. For the Texas Hold'em game, in [3], researchers incorporated CNNs and a virtual self-play strategy to fulfill the supervised learning of a Texas Hold'em deep learning model. The results show that the learned system is capable of defeating three top ACPC (i.e., Annual Computer Poker Competition) computer poker programs in limit Texas Hold'em [3]. In [27], it is reported that the DeepStack system utilizing the counterfactual regret minimization algorithm became the first Texas Hold'em artificial intelligence system to defeat human professionals. In [5], the Libratus system is introduced, in which a safe subgame-solving algorithm as well as an improved counterfactual regret minimization algorithm are incorporated. For the Mahjong game, recent studies show that only the Japanese Mahjong game has been considered as an incomplete information game, with shallow learning techniques (i.e., linear regression and logistic regression methods) adopted to realize its competition strategy [26]. Also, in multi-player non-cooperative games like StarCraft II or Defense of the Ancients (Dota), many remarkable works have been done [2, 9, 35, 43], but these games remain a hard problem.

In this study, a new unbalanced ResNet-based deep discriminant learning model is proposed for the first time to handle the problem of incomplete information games. The contributions of this study can be summarized as follows. First, it is the first attempt to handle the Chinese Mahjong game as an incomplete information game problem. Second, it is also the first attempt to tackle the Mahjong game from the deep learning perspective. Third, the ResNet-based deep discriminant learning model is novel in its construction, and its superiority is revealed via comprehensive experiments. Details of the new model are introduced in Section 3.

3 Methodology

In this section, details of the unbalanced ResNet-based deep discriminant learning model for handling the Chinese Mahjong game are elaborated. Our approach differs significantly from deep learning studies that utilize raw data as model input (e.g., in computer vision, images themselves are often fed directly into deep learning models, and the semantic gap between raw images at the input and their semantic understanding at the output is expected to be bridged by the generalization capability of the models). Here, low-level semantic features inspired by prior knowledge closely related to the rules and status of the Chinese Mahjong game are selected as the input of the unbalanced ResNet-based deep discriminant learning model. The reason is that, in every decision situation in the Chinese Mahjong game, it is necessary to connect the decision-making situation to a decision-making mapping. Specifically, the significant information related to the Mahjong game, including the information on the board, the behavior of the opponents, our own hand, as well as the existing rules and winning methods, is extracted and semantically segmented to generate pseudo images. It is assumed that, with the help of this prior knowledge, the generalization capability of the new deep learning model can be improved by learning the probability distribution related to each decision, thereby solving the Chinese Mahjong game. Therefore, the low-level semantic features inspired by prior knowledge and the new unbalanced ResNet-based deep discriminant learning model are emphasized in this section; they are elaborated in Sections 3.1 and 3.2, respectively.

Figure 2 shows the main flowchart of the new unbalanced ResNet-based deep discriminant learning model for handling the Chinese Mahjong game. The semantic features described in Section 3.1 are employed as the input of the whole model. The new model introduced in Section 3.2 is made up of a series of "GoBlock" units, a novel deep learning structure proposed in this study. Details are described in the following.

Fig. 2

The main flowchart of the unbalanced ResNet-based deep discriminant learning model for handling the Chinese Mahjong game

3.1 Low-level semantic features based on compressed prior knowledge

3.1.1 Basic rules of the Chinese Mahjong game

The Chinese Mahjong game is a table game in which four players start with several tiles (the analogue of cards in Poker) and compete to achieve the highest score. Generally speaking, one ordinary Chinese Mahjong game consists of four or eight rounds. In each round, each player gets 13 tiles as the "initial hand", and one of the four players is determined to be the dealer, who holds one more tile and plays first. At each turn within a round, a player draws a tile from the wall (i.e., a set of invisible tiles arranged randomly at the start). After that, the player should either discard a tile or apply one of the actions Chow, Pong, and Kong. Play continues until one player declares a win. A player is eligible to win only with a "winning hand" consisting of 14 tiles in a particular combination. There are several ways to win. For example, when a player picks up a winning tile from the wall, it is named winning from the wall; if a player wins when another player discards the winning tile, it is called winning by a discard. It is necessary to mention that, in the Chinese Mahjong game considered in this study, winning by a discard is not allowed, which is significantly different from the Japanese Mahjong game.

More specifically, in the Chinese Mahjong game, there are 3 suits of "number tiles", namely characters (also simply called numbers), balls, and sticks, and each suit has 9 ranks (i.e., from 1 to 9). Besides these 3 suits of number tiles, the other kind of tiles comprises the Winds and Dragons: East Wind, South Wind, West Wind, North Wind, Red Dragon, Green Dragon, and White Dragon. During each round, a player can steal the discarded tile of another player, and the steal action takes three forms. The first is Chow, which applies when one player discards exactly the tile that a second player needs to complete a 3-tile sequence (e.g., the second player holds ball tiles 1 and 3, the first player discards ball tile 2, and the second player may pick it up). The second is Pong, by which a player picks up a tile to complete a triplet of identical tiles (e.g., three ball tiles 1). The third is Kong, by which a player picks up a tile to complete four identical tiles. Furthermore, when a player declares Chow, Pong, or Kong, the combined tiles must be placed in front of the player's hand; such a combination is called a suit, and those tiles cannot be used again. Regarding winning, when a player needs only one more tile to win, the player is said to be waiting. In the Chinese Mahjong game, there are four types of winning combinations: the common type, the seven pairs type, the thirteen different type, and the nine-one type. Detailed explanations of the 3 "steal" actions and 4 winning combinations are elaborated in Table 1. It is important to note that the complexity of the four-player Mahjong game exceeds 3.4 × 10²⁸² decision points.

Table 1 The 3 "steal" actions and 4 main winning combinations of tiles in the Chinese Mahjong game
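For concreteness, the 34 distinct tile kinds described above (3 suits of 9 ranks plus 7 honor tiles) can be indexed as in the following sketch; this encoding is hypothetical and is not the paper's actual data format.

```python
# A hypothetical indexing of the 34 distinct Chinese Mahjong tile kinds:
# 3 numbered suits of 9 ranks each, plus 4 winds and 3 dragons.
SUITS = ['character', 'ball', 'stick']
HONORS = ['east', 'south', 'west', 'north', 'red', 'green', 'white']

TILE_INDEX = {}
for s, suit in enumerate(SUITS):
    for rank in range(1, 10):
        TILE_INDEX[f'{suit}-{rank}'] = s * 9 + (rank - 1)  # indices 0..26
for h, honor in enumerate(HONORS):
    TILE_INDEX[honor] = 27 + h                             # indices 27..33

assert len(TILE_INDEX) == 34  # 27 number tiles + 7 honor tiles
```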

3.1.2 Compressed low-level semantic features of the Chinese Mahjong game

The low-level semantic features that are later fed into the unbalanced ResNet-based deep discriminant learning model as its input are mainly inspired by the basic rules of the Chinese Mahjong game introduced above. Before constructing these features, it is necessary to understand the main challenge of the Chinese Mahjong game: the hands of other players are invisible, so their tactics are totally hidden, and a player can only assess the current situation by observing historical events in the game. Because of the unpredictable randomness of each Chinese Mahjong game, making correct decisions becomes a challenging issue for players.

In order to tackle the above challenge, low-level semantic features are constructed based on the basic rules and valuable prior knowledge of the Chinese Mahjong game. Since the basic rules have already been elaborated, the prior knowledge is emphasized here. In this study, the prior knowledge is collected from the player's perspective, and it consists of 1) the hand information, 2) the field information of tiles discarded at each step of the game, and 3) the action information of each player at each step of the game. To be specific, the hand information comes from the initial tiles obtained by the player as well as the tiles that have changed during the game. As shown in Table 2, the hand information includes tiles of numbers, balls, sticks, winds, etc. The field information, on the other hand, describes the type and number of tiles that have been discarded. This kind of information results from the actions that all players have performed, including Chow, Pong, and Kong, which are listed in Table 1. The action information, however, is more sophisticated to represent. Generally speaking, in a Chinese Mahjong game, most winning combinations require a variety of suits; in other words, operations such as Chow, Pong, and Kong can speed up the victory. It follows that constructing tiles that can quickly form suits is highly important. To make up a suit, the essence lies in keeping neighboring tiles, or non-neighboring tiles with equal intervals. For example, Number tiles 1 and 3 are more valuable than Number tiles 1 and 7. Besides the above three kinds of information incorporated as prior knowledge, other semantic features are also added. One is the waiting number. Generally speaking, the waiting number is the number of tiles that a player still needs to achieve a win, and it is mathematically defined in (1).

$$ N_{waiting} = N_{MAX} - N_{current} $$
(1)

where Nwaiting represents the waiting number; NMAX denotes the maximum waiting number of the chosen winning type (e.g., the max waiting number of the common type is 13, and that of the seven pairs type is 7); and Ncurrent denotes the number of tiles the player already holds toward that winning type. Table 3 elaborates all the action information used to construct semantic features in this study.
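As a worked example of (1), the following short sketch assumes Ncurrent counts the tiles already in place toward the chosen winning type; the values are illustrative.

```python
# A worked example of Eq. (1): the waiting number is the gap between the
# maximum tile count of the target winning type and the tiles already held.
N_MAX = {'common': 13, 'seven_pairs': 7}

def waiting_number(winning_type, n_current):
    return N_MAX[winning_type] - n_current

# A player chasing Seven Pairs who already holds 5 completed pairs:
print(waiting_number('seven_pairs', 5))  # -> 2 tiles still needed
```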

Table 2 Semantic features of the hand information in the Chinese Mahjong game
Table 3 Semantic features of the action information in the Chinese Mahjong game

3.2 The unbalanced ResNet-based deep discriminant learning model

As shown in Fig. 2, the unbalanced ResNet-based deep discriminant learning model is made up of a series of "GoBlock" units, a new deep learning structure introduced in this part. Inside each individual GoBlock, there are three "Inception+" sub-structures. Details are described in the following.

It is widely acknowledged that, in ResNet, the conventional output H(x) of a deep learning model is changed into H(x) = F(x) + x (illustrated in Fig. 1), which ensures that the gradients calculated at each individual layer can propagate well. Another interesting point is that, in GoogLeNet Inception V3, a variety of convolution kernels of different sizes are employed, and a flexible and efficient network is obtained by splicing different numbers of channels. Since the decision-making process in the Chinese Mahjong game is complicated, the deep learning model used to tackle the game requires richer parameters and deeper networks to achieve sufficient generalization capability. Therefore, ResNet and GoogLeNet Inception V3 serve as the motivations for building the unbalanced deep discriminant learning model in this study. Specifically, the new model consists of a stack of residual blocks named "GoBlocks". These GoBlocks share the same topology, and the basic structure of a GoBlock is illustrated in Fig. 3.

Fig. 3

The basic structure of a GoBlock unit

Based on Fig. 3, it can be observed that the basic structure of a GoBlock contains a channel copy layer comprising two twin parallel layers, a channel splitting layer that splits the channels into two different parts, and three multi-kernel multi-channel fusion layers, each organized similarly to a Google Inception V3 block. The splitting layer and the fusion layers together are named the Inception+ structure, and the number of channels entering the Inception+ structure is 192. Several key issues should be addressed here. First, the channel copy layer simply duplicates the input channels: our semantic segmentation features carry less information than a real image, so more feature channels are needed for the convolution operations. The copied channels are fed directly into twin parallel convolution layers. Significantly different from the copy layer, the later splitting layer does not duplicate the channels; instead, the original channels are divided symmetrically. After the convolution operations on the twin parallel layers, a linear splicing operation is utilized for back-propagation, with the convolution kernel size set to 3 × 3. This operation realizes the cardinality concept and helps further increase the width of the network. Second, the splitting layer divides the input features and passes them into two parallel convolutional layers, again with 3 × 3 kernels. The purpose of this layer is to take full advantage of the different channels, which also increases the width of the whole model for better generalization capability and improved efficiency. We tried several plans to enlarge the channels further, but they all led to catastrophic overfitting, so only two channel-splitting operations are kept for the best performance.

Then, the classic structure of Google Inception V3 is directly connected [34], with a small change: the 1 × 1 and 3 × 3 convolution layers (without any 5 × 5 convolution layers) mainly determine the number of feature maps, so that the entire Inception structure not only has good feature extraction capability but also guarantees good compression of the number of parameters. Considering the input size of our feature maps, we abandoned the 5 × 5 kernel to keep enough key information without waste. Different from Inception V3, we divide the original 192 channels into 3 groups of 64 channels for processing in different convolutional layers. Note that the size of the feature maps passing through the convolutional layers should remain unchanged; because the inputs are pre-processed pseudo images, there is not much rich information to lose. In this study, the Inception+ structure is applied to increase the number of channels, in order to improve the fitting capability of the unbalanced deep learning model as well. To be specific, there are three parallel branches in each Inception structure: a 1 × 1 convolution layer feeding the concatenation, a 1 × 1 convolution layer connecting to a 3 × 3 convolution layer, and a 3 × 3 max-pooling layer connecting to a 1 × 1 convolution layer. Additionally, two parallel convolutional layers are connected to an Inception structure to form the new Inception+ layer, whose main structure is illustrated in Fig. 4. It can also be observed from Fig. 3 that three Inception+ structures are concatenated to form the GoBlock structure. One thing to mention is that, in order to make the data distribution more reasonable and reduce over-fitting, a BN (i.e., Batch Normalization) layer is added in front of each convolutional layer [19].

Fig. 4

The main structure of the Inception+ network
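To make the above wiring concrete, the following is a hedged Keras sketch of one Inception+ structure and a GoBlock stacking three of them, assuming a 192-channel input as stated above. The exact widths and the placement of the shortcut follow our reading of Figs. 3 and 4 and are not taken from the authors' code.

```python
# A hedged sketch of the Inception+ layout described above, assuming a
# 192-channel input; widths and wiring are our reading of Figs. 3-4.
from tensorflow.keras import layers

def bn_conv(x, filters, kernel):
    # BN is placed in front of each convolution, as stated in the text.
    x = layers.BatchNormalization()(x)
    return layers.Conv2D(filters, kernel, padding='same', activation='relu')(x)

def inception_plus(x):
    # Channel splitting: 192 input channels divided symmetrically into two
    # parallel 3x3 convolution paths, then linearly spliced back together.
    left = bn_conv(layers.Lambda(lambda t: t[..., :96])(x), 96, (3, 3))
    right = bn_conv(layers.Lambda(lambda t: t[..., 96:])(x), 96, (3, 3))
    x = layers.Concatenate()([left, right])
    # Inception-style fusion with three 64-channel branches (no 5x5 kernel).
    b1 = bn_conv(x, 64, (1, 1))
    b2 = bn_conv(bn_conv(x, 64, (1, 1)), 64, (3, 3))
    pooled = layers.MaxPooling2D((3, 3), strides=1, padding='same')(x)
    b3 = bn_conv(pooled, 64, (1, 1))
    return layers.Concatenate()([b1, b2, b3])  # back to 192 channels

def go_block(x):
    # Three Inception+ structures in series, wrapped with a residual
    # shortcut (our reading of the GoBlock in Fig. 3).
    shortcut = x
    for _ in range(3):
        x = inception_plus(x)
    return layers.Add()([x, shortcut])
```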

It is worth noting that, in the design of the whole discriminant model, in order to better fit the overall network to the low-level semantic features, we tested parameter settings with multiple numbers of GoBlocks. When the number of GoBlocks is less than 5, the network performance does not exceed the original ResNet-1001 structure on the same dataset, probably because, with few layers, our shortcut and unbalanced multiple convolution kernels do not bring much benefit. When the number of GoBlocks is greater than 8, the network performance does not change significantly: its test loss remains around 0.64 ± 0.002, while the parameter count increases by about 407k compared with 7 GoBlocks (specifically, 2,887,372 total parameters with 7 GoBlocks versus 3,295,116 with 8 GoBlocks). When the number of GoBlocks exceeds 20, obvious overfitting occurs.

4 Experiments and analysis

4.1 Data description and data pre-processing

All implementations in this study are realized with the TensorFlow and Keras open-source frameworks. The deep learning platform utilized in this study is a GTX 1080 GPU card with 8 GB of video memory and 16 GB of RAM. In order to verify the effectiveness of the new deep discriminant learning model, a great amount of data was collected from a Chinese Mahjong game company that collaborates with the researchers of this study. The data collection span ranges from November 2017 to April 2018, and the data come from real anonymous players who played the company's online Chinese Mahjong during that span. After collection, all data underwent a rigorous pre-processing operation. First, to ensure the high quality of the constructed Chinese Mahjong dataset, only data from master players were included; the criterion for a master player is a game score in the top 1000 of all players together with winning records of at least 500 games. Second, to make the data samples well distributed, all kinds of winning patterns in diverse situations were taken into consideration. After the above rigorous pre-processing, a Chinese Mahjong game dataset composed of 2.40 million hand-made decision scenes from 1,000,000 matches was constructed. Inside the dataset, there are about 548,690 matches of the common type, 89,121 of the seven pairs type, 339,120 of the nine-one type, and 178,910 of the thirteen different type. For the data related to operating decisions, there are about 311,891 instances of Chow, 350,190 of Pong, 70,332 of Kong, and 503,878 of passing; the key actions Chow, Pong, and Kong form the dataset of the action decision model. It is also worth mentioning that, after composing the data features following Section 3.1, the data need to be reshaped into a 2D matrix for the ease of 2D convolution in the new deep learning model. To be specific, zero padding is utilized: the 331-D feature of the comprehensive decision model is padded to 361 dimensions, i.e., a Mahjong decision feature map with a dimension of 19 × 19 is generated. For the action decision model, the 253-D column feature is reshaped into a 16 × 16 input matrix for supervised training. Finally, the whole dataset is randomly divided into a training set, a validation set, and a test set with a ratio of 3:1:1.
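A minimal sketch of this padding and reshaping step is given below (using NumPy; the helper name is ours): the 331-D comprehensive decision feature is zero-padded to 361 entries and reshaped to 19 × 19, and the 253-D action feature is padded to 256 entries and reshaped to 16 × 16.

```python
# A minimal sketch of the zero-padding/reshaping step described above.
import numpy as np

def to_pseudo_image(features, side):
    padded = np.zeros(side * side, dtype=np.float32)
    padded[:features.size] = features        # pad the tail with zeros
    return padded.reshape(side, side, 1)     # add a channel axis for Conv2D

decision_map = to_pseudo_image(np.random.rand(331).astype(np.float32), 19)
action_map = to_pseudo_image(np.random.rand(253).astype(np.float32), 16)
print(decision_map.shape, action_map.shape)  # (19, 19, 1) (16, 16, 1)
```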

4.2 Experimental analysis

In the experiments, a total of 10 methods are compared to analyze the competing capability of playing the Chinese Mahjong game: 3 shallow learning-based methods and 7 deep learning-based methods. The shallow learning-based methods comprise linear classification (Linear), the support vector machine (SVM) [8], and the gradient boosting decision tree (GBDT) [25]. The deep learning-based methods comprise a fully connected network (FC), ResNet-34 [13], ResNet-1001 [14], DenseNet [18], a simple LSTM [11], bidirectional LSTMs with attention [1], and our method. Details of the parameter settings are elaborated in Table 4. The shallow learning-based methods are compared to highlight the superiority of deep learning-based methods. Moreover, the simple LSTM and the bidirectional LSTMs with attention are compared because these models are based on the recurrent neural network, which differs from the CNN structure, while ResNet-34, ResNet-1001, and DenseNet are compared because our method is a new ResNet-based model. All experiments were conducted on the same comprehensive decision dataset and action decision dataset, and all models were trained for 50 epochs for fairness in the learning stage. A special note is needed for the bidirectional LSTM with attention method. First, in order to use the recurrent neural network with the attention mechanism, we modified the original input of single pseudo images appropriately, because with such a structure the number of recurrent steps is large and training is very difficult, manifesting as slow speed and serious overfitting. Moreover, in order to apply the time dimension properly, we changed the input according to the structure of attention: we treat the data of a whole game as a "sentence" and each decision status as a "word", with the word embedding vector replaced by the low-level semantic pictures, so as to reproduce the original structure. We set the step length to 22, because in four-player Chinese Mahjong the maximum number of rounds does not exceed 22, and we performed the loss calculation once per step. In the experiments, we found that training could only be carried out after reducing the data to 1/10 of the original dataset. After 50 epochs, we were surprised to find that the average test accuracy was over 97%, which may be because the loss calculation differs from that of the image-based discriminant models. In further tests, when we put all the decision models together to play against each other, this method achieved a winning percentage of only about 7% against our method.
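As a rough illustration of this sequence framing, the following hedged Keras sketch treats a game as a sequence of at most 22 pseudo-image steps, each embedded by a small CNN before a bidirectional LSTM. All layer widths, the pooling used as a crude stand-in for the attention mechanism, and the 34-way output are illustrative assumptions, not the configuration actually used in the experiments.

```python
# A hedged sketch of the "game as sentence, decision state as word"
# framing; widths and the output size are illustrative assumptions.
from tensorflow.keras import layers, Input, Model

steps, side = 22, 19
frames = Input(shape=(steps, side, side, 1))
x = layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation='relu'))(frames)
x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)  # (None, 22, 32)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
x = layers.GlobalAveragePooling1D()(x)   # crude stand-in for attention pooling
outputs = layers.Dense(34, activation='softmax')(x)  # e.g., tile-discard classes
model = Model(frames, outputs)
```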

Table 4 Comparison experiment parameter settings

The accuracies of the shallow learning-based methods are 22.72%, 31.15%, and 69.47% for Linear, SVM, and GBDT, respectively. For comparisons among the deep learning-based methods, the curves of model accuracy and loss are depicted in Figs. 5 and 6, while the convergence curve of our model over 150 epochs and the precision-recall curves are depicted in Figs. 7 and 8, respectively.

Fig. 5

Testing curves of the models' accuracy

Fig. 6

Testing curves of the models' loss

Fig. 7

Training and testing curves showing that our model converges within 150 epochs

Fig. 8

Precision and recall curves of model performance

In order to verify the correctness of the decision-making outcomes provided by the competition strategy introduced in this paper, a challenging competition between one AI and three real senior human players is also conducted. Before that, a competition among 4 models, including FC, DenseNet, bidirectional LSTMs with attention, and our method, is carried out: each model plays 2-on-2 against FC models for 1000 matches under the real four-player Mahjong game rules. Table 5 shows the results of the battles between the different decision models. The overall Mahjong decision strategy is shown in Algorithm 1, and a hypothetical outline of this decision flow is sketched below. The following parts, Figs. 9, 10, 11 and 12, show how our Agent plays Chinese Mahjong against three human players and wins in the different winning types, including the common type, the thirteen different type, the nine-one type, and the seven pairs type. The Agent is placed at the forefront in each figure.
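The following hypothetical Python outline illustrates how such a decision loop could combine the two trained networks; the game-state helpers (can_win, pending_steal, etc.) and the model names are assumptions for illustration, not the actual interfaces of Algorithm 1.

```python
# A hypothetical outline of the overall decision loop; the state helpers
# and model names are illustrative assumptions, not the paper's code.
def play_turn(state, discard_model, action_model):
    if state.can_win():
        return 'win'                        # agreed rule: win immediately if possible
    if state.pending_steal():               # an opponent just discarded a tile
        features = state.action_features()  # 16x16 action pseudo image
        choice = action_model.predict(features[None, ...]).argmax()
        return ['pass', 'chow', 'pong', 'kong'][choice]
    features = state.decision_features()    # 19x19 comprehensive pseudo image
    tile = discard_model.predict(features[None, ...]).argmax()
    return ('discard', tile)
```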

Algorithm 1 The overall Mahjong decision strategy
Table 5 Decision models playing against each other (the FC model is set as the benchmark)
Fig. 9

Our Agent plays with three human players and wins the game in the common type

Fig. 10

Our Agent plays with three human players and wins the game in the thirteen different type

Fig. 11

Our Agent plays with three human players and wins the game in the nine-one type

Fig. 12

Our Agent plays with three human players and wins the game in the seven pairs type

In Fig. 9a, the red box area contains the hand of 13 tiles, the blue box area contains the tile the player draws from the wall each turn, and the yellow box area contains the discarded tiles. In Fig. 9d, the grey box area contains the suits made by stealing. In Fig. 9h, the purple box area shows the action options a player can choose; we stipulated that if any player could win, he should choose to win immediately. In Fig. 9a, the Agent's hand is more likely to reach the common type, so "he" discarded the Red Dragon, the White Dragon, and so on (Fig. 9b, c). When the last player discarded number tile 8, the Agent chose to Pong to accelerate toward the win (Fig. 9d). The Agent is smart enough to keep suits and reach the common type. In Fig. 10a, according to the hand, it is obviously better to aim for the thirteen different type or the nine-one type rather than the common type. In Fig. 10b, the Agent reasonably discards the White Dragon tile, which is inconsistent with the direction of the winning type. In Fig. 10f, after the Agent obtains number tiles 7, 8, and 9, it decisively discards number tile 5, which is generally very effective. In Mahjong, whichever suit it belongs to, tile 5 is a valuable tile that many experts tend to keep; however, the Agent judges it to be useless here, which proves to be the case. Our Agent resolutely abandoned pairs and discarded number tile 3 and stick tile 3 to keep the wind and dragon tiles (Fig. 10b-g). In Fig. 10f, the Agent gives up ball tile 4 and keeps ball tiles 5 and 8; since ball tile 4 has already appeared three times, the Agent decides that its value is smaller than the other option, which is also very reasonable.

Figure 11 presents our Agent achieving victory in the nine-one type. In the situation of Fig. 11a, it is natural to shape the hand toward the nine-one type or the thirteen different type. But by rule, one should not hold any suits for these types, so when the last player discarded an East Wind tile, the Agent chose to skip without taking any action (Fig. 11c). The same situation happened in Fig. 11d-g, but with a difference: a suit of number tile 9 was exactly what the Agent wanted for the nine-one type (based on the rule). The seven pairs type, in turn, is very rare in Chinese Mahjong; when the situation of Fig. 12a was given to our Agent, "he" seemed to keep looking for options. From Fig. 12a to c, it is obvious that the Agent first chooses to play toward the thirteen different type under that uncertain circumstance. However, upon obtaining a ball tile 7, it immediately decides to work toward the seven pairs type and the nine-one type. Our Agent kept ball tile 1 and many dragon tiles to remain open to the nine-one type instead of only waiting for the common type (Fig. 12b-e). It was brave to consecutively discard a pair of ball tile 5 (Fig. 12f-g); because of these choices, our Agent could wait for the seven pairs type and the nine-one type at the same time.

Four measures, including winning rates, scores, numbers of actions taken until winning, and winning types, are incorporated for quantitative analysis. The outcomes are summarized in Table 6 (winning rates, scores, and numbers of actions taken until winning) and Table 7 (winning types). From Table 6, it can be observed that the average winning rate obtained by the new competition strategy (26.471%) is higher than that of the real senior human players. The scores obtained from playing the Chinese Mahjong game also suggest that the new competition strategy is superior (21.333 > 19.000). It is also promising to observe that the new competition strategy takes fewer actions than the real senior human players to win the game (8.333 < 12.539), a strong indicator that the new competition strategy is more efficient. Table 7 provides more details regarding the winning types obtained by the new competition strategy and the real senior human players. It is interesting to observe that the real senior human players are more likely to win the Chinese Mahjong game via the ordinary way (i.e., the "common type", 68.0%).

Table 6 The performance in winning rates, scores, and numbers of actions taken until winning
Table 7 Winning types obtained by the new competition strategy and real senior human players

Also, the new competition strategy demonstrates superior capability to win the Chinese Mahjong game via advanced ways (i.e., the "thirteen different type" for 22.2% and the "nine-one type" for another 22.2%). Since those advanced ways generate more points than the ordinary way, this also substantiates the statistics in Table 6 showing that the new competition strategy is capable of winning more points than real senior human players.

In addition, beyond the confrontation results between decision agents after model training, we deployed the trained model in an online game app of the collaborating company. In particular, it was deployed in the four-player Chinese Mahjong game with 8 parallel servers providing the decision-making service; these decision services appear irregularly among real players, ensuring that exactly 1 Agent is present in each four-player game. The agents use names similar to those of the other human players, so the players do not know that a robot is present in the game. A total of about 10,000 games were played, and the statistical results show that our decision model achieved an average win rate of 28.38%. It is worth noting that more than 70% of the players on this game platform have played more than 10 matches, among whom there are many masters. Therefore, the effectiveness of the new competition strategy is also suggested by this deployment.

5 Conclusion

It is widely acknowledged that game theory benefits greatly from recent advances in deep learning, and competition strategies have been proposed for both complete information games and incomplete information games. In this paper, the four-player Chinese Mahjong game, a typical incomplete information game, is the focus: low-level semantic pseudo images generated from game-related prior knowledge and a new deep residual network-based model are introduced to realize its competition strategy. The contributions of this study can be summarized as follows. First, it is the first attempt to tackle the Mahjong game from the perspective of deep learning. Specifically, competition strategies inspired by shallow learning techniques have already been introduced for playing the Japanese Mahjong game, but to the best of our knowledge, no existing work incorporates deep learning techniques for playing the Chinese Mahjong game. Second, a new deep residual network is proposed to realize the competition strategy. This network is composed of a series of "GoBlock" units, a new deep learning structure introduced in this paper, and each "GoBlock" is further made up of "Inception+" sub-structures, which are inspired by the Google Inception model and are novel as well. Comprehensive experiments are conducted to reveal the superiority of this new competition strategy. A great amount of Chinese Mahjong game data was collected from an online Chinese Mahjong company to construct the dataset, and the newly proposed competition strategy is compared with several shallow learning-based as well as deep learning-based methods. Both qualitative and quantitative analyses are conducted on the outcomes of all compared methods, and the superiority of the new competition strategy over the others is suggested. Furthermore, an interesting competition between the new AI competition strategy and three real senior players is also conducted; the effectiveness and efficiency of the new competition strategy over real senior human players are revealed by quantitative analysis based on four measures, from the statistical point of view. Since this is the first attempt to incorporate deep learning techniques in playing the Chinese Mahjong game, future efforts will be devoted to more sophisticated deep learning models to further boost the performance of the competition strategy.