
1 Introduction

Over the last decade, ever since the advent of the General Game-Playing (GGP) competition [7], research interest in general approaches to intelligent game playing has become increasingly mainstream. GGP systems autonomously learn how to skilfully play a wide variety of (simultaneous or alternating) turn-based games, given only a description of the game rules. Similarly, General Video-Game (GVG) systems learn strategies for playing various video games in real-time, non-turn-based settings.

In the above-mentioned systems the domain model (i.e., the rules) for the game at hand is sent to the game-playing agent at the beginning of each match, allowing legal play from the outset. For example, games in GGP are described in a language named Game Description Language (GDL) [12], which has axioms for describing the initial game state, the generation of legal moves and how they alter the game state, and how to detect and score terminal positions. Correspondingly, video games are described in the Video Game Description Language (VGDL) [18]. The agent then gradually learns improved strategies for playing the game at a competitive level, typically by playing against itself or other agents. However, ideally one would like to build fully autonomous game-playing systems, that is, systems capable of learning not only the necessary game-playing strategies but also the underlying domain model. Such systems would learn skilful play simply by observing others play.

Automated model acquisition is an active research area spanning many domains, including constraint programming and computer security (e.g. [1, 2, 16]). There has been some recent work in GGP in that direction using a simplified subset of board games, henceforth referred to as the Simplified Game Rule Learner (SGRL) [3]. In the related field of autonomous planning, the LOCM family of domain model acquisition systems [4-6] learns planning domain models from collections of plans. In comparison to other systems of the same type, these systems require only a minimal amount of information in order to form hypotheses: they require only plan traces, whereas other systems also require state information.

In this work we extend current work on learning formal models of the rules of (simplified) board games, using as input only example sequences of the moves made in playing those games. More specifically, we extend the previous SGRL game rule acquisition system by allowing pieces to be added and removed from the board during play, and by using the LOCM planning domain model acquisition system for encoding and learning the relationships between the pieces that interact during a move, allowing modeling of moves that have side effects (such as castling in chess). Our work is thus distinguished from previous work in this area in that we learn the interactions between the pieces in the games.

The paper is structured as follows: the next section provides the necessary background material on LOCM and SGRL, followed by a description of the combined approach. After this, a system for capturing game traces is described. This is followed by an empirical evaluation and an overview of related work, before concluding and discussing future work.

2 Background

In this section we provide background information about the LOCM and SGRL model acquisition systems that we base the present work on.

2.1 LOCM

The LOCM family of domain model acquisition systems [5, 6] are inductive reasoning systems that learn planning domain models from action traces alone. This is a significant restriction on the input, as other comparable systems require extra information (such as predicate definitions, initial and goal states, etc.). LOCM is able to recover domain information from such a limited amount of input due to the assumptions it makes about the structure of the output domain.

A full discussion of the LOCM algorithm is omitted here; the interested reader is referred to the background literature [5, 6, 8] for more information. However, we discuss those aspects of the system that are relevant to this work. We use the well-known Blocksworld domain as an example to demonstrate the form of input to, and output obtained from, LOCM. Although a simple domain, it is useful because it exhibits a range of features from both LOCM and LOCM2 that are relevant to this work.

The input to the LOCM system is a collection of plan traces. Suppose, in the Blocksworld domain, we had the problem of reversing a two-block tower, where block A is initially placed on block B. The following plan trace is a valid plan for this problem:

figure a
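The plan trace itself appears in the figure above. Assuming the standard Blocksworld operators (unstack, put-down, pick-up, stack), it would read roughly as follows, although the exact LOCM input syntax may differ:

```python
# Hypothetical rendering of the plan trace for reversing a two-block tower
# (block A initially on block B); operator names and argument order follow
# the usual Blocksworld convention and are an assumption, not the exact input.
trace = [
    ("unstack", "A", "B"),   # lift A off B
    ("put-down", "A"),       # place A on the table
    ("pick-up", "B"),        # lift B from the table
    ("stack", "B", "A"),     # place B on top of A
]
```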

Each action comprises a set of indexed object transitions. For example, the unstack action comprises a transition for block A (which we denote as unstack.1) and another for block B (which we denote as unstack.2). A key assumption in the LOCM algorithm is that the behavior of each type of object can be encoded in one or more DFAs in which each transition appears at most once. This assumption means that if two plan traces move an object through the same prefix of transitions, the object's next transition in each trace must exit from the same DFA state. Consider plan traces 1 and 2 below:

figure b
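The two traces appear in the figure above; based on the description that follows, they would look roughly like this (again, the syntax is illustrative):

```python
# Plan trace 1: block A is unstacked and then put down on the table.
trace_1 = [("unstack", "A", "B"), ("put-down", "A")]

# Plan trace 2: block X is unstacked from block Y and stacked onto block Z.
trace_2 = [("unstack", "X", "Y"), ("stack", "X", "Z")]
```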

In the first plan trace, block A is unstacked before being put down onto the table. In the second plan trace, block X is unstacked from block Y before being stacked onto another block, Z. The assumption that each transition exists only once within the DFA description of an object type means that the state reached following an unstack.1 transition is the same state that precedes both a put-down.1 and a stack.1 transition.

The output formalism of LOCM represents each object type as one or more parametrized DFAs. Figure 1 shows the output DFAs of LOCM for the Blocksworld domain. In this figure, we have manually annotated the state names in order to highlight the meanings of each state. The edges in the LOCM state machines represent object transitions, where each transition is labeled with an action name and a parameter position.

LOCM works in two stages: it first induces the structure of the DFAs and then detects the state parameters.
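For illustration, the sketch below shows one way the core state-unification step of the first stage could be implemented; it is an assumption about a possible implementation, not the authors' code. Every transition action.pos is given a start and an end state, and the end state of a transition is merged with the start state of the next transition that the same object undergoes:

```python
class UnionFind:
    """Tracks which candidate DFA states have been merged."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def induce_dfa_states(traces):
    """traces: list of plans, each a list of (action, arg1, arg2, ...) tuples.
    Returns a union-find whose equivalence classes are the DFA states,
    named ('start'|'end', action, parameter position)."""
    uf = UnionFind()
    for trace in traces:
        last = {}  # object -> (action, pos) of its most recent transition
        for step in trace:
            action, args = step[0], step[1:]
            for pos, obj in enumerate(args, start=1):
                if obj in last:
                    prev_action, prev_pos = last[obj]
                    # consecutive transitions of the same object share a DFA state
                    uf.union(("end", prev_action, prev_pos), ("start", action, pos))
                last[obj] = (action, pos)
    return uf
```

Applied to plan traces like the two sketched above, this merges the end state of unstack.1 with the start states of both put-down.1 and stack.1, which is exactly the identification discussed in the text.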

Fig. 1.
figure 1

State machines learned by LOCM in the Blocksworld planning domain. State labels are manually annotated to aid comprehension.

Blocks are the only type of object in Blocksworld and are represented by the two state machines at the bottom of Fig. 1. Informally, these machines can be seen as representing what is happening above and below the block, respectively. In each planning state, a block is represented by two DFA states (for example, a block placed on the table with nothing above it would be represented by the 'clear' and 'on table' DFA states). Both block DFAs transition simultaneously when an action is performed, so when the previously discussed block is picked up from the table (transition pick_up.1) both the top and bottom machines move to the 'held' state.

The machine at the top of Fig. 1 is a special machine known as the zero machine (this refers to an imaginary zeroth parameter in every action) and this machine can be seen as encoding the structural regularities of the input action sequences. In the case of Blocksworld, the zero machine encodes the behavior of the gripper that picks up and puts down each block. Relationships between objects are represented by state parameters. As an example, in the ‘Bottom of Block’ machine, the state labelled ‘on top of’ has a block state parameter which represents the block directly underneath the current block.

2.2 The SGRL System

The SGRL approach [3] models the movements of each piece type in the game individually using a deterministic finite automaton (DFA). One can think of the DFA as representing a language in which each word describes a complete move (or piece-movement pattern) on the board and each letter in the word, represented by a triplet \((\varDelta x, \varDelta y, on)\), describes an atomic movement. The triplet is read in the context of a game board position and a current square, with the first two coordinates giving the relative board displacement from the current square (file and rank, respectively) and the third coordinate giving the content of the resulting relative square (the letter e indicates an empty square, w one of the player's own pieces, and p an opponent's piece). For example, \((0\ 1\ e)\) indicates a piece moving one square up the board to an empty square, and the word \((0\ 1\ e)\ (0\ 1\ e)\ (0\ 1\ p)\) a movement where a piece steps three squares up (over two empty squares) and captures an opponent's piece. Figure 2 shows an example DFA describing the movements of a rook, and Fig. 3 shows how moves in chess and a chess variant relate to micro-moves.

Fig. 2.
figure 2

A DFA, \(D_{rook}\), describing the movements of a rook in chess

Fig. 3.
figure 3

Examples from chess and a chess variant. The left-hand board shows a chess example in which potential moves are shown for the pawn on d4: advancing to d5 or capturing on c5. The former move yields the one-step piece-movement pattern (0, 1, e) and the latter \((-1,1,p)\). The knight move b1–d2 and the bishop move c1–g5 yield the piece-movement patterns (2, 1, e) and (1, 1, e)(1, 1, e)(1, 1, e)(1, 1, e), respectively. The cannon in Chinese chess slides orthogonally, but to capture it must leap over exactly one piece (either its own side's or the opponent's) before landing on the opponent's piece being captured. This is shown on the right-hand board. Assuming the piece on d3 moves like a cannon, the move d3–b3 yields the piece-movement pattern \((-1,0,e)(-1,0,e)\), the move d3–h3 the pattern \((+1,0,e)(+1,0,e)(+1,0,w)(+1,0,p)\), and the move d3–d7 the pattern \((0,+1,e)(0,+1,p)(0,+1,e)(0,+1,p)\).

There are pros and cons to using the DFA formalism for representing legitimate piece movements. One attractive aspect of the approach is that well-known methods can be used for the domain acquisition task, which is to infer from observed piece-movement patterns a consistent DFA for each piece type (we are only concerned with piece movements here; for details of how terminal conditions and other model aspects are acquired we refer to the original paper). Another important aspect, especially for a game-playing program, is that state-space manipulation is fast. For example, when generating legal moves for a piece the DFA is traversed in a depth-first manner. On each transition the label of an edge is used to determine which square to reference and its expected content. If there are no matching edges the search backtracks. A transition into a final DFA state s generates a move in the form of a piece-movement pattern consisting of the edge labels traversed from the start state to reach s. Special provision is made to detect and avoid cyclic square references in piece-movement patterns.
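A minimal sketch of this depth-first traversal is given below; the dfa and board objects and their helpers (edges, start, finals, shift, content) are assumed interfaces for illustration, not SGRL's actual data structures:

```python
def generate_moves(dfa, board, square, state=None, word=None, visited=None):
    """Depth-first traversal of a piece DFA.
    Assumed interfaces:
      dfa.edges: dict state -> list of ((dx, dy, content), next_state)
      dfa.start, dfa.finals: start state and set of accepting states
      board.shift(square, dx, dy): target square, or None if off the board
      board.content(square): 'e', 'w' or 'p'."""
    state = dfa.start if state is None else state
    word, visited = word or [], visited or {square}
    moves = []
    if state in dfa.finals and word:
        moves.append(list(word))                      # a completed piece-movement pattern
    for (dx, dy, content), nxt in dfa.edges.get(state, []):
        target = board.shift(square, dx, dy)
        if target is None or target in visited:       # off the board, or cyclic reference
            continue
        if board.content(target) != content:          # expected square content does not match
            continue
        moves += generate_moves(dfa, board, target, nxt,
                                word + [(dx, dy, content)], visited | {target})
    return moves
```

With a DFA such as \(D_{rook}\) of Fig. 2, this traversal would enumerate the rook's sliding moves in each direction from the given square.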

Unfortunately, simplifying compromises were necessary in SGRL to allow for the convenient domain-learning mechanism and fast state-space manipulation. One such compromise is that moves are not allowed to have side effects; that is, a piece movement is not allowed to affect other piece locations or types (with the only exception that a moving piece captures the piece it lands on). These restrictions disallow, for example, the castling, en-passant and promotion moves in chess. We now look at how these restrictions can be relaxed.

3 The GRL System

In this section, we introduce a board-game rule learning system that combines the strengths of both the SGRL and LOCM systems. One strength of the SGRL rule learning system is that it uses a relative coordinate system in order to generalize the input gameplay traces into concise DFAs. A strength of the LOCM system is that it can generalize relationships between objects that undergo simultaneous transitions.

One feature of the SGRL input language is that it represents the distinction between pieces belonging to the player and the opponent, but not the piece identities. This is a trade-off between what can be modeled in the DFAs and the efficiency of learning: far more games would have to be observed if separate rules had to be learned for every piece that can be moved over or captured, and the automata learned would be far larger. In some cases, however, the identities of the interacting pieces are significant in the game.

We present the GRL system, which enhances SGRL so as to allow it to discover a restricted form of side effects of actions. These side effects are the rules that govern piece-taking, piece-production and composite moves. SGRL allows limited piece-taking, but not captures such as those in checkers, where the captured piece is not on the destination square of the moving piece. Piece-production (for example, promotion in chess or adding men at the start of Nine Men's Morris) is not possible at all in SGRL. Composite moves, such as castling in chess or arrow firing in Amazons, also cannot be represented in the SGRL DFA formalism.

3.1 The GRL Algorithm

The GRL algorithm can be specified in three steps:

  1. Perform an extended SGRL analysis on the game traces (we call this extended analysis \(SGRL{+}\)). SGRL is extended by adding additional vocabulary to the input language that encodes the addition and removal of pieces from the board. Minor modifications have to be made to the consistency checks of SGRL in order to allow this additional vocabulary.

  2. Transform the game traces and the \(SGRL{+}\) output into LOCM input plan traces, one for each game piece. Within this step, a single piece move includes everything that happens from the time a player picks up a piece until it is the next player's turn. For example, a compound move will have all of its steps represented in the plan trace.

  3. Use the LOCM system to generate a planning domain model for each piece type. These domain models encode how a player's move can progress for each piece type. Crucially, unlike the SGRL automata, the learned domain models refer to multiple piece types and capture the relationships between objects across transitions.

The output of this procedure is a set of planning domains (one per piece type) that together can generate the legal moves of a board game. By specifying a planning problem with the current board as the initial state and enumerating the state space of that problem, all possible moves for that piece can be generated; a sketch of this enumeration is given below. We now describe these three stages in more detail.
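The following sketch shows one way this enumeration could be realized. The helpers ground_actions, apply_action and is_turn_complete are assumptions standing in for the machinery that the learned domain model and a planner would provide; this is an illustration, not the GRL implementation.

```python
def enumerate_piece_moves(board_state, ground_actions, apply_action, is_turn_complete):
    """Enumerate the legal moves of one piece type by exploring the state space of its
    learned planning domain, starting from the current board position.
    Assumed helpers: ground_actions(state) -> applicable actions,
    apply_action(state, action) -> successor state; states must be hashable."""
    moves, stack, seen = [], [board_state], {board_state}
    while stack:
        state = stack.pop()
        if is_turn_complete(state) and state != board_state:
            moves.append(state)        # each turn-complete state reached is a legal move
        for action in ground_actions(state):
            successor = apply_action(state, action)
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return moves
```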

3.2 Extending SGRL Analysis for Piece Addition and Deletion

The first stage of GRL is to perform an extended version of the SGRL analysis. The input alphabet has been extended in order to include vocabulary to encode piece addition and removal. To this end, we add the following two letters as possible commands to the DFA alphabet:

Fig. 4.
figure 4

The Game of Amazons is a two-player game typically played on a \(10 \times 10\) board, shown here on an \(8 \times 8\) board, in which each player has four pieces. The board on the left shows the initial setup of the game. On each turn, a player moves a piece as the queen moves in chess, before firing an arrow from the destination square, again in the move pattern of the queen in chess. The final square of the arrow is then removed from the game, thus after each move, one square is removed from the game. This can be seen on the right hand board: the queen moves from C8 to C2, before firing an arrow from C2 to A4 (A4 now cannot be transited). The loser is the first player unable to make a move.

  1. The letter 'a', which means that a piece has just been added at the specified cell (the relative coordinates of this micro-move are always (0, 0)). The newly added piece becomes the piece that is 'controlled' by the subsequent micro-moves.

  2. The letter 'd', which means that the square has a piece underneath it that is removed from the game. This is used to model taking pieces in games like peg solitaire or checkers.

The order in which these new letters are used is significant. An 'a' implies that the piece underneath the new piece remains in place after the move, unless it occurs on the final micro-move of the sequence (this is consistent with the more limited piece-taking in SGRL, where pieces are taken only at the end of a move, if the controlled piece ends on another piece's square). For example, the following sequences both describe white pawn promotion in chess:

figure c

Adding this additional vocabulary complicates the SGRL analysis in two ways: first, in the definition of the game state, and second, in the consistency checking of candidate DFAs. The removal of pieces poses no problem for the state definition. There is, however, a small issue when dealing with piece addition. A move specified in the SGRL input language has the following form:

figure d

This move specifies that the piece on square 12 moves vertically up one square, then a new piece appears (and becomes the controlled piece), which subsequently moves another square upwards. The issue is that, for each distinct move in SGRL, the controlled piece is defined by the square specified at the start of the move (in this case square 12). If a piece appears during a move, then \(SGRL{+}\) needs some way of knowing which piece has appeared, in order to add the sequence to the prefix tree of that piece type. To mitigate this problem, we require all added pieces to be placed in the state before they are added, so that they can be readily identified. This approach allows us to learn the DFAs for all piece types, including piece addition.

The consistency algorithm of SGRL verifies whether a hypothesized DFA is consistent with the input data. One part of this algorithm that is not discussed in prior work is the generateMoves function, which generates the valid sentences (and hence the valid piece moves) of the state machine. There are two termination criteria for this function: first, the dimensions of the board (a piece cannot move outside its confines); second, revisiting squares (this is important in case cycles are generated in the hypothesis generation phase). The second criterion is relaxed slightly in GRL, since squares may be visited more than once due to pieces appearing and/or disappearing. However, we still wish to prevent looping behavior, and so instead of terminating when the same square is revisited, we terminate when the same square is revisited in the same DFA state.
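In terms of the traversal sketched earlier, this amounts to keying the cycle check on the pair (square, DFA state) rather than on the square alone; the helper below is illustrative only, not the actual implementation:

```python
def should_prune(square, dfa_state, visited, relaxed=True):
    """Cycle check used during move generation.
    SGRL prunes whenever a square is revisited; the relaxed GRL check prunes
    only when the same square is revisited in the same DFA state."""
    key = (square, dfa_state) if relaxed else square
    return key in visited
```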

With these two changes to the SGRL analysis we have a means of representing pieces that appear and that remove other pieces from the board. However, the state machines do not tell us how the individual piece movements, additions and removals combine to create a complete move in the game. For this task, we employ the LOCM system to learn the piece-move interactions.

3.3 Converting to LOCM Input

We use the LOCM system to discover relationships between the pieces and the squares. To do this we need to convert the micro-moves generated by the \(SGRL{+}\) DFAs into sequences of actions, for which we introduce the following action templates (where X is the name of one of the pieces):

figure e

These action templates mirror the input language of the extended \(SGRL{+}\) system detailed above. Each plan that is provided as input to LOCM is a sequence of micro-moves that defines an entire player's turn. When LOCM learns a model for each piece type, the interactions between the pieces and the squares they visit are encoded in the resulting domain model.

The input actions therefore encode the entire move (possibly compound and with piece-taking side effects). As an example from the game of Amazons (see Fig. 4), the following plan trace may be generated from the example micro-move sequence, meaning a queen moves from A1 to B1, then an arrow appears and is fired to square C1:

figure f
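The trace itself appears in the figure above; with hypothetical action and square names loosely modeled on the appear_a action of Fig. 6, it might look roughly like this:

```python
# Hypothetical rendering of the plan trace: the queen moves A1 -> B1,
# an arrow appears on B1, then the arrow moves B1 -> C1.
# Action and square names are illustrative, not the exact encoding used by GRL.
turn_trace = [
    ("move_queen", "sq_a1", "sq_b1"),
    ("appear_a",   "sq_b1"),
    ("move_arrow", "sq_b1", "sq_c1"),
]
```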

In all plan traces for the Queen piece in the Amazons game, the destination square of the Queen will always be the square in which the Arrow appears, and from which the Arrow piece begins its movement. Once the LOCM analysis is complete, this relationship is detected, and the induced planning domain will only allow arrows to be fired from the square that the queen moves to.

Fig. 5.
figure 5

The LOCM State Machines learned for the Queen piece in Amazons. The zero machine at the top of the figure shows the higher-level structure of the Queen’s move. The Queen moves for a certain number of moves, then an arrow appears, and the arrow moves for a certain number of moves.

Figure 5 shows the actual LOCM state machines learned for the Queen piece from a collection of game traces. The top state machine is the zero machine, which describes the general plan structure. The machine underneath represents one of the squares on the board. The parameters of the zero machine represent the current square underneath the controlled piece. As can be seen from the generated PDDL action for \(appear\_a\) (shown in Fig. 6), this parameter ensures that the arrow appears at the location on which the queen ended its move. It also ensures that the moves are sequenced in the correct way: the Queen piece moves, the Arrow piece appears and finally the Arrow piece moves to its destination. The machine at the bottom of Fig. 5 encodes the transitions that a square on the board goes through during a Queen move. Each square undergoes a similar transition path, with two slightly different variants; to understand these, consider the two things that can happen to a square once a Queen arrives on it. The first is that the Queen transits the square, moving on to the next square. The second is that the Queen ends its movement there, in which case the Arrow appears before transiting the square. These two cases correspond to the two different paths through the state machine.

Fig. 6.
figure 6

The PDDL Action for appear_a, representing the arrow appearing in the Amazons game.

As another example, consider the game of Checkers. The state machines generated by LOCM for the Pawn piece type are shown in Fig. 7. There are two zero machines in this example, which means that LOCM2 analysis was required to model the transition sequence and that there are separable behaviors in the move structure. These separable behaviors are the movement of the Pawn itself and of the King, if and when the Pawn is promoted. The top machine models the alternation between taking a piece and moving on to the next square, for both Pawns and Kings, while the bottom machine ensures first that a King cannot move until it has appeared, and second that a Pawn cannot continue to move once it has been promoted.

3.4 Move Generation

We now have enough information to generate possible moves from a current state. The GRL automata can be used to generate moves, restricted by the LOCM state machines, which define the order in which the pieces can move within a single turn.

figure g

The algorithm for generating moves is presented here as Algorithm 1. It takes as input an \(SGRL{+}\) state, a LOCM state and a square on the board. A depth-first search is then performed over the search space in order to generate all valid traces. A state in this context is a triple of a board square, an \(SGRL{+}\) piece state and a LOCM state. Lines 2 to 3 define the termination criterion (a state has already been visited), lines 4 to 5 define when a valid move has been found (when both the \(SGRL{+}\) and LOCM DFAs are in a terminal state), and lines 6 to 7 define the recursion (in these lines we refer to a state as being consistent; consistency here means that the state is generated by a valid synchronous transition in both the \(SGRL{+}\) and LOCM DFAs).
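Since Algorithm 1 itself is not reproduced here, the sketch below captures the same search under assumed interfaces (the board helpers from the earlier sketches, the two DFAs exposing edges, finals and successors, and both machines read over a shared micro-move alphabet); it is an illustration rather than the authors' pseudocode:

```python
def generate_turn_moves(board, square, sgrl, sgrl_state, locm, locm_state,
                        word=None, visited=None):
    """DFS over (square, SGRL+ state, LOCM state) triples, returning all
    micro-move sequences that both machines accept as a complete turn."""
    word = word or []
    visited = visited or set()
    state = (square, sgrl_state, locm_state)
    if state in visited:                               # termination: state already visited
        return []
    visited = visited | {state}
    moves = []
    if sgrl_state in sgrl.finals and locm_state in locm.finals and word:
        moves.append(list(word))                       # both DFAs accept: a complete move
    for (dx, dy, content), next_sgrl in sgrl.edges.get(sgrl_state, []):
        target = board.shift(square, dx, dy)
        if target is None or board.content(target) != content:
            continue
        # recurse only over consistent synchronous transitions of both DFAs
        # (for simplicity both machines are assumed to read the same micro-moves)
        for next_locm in locm.successors(locm_state, (dx, dy, content)):
            moves += generate_turn_moves(board, target, sgrl, next_sgrl,
                                         locm, next_locm,
                                         word + [(dx, dy, content)], visited)
    return moves
```

The visited set keyed on the full triple implements the relaxed cycle check described in Sect. 3.2.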

This completes the definition of the GRL system: an extended SGRL learns DFAs for each piece type, the LOCM system learns how these piece types combine to create entire moves, and then the move generation algorithm produces the valid moves in any given state. Next, we provide an evaluation of GRL, and detail the cost of each element of the system.

4 Game Trace Generation

One important aspect of learning game rules from traces is where the observations come from. The main results presented in this work are based on example traces produced by simulating games from existing correct and complete rule sets. This is important because we can reliably evaluate the correctness of the learned rules only if we have the true rules to test against. The main purpose of this work is to demonstrate the effectiveness of GRL in learning game rules, wherever the example traces come from.

However, to use the GRL system in the real world, we need to produce game traces from observations of real games. In this section we describe a visual tool for generating game traces. This tool allows the generic setup of a board game and then lets the user play out games in order to build a collection of game traces. The creation of a set of rules follows this pattern:

Fig. 7.
figure 7

The LOCM state machines learned for the Pawn piece in the game of Checkers. The state machines incorporate the possibility of the Pawn being promoted to a King, which can then subsequently take other pieces.

Fig. 8.
figure 8

Visual game trace generator. This figure shows the gameplay screen for the game of checkers. White pieces are those with white borders; the piece type is denoted by the color in the center of the piece. Play proceeds by dragging the pieces, and the end of each turn and game is signaled using the options on the right-hand side of the frame. (Color figure online)

  1. Setting the game up. This consists of providing a name for the game, and specifying the number of piece types that are involved in the game per player and the dimensions of the board. For example, checkers is played on an 8\(\,\times \,\)8 board, with two piece types (pawns and queens). Following this stage, a graphical view of the board is generated with a collection of piece prototypes to interact with for the next stage.

  2. Constructing the initial state of the game. This stage requires the user to define the initial configuration of pieces graphically, by dragging the piece prototypes to the square(s) that they start from. This completes the configuration of the board: the next stage is actually playing the game.

  3. Playing the initial game (Fig. 8). The first game is played as a two-player game, with the user playing both the white and black moves. Within each turn, the player can move as many pieces as he or she chooses, add pieces to the board, and remove pieces from the board. These movements are interpreted as micro-moves, and after the user signals that the move has ended, these micro-moves become the representation for that 'turn' in the game. The exact way in which these micro-moves are calculated is explained below. After the user signals that the game is over, and which player (if any) has won the game, the GRL algorithm is used to learn an initial approximation of the game rules. This initial model learning seeds the next stage:

  4. Model refinement training. Once an approximation of the rules exists, the user takes turns playing games for the white and black pieces. The computer makes random moves for the opposing color during play. At the end of each game, all previous game traces are combined with the current game trace to learn a refined set of rules using the GRL system. After each game, the moves that the user has made can expand upon those in the current model. In the next game, the roles are reversed, with the computer controlling the opposite color to the previous game. Thus, with each game, the user both refines the moves for one of the colors and observes informally whether or not the model appears to capture the true rules of the game. This stage is repeated until the user feels that the learned rules are sufficiently close to the true game rules.

4.1 Computing Micro-Move Representations

During each turn, the user may move pieces on the board, add pieces to the board and remove pieces from the board. These moves are broken down into micro-moves in order to provide input to the GRL algorithm. All of these operations are performed using a mouse-driven point-and-click interface. Game pieces can be dragged with the mouse into their new positions. These new positions may not be adjacent to the square the piece was lifted from, so the question of how we formulate the micro-moves is important. One option would be simply to take the absolute move and encode it (left pane of Fig. 9). However, this has the consequence that important constraints on moves can be missed, as it appears that the piece can jump over whatever was in its way. Another option would be to track the mouse cursor as it drags the piece to its destination, making each newly visited square part of the move (center pane of Fig. 9). This places a high burden on the user to move the mouse efficiently in order to preserve the intended micro-move pattern. The method we actually employ is to find the shortest path between the source and destination squares and encode this as the micro-move sequence (right-hand pane of Fig. 9). If there is more than one shortest path, we encode all such paths.
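A minimal sketch of this step is shown below, assuming the board helpers used in the earlier sketches; for brevity it returns a single shortest path, whereas the tool encodes all shortest paths:

```python
from collections import deque

def micro_moves_for_drag(board, src, dst):
    """Breadth-first search for a shortest path from src to dst, returned as a
    sequence of (dx, dy, content) micro-moves; an empty list means no path.
    board.shift and board.content are assumed helpers, as in the earlier sketches."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        square, path = queue.popleft()
        if square == dst:
            return path
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx == 0 and dy == 0:
                    continue
                nxt = board.shift(square, dx, dy)
                if nxt is None or nxt in seen:
                    continue
                seen.add(nxt)
                # record the observed content of each visited square
                queue.append((nxt, path + [(dx, dy, board.content(nxt))]))
    return []
```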

When a piece is taken during a move, it is assumed that the moving piece transited the location of the taken piece. Thus we find the shortest path from the start square to the taken piece's location and then on to the destination. When a piece is added to the board during a move, it begins a new element of a compound move, as described in Sect. 3. Taking these three elements (moving, adding and removing pieces) together, we have a method for computing complete micro-move sequences, which are then accumulated to construct each game trace. We present this system to show how GRL can be used on concrete observations of game play. In future work, we also intend to learn terminal states so that the computer attempts to beat the user, rather than simply making random moves. This may also give an indication of when learning has been effective enough, as a good guide could be when the computer player can beat the user.

Fig. 9.
figure 9

Three different ways of generating micro-moves for the same piece move from visual mouse-dragged input. On the left, the exact move is encoded in one micro-move (3,3,e). The center option is to track the mouse movement of the user, which gives the messier (0,1,e),(1,1,e),(0,1,e),(0,1,e),(1,-1,e),(1,0,e). The option on the right, which we use, is to visit all squares on the shortest path between the start and the destination square: (1,1,e),(1,1,e),(1,1,e).

Table 1. Learning time in seconds to learn models for each of the pieces in the problem set with all moves known. (Using all moves only affects the time for the \(SGRL{+}\) element, as LOCM does not accept input when all moves are known).
Table 2. Learning time in seconds to learn models for each of the pieces in the problem set with only a single move known per state. Results marked with a (*) returned an incorrect DFA.

5 Empirical Evaluation

In this section, we provide an evaluation of the GRL system. To perform this evaluation, we learn the rules of Amazons, Peg Solitaire and Checkers. We also provide an evaluation on the three games Breakthrough, Breakthrough Checkers and Breakthrough Chess used in the SGRL evaluation [3], in order to demonstrate that performance is not adversely affected by the changes GRL makes to the alphabet and the consistency algorithm. We report the time taken in the extended \(SGRL{+}\) phase and in the LOCM phase, to show the balance of time spent in each. We generate two forms of game traces for each game: one has a single move per turn (for these we generate 1000 game traces) and the other enumerates all possible moves (for these we generate 50 game traces). This method of evaluation is consistent with that used in the prior evaluation of SGRL. All experiments are run on Mac OS X 10.10 on an Intel i7-4650U CPU with 8 GB of RAM.

Table 1 shows the results of GRL on our benchmark problems. We show the time taken for each element of GRL to learn its model, reporting the time for the first and second players separately (under 'Player1' and 'Player2'). In the case of Peg Solitaire there is only one player, which explains the missing data. The missing LOCM entry for the Amazons A (arrow) piece is because an arrow never starts a move itself and hence has no plan trace data. We first observe that the Breakthrough, Breakthrough Checkers and Breakthrough Chess results are not significantly different from the results for SGRL, so the modifications made to the SGRL algorithm do not adversely affect performance. We do not report LOCM time for these games, since they do not require the LOCM analysis: \(SGRL{+}\) is sufficient to represent them. It is notable that the time taken to learn the LOCM models is significantly larger than the time taken to learn the \(SGRL{+}\) individual piece models. The main cause of this difference is that LOCM receives one input plan trace for every turn of the game, rather than one per game. Because of this, LOCM has 25,000 input traces for the Queen piece in Amazons and 11,000 for the Pawn piece in Checkers, for example. Parsing and analyzing this number of plans is necessarily time consuming.

Table 2 shows the results of learning when only a single move per turn is known. Learning times increase, as it takes longer to find consistent DFAs in the \(SGRL{+}\) phase. Note that on several occasions \(SGRL{+}\) returns an incorrect DFA; in these cases there is simply insufficient input data. GRL is still effective in the majority of cases, however, and does not degrade the performance of SGRL on the previous game data.

6 Related Work

Within the planning literature there are several domain model acquisition systems, each requiring a different level of detail in its input observations. The Opmaker2 system [13, 17] learns models in the target language OCL [14] and requires a partial domain model along with example plans as input. The ARMS system [19] can learn STRIPS domain models with partial or no observation of intermediate states in the plans, but does at least require predicates to be declared. The LAMP system [20] can target PDDL representations with quantifiers and logical implications.

As for domain model acquisition in board games, an ILP approach exists for inducing chess variant rules from a set of positive and negative examples using background knowledge and theory revision [15]. Furthermore, [10] presents a system that learns games such as Connect4 and Breakthrough from video demonstrations using minimal background knowledge. Rosie [11] is an agent implemented in Soar that learns game rules (and other concepts) for simple board games via restricted interactions with a human. In general video-game playing, a neuro-evolution algorithm has shown good promise playing a large set of Atari 2600 games using little background knowledge [9].

7 Conclusions and Future Work

In this work, we presented a system capable of inducing the rules of more complex games than the current state of the art. This can help both in constructing the rules of games and in replacing the move generation routine of an existing general game-playing system (for games that fit the formalism). GRL improves on SGRL by allowing piece addition, piece removal and structured compound moves. This was achieved by combining two techniques for domain-model acquisition, one rooted in game playing and the other in autonomous planning.

However, there remain interesting classes of board game rules that cannot be learned by GRL. One interesting rule class is that of a state-bound movement restriction. Games that exhibit this behavior allow only a subset of moves to occur in certain contexts: examples of this are check in chess (where only the subset of moves that exit check are allowed) and the compulsion to take pieces in certain varieties of checkers (thus restricting the possible moves to the subset that take pieces when forced). An approach to learning these restrictions could be developed given knowledge of all possible moves at each game state in the game.

Learning the termination criteria of a game is also an important step, if the complete set of rules of a board game is to be learned. This requires learning properties of individual states, rather than state transition systems. However, many games have termination criteria of the type that only one player (or no players) can make a move. For this class of game, and others based on which moves are possible, it should be possible to extend the GRL system to learn how to detect terminal states.