
1 Introduction

Once a honeybee has found a rich source of pollen, it shares its location with the other members of the colony by executing a particular figure called the waggle dance [16]. This dance guides the search of other bees toward flowers yielding nectar and pollen and thus enhances the efficiency of the colony's foraging strategy.

More broadly, strategies are general programs aimed at achieving particular goals from a multiplicity of initial states. When a scientist models animal behaviours or other strategies, the learning process generally requires selecting and conducting many experiments, each with an associated cost. Thus, the overall cost of learning is the sum of the costs of the experiments performed. We investigate in this work how much this experimental cost can be reduced by using active learning to learn agent strategies. An active learner is allowed to choose which experiments to perform in order to acquire knowledge during the learning process. Furthermore, in real-world situations, strategies should be resource-efficient to be beneficial for agents. Therefore, we additionally want to converge toward efficient strategies.

Fig. 1. Observations of a bee behaviour

In Sect. 6, we learn a general strategy for a bee to find pollen in an environment. Learnt strategies are logic programs built from observations of bee behaviour. Observations are labelled as positive if the goal is fulfilled and negative otherwise. Figure 1a represents a positive observation: the waggle dance indicates that a flower is to the right of the hive, the bee flies in this direction and finds pollen. Several hypotheses can be inferred from it, among them the two represented in Fig. 1b. To discriminate between them, the experiment of Fig. 1c could be performed. The flower is now on the left, as indicated by the waggle dance. No matter its outcome, positive or negative, it would eliminate one of these two hypotheses. Therefore, it is an informative query.

Meta-Interpretive Learning (MIL) has been demonstrated to be a suitable paradigm for learning strategies since it supports predicate invention and the learning of recursive programs [22, 24]. Given the observations so far, consistent hypotheses are built from a set of metarules and the background knowledge. A Bayesian posterior distribution is defined over the hypothesis space [23] and introduces a bias toward hypotheses of lowest complexity. At each iteration, the learner computes the entropies of the possible experiments given the current hypothesis space and the prior distribution. The instance with maximum entropy is selected: it is the most discriminative between the remaining competing hypotheses. This process is repeated, and more experiments are performed, until some target accuracy is reached.

Specifically, our contributions are the introduction of a framework for learning efficient agent strategies with active learning while reducing the associated cost of experimentation, and a description of its implementation. We also evaluate theoretically the expected gain in entropy, and demonstrate experimentally that Bayesian MIL Active Learning converges faster toward efficient agent strategies than a passive learner under the same conditions.

This article is organised as follows. Section 2 reviews some related work. Section 3 describes the framework used in this paper together with the learning protocol. Section 4 details a theoretical analysis. Section 5 describes the implementation. Section 6 reports experiments in learning regular grammars and bee strategies. Finally, we conclude and discuss further work in Sect. 7.

2 Related Work

Active Learning. Active Learning is a protocol in which the learner is able to choose the data from which it learns by accessing an oracle. It contrasts with passive learning, in which the labelled data are selected at random. The objective in active learning is to learn a model with high accuracy while making fewer queries than the number of randomly selected examples a passive learner would need to achieve the same accuracy level. It has been widely studied for identifying classifiers [27] and different query strategy frameworks have been introduced.

In the membership query setting, the learner is allowed to ask for the label of any point of the instance space, even artificially generated ones [1, 2]. However, newly synthesised instances may be uninterpretable by human oracles. An alternative is stream-based selective sampling: the learner can sample from the instance distribution and decide whether to label or discard each sampled instance [12]. A variant is to sample directly from the most informative subpart of the instance space [4]. We focus in this work on pool-based active learning: the learner has access to a large number of initial unlabelled data points, and to an oracle which can provide the label of any of these points on request [20].

Several measures have been suggested for evaluating the shrinkage of the hypothesis space during the learning process and thus measuring the benefits of active learning over passive learning. The main ones are the diameter of the version space [11, 29], the measure of the region of disagreement [13, 14], the metric entropy [18] and the size of the version space [10, 21] which inspired this paper. We will more specifically operate in a Bayesian setting [10, 12] that benefits from a prior distribution over the hypothesis space.

Active Learning has been widely used for classification, although other applications exist: version space approaches have, for instance, been considered for object detection in computer vision [26]. Similarly, the system presented in this article relies upon active learning for devising experiments to rule out hypotheses from the version space. However, our approach differs from the work presented above since we use active learning within the construction of logic programs and for learning agent strategies in a Bayesian setting.

Decision Trees. A search strategy can be represented by a tree whose internal nodes are experiments and whose leaves are hypotheses: minimizing the number of queries then amounts to building a tree of minimum expected depth. In that case, it has been shown that the performance of a greedy strategy is not worse than that of any other strategy for minimizing the number of label queries [10]. Moreover, the expected depth of any binary decision tree is lower bounded by the entropy of the prior distribution [5].

Combining Active Learning with Inductive Logic Programming. In [28], Inductive Logic Programming (ILP) has been combined with Active Learning for two non-classification tasks in natural language processing: semantic parsing and information extraction. Also, a closed-loop Machine Learning system for Scientific Discovery applications is described in [3, 17]: a robot scientist is introduced which autonomously proposes and performs a sequence of experiments minimising the expected cost of experimentation for converging upon an accurate hypothesis generated with ILP. In contrast, our work aims at learning efficient agent strategies.

Learning Efficient Strategies. A general framework for learning optimal resource complexity robot strategies is presented in [6]. It has been extended into Metaopt [8], an ILP system that learns minimal cost logic programs by adding a general cost function into the meta-interpreter. By contrast, our work focuses on another aspect of learning efficiency: we investigate how to reduce the experimental costs of learning efficient agent strategies.

Relational Reinforcement Learning. A challenge in reinforcement learning is the exploration/exploitation trade-off. In [19], the authors present relational exploration strategies: the generalisation of learnt knowledge over unobserved instances in relational worlds allows a generalisation of the notion of known states compared to propositional settings. It can also be applied to larger domains. In [25], Active Learning is used to select actions to perform for reaching states that will enforce a revision of the current model. It is shown that the integration of Active Learning improves learning speed: an accurate action model is obtained after performing far fewer actions than when using random exploration only.

To the authors’ best knowledge, this is the first time active learning has been integrated with Bayesian MIL to devise a sequence of experiments for learning efficient strategies with reduced experimental costs.

3 Theoretical Framework

3.1 Notations

Let \(\mathcal {X}\) be the instance space, and \(\mathcal {H}\) a concept class over \(\mathcal {X}\). We consider a probability distribution \(\varPi _{\mathcal{{X}}}\) over the instance space \(\mathcal {X}\) and a probability distribution \(\varPi _{\mathcal{{H}}}\) over the hypothesis space \(\mathcal {H}\). We assume that the target hypothesis \(\bar{H}\) is drawn from \(\mathcal {H}\) according to \(\varPi _{\mathcal{{H}}}\). We call \(E_{m} = \{e_{0}, ..., e_{m}\}\) the set of examples selected up to iteration m. The version space \(V_{m}\) is the set of hypotheses \(H \in \mathcal{{H}}\) consistent with \(E_{m}\); therefore \(V_{m} \subseteq \mathcal{{H}}\).

3.2 Meta-Interpretive Learning (MIL)

MIL is a form of ILP [22, 24]. The learner is given a set of examples E and background knowledge B composed of a set of Prolog definitions \(B_{p}\) and metarules M such that \(B = B_{p} \cup M\). The goal is to generate a hypothesis H such that \(B,H \models E\). The proof is based upon an adapted Prolog meta-interpreter. It first attempts to prove the examples deductively. Failing this, it unifies the head of a metarule with the goal and saves the resulting meta-substitution. The body and then the other examples are proved in the same way. The meta-substitutions recovered for each successful proof are saved and can be reused in further proofs by substituting them into their corresponding metarules. For example, the first clause of the learned hypothesis in Fig. 1, f(A,B):-f1(A,C),grab(C,B), has been derived from the chain rule metarule detailed in Fig. 2a by applying the meta-substitution [f/2,f1/2,grab/2].
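Concretely, writing the chain rule with explicit existentially quantified predicate variables P, Q and R (the exact statement of the metarule is the one of Fig. 2a), the recorded meta-substitution maps P to f, Q to f1 and R to grab:

$$\begin{aligned} P(A,B) \leftarrow Q(A,C), R(C,B) \;\;\xrightarrow {\;[P/f,\,Q/f1,\,R/grab]\;}\;\; f(A,B) \leftarrow f1(A,C), grab(C,B) \end{aligned}$$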

Two key features of MIL are that it supports predicate invention and the learning of recursive programs. The former enables the decomposition of the learned logic program into new sub-actions. The latter allows learning more general and shorter programs. Both make MIL well suited for learning strategies.

The choice of metarules induces a declarative bias on the hypothesis space since it determines the structure of learnable programs: an appropriate choice helps to minimise the number of clauses in the consistent hypotheses [7]. Also, the use of higher-order Abstractions supports learning more compact programs [9]. We focus in this work on learning logic programs built from the metarules chain rule, precondition and postcondition, described in Fig. 2a. Indeed, this set of metarules is sufficient to learn the class of dyadic logic programs investigated in this paper. The first experiment tackles the task of learning regular grammars, and metarules for finite state acceptors can be expressed with chain rule and postcondition only, as shown in Fig. 2b. The second experiment considers the task of learning agent strategies. Fluents are treated as monadic predicates which apply to a situation, while actions are dyadic predicates which transform one situation into another. We will use Metagol\(_{AI}\), which supports Abstractions and Inventions [9]. We use the two abstractions until/4 and ifthenelse/5 (Fig. 2b) to reduce the complexity of the learned programs: until/4 represents a recursive call to the action Ac while some condition Cond is not fulfilled, and ifthenelse/5 expresses a choice between the actions Then and Else based upon the realisation of the condition Cond. These Abstractions can likewise be expressed with the chain rule, precondition and postcondition only, as shown in Fig. 2b.
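For intuition only, one plausible first-order reading of these two abstractions, consistent with the informal description above (this is our own illustrative rendering; the authoritative higher-order definitions are those of Fig. 2b and [9]):

$$\begin{aligned} until(Cond,Ac,S_{1},S_{1})&\leftarrow Cond(S_{1})\\ until(Cond,Ac,S_{1},S_{2})&\leftarrow \lnot Cond(S_{1}), Ac(S_{1},S_{3}), until(Cond,Ac,S_{3},S_{2})\\ ifthenelse(Cond,Then,Else,S_{1},S_{2})&\leftarrow Cond(S_{1}), Then(S_{1},S_{2})\\ ifthenelse(Cond,Then,Else,S_{1},S_{2})&\leftarrow \lnot Cond(S_{1}), Else(S_{1},S_{2}) \end{aligned}$$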

Fig. 2. Metarules considered: the class of dyadic logic programs studied in this work can be expressed with chain rule, precondition and postcondition only

3.3 Complexity of an Hypothesis

The hypotheses generated with MIL differ by their complexity. We distinguish two notions of complexity for a logic program H. The textual complexity relies on Occam’s principle and represents the length l(H) of H, measured as its number of clauses. However, textually smaller programs are not necessarily the most efficient. The resource complexity r(H) of an agent strategy [6] represents the amount of resources (e.g. energy) consumed by the agent while executing the strategy, and can be evaluated as the sum of the action costs incurred when applying H to transform an initial state into a final state. In the following, we combine the textual complexity with the resource complexity to learn efficient strategies in terms of a global complexity:

$$\begin{aligned} c(H)=l(H)+r(H) \end{aligned}$$
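For example (with purely illustrative figures), a two-clause strategy \(H_{1}\) whose execution consumes ten units of energy is ranked below a three-clause strategy \(H_{2}\) consuming only four units:

$$\begin{aligned} c(H_{1}) = 2 + 10 = 12 > c(H_{2}) = 3 + 4 = 7 \end{aligned}$$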

3.4 Bayesian Prior Distribution

The preference for hypotheses with lowest complexity is encoded in a prior distribution which induces a bias over the hypothesis space and favours more efficient strategies. We consider the framework described in [23]. A Bayesian prior probability is defined for any H in \(\mathcal {H}\) from the complexity c(H) as follows, where the normalisation constant a satisfies \(\frac{1}{a}= \sum _{i=1}^{\infty }\frac{1}{i^{2}}=\frac{\pi ^{2}}{6}\):

$$\begin{aligned} \varPi _{\mathcal{{H}}}(\{H \mid c(H) = k\}) = \frac{a}{{k}^{2}} \end{aligned}$$
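The normalisation constant is thus \(a = \frac{6}{\pi ^{2}} \approx 0.61\); for instance, the set of hypotheses of global complexity 2 receives the prior mass

$$\begin{aligned} \varPi _{\mathcal{{H}}}(\{H \mid c(H) = 2\}) = \frac{6}{\pi ^{2} \cdot 2^{2}} \approx 0.15 \end{aligned}$$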

Moreover, given a background knowledge B and a set of examples E, the likelihood of E is:

$$\begin{aligned} p(E \mid B,H) = \left\{ \begin{array}{ll} 1 &{} \text{ if } B,H \models E\\ 0 &{} \text{ else } \end{array} \right. \end{aligned}$$

According to Bayes’s theorem, the posterior is then given by:

$$\begin{aligned} p(H \mid B,E) = \frac{\varPi _{\mathcal{{H}}}(H) p(E \mid B,H)}{c} \end{aligned}$$

The denominator c is a normalisation constant. Therefore, for hypotheses consistent with E, the posterior \(p(H\mid B,E)\) is proportional to the prior \(\varPi _{\mathcal{{H}}}(H)\); it is zero otherwise. The MAP hypothesis is defined as \(H_{MAP} = \mathop {\mathrm {argmax}}_{H \in \mathcal {H}} p(H \mid B,E)\).
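To make the update concrete, here is a minimal Python sketch over a finite set of candidate hypotheses (the data structures and names are ours, not Metagol's; a consistency check stands in for the entailment test \(B,H \models E\), and the prior is left unnormalised since only ratios matter):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Hypothesis:
    name: str
    complexity: int                      # c(H) = l(H) + r(H)
    predict: Callable[[str], bool]       # H(e): label predicted for instance e

def prior(h: Hypothesis) -> float:
    # Unnormalised prior mass, proportional to 1 / c(H)^2.
    return 1.0 / h.complexity ** 2

def posterior(hyps: List[Hypothesis], examples: List[Tuple[str, bool]]) -> Dict[str, float]:
    # The likelihood is 1 iff H is consistent with every labelled example,
    # so the posterior is the renormalised prior restricted to the version space.
    version_space = [h for h in hyps if all(h.predict(e) == y for e, y in examples)]
    z = sum(prior(h) for h in version_space)
    return {h.name: prior(h) / z for h in version_space}

def map_hypothesis(hyps: List[Hypothesis], examples: List[Tuple[str, bool]]) -> str:
    # H_MAP = argmax_H p(H | B, E)
    post = posterior(hyps, examples)
    return max(post, key=post.get)
```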

3.5 Active Learning

A set of N instances is initially sampled from \(\mathcal {X}\). At each iteration \(m+1\), the active learner conducts an experiment in which it chooses the next instance \(e_{m+1}\) from this set and observes its label, returned by an oracle. This information helps to discriminate between the competing hypotheses built as described previously, since it rules out the proportion of the version space \(V_{m}\) that is not consistent with it. The shrinkage of the hypothesis space is measured by the ratio \(\frac{\varPi _\mathcal{{H}}( V_{m+1})}{\varPi _\mathcal{{H}}( V_{m})}\). We associate with each sampled instance e a probability p(e):

$$\begin{aligned} p(e) = \frac{\min \left( \varPi _\mathcal{{H}}(\{H \in V_{m} \mid H(e)=1\}),\varPi _\mathcal{{H}}(\{H \in V_{m} \mid H(e) = 0\})\right) }{\varPi _\mathcal{{H}}( V_{m})} \end{aligned}$$

This value represents the minimal reduction ratio of the version space induced by querying the instance e. Moreover, it was noted in [21] that, in general, the optimal query strategy is to select an instance covered by half of the version space. Indeed, no matter its true label, such a query would halve the version space. Therefore, the query strategy chosen is to select the instance \(e_{m+1}\) for which p(e) is closest to \(\frac{1}{2}\), that is, for which the binary entropy \(ent(p(e)) = -p(e)\log _{2}p(e) - (1-p(e))\log _{2}(1-p(e))\) is maximal:

$$\begin{aligned} e_{m+1} = \mathop {\mathrm {argmax}}\limits _{e} \; ent(p(e)) \end{aligned}$$

From an information-theoretic point of view, the entropy \(ent(p(e_{m+1}))\) is the expected information gain from the label of \(e_{m+1}\) [15]. In that case, the instance selected is the most informative from the learner’s point of view, since it is the most discriminative given the current version space. Note that active learning with an entropy-based query strategy can be used with different kinds of hypotheses and is therefore not specific to ILP.

3.6 Learning Protocol

The learning protocol is summarised in Figs. 3 and 4 and describes how the learner acquires information. First, a pool of N instances is randomly sampled. The training set is initialised with one randomly selected positive instance. At each iteration, a fresh set of K hypotheses consistent with the examples of the training set is sampled. The entropy of each instance from the pool is computed given the set of sampled hypotheses. The instance with maximum entropy is selected. An oracle provides its label, and it is added to the training set. This process is repeated until the maximum number of iterations is reached.
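A sketch of this protocol in Python (a hedged paraphrase of Fig. 4, not the actual implementation: positive_seed is the randomly chosen positive instance used for initialisation, oracle returns the true label of an instance, and sample_hypotheses and select_query are placeholders for the components described in Sect. 5):

```python
def active_learning(pool, positive_seed, oracle, sample_hypotheses, select_query,
                    max_iterations, K=50):
    training = [(positive_seed, True)]                    # one random positive instance
    hyps = []
    for _ in range(max_iterations):
        hyps = sample_hypotheses(training, K)             # fresh set of K consistent hypotheses
        labelled = {e for e, _ in training}
        unlabelled = [e for e in pool if e not in labelled]
        if not unlabelled:
            break
        e_next = select_query(unlabelled, hyps)           # maximum-entropy instance (Sect. 3.5)
        training.append((e_next, oracle(e_next)))         # the oracle provides its label
    return training, hyps
```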

Fig. 3. Diagram of the framework studied: active learning is integrated within the MIL framework

Fig. 4. Pseudo-code of the framework studied

4 Theoretical Analysis

We evaluate the instantaneous expected gain in entropy, which represents the expected reduction of the hypothesis space at a single iteration. We assume that the target hypothesis \(\bar{H}\) is drawn from \(\mathcal{{H}}\). Each instance e from \(\mathcal {X}\) is associated with a probability p(e). We consider an arbitrary probability distribution \(\mathcal {D}\) of these probabilities over \(\mathcal {X}\) at some iteration m; by construction, they are bounded in \([0, \frac{1}{2}]\).

Lemma 1

A set of N instances \(\{x_{1}, ..., x_{N}\}\) is randomly sampled from the instance space \(\mathcal{{X}}\). The active learner selects the instance \(x_{i}\) with maximum entropy among this sample set of size N. Then the probability of selecting an instance of maximal entropy on \(\mathcal {D}\) is, to first order, N times that of a passive learner under the same conditions.

Proof

Let \(\epsilon >0\) and let \(p_{\epsilon }\) be the value in \([0, \frac{1}{2}]\) such that an \(\epsilon \)-proportion of the instance space has a probability greater than or equal to \(p_{\epsilon }\). Let \(p(x_{i})\) denote the probability of the instance selected. Then \(p(x_{i})<p_{\epsilon }\) if and only if every instance from the sample set has a probability smaller than \(p_{\epsilon }\). Since the instances are sampled independently, this can be written as follows:

$$\begin{aligned} p(p(x_{i})<p_{\epsilon })=p(p(x_{1})<p_{\epsilon }, ... , p(x_{N})<p_{\epsilon }) = \prod _{k=1}^{N} p(p(x_{k})<p_{\epsilon }) = (1-\epsilon )^{N} \end{aligned}$$

Then, the probability for an active learner to select an instance with probability at least \(p_{\epsilon }\) is, from the binomial theorem:

$$\begin{aligned} p_{active}(p(x_{i})\ge p_{\epsilon }) = 1- (1-\epsilon )^{N} = 1 - \sum _{k=0}^{N}\left( {\begin{array}{c}N\\ k\end{array}}\right) (-\epsilon )^{k} = N \epsilon - o(\epsilon ) \end{aligned}$$

By comparison, the probability for a passive learner to select an instance with probability at least \(p_{\epsilon }\) simply is:

$$\begin{aligned} p_{passive}(p(x_{i})\ge p_{\epsilon }) = \epsilon \end{aligned}$$

Therefore, to first order in \(\epsilon \), the probability of selecting an instance of maximal entropy on \(\mathcal {D}\) is N times larger for the active learner.
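The first-order behaviour of Lemma 1 is easy to check numerically. The following toy Monte Carlo simulation (our own illustration, not part of the original analysis) draws the values p(x) uniformly in \([0, \frac{1}{2}]\) and compares the two learners:

```python
import random

def lemma1_check(trials=100_000, N=20, eps=0.01):
    # p(x) values are drawn uniformly in [0, 1/2], so the eps-proportion threshold
    # is p_eps = (1 - eps) / 2. Since ent(p) is monotone on [0, 1/2], the
    # max-entropy instance is simply the one with the largest p(x).
    p_eps = (1 - eps) / 2
    active = passive = 0
    for _ in range(trials):
        sample = [random.uniform(0.0, 0.5) for _ in range(N)]
        active += max(sample) >= p_eps      # active learner: max-entropy instance
        passive += sample[0] >= p_eps       # passive learner: a random instance
    return active / trials, passive / trials

# Expected values: about 1 - (1 - eps)**N ~ 0.18 for the active learner versus
# eps = 0.01 for the passive one, i.e. close to N times larger for small eps.
print(lemma1_check())
```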

5 Implementation

5.1 Sampling a Set of Hypotheses

To cope with very large or potentially infinite hypothesis spaces, a set of consistent hypotheses is sampled at each iteration. This sample set is used both to measure the accuracy and to evaluate the entropies.

We use a process called Regular Sampling [23] which limits the number of duplicates while maintaining a good sampling efficiency. A set of probability fractions \(p_{i}\) is generated from the first K integers, with the following two properties: it is evenly distributed in [0, 1], and for K infinite it is in bijection with \(\mathbb {N}\). Consistent hypotheses are ordered in SLD order within the derivation tree. Samples are selected from the tree leaves according to the probability fractions generated. At each node of the tree, the different branches are given equal weights and are associated with a cumulative posterior probability computed as the sum of the posterior probabilities of the hypotheses on the left side. Starting from the root node, we traverse the tree and select at each node the branch whose cumulative posterior probability interval [min, max] contains the sampling probability fraction \(p_{i}\). The fraction is then updated as \((p_{i} - min)/(max-min)\), and the process is repeated within the selected sub-tree. In the end, Regular Sampling reproduces sampling without replacement thanks to the distribution of the sequence of fractions, and provides a sample set representative of the current version space. A set of at most K hypotheses is dynamically sampled according to this process. The first K fractions and corresponding hypotheses are generated. If all of them are inconsistent with the examples selected so far, a new set of hypotheses is sampled from the next K natural numbers, and so forth until at least one consistent hypothesis is returned. After removing potential duplicates, this sample set is saved for evaluating the accuracy and the entropies.
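A sketch of the tree-descent step only, under our own reading of the interval selection described above (the generation of the regular sequence of fractions and the SLD ordering of hypotheses are as in [23]; here the derivation tree is a toy nested list):

```python
def descend(node, fraction):
    # `node` is either a leaf hypothesis or a list of (posterior_mass, subtree) pairs;
    # `fraction` is one of the regularly generated sampling fractions in [0, 1].
    while isinstance(node, list):
        total = sum(mass for mass, _ in node)
        lo, chosen = 0.0, node[-1][1]          # fall back to the last branch on rounding
        for mass, subtree in node:
            hi = lo + mass / total             # cumulative posterior interval [lo, hi]
            if fraction <= hi:
                fraction = (fraction - lo) / max(hi - lo, 1e-12)  # rescale into [0, 1]
                chosen = subtree
                break
            lo = hi
        node = chosen
    return node

# Toy usage: three leaf hypotheses with posterior masses 0.5, 0.3 and 0.2.
tree = [(0.5, "H1"), (0.3, "H2"), (0.2, "H3")]
print([descend(tree, f) for f in (0.1, 0.6, 0.9)])   # -> ['H1', 'H2', 'H3']
```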

5.2 Computing the Entropies

As a next step, the entropies are computed from the sampled hypotheses. For every instance initially sampled from the instance space, we compute the proportion of sampled hypotheses that predict a positive label, weighted according to the hypotheses’ prior distribution. The entropy is derived from this probability. Finally, the instance with the highest entropy is selected. If several instances reach the maximum, one of them is selected at random.
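This step can be sketched in Python as follows (reusing the Hypothesis and prior of the sketch in Sect. 3.4; this is an illustration of the computation described above, not the actual implementation):

```python
import random
from math import log2

def binary_entropy(p):
    # ent(p) = -p log2(p) - (1 - p) log2(1 - p), with ent(0) = ent(1) = 0.
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def select_query(unlabelled, hyps):
    # Prior-weighted proportion of sampled hypotheses predicting a positive label,
    # followed by maximum-entropy selection with random tie-breaking.
    total = sum(prior(h) for h in hyps)
    def p_positive(e):
        return sum(prior(h) for h in hyps if h.predict(e)) / total
    entropies = {e: binary_entropy(p_positive(e)) for e in unlabelled}
    best = max(entropies.values())
    return random.choice([e for e, v in entropies.items() if v == best])
```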

6 Experiments

6.1 Experimental Hypothesis

This section describes two experiments evaluating the benefits of Bayesian Meta-Interpretive Active Learning in terms of speed of convergence when learning efficient agent strategies. Thus, we investigate the following research hypothesis:

Research Hypothesis: Bayesian Meta-Interpretive Active Learning requires a smaller sample complexity for learning efficient agent strategies.

For the sake of comparison, we consider a passive learner which randomly selects one instance at each iteration. Therefore, we associate to the previous research hypothesis the following null hypothesis that we will test:

Null Hypothesis: Bayesian MIL Active Learning does not converge faster toward efficient strategies than Bayesian MIL passive learning.

Fig. 5. Example of target hypothesis learned and corresponding FSA, the parity grammar

6.2 Learning Regular Grammars

We learn regular languages, which are equivalent to deterministic Finite State Automata (FSA). Generally speaking, FSA represent sequences of actions depending on a sequence of events and an input state; thus, they are a compact way of representing strategies. We additionally require target grammars to have a generality \(g(H) = \varPi _{\mathcal{{X}}}(\{x \mid H(x) = 1\})\) between \(\frac{1}{3}\) and \(\frac{2}{3}\) such that the initial probability for an instance to be positive is \(\frac{1}{2}\) on average. This also ensures that trivial grammars are not considered. An example of a target hypothesis and its corresponding automaton is represented in Fig. 5: the parity grammar accepts any string with an even number of 0s.
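For concreteness, the parity acceptor of Fig. 5 can be written directly as a two-state automaton (an illustration of the target concept only, not of the learned Prolog clauses; the state names q0/q1 are ours):

```python
def parity_accepts(string: str) -> bool:
    # The parity grammar of Fig. 5: accept strings over {0, 1} that contain
    # an even number of 0s.
    state = "q0"                              # q0: even number of 0s so far (accepting)
    for symbol in string:
        if symbol == "0":
            state = "q1" if state == "q0" else "q0"
        # reading a 1 leaves the state unchanged
    return state == "q0"

assert parity_accepts("1001") and not parity_accepts("011")
```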

Materials and Methods. Target grammars are generated with Metagol, from a set of sequences regularly sampled from \(\varSigma ^{*}\) with \(\varSigma = \{0,1\}\). The number of states \(n \ge 3\) is generated according to an exponential decay distribution with mean 4. The generality of the hypothesis returned is measured against a set of 40 newly regularly sampled instances. These steps are repeated until a grammar with generality in \([\frac{1}{3}; \frac{2}{3}]\) is returned. A new number of states is similarly generated to bound the search space.

The metarules provided are those for acceptor/1 and delta/3 described in Fig. 2. The complexity of the hypotheses is set to their length l(H), and the prior is computed as \(\frac{1}{l(H)^{2}}\). For each target grammar, 150 training instances are initially regularly sampled from \(\varSigma ^{*}\): a threshold on the probability fraction used for sampling is randomly generated for each instance, so that their length is bounded. Another 50 instances are similarly sampled for testing.

At each iteration, 50 hypotheses are regularly sampled. The accuracy is measured as the average accuracy of all sampled hypotheses over the testing set. The results presented in Fig. 7 have been averaged over 50 trials.

Fig. 6. Target hypothesis describing a bee strategy for finding pollen in an environment

6.3 Learning a Bee Strategy

We learn the strategy introduced in Sect. 1, which describes how a bee finds pollen by following the information given by a waggle dance. The target strategy is represented in Fig. 6: until the bee reaches the flower, it flies in the direction given by the waggle dance, and then grabs pollen.

Materials and Methods. The world is a one-dimensional space of size 10. The state of the world is described by a list of facts: position/1, flower\(\_\)position/1 and hive\(\_\)position/1 describe respectively the bee, flower and hive positions, waggle\(\_\)dance/1 gives the direction indicated by the waggle dance, and energy/1 and weight/1 represent the energy left for the bee and the weight it carries. Actions are dyadic predicates that modify the state of the world. The primitive actions are move\(\_\)right/2, move\(\_\)left/2 and grab/2; each has a cost of one unit of energy. The metarules used are the chain rule and the two abstractions until/4 and ifthenelse/5 (Fig. 2). For any hypothesis H with length l(H), the resource complexity r(H) is measured against the examples selected so far, and the prior of H is defined as \(\frac{1}{l(H)+r(H)}\). The maximum length of an hypothesis is set to 3. The hive is located in the middle position of the environment. A flower is randomly positioned, and a waggle dance indicates whether it is east or west of the hive. In the initial state, the bee is at the hive with no pollen carried and some amount of energy randomly generated between 0 and 30. In the final state, it is on the flower with zero or one unit of pollen carried. Positive examples are pairs of states for which the task of finding pollen is fulfilled with a positive amount of energy in the final state. Negative examples are pairs of states for which the task is not fulfilled or which result in a negative amount of energy in the final state. Training and test sets are respectively made of 20 and 40 examples, half positive and half negative. The results presented in Fig. 8 have been averaged over 20 trials.
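The following Python sketch makes the setting concrete (a hypothetical encoding used only for illustration; the actual examples are Prolog facts with the predicate names given above):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class BeeState:
    position: int            # bee position in the one-dimensional world of size 10
    flower_position: int
    hive_position: int       # the hive is located in the middle of the environment
    waggle_dance: str        # direction of the flower indicated by the dance
    energy: int              # remaining energy; each primitive action costs one unit
    weight: int              # units of pollen carried

def move_right(s: BeeState) -> BeeState:
    return replace(s, position=s.position + 1, energy=s.energy - 1)

def move_left(s: BeeState) -> BeeState:
    return replace(s, position=s.position - 1, energy=s.energy - 1)

def grab(s: BeeState) -> BeeState:
    return replace(s, weight=s.weight + 1, energy=s.energy - 1)

def is_positive(initial: BeeState, final: BeeState) -> bool:
    # A pair of states is labelled positive when the task of finding pollen is
    # fulfilled and the final state has a positive amount of energy.
    return (final.position == final.flower_position
            and final.weight > initial.weight
            and final.energy > 0)
```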

Fig. 7. Learning a regular grammar with Bayesian MIL: comparison between active and passive learning; the convergence is faster for active learning.

Fig. 8. Learning a bee strategy with Bayesian MIL: comparison between active and passive learning; the convergence is faster for active learning.

6.4 Results and Discussion

The results are presented in Figs. 7 and 8. The learning process takes between 10 min and a couple of hours for the grammar experiment, depending on the complexity of the target hypothesis, and a few seconds per run for the bee experiment. The entropy (Figs. 7a and 8a) is smaller and less regular for passive learning, which is most visible for a small number of iterations (fewer than 10). In both cases the entropy globally decreases as the version space shrinks. The difference between the two curves represents the gain in the reduction of the version space. However, the entropy remains smaller than 1: the number of hypotheses is not halved at each iteration as in the ideal case, even for active learning. The number of sampled hypotheses is represented in Figs. 7b and 8b; it decreases with the number of iterations, eventually converging to one hypothesis. The decay rate gets smaller as the entropy drops, and is greater for active learning. The complexity of the MAP hypothesis (Figs. 7c and 8c) increases with the number of iterations both for passive and active learning. Indeed, the search is conducted such that the least complex hypotheses consistent with the examples are preferred; therefore, as the training set grows and simpler hypotheses are ruled out, the complexity of the MAP hypothesis increases and its prior decreases. Finally, the accuracy (Figs. 7d and 8d) increases, starting between 0.6 and 0.7 (the default accuracy is 0.5, and the learning process starts with one positive instance for initialisation). It eventually converges toward 1 for the bee experiment. The convergence is longer and not guaranteed at every run for the grammar experiment, since the hypothesis space is bounded: a maximum number of states is generated before the learning process. Figure 9 compares the number of queries required to reach a given accuracy level for active and passive learning; it suggests that fewer iterations are required to achieve good performance, and that experimental costs can be at least halved with active learning. A Mann-Whitney U test indicates that the results are significant at the 0.01 level.

According to Lemma 1, the size N of the pool of instances initially sampled should be large enough to ensure that instances with high entropy can always be found. Increasing this number raises the entropy of the selected instance and therefore accelerates convergence. The number K of hypotheses sampled should be large enough to ensure that the sample set is representative of the hypothesis space, and thus depends on the initial size of the version space. Even though these results seem encouraging, the empirical evaluation has been performed on a small domain and focuses on artificial problems. We plan to demonstrate the scalability of this approach on real-world datasets as future work.

Fig. 9. Number of iterations required to reach some accuracy level for active and passive learning: experimental costs can be halved with active learning.

7 Conclusion and Future Work

This article extends previous work on Meta-Interpretive Learning by integrating active learning for learning efficient agent strategies. We study how automated experimentation can help reduce the experimental cost required to reach a given accuracy level. In particular, we show on two examples that one can expect to halve the experimental costs with active learning compared to passive learning. We believe that this approach is of interest in several AI domains, such as robotics or agent-based modelling, and could have a wide range of applications.

A limitation of this article is the lack of theoretical bounds on the sample complexity, which we leave for future work. We also want to demonstrate the scalability of this approach by considering real-world datasets. A next step is to use this framework to uncover novel logic programs instead of known target hypotheses, to demonstrate that it supports scientific knowledge discovery. Another future line of work is to improve the sampling process over the instance space: so far, instances are sampled before learning starts, but a fresh sample set of instances could be generated at each iteration, with the sampling distribution updated given the knowledge acquired so far; instances could also be synthesised. Next, we want to study other query strategies. Finally, each experiment is so far the observation of a binary outcome for a particular set-up, which could be extended to probabilistic observations.