
2.1 Formal Description

The traditional data description presented in Chap. 1 corresponds to so-called single-instance learning, where each observation or learning object is described by a number of feature values and, possibly, an associated outcome. In our object of study, multiple-instance learning (MIL), the structure of the data is more complex. In this setting, a learning sample or object is called a bag. The defining feature of MIL is that a bag is associated with multiple instances or descriptions. Each instance is described by a feature vector, as we saw in single-instance learning, but an associated outcome is never reported. The only information available about an instance, aside from its feature values, is its membership relationship to a bag.

Formally, an instance x corresponds to a point in the instance space \(\mathbb {X}\). It is commonly assumed that \(\mathbb {X}\subseteq \mathbb {R}^{d}\), that is, each instance is described by a vector of d real-valued numbers, its feature values. However, as described in Sect. 1.1, datasets often contain mixed types of features. To model these situations, \(\mathbb {X}\) can be generalized to \(\mathbb {X}\subseteq \mathscr {A}^{d}=\mathscr {A}_{1}\times \cdots \times \mathscr {A}_{d}\), such that each instance is described by a d-dimensional vector, where each attribute \(\mathscr {A}_{i}\) \((i=1,\ldots ,d)\) takes on values from a finite or infinite set \(\mathscr {V}_{i}\). In this way, we can deal with mixed feature sets in which some of the features are categorical and others are numeric.

A bag X is a collection of n instances, where every instance \(x_{i}\) is drawn from the instance space \(\mathbb {X}\). Each bag is allowed to have a different size, which means that the value n can vary among the bags in the dataset. Multiple copies of the same instance can be included in a bag. For this reason, many authors define a bag as \(X\in \mathbb {N}^{\mathbb {X}}\), that is, a multi-set containing elements from \(\mathbb {X}\) such that duplicates can occur. Different bags are also allowed to overlap and contain copies of the same instance. This forms an indication of the higher level of complexity of MIL compared to single-instance learning. Throughout this work, we use lowercase letters to represent instances (e.g., x, a, b) and uppercase letters to represent bags (e.g., X, A, B).
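As an illustration of this multi-set structure, a multi-instance dataset can be stored as a list of variable-size bags. The feature values below are made up for the example:

```python
import numpy as np

# Each bag is a multi-set of d-dimensional instances: bag sizes may
# differ, a bag may contain duplicate instances, and different bags
# may share copies of the same instance.
bag_A = [np.array([0.1, 2.0, 5.3]),
         np.array([0.1, 2.0, 5.3]),   # duplicate within one bag
         np.array([1.4, 0.2, 7.7])]
bag_B = [np.array([1.4, 0.2, 7.7]),   # overlap with bag_A
         np.array([9.0, 3.1, 0.4])]

dataset = [bag_A, bag_B]
sizes = [len(bag) for bag in dataset]  # n varies per bag: [3, 2]
```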

As an example, Table 2.1 presents the general structure of a multi-instance dataset. The first column represents the bags, sometimes also referred to as exemplars. Each bag contains a number of instances, represented in the second column. Each instance identifier corresponds to a vector description, of which the attribute values are arranged from columns \(\mathscr {A}_{1}\) to \(\mathscr {A}_{d}\). The first instance \(x_{1,1}\) in the first bag \(X_{1}\) is, for example, represented by the feature vector \(\langle x_{1,1,1},x_{1,1,2},\ldots ,x_{1,1,d}\rangle \). The last column represents the outcome associated with the bag. It is important to stress that this outcome is only known for the bag as a whole and not for each individual instance. Depending on the learning task (see Sect. 2.3), the outcome may be a class label (classification) or a real value (regression). In clustering applications, no outcome values are available. We briefly note that the work of [11] showed that the performance of multi-instance learners on datasets with very similar meta-characteristics, like dimensionality and size, can be very different.

Table 2.1 Structure of a multi-instance dataset with M bags

2.2 Origin of MIL

The multi-instance learning paradigm was introduced in the seminal work of [16]. It arose in the context of learning tasks where data observations (bags) can have different alternative descriptions (instances). The authors of [16] focused on an application in biochemistry: the drug activity prediction problem. Here, the task is to predict whether or not a given molecule is a good drug molecule, which is measured by its ability to bind to a given target. Each molecule can be represented as a bag, of which the instances correspond to different conformations (molecular structures) of that particular compound. Figure 2.1 depicts this situation for a butane molecule. In this case, butane would be represented by a bag containing the 12 listed shapes as its instances.

Fig. 2.1 Conformations of a butane molecule

MIL emerged as an extension of supervised learning. The bag-instance relationship models the one-to-many relation characteristic of relational databases, since one bag can contain several different instances. More than an extension, MIL can therefore be considered a generalization of single-instance learning, and the latter can be understood as a special case of MIL where each bag contains a single instance. Moreover, MIL has proven to be a bridge between two different paradigms: propositional learning on the one hand and relational learning on the other.

2.2.1 Relationship with Propositional Learning

Propositional or attribute-value learning corresponds to the setting described in Sect. 1.1, where the training data is ordered in a single flat table. In single-instance semi-supervised learning (Sect. 1.3.3), only part of the instance outcomes are available, so it shows a certain similarity with MIL, where the outcomes are only known for the bags and not their instances. However, there is a fundamental difference between the two: the relationship between instances and bags in MIL does not exist in semi-supervised learning. In the latter, labeled instances are at the same level as unlabeled instances and there is no specific relationship between them. In MIL, on the contrary, a secondary structure is present in the dataset, defining the two different levels of bags and instances. All instances in a bag are somehow interrelated, because of their shared membership in the bag.

2.2.2 Relationship with Relational Learning

In relational learning, structured concept definitions are derived from structured training examples [14]. The training data models the different observations as well as the relations between them, for instance by using multiple tables. A clear example is given in [15], where the relational data is represented by two tables, one providing the description of store customers and the other the marital relations between them.

Many learning methods have been developed for propositional learning, but these can only be applied to data organized in a single table, and the relations between different observations cannot be taken into account. Propositional algorithms therefore cannot be directly applied to relational learning problems. Relational data can be transformed into an attribute-value table in a process called propositionalization, but this implies a steep computational cost and its application to real problems is limited as a result of an internal combinatorial explosion [47].

MIL has come to be considered the missing link between relational and propositional learning because, as stated above, the bag-instance relationship models a one-to-many relation. The contribution of [13] shows that multi-instance problems can also be considered a special case of inductive logic programming [37]. All inductive logic programming problems (in the form of relational databases) can be transformed by database join operations into a single one-to-many relationship. Such a relation can in turn be naturally represented as a MIL problem [47, 48]. As will be discussed in later chapters, many single-instance learning algorithms have already been adapted to the multi-instance setting. This feature of MIL allows many relational learning problems to be solved by traditional supervised learning methods.

2.3 MIL Paradigms

As in traditional single-instance learning, discussed in Sect. 1.3, we can distinguish between a number of learning tasks within MIL. In Sect. 2.3.1, we discuss the two supervised learning settings, classification and regression. Section 2.3.2 describes multi-instance clustering. Several other traditional learning tasks, like semi-supervised or multi-label learning, can find a corresponding MIL equivalent (e.g., [44, 82]). However, we must warn the reader that this general similarity between single-instance and multi-instance learning tasks cannot be transferred to their solution methods. Due to the relational nature of the data, MIL solution methods are inherently more complex. This also implies that some MIL tasks have no related single-instance setting. The most prominent example is presented in Sect. 2.3.3.

2.3.1 Multi-instance Classification and Regression

In a multi-instance classification problem, the goal is to determine the class label of new bags, based on the class labels in the training set or, more specifically, using a prediction model built on the labeled training bags. The outcome associated with the training bags is categorical.

More formally, in a classification problem, we deal with a training set \(D=\left( \mathbf {X},\mathbf {L}\right) \), where \(\mathbf {X}=\left\langle X_{1},\ldots ,X_{m}\right\rangle \) is a set of bags and \(\mathbf {L}=\left\langle \ell _{1},\ldots ,\ell _{m}\right\rangle \) a set of class labels, with \(\ell _{i}\in \mathbb {L}\) (\(i=1,\ldots ,m\)) and \(\mathbb {L}\) the finite set of all possible class labels. The bag \(X_{i}\) is assigned the class label \(\ell _{i}\). Recall that only the class labels of the bags are known and not those of the instances inside them. Later on in this work, we provide a detailed discussion on the contribution of the individual instances to the bag label. Traditionally, MIL has focused on two-class classification problems, dealing with one positive and one negative class. However, in general the number of classes can be larger, that is, \(\left| \mathbb {L}\right| \ge 2\). The classification objective is to find a function \(\mathscr {H}:\mathbb {N}^{\mathbb {X}}\rightarrow \mathbb {L}\) based on the training set D. This function is the classification model and is used to predict the class labels of new bags as accurately as possible. More details on multi-instance classification will be provided in Chap. 3.
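The classification model \(\mathscr {H}\) can be sketched as follows. The instance scorer below (distance to the centroid of instances from positive bags) and the max-based aggregation are illustrative assumptions on our part, not methods prescribed by the text; concrete instance-to-bag strategies are the subject of later chapters:

```python
import numpy as np

def train_instance_scorer(bags, labels):
    # Naive illustrative scorer: rank an instance by its (negated)
    # distance to the centroid of all instances from positive bags.
    pos = np.vstack([x for X, l in zip(bags, labels) if l == 1 for x in X])
    center = pos.mean(axis=0)
    return lambda x: -np.linalg.norm(x - center)

def classify_bag(score, bag, threshold):
    # H: aggregate instance scores into one bag label; the max rule
    # mirrors the classical assumption that a single sufficiently
    # positive instance makes the whole bag positive.
    return 1 if max(score(x) for x in bag) >= threshold else 0
```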

When the outcomes are known for all training bags, but they correspond to real values rather than class labels, we are dealing with a multi-instance regression problem. The data description is highly similar to the one for classification data. The main difference is that the bag class labels are replaced by numerical values, that is, \(\mathbb {L}\) corresponds to a range of values in \(\mathbb {R}\) rather than to a finite set. Multi-instance regression was proposed in [2, 46], independently at the same conference. This task is discussed further in Chap. 6.
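A minimal regression sketch, under the simplifying assumption (ours, for illustration) that a bag can be summarized by the mean of its instances before fitting an ordinary least-squares model; the dedicated methods of Chap. 6 avoid this crude collapse:

```python
import numpy as np

def bag_means(bags):
    # Collapse each bag (a 2-D array of instances) to its mean instance.
    return np.vstack([np.mean(X, axis=0) for X in bags])

def fit_linear(bags, y):
    M = bag_means(bags)
    A = np.hstack([M, np.ones((len(bags), 1))])  # add intercept column
    w, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return w

def predict(w, bag):
    m = np.mean(bag, axis=0)
    return float(np.append(m, 1.0) @ w)
```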

2.3.2 Multi-instance Clustering

As discussed in Sect. 1.3.2, clustering is situated in the unsupervised learning domain. The set of outcomes \(\mathbf {L}\) associated with the training bags \(\mathbf {X}\) in D is not known or not available. The goal is to group these unlabeled bags based on a given similarity measure. A multi-instance clustering method determines a set of groups \(\mathscr {G}=\{G_{1},\ldots ,G_{k}\}\) and a function \(\mathscr {H}:\mathbb {N}^{\mathbb {X}}\rightarrow \mathscr {G}\) that assigns bags to groups such that bags within the same group are as similar as possible and bags from different groups are as dissimilar as possible. The choice of an appropriate similarity measure is crucial in multi-instance clustering. As noted in [74], not all instances within a bag contribute equally to the bag prediction, which implies that the bags should ideally not be considered as collections of independent instances in the definition of the similarity metric.
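One common (though by no means the only) choice of bag-level dissimilarity is the Hausdorff distance between the instance sets of two bags; a sketch:

```python
import numpy as np

def hausdorff(A, B):
    # A, B: 2-D arrays of instances (one row per instance).
    # Pairwise Euclidean distances between every instance of A and B.
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    # Directed distances: how far the worst-matched instance of one
    # bag lies from its nearest neighbor in the other bag.
    return max(D.min(axis=1).max(), D.min(axis=0).max())
```

Such a bag-level distance can be plugged into any standard distance-based clustering algorithm (e.g., k-medoids or agglomerative clustering) to group bags.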

2.3.3 Instance Annotation

An important task in some MIL applications, one that has no counterpart in single-instance learning, is instance-level classification. In this setting, apart from predicting a class label for a new bag, the assignment of class labels to its instances is a key objective as well. Depending on the application, there are two possible cases.

In the first situation, given the training set \(D=\left( \mathbf {X},\mathbf {L}\right) \), the objective is to locate the instance or instances that are key to determining the class of the bag. In general, key instances are considered those that are more likely to have the same (hidden) label as their bag. A function \(h:\mathbb {X}\rightarrow \mathbb {L}\) is constructed, such that the corresponding aggregation function \(H\), applied as \(H\left( h\left( x_{1}\right) ,\ldots ,h\left( x_{n}\right) \right) \), can predict the class label of a new bag \(X=\left\{ x_{1},\ldots ,x_{n}\right\} \) with maximum possible accuracy. This learning strategy is employed by a large group of multi-instance classification algorithms, described in Chap. 4. Some applications require the identification of key instances not only to classify bags, but also because these instances are themselves relevant to the application (e.g., [30]). An example application where the identification of true positive instances is very informative is that of the stock selection problem [33]. In that setting, true positive instances correspond to stocks that fundamentally perform well, which is an important subgroup to discern from the other stocks.
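The pair \(h\) and \(H\) can be sketched as follows; the majority-vote aggregation and the key-instance criterion (instances predicted to share the bag label) are illustrative choices on our part, not the only ones used in the literature:

```python
def aggregate_bag_label(h, bag):
    # H: combine the instance-level predictions h(x) into one bag
    # label, here by majority vote over the instances.
    votes = [h(x) for x in bag]
    return max(set(votes), key=votes.count)

def key_instances(h, bag, bag_label):
    # Flag as "key" the instances predicted to carry the bag's label.
    return [x for x in bag if h(x) == bag_label]
```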

In the second case, the training set is represented as \(D=\left( \mathbf {X},\mathbf {L}\right) \), where \(\mathbf {X}=\left\langle X_{1},\ldots ,X_{m}\right\rangle \) are bags and \(\mathbf {L}=\left\langle \mathscr {L}_{1},\ldots ,\mathscr {L}_{m}\right\rangle \) are sets of instance labels associated with the bags. In this situation, the set \(\mathscr {L}_{i}=\left\{ \lambda _{1},\ldots ,\lambda _{k_{i}}\right\} \) of explicit instance labels is assigned to the bag \(X_{i}\). These labels are drawn from a set \(\varLambda =\left\{ \lambda _{1},\ldots ,\lambda _{s}\right\} \), which can be different from \(\mathbb {L}\). Unlike the traditional MIL approach, some instance labels are known for each bag. The objective is to find a function that, given a new bag, allows us to find the instance labels that best describe it. This setting is very popular in applications such as image annotation (e.g., [7]), where the annotation of image segments (instances) can result in a global label for the complete image (bag). Since one observation (bag) is associated with a set of (instance) labels, this approach shows some similarity with multi-label classification (Sect. 1.3.1). However, multi-label and multi-instance learning remain different paradigms. In the former, an observation corresponds to one instance associated with several labels, while the latter represents each observation by multiple instances and a single global class label.

Fig. 2.2 A full English breakfast

2.4 Applications of MIL

In MIL, a more complex structure of data observations can be represented. The multi-instance setting is required to model several real-world applications, which we list in this section. There is an inherent level of representation ambiguity in this type of problem, and we can distinguish between several sources. MIL data naturally arises in the following situations:

  • Alternative representations: different views, appearances or descriptions of the same object are available. A classical example in this case is that of drug activity prediction, the application for which MIL was originally developed in [16] (see also Sect. 2.2).

  • Compound objects: a compound object consists of several parts. In the example of image recognition, an image corresponds to a bag and each image segment forms an instance. An example is found in Fig. 2.2. The image segments can correspond to different breakfast components like the slice of toast, the sausage, the beans, and so on. Together, they form a full English breakfast.

  • Evolving objects: in these applications, an evolving object is sampled at different time intervals. This is also referred to as a time-series problem. The bag represents the object, while the time point samples are its instances. An example is the study around the use of MIL in bankruptcy prediction presented in [27].

The main research focus within the MIL community has been on multi-instance classification problems. A variety of application domains are listed in Sects. 2.4.1–2.4.6. In Sect. 2.4.7, we consider applications of multi-instance regression, while multi-instance clustering applications are discussed in Sect. 2.4.8.

2.4.1 Bioinformatics

We have already discussed the application of drug activity prediction in Sect. 2.2. Each bag corresponds to a molecule and its instances are the different molecular shapes, as shown in Fig. 2.1. The objective in the original MIL proposal [16] is the prediction of musky and non-musky molecules. Other drug activity problems concern the mutagenicity prediction of compound molecules [52] and activity prediction of molecules as anticancer agents [6]. Studies like [21, 33, 72, 80] address the drug activity prediction problem with their proposed multi-instance classifiers as well.

Another bioinformatics application of MIL is the protein identification task, like the recognition of Thioredoxin-fold proteins, as explored in, e.g., [45, 55, 59]. Binding proteins of the Calmodulin protein are identified in a multi-instance classification process in [36], while the application in [40] is the prediction of binding peptides for the highly polymorphic MHC class II molecules. In [29], multi-instance multi-label classification is used to automate the annotation of gene expression patterns. This method was evaluated on Drosophila melanogaster (fruit fly).

2.4.2 Image Classification and Retrieval

Another widely studied MIL application area is image classification, where the goal is, given an image, to decide what it represents or to which of a given set of categories it belongs. As an example, consider the early work of [34], which revolves around the classification of natural scene images, e.g., images of waterfalls. In the data representation, an image corresponds to a bag. The instances within this bag are subimages, encoded as templates describing color and spatial characteristics of that specific region. The subimages can be obtained by a partitioning process or, possibly more appropriately, an image segmentation procedure. In a perfect segmentation, the resulting regions correspond to individual objects. The classification objective is to predict what the complete image represents. If we consider Fig. 2.2, a multi-instance classifier should derive that it is processing an image of a full English breakfast based on the different objects on the plate. This type of region-based image categorization was also evaluated in [3, 9, 10, 24, 42], although not all of these referenced works developed multi-instance classification methods specific to image data. They often consider more general algorithms and evaluate them on a variety of applications. Multi-instance image datasets have indeed become popular benchmarks on which to evaluate new proposals. One specific type of image classification, facial recognition, where a bag of instances can represent images taken of the same person from different angles, was studied in, e.g., [8, 19].

More complex models for the mapping of images to multi-instance data were studied in later works. The method of [43] models the interrelations of instances (regions) in a bag (image) to improve the categorization process, while [25] considers image annotation by means of a joint multi-instance mapping and feature selection process. The recent proposal of [20] develops a multi-instance semi-supervised classification method based on sparse representation and evaluates it on image data.

A task related to image categorization is image retrieval. The aim in this case is to obtain images from a dataset that are semantically relevant to the user, based on a specified query or presented examples of images of interest. Multi-instance approaches to this challenge represent, as above, an image as a bag containing many of its subimages as instances. Examples can be found in, e.g., [7, 66, 71, 75–77].

2.4.3 Web Mining and Text Classification

Another application domain of MIL lies in web mining. The web index recommendation problem was introduced as a multi-instance problem in [81]. In this application, a bag corresponds to a web index page and its instances refer to other websites to which the page links. The recommendation task is to suggest relevant web pages to users based on their browser history. Such knowledge is useful for the construction of intelligent web browsers. This problem domain was also the central focus of [67, 69], in which genetic programming algorithms were developed to solve it. In [51], a multi-instance classifier based on the Rocchio classifier [49] was developed for this application.

A related task is that of document classification. In [3], the proposed multi-instance classification method is evaluated on a document categorization problem. In this case, a bag corresponds to a document and the instances are particular passages within that document. In the experiments of [45], the dataset obtained in the biomedical study of [5] is used. A bag corresponds to a biomedical article about a particular protein and the instances are the paragraphs of the text. A positive bag is one that can be labeled with a Gene Ontology code, while a negative bag cannot. The classification goal is to discern between positive and negative bags.

2.4.4 Object Detection and Tracking

This domain requires methods that discern an object of interest in image or video data. Examples are the application of the proposed multi-instance boosting method to horse detection and pedestrian detection in [1]. In [32], the detection of landmines based on radar images is studied in a multi-instance classification context. The study of [61] considers the related aspect of saliency detection, which is the detection of the object in the image that draws the visual attention, as humans focus more on some parts of pictures than on others. It is not known in advance what the object is, only that it draws the attention of the observer.

In an object tracking application, a specific object is followed during the course of a video sequence. Online methods have been proposed in, e.g., [4, 73]. In the recent contributions of [31, 83], online multi-instance boosting algorithms for visual object tracking problems are developed.

2.4.5 Medical Diagnosis and Imaging

Several studies on multi-instance data focus on applications within the medical domain. In [22], a multi-instance classification framework is developed for computer-aided medical diagnosis, like the detection of tumors. It is shown that the use of this framework significantly improves the diagnostic accuracy in the evaluated applications. The study of [53] concerns the automatic detection of myocardial infarction based on electrocardiography (ECG) recordings. For each patient, a 24-h ECG is taken, which traces his or her heart activity for a full day. Such a recording is too large to be interpreted by a cardiologist. Automated prediction tools are required to detect any heart abnormalities in the data. In the input data for the multi-instance classifier, a bag corresponds to a full ECG, while each instance represents a recorded heartbeat.

The proposal of [41] studies the early detection of illnesses, like frailty and dementia, in senior citizens. This is done in a noninterfering way, namely by using sensor data collected from a number of sensors monitoring elderly people in nursing homes. A bag consists of the 24 hourly sensor measurements (instances) taken in one day for a single patient. The label of a bag is determined based on the report made by a nurse for the patient on that particular day. It indicates whether the patient exhibited health problems (positive) or not (negative).

A fourth study [60] develops a multi-instance classification algorithm for the detection of colonic polyps, abnormal growths in the colon. It revolves around video classification. When a possible polyp is present in the colon, images of it are collected from several viewpoints and combined into a video. Each candidate polyp consequently corresponds to a bag. The different viewpoints or video frames are the instances. The prediction aim is to decide whether the videoed candidate is an actual polyp or not.

2.4.6 Other Classification Applications

In this final section on applications of multi-instance classification, we collect a number of miscellaneous applications that do not fall within any of the categories listed in the previous sections.

Multi-instance classification has been applied to the prediction of student performance [68]. This problem allows interesting relationships to be discovered that can suggest activities and resources to students and educators in order to support and improve the learning process. From the MIL perspective, each student is regarded as a bag representing the work carried out, composed of one or several instances, where each instance represents a different type of work that the student has done. This representation has shown better results than the traditional single-instance representation [68]. The work of [70] proposes a genetic programming model to solve this problem more efficiently.

The study of [35] proposes a method for automatic subgoal discovery in reinforcement learning [54]. The trajectory of an agent in a reinforcement learning process is encoded as a bag. The observations made along this trajectory are the instances. The bag label states whether the trajectory is successful or not, where the definition of success depends on the problem description.

Multi-instance classification has been applied to several computer-related tasks as well, for instance in the work of [50] that focused on computer security applications. Impending failure of computer hard drives is predicted in [38]. A bag corresponds to a single drive and its instances are observations of this drive taken at different time points. In [26], the quality of object-oriented software is estimated. A class hierarchy is transformed into a bag, containing the constituent classes as instances.

The proposed classification method of [33] was evaluated on a stock selection problem. In this work, each bag represents a month of trading. A positive bag contains the 100 stocks (instances) with the highest returns in that month, while a negative bag consists of the five stocks with the lowest returns.

The final classification application that we list is graph mining, the process of extracting knowledge from graph-structured data. Multi-graph learning is a further generalization of MIL, where every bag consists of several graphs. In MIL, all instances in the bags are drawn from the same feature space, but this is no longer the case in multi-graph learning. This area was the focus of the recent works [64, 65].

2.4.7 Regression Applications

Although to a lesser extent than for classification problems, we also encounter real-world applications of multi-instance regression. We collect these examples in this section.

The application referenced in one of the original proposals of multi-instance regression [46] is related to the drug activity prediction problem. Instead of treating this as a yes-or-no question, as done in the classification scenario, real-valued activity levels are estimated for the molecules. The second initial proposal on multi-instance regression [2] also interpreted drug activity prediction as a regression problem, where the binding strength of a molecule is the prediction objective. The theoretical study on multi-instance regression in [17] refers to the real-valued drug activity prediction problem as an important application as well. In [12], the authors develop a method to predict the binding affinity of molecules based on their three-dimensional structure. They evaluate their method on thermolysin inhibitors, dopamine agonists, and thrombin inhibitors. In later work, [56] considers the prediction of protein-ligand affinities and [18] the prediction of the binding affinity of MHC class II molecules.

The study of [23] uses a real-valued outcome in the interval [0, 1] to express the degree to which a bag satisfies the target concept. One of the evaluated applications is landmark recognition for robot vision. In a navigation assignment, robots are required to recognize whether or not they find themselves near one of a given set of landmarks.

Multi-instance regression has also been used in remote sensing applications. The contribution of [57] focuses on an agricultural process, namely the modeling of crop yield based on remote sensing data. A bag corresponds to one county in the United States. The instances in the bag are image pixels covering different parts of that county. The same application was evaluated in [58], where the authors developed a multi-instance regression method for structured data. In [62], a climate research application related to aerosols is considered. The prediction value is the so-called aerosol optical depth, which is a number related to the induced attenuation of radiation. This value characterizes aerosols and is central in the construction of climate models. Aerosols are globally monitored by satellites that provide data in the form of multi-spectral images. In this application, a bag corresponds to a set of neighboring pixels (instances) in such an image. The bag is labeled with an aerosol optical depth value. The two remote sensing applications, aerosol optical depth prediction and crop yield modeling, were also studied in [63].

Finally, we also list the multi-instance regression study of [39]. The authors develop a robust system for age estimation of a person based on an image of his or her face.

2.4.8 Clustering Applications

In this section, we review the applications for multi-instance clustering that have been presented in the literature. Recall that the goal of this learning paradigm is to arrange the bags in a number of well-separated groups of similar observations.

The proposal of [74] references an application in biochemistry. The execution of experiments to determine the functionality of specific molecules can be costly. Multi-instance clustering can be used in the often necessary step to derive the functionality of a molecule by identifying similar molecules with known characteristics. The method of [28] was evaluated on two types of clustering problems. The first one consists of enzyme data, where a bag corresponds to an enzyme and its instances to amino acid sequences. The second problem is the clustering of the molecules in the drug activity prediction datasets taken from [16].

In [78, 79], a multi-instance clustering method based on the maximum margin principle was proposed. It was evaluated on two separate applications. In image clustering, the method is used to detect common hidden concepts or patterns in images. As was done in the image classification applications listed in Sect. 2.4.2, the images correspond to bags and the instances are image segments. The second application is text clustering. In this case, a bag represents a document and is made up of (possibly overlapping) passages taken from that document.