1 Introduction

Very recently the world has faced the COVID-19 pandemic. COVID-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and its common symptoms are fever, dry cough, fatigue, shortness of breath, and loss of taste and smell. The first known case of this coronavirus disease was reported in Wuhan, China, in the last days of 2019 [35], and since then the virus has spread all over the world. The main sources of infection are asymptomatic but infected people, who can spread the virus unknowingly. Transmission occurs mainly through the air via respiratory droplets, but also indirectly, for example through contact with contaminated surfaces. On March 11, 2020 the World Health Organization (WHO) declared COVID-19 a pandemic. Lockdown measures and drastic restrictions on movement and social life affected the lives of billions of people. COVID-19 was the most significant global crisis since the Second World War, with repercussions that in many respects exceeded those of a war.

The predominant clinical manifestation of COVID-19 is interstitial pneumonia. The interstitium is the tissue located between the alveoli and the capillaries, and it is investigated mainly with radiological techniques. Radiological imaging is not a diagnostic criterion for SARS-CoV-2 infection, but it can highlight any associated pneumonia, which appears on the radiograph as an opacity called thickening. In the first four days (initial phase), the X-ray image shows blurred thickening in the lower part of the lungs. From the fifth to the eighth day the patient's clinical condition may worsen, with cough and increasing difficulty in breathing (worsening dyspnea): in this case the radiograph shows a greater extension of the pulmonary thickening, and the lungs, so to speak, appear whiter and whiter (Fig. 1).

Fig. 1

Normal and COVID-19 X-ray images: (a) normal case, (b) COVID-19 at an early stage, (c) COVID-19 at an advanced stage [25]

Compared with computed tomography, X-ray imaging is cheaper and easier to perform: this simple and widely available test can reveal a great deal about the patient's clinical status, including whether a COVID-19 patient requires hospitalization for mechanical ventilation or intubation. Finally, a last but non-negligible aspect is that X-ray machines, including portable ones, are much more widely available in poor and developing countries.

Table 1 summarizes the main clinical features of COVID-19 that can be detected on chest X-ray radiographs, together with the major pros and cons of this imaging technique.

Table 1 Clinical COVID-19 features of chest X-ray radiography: main advantages and disadvantages

As noted in [51], artificial intelligence systems now play a very important role in supporting early diagnosis, illness evaluation and treatment response assessment for different diseases. As a consequence, we believe that combining medical imaging with more sophisticated machine learning systems can also effectively support the diagnosis and follow-up of patients with COVID-19.

Since 2020, several works have been published on COVID-19 detection by means of chest X-ray image classification. Most of them are convolutional neural network (CNN) approaches or, more generally, deep learning techniques. One of the first papers in this field is [27], where two algorithms were presented: a deep neural network applied to fractal features of the images, and CNN methods applied directly to the images, the latter providing an accuracy of 93.2% in discriminating between COVID-19 and normal X-ray images.

In [4] an enhanced dense convolutional network (DenseNet) was proposed, while in [7] a new approach based on existing deep learning models was used, focusing on improving the pre-processing stage. In [37] the authors proposed a method based on an optimized robust CNN architecture, while in [45] a linear regression approach was designed in addition to a deep CNN, the former providing an accuracy of 97.6% in discriminating between healthy and COVID-19 patients. In [46], a combination of CNN, support vector machine (SVM) and Sobel filter was applied to 1332 images, obtained by data augmentation from 333 original images (77 images of COVID-19 patients and 256 images of normal subjects). Three augmentation strategies (rotation, random noise and horizontal flips) were also adopted in [26], while in [30] a fuzzy logic approach based on deep learning was proposed to differentiate between images of patients with COVID-19 pneumonia and images of patients with interstitial pneumonias not related to COVID-19, obtaining an accuracy of 80.9%. In [31], an evolutionary deep learning approach was designed to discriminate between COVID-19 and healthy patients, obtaining an accuracy of 98.57%. In [32] a stacked ensemble of four heterogeneous pre-trained computer vision deep learning models was presented, while in [38] a new deep learning framework was designed, based on the fusion of a dense convolutional network and a capsule neural network. In [17], a new pre-processing model for COVID-19 images was analyzed and features were extracted from the RGB values. In [42] a two-stage screening system was developed to differentiate among COVID-19, common viral pneumonia, bacterial pneumonia and normal chest X-ray images. The first stage is a pre-processing phase involving bone suppression and lung segmentation models, the former also investigated in [43]. The second stage is the classification task, based on a CNN system.

More recently, in [29] a deep learning method based on a custom CNN was proposed, using dropout and batch normalization to improve performance and reduce overfitting. This approach achieved a classification accuracy of 98.19% in discriminating among COVID-19, normal and common viral pneumonia images. In [48] a CNN model was enhanced with a feature fusion strategy on multi-modal imaging datasets, and an SVM classifier was employed to discriminate between COVID-19 and healthy people, achieving an accuracy of 98.7%. In [22], after a pre-processing phase, pre-trained CNN models are used for classification, achieving an accuracy of 95% in the binary case of normal versus COVID-19 X-ray images, while in [47] a densely attention mechanism-based network was proposed.

Other more general works on deep learning methods for COVID-19 X-ray classification are [33, 34, 49], while a survey on applying machine learning techniques in this field is reported in [19].

In this paper we present a comparative study of some instance-level Multiple Instance Learning (MIL) techniques applied to COVID-19 detection by means of chest X-ray images. As the above literature review shows, most of the works in this field are deep learning models based on neural networks. Although neural network approaches usually work well, they generally require a lot of data in the learning phase and are often uninterpretable, where by interpretability we mean, following [36], the degree to which an observer can understand the cause of a decision. By contrast, as we will see in the next section, MIL approaches (especially the instance-level ones) are easily interpretable and therefore well suited to image classification: once the discrimination task has been performed, it is not difficult to explain why an image has been classified as positive or negative. Moreover, even though systematic screening for COVID-19 is no longer carried out in most countries, we believe that this work is still relevant, because it offers clinicians, if needed, a new rapid and automatic approach to discriminate between different types of pneumonia.

The paper is organized as follows. In the next section we recall the main concepts of the MIL paradigm. In Sect. 3 we describe some recent linear-type MIL techniques, which we have adopted in the numerical experiments detailed in Sect. 4, aimed at discriminating between COVID-19 and common viral pneumonia chest X-ray images. Finally, some conclusions are drawn in Sect. 5.

2 Multiple instance learning

Multiple Instance Learning (MIL) [28] is a technique for classifying sets of points: such sets are called bags, and the points inside them are called instances. Compared with standard supervised classification, the main characteristic of MIL is that in the learning phase only the class labels of the bags are known, whereas the class labels of the instances are unknown.

MIL techniques are applied in many contexts, such as bankruptcy prediction, image classification, text classification and speaker identification. In particular, the first MIL problem encountered in the literature [21] concerned the classification of drug molecules (bags) on the basis of the possible three-dimensional conformations (instances) they can assume.

We focus on MIL classification with two classes of bags (positive and negative) and two classes of instances (positive and negative), using the so-called standard MIL assumption, which considers a bag positive if it contains at least one positive instance and negative if it contains only negative instances. This assumption fits diagnostic imaging very well (see [39]): a patient is classified as non-healthy (positive) if his/her medical image (bag) contains at least one abnormal subregion, and as healthy (negative) if all the subregions forming his/her medical image are normal.
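To make the standard MIL assumption concrete, the following minimal Python sketch (not taken from the paper; the function names are illustrative) computes a bag label from instance labels in {-1, +1} and also returns the positive instances, which in the imaging setting are the subregions responsible for a positive classification.

```python
import numpy as np

def bag_label(instance_labels):
    """Standard MIL assumption: +1 if at least one instance is +1, else -1."""
    return 1 if np.any(np.asarray(instance_labels) == 1) else -1

def positive_instances(instance_labels):
    """Indices of the instances labelled +1 (e.g. abnormal subregions)."""
    return np.flatnonzero(np.asarray(instance_labels) == 1).tolist()

# Hypothetical image split into 4 subregions, only the third one abnormal.
labels = [-1, -1, 1, -1]
print(bag_label(labels), positive_instances(labels))  # 1 [2]
```

Returning the indices alongside the bag label is what gives instance-level methods their interpretability: the same computation that decides the diagnosis also points to where the abnormality is.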

In the literature there are mainly three kinds of approaches for solving a MIL problem (see [5, 18]). The first is the bag-level approach, where each bag is treated as a global entity. The second is the instance-level approach, where the classification is performed in the instance space and the class label of each bag is obtained by aggregating the class labels of its instances. The third approach is a compromise between the previous two: each bag is represented by one of its instances, which is then used to perform the classification.

Some recent MIL works in diagnostic imaging are [11, 15, 16, 50]. In particular, in [11] a MIL approach was adopted for melanoma detection on clinical data consisting of color dermoscopic images, with the aim of discriminating between melanoma (positive bags) and common nevus (negative bags) images. Extensive numerical experiments on a dataset of 80 melanoma and 80 common nevus images provided an accuracy of 92.5%, with sensitivity and specificity equal to 97.5% and 87.5%, respectively. These results encourage investigating the use of MIL approaches also for COVID-19 detection from chest X-ray images, which is the scope of our study. We focus on discriminating between COVID-19 and common viral pneumonia patients, which is not an easy task because the X-ray images of the two classes are similar.

The MIL techniques that we adopt to this end are described in the next section and belong to the instance-level class. All of them are designed to satisfy the standard MIL assumption introduced above. As mentioned in the introduction, instance-level MIL techniques are particularly well suited to diagnostic imaging, especially in terms of interpretability: an instance-level approach assigns a label to each instance inside the bags, and a bag is classified as positive if at least one of its instances is classified as positive. Since in the MIL perspective the images are the bags and the subregions forming the images are the corresponding instances, this criterion, based on the standard MIL assumption, allows physicians to identify the abnormal subregions in a positive X-ray image.

3 Some recent MIL linear type approaches

In this section we describe some recent MIL techniques used in our comparative study, aimed at discriminating between COVID-19 and viral pneumonia X-ray images. All these approaches provide a linear-type separation of the bags, starting from the standard MIL assumption. In particular, while three of them (MIL-RL, mi-SPSVM and MIL-kink) provide a separating hyperplane, the fourth (MI-POLY) separates the positive and the negative bags by constructing a polyhedron, based on a prefixed finite number of hyperplanes.

We use the following notation. We denote by \(u^Tv\) the inner product between two n-dimensional real vectors \(u, v \in \mathbb {R}^n\) and by \(\Vert w\Vert \) the Euclidean norm of vector \(w\in \mathbb {R}^n\). We indicate by m the number of positive bags, by k the number of negative bags and by \(x_j\in \mathbb {R}^n\) a generic instance characterized by n features. Finally, we denote by \(J_i^+\), for \(i=1,\ldots ,m\), the index set corresponding to the instances of the i-th positive bag and by \(J_i^-\), for \(i=1,\ldots ,k\), the index set corresponding to the instances of the i-th negative bag.

3.1 MIL-RL

Algorithm MIL-RL [9] is an instance-level MIL technique, providing a separation hyperplane of the type:

$$\begin{aligned} H(w,b) {\mathop {=}\limits ^{\triangle }}\{x\in \mathbb {R}^n ~| ~w^{T}x + b = 0\},\end{aligned}$$
(1)

where \(w \in \mathbb {R}^n\) is the normal to the hyperplane and \(b\in \mathbb {R}\) is the bias. The algorithm is based on a heuristic solution to the following SVM type model introduced in [6]:

$$\begin{aligned} \left\{ \begin{array}{lll} \displaystyle \min _{y,w,b} & \displaystyle \frac{1}{2}\Vert w\Vert ^2& +C \displaystyle \sum _{i=1}^m\displaystyle \sum _{j\in J^+_i} \displaystyle \max \{0, 1-y_j(w^T x_j+b)\}\\ & & +C \displaystyle \sum _{i=1}^k\displaystyle \sum _{j\in J^-_i} \displaystyle \max \{0,1+(w^T x_j+b)\}\\ \text{s.t.} & & \displaystyle \sum _{j\in J^+_i} \displaystyle \frac{y_j+1}{2}\ge 1 \quad i=1,\ldots ,m \\ & & y_j \in \{-1,+1\} \quad j\in J^+_i, \quad i=1,\ldots ,m \end{array}\right. \end{aligned}$$
(2)

where the unknowns \(y_j\) represent the class labels to be assigned to the instances of the positive bags. As in the standard SVM approach, the positive parameter C tunes the weight between the maximization of the margin, obtained by minimizing the Euclidean norm of w, and the minimization of the misclassification errors of the instances, given by the second and the third term of the objective function. Finally, the constraints

$$\begin{aligned} \displaystyle \sum _{j\in J^+_i} \displaystyle \frac{y_j+1}{2}\ge 1 \quad i=1,\ldots ,m \end{aligned}$$
(3)

impose that, for each positive bag, at least one instance should be positive (that is with label equal to +1).

Note that, when \(m=k=1\) and \(y_j=+1\) for all j, problem (2) reduces to the classical SVM quadratic program.

MIL-RL is based on solving successive Lagrangian relaxations of (2), obtained by relaxing constraints (3). In [9] it has been shown that, considering the Lagrangian dual of (2), there is no duality gap between the primal and dual objective functions at the optimal solution.
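As an illustration of model (2), the sketch below evaluates its objective for a fixed hyperplane (w, b) and a fixed choice of the labels \(y_j\), returning infinity when constraint (3) is violated. It is a hypothetical helper for checking candidate solutions under the stated data layout (bags as NumPy arrays, one instance per row), not the Lagrangian relaxation scheme of MIL-RL [9].

```python
import numpy as np

def objective_2(w, b, pos_bags, neg_bags, y, C=1.0):
    """Objective of model (2) for fixed (w, b) and labels y.

    pos_bags, neg_bags: lists of (instances x features) arrays;
    y: one array of +/-1 labels per positive bag.
    Returns inf when constraint (3) is violated."""
    # Constraint (3): at least one instance labelled +1 in every positive bag.
    if any(not np.any(np.asarray(yi) == 1) for yi in y):
        return np.inf
    value = 0.5 * w @ w                                  # margin term
    for X, yi in zip(pos_bags, y):                       # positive-bag instances
        value += C * np.maximum(0.0, 1 - np.asarray(yi) * (X @ w + b)).sum()
    for X in neg_bags:                                   # negative-bag instances
        value += C * np.maximum(0.0, 1 + (X @ w + b)).sum()
    return value

w, b = np.array([1.0, 0.0]), 0.0
pos = [np.array([[2.0, 0.0], [-2.0, 0.0]])]
neg = [np.array([[-3.0, 0.0]])]
print(objective_2(w, b, pos, neg, [np.array([1, 1])]))  # 3.5
```

Labelling the second positive instance as +1 costs a hinge error of 3; relabelling it as -1 (still allowed by (3), since the first instance stays positive) removes that error, which is exactly the freedom the unknowns \(y_j\) give the model.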

3.2 mi-SPSVM

Algorithm mi-SPSVM, recently introduced in [13, 14], combines the good properties exhibited in supervised classification by the SVM technique, in terms of accuracy, and by the proximal support vector machine (PSVM) approach [24], in terms of computational complexity. At each iteration, it computes a separating hyperplane of the type (1) by solving the following quadratic problem:

$$\begin{aligned} & \displaystyle \min _{w,b} \displaystyle \frac{1}{2}\left\| \begin{array} {c}w \\ b \end{array}\right\| ^2 + \displaystyle \frac{C}{2} \displaystyle \sum _{j \in J^+} [1-(w^T x_j+b)]^2 \nonumber \\ & \quad + C \displaystyle \sum _{j\in J^-} \displaystyle \max \{0,1+(w^T x_j+b)\}, \end{aligned}$$
(4)

as the sets \(J^+\) and \(J^-\), which contain the indexes of the instances currently considered positive and negative, respectively, are updated. In the separable case, while the third term of problem (4) makes the hyperplane

$$\begin{aligned} H^-(w,b) {\mathop {=}\limits ^{\triangle }}\{x\in \mathbb {R}^n ~| ~w^{T}x + b = -1\}\end{aligned}$$
(5)

a supporting hyperplane for the instances indexed by \(J^-\) (as in the standard SVM), the second term maximizes the proximity of the positive instances (indexed by \(J^+\)) around the hyperplane

$$\begin{aligned} H^+(w,b) {\mathop {=}\limits ^{\triangle }}\{x\in \mathbb {R}^n ~| ~w^{T}x + b = 1\}.\end{aligned}$$
(6)

Moreover, unlike SVM, in the PSVM approach the maximization of the margin is obtained by including also the bias b in the norm, which makes problem (4) strictly convex also with respect to b.

In mi-SPSVM, the sets \(J^+\) and \(J^-\) are initialized by inserting in \(J^+\) the indexes of all the instances of the positive bags and in \(J^-\) the indexes of all the instances of the negative bags. Once an optimal solution \((w^*,b^*)\) to problem (4) has been computed, the two index sets \(J^+\) and \(J^-\) are updated as follows:

$$\begin{aligned} J^+:=J^+\setminus \bar{J} \quad \quad \text{ and } \quad \quad J^-:=J^-\cup \bar{J} \end{aligned}$$

where

$$\begin{aligned} \bar{J} = \{j\in J^+\setminus J^*\;|\;w^{*T}x_j + b^{*}\le -1\}, \end{aligned}$$

with

$$\begin{aligned} J^*=\{j^*_i,\; i=1,\ldots ,m \;|\;w^{*T}x_{j^*_i} + b^{*}\le -1\} \end{aligned}$$

and

$$\begin{aligned} j^*_i {\mathop {=}\limits ^{\triangle }}\arg \max _{j\in (J^+_i\cap J^+)} \{w^{*T}x_j+b^{*}\}. \end{aligned}$$

Note that a particular role in the definition of the set \(\bar{J}\) is played by the set \(J^*\), introduced to take into account constraints (3), which impose the standard MIL assumption. At the current iteration, \(J^*\) (a subset of \(J^+\)) contains, for each positive bag, the index of the instance closest to the current hyperplane \(H(w^*,b^*)\) among those still in \(J^+\), provided that this instance lies in the negative side, on or beyond the hyperplane \(H^-(w^*,b^*)\). If such an index, say \(j^*_i \in J^*\), entered the set \(J^-\), all the instances of the i-th positive bag would be considered negative by problem (4), leading to a violation of the standard MIL assumption. This is why the indexes in \(J^*\) are prevented from entering the set \(J^-\): in this way, for each positive bag, at least one index corresponding to one of its instances is guaranteed to remain in \(J^+\).
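The index-set update described above can be sketched in Python as follows, assuming that an optimal solution \((w^*,b^*)\) of problem (4) is already available (the quadratic solver itself is not shown). Function and argument names are illustrative, and this is not the implementation of [13, 14].

```python
import numpy as np

def update_index_sets(w, b, X, pos_bags, J_plus, J_minus):
    """One update of J+ and J- after solving problem (4).

    X: matrix with one instance per row; pos_bags: list of index lists J_i^+;
    J_plus, J_minus: Python sets of instance indexes."""
    score = X @ w + b
    # J*: for each positive bag, the instance still in J+ with the largest
    # score, kept only if that score is <= -1; it is never moved to J-.
    J_star = set()
    for J_i in pos_bags:
        candidates = [j for j in J_i if j in J_plus]
        if candidates:
            j_best = max(candidates, key=lambda j: score[j])
            if score[j_best] <= -1:
                J_star.add(j_best)
    # J-bar: indexes of J+ outside J* lying on or beyond H^-(w, b).
    J_bar = {j for j in J_plus - J_star if score[j] <= -1}
    return J_plus - J_bar, J_minus | J_bar

X = np.array([[2.0, 0.0], [-3.0, 0.0], [-4.0, 0.0], [-5.0, 0.0], [-2.0, 0.0]])
J_plus, J_minus = update_index_sets(np.array([1.0, 0.0]), 0.0, X,
                                    [[0, 1], [2, 3]], {0, 1, 2, 3}, {4})
print(sorted(J_plus), sorted(J_minus))  # [0, 2] [1, 3, 4]
```

In the example, the second positive bag has all its scores below -1: index 2 (its largest score) is protected by \(J^*\) and stays in \(J^+\), while index 3 moves to \(J^-\), so the bag still contributes one instance treated as positive.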

3.3 MIL-kink

This approach, described in detail in [23], provides a separating hyperplane of the type (1) by heuristically minimizing the following error function:

$$\begin{aligned} & \displaystyle \sum _{i =1}^m \max \left\{ 0, \min _{j \in J^+_i}\{1-(w^Tx_j+b)\}\right\} \nonumber \\ & \qquad \qquad +\displaystyle \sum _{i=1}^k \displaystyle \sum _{j \in J_i^-}\max \left\{ 0, 1+(w^Tx_j+b) \right\} , \end{aligned}$$
(7)

which is easily derived by taking into account the standard MIL assumption. In fact a positive bag, indexed by \(J_i^+\), is correctly classified if

$$\begin{aligned} w^Tx_j +b \ge 1, \quad \text{ for } \text{ at } \text{ least } \text{ one }\,\,\,j \in J^+_i, \end{aligned}$$

that is if

$$\begin{aligned} 1-(w^Tx_j+b) \le 0, \quad \text{ for } \text{ at } \text{ least } \text{ one }\,\,\,j \in J^+_i. \end{aligned}$$

As a consequence the bag is misclassified if

$$\begin{aligned} 1-(w^Tx_j+b) > 0, \quad \text{ for } \text{ each }\,\,\,j \in J^+_i, \end{aligned}$$

that is if

$$\begin{aligned} \min _{j \in J^+_i}\{1-(w^Tx_j+b)\}>0. \end{aligned}$$

On the other hand, a negative bag, indexed by \(J_i^-\), is correctly classified when

$$\begin{aligned} w^Tx_j +b \le -1, \quad \text{ for } \text{ each } j \in J^-_i, \end{aligned}$$

that is when

$$\begin{aligned} \max _{j \in J^-_i} \{1+(w^Tx_j+b)\}\le 0, \end{aligned}$$

and, consequently, it is misclassified when

$$\begin{aligned} \max _{j \in J^-_i} \{1+(w^Tx_j+b)\}> 0. \end{aligned}$$

Function (7) is very difficult to minimize, since it is nonconvex and nonsmooth. For this reason, the authors of [23] proposed a very fast heuristic approach, based on computing the optimal value of b for a prefixed value of w (judiciously chosen in advance) and on simply exploring the kink points of (7).

In [23] a variant of the algorithm was also proposed, on the basis of a simple modification of (7).
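For a fixed w, function (7) is convex and piecewise linear in b: each positive bag contributes a kink at \(b = 1-\max_{j\in J_i^+} w^Tx_j\) and each negative instance a kink at \(b=-1-w^Tx_j\), so (when both kinds of bags are present) a minimizer over b lies at one of these points. The Python sketch below simply evaluates (7) at every kink; it assumes w is given and does not reproduce how [23] chooses w, nor its variant of (7).

```python
import numpy as np

def error_7(w, b, pos_bags, neg_bags):
    """Error function (7) for the hyperplane (w, b)."""
    err = sum(max(0.0, np.min(1 - (X @ w + b))) for X in pos_bags)
    err += sum(np.maximum(0.0, 1 + (X @ w + b)).sum() for X in neg_bags)
    return float(err)

def best_bias(w, pos_bags, neg_bags):
    """For fixed w, return the kink point b minimising (7) and its value."""
    kinks = [1 - np.max(X @ w) for X in pos_bags]          # positive bags
    kinks += [-1 - s for X in neg_bags for s in X @ w]     # negative instances
    values = [error_7(w, b, pos_bags, neg_bags) for b in kinks]
    i = int(np.argmin(values))
    return float(kinks[i]), values[i]

w = np.array([1.0])
pos = [np.array([[3.0], [-1.0]])]
neg = [np.array([[-2.0]]), np.array([[-4.0]])]
print(best_bias(w, pos, neg))  # (-2.0, 0.0)
```

With b = -2 the positive bag is correctly classified through its first instance and both negative bags lie on or beyond \(H^-\), so the error is zero. Exhaustive evaluation costs one pass over the data per kink; a sorted sweep would be faster but is omitted here for clarity.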

3.4 MI-POLY

Unlike the above MIL approaches, which provide a single separating hyperplane, Algorithm MI-POLY [8] is based on the concept of polyhedral separability, that is, on generating a separating polyhedron by means of a finite number, say \(h>1\), of hyperplanes. Before recalling the MIL model proposed in [8], we report the basic definition of polyhedral separability for supervised learning.

Let

$$\begin{aligned} \mathcal{P}=\{p_1,\ldots ,p_r\}, \; \text{ with } p_j\in \mathbb {R}^n,\,j=1,\ldots ,r \end{aligned}$$

and

$$\begin{aligned} \mathcal{Q}=\{q_1,\ldots ,q_s\}, \; \text{ with } q_j\in \mathbb {R}^n, \,j=1,\ldots ,s, \end{aligned}$$

be two disjoint point sets. They are polyhedrally separable [12] if and only if there exists a finite number h of hyperplanes

$$\begin{aligned} H_t(w_t,b_t){\mathop {=}\limits ^{\triangle }}\{x\in \mathbb {R}^n \;|\;w_t^Tx+ b_t=0 \}, \end{aligned}$$

with \(w_t\in \mathbb {R}^n\) and \(b_t \in \mathbb {R}\), for \(t=1,\ldots ,h\), such that, for all \(j=1,\ldots ,r\),

$$\begin{aligned} w_t^Tp_j+b_t\le -1, \text{ for } \text{ all } t=1,\ldots ,h \end{aligned}$$

and, for all \(j=1,\ldots ,s\),

$$\begin{aligned} w_t^Tq_j+b_t\ge 1, \text{ for } \text{ at } \text{ least } \text{ an } \text{ index } t\in \{1,\ldots ,h\}. \end{aligned}$$

According to the above definition, the two sets \(\mathcal{P}\) and \(\mathcal{Q}\) are polyhedrally separable if there exists a polyhedron generated by a finite number h of hyperplanes such that all the points of \(\mathcal{P}\) are inside the polyhedron and all the points of \(\mathcal{Q}\) are outside.
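The definition above translates directly into a check on given hyperplanes. In the minimal Python sketch below (the function name and data layout are illustrative assumptions), the h hyperplanes are stored as the rows of a matrix W with a bias vector b.

```python
import numpy as np

def polyhedrally_separated(W, b, P, Q):
    """Check the definition above for h given hyperplanes.

    W: (h, n) array whose rows are w_t; b: (h,) array of biases b_t;
    P, Q: arrays with one point per row."""
    inside = np.all(P @ W.T + b <= -1, axis=1)   # w_t^T p + b_t <= -1 for all t
    outside = np.any(Q @ W.T + b >= 1, axis=1)   # w_t^T q + b_t >= 1 for some t
    return bool(inside.all() and outside.all())

# Square polyhedron |x1| <= 1, |x2| <= 1 described by h = 4 hyperplanes.
W = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.full(4, -2.0)
print(polyhedrally_separated(W, b, np.array([[0.0, 0.0], [0.5, -0.5]]),
                             np.array([[4.0, 0.0], [0.0, -5.0]])))  # True
```

The asymmetry of the definition is visible in the code: points of \(\mathcal{P}\) must satisfy all h inequalities (`np.all`), while points of \(\mathcal{Q}\) need to violate just one (`np.any`).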

To extend the supervised polyhedral separation to MIL, in [8] the authors proposed to heuristically solve the following nonsmooth nonconvex optimization problem, by means of DC (Difference of Convex functions) techniques:

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \min _{y,w,b} & \displaystyle \frac{1}{2}\sum _{t=1}^h\Vert w_t\Vert ^2+ C\sum _{i=1}^{m}\sum _{j\in J^+_i}\max \{0,1+y_j\max _{1\le t\le h}(w_t^T x_j +b_t)\}\\ & \displaystyle +C\sum _{i=1}^{k}\sum _{j\in J^-_i}\max \{0,1-\max _{1\le t\le h}(w_t^Tx_j+ b_t)\}\\ \text{s.t.} & \displaystyle \sum _{j\in J^+_i} \frac{y_j+1}{2}\ge 1 \quad i=1,\ldots ,m \\ & y_j \in \{-1,+1\} \quad j\in J^+_i, \quad i=1,\ldots ,m \end{array}\right. \end{aligned}$$
(8)

Problem (8) exploits the standard MIL assumption by imposing that, for each positive bag, at least one instance should lie inside the polyhedron and, for each negative bag, all the instances should lie outside. Similarly to model (2), the first term of the objective function aims at maximizing the margins associated with the h hyperplanes, while the next two terms minimize the misclassification error of the instances belonging to the positive and negative bags, respectively. Note that, when \(h=1\), problem (8) reduces exactly to problem (2), taking into account the symmetric role played by the two halfspaces generated by a single hyperplane.
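The objective of model (8) can be evaluated in the same way as that of model (2), replacing each affine score with the maximum over the h hyperplanes. The Python sketch below is a hypothetical helper for fixed hyperplanes and fixed labels \(y_j\); it does not implement the DC optimization of [8].

```python
import numpy as np

def objective_8(W, b, pos_bags, neg_bags, y, C=1.0):
    """Objective of model (8) for h hyperplanes and fixed labels y.

    W: (h, n) array with rows w_t; b: (h,) array of biases b_t;
    pos_bags, neg_bags: lists of (instances x features) arrays;
    y: one array of +/-1 labels per positive bag.
    Returns inf when the bag constraint of (8) is violated."""
    if any(not np.any(np.asarray(yi) == 1) for yi in y):
        return np.inf
    value = 0.5 * np.sum(W * W)                  # (1/2) sum_t ||w_t||^2

    def g(X):
        # max_t (w_t^T x_j + b_t) for every instance x_j (row of X)
        return np.max(X @ W.T + b, axis=1)

    for X, yi in zip(pos_bags, y):
        value += C * np.maximum(0.0, 1 + np.asarray(yi) * g(X)).sum()
    for X in neg_bags:
        value += C * np.maximum(0.0, 1 - g(X)).sum()
    return value

W, b = np.array([[1.0, 0.0]]), np.array([0.0])
pos = [np.array([[-2.0, 0.0], [2.0, 0.0]])]
neg = [np.array([[3.0, 0.0]])]
print(objective_8(W, b, pos, neg, [np.array([1, 1])]))  # 3.5
```

For h = 1 the value coincides with that of model (2) computed at (-w, -b), which reflects the symmetry between the two halfspaces noted above.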

4 A comparative study on chest X-ray images

We have performed a comparative study of the MIL algorithms described in the previous section, with the aim of discriminating between COVID-19 (positive) and common viral pneumonia (negative) X-ray images. The flowchart of the overall experimentation is reported in Fig. 2.

Fig. 2

Flowchart of the experimentation: MIL-RL [9], mi-SPSVM [13] and MIL-kink [23] generate a separating hyperplane, while MI-POLY [8] generates a separating polyhedron. On the basis of the standard MIL assumption, the blue outlined images are classified as (positive) COVID-19 images, while the red outlined ones are classified as (negative) common viral pneumonia images (colour figure online)

Fig. 3

100 COVID-19 chest X-ray images (positive images)

Fig. 4

100 viral pneumonia chest X-ray images (negative images)

All the algorithms were run on a Microsoft Windows 11 system with 16 GB of RAM and a 2.30 GHz Intel Core i7 processor. They were tested on 200 chest X-ray images, randomly drawn from the public dataset described in Sect. 4.1: 100 images of people affected by COVID-19 (Fig. 3) and 100 images of people with common viral pneumonia (Fig. 4).

4.1 Dataset description

The original dataset from which we have drawn the images used in our computational study is the COVID-19 Radiography Database [40], also detailed in [20, 41]. The overall dataset consists of 3616 COVID-19 images, 10,192 normal images, 6012 lung opacity (that is, non-COVID lung infection) images and 1345 common viral pneumonia images. Among the COVID-19 images, 2473 were collected from the BIMCV-COVID19+ dataset [1], 183 from a German medical school [2], 559 from the Italian Society of Medical and Interventional Radiology (SIRM), GitHub, Kaggle and Twitter, and 400 from another repository [3].

Table 2 Dataset constituted by 100 COVID-19 and 100 viral pneumonia chest X-ray images: 5-CV average testing values, provided by MIL-RL [9], mi-SPSVM [13], MIL-kink [23] and MI-POLY [8]
Table 3 Dataset constituted by 100 COVID-19 and 100 viral pneumonia chest X-ray images: 10-CV average testing values, provided by MIL-RL [9], mi-SPSVM [13], MIL-kink [23] and MI-POLY [8]

4.2 Implementation details, segmentation and features

We have used the same MATLAB implementations of MIL-RL adopted in [11], of mi-SPSVM tested in [13] and of MI-POLY presented in [8]. As for MIL-kink, we have used both the implementations tested in [23], named MIL-kink1 and MIL-kink2, which correspond, respectively, to the minimization, for fixed w, of function (7) and of its variant mentioned at the end of Sect. 3.3. For each code, we have kept the optimal parameter tuning described in the respective paper.

For the segmentation process, we have adopted a procedure similar to the one used in [10] and [50]. In particular, we have reduced the resolution of each image to \(128 \times 128\) pixels and grouped the pixels into square subregions (blobs). In this way, each image is represented as a bag, and each blob corresponds to an instance of the bag. For each instance (blob), we have first considered the following 10 features:

  • the average and the variance of the grey-scale intensity of the blob: 2 features;

  • the differences between the average grey-scale intensity of the blob and those of the adjacent blobs (upper, lower, left, right): 4 features;

  • the differences between the variance of the grey-scale intensity of the blob and those of the adjacent blobs (upper, lower, left, right): 4 features.

To exploit information about the texture of the images, for each blob we have also computed the corresponding grey-scale co-occurrence matrix, using the graycomatrix function of the MATLAB Image Processing Toolbox. In particular, setting the number of grey levels to 3, for each blob we have generated a \(3\times 3\) co-occurrence matrix, whose 9 entries bring the total number of features to 19.
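The feature extraction described above can be approximated in Python as follows. This is a sketch under explicit assumptions, not the MATLAB code used in the experiments: the blob side (`blob=16`, giving an 8 x 8 grid on a 128 x 128 image) is a hypothetical choice since the paper does not state it; border blobs get a zero difference for missing neighbours; and the co-occurrence matrix counts horizontally adjacent pixel pairs after a uniform quantization into 3 levels, which mimics the default [0 1] offset of graycomatrix but not necessarily its exact grey-level scaling.

```python
import numpy as np

def image_to_bag(img, blob=16, levels=3):
    """Sketch: turn a 2-D grey-scale image with values in [0, 1] into a
    MIL bag with one row of 19 features per square blob.
    `blob` (side in pixels) and the border handling are assumptions."""
    gr, gc = img.shape[0] // blob, img.shape[1] // blob
    img = img[:gr * blob, :gc * blob]
    # blocks[r, c] is the blob in grid cell (r, c)
    blocks = img.reshape(gr, blob, gc, blob).swapaxes(1, 2)
    mean, var = blocks.mean(axis=(2, 3)), blocks.var(axis=(2, 3))

    def neighbour_diffs(stat):
        # Differences with the upper, lower, left and right blob; edge
        # padding gives border blobs a zero difference (assumption).
        p = np.pad(stat, 1, mode="edge")
        return [stat - p[:-2, 1:-1], stat - p[2:, 1:-1],
                stat - p[1:-1, :-2], stat - p[1:-1, 2:]]

    mean_d, var_d = neighbour_diffs(mean), neighbour_diffs(var)
    # Uniform quantization into `levels` grey levels (approximation).
    q = np.minimum((img * levels).astype(int), levels - 1)
    qblocks = q.reshape(gr, blob, gc, blob).swapaxes(1, 2)

    rows = []
    for r in range(gr):
        for c in range(gc):
            qb = qblocks[r, c]
            # Co-occurrence counts of horizontally adjacent pixel pairs.
            glcm = np.zeros((levels, levels))
            np.add.at(glcm, (qb[:, :-1].ravel(), qb[:, 1:].ravel()), 1)
            local = [mean[r, c], var[r, c]]
            local += [d[r, c] for d in mean_d] + [d[r, c] for d in var_d]
            rows.append(np.concatenate([local, glcm.ravel()]))
    return np.array(rows)

bag = image_to_bag(np.random.default_rng(0).random((128, 128)))
print(bag.shape)  # (64, 19)
```

Each row is one instance: the first 10 columns are the intensity features listed above and the last 9 are the flattened \(3\times 3\) co-occurrence matrix.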

4.3 Numerical results

In order to consider different sizes of the testing and the training sets, we have used two validation protocols: the 5-fold cross-validation (5-CV) and the 10-fold cross-validation (10-CV). In Table 2 and in Table 3, we report the respective average results computed on the testing set and expressed in terms of the following performance indicators:

  • accuracy = \(\displaystyle \frac{\text{ TP+TN }}{\text{ TP+TN+FP+FN }}\in [0,1]\): the proportion of correctly classified images over the whole dataset.

  • sensitivity = \(\displaystyle \frac{\text{ TP }}{\text{ TP+FN }}\in [0,1]\): also called true positive rate or recall, it measures the proportion of correctly classified positive images over the total number of positive images.

  • specificity = \(\displaystyle \frac{\text{ TN }}{\text{ TN+FP }}\in [0,1]\): also called true negative rate, it measures the proportion of correctly classified negative images over the total number of negative images.

  • \(PPV = \displaystyle \frac{\text{ TP }}{\text{ TP+FP }}\in [0,1]\): also called precision, the positive predictive value measures the proportion of correctly classified positive images over the total number of images classified as positive.

  • NPV = \(\displaystyle \frac{\text{ TN }}{\text{ TN+FN }}\in [0,1]\): the negative predictive value measures the proportion of correctly classified negative images over the total number of images classified as negative.

  • F-score = \(2\displaystyle \frac{\text{ sensitivity }\cdot \text{ PPV }}{\text{ sensitivity } + \text{ PPV }}\in [0,1]\): the harmonic mean of sensitivity and PPV.

  • \(\kappa {=} 2\cdot \frac{\text{ TP }\cdot \text{ TN }-\text{ FP }\cdot \text{ FN }}{(\text{ TP }+\text{ FP})\cdot (\text{ FP }+\text{ TN}) + (\text{ TP }+\text{ FN}) \cdot (\text{ TN }+\text{ FN})}\in [{-}1,1]\): Cohen's kappa coefficient, which measures the agreement between the actual and the predicted labels.

  • MCC = \(\frac{\text{ TP }\cdot \text{ TN }{-}\text{ FP }\cdot \text{ FN }}{\sqrt{(\text{ TP }+\text{ FP})\cdot (\text{ TP }+\text{ FN})\cdot (\text{ TN }+\text{ FP}) \cdot (\text{ TN }{+}\text{ FN})}} \in [{-}1,1]\): the Matthews correlation coefficient.

In the above list, the quantities TP, TN, FP and FN are the entries of the so-called \(2\times 2\) confusion matrix. In particular, TP (true positive) indicates the number of correctly classified positive images, TN (true negative) is the number of correctly classified negative images, FP (false positive) denotes the number of misclassified negative images and FN (false negative) is the number of misclassified positive images.
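All the indicators above are simple functions of the four confusion-matrix entries. The short Python sketch below computes them; the counts in the usage line are illustrative only (chosen so that accuracy, sensitivity and specificity match the 92.5%, 97.5% and 87.5% quoted for [11] on 80 + 80 images) and are not results of this study.

```python
import math

def metrics(tp, tn, fp, fn):
    """Performance indicators listed above, from the 2x2 confusion matrix."""
    sens = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sens,
        "specificity": tn / (tn + fp),
        "PPV": ppv,
        "NPV": tn / (tn + fn),
        "F-score": 2 * sens * ppv / (sens + ppv),
        "kappa": 2 * (tp * tn - fp * fn)
                 / ((tp + fp) * (fp + tn) + (tp + fn) * (tn + fn)),
        "MCC": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Illustrative counts: 80 positive and 80 negative images.
m = metrics(tp=78, tn=70, fp=10, fn=2)
print(round(m["accuracy"], 3), round(m["kappa"], 3))  # 0.925 0.85
```

Changing TN alone leaves the F-score unchanged, since it depends only on TP, FP and FN, whereas MCC and \(\kappa\) react to it; this is why MCC is the more informative of the two.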

In Tables 2 and 3, we also report the average CPU time spent by the classifier to determine the optimal separation surface in the learning phase. For each of these performance parameters, the best result is highlighted in bold.

Looking at the two tables, we observe that, apart from MIL-kink1 and MIL-kink2, the three codes MIL-RL, mi-SPSVM and MI-POLY provide quite comparable results (with an accuracy greater than 92% in all cases), mi-SPSVM being the best performing of the three, also in terms of CPU time. On the other hand, both versions of Algorithm MIL-kink have a non-negligible advantage in terms of speed, while still providing a reasonable accuracy close to 80%. MIL-RL, mi-SPSVM and MI-POLY also perform well in terms of MCC, which is more informative than the F-score, since it takes into account the number of correctly classified negative images (TN), which the F-score ignores.

Finally, it is worth noting that Cohen's kappa coefficient \(\kappa \) is consistent with MCC: it indicates almost perfect agreement between the actual and the predicted labels for MIL-RL, mi-SPSVM and MI-POLY, and moderate agreement for MIL-kink.

5 Conclusions

COVID-19 spread quickly around the world, creating an emergency situation. Chest X-ray radiography assists in the diagnosis and also allows the follow-up of the disease.

In this context, we have focused on Multiple Instance Learning (MIL) techniques, which have proven effective in image analysis. In particular, we have presented a comparative study of some very recent MIL approaches, tested on a set of 200 chest X-ray images with the aim of discriminating between COVID-19 and common viral pneumonia. This study has highlighted the great potential of MIL, with an accuracy of 95% and an MCC of 0.9. Unlike deep learning techniques, MIL approaches (especially the instance-level ones) are easily interpretable and therefore well suited to image classification, since it is not difficult to explain why an image has been classified as positive or negative.

The promising results of this study open up several avenues for future research, which may involve the implementation of more complex MIL systems, including additional features and more sophisticated segmentation and pre-processing techniques, such as those used in [44]. Combination of MIL with deep learning approaches could exploit the strengths of both methodologies. For instance, hybrid models that use MIL for interpretability and CNNs for feature extraction might offer higher performance. A further direction of research aims to include a wider range of cases, encompassing various stages of COVID-19 and other respiratory conditions, across different populations and healthcare settings.