1 Introduction

Chronic Obstructive Pulmonary Disease (COPD) is a respiratory disease characterized by chronic airflow limitation and associated with major economic and social burdens. COPD is the fourth leading cause of death worldwide and, in the absence of countermeasures aimed at reducing risk factors, it is expected to become the third leading cause of death by 2030 [1]. In 2015, 3.2 million deaths were associated with COPD and its estimated global prevalence was about 175 million cases [2]. In the United States alone, the estimated annual costs associated with COPD are about 50 billion dollars. In the coming years, costs are expected to rise dramatically together with prevalence. Costs increase with the severity of the disease, and most of them are linked to hospital admissions, which in turn are mainly caused by exacerbation episodes [3].

In an attempt to address these problems, numerous clinical decision support systems (CDSSs) for the management of patients with COPD have been developed in recent years [4]. In particular, systems based on machine learning algorithms have been developed to monitor patients’ health status and to predict and prevent exacerbations and hospital admissions [5]. An in-depth review of the scientific literature has shown that, in the current state of the art, these goals have not yet been met and the performance of existing systems is not clinically acceptable. The aim of this work is to design and implement a new CDSS that can at least partially fill these gaps.

2 Materials and Methods

2.1 Data

In order to train, validate and test the decision support system, data from 414 patients affected by COPD and obstructive ventilatory defect were acquired using pulmonary function tests. The following physiological parameters were acquired: Forced Expiratory Volume in one second (FEV1), Forced Vital Capacity (FVC), Slow Vital Capacity (SVC), FEV1/FVC ratio, FEV1/SVC ratio, Forced Expiratory Flow at 25–75% (FEF 25–75), Peak Expiratory Flow (PEF), Vital Capacity (VC), Total Lung Capacity (TLC), Residual Volume (RV), Functional Residual Capacity (FRC), Expiratory Reserve Volume (ERV), Diffusing Capacity (DLCO), Alveolar Volume (VA) and DLCO/VA ratio. All these parameters were measured before and after bronchodilation. The other parameters were the patients’ age, height, body weight and sex. Based on these parameters, five expert pneumologists evaluated the severity of each patient’s ventilatory defect and classified it as mild, moderate or severe.
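As a minimal sketch of how one labelled case from this dataset might be represented in Java, the example below keeps only a few of the measurements listed above. The type names (`PftRecord`, `LabelledCase`, `Severity`), the field names, the units and the sex encoding are all assumptions made for illustration and do not describe the authors' actual data schema.

```java
/** Severity classes assigned by the expert reviewers. */
enum Severity { MILD, MODERATE, SEVERE }

/**
 * Hypothetical container for a few of the measurements listed above.
 * Field names, units and the sex encoding are assumptions, not the authors' schema.
 */
record PftRecord(
        int ageYears,
        double heightCm,
        double weightKg,
        char sex,          // 'M' or 'F' (assumed encoding)
        double fev1PreL,   // FEV1 before bronchodilation, litres
        double fvcPreL,    // FVC before bronchodilation, litres
        double fev1PostL,  // FEV1 after bronchodilation, litres
        double fvcPostL) { // FVC after bronchodilation, litres

    PftRecord {
        // Guard against division by zero in the derived ratios.
        if (fvcPreL <= 0 || fvcPostL <= 0) {
            throw new IllegalArgumentException("FVC must be positive");
        }
    }

    /** FEV1/FVC ratio before bronchodilation. */
    double fev1FvcPre() { return fev1PreL / fvcPreL; }

    /** FEV1/FVC ratio after bronchodilation. */
    double fev1FvcPost() { return fev1PostL / fvcPostL; }
}

/** One training example: measurements plus the expert-assigned label. */
record LabelledCase(PftRecord measurements, Severity label) {}

final class PftRecordDemo {
    public static void main(String[] args) {
        // Made-up values for illustration only; not taken from the study data.
        PftRecord r = new PftRecord(65, 170.0, 70.0, 'M', 2.0, 4.0, 2.2, 4.0);
        LabelledCase c = new LabelledCase(r, Severity.MODERATE);
        System.out.printf("pre=%.2f post=%.2f label=%s%n",
                r.fev1FvcPre(), r.fev1FvcPost(), c.label());
    }
}
```

Deriving the ratios from the raw volumes, rather than storing them as separate fields, is one way to keep the pre- and post-bronchodilation values consistent; the remaining parameters could be added as further fields in the same manner.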

2.2 Data Analysis and Predictive Model Training

Data were processed and analyzed using IBM SPSS Modeler 18.1 [6]. The aim of this phase was to develop a predictive model able to classify the patients’ ventilatory defect into three categories (Mild, Moderate and Severe) according to the values of the physiological parameters previously described.

The first step was to replicate the performance of decision support systems for similar tasks already reported in the scientific literature. Most of these systems, and in particular those with the best predictive accuracy, sensitivity and specificity, were trained using Neural Networks and Support Vector Machines [7, 8]. We therefore used these two machine learning techniques to train two predictive models, and then measured their performance in predicting the severity of patients’ ventilatory defect.
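For a three-class problem such as this one, accuracy, sensitivity and specificity can be derived from a confusion matrix, with sensitivity and specificity computed per class in a one-vs-rest fashion. The sketch below illustrates that calculation; the class name `ConfusionMetrics` and the matrix layout (rows are true classes, columns are predicted classes) are assumptions, and the paper does not state how SPSS Modeler computed its figures.

```java
/** Sketch: accuracy and one-vs-rest sensitivity/specificity from a confusion matrix. */
final class ConfusionMetrics {
    private ConfusionMetrics() {}

    /** m[i][j] = number of cases whose true class is i and predicted class is j. */
    static double accuracy(int[][] m) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        }
        return (double) correct / total;
    }

    /** Sensitivity (recall) for class k: TP / (TP + FN). */
    static double sensitivity(int[][] m, int k) {
        long tp = m[k][k];
        long fn = 0;
        for (int j = 0; j < m.length; j++) {
            if (j != k) fn += m[k][j];
        }
        return (double) tp / (tp + fn);
    }

    /** Specificity for class k: TN / (TN + FP), treating all other classes as negative. */
    static double specificity(int[][] m, int k) {
        long tn = 0;
        long fp = 0;
        for (int i = 0; i < m.length; i++) {
            if (i == k) continue;               // only rows whose true class is not k
            for (int j = 0; j < m.length; j++) {
                if (j == k) fp += m[i][j];      // predicted k, but true class differs
                else tn += m[i][j];             // correctly not predicted as k
            }
        }
        return (double) tn / (tn + fp);
    }

    public static void main(String[] args) {
        // Toy counts for illustration only; these are not results from the study.
        int[][] toy = {
            {8, 2, 0},
            {1, 7, 2},
            {0, 1, 9}
        };
        System.out.println(accuracy(toy));
        System.out.println(sensitivity(toy, 0));
        System.out.println(specificity(toy, 0));
    }
}
```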

The next step was to train a new predictive model with better performance. To identify the most suitable machine learning technique for our data, the Auto Classifier node of IBM SPSS Modeler was used to compare the performance of various techniques, including CART, Random Forest, QUEST, CHAID, Bayesian Networks, Logistic Regression, C5.0 and KNN. The best performance was achieved by the C5.0 algorithm, which we therefore considered the most suitable for our data. Finally, we trained a predictive model using the C5.0 algorithm and compared its performance with that of the predictive models trained with the Neural Network and the Support Vector Machine.
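The comparison step amounts to ranking candidate techniques by an evaluation metric and keeping the best one. The sketch below shows only that ranking idea; it is not a description of how the Auto Classifier node works internally, and the class name `ModelSelection`, the method `bestByAccuracy` and the candidate names and scores are hypothetical placeholders rather than results from this study.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: pick the candidate technique with the highest evaluation score. */
final class ModelSelection {
    private ModelSelection() {}

    /** Returns the name of the candidate with the highest accuracy. */
    static String bestByAccuracy(Map<String, Double> accuracyByModel) {
        return accuracyByModel.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalArgumentException("no candidates"));
    }

    public static void main(String[] args) {
        // Placeholder names and scores; not the accuracies measured in the paper.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("candidateA", 0.70);
        scores.put("candidateB", 0.90);
        scores.put("candidateC", 0.80);
        System.out.println(bestByAccuracy(scores));
    }
}
```

In practice the ranking would normally use scores measured on held-out validation data, so that the selected technique is not simply the one that fits the training set most closely.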

2.3 Predictive Model Implementation

The predictive model trained with the C5.0 algorithm was embedded in the COPD Management Tool, a user interface written in the Java programming language. A demonstrative view of the COPD Management Tool is shown in Fig. 1.

Fig. 1 COPD management tool user interface
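C5.0 produces decision trees (or rule sets), so one plausible way to embed such a model in a Java application is to represent it as a small tree of threshold tests. The sketch below illustrates that idea only: the types `TreeNode`, `Leaf` and `Split` are hypothetical, the paper does not describe how the model was actually exported or integrated, and the thresholds in the toy tree are arbitrary placeholders with no clinical meaning.

```java
import java.util.Map;

/** A node of a decision tree that maps named numeric features to a class label. */
interface TreeNode {
    String classify(Map<String, Double> features);
}

/** Terminal node: always returns its label. */
record Leaf(String label) implements TreeNode {
    @Override
    public String classify(Map<String, Double> features) {
        return label;
    }
}

/** Internal node: tests one feature against a threshold and descends. */
record Split(String feature, double threshold, TreeNode below, TreeNode atOrAbove)
        implements TreeNode {
    @Override
    public String classify(Map<String, Double> features) {
        Double value = features.get(feature);
        if (value == null) {
            throw new IllegalArgumentException("missing feature: " + feature);
        }
        return value < threshold
                ? below.classify(features)
                : atOrAbove.classify(features);
    }
}

final class TreeDemo {
    /** Toy tree; the thresholds are arbitrary placeholders, not learned values. */
    static TreeNode toyTree() {
        return new Split("FEV1", 1.0,
                new Leaf("Severe"),
                new Split("FEV1", 2.0,
                        new Leaf("Moderate"),
                        new Leaf("Mild")));
    }

    public static void main(String[] args) {
        System.out.println(toyTree().classify(Map.of("FEV1", 1.5)));
    }
}
```

Keeping the tree as data rather than hard-coded `if` statements would allow a retrained model to be loaded without changing the user interface code.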

3 Results

The results and performance of the three predictive models, trained using the Support Vector Machine, Neural Network and C5.0 algorithms respectively, are reported in the tables below (Tables 1, 2 and 3).

Table 1 Support vector machine performances
Table 2 Neural network performances
Table 3 C5.0 performances

4 Conclusions

The performance obtained with the Neural Network and the Support Vector Machine is comparable with that reported in the scientific literature. The performance obtained with the C5.0 algorithm is significantly better than that obtained with the two previous models.

The proposed system, designed with the same systematic approach used in previous works by the authors for a Cardiac Heart Failure CDSS [9, 10, 11, 12, 13], allows the results of pulmonary function tests to be evaluated and classified with excellent performance compared to the current state of the art, and can therefore be used in many clinical applications.