Introduction

Spirometry is the most widely used pulmonary function test. It is a relatively simple, noninvasive test that measures the volume of air expelled from fully inflated lungs as a function of time. Spirometric examination is an essential tool in the diagnosis of airway obstruction and, to some extent, in the detection of restriction and the follow-up of respiratory diseases [1–3]. Lung diseases are broadly classified as those leading to airflow obstruction, volume restriction, or a combination of obstructive and restrictive defects [1]. An obstructive defect is a disproportionate reduction of maximal airflow from the lung in relation to the maximal volume that can be displaced from the lung. It implies airway narrowing during exhalation and is defined by a reduced FEV1/FVC ratio. FEV1 is the volume of air that is forcibly exhaled in the first second, whereas FVC is the total volume of air exhaled after a full inspiration. Airflow obstruction can be diagnosed using spirometry alone by demonstrating a low FEV1/FVC ratio [2].

Since affordable hand-held spirometers are now widely available, evaluation of patients with respiratory symptoms can easily be made in primary care. Primary care spirometry allows early detection of obstructive abnormality [4].

Spirometry is also often used as a screening tool to diagnose or rule out restrictive pulmonary impairment [5]. A low FVC seen on spirometry may be a clue to a restrictive impairment. A low spirometric FVC together with a normal or high FEV1/FVC ratio has traditionally been classified as a restrictive abnormality [1]. Patients whose spirometry suggests a restrictive disorder are usually referred for additional pulmonary function tests (PFTs) to confirm the diagnosis. Diagnosis of a restrictive impairment depends on detecting a reduced total lung capacity (TLC) by lung volume measurement [6]. However, Cristine et al. [7] showed that a spirometry-based algorithm accurately excludes pulmonary restriction and reduces unnecessary lung volume testing in the pulmonary function laboratory by almost half.

Artificial neural networks (ANNs) have been found to be flexible in modelling and accurate in prediction. ANNs have therefore been used in different medical diagnoses, and the results have been compared with physicians’ diagnoses and existing classification methods [8–16]. Their capacity to find near-optimum solutions from limited or incomplete data sets, and the fact that learning is accomplished through training, make ANNs promising tools. In addition, it has been shown that neural networks can combine data of a different nature in one system, such as data derived from clinical protocols, laboratory data obtained from measurements, and features from signals and images, thus forming an integrated diagnostic system [8–16]. There are numerous methods to represent patterns as a grouping of features, and the choice of methods appropriate for a given pattern analysis task is rarely obvious. At each level (feature extraction, feature selection, classification) many methods exist [8–16]. The support vector machine (SVM) proposed by Vapnik [17] has been studied extensively for classification, regression and density estimation. In this study, a multiclass SVM with error correcting output codes (ECOC) was implemented to discriminate the spirometric patterns (Fig. 1). A significant contribution of the present work is the examination of the performance of the multiclass SVM with the ECOC in the diagnosis of spirometric patterns (normal, restrictive, obstructive).

Fig. 1 Architecture of the SVM (N is the number of support vectors)

Database overview

For the study, a total of 499 spirometric examinations of male subjects aged 35 ± 10 years were analysed. Spirometry measurements were performed with a mass-flow sensor (model Spirolab, MIR, Roma, Italy) by experienced pulmonary function technicians according to the guidelines of the European Respiratory Society at a regional medical center, Umut Medical Center [18]. The spirometric parameters used in this study were forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and the FEV1/FVC ratio. FEV1 is the volume of air that is forcibly exhaled in the first second, whereas FVC is the total volume of air exhaled after a full inspiration. Spirometric reference values were taken from Knudson et al. [19] for patients <65 years old. For the diagnosis of airway obstruction, FEV1/FVC <75% was used as a fixed cut-off point, according to the recently updated GINA 2006 guidelines [5] and international studies of chronic obstructive pulmonary disease [20, 21]. A subject was said to have a restrictive spirometric pattern if FVC was reduced in the presence of a normal FEV1/FVC ratio. A fixed cut-off of 80% of the predicted value for FVC was used [22].
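The labelling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, argument names and the order in which the two cut-offs are checked are hypothetical, and the predicted FVC is assumed to be supplied from a reference equation such as that of Knudson et al. [19]. A record with both a low ratio and a low FVC is labelled obstructive here, which the text does not specify.

```python
def spirometric_pattern(fev1_l, fvc_l, fvc_predicted_l,
                        ratio_cutoff=0.75, fvc_cutoff=0.80):
    """Label one examination as 'obstructive', 'restrictive' or 'normal'.

    fev1_l, fvc_l   : measured FEV1 and FVC in litres
    fvc_predicted_l : predicted FVC in litres (assumed to come from a
                      reference equation; not computed here)
    """
    if fvc_l <= 0 or fvc_predicted_l <= 0:
        raise ValueError("FVC and predicted FVC must be positive")

    ratio = fev1_l / fvc_l
    # Fixed cut-off for airway obstruction: FEV1/FVC < 75%.
    if ratio < ratio_cutoff:
        return "obstructive"
    # Reduced FVC (< 80% of predicted) with a normal FEV1/FVC ratio.
    if fvc_l / fvc_predicted_l < fvc_cutoff:
        return "restrictive"
    return "normal"


# Hypothetical measurements, chosen only to exercise each branch.
print(spirometric_pattern(3.2, 4.0, 4.2))
print(spirometric_pattern(2.0, 4.0, 4.2))
print(spirometric_pattern(2.5, 3.0, 4.2))
```

The cut-offs are exposed as keyword arguments so that other fixed thresholds can be compared without changing the function body.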

Support vector machines

The SVM maps the input patterns into a higher dimensional feature space through a nonlinear mapping. A linear decision surface is then constructed in this high dimensional feature space. The SVM is thus a linear classifier in the parameter space, but it becomes a nonlinear classifier as a result of the nonlinear mapping of the space of the input patterns into the high dimensional feature space. Training the SVM is a quadratic optimization problem: constructing a hyperplane $w^{T}x + b = 0$ (where w is the vector of hyperplane coefficients and b is a bias term) such that the margin between the hyperplane and the nearest point is maximized can be posed as a quadratic optimization problem. SVMs have shown high generalization ability. The proper kernel function for a given problem depends on the specific data, and there is no generally accepted method for choosing a kernel function [17, 23]. In this study, the choice of kernel function was studied empirically, and the best results were achieved using the radial basis function (RBF) kernel.
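As an illustration of how an RBF-kernel SVM evaluates a pattern once it has been trained, the following minimal Python sketch computes the kernel and the decision value as a weighted sum over the support vectors. The kernel parametrization exp(-||x - z||^2 / (2σ^2)) is an assumption, since the text names only the RBF kernel and a width σ. The function names, the support vectors and the coefficients are hypothetical and are not taken from the trained model.

```python
import math


def rbf_kernel(x, z, sigma):
    """RBF kernel, assuming the form exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decision(x, support_vectors, dual_coefs, bias, sigma):
    """Decision value f(x) = sum_i c_i * K(s_i, x) + b.

    dual_coefs holds one signed coefficient c_i per support vector s_i
    (the product of the multiplier and the class label found by training).
    """
    return sum(c * rbf_kernel(s, x, sigma)
               for s, c in zip(support_vectors, dual_coefs)) + bias


# Toy example with two made-up support vectors in a 2-D feature space.
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
dual_coefs = [1.0, -1.0]
value = svm_decision((0.1, 0.1), support_vectors, dual_coefs, bias=0.0, sigma=0.3)
print("positive class" if value > 0 else "negative class")
```

The sign of the decision value gives the binary decision, and the decision is linear in the kernel values even though it is nonlinear in the input pattern.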

The SVM is a binary classifier, and it can be extended into a multiclass classifier by fusing several binary SVMs. In this study, SVM decisions were fused using the ECOC approach, adopted from digital communication theory [24]. In the ECOC approach, up to $2^{n-1} - 1$ SVMs (where n is the number of classes) are trained, each of them aimed at separating a different combination of classes. For three classes (A, B and C), three classifiers are necessary: one SVM separates A from B and C, a second separates B from A and C, and a third separates C from A and B. The multiclass classifier output code for a pattern is the combination of the targets of all the separate SVMs. If each of the separate SVMs classifies a pattern correctly, the multiclass classifier target code is met and the ECOC approach reports no error for that pattern. When at least one of the SVMs misclassifies the pattern, the class selected for the pattern is the one whose target code is closest, in the Hamming distance sense, to the actual output code, and this can be an erroneous decision.
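The decoding step of the ECOC approach for the three-class case can be sketched as follows. This is a minimal sketch, not the implementation used in the study. The code matrix is the one-versus-rest arrangement described above, the 0/1 encoding of the binary SVM outputs and all names are assumptions, and ties in Hamming distance are broken by class order, which is an arbitrary choice.

```python
CLASSES = ("normal", "restrictive", "obstructive")

# One codeword per class; column k is the target of the k-th binary SVM
# (1 = "this class", 0 = "one of the other two classes").
CODE_MATRIX = {
    "normal":      (1, 0, 0),
    "restrictive": (0, 1, 0),
    "obstructive": (0, 0, 1),
}


def max_ecoc_classifiers(n_classes):
    """Upper bound 2^(n-1) - 1 on the number of distinct binary problems."""
    return 2 ** (n_classes - 1) - 1


def hamming(a, b):
    """Number of positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(outputs):
    """Return the class whose codeword is closest to the SVM output code."""
    return min(CLASSES, key=lambda c: hamming(CODE_MATRIX[c], outputs))


print(max_ecoc_classifiers(3))    # 3
print(ecoc_decode((0, 1, 0)))     # restrictive
```

With this code matrix any two codewords differ in two positions, so a single wrong binary decision can leave the output code equally close to two or more codewords; the tie-break then determines the class, which is one way the erroneous decisions mentioned above can arise.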

Results

The features of the spirometric patterns (FEV1, FVC and FEV1/FVC ratio) were used as the inputs of the SVM. The values of these features for sample records of the three classes (normal, obstructive, restrictive), as presented in the reports of the subjects, are shown together with the sample images in Figs. 2–4. There are a total of 499 records in the spirometric database, of which 408 are normal, 30 are restrictive and 61 are obstructive. In the developed classifier, 200 of the 499 records were used for training and the remaining 299 for testing. The training set consisted of 162 normal, 12 restrictive and 26 obstructive records. The testing set consisted of 246 normal, 18 restrictive and 35 obstructive records. The classifier proposed for diagnosis of spirometric patterns was implemented using the MATLAB software package (MATLAB version 7.0 with neural networks toolbox).

Fig. 2 The values of FEV1, FVC and FEV1/FVC ratio (the features used as inputs of the classifiers) of a sample record (normal pattern), and the sample image for a normal subject

Fig. 3 The values of FEV1, FVC and FEV1/FVC ratio (the features used as inputs of the classifiers) of a sample record (obstructive pattern), and the sample image for an obstructive subject

Fig. 4 The values of FEV1, FVC and FEV1/FVC ratio (the features used as inputs of the classifiers) of a sample record (restrictive pattern), and the sample image for a restrictive subject

The training error rate and the capacity of the learning machine, measured by its Vapnik–Chervonenkis (VC) dimension [17], are two different factors controlling the generalization ability of the SVM. The smaller the VC dimension of the function set of the learning machine, the larger the training error rate. The tradeoff between the complexity of the decision rule and the training error rate can be controlled by changing a parameter C [21] in the SVM. The SVMs were trained with different C values, and the best result in the testing procedure was obtained for C = 80. The number of support vectors was determined from the SVM training with C = 80. The training algorithm of the SVM, based on quadratic programming, incorporates several optimization techniques such as decomposition and caching. The quadratic programming problem in the SVM was solved using the MATLAB optimization toolbox.

Multiclass SVM and the ECOC algorithm were employed to classify the spirometric patterns. The SVMs of the three-class classifier used RBF kernel functions. In order to implement SVMs with RBF kernel functions, one has to assume a value for σ. The optimal σ can only be found by systematically varying its value across training sessions. To do this, the support vectors were extracted from the training data file with an assumed σ value. After the support vectors had been found and the SVM constructed, the model was applied to 1/3 of the training data set to compute the misclassification rate. The σ value was varied between 0.1 and 0.6 at intervals of 0.1. The value σ = 0.3 resulted in the minimum misclassification rate and was thus chosen. In the diagnosis of the spirometric patterns, three inputs (FVC, FEV1, FEV1/FVC), nine support vectors and three outputs (normal, restrictive, obstructive) were used.
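The search over σ described above amounts to a one-dimensional grid search. The Python sketch below shows only the selection logic, under the assumption that a callable is available which trains the SVM for a given σ and returns the misclassification rate on the third of the training data used for checking. That callable, here named `misclassification_rate`, is hypothetical and stands in for the MATLAB training step; it is not part of the study's code.

```python
# Candidate kernel widths: 0.1, 0.2, ..., 0.6 (interval of 0.1).
SIGMA_GRID = [round(0.1 * k, 1) for k in range(1, 7)]


def select_sigma(candidates, misclassification_rate):
    """Return (best_sigma, rates) for a user-supplied evaluation callable.

    misclassification_rate(sigma) is assumed to train the SVM with the
    given sigma and return the error rate on the checking subset.
    Ties are resolved in favour of the earliest candidate in the list.
    """
    rates = {sigma: misclassification_rate(sigma) for sigma in candidates}
    best_sigma = min(candidates, key=lambda s: rates[s])
    return best_sigma, rates


def toy_rate(sigma):
    # Placeholder standing in for the real training and evaluation step;
    # it only exercises the selection logic and reflects no real results.
    return abs(sigma - 0.4)


best_sigma, rates = select_sigma(SIGMA_GRID, toy_rate)
print(best_sigma)
```

The same loop can be nested inside a loop over C when both parameters are tuned together, at the cost of one training run per (C, σ) pair.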

Confusion matrices are used for displaying the classification results of classifiers. In a confusion matrix, each cell contains the raw number of exemplars classified for the corresponding combination of desired and actual network outputs. The confusion matrix showing the classification results of the SVM used for classification of the spirometric patterns is given in Table 1. The confusion matrix shows how frequently one pattern is misclassified as another. As seen from Table 1, normal patterns are most often confused with restrictive patterns, and restrictive patterns with obstructive patterns.

Table 1 Confusion matrix

The test performance of the SVMs can be determined by the computation of specificity, sensitivity and total classification accuracy. The specificity, sensitivity and total classification accuracy are defined as:

  • Specificity: number of true negative decisions/number of actually negative cases

  • Sensitivity: number of true positive decisions/number of actually positive cases

  • Total classification accuracy: number of correct decisions/total number of cases

A true negative decision occurs when both the classifier and the physician suggested the absence of a positive detection. A true positive decision occurs when the positive detection of the classifier coincided with a positive detection of the physician.
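The three definitions above can be computed directly from a confusion matrix. The sketch below assumes that rows hold the desired (physician) class and columns the classifier output, and that each class is treated in turn as "positive" against the other two; the text states neither convention explicitly. The function name is hypothetical, and the small matrix is made up for illustration and does not reproduce Table 1.

```python
def per_class_metrics(confusion, labels):
    """Return ({label: {'sensitivity', 'specificity'}}, total_accuracy).

    confusion[i][j] = number of cases with desired class i that the
    classifier assigned to class j (assumed orientation).
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(labels)))
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # missed cases of class k
        fp = sum(row[k] for row in confusion) - tp   # other classes called k
        tn = total - tp - fn - fp
        metrics[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return metrics, correct / total


# Made-up counts for illustration only.
labels = ("normal", "restrictive", "obstructive")
toy_confusion = [
    [8, 1, 1],
    [0, 4, 1],
    [1, 0, 4],
]
metrics, accuracy = per_class_metrics(toy_confusion, labels)
print(metrics["normal"], accuracy)
```

The function does not guard against a class with no cases, for which sensitivity is undefined.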

In order to determine the performance of the classifier used for spirometric pattern diagnosis, the classification accuracies (specificity, sensitivity, total classification accuracy) on the test set are presented in Table 2. The total classification accuracy of the SVM was 97.32%. The performance of a test can be evaluated by plotting a receiver operating characteristic (ROC) curve, and ROC curves were therefore used to describe the performance of the SVM. ROC plots provide a view of the whole spectrum of sensitivities and specificities because all possible sensitivity/specificity pairs for a particular test are graphed. A good test is one for which sensitivity rises rapidly and 1-specificity hardly increases at all until sensitivity becomes high. The ROC curve presented in Fig. 5 demonstrates the performance of the SVM on the test files.
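How the sensitivity/specificity pairs of an ROC curve are obtained can be sketched as follows: the decision threshold is swept over the classifier scores and one point (1-specificity, sensitivity) is recorded per distinct score. This is a generic illustration. The text does not state how the curve in Fig. 5 was constructed for the three-class problem, and the function names, scores and labels below are hypothetical.

```python
def roc_points(scores, is_positive):
    """Return ROC points as (1 - specificity, sensitivity) pairs.

    scores      : classifier scores, higher meaning "more likely positive"
    is_positive : matching booleans giving the reference decision
    Both classes must be present, otherwise a rate is undefined.
    """
    pairs = sorted(zip(scores, is_positive), key=lambda p: -p[0])
    n_pos = sum(1 for flag in is_positive if flag)
    n_neg = len(is_positive) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Cases sharing the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores for four cases, two of them positive.
pts = roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
print(pts, area_under_curve(pts))
```

An area close to 1 corresponds to the behaviour described above, in which sensitivity rises rapidly while 1-specificity stays near zero.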

Fig. 5 ROC curve

Table 2 The values of the statistical parameters

Discussion

The current study shows that a multiclass SVM applied to spirometric parameters can determine normal, restrictive and obstructive patterns with a success rate of 97.32%. The presented method can make rapid and accurate screening of large populations possible and can be particularly useful for general practitioners during diagnosis and treatment at the primary care level. Determination of the obstructive pattern can be used as a predictor for the diagnosis of COPD [1]. Although airway obstruction may not be an essential component of the diagnosis of asthma, spirometric limitations are important in determining the groups with persistent airflow limitations [25]. Since both COPD and asthma are obstructive airway diseases with high morbidity and mortality, early diagnosis and determination of the severity of these diseases are important [1, 26, 27]. Clearly, without TLC measurements, spirometry alone is not adequate for the diagnosis of restrictive lung disease. However, spirometry is useful for demonstrating the absence of a restrictive defect [5]. Thus, unnecessary lung volume testing can be avoided and health care costs can be reduced. In addition, this study contributes to the literature by presenting information on the performance of the spirometric algorithm and provides a basis for comparison with other algorithms. Such comparisons can reveal the most successful algorithms applied to lung function test parameters, which may later become available in the market.

Conclusions

Our purpose was to investigate the accuracy of a multiclass SVM with the ECOC, trained on the three spirometric parameters, for the diagnosis of normal, restrictive and obstructive patterns. The multiclass SVM performed well, which may be attributed to its mapping of the features to a higher dimensional space. The total classification accuracy of the multiclass SVM was 97.32%. The results of the present study demonstrate that the multiclass SVM can be used in the diagnosis of spirometric patterns, provided that the misclassification rates are taken into consideration.