Introduction

Myocardial perfusion scintigraphy (MPS) is a well-established clinical procedure for the diagnosis and evaluation of patients with suspected or known coronary artery disease (CAD). Visual interpretation of electrocardiogramme (ECG)-gated MPS can be difficult, especially for less experienced physicians at hospitals with a low volume of examinations. Tools for quantifying the images have therefore been developed to assist physicians, with the aim of improving accuracy and reducing variability among readers. One approach to quantification is segmental defect or uptake scoring, in which summed rest, stress and difference scores measure the extent and severity of abnormalities [1]. Another approach is to calculate the extent of a defect [2]. Such quantification can be valuable both diagnostically and prognostically [1, 3].

A third approach is to train an artificial neural network to interpret MPS and to present diagnostic advice on the presence or absence of myocardial ischaemia and infarction instead of a quantitative value [4]. Neural networks of this type have previously been developed to analyse the perfusion images of the rest and stress studies. Because ECG-gated imaging is now widely used, we recently presented an automated decision support system for interpreting ECG-gated MPS [5]. The system uses a realistic three-dimensional model of the left ventricle, an active shape algorithm and methods for extracting features that describe the perfusion and function of the left ventricle at stress and rest. These features are used as inputs to artificial neural networks, which are trained on a database of classified MPS images. Our decision support system was trained and tested for the interpretation of myocardial infarction using MPS from a single hospital, and its performance improved when functional information from ECG-gated MPS was added to the perfusion information. However, such a system will only be used widely if it remains accurate at other centres, where differences in protocols, radiopharmaceuticals and imaging procedures, as well as in interpreting style, may limit transferability. The purpose of this study was to evaluate the neural-network-based decision support system at a hospital other than the one where it was trained and to compare its performance with that of a conventional quantification software package.

Material and methods

Patients

Training material

The training material consisted of patients who underwent ECG-gated MPS between August 1, 2003 and April 30, 2004 at the Royal Brompton Hospital, London, UK. We excluded patients with cardiomyopathy (n = 20), patients with incomplete data (n = 4) and studies with technical problems (high liver or gut uptake, n = 27; problems with image transfer, n = 58). The study population comprised 418 patients with a mean age of 64 ± 12 years (range 12–89 years); 64% were men. Only one examination per patient was included. MPS was performed for the diagnosis of CAD in 193 patients and for the management of known CAD in 229 patients. There was previous myocardial infarction in 149 patients and previous revascularisation in 118 patients (35 coronary artery bypass grafting (CABG), 55 percutaneous coronary intervention (PCI) and 28 both CABG and PCI). Diabetes was present in 26%, hypertension in 51% and hyperlipidaemia in 63% of the patients; 15% were current smokers, and 32% had a positive family history. Mean left ventricular ejection fraction was 63 ± 14%.

Test material

The test material consisted of patients who underwent ECG-gated MPS between September 15, 2004 and September 14, 2005 at Sahlgrenska University Hospital in Gothenburg, Sweden. Patients with incomplete data (n = 4) and studies with technical problems (high liver or gut uptake or motion artifacts, n = 25) were excluded. The study population comprised 532 patients with a mean age of 62 ± 11 years; 49% were men. Only one examination per patient was included. MPS was performed for the diagnosis of CAD in 421 patients and for the management of known CAD in 107 patients. There was previous myocardial infarction in 79 patients and previous revascularisation in 98 patients (34 CABG, 51 PCI and 13 both CABG and PCI). Diabetes was present in 16%, hypertension in 50% and hyperlipidaemia in 44% of the patients; 16% were current smokers, and 30% had a family history of coronary disease. Mean left ventricular ejection fraction was 65 ± 11%.

The study was approved by the Research Ethics Committee at Gothenburg University.

Stress protocol

Training material

Patients were stressed with adenosine combined with sub-maximal dynamic exercise, or with adenosine alone in patients with left bundle branch block (n = 41) or a paced rhythm (n = 17). Dobutamine was used when adenosine was contraindicated. Exercise or pharmacological stress was continued for at least 2 min after injection of the tracer.

Test material

Patients were stressed with either a maximal, symptom-limited ergometer exercise test (53%) or pharmacological stress with adenosine. Exercise or pharmacological stress was continued for at least 2 min after injection of the tracer.

Radionuclide imaging

Training material

The stress and rest studies were performed in a 1-day protocol using 99mTc-tetrofosmin, with 250 MBq for the stress study and 750 MBq for the rest study. In patients weighing over 100 kg, the activities were increased to 350 MBq and 1,000 MBq, respectively, and in patients weighing over 120 kg a 2-day protocol was used with 1,000 MBq for each study. Scintigraphic data acquisition started 30–60 min after injection of the 99mTc-tetrofosmin and was performed using a dual-head camera (Philips Forté, Philips Medical Systems, Milpitas, CA, USA). The planar projection images were acquired in step-and-shoot mode using a 180° elliptical rotation starting from the 45° right anterior oblique position, with the patient supine. A low-energy high-resolution collimator and a zoom factor of 1.46 were used. Thirty-two projections over 180° were obtained in a 64 × 64 matrix with a pixel size of 5.4 mm. The acquisition time was 45 s per projection for the stress study and 60 s per projection for the rest study. The rest study was acquired in ECG-gated mode using 16 frames and an R–R interval acceptance window of 40%.
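
The weight-dependent activity rules for the training site can be expressed as a small lookup function. This is a minimal sketch: the function name is hypothetical, and assigning weights of exactly 100 kg or 120 kg to the lower tier (the text says "over") is our assumption.

```python
def tetrofosmin_activities(weight_kg: float) -> dict:
    """Hypothetical helper: protocol and activities (MBq) per the training-site weight rules."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg > 120:
        # Heaviest patients: 2-day protocol with 1,000 MBq for each study.
        return {"protocol": "2-day", "stress_mbq": 1000, "rest_mbq": 1000}
    if weight_kg > 100:
        # Increased activities, still a 1-day protocol.
        return {"protocol": "1-day", "stress_mbq": 350, "rest_mbq": 1000}
    # Standard 1-day protocol.
    return {"protocol": "1-day", "stress_mbq": 250, "rest_mbq": 750}


print(tetrofosmin_activities(85))  # {'protocol': '1-day', 'stress_mbq': 250, 'rest_mbq': 750}
```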

Test material

The gated single photon emission computed tomography (SPECT) studies were performed using a 2-day 99mTc-sestamibi protocol with non-gated stress and gated rest acquisitions. Stress and rest acquisitions began about 60 min after injection of 600 MBq 99mTc-sestamibi. Images were acquired with two different dual-head SPECT cameras (Infinia or Millennium VG, General Electric, USA) equipped with low-energy, high-resolution collimators. Acquisition was performed with the patient supine in step-and-shoot mode using a circular orbit, a 64 × 64 matrix, a zoom factor of 1.28 and a pixel size of 6.9 mm, with 30 projections over 180° at 40 s per projection. In patients weighing over 90 kg, the acquisition time was increased to 55 s per projection. During the rest acquisition, the patient was monitored with a three-lead ECG. The acceptance window was set to ±20% of the predefined R–R interval, except in a very limited number of studies in which a wider window was used; beats outside the window were rejected. Each accepted R–R interval was divided into eight equal time intervals. The gated SPECT acquisition was performed simultaneously with the ungated routine SPECT acquisition. An automatic motion-correction programme was applied to studies showing patient motion during acquisition.
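
The beat acceptance and framing described above can be sketched as follows. This is a simplified illustration that assumes each beat's R–R interval and the arrival times of counts after the R wave are known (as in list-mode data); the function names are hypothetical, and camera implementations may instead bin counts prospectively using the predefined R–R interval rather than each beat's own length.

```python
from typing import Iterable, List, Sequence, Tuple


def accept_beat(rr_ms: float, nominal_rr_ms: float, window: float = 0.20) -> bool:
    """True if the beat's R-R interval lies within +/- window of the predefined interval."""
    return abs(rr_ms - nominal_rr_ms) <= window * nominal_rr_ms


def frame_index(t_ms: float, rr_ms: float, n_frames: int = 8) -> int:
    """Equal-width frame (0..n_frames-1) containing time t_ms after the R wave."""
    if not 0.0 <= t_ms < rr_ms:
        raise ValueError("event time must lie within the beat")
    return min(n_frames - 1, int(n_frames * t_ms / rr_ms))


def gate_counts(
    beats: Iterable[Tuple[float, Sequence[float]]],
    nominal_rr_ms: float,
    n_frames: int = 8,
    window: float = 0.20,
) -> Tuple[List[int], int]:
    """Accumulate counts into n_frames gates; beats outside the window are rejected."""
    counts = [0] * n_frames
    rejected = 0
    for rr_ms, event_times in beats:
        if not accept_beat(rr_ms, nominal_rr_ms, window):
            rejected += 1  # the whole beat is discarded
            continue
        for t in event_times:
            counts[frame_index(t, rr_ms, n_frames)] += 1
    return counts, rejected


# (R-R interval in ms, event times in ms after the R wave) for three beats.
beats = [(1000, [0, 130, 500, 999]), (1300, [10]), (950, [900])]
print(gate_counts(beats, nominal_rr_ms=1000))  # ([1, 1, 0, 0, 1, 0, 0, 2], 1)
```

The `n_frames` parameter covers both sites: eight frames at the test site and 16 at the training site.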

Image processing

Training material

Tomographic reconstruction was performed with a Butterworth filter (critical frequency 0.5 of the Nyquist frequency, order 5) and iterative reconstruction with 30 iterations. AutoQuant software (Philips Medical Systems, Milpitas, CA, USA) was used to aid visual interpretation [6, 7]. No attenuation or scatter correction was used.
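
The effect of the Butterworth parameters can be illustrated by filtering a single projection profile in the frequency domain. This sketch assumes the amplitude form B(f) = 1/sqrt(1 + (f/fc)^(2n)), with f and fc expressed as fractions of the Nyquist frequency; vendors differ in the exact convention, and the sketch does not reproduce the iterative reconstruction itself. The function names are hypothetical.

```python
import numpy as np


def butterworth(f_nyq: np.ndarray, cutoff: float, order: int) -> np.ndarray:
    """Butterworth low-pass amplitude response; f_nyq and cutoff as fractions of Nyquist."""
    return 1.0 / np.sqrt(1.0 + (f_nyq / cutoff) ** (2 * order))


def filter_profile(profile: np.ndarray, cutoff: float = 0.5, order: int = 5) -> np.ndarray:
    """Low-pass filter a 1D projection profile (one matrix row) in the frequency domain."""
    freqs = np.fft.rfftfreq(profile.size)  # cycles per pixel; Nyquist = 0.5
    response = butterworth(freqs / 0.5, cutoff, order)
    return np.fft.irfft(np.fft.rfft(profile) * response, n=profile.size)


# Response is 1 at zero frequency, 1/sqrt(2) at the cut-off and strongly suppressed at Nyquist.
print(butterworth(np.array([0.0, 0.5, 1.0]), cutoff=0.5, order=5))
```

A higher order gives a steeper transition around the cut-off; the DC gain of 1 means total counts in a profile are preserved.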

Test material

Tomographic reconstruction was performed using filtered back projection with a Butterworth filter with a critical frequency of 0.52 cycles per centimetre and order 5. Gated data were reconstructed using filtered back projection with a Butterworth filter with a critical frequency of 0.40 cycles per centimetre and order 10. The CEqual and QGS software packages were used to aid visual interpretation [6, 8]. No attenuation or scatter correction was used.
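
The two sites specify the Butterworth cut-off in different units (a fraction of the Nyquist frequency versus cycles per centimetre), and their pixel sizes also differ (5.4 mm versus 6.9 mm). Assuming that the reconstructed pixel size equals the stated acquisition pixel size and that the training-site value of 0.5 is a fraction of Nyquist, the values can be put on a common scale using f_Nyquist = 1/(2 × pixel size). The helper names are hypothetical.

```python
def nyquist_cycles_per_cm(pixel_mm: float) -> float:
    """Nyquist frequency (cycles/cm) for a given pixel size: 1 / (2 * pixel size)."""
    return 1.0 / (2.0 * pixel_mm / 10.0)


def fraction_of_nyquist(cutoff_cycles_per_cm: float, pixel_mm: float) -> float:
    """Convert a cut-off in cycles/cm to a fraction of the Nyquist frequency."""
    return cutoff_cycles_per_cm / nyquist_cycles_per_cm(pixel_mm)


def cycles_per_cm(cutoff_fraction: float, pixel_mm: float) -> float:
    """Convert a cut-off given as a fraction of Nyquist to cycles/cm."""
    return cutoff_fraction * nyquist_cycles_per_cm(pixel_mm)


# Training site: 0.5 x Nyquist with 5.4 mm pixels.
print(f"training cut-off: {cycles_per_cm(0.5, 5.4):.3f} cycles/cm")
# Test site: 0.52 (ungated) and 0.40 (gated) cycles/cm with 6.9 mm pixels.
print(f"test cut-offs: {fraction_of_nyquist(0.52, 6.9):.3f} and "
      f"{fraction_of_nyquist(0.40, 6.9):.3f} x Nyquist")
```

Under these assumptions, the training-site cut-off corresponds to about 0.46 cycles/cm, and the test-site cut-offs correspond to about 0.72 and 0.55 of Nyquist.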

Image interpretation

Training material

Visual interpretation by one of three experienced physicians at the time of clinical reporting was used as the standard for the scintigraphic presence or absence of myocardial infarction and/or ischaemia. For the purpose of this study, the clinical reports were reviewed by a fourth experienced physician, and differences of opinion were resolved by consensus. The presence of ischaemia and/or infarction was coded as absent, equivocal or present, and equivocal studies were classified as normal. For localisation, a five-segment model was used (anterior, septal, inferior, lateral and apical).
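
The three-level coding can be converted into binary training targets as sketched below, with equivocal findings mapped to normal as described. The patient-level rule (abnormal if the finding is present in at least one of the five segments) mirrors how patients are counted in the Results, but whether the whole-ventricle networks used exactly this target is our assumption. All names are hypothetical.

```python
SEGMENTS = ("anterior", "septal", "inferior", "lateral", "apical")

# Equivocal studies are classified as normal.
CODE_TO_TARGET = {"absent": 0, "equivocal": 0, "present": 1}


def segment_targets(report: dict) -> dict:
    """Binary target (1 = present) for each of the five segments for one finding."""
    return {seg: CODE_TO_TARGET[report[seg]] for seg in SEGMENTS}


def patient_target(report: dict) -> int:
    """1 if the finding is present in at least one segment, else 0."""
    return int(any(segment_targets(report).values()))


ischaemia = {"anterior": "absent", "septal": "equivocal", "inferior": "present",
             "lateral": "absent", "apical": "absent"}
print(segment_targets(ischaemia)["septal"], patient_target(ischaemia))  # 0 1
```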

Test material

Visual interpretation by one experienced physician at the time of clinical reporting was used as the standard for the scintigraphic presence or absence of myocardial ischaemia and/or infarction. The same coding and localisation systems as in the training material were used.

Automated image analysis

Neural-network-based decision support system

A completely automated method based on computerised image processing and artificial neural networks was developed for the interpretation of MPS with respect to myocardial ischaemia and infarction. The image processing technique for automated segmentation of the left ventricle and quantification of CArdiac FUnction (CAFU) has recently been presented in detail [9, 10]. CAFU is based on the active shape algorithm: the left ventricle is located and delineated in the SPECT images using a heart-shaped left ventricular model. Measures of stress and rest perfusion are then calculated automatically for each landmark of the model. Finally, features describing the size and severity of stress and rest defects, myocardial thickening within the defects and global left ventricular function (end-diastolic volume and ejection fraction) were calculated and used as inputs to separate artificial neural networks for interpreting myocardial ischaemia and infarction in each of the five myocardial segments and in the left ventricle as a whole.
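
The exact feature definitions are given in [9, 10] and are not reproduced here. As a rough illustration only, the sketch below computes a defect size, a defect severity and the mean thickening within the defect from per-landmark values, together with the ejection fraction from end-diastolic and end-systolic volumes. The 60% defect cut-off, the feature formulas and all names are hypothetical and are not the CAFU definitions.

```python
from dataclasses import dataclass
from typing import Sequence

DEFECT_CUTOFF = 0.6  # hypothetical: landmarks below 60% of peak uptake count as defect


@dataclass
class DefectFeatures:
    size: float        # fraction of landmarks inside the defect
    severity: float    # mean shortfall below the cut-off inside the defect
    thickening: float  # mean wall thickening inside the defect


def defect_features(uptake: Sequence[float], thickening: Sequence[float],
                    cutoff: float = DEFECT_CUTOFF) -> DefectFeatures:
    """Hypothetical defect descriptors from per-landmark uptake and thickening values."""
    if len(uptake) == 0 or len(uptake) != len(thickening):
        raise ValueError("need one thickening value per landmark")
    peak = max(uptake)
    relative = [u / peak for u in uptake]  # normalise to the brightest landmark
    inside = [i for i, r in enumerate(relative) if r < cutoff]
    if not inside:
        return DefectFeatures(0.0, 0.0, 0.0)
    return DefectFeatures(
        size=len(inside) / len(relative),
        severity=sum(cutoff - relative[i] for i in inside) / len(inside),
        thickening=sum(thickening[i] for i in inside) / len(inside),
    )


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Global ejection fraction (%) from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml


print(defect_features([100, 90, 50, 40], [0.40, 0.35, 0.10, 0.05]))
print(ejection_fraction(120.0, 45.0))  # 62.5
```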

An ensemble of artificial neural networks was used as the classifier for each classification problem. Each member of the ensemble was a standard multi-layer perceptron [11] with an input layer of one node per input feature, one hidden layer of five nodes and one output node encoding the presence of ischaemia or infarction. Each network was trained by gradient descent on a cross-entropy error function, augmented with a traditional momentum term and a Langevin extension [12]. To avoid over-training, a weight elimination regularisation term [13] was used. The ensemble consisted of 100 networks, and its output was the mean of the outputs of the individual members. During training, the target output was set to 1 when myocardial ischaemia or infarction was present and 0 when it was absent. For each test case, each network produced an output between 0 and 1, and values above a threshold were regarded as abnormal. Because ischaemia has proven to be the more difficult task, thresholds for the segmental networks were selected to give sensitivities of approximately 80% for ischaemia and 90% for infarction in the training material; these thresholds were then applied unchanged to the test material. For the comparison with the quantification software package, thresholds for the networks analysing the whole left ventricle were selected to give sensitivities of approximately 90% in the test material.
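
The classifier described above can be sketched in NumPy as follows. This is an illustrative reconstruction, not the authors' code: the tanh hidden activation, the initialisation, the learning rate, the momentum, the noise schedule and the weight-elimination constants are all our assumptions. The Langevin extension is implemented here as annealed Gaussian noise added to each weight update, one common form of Langevin updating, and ensemble members differ only in their random initialisation.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class MLP:
    """One ensemble member: inputs -> 5 tanh hidden nodes -> 1 sigmoid output node."""

    def __init__(self, n_inputs: int, n_hidden: int = 5, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.params = [
            self.rng.normal(0.0, 0.5, (n_inputs, n_hidden)),  # input-to-hidden weights
            np.zeros(n_hidden),                                # hidden biases
            self.rng.normal(0.0, 0.5, n_hidden),               # hidden-to-output weights
            np.zeros(1),                                       # output bias
        ]

    def forward(self, X):
        W1, b1, W2, b2 = self.params
        H = np.tanh(X @ W1 + b1)
        return H, _sigmoid(H @ W2 + b2[0])

    def fit(self, X, y, epochs=500, lr=0.05, momentum=0.9,
            noise=1e-3, lam=1e-3, w0=1.0):
        velocity = [np.zeros_like(p) for p in self.params]
        for epoch in range(epochs):
            H, out = self.forward(X)
            # Gradient of the mean cross-entropy error w.r.t. the output pre-activation.
            delta = (out - y) / len(y)
            W2 = self.params[2]
            dH = np.outer(delta, W2) * (1.0 - H ** 2)  # back-propagate through tanh
            grads = [X.T @ dH, dH.sum(axis=0), H.T @ delta, np.array([delta.sum()])]
            # Weight elimination: lam * sum (w/w0)^2 / (1 + (w/w0)^2), weights only.
            for i in (0, 2):
                r = self.params[i] / w0
                grads[i] = grads[i] + lam * 2.0 * r / (w0 * (1.0 + r ** 2) ** 2)
            sigma = noise * (1.0 - epoch / epochs)  # Langevin noise, annealed to zero
            for p, v, g in zip(self.params, velocity, grads):
                v *= momentum                                   # momentum term
                v -= lr * g                                     # gradient step
                v += sigma * self.rng.standard_normal(p.shape)  # Langevin noise
                p += v                                          # in-place update
        return self


class Ensemble:
    """Mean output of independently initialised MLPs (100 members in the study)."""

    def __init__(self, n_inputs: int, n_members: int = 100):
        self.members = [MLP(n_inputs, seed=s) for s in range(n_members)]

    def fit(self, X, y, **kwargs):
        for m in self.members:
            m.fit(X, y, **kwargs)
        return self

    def predict(self, X):
        return np.mean([m.forward(X)[1] for m in self.members], axis=0)


# Usage on synthetic data (three hypothetical features, small ensemble for speed).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
probs = Ensemble(n_inputs=3, n_members=5).fit(X, y).predict(X)
```

Averaging many independently trained members smooths out the variability of any single network; a threshold is then applied to the averaged output.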

Emory cardiac toolbox

For each patient in the test material, the same set of short-axis slices was processed both by our neural-network-based decision support system and by the automatic software package Emory Cardiac Toolbox (ECTb), which includes the CEqual quantification [8, 14]. ECTb automatically defines the apex and base of the left ventricle and generates polar maps; the operators adjusted the apex and base definitions manually when necessary.

The stress and rest studies were processed using the 20-segment model for scoring. Each segment was scored automatically for radiotracer uptake on a five-point scale (0 = normal; 1 = equivocal; 2 = moderately reduced; 3 = severely reduced; 4 = absent). The summed stress, rest and difference scores (SSS, SRS and SDS) were calculated automatically for each patient by the software package. The scores were generated by comparison with normal databases, and the ECTb scoring used in this study was based on the 2-day sestamibi normal file.
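
The summed scores can be computed from the 20 segmental scores as sketched below. The sketch uses the common definition SDS = SSS − SRS; ECTb's internal implementation is not described here, and some software instead sums only the positive segment-wise stress-minus-rest differences. The function name is hypothetical.

```python
from typing import Sequence, Tuple

N_SEGMENTS = 20
VALID_SCORES = range(5)  # 0 = normal ... 4 = absent uptake


def summed_scores(stress: Sequence[int], rest: Sequence[int]) -> Tuple[int, int, int]:
    """Return (SSS, SRS, SDS) from 20 segmental stress and rest scores."""
    if len(stress) != N_SEGMENTS or len(rest) != N_SEGMENTS:
        raise ValueError("expected 20 segmental scores per study")
    if any(s not in VALID_SCORES for s in (*stress, *rest)):
        raise ValueError("segmental scores must be integers from 0 to 4")
    sss, srs = sum(stress), sum(rest)
    return sss, srs, sss - srs  # SDS = SSS - SRS


stress = [0] * 16 + [2, 3, 3, 1]  # hypothetical patient
rest = [0] * 18 + [2, 1]
print(summed_scores(stress, rest))  # (9, 3, 6)
```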

Thresholds were applied to the scores, and values above the threshold were regarded as abnormal. As for the neural network outputs, the thresholds were selected to give a sensitivity of approximately 90% for both ischaemia and infarction.
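
One way to obtain a threshold with a target sensitivity is to sort the scores of the abnormal cases and place the threshold just below the k-th highest score, where k is the smallest number of true positives that reaches the target. The sketch assumes the rule "score above threshold means abnormal" and takes the highest threshold that reaches at least the target. How ties and the "approximately 90%" target were handled in the study is not stated, so this is an assumption. The same function applies to SDS, SRS or neural network outputs; the names are hypothetical, and `math.nextafter` requires Python 3.9 or later.

```python
import math
from typing import Sequence, Tuple


def threshold_for_sensitivity(scores: Sequence[float], labels: Sequence[int],
                              target: float = 0.90) -> float:
    """Highest threshold t for which 'score > t means abnormal' reaches the target sensitivity."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one abnormal case")
    k = max(1, math.ceil(round(target * len(positives), 9)))  # true positives required
    # Just below the k-th highest abnormal score, so that score still counts as 'above' t.
    return math.nextafter(positives[k - 1], -math.inf)


def sensitivity_specificity(scores, labels, t) -> Tuple[float, float]:
    """Sensitivity and specificity of the rule 'score > t means abnormal'."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t)
    n_pos = sum(1 for y in labels if y == 1)
    return tp / n_pos, tn / (len(labels) - n_pos)


sds = [12, 9, 8, 7, 5, 4, 4, 3, 2, 1, 0, 0, 1, 2, 3, 0, 0, 1, 0, 0]
ischaemia = [1] * 10 + [0] * 10
t = threshold_for_sensitivity(sds, ischaemia)
print(sensitivity_specificity(sds, ischaemia, t))  # (0.9, 0.8)
```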

Statistical analysis

The performance of the neural-network-based decision support system and of the ECTb scores in the test material was compared as follows. The thresholds applied to the neural network outputs and to the SDS and SRS values were chosen individually so that the sensitivities were equal, at approximately 90%, and the corresponding specificities were then compared. Because both methods were applied to the same studies, the significance of differences in specificity was tested with a McNemar-type statistic for paired data [15].
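
For the specificity comparison, only patients classified as normal by the reference interpretation contribute, and each of them is either correctly called normal or not by each method. The McNemar test uses only the discordant pairs. The sketch below shows both the exact binomial form and the continuity-corrected chi-square form; which variant was used in the study is not stated, and the names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def discordant_pairs(nn_correct: Sequence[bool], ectb_correct: Sequence[bool]) -> Tuple[int, int]:
    """b = only the network correct, c = only ECTb correct (normal patients only)."""
    b = sum(1 for a, e in zip(nn_correct, ectb_correct) if a and not e)
    c = sum(1 for a, e in zip(nn_correct, ectb_correct) if e and not a)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value: binomial test of b versus c with p = 0.5."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def mcnemar_chi2(b: int, c: int) -> Tuple[float, float]:
    """Continuity-corrected McNemar chi-square (1 df) and its p-value."""
    n = b + c
    if n == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    return stat, math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df


nn = [True, True, False, True, True, True]  # correctly called normal by the network?
ectb = [False, True, True, False, False, True]  # correctly called normal by ECTb?
print(discordant_pairs(nn, ectb))  # (3, 1)
print(mcnemar_exact(10, 2))  # 0.03857421875
```

Patients on whom both methods agree carry no information about which method is more specific, which is why only `b` and `c` enter the statistic.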

Results

The neural network sensitivities and specificities for ischaemia and infarction in different segments are shown in Table 1. The sensitivities for ischaemia were 83% and 85% in the inferior and lateral segments, respectively, and between 90% and 100% in the other segments. The corresponding specificities were between 75% and 85% in the different segments. The specificities for infarction were 85% in the inferior segment and very high (97–98%) in the other segments. The corresponding sensitivities were between 69% and 100%.

Table 1 Performance for myocardial infarction and ischaemia of the neural-network-based decision support system in test material

A total of 92 patients in the test material had myocardial ischaemia in at least one of the five segments, and 47 patients had myocardial infarction in at least one segment. The overall sensitivities and specificities of the neural network and the ECTb quantification for the diagnosis of ischaemia and infarction are shown in Table 2. At the same sensitivity, the decision support system showed significantly higher specificity than the ECTb quantification for both ischaemia and infarction.

Table 2 Specificity of the neural-network-based decision support system compared with ECTb quantification at the same sensitivity level

The sensitivities and specificities of the neural network and the ECTb quantification in men and women and in patients with and without previous myocardial infarction are shown in Table 3. The specificities were significantly higher for the neural network than for ECTb in all sub-groups, for both infarction and ischaemia. For ischaemia, the neural network performed worse in men than in women, and worse in patients with previous myocardial infarction than in those without. These sub-group differences were not seen for infarction.

Table 3 Performance of the neural-network-based decision support system compared with ECTb quantification in sub-groups of patients

Discussion

Main findings

A decision support system based on artificial neural networks can interpret MPS with respect to myocardial ischaemia and infarction at a hospital other than the one where it was trained, in good agreement with a very experienced physician. The neural network performed best in detecting myocardial infarction in the septum, with a sensitivity of 100% at a specificity of 98%, and worst in detecting myocardial ischaemia in the inferior segment, with both sensitivity and specificity of 83%. Complete agreement between the neural network and the experienced physician at the test hospital could not be expected, since the network was trained on interpretations by physicians at the training hospital and tested against interpretations by the physician at the test hospital. Previous studies have shown intra-observer variability in MPS interpretation even among very experienced physicians, as well as inter-observer variability between physicians working closely together in the same department [4]. The experienced physicians at our training hospital would therefore probably not have agreed with the experienced physician at the test hospital in every MPS case.

The performance of the neural network interpretations can also be affected by differences in patient populations and protocols between the training and test hospitals. The patient populations at the two hospitals differed considerably. MPS was performed for the management of known CAD in 55% of the patients at the training hospital and in 20% of the patients at the test hospital. The training material had a higher prevalence of previous myocardial infarction (36% vs. 15%), diabetes (26% vs. 16%) and hyperlipidaemia (63% vs. 44%), and nearly two thirds of the training patients were men, compared with roughly equal numbers of men and women in the test material. According to the expert interpretations, 38% of the training cases but only 13% of the test cases had myocardial infarction. The hospitals also differed in protocols, radiopharmaceuticals and imaging procedures: the training studies were gated SPECT studies performed in a 1-day protocol using 99mTc-tetrofosmin, whereas the test studies were performed in a 2-day protocol using 99mTc-sestamibi. There were further differences, for example iterative reconstruction in the training material and filtered back projection in the test material. All these differences in patient characteristics and imaging procedures between the hospitals where the networks were trained and tested could have influenced their performance. The good performance at the test hospital despite these differences suggests that it is feasible to make neural-network-based decision support systems available to a wide range of hospitals.

The comparison between the neural-network-based decision support system and the quantification software package ECTb showed higher specificity for the neural network at a common sensitivity of approximately 90%. The neural networks also showed higher specificities in the sub-groups of men, women, and patients with and without previous myocardial infarction. The neural networks integrate perfusion and function in the interpretation, whereas the ECTb quantification is based on perfusion only; this could contribute to the higher specificity of the neural networks for infarction. The ECTb methods are also rule-based expert systems, with rules such as “if the mean count value for the pixels of a specific myocardial segment is in the interval between X and Y, then a score of 2 (moderately reduced) is assigned to that segment” or “if the sum of the 20 segmental rest scores is below 4, then the study is defined as normal”. Comparisons between artificial neural networks and rule-based systems of this type, both in MPS [16] and in electrocardiography [17], a field with a very long history of automated interpretation programmes, have shown the superiority of the neural network technique. Interpretation of diagnostic images and electrocardiogrammes is to a great extent a pattern recognition task, and artificial neural networks have proven to be a powerful tool for that type of analysis.
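
The kind of fixed rules quoted above can be sketched as a small rule-based scorer. The cut-offs below, expressed as standard deviations below a normal-database mean, are hypothetical placeholders for the unspecified X and Y and are not ECTb's actual criteria; only the rule "summed rest score below 4 means normal" is taken from the text. The point of the sketch is that the decision boundaries are set by hand, whereas the networks' weights are learned from classified examples.

```python
from typing import Sequence

# Hypothetical (upper bound in SD below the normal mean, score) pairs; not ECTb's actual values.
SCORE_RULES = [(2.5, 0), (3.5, 1), (4.5, 2), (6.0, 3)]


def score_segment(mean_count: float, normal_mean: float, normal_sd: float) -> int:
    """Assign a 0-4 uptake score from hand-set intervals of the deficit in SD units."""
    deficit_sd = (normal_mean - mean_count) / normal_sd
    for upper, score in SCORE_RULES:
        if deficit_sd < upper:
            return score
    return 4  # absent uptake


def rest_study_is_normal(rest_scores: Sequence[int]) -> bool:
    """Rule quoted in the text: a summed 20-segment rest score below 4 means normal."""
    return sum(rest_scores) < 4


print(score_segment(60.0, 100.0, 10.0))  # 2 (4 SD below the normal mean)
print(rest_study_is_normal([0] * 17 + [1, 1, 1]))  # True
```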

Study limitation

The training and test materials consisted of MPS images and a gold standard classification of myocardial ischaemia and infarction for each case. Ideally, the materials should be large (at least on the order of hundreds of cases) and representative of cases seen in clinical MPS routine, and the gold standard should be accurate and independent of MPS (for example coronary angiography). These requirements are difficult to combine in one study. However, the neural network is not trying to emulate a gold standard such as coronary angiography. Its aim is instead to emulate the expert interpretation of the MPS study and to make that interpretation easily accessible to physicians reporting MPS studies, regardless of hospital location or the physicians’ own experience.

Clinical implications

Completely automated decision support systems for the interpretation of MPS can provide aid in clinical decision making. Physicians will be able to use the second opinion to improve their diagnostic accuracy. More accurate image interpretation can lead both to more efficient identification of patients who will benefit from further examinations or specific treatments, such as coronary angiography, CABG or PCI, and to more efficient decisions to avoid superfluous coronary angiography in patients with normal or nearly normal MPS. The results obtained in this project should be of value for the future development of decision support systems within the field of MPS and beyond.