Introduction

Spectral-domain optical coherence tomography (SD-OCT) and confocal scanning laser ophthalmoscopy (CSLO) are among the best-known imaging methods for diagnosing and monitoring primary open-angle glaucoma (POAG). The repeatability and variability of measurements are important features of any diagnostic method. Several studies have addressed the repeatability and variability of stereometric parameters of the optic disc and retinal nerve fiber layer (RNFL) measured by CSLO with the Heidelberg Retina Tomograph (HRT) [1–7], but their results are sometimes conflicting. A number of recent studies [8–14] have dealt with the repeatability and variability of RNFL measurements by SD-OCT with the Cirrus HD-OCT [8–12] or other instruments [13, 14], but to our knowledge none has compared them with HRT. The purpose of the present study was to compare the intrasession intraoperator and interoperator repeatability and the within-subject coefficients of variation of RNFL measurements by Cirrus HD-OCT and of stereometric parameters of the optic nerve head (ONH) and RNFL by HRT3 in patients with early glaucomatous visual field defect.

Material and methods

Subjects

Patients were recruited consecutively between January and April 2010 at the outpatient department of the S. Fyodorov Eye Microsurgery Complex, Federal State Institution, Moscow, Russian Federation. All enrolled patients had recently undergone a complete ophthalmologic evaluation. The inclusion criteria were best-corrected visual acuity of 20/40 or better, spherical equivalent refraction within ±5.0 D, astigmatism within ±3.0 D, clear media, open angles on gonioscopy, controlled intraocular pressure (IOP), and reliable repeated standard automated perimetry (Humphrey Field Analyzer II, Carl Zeiss Meditec Inc., Dublin, CA, USA) using the Swedish Interactive Threshold Algorithm (SITA) Standard 30–2 program. Perimetry was considered reliable if false positives, fixation losses and false negatives did not exceed 20% and no observable testing artefacts were present. The exclusion criteria were poor fixation, pronounced miosis, ocular surgery within 6 months before enrollment, coexisting retinal disease, uveitis, optic nerve abnormalities, and non-glaucomatous optic neuropathy. A total of 39 patients (39 eyes; 12 men and 27 women) were enrolled. Mean age was 62.9 ± 8.7 years (range 35–76 years). One eye of each patient was included; if both eyes were suitable, one was selected at random. Early POAG was diagnosed in 30 patients with glaucomatous ONH abnormalities (cup-to-disc ratio ≥0.5, cup-to-disc ratio asymmetry ≥0.2, or focal thinning of the rim), a history of elevated IOP before study entry, and an early glaucomatous visual field defect. An early glaucomatous visual field defect was defined according to the staging system of Mills et al. [15] as mean deviation ≥ −6.00 dB together with at least one of the following: glaucoma hemifield test outside normal limits; pattern standard deviation significant at P < 0.05; or a cluster of three or more points in an expected location depressed below the 5% level on the pattern deviation plot, at least one of which was depressed below the 1% level. Nine patients with suspicious but less pronounced ONH changes and an early glaucomatous visual field defect were classified as POAG suspects. Best-corrected visual acuity was 20/25 or better in all eyes except one eye with mixed astigmatism and visual acuity of 20/40. Spherical equivalent refraction ranged from −4.25 to +2.5 D. Astigmatism did not exceed 2.0 D in any case except one. This study adhered to the tenets of the Declaration of Helsinki and was approved by the local ethics committee; informed consent was obtained from all subjects.
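The perimetric criteria above amount to a small decision rule. The following is a minimal Python sketch, assuming the reliability indices and the three pattern signs of Mills et al. have already been read off the Humphrey printout; the class, field and function names are hypothetical, and neither the cluster search on the pattern deviation plot nor the check for testing artefacts is modelled.

```python
from dataclasses import dataclass


@dataclass
class VisualFieldResult:
    """Hypothetical container for one SITA Standard 30-2 test (field names are illustrative)."""
    mean_deviation_db: float
    false_positives_pct: float
    fixation_losses_pct: float
    false_negatives_pct: float
    ght_outside_normal_limits: bool
    psd_significant_p05: bool
    # Cluster of >= 3 points below the 5% level in an expected location on the pattern
    # deviation plot, at least one below 1%; finding the cluster is not modelled here.
    cluster_criterion_met: bool


def is_reliable(vf: VisualFieldResult, limit_pct: float = 20.0) -> bool:
    """All three reliability indices must not exceed 20% (artefact review is left to the examiner)."""
    return max(vf.false_positives_pct, vf.fixation_losses_pct, vf.false_negatives_pct) <= limit_pct


def is_early_glaucomatous_defect(vf: VisualFieldResult) -> bool:
    """Early defect: MD >= -6.00 dB plus at least one of the three pattern criteria."""
    any_pattern_sign = (
        vf.ght_outside_normal_limits
        or vf.psd_significant_p05
        or vf.cluster_criterion_met
    )
    return vf.mean_deviation_db >= -6.0 and any_pattern_sign
```

Keeping reliability and staging as separate functions mirrors the text, where reliability is an inclusion criterion and the Mills et al. staging defines the defect itself.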

Imaging methods

SD-OCT was performed with the Cirrus HD-OCT (Carl Zeiss Meditec Inc., Dublin, CA, USA), software version 4.5.1.11, using the “Optic Disc Cube 200 × 200” scan protocol analyzed by “RNFL Thickness Analysis”. The data acquisition and analysis software of the Cirrus HD-OCT has been described in detail by many authors [8–12]. The analysis protocol is fully automatic and requires no manual correction. In this study, we analyzed overall average RNFL thickness and RNFL thickness in the temporal, superior, nasal and inferior quadrants. Image quality (signal strength, on a 0–10 scale) was 8 or more in 91.0% and 7 in 7.7% of images; two images (1.3%) had a signal strength of 6.

CSLO was performed with the HRT3 (Heidelberg Engineering, Heidelberg, Germany), software version 1.5.1.0. Astigmatism was corrected as fully as possible. Each measurement consisted of three consecutive scans that were automatically averaged into a mean topographic image. Image quality was excellent or very good in 71.2% and good in 26.9% of images (mean standard deviation ≤30 μm); three images (1.9%) had mean standard deviations of 31, 33 and 38 μm, respectively. A contour line outlining the optic disc margin was drawn manually on the mean topographic image, and its placement was carefully checked using the black-and-white and 3D imaging modes. Once the contour line was drawn, the software automatically calculated all ONH measurements. Thirteen stereometric parameters of the ONH and RNFL were analyzed in this study (see Tables 1 and 2).

Table 1 Intraoperator and interoperator repeatability and means of Cirrus HD-OCT and HRT3 measurements in patients with early glaucomatous visual field defect
Table 2 Intraoperator and interoperator within-subject coefficients of variation of Cirrus HD-OCT and HRT3 measurements in patients with early glaucomatous visual field defect

Each patient was examined with both HRT3 and Cirrus HD-OCT without pupil dilation in a single session on the same day; the order of the two methods was randomized. SD-OCT was performed by two experienced operators (MKS and SNO), each taking two measurements in turn. After each measurement, the patient was asked to withdraw the head from the headrest and then reposition it. CSLO was performed in the same way by two experienced operators (MKS and INS). HRT3 automatically transfers the contour line to all subsequent images of the same patient. To avoid this transfer when comparing the operators with each other, we created two separate entries for each patient in the patient database.

Statistical analysis

Statistical analysis was performed using the R software package version 2.10.1 (The R Foundation for Statistical Computing, http://www.r-project.org, accessed January 28, 2010). The minimum required number of subjects was 31. This was calculated by setting the confidence interval at 25% on either side of the estimate of repeatability: $n = 1.96^2 / [2 \times 0.25^2 \times (m - 1)]$, where $n$ is the number of subjects and $m$ is the number of observations per subject [16]; with $m = 2$ this gives $n \approx 30.7$, which rounds up to 31.

Repeatability of measurements (sometimes called the repeatability coefficient) was evaluated according to the Bland and Altman method [17] as $\sqrt{2} \times 1.96 \times S_w$, where $S_w$ is the within-subject standard deviation. $S_w$ was calculated as the square root of the mean of the within-subject variances, each defined as half the squared difference between the two measurements. For the assessment of interoperator repeatability, we used only the first measurement of each operator. The standard error of $S_w$ was calculated as $se = S_w / \sqrt{2n}$, where $n$ is the number of subjects, and the 95% confidence interval (CI) was defined as $\pm 1.96 \times se$ [18].

We also calculated the within-subject coefficient of variation as $CV_w = 100 \times S_w / M$, where $M$ is the mean of all paired measurements. A different method was used when individual subjects' standard deviations were related to their means, as confirmed by a statistically significant Kendall's rank correlation tau (eight of 52 cases). First, $CV_w$ was defined as the square root of the mean of the squared coefficients of variation of the individual subjects (root mean square method) [19]. Then $S_w$ was back-calculated from the coefficient of variation as $S_w = CV_w \times M / 100$, and repeatability was again defined as $\sqrt{2} \times 1.96 \times S_w$. Coefficients of variation were compared by the Z test [20]. Standard errors of coefficients of variation were calculated as $se = CV_w / \sqrt{2n}$, where $n$ is the number of subjects [21].
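The formulas of this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code: it assumes exactly two measurements per subject (m = 2, as in this study), uses only the standard library, and does not reproduce the Kendall's tau check that decided which CVw method to use. The Z test shown is an ordinary two-sample statistic that treats the two CVw estimates as independent, which is an assumption; the exact form in [20] may differ. All function names are hypothetical.

```python
import math
import statistics

Z95 = 1.96  # two-sided 95% standard normal quantile


def required_subjects(m: int, rel_half_width: float = 0.25) -> int:
    """Subjects needed for the 95% CI of Sw to be +/- rel_half_width of Sw [16]."""
    return math.ceil(Z95 ** 2 / (2 * rel_half_width ** 2 * (m - 1)))


def within_subject_sd(pairs) -> float:
    """Sw from duplicate measurements; each subject's variance is half the squared difference."""
    variances = [(a - b) ** 2 / 2 for a, b in pairs]
    return math.sqrt(statistics.fmean(variances))


def repeatability(sw: float) -> float:
    """Bland-Altman repeatability coefficient: sqrt(2) * 1.96 * Sw."""
    return math.sqrt(2) * Z95 * sw


def sw_confidence_interval(sw: float, n: int) -> tuple:
    """95% CI of Sw with se = Sw / sqrt(2n), i.e. two measurements per subject."""
    se = sw / math.sqrt(2 * n)
    return sw - Z95 * se, sw + Z95 * se


def cv_within(pairs) -> float:
    """CVw (%) = 100 * Sw / M, where M is the mean of all paired measurements."""
    grand_mean = statistics.fmean([x for pair in pairs for x in pair])
    return 100 * within_subject_sd(pairs) / grand_mean


def cv_within_rms(pairs) -> float:
    """Root-mean-square CVw (%): used when subjects' SDs increase with their means."""
    per_subject = [statistics.stdev(p) / statistics.fmean(p) for p in pairs]
    return 100 * math.sqrt(statistics.fmean([cv ** 2 for cv in per_subject]))


def sw_from_cv(cv: float, grand_mean: float) -> float:
    """Back-calculate Sw = CVw * M / 100 (then apply repeatability() as usual)."""
    return cv * grand_mean / 100


def compare_cvs(cv1: float, cv2: float, n: int) -> tuple:
    """Z statistic and two-sided P for two CVw values, each with se = CVw / sqrt(2n).

    Assumes the two estimates are independent; the exact test in [20] may differ.
    """
    se1, se2 = cv1 / math.sqrt(2 * n), cv2 / math.sqrt(2 * n)
    z = (cv1 - cv2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z, math.erfc(abs(z) / math.sqrt(2))


print(required_subjects(2))  # 31
```

For a single pair of measurements both CVw methods give the same value; they diverge only when the per-subject coefficients of variation differ across subjects, which is when the root mean square method matters.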

Results

Intraoperator and interoperator repeatability of all parameters studied are shown in Table 1. To facilitate comparison of the parameters, their means (computed from all four measurements in each patient) are also shown in the rightmost column.

Intraoperator and interoperator within-subject coefficients of variation of all parameters studied are shown in Table 2 and Fig. 1. In both Tables 1 and 2, parameters are listed in order of increasing coefficient of variation (mean of the interoperator and averaged intraoperator values). Disc area is included in both tables for information only, because this parameter is not used directly in diagnosing POAG or evaluating its progression. Disc area measured by HRT3 was between 1.6 and 2.5 mm² in most patients; it was less than 1.6 mm² in four patients and 3.05 mm² in one patient.

Fig. 1 Intraoperator and interoperator within-subject coefficients of variation (CVw) of Cirrus HD-OCT and HRT3 measurements in patients with early glaucomatous visual field defect (the intraoperator data for both operators are averaged; disc area is not included). *, **, ***: SD-OCT values significantly lower than that of the best HRT parameter, mean cup depth (P < 0.05, 0.01 and 0.001, respectively)

The two operators showed similar intraoperator repeatability, except for mean cup depth and cup area (P < 0.01) and maximum cup depth (P < 0.05). The interoperator repeatability of Cirrus HD-OCT parameters did not differ significantly from the averaged intraoperator repeatability of both operators, except for RNFL thickness in the temporal quadrant (P < 0.05). In contrast, the interoperator repeatability values of most HRT3 parameters, except cup shape measure, were substantially and significantly higher than the averaged intraoperator repeatability of both operators (P < 0.05 to P < 0.001). Intraoperator and interoperator within-subject coefficients of variation showed the same relationships as the repeatability values.

According to Table 2 and Fig. 1, the Cirrus HD-OCT parameters showed much lower variability than the HRT3 stereometric parameters. The intraoperator within-subject coefficient of variation (averaged over both operators) of the best Cirrus HD-OCT parameter, average RNFL thickness, was 2.6–2.7 times lower than those of the best HRT3 parameters, mean cup depth and rim area (P < 0.001). The averaged intraoperator variability of RNFL thickness in the quadrants (except the nasal quadrant) was also significantly lower than that of mean cup depth (P < 0.05 to P < 0.001). The interoperator within-subject coefficients of variation of both average RNFL thickness and RNFL thickness in all quadrants were significantly lower than the interoperator variability of the best HRT3 parameter, mean cup depth (P < 0.001).

The within-subject coefficients of variation of RNFL thickness as assessed by the two methods (average RNFL thickness by Cirrus HD-OCT, mean RNFL thickness by HRT3) differed strikingly: those for SD-OCT were 5.4–7.3 times lower.

Among HRT3 parameters, mean cup depth, rim area, and linear cup/disc ratio were the least variable, while cup volume, cup area, and cup/disc area ratio were the most variable.

Discussion

Quantitative evaluation of glaucoma progression by any method cannot be performed without knowledge of its measurement error, because changes caused by test variability must be distinguished from real progression of the disease. Although other studies have been published on the repeatability and variability of SD-OCT and CSLO, the importance of the present study is that, to our knowledge, it is the first to report repeatability and variability data for two modern instruments, Cirrus HD-OCT and HRT3, in a head-to-head comparison in the eyes of the same patients with early glaucomatous visual field defect. This information is valuable for selecting the parameters that are potentially most useful for monitoring glaucoma progression. We therefore believe the results are worth reporting.

Repeatability (the repeatability coefficient) is a clear and practical measure of measurement error. It is the value below which the absolute difference between two measurements taken in the same subject is expected to lie for 95% of pairs of observations [17], and so it allows real change to be distinguished from measurement error. For example, according to our data, a change in average RNFL thickness as small as 5 μm (6 μm if the CI is taken into account) could be a significant sign of early POAG progression when measured by the same operator on Cirrus HD-OCT. This is in close agreement with the results of other studies [8–11, 14]. The repeatability data for the other Cirrus HD-OCT and HRT3 parameters shown in Table 1 could be used to evaluate other changes seen in patients with early POAG or suspected POAG.
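The step from 5 μm to about 6 μm follows from the formulas in the Statistical analysis section: because the repeatability coefficient is a fixed multiple of $S_w$, its upper 95% limit is the coefficient multiplied by $1 + 1.96/\sqrt{2n}$. The sketch below assumes the rounded 5-μm value from the text and n = 39 eyes, so its result is approximate; the function names are hypothetical, and applying the threshold to a single pair of visits is an illustrative simplification.

```python
import math

Z95 = 1.96


def repeatability_upper_limit(rc: float, n_subjects: int) -> float:
    """Upper 95% limit of a repeatability coefficient.

    The coefficient is proportional to Sw, so it inherits Sw's CI:
    rc * (1 + 1.96 / sqrt(2n)), with se(Sw) = Sw / sqrt(2n) for two replicates.
    """
    return rc * (1 + Z95 / math.sqrt(2 * n_subjects))


def exceeds_measurement_error(change: float, rc: float, n_subjects: int,
                              use_upper_limit: bool = True) -> bool:
    """True if an observed change is larger than the chosen repeatability threshold."""
    threshold = repeatability_upper_limit(rc, n_subjects) if use_upper_limit else rc
    return abs(change) > threshold


# Illustrative values: the rounded 5-um figure from the text and the 39 study eyes.
print(round(repeatability_upper_limit(5.0, 39), 1))  # 6.1
```

Using the upper CI limit is the more conservative choice: it flags fewer changes as progression but accounts for the uncertainty in $S_w$ estimated from a sample of 39 eyes.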

The most important question addressed in the present study was the comparison of the measurement errors of Cirrus HD-OCT and HRT3. This question cannot be answered with repeatability values alone, because repeatability is expressed in the units of each parameter and therefore does not allow comparison of measurement errors of parameters of different nature (scale), such as the Cirrus HD-OCT and HRT3 parameters. Even the repeatability of the similar parameters assessed by both instruments, average and mean RNFL thickness, cannot be compared directly because of the large difference in their scales (as shown by the mean values of 83.3 and 231 μm). Parameters of different nature (scale) are properly compared using coefficients of variation, which are independent of the unit and scale of measurement [21]. We therefore used within-subject coefficients of variation to compare the measurement errors of Cirrus HD-OCT and HRT3.

The results of the present study demonstrate low variability of RNFL measurements by the new SD-OCT method. Our data agree with the majority of recently published papers, in which the coefficient of variation of average RNFL thickness in healthy and glaucomatous eyes mostly ranges from 1.4 to 3.0% [8–11, 14]. Only a few authors have reported higher values (4.3–4.5% or more) [12, 13].

Studies on the variability of CSLO (HRT) measurements are contradictory, although more recent studies show results similar to ours. Most authors consider rim area and mean and/or maximum cup depth to be among the least variable HRT parameters [1–3, 6, 7], and cup volume (and sometimes other cup measures [2]), mean RNFL thickness, and RNFL cross-sectional area to be among the most variable [1, 3, 6]. Some authors reported different results: low variability of cup volume [4, 5] and cup area [4], and high variability of rim area [5]. However, these studies used the earliest HRT model. The quantitative data of these studies are inconsistent and sometimes cannot be compared with each other because of differences in the statistical methods used. Nevertheless, some data are very similar to ours, e.g., the interoperator coefficients of variation in the study by Strouthidis et al. [6].

It should be noted that the variability of HRT3 parameters depended on a few cases with outliers (values differing by more than 3 SD from the mean) much more than the variability of Cirrus HD-OCT parameters did. Outliers in standard reference height (both its values and the difference between measurements) were the most important. Outliers in scan depth (difference between measurements), optic disc size, astigmatism, age and, for the comparison of operators, in the difference in disc area (contour line position) also played a role. In practice, a very experienced HRT operator could notice such outliers and perform additional control examinations. But even after exclusion of outliers, the variability of HRT3 parameters still did not approach that of the Cirrus HD-OCT parameters.
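The 3-SD outlier rule mentioned above can be sketched as follows. This is a hypothetical helper, assuming the input is one value per eye (for example, the difference between two standard reference height measurements); the mean and SD are computed including the candidate values, as the plain description implies.

```python
import statistics


def flag_outliers(values, k: float = 3.0):
    """Return indices of values lying more than k sample SDs from the sample mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > k * sd]


# Hypothetical usage: per-eye differences between two measurements of one parameter.
pairs = [(0.31, 0.30), (0.28, 0.29), (0.35, 0.34)]
differences = [a - b for a, b in pairs]
suspect_eyes = flag_outliers(differences)
```

Because an extreme value also inflates the SD it is judged against, this rule can miss outliers in small samples; a median-based rule would be less affected, but the simple form above matches the definition given in the text.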

The most important result of the present study is the clear superiority of Cirrus HD-OCT over HRT3 in terms of measurement variability. This suggests that Cirrus HD-OCT might be a better instrument for monitoring glaucoma progression. However, that would be a hasty conclusion, because lower measurement variability of a parameter is not equivalent to higher sensitivity to change: for example, disc area has the smallest interoperator variability of the HRT stereometric parameters, but it may not be an indicator of glaucoma progression at all. Still, when parameters that show similar trends with glaucoma progression are compared, lower measurement variability would indeed mean higher sensitivity to change. This applies, for example, to the comparison of average RNFL thickness (Cirrus HD-OCT) with mean RNFL thickness (HRT3). Clinicians should be aware of these results when following glaucoma patients.

The results of this study should be interpreted in light of some limitations. First, we quantified only intrasession and not intersession (intervisit) measurement error. However, we believe that intrasession variability better reflects the measurement error of the method itself, because it is not influenced by changes in the patient’s condition. Moreover, the data on the relationship between intrasession and intersession variability are conflicting: intrasession within-subject coefficients of variation have been found to be higher [6] or lower [10] than intervisit ones, or practically the same [8]. Second, we studied only patients with early POAG and POAG suspects, so the repeatability and variability data established in this study cannot be extended to patients at other stages of the disease. Finally, we studied only stereometric HRT3 parameters and not the Moorfields Regression Analysis (MRA), the Glaucoma Probability Score (GPS), or discriminant functions such as those of Frederick S. Mikelberg (FSM) and Reinhard O.W. Burk (RB). However, all these measures are primarily based on stereometric parameters, and the measurement error of these parameters plays a role in any subsequent analysis.

In conclusion, the present study showed that SD-OCT with Cirrus HD-OCT provides excellent intrasession repeatability (both intra- and interoperator) of RNFL measurements, especially of average RNFL thickness, in patients with early glaucomatous visual field defect. The measurement error (expressed as the within-subject coefficient of variation) of RNFL measurements by Cirrus HD-OCT is much lower than that of HRT3 measurements of the stereometric parameters of the ONH and RNFL.