Introduction

Optical coherence tomography angiography (OCTA), an important extension of optical coherence tomography (OCT), has significantly advanced our ability to visualize and quantify the retinal and choroidal microvasculature [1, 2]. It compares the signal in consecutive B-scans at the same location to detect the movement of erythrocytes illuminated with near-infrared light and then generates perfusion images [3]. Compared with fluorescein angiography (FA), OCTA offers clear advantages: it is noninvasive, requires no dye, scans specific retinal depths within seconds and provides accurate size and localization information [4, 5]. Furthermore, it produces high-resolution digital images that are amenable to quantification of the retinal and choroidal vasculature, which can be used for the diagnosis and follow-up of retinal diseases [6,7,8,9].

Foveal avascular zone (FAZ) metrics are among the important OCTA quantitative measurements. The FAZ is a capillary-free zone in the center of the macula, bordered by the capillaries running in the inner retinal layers and the capillary network at the margin of the fovea [10, 11]. Area, perimeter and circularity are the three quantitative parameters commonly used to evaluate the FAZ. Changes in the FAZ metrics reflect the microcirculatory state of the fovea and are most likely related to macular ischemia, such as that seen in diabetic retinopathy [12].

FAZ metrics can be measured manually or automatically. Manual measurement requires outlining the FAZ border on OCT angiograms by hand, whereas in automated measurement this task is performed by image processing algorithms. Automated measurement is clearly more convenient and rapid, and it avoids inter- and intra-observer variability. However, the algorithms may outline the FAZ border in error, which leads to inaccurate FAZ metrics. Validation is therefore essential before these automated algorithms are applied in clinical practice.

Several models of OCTA devices are commercially available, and some of them provide a built-in algorithm for automated measurement of FAZ metrics [13, 14]. Reports comparing automated and manual measurements of FAZ metrics exist for Optovue but not for Cirrus OCTA [15, 16]. Our study aimed to investigate the reliability of the automated FAZ algorithm of Cirrus OCTA by comparing its repeatability with that of manual measurement and by analyzing the agreement between the manual and automated measurements.

Methods

Participants

This cross-sectional study was conducted at the Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong and adhered to the tenets of the Declaration of Helsinki for research involving human subjects. The study was approved by the institutional review board (IRB), and informed consent was obtained after the nature and possible consequences of the research were explained. Subjects between 18 and 50 years of age without any apparent ocular disease were recruited. All subjects had a normal retina, best-corrected visual acuity (BCVA) of at least 20/20 on the Snellen chart, intraocular pressure below 21 mmHg and refractive error within ± 6 diopters (D).

It was assumed that the 95% confidence interval of the within-subject standard deviation (Sw) should be within 15% of Sw, i.e., \(1.96 \times S_w/\sqrt{2n(m-1)} = 0.15 \times S_w\), which gives \(n = 1.96^{2}/[2(m-1) \times 0.15^{2}]\), where n and m represent the number of subjects and the number of measurements, respectively [17]. This assumption determined the sample size of the repeatability analysis: with four measurements per subject (m = 4), n was calculated to be 30. The sample size of the agreement analysis was determined by the formula n ≥ log(1 − β)/log(1 − α), where n, α and β denote the sample size, the discordance rate and the tolerance probability, respectively [18]. With α = 0.05 and β = 80%, n ≥ 32.
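For readers who wish to reproduce these two calculations, the following is a minimal sketch in Python (not the authors' code; the function names are ours) of the sample-size formulas given above.

import math

def n_repeatability(m, rel_precision=0.15):
    # n = 1.96^2 / [2 (m - 1) d^2], where d is the desired relative
    # precision of Sw (15% here) and m the number of repeated measurements.
    return math.ceil(1.96 ** 2 / (2 * (m - 1) * rel_precision ** 2))

def n_agreement(alpha=0.05, beta=0.80):
    # Smallest integer n satisfying n >= log(1 - beta) / log(1 - alpha).
    return math.ceil(math.log(1 - beta) / math.log(1 - alpha))

print(n_repeatability(m=4))  # 29 with this rounding; the study reports a required n of 30
print(n_agreement())         # 32, matching n >= 32 in the text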

OCTA imaging

Mydriasis was achieved with topical 0.5% tropicamide so that the pupils were dilated to more than 6 mm in diameter. Each eye was scanned four times consecutively with the Zeiss Cirrus HD-OCT 5000 with AngioPlex software (Carl Zeiss Meditec, Dublin, CA), operated by a skilled technician under the same conditions. This spectral-domain OCTA device features eye tracking during acquisition and generates angiographic images with a proprietary optical microangiography (OMAG) algorithm. The scanning protocol was a macular 3 mm × 3 mm scan. The viewing software performs automated tissue boundary detection by segmenting preset layers of interest at the posterior pole, and en face images of the superficial capillary plexus (between the ganglion cell layer and the inner plexiform layer) were generated. The current version of Cirrus OCTA provides analysis of the superficial but not the deep capillary plexus; thus, only the superficial capillary plexus was measured. Images with an automatically reported quality index (range 0–10) of less than 6 were excluded.

Measurement of the foveal avascular zone

The angiograms were exported in duplicate to two masked observers for measurement. The order of the images was randomized to avoid contextual bias. Using the tracing, scaling and caliper tools in ImageJ (National Institutes of Health, Bethesda, MD), the FAZ region was enclosed by manually traced outlines. The area and perimeter of the FAZ were then calculated, with all pixel values converted to micrometers. The circularity of the FAZ was calculated as 4π × area/perimeter², an index expressing the ratio of the measured area to the expected area of a circle with the same perimeter [19]. It describes the compactness of a shape relative to a circle: a ratio closer to 0 implies a more irregular, less circular shape [20]. All angiograms were also measured automatically with the embedded algorithm of Cirrus OCTA (version 10.0.0.14618).
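As an illustration only (not the authors' ImageJ workflow), the sketch below shows how the three FAZ metrics can be derived from a manually traced outline; the inputs outline_px (an ordered list of (x, y) pixel vertices) and um_per_px (the scan scale, e.g. 3000 µm divided by the image width for a 3 × 3 mm scan) are our assumptions.

import math

def faz_metrics(outline_px, um_per_px):
    # Convert the traced vertices from pixels to micrometers.
    xs = [p[0] * um_per_px for p in outline_px]
    ys = [p[1] * um_per_px for p in outline_px]
    n = len(outline_px)
    # Shoelace formula for the enclosed area (in um^2).
    area_um2 = 0.5 * abs(sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                             for i in range(n)))
    # Perimeter as the summed length of the traced polygon edges (in um).
    perim_um = sum(math.dist((xs[i], ys[i]), (xs[(i + 1) % n], ys[(i + 1) % n]))
                   for i in range(n))
    area_mm2 = area_um2 / 1e6
    perim_mm = perim_um / 1e3
    # Circularity: measured area over the area of a circle with the same
    # perimeter, i.e. 4 * pi * A / P^2 (1 = perfect circle, near 0 = irregular).
    circularity = 4 * math.pi * area_mm2 / perim_mm ** 2
    return area_mm2, perim_mm, circularity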

Statistical analysis

Repeatability of the four measurements was assessed with the within-subject standard deviation (Sw), precision (repeatability coefficient), coefficient of variation (CoV) and intraclass correlation coefficient (ICC). Sw was calculated as the square root of the within-subject mean square of error [21]. Precision was calculated as 1.96 × Sw, and CoV as 100 × Sw/overall mean [19]. The ICC was calculated with the single-measurement, absolute-agreement, two-way mixed-effects model in both the repeatability and the agreement analyses. ICC values below 0.5 indicate poor reliability, values between 0.5 and 0.75 moderate reliability, values between 0.75 and 0.9 good reliability, and values above 0.90 excellent reliability [22]. Boxplots were used to compare the repeatability of the manual and automated metrics for each parameter, with p values reported. The agreement between the first measurement of each subject by the different methods was analyzed with the paired t test, linear regression and Bland–Altman plots. Statistical significance was defined as p < 0.05. All statistical analyses were performed with SPSS version 19 (SPSS, Inc., Chicago, IL, USA) and GraphPad Prism (v5.01, GraphPad Software, Inc.).
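The following is a minimal sketch (not the authors' SPSS/Prism workflow) of how the statistics named above can be computed from a subjects × repeats matrix in Python; the function names are ours, and ICC(A,1) here corresponds computationally to the single-measurement, absolute-agreement, two-way model.

import numpy as np

def repeatability_stats(x):
    # x: array of shape (n_subjects, m_repeats) for one FAZ parameter.
    n, m = x.shape
    subj_means = x.mean(axis=1, keepdims=True)
    # Within-subject mean square of error (one-way ANOVA residual).
    ws_ms = ((x - subj_means) ** 2).sum() / (n * (m - 1))
    sw = np.sqrt(ws_ms)
    precision = 1.96 * sw
    cov = 100 * sw / x.mean()
    return sw, precision, cov

def icc_a1(x):
    # ICC(A,1): two-way model, absolute agreement, single measurement.
    n, k = x.shape
    grand = x.mean()
    msr = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # subjects (rows)
    msc = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # sessions/raters (columns)
    sse = ((x - x.mean(axis=1, keepdims=True)
              - x.mean(axis=0, keepdims=True) + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def bland_altman_loa(a, b):
    # Bias and 95% limits of agreement between two methods.
    diff = np.asarray(a) - np.asarray(b)
    bias, sd = diff.mean(), diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd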

Results

There were 35 volunteers recruited in our study (11 men and 21 women). One eye of each subject was evaluated, including 22 right eyes and 13 left eyes. The mean age of the subjects was 25.3 ± 4.5 years (range 20–47 years). The mean spherical equivalent was −2.21 ± 1.97 D (range −5.50 to +0.75 D) in the right eyes and −2.19 ± 2.12 D (range −5.75 to +0.75 D) in the left eyes. The mean image quality index was 8.40 ± 1.40 (range 6–10).

Table 1 presents the mean and standard deviation (SD) of the FAZ metrics measured automatically by the built-in algorithm and manually by the two independent observers. The mean FAZ area measured automatically was significantly smaller than that measured manually (0.255 ± 0.112 mm² vs. 0.324 ± 0.105 mm² and 0.340 ± 0.107 mm², p < 0.001). The FAZ perimeter and circularity were also smaller in the automated measurement than in the manual measurements. Figures 1 and 2 show two examples in which the automated algorithm outlined the FAZ border incorrectly. The border outlined by the automated algorithm was much smaller than that outlined by the two observers in 22.9% of cases, as demonstrated in the scatter plots (Fig. 3).

Table 1 Repeatability of FAZ metrics measured by Cirrus OCTA automatically (Z) and manually (A and B) in normal subjects
Fig. 1

An example of an error in the automated measurement of the foveal avascular zone by the Cirrus OMAG algorithm. The automated FAZ metrics are shown in the OCTA report (a), and the two images (b, c) were measured manually by the two observers (area, perimeter, circularity: 0.291 mm², 2.100 mm, 0.829 by observer A; 0.299 mm², 2.102 mm, 0.850 by observer B)

Fig. 2

An example of an error in the automated measurement of the foveal avascular zone by the Cirrus OMAG algorithm. The automated FAZ metrics are shown in the OCTA report (a), and the two images (b, c) were measured manually by the two observers (area, perimeter, circularity: 0.370 mm², 2.240 mm, 0.927 by observer A; 0.493 mm², 2.684 mm, 0.859 by observer B)

Fig. 3

Linear agreement (a–c) with 95% CI (blue zone) and Bland–Altman plots (d–f) of foveal avascular zone area measured by the two observers (A and B) and the automated algorithm (Z)

The repeatability of the FAZ metrics is also shown in Table 1. The manually measured FAZ area and perimeter had good repeatability (ICC ≥ 0.845, CoV < 13.48%), whereas the automated metrics had poor to moderate repeatability (ICC ≤ 0.600, CoV < 19.39%). The circularity measured by observer A (ICC = 0.608) and observer B (ICC = 0.538) had moderate repeatability, while that measured automatically (Z) had poor repeatability (ICC = 0.221).

The results of the agreement among the three measurement methods are presented in Table 2 and Figs. 3, 4 and 5. The paired t test revealed no significant differences between the two observers for any of the three parameters. The ICCs between the two observers' manual measurements were excellent for FAZ area and perimeter (ICC = 0.933 and 0.906, respectively) and moderate for circularity (ICC = 0.674). In contrast, the ICCs between the automated and manual measurements were poor (ICC ≤ 0.360) and similar for both observers. Linear regression showed much stronger agreement between the manual measurements than between the automated and manual measurements. The range of the 95% limits of agreement was similar for the automated measurement against either observer, whereas the range of the 95% limits of agreement between the two observers' manual measurements was only 20% to 31% of that of the automated–manual comparisons.

Table 2 Agreement of FAZ metrics measured by Cirrus OCTA automatically (Z) and manually (A and B) in normal subjects
Fig. 4

Linear agreement (a–c) with 95% CI (blue zone) and Bland–Altman plots (d–f) of foveal avascular zone perimeter measured by the two observers (A and B) and the automated algorithm (Z)

Fig. 5

Linear agreement (a–c) with 95% CI (blue zone) and Bland–Altman plots (d–f) of foveal avascular zone circularity measured by the two observers (A and B) and the automated algorithm (Z)

As shown in the boxplots (Fig. 6), the automated groups had a wider distribution than the manual groups in the repeatability analysis, whereas the manual metrics of the three parameters were quite comparable. The differences were statistically significant (p < 0.05) in most comparisons, except for the perimeter in the Z vs. A and Z vs. B comparisons (p = 0.249 and 0.066, respectively).

Fig. 6

Boxplots of area (a), perimeter (b) and circularity (c) measured by automated algorithm (Z) and two observers (A and B) in repeatability analyses

Discussion

In the current study, we found that with the Cirrus HD-OCT 5000 OCTA, the FAZ metrics of the superficial capillary plexus, including area, perimeter and circularity, measured automatically were significantly smaller than those measured manually. In addition, for all three parameters, the repeatability of the manual measurement was better than that of the automated measurement. The agreement between the two observers' manual measurements was also better than that between the automated and manual measurements. The Cirrus built-in automated algorithm clearly outlined the FAZ border incorrectly in 22.9% of cases in our study.

The strength of our study was the evaluation of the reliability of the automated FAZ metrics against the manual metrics on Cirrus 5000 OCTA. Several studies have assessed the repeatability, reproducibility or agreement of various devices, but few have done so for Cirrus. The past publications are summarized in Table 3 and compared with our current study. The number of scans, measurement methods, image randomization and statistical methods vary across studies and may affect the results; they are therefore worth comparing and discussing in detail.

Table 3 Comparison of Cirrus 5000 OCTA FAZ area measurement repeatability, reproducibility and agreement

Repeatability and agreement are key indicators for evaluating the reliability and applicability of any device used as a diagnostic or monitoring tool in clinical practice. The low repeatability of the built-in automated measurement (ICC = 0.600, 0.405 and 0.221 for FAZ area, perimeter and circularity, respectively) and its poor agreement with the manual measurement (wide 95% limits of agreement) indicate low reliability. As Table 3 shows, Anegondi et al. reported excellent repeatability of automated FAZ metrics (ICC = 0.99), but they quantified FAZ parameters with local fractal dimension methods instead of the built-in software [23]. The automated software was also evaluated in Shiihara et al.'s study but yielded a conflicting result, showing excellent manual–automated agreement with the same ICC model as ours (ICC = 0.987) [24]. However, they evaluated only one parameter and compared it with only one observer. Furthermore, they did not investigate the repeatability of the automated metrics, nor did they use Bland–Altman plots, which makes their results less persuasive.

Our study also found that the embedded algorithm segmented the FAZ border incorrectly in 22.9% of cases. This explains why the automated–manual agreement was poor and the repeatability of the automated measurement was low. As Figs. 1 and 2 show, the FAZ identified by the automated algorithm is marked as a yellow area that is obviously smaller than the manually outlined ones. There may be noise inside the FAZ on the superficial capillary plexus en face image, which could mislead the automated algorithm into recognizing the noise as capillary signal when outlining the FAZ border, resulting in inaccurate FAZ detection.

Our results also showed that the manual measurements of FAZ area and perimeter have good repeatability and excellent inter-observer reproducibility, while the measurement of circularity has moderate repeatability and inter-observer reproducibility. As shown in Table 3, other studies have reported excellent repeatability and reproducibility. Zhao et al. [1] reported a higher inter-observer ICC than ours (0.998 vs 0.933); unlike our manual measurement with ImageJ, they used a semiautomatic method in MATLAB to analyze the OCTA data. Better intra-observer repeatability and inter-observer reproducibility were demonstrated in both Shiihara et al.'s [24] and Dave et al.'s [25] studies. However, the eyes were scanned only once in their studies but four times in ours, which would certainly affect the obtained images and measurement results. Moreover, Shiihara et al. [24] did not report whether the images were randomized; in our study, the exported images were all randomized in sequence to avoid contextual bias. Furthermore, although ICCs provide measures of reliability, different forms of calculation can give different results when applied to the same data. Each form fits specific situations and should be applied appropriately and described clearly in the report. However, researchers often ignore or are unaware of the importance of reporting which form they used and simply state that they calculated the ICC to assess repeatability or agreement, without explaining the exact method [26,27,28]. Dave et al. and Shiihara et al. [24, 25] reported using linear mixed models and a one-way random-effects model, respectively, to calculate the intra-rater correlation coefficients, which differs from our approach. A two-way mixed-effects model has been suggested as appropriate for testing intra-observer reliability with multiple scores from the same rater; we therefore selected this model instead of a one-way random-effects model [29]. Although there were still minor differences between our results and the previously published literature, all the results showed that manual measurement of FAZ metrics has good to excellent repeatability and inter-observer reproducibility, demonstrating that manual measurement is indeed a reliable method for FAZ metrics.

Our study systematically investigated the repeatability of the embedded automated and manual measurements of the superficial FAZ metrics on Cirrus OCTA and the agreement between the automated and manual measurements. The results suggest that the current version of the embedded automated algorithm is not reliable in outlining the FAZ border compared with manual outlining. Caution is therefore needed when using the automatically measured FAZ metrics of Cirrus OCTA in clinical practice, and verifying that the automated FAZ outline is correct is worthwhile before the exported measurement results are put into use. Our results also call for the development of a more advanced algorithm to automatically quantify FAZ metrics on Cirrus OCTA.

We recognize some limitations of our study. First, all our subjects were healthy, so the results may not apply to patients with ocular diseases; further investigation is needed. Second, we assessed the reliability of the Zeiss Cirrus HD-OCT 5000 with AngioPlex software only in the 3 mm × 3 mm scanning mode and not in the 6 mm × 6 mm mode, which might yield a different result. Third, our results apply only to the current version of the Cirrus embedded algorithm.

In conclusion, the repeatability and agreement of the automatically measured FAZ metrics are worse than those of the manually measured FAZ metrics on the superficial capillary plexus with Cirrus OCTA in healthy subjects. The manual metrics are more reliable and can be considered the gold standard for FAZ metrics with Cirrus OCTA, while caution should be exercised with the automated metrics from the built-in algorithm.