Introduction

Assessment of distance visual acuity is the most frequently used procedure for estimating visual performance. There are many different visual acuity tests in use worldwide. In the US, the most commonly used charts include Snellen and Bailey acuity charts [1] as well as Early Treatment Diabetic Retinopathy Study (ETDRS) charts [2]. ETDRS visual acuity charts have five letters on each row and well-defined spacing between the letters and lines to standardize the crowding effect and the number of errors that are allowed on each line. Many consider ETDRS charts to be the gold standard in visual acuity testing [3]. The eight-orientation Landolt C is the international standard optotype for acuity testing as defined by ISO 8596 [4]. The standard includes information on the size progression and the spacing of the optotypes. Various countries have adopted the use of the Landolt C. In Germany, for instance, the national standard DIN 58220-3 [5], which builds upon ISO 8596, prescribes the Landolt C as the optotype to be used when acuity is tested for the purpose of a medical expert opinion or a fitness-to-drive examination. A computerized and automated variant of this test is the Freiburg Visual Acuity Test (FrACT) [6].

The variety of visual acuity tests throughout the world may critically interfere with the standardization and comparability of visual test methods.

The type of degradation of visual perception varies greatly between different ophthalmologic disorders [7, 8] and, thus, may not be reflected by all acuity tests to the same degree. Therefore, we assessed the accuracy and reliability of ETDRS charts and a projected eight-orientation Landolt C test in ophthalmologically healthy subjects as well as in patients with widely different eye diseases, from optical to retinal to visual pathway problems. Originally, we also compared the FrACT test in its eight-orientation Landolt C setting. For the patients’ response entry, we used a novel haptic input device, a Landolt-C-shaped rotary switch, which was operated by the subjects to indicate the perceived Landolt orientation. While we had expected that use of the new haptic input device would be intuitive to patients, the device proved suboptimal in real-world testing, particularly for older individuals. In order to avoid confounding the influence of the input device with that of the test type, FrACT results are not reported here.

The aims of this study were to quantify (i) the agreement and (ii) the test–retest reliability of the ETDRS and Landolt C visual acuity tests in a wider range of pathological conditions than in previous studies. Furthermore, we assessed two parameters that are of practical importance, namely (iii) the test durations, and (iv) the acceptance of the tests by the examinees as well as the subjects’ coping with the tests as rated by the examiner.

Subjects & methods

Subjects were excluded from the study if they met any of the following three criteria:

  • age below 18 years

  • the presence of more than one ophthalmologic disease

  • illiteracy (because the letters on the ETDRS visual acuity charts have to be named).

Altogether, we included 75 subjects presenting with a visual acuity of 0.2 (4/20) or better. [At a decimal acuity of 0.1 (4/40) and below, letter and Landolt-C results apparently deviate [7]; we therefore avoided this range.] They belonged to one of the six groups below:

  • normal ophthalmologic status; n = 12, median age: 40, range 23–68 years

  • opacity of the refractive media; n = 12, median age: 72 (61–84) years

  • maculopathy; n = 12, median age: 70 (33–78) years

  • optic neuropathy (optic neuritis, glaucoma, anterior ischemic optic neuropathy); n = 12, median age: 56 (26–77) years

  • chiasmal and postchiasmal visual pathway pathologies; n = 12, median age: 45 (23–79) years

  • amblyopia (12 strabismus, 3 deprivation); n = 15, median age: 37 (19–73) years

Prior to testing, the subjects were informed about the scientific purpose of the study as well as the practices used to protect their data and right to privacy. Informed consent was then obtained from all participants before the study began. The survey design was approved by the local institutional review board of the Medical School of Tuebingen University and was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki [9].

Testing

The tests were carried out in an artificially lit room, with optotype and background luminance levels in accordance with the guidelines for physiological visual acuity examinations by the Deutsche Ophthalmologische Gesellschaft (DOG, German Ophthalmologic Society) [10]. The luminance of the optotype background was between 82 cd/m² (ETDRS) and 283 cd/m² (Landolt), and the ambient illuminance was 2 lx for all test runs, in agreement with the range required by ISO [4]. The test distance was 4 m. Each of the three tests was performed on one eye only while the fellow eye was occluded. If both eyes were affected by the respective disease, the eye with the lower visual acuity was tested as long as its visual acuity was 0.2 (4/20) or better. The test sequence was randomized as follows: ABCABC, BCABCA, CABCAB, ACBACB, BACBAC, CBACBA (with: A = Landolt C projection, B = FrACT [data not reported, see Introduction], C = ETDRS charts). For each permutation, two subjects were tested, except for the amblyopia group with 15 subjects, where the permutations ABCABC, CABCAB and ACBACB were tested three times. There was no practice run, and the subjects were tested with their habitual refractive correction, i.e., with their own glasses or contact lenses. The forced choice method [11, p26] was consistently applied. For both tests, each optotype was read once; corrections were not allowed. Feedback on the accuracy of the responses was not given [12]. The ETDRS and Landolt-C tests were terminated if the subject made three or more mistakes in a row of five optotypes. The test duration, excluding explanations, was measured manually with a stopwatch for each single test run. Time was measured from the point when the subjects started to read the optotypes until they made more than two mistakes in one line. At the end of the testing, the subjects were asked to rate their subjective opinions regarding the two tests by means of visual analogue scales [13]. These analogue scales were lines with a length of 10 cm, labeled at both ends: one end with the German translation of “pleasant test” and the other end with “unpleasant test”. Concurrently, the examiner evaluated the test, also using analogue scales and taking into account how the individual patient coped with the different test designs.
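The six test sequences listed above are the six possible orderings of the three tests, each performed twice in a row. The following minimal Python sketch reproduces them and the stated allocation of subjects; the function names `test_sequences` and `allocation` are illustrative and not part of the study protocol.

```python
from itertools import permutations

def test_sequences():
    """Return every ordering of tests A, B, C, each ordering run twice in a row."""
    return ["".join(order) * 2 for order in permutations("ABC")]

def allocation(extra=()):
    """Subjects per sequence: two each, plus one more for every sequence in `extra`."""
    counts = {seq: 2 for seq in test_sequences()}
    for seq in extra:
        counts[seq] += 1
    return counts

# Groups of 12 subjects: two subjects per sequence.
standard_group = allocation()

# Amblyopia group (15 subjects): three sequences were used three times.
amblyopia_group = allocation(extra=("ABCABC", "CABCAB", "ACBACB"))

print(sorted(standard_group.items()))
print(sum(standard_group.values()), sum(amblyopia_group.values()))
```

With two subjects per sequence, each group of 12 is balanced across all orderings; the three additional amblyopia subjects account for the group size of 15.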

Landolt C

For the Landolt C projection visual acuity test, the Chart Projector CP-500 by Shin-Nippon was used, which has been approved for clinical studies by the DOG [10] and adheres to the standards ISO 8596 [4] and DIN 58220 part 3 [5]. There were five optotypes in a line for each visual acuity step, with three Landolt Cs having a straight and two a diagonal orientation [10]. The progression of optotype size was geometric, and the relative spacing between the Landolt Cs increased with decreasing size of the optotypes. Only one line of five optotypes (each with eight possible orientations) was presented to the examinee at a time. If at least three out of the five Landolt Cs were identified correctly, the line of the next smaller optotypes was projected [14]. There were two versions for the acuity levels 0.63, 0.8, 1.0 and 1.25. Thus, in the second run, a different version was offered in order to prevent memorizing the order of the Landolt Cs. If good visual acuity was expected, the test was started at the line corresponding to a visual acuity of 0.2. If previous examinations at the hospital suggested a low visual acuity, the presentation started at 0.05.
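The line-by-line procedure with its 3-out-of-5 criterion can be sketched as follows. This is only an illustration: the function name, the input format and the convention of reporting the smallest line passed as the result are assumptions, not details taken from the study.

```python
def last_passed_line(lines, min_correct=3):
    """Walk down the chart and return the acuity of the last line passed.

    `lines` is a list of (decimal_acuity, n_correct) pairs ordered from the
    largest to the smallest optotypes. Testing stops at the first line with
    fewer than `min_correct` correct responses. Returns None if even the
    first line is failed.
    """
    passed = None
    for acuity, n_correct in lines:
        if n_correct < min_correct:
            break  # three or more mistakes in a line of five: stop testing
        passed = acuity
    return passed

# Hypothetical run: the subject fails the 0.4 line, so later lines are never shown.
run = [(0.2, 5), (0.25, 4), (0.32, 3), (0.4, 2), (0.5, 5)]
print(last_passed_line(run))
```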

ETDRS

The retro-illuminated visual acuity charts A and B of the Steinbeis Transfer Center visual acuity tester [15], designed according to Ferris’ guidelines [16], were employed for the ETDRS test. The ten different Sloan letters C, D, H, K, N, O, R, S, V, Z were presented in standardized lines of five letters each. According to the manufacturer’s information [17], each line has the same reading difficulty. The letter sizes ranged from 58.2 to 2.91 mm, and the progression of letter height from line to line was geometric [16]. The space between letters was one letter’s width, and the space between lines was equal in height to the letters of the next lower line. The two charts were alternated between the two test runs to avoid memorizing responses from the first to the second run. The examinee was asked to read down the chart, starting at the top row and continuing with the next row of smaller optotypes as long as three out of the five optotypes in one line were identified correctly [14]. This testing procedure does not conform to the original ETDRS protocol but was chosen in order to be consistent with the applied Landolt C test procedure.
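The quoted letter sizes can be related to the 4 m test distance with a short calculation. The sketch below assumes the usual convention that a letter at logMAR 0 subtends 5 minutes of arc and that line size changes by 0.1 log unit per line; these assumptions and the function name are not stated in the text above. Under them, the quoted range would correspond to logMAR 1.0 down to −0.3.

```python
import math

def letter_height_mm(logmar, distance_m=4.0):
    """Height of a letter subtending 5 arcmin * 10**logmar at the given distance.

    Assumes a letter at logMAR 0 subtends 5 minutes of arc (standard convention).
    """
    angle_arcmin = 5.0 * 10 ** logmar
    angle_rad = math.radians(angle_arcmin / 60.0)
    return 2.0 * distance_m * 1000.0 * math.tan(angle_rad / 2.0)

# Geometric progression: each line is smaller by a factor of 10**0.1 (about 1.26).
for step in range(14):
    logmar = 1.0 - 0.1 * step
    print(f"logMAR {logmar:+.1f}: {letter_height_mm(logmar):6.2f} mm")
```

The computed heights at the two ends of this range come out close to the 58.2 mm and 2.91 mm quoted above; small differences can arise from rounding.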

Statistical analysis

For a general comparison of the two tests, an analysis of variance (ANOVA) [18] was used. The t test for two dependent samples [19] was applied for a comparison of the two tests over all disease groups and for estimating the differences between the two test runs within one disease group. For calculating the medians and corresponding 95 % confidence intervals of comparisons of the two test types within one disease group, the bootstrap [19], a nonparametric resampling method, was used. The Wilcoxon signed-rank test was used to assess the differences between the two test types with respect to test durations, the decline in duration between the test runs, and the visual analogue scale (VAS) results [19]. Bland-Altman plots [20] were used for visualizing the disparities between the first and the second test runs in visual acuity and test duration. The central lines show the means of the measurement differences, and the upper and lower lines indicate the mean ±1.96 standard deviations (limits of agreement [20]). The null hypothesis was rejected if p < 0.05; in that case, a statistically significant difference was assumed.
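As an illustration of two of the quantities described above, the following Python sketch computes Bland-Altman limits of agreement (mean difference ±1.96 standard deviations) and a bootstrap confidence interval for a median. It is a sketch under stated assumptions rather than the analysis code of the study: the percentile variant of the bootstrap, the number of resamples, the function names and the example values are all assumptions chosen for illustration.

```python
import random
import statistics

def limits_of_agreement(first, second):
    """Return (lower limit, mean difference, upper limit) for paired data.

    Differences are taken as second minus first, as in a test 2 minus
    test 1 comparison.
    """
    diffs = [b - a for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff - 1.96 * sd_diff, mean_diff, mean_diff + 1.96 * sd_diff

def bootstrap_median_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the median (assumed variant)."""
    rng = random.Random(seed)
    n = len(values)
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_index = int((1 - level) / 2 * n_resamples)
    upper_index = int((1 + level) / 2 * n_resamples) - 1
    return medians[lower_index], medians[upper_index]

# Made-up logMAR values for illustration only; these are not study data.
run1 = [0.30, 0.10, 0.52, 0.22, 0.40, 0.00, 0.18, 0.70]
run2 = [0.28, 0.10, 0.46, 0.24, 0.38, -0.02, 0.20, 0.62]

print(limits_of_agreement(run1, run2))
print(bootstrap_median_ci([b - a for a, b in zip(run1, run2)]))
```

A confidence interval for the median difference that includes zero would correspond to the "straddles zero" wording used in the Results.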

Results

Test–retest reliability

Visual acuities obtained in the second runs were similar to or slightly better than first-run scores. For ETDRS charts, the greatest difference between the two test runs was found in the optic neuropathy group, where the second run improved by 0.097 logMAR compared to the first run (Fig. 1). Landolt C test results showed better test–retest agreement: the highest deviation was seen in the opacity group, with a value of 0.048 logMAR. Pooling all disease groups, the divergence of the two test runs, with a 95 % probability, was ≤0.18 logMAR for both the Landolt and ETDRS tests. The confidence interval for the differences always straddled zero. Altogether, the Landolt C test provided slightly, but not significantly (p = 0.33), better test–retest reliability than ETDRS.

Fig. 1

Bland-Altman plots for test–retest visual acuity differences (test 2 minus test 1), segregated by disease group (see grey boxes). Multiple coinciding data points are represented as “sunflower” markers [21]

Inter-test agreement

In general, visual acuity scores obtained by the two different test types corresponded very well (see Fig. 2 for pertinent Bland-Altman plots). The maximum deviation between the Landolt C and ETDRS test scores was measured in the groups with maculopathy and amblyopia with a median difference of 0.048 logMAR each, with Landolt acuity being better than ETDRS acuity. The confidence interval for the differences always straddled zero. With this close agreement, there was no statistically significant difference between the results of the two tests.

Fig. 2

Bland-Altman plots for visual acuity differences (ETDRS results minus Landolt-C results) between ETDRS and Landolt-C; see also legend Fig. 1

Inter-test comparison of test durations

The absolute durations of each test run are given in Table 1. The Landolt C test lasted 1.8 times as long as ETDRS (142 vs. 77 s). In subjects with normal ophthalmologic status, the Landolt C test lasted 1.5 times as long as the ETDRS. The group with opacity of the refractive media needed the most time for both the Landolt C test (174 s) and ETDRS (83 s). In all groups, the duration difference between tests was significant (p ≤ 0.005, consistently).

Table 1 Test durations per run of the two tests per disease group

Examinees’ and examiner’s rating of the two tests

On the basis of the visual analogue scale (0 = unpleasant test, 100 = pleasant test) score medians, examinees rated the ETDRS charts better (median over all groups 90) than the Landolt C test (median 84, p < 0.001, Wilcoxon signed rank test), as can be seen in Table 2. Regarding their evaluation, the subjects with amblyopia had the most problems with the Landolt test. Figure 3 depicts these evaluations for the group amblyopia as an example. The examiner’s rating did not differ between the tests (ETDRS median 88, Landolt median 87, p = 0.20; see Table 2 for medians and corresponding confidence intervals per group).

Table 2 VAS (visual analogue scale) evaluations per run of the two tests per disease group

Fig. 3

Visual analogue scale (VAS) results for the group amblyopia. Both examinees (strongly, p < 0.001) and examiner (slightly, p = 0.20) preferred the ETDRS test. [Box-plot details: the median is indicated by the thick horizontal lines, the box covers the 25–75 % percentile range, the notches represent 95 % confidence intervals for the medians (in two boxes here they are slightly larger than the boxes), outliers (1.5× interquartile range beyond the quartiles) are indicated by dots, and the “antennas” indicate the range without outliers]

Discussion

Test–retest reliability of the two tests

Before comparing tests, the intra-test reliability needs to be assessed. Reproducibility of visual tests can be characterized by the deviations of acuity scores between repeated measurements under the same test conditions. According to Petersen [22], with a test criterion of 3 out of 5, merely one third of the subjects would reach the same results with an eight-position Landolt C test over repeated acuity assessments, and in 47 %, a difference of one line is expected. Thus, only a difference of three lines or more would indicate a relevant change of visual acuity [22]. Another study by Rosser [23], dealing with the sensitivity of ETDRS charts to clinical change, states that only a deviation exceeding 0.2 logMAR, which corresponds to two acuity lines, can reliably indicate a clinical change. Similarly, Arditi and Cagenello [24] found that, with 95 % confidence, ETDRS chart scores within ±0.1 logMAR for repeated test runs will only be achieved under optimal conditions. They state that an acuity change can be considered significant when the acuity results deviate by at least ±0.14 logMAR (±1.4 lines) [24]; these good results may be due to the low number of subjects (3), all of whom were very experienced in psychophysical testing. With the greatest discrepancies in our study being about one line (optic neuropathy patients with the ETDRS test) and 0.5 lines (opacity patients with the Landolt C test), and all the other discrepancies being very low (medians of differences below 0.01 logMAR), both tests in our study indicate an overall good test–retest reliability.

An important question to consider at this point is whether the improvement of 0.097 logMAR between repetitions found in the group with optic neuropathy is clinically relevant. This value seems rather high when compared to the limit of ±0.02 logMAR proposed for the agreement of two acuity tests [5]. That latter value, however, applies to data obtained with healthy individuals [25], for which the present study also found a much smaller test–retest difference.

Comparison of acuity scores between the ETDRS and Landolt C tests

Several studies have already investigated the variance of visual acuity scores across different visual acuity tests. Becker et al. [26] found that subjects with healthy ophthalmologic status and patients with cataract, strabismus amblyopia, refractive amblyopia and maculopathy, tested with both an eight-position Landolt C and an ETDRS test, did not show significantly differing scores between the two tests. Kuo et al. [7] found no statistically significant deviations in acuity results between a four-position Landolt C test and an ETDRS test for the visual acuity range ≥ 0.1 in patients with cataract, maculopathy and in healthy people. Teichler [8] described good agreement for testing with Landolt C (four orientations) and ETDRS charts over all his subject groups (healthy, refractive amblyopia, strabismus amblyopia, cataract and retinal diseases) for visual acuities above 0.32 (20/60). All these outcomes are in good accordance with our results.

For maculopathy patients with a visual acuity below 0.1 (20/200), Kuo et al. detected significant differences in their study [7]. Teichler [8] also found the largest discrepancies within the visual acuity range of 0.1–0.32 (20/200–20/60) in his sample of patients. In the current study, only subjects with a visual acuity ≥ 0.2 (20/100) participated.
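Because this discussion moves between decimal acuity, Snellen fractions and logMAR, a short conversion sketch may help. It uses the standard relations logMAR = −log10(decimal acuity) and decimal acuity = Snellen numerator divided by denominator; the function names are illustrative.

```python
import math

def decimal_to_logmar(decimal_acuity):
    """Convert decimal acuity to logMAR (logMAR = -log10 of decimal acuity)."""
    return -math.log10(decimal_acuity)

def snellen_denominator(decimal_acuity, numerator=20):
    """Denominator of the Snellen fraction numerator/denominator for a decimal acuity."""
    return numerator / decimal_acuity

# Acuity levels mentioned in the text, as 20-foot and 4-metre Snellen fractions.
for acuity in (0.1, 0.2, 0.32):
    print(
        f"decimal {acuity}: logMAR {decimal_to_logmar(acuity):.2f}, "
        f"20/{snellen_denominator(acuity):.0f}, "
        f"4/{snellen_denominator(acuity, numerator=4):.1f}"
    )
```

Nominal chart values are rounded, so a computed denominator can differ slightly from the labelled Snellen fraction.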

Test durations

We could not find any study that addressed the duration of a Landolt-C test in contrast to an ETDRS test. There are some studies on decreasing ETDRS test duration, e.g., by computerizing the test [23, 27–29]. In a study by Laidlaw et al. [27] with 70 adults (eight normal, six corneal disease, 27 surgical retinal disease, three optic neuropathy, 26 mixed pathology), median ETDRS testing time was 66 s. In Rosser’s [30] study, nearly half (49 %) of the 41 patients with cataract, pseudophakia or early glaucoma needed between 50 and 100 s to complete the ETDRS test. Even though these two studies [27, 30] implemented different test termination criteria, namely that a complete line of letters had to be read incorrectly to stop the test, the durations of those tests compare well with ours for the ETDRS charts.

Camparini et al. [28], who followed the original ETDRS protocol, measured a mean test duration of 99 ± 29 s for the ETDRS tests in their 57 subjects (35 refractive errors, six cataract, eight maculopathy, three trauma, two glaucoma, two diabetic retinopathy, one thyroid-associated ophthalmopathy), which is more time than our subjects needed. In contrast, the 40 ophthalmologic patients with various stable eye diseases in the study by Lim et al. [29] required only 35 s (mean) for reading the ETDRS chart, which is less than in our study. In Lim’s study, however, the forced choice method was not applied, and the tests were terminated when, after one motivational request to guess, the subjects stated that they could not recognize any more optotypes.

Visual analogue scale scores

The test ratings given by the patients and the examiner on the visual analogue scales represent subjective evaluations. The analysis of the visual analogue scale scores provides an impression of how well the different tests were accepted. One reason for the somewhat better acceptance of the ETDRS charts by the patients might be a higher level of familiarity with letters compared to Landolt Cs.

Conclusions

In conclusion, our results underscore that there is little difference between the outcomes of ETDRS and Landolt C tests. The present data suggest that the Landolt C may have a slight advantage in test–retest reliability, which will need to be confirmed in a larger sample of participants. The ETDRS test required less time and was preferred by the patients, while the examiner on average had no preference. Taken together, the present data do not clearly favor one or the other test.