Introduction

Contrast sensitivity has recently been gaining traction as a promising visual function endpoint. Compared to visual acuity (VA), the functional endpoint used for decades, contrast sensitivity appears to correlate better with vision-related quality of life and patient-reported outcomes [1] and with structural changes in several retinal conditions [2,3,4,5,6]. Test-retest repeatability, responsiveness to time and intervention, testing efficiency, and validity against criterion tests are also crucial for the validation of a new functional endpoint.

Among the currently available contrast sensitivity tests, the quantitative contrast sensitivity function (qCSF) method appears the most promising. Thanks to its built-in active learning algorithm, the qCSF method efficiently measures the contrast sensitivity function in only 3–5 min per eye, while also offering personalized testing. Further, in contrast to the traditionally used Pelli-Robson test, the qCSF test measures contrast sensitivity thresholds at multiple spatial frequencies, fulfilling the FDA requirement for the use of contrast sensitivity as a functional endpoint. It has already been used to report visual function outcomes in several ocular diseases [3,4,5,7,8]. At the same time, small studies suggest that qCSF-measured contrast sensitivity shows very good test-retest reliability [9,10,11,12], while other contrast sensitivity tests that evaluate thresholds at multiple spatial frequencies have been criticized for their low test-retest reliability [8,9,10,11,12,13,14].

Recent literature points towards the validation of qCSF-measured contrast sensitivity as a functional endpoint: in terms of responsiveness to time, qCSF contrast sensitivity appears to be affected earlier in the course of neurodegenerative diseases, including age-related macular degeneration and diabetic retinopathy (DR), and to differ significantly across disease stages [4, 13, 15,16,17,18,19,20]. In terms of responsiveness to treatment, changes in qCSF contrast sensitivity appear to be larger than changes in VA following anti-vascular endothelial growth factor (VEGF) injections [5, 21,22,23].

Regarding validity against criterion or construct tests such as the Pelli-Robson, several clinical trials that directly compare qCSF results with the traditionally used Pelli-Robson test are currently ongoing. In diabetic retinopathy alone, three such trials are evaluating the validity of qCSF contrast sensitivity as a functional endpoint [24,25,26].

While ongoing studies evaluate the qCSF method in terms of responsiveness to time and intervention, and validity against criterion tests, a comprehensive study reporting on the test-retest repeatability of the qCSF method is currently missing. Any potential visual function endpoint should be highly repeatable so that clinicians and investigators can accurately define true change in visual function.

Herein, we aim to investigate and report on the test-retest reliability and agreement of the qCSF method in the retina clinic, adding a piece of evidence towards the validation of qCSF contrast sensitivity as a functional endpoint.

Methods

Study design

This is a cross-sectional, observational, single-center study including patients from the retina clinic at Massachusetts Eye and Ear (MEE) recruited and tested from June 2021 to December 2022. The Institutional Review Boards at Massachusetts General Brigham approved the study protocol. Informed consent was waived because qCSF-measured contrast sensitivity is part of standard clinical testing at MEE and data were retrospectively reviewed. The study was compliant with the Health Insurance Portability and Accountability Act of 1996 and adhered to the tenets of the Declaration of Helsinki.

Subject enrollment and standard clinical testing

All retina clinic patients were eligible for inclusion in this test-retest study of the qCSF device as long as they had no previous experience with qCSF-measured contrast sensitivity testing. All participants were tested on their right eye only. Exclusion criteria were best corrected visual acuity (BCVA) < 20/200 or inability to complete testing. All subjects underwent a comprehensive ophthalmic examination including history taking, measurement of visual acuity (VA) with Snellen charts, measurement of intra-ocular pressure, color fundus photography, spectral-domain OCT imaging, slit lamp examination, and dilated fundus examination. Demographic and clinical characteristics, such as lens status, were recorded. Lens status grading followed the Lens Opacities Classification System (LOCS) III [7, 27] and was then simplified for the purposes of the multivariate regression analysis: a clear lens was graded as “clear”; NO1NC1 as 1+NS; NO2NC1, NO1NC2, or NO2NC2 as 2+NS; NO3NC1-3 or NO1-2NC3 as 3+NS; and the presence of NO4 and/or NC4 as 4+NS.
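In practical terms, the simplified category corresponds to the higher of the NO and NC grades. A minimal Python sketch of this recoding is shown below; the function name, the assumption that a clear lens corresponds to NO0NC0, and the pooling of grades above 4 are our own illustrative choices, not part of the study protocol.

```python
def simplified_ns_grade(no: int, nc: int) -> str:
    """Collapse LOCS III nuclear opalescence (NO) and nuclear color (NC) grades into
    the simplified nuclear sclerosis (NS) categories used in the analysis.
    Assumption: a clear lens is NO0NC0; grades of 4 or higher are pooled as 4+NS."""
    if no == 0 and nc == 0:
        return "clear"
    grade = min(max(no, nc), 4)   # e.g., NO2NC1 -> 2+NS, NO1NC3 -> 3+NS, NO4 (any NC) -> 4+NS
    return f"{grade}+NS"
```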

qCSF-measured contrast sensitivity test and re-test methodology

Contrast sensitivity was measured using the qCSF method on the AST platform (Adaptive Sensory Technology, San Diego, CA, USA), as previously described [13]. In brief, the qCSF method estimates the contrast sensitivity function by presenting spatially filtered optotypes that vary in both spatial frequency and contrast, enabling efficient testing of contrast sensitivity across multiple spatial frequencies in parallel [10]. Three filtered Sloan letters of the same spatial frequency and decreasing contrast are simultaneously displayed in a horizontal line on an LED screen at a viewing distance of 400 cm. The contrast of the right-most letter is chosen by the qCSF method and is usually near the threshold contrast, with the middle and left-most letters displayed at two and four times the contrast of the right letter, respectively [10]. The patient verbally reports the three letters presented on each screen to the examiner, who operates the test with a handheld tablet, recording “correct,” “incorrect,” or “no response.” The built-in adaptive Bayesian active learning algorithm uses a one-step-ahead search to identify the next stimulus (defined by spatial frequency and contrast) that maximizes the expected information gain [10], such that data collected at a single spatial frequency improve sensitivity estimates across all frequencies. This allows the device to present each patient with personalized optotypes at optimal contrast-spatial frequency combinations based on their previous responses. From this active learning sampling, the qCSF method generates a contrast sensitivity function curve integrating spatial frequencies ranging from 1 cycle per degree (cpd) to 18 cpd. Test completion takes 3–5 min per eye [10], a reasonable time that allows a contrast sensitivity test to be integrated into routine clinical practice. To investigate test-retest variability and agreement, each participant’s right eye was tested two consecutive times on the same day.
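To illustrate the principle of the one-step-ahead search described above (not the proprietary AST implementation), the Python sketch below selects the stimulus with maximal expected information gain over a gridded posterior. The two-parameter log-parabola CSF model, the psychometric parameters, the grids, and all function names are simplifying assumptions of ours; the actual qCSF algorithm uses a richer parameterization.

```python
import numpy as np
from itertools import product

# Illustrative parameter grid: peak gain and peak spatial frequency of a
# simplified two-parameter log-parabola CSF (the real qCSF model is richer).
peak_gains = np.linspace(10, 500, 25)
peak_freqs = np.linspace(0.5, 8, 20)
thetas = np.array(list(product(peak_gains, peak_freqs)))
posterior = np.full(len(thetas), 1.0 / len(thetas))        # uniform prior over CSF candidates

# Candidate stimuli: (spatial frequency in cpd, log10 sensitivity demanded, i.e. log10(1/contrast)).
freqs = np.array([1.0, 1.5, 3.0, 6.0, 12.0, 18.0])
demands = np.linspace(0.2, 2.4, 12)
stimuli = list(product(freqs, demands))

def log_sensitivity(theta, freq, bandwidth=2.0):
    """Observer's log10 contrast sensitivity at `freq` under CSF parameters theta."""
    peak_gain, peak_freq = theta
    return np.log10(peak_gain) - (np.log2(freq / peak_freq) / bandwidth) ** 2

def p_correct(theta, stim, slope=3.0, guess=0.1, lapse=0.04):
    """Psychometric probability of correctly reporting a filtered Sloan letter (10-letter alphabet assumed)."""
    freq, demand = stim
    d = slope * (log_sensitivity(theta, freq) - demand)
    return guess + (1.0 - guess - lapse) / (1.0 + np.exp(-d))

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def next_stimulus(posterior):
    """One-step-ahead search: choose the stimulus with maximal expected information gain."""
    h_now, best_stim, best_gain = entropy(posterior), None, -np.inf
    for stim in stimuli:
        pc = np.array([p_correct(t, stim) for t in thetas])  # P(correct | theta) for every candidate
        p_yes = np.sum(posterior * pc)                       # marginal P(correct response)
        post_yes = posterior * pc / p_yes
        post_no = posterior * (1.0 - pc) / (1.0 - p_yes)
        expected_h = p_yes * entropy(post_yes) + (1.0 - p_yes) * entropy(post_no)
        if h_now - expected_h > best_gain:
            best_gain, best_stim = h_now - expected_h, stim
    return best_stim

def update(posterior, stim, correct):
    """Bayes update after a response; evidence at one frequency re-weights whole-CSF
    candidates, so estimates improve across all spatial frequencies."""
    pc = np.array([p_correct(t, stim) for t in thetas])
    posterior = posterior * (pc if correct else 1.0 - pc)
    return posterior / posterior.sum()
```

In each trial, such a sketch would call next_stimulus(posterior), present the chosen optotype, and then call update() with the observer's response; the one-step-ahead criterion is what allows a handful of trials to constrain the entire curve.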

Statistical analysis and test-retest reliability and agreement outcomes

Statistical analysis was performed using R version 4.1.2 (RStudio). The population demographics and ocular characteristics were described using traditional descriptive methods. Data that were not normally distributed were reported as medians with interquartile ranges (IQR). In evaluating test-retest repeatability, variability, and agreement of qCSF-measured contrast sensitivity, the following study outcomes were examined for each qCSF metric: (1) the means of the test and re-test measurements were compared using a paired t-test; (2) the variability of the test and re-test measurements was compared using the Brown-Forsythe test, since the data were not normally distributed; (3) the intraclass correlation coefficient (ICC) with 95% confidence intervals (CI) was used to evaluate the reliability between the test and re-test measurements; and (4) Bland-Altman plots with mean difference (MD), coefficients of repeatability (CoR), and 95% limits of agreement (LoA) (set at two standard deviations) were used to evaluate the agreement and repeatability between the test and re-test measurements [28]. Statistical significance was set at p < 0.05. Density and box plots were generated to evaluate the distribution of results in the test and re-test trials.
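As a rough illustration of this analysis pipeline (the original work was done in R; this Python sketch, with hypothetical column names such as `AULCSF_test` and `AULCSF_retest` and the `pingouin` package for the ICC, simply mirrors the steps listed above):

```python
import numpy as np
import pandas as pd
from scipy import stats
import pingouin as pg  # used here for the intraclass correlation coefficient

def test_retest_summary(df: pd.DataFrame, metric: str = "AULCSF") -> dict:
    """df holds one row per eye with hypothetical columns '<metric>_test' and '<metric>_retest'."""
    x = df[f"{metric}_test"].to_numpy()
    y = df[f"{metric}_retest"].to_numpy()

    # (1) Compare the means of the test and re-test measurements with a paired t-test.
    t_stat, t_p = stats.ttest_rel(x, y)

    # (2) Brown-Forsythe test of equal variability (Levene's test with median centering).
    bf_stat, bf_p = stats.levene(x, y, center="median")

    # (3) Two-way, single-measure ICC for test-retest reliability.
    long = pd.DataFrame({
        "eye": np.tile(np.arange(len(x)), 2),
        "session": np.repeat(["test", "retest"], len(x)),
        "score": np.concatenate([x, y]),
    })
    icc_table = pg.intraclass_corr(data=long, targets="eye", raters="session", ratings="score")
    icc = icc_table.loc[icc_table["Type"] == "ICC2", "ICC"].item()

    # (4) Bland-Altman statistics: mean difference, SD, CoR (2 SD), and 95% limits of agreement.
    diff = y - x
    md, sd = diff.mean(), diff.std(ddof=1)
    return {
        "t_p": t_p, "brown_forsythe_p": bf_p, "icc": icc,
        "mean_diff": md, "sd": sd, "cor": 2 * sd, "loa": (md - 2 * sd, md + 2 * sd),
    }
```

The choice of a two-way single-measure ICC and of 2 SD (rather than 1.96 SD) for the limits of agreement are assumptions made to match the wording of this section; they can be swapped as needed.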

Results

Demographics

Our cohort comprised 121 eyes of 121 patients. The mean age was 58 ± 18.4 years (range, 13 to 89 years). Fifty-seven percent of patients were male and 43% were female. Regarding race, 76.9% were white, 6.6% were Asian, 4.1% were black or African American, 9.1% were of another race, and 3.3% declined to share their race or their race was unavailable. Regarding ethnicity, 84.3% were non-Hispanic, 10.7% were Hispanic, and 5% declined to share their ethnicity or their ethnicity was unavailable. The lens status was normal in 25.6%, 1+ in 28.9%, 2+ in 14.1%, 3+ in 5%, and pseudophakic in 26.5% of eyes. The mean best corrected visual acuity was 0.21 ± 0.29 logMAR (20/17 to 20/63), ranging from −0.12 to 1.40 logMAR (20/15 to 20/502). The ocular diseases present included: retinal detachment (n=21), epiretinal membrane (n=18), diabetic retinopathy (n=13), age-related macular degeneration (n=12), retinal tear or hole (n=10), lattice degeneration (n=10), central serous chorioretinopathy (n=7), diabetes mellitus without diabetic retinopathy (n=6), open-angle glaucoma (n=5), high myopia (n=4), macular hole (n=4), retinal vein occlusion (n=4), ocular hypertension (n=3), central retinal artery occlusion (n=2), history of retinopathy of prematurity (n=2), choroidal metastases (n=1), long-term hydroxychloroquine use without retinopathy (n=1), cranial nerve VII palsy (n=1), graft versus host disease (n=1), hypertensive retinopathy (n=1), and vitreomacular adhesion (n=1).

Descriptive statistics

Density plots for the test and re-test measurements of each qCSF metric revealed a skewed distribution and a mild learning effect (Supplemental Figure 1). Similarly, box plots for the test and re-test measurements of each qCSF metric revealed slightly higher means for the re-test measurements, consistent with the mild learning effect, and similar test-retest variability for all qCSF metrics (Fig. 1).

Fig. 1

Box plots displaying the distribution of results in the test and re-test measurements of the quantitative contrast sensitivity function (qCSF) test. The box represents the two middle quartiles of the sample, and the horizontal line within the box is the mean of the sample. The vertical lines extending from the box represent the range of the data sample. AULCSF, area under the logarithm of the contrast sensitivity function; CA, contrast acuity; CPD, cycles per degree

Test-retest reliability and variability

The difference between the means of the test and retest measurements for all qCSF metrics ranged from 0.02 to 0.05 (Table 1). When comparing the means of the test and re-test measurements using a paired t-test, these differences were found to be statistically significant, despite their small absolute value (Table 1). The Brown-Forsythe test revealed no significant differences in variability between the test and retest measurements for any of the qCSF metrics (Table 2).

Table 1 Results from a paired t-test comparing the mean of each quantitative contrast sensitivity function (qCSF) metric during the initial qCSF test trial and the re-test trial. Significant associations (p < 0.05) are bolded
Table 2 Results from a Brown-Forsythe test of variability comparing the variability of each quantitative contrast sensitivity function (qCSF) metric during the initial qCSF test trial and the re-test trial. Significant associations (p < 0.05) are bolded

The ICC revealed strong reliability between the test and re-test measurements for all qCSF metrics, with all ICC values >0.9 except for the 1cpd metric (ICC = 0.838, Table 3). The qCSF metric with the highest ICC was the AULCSF (ICC = 0.971, p < 0.001, Table 3).

Table 3 Test-retest reliability and agreement of each quantitative contrast sensitivity function (qCSF) metric: intraclass correlation coefficients (ICC) with 95% confidence intervals, mean differences ± standard deviation, coefficients of repeatability (CoR), and 95% limits of agreement (LoA)

The mean difference ± standard deviation and coefficient of repeatability (CoR) for the qCSF metrics were as follows: AULCSF 0.04 ± 0.08 (0.16), CA 0.03 ± 0.09 (0.17), 1cpd 0.03 ± 0.14 (0.27), 1.5cpd 0.03 ± 0.12 (0.22), 3cpd 0.04 ± 0.11 (0.21), 6cpd 0.05 ± 0.13 (0.26), 12cpd 0.04 ± 0.14 (0.26), and 18cpd 0.02 ± 0.10 (0.19) (Table 3, Fig. 2). Upper and lower limits of agreement (LoA) with 95% confidence intervals are also shown in Table 3 and Fig. 2. A mild learning effect was observed in all qCSF metrics, as mean retest measurements were consistently 0.02–0.05 higher than the respective mean test measurements (Fig. 2). For the AULCSF, the limits of agreement indicate that for 95% of patients the difference between qCSF test and retest ranges from −0.12 to +0.21 (without correcting for the learning effect), with the most probable difference being +0.04.
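For reference, with the limits of agreement set at two standard deviations (see Methods), the AULCSF figures above follow approximately as below; the reported upper limit of +0.21 presumably reflects an unrounded standard deviation.

\[
\mathrm{CoR} = 2\,\mathrm{SD} = 2 \times 0.08 = 0.16, \qquad
\mathrm{LoA} = \mathrm{MD} \pm 2\,\mathrm{SD} = 0.04 \pm 0.16 \approx [-0.12,\ +0.20]
\]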

Fig. 2

Bland-Altman plots showing the difference between the re-test and test measurements of the quantitative contrast sensitivity function (qCSF) test. The solid black line represents the mean difference between the trials, and the dotted red lines represent the 95% limits of agreement for each qCSF outcome metric. AULCSF, area under the logarithm of the contrast sensitivity function; CA, contrast acuity; CPD, cycles per degree; CI, confidence interval

Discussion

In this study, we comprehensively evaluated the test-retest reliability and variability of qCSF-measured contrast sensitivity in the retina clinic. Our results showed very strong reliability (ICC values) and test-retest agreement (Bland-Altman-derived CoR and limits of agreement) for all qCSF outcome metrics, adding a piece of evidence towards validating qCSF-measured contrast sensitivity as a functional endpoint.

In particular, the Bland-Altman plots revealed a mean difference of 0.02–0.05 log units between the test and retest measurements for all qCSF metrics. Standard deviations ranged from 0.08 to 0.14, and CoR ranged from 0.16 to 0.27, without correcting for the learning effect that was evident in our cohort of qCSF-naive participants (Table 4). When evaluating the reliability of the qCSF test, we found very strong ICC values for all qCSF metrics (all ICC > 0.9 except the threshold at 1cpd, where ICC = 0.84; all p < 0.001). This is in line with the finding that variability did not differ significantly between the test and retest measurements for any qCSF metric. The means of the test and re-test trials were found to be significantly different from one another for all qCSF metrics, yet the absolute difference between the test and retest means appears clinically insignificant (<0.05 log units). This can be explained by the sensitivity of the paired t-test to small systematic differences and by the mild learning effect observed in our qCSF-naive participants. The mild learning effect observed herein can also be appreciated in the Bland-Altman plots, where the upper limit of agreement has a larger absolute value than the lower limit and the mean differences are positive for all qCSF metrics (Fig. 2).

Table 4 Comparison of test-retest reliability among visual acuity and contrast sensitivity tests

Our findings are in line with previous smaller reports of qCSF repeatability in healthy volunteers [9, 11, 12], indicating that qCSF-measured contrast sensitivity shows excellent test-retest repeatability, low variability, and good agreement even in eyes with retinal diseases. Unsurprisingly, same-visit repeatability (as reported herein) was better than between-visit repeatability assessed at visits spaced up to 4 months apart (CoR 0.16–0.27 vs 0.21–0.27) [18]. Surprisingly though, the CoR we report herein for eyes with retinal diseases for the AULCSF (CoR = 0.16) was found to be as good as the CoR reported for the qCSF in healthy volunteers [9], matching the step-wise resolution of the Pelli-Robson chart (0.15 logCS change between consecutive letter triplets).

In Table 4, we provide a comprehensive comparison of repeatability metrics between qCSF-measured contrast sensitivity and the other currently available contrast sensitivity testing methods [29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45]. qCSF repeatability measures are equal to or better than those of the Pelli-Robson and Mars tests (Table 4), while the qCSF additionally provides contrast sensitivity thresholds at multiple spatial frequencies, as the FDA mandates. Other currently available contrast sensitivity tests that operate across multiple spatial frequencies (such as FACT, Vistech, and CSV-1000) have been criticized for their large variability and therefore limited clinical use (Table 4) [14]. As there is no functional endpoint currently validated and accepted besides ETDRS visual acuity, we also provide a comparison of qCSF-measured contrast sensitivity and VA in terms of repeatability measures (Table 4).

The US Food and Drug Administration (FDA) recommends change in visual function as a primary endpoint in trials assessing novel ocular therapeutics. A change of 15 Early Treatment Diabetic Retinopathy Study (ETDRS) letters in high-contrast best corrected visual acuity (BCVA) has consistently been used for the approval of ocular therapeutics. Yet, in most retinal conditions, VA is insensitive to early changes [1, 2, 5, 8, 15, 46, 47]. It is not uncommon for patients with very good VA to have subjective visual complaints and contrast sensitivity deficits [48,49,50]. However, BCVA is currently the only validated visual function endpoint recognized by regulators. This paucity of accepted and validated endpoints prompts the evaluation of new functional endpoints that could change the paradigm in visual function testing. qCSF-measured contrast sensitivity emerges as a promising endpoint as it is (1) measured in a time-efficient manner (2–5 min per eye) [10, 13]; (2) strongly associated with patient-reported outcomes, even more so than VA is [1]; (3) sensitive to longitudinal changes and differentiates between disease stages better than VA does [4, 17]; (4) well correlated with structural biomarkers in retinal disorders, even more so than VA is [3, 4, 17,18,19,20]; and (5) highly repeatable, as presented herein.

The main limitation of the study is the diversity of retinal conditions included, which was nevertheless intended to accurately represent the population of a real-world retina clinic, where clinicians seek to measure visual function in a sensitive, time-efficient, and repeatable way. We sought to employ the qCSF method and measure its test-retest repeatability and variability in the “general population” of a retina clinic irrespective of condition, in the same way visual acuity is used. By the same logic, the strengths of this study include a broad range of ages and visual acuities. Same-day consecutive testing and retesting further strengthens our results. Lastly, our moderate sample size did not allow for a sub-analysis of the test-retest repeatability of the qCSF method per visual acuity stratum or per retinal condition; nevertheless, it provides an overall testament to the performance of the qCSF method in the retina clinic. Future work evaluating the qCSF’s reliability and variability across different sites will provide even more solid evidence on the path towards validating qCSF as a functional endpoint.

In conclusion, in this study, we present data showing very strong reliability (ICC values) and test-retest repeatability and agreement (Bland-Altman-derived CoR and limits of agreement) for all qCSF outcome metrics, adding a piece of evidence towards validating qCSF-measured contrast sensitivity as a functional endpoint.