Introduction

Recent therapeutic advances have revolutionised the management of ophthalmic disease, but outcomes still depend on timely assessments for disease detection and monitoring [1]. Prominent examples include the two leading causes of visual impairment in the developed world: age-related macular degeneration (AMD) and diabetic retinopathy. In these conditions, progression may be too subtle to rely on patients self-reporting symptoms, so frequent hospital visits may be required to allow expedient treatment with intraocular injections when the disease is deemed active. Alternative treatment regimens may involve injecting at each hospital visit, but even then, optimum care depends on patients returning sooner should there be significant visual deterioration. Although home visual assessments may never be as precise a measure of disease activity as hospital-based imaging and examination, they could reduce the total burden of hospital visits and help detect deterioration more expediently [2]. Home vision monitoring is also potentially useful in the management of many other ophthalmic conditions, such as vein occlusions, drug toxicity, cystoid macular oedema, or monitoring occlusion therapy in amblyopia [3].

The potential for computerised measurement of visual acuity (VA) was reported as early as 1970, when researchers at Berkeley, CA used a basic computer to operate a slide carousel, projecting Landolt C images of ever-decreasing size onto a screen [4]. Later developments included the use of Palm mobile platforms (Palm, Inc., Santa Clara, CA, USA) to display visual stimuli on a CRT monitor for the assessment of distance VA [5, 6] and wireless keyboards to relay information to monitors via computers [7, 8]. Advances have been made through the use of dedicated hardware and software algorithms to improve detection and monitoring of macular degeneration with dedicated visual function measures [9], and through tests such as shape discrimination on a mobile phone to detect maculopathies [10–12]. Recently, mobile devices such as iPads have been used for many different tests of visual function, including VA. For example, the PEEKvision team are developing a mobile device for monitoring vision and are currently testing it in developing countries [13, 14]. Many systems have not yet been validated, but in the robust clinical studies that have been presented, limitations in screen resolution have dictated that acuity is measured with distance tests [10–12]. These studies show that results similar to gold standard chart-based tests can be achieved if care is taken to standardise factors such as the distance and position of the tablet computer, external glare sources are removed, and the tests are conducted by trained personnel.

With clinical and socioeconomic demands rising in parallel with an explosion in the availability of home computing technology, it is important that the potential of computerised home vision testing is widely exploited. In particular, there is a need for effective, portable, and fully automated systems for self-testing of vision in patients with ophthalmic pathology. To deliver this optimally with modern tablet computers, inconsistencies in physical screen luminance need to be controlled [15, 16] and sources of glare or reflections must be minimised [10]. Beyond this, however, more complex challenges exist: typical patients may not be accustomed to computer use and may have visual or physical difficulties that limit their ability to use such devices. They may be reluctant to interact with a device and might find it difficult to follow instructions or maintain adequate concentration. Crucially, therefore, electronic touch screen interfaces must be adapted to the specific needs and abilities of patients [17], and dedicated algorithms that are effective and engaging for the target population with a minimum of additional instruction must be developed in order to test effectively and autonomously [5, 6]. This paper describes the development and testing of a novel system for automated self-testing of vision, effective for typical patients with eye disease. The system addresses the above challenges and uses current, widely available tablet technologies that would be easily adaptable to home use.

Methods

Ethics

The research adhered to the tenets of the Declaration of Helsinki and ethics committee approval was obtained (National Research Ethics Service, UK committee reference number 10/H1013/58). All patients gave written consent after reading the patient information sheet (as approved by the ethics committee).

Development of the MAVERIC system

The Mobile Assessment of Vision by intERactIve Computer (MAVERIC) system consists of a tablet computer running purpose-built software coded in ActionScript 3 and housed in a bespoke physical booth (Fig. 1). The first design was founded on previous tablet computer tests [15, 16] and the team's experience in clinical and experimental vision testing. However, testing on patients revealed many structural and software limitations of the design, and these were iteratively addressed before re-testing. The electronic capabilities of the tablet, such as providing voice or sound feedback and measuring response time, were also explored and incorporated into the latest software. This development phase of the project lasted around one year, with 70 further patient tests completed and over 50 significant iterations to either software or hardware before we arrived at the current version of MAVERIC.

Fig. 1

The MAVERIC booth containing the tablet computer, with a screenshot of the mouse game displayed on the tablet

Physical characteristics

The MAVERIC system consists of a combination of specialised software on a tablet computer, housed in a bespoke viewing booth. For the high contrast visual acuity assessments, an iPad 3 (©Apple Inc.) was used. The structure allowed the patient easy access to respond to the iPad touchscreen whilst maintaining the correct viewing distance and minimising errors associated with angle of view, screen luminance, and the effect of external light reflections (Fig. 1). The device consists of an oblong booth, 50 cm long, with a viewing aperture at the front end looking onto an internal, adjustable tablet computer mount at the opposite end of the booth, at a distance of 40 cm. This enables the tablet to be secured and remain perpendicular to the observer's line of sight. The whole structure is elevated at the front end, allowing a more natural reading angle for the viewer while also giving access for the patient's hands to operate the touch screen. This space is protected by a curtain to reduce the likelihood of light intrusion onto the tablet screen. Protecting the screen from outside illumination removes the need for any special light controls in the testing room itself, an important consideration for a device that could be used for home testing. The distance to the tablet is adjustable through guide rails from 50 to 25 cm, whilst maintaining a perpendicular observation angle; for this study it was kept constant at 40 cm. A switch built into a forehead rest gives an audible indication to the patient when their head is correctly positioned at the viewing aperture.

The physical limitations of the tablet screen resolution dictated the design of the acuity test, which is based upon gap detection in a square array resembling a Landolt "C". We have previously established the uniformity and stability of the screen output of these devices [15], and the tablet was carefully calibrated with a photometer. For the assessments of acuity, the central target luminance was set at the minimum of 0.57 cd/m2 and the surrounding luminance at the maximum of 397.6 cd/m2, giving an overall contrast of 99 %.

The target is constructed from square blocks of 1, 4, 9, 16, … 400 pixels (n × n pixels for n = 1 to 20). These blocks are arranged to form a 5 × 5 open square, with one block missing from the centre of one side (Fig. 2). The nominal target size is expressed as the gap size in log minutes of arc (logMAR), thus allowing a direct translation to gold standard clinical logMAR acuity tests. The gap position is determined randomly. There were 20 available testing sizes based on the 40-cm testing distance and the iPad screen resolution. The resultant pixel pitch limited testing resolution at the smaller letter sizes, with the initial testing step going from −0.08 logMAR (6/5 Snellen) to 0.22 logMAR (6/10). During initial design we anticipated that, until screen resolutions advanced, this range would suffice to detect clinically important decreases in vision for a large number of clinical patients, and indeed the baseline VAs of the typical ophthalmic patients in this study concur. The system can, however, easily be adapted to be more sensitive to vision loss in other individuals by varying the pre-set distance to the touch screen. Such adaptations have now been rendered unnecessary by the recent availability of higher resolution screens that are compatible with our hardware and software, allowing our algorithms to immediately measure finer step sizes. The Nexus 10 (Google Inc.) would allow measurement down to −0.16 logMAR (6/4), with 0.15 logMAR (6/8.5) as the next step, at a testing distance of 40 cm. Similarly, the iPad mini retina display and Galaxy Tab Pro 8.4 (Samsung Electronics) would allow smaller increments of testing [minimum of −0.22 logMAR (6/3.7), with the next step at 0.09 logMAR (6/7.4), when testing at 40 cm].
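The translation from gap size in pixels to nominal logMAR follows directly from the pixel pitch and the viewing distance. A minimal sketch in Python (assuming the iPad 3's nominal 264 ppi pixel density, which the text does not state explicitly) reproduces the quoted values:

```python
import math

def gap_logmar(n_pixels: int, ppi: float, distance_mm: float) -> float:
    """Nominal logMAR for a gap n_pixels wide viewed at distance_mm:
    logMAR = log10(gap size in minutes of arc)."""
    pitch_mm = 25.4 / ppi                         # physical width of one pixel
    gap_arcmin = math.degrees(math.atan2(n_pixels * pitch_mm, distance_mm)) * 60
    return math.log10(gap_arcmin)

# iPad 3 (264 ppi assumed) at the 40-cm viewing distance:
#   n = 1 -> -0.08 logMAR, n = 2 -> 0.22 logMAR, n = 20 -> 1.22 logMAR,
# matching the step sizes and largest level quoted in the text.
for n in (1, 2, 20):
    print(n, round(gap_logmar(n, ppi=264.0, distance_mm=400.0), 2))
```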

Fig. 2

Target construction. Each target (shown above left) is composed of 15 blocks. Each of these 15 blocks is made up of a square cluster of varying numbers of pixels. The inset demonstrates how the increasing acuity sizes are built up using different numbers of individual pixels to make up each block

When the low contrast acuity testing phase began, newer tablets offering greater resolution had become available. Thus, for the low contrast assessments, a Galaxy Tab Pro 8.4 (Samsung Electronics) was used. The range of tested acuities was from −0.22 logMAR (6/3.7) to 1.09 logMAR (6/70). The screen was calibrated in the same manner as previously described; for the assessments of low contrast acuity, the central target luminance was set to 75.1 cd/m2 and the surrounding luminance to 125 cd/m2, giving a mean luminance of 100 cd/m2 and a Weber contrast of 25 %. In addition, bars were added around the near Landolt C target to produce a crowding effect.
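As a cross-check on the quoted figures, both the 99 % and 25 % contrast values are reproduced by the Michelson formula; the sketch below assumes that definition (the text labels the low contrast value as Weber contrast):

```python
def michelson_contrast(l_max: float, l_min: float) -> float:
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin)."""
    return (l_max - l_min) / (l_max + l_min)

# High contrast configuration (iPad 3): ~99 %
print(round(michelson_contrast(397.6, 0.57) * 100, 1))    # 99.7
# Low contrast configuration (Galaxy Tab Pro 8.4): 25 %,
# with mean luminance (125 + 75.1) / 2 of approximately 100 cd/m2
print(round(michelson_contrast(125.0, 75.1) * 100, 1))    # 24.9
```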

User interface

Four virtual buttons are provided, each adjacent to one of the possible gap positions (see Fig. 1). The user is required to press the button that corresponds to the location of the gap. The central target reduces in size and randomly changes the gap location as the test progresses. Following extensive patient testing and feedback sessions, these buttons were modified to be resistant to the typical hand tremors seen in clinic, and button size, colour, and pattern were varied until an optimum graphic was achieved. The button is animated so that it appears to have been pressed, accompanied by a click, thus providing both audible and visible feedback. The software is also programmed to give verbal encouragement to keep trying if the patient fails to respond to a target within a time limit. This was introduced to reduce the risk of the threshold acuity being underestimated through either loss of attention or hesitancy to guess. If no response was given even after the verbal encouragement, an incorrect response was recorded.
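The paper does not specify the tremor-resistance mechanism. One plausible approach, shown purely as an illustrative sketch, is a refractory window that collapses rapid repeated touches into a single press; the window length and interface here are assumptions:

```python
import time

class TremorResistantButton:
    """Ignore any touch arriving within a refractory window of the last
    accepted touch, so tremor-induced repeats register as one press.
    Hypothetical sketch: not taken from the MAVERIC implementation."""

    def __init__(self, refractory_s: float = 0.3):
        self.refractory_s = refractory_s
        self._last_accepted = float("-inf")

    def on_touch(self) -> bool:
        now = time.monotonic()
        if now - self._last_accepted < self.refractory_s:
            return False              # likely a tremor-induced repeat: ignore
        self._last_accepted = now
        return True                   # register as a deliberate press
```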

Boredom from repetitive tasks was a considerable issue until we implemented a gamification element based on experience of testing in children [18]. Various methods were assessed before implementing the one to which adults responded best, with improved feedback, concentration, and completion of testing. This involved various animals appearing from behind the buttons and running into the gap within the C. In one animation, when correct responses are made, mice appear to take cheese through the gap. In others, pigs or sheep enter the gaps, which represent entrances to a pen. When the pen is full, there are additional appropriate reward animations. Preliminary testing showed that the introduction of these cartoon graphics also increased compliance in adults.

Testing algorithms

The MAVERIC vision testing strategy evolved from a combination of published reports using computerised adaptations of standardised and validated VA testing protocols [5, 6]. In preliminary testing, threshold determination was based on a single test phase. However, this resulted in higher thresholds (poorer VA) than those achieved through chart-based methods in many subjects. We therefore introduced two additional testing phases to make threshold testing more robust and reduce the chance of correct responses based on guesses. This more thorough paradigm then introduced another problem: threshold tests demand high levels of attention, and some subjects became bored and frustrated when testing at threshold for long periods, with the loss of concentration resulting in poor threshold results. To counteract this, a quick and easy intermediate test was introduced between the two additional phases. This proved successful in re-establishing patients' attention and maintaining interest in the test before the final testing phase began. The principle of using multiple tests concurs with other established vision-testing algorithms [5, 6].

Phase 1 – initial threshold

This involved a screening test and used the mouse and cheese graphics. The initial stimulus was presented at 1.22 logMAR, the largest of the 20 acuity levels that could be displayed. In this screening phase, two correct responses resulted in the target size becoming smaller by two step sizes. Two incorrect responses caused the test to end, with the resultant vision level taken as the last size identified correctly.
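In code, this screening phase amounts to a simple descending search. The sketch below follows the stated rules, assuming consecutive-response counting, with level 0 the largest of the 20 sizes; respond() stands in for the patient's answer:

```python
def phase1_screening(respond, n_levels: int = 20) -> int:
    """Screening staircase: start at the largest target (level 0); two
    consecutive correct responses move two steps smaller; two consecutive
    incorrect responses end the phase. Returns the last level answered
    correctly (-1 if none). Consecutive counting is an assumption."""
    level, last_correct = 0, -1
    correct = wrong = 0
    while True:
        if respond(level):                    # True if the gap was located
            correct, wrong = correct + 1, 0
            last_correct = level
            if correct == 2:
                if level >= n_levels - 1:
                    break                     # smallest displayable size reached
                level = min(level + 2, n_levels - 1)
                correct = 0
        else:
            correct, wrong = 0, wrong + 1
            if wrong == 2:
                break                         # two incorrect responses end phase
    return last_correct
```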

Phase 2 – threshold

This phase involved detailed threshold determination using sheep and sheep-pen graphics. It started one level up from the threshold determined in screening phase 1. In this threshold phase, three out of four correct responses were required to progress to the next reduced target size. If this criterion was not met, the same target size was repeated; if it was failed twice in a row, the phase ended and the threshold was taken as the last size at which three out of four correct responses were achieved.
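A corresponding sketch of this phase, with the block structure taken from the text (edge handling and the exact repeat rule are assumptions):

```python
def phase2_threshold(respond, screening_level: int, n_levels: int = 20) -> int:
    """Threshold phase: begin one level larger than the screening result; at
    each size present four targets, requiring three correct to move one step
    smaller. A failed block repeats the size; two failed blocks in a row end
    the phase, with the threshold taken as the last size passed."""
    level = max(screening_level - 1, 0)
    threshold, consecutive_fails = -1, 0
    while level < n_levels:
        block_correct = sum(respond(level) for _ in range(4))
        if block_correct >= 3:
            threshold, consecutive_fails = level, 0
            level += 1                        # progress to next smaller target
        else:
            consecutive_fails += 1
            if consecutive_fails == 2:
                break                         # two failures in a row end phase
    return threshold
```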

Phase 3 – attention check

In this phase, two determinations were made using a target 4 steps above threshold along with a new graphic of a cartoon pig.

Phase 4 – final threshold

In this phase, the threshold was assessed a final time to ensure that the maximum possible VA was indeed reached, again using the sheep pen graphics. The final test result was the highest level of vision recorded in either phase 2 or phase 4.

During each phase, an adaptive algorithm was employed that allowed a set period of time, dependent on the patient's previous fastest response, before displaying the incorrect-response graphic. Total testing time to achieve the final acuity threshold was approximately 5 min.
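Combining this adaptive timing with the encouragement behaviour described under "User interface" gives roughly the following flow; the polling interface, scaling factor, and minimum window are illustrative assumptions, as the paper does not state them:

```python
import time

def response_window(fastest_s: float, factor: float = 3.0,
                    floor_s: float = 2.0) -> float:
    """Allowed response time, scaled from the patient's fastest response so
    far; `factor` and `floor_s` are illustrative values only."""
    return max(floor_s, factor * fastest_s)

def await_response(poll_touch, play_encouragement, window_s: float):
    """Wait up to `window_s` for a touch. If none arrives, give automated
    verbal encouragement and allow one further window before the trial is
    scored as incorrect (returned as None), as described in the text."""
    for already_encouraged in (False, True):
        deadline = time.monotonic() + window_s
        while time.monotonic() < deadline:
            touch = poll_touch()              # chosen gap position, or None
            if touch is not None:
                return touch
            time.sleep(0.01)
        if not already_encouraged:
            play_encouragement()              # "keep trying" voice prompt
    return None                               # no response: scored incorrect
```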

Testing for reliability and utility

High contrast visual acuity testing

After initial pilot testing and development were completed, 81 sequential patients were recruited from retinal outpatient clinics at the Manchester Royal Eye Hospital. Patients were not excluded on the basis of nationality, native language, or age. The only eligibility criteria were the physical ability to perform a test requiring the use of a functioning hand and a VA of 1.22 logMAR (6/100) or better in one eye. Right eyes were tested unless the vision in that eye was below the inclusion limit, in which case the left eye was tested.

The testing procedure involved giving the study eye the best possible distance correction, with a near addition of +2.5 D in all presbyopic patients to correct for the near MAVERIC test, while occluding the fellow eye with an eye patch. The patient was shown the tablet computer alone and the principles of the test were demonstrated. When the patient appeared to understand the requirements and was happy to proceed, the tablet was placed within the viewing booth and the patient was invited to look through the viewing aperture. The patient began when ready by pressing a large central start button on the iPad and proceeded with the automated test. No further directive external input was given: the test moved through levels automatically, giving automated encouragement where required and recording responses to allow automated modulation of the test as it progressed. The patient and examiner were notified of the end of the final testing phase by a cheer sound. The process was repeated to obtain a mean score from two cycles.

Masked to the MAVERIC vision result, the examiner then tested near VA using a near Landolt C chart (Precision Vision, IL, USA 6130) according to standard protocols. Finally, approximately 15–20 min after the original MAVERIC test, a second MAVERIC test was initiated.

Low contrast acuity testing

For the low contrast acuity testing, 95 patients were recruited from retinal outpatient clinics at the Manchester Royal Eye Hospital and tested with a low contrast version of the MAVERIC test. None of these patients had participated in the high contrast study. Eligibility criteria were as before. Other testing procedures remained unchanged, except that a different chart test was used as a comparator: near low contrast acuity (25 %) was determined using a handheld ETDRS chart according to standard protocols. Sixty-two of the 95 MAVERIC low contrast subjects had testing repeated for the reliability study.

Statistical analysis

Test-retest reliability of the MAVERIC system was assessed using the Bland-Altman limits of agreement (LOA) method [19]. The same approach [20] was also used to examine agreement between MAVERIC scores and the gold standard near charts (Landolt C and handheld ETDRS).
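For reference, the LOA computation, together with the standard approximate 95 % confidence intervals for the limits, can be sketched as follows (a minimal implementation; the 2SD limits reported below correspond to k = 2):

```python
import math
from statistics import mean, stdev

def bland_altman(a, b, k: float = 2.0):
    """Bland-Altman agreement between paired measurements a and b: mean
    difference, limits of agreement (mean +/- k * SD), and approximate 95 %
    CIs for each limit using SE(LOA) ~ sqrt(3 / n) * SD [19]."""
    diffs = [x - y for x, y in zip(a, b)]
    n, d_mean, d_sd = len(diffs), mean(diffs), stdev(diffs)
    lower, upper = d_mean - k * d_sd, d_mean + k * d_sd
    half_ci = 1.96 * math.sqrt(3.0 / n) * d_sd
    return {
        "mean_diff": d_mean,
        "loa": (lower, upper),
        "lower_loa_ci": (lower - half_ci, lower + half_ci),
        "upper_loa_ci": (upper - half_ci, upper + half_ci),
    }
```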

Results

High contrast visual acuity measurements

Of the 81 patients who agreed to take part, 78 (96 %) were able to complete the MAVERIC test without assistance, including four patients who required interpreters and were not fluent in English. Subjects who could not complete the test themselves were excluded from the analysis. The average age (±1 s.d.) was 61 (±14) years, and there were 37 men and 41 women. The pathologies included 12 patients with no eye disease, five with AMD, 12 with other macular diseases, 16 with diabetic eye disease, eight with glaucoma, eight with cataracts, and 17 with miscellaneous pathologies such as retinal vascular diseases, naevi, and vitreous detachment.

Repeatability testing

Seventy-eight patients had MAVERIC scores repeated on two occasions. The Bland-Altman plot is presented in Fig. 3 (left plot), with differences randomly scattered around the mean. The differences were approximately normally distributed with a mean of 0.003 and a standard deviation of 0.09. Limits of agreement (LOA) of 2SD were +/−0.17 (95 % CI for the upper LOA was +0.180 to +0.181 and for the lower LOA was −0.174 to −0.173).

Fig. 3

Bland-Altman plot of repeatability of the MAVERIC test between first and second measures (left plot) and MAVERIC vs. near Landolt C test (right plot). Thick black line shows mean difference, thin black lines show +/−1.96 SD, and dashed red lines show 95 % CI for the upper and lower LOA

Association with other vision tests

Figure 3 (right plot) shows the Bland-Altman plot for the average MAVERIC acuity scores and the near Landolt C scores. The differences were approximately normally distributed with a mean of −0.03 and a standard deviation of 0.16. Limits of agreement of 2SD were +/−0.31 (95 % CI for the upper LOA was +0.280 to +0.282 and for the lower LOA was −0.344 to −0.342).

Low contrast acuity measurements

Of the 95 patients who agreed to the low contrast visual acuity test, 93 (98 %) were able to complete the MAVERIC test without assistance. Subjects who could not complete the test themselves were excluded from the analysis. The average age was 69 (±15) years, and there were 42 men and 51 women. The pathologies included 16 patients with no eye disease, 24 with AMD, 14 with diabetic eye disease, 12 with retinal vein occlusion, and the rest with miscellaneous pathologies such as retinal vascular diseases, cataract, glaucoma, and macular diseases other than AMD.

Repeatability testing

Sixty-two patients had MAVERIC low contrast acuity scores repeated on two occasions. The Bland-Altman plot is presented in Fig. 4 (left plot), with differences randomly scattered around the mean. The differences were approximately normally distributed with a mean difference of 0.02 and a standard deviation of 0.12. Limits of agreement of 2SD were +/−0.23 (95 % CI for the upper LOA was +0.202 to +0.304 and for the lower LOA was −0.261 to −0.16).

Fig. 4

Bland-Altman plot of repeatability of the MAVERIC low contrast acuity test between first and second measures (left plot) and MAVERIC vs. near low contrast ETDRS test (right plot). Thick black line shows mean difference, thin black lines show +/−1.96 SD, and dashed red lines show 95 % CI for the upper and lower LOA

Association with other vision tests

Pixel size limitations meant that the acuities that could be output by MAVERIC differed slightly from those of the chart-based test. Figure 4 (right plot) shows the Bland-Altman plot for the average MAVERIC low contrast acuity scores and the near ETDRS scores. The differences were approximately normally distributed with a mean of 0.07 and a standard deviation of 0.15. Limits of agreement of 2SD were +/−0.30 (95 % CI for the upper LOA was +0.321 to +0.430 and for the lower LOA was −0.286 to −0.176).

Discussion

Repeatability of any vision test is of great clinical value, as patient management decisions are frequently based upon change in vision. Using the gold standard acuity chart, with experienced optometrists conducting the testing, the mean difference in repeated measures in patients with macular degeneration has been shown to be −0.024 (+/−0.306) for high contrast logMAR VA [21].

This compares well with the MAVERIC tests, conducted without expert intervention on a population of similar ophthalmic patients. We found comparable repeatability with acceptable limits of agreement: a mean difference for both tests of approximately one letter on the chart [−0.03 (LOA +/−0.17) for high contrast VA and 0.02 (LOA +/−0.23) for low contrast VA].

On examination of the Bland-Altman charts, a funnel effect can be seen, more prominently for the low contrast test, suggesting greater reliability at poorer levels of acuity. This is perhaps an artefact of the greater number of pixels in larger targets, which allows a greater range of target sizes to be presented when measuring the higher logMAR levels. This artefact should diminish with ongoing advances in screen technologies.

In addition to reliability, we have assessed agreement between measures of MAVERIC acuity and chart-based gold standard measures. We would not, however, expect MAVERIC results to mirror those of standard chart tests, and acknowledge that in effect they test different visual functions. Although the physical characteristics of the targets may be matched, the two tests differ in terms of the level of training, the level of human encouragement and interpretation, the physical response requirements (verbal versus motor), the interaction with technology, and the sequence of tests. Thus, although the comparator tests were chosen as the nearest available tests with accepted validity, they still represent different psychophysical tasks to the MAVERIC versions. In some respects the computerised tests may be superior (greater objectivity in recording responses, use of timing, more standardised instruction) and in some ways inferior (limited range of acuities/contrast levels). In particular, pixel size limitations meant that the acuities tested by MAVERIC differed slightly from those in the chart-based test, and the MAVERIC test used an illuminated screen and incorporated response timing. Rather than use regression coefficients to compare MAVERIC with similar tests, we used Bland-Altman style reliability indices to allow greater examination of the relationship. For high contrast acuity, the Bland-Altman mean difference of −0.03 equated to approximately one letter on the near chart, with LOA of +/−0.31. For low contrast near acuity, the mean difference was slightly larger at 0.07, approximately three letters on the near chart. Both values compare favourably with the study by Kaiser [22], who examined a similar population of mainly elderly patients (n = 163, mean age 65.6 ± 18.9 years) with a large range of visual acuities (0–2.0 logMAR), comparing two established letter chart types, ETDRS and Snellen. They found a mean difference of −0.13 logMAR (approximately six letters) with LOA of ±0.35.

Perhaps the greatest current limitation of near visual acuity measured using a device with the size and resolution of the iPad 3 is the aforementioned pixel pitch limitation. At 40 cm, after logMAR −0.08, the next recordable acuity represents a significant jump to logMAR 0.22. While the Bland-Altman plot (Fig. 3) showed that visual acuity was slightly better (by approximately one letter on average, or −0.03 logMAR) on the test chart than on the MAVERIC test, there was no systematic variation in the differences between the MAVERIC and logMAR chart at different VA levels, indicating that no systematic bias was introduced at the lower range of tested acuities. Furthermore, manually moving the screen forward to 25 cm and repeating the test with a +4 lens, thereby also testing 0.12 logMAR acuity, could improve this range of testing acuities. This was not done in this study in order to comply with the intended remit of no experimenter intervention during the test, and also because advances in screen technologies will render it unnecessary in the near future. Indeed, the Galaxy Tab Pro 8.4 used for the low contrast acuity measures already had a better initial step size (−0.22 logMAR to 0.09 logMAR), allowing more precise measurement at the higher visual acuity levels. This will only improve as screen technologies advance.

We chose to develop a near rather than a distance acuity test. A distance VA test would not allow a direct touch screen response and would require either an examiner to be present or a remote device to be used. It would also necessitate setting the test at the correct testing distance, at least 3 m away. Control over illumination, to minimise glare sources and reflections, would be more difficult, and the test would most likely have to be conducted in a dark room. These practical implications would render the device more difficult to set up correctly and use at home as a self-testing device, and these considerations led us to develop a near VA test in a self-contained, portable unit.

Beyond these physical issues, our challenge was to develop a system whereby older patients with limited experience of tablet computers could initiate a testing process and maintain it autonomously until completion of the test and calculation of an accurate visual function threshold. Although the precise level of experimenter involvement is often not detailed in published papers on electronic testing, few published studies explicitly claim to complete testing without any operator interaction during the procedure. Reading et al. [23] did report a test stated to be self-administered, but also reported that only 70 % of subjects were able to complete it without help, and noted some unfavourable feedback from older patients. Indeed, our own studies have highlighted the importance of good user interfaces in improving comprehension and ease of use; even small changes, such as the type of voice (male to female), have had an impact on patient compliance. It was also clear from the literature that special consideration must be given to the algorithms behind vision testing to minimise the need for external instruction during the test. Ruamvibasoon, Beck, and Moke in particular recognised the importance of well-designed algorithms in achieving good vision measures [5, 6, 8], though they did not test the particularly challenging group of elderly patients with ophthalmic disease used in this study. By enhancing the concepts used in those studies through original gamification principles, animations, voice feedback, and individualised timed responses, our study was able to demonstrate utility even amongst older patients with eye disease attending hospital eye clinics. The result was that, although many patients voiced initial hesitation over their lack of experience with computers, after a brief explanation of the principles of the test practically every patient was able to continue it to completion (97 % over the two patient groups).

While study numbers were comparable to those in the literature (173 subjects in the current study compared with 100 in Aslam et al. [21], 86 in Reading et al. [23], and 163 in Kaiser [22]), we are currently expanding testing to more patients, including children, and will report on these in the near future.

Conclusion

This paper demonstrates the potential utility of a system for patients with ophthalmic disease to self-test their VA with a high degree of reliability and agreement when compared with gold standard chart-based measurements. The major limitation at present is screen resolution, and this will inevitably improve with time. The ready availability and mobility of the components mean that such devices could in future be used for home testing or testing in general practice settings, and further studies are needed to confirm this. Many older patients are now becoming familiar with electronic devices, and the above-mentioned resistance to the use of an electronic device in the home can be expected to decline. Although VA is the most commonly performed clinical visual function test across all pathologies, the principles demonstrated in MAVERIC can easily be adapted for testing other visual functions.