Introduction

The multifocal visual evoked potential (mfVEP) provides a topographical method for assessing visual function. Unlike the conventional pattern-reversal visual evoked potential (VEP), which represents the sum of the potentials across the field tested, the mfVEP can detect local changes because it uses multiple visual inputs, each generating a VEP in the corresponding region of the visual cortex.

Significant mfVEP changes have been found in patients with multiple sclerosis [1], diabetes [2], optic neuritis [3, 4], ischemic optic neuropathy [5], compressive optic neuropathy [6], glaucoma [7, 8] and optic disk drusen [9].

High intra-subject (within-subject) and inter-subject (between-subject) variability has limited the application of the mfVEP in clinical practice. The variability is due, in part, to a combination of factors including cortical anatomy, skull thickness, the relationship between cortex and external landmarks, electrical noise from the environment, placement of electrodes, the patient’s attention and impedance.

Several studies have attempted to reduce the variability of mfVEPs by using interocular comparison [5], EEG-based scaling [10], selection of best channels [11] and multiple virtual channels [12, 13]. However, the different methods for mfVEP data analysis have not been directly compared, even though they differ between published mfVEP studies.

The two measures typically quantified when performing mfVEPs are amplitude and latency. To account for the effects of various factors (age, gender, etc.) on the amplitudes and latencies, previous work has used multiple regression models, providing waveforms consisting of t-statistics [14]. The mfVEP amplitude has been quantified using the peak-to-peak method [8, 10, 15–17], the root mean square (RMS) method [5, 18], the signal-to-noise ratio (SNR) [19, 20] and the logarithmic signal-to-noise ratio (logSNR) [21, 22]. In Fig. 1, the different quantification methods of mfVEP amplitude are illustrated.

Fig. 1

Methods of mfVEP amplitude quantification. a Peak-to-peak method measures the difference between the two peaks. b RMS method measures the square of the amplitude between 45 and 150 ms to rectify the signal, thereafter calculating the square root of the mean rectified amplitude. c SNR method measures the ratio of RMS in a signal window (left) and a noise window (right)

The peak-to-peak method measures the amplitude between the largest peak (positive) and trough (negative). Several studies have used the peak-to-peak method, as it produces a value in nanovolts [8, 10, 15–17], and it is the fastest and simplest way of quantifying mfVEP output. However, the peak-to-peak responses can be contaminated by alpha waves and high-frequency noise interference. Patient cooperation and optimal experimental settings are therefore essential when relying on this method.
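As a rough illustration of the peak-to-peak measure, the sketch below takes one response trace as a list of voltages and returns the difference between the largest peak and the deepest trough inside an analysis window. The function name, the 600 Hz default sampling rate and the 70–210 ms window are assumptions chosen for illustration; this is not the algorithm of any particular mfVEP software.

```python
def peak_to_peak(trace, fs_hz=600.0, window_ms=(70.0, 210.0)):
    """Largest peak minus deepest trough inside window_ms (same unit as trace)."""
    # Convert the window limits from milliseconds to sample indices.
    start = int(round(window_ms[0] * fs_hz / 1000.0))
    stop = int(round(window_ms[1] * fs_hz / 1000.0))
    segment = trace[start:stop + 1]
    if not segment:
        raise ValueError("analysis window lies outside the trace")
    return max(segment) - min(segment)


# Hypothetical trace: flat except for one trough and one peak (values in nV).
example = [0.0] * 600
example[60] = -80.0   # trough at 100 ms
example[90] = 100.0   # peak at 150 ms
amplitude_nv = peak_to_peak(example)
```

Because only two samples determine the result, a single noise spike inside the window changes the value directly, which is the sensitivity to noise described above.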

The RMS method uses the squared value of the amplitude in a given time interval to ensure that all waveforms analyzed are in a standardized positive format. Then, the square root of the mean squared amplitude is calculated to give the final output [5, 18]. The RMS method is not dependent on a specific waveform due to averaging and use of a specific time segment.

The SNR is the RMS in a “signal window” divided by the RMS in a “noise window.” The signal window for mfVEP is normally between 45 and 150 ms and the noise window is between 325 and 430 ms, but the reported intervals vary. Most mfVEP studies use responses that are estimated by cross-correlation with the stimuli, which means they provide no standard error in the estimated waveform coefficients. As a workaround, the noise levels are conventionally estimated from a section of the estimated waveform that is assumed to contain no response components but which may have small contributions from nonlinear interactions in the response, so-called kernel overlap [23]. Because the SNR divides the signal window amplitude by the noise window amplitude, and both values include background noise, the variation due to factors causing this background noise is decreased. In this way the SNR can reduce the variability in results between tests on different days and between individuals and laboratories. Therefore, the SNR method is very useful in follow-up studies and when examining the same patient in different experimental settings. SNR is sometimes used as a direct measure of mfVEP amplitude [19, 20] but more often is calculated to ensure the responses are of a certain quality [12, 13, 24]. LogSNR is normally used when comparing mfVEP amplitude with visual field analysis due to its comparability to log sensitivity reported by automated perimeters [21, 22].
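The RMS, SNR and logSNR measures described above can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the function names are hypothetical, the trace is a plain list sampled at 600 Hz, the windows are the 45–150 ms and 325–430 ms intervals mentioned in the text, and the logarithm is taken to base 10, which the text does not specify.

```python
import math


def window_rms(trace, fs_hz, start_ms, stop_ms):
    """Root mean square of the trace between start_ms and stop_ms."""
    start = int(round(start_ms * fs_hz / 1000.0))
    stop = int(round(stop_ms * fs_hz / 1000.0))
    segment = trace[start:stop]
    if not segment:
        raise ValueError("window lies outside the trace")
    # Square each sample, average, then take the square root.
    return math.sqrt(sum(v * v for v in segment) / len(segment))


def snr(trace, fs_hz=600.0, signal_ms=(45.0, 150.0), noise_ms=(325.0, 430.0)):
    """RMS in the signal window divided by RMS in the noise window."""
    signal = window_rms(trace, fs_hz, signal_ms[0], signal_ms[1])
    noise = window_rms(trace, fs_hz, noise_ms[0], noise_ms[1])
    if noise == 0.0:
        raise ValueError("noise window RMS is zero")
    return signal / noise


def log_snr(trace, **kwargs):
    """Base-10 logarithm of the SNR (the base is an assumption)."""
    return math.log10(snr(trace, **kwargs))


# Hypothetical trace: constant 4.0 in the signal window, 1.0 elsewhere.
example = [1.0] * 600
for i in range(27, 90):          # samples covering 45 to 150 ms at 600 Hz
    example[i] = 4.0
example_snr = snr(example)
example_log_snr = log_snr(example)
```

Since both windows come from the same recording, a gain change that scales the whole trace leaves the ratio unchanged, which is one way to see why the SNR is less sensitive to session-to-session differences.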

Multifocal visual evoked potential latency can be assessed using monocular and interocular analysis. For monocular analysis, a template from an age- and gender-matched control group can be used [24]. The interocular analysis compares the latency between the patient’s eyes [25]. The advantage of interocular analysis is that it eliminates factors such as cortical convolutions, which can otherwise cause inter-subject variability. Unfortunately, many retinal and optic nerve diseases affect both eyes, thereby negating the usefulness of inter-eye amplitude or latency measurements. The most commonly used methods for calculating monocular and interocular latency delays are the cross-correlation method [24–27] and the second peak method [15]. In Fig. 2, the different methods are illustrated.

Fig. 2

Methods of mfVEP interocular latency quantification. a Second peak method compares the latency of the top of the second peak between the two eyes. b Cross-correlation method shifts the response from one eye along the x axis to maximum overlap (best correlation) with the response from the other eye, with the amount of shift representing the latency difference between the eyes. OD oculus dexter (right eye), OS oculus sinister (left eye)

For interocular studies, the second peak method compares the latency of the second peak between the two eyes. The cross-correlation method shifts the response from one eye along the x axis to maximum overlap (best correlation) with the response from the other eye. Hence, the amount of shift represents the latency difference between the eyes. The cross-correlation can also be performed with a Gaussian wavelet transform and yields similar results to direct interocular or monocular cross-correlation [28]. Cross-correlation is a more robust method of measuring latency, as it solves two important problems evident with the second peak method. First, selection of the wrong peaks from mfVEP traces is not uncommon when using the second peak method, resulting in falsely high values. To avoid this error, all traces must be reviewed and manually corrected if the algorithm has chosen incorrect peaks. Second, some traces will have a “double-hump” morphology caused by recording artifacts. Such artifacts can create a false second peak, or a wide peak can be altered by a negative artifact signal, thereby creating a double hump. These artifacts will result in false latency variation [28].
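The shift-and-correlate idea behind the cross-correlation method can be illustrated with a short sketch. It slides one trace against the other over a limited range of lags and keeps the lag with the highest Pearson correlation. The function names, the 600 Hz sampling rate and the ±50 ms search range are assumptions for illustration only, and the published implementations may differ in detail.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def latency_shift_ms(reference, other, fs_hz=600.0, max_shift_ms=50.0):
    """Lag (ms) giving the best overlap; positive means `other` is delayed."""
    n = len(reference)
    max_shift = int(round(max_shift_ms * fs_hz / 1000.0))
    best_shift, best_r = 0, -math.inf
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            a, b = reference[:n - shift], other[shift:]
        else:
            a, b = reference[-shift:], other[:n + shift]
        r = pearson(a, b)
        if r > best_r:
            best_r, best_shift = r, shift
    return best_shift * 1000.0 / fs_hz


# Hypothetical biphasic response for one eye, and a copy delayed by 3 samples.
right_eye = [math.exp(-((i - 60) / 10.0) ** 2)
             - 0.5 * math.exp(-((i - 90) / 12.0) ** 2) for i in range(600)]
left_eye = [right_eye[0]] * 3 + right_eye[:-3]
delay_ms = latency_shift_ms(right_eye, left_eye)
```

Because the whole waveform contributes to the correlation, no single peak has to be identified, which is why this approach is less affected by wrongly chosen peaks or double-hump traces.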

The large variability of both intra- and inter-subject measurements is the main reason why mfVEP has not moved beyond being a research tool and into clinical application. In particular, the high variability in amplitude makes the differentiation between real pathology in the visual system and random physiological fluctuations very difficult.

To the best of our knowledge, no previous studies have investigated the difference in variability between commonly reported methods of data analysis used in mfVEP studies. Hence, the aim of this study was to compare the inter-subject and intra-subject variability of the methods used to quantify mfVEP amplitude and latency.

Methods

Subjects

Twenty-three normal subjects (nine males and 14 females) were included. The median age of the subjects was 29 years (range 26–66 years). None of the subjects had previous or current ocular pathology, nor systemic diseases that could affect retinal or optic nerve function. All subjects were examined by slit lamp biomicroscopy and/or optical coherence tomography (OCT). Mean spectacle corrected visual acuity was 0.88 (range 0.5–1.0). Informed consent was obtained from all participants. Procedures followed the tenets of the Declaration of Helsinki and were approved by the national research ethics committee (HREC 14855).

Stimulation

The visual stimuli were generated in a dimly lit room, on a screen (22-in. high-resolution LCD display; Hitachi, Tokyo, Japan) with brightness 90 % and contrast 65 %. The stimulus consisted of a 56-segment dartboard containing a checkered pattern of 16 checks in each segment. Segments and checks were cortically scaled to stimulate equal areas of the visual cortex. The head position was at a viewing distance of 30 cm from the screen. This resulted in a radially subtended stimulus covering 24° of the visual field. The subjects were tested with non-dilated pupils and optimal refraction. The checks alternated between black and white according to a pseudorandom sequence. To maintain focus, the central 1° of the stimulus screen worked as a subject fixation area, displaying arrows pointing right or left. The subjects used a game controller to respond to the arrows, allowing the investigator to assess the degree of subject cooperation. High subject cooperation in all our patients required the use of optimal refraction at near.

Electrode position

A cross-shaped electrode holder with four gold cup electrodes (Grass Technologies, West Warwick, RI, USA) was placed over the inion. The hair under each electrode was separated and the scalp was cleaned. To obtain two recording channels, i.e., a horizontal and a vertical, the center of the cross was arranged over the inion with the electrodes in a horizontal and vertical pattern (one positive electrode 2.5 cm above the inion, one negative electrode 4.5 cm below the inion, one negative electrode 4 cm left of the inion and one positive electrode 4 cm right of the inion).

A ground ear clip gold cup electrode was attached to the ear.

Recording

The mfVEP was performed using VisionSearch1 (VisionSearch, Sydney, Australia). Commercially designed software (Terra™ software, ver. 1.6, VisionSearch, Sydney, Australia) was used to record and analyze the mfVEP.

Subjects were seated comfortably in front of the stimulus screen. The non-tested eye was covered with an eye patch. After correct positioning of the subject and the electrodes was confirmed, the impedance was measured for both channels. Only impedance less than 25 kΩ was accepted, but impedance was normally less than 10 kΩ. The subject was instructed to fixate centrally on the screen and respond to the fixation arrows. The test was repeated until the noise of the trace was reduced to 10 % or less. On average, the test required 12 rounds of stimulation. The electrical signals were amplified 1 × 10⁵ times and band-pass filtered between 1 and 20 Hz. Data sampling rate was 600 Hz with a recording length of 1000 ms. The software automatically correlated the visual stimuli with the recorded electrical potentials to obtain the mfVEP responses. Of the two channels, the waveform with the maximal peak-trough amplitude within the interval of 70–210 ms was automatically selected by the software as the best channel.

Data analysis

Peak-to-peak amplitude and second peak latency for each segment were automatically calculated in the software by a specially designed algorithm. Manual confirmation of the chosen peaks was performed. Recordings from the best channel were exported for further analysis in Excel (Excel, version 15.0, Microsoft, Redmond, WA, USA). Custom-made programs written in MATLAB (R2012, The Mathworks Inc., Natick, MA, 2000) were used to compute SNR, logSNR and RMS and for cross-correlation analysis. Intra-subject and inter-subject coefficients of variability (CV) were used to compare amplitude variables (peak-to-peak, SNR, logSNR and RMS) and latency variables (second peak). CV was obtained using the formula: CV = standard deviation / mean.

Intra-subject CV of amplitude was calculated as the standard deviation of all sectors from the subject’s amplitude recording divided by the mean amplitude of all sectors. The term intra-subject variability in the study therefore referred to the regional differences in amplitude between the 56 segments from a single mfVEP output, and not to repeated measurements as seen in most studies.

The inter-subject CV of amplitude was calculated as the standard deviation of the mean amplitude of all subjects divided by the mean amplitude of all subjects.
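The two CV definitions above can be written out as a short sketch. It assumes that each subject is represented by a list of per-segment amplitudes and that the sample standard deviation (n − 1 denominator) is used; the function names and the toy numbers are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def intra_subject_cv(segment_amplitudes):
    """CV across the segments of one subject's recording."""
    return coefficient_of_variation(segment_amplitudes)


def inter_subject_cv(subjects):
    """CV of the per-subject mean amplitudes across all subjects."""
    subject_means = [statistics.mean(s) for s in subjects]
    return coefficient_of_variation(subject_means)


# Hypothetical data: three subjects with two segments each.
subjects = [[1.0, 3.0], [3.0, 5.0], [5.0, 7.0]]
intra = [intra_subject_cv(s) for s in subjects]
inter = inter_subject_cv(subjects)
```

In the study each list would hold 56 segment values, so the intra-subject CV reflects regional spread within one recording, while the inter-subject CV reflects spread of the subject means.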

In the latency asymmetry analysis, the standard deviation (SD) was used as a measure of variability. The CV was not used for interocular latency analysis because the low means attained when subtracting the latency of one eye from the other made the CV misleading.

Statistical analysis

Only the right eye was used in testing for statistical significance to avoid correlation bias.

Distribution of data was visualized. A linear regression analysis was performed between the peak-to-peak and SNR methods. For each method, intra-subject CVs were estimated by their sample means. Pairwise comparisons were made by means of Z tests accounting for inter-marker correlation nonparametrically using the methodology described in [29]. Resulting alpha levels were adjusted by means of Bonferroni’s correction. For each method, inter-subject CVs were estimated as the ratio of sample standard deviation and sample mean. Pairwise comparisons were made by means of Z tests accounting for inter-marker correlation obtained by nonparametric bootstrap with 5000 bootstrap samples. Resulting alpha levels were adjusted by means of Bonferroni’s correction.
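One way to picture the bootstrap-based comparison of inter-subject CVs is sketched below. Subjects are resampled with replacement while keeping each subject's two measurements paired, the CV difference is recomputed for every resample, and the spread of those differences serves as the standard error of a Z test. This is a generic sketch under those assumptions, not the SAS procedure used in the study; the function names, the seed and the example data are hypothetical, and Bonferroni adjustment is left out.

```python
import math
import random


def sample_sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def cv(values):
    return sample_sd(values) / (sum(values) / len(values))


def bootstrap_cv_z_test(method_a, method_b, n_boot=5000, seed=0):
    """Return (CV difference, Z statistic, two-sided p) for paired samples."""
    if len(method_a) != len(method_b):
        raise ValueError("both methods need one value per subject")
    rng = random.Random(seed)
    n = len(method_a)
    observed = cv(method_a) - cv(method_b)
    diffs = []
    for _ in range(n_boot):
        # Resample subjects, keeping each subject's two values together.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(cv([method_a[i] for i in idx]) - cv([method_b[i] for i in idx]))
    se = sample_sd(diffs)
    if se == 0.0:
        raise ValueError("bootstrap differences have zero spread")
    z = observed / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return observed, z, p


# Hypothetical per-subject amplitudes for 23 subjects under two methods.
method_a = [100.0 + 40.0 * math.sin(i) for i in range(23)]
method_b = [100.0 + 2.0 * math.cos(i) for i in range(23)]
diff, z, p = bootstrap_cv_z_test(method_a, method_b)
```

Resampling whole subjects rather than individual values is what lets the standard error reflect the correlation between the two methods measured on the same people.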

Intra-subject asymmetry variability using SD was compared with nonparametric Wilcoxon’s paired rank test, and inter-subject asymmetry variability using SD was compared with nonparametric bootstrap with 5000 bootstrap samples.

The predetermined level of statistical significance for the comparisons was p ≤ 0.05. The statistical analysis was performed using the SAS program for Windows (version 9.1, SAS Institute, Cary, NC, USA).

Results

The mean mfVEP amplitude using peak-to-peak values was 169.1 nV (CI 95 % 152.9–185.3 nV). Mean RMS was 51.8 (CI 95 % 47.0–56.5) and mean SNR was 4.6 (CI 95 % 4.3–5.0). There was a significant correlation between the peak-to-peak method and SNR method (R² = 0.69, p < 0.001) (Fig. 3).

Fig. 3

Correlation between peak-to-peak and SNR methods. SNR signal-to-noise ratio

In Fig. 4, the different methods of assessing mfVEP amplitude and their CV are compared. Significant differences were found between peak-to-peak and SNR, peak-to-peak and RMS, logSNR and SNR, and logSNR and RMS. Overall, logSNR and peak-to-peak had a significantly lower intra-subject CV when compared with RMS and SNR. The inter-subject CV was 19.9 % in the peak-to-peak method, 21.1 % in the RMS method, 16.9 % in the SNR method and 11.2 % using logSNR method. Significant differences were found between logSNR and SNR, logSNR and RMS, and logSNR and peak-to-peak. Overall, logSNR had a significantly lower inter-subject CV when compared with SNR, RMS and peak-to-peak.

Fig. 4

Intra-subject and inter-subject coefficients of variability using peak-to-peak, logSNR, RMS and SNR methods. Asterisk significant difference (p < 0.05). CV coefficient of variability, SNR signal-to-noise ratio, logSNR logarithmic value of signal-to-noise ratio, RMS root mean square, PtP peak-to-peak

Mean mfVEP second peak latency was 147 ms (CI 95 % 146–149 ms). Second peak latency inter-subject CV was 3.2 %, and second peak latency intra-subject CV was 8.6 %.

Latency asymmetry value for the cross-correlation analysis was 1.7 ms (CI 95 % 1.2–2.3) and for the second peak analysis 2.5 ms (CI 95 % 1.7–3.3 ms). Inter-subject asymmetry variability expressed as SD was 1.8 ms (95 % CI 1.4–2.5 ms) using second peak method and 1.2 ms (95 % CI 1.0–1.8 ms) using cross-correlation. A significant difference in inter-subject asymmetry variability was found between the two methods (p < 0.001). Intra-subject asymmetry variability expressed as SD was 6.1 ms (95 % CI 5.2–7.0 ms) using cross-correlation and 12.6 ms (95 % CI 10.6–14.6 ms) using second peak analysis. A significant difference in intra-subject asymmetry variability was found between the two methods (p < 0.001).

Discussion

Data analysis is required to quantify mfVEP amplitude and latency recordings; however, no gold standard for this analysis has been published. If the mfVEP is to evolve into a useful clinical tool, it is important to understand how the data analysis can affect the quantitative outcomes. Furthermore, it is important to know the most reliable parameters for this quantification. Our study is, as far as we know, the first to compare the variability of different methods for data analysis of mfVEPs. The results of this study demonstrate the differences in intra-subject and inter-subject variability that can be expected when using the most common mfVEP analysis methods.

Considerable overlap in mfVEP parameters has been shown when comparing normal controls with patients. Rodarte et al. [7] found no difference in mfVEP latency between normal controls and glaucoma patients, with the exception of one high-tension glaucoma patient who fell outside the control group range.

The low clinical reliability, especially of amplitude and of monocular analysis, remains a challenge for the mfVEP and makes latency the preferred parameter in most mfVEP studies. A study by Grippo et al. found a latency delay in patients with optic disk drusen. However, they did not assess amplitude, although mainly amplitude abnormalities would be expected due to the compressive nature of optic disk drusen [9]. Our study has confirmed a low variability in latency values compared with amplitude values. While this may be useful in studying conditions such as multiple sclerosis, there may be valuable information to be gained by appropriate study of amplitude data.

The peak-to-peak amplitude inter-subject CV has previously been assessed in relation to electroencephalogram (EEG)-based scaling of mfVEPs by Klistorner et al. [10]. An inter-subject CV in peak-to-peak amplitude of approximately 14 % was seen after the application of EEG-based scaling. This is a similar outcome to our results. The slightly lower variability could be due to the more sophisticated normalization method. However, the study only assessed variability using the peak-to-peak method.

The results of our study show that logSNR and thereafter peak-to-peak are the preferred methods in detection of local defects. The smaller intra-subject CV indicates that the amplitude varies the least from segment to segment. This is also applicable when performing sectorial analysis.

LogSNR proved to have a small inter-subject CV, which is important when comparing groups. A low inter-subject CV makes it easier to find small differences between patients with and without abnormalities, as the confidence limits will be narrow.

Asymmetry analysis has been used in the assessment of monocular optic nerve damage. Most of these studies use the cross-correlation method to determine the difference in latency between the eyes [6, 7]. We found a significantly higher variability using the second peak method than using the cross-correlation method. Therefore, we recommend the use of the cross-correlation method for asymmetry analysis.

The main limitation of this study is that the interpretation of the coefficient of variation is closely linked to the normal distribution. As a consequence, this quantity is, for instance, not invariant with respect to monotone transformations of the marker for which it is calculated. This is contradictory to the logical reasoning that a one-to-one transformation of a marker should not change its ability to differentiate samples. In the context of our research, logSNR and peak-to-peak are preferred based solely on a coefficient of variation evaluation. However, such a ranking should not be based on the coefficient of variation alone, but also on choosing a scale on which normality can be assumed.
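The lack of invariance mentioned above can be seen in a small numerical sketch. It uses three hypothetical SNR values, takes their base-10 logarithm (the base is an assumption), and also rescales the raw values by a factor of 10 to mimic a change of units; none of these numbers come from the study.

```python
import math
import statistics


def cv(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical SNR values for three subjects.
snr_values = [2.0, 4.0, 8.0]
log_values = [math.log10(v) for v in snr_values]

# The same values expressed in a unit ten times smaller.
rescaled = [10.0 * v for v in snr_values]
log_rescaled = [math.log10(v) for v in rescaled]

cv_raw = cv(snr_values)             # CV on the original scale
cv_log = cv(log_values)             # CV after a monotone (log) transform
cv_rescaled = cv(rescaled)          # unchanged by a change of units
cv_log_rescaled = cv(log_rescaled)  # changes, because log adds a constant
```

The log transform keeps the ordering of the subjects yet changes the CV, and on the log scale even a change of units alters the CV because it shifts the mean without changing the standard deviation. This is why a CV comparison between a raw and a log-transformed measure should be read with caution.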

In conclusion, this study emphasizes the importance of choosing the right method in mfVEP data analysis and establishes the choice of data analysis as another factor that can lead to decreased variability. Different methods for quantifying mfVEP amplitude have different indications depending on the purpose of the study. For comparison of groups, logSNR or SNR would be preferred because of their smaller inter-subject CV. When looking at an individual mfVEP, as may be done in a clinical setting, the logSNR or peak-to-peak methods would be preferred.