1 Introduction

Understanding the processes underlying visual perception has been a focus of various research fields, including medicine, psychology, advertisement, autonomous driving, and application control. Recent developments in head-mounted eye tracking have enabled researchers to study human visual perception and attention allocation in natural environments. Such systems generally fall into two categories: remote tracking systems, where the subject is recorded with an external camera, and head-mounted eye trackers. The main challenges in remote eye tracking are the robust detection of the subject's face and eyes. Several techniques have been proposed for robust face recognition (e.g., [9, 10]) as well as eye recognition and hallucination (e.g., [11]) in low-resolution images.

The measurement of eye movements in head-mounted eye trackers is based on one camera that records the viewed scene and at least one additional camera directed towards the subject's eye to record the eye movements (e.g., Dikablis Mobile eye tracker, Pupil Labs eye tracker [15], SMI Glasses, Tobii Glasses). The gaze point is then mapped to the viewed scene based on the center of the pupil and a user-specific calibration routine. A crucial prerequisite for robust tracking is an accurate detection of the pupil center in the eye images. While eye tracking can be accomplished successfully under laboratory conditions, many studies report difficulties when eye trackers are employed in natural environments, such as driving [1, 12, 16, 20], shopping [14, 26], or simply walking around [28].

The main source of error in such settings is a non-robust pupil signal that is mainly related to challenges in the image-based detection of the pupil. Schnipke and Todd summarized in [25] a variety of difficulties occurring when using eye trackers, such as changing illumination, motion blur, recording errors, and eyelashes covering the pupil. Rapidly changing illumination conditions arise primarily in tasks where the subject is moving fast (e.g., while driving) or rotates relative to unequally distributed light sources. Furthermore, in case the subject is wearing eye glasses or contact lenses, further reflections may occur (see Fig. 7, Data sets III and XVIII). A further issue arises due to the off-axial position of the eye camera in head-mounted eye trackers, e.g., when the pupil is surrounded by a dark region (see Fig. 7, Data set VII). Other difficulties are often posed by physiological eye characteristics, which may interfere with detection algorithms, such as additional dark spots on the iris. Therefore, studies based on eye tracking in uncontrolled environments consistently report low pupil detection rates [13, 20, 33]. As a consequence, the data collected in such studies has to be manually post-processed, which is a laborious and time-consuming procedure. Furthermore, such post-processing is impossible for real-time applications that rely on pupil monitoring (e.g., driving assistance based on eye tracking input [31], gaze-based interaction [22, 27, 34], eye-based activity and context recognition [13], and many more). In light of the above challenges, real-time and accurate pupil detection is an essential prerequisite for pervasive video-based eye tracking.

2 State-of-the-art methods for pupil detection

Fig. 1

Visualization of the Starburst [18] algorithm. First the input image (a) is smoothed. Rays are sent out from a starting position, where edge values exceeding an edge threshold are selected as pupil contour candidates (b). These points then serve as new starting points for rays going in the opposite direction, thus providing new contour candidates (c). The search process is repeated iteratively until convergence (d). The pupil center is finally estimated using a RANSAC ellipse fit (e)

Over recent years, there has been a large body of work on pupil detection. However, most of the approaches address pupil detection under laboratory conditions, e.g., in [7, 17], both employing a histogram-based threshold. Such algorithms can be applied to eye images captured under infrared light as in [19, 21, 24]. Another threshold-based approach was presented in [38], where the pupil is detected based on the calculation of the curvature of the threshold border. An isophotes curvature-based approach is presented in [35], using the maximum isocenter as pupil center estimation. Probably the most popular algorithm in this realm is Starburst, introduced by [18] in 2005. In 2012, Świrski et al. [30] proposed an algorithm that starts with a coarse positioning using Haar-like features and then refines the pupil center position.

The open source head-mounted eye tracker from Pupil Labs also comes with a pupil detection algorithm designed for unconstrained everyday settings [15]. Wood et al. [37] presented a model-based gaze estimation system for unmodified tablet computers. Their gaze estimation pipeline also includes a Haar-like feature based eye detector as well as a RANSAC-based limbus ellipse fitting approach. Three recent methods, SET [8], ExCuSe [5], and ElSe [6], explicitly address the aforementioned challenges associated with pupil detection in natural environments. SET is based on thresholding and ellipse fitting. ExCuSe [5] first analyzes the input images with regard to reflections based on intensity histograms. Several processing steps based on edge detectors, morphologic operations, and the Angular Integral Projection Function are then applied to extract the pupil contour. Similar to ExCuSe, ElSe is also based on edge filtering, ellipse evaluation, and pupil contour validation [6].

The algorithms Starburst [18], Świrski [30], Pupil Labs [15], SET [8], ExCuSe [5], and ElSe [6] will be presented and discussed in detail in the following subsections. We compared these algorithms on a large corpus of hand-labeled eye images (Sect. 3) and will present the results in Sect. 4.

2.1 Starburst

Fig. 2

Visualization of the Świrski algorithm [30]. First, the eye image is convolved with a Haar-like center surround feature and the location of the maximal response (a) is used as the pupil region (b). The pupil region is segmented using k-means clustering of its histogram (c) and passed through a Canny edge detector (d). The algorithm finally uses an image-aware Random Sample Consensus (RANSAC) ellipse fitting (e) to detect the pupil (f)

In the first step of Starburst [18], the image is denoised using a Gaussian filter. The algorithm then uses adaptive thresholding on a square region of interest in each video frame to localize the corneal reflection. The location of the corneal reflection is given by the geometric center of the largest region in the image using the adaptively determined threshold. Radial interpolation is further used to remove the corneal reflection from the eye image. The central step of the algorithm, which also gave the method its name, is to estimate the pupil contour by detecting edges along a limited number of rays that extend from a central best guess of the pupil center (see Fig. 1b–d). The rays are independently evaluated pixel by pixel until a threshold is exceeded, indicating the edge of the pupil. A feature point is defined at that location as a contour edge candidate and the processing along the ray is stopped. For each contour candidate, another set of rays is generated, creating a second set of pupil contour candidates. This process is iterated until convergence. Model fitting is finally performed following a Random Sample Consensus (RANSAC) paradigm to find the best fitting ellipse describing the pupil boundary. This result is further improved through a model-based optimization that does not rely on feature detection [7, 18].
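The ray-marching step can be illustrated with a minimal sketch. The following is not the reference Starburst implementation: the number of rays, the step size, and the edge threshold are illustrative values, the image is a synthetic dark disc on a bright background, and only a single (non-iterated) feature-collection pass is shown.

```python
import math

def starburst_rays(img, cx, cy, n_rays=18, thresh=40):
    """Collect pupil contour candidates along rays from (cx, cy).

    Marches outward pixel by pixel; a candidate is recorded where the
    intensity increase along the ray exceeds `thresh` (dark pupil ->
    brighter iris), then the ray is stopped.
    """
    h, w = len(img), len(img[0])
    candidates = []
    for k in range(n_rays):
        a = 2 * math.pi * k / n_rays
        dx, dy = math.cos(a), math.sin(a)
        x, y = float(cx), float(cy)
        prev = img[cy][cx]
        while 0 <= int(x + dx) < w and 0 <= int(y + dy) < h:
            x += dx
            y += dy
            cur = img[int(y)][int(x)]
            if cur - prev > thresh:      # strong dark-to-bright edge
                candidates.append((x, y))
                break
            prev = cur
    return candidates

# synthetic eye image: dark disc (pupil) of radius 10 centered at (30, 30)
img = [[0 if (x - 30) ** 2 + (y - 30) ** 2 <= 100 else 120
        for x in range(64)] for y in range(64)]
pts = starburst_rays(img, 30, 30)   # candidates near the disc boundary
```

In the full algorithm each candidate then shoots rays back toward the start, and the combined candidate set feeds the RANSAC ellipse fit.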

2.2 Świrski

Fig. 3

Visualization of the Pupil Labs algorithm [15]. a Eye image, user region of interest (white stroke rectangle), and initial estimation of the pupil region (white square and dashed line square). b Canny edge detection (green lines). c The "dark" region is defined as an offset from the lowest spike in the intensity histogram of the eye image. d Edges are filtered to exclude those stemming from spectral reflections (yellow) and those not inside "dark" areas (blue). e Remaining edges are extracted into contours using connected components and split into sub-contours based on curvature continuity criteria (multi-colored lines). f Candidate pupil ellipses (blue) are formed using ellipse fitting. g The final ellipse fit is found through an augmented combinatorial search (final ellipse with center in red); supporting edge pixels are drawn in white (color figure online)

The algorithm by Świrski et al. [30] works in three main stages (see Fig. 2). To find the pupil, the Świrski detector first calculates the integral image and convolves it with a Haar-like feature, similar to the features used in cascade classifiers [36]. The algorithm repeats this convolution for a set of possible radii between a user-specified minimum and maximum.
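The coarse positioning stage can be sketched as follows. This is a simplified stand-in, not Świrski et al.'s code: it uses square boxes for the center-surround feature, a single fixed radius instead of the radius sweep, and illustrative margins that merely keep the boxes inside the image.

```python
def integral_image(img):
    """(h+1) x (w+1) summed-area table for O(1) box sums."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum over the half-open box [x0, x1) x [y0, y1)."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def best_center_surround(img, r):
    """Position with the strongest Haar-like response: bright surround
    ring mean minus dark center box mean, evaluated via the integral
    image (one radius only; the original sweeps a radius range)."""
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    best, best_xy = None, None
    for cy in range(2 * r, h - 2 * r):
        for cx in range(2 * r, w - 2 * r):
            inner = box_sum(ii, cx - r, cy - r, cx + r + 1, cy + r + 1)
            outer = box_sum(ii, cx - 2 * r, cy - 2 * r,
                            cx + 2 * r + 1, cy + 2 * r + 1)
            n_in = (2 * r + 1) ** 2
            n_ring = (4 * r + 1) ** 2 - n_in
            resp = (outer - inner) / n_ring - inner / n_in
            if best is None or resp > best:
                best, best_xy = resp, (cx, cy)
    return best_xy

# synthetic image: dark pupil disc of radius 8 at (30, 30) on bright skin
img = [[0 if (x - 30) ** 2 + (y - 30) ** 2 <= 64 else 150
        for x in range(64)] for y in range(64)]
px, py = best_center_surround(img, 8)   # near (30, 30)
```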

Fig. 4

Visualization of the SET algorithm. a Input image and b result of binarization. c Image after applying a region size threshold to extract pupil candidates. d Candidate pupil centers and e final result

The position of the strongest response is the estimated center of the pupil, with the size of the region determined by the corresponding radius. The initial region estimation is unlikely to be accurately centered on the pupil. Therefore, in the second step, the algorithm approximates the pupil position within this region based on k-means clustering: The image histogram is segmented into a dark cluster (pupil pixels) and a light cluster. The algorithm then creates a segmented binary image of the pupil region by thresholding any pixels above the maximum intensity in the dark cluster. Afterward, connected components in the segmented image are found and the largest among these is selected to be the pupil.
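The histogram clustering step amounts to 1-D 2-means. A minimal sketch, assuming we cluster the raw intensities directly rather than the binned histogram, and with illustrative example values:

```python
def kmeans_threshold(pixels, iters=20):
    """Split intensities into a dark and a bright cluster with 1-D
    2-means; returns the dark cluster's maximum intensity, which the
    Świrski step uses as the segmentation threshold."""
    dark, bright = float(min(pixels)), float(max(pixels))
    for _ in range(iters):
        d = [p for p in pixels if abs(p - dark) <= abs(p - bright)]
        b = [p for p in pixels if abs(p - dark) > abs(p - bright)]
        if not d or not b:
            break
        dark, bright = sum(d) / len(d), sum(b) / len(b)
    return max(p for p in pixels if abs(p - dark) <= abs(p - bright))

# illustrative region: dark pupil pixels near 10, bright iris near 200
pixels = [10, 12, 14, 11, 200, 210, 190, 205, 13, 198]
t = kmeans_threshold(pixels)   # top of the dark cluster
```

Pixels at or below `t` form the binary pupil mask from which the largest connected component is taken.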

The center of mass of this component approximates the pupil position. The final stage of the algorithm refines the pupil position estimate by fitting an ellipse to the boundary between the pupil and the iris. To do this, the image is preprocessed to create an edge image to which an ellipse can be robustly fitted while ignoring outliers. To remove features such as eyelashes and glints, a morphological "open" operation is performed to close small bright gaps in the otherwise dark pupil region without significantly affecting the pupil's contour. The algorithm then finds the boundary between pupil and iris using a Canny edge detector. Finally, the algorithm fits an ellipse to the edge pixels using RANSAC as well as an image-aware support function. The support function ensures that the ellipse lies on a boundary from dark pixels to light pixels and that its edge pixels lie along strong image edges.
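The RANSAC paradigm used here (and in Starburst) can be illustrated in isolation. To keep the sketch short it fits a circle from minimal 3-point samples instead of a 5-point ellipse, and it omits the image-aware support function; tolerance, iteration count, and the synthetic data are illustrative.

```python
import math
import random

def circle_from3(p1, p2, p3):
    """Circumcircle of three points, or None if they are collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-9:
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    ux = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    uy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return ux, uy, math.hypot(x1 - ux, y1 - uy)

def ransac_circle(points, iters=200, tol=1.0, seed=0):
    """Repeatedly fit a model to a minimal random sample and keep the
    model supported by the most inliers -- the core RANSAC idea."""
    rng = random.Random(seed)
    best, best_inl = None, -1
    for _ in range(iters):
        m = circle_from3(*rng.sample(points, 3))
        if m is None:
            continue
        cx, cy, r = m
        inl = sum(abs(math.hypot(x - cx, y - cy) - r) < tol
                  for x, y in points)
        if inl > best_inl:
            best, best_inl = m, inl
    return best

# edge points on a circle of radius 10 around (30, 30), plus outliers
# standing in for glint and eyelash edges
pts = [(30 + 10 * math.cos(a / 10), 30 + 10 * math.sin(a / 10))
       for a in range(63)]
pts += [(5, 5), (55, 8), (12, 50)]
cx, cy, r = ransac_circle(pts)   # recovers center (30, 30), radius 10
```

The outliers do not perturb the result because models built on them gather few inliers and are discarded.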

2.3 Pupil Labs

The Pupil Labs detector is integrated into the open source head-mounted eye tracking platform Pupil [15]. Figure 3 visualizes the different processing steps of the algorithm. In a first step, the eye image is converted to grayscale. The initial region estimation of the pupil is found via the strongest response for a center-surround feature as proposed in [30]. The algorithm then uses a Canny filter to detect edges in the eye image and filters these edges based on neighboring pixel intensity. It then looks for darker areas (blue region), where "dark" is specified using a user-set offset from the lowest peak in the histogram of pixel intensities in the eye image. Remaining edges are filtered to exclude those stemming from spectral reflections (yellow region) and extracted into contours using connected components [29]. The contours are simplified using the Ramer–Douglas–Peucker algorithm [4] and then filtered and split into sub-contours based on criteria of curvature continuity. Candidate pupil ellipses are found using ellipse fitting onto a subset of the contours. Good fits are defined in a least-squares sense, with major radii within a user-defined range and the ellipse center in a "blue region" (see Fig. 3d). An augmented combinatorial search then looks for contours that can be added as support to the candidate ellipses. The results are evaluated based on the ellipse fit of the supporting edges and the ratio of supporting edge length to ellipse circumference. If the ratio is above a threshold, the algorithm uses the raw Canny edge pixels to refine the ellipse fit and reports this final ellipse as the one defining the contour of the pupil. Otherwise, the algorithm reports that no pupil was found.
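The "dark region" step can be sketched as follows. The Pupil source defines its own peak criterion; here, as an assumption, the lowest spike is taken to be the darkest intensity whose histogram count reaches 10 % of the tallest bin, and the offset of 20 is an illustrative user setting.

```python
from collections import Counter

def dark_threshold(img, offset=20):
    """Cut-off for 'dark' pixels: lowest prominent histogram spike plus
    a user-set offset (spike criterion and offset are illustrative)."""
    hist = Counter(p for row in img for p in row)
    tallest = max(hist.values())
    lowest_spike = min(i for i, c in hist.items() if c >= 0.1 * tallest)
    return lowest_spike + offset

# synthetic image with three intensity bands: pupil (10), iris (120),
# skin (200)
img = [[10] * 20 + [120] * 30 + [200] * 14 for _ in range(40)]
t = dark_threshold(img)                          # 10 + 20 = 30
dark = sum(p <= t for row in img for p in row)   # pupil pixels only
```

Edges whose neighborhood falls outside this dark mask are then discarded before contour extraction.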

2.4 SET

The SET approach combines manual and automatic steps to estimate the pupil center [8]. Prior to pupil detection, two parameters are set manually: a threshold to convert the input to a binary image (see Fig. 4b) and the size of the segments considered for pupil detection (see Fig. 4c) [8]. The thresholded image is first segmented and pixels are then grouped into maximal connected regions to find pupil candidates [8]. For each segment larger than a threshold value, the convex hull method is applied to compute the segment border. In the last step, an ellipse is fitted to each extracted segment (see Fig. 4d). The ellipse that is closest to a circle is considered to be the pupil (see Fig. 4e).
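The segment-extraction and selection logic can be sketched as below. This is not the SET implementation: instead of the convex hull and ellipse fit, a crude bounding-box "squareness" score stands in for SET's "ellipse closest to a circle" criterion, and the minimum segment size is illustrative.

```python
from collections import deque

def components(mask):
    """4-connected components of a binary mask as lists of (x, y)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                q, comp = deque([(x, y)]), []
                seen[y][x] = True
                while q:
                    cx, cy = q.popleft()
                    comp.append((cx, cy))
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                                   (cx, cy + 1), (cx, cy - 1)):
                        if (0 <= nx < w and 0 <= ny < h
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((nx, ny))
                comps.append(comp)
    return comps

def most_circular(comps, min_size=30):
    """Among sufficiently large segments, pick the one whose bounding
    box is closest to square -- a stand-in for SET's circularity test."""
    def squareness(c):
        xs = [p[0] for p in c]
        ys = [p[1] for p in c]
        wdt, hgt = max(xs) - min(xs) + 1, max(ys) - min(ys) + 1
        return min(wdt, hgt) / max(wdt, hgt)
    return max((c for c in comps if len(c) >= min_size), key=squareness)

# binary image: a round pupil blob plus an elongated eyelash-like segment
mask = [[0] * 40 for _ in range(40)]
for y in range(40):
    for x in range(40):
        if (x - 12) ** 2 + (y - 12) ** 2 <= 36:
            mask[y][x] = 1
for y in (30, 31):
    for x in range(2, 38):
        mask[y][x] = 1
pupil = most_circular(components(mask))   # the round blob wins
```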

2.5 ExCuSe

Fig. 5

The algorithmic work-flow in ExCuSe [5]. a Input image with reflections and b Canny edge filtered image. c Refined edges using morphologic operators. Remaining edges are analyzed regarding their curvature (d). The best edge is fitted by an ellipse and its center is reported as the pupil center (e). f Input image without reflections. g Coarse pupil estimate based on the angular integral projection. h Pupil position refinement based on the Canny edge image. i Rays are sent out from the optimized position to select edges representing the pupil border (white dots on the line in i are ray hits). j Result of the pupil center estimation

ExCuSe is a recently introduced algorithm that builds on edge detection and morphologic operations [5]. The algorithmic work-flow is presented in Fig. 5. In the first step, the input image is normalized and a histogram of the image is calculated. If a peak in the bright histogram area is found, the pupil is detected based on edge analysis (first row in Fig. 5). To achieve this, a Canny edge detector is applied to the input image (see Fig. 5b). The resulting edges are then refined by removing thin lines and thinning thick edges using a morphologic operator. All remaining edges are smoothed and orthogonal edges are removed using morphologic patterns (see Fig. 5c). For each connected line, the mean position is calculated. Based on this information, straight lines are excluded from further processing. All remaining curved lines are kept as shown in Fig. 5d and further processed. For each remaining curve, the enclosed mean intensity value is calculated and the curve with the lowest value is chosen as the pupil curve. Afterward, an ellipse is fitted to this curve (see Fig. 5e).
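The straight-line exclusion can be illustrated with a simple curvature test. The ExCuSe source uses its own mean-position criterion; as an assumption, this sketch instead measures the maximum deviation of the polyline from the chord between its endpoints, with an illustrative tolerance of 1.5 pixels.

```python
import math

def is_curved(line, tol=1.5):
    """True if any point of the edge polyline deviates from the chord
    between its endpoints by more than `tol` pixels; straight edge
    segments stay close to the chord and are filtered out."""
    (x0, y0), (x1, y1) = line[0], line[-1]
    chord = math.hypot(x1 - x0, y1 - y0) or 1.0
    dev = max(abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / chord
              for x, y in line)
    return dev > tol

straight = [(i, 5 + i) for i in range(12)]                 # eyelid-like line
arc = [(10 * math.cos(t / 10), 10 * math.sin(t / 10))      # pupil-like arc
       for t in range(16)]
keep = [c for c in (straight, arc) if is_curved(c)]        # only the arc
```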

In case no bright peak in the intensity histogram is detected, a threshold based on the standard deviation of the image is calculated. The corresponding steps are visualized in the second row of Fig. 5. First the algorithm determines a coarse pupil position and then refines it stepwise to approach the pupil center. The coarse pupil position is estimated based on the Angular Integral Projection Function (AIPF) [23] on the thresholded image. More specifically, the input image is thresholded and pixels over the threshold are summed along the rows. This summation is done four times, rotating the projection axis in steps of \(45^\circ \) (see Fig. 5g). Once a coarse pupil center estimation has been performed, a refinement step based on features of the surrounding neighborhood is applied (see Fig. 5h). The assumption here is that pixels belonging to the pupil are surrounded by brighter or equally bright pixels. Finally, a thresholded image is used in ExCuSe to improve the edge image and refine the pupil edges. The result for the input image from Fig. 5f is shown in Fig. 5j.
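The projection idea behind the coarse estimate can be sketched on a binary mask. For brevity only the 0°/90° pair of the 45°-stepped AIPF is shown: each projection is the sum of mask pixels along one axis, and its peak locates the pupil along the other axis.

```python
def coarse_pupil(mask):
    """Coarse pupil center from integral projections: sum the binary
    mask along rows and along columns and take each projection's peak
    (only the axis-aligned pair of the rotated AIPF projections)."""
    h, w = len(mask), len(mask[0])
    row_proj = [sum(mask[y]) for y in range(h)]
    col_proj = [sum(mask[y][x] for y in range(h)) for x in range(w)]
    y = max(range(h), key=row_proj.__getitem__)
    x = max(range(w), key=col_proj.__getitem__)
    return x, y

# thresholded image: pupil blob of radius 5 centered at (20, 14)
mask = [[1 if (x - 20) ** 2 + (y - 14) ** 2 <= 25 else 0
         for x in range(40)] for y in range(28)]
px, py = coarse_pupil(mask)   # peaks at the blob center
```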

Fig. 6

Algorithmic work-flow in ElSe [6]. a Input image, b Canny edge image after morphologic operations. c Remaining edges after curvature analysis, analysis of the enclosed intensity value, and shape. d The curvature with the most enclosing low intensity values and the most circular ellipse is chosen as the pupil boundary. e Pupil center result. If ElSe [6] fails to select a valid ellipse, the input image (f) is downscaled (g). h Image after convolution with a mean and a surface difference filter. i A threshold for pupil region extraction is calculated and used for pupil area estimation. j Pupil center result

Fig. 7

Exemplary eye images of each data set introduced together with the pupil detection algorithms ExCuSe [5] and ElSe [6]. Data sets I–IX and XVIII–XXII were collected during an on-road driving experiment with different subjects. The Data sets X–XVII were recorded during a supermarket search task. The remaining Data sets XXIII and XXIV were recorded in a laboratory setting with two Asian subjects

2.6 ElSe

Similar to ExCuSe, the ElSe algorithm operates on Canny edge filtered eye images. The pupil center is found in a decision-based approach as described in [6]. Based on the edge filtered image, edge connections that could impair the surrounding edge of the pupil are removed by means of morphologic operators (see Fig. 6b). Afterward, connected edges are collected and evaluated based on straightness, inner intensity value, elliptic properties, the possibility to fit an ellipse to them, and a pupil plausibility check (see Fig. 6c). If a valid ellipse describing the pupil is found, it is returned as the result (see Fig. 6d).

In case no ellipse is found (e.g., when the edge filtering does not result in suitable edges), a second analysis is conducted. More specifically, as in ExCuSe, ElSe first estimates a likely location candidate and then refines this position. Since a computationally demanding convolution operation is required, the image is rescaled to keep the run-time tractable. This rescaling process contains a low-pass procedure to preserve dark regions (see Fig. 6g) and to reduce the effect of blurring or noise caused by eyelashes in the eye image. Afterward, the image is convolved with two different filters separately: (1) a surface difference filter to calculate the area difference between an inner circle and a surrounding box and (2) a mean filter. The results of both convolutions are multiplied (see Fig. 6h), and the maximum value is set as the starting point of the refinement step. Since choosing a pixel position in the downscaled image leads to a distance error of the pupil center in the full-scale image, the position has to be optimized on the full-scale image based on an analysis of the pixels surrounding the chosen position (see Fig. 6i). The center of mass of the pixels under this threshold is used as the new pupil position (see Fig. 6j). Finally, the position is validated as a plausible pupil by analyzing the surface difference response of a validity pattern.
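The combination of the two filters can be sketched directly. This is not the ElSe implementation: box windows approximate the paper's circular inner filter, the brute-force loop stands in for the convolutions, the filter radius is illustrative, and the input is assumed to be the already downscaled image.

```python
def blob_response(img, r=3):
    """Position maximizing the product of an inverted mean filter and a
    surface-difference term (surround mean minus inner mean), i.e. the
    darkest blob sitting on a brighter surround."""
    h, w = len(img), len(img[0])
    best, pos = None, None
    for cy in range(2 * r, h - 2 * r):
        for cx in range(2 * r, w - 2 * r):
            inner = [img[y][x] for y in range(cy - r, cy + r + 1)
                               for x in range(cx - r, cx + r + 1)]
            outer = [img[y][x] for y in range(cy - 2 * r, cy + 2 * r + 1)
                               for x in range(cx - 2 * r, cx + 2 * r + 1)]
            mi = sum(inner) / len(inner)
            mo = (sum(outer) - sum(inner)) / (len(outer) - len(inner))
            resp = (255 - mi) * max(mo - mi, 0.0)
            if best is None or resp > best:
                best, pos = resp, (cx, cy)
    return pos

# downscaled eye image: dark pupil blob of radius 4 at (15, 15)
img = [[0 if (x - 15) ** 2 + (y - 15) ** 2 <= 16 else 150
        for x in range(32)] for y in range(32)]
px, py = blob_response(img)   # maximum response at the blob
```

The maximum of this product is then mapped back to the full-scale image and refined there, as described above.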

3 Data sets

3.1 Świrski

The data set introduced by Świrski et al. [30] in 2012 provides 600 manually labeled, high resolution (640 \(\times \) 480 pixels) eye images. The data was collected during indoor experiments with 2 subjects and 4 different camera angles. The main challenges in pupil detection arise due to the highly off-axial camera positions and occlusion of the pupil by the eyelid.

Table 1 Four publicly available data sets containing 225,569 ground-truth eye images were employed for the evaluation of pupil detection algorithms

3.2 ExCuSe

This data set was recently provided with the ExCuSe [5] algorithm and includes 38,401 high-quality, manually labeled eye images (384 \(\times \) 288 pixels) from 17 different subjects. Exemplary images are shown in Fig. 7. The first nine data sets in ExCuSe were recorded during an on-road driving experiment [13] using a head-mounted Dikablis mobile eye tracker. The remaining eight data sets were recorded during a supermarket search task [26]. These data sets are highly challenging, since illumination conditions change frequently. Furthermore, reflections on eye glasses and contact lenses often occur (Table 1). The experiments were not specifically designed to pose challenges to pupil detection, but reflect typical data collected out of the laboratory.

3.3 ElSe

The Data sets XVIII–XXIV (Table 1; Fig. 7) were recently published with the ElSe algorithm in [6]. This data set contains overall 55,712 eye images (384 \(\times \) 288 pixels) collected from seven subjects wearing a Dikablis eye tracker during various tasks. The Data sets XVIII–XXII were derived from eye tracking recordings during an on-road driving experiment [13]. The remaining Data sets XXIII and XXIV were recorded during indoor experiments with two Asian subjects. The challenge in pupil detection arises from eyelids and eyelashes occluding the pupil or casting shadows onto it (and, in one case, reflections on glasses). Further challenges associated with Data set XXIV are related to reflections on eye glasses. The challenges in the eye images included in the Data sets XVIII, XIX, XX, XXI, and XXII are related to motion blur, reflections, and low pupil contrast in comparison with the surrounding area.

3.4 Labeled pupils in the wild (LPW)

The recent Labeled Pupils in the Wild (LPW) data set [32] contains 66 high-quality eye region videos that were recorded from 22 participants using a dark-pupil head-mounted eye tracker from Pupil Labs [15]. Each video in the data set consists of about 2000 frames with a resolution of 640 \(\times \) 480 pixels and was recorded at about 95 FPS, resulting in a total of 130,856 video frames. The data set is one order of magnitude larger than other data sets and covers a wide range of realistic indoor and outdoor illumination conditions; it includes participants wearing glasses and eye make-up, as well as different ethnicities with variable skin tones, eye colors, and face shapes (see Fig. 8). All videos were manually ground-truth annotated with accurate pupil center positions.

4 Experimental results

We compared the algorithms Starburst [18], SET [8], Świrski et al. [30], Pupil Labs [15], ExCuSe [5], and ElSe [6] on the data sets from Table 1. All algorithms were employed with their default parameter settings. We report the performance of the above algorithms in terms of the detection rate for different pixel errors, where the pixel error represents the Euclidean distance between the manually labeled center of the pupil and the pupil center reported by the algorithm. Note that we do not report performance measures related to the gaze position on the scene, since this also depends on the calibration. We focus on the pupil center position on the eye images, where the first source of noise occurs.
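The evaluation metric itself is straightforward and can be stated as a short sketch; the coordinates below are illustrative, and frames where an algorithm reports no pupil are counted as misses.

```python
import math

def detection_rate(pred, truth, max_err=5.0):
    """Fraction of frames whose predicted pupil center lies within
    `max_err` pixels (Euclidean distance) of the hand-labeled center;
    `None` predictions (no detection) count as misses."""
    hits = 0
    for p, t in zip(pred, truth):
        if p is not None and math.hypot(p[0] - t[0], p[1] - t[1]) <= max_err:
            hits += 1
    return hits / len(truth)

truth = [(30, 30), (31, 29), (40, 40), (10, 10)]
pred = [(32, 31), (31, 29), None, (20, 20)]
rate = detection_rate(pred, truth)   # 2 of 4 frames within 5 px -> 0.5
```

Sweeping `max_err` over a pixel range yields the detection-rate curves reported in Figs. 9 and 10.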

Fig. 8

Example images of the labeled pupils in the wild (LPW) Data set. The top row shows different eye appearances. The second row shows particularly difficult cases, such as strong shadows, occlusions from eye lids, reflections on glasses, and make-up. The third row shows close-ups to illustrate the effect of reflections, self occlusions, strong sunlight and shadow, as well as occlusions by glasses

Table 2 summarizes the performance of the evaluated algorithms on each data set. On 42 out of 47 data sets, ElSe [6] clearly outperformed the other state-of-the-art algorithms, making it the most promising approach toward robust pupil detection in heavily noisy eye images. The average detection rates of the evaluated algorithms on the whole image corpus (i.e., 225,569 ground-truth eye images from Table 1) are presented in Fig. 9. Note that the results are weighted by the number of images in each data set. As shown in the figure, ElSe shows superior performance, reaching an average detection rate of more than 60 % at a pixel distance error of 5.

Table 2 Performance comparison of SET, Starburst, Świrski, ExCuSe and ElSe in terms of detection rate up to an error of five pixels
Fig. 9

Average detection rates at different pixel distances for all data sets. The result for each data set is weighted by the number of images in the corresponding data set

Fig. 10

Detection rates of the algorithms ElSe, ExCuSe, Pupil Labs, SET, Starburst, and Świrski for each of the data sets described in Table 1. a Performance on the Świrski data set. b Performance on the ExCuSe data set. c Performance on the ElSe data set. d Performance on the LPW data set

A detailed performance analysis on each data set is visualized in Fig. 10. The highest detection rates are achieved on the Świrski et al. [30] data set. Since this data set was collected in a laboratory setting, it is the least challenging, although most of the contained eye images are highly off-axial. For this data set, the algorithms ExCuSe, ElSe, and Świrski reach a detection rate well above 70 % at a pixel distance of 5. With a detection rate of 86.17 % (Table 2), ExCuSe is the best performing algorithm among the state of the art.

Fig. 11

Limitations of the state-of-the-art algorithms. a Data set XIX. b Data set XXI. c Data set XXVIII. d Data set XXIX

The data sets ExCuSe, ElSe, and LPW provide a large corpus of eye images collected in outdoor scenarios and represent the various challenges that have to be faced when head-mounted eye trackers are employed in such settings. Figure 10b shows the evaluation results on the ExCuSe data set. For this data set, ElSe is the best performing algorithm with a detection rate of 70 % at a pixel error of 5. The ExCuSe algorithm also achieves good detection rates of about 55 %, whereas the remaining algorithms show detection rates below 30 %.

Due to the many sources of noise summarized in Table 1, the ElSe data set contains the most challenging eye images. The best detection rates (for a pixel error of 5) are achieved by the algorithms ElSe (50 %) and ExCuSe (35 %), while the remaining algorithms show detection rates of at most 10 %.

According to the evaluation results on the LPW data set (Fig. 10d), ElSe proves to be the most robust algorithm when employed in outdoor scenarios. At a pixel error of 5, ElSe shows a detection rate of 70 %. Good detection rates (50 %) are also achieved by the algorithms ExCuSe and Świrski, whereas the remaining approaches have detection rates below 40 %.

Figure 11 shows evaluation results for the most challenging data sets. Data set XIX in Fig. 11a is characterized by scattered reflections, which lead to edges on the pupil but not at its boundary. Since most of the state-of-the-art approaches are based on edge filtering, they are very likely to fail in detecting the pupil boundary. In consequence, the detection rates achieved here are quite poor.

Fig. 12

Successfully found pupils by the ElSe algorithm in eye images from Data set XXI

Fig. 13

Failure cases of ElSe on eye images from Data sets XIX and XXVIII

Data set XXI (Fig. 11b) poses challenges related to poor illumination conditions, thus leading to an iris with low intensity values. This makes it very difficult to separate the pupil from the iris (e.g., in the Canny edge detection the responses are discarded because they are too low). Additionally, this data set contains reflections, which have a negative impact on the edge filter response. While the algorithms ElSe and ExCuSe achieve detection rates of approximately 45 %, the remaining approaches can detect the pupil center in only 10 % of the eye images. Figure 12 presents examples of successfully found pupils in eye images from Data set XXI. The top row shows two input images, the middle row presents the filtered edges, and the bottom row shows the pupils detected by the ElSe algorithm. Among the evaluated algorithms, only ElSe [6] and ExCuSe [5] find the pupil in these eye images. The remaining algorithms fail due to the low contrast in the pupil area. More specifically, SET [8] fails, since in the thresholding step large parts of the iris are extracted and identified as pupil area. Świrski et al. [30] fails in the coarse positioning step, while Starburst fails when selecting the correct edge candidates that represent the pupil border.

The eye images contained in Data set XXVIII (Fig. 11c) are recorded from a highly off-axial camera position. In addition, poor illumination makes it difficult to separate the pupil from the dark regions at the eyelid areas. Both conditions lead to overall poor detection rates. Figure 13 shows failure cases of ElSe [6] on eye images from Data sets XIX and XXVIII. To demonstrate the challenges associated with automated pupil detection in these images, we have chosen ElSe because it was the best performing algorithm. The left column presents the input images to the algorithms, the second column shows the filtered edges, the third column shows the blob responses, and the last column the results. The first two rows contain images from Data set XIX and show the high impact of scattered reflections (first row) and of curved edges induced by reflections (second row). In the second row, the third image is not present since ElSe [6] did not use blob detection. The last row shows an input image from Data set XXVIII, where the wrong blob response is due to eyelashes. Since the image is recorded highly off-axis, the pupil is only marginally visible.

The last challenging Data set XXIX (Fig. 11d) is also characterized by highly off-axial images. In addition, the frame of the subjects' glasses covers the pupil and most of the images are heavily blurred. This leads to unsatisfactory responses from the Canny edge detector. In consequence, the detection rates are very poor, e.g., ElSe (the best performing algorithm) can detect the pupil in only 25 % of the eye images. Figure 14 shows failure cases of ElSe [6] on eye images from Data sets XXI and XXIX. The left column presents the input images, the second column shows the filtered edges, the third column shows the blob responses, and the last column the results obtained with the ElSe algorithm. The blob response of ElSe [6] in the top row is distracted by the high surface difference of the light shadow at the lower eyelid and the bright skin below it. For Data set XXIX, the main pupil recognition problems arise due to the bright eyeglasses frame, which distracts the blob response. In the above failure cases, further improvements in automated pupil detection could come from explicitly considering additional eye-related features such as eyelids and eye corners.

Fig. 14

Failure cases of ElSe on eye images from Data sets XXI and XXIX

5 Conclusions

We presented a review of state-of-the-art pupil detection algorithms for application in outdoor settings. The focus was primarily on the robustness of these algorithms with respect to frequently and rapidly changing illumination conditions, off-axial camera positions, and other sources of noise. Six state-of-the-art approaches were evaluated on over 200,000 ground-truth annotated images collected with different eye tracking devices in a range of different everyday settings. Our extensive evaluation shows that, despite good average performance of these algorithms on these challenging data sets, there are still problems in obtaining robust pupil centers in the presence of reflections or poor illumination conditions.