Keywords

1 Introduction

Using tablets in landscape orientation to watch videos can be quite cumbersome. The user interfaces of mobile video players usually ignore the typical position of the hands at the sides of the device. Although mobile touchscreen devices offer manifold interaction options, video players still look and work very similarly to their desktop counterparts, as can be seen in Fig. 5. For longer periods of time, holding a device in landscape orientation with both hands at its sides is usually more convenient, in particular for large tablets, e.g., 12-inch devices. As a result, interaction with the players’ interface becomes unnecessarily hard: playback controls positioned in the middle of the screen, such as seeker bars and play/pause/fast-forward/fast-rewind buttons, can strain the hands and fingers. This problem is largely ignored, even though an increasing amount of video content is watched with mobile video players. Furthermore, embedded camera systems and mobile content delivery have greatly improved, and more video content, for both professional and entertainment purposes, is consumed than ever before.

To solve this problem, an ergonomic video player interface is needed that is tailored to the special needs of mobile multimedia interaction. As all eight fingers are occupied when holding the device in landscape orientation, the thumbs become especially important: in contrast to the fingers, they are free to move at the left and right sides of the device. Therefore, UI controls need to be positioned in these areas of the screen, so that all functions can be easily reached with the thumbs. The idea of such a UI layout is in fact far from new. On-screen keyboards have offered a special mode for such use cases for a long time: the keyboard is split into two pieces that are placed at the left and right sides of the screen. Fig. 2 shows examples of the default split keyboards on iOS and Windows. Nevertheless, this idea is completely ignored by the user interfaces of current mobile video players.

Fig. 1. Interface of the ThumbBrowser.

Fig. 2. Examples of split keyboards on iOS and Windows.

In [9], we already proposed using the thumbs for mobile video interaction. However, at that time we were only able to present a first, basic version of our ThumbBrowser. As that work appeared in the demo paper track, we were also unable to provide extensive evaluation results, which left its definite benefit unclear. In this work, we build on that basis and present extensions to our interface as well as the results of a comparative user evaluation against a state-of-the-art mobile video player. The evaluation setup is inspired by the work of Schoeffmann and Burgstaller [16], who performed a similar experiment with a video player interface tailored to smartphone screens in portrait orientation. Their interface adopts the idea of a scrubbing wheel, as used on Apple iPods to control music, for navigating in videos. Moreover, we compare our results to theirs in the analysis of the evaluation.

2 Related Work

Hürst and Darzentas [11] propose browsing videos on a tablet with a hierarchical storyboard interface. A video’s content is represented by a grid of thumbnails, each representing a segment; tapping a thumbnail opens another, lower-level grid that represents the content of that segment. Hürst et al. [12] also investigate timeline-based mobile video browsing, with a focus on PDA-like devices operated with a stylus. For example, users are able to control the seeking speed by varying the vertical position of the stylus. This approach is similar to the interaction paradigm later adopted by the default iOS video player. Furthermore, Hürst et al. [13] present a concept for video navigation on smartphones tailored to one-handed, i.e., thumb-based, interaction.

Hudelist et al. [8] utilize the metaphor of a 3D filmstrip for browsing videos on tablet devices, similar to the motivation of [1]. The content is represented by a floating filmstrip, as used in analog film projectors and cameras. Each image on the strip represents a video segment that can be played directly in the strip visualization. Ganhör [5] presents ProPane, an interface for fast and very precise browsing on smartphone-like devices. It enables users to control playback and seeking very precisely, e.g., for video editing scenarios. Huber et al. [7] present Wipe’n’Watch, an interface for browsing interrelated video collections, similar to the approach of De Rooij et al. [3] for desktop computers. Karrer et al. [15] propose an interface for mobile devices that uses direct manipulation of objects in a scene for navigation instead of traditional seeker bars, as shown in earlier work by Dragicevic et al. [4] for desktop PCs.

Moreover, Schoeffmann et al. [17] present a mobile video player that uses wipe gestures to control the seeking speed. Hudelist et al. [10] also present a video player for navigation in single videos on tablet devices that uses sub-shots and different levels of detail for browsing via keyframes. For this, the interface uses three synchronized filmstrips with which users can easily navigate in the content. A purely human-computational approach is shown by Hürst et al. [14], where users browse through videos by inspecting a large array of uniformly sampled keyframes. Zhang et al. [19] present a mobile interface that lets two users collaboratively browse a single video and share sketches. The position in the video is controlled via simple touch gestures. Similarly, Cobarzan et al. [2] propose a system for collaborative browsing with multiple mobile tablet clients and a single server that manages the communication and query requests of the clients. Finally, a general overview of the field of novel video browsing interfaces is given by Schoeffmann et al. [18].

3 Interface

The extended interface presented in this paper is based on our earlier work on the ThumbBrowser (see Hudelist et al. [9]). We extended our original concept with content analysis and additional interaction functionality, and then performed a comparative user study to assess its usefulness to users.

The interface tries to interfere with users’ viewing experience as little as possible. Therefore, all UI controls are hidden in normal playback mode. When users want to interact with the player, e.g., to change the playback position, they simply put one of their thumbs on either side of the screen. Depending on which side is touched, different controls become visible, as can be seen in Fig. 1.

On the right-hand side, a vertical seeker control is activated. It is inspired by the layout of a traditional seeker bar, but the timeline is positioned vertically. As a result, every part of the timeline is reachable with the right thumb, so users can easily navigate to any position inside a video. Furthermore, a preview window appears that follows the vertical position of the user’s thumb on the screen. This feature is similar to the magnifying-glass preview offered by recent versions of the video players used by YouTube and similar websites. To jump to any position in the video, users place their right thumb on the screen, drag it to the desired position on the timeline, and lift it up again. Lifting the thumb triggers the navigation, and the playback position is changed accordingly. The jump can be avoided by dragging the thumb all the way off the right edge of the screen instead of lifting it.
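
The touch handling described above can be summarized in a small, hypothetical Python sketch. It assumes a 1024 × 768 point landscape screen, a 50/50 split between the left (radial menu) and right (seeker) halves, a timeline running from the top (start) to the bottom (end) over the full screen height, and that a drag off the right edge is reported as an x coordinate at or beyond the screen width. None of these values or names (`Screen`, `control_for_touch`, `y_to_time`, `seek_on_release`) come from the actual ThumbBrowser implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Screen:
    width: float   # landscape width in points (assumed)
    height: float  # landscape height in points (assumed)


def control_for_touch(x: float, screen: Screen) -> str:
    """Choose which control to reveal for a thumb touching down at x (assumed 50/50 split)."""
    return "radial_menu" if x < screen.width / 2 else "vertical_seeker"


def y_to_time(y: float, screen: Screen, duration_s: float) -> float:
    """Map a vertical thumb position to a video time; top = start, bottom = end (assumed)."""
    fraction = min(max(y / screen.height, 0.0), 1.0)  # clamp to the timeline
    return fraction * duration_s


def seek_on_release(x: float, y: float, screen: Screen, duration_s: float) -> Optional[float]:
    """Return the seek target when the thumb is lifted, or None if it left via the right edge."""
    if x >= screen.width:
        return None  # thumb dragged off the right edge: the jump is cancelled
    return y_to_time(y, screen, duration_s)


ipad = Screen(width=1024, height=768)
print(control_for_touch(900, ipad))            # vertical_seeker
print(seek_on_release(900, 384, ipad, 2100))   # 1050.0
print(seek_on_release(1030, 384, ipad, 2100))  # None
```

In such a design, the preview window would be driven by the same `y_to_time` value while the thumb is moving, and the actual seek would only happen in `seek_on_release`.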

Fig. 3. Visualization of dominant color directly in the vertical timeline. (Color figure online)

We further extended the original interface concept by automatically analyzing the current video in the background and determining its dominant color in five-second steps. This is done by creating a simple color histogram based on the HSB color space, in which color values are assigned to one of eight bins. The bins cover value ranges of different sizes, which were defined based on the results of a preliminary test. The resulting dominant colors are visualized directly in the vertical timeline (Fig. 3).
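
A per-segment dominant-color computation of this kind might look roughly like the following sketch. The eight unequal hue bins, their edges and names, and the thresholds that skip grayish or dark pixels are all hypothetical placeholders, since the ranges from the preliminary test are not given here; each segment is assumed to be represented by the RGB pixels of frames sampled from its five seconds.

```python
import colorsys
from bisect import bisect_right
from collections import Counter
from typing import Iterable, List, Optional, Tuple

RGB = Tuple[int, int, int]

# Hypothetical upper hue edges in degrees: 7 edges -> 8 bins of unequal width.
# The actual ranges were derived from a preliminary test and are not reproduced here.
HUE_EDGES = [15, 45, 70, 160, 200, 260, 330]
BIN_NAMES = ["red", "orange", "yellow", "green", "cyan", "blue", "purple", "pink/red"]


def dominant_bin(pixels: Iterable[RGB], min_sat: float = 0.2, min_val: float = 0.2) -> Optional[int]:
    """Histogram the hues of sufficiently saturated, bright pixels and return the fullest bin."""
    counts: Counter = Counter()
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # HSV is the same model as HSB
        if s >= min_sat and v >= min_val:  # skip grayish/dark pixels (assumption)
            counts[bisect_right(HUE_EDGES, h * 360)] += 1
    return counts.most_common(1)[0][0] if counts else None


def dominant_colors(segments: List[List[RGB]]) -> List[Optional[str]]:
    """One entry per five-second segment, e.g. pixels of frames sampled from that segment."""
    result = []
    for pixels in segments:
        idx = dominant_bin(pixels)
        result.append(BIN_NAMES[idx] if idx is not None else None)
    return result


segments = [
    [(255, 0, 0)] * 3 + [(0, 0, 255)],  # mostly red
    [(0, 200, 0)] * 2,                  # green
    [(128, 128, 128)] * 4,              # gray only: no dominant hue
]
print(dominant_colors(segments))  # ['red', 'green', None]
```

Unequal bin widths let narrow hue ranges (e.g., orange or yellow) stand out instead of being absorbed by wide ones such as green; the specific edges above are only for illustration.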

On the left side of the screen, a radial menu can be activated. It offers options to play or pause the video and to fast-forward or fast-rewind.

Additionally, we added another timeline visualization mode to the interface, called the filmstrip mode.

The filmstrip mode provides a scrollable list of keyframes on the right-hand side of the screen, as can be seen in Fig. 4. The keyframes are uniformly sampled from the video in five-second steps.

Fig. 4. Vertical seeker control after activation of the filmstrip mode.

When users tap one of the keyframes, the video player jumps to the corresponding position. This feature is designed to help users refine their search in very long videos. For example, users start a rather coarse search with the seeker control and, after some time, notice a promising section of the video. As the seeker control is too sensitive to examine this section in detail, they can continue browsing by switching to the filmstrip mode.
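
The keyframe sampling and tap-to-seek behavior can be sketched as follows. The five-second interval comes from the text; the function names, and the idea of scrolling the strip to the keyframe at the current playback position when the filmstrip mode is opened, are assumptions made for illustration.

```python
import math
from typing import List

KEYFRAME_INTERVAL_S = 5.0  # uniform sampling step from the text


def keyframe_times(duration_s: float, interval_s: float = KEYFRAME_INTERVAL_S) -> List[float]:
    """Timestamps of uniformly sampled keyframes: 0, 5, 10, ... (all before the end)."""
    count = math.ceil(duration_s / interval_s)
    return [i * interval_s for i in range(count)]


def keyframe_index_for_time(t_s: float, interval_s: float = KEYFRAME_INTERVAL_S) -> int:
    """Keyframe whose segment contains t_s, e.g. to scroll the strip to the current position."""
    return int(t_s // interval_s)


def seek_target_for_tap(index: int, times: List[float]) -> float:
    """Tapping keyframe `index` moves playback to its timestamp."""
    return times[index]


times = keyframe_times(35 * 60)           # the 35-min documentary
print(len(times))                         # 420
start = keyframe_index_for_time(1234.0)   # coarse position found with the seeker
print(start, seek_target_for_tap(start + 3, times))  # 246 1245.0
```

For a 35-minute video this already yields 420 keyframes, which illustrates why the strip lends itself to refining a search around a promising position rather than to scanning the whole video.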

4 Evaluation

To make the results of this study comparable, we designed the evaluation to be very similar to the one used by Schoeffmann and Burgstaller [16]. We even used the same data set (we thank the authors for providing the whole data set with ground truth).

Fig. 5. Screenshot of the standard video player.

Participants of the user study had to search for and mark all occurrences of predefined objects. Four videos had to be inspected and annotated. The first video was a 35-min documentary about gravity and planets in outer space, in which users had to find all scenes showing images of the planet Earth. The second video was a 30-min extended report about multi-cultural societies worldwide, in which users had to find scenes showing glasses. The third video was a 25-min documentary about cultivating fruits and vegetables, in which participants had to find all occurrences of bananas. Finally, the fourth video was a 40-min report about gamification, in which participants had to mark all scenes with smartphones.

Moreover, participants were told that they could spend as much time as they deemed appropriate to complete a task, but they were advised not to spend more than seven minutes on each task. This was done to avoid putting too much stress on users, who might otherwise try to find every single instance and thus spend unrealistic amounts of time.

To compare the performance of the ThumbBrowser to that of a default player, we used the media player control of the iOS API, which offers a play/pause button and a seeker bar. Furthermore, two buttons were always available in both interfaces: one to mark a scene and one to finish the current task. As the test device, we used a first-generation iPad Air running iOS 9.3.

The order of interfaces and videos was alternated between participants, with the constraint that two consecutive videos were always tested with the same interface, e.g., videos one and two were processed with interface A, followed by videos three and four with interface B. Before starting the study, participants provided their age, gender, and smartphone/tablet experience level (beginner, advanced, or professional). At the beginning of each task, the system described the target objects textually. After completing the two tasks with an interface, users filled in a questionnaire in which they rated that interface on a Likert scale, based on the NASA Task-Load Index (TLX) [6]. The following questions were asked: (i) how mentally demanding the interaction was, (ii) how physically demanding the interaction was, (iii) how much they felt that the interface supported them in solving the task, (iv) how much fun it was to use the interface, (v) how frustrating it was to use the interface, and (vi) how easy to understand and use the interface was.
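
One way to realize such an alternation is a simple rotation over participants, sketched below. The exact assignment used in the study is not reported; this hypothetical scheme alternates the interface order for every participant and swaps the two video blocks (videos 1 and 2, videos 3 and 4) every second participant, so that each group of four participants is balanced. With 26 participants, which is not a multiple of four, such a rotation cannot be perfectly balanced.

```python
from collections import Counter
from typing import List, Tuple

INTERFACES = ("ThumbBrowser", "Standard")
VIDEO_BLOCKS = ((1, 2), (3, 4))  # two consecutive videos always share an interface


def schedule(participant: int) -> List[Tuple[int, str]]:
    """Ordered (video, interface) tasks for a 0-based participant index (hypothetical scheme)."""
    ifaces = INTERFACES if participant % 2 == 0 else INTERFACES[::-1]
    blocks = VIDEO_BLOCKS if (participant // 2) % 2 == 0 else VIDEO_BLOCKS[::-1]
    return [(video, iface) for iface, block in zip(ifaces, blocks) for video in block]


for p in range(4):
    print(p, schedule(p))

# Within each group of four participants, every (video, interface) pair occurs twice.
pair_counts = Counter(task for p in range(4) for task in schedule(p))
print(set(pair_counts.values()))  # {2}
```

A limitation of this particular scheme is that videos 1 and 2 (and 3 and 4) are always paired with each other; a Latin-square design over the videos would break that coupling at the cost of a longer rotation.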

4.1 Experimental Results and Statistical Analysis

In total, 26 participants took part in the study, exactly half of whom were female. The average age of the participants was 25.5 years. Eight participants described themselves as smartphone/tablet beginners, 13 as advanced users, and five as very experienced users.

A paired-samples t-test was used to determine whether there was a statistically significant mean difference in search performance, i.e., the percentage of target scenes found with the ThumbBrowser compared to the standard player. One outlier was detected that was more than 1.5 box-lengths from the edge of the box in a boxplot. Inspection of its value did not reveal it to be extreme, so it was kept in the analysis. The difference scores for the search performance of the ThumbBrowser and the standard player were normally distributed, as assessed by Shapiro-Wilk’s test (p = 0.70). Data are mean ± standard deviation, unless otherwise stated. The percentage of found scenes was higher for the ThumbBrowser (66.19% ± 21.28%) than for the standard player (48.43% ± 18.68%). The test revealed a statistically significant difference between the two interfaces (t(51) = 4.53, p < 0.0005, d = 0.63). Fig. 6 visualizes the performance differences.
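
The box-plot outlier rule and the paired t statistic used above can be sketched with Python's standard library. This is an illustration under assumptions, not the authors' analysis script: "box-length" is taken to be the interquartile range, values beyond 3 box-lengths are treated as "extreme" (a common convention, e.g., in SPSS box plots), the rule would be applied to the per-pair difference scores, and quartiles are computed with `statistics.quantiles(method="inclusive")`, which can differ slightly from other statistics packages. For p-values and the normality check, `scipy.stats.ttest_rel` and `scipy.stats.shapiro` could be used instead.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def boxplot_labels(values: Sequence[float], k_out: float = 1.5, k_ext: float = 3.0) -> List[str]:
    """Label each value 'ok', 'outlier' (> k_out IQRs from the box) or 'extreme' (> k_ext IQRs)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # the "box-length"
    labels = []
    for v in values:
        dist = max(q1 - v, v - q3, 0.0)  # distance from the nearest box edge
        if dist > k_ext * iqr:
            labels.append("extreme")
        elif dist > k_out * iqr:
            labels.append("outlier")
        else:
            labels.append("ok")
    return labels


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom for per-pair differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


print(boxplot_labels([1, 2, 3, 4, 5, 6, 7, 8, 9, 17]))  # last value: 'outlier'
t, df = paired_t([3, 5, 7, 9], [1, 2, 4, 5])
print(f"t({df}) = {t:.2f}")  # t(3) = 7.35
```

The toy inputs are made up; with the study data, `a` and `b` would hold the per-task scores for the ThumbBrowser and the standard player.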

Fig. 6. Amount of retrieved target scenes (error bars: ± s.e. of the mean).

This result is encouraging, as it shows that the ThumbBrowser can in fact improve users’ search performance significantly. Next, we analyzed the differences in how much time participants spent on each task.

A paired-samples t-test was used to determine whether there was a statistically significant mean difference between the search times with the ThumbBrowser and with the standard player. Three outliers were detected that were more than 1.5 box-lengths from the edge of the box in a boxplot. Inspection of their values revealed that one of them was extreme; it was therefore excluded from the analysis. The difference scores between the search times with the ThumbBrowser and with the standard player were normally distributed, as assessed by Shapiro-Wilk’s test (p = 0.16). Data are mean ± standard deviation, unless otherwise stated. The search times were higher for the ThumbBrowser (417.1 s ± 217.4 s) than for the standard player (315.7 s ± 229.9 s). The test revealed a statistically significant difference between the two interfaces (t(50) = 2.529, p < 0.05, d = 0.35). Fig. 7 visualizes the difference in mean search times.
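
Assuming that the reported d is Cohen's d_z for paired samples (the text does not state which variant was used), it follows directly from the t statistic and the number of pairs n = df + 1, and this reproduces both reported effect sizes. Here \bar{D} and s_D denote the mean and standard deviation of the per-pair differences.

```latex
\begin{aligned}
t &= \frac{\bar{D}}{s_D/\sqrt{n}}, \qquad d_z = \frac{\bar{D}}{s_D} = \frac{t}{\sqrt{n}},\\
\text{search performance } (n = 52):\quad d_z &= \frac{4.53}{\sqrt{52}} \approx \frac{4.53}{7.211} \approx 0.63,\\
\text{search time } (n = 51):\quad d_z &= \frac{2.529}{\sqrt{51}} \approx \frac{2.529}{7.141} \approx 0.35.
\end{aligned}
```

The pair counts are inferred from the reported degrees of freedom: 52 presumably corresponds to 26 participants with two tasks per interface, and 51 is consistent with the one extreme value excluded from the search-time analysis.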

Fig. 7. Task solve time (error bars: 95% confidence interval).

Interestingly, this result is similar to the results of Schoeffmann and Burgstaller [16]. It could indicate that users were more comfortable with the ThumbBrowser than with the standard player and therefore invested more time in the tasks, which could also have contributed to the better overall search performance.

To determine whether there are statistically significant differences between the answers given in the questionnaires, a Wilcoxon signed-rank test was performed. The ThumbBrowser was rated significantly less mentally demanding than the standard player (Z = 2.22, p < 0.05). Moreover, it was physically less demanding to use (Z = 2.373, p < 0.05), users felt that it supported them significantly better in their tasks than the standard player (Z = –4.373, p < 0.005), it was significantly more fun to use (Z = –4.286, p < 0.005), and it was less frustrating (Z = 4.106, p < 0.005). In terms of ease of use, no significant difference was found between the two interfaces (Z = 0.776, p = 0.438). Fig. 8 visualizes the differences between the interfaces regarding the questionnaires.
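
To show how such Z values arise, the following stdlib-only sketch computes the normal-approximation Z of the Wilcoxon signed-rank test: zero differences are dropped, tied absolute differences receive average ranks, the variance includes the usual tie correction, and no continuity correction is applied. The sign convention (positive when the first sample tends to be larger) and these choices are assumptions; statistics packages differ in such details, so the sketch is not guaranteed to reproduce the reported values exactly. `scipy.stats.wilcoxon` offers a library implementation of the test.

```python
import math
from typing import Sequence


def wilcoxon_z(x: Sequence[float], y: Sequence[float]) -> float:
    """Normal-approximation Z of the Wilcoxon signed-rank test for paired ratings x, y."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        return 0.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        size = j - i + 1
        tie_term += size**3 - size
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)  # sum of positive ranks
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected variance
    return (w_plus - mean) / math.sqrt(var)


# Hypothetical 1-10 frustration ratings (standard player vs. ThumbBrowser), not study data.
standard = [7, 8, 6, 9, 5, 7]
thumb = [4, 5, 6, 3, 4, 5]
print(round(wilcoxon_z(standard, thumb), 3))
```

Swapping the two arguments flips the sign of Z, which is why the reported Z values for mental demand and for support point in opposite directions even though both favor the ThumbBrowser.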

Fig. 8. Workload ratings according to NASA Task-Load-Index [6], with Likert-scale 1–10, for both interfaces (error bars: ± s.e. of the mean). Lower is better for Mental, Physical and Frustration.

Fig. 9. Sample navigation behavior of one user in the documentary video, in which users had to find scenes where planet Earth is visible, using the ThumbBrowser (top) and the standard player (bottom).

When we visually examined the navigation behavior of the participants, it also became clear that the search strategies differed slightly between the two interfaces (see Fig. 9). With the ThumbBrowser, users were much more likely to search unidirectionally, i.e., starting at the beginning and searching through to the end of the video. This was not the case for the standard player, where users were more likely to restart their search from the beginning again and again after reaching the end of the video. These observations agree with the findings of Schoeffmann and Burgstaller [16], who discovered the same behavior. Moreover, when we further compare their results to ours, we see that although both the search performance and the average search duration were slightly higher in our study, the general trend is very similar: people spend significantly less time with the standard player and also find significantly fewer target scenes.

4.2 User Feedback

After completing the user study, participants additionally commented that the ThumbBrowser offered a much smoother seeking interaction and that they would prefer a different way to open and close the filmstrip UI. Furthermore, they told us that additional fast-forward and fast-rewind speeds would have helped, as the current iteration of the ThumbBrowser only supports a fixed seeking speed of two times the normal playback speed. One participant who edits many videos professionally told us that she would love to use the interface for her work.

5 Conclusions

In this paper, we have presented and evaluated an extended version of the ThumbBrowser, a video browsing and video search tool for tablets that is optimized for landscape operation.

As playback controls are often hard to reach when holding the device with both hands at the sides, interface controls explicitly designed for the thumbs can provide a much better user experience. The interface provides a vertical seeker control similar to a timeline on the right-hand side and a radial menu with additional playback functionality on the left-hand side of the screen. Moreover, the extended version of the interface uses an easy-to-understand visualization of the dominant colors across the video for faster navigation to scenes with salient color patterns, and provides means to switch between coarse and detailed browsing modes.

We tested the interface in a user study with 26 participants, in which users had to mark scenes containing predefined objects. The results of our study show that the ThumbBrowser outperformed a traditional standard player by letting users find significantly more target scenes. It was also rated significantly less demanding in terms of mental and physical effort, supported users better in solving the tasks, was more fun and less frustrating, and was rated as easy to use as a standard player (no significant difference).