1 Introduction

Text entry on smartwatches is challenging, primarily due to the small size of their touchscreens. As this kind of device grows in popularity, both industry and academia are carrying out considerable research to improve their input methods. Despite these efforts, there is still a gap in comfort and efficiency between small touchscreens and bigger devices (smartphones, tablets, etc.). More generally, a common problem for touch-based text entry methods is the reluctance of users to learn new keyboard layouts [20].

Since a full keyboard layout does not fit well on a small screen, researchers adopted two common actions already used for maps: zoom and pan. ZoomBoard [15] is a zoom-based soft keyboard: a first touch enlarges a part of the keyboard, and a second touch enters a character from the enlarged area. This additional zooming step, however, increases the time required to select each key. In fact, the recorded average number of keystrokes needed to enter a single character is 2.15, i.e. a user needs more than two interactions per character.

Popular pan-based methods are Splitboard [12] and Driftboard [17]. In the former, the QWERTY layout is split into two (partially overlapping) parts along the vertical axis. The user can visualize only one part at a time and move between them with a swipe gesture. The advantage of this technique is that swipes are infrequent, thus keeping the number of keystrokes per character low. In the latter, the QWERTY layout can be dragged across the display. A cursor (a circle) is fixed on the left side of the display, and the user enters a character by dragging the keyboard until the corresponding key is placed under the cursor.

Different approaches include: DualKey [11], in which each key contains two characters and the selection depends on which finger (index/middle) touches the screen; SwipeBoard [2], an eyes-free text entry method in which two swipes are used to enter each character; WatchWriter [10], where the user can input by tapping on a key or by swiping through keys to write a whole word with a single gesture; and C-QWERTY [3, 4], an adaptation of the classic layout in which the keys are arranged along the edge of a circular screen, supporting both tapping and gesture interaction modes. The Optimal-T9 [16] and T18 [6] soft keyboards are QWERTY-based variations of the original T9 method; other methods exploit handwriting [5].

The purpose of this paper is to introduce and evaluate BubbleBoard, a soft keyboard for smartwatches that improves on zoom-based methods. BubbleBoard uses a QWERTY layout enhanced for small screens: some characters are shown in a big font and can be entered with a single touch, while the others are shown in a small font, and a touch enlarges the pressed character and its neighbors so that they can be entered with a second touch.

In order to evaluate the extent of the improvement over previous zoom-based methods, we compared BubbleBoard with ZoomBoard [15] in a user study. In a preliminary user study, we had compared competing variants of the method in order to choose the best one.

The paper is organized as follows: Sect. 2 describes BubbleBoard and its design choices, Sects. 3 and 4 show its experimental evaluation. Finally, Sect. 5 concludes the paper with a discussion on future work.

2 BubbleBoard

While designing the method, we chose the QWERTY layout as a basis so that users would not be required to learn a new layout. We also decided to show at most four large characters in the first two lines of the keyboard, and at most five in the last line (since it contains only seven characters), so that we could assign to the large characters a width (including left and right padding) equal to one-sixth of the screen. This choice is supported by two factors: firstly, six keys per line are already used in the literature (in SplitBoard [12]); secondly, in a short informal test [6] of layouts with 4–8 keys per line, tapping keys became difficult with 7 or more keys.

We also made the following design choices:

  • small characters should always be grouped in at least two consecutive characters, otherwise their size would be too small to allow easy selection;

  • the first and last characters of each line should always be large: otherwise, if the user tapped on the second (small) character of a line, there would not be enough space to magnify that character (at the exact point of the touch) without making the adjacent one disappear.

Fig. 1. BubbleBoard static layout (left) and example expansion of the f character (right).

Once these constraints were fixed, we then had to choose the best layout, i.e. which characters to show in a large font and which in a small one. To this end, we used English corpora [8, 9] to choose, among all the possible layouts respecting the above constraints, the one minimizing the expected number of touches. Figure 1 shows the selected layout and an example character expansion.
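As a concrete illustration, the following sketch enumerates, for each row, every big/small assignment that satisfies the constraints listed above, and keeps the one minimizing the expected number of touches (one touch for a large key, two for a small one). The toy corpus, function names, and exact cost model are our assumptions for illustration; the paper uses larger English corpora [8, 9].

```python
from itertools import product

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
MAX_BIG = [4, 4, 5]  # max number of large characters per row (Sect. 2)

def valid(mask):
    """mask[i] is True if the i-th character of the row is large."""
    if not (mask[0] and mask[-1]):      # first and last must be large
        return False
    run = 0                             # length of the current run of smalls
    for big in mask:
        if not big:
            run += 1
        else:
            if run == 1:                # small characters must come in
                return False            # groups of at least two
            run = 0
    return True                         # mask ends with a large character

def best_row_layout(row, freqs, max_big):
    """Exhaustively pick the valid big/small assignment with minimum cost."""
    best = None
    for mask in product((True, False), repeat=len(row)):
        if sum(mask) > max_big or not valid(mask):
            continue
        # one touch for a large key, two (zoom + select) for a small one
        cost = sum(freqs.get(c, 0.0) * (1 if big else 2)
                   for c, big in zip(row, mask))
        if best is None or cost < best[0]:
            best = (cost, mask)
    return best

corpus = "the quick brown fox jumps over the lazy dog" * 10  # toy corpus
letters = [c for c in corpus if c.isalpha()]
freqs = {c: letters.count(c) / len(letters) for c in set(letters)}

total = 0.0
for row, max_big in zip(ROWS, MAX_BIG):
    cost, mask = best_row_layout(row, freqs, max_big)
    total += cost
    print(row, "".join("B" if big else "s" for big in mask))
print(f"expected touches per letter: {total:.3f}")
```

Since the constraints are all per-row, each row can be optimized independently, and the search space (at most 2^10 masks per row) is trivially small.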

Although tapping on a small character changes the layout, the initial layout is restored immediately after the character is entered (with a second tap). For this reason we call this variant of the method “the static mode”.

We also designed a dynamic mode, in which, after a character is entered, the user is shown a new layout in which the most likely next keys, based on the written text, are among the big ones. To this end, we built a simple predictive model on English corpora [8, 9], based on the last three typed characters. Although the dynamic mode theoretically has a lower number of keystrokes per character (1.16 versus 1.38 for the static layout), it might be difficult to use, because continuous layout changes while typing can confuse the user.
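The paper does not detail the predictive model; a minimal character n-gram sketch of the idea is shown below (counting, for every three-character context in a corpus, which character follows it). The function names and the behavior for unseen or too-short contexts (an empty candidate list) are our assumptions.

```python
from collections import Counter, defaultdict

def train_ngram(corpus, n=3):
    """For every n-character context, count which character follows it."""
    model = defaultdict(Counter)
    text = "".join(c for c in corpus.lower() if c.isalpha() or c == " ")
    for i in range(len(text) - n):
        model[text[i:i + n]][text[i + n]] += 1
    return model

def likely_next(model, typed, k=4):
    """The k most likely next characters, given the last three typed ones.

    These are the characters a dynamic layout would show in a big font;
    an unseen (or too short) context simply yields an empty list here.
    """
    return [c for c, _ in model[typed[-3:]].most_common(k)]
```

In the actual keyboard, the candidates returned by such a model would be promoted to the big font, still subject to the layout constraints described above.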

Lastly, regarding the space and backspace keys, in addition to dedicated keys in an additional last keyboard row, we also introduced the possibility to replace these keys with a swipe gesture to the right and to the left, respectively.
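The paper does not specify how swipes are distinguished from taps; a minimal classification of a touch trajectory by its end-to-end displacement could look as follows, where the 40-pixel threshold is an illustrative value and not taken from the paper:

```python
def classify_touch(start, end, swipe_threshold=40):
    """Classify a touch trajectory as a key tap or a space/backspace swipe.

    start, end: (x, y) screen coordinates in pixels. The threshold is an
    assumed value; a real implementation would tune it to the display.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    if abs(dx) < swipe_threshold and abs(dy) < swipe_threshold:
        return "tap"                     # handled by the keyboard layout
    if abs(dx) >= abs(dy):
        return "space" if dx > 0 else "backspace"
    return "ignored"                     # vertical swipes are not used
```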

3 First User Study

In order to select the most promising BubbleBoard variant, we conducted a user study to compare the modes (static/dynamic) and the space/backspace entry methods (key/gesture).

3.1 Methods

Participants. For the experiment, we recruited 20 participants (5 female). They were all university students in Computer Science, between 22 and 28 years old (\(M=23.75\), \(SD=1.83\)), who agreed to participate for free. Two of them were left-handed, but all chose to use the right hand in the experiment. All declared medium or high proficiency with English and a high level of experience with smartphones, and most of them declared at least basic experience with smartwatches.

Apparatus. The experiment was conducted on an ASUS ZenWatch 2 equipped with a Snapdragon Wear 2100 Quad Core 1.2 GHz processor and running the Wear OS operating system (see Fig. 2). The device weighs 60 g and has a square display with a 1.63” diagonal and a resolution of 320 \(\times \) 320 pixels.

Fig. 2. The ASUS ZenWatch 2 running the BubbleBoard application (Dynamic Key).

The experimental software was a Wear OS application implementing the four variants of BubbleBoard (Static Key, Static Gesture, Dynamic Key, Dynamic Gesture). At startup, the application asked the participant to choose the desired variant; it then showed the chosen keyboard and the sentence to transcribe. After entering the sentence, the participant could confirm the transcription by performing a long press on the transcribed text at the top of the screen. Once confirmed, the application showed the next sentence (or asked the participant to transcribe the same sentence again if the threshold of 15% of non-corrected errors was exceeded).

Procedure. Before starting the experiment, participants were instructed on the aims and procedures of the experiment; they were then asked to complete a brief survey asking age, gender, dominant hand, level of proficiency with English and with the use of smartphones and smartwatches.

The experiment was conducted in a well-lit laboratory. Participants were asked to wear the device on the non-dominant arm and to perform the tasks while remaining seated, possibly placing the arm on a desk. They had a short practice session to get familiar with the keyboards and the experiment settings before starting the recorded tasks. Participants were given all the recommendations related to the experiment, and in particular to:

  • read and memorize the sentence before starting to transcribe it;

  • balance speed and accuracy while typing;

  • correct mistakes made while entering text. Since the only way to correct errors is by using the backspace key, they were also told to avoid correcting errors noticed only after having already entered other words.

Each participant had to transcribe 7 short sentences for each of the four BubbleBoard variants (the first 2 sentences were used as practice and not recorded). For each participant, the sentences were chosen at random from the set by MacKenzie and Soukoreff [14], which includes English sentences without punctuation or numbers. At the end of each 7-sentence block, participants were allowed to rest for a few minutes.

After completing this phase, participants were asked to fill in a System Usability Scale (SUS) [1] questionnaire for each of the two modes they tried (Static and Dynamic). SUS includes ten statements, alternating between positive and negative, to which respondents specify their level of agreement on a five-point Likert scale. Each SUS questionnaire yields a score between 0 and 100, which was averaged over all participants. Finally, they were asked to fill in a final questionnaire indicating their preferred BubbleBoard mode and giving feedback in open form.
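For reference, the standard SUS scoring just described can be computed as follows (responses are the raw 1–5 Likert answers in questionnaire order):

```python
def sus_score(responses):
    """SUS score from ten 1-5 Likert responses [1].

    Odd-numbered (positive) items contribute (r - 1), even-numbered
    (negative) items contribute (5 - r); the sum is scaled to 0-100.
    """
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5
```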

Design. The experiment was a two-factor within-subjects design. The two factors were the keyboard mode and the space/backspace entry method. The keyboard mode included two levels, Static and Dynamic, while the space/backspace entry method included two levels, Key and Gesture. Our dependent variables were text entry speed, accuracy, and keystrokes per character (KSPC). In particular, the text entry speed was measured in words per minute (wpm) as specified in [13], and the accuracy was measured in terms of total error rate (TER) and non-corrected error rate (NCER) [19]. Moreover, to counterbalance keyboard mode and space/backspace entry method, we arranged the experiments according to the order shown in Table 1.
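A sketch of how these dependent variables are typically computed, following [13] and [19], is given below. The treatment of corrected characters is simplified here (we take the number of backspaced characters as the "incorrect fixed" count IF), since the full metrics require complete keystroke logs.

```python
def wpm(transcribed, seconds):
    """MacKenzie's words per minute: (|T| - 1) / time * 60 / 5 [13]."""
    return (len(transcribed) - 1) / seconds * 60.0 / 5.0

def msd(a, b):
    """Minimum string distance (Levenshtein) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def error_rates(presented, transcribed, fixed):
    """TER and NCER following the error metrics of [19] (simplified).

    fixed (IF) approximates the characters erased with backspace;
    INF are the uncorrected errors left in the transcribed text.
    """
    inf = msd(presented, transcribed)
    c = max(len(presented), len(transcribed)) - inf   # correct characters
    ter = (inf + fixed) / (c + inf + fixed)
    ncer = inf / (c + inf + fixed)
    return ter, ncer
```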

Table 1. Counterbalancing scheme used in the first user study.
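Table 1 itself is not reproduced here. A common way to generate such a counterbalancing order for four conditions is a balanced Latin square, sketched below; this is an illustrative assumption, as the paper's actual ordering is the one shown in the table.

```python
def balanced_latin_square(n):
    """Balanced Latin square for an even number of conditions n.

    Rows are presentation orders: each condition appears once per
    position, and each ordered pair of adjacent conditions occurs
    exactly once across the rows (a Williams design).
    """
    # First row interleaves conditions from both ends: 0, n-1, 1, n-2, ...
    first, lo, hi = [], 0, n - 1
    for j in range(n):
        if j % 2 == 0:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
    # Remaining rows are cyclic shifts of the first one.
    return [[(c + i) % n for c in first] for i in range(n)]
```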

3.2 Results

All participants completed the experiment, which lasted about 25 min per participant. We tested significance using repeated-measures analysis of variance (ANOVA) [7].

Speed. The text entry speeds (in wpm) are shown in Fig. 3. The grand mean was 10.6 wpm. Participants were fastest with Dynamic Key (11.4 wpm), followed by Dynamic Gesture (10.9 wpm), Static Key (10.0 wpm), and Static Gesture (9.8 wpm). This is probably because, in Dynamic mode, participants were successfully helped by the layout predictive model.

The ANOVA showed that the effect of the keyboard mode on speed was statistically significant (\(F_{1, 19}=13.619\), \(p<.005\)). However, there was no significant effect for the space/backspace entry method (\(F_{1,19}=0.659\), ns) or for the interaction between keyboard mode and space/backspace entry method (\(F_{1,19}=0.466\), ns).

Fig. 3. First user study: text entry speed of BubbleBoard. Error bars show the standard deviation.

Accuracy. Average values for TER and NCER are shown in Fig. 4.

For TER the grand mean was 9.2%. Dynamic Gesture was the most accurate variant with a mean TER of 7.0%, followed by Static Gesture (9.3%), Static Key (9.9%), and Dynamic Key (10.5%). This is probably because, in Dynamic mode, participants were more likely to make errors when the system changed the layout. However, the ANOVA showed no statistically significant effect for keyboard mode (\(F_{1, 19}=1.034\), \(p>.05\)), for the space/backspace entry method (\(F_{1, 19}=2.965\), \(p>.05\)), or for their interaction (\(F_{1, 19}=2.532\), \(p>.05\)).

For NCER the grand mean was 0.6%. All variants achieved an NCER below 1%, and also in this case Dynamic Gesture was the most accurate variant, with a mean NCER of 0.4%. The ANOVA showed no statistically significant effect for keyboard mode (\(F_{1, 19}=2.938\), \(p>.05\)), for the space/backspace entry method (\(F_{1, 19}=3.854\), \(p>.05\)), or for their interaction (\(F_{1, 19}=1.998\), \(p>.05\)).

Fig. 4. First user study: total error rate (left) and non-corrected error rate (right). Error bars show the standard deviation.

KSPC. Average values for keystrokes per character are presented in Fig. 5. The grand mean was 1.540. As expected, the Dynamic mode had a lower KSPC (1.419) than the Static mode (1.662).

The ANOVA showed that the effect of the keyboard mode on KSPC was statistically significant (\(F_{1, 19}=77.259\), \(p<.0001\)), while the effect of the space/backspace entry method (\(F_{1, 19}=4.010\), \(p>.05\)) and the interaction between keyboard mode and space/backspace entry method (\(F_{1, 19}=2.461\), \(p>.05\)) were not.

Fig. 5. First user study: keystrokes per character of BubbleBoard. Error bars show the standard deviation.

User Satisfaction and Free-form Comments. As regards user satisfaction, the mean SUS score was 86.38 (\(SD=10.18\)) for the Static mode and 84.75 (\(SD=14.09\)) for the Dynamic mode. A Wilcoxon matched-pairs signed-ranks test [18] performed on the SUS scores revealed no statistically significant difference between the two techniques (\(Z=-0.2831\), \(p>.05\)). This trend was not confirmed by the final questionnaire, in which 13 participants preferred the Dynamic mode and 7 the Static mode.

From the open-feedback questionnaire we noticed that most of the participants who chose the Dynamic mode appreciated it for allowing faster typing. On the other hand, most of the participants who preferred the Static mode appreciated the fact that they could easily remember the size (small/large font) and position of characters on the keyboard. Finally, some participants complained that text correction is only possible through backspace, and asked to be able to freely position the text cursor to make corrections.

3.3 Discussion

Given these results, we can conclude that the dynamic mode showed the most promising performance, with the reduced number of required taps compensating for the higher difficulty and likelihood of errors. The use of gestures for space and backspace, instead, does not seem to bring benefits over the classic keys.

4 Second User Study

After selecting the most promising variant of BubbleBoard (Dynamic Key), we decided to compare it with ZoomBoard [15], an existing text entry method for smartwatches based on key magnification which, as described in Sect. 1, requires two touches for each character entry. To this end, we conducted a second user study, in which we compared BubbleBoard with ZoomBoard over multiple text entry sessions.

Participants. For the experiment, we recruited 18 participants (10 female), different from those of the first experiment. They were mostly students, between 18 and 32 years old (\(M=23.8\), \(SD=4.2\)), who agreed to participate for free. All declared medium or high proficiency with English and a high level of experience with smartphones, but little to no experience with smartwatches.

Apparatus. The experiment was conducted on the same ASUS ZenWatch 2 used in the first experiment.

For BubbleBoard, the same experimental software of the first experiment was used, configured for the Dynamic Key mode and for 5 sentences. For ZoomBoard, an application with similar functionalities by the authors of [12] was used.

Procedure. The procedure was similar to the one of the first experiment.

In this case, however, each participant performed a total of three sessions, in each of which the task was to transcribe 5 short sentences (from the set by MacKenzie and Soukoreff [14]) with each of the two keyboards (BubbleBoard and ZoomBoard). At the end of each session, participants were allowed to rest for a few minutes.

After completing the three sessions, participants were asked to fill in a System Usability Scale (SUS) questionnaire for each of the two keyboards, and to indicate their preferred keyboard (for speed, accuracy, and overall) and their feedback in open form.

Design. The experiment was a two-factor within-subjects design. The two factors were the keyboard (two levels: BubbleBoard and ZoomBoard) and the session. The dependent variables were the same as in the first experiment. Moreover, to counterbalance the two keyboards, we arranged the sessions according to the order shown in Table 2.

Table 2. Counterbalancing scheme used in the second user study.

4.1 Results

All participants completed the experiment, which lasted about one hour per participant. We tested significance using repeated-measures analysis of variance (ANOVA).

Speed. The text entry speeds (in wpm) are shown in Fig. 6. The grand mean was 9.4 wpm. BubbleBoard was the fastest keyboard with a mean of 10.7 wpm, outperforming ZoomBoard at 8.1 wpm. In fact, as can be seen from the figure, BubbleBoard significantly outperforms ZoomBoard in every session. Moreover, for both keyboards there is a slight speed increase between sessions: in the third (last) session, BubbleBoard reached 11.2 wpm and ZoomBoard 8.4 wpm.

The ANOVA showed that the effect of the keyboard on speed was statistically significant (\(F_{1,17}=75.417\), \(p<.0001\)). The effect of the session on speed was also statistically significant (\(F_{2,34}=4.206\), \(p<.05\)), while the interaction between keyboard and session was not (\(F_{2,34}=0.195\), ns).

Fig. 6. Second user study: text entry speed of BubbleBoard and ZoomBoard. Error bars show the standard deviation.

Accuracy. Average values for TER and NCER are shown in Fig. 7.

For TER the grand mean was 6.6%. ZoomBoard was the more accurate keyboard with a mean TER of 4.0%, while BubbleBoard reached 9.3%.

The ANOVA showed that the effect of the keyboard on TER was statistically significant (\(F_{1,17}=26.882\), \(p<.0001\)), while the effect of the session (\(F_{2,34}=2.126\), \(p>.05\)) and the interaction between keyboard and session (\(F_{2,34}=0.661\), ns) were not.

For NCER the grand mean was 1.5%. There was little difference between ZoomBoard (1.7%) and BubbleBoard (1.4%). Indeed, the ANOVA showed no statistically significant effect for keyboard (\(F_{1,17}=1.366\), \(p>.05\)), session (\(F_{2,34}=0.216\), ns), or the interaction between keyboard and session (\(F_{2,34}=0.028\), ns).

Fig. 7. Second user study: total error rate (left) and non-corrected error rate (right). Error bars show the standard deviation.

KSPC. Average values for keystrokes per character are presented in Fig. 8. BubbleBoard had a lower KSPC (1.443) than ZoomBoard (2.168).

The ANOVA showed that the effect of the keyboard on KSPC was statistically significant (\(F_{1,17}=366.325\), \(p<.0001\)), while the effect of the session (\(F_{2,34}=3.200\), \(p>.05\)) and the interaction between keyboard and session (\(F_{2,34}=1.888\), \(p>.05\)) were not.

Fig. 8. Second user study: keystrokes per character of BubbleBoard and ZoomBoard. Error bars show the standard deviation.

User Satisfaction and Free-form Comments. As regards user satisfaction, the mean SUS score was 70.7 (\(SD=12.5\)) for ZoomBoard and 66.1 (\(SD=15.0\)) for BubbleBoard. A Wilcoxon matched-pairs signed-ranks test performed on the SUS scores revealed no statistically significant difference between the two techniques (\(Z=-1.1361\), \(p>.05\)). The final questionnaire gave mixed results: 61.1% of the participants preferred BubbleBoard for speed, 77.8% preferred ZoomBoard for accuracy, and 55.6% preferred BubbleBoard overall. Given the differences in performance between the two keyboards, we expected a stronger preference for BubbleBoard, but we believe that the greater possibility of making errors with BubbleBoard influenced this result.

Through the open-feedback questionnaire, some participants asked to further improve the dynamic layout prediction, while one complained about its presence.

4.2 Discussion

In this second experiment, slower speeds were recorded for BubbleBoard than in the first experiment. This can likely be attributed to the greater experience with smartwatches and technology in general of the first experiment’s participants (all Computer Science students).

Participants made significantly more total errors with BubbleBoard than with ZoomBoard, although in terms of non-corrected errors the difference was not significant. This may indicate a greater difficulty in using BubbleBoard that negatively affects its text entry speed, which nevertheless remains significantly higher than ZoomBoard’s. The introduction of dictionary-based auto-correction could, however, reduce the need for corrections and significantly increase the speed achievable with BubbleBoard, while ZoomBoard would likely benefit less from such functionality given its lower total errors.

5 Conclusions and Further Works

In this paper, we presented BubbleBoard, a new soft keyboard for smartwatches that uses a QWERTY layout in which some characters are shown in a big font and can be entered with a single touch, while the others are shown in a small font, and a touch enlarges the pressed character and its neighbors so that they can be entered with a second touch. The method may use a static or a dynamic layout, and space/backspace can be entered with either keys or gestures; these variants were evaluated in a first user study, in which the dynamic mode with keys for space/backspace emerged as the most promising. The method was then compared to ZoomBoard in a second user study, which showed an advantage of about 32% in text entry speed in favor of BubbleBoard (10.7 wpm versus 8.1 wpm for ZoomBoard).

Future work will focus on integrating auto-correction and prediction capabilities, in order to enable lower error rates and faster speeds, and on further comparisons with other text entry methods for smartwatches, including those not based on magnification.