1 Introduction

Given their convenience and portability, smartphones have become one of the most frequently used consumer electronic devices of daily life. Furthermore, the number of smartphone users worldwide has risen sharply: Global market volume reached 1.96 billion units in 2017 and the global share was forecast to top 80% in 2018, while the estimation of global volume for 2022 is 1.9 billion units [48]. Due to technological advances, the touchscreen has become one of the most popular interfaces on the market [59] since it facilitates work and aids life in the Internet world [6, 50, 54].

Messaging through text entry on smartphones has become an important communication tool around the world [55]. Many users currently prefer to use their smartphone for sharing content and activities such as e-mails [28], blogging, or social networking services [38]. The most widely used input method for text is typing on touchscreen keyboards. To improve the usability of text entry, scholars have proposed many input methods. For example, Zhai and Kristensson [35, 86] proposed Smart Gesture Keyword (SGK), which combined pen-based computing with tapping to strengthen efficiency. Although many innovative text input methods have been proposed for touchscreens, the standard QWERTY virtual keyboard still appears to be the most popular, since users are already familiar with it. T9 emerged as an attempt to move away from QWERTY toward a layout that is statistically more efficient [13]. According to iiMedia Research data, more than 60% of smartphone users in China prefer to use pinyin input for text entry on smartphones. Among these, the proportion of users who prefer T9 is 32.5%, while QWERTY users account for 29.3% [25].

Considering the different key sizes of different text input methods, users prefer different input modes and may change finger usage depending on the use situation. An observational study of texters (n = 859) showed that the most frequently used styles were holding the device with both hands and using two-thumb text entry (46.1%) and one-thumb text entry when the same hand holds the device (44.1%) [18]. Since both one-thumb and two-thumb text entry on a touchscreen are increasingly significant tasks in the daily lives of smartphone users, user experience (UX) and improvement of text entry are important to increase user efficiency [2, 67].

Previous studies have reported on thumb gestures and related muscle activity during the use of mobile devices. These studies have shown that apparatus design affects user performance and musculoskeletal tension [27, 71]. The ever-increasing demand of users has led to the continuous upgrading of smartphones, and the size of smartphones has also become larger. Smartphone size is one of the factors that must be considered from product design to consumer choice, and has a profound impact on users. A study indicated that proper font and icon sizes on a smartphone can improve the efficiency of text entry [24]. On small touchscreens, typing was prone to errors caused by key distribution, while on large displays, typing on a large keyboard may become faster [87]. From an ergonomic point of view, too large a screen display and smartphone size may cause several difficulties in using the device. Screen size also affects display resolution, which influences the user’s visual perception.

Most research on touchscreens has made progress toward a better understanding of the speed and aggregated error rate of users [34, 36]. The present study focused on UX during text input on a smartphone as well as the usability of the device. Most studies investigate a single aspect such as performance or subjectivity [39, 56, 70]. It is even possible to identify indirect signs of emotional processing by identifying changes in the activity of anatomical structures related to human emotional processing [3, 25]. In the context of the present study, subjective measures are self-reported qualitative measures, and objective evaluation refers to the collection of indirect measures of user experience that often reveal reactions that were not consciously available to the user (such as variations in performance, physiological reaction, and emotional reaction) [49]. The present study combined performance, physiological reaction, and subjective measures.

In summary, text input method, gesture, and smartphone size are three critical factors that affect users when typing text messages. It is important to investigate how these factors influence users’ performance, how people perceive their smartphone use, and the nature of their subjective experience. The UX measured in this study may be affected by two factors: thumb gesture and smartphone size. Motivated by this, our aim is to determine the relationship between UX during smartphone text input and the factors (gesture and smartphone size) affecting it.

2 Related works

2.1 Text input method

The most widely accepted and commonly used text input method is QWERTY [85]. The layout of a QWERTY keyboard is very similar to the traditional QWERTY keyboards used for personal computers; therefore, users are often more familiar with it. QWERTY still stands as the default keyboard layout in contemporary mobile touch-screen devices [16]. However, the button size on a QWERTY keyboard is relatively small since more than 30 buttons are accessed through a very small space. Therefore, using a QWERTY keyboard on small smartphone displays is prone to cause unintentional errors [29]. The T9 keyboard consists of fewer buttons (n = 12), where two or more letters are assigned to each button. Consequently, T9 uses relatively large buttons, making them comparatively easy to find and press. T9 also tolerates imprecision and automatically corrects the user’s errors [2]. However, using a T9 keyboard is often inefficient because users are not familiar with the arrangement of letters and multiple taps are required to toggle between letters [82]. MacKenzie, Zhang, and Soukoreff [46] reported that the data entry rate using a T9 keyboard is not as good as the rate when using a QWERTY keyboard. Niu et al. [53] pointed out that QWERTY was originally designed for Latin alphabets. Li et al. [83] reported that there was no significant difference between QWERTY and T9 when inputting phrases with only English or Chinese characters.

2.2 Input gestures

During one-hand operation, the operating finger has to cover a considerable distance to reach the keys on the edge of the keyboard. Supporting the smartphone while typing, that is, sharing the task requirements of holding and typing with a single hand, is obviously much more difficult than using a two-handed grip [69]. Trudeau et al. [69] showed that holding a smartphone with two hands can improve usability. Holding a smartphone with two hands may affect the motor performance of the thumb and joint postures, compared with tapping the touchscreen with the thumb of the holding hand. Azenkot and Zhai [2] reported that the patterns between input with two thumbs and one thumb differ distinctly, suggesting that keyboard design should adapt to these diverse gestures. Ljubic, Glavinic, and Kukec [44] concluded that two-thumb text entry in landscape orientation allowed higher input rates, as opposed to low efficiency when typing was performed with one finger in portrait mode. Most of the research on smartphone gestures used input tasks with a single target. However, continuous text entry imposes higher cognitive and motor requirements.

2.3 Smartphone size

A number of studies have investigated keyboard size [30, 33], button size [56], and button spacing [10, 12]. All these variables depend on the size of the smartphone. Larger devices naturally offer layouts with more convenient key button sizes, but this comes with increased distance between buttons. Therefore, screen size has a certain impact on usability [44]. In fact, in 2014, the average screen size of a smartphone was 4.86 inches [74] and increased to 5.5 inches by 2018 [15]. For adults, 4.7 inches was found to be the most popular smartphone size, even though users subjectively preferred to use 5-inch smartphones [68]. A survey showed that 5.5-inch smartphones are still most widely used in China [63]. Tsai et al. [68] reported that button size significantly affects the accuracy rates and task-completion times; moreover, larger screens outperformed smaller screens, even though they require more than twice the amount of finger movement on the screen. Kietrys et al. [32] studied the effects of the input device type (physical keypad and touchscreen), texting style, and screen size on muscle activities, confirming that muscle activity increased with increasing screen size. Zhou, Rau, and Salvendy [77] showed that participants’ typing performance was not significantly better on 5-inch screens than on 3.5-inch screens. They also indicated that the display size has to be sufficiently large to make a significant difference. Al-Suleyman et al. [1] evaluated the usability of smartphone interfaces for different age groups using smartphone and tablet devices with 3.2-, 7-, and 10.1-inch screens. They found that smaller screens were associated with a smaller proportion of saccades. However, the relationship among these variables, and whether larger or smaller sizes help to improve UX, still remains unknown.

2.4 User experience

The galvanic skin response (GSR) is used as a measure to index variations in sympathetic arousal together with emotion, cognition, and attention [14]. Glands in human skin produce ionic sweat, thus causing alterations of electric conductivity. Changes in the electrical resistance of the skin may indicate increasing emotional activation or cognitive effort [49]. GSR in a Web browser with poor design is significantly higher than that in a well-designed Web browser [5]. Heart rate (HR) is the index of the speed of the heartbeat, typically measured in beats per minute (bpm) [17]. Studies have shown that HR is correlated with the pleasantness of affective stimuli [58], stress [51], and mental load [83]. The surface electromyography (sEMG) signal, which is closely related to muscle activity [25], provides complex interference patterns of the electrical activity during muscle contraction [52]. To measure the subjective experience, we selected the system usability scale (SUS) to measure user satisfaction and used the NASA task load index (NASA-TLX) to measure users’ subjective task load. SUS provides a measure of the perceived usability of a system with a small sample and quickly assesses how people perceive the usability of the systems on which they are working [6]. The NASA-TLX [21] is a widely used, subjective, multidimensional assessment tool that rates perceived workload. It can be used as an effective tool for evaluating mental workload [11].

3 Method

3.1 Participants

The first experiment recruited 24 participants (16 females and 8 males) with an average age of 19.68 years (age range 17–23, SD = 1.56) from Shaanxi Normal University, China. All participants were experts with regard to text input and sent a mean of 234.635 messages per week. Twelve participants had high QWERTY experience, and the other 12 participants had high T9 experience. Half of the QWERTY expert users had to use QWERTY, while the second half used T9 (random assignment) and vice versa for T9 expert users. To balance the sequence of tasks, half of the participants in each group used one-thumb text entry first, followed by two-thumb text entry, while the other half of the participants used the reverse order.

In the second experiment, 24 participants (16 females and 8 males) with an average age of 19.59 years (age range 18–22, SD = 1.44) were recruited from Shaanxi Normal University, China. All participants had to use all smartphone sizes. All participants were expert text input users, since the mean number of messages per week they sent was 196.845. All participants usually used two-thumb text entry. To balance the sequence of tasks, the Latin square design method was used to assign the order in which smartphone sizes were presented to the participants. The orders of using the different smartphone sizes (in inches) were 4.7, 5, 5.5; 4.7, 5.5, 5; 5, 4.7, 5.5; 5, 5.5, 4.7; 5.5, 5, 4.7; and 5.5, 4.7, 5. Each order was assigned to four participants.
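The six orders listed above are all permutations of the three screen sizes, with four participants per order. As a minimal sketch of how such an assignment could be generated (the participant indices and the cyclic assignment rule are assumptions made for illustration and do not describe the authors' actual procedure):

```python
from itertools import permutations

# Screen sizes (inches) used in experiment 2.
SIZES = (4.7, 5, 5.5)


def assign_orders(n_participants=24, sizes=SIZES):
    """Map each participant index to one presentation order.

    All permutations of the sizes are cycled through so that every order
    is used equally often. The cyclic rule is a hypothetical choice.
    """
    orders = list(permutations(sizes))  # 3! = 6 possible orders
    if n_participants % len(orders) != 0:
        raise ValueError("participants must divide evenly across orders")
    return {p: orders[p % len(orders)] for p in range(n_participants)}


assignment = assign_orders()
```

With 24 participants and six orders, each order is used exactly four times, which matches the allocation described in the text.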

All 48 participants in both experiments were right-handed and had a computer use experience of more than three years. Right-handed individuals were chosen to facilitate comparison because most studies that tested smartphone usability choose right-handed participants [64, 78]. None of the participants had ever participated in similar experiments before and none of them reported any major current finger, hand, or wrist problems.

3.2 Experimental design

An online questionnaire on the use of smartphone keyboards was compiled to better understand the habits of smartphone users. A total of 1059 questionnaires were collected, including 465 from male users (43.91%) and 594 from female users (56.09%). The results of the survey showed that 396 (37.39%) users used one-thumb text entry and 510 (48.16%) used two-thumb text entry. The number of users who normally use a 5.5-inch smartphone was 519 (49.01%), the number of those who use a 4.7-inch smartphone was 225 (21.25%), and the number of those who use a 5-inch smartphone was 198 (18.7%).

The first experiment used two text input methods (QWERTY/T9) × two gestures (one-thumb text entry/two-thumb text entry) mixed design. Each participant used one of the smartphone text input methods with both gestures. Gesture was a within-subject variable, while input method was a between-subject variable. In the second experiment, a two text input methods (QWERTY/T9) × three smartphone sizes (4.7/5/5.5 inches) mixed design was used. Smartphone size was defined as within-subject variable, while input method was defined as between-subject variable.

Three different dependent variables were assessed to evaluate UX. Entry speed and error rates were collected as performance measures. Words per minute (WPM) was used as a representation of the time required for producing a specific number of words. The input methods were compared at the character level, using the total error rate. Error rate was defined as the percentage of errors in a single attempt. GSR, HR, and sEMG were measured. Subjective measures were collected based on system usability scale (SUS) [6] and NASA task load index (NASA-TLX) questionnaires [21].
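The formulas behind the two performance measures are not spelled out here. A minimal sketch, assuming the definitions conventionally used in text-entry research (one word equals five characters, and the total error rate is computed from counts of correct, incorrect-not-fixed, and incorrect-fixed characters), is given below. The function and parameter names are illustrative and are not taken from TEMA.

```python
def words_per_minute(transcribed: str, seconds: float) -> float:
    """Entry speed, assuming the common convention of 5 characters per word.

    One character is subtracted because timing is assumed to start when
    the first character is entered.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (len(transcribed) - 1) / seconds * 60.0 / 5.0


def total_error_rate(correct: int, incorrect_not_fixed: int,
                     incorrect_fixed: int) -> float:
    """Character-level total error rate in percent (assumed definition):
    (INF + IF) / (C + INF + IF) * 100.
    """
    total = correct + incorrect_not_fixed + incorrect_fixed
    if total == 0:
        raise ValueError("no characters were entered")
    return (incorrect_not_fixed + incorrect_fixed) / total * 100.0
```

For example, a 26-character phrase entered in 10 s corresponds to 30 WPM under this convention.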

3.3 Materials and apparatus

The smartphone used in experiment 1 was a Redmi 2 (Android 4.4, MSM8916, 1280 × 720 pixels). In experiment 2, participants held a vivo X9 (Android 6.0, MSM8953, 1920 × 1080 pixels), a Meilan 3S (Android 5.1, MTK MT6750, 1280 × 720 pixels), and a Redmi 2 (Android 4.4, MSM8916, 1280 × 720 pixels), which measured 6.0 × 2.9 × .28 inches and weighed 5.5 oz, 5.6 × 2.8 × .33 inches and weighed 4.87 oz, and 5.3 × 2.6 × .37 inches and weighed 11.9 oz, respectively.

For consistency, both input methods were evaluated in portrait mode. The input method used in experiment 1 and experiment 2 was Baidu. The auto-correct setting and the function of space bar confirmation were unavailable for participants. All participants completed the study by typing phrases on TEMA [7], an application that displayed a series of phrases and a keyboard. This was used to present phrases at random and collect performance data for both input methods. Twenty phrases were randomly selected from a list of 500 phrases composed by MacKenzie et al. [47]. All phrases had neither punctuation symbols nor numbers and were lowercase to enable participants to fully focus on each keyboard technique. Recording for each phrase began when the participant’s finger touched the screen and ended when the final word or character had appeared on the screen.

A portable Bio-Radio wireless physiological detector was used to record and display physiological reactions in real time, which can comprehensively detect physiological data and transfer these data via Bluetooth transmission [22]. The sampling rate was 1 kHz. To avoid noise interference, the sensor and its cables were tightly affixed to the participants’ body using tape [22]. A grounding electrode was fixed on the bone of the wrist on the left hand to ensure electrode stability [4]. More details for this device are shown in Fig. 1. The GSR was measured using two Ag–AgCl sensors with a 20-mm inter-electrode spacing on the non-thumb side of the palm on the left hand. HR was measured on the end segment of the left index finger, utilizing two types of LEDs to measure the absorption of light within the finger to determine the pulse [65, 66]. sEMG was recorded using two Ag–AgCl sensors with a 20-mm inter-electrode spacing on the abductor pollicis brevis (APB), placed midway between the volar aspect of the first metacarpophalangeal joint and the first carpometacarpal joint of the right hand [57, 62]. DataWave SciWorks (DataWave Technologies, Parsippany, NJ, USA) provided the data playback functionality associated with real-time experimentation, while providing the ability to analyze existing data from various software applications.

Fig. 1 Details of the wearable device

A SUS questionnaire (consisting of ten items), which was first described by Brooke [6], was used to investigate users’ perceived usability. Participants responded to these items using a five-point Likert scale, ranging from “strongly disagree” to “strongly agree.” Scoring of the SUS yielded a composite score between 0 and 100, where a higher score indicated higher perceptions of usability.
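The composite score can be illustrated with a short sketch that assumes the standard SUS scoring rule: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5. The function name is illustrative.

```python
def sus_score(responses):
    """Composite SUS score (0-100) from ten responses on a 1-5 scale.

    Responses are given in item order (item 1 first).
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("responses must be between 1 and 5")
    total = 0
    for item, response in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5
```

A participant who answers 3 ("neutral") on every item receives a score of 50 under this rule.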

To assess workload, NASA-TLX, as proposed by Hart and Staveland [21], was used to measure the subjective workload. The NASA-TLX consists of two parts: ratings and weights. This commonly used rating scale is based on the following six independent subscales: mental demand, physical demand, temporal demand, performance, effort, and frustration. The questionnaire consisted of six Likert-scale items, ranging from 0 to 100 in increments of 5. For the weights, the participants were presented with pairs of subscales and asked to choose, for each pair, the one they felt contributed more to the workload of the task. The weights are calculated as the number of times each subscale is chosen across the 15 pairwise combinations of the six subscales. The weights range from 0 to 5. To obtain the NASA-TLX workload score, the weight and rating for each subscale are multiplied, and the products are summed and divided by 15 (the number of pairwise comparisons).
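The weighted score described above can be sketched as follows. The ratings and weights in the example are invented for illustration only and are not data from this study.

```python
SUBSCALES = ("mental demand", "physical demand", "temporal demand",
             "performance", "effort", "frustration")


def nasa_tlx_score(ratings, weights):
    """Weighted NASA-TLX score: sum(weight * rating) / 15.

    ratings: subscale -> rating on a 0-100 scale
    weights: subscale -> number of times the subscale was chosen (0-5)
    """
    if set(ratings) != set(SUBSCALES) or set(weights) != set(SUBSCALES):
        raise ValueError("ratings and weights must cover all six subscales")
    if sum(weights.values()) != 15:
        raise ValueError("weights must sum to 15 pairwise choices")
    return sum(weights[s] * ratings[s] for s in SUBSCALES) / 15


# Hypothetical participant, for illustration only.
example_ratings = dict(zip(SUBSCALES, (60, 40, 50, 30, 55, 20)))
example_weights = dict(zip(SUBSCALES, (5, 1, 3, 2, 4, 0)))
example_score = nasa_tlx_score(example_ratings, example_weights)
```

Because the weights always sum to 15, the result stays on the same 0 to 100 scale as the individual ratings.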

3.4 Procedure

The experiment was conducted in a quiet room. After providing written informed consent, participants completed the pre-established questionnaire regarding their smartphone usage and demographic information. Then, each participant received a brief description of the input method on the test smartphone, especially with regard to functional keys. In addition, each experimental block contained a short practice session in which participants could familiarize themselves with the smartphone.

All participants sat comfortably in an armless chair (the height of which was adjustable to match various body heights) in front of a desk 70 cm in height. Participants placed the tested right arm on the desk in a posture and position that provided them with acceptable comfort, so that their arms and wrists were fully supported. This was necessary to ensure that the participants could concentrate on the experimental tasks during the experiment. During the experiment, they also received an introduction on how to operate the TEMA software.

They were asked to remove all metallic objects such as watches, jewelry, and smartphones from their bodies, to avoid interference with the signal received by the physiological instrument. The participants were asked to clean their skin where the electrode would be attached using an alcohol swab. After the alcohol had dried, the cloth electrodes were removed from their backing and were applied to the designated spots of the palm and finger. Then, the participants rested on a chair to relax until they felt comfortable with the experimental environment.

The participants were connected to the Bio-Radio and were asked to take a deep breath and calm down. Then, baseline data for each physiological channel of each participant were taken from the middle of a 120-s recording; the first 16 s and final 16 s were excluded from analyses because of an expected increase in movement artifacts [66]. This interval served as baseline. Then, the experiment began and the start time was marked. The participants had to enter a total of 20 phrases in random sequence. They were instructed to input the text as quickly and accurately as possible (so that the participants felt sufficiently comfortable to send the respective message to someone else) [20]. Moreover, participants were allowed to correct potential mistakes.
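As a rough sketch of the baseline interval, assuming a 120-s rest recording sampled at the 1 kHz rate reported in Section 3.3 and stored as a plain sequence of samples (the data layout and function name are assumptions):

```python
SAMPLING_RATE_HZ = 1000  # 1 kHz, as reported for the Bio-Radio recordings


def baseline_window(samples, rate_hz=SAMPLING_RATE_HZ, trim_s=16):
    """Return the samples left after dropping the first and last trim_s seconds."""
    trim = trim_s * rate_hz
    if len(samples) <= 2 * trim:
        raise ValueError("recording is too short for the requested trim")
    return samples[trim:len(samples) - trim]
```

Under these assumptions, a 120-s recording leaves an 88-s baseline interval.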

After the trial runs, participants completed the SUS and NASA-TLX questionnaires and all comments were recorded by the experimenter. After each experiment, the smartphone input cache was cleared. After the questionnaire, the participants were asked to rest for 1 min before being subjected to the measurements for another gesture task. The test lasted for about 40 min, including initial preparation and breaks. The participants received a compensation of 10 RMB.

3.5 Data analysis

Performance data were recorded by the TEMA software [7, 8]. Each phrase was marked to ensure the accuracy of each trial. The physiological reaction was measured by the Bio-Radio, and noise was eliminated automatically. DataWave SciWorks was used to analyze the data. A 51–400 Hz band-pass filter was employed for the GSR and sEMG signals, while a 12–100 Hz band-pass filter was applied to the HR signal. The data were normalized to minimize the influence of differences in physiological indicators between participants [22, 88, 89]. GSR and HR data were computed, and the mean for each epoch was calculated and normalized by subtracting the mean of the participant’s data for the entire dataset (the base data). For sEMG, the data were calculated after performing low-frequency digital filtering, and the absolute value was calculated. SUS scores were converted to percentile ranks for normalization, with level 1 assigned to the minimum.
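The normalization step can be sketched as follows, assuming that the base data are available as a list of samples per participant and that filtering has already been applied. The function names are illustrative, and the sketch does not reproduce the DataWave SciWorks processing.

```python
from statistics import fmean


def normalize_epochs(epochs, base_data):
    """Mean of each epoch minus the mean of the participant's base data."""
    base_mean = fmean(base_data)
    return [fmean(epoch) - base_mean for epoch in epochs]


def rectified_mean(semg_samples):
    """Mean absolute value of an already filtered sEMG segment."""
    return fmean([abs(x) for x in semg_samples])
```

Subtracting a per-participant reference level in this way removes stable individual differences, so that values can be compared across participants.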

All statistical analyses were performed using SPSS 20.0. Arcsine square root transformations were applied to all error-rate distributions. The data were analyzed using mixed-design ANOVA. The level of significance was set to p < .05 (two-tailed). Partial eta squared (ηp2) was used to estimate the effect sizes of all ANOVA tests. Sphericity was examined with Mauchly’s test. If the assumption of sphericity was violated, the degrees of freedom were corrected using the Greenhouse–Geisser correction. The Shapiro–Wilk test was used to examine whether the data were normally distributed. When a main effect or interaction was found to be significant, post hoc comparisons were performed using Bonferroni correction. Error bars in figures represent the standard error of the mean.
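Two of the computations named above can be illustrated briefly. The sketch assumes that error rates are expressed as proportions between 0 and 1 before transformation, and it uses the standard identity that recovers partial eta squared from an F value and its degrees of freedom. It is not the SPSS implementation.

```python
import math


def arcsine_sqrt(proportion):
    """Arcsine square root transformation of a proportion in [0, 1]."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared: F * df_effect / (F * df_effect + df_error)."""
    return f_value * df_effect / (f_value * df_effect + df_error)
```

For instance, the main effect of input method on WPM in experiment 1, F(1, 22) = 27.947, gives `partial_eta_squared(27.947, 1, 22)` of approximately .560, which agrees with the reported ηp2.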

4 Results and analysis

4.1 Experiment 1: input method and gestures

4.1.1 Performance

Table 1 shows a summary of descriptive statistics.

Table 1 Descriptive statistics summary for performance

4.1.1.1 Words per minute

At a significance level of α = .05, the data can be considered normally distributed (two-thumb text entry using QWERTY: W = .924, p = .317; one-thumb text entry using QWERTY: W = .929, p = .373; two-thumb text entry using T9: W = .929, p = .373; one-thumb text entry using T9: W = .970, p = .907). The results of mixed ANOVA showed a significant main effect of input method, F(1, 22) = 27.947, p < .001, ηp2 = .560, indicating that QWERTY achieved a higher WPM than T9. A significant main effect of gesture, F(1, 22) = 37.635, p < .001, ηp2 = .631, was also found, indicating that users who used two-thumb text entry had a higher WPM than those who used one-thumb text entry. Furthermore, a significant interaction was found between input method and gesture, F(1, 22) = 4.484, p = .046, ηp2 = .169 (Fig. 2).

Fig. 2 Interaction effect of input method and gesture for words per minute (WPM)

The results of simple effect analysis showed that two-thumb text entry yielded a higher WPM than one-thumb text entry both via QWERTY and via T9, F(1, 22) = 34.050, p < .001, ηp2 = .607 and F(1, 22) = 8.069, p = .010, ηp2 = .268, respectively. QWERTY yielded a higher WPM than T9 with both two-thumb and one-thumb text entry, F(1, 22) = 36.457, p < .001, ηp2 = .624 and F(1, 22) = 18.598, p < .001, ηp2 = .458, respectively.

4.1.1.2 Total error rate

At a significance level of α = .05, the data can be considered normally distributed (two-thumb text entry using QWERTY: W = .884, p = .100; one-thumb text entry using QWERTY: W = .898, p = .150; two-thumb text entry using T9: W = .917, p = .260; one-thumb text entry using T9: W = .942, p = .521). The results of mixed ANOVA showed a significant main effect of gesture (F(1, 22) = 4.501, p = .045, ηp2 = .170), indicating that users who used one-thumb text entry had a lower average total error rate than those who used two-thumb text entry.

4.1.2 Physiological reaction

Table 2 shows a summary of descriptive statistics.

Table 2 Descriptive statistics summary for physiological reaction

4.1.2.1 Galvanic skin response

At a significance level of α = .05, the data can be considered normally distributed (two-thumb text entry using QWERTY: W = .983, p = .992; one-thumb text entry using QWERTY: W = .906, p = .191; two-thumb text entry using T9: W = .979, p = .979; one-thumb text entry using T9: W = .958, p = .760). The results of mixed ANOVA showed a significant main effect of input method (F(1, 22) = 4.431, p = .047, ηp2 = .168), indicating that users who used QWERTY had a lower GSR than users who used T9. Moreover, a significant interaction was found between input method and gesture (F(1, 22) = 6.308, p = .020, ηp2 = .223) (Fig. 3).

Fig. 3 Interaction effect of input method and gesture for galvanic skin response (GSR)

The results of further simple effect analysis showed that users who used two-thumb text entry achieved a lower GSR via QWERTY than via T9, F(1, 22) = 7.270, p = .013, ηp2 = .248. Users who used T9 had a lower GSR with one-thumb text entry than with two-thumb text entry, F(1, 22) = 4.505, p = .045, ηp2 = .170.

4.1.2.2 Heart rate

At a significance level of α = .05, the data can be considered normally distributed (two-thumb text entry using QWERTY: W = .902, p = .166; one-thumb text entry using QWERTY: W = .914, p = .244; two-thumb text entry using T9: W = .896, p = .143; one-thumb text entry using T9: W = .968, p = .886). The results of mixed ANOVA showed a significant interaction between input method and gesture, F(1, 22) = 6.952, p = .015, ηp2 = .240 (Fig. 4). The results of further simple effect analysis showed that users who used one-thumb text entry via T9 had a lower HR than those who did so via QWERTY, F(1, 22) = 4.908, p = .037, ηp2 = .182. Moreover, users who used QWERTY with two-thumb text entry had a lower HR compared with those who used one-thumb text entry, F(1, 22) = 7.112, p = .014, ηp2 = .244.

Fig. 4 Interaction effect of input method and gesture for heart rate (HR)

4.1.2.3 Surface electromyography

At a significance level of α = .05, the data can be considered normally distributed (two-thumb text entry using QWERTY: W = .863, p = .054; one-thumb text entry using QWERTY: W = .912, p = .223; two-thumb text entry using T9: W = .187, p = .200; one-thumb text entry using T9: W = .241, p = .052). The results of mixed ANOVA showed a significant main effect of gesture, F(1, 22) = 13.532, p = .001, ηp2 = .381, indicating that users who used two-thumb text entry had a lower sEMG than those who used one-thumb text entry. Furthermore, a significant interaction was found between input method and gesture, F(1, 22) = 4.370, p = .048, ηp2 = .166 (Fig. 5). The results of further simple effect analysis showed that users who used QWERTY achieved a lower sEMG with two-thumb text entry than with one-thumb text entry, F(1, 22) = 16.641, p < .001, ηp2 = .431.

Fig. 5 Interaction effect of input method and gesture for surface electromyography (sEMG)

4.1.3 Subjective measure

Table 3 shows a summary of descriptive statistics.

Table 3 Descriptive statistics summary for subjective measure

4.1.3.1 System usability scale

At a significance level of α = .05, the data can be considered normally distributed (two-thumb text entry using QWERTY: W = .891, p = .123; one-thumb text entry using QWERTY: W = .927, p = .349; two-thumb text entry using T9: W = .892, p = .124; one-thumb text entry using T9: W = .894, p = .134). The results of mixed ANOVA showed a significant interaction between input method and gesture, F(1, 22) = 29.053, p < .001, ηp2 = .542 (Fig. 6). The results of further simple effect analysis showed that users who used two-thumb text entry achieved higher SUS scores via QWERTY than via T9 (F(1, 22) = 10.426, p = .004, ηp2 = .322). However, users who used one-thumb text entry achieved higher SUS scores via T9 than via QWERTY (F(1, 22) = 5.713, p = .026, ηp2 = .206). Users who used QWERTY achieved higher SUS scores with two-thumb text entry than with one-thumb text entry (F(1, 22) = 77.254, p < .001, ηp2 = .778), and users who used T9 achieved higher SUS scores with one-thumb text entry than with two-thumb text entry (F(1, 22) = 11.355, p = .003, ηp2 = .340).

Fig. 6 Interaction effect of input method and gesture for system usability scale (SUS)

4.1.3.2 NASA task load index

At a significance level of α = .05, the data can be considered normally distributed (two-thumb text entry using QWERTY: W = .870, p = .066; one-thumb text entry using QWERTY: W = .897, p = .146; two-thumb text entry using T9: W = .971, p = .919; one-thumb text entry using T9: W = .971, p = .919). The results of mixed ANOVA showed a significant main effect of input method (F(1, 22) = 15.773, p = .001, ηp2 = .418), indicating that users who used QWERTY achieved lower NASA-TLX scores than those who used T9. Moreover, a significant interaction was found between input method and gesture, F(1, 22) = 371.800, p < .001, ηp2 = .944 (Fig. 7).

Fig. 7 Interaction effect of input method and gesture for NASA task load index (NASA-TLX)

The results of further simple effect analysis showed that users who used two-thumb text entry achieved lower NASA-TLX scores via QWERTY than via T9 (F(1, 22) = 35.150, p < .001, ηp2 = .615). Users who used QWERTY achieved lower NASA-TLX scores with two-thumb text entry than with one-thumb text entry (F(1, 22) = 215.600, p < .001, ηp2 = .907). Users who used T9 achieved lower NASA-TLX scores with one-thumb text entry than with two-thumb text entry (F(1, 22) = 158.400, p < .001, ηp2 = .878).

4.2 Experiment 2: input method and smartphone size

4.2.1 Performance

Table 4 shows a summary of descriptive statistics.

Table 4 Descriptive statistics summary for performance

4.2.1.1 Words per minute

At a significance level of α = .05, the data can be considered normally distributed (using QWERTY with a 4.7-inch smartphone: W = .922, p = .307; using QWERTY with a 5-inch smartphone: W = .957, p = .737; using QWERTY with a 5.5-inch smartphone: W = .915, p = .249; using T9 with a 4.7-inch smartphone: W = .941, p = .517; using T9 with a 5-inch smartphone: W = .900, p = .158; using T9 with a 5.5-inch smartphone: W = .863, p = .053). The assumption of sphericity was violated (p = .029), so the degrees of freedom were corrected using the Greenhouse–Geisser correction. The results of mixed ANOVA showed a significant main effect of input method (F(1, 22) = 66.990, p < .001, ηp2 = .753), indicating that QWERTY had a higher WPM than T9. A significant main effect of smartphone size was also found (F(1.56, 34.23) = 4.105, p = .034, ηp2 = .157), indicating that users who used the 5-inch smartphone had a higher WPM than users who used the 4.7-inch smartphone (p = .039). Moreover, a significant interaction was found between input method and smartphone size (F(1.56, 34.23) = 3.770, p = .043, ηp2 = .146) (Fig. 8).

Fig. 8 Interaction effect of input method and smartphone size for WPM

The results of the simple effect analysis showed a significant difference among users who used QWERTY with smartphones of different sizes (F(1, 22) = 4.679, p = .021, ηp2 = .308). Users who used QWERTY with a 5.5-inch smartphone achieved a significantly higher WPM than users who used a 4.7-inch smartphone (p = .019). The simple effect analysis also showed a significant difference among users who used T9 with smartphones of different sizes (F(1, 22) = 3.486, p = .049, ηp2 = .349). Users who used T9 with a 5-inch smartphone achieved a significantly higher WPM than users who used a 5.5-inch smartphone (p = .047).

4.2.1.2 Total error rate

At a significance level of α = .05, the data can be considered normally distributed (using QWERTY with a 4.7-inch smartphone: W = .950, p = .630; using QWERTY with a 5-inch smartphone: W = .922, p = .304; using QWERTY with a 5.5-inch smartphone: W = .892, p = .124; using T9 with a 4.7-inch smartphone: W = .923, p = .310; using T9 with a 5-inch smartphone: W = .960, p = .790; using T9 with a 5.5-inch smartphone: W = .906, p = .191). The assumption of sphericity was met (p = .206). The results of the mixed ANOVA showed a significant main effect of smartphone size (F(1, 22) = 7.004, p = .015, ηp2 = .241), indicating that users who used a 5.5-inch smartphone had a lower average total error rate than users who used a 4.7-inch smartphone (p = .044).

4.2.2 Physiological reaction

Table 5 shows a summary of descriptive statistics.

Table 5 Descriptive statistics summary for physiological reaction

4.2.2.1 Galvanic skin response

At a significance level of α = .05, the data can be considered normally distributed (using QWERTY with a 4.7-inch smartphone: W = .917, p = .261; using QWERTY with a 5-inch smartphone: W = .930, p = .377; using QWERTY with a 5.5-inch smartphone: W = .935, p = .432; using T9 with a 4.7-inch smartphone: W = .920, p = .287; using T9 with a 5-inch smartphone: W = .965, p = .854; using T9 with a 5.5-inch smartphone: W = .945, p = .559). The assumption of sphericity was met (p = .934). The results of the mixed ANOVA showed a significant main effect of input method (F(1, 22) = 7.591, p = .012, ηp2 = .257), indicating that users who used QWERTY had a lower GSR than those who used T9. A significant main effect of smartphone size was also found (F(1, 22) = 13.711, p < .001, ηp2 = .384), indicating that users who used a 5-inch smartphone had a lower GSR than users who used a 4.7-inch smartphone (p = .001) or a 5.5-inch smartphone (p < .001). A significant interaction was also found between input method and smartphone size (F(1, 22) = 4.045, p = .024, ηp2 = .155) (Fig. 9).

Fig. 9 Interaction effect of input method and smartphone size for GSR

The results of the simple effect analysis showed a significant difference among users who used QWERTY with smartphones of different sizes (F(1, 22) = 9.487, p = .001, ηp2 = .475). Users who used QWERTY with a 5-inch smartphone achieved a significantly lower GSR than those who used a 4.7-inch smartphone (p = .001). The simple effect analysis also showed a significant difference among users who used T9 with smartphones of different sizes (F(1, 22) = 7.728, p = .003, ηp2 = .424). Users who used T9 with a 5-inch smartphone achieved a significantly lower GSR than users who used a 5.5-inch smartphone (p = .002).

4.2.2.2 Heart rate

At a significance level of α = .05, the data can be considered normally distributed (using QWERTY with a 4.7-inch smartphone: W = .902, p = .169; using QWERTY with a 5-inch smartphone: W = .955, p = .704; using QWERTY with a 5.5-inch smartphone: W = .964, p = .834; using T9 with a 4.7-inch smartphone: W = .896, p = .139; using T9 with a 5-inch smartphone: W = .950, p = .643; using T9 with a 5.5-inch smartphone: W = .939, p = .486). The assumption of sphericity was met (p = .659). The results of the mixed ANOVA showed a significant main effect of input method (F(1, 22) = 13.259, p = .001, ηp2 = .376), indicating that users who used QWERTY had a lower HR than those who used T9. A significant main effect of smartphone size was also found (F(1, 22) = 9.761, p < .001, ηp2 = .307), indicating that users who used a 5-inch smartphone had a lower HR than those who used a 4.7-inch smartphone (p = .004) or a 5.5-inch smartphone (p = .002). A significant interaction was found between input method and smartphone size (F(1, 22) = 3.333, p = .045, ηp2 = .132) (Fig. 10).

Fig. 10 Interaction effect of input method and smartphone size for HR

The results of the simple effect analysis showed a significant difference among users who used QWERTY with smartphones of different sizes (F(1, 22) = 3.665, p = .043, ηp2 = .259). Users who used QWERTY with a 5-inch smartphone achieved a significantly lower HR than those who used a 4.7-inch smartphone (p = .034). The simple effect analysis also showed a significant difference among users who used T9 with smartphones of different sizes (F(1, 22) = 9.109, p = .001, ηp2 = .465). Users who used T9 with a 5-inch smartphone achieved a significantly lower HR than users who used a 5.5-inch smartphone (p = .001).

4.2.2.3 Surface electromyography

At a significance level of α = .05, the data can be considered normally distributed (using QWERTY with a 4.7-inch smartphone: W = .932, p = .398; using QWERTY with a 5-inch smartphone: W = .985, p = .996; using QWERTY with a 5.5-inch smartphone: W = .925, p = .334; using T9 with a 4.7-inch smartphone: W = .948, p = .612; using T9 with a 5-inch smartphone: W = .975, p = .956; using T9 with a 5.5-inch smartphone: W = .936, p = .449). The assumption of sphericity was violated (p < .001), so the degrees of freedom were corrected using the Greenhouse–Geisser correction. The results of the mixed ANOVA showed a significant main effect of smartphone size (F(1.30, 28.60) = 5.198, p = .022, ηp2 = .191), indicating that users who used a 5-inch smartphone had a lower sEMG than users who used a 4.7-inch smartphone (p = .042) or a 5.5-inch smartphone (p = .016).

4.2.3 Subjective measure

Table 6 shows a summary of descriptive statistics.

Table 6 Descriptive statistics summary for subjective measure

4.2.3.1 System usability scale

At a significance level of α = .05, the data can be considered normally distributed (using QWERTY with a 4.7-inch smartphone: W = .938, p = .473; using QWERTY with a 5-inch smartphone: W = .932, p = .406; using QWERTY with a 5.5-inch smartphone: W = .921, p = .298; using T9 with a 4.7-inch smartphone: W = .892, p = .123; using T9 with a 5-inch smartphone: W = .943, p = .537; using T9 with a 5.5-inch smartphone: W = .898, p = .152). The assumption of sphericity was violated (p < .001), so the degrees of freedom were corrected using the Greenhouse–Geisser correction. The results of the mixed ANOVA showed a significant main effect of input method (F(1, 22) = 6.310, p = .020, ηp2 = .215), indicating that users who used QWERTY achieved higher SUS scores than users who used T9. A significant interaction between input method and smartphone size was also found (F(1.06, 23.38) = 5.667, p = .048, ηp2 = .218) (Fig. 11).

Fig. 11 Interaction effect of input method and smartphone size for SUS

The results of the simple effect analysis showed a significant difference among users who used QWERTY with smartphones of different sizes (F(1, 22) = 8.260, p = .002, ηp2 = .526). Users who used QWERTY with a 5-inch smartphone achieved significantly higher SUS scores than those who used a 4.7-inch smartphone (p = .001). The simple effect analysis also showed a significant difference among users who used T9 with smartphones of different sizes (F(1, 22) = 8.260, p = .002, ηp2 = .514). Users who used T9 with a 4.7-inch smartphone achieved significantly higher SUS scores than those who used a 5-inch smartphone (p = .001).

4.2.3.2 NASA task load index

At a significance level of α = .05, the data can be considered normally distributed (using QWERTY with a 4.7-inch smartphone: W = .880, p = .088; using QWERTY with a 5-inch smartphone: W = .902, p = .168; using QWERTY with a 5.5-inch smartphone: W = .956, p = .721; using T9 with a 4.7-inch smartphone: W = .922, p = .303; using T9 with a 5-inch smartphone: W = .913, p = .233; using T9 with a 5.5-inch smartphone: W = .942, p = .524). The assumption of sphericity was violated (p < .001), so the degrees of freedom were corrected using the Greenhouse–Geisser correction. The results of the mixed ANOVA showed a significant main effect of input method (F(1, 22) = 4.578, p = .044, ηp2 = .172), indicating that users who used QWERTY achieved lower NASA-TLX scores than those who used T9. A significant main effect of smartphone size was also found (F(1.30, 28.59) = 9.818, p = .002, ηp2 = .309), indicating that users who used a 5-inch smartphone achieved lower NASA-TLX scores than users who used a 4.7-inch smartphone (p < .001) or a 5.5-inch smartphone (p = .030).
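The fractional degrees of freedom reported in Sect. 4.2 follow from multiplying the uncorrected degrees of freedom by the Greenhouse–Geisser epsilon. The sketch below recovers the implied epsilon from each reported pair. It rests on an assumption that is not stated in the text: that the uncorrected degrees of freedom for the three-level smartphone size factor are (2, 44), which is what a between-subject error df of 22 would imply. All names in the code are illustrative.

```python
# Assumption (not stated in the text): three smartphone sizes (k = 3) and a
# between-subject error df of 22 give uncorrected df of (k - 1, (k - 1) * 22).
K_LEVELS = 3
BETWEEN_ERROR_DF = 22
DF_EFFECT = K_LEVELS - 1                # 2
DF_ERROR = DF_EFFECT * BETWEEN_ERROR_DF  # 44

# Greenhouse-Geisser epsilon is bounded below by 1 / (k - 1) and above by 1.
LOWER_BOUND = 1 / (K_LEVELS - 1)

# Corrected degrees of freedom as reported in Sect. 4.2.
reported = {
    "WPM": (1.56, 34.23),
    "sEMG": (1.30, 28.60),
    "SUS": (1.06, 23.38),
    "NASA-TLX": (1.30, 28.59),
}


def implied_epsilon(df_effect_corrected: float, df_error_corrected: float) -> tuple:
    """Return the epsilon implied by the effect df and by the error df."""
    return df_effect_corrected / DF_EFFECT, df_error_corrected / DF_ERROR


for measure, (df1, df2) in reported.items():
    eps_effect, eps_error = implied_epsilon(df1, df2)
    print(f"{measure}: epsilon from effect df = {eps_effect:.3f}, "
          f"from error df = {eps_error:.3f}")
```

Under this assumption, both degrees of freedom in each pair imply nearly the same epsilon, and every value lies between the lower bound of 0.5 and 1, as expected for a Greenhouse–Geisser correction.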

5 Discussion

Typing on a smartphone remains the only suitable text entry mode in many situations, and a positive touch-typing UX is essential for an effortless flow of text input. Users’ touch patterns may vary according to hand posture (i.e., how a user holds the smartphone and types on it), and smartphone size is one of the factors that have to be considered from product design to consumer choice. This study investigated the impact of input mode and smartphone size on the UX of smartphone text input methods. Experiment 1 showed that QWERTY was more effective for two-thumb input, T9 was more effective for one-thumb input, and two-thumb input achieved an overall better UX than one-thumb input. To further explore the user experience of two-thumb text entry on smartphones of different sizes, we conducted a second experiment. Experiment 2 showed that using QWERTY and T9 with a 5-inch smartphone achieved a better UX, while using QWERTY with a 5.5-inch smartphone achieved better performance.

The present study showed that users who used two-thumb text entry achieved a higher WPM via both QWERTY and T9 than users who used one-thumb text entry. When each thumb only moves within one half of the screen, it takes less time to move from one key to the next [63], whereas on-screen keyboards are not very fast when used with a single finger [34]. Furthermore, a user can move one thumb toward its next target while the other thumb is still active, thus parallelizing movements when typing. The natural limits of thumb movement [40] can complicate thumb typing while simultaneously holding the device with one hand. The study also showed that users who used one-thumb text entry had a lower average total error rate than users who used two-thumb text entry. During single-thumb text input on a virtual keyboard, displaying the touch coordinates in real time decreased both the touch deviation from the center of the keys and the number of typing errors [67]. Schildbach and Rukzio [61] studied participants who performed target acquisition and reading tasks with one-handed thumb interaction while standing and walking. Their results showed that, in the walking condition, larger keys increased text input speed and decreased the error rate. The present study also showed that users achieved a higher WPM via QWERTY than via T9 in both two-thumb and one-thumb text entry. This is partly due to the English input phrases used. These phrases were chosen because they are well established and have already been used in many studies [9, 42, 72]. Moreover, the participants were all college students with extensive English experience. The obtained results are in line with those reported by Azenkot and Zhai [2], who found that two-thumb text entry was faster than one-thumb text entry but had a higher error rate. Rapid and repetitive movements may fatigue the thumb, thus increasing typing errors and unnecessary repetitive typing, both of which reduce performance [79]. Li et al. [41] found that T9 was more effective than QWERTY indoors and when sitting, while QWERTY was more effective than T9 outdoors and when walking. A likely reason for the difference in results is that their texting task was in Chinese, whereas ours was in English. Vertanen and Kristensson [73] found that mobile users tended to use emoticons and texting language more frequently but used fewer commas. They demonstrated that the mobile state influences users’ text input.

The present study found that users who used two-thumb text entry had a lower GSR via QWERTY than via T9. Frequent left–right alternation is an attribute of the QWERTY layout, which was originally designed to minimize mechanical interference and later became a major advantage for two-handed typing [80]. Each key of the QWERTY keyboard corresponds to a specific character, which provides appropriate real-time feedback on users’ actions, thus reducing their mental load and making the interaction easy and fast [77]. Furthermore, users who used QWERTY with two-thumb text entry had a lower HR and sEMG than users who used one-thumb text entry. Gustafsson et al. [20], in a study with a large sample (n = 56), also found higher muscle activation in the extensors during typing with one thumb compared with two thumbs, which was attributed to the increased need for stability when holding the smartphone and typing simultaneously. Trudeau et al. found that holding a smartphone with both hands enhanced the range of motion of the thumb, thus reducing the variability of use [69]. The present study also showed that users who used T9 had a lower GSR with one-thumb text entry than with two-thumb text entry; moreover, users who used one-thumb text entry had a lower HR via T9 than via QWERTY. The reason might be that the T9 keyboard has relatively large keys, which make it easier to hit the target keys during one-thumb text entry, whereas during two-thumb text entry the thumbs may collide and interfere with each other. When using T9, the thumb travels relatively short distances because of the bigger keys. Touch patterns differ between index-finger and thumb use, mainly because of postural stability: using one hand to grip the device while the other hand is in dynamic movement reduces finger reach. Most studies on smartphone text entry have emphasized the one-thumb operation of touchscreens when considering the control area and efficiency. The reason is that the touch control range is influenced more by key placement on the display when the smartphone is held vertically in one hand [23].

The present study has shown that users who used two-thumb text entry achieved higher SUS scores and lower NASA-TLX scores via QWERTY than via T9. Moreover, users who used QWERTY achieved higher SUS scores and lower NASA-TLX scores with two-thumb text entry than with one-thumb text entry. Most users are comfortable with the QWERTY design and find it very difficult to accept other layouts [18]. Holding a device and tapping on its touchscreen with a single hand may be more difficult than using two hands, which can share the task requirements [69]. It is sometimes clumsy for a user to hold a large touchscreen smartphone in one hand, whereas using two hands allows the user to reach distant targets on the screen more easily. The present study also found that users who used T9 with one-thumb text entry achieved higher SUS scores and lower NASA-TLX scores than those who used two-thumb text entry. Furthermore, users who used one-thumb text entry achieved higher SUS scores via T9 than via QWERTY. In single-handed operation with QWERTY, the thumb must travel far from left to right across the screen to reach the keys at the corners of the keyboard. This potentially causes uncomfortable interactions between the thumb and the touchscreen keyboard [79]. During single-handed smartphone use, the movement of the thumb is limited because the hand also has to hold the smartphone securely [70].

The results showed that users who used a 5-inch smartphone had a higher WPM than users who used a 4.7-inch smartphone. Users who used T9 with a 5-inch smartphone achieved a significantly higher WPM than those who used a 5.5-inch smartphone. With T9, the typical typing speed is only about 10 WPM [45], which agrees with the data found in the present study. Hwangbo et al. [24] showed that once the target reached a certain size, task completion increased. An excessively large smartphone would lead to thumb-related muscle discomfort, and the key area of T9 was already relatively large. The obtained results were consistent with those reported by Yi et al. [81], who found that increases in size did not significantly improve performance with regard to text entry speed, error rate, or user preference. The results also showed that users who used QWERTY with a 5.5-inch smartphone achieved a significantly higher WPM than those who used a 4.7-inch smartphone. Moreover, users who used a 5.5-inch smartphone had a lower average total error rate than those who used a 4.7-inch smartphone. Kane et al. [31] examined the effect of walking and of adapting the user interface on performance when using two hands and the thumb to interact with soft buttons on a mobile device. Their results showed a decrease in error rate with increasing target size. Wang et al. [75] compared two screen sizes (5.0 inch vs. 6.5 inch), two area sizes (small-area vs. large-area), and two keyboard layouts (curved QWERTY vs. traditional QWERTY). They found a main effect of screen size: the pair per minute of the 5.0-inch screen was longer than that of the 6.5-inch screen. Our results share a common trend with theirs: the larger the smartphone screen, the faster the input speed and the lower the error rate. Kwon, Lee, and Chung [37] set key size as an independent variable and pointed out that user performance and preference improved with increasing key size. This may be because a larger QWERTY keyboard prompts participants to type as if they were using a regular computer keyboard. When keys were small, they could be entirely covered by the finger, causing the wrong keys to be activated and lengthening operation time [29].

The results showed that users who used a 5-inch smartphone had lower GSR, HR, and sEMG. Our results are consistent with those presented in [43], where participants with different hand lengths and widths used their thumb to touch every point on the screen. That study suggested a touchscreen size of 4.6 to 5.0 inches for most people because this range of screen sizes is relatively suitable. Users who used QWERTY with a 5-inch smartphone achieved a significantly lower GSR and HR than those who used a 4.7-inch smartphone. Users who used T9 with a 5-inch smartphone achieved a significantly lower GSR and HR than those who used a 5.5-inch smartphone. According to Fitts’ law, task difficulty increases as the distance to the target increases. Increasing the size of the device may have a negative impact on grip comfort and increase the physical demands of using a smartphone [32, 37]. However, the increase in thumb coverage area does not completely match the increase in touchscreen size [79]. For T9, multiple letters are placed on one key; therefore, these keys must be larger than the keys on a QWERTY keyboard of the same size. Smaller buttons are physically more difficult to locate than large ones. To reduce the contact area, participants had to raise their thumbs and hold them in a vertical gesture (i.e., with the thumb perpendicular to the surface of the smartphone screen). In doing so, the accuracy of target selection could be maintained [57], but users’ cognitive load and thumb muscle activation increased. These results are consistent with those reported by Kietrys et al. [32], who found a non-significant trend in thumb muscle activity with increasing screen size. Werth and Babski-Reeves [76] also reported lower muscle activity when typing on a virtual (touchscreen) keyboard.
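The Fitts’ law argument in this paragraph can be made concrete with the commonly used Shannon formulation of the index of difficulty, ID = log2(D / W + 1), where D is the distance to the target and W is the target width. The sketch below is illustrative only: the travel distances and key widths are hypothetical values chosen for the example, not measurements from this study.

```python
import math


def index_of_difficulty(distance_mm: float, width_mm: float) -> float:
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance_mm / width_mm + 1)


# Hypothetical thumb travel distances (mm) and key widths (mm); these are
# illustrative assumptions, not values measured in the experiments.
distances_mm = (15.0, 30.0, 60.0)
key_widths_mm = (5.0, 10.0)

for width in key_widths_mm:
    for distance in distances_mm:
        bits = index_of_difficulty(distance, width)
        print(f"key width {width:4.1f} mm, distance {distance:4.1f} mm: "
              f"ID = {bits:.2f} bits")
```

In this formulation, the index of difficulty rises with travel distance and falls with key width, which matches the reasoning that larger keys ease target selection while a larger device lengthens thumb travel.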

The obtained results showed that users who used QWERTY with a 5-inch smartphone achieved significantly higher SUS scores than users who used a 4.7-inch smartphone. Users who used a 5-inch smartphone achieved the lowest NASA-TLX scores. QWERTY can be problematic because its small keys can be covered by the finger; therefore, subjective ratings were poor when the display was too small [79]. The Chinese standard states that the width of the distal joint of the thumb is on average 17 mm and 18 mm for female and male adults, respectively [60]. At these average widths, the thumb covers almost 2/3 of the entry area. Pritom et al. [59] found that fitting such a large number of buttons on a small QWERTY screen often makes it difficult for unskilled users to enter text and to maintain the same quality. Jia et al. [26] found that many participants liked large displays for readability when staying at home, and many liked small displays for portability when going out. Users with large hands find it difficult to make multiple key presses quickly and with few or no errors when entering both text and special characters. The situation becomes more difficult when there is a need to text in a hurry or while in motion (walking or talking to someone else).

The present study has certain limitations: the education level of the subjects and the memory effect of the visual feedback may have affected the results. In addition, different types of participants could be compared, such as male and female participants, since their mean palm length differs. Because the study recruited only right-handed users to avoid effects associated with primary handedness, a further study should investigate the effect of users’ primary hand on pointing performance. Future research should also test non-alphanumeric input (i.e., punctuation [84], symbols, and modifiers), which has so far been ignored by the vast majority of studies. Modern Android smartphones come with a specific “one-handed mode,” which is convenient for users who use one-thumb text entry. Future research can study the user experience of different input method configurations, such as input settings and interface settings. More and more people use their mobile phones while walking to browse the Web, read, or use social networking services. It should be analyzed how mobile user interfaces could be improved or adapted to reduce the time users are engaged with them and the cognitive load while walking. Currently, virtual reality can relieve users from physical-world limitations such as constrained space or noisy environments [19]. In future work, we could study text input and document editing for office work in VR. Most studies have analyzed these factors in a stationary state, in which subjects sit comfortably on a chair in a quiet situation (i.e., in the least distracting environment conceivable) while completing a single text input task [41]. The reason is that a noisy outdoor environment also affects the user’s performance and experience. Future studies can examine other tasks in different contexts and environments of use (e.g., at home, at the office, outside under direct sunlight, or while carrying out other tasks in parallel).

6 Conclusion

The present study investigated the impact of gestures and smartphone size on the UX of users during a smartphone text input task. The results showed that QWERTY was more effective for two-thumb text entry, T9 was more effective for one-thumb text entry, and two-thumb text entry had an overall better UX than one-thumb text entry. Furthermore, using QWERTY and T9 on a 5-inch smartphone achieved a better UX, while using QWERTY on a 5.5-inch smartphone achieved better performance. At present, QWERTY and T9 are the most commonly used text input methods, so it is necessary to improve the user experience of both. Our results are useful for understanding human typing ability on QWERTY and T9 and offer suggestions for users who use different gestures. QWERTY is suitable for both hands rather than one hand because it has more keys. Placing commonly used keys closer to the typing hand, that is, near the edge of the QWERTY keyboard, would help improve the user experience. Button design should also be innovated to reduce the horizontal length. T9 is suitable for one hand rather than both hands because its keys are large enough to operate with one hand, whereas two hands are likely to collide. Therefore, the distance between the T9 keys can be increased appropriately. The input imprecision of this ambiguous keyboard should be reduced so that users can accurately target each key and press as few keys as possible. Text input is one of the most intensive and frequent human–computer interaction (HCI) tasks, and speed is a very important consideration. To improve efficiency, T9 can improve its word association and intelligent error correction functions. Regarding gestures, prolonged text input on a mobile phone will inevitably cause discomfort in the thumb or index finger. It is thus important to refine the profile of the phone or change its length-to-width ratio to reduce hand muscle fatigue. Smartphones should be designed to enhance two-hand performance and reduce the damage to muscles and bones caused by one-hand use.

The results are also helpful for text input method design in terms of interface design and screen size. With regard to size, the 5-inch smartphone provided a better user experience. For both input methods, this size offers appropriately sized letter keys and is also suitable for holding and controlling the phone with the palm. Based on the user’s hand size and input gesture, an input method system could recommend an appropriately sized, adjustable input interface. The design of user interfaces for touchscreen smartphones needs to consider the movement characteristics of the thumb, rather than simply varying the sizes of keys or screens. For QWERTY, users prefer the largest possible display area and key size. However, this preference is not absolute, and the most appropriate key size for mobile phone input methods needs further exploration. Our experimental approach, which combines performance, subjective, and physiological measures, provides a more realistic assessment of the user experience. Our experimental research can also be applied to more input methods (e.g., strokes, handwriting) and to other situations and circumstances (e.g., lying down, walking).