1 Introduction

A significant body of research has documented age-related decline in cognitive and motor abilities among adults over 60 [1,2,3,4,5,6,7]. These declines manifest themselves in a variety of ways. Slower reaction times for older adults have been documented across a number of tasks, including naming items [8], wayfinding [9], and simple and complex auditory tasks [10].

Age-related differences have also been found in accomplishing tasks with technology devices. Some studies have focused on measuring behavioral differences when using a computer mouse or when touching the screen. Older users took longer to select targets with a PC mouse than younger or middle-aged adults, but there was no time difference by age when the task was to touch the target on the PC screen [11]. When using touch on smaller devices, age-related differences appear. For example, one study found that, compared to younger adults, older adults took longer to touch targets on smaller devices and made more selection errors, both misses and slips (where the person initially hits the target but slips off it before lifting the finger) [12]. Other researchers found that while older adults were slower than younger adults in touch-target tasks, the gap between the two groups was smaller with iPad touch screens than with a mouse on desktop screens [13]. Similarly, another study found that the computer mouse increased both cognitive and motor demands compared to a touch pad and touch screen for tasks involving static image selection [14]. Touch screens, especially with large touch buttons, accommodate older adults more easily than a mouse or keyboard for input [15]. In addition to reaction time, website research has found differences in eye movement between older and younger adults [16,17]. These data suggest that smaller devices lead to more performance issues for older adults.

In general, these findings suggest that devices activated by touch might be easier for older adults to use than desktop or laptop computers, which require an external input device such as a mouse or keyboard. These studies also suggest that device size influences performance, with larger touch devices leading to fewer differences between older and younger users. This is especially the case for novice users, as research also shows that the amount of time spent using and interacting with devices affects older adults' performance [14, 18]. The direction of the relationship is as expected, with more experience with a device type leading to more positive usability outcomes, such as higher accuracy and faster task completion.

Some studies have focused on improving the usability of smartphones by designing smartphone inputs for older adults. For touch entry, Murata & Iwase (2005) found that a 16.5 mm (about 0.6 in.) button size, with gaps between buttons, resulted in the fastest reaction time for older users [11]. To reduce performance error rates for older users, the optimal spacing between buttons was found to be between 3.17 mm and 12.7 mm (about 0.12 to 0.5 in.) [15]. One study warns against using small buttons and placing frequently used buttons on the right side, especially the lower right side, of the screen because of thumb mobility issues in older adults, given the assumption of single-handed, right-handed phone operation [19].

While tasks on mobile devices might still take longer for older adults to accomplish, that does not necessarily mean that older adults will be less satisfied with accomplishing those tasks. Older adults have been associated with what has been called a “positivity effect” [20]. That is, researchers have observed a shift in behavior and attitude from a negativity bias early in life to a positivity bias in middle and late adulthood [21]. One study concludes that older adults rate experiences more positively not because they forget or suppress negative experiences or have fewer negative experiences than younger adults, but rather because they simply assess those experiences more favorably [26]. This was true for both real and hypothetical life experiences. In that study, older adults could recall negative experiences (both real and hypothetical) as well as younger adults could; however, their appraisals of those experiences were more positive than those of their younger counterparts. While there is some contradictory evidence, the effect has been documented in over 100 studies [22]. Examples of this positivity effect include, but are not limited to, visual attention (older adults remembered and focused on pictures of positive social interactions more than younger adults did) [23], working memory (older adults responded better to positive images) [24], and short-term memory (older adults remembered positive words and pictures better) [25].

The mission of the U.S. Census Bureau is to produce accurate statistics about the American people, households, and economy. Determining the best questions and designs to use to collect the information is essential to producing accurate data. During survey development, we measure reaction time and user satisfaction with different survey designs, with the goal of developing surveys that collect accurate data in the least burdensome manner. In the present study, we use data from three experiments on smartphones to explore whether optimal survey designs for older users are also optimal for younger users as measured by time-on-task; whether older adults take more time to complete survey questions (regardless of design) compared to younger adults; and whether the positivity effect holds when older adults complete surveys on smartphones.

The original data collection was focused on optimal designs for mobile web surveys [27,28,29]. The data were generated from a series of experiments in which older adults completed different survey tasks on mobile phones using different screen designs. Based on their performance, satisfaction, and preference, mobile web survey design guidelines were proposed. The assumption was that if older adults performed optimally with a specific design, then younger adults would do at least as well because of their superior perceptual and motor capabilities. To validate this underlying assumption, younger adults were also recruited for this work. This paper has a two-fold purpose. First, it explores whether survey designs that take less time for older adults to complete also reduce the time younger adults need to complete the survey. Hypothesis 1 is that, given the same device size and touch feature size, the effect of age on question completion time will not vary by survey design. That is, if one design takes older adults more time to complete, then it will also take younger adults more time to complete. Second, the paper compares the behavioral and attitudinal differences between older and younger adults by examining the reaction time and satisfaction data collected in these experiments, to determine whether reaction time is longer for older adults than for younger adults when using mobile survey designs, and whether there is a positivity effect for a common usability metric, satisfaction, in use of the online mobile survey. Hypotheses 2 and 3 are that older adults will take more time than younger adults to answer questions on mobile web surveys regardless of design, and that they will report higher satisfaction answering those surveys than younger adults.

2 Methods

Below are highlights of the methods relevant to the three experiments described in this paper. In the analyses, we consider results significant at p ≤ 0.05.

2.1 Participants

The participants were a convenience sample recruited from senior centers, community centers, and community colleges in a major metropolitan area in the U.S. between late 2016 and the summer of 2018. We prescreened participants to make sure they owned a smartphone and had at least 12 months of experience using a smartphone. Additionally, we prescreened participants to include only individuals who had an 8th grade education or higher, who were fluent in English, and who had normal (or corrected to normal) vision.

Participant characteristics are provided in Table 1. Experiments 1, 2, and 3 were conducted with pools of 122, 71, and 40 participants, respectively. The mean, median, and range of participant ages are provided in Table 1 to show that the data skewed older for some experiments and younger for others. Some individuals participated in more than one experiment.

Table 1. Participant demographics for 3 experiments.

2.2 Data Collection Methods

One-on-one sessions were conducted at senior centers, community centers, and community colleges. Participants were either walk-ups that day or pre-scheduled. At the appointment time, they were screened by Census Bureau staff and signed a consent form. Then each participant worked with a test administrator (TA) and completed between four and six experiments, only a subset of which are the subject of this paper. In this paper, we draw upon the experiments that collected data from both older and younger adults. The experiments were implemented as mobile apps, which were loaded onto a Census-owned iPhone 5s or 6s. Test administrators provided participants with one of these devices for the purposes of the experiment and gave them instructions, including not to talk aloud during the session and to complete the survey to the best of their ability, as though they were answering it at home without anyone’s assistance. The participants performed the task independently, taking 10–20 min for each experiment, depending upon the experimental design. At the end of the session, each participant was given a $40 honorarium.

2.3 Stimuli

Each experiment consisted of a series of survey questions, with one question per screen, and distinct design conditions. While the same questions were asked within each experiment, the design of the screens differed depending on the condition. The questions also differed across experiments. The following descriptions are provided so that the reader has a sense of the stimuli participants encountered.

Using a between-subjects experimental design, Experiment 1 tested five different label location conditions for text input fields as shown in Figs. 1, 2, 3, 4 and 5. Participants were randomly assigned to one condition. Each condition had the same 14 open-ended questions on a range of topics. All the questions required the participant to interact with the keyboard or keypad on the phone. Participants were instructed to answer the questions as they would if they were at home and with no researcher present.

Fig. 1. Label above box.
Fig. 2. Inline labels that move.
Fig. 3. Label to the left of the box and left aligned.
Fig. 4. Label to the left of the box and right aligned.
Fig. 5. Label to the right of the box.

Using a between-subjects experimental design, Experiment 2 tested three different select-one conditions, as shown in Figs. 6, 7 and 8. Two of the conditions (Figs. 6 and 7) were dropdowns. When focus was placed on the dropdown, the answer choices appeared in the default manner of either an iOS display (with the answer choices at the bottom in a spinner wheel – Fig. 6) or an Android display (with the answer choices in a pop-up window – Fig. 7). Participants were randomly assigned to one condition. Each condition had the same 12 questions. All the questions required only a single answer, but some questions had a long list of possible response choices while others had fewer response options. Again, participants were instructed to answer the questions as they would if they were at home with no researcher present.

Fig. 6. iOS picker.
Fig. 7. Android spinner.
Fig. 8. Radio button/keypad.

Using a between-subjects experimental design, Experiment 3 tested three different ways to display the states in an Android spinner design, as shown in Figs. 9, 10 and 11. Participants were randomly assigned to one condition. There was only one question: “What state shall I select?” The participant was instructed to read that question aloud, and the TA replied by reading a state name from a randomized list. The participant recorded the state, selected Next, and the sequence repeated 50 more times with the TA responding with a new state each time, so that all 50 states and the District of Columbia were entered. In this experiment, participants answered the questions based on data provided to them orally by the TA.

Fig. 9. Full state name.
Fig. 10. State abbreviation.
Fig. 11. State abbr. & name.

2.4 Analytic Strategy

The original goal of the experiments was to determine which mobile survey designs worked best for users. To answer that question, we compared measures captured within the app between the conditions within an experiment. Two measures captured across all experiments were time-on-screen and a self-reported satisfaction rating. The latter was captured once for each participant after he or she had finished the survey: the participant was asked to rate how easy or difficult it was to complete the task on a 5-point scale with the endpoints labeled 1 = Very Easy and 5 = Very Difficult. In this paper, we use these data to answer age-related questions.

To answer the first research question, whether the effect of age on question completion time varies by survey design, we used a mixed model (PROC MIXED in SAS) for each experiment. The outcome measure was the log of the time to complete a screen, analyzed at the question level. Modeling at the question level increases the number of observations and allows us to account for different question characteristics. We controlled for the condition, the age of the participant, and the interaction between condition and age. Because participants contributed a response for each question in the survey, we included a random effect for each participant. If the interaction between condition and age was significant, then the effect of age on completion time varies by survey design; if the interaction was not significant, then there is insufficient evidence that a design that is optimal for older adults (with regard to time-on-task) would differ from one that is optimal for younger adults. We ran three separate models instead of pooling the data because some experiments produced significant differences by design.
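As a minimal sketch of this specification (the dataset and variable names exp1, log_time, condition, age, and participant_id are placeholders, not the production code), the interaction model for one experiment could be written as:

proc mixed data=exp1;
  /* condition = design condition; participant_id identifies the respondent */
  class condition participant_id;
  /* log_time = log of screen completion time, one record per question per participant;
     question-level covariates could be added here to account for question characteristics */
  model log_time = condition age condition*age / solution;
  /* random intercept per participant to account for repeated questions */
  random intercept / subject=participant_id;
run;

The Type 3 test for the condition*age term is the quantity examined for Hypothesis 1.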

To answer the second research question, whether older adults take longer to answer survey questions on a mobile phone than younger adults regardless of design, we ran the same models but without the interaction term. If the fixed effect of age is significant in the model, then the sign of the coefficient indicates whether the association between age and response time is positive or negative: i.e., a positive association indicates that increased age corresponds to increased response time.
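Under the same placeholder names, the main-effects model is the sketch above with the interaction term dropped:

proc mixed data=exp1;
  class condition participant_id;
  /* the sign of the age estimate in the solution gives the direction of the association */
  model log_time = condition age / solution;
  random intercept / subject=participant_id;
run;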

To answer the third research question, whether older adults are more satisfied with the mobile web survey than younger adults, we combined the satisfaction data across the experiments and ran a proportional odds model (PROC LOGISTIC in SAS) with the satisfaction score (1 through 5) as the dependent variable and age as the predictor variable. If age is significant in the model, then the sign of the coefficient indicates whether the association between age and satisfaction is positive or negative: i.e., a positive association indicates that increased age corresponds to increased satisfaction. We pooled the data across the three experiments for this analysis because there were so few dissatisfied participants based on the scores.
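A sketch of this model, again with placeholder names (sat_all for the pooled data, satisfaction for the 1–5 rating), might look like:

proc logistic data=sat_all;
  /* with an ordinal response, PROC LOGISTIC fits a cumulative logit (proportional odds)
     model by default, modeling the probability of the lower response levels (1 = Very Easy) */
  model satisfaction = age;
run;

Under this default parameterization, a positive age coefficient corresponds to higher odds of reporting an easier (lower) rating at older ages.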

3 Results

3.1 Hypothesis 1: The Effect of Age on Question Completion Time Will not Vary by Survey Design

Table 2 contains the results of the models predicting time with the interaction term, the condition (design), and age for each of the experiments. Examining the p values for the interaction terms in these models, across all three experiments we do not find sufficient evidence that the effect of survey design on completion time varies with the age of the participant, meaning that designs that take older adults longer to complete also take younger adults longer to complete, relative to other designs. Therefore, we do not reject Hypothesis 1.

Table 2. Type 3 tests of fixed effects for the models with the interaction term. Each column displays three (individual) F-tests for the specified experiment. Each entry shows the p value of the test, with the numerator and denominator degrees of freedom in parentheses.

3.2 Hypothesis 2: Older Adults Will Take More Time to Answer Questions on Mobile Web Surveys Than Younger Adults Regardless of Design

Table 3 displays results for models of log response time, without the interaction term, for each of the experiments. Examining the p value for the fixed effect of age in these models, we find that age significantly predicts time-on-task in each of the experiments. The β coefficient for age was positive in each of the models, meaning that as age increases, so does the time to answer the survey questions. Therefore, we do not reject Hypothesis 2. We conclude that older adults take longer than younger adults to complete questions on mobile web surveys, both for questions that require the user to touch an answer choice after reading a question (Experiments 2 and 3) and for questions that require the user to type an answer (Experiment 1). The coefficient was larger for Experiment 1 than for the other experiments, which implies that the reaction time gap between older and younger users is wider for touch typing than for simply selecting touch buttons.
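Because the outcome is the log of time, the age coefficient has a multiplicative interpretation: each additional year of age multiplies the expected completion time by \( e^{\beta} \). For illustration only (the value is hypothetical, not taken from Table 3), a coefficient of \( \beta = 0.02 \) would correspond to roughly a 2% increase in time per year of age, or about \( e^{0.02 \times 40} \approx 2.2 \) times longer across a 40-year age gap; a larger \( \beta \), as in Experiment 1, therefore implies a steeper proportional increase in time with age.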

Table 3. Solutions for fixed effects for the main-effects models. Note that these values come from individual models for each experiment, not a joint analysis. Estimated coefficient values are displayed with associated p values in parentheses.

Figures 12, 13 and 14 show the estimated line for the effect of age on time for each of the experiments. The dotted lines represent pointwise 95% confidence bounds, which become wider as age increases, suggesting that there is more variability in reaction time for older adults than for younger adults. Widening bounds could also simply reflect fewer older participants, but in Experiments 1 and 2 this is not the case, as the median age is 60 or older.

Fig. 12. Estimated line for the effect of age on log of time for Experiment 1.
Fig. 13. Estimated line for the effect of age on log of time for Experiment 2.
Fig. 14. Estimated line for the effect of age on log of time for Experiment 3.

3.3 Hypothesis 3: Older Adults Are More Satisfied with the Mobile Web Survey Than Younger Adults

Figure 15 contains the distribution of satisfaction scores combined across all three experiments for older (n = 109) and younger (n = 118) adults. Older adults were defined as individuals 60 years old or older. The scores indicate that most participants, regardless of age, found the tasks and interfaces easy to use.

Fig. 15. Distribution of satisfaction scores combined across the three experiments by participant age.

The proportional odds model found age to be marginally associated with satisfaction (p = 0.079), with coefficient \( \hat{\beta} = 0.0104 \). Here, a positive coefficient represents a negative association between age and reported survey difficulty, so that older participants are more likely to report lower difficulty with the survey. This provides marginal evidence in favor of Hypothesis 3.
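For illustration, the estimate corresponds to an odds ratio of \( e^{0.0104} \approx 1.01 \) per year of age, or \( e^{0.0104 \times 40} \approx 1.52 \) across a 40-year age difference, in favor of reporting an easier (lower) difficulty score.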

4 Discussion

The purpose of this research was to add to the literature on age-related differences in mobile survey completion and in satisfaction with mobile web surveys.

Our first hypothesis was that there would not be a significant interaction between age and mobile web design when predicting the time to complete a survey question. We could not reject this hypothesis for any of the experiments, meaning that designs that worked well for older users also worked well for younger users, or that the designs worked equally poorly regardless of participant age. This research provides some evidence that, for mobile web survey design, testing with older users could be sufficient when evaluating different touch and keyboard designs to develop design guidelines based on time-on-task metrics.

In all three experiments, the time needed to complete the tasks increased as the age of the participant increased, which supports Hypothesis 2. This finding is in line with other research, which found that older adults take longer to complete tasks on mobile phones [12]. The increase in time was found both for subtasks that required on-screen keyboard/keypad use and for surveys that required touching buttons. Based on the model coefficients, the gap between older and younger users is wider for tasks that involve touching and interacting with the keyboard and keypad than for simply touching buttons. Based on the confidence bounds, there is more variability in time-on-task as participants age. As an overall measure of the time a design requires, testing only with older users will bias time estimates upward, while testing only with younger users will bias them downward. This highlights the importance of recruiting participants of different ages for any human-computer design testing.

This research also suggests that the mobile web surveys tested here were generally easy to complete and generated high satisfaction scores. We found a marginally significant increase in positive scores for older adults compared with younger adults, supporting our third hypothesis. This finding is also in line with other research that finds a positivity effect at work in older adults [22].

5 Limitations

Much of the cognitive aging literature is based on relatively small samples of college students and older adult volunteers brought into university labs, whereas middle-aged adults and those without some college education are included less often [30]. Our research also used a convenience sample of young adults from community colleges and older adults who traveled to community centers. Additionally, our sample lived in one major metropolitan area. While we are unaware of any regional differences in smartphone use, it could be that older adults who cannot travel outside of their homes would behave differently. It also could be that non-college-educated younger individuals and middle-aged individuals would behave differently. In addition, while we did not have a random sample, the statistical methodology used assumes one. Future research should aim to recruit these other user groups to determine if the results differ. Future studies could also include other survey tasks besides typing answers and selecting buttons, such as navigating between pages, dragging and dropping, and reading text. Besides time-on-task, future studies could examine the accuracy of responses.

Disclaimer.

This report is released to inform interested parties of research and to encourage discussion. The views expressed are those of the authors and not necessarily those of the U.S. Census Bureau. The disclosure review number for this paper: CBDRB-FY20-143.