Although the use of online samples for research in some areas of psychology is growing (e.g., Gosling, Vazire, Srivastava, and John 2004; Paolacci, Chandler, and Ipeirotis 2010), their use for studying the psychology of aging might be questioned on several grounds. Perhaps most obviously, one might question the possibly limited age range as well as whether the older adults who are online represent a highly select group. In other research areas, use of online samples has been validated by replicating established findings, as in the study by Paolacci et al., who replicated a number of standard results in judgment and decision making. It is likely, however, that not all research questions can be adequately addressed using this approach.

For example, Crump, McDonnell, and Gureckis (2013) reported that although a variety of benchmark phenomena from cognitive psychology (e.g., the Stroop, Flanker, Simon, and attentional blink effects) could all be instantiated online, some experimental paradigms, including category learning and masked priming, produced more problematic results. More recently, Shapiro, Chandler, and Mueller (2013) suggested that online samples may also be useful for studying clinical and subclinical populations, and reported that data provided by these groups over the Internet can be of relatively high quality. Their study is of particular relevance because it highlights an instance in which special populations that can be difficult to bring into the laboratory were studied online.

Whether one can replicate established phenomena regarding age-related differences using online samples is an open question. It is also an important one because to the extent that the concerns about this approach can be overcome, the Internet has the potential to greatly increase the efficiency of aging research in a number of ways, perhaps most obviously by making it possible to assess many more participants in much less time than is typically the case when using traditional laboratory methods.

Amazon Mechanical Turk

In recent years, web services such as Amazon Mechanical Turk (MTurk) have greatly facilitated the recruitment of participants for online research (for an overview, see Mason and Suri 2012). MTurk has a number of features that make it especially useful. For example, it allows researchers access to a pool of potential participants that is more diverse on many dimensions (e.g., age, ethnicity, education, country of origin) than most undergraduate subject pools, and whose size remains relatively constant year round. Not only can such diversity help with issues of generalizability, but for researchers at highly competitive institutions, it also may help alleviate problems recruiting older adults who are comparable to the available undergraduates. Another appealing feature is that subject payments are typically a fraction of what they would be in laboratory settings. For example, Paolacci et al. (2010) replicated standard laboratory results in judgment and decision making using MTurk participants who were paid less than $2 an hour (i.e., $0.10 to perform three tasks that took less than 5 min). Importantly, such low pay does not necessarily mean lower quality data from Internet samples (e.g., Buhrmester, Kwang, and Gosling 2011; Crump et al. 2013).

The current study sought to determine whether older adults can be recruited using MTurk, and if so, whether such samples show age-related differences similar to those observed in laboratory settings. Previous studies assessing the use of online samples have focused on benchmark findings in part because of their significance but also because if one fails to replicate an unreliable phenomenon using a new method, this says little about whether the method works. Using similar logic, our experiments focused on the relationships among age, processing speed, and working memory because of the robust nature of age-related slowing (Cerella and Hale 1994; Cerella, Poon, and Williams 1980; Verhaeghen and Salthouse 1997) as well as because of its link to changes in working memory and other higher order abilities (Fry and Hale 1996; Kail and Salthouse 1994; Salthouse 1996).

Across three experiments, participants recruited from MTurk performed four different processing speed tasks and a working memory task. Experiment 1 examined age-related differences in lexical decision response times. Experiment 2 followed up by comparing age-related declines in verbal and visuospatial processing speed. Finally, Experiment 3 examined the relationships among age, processing speed, and working memory ability.

Experiment 1

Our first experiment examined the distribution of participants’ ages in an online sample, and whether such a sample would show the usual age-related decline in processing speed. In addition, we sought to determine whether practice decreased age differences in response time (RT), as has been repeatedly observed in previous studies (e.g., Jordan and Rabbitt 1977; Myerson, Robertson, and Hale 2007; for a recent review, see Verhaeghen 2014).

Method

Participants

One hundred and twenty-two participants were recruited using MTurk. Those interested in participating were informed they would be paid $0.15 for a task that would take 2–5 min; no age requirements were mentioned. All participants resided in the United States and reported proficiency with English. Although participants in their 50s and 60s were recruited at a lower rate than younger participants, and there were no participants in their 70s, the experiment took less than 3 weeks to complete, showing that data from adults whose ages varied widely could be rapidly collected online (see Table 1 for participant characteristics).

Table 1 Descriptive statistics of mean age, gender, and education for five age groups spanning 10 years each for Experiments 1–3

Procedure

Processing speed was measured using a lexical decision task (Meyer and Schvaneveldt 1971). On each trial, participants saw a string of three letters (e.g., “bin,” “mun”) and had to decide as quickly and accurately as possible whether or not it was a real English word. They reported their decisions using the left- and right-arrow keys, with the assignment of “yes” and “no” responses counterbalanced across participants. The stimuli consisted of 20 words and 20 nonwords whose order was randomized for each participant. Individuals’ mean RTs were calculated based on correct responses to both words and nonwords.

Results and Discussion

Data from four participants (all in their 20s and 30s) were excluded due to low accuracy (<80 %), and RTs more than two standard deviations from each participant’s mean (2.8 % of the data) were removed prior to further analysis. The left panel of Fig. 1 depicts individual participants’ RTs plotted as a function of age. As may be seen, age and RT were strongly correlated, r(116) = .61, p < .001, replicating previous studies. Indeed, Madden (1992) reported a very similar correlation (r = .57) and rate of increase in RT with age in a laboratory study that examined age-related differences in lexical decision times in a cross-sectional sample similar to the present sample in both size and age range.
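The exclusion and trimming steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis script used in the study: the function name and the (RT, correct) input format are hypothetical, and the sketch assumes that trimming is applied to correct-trial RTs using the sample standard deviation, a detail the text does not specify.

```python
from statistics import mean, stdev


def trimmed_mean_rt(trials, accuracy_cutoff=0.80, sd_cutoff=2.0):
    """Return a participant's trimmed mean correct RT, or None if excluded.

    `trials` is a list of (rt_ms, correct) pairs (hypothetical format).
    """
    # Exclude the participant if overall accuracy is below the cutoff.
    accuracy = sum(1 for _, correct in trials if correct) / len(trials)
    if accuracy < accuracy_cutoff:
        return None

    # Assumption: trimming is based on correct-trial RTs only.
    correct_rts = [rt for rt, correct in trials if correct]
    m, sd = mean(correct_rts), stdev(correct_rts)

    # Drop RTs more than `sd_cutoff` standard deviations from the mean.
    kept = [rt for rt in correct_rts if abs(rt - m) <= sd_cutoff * sd]
    return mean(kept)
```

Applied to each participant in turn, a function of this kind would yield the individual mean RTs that are then correlated with age.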

Fig. 1

The left panel represents individual participants’ mean response times on the lexical decision task as a function of age in Experiment 1. The right panel represents mean response time as a function of block for young and older adults in Experiment 1, with the solid lines representing the best-fitting polynomial for each age group

In order to determine whether older adults improved more with practice than younger adults, the experimental session was divided into four blocks of 10 trials each, and two extreme groups consisting of the youngest and oldest participants were created: a young adult group (n = 18, M = 23.6, SD = 1.7) and an older adult group (n = 19, M = 59.2, SD = 4.0). As may be seen in the right panel of Fig. 1, young adults’ RTs and older adults’ RTs both decreased across blocks. A 2 (age) × 4 (block) analysis of variance (ANOVA) revealed main effects of age and block, F(1, 35) = 44.35, p < .001, η2 = .56, and F(3, 105) = 28.24, p < .001, η2 = .45, as well as an interaction between age and block, F(3, 105) = 3.82, p = .012, η2 = .10, reflecting the fact that older adults’ RTs decreased with practice more than those of the young adults.
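For readers who wish to check the effect sizes, the reported η2 values are consistent with partial eta squared recovered from each F ratio and its degrees of freedom. The short sketch below shows the arithmetic; the text does not state which variant of η2 was computed, so the formula is an assumption that happens to reproduce the reported values.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    ss_ratio = f_value * df_effect  # proportional to SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# F ratios and degrees of freedom as reported for Experiment 1.
age_effect = partial_eta_squared(44.35, 1, 35)     # approximately .56
block_effect = partial_eta_squared(28.24, 3, 105)  # approximately .45
interaction = partial_eta_squared(3.82, 3, 105)    # approximately .10
```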

The present results demonstrate that a sample of adults who vary widely in age and who show age-related slowing similar to that observed in laboratory samples can be rapidly recruited via MTurk. Not only was the correlation between age and lexical decision RT similar to that in comparable laboratory studies (e.g., Madden 1992) but, also as in previous studies (e.g., Jordan and Rabbitt 1977; Myerson et al. 2007), older adults in our online sample benefited more from practice than did young adults, lending further support to the idea that aging studies conducted with MTurk can yield data comparable to that of laboratory-based studies.

Experiment 2

In our second experiment, we sought to replicate our finding that processing speed declined with age in an online sample using a more difficult verbal speed task. More importantly, participants also performed a speeded visuospatial task so that we could compare rates of decline in an attempt to replicate the robust finding that aging affects visuospatial processing more than verbal processing (e.g., Hale and Myerson 1996; Jenkins, Myerson, Joerding, and Hale 2000). In addition, we examined whether using a two-stage recruiting process would make it possible to obtain an online sample with a flat age distribution in an efficient manner.

Method

Participants

One hundred and eight participants were recruited using MTurk. All participants reported proficiency with English and resided in the United States. To ensure that the entire age range was well represented, we used a two-stage recruiting process and inclusion criteria that changed as a function of the current age distribution of our sample. In the first stage, workers were told they would be paid $0.30 for 4–7 min of work, and those interested in participating were directed to a webpage where they filled out a questionnaire that asked their age, gender, education, handedness, and whether English was their primary language. In the second stage, those who qualified were given a link to the task and a pass code that would give them access to the experiment. We kept track of participants’ ages using 10-year bins (18–27, 28–37, etc.), and when at least 20 had been tested in a particular bin, the inclusion criteria were changed, and those in that bin no longer qualified for participation.

This method, which limited expenditure on participants whose ages were already well represented, yielded roughly equal numbers of participants in each bin (see Table 1). Although it required monitoring the age distribution, the method was very efficient, as evidenced by the fact that only 10 days were needed to collect the data for this experiment. More specifically, the inclusion criterion was changed 7 days after the initial posting so as to exclude anyone 37 or younger, and again 1 day later to exclude those 47 or younger; a final change, to exclude anyone 57 or younger, was made after 1 more day.
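The quota logic behind this recruiting method can be illustrated with a small sketch. In the study, the inclusion criteria were changed by hand as bins filled; the code below automates the same rule under stated assumptions. The class and function names are hypothetical, and nothing here uses the MTurk interface itself.

```python
def age_bin(age, youngest=18, width=10):
    """Return the (low, high) bounds of the 10-year bin containing `age`."""
    low = youngest + ((age - youngest) // width) * width
    return (low, low + width - 1)


class QuotaScreener:
    """Tracks how many participants have been tested in each age bin."""

    def __init__(self, quota=20, minimum_age=18):
        self.quota = quota
        self.minimum_age = minimum_age
        self.tested = {}  # maps (low, high) bin bounds to a count

    def qualifies(self, age):
        """A worker qualifies only while his or her age bin is below quota."""
        if age < self.minimum_age:
            return False
        return self.tested.get(age_bin(age), 0) < self.quota

    def record(self, age):
        """Register one tested participant in the appropriate bin."""
        b = age_bin(age)
        self.tested[b] = self.tested.get(b, 0) + 1
```

Once 20 participants aged 18–27 have been recorded, for example, `qualifies(25)` returns False while `qualifies(40)` still returns True, which mirrors the way younger workers were progressively excluded as their bins filled.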

Double lexical decision task

Participants saw two four-letter strings and had to decide whether both were real English words. They were told to report their yes/no decisions as quickly and accurately as possible using the left- and right-arrow keys, with assignment of responses to the keys counterbalanced across participants. There were four conditions of 15 trials each: word/word, nonword/word, word/nonword, and nonword/nonword. The order of presentation was randomized for each participant, and mean RTs were based on correct responses on both “yes” and “no” trials.

Visual conjunction search task

Participants saw arrays containing red circles and green squares and had to decide whether a red square was also present. They were told to report their decisions as quickly and accurately as possible using their left- and right-arrow keys, with the assignment of responses to keys counterbalanced across participants. There were four conditions of 10 trials each: target present and target absent at array sizes of 15 and 25, with order randomized for each participant. Mean RTs were based on correct responses on both “yes” and “no” trials.

Results and Discussion

Data from two participants in their 20s, two in their 30s, two in their 40s, and one in her 60s were excluded due to low accuracy (<80 %) on at least one of the tasks. RTs two or more standard deviations from a participant’s mean RT (4.5 % of the verbal and 4.1 % of the visuospatial RTs) were removed prior to further analysis. Age was positively correlated with RTs on both the verbal task, r(99) = .37, p < .001, RT = 9.98 × age + 1,000.0, and the visuospatial task, r(99) = .65, p < .001, RT = 15.11 × age + 599.8. A significant interaction between age and domain was observed, F(1, 99) = 5.49, p = .021, η2 = .05, reflecting the fact that, as indicated by the regression slopes, visuospatial RTs increased with age at a rate that was approximately 50 % greater than the rate of increase in verbal RTs.
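The comparison of slopes can be made concrete with a brief calculation based on the regression equations reported above. The sketch below is illustrative only; the dictionary layout and function names are hypothetical, and the coefficients are simply those given in the text.

```python
# Regression coefficients as reported in the text (RT in ms, age in years).
VERBAL = {"slope": 9.98, "intercept": 1000.0}
VISUOSPATIAL = {"slope": 15.11, "intercept": 599.8}


def predicted_rt(age, fit):
    """Predicted mean RT (ms) at a given age from a linear fit."""
    return fit["slope"] * age + fit["intercept"]


def slope_ratio(fit_a, fit_b):
    """Ratio of the rate of increase in RT with age for two tasks."""
    return fit_a["slope"] / fit_b["slope"]


# 15.11 / 9.98 is about 1.51, i.e., roughly 50 % greater slowing per year.
ratio = slope_ratio(VISUOSPATIAL, VERBAL)
```

At age 60, for instance, the fitted lines predict mean RTs of about 1,599 ms on the verbal task and about 1,506 ms on the visuospatial task.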

A question always arises when the age difference on one task is greater than on another: Does this indicate that older adults have a specific deficit on the task associated with the larger difference, or does it reflect a more general phenomenon, the complexity effect (Cerella et al. 1980), in which age differences increase with task difficulty? This question is particularly relevant when RT is the dependent variable, because general slowing produces apparent complexity effects in the absence of specific deficits (Myerson, Adams, Hale, and Jenkins 2003). One way to resolve this issue is to show that one age group can perform both tasks equally well. If so, then any difference in performance of the two tasks that is observed in the other age group cannot reflect a difference in difficulty in a general, age-independent sense of the term, but rather indicates that, for age-related reasons, the tasks differ in difficulty for one, and only one, of the groups (Hale, Myerson, Emery, Lawrence, and DuFault 2007).

In order to use this approach, we again created two extreme groups: a younger group consisting of the 22 participants under the age of 30 (M = 24.7 years, SD = 2.7) and an older group consisting of the 22 oldest participants (M = 61.0 years, SD = 3.4). A 2 × 2 mixed-design ANOVA revealed main effects of age, F(1,42) = 37.87, p < .001, η2 = .47, and domain (verbal vs. visuospatial), F(1, 42) = 19.40, p < .001, η2 = .32. More importantly, there was a significant age by domain interaction, F(1, 42) = 6.33, p = .016, η2 = .13, reflecting the fact that although the older adults’ RTs did not differ on the visuospatial (M = 1,498 ms, SD = 316) and verbal (M = 1,576 ms, SD = 329) tasks, t(21) = 1.16, p = .258, the younger adults were faster on the visuospatial (M = 958 ms, SD = 177) than the verbal task (M = 1,263 ms, SD = 235), t(21) = 5.41, p < .001, d = 1.47.
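The effect size for the younger group can be approximated from the descriptive statistics given above. The text does not say how d was computed; the sketch below assumes the mean difference was divided by the pooled standard deviation of the two tasks, which yields a value matching the reported d = 1.47.

```python
from math import sqrt


def cohens_d_pooled(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using the root mean square of the two standard deviations.

    Assumption: this pooled-SD variant is used here only because it
    reproduces the reported value; the source does not name the formula.
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Younger adults: verbal (M = 1,263, SD = 235) vs. visuospatial (M = 958, SD = 177).
d_young = cohens_d_pooled(1263, 235, 958, 177)  # approximately 1.47
```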

Taken together, the present results, obtained with two new tasks (visual search and double lexical decision), replicate our previous finding that age-related declines in processing speed can be observed in an MTurk sample. They also extend our findings by revealing that such a sample shows greater age-related slowing in the visuospatial domain than in the verbal domain, consistent with many prior laboratory studies (e.g., Hale, Myerson, Faust, and Fristoe 1995; Myerson et al. 2003).

Experiment 3

The goal of our third experiment was to examine the relationships among age, processing speed, and working memory using an Internet sample. A major reason for the interest in age-related slowing in recent years is that processing speed is positively correlated with working memory ability (e.g., Kail and Salthouse 1994), which in turn is strongly correlated with other higher order abilities, including fluid intelligence (Kane and Engle 2002) and complex learning (Tamez, Myerson, and Hale 2012). Not surprisingly, given the relationship between speed and working memory, adults’ working memory declines with age (e.g., Myerson et al. 2003; Salthouse 1996). At issue in the current experiment was whether the established relationships among age, processing speed, and working memory would be observed in an MTurk sample.

Method

Participants

One hundred and twenty-two participants were recruited via MTurk using the same procedure as in Experiment 2, except that participants were told that they could earn $0.60 for 15–18 min of work. All participants resided in the United States and reported proficiency with English.

Tasks

Processing speed was measured using a shape-classification task (Jenkins et al. 2000) in which participants decided whether or not two objects had the same shape. Each object could be a circle, a square, a triangle, or a pentagon, and either the two objects were the same size or one was larger than the other. There were four conditions (same or different shapes crossed with same or different sizes) of 16 trials each. Participants were told to report their decisions as quickly and accurately as possible using the left- and right-arrow keys on their keyboard, with the assignment of responses to arrow keys counterbalanced across participants. Mean RTs were calculated based on all correct responses, and the order of the stimuli was randomized for each participant.

The Letter-Number Sequencing task (adapted from the Wechsler Adult Intelligence Scale, WAIS-IV; Wechsler 2008) was used to assess working memory. Participants were shown a series of alternating numbers and letters at the rate of one item per second, and told to report the numbers in numerical order followed by the letters in alphabetical order. There were 24 series ranging from 3 to 12 items in length, with two at each length. A trial was considered correct if all items were recalled in the correct order. Scores were calculated as the sum of the series lengths of the correct trials.
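The scoring rule for this task can be expressed compactly. The sketch below is a hypothetical illustration, not the scoring code used in the study; it assumes that each item is a single character (a digit or a letter) and that responses are compared without regard to letter case.

```python
def expected_response(series):
    """Digits in numerical order followed by letters in alphabetical order.

    Assumes each item in `series` is a single-character string.
    """
    digits = sorted(item for item in series if item.isdigit())
    letters = sorted(item.upper() for item in series if item.isalpha())
    return digits + letters


def lns_score(trials):
    """Sum of the series lengths of all correctly recalled trials.

    `trials` is a list of (series, response) pairs, each a list of
    single-character strings (hypothetical format).
    """
    score = 0
    for series, response in trials:
        if [item.upper() for item in response] == expected_response(series):
            score += len(series)
    return score
```

For example, a participant who correctly reorders a four-item series and a three-item series but fails a five-item series would receive a score of 4 + 3 = 7.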

Results and Discussion

Data from four participants were excluded because of low accuracy (<80 %) on the processing speed task; seven participants’ data were excluded due to their not understanding the instructions for the working memory task (as evidenced by not reordering the letter/numbers). With respect to the ages of the participants whose data were excluded, four were in their 20s, two in their 30s, four in their 40s, and one in her 60s. Shape-classification RTs two or more standard deviations from a participant’s mean RT (4.2 %) were excluded from the analyses.

Participant characteristics are provided in Table 1, and scatterplots showing the pair-wise relationships among age, processing speed, and working memory are presented in Fig. 2. As was the case in Experiments 1 and 2, processing speed declined (i.e., RTs increased) with age, r(109) = .40, p < .001, as did working memory, r(109) = -.25, p < .007, and slower processing was associated with lower working memory, r(109) = -.26, p < .006. Importantly, the relationship between age and working memory was not statistically significant after controlling for speed, r(108) = -.17, p = .080. The present results indicate that not only can age-related declines in processing speed and working memory be observed in an MTurk sample, but that the relationships among these variables replicate the finding that although age is a predictor of working memory, much of this relationship is mediated by processing speed (e.g., Salthouse 1992; Verhaeghen and Salthouse 1997).
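The partial correlation between age and working memory can be approximated from the three zero-order correlations reported above using the standard first-order formula. The sketch below is illustrative; because it starts from correlations rounded to two decimals, it gives a value of about -.16 rather than exactly the reported -.17.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = age, y = working memory, z = processing speed (RT), as reported.
r_age_wm_given_rt = partial_r(r_xy=-0.25, r_xz=0.40, r_yz=-0.26)
```

The drop in magnitude from -.25 to roughly -.16 reflects the portion of the age and working memory relationship that is shared with processing speed.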

Fig. 2

Scatterplots depicting the relationships among individual participants’ age, processing speed (response time), and working memory in Experiment 3. Lines represent the best fitting linear functions

General Discussion

Taken together, the present results provide strong evidence that online samples recruited through MTurk show age-related declines in processing speed similar to those observed in laboratory samples. Even though the age range was narrower than in most laboratory-based cross-sectional studies, significant slowing was seen in three different samples and on four different processing speed tasks. In addition, the results replicated several benchmark findings: Older adults improved more with practice, but their asymptotic RTs remained longer than those of the younger adults (Experiment 1); greater age-related declines were observed in visuospatial processing speed than in verbal speed (Experiment 2); and although working memory decreased with age, age was not a significant predictor of working memory once processing speed was statistically controlled (Experiment 3).

Previously, it was unclear whether online samples would have too narrow an age range or whether older MTurk workers would turn out to be too highly select a group for typical age-related differences to be observed. Although the age range of those easily recruited via MTurk is currently narrower than in many laboratory-based cross-sectional studies, this range proved quite adequate for replicating established findings. Importantly, older adults recruited online do not appear to be a highly select group with uniquely preserved abilities. Moreover, the age range available for study online is likely to increase in the future as is the representativeness of online samples. The diversity of younger adult participants available online already exceeds that of undergraduate samples in most studies, and as baby boomers (now in their 50s and 60s) get older, computer literacy in older adults is likely to become the rule rather than the exception, and the diversity of older online participants will grow accordingly.

The efficiency of online recruitment and data collection can be an advantage in many research areas, but it is especially advantageous for research with older adults. Whereas cost is often not a factor with younger adults, older participants are typically paid for participating, and MTurk workers are willing to work for much less than researchers typically pay older adults who come to the laboratory. In addition, cross-sectional designs and designs for studies involving structural equation modeling, both of which are often used in aging studies, may require very large samples that multiply not only the cost but also the time and effort involved in doing research. In contrast, the present experiments, which together involved over 350 participants, each required at most several weeks to complete and together took only a few months. Moreover, whereas collecting data from very large samples in the laboratory can require a lab manager and multiple research assistants to schedule appointments and administer the experimental tasks, data collection for the present experiments required only that a single researcher (D. C. B.) monitor the process at his convenience.

There are also, of course, disadvantages to this method of data collection. For example, if researchers conduct related studies online and require that participants be naïve, they may need to check to ensure that their current participants were not in one of their previous experiments. Perhaps the biggest disadvantage is that researchers have less control over the testing environment. For this reason, they may want to break up longer tasks or task batteries so that they can be administered over multiple sessions, allowing participants to perform tasks at their convenience when distractions are minimal. This is not to say that experimental sessions conducted online must always be brief. For example, Bui, Maddox, and Balota (2013; Experiment 2) conducted an experiment online in which participants completed two working memory tasks in the first session that took up to 30 min, and then returned for a second session that lasted up to 45 min. Moreover, what appear to be limitations sometimes have a flip side. In the case of online research, for example, brief experiments may now be practical that previously would not have been worthwhile for either experimenters or participants. As for long experiments, the Internet makes multiple sessions more practical, which may increase the quality of the data by decreasing boredom and fatigue.

In sum, the present findings show that aging research with online samples can yield results comparable to those obtained in the psychological laboratory. Over the next few years, computer-savvy middle-aged adults will become older and those who are already elderly may become more familiar and comfortable with the Internet. As a result, the availability of older workers on MTurk will likely increase, as will the age of the oldest potential online participants. Thus, the effectiveness of aging research conducted with online samples should only grow, adding an increasingly powerful tool to the researchers’ toolkit.