Introduction

In everyday life we often have to make choices without having a clear criterion for which option is better: Choosing what to eat when we only care about whether we eat, which set of purpose-appropriate clothes to pick from our wardrobe, from which lane we want to take a shopping cart when they’re all equally far away and so on. Despite occasional assertions to the contrary,Footnote 1 we make such decisions with ease and swiftly. This type of choice devoid of almost all personal meaning, however, is also often used in laboratories when certain modes of action selection are investigated with so-called free choice tasks.

Free choice tasks

In these tasks, participants are instructed to freely choose one of two (or more) response options that are considered equally correct. For example, consider a task in which whenever an ‘H’ is displayed on a screen, participants are supposed to press either a button to their left or a button to their right. Often, the participants are instructed to avoid obvious patterns in their choices (like left-right-left-right, for example) and to give all response options in equal proportions. We will discuss potential issues with this type of instruction in the subsequent “Free choice and random generation tasks” section, after we have introduced the task and important observations in the following. The experiments reported in this paper address critical aspects following from such instructions.

Starting with Berlyne’s (1957) study, free choice tasks are often used in contrast with forced choice tasks, in which only one response is considered correct to a stimulus. One almost universal observation in the literature is that free choice response times (RTs) are longer than forced choice RTs (but see, e.g., Wirth et al. 2018 for an exception). This RT difference might be taken to indicate qualitative differences with regard to response selection. Accordingly, free choice tasks are often used to operationalize what has been termed self-generated (or intentional, internally generated, intention-based, voluntary, goal-directed) action, while forced choice tasks are often used to operationalize externally triggered (or stimulus-based) action (e.g., Brass and Haggard 2008; Herwig et al. 2007; Passingham et al. 2010; Keller et al. 2006; Waszak et al. 2005). In support of this, there is some evidence that associations between actions and their effects can only be learned in an intention-based action control mode as operationalized with free choice tasks (Herwig et al. 2007; see also; Gaschler and Nattkemper 2012; Herwig and Waszak 2009, 2012; Pfister et al. 2010). However, Pfister et al. (2011) reported that these associations are also learned in forced choice tasks. In addition, there is ample evidence that action effects play a role even when using forced choice tasks (e.g., Gozli et al. 2016; Huffman et al. 2018; Janczyk et al. 2012a, b, 2014, 2017; Kühn et al. 2009, Exp. 3; Kunde 2001; Kunde et al. 2012; Pfister and Kunde 2013; Wolfensteller and Ruge 2011). In sum, it appears that the majority of evidence argues for the same role of action effects in forced and free choice tasks. This conclusion received additional support from other lines of research. For example, Janczyk et al. (2015a) compared both task types with regard to their susceptibility to dual-task interference. While replicating the RT difference in all experiments, no differences in dual-task costs between free and forced choice tasks were observed, again pointing to similar “action control mechanisms” involved in both tasks. In line with this, the RT difference was attributed to a perceptual source in a further study (Janczyk et al. 2015b). Coming from a different perspective, Bermeitinger and Hackländer (2018) observed that response priming effects induced by motion primes affected both free and forced choice tasks similarly.

If, then, both tasks do not differ regarding their response selection mechanisms, it appears helpful to identify further commonalities. As a step toward this, Naefgen et al. (2017) viewed the RT difference through a sequential sampling lens (e.g., Grice 1968). In such a framework, evidence for or against a response option (or more precisely in the context of that study: the desired goal state, that is, the depressing of a left or right response key) is noisily accumulated over time. Once the total amount of this evidence surpasses one of the thresholds, a response is emitted. This results in three theoretically relevant parameters for a choice type: The speed of evidence accumulation, the thresholds for making a choice, and the time not spent accumulating evidence (such as, e.g., time needed for the motor execution of the choice made). Within this framework, Naefgen et al. then asked whether the RT difference can be attributed to differences in the speed of evidence accumulation or to differences outside the accumulation process. To this end, the amount of catch-trials (e.g., Bausenhart et al. 2010) and time pressure (e.g., Dror et al. 1999) were used to manipulate decision thresholds. If differences in evidence accumulation were the reason, the RT difference should become smaller the lower the thresholds. As this was not observed, the cause is likely located in a process different from evidence accumulation, that is, in the non-accumulation time. The present study aims to address the nature of this process and focuses on the generation of random responses as one candidate.

Free choice and random generation tasks

Frith (2013) argued that in free choice tasks, “in essence, the experimenter is asking her subjects to try to be unpredictable and random” (p. 291). He based this argument both on psychological evidence that participants associate randomness and the perception of choices as free (Ebert and Wegner 2011) and on neuroimaging evidence that random choice tasks and free choice tasks activate similar brain regions (Jahanshahi et al. 2000; Jenkins et al. 2000). This becomes even more evident when looking at the similarities between the instructions for free choice tasks and random generation tasks. The former appear in three variants: (1) explicit instructions to choose responses at random, (2) instructions similar to random generation instructions (e.g., avoidance of patternsFootnote 2), and (3) instructions emphasizing spontaneity or freedom of choice. Lastly, there are also studies in which no instruction as to the desired patterns was reported. Examples for these categories can be found in Table 1. Please note that this overview is meant as an illustration, and is not exhaustive. One thing illustrated by Table 1 is the prevalence of instructions to avoid patterns in the free choice responses. One reason for such instructions is that, when they are not given, participants sometimes give responses with only one or almost only one of the response options.

Table 1 Illustrative examples of different instructions for free choice tasks as well as random generation tasks

While this type of instruction could be argued to constrain the choices that participants can give, this is true of all tasks that could feasibly be observed in an experimental laboratory. However, free choice responses are still less constrained than forced choice responses. While free choice instructions and random generation instructions bear similarities, free choice instructions are used this way in the literature on self-generated action and are, as such, worthy of investigation. The next section will discuss the relationship between random generation tasks and how they are affected by working memory (WM) manipulations.

Random generation and working memory

Baddeley reported that random generation performance can be influenced by various factors such as time constraints (Baddeley 1962, as cited in, 1966) or concurrently performed tasks (Baddeley 1966), suggesting that the capacity to create random information is limited in some way. As such, it stands to reason that adding a secondary task that involves WM to the random generation task would interfere with the random generation task. For example, Cooper et al. (2012) used a dual-tasking paradigm in which a random digit (1–9) generation task was coupled either with a 2-back task or a go/no-go task. Indeed, performance in the random generation task as measured through RTs and different indices of randomness was worse when combined with the 2-back task.

Additional evidence for a relationship between WM functions and random generation can be derived from principal component analyses. In particular, Miyake et al. (2000) reported correlations between the executive functions of updating and inhibition with measures of randomness (equality of response usage and inhibition of prepotent associates, respectively) as described by Towse and Neil (1998).

In sum, the literature suggests that WM plays a critical role in random generation tasks. The assessment of randomness will be discussed in the next section.

Measuring randomness

A difference between the aforementioned random generation tasks and free choice tasks is that in free choice tasks there are most often only two response options while for the random generation tasks there were usually nine response options. This renders several ways of how randomness of a choice sequence can be measured less informative. For example, it cannot be measured, as it can be with nine digits, whether two subsequent responses have adjacent values.

As there is a plethora of different measures of randomness (Towse and Neil 1998 alone described 14 different measures in their review), it is necessary to choose which one(s) to use. For the purposes of the present paper, randomness will be measured through the local unevenness (LU) measure (see, e.g., Heuer et al. 2005, 2010). While earlier studies used a more general form of LU, the following description is specific to a two-response-options situation with left and right responses.

In essence, the LU is a measure of the deviation of empirical responses from an ideal random distribution of responses, as measured in running windows of predefined sizes. “Running window” here means that a sequence is divided into all possible sequential sub-sequences of a predefined length and the formula is applied to all of these sub-sequences. For an illustration of what this looks like, see Fig. 1. The formula for the LU in each segment is as follows:

Fig. 1
figure 1

Examples of (average) values of local unevenness in an example sequence for window sizes 2, 4, 6, and 8

$${\text{L}}{{\text{U}}_w}=\sqrt {\frac{{{{({p_{{\text{left}}}} - 0.5)}^2}+{{({p_{{\text{right}}}} - 0.5)}^2}}}{2}} ,$$

where p is the ratio of the respective response option given in the respective window. Because in the case of only two options the two ratios are complementary, this formula can be further simplified to:

$$\text{L}{\text{U}_w}=\frac{{\sqrt {{{(2 \cdot {p_{\text{left}}} - 1)}^2}} }}{2}.$$

The range of values for the LU lies between 0 and 0.5, where 0 means that in the given window, the distribution is perfectly in line with the expected ratios (i.e., both choices are represented equally often, that is completely evenly) and 0.5 means that only one of the two choices is present in the given window (i.e., the sequence is as uneven as possible).

To illustrate, Fig. 1 gives an example sequence of choices and the resulting LUs, for four different window sizes of 2, 4, 6, and 8, as well as the mean LU for the sequence.

For an infinitely long random sequence, the expected mean value of the LU is, however, not 0.0, as this would imply that in every single segment the options are represented equally often, without, for example, any run-ons of the same choice. Instead, it is the average of all the potential combinations of the options when taking the order of the options into account. Figure 2 illustrates the potential response option combinations when using a window of the size 4.

Fig. 2
figure 2

All sequences that can occur for window size 4 and the respective value of local unevenness (LU). The resulting ideal LU value is then 0.1875

This results in an ideal LU of 0.1875, as all these potential sequences have the same chance to appear in a random sequence. The ideal values for the four window sizes mentioned above are 0.25, 0.1875, 0.15625, and 0.1367188 (for window size of 2, 4, 6, and 8, respectively). Mean LUs higher than those ideal values then mean that unbalanced segments were overrepresented in the whole sequence compared to what would be expected in a random sequence. Conversely, mean LUs below those ideal values imply that balanced segments were overrepresented. From this follows that the deviation from these ideal LU values in a sufficiently long sequence can be viewed as a deviation from (ideal) randomness.

The present study

Our prediction is that, if free choice tasks are random generation tasks, WM manipulations should influence randomness (and also response speed) accordingly. We chose a complementary approach of both lowering and increasing WM load. WM support should then increase randomness (and LUs should be closer to ideally random LUs) and decrease RTs, while experimentally induced WM load should have the opposite effects. To achieve a decrease and an increase in WM load we (1) either displayed varying amounts of previous choices to reduce the need for participants to remember their choices (Experiment 1), or (2) introduced a concurrent n-back task of varying difficulty (Experiment 2). We then measured the (non-)randomness of the responses in a free choice task via the distance to the ideal LU and the speed of the responses. While analyses of LU are the theoretically most important ones, we also included the analysis of RTs to exclude any kinds of potential trade-offs. For example, it might be the case that participants change from a focus on more random responses to a focus on faster responses (similar to speed-accuracy trade-offs, where faster responses come with committing more errors). Thus, additionally analyzing RTs makes it possible to rule out such phenomena.

Experiment 1

Experiment 1 used a paradigm in which the participants gave free choice responses while receiving different levels of WM support in the form of arrows that display previous choices (for a similar approach, see Hadland et al. 2001). We used WM support because one potential way WM influences the ease with which participants generate random responses is by providing information (i.e., previous responses) that is used to decide which response would look more ‘random’ if chosen next. We predict that with growing WM support the distance from ideal LU will decrease and the RTs will shorten.

Methods

Participants

Thirty people from the Tübingen area participated for monetary compensation (Mean age = 23 years, 26 female, 4 male). All participants reported normal or corrected-to-normal vision, were naïve regarding the underlying hypotheses, and provided written informed consent prior to data collection.

Apparatus and stimuli

Stimulus presentation and response collection happened on a PC connected to a 17-in. CRT monitor. Stimuli were a fixation circle in the middle of the screen as well as arrows, appearing within the fixation circle and, depending on block type, above it. Stimuli were white, presented against a black background. The manual responses were given with the left and right Ctrl keys on a QWERTZ keyboard.

Tasks and procedure

The task was to freely choose one of the two response options. The fixation circle was always visible during blocks slightly below the middle of the screen. After a response, an arrow indicating which response was given in the current trial appeared for 50 ms in the fixation circle. During these 50 ms, no new response could be given. In the two block types with WM support, the same arrow then appeared above the fixation circle, shifting all other already displayed arrows one slot upwards and, once three/seven responses were already given, displacing the oldest arrow at the top of the screen. This results in up to three or seven arrows indicating previous choices that are displayed above the fixation circle, as is illustrated in Fig. 3. The 50 ms in which no new response could be given were the only inter-trial interval. There was no time limit for responses.

Fig. 3
figure 3

Examples of the different WM support conditions. In the left panel, no WM support is given, in the middle panel three previous choices are displayed, and in the right panel seven previous choices are displayed. Not visible here are the arrows that appear within the circle for 50 ms after a response

Responses were collected in blocks of 500 trials with every participant performing all three block types twice, that is, in a total of six blocks. The order of the first three blocks was counterbalanced and the second set of three blocks was ordered in the reverse of the first three blocks. Participants were informed before each block how many of their previous choices would be displayed in this block.

Participants were instructed to give about equal amounts of left and right responses and to avoid patterns (e.g., alternating left and right responses or repeating sequences). There was one test session per participant which lasted about 45 min.

Design and analyses

The dependent variables were the distances from the ideally random LU (LUD) and the RTs. The independent variable was the level of WM support (0 vs. 3 vs. 7). For analyses of LUDs, however, we also analyzed four different window sizes (2 vs. 4 vs. 6 vs. 8). Accordingly, two main analyses were performed: LUDs were analyzed with a 3 × 4 analysis of variance (ANOVA) with WM support and window size as repeated-measures. RTs were analyzed with an ANOVA with WM support as a repeated-measure. Because we predicted decreasing RTs and LUD approaching zero with increasing WM support, we calculated Helmert contrasts on WM support (Contrast 1: no support vs. three and seven previous displayed choices; Contrast 2: three vs. seven displayed previous choices). In case of interactions between window size and the Helmert contrast, separate Helmert contrasts for each window size were calculated and are reported in the “Appendix” section.

LUDs were calculated on the whole data set once sufficient responses were given for the respective window size. For the subsequent analyses, trials were excluded as outliers if their RTs deviated more than 2.5 SDs from the respective cell mean (calculated separately for each participant).

Results

The LUDs and average RTs (1.79% outliers) are visualized in Fig. 4 and are summarized in Table 2. For LUDs, Contrast 1 was significant and indicated a difference between conditions with and without memory support, t(29) = 3.79, p = .001, without interacting with window size, t(29) = 1.70, p = .100. However, there was no significant difference between the two memory support conditions according to Contrast 2, t(29) = 0.36, p = .551. While this contrast interacted with window size, t(29) = 2.68, p = .012, when tested separately, all contrasts were not significant, all ps ≥ 0.217 (for more details, please see the “Appendix” section).

Fig. 4
figure 4

Mean distances to ideal local unevenness (LUDs) (for the window sizes 2, 4, 6, and 8) and response times (RTs) in Experiment 1 for each level of working memory (WM) support. Error bars are 95% within confidence intervals (separate for all window sizes in case of the LUDs) (Loftus and Masson 1994)

Table 2 Means (and SD) of RTs in ms and LUDs for Experiment 1 for each WM support condition

Responses were significantly slower in the condition without WM support compared with the two other conditions, Contrast 1: t(29) = 2.63, p = .013, but there was no significant difference between the two WM support conditions, Contrast 2: t(29) = − 0.14, p = .886.

Discussion

In sum, response patterns were more random and RTs shortened with the presence of WM support. No such difference was detectable between the different levels of WM support. These results can be taken as first evidence that WM plays a similar role in free choice tasks as it does for random generation tasks.

There is one potential confound in this particular experimental design: The presence of the arrows employed as WM support can be interpreted as a type of action effect (or action outcome), which conceivably differs between the no-support and the two support conditions. Furthermore, the last presented arrow was always spatially compatible with the selected response. Importantly, RTs are shorter when the responses produce compatible action effects compared with incompatible ones (Kunde 2001; see also; Janczyk and Lerche 2018; Janczyk et al. 2017; Koch and Kunde 2002). At first glance, this might have contributed to the shorter RTs in the two WM support conditions. However, we believe that this argument does not pose serious problems for several reasons. First, it is important to note that in all conditions an immediate and compatible arrow appeared in the center of the fixation circle. Second, in the two WM support conditions, always multiple arrows were present on the screen. Thus, there would most of the time (unless the participants repeated responses multiple times) be a mixture of compatible and incompatible action effects be present what would weaken a potential impact on RTs. Third, the RT difference we observed (roughly 70 ms) is larger than the usual effects of action effect compatibility (e.g., between 20 and 50 ms in Kunde 2001). Hence, if this confound played a role in the RT results, it likely would account only for a part of the difference. Lastly, and potentially most important, it is not clear how the theoretically more important LUD results would be affected by compatible or incompatible action effects.

A further objection might be that the presence of the previous choices on the screen turned the free choice task into a “cue-dependent task”. Of course, we cannot exclude that participants’ used different strategies between conditions. It is the case, though, that the information about the previous choices were actually always available to the participants in form of a memory trace. The presence of the WM support arrows merely made it more accessible.

To attain more and converging evidence from a different kind of experimental manipulation, we experimentally increased WM load through an n-back task in Experiment 2.Footnote 3

Experiment 2

In Experiment 2, we paired a free choice task with a WM-intensive task to induce WM load. Specifically, we alternated a free choice task with an n-back task for this purpose (Kirchner 1958). In all n-back conditions, participants had to react only under specific circumstances: For 0-back, whenever a stimulus (colored circles that were displayed left/right and above/below center on the screen) with a pre-specified color or location appeared, and for 1-, 2-, and 3-back whenever the stimulus color or location in a given trial matched that n trials ago. The two relevant stimulus features (color vs. location) were chosen to generalize the results and counteract potential modality-specific influences. Furthermore, this experiment completely avoids the potential confound of compatible action effects from Experiment 1. Conversely to the previous experiment, we predict that with an increasing WM load, the LUDs should deviate more from zero and the RTs should increase.

Methods

Participants

Thirty-two people from the Tübingen area participated for monetary compensation or course credit (Mean age = 24 years, 22 female, 10 male). All participants reported normal or corrected-to-normal vision, were naïve regarding the underlying hypotheses, and provided written informed consent prior to data collection.

Apparatus and stimuli

Stimulus presentation and response collection happened on a PC connected to a 17-inch CRT monitor. Stimuli were a white fixation cross in the middle of the screen, circles that could be red, green, blue, and yellow and that could appear in the top left, top right, bottom left, and bottom right location of the screen as well as a white double-headed arrow. Stimuli were presented against a black background. The responses were given with the left and right Ctrl keys on a QWERTZ keyboard (free choice task) and foot pedals placed under the feet of the participants (n-back task).

Tasks and procedure

Participants performed two tasks in alternation (for an illustration, see Fig. 5). In the free choice task, they were to freely choose one of the two manual response options in response to the appearance of the double-headed arrow. In the n-back task they were to compare the current stimulus with a specific one or one that occurred n-trials back. A trial (with both tasks) started with the appearance of a fixation cross for 250 ms, followed by a blank screen for 250 ms, followed by the appearance of an n-back stimulus for up to 1500 ms or until a response was given, followed by another fixation cross and blank screen, which in turn was followed by the double-headed arrow appearing for 1500 ms or until a response was given. After this, the inter-trial interval was 250 ms.

Fig. 5
figure 5

a Example of a sequence of displayed stimuli and fixation crosses on the screen. b Example of a sequence of n-back stimuli (free choice stimuli not displayed). In the color-based 2-back condition, only panel (4) would require a response, while in the location-based 2-back condition, panel (3) would require a response

n-back level was manipulated block-wise. Every level appeared a total of four times, with the task either requiring attention to the color or the location of the stimulus. A participant performed all levels of the color n-back blocks first or all of the location n-back tasks first, followed by the other block-type. The order of the block-types was balanced according to a Latin square for the first half and then mirrored for the second half of the experiment, in which the order of the color and location base of the task from the first half was repeated.

Participants performed in 16 blocks (four n-back levels, two modalities, repeated twice) of 61 responses each for n-back conditions 1, 2, and 3, and 60 responses each for the 0-back condition. They were informed before each block which criterion needed to be fulfilled for the n-back task in order to press the foot pedal and which foot pedal to use. Half of the participants used the left foot pedal in the first half of the experiment and the right foot pedal in the second half and vice versa for the other half of the participants. The criterion was fulfilled when either a specific color or a specific location appeared for the 0-back task, or when the color/location in the current trial matched the color/location 1, 2, or 3 trials before the current one. The course of a trial as well as an example for the n-back task is illustrated in Fig. 5.

Participants were instructed to give about equal amounts of right and left responses and to avoid patterns (e.g., alternating left and right responses or repeating sequences) in the free choice task. There was one test session per participant which lasted about 45 min. In cases where the distribution of free choice responses skewed too far in one direction (> 80%) data of the participant was discarded and new data collected (1 case).

Design and analyses

As in Experiment 1, the two dependent measures were the LUDs and the RTs. The independent variables were the WM load condition (0 vs. 1 vs. 2 vs. 3 in the n-back task). For analyses of LUDs we again analyzed four different window sizes (2 vs. 4 vs. 6 vs. 8). Accordingly, two main analyses were performed: LUDs were analyzed with a 4 × 4 ANOVA with WM load and window size as repeated-measures. RTs (and error rates for the n-back task) were analyzed with an ANOVA with WM load as a repeated-measure. Because we predicted LUDs increasingly deviating from zero with increasing WM load and increasing RTs (and error rates in the n-back task), we calculated Helmert contrasts on WM load (Contrast 1: 0-back vs. higher difficulties; Contrast 2: 1-back vs. higher difficulties; Contrast 3: 2-back vs. 3-back). In case of an interaction between window size and the Helmert contrast, separate Helmert contrasts for each window size were calculated and are reported in the “Appendix” section.

LUDs were calculated on the whole data set once sufficient responses were given for the respective window size. For the subsequent analyses, trials were excluded as outliers if their RTs deviated more than 2.5 SDs from the respective cell mean (calculated separately for each participant).

Results

In a preliminary analysis, we included the relevant stimulus feature (location vs. color; 2.86% outliers based on free choice RTs). With LUDs as the dependent variable, this additional variable did not yield a significant main effect, F(1, 31) = 3.94, p = .056, ηp² = 0.11, and the three-way interaction WM load × window size × stimulus feature was also not significant, F(9, 279) = 0.96, p = .421, ηp² = 0.03, ε = 0.36 (Greenhouse–Geisser estimate). With RTs as the dependent variable, also no significant main effect was observed, F(1, 31) = 3.75, p = .062, ηp² = 0.11, and the interaction WM load × stimulus feature was also not significant, F(3, 93) = 2.74, p = .063, ηp² = 0.08, ε = 0.78. To simplify the main analyses, we thus dropped this variable from further analyses.

Manipulation check: performance in the n-back task

We excluded 2.35% of trials as outliers (based on only the n-back task), and descriptive statistics are provided in Table 3. Contrast 1 yielded no significant result for the RTs, t(31) = 1.91, p = .065, but did yield a significant result for the error rates, t(31) = 12.60, p < .001. Contrast 2 was significant for both the RTs, t(31) = 2.35, p = .025, and the error rates, t(31) = 13.51, p < .001. Contrast 3 also was significant for both the RTs, t(31) = 2.19, p = .036, and the error rates, t(31) = 13.10, p < .001. Thus, the n-back task induced a load as expected.

Table 3 Means (and SDs) of RTs in ms, LUDs, and percentage of errors (PE) for Experiment 2 for each WM load condition

Free choice task

Average LUDs and RTs (2.81% outliers) for the free choice task are visualized in Fig. 6 and are summarized in Table 3. LUDs increased with WM load for all window sizes and all contrasts were significant and in the same direction. Contrast 1 was significant, t(31) = 3.42, p = .002, but it interacted with window size, t(31) = 3.36, p = .002. Contrast 2 was also significant, t(31) = 3.15, p = .004, and it interacted with window size, t(31) = 2.41, p = .022. Finally, Contrast 3 was significant, t(31) = 3.835, p = .001, but did not interact with window size, t(31) = 1.65, p = .110. Note, however, that the descriptive pattern was the same for all window sizes despite the interactions (for more details on separate analyses per window size, please see the “Appendix” section). For RTs, Contrast 1 was not significant, t(31) = 1.68, p = .104, but RTs increased for the following levels and both Contrast 2, t(31) = 4.94, p < .001, and Contrast 3, t(31) = 2.33, p = .026, were significant.

Fig. 6
figure 6

Mean distances from ideal local unevenness (LUDs) (for the window sizes 2, 4, 6, and 8) and response times (RTs) in the free choice task in Experiment 2 for each level of working memory (WM) load. Error bars are 95% within confidence intervals (separate for all window sizes in case of the LUDs) (Loftus and Masson 1994)

Discussion

In summary, for the critical analyses, all contrasts were in the predicted direction and significant except for the difference in RTs and LUDs for the window size 2 for the contrast between the 0-back condition and higher n-back conditions. Thus, in line with our predictions, randomness (and RTs) in the free choice task decreased with increasing WM load. These results again suggest that free choice tasks are similar to random generation tasks.

General discussion

In this study, our participants performed free choices combined with either a WM support manipulation (Exp. 1) or a WM load manipulation (Exp. 2). This was done to investigate whether the impact of these manipulations on the patterns in free choices is the same as the impact of such manipulations on random generation tasks. To support WM, we displayed the previous choices that the participants made. To increase WM load, we used a concurrent n-back task.

Summary of results

In both experiments, the direction of the results was consistent with the idea that free choice tasks are related to random generation tasks: overwhelmingly, lack of WM support as well as higher WM load led to responses in which the LUDs were farther away from what would be expected in a random sequence of choices, which were also slower. More specifically, the absolute values of the LUDs moved in a more positive direction with less WM support/more WM load, suggesting that the proportion of sequences that are less balanced increased (e.g., L-L-R-L, R-R-R-R for a window size of 4).

Theoretical implications

In light of these results, we tentatively suggest that free choice tasks (as they are used in contemporary research) are at the very least related, if not outright identical, to random generation tasks, giving support to ideas expressed, for example, by Frith (2013) or Schüür and Haggard (2011): that free choice tasks are not what they are often thought to be. Frith claimed that free choice tasks are essentially random generation tasks, while Schüür and Haggard claimed that free choice tasks are either underdetermined or determined by uncontrolled internal cues like the preceding choices. Both ideas are compatible with our results: Hindering the maintenance of a memory trace of previous choices leads to responses that are ‘less random’ and also slower.

This potentially has wide-reaching implications for the literature on self-generated action. Assume that free choice tasks are in fact random generation tasks (as our results suggest). At least two cases can be distinguished: First, if one commits to the idea that self-generated-ness of actions must exclude all aspects of random generation, our results imply that free choice tasks do not operationalize self-generated action. Second, and in contrast, if one assumes that randomness is an inherent component of self-generated actions, then unfortunately the role of free choice tasks is even more unclear, because even without any extra cognitive load it is unclear whether the resulting sequence of actions is truly random.Footnote 4 This assumption seems not be universal among researchers though. Passingham et al. (2010, p. 18), for example, mention as a condition for self-generated action that “One action can serve as a cue for the next action”. In other words, one action is not independent from previous actions.

Limitations

An intrinsic limitation for every investigation into randomness of responses is the requirement to choose which measure of randomness to use. This choice effectively determines which kinds of patterns can be detected. It is always possible though that participants chose a different non-random production strategy for choosing responses that the researchers in question did not take into account. In our case, we chose a measure of (non-)randomness that essentially measures the proportion of different levels of balancedness in the response strings and whether they skew more towards balanced or unbalanced strings. Two weaknesses of this measure are that it cannot detect the order of responses within one window nor their identity. The string L-L-R-L looks, from a LU perspective, the same as the strings R-R-L-R and L-R-L-L: all result in LU = 0.25. However, this limitation would only pose serious problems if we had observed no differences in randomness between our different conditions.

Another issue is that WM manipulations affect the RTs of tasks involving higher cognitive processes of any kind. This makes a pure RT analysis not diagnostic with regard to whether a task is a random generation task. However, there is no reason to assume that the detected randomness of a task that is not a random generation task would suffer from WM load or benefit from WM support. This supports the interpretation of the present results as indicative of free choice tasks being random generation tasks. While this interpretation relies on drawing an analogy between free choice tasks and random generation tasks, we can at present only speculate about the specific mechanisms behind our results. One example of a plausible candidate mechanism known from the random generation literature, is the inhibition of prepotent associates (e.g., Towse and Neil 1998). Easier monitoring of ongoing choices could make it easier to identify and suppress these stereotypical responses (e.g., fewer repetitions than would be appropriate for a random sequence).

Conclusion

We investigated whether LU, as a measure of randomness, based on responses from free choice tasks and RTs in this task is affected by WM support and load in a similar way as random generation tasks are. In short, we observed that they are and conclude that free choice tasks are related to or identical to random generation tasks. This potentially casts doubt on some types of investigations into self-generated action. The present study also provides evidence that random (response) generation is one of the processes that contribute to the mean RT difference between free and forced choice tasks, a difference that was tentatively attributed to the non-accumulation time by Naefgen et al. (2017). It is an open question whether this is the full extent of what makes up this difference or if there are other, additional processes that differentiate free and forced choice tasks.