What’s in a name: The role of verbalization in reinforcement learning

Schaaf, Jessica V.; Johansson, Annie; Visser, Ingmar; Huizenga, Hilde M.

doi:10.3758/s13423-024-02506-3

Brief Report
Open access
Published: 20 May 2024

Psychonomic Bulletin & Review

Jessica V. Schaaf (ORCID: orcid.org/0000-0002-4856-9592)^1,2,3,na1,
Annie Johansson^1,na1,
Ingmar Visser^1,4,5 &
…
Hilde M. Huizenga^1,4,5

Abstract

Abstract (e.g., characters or fractals) and concrete stimuli (e.g., pictures of everyday objects) are used interchangeably in the reinforcement-learning literature. Yet, it is unclear whether the same learning processes underlie learning from these different stimulus types. In two preregistered experiments (N = 50 each), we assessed whether abstract and concrete stimuli yield different reinforcement-learning performance and whether this difference can be explained by verbalization. We argued that concrete stimuli are easier to verbalize than abstract ones, and that people therefore can appeal to the phonological loop, a subcomponent of the working-memory system responsible for storing and rehearsing verbal information, while learning. To test whether this verbalization aids reinforcement-learning performance, we administered a reinforcement-learning task in which participants learned either abstract or concrete stimuli while verbalization was hindered or not. In the first experiment, results showed a more pronounced detrimental effect of hindered verbalization for concrete than abstract stimuli on response times, but not on accuracy. In the second experiment, in which we reduced the response window, results showed the differential effect of hindered verbalization between stimulus types on accuracy, not on response times. These results imply that verbalization aids learning for concrete, but not abstract, stimuli and therefore that different processes underlie learning from these types of stimuli. This emphasizes the importance of carefully considering stimulus types. We discuss these findings in light of generalizability and validity of reinforcement-learning research.

Introduction

In reinforcement-learning studies, people learn which stimulus yields the highest reward. These stimuli are usually abstract, such as foreign language characters (e.g., Daw et al., 2011; Frank et al., 2004; Pessiglione et al., 2006) or fractals (e.g., Gläscher et al., 2010). To study reinforcement-learning processes in developmental and aging populations, abstract stimuli are often replaced by concrete pictures of, for example, everyday objects (e.g., Eppinger et al., 2009; Eppinger & Kray, 2011; van de Vijver et al., 2015; van den Bos et al., 2009; Xia et al., 2021). This raises the question of whether the same learning processes underlie reinforcement learning of abstract and concrete stimuli. In the current preregistered study, we therefore tested whether abstract and concrete stimuli yield different reinforcement-learning performance, and whether potential differences are due to verbalization.

A recent reinforcement-learning study (Farashahi et al., 2020) showed superior learning for concrete as compared to abstract stimuli. However, the mechanism underlying this superior learning remains understudied. A potential mechanism is verbalization, that is, naming stimuli while learning, as it is hypothesized to modulate otherwise non-verbal cognitive processes (Kray et al., 2015; Lupyan, 2012). Specifically, verbalization may aid reinforcement-learning performance as it helps keep information in working memory. According to Baddeley’s classic working-memory model (1986; for a more recent review, see Baddeley & Hitch, 2019), a part of the working-memory system, called the phonological loop, temporarily stores verbal information and rehearses this information through inner speech. If stimuli are easier to encode phonologically, it is easier to appeal to this phonological loop while learning. We argue that concrete stimuli are easier to name than abstract ones, which makes them easier to encode phonologically, subsequently resulting in better reinforcement-learning performance for concrete than for abstract stimuli.

In support of this idea, there is ample evidence for a general role of working memory in reinforcement learning, that is, that people tend to rely on working memory (as opposed to associative reinforcement learning) when the number of things to learn falls within their working-memory capacity (e.g., Collins, 2018; Collins & Frank, 2012). In addition, experimental studies have already shown that verbalization can aid performance in various (otherwise non-verbal) cognitive processes, including working memory (Forsberg et al., 2020; Souza & Skóra, 2017), category learning (Lupyan et al., 2007; Lupyan & Casasanto, 2015; Minda & Miles, 2010; Vanek et al., 2021; Waldron & Ashby, 2001; Zeithamova & Maddox, 2006; Zettersten & Lupyan, 2020), and motor learning (Gidley Larson & Suchy, 2015). Two recent studies (Radulescu et al., 2022; Yoo et al., 2023) addressed whether verbalization can also aid reinforcement learning. Yoo and colleagues (2023) showed that people performed worse in a condition in which concrete pictures represented the same object than in a condition in which the pictures represented different objects, concluding that verbal discriminability (i.e., distinguishable stimulus names) is particularly important for learning. Similarly, Radulescu and colleagues (2022) showed that people performed worse in a condition in which stimuli were difficult to verbalize than in a condition in which they were easy to verbalize (see also Waltz et al., 2007). Both studies drew conclusions about the effects of verbal processes on learning based on indirect measures of verbalization, that is, they relied on the assumption that verbalization was affected differently in different conditions. We implemented a direct measure of the effects of verbalization on learning. Specifically, we adopted a dual-task design in which people learned abstract and concrete stimuli while verbalization was either hindered or unhindered.
Such a dual-task design allows one to assess whether a certain cognitive process plays a larger role in one condition than in another (Pashler, 1994). As we were specifically interested in whether verbalization plays a larger role when learning concrete stimuli than when learning abstract stimuli, we adopted a dual task that suppresses verbalization, that is, we let participants count to three. Doing this while learning, a procedure commonly applied when investigating the effects of verbalization (Nedergaard et al., 2023), suppresses participants’ ability to rehearse verbal information in their phonological loop (e.g., Baddeley & Larsen, 2007; Miyake et al., 2004), precluding them from using verbalization to aid learning.

Experiment 1

We primarily hypothesized an interaction effect between stimulus type and verbalization condition on accuracy, that is, that the detrimental effect of hindered verbalization would be more pronounced for concrete compared to abstract stimuli. In addition, we expected main effects of both stimulus type and verbalization condition on accuracy, that is, better learning for concrete compared to abstract stimuli (Farashahi et al., 2020) and better learning in the unhindered verbalization condition because it only requires performing a single task (Pashler, 1994). We also expected to observe these effects in interaction with trial, that is, that hindered verbalization would especially lead to slower learning for concrete stimuli (stimulus type x verbalization condition x trial interaction), that learning would be faster for concrete than abstract stimuli (stimulus type x trial), and that learning would be faster in the absence compared to the presence of the verbalization task (verbalization condition x trial).

Method

Preregistration

All procedures and analyses were preregistered within the Open Science Framework as Reinforcement learning of abstract vs concrete stimuli (https://osf.io/qwu3g). These analyses are labeled confirmatory analyses. Any other analyses are considered exploratory, and are specified as such. Data and analysis code are freely available at https://osf.io/w9fv4/.

Participants

A power analysis for the multilevel logistic regression on accuracy (Olvera Astivia et al., 2019) with medium effect sizes for the main effects (i.e., 0.5) and a small effect size for the interaction of interest (i.e., 0.25) indicated a required sample size of 50 to detect the crucial interaction between stimulus type and verbalization condition with a power of 0.9. As such, a total of 68 participants were recruited through the University of Amsterdam. A total of 18 participants were excluded, either because they did not perform the verbalization task correctly (forgetting to count on more than six beats in a row or on more than 25 beats in total, as checked by the experimenter present during testing; n = 16) or because of technical failures (n = 2). We did not exclude any participants based on their learning performance because we anticipated that if hindered verbalization affected learning of abstract stimuli, performance in this condition could drop to chance level. Thus, the final sample consisted of 50 participants (24 female, one other, M_age = 21.5 (3.0) years, range: 18–33 years). We only recruited participants without experience with the Hiragana alphabet or character-based languages to minimize individual differences in the ability to verbalize these abstract stimuli. We reimbursed participants when they completed the long-term retention task within 36 h after testing (see section Reinforcement-learning task) and, as preregistered, performed analyses on data from these participants (n = 44).
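The exclusion rule for the verbalization task can be sketched as a simple check over a per-beat record. This is only an illustration: in the study the check was made by the experimenter, and the function name `should_exclude` and the list-of-booleans input format are assumptions, not part of the original procedure.

```python
def should_exclude(counted, max_run=6, max_total=25):
    """Return True if a participant meets the exclusion rule.

    `counted` is a hypothetical record with one boolean per metronome beat
    (True = the participant counted aloud on that beat).
    Exclusion: more than `max_run` missed beats in a row, or more than
    `max_total` missed beats in total.
    """
    missed_total = 0
    longest_run = 0
    current_run = 0
    for ok in counted:
        if ok:
            current_run = 0
        else:
            missed_total += 1
            current_run += 1
            longest_run = max(longest_run, current_run)
    return longest_run > max_run or missed_total > max_total
```

Note that exactly six consecutive misses, or exactly 25 misses in total, do not trigger exclusion under this reading of "more than".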

Reinforcement-learning task

Experimental design

We adopted a 2 x 2 within-subjects design in which we manipulated stimulus type (i.e., abstract vs. concrete) and verbalization condition (i.e., hindered vs. unhindered). Each participant performed one block of each combination (four blocks in total) in a randomized order with one constraint: only one of the manipulations changed between two subsequent blocks. For example, an abstract block with verbalization task was followed by either a concrete block with verbalization task (different stimulus type) or an abstract block without verbalization task (different verbalization condition). Each block was followed by a testing block (short-term retention). After 24–36 h participants again performed a testing block (long-term retention).
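The block-order constraint can be made concrete with a short sketch that enumerates all orders of the four blocks and keeps those in which exactly one manipulation changes between subsequent blocks. The function names and the use of Python are illustrative assumptions; the source does not describe how the orders were generated.

```python
import itertools
import random

# The four blocks of the 2 x 2 design: (stimulus type, verbalization condition).
CONDITIONS = list(itertools.product(("abstract", "concrete"),
                                    ("hindered", "unhindered")))


def changes_one_factor(a, b):
    """True if blocks a and b differ on exactly one manipulation."""
    return sum(x != y for x, y in zip(a, b)) == 1


def valid_block_orders():
    """All orders in which only one manipulation changes between blocks."""
    return [order for order in itertools.permutations(CONDITIONS)
            if all(changes_one_factor(a, b) for a, b in zip(order, order[1:]))]


def draw_block_order(rng):
    """Randomly pick one of the valid orders for a participant."""
    return rng.choice(valid_block_orders())
```

Under this constraint, 8 of the 24 possible block orders remain, for example abstract-hindered, concrete-hindered, concrete-unhindered, abstract-unhindered.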

Task design

As illustrated in Fig. 1, in each learning block, we presented participants with four new stimulus pairs from which they learned the stimulus with the highest expected value, that is, the correct stimulus, based on feedback. Specifically, we instructed participants that their goal was “to win as many points as possible by clicking on one of the stimuli” and told them that the more points they earned, the higher the monetary bonus they would receive. The stimuli in a pair were fixed across the trials of a block and each pair was presented 16 times (64 trials per block, four blocks, resulting in a fixed total of 256 trials for all participants). The order of the pairs was randomized per four trials such that pairs were presented a maximum of twice in a row. In two of the four blocks (i.e., the unhindered verbalization blocks), participants heard a metronome (80 bpm) but did not have to say anything. In the other two blocks (i.e., the hindered verbalization blocks), we instructed participants to say “1, 2, 3” repeatedly out loud on the beat of the metronome during learning. This articulatory-suppression manipulation hinders verbalization by occupying the phonological loop (Baddeley et al., 1984; Emerson & Miyake, 2003).
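One way to read "randomized per four trials" is that every consecutive chunk of four trials contains each pair once in shuffled order. The sketch below implements that reading (an assumption about the implementation, as are the function names); it shows why a pair can then appear at most twice in a row, namely only across a chunk boundary.

```python
import random


def pair_sequence(n_pairs=4, n_repeats=16, rng=None):
    """Build a block's pair order: n_repeats chunks, each a shuffle of all pairs."""
    rng = rng or random.Random()
    sequence = []
    for _ in range(n_repeats):
        chunk = list(range(n_pairs))
        rng.shuffle(chunk)
        sequence.extend(chunk)
    return sequence


def longest_run(seq):
    """Length of the longest run of identical consecutive items."""
    best = run = 0
    prev = object()  # sentinel that equals no pair index
    for item in seq:
        run = run + 1 if item == prev else 1
        best = max(best, run)
        prev = item
    return best
```

With the default arguments, a block has 64 trials, each pair occurs 16 times, and `longest_run` never exceeds 2.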

Immediately after each learning block, a testing block followed. In this testing block, the same pairs as in the learning block were presented, four times each (randomized per four trials). Participants were asked to indicate the correct stimulus (formulated as: “Please click on the stimulus you think usually resulted in winning 10 points”), but did not receive any feedback on their choice. As such, performance in testing blocks did not count towards their reimbursement. The testing block was self-paced (i.e., without response deadline). After 24–36 h, participants again performed a testing block, but now with all 16 pairs (each presented four times, randomized per 16 trials).

Practice

Before each learning block, participants practiced the new stimulus type and verbalization condition combination. In this practice block, two stimulus pairs were presented for eight trials each (totaling to 16 trials).

Stimuli

We selected stimuli that are commonly used in reinforcement-learning studies because we aimed to uncover the potential role of verbalization in such studies. As abstract stimuli, we used characters from the Hiragana alphabet (for examples, see Frank et al., 2004; Hämmerer et al., 2011; Simon et al., 2010). As concrete stimuli, we used pictures of everyday objects [Footnote 1] (for examples, see Eppinger et al., 2009; Eppinger & Kray, 2011; van de Vijver et al., 2015; van den Bos et al., 2009; Xia et al., 2021) from the MultiPic database (Duñabeitia et al., 2018); we only considered stimuli with average visual complexity and one-syllable names in both English and Dutch. To only select stimuli with similar verbalizability and to assess whether the abstract stimuli were indeed harder to verbalize than the concrete ones, we conducted a pilot study in which we asked participants to come up with a name for the stimuli and to indicate how difficult it was to do so. To select stimuli that were similarly difficult to discriminate, we also asked participants to rate how similar they found the stimuli in a pair. For details on this stimulus selection procedure and pilot results, we refer to Online Supplemental Material (OSM) Text I.

Feedback

In all learning blocks, choosing one stimulus of a pair, which we term the correct stimulus, usually led to winning 10 points (75% of trials) and only sometimes to losing 10 points (25% of trials). Choosing the other, incorrect stimulus usually led to losing 10 points (75% of trials) and only sometimes to winning 10 points (25% of trials). Which stimulus was correct was determined randomly per participant.
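The feedback rule implies expected values of +5 and -5 points per choice for the correct and incorrect stimulus, respectively. The following sketch shows the arithmetic and a simulated draw; the function and constant names are hypothetical, and the source does not specify how feedback was sampled (for example, independently per trial, as assumed here).

```python
import random

P_USUAL = 0.75   # probability of the "usual" outcome for a stimulus
POINTS = 10      # points won or lost on each trial


def feedback(chose_correct, rng):
    """Draw one outcome: the usual outcome with probability 0.75, else its opposite."""
    usual = POINTS if chose_correct else -POINTS
    return usual if rng.random() < P_USUAL else -usual


def expected_value(chose_correct):
    """Expected points per choice under the 75%/25% schedule."""
    usual = POINTS if chose_correct else -POINTS
    return P_USUAL * usual + (1 - P_USUAL) * (-usual)
```

For the correct stimulus this gives 0.75 x 10 + 0.25 x (-10) = 5 points, and for the incorrect stimulus -5 points.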

Procedure

Participants were tested individually in a lab cubicle. They sat in front of a laptop with mouse and keyboard and received on-screen instructions about the learning and short-term retention tasks. An experimenter was always present during testing to check whether the participant performed the verbalization task (i.e., saying “1, 2, 3” on the beat of the metronome) correctly. This on-site experiment took approximately 30 min. At the end of the on-site experiment, participants saw the bonus they earned on the screen, were asked how they experienced the experiment (both on-screen and by the experimenter), and were informed about the long-term retention task. After 24 h, participants received a link to this task via email. They were instructed to complete it at home within 12 h (i.e., 24–36 h after the learning task). If participants did not complete the long-term retention task in time, they received €5 or course credits as reimbursement. If participants did complete the task in time, they received the reimbursement plus a bonus equal to the number of points won in the learning task divided by the number of trials (i.e., 256). This resulted in a bonus between €0 and €10 (M_bonus = €1.39 (€0.95)). We told participants that winning more points would lead to a higher reimbursement, but they were unaware of the formula used to convert points into money. After completing the learning task, they were informed that bonus money would only be paid out when the long-term retention task was completed in time.
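The conversion from points to bonus money can be illustrated as follows. The division by 256 trials comes from the text; the floor at €0 for negative point totals is an assumption made here to match the reported range of €0 to €10, and the function name is hypothetical.

```python
N_TRIALS = 256  # total number of learning trials per participant


def bonus_euros(points_won, n_trials=N_TRIALS):
    """Bonus in euros: points divided by number of trials.

    Assumption (not stated in the source): negative totals are floored at 0.
    """
    return max(0.0, points_won / n_trials)
```

Winning on every trial would yield 2560 points and thus the maximum bonus of €10.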

Data analyses

Learning

To test whether abstract and concrete stimuli yield different reinforcement-learning performance and whether potential differences are due to verbalization, we performed a multilevel logistic regression analysis on trial-by-trial choice accuracy in the learning blocks of the reinforcement-learning task; we did so using the glmer function from the lme4 package (Bates et al., 2015). We modeled fixed effects of stimulus type (abstract (coded as -1) versus concrete (coded as 1)), verbalization condition (hindered (coded as -1) versus unhindered (coded as 1)), trial (linear, centered, such that all effects excluding trial are estimated in the middle of learning), and all two- and three-way interactions, as well as random intercepts and random slopes for the main effects. We fixed covariances between random effects to zero.
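The coding of the fixed effects can be sketched for a single trial as shown below. This only illustrates the effect codes, the centering of trial, and the interaction terms; the model itself was fitted with glmer in R. The sketch assumes that trials are numbered 1 to 64 within a block, which the text does not state explicitly, and the names used are hypothetical.

```python
STIMULUS_CODE = {"abstract": -1, "concrete": 1}
VERBALIZATION_CODE = {"hindered": -1, "unhindered": 1}


def fixed_effect_terms(stimulus, verbalization, trial, n_trials=64):
    """Return the seven fixed-effect predictors for one trial.

    Trial is centered on the midpoint of 1..n_trials, so the other effects
    are evaluated in the middle of learning.
    """
    s = STIMULUS_CODE[stimulus]
    v = VERBALIZATION_CODE[verbalization]
    t = trial - (n_trials + 1) / 2
    return {
        "stimulus": s,
        "verbalization": v,
        "trial": t,
        "stimulus:verbalization": s * v,
        "stimulus:trial": s * t,
        "verbalization:trial": v * t,
        "stimulus:verbalization:trial": s * v * t,
    }
```

With -1/1 coding, each main effect is estimated averaged over the levels of the other factor rather than at a reference level.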

Retention

To test for these same effects on retention rates, we performed a multilevel linear regression analysis on short- and long-term retention rates, defined as the average proportion correct in the testing blocks, that is, collapsed across pairs and trials; we did so using the lmer function from the lme4 package (Bates et al., 2015). We modeled fixed effects of stimulus type, verbalization condition, delay (short vs. long) and all two- and three-way interactions, as well as random intercepts and random slopes for the main effects. Covariances between random effects were fixed to zero.

Response times

Finally, we performed an exploratory multilevel linear regression analysis on trial-by-trial response times (irrespective of choice accuracy). We modeled fixed effects of stimulus type, verbalization condition, trial, and all two- and three-way interactions, as well as random intercepts and random slopes for the main effects, and we fixed covariances between random effects to zero. Also, we modeled first-order autoregression to account for the autocorrelation of the error terms across trials. We did this using the lme function from the nlme package (Pinheiro et al., 2022).

Results

Confirmatory analyses: Learning and retention

Learning

Most importantly, as illustrated in Fig. 2, results showed no interaction between stimulus type and verbalization condition (p = .51) and no three-way interaction between stimulus type, verbalization condition, and trial (p = .76). Thus, in contrast to our expectations, the effect of the verbalization task did not differ for abstract and concrete stimuli. Results did show a main effect of stimulus type (z = 4.03, p < .001), indicating higher accuracy for concrete than abstract stimuli, and an interaction between stimulus type and trial (z = 3.2, p = .001), indicating accuracy improved faster across trials for concrete as compared to abstract stimuli. Finally, results showed neither a main effect of verbalization condition (p = .27) nor an interaction between verbalization condition and trial (p = .52).

Retention

Most importantly, as illustrated in Fig. 3, results showed neither a stimulus type x verbalization condition interaction (p = .21), indicating no difference in the effect of the verbalization task across stimulus types, nor a stimulus type x verbalization condition x delay interaction (p = .74), indicating this effect did not differ between short and long delay. Results did show a main effect of stimulus type (t(48.1) = 4.6, p < .001), indicating better retention for concrete than abstract stimuli, but no stimulus type x delay interaction (p = .11). In addition, results showed neither a main effect of verbalization condition (p = .31), nor a verbalization condition x delay interaction (p = .86). Finally, they did show slightly better retention after short than long delay (main effect of delay: t(228) = -2.0, p < .05).

Taken together, these results suggest no differential effect of the verbalization task on learning from, and retention of, abstract versus concrete stimuli. They show in addition that both learning and retention were better for concrete as compared to abstract stimuli, and that learning and retention were unaffected by the verbalization task.

Exploratory analyses: Response times

One possible explanation for the lack of effect of the verbalization task on accuracy is that participants slowed down in order to keep up their performance, that is, that participants traded off their speed and accuracy (e.g., Wickelgren, 1977). We therefore performed an exploratory multilevel regression analysis on response times. Most importantly, as displayed in Fig. 4, results showed an interaction between stimulus type and verbalization condition (t(12743) = -2.3, p = .02) and a three-way interaction between stimulus type, verbalization condition, and trial (t(12743) = -3.4, p = .001). Follow-up analyses in each stimulus type separately only indicated a detrimental effect of hindered verbalization on response times for concrete stimuli (main effect of verbalization condition: p = .77; verbalization x trial interaction: t(6347) = -4.8, p < .001), not for abstract ones (both ps > .50). In addition, results showed no main effect of stimulus type (p = .10), but did show an interaction between stimulus type and trial (t(12743) = -6.3, p < .001), indicating that response times decreased faster for concrete as compared to abstract stimuli. Finally, results did not show a main effect of verbalization condition (p = .73), but did show an interaction between verbalization condition and trial (t(12743) = -2.8, p = .006), indicating that response times decreased faster in the unhindered than the hindered verbalization condition.

Interim conclusion

We predicted that the detrimental effect of hindered verbalization on learning and retention would be more pronounced for concrete than for abstract stimuli. However, results did not show any effects of the verbalization task on learning or retention. Rather, exploratory analyses revealed our predicted interactions between stimulus type and the verbalization task on response times.

Experiment 2

The finding that the predicted interactions emerged for response times rather than for accuracy may be explained by a speed-accuracy trade-off. To test this explanation, we performed a second experiment in which we prevented participants from slowing down by reducing the response window (as commonly done in the response-time literature; Wickelgren, 1977) and investigated stimulus type and hindered verbalization effects on accuracy, retention, and response times. We followed the same preregistered procedure as in Experiment 1 with three changes: we reduced the response window in the learning task from 2.5 to 1.5 s, we added a practice retention block, and we performed the response-time analysis, which was exploratory in Experiment 1, in a confirmatory manner (see section Reinforcement-learning task and data analyses).

With such a short response window, we expected the results found in Experiment 1 on response times to now become apparent on accuracy. Thus, with respect to accuracy, we predicted a detrimental effect of hindered verbalization on learning specifically in concrete stimuli. With respect to response times, we predicted no differential effect of hindered verbalization on abstract and concrete stimuli.