Stimuli that function as reinforcers for most individuals sometimes fail to do so for others, and extra effort may be required to establish them as reinforcers. Yet recommendations regarding the most effective procedures for establishing new reinforcers vary considerably, in applied behavior analysis as well as in basic research (e.g., Dozier, Iwata, Thomason-Sassi, Worsdell, & Wilson, 2012; Holth, Vandbakk, Finstad, Grønnerud, & Sørensen, 2009; Kelleher & Gollub, 1962; Petursdottir & Lepper, 2015; Williams, 1994). Researchers have recently been particularly interested in conditioned reinforcers within four specific areas of early skill development in humans.

First, behavior analysts have been studying joint attention: the coordination of attention between a child and its social partners with respect to a common object or event of interest. Operant analyses of joint attention skills (Dube, MacDonald, Mansfield, Holcomb, & Ahearn, 2004; Holth, 2005, 2011; Jones & Carr, 2004) have suggested that social conditioned reinforcers, such as others’ head turnings, smiles, nods, and comments, shape and maintain initiating joint attention skills in the natural environment. These analyses clearly recommend ensuring the social function of joint attention skills by conditioning typical social consequences as reinforcers before, or as part of, establishing joint attention skills.

Second, establishing speech sounds as conditioned reinforcers has also been an area of great interest over the last several decades, initially in early intervention programs for children with autism (Lovaas et al., 1966; Sundberg, Michael, Partington, & Sundberg, 1996) and more recently in investigations specifically concerned with teaching verbal behavior (Greer, Pistoljevic, Cahill, & Du, 2011; Petursdottir, Carp, Matthies, & Esch, 2011; Petursdottir & Lepper, 2015; Stock, Schulze, & Mirenda, 2008). The rapid expansion of speech-sound production in typically developing children seems to occur when sounds similar to those heard from others begin to function as reinforcers for the children’s own sound production (Lovaas, 2003; Miguel, Carr, & Michael, 2002; Palmer, 1996; Sundberg & Partington, 1998). In line with Skinner’s (1957) analysis, certain verbal functions (e.g., tacts, echoics, and intraverbals) seem to depend on generalized conditioned reinforcers. Greer and Du (2014) suggested that the source of many communicative functions is the establishment of conditioned reinforcers through experience, specifically the establishment of conditioned social reinforcers.

Third, conditioned reinforcers appear to play a crucial role in the establishment of naming, which is an integration of speaker and listener behavior (e.g., Horne & Lowe, 1996). For example, working with children with and without autism spectrum diagnoses, Longano and Greer (2014) demonstrated the emergence of naming following the establishment of visual and auditory conditioned reinforcers for relevant observing responses.

The fourth area of interest is the use of token economies. Basic research on token economies has mainly focused on conditioned reinforcement and the circumstances under which neutral stimuli come to acquire reinforcing functions. The applied field has concentrated on more practical concerns (e.g., program implementation, staff training, and generalization), and Hackenberg (2018) recently identified this difference in research focus between basic and applied studies as a gap in the relation between laboratory and applied work.

In all four areas, identifying the most effective means of conditioning new reinforcers may help speed up the development of useful behavioral repertoires in children with or without specific learning deficits, whether through a token economy or through socially conditioned reinforcers. Hence, several researchers have sought to identify reliable and effective procedures for establishing formerly neutral stimuli as conditioned reinforcers (Dozier et al., 2012; Greer & Singer-Dudek, 2008; Holth et al., 2009; Isaksen & Holth, 2009; Jones & Carr, 2004; Lugo, Mathews, King, Lamphere, & Damme, 2017; Zrinzo & Greer, 2013).

In behavior analysis, a common procedure prescribed for establishing conditioned reinforcers is usually referred to as stimulus–stimulus pairing and is rooted in the principles of Pavlovian or classical conditioning (e.g., Cooper, Heron, & Heward, 2014; Dozier et al., 2012; Sundberg et al., 1996). In a stimulus–stimulus pairing (SSP) procedure, a neutral stimulus is temporally correlated with an already established reinforcer (similar to the pairing of a neutral stimulus and an unconditioned stimulus in Pavlovian or classical conditioning), with the result that the neutral stimulus gains strength as a conditioned reinforcer (Gollub, 1970).

An alternative means of conditioning new reinforcers is an operant discrimination training (ODT) procedure, as proposed by Keller and Schoenfeld (1950), which has been used in several applied studies (Holth et al., 2009; Isaksen & Holth, 2009; Lepper, Petursdottir, & Esch, 2013; Lovaas et al., 1966; Taylor-Santa, Sidener, Carr, & Reeve, 2014). In an operant discrimination training procedure, a previously neutral stimulus acquires reinforcing properties when it is established as a discriminative stimulus for a response that produces a reinforcer (Skinner, 1938/Skinner, 1991).

Both basic and applied research have provided mixed results with the different procedures used to establish conditioned reinforcers. Brief stimulus pairings with unconditioned reinforcers have sometimes established powerful conditioned reinforcers without the procedural requirement that the stimuli be established as discriminative (Kelleher, 1966; Miliotis et al., 2012; Rader et al., 2014; Rodriguez & Gutierrez, 2017; Stein, 1958; Yoon & Bennett, 2000). On the other hand, some studies have demonstrated a weak or nonexistent effect of establishing conditioned reinforcers by pairing formerly neutral stimuli with reinforcers (Holth et al., 2009; Lovaas et al., 1966; Reichow, Doehring, Cicchetti, & Volkmar, 2011; Stock et al., 2008). Moreover, several researchers have verified the subsequent effectiveness of stimuli as conditioned reinforcers after they were established as discriminative stimuli through operant discrimination training (Holth et al., 2009; Isaksen & Holth, 2009; Lepper et al., 2013; Lovaas et al., 1966; Skinner, 1938/Skinner, 1991; Taylor-Santa et al., 2014). In summary, it appears that stimuli paired with reinforcers can sometimes become conditioned reinforcers, and that the operant discrimination procedure can successfully establish new stimuli as conditioned reinforcers, at times even when the pairing procedure has failed (e.g., Lovaas et al., 1966).

Demonstrations of the relative effectiveness of conditioned reinforcers have typically been carried out during experimental extinction, as with the new-response technique and the established-response technique (Kelleher & Gollub, 1962). In both techniques, the unconditioned reinforcer is no longer available: only the formerly neutral stimulus is presented, contingent either on a new response or on an already established response (Dozier et al., 2012; Kelleher & Gollub, 1962; Sosa, Santos, & Flores, 2011; Williams, 1994). The SSP and ODT procedures are usually compared and evaluated in such extinction tests, using either of these two techniques. Brief and small effects are problems often reported when conditioned reinforcers are tested in the absence of unconditioned reinforcement, as pointed out in reviews by Kelleher and Gollub (1962) and Williams (1994), and this limitation has been emphasized in a number of applied studies (Dozier et al., 2012; Isaksen & Holth, 2009; Lepper et al., 2013; Lugo et al., 2017; Taylor-Santa et al., 2014). In a review of research on token economies, Hackenberg (2018) also points to an additional limitation of testing during extinction: the conditioned reinforcers are presented in the discriminable absence of unconditioned reinforcement. Such a discriminable absence of unconditioned reinforcement may be relatively atypical in natural, nonexperimental settings. Also, relatively little of our everyday behavior is reinforced every time it occurs (Jenkins, 1950). A possible solution is to include occasional unconditioned reinforcement in testing, to prevent the potentially rapid extinction that occurs in tests that withhold unconditioned reinforcement (e.g., Zimmerman, 1957). Hackenberg (2018) suggested the use of extended chained and concurrent-chain schedules, in which the test stimuli continue to be paired with unconditioned reinforcers. Procedures that include unconditioned reinforcers generate more robust responding and have proven useful in the analysis of conditioned reinforcement more broadly (Fantino, 1977; Gollub, 1970; Shahan, 2010; Williams, 1994; Zimmerman, 1957).

The present experiment examined the effectiveness of conditioned reinforcers in settings that do not involve extinction. It is also an exercise in applying novel methodology to a traditional problem (e.g., Iversen & Lattal, 1991; Sidman, 1960). To maintain responses reinforced by the conditioned reinforcers, we arranged a variable-ratio reinforcement schedule in the acquisition test of the effectiveness of the conditioned reinforcers. The procedure is similar to a concurrent-chain procedure with double intermittency of reinforcement (Kelleher & Gollub, 1962; Zimmerman, 1957). In concurrent-chain schedules, the initial link consists of two concurrently available, usually equal, schedules. Completion of either initial-link schedule leads to a characteristic terminal-link schedule, so that preference for either terminal-link condition is reflected in the differential completion of the initial-link schedules (e.g., Fantino, 1969). Concurrent-chain schedules have been widely used to study conditioned reinforcement, and the critical measure is the relative rate of responding in the initial link of the chain (Williams & Dunn, 1991).
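The relative-rate measure mentioned above is simple arithmetic. The Python sketch below assumes that relative rate is computed as the proportion of initial-link responses allocated to one alternative; because session duration is fixed, response counts can stand in for response rates. The function name and the counts in the usage line are hypothetical, not data from this experiment.

```python
def relative_rate(responses_a: int, responses_b: int) -> float:
    """Proportion of initial-link responses allocated to alternative A.

    With equal observation periods, the ratio of counts equals the ratio
    of rates, so raw counts can be used directly. Values above .5 indicate
    a preference for alternative A.
    """
    total = responses_a + responses_b
    if total == 0:
        raise ValueError("no initial-link responses recorded")
    return responses_a / total


# Hypothetical counts from one 30-min session (not data from this study).
print(relative_rate(300, 100))  # 0.75
```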

In the acquisition of a new response, we first examined the effect of thinning the reinforcement schedule in the initial link of the chain (choice between two levers), and next the effect of thinning the schedule in the terminal link by reducing the probability of reinforcers (water). In the initial link, the subjects could press the left or the right lever. Lever presses produced an ODT trial (left lever) or an SSP trial (right lever) on an intermittent schedule. In the terminal link of an ODT trial, the left light turned on, and a flap-door opening response in the presence of the left light produced water. In the terminal link of an SSP trial, the right light turned on and water was delivered without a response requirement. The purpose of the present experiment was to investigate whether operant discrimination training or stimulus–stimulus pairing established a conditioned reinforcer more effectively, as measured in a concurrent-chain procedure.

Method

Subjects

Four male Wistar albino rats (Han Tac) obtained from a commercial supplier (Charles River Breeding Centre, Germany) were used. The rats were approximately 21 days of age, weighing 68–80 g, at the start of the experiment. The rats were housed individually in opaque plastic cages, 35 × 26 × 16 cm (height), placed in a holding rack (Camfil). They had free access to food (RM3 [E], Special Diet Services, Witham, Essex CM8 3AD, UK). Before each session, the rats were deprived of water for 22.5 h, and they had free access to water for 1 h after each session. The animal quarters were lit between 08:00 and 20:00, the room temperature was kept at 20 ± 2 °C, and humidity at 55 ± 10%. The study was preapproved by the National Animal Research Authority (NARA) and was carried out according to the Norwegian laws and regulations controlling experiments/procedures using live animals.

Apparatus

The experiment was conducted in four identical standard Campden (410-R) operant chambers, enclosed in custom-made soundproof boxes with ventilation fans. The chambers were equipped with two retractable levers and two lights (15 W), one positioned 2.6 cm above each lever. The levers were placed 10.9 cm apart and 5 cm above a grid floor and required a force of at least 0.1 N for depression. A 15-W bulb located in the center of the ceiling illuminated the chamber. The rat’s working space was 24.2 × 20.0 × 21.0 (height) cm. A 0.04-ml squirt of tap water, dispensed by a peristaltic pump into a recessed tray located halfway between the levers, was used as the reinforcer. The water pump started simultaneously with the tray light, operated for 1 s, and produced a motor humming sound. The tray light was lit for 2 s when water was made available in a cup in the tray. Unconsumed water remained in the cup. The tray opening was 4.5 cm wide and 4.0 cm high and was covered by a hinged plastic flap door, which required a force of less than 0.1 N to open. Each chamber was placed separately in a sound-attenuating cubicle, and each animal used the same operant chamber throughout all sessions.

Each chamber was connected by an interface (ADU208 USB Relay I/O) to a laptop (HP, Compaq nw 8440, with Microsoft Windows XP Professional 2002, Service pack 3, using software written in Microsoft Visual Basic 1.0 (rev. 141) 2010 Express) that automatically controlled presentation and removal of stimuli, operation of the peristaltic pump, and recorded flap-door openings and lever presses.

Procedure

Each daily session was conducted from 09:00 to 09:30, and session duration was fixed at 30 min. The experiment lasted for 72 consecutive days. Table 1 gives an overview of the phases of the experiment. First, all four rats received six sessions of habituation and six sessions of magazine training. The rats were water deprived before every session from Session 7 on. In Sessions 12 and 14, pressing the left lever was shaped and then continuously reinforced for Rats 3906 and 3907, whereas pressing the right lever was shaped and continuously reinforced for Rats 3908 and 3909. In Sessions 13 and 15, pressing the other lever was reinforced. From Session 16 on, both levers were retracted until the Acquisition Phase.

Table 1. A general overview of the procedures

Different Stimuli and their Function in the Procedure

The two lights above the levers served as the stimuli to be established as conditioned reinforcers. The left light was the initially neutral stimulus in the operant discrimination training, and the right light was the neutral stimulus in the stimulus–stimulus pairing. Hereafter, the left light is referred to as the ODT light and the right light as the SSP light. The programmed duration of the ODT light was determined from pilot studies in our lab. During the ODT procedure, the pilot rats’ reaction times from onset of the light to flap-door opening were fairly consistent at 0.5–0.9 s. To allow time for responding in the presence of the light, the ODT light was programmed to last for up to 1 s. In the SSP procedure, we attempted to make the light duration short enough to limit unintentional establishment of the stimulus as discriminative (for any response, including flap-door opening). At the same time, the delay from the onset of the SSP light to water delivery had to be long enough to prevent the SSP light from being overshadowed by the water delivery. We therefore fixed the duration of the SSP light at 0.5 s. This duration has also been recommended as optimal for the neutral stimulus in the pairing literature (Bersh, 1951; Jenkins, 1950; Kimble, 1961).

The levers served as “new-response” operanda in a later Acquisition Phase, which was run to determine how the different establishing procedures affected acquisition of new responses. Except for being present during four sessions of initial shaping, the levers were retracted until the acquisition sessions. The required response in the ODT procedure was to push open the hinged plastic flap door covering access to the tray. Flap-door opening was chosen as the required response in the ODT procedure to match response effort across conditions, because the rats also had to open the flap door to access the delivered water in the SSP procedure.

Operant Discrimination Training (ODT) Establishing Procedure

During the initial four ODT sessions, the ODT light (i.e., the stimulus to be established as discriminative; the left light) was presented according to a variable-time (VT) 20-s schedule, with intervals ranging from 10 to 30 s. Over the remaining ODT sessions in the establishing phases, the VT schedule for ODT light presentations was gradually increased to VT 40 s, with intervals ranging from 20 to 60 s. In the presence of the ODT light, opening the flap door to the water tray (the required response) produced water, and the light turned off contingent on the flap-door opening (see also the upper panel of Fig. 1). During the first two ODT sessions, the ODT light was lit for 3 s; its duration was then reduced to 1 s for the rest of the ODT sessions (from Session 18 in Establishing Phase 1 and Session 34 in Establishing Phase 2, respectively). A limited hold (LH) for opening the flap door following onset of the ODT light was initially set to 10 s and then gradually reduced to 7 s, 5 s, 3 s, and, finally, 1 s, corresponding to the ODT-light duration (from Session 23 in Establishing Phase 1 and Session 39 in Establishing Phase 2). This initial arrangement of the LH ensured that the rats would contact the contingency between opening the flap door and water delivery, while keeping the duration of light exposure in ODT as close as possible to that in SSP. If the rat opened the flap door during the presentation of the light, the light switched off immediately and water was delivered. Water was also delivered if the rat opened the flap door after the light had switched off but within the current LH (Sessions 16–22 in Establishing Phase 1 and Sessions 32–38 in Establishing Phase 2). A 10-s reset delay (RD) ensured that no flap-door opening occurred during the last 10 s of the VT interval before each ODT light presentation. The RD was arranged to make it more likely that flap-door opening would eventually come under control of the ODT light.
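The VT schedule with a reset delay can be expressed as a small timing rule. The Python sketch below rests on two assumptions not stated in the text: (1) VT values are drawn uniformly between the stated limits, and (2) the RD works as a resetting delay, so a flap-door opening within 10 s before the light is due postpones light onset until 10 s have elapsed without an opening. The function names are hypothetical and do not describe the original Visual Basic control software.

```python
import random
from typing import Iterable


def sample_vt_interval(rng: random.Random, low: float = 10.0, high: float = 30.0) -> float:
    """Draw one VT interval in seconds.

    Assumption: values are uniform between the limits (VT 20 s, range 10-30 s).
    """
    return rng.uniform(low, high)


def light_onset(interval_end: float, openings: Iterable[float],
                reset_delay: float = 10.0) -> float:
    """Time (s) at which the light is presented under an assumed resetting delay.

    The light is due at `interval_end`, but any flap-door opening within the
    preceding `reset_delay` seconds postpones onset until `reset_delay`
    seconds have passed without an opening.
    """
    onset = interval_end
    for t in sorted(openings):
        if t > onset:
            break  # openings after the light has come on do not affect onset
        if t > onset - reset_delay:
            onset = t + reset_delay  # opening fell inside the window: postpone
    return onset


interval = sample_vt_interval(random.Random(1))
print(light_onset(interval, []) == interval)   # True: no openings, no postponement
print(light_onset(20.0, [5.0, 15.0, 27.0]))    # 25.0
```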

Fig. 1

A schematic presentation of the programmed events and responding during the ODT and SSP training procedures. In ODT training, the left light turns on according to the scheduled VT, with a maximum duration of 1 s. If the rat responds (opens the flap door) at any time within the 1 s, the left light turns off and water is delivered for 1 s. In the SSP procedure, the right light turns on according to the scheduled VT, lasts for a fixed duration of 0.5 s, and then turns off, and water is delivered for 1 s with no response requirement. Thus, in both procedures, there is no delay between opening the flap door and getting access to water.

We defined stimulus control in ODT as the response (flap-door opening) occurring within the 1-s light duration on at least 90% of the trials over three successive sessions. This criterion was met within 11 sessions in the first establishing phase. There was no behavior-based criterion in the SSP condition. Hence, to equate exposure to both conditions for all rats, the number of training sessions was also set to 11 for the two rats in the SSP condition and for all rats in Establishing Phase 2.
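The stimulus-control criterion can be checked mechanically. This Python sketch assumes that the 90% criterion applies to each of three successive sessions separately (rather than to trials pooled across them) and counts a trial as a hit when the flap-door opening occurred within 1 s of light onset; `None` marks a trial without such a response. The function names and the latencies in the usage example are hypothetical.

```python
from typing import Optional, Sequence


def session_meets_criterion(latencies: Sequence[Optional[float]],
                            limit: float = 1.0, proportion: float = 0.9) -> bool:
    """True if at least `proportion` of trials had a response within `limit` s."""
    if not latencies:
        return False
    hits = sum(1 for rt in latencies if rt is not None and rt <= limit)
    return hits / len(latencies) >= proportion


def criterion_session(sessions: Sequence[Sequence[Optional[float]]],
                      run_length: int = 3) -> Optional[int]:
    """Index of the session that completes `run_length` successive criterion sessions."""
    run = 0
    for index, latencies in enumerate(sessions):
        run = run + 1 if session_meets_criterion(latencies) else 0
        if run == run_length:
            return index
    return None  # criterion never reached


# Hypothetical latencies (s) for four sessions of ten trials each.
sessions = [
    [0.8] * 8 + [None] * 2,  # 80%: criterion not met
    [0.7] * 9 + [1.5],       # 90%: met
    [0.6] * 10,              # 100%: met
    [0.9] * 10,              # 100%: met
]
print(criterion_session(sessions))  # 3
```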

Stimulus–Stimulus Pairing (SSP) Establishing Procedure

The stimulus–stimulus pairings were presented on a VT 20-s schedule, with intervals ranging from 10 to 30 s, as in the ODT procedure (see Table 1 for procedural details). The SSP light (i.e., the stimulus to be paired; the right light) turned on according to the VT, with the same gradual increase of the VT to 40 s and the same 10-s RD as in the ODT procedure, and water was delivered when the 0.5-s SSP light turned off, without any response requirement. The SSP light thus preceded water delivery. In addition, the motor humming sound from the water pump and the tray light necessarily accompanied water delivery. The lower panel of Fig. 1 illustrates the SSP training procedure. Whereas in the ODT procedure flap-door openings in the presence of the ODT light produced water, in the SSP procedure water was delivered at the offset of the 0.5-s SSP light regardless of responding, although the rat still had to open the flap door to access the water. The RD in the SSP procedure prevented the SSP light from turning on while the rat’s head was inside the tray.
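The contrast between the two establishing procedures reduces to whether water delivery depends on a response. The Python sketch below encodes the trial rules as described in the text, with times in seconds from light onset; the class and function names are hypothetical. Setting `light_max=3.0` and a longer `limited_hold` would correspond to the early ODT sessions. The SSP function deliberately takes no response argument, because water followed light offset without a response requirement; the flap-door opening the rat still needed to consume the water is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialOutcome:
    water_at: Optional[float]  # s after light onset; None means no water delivered
    light_off_at: float        # s after light onset


def odt_trial(latency: Optional[float], light_max: float = 1.0,
              limited_hold: Optional[float] = None) -> TrialOutcome:
    """ODT: water only if the flap door is opened within the limited hold."""
    hold = light_max if limited_hold is None else limited_hold
    if latency is not None and latency <= light_max:
        # Opening during the light turns the light off and delivers water at once.
        return TrialOutcome(water_at=latency, light_off_at=latency)
    if latency is not None and latency <= hold:
        # Opening after light offset but within the LH still produces water.
        return TrialOutcome(water_at=latency, light_off_at=light_max)
    return TrialOutcome(water_at=None, light_off_at=light_max)


def ssp_trial(light_duration: float = 0.5) -> TrialOutcome:
    """SSP: water is delivered at light offset, whatever the rat does."""
    return TrialOutcome(water_at=light_duration, light_off_at=light_duration)


print(odt_trial(0.7))   # TrialOutcome(water_at=0.7, light_off_at=0.7)
print(odt_trial(None))  # TrialOutcome(water_at=None, light_off_at=1.0)
print(ssp_trial())      # TrialOutcome(water_at=0.5, light_off_at=0.5)
```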

Acquisition of Lever Pressing

The purpose of this phase was to determine how the ODT and SSP procedures might differentially affect acquisition of a new response. Figure 2 illustrates the programmed events and the possible responses during the different parts of the phase. A “forced choice” was programmed at the beginning of each session to ensure that the subjects’ behavior contacted both contingencies. After one lever was pressed and the corresponding trial was produced (the left lever started the ODT trial [left light for max 1 s–flap-door opening–water]; the right lever started the SSP trial [right light for 0.5 s (no response requirement)–water]), that lever was inactive until the other lever had been pressed and its corresponding trial had occurred. After this, both levers operated in a free-choice situation and were available at all times. The arrangement resembled a concurrent-chain procedure consisting of an initial link and a terminal link. In the initial link, lever presses were followed by the corresponding light, whereas in the terminal link, light presentations were followed by water, with a programmed leaning of reinforcement probability. As in concurrent-chain schedules, the subject chose between the two response alternatives in the initial link, but as soon as a choice was made, the rejected alternative became unavailable until the start of the next trial. Both alternatives led to the same reinforcer (water), but only after both links had been completed (see Fig. 3 for an illustration).

Fig. 2

The figure shows the different events during the Acquisition Phase. Both levers are available, and presses lead to the onset of the ODT or the SSP trial. Both the original ODT and SSP establishing procedures continue to alternate in the background in Part 1 of the Acquisition Phase, with the VT leaning from 40 to 60 s, and are terminated from Part 2 on. From then on, only lever presses initiate the ODT or the SSP trial (on an intermittent schedule). In Part 3, water is delivered with a probability of .25; that is, water is delivered on average on every fourth light presentation for both ODT and SSP (hence the dotted line for water delivery).

Fig. 3

The experimental procedure used during the acquisition of new responses, illustrated here as a concurrent-chain procedure. Both levers were available at all times. In the initial link, lever presses were first followed by the corresponding light on a within-session progressive FR 1–10 (Part 1), then intermittently on a VR 5 (Part 2) and a VR 3 (Part 3). In the terminal link, the probability that a light presentation was followed by water was p = 1 (Parts 1 and 2) and p = .25 (Part 3).

In the first part of the Acquisition Phase, the ODT and SSP procedures continued to operate alternately and time contingently. The intervals between presentations of ODT or SSP stimuli were variable, ranging from 20 to 100 s. This alternation continued until the rats’ own behavior (lever presses) selected either procedure before the end of the interval. From Part 2 on, only lever presses produced ODT or SSP stimuli. During Part 1 of the Acquisition Phase, reinforcers for pressing either lever were programmed according to a within-session progressive-ratio schedule (FR 1–10, step size 1); that is, after each reinforcement, the ratio increased by one. During Part 2 (Sessions 60–66), we gradually thinned the reinforcement schedule for lever presses (in the initial link) until it reached a variable-ratio (VR) 5 schedule, with ratios ranging from 2 to 9. In Part 3 (Sessions 67–72), the schedule of water deliveries contingent on light-producing lever presses was thinned (in the terminal link) from a probability of 1.0 to .25, so that on average only every fourth light presentation resulted in water delivery. At the same time, the VR schedule for lever presses was enriched from VR 5 to VR 3, with ratios ranging from 2 to 5. Both changes were intended to produce durable effects and to avoid extinction of lever pressing during the last part of the Acquisition Phase (Sessions 67–72).
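The three parts of the Acquisition Phase combine a ratio schedule in the initial link with a probabilistic terminal link. The Python generators below sketch those schedules. The text reports only the mean and range of each VR, so the value lists are hypothetical, chosen to match the stated means and ranges; holding the progressive FR at 10 once it reaches 10 is also an assumption.

```python
import random
from typing import Iterator, Sequence

# Hypothetical ratio lists: only the mean and range of each VR are reported.
VR5_VALUES = [2, 3, 4, 4, 5, 6, 7, 9]  # mean 5, range 2-9
VR3_VALUES = [2, 2, 2, 3, 4, 5]        # mean 3, range 2-5


def progressive_fr(start: int = 1, stop: int = 10, step: int = 1) -> Iterator[int]:
    """Within-session progressive ratio: requirement grows by `step` after each completion.

    Assumption: once `stop` is reached, the requirement stays at `stop`.
    """
    requirement = start
    while True:
        yield requirement
        requirement = min(requirement + step, stop)


def variable_ratio(values: Sequence[int], rng: random.Random) -> Iterator[int]:
    """Draw each ratio requirement at random from a fixed list of values."""
    while True:
        yield rng.choice(values)


def terminal_link_water(rng: random.Random, p: float) -> bool:
    """Terminal link: a light presentation is followed by water with probability p."""
    return rng.random() < p


pr = progressive_fr()
print([next(pr) for _ in range(12)])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10]

vr3 = variable_ratio(VR3_VALUES, random.Random(0))
print([next(vr3) for _ in range(5)])  # five Part 3 requirements, each between 2 and 5
```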

Order of Exposure to Conditions

All rats were exposed to ODT and SSP. Two of the rats were randomly selected to receive the conditions in the order of ODT–SSP, and the other two received the two conditions in the opposite order (SSP–ODT). The first phase, whether ODT or SSP, is referred to as Establishing Phase 1, and the second as Establishing Phase 2. Each phase was arranged for 11 sessions. The schematic event record in Fig. 1 illustrates the programmed events and possible responses during both conditioning procedures (ODT in the upper panel and SSP in the lower panel).

Finally, all four rats completed an acquisition phase to determine how the different establishing procedures affected acquisition of new responses. During this phase, both levers were available, and lever presses produced either the ODT trial (left light for max 1 s–flap-door opening–water) or the SSP trial (right light for 0.5 s (no response requirement)–water). The Acquisition Phase was arranged with similarities to a concurrent-chain procedure and was divided into three parts, dependent on the schedule operating in the initial and the terminal link.

Results

ODT and SSP Establishing Phases

In the first establishing phase, both rats in the ODT condition reached the discrimination criterion (at least 90% of all stimulus presentations followed within 1 s by the response of opening the tray flap door) within 11 sessions. Mean reaction times from the onset of the ODT light to opening the tray door in the final Establishing Phase 1 session (Session 26) were 0.72 s for Rat 3906 and 0.80 s for Rat 3907. The same number of sessions was kept for both conditions in the second establishing phase, and the other two rats, exposed to the ODT procedure in this phase, also reached the discrimination criterion within 11 sessions. Mean reaction times in the final Establishing Phase 2 session (Session 42) were 0.68 s for Rat 3908 and 0.91 s for Rat 3909.

Figure 4 shows the mean reaction times for all four rats through the last 11 sessions of each training phase. As can be seen in the lower panels of the figure, rats 3908 and 3909 were exposed to SSP first, and showed substantially higher reaction times during the initial SSP training than during the later ODT training. Rats 3906 and 3907 were exposed to ODT first, and showed approximately the same mean reaction times during the initial ODT and the later SSP training.

Fig. 4

Mean reaction times from light onset to tray opening for each rat through the last 11 sessions of each training phase. Rats 3906 and 3907 were first exposed to ODT, whereas Rats 3908 and 3909 were first exposed to SSP. (Reaction-time data from Rat 3907 in Session 36 and from Rat 3908 in Sessions 39 and 40 could not be recovered.)

Acquisition Phase, Lever Pressing

Results are displayed in Fig. 5, which shows the number of responses on each of the two levers during the whole Acquisition Phase (Sessions 44–72). When the response-independent ODT and SSP procedures were still running (VT 40–60 s), from Sessions 43 to 59 (Part 1), and lever presses were reinforced on a progressive FR 1–10 schedule, all subjects pressed both levers. However, three of the rats (3906, 3907, and 3908) responded somewhat more frequently on the lever that produced the ODT trial. Only Rat 3909 emitted more responses on the lever that produced the SSP trial during this first part of the phase.

Fig. 5

Number of lever presses per session on the ODT and SSP levers for each of the four rats during the different parts of the Acquisition Phase. Session duration was always fixed at 30 min. Lever presses were followed by the corresponding trial (ODT or SSP trial). For Rat 3906, the data from Session 58 are missing due to an apparatus failure.

In Part 2 of the Acquisition Phase (Sessions 60–66), when the response-independent ODT and SSP procedures were terminated and the within-session progressive FR schedule for lever presses was changed to VR 5, the same three rats (3906, 3907, and 3908) continued to emit an increasingly higher number of responses on the ODT lever. Rats 3906 and 3908 in particular exhibited a relatively high number of responses on the ODT lever: Rat 3906 emitted between 61 and 298 responses per session, and Rat 3908 emitted between 131 and 458 responses per session. The number of presses on the SSP lever during Sessions 60–66 remained low for these two rats, ranging from 13 to 41 (Rat 3906) and from 25 to 68 (Rat 3908) responses per session. Toward the end of this part of the Acquisition Phase, Rats 3907 and 3909 emitted approximately the same number of presses on both levers.

In the last part of the Acquisition Phase, when the probability of reinforcement in the terminal link was set to .25, all four rats emitted more responses on the ODT lever than on the SSP lever. For three of the four rats, the difference was distinct throughout the Acquisition Phase, whereas for the fourth rat (3909), it was clear only over the last three sessions: Rat 3909 switched from pressing the SSP lever more frequently to pressing the ODT lever more frequently from Session 70 on. The number of ODT-lever presses during Sessions 67–72 ranged from 158 to 279, from 197 to 399, from 388 to 526, and from 132 to 416 for Rats 3906, 3907, 3908, and 3909, respectively. The number of presses on the SSP lever in the same sessions remained low for three of the rats, ranging from 11 to 37, from 18 to 97, and from 53 to 83 for Rats 3906, 3907, and 3908, respectively. In contrast, for Rat 3909, the number of SSP-lever presses ranged from 132 to 416. During this final part of the phase, lever presses produced the corresponding trials according to VR 3, with a .25 probability of water delivery at each light presentation. Data from this part are shown separately to demonstrate the marked difference in response rate across the two conditions when lever presses were intermittently followed by trials and trial lights were intermittently followed by water delivery. The number of lever presses was markedly higher on the ODT lever than on the SSP lever for all four rats, though the difference was less distinct for Rat 3909 than for the other three.

Also, when the rats could start the ODT trial by pressing the left lever and the SSP trial by pressing the right lever in this last part of the Acquisition Phase, mean reaction times from SSP-light onset to tray-door opening were slightly longer than those from ODT-light onset to tray-door opening (Fig. 6). Hence, the delay of the intermittent water reinforcement following lever presses was slightly longer in the SSP procedure than in the ODT procedure. Toward the end of the Acquisition Phase, when water was delivered intermittently, tray openings were less consistent in the presence of the SSP light than in the presence of the ODT light (Fig. 7).

Fig. 6

Mean reaction times from light onset to tray opening for each rat through the final Acquisition-Phase sessions (60–72)

Fig. 7

Probability of tray opening when the SSP or the ODT light is turned on. The dotted line marks the start of the Part 3 sessions of the Acquisition Phase, in which the probability of water was p = .25

Discussion

The purpose of the present experiment was to investigate and evaluate the relative effectiveness of two procedures identified in the literature on conditioned reinforcement: operant discrimination training (ODT) and stimulus–stimulus pairing (SSP). The main difference between the two procedures was that the SSP procedure had no response requirement in the presence of the light, whereas in the ODT procedure, flap-door openings produced water only in the presence of the ODT light. We first established potential conditioned reinforcers and then determined whether there would be a difference in the acquisition of a new response when the consequence of lever presses was an SSP or an ODT trial. Thus, presses on the ODT lever produced the ODT trial, and presses on the SSP lever produced the SSP trial. During the final acquisition sessions, when the contingencies were intermittent, all four rats responded more on the ODT lever than on the SSP lever. This result suggests that the ODT light had acquired more effective conditioned reinforcing properties than the SSP light. Thus, the results of the present study are congruent with previous studies (e.g., Holth et al., 2009; Lovaas et al., 1966; Taylor-Santa et al., 2014) indicating an advantage of the operant discrimination procedure. These results are also compatible with the suggestion by Keller and Schoenfeld (1950) that a stimulus to be conditioned must be established as a discriminative stimulus if it is to function as a conditioned reinforcer.

In addition, we wanted to avoid extinction during the evaluation of the potential conditioned reinforcers, and we therefore used intermittent reinforcement in the acquisition of new responses in a concurrent-chain arrangement. It has frequently been reported that responses reinforced intermittently during training show greater resistance to extinction, and several authors have suggested that intermittency is an important variable for producing durable effects in tests of conditioned reinforcement (e.g., Hackenberg, 2018; Kelleher & Gollub, 1962; Zimmerman, 1957). During the final part of the Acquisition Phase, water was delivered contingent on every 12th response on average, to prevent the rate of responding from declining too quickly during the evaluation, as it typically does when the connection between the conditioned and the unconditioned reinforcer is abruptly cut (e.g., Williams, 1994). We explored a double-intermittency schedule of reinforcement in the acquisition of the new responses in the concurrent-chain procedure. By thinning reinforcement in both the initial and the terminal links, we succeeded in maintaining response rates during the Acquisition Phase. This is in line with suggestions by Kelleher and Gollub (1962) and supports the results of Zimmerman (1957).
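The figure of every 12th response on average follows from the two links of the double-intermittency schedule: with a VR 3 initial link and a .25 probability of water per light presentation, the expected number of presses per water delivery is 3 / .25 = 12. The sketch below checks this arithmetic by simulation. It is an illustration under stated assumptions, not the control software used in the experiment: it approximates VR 3 as a random-ratio schedule (each press produces the light with probability 1/3), assumes every scheduled water delivery is collected, and the function names are hypothetical.

```python
import random


def expected_presses_per_water(mean_ratio: float = 3.0, p_water: float = 0.25) -> float:
    """Analytic value: mean presses per light times mean lights per water."""
    return mean_ratio / p_water


def presses_per_water(n_waters: int, mean_ratio: float = 3.0,
                      p_water: float = 0.25, seed: int = 1) -> float:
    """Simulate the (hypothetical) double-intermittency schedule.

    Assumption: VR 3 is approximated by a random ratio, so each press
    produces the trial light with probability 1 / mean_ratio.
    """
    rng = random.Random(seed)
    presses = 0
    waters = 0
    while waters < n_waters:
        presses += 1
        if rng.random() < 1.0 / mean_ratio:   # initial link: press produces the light
            if rng.random() < p_water:         # terminal link: light accompanied by water
                waters += 1
    return presses / waters


print(expected_presses_per_water())        # analytic mean
print(round(presses_per_water(50_000), 2))  # simulated mean, close to the analytic value
```

The analytic function gives 12 directly; the simulation mainly shows that this is an average, and that the number of presses between successive water deliveries varies considerably around it.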

A potential problem with the interpretation of the results of the present experiment stems from the continued, although intermittent, delivery of the unconditioned reinforcer during the evaluation: including unconditioned reinforcement when evaluating the effect of conditioned reinforcement can affect responding independently of the conditioned reinforcers of interest. The acquisition of the new response was carried out under concurrent VR VR schedules in the initial links, and it is possible that the VR schedule of unconditioned reinforcement alone maintained overall responding as responding shifted toward one side and the rate of water reinforcement on that side therefore increased. A concern with concurrent VR VR schedules is that they tend to produce all-or-none allocation of responses across the relevant operanda. However, when the ratios are equal, no skewed distribution is expected (e.g., Herrnstein & Loveland, 1975). In the present experiment, the same VR 3 schedule of light presentation in the initial link, and the same .25 probability of water delivery accompanying the light in the terminal link, were arranged for presses on both levers. Further, to prevent the rats from ending up responding on one lever only, we arranged a forced-choice procedure in which the rats had to sample both response consequences at the beginning of each session. Yet, all four rats ended up responding more frequently on the ODT lever than on the SSP lever. Whereas the difference in response frequency between the two levers was evident from Session 60 for three of the four rats, it first appeared in Session 70 for rat 3909. After the probability of water delivery in the presence of the lights was reduced to .25, presses on the ODT lever occurred more frequently.

It is possible that the differential effects of conditioned reinforcers established by the different procedures become clearer when the frequency of unconditioned reinforcers is lowered. In any case, under natural circumstances or in applied settings, the intermittent occurrence of unconditioned reinforcers may be more typical than a total absence of unconditioned reinforcers.
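Why equal ratios should not, by themselves, favor either lever can be illustrated with a small simulation. The sketch below is hypothetical and simplified: it approximates each VR 3 initial link as a random ratio, applies the same .25 water probability on both sides, ignores the forced-choice trials, and assumes a simulated subject that allocates a fixed proportion of its presses to the left lever. The function name and the bias values are illustrative only.

```python
import random


def water_per_press(n_presses: int, p_left: float, mean_ratio: float = 3.0,
                    p_water: float = 0.25, seed: int = 2) -> dict[str, float]:
    """Water deliveries per press on each lever under identical ratio contingencies.

    Assumes 0 < p_left < 1 so that both levers receive presses.
    """
    rng = random.Random(seed)
    presses = {"left": 0, "right": 0}
    waters = {"left": 0, "right": 0}
    for _ in range(n_presses):
        side = "left" if rng.random() < p_left else "right"
        presses[side] += 1
        light_on = rng.random() < 1.0 / mean_ratio    # same VR approximation on both sides
        if light_on and rng.random() < p_water:       # same p(water) per light on both sides
            waters[side] += 1
    return {side: waters[side] / presses[side] for side in presses}


for bias in (0.5, 0.8):
    rates = water_per_press(400_000, p_left=bias)
    print(bias, {side: round(rate, 3) for side, rate in rates.items()})
```

Both per-press rates should come out near 1/12 (about 0.083) whatever the bias. Shifting responding toward one lever increases the total water obtained on that side only because more presses go there; the programmed water contingency per press does not differ between the levers.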

Although the VR schedule was the same for SSP and ODT, some minor but potentially important differences remained. First, because the mean reaction times from light onset to tray opening were longer in SSP than in ODT, the delay of water reinforcers following lever presses was also somewhat longer in SSP than in ODT. Thus, the observed preference for the ODT option may have resulted in part from the differential delays to reinforcement. However, the difference in reaction times, and hence in delays, was not an independently controlled procedural feature. In fact, although mean reaction times were typically longer in SSP than in ODT, some of the lowest values were seen in the SSP condition. Furthermore, the procedural delay from the onset of the light to the start of the water pump was systematically shorter in the SSP procedure than in the ODT procedure. Whereas this delay stayed constant at 0.5 s in SSP, the mean delay from ODT light onset to water-pump startup typically stayed closer to 1 s and was never as low as 0.5 s. If anything, this difference should have favored the SSP procedure.

Second, the mean time from light offset to water delivery was longer in SSP than in ODT. This difference was a necessary feature of the procedure, because the light in the SSP procedure turned off after 0.5 s, whereas in the ODT procedure it remained on until tray opening, or for a maximum of 1 s. The relatively short duration of the SSP light was chosen to limit the risk that the light would unintentionally become discriminative for some response. Yet, because the rats had to approach the water tray to obtain the reinforcer, some discriminative function of stimuli correlated with reinforcer delivery cannot be excluded. The longer duration of the ODT light was chosen to ensure that the animal had enough time to move toward and push open the flap door to the water tray while the light was on. The experiment focused on “light on” rather than “light off” as a conditioned reinforcer, but because the SSP light turned off after 0.5 s, mean exposure to the light was shorter in the SSP condition than in the ODT condition. This difference in mean exposure time to the SSP and the ODT stimuli is a potential confounding variable, even though it is not obvious how it may have affected the results. In any case, to determine whether this small difference plays a role, future experiments may use a yoked procedure to eliminate the initially unequal exposure to the stimuli to be conditioned.
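One way such a yoked arrangement might be scheduled is sketched below. This is purely a hypothetical design sketch, not a procedure used in the present experiment: it assumes that ODT light exposure on a trial equals the rat's reaction time capped at the 1-s maximum, that SSP light durations would be copied trial by trial from matched ODT trials, and the reaction times in the example are invented placeholders, not data from this study. The timing of water delivery relative to the yoked SSP light is left out.

```python
from statistics import mean

SSP_FIXED_DURATION_S = 0.5   # SSP light duration in the present procedure
ODT_MAX_DURATION_S = 1.0     # ODT light stays on until tray opening, at most 1 s


def odt_exposure(reaction_time_s: float) -> float:
    """Light exposure on one ODT trial: on until tray opening, capped at 1 s."""
    return min(reaction_time_s, ODT_MAX_DURATION_S)


def yoked_ssp_durations(odt_reaction_times_s: list[float]) -> list[float]:
    """Hypothetical yoking: each SSP light lasts as long as its matched ODT light."""
    return [odt_exposure(rt) for rt in odt_reaction_times_s]


# Invented reaction times (s), for illustration only.
rts = [0.4, 0.7, 1.3, 0.9]
odt = [odt_exposure(rt) for rt in rts]
print("mean ODT exposure:", mean(odt))
print("fixed SSP exposure:", SSP_FIXED_DURATION_S)
print("mean yoked SSP exposure:", mean(yoked_ssp_durations(rts)))
```

Under this kind of yoking, mean exposure would be equal across conditions by construction, so any remaining preference could not be attributed to the exposure difference.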

The relative lack of effect of the SSP stimulus as a conditioned reinforcer may be an example of blocking (Kamin, 1969; vom Saal & Jenkins, 1970) or overshadowing (Rescorla & Wagner, 1972). Blocking may result from the previous magazine training, possibly because the sound of the water pump already served as a reliable predictor of reinforcement. In the SSP procedure, the light would not add anything to the predictive value of the sound of the water pump, whereas in the ODT procedure, the light would have the additional function of a discriminative stimulus for tray opening. As an alternative (or in addition), stimuli arising directly from the delivery of the reinforcer, such as sound or smell, may have overshadowed the lights. In the SSP procedure, although the light was presented 0.5 s prior to water delivery, the light would never exceed those other stimuli as a predictor of water, and when the light–water contingency was made intermittent, the light became a less reliable predictor of water. In the ODT procedure, the light would still set the occasion for tray opening, even when tray opening was only intermittently reinforced. Thus, when water reinforcement occurred intermittently, the ODT-conditioned reinforcer surpassed the SSP-conditioned reinforcer for the behavior of all four rats. This differential effect of the SSP and the ODT procedures is consistent with the literature on observing responses (e.g., Dinsmoor, 1983, 1995; Wyckoff, 1952): the ODT procedure involves a contingency for observing the light, whereas the SSP procedure does not. The fact that the rats exposed to ODT first had shorter reaction times than those first exposed to SSP may result from the initial contingency for observing behavior that was present only in the ODT. Related to the above, the ODT involved an additional operant discrimination contingency in which water was available only within the limited hold, whereas this was not the case in the SSP. This procedural difference may have contributed to the ODT stimulus coming to function more effectively as a conditioned reinforcer.
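The blocking and overshadowing accounts can be made concrete with the Rescorla–Wagner learning rule, ΔV_X = α_X β (λ - ΣV), where ΣV is the summed associative strength of all cues present on a trial. The sketch below only illustrates that rule and is not a model fitted to the present data: the cue names, salience values (α), learning rate (β = 0.3), asymptote (λ = 1), and trial counts are hypothetical, and the rule addresses only the stimulus–stimulus pairing in SSP, not the operant contingency in ODT.

```python
def rescorla_wagner(trials, alphas, beta=0.3, lam=1.0, V=None):
    """Apply the Rescorla-Wagner update to a sequence of trials.

    trials: iterable of (cues_present, reinforced) pairs
    alphas: salience of each cue
    V: optional starting associative strengths (copied, not modified)
    """
    V = dict(V or {})
    for cues, reinforced in trials:
        total = sum(V.get(c, 0.0) for c in cues)        # summed prediction of all cues present
        error = (lam if reinforced else 0.0) - total    # shared prediction error
        for c in cues:
            V[c] = V.get(c, 0.0) + alphas[c] * beta * error
    return V


equal = {"pump": 0.5, "light": 0.5}
# Hypothetical magazine training: pump sound alone, followed by water.
pretrained = rescorla_wagner([(("pump",), True)] * 50, equal)
# SSP-like pairing: light + pump compound, followed by water.
blocked = rescorla_wagner([(("light", "pump"), True)] * 50, equal, V=pretrained)
# Control: same compound training without pump pretraining.
control = rescorla_wagner([(("light", "pump"), True)] * 50, equal)
# Overshadowing: a weak light compounded with a more salient pump sound.
overshadowed = rescorla_wagner([(("light", "pump"), True)] * 50, {"pump": 0.9, "light": 0.1})

print("blocked light:", round(blocked["light"], 3))
print("control light:", round(control["light"], 3))
print("overshadowed light:", round(overshadowed["light"], 3))
```

Because the pretrained pump sound already predicts water, the shared prediction error is near zero in the compound phase and the light gains almost no associative strength; with unequal salience, the light's share of the asymptote is proportional to its α.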

Yet another potential source of the differential effects of the ODT and SSP procedures may follow from an inherent difference between them: although there was a performance-based criterion for evaluating the establishment of stimulus control in the ODT procedure, no such criterion was inherent in the stimulus–stimulus pairing procedure. Thus, a direct performance-based criterion for the SSP procedure would have required a separate test, for example, a test of whether the formerly neutral stimulus had acquired an effect as a conditioned stimulus for some conditioned response. On the other hand, the fact that the ODT procedure, unlike the SSP procedure, reveals when the new stimulus begins to affect behavior may constitute a major practical advantage of ODT over SSP.

The present experiment was based on a modified version of the new-response technique. A response must occur at least once for other reasons before it can be reinforced (Skinner, 1969). To make sure that responses on the left and right levers would occur, so that the ODT or SSP light, respectively, could be presented contingent on them, pressing each lever was shaped and continuously reinforced during two initial sessions of the experiment. The order of exposure to reinforcers on the left versus the right lever was counterbalanced so that two rats started on the left lever and two started on the right. No data indicated that the order of shaping made a difference. Another variable that could be counterbalanced in a future study is the position of the SSP versus the ODT lever. It is possible that a preference for the left over the right lever arose from uncontrolled variables, but it is unlikely that a preference for one of two identical levers would be as large as the experimental effect seen in the last phase and would occur for all rats.

The current finding that the ODT procedure was more effective than the SSP procedure does not imply that the SSP procedure by itself was ineffective. In Holth et al. (2009), a new-response test under single-operant conditions was run to evaluate the effect of the different procedures, and it was therefore possible to evaluate and compare the absolute reinforcement effect of the two procedures. However, the single-operant test procedure may occasionally yield high rates of responding for almost all stimuli and lead to false-positive predictions about relative reinforcement effects. Roscoe, Iwata, and Kahng (1999) showed that when high- and low-preferred stimuli were assessed as reinforcers in a concurrent-operant arrangement, participants consistently preferred one stimulus, the high-preferred stimulus. When the low-preferred stimulus was assessed in a single-operant arrangement, however, participants' response rates were as high as those observed for the high-preferred stimulus in the concurrent arrangement. In any case, although the SSP stimulus in the present experiment might have functioned as an effective conditioned reinforcer in the absence of the ODT stimulus, the concurrent choice arrangement favored ODT.

Thus, future studies should continue to explore procedures based on operant discrimination training for establishing neutral stimuli as conditioned reinforcers in humans, specifically in research on joint attention, verbal behavior, and naming skills. Although further replications are needed, the results of the present experiment strengthen the case for the superiority of the ODT procedure over the SSP procedure in establishing new stimuli as conditioned reinforcers. The results are also in line with earlier studies that have shown a lack of conditioned reinforcing effect after stimulus–stimulus pairing procedures (Esch, Carr, & Michael, 2005; Holth et al., 2009; Lovaas et al., 1966; Normand & Knoll, 2006; Schoenfeld, Antonitis, & Bersh, 1950). Stimulus–stimulus pairing procedures are well known, easy to implement, and may work well to condition new reinforcers in many situations. However, the influence of possible implementation problems, such as overshadowing or blocking, needs to be explored. It remains for applied behavior analysis to develop the most effective procedure for establishing conditioned reinforcers, for example, for the behavior of children diagnosed with autism, for whom natural contingencies often do not suffice. An obvious limitation of the present study is that it involved only four rats. If the present results can be replicated with human participants, the ODT procedure would seem to be the one to recommend. In addition to appearing more effective, procedures based on operant discrimination training seem to facilitate the desired stimulus control and to help ensure that behavior is controlled by the scheduled stimulus rather than an unintended one.