Stimulus equivalence in humans is a robust phenomenon, demonstrated by most participants who learn the relevant conditional discrimination baselines. There has been some variability in outcomes of equivalence tests, which may not be surprising, considering the diversity of stimuli, procedures, and characteristics of participants in the different studies (e.g., Almeida-Verdu et al., 2008; Carr, Wilkinson, Blackman, & McIlvane, 2000; Devany, Hayes, & Nelson, 1986; Eikeseth & Smith, 1992; Fields, Arntzen, Nartey, & Eilifsen, 2012; Lazar, Davis-Lang, & Sanchez, 1984; R. R. Saunders, Drake, & Spradlin, 1999; Sidman & Tailby, 1982). Some participants have shown prompt formation of equivalence classes (e.g., Sidman & Tailby, 1982). Many participants have shown equivalence only after repeated testing or after additional manipulations, such as teaching common names to stimuli, have been used (e.g., Eikeseth & Smith, 1992; Lazar et al., 1984). In several studies, there have also been some participants who did not show class formation with the training and testing parameters employed (e.g., Devany et al., 1986; Fields et al., 2012). The sources of this variability and the necessary and sufficient conditions for positive results in equivalence tests are not yet entirely clear.

McIlvane and colleagues have argued that a considerable proportion of variability in test outcomes may be related to different controlling relations, or stimulus control topographies in their terminology, in baseline conditional discriminations. Responses may be controlled by the specific relations between stimuli intended by the experimenter or may be controlled by other features, such as specific stimulus features or positions of stimuli (e.g., McIlvane & Dube, 2003; McIlvane, Serna, Dube, & Stromer, 2000; see also Carrigan & Sidman, 1992; de Rose, 1996; and Johnson & Sidman, 1993, for the importance of controlling relations in determining outcomes in equivalence tests).

The blank-comparison method (e.g., McIlvane et al., 1987; Wilkinson & McIlvane, 1997) has been used, often in two-comparison conditional-discrimination tasks, to detect specific controlling relations. This method involves substituting a mask for one of the comparisons so that the mask replaces the comparison designated as correct (S+) in some trials and the comparison designated as incorrect (S-) in other trials. Usually, the mask replaces the S+ in 50 % of the trials and the S- in the remaining 50 % of the trials. In addition, a fading procedure is often used to gradually introduce the mask. After participants are responding consistently with the mask, probes are inserted, substituting the mask either for the S+ or the S-. A sample-S+ controlling relation may be inferred when the participant consistently responds to the S+ in trials with the mask replacing the S- and responds at chance level in trials with the mask replacing the S+. A sample-S- controlling relation may be inferred when the participant consistently responds to the mask in trials with the S- displayed and responds at chance level when the S- is replaced by the mask. When the controlling relation is sample-S+, the sample controls selection of the S+, and the participant may not even notice the distinctive features of the S- (this is sometimes called a select stimulus control topography). When the controlling relation is sample-S- (sometimes called a reject stimulus control topography), the sample controls rejection of the S-, and the participant may not even notice the distinctive features of the S+. These controlling relations do not necessarily exclude one another: the sample may control both selection of the S+ and rejection of the S-. In this case, participants should consistently select the S+ when it is displayed with the mask and reject the S- when it is displayed with the mask; that is, they respond to the mask in trials that display the S- (e.g., de Rose, Hidalgo, & Vasconcellos, 2013; Grisante, de Rose, & McIlvane, 2014).
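The inference rules described above can be written out as a small decision function. This is only an illustrative sketch, not an analysis used in any of the cited studies: the function name, the cutoff for "consistent" responding, and the tolerance around chance are hypothetical values chosen here, because the text does not quantify them.

```python
def infer_controlling_relation(p_splus_chosen, p_mask_chosen,
                               consistent=0.9, chance=0.5, tol=0.15):
    """Classify blank-comparison probe performance (two-choice trials).

    p_splus_chosen: proportion of S+ selections in trials where the mask
        replaces the S- (S+ and mask are displayed).
    p_mask_chosen: proportion of mask selections in trials where the mask
        replaces the S+ (S- and mask are displayed).
    consistent, tol: hypothetical cutoffs, not taken from the literature.
    """
    selects = p_splus_chosen >= consistent   # reliably picks the S+
    rejects = p_mask_chosen >= consistent    # reliably avoids the S-

    def near_chance(p):
        return abs(p - chance) <= tol

    if selects and rejects:
        return "sample-S+ and sample-S-"
    if selects and near_chance(p_mask_chosen):
        return "sample-S+ (select)"
    if rejects and near_chance(p_splus_chosen):
        return "sample-S- (reject)"
    return "undetermined"


print(infer_controlling_relation(1.0, 0.5))    # select only
print(infer_controlling_relation(0.5, 1.0))    # reject only
print(infer_controlling_relation(0.95, 0.95))  # both relations
```

Intermediate patterns fall into "undetermined" so that the sketch does not force a classification the probe data do not support.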

Kato, de Rose, and Faleiros (2008) showed that formation of six-member equivalence classes was more probable when performance in blank-comparison probes was consistent with both sample-S+ and sample-S- controlling relations (i.e., participants both selected the S+ and rejected the S-). In a recent study, de Rose et al. (2013) used the blank-comparison method when teaching conditional discriminations rather than in test trials. They reasoned that if both sample-S+ and sample-S- controlling relations increase the probability of class formation, then training conditional relations with the blank-comparison method would assure that baseline training would engender both relations, and therefore the probability of class formation would increase. In one condition, de Rose et al. taught conditional relations AB, BC, and CD to four children (7 to 10 years of age) with the standard blank-comparison procedure, with the mask replacing the S+ in 50 % of the trials and the S- in the remaining 50 %. The four participants showed immediate formation of equivalence classes. A second condition was conducted with the same participants, designed to prevent sample-S+ relations in the BC conditional discrimination, so that conditional relations AB and CD would involve both sample-S+ and sample-S- relations, but conditional relation BC would involve only sample-S- relations. To do so, conditional discrimination BC was trained with the mask replacing the S+ in 100 % of the trials. Accounts of equivalence based on stimulus control topographies (Carrigan & Sidman, 1992; McIlvane et al., 2000) would predict that participants would not form the usual classes under these conditions. Two of the participants did not show class formation, as predicted. However, two participants did show class formation, even though the B samples were never displayed together with the S+ during training.

De Rose et al. (2013) claimed that their findings confirm the hypothesis that equivalence class formation is enhanced when training guarantees both sample-S+ and sample-S- relations and attributed the unexpected results of Condition B to a history of pretraining and training with several two-comparison conditional discriminations. In other words, the participants may have acquired generalized conditional responding (K. J. Saunders, Saunders, Williams, & Spradlin, 1993) that allowed the development of sample-S+ relations between a particular sample (such as B1) and the S- displayed with the other sample (B2, always displayed with its corresponding S-, C1). This indirect acquisition of sample-S+ relations was possible because in the participants’ history with several conditional relations, the incorrect comparison for one sample was always the correct comparison for the other sample. To control for this possibility, the present study replicated the basic design of de Rose et al. (2013). Relations were trained between three samples and three comparisons using a two-comparison-per-trial format. In Condition B, for all BC trials, the S+ was replaced with the mask, and only one of the two possible S-s was presented with each sample (see Table 1 for the trial types used in the study). This should have prevented the indirect acquisition of sample-S+ relations because the S- displayed with each sample could be the S+ for either of the two other samples.

Table 1 Trial Types for Relations Trained in Conditions A and B

Method

Participants

Four typically developing children, a boy and three girls, with ages between 7 and 12 years, participated. They attended public elementary schools, and their native and only language was Portuguese. They attended the lab between three and four times per week, and the study lasted for three to three and a half weeks.

Setting and Materials

Sessions were conducted in a small room at the lab, containing a desk and a chair. An iMac Apple Macintosh computer with the MTS software V. 11.1 (Dube & Hiris, 1997) presented stimuli and recorded responses. The software displayed stimuli on five 3 cm × 3 cm white “windows” on a gray background, at the center and corners of the screen. Samples always appeared in the center window and comparison stimuli in the corners. In the experimental phases, stimuli were line drawings of abstract pictures, such as those used in de Rose et al. (2013). Familiar pictures were also used in the pretraining (see Procedure). Participants responded to the stimuli by mouse-clicking on their windows. Correct responses produced a visual display of moving colored stars with a sequence of tones. Incorrect responses produced a 3-s timeout, during which the screen remained dark. The next trial began immediately after the consequence. In probe trials (see Condition A, below), there were no differential consequences.

Children were transported to the lab from their homes in a school van and participated in sessions one at a time. The other children stayed in a larger room with a table, drawing materials, games and computers. This room also contained a cupboard with transparent Plexiglas doors, displaying small toys and stationery items that the children could choose after sessions, based on the percentage of correct responses during the session.

Procedure

Pretraining

The objective of the pretraining was to provide experience with conditional discriminations, first without a blank-comparison and later with a blank-comparison, with stimuli different from those to be used in the experiment proper. The pretraining was based on that used by de Rose et al. (2013) but was extended to include training with conditional discriminations relating three samples to three comparisons in a two-comparison-per-trial format.

Children initially learned to match familiar stimuli that were thematically related (e.g., selecting the picture of a moon as a comparison stimulus when the sample was a picture of the sun, and selecting a picture of grapes when the sample was the picture of a pear). The blocking procedure (K. J. Saunders & Spradlin, 1990) was used to teach this and subsequent conditional discriminations (see de Rose et al., 2013, for details). After children attained criterion on this conditional discrimination, at least three additional conditional discriminations were taught in succession, each relating two samples to two comparison stimuli. Stimuli in these tasks, and in the remainder of the experiment, were abstract pictures. Stimuli were different for each conditional discrimination. For children who had difficulties with the acquisition of these conditional discriminations, the number of conditional discriminations taught was increased until the child acquired a new conditional discrimination with few errors. Three other conditional discriminations were then taught, each relating three samples to three comparisons. Each trial displayed the correct comparison and only one of the incorrect comparisons. The blank comparison was then introduced in trials of a conditional discrimination already mastered by the child: a black mask was faded in until it completely replaced one of the comparisons in each trial (see de Rose et al., 2013, for details of the fading procedure). Then, a new conditional discrimination, with three samples and three comparisons, was taught with the blank comparison present in all training trials. The mask replaced the S+ in 50 % of the trials and the S- in the remaining trials.

Condition A

All children were initially exposed to this condition. They learned conditional discriminations AB, BC, and CD, each relating three samples to three comparisons. Stimuli were different from those used in the pretraining. The blank comparison was used throughout: each trial displayed a sample, one comparison stimulus, and the mask. Table 1 presents the trial types for each relation. Half of the trials displayed the S+ and the other half displayed one of the S-s. Thus, when the sample was A1, 50 % of the trials displayed the S+ (B1) together with the mask; the other 50 % of the trials displayed the mask together with one of the S-s, B2. When the sample was A2, the S- displayed in 50 % of the trials was B3. When the sample was A3, the S- displayed in 50 % of the trials was B1. Analogous trial types comprised conditional discriminations BC and CD.

AB training was conducted in blocks of 24 trials, with each of the six trial types presented four times in randomized order. Position of the comparison stimuli was also randomized. After children reached the criterion of no more than one incorrect response in a 24-trial block in AB training, conditional discrimination BC was trained in the same way. Then, 12 trials each of conditional discriminations AB and BC were intermixed in the following block. After the same criterion was attained, conditional discrimination CD was trained. The following block presented 12 CD trials intermixed with six AB and six BC trials. After criterion was achieved, children were instructed that the computer would no longer tell whether selections were correct or incorrect. They then received a block intermixing eight trials of each trained relation, without differential consequences, in preparation for probes. After the learning criterion was attained in this block, probes were conducted in blocks of 24 trials each of relations DA, DB, and CA, in this order, without differential consequences. There were no baseline trials interspersed within these blocks. This sequence was repeated for participants Luciana and Lucia, respectively. Probe trials displayed two comparisons, without the mask. Thus, in probe trials with a sample from Class 1, the incorrect comparison was always from Class 2 (e.g., sample D1, comparisons A1 and A2). In probe trials with a sample from Class 2, the incorrect comparison was always from Class 3, and with a sample from Class 3, the incorrect comparison was always from Class 1.
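The composition of an AB training block and the mastery criterion can be sketched as follows. This is an illustration only and is not the MTS software used in the study: the names `AB_TRIAL_TYPES`, `build_block`, and `meets_criterion` are hypothetical, the S+ for samples A2 and A3 is inferred from the class numbering, and randomization of comparison positions on the screen is omitted.

```python
import random

# Six AB trial types in Condition A: (sample, displayed comparison, correct choice).
# Each trial shows one comparison plus the mask.
AB_TRIAL_TYPES = [
    ("A1", "B1", "B1"),    # S+ displayed: select it
    ("A1", "B2", "mask"),  # S- displayed: select the mask
    ("A2", "B2", "B2"),
    ("A2", "B3", "mask"),
    ("A3", "B3", "B3"),
    ("A3", "B1", "mask"),
]


def build_block(trial_types, repetitions=4, rng=None):
    """Return a shuffled block with each trial type repeated `repetitions` times."""
    rng = rng or random.Random()
    trials = [t for t in trial_types for _ in range(repetitions)]
    rng.shuffle(trials)
    return trials


def meets_criterion(n_incorrect, max_errors=1):
    """Criterion from the text: no more than one incorrect response per block."""
    return n_incorrect <= max_errors


block = build_block(AB_TRIAL_TYPES, rng=random.Random(0))
print(len(block))          # 24 trials: 6 trial types x 4 presentations
print(meets_criterion(1))  # True
print(meets_criterion(2))  # False
```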

Condition B

Conditional discriminations AB, BC, and CD were trained with new stimuli. Training and probes were as in Condition A, except for conditional discrimination BC. As in de Rose et al. (2013), in BC training trials, the mask always replaced the S+. Trial types in Condition B are presented in Table 1. Therefore, the BC conditional discrimination trained in this study did not provide a basis to acquire sample-S+ relations as supposedly occurred in de Rose et al. In that study, in the BC conditional relation of Condition B, there was only one C stimulus that was not presented with each B sample, so that stimulus was necessarily the S+. For instance, B1 was always presented with C2 and the mask, so the only possible S+ would be C1. In this study, there were two C stimuli that were not displayed with each B sample, so the S+ could be either one of them.
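The elimination argument in this paragraph can be made concrete with a short sketch. It is illustrative only: `candidate_splus` is a hypothetical helper, and the three-choice pairing of B samples with displayed S-s (B1 with C2, B2 with C3, B3 with C1) is an assumption that follows the AB pattern described for Condition A; Table 1 gives the actual trial types.

```python
def candidate_splus(displayed_sminus, comparisons):
    """For each sample, list the comparisons never displayed with it.

    When BC training always replaces the S+ with the mask, these are the
    only stimuli that could be the S+ for that sample.
    """
    return {
        sample: sorted(set(comparisons) - {sminus})
        for sample, sminus in displayed_sminus.items()
    }


# de Rose et al. (2013): two samples and two comparisons.
two_choice = candidate_splus({"B1": "C2", "B2": "C1"}, ["C1", "C2"])

# Present study: three samples and three comparisons (assumed pairing).
three_choice = candidate_splus(
    {"B1": "C2", "B2": "C3", "B3": "C1"}, ["C1", "C2", "C3"]
)

print(two_choice)    # {'B1': ['C1'], 'B2': ['C2']}
print(three_choice)  # {'B1': ['C1', 'C3'], 'B2': ['C1', 'C2'], 'B3': ['C2', 'C3']}
```

With two comparisons, exactly one candidate remains for each sample, so the S+ can be identified by exclusion; with three comparisons, two candidates remain and the S+ stays ambiguous.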

The expected result was that participants would show equivalence class formation in Condition A and not in Condition B. Condition A was conducted first for all participants because formation of equivalence classes with one set of stimuli facilitates subsequent class formation with another set (Buffington, Fields, & Adams, 1997). Therefore, equivalence formation in Condition A should facilitate equivalence formation in Condition B. Thus, a higher equivalence yield for Condition A could not be attributed to order of conditions.

Results

All participants needed more than one training block in order to achieve 96 % correct responses (no more than one incorrect response) for each new relation. Table 2 shows the total number of training blocks for each participant in Conditions A and B.

Table 2 Number of Training Blocks to Criterion

Probe results for Conditions A and B are shown in Table 3. Responses consistent with equivalence were considered correct. All participants showed higher scores in probes after Condition A than in probes after Condition B. Lucia and Luciana showed low scores in the two initial probe blocks in Condition A, which tested for the DA and DB emergent relations, but scored at or above 87 % correct in the third block, which tested for the CA emergent relation. Because this suggested delayed class formation for these participants, additional probes were conducted. Considering only the last test block for each relation, Luciana scored 96 %, 87 %, and 92 % correct in probes DA, DB, and CA, respectively (1, 3, and 2 incorrect responses, respectively), indicating the emergence of stimulus equivalence. It may be argued that equivalence emerged also for Lucia, although her results are not so clear: she scored above 83 % correct for all relations, with 4, 4, and 3 incorrect responses in probes DA, DB, and CA, respectively.
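The reported percentages can be checked against the error counts with simple arithmetic. The sketch below assumes 24 trials per probe block, as stated in the Procedure; the function name is hypothetical, and values are kept unrounded because the text reports whole-number percentages.

```python
def percent_correct(n_incorrect, n_trials=24):
    """Percentage of equivalence-consistent responses in one probe block."""
    return 100.0 * (n_trials - n_incorrect) / n_trials


# Error counts in the last DA, DB, and CA probe blocks, as given in the text.
luciana = [percent_correct(k) for k in (1, 3, 2)]
lucia = [percent_correct(k) for k in (4, 4, 3)]

print([f"{p:.1f}" for p in luciana])  # ['95.8', '87.5', '91.7']
print([f"{p:.1f}" for p in lucia])    # ['83.3', '83.3', '87.5']
```

Because probe trials displayed two comparisons without the mask, random responding would be expected to produce about 50 % correct.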

Table 3 Percentage of Responses Consistent with Equivalence for Each Probe Block in Conditions A and B

Maria and João had only one probe session for each relation. Maria’s scores indicate that stimulus equivalence emerged: 92 %, 87 %, and 100 % correct responses for probes DA, DB, and CA, respectively. Results for João are less clear. The increase in scores for successive probes suggests gradual emergence: correct responses were 79 %, 83 %, and 92 % for the DA, DB, and CA probes, respectively.

Therefore, in Condition A, two participants showed clear indication of the formation of equivalence classes, whereas two others showed results that could be interpreted as gradual emergence of equivalence. Scores in probes for Condition B were below 40 %, with the exception of CA for Maria, in which scores reached 75 %.

Discussion

In this study, we expected that participants would show prompt formation of equivalence classes in Condition A (as in de Rose et al., 2013) but that no participant would show class formation in Condition B (unlike in de Rose et al., 2013). The second expectation was confirmed, but the first was not. The results of Condition A in the present study showed higher variability compared to those of de Rose et al. However, two participants showed clear indication of class formation, and it may be argued that the results of at least one of the other participants, and possibly of both of them, are also consistent with class formation.

The use of three samples and three comparisons, in the present study, resulted in a decrease in probe accuracy compared to de Rose et al. (2013). This is probably due to the increase in complexity of the task. However, the “yield” of equivalence classes in Condition A of the present study was much higher than what is usually found with a linear design and arbitrary stimuli (e.g., Arntzen & Holth, 1997). It has been found that the usually low yield of linear designs can be increased by different manipulations, such as inclusion of a meaningful stimulus in the trained stimulus set (e.g., Fields et al., 2012), or pretraining of discriminative functions with one of the arbitrary stimuli in the set (e.g., Nartey, Arntzen, & Fields, 2015). The present results, together with those of de Rose et al. (2013), indicate that another such manipulation is a procedure ensuring that both sample-S+ and sample-S- relations are formed in conditional discrimination baselines. Successful baseline performance in Condition A required selection of the S+ in trials in which the mask replaced the S- and rejection of the S- in trials in which the mask replaced the S+. A high yield also may be obtained when the mask procedure is used only in a proportion of baseline trials (Grisante et al., 2014).

Condition B was designed to prevent equivalence class formation: participants would learn the BC relation based only on sample-S- relations. This should disrupt class formation, according to the analyses of Carrigan and Sidman (1992) and McIlvane et al. (2000). In de Rose et al. (2013), two participants formed classes in Condition B. These authors attributed this result to the training of relations between two samples and two comparisons for participants who already had a history in which the S- for one sample was always the S+ for the other. This did not happen in Condition B of the present experiment. In the BC relation of Condition B, each sample was always displayed with the mask and the same incorrect comparison (see Table 1). The incorrect comparison for one of the samples could be the correct comparison for either one of the two other samples. This would prevent an indirect formation of sample-S+ relations in Condition B and could explain the absence of class formation in this condition. The present results also strengthen the hypothesis of de Rose et al. about why some participants showed equivalence class formation in Condition B of that study.

It is important to note a limitation of the present study: probes in Condition A were repeated for some participants who showed signs of delayed emergence of equivalence classes. This was not done in Condition B because no participant scored above 80 % in probes. Further research should, however, provide equal opportunity for delayed emergence in all conditions. It may also be argued that, in the present study, participants actually learned sample-S+ relations between each comparison and the mask in the BC relation. This is not likely based on the results of de Rose et al. (2013). If the mask were the S+ in that study, responding should be disrupted in probe trials in which both samples B1 and B2 were displayed with comparisons C1 and C2, in the absence of the mask. Participants performed accurately in those probe trials, confirming that they had learned to reject the S-. In those probes, however, participants selected the S+ in the presence of the sample, and selecting the S+, rather than the mask, in the presence of each sample may have established sample-S+ relations, even in unreinforced probes.

The present study, therefore, strengthens the hypothesis of de Rose et al. (2013) that equivalence class formation is enhanced and intersubject variability is reduced when training assures both sample-S+ and sample-S- controlling relations. Subsequent research is necessary to confirm this hypothesis and determine the generality of the present findings. Determining sources of variability in outcomes of training designed to establish equivalence classes is important for both theoretical and applied reasons. As sources of variability are established, determinants of stimulus equivalence could be clarified, which is essential for solving theoretical disputes in the field (e.g., Hayes, Barnes-Holmes, & Roche, 2001; Horne & Lowe, 1996; Sidman, 2000). Also, as applications of stimulus equivalence to teaching and rehabilitation increase (e.g., Almeida-Verdu et al., 2008; de Souza et al., 2009; Fienup, Covey, & Critchfield, 2010; Rehfeldt, 2011; Rehfeldt & Barnes-Holmes, 2009), determining sources of variability is essential to the design of increasingly effective and efficient procedures.