Skinner (1957) defined the tact relation as a verbal response occasioned by a non-verbal discriminative stimulus (SD) and maintained by generalized reinforcement. The importance of an extensive tact repertoire, as described by Skinner, is that it extends the listener’s “contact with the environment” (p. 85). However, the environment does not consist exclusively of visual stimuli, and an analysis of the tact that includes other senses is warranted. The majority of the literature on the tact has utilized visual SDs. Although it is reasonable to assume that many naturally occurring tacts are evoked by visual stimuli, it is also important to be able to tact antecedent stimuli in other sense modes. A proficient auditory tacting repertoire can supply several practical benefits. For example, skilled car mechanics are often able to tact mechanical issues in vehicles simply by listening to the motor running. This saves time compared to visually searching to identify the problem. Although perhaps overlooked, an efficient auditory tacting repertoire helps individuals interact with their environment in an effective manner. The ability to interact with nonvisual aspects of the environment could increase opportunities for social reinforcement and sharpen verbal abilities. For example, being able to tact sounds that animals make could both evoke a teacher’s praise and provide additional learning opportunities (e.g., the teacher describing animals and providing additional opportunities to engage in verbal behavior).

Another verbal operant that has received research attention recently is the intraverbal relation (e.g., Carp and Petursdottir 2012; Grannan and Rehfeldt 2012; Sautter and LeBlanc 2006). Skinner defined the intraverbal as a verbal response evoked by a verbal antecedent that lacks “point-to-point correspondence between properties of stimulus and response” (p. 185), and is maintained by generalized conditioned reinforcement. In addition, behavior analysts in general have increasingly been focused on emergent verbal responding (e.g., acquisition of unreinforced appropriate responses), which has led to an emphasis on developing strategies that do not require the direct instruction of intraverbal responses (Rehfeldt 2011). It has been pointed out by several authors (see Hayes et al. 2001; Skinner 1968) that direct instruction of every response is an inefficient means of teaching. Skinner (1957) acknowledged that once an individual has a sufficient learning history, and is a sophisticated speaker, the verbal operants can become functionally interdependent, meaning that if an individual can produce a response under one set of stimulus conditions (e.g., as a tact) they may also be able to emit the response under other conditions (e.g., as an intraverbal). The level of sophistication required of the speaker is unknown, and investigations of derived relations show examples of interdependence between Skinner’s verbal operants (Grannan and Rehfeldt 2012).

Skinner’s (1953) analysis of imagining could be an important conceptual tool used to establish efficient instructional procedures to teach individuals to engage in appropriate emergent or novel verbal responding. One of the ways in which Skinner (1953) describes how one form of imagining, operant seeing (i.e., a covert visual response in the absence of a corresponding physical stimulus visible to another observer), may be reinforced is that “the private response may produce discriminative stimuli which prove useful in executing further behavior of either a private or public nature” (p. 273). Skinner (1953) discussed operant seeing, or seeing in the absence of the thing seen, most extensively, but noted that the other senses are “parallel cases,” so we may reasonably extrapolate his analysis of visual imagining to the other sense modes. Therefore, it is reasonable to conceptualize that private hearing responses may also produce such supplementary stimuli, which increase the probability of reinforcement for future behavior. For example, if I were to ask “what sound does a frog make?” one may engage in private responses, such as seeing a frog or hearing a frog croaking, the response-products of which one may then interact with (e.g., tact) publicly in answer to the question. According to Skinner (1953), then, the seeing and hearing of the frog is not necessarily verbal; however, it does produce supplementary discriminative stimuli that may allow an individual to engage in subsequent verbal behavior more effectively. It should be noted that the self-echoic might play a role in responding in the previous example. Instead of publicly tacting the auditory stimulus of the frog croaking, it is possible that the individual privately tacts the stimulus “croak” with the vocal response croak and then engages in a self-echoic publicly in response to the question. Based on this analysis, we may engage in auditory imagining if hearing some sound would currently be reinforcing or if hearing a sound would make subsequent behavior more likely to be reinforced.

Auditory imagining has been interpreted either as an instance of imagining that does not necessarily rely on the vocal apparatus (e.g., Skinner 1953) or as a case of sub-vocal verbal behavior (e.g., Schlinger 2009). Whether or not auditory imagining relies on the vocal apparatus, it is possible that being able to engage in this behavior could increase one’s ability to answer novel questions. Therefore, teaching individuals to engage in conditioned hearing may facilitate the emergence of intraverbal behavior. This would be desirable because the imagining behavior would be part of the individual’s repertoire, and they would be free to engage in this response in novel environments where the only available probes to correct responding would be the individual’s own repertoire. Kisamore et al. (2011) studied the effects of a visual imagining procedure on the acquisition of complex category intraverbal behavior in preschool children, showing an increase in intraverbals with the inclusion of a rule. Although the results of the Kisamore et al. (2011) study showed that a rule statement to use the visual imagining strategy was necessary to produce the target intraverbal responding, the study does provide some empirical evidence for the utility of directly teaching procedures that target imagining to produce socially important responses. Similarly, the use of an auditory imagining instruction procedure may increase simple and complex intraverbal responding by evoking hearing responses that increase the probability of appropriate responding. This may be done simply by supplying additional pairings of the non-verbal sounds and the words targeted for intraverbal emission (e.g., presenting the sound of a frog croaking simultaneously with the word “croak”). However, the differential ability to emit appropriate echoic and self-echoic responses may facilitate the process of engaging in auditory imagining responses to produce supplementary stimulation and the emission of intraverbal responses. That is, if one has a weak auditory imagining repertoire and is asked “what is a frog sound?” the ability to echo, perhaps covertly, and then self-echo the word “frog” may increase the probability of emission of the imagining response of hearing the frog croak, rather than relying on the initial question as the only source of stimulus control.

The purpose of the current investigation was threefold. The first aim was to investigate the effects of auditory tact instruction on related intraverbal responses in order to explore the possible functional interdependence between the two operant repertoires. The second, if intraverbals failed to emerge following auditory tact instruction, was to investigate the effect of an auditory imagining instruction procedure on the production of intraverbal responses. The third was to investigate the correlation between echoic and self-echoic repertoires, as measured using an assessment based on Esch et al. (2010), and the emergence of intraverbal responding.

Method

Participants

Four typically developing children participated in the study: Murray (6 years old), Fred (7 years old), Julie (6 years old), and Samantha (6 years old). All participants were enrolled at one local attendance center. Inclusion criteria included the teachers’ recommendation following a description of the study, as well as scoring below 20 % correct in pre-experimental sessions.

Setting and Materials

Sessions took place in a teachers’ workroom, which included a table, three chairs, two copy machines, and a counter surrounding the majority of the room. Sessions were scheduled after lunch, were approximately 15 to 25 min in duration, and were conducted 4 to 5 days per week. Materials included (a) 14 animal and object sounds obtained via YouTube® webpages with a Creative Commons license and played using a Toshiba laptop computer, (b) a small camcorder to record sessions, (c) a variety of small tangible prizes for participants, and (d) a sticker sheet and stickers. Participants were able to trade in their stickers for small prizes following each session.

Experimental Design

A multiple-probe design (Horner and Baer 1978) was used to assess the effects of various independent variables on the emergence of intraverbal responses. Participants received pre-experimental sessions for inclusion in the study followed by baseline pretest sessions, two tact instructional phases (AT1 and AT2) following a self-echoic assessment, and then progressed through the set of experimental phases depending upon the outcome of the test for emergent intraverbal responding. Intraverbals were assessed before and after each phase of tact instruction, and prior to each instructional procedure until criterion for inferring the emergence of untaught intraverbals was met. Following the second tact instruction phase, participants received three auditory imagining (AI) instruction sessions before the inclusion of an echoic component to the auditory imagining procedure (AIE) for an additional three sessions. If responding did not meet mastery criterion following AIE, a transfer of stimulus control procedure (ToSC) was implemented. If time permitted, a follow-up (FU) session was conducted several weeks after the mastery criterion was met.

Response Measurement and Interobserver Agreement

The primary dependent measure was the percentage of correct responses to the target simple intraverbal test questions (see Table 1). Throughout the document intraverbal “tests” will refer to the simple intraverbal questions only, and intraverbal “probes” will refer to categorization intraverbal questions only. A correct response to a simple intraverbal question was defined as the participant saying the experimentally defined name of the animal/item or the name of the animal/item sound when presented with a test question. An example of a correct response is the participant responding “caw” if asked “what is an eagle sound?”, or responding “eagle” when asked “what makes the sound caw?” An incorrect response was scored if the participant stated anything other than the target answer, or did not respond within 5–7 s.

Table 1 Intraverbal probe questions and the target responses

Secondary data were collected on the presence or absence of lip movement without vocalization before the target response in each trial of the intraverbal tests. Lip movement was defined as any movement of the lips that did not produce a vocalization audible from at least 31 cm away and that, if vocalized, would appear to produce a recognizable word, component of a word, or phrase. Examples include, but are not limited to, movements that if vocalized would produce the sounds of buh, efe, v, ā, ē, ê, ī, î, ū, ō, ô, T, and Qu. The definition excluded any lip movement that would not appear to produce a recognizable word, component of a word, or phrase; for example, yawning; stretching of the jaw; stereotypic movements (e.g., ba-ba-ba, pa-pa-pa); chewing; whistling; puckering; licking of the lips; sneering; smiling; and frowning.

Secondary data were also collected for latency during intraverbal tests between the offset of the intraverbal question and onset of the participant’s response. Latency was defined as the amount of time in seconds that it took for the participant to initiate a response following the presentation of the question. Response latency in each trial was scored as falling within one of five categories: 0–2, 2–4, 4–6, 6–8, or >8 s.
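The latency categories can be illustrated with a short Python sketch. It is not part of the study's materials, and the function name `latency_category` is hypothetical. Because adjacent categories share an endpoint (e.g., 2 s appears in both 0–2 and 2–4), the sketch assumes that a latency falling exactly on a boundary is assigned to the lower category; the scoring rule for such cases is not stated here.

```python
def latency_category(latency_s: float) -> str:
    """Return the latency category label for a latency given in seconds.

    Assumption (not stated in the source): a latency exactly on a shared
    boundary (2, 4, 6, or 8 s) is assigned to the lower category.
    """
    if latency_s < 0:
        raise ValueError("latency cannot be negative")
    # Upper bounds (in seconds) of the four bounded categories.
    for upper in (2, 4, 6, 8):
        if latency_s <= upper:
            return f"{upper - 2}-{upper} s"
    # Anything longer than 8 s falls in the open-ended category.
    return ">8 s"
```

For example, under this assumption a latency of 3.1 s would be scored in the 2–4 s category and a latency of 9 s in the >8 s category.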

A second measure, the categorization intraverbal probe, consisted of two open-ended questions (“What are some sounds you know?” and “What are some things that make sounds?”); the measure was the number of target responses to each question. The concept of categorization is here defined “as a group of related responses that are evoked by a particular verbal stimulus” (Kisamore et al. 2011, p. 256) with each target question evoking a distinct group of responses relating to aspects of the target sounds. A correct response for the intraverbal categorization probe consisted of the production of target intraverbal responses (e.g., “caw” or “eagle”).

Additional dependent variables were all overt vocal responses by participants throughout the study and the number of correct tacts during tact instructional phases. A correct response during tact instruction was defined as saying the experimentally defined name of the animal or the name of the animal sound when presented with audio of its characteristic sound. For example, a correct response was saying “eagle” when presented with the sound of an eagle cawing during the first tact instructional phase (AT1), or saying “caw” when presented with the sound of an eagle cawing during the second tact instructional phase (AT2). An incorrect response was defined as the participant producing anything other than the target tact or no response within 5–7 s and was followed by a corrective statement (e.g., “an eagle makes that sound”) and re-presentation of the auditory stimulus until the correct tact was emitted.

In addition, the number of correct echoic and self-echoic responses was recorded during the self-echoic assessment. A correct response during the self-echoic assessment was defined as a response with point-to-point correspondence with the model. For example, a correct echo included saying “1–3–8” when the experimenter stated “1–3–8,” as well as a correct self-echo for saying “1–3–8” following the participant’s own response of saying “1–3–8” (see Esch et al. 2010). There was also an echoic response requirement during the auditory imagining plus echoic phase (AIE). A correct response was defined as the participant repeating the phrase exactly as the experimenter presented it (e.g., “can you say ‘an eagle goes caw?’” → “eagle goes caw”). An incorrect response was defined as the participant saying anything other than the target phrase, or not responding for 5–7 s.

Interobserver agreement (IOA) was assessed by an independent observer for at least 50 % of all simple intraverbal test and categorization intraverbal probe sessions (range, 50–100 %), 70 % of all instructional sessions (i.e., tact and intraverbal), and 100 % of self-echoic assessment sessions. During all simple intraverbal, intraverbal categorization, and tact trials, data collectors indicated whether the response was correct or incorrect and wrote down the response the participant made. For example, during a simple intraverbal trial, if the participant was asked “What sound does an eagle make?” and the participant said “caw,” then the data collector marked the response as correct and wrote down the word “caw.” For the primary dependent variable of simple intraverbal responses, agreements were scored if each observer recorded the same correct or incorrect response. For the secondary dependent variable of intraverbal categorization, agreements were scored if each observer recorded the same correct or incorrect response in order. Percentage agreement was calculated by dividing the number of agreements by the number of agreements plus disagreements and multiplying by 100. Mean agreement for the simple intraverbal and intraverbal categorization was 98.9 % (range, 83–100 %); for Murray, 98.7 % (range, 93.75–100 %); for Fred, 99 % (range, 91–100 %); for Julie, 99 % (range, 97.5–100 %); and for Samantha, 98.6 % (range, 83–100 %). For lip movement, agreements were scored if both observers recorded the occurrence or nonoccurrence of lip movement in a given trial. Percentage occurrence agreement was calculated by dividing the number of agreements on occurrence by the number of trials in which at least one observer recorded an occurrence and multiplying by 100. Mean agreement for lip movement was 91.75 % (range, 66.67–100 %); for Murray, 85.7 % (range, 66.67–100 %); for Fred, 100 %; for Julie, 89.4 % (range, 66.7–100 %); and for Samantha, 90.9 % (range, 66.67–100 %). For latency, agreements were scored if both observers recorded the same latency category (e.g., 0–2 or 4–6 s) in the same trial. Percentage agreement for latency was calculated by dividing the number of agreements by the number of agreements plus disagreements and multiplying by 100. Mean agreement for latency was 98.9 % (range, 83–100 %); for Murray, 90 % (range, 50–100 %); for Fred, 99.6 % (range, 95–100 %); for Julie, 82.3 % (range, 55–100 %); and for Samantha, 89.5 % (range, 60–100 %). For tact and intraverbal instruction, agreements were scored if each observer recorded the same correct or incorrect response. Percentage agreement was calculated by dividing the number of agreements by the number of agreements plus disagreements and multiplying by 100. Mean agreement for all instruction sessions was 99.5 % (range, 70–100 %).
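The two agreement calculations described above can be sketched in Python as follows. The function names and the example records are hypothetical and serve only to illustrate the arithmetic. Returning `None` when neither observer recorded an occurrence is an assumption, because the handling of that case is not described.

```python
def total_agreement(obs1, obs2):
    """Trial-by-trial agreement: agreements / (agreements + disagreements) x 100."""
    if len(obs1) != len(obs2) or len(obs1) == 0:
        raise ValueError("observers must score the same non-empty set of trials")
    agreements = sum(a == b for a, b in zip(obs1, obs2))
    return 100.0 * agreements / len(obs1)


def occurrence_agreement(obs1, obs2):
    """Occurrence agreement for boolean records (True = occurrence).

    Trials both observers scored as an occurrence, divided by trials in
    which at least one observer scored an occurrence, times 100.
    """
    if len(obs1) != len(obs2):
        raise ValueError("observers must score the same set of trials")
    both = sum(1 for a, b in zip(obs1, obs2) if a and b)
    either = sum(1 for a, b in zip(obs1, obs2) if a or b)
    if either == 0:
        return None  # assumption: undefined when no occurrence was recorded
    return 100.0 * both / either


# Hypothetical lip-movement records for four trials (not study data).
observer_1 = [True, True, False, False]
observer_2 = [True, False, False, False]
print(total_agreement(observer_1, observer_2))       # 75.0
print(occurrence_agreement(observer_1, observer_2))  # 50.0
```

The example shows why the two indices can differ: the observers agree on three of four trials overall, but on only one of the two trials in which either recorded lip movement.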

Procedure

Following inclusion in the study, each participant’s progression through the phases depended on their responding on the simple intraverbal tests. Once a participant met the mastery criterion for the simple intraverbal tests, they were finished with the study, with the possible exception of a follow-up probe, and therefore may not have received every intervention phase. In general, following one baseline session, each participant received the self-echoic assessment. Then, following at least three total baseline sessions, the participant received the initial auditory tact instruction (AT1). If intraverbal responses failed to emerge following AT1, the participant received the second auditory tact instruction phase (AT2). Following tact instruction, the participant received the auditory imagining procedure (AI), followed by the addition of an echoic requirement to the imagining procedure (AIE). If intraverbal responding still had not emerged at this point, the participant received a transfer-of-stimulus-control (ToSC) procedure until the mastery criterion for the simple intraverbal tests was met.
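The general phase sequence can be summarized in a small Python sketch. This is a simplification offered only for illustration: the names `PHASES` and `next_phase` are hypothetical, the function is assumed to be called once a phase has been completed, and the self-echoic assessment and follow-up probe are omitted.

```python
# Phase labels in the order described above ("baseline" is a label chosen here).
PHASES = ["baseline", "AT1", "AT2", "AI", "AIE", "ToSC"]


def next_phase(current: str, mastery_met: bool):
    """Return the phase that follows `current`, or None if the participant is done.

    `mastery_met` indicates whether the mastery criterion on the simple
    intraverbal tests has been met. ToSC repeats until mastery.
    """
    if current not in PHASES:
        raise ValueError(f"unknown phase: {current}")
    if mastery_met:
        return None  # finished, apart from a possible follow-up probe
    index = PHASES.index(current)
    # Advance one phase; ToSC, the last phase, repeats until mastery.
    return PHASES[min(index + 1, len(PHASES) - 1)]
```

For instance, a participant who has not met the criterion after AT2 would move to AI, whereas a participant who meets the criterion during AI would receive no further instructional phases.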

Pre-Experimental Tests

Prior to experimental data collection, pre-experimental tests were conducted. Pre-experimental tests were conducted exactly as the simple intraverbal tests described below. If a participant scored above 20 % correct on the pre-experimental test, then that participant was re-assessed with another pre-experimental test with replacement stimuli (i.e., intraverbal questions that corresponded to alternate auditory stimuli), and only the stimuli to which the participant responded correctly were replaced. Pre-experimental tests were presented a maximum of three times. If a participant scored at or below 20 % correct on any pre-experimental test, then that participant was included in pretest experimental data collection, and the same stimuli tested were used for baseline. If a participant did not score at or below 20 % on any of the three pre-experimental tests, then that participant would have been excluded from the study. No participant was excluded due to performance on the pre-experimental tests. Two participants received replacement stimuli (see set 2 in Tables 1 and 2).

Table 2 Target stimuli for tact instruction as well as the target tact response

Simple Intraverbal Tests

Target intraverbal tests consisted of a trial block of 20 questions, with two questions corresponding to each auditory stimulus (see Table 1). Each intraverbal test session began with the experimenter saying “I am going to ask you some questions; I want you to try your best to answer them correctly. If you don’t know the answer, you can say you don’t know. I will not be able to tell you if you’re right or wrong, but just try your best.” Each trial began with the experimenter obtaining the participant’s attention as defined by eye contact, and then presenting the question (e.g., “What makes the sound swish swish?”). The participant was able to earn stickers regardless of responding. In each trial, the experimenter delivered general praise (e.g., “great work”), but no differential or corrective feedback. In each session, the participant was presented with all 20 intraverbal questions. Mastery criterion was three consecutive test sessions with at least 90 % correct responding.
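The scoring and mastery rule for the simple intraverbal tests can be expressed as a brief Python sketch. The function names are hypothetical, and the sketch assumes that the three sessions at or above 90 % must be strictly consecutive, with the count restarting after any session below criterion.

```python
def percent_correct(n_correct: int, n_trials: int = 20) -> float:
    """Percentage of correct responses in one 20-question test session."""
    if not 0 <= n_correct <= n_trials:
        raise ValueError("n_correct must be between 0 and n_trials")
    return 100.0 * n_correct / n_trials


def meets_mastery(session_scores, threshold=90.0, run_length=3) -> bool:
    """True if the scores contain `run_length` consecutive sessions at or
    above `threshold` percent correct."""
    run = 0
    for score in session_scores:
        # Extend the current run, or restart it after a sub-criterion session.
        run = run + 1 if score >= threshold else 0
        if run >= run_length:
            return True
    return False
```

With this rule, hypothetical session scores of 85, 90, 95, and 100 % would meet the criterion, whereas 90, 95, 85, and 100 % would not, because the run of qualifying sessions is interrupted.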

Intraverbal Categorization Probes

Presentation of an intraverbal categorization probe occurred prior to each simple intraverbal test. Intraverbal categorization probes consisted of the experimenter asking two questions (i.e., “What are some sounds you know? Tell me as many as you can.” and “What are some things that make sounds? Tell me as many as you can”). The order of the presentation of each question was altered prior to each session. The presentation of each trial began with the experimenter obtaining the participant’s attention as defined by eye contact, and then presenting the question. Only target responses were scored as correct; however, secondary data were collected on all overt verbal responses as well as lip movement without vocalizations for each probe. Latency was not measured during intraverbal categorization probes. Once the participant did not provide a response for 5–7 s, the experimenter provided the prompt “any more?” Once the participant provided the entire target set or did not respond for 5–7 s following the prompt, the initial probe trial ended, and the second question was presented following the same process. The participant was able to earn stickers regardless of responding. The experimenter praised the participant for non-target behaviors and no corrective feedback was given. There was no mastery criterion for intraverbal categorization probes.

Tact Pretests and Posttests

Because each non-verbal auditory stimulus had two corresponding questions (i.e., “what is the sound?” vs. “what makes the sound?”), two sets of tacts were taught for each stimulus. For example, during the initial tact tests corresponding to the tact instructional phase (AT1) concerning what makes a sound, the experimenter began by saying “I am going to play some sounds and I want you to try your best to tell me what makes the sound.” Tests for the second tact, corresponding with the tact instructional phase (AT2) concerning what the sound is, were presented prior to instruction for that phase. The experimenter began the second portion of tact tests by saying “I am going to play some sounds and I want you to try your best to tell me what it is.” For each tact test session, the participant was presented with all 10 auditory stimuli (see Table 2), and had 5–7 s to respond with the appropriate tact to be scored as correct. No feedback was given on correct responses, and participants earned stickers regardless of responding. Mastery criterion was three consecutive test sessions of 100 % correct responding.

Self-Echoic Assessment

After the initial intraverbal pretest session participants were given a self-echoic assessment adapted from Esch et al. (2010). This was conducted to determine if responding on intraverbal tests differentially corresponded to an effective echoic or self-echoic repertoire. An echoic placement pretest was conducted exactly as in Esch et al.; participants were asked to repeat numbers increasing in level from one to nine digits (e.g., 1/1–2/1–2–3, etc.) each including three sets (e.g., 1–2, 2–4, 5–3). If the participant correctly echoed two of the three sets they would pass that level and move onto the next. Once the participant incorrectly echoed at least two of the three sets the pretest was terminated. The highest level that the participant passed was used for the assessment.
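The pass and termination rules of the echoic placement pretest can be sketched in Python as follows. The function name and the input format (one tuple of three booleans per digit level, in ascending order) are hypothetical, and returning 0 when no level is passed is an assumption, because that case is not described.

```python
def placement_level(results_by_level):
    """Return the highest digit level passed in the echoic placement pretest.

    `results_by_level` holds one 3-tuple of booleans per level, starting at
    the one-digit level; each boolean marks whether a set was echoed correctly.
    """
    highest_passed = 0  # assumption: 0 means no level was passed
    for level, sets in enumerate(results_by_level, start=1):
        if len(sets) != 3:
            raise ValueError("each level must contain exactly three sets")
        if sum(bool(s) for s in sets) >= 2:
            highest_passed = level  # at least two of three sets correct: pass
        else:
            break  # at least two of three sets incorrect: terminate pretest
    return highest_passed
```

For example, a participant who passes levels 1 and 2 but echoes only one of the three sets correctly at level 3 would be assessed with two-digit sequences.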

The echoic/self-echoic trials were similar to the trials described by Esch et al. (2010) with the only difference being that each participant in the current investigation was given the same instructional lead (e.g., “what?”) and distracter trials were not included in the present investigation. The experimenter presented approximately two numbers per second with the instructional lead (e.g., asking “what?”) provided 2–3 s after the participant’s response. Each participant was exposed to 20 echoic/self-echoic trials; each trial followed by the delivery of stickers regardless of the participant’s response. The experimenter praised the participant for non-target behaviors (e.g., “good job working hard”) and no corrective feedback was given.

Auditory Tact Instruction

During auditory tact instruction, the investigator began the session in a similar manner as during the initial tact test sessions described earlier. During the initial tact instructional (AT1) phase, the target tact was the name of the item or animal that produces the auditory stimulus. The initial prompt consisted of “I am going to play some sounds and I want you to try your best to tell me what makes the sound.” A session consisted of 10 trials, including the presentation of each of the 10 auditory stimuli once in random order. The mastery criterion was 100 % correct responding across three consecutive test sessions.

The second auditory tact instructional (AT2) phase was conducted if participants did not meet mastery criterion for the intraverbal tests following AT1. During the AT2 phase, the participants were again presented with the auditory stimulus (e.g., the sound of an eagle cawing) but were instructed to tact it as the sound (e.g., “caw”), rather than what was producing the sound. The investigator began the session exactly like the initial tact instruction phase described above, with the exception of the initial prompt consisting of “I am going to play some sounds and I want you to try your best to tell me what the sound is.” A correct response was followed by descriptive praise and a sticker. An incorrect response was followed by a corrective statement (e.g., “that sound is caw”) and re-presentation of the auditory stimulus until the correct tact was emitted. A session consisted of 10 trials, including the presentation of each of the 10 auditory stimuli once in random order. The mastery criterion was 100 % correct responding across three consecutive test sessions.

Auditory Imagining Instruction Procedure

If intraverbals failed to emerge following the tact instructional phases, the investigator implemented the auditory imagining instruction procedure. This involved the experimenter reading through a short script with the participants, with the auditory stimuli playing at specific times. The script for stimulus set 1 (see Appendix A for stimulus set 2 script) consisted of the experimenter saying: “If someone asks me a question, and I don’t know the answer I try to talk myself through it. For example if someone asks me what sound a car makes, I would say to myself ‘what other sounds do I know; I know that an eagle goes, [eagle cawing], right an eagle goes caw. I know that a camera goes, [camera clicking], oh right a camera goes click, I know that a cricket goes, [cricket chirping], oh right a cricket goes chirp chirp. I know that a phone goes, [phone ringing], oh right a phone goes ring ring. I know that scissors go, [scissors snipping], oh right scissors go snip snip. I know that water goes, [water dripping], oh right water goes drip. I know keys go, [keys jingling], oh right keys go jingle. I know that a clock tower goes, [clock tower ding donging], oh right a clock tower goes ding dong. I know that an alarm goes, [alarm buzzing], oh right an alarm goes buzz. And I know that a tambourine goes, [tambourine rattling], oh right a tambourine goes rattle.’” Following the first presentation of the script, the intraverbal tests and probes were immediately presented. If the participant did not attain 90 % correct responding on an intraverbal test, the script was re-presented, for a maximum of three presentations. Mastery criterion was set at 90 % correct responding for three consecutive intraverbal posttests.

Auditory Imagining Plus Echoic

If the intraverbal repertoire failed to emerge following three presentations of the auditory imagining instruction procedure, a vocal response requirement was included in all subsequent presentations of the script. The contents of the script remained identical to the previous phase; however, the participant was prompted to echo the last portion of each sentence. For example, following the experimenter saying “I know that an eagle goes [eagle cawing] oh right an eagle goes caw” the echoic prompt consisting of “can you say ‘an eagle goes caw?’” was provided. Descriptive praise and the delivery of a sticker followed each correct echoic, so the participants could have earned up to 20 stickers each time the script was presented. An incorrect response was followed by re-presentation of the entire sentence as well as the prompt to echo (e.g., “I know that an eagle goes [eagle cawing] right an eagle goes caw” then the prompt of “can you say ‘an eagle goes caw?’”) until the participant either correctly echoed the statement or the sentence was presented three times. A correct echoic response after an incorrect response was followed by descriptive praise but no stickers. Mastery criterion was 90 % correct responding for three consecutive intraverbal posttests.

Transfer of Stimulus Control

If the intraverbal repertoire failed to emerge following auditory imagining plus echoic instruction, then a transfer of stimulus control (ToSC) procedure was introduced for a set of intraverbal questions. Each participant began with a different set of intraverbal questions, depending upon when the participant began instruction. Following each presentation of the intraverbal question, the experimenter immediately vocalized the correct answer (e.g., “What is a frog sound? Ribbit”). If the participant correctly echoed the answer, descriptive praise was provided, the target question was re-presented, and the participant was given the opportunity to answer independently. A response was scored as prompted if the participant echoed the experimenter or independently answered correctly on re-presented trials. An incorrect response was scored if the participant emitted anything other than the correct response or did not respond within 5–7 s. A graduated prompt delay of 5 s was used once the participant scored 100 % correct with prompting in any one session. After 100 % correct prompted responding to the set of questions, in the following session the experimenter waited an additional 5 s beyond the previous session’s delay before emitting the correct answer for the participant to echo (e.g., after 100 % correct prompted responding at a 5-s delay, the experimenter asked “What is a frog sound?”, waited 10 s, and then said “ribbit”). If the participant responded before the experimenter provided the correct response, it was recorded as independent correct. Once the participant scored 100 % independent correct in three consecutive sessions on the first set of questions, the intraverbal tests and probes were re-presented. The intraverbal tests and probes were conducted exactly as described above. If participants did not meet mastery criterion following the first set of questions, additional sets were trained using the ToSC procedure. The ToSC phase continued until the participant met the mastery criterion of 90 % correct responding on intraverbal tests for three consecutive sessions or until all four sets had been trained.
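The graduated prompt delay can be illustrated with a minimal Python sketch. The function name is hypothetical, and two points are assumptions rather than reported procedures: that the delay starts at 0 s (the immediate prompt) and that it stays unchanged after any session with less than 100 % correct prompted responding. No upper limit on the delay is modeled because none is reported.

```python
def next_prompt_delay(current_delay_s: int, prompted_correct_pct: float,
                      step_s: int = 5) -> int:
    """Return the prompt delay (in seconds) for the following session.

    The delay grows by `step_s` after a session with 100 % correct prompted
    responding; otherwise it is held constant (an assumption, see lead-in).
    """
    if current_delay_s < 0:
        raise ValueError("delay cannot be negative")
    if prompted_correct_pct >= 100.0:
        return current_delay_s + step_s
    return current_delay_s
```

Under these assumptions, a participant who scores 100 % with an immediate prompt would move to a 5-s delay, and a further 100 % session at the 5-s delay would produce the 10-s delay described in the example above.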

Procedural Integrity

A trained independent observer scored 50 % of sessions to assess procedural integrity for test, instruction, and probe sessions. Each trial was coded as either correct or incorrect based on the appropriate experimenter behavior for each phase. The experimenter behaviors observed included presenting session instructions correctly (defined differently for each phase) and providing the correct consequence for participant behaviors, which was individually defined for each phase (e.g., not providing differential consequences during test and probe sessions, and providing corrective feedback during instruction sessions). The mean procedural integrity score across all sessions was 99.8 % (range, 90–100 %); for Murray, 99.75 % (range, 91–100 %); for Fred, 100 %; for Julie, 99.7 % (range, 90–100 %); and for Samantha, 99.9 % (range, 97–100 %).

Results

Self-Echoic Assessment

Results of the self-echoic assessment are depicted in Fig. 1. Murray echoed 17 responses correctly and self-echoed 14 responses correctly with 6 digits. Fred echoed 13 responses correctly and self-echoed 9 responses correctly with 6 digits. Julie echoed 16 responses correctly and self-echoed 16 responses correctly with 5 digits. Samantha echoed 0 responses correctly and self-echoed 12 responses correctly with 4 digits.

Fig. 1 Number of correct echoic and self-echoic responses for all participants. The line indicates a 1:1 ratio of correct echoic and self-echoic responses

Intraverbal Pretests and Posttests

During the initial pre-experimental test, Fred and Samantha made no correct responses. Murray and Julie both scored 30 % correct and thus met the requirement for inclusion of replacement stimuli and received a second pre-experimental test. Murray and Julie responded with 10 % and 5 % accuracy, respectively, with the replacement stimuli.

Figure 2 displays results for simple intraverbal tests for all participants. Results for the pretests showed an average of 10 % correct intraverbal responses (range, 0–20 %) for all participants. Responding remained low after the initial tact instructional (AT1) phase for all of the participants (M = 6.25 %, range, 0–10 %). Following the second tact instructional (AT2) phase, responding increased for three of the participants (M = 37.68 %, range, 0–85 %). Murray and Julie met mastery criterion following the introduction of auditory imagining instruction (AI), while Fred and Samantha showed no increase in responding. Two participants, Fred and Samantha, were exposed to the auditory imagining plus echoic (AIE) condition. Session instructions were also altered for Fred to include a rule (AIE+R) during this phase. The rule was introduced in an attempt to increase response variability. The rule consisted of altering the pre-session instructions (i.e., “This time do not tell me the sound that the word makes. For example, if I ask ‘what is a car sound’ do not say ‘cah,’ instead try to remember the words that we talked about”, and “this time do not tell me the alphabet and alphabet sounds; but try to remember the sounds that we talked about”) for simple intraverbal and categorization probes, respectively. No increase was seen during this phase, and Samantha’s correct responding decreased. Following the mastery of the initial transfer of stimulus control set, each participant showed increases in the intraverbal tests above what was taught (M = 81.43 %, range, 60–100 %). Responding was near mastery level during follow-up sessions for three participants (M = 91.67 %, range, 85–100 %).

Fig. 2 The primary y-axis depicts percent of correct intraverbals (represented by closed circles) and percent of occurrences of lip movements (represented by open circles). The secondary y-axis depicts average latency in seconds for responses (represented by bars)

Intraverbal Categorization Probes

Table 3 displays results for intraverbal categorization-makes probes for all participants. Results for the pretest probes showed an average of 0.58 % correct intraverbal responses (range, 0–10 %) during each session for all participants. Responding remained low after the AT1 phase for three of the four participants (M = 15 %, range, 0–60 %). Following the AT2 phase, responding was low for all of the participants (M = 0.71 %, range, 0–10 %). Responding decreased to zero levels following the introduction of AI for all participants. Fred’s responding increased in comparison to previous phases during the AIE+R phase (43.3 %), whereas Samantha showed no improvement during either the AIE phase or the post-ToSC probes (0 %). Fred’s responding increased post-ToSC (55 %) and during the follow-up (FU) (80 %). Murray and Julie remained at 0 % at FU.

Table 3 Mean percent of trials with correct categorization during open-ended “makes” and “is” probes

Table 3 displays results for intraverbal categorization-is probes for all participants. Results for the pretest probes showed an average of 0 % correct intraverbal responses for all participants. Responding remained at zero levels after the initial AT1 phase for all participants. Following the AT2 phase, responding was low for all of the participants (M = 3.93 %, range, 0–40 %). Responding remained at low levels following the introduction of AI for all participants (M = 4.29 %, range, 0–50 %). Fred’s responding increased in comparison to previous phases during AIE+R (70 %), whereas Samantha showed no improvement during either the AIE phase or the post-ToSC probes (0 %). Fred’s responding decreased to 60 % during post-ToSC and FU sessions. Julie remained at 0 % at FU, while Murray increased to 30 %.

Instruction

Murray reached mastery criterion in 11 sessions (110 trials) during the AT1 phase and 23 sessions (230 trials) during the AT2 phase. Fred reached mastery criterion in 7 sessions (70 trials) during the AT1 phase, 12 sessions (120 trials) during the AT2 phase, and 4 sessions (40 trials) for ToSC. Julie reached mastery criterion in 12 sessions (120 trials) during the AT1 phase and 25 sessions (250 trials) during the AT2 phase. Samantha reached mastery criterion in 5 sessions (50 trials) during the AT1 phase, 8 sessions (80 trials) during the AT2 phase, and 28 sessions (280 trials) for ToSC.

Lip Movement and Latency

Figure 2 displays results for lip movement and latency measures for all participants. Latency was calculated by scoring each trial at the lower boundary of the range in which the response occurred (e.g., a response within 0–2 s was scored as 0 for that trial, a response between 2 and 4 s was scored as 2, etc.), summing the scores across trials, and dividing by the number of trials. Latency was scored in this manner to provide a conservative estimate and because most responses occurred immediately, within 0–2 s. The majority of responses across all phases of the experiment occurred 0–2 s after the question was presented (M = 80.18 %). The percentage of trials with lip movement remained relatively low across phases for all participants (M = 14.28 %). Murray emitted the fewest lip movements of the participants (M = 2.36 %). Fred and Samantha emitted low levels of lip movement for the first portion of the experiment, but their lip movements increased sharply after the introduction of the AIE+R and ToSC phases, respectively. Julie emitted low levels of lip movement, which remained slightly variable throughout the experiment (M = 17.46 %, range, 10–35 %). During both categorization-is and categorization-makes probes, lip movement for all participants remained at close to zero levels throughout the experiment. Murray did not engage in any lip movements during categorization probes in any phase. For Fred, there was no lip movement during categorization-makes probes except following ToSC (M = 10 %), and lip movement during categorization-is probes occurred only during AIE (M = 1.67 %), ToSC (M = 7.5 %), and FU (M = 5 %). For Julie, there was no lip movement during categorization-makes probes except following AT2 (M = 1 %), and she did not emit any lip movements during categorization-is probes. Samantha did not engage in any lip movements during any categorization probe.
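The latency scoring described above (each trial scored at the lower boundary of its 2-s range, then averaged across trials) can be made concrete with a minimal sketch. The function names are hypothetical. The sketch also assumes that a response falling exactly on a boundary (e.g., 2.0 s) is placed in the higher range, a case the scoring rule does not specify.

```python
def lower_bound_score(latency_s, bin_width_s=2):
    """Score one trial as the lower boundary of the bin containing its latency.

    Assumption: a latency exactly on a boundary (e.g., 2.0 s) falls in the
    higher bin, so it is scored as 2 rather than 0.
    """
    return int(latency_s // bin_width_s) * bin_width_s


def mean_binned_latency(latencies_s, bin_width_s=2):
    """Average the lower-boundary scores across all trials in a session."""
    scores = [lower_bound_score(t, bin_width_s) for t in latencies_s]
    return sum(scores) / len(scores)
```

For example, latencies of 0.5, 1.9, 2.5, and 5.0 s would be scored as 0, 0, 2, and 4, giving a session mean of 1.5 s. Because every response is rounded down, this measure underestimates the true mean latency by up to 2 s per trial.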

Discussion

The first purpose of this experiment was to investigate the effects of auditory tact instruction on the production of related intraverbal responses. The results of the simple intraverbal tests indicate that teaching auditory tacts may be beneficial to generate functional interdependence with simple intraverbals. Categorization responses have been conceptualized as an example of a “complex intraverbal repertoire” (Sautter et al. 2011). It is reasonable, then, that the acquisition of simple intraverbal behavior may not be interdependent with categorization responses. The results of both groups of intraverbal tests indicate that the emergence of either simple or categorization intraverbal responding did not correlate with the production of the other. Secondary measures were collected on lip movement and latency to investigate the possible role of covert responses mediating the emergence of intraverbals. Results of the latency and lip movement measures indicate that it is likely that participants’ responding was not mediated by private verbal behavior, although lip movement does not always perfectly correlate with private verbal behavior. It is possible that the increased responding following the second tact instructional phase (AT2) for Murray and Julie was due to a functionally interdependent tact and intraverbal repertoire following two sets of tact instructional phases, although it is also possible that they were engaging in a conditioned hearing response to mediate responding. If conditioned hearing is a non-verbal perceptual operant (Skinner 1953), then emission of lip movements would not be necessary to accompany the mediating response.

The second purpose was to investigate the possible role of an auditory imagining instruction procedure to facilitate the emergence of intraverbal responding. It was conceptualized that the additional exposure (i.e., pairings) to the auditory stimuli and words would increase the probability of a conditioned hearing response of the auditory stimulus. Results showed no improvements in intraverbal categorization; however, Murray and Julie both met the mastery criterion for simple intraverbals in this phase. On trials where Julie emitted additional overt verbal behavior during this phase, she would echo the target word presented in the question. For example, if the question was “what is an eagle sound?” she would echo the word “eagle.” It is possible that she engaged in this corollary echoic behavior to increase the probability of evoking the hearing response to mediate correct intraverbal behavior. This seems to indicate that some mediating behavior was necessary to produce the intraverbal responses.

An echoic requirement was added for Fred; however, a rule was also included during this condition (AIE+R). Fred’s responding had become rote in that he would respond to each question with the sound that the target word begins with (e.g., “what makes the sound caw?” is answered with “cah”) to simple intraverbal questions and with the alphabet and alphabet sounds to categorization probes. Fred’s simple intraverbal responses did not increase in this phase; however, his categorization responses increased, indicating that there may be independence within the intraverbal operant class. Fred answered each categorization question with both what makes the sound and what the sound is (e.g., “eagles goes caw,” etc.), indicating that his responding was not mediated by a conditioned hearing response, but rather was controlled intraverbally. Following mastery of the initial set of ToSC, Fred and Samantha’s simple intraverbal responses improved, indicating that they needed some direct reinforcement on some simple intraverbal responding for others to emerge. Murray, Fred, and Julie were given a follow-up session (FU) to test whether the intraverbal responses would maintain. Julie and Murray’s responding remained high on the simple intraverbal tests. Fred was able to maintain 100 % on simple intraverbals, and was able to emit several correct categorization responses. This indicates that these responses were able to maintain at a high level with up to 1 month between sessions.

The third purpose was to investigate the relation between echoic and self-echoic repertoires and the ability to emit novel intraverbals. Results of the self-echoic assessment showed that Murray and Julie had the most correct responding on both the echoic and self-echoic portions, and they met mastery criterion for simple intraverbals after the introduction of AI. This indicates that the ability to emit echoic and self-echoic responses may correlate with a differential ability to emit novel intraverbals. Julie echoed the target word of the question before answering several questions. It may be speculated that the echoic response facilitated the conditioned hearing response, leading to the ability to correctly answer the question. Fred emitted fewer self-echoics than echoics, and this lack of proficiency in self-echoics may have been involved in his need for direct reinforcement of some questions before the other intraverbals emerged. Samantha was able to emit several self-echoic responses but no correct echoic responses during the assessment. A limited echoic repertoire could have contributed to her inability to emit the simple intraverbal responses because, after hearing the question, she perhaps could not produce appropriate supplementary stimuli (e.g., privately echoing a portion, or the entirety, of the question presented by the experimenter).

There are several noteworthy limitations of the current investigation. First, only Murray and Julie met the mastery criterion in the AI phase; however, their responding increased dramatically following AT2. It is possible that a more stringent mastery criterion or remedial tact instructional phases could have increased responding to mastery levels for these participants. It is also possible that the greater amount of exposure to tact instruction for Murray and Julie, as compared to Fred and Samantha, increased the probability that intraverbals would emerge. Fred and Samantha did not show substantial improvements on the simple intraverbal tests until after direct instruction for a portion of the stimuli; however, both showed a performance increase greater than what was taught. It is unclear what role the participants’ verbal histories played in the current investigation; future research should include pre-experimental verbal assessments to address this issue. Second, auditory imagining is a covert response that cannot be directly measured at this time. It therefore cannot be verified that the participants were engaging in any covert mediating behavior. Third, Fred’s responding only increased following the alteration of session instructions. The role of this alteration in improving performance is unclear. Fourth, the measure for latency may not have been sensitive enough to detect differences in responding. The time frame used to code latency consisted of 2-s blocks, which could have been too coarse to discern differential responding due to covert mediating responses. Finally, it is possible that it was inappropriate to use an echoic transfer of stimulus control procedure for Samantha, as she did not score any correct echoic responses on the self-echoic assessment. It is possible that using a separate operant to transfer responding to the intraverbal questions could have decreased the number of trial blocks needed for mastery and potentially increased subsequent responding on intraverbal tests and probes.

There are several issues that warrant further investigation. First, although responding increased following the AT2 phase for three of the four participants, none were able to reach mastery criterion. Future research should address whether a history of interacting with sounds increases the likelihood of transfer. Similarly, because responding increased so markedly for two participants following the tact instructional phases, it is somewhat unclear whether the AI procedure was responsible for the remaining increase to mastery or whether the two tact instructional phases could have been sufficient. Second, it is conceivable that the increases seen during the AI phase for Murray and Julie depended upon the previous exposure to the tact instructional phases. It is potentially worth investigating the effects of the AI procedure without this previous learning history. Third, for Murray and Fred, categorization responses emerged before simple intraverbal responses. More research is warranted on the covariation between simple intraverbal responding and intraverbal categorization responses. Fourth, additional research is warranted on the role of rule statements in imagining procedures and in the production of intraverbal responses. Finally, future research should investigate whether responses sharing the same antecedent sense mode are initially more susceptible to emergence. For example, would correct responding to untaught intraverbal behavior with auditory antecedent control (e.g., someone vocally asking a question) emerge faster if the verbal operant used in a transfer procedure shared auditory antecedent control rather than antecedent control in a visual or other sense mode?