The impact of training methodology and category structure on the formation of new categories from existing knowledge

Hélie, Sébastien; Shamloo, Farzin; Ell, Shawn W.

doi:10.1007/s00426-018-1115-3

Original Article
Published: 27 October 2018

Volume 84, pages 990–1005, (2020)

Psychological Research

Abstract

Categorization decisions are made thousands of times every day, and a typical adult knows tens of thousands of categories. It is thus relatively rare that adults learn new categories without somehow reorganizing pre-existing knowledge. Yet, most perceptual categorization research has investigated the ability to learn new categories without considering their relation to existing knowledge. In this article, we test the ability of young adults to merge already known categories into new categories as a function of training methodology and category structures using two experiments. Experiment 1 tests participants’ ability to merge rule-based or information-integration categories that are either contiguous, semi-contiguous, or non-contiguous in perceptual space using a classification paradigm. Experiment 2 is similar to Experiment 1 but uses a YES/NO learning paradigm instead. The results of both experiments suggest a strong effect of the contiguity of the merged categories in perceptual space that depends on the type of category representation that is learned. The type of category representation that is learned, in turn, depends on a complex interaction of the category structures and training task. We conclude by discussing the relevance of these results for categorization outside the laboratory.

Categorization is a ubiquitous cognitive process and categorization decisions are made thousands of times every day (Hélie, Waldschmidt, & Ashby, 2010). As a result, a typical adult knows tens of thousands of categories. It is thus relatively rare that adults learn new categories without relying on prior knowledge. For example, if one already knows the category “red objects” and the category “large objects”, then learning the category “large red objects” should be fairly straightforward and would not need re-learning about the color red or large sizes. Instead, the already known categories can be merged together to form the new category. This property has been referred to as compositionality (Fodor & Pylyshyn, 1988). Although this issue has been investigated in the context of natural categories (Aerts, Gabora, & Sozzo, 2013; Cohen & Murphy, 1984; Prinz, 2012; Smith, Osherson, Rips, & Keane, 1988; Voorspoels, Storms, & Vanpaemel, 2012; Wisniewski, 1997; Wu & Barsalou, 2009; Zadeh, 1982), most perceptual category learning tasks have investigated the ability to learn artificial categories without relying on prior knowledge (e.g., Ashby & Maddox, 2010; Erickson & Kruschke, 1998; Medin & Schaffer, 1978; Nosofsky, 1986; Posner & Keele, 1968; Shepard, Hovland, & Jenkins, 1961; Smith & Minda, 2002). In this article, we test the ability of young adults to merge already known categories into new categories as a function of training methodology and category structures using two experiments. The results show that categories that are contiguous in perceptual space are easier to merge, and that the magnitude of the merging cost may depend on the type of category representation that is learned. The type of category representation that is learned, in turn, depends on both the category structures of the learned categories and the training task (Ell, Smith, Peralta, & Hélie, 2017; Hélie, Shamloo, & Ell, 2017).

Category representation and generalization

Category representations (i.e., the way in which information is stored and used: Markman, 2002) are the building blocks of decision-making from the most routine to the most novel contexts (Hélie & Ashby, 2012). Generally speaking, category representations can be broadly classified as within-category representations or between-category representations (Levering & Kurtz, 2015; Markman & Ross, 2003). Specifically, within-category representations contain information about the categories themselves. For example, a within-category representation of humans could contain information about what is common among category members (e.g., one head), the correlation between the features (e.g., as height increases, so does arm length), or the range of feature values (e.g., adult height typically varies between 5 and 6.5 feet). In contrast, between-category representations would contain information about the distinguishing features between two categories. For example, a between-category representation contrasting humans and dogs might contain information about the relevant features for separating humans and dogs (e.g., number of legs) and criteria on these feature-values (e.g., less than three legs is generally a human; more than three legs is generally a dog). Hélie, Ell, and colleagues (Ell et al., 2017; Hélie et al., 2017) argued that within-category representations may be more useful when inferring missing attributes (e.g., one may infer that the author’s spouse has two legs without being told) whereas between-category representations may be useful for extrapolation outside of the learning space (e.g., an animal with one leg is even less likely to be a dog than a human). In the category learning literature, prototype models (Reed, 1972; Smith & Minda, 2002) and exemplar models (Nosofsky, 1986) assume that a within-category representation is the basis for response selection, whereas criterion-setting models (Erev, 1998; Treisman & Williams, 1984) assume a between-category representation.

Given that category representations may contain different types of information, it is reasonable to expect that within- and between-category representations may assemble differently when merged to create new categories. For example, between-category representations can be represented as decision criteria in perceptual space (e.g., a threshold on height separating tall people from short people), so compositionality could consist of linking multiple decision criteria together using logical connectors (e.g., AND, OR). In the earlier example of large red objects, these would be objects that fall above the large threshold on size AND above some threshold in the hue spectrum. The decisions could be made independently on each dimension using a conjunction rule (Ashby, Alfonso-Reese, Turken, & Waldron, 1998). Accordingly, applying multiple rules consecutively could be taxing for working memory resources (Miles & Minda, 2011), and one would expect that difficulty would increase with the number of decision criteria that need to be applied to merge the categories (Erickson, 2008). In other words, the effect would be similar to increasing the Boolean complexity of the categories (Feldman, 2003; Shepard et al., 1961).
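As a minimal sketch of this idea, a between-category representation can be written as a set of one-dimensional decision criteria joined by a logical connector. The function names, the 0 to 1 scales, and the criterion values below are hypothetical and chosen only for illustration; they are not taken from the article.

```python
def above(criterion):
    """Return a one-dimensional decision rule: True when the value exceeds the criterion."""
    return lambda value: value > criterion

# Hypothetical criteria on arbitrary 0-1 scales (illustrative assumptions).
is_large = above(0.6)   # criterion on the size dimension
is_red = above(0.7)     # criterion on a "redness" dimension

def is_large_red(size, redness):
    """Conjunction rule: each dimension is judged independently, then joined with AND."""
    return is_large(size) and is_red(redness)

print(is_large_red(0.9, 0.8))  # large and red
print(is_large_red(0.9, 0.2))  # large but not red
```

Under this view, each additional criterion adds one more rule that has to be held and applied, which is where the expected working memory cost would come from.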

In contrast, within-category representations can be represented by the generative model (e.g., density distribution) that is most likely to have generated the category members (Hélie et al., 2017). For example, imagine four categories formed based on people’s height (e.g., short, average, tall, very tall). Let’s further imagine that each category was generated using a normal distribution, so an ideal observer learning within-category information would learn 4 generative models of the categories (i.e., 4 separate normal distributions, one for each category) and use the sample means and variances to estimate the distribution parameters. The top row of Fig. 1 shows an example resulting model where the x-axis would represent the height of an individual and the y-axis would represent the probability.

After learning the 4 categories, imagine that the ideal observer is asked to merge the learned categories into 2 new categories. There are many ways in which generative models can be merged. If the newly formed categories are contiguous, one could aggregate the learned generative distributions by calculating a grand mean and variance to include all the stimuli from the merged categories, or learn a mixture model of the training categories. Mixture models are statistical models where a distribution is formed using a weighted sum of basis functions (in this case normal distributions) with different parameter values (Bishop, 2006). These two possibilities are shown in the middle row of Fig. 1. These two merged models are fairly straightforward and not much complexity is added since there are no stimuli with a different category label between the merged categories. However, if the categories become non-contiguous, then the best that can be done by the ideal observer is to create mixture models that are semi-contiguous (Fig. 1, bottom-left) or non-contiguous (Fig. 1, bottom-right). These mixture models are discontinuous, which could drastically increase the difficulty of the categorization task.
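The two ways of merging contiguous categories described above can be sketched as follows. This is an illustration under stated assumptions: the height values, the equal weights, and the function names are hypothetical, and the grand variance is computed with the law of total variance, which is one reasonable reading of "calculating a grand mean and variance".

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def integrated_model(components, weights):
    """Collapse several normal components into a single normal distribution
    by computing the grand mean and the grand variance (law of total variance)."""
    grand_mean = sum(w * m for w, (m, _) in zip(weights, components))
    grand_var = sum(w * (v + (m - grand_mean) ** 2)
                    for w, (m, v) in zip(weights, components))
    return grand_mean, grand_var

def mixture_pdf(x, components, weights):
    """Density of a mixture model: a weighted sum of normal basis functions."""
    return sum(w * normal_pdf(x, m, v) for w, (m, v) in zip(weights, components))

# Hypothetical height categories in cm, given as (mean, variance) pairs.
short, average = (160.0, 25.0), (170.0, 25.0)
weights = [0.5, 0.5]  # equal weights are an assumption

print(integrated_model([short, average], weights))   # (165.0, 50.0)
print(mixture_pdf(165.0, [short, average], weights))
```

The integrated model needs no mixture weights at all, whereas the mixture model keeps the original components and only needs the weights to be estimated.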

Generating predictions

To provide an intuition for the difficulty of merging the generative models, we simulated the merging models included in Fig. 1. The simulation went as follows: (1) A set of 96 training stimuli was generated from 4 Gaussian distributions. The resulting maximum likelihood Gaussian models are shown in the top row of Fig. 1. (2) Next, the training models identified in (1) were merged and tested using 96 new stimuli generated from the original training distributions. For the contiguous condition, both models in the middle row of Fig. 1 were tested. For the semi-contiguous and non-contiguous conditions, the models in the bottom row of Fig. 1 were tested. (3) In all cases, the test stimuli were presented one at a time and the probability of the stimulus under each merged category was calculated based on the distribution models. (4) A response was selected stochastically. Each condition was simulated 10,000 times.
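The four steps above can be sketched in a few lines. This is only an approximation of the procedure and not the authors' simulation code: the generating means and standard deviations, the equal mixture weights, the probability-matching response rule, and the names `simulate` and `MERGES` are all assumptions made for illustration, and no noise is added to the mixture weights.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gaussian(xs):
    """Maximum likelihood estimates of the mean and variance."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var

# Merged test categories for each condition (training labels per new label).
MERGES = {
    "C":  {"1": "AB", "2": "CD"},
    "SC": {"1": "AD", "2": "BC"},
    "NC": {"1": "AC", "2": "BD"},
}

def simulate(merge, rng, n_per_cat=24):
    """One run: train on 4 categories, merge them, test on new stimuli."""
    # Hypothetical generating distributions as (mean, SD); not the article's values.
    true = {"A": (110.0, 7.0), "B": (150.0, 7.0), "C": (190.0, 7.0), "D": (230.0, 7.0)}
    # (1) Generate training stimuli and fit one Gaussian per category.
    fitted = {c: fit_gaussian([rng.gauss(m, s) for _ in range(n_per_cat)])
              for c, (m, s) in true.items()}
    correct = 0
    total = 0
    for cat, (m, s) in true.items():
        for _ in range(n_per_cat):
            x = rng.gauss(m, s)  # (2) new test stimulus from the training distribution
            # (3) density of x under each merged category (equal-weight mixture)
            dens = {label: sum(0.5 * normal_pdf(x, *fitted[c]) for c in members)
                    for label, members in merge.items()}
            # (4) stochastic response by probability matching (an assumption)
            p1 = dens["1"] / (dens["1"] + dens["2"])
            response = "1" if rng.random() < p1 else "2"
            correct += cat in merge[response]
            total += 1
    return correct / total

rng = random.Random(0)
for name, merge in MERGES.items():
    accuracy = sum(simulate(merge, rng) for _ in range(100)) / 100
    print(name, round(accuracy, 3))
```

With well-separated categories and noiseless weights, all three conditions should yield high accuracy in this sketch; differences between conditions would only emerge once the mixture weights are perturbed.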

When the learning densities (top row of Fig. 1) are merged with optimal mixture weights (i.e., the weights linearly combining the merged learning densities), all conditions produce a test accuracy \(> 99\%\). However, human participants do not have the opportunity to learn the mixture weights: They are only told that the categories are merged. To produce more realistic predictions, we added mean-centered Gaussian noise to the optimal mixture parameter estimates (\(SD = 1\)).^{Footnote 1} With the noisy weight estimates, the contiguous mixture model (Fig. 1, middle-right) produced a test accuracy of 94.2%, the semi-contiguous merged model (Fig. 1, bottom-left) produced a test accuracy of 80.2%, and the non-contiguous merged model (Fig. 1, bottom-right) produced a test accuracy of 73.4%. Finally, because the integrated merged model (Fig. 1, middle-left) did not require estimating mixture weights, it produced a test accuracy of 96.7%.

These simulation results thus allow for the following predictions. If participants are merging categories using the integrated model, then there should be nearly perfect transfer from training to test. However, all the mixture models should produce a transfer cost. The transfer cost for the mixture model is small in the contiguous condition (about 5%), but is substantial when category contiguity is broken. For the semi-contiguous condition, the transfer cost is about 20%, and for the non-contiguous condition, the transfer cost is about 26%. Hence, breaking contiguity of the merged categories produces a large transfer cost, with a small difference between semi- and non-contiguous conditions. This suggests an all-or-none type of transfer cost when breaking contiguity with within-category representations.

Hypotheses

The present experiments test the effects of category contiguity in perceptual space on merging difficulty as a function of category structures and training methodology. Hélie et al. (2017) and Ell et al. (2017) showed that category structures and training methodology interact in determining the type of information that is learned in perceptual categorization. In a typical classification (A/B) experiment, participants are shown a stimulus and asked to assign the stimulus to one of a number of contrasting categories by pressing a response button corresponding to the category. For example, a participant might be asked to press the left button if the animal is a human and the right button if the animal is a dog. Hélie and colleagues showed that more than half of the participants trained in A/B with rule-based (RB) categories learned between-category information. In contrast, over 70% of the participants trained in A/B with information-integration (II) categories learned within-category information. We thus predict that, with A/B training, the difficulty of merging already learned RB categories will increase with the number of decision bounds that need to be joined with logical connectors. In contrast, the difficulty of merging already learned II categories with A/B categorization will be increased abruptly by the mere presence of a discontinuity.

Experiment 1 tests for these hypotheses by training participants in a 4-category A/B task using either RB or II categories. After training, participants transferred to a condition where they had to merge the 4 training categories into 2 new categories. The new categories could be formed using contiguous (C), semi-contiguous (SC), or non-contiguous (NC) training categories. To anticipate, the results show that, as predicted, each additional decision bound increased the transfer cost with RB categories. Hence, the C condition was easiest (requiring 1 bound), followed by the SC condition (requiring 2 bounds), and the NC condition was the most difficult (requiring 3 bounds). In contrast, breaking contiguity increased the transfer cost in an all-or-none fashion with II categories. Specifically, the transfer cost for the SC and NC conditions was higher than for the C condition (as with the RB categories), but there was no evidence of a transfer cost difference between the SC and NC conditions (unlike with the RB categories).

Another task popular in perceptual categorization is the YES/NO task. In a typical YES/NO experiment, participants are shown a stimulus with a category label and asked to accept or reject the association by pressing a different response button for yes and no. For example, a participant might be shown an animal with the label human and be asked to press the right button if the animal is a member of the category human (i.e., “yes”) or the left button if the animal is not a member of the category human (i.e., “no”). Hélie et al. (2017) showed that most participants trained with YES/NO learned within-category information for both RB and II category structures. We thus predict that, with YES/NO training, the difficulty of merging already learned categories will depend on the contiguity of the categories in perceptual space (similar to merging II categories with A/B training). However, with YES/NO training, this prediction should hold for both RB and II categories.

Experiment 2 tests for this hypothesis by reproducing Experiment 1 with the only difference being that participants were trained with YES/NO categorization (same stimuli, categories, and transfer conditions). To anticipate, the results show that, as predicted, the transfer cost was higher for the NC condition compared to the C condition, and did not depend on the category structures. However, unlike in Experiment 1, there was no evidence suggesting a difference in transfer cost between the SC condition and either the C or NC condition.

Experiment 1

Experiment 1 tested the effects of category structures and category contiguity on the compositionality of categories learned using an A/B paradigm. Participants learned four II or RB categories using trial-and-error and then transferred to a two-category task where two new categories were created by merging learned categories. The merged categories at test could be contiguous, semi-contiguous, or non-contiguous.

Method

Participants

One hundred eighty-eight Purdue University undergraduate students were recruited to participate in this experiment. There were two category structures (RB and II) and three testing conditions (C, SC, and NC). Participants were randomly assigned to one of the six combinations of category structure \(\times\) testing conditions: RB/C (\(n = 32\)), RB/SC (\(n = 31\)), RB/NC (\(n = 30\)), II/C (\(n = 33\)), II/SC (\(n = 32\)), and II/NC (\(n = 30\)). Each participant was given credit for participation as partial fulfillment of a course requirement.

Material

The stimuli were lines of various lengths and orientations presented on a 21-inch monitor (\(1920 \times 1080\) resolution). Each stimulus was defined in a 2D space by a set of points (length, orientation) where length was calculated in pixels and orientation (counterclockwise rotation from horizontal) was calculated in degrees. The stimuli were generated with the Matlab Psychophysics toolbox (Brainard, 1997) and occupied an approximate visual angle of 5 degrees. Figure 2a shows an example stimulus.

Four categories (arbitrarily labeled “A”, “B”, “C”, and “D”) were generated using the randomization technique of Ashby and Gott (1988). Each category was generated using a bivariate normal distribution. The parameters to generate the RB category structures were as follows (Fig. 2b): \(\mu_A = (110, 67)\), \(\Sigma_A = \left(\begin{matrix} 50 & 0\\ 0 & 350 \end{matrix}\right)\); \(\mu_B = (150, 67)\), \(\Sigma_B = \Sigma_A\); \(\mu_C = (190, 67)\), \(\Sigma_C = \Sigma_A\); \(\mu_D = (230, 67)\), \(\Sigma_D = \Sigma_A\). To generate the II category structures, we used the following parameters (Fig. 2c): \(\mu_A = (122, 88)\), \(\Sigma_A = \left(\begin{matrix} 646 & 313\\ 313 & 179 \end{matrix}\right)\); \(\mu_B = (159, 77)\), \(\Sigma_B = \Sigma_A\); \(\mu_C = (182, 61)\), \(\Sigma_C = \Sigma_A\); \(\mu_D = (210, 44)\), \(\Sigma_D = \Sigma_A\). RB categories can be separated using a rule on line length while ignoring the line orientation: the shortest lines are from category “A”, medium-short lines are from category “B”, medium-long lines are from category “C”, and the longest lines are from category “D”. No such verbalizable rule exists for II categories. Perfect accuracy was possible in all conditions.
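For illustration, stimuli with these parameters could be drawn as in the sketch below. It uses plain random sampling through a hand-written 2×2 Cholesky factor; any further steps of the Ashby and Gott (1988) procedure, and whatever ensured that perfect accuracy was possible, are not reproduced here. The function names and the seed are hypothetical.

```python
import math
import random

def sample_bivariate_normal(mean, cov, rng):
    """Draw one (length, orientation) point via a 2x2 Cholesky factor of cov."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 ** 2)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return (mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2)

# Parameters as reported in the text: (length in pixels, orientation in degrees).
RB_COV = ((50, 0), (0, 350))
II_COV = ((646, 313), (313, 179))
RB_MEANS = {"A": (110, 67), "B": (150, 67), "C": (190, 67), "D": (230, 67)}
II_MEANS = {"A": (122, 88), "B": (159, 77), "C": (182, 61), "D": (210, 44)}

def generate_stimuli(means, cov, n_per_category=24, seed=0):
    """Return a list of (category label, (length, orientation)) pairs."""
    rng = random.Random(seed)
    return [(label, sample_bivariate_normal(mu, cov, rng))
            for label, mu in means.items() for _ in range(n_per_category)]

rb_stimuli = generate_stimuli(RB_MEANS, RB_COV)
ii_stimuli = generate_stimuli(II_MEANS, II_COV)
print(len(rb_stimuli), len(ii_stimuli))  # 96 96
```

The zero off-diagonal terms in the RB covariance make length and orientation independent, whereas the II covariance ties the two dimensions together, so no single dimension separates the II categories.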

Twenty-four stimuli were generated from each category for a total of 96 stimuli. The resulting stimuli were re-shuffled at the beginning of each block and each stimulus was presented once in each block. In each trial, a single stimulus was presented in the center of the screen with a question in the center-top of the screen asking a specific categorization question: “X or Y?”, where X and Y stand for one of the category labels used in the experiment. During the training blocks, the category labels were A, B, C, or D. For example, substituting A for X and B for Y would produce the question “A or B?”. The question indicated the possible choices for the categorization trial. By creating all the possible combinations there were six possible questions. Each question appeared 16 times in each training block. Correct responses for each question were also equally split (e.g., the correct response to half of the “A or B?” questions was “A” and to the other half “B”, and so on). Positive feedback was indicated by the word “Correct” in green font, negative feedback was indicated by the word “Incorrect” in red font, and late responses (i.e., more than 5 seconds) were followed by the words “Too slow!” in black font.
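The trial counts in this design can be checked with a short sketch. The helper name `build_block` is hypothetical, and the shuffling and the pairing of questions with specific stimuli are left out.

```python
from collections import Counter
from itertools import combinations

LABELS = "ABCD"
REPEATS_PER_QUESTION = 16

def build_block():
    """Return one training block as (question, correct_label) pairs."""
    trials = []
    for x, y in combinations(LABELS, 2):          # six unordered pairs
        question = f"{x} or {y}?"
        half = REPEATS_PER_QUESTION // 2
        trials += [(question, x)] * half + [(question, y)] * half
    return trials

block = build_block()
print(len(block))                                  # 96
print(Counter(correct for _, correct in block))    # each label is correct 24 times
```

Each label appears in three of the six questions and is the correct answer on 8 of the 16 presentations of each, which yields the 24 stimuli per category reported above.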

During the test block, two new non-overlapping categories were formed by merging together two training categories. The new categories were arbitrarily labeled “1” and “2”. In the C condition, 1 = {A, B} and 2 = {C, D}. In the SC condition, 1 = {A, D} and 2 = {B, C}. In the NC condition, 1 = {A, C} and 2 = {B, D}. The test categories are shown in Fig. 3 for the RB conditions and in Fig. 4 for the II conditions. Trials during the test block were identical to those in the training blocks except that the categorization question was always “1 or 2?”.
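The three merging schemes can be written as simple mappings from training labels to test labels. The sketch below also counts how often the test label changes between neighboring training categories when they are ordered A, B, C, D, which for the RB structures corresponds to the number of decision bounds on line length. The names `MERGES` and `count_bounds` are hypothetical.

```python
MERGES = {
    "C":  {"A": "1", "B": "1", "C": "2", "D": "2"},
    "SC": {"A": "1", "B": "2", "C": "2", "D": "1"},
    "NC": {"A": "1", "B": "2", "C": "1", "D": "2"},
}

def count_bounds(mapping, order="ABCD"):
    """Count label changes between neighboring training categories."""
    labels = [mapping[c] for c in order]
    return sum(a != b for a, b in zip(labels, labels[1:]))

for condition, mapping in MERGES.items():
    print(condition, count_bounds(mapping))
# C 1
# SC 2
# NC 3
```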

Participants responded using a standard keyboard. Key “d” always corresponded to category “A” and key “x” corresponded to the category that merged with “A” in the test phase. The keys “k” and “m” were used for the other two categories. Therefore, the key locations depended on the testing condition. The reason for this was to have the response buttons of the categories that were merged together at test be on the same side during training to exclude any possible motor effect when comparing different testing conditions. Keys “e” and “i” corresponded to test categories “1” and “2”, respectively, for all testing conditions. The keyboard configurations for all conditions are shown in Fig. 5.

Procedure

Each experimental session was composed of five training blocks and one test block. Participants were told that they would be doing a categorization task for six blocks, and that the stimuli were lines varying in length and orientation. They were also told that there were four categories “A”, “B”, “C” and “D” and that on each trial they would see a stimulus and be asked to choose between the two categories mentioned in a question on top of the screen. They were told that the first five blocks would be training blocks in which they receive feedback while the last block would be a test block where they would not receive feedback. Participants were told that they would see instructions on the screen about the test phase after finishing the last training block. The test instructions varied based on the testing condition, but they were all similar and told participants which categories would be merged in the test block. For example, the instructions for the semi-contiguous condition were: “Categories A and D will form a new category, '1'. Categories B and C will form a new category, '2'.”

A training trial went as follows: (1) A fixation cross was presented in the center of the screen for 1500 ms. (2) The crosshair disappeared and was replaced by the line stimulus and the question. The stimulus and question stayed on screen until the participant pressed a key corresponding to one of the two categories in the question. (3) After a key was pressed, feedback was presented for 750 ms. Test trials were identical to training trials except that no feedback was presented.

Results

A binomial test was used to identify and exclude participants who performed randomly during the last training block (i.e., non-learners). The rationale was that participants who did not learn the training categories should not be able to merge the training categories (which is the main goal of the experiment). Specifically, we excluded participants whose accuracy in Block 5 was not above chance (\(p < .05\)) according to a binomial distribution (\(p = 0.5\), \(n = 96\)).^{Footnote 2} This corresponded to a 59% accuracy threshold. Using this threshold, 31 participants were excluded (16.5% of the sample), and 157 participants remained in the analysis, with at least 25 participants left in each condition (see Fig. 6 for exact counts per condition).
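The accuracy threshold can be recovered with an exact binomial computation. The sketch below assumes a one-sided test (accuracy above chance), which is our reading of the criterion and is consistent with the reported value of about 59%; the function names are hypothetical.

```python
from math import comb

def upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def min_correct_above_chance(n, alpha=0.05, p=0.5):
    """Smallest number of correct responses whose one-sided p-value is below alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p) < alpha:
            return k
    return None

k = min_correct_above_chance(96)
print(k, round(100 * k / 96, 1))  # 57 59.4
```

Under these assumptions, a participant needed at least 57 correct responses out of 96 in Block 5 to be retained.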

Learning phase

Figure 6 shows the mean accuracy for each block for each testing condition. The left panel (a) shows the RB categories while the right panel (b) shows the II categories. In both panels, the first five blocks were training and the last block was the testing block. A 2 (RB, II) \(\times\) 3 (C, SC, NC) \(\times\) 5 (Block) mixed effect ANOVA was performed on the training data. As expected, the main effect of Block was statistically significant \((F(4, 628) = 144.11, p < .001, \eta ^2 = 0.18)\), showing that participants were able to learn the task. The effect of Category was also significant \((F(1, 157) = 70.51, p < .001, \eta ^2 = 0.19)\), showing that participants were more accurate with RB categories than II categories. However, these main effects need to be interpreted with care since the Category \(\times\) Block interaction also reached statistical significance \((F(4, 628) = 7.29, p < .001,\eta ^2 = 0.01)\). The interaction was decomposed by computing the effect of Block within each level of Category. The results show that the effect of Block reached statistical significance for both RB \((F(4, 328) = 91.70, p < .001, \eta ^2 = 0.27)\) and II \((F(4, 300) = 54.09, p < .001, \eta ^2 = 0.18)\) categories, confirming that participants were able to learn both category structures. The interaction was thus likely caused by a larger increase in accuracy with the RB categories than with the II categories. Mean accuracy in Block 1 with RB categories was 70.9%, which improved to 88.2% in Block 5. For II categories, mean accuracy in Block 1 was 64.8%, which improved to 76.2% in Block 5. All other main effects and interactions failed to reach statistical significance (all \(F < 1.54, n.s.\)).

Testing phase

The main goal of this experiment was to test whether participants could merge learned categories together to form new categories. The transfer cost was calculated as the difference in accuracy between Blocks 5 and 6 and is shown in Fig. 7. Again, the left panel (a) shows the RB categories whereas the right panel (b) shows the II categories. A 2 (RB, II) \(\times\) 3 (C, SC, NC) ANOVA was performed on the transfer cost. Both the effects of testing condition \((F(2, 151) = 111.97, p < .001, \eta ^2 = 0.57)\) and category \((F(1, 151) = 4.39, p < .05, \eta ^2 = 0.01)\) reached statistical significance. However, the main effects need to be interpreted in the context of a statistically significant interaction \((F(2, 151) = 6.29, p < .01, \eta ^2 = 0.03)\). We proceeded by decomposing the effect of testing condition within each level of Category. For RB categories, the effect of testing condition was statistically significant \((F(2,79) = 33.76, p < .001, \eta ^2 = 0.46)\). Bonferroni-corrected pairwise comparisons show that all pairwise differences were statistically significant \((p < .001)\). The mean transfer costs were: C = 0.0%; SC = 11.4%; and NC = 21.6%. For II categories, the effect of testing condition also reached statistical significance \((F(2,72) = 106.00, p < .001, \eta ^2 = 0.75)\). Again, Bonferroni-corrected pairwise comparisons show that the C condition differs from both the SC and NC conditions \((p < .001)\). However, unlike for RB categories, there was no statistical difference between the SC and NC conditions. The mean transfer costs were: C = −8.4%; SC = 14.3%; and NC = 18.4%.
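To make the sign convention explicit, the transfer cost can be sketched as Block 5 accuracy minus Block 6 accuracy, so that a positive value is a cost and a negative value is facilitation. The accuracies below are invented for illustration and are not data from the experiment.

```python
def transfer_cost(acc_block5, acc_block6):
    """Accuracy in the last training block minus accuracy in the test block.
    Positive values indicate a cost; negative values indicate facilitation."""
    return acc_block5 - acc_block6

# Hypothetical per-participant accuracies (proportions correct), not real data.
participants = [(0.90, 0.70), (0.85, 0.80), (0.75, 0.80)]
costs = [transfer_cost(b5, b6) for b5, b6 in participants]
mean_cost = sum(costs) / len(costs)
print([round(c, 2) for c in costs], round(mean_cost, 3))
```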

Next, a t test was performed to assess whether the transfer cost was statistically different from zero in each testing condition of each category. For RB categories, the transfer cost was not statistically significant in the RB/C condition (\(t(24) = 0.09\), n.s.), but reached statistical significance for both the RB/SC and RB/NC conditions (both \(t> 6.83, p < .001\)). For II categories, all transfer costs were statistically different from zero (all \(|t|> 6.00, p < .001\)), but note that this difference is negative for the C condition, showing a facilitation effect instead of a cost. In contrast, the transfer costs were positive for the SC and NC conditions, showing a true cost of merging categories (similar to RB categories). Hence, breaking contiguity had a transfer cost for both RB and II categories, but the cost was progressive for RB categories and all-or-none for II categories.

Discussion

The results of Experiment 1 show no evidence of a transfer cost for C conditions with either RB or II category structures. One surprising result is that there was facilitation when merging contiguous II categories. It is possible that participants averaged the distributions of the merged categories and used a single integrated distribution for the “1” category and another single integrated distribution for the “2” category (instead of forming mixture models; see middle-left of Fig. 1). No increase in accuracy was observed in the simulations of this model because of a ceiling effect in training accuracy, but if training accuracy is reduced by biasing the estimated means of the training generative models, the integrated model does produce a higher test accuracy. It is thus possible that participants in the contiguous II condition used this response strategy at test. Note that this “single integrated distribution” strategy is only possible with within-category information, so it was unlikely to be used with RB categories, which could explain why no facilitation was observed in the RB/C condition. This result was unexpected and the experiment was not designed to test for this possibility. Still, clearly, there was no transfer cost in the C condition for either RB or II categories.

In contrast, a transfer cost was present for all other conditions. Critically, the SC condition was differently affected by the category structures. Specifically, the SC condition differed from both the C and the NC conditions with RB category structures, with a transfer cost falling somewhere between these two conditions. This result is in line with the hypothesis that participants learn between-category information in A/B with RB categories (Hélie et al., 2017), so the transfer cost increases with the number of decision bounds that need to be assembled. In contrast, there was no evidence of a different transfer cost between the SC and NC conditions with II category structures. This suggests that, when trained with an A/B paradigm, category contiguity may be an all-or-none phenomenon with II category structures because participants are learning a within-category representation and are forming mixture models of the generating distributions (at least when the merged categories are not fully contiguous). Experiment 2 tested whether these effects were also present with YES/NO training.

Experiment 2

Experiment 2 tested the effects of category structures and category contiguity on the compositionality of categories learned using a YES/NO task. Experiment 1 showed that transfer cost increased gradually with the required number of decision bounds with RB categories (consistent with between-category representations) but that this increase was all-or-none with II categories (consistent with within-category representations). However, Hélie et al. (2017) showed that, unlike A/B categorization, YES/NO categorization leads to learning within-category information with both RB and II categories. Hence, Experiment 2 tests whether breaking the contiguity of the training categories at test would increase the transfer cost in an all-or-none fashion. As in Experiment 1, participants learned four II or RB categories using trial-and-error and then transferred to a two-category task where two new categories were created by merging learned categories. The merged categories at test could be contiguous, semi-contiguous, or non-contiguous. The only difference between Experiments 1 and 2 is that Experiment 2 used YES/NO training instead of A/B training.