Paraphrasing Berliner’s (1993) assessment of educational psychology, critical-analytic thinking has a long past but a short history. Dewey’s (1933) and Glaser’s (1941) classic work can be considered the beginnings of the modern instantiation of the critical-analytic thinking movement that has spawned a vast literature and the hope for a more deeply informed populace. Critical-analytic thinking (CAT) is regarded as an essential aspect of progress and knowledge growth in any scholarly discipline. Moreover, CAT is thought to be a prerequisite to determining the best course of action in important, complex decisions and is, therefore, indispensable to a democratic society that seeks to promote or protect the welfare of its citizenry (Brookfield 2012). Unfortunately, research suggests that CAT is neither a “habit of mind” nor prevalent even among college-educated adults in the USA or elsewhere that humans may roam.

Examination of the literature on CAT in diverse domains (e.g., education, cognitive and social psychology, philosophy, social studies, science, or politics) reveals that CAT is considered to be a primary goal of society and vital to the success of individuals, institutions, and governments (e.g., Brookfield 2012; Halpern 2014; Kahneman 2011). In the present article, we describe and expand on this general perspective and, in so doing, hope to provide increased clarity about the nature and development of both the components of CAT and CAT per se. Although an historical review of the literature reveals that there are some differences of opinion regarding the nature and development of CAT (see Schraw and Gutierrez 2012), the field appears to have gained a strong foothold in levels of education. In what follows, we first describe the emerging views on the nature of CAT and then discuss developmental trends in the expression of CAT and possible explanations for these age trends. We then end by describing instructional approaches that could lead to increased engagement in CAT in the general population. Although a historical review of prior accounts would make for an interesting comparison to more current views, conducting such a review is beyond the scope of this article. We touch base with prior accounts somewhat in our section below discussing what CAT is not.

The Nature and Importance of CAT

To understand current perspectives on the nature of CAT, it is helpful to begin with a functional analysis of this ability (i.e., its potential utility in particular contexts and why it evolved), then consider its cross-situational variability and consistency, and conclude this first section by contrasting CAT with other skills with which it sometimes gets conflated.

A Functional Analysis of CAT

Although early perspectives on CAT cast it in terms of an individual working alone on a reasoning task, recent perspectives suggest that CAT may have evolved because it can play an important role in various types of information exchange or symbolic interaction (Brookfield 2012; Kuhn 1999; Mercier and Sperber 2011; Moore 2011). Within these communicative interactions, it is common for claims to be made about some prior or present state of affairs. For example, consider the ubiquity of claims that might be made in contexts such as reading a textbook in school, watching televised news broadcasts, watching an advertisement about a new prescription drug, listening to a political speech, seeking advice from a physician, or simply having a conversation with a colleague. Not only do history textbooks describe events that are alleged to have occurred, but their authors also often provide explanations as to why these events occurred. Both the description of the event and the explanation are claims that have a truth value or degree of “correctness.” Moreover, both are made on the basis of evidence. A similar account could be provided about science textbooks that describe scientific constructs (e.g., atoms) and provide explanations of these constructs (as part of theories). News broadcasts likewise are rife with claims about reality (e.g., Russian troops in Ukraine in April 2014) and often recruit “experts” to explain these events. Even casual conversations include descriptions and explanations (e.g., what a boss reportedly said in private about a new venture to one employee). In fact, much of our knowledge of the world comes from others, rather than being the result of primary experience, and necessitates an analysis and critical evaluation of sources, internal coherence, and relation to other sources of information (Harris 2002).

These information exchanges can have an effect on listeners by changing what they know, what they believe, or what they feel; changes in knowledge, belief, and affect, in turn, can influence the behaviors of listeners (Brookfield 2012; Ennis 1987; Halpern 2014; Mercier and Sperber 2011). A little reflection about the examples in the preceding paragraph shows that inaccurate claims could have disastrous consequences for individuals (e.g., faulty medical advice given by a physician), a nation (e.g., going to war on the basis of flawed intelligence), or even the world (e.g., decisions based on the evidence regarding climate change). Critical-analytic thinking comes into play when we question or at least do not simply passively accept the accuracy of claims as givens. Claims are always inferential even when they are based on evidence. Evidence does not exist in-and-of itself; rather, aspects of a situation that we call “evidence” are always interpreted by people (regardless of whether they are historians, scientists, or non-scientists). Mature or advanced forms of critical-analytic thinking include knowledge of the various factors that could contribute to claims being inaccurate such as (a) the interpreter of evidence could be biased in some way; (b) the interpreter or a second-hand reporter of this interpretation (e.g., an author or politician) is trying to be intentionally deceptive about the evidence; (c) the evidence is partial, degraded, equivocal, or open to multiple interpretations; and (d) the evidence was collected in a manner that does not allow clear interpretations (e.g., an experiment with multiple confounds). Excellent examples of non-scientific factors that play a role in reasoning about major scientific issues are found in recent analyses of politically motivated attitudes to climate change (Capstick and Pidgeon 2014) and toward nuclear power (Kahan 2013).

A second aspect of advanced CAT is the ability to recognize flawed reasoning or flawed arguments derived from claims. Reasoning chains consist of one or more premises followed by conclusions that vary in the degree to which they follow with validity or certainty (Johnson-Laird 1980; Thompson and Evans 2012). In the best (and perhaps least common) case, conclusions follow logically from premises when the premises are assumed to be true (i.e., the conclusion cannot be wrong). Such cases are called instances of deduction. An example would be modus ponens inferences (e.g., “if an animal has 23 chromosome pairs, it must be human; this animal has 23 chromosome pairs, so it is human”).
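
The modus ponens pattern can be written schematically. The sketch below uses the conventional placeholders P and Q, which are our notation and do not appear in the sources cited above. It is meant only to show that the validity of the inference depends on its form: whenever both premises above the line are assumed to be true, the conclusion below the line cannot be false, whatever the content of P and Q.

```latex
% Modus ponens: from "if P then Q" and "P", conclude "Q".
% In the chromosome example, P = "this animal has 23 chromosome pairs"
% and Q = "this animal is human".
\[
\frac{P \rightarrow Q \qquad P}{Q}
\]
```

A critical-analytic thinker can therefore accept that an argument of this form is valid and still question whether its premises are in fact true.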

In other cases, the conclusions follow in a more probabilistic manner such as in instances of induction (e.g., “every swan I have seen is white; therefore, all swans are white”), abduction (e.g., “this hypothesis fits the evidence best, so it must be true”), and analogical reasoning (e.g., “if we do not attack Russia now because it massed troops at the border of Ukraine, Russia will invade Ukraine in the same way that Iraq invaded Kuwait when Iraq massed its troops at Kuwait’s border and we did not attack;” Dunbar and Klahr 2012; Lombrozo 2012). People are more likely to agree with inductive, abductive, and analogical conclusions as the probability of the conclusion being false goes down. In addition to classic forms of deduction and induction, there are also examples of slippery slope arguments (“if you let the government restrict the number of ammunition rounds for guns, they will eventually take away your guns as well”), non-sequiturs (e.g., in education, many teachers have subscribed to the ideas that you teach logical arguments to the left side of the brain and holistic processes to the right side of the brain, based on a non-critical evaluation of reports of brain research), and an uncritical approach to the popular media.

As was noted for CAT about evidence, CAT about reasoning could also be very helpful or adaptive because it could help someone avoid being persuaded via an argument chain into a state of belief that could lead to disastrous consequences (e.g., when a politician with no knowledge of biology claims that a vaccine against the Herpes virus causes autism). This is why argument skills have recently been held as central to a stance of epistemic vigilance (Mercier and Sperber 2011) and why argument skills have been alleged to have an evolutionary basis. According to Mercier and Sperber (2011, p. 60), “(t)he main function of reasoning is argumentative: Reasoning has evolved and persisted mainly because it makes human communication more effective and advantageous.” While there are numerous theories of the reasons for errors in thinking and reasoning, accounts of both inductive and deductive reasoning have consistently demonstrated that thinkers ignore alternate hypotheses and interpretations of data and allow beliefs to override the evidence. In fact, recent neuroimaging studies have shown that prior beliefs can bias the processing of information, whereby information inconsistent with one’s prior beliefs is blocked from being processed in the brain (Fugelsang and Dunbar 2005).

The recent suggestion that CAT is particularly suited for information exchanges and arguments also highlights the fact that people have to not only be on guard against potentially misleading evidence or unwarranted claims of others, but also be on guard against being guilty of the same tendencies themselves. If individuals recognize that someone else could be (a) relying on equivocal evidence, (b) biased in their interpretations, or (c) drawing unwarranted conclusions, those individuals, if fair-minded, would worry about their own biases getting in the way of evidence interpretation and valid inference making (Stanovich 2012). If we combine CAT about others and CAT about oneself, it becomes easy to see why major medical and nursing organizations emphasize the importance of instilling CAT in their curricula (e.g., American Nursing Association 2010). Medical staff can be biased or overly accepting in their interpretation of symptoms and test results (i.e., the latter serving as evidence in the current framework). Lack of CAT can lead to faulty diagnosis and implementation of wrong treatments. However, see Dumas et al. (2014), for combinations of CAT skills in medical teams that increase the probability of correct diagnoses.

Before moving on to definitional characteristics of CAT, we must first situate CAT in the realm of human constructed categories and contemporary research and theorizing on categories and, second, discuss the role of problem solving and causal reasoning in CAT. Turning now to categorization, categories such as “animals,” “furniture,” “fruits,” “sports,” “science,” and “thinking” have internal structure: There are prototypical sports such as football and less prototypical sports such as curling. Furthermore, as Wittgenstein (1953) noted, there are no absolute defining features of all sports; rather, there are typical sports that share a high probability of attributes, but none is absolutely necessary. Most humanly constructed categories are like this, and CAT is no exception. Thus, no one feature, or even small set of features, is absolutely necessary for CAT to occur, but the more features that are present (particularly so-called characteristic features), the more we can say that prototypical CAT has occurred. If we take this critical-analytic thinking as category (CATC) approach, then we place CAT in the realm of other crucial categories that have been used to understand, evaluate, and transform the world. Dunbar (2002) used a similar science as category approach to explain the nature of scientific thinking and its development. Here, certain aspects of science, such as conducting experiments, are typical features of science but are not absolutely necessary. Rosch (1975) and Smith and Medin (1981) articulated this view of categorization in a detailed and elegant fashion.

Based on the preceding considerations, then, we can say that CATC has the following characteristic features rather than necessary and sufficient features—many of which parallel the shared claims overviewed in the introduction to this special issue (Alexander 2014):

  • It is metacognitive and reflective, because it requires thinking about your own or someone else’s thinking (Kuhn 1999). In other words, it is different than, and something more than, simply making an argument or comprehending someone else’s argument.

  • It is evaluative, because it involves the ability to think about the quality of evidence or the degree to which an argument is sound or compelling. It is different than simply using evidence to draw reasonable conclusions or accepting arguments or claims without evaluating them.

  • It is skeptical and moderately distrusting, because it is human to be biased, draw unwarranted inferences, be self-serving, or conduct flawed investigations; scientists, historians, politicians, and physicians make mistakes all of the time and often work in their own self-interest; their work and reasoning need to be checked, double-checked, and challenged.

  • It is analytic, because it involves separating out and scrutinizing the elements of the evidence gathering and evidence evaluation processes (e.g., a theory and the evidence to support this theory; the individual steps of a reasoning chain and the permissible inferences between them).

  • It attempts to be unbiased and open-minded, because one has to guard against uncritical acceptance of one’s own perspective, recognize the possibility of making erroneous assumptions, and be open to the fact that other perspectives may be more accurate and more likely to lead to favorable results than one’s own perspective.

  • It is effortful, potentially time-consuming, and mentally taxing, because being open-minded and engaging in metacognitive evaluation and analysis requires additional processing capacity beyond that needed for comprehension or inference making. In addition, one may need to spend extra time investigating claims further (e.g., reading articles cited by an author to see if the author accurately summarized these articles).

  • It requires a sufficient amount of domain-specific expertise, because metacognitive knowledge of content and metaprocedural knowledge of evidence-collecting procedures in different fields (e.g., history, medicine, and neuroscience) are required to engage in the evaluative and analytic thinking previously described (Willingham 2008). For example, if a person knows biology content and knows how to engage in biological research methods, that person is more equipped to be critical about the reasoning and methods of biologists. If someone lacks expertise, such a person could only bring to bear the skepticism and open-minded components of CAT.

Cross-Situational Variability and Consistency

Over the past 40 years, theoretical perspectives on reasoning and CAT progressed in the following manner. Early studies assumed that the human mind is inherently rational and logical. However, when a variety of experiments showed that the average adult fails to select the correct logical response on reasoning tasks or demonstrates other forms of non-optimal performance on decision-making tasks, the field largely abandoned the original assumption of formal reasoning abilities and replaced this model with one that not only assumed intuitive, fast, and often unconscious skills as being the norm, but also argued that these skills are more adaptive or effective than the formal, logical kind (Kahneman 2011; Thompson and Evans 2012).

However, as is often the case in many scientific fields, the original logical model (the thesis) and later intuitive model (the antithesis) have been replaced more recently by synthetic perspectives that combine these two approaches. The combined approaches are known collectively as dual-process theories (Evans 2012; Stanovich et al. 2013). Dual-process theories assume that adults have intuitive, fast, and unconscious inferential processes (system 1) that, in many situations, suffice to accomplish everyday tasks. However, adults also have the ability to override this system with abilities that are reflective, metacognitive, detached, logical, and relatively open-minded (system 2). The kinds of skills required for CAT previously described reflect system 2 thinking. However, there is much controversy regarding the nature and extent of system 1 and system 2 modes of thinking (Evans and Stanovich 2013). More than likely, these theories do not have absolute defining features, just like CAT itself.

The fact that system 1 and system 2 operate in parallel and sometimes in conflict can explain why the reasoning of the same individual can look highly competent in some contexts but highly incompetent or irrational in other contexts. Many researchers have argued that the default mode is system 1 and people need to be enticed into system 2 thinking by contextual or motivational cues. One common strategy is to present participants with fictional studies that have detectable flaws and demonstrate how participants are more likely to find the flaws when the results are threatening to their beliefs (e.g., suggesting that Christians are more likely to become drug abusers when the participants in the hypothetical study are described as devout Christians) than when the results are consistent with their beliefs (e.g., Klaczynski and Lavallee 2005; Klaczynski and Robinson 2000). As we will describe, such dual-process accounts have implications both for designs of experiments and for classroom interventions to promote CAT. But see the various critiques of system 1 and system 2 theories by dual-process theorists themselves (Evans and Stanovich 2013). This in itself is an example of critical-analytic thinking.

But there is another way that the concept of cross-situational consistency is relevant to CAT research: the notion of a disposition or habit of mind toward CAT. Are there people who routinely approach text-based, media-based, and person-to-person information exchanges with a CAT orientation, even when the information presented is consistent with their own biases (e.g., a conservative watching a conservative news outlet or a liberal watching a liberal news outlet)? If so, what educational or occupational experiences promoted this disposition? Stanovich and colleagues (2013) believe that a disposition toward CAT is an individual-difference variable (i.e., some people are more prone to exhibit CAT across contexts than others) and have constructed a self-report measure to assess this tendency in a variety of studies. Others have argued that CAT is more domain-specific (Willingham 2008) or have found that rationality on some tasks is uncorrelated with rationality on others (Thompson and Evans 2012).

Contrasting CAT with Other Cognitive Processes

As noted, there are some points of disagreement in the various descriptions of CAT in the literature. We also argued, however, that it is possible to see CAT as having the properties of other valid human categories that possess prototypes and fluid boundaries, rather than as an all-or-none category. Whereas most authors acknowledge the elements of CAT that are described above, some have also conflated CAT with other aspects of mind. This is problematic for measurement purposes (operational definitions follow from theoretical definitions), but also because it muddies the theoretical waters. Before moving on to a discussion of developmental trends, therefore, it is important to say what CAT is not, in order to increase definitional clarity and precision.

In some accounts of CAT (e.g., Halpern 2014), the terms “critical thinking” and “intelligent thinking” are used interchangeably. This is the reason that we have adopted the CAT as category approach as we see that CAT can have many of the elements of intelligent thinking. However, the typical descriptions of CAT and contemporary descriptions of intelligence are not synonymous. Logically, one could argue that it is possible to engage in CAT without having high levels of intelligence and vice versa. Indeed, researchers such as Robert Sternberg and his colleagues (2014) recently contrasted instructional conditions that were designed to instill either “successful intelligence” or critical thinking, thereby demonstrating their belief that these skills were distinct. Also, various studies have found low to non-significant correlations between measures of CAT and indices of intellectual ability (Stanovich et al. 2013).

Other authors have seemed to equate CAT with problem-solving ability (e.g., Willingham 2008). Again, the classic definitions of problem solving (e.g., setting a goal, developing options for meeting this goal, evaluating these options, and implementing the best choice) are somewhat distinct from the set of skills described here, especially when the notion of information exchange or argumentation is at issue. Although it is helpful to engage in CAT when in the midst of examining information that could be used to help develop solutions to problems, CAT is also used in other kinds of situations. Some problem-solving situations involve information exchange or argument, but not all information exchange situations involve problem solving (e.g., watching news broadcasts and listening to a debate). Thus, there is overlap in the two constructs but they are not identical.

Still others equate CAT with the ability to draw valid deductive inferences and reasonable inductive inferences (e.g., Ennis 1987). In contrast to models that equate CAT with intelligence or problem solving and thus construe CAT too broadly as opposed to treating it simply as an allied process, the equating of CAT with logical reasoning ability defines CAT too narrowly. Asking people to select appropriate answers on a measure that presents deductive and inductive arguments requires some aspects of CAT (e.g., metacognition, analysis, and mental effort) but not others (e.g., skepticism, domain-specific knowledge, open-mindedness, information exchange, and critical evaluation of methods and evidence). However, research on aspects of both deductive and inductive thinking demonstrates a belief bias effect, where prior beliefs overpower deductive or inductive thinking, thus severely limiting CAT. Our point here is that while CAT is not synonymous with problem solving, deductive thinking, or inductive thinking, these processes are often involved in CAT, and findings from these literatures do shed light on understanding CAT and suggest possible avenues for improving the teaching of CAT.

At this point in our discussion, it is necessary to further articulate the nature of problem solving and how the results of problem solving research shed light on CAT. Following from Newell and Simon’s (1972) “human problem solving,” researchers have considered problem solving as a search through a problem space that takes the problem solver from the start state to the goal state (see Bassok and Novick 2012 for a review of problem solving research). A problem space consists of all the intermediate states that a problem solver goes through between the start state and the goal state. Strategies or heuristics are used to search through the space, and different strategies are more or less efficient in searching through a problem space. When searching through a problem space, problem solvers must decide how to search by evaluating the benefits and costs of moving to one state or another. Put more technically, problem solvers apply an evaluation function, which can be described computationally.
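
To make the idea of an evaluation function concrete, the following is a minimal illustrative sketch in Python, not a model taken from the problem solving literature cited above. It assumes a toy problem space of our own invention: states are whole numbers, the two permitted operators are "add one" and "double," and the evaluation function is simply the numerical distance to the goal. The function and variable names (best_first_search, successors, distance_to_goal) are hypothetical. The search always expands the state that the evaluation function rates as most promising, which is one simple heuristic strategy among many.

```python
import heapq


def best_first_search(start, goal, successors, evaluate):
    """Search a problem space from start to goal, always expanding the state
    that the evaluation function scores as most promising (lowest score)."""
    # Each frontier entry holds (score, state, path taken to reach the state).
    frontier = [(evaluate(start, goal), start, [start])]
    visited = {start}
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        for next_state in successors(state):
            if next_state not in visited:
                visited.add(next_state)
                entry = (evaluate(next_state, goal), next_state, path + [next_state])
                heapq.heappush(frontier, entry)
    return None  # the goal state cannot be reached from the start state


def successors(state):
    """Hypothetical operators for the toy problem: add one, or double.
    States above 100 are excluded so that the problem space stays finite."""
    return [s for s in (state + 1, state * 2) if s <= 100]


def distance_to_goal(state, goal):
    """Assumed evaluation function: how far the state is from the goal."""
    return abs(goal - state)


# Start state 1, goal state 10; the path lists the intermediate states visited.
path = best_first_search(1, 10, successors, distance_to_goal)
print(path)
```

Swapping in a different evaluation function, or a different rule for choosing which state to expand next, changes which intermediate states are visited and how efficiently the goal is reached, which is the sense in which strategies differ in efficiency.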

However, how far ahead the problem solver looks at each evaluation step will radically determine the types of problem solutions that a problem solver will reach. The different strategies or heuristics that are used to solve problems have been well specified in virtually all domains, ranging from architecture and business to medical reasoning and science. Surprisingly, very few researchers have incorporated ethical, emotional, or societal consequences into their problem solving models (but see Baron 2007). Furthermore, researchers in CAT generally have not used the well-specified problem solving framework in their discussions of problem solving in CAT (but see Halpern 2014).

Is CAT the same as scientific thinking? Similar to what was noted for problem solving, one could engage in scientific thinking without using CAT, and CAT could be used in non-scientific contexts (e.g., history and politics). However, if an historian was really interested in getting closer to the truth about what really happened during the “Boston Massacre” (who really started the fight and how many were actually killed?) and a scientist was really interested in getting closer to the truth about the nature of light energy (is it quanta or a wave?), the use of CAT would enable both kinds of scholars to develop more accurate proposals than if they did not engage in CAT. However, the domain specificity of CAT (Willingham 2008) suggests that an historian would not be as competent as a scientist if the historian were to engage in CAT for a scientific question. Relatedly, a scientist would not be as competent as an historian if the former were to engage in CAT for an historical question. It is for the same reason that taking research methods and statistics courses might improve the ability to engage in CAT about traditional psychological research, but not necessarily about other kinds of research (e.g., neuroscientific studies). But, these assumptions can be evaluated empirically. Perhaps, taking such courses does generalize beyond specific domains.

The Development of CAT

At a basic level, two primary questions can be posed regarding developmental trends: (a) when are children capable of CAT and (b) how often do children and adults at various points in development exhibit CAT, on their own, in everyday (non-contrived) contexts. To answer these questions, one has to first derive operational definitions for CAT from theoretical definitions in order to be able to measure CAT at different ages. These operational definitions normally allow one to “unpack” CAT into its component skills. The list of components, in turn, allows one to ask developmental research questions. Based on the CATC approach taken in this article, such questions might include the following: (a) When can children think about, and evaluate, their own thinking or the thinking of someone else? (b) When can children understand that evidence needs to be interpreted and evidence can be equivocal (or non-demonstrative) and that different people can draw different interpretations from the same evidence? (c) When can children distinguish among conclusions that follow from premises and conclusions that do not follow or are merely probabilistic? (d) When can children distinguish between theories and evidence and understand how evidence bears on the accuracy of theories? (e) When can children understand the difference between methodologies that generate convincing or credible evidence and methodologies that generate equivocal or misleading evidence? (f) When can children demonstrate skepticism, recognize bias, and avoid bias? (g) Generally, then, when can they demonstrate system 2 thinking? A number of comparable questions can be constructed by substituting the phrase “When can children…” with the phrase “How often do children, adolescents, and adults…” in the prior questions.

To provide comprehensive and detailed answers to the aforementioned questions is beyond the scope of this article, but we can sketch the outlines of a general developmental trajectory. First, we must discuss what we know about the development of the cognitive processes underlying CAT and then discuss findings regarding the development of children’s ability to engage in CAT. Over the past century, but especially the last 50 years, there has been an enormous amount of research on cognitive development with major paradigm shifts occurring at multiple points in time. The research of Jean Piaget looms large over the field of children’s cognitive development, which cannot be covered in any detail here (see Byrnes 2008).

However, two key theoretical and empirical issues are worthy of discussion as they are relevant to CAT. The first is that Piaget argued that children are incapable of abstract thought until approximately age 12, when they enter the stage of formal operations. As stated, recent research on children’s thinking has largely discredited this view, demonstrating that children as young as 3 years of age are capable of abstract thought and have some of the thinking abilities that are needed for CAT in certain situations. Second, research findings have undergone many changes regarding the development of children’s memory systems. Again, memory is an important part of any CAT system. A critical-analytic thinker must hold information in memory while engaging in CAT, and generally speaking, there is an increase in children’s memory capacity as they become older. This has enormous implications for CAT as one might conclude that more memory capacity might lead to better CAT and better problem-solving abilities. However, research shows that the story is more complex than this.

With respect to thinking about thinking, 1- and 2-year-old children must possess the ability to “read minds” in order to learn language (Bloom 2000), but thinking about flawed thinking or recognizing multiple interpretations of the same evidence is another matter. In a recent review of preschoolers’ metacognitive abilities, Mills (2012) describes the many ways in which preschoolers can reveal surprising metacognitive competencies in contrived laboratory situations that supply considerable scaffolding. For example, in one study, the child might observe two adults look inside a container and utter the name of a hidden toy. One adult, however, does so when the toy is not in the container, so the child has the opportunity to learn that the latter adult is not a reliable (or trustworthy) source. When given the chance to look inside a container to obtain the toy, the child often uses the more reliable adult as a guide (particularly after age 4). After reviewing a variety of such studies, Mills concludes (p. 407) the following:

Beyond age 3, children show some ability to move beyond absolutism when evaluating others’ claims—for instance, when deciding whom to trust, they pay attention to how many errors someone has made in the past. They are also capable of detecting some forms of deception. That said, before the age of 6 or 7, children tend to struggle more when the reasons to doubt are more subtle (e.g., trusting someone who has made small errors instead of large ones in labeling familiar animals) or involve greater understanding of the role of intentions and motivations (e.g., detecting deception).

However, research on theory of mind and moral reasoning has demonstrated that young children, and even perhaps infants, can take the perspectives of others into account. For example, Killen and colleagues (Mulvey et al. 2013) have shown that children as young as 3 years of age can realize that it is wrong to discriminate against out-of-group members. Furthermore, as Case (1982) and Chi et al. (1981) have demonstrated, adults can reason in ways that appear childlike, and children with knowledge of a domain, such as dinosaurs, can display adultlike thinking abilities. Thus, contextual information and how situations are framed are central to both CAT in general and the development of CAT in particular. In fact, recent research in cognitive, social, and moral development has demonstrated a degree of sophistication in CAT that was unimaginable even a decade ago. Furthermore, the work of Griffiths and Tenenbaum and their colleagues (e.g., Tenenbaum et al. 2011) has shown that young children have a surprising sensitivity to the trustworthiness of different sources of information that modulates young children’s inductive inferences in the face of probabilistic information. Other studies suggest that children between ages 7 and 10 continue to show increased insight into argument quality, significantly beyond the competence of preschoolers. For example, Baum et al. (2008) revealed clear evidence of the ability to discriminate between informative and circular explanations in 7-year-old children, but this skill was significantly more prevalent in 10-year-old children.

In reviewing studies with older children, adolescents, and adults, Kuhn (1999, 2011) argues that between childhood and adolescence, children tend to be “absolutists” who think that there is a fairly direct mapping between reality and the mind (i.e., they are not constructivists) and that it is not possible for two people to interpret the same situation differently; when they do, one person must be lying. Children, adolescents, and many adults also demonstrate a great deal of difficulty discriminating between their theories of some phenomenon (e.g., why there is recidivism in crime) and the evidence that can be used to support this theory (e.g., effectiveness of community re-entry programs that target possible causes of recidivism). Through education and experience in which they learn about other perspectives, some adolescents and college students become “multiplists,” in which they show increasing understanding of the role of interpretation and context and come to believe that there is no way to discriminate between one viewpoint and another (all viewpoints are equally viable). Finally, some adults reach an “evaluative” epistemology in which they respect the right of people to have their own opinion but use evaluative criteria regarding evidence and reasoning to decide whether one perspective is better than another. Even in educated adults, however, the norm is the multiplist rather than the evaluative stance. These different views of the differing contexts and perspectives have been captured in the late Paul Baltes’s research program on the nature of wisdom and changes in CAT that occur over the lifespan (Baltes and Staudinger 2000). Rather than CAT being fully achieved at a particular age or developmental period, CAT has a flexible quality, which, in our opinion, is both a positive and a negative attribute of CAT.

To Kuhn’s review, we can add a variety of other reviews showing the strong tendency of adolescents and adults to be biased in their reasoning in a variety of ways (e.g., Stanovich and West 2008; Stanovich et al. 2013; Thompson and Evans 2012). For example, people, be they children or adults, tend to show confirmation and “myside” biases in which they try to seek evidence in support of their view, superficially evaluate flawed evidence in support of their view, continue to put resources into a project that is discovered to be flawed (the “sunk cost” effect), and so on. Oddly, some studies have shown children to be less susceptible to such errors presumably because they engage in less reasoning (Mercier 2011). Furthermore, scientists have been shown to be as susceptible to confirmation bias as non-scientists. It is interesting to note that a critical-analytic thinker, after reading reviews of both the early childhood and adult reasoning literatures, would readily recognize biases on the part of researchers to show a great deal of talent in preschoolers, on the one hand, and incompetence of adults, on the other.

We can reintroduce several comments made earlier, however, to remind the reader that a growing number of studies have found that the reasoning performance of adolescents and adults can be substantially increased by utilizing group discussion or argumentative formats (Kuhn 2011; Mercier and Sperber 2011) and by engaging their system 2 thinking via motivational (e.g., ego-threatening) means (Klaczynski and Lavallee 2005; Klaczynski and Robinson 2000). However, even these approaches tend to generate successful reasoning only on about 50 % of trials (up from 10 %), so performance is far from perfect.

Possible Explanations of Age Trends in CAT

How might these age trends be explained? One hypothesis is that the standard instruction in disciplines such as science and history does not promote uncertainty and an appreciation for the idea that some inferences are more warranted than others. Indeed, there is clear evidence that teachers and the textbooks they use in elementary, middle, and high schools treat information as facts that have always been true and provide very little insight into the ideas of evidence quality, methodological quality, debates among scholars, and evolution of ideas over time (e.g., Maggioni et al. 2009; Wong and Hodson 2009). Moreover, standard instructional approaches do not typically involve hands-on activities that emulate the practices and methods of working professionals, and most activities are perfectly planned so that there are no surprises or uncertainties (Lehrer et al. 2008). When students exposed to such methods for many years are asked in college to do such things as engage more critically in these disciplines, discover debates in the fields, and recognize uncertainties, it is no wonder that they might struggle to do so in their courses or when serving as experimental research participants. If we have learned anything about constructivism and sociocultural approaches over the past 40 years, it is that this kind of paradigm shift in thinking should take years to slowly internalize and appropriate. This explanation can also account for the association between years of education and expression of CAT in argumentative contexts (Kuhn 2011).

A second possible explanation for the age trends is that there are developmental constraints on the expression of CAT that would limit the efficacy of instruction that tried to promote increased appreciation for inferential warrants and the idea of progress in disciplines. These constraints could be tied to the development of expertise and also brain development. If expertise in a discipline takes at least 10 years of deliberate practice (Nandagopal and Ericsson 2012), should we expect preschoolers or children below the age of 10 to have the ability to think metacognitively about the quality of evidence or inferential warrants? In addition, given that (a) working memory (WM) capacity increases monotonically between childhood and adulthood (Swanson and Alloway 2012), (b) WM capacity is enhanced by expertise (Ericsson 2013), and (c) CAT requires additional intellectual capacity beyond that needed for everyday reasoning (Stanovich et al. 2013), should we expect younger students to have the WM capacity to engage in CAT?

But, returning to the earlier seeming paradox in age trends (i.e., good skills in preschoolers but low skills in adults), the choice of content within studies investigating the skills of the two age groups and degree of scaffolding within tasks could explain the discrepant findings. Part of the scaffolding with young children involves choosing content with which they are very familiar (and about which they are essentially experts) and walking them through the steps. While this might seem a promising instructional strategy to try in elementary school to elicit CAT in younger children, there is still the problem of covering the required content for a particular grade level that is not so familiar and the fact that the domain specificity of CAT suggests that promoting CAT with familiar content will not generalize or transfer to the unfamiliar content (Willingham 2008). But, it is an empirical question as to whether such an approach would increase CAT in elementary school. In recent publications by researchers such as Karmiloff-Smith and Farran (2012) and Green and Dunbar (2012), it has been argued that epigenetic processes, whereby genes and environment mutually contribute to both the development of thinking and various manifestations of thought, provide new lenses by which the development of critical thinking can be viewed.

The third possible explanation of age trends pertains to motivational reasons for not exerting the time and effort required to engage in CAT. While dual-system theories presume the need to motivate people out of system 1 thinking into system 2 thinking and such theories can explain shifts in performance across contexts in adolescents and adults, there are few empirical reports of the use of motivational tactics to engage system 2 thinking in children below adolescence. Mercier (2011) reviews studies of natural arguments among children (e.g., in the playground or over toys in a preschool) and contends that children can be highly motivated to find many (moral) justifications for their behavior. However, it is not at all clear that the examples provided truly exemplify CAT as described here. Again, it is an empirical question as to whether motivational strategies can be used to elicit system 2 thinking in children below the age of 10, and we hope researchers will take on this challenge.

Possible Instructional Approaches to Promote CAT

The instructional implications of our analysis draw directly from the preceding discussion of developmental mechanisms that could explain age trends in the expression of CAT and from contemporary theories emphasizing a hypothesized evolutionary basis of CAT in informational exchanges and arguments (e.g., Kuhn 2011; Mercier and Sperber 2011). Perhaps the first lesson to be learned from scholarship in this area is that classrooms should emulate the work environments of professionals in disciplines such as history or science, at least part of the time (Dunbar and Klahr 2012). That is, students should (a) pose unanswered questions that require the collection of data or evidence; (b) engage in appropriate methodologies to gather this evidence; (c) have opportunities to be surprised by unanticipated findings; and (d) discuss or debate how the anticipated, unanticipated, and missing evidence should be interpreted. Throughout, they should work in teams, engage in discussions, identify sources of uncertainty and problems of interpretation, and present their findings and conclusions for peer review.

Another approach to teaching CAT more appropriately is to use more detailed models of the cognitive and sociocognitive processes involved in CAT and develop interventions based on these models. For example, the detailed models of decision making and problem solving mentioned above can be applied to teach CAT more effectively. This would allow educators to design more targeted interventions that can have specific effects on the subprocesses of CAT. At the moment, interventions tend to be holistically focused rather than homing in on specific components. This type of finely tuned approach has been successful in other areas of cognitive development, such as language acquisition (Kuhl 2006) and executive functions (Diamond 2012). We expect that similar targeted interventions would also benefit CAT.

We next discuss, in more detail, several examples of interventions that illustrate some of the foregoing suggestions. The first is the yearlong study of Lehrer et al. (2008) of sixth graders who engaged in an ecological analysis of a nearby retention pond. In this study, teams of students collected jarfuls of water from the retention pond and tried to use this jar as a model of the original pond ecosystem. In weekly meetings, students posed questions to be answered, examined and discussed changes in the jar water, and were encouraged to think about unintended outcomes such as algae blooms and large increases in bacterial colonies. Students were specifically asked to consider the jar as an analogical model of the pond, and how and why the analogy broke down. Rather than use traditional science kits that had failsafe results, the use of the jar models was intended to encourage students to

“learn to ask questions, build and revise systems for investigation, invent measures, construct data representations that would be convincing to other investigators, and decide what conclusions were warranted and how much trust they should be given.” (p. 515)

Time was also spent in developing criteria for what makes a good research question and how to evaluate the quality of evidence. Across the year, student teams would present their questions and progress reports to others, and comments from their peers were used to revise their questions and methodologies, especially when peers noticed problematic aspects of their methods that caused problems of interpretation. For example, one group recognized that its efforts to create a sustainable ecosystem in their jar might endanger the fish and frogs that they had placed there. So, group members decided to remove the animals and place them in another jar. However, their peers noticed the problem with this approach:

“Just wait…if your fish or frogs start dying in the jar, and you take them out and put them in the middle jar, then you can’t do your question anymore, because they are not in the jar affecting the DO [dissolved oxygen]. They are in some other jar.” (p. 517)

In all, the students showed considerable changes in their understanding of the ecosystem, the optimal methodologies for answering their questions, and the nature of science. To return to an earlier point, this extended form of problem-based inquiry did not replace more traditional instruction nor squeeze out time for covering information that would occur on standardized tests.

As a second example outside of the realm of science, Kuhn and Crowell (2011) followed three groups of sixth graders (80 % Black or Hispanic, 60 % free lunch) who met twice weekly in a 50-min “philosophy” class. The school year was divided into four, 13-week quarters, and students were asked to discuss one new topic each quarter. Examples were whether parents should be allowed to homeschool children, China’s one-child policy, and whether teacher pay should be linked to years of teaching. Each debate cycle consisted of (a) students working in same-stance, two-person teams for several weeks to prepare their case and collect evidence, (b) then arguing with teams who held opposing stances for several weeks using an online system, and (c) finally meeting in a whole group format for an in-class debate. Coaches helped students formulate questions and provided two- to three-sentence answers to their questions. In addition, students were asked to anticipate what other, opposing stances might be and consider problems with these alternative stances (in terms of evidence or arguments). In a debriefing session, students were walked through an “argument map” that detailed the argument structure of each side and the evidence presented. Points were awarded for more effective argument moves and more credible evidence, and a winner was declared. Finally, after the debriefing session, students were asked to write an argumentative essay. Essays were scored with respect to whether (a) no argument was provided, (b) a one-sided argument was provided, (c) an unintegrated presentation of two sides was provided, or (d) an integrated presentation of both sides was provided. Results showed that students in the experimental classrooms generated more effective essays than students in control classrooms but only in the third and fourth quarterly assessments across 2 years. In addition, experimental participants showed a higher level of argument skill on a transfer essay using content that was not presented in the quarterly assignments.

Across both studies, several things should be noted. First, the results were not due to merely having students work in groups or engage in any kind of discussion. Rather, there was explicit scaffolding of required elements and carefully designed tasks that focused on the form of effective argumentation and methodologies that could generate compelling evidence. Second, it is not surprising (given the prior section on age trends in CAT) that the participants were older than 10 in each study. But, as noted earlier, it is an empirical question as to how young children could be and still engage in the activities used in the Lehrer et al. (2008) or Kuhn and Crowell (2011) studies. Third, these studies show that students need multiple attempts to slowly internalize CAT skills and appropriate them over a considerable period of time.

Attempts at producing CAT in a domain general manner through the presentation of counterexamples and anomalous data are sometimes successful, producing stable and lasting conceptual change. However, some concepts are extremely difficult to change or replace through standard constructivist teaching and learning strategies. One hypothesis regarding this problem is that while many concepts can be taught and changed through CAT strategies of argumentation and counterexamples, other more entrenched concepts need to be inhibited and never really go away. For example, in recent neuroimaging research on Newtonian concepts of motion, even experts were shown to be inhibiting their more Aristotelian concepts (Dunbar et al. 2007). Overall, there is no one approach to instruction that will produce CAT. Rather, multiple approaches and theoretical frameworks appear necessary to produce CAT.

Conclusions

In the present article, our goal was to delineate a theoretical description of CAT that included aspects that were common to a variety of approaches in the literature (e.g., CAT is metacognitive, evaluative, effortful, and domain-specific). Based on this definition, we then provided a brief review of the literature on age trends in CAT that revealed an apparent paradox in performance (i.e., young children seem to demonstrate aspects of CAT while adolescents and adults demonstrate surprisingly poor performance) that could be explained by appealing to differences in task content and degree of scaffolding provided in the tasks by researchers in the two age-based literatures. For non-scaffolded tasks that focus on content likely to be encountered in school or in the real world, adolescents and adults show uneven performance that is consistent with the premises of dual-process models (though such models may not represent entirely satisfactory accounts of performance).

We also discussed three hypotheses as possible explanations of these age trends that focused on deficiencies in standard approaches to instruction, developmental constraints related to expertise and WM capacity, and motivation. In light of the importance of CAT to the well-being of individuals and democratic societies, the mediocre to poor performance of adolescents and adults implies that the standard curriculum has to be altered to foster an increased level of CAT in the general population. We described two example instructional approaches that seemed to be effective supplements to traditional instruction that showed promise for increasing CAT in early adolescents and argued that these approaches were effective because they seemed to target developmental mechanisms responsible for age trends.

Although the basic story line of the nature and development of CAT is coming into sharper focus, the framework and summary presented here are intended to serve as a foundation for future studies in the area. There remain many unanswered questions about age trends, and some of the developmental questions posed earlier in this article lack definitive answers. Similarly, there remain many unanswered questions about the best instructional approach to foster CAT in children. Although children below the age of 7 may have difficulty with some aspects of CAT, they could nevertheless gain proficiency in project-based learning approaches as a way to deepen their conceptual understanding and familiarity with the methodologies of historians, scientists, and other scholars. They can clearly engage in these activities before they may be able to think critically and metacognitively about these activities.

In closing, one important question with regard to CAT is whether CAT refers to thought with its own distinct set of processes (a modular perspective). Alternatively, the mental processes involved in CAT may be the same as those involved in other types of thinking, such as problem solving, categorical reasoning, and decision making, with CAT differing only in terms of content. A third possibility is that the content of the reasoning, such as ethical or socially and morally important issues, might actually change the reasoning processes themselves; the critical content may produce new types of thinking or new networks of thought processes that constitute CAT. Research on CAT, particularly research on cognitive development and motivated thinking (e.g., Toplak et al. 2014), suggests to us that the latter type of mechanisms underlies the development of CAT. We call this the connected critical and analytic thinking (CCAT) hypothesis. Here, the content of thought creates new connections among preexisting thought processes and can also create new types of CAT processes entirely. Our CCAT hypothesis thus makes it possible to propose an account of what happens when people engage in CAT, using conventional thought processes, and a set of mechanisms for the development of CAT or changes in CAT as a result of learning. This hypothesis accounts for the effects of both the nature and nurture of critical and analytic thought with implications for psychology and education. Using combinations of the letters C, A, and T, it has not escaped our notice that the CCAT hypothesis has biological and genetic implications for the nature of thought.