Juvenile delinquency poses a serious societal problem (Loeber & Farrington, 1998; Snyder & Sickmund, 1999). Why does one young person commit a criminal offense, while another does not? About seventy years ago, Piaget (1932) suggested that analyzing the development of moral judgment in juvenile delinquents might help answer this question. Betke (1944) refers to studies of moral judgment in juvenile delinquents dating back as early as 1903. Since then, numerous studies have compared the moral judgment of juvenile delinquents with that of nondelinquent juveniles.

The cognitive-developmental approach to moral judgment

According to Kohlberg (Colby & Kohlberg, 1987) and others who advocate a cognitive approach to moral development (Blasi, 1999; Nucci, 2002), behavior can be regarded as moral only if it is informed by moral judgment, which is identified by formal criteria such as impartiality, universalizability, and prescriptive role-taking. Kohlberg (1984) used hypothetical dilemmas to assess moral judgment and posited an invariant sequence of six hierarchically ordered stages, grouped into three levels: preconventional, conventional, and postconventional. At the preconventional level, rules and social expectations are external to the self, imposed from the outside by authority figures; at the conventional level, the self has internalized these rules and social expectations; at the postconventional level, the self is differentiated from rules and social expectations, and moral values are defined in terms of self-chosen principles (Colby & Kohlberg, 1987). Given cognitive development and adequate opportunities for social perspective-taking, people advance through these stages and levels (although attainment of the highest level is infrequent; see Gibbs, Basinger, Grime, & Snarey, 2006). More recent views of the moral judgment stages (e.g., Gibbs, 2003) de-emphasize the three levels and instead emphasize the superficiality and pronounced egocentric bias of the immature stages (1 and 2) as risk factors for antisocial behavior in adolescents and adults.

Each stage is considered more adequate than the preceding one, providing more prescriptive, profound, and universally acceptable solutions to moral issues, justifications for moral decisions, or reasons for moral values. Stage 6 explicitly formulates the moral point of view, that is, the moral ideal, including respect for persons, justice, and benevolence (Kohlberg, Boyd, & Levine, 1990). The stages are assumed to be relatively independent of content and context, forming so-called “structured wholes.” Nonetheless, breaking the law would more readily seem morally acceptable at the self-centered lower stages, which emphasize the avoidance of punishment (stage 1) or instrumental advantage (stage 2). In contrast, the mature stages may buffer against delinquent behavior. As children enter adolescence and experience attendant increases in size, strength, sex impulse, and ego capability, those who have not achieved at least a mutualistic (stage 3) understanding of social life are at risk when exposed to antisocial opportunities and peer influences (Gibbs, Arnold, Ahlborn, & Cheesman, 1984; cf. Gibbs, 2003; Palmer, 2003).

Kohlberg's cognitive-developmental approach and stage model of moral judgment gave moral psychology new impetus (Kohlberg, 1958, 1984) and prompted a surge of studies investigating the relation between moral judgment and juvenile delinquency. Blasi's (1980) extensive review revealed that 10 of 15 selected studies showed a lower stage of moral judgment in delinquent than in nondelinquent adolescents. Smetana's (1990) review and Nelson, Smith, and Dodd's (1990) meta-analysis confirmed Blasi's finding that juvenile delinquents generally reason at a lower moral stage than age-matched groups of nondelinquents.

These reviews were limited or flawed in several respects. The reviews by Blasi and Smetana were narrative rather than quantitative, and Nelson et al.'s meta-analysis failed to include many of the studies available at the time. Nelson et al. legitimately included only studies that reported differences in moral reasoning between juvenile delinquents and their nondelinquent age-mates; they left out studies of subculture groups (such as nondelinquent drug users), studies of adolescents with internalizing problems or learning disabilities, and studies without a comparison group of nondelinquent adolescents. Their exclusion of juvenile delinquents with psychopathic disorder, however, is contestable because it limits generalizability: many delinquent adolescents have been diagnosed with psychopathic disorder (Hare, 1980) or with DSM-IV disorders that resemble psychopathic disorder (Vermeiren, 2003), and the radical self-centeredness entailed in psychopathy and conduct disorder is commonly observed among juvenile delinquents (Farrington, 2005). Moreover, because their reviews were narrative, Blasi and Smetana could not demonstrate to what extent the findings were influenced by characteristics of the included studies. Nelson et al. inspected relevant moderators but did not examine the unique effects of these moderators on the differences in moral judgment between delinquents and nondelinquents.

The present study

Lytton (1994) suggested that meta-analyses should be replicated as a rule, because their results are affected by countless decisions about collecting, coding, and analyzing primary studies. Furthermore, given the flaws noted above, it seems important not only to replicate but also to extend Nelson et al.'s study. Accordingly, we conducted a meta-analysis that includes the 15 studies used by Nelson et al., suitable studies they left out, and all suitable studies that have appeared since. The first purpose of this study is to test, on a more comprehensive set of data, the hypothesis that delinquent adolescents show lower-stage moral judgment than their nondelinquent age-mates. The second purpose is to identify factors that affect the relation between moral judgment and juvenile delinquency.

A number of criticisms apply to most studies investigating differences in moral judgment between delinquents and nondelinquents. For instance, it is often ignored that delinquent populations are seldom homogeneous in terms of type and seriousness of offense, behavioral characteristics, and personality (e.g., psychopathy or personality disorder). Males and females may differ both in their level of moral judgment and in the prevalence of delinquent behavior. Institutionalization effects are usually ignored. Ostensibly nondelinquent comparison groups are often not screened for possible delinquent members. Production measures (standardized semi-structured interviews using open questions to elicit moral judgments) and recognition measures (multiple-choice questionnaires in which respondents select or evaluate moral judgments representing different stages) may not assess the same attribute, nor may measures based on hypothetical versus real-life dilemmas (moral competence versus moral performance). The results of studies that used blind scoring may also differ from those of studies that did not. All of these factors might affect the findings. Hence, the second purpose of the study is to identify variables that moderate the link between moral judgment and delinquency, that is, factors affecting the direction and strength of the relation. These factors include sample characteristics, such as socioeconomic status, cultural background, gender, age, intelligence, psychopathy, type of offense, institutionalization, period of incarceration, and type of comparison group (matched/clean), as well as factors related to the measurement of moral judgment.

Socioeconomic status

Sagi and Eisikovits (1981) explored the influence of socioeconomic status on moral judgment. They suggested that lower-class children may develop an external moral orientation, because their parents use power-assertive control strategies that direct children's attention to the consequences of their behavior for themselves. Middle-class children may benefit from parental inductive discipline, as parents who point out the possible consequences of their children's behavior for others are expected to promote an internal moral orientation (Berkowitz & Grych, 1998; Hoffman, 2000; Walker & Hennig, 1999). There is indeed empirical evidence that parental discipline in lower-class families is characterized by more power-assertive control, whereas middle-class parents tend to use authoritative and inductive control strategies (Bradley & Corwyn, 2002; Hoff, Laursen, & Tardif, 2002). In addition, several studies have shown low SES to be associated with higher rates of juvenile delinquency (see Hawkins et al., 1998; Lipsey & Derzon, 1998).

Because socioeconomic status has been shown to be associated with both moral judgment and juvenile delinquency, it is important to determine the extent to which it moderates, or even accounts for, the moral judgment-delinquency relation. For instance, SES might moderate this relation because moral judgment or delinquency varies less among adolescents from higher socioeconomic backgrounds, which would yield smaller effect sizes in the high-SES group. Moreover, adequate economic resources might buffer the effect of lower moral judgment on delinquency. Finally, SES might moderate the relation because adolescents from higher socioeconomic backgrounds may have been more strongly encouraged by their parents to use justice-based moral judgments in moral decision making (see Thoma, Rest, & Davison, 1991).

Culture

“Culture” or “cultural background” may serve as a rubric for factors such as family dynamics or parenting style that may moderate delays in moral judgment development among delinquents. Authoritative or inductive-discipline parenting styles have been found to relate to more mature levels of moral judgment (e.g., Nevius, 1977; Olejnik, 1980), presumably because they provide greater opportunities for social perspective-taking (Speicher, 1996; Walker, Hennig, & Krettenauer, 2000). More authoritarian and power-assertive parenting styles have been found to be more prominent in immigrant families with a non-Caucasian background (Wissink, 2006). Although physical discipline may be adaptive, or at least less deleterious, in some cultural contexts (Dodge, McLoyd, & Lansford, 2005), such parenting factors have generally been associated with delinquency (Kazdin, 1995) and may put children at risk for impoverished perspective-taking and delayed moral judgment. Several reviews (e.g., Daly & Tonry, 1997; Smith, 2005; South & Messner, 2000) indicate that adolescents with non-indigenous or cultural minority backgrounds are overrepresented in delinquent samples. Culture, like socioeconomic status, thus appears to be related to both moral judgment and juvenile delinquency, so it is important to assess the extent to which culture moderates, or perhaps even fully accounts for, the moral judgment-delinquency relation. It is possible that the discrimination, social exclusion, and other unfavorable life circumstances faced by many youths from cultural minority groups (e.g., immigrant youth), such as unemployment (Farrington, Gallagher, Morley, Ledger, & West, 1986), create tensions and stresses that interact with the influence of moral judgment on delinquency.

Gender

Clear gender differences in juvenile delinquency exist with respect to prevalence rates, type of delinquency, and antecedents, although gender differences in moral judgment stage are found less consistently. Delinquency is far more common among boys than among girls (Mullis, Cornille, Mullis, & Huber, 2004; Snyder & Sickmund, 1999), crimes committed by female offenders are generally less violent and serious than those committed by male offenders (Acoca, 1999), and different risk and protective factors have been shown to be associated with male and female juvenile delinquency (Mullis et al., 2004). For instance, many female delinquents have been victims of physical, sexual, and emotional abuse or exploitation, which might enhance identification with victims and foster perspective-taking abilities, possibly resulting in higher levels of moral judgment (Stams et al., 2006). In the transition to adolescence and in early adolescence, females have been found to reach moral judgment stage 3 sooner than males (Garmon, Basinger, Gregg, & Gibbs, 1996; Silberman & Snarey, 1993; Walker, 1984; cf. Jaffee & Hyde, 2000), although this difference tends to disappear by late adolescence or adulthood. Notably, gender may moderate the relation between moral judgment and delinquency, as boys and girls have been shown to differ in care and justice orientation (Jaffee & Hyde, 2000), which could affect their use of justice-based moral judgments in moral decision making (see Thoma et al., 1991). Given the relation of gender to both moral judgment and delinquency, delinquent-nondelinquent comparisons should be conducted within gender.

Age of the adolescent

Age is one of the most important variables to take into account, since moral judgment develops over time. Moral judgment delay renders youths vulnerable, given (as noted earlier) the increases in ego capabilities and peer influences from early to middle and late adolescence (Hart & Carlo, 2005). Consistent with this, the incidence of delinquent behavior is much higher during late adolescence than during early or middle adolescence (Donker, Smeenk, van der Laan, & Verhulst, 2003; Moffitt, 1993). That moral judgment delay may have a greater effect on delinquent behavior with increasing age is consistent with the more pronounced moral judgment differences found between delinquent and nondelinquent samples in late adolescence (Brusten, 2003; Gibbs et al., 2006).

Intelligence

Intelligence, or cognitive capacity, is also related to both moral judgment level and delinquency. Consistent with a cognitive-developmental approach to moral judgment, greater intelligence and higher-level education, which reflect greater capacity for abstract thinking, have been shown to be related to more advanced stages of moral judgment (e.g., Colby, Kohlberg, Gibbs, & Lieberman, 1983). Low school attainment, other school difficulties, and low intelligence are important risk factors for delinquency (Farrington, 1996; Moffitt, Lynam, & Silva, 1994; Seguin, Pihl, Harden, Tremblay, & Boulerice, 1995).

Psychopathy

Offenders vary in risk factors, including mental health problems. In particular, antisocial personality has been identified as one of the most important risk factors for delinquent behavior (Andrews & Bonta, 2003). Delinquent adolescents with psychopathic disorder (Hare, 1980; cf. conduct-disordered juvenile offenders; Hill, Neuman, & Rogers, 2004) display a pattern of behavior that violates the basic rights of others as well as societal norms and rules. These relatively severe delinquents have a 40% increased risk for developing antisocial personality disorder (Kazdin, 1995). Several studies have found that juvenile delinquents with psychopathic disorder lag behind other delinquent youth in moral judgment (Campagna & Harter, 1975; Fodor, 1973).

Type of offense

In most studies, the only criterion for categorizing participants as delinquent is conviction for a criminal violation. The actual offenses, however, may vary considerably between and within studies, and could be a source of variation in moral judgment. For example, Kohlberg and Freundlich (1973) found that delinquents whose offenses were drug-related showed a higher stage of moral judgment than “regular” delinquents. Delinquents should therefore not be treated as a homogeneous group: when delinquents are divided into subgroups on the basis of the nature, seriousness, and motivation of their antisocial acts, the subgroups differ in moral judgment (Arbuthnot, Gordon, & Jurkovic, 1987).

An anomaly was reported by Petronio (1980), who found that recidivist juvenile delinquents showed higher moral judgment scores than juvenile delinquents who were not returned to court within two years of first being placed on probation. Petronio suggested that repeat offenders may rationalize their behavior “by invoking a ‘higher’ set of moral standards that minimizes their badness” (p. 57). According to Gibbs (1991, 2003), rationalizations, cognitive distortions, or thinking errors, such as blaming the victim and minimizing or mislabeling moral transgressions, enable delinquents to distance themselves from responsibility for their actions. Recidivist delinquents may thus have been able to continue their criminal careers despite a nondelayed level of moral judgment by using guilt-neutralizing cognitive distortions. Contrary to Petronio's account, however, morally delayed delinquents are even more likely to use cognitive distortions (Barriga, Landau, Stinson, Liau, & Gibbs, 2000).

Institutionalization

Most studies assess moral judgment in samples of convicted delinquents residing in institutions. Incarceration may itself affect the moral judgment of delinquents (Emler & Reicher, 1995). It is difficult, however, to determine whether a lower stage of moral judgment causes delinquent behavior or merely reflects the individual's criminal environment. Less mature moral judgment in institutionalized delinquents could result from a negative and self-centered moral atmosphere in the institution or in the deviant peer group with which one is involved (Colby & Kohlberg, 1987). Incarcerated delinquents may also lag behind in moral development because they have little opportunity for self-determination or social perspective-taking. Dishion, McCord, and Poulin (1999) suggested that aggregating deviant peers during early adolescence could inadvertently reinforce problem behavior through deviancy training, i.e., positive reinforcement of antisocial attitudes. Hence, moral development may be negatively influenced by the aggregation of juvenile delinquents in correctional facilities, especially in cases of extended commitment (although these more severe cases may also entail greater moral judgment delay at the outset). Conversely, moral judgment may influence judicial decisions about incarceration; that is, adolescents may be sent to detention on the basis of the moral arguments they offer for their norm-transgressive behavior (Hendriks, Rutten, Stams, & Brugman, 2006; Tarry & Emler, in press), which undermines the interpretation of an institutionalization effect on moral judgment. Therefore, an incarceration or adjudication factor should also be included in the analysis of moral judgment differences between delinquents and nondelinquents.

Comparison group (Matched/Clean)

Socioeconomic status, cultural background, gender, age, intelligence, and educational level have been designated as important confounders of the relation between moral judgment and delinquency (e.g., Blasi, 1980; Nelson et al., 1990). A comparison group should therefore be matched as closely as possible on these variables. Close matching on age, IQ, and educational level, to name a few variables, may however increase the risk of including adolescents in the comparison group who have committed offenses without being caught, or who even have a criminal record. This creates the so-called problem of a polluted, or “non-clean,” comparison group. As Blasi (1980) pointed out, precautions can be taken to reduce the chance that delinquent individuals appear in the comparison group, for example by using parent reports, school records, and school personnel reports of delinquent behavior. But do such precautions solve the problem?

Some argue that controlling for covariates such as age and intelligence may not be a good idea, as it obscures true group differences. The fact that moral judgment correlates with age, intellectual ability, and educational level supports the construct validity of moral judgment (Basinger, Gibbs, & Fuller, 1995). Studying moral judgment with these variables covaried out is like studying basketball ability with height covaried out (see Miller & Chapman, 2001). Just as height is relevant to basketball ability, age, intelligence, and educational level are relevant to moral judgment, and excessive covariate control can indeed remove legitimate construct variance. Nonetheless, it remains important to match on these and other variables when examining differences in moral judgment between delinquent and nondelinquent adolescents.

Assessment method of moral judgment: production and recognition measures

In 1932, Piaget was the first to chart progressive transformations in the child's verbal reflective understanding of moral rules and justice, using clinical interviews. Kohlberg substantially revised and extended Piaget's stage sequence and went on to design a standardized production measure of moral judgment, the Moral Judgment Interview (MJI). In this interview, a child's stage of moral judgment is assessed by presenting hypothetical moral dilemmas in which obedience to laws, rules, or the demands of authority conflicts with the needs and welfare of others. The dilemmas were designed to elicit judgments about a variety of moral issues (Jurkovic, 1980). Responses can be coded in terms of moral stage scores that are directly linked to Kohlberg's stage sequence.

Kohlberg's MJI is lengthy and must be administered individually, and its coding system is very labor-intensive, requiring extensive training in administration and scoring. Hence, some researchers have sought to develop more practical assessment methods. Rest (1975) developed the Defining Issues Test (DIT), a multiple-choice alternative to the MJI. Because the DIT is easy to use, it quickly became the most popular instrument for measuring moral judgment. The DIT uses items from stages 2 to 6 but focuses mainly on development from the conventional level (stages 3 and 4) to the postconventional level (stages 5 and 6). Responses are summarized in the P-score, the percentage of postconventional judgment. The DIT has recently been updated as the DIT2 (Rest, Narvaez, Thoma, & Bebeau, 1999).
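To make the P-score concrete, the following Python sketch computes it from ranked DIT items. It assumes the scoring rule commonly described for the DIT, in which the items a respondent ranks first to fourth within each story earn 4, 3, 2, and 1 points, and the P-score is the share of these points given to postconventional (stage 5A, 5B, and 6) items. The function name, stage labels, and example responses are hypothetical, and the official DIT/DIT2 scoring applies additional consistency checks that are not reproduced here.

```python
from typing import Sequence

# Points for the items ranked 1st, 2nd, 3rd and 4th within one story
# (assumed 4-3-2-1 weighting).
RANK_POINTS = (4, 3, 2, 1)
# Stage labels treated as postconventional.
POSTCONVENTIONAL = {"5A", "5B", "6"}


def p_score(stories: Sequence[Sequence[str]]) -> float:
    """Return the percentage (0-100) of ranking points given to postconventional items.

    Each element of `stories` holds the stage labels of the items the respondent
    ranked 1st to 4th for one dilemma story, in rank order.
    """
    if not stories:
        raise ValueError("at least one story is required")
    earned = 0
    possible = 0
    for ranked_stages in stories:
        if len(ranked_stages) != len(RANK_POINTS):
            raise ValueError("each story needs exactly four ranked items")
        for points, stage in zip(RANK_POINTS, ranked_stages):
            possible += points
            if stage in POSTCONVENTIONAL:
                earned += points
    return 100.0 * earned / possible


# Hypothetical respondent: two stories, stage labels of the top-four ranked items.
example = [["5A", "4", "3", "6"], ["4", "3", "2", "5B"]]
print(p_score(example))  # 30.0: (4 + 1 + 1) of 20 possible points
```

Summing points over all stories before dividing means every story contributes equally to the denominator, which is why the example yields 6 of 20 points.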

Gibbs, Widaman, and Colby (1982) developed the Sociomoral Reflection Measure (SRM), a written production measure that can be administered in groups. The Sociomoral Reflection Objective Measure (SROM) was derived from the MJI (Gibbs et al., 1984). The SRM and SROM were later replaced by short-form versions: the SRM-SF (Gibbs, Basinger, & Fuller, 1992), which no longer contains moral dilemmas, and the SROM-SF (Basinger & Gibbs, 1987). All of these instruments contain items based on stages 1 to 4 and focus on the transition from the preconventional level (stages 1 and 2) to the conventional level (stages 3 and 4), which Gibbs (2003) reconceptualized as immature and mature moral judgment. Responses can be coded in terms of global moral stage scores (ranging from 1 to 4) that are directly linked to Gibbs' (2003) revision of Kohlberg's stage sequence.

Gibbs et al. (2006) emphasize that stage 3 (ideal moral reciprocity or mutuality in relationships) represents a major achievement in moral development, because it is based on the coordination of social perspective-taking and implies mature moral understanding. Stage 3 is already prominent during early adolescence and becomes the modal moral judgment stage in late adolescence (Gibbs et al., 2006). Stage 4 (social systems) emerges in late adolescence and extends the mature understanding of stage 3 beyond the interpersonal sphere to encompass complex interactions within social institutions. A stage 4 perspective should thus make it more difficult to engage in antisocial acts involving strangers.

In sum, the instruments for assessing moral judgment can be divided into two groups: production measures like the MJI and the SRM(-SF), in which subjects respond to open questions that are coded later, and recognition measures like the DIT and the SROM(-SF), where subjects evaluate presented judgments. Blasi (1980) noted an interesting difference: whereas nine out of eleven studies using production measures found that delinquents reason at a lower moral judgment stage than nondelinquents, the three studies using recognition measures showed no difference in moral judgment between the two groups. Gavaghan, Arnold and Gibbs (1983) investigated this apparent disparity and substantiated Blasi's inference that production measures are more likely than recognition measures to distinguish delinquents from nondelinquents.

This advantage of production measures may reflect a greater relevance of production to situational behavior. According to Colby and Kohlberg (1987), comprehending and evaluating moral judgments should not be regarded as the same process as spontaneously producing them. Referring to Vygotsky's zone of proximal development, Gavaghan et al. (1983) suggested that some adolescents may recognize conventional stage 3 judgment while not yet being able to produce verbal explanations in terms of stage 3 judgment or to apply it to new, concrete situations. To guide action, judgments must be produced spontaneously in actual situations. In general, then, the relation between moral judgment and delinquent behavior may be stronger when moral judgment is assessed with production rather than recognition measures.

Probing method: Dilemma (hypothetical and real-life) versus dilemma-free procedures

Most instruments assessing moral judgment present the subject with a series of hypothetical dilemmas, such as the one about Heinz, who must decide whether to steal an expensive drug to save his wife's life. These dilemmas are expected to elicit the highest stage of moral judgment, because reasoning is not affected by preconceptions (Walker, de Vries, & Trevethan, 1987) or practical circumstances (Higgins, Power, & Kohlberg, 1984). One might argue, however, that hypothetical dilemmas concern unfamiliar and irrelevant situations, allowing the subject to judge without identification or involvement, which may reduce the usefulness of the assessment. Kohlberg, Scharf, and Hickey (1972) found a disparity between competence and performance among inmates of traditional prisons: the inmates scored at lower stages in response to prison dilemmas than in response to the MJI dilemmas. Moreover, even real-life dilemmas are designed by researchers and are only presumed to reflect real life for the participant. It might be more appropriate to have participants reason about real-life moral practices and values. A dilemma-free method of presenting moral values, such as the SRM-SF, may therefore have greater ecological validity, because participants can supply the relevant situational content from their daily lives and cultural background (Gibbs, Basinger, & Fuller, 1992).

To summarize, in the present study we address two questions. First, we examine the evidence regarding the hypothesized moral immaturity of juvenile delinquents on a more extensive sample of studies than in a previous meta-analysis. Second, we determine to what extent moderators, that is, sample characteristics and factors concerning the measurement of moral judgment, affect the link between moral judgment and juvenile delinquency.

Method

Sample of studies

Multiple search methods were used to avoid retrieving only studies published in the major journals, which may selectively publish results with lower p values and larger effect sizes (Rosenthal, 1995). First, we searched the computerized databases PsycLIT, PsycInfo, ERIC, Medline, Psychological Abstracts, and Dissertation Abstracts. No restriction on publication year was applied, and the following keywords were used in varying combinations: justice, moral*, moral judgment, moral reasoning, adolescen*, delinq*, criminal, offend*, youth, juvenile. Second, we checked the reference lists of Nelson et al. (1990), Smetana (1990), and Blasi (1980). Third, we searched the reference sections of the studies retrieved from the databases for citations not yet identified. Finally, we contacted authors who were likely to have produced, or to know of, recent studies.
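The keyword search can be illustrated by grouping the listed terms into concept blocks, joining synonyms with OR and blocks with AND. The grouping below is a hypothetical reconstruction, since only the terms and the fact that they were used in varying combinations are reported; the asterisk denotes truncation (e.g., delinq* retrieves delinquent and delinquency), and exact query syntax differs between databases.

```python
from typing import Sequence

# Hypothetical grouping of the reported search terms into three concept blocks.
MORAL_TERMS = ["justice", "moral*", '"moral judgment"', '"moral reasoning"']
DELINQUENCY_TERMS = ["delinq*", "criminal", "offend*"]
AGE_TERMS = ["adolescen*", "youth", "juvenile"]


def or_block(terms: Sequence[str]) -> str:
    """Join the synonyms for one concept with OR, wrapped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*blocks: Sequence[str]) -> str:
    """Require one term from every concept block by joining the blocks with AND."""
    return " AND ".join(or_block(block) for block in blocks)


query = build_query(MORAL_TERMS, DELINQUENCY_TERMS, AGE_TERMS)
print(query)
```

This prints `(justice OR moral* OR "moral judgment" OR "moral reasoning") AND (delinq* OR criminal OR offend*) AND (adolescen* OR youth OR juvenile)`. Varying the combinations then amounts to dropping or swapping blocks, for example `build_query(MORAL_TERMS, DELINQUENCY_TERMS)` for a broader search.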

This search yielded hundreds of studies, which were inspected and included if they met the following criteria: (a) moral judgment was measured mainly in terms of justifying prescriptive social decisions or values by appeals to justice, fairness, or related considerations of right and wrong (Kohlberg, 1958; Gibbs, 1979; Rest, 1975), not in terms of prosocial moral judgment (Eisenberg, 1986), care judgment (Gilligan, 1982; Skoe et al., 1999), or shame and guilt (Tangney, 1995); (b) an appropriate comparison group of nondelinquents was used; (c) the mean age of the subjects was between 10 and 20 years; (d) delinquency was officially established, so that, for instance, studies reporting on conduct problems in the classroom (Bear & Richards, 1981), students with learning disabilities (Fincham, 1977), and subculture groups (Alterman, Druley, Connolly, & Bush, 1978) were not included; and (e) the study did not report data already reported in another study included in the present meta-analysis. Copies of all suitable journal articles, unpublished papers, and dissertations were obtained to enable us to compute effect sizes. In the few cases in which an author did not report enough statistical information to compute an effect size, we attempted to contact the author and request additional information. Where the author could not be traced (sometimes because of the age of the study), effect sizes were estimated from the reported significance level (p). No studies were excluded on the basis of flawed design; instead, impact factor and publication status were included as moderators in the analyses as indicators of study quality (Mullen, 1989). The final sample comprised 50 studies, compared with the 15 studies in Nelson et al.'s (1990) meta-analysis.
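Criteria (a) to (e) act as a conjunctive screen: a study is included only if it passes all five, and design quality is not itself an exclusion criterion. The Python sketch below expresses this screen over hypothetical study records. The field names are illustrative, the judgments behind each flag (for example, whether a measure is justice-based) still have to be made by the coders, and treating the 10 and 20 year bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class CandidateStudy:
    """Hypothetical summary of one retrieved study, filled in by a coder."""
    study_id: str
    justice_based_measure: bool      # (a) justice-based moral judgment, not care/prosocial/guilt
    nondelinquent_comparison: bool   # (b) appropriate nondelinquent comparison group
    mean_age: float                  # (c) mean age of the subjects in years
    officially_delinquent: bool      # (d) delinquency officially established
    sample_id: str                   # (e) identifies the data set, to catch duplicate reports


def screen(candidates: Iterable[CandidateStudy]) -> List[CandidateStudy]:
    """Keep studies meeting criteria (a)-(e); later reports of an included sample are dropped."""
    included: List[CandidateStudy] = []
    seen_samples: Set[str] = set()
    for study in candidates:
        passes = (
            study.justice_based_measure
            and study.nondelinquent_comparison
            and 10 <= study.mean_age <= 20   # assumed inclusive bounds
            and study.officially_delinquent
            and study.sample_id not in seen_samples
        )
        if passes:
            included.append(study)
            seen_samples.add(study.sample_id)
    return included
```

In this sketch, which report of a duplicated sample is kept depends on input order; the section does not state which report was retained in such cases.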

Coding the studies

Each study was coded with a detailed coding system for recording publication, sample, design, and measurement characteristics. Intercoder reliability between the two coders was satisfactory (kappa > .80). As publication characteristics, we coded publication status (journal articles were coded as published), the impact factor of the journal (0–2 = low to medium; > 2 = high; Brainerd, 2006), and year of publication (before or after 1980, the year of Blasi's extensive review).
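Kappa corrects the raw proportion of agreement for the agreement expected by chance from each coder's marginal frequencies. The section does not say which kappa variant was used; the Python sketch below assumes Cohen's kappa for two coders, and the codings are invented for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who each assigned one category per study."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must code the same, non-empty set of studies")
    n = len(coder_a)
    # Observed agreement: share of studies given the same category by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over categories.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both coders used one and the same category throughout; kappa is undefined,
        # and this sketch simply reports perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical codings of "blind scoring" for ten studies.
coder_1 = ["yes"] * 5 + ["no"] * 5
coder_2 = ["yes"] * 4 + ["no"] * 6
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.8
```

Here 9 of 10 codings agree (0.90 observed), but 0.50 agreement is expected by chance from the marginals, so kappa is 0.80, which would just fail a criterion of kappa above .80.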

The following sample characteristics were coded: socioeconomic status (lower or lower-middle class); cultural background (Caucasian/white, or mixed/all non-indigenous); gender (males only, or part or all females); age (early/middle or late adolescents, with age 15 marking the transition to late adolescence); intelligence (low or average, with a standard score of 100 as the natural cut-off); psychopathy (“yes” if the sample contained juvenile delinquents with psychopathic disorder as diagnosed with Hare's (1980) Psychopathy Checklist, “no” otherwise); type of delinquency (petty crime, violent/serious delinquency, or combined); institutionalization (samples in which more than half of the subjects were incarcerated were coded as institutionalized); and period of incarceration (less than 6 months versus 18 months or more).
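The cut-offs above translate into simple coding rules. The Python sketch below applies them to hypothetical sample-level summaries (mean age, mean IQ, counts); the function names are illustrative. Where the text leaves a boundary open, the choice is an assumption flagged in the comments: a mean age of exactly 15 is coded as late adolescence, a mean IQ of exactly 100 as average, and incarceration periods between 6 and 18 months are left uncoded.

```python
from typing import Optional


def code_gender(n_female: int) -> str:
    """Males-only samples versus samples with some or all females."""
    return "males only" if n_female == 0 else "part or all females"


def code_age(mean_age: float) -> str:
    # Assumption: a mean age of 15 or older counts as late adolescence.
    return "late" if mean_age >= 15 else "early/middle"


def code_intelligence(mean_iq: float) -> str:
    # Assumption: only a mean standard score below 100 is coded as low.
    return "low" if mean_iq < 100 else "average"


def code_institutionalized(n_incarcerated: int, n_total: int) -> bool:
    """Institutionalized if more than half of the subjects were incarcerated."""
    # Integer comparison avoids rounding issues with n_total / 2.
    return 2 * n_incarcerated > n_total


def code_incarceration_period(months: float) -> Optional[str]:
    """Contrast short (< 6 months) with long (>= 18 months) incarceration."""
    if months < 6:
        return "short"
    if months >= 18:
        return "long"
    return None  # Assumption: periods of 6 to under 18 months are not coded.
```

Note that a sample exactly half incarcerated (e.g., 5 of 10) is not coded as institutionalized, because the rule requires more than half.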

The following design and measurement characteristics were coded: matched comparison group (a study was coded as “matched” if delinquents and nondelinquents were matched on at least age, IQ, educational background, or SES); clean comparison group (the comparison group was coded as “clean” if the author reported a reliable way of checking whether it contained delinquents, such as school reports); blind scoring procedure; assessment method (production, recognition, or combined measures); and probing method (dilemma or dilemma-free). We did not code type of dilemma, because too few studies had compared the moral judgment of delinquent and nondelinquent adolescents by means of real-life dilemmas. The only study comparing delinquent and nondelinquent youth responding to both hypothetical and real-life dilemmas found that both groups scored lower on the real-life than on the hypothetical dilemmas. As expected, delinquent adolescents attained a lower level of moral judgment than nondelinquent adolescents, while showing a disparity between real-life and hypothetical moral judgment similar to that of nondelinquent youth (Trevethan & Walker, 1989).

Data analysis

The outcomes of all studies were transformed into Cohen's d, the standardized difference between delinquents and nondelinquents, using Mullen's (1989) Advanced BASIC Meta-Analysis program. Where possible, effect sizes were computed from reported means and standard deviations; where only p, t, or F values were reported, these statistics were converted into d. When a study did not provide the statistical information necessary to calculate an effect size but reported a nonsignificant difference, an effect size of zero was assigned, based on a one-tailed p of .50 (Z=0.00). This is a commonly used, conservative strategy that generally underestimates the true magnitude of effect sizes (Durlak & Lipsey, 1991). Excluding these nonsignificant results from the meta-analysis, however, would lead to an overestimation of the combined effect size (Rosenthal, 1995).
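The conversions described above can be sketched in Python. This is a minimal sketch, not Mullen's (1989) program: the function names are hypothetical, the sign convention (positive d means lower moral judgment among delinquents) follows the text, and the p-to-d route (Z, then r = Z/√N, then d = 2r/√(1 − r²)) is one standard conversion that we assume here; the original software may have used another.

```python
import math
from statistics import NormalDist

# Sign convention (from the text): positive d means delinquents score LOWER
# on moral judgment than nondelinquents. All function names are hypothetical.

def d_from_means(m_non, sd_non, n_non, m_del, sd_del, n_del):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_non - 1) * sd_non**2 + (n_del - 1) * sd_del**2) / (n_non + n_del - 2)
    return (m_non - m_del) / math.sqrt(pooled_var)

def d_from_t(t, n_non, n_del):
    """Independent-samples t converted to d: d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1 / n_non + 1 / n_del)

def d_from_f(f, n_non, n_del, delinquents_lower=True):
    """Two-group F (numerator df = 1) via t = sqrt(F). F carries no sign, so the
    direction has to be taken from the study's reported means."""
    d = d_from_t(math.sqrt(f), n_non, n_del)
    return d if delinquents_lower else -d

def d_from_p_one_tailed(p, n_total):
    """Assumed route: one-tailed p -> Z -> r = Z / sqrt(N) -> d = 2r / sqrt(1 - r^2).
    A one-tailed p of .50 gives Z = 0 and therefore d = 0."""
    z = NormalDist().inv_cdf(1 - p)
    r = z / math.sqrt(n_total)
    return 2 * r / math.sqrt(1 - r**2)

# Illustrative (made-up) inputs:
print(round(d_from_t(2.0, 50, 50), 3))   # 0.4
print(d_from_p_one_tailed(0.50, 120))    # 0.0
```

The last line shows why the zero-assignment rule is conservative: a nonsignificant study reported without statistics contributes exactly d=0, whatever its true effect.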

The resulting set of effect sizes was entered into Borenstein, Rothstein, and Cohen's (2000) Comprehensive Meta-Analysis (CMA) program, which computes both fixed-effect and random-effects estimates. In a fixed-effect model, significance testing is based on the total number of participants, whereas in a random-effects model it is based on the total number of studies included in the meta-analysis. The former model offers greater statistical power but limited generalizability; the latter offers greater generalizability but lower statistical power (Rosenthal, 1995). Because the unbiased d removes the bias in effect sizes caused by small study samples, we computed the unbiased d under the random-effects model. CMA also computed the significance of the effect sizes, homogeneity statistics, analyses of variance, and confidence intervals around each point estimate. Because all studies proposed directional hypotheses predicting that delinquents would show less advanced moral judgment, we present 95% confidence intervals (with one-tailed alphas set at .05).
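The small-sample correction and a random-effects pooled estimate can be illustrated as follows. This sketch assumes Hedges' correction factor J = 1 − 3/(4(n1 + n2) − 9), a common large-sample variance formula for d, and the DerSimonian-Laird estimator of the between-study variance (tau²). We do not know which estimator CMA (2000) used, so this sketch need not reproduce CMA output exactly; function names are hypothetical.

```python
import math

def hedges_unbiased(d, n1, n2):
    """Small-sample bias correction: g = J * d with J = 1 - 3 / (4 * (n1 + n2) - 9)."""
    return d * (1 - 3 / (4 * (n1 + n2) - 9))

def d_variance(d, n1, n2):
    """Common large-sample approximation to the sampling variance of d."""
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

def random_effects(ds, variances, z_crit=1.96):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled estimate, (lower, upper) confidence bounds, tau^2)."""
    w = [1 / v for v in variances]                   # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * di for wi, di in zip(w, ds)) / sw
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, ds))
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ds) - 1)) / c)         # between-study variance, truncated at 0
    w_star = [1 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * di for wi, di in zip(w_star, ds)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - z_crit * se, pooled + z_crit * se), tau2
```

In use, each study's d would first be passed through `hedges_unbiased`, its variance computed with `d_variance`, and the two lists handed to `random_effects`. Adding tau² to every study's variance is what widens the interval relative to the fixed-effect model, which is the power-for-generalizability trade-off described above.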

When effect sizes are homogeneous, they differ only because of unsystematic sampling error. When they are heterogeneous, the combined effect size must be interpreted with caution, and one should look for moderators that explain the variability among studies. This can be done by conducting analyses of variance and subsequently a multiple regression analysis to establish whether the significant moderators make unique contributions.
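Homogeneity is usually tested with Cochran's Q: the inverse-variance weighted sum of squared deviations of each effect size from the fixed-effect mean, referred to a chi-square distribution with k − 1 degrees of freedom. The sketch below computes Q and its p-value with the standard library only; the chi-square tail uses the standard series for the regularized incomplete gamma function, and the function names are illustrative.

```python
import math

def cochran_q(ds, variances):
    """Cochran's Q and its degrees of freedom (k - 1)."""
    w = [1 / v for v in variances]
    mean = sum(wi * di for wi, di in zip(w, ds)) / sum(w)
    q = sum(wi * (di - mean) ** 2 for wi, di in zip(w, ds))
    return q, len(ds) - 1

def chi2_sf(q, df):
    """Upper-tail chi-square p-value, 1 - P(df/2, q/2), where P is the regularized
    lower incomplete gamma function computed by its power series."""
    a, x = df / 2.0, q / 2.0
    if x <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= x / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

q, df = cochran_q([0.2, 1.0], [0.04, 0.04])   # made-up inputs
print(q, df, chi2_sf(q, df))
```

A significant Q, such as the Q(49)=184.19 reported in the Results, is the signal that moderator analyses are needed.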

The file drawer problem is the (well-supported) suspicion that the studies included in a meta-analysis are not a random sample of all studies that were conducted: published studies are disproportionately those that achieved statistical significance, and they are also the easiest to find, whereas many unpublished studies remain tucked away in file drawers because they did not yield a significant outcome. Since it is not possible to search every existing file drawer, one should estimate how much damage a file drawer problem could do to the conclusions of a meta-analysis. One can calculate the number of studies with null results that would have to be in the file drawers to bring the overall probability of a Type I error up to a chosen level of significance, for instance p=.05. If the addition of just a few more null results would render the overall result nonsignificant, the finding is not resistant to the file drawer threat (Rosenthal, 1995). We used Rosenthal's (1991) formula for the critical value of the fail-safe number: five times the number of studies, plus 10.
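Rosenthal's fail-safe N follows from Stouffer's method of combining one-tailed Z scores: adding N null studies (Z=0) changes the combined Z from ΣZ/√k to ΣZ/√(k + N), so solving ΣZ/√(k + N) = 1.645 gives N = (ΣZ)²/1.645² − k. The sketch below computes that quantity and the 5k + 10 critical value. It implements Rosenthal's classic formula; the "random model" fail-safe number reported by CMA may be computed differently, and the function names are hypothetical.

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N: null studies needed to pull the Stouffer combined
    Z down to z_alpha (one-tailed p = .05 by default)."""
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha**2 - k

def critical_value(k):
    """Rosenthal's tolerance level: 5k + 10."""
    return 5 * k + 10

print(critical_value(50))   # 260
```

With k=50 studies this reproduces the critical value of 260 used in the Results; a fail-safe N well above it is taken as evidence of robustness.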

Outlying effect sizes were identified as those with standardized z values larger than 3.3 or smaller than −3.3 (p<.001, two-tailed; Tabachnick & Fidell, 2001). Once identified, a study with an outlying effect size can either be deleted from the meta-analysis or have its effect size brought within the normal range, for instance by recoding it to equal the next highest result (or, for a low outlier, the next lowest result).
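The screening and recoding rule can be sketched as follows. Assumptions: z values are computed from the mean and sample standard deviation of all effect sizes (outlier included), and each outlier is replaced by the most extreme non-outlying value on the same side. The input values are made up, apart from borrowing the 2.45 and 1.56 figures reported in the Results; the function name is hypothetical.

```python
import statistics

def recode_outliers(ds, z_cut=3.3):
    """Flag |z| > z_cut and recode each outlier to the nearest in-range extreme.

    Returns (recoded effect sizes, list of booleans marking outliers)."""
    mean = statistics.mean(ds)
    sd = statistics.stdev(ds)                       # sample SD, outlier included
    flagged = [abs((d - mean) / sd) > z_cut for d in ds]
    normal = [d for d, f in zip(ds, flagged) if not f]
    high, low = max(normal), min(normal)
    recoded = [(high if d > mean else low) if f else d for d, f in zip(ds, flagged)]
    return recoded, flagged

ds = [0.4, 0.6, 0.8, 1.0] * 5 + [1.56, 2.45]        # illustrative values only
recoded, flagged = recode_outliers(ds)
print(recoded[-1], sum(flagged))                    # 1.56 1
```

Recoding rather than deleting keeps the study's sample in the analysis while limiting the leverage of its extreme value on the pooled estimate.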

Results

The 50 studies included in this meta-analytic review reported data on N=4814 subjects (n=2316 delinquents and n=2498 nondelinquents). A list of all studies with effect sizes and important moderators is provided in Table 1.

Table 1 Study characteristics

The overall distribution of effect sizes is presented in Table 2. Campagna and Harter (1975) was identified as a study with an outlying effect size (d=2.45). We decided, however, to keep this study in the meta-analysis and to recode its effect size to the next largest result in the normal range, that of Al-Falaij (1991): d=1.56. To interpret the magnitude of effect sizes, we used the generally accepted conventions formulated by Cohen (1988), under which d=.20, d=.50, and d=.80 indicate small, medium, and large effects, respectively.

Table 2 Stem-and-leaf plot of effect sizes (d) for differences in moral judgment between delinquent and nondelinquent adolescents
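A stem-and-leaf display like the one in Table 2 can be produced with a few lines of code. In this hypothetical sketch, stems are tenths and leaves are hundredths (d=.76 becomes stem 0.7, leaf 6), and only non-negative effect sizes are handled. The study-level values behind Table 2 are not reproduced here, so any input is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(ds):
    """Text stem-and-leaf plot: stems are tenths, leaves are hundredths."""
    rows = defaultdict(list)
    for d in ds:
        if d < 0:
            raise ValueError("this sketch handles non-negative effect sizes only")
        hundredths = round(d * 100)
        rows[hundredths // 10].append(hundredths % 10)
    lines = []
    for stem in range(min(rows), max(rows) + 1):   # include empty stems for a true picture
        leaves = "".join(str(leaf) for leaf in sorted(rows.get(stem, [])))
        lines.append(f"{stem / 10:.1f} | {leaves}")
    return "\n".join(lines)

print(stem_and_leaf([0.76, 0.72, 0.5, 1.56]))     # made-up values
```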

The meta-analysis yielded a large and significant overall effect size of d=.76, p<.001 (CI .63<d<.88, random model), indicating lower moral judgment scores for delinquents than for nondelinquents. The random-model fail-safe number was 2922, meaning that more than 2922 studies showing no effect would have to be tucked away in file drawers to reduce the overall effect size to nonsignificance. Because this fail-safe number far exceeded Rosenthal's critical value (50×5+10=260), the result appears robust to the file drawer problem.

As the set of effect sizes proved to be heterogeneous, Q(49)=184.19, p<.001, it was important to conduct moderator analyses to explain the variation in effect sizes among studies (Mullen, 1989). Moderator tests were performed using the categorical testing procedures described by Lipsey and Wilson (2000), which are analogous to analysis of variance (ANOVA). Categorical testing yields two homogeneity estimates, a within-groups Q (Qw) and a between-groups Q (Qb). A significant Qw indicates that the effect sizes within each moderator category are heterogeneous; a significant Qb indicates that the subgroups of effect sizes differ significantly from one another.
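The partition of heterogeneity into Qb and Qw can be sketched with inverse-variance (fixed-effect) weights, in the style of the analog to the ANOVA. Qb compares each subgroup's weighted mean with the grand mean, Qw sums the Q statistics computed within each subgroup, and the two add up to the total Q. Names and inputs are hypothetical; mixed-effects versions of the test, which add a between-study variance component to the weights, would give different values.

```python
def q_partition(ds, variances, groups):
    """Return (Qb, Qw, Q_total, df_between) for a categorical moderator."""
    def weighted_mean(idx):
        w = [1 / variances[i] for i in idx]
        return sum(wi * ds[i] for wi, i in zip(w, idx)) / sum(w), sum(w)

    def q_of(idx):
        m, _ = weighted_mean(idx)
        return sum((ds[i] - m) ** 2 / variances[i] for i in idx)

    all_idx = list(range(len(ds)))
    grand, _ = weighted_mean(all_idx)
    labels = sorted(set(groups))
    q_b = q_w = 0.0
    for g in labels:
        idx = [i for i in all_idx if groups[i] == g]
        m, w_sum = weighted_mean(idx)
        q_b += w_sum * (m - grand) ** 2     # between-groups component
        q_w += q_of(idx)                    # within-groups component
    return q_b, q_w, q_of(all_idx), len(labels) - 1

# Made-up example: two moderator categories, equal variances.
print(q_partition([0.2, 0.4, 0.9, 1.1], [0.04] * 4, ["a", "a", "b", "b"]))
```

Qb is then referred to a chi-square distribution with (number of categories − 1) degrees of freedom, which is the Q(1) and Q(2) reported for each moderator below.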

As can be derived from Table 3, a series of ANOVAs yielded 10 significant moderators. A somewhat larger effect was found for published studies (d=.79) than for unpublished studies (d=.70): Q(1)=8.20, p<.01. The gender effect indicated larger effect sizes in male samples (d=.82) than in female or mixed samples (d=.64): Q(1)=10.06, p<.01. Late adolescence (d=.78) was associated with a larger effect size than early or middle adolescence (d=.65): Q(1)=3.91, p<.05. The ANOVA for intelligence showed that effect sizes were large (d=.96) for comparisons including delinquents with low intelligence and medium (d=.50) for comparisons including delinquents with average intelligence: Q(1)=32.45, p<.001. The ANOVA for psychopathy showed that differences in moral judgment between delinquents and nondelinquents were very large for studies including delinquents with psychopathic disorder (d=1.16), in contrast with studies that did not include such delinquents (d=.72): Q(1)=9.57, p<.01. The effect of institutionalization indicated that studies with incarcerated delinquents showed larger differences in moral judgment than studies with delinquents who were not incarcerated: d=.82 (large) versus d=.52 (medium), respectively: Q(1)=27.06, p<.001. The institutionalization effect was largest when delinquents were incarcerated for a long period, that is, 18 months or more (d=1.34); when they were incarcerated for a relatively short period, namely, 6 months or less, the effect size was d=.71: Q(1)=10.19, p<.01.

Blind scoring yielded a smaller effect size (d=.78) than non-blind scoring procedures (d=.92): Q(1)=3.90, p<.05. The effect of assessment method indicated a larger effect size for production measures than for recognition measures or a combination of both: production d=.86, recognition d=.60, and combined d=.71: Q(2)=18.18, p<.001. Finally, the dilemma-free probing method (e.g., SRM-SF) yielded somewhat larger differences in moral judgment between delinquents and nondelinquents than the standard dilemma method (e.g., MJI or DIT): d=.78 versus d=.74, respectively: Q(1)=4.55, p<.05.

In the regression analysis, the moderators identified as significant in the categorical testing procedure were entered as predictors to test whether they accounted for unique variance in the unbiased effect size d. Missing values were imputed using the expectation-maximization (EM) algorithm (Bernaards & Sijtsma, 1999), which is considered adequate when values are missing at random. Period of incarceration was not entered into the regression analysis because of an extremely high percentage of missing values (80%). Scoring procedure (blind or not) was also excluded, because coding its 40% missing values as non-applicable would have caused contamination problems: this category would coincide with recognition measures (assessment method). Finally, assessment method of moral judgment (production, recognition, and combined measures) was transformed into a set of two dummy variables.
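Dummy coding the three-level assessment method can be sketched as below. We assume production measures as the reference category; the text does not say which category was omitted. Because a study falls in only one category, the two dummies are necessarily negatively correlated, and the correlation approaches −1 as the reference category becomes rare. This is one way a strong negative correlation between such dummies can arise. Function names and data are hypothetical.

```python
import math

def dummy_code(methods, reference="production"):
    """Two 0/1 indicator variables for a three-level factor (reference omitted)."""
    levels = [m for m in ("production", "recognition", "combined") if m != reference]
    return {level: [1 if m == level else 0 for m in methods] for level in levels}

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

methods = ["production"] * 2 + ["recognition"] * 4 + ["combined"] * 4   # made-up
dummies = dummy_code(methods)
print(round(pearson(dummies["recognition"], dummies["combined"]), 3))  # -0.667
```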

Table 3 ANOVAs for studies comparing juvenile delinquents to nondelinquent youth

Preliminary analyses showed that the two dummy variables for assessment method were highly correlated (r=−.84). Apart from this, most correlations between moderators were small, with a few moderate exceptions (range: .01<r<.38). Nine predictors (two dummy variables for assessment method, probing method, publication status, gender, age, psychopathy, institutionalization, and intelligence) were entered into the regression analysis, yielding a significant regression equation, Q(9,40)=21.42, p<.05. The moderators accounted for 30% of the variance in study effect sizes. Two variables emerged as significant predictors: psychopathy (b=.34, p<.05) and institutionalization (b=.35, p<.05). We checked for multicollinearity by adding error variance to each of the nine predictors and repeating the regression analysis; the rationale is that under multicollinearity, even small changes in the data can produce substantial changes in the estimated parameters. The second regression analysis yielded only marginally different unstandardized regression coefficients, indicating that multicollinearity was unlikely to threaten the results.
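The regression step and the perturbation check for multicollinearity can be sketched with inverse-variance weighted least squares. This is a hypothetical illustration: the noise level (`noise_sd`), the seed, and the function names are our assumptions, since the text does not state how much error variance was added, and a random-effects meta-regression would add tau² to each study's variance before weighting. The check refits the model after jittering every predictor and compares the unstandardized coefficients; large swings would signal multicollinearity.

```python
import random

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def wls(X, y, w):
    """Weighted least squares with an intercept: solves (X'WX) b = X'Wy."""
    rows = [[1.0] + list(x) for x in X]
    p, k = len(rows[0]), len(y)
    xtwx = [[sum(w[i] * rows[i][a] * rows[i][c] for i in range(k)) for c in range(p)]
            for a in range(p)]
    xtwy = [sum(w[i] * rows[i][a] * y[i] for i in range(k)) for a in range(p)]
    return solve(xtwx, xtwy)

def perturbation_check(X, y, w, noise_sd=0.01, seed=1):
    """Refit after adding small Gaussian noise to every predictor value."""
    rng = random.Random(seed)
    base = wls(X, y, w)
    noisy_X = [[v + rng.gauss(0, noise_sd) for v in x] for x in X]
    return base, wls(noisy_X, y, w)

# Made-up data: two binary moderators, weights standing in for inverse variances.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (1, 0), (0, 1)]
y = [0.2 + 0.3 * a + 0.35 * b for a, b in X]
base, noisy = perturbation_check(X, y, [1, 2, 1, 3, 1, 2])
print([round(v, 3) for v in base], [round(v, 3) for v in noisy])
```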

Discussion

The first purpose of this study was to replicate Nelson et al.'s (1990) meta-analysis and to test the hypothesis of moral judgment immaturity in juvenile delinquents on a more comprehensive set of data. The second purpose was to identify factors that could affect the relation between moral judgment and juvenile delinquency. By finding 35 additional studies, we more than tripled the 15 studies used by Nelson et al. The present study replicates Nelson et al.'s finding that the moral judgment of juvenile delinquents is substantially lower than that of nondelinquents, with a large overall effect size of d=.76.

Moderator analyses showed that effect sizes were large for comparisons involving male offenders, late adolescents (15 years of age and older), delinquents with low intelligence, and incarcerated delinquents. The largest effect sizes were found for comparisons involving delinquents incarcerated for long periods and juvenile delinquents with psychopathic disorder. Production measures (e.g., SRM-SF, MJI) yielded larger effect sizes than recognition measures (e.g., DIT, SROM-SF); dilemma-free assessment methods and non-blind scoring procedures also yielded relatively large effect sizes, although dilemma-based assessment yielded a fairly large effect size as well. Effect sizes were medium for comparisons involving delinquents with average intelligence, non-incarcerated delinquents, female offenders, and early and middle adolescents. Type of delinquency and cultural background did not moderate the relation between moral judgment and juvenile offending. We conclude that moral judgment is strongly associated with juvenile delinquency, even after controlling for socioeconomic status, culture, gender, age, and intelligence.

As the effect sizes in nearly all subgroups proved to be considerable, the main effect appears to be quite robust. Although the relation between such molar and heterogeneous variables as moral judgment and delinquency is complex, and moderators complicate it further, our meta-analysis shows that the association between moral judgment and juvenile delinquency is substantial and not merely attributable to methodological variations or sample characteristics. Notwithstanding the robustness of this finding, some conditions, such as institutionalization and psychopathy, may further compound the problems of juvenile delinquents.

Two moderators were identified as unique: institutionalization and psychopathy. Institutionalization was not coded in Nelson et al.'s meta-analysis, and studies including delinquents with psychopathic disorder were explicitly excluded from it. Although Nelson et al. did leave out Fodor (1973), they nevertheless included other studies whose samples (partly) contained delinquents with psychopathic disorder, such as Campagna and Harter (1975) and Jurkovic (1975). The present meta-analysis revealed that studies including delinquents with psychopathic disorder showed a larger effect size than studies that did not.

Juvenile delinquents are often treated as a homogeneous group, but research shows that psychopathy is common in groups of institutionalized delinquents (Chandler & Moran, 1990; Jurkovic & Prentice, 1977; Lee & Prentice, 1988). It is extremely difficult to select groups of delinquents containing no trace of psychopathy, and it is therefore not realistic to leave out studies with delinquents showing psychopathic disorder. Because psychopathy may be present to some degree in any group of delinquents, it may account for part of the relation between moral judgment and delinquency. Further research is needed on the relation between psychopathy and moral judgment in juvenile delinquents.

We found a large effect size associated with incarceration, even after controlling for age, gender, type of delinquency, and psychopathy. Studies that included incarcerated delinquents showed a larger difference in moral judgment than did studies with delinquents who were not incarcerated, and the effect size was extremely large for delinquents incarcerated for a relatively long period (d=1.34). Lengthy incarcerations are associated with offense severity and psychopathy, but apart from such person variables, an institutionalization effect is also possible: young people locked up in institutions for long periods may not develop their moral judgment as much as they would in the outside world. Incarcerated juvenile delinquents stay in highly compromised settings where opportunities for self-determination and social perspective-taking, which are thought to promote moral judgment competency, are scarce (Lowenkamp & Latessa, 2005; McGuire, 1995). This lack of social developmental “exercise” may retard the development of moral judgment.

Incarcerated delinquents tend to be exposed to stages of moral judgment reflecting highly antisocial group norms. Within a group of delinquents, the types of offense committed can differ enormously. One cannot for a moment consider a boy stealing candy from a local store to be from the same population as a murderer. In many studies, however, delinquency ranged from petty crime to violent and serious crime, while the subjects were drawn from the same institution, and possibly even from the same treatment unit. The fact that delinquents guilty of petty crime often reside in the same institutions as delinquents having committed serious and violent crimes may well cause the petty criminal to be influenced in his (moral) thinking by the more serious offender, which then may account for part of the institutionalization effect.

Some limitations of this study should be mentioned. The overall effect size for differences in moral judgment between delinquents and nondelinquents was based on a heterogeneous set of effect sizes. Homogeneity may not be a realistic expectation, however, because juvenile delinquents form a rather heterogeneous population. Still, heterogeneity of effect sizes limits the generalization of our findings to the entire population of juvenile delinquents. At a minimum, our findings need to be qualified to the extent that psychopathy and institutionalization may explain differences in moral judgment between delinquents and nondelinquents.

Publication status proved to be a significant, though not unique, moderator, indicating somewhat larger effect sizes for published than for unpublished studies. Rosenthal (1991), however, found only small differences in combined effect sizes between meta-analyses that included so-called “grey” publications—such as unpublished dissertations, research reports and papers—and meta-analyses that did not, with dissertations having the largest impact. Dissertations generally meet scientific standards and show effects that are one fifth of a standard deviation smaller than the effects of published studies (Rosenthal, 1991). Therefore, it is important to include dissertations in meta-analyses. This meta-analytic study is based on 33 published and 17 unpublished studies, including 8 dissertations.

It should be noted that all studies included in the meta-analysis were cross-sectional, which makes it difficult to establish a causal link between moral judgment and delinquency. Krebs and Denton (2005) contend that people invoke moral judgment afterwards to justify what they have done. They claim that moral judgment stages are only weakly associated with behavior, and may be used in a flexible way in accordance with different social contexts or purposes. Flexibility of stage use, for example, was found in a study by Hains (1984), who measured moral judgment in delinquents and nondelinquents first answering as themselves, and subsequently answering as if they were policemen. Both groups scored significantly higher answering as policemen.

Krebs and Denton (2005) argue that optimal moral judgment does not reflect adequacy but usefulness, that is, “creating the conditions that enable people to achieve their goals and advance their interests in cooperative ways” (p. 646). In fact, different kinds of contexts pull for different forms of moral judgment: the business world is guided (in their view) by a stage 2 moral order based on instrumental exchange (but cf. Damon, 2004), marriage is guided by a stage 3 moral order based on mutuality, and the legal system is guided by a stage 4 moral order based on maintaining society. In this view, people move in and out of moral orders, not stages of moral development.

Although Krebs and Denton cite some interesting studies with examples of flexible stage use, Gibbs (in press) and Pizarro and Bloom (2003) provide evidence that higher moral judgment competence increases resistance to low-stage pulling contexts or sets the stage for moral action. In their famous study, Oliner and Oliner (1988) found that rescuers of Jews in Nazi Europe were motivated by strong moral values learned from their parents. It should be noted that moral judgment can even impact morally relevant behavior in sudden situations if an issue, such as social equality, has been sufficiently considered (Moskowitz, Gollwitzer, Wasel, & Schaal, 1999). Results from longitudinal studies linking moral judgment with delinquent behavior, however, should be interpreted with caution (Colby et al., 1983; Walker, 1989). For example, Raaijmakers et al. (2005) and Menard and Huizinga (1994) found moral judgment and delinquency to be reciprocally related.

Most pertinent to the question of a causal relation between moral judgment and juvenile delinquency are intervention studies that target moral judgment and recidivism, showing that induced changes in moral judgment affect delinquent behavior (e.g., Arbuthnot & Gordon, 1988). Although Leeman, Gibbs, and Fuller (1993) found no immediate intervention effects on moral judgment, moral judgment gains one year after release were significantly associated with decreased recidivism. Finally, an extensive meta-analysis of intervention studies aimed at increasing moral judgment competence in delinquent and nondelinquent (pre)adolescents and adults yielded consistently medium to large effect sizes (Lind, 2002). This finding is in line with a recent meta-analysis by Wilson, Bouffard, and Mackenzie (2005) showing the positive impact of cognitive-behavioral programs for offenders. These programs are based on the idea that offenders act like criminals because they think like criminals. In particular, reductions in recidivism were observed for interventions focusing on moral functioning, including moral judgment, and for cognitive-restructuring programs. Therefore, if interventions for offenders are directed at moral functioning, a focus on moral cognition seems warranted. Notably, the overall effect size estimated in the present meta-analysis of moral judgment and juvenile delinquency is larger than the effect sizes for both cognitive (d=.48) and affective empathy (d=.11) reported in a recent meta-analysis of empathy and offending by Jolliffe and Farrington (2004), and larger than the medium effect size for the relation between cognitive distortion (hostile attribution of intent) and aggressive behavior (see the meta-analysis by Orobio de Castro, Veerman, Koops, Bosch, & Monschouwer, 2002).

This comprehensive meta-analysis reveals a significantly lower stage of moral judgment for juvenile delinquents compared to that of juvenile nondelinquents. By late adolescence, the superficiality and self-centeredness of immature moral judgment may become criminogenic. Furthermore, the present study has shown institutionalization and psychopathy to be associated with the outcome of studies finding the moral judgment of juvenile delinquents delayed relative to that of nondelinquents. Research into these two moderators would be a first necessary step towards greater investigation of the criminogenic processes associated with delay in moral judgment development of juvenile delinquents. Offender psychopathy and institutionalization effects may also moderate the impact of moral judgment remedial programs (e.g., Potter, Gibbs, & Goldstein, 2001), and hence should be taken into account in analyses of treatment outcome.