Callous-unemotional (C/U) traits refer to specific deficiencies in affective experience (e.g., absence of guilt, constricted display of emotion) and in interpersonal style (e.g., failure to show empathy, callous use of others for one’s own gain) (Cooke et al. 2006; Fanti et al. 2009; Frick and White 2008; Kimonis et al. 2008; Munoz et al. 2011). C/U traits constitute one of at least three dimensions that consistently emerge in conceptualizations of adult (Cleckley 1976; Hare 1993) and adolescent psychopathy (Andershed et al. 2002; Forth et al. 2003; Lynam 1997), whether assessed using teacher, parent, self-report, or clinical ratings (Frick and White 2008). Furthermore, a number of studies, including longitudinal studies, indicate that C/U traits are relatively stable from late childhood to early adolescence when measured by self- or parent report (e.g., Munoz and Frick 2007; Obradović et al. 2007).

C/U traits are important for designating a distinct subgroup of antisocial and delinquent adolescents and preadolescents (Essau et al. 2006; see Frick 2006; Frick and Marsee 2006; White and Frick 2010 as cited in Salekin and Lynam 2010, for reviews) whose antisocial behavior arises from causal processes that differ from those characteristic of other antisocial youth (Kimonis et al. 2008). These individuals show a more severe, stable, and aggressive pattern of behavior (Kahn et al. 2012) that is more premeditated and instrumental in nature (Pardini et al. 2003). They are also at increased risk for early onset delinquency and later antisocial behavior (Frick and White 2008), and they tend to respond more poorly to treatment (Frick and Dickens 2006).

Given that C/U traits are one of the features indicative of adult psychopathy (Cooke and Michie 2001; White and Frick 2010 as cited in Salekin and Lynam 2010) and are more strongly associated with the childhood-onset trajectory of severe conduct problems, identifying children with elevated C/U traits before their conduct problems and aggression become too severe is critical. This is all the more important given evidence that levels of C/U traits are malleable during adolescence (Fontaine et al. 2010). Distinguishing youth with childhood-onset severe conduct problems from those whose problems begin in adolescence could help clarify the developmental processes involved (Roose et al. 2011) and allow for preventive intervention (Frick and White 2008).

Given the importance of C/U traits for understanding antisocial children and adolescents, and for differentiating within these groups, there is a need for an efficient, reliable, and valid measure of these traits (see Essau et al. 2006; Kimonis et al. 2008; Roose et al. 2011). The two measures used most widely in past research, the Antisocial Process Screening Device (APSD: Frick and Hare 2001) and the PCL-YV (Forth et al. 2003), have a number of limitations in their assessment of C/U traits. First, both assess several dimensions of psychopathy, so the C/U dimension is only one of a number of subscales. Frick and White (2008) have argued that the burgeoning research on C/U traits clearly demonstrates the need to develop assessments that separate these traits from other antisocial dimensions. Second, the APSD and PCL-YV each contain only a small number of items measuring C/U (APSD n = 6; PCL-YV n = 4), which in the case of the APSD probably contributes to the moderate internal consistency reported in many studies (Essau et al. 2006). Third, all but one of the APSD items are positively worded, raising the possibility that ratings could be influenced by a specific response set. Finally, the PCL-YV, which has primarily been used with incarcerated adolescents, uses a 60–90 min interview format and requires a review of the respondent’s offence records (Kimonis et al. 2008).

In an attempt to overcome the limitations of the APSD and PCL-YV, the Inventory of Callous-Unemotional Traits (ICU: Frick 2004) was developed. This 24-item self-report measure assesses three aspects of C/U traits, Uncaring, Callousness, and Unemotional, using a four-point Likert scale (0 = Not at all true, 1 = Somewhat true, 2 = Very true, 3 = Definitely true). Three factors (i.e., Uncaring, Callousness, and Unemotional) loading onto a higher-order C/U dimension have consistently emerged across a range of samples: 13 to 18 year old German adolescents (Essau et al. 2006); 12 to 20 year old American adolescent offenders (with 22 of the 24 ICU items) (Kimonis et al. 2008); 12 to 18 year old Greek adolescents (Fanti et al. 2009); and 14 to 20 year old Belgian adolescents and young adults (Roose et al. 2011).

The content of the ICU is based on the six-item C/U scale of the APSD (Frick and Hare 2001). Specifically, the ICU was built from the four APSD items that loaded consistently on its C/U scale in both clinical and community samples (i.e., “Feels bad or guilty when he/she does something wrong”, “Does not show feelings or emotions”, “Is concerned about the feelings of others”, and “Is concerned about how well he/she does at school or work”) (see Frick et al. 2000). For each of these four original items, three positively worded and three negatively worded items were developed, yielding the current 24 items, 12 of which are reverse scored. Youth Self-Report, Parent Report, Teacher Report, Parent Report (Preschool), and Teacher Report (Preschool) versions of the ICU are currently available. Internal reliabilities have ranged from .77 to .81, suggesting satisfactory reliability. Thus, there is a growing body of evidence that the ICU is a promising assessment instrument.
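To make the reverse-scoring step concrete, the following Python sketch recodes responses on the 0–3 scale so that higher scores always indicate more C/U traits. It is a minimal illustration only: the item numbers treated as reverse-keyed in the example are hypothetical, because the 12 reverse-scored ICU items are not listed here, and the function names are our own.

```python
# Minimal sketch: reverse-scoring ICU-style items on the 0-3 response scale.
# The reverse-keyed item numbers used in the example are HYPOTHETICAL; the
# published ICU scoring key defines which 12 items are actually reversed.

MAX_SCORE = 3  # "Definitely true"


def reverse_score(response: int) -> int:
    """Map 0<->3 and 1<->2 for a reverse-keyed item."""
    if not 0 <= response <= MAX_SCORE:
        raise ValueError(f"response {response} outside 0-{MAX_SCORE}")
    return MAX_SCORE - response


def score_items(responses: dict, reverse_keyed: set) -> dict:
    """Return item scores with reverse-keyed items recoded so that higher
    scores always indicate more C/U traits."""
    return {
        item: reverse_score(r) if item in reverse_keyed else r
        for item, r in responses.items()
    }


# Hypothetical example: items 1 and 3 are treated as reverse-keyed.
example = score_items({1: 3, 2: 1, 3: 0}, reverse_keyed={1, 3})
print(example)  # {1: 0, 2: 1, 3: 3}
```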

To date, however, there appear to have been comparatively few applications of the ICU with younger children, with most work instead using the APSD. Validation of the ICU for use with younger children is therefore necessary if preventive interventions are to be developed.

The current study tested the fit of the factor structure previously established in samples of European and American adolescents. We also assessed the extent to which different hypothesised measurement models account for responses, and the extent to which the measure is equivalent across males and females and across younger and older children. Finally, we report the effects of these two variables (gender, school-stage) on the measure’s subscale scores.

Method

Preliminary Validation of the ICU

Our first step was a preliminary examination of the Inventory of Callous-Unemotional Traits (Frick 2004) for use with younger children. To check its face and content appropriateness, the ICU was presented to 20 postgraduate students in the final year of a Master of Educational Psychology professional training programme and to three registered psychologists (5 to 15 years’ experience) employed in behavior centres catering for children aged 7 to 13 years with behavioral problems. Sixteen of these 23 reviewers considered the wording of two items inappropriate, and these items were reworded. Specifically, “I do not feel remorseful when I do something wrong” was changed to “I do not feel sorry when I do something wrong” to avoid possible difficulties with the word “remorseful”, and “I care how well I do at school and work” was changed to “I care how well I do at school” because young children rarely, if ever, work.

The four-point response format, anchored with the descriptors Not at all true (scored 0), Somewhat true (= 1), Very true (= 2), and Definitely true (= 3), was judged appropriate.

Thirty children (15 males and 15 females) aged 7 to 13 years (M = 11.8 years) participated. They were randomly selected from Grades three, four, five, six, and seven in two primary schools (15 per school; 3 from each grade). The schools were in the metropolitan area of Perth, the capital city of Western Australia, and were selected to represent low and middle socioeconomic status areas as determined by an index defined at the postcode level by the Australian Bureau of Statistics (2003).

Prior to the research being initiated, permission was obtained from the Human Research Ethics Committee of the administering institution. Following approval, the principals of the schools were invited to randomly select children from the identified grade levels and obtain parental permission for these children to participate in the preliminary validation study. All participants completed the ICU in their regular classrooms.

The readability of the 24-item ICU was measured using the Flesch-Kincaid Grade Level (i.e., the number of years of education required to understand a passage) and the Flesch Reading Ease score (i.e., how easy a passage is to read; higher scores indicate easier text) (see Flesch 1948; Microsoft Corporation 2003). The ICU was judged easy to read (Reading Ease = 85.9; scores of 80 and above indicate easy to very easy text) and comprehensible for Australian school students enrolled in Grade 3 and above (Flesch-Kincaid Grade Level; age 7 years and above).
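Both readability indices are simple functions of average sentence length and average syllables per word. The sketch below implements the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas from pre-counted totals. The counts in the example are invented, and word-processing software applies its own syllable-counting heuristics, so values computed this way may differ slightly from those reported by software.

```python
# Sketch of the standard Flesch Reading Ease and Flesch-Kincaid Grade Level
# formulas, computed from pre-counted words, sentences and syllables.
# Syllable counting is left to the caller (software uses its own heuristics).


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher scores indicate easier text; 80+ is read as easy to very easy."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Approximate (U.S.) school grade needed to understand the text."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Invented counts for a short questionnaire-like passage (not the ICU itself).
ease = flesch_reading_ease(words=100, sentences=10, syllables=130)
grade = flesch_kincaid_grade(words=100, sentences=10, syllables=130)
print(f"Reading Ease = {ease:.1f}, Grade Level = {grade:.1f}")
```

Short sentences built from one- and two-syllable words, as in most questionnaire items, push Reading Ease up and the grade level down.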

Item analyses were conducted using Kline’s (2000) dual criteria: (a) a q-value between .2 and .8 for item affectivity and (b) an item-total correlation greater than .3 for item discrimination. Four items failed to meet these criteria and were removed: “It is easy for others to tell how I am feeling”, “I do not like to put the time into doing things well”, “What I think is right and wrong is different from what other people think”, and “I do not let my feelings control me”. This reduced the scale to 20 items (Cronbach’s alpha = .92).
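Kline’s dual criteria and the alpha coefficient can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: because the q-value is not defined here for polytomous items, the sketch assumes that item facility is the mean item score divided by the maximum possible score (3), and it correlates each item with the uncorrected total score, as the text describes. The toy responses are invented.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """items: list of item columns, each a list of scores (one per respondent)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    sum_item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


def flag_items(items, max_score=3, facility_bounds=(0.2, 0.8), min_r=0.3):
    """Return indices of items failing either criterion.
    ASSUMPTION: facility of a 0-3 item = mean score / max_score."""
    totals = [sum(row) for row in zip(*items)]
    low, high = facility_bounds
    flagged = []
    for idx, col in enumerate(items):
        facility = mean(col) / max_score
        discrimination = pearson(col, totals)  # item vs. total score
        if not (low <= facility <= high) or discrimination <= min_r:
            flagged.append(idx)
    return flagged


# Invented responses: 3 items (columns) x 5 respondents.
items = [
    [0, 1, 2, 3, 2],
    [1, 1, 2, 3, 3],
    [3, 3, 3, 3, 3],  # everyone answers "Definitely true": no discrimination
]
print(flag_items(items))  # [2]
kept = [col for i, col in enumerate(items) if i not in flag_items(items)]
print(round(cronbach_alpha(kept), 2))  # 0.93
```

A common, stricter variant correlates each item with the total of the remaining items (the corrected item-total correlation).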

Confirmatory Factor Analysis

Participants

Two hundred and sixty-eight children (138 males, 115 females, 15 gender not reported) randomly selected from Grades three (age 7–8 years) to seven (age 12–13 years) in six separate primary schools in the metropolitan area of Perth, Western Australia, participated. Of these, 39 were in Grade three, 44 in Grade four, 63 in Grade five, 47 in Grade six, and 47 in Grade seven; the grade levels of 28 children were not reported. Ages ranged from 7.6 to 12.8 years. The schools (none of which took part in the preliminary validation study) were located in low, low-middle, and high socioeconomic status areas as determined by an index defined at the postcode level by the Australian Bureau of Statistics (2003).

For purposes of data analysis, the sample was classified as lower primary school (n = 146: Grades three to five; 7–10 years of age) and upper primary school (n = 96: Grades six and seven; 11–13 years of age). Although specific grade levels were not recorded for 28 children, some of these children could still be classified as lower or upper primary school.

Measure

The 20-item ICU administered to participants comprised eight Callousness items, four Unemotional items, and eight Uncaring items. Participants rated each item on a four-point Likert scale anchored “Not at all true” (scored 0), “Somewhat true” (= 1), “Very true” (= 2), and “Definitely true” (= 3). The development and psychometric properties of the ICU are described in the introduction.

Procedure

Approval for the research was obtained from the Human Research Ethics Committee of the administering institution. To obtain a representative sample of Western Australian primary school students, six state primary schools, two from each of low, low-middle, and high SES areas in the Perth metropolitan region, were randomly selected. The principals of all six schools were approached for permission to undertake the research, and three (one from each SES area) agreed to participate. An information sheet explaining the purpose and nature of the study, an assurance of confidentiality, and a consent form were then sent home to the parents of all students in a number of randomly selected classes in each participating school. Both students and their parents were required to consent to participation. The overall response rate was 63 %, which is comparable to rates obtained in other school-based studies in Australia and elsewhere that have sought information on sensitive subject matter, for example (response rate in parentheses) psychopathic traits (53 %: Houghton et al. 2012; 53 %: Marsee et al. 2005), fire setting (32.5–74.8 %: Dadds and Fraser 2006), gambling (46 %: Raisamo et al. 2012; 79 %: Jackson et al. 2008), drug use (44.5 %: Redonnet et al. 2012), and bullying (58 %: Nathan et al. 2011).

The 20-item ICU was then administered by a doctoral-level researcher with full psychologist board registration. Before administration, all participants were told about the nature of the study and again assured of the confidentiality and anonymity of their responses. Participants were asked to complete the ICU without discussing it with peers and to raise their hand if they had any problems with the questions so that the researcher could assist. Each administration took approximately 20 min.

Data Analysis

First, we conducted a confirmatory factor analysis of the three-factor ICU model using AMOS 19.0. The three latent variables, each representing a factor, were modelled as distinct but correlated. We used four indices to assess the goodness of fit of the first-order measurement model: the comparative fit index (CFI; above .95 indicates good fit, above .90 adequate fit), the root mean square error of approximation (RMSEA; .05 or less indicates good fit, .08 or less adequate fit), the ratio of chi-square to degrees of freedom (CMIN/DF; values below 2–3 indicate good fit: Carmines and McIver 1981), and the chi-square statistic (non-significant values indicate good fit). The equivalence of the measurement model across gender and across school-stage was then evaluated. Finally, differences in mean factor scores across gender and age were examined using ANOVA.
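Two of these indices can be recomputed directly from the model chi-square. The Python sketch below uses the standard formulas CMIN/DF = χ2/df and RMSEA = sqrt(max(χ2 − df, 0) / (df(N − 1))). It also defines the CFI, which additionally requires the chi-square of the baseline (independence) model; that value is not reported here, so the CFI is not evaluated in the example. Taking N = 268 (the full sample, an assumption because cases with missing data may have been handled differently) reproduces the reported CMIN/DF of 2.28 and RMSEA of .07 for the three-factor model. Some software uses N rather than N − 1 in the RMSEA denominator.

```python
import math


def cmin_df(chi2: float, df: int) -> float:
    """Ratio of the model chi-square to its degrees of freedom."""
    return chi2 / df


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of the root mean square error of approximation."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index relative to a baseline (independence) model."""
    model_misfit = max(chi2 - df, 0.0)
    base_misfit = max(chi2_base - df_base, model_misfit)
    return 1.0 if base_misfit == 0 else 1.0 - model_misfit / base_misfit


# Three-factor model from the Results; N = 268 is an assumption.
print(round(cmin_df(380.09, 167), 2))     # 2.28
print(round(rmsea(380.09, 167, 268), 2))  # 0.07
```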

Results

Fit indices provided mixed support for the three-factor model. The χ2 test [χ2 (df = 167) = 380.09, p < .001] and the CFI (.85) both indicated poor fit of the data to the hypothesised model, but the CMIN/DF ratio (2.28) and the RMSEA (.07, 90 % confidence interval [CI]: .06, .08) indicated acceptable fit. One item had a factor loading below .3 (“I do not show my emotions to others”, loading = .27), and the internal reliability of its factor (Unemotional, α = .47) was poor. Removing this item did not improve the scale’s reliability but instead reduced it further (α = .44). Because other measurement models of these traits (e.g., the CAIBSI: Houghton et al. 2012) cluster callous and unemotional traits together, we tested a model in which the four Unemotional items loaded on the ICU Callousness factor. However, this did not improve fit, χ2 (df = 169) = 417.43, p < .001, CFI = .82, CMIN/DF = 2.47, RMSEA = .07, 90 % CI: .07, .08, and the four Unemotional items all had very low loadings (.08 to .23), so this model was not accepted as a viable alternative. Consequently, we deleted the Unemotional scale and reassessed the model. This marginally decreased fit: χ2 (df = 103) = 280.32, p < .001, CFI = .86, CMIN/DF = 2.72, RMSEA = .08, 90 % CI: .07, .09.

To further improve fit, we reviewed the item wording to evaluate whether items were similar enough to justify correlating their error terms. On this basis, we correlated the errors of eight item pairs: “I do not care if I get into trouble” and “I do not feel sorry when I do something wrong”; “I do not care if I get into trouble” and “I do not care about being on time”; “The feelings of others are not important to me” and “I do not care who I hurt to get what I want”; “I do not care who I hurt to get what I want” and “I seem very cold and uncaring to others”; “I apologise (say I am sorry) to people I hurt” and “I try not to hurt others’ feelings”; “I try not to hurt others’ feelings” and “I am concerned about the feelings of others”; “I always try my best” and “I work hard on everything I do”; and “I seem very cold and uncaring to others” and “I apologise (say I am sorry) to people I hurt”. This model achieved satisfactory fit: χ2 (df = 95) = 221.63, p < .001, CFI = .90, CMIN/DF = 2.33, RMSEA = .07, 90 % CI: .06, .08. All factor loadings for this model are shown in Table 1.

Table 1 Factor loadings and factor score weights for the revised ICU

Invariance of the First-Order Measurement Model Across Gender and Age

Invariance was assessed incrementally to examine the equivalence across groups of factor loadings, variances in factor scores, and correlations between latent factor scores. At each step, a model constraining a set of parameters to be equal across the two groups was compared with the preceding, less constrained model. For example, the first comparison was between a model in which factor loadings were constrained to be equal across boys and girls and the unconstrained model, in which they were free to vary. The change in chi-square (∆χ2) was used to assess the relative merits of competing models; a significant ∆χ2 indicates that the less constrained model should be accepted (i.e., that the groups differ on the constrained parameters).
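The significance of each ∆χ2 is the upper-tail probability of a chi-square distribution whose degrees of freedom equal the difference in model degrees of freedom. The dependency-free Python sketch below uses the exact closed-form expressions for integer degrees of freedom (a library routine such as scipy.stats.chi2.sf would serve equally well). Applied to the ∆χ2 values reported in the following sections, it returns p-values that agree with those reported, to within rounding. The function names are our own.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        half = x / 2
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: 2*P(Z > sqrt(x)) + 2*phi(sqrt(x)) * sum_r x^(r-1/2) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))  # equals 2 * P(Z > chi)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal pdf at chi
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2 * density * total


def chi2_difference(chi2_constrained, df_constrained, chi2_free, df_free):
    """Delta chi-square test of a constrained model against a freer model."""
    d_chi2 = chi2_constrained - chi2_free
    d_df = df_constrained - df_free
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


# Gender comparisons reported in the Results (delta chi-square, delta df).
for d_chi2, d_df in [(10.10, 14), (1.03, 2), (9.08, 1)]:
    print(d_df, d_chi2, round(chi2_sf(d_chi2, d_df), 3))
```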

The results of these analyses are summarized in Table 2, and are reported in more detail below.

Table 2 Fit indices for models assessing invariance across gender and age

Gender

The difference between the unconstrained model and the model constraining factor loadings to be equal across boys and girls was non-significant, ∆χ2 (df = 14) = 10.10, p = .755, indicating that boys and girls do not differ in this regard. We then compared this constrained model, which still allowed factor score variances to differ across boys and girls, with one in which those variances were also constrained. Again, the groups did not differ: ∆χ2 (df = 2) = 1.03, p = .598. Finally, we compared this model (constrained factor loadings and factor variances, but unconstrained factor covariance) with one in which the covariance between the two factors was also constrained. This comparison indicated a gender difference: ∆χ2 (df = 1) = 9.08, p = .003. The correlation between the two factors was stronger for boys (r = .88) than for girls (r = .58).

Age

Two age groups were formed: students in Grades 3, 4, and 5 (N = 146) and students in Grades 6 and 7 (N = 94). No other split produced groups large enough for the multiple-groups analyses. The difference between the unconstrained model and the model constraining factor loadings to be equal across younger and older students was non-significant, ∆χ2 (df = 13) = 12.53, p = .485, indicating that these groups do not differ in this regard.

We then compared the model in which all but one of the factor loadings were constrained to be equal across younger and older participants, and which allowed factor score variances to differ, with a model in which those variances were also constrained. The groups did not differ: ∆χ2 (df = 2) = 1.52, p = .467. Finally, we compared this model (all factor loadings but one constrained, factor variances constrained, factor covariance unconstrained) with one in which the covariance between the two factors was also constrained. This comparison indicated a school-stage difference: ∆χ2 (df = 1) = 7.61, p = .006. The correlation between the two factors was weaker among younger (r = .66) than older students (r = .92).

Effects of Gender and Age on Factor Scores

Using the formula W = BS⁻¹, where B is the matrix of covariances between the latent (unobserved) and observed variables and S is the matrix of covariances among the observed variables, AMOS 19.0 calculated factor score weights for each item based on the accepted measurement model. A participant’s score on a factor is obtained by multiplying the participant’s score on each item by that item’s factor score weight and summing these products across items (see Table 1, which also shows the internal reliability of each scale). Mean scores by gender and school-stage are shown in Table 3.
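The regression-method weights W = BS⁻¹ can be illustrated with a small, dependency-free Python sketch. It builds the model-implied covariance matrix S = ΛΦΛ′ + Ψ and the factor-item covariance matrix B = ΦΛ′ from loadings (Λ), factor covariances (Φ), and unique variances (Ψ), then solves for W and forms weighted-sum scores as described above. The parameter values are invented for illustration and are not the estimates in Table 1. For brevity Ψ is assumed diagonal, whereas the accepted model also contains correlated errors, and AMOS may differ in detail (for example, in whether sample or model-implied covariances are used).

```python
def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.
    a: n x n matrix, b: n x m matrix (lists of lists); returns X (n x m)."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def factor_score_weights(loadings, phi, psi):
    """Regression-method weights W = B S^-1 (q factors x p items).
    loadings: p x q; phi: q x q factor covariances;
    psi: length-p unique variances (ASSUMED uncorrelated in this sketch)."""
    p, q = len(loadings), len(phi)
    # B = Phi L': covariances between each factor and each item (q x p).
    b = [[sum(phi[f][g] * loadings[i][g] for g in range(q)) for i in range(p)]
         for f in range(q)]
    # S = L Phi L' + Psi: model-implied item covariance matrix (p x p).
    s = [[sum(loadings[i][f] * b[f][j] for f in range(q)) + (psi[i] if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    # S is symmetric, so W' = S^-1 B'; solve S X = B' and transpose.
    x = solve(s, [list(col) for col in zip(*b)])
    return [list(row) for row in zip(*x)]


def factor_scores(weights, responses):
    """Multiply each item response by its weight and sum, for each factor."""
    return [sum(w * r for w, r in zip(row, responses)) for row in weights]


# Hypothetical one-factor example (values invented, not from Table 1).
w = factor_score_weights([[0.8], [0.6], [0.7]], [[1.0]], [0.36, 0.64, 0.51])
print([round(v, 3) for v in w[0]])
print(factor_scores(w, [2, 1, 3]))
```

The same function handles two correlated factors, as in the accepted model, when given a p × 2 loading matrix and a 2 × 2 Φ.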

Table 3 Means (and standard deviations) by gender and school-stage

To examine the effects of gender (male vs. female) and school-stage (Grades 3, 4, and 5 vs. Grades 6 and 7), two separate two-way between-subjects ANOVAs were conducted, one for each factor score. For Callousness, neither the main effects nor the interaction was significant. For Uncaring, there was a small but significant effect of school-stage, F (1, 205) = 6.13, p = .014, ηp² = .03, with older children scoring significantly higher on Uncaring than younger children. Neither the gender main effect nor the gender × school-stage interaction was significant.
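Partial eta squared can be recovered from a reported F ratio and its degrees of freedom: since F = (SSeffect/dfeffect) / (SSerror/dferror) and ηp² = SSeffect / (SSeffect + SSerror), it follows that ηp² = F·dfeffect / (F·dfeffect + dferror). The short Python check below (function name our own) confirms that the reported F (1, 205) = 6.13 corresponds to ηp² ≈ .03.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: SS_effect / (SS_effect + SS_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# School-stage effect on Uncaring reported above.
print(round(partial_eta_squared(6.13, 1, 205), 2))  # 0.03
```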

Discussion

The main aim of the present study was to examine the structure and correlates of C/U traits in young mainstream children using the ICU (Frick 2004). Because the study involved young mainstream children rather than children in clinical or institutional settings, we first tested the item functioning of the ICU using affectivity (i.e., the extent to which participants consistently find items easy or difficult to endorse: see Osterlind 1989) and discrimination (i.e., the degree to which responses to a particular item correlate with participants’ total scores on the instrument: Sax 1997; Streiner and Norman 1995) indices. Using Kline’s (2000) dual criteria, four items were found to be unsatisfactory for this population and were deleted; the resulting Cronbach’s alpha coefficient was satisfactory (.92). Kimonis et al. (2008) and Essau et al. (2006) also raised concerns about two of the four items deleted here, both from the Callousness dimension: “What I think is right and wrong is different from what other people think” and “I do not let my feelings control me”.

With these items deleted, we conducted a confirmatory factor analysis (CFA) of the hypothesized factor structure of the 20-item ICU (eight Callousness, four Unemotional, and eight Uncaring items) with 268 children. The CFA indicated that a two-factor structure fit the data adequately and was superior to the three-factor model tested. In other studies using the ICU, the three-factor structure (i.e., Callousness, Unemotional, and Uncaring) has consistently emerged in samples ranging in age from 12 to 20 years (see Essau et al. 2006; Kimonis et al. 2008; Fanti et al. 2009; Roose et al. 2011). In our study, however, the Unemotional factor had poor internal reliability and a problematic item whose removal reduced reliability further. Loading the Unemotional items onto the Callousness factor did not improve fit, so these items were deleted.

On substantive a priori grounds, we hypothesised that the unique variances of some indicators overlapped (i.e., measured something in common other than the latent constructs represented in the model) and correlated eight pairs of errors. The CFA in our study therefore captured two dimensions of behavior that best fit the data. One (Uncaring) represented a lack of caring about one’s performance in tasks and about others’ feelings. The second (Callous) captured behavior that included a lack of empathy, guilt, and remorse, and an absence of emotional expression. None of the five items making up the ICU Unemotional factor (i.e., “I do not show my emotions to others”, “I express my feelings openly”, “I hide my feelings from others”, “It is easy for others to tell how I am feeling”, and “I am very expressive and emotional”) loaded onto our two-factor model. Furthermore, the ICU items “I care about how well I do at school or work” (Uncaring factor), and “I do not feel remorseful when I do something wrong”, “I do not care who I hurt to get what I want”, and “I am concerned about the feelings of others” (Callousness factor) did not load onto our two-factor model. Given that our sample consisted of mainstream children (7.6 to 12.8 years) who were younger than those in previous research using the ICU (i.e., predominantly 12 to 20 year olds), it is possible that many of them were at an age at which they could not yet “feel” the (affective) emotions of others (see Dadds et al. 2009; Munoz et al. 2011). They may also not yet have had the experience needed to attribute these emotions to themselves or others (see Widen and Russell 2010).

The factor loadings and factor variances of the two-factor model were invariant across gender, supporting the equivalence of the factor structure for boys and girls, although the correlation between the two factors was stronger for boys. Essau et al. (2006) reported gender differences in ICU subscale scores, consistent with past research indicating that men tend to score higher than women on all dimensions of psychopathy, including C/U; in the present study, however, gender had no significant effect on either factor score. With regard to age, there was a small but significant effect, with older children scoring higher on Uncaring (i.e., a lack of caring about one’s performance in tasks and about others’ feelings) than younger children. This is consistent with developmental findings that rebelliousness and antisocial attitudes become more common during early adolescence (Moffitt 1993).

When interpreting these findings, it should be noted that our results are based solely on self-report data; corroborative information such as file data and observations might enhance reliability. Nevertheless, self-report is an effective means of gaining accurate insight into subjective dispositions that can be difficult to obtain from third parties such as teachers and parents (Andershed 2010 as cited in Salekin and Lynam 2010; Frick et al. 2009). Indeed, the validity of self-report on psychopathology and personality tends to increase from childhood to adolescence, whereas that of parent and teacher report decreases over this period (Essau et al. 2006). The present study was school-based, so only children attending school were assessed. Children presenting with elevated C/U traits, such as those in clinical, institutional, or referral-based settings, should be included in future studies so that distributions of C/U trait scores on the ICU can be compared. Furthermore, adequate fit required correlating eight pairs of error terms; although this is substantially fewer than the 25 correlated error terms in the initial test of the ICU (Essau et al. 2006), it suggests that the factor structure needs to be replicated in other samples.

In summary, the ICU was specifically designed to address the limitations of previous measures of C/U traits and thereby to provide a comprehensive assessment of these traits in young people. Although research has consistently provided evidence of three factors among adolescents, this was not the case for the younger children in this study. Nevertheless, the data presented here represent a strong case for the continued use of the ICU with children aged 7–12 years, in order to build on its potential and to support its further development.