Introduction

Recently, increased attention has been paid to the burden of children’s mental health problems globally (Patel et al. 2007). Children and adolescents make up nearly one-third of the world’s population (UNICEF 2016), and psychiatric disorders represent a high percentage of their health-related burden, including Disability-Adjusted Life Years (World Health Organization 2013). Recent estimates are that one in five children globally has a psychiatric disorder or significant mental health problem (Belfer 2008; Perou et al. 2013; Polanczyk et al. 2015). Childhood mental health problems are of concern, given their association with poor physical and mental health outcomes throughout childhood and into adulthood, including school dropout, peer relationship difficulties, and difficulty meeting health care needs and transitioning to adulthood (Ford et al. 2007; Kessler et al. 2005).

Although there are nearly 500 million children aged 0 to 18 years in sub-Saharan Africa, little is known about the prevalence of mental health problems among children in the region, particularly young children, many of whom live in poverty in conditions where family disruption, community violence, and national conflicts are common (UNICEF 2016). One of the few reviews of work in this region found a psychopathology prevalence of 14.3% among children 5–16 years old (Cortina et al. 2012), with psychopathology broadly defined in terms of mental health problems, psychiatric disorders, behavioral problems, or psychological distress. South Africa (SA) is a country where child mental health problems may be of particular concern, given high rates of poverty and violence, and a legacy of discrimination due to Apartheid and the HIV epidemic. HIV in particular has taken a significant toll on Black South Africans, with staggering numbers of young children orphaned (UNICEF 2013), increasing risk for mental health problems (Rutter 1971, 1979; Doku 2009).

The paucity of mental health research on African children, including SA children, may be due to multiple factors, including the challenge of meeting basic physical needs including food, shelter, and child survival (Williamson 2005). Also, a significant barrier to understanding child mental health in any context is related to assessment. Children typically present with symptoms in different ways from adults and often lack sufficient language skills to describe their experiences. Evaluation of young children is dependent on parents/caregivers, teachers, and other adults and must be viewed within the context of family and social environments (King et al. 2009).

In the poorest communities in SA, assessment challenges are compounded by a dearth of mental health professionals (Lund et al. 2010). The lack of providers poses a barrier to identifying individuals with mental health problems and providing them with treatment. Recently, calls for use of “lay counselors” in the assessment and treatment of mental health problems have increased (Petersen et al. 2012). However, these efforts have been hindered by a lack of translated and validated standardized clinical instruments that can be easily and effectively administered by non-mental health professionals.

One of the most commonly administered brief mental health screening tools for children that can be used by lay professionals globally is the Strengths and Difficulties Questionnaire (SDQ) (Youth in Mind 2012). The SDQ, completed by caregivers, teachers, or older children themselves, measures behavioral difficulties and pro-social strengths in children ages 3 to 16, with clinical cut-off scores to indicate likely mental health problems. An international review of 48 studies on the psychometric properties of the SDQ caregiver and teacher versions for children ages 4 to 12 found sufficient internal consistency, test-retest reliability, inter-rater agreement, and validity of a five-factor structure, supporting its use across contexts (Stone et al. 2010). However, only a few studies have used the SDQ in Africa (e.g., Kashala et al. 2005; Menon et al. 2007), where ethnic and cultural differences in childrearing and expectations about childhood in general might impact how caregivers perceive development, emotional and behavioral function, and social relationships. Some of these studies suggest that the SDQ total score has adequate psychometric characteristics, but not all of the factors established by Goodman and colleagues have been equally supported. A few studies using confirmatory and exploratory factor analysis have found the peer problem scale items, and some conduct items, do not always load as expected on published factors (Stone et al. 2010). Moreover, these studies suggest that cut-off scores for significant mental health problems based on British or US-based samples may need to be reconsidered as much larger proportions of the children in settings, such as Zambia and Democratic Republic of Congo, fall outside the normal range (Kashala et al. 2005; Menon et al. 2007). It is not clear if this accurately reflects prevalence of mental health problems or indicates socio-cultural differences.

The SDQ had been translated in SA into Xhosa and Afrikaans, but not isiZulu, the dominant language and culture in KwaZulu-Natal, an area of SA with high poverty and HIV rates, and thus potentially high rates of child mental health problems (Department of Health South Africa 2005). Moreover, no SA studies have evaluated the factor structure of the SDQ and its psychometric characteristics. To our knowledge, few studies in any African country have examined the SDQ in children as young as 4 years, nor its use longitudinally as children develop from preschool to school age. The ability of measures to reflect developmental change is important to ongoing monitoring of mental health. Thus, using longitudinal data from a large epidemiological cohort study of children and their caregivers living in KwaZulu-Natal, in an area affected by poverty and HIV, with few health or social resources, we examined the psychometric performance of the SDQ in isiZulu, including construct validity and internal consistency.

Methods

Participants and Procedures

Asenze is a longitudinal epidemiological study that followed a cohort of pre-school aged children into school age. The study was conducted from 2008 to 2012 in five tribal areas with an estimated population of 67,000 in KwaZulu-Natal, SA. The study area, with semi-rural and peri-urban dwellings, has high levels of unemployment and is situated in a province that has had the highest antenatal HIV prevalence in the country (39.5%) (South Africa National Department of Health 2010).

There were two waves of data collection. First, trained fieldworkers conducted a door-to-door survey, identifying households with children aged 4–6 years; 14,425 households were visited, 2049 with eligible children. If written informed consent was obtained (n = 1787; 87.2%), a demographic interview was conducted with a primary caregiver responsible for the child’s daily care, and they were invited to participate in a larger assessment of the child and caregiver approximately 2 weeks later at nearby research offices; 1581 (88%) participated in this assessment (wave 1). They were also invited to participate in a second assessment 2 years later when the child was 6–8 years old; 1409 (89% of wave 1 participants) completed wave 2. The assessment included the SDQ, administered to caregivers by lay fieldworkers who were native speakers of isiZulu and bilingual in English and were trained and supervised by a SA child psychologist and a medical doctor.

Ethical Considerations

Study procedures were approved by Institutional Review Boards in both SA and the USA. Informed consent was obtained from caregivers for their own and their child’s study participation. Separate consent was obtained for focus group discussions, not originally planned (see below).

Measures

Demographics

Data were collected on child age and gender, household composition, household assets, income, and caregiver education during the household survey and updated at each interview.

Strengths and Difficulties

The SDQ (Goodman 1997) is a brief, behavioral screening questionnaire that has been translated/adapted into multiple languages and validated worldwide (Youth in Mind 2012). Twenty-five items are rated on a 3-point Likert scale (0 = not true; 1 = somewhat true; 2 = certainly true) to assess emotional and behavioral function, resulting in five subscales (emotional symptoms, conduct, peer relationships [peers], hyperactivity/inattentiveness, and prosocial behavior) and a total difficulties score, based on the first four subscales. The SDQ was adapted/translated into isiZulu following standard procedures for translation/back-translation, with discussions to resolve discrepancies (Preciago and Henry 1997), and was reviewed with the SDQ author (Goodman, personal communication) to ensure adherence to item meaning. Because of structural differences between isiZulu and English, focus groups were conducted with SA native isiZulu speakers, and a SA linguist was consulted before the final translation was adopted (available on the SDQ website). All subscale scores were computed at both waves using the published scoring algorithms on the SDQ website (http://www.sdqinfo.com/py/sdqinfo/c0.py).
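The scoring described above can be sketched in a few lines. This is a minimal illustration, not the published algorithm: the item-to-subscale mapping and the set of reverse-scored items are assumed to follow the standard English SDQ item order, the function name `score_sdq` is hypothetical, and complete responses are required, whereas the published scoring rules may handle missing items differently.

```python
# Assumed item numbering: standard English SDQ order (items 1-25).
SUBSCALES = {
    "emotional": [3, 8, 13, 16, 24],
    "conduct": [5, 7, 12, 18, 22],
    "hyperactivity": [2, 10, 15, 21, 25],
    "peers": [6, 11, 14, 19, 23],
    "prosocial": [1, 4, 9, 17, 20],
}
# Positively worded items inside difficulty subscales are reverse-scored.
REVERSED = {7, 11, 14, 21, 25}
DIFFICULTY_SCALES = ("emotional", "conduct", "hyperactivity", "peers")


def score_sdq(responses):
    """Score one questionnaire.

    responses: dict mapping item number (1-25) to 0, 1, or 2
    (0 = not true, 1 = somewhat true, 2 = certainly true).
    """
    if set(responses) != set(range(1, 26)):
        raise ValueError("expected responses for items 1-25")
    if any(value not in (0, 1, 2) for value in responses.values()):
        raise ValueError("each response must be 0, 1, or 2")

    scores = {}
    for name, items in SUBSCALES.items():
        # Each subscale is the sum of its five items (range 0-10).
        scores[name] = sum(
            2 - responses[i] if i in REVERSED else responses[i] for i in items
        )
    # Total difficulties excludes the prosocial subscale (range 0-40).
    scores["total_difficulties"] = sum(scores[s] for s in DIFFICULTY_SCALES)
    return scores
```

Under these assumptions, a caregiver who answers "not true" to every item still receives a total difficulties score of 10, because the five reverse-scored items each contribute 2 points.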

Statistical Analyses of SDQ

Analyses were conducted to determine whether the SDQ five-factor structure would be replicated in this context at both waves. We calculated the internal consistency (Cronbach’s alpha) for the five original scales (prosocial, conduct, emotional, hyperactivity/inattentive, and peers), as well as for total difficulties. We also used exploratory factor analysis (EFA) with two refinements: a psychometric model tailored for items with three ordinal response options, and a target factor rotation that attempts to bring the solution as close as possible to a pre-specified target pattern (Browne 2001), here the original factor solution.
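For readers unfamiliar with the internal consistency statistic, the following is a minimal sketch of Cronbach’s alpha computed from raw item responses. It is illustrative only: the function name and the toy data are hypothetical, sample (n − 1) variances are assumed, and it is not the software used in this study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    rows: list of respondents, each a list of item scores for that scale.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")

    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of four caregivers to a two-item scale.
toy = [[0, 1], [1, 0], [2, 2], [1, 2]]
print(round(cronbach_alpha(toy), 3))
```

Alpha rises when items covary strongly relative to their individual variances, which is why a subscale whose items do not hang together yields a low value.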

Both of these were implemented using Mplus (Version 4.7) (Muthen and Muthen 1998–2010). To fit the factor model, we used a weighted least squares estimator with a mean and variance adjusted chi-square statistic (WLSMV). The adequacy of the fit of the model was evaluated using several fit indices: the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA) (Marsh et al. 1996), the weighted root mean square residual (WRMR), and a chi-square test for discrepancy between the model and the data. CFI is considered to indicate an “adequate” fit when its value is greater than .9 and a “good” fit when its value is greater than .95 (Hu and Bentler 1999). For the RMSEA, a value of .05 is thought to indicate close fit, .08 a fair fit, and .10 a marginal fit (Browne and Cudeck 1993). Yu (2002) suggested that WRMR values below 1.00 indicate a “good” fit. All analyses were carried out in each wave separately. Additionally, we examined the degree to which the SDQ scale produces stable and consistent results by computing the correlations between responses at the two time points.
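The cut-offs above can be expressed as a small helper, shown here as a sketch under stated assumptions. The function names are hypothetical, the labels for values outside the cited ranges ("below adequate", "poor", "not good") are our own shorthand, and the RMSEA point estimate uses the common formula with N − 1 in the denominator, which may differ slightly from what a given software package reports.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Common RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def label_cfi(cfi):
    # Hu and Bentler (1999): > .95 good, > .90 adequate.
    if cfi > 0.95:
        return "good"
    if cfi > 0.90:
        return "adequate"
    return "below adequate"


def label_rmsea(value):
    # Browne and Cudeck (1993): .05 close, .08 fair, .10 marginal.
    if value <= 0.05:
        return "close"
    if value <= 0.08:
        return "fair"
    if value <= 0.10:
        return "marginal"
    return "poor"


def label_wrmr(wrmr):
    # Yu (2002): values below 1.00 suggest good fit.
    return "good" if wrmr < 1.00 else "not good"


# Illustrative call with hypothetical inputs.
value = rmsea(chi2=300.0, df=100, n=1001)
print(round(value, 3), label_rmsea(value), label_cfi(0.96), label_wrmr(0.9))
```

Treating the thresholds as inclusive upper bounds for RMSEA is a design choice of this sketch; the cited guidelines describe approximate reference points rather than strict boundaries.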

Results

Demographics Description

To allow comparison of psychometric results from both waves, we limited our analyses to the children with SDQs completed at both waves (N = 1394); 702 (50.4%) were boys and 692 (49.6%) were girls (Table 1). The mean age at wave 1 was nearly 5 years. The majority of questionnaires (65.5%) were completed by the child’s birth mother or father, 21.1% by grandmothers, and 13.4% by others. Among caregivers who reported on their education (78%), 6% had no formal education, 4.3% had some primary school, 3.4% completed primary school, 13% had some middle school, 8.5% completed middle school, 40.3% had some high school, 20.9% completed high school, and 3.6% had some college. Among caregivers reporting on food security (96%), 6.1% reported food insecurity for over 5 days in the past month and 19.3% reported food insecurity for 1–5 days; 74.6% had not experienced food insecurity in the previous month.

Table 1 Demographic characteristics of children in wave 1 (N = 1394)

Descriptive Statistics of SDQ Items

Table 2 presents descriptive statistics for SDQ items. As indicated, prosocial scale items had higher means and less variation than those items reflecting difficulties across the two waves.

Table 2 Mean scores and standard deviation for caregiver SDQ by wave (N = 1394)

Internal Consistency

Cronbach’s alphas for the scales were broadly similar across waves (wave 1, wave 2): total difficulties (0.74, 0.74), emotional (0.62, 0.62), prosocial (0.57, 0.57), conduct (0.51, 0.47), hyperactivity/inattentive (0.42, 0.54), and peers (0.38, 0.29). Although the values for total difficulties are acceptable, and the values for emotional and prosocial are almost acceptable, the values for conduct, hyperactivity/inattentive, and peers are generally unacceptable. The factor analysis can give more insight into which items are inconsistent.

Factor Analysis

We extracted five factors and specified a target pattern for the items based on Goodman (1997): (1) prosocial, (2) conduct, (3) emotional, (4) hyperactivity/inattentive, and (5) peers. The fit of the five-factor model was adequate in both wave 1 (χ2(185) = 557.11; CFI = .95; TLI = .93; RMSEA = .04; WRMR = 1.01) and wave 2 (χ2(185) = 357.49; CFI = .98; TLI = .96; RMSEA = .03; WRMR = .80). As shown in Table 3, items generally had loadings of .3 or higher on the expected factors. The exceptions were the peer items, which did not form a coherent factor, and two of the hyperactivity/inattentive items. The two hyperactivity/inattentive items (21 and 25) loaded on the prosocial factor with negative loadings. These were reverse-coded items designed to measure inattention. Another reverse-coded item that loaded with prosocial was item 7 (“generally obedient”), which had a −.5 loading, but it also had a loading of .3 with conduct. In addition to the items that cross-loaded with prosocial, item 5 from conduct (“often has temper tantrums”) had a high loading with peers in both waves.

Table 3 Five-factor exploratory factor analysis of caregiver SDQ by wave

Considering the poor performance of the hyperactivity/inattentive and peer subscales, we set them aside and fit a 3-factor model to the items in the other three original subscales: prosocial, conduct, and emotional. The fit indices showed that the refined model had a good fit in wave 1 (χ2(63) = 201.06; CFI = .95; TLI = .92; RMSEA = .04; WRMR = .97) and wave 2 (χ2(63) = 160.76; CFI = .96; TLI = .94; RMSEA = .03; WRMR = .87), and the majority of items loaded on the expected factors, except for two conduct items. In addition to loading on the conduct factor, item 5 (“Often has temper tantrums or hot tempers”) and item 7 (“Generally obedient”) loaded on emotional and prosocial, respectively. These two items cross-loaded more at wave 2, when the children were older.
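As a check on the degrees of freedom reported for the five-factor and three-factor solutions, the standard count for an unrestricted EFA with p items and m factors is ((p − m)² − (p + m)) / 2. The sketch below assumes that this textbook formula applies to the reported chi-square tests; the function name is hypothetical.

```python
def efa_degrees_of_freedom(n_items, n_factors):
    """Degrees of freedom of an unrestricted EFA model.

    df = ((p - m)^2 - (p + m)) / 2, with p items and m factors.
    """
    p, m = n_items, n_factors
    numerator = (p - m) ** 2 - (p + m)
    if numerator < 0:
        raise ValueError("too many factors for this number of items")
    # The numerator is always even, so integer division is exact.
    return numerator // 2


print(efa_degrees_of_freedom(25, 5))  # all 25 SDQ items, five factors
print(efa_degrees_of_freedom(15, 3))  # 15 retained items, three factors
```

With 25 items and five factors this gives 185, and with 15 items and three factors it gives 63, matching the values in the reported test statistics.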

The target rotation allowed the latent variables to be correlated rather than being constrained to be independent. In both waves, conduct was positively correlated with emotional (r = .37, .40) and negatively correlated with prosocial (r = −.28, −.35). These factor-based correlations were somewhat larger than the correlations among the scales formed as simple sums of item responses. Conduct item sums and emotional item sums were positively correlated at .31 and .33, whereas conduct and prosocial sums were negatively correlated at −.18 and −.22 in wave 1 and wave 2, respectively. In both waves, there was almost no correlation between emotional and prosocial regardless of whether the scales were formed as the latent variables (−.05, −.13) or simple item sums (−.05, −.09) (see Table 4).

Table 4 Correlations among prosocial, emotional, and conduct subscales for caregiver SDQ by wave

The most well-validated and widely used SDQ scale is total difficulties, which summarizes all negative emotional and behavioral items (i.e., excludes prosocial items). Although we know the SDQ has a multifactorial structure, we examined a one-factor solution to determine if all of the items were related to a single difficulties latent dimension. Across both waves, only 13 items, including 3 hyperactivity/inattentive items and 3 peer items, consistently had loadings over .35. The other 7 items did not have high loadings on the factor. When computed based on all 20 items, Cronbach’s alpha was 0.74 in both waves; based on 13 items, the alpha was 0.76 at wave 1 and 0.74 at wave 2. The 20-item and 13-item versions of total difficulties had a modest but statistically significant correlation over waves (r (20 items) = .32, r (13 items) = .27). There was no evidence that the more homogeneous set of 13 items had higher stability over time.

Prevalence of Psychological Risk

We compared the distribution of the SDQ scores in KwaZulu-Natal to UK norms given by Goodman (1997) (Tables 5 and 6 and Fig. 1). Asenze participants had higher scores on total difficulties and all of its four subscales. For example, 36.7% of our participants had total difficulties scores greater than the UK cutpoint of 17 in wave 1, compared to 9.8% of UK children who received scores above this value (Meltzer et al. 2000). The proportion of children above this value in wave 2 (28.3%) was also elevated relative to UK norms, but less than that of wave 1. Across two waves, 202 children (14.5%) consistently had total difficulties scores above the UK cutpoint of 17. In general, the mean scores in wave 2 were consistently lower than wave 1 (Tables 5 and 6), while still being larger than UK norms.
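The proportions reported above are simple shares of children scoring above a fixed cutpoint, at one wave or at both. A minimal sketch follows; the function names and the toy scores are hypothetical, and "above" is taken to mean strictly greater than the cutpoint, following the wording in the text.

```python
def share_above(scores, cutpoint):
    """Proportion of scores strictly greater than the cutpoint."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(score > cutpoint for score in scores) / len(scores)


def share_above_at_both_waves(wave1, wave2, cutpoint):
    """Proportion of children above the cutpoint at wave 1 and at wave 2.

    wave1 and wave2 must list the same children in the same order.
    """
    if not wave1 or len(wave1) != len(wave2):
        raise ValueError("waves must be non-empty and of equal length")
    both = sum(a > cutpoint and b > cutpoint for a, b in zip(wave1, wave2))
    return both / len(wave1)


# Hypothetical total difficulties scores for five children.
wave1 = [10, 18, 20, 17, 25]
wave2 = [9, 19, 16, 18, 22]
UK_CUTPOINT = 17

print(share_above(wave1, UK_CUTPOINT))
print(share_above(wave2, UK_CUTPOINT))
print(share_above_at_both_waves(wave1, wave2, UK_CUTPOINT))
```

In this toy example the share above the cutpoint is the same at each wave, yet fewer children are above it at both waves, which illustrates how cross-sectional prevalence can stay high while individual children move in and out of the elevated range.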

Table 5 Wave 1 scores on each subscale and total difficulties scores based on caregiver-rated SDQ (n = 1394)
Table 6 Wave 2 scores on each subscale and total difficulties scores based on caregiver-rated SDQ (n = 1394)
Fig. 1 Total difficulties distribution with 90th percentile cut-off and UK cut-off by wave

Post Hoc Qualitative Interviews and Findings Concerning Caregiver Perceptions of Items

During wave 2, we conducted two focus group discussions (not originally planned) with caregivers to gain socio-cultural insights into the high scores observed on several SDQ subscales, as well as concerns about the peers subscale. We randomly sampled from caregivers of children who had high SDQ scores, from different tribal areas, comprising two groups of six to seven participants. Questions concerned: (1) what defines an “unhappy child”; (2) what is considered “appropriate” behavior in young children; (3) challenging behaviors; (4) the role of friends for young children; (5) children’s recognition of others’ feelings and sharing; and (6) obedience and following instructions. A bilingual (isiZulu and English) ethnography fieldworker recruited and consented participants, facilitated the audiotaped discussions, and transcribed and translated the discussions. Two SA team-members extracted the predominant responses to each question.

Caregivers described manifestations of “unhappiness” as refusal to play with other children, eat, and/or talk, and bullying/fighting with others or being bullied. “Crying without a reason” was described as “not normal” and some attributed this to physical pain. Challenging behaviors for caregivers included children being “rude” or disrespectful to caregivers, using vulgar language and fighting with caregivers.

The discussion of childhood friendships had relevance for the SDQ Peer subscale. Some caregivers thought children should play mainly with children who resided in their household or family. This allowed for better supervision and also avoided unwanted negative influences “from outside,” such as disobedience, bad language, sexual experimentation and smoking, and reduced risk of physical dangers, such as snakebites and sexual abuse. Playing with friends at school under teacher supervision seemed acceptable, but caregivers were more cautious about allowing children to visit school friends outside school. Several caregivers reported that older siblings often looked after younger ones and that caregivers did not usually participate in children’s play. Thus, they may not have sufficient ability to report on peer relationships.

There was agreement on teaching children to be helpful and to undertake household chores from a young age. Caregivers emphasized the need to ensure obedience even if it meant occasionally threatening or punishing children. Not being attentive or not completing tasks was attributed to forgetfulness or being distracted by play.

Discussion

In countries where many children grow up in poverty, the ability to identify childhood mental health problems is a critical public health challenge. International best practices for children under the age of 10 are to obtain reports from caregivers. In contexts with limited resources for professional assessments, the likelihood of obtaining useful information from caregivers is increased when using standardized survey instruments that have been used in similar settings and validated against professional assessments. However, standardized measures may not automatically yield precise information when adapted for a new cultural group and translated into a new language. For this reason, we examined carefully the psychometric properties of the widely used SDQ after it was translated into isiZulu and used in a multi-wave assessment of a large sample of children in KwaZulu-Natal, SA. To our knowledge, this is one of the few large studies in sub-Saharan Africa to longitudinally follow a young cohort using the SDQ.

In this new context, we confirmed that the emotional subscale of the SDQ had adequate internal consistency and evidence of construct validity and that two others (prosocial and conduct) had marginally acceptable internal consistency. Psychometric problems were found in the remaining two subscales. The inattention items within the hyperactivity/inattentive subscale did not correlate with the hyperactivity items, and none of the items in the peers subscale showed the expected psychometric structure. Despite the mixed findings for the subscales, we found that the total difficulties score had an acceptable internal consistency of 0.74.

Based on the psychometric analyses, we conclude that the total difficulties score is more useful than the subscales in this language and context. The distribution of the SDQ total difficulties scores suggests that young children in our study were at high risk for mental health problems. This risk may be due to multiple factors including poverty, HIV/AIDS, family disruption, and other variables associated with the legacy of Apartheid on Black South Africans. The risk associated with individual children seemed to change over the 2-year follow-up period in our study. The wave 1 and wave 2 total difficulties scores had a correlation of only .32 with each other, with the mean level of problems significantly declining over time. Young children’s mental health can change over time, particularly in high stress environments where adverse events can come and go, and thus, we do not necessarily expect high levels of consistency in this context of high poverty and negative familial events.

We were not able to reproduce the 5-factor structure in this context, finding uneven support of the subscales. We identified only one previous study from an African setting that examined the SDQ factor structure, but using teacher-ratings, not caregiver-ratings, and primary school children (7–9 years) (Kashala et al. 2005). The similarities between our studies are interesting, despite the differences in raters. Their factor structure revealed a very similar overall pattern with adequate loadings for the prosocial, emotional and conduct subscales, fair loadings on hyperactivity/inattentive, and poor loadings on peers. In our study, further support for some of the SDQ scales was reflected in the inter-correlations. For example, conduct was negatively associated with prosocial as one would expect. It was also positively associated with emotional, both reflecting behavioral difficulties.

Similar to other studies in Africa, the peers subscale performed the poorest, with items not loading as expected on published factors (Stone et al. 2010). Thus, it may be the least useful in African settings, where, as suggested by our qualitative data, significant concerns about safety and other cultural factors related to childrearing may result in limitations on child interactions with peers, as well as fewer opportunities for caregivers to actually observe interactions with peers. Also, cultural views on what are appropriate and inappropriate parent-child interactions may vary across LMIC and high-income countries and lead to different ratings on the SDQ items related to conduct and attention, which were derived in a western-European context with different approaches to parenting and caregiving.

The hyperactivity/inattentive subscale did not perform as expected in either wave, seeming to comprise two separate factors. Also, in wave 2 among 6–8-year olds, some of the conduct items loaded on the hyperactivity/inattentive factor. However, the two factors that did emerge do reflect recent literature indicating that attention deficits may or may not co-exist with hyperactivity (American Psychiatric Association 2013). Thus, the two are now often considered separate constructs. Further work in the African context on this particular diagnosis might be helpful for enhancing the SDQ.

Few studies from LMIC have examined use of the SDQ in children as young as 4 years (e.g., Du et al. 2008; Owen et al. 2015), and only Du et al. (2008) had a sample as large as ours, but they did not look at the youngest children separately. Thus, studies with comparison groups for preschool children in LMIC are limited. The 90th percentile cut-offs in our 4–6-year olds were higher on all subscales and the total score compared to the published UK cut-offs, perhaps indicating a greater proportion of children with behavior problems in settings with socioeconomic, health, and historic challenges.

At wave 2, compared to wave 1, the 90th percentile cut-offs of the emotion, conduct, and hyperactivity subscales were the same, and peers and total difficulties were lower. But all subscales and total difficulties remained higher for our sample, compared to the UK cut-offs. The higher 90th percentile scores across both waves for total difficulties could reflect more frequent psychosocial problems in this context, or it could reflect cultural differences in understanding of items or perceptions of behavior.

There were several limitations to the study. Adapting an instrument based on the English language to a language with a different structure raises concerns around validity and reliability. Our extensive translation efforts were an attempt to overcome this. However, we did not have the resources for a validation of the SDQ against a clinical assessment by a mental health professional (nor are there many bilingual professionals in SA). Although a large epidemiological study, the data may not reflect other ethnic groups or other countries in Africa. Another limitation is that we only had caregiver reports. The SDQ requires parents to make relatively nuanced judgements about their child. If the child is not causing trouble at home, they may not notice inattention or peer problems. Thus, multiple raters are often needed to fully understand what is happening with the child, including the child’s report as they age, which is possible with the SDQ once children are age 11.

That said, the major strengths of this study were the large population-based sample, the use of longitudinal data with a high retention rate, and a strong analytic approach to the psychometric properties in this context. Our results supported the psychometric structure of the emotional, prosocial, and conduct subscales of the SDQ in SA, suggesting that the overall translation and implementation of the measure in this context was successful. The fact that the psychometric results for hyperactivity/inattentive and peers were not as expected is an important reminder that one cannot assume that standardized measures developed in one cultural context will automatically transfer to other cultural contexts, even if the measures have been widely used in other studies. Just as the basic scientific findings in medicine and epidemiology warrant constant review and critique (Ioannidis 2005), so does the expectation that measures can be easily adapted. The similarity of our findings on peers to other studies, and the difficulties with hyperactivity/inattentive and conduct, suggests that there is a need for modification of the SDQ in this context and development of other tools to assess mental health in young children, particularly if social skills and hyperactivity/inattention are being considered. Given that recent studies highlight the global burden of attention-deficit/hyperactivity disorder (ADHD) in children (Erskine et al. 2014), as well as the importance of social skills and relationships at all stages of development (Dunkel Schetter 2017), assessment that is appropriate to the social and cultural context of both these variables may be critical to fully addressing the mental health challenges of young people.

Fortunately, the data suggest that the SDQ can still be a useful screening tool with young children in this part of SA, with a few caveats. The total difficulties score had good psychometric properties suggesting that it can be used in SA. Standardized mental health tools are important, as they not only provide a useful clinical tool, but allow for cross-cultural comparisons to further understand the need for mental health resources across the globe. Moreover, accurate assessments of prevalence in specific populations can be critical to targeting resources, particularly when there are limited resources for mental health treatment. However, we recommend that tools such as the SDQ not be used as the sole input for clinical decision-making; decisions need to be considered carefully so that children are evaluated correctly given the context they are in and receive appropriate care.