Program effectiveness when early intervention programs are scaled up for widespread implementation is an important issue in prevention science (Botvin 2004). Implementation can be associated with deteriorations in program benefits, but few studies have compared the effects associated with alternative implementation conditions. In 2004, the Australian Government funded the nation-wide implementation of early intervention programs for the families of young children at developmental risk (Department of Family and Community Services 2004). This paper examines pre to post intervention changes in parenting skills and child developmental outcomes for participants attending a music therapy early parenting intervention funded under this scheme. Taking advantage of the natural experimental conditions created by a process of rapid widespread implementation, outcomes are compared across four sites that varied in terms of participant characteristics and implementation processes and supports.

Several reasons have been proposed for the erosion of program effectiveness when programs are taken to scale. Programs are often provided to more diverse, multi-problem participants than are typically seen under controlled research conditions (Scott et al. 2001). Content and structural changes may be necessary to accommodate new delivery systems and client populations, with the potential for inadvertently undermining the effective components of the program (Castro et al. 2004; Gray and Francis 2007; Sloboda et al. 2009). Under widespread implementation, programs often suffer from reduced funding levels, the use of less well qualified or trained staff and a lack of monitoring and accountability against program objectives (Gray and Francis 2007; Takanishi and Bogard 2007). Effectiveness can be further undermined if there is variability in the acceptance and supports provided in different implementation sites.

Behavioral parenting programs are popular as an early intervention for children at risk of behavioral problems. Brief individual or group programs have achieved improvements in parenting skills and reduced child behavioral problems (Leung et al. 2003; Sanders et al. 2000; Webster-Stratton 2001). However, low participation rates (Barlow et al. 2005; Ireys et al. 2001) and the poor retention (Barlow and Coren 2001; Barlow and Parsons 2002) of disadvantaged families may limit the reach of such programs when taken to scale. Music therapy is a new approach that seeks to make parenting programs accessible and enjoyable, by using musical activities as an engaging context for parents to learn and practice skills that foster children’s development.

Music therapy has been used in a variety of applied settings (Davis et al. 1999), but has only recently been employed with parents and their children (Abad and Edwards 2004; Oldfield and Flower 2008). Music therapy parenting interventions differ from traditional behavioral parenting approaches in several ways. Unlike behavioral parenting interventions, which are typically provided only to parents, music therapy programs are conducted with parent-child dyads. Programs are delivered within a structured context of music-based play, focused on one-to-one interactions between parents and their children. Music-based play is believed to foster parent-child bonding through the engaging nature of music and the links between music and the music-like qualities of parent-led communications with young children (Shenfield et al. 2003; Standley 2002; Trevarthen and Malloch 2000). Often there is little or no didactic teaching of parenting skills. Rather, the music therapist uses incidental opportunities arising during the course of sessions to model, coach and reinforce parenting skills. Sessions are structured to practice a range of skills, with repetition each week providing the opportunity to build and reinforce developing skills.

The efficacy of music-based parenting interventions is yet to be established. Ten published studies have evaluated music therapy parenting interventions (Abad and Williams 2007; Allgood 2005; Archer 2004; MacKenzie and Hamlett 2005; Muller and Warwick 1993; Nicholson et al. 2008; Oldfield 2006; Oldfield et al. 2003; Shoemark 1996; Vlismas and Bowes 1999), with most studies collecting uncontrolled or post-program data only. All reported some positive effects for parent-child relationships, parenting skills and children’s development. However, half the studies also reported some neutral or adverse outcomes.

Sing & Grow is the most comprehensively evaluated music therapy parenting intervention to date (Abad and Williams 2007; Docherty et al. 2007; Nicholson et al. 2008; Williams 2006). The program is conducted by registered clinicians with a university degree in music therapy. Each program consists of ten 1-hour sessions conducted weekly for groups of 8 to 12 parent-child dyads. One evaluation of this program collected post intervention data for 683 disadvantaged families. The results indicated high levels of parent satisfaction, a perceived positive impact on parent-child relationships and improvements in children’s cognitive, physical and social development (Abad and Williams 2007). In a second study, changes from pre to post intervention were compared for 358 families attending programs for young parents, children with a disability and disadvantaged families (Nicholson et al. 2008). For all three types of participants, improvements were found for clinician-observed parent and child behaviors, parent-reported irritable parenting, educational activities in the home, and child communication and social play skills. No changes were found for parental warmth or child behavior problems.

From 2005, Sing & Grow was funded for 4 years to provide programs to 3000 families nation-wide (Department of Families Housing Community Services and Indigenous Affairs 2008). The service was expanded from its initial location (Brisbane, in the state of Queensland) to all Australian states and territories. Program administration was supported by each state’s independent Playgroup Association, and delivered in partnership with local service providers. After the first 6 months, structured interviews were conducted with Playgroup and Sing & Grow managers in all states and territories, to assess factors likely to affect implementation (Oldenburg and Glanz 2008; Rogers 2003), including program acceptability and compatibility, implementation costs, and program supports and communication. These revealed site differences in implementation, providing a natural experiment for comparing program-related changes across four sites with differing implementation characteristics.

Site 1 was Queensland where the program originated. Partnerships were well-established with Playgroup Queensland and community organizations. The National Director, a State Director and a part-time senior music therapist who had worked on the program for several years were employed in Brisbane. Sites 2 and 3 were each large states with no prior history of delivering this program, and partnerships had to be newly established. A locally based State Director was employed in each site. The interviews indicated that compared with Site 2, implementation in Site 3 was more difficult in terms of communication and limited organizational acceptance and compatibility. Site 4 was the remainder of the country, covering the smaller-population states and territories, with no prior history of delivering this program. No State Directors were employed in these areas. Service partnerships, program coordination and the management and training of clinicians were managed by State Directors located elsewhere. Interviews revealed the absence of locally-based senior staff contributed to communication and coordination difficulties and a lack of local ownership. Based on these data from Playgroup and Sing & Grow managers, implementation difficulties appeared greater for Sites 3 and 4 relative to Sites 1 and 2. Across the four sites there were also variations in who programs were provided to, as this was dependent on the links established with local organizations.

Three research questions are addressed in this study: (1) Did parents and children attending the program show pre to post improvements in parenting and child development skills? (2) To what extent were these outcomes similar across the four implementation sites? (3) Did variations in participant, program and clinician characteristics account for differences in program outcomes across sites? It was hypothesized that (a) participants would show improvements in parenting and developmental outcomes from pre to post across all sites; (b) after accounting for differences in participant characteristics, greater improvements would be observed in those sites with few implementation difficulties (Sites 1 and 2) compared to those with more implementation difficulties (Sites 3 and 4); and (c) these differences would be explained by between-site variations in program and clinician characteristics.

Method

Design

The ‘real-world’ context for the current research shaped what was possible in terms of study design and data collection methods (Docherty et al. 2007). The Sing & Grow evaluation was tendered to an external research team to assess the impact of the program during its expansion, with 10% of the service-provision budget allocated to the research. The resulting limitations included: a reliance on clinicians to collect data within their prescribed clinical workloads (including the distribution and collection of parent questionnaires); a need for observational methods simple enough to be completed by a single clinician at the end of the session for each parent-child dyad in the group; and an inability to conduct intensive checks of program quality and fidelity given the vast geographical dispersion of the program settings (Docherty et al. 2007). Funding was not available for data collection from non-intervention controls or for independent, blinded assessments of participant outcomes.

Participants were parents attending 161 Sing & Grow group programs conducted in 2006–2007. Brief demographic data were collected from all parents and questionnaires were completed at the conclusion of the first (pre) and final program sessions (post) by parents who gave consent for research participation. Clinicians recorded attendance, rated the therapeutic quality of each session, and recorded their observations of parent and child behaviors at Sessions 1 and 2 and Sessions 9 and 10. At the end of the research, clinicians received a questionnaire about their experiences with the program.

Measures

Parenting and Child Developmental Outcomes

Clinicians rated three parent behaviors with single items (sensitivity to the child’s signals; effective engagement with the child; acceptance demonstrated through positive affect towards the child) and three child behaviors with single items (responsiveness to the parent; interest and participation; social engagement with others). Scores were averaged across two sessions each at pre and post to provide mean scores ranging from 1 to 5, with higher scores indicating more frequent positive behaviors (Nicholson et al. 2008). Independent observers attended 10% of sessions. Consistent with training standards, clinician and observer ratings were concordant to +/- 1 for 92% to 95% of ratings.
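The scoring and concordance checks described above can be illustrated with a minimal sketch. It assumes that concordance means the proportion of paired clinician and observer ratings differing by no more than one scale point; the function names and all ratings are hypothetical and are not study data.

```python
from statistics import mean


def within_one_agreement(clinician, observer, tolerance=1):
    """Proportion of paired ratings that differ by at most `tolerance` points."""
    if len(clinician) != len(observer):
        raise ValueError("rating lists must be the same length")
    if not clinician:
        raise ValueError("at least one pair of ratings is required")
    hits = sum(1 for c, o in zip(clinician, observer) if abs(c - o) <= tolerance)
    return hits / len(clinician)


def phase_score(session_a, session_b):
    """Average one behavior rating across the two sessions of a phase."""
    return mean([session_a, session_b])


# Hypothetical ratings on the 1-5 scale, invented for illustration only.
clinician_ratings = [3, 4, 2, 5, 3, 4, 1, 3, 4, 2]
observer_ratings = [3, 3, 2, 4, 5, 4, 2, 3, 4, 2]

agreement = within_one_agreement(clinician_ratings, observer_ratings)
pre_sensitivity = phase_score(3, 4)   # Sessions 1 and 2
post_sensitivity = phase_score(4, 5)  # Sessions 9 and 10
print(f"within-one agreement: {agreement:.0%}")
print(f"pre = {pre_sensitivity}, post = {post_sensitivity}")
```

Agreement within one point is a lenient criterion on a 5-point scale, so it is best read as a check against gross disagreement rather than as a chance-corrected reliability coefficient.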

Validated, brief parent-reported measures from the Longitudinal Study of Australian Children (Zubrick et al. 2008) were used to assess parenting. Warmth was assessed by six items rating parents’ expression of physical affection and enjoyment of the child (5-point scale; α = .86). Irritable parenting was assessed with five items rating the frequency of anger and irritability towards the child (5-point scale; α = .86). Home activities with the child were rated on five items assessing the frequency of play and incidental teaching activities in a typical week (4-point scale; α = .73). Parents also reported their child’s receptive communication skills and social play skills (5 items each; 3-point scales; α = .85 and α = .87 respectively). For all measures, item scores were summed and averaged.
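As a hedged illustration of how such scale scores and internal-consistency coefficients are typically obtained, the sketch below computes a mean item score per respondent and Cronbach's α from the standard formula. The helper names and the response matrix are hypothetical; they do not reproduce the study's data or its software.

```python
from statistics import variance


def scale_score(item_responses):
    """Mean of one respondent's item responses (sum divided by item count)."""
    return sum(item_responses) / len(item_responses)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance(col) for col in zip(*rows)]
    # Sample variance of the summed scale score across respondents.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses (5 respondents x 3 items), for illustration only.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]

scores = [scale_score(row) for row in responses]
alpha = cronbach_alpha(responses)
print(f"scale scores: {scores}")
print(f"alpha: {alpha:.2f}")
```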

Child and Family Characteristics

Measures of individual characteristics included: age, gender, child developmental delay (yes/no), Indigenous status, main language spoken at home (English or other), parent’s marital status, parent’s highest level of education, and whether the family income was mainly from government benefits. Child temperament was measured at baseline with four items (α = .77) assessing child difficultness. Poor parental mental health was measured with the Kessler 6-item screener for detecting psychological symptoms (α = .90) over the last 4 weeks (Furukawa et al. 2003), and a single item rating of how well they were coping with life in general (5-point scale).

Program and Clinician Characteristics

Program and clinician characteristics assessed as potential sources of influence on program outcomes included: overly large group sizes (12 or more participating families); clinicians’ prior experience working with parents and children, and clinicians’ satisfaction with the quality of training and support provided (4-point scales: ‘very dissatisfied’ to ‘very satisfied’). In addition, at the end of each session, clinicians rated the quality of session facilitation on three items (5-point scales: rapport with group members, time management, and level of positive reinforcement for target behaviors provided). Clinician and observer ratings were concordant to +/- 1 for 97% to 99% of ratings. A mean score was computed across all sessions for each program (α = .93) with higher scores indicating higher program quality.

The Intervention

Sing & Grow was funded to deliver programs free of charge through local service providers and community organizations. Clinicians led participants through a set repertoire of musical activities designed to enhance parents’ skills through non-didactic strategies such as demonstration, rehearsal, feedback and praise. Targeted parenting behaviors included parental expression of affection, physical touch, praise, development of age-appropriate expectations, use of clear instructions and the use of music to moderate children’s mood and behavior. To promote children’s developmental skills through repeated practice, each session contained: greeting and farewell songs to encourage social responsiveness; action and movement songs for fine and gross motor skills and concept comprehension; playing with instruments for motor skills, following instructions, turn-taking and sharing; and quiet music to encourage physical touch and bonding between parent and child.

Procedure

The national expansion of Sing & Grow involved establishing a new service structure over a 6 month period. This required: forming partnership arrangements with eight state and territory Playgroup Associations; developing links and referral mechanisms with local service providers; recruiting and training clinicians; and establishing appropriate management, administration and communication systems. The clinical team expanded from 1.5 full-time equivalent and 6 casual staff based in Brisbane, to 4.5 full-time equivalent and 33 casual staff employed nation-wide.

Programs were offered through a variety of community and government agencies including local government and non-government family support services agencies; residential programs (e.g. for drug rehabilitation); and state or federal government funded specialist services (e.g. for families of children with a disability). There were five main types of clients: families facing general social and economic disadvantage; young parents; parents of a child with a disability; Indigenous families; and non-Australian born or refugee families. A small number of ‘other’ programs were provided to parents with a mental illness, drug or alcohol dependence, mothers in prison, and families experiencing domestic violence or under the child protection system.

Sing & Grow was provided as a stand-alone program. It was offered to the service agency’s clients as an addition to other services or supports that parents may have been receiving. Program content did not vary by client group or by site. Session structure and the main skills practiced each week remained constant, although the specific musical activities could be varied to be suitable for younger and older groups of children. The intervention was delivered by 39 music therapy clinicians (10–12 per site) who were trained by their State Directors in the therapeutic approach, the session structure and activities (manualized), the nature of client needs, and administration and evaluation procedures. State Directors conducted site visits at least once for each program to provide supervision and ensure consistency in implementation and quality of delivery. Twenty-seven clinicians completed satisfaction surveys at the end of the project (69.2%) with no evidence of variations in participation by site.

Data Analysis

Baseline parenting and child outcomes were compared across sites using analysis of variance or chi-square tests. Correlations were conducted to identify potential covariates related to outcome measures, followed by linear regression to identify and eliminate variables not contributing to the prediction of outcomes. To avoid multicollinearity between indicators of family risk, a risk factor exposure score was computed for each family, comprising a simple count of six risk factors (young parent, single parent, Indigenous, non-English speaking background, incomplete high school education and family income from government benefits). Individual-level covariates were: child age, gender, developmental delay, baseline temperament, baseline parent mental health, parent coping, total risk factor exposure score, and the number of sessions attended. Covariates at the program and clinician level were: oversize group (more than 12 parent-child dyads); mean session facilitation score; clinician prior experience, and satisfaction with training and support.
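The risk factor exposure score described above is a simple count, which can be sketched as follows. The field names are hypothetical labels for the six risk factors, and treating a missing indicator as absent is an assumption of this sketch rather than a rule stated in the paper.

```python
# Hypothetical field names for the six risk factors listed in the text.
RISK_FACTORS = (
    "young_parent",
    "single_parent",
    "indigenous",
    "non_english_speaking_background",
    "incomplete_high_school",
    "income_from_government_benefits",
)


def risk_exposure_score(family):
    """Count how many of the six risk factors are flagged for a family.

    Assumption: an indicator missing from the record is treated as absent.
    """
    return sum(1 for factor in RISK_FACTORS if family.get(factor, False))


# Invented example record, not study data.
example_family = {
    "young_parent": True,
    "single_parent": False,
    "indigenous": False,
    "non_english_speaking_background": False,
    "incomplete_high_school": True,
    "income_from_government_benefits": True,
}

score = risk_exposure_score(example_family)
print(score)
```

Collapsing the six indicators into one count yields a single covariate ranging from 0 to 6, which avoids entering several correlated risk indicators in the same model.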

The effects of site of implementation on parenting and child outcomes were examined using multilevel modeling (Raudenbush and Bryk 2002) conducted with SPSS Version 15.0 (SPSS Inc 2006) to account for the clustering of individuals within groups. The analyses took account of issues and assumptions in the use of multilevel analyses, including the distributional properties of the data (Tabachnick and Fidell 2007). With samples of 500 or more, regression models have been demonstrated to be robust to even extreme violations of normality assumptions (Lumley et al. 2002). Given the number of comparisons undertaken, a conservative p value of .01 was selected to indicate statistical significance.

Results

Programs were provided for 1611 parents of typically developing children aged birth to 3 years or children with developmental disabilities up to age 5 years. Research consent and pre data were provided by 1354 (84.5%) parents, with post data provided by 850 (62.8% of those with pre data). As the level of missing data at post precluded multiple imputation, analyses were restricted to participants with data at pre and post. Compared to all those who attended Sing & Grow programs, parents who provided complete data were significantly older (by 7 months, p < .0005), less likely to be Indigenous (5.2% vs. 7.7%, p < .0005) and were more likely to have a female child (51.3% vs. 48.1%, p < .005). While statistically significant, the magnitude of these differences was small. Site of implementation was not associated with participation in the research or the completion of questionnaires at post.

Site Differences at Baseline

As expected, there were differences between sites in the characteristics of participants (Table 1). A higher proportion of programs were provided in Site 1 to children with a disability and Indigenous families, and in Site 2 to multicultural families. Sites 3 and 4 had higher proportions of parents with incomplete high school education. Parents in Site 3 reported more psychological symptoms, poorer coping and greater exposure to risk factors. Participants attended an average of 7.1 sessions, with no significant differences between sites.

Table 1 Sample characteristics by implementation site

Session facilitation varied across sites (Table 2) and was rated more positively for programs conducted in Site 2 than other locations. With 39 clinicians across the four sites there was insufficient power to test for statistically significant differences in clinician characteristics by site. However, the proportion of experienced clinicians was lowest in Site 1 due to the use of supervised graduate students as clinicians. Site 1 clinicians were the most satisfied with the training and supports provided, and Site 4 clinicians were the least satisfied.

Table 2 Program and clinician characteristics by implementation site

There were differences between sites on three baseline observational measures. Parents in Site 4 were rated as having significantly poorer levels of engagement with their child at pre than parents in all other sites and their children were rated as displaying lower interest in group activities. Children from Sites 2 and 4 were observed to be less socially competent in the sessions than those from Sites 1 and 3. No site differences were found for the baseline parent-reported measures of parenting and child development.

Changes in Parenting and Child Behavior from Pre to Post

Differences between sites in changes over time were evaluated using multilevel modeling procedures separately for each outcome measure. The first level was repeated measurements within individuals, the second level was individuals and the third level was program groups. For each outcome a series of nested models were conducted using maximum likelihood (ML) methods of estimation, with improvement in model fit assessed by change in the χ2 statistic (Tabachnick and Fidell 2007). After examination of the intercepts only model, site and time were entered (unadjusted model), with an interaction term allowed to determine whether time-related changes varied across sites. Significant time effects indicated a change in outcomes from pre to post (Research Question 1). When significant site or interaction effects were found, this indicated differences in outcomes across sites (Research Question 2) and further modeling was undertaken. To examine possible causes of site differences (Research Question 3), covariates were entered in two sets, with individual characteristics and number of sessions attended entered first, and program and clinician characteristics entered second. Results are presented for clinician observations of parent behaviors and child behaviors first then for the parent-reported measures.
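One way to write the unadjusted three-level model and the nested-model comparison described above is sketched below. The notation is an assumption made for illustration: it supposes random intercepts only at the individual and program-group levels, and dummy-coded site indicators with Site 1 as the reference category, neither of which is specified in the text.

```latex
% Level 1: occasion t (pre, post); level 2: individual i; level 3: program group j.
% Assumed: random intercepts only; site dummies s = 2..4 with Site 1 as reference.
\begin{align}
Y_{tij} &= \beta_0 + \beta_1\,\mathrm{time}_{t}
        + \sum_{s=2}^{4} \beta_{2s}\,\mathrm{site}_{sj}
        + \sum_{s=2}^{4} \beta_{3s}\,\bigl(\mathrm{time}_{t} \times \mathrm{site}_{sj}\bigr)
        + v_{j} + u_{ij} + e_{tij}, \\
v_{j} &\sim N(0, \sigma_v^2), \quad
u_{ij} \sim N(0, \sigma_u^2), \quad
e_{tij} \sim N(0, \sigma_e^2), \\
\Delta\chi^2 &= -2\,\bigl(\ln L_{\text{reduced}} - \ln L_{\text{full}}\bigr), \qquad
\mathrm{df} = p_{\text{full}} - p_{\text{reduced}}.
\end{align}
```

In this notation, a significant $\beta_1$ corresponds to pre to post change (Research Question 1), and significant $\beta_{3s}$ terms indicate that the change differs by site (Research Question 2). The covariate sets for Research Question 3 would enter as additional fixed effects, with each nested model compared to the previous one using $\Delta\chi^2$ under maximum likelihood estimation.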

From pre to post there were significant improvements on all three clinician-reported measures of parent behavior (Table 3). For clinician-reported parental sensitivity and engagement, the unadjusted models revealed significant site effects, and for all three behaviors, significant time and site-by-time interactions indicated that the extent of pre to post change varied by site. Addition of the individual characteristics improved model fit for all measures, but all significant site, time and interaction effects remained. Addition of program and clinician characteristics provided a further improvement in model fit for parental sensitivity and engagement, eliminated the site effects and reduced the interaction to a marginal level. The site-by-time interaction remained significant for parental acceptance, and across all three behaviors the time effect remained significant.

Table 3 Model parameters for observed parenting and child behavior measures

In summary, these analyses indicated that site of implementation was associated with differential improvements from pre to post in clinicians’ ratings of parents’ behaviors. Adjustment for the characteristics of parents receiving the programs in different sites partly accounted for these differences. Addition of clinician characteristics and program quality ratings fully accounted for the remaining site differences for parental sensitivity and engagement. For parental acceptance, change over time continued to differ according to site, with smaller improvements over time reported for parents attending programs in Sites 3 and 4. Overall, the fully adjusted models indicated that observed parenting behaviors were rated more positively (0.65–0.70 points higher on the 5-point rating scales) at post compared to pre (Table 3).

The three clinician-reported child behaviors also improved from pre to post (Table 3). For all measures, the unadjusted models showed significant site, time and interaction effects. Addition of the individual characteristics improved model fit but significant site, time and interaction effects remained. Addition of program and clinician characteristics provided a further improvement in model fit and eliminated the site effects for child responsiveness and sociability. The site effect for child’s level of interest in program activities remained. The site-by-time interactions were marginal for responsiveness and sociability but remained for child interest. Time effects continued to be highly significant for all measures.

Similar to the analyses for observed parenting behaviors, these analyses indicated that site differences in participant characteristics accounted for only some of the differential improvements from pre to post in children’s observed behaviors, with clinician and program characteristics accounting for much of the remaining difference. In the fully adjusted model for child interest, change over time continued to differ significantly according to site, with smaller improvements for children attending programs in Sites 3 and 4. Across the observational measures of child behavior the fully adjusted models indicated that children’s behaviors were rated more positively (by 0.60–1.01 points on the 5-point rating scales) at post compared to pre.

Changes over time for the parent-reported measures of parenting and child behavior were initially examined in a similar way. However, the unadjusted models revealed no site effects or site-by-time interactions for any of the parent-reported measures of parenting or for child receptive communication skills. For child social play skills the interaction effect was just significant at p = .01. Therefore only the unadjusted models are presented (Table 4). Significant improvements, similar across sites, were reported from pre to post on all parent-report measures. For parenting warmth and irritability, changes of 0.03 and 0.04 points in a positive direction were obtained on the 5-point rating scales. For parent activities, an improvement of 0.01 was obtained on a 4-point rating scale. Child communication and social play skills were rated on 3-point scales and improvements were 0.09 and 0.19 of a point, respectively.

Table 4 Model parameters for parent-reported parenting and child behavior measures

Discussion

This study provides new information about the extent to which an early parenting intervention for parents of young children at developmental risk is associated with gains from pre to post intervention when rapidly taken to scale. Consistent with hypothesis a, results indicated positive improvements from pre to post intervention for clinician- and parent-reported measures of parenting and child development. However, the absence of control group or independent blinded assessment data precludes conclusions about the effectiveness of music therapy as an early parenting intervention, as the effects of maturation and parent or clinician expectancies cannot be excluded.

The study provided limited evidence that changes in outcomes from pre to post varied according to site of implementation (hypothesis b). For the parent-reported outcome measures, there was no evidence of differential effects, suggesting that parents perceived similar changes over time, irrespective of implementation processes. In contrast, for all clinician-reported measures, gains from pre to post were related to site of implementation, with site differences persisting after adjustment for differences between sites in the characteristics of participants receiving programs. Across these measures, Site 2, where the implementation process had previously been reported by Playgroup and Sing & Grow managers to have proceeded smoothly, demonstrated a similar magnitude of change over time to that achieved in Site 1, where the program was well-established. In Site 3, where the implementation process had previously been evaluated less positively, participants were rated by clinicians as showing comparatively less change over time than those in either Site 1 or Site 2. However, in Site 4, which also had less favorable implementation conditions, ratings of participant changes were similar to Sites 1 and 2 on some outcome measures, and resembled Site 3 on other measures. As predicted by hypothesis c, adjustment for participant, program and clinician factors reduced site differences, although they remained statistically significant for two of the six clinician-reported measures.

Results from the analyses of clinician-reported measures could be interpreted as providing evidence that implementation processes influenced the effectiveness of this intervention. In support of this interpretation, the patterns reported were broadly consistent with our hypothesized differences predicting greater gains for participants in Sites 1 and 2 compared to Sites 3 and 4. Results from the fully adjusted models suggested that at least part of these differences was accounted for by program and clinician characteristics, although these findings should be viewed with caution due to the confounding by measurement source. An alternative possibility is that the clinicians working in sites where implementation was well-supported were more likely to perceive benefits for their clients than those in less well-supported sites.

Implications for the implementation of early parenting interventions depend on judgments about the relative quality of the parent- vs. clinician-reported data. The parent data indicate that this intervention is robust to variations in implementation process. In contrast, the clinicians’ data in combination with our findings from the earlier qualitative interviews with Playgroup and Sing & Grow managers suggest that several improvements could be made which may benefit participant outcomes. These include improved communication between program management and state-based Playgroup Associations, allowing more time to develop shared partnerships and perspectives about how the new program would fit within existing structures and practices, employment of locally-based program managers and increased resourcing for staff training and supports.

A strength of this implementation was its consistent application of funding, staff recruitment and quality assurance across sites, as previous studies have attributed poorer outcomes in new sites to factors such as reduced funding levels, use of less qualified staff, and lack of monitoring and accountability (Gray and Francis 2007; Takanishi and Bogard 2007). The study’s limitations reflect the difficulties encountered in real world evaluations (Bamberger et al. 2004), including the use of brief measures that may restrict measurement sensitivity, and the lack of independent blinded assessments of outcomes, measures of clinician skills, and control group data. Additionally, one third of families who commenced the research failed to participate at post, with younger parents, parents of boys and children from Indigenous backgrounds under-represented amongst those with complete data, thereby restricting generalisability.

The evaluation of music therapy parenting interventions is a new area of scientific inquiry and quality research is sparse. While the program examined here was backed by several years of research, this was largely descriptive. At the time of funding, there was not sufficient evidence that music therapy parenting interventions were ready for dissemination as an evidence-based approach. This history reflects the nature of priorities within policy and service provision contexts. Programs that get funded are those that are efficient, feasible, politically acceptable and low risk (Rychetnik and Wise 2004), and the value of control group data may not be apparent (Bamberger et al. 2004). This study provides some additional support for the potential of music therapy as an early parenting intervention, and further emphasizes the need for randomized controlled trials.