Introduction

Developmental scientists investigating processes of parenting frequently rely upon parent-report assessments despite well-documented limitations of this methodology (Morsbach and Prinz 2006). While parent-report measures have strengths in certain instances, it is generally accepted within the field that observational measures of parenting behaviors and parent-child relationships are superior in criterion and predictive validity. However, observational measures are considerably more costly and time-consuming for both researchers and participants (Aspland and Gardner 2003; Margolin et al. 1998; Morsbach and Prinz 2006; Zaslow et al. 2006). As such, understanding precisely how these different methods of assessing parenting behaviors correspond with and differ from each other holds value for advancing the study of parenting. Among the relatively few studies that report associations between methods, correlations between observational and parent-report measures are inconsistent and, at best, low to moderate. These inconsistencies suggest a more complex relation between observational and parent-report measures that may involve moderation by individual parent and family factors.

Asking parents directly about their parenting has notable benefits, such as convenience, efficiency, and parents’ first-hand experience of the full range of interactions that make up the relationship. Parents’ report of the nature and quality of their daily interactions with children has superior ecological validity compared to laboratory tasks (Gardner 2000). Self-report is most appropriate when assessing parenting attitudes and beliefs, which are inherently subjective. Utilizing self-report for parenting behaviors, however, introduces systematic sources of bias (Gardner 2000). Like other cases of self-report, parents’ reports of their own parenting can be influenced by individual differences in interpretation, memory, and perceptions of both the questionnaire content and the behavior being sampled (Lovejoy et al. 1997; Morsbach and Prinz 2006). Furthermore, social desirability can motivate parents to endorse items generally considered to represent “good parenting,” such as praising or giving affection to children, and avoid items considered “bad parenting,” such as criticizing or spanking children, regardless of their actual behavior (Lovejoy et al. 1997; Sessa et al. 2001). How parent-report measures are constructed, including their syntax, grammar, and vocabulary, can also influence responding (Morsbach and Prinz 2006).

Compared to the measurement issues involved in self-reported parenting behaviors, ratings of observed parenting behaviors by trained researchers are considered more reliable. While observer biases can emerge, the measurement error is more likely to be random than that of parent-report and more practically addressed through careful development of coding schemes and criteria for inter-rater reliability (Gardner 2000; Repetti et al. 2015). Particularly when parent-child interactions are conducted under structured conditions with task demands that mimic everyday situations, such as having parents and children play together or clean up toys, raters can assess parenting behaviors according to carefully defined coding schemes. Thus all parents within a sample are scored against the same criteria. Observational measures generally show better concurrent and predictive validity in relation to other methods of assessing parenting (e.g., child report) and to child outcomes (Gonzales et al. 1996; Scott et al. 2011; Zaslow et al. 2006). Among different types of observational coding schemes, those targeting small units of behavior with limited inference, or micro-social approaches, are less susceptible to halo effects and other rater bias than more global coding schemes (Alexander et al. 1995). In a separate analysis using the current sample, we demonstrated that micro-social observational coding of the parent–child relationship outperformed global observational codes in predicting teacher reports of child behavior (Bardack et al. 2017).

Of course, conducting observational assessments places greater demands on participating parents and children, who engage in the interaction tasks while being video-recorded, and especially on research teams, who must train to reliability and code the data. Observational methods are not always feasible, especially with large samples. As such, the field would benefit from a better understanding of the patterns and moderators of concordance between widely used self-reports of parenting and more costly ratings of parenting behaviors observed directly.

Multi-method studies of parenting that report direct comparisons of measures are not well represented in the literature (Hurley et al. 2014; Morsbach and Prinz 2006; Winsler et al. 2005). Self-report and observational measures of parenting show inconsistent concordance, with correlations ranging from non-significant (Dekovic et al. 1991; Sessa et al. 2001) to low-to-moderate (Hawes and Dadds 2006; Kochanska et al. 1989; Lui et al. 2013; Scott et al. 2011; Zaslow et al. 2006). Evidence of associations between self-report and observational measures is reported more often in multi-method assessments of negative rather than positive indices of parenting. Associations may be weaker in positive compared to negative indices due to greater social desirability of positive parenting items that may restrict the range of measurement (Hawes and Dadds 2006; Scott et al. 2011; Zaslow et al. 2006). Validity data for self-reported parenting measures commonly consist only of comparison to other self-reported measures of parenting, parental symptoms, or child emotional and behavioral problems (Hurley et al. 2014). As such, relatively little is known regarding predictors of agreement between observational and self-reported measures of parenting.

Relevant findings can be found in the studies that have examined validity of parent report of children’s behavior problems. In particular, concerns about the influence of parents’ depression symptoms on their reports of children’s externalizing and internalizing problems, often referred to as depression-distortion, have prompted studies examining these links since at least 1967. In 1992, Richters described inconsistencies in the literature regarding whether parents with depressive disorders or higher scores on continuous scales of depression symptoms would report exaggerated behavior problems in their children. He also attributed some of the inconsistency to important methodological shortcomings; many studies failed to include alternative methods of assessing child behavior problems for comparison, and others that did use criterion constructs from observational methods or other reporters had poor conceptual matching of the constructs from the differing methods (Richters 1992). Richters thus recommended improving future studies by considering whether symptoms of depression correlate with disagreement between methods as the outcome of interest, and carefully matching constructs in parent-report and criterion measures.

Following his recommendations, researchers have since produced more rigorous evidence suggesting that symptoms of depression in parents do explain some discrepancies between parent-report and alternative methods of assessing child behavior problems (Briggs-Gowan et al. 1996; De Los Reyes and Kazdin 2006; Durbin and Wilson 2012; Fergusson et al. 1993; Gartstein et al. 2009; Harvey et al. 2013; Leerkes and Crockenberg 2003; Najman et al. 2001). In the majority of studies that have considered depression-distortion, internalizing symptoms of parents were associated with reporting more problems for children relative to other reporters or observer ratings.

Given evidence that parent-report can be biased by maternal depression, it is surprising that similar work regarding parent report of parenting behavior and parent-child relationships is scarce. The inconsistent pattern of concordance between parent report and observational or other methods of assessing parenting likely reflects a more complex relation than a single bivariate correlation can capture. Parent distress and family SES are especially relevant to parenting quality and tend to inter-correlate, and thus may moderate the strength of the concordance between parent report and observational measures of parenting. Specifically, parents who report symptoms of depression or distress may perceive their own parenting as more problematic than it actually is, similar to the distortion hypothesis regarding child behavior. Likewise, parents from different socioeconomic backgrounds may interpret questionnaire items differently, such that some items considered socially desirable or undesirable among families of higher SES might not have the same meaning for lower SES families (Hoff et al. 2002).

Few studies have considered possible sources of distortion in parenting measures. In a clinical sample of families with children diagnosed with ADHD, parents who endorsed more symptoms of depression reported their parenting as more negative than their children did (Chi and Hinshaw 2002). Also among parents of children with ADHD, those with more of their own symptoms of hyperactivity and impulsivity tended to rate their parenting more positively than observers did (Lui et al. 2013). Sessa and colleagues (2001) found that ethnicity moderated the relation between mother and observer report of parenting within a diverse community sample of preschool-aged children such that agreement with observer ratings was lower among African American mothers than Caucasian mothers. In a recent cluster analysis of observational and parent-report parenting measures for over 400 demographically diverse dyads with kindergarten-aged children, there emerged one subtype of parents whom authors labeled “self-critical” because parent-report scores were below average while observational scores were average (Heberle et al. 2015). Compared to the other four subgroups, the self-critical parents had higher mean scores on measures of depression and anxiety.

Concerns regarding how accuracy of parent-report may differ by SES also have been raised but not investigated (Hoff et al. 2002). Specifically, established differences in aspects of parenting by SES, such as lower SES parents generally demonstrating less self-efficacy regarding parenting and a tendency to be more controlling and punitive, could have implications for measurement agreement (Bradley and Corwyn 2002). Lower SES parents could evaluate impacts of certain parenting behaviors differently than observational rating schemes developed by researchers, judging them more negatively or positively. Research has shown that parents of higher SES backgrounds tend to have parenting attitudes that align more closely with their actual parenting behaviors and more closely with prevailing expert advice than do parents of lower SES (Hoff et al. 2002). These potential sources of measurement disagreement based on SES have not been tested empirically.

The sparse and inconsistent literature investigating patterns of concordance between parent-report and observational measures of parenting leaves much to be explained. As with considerations of parent-report of child behavior problems, it is likely that aspects of measurement and factors that differ between families contribute to these discrepancies. Understanding the role of factors that are relevant to parent-child relationships, particularly parents’ symptoms of emotional distress and family SES, may clarify the utility and limitations of parent-report for assessing the quality of parent–child relationships.

In the current study, we address the substantial gap in understanding the patterns of concordance between self-report and observational measures of parenting using data from a community sample of kindergarten-aged children and their parents. With detailed, micro-social coding of observational data from structured laboratory parent–child interaction tasks and two parent-report questionnaires, we carefully matched observational criterion constructs of two particular aspects of parenting, negative control and positive control, to parent-reports of similar constructs. Our sample was relatively diverse in family backgrounds including a range of SES and a range of parent-reported distress, enabling us to test whether these factors known to be associated with parenting quality moderated the concordance between observational and self-report measures.

Method

Participants

Participants included 102 children ranging in age from 4–6 years (M = 5.61, SD = 0.56) and their primary caregivers, the majority of whom were mothers (93%). Families indicated their interest in participating by responding to advertisements at community centers, preschools, elementary schools and libraries in the San Francisco Bay Area. The sample represented a range of income levels and ethnic backgrounds. Children were 52% male and 64% ethnic minority (26% Hispanic or Latino, 20% Asian, 14% Multiracial/Other, and 4% Black). Seventeen percent of parents reported being single parents, and regarding educational background, 13% reported having a high school degree or less, 36% reported having an associate’s or bachelor’s degree, and 42% a more advanced degree. Family income levels ranged from less than $12,000 (5.2%) to $200,000 or more (36.1%), reflecting the high cost of living in the Bay Area.

Procedure

Assessments involved a 3-hour protocol with concurrent individual child and parent sessions followed by a series of structured parent–child interaction tasks. Parents completed interviews and questionnaires regarding family demographics, parenting practices, and child behavior and functioning. Both parents and children participated in behavioral tasks and physiological assessments not reported in the current analyses (Obradović et al. 2016). The parent–child interaction consisted of four structured tasks intended to elicit parenting behaviors in a variety of situations. First, parents were instructed to play with their children while enforcing a rule that children could play with toys from only one shelf while not touching a set of attractive toys from another shelf. Next, parents were given a magazine and instructed to read while asking their children to clean up the toys. In a problem-solving task, parents and children discussed a problem that parents had previously identified as a salient issue for their family. Common problems for discussion were following rules and getting along with brothers and sisters. Finally, parents were instructed to help children complete a series of geometric puzzles, using Tangoes cards and blocks. Interactions were video-recorded for observational coding.

Measures

Parent behavior and child behavior during the interaction tasks were evaluated separately by two raters trained to reliability by a master rater who also coded 20% of the cases for child behavior and a separate 20% for parent behavior. All parent behavior was categorized on a second-by-second basis into one of four mutually exclusive codes: positive control, negative control, following the child’s lead, and disengaged. Positive control involved behaviors such as giving instructions, teaching, redirecting, praise, and limit-setting when accompanied by a positive or neutral affective tone. Negative control involved criticisms, hostility, physical interventions, or any other behaviors accompanied by a negative or harsh affective tone. Behaviors considered to be following the child’s lead were active listening, reflecting, and other behaviors in which the parent was engaged with the child and positive or neutral in tone but not directing the child in any way. The disengaged code applied when the parent was distracted, withdrawn, looking away, or otherwise not interacting with the child.

Similarly, all child behavior was categorized as active on-task, passive on-task, withdrawn, or defiant/dysregulated. Child behavior was considered active on-task when the child was actively participating in task activities by speaking to the parent or leading play through actions such as manipulating materials. Passive on-task behaviors occurred when the child was appropriately engaged but not leading, such as listening to the parent with eye contact or following specific directions. Withdrawn was coded when the child was distracted, avoiding eye contact or turning away, or otherwise disengaged from both the parent and the task at hand. Finally, defiant/dysregulated behaviors included disobedience, any hostility or aggression directed towards the parent or materials, whining, and displays of negative affect. Observer accuracy was calculated based on the kappa statistic and observed base rates of behavior in the sample (Bruckner and Yoder 2006). All parent and child codes had observer accuracy scores above 90%.
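To illustrate the agreement statistic underlying this reliability step, the following Python sketch computes Cohen's kappa for two raters' second-by-second codes. It assumes each session is stored as a list with one code label per second for each rater; the labels and the function name are hypothetical. The sketch stops at kappa itself and does not reproduce the observer-accuracy estimate of Bruckner and Yoder (2006), which further combines kappa with the observed base rates of each code.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical codes.

    Hypothetical helper: each element is the code one rater assigned to one second.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of seconds both raters coded identically.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement implied by each rater's marginal base rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_exp == 1.0:
        # Both raters used one identical code throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)
```

For example, if one rater codes six seconds as positive control and four as negative control, and a second rater codes five of each, observed agreement is .90 and chance agreement is .50, so kappa is .80.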

Negative control (observed)

Our variable for observed negative control was calculated as the proportion of time during the parent-child interaction that the parent was coded as using negative control as described previously, regardless of the corresponding child’s behavior. The proportion of total interaction time parents used negative control ranged from .00 to .17, with M = .02 and SD = .03.
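The following is a minimal sketch of this calculation. It assumes the coded interaction is stored as one parent code per second, concatenated across all four tasks. That representation and the function name are hypothetical. Under this assumption, observed negative control reduces to a simple proportion of coded seconds.

```python
def proportion_coded(parent_codes, target="negative"):
    """Share of coded seconds in which the parent received the `target` code.

    Hypothetical helper: parent_codes holds one mutually exclusive code per second.
    """
    if not parent_codes:
        raise ValueError("no coded seconds")
    return sum(code == target for code in parent_codes) / len(parent_codes)
```

A parent coded as using negative control for 2 of 100 seconds would receive a score of .02, the sample mean reported above.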

Negative parenting (self-report)

The composite for self-report of negative parenting was based upon two subscales of the Parent Practices Interview (PPI; The Incredible Years Project 2011): harsh and inconsistent discipline (15 items, α = .73) and physical punishment (6 items, α = .70). Parents responded to items such as “how often do you raise your voice (scold or yell)” and “how often does your child get away with things that you feel s/he should have been disciplined for?” using a seven-point scale ranging from never to always. Item-level scores from the two subscales were averaged to create a composite score of self-reported negative parenting (21 items, α = .74).
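The item-level composite and the internal consistency estimate can be sketched in Python as follows. The function names are hypothetical. The sketch assumes items are already scored in a common direction, and it uses sample variances throughout. Cronbach's alpha is computed as k/(k − 1) × (1 − sum of item variances / variance of total scores).

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of per-item score lists.

    Every inner list covers the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent across all k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def composite_score(responses):
    """One respondent's composite: the mean of item-level scores pooled across subscales."""
    return mean(responses)
```

Averaging items rather than subscale scores weights each of the 21 items equally, so the 15-item harsh and inconsistent discipline subscale contributes more to the composite than the 6-item physical punishment subscale.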

Positive control (observed)

We calculated observed positive control to represent the proportion of each child’s total time off-task that the parent was concurrently coded as using positive control. In contrast to negative control behaviors that we consider insensitive regardless of child behavior, we assert that positive control behaviors are sensitive in response to children’s withdrawn or dysregulated behavior but not necessarily sensitive when a child is actively or passively on-task and constructively self-directed. Scores for the proportion of children’s off-task time with observed positive control ranged from .18 to .79, with M = .50 and SD = .14.
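The following Python sketch shows how this conditional proportion might be computed. It assumes that parent and child codes are aligned second by second, and that "off-task" means the child was coded as withdrawn or defiant/dysregulated, consistent with the rationale above. The code labels and function name are hypothetical. Returning None when a child is never off-task is this sketch's assumption; the text does not describe how such cases were handled.

```python
# Hypothetical child code labels treated as off-task.
OFF_TASK = {"withdrawn", "defiant"}


def positive_control_when_off_task(parent_codes, child_codes):
    """Share of the child's off-task seconds with concurrent parent positive control."""
    if len(parent_codes) != len(child_codes):
        raise ValueError("parent and child codes must be aligned second by second")
    # Keep only the parent codes from seconds in which the child was off-task.
    off_task_parent = [p for p, c in zip(parent_codes, child_codes) if c in OFF_TASK]
    if not off_task_parent:
        return None  # child never off-task: proportion undefined
    return sum(p == "positive" for p in off_task_parent) / len(off_task_parent)
```

Conditioning on the child's behavior means that two parents with identical overall rates of positive control can receive different scores, depending on whether that control coincided with the child's off-task moments.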

Positive parenting (self-report)

Parents completed the Coping with Child’s Negative Emotions Scale (CCNES; Fabes et al. 1990). The CCNES includes six subscales, three of which correspond conceptually with positive control when a child is off-task: expressive encouragement (12 items, α = .89), emotion-focused reactions (12 items, α = .77), and problem-focused reactions (12 items, α = .77). Each item is rated on a seven-point scale ranging from very unlikely to very likely. We created a composite of positive parenting as an average of the three subscale scores (α = .73).

Parent distress

Parents’ emotional distress was assessed via self-report using the Center for Epidemiological Studies-Depression Scale (CES-D; Radloff 1977). The CES-D consists of 20 items each with a four-point response scale ranging from rarely or none of the time to most or all of the time in the past week. Sample items include “I was bothered by things that usually don’t bother me,” “I felt depressed,” and “I felt sad.” The total score was calculated as the average of responses to all 20 items (α = .89).

Family SES

A composite score representing family socioeconomic status consisted of the average of standardized z-scores computed from parent responses to questionnaire items regarding total family income (raw score M = $125,041, SD = $72,701) and the primary caregiver’s education in years (raw score M = 16.3, SD = 3.01).
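The following is a minimal sketch of the SES composite. The function names are hypothetical. The sketch standardizes each indicator using the sample mean and sample standard deviation, an assumption because the text does not state which standard deviation was used, and then averages the two z-scores for each family.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize values using the sample mean and sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ses_composite(incomes, education_years):
    """Average of each family's income and education z-scores."""
    z_inc, z_edu = zscores(incomes), zscores(education_years)
    return [(a + b) / 2 for a, b in zip(z_inc, z_edu)]
```

Standardizing first places income (measured in dollars) and education (measured in years) on a common scale, so that neither indicator dominates the composite simply because of its units.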

Data Analyses

Complete data were available for child age, gender, and the composite of family SES. Small amounts of data (2–4%) were missing for the following: self-reported positive and negative control, observed positive and negative control, and parent distress. Chained multiple imputation was implemented using the mi impute command in Stata 14 to generate 20 complete datasets. Bivariate correlations and hierarchical regression analyses were conducted on each of the 20 datasets, and results including coefficients, standard errors, and R² values were combined.
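The text does not spell out the combination rules. The standard approach for pooling a coefficient and its standard error across imputed datasets is Rubin's rules, sketched below in Python. The function name is hypothetical. The sketch covers a single coefficient and omits the pooling of R² values.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need matching estimates and SEs from at least two imputations")
    q_bar = mean(estimates)                     # pooled point estimate
    u_bar = mean(se ** 2 for se in std_errors)  # average within-imputation variance
    b = variance(estimates)                     # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b         # total sampling variance
    return q_bar, sqrt(total_var)
```

With m = 20, the (1 + 1/m) term inflates the between-imputation variance slightly. This inflation is what allows the pooled standard error to reflect uncertainty due to the missing data rather than only the sampling error within each completed dataset.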

Results

Associations between Self-Reported and Observational Parenting

Bivariate correlations of all study variables are presented in Table 1. The association between self-reported negative parenting and observed negative control was not significant, nor were the associations between self-reported and observational measures of positive parenting. Parent distress correlated significantly with negative parenting based on self-report (r = .29, p < .01) as well as negative control based on observation (r = .39, p < .001). Correlations between parent distress and each method for assessing positive parenting were not significant. Household SES was negatively correlated with self-reported positive parenting (r = −.27, p < .01) and observed negative control (r = −.49, p < .001), but not with self-reported negative parenting (r = −.03, p = .73) or observed positive control (r = .15, p = .13). While child gender was associated with observed positive control, neither age nor gender influenced the regression results described next.

Table 1 Bivariate correlations

Moderation by distress and SES

To test whether the association between self-reported and observational measures of parenting was moderated by parent distress or household SES, we conducted four separate hierarchical linear regression models (see Fig. 1). All predictor variables were mean-centered to aid interpretation. Model 1a considered negative parenting moderated by parent distress. The first step included self-reported negative parenting, parent distress, family SES, and covariates of child age and gender predicting observed negative control. In the second step, we added the interaction term of self-reported negative parenting and parent distress. Model 1b considered negative parenting moderated by family SES in the second step. Models 2a and 2b followed the same procedure but with the measures of positive parenting in place of negative parenting. When interaction terms showed trends or significance, we probed interactions with post-hoc analyses to assess significance of simple slopes (Preacher et al. 2006).
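The two-step procedure can be sketched with ordinary least squares in NumPy. This is an illustrative reimplementation, not the authors' Stata code; the function names are hypothetical. The sketch mean-centers the predictors, fits the main-effects step, adds the self-report by moderator product, and returns the change in R². In Model 1a, for example, `moderator` would be parent distress and `covariates` would hold family SES, child age, and gender.

```python
import numpy as np


def r_squared(y, X):
    """R^2 from an OLS fit of y on X; an intercept column is added here."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def moderation_step(observed, self_report, moderator, covariates):
    """Return (R^2 step 1, R^2 step 2, delta R^2) for adding self_report x moderator."""
    sr = self_report - self_report.mean()           # mean-center predictors
    mod = moderator - moderator.mean()
    cov = covariates - covariates.mean(axis=0)      # covariates: n x k array
    step1 = np.column_stack([sr, mod, cov])         # main effects only
    step2 = np.column_stack([step1, sr * mod])      # add the interaction term
    r2_1, r2_2 = r_squared(observed, step1), r_squared(observed, step2)
    return r2_1, r2_2, r2_2 - r2_1
```

Mean-centering before forming the product means that each main-effect coefficient in step 2 describes the association at the average level of the other predictor rather than at zero.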

Fig. 1 Interaction effects from hierarchical regression models (OB observed, SR self-report). + p < .10; ***p < .001

Negative control

Results from the models predicting observed negative control are presented in Table 2. In the first step with main effects only, self-reported negative parenting was not a significant predictor (β = −.16, p = .12). SES was a significant predictor (β = −.36, p < .01) and parent distress showed a trend towards significance (β = .24, p = .05).

Table 2 Hierarchical regressions predicting observed negative control

In Model 1a, the interaction between self-reported negative parenting and distress emerged as significant in the second step (β = −.27, p = .02) and accounted for an additional 6.7% of the variance in observed parenting. The negative coefficient indicates that concordance between self-reported negative parenting and observed negative control was lower among parents who reported more distress. The simple slope of observed negative control on self-reported negative parenting was significant and negative one standard deviation above the mean of distress (b = −.03, p < .001), but non-significant one standard deviation below the mean (b = .01, p = .12).
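Probing an interaction in the manner of Preacher et al. (2006) amounts to evaluating the slope of the outcome on self-report, b1 + b3 × z, at chosen values z of the centered moderator. The standard error of that slope is sqrt(var(b1) + 2z·cov(b1, b3) + z²·var(b3)). The following NumPy sketch computes this at ±1 SD of the moderator. The function name is hypothetical. The sketch returns slopes, standard errors, and t ratios; p-values would require the t distribution with n − p degrees of freedom, which is not computed here.

```python
import numpy as np


def simple_slopes(y, x, z, covariates=None):
    """Slope of y on x at -1 SD and +1 SD of moderator z, with SEs and t ratios."""
    x_c, z_c = x - x.mean(), z - z.mean()
    cols = [np.ones(len(y)), x_c, z_c, x_c * z_c]   # b0, b1, b2, b3
    if covariates is not None:
        cols.append(covariates - covariates.mean(axis=0))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = (resid @ resid) / (n - p)              # residual variance
    vcov = sigma2 * np.linalg.inv(X.T @ X)          # coefficient covariance matrix
    sd_z = z.std(ddof=1)
    results = {}
    for label, z0 in (("-1 SD", -sd_z), ("+1 SD", sd_z)):
        slope = beta[1] + beta[3] * z0
        se = np.sqrt(vcov[1, 1] + 2 * z0 * vcov[1, 3] + z0 ** 2 * vcov[3, 3])
        results[label] = (slope, se, slope / se)
    return results
```

Because the moderator is centered, ±1 SD corresponds directly to the "one standard deviation above/below the mean" values reported in the text.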

For Model 1b, the interaction term for self-reported negative parenting by SES was significant (β = .27, p = .03) and accounted for an additional 7.2% of the variance in observed parenting. The positive coefficient indicates that concordance between self-reported negative parenting and observed negative control was higher among parents of higher SES. The simple slope of observed negative control on self-reported negative parenting was significant one standard deviation above the mean of SES (b = −.03, p < .05) but not one standard deviation below the mean (b = −.01, p = .13).

Positive control

There were no significant main effects of self-reported positive parenting, SES, or parent distress in the first step predicting observed positive control (see Table 3). In Model 2a, the interaction term for parent distress by self-reported positive parenting emerged as significant in the second step (β = −.21, p = .04), with a negative coefficient indicating that concordance between self-reported positive parenting and observed positive control was lower at higher levels of parent distress. The simple slope of observed positive control on self-reported positive parenting showed a trend toward significance one standard deviation above the mean of distress (b = −.05, p < .10), but not one standard deviation below the mean (b = .03, p = .36).

Table 3 Hierarchical regressions predicting observed positive control

In Model 2b, the interaction of self-reported positive parenting by SES showed a trend toward significance (β = .21, p = .08) and explained an additional 4.3% of the variance in observed positive control. Simple slopes showed a trend towards significance one standard deviation below the mean of SES (b = −.06, p = .07), suggesting that concordance was lower among families of lower SES.

Discussion

Overall, the associations between self-reported and observational measures of parenting differed by level of parent distress and family socioeconomic status. Specifically, concordance between the methods was worse for parents who reported more depression symptoms and parents of lower SES. These patterns of interaction may indicate a number of underlying issues relevant to understanding measurement of parenting. Our study examined both positive and negative aspects of parenting in contrast with much of the extant work involving only one or the other. Furthermore, we utilized detailed, micro-social observational methods in contrast with more commonly used and less precise global coding approaches (Alexander et al. 1995; Bardack et al. 2017).

Regarding parent distress, the findings may reflect a phenomenon similar to depression-distortion in the literature on child behavior problems. Symptoms of depression may influence how parents perceive not only their children’s behavior, but also their responses to and management of that behavior. As a moderator, distress does not simply predict worse self-report scores but changes the association between self-report and observed scores. Distressed parents who were rated by observers favorably might have been overly critical of themselves, as the depression-distortion hypothesis would predict. In contrast, distressed parents rated less favorably by observers tended to describe their parenting more positively than did non-distressed parents with similar observer ratings. We speculate that, among parents demonstrating less competent parenting behaviors, distress may inflate their self-perception of parenting because they are preoccupied with personal emotional distress while lacking insight for how their distress influences their parenting behaviors. This possibility warrants further empirical investigation.

Regarding family SES, differential concordance may reflect cultural differences in beliefs about parenting self-efficacy, how individuals interpret questionnaire items, and what individuals consider favorable and socially desirable parenting behavior. We found significant disagreement between parent-report and observational measures of parenting among lower SES families, such that lower SES parents who were rated more favorably by observers reported less favorable parenting. Parents from lower SES backgrounds who were observed as demonstrating more positive control may lack parenting self-efficacy and thus undervalue and underreport competence-promoting parenting (Hoff et al. 2002). On the other hand, lower SES families rated less favorably by observers reported more favorable parenting than higher SES parents rated less favorably by observers. This could reflect the tendency of some lower SES parents to engage in punitive and controlling behaviors that they may consider appropriate. Considering that most researchers come from relatively privileged backgrounds, it is likely that higher SES parents interpret measure items in a manner more similar to the interpretations of the researchers who design self-report measures and coding schemes to evaluate observed parenting. For example, warmth in response to children’s negative affect may be viewed as nurturing and supportive by researchers and parents of higher SES, but may be viewed as permissive or “spoiling” by parents of lower SES, who tend to value obedience and social conformity to a greater degree than do higher SES parents (Hoff et al. 2002). Thus, lower SES parents who demonstrate sensitivity in interactions with their children might give themselves less favorable self-reports that would conflict with observer ratings.

Relatedly, it is possible that the discordance between self-report and observational measures among parents with greater distress scores and lower SES backgrounds indicates that such parents may be less skilled at “faking good” both on self-reports and during observational tasks. However, the literature indicates that reactivity to observation is not a substantial source of measurement error in observational assessments of parenting (Aspland and Gardner 2003; Gardner 2000). Self-presentation bias is considerably more likely in self-report than observational methods (Gardner 2000; Repetti et al. 2015). As another alternative, we acknowledge the possibility that discordance could be driven by bias on the part of researchers, who may inadvertently inflate parenting scores based on apparent SES differences. However, it is unlikely that such bias accounts for the findings due to our use of micro-social momentary coding procedures, which have been shown to reduce halo effects compared to more global observational coding schemes (Alexander et al. 1995; Bardack et al. 2017). Observer bias also fails to explain why low SES parents with more favorable observational ratings would self-report their parenting in a less favorable manner compared to low SES parents with less favorable observational ratings.

It is important to note that parent distress and family SES were significantly correlated, sharing 28% of variance. As such, the moderation effects by SES could partially reflect impacts of parent distress through their common variance. Alternatively, distress could reflect effects of SES. Because distress and SES do not always co-occur, however, it is useful to understand how each factor independently interacts with self-report in predicting observational measures. Both distress and family SES can serve as markers of likely differences in measurement concordance, and one or both factors may be salient to future studies depending upon their populations of interest. While these findings highlight challenges with different methods for assessing parenting, they also offer progress for understanding and accounting for well-documented discrepancies in results from differing methods of parenting assessment.

Implications for Future Research

Results of the current study have several implications for measurement of parenting quality. First, limitations of self-reported parenting are apparent not only due to lack of significant bivariate correlations with observational measures, but also because patterns of agreement differ by highly relevant parenting covariates. We found that observational measures carefully matched to items and subscales in parent self-report measures did not correlate significantly with those self-report measures. Assuming that the observational measures function as a criterion, we infer that information provided by parents through self-report should be considered in the context of both family SES and parent distress. At minimum, when researchers are predicting outcomes such as child behavior from self-reported parenting, parent distress and SES should be measured and included as covariates. Interaction terms of self-reported parenting by distress and SES might provide additional clarity when statistical power permits their inclusion.

These patterns of disagreement between parenting measures are most salient when researchers examine parenting quality within populations that vary substantially in parent distress and family SES. As such, the issue is not limited to clinical or other high-risk samples; in fact, concordance between measures may be better in more homogeneous samples. For this reason, future studies that must rely on parent self-report would benefit from measures that demonstrate validity not only within high-risk or clinical samples but also across samples that vary in risk and cultural background. Ideally, validity data should show that associations between parent report and criterion measures, such as observational coding, are similar rather than varying across cultural groups and across levels of parent distress and SES. One consideration in developing such measures would be to reduce differences in socially desirable responding, either by confirming that the perceived social desirability of individual items is consistent across levels of parent distress and SES or by identifying items with less face validity that still differentiate parenting quality. Cultural differences in what constitutes positive or negative parenting also require further explication and development.

Another potentially fruitful avenue for future research is to identify alternative methods for assessing parenting quality that depend neither on parent self-report nor on the intensive and costly procedures of many observational measures. For example, recent work shows great promise for the Five Minute Speech Sample, a measure in which brief audio recordings of parents’ open-ended responses to prompts are coded by trained researchers, with demonstrated reliability and validity as an assessment of parent-child relationship quality (Sher-Censor 2015; Weston et al. 2016). Adapting and validating such methods for use in more naturalistic settings could further enhance their utility (Gardner 2000).

The observed patterns of moderation are noteworthy not only because they reflect the complexity of measurement issues, but also because parent distress and family SES predict differences in parenting quality in the broader developmental literature. The measurement issues highlighted by our study likely both exaggerate and obscure the links among parenting, parent emotional distress, and SES in literature that summarizes and aggregates findings across studies using different approaches to measuring parenting. Our findings indicate that observational and self-report measures of parenting should not be considered equivalent and therefore should not be aggregated conceptually without considering how family characteristics such as parent distress and SES influence measurement agreement. Conceptual reviews that carefully separate findings by method of parenting assessment would better characterize what is known.

Similarly, studies that include both self-report and observational measures and find discrepant patterns of results may be better understood with attention to moderated effects. For example, evaluations of parenting interventions often identify change in self-reported but not observed parenting (Bor et al. 2002; Eddy et al. 1998). In some cases, the intervention might be affecting family characteristics other than parenting, such as parent distress, that drive these findings. Furthermore, changes in self-reported parenting may reflect shifts in parent attitudes as distress remits, or changes in parents’ perceptions of their parenting style and self-efficacy. These changes in attitudes and perceptions may then lead to improvements in observed parenting over time.

Strengths and Limitations

Noteworthy strengths of the current study include high-quality observational parenting measures and a relatively diverse community sample. To support use of the observational coding as a criterion measure, we employed a micro-social coding scheme with small units of behavior and limited inference to reduce the likelihood of rater bias (Alexander et al. 1995; Bardack et al. 2017). The resulting codes, which showed good inter-rater reliability, were then carefully aggregated to correspond as closely as possible to the constructs of positive and negative control assessed by the self-report measures. With our community sample, we were able to observe differences in agreement across a range of parent distress and family SES.

Despite these strengths, the study was limited by a fairly small sample of 102 dyads. Although SES ranged substantially, the sample was generally lower risk, with most parents reporting at least a college degree. Patterns may differ with greater representation at the lower end of the SES range. Because 93% of our small sample consisted of mothers as primary caregivers, we lacked the statistical power to test whether cross-method agreement differed between mothers and fathers. We also used continuous scores on the CES-D measure of depression to represent current levels of parent emotional distress rather than attempting to assess clinical depression or identify parents who might meet diagnostic criteria. Our findings suggest that differences across the range of distress are meaningful, although qualitative differences may emerge between groups defined by depression diagnosis.

It is also important to note that, despite our best efforts to map the self-reported and observational parenting constructs onto each other conceptually, some disagreement attributed to measurement may instead have arisen from conceptual mismatch. For example, parents reported on inconsistency as part of the negative parenting measure, whereas observational coding of negative control did not include such behaviors. Furthermore, findings based on our specific focus on positive and negative control may not generalize to other aspects of parenting, such as affective warmth and responsiveness. We also recognize the need for caution when presuming that our observational measures of positive and negative control represent the best criterion. Although the shortcomings of parent self-report are well documented, it retains certain strengths. Parents have the most experience with their own parent-child relationships and the largest samples of behavior from which to draw. They can consider aspects of the parent-child relationship across all contexts, whereas observational measures rely on what can be seen or inferred during brief, highly structured laboratory tasks. For example, seemingly self-critical parents, whose observational scores were more favorable than their self-reports, may have been reporting problematic behaviors or interactions that occurred in their daily lives but were not apparent in the structured laboratory tasks.

In summary, we found that parent distress and family SES moderated the concordance between self-report and observational measures of parenting for both positive control and negative parenting. These patterns of moderation likely reflect differences in parents’ interpretations of behavior and of self-report items, cultural perceptions, and aspects of social desirability. Further efforts to understand the nature of discrepancies among parenting measurement methods could improve our understanding of the extant research on parenting and inform future study designs.