1 Introduction

Interparental conflict has a negative effect on the development of children but has only been described and studied in more detail since the turn of the millennium (Buehler et al. 2006; Buehler et al. 1994; Lipman et al. 2002). The consequences of interparental conflict for personality development of the children can extend far into adulthood and later relationships (Kelly 2000; Wekerle and Wolfe 1999; Amato and Sobolewski 2001), in particular when elements of interparental hostility or hostile-aggressive parenting (HAP) are involved.

Approximately 1 in 10 children experiencing the separation and divorce of their parents is affected by serious maltreatment or abuse (Gilbert et al. 2009). However, such estimates underestimate the true extent when studies are limited to only certain forms of maltreatment and abuse (Fallon et al. 2010). In addition to this, only a small proportion of cases of maltreatment can be found in the youth protection reporting systems (MacMillan et al. 2003). Passing on divorce problems and parent behaviour directed against the child to subsequent generations can also be problematic (Amato and Cheadle 2005; Scaramella and Conger 2003).

A similar damage potential can result from sexual and non-sexual forms of maltreatment and abuse (Egeland 2009). Studies in Great Britain, the USA and Germany on the extent of emotional and psychological child maltreatment report a prevalence of approx. 10% of the children questioned (Edwards et al. 2003; Finkelhor et al. 2005; Iffland et al. 2013); for Eastern European countries, estimates of up to 33% are reported, depending on the country and the severity classification (Sebre et al. 2004).

A German survey study on hostile-aggressive parenting revealed that 75% of parents who live apart from their children and only have limited contact with them see their child exposed to a form of maltreatment or abuse; 49% of study participants use the term child maltreatment or abuse in its direct form (KiMiss study 2012). A major problem when collecting such data is the lack of definitions for content-overlapping terms such as psychological or emotional maltreatment or mental abuse (O'Hagan 1995).

The term child well-being plays a tragic role in the study of non-sexual forms of maltreatment and abuse and can be distorted and used arbitrarily by extreme positions or ideological influences (Seaberg 1990; Cherlin 1999). This is often due to the fact that attempts are made to answer questions which have been asked in a conceptually incorrect manner: While it is clear that the term child well-being must be defined on a continuous scale by its characteristics as a parameter for quality of life (Duerr et al. 2015), attempts are often made to assess issues relating to child well-being with a yes/no answer. The conceptual error here is oversimplifying a continuous measure to a dichotomous measure.

The same problems occur when defining terms such as psychological or emotional maltreatment or mental abuse: Here too, the attempt at a yes/no assessment is a defective approach because the continuum of mild to severe forms of maltreatment and abuse is dichotomised. In order to solve these problems, the concept of ‘Loss of child well-being’ has been created, which makes it possible to deal with the term child well-being on a continuous scale and to classify the degree of maltreatment or abuse when certain thresholds are exceeded (Duerr et al. 2015).

The study carried out here is based on data from a survey study carried out in Germany in 2012 on 1146 parents who live apart from their children and have less contact with them than they would like (KiMiss study 2012). Parents reported on the living situation with their children based on a questionnaire containing 151 items in the context of hostile-aggressive parenting.

In terms of methodology, the survey carried out here uses a rating method in which a score was determined for each of these 151 items (Duerr et al. 2015). This score describes the severity of parent behaviour and makes it possible to derive the loss of child well-being as a percentage. If several items are observed in one family case, the question arises of how to add up the severity of the individual items so as to be able to estimate a total loss of child well-being. The object of this survey is to develop a suitable methodology.

So far there has been no general concept for how factors which reduce quality of life can be added up mathematically and correctly in terms of content to get a total loss of quality of life. The development of the method presented here combines the results from the aforementioned studies, the 2012 survey study and the 2015 expert rating. This allows the instrument to be calibrated and validated based on data.

2 Materials and Methods

The aim of this study is to develop an instrument to evaluate a number of family matters and calibrate the instrument using data. The instrument merges the results of a survey study and a rating study:

  1) Survey study: The data basis is a survey study that was carried out in Germany in 2012, in which parents reported on 151 items on the topic of HAP (KiMiss study 2012). The list of items, taken from a Canadian manual (Family Conflict Resolution Services 2010), is provided in Table T2 of the Online resource, and is referred to as the HAP item list below. The present study uses the following two main elements of the survey study:

    A) The 1146 parents reported a total of 46,720 items, which averages at approximately 40 items per child (median: 37 items).

    B) The parents reported their overall assessment of whether they considered the sum of the reported items as a form of child maltreatment or abuse. 52% of the parents affirmed this question.

  2) Rating study: The quantitative basis for developing the instrument is an expert rating which estimated a score (R score) for each of the 151 items of the HAP item list (Duerr et al. 2015). This score describes the severity of parent behaviour and can be projected on the scale of percentage loss of child well-being. The instrument developed here combines individual R scores of items to form a total score, thus being able to estimate a total loss of child well-being per case. The scale for loss of child well-being has been constructed in such a way that items can be added together in a mathematically correct manner using the R scores.

In contrast to the mathematical addition of items, a content-correct addition is complicated by the fact that items can describe similar situations and content overlap must be considered when calculating a total score. For example, content overlap is evident for items G025 and G127: A parent who does not pick up or drop off their child for parent-child contact appointments will probably not contribute to the resulting travel costs (P(G025|G127) = 87%, KiMiss study 2012).

These conditional probabilities (‘intersections’) must not be counted twice in a total calculation, but rather must be subtracted. However, this is virtually impossible when there are many items per case: the number of possible overlaps between items increases disproportionately to the number of items and quickly reaches values which make it impossible for the problem to be treated statistically (e.g. case with 10 items: there are approximately \( {10}^{15} \) different ways to combine 10 out of 151 items). The essential empirical data on possible intersections can no longer be ascertained due to the limited sample sizes of the studies. The problem is illustrated in the Online resource, Fig. F1; the Online resource contains essential parts of the methodology developed in this investigation.
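The size of this combinatorial problem can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `subset_count` is hypothetical, and the only figures taken from the text are the 151 items of the HAP item list and the example case with 10 items.

```python
import math

N_ITEMS = 151  # size of the HAP item list


def subset_count(n_items: int, k: int) -> int:
    """Number of distinct ways to choose k items out of n_items."""
    return math.comb(n_items, k)


# Pairwise overlaps alone already number in the thousands ...
print("pairs of items:", subset_count(N_ITEMS, 2))

# ... and the count for a case with 10 items is on the order of 10^15,
# far beyond what a sample of about a thousand reports can inform.
print("10-item combinations: %.2e" % subset_count(N_ITEMS, 10))
```

Even if every one of the 1146 reports described a different item combination, only a vanishing fraction of the possible combinations would be observed, which is why the intersections cannot be estimated empirically.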

R scores of items can be added together in such a way that items with content overlap are excluded from the total calculation and do not contribute to the total score. This will be called Elimination-below-Maximum (EbM) method in the following. For the present investigation the items of the HAP item list have been gathered into the following eleven item groups (method EbM-11c, further methods see Online resource):

  A) Item group 11c-A: Behaviour against the child

  B) Item group 11c-B: Behaviour against the other parent

  C) Item group 11c-C: Behaviour against contact child/other parent

  D) Item group 11c-D: Alienation or manipulation of a child

  E) Item group 11c-E: Non-cooperation, splitting of the family

  F) Item group 11c-F: Behaviour of the child against a parent

  G) Item group 11c-G: Neglect up to endangerment of a child

  H) Item group 11c-H: Problems specific to a parent, parenting skills

  I) Item group 11c-I: Financial matters

  J) Item group 11c-J: Medicine and health

  K) Item group 11c-K: Issues at court, social services, etc.

An EbM method considers only the highest-scoring item (the most severe item) of each item group for the total calculation; lower-scoring items are ignored. The total R score is the sum of the maxima of the item groups. The maximum rule is essential since a higher-scoring item must not be eliminated by a lower-scoring item from the same group (see Online resource, Fig. F1).

The elimination of lower-scoring items assumes that they are topically fully represented by the highest-scoring item in the group. The approach is therefore conservative and tends to produce underestimation that requires a more generalizable consideration: An EbM procedure produces an underestimate if content overlap is excessively eliminated (e. g. when too few item groups have been used); it produces an overestimate if content overlap is not sufficiently eliminated (e. g. when too many item groups have been used). The correction of such over- or underestimating influences requires options for adjusting the algorithm as follows.

Methods for quantifying human assessments and opinions are not exact and require options for adjustments. A survey study among affected people can, for example, produce overestimations caused by over-reporting if reports are not independently validated. On the other hand, for instance in the case of data collection by external assessors, underestimation or under-reporting can be expected, caused by an incomplete overall assessment. Deviations from both tendencies are possible (overestimating external assessment or underestimating self-assessment), even at the same time (unbalanced or biased assessment of topics by the same person).

The correction of potential under- and overestimations is achieved in the EbM method by the two calibration factors po and pu which are both described in the legend to Fig. 1. For the present study, the EbM method must be adjusted by an overestimation factor po to achieve agreement between the total scores derived from the parent reports and their overall view on the presence of child maltreatment or abuse.

Fig. 1 Total loss of child well-being as a function of the total score. The total loss of child well-being (\( V_S \)) as a function of the total score (\( R_S \)) can be calculated by \( V_S = \frac{\log_{10}\left( R_S \, \frac{1-p_o}{1-p_u} / R_1 \right)}{\log_{10}\left( R_5 / R_1 \right)} \cdot 100\% \), whereby \( R_1 = 1.75 \) and \( R_5 = 35.3 \): Threshold values for standardizing \( V_S \) to the limits of \( V_S = 0 \) and 100%; \( p_o \): Proportion of overestimating influences (e. g. proportion of incorrectly claimed items in case of over-reporting); \( p_u \): Proportion of underestimating influences (e. g. proportion of incorrectly ignored items in case of under-reporting). \( V_S \le 0\% \) represents no loss of child well-being (Category 0); \( V_S \ge 100\% \) represents a complete loss of child well-being and is equated with the presence of a form of child maltreatment or abuse (Category 5). The detailed definition of categories 0 to 5 can be found in Duerr et al. 2015

Conducting an EbM method requires the following computations for each item report of a study participant i, illustrated in the example of the EbM-11c method:

  1. Classification of the items affirmed by study participant i into G = 11 item groups,

  2. Selection of the 11 items with the maximum R score per item group (\( R_{g,i} \)),

  3. Total score of case i = sum of the maximum R scores \( \left( R_{S,i} = \sum_{g=1}^{G} R_{g,i} \right) \),

  4. Adjustment and transformation of the total score into the total loss of child well-being (see Fig. 1).

More details on the methodological part are provided in the Online resource.
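The four computation steps can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the study: the function names, the item identifiers `X1` to `X4`, their R scores and their group assignments are invented for the example. Only the thresholds R1 = 1.75 and R5 = 35.3 and the transformation formula are taken from the legend to Fig. 1.

```python
import math

R1, R5 = 1.75, 35.3  # thresholds from the legend to Fig. 1


def ebm_total_score(affirmed_items, r_scores, item_groups):
    """Steps 1-3: keep only the maximum R score per item group, then sum."""
    best_per_group = {}
    for item in affirmed_items:
        group = item_groups[item]  # step 1: classify the item
        score = r_scores[item]
        # step 2: lower-scoring items of the same group are eliminated
        best_per_group[group] = max(best_per_group.get(group, 0.0), score)
    return sum(best_per_group.values())  # step 3: sum of the group maxima


def loss_of_child_wellbeing(total_score, p_o=0.0, p_u=0.0):
    """Step 4: adjust the total score and map it to a percentage (Fig. 1)."""
    if total_score <= 0:
        raise ValueError("the logarithm needs a positive total score")
    adjusted = total_score * (1.0 - p_o) / (1.0 - p_u)
    return math.log10(adjusted / R1) / math.log10(R5 / R1) * 100.0


# Hypothetical case: four affirmed items, two of them in the same group.
r_scores = {"X1": 12.0, "X2": 7.5, "X3": 20.0, "X4": 3.3}   # invented values
item_groups = {"X1": "A", "X2": "A", "X3": "C", "X4": "I"}  # invented mapping

total = ebm_total_score(["X1", "X2", "X3", "X4"], r_scores, item_groups)
print("total R score:", round(total, 2))
print("loss, uncalibrated: %.1f%%" % loss_of_child_wellbeing(total))
print("loss, p_o = 0.44:   %.1f%%" % loss_of_child_wellbeing(total, p_o=0.44))
```

In this invented case, item `X2` does not contribute because `X1` scores higher in the same group; a simple sum over all four items would give a larger total. A case without any affirmed item has a total score of zero, for which the logarithm is undefined, so the sketch raises an error rather than guessing a value.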

3 Results

The study population is characterized by the presence of hostile-aggressive parenting (HAP) as shown in Table 1. Items have been reported most frequently from the item group ‘Non-cooperation, splitting of the family’ (on average 47.3% of the 20 items), followed by ‘Behaviour against contact child/other parent’ (40.8%) and ‘Alienation or manipulation of a child’ (40.5%). ‘Non-cooperation, splitting of the family’ is reported by 97.9% of the study participants, followed by ‘Behaviour against the other parent’ or ‘Behaviour against contact child/other parent’ (91.7 and 90% of study participants, respectively). The study population has been described in more detail in an online data report (KiMiss study 2012).

Table 1 Frequency of item groups reported in the KiMiss study 2012

The distribution of loss of child well-being has been estimated from the N = 1146 parent reports by the classification method EbM-11c (see Methods and Online resource) and is shown in Fig. 2. EbM-11c requires a calibration factor of po = 0.44 to achieve agreement between the distribution of loss of child well-being and the parents’ overall view on the presence of child maltreatment or abuse (see Fig. 2b). For a survey study such as the one used here, the calibration factor is likely to compensate predominantly for over-reporting.

Fig. 2 Distribution of the loss of child well-being. Distribution of the loss of child well-being among N = 1146 parent reports after application of classification method EbM-11c. Grey bars: a loss of child well-being greater than 100% is considered as a form of child maltreatment or abuse (Duerr et al. 2015). (a) Raw data (no adjustment, po = 0). (b) Considering a calibration factor of po = 0.44, 52% of cases in (b) exceed a loss value of 100%, which is in agreement with the empirical finding of the KiMiss study 2012: 52% of the parents describe the consequences of interparental conflict on the child as “ ... a form of child maltreatment or abuse”
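One way to obtain such a calibration factor can be sketched as follows. The sketch assumes pu = 0 and uses the fact that, by the formula in the legend to Fig. 1, a case reaches 100% exactly when its adjusted total score reaches R5 = 35.3. The function names and the sample of raw scores are invented for illustration; the procedure actually used in the study is described in the Online resource and may differ.

```python
R5 = 35.3  # threshold for a 100% loss of child well-being (legend to Fig. 1)


def share_at_or_above_100(total_scores, p_o):
    """Share of cases whose adjusted total score reaches R5 (p_u = 0 assumed)."""
    eps = 1e-9  # tolerance for floating-point rounding at the threshold
    hits = sum(1 for s in total_scores if s * (1.0 - p_o) >= R5 - eps)
    return hits / len(total_scores)


def calibrate_p_o(total_scores, target_share):
    """Choose p_o so that about target_share of the cases stay at or above 100%."""
    ranked = sorted(total_scores, reverse=True)
    k = round(target_share * len(ranked))  # number of cases that should remain
    if k < 1:
        raise ValueError("target_share is too small for this sample size")
    cutoff = ranked[k - 1]  # lowest raw score that should still reach 100%
    # A negative value would indicate under-reporting; this sketch clamps at 0.
    return max(0.0, 1.0 - R5 / cutoff)


# Invented raw total scores for ten hypothetical cases.
raw_scores = [20, 40, 60, 80, 100, 120, 140, 160, 180, 200]
p_o = calibrate_p_o(raw_scores, target_share=0.5)
print("calibrated p_o:", round(p_o, 3))
print("share at or above 100%:", share_at_or_above_100(raw_scores, p_o))
```

If several cases tie at the cutoff score, the resulting share can exceed the target slightly; with real data the target would be the 52% reported by the parents.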

The EbM method can be adapted by the number of item groups and the calibration factors to other fields of applications or other groups of assessors. The calibration factor is proportional to the number of item groups of the EbM method used: an EbM method with few item groups eliminates many items and requires less adjustment than an EbM method with many item groups. The relationship is shown in Table 2.

Table 2 Correlation between EbM method and calibration factor po

Further psychometric properties of the EbM method are summarized as follows, referring to the sensitivity analyses described in the Online resource:

  1. The EbM method provides robust estimates for the total loss of child well-being; the estimates are outlier-insensitive and can be reproduced in parallel approaches with high precision (Online resource, Fig. F2).

  2. For practical purposes, the estimate for the mean total loss of child well-being does not depend on the type of classification, but on the number of item groups of a classification method.

  3. The relationship between mean total loss of child well-being and number of item groups is so highly correlated that an estimated mean total loss of child well-being can be predicted for any other classification method (Online resource, Fig. F3).

The obvious assumption that the subjectivity of a classification method has quantitative side effects is therefore not confirmed.

4 Discussion

The present study analysed 1146 reports of parents affected by hostile-aggressive parenting (HAP) under separation and divorce. A method is developed to calculate a total score from items, with special consideration of the problem that items may not be scored twice when there is content overlap between them (see Methods and Online resource).

The present methodology differs in various aspects from available instruments and tests used in child welfare assessments or in the practice at family courts (Quinnell and Bow 2001; Heinze and Grisso 1996); the main differences are:

  1. The EbM method is scalable and can be calibrated to differing assessors (e. g. self-assessments vs. external/independent assessments) or to different countries, and in particular to longitudinal trends in human populations (e. g. shifts in ethical standards over decades).

  2. The total score obtained from an EbM method can be directly transformed into the interpretable measure of loss of child well-being (rather than describing the deviance of an individual’s total score from a norm on a purely statistical basis).

  3. The EbM method must, in contrast to other psychometric methods, allow for the elimination of content-overlap within a subset of items (compared to a ‘simple’ summation procedure yielding a total score).

  4. The single-item R scores used here have been derived from a Delphi procedure (Duerr et al. 2015) and they are quantitatively more significant than simple rank scores used by other instruments if obtained, for instance, from a Likert scale.

Based on these differences the EbM method contributes novel elements to a ‘child indicators movement’ (Ben-Arieh 2008), even though the child does not represent the directly assessed unit of observation and the children’s own view is not yet a part of the method (compare, for instance, Kurdek and Berg 1987). The consideration of such factors, and the consideration of multi-dimensionality in general, is the task of instruments that combine several examination methods like the Ackerman-Schoendorf Parent Evaluation of Custody Test (Ackerman and Schoendorf 1992).

Compared to the available tools, the method presented here is most similar to the Spousal Assault Risk Assessment (SARA) (Kropp and Gibas 2010; Kropp and Hart 2000), with the main difference that SARA is used to interview ‘offenders’ whereas the EbM method is used to evaluate reports of ‘victims’ of HAP. The identification of domestic violence is always subject to methodological difficulties (Bow and Boxer 2003), and the method presented here is not immune to the criticism about the theory and practice of assessing inter-parental conflicts (Emery et al. 2005).

These difficulties increase if the term child well-being is considered additionally (Amerijckx and Humblet 2014). With the topic of quantifying parental conflicts, the present study considers only a small part of the axes or domains of child well-being (Axford 2009; Ben-Arieh 2000). Considering the complexity of the term child well-being, the present investigation addresses only a domain-specific loss of child well-being, which should be restricted further to a subdomain called interparental issues. Despite this restriction to a subdomain, we must not forget that interparental conflict, even taken on its own, can harm not only a childhood but also later life, leaving wounds that can be passed on to subsequent generations (see Section 1).

The starting point for the need of a new method is the problem that HAP items can describe similar topics or issues which, however, can differ with respect to their degree of severity (see Methods and Online resource). The obvious approach to eliminate redundancy of content overlap of items by means of factor reduction methods does not allow a correct treatment of the problem because items of varying degree of severity would be aggregated by an average degree of severity (e. g. factor loadings or otherwise averaged weighting factors).

The present study shows that the Elimination-below-Maximum (EbM) method exhibits good characteristics in terms of quantitative robustness and precision, allowing for a quantitative evaluation of interparental conflict based on an item questionnaire. Items can be grouped according to topics, and they can be included or excluded depending on the question under study (see Online resource, Table T1). An important finding of this study is that a total score estimated by an EbM method hardly depends on the type of classification, but rather on the number of item groups of the classification.

The number of item groups shifts the baseline level of an EbM method, which can be adjusted by means of calibration. The calibration factors po and pu make it possible to adjust the EbM-derived total score to the overall question “Does the sum of those items represent a form of child maltreatment or abuse?”. The estimate of po = 0.44 in this study means that the total scores must be reduced by 44% to match the parents’ overall view as to whether there is a form of child maltreatment or child abuse in their case.
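As a worked illustration of what this reduction means on the percentage scale (a sketch that uses only the formula and the thresholds \( R_1 = 1.75 \) and \( R_5 = 35.3 \) from the legend to Fig. 1, and assumes \( p_u = 0 \)): because the transformation is logarithmic, multiplying every total score by \( (1 - p_o) \) shifts every loss value by the same constant,

\[ V_S(p_o) - V_S(0) = \frac{\log_{10}\left( 1 - p_o \right)}{\log_{10}\left( R_5 / R_1 \right)} \cdot 100\% = \frac{\log_{10}\left( 0.56 \right)}{\log_{10}\left( 35.3 / 1.75 \right)} \cdot 100\% \approx \frac{-0.2518}{1.3047} \cdot 100\% \approx -19.3\% . \]

Equivalently, a case reaches a loss of 100% after calibration only if its raw total score satisfies \( R_S \ge R_5 / (1 - p_o) = 35.3 / 0.56 \approx 63.0 \), compared with \( R_S \ge 35.3 \) without calibration.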

The assessors in this study were the affected parents who are likely to answer the target question “Child maltreatment/abuse ... yes or no?” in a sensitive manner. An EbM method is, however, not restricted to this group of assessors since the calibration factors can compensate for differing assessor-sensitivity. The calibration factor po in this study would be estimated with a smaller value if assessments were less sensitive than those reported by the parents in this study. Ideally, a calibration factor should be derived from the society as a whole to represent what ‘we’ define as a form of child maltreatment or abuse.

A quantification method such as the EbM method cannot define the topics which a psycho-social assessment should address - this must be defined by the question under study. It will not be highly disputable that a quantification of interparental conflict should include topics such as ‘physical violence’ or ‘clearly existing maltreatment’. However, it may be disputed whether a fact is ‘clearly existing’ or not. For example, the controversial discussions on parental alienation syndrome in recent decades have shown that the topic ‘alienation and manipulation of a child’ (item group 11c-D) is not a widely accepted fact, but can be subject to a certain ideology or to other sources of bias.

The controversial considerations on the relevance of topics become more apparent when looking at less substantial issues, for instance ‘Financial matters’ (Item Group 11c-I). Issues such as the lack of willingness to share special expenses between parents, or fraudulently claiming the existence of special expenses (items G118, G124, etc.), are classified by some assessors as ‘hardly relevant in terms of child well-being’, and by others as a ‘generally relevant component of interparental conflict which the child will be more or less exposed to’. Such variability in opinions is already considered by the R scores used in this investigation. It is, however, a general problem that institutions and assessors, and actually the society as a whole, have to achieve agreement on which items will be counted, and on how they will be weighted.

For an application of the instrument in practice, the following conclusions can be drawn: The answer to the question of whether a particular parent behaviour is attributable to a (non-sexual) form of child maltreatment or abuse is based essentially on the assessment of a sum of facts that can be quantified by an EbM procedure. A transparent and reproducible assessment of interparental conflict requires that the following three conditions are fulfilled:

  1. Item list: A generally accepted ‘catalogue’ of items exists that defines not only the items themselves, but also their weighting or scoring. (Here: HAP item list).

  2. Item classification: items have to be classified such that they form subject areas in which content-overlap can be appropriately eliminated. (Here: 11 item groups, EbM-11c).

  3. Calibration: the method of item summation must offer options for adjusting the numerical result of the algorithm (a loss of child well-being greater than or equal to 100%, or less than 100%) to the overall assessments on the presence of child maltreatment or abuse.

Interparental conflict can be reduced when there is a social consensus about what is good for children and how we assess conditions deviating from this. The present study, together with the scoring of the individual items (Duerr et al. 2015), has established prerequisites for quantifying interparental conflict in a transparent and reproducible manner - it remains to evaluate and apply such methods in practice on the basis of a socio-political consensus.