Family therapy has been shown to be as effective as individual therapy, with moderate to high overall effect sizes (Heatherington et al. 2015; Sexton et al. 2013). In a systematic review of 47 randomized controlled trials evaluating systemic therapy for externalizing disorders in childhood, including attention deficit hyperactivity disorder and substance use disorders, 42 studies showed the efficacy of systemic therapy for specific symptom reduction (von Sydow et al. 2013). Another 33 of 38 randomized controlled trials of systemic family therapy supported its efficacy for internalizing and mixed disorders in childhood and adolescence, such as mood disorders (Retzlaff et al. 2013). Outcome research indicates that systemic therapy is nearly as efficacious as well-established approaches such as Cognitive-Behavioral Therapy and Psychodynamic Therapy (Carr 2014b; Pinquart et al. 2016).

Family therapy targets social and interactional processes among all family members. Yet it is worth noting that children’s emotional or behavioral problems are among the most common reasons for families to seek treatment (Merikangas et al. 2010; Merikangas et al. 2009). In many cases, the child or adolescent is considered the index patient of the family system (i.e., the family member carrying the symptom). The main focus in outcome research is usually the symptom itself. Observational measures, for example coding schemes applied to play tasks or specific parent-child interactions in laboratory or other settings, can be used to assess treatment outcome in detail (Aspland and Gardner 2003). Questionnaires are more economical, and a number of measures are frequently used to evaluate improvements in the child’s or adolescent’s symptom load. The Child Behavior Checklist (Achenbach 1991; Achenbach and Rescorla 2004) focuses on psychosocial (adaptive) functioning and dysfunction, problems, and competencies. The Behavior Assessment System for Children-2 (Reynolds and Kamphaus 2004) targets emotional and behavioral disorders. Finally, the Youth Outcome Questionnaire 2.01 (Atkin et al. 2001; Burlingame et al. 2005; Burlingame et al. 2004) parallels the youth self-report version (Wells et al. 2003) and assesses behavior change resulting from therapy (McClendon et al. 2011).

From the perspective of systems theory, this focus on the symptom is problematic. Even if the child or adolescent is the index patient of the family system, systemic approaches consider the interactions and relationships among all system members to be crucial factors in the development and maintenance of a disorder. Unfortunately, only tentative attempts have been made to include interactional aspects as outcome variables in psychotherapy research in general. Yet focusing only on symptom reduction can be misleading, especially when an intervention targets other aspects such as social interactions or communication styles (Adelman et al. 2014). Social interactions, social support, relationship quality, and relationship satisfaction are important predictors of health and well-being (Miller et al. 2009; Rusbult and Van Lange 2003). Relationship quality and relationship satisfaction between children/adolescents and their parents are not just important for child/adolescent development (Cleveland et al. 2008; Moore et al. 2004) but can also foster family and individual resilience (Patterson 2002; Rayner and Montague 2000; Walsh 2003). The parent-child relationship changes across the lifespan as reliance on the parents diminishes and turns into a more reciprocal relationship (Allen et al. 2003; Noller et al. 2001).

Different theories aim to describe and explain important domains of relationship quality in families with reference to well-being and social functioning, taking into account the changing needs of the developing child and adolescent. A sense of belonging (Resnick et al. 1993) functions as a protective factor against various problem and high-risk behaviors. A high sense of social connectedness, both to other people and to one’s own family, is strongly associated with higher well-being in adolescence (Jose et al. 2012). Low family cohesion is associated with depressive feelings, especially loneliness, and with reduced social acceptance (Johnson et al. 2001). Family cohesion also longitudinally predicted children’s internalizing and attention problems (Lucia and Breslau 2006). Accordingly, positive family involvement and expressed warmth are essential for establishing a good family atmosphere. Both have been identified as predictors of reduced symptom development and better social functioning during the prodromal phase of schizophrenia (O’Brien et al. 2006). Perceived (social) support and the ability to take responsible part in family decisions buffer against negative psychological outcomes or maladjustment (Cook et al. 2002; Demaray and Malecki 2002).

The frequency of communicative interaction, along with the need for shared experiences and emotional self-disclosure, diminishes during the transition from childhood to adolescence. Still, the value of continued (caring) parental communication for health outcomes has been frequently reported (Ackard et al. 2006; Laursen and Colling 2004; Phillips-Salimi et al. 2014). Collective efficacy is the perceived ability of the parent-child group to jointly and effectively perform a certain task using their skills, based on a sense of cohesion (Bandura et al. 2011; Wells et al. 2004). This includes the ability to manage conflict, solve problems, and make shared decisions, which fosters motivation, endurance, and overall family functioning. For example, collective efficacy enhances family attachment and support, protecting adolescents from suicidal behavior (Brooks-Gunn et al. 2010). It also includes the foundational ability to adapt to changing life experiences in different domains. Taken together, family functioning and the relationship quality between children/adolescents and their parents are associated with numerous health benefits. Adolescents perceive a positive, respectful, and caring relationship with their parents as meaningful and helpful. This strongly encourages the inclusion of relational aspects in psychotherapy outcome research.

However, currently available measures of family functioning still have shortcomings. According to a recent systematic review, only five of eight commonly used measures are suitable for assessing clinical outcome in couple and family therapy (Carr and Hamilton 2016): the McMaster Family Assessment Device (Baldwin et al. 1983; Mansfield et al. 2015), the Family Adaptability and Cohesion Evaluation Scales (Olson 1985, 2011), the Self-Report Family Inventory (Beavers and Hampson 2000), the Family Assessment Measure III (Skinner et al. 1983; Skinner et al. 2000), and the Systemic Clinical Outcome Routine Evaluation (Carr and Stratton 2017; Stratton et al. 2010). Yet these measures contain a large number of items, take a long time to complete, focus on a specific theory, are problem-oriented, and are usually context-specific. Moreover, only a few report their applicability to children and adolescents. Reports on quality criteria, including indices of sensitivity to change, are scarce (Olson et al. 1979).

The Evaluation of Social Systems scale (EVOS) was developed precisely to address the need for a brief and economical measure of the quality of social relationships for evaluating the outcomes of systemic, multi-person interventions in different social systems (such as families) from the perspective of all members of a targeted social system (Aguilar-Raab et al. 2018; Aguilar-Raab et al. 2015). EVOS aims to assess the valuable perspective of each system member without presuming “healthy” or “functional” relationships. The item wording was theoretically derived from systems theory and from models of functionality and relationships in families, such as the Beavers Systems Model (Beavers 1981), the Circumplex Model of Marital and Family Systems (Olson 1985; Olson et al. 1979), and the McMaster Model of Family Functioning (Baldwin et al. 1983; Bishop et al. 1978). Additionally, research on groups and social contexts in organizational psychology (Anderson and West 1996; Faulstich 1998; Kauffeld 2001; McGrath et al. 2000), as well as Bandura’s concept of collective efficacy (Bandura et al. 2011), was taken into account. Overlapping dimensions were extracted in order to condense the most important ones for different social systems. In line with systems theory, EVOS follows a non-problem-oriented, non-normative, and constructivist approach. It uses the first person plural (“we…”) instead of “I…” in order to assess an individual’s perception of the group or system as a whole, including himself or herself. The EVOS scale comprises nine items covering the following aspects, subsumed under two factors, all of which are important for the quality of the child/adolescent-parent relationship: (1) quality of relationship: communication, cohesion, atmosphere, and giving and taking; and (2) collective efficacy: collective aims, resources, decision making, finding solutions, and adaptability.
An additional, separate tenth item, which is not used to calculate the EVOS total score, can assess the perceived consensus within the system for additional evaluation. The EVOS scale was initially developed using exploratory factor analysis, which yielded two distinct factors. On the basis of systems theory, we assumed the two factors (i.e., quality of relationship and collective efficacy) to be correlated rather than independent of each other. Additionally, EVOS was intended to provide a global evaluation of the quality of a social relationship. These assumptions were tested using confirmatory factor analysis. A model with two correlated factors was established in adult samples and validated in both English and German. EVOS has also been evaluated in different social contexts, such as couples, families, and work teams, and proved equally applicable to all of them. Thus, the scale is a short, reliable, and valid measure of the quality of relationship and collective efficacy that is sensitive to change (Aguilar-Raab et al. 2018; Aguilar-Raab et al. 2015).

To monitor family therapy sessions or evaluate the outcome of family therapy, all family members, including adolescents, should be assessed. We believe that children and adolescents are especially important pillars of family life. Disregarding them in family therapy outcome research means missing an integral part of the picture, and findings obtained in this way can only be misleading. To respect children and adolescents and their needs for a supportive environment, it is vital to apply measures that reflect and include their views. Commonly used self-rating measures of family functioning have rarely been tested for applicability at younger ages. Additionally, it is usually unclear whether ratings provided by children and adolescents are truly comparable to adults’ judgments. The present research investigated the applicability of EVOS in a sample of adolescents. We examined factorial validity, construct validity, and measurement invariance between adolescent and adult samples.

Method

Participants

Participants were N = 203 adolescents from two schools in Germany. German secondary education has three school tiers. Basic school (“Hauptschule”) extends to the ninth grade, middle school (“Realschule”) to the tenth grade, and high school (“Gymnasium”), the university-track school type, to the 13th grade. The first subsample comprised n = 112 middle school students (Mage = 14.51; SD = 1.62; range = 12 to 18; 46.4% female). The second subsample comprised n = 91 high school students (Mage = 15.02; SD = 2.18; range = 11 to 19; 48.4% female).

Procedure

For practical reasons and because of ethical concerns raised by the school administration, participants in subsample 1 completed only the EVOS. Adolescents in subsample 2 filled out additional validation measures. Participants, parents, and teachers were fully informed about the project. Prior to participation, adolescents and parents provided informed consent. As an incentive, participants could individually take part in a lottery for vouchers. Additionally, each school class as a whole could take part in a lottery for a lunch order of its choice for all students in the class. For comparison purposes, we used data presented by Aguilar-Raab et al. (2015), in which N = 188 adults with a mean age of 30.46 years (SD = 13.07; 80.9% female) evaluated their family relationships. The study was approved by the ethics committee of the University Hospital Heidelberg (S-508/2012).

Measures

EVOS: Family relationships

Family relationship quality was assessed using EVOS (Aguilar-Raab et al. 2015). Four items capture the emotional/affective level of relationship quality (e.g., “For me, the way we talk with each other, is …”), and five items measure the more cognitive evaluation of the family’s collective efficacy (e.g., “For me, how we adapt to change, is …”). Responses are given on 4-point rating scales ranging from 0 = very poor to 3 = very good. We computed three mean scores: one for each subscale, quality of the relationship (Cronbach’s Alpha = 0.81) and collective efficacy (Alpha = 0.80), and one for the whole scale (Alpha = 0.88). EVOS includes a tenth item measuring the perceived consensus among family members. This item is not an integral part of the scale and may be used to explicitly assess agreement between family members. Because the present study obtained only a single rating per family, we did not use the consensus item.
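The scoring just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring script: the assignment of item positions to subscales (items 1 to 4 for relationship quality, 5 to 9 for collective efficacy) is a hypothetical assumption, since the text specifies only how many items each subscale has, and averaging over the available items when an answer is missing is likewise an assumption.

```python
from statistics import mean
from typing import Dict, Optional, Sequence

# HYPOTHETICAL item-to-subscale mapping (0-based positions); the text only states
# that four items tap relationship quality and five tap collective efficacy.
RELATIONSHIP_ITEMS = (0, 1, 2, 3)
EFFICACY_ITEMS = (4, 5, 6, 7, 8)
VALID_RESPONSES = {0, 1, 2, 3}  # 0 = very poor ... 3 = very good


def mean_of_available(values: Sequence[Optional[int]]) -> Optional[float]:
    """Average the non-missing responses (assumption: None marks a missing answer)."""
    present = [v for v in values if v is not None]
    return float(mean(present)) if present else None


def score_evos(responses: Sequence[Optional[int]]) -> Dict[str, Optional[float]]:
    """Score the nine EVOS scale items; the separate consensus item is not passed in."""
    if len(responses) != 9:
        raise ValueError("expected the nine EVOS scale items")
    for r in responses:
        if r is not None and r not in VALID_RESPONSES:
            raise ValueError(f"invalid response {r!r}; expected 0-3")
    return {
        "relationship_quality": mean_of_available([responses[i] for i in RELATIONSHIP_ITEMS]),
        "collective_efficacy": mean_of_available([responses[i] for i in EFFICACY_ITEMS]),
        "total": mean_of_available(responses),
    }


# One hypothetical respondent with a missing answer on the seventh item.
print(score_evos([3, 2, 2, 3, 2, 2, None, 3, 2]))
```

Keeping the mapping in two module-level tuples makes it easy to swap in the published item order without touching the scoring logic.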

SCORE-15: Family functioning

SCORE-15 is a validated measure of family functioning assessing strengths, difficulties, and communication among family members (Hamilton et al. 2015; Stratton et al. 2014). Sample items include “People often don’t tell each other the truth in my family” and “It feels miserable in our family”. Answers were given on 5-point rating scales ranging from 0 = do not agree at all to 4 = completely agree. A mean score was computed. Cronbach’s Alpha was .81.

SDQ-deu: Psychological distress

We used a German adaptation (Lohbeck et al. 2015) of Goodman’s (1997) Strengths and Difficulties Questionnaire (SDQ). This validated instrument has 25 items measuring social behavior, emotional problems, behavioral problems, hyperactivity, and peer problems. Each of the five subscales contains five items. Sample items include “I worry a lot” and “I often get into fights. I can force others to do what I want”. Answers were given on 3-point scales (0 = not true, 1 = partially true, 2 = completely true). Negatively coded items were reversed before items were summed to subscale and total scale scores. Higher scores indicate greater difficulties. Scale reliabilities were Alpha = .69 for the SDQ total score and .73 (emotional problems), .38 (behavioral problems), .71 (hyperactivity), .70 (peer problems), and .63 (social behavior) for the subscales.
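A minimal Python sketch of the reverse-and-sum scoring described above. Which items are negatively worded, and which subscales enter the total, are not stated in the text, so the item layout, the reversed-item set, and `DIFFICULTY_SCALES` below are hypothetical parameters. By default the total sums the four difficulty subscales, which is how Goodman's original total difficulties score is formed; replace all three with the published scoring key before real use.

```python
from typing import Dict, Iterable, List, Sequence

MAX_RESPONSE = 2  # 0 = not true, 1 = partially true, 2 = completely true

# HYPOTHETICAL layout: five consecutive items per subscale (0-based positions).
SUBSCALES: Dict[str, List[int]] = {
    "emotional": list(range(0, 5)),
    "behavioral": list(range(5, 10)),
    "hyperactivity": list(range(10, 15)),
    "peer": list(range(15, 20)),
    "social": list(range(20, 25)),
}
# HYPOTHETICAL set of negatively coded items; use the published scoring key instead.
REVERSED_ITEMS = {6, 13, 16}
DIFFICULTY_SCALES = ("emotional", "behavioral", "hyperactivity", "peer")


def score_sdq(responses: Sequence[int],
              subscales: Dict[str, List[int]] = SUBSCALES,
              reversed_items: Iterable[int] = REVERSED_ITEMS,
              total_from: Sequence[str] = DIFFICULTY_SCALES) -> Dict[str, int]:
    """Return subscale sums and a total computed from the listed subscales."""
    if len(responses) != 25 or any(r not in (0, 1, 2) for r in responses):
        raise ValueError("expected 25 responses coded 0, 1, or 2")
    rev = set(reversed_items)
    # Reverse-score the listed items: 0 -> 2, 1 -> 1, 2 -> 0.
    scored = [MAX_RESPONSE - r if i in rev else r for i, r in enumerate(responses)]
    scores = {name: sum(scored[i] for i in items) for name, items in subscales.items()}
    scores["total"] = sum(scores[name] for name in total_from)
    return scores


print(score_sdq([0] * 25))
```

With all answers at 0, only the three reversed items contribute, each adding 2 to its subscale.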

Data Analysis

We used SPSS 22 (IBM 2012) and Mplus 7.11 (Muthén and Muthén 1998-2012). For confirmatory factor analyses, model fit was evaluated by (1) the χ2 test, which should ideally be non-significant (Bentler and Bonett 1980); (2) the comparative fit index (CFI), with values of .90 and .95 and above indicating appropriate and good model fit, respectively (Bentler 1990; Hu and Bentler 1999); (3) the root mean square error of approximation (RMSEA), with values below 0.08 indicating good model fit (Browne and Cudeck 1993); and (4) the standardized root mean square residual (SRMR), with values below 0.08 considered to reflect good fit (Hu and Bentler 1999). For model comparisons, we used the Bayesian Information Criterion (BIC) as a comparative fit index (Schwarz 1978). Lower values indicate better model fit, and absolute differences greater than 10 indicate a meaningful difference in fit (Raftery 1995).
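The cutoffs above can be expressed as a small checklist. This Python sketch is only an illustration of those rules (the `ModelFit` container and function names are hypothetical); the example values are the adolescent two-factor and one-factor models reported in the Results.

```python
from dataclasses import dataclass


@dataclass
class ModelFit:
    name: str
    chi2: float
    df: int
    p: float      # p value of the (scaled) chi-square test
    cfi: float
    rmsea: float
    srmr: float
    bic: float


def judge_fit(m: ModelFit) -> dict:
    """Apply the cutoffs listed in the text to a single model."""
    if m.cfi >= 0.95:
        cfi_label = "good"
    elif m.cfi >= 0.90:
        cfi_label = "appropriate"
    else:
        cfi_label = "poor"
    return {
        "chi2_nonsignificant": m.p >= 0.05,
        "cfi": cfi_label,
        "rmsea_good": m.rmsea < 0.08,
        "srmr_good": m.srmr < 0.08,
    }


def compare_bic(a: ModelFit, b: ModelFit, threshold: float = 10.0) -> str:
    """Lower BIC is better; absolute differences above `threshold` count as meaningful."""
    diff = a.bic - b.bic
    if abs(diff) <= threshold:
        return "no meaningful difference"
    return f"{a.name} preferred" if diff < 0 else f"{b.name} preferred"


two_factor = ModelFit("two-factor", 25.37, 26, 0.50, 1.00, 0.00, 0.031, 3431)
# The text reports p < .002 for this model; 0.002 is entered as an upper bound.
one_factor = ModelFit("one-factor", 54.04, 27, 0.002, 0.961, 0.07, 0.038, 3446)
print(judge_fit(two_factor))
print(judge_fit(one_factor))
print(compare_bic(two_factor, one_factor))  # BIC difference of 15
```

Both models pass the CFI, RMSEA, and SRMR cutoffs, so in this example it is the χ2 test and the BIC difference that separate them.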

We initially checked for multivariate normality using Small’s omnibus test. Multivariate normality did not hold in either the adult (χ2 (18) = 78.54, p < 0.001) or the adolescent sample (χ2 (18) = 154.76, p < 0.001). This was expected given the coarse 4-point response scale of EVOS and is in line with prior research on the scale (Aguilar-Raab et al. 2015). Hence, we used a full information maximum likelihood estimator with robust standard errors (MLR) in all CFAs. In Mplus, MLR provides maximum likelihood estimates with robust ‘Huber-White’ standard errors and a scaled test statistic that is asymptotically equivalent to the Yuan–Bentler T2* statistic (Yuan and Bentler 2000) and similar to the robust Satorra–Bentler scaled χ2 statistic (MLM; Chou et al. 1991). A total of 0.18% of EVOS data were missing at random and were handled using full information ML. The validation measures had 0.83% missing values. Given the negligible amount of missing data, we computed mean scores for all participants and variables. Only a subsample of n = 90 was available for correlation analyses. Nonetheless, this sample size should be adequate to investigate construct validity. In adult samples, the quality of family and couple relationships (as measured with EVOS) correlated with measures of general psychopathology at r = −0.30 to r = −0.54 (Aguilar-Raab et al. 2018; Grevenstein et al. 2018). We expected a similar correlation in adolescent samples. A power calculation using G*Power (Faul et al. 2007) indicated that a sample size of N = 84 would be sufficient to detect a correlation of r = −0.30 with conventional parameters of α = 0.05 (two-tailed) and power (1 − β) = 0.80. For a one-tailed test, which matches our directional hypothesis, the required sample size drops to N = 67.
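To make the power arithmetic transparent, here is a sketch that uses the common Fisher z normal approximation, N = ((z₁₋α + z₁₋β) / atanh|r|)² + 3. This is a swapped-in, simpler technique than G*Power's exact routine: it lands one participant above the values reported above (85 instead of 84 two-tailed, 68 instead of 67 one-tailed). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80, tails: int = 2) -> int:
    """Approximate N needed to detect a population correlation r (Fisher z method)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    z_alpha = NormalDist().inv_cdf(1 - alpha / tails)  # critical value under H0
    z_beta = NormalDist().inv_cdf(power)                # quantile for the desired power
    effect = math.atanh(abs(r))                         # Fisher z of the assumed effect
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


print(n_for_correlation(-0.30, tails=2))  # 85
print(n_for_correlation(-0.30, tails=1))  # 68
```

The approximation is handy for quick checks; for reporting, an exact routine such as G*Power's remains preferable.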

To compare the measurement model of EVOS between adolescent and adult samples, we examined measurement invariance (MI) (Vandenberg and Lance 2000). When applying a scale across situational contexts, cultures, or age groups with the intent of comparing scores numerically, most researchers simply assume that the scores reflect the identical construct. Yet differential use of a scale and disparate measurement models in specific samples may also account for differences between groups. Mean scores or correlations with external measures can only be meaningfully compared if MI can be established (Chen 2008). One needs to ascertain that differences in scale means are due to true differences in latent means rather than to differences in how the items are used.

MI is tested using a sequence of nested, increasingly restrictive confirmatory factor-analytical (CFA) models. Four increasingly restrictive forms of MI are usually tested (Meredith 1993; Schmitt and Kuljanin 2008; Vandenberg and Lance 2000): (1) Configural MI assumes equal construct dimensionality and equivalent item-to-factor patterns across groups. (2) Under metric MI, all item loadings are constrained to be equal across groups, indicating that the same construct is measured in both groups and that participants attribute the same meaning to the latent construct. (3) Scalar MI additionally assumes invariant item intercepts across groups, indicating that scores are based on the same unit of measurement. Scalar MI allows a meaningful interpretation of latent mean differences. (4) Strict MI additionally requires equality of item residuals, indicating equal reliability across groups. When strict MI holds, all differences in manifest variables are due to true differences in the latent variables rather than to measurement error. Strict MI thus allows a direct comparison of observed mean scores across groups. From this level onward, one can additionally examine the invariance of structural parameters, i.e., latent means and factor variances. If some parameters are non-invariant across groups, a weaker form of MI, partial invariance, may still hold. For example, partial scalar MI requires most, but not all, item intercepts to be invariant across groups. In this case, latent means may still be compared cautiously (Byrne et al. 1989; Lubke and Dolan 2003).
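One common way to write these nested constraints uses the linear common-factor model below (notation ours: item i, group g = 1, 2, and f(i) the factor on which item i loads). The last line shows why scalar MI licenses latent mean comparisons: with equal loadings and intercepts, any difference in expected item scores is carried entirely by the latent means κ. Partial scalar MI simply frees τ for selected items.

```latex
\begin{aligned}
x_{ig} &= \tau_{ig} + \lambda_{ig}\,\eta_{f(i)g} + \varepsilon_{ig},
  \qquad \operatorname{Var}(\varepsilon_{ig}) = \theta_{ig},
  \qquad \operatorname{E}(\eta_{f(i)g}) = \kappa_{f(i)g}\\
\text{metric:}\quad & \lambda_{i1} = \lambda_{i2} \quad \forall i\\
\text{scalar:}\quad & \lambda_{i1} = \lambda_{i2},\ \tau_{i1} = \tau_{i2} \quad \forall i\\
\text{strict:}\quad & \lambda_{i1} = \lambda_{i2},\ \tau_{i1} = \tau_{i2},\ \theta_{i1} = \theta_{i2} \quad \forall i\\
\text{scalar} \;\Rightarrow\quad & \operatorname{E}(x_{i1}) - \operatorname{E}(x_{i2})
  = \lambda_{i}\,\bigl(\kappa_{f(i)1} - \kappa_{f(i)2}\bigr)
\end{aligned}
```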

When testing for MI, models are compared using χ2-difference tests. Because we used a robust estimator (MLR), we applied Satorra-Bentler scaled χ2-difference tests (Satorra 2000; Satorra and Bentler 2001). Because χ2 tests are highly dependent on sample size, fit indices are commonly used to judge MI. Changes in CFI and RMSEA are examined: a drop in CFI of .010 or less is considered acceptable, provided it is accompanied by an increase in RMSEA of no more than 0.015 (Chen 2007; Cheung and Rensvold 2002). Finally, lower BIC values indicate a better tradeoff between accuracy and parsimony.
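The two decision rules above can be sketched as follows. The scaled difference uses the formula commonly given for MLR chi-squares: cd = (d₀c₀ − d₁c₁)/(d₀ − d₁) and TRd = (T₀c₀ − T₁c₁)/cd, where T, d, and c are each model's scaled χ2, degrees of freedom, and scaling correction factor. The model values in the example are hypothetical placeholders, not the Table 4 results. A small exact chi-square tail function is included so that the sketch needs no third-party packages.

```python
import math
from dataclasses import dataclass


@dataclass
class MLRModel:
    chi2: float      # MLR-scaled chi-square
    df: int
    scaling: float   # scaling correction factor reported alongside it
    cfi: float
    rmsea: float


def chi2_sf(x: float, k: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df k."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        term = total = 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2))  # equals 2 * (1 - Phi(sqrt(x)))
    if k >= 3:
        pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
        term = s = root
        for i in range(2, (k - 1) // 2 + 1):
            term *= x / (2 * i - 1)
            s += term
        total += 2 * pdf * s
    return total


def scaled_chi2_difference(restricted: MLRModel, free: MLRModel):
    """Satorra-Bentler scaled difference test for two nested MLR models."""
    d0, d1 = restricted.df, free.df
    c0, c1 = restricted.scaling, free.scaling
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)  # can turn negative in small samples
    trd = (restricted.chi2 * c0 - free.chi2 * c1) / cd
    ddf = d0 - d1
    return trd, ddf, chi2_sf(trd, ddf)


def fit_change_ok(restricted: MLRModel, free: MLRModel,
                  max_cfi_drop: float = 0.010, max_rmsea_rise: float = 0.015) -> bool:
    """Accept the constraint if CFI drops and RMSEA rises within the cutoffs."""
    eps = 1e-9  # guard against floating-point noise at the cutoffs
    return (free.cfi - restricted.cfi <= max_cfi_drop + eps
            and restricted.rmsea - free.rmsea <= max_rmsea_rise + eps)


# HYPOTHETICAL numbers for illustration only.
configural = MLRModel(chi2=50.0, df=26, scaling=1.25, cfi=0.990, rmsea=0.045)
metric = MLRModel(chi2=60.0, df=33, scaling=1.20, cfi=0.985, rmsea=0.050)
print(scaled_chi2_difference(metric, configural))
print(fit_change_ok(metric, configural))
```

The fit-index rule is kept separate from the χ2 test on purpose: with large samples the test can reject trivial misfit while ∆CFI and ∆RMSEA stay within bounds.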

Results

We initially examined whether the two adolescent subsamples differed. There were no significant differences with regard to participant sex, χ2 = 0.07, df = 1, p = 0.89, or age, t = 1.86, df = 160.48, p = 0.065, M1 = 14.51, SD1 = 1.62, M2 = 15.02, SD2 = 2.18, d = 0.27. Still, subsample 1 reported a lower EVOS mean score than subsample 2, t = 2.20, df = 201, p = 0.029, M1 = 2.29, SD1 = 0.50 vs. M2 = 2.45, SD2 = 0.51, d = 0.32. Correlations between study variables are shown in Table 1, and means and standard deviations in Table 2. Overall, our sample was comparable to other adolescent samples: the SDQ scores closely matched those reported for a recent German standardization (Lohbeck et al. 2015).
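The text does not name the formula behind the reported effect sizes. Assuming Cohen's d with a pooled standard deviation, the following sketch, using the subsample sizes from the Participants section, reproduces both values (d = 0.27 for age and d = 0.32 for EVOS).

```python
import math


def cohens_d_pooled(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference (group 2 minus group 1) using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var)


print(round(cohens_d_pooled(14.51, 1.62, 112, 15.02, 2.18, 91), 2))  # age: 0.27
print(round(cohens_d_pooled(2.29, 0.50, 112, 2.45, 0.51, 91), 2))    # EVOS: 0.32
```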

Table 1 Correlations between study variables
Table 2 Descriptives of study variables

Because of missing values, N = 200 participants could be used for the psychometric analyses, which require listwise deletion. Responses tended toward the positive end of the scale, as would be expected in general school samples (all item difficulty indices P > 74.42). Corrected item-to-total correlations ranged between 0.52 and 0.72, indicating good reliability of EVOS. Cronbach’s Alpha was 0.88 for the whole scale, 0.80 for the relationship quality subscale, and 0.81 for the collective efficacy subscale. This closely resembled scale reliabilities in adult samples, where Alphas of 0.87 for the total score and 0.82 for each of the two subscales have been reported.
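The item statistics in this paragraph can be computed as in the following sketch. The item difficulty index is taken here to be the item mean as a percentage of the maximum score (3), which is our assumption about what P denotes. The three-item data set is hypothetical, and each list holds one item's responses from listwise-complete respondents.

```python
import math
from statistics import mean, variance
from typing import List, Sequence

Column = Sequence[float]


def cronbach_alpha(items: List[Column]) -> float:
    """Cronbach's alpha from one response list per item (same respondent order)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x: Column, y: Column) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(items: List[Column]) -> List[float]:
    """Correlate each item with the sum of the remaining items."""
    totals = [sum(row) for row in zip(*items)]
    return [pearson(col, [t - v for t, v in zip(totals, col)]) for col in items]


def difficulty_index(col: Column, max_score: int = 3) -> float:
    """Item mean as a percentage of the maximum score (assumed meaning of P)."""
    return 100 * mean(col) / max_score


# HYPOTHETICAL responses of five people to three items (0-3 scale).
items = [[3, 2, 3, 1, 2], [3, 2, 2, 1, 3], [2, 2, 3, 1, 2]]
print(round(cronbach_alpha(items), 2))
print([round(r, 2) for r in corrected_item_total(items)])
print([round(difficulty_index(c), 1) for c in items])
```

Excluding the item from its own total is what makes the item-total correlation "corrected"; otherwise each item would be correlated partly with itself.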

To test the measurement model of EVOS in adolescent samples, we used confirmatory factor analysis. EVOS was developed, using exploratory factor analysis, as a scale measuring a construct with two theoretically distinct yet highly related factors. Consequently, a model with two correlated factors for quality of the relationship and collective efficacy was established for adult samples. In the adult sample, this two-factor model fit the data (Aguilar-Raab et al. 2015), χ2 = 38.10, df = 26, p = 0.06; RMSEA = 0.05, CI90 = [0.00–0.08], p-close = 0.47; CFI = 0.985; SRMR = 0.030; AIC = 3344; BIC = 3434. We now aimed to replicate these findings with our new data. The two-factor model also fit the adolescent data very well, χ2 = 25.37, df = 26, p = 0.50; RMSEA = 0.00, CI90 = [0.00–0.05], p-close = 0.93; CFI = 1.00; SRMR = 0.031; AIC = 3338; BIC = 3431. An attentive reader will notice that this model achieved perfect fit in terms of CFI and RMSEA because χ2 was less than the degrees of freedom. This is a result of using a robust maximum likelihood estimator, which produces scaled χ2 test statistics. If χ2 is less than df, its expected value under a correctly specified model, RMSEA is set to zero, indicating a perfectly specified model. Similarly, CFI values greater than one are set to one, and values less than zero are set to zero (see Kenny 2015). We computed the average variance extracted (AVE; Fornell and Larcker 1981) for the two factors (quality of the relationship: AVE = 0.51; collective efficacy: AVE = 0.46) and for the whole scale (AVE = 0.48). The two factors were highly correlated at r = 0.87, which may be construed as an indicator of poor discriminant validity. Thus, we also checked the applicability of a single-factor model. This model also fit the data, χ2 = 54.04, df = 27, p < .002; RMSEA = 0.07, CI90 = [0.04–0.10], p-close = 0.11; CFI = 0.961; SRMR = 0.038; AIC = 3357; BIC = 3446.
Although the single-factor model also fit the data, the two-factor model was superior on all indicators of model fit, both absolute and comparative. The item descriptives and factor loadings of the two-factor model of EVOS are shown in Table 3, and the structural model with standardized loadings is shown in Fig. 1.
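The truncation rules and the AVE-based check described above can be sketched as follows. RMSEA formulas differ slightly across programs (N versus N − 1), and Mplus computes it from its scaled statistic, so this is an approximation. Even so, with N = 200 it returns 0 for the adolescent two-factor model and 0.07 for the one-factor model, consistent with the values reported. The baseline χ2 in the CFI example is hypothetical (36 is the baseline df for nine uncorrelated items). The Fornell–Larcker comparison uses the reported AVEs and factor correlation; because each AVE falls below r² = 0.76, the discriminant validity criterion is not met.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate; a negative numerator is truncated to zero."""
    # Programs differ slightly (N vs. N - 1); N - 1 is used here.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index with the usual truncation to the [0, 1] range."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_b - df_b, chi2_m - df_m, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


def ave(std_loadings) -> float:
    """Average variance extracted: mean squared standardized loading."""
    return sum(lam ** 2 for lam in std_loadings) / len(std_loadings)


def fornell_larcker_ok(ave_a: float, ave_b: float, r_ab: float) -> bool:
    """Discriminant validity holds if each AVE exceeds the squared factor correlation."""
    return min(ave_a, ave_b) > r_ab ** 2


print(rmsea(25.37, 26, 200))            # chi2 < df, so truncated to 0.0
print(round(rmsea(54.04, 27, 200), 2))  # one-factor model: 0.07
print(cfi(25.37, 26, 500.0, 36))        # baseline chi2 of 500 is HYPOTHETICAL
print(fornell_larcker_ok(0.51, 0.46, 0.87))
```

The standardized loadings behind the reported AVEs appear in Table 3 and are not reproduced here.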

Table 3 EVOS item descriptives and factor loadings
Fig. 1 Measurement model of EVOS with standardized factor loadings

The initial step in MI testing indicated that configural MI held across age groups. Metric MI also held unconditionally. When testing for scalar MI, model fit decreased noticeably. We then examined whether this non-invariance was attributable to specific parameters. Relaxing one candidate parameter at a time on the basis of modification indices (Byrne et al. 1989), we checked whether model fit could be improved. The fit of the scalar MI model improved noticeably after relaxing the equality constraint on the intercept of item #1 (“For me, the way we talk with each other, is …”; ModInd = 17.12). Apparently, this item was easier for adolescents to endorse. Thus, partial scalar invariance could still be established. In the next step, we tested for equal item residuals. No relevant decrease in model fit was observed, so strict MI held across samples. Hence, EVOS showed comparable reliability for adolescents and adults.

Based on these positive results, we examined the invariance of structural parameters. First, we constrained both factor variances and the covariance between factors to be equal across age groups. Model fit decreased slightly, although it might still be within conventionally accepted limits. However, SRMR clearly indicated poor fit, so we refrain from concluding that the (co)variances are equal. In the last step, we additionally constrained latent means to be equal. Once again, model fit dropped substantially, indicating that latent means differed between age groups. This was mirrored by a comparison of observed mean scores between adolescents and adults (d = 0.76). The results of the measurement invariance tests for EVOS across adolescent and adult samples according to multiple group confirmatory factor analysis are summarized in Table 4.

Table 4 Measurement invariance for EVOS across adolescent and adult samples according to multiple group confirmatory factor analysis (MGCFA)

To assess construct validity, we examined correlations between EVOS and the other constructs in the second subsample. Both EVOS subscales showed similar correlations with the validation measures, so we interpret the global score rather than the subscales. EVOS and SCORE-15 correlated at r = 0.78, indicating strong overlap and convergent validity. The correlation between EVOS and SDQ was noticeably weaker (r = −0.33). Still, better perceived family relations were associated with fewer psychological symptoms. Similarly, SCORE-15 correlated with psychological distress at r = −0.44. Correlations with psychological distress did not differ significantly between EVOS and SCORE-15, Z = 1.70, p = 0.09, even when accounting for the variance overlap between EVOS and SCORE-15 (Lee and Preacher 2013; Steiger 1980).
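The comparison of the two dependent correlations can be reproduced with the pooled-r Z test attributed to Steiger (1980). This sketch is our own implementation, not output from the Lee and Preacher calculator. With the correlations above and n = 90, it yields Z ≈ 1.70 and p ≈ 0.09, matching the reported result.

```python
import math


def steiger_z(r_jk: float, r_jh: float, r_kh: float, n: int):
    """Compare two dependent correlations sharing variable j (pooled-r version).

    Returns (Z, two-tailed p).
    """
    r_bar = (r_jk + r_jh) / 2
    rb2 = r_bar ** 2
    psi = r_kh * (1 - 2 * rb2) - 0.5 * rb2 * (1 - 2 * rb2 - r_kh ** 2)
    s = psi / (1 - rb2) ** 2  # correlation between the two Fisher-transformed coefficients
    z = (math.atanh(r_jk) - math.atanh(r_jh)) * math.sqrt(n - 3) / math.sqrt(2 * (1 - s))
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p value
    return z, p


# j = SDQ, k = EVOS, h = SCORE-15; correlations and n taken from the text.
z, p = steiger_z(r_jk=-0.33, r_jh=-0.44, r_kh=0.78, n=90)
print(f"Z = {z:.2f}, p = {p:.2f}")
```

The high EVOS–SCORE-15 correlation (0.78) raises s and shrinks the denominator, which is why the test accounts for the variance overlap mentioned in the text.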

Discussion

Family therapy gives all involved members room to express their views, needs, and feelings regarding their relationships, the functioning of the family system as a whole, and their individual experience within this particular social system. Nonetheless, studies in family therapy research that consider the youth perspective are scarce (Moore and Seu 2011). Overall, children and adolescents are marginalized within the mental health system. As a result, most research disregards youths’ feedback, even though the perspectives of parents and their children differ considerably (Celinska et al. 2015). Children and adolescents influence their social contexts and relationships as active agents and form their own perspectives based on individual experiences. To meet the criteria for participatory research, it is necessary and beneficial to include all involved members of a given (social) system. Doing so not only increases validity but also improves the fit between mental health services and the people who use them (Calheiros and Patrício 2014). Authoritative clinical guidelines, for example, strongly recommend involving family members in the treatment of disorders in young people (Carr 2014a). Such involvement should not only foster change by adapting and providing a supportive social context but also help address the different needs of burdened parents or other family members (Meltzer et al. 2011).

The focus of change in therapy can differ. In family therapy, relational aspects are addressed, such as the quality of the relationship, the level of functioning, and the observable interactions between the members of a social system. Changes in these aspects might in turn positively influence individual maladaptive behavior, emotions, and cognitions. Hence, how members perceive these affective-cognitive ways of being, interacting, and functioning with each other is of crucial relevance. For instance, youths’ perceptions (in this case, of marital relationships) predicted their adjustment beyond parental ratings (Erel and Kissil 2003). In summary, session monitoring and outcome research should assess all participants in family therapy.

Research has emphasized the importance of different domains of the relationship between children/adolescents and their parents across development. Fruitful communication and mutual support can evolve when children/adolescents and their parents are affectionately related and share feelings of belonging and cohesion embedded in a warm, caring atmosphere (Paradisopoulos et al. 2015). In addition, improved collective functioning is associated with the ability to make flexible decisions about collective aims and to find solutions while considering available resources; all of these areas are addressed by EVOS (Aguilar-Raab et al. 2018, 2015).

Commonly used self-rating measures of family functioning have rarely been tested for applicability at younger ages. Additionally, it is usually unclear whether ratings provided by adolescents are truly comparable to adults’ judgments. The present research investigated the applicability of EVOS in a sample of adolescents. Results indicated that EVOS showed comparable psychometric quality in adolescent and adult samples. Tests for measurement invariance showed that the same two-factor model held across both age groups with only minor deviation. Only a single item (“For me, the way we talk with each other, is …”), which measures the communication aspect of a relationship, was easier for adolescents to endorse. Given that full measurement invariance is a very strict criterion that is often hard to achieve (Van De Schoot et al. 2015), we still consider this a positive assessment of EVOS’ applicability in adolescent samples. The EVOS scale showed equal reliability in both age groups, qualifying EVOS for direct comparisons of observed scores across adult and adolescent samples. Construct validity of EVOS in adolescent samples was established by comparison with SCORE-15, a related measure of family functioning. EVOS and SCORE-15 overlapped strongly, yet no significant difference in criterion validity between the two measures emerged in our sample. This may be due to the relatively small sample size. One could speculate that the difference between the criterion correlations would have been significant in a larger sample. Such a result would not be surprising, given that SCORE-15 has a problem-focused design, whereas EVOS explicitly does not aim to measure dysfunctional aspects of social relationships. A closer examination of the relative merits of either scale will require much larger samples and, ideally, multi-trait-multi-method investigations (Campbell and Fiske 1959).

The dimensionality of EVOS may still be challenged. EVOS was originally conceived as a measure with two distinct yet highly related factors. It was accordingly constructed using exploratory factor analysis, and the two-factor model has been supported by confirmatory factor analysis. Our current data also show that the two-factor model fits the data better than a single-factor model. Nonetheless, both EVOS subscales showed similar correlations with other measures: although conceptually distinct, they are highly correlated and do not differ substantially in criterion validity. Since the initial scale development (Aguilar-Raab et al. 2015), we have advocated using the total score rather than interpreting the subscales separately. We still argue that EVOS, as a comprehensive measure built on several theoretical foundations, should be interpreted as a global score unless a researcher has a specific, theoretically grounded question in mind. Finally, the global score offers higher reliability than the subscale scores.
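The reliability advantage of a global score can be illustrated, under simplifying assumptions, with the Spearman-Brown prophecy formula. This worked example assumes, hypothetically, that the total scale behaves like a subscale lengthened by a factor k with parallel items; the subscale reliability ρ = 0.80 is an illustrative value, not an EVOS estimate. Because real subscales are rarely strictly parallel, the actual gain may differ.

$$
\rho_{\text{total}} = \frac{k\,\rho}{1 + (k - 1)\,\rho}, \qquad k = 2,\ \rho = 0.80 \;\Rightarrow\; \rho_{\text{total}} = \frac{1.60}{1.80} \approx 0.89
$$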

Family relationship quality significantly predicted psychological distress among adolescents. Replicating earlier research, our results confirmed the association between family relations and psychological health. The family environment emerged as an important factor for health and well-being (Miller et al. 2009; Rusbult and Van Lange 2003). Prior research has shown that positive social relationships aid the development of self-regulation skills in adolescents (Farley and Kim-Spoon 2014). Conversely, negative interactions and low parental involvement have been identified as risk factors for the development of severe psychopathology (Fruzzetti et al. 2005).

All in all, better family relationship quality was linked to less psychological distress in adolescents. EVOS proved to be an economical, reliable, and valid measure of family relationship quality in adolescents, and the present research supports its use as an outcome measure in family therapy.

Limitations and Future Research

A potential limitation is the restricted reliability of the SDQ. The self-report version has been evaluated to some extent and has been used successfully in many languages. Nonetheless, the SDQ has shown less than ideal internal consistencies in a large-scale evaluation across many countries: Essau et al. (2012) reported internal consistencies for the SDQ total score ranging from α = 0.52 (Italy) to α = 0.74 (Germany). A more recent standardization in Germany reported a slightly higher internal consistency of α = 0.77 for the total score (Lohbeck et al. 2015). In our sample, the SDQ behavioral problems subscale showed insufficient reliability. This is unfortunate, but it mirrors the known low reliability of this subscale across a wide range of samples (Essau et al. 2012; Lohbeck et al. 2015). We chose this instrument for validation not only because few alternative instruments have been validated in German, but also because the five items of each subscale are related yet reflect different facets of the range of characteristics specified by the subscale name, which is particularly desirable for a screening instrument in this context (Lohbeck et al. 2015). Compared with other instruments available for research on adolescents, it is a far more economical and readily applicable screening instrument for children and youth that provides a global assessment of both behavioral problems and behavioral strengths. Future research should not only examine construct validity through cross-sectional correlations with other, more reliable instruments, but also, in particular, test criterion validity in longitudinal studies in order to fully appreciate the advantages of EVOS.
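The Alpha values cited above are Cronbach's α coefficients, which compare the summed item variances with the variance of the total score: α = k/(k − 1) · (1 − Σσ²ᵢ/σ²ₓ), where k is the number of items. A minimal dependency-free sketch follows, assuming a complete respondents × items matrix with no missing values and a non-constant total score; the function name `cronbach_alpha` and the toy responses are hypothetical and unrelated to the SDQ data.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix.

    Assumes no missing values and that total scores are not all identical.
    """
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the respondents' total (sum) scores
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of five respondents to a three-item scale (0-2 ratings)
toy = [[0, 1, 1], [1, 1, 2], [2, 2, 2], [1, 0, 1], [2, 2, 1]]
print(round(cronbach_alpha(toy), 3))
```

Because α rises when items covary strongly relative to their individual variances, a subscale whose items deliberately tap different facets, as described for the SDQ above, can show modest α values even when its items are substantively related.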

Our analyses of construct validity included only half of the sample and only high school students. This restriction resulted from procedural and ethical concerns raised by the school administration, but it nonetheless poses a limitation. Future research should investigate EVOS’ construct validity in larger, probability-based samples.

Although we previously demonstrated measurement invariance of EVOS across different social contexts, showing that EVOS assesses the same construct of social relationships even in considerably dissimilar situations (Aguilar-Raab et al. 2015), we note that the majority of adult participants in our previous studies were female. In contrast, the sub-samples available in the studies presented here included approximately equal numbers of male and female youths, with slightly more males than females. Because we neither controlled for gender nor stratified sub-samples by gender, this is a limitation that should be addressed in future research, with a special focus on gender differences in adolescents in relation to the development of psychopathology and family relationship quality.

Our adolescent sample came from families of high socio-economic status, which is associated with a more advantageous social background. In a review, Conger et al. (2010) pointed out that low socio-economic status has negative (causal) effects on adults, children, and their relationships. Our findings may therefore not be representative of adolescents from families of lower socio-economic status, which restricts their generalizability. Nonetheless, at least with regard to psychological distress, our sample did not differ substantially from larger, more diverse samples (Lohbeck et al. 2015). Further longitudinal research will be necessary to elucidate causal relations between perceived family relationship quality, as a potential protective factor, and psychological health.