Recovery from work is a crucial process that occurs during nonwork hours, and can benefit individuals’ health and well-being in several ways (Demerouti et al., 2009; Sonnentag et al., 2017). Recovery experiences are the mechanisms through which the recovery process occurs and can be considered an essential mediator or moderator between job stressors and health problems (Kinnunen et al., 2011; Sonnentag & Fritz, 2015). Sonnentag and Fritz (2007) proposed four specific off-job experiences (psychological detachment, relaxation, mastery, and control) that allow individuals to take a break from job demands and replenish drained resources. Psychological detachment involves a state in which an individual mentally disengages from work and stops considering job-related events during nonwork hours (Sonnentag & Bayer, 2005). Psychological detachment is the core dimension most frequently studied in this research domain (Wendsche & Lohmann-Haislah, 2017). Relaxation is characterized by low sympathetic activation associated with increased positive affect (Stone et al., 1995). It can be obtained through many activities such as listening to music and watching TV. Mastery captures the experience that arises from participating in challenging activities and learning various skills, such as learning a new language (Sonnentag & Fritz, 2007). Control is defined as an individual’s ability to determine the time they spend on certain activities during off-job time (Sonnentag & Fritz, 2007). Accumulated evidence has indicated that these recovery experiences are related to a series of indicators of health and well-being, such as increased vigor at work (Kinnunen et al., 2010), lower levels of work-family conflict (e.g., Molino et al., 2015), and reduced need for recovery (Siltaloppi et al., 2009).

Sonnentag and Fritz (2007) developed the Recovery Experience Questionnaire (REQ) to capture the above psychological experiences during off-job time. This questionnaire includes 16 items on psychological detachment, relaxation, mastery, and control. Items are rated on a 5-point Likert scale ranging from 1 (totally disagree) to 5 (totally agree). The REQ has been translated into several languages and validated in different countries such as Nepal (Panthee et al., 2020), Sweden (Almén et al., 2018), South Korea (Park et al., 2011), and Finland (Kinnunen et al., 2011). All of these studies supported the proposed four-factor first-order model over various competing models. In contrast, Shimazu et al. (2012) found that a three-factor first-order model best fit their data, with the psychological detachment and relaxation items collapsing into one factor. Hong and Zhang (2017) translated the REQ into Chinese and found that the four-factor first-order model fit the data better than the alternative models.

Although much is known about the construct validity of recovery experiences, outstanding issues still need to be addressed. On the one hand, some scholars using the scale tend to use the overall recovery experience, namely, using the mean of the four dimensions to characterize individuals’ recovery experiences (e.g., Ding et al., 2020; Yang et al., 2020). An implicit assumption behind using the mean is that a higher-order latent variable explains the common variation in the four recovery experiences. However, meta-analytic evidence has indicated that the four recovery experiences vary in the extent to which they are inter-correlated (ranging from .19 to .70; Steed et al., 2021). One way to deal with this issue is to perform a simple CFA and evaluate the goodness-of-fit indices of a one-factor second-order model. However, the results from a single sample are highly susceptible to sampling and measurement errors. Meta-analytic structural equation modeling may offer an alternative solution, as this method allows researchers to conduct a CFA using the pooled correlation matrix (Cheung & Chan, 2005). Therefore, the first aim of our study is to explore the construct validity of the REQ using a meta-analytic CFA.

On the other hand, some scholars have used a cross-sectional design to investigate the antecedents or consequences of recovery experiences (e.g., Ding et al., 2020), leading to attention being paid mainly to between-person differences. However, most studies using experience sampling methods have shown that recovery experiences can fluctuate daily (Sonnentag et al., 2017), and that a substantial share of the variance in recovery experiences can be attributed to intra-individual sources (e.g., Podsakoff et al., 2019; Sonnentag et al., 2017). To achieve a better understanding of the REQ, it is necessary to assess its construct validity at the day level. In this study, we aim to investigate whether the REQ has a similar structure across between- and within-person levels using a multi-level CFA. Multi-level CFA decomposes the total sample covariance matrix into within-level and between-level covariance matrices and uses these two matrices to analyze the factor structure at each level (Muthén, 1994).

In summary, we conducted two studies to address these gaps in the literature. In study 1, we conducted a systematic literature review of articles using the REQ. We aimed to explore whether a one-factor second-order model could explain the correlations between the four recovery experiences using two-stage structural equation modeling (TSSEM) (Cheung & Chan, 2005; Cheung & Hong, 2017). In study 2, we invited participants to complete the REQ on five consecutive workdays before going to bed. We then performed a series of multi-level CFA to investigate whether the REQ has a similar structure across different levels. We expected that our findings would provide valuable information to guide researchers in accurately using the REQ.

Study 1: Meta-Analytic CFA

Method

Literature Search

We used several search strategies to identify relevant literature. We searched for articles using a set of keywords and their combinations, including “recovery experience,” “psychological detachment,” “relaxation,” “mastery,” and “control.” The literature search included articles published until January 2021. We first conducted a broad search for potential literature on recovery experiences in the Web of Science database and then undertook follow-up searches using EBSCO, PsycINFO, and ProQuest. We also searched relevant journal websites, including Journal of Organizational Behavior, Journal of Applied Psychology, Journal of Vocational Behavior, Work and Stress, Journal of Occupational Health Psychology, Journal of Management, Personnel Psychology, and Health Psychology Review. Finally, we manually searched the reference lists of published reviews and meta-analyses to locate additional literature (Steed et al., 2021; Sonnentag et al., 2017; Bennett et al., 2018).

Inclusion Criteria

To be included, studies needed to meet the following criteria. First, primary studies must have been empirical and quantitative, and had to report sample sizes and correlations or statistics that could be transformed into correlations. Second, we required the paper to be written in English. Third, we used each sample as a separate entry in cases in which one article used multiple samples. Fourth, the study must have reported at least one between-level correlation coefficient between the four recovery experiences. Fifth, recovery experiences had to be assessed using the scale developed by Sonnentag and Fritz (2007). Finally, when several articles drew on the same dataset, we included only one of them.

Coding Procedure

Following a literature search, the first and second authors independently coded each record that met the inclusion criteria. Specifically, the coders recorded the inter-correlations between the four recovery experiences, author(s), publication year, sample size, and publication status. After both raters categorized each effect, the results were compared to establish agreement, initially estimated at 96%. Coding disagreements were handled through discussion.

Meta-analytic Procedure

In this study, we used TSSEM (Cheung & Chan, 2005; Cheung & Hong, 2017) to conduct a series of CFAs with the metaSEM package in R version 3.6.3. In the first stage of TSSEM, we synthesized the correlation matrices of the primary studies into a pooled correlation matrix using a random-effects or fixed-effects model. In the second stage of TSSEM, we conducted a CFA on the pooled correlation matrix using a weighted least squares estimator. At both stages, we used several indices to assess the model fit, including chi-square (χ2), standardized root-mean-square residual (SRMR), root-mean-square error of approximation (RMSEA), comparative fit index (CFI), and Tucker-Lewis index (TLI). Previous studies suggested that a smaller χ2 value indicates better fit and that model fit is acceptable if SRMR is .08 or less, RMSEA is .08 or less, and the CFI and TLI are .90 or greater (Browne & Cudeck, 1993; Meyers et al., 2006). In addition, we used the fail-safe N (Rosenthal, 1979), random effects trim-and-fill method (Duval & Tweedie, 2000), and Begg and Mazumdar’s rank correlation test (Begg & Mazumdar, 1994) to assess publication bias.
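The cutoff rules described above can be written as a small check. The following Python sketch is illustrative only: the function names `rmsea` and `acceptable_fit` are hypothetical, the input values are invented, and the RMSEA formula is the common form based on N − 1 (software packages may differ in whether they use N or N − 1).

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a chi-square statistic (common N - 1 form)."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def acceptable_fit(srmr: float, rmsea_value: float, cfi: float, tli: float) -> bool:
    """Apply the cutoffs cited in the text: SRMR, RMSEA <= .08; CFI, TLI >= .90."""
    return srmr <= 0.08 and rmsea_value <= 0.08 and cfi >= 0.90 and tli >= 0.90

# Hypothetical values chosen only to illustrate the calculation.
example_rmsea = rmsea(chi2=150.0, df=50, n=501)
print(round(example_rmsea, 3))
print(acceptable_fit(srmr=0.05, rmsea_value=example_rmsea, cfi=0.95, tli=0.94))
```

Because χ2 grows with sample size, a check of this kind deliberately leaves the χ2 test itself out of the pass/fail rule and treats it as a separate piece of evidence.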

In this study, we compared four competing models. Model A was a one-factor second-order model in which a single general factor accounted for the four recovery experiences. Model B was a two-factor second-order model in which psychological detachment and mastery formed one factor, and relaxation and control formed another factor. Similarly, in model C, psychological detachment and control formed one factor, while relaxation and mastery formed the second factor. Finally, in model D, the combination of psychological detachment and relaxation represented one factor and the combination of mastery and control represented the other. We used the chi-square difference test to examine whether there was a significant difference in the fit of the first- and second-order models.

Results

Papers Meeting the Inclusion Criteria

The screening process is illustrated in Fig. 1. In our initial search, we identified 2902 citations for possible inclusion, and a further 239 citations were identified through other sources. After duplicates were removed, 2348 papers remained. In total, 1249 papers were excluded based on title and abstract screening, leaving 1099 papers for full-text screening. Of these, 1022 papers were excluded for the following reasons: (1) being a review/meta-analysis/theoretical paper (n = 58) or a qualitative study (n = 66); (2) being written in a language other than English (n = 27); (3) not being relevant to the topic (n = 481) or addressing other recovery themes (n = 158); (4) lacking a corresponding measure of recovery experiences (n = 66); (5) including only one dimension (n = 147) or reporting only intra-personal correlations (n = 3); and (6) using student samples (n = 13) or the same data as another paper (n = 3).

Fig. 1 Flow diagram for the search and inclusion criteria for studies in the meta-analysis

Ultimately, 82 independent samples from 77 articles, with a combined sample size of 27,616, were included in the final meta-analysis. Of these samples, k = 64 were from published articles and k = 13 were from unpublished papers. All the matrices were obtained from papers published between 2007 and 2020. All the included articles were marked with an asterisk (*) in Appendix 1.
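The screening counts reported above can be tallied as a simple bookkeeping check. The Python sketch below uses only the numbers given in the text; the variable names and category labels are illustrative. Under this reading, 1099 papers reach full-text screening, the itemized exclusions sum to 1022, and 77 articles remain, which matches the number of included articles.

```python
# Counts taken from the screening description; names are illustrative.
identified_database = 2902
identified_other = 239
after_duplicates = 2348
excluded_title_abstract = 1249

full_text_exclusions = {
    "review/meta-analysis/theoretical": 58,
    "qualitative": 66,
    "other language": 27,
    "not relevant": 481,
    "other recovery themes": 158,
    "no recovery experience measure": 66,
    "only one dimension": 147,
    "intra-personal correlations only": 3,
    "student samples": 13,
    "same data": 3,
}

duplicates_removed = identified_database + identified_other - after_duplicates
full_text_assessed = after_duplicates - excluded_title_abstract
full_text_excluded = sum(full_text_exclusions.values())
included_articles = full_text_assessed - full_text_excluded

print(duplicates_removed, full_text_assessed, full_text_excluded, included_articles)
# prints: 793 1099 1022 77
```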

Publication Bias

Table 5 in Appendix 2 presents the results of the publication bias analyses. The fail-safe N was sufficiently large for the six inter-correlations, suggesting that the observed effects were robust. Begg and Mazumdar’s rank correlation test was not significant for any of the six inter-correlations, suggesting that publication bias may be absent in our meta-analysis. Trim-and-fill estimates for most of the relationships were relatively consistent with the raw zero-order estimates, except for the correlation between relaxation and mastery (observed r = .37 versus adjusted r = .42) and that between relaxation and control (observed r = .60 versus adjusted r = .64). Caution is required when interpreting these adjusted estimates, as the trim-and-fill method performs poorly when there is substantial between-study heterogeneity (e.g., as observed for the correlation between relaxation and mastery: I2 = 95.42%). Furthermore, the results of Begg’s test and the fail-safe N for these two relationships were acceptable. Hence, we concluded that no severe publication bias was present in our meta-analysis.

Two-Stage Structural Equation Modeling

We first used a fixed-effects model to assess the homogeneity of the correlation matrices. The results showed a poor fit to the data (χ2(346) = 3872.82, p < .001, CFI = .86, TLI = .86, RMSEA = .17, and SRMR = .15). These indices indicated that the primary correlation matrices could not be considered homogeneous. Therefore, a random-effects model was used in the subsequent analysis. Table 1 shows the pooled correlation matrix obtained with the random-effects model. All the correlation coefficients were statistically significant (p < .001), and the pooled correlations ranged between .19 and .58.

Table 1 Pooled correlation matrix with the random-effects model
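To make the idea of random-effects pooling concrete, the following Python sketch pools a single correlation across studies. It is a deliberate simplification and not the procedure used here: stage 1 of TSSEM pools whole correlation matrices multivariately in metaSEM, whereas this sketch applies a univariate DerSimonian-Laird estimator to Fisher z values. The function name `pool_correlations` and the input values are hypothetical.

```python
import math

def pool_correlations(rs, ns):
    """Univariate random-effects pooling (DerSimonian-Laird) on Fisher z values.

    Returns (pooled_r, tau2, i2_percent). This is a simplification, not the
    multivariate stage 1 of TSSEM.
    """
    if len(rs) < 2 or len(rs) != len(ns):
        raise ValueError("need at least two studies with matching sample sizes")
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]            # sampling variance of Fisher z
    w = [1.0 / v for v in vs]                    # fixed-effects weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(rs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                # between-study variance
    w_re = [1.0 / (v + tau2) for v in vs]        # random-effects weights
    z_random = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return math.tanh(z_random), tau2, i2

# Hypothetical study-level correlations and sample sizes.
pooled_r, tau2, i2 = pool_correlations([0.20, 0.45, 0.60], [150, 300, 220])
print(round(pooled_r, 2), round(i2, 1))
```

When the between-study variance estimate is zero, the random-effects weights reduce to the fixed-effects weights, so the two models give the same pooled value.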

Stage 2 of TSSEM was conducted on the pooled correlation matrix. Table 2 shows the test statistics and goodness-of-fit indices of these models. Regarding the first model (model A, the one-factor second-order model), the factor loadings were all acceptable, varying from .44 to .84. However, the chi-square test for this model was significant, indicating an unsatisfactory fit (χ2(2) = 19.74, p < .001, CFI = .99, TLI = .98, RMSEA = .02, and SRMR = .04). Similarly, model C showed the worst fit among the two-factor second-order models (χ2(1) = 18.31, p < .001, CFI = .99, TLI = .96, RMSEA = .03, and SRMR = .04). The goodness-of-fit indices for model B were similar to those found for model C, and its significant chi-square likewise indicated a poor fit to the data (χ2(1) = 11.23, p < .001, CFI = .99, TLI = .98, RMSEA = .02, and SRMR = .03). The fourth model (model D) provided the best overall goodness of fit (χ2(1) = 1.49, p > .05, CFI = .99, TLI = .99, RMSEA = .01, and SRMR = .01) among the four models investigated, with positive and acceptable factor loadings ranging from .47 to .88. Furthermore, the fit of model D differed significantly from that of model A (Δχ2(1) = 18.25, p < .001). These results suggest that a two-factor second-order model (model D) best explains the structure of the REQ.

Table 2 Goodness-of-fit indices of the meta-analytic CFA
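The chi-square difference test reported for models A and D can be reproduced from the two χ2 values and their degrees of freedom. The Python sketch below is a minimal illustration that assumes the two models are nested, as the comparison in the text treats them; the helper names `chi2_sf` and `chi2_difference` are hypothetical, and the upper-tail probability is computed with standard closed-form expressions for integer degrees of freedom rather than a statistics library.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) times a finite sum of y**k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: start from erfc(sqrt(y)) and add the half-integer terms.
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total

def chi2_difference(chi2_restricted, df_restricted, chi2_general, df_general):
    """Return (delta chi-square, delta df, p value) for two nested models."""
    delta_chi2 = chi2_restricted - chi2_general
    delta_df = df_restricted - df_general
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)

# Model A (chi-square = 19.74, df = 2) versus model D (chi-square = 1.49, df = 1).
delta_chi2, delta_df, p = chi2_difference(19.74, 2, 1.49, 1)
print(round(delta_chi2, 2), delta_df, p < 0.001)
# prints: 18.25 1 True
```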

Study 2: Multi-level CFA

Method

Participants and Procedure

Participants were recruited through online advertisements and asked to complete a general questionnaire and a daily survey for five consecutive working days. To participate, they had to work full-time with a regular work schedule (shift workers were excluded). The initial survey link was distributed to 170 employees who expressed interest in the study. In total, 152 of the 170 participants completed the initial survey (89%) and reported demographic information. They also reported ID codes and used them throughout the study to match responses across the 5 days. Of these participants, 46.7% were married and 57.2% were female. The average age of the participants was 31.75 years old (SD = 7.10) and they worked an average of 46.59 (SD = 11.33) hours per week. The mean job tenure of these participants was 8.62 years (SD = 7.52), and most were highly educated (93% completed college or university). The participants were instructed to assess their recovery experiences in the daily survey before going to bed across five consecutive working days. Each individual participated on average for 4.77 days. Finally, we obtained 725 day-level data points by matching the data, yielding a response rate of 95%.

Measures

Recovery Experiences

Recovery experiences were assessed using the scale developed by Sonnentag and Fritz (2007) and its Chinese version (Hong & Zhang, 2017). All items were revised to measure daily recovery experiences. This scale includes four subscales, each containing four items rated on a 5-point scale (1 = totally disagree, 5 = totally agree). Sample items include, “Today, during time after work, I distance myself from my work,” “Today, during time after work, I take time for leisure,” “Today, during time after work, I do things that challenge me,” and “Today, during time after work, I decide my own schedule.” The mean Cronbach’s alpha was .94 for psychological detachment, .93 for relaxation, .91 for mastery, and .96 for control experience across 5 days.
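For readers who want to see how a reliability coefficient of this kind is obtained, the following Python sketch computes Cronbach’s alpha for one subscale on one day. The responses are invented for illustration and the function name `cronbach_alpha` is hypothetical; a mean alpha across days, as reported above, would be obtained by applying the same calculation to each day and averaging the results.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses of five people to a four-item subscale (1-5 scale).
responses = [
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(responses), 2))
```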

Statistical Analyses

Data were analyzed using a multi-level CFA procedure (Muthén, 1994) with Mplus 7.4 (Muthén & Muthén, 1998–2012). Multi-level CFA divides the total covariance matrix into within-level and between-level covariance matrices and uses these two matrices to conduct the CFA. Responses were approximately normally distributed, with skewness statistics ranging from −1.01 to .47 and kurtosis values ranging from −1.30 to .52. We used the maximum-likelihood estimation method given that all items had acceptable absolute values of skewness (< 2.0) and kurtosis (< 7.0) (Curran et al., 1996). Several goodness-of-fit indices were used to assess and compare the models, including chi-square (χ2), standardized root-mean-square residual (SRMR), root-mean-square error of approximation (RMSEA), Akaike’s information criterion (AIC), Bayesian information criterion (BIC), comparative fit index (CFI), and Tucker-Lewis index (TLI) (Browne & Cudeck, 1993; Meyers et al., 2006).

Results

Preliminary Analyses

Table 3 presents the means, standard deviations, and correlations of the 16 items at the between- and within-person levels. Correlations between the 16 items were substantially higher at the between-person than at the within-person level. Before conducting the multi-level CFA, we first examined the intra-class correlations (ICC) to determine whether the multi-level analysis was justified (Klein & Kozlowski, 2016). The ICC(1) ranges from 0 to 1, with higher values indicating greater proportions of between-level variance (Dyer et al., 2005). In the present study, ICC(1) values of the items ranged from .28 to .50 (see Table 3), meaning that 50% to 72% of the item variance was attributable to within-person variation. Thus, a substantial portion of the variance lay at the within-person level, and the multi-level modeling approach was deemed appropriate to test our model.

Table 3 Descriptive statistics and between-level and within-level correlations
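The ICC(1) logic described above can be illustrated with a one-way ANOVA estimator. The Python sketch below is an assumption-laden illustration: the function name `icc1` and the daily scores are hypothetical, and an ANOVA-based ICC(1) need not equal the value that a multi-level modeling program estimates by maximum likelihood. One minus the ICC(1) gives the share of variance at the within-person level.

```python
def icc1(groups):
    """ICC(1) from a one-way ANOVA; `groups` maps each person to daily scores."""
    sizes = [len(v) for v in groups.values()]
    g, n_total = len(groups), sum(sizes)
    if g < 2 or n_total <= g:
        raise ValueError("need at least two persons and repeated observations")
    grand_mean = sum(sum(v) for v in groups.values()) / n_total
    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
    )
    ss_within = sum(
        (x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v
    )
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (n_total - g)
    # Average group size, adjusted for unequal numbers of days per person.
    k0 = (n_total - sum(s ** 2 for s in sizes) / n_total) / (g - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)

# Hypothetical daily scores on one item for three people over five days.
daily = {
    "p1": [4, 5, 4, 5, 4],
    "p2": [2, 3, 2, 2, 3],
    "p3": [3, 4, 5, 3, 4],
}
between_share = icc1(daily)
print(round(between_share, 2), round(1 - between_share, 2))
```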

Multi-level CFA

Table 4 presents the fit indices of the six competing models. Among the four first-order models, the four-factor model showed the best fit to the data (χ2(196) = 793.40, CFI = .94, TLI = .93, RMSEA = .07, between-level SRMR = .06, and within-level SRMR = .06). The chi-square difference test also showed that the four-factor model provided a much better fit to the data than (a) the one-factor model (Δχ2 = 3782.78, Δdf = 12; p < .001), (b) the two-factor model (Δχ2 = 1570.03, Δdf = 10; p < .001), and (c) the best-fitting three-factor model (Δχ2 = 786.29, Δdf = 6; p < .001). Meanwhile, the AIC and BIC values of the four-factor model were lower than those of the other three first-order models.

Table 4 Goodness-of-fit indices of the multi-level CFA

In addition, following the results of study 1, we examined the fit of the two second-order models. The two-factor second-order model provided a good fit to the data (χ2(198) = 796.47, CFI = .94, TLI = .94, RMSEA = .07, between-level SRMR = .06, and within-level SRMR = .06) and was not significantly different from the four-factor first-order model (Δχ2 = 3.07, Δdf = 2; p > .05, ΔAIC = .93). However, the one-factor second-order model fit significantly worse than the four-factor first-order model (Δχ2 = 26.00, Δdf = 4; p < .001) and the two-factor second-order model (Δχ2 = 22.93, Δdf = 2; p < .001). Taken together, the two-factor second-order model and the four-factor first-order model provided the best fit at both the within- and between-person levels.

As shown in Fig. 2, all factor loadings were significant (ps < .001). At the between-person level, the standardized factor loadings ranged from .79 to .99, with a high average of .96. The inter-correlations between the four factors ranged from .21 to .77. At the within-person level, standardized factor loadings ranged from .66 to .92, with an average of .83. The inter-correlations between the four factors ranged from .20 to .69. Next, we constrained the factor loadings to be equal across within- and between-person levels. The constrained model showed a significant increase in χ2 (Δχ2(12) = 28.95, p < .01). This result indicates that the factor loadings of the four-factor first-order model differed across levels, with higher standardized loadings at the between-person level than at the within-person level.

Fig. 2 Path diagram of the final four-factor model of recovery experiences (standardized solution)

Discussion

Our study aimed to investigate the factor structure of the REQ by using two extensions of CFA. In summary, our results revealed several novel insights that have theoretical implications for the use of this scale. First, to the best of our knowledge, our study is the first attempt to perform a meta-analysis on the structure of recovery experiences. Our results showed that the two-factor second-order model explained the correlations among the four recovery experiences much better than the one-factor second-order model. Hence, using the mean of the four dimensions to characterize recovery experience appears to be inappropriate.

Two theories can be used to explain the two-factor second-order model. Psychological detachment and relaxation have their roots in the effort-recovery (E-R) model (Meijman & Mulder, 1998), and mastery and control can be explained using conservation of resources (COR) theory (Hobfoll, 1998). According to the E-R model, effort expenditure at work can trigger load effects (e.g., acute fatigue and emotional exhaustion). However, the load effects disappear when an individual is no longer confronted with job demands. Psychological detachment and relaxation can aid recovery because they imply that no further demands are placed on the functional systems called upon during work (Sonnentag & Fritz, 2007). Hence, we defined this factor as buffer-oriented recovery strategies, given that psychological detachment and relaxation can buffer the effects of job stressors on health and well-being. According to COR theory, depletion effects occur when an individual’s resources are drained or when no resources are gained after resource investment. Individuals therefore need to acquire new resources to achieve effective recovery. Mastery and control can aid recovery because they help individuals build up internal resources such as skills, competencies, and self-efficacy (Sonnentag & Fritz, 2007). Therefore, we defined this factor as supply-oriented recovery strategies, given that mastery and control can help individuals obtain new resources.

Second, the results of the multi-level CFA indicated that the REQ had adequate psychometric properties at both the between- and within-person levels. Our findings align with previous studies showing that the four-factor first-order structure fits the data better than alternative models across different levels of analysis (e.g., Bakker et al., 2014). Our findings also indicate that the two-factor second-order model fit the data comparably well at both the between- and within-person levels.

Moreover, factor loadings were higher at the between-person level than at the within-person level. A possible explanation may be that recovery is a highly fluctuating experience, with a substantial share of its variance attributable to within-person factors (e.g., Podsakoff et al., 2019; Sonnentag et al., 2017). Certain recovery experiences are less likely to occur on a daily basis. For example, individuals may experience high levels of psychological detachment and relaxation on some days, but not on other days. Fluctuations at the day level may have contributed to this result. One unanticipated finding was that the correlation pattern in our study did not fully match that observed in earlier studies. Previous studies have shown that the inter-correlations between recovery experiences are stronger at the between-person level (Bakker et al., 2014; Breevaart et al., 2012). However, we found that two of the correlations were greater at the within-person level. A possible explanation for this is sampling error. Taken together, scholars can use either the four-factor first-order model or the two-factor second-order model in future studies.

Limitations and Future Research

Our study has some limitations that need to be addressed. First, we used an experience sampling method to investigate the recovery experiences during weekday evenings. Recovery processes can occur in various temporal settings, such as workday breaks, weekends, and vacations (Sonnentag et al., 2017). Further research can use a weekly experience sampling design to test our findings. Second, only papers written in English were included in our meta-analytic CFA. Future studies could collect data from other versions of the REQ to validate our results and compare the factor structure of the REQ or its measurement invariance across different cultures. Third, this work mainly focused on testing the construct validity of the REQ and two recovery strategies. Future studies should further examine their discriminant, criterion-related, and incremental validity.

Practical Implications

Our comprehensive assessment of the existing research has several practical implications. We found that it was inappropriate to characterize the entire construct by using the mean value of the four recovery experiences, as some of the studies did. However, researchers can use both recovery strategies to conduct academic exploration related to recovery from work. Researchers can simplify the research model by using the two recovery strategies as a superordinate concept of the four recovery experiences. Furthermore, researchers are accustomed to using traditional CFA to test construct validity for a given scale. In this study, we used meta-CFA and multi-level CFA to further evaluate the construct validity of the REQ. This work provided two relatively novel methods (TSSEM and experience sampling method) that may be used to test construct validity for future studies.

Conclusion

In summary, this study further investigated the construct validity of the REQ using two extensions of CFA. The REQ showed good construct validity at both the within- and between-person levels. Meta-analytic CFA showed that the two-factor second-order model provided a much better fit than the one-factor second-order model did. Hence, scholars should not use the mean of the four dimensions to characterize individuals’ recovery experiences. The two-factor second-order model is a viable alternative.