What does it mean for people to feel good psychologically? Seligman (2011) suggested that the pleasant life, the good life, and the meaningful life underpin the WB concept. This operationalization combines two approaches to WB in positive psychology (Huta & Waterman, 2014; Ryan & Deci, 2001): hedonia and eudaimonia. Hedonia refers to the affective dimension of WB and includes happiness, the balance between positive and negative affect, and life satisfaction (Graham, 2017; Kahneman et al., 1999). Therefore, hedonic WB reflects what Seligman named the pleasant life. Eudaimonia refers to striving for positive functioning (Ryff & Keyes, 1995; Ryff & Singer, 2008), or what Seligman (2011) called the good life and the meaningful life. Positive functioning refers to how well individuals manage their own environment, achieve long-term purposes, build good relationships, and accept themselves, all of which contribute to self-actualization or personal growth. Additionally, a series of concepts in the literature have tried to capture as closely as possible what it means for an individual to feel psychologically well: general WB (Longo et al., 2016, 2017, 2018), subjective WB (Diener, 2000; Lucas et al., 1996), psychological WB (Ryff, 1995, 2018), social WB (Keyes, 1998), positive mental health (Keyes, 2013), mental WB (Tennant & Conaghan, 2007), spiritual WB (Ellison, 1983; Fisher et al., 2000; Gomez & Fisher, 2003), and flourishing (Diener et al., 2010). Longo et al. (2018) highlight in a meta-analysis the existence of 14 main conceptualizations of WB in the literature: happiness, vitality, calmness, optimism, involvement, self‐awareness, self‐acceptance, self‐worth, competence, development, purpose, significance, congruence, and connection. The authors emphasize that these constructs represent dimensions of the superordinate factor that they name “general WB.”

Psychometric Properties of the SDHS

In the realm of clinical psychology, as well as positive psychology, psychometricians have developed tools that measure either depressive symptoms or subjective happiness. Considering the polarity of affectivity and that WB means more than the presence of positive affect and the absence of negative affect, an instrument was needed to measure WB on the depression-happiness continuum. At first glance, it might seem counterintuitive that a WB scale should include items that measure depression. This is not the case, however, because WB indicates whether individuals’ lives are getting better or worse. Moreover, scientists have provided empirical support for the fact that coping with negative emotions significantly contributes to the state of WB (Gross & John, 2003; Mayordomo-Rodríguez et al., 2015). When maladaptive or defensive mechanisms appear instead of adaptive coping strategies, this can suggest ill-being and psychological disorders, including affective disorders such as depression. In brief, WB is a complex psychological state that reflects positive emotional experiences, an individual’s balance of positive over negative states (Lyubomirsky et al., 2005), or the ability to cope with negative states.

Clinicians need psychometrically robust tools to measure the progress achieved in therapeutic interventions, progress that is quantified not only in the remission of symptoms but also in the improvement of WB. Since positive psychology appeared only three decades ago, clinicians quantify progress according to the reduction of dysfunctions, more precisely the decrease of scores on the scales of depression, anxiety, stress, etc. (Holmqvist et al., 2015). Specialists in positive psychology have proposed a series of WB measures, focusing on the hedonic and eudaimonic approaches to this complex construct. From the hedonic perspective, the following measures were developed: the Subjective Happiness Scale (Lyubomirsky et al., 2005), the Satisfaction With Life Scale (Diener et al., 1985; Pavot & Diener, 2008), and measures of subjective well-being (Diener, 2000; Diener et al., 2010). The eudaimonic approach includes measures of psycho-social well-being (Ryff, 1989, 1995; Ryff & Keyes, 1995), The Meaning in Life Questionnaire (Steger et al., 2006), measures of spiritual well-being (Ellison, 1983; Ellison & Smith, 1991), and The Questionnaire for Eudaimonic Well-Being (Waterman et al., 2010).

Increasingly, specialists needed to integrate the two perspectives (Guidi et al., 2013; Keyes, 2005) in order to develop measures for assessing the progress obtained following therapeutic interventions. Joseph et al. (2004) put together the depression and happiness constructs and built an instrument that captures WB. The authors argued that the assessment of depressive symptoms could theoretically be extended through the zero point into a measure of happiness. Items measuring depression (reverse scored) combined with happiness scores reflect the latent construct or factor measured by the SDHS, namely WB. The findings obtained in the original study indicated that the scale is psychometrically robust. Joseph et al. (2004) underlined the usefulness of the SDHS in research and in assessing the progress obtained during therapeutic interventions.

It is important to note that Joseph et al. (2004) have the merit of proposing for the first time an instrument that covers the depression-happiness continuum, combining the clinical model with positive psychology. The items of the SDHS fit into the hedonic approach to WB, because they reflect the affective component of WB. The same researchers proposed a more comprehensive scale (Longo et al., 2017, 2018) to measure WB, more precisely general WB. It is a more comprehensive measure of WB, since it combines the hedonic and eudaimonic approaches to WB. However, the authors emphasized that they developed the SDHS to be a useful tool for measuring the levels of negative and positive affective states, with rapid application in clinical settings.

Current Research

As mentioned above, the original study highlighted good psychometric qualities of the SDHS. The literature also mentions several validation studies of adapted versions of the SDHS in Spanish (Martínez et al., 2018) and Arabic (Yildirim & Balahmar, 2020). To the author’s knowledge, no study has yet been performed to validate an adapted version in Romania. Such adaptation studies are thus needed so that researchers can conduct more cross-cultural research. In addition, previous validation studies were carried out using only the CTT approach. None used IRT analysis, which outperforms CTT (specifically CFA) by considering not only raw scores (as CFA does) but also score patterns and item properties, such as discrimination and difficulty parameters (Embretson & Reise, 2013). For this reason, the current study aims to address this gap in the literature by including the modern psychometric approach of IRT, specifically the graded response model (GRM), to explore the precision of the SDHS. More precisely, this study seeks to evaluate the contribution of each SDHS item to the latent construct as well as the location of the items on the latent construct continuum.

Methods

Participants and Procedure

The current study sampled 1326 participants (44.26% male) from Romania, aged 18 to 64 years (Mage = 34.09, SD = 9.4). In terms of education, 31.9% of participants had completed high school, 38.9% a bachelor’s degree, and 29.2% a master’s degree. Recruitment relied on a series of campaigns carried out via WhatsApp, social networking sites (SNSs), and e-mail between January and April 2022, using non-probabilistic sampling based on the snowball technique. Five hundred eighty participants enrolled in various postgraduate training courses at the University of Bucharest were invited by e-mail to get involved in this study, and 514 accepted the invitation. They were asked to share the research link in their WhatsApp contact list and Facebook network. Thus, another 841 questionnaires were collected. Twenty-one questionnaires were removed because the participants answered the two filler questions incorrectly. The eligibility criteria were (i) to be at least 18 years old and (ii) to be a native speaker of the Romanian language. Eight participants were ineligible because they were under 18 years old. Thus, in the end, 1326 questionnaires were retained. Participants gave their informed consent to complete the battery of questionnaires before engaging in the research. The informed consent form described the main aim of the study, the voluntary and anonymous nature of participation, and the possibility to withdraw at any time without explanation. To conduct the test–retest reliability analysis at a 2-week interval, we renewed the invitation for all 514 participants who had initially accepted the e-mail invitation, and 125 participants agreed to continue their involvement in this research.

Ethics

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation. The study adhered to the tenets of the Helsinki Declaration 1975 as revised in 2000.

Measures

Sociodemographic variables

Sociodemographic data were collected on age, gender, and educational level.

The Short Depression-Happiness Scale (SDHS; Joseph et al., 2004) is a 6-item scale that assesses well-being on a continuum of depression and happiness. Three items are negatively worded (e.g., “I felt cheerless”), and the other three are positively worded (e.g., “I felt happy”); all items evaluate the frequency of mood states in the past week. Each response is scored on a 4-point Likert scale, ranging from 0 (never) to 3 (often). The total score of items assessing depression ranges from 0 to 9, and a higher score indicates a higher level of depression. The total score of items assessing happiness ranges from 0 to 9, and a higher score indicates a higher level of happiness. Because higher raw scores on the negatively worded items imply lower levels of WB, these items are reverse scored. The overall score measuring WB is computed by adding the reversed scores on the depression items to the scores on the happiness items. It ranges between 0 and 18. A higher score indicates a higher level of well-being. In the original study (Joseph et al., 2004), Cronbach’s alpha of the SDHS ranged from 0.77 to 0.92. The translation of the SDHS into Romanian was achieved following a forward–backward translation procedure, in accordance with the recommended methodological approach described by Sousa and Rojjanasrirat (2011).
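The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring script: the function name `score_sdhs` and the dictionary input format are hypothetical, while the item assignment (items 1, 3, and 6 negatively worded; items 2, 4, and 5 positively worded) follows the item descriptions given in this article.

```python
NEGATIVE_ITEMS = (1, 3, 6)   # negatively worded (depression) items
POSITIVE_ITEMS = (2, 4, 5)   # positively worded (happiness) items
MAX_RESPONSE = 3             # responses run from 0 (never) to 3 (often)


def score_sdhs(responses):
    """Return depression, happiness, and overall WB scores.

    `responses` maps item number (1-6) to a raw response in 0..3.
    """
    if set(responses) != set(NEGATIVE_ITEMS + POSITIVE_ITEMS):
        raise ValueError("expected responses for items 1-6")
    if any(r not in range(MAX_RESPONSE + 1) for r in responses.values()):
        raise ValueError("responses must be integers from 0 to 3")
    depression = sum(responses[i] for i in NEGATIVE_ITEMS)
    happiness = sum(responses[i] for i in POSITIVE_ITEMS)
    # Reverse scoring: 0 -> 3, 1 -> 2, 2 -> 1, 3 -> 0
    reversed_negative = sum(MAX_RESPONSE - responses[i] for i in NEGATIVE_ITEMS)
    return {
        "depression": depression,
        "happiness": happiness,
        "well_being": reversed_negative + happiness,
    }


# Hypothetical respondent
print(score_sdhs({1: 1, 2: 3, 3: 0, 4: 2, 5: 3, 6: 0}))
```

Keeping the two 0 to 9 subscores alongside the 0 to 18 total makes it easy to check that the total equals the happiness score plus nine minus the depression score.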

The Brief Resilience Scale (BRS; Smith et al., 2008) is a 6-item scale which measures the ability to bounce back or recover from stress. Items include both positively and negatively worded sentences, such as “I usually come through difficult times with little trouble” and “It is hard for me to snap back when something bad happens.” Each item is scored from 1 (strongly disagree) to 5 (strongly agree). The total score of the BRS ranges from 6 to 30, and a higher score indicates a higher level of resilience. In the current study, the BRS showed very good psychometric properties in terms of (i) internal consistency (ω = 0.83, 95% CI [0.81, 0.85]; CR = 0.86); (ii) convergent validity (AVE = 0.51); and (iii) construct validity (CFI = 0.98, TLI = 0.98, RMSEA = 0.05, CI [0.03, 0.08], SRMR = 0.05, λs range from 0.58 to 0.89).

The Perceived Stress Scale-10 (PSS-10; Cohen & Williamson, 1991) is a 10-item scale which assesses the global level of perceived stress. It includes both negatively worded questions about how often respondents felt a certain way over the past month (e.g., “In the last month, how often have you felt nervous and stressed”) and positively worded questions (e.g., “In the last month, how often have you felt that things were going your way”). Each item is scored on a 5-point Likert scale from 1 (never) to 5 (always). Scores for positively worded items must be reversed to obtain the total score. The total score of the PSS-10 ranges from 10 to 50, and a higher score indicates a higher level of perceived stress. In the current study, the PSS-10 showed very good psychometric properties in terms of (i) internal consistency (ω = 0.80, 95% CI [0.78, 0.85]; CR = 0.91); (ii) convergent validity (AVE = 0.52); and (iii) construct validity (CFI = 0.96, TLI = 0.95, RMSEA = 0.06, CI [0.05, 0.08], SRMR = 0.05, λs range from 0.61 to 0.80).

Data Analysis

Data analysis was conducted using Mplus 8.8 (2022), Stata 16.1 (2020), and SPSS 28 (2021). The analysis covered descriptive statistics of the sample characteristics and the psychometric properties of the SDHS, examined using both CTT and IRT. First, the internal consistency of the SDHS based on a congeneric model and McDonald’s ω (1999), as well as test–retest reliability at a 2-week interval, were computed. Second, construct validity was tested by means of a confirmatory factor analysis (CFA) using the robust maximum likelihood (RML) estimation method. Various goodness-of-fit indices were calculated to determine the acceptability of the model: the root mean square error of approximation (RMSEA), the standardized root mean square residual (SRMR), the comparative fit index (CFI), and the Tucker-Lewis index (TLI). CFI and TLI values > 0.90 are acceptable and > 0.95 are good (L. Hu & Bentler, 1999). The RMSEA and SRMR should preferably be less than or equal to 0.08. Third, convergent validity was assessed by calculating the average variance extracted (AVE) with a cut-off criterion of > 0.50 (Hair et al., 2018) and composite reliability (CR) with a cut-off criterion of > 0.70 (Chin et al., 2003). Fourth, concurrent validity was tested based on Pearson correlations between WB, resilience, and perceived stress. Fifth, IRT analysis was conducted to measure the information given by each item of the scale. The assumption of local independence was verified. The slope/discrimination (α) and threshold/difficulty (β) parameters of each item were then estimated to determine, based on the Fisher information, how much psychometric information the SDHS provides about the latent construct, i.e., WB. Items with α > 1.7 are considered to provide very high information; 1.35 < α < 1.69, high information; 0.65 < α < 1.34, moderate information; and 0.35 < α < 0.64, low information (Baker, 2001).
In addition to the α and β parameters, graphs of trace lines, curves of each item, and all items together were considered, including boundary characteristic curves (BCCs), item information function (IIF), the test information function (TIF), and the test characteristic curve (TCC).
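To make the role of the α and β parameters concrete, the graded response model can be sketched as follows. This is a minimal illustration, assuming the usual logistic formulation in which each boundary characteristic curve gives the probability of responding in category k or higher; the function names and the parameter values are hypothetical and are not the estimates obtained in this study.

```python
import numpy as np


def boundary_probs(theta, alpha, betas):
    """P(X >= k | theta) for k = 1..len(betas): one boundary curve per threshold."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    betas = np.asarray(betas, dtype=float)
    return 1.0 / (1.0 + np.exp(-alpha * (theta[:, None] - betas[None, :])))


def category_probs(theta, alpha, betas):
    """P(X = k | theta) for k = 0..len(betas); each row sums to 1."""
    p_star = boundary_probs(theta, alpha, betas)
    n = p_star.shape[0]
    # Pad with P(X >= 0) = 1 and P(X >= m + 1) = 0, then take adjacent differences
    padded = np.hstack([np.ones((n, 1)), p_star, np.zeros((n, 1))])
    return padded[:, :-1] - padded[:, 1:]


# Illustrative (made-up) parameters for one 4-category item
alpha, betas = 2.0, [-2.0, -1.0, 0.5]
print(category_probs([-3.0, 0.0, 3.0], alpha, betas).round(3))
```

In this formulation, a larger α makes each boundary curve steeper, and each β marks the trait level at which the corresponding boundary probability equals 0.5.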

Results

Descriptive Statistics

The mean, SD, skewness and kurtosis indicators, and standard errors of all items of the SDHS, as well as of the other research variables, are shown in Table 1. The findings, namely skewness values < 2 and kurtosis values < 7, provide evidence for the univariate normality of the data for WB, resilience, and stress.
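The screening rule used here (skewness below 2 and kurtosis below 7 in absolute value) can be sketched as follows. This is a minimal sketch under stated assumptions: the moment-based estimators and the use of excess kurtosis are choices made for illustration, and statistical packages may apply small-sample corrections that give slightly different values. The function names are hypothetical.

```python
import numpy as np


def skew_kurtosis(x):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()  # population SD (ddof=0)
    return float(np.mean(z**3)), float(np.mean(z**4) - 3.0)


def roughly_normal(x, max_skew=2.0, max_kurtosis=7.0):
    """Apply the |skewness| < 2 and |kurtosis| < 7 screening rule."""
    skew, kurt = skew_kurtosis(x)
    return abs(skew) < max_skew and abs(kurt) < max_kurtosis


# Hypothetical item responses on the 0-3 scale
print(roughly_normal([0, 1, 1, 2, 2, 2, 3, 3, 3, 3]))
```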

Table 1 Descriptive statistics (mean, SD, skewness, and kurtosis of research variables)

Internal Reliability

The results confirmed that the SDHS has very good internal consistency, with McDonald’s ω = 0.84; 95% CI [0.82, 0.85]. As can be seen in Table 2, internal reliability could not be improved by removing any of the six items. To compare current results on internal reliability with those obtained in the validation study of the original scale (Joseph et al., 2004), Cronbach’s α coefficient was also computed. A very similar value was obtained: Cronbach’s α = 0.84; 95% CI [0.83, 0.85]. All corrected item-rest correlations were positive and ranged between 0.50 and 0.71 (see Table 2). In addition, the inter-item correlation matrix showed that all values were higher than the cut-off criterion (> 0.30) recommended in the literature (Cohen, 1992).
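For readers who wish to reproduce this type of analysis, Cronbach’s α and the corrected item-rest correlations can be computed directly from an item score matrix. The sketch below is illustrative only: the data are simulated, the function names are hypothetical, and negatively worded items are assumed to have been reverse scored beforehand.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)


def item_rest_correlations(items):
    """Correlation of each item with the sum of the remaining items."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])


# Simulated data: six items driven by one latent factor plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
simulated = 0.7 * latent[:, None] + rng.normal(scale=0.7, size=(500, 6))
print(round(cronbach_alpha(simulated), 2))
print(item_rest_correlations(simulated).round(2))
```

The "alpha if item dropped" column reported in Table 2 corresponds to calling `cronbach_alpha` on the matrix with one column removed at a time.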

Table 2 Individual item reliability statistics if item dropped and item-total correlation

Test–retest reliability was computed based on a subsample of 125 participants who agreed to complete the SDHS again after a 2-week interval. The intraclass correlation coefficient (ICC), i.e., 0.83, 95% CI [0.82, 0.84], p < 0.001 (with no significant difference between overall scores obtained in the two assessment sessions, t = 0.43, ns), indicated good test–retest reliability, in accordance with the cut-off recommended by Koo and Li (2016).

Construct Validity

The original single-factor model proposed by Joseph et al. (2004) was tested to analyze the psychometric structure of the SDHS in the current study. CFA revealed that the unidimensional structure fitted the data very well. More specifically, χ2(9) = 52.43 (p < 0.001), CFI = 0.987, TLI = 0.978, RMSEA = 0.060; 90% CI [0.045, 0.076], RMSEA p-value = 0.33, and SRMR = 0.048. All factor loadings were above the cut-off (> 0.50) recommended by Hair et al. (2018). They ranged from 0.53 to 0.81 (see all standardized factor loadings and R2 values for each item in Table 3).

Table 3 Standardized factor loadings and R2 for the SDHS items

Convergent Validity

Convergent validity was assessed by computing AVE and CR. Both coefficients were above the cut-offs recommended in the literature (i.e., AVE > 0.50, according to Hair et al. (2018), and CR > 0.70, as mentioned by Cho et al. (2016)). More precisely, AVE = 0.51 and CR = 0.76 were obtained.
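AVE and CR are typically derived from the standardized factor loadings. The sketch below assumes the commonly used formulas (AVE as the mean squared loading; CR as the squared sum of loadings divided by that quantity plus the summed error variances, with uncorrelated errors). The article does not state which formulas or software routines were used, so this is an illustration rather than a reproduction, and the loadings are hypothetical.

```python
import numpy as np


def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam**2))


def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    lam = np.asarray(loadings, dtype=float)
    numerator = lam.sum() ** 2
    error_variance = np.sum(1.0 - lam**2)  # standardized solution assumed
    return float(numerator / (numerator + error_variance))


# Hypothetical standardized loadings (not the values in Table 3)
loadings = [0.6, 0.7, 0.8]
print(round(average_variance_extracted(loadings), 3))
print(round(composite_reliability(loadings), 3))
```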

Concurrent Validity

The positive correlation between the overall SDHS score and resilience, r(1324) = 0.51, p < 0.001, as well as the negative association with perceived stress, r(1324) = − 0.59, p < 0.001, provided evidence for the concurrent validity of the SDHS.

Item Response Theory analysis

Conducting the IRT (GRM) analysis required testing two assumptions: unidimensionality and local independence. The CFA supported unidimensionality. To test local independence, the inter-item residual correlation matrix was computed. The results ranged from 0.01 to 0.09. Since these values were well below the 1.30 cut-off recommended by Christensen et al. (2017), the second assumption of local independence was supported.

Discrimination Parameters

According to the discrimination parameters, which ranged from 1.43 to 3.85 (as shown in Table 4), all the SDHS items can distinguish between contiguous latent trait levels. Applying Pelfrene et al.’s (2001) recommendation, item 6 (“I felt that life was meaningless”) had a high discrimination level (1.35 < α < 1.69), and the remaining items had an even higher level (α > 1.70). When the discrimination parameters are arranged in descending order (5, 2, 4, 3, 1, 6), the first three items are the positively worded ones designed to assess happiness. More specifically, item 5 (“I felt that life was enjoyable”) had the highest discrimination parameter (α > 3), followed by item 2 (“I felt happy”) and item 4 (“I felt pleased with the way I am”), both with α ranging from 2 to 3. Item 6 (“I felt that life was meaningless”) had the lowest α value.
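The discrimination bands applied above can be expressed as a small lookup. This is a sketch under one assumption: the bands listed in the Data Analysis section leave small gaps (e.g., between 1.69 and 1.70), so the function below closes them by using lower bounds only. The function name is hypothetical; the two values classified are the minimum and maximum α reported in this section.

```python
def classify_discrimination(alpha):
    """Map a discrimination parameter to the information bands used in this study."""
    if alpha >= 1.70:
        return "very high"
    if alpha >= 1.35:
        return "high"
    if alpha >= 0.65:
        return "moderate"
    if alpha >= 0.35:
        return "low"
    return "below the listed bands"


# Lowest and highest discrimination parameters reported for the SDHS items
for value in (1.43, 3.85):
    print(value, classify_discrimination(value))
```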

Table 4 Discrimination and difficulty parameters for the SDHS items

Difficulty Parameters

In terms of difficulty parameters, the β values ranged between − 2.78 and 0.60 (as seen in Table 4 and Fig. 1). According to the various thresholds recommended in the literature (Martínez et al., 2018), these difficulty values are indicative of both easy levels (β <  − 2) and average levels (− 2 < β < 2). It should be mentioned that two of the negatively worded items (i.e., item 3: “I felt cheerless” and item 6: “I felt that life was meaningless”) have thresholds that are all negative (β1–β3), which means they are very easy to endorse. In addition, these values indicate that choosing “never” for these negatively worded items (which are reverse scored) reflects lower levels of WB than does selecting “often” for all the positively worded items.

Fig. 1 The BCC graphs for the SDHS items

The ascending ranking of the SDHS items according to their location on the difficulty scale (more specifically on the upper threshold, i.e., β3) was 6, 3, 5, 2, and 4. The upper thresholds for the positively worded items ranged from 0.20 to 0.60 SD from the mean of the latent trait (ϴ). Thus, the upper threshold of the negatively worded items indicated lower levels of WB as compared to the positively worded items. Although the upper thresholds of the positively worded items were positive, they remained close to the mean of ϴ. This suggests that they reflect the latent trait to a greater extent than the negatively worded items, but other factors impact the latent trait as well.

The SDHS Items Precision to Measure WB or IIF

As the IIF graphs illustrate (see Fig. 2), the positively worded items contributed to the latent trait to a great extent because they peaked from + 1.40 SD to + 3.80 SD from the mean of ϴ. In contrast, the negatively worded items were less reliable. More precisely, their amount of information peaked from + 0.85 SD to + 1.05 SD from the mean of ϴ. In addition, two of them had an apex lower than 1 (i.e., item “1” and item “6”).

Fig. 2 The IIF graphs for the SDHS items

Another detail emphasized in the IIF graphs is that the curves were bi- or tri-modal. However, this pattern is not unusual when considering that each category of responses (from 0 to 3) has its own precision for measuring the latent trait, which may peak over a different trait range.
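The multi-modal shape of an item information function can be illustrated numerically. The sketch below assumes the standard graded response model expression for item information (the sum over categories of the squared slope of each category curve divided by the category probability). The function name and the parameters are hypothetical; the thresholds are deliberately spaced far apart so that separate peaks appear.

```python
import numpy as np


def grm_item_information(theta, alpha, betas):
    """Item information under the graded response model."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    betas = np.asarray(betas, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-alpha * (theta[:, None] - betas[None, :])))
    n = theta.size
    # Boundary probabilities padded with P(X >= 0) = 1 and P(X >= m + 1) = 0
    padded = np.hstack([np.ones((n, 1)), p_star, np.zeros((n, 1))])
    category_probs = padded[:, :-1] - padded[:, 1:]
    weights = padded * (1.0 - padded)
    # Derivative of each category probability with respect to theta
    slopes = alpha * (weights[:, :-1] - weights[:, 1:])
    return np.sum(slopes**2 / category_probs, axis=1)


# Made-up item with widely spaced thresholds
grid = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(grm_item_information(grid, alpha=3.0, betas=[-2.0, 0.0, 2.0]).round(2))
```

With these illustrative values, information is higher near each threshold than midway between thresholds, which is the pattern that produces several local peaks. When thresholds lie close together, the peaks merge into a single mode.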

The SDHS Precision as Whole to Measure WB or TIF

The TIF graph (as shown in Fig. 3) highlights relatively uniform information about individuals between − 3.5 SD and + 1 SD from the mean of the latent trait (ϴ). A decline beyond those points in both directions was noticed. In addition, there is a consistent amount of scale information at around − 1.5 and − 1 SD from the mean, peaking at around − 1.2 SD from the mean. The curve of standard error illustrates that the average score on the SDHS is under one-third of one standard error. In addition, the lowest standard error, registered between − 2.5 SD and + 0.5 SD from the mean of ϴ, suggests the highest reliability in this range. The reliability decreases for high and very high levels of WB, since the standard error increases at + 2 SD and + 3 SD (as shown in the TIF graph).

Fig. 3 The TIF graph or the SDHS precision (test information function, blue line; standard errors, magenta line)

The maximum amount of information (high precision) scored approximately 11 of the latent trait estimates. Though the SDHS provides a precise estimation of scores for a relatively broad range of the latent trait continuum, it also includes more negative SD from the mean of ϴ than positive (as shown in Fig. 3). This result reinforces the aforementioned pattern related to the upper end of the scale.
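The link between test information and precision can be made explicit with the standard IRT relation SE(ϴ) = 1/√I(ϴ). The sketch below is an illustration only. It assumes that the value of approximately 11 is read as the peak of the test information function, and it uses one common convention for conditional reliability (1 − SE², which presumes a latent trait with unit variance). The function names are hypothetical.

```python
import math


def standard_error(information):
    """Standard error of the trait estimate at a given level of test information."""
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(information)


def conditional_reliability(information):
    """1 - SE^2, assuming the latent trait has unit variance."""
    return 1.0 - 1.0 / information


# Assumed peak information of about 11
peak_information = 11.0
print(round(standard_error(peak_information), 3))
print(round(conditional_reliability(peak_information), 3))
```

Under these assumptions, information of about 11 corresponds to a standard error just under one-third of an SD, which is consistent with the standard error curve described for Fig. 3.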

Test Characteristic Curves (TCC)

The TCC graph (see Fig. 4) shows that in the 3 to 18 range of total scores, the information curve increases steeply and monotonically. This steep curve highlights the probability that endorsing a correct response monotonically increases as the WB of the respondent increases. In addition, results from the TCC emphasize that the average score was approximately 14.

Fig. 4 The TCC graph

Using the 95% critical values from the standard normal distribution (− 1.96 and + 1.96), this plot also tells us that we expect about 95% of participants to score from 6 to 18, which suggests that most people have medium and high levels of WB. The remaining 5% represents respondents with extremely low WB levels (i.e., scores ≤ 6). Considering that a person location 1.96 SD below the mean of the latent trait indicates an expected score of 6, a preliminary (before clinical assessment confirmation) cut-off for clinically low WB is an overall score ≤ 6. Descriptive statistics confirmed the results of the TCC. More specifically, 5.1% of participants obtained an overall score ≤ 6.
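The way a cut-off is read off a test characteristic curve can be sketched as follows. Under the graded response model, the expected item score equals the sum of the boundary probabilities, and the TCC is the sum of the expected item scores. The item parameters below are hypothetical and do not correspond to Table 4, so the printed expected score is not the value reported in this study; the function name is likewise an assumption.

```python
import numpy as np


def expected_total_score(theta, alphas, betas):
    """Test characteristic curve: expected total score at each theta."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    total = np.zeros_like(theta)
    for alpha, item_betas in zip(alphas, betas):
        b = np.asarray(item_betas, dtype=float)
        p_star = 1.0 / (1.0 + np.exp(-alpha * (theta[:, None] - b[None, :])))
        # E[X] = sum over k of P(X >= k) for an item scored 0..m
        total += p_star.sum(axis=1)
    return total


# Made-up parameters for six 4-category items (scores 0-3 each, total 0-18)
alphas = [1.8, 2.6, 2.0, 2.4, 3.2, 1.4]
betas = [
    [-2.2, -1.2, 0.1], [-2.0, -1.0, 0.4], [-2.6, -1.6, -0.4],
    [-2.1, -1.1, 0.6], [-2.0, -0.9, 0.3], [-2.8, -1.9, -0.8],
]
grid = np.linspace(-4.0, 4.0, 81)
tcc = expected_total_score(grid, alphas, betas)
print(round(float(expected_total_score(-1.96, alphas, betas)[0]), 1))
```

Evaluating the curve at ϴ = − 1.96 gives the expected score below which only the lowest tail of a standard normal trait distribution would fall, which is the logic behind the preliminary cut-off.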

Discussion

The current study is the first to use IRT to examine the precision of SDHS measurements, overcoming the limits of the more commonly used CFA, which considers only the raw data and ignores item-person relationships. The findings based on CTT provide evidence for the sound psychometric quality of the Romanian version of the SDHS. Similar to previous studies, the SDHS in this study was unidimensional. High internal consistency of the SDHS was supported using various techniques: (i) McDonald’s ω, (ii) CR, and (iii) a bottom-up perspective of reliability shown by the IIF graphs computed during the IRT analysis. McDonald’s ω was calculated because Cronbach’s α is not appropriate when the tau-equivalence assumption cannot be satisfied. Additionally, to allow comparison with the original study, Cronbach’s α coefficient was also calculated, and similar results were found. These findings indicate a good level of inter-relationship among items.

CFA highlights that the unidimensional structure of the SDHS fits the data well, as all items loaded significantly on the latent trait (WB). Although all factor loadings were above the recommended cut-off, item 6 (“I felt that life was meaningless”) had a lower λ value and a lower coefficient of determination (R2) than the remaining items. It is not at all counterintuitive that this negative, reverse-scored item contributes less to the overall WB score compared to the other items.

The positive association between the overall SDHS score and resilience, as well as the negative correlation with perceived stress, confirms the concurrent validity of the SDHS. These results align with previous research showing that resilient people have higher levels of subjective happiness (Benada & Chowdhry, 2017; Brailovskaia et al., 2019; T. Hu et al., 2015; Pourkord et al., 2020) and that WB is inversely related to perceived stress (Cusinato et al., 2020; Li & Hasson, 2020; Ryan, 2022).

In the IRT analysis, six important patterns emerged. First, the SDHS items had a high discrimination capacity, meaning that they effectively reflect the relationship between the position of each participant on the latent trait continuum and the probability of endorsing the items of the scale. The obtained high and very high α prove that the SDHS items can easily distinguish between contiguous latent trait levels.

Second, in contrast to the α parameters, the β parameters indicated very low to low levels of item difficulty. This finding suggests that low to moderate WB levels are required to endorse the items of the SDHS, since previous literature (De Ayala, 2008) mentions that high difficulty indicates high levels of the latent trait. Two negatively worded items (item 3 “I felt cheerless” and item 6 “I felt that life was meaningless”) were the easiest to endorse since they had the lowest difficulty parameters on all thresholds. Thus, choosing “never” for these items indicates lower levels of WB than does selecting the same response category for the remaining negative item (i.e., item 1) or choosing “often” for the positively worded items (i.e., items 2, 4, and 5). More specifically, the difference between the highest difficulty parameter (item 5) and the lowest difficulty parameters (items 6 and 3) suggests that the response to item 5 is more likely due to an individual’s level of WB than the response to the other two items.

Furthermore, the upper thresholds (β3) of difficulty parameters for items 3 and 6 were negative, and all remaining items were located a little over half an SD above the mean of ϴ. These results again indicate that factors beyond those captured by the SDHS items contribute to an individual’s high and very high WB levels. This pattern seems plausible for two reasons: (i) the SDHS is built on the depression-happiness continuum and (ii) researchers must consider that the latent trait can be explained not only by the presence of positive emotions and the absence of negative ones but also by other factors. This is why the items of the SDHS are located on the latent trait continuum mostly below the mean or around one-half SD (+ 0.6 SD) above the mean of the latent trait, thus indicating very low to moderate levels of WB. In addition, the Likert-type scale used in the SDHS may cause biases in favor of low WB rather than high WB. For example, for positively worded items, there is only one category higher than “sometimes,” namely, “often,” but two categories lower than “sometimes,” that is, “rarely” and “never.” Considering this, respondents may tend to perceive the SDHS as a measure of lower rather than higher WB.

Third, regarding the IIF graph, the trace lines indicated that the positively worded items contributed more information to the latent trait, that is, WB, than the negatively worded items. Item 5 (“I felt that life was enjoyable”) had the highest discrimination parameter.

Fourth, the reliability analysis at item level confirms the robustness of the SDHS’s internal construct and item validity, but mostly on the negative range of the latent trait continuum. The TIF graph, in which information function and standard error lines were plotted together, highlights that high values for information are associated with low values of SE, proving that the SDHS Romanian version is highly precise in measuring WB between − 3.5 SD and + 1 SD from the mean of the latent trait. Though the SDHS provides excellent precision according to TIF and SE values (in the aforementioned range), it does not provide enough information to reliably identify individuals who have higher levels of WB (namely, those who are located on + 2 SD and + 3 SD from the mean).

Fifth, the TCC graph highlights that the scores measured by the SDHS increase monotonically. Though the expected score was 14, the peak score of information provided by the scale was approximately 11, which suggests a moderate level of WB.

Sixth, IRT provided the opportunity to rank the SDHS items based on their psychometric properties. Slopes and item locations highlight that the positive items are psychometrically more robust than the negative items because they provide more reliable information on the latent trait. Additionally, scores of 6 and lower are indicative of clinically low WB, as can be seen on the x-axis in Fig. 4. Thus, the preliminary cut-off (before clinical assessment confirmation) is < 7. Though in previous studies the proposed cut-off was < 10, in the current research, scores less than 10 indicate a person location at 1 SD below the mean, which theoretically suggests that around 16% of participants were clinically depressed. Considering the statistics from the WHO (2022), however, this percentage seems to be implausible and unrealistic. According to the obtained cut-off in this study, 5.1% of participants had extremely low or clinically low WB levels.

Strengths of the Current Study

The contributions of this study are six-fold. First, this is the first study conducted on the SDHS using the advanced psychometric approach of IRT. The advantage of this method is that IRT evaluates not only raw scores (as CTT does) but also score patterns and item properties or item-person relationships.

Second, this study calculated the reliability of the SDHS using both classical and modern techniques. More precisely, in terms of CTT, the McDonald’s ω coefficient was computed. In the framework of modern approach, the IIF and TIF graphs were generated. While the CTT shows only the average reliability of all SDHS items, the IRT has the advantage of providing a bottom-up picture of reliability through the IIF graph. The graph captures variations depending on the pattern of each item, providing an extensive picture of the internal consistency of the SDHS.

Third, CTT analysis evaluated three types of validity: construct, convergent, and concurrent. All proved good psychometric properties of the Romanian version of the SDHS. Construct validity replicated the unidimensionality found in the original study. Convergent validity indicated that the variance of the SDHS scores was captured mainly by the latent trait and less by measurement error.

Fourth, the current findings identified the SDHS’s most difficult and discriminant items. The items that provided the highest levels of information for the latent trait (i.e., WB) were those that were positively worded. Less information was registered in the case of one negatively worded item (i.e., item 6 “I felt that life was meaningless”).

Fifth, based on the TCC graph, a preliminary (before clinical assessment confirmation) cut-off (< 7) was identified for clinically low WB. The cut-off highlighted in the current research is lower than that proposed in the original study (i.e., < 10). The difference between these cut-offs is worthy of more investigation in future research. As mentioned above, these results must be considered preliminary since they were obtained using participants only from the general population.

Sixth, this is the first validation study of the SDHS conducted in Romanian culture. Previous validation studies of adapted versions of the SDHS were conducted only in Spanish (Martínez et al., 2018) and Arabic (Yildirim & Balahmar, 2020). Thus, this study expands the extant empirical body of research on the psychometric properties of the SDHS.

Limitations and Future Studies Directions

Beyond these strengths, this study has some limitations. First, researchers obtained the data using a cross-sectional design; future longitudinal studies are needed to capture the dynamics of the latent trait (WB). Second, no data were collected from a clinical sample. Future studies using clinical samples are necessary to confirm the cut-off for the clinically low WB measured by the SDHS. Another suggestion for further research is to integrate the analysis of a disputable topic, namely the presence or absence of gender differences that might influence WB. Such investigations must start with measuring the invariance of the SDHS using CTT framework and the differential item functioning (DIF) based on IRT.

Conclusion

The authors of the SDHS (Joseph et al., 2004) explicitly encouraged future studies to validate the SDHS on various populations. To date, no study has been conducted in Romania. In addition, all previous studies were performed using the CTT framework. The current research aimed to fill this gap. Construct, convergent, and concurrent validity analyses indicated good psychometric properties of the SDHS. IRT revealed excellent scale precision, mostly on a negative range from the mean (− 3.5 SD, + 1 SD). All items had high or very high discrimination, indicating that each item can distinguish between contiguous levels of WB. Negatively worded items were the easiest to endorse, suggesting that positively worded items bring more information to the latent trait. The current findings demonstrate that the SDHS has excellent accuracy in measuring clinically low and moderate levels of WB. Thus, professionals can use the scale to measure WB dynamics in the context of therapeutic interventions, more precisely to highlight whether the symptomatic scores for clinically low levels of WB have improved. In the case of the Warwick–Edinburgh mental well-being scale (WEMWBS), which is also a measure of WB, Marmara et al. (2022) obtained a similar pattern of less reliability for high levels of WB. However, the WEMWBS also does not capture low levels of WB. Since the SDHS is built on the depression-happiness continuum, it can capture extremely low (i.e., clinically low) levels of WB, but neither the SDHS nor the WEMWBS can capture high WB levels. Possible future developments of WB scales should consider adding items of high difficulty to increase measurement precision at the upper end of the latent trait continuum.