Introduction

The belief in a just world theory (Dalbert, 2001; Lerner, 1965, 1980) represents a fundamental contribution to the understanding of how people perceive and deal with the world. This theory posits that people feel a need to believe that they live in a world where everyone gets what they deserve and deserve what they get (Lerner, 1965). This belief is highly adaptive, as it gives individuals a sense of stability and order in their social context. For instance, when people feel threatened by injustice, they try to uphold their belief in a just world by attempting to restore former conditions of justice. If their attempts fail, they tend to resort to the so-called “assimilation of injustice”, that is, a coping strategy by which the situation is re-evaluated so as to conform to one’s own belief in a just world (Dalbert, 2009).

Dalbert (2001) has conceptualized the belief in a just world as a stable dimension of an individual's personality and described three of its main functions. First, it encourages people to behave according to the rules of justice; since those who believe in a just world expect to be rewarded for their actions, they tend to behave in accordance with established and shared rules of conduct. Second, it promotes trust in others and the belief that one's destiny is tied to one's righteous behaviour. This entails important adaptive skills, which lead to including others in one’s own life plans. Lastly, the belief in a just world provides a point of reference to interpret one’s life in a meaningful way and to deal with situations of injustice, based on the conviction that justice will sooner or later be restored (Dalbert, 2001).

Lipkus et al. (1996) made an early distinction between a belief in a world that is just for oneself, in which individuals feel that they are usually treated fairly, and a more general belief in a world that is just for others, in which individuals feel that people generally get what they deserve. Bègue and Muller (2006) also showed that these two beliefs have different effects on people’s attitudes and behaviours. Starting from the original formulation of the theory and the subsequent distinction between justice for oneself and justice for others, the scholarly literature has produced several quantitative tools to measure people’s belief in a just world (for a review see Furnham, 2003). Rubin and Peplau (1975) were amongst the first to develop a Belief in a Just World Scale (BJW). This tool includes 20 items and has a heterogeneous structure, in that it combines both general and specific aspects of justice into a unidimensional domain. However, this heterogeneity was soon identified as a limitation (Dalbert et al., 1987), leading to the construction of new scales for measuring the general belief in a just world in a more homogeneous manner. These include the General Belief in Just World Scale (Dalbert et al., 1987) and the Global Belief in a Just World Scale (Lipkus, 1991). Both scales measure belief in a just world in general and are positively related to Rubin and Peplau's BJW scale, but unlike the latter, they present a more homogeneous and streamlined structure. In fact, the scales include only 6 and 7 items, respectively.

More recently, Dalbert (1999) completed the General Belief in Just World Scale with the addition of a 7-item scale to assess the belief in a just world for oneself, namely the Personal Belief in a Just World Scale. Together, they form the Personal and General Belief in a Just World (P-BJW and G-BJW) Scales. This new tool was first developed in German and English (Dalbert, 1999), and since then, it has been expanded through several cross-cultural adaptations, including Hungarian (Dalbert & Katona-Sallay, 1996), Slovakian (Dzuka & Dalbert, 2002), French (Bègue & Bastounis, 2003), Portuguese (Correia & Dalbert, 2007), Chinese (Wu et al., 2011), Latvian (Nesterova et al., 2015), Brazilian (Veloso Gouveia et al., 2018) and Russian (Nartova-Bochaver et al., 2018).

Despite their widespread use across the world, these scales have not yet been introduced to the Italian context. As a consequence, the Italian justice scholarship has hitherto relied on related tools such as the Moral Foundation Questionnaire (Bobbio et al., 2011; Graham et al., 2011), the Portrait Values Questionnaire (Schwartz et al., 2012; Vecchione & Alessandri, 2017) and the Measure of Moral Orientation (Di Martino et al., 2019; Liddell & Davis, 1996). However, these tools were designed to measure aspects such as moral domains, moral values, and moral orientations; unlike the P-BJW and G-BJW scales, they are not suitable to fully capture an individual’s perception of justice.

Some Italian-adapted tools that measure the perception of justice are available only for specific contexts, such as the Classroom Justice Scale (Berti et al., 2010; Chory‐Assad & Paulsel, 2004) and the Italian version of Colquitt's Organizational Justice Scale (Colquitt, 2001; Spagnoli et al., 2017). On the other hand, the P-BJW and G-BJW scales can be applied to a wider variety of contexts, such as work (Hafer et al., 2005), trade (White et al., 2012), and schools (Donat et al., 2018), or used to explore gender issues (Correia et al., 2015) and socio-cultural differences (Wu et al., 2011).

The aim of the present study is to bridge this gap in the Italian justice scholarship by adapting the Personal and General Belief in a Just World Scales to the Italian context.

Methods

Participants and Procedures

This study was part of a larger project, which examined the relationship between justice, mattering, and well-being (see Esposito et al., 2022; Di Napoli et al., 2022). The adaptation of the P-BJW and G-BJW scales was implemented through a series of steps, which started with the translation and back-translation (Brislin, 1970) of the instrument to the Italian language. The obtained Italian version of the scales was tested first through a pilot study, which involved a convenience sample of 213 Italian university students (22% males and 78% females), with an age between 18 and 30 years (M = 21; SD = 2.4). The sample was recruited in April 2019, during university classes by a member of the research team.

Following encouraging results from the pilot study, a national study was conducted from May to September 2019, to test the psychometric properties of the scales across the wider Italian context.

The recruitment relied on the snowball sampling technique; the students who were surveyed in the pilot study were trained in Computer-Assisted Survey Information Collection (CASIC, Couper et al., 1998). Once they completed their training, they were instructed to administer the questionnaire to their personal networks of contacts. Contacts were recruited through a variety of methods including telephone calls, text messages, emails, and online social networks. In most cases, the students directed the potential participants to the online survey. Only those participants who were not familiar with high technology or unable to use it had to be assisted. In this case, students read out the survey items to the respondents and inputted their answers in the database.

This procedure led to the collection of 3180 responses, which underwent data cleaning. Cases were excluded if they met one or more of the following criteria: (a) the respondent was under 18 years of age (460 cases), (b) was not living in Italy at the time of the survey (374 cases), (c) completed less than 80% of the survey (311 cases), or (d) did not provide consent for processing their personal data (427 cases). This process resulted in the deletion of 497 cases (approximately 16% of the total sample), and thus in a final sample of 2683 participants (39% males and 61% females), aged between 18 and 88 years (M = 29.86; SD = 12.83), who presented no missing data. The RMSEA-based power analysis test of close fit, developed by MacCallum et al. (1996), was chosen to assess whether this final sample was adequate for our analyses. Based on an a priori analysis with 50 degrees of freedom in the final Model C (see Table 2), which is the one used to assess the main characteristics of the adapted P-BJW and G-BJW Scales, we estimated a minimum sample size of 214 cases to reach a power of .80. Since our sample largely exceeded this value, we could be confident that our assessment of model fit would be unlikely to incur a Type II error.
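For readers unfamiliar with the procedure, the logic of the RMSEA-based test of close fit can be sketched as follows. The symbols below are illustrative notation rather than the authors' own, and the specific values of the null and alternative RMSEA and of the alpha level are assumptions (the conventional choices of .05, .08 and .05), since the text does not report them.

```latex
% d = model degrees of freedom (here d = 50), N = sample size.
% \varepsilon_0 and \varepsilon_a are the null and alternative RMSEA values
% (assumed: \varepsilon_0 = .05, \varepsilon_a = .08), \alpha is assumed to be .05.
\begin{align}
  \lambda_0 &= (N - 1)\, d\, \varepsilon_0^{2}, &
  \lambda_a &= (N - 1)\, d\, \varepsilon_a^{2}, \\
  \Pr\!\left[\chi^{2}_{d}(\lambda_0) > c\right] &= \alpha, &
  \text{power} &= \Pr\!\left[\chi^{2}_{d}(\lambda_a) > c\right].
\end{align}
```

The minimum sample size is then the smallest N for which the power reaches the target value (.80 in this study), where the two distributions are noncentral chi-squares with noncentrality parameters λ0 and λa.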

Socio-demographic characteristics of both the pilot and national sample are shown in Table 1.

Table 1 Demographic characteristics of pilot and national samples

Measures

Participants from both the pilot and national study were presented with an online survey, which was hosted on the SurveyMonkey digital platform. This data collection tool included a socio-demographic section and a section presenting the Italian translated version of the Personal and General Belief in a Just World Scales (P-BJW and G-BJW). The scales include 7 and 6 items, respectively, presented as statements with which participants expressed their level of agreement on a 6-point Likert scale, ranging from strongly disagree to strongly agree.

The Italian-adapted version of the P-BJW and G-BJW scales is available as Supplemental Material SM1.

Results

Data Analysis

Data analysis was conducted following covariance-based structural equations modelling (SEM) and implemented using the MPLUS software vers. 8 (Muthén & Muthén, 2007). A series of confirmatory factor analyses (CFAs) were used to test the factorial structure of the two scales. Given the presence of some univariate and multivariate deviation from normality, maximum likelihood robust (MLR) was chosen as main estimator. Missing data were treated with list-wise deletion, causing no further loss of data, both in the pilot study and in the national study.

As goodness-of-fit indices, we referred to the chi-square (χ2) test of model fit, root mean square error of approximation (RMSEA), Bentler's comparative fit index (CFI), Tucker–Lewis index (TLI), and standardized root mean square residual (SRMR). According to Hu and Bentler’s (1999) guidelines, a model presents a good level of fit if RMSEA < .05, CFI and TLI > .95 and SRMR < .05. Gamma hat (Fan & Sivo, 2009) and McDonald’s non‐centrality index (NCI; McDonald, 1989) were also used as measures that are robust to large sample size. The cut‐off values suggested in the literature are Gamma hat greater than .95 and NCI greater than .90 (Hu & Bentler, 1999).
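As an illustration of how three of these indices relate to the chi-square statistic, the following minimal sketch applies the textbook formulas for RMSEA, Gamma hat and NCI to the Model C values reported for the national sample (χ2 = 330.65, df = 50, n = 2683, 12 indicators). The function names are hypothetical, and the formulas are the standard ones based on N − 1; software that applies a scaling correction under MLR or uses N instead of N − 1 may give slightly different figures.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from the model chi-square (textbook formula)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def gamma_hat(chi2: float, df: int, n: int, n_vars: int) -> float:
    """Gamma hat; n_vars is the number of observed indicators in the model."""
    return n_vars / (n_vars + 2.0 * max(chi2 - df, 0.0) / (n - 1))


def mcdonald_nci(chi2: float, df: int, n: int) -> float:
    """McDonald's non-centrality index."""
    return math.exp(-0.5 * max(chi2 - df, 0.0) / (n - 1))


# Values reported in the text for Model C in the national sample.
chi2, df, n, n_vars = 330.65, 50, 2683, 12

print("RMSEA     =", round(rmsea(chi2, df, n), 3))
print("Gamma hat =", round(gamma_hat(chi2, df, n, n_vars), 2))
print("NCI       =", round(mcdonald_nci(chi2, df, n), 2))
```

With these inputs the three functions give values that round to the figures reported for Model C in the national sample (.046, .98 and .95).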

To evaluate the reliability of the factors, we relied on Jöreskog's composite reliability (CR) index. CR values greater than .7 indicate a good level of reliability of a latent variable (Fornell & Larcker, 1981). Following Brown’s (2015) recommendations, we retained only items with factor loadings higher than .4. To assess convergent and discriminant validity, we relied on average variance extracted (AVE), which measures how much variance a latent variable is capable of explaining in a set of congeneric indicators, compared to the amount of unexplained variance due to measurement error. According to Fornell and Larcker (1981) and Hair et al. (2010), AVE values greater than .5 indicate good convergent validity. Additionally, AVE can be used to assess discriminant validity by comparing the amount of variance a latent variable explains on its own with the amount that it shares with other constructs within the same model. In our case, we would expect the variance that the P-BJW factor explains to be as unique as possible to that factor and not explained by G-BJW, and vice versa. AVE values greater than both maximum shared variance (MSV) and average shared variance (ASV) are indicative of good discriminant validity.
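The criteria above can be made concrete with a short sketch. It computes CR and AVE from standardized loadings using the usual Fornell and Larcker formulas and checks AVE against MSV and ASV. The function names and the loadings are hypothetical and chosen only for illustration, and the CR formula shown assumes uncorrelated residuals, which is a simplification when residual correlations are part of the model.

```python
from typing import Sequence


def composite_reliability(loadings: Sequence[float]) -> float:
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error = sum(1.0 - lam ** 2 for lam in loadings)  # standardized residual variances
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings: Sequence[float]) -> float:
    """AVE = mean of the squared standardized loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def discriminant_ok(ave: float, factor_correlations: Sequence[float]) -> bool:
    """True if AVE exceeds both the maximum and the average shared variance."""
    shared = [r ** 2 for r in factor_correlations]
    msv = max(shared)
    asv = sum(shared) / len(shared)
    return ave > msv and ave > asv


# Hypothetical standardized loadings for a six-item factor (not the study's data).
loadings = [0.55, 0.60, 0.65, 0.70, 0.60, 0.62]
ave = average_variance_extracted(loadings)
print("CR  =", round(composite_reliability(loadings), 2))
print("AVE =", round(ave, 2))
# Hypothetical correlation of .50 with the only other factor in the model.
print("Discriminant validity supported:", discriminant_ok(ave, [0.50]))
```

In a two-factor model such as the one tested here, there is a single factor correlation, so MSV and ASV coincide. The sketch also shows why CR can exceed .7 while AVE stays below .5: CR grows with the number of items, whereas AVE depends only on the average squared loading.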

In line with Kline’s recommendations (2016), the following analyses take a stepwise approach, with a series of models with increasing modifications. In order to preserve consistency with the original structure of the BJW scales (Dalbert et al., 1987; Dalbert, 1999), we started with a model comprising 7 and 6 congeneric variables, which were made to load onto the P-BJW and G-BJW factors, respectively. As shown in Table 2, the fit indices of Model A were not adequate in either the pilot or the national sample. For the pilot study, none of the fit indices met Hu and Bentler’s (1999) recommended thresholds. The national sample showed a significant χ2(64) = 780.2, and despite a good value of SRMR (.041) and Gamma hat (.96), the borderline value of the RMSEA = .065 (.061; .069), along with low values of CFI, TLI and NCI (respectively, .90, .88 and .88) indicated an unsatisfactory model fit.

Table 2 Indices of fit and comparison between Models A, B and C

As shown in Table 3, almost all the items in both the pilot and national sample showed acceptable and statistically significant standardized factor loadings, along with satisfactory inter-item reliability values. However, some items did not show particularly high standardized values. Amongst them, PBJW1 presented standardized factor loadings well below the recommended threshold of .4 (λ = .18 with R2 = .03 in the pilot sample, and λ = .21 with R2 = .05 in the national sample).

Table 3 Factor loadings and inter-item reliability (R2) of Models A, B and C in the pilot and national samples

An inspection of correlated residuals and modification indices showed that some items shared common variance. Amongst these, we found potential correlated errors that presented expected parameter change (EPC) statistics equal to or higher than 10.84. This value is indicative of a significant χ2 (p < .001) and suggests that adding these parameters would significantly improve the overall fit of the model. The residual correlations with the highest EPC were found between GBJW2 and GBJW1, GBJW2 and GBJW3, GBJW4 and GBJW1, GBJW4 and GBJW3, and GBJW6 and PBJW7 in the pilot sample, and between PBJW2 and PBJW3, GBJW1 and GBJW2, and GBJW3 and GBJW4 in the national sample.
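The 10.84 cut-off can be read against the chi-square distribution with one degree of freedom, whose critical value at p = .001 is approximately 10.83. This is the reference distribution for one-parameter tests such as modification indices. The sketch below, with a hypothetical function name, computes the corresponding p value using only the standard library.

```python
import math


def chi2_pvalue_1df(statistic: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.

    A chi-square(1) variable is a squared standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# The threshold quoted in the text falls just below p = .001.
print(chi2_pvalue_1df(10.84) < 0.001)
# The conventional 5% critical value, for comparison.
print(round(chi2_pvalue_1df(3.841), 3))
```

Any single added parameter whose test statistic reaches this value would therefore be expected to reduce the model chi-square significantly at the .1% level.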

However, as the literature recommends (Byrne, 2013; Cliff, 1983), the suggested changes were made not only on the basis of statistical significance, but also in consideration of theoretical relevance. In fact, the presence of these correlated error terms finds theoretical justification in the nature of the items composing the two scales, which share similarities in meaning. For example, the highest correlation, between item PBJW2 “I am usually treated fairly” and PBJW3 “I believe that I usually get what I deserve” (see SM1), can be explained by the fact that most people feel that they are treated fairly when they are given their due.

This led us to the choice of respecifying Model A by allowing the above-mentioned manifest variables’ residuals to correlate (Model B). As shown in Table 2, the fit indices of model B constitute a significant improvement compared to Model A. In fact, this respecified model showed satisfactory fit both in the pilot sample, χ2(59) = 99.55, p < .001; RMSEA = .057 [.037; .076], CFI = .933, TLI = .911, SRMR = .047, Gamma hat = .97, NCI = .91, and the national sample, χ2(61) = 413.67, p < .001; RMSEA = .046 [.042; .051], CFI = .953, TLI = .940, SRMR = .033, Gamma hat = .98, NCI = .94.

However, the item PBJW1 still presented a decidedly low standardized loading and inter-item reliability in both the pilot and national study (λ = .17, R2 = .03; λ = .21, R2 = .05). This caused concerns in terms of convergent validity, since AVE fell below the acceptability threshold of .5, with a minimum value of .34 for P-BJW in the pilot sample and a maximum value of .39 for G-BJW in the national sample. Moreover, the comparison between the AVE of each latent variable and the corresponding values of MSV and ASV revealed that Model B reached low discriminant validity for both latent dimensions in the pilot sample. As for the national sample, only P-BJW had an AVE higher than the MSV and ASV (see Supplementary Material SM2).

Due to these limitations, we decided to implement an item-removal strategy (Larwin & Harvey, 2012) and test a new model (Model C). Following a parsimonious approach, we did not substantially alter the structure proposed in the original scales. Therefore, we decided to remove only item PBJW1, which had previously shown the lowest value amongst all other manifest variables. Consequently, Model C included 12 items, 6 for the latent factor P-BJW and 6 for the latent factor G-BJW.

As shown in Table 2, Model C presented acceptable fit indices in the pilot sample, χ2(48) = 81.98, p < .001; RMSEA = .058 [.035; .079]; CFI = .943; TLI = .922; SRMR = .056, Gamma hat = .97, NCI = .92, and even better indices in the national sample, χ2(50) = 330.65, p < .001; RMSEA = .046 [.041; .050]; CFI = .961; TLI = .949; SRMR = .032, Gamma hat = .98, NCI = .95. This last model also showed adequate standardized factor loadings and inter-item reliability values in both samples, ranging from .35 (R2 = .12) to .72 (R2 = .55) in the pilot sample and from .48 (R2 = .23) to .73 (R2 = .53) in the national sample (see Table 3). Furthermore, reliability and discriminant validity were also satisfactory, with CR > .7 and AVE always greater than MSV and ASV. However, this model still presented issues in terms of low AVE in both factors, with values ranging between .39 for P-BJW and .38 for G-BJW in the pilot sample and .40 for P-BJW and .38 for G-BJW in the national sample (see Table 4).
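The degrees of freedom reported for the successive models follow from a simple parameter count, which the sketch below reproduces. It assumes a simple-structure CFA in which each item loads on one factor, each factor is scaled either by a marker item or a fixed variance, and all factors are allowed to correlate. The function name is hypothetical, and the residual-correlation counts are those listed in the text (five in the pilot sample, three in the national sample).

```python
def cfa_degrees_of_freedom(n_items: int, n_factors: int, n_corr_residuals: int = 0) -> int:
    """Degrees of freedom of a simple-structure CFA on a covariance matrix."""
    # Unique variances and covariances among the observed items.
    moments = n_items * (n_items + 1) // 2
    # Loadings plus residual variances amount to 2 * n_items free parameters
    # once each factor has been given a scale; factor covariances and
    # residual correlations are added on top.
    free = 2 * n_items + n_factors * (n_factors - 1) // 2 + n_corr_residuals
    return moments - free


print("Model A:", cfa_degrees_of_freedom(13, 2))
print("Model B (pilot, national):",
      cfa_degrees_of_freedom(13, 2, 5), cfa_degrees_of_freedom(13, 2, 3))
print("Model C (pilot, national):",
      cfa_degrees_of_freedom(12, 2, 5), cfa_degrees_of_freedom(12, 2, 3))
```

Under these assumptions the counts match the degrees of freedom reported for Models A, B and C in both samples, which is a quick way to check that the specified model is the intended one.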

Table 4 Factor correlations, reliability and validity measures of the personal and general Belief in a Just World Scales in pilot and national samples for Model C

Figure 1 shows the final factorial structure of Model C—which was derived from the national sample—along with standardized factor loadings, latent variables correlations, and error terms correlations. Means, standard deviations and correlations between the items can be viewed in Supplementary Material SM3.

Fig. 1 Factorial structure of the latent dimensions P-BJW and G-BJW in the national sample (n = 2683), in the final model (Model C). Note: All values are significant at the .1% alpha level

Model Comparison

Based on previous comparisons between factor loadings, psychometric validity indices, and fit indices, Model C was found the most suitable solution for explaining the data variability in both the pilot and in the national sample. To further confirm these findings, we compared Model B with Model A and Model C with Model B.

Due to the large sample size, we compared Model B and Model A through CFI, Gamma hat, and McDonald's Non-Centrality Index (NCI), which have been shown to be robust to sample size variations (Fan & Sivo, 2009). Table 2 shows that the differences in all these indices exceed the thresholds established by Cheung and Rensvold (2002). This means that the 13-item solution with correlated residuals (Model B) is to be preferred over the 13-item solution with uncorrelated residuals (Model A).

As for the comparison between Model C and Model B, we could not follow the same procedure, since the models have 12 and 13 items, respectively, and as such are not nested. For this reason, the only way to assess them was to compare their respective goodness-of-fit indices. Table 2 shows that the Model C indices are overall more satisfactory than those of Model B, thereby suggesting that the former could better explain the data and give a more robust structure to the scales.

Additional comparisons were made to ascertain that the 2-correlated traits factors solution was tenable against competing or alternative structures. Model C was thus compared against a one-factor model (Model D) and a bi-factor model with two orthogonal specific factors (Model E). Model D yielded very poor fit to the data, χ2(54) = 1981.897, (p < .01); RMSEA = .115 [.111, .120]; CFI = .732; TLI = .673, SRMR = .084, and it would have been necessary to correlate too many error terms to achieve an optimal solution. In the same vein, Model E did not fit the data well, χ2(44) = 515.267, (p < .01); RMSEA = .063 [.058, .068]; CFI = .935; TLI = .902, SRMR = .032, and even after correlating several error terms, this structure did not produce acceptable reliability and validity indices. Therefore, we can conclude that the 2-correlated traits factors solution (Model C) is the best fit to our data.

Further Convergent and Discriminant Validity and Group Measurement Invariance

The Italian-adapted Belief in a Just World Scales underwent further validity tests. These were implemented by comparing the scales scores with variables that the literature has found either related (convergent validity) or unrelated (discriminant validity) to the construct of belief in a just world.

In terms of convergent validity, numerous studies have shown that the belief in a just world is strongly related to the experience of subjective well-being (see Dalbert, 1998; Dzuka & Dalbert, 2006; Lipkus et al., 1996). However, those studies that have specifically employed the Personal and General Belief in a Just World Scales have generally found that only the personal belief is associated with well-being (Bartholomaeus & Strelan, 2019). A recent investigation by Hafer et al. (2020) has highlighted that these studies often ignore that the Personal and General Belief in a Just World factors are usually strongly correlated. This relationship could be responsible for one variable overshadowing the effect of the other. As an alternative, the authors suggest treating personal and general belief as joint indicators of a latent belief in a just world factor.

Following the above recommendations, we first tested a second-order factor that explains variability in both the General Belief and Personal Belief in a Just World factors. This new higher-order factor, which we named Belief in a Just World (BJW), was then treated as a predictor of 7 latent factors of well-being (Overall, Interpersonal, Community, Occupational, Physical, Psychological, and Economic), which were derived from the national sample’s responses to the Italian I COPPE short form (Esposito et al., 2022). The model, implemented through structural equation modelling, presented excellent indices of fit, χ2(232) = 894.807, (p < .01); RMSEA = .033 [.030, .035]; CFI = .975; TLI = .965, SRMR = .027. High and statistically significant (p < .001) regression coefficients were also found between BJW and overall (β = .57), interpersonal (β = .43), community (β = .40), occupational (β = .50), physical (β = .46), psychological (β = .56), and economic (β = .50) well-being (see Supplemental Material SM4). These findings provide strong support for the convergent validity of the Italian Belief in a Just World scales.

As for discriminant validity, the literature has often found that the belief in a just world is either weakly correlated or uncorrelated with demographic variables such as gender (for a review see O'Connor et al., 1996) and age (Bègue & Bastounis, 2003; Oppenheimer, 2006). Therefore, we decided to regress the Italian-adapted Belief in a Just World scales onto these two variables. The analyses were conducted through a Multiple Indicators Multiple Causes (MIMIC) model, which showed moderate fit to the data, χ2(70) = 532.490, (p < .01); RMSEA = .050 [.046, .054]; CFI = .941; TLI = .924, SRMR = .036. Statistically significant, although very weak, negative regression coefficients were found between gender and P-BJW, β = − .083, p < .001, 95% CI (− .125, − .041) and G-BJW, β = − .116, p < .001, 95% CI (− .160, − .073), suggesting that women reported lower levels of belief in a just world than men. As for age, a non-significant and very weak positive effect was found in relation to P-BJW, β = .005, p = .839, 95% CI (− .042, .052) and a significant, although weak, positive effect was found with G-BJW, β = .089, p < .001, 95% CI (.043, .135), suggesting that people report slightly higher levels of general belief in a just world with increased age.

Lastly, we wanted to test whether the Italian Belief in a Just World Scales could be administered to Italian respondents regardless of their geographical location. This decision was driven by the presence of a historical divide in the country, particularly between the North and the South (González, 2011). In fact, Italy has long been characterized by geographical disparities in opportunities and in access to the labour market and public services (ISTAT, 2018). In particular, the South of Italy reports a higher unemployment rate and incidence of poverty, as well as lower access to essential services, compared to Central and Northern Italy (ISTAT, 2019).

Following the approach used by the Italian National Institute for Statistics (ISTAT, 2022, p. 9), we combined regional data into the following three macro-areas: Northern (n = 1202), Central (n = 806), and Southern Italy (n = 634). Following Byrne (2013), we tested a series of nested models with increasing constraints (see Table 5). Given that the three groups presented smaller sample sizes than the total national sample, we partially relied, amongst other indices, on the results of the chi-square difference test.

Table 5 Measurement invariance across Northern, Central, and Southern Italy

The baseline model shows good fit to the data, χ2(150) = 420.179, (p < .01); RMSEA = .045 [.040, .050]; CFI = .962; TLI = .950, SRMR = .035. The next model tested the metric invariance of the scale by constraining factor loadings across the three groups. The fit of this model is very similar to the previous model, χ2(174) = 453.870, (p < .01); RMSEA = .043 [.038, .048]; CFI = .962; TLI = .950, SRMR = .035, and the difference test showed a non-significant chi-square, Δχ2(24) = 33.691, p = .232, providing evidence of achieved metric invariance of factor loadings.

The subsequent model further constrained intercepts to test for scalar invariance. In this case, the fit of the model was only slightly different from the previous model, particularly with respect to CFI, χ2(198) = 517.848, (p < .01); RMSEA = .043 [.038, .047]; CFI = .955; TLI = .955, SRMR = .045. Comparison with the previous model yielded a significant chi-square, Δχ2(24) = 517.848, p < .001. This result is likely due to the fact that the three groups analysed presented relatively large samples (see Table 1), and therefore, the chi-square difference test might have detected even the smallest differences. Conversely, small differences in other indices less sensitive to sample size, ΔCFI = .006, ΔGamma hat = .006, and ΔNCI = .001, suggested that the Italian Belief in a Just World Scales are likely to show scalar invariance of intercepts across Northern, Central, and Southern Italy. These patterns indicate that Italian people interpret the scale items in the same way regardless of where they live in Italy.

Discussion

Results from our analyses show that a slightly revised version of the original Belief in a Just World Scales presents moderate psychometric properties that make it suitable for use in the Italian context. This study involved two administrations of the scales. The first was conducted with a pilot sample of 213 university students and the second with a national sample of 2683 Italian citizens. First, we tested Model A, which replicated the structure of the original validated scale (Dalbert et al., 1987; Dalbert, 1999). This model included 7 congeneric indicators for the Personal Belief in Just World and 6 congeneric indicators for the General Belief in Just World. However, Model A failed to provide adequate fit in both the pilot and national sample. Consequently, we tested a second model (Model B) in which we allowed the residuals of some manifest variables to correlate. Correlating the items’ error terms is a strategy shared by several other adaptation studies of the Belief in a Just World scales. In particular, in the Persian adaptation study (Mikani et al., 2021), the authors allowed four correlations between item error terms (PBJW1 with PBJW3, PBJW3 with GBJW2, PBJW5 with GBJW5, and GBJW3 with GBJW5). In the Russian study (Nartova-Bochaver et al., 2018), four correlations between residuals were allowed (PBJW1 with PBJW3, PBJW3 with GBJW2, PBJW5 with GBJW5, and GBJW3 with GBJW4). Lastly, in the Brazilian study (Veloso Gouveia et al., 2018), one correlation between residuals was allowed (PBJW1 with PBJW3). In all those cases, error term correlations had to be included before satisfactory indices of model fit were achieved.

Despite Model B showing better indices of fit, the observed variable PBJW1 still presented extremely low factor loading and inter-item reliability, thereby causing issues in terms of convergent and discriminant validity. Unlike other validation studies that did not drop any items, to overcome this issue, we decided to test a third model (Model C), which excluded the observed variable PBJW1. This latest model presented better indices of fit and discriminant validity. Therefore, we selected Model C as the final structure to be applied to the Italian adaptation of the Personal and General Belief in a Just World Scales.

Although this last model still showed relatively low convergent validity, we should acknowledge that this condition is not exclusive to this study. Whereas some cross-cultural adaptations of the scales achieved acceptable or near acceptable levels of AVE (.46 for G-BJW and .50 for P-BJW in the Polish adaptation, .45 for G-BJW and .50 for P-BJW in the Persian adaptation), others showed very similar results to ours, with AVE achieving only .36 for G-BJW and .50 for P-BJW in the Russian adaptation and .29 for G-BJW and .39 for P-BJW in the Brazilian adaptation.

Nevertheless, we should be mindful that AVE is not the only criterion to assess the convergent validity of an instrument. In our case, the positive and highly significant regression coefficients of well-being onto the Belief in a Just World factor lend strong support to the convergent validity of the scales. Regarding discriminant validity, our results confirmed the weak relationship that the literature has found between the belief in a just world and age (Bègue & Bastounis, 2003) and gender (Sutton & Douglas, 2005). Additionally, our results were very similar to those found in the Russian adaptation of the Belief in a Just World Scale (Nartova-Bochaver et al., 2018), in which the authors found no significant correlation between gender and the Belief in a Just World, and only a weak positive correlation (r = .10) between age and the Personal Belief in a Just World.

Lastly, the Italian Belief in a Just World scales showed equivalence of factor loadings and intercepts when compared across the three main geographic areas of Italy. This suggests that, despite a historical divide between the North and South of the country, Italian people are likely to interpret the scales equivalently, regardless of their geographic location.

Limitations and Future Recommendations

The results of this study must be interpreted in the light of some limitations. First, we used a non-probabilistic technique such as snowball sampling to recruit our respondents. Additionally, the data collection of the national sample involved trained students and recruiting personal contacts. This poses limitations to the representativeness of the final sample and invites future studies to better generalize our results to the Italian population.

Secondly, some adjustments were necessary to achieve the final version of the Italian P-BJW and G-BJW scales. In particular, the item PBJW1 (“I believe that, by and large, I deserve what happens to me”) had to be removed altogether from the Italian version. The poor performance of this item can have multiple explanations. One is that this item has generally been found to be quite weak across several studies. For example, in the original validation study (see Dalbert, 1999), it presented the lowest standardized factor loading. Similarly, in the Polish and Persian adaptations, the item proved to be the weakest amongst its congeneric indicators.

However, none of the previous studies had to drop the item PBJW1, and therefore, the above explanation does not fully account for the extremely low values we found in our study. Another explanation we should consider is the possibility that respondents might have misunderstood the meaning of the item. In fact, the way that the item is worded might have led them to believe that whatever negative events happened in their life were the direct result of their wrong choices. This might have led respondents to believe that the item was assessing their sense of guilt, rather than how much they felt the world is a just place. Future studies could better establish whether it is feasible to reintroduce the problematic item. For example, as one of the reviewers of this article suggested, an alternative translation could be “Credo che, in linea di massima, io meriti ciò che mi accade”.

Conclusions

The present study introduced the Italian version of the Personal and General Belief in a Just World Scales. Our results also confirm the need to distinguish two levels of justice, namely personal and general justice. Indeed, as proved in many instances, these two distinct aspects of a person’s perception of justice may lead to different interpretations of life events (Bègue & Bastounis, 2003; Dalbert, 1999; Lipkus et al., 1996). These can have different effects on some important personal and social outcomes, such as mental health (Dzuka & Dalbert, 2002; Jiang et al., 2016), self-efficacy (Bègue, 2005; Correia et al., 2016), religious orientation (Saroglou & Pichon, 2009), and prosocial behaviour (De Caroli & Sagone, 2014; Silver et al., 2015).

Despite some adjustments and room for further improvements, the Italian adaptation of the BJW scales presents adequate psychometric properties, and therefore, it can prove very useful to advance the Italian scholarship towards a better understanding of how the perception of justice affects people’s lives. For example, the use of these scales could be of great value in unpacking the reasons why Italy reports lower levels of social justice than other European countries (Schraad-Tischler et al., 2017). In that regard, the Italian version of the Personal and General Belief in a Just World Scales may be useful for those researchers who are trying to investigate the link between justice, well-being, and other socio-psychological variables (Di Martino & Prilleltensky, 2020; Di Martino et al., 2022). This will ultimately contribute to developing operational guidelines for socio-political institutions and for all those committed to the study and promotion of justice.