1 Introduction

Corruption is usually defined as “the abuse of public office for private gain” (World Bank 1997, p. 8). Extensive scholarly research has identified several effects of corruption on socioeconomic systems. In particular, since the late 1990s, the empirical economics literature has expanded exponentially [Footnote 1] owing to the rising quality and availability of data on (perceived) corruption. This literature highlights three main criticisms. The first one refers to the reliability of the indexes of (perceived) corruption used to describe the magnitude of corrupt activities. A recent critical viewpoint raises significant doubts about whether the perceptions-based indicators are reliable proxies for actual corruption (e.g., Seligson 2006; Razafindrakoto and Roubaud 2010; Donchev and Ujhelyi 2014; Treisman 2015; Ning 2016). For instance, Treisman (2015) regards the differences in countries’ perceived corruption scores as being, for the most part, correlated with national cultural stereotypes or with wider media coverage of, e.g., corruption scandals, rather than with the actual extent of corrupt activities.

A second criticism refers to the common practice of treating “corruption as unidimensional and as synonymous to bribery” (Philp 2015, p. 19). According to Andersson (2017), other forms of corruption (e.g., favoritism, improper interference, conflicts of interest), which usually are more common in developed countries, are partially neglected by the usual corruption-perception indexes, which focus essentially on bribery. A growing body of literature has pointed out the existence of different “forms” of corruption. For instance, Dincer and Johnston (2019) distinguish between legal and illegal corruption, based on the nature of the gains that public officials receive in exchange for providing specific benefits to private individuals or groups. Specifically, illegal corruption occurs when public office is abused for private gains in the form of cash or gifts to a government official. On the other hand, legal corruption occurs when the abuse of power is for political gains in the form of campaign contributions to or endorsements by a government official (e.g., lobbying activity) [Footnote 2]. A different taxonomy of corruption has also been proposed, e.g., “High level” or “Grand” corruption versus “Low level” or “Petty” corruption. “Grand” corruption refers to misconduct at the top by leading politicians, and that category comprises both illegal and legal corruption. “Petty” corruption refers to underhand payments to expedite administrative procedures (bribes to avoid fines or to “speed up” waiting lists for public services, and so on); it usually involves administrators and bureaucrats. Accordingly, taking into account that the degree to which bribery can serve as a proxy for overall corruption varies depending on the nature of a political system and the extent of economic development, Andersson (2017) concludes that, in established democracies with highly developed economies and low corruption, the accuracy of conventional perceived-corruption indexes [Footnote 3] may be particularly poor [Footnote 4].

The third criticism refers to the evidence that older and more recent studies on corruption often contradict each other. Consequently, doubts arise about the reliability of the estimated corruption indexes because of statistical inconsistencies. The predominant explanation for the conflicting findings is that the discrepancies are, at least partially, consequences of the more sophisticated econometric approaches, larger datasets, or both, available for recent analyses (Dimant and Tosato 2018).

From a methodological viewpoint, the present article aims to contribute to the debate by focusing on the last two criticisms above. In particular, in order to deal with the second criticism, I apply a statistical approach that considers corruption to be a multidimensional phenomenon. As such, I aim to estimate an overall perception of corruption index (i.e., taking account of both “legal” and “illegal” as well as “grand” and “petty” corruption) [Footnote 5].

As far as the third criticism is concerned, I apply an estimation method, the Partial Least Squares approach to Structural Equation Modeling (PLS–SEM), which has two main advantages over previous empirical analyses. First, it translates the economic hypotheses regarding the causes and consequences of corruption into testable relationships by means of a unified statistical approach (the so-called “structural model” of the SEM). The second advantage of a SEM consists in treating (perceived) corruption as an unobservable variable (i.e., a latent construct) that interacts in complex ways with several other unobserved socioeconomic factors (e.g., institutional variables) and observable variables. In that sense, I aim to improve the reliability of the estimates of perceived corruption by reducing measurement errors.

To the best of my knowledge, the study at hand is the first attempt to estimate an index of perceived corruption by PLS–SEM [Footnote 6], hereinafter a structural corruption perception index (S-CPI).

Essentially, PLS–SEM is a system of interdependent equations that is estimated iteratively, using both factor analysis and multiple regression techniques, until the model converges adequately.

From a positive viewpoint, the contribution of the present study consists in providing an updated, wide-ranging and comparable meta-index of perceived corruption for 165 countries using annual data over the 1995–2016 period.

In terms of policy implications, I identify the main factors affecting corruption (by decomposing the total effect of the causes on corruption into direct and indirect effects) and, by conducting an importance-performance map analysis, which of them are the most effective channels for fighting corruption.

The paper is organized as follows. The next section summarizes the causes and consequences of corruption, providing the theoretical background for model specification. Section 3 explains the empirical approach and provides a formal representation of the PLS–SEM. Section 4 reports empirical results and discusses the findings and policy implications. Section 5 concludes. Two online appendixes describe the dataset and report annual S-CPI scores for all 165 countries.

2 Theoretical background: causes, consequences and indicators of corruption

Duncan (1975, p. 149), describing the SEM, stated that “the meaning of the latent variable depends completely on how correctly, precisely and comprehensively the causal and indicator variables correspond to the intended semantic content of the latent variable”. Thus, the reliability of the estimates of the key latent variable (i.e., perceived corruption) depends completely on what causes and consequences are selected in specifying the model. Accordingly, following the literature on corruption [Footnote 7] and data availability, I specify a model with nine latent variables [Footnote 8] and 42 observed indicators. In SEM terminology, the system of statistical relationships explaining how latent variables (causes, consequences and indicators of perceived corruption) are related to each other is defined as the structural or “inner” model. The systems of equations (so-called “blocks”) in which each latent variable is connected to a subset of manifest variables constitute the measurement or “outer” model of the SEM. Table 1 summarizes the main theoretical hypotheses supporting the specification of the structural model.

Table 1 Theoretical hypotheses on the causes and consequences of corruption

Depending on the unobservable and/or multidimensional nature of the potential causes and consequences of corruption, I define the latent variables as “reflective” (i.e., the observed indicators of a construct are considered to be caused by that construct) or as “formative” (i.e., the manifest variables are considered to be the causes of the latent variable). For the sake of brevity, in the next section, I report details on the measurement model of the key latent construct only (i.e., corruption). As for the measurement models of the other latent variables, the definitions and sources of observations on all manifest variables are provided in Appendix A1.

2.1 Indicators of corruption: the measurement model for the S-CPI index

The latent variable “Corruption” (S-CPI) is measured by a reflective model based on some of the most widely known cross-country indexes that account for the magnitude of perceived corruption as reflected in the opinions of panels of national experts and business people. Specifically, the six indicators are: (1) The Corruption Perceptions Index published by Transparency International (2017) (CPI Rev); the original index (perceptions of the extent of corruption as seen by business people, risk analysts and the general public) is rescaled so that scores are higher when the level of perceived corruption increases. (2) The Bayesian Corruption Indicator estimated by Standaert (2015) (Bayesian Corr). It is a composite index of the perceived overall level of corruption combining information from 20 different surveys and more than 80 different survey questions. (3) The Political Corruption index (Political Corr), which is equal to the average of the public sector corruption index as estimated by Coppedge et al. (2017) and Pemstein et al. (2017) in the “Varieties of Democracy (V-Dem)” Project. (4) The Control of Corruption index (Control Corr. Rev), which is extracted from the Worldwide Governance Indicators database of the World Bank and measures perceptions of corruption. (5) Freedom from corruption (Freedom Corr. Rev), which is the (rescaled) index of Freedom from corruption published by the Heritage Foundation (2017). (6) The (rescaled) “ICRG Indicator of Quality of Government” (ICRG. Rev), which is included in the International Country Risk Guide indicators and produced by the PRS Group (2018). It is calculated as the complement to the mean of the ICRG variables “Corruption”, “Law and Order” and “Bureaucracy Quality”.

3 The statistical approach: partial least squares–structural equation modeling

SEM is a multivariate statistical approach that subsumes a whole range of standard multivariate analytical methods, including regression and factor analysis. It enables the researcher simultaneously to estimate complex causal relationships among latent (unobservable) and manifest (observable) variables. SEM is extensively applied in different fields, such as business, marketing, management, psychology, social and, more recently, in macroeconomics research (e.g., Dell’Anno 2007; Dreher et al. 2007; Ruge 2010; Dell’Anno and Dollery 2014; Buehn et al. 2018).

Two approaches to estimating a SEM are possible: a covariance-based approach (CB–SEM) and partial least squares (PLS–SEM) [Footnote 9]. The differences between the CB–SEM and PLS–SEM methods of estimating SEM parameters mainly relate to different data characteristics and the researcher’s objectives (Richter et al. 2016). According to Faizan et al. (2018), PLS–SEM is especially promising when both the assumption of a multinormal distribution is violated and the theory relied on to explain the phenomenon requires modelling complex interactions with many latent constructs. For Esposito Vinzi et al. (2010b), PLS–SEM has the advantage, compared to CB–SEM, that it requires no strong assumptions about distributions, sample size or measurement scale. However, those advantages must be considered in light of some disadvantages. For example, the absence of any distributional assumptions implies that scholars cannot rely on the classic parametric inferential framework (Chin 1998; Tenenhaus and Esposito Vinzi 2005). PLS–SEM in fact applies the jackknife and bootstrap resampling methods to derive empirical confidence intervals and to test hypotheses on statistical coefficients. For this reason, “the emphasis [of PLS] is more on the accuracy of predictions than on the accuracy of estimation” (Esposito Vinzi et al. 2010b, p. 52). Similarly, Shmueli et al. (2016) state that PLS–SEM, by focusing on the explanation of variances rather than covariances, is a prediction-oriented approach to SEM. Another drawback is that the absence of a global optimization criterion in PLS–SEM implies the absence of measures of overall model fit. The lack of such measures limits PLS–SEM’s usefulness for theory testing and for comparing alternative model structures (Hair et al. 2012). Hair et al. (2019) provide some guidelines for identifying the best approach to estimating a SEM. Following Hair et al.’s (2019) hints [Footnote 10], I consider the PLS approach preferable to the CB method for estimating the proposed SEM.
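The resampling-based inference described in this section can be illustrated with a percentile bootstrap. The sketch below is a minimal illustration under stated assumptions: it bootstraps an ordinary least squares slope on synthetic data as a stand-in for a single standardized path coefficient, it does not run the PLS algorithm, and the function names, sample size and number of resamples are hypothetical choices.

```python
import random

def slope(xs, ys):
    """Ordinary least squares slope of y on x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the slope."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:  # skip degenerate resamples with no variation in x
            continue
        estimates.append(slope(bx, by))
    estimates.sort()
    k = len(estimates)
    return estimates[int(alpha / 2 * k)], estimates[int((1 - alpha / 2) * k) - 1]

# Synthetic data (an assumption): a negative relationship with noise.
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
ys = [-0.5 * x + rng.gauss(0.0, 0.5) for x in xs]

point = slope(xs, ys)
lower, upper = bootstrap_ci(xs, ys)
print(f"slope = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

A coefficient would be read as statistically significant at the chosen level when the interval excludes zero. In a full PLS–SEM application, the whole model would be re-estimated on each resample.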

3.1 The PLS–SEM model for estimating the structural corruption perception index

In this section, I provide a formal representation of the PLS–SEM based on existing theory and empirical evidence on the causes and consequences of corruption. The structural (or inner) model of PLS–SEM allows one to model both the direct effects of the “causes” of corruption and the interactions among them. Accordingly, the inner model of the PLS–SEM specification may be described by the system of Eq. (1):

$$ \begin{aligned} S\!{\text{-}}CPI_{it} & = \beta_{21} MediaFreed_{it} + \beta_{31} Educ_{it} + \beta_{41} Democracy_{it} + \beta_{51} Regulation_{it} \\ & \quad + \,\beta_{61} NaturalRes_{it} + \beta_{71} OilRent_{it} + \beta_{81} SizePubSec_{it} + \beta_{91} Decentraliz_{it} \\ & \quad + \,\beta_{10,1} Fractionaliz_{it} + \beta_{11,1} FrenchC_{it} + \beta_{12,1} PortugC_{it} + \beta_{13,1} SpanC_{it} \\ & \quad + \,\beta_{14,1} ItalianC_{it} + \beta_{15,1} Bel\& DutC_{it} + \beta_{16,1} British,US,Austl_{it} \\ & \quad + \,\beta_{17,1} Catholic_{it} + \beta_{18,1} Protestant_{it} + \beta_{19,1} Muslim_{it} + \varsigma_{1,it } \\ MediaFreed_{it} & = \beta_{32} Educ_{it} + \varsigma_{2,it } \\ Democracy_{it} & = \beta_{24} MediaFreed_{it} + \beta_{34} Educ_{it} + \varsigma_{4,it } \\ Regulation_{it} & = \beta_{45} Democracy_{it} + \varsigma_{5,it } \\ NaturalRes_{it} & = \beta_{76} OilRent_{it} + \varsigma_{6,it } \\ Decentraliz_{it} & = \beta_{10,9} Fractionaliz_{it} + \varsigma_{9,it } \\ Catholic_{it} & = \beta_{11,17} FrenchC_{it} + \beta_{12,17} PortugC_{it} + \beta_{13,17} SpanC_{it} + \beta_{14,17} ItalianC_{it} + \varsigma_{17,it } \\ Protestant_{it} & = \beta_{15,18} Bel\& DutC_{it} + \beta_{16,18} British,US,Austl_{it} + \varsigma_{18,it } \\ SocEconDev_{it} & = \beta_{1,20} S\!{\text{-}}CPI_{it} + \beta_{3,20} Educ_{it} + \beta_{6,20} NaturalRes_{it} + \varsigma_{20,it} , \\ \end{aligned} $$
(1)

where the subscript i = 1,…,165 indicates the country and t = 1995,…,2016 denotes the year.

In the system (1), the first equation accounts for the direct effects of the causes on corruption; in the second through eighth equations, I model the interactions among the causes of corruption. The path-coefficients estimated in those seven equations allow me to estimate indirect (mediated) effects of the causes on S-CPI. Finally, the last equation accounts for the consequences of corruption on the socioeconomic system. It is included in the model to improve the reliability of the estimates in accordance with Duncan’s (1975) remark (i.e., the meaning of the latent variable depends completely on how precisely I select causes and indicators in the SEM specification). In that sense, including within the empirical model the effect of corruption on socioeconomic development allows me to better describe the target latent construct (i.e., perceived corruption).
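The decomposition into direct and indirect (mediated) effects can be sketched for a subset of system (1). In a recursive path model, the total effect of a cause on S-CPI is the sum, over all directed paths, of the products of the path coefficients along each path. The coefficient values below are hypothetical placeholders, not the paper's estimates, and the function name is illustrative.

```python
# Hypothetical standardized path coefficients (NOT the estimates in Table 3).
# Keys are (cause, effect) pairs taken from a subset of the system in Eq. (1).
PATHS = {
    ("Educ", "MediaFreed"): 0.50,
    ("Educ", "Democracy"): 0.20,
    ("MediaFreed", "Democracy"): 0.60,
    ("Democracy", "Regulation"): 0.70,
    ("MediaFreed", "S-CPI"): -0.30,
    ("Educ", "S-CPI"): -0.10,
    ("Democracy", "S-CPI"): -0.05,
    ("Regulation", "S-CPI"): -0.40,
}

def total_effect(source, target):
    """Sum over every directed path from source to target of the product of its coefficients.

    The recursion terminates because the path model is acyclic (recursive).
    """
    if source == target:
        return 1.0
    return sum(
        coef * total_effect(mid, target)
        for (start, mid), coef in PATHS.items()
        if start == source
    )

for cause in ("Regulation", "Democracy", "MediaFreed", "Educ"):
    direct = PATHS.get((cause, "S-CPI"), 0.0)
    total = total_effect(cause, "S-CPI")
    print(f"{cause:<11} direct={direct:+.3f} indirect={total - direct:+.3f} total={total:+.3f}")
```

With these placeholder values, Democracy has a small direct effect but a much larger total effect because most of its influence runs through Regulation, which is the kind of reordering that the comparison of direct and total effects is meant to reveal.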

The dataset used for the empirical analysis is extracted from “The Quality of Government Standard Dataset” collected by Teorell et al. (2018). All variables are scaled to have zero means and unit variances.
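The scaling step can be sketched as follows. This is a minimal illustration with made-up numbers; whether the population or the sample standard deviation is used is not stated in the text, so the use of the population version here is an assumption.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Rescale a list of numbers to zero mean and unit variance (population SD, an assumption)."""
    mu = fmean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]

# Made-up indicator values, for illustration only.
z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)
```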

Owing to the prediction-oriented focus of the proposed SEM, I deal with missing values in the dataset by applying different missing data treatments. First, I apply the pairwise deletion method [Footnote 11]. That option is chosen because it retains as much information as possible. The second treatment is based on interpolating the missing values. I use three different datasets in the empirical analysis, depending on the replacement used: (1) a dataset, labelled “MV”, wherein pairwise deletion is applied with no replacement; (2) a dataset, labelled “I”, for which I first replace missing values by linear interpolation (i.e., calculated using the last valid value before the missing value and the first valid value afterwards) and then apply pairwise deletion; (3) a dataset, labelled “IFB”, in which I apply, in the following order, linear interpolation (I), “forward” interpolation (i.e., I use the last observed value to replace subsequent missing values of the same country), “backward” interpolation (i.e., I use the next observed value to replace earlier missing observations of the same country) and, lastly, pairwise deletion.
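The three replacement steps can be sketched for a single country's series. This is a minimal sketch, assuming one indicator observed over consecutive years with gaps coded as `None`; the function names and the example values are hypothetical.

```python
def linear_interpolate(series):
    """Fill interior gaps (None) on the straight line between the nearest valid neighbours."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out

def forward_fill(series):
    """Replace each gap with the last observed value before it."""
    out, last = [], None
    for v in series:
        last = v if v is not None else last
        out.append(last)
    return out

def backward_fill(series):
    """Replace each remaining gap with the first observed value after it."""
    return forward_fill(series[::-1])[::-1]

# Hypothetical indicator for one country over eight consecutive years.
raw = [None, 3.0, None, None, 6.0, 5.0, None, None]

dataset_i = linear_interpolate(raw)                    # "I": interior gaps only
dataset_ifb = backward_fill(forward_fill(dataset_i))   # "IFB": leading and trailing gaps too

print(dataset_i)    # [None, 3.0, 4.0, 5.0, 6.0, 5.0, None, None]
print(dataset_ifb)  # [3.0, 3.0, 4.0, 5.0, 6.0, 5.0, 5.0, 5.0]
```

Gaps that survive a treatment (all of them in “MV”, the leading and trailing ones in “I”) would then be handled by pairwise deletion, which this sketch does not implement.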

4 Empirical results

I estimate several PLS–SEM specifications [Footnote 12] using three missing data treatments (MV, I and IFB). Because the results are robust to alternative missing data treatments and model specifications, for the sake of brevity I report estimates based only on the IFB dataset and two models: the broadest model specification (Model 1) and a restricted model (Model 2) in which the determinants of corruption that cannot be affected by policymakers (i.e., Colonial Heritage and Religious affiliation) and the “consequence” of corruption (i.e., Socioeconomic Development) are excluded in order to focus on normative interpretations. Accordingly, Model 1 is predictive (i.e., it is used to explain the S-CPI index), while Model 2 is applied to derive policy implications by conducting an importance-performance map analysis (IPMA) (Ringle and Sarstedt 2016).

Once the SEM models have been specified and the PLS algorithm has generated the estimates [Footnote 13], Hair et al. (2019) suggest first evaluating the reliability and validity of the latent variables in the outer models and, only if the outer models are reliable, evaluating the reliability of the inner model.

Accordingly, to assess the reflective outer models, I test: (1) indicator reliability (outer loadings should be larger than 0.708); (2) internal consistency reliability (ρA should fall between the thresholds 0.70 and 0.95); (3) convergent validity (the average variance extracted (AVE) of each construct should be 0.50 or larger); (4) discriminant validity, i.e., the extent to which a construct is empirically distinct from other constructs (Henseler et al. (2015) suggest that a heterotrait-monotrait (HTMT) value below 0.90 provides evidence for discriminant validity between a given pair of reflective constructs). To assess the formative outer models, I analyze: (1) the statistical significance of the indicator weights (p values should be less than 0.05) and (2) indicator collinearity (variance inflation factors (VIFs) of 5 or above indicate potential collinearity problems).
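The rules of thumb for reflective blocks can be collected in a small checker. The sketch below assumes standardized outer loadings, so that the AVE is the mean of the squared loadings; the input values and function names are hypothetical and are not taken from Table 2.

```python
def average_variance_extracted(loadings):
    """AVE of a reflective construct: mean of the squared standardized outer loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def check_reflective_block(loadings, rho_a, htmt_values):
    """Apply the four rules of thumb listed above and return pass/fail flags."""
    return {
        "indicator_reliability": all(l > 0.708 for l in loadings),
        "internal_consistency": 0.70 <= rho_a <= 0.95,
        "convergent_validity": average_variance_extracted(loadings) >= 0.50,
        "discriminant_validity": all(h < 0.90 for h in htmt_values),
    }

# Hypothetical inputs: very high loadings and a rho_A above 0.95.
report = check_reflective_block(
    loadings=[0.95, 0.93, 0.90, 0.96],
    rho_a=0.97,
    htmt_values=[0.62, 0.71, 0.85],
)
for criterion, passed in report.items():
    print(f"{criterion}: {'ok' if passed else 'check'}")
```

In this made-up case only internal consistency is flagged, because a ρA above 0.95 signals redundant indicators rather than unreliable ones.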

Table 2 reports the outer loadings and weights and assessment statistics for the reflective and formative models.

Table 2 Outer loadings and weights (p values in parentheses) and assessment statistics

Table 2 shows that every outer loading is statistically significant and larger than 0.71, and that every ρA is higher than 0.75. Corruption reveals some indicator redundancy because its ρA is larger than 0.95; I consequently have excluded “Control Corr. Rev” from Model 2 to reduce that redundancy. Convergent validity (AVE) and discriminant validity (HTMT) are satisfactory [Footnote 14]. The formative latent constructs return satisfactory assessment statistics for both models.

Once the reliability and validity of the outer models have been positively evaluated, the second step in assessing a PLS–SEM consists of evaluating the inner (or structural) model. Table 3 reports the standardized path coefficients and standardized total effects for each latent construct on corruption.

Table 3 Path coefficients (direct effects) and total effects on corruption—inner model

Table 3 shows that the estimated path coefficients are qualitatively robust to the two specifications, with the only exception being the direct effect of “Education” on “Corruption”, which is statistically significant only in Model 1. I find that the path coefficients (i.e., direct effects) carry the expected signs with some exceptions: lower Decentralization, higher Fractionalization, abundance of Natural Resources and British colonial heritage are not associated with more corruption.

In particular, on the one hand, countries with higher Quality of Regulation, Quality of Democracy, Media Freedom, Natural Resources, Education (only for Model 1), Fractionalization, higher population percentages of Protestants and countries with Belgian, Dutch or French colonial heritages are perceived as being less corrupt. On the other hand, higher levels of Decentralization, Oil Rent, higher population percentages of Catholics and countries with Italian colonial heritages are associated with higher levels of corruption. Lastly, looking at the consequences of corruption, my findings validate the common finding that more corrupt countries show lower Socioeconomic Development.

Following Hair et al. (2019), in addition to (1) the statistical significance of the standardized path coefficients, in assessing the inner model I check (2) collinearity among latent constructs (VIFs of more than five are indicative of probable collinearity issues); (3) the coefficient of multiple determination (\(R^{2}\)) [Footnote 15]; (4) cross-validated redundancy, also known as the Stone-Geisser \(Q^{2}\), which assesses the inner model’s predictive relevance [Footnote 16]; and (5) the model’s predictive power (PP), by checking whether the PLS–SEM analysis yields higher prediction errors in terms of Root Mean Square Error (RMSE) than the linear regression model (LM) [Footnote 17]. Table 4 reports the inner model assessment statistics and the criteria for model selection among a finite set of models, i.e., the Bayesian Information Criterion (BIC) and Akaike’s Information Criterion (AIC) [Footnote 18].
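Two of these statistics have simple closed forms that can be sketched directly. The code below assumes standardized data, so that the sum of squared observations equals the squared deviations from a zero mean, and it treats the omitted data points and their predictions as given rather than running a blindfolding procedure; all numbers and function names are hypothetical.

```python
def vif(r_squared):
    """Variance inflation factor of a predictor whose regression on the other predictors has this R^2."""
    return 1.0 / (1.0 - r_squared)

def q_squared(observed, predicted):
    """Stone-Geisser Q^2 = 1 - SSE / SSO for omitted data points and their predictions."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sso = sum(o ** 2 for o in observed)
    return 1.0 - sse / sso

# Hypothetical standardized observations and model-based predictions.
observed = [1.0, -1.0, 0.5, -0.5]
predicted = [0.8, -0.7, 0.1, -0.6]

print(f"VIF at R^2 = 0.80: {vif(0.80):.2f}")           # the threshold of 5 mentioned above
print(f"Q^2: {q_squared(observed, predicted):.2f}")    # values above 0 indicate predictive relevance
```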

Table 4 Assessment of structural model

As far as the criteria for inner model assessment are concerned, Table 4 shows probable collinearity issues only for “Corruption” in Model 1. The key variable in the analysis at hand (i.e., corruption) has a large \(R^{2}\) (about 0.80). Looking at the Stone-Geisser \(Q^{2}\), the greatest predictive relevance is associated with “Corruption”, while “Quality of Democracy”, “Quality of Regulation”, “Media Freedom”, “Catholics” and “Socioeconomic Development” all have “high” or “medium” predictive relevance. All of the latent variables, with the exception of “Protestants”, reveal high predictive power (PP) in estimating the observed indicators. In conclusion, taking also into account the BIC and the AIC metrics, Model 1 is considered to be the best specification for predicting latent scores, i.e., S-CPI [Footnote 19].

Following the current literature, I standardize the estimated latent scores of “perceived corruption” (\(\hat{x}_{it}\)) in order to derive an index ranging between 0 and 100. The standardization is based on the following formula:

$$S\!{\text{-}}CPI_{it} = 100\frac{{\hat{x}_{it} - \mathop {Min}\limits_{\forall i,\forall t} \left( {\hat{x}_{it} } \right)}}{{\mathop {Max}\limits_{\forall i,\forall t} \left( {\hat{x}_{it} } \right) - \mathop {Min}\limits_{\forall i,\forall t} \left( {\hat{x}_{it} } \right)}}$$
(2)

where for Model IFB [Footnote 20], the following values are obtained: \(\mathop {Min}\limits_{\forall i,\forall t} \left( {\hat{x}_{it} } \right) = -2.581\) and \(\mathop {Max}\limits_{\forall i,\forall t} \left( {\hat{x}_{it} } \right) = 1.693\). Focusing on the “extreme cases”, I find that the four nations with the smallest indexes of perceived corruption are Denmark, Finland, New Zealand and Sweden. At the other end of the ranking, the most corrupt countries are Somalia, the Democratic Republic of Congo, Iraq and North Korea [Footnote 21]. In terms of the time trends of S-CPI, Fig. 1 shows some representative countries: Italy and South Korea (i.e., developed economies with relatively high levels of perceived corruption); Germany, the United States and France (i.e., developed countries with relatively low levels of perceived corruption); and China (i.e., a developing economy with a relatively high perception of corruption).
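The rescaling in Eq. (2) can be sketched with the minimum and maximum reported above. The function name is hypothetical, and the example score of zero is chosen only for illustration.

```python
X_MIN, X_MAX = -2.581, 1.693  # extremes of the latent scores reported in the text

def rescale(score, lo=X_MIN, hi=X_MAX):
    """Map a latent score onto the 0-100 S-CPI scale of Eq. (2)."""
    return 100.0 * (score - lo) / (hi - lo)

print(rescale(X_MIN))            # lowest perceived corruption in the panel
print(rescale(X_MAX))            # highest perceived corruption in the panel
print(round(rescale(0.0), 2))    # a country-year at the mean of the standardized latent scores
```

Because the minimum and maximum are taken over all countries and years, the rescaled scores are comparable both across countries and over time, and a latent score of zero lands at roughly 60 rather than 50.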

Fig. 1 Some annual estimates of standardized S-CPI

To conclude the descriptive analysis, Table 5 compares the standardized S-CPI and the most widely known existing indexes of perceived corruption [Footnote 22].

Table 5 Comparison of perceived corruption indexes

The root mean square error, the mean absolute error and the correlation matrix reveal that the Corruption Perceptions Index published by Transparency International and the Control of Corruption index from the Worldwide Governance Indicators are the indexes most similar to the S-CPI. However, taking into account that the S-CPI covers more than 30% (22%) additional country-level scores over the 1995–2016 period relative to the Corruption Perceptions Index (the Control of Corruption index) and, moreover, that its scores are validated by statistical and economic theories, I conclude that the S-CPI can be considered to be a superior data source for empirical analysis.
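The three similarity measures used in this comparison can be sketched as follows, assuming two indexes on the same 0 to 100 scale and restricted to the country-years that both cover. The values are made up for illustration and do not come from Table 5.

```python
from math import sqrt

def rmse(a, b):
    """Root mean square error between two equally long series."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def mae(a, b):
    """Mean absolute error between two equally long series."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def pearson(a, b):
    """Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

# Made-up scores for four country-years covered by both indexes.
s_cpi = [10.0, 35.0, 60.0, 85.0]
other_index = [12.0, 30.0, 64.0, 80.0]

print(f"RMSE = {rmse(s_cpi, other_index):.2f}")
print(f"MAE  = {mae(s_cpi, other_index):.2f}")
print(f"r    = {pearson(s_cpi, other_index):.3f}")
```

RMSE and MAE measure how far apart the levels are, whereas the correlation only measures whether the two indexes rank and space countries similarly, so the three are complementary.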

4.1 Policy implications

In terms of policy implications, normative inferences as to which are the most effective channels for fighting corruption can be drawn by conducting an importance-performance map analysis (IPMA) (Ringle and Sarstedt 2016) and a partial least squares multi-group analysis (PLS–MGA) on model 2.

The IPMA extends the standard SEM results based on the total effects of the latent constructs on Corruption by taking the performance of each determinant into account. That approach makes it possible to identify the causes that have a relatively high “Importance” for Perceived Corruption (i.e., those latent variables that have larger total effects on the target construct), but also a relatively low “Performance” (i.e., low average latent variable scores). Graphically, the importance-performance map reports the (unstandardized) total effects on the x-axis to measure the “Importance” and, on the y-axis, the average rescaled latent variable scores to measure the “Performance” [Footnote 23]. For the interpretation of the results, Ringle and Sarstedt (2016) point out that, because the constructs in the lower right area of the IPMA are characterized by high importance for the target construct but reveal low performance, they should be considered particularly relevant for policy action (i.e., the potential first-best policies for deterring corrupt practices are located there). Figure 2 shows the IPMA map and the four quadrants that identify the priority order for policy actions.

Fig. 2 Importance-performance map of perceived corruption
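The quadrant logic of the map can be sketched as follows. The sketch assumes that importance is measured by the size of the unstandardized total effect, that performance is the average latent score rescaled to 0 to 100, and that the two dividing lines are the sample means; the construct names and values are hypothetical.

```python
from statistics import fmean

def ipma_quadrants(constructs):
    """Label each construct by comparing its importance and performance with the sample means.

    `constructs` maps a name to a pair (importance, performance).
    """
    imp_cut = fmean([imp for imp, _ in constructs.values()])
    perf_cut = fmean([perf for _, perf in constructs.values()])
    labels = {}
    for name, (imp, perf) in constructs.items():
        importance = "high" if imp > imp_cut else "low"
        performance = "high" if perf > perf_cut else "low"
        labels[name] = f"{importance} importance, {performance} performance"
    return labels, imp_cut, perf_cut

# Hypothetical constructs: (importance, performance).
example = {
    "A": (1.5, 20.0),
    "B": (1.2, 60.0),
    "C": (0.3, 10.0),
    "D": (0.2, 70.0),
}
labels, imp_cut, perf_cut = ipma_quadrants(example)
for name, label in labels.items():
    print(name, "->", label)
```

In this made-up example, construct A falls in the lower right area of the map (high importance, low performance), which is where a first-best policy target would be located.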

According to the IPMA, the main policy implications can be summarized as follows:

(1) Reducing corruption is hard because there are no “first-best” policies, i.e., policies that would reduce a country’s perceived corruption markedly by affecting causes of corruption with relatively low performance (i.e., below the average of 30.1, the horizontal line) and particularly high importance (i.e., a total effect above the average of 0.9, the vertical line);

(2) Looking at the second priority for policy actions aiming to reduce corruption, IPMA suggests improving Quality of Democracy, reducing (ethnic, linguistic and religious) Fractionalization and fostering Media Freedom;

(3) More Education and Decentralization, on the one hand, have relatively low “importance” in curbing Corruption, but, on the other hand, both reveal relatively low performance. Education and Decentralization therefore are potentially relevant for policy actions, but have less significant marginal effects;

(4) Quality of Regulation and Size of Public Sector have lower importance for Corruption and higher performance than the average; hence, a decision maker should prioritize the above-mentioned alternative policies to reduce Corruption;

(5) Oil Rents and Natural Resources have negligible (unstandardized) total effects on Corruption.

In order to explore the overall validity of the just listed policy implications, I conduct a multi-group analysis (MGA) by clustering the global sample in subgroups based on geographical areas and estimating inner and outer coefficients for each subgroup separately. Table 6 reports the standardized total effects of each potential determinant of corruption.

Table 6 Standardized total effects of corruption by geographical area—model 2

The findings support the hypothesis that the order of priority for policy actions (see the “Rank” values in Table 6) changes according to the geographical area considered. Focusing attention on the main results, Media Freedom has the largest effect in reducing Corruption all around the world, with the exception of North Africa, the Middle East, Latin America and the Caribbean, where the priority is improving the Quality of Democracy. More Education contributes to reducing Corruption, with the exception of Western Europe and North America, for which the two phenomena are statistically uncorrelated. Similarly, the Size of Public Sector is negatively correlated with Corruption, but this correlation does not materialize in South, East and South-East Asia. As far as the sign of the effect of Natural Resources on Corruption is concerned, the estimates of the group-specific analysis corroborate the indeterminateness of the empirical literature. Indeed, the direction of the effect depends on the specific geographical group studied. Natural resource abundance has a negative effect on corruption in North Africa and the Middle East; it carries a positive sign in Europe, post-Soviet Union, Sub-Saharan Africa and North America; while it does not have a statistically significant effect in Latin America, the Caribbean, South, East, South-East Asia or the Pacific. Oil Rent has the largest positive effect on Corruption in North Africa, the Middle East and Sub-Saharan Africa, while that determinant of corruption does not have a statistically significant effect in Europe, post-Soviet Union, Latin America, Caribbean or North America.

In the second step of the MGA (not reported here for the sake of brevity), the statistical tests of differences in group-specific coefficients reveal that these differences often are statistically significant across geographical areas. The normative implication is that the efficacy of policy actions differs significantly from country to country. Given that the effectiveness of policies against one type of corruption differs significantly from that of policies against other types, a policy maker should select a strategy based on empirical analysis and on the best practices of countries at a similar level of institutional and economic development, because the magnitudes of different types of corruption (e.g., grand, petty, legal, illegal) also depend on socioeconomic and institutional development.

5 Conclusions

This research examines the causes and consequences of corruption by adopting partial least squares—structural equation modeling (PLS–SEM). Approaching corruption as a latent construct, I estimate an index of perceived corruption in 165 countries from 1995 to 2016.

From a methodological perspective, the PLS–SEM approach is worthwhile for the relevant strand of literature because it supports both the analysis of empirical relationships between constructs that are not directly observed (e.g., corruption), intrinsically multidimensional (e.g., institutional quality, economic development), or both, and the prediction of an overall index of perceived corruption. The methodology presented herein allows researchers to estimate the determinants of corruption in a unified framework that relies on the existing theory and empirical evidence on corruption. That is made possible by the opportunity that SEM supplies to specify simultaneously the determinants that affect corruption directly, indirectly, or both, as well as the effects of corruption on a country’s socioeconomic performance (the “structural” or “inner” model of the SEM). In addition, SEM allows one to exploit currently available indexes of perceived corruption as complementary observable measurements of corruption (the “measurement” or “outer” model). However, the proposed statistical approach shares two of the most relevant problems in the empirical literature. The first is the divergence between “perceived” and “actual” corruption. Second, the PLS–SEM provides unsatisfactory solutions to the problem of endogeneity. Specifically, it is likely that some variables, identified in the model as “causes” of corruption, also are influenced by the perceived magnitude of corruption, which depends, e.g., on institutional quality. Therefore, I suggest caution in assessing the relationships between institutional explanatory variables and perceived corruption as one-way causal links instead of bi-directional interactions that generate feedback loops.

On the positive side, the estimated S-CPI has two main advantages over existing indexes of perceived corruption. First, it estimates perceived corruption by exploiting not only the existing measures but also elements of the extant theoretical and empirical literature on the causes and consequences of corruption, combined within a unified framework. Second, it reduces measurement errors in two ways: (a) by using several indicators for each “unobservable” variable (e.g., corruption; quality of institutions; socioeconomic development), so that the proposed index can be thought of as a “meta-index”; and (b) by following the conventional statistical remedy of enlarging the sample size. Specifically, I consider about 160,000 observations (coming from 47 manifest variables concerning 165 countries over a period of 22 years). Those two related strategies make my findings robust to alternative model specifications and to strategies for replacing missing values.

On the normative side, I derive some policy implications from the PLS–SEM findings by analyzing the estimated direct and indirect (i.e., mediated by other potential causes) effects and by conducting the importance-performance map analysis (IPMA) proposed by Ringle and Sarstedt (2016).

In general terms, I find that focusing only on direct effects may be misleading. For instance, Quality of Regulation has the largest direct effect in reducing Corruption (−0.44), followed by Media Freedom (−0.54) and Quality of Democracy (−0.06), but, once the indirect effects are taken into account, the ranking of the most important causes of corruption changes as follows: Media Freedom (−0.54); Quality of Regulation (−0.44); Education (−0.39); Quality of Democracy (−0.35). Furthermore, by conducting an IPMA to identify the most effective policy actions for fighting corruption, I find that a decision maker should primarily be concerned with (in descending order): Quality of Democracy, Fractionalization, Media Freedom, Decentralization, Education, Size of Public Sector, and Quality of Regulation. For other determinants (e.g., Natural Resources, Oil Rents) that are often considered important causes of corruption in the existing empirical literature, my results do not support that conclusion.
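The distinction between direct, indirect, and total effects can be illustrated with a minimal sketch. In a recursive (acyclic) path model, the total effect of one construct on another is the sum, over all directed paths between them, of the products of the path coefficients along each path; the indirect effect is the total effect minus the direct one. The function name `total_effect`, the construct labels, and the three coefficients in `direct` are hypothetical and chosen only for illustration; they are not estimates from this study. The last line uses the only figures taken from the text, the direct (−0.06) and total (−0.35) effects of Quality of Democracy, which imply an indirect effect of about −0.29.

```python
def total_effect(direct, source, target):
    """Sum of coefficient products over all directed paths source -> target.

    `direct` maps (from_construct, to_construct) -> path coefficient.
    The path model is assumed to be acyclic (recursive).
    """
    total = 0.0
    for (src, dst), coef in direct.items():
        if src != source:
            continue
        if dst == target:
            total += coef  # the direct path, if any
        else:
            # path continues through a mediating construct
            total += coef * total_effect(direct, dst, target)
    return total


# Hypothetical inner-model coefficients (illustrative only, not from the paper).
direct = {
    ("cause", "outcome"): -0.10,
    ("cause", "mediator"): 0.50,
    ("mediator", "outcome"): -0.40,
}

total = total_effect(direct, "cause", "outcome")
indirect = total - direct[("cause", "outcome")]
print(round(total, 2), round(indirect, 2))

# Figures reported in the text for Quality of Democracy: total minus direct.
implied_indirect = round(-0.35 - (-0.06), 2)
print(implied_indirect)
```

In this toy model the mediated path contributes more than the direct one, which is the pattern that makes a ranking based on direct effects alone misleading.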

The last step of the empirical analysis consists of a multi-group analysis that clusters the global sample of countries into subgroups based on geographical areas and estimates the total effects for each subgroup separately. According to that analysis, the estimated effects of the causes of corruption vary significantly across geographical areas; consequently, policy actions should also differ from country to country. The rationale is that different types of corruption (i.e., “grand” and “petty”, “legal” and “illegal”) exist and their relative importance depends on economic development, the quality of institutions, cultural background, and so on. Accordingly, the best policies for fighting corruption consist of acting first on the causes that are most important (i.e., have the largest estimated total effects) and have the most room for improvement. Each of the dimensions of policy action should be estimated on subgroups of countries that are homogeneous in terms of institutional quality and economic development.
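The prioritization criterion stated above (largest total effect combined with the most room for improvement) can be sketched as follows. An IPMA pairs each construct's importance (its total effect on the target) with its performance (its score rescaled to a 0 to 100 range). The single priority score used here, the absolute total effect multiplied by the distance of performance from 100, is an assumption made only to turn the two dimensions into one ranking; it is not part of the IPMA procedure or of this study, and the construct names and values are hypothetical.

```python
def rescale_0_100(value, lo, hi):
    """Rescale a score from its theoretical range [lo, hi] to [0, 100]."""
    return (value - lo) / (hi - lo) * 100.0


def rank_priorities(constructs):
    """Rank constructs by |total effect| * room for improvement.

    `constructs` maps name -> (total_effect, performance on a 0-100 scale).
    The product score is an illustrative heuristic, not the IPMA itself.
    """
    scores = {
        name: abs(effect) * (100.0 - performance)
        for name, (effect, performance) in constructs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical values for one subgroup of countries (not estimates from the paper).
constructs = {
    "construct_a": (-0.50, 80.0),  # important, but already performing well
    "construct_b": (-0.30, 40.0),  # less important, much room for improvement
    "construct_c": (-0.05, 10.0),  # poor performance, negligible effect
}

ranking = rank_priorities(constructs)
print(ranking)
```

Running the same ranking separately for each subgroup would mirror the multi-group logic: the ordering can differ across subgroups because both the total effects and the performance levels differ.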