1 Introduction

Progress in fighting corruption on all fronts requires a measurement of corruption itself to diagnose problems and monitor results. This recognition had renewed interest in many international organisations and also among aid donors, aid recipients, investors, and civil society. Many recent estimations have emphasised the considerable detrimental effects of corruption on growth rate.Footnote 1

The question of exactly how corruption can be estimated has been widely debated (Kaufmann et al. 2007). In several cases, corruption is measured by drawing informal views of relevant stakeholders and the composite subjective index is an aggregate measure of these perceptions,Footnote 2 which involve an irreducible element of uncertaintyFootnote 3 and a lack of transparency, especially in combining information from different sources. However, because corruption usually leaves no paper trail, perceptions of corruption based on an individual’s actual experiences are viewed as the best—and sometimes the only—information available.

A related concern is the excessively narrow current definition of corruption, which is characteristically described as the misuse of public power for private benefit,Footnote 4 which leaves many conceptual ambiguities open to question (Svensson 2005). For example, although the term “private benefit” may describe the receipt of money or some kind of asset of value, it may involve increased political power or status, in addition to advantages deriving from receiving promises for future favours or benefits for closely connected persons (nepotism or favouritism). In addition, this definition does not consider corruption as a multidimensional phenomenon. For example, there is a certain degree of consensus in recognising at least two relevant dimensions of corruption—defined as political and administrative corruption—which should be taken into account when anti-corruption policies are implemented (Bardhan 1997, 2006; Warren 2004).

Based on this strand of the literature, this paper intends to examine some of the methodological issues concerning the identification and measurement of multidimensional corruption indices from multiple indicators. We exploit data from the Business Environment and Enterprise Performance Survey 2002 (BEEPS) to evaluate the multiple dimensions of corruption across the countries of Eastern Europe and Central Asia. The multidimensional framework (M-IRT) that was proposed by Bartolucci (2007) includes latent class (LC) analysis (Goodman 1974, 1978) along with Item Response Theory (IRT). It allows us to characterise the role that is played by the various types of information contained in responses to the questionnaire (items) and to identify the underlying LCs of corrupt activities.Footnote 5 This method also includes the well-known multidimensional Rasch model (Rasch 1961), as a particular case.

Although the IRT model was initially used in psychometrics to examine ability/achievement tests (see, Bock 1997), it has since been applied in social and economic research. For example, Cappellari and Jenkins (2006) applied the IRT model to explain several problems in the deprivation scales of widely used sum-score deprivation indices. Similar methods were used by Kuklys (2004) to analyse housing and healthcare functioning and Faye et al. (2009) used a set of food insecurity indicators to derive a food deprivation scale. The M-IRT extension was also applied in identifying the latent health status of elderly patients who are currently receiving healthcare assistance in Italy (Bartolucci et al. 2010, 2012).

The proposed analysis identifies a multidimensional structure of corruption and finds that four LCs exist, with increasing levels of corruption. The contribution of our findings is emphasised by the robust association of our identified LCs of administrative corruption with the indices of (mainly) administrative corruption derived from BEEPS and the CPI of Transparency International (TI). In addition, our illustrative example, which uses our different LCs to estimate the relationship between firms’ competitiveness and corruption, attempts to disentangle several contrasting expected signs that an aggregate indicator of corruption generally yields in the cited relationship.

The rest of this paper is organised as follows. Section 2 describes the dataset that is used for our analysis. Section 3 discusses the empirical framework that we use to quantifying the number of LCs and presents our main results. Section 4 shows the robustness of these results by comparing the aggregate indices of firms’ components of corruption with those extracted directly from BEEPS and CPI of TI. Section 5 presents the estimates of the relationship between a firm’s competitiveness and corruption using the identified LCs of corruption. Finally, some conclusions are drawn in Sect. 6.

2 Corruption Data from the BEEPS Firm Survey

Our analysis uses data on corrupt attitudes perceived by firms from BEEPS, which is jointly implemented by the European Bank for Reconstruction and Development (EBRD) and the World Bank. These surveys have been carried out, usually every three years, since 1999. They examine several important problems affecting firms in Eastern Europe and Central Asia, such as questions about firms’ sales, investments, innovations, and access to financing, along with business regulations, taxation, and qualitative perception of the business environment. They also contain details of the firms’ characteristics, such as sector and size in terms of employees. Although other firms’ surveys have been conducted by the World Bank, the extension of these investigations were designed by a single country. In addition, this sample of microdata is of particular interest because it includes countries that had been undergoing transition towards freer markets and which were potentially associated with corrupt practices.

One section of BEEPS is addressed to government policies and practices, and it includes items regarding perceived corruption by firms. One of the most interesting features of BEEPS is the sample design that is used to collect the data, which was of particular interest for our aims. Based on the managers’ perceptions and related to the line of business in which they operate, these subjective measures of corrupt practices cover the interviewees’ direct experience and they more clearly solicit answers on corruption. Obviously, the subject matter of BEEPS is sensitive. Consequently, it was natural to ask why firms would be willing to answer questions about their involvement in corruption. A number of features were incorporated into our survey to improve the quality of the data, according to three key components (Hellman et al. 2003): (1) before beginning an interview, the nature and purpose of the research project was explained; (2) data were recorded without any mention of the name of the respondent or the company, and they were collected by international organisations and not by national governmental authorities; and, (3) on sensitive topics, our questions were carefully worded to remove any implication of blame from the respondents, who were usually indirectly asked to describe the typical situation of a similar firm.Footnote 6

In this paper, we used data from the survey carried out in 2002 because, in contrast to the other years, the data for that year do not contain many missing values in corruption responses. This provides some reassurance that most firms in the countries where BEEPS was carried out were in fact willing to talk about corruption, even if under-reporting remains an issue. If firms in some countries were simply unwilling to talk about corruption, then it seems logical that they would give favourable responses for corruption questions and, therefore, they would induce a downward bias.Footnote 7

Table 1 Description of the dataset

Our aim is to characterise corruption from different underlying latent causes. We exploit the potential of this questionnaire, which distinguishes a priori between items that can proxy administrative and political corruption. Administrative corruption, which is aimed at influencing the proper implementation of laws and regulations, is often referred to as bribery by individuals, groups or firms in the private sector. In contrast, political corruption refers to how firms influence the content and implementation of specific laws and regulations for the benefit of narrow private interests rather than the broad public interest (Scott 1972; Bardhan 1997, 2006; Warren 2004).

The questionnaire items are listed in Table 1. Looking at these proxied definitions of administrative corruption, we selected 10 items, covering a wide range of features related to corrupt practices (Anderson and Gray 2006). Three items concern corrupt payments in the field of public services and permits, which are closely connected with the fixed costs of doing business. A second set of items is linked with bribes to weaken the activities of public inspections within a firm, or inspections of buildings, health and safety regulations, and environmental safety. The remaining sub-set of items describes informal payments that are required in dealings with the public administration, imports and customs, courts and tax collection. It should be noted that Items 9 and 10 are included in the administrative corruption section because they reflect the use of bribery to speed procedures necessary for making overdue payments, and the influence—rather than manipulation—of the contents of new legislations and rules, respectively.

As regards political corruption—which in our case refers specifically to whether payments by other firms affect respondents’ firms—a set of six items was used.Footnote 8 In particular, managers responded to questions regarding private payments or gifts that are made with the aim of affecting the votes of parliamentarians and government officials on specific laws, the contents of governmental decrees, and the decisions of elected officials. In addition, private payments or gifts were considered to potentially influence the decisions of judges working for the criminal and civil courts, together with benefits to bank officials in influencing central bank policies and decisions.

All of the items reported in the Table 1 range in an interval 0–4, where 0 represents the lowest degree of impact of the corruption item on the firms’ business. For the purpose of the empirical analysis, these items are transformed in dummy variables recording as 0 when the original variable has values of 0 “no impact”, 1 “minor” and 2 “moderate”, and as 1 when the original variable has values of 3 “major” and 4 “decisive”.

Our dataset contains details of 6667 firms in 27 countries. Table 2 lists by country the descriptive statistics of firms and the itemised responses to the questions on administrative and political corruption, reporting the fraction responding “YES” in each set. Although the proportion of firms interviewed across countries does not differ greatly, the number of interviewed firms (minimum 170) was sufficient to allow us to extract consistent information on corruption issues within countries. In the appropriate columns of Table 2, the binary responses of items in terms of perception of corruption indicate not only the great variety of administrative and political corruption across countries but also the fact that the items referring to the two types of corruption do not necessarily go hand in hand. This preliminary evidence emphasises the idea of corruption as a complex phenomenon, which requires a suitable statistical technique to identify and measure the main characteristics of corruption.

Table 2 Statistical distribution of firms and questionnaire items of perceived corruption, by country

3 Framework of Analysis

In this section, we describe a LC model that explains the multidimensionality of corruption. According to the constrained version of this model, we view the tendency to be involved in corruption as a latent characteristic affecting the conditional probability of responding in a certain way to each item. According to this characteristic, the respondents were grouped into a certain number of LCs. Statistical criteria were mainly used to choose the number of such classes.

3.1 The Statistical Model

We first describe the M-IRT model proposed by Bartolucci (2007). Let \(y_{j}\) denote the response variable for the j-th item of the questionnaire, which is binary (\(y_{j} = 0, 1\)). We denote by n the number of firms in the sample, and we assume that they respond to J items measuring D different latent characteristics (tendency to be involved in corruption) and we also assume that every item measures only one of these characteristics. This model allows us to examine the correlation between the latent dimensions of corruption. Here, \(n=4610\) and \(J=16\), and, according to the items defined in the dataset, we expect at least two different dimensions, so that \(D\ge 2\).

The model that we adopt is based on the following parameterisation of the conditional probabilities of success \(\lambda _{j|c}\) as a logit function:

$$\begin{aligned} {\text {logit}}(\lambda _{j|c})=\gamma _{j}\left( \sum _{d}\delta _{jd}\theta _{cd} -\beta _{j} \right) , \qquad j=1,\ldots ,J, \end{aligned}$$
(1)

where with reference to item j, \(\lambda _{j|c}=(p (y_{j}=1) |\ firm \ is \ in \ class\ c)\) and \(\delta _{jd}\) is a dummy variable equal to 1 if \(j\in {\mathcal {J}}_{d}\) and to 0 otherwise, where \({\mathcal {J}}_{d}\) is the sub-set of items measuring dimension d; \(\theta _{cd}\) is a measure of the latent trait (dimension d) for subjects in LC c, and \(\beta _{j}\) indicates the IRT difficulty parameterFootnote 9 as the overall tendency to respond 0 to item j (for a more detailed discussion of this point, see Bartolucci 2007). Parameters \(\gamma _{j}\) may be set equal to a fixed value according to one-parameter logistic parameterisation (1PL) or they may be left unconstrained and, thus, estimated as in two-parameter logistic parameterisation (2PL). In the second case, the model allows for the different sensitivity of the item measuring the latent trait. The relative importance of the difference between a firm’s trait level and the item threshold is, therefore, determined by the magnitude of the discriminatory power of item (\(\gamma _{j}\)).

Under the assumption of local independence,Footnote 10 the distribution of \(\varvec{Y}\) for subjects in the c-th LC is:

$$\begin{aligned} p(\varvec{y}|c)=p(\varvec{Y}= \varvec{y} |\varvec{\Theta }=\varvec{\theta }_{c})=\prod _{j}\lambda _{j|c}^{y_{j}}(1-\lambda _{j|c})^{1-y_{j}}, \end{aligned}$$
(2)

where \(p(\varvec{Y}= \varvec{y} | \varvec{\Theta }= \varvec{\theta }_{c})\) is the conditional probability that a subject with latent vector \(\varvec{\theta }\) provides to response configuration \(\varvec{y}\). Through a finite mixture, we can express the distribution of \(\varvec{Y}\) as:

$$\begin{aligned} p (\varvec{y})= \sum _cp(\varvec{y}|c)\pi _{c}, \end{aligned}$$
(3)

where \(\pi _{c}=p(\varvec{\Theta }=\varvec{\theta }_{c})\) are the weights corresponding to each LC.

The log-likelihood function, which is used to estimate the above multidimensional LC Rash model, is thus:

$$\begin{aligned} \ell (\varvec{\phi })=\sum _{\varvec{y}}n(\varvec{y})\log (p (\varvec{y})), \end{aligned}$$
(4)

where \(\varvec{\phi }\) is the vector containing all identifiable parameters of the model and \(p (\varvec{y})\) is computed as a function of \(\varvec{\phi }\). In particular, to make the parameters identifiable, we use constraint \(\beta _j = 0\), when the parameterisation is of 1PL type, and \(\beta _j = 0, \gamma _j = 1\) when it is of 2PL type, where \(J = {j_1, \ldots , j_D }\), and \(j_d\) denotes a specific element of \({\mathcal {J}}_{d}\).

To estimate the model parameters, we maximise log-likelihood \(\ell (\varvec{\phi })\) by the EM algorithm (Dempster et al. 1977); for further details, see also Bartolucci (2007) and Bartolucci et al. (2012). Let us assume that we know frequencies \(m(\varvec{y}, c)\) of the contingency table in which these subjects are cross-classified according to response configuration (\(\varvec{y}\)) and the LC (c). The EM algorithm is based on the so-called complete log-likelihood:

$$\begin{aligned} \ell ^{*}(\varvec{\phi })=\sum _{\varvec{y}}\sum _{c}m(\varvec{y}, c) \log p( c,\varvec{y}). \end{aligned}$$
(5)

At the E-step of this algorithm, we compute the expected value of \(\bar{m}(\varvec{y}, c)\) for each \(\varvec{y}\) and c, given \(n(\varvec{y})\), and at the M-step we maximise complete log-likelihood, in which every frequency \({m}(\varvec{y}, c)\) is replaced by the corresponding expected value \(\hat{m}(\varvec{y}, c)\). The EM alternates these two steps until convergence.

The estimated parameters from the EM algorithm are then used to compare (two) multidimensional models. The hypothesis test is of type:

$$\begin{aligned} H_{0}:\varvec{g}(\varvec{\phi }) = {\mathbf {0}} \end{aligned}$$
(6)

where \(\varvec{g}(\varvec{\phi })\) is a vector-valued function and \(\varvec{0}\) denotes the column vector of zeros of suitable dimensions. To select the most suitable model, we use the likelihood ratio (LR) test statistic

$$\begin{aligned} LR=2\sum _{\small {\varvec{y}}} n(\varvec{y})\log \left[ \frac{\hat{p}_{D}(\varvec{y})}{\hat{p}_{D-1}(\varvec{y})}\right] \end{aligned}$$
(7)

where \(\hat{p}_{D}(\varvec{y})\) refers to the assumed model with D different dimensions and \(\hat{p}_{D-1}(\varvec{y})\) to the constrained model with \(D-1\) dimensions. When the response probability is modelled by an M-IRT model, the resulting LR statistic has a \(\chi ^2\) distribution with a number of degrees of freedom equal to the difference between the number of parameters of the full multidimensional and restricted models, in which we merge two distinct dimensions.

This method is used to cluster items in homogeneous groups, corresponding to various types of corruption. A crucial point in identifying various types of corruption is the strength of the correlation between two distinct dimensions. We compute this correlation as:

$$\begin{aligned} \hat{\rho }_{d_1,d_2} =\sum _{c=1}^{C}\hat{\theta }_{c,d_1}\hat{\theta }_{c,d_2}\pi _{c} \end{aligned}$$
(8)

where \(\theta _{c,d_1}\) and \(\theta _{c,d_2}\) are the standardised estimates of the support points referring to two specific dimensions, \(d_1\) and \(d_2\), for subjects in LC c (e.g., \(\hat{\pi }_{c}\) is the estimated weight of this class with \(c=1,\ldots ,C\)). After identifying the LCs in according to \(\hat{\theta }_{cd}\) and \(\hat{\rho }_{d_1,d_2}\), as a final step, we compute the firm’s trait scores by using the expected value of frequencies \(\hat{m}(\varvec{y}, c)\) estimated at the E-step, defined as:

$$\begin{aligned} \hat{m}_{\varvec{y} |c}=n(\varvec{y})\frac{p(\varvec{y}|c)\pi _{c}}{\sum _{h}{p(\varvec{y}|h)\pi _{h}}} . \end{aligned}$$
(9)

3.2 Results

Before presenting the main results of the analysis, we have to choose the number of classes, which is crucial for model identification in the multidimensional LC Rash model. Although this selection is mainly based on information criteria [i.e., BIC and consistent Akaike information criterion (e.g., CAIC)], we complement it with indices measuring goodness-of-fit. A detailed description of the proposed model selection strategy is reported in Table 9 of “Appendix A”. This analysis is based on information criteria and it strongly suggests that the optimal number of LCs is 4, this result is also confirmed by the goodness-of-fit measures (which are given at the end of the table).

We will now summarise the main results of the multidimensional approach to corruption. As described in the previous section, the procedure consists of running a sequence of nested models beginning with an LC model, in which the number of dimensions is equal to the number of selected items (unconstrained model). An analytical overview of the results from hierarchical clustering analysis is given in Table 3. Column 3 lists the combinations of items for each of the sequential steps; Columns 4 and 5 list the deviance from the initial LC model and from the previous model, respectively. Finally, the p value of LR test statistics appears in Column 6.

Table 3 Output of hierarchical cluster analysis

Following the LR test proposed in Eq. (6), Table 3 lists multidimensionality, keeping five significant dimensions, as also confirmed by hierarchical clustering analysis (see, Appendix B). Differences within types of corruption appear in the items of administrative corruption, which characterise Dimensions 1, 2, and 3; the items of political corruption are given in Dimensions 4 and 5.

Looking at the item definitions (Table 1), we can progressively identify the dimensions of administrative corruption as follows:

  • Bureaucrats’ need to influence legally established regulations and the timing of applications of laws;

  • Unofficial payments for inspections in the fields of occupational health and safety, fire and buildings, and environmental works;

  • Application of government contracts, business licences and contracts, public services, and general taxation problems.

The items which cluster political corruption are:

  • Corruption by firms to influence the contents of specific laws and regulations;

  • Corruption to influence decisions by the judges of criminal courts and commercial cases, and decisions or policies of the officials of the central bank.

Table 4 Estimates of conditional probabilities and discriminatory power of items of selected model

Table 4 lists the estimated conditional probabilities of the LC model for each selected item, together with estimates of parameters \(\gamma _{j}\). It shows that there is a dissimilar order between the first 10 items (administrative corruption) and the last 6 (political corruption). For a clearer interpretation, we refer to the components of breadth and depth of corruption, as defined by Mocan (2008). The former describes the extent to which corruption is widespread in a country, and the latter describes the extent to which each LC component affects corruption. Our results indicate that administrative corruption items follow an increasing order across the four LCs, and the last six political corruption items show higher conditional probabilities, regarding LC 2 and, partly, LC 4. In addition, we find lower conditional probabilities for LC 1, which achieves a (modest) influence in Items 2 and 7 (see Table 4), which are associated with administrative corruption. This implies that the estimated level of corruption in LC 1, mainly corresponding to tax evasion or improving access to public services, does not seem to be an obstacle for the business environment. However, as argued previously, we cannot exclude the possibility that firms under-report the true perception of corruption, or at least do not provide answers to questions on this sensitive topic. Accordingly, we exercise caution by classifying the estimates by the conditional probability of this LC and refer to virtually no corruption.

This discussion suggests a crucial point of the present study is the interpretation of the LCs. Useful suggestions for this point come from the estimate of dimensions on different classes and from the correlation measures between identified dimensions (Table 5).

Table 5 Estimates of \(\theta \) and correlations among dimensions

This analysis has three main results. First, estimated parameter \(\theta _{cd}\) in Eq. (1) formally identifies the four LCs according to their different dimensions. As shown in the upper part of Table 5, within the four LCs, all five dimensions are increasingly ordered, the lowest values appearing in class 1 and the highest in class 4. This also means that we can order corruption levels progressively.

Second, as a brief index of how widespread corruption is in firms, we observe that most firms (43.9%) belong to Class 1, followed by Class 3 (31.3%), and Class 2 is the smallest with only 10.8%. The remaining firms are in Class 4 (14%).

Third, looking at the values of the dimensions in each class, we identify Class 3 as the class that has problems in administrative corruption, and Class 2 as characteristically political corruption. In more detail, and according to the results shown in Table 4, because Dimension 2 prevails in LC 3, we can identify it as corresponding to administrative corruption because it is more closely linked with public inspections, apart from taxes, licences and permits. In addition, although the items concerning political corruption may partly affect the results of Class 4, the prevalence in that class of the effects of administrative corruption Dimensions 1, 2, and 3 shows that the highest level of corruption is more closely associated with general administrative corruption. Figure 1 shows the identification process derived from the empirical framework.

Fig. 1
figure 1

Identification of corruption indices. Notes Analytic description of LCs and dimensions, according to results of Table 5. The continuous vertical arrow (right) measures scale of corruption, from 0 (absence of corruption) to high

4 Comparative Analysis

In this section, the expected value of posterior probabilities, computed at the E-step, allows us to build a binary variable in which each firm is attributed 1 in the LC with the highest level of posterior probability, and is 0 otherwise. This new variable identifies the perceived prevalence of corrupt practices for a firm classified in one of the LCs.

We then assess the robustness of the results by: (1) comparing our estimated corruption variables with a corruption ranking index extracted from the BEEPS, and (2) comparing the country’s perception corruption ranking with CPI. In the first case, we compare our estimates with the perceived index of administrative corruption, which was directly obtained from the BEEPS dataset in which firms responded to the following question: “Is it common for firms in your line of business to have to make irregular additional payments or gifts to get things done as regards customs, taxes, licenses, regulation services etc.?”. The answer includes six modalities on an increasing scale of perceived corruption (Never, Seldom, Sometimes, Frequently, Usually, Always). However, for empirical comparison with our four LCs, we rescale it into four modalities by combining responses Seldom and Sometimes and Frequently and Usually. The new corruption index thus mentions Absence, Low, Medium, and High, which tests an increase in the quantitative perception of corruption and is in line with the estimated LCs.

Table 6 lists the results of the LR test statistic in the 27 countries, which assumes an independence hypothesis between each estimated LC and the corresponding binary index at firm-level of BEEPS administrative corruption. LCs 1 and 4 appear to be closely associated with the directly perceived administrative corruption index. Here we discuss the novelty of the proposed corruption measure by characterising other identified LCs. In Class 3, the data reject the independence hypothesis at the 5% level between our firm score variable and that obtained with medium corruption, as extracted directly from BEEPS. We only note a non-significant association for a few countries. This result strengthens our findings and it indicates that although the index was built with a direct item of the perceived administrative corruption in BEEPS, its specificity generates the clear-cut effect of public inspections and tax evasion in some countries with respect to general administrative corruption (e.g., Bulgaria, Croatia, Hungary, Latvia, Uzbekistan and Yugoslavia). Finally, Column 2 lists the results of a “false experiment” specification, in which low administrative corruption, as extracted from BEEPS, was expected to be independent of the political corruption variable estimated as LC 2. We find confirmation that these two components of corruption are not related and that, independently of corruption level, they affect country corruption differently. Appendix C also graphically illustrates the correlation results between the estimated LCs and the perception of corruption index extracted from BEEPS. These indices are aggregated at country level by counting the “1” in each firm of the LC and in the corruption modalities of the variables that were extracted from BEEPS. In line with the results of the LR test statistic, there is a significant correlation between the identified administrative LC and the BEEPS modalities, whereas the political corruption class is not correlated with the BEEPS measure (the pairwise correlation coefficient is only 0.038).

Table 6 Tests for independence between identified LCs and direct BEEPS measures of corruption

Second, to validate our estimates, we exploit the perceived corruption scores at country level—the maximum value of the index represents the worst performance—to obtain the corresponding ranking. In particular, to obtain the country index from the LCs, we use the average of positive responses for each country and we weight these averages by the sample size of each country. Then, from these country averages, we create four different corruption ranking variables in a range of 0–100, where 0 records the lowest value of the variable at country level. We then compare these rankings with those of the CPI, which is the most prominent index measuring corruption at country level. It should be noted that CPI represents the best government performance with the maximum value of the index (CPI perception index range: 0–10). This comparison is particularly important because it suggests the potential external validity of our estimating procedure in building a country-ranking index.

Table 7 Country corruption scores and rankings

Our hypothesis considers that, if our score estimates for each LC are correct, then the specific LC related to “virtually no corruption” should partially reproduce the CPI country classification because the latter measures the best government performance with the maximum value of the index. The analysis in Table 7 shows that in some cases (i.e. Bosnia & Herzegovina, Czech Republic, Georgia, Lithuania, Moldova, Romania, Russia, Slovenia and Ukraine) the rankings of the first latent lass are in line with those of CPI. In other cases (i.e. Azerbaijan, Bulgaria, Kazakhstan, Slovak Republic and Uzbekistan) we do not find a link between the two indices. The explanation given here, which generally reflects criticism reported in the literature, is that CPI scores are not weighted means of policy and that certain firm-level characteristics influence corruption which we instead extract from the multidimensional framework adopted.

Some examples serve to clarify the main features of these LC differences. In Kazakhstan, the high estimated score in public inspections leads to an increase in the mean level of the perception of corruption, as measured by CPI. In Azerbaijan and Uzbekistan, the difference between our estimated index of virtually no corruption and CPI is shown by the relatively greater perception of political corruption. Instead, CPI estimates an excellent ranking for Estonia, which is probably associated with political corruption. The positions for the Slovak Republic and Bulgaria are more distant on the corruption index. CPI indicates that these two countries are in positions 3 and 7, respectively. In 2002, Bulgaria experienced the highest degree of political corruption and it was greatly constrained by the inefficiencies, or failure, of the legal system (Delavallade 2011).

Country rankings for administrative corruption in LCs 3 and 4 are particularly affected by the level of taxation, permits and inspections, and government contracts. While government contracts are linked to regulatory quality and the enforcement of property rights. Restrictive regulation and taxation, which are associated with insufficient enforcement of property rights, have been proven to be key determinants of administrative corruption in Russia, Ukraine, Moldova and Albania (Tanzi 1998).

The determinants of corruption are slightly different from the situation which arose in Bosnia with respect to the former USSR countries, in which administrative corruption is shown to rise with weak enforcement of property rights, revealing the incapacity of the courts to implement the law (Johnson et al. 1999). As an example of the highest levels of administrative corruption in Bosnia, Nowak (2001) attributed the space for corrupt activities to the post-war context, which complicated power structures and fragmented administration. Figure 2 graphically summarises the ranking of corruption components in the Eastern Europe countries.

Fig. 2
figure 2

Representation of types of corruption, by country. a Political corruption. b Administrative corruption (inspections). c Highest administrative corruption level (public contracts, licenses, taxes and regulations). Notes Corruption by country is constructed by corruption scores of Table 7

5 An Illustrative Example: The Relationship Between Firm Competitiveness and Corruption

In this section, we will revise the empirical relationship between competitiveness and corruption by exploiting the results of the M-IRT model, which identifies four monotonically increasing LCs. Many papers have been published about the determinants of corruption, with particular emphasis on the role of market competition (Ades and Di Tella 1999; Herzfeld and Weiss 2003; Dreher et al. 2007; Pieroni and d’Agostino 2013). In the traditional view, the competitiveness of firms is presumed to lead to more corruption because they may spend part of their profits on bribes paid to public officials.Footnote 11 Ades and Tella (1997) and Clarke and Xu (2002) empirically examined the hypothesis of a positive correlation between firm competitiveness with corruption in East European and Central Asian countries. Their results support the mechanism in question; that is, the most competitive firms have much more money with which to bribe bureaucrats and, thus, increase corruption. These findings are in line with the estimates regarding the same relationship by Svensson (2003) and Reinikka and Svensson (2006) in the case study of Uganda. However, this economic literature also proposed and tested an alternative hypothesis: if comparative disadvantages in competitiveness exist, then firms invest in bribery to stay in the market. This hypothesis was supported by Gauthier and Reinikka (2001) and was empirically confirmed for a sample of North African countries by Delavallade (2011), in which a negative relationship was found.

To illustrate the role of firm competitiveness as a determinant of corruption, the literature generally distinguishes two components: administrative and state capture indices (which we call political corruption). The main shortcoming of this approach is that the empirical tests must be carried out to ascertain whether or not the aggregate index of corruption can be pooled with either administrative or political corruption. It is certainly possible that perceived corruption indicates the existence of an overlap between these indices. Instead, our model allows some firms in these LCs of corruption to be similar to others, although they comprise statistically distinct groups.

Thus, we set up an empirical framework based on logit models, in which the dependent variables are firms’ dummy variables as derived from the estimates of the expected value of frequencies (see Sect. 4). Competitiveness is also a discrete variable with three modalities, in which the highest score indicates greater competitiveness and is proxied by the increase in sales over the previous two years (i.e., sales in 1999). We extend the concept of competitiveness by also including firms which, at least, did not suffer reduced sales. This allows us to avoid the general criticism concerning firm competitiveness linked with increased sales (i.e., to maximise firm profits) and we substitute it with the more general aim of firm survival in a competitive market.

The model specification includes the size of the firm (size) and the size of the sector in which it operates (sector). Following the results of Hellman et al. (2000), we assume that small firms tend to engage more in administrative than in political corruption, probably because the former is less costly. We include the variable size dichotomised, which is based on full-time employees of firms calculated as “small” (e.g., 2–49, reference modality), “medium” (e.g., 50–249), or “large” (e.g., \({>}\,250\)), expecting the size of the firm to negatively affect administrative corruption. In addition, this specific literature finds conflicting evidence that corruption differs among sectors. Industrial sectors, in which projects involve large amounts of money or high rent-generating public procurements, may be more open to corruption, particularly of the political kind. Thus, we consider a dummy variable according to whether a firm works in the “manufacturing” or “services” sector, the latter being the reference modality.

Table 8 Firm competitiveness and corruption: logit and ordered logit regressions

The coefficient estimates of the logit model are listed in Table 8 in terms of odds ratios. We emphasise three important results. First, firm competitiveness has a positive effect in cases where there is a perception of virtually no corruption (e.g., LC 1) but it has a negative effect when the weight of corruption increases. Long-term survival strategy is characterised by firms that are stable or which increase their competitiveness, indicating that a representative firm has less need to bribe bureaucrats to obtain, for example, public procurements in its line of business. We estimate that, at most, few corrupt activities (i.e., virtually no corruption) does not constrain economic development, finding that competitive (stable) firms increase the probability of not operating in corrupt countries by about 40%. However, the effects of competitiveness change for components characterised by higher levels of corruption, starting from the political kind. Competitive firms that occupy a stable position in their markets have a significantly lower probability than less competitive firms of being involved in corruption. This may imply that less competitive firms resort to unofficial payments to compensate for their competitive position, distorting the rules of competition for countries mainly affected by high levels of corrupt activities.

Second, by including dummies for the specific effects of size, we find significant coefficients in the LCs identified as the highest levels of corruption. In particular, large enterprises reduce the propensity (e.g., odds ratio = 0.771 and 0.805, for corruption components 3 and 4, respectively) to seek influence in inspections, public contracts and licenses, taxes, and so on. This matches the previous literature on the transition economies, which finds a positive relationship between administrative corruption and small firm size. In contrast, we do not find significant differences between the sectors of the major lines of business in each component of corruption.

Third, we provide support for the findings of the country classification on corruption. We estimate that Turkey and Bulgaria have the highest propensity for political corruption and significance in the estimated country-dummy parameters of administrative corruption, which accurately replicates our previous estimates.

The last column of Table 8 lists the estimates of the ordered logit regression parameters. The odds of reporting virtually no corruption versus political corruption, and the two LCs of administrative corruption with monotonic increase of intensity are 25–30% lower in firms with stable competitiveness, net of the characteristics of firms (size and sector) and of the specificities of countries. This confirms the findings of the logistic regressions.

6 Conclusions

This paper proposes a LC model to identify different dimensions of corruption, which is a latent phenomenon that is often measured by synthesis of observed indices assumed to be correlated with corruption. In particular, the constrained version of the M-IRT model is used to classify corruption using firms’ perceptions in a sample of Eastern Europe and Central Asian countries.

The hypothesis that items in a multidimensional perspective can reproduce the distinction between administrative and political corruption is extended and we find that four LCs exist, with increasing levels of corruption. We identify these LCs and validate the results according to a corruption index that is directly derived from the BEEPS and CPI rankings of TI. An illustrative example of the relationship between firm competitiveness and the identified LCs shows that weakly competitive firms are more tempted to resort to bribery, influencing the rules of competition, and this propensity is estimated to be greater in small firms associated with pervasive administrative corruption.

We believe that the strategy that we have used to identify corruption would be useful to empirical researchers in several respects. First, the greater part of the data on corruption uses direct investigation by questionnaires, such as the M-IRT models, which allow us to disentangle the different dimensions that an aggregate indicator of corruption tends to hide. Second, the M-IRT model reduces the bias associated with the measures of impact of a specific relationship, when corruption has been incorrectly classified. From the M-IRT model that is adopted here, the contrasting evidence of the determinants of corruption may be explained by overlapping their heterogeneous effects. This implies that the implementation of anti-corruption policies may also benefit from a correct identification of which type of corruption affects countries or regions, and which tools should be used to combat it.