1 Introduction

Economic inequality is a universal phenomenon in human societies. Although there are broad patterns of economic inequality between countries, their sources are poorly understood and hotly debated (Kuznets 1955; Acemoglu and Robinson 2009; Autor 2014; Piketty and Saez 2014; Ravallion 2014; Nishi et al. 2015). To explain the origin of economic inequality, researchers have put forward different mechanisms, such as institutional structures (Acemoglu and Robinson 2009; Piketty and Saez 2014), technological progress (Acemoglu and Robinson 2009; Autor 2014), economic growth (Ravallion 2014), psychological factors (Nishi et al. 2015), and so on. In fact, because economic inequality involves many different aspects of human societies (e.g. income, wealth, social status, and so on), seeking a universal pattern of economic inequality seems to be an impossible task. Nevertheless, some researchers have tried to find universal patterns in income inequality. The influential economist Vilfredo Pareto proposed that the income distribution in a society is well described by a power law (Pareto 1897). Although many studies have confirmed that the high-income class of a population follows a power law (Mandelbrot 1960; Kakwani 1980), there is increasing evidence that it does not apply to the majority of the population with lower incomes. Using income data for the USA, Yakovenko and Rosser (2009) have shown that US society has a well-defined two-class structure (Dragulescu and Yakovenko 2001a, b; Silva and Yakovenko 2005; Yakovenko and Rosser 2009; Banerjee and Yakovenko 2010): the great majority of the population (low and middle income class) follows an exponential law, while the remaining part (high income class) follows a power law. Dragulescu and Yakovenko proposed a thermal equilibrium theory based on statistical mechanics to explain the exponential pattern of income distribution (Dragulescu and Yakovenko 2000), which has gained increasing support from recent empirical studies (Nirei and Souma 2007; Derzsy et al. 2012; Jagielski and Kutner 2013; Shaikh et al. 2014; Shaikh 2016; Oancea et al. 2016). However, it should be noted that the exponential law does not fit the super-low income data, which are usually fitted by log-normal or gamma distributions (Banerjee et al. 2006; Chakrabarti et al. 2013). Moreover, although the exponential law is quite successful in describing the low and middle income data, the mechanism of thermal equilibrium is questioned by mainstream economists (Cho 2014). These economists argue that the thermal theory of income distribution lacks a solid economic foundation (Cho 2014), and so is unhelpful in making policy recommendations. In response to this criticism, we show in this paper that the exponential law of income distribution can be derived from the principles of free competition and Rawls’ fairness (Rawls 1999), thus giving it a solid economic foundation (Tao 2015, 2016). Because we introduce a rigorous economic treatment, the scope of applicability of the exponential distribution is determined, and we can explain why it fails to fit super-low and high income data.

Furthermore, our results complement the existing literature. Understanding the social impact of income inequality and characterizing it quantitatively are subjects of great social and political importance. For the quantitative characterization of inequality, while there are plenty of case-by-case studies (Piketty and Saez 2003; Piketty 2003; Banerjee et al. 2006; Piketty and Qian 2009; Clementi et al. 2010, 2012; Jagielski and Kutner 2013; Shaikh et al. 2014; Saez and Zucman 2016; Oancea et al. 2016), most of them do not recognize the underlying universal quantitative structure of income inequality, i.e., they do not see the forest for the trees. Here we present extensive empirical evidence, derived from datasets for 67 countries, that the low and middle part of the income distribution follows a universal exponential law. More importantly, relative to other existing distributions, the fitting parameters in our distribution have an explicit economic interpretation and direct relevance to policy measures intended to alleviate income inequality. For the social impact of inequality, there are two strands of literature. One focuses on how market structure and institutions influence income inequality (Katz and Autor 1999; Autor et al. 2008; Heathcote et al. 2010; Moretti 2013). The other investigates the mechanisms by which redistribution reduces income inequality (Piketty and Saez 2003; Piketty 2003; Atkinson et al. 2011; Golosov et al. 2013; Jones 2015). In this paper, we attempt to combine these two lines. On the empirical side, we show that a free economy exhibits a universal two-class structure: the great majority of the population (low and middle income class) follows an exponential law, while the remaining part (high income class) follows a power law. On the theoretical side, we show that the exponential income structure is a result of combining free competition and Rawls’ fairness, while the power-law income structure is due to the rule “the rich get richer” (i.e. the Matthew effect). To reduce the degree of inequality, we propose that the redistribution policy should be based on the principle of levying a tax on the high-income class to pay for unemployment compensation, in line with Piketty’s policy propositions.

2 Exponential income distribution

In fact, the mathematical apparatus of modern economics has been strongly influenced by physics. Following Newton’s paradigm of classical mechanics, the famous economist Leon Walras developed a set of equations that describe economic equilibrium (Walras 2003). These equations opened the paradigm of “neoclassical economics” and were later perfected by Arrow and Debreu (1954). Today these equations are called the “Arrow–Debreu general equilibrium model” (ADGEM), which is the well-known standard model of modern economics (Mas-Collel et al. 1995). Using such a model, one can illustrate why the equilibrium allocation of social resources, in which every social member obtains maximum satisfaction, exists in an “ideal institutional environment” that ensures reasonable private property rights and judicial justice. Following this mainstream economic approach, we use the ADGEM in this paper to study the equilibrium income allocation among social members. Thus, we can observe how the macro-level pattern of income inequality arises from the micro-level competitive interactions of individuals embedded within an ideal institutional environment (see Footnote 1).

Without loss of generality, we consider an “N-person non-cooperative game” in which there are N consumers (or agents), each of whom operates a firm, so there are N firms. Following the basic assumptions of neoclassical economics, each consumer is selfish and has infinite desire; therefore, all of these firms pursue maximum profit, and all of these consumers exchange with each other to obtain maximum satisfaction. Furthermore, if a consumer is employed in a firm that he does not operate, he obtains an ownership share of that firm. Because consumer i operates firm i, his income consists of that firm’s operational revenue and the returns on shares held in other firms, where \(i=1,2,\ldots ,N\). All of these settings are explained in detail in “Appendix A”. In accordance with the basic settings of the ADGEM, all firms are sufficiently competitive that monopoly cannot arise; therefore, by the income rule above, each firm effectively resembles a self-employed household or a small trader. This means that we can use household income data to test the validity of the model developed below. For the Pareto optimal solution to the ADGEM that captures all of the settings above, Tao (2015, 2016) proved that, in the long-run competition, each consumer’s equilibrium income is completely random and obeys only the following constraint:

$$\begin{aligned} \left\{ {{\begin{array}{l} I_i \ge 0 \quad for \quad i=1,2,\ldots ,N \\ \mathop \sum \nolimits _{i=1}^N I_i =Y \\ \end{array} }} \right. \end{aligned}$$
(1)

where \(I_i \) denotes the equilibrium income of the \(i\hbox {th}\) consumer and Y denotes GDP (Gross Domestic Product).

Here we use \(A=\left\{ {I_1 ,I_2 ,\ldots ,I_N } \right\} \) to specify an “equilibrium income allocation” (EIA) among the N consumers. Due to the randomness of the Pareto optimal solution (1), there are a large number of possible EIAs. To eliminate the uncertainty among optimal allocations, traditional economists propose to seek the best one by using a social preference function (Mas-Collel et al. 1995). Unfortunately, Arrow’s Impossibility Theorem denies the existence of such a social preference (Arrow 1963). This is the well-known “dilemma of social choice”. However, Tao proposes that this dilemma can be avoided by using the paradigm of natural selection (Tao 2016). To be specific, regarding each EIA as a random event and an income distribution as a set of EIAs, we conjecture that, among all possible income distributions, the one endowed with the largest probability, i.e. the likeliest, ought to be selected: survival of the likeliest (Whitfield 2007; Harte et al. 2008; Tao 2010, 2016). If this conjecture is right, we should expect household income data to exhibit such an income distribution.

The focus of this paper is on income distribution in a democratic economy. To find the probability of each income distribution occurring under a democratic environment, we apply Rawls’ justice principle of fair equality of opportunity (Rawls 1999) to the ADGEM. Since the ADGEM is an ideally just procedure, fair equality of opportunity indicates that each EIA should occur with equal probability (Tao 2016). Rawls’ fairness in a democratic economy means that the door of opportunity is open to all social members (Rawls 1999). Rawls’ fairness principle is illustrated with an example of a “2-person allocation” in “Appendix B”. When Rawls’ fairness principle is applied to the “N-person allocation” subject to constraint (1), with N and Y large enough, we find that the exponential income distribution occurs with the highest probability (the detailed derivation is given in “Appendix C”):

$$\begin{aligned} \left\{ {{\begin{array}{l} f\left( x \right) =\frac{1}{\theta }e^{\frac{- ({x-\mu })}{\theta }} \\ x\ge \mu \\ \end{array} }} \right. \end{aligned}$$
(2)

or equivalently

$$\begin{aligned} \left\{ {{\begin{array}{l} P\left( {t\ge x} \right) =e^{\frac{-({x-\mu })}{\theta }} \\ x\ge \mu \\ \end{array} }} \right. . \end{aligned}$$
(3)

Here x denotes the income level, \(f\left( x \right) \) is the probability density of income x, and \(P\left( {t\ge x} \right) \) is the cumulative probability distribution, i.e. the fraction of the population with income higher than x.

The free parameters \(\mu \) and \(\theta \) denote the marginal labor-capital return and the marginal technology return (Tao 2010, 2016), respectively (see “Appendix D”). The constraint \(x\ge \mu \) reflects the Rational Agent Hypothesis (Tao 2010) of neoclassical economics, which states that firms (or agents) enter the market if and only if they gain at least the marginal labor-capital return needed to cover their costs; otherwise they would make a loss. This hypothesis explains why the exponential distribution fails to fit the super-low income data at x lower than \(\mu \), which is one limitation on the applicability of the exponential income distribution (2). On the other hand, by the settings of the ADGEM, each firm is sufficiently competitive, and hence resembles a self-employed household; therefore, the exponential income distribution (2) does not fit super-rich people (the high income class), who typically operate large (or monopolistic) firms (see Footnote 2). Thus, the income distribution of super-rich people obeys a power law (Axtell 2001) due to the rule “the rich get richer” (Tao 2015), i.e. the Matthew effect, rather than Rawls’ equal opportunity. Consequently, when we fit income data using the exponential distribution (2), we should drop the super-low and high income data. Finally, we point out that other scholars (Foley 1994; Chakrabarti and Chakrabarti 2009; Venkatasubramanian et al. 2015) have also applied the concepts of Rawls’ fairness, utility and maximum entropy to derive income distributions; however, our derivation has the advantage of being based on the ADGEM and of specifying the range of applicability of the theoretical distribution.
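To build intuition for why equal probability over the allocations satisfying (1) singles out the exponential form, the following sketch (our own numerical illustration, not the formal derivation of “Appendix C”) draws allocations uniformly from the simplex \(\{I_i \ge 0,\ \sum _{i=1}^N I_i =Y\}\) and compares the resulting marginal income distribution with Eq. (3) for \(\mu =0\). The sample sizes and the use of Python/NumPy are our own assumptions.

```python
import numpy as np

# Minimal illustration (not the proof in Appendix C): draw allocations
# uniformly from the simplex {I_i >= 0, sum_i I_i = Y} and check that the
# marginal income distribution is close to the exponential law with
# theta = Y/N (here mu = 0).
rng = np.random.default_rng(0)
N, Y, n_draws = 1000, 1.0e6, 200          # hypothetical sizes

# Uniform sampling on the simplex via normalized exponential spacings.
samples = rng.exponential(size=(n_draws, N))
allocations = Y * samples / samples.sum(axis=1, keepdims=True)

incomes = allocations.ravel()
theta = Y / N                             # theoretical scale parameter

# Compare the empirical complementary CDF with exp(-x/theta), cf. Eq. (3).
for x in np.linspace(0.0, 5 * theta, 6):
    p_emp = (incomes > x).mean()
    p_theory = np.exp(-x / theta)
    print(f"x = {x:8.0f}   P_emp = {p_emp:.3f}   P_theory = {p_theory:.3f}")
```

The two columns agree closely for large N, mirroring the claim that the exponential structure is the likeliest allocation.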

3 Empirical test for 67 countries

We can estimate the values of \(\mu \) and \(\theta \) by fitting empirical income data to the cumulative probability distribution given in Eq. (3). The datasets employed in this paper come from many sources at the country level and consist of income data for a large sample of percentiles. Using data covering a wide span of years, we obtain a dataset of 67 countries around the world, in particular European and Latin American countries. The sources of data are fully described in “Appendix F”. Because our model is based on the ADGEM, which describes an ideal market economy, we expect the exponential distribution to be applicable to well-developed market economies. To this end, we primarily focus on OECD countries, for which it is also easier to find detailed and reliable income distribution data. Outside the OECD, it is often difficult to obtain sufficiently detailed and reliable data in the appropriate format. Thus, the 67 countries analyzed in this paper are those for which we managed to find data from the sources listed in “Appendix F”; expanding this list would be desirable in future work. From the household income data obtained for each country, which are organized into income quantiles, we compute the cumulative distribution of income \(P\left( {t\ge x} \right) \), i.e. the ratio of the number of social members whose income is larger than x to the total population.

Following our theoretical construction, the empirical analysis is conducted in two steps. First, we take logs of the cumulative distribution equation and run a step-by-step ordinary least squares (OLS) regression on the sample. Since we investigate the relationship between the cumulative distribution of income and the income level, according to the scope of applicability of the exponential income distribution we need to drop the high-income samples, as they follow a power law (Axtell 2001; Tao 2015). As a goodness-of-fit criterion, we select the sample that yields the largest adjusted \(R^{2}\). To be specific, we first take logs of Eq. (3),

$$\begin{aligned} ln\,\left[ {P\left( {t\ge x} \right) } \right] =y=\beta x+\alpha +\varepsilon , \end{aligned}$$
(4)

where \(\beta =-1/\theta \) and \(\alpha =\mu /\theta \); we then regress y on x using the OLS method. In the second step, based on the regression results obtained from the first step, we compute the value of the marginal labor-capital return \(\mu \), which equals minus the ratio of the intercept to the slope coefficient, that is, \(\mu =-\alpha /\beta \). Furthermore, by the Rational Agent Hypothesis, we drop the super-low income samples whose values are less than \(\mu \), and run an OLS regression again on the trimmed sample.
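The two-step procedure can be summarized in a short sketch (a minimal illustration under our own conventions: quantile income levels and the corresponding cumulative shares as inputs, NumPy's least-squares fit as the regression routine; the actual estimation behind Tables S1–S3 may differ in detail). The top-income trimming by the maximal adjusted \(R^{2}\) rule is sketched separately after the United Kingdom example below.

```python
import numpy as np

def fit_exponential(income, cum_share):
    """One pass of the two-step procedure described in the text.

    income    -- income level x of each quantile (increasing)
    cum_share -- fraction of population with income above x, P(t >= x)
    Returns (theta, mu) from Eq. (4): ln P = beta*x + alpha, with
    beta = -1/theta and alpha = mu/theta, hence mu = -alpha/beta.
    """
    x = np.asarray(income, dtype=float)
    y = np.log(np.asarray(cum_share, dtype=float))   # take logs of Eq. (3)

    # Step 1: OLS regression of y on x over the retained sample.
    beta, alpha = np.polyfit(x, y, deg=1)
    mu = -alpha / beta

    # Step 2: drop super-low incomes below mu (Rational Agent Hypothesis)
    # and re-run the regression on the trimmed sample.
    keep = x >= mu
    beta2, alpha2 = np.polyfit(x[keep], y[keep], deg=1)
    return -1.0 / beta2, -alpha2 / beta2             # (theta, mu)
```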

To illustrate our testing process, we first apply the aforementioned empirical strategy to the United Kingdom for the years 1999–2000 to 2013–2014. Following the maximal adjusted \(R^{2}\) rule, we first drop the super-high income data. According to our theoretical formulation, high-income people do not conform to the assumptions of the ADGEM; although their number is relatively small, their total income is quite large. Once the top-income samples are removed, we obtain the value of \(\mu \) from the fitted parameters of Eq. (3) and then further drop the samples whose values are less than \(\mu \). Once again, we run an OLS regression on the trimmed data to fit our exponential distribution. For comparison, we also fit the full sample; see the two panels of Fig. 1 for details. The same empirical testing procedure is applied to the other countries around the world. The fitting results are shown in Figs. 2 and 3. Figure 2 shows 34 mostly European countries for which Eurostat data are available, and Fig. 3 shows 32 countries and Hong Kong SAR from other areas. One can observe visually that the agreement between theory and empirical data is very good. Furthermore, the goodness-of-fit parameters of the exponential income distribution (3) for the 67 countries are reported in Tables S1–S3 (see Supplementary Material); the adjusted \(R^{2}\) values of almost all these countries approach 0.99.
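The maximal adjusted \(R^{2}\) rule for removing top-income quantiles can be sketched as follows. The stopping rule (scan over the number of dropped top quantiles and keep the cut with the largest adjusted \(R^{2}\)) is our reading of the procedure, and fit_exponential refers to the hypothetical helper sketched above.

```python
import numpy as np

def adjusted_r2(x, y):
    """Adjusted R^2 of the simple regression y = beta*x + alpha."""
    beta, alpha = np.polyfit(x, y, deg=1)
    resid = y - (beta * x + alpha)
    r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)
    n, k = len(x), 1
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def drop_top_quantiles(income, cum_share, max_drop=20):
    """Number of top-income quantiles to remove so that the adjusted R^2
    of the regression of ln P(t >= x) on x (Eq. (4)) is maximized."""
    x = np.asarray(income, dtype=float)
    y = np.log(np.asarray(cum_share, dtype=float))
    best_drop, best_score = 0, -np.inf
    for drop in range(max_drop + 1):
        xs = x if drop == 0 else x[:-drop]
        ys = y if drop == 0 else y[:-drop]
        if len(xs) < 3:
            break
        score = adjusted_r2(xs, ys)
        if score > best_score:
            best_drop, best_score = drop, score
    return best_drop
```

Applying fit_exponential to the sample with the top best_drop quantiles removed then yields \(\mu \) and \(\theta \) values of the kind reported in Table S1.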

Fig. 1

Exponential fits on full and truncated income data for the United Kingdom. The vertical axis displays the cumulative percentage of population on a logarithmic scale. The horizontal axis shows annual income rescaled by dividing by the \(\uptheta \)-value of each year. The data and fits for each year are shifted vertically for clarity; each line of fit intersects the vertical axis at 100% of the population. The fitting parameters and auxiliary information regarding the fits are given in Table S1. HMRC SPI stands for Her Majesty’s Revenue and Customs, Survey of Personal Incomes

Fig. 2

Exponential fits on truncated income data for the European Union and neighboring countries in 2014. The vertical axis shows the cumulative percentage of population on a logarithmic scale. The horizontal axis shows income rescaled by dividing by the corresponding \(\uptheta \)-value of each country. The data and fits for each country are shifted vertically for clarity; each line of fit intersects the vertical axis at 100% cumulative percentage of population. The fitting parameters and auxiliary information are given in Table S2. EU-SILC stands for European Union Statistics on Income and Living Conditions

Here we point out that the method of removing the high income class using the maximized \(R^{2}\) can be regarded as a filtering procedure. According to our model, the exponential function fits the middle range of the income distribution, so it is necessary to filter out the data at the high and low ends of the distribution to reveal the exponential pattern. Filtering is inevitable in any data analysis that extracts a signal from noisy or mixed data, so the question is not an absolute one of data integrity, but a practical one of whether the filtering procedure is reasonable. We believe that our procedure is reasonably reliable and convincing, because it converges after removal of quite a small fraction of the data. Later, we will observe that the estimated value of \(\mu \) produced by the maximized-\(R^{2}\) filtering procedure indeed agrees with the empirical data. Nevertheless, we have not yet verified that the estimate of \(\mu \) produced by the filtering procedure is consistent. In fact, because we only collect sample data on household income, we must prove that the estimated value of \(\mu \) approaches the true value sufficiently closely when the sample size is large enough; otherwise, we cannot guarantee that our estimate of \(\mu \) is consistent. In the next section, we show that the estimate of \(\mu \) produced by the filtering procedure is indeed consistent.

Fig. 3

Exponential fits on truncated income distribution data for various countries over various years. The vertical axis displays the cumulative percentage of population on a logarithmic scale. The horizontal axis shows income rescaled by dividing by the corresponding \(\uptheta \)-value of each country. The data and fits for each country are shifted vertically for clarity; each line of fit intersects the vertical axis at 100% cumulative percentage of population. The fitting parameters and auxiliary information regarding the fits are given in Table S3

4 Consistent estimate of \(\mu \)

In Sect. 3, we have shown that the exponential distribution (3) fits the low and middle parts of the household income data from 67 countries remarkably well. The remaining question is whether the fitting procedure produces a consistent estimate of \(\mu \). For the full data (i.e., the population), Eq. (4) can be written as:

$$\begin{aligned} y_j =\beta ^{*}x_j +\alpha ^{*}+\varepsilon _j , \end{aligned}$$
(5)

where \(\beta ^{*}=-\frac{1}{\theta ^{*}}\), \(\alpha ^{*}=\frac{\mu ^{*}}{\theta ^{*}}\), and \(\varepsilon _j \sim N\left( {0,\sigma ^{2}} \right) \) for \(j=1,2,\ldots ,\infty \). Here \(\left\{ {x_j } \right\} _{j=1}^\infty \) and \(\left\{ {y_j } \right\} _{j=1}^\infty \) denote the full data. \(\beta ^{*}\) and \(\alpha ^{*}\) are obtained by regressing \(\left\{ {y_j } \right\} _{j=1}^\infty \) on \(\left\{ {x_j } \right\} _{j=1}^\infty \).

It must be noted that, due to the constraint \(x\ge \mu \), Eq. (3) differs slightly from Eq. (5). Therefore, we cannot be sure that \(\mu ^{*}=\mu \). In fact, Eq. (3) implies that \(\left\{ {x_j } \right\} _{j=1}^\infty \) should be a strictly monotonic increasing sequence with \(x_j \ge 0\) for \(j=1,2,\ldots ,\infty \). More importantly, it indicates that there exists a positive integer \(g^{*}\) such that \(x_k \ge \mu \) for \(k=g^{*},g^{*}+1,\ldots ,\infty \). This means that, for the full data, Eq. (3) should be written as:

$$\begin{aligned} \left\{ {{\begin{array}{l} y_k =\beta x_k +\alpha +\varepsilon _k \\ x_k \ge \mu \\ \end{array} }} \right. , \end{aligned}$$
(6)

where \(\beta =-\frac{1}{\theta }\), \(\alpha =\frac{\mu }{\theta }\), and \(\varepsilon _k \sim N\left( {0,\sigma ^{2}} \right) \) for \(k=g^{*},g^{*}+1,\ldots ,\infty \). Here \(\beta \) and \(\alpha \) are obtained by regressing \(\left\{ {y_j } \right\} _{j=g^{*}}^\infty \) on \(\left\{ {x_j } \right\} _{j=g^{*}}^\infty \).

By Lemma 4 in “Appendix E”, we have proved that if \(g^{*}<\infty \), then \(\beta =\beta ^{*}\) and \(\alpha =\alpha ^{*}\). Therefore, Eq. (6) can be rewritten in the form:

$$\begin{aligned} \left\{ {{\begin{array}{l} y_k =\beta ^{*}x_k +\alpha ^{*}+\varepsilon _k \\ x_k \ge \mu ^{*} \\ \end{array} }} \right. , \end{aligned}$$
(7)

where \(k=g^{*},g^{*}+1,\ldots ,\infty \) and \(g^{*}<\infty \).

Obviously, our purpose is to find \(\mu \). Eq. (7) indicates that if one could collect the full data \(\left\{ {x_j } \right\} _{j=1}^\infty \) and \(\left\{ {y_j } \right\} _{j=1}^\infty \), then \(\mu \) could be obtained by computing \(\mu ^{*}\). Unfortunately, nobody can collect the full data, so Eq. (7) cannot be obtained in practice. However, based on the sample data \(\left\{ {x_l } \right\} _{l=1}^n \) and \(\left\{ {y_l } \right\} _{l=1}^n \), we can consider the following statistical estimation equation:

$$\begin{aligned} \left\{ \begin{array}{l} \hat{y} _i =\hat{\beta } _g x_i +\hat{\alpha } _g \\ x_i \ge \hat{\mu } _g \\ \end{array} \right. , \end{aligned}$$
(8)

where \(i=g,g+1,\ldots ,n\) and n denotes the sample size. It is worth emphasizing that \(g=g\left( n \right) \) is yet to be determined.

Here

$$\begin{aligned} \hat{\beta }_g= & {} \frac{\mathop \sum \nolimits _{i=g}^n \left( {x_i -\bar{x} _g } \right) \left( {y_i -\bar{y} _g } \right) }{\mathop \sum \nolimits _{i=g}^n \left( {x_i -\bar{x} _g } \right) ^{2}}, \end{aligned}$$
(9)
$$\begin{aligned} \hat{\alpha } _g= & {} \bar{y} _g -\hat{\beta } _g \bar{x} _g , \end{aligned}$$
(10)
$$\begin{aligned} \hat{\mu } _g= & {} -\frac{\hat{\alpha } _g }{\hat{\beta } _g }, \end{aligned}$$
(11)
$$\begin{aligned} \bar{x} _g= & {} \frac{1}{n-g+1}\mathop \sum \nolimits _{i=g}^n x_i , \end{aligned}$$
(12)
$$\begin{aligned} \bar{y} _g= & {} \frac{1}{n-g+1}\mathop \sum \nolimits _{i=g}^n y_i . \end{aligned}$$
(13)

Due to the absence of the full data, we cannot obtain \(\mu \) directly. However, we hope that \(\hat{\mu } _g \rightarrow \mu \) as \(n\rightarrow \infty \). In “Appendix E”, we have proved the following proposition:

Proposition 3

For a strictly monotonic increasing sequence \(\left\{ {x_j } \right\} _{j=1}^n \), if there exists an integer \(g=g\left( n\right) \) such that:

(A) \(x_{i-1}<\mu <x_i \) or \(x_i =\mu \), where \(i=g<n\) and \(lim_{n\rightarrow \infty } \frac{g}{n}=0\);

(B) \(\frac{\bar{y} _g }{\hat{\beta } _g }>\delta >0\) for any n;

then one has:

$$\begin{aligned} lim_{n\rightarrow \infty } \hat{\mu } _g =lim_{n\rightarrow \infty } \left( {\bar{x} _g -\frac{\bar{y} _g }{\hat{\beta } _g }} \right) =\mu , \end{aligned}$$
(14)

where g is uniquely determined by n and \(g<\infty \). This means:

$$\begin{aligned} lim_{n\rightarrow \infty } g=g^{*}. \end{aligned}$$
(15)

Proof

See “Appendix E”. \(\square \)
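As a quick numerical check of Proposition 3 (our own illustration, not part of the proof in “Appendix E”; the parameter values and noise level are hypothetical), the following sketch generates data from Eq. (6) with known \(\mu \) and \(\theta \), applies the estimator \(\hat{\mu } _g =\bar{x} _g -\bar{y} _g /\hat{\beta } _g \) on the truncated sample, and shows that it approaches \(\mu \) as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true, theta_true, sigma = 2.0, 5.0, 0.02   # hypothetical parameters

for n in (50, 500, 5000):
    # Strictly increasing income grid; keep only x_k >= mu, as in Eq. (6).
    x = np.linspace(0.0, 50.0, n)
    g = int(np.searchsorted(x, mu_true))       # first index with x_k >= mu
    xk = x[g:]
    yk = -(xk - mu_true) / theta_true + rng.normal(0.0, sigma, size=xk.size)

    # Estimator of Proposition 3: mu_hat = x_bar - y_bar / beta_hat.
    beta_hat = np.sum((xk - xk.mean()) * (yk - yk.mean())) / np.sum((xk - xk.mean())**2)
    mu_hat = xk.mean() - yk.mean() / beta_hat
    print(f"n = {n:5d}   mu_hat = {mu_hat:.4f}   (true mu = {mu_true})")
```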

Proposition 3 indicates that \(\hat{\mu } _g \) is a consistent estimate if (A) and (B) hold. That is to say, if the sample size n is large enough, we expect \(\hat{\mu } _g \) to be extremely close to \(\mu \). Because nobody can obtain \(\mu \) itself, our goal becomes finding a value close to \(\mu \); Proposition 3 implies that the estimate \(\hat{\mu } _g \) provides such a value. Next we show that condition (B) can be related to the correlation coefficient between \(\left\{ {x_i } \right\} _{i=g}^n \) and \(\left\{ {y_i } \right\} _{i=g}^n \).

Lemma 5

If \(y_i <0\) for \(i=1,\ldots ,n\), and if \(r_g <0\) for any n, then one has \(\frac{\bar{y} _g }{\hat{\beta } _g }>0\) for any n, where \(r_g =\frac{\mathop \sum \nolimits _{i=g}^n ( {x_i -\bar{x} _g } )( {y_i -\bar{y} _g } )}{\sqrt{\mathop \sum \nolimits _{i=g}^n ( {x_i -\bar{x} _g } )^{2}\cdot \mathop \sum \nolimits _{i=g}^n ( {y_i -\bar{y} _g } )^{2}}}\) denotes the correlation coefficient between \(\{ {x_i } \}_{i=g}^n \) and \(\{ {y_i } \}_{i=g}^n \).

Proof

By Eq. (9) we have:

$$\begin{aligned} r_g =\hat{\beta }_g \cdot \sqrt{\frac{\mathop \sum \nolimits _{i=g}^n \left( {x_i -\bar{x} _g } \right) ^{2}}{\mathop \sum \nolimits _{i=g}^n \left( {y_i -\bar{y} _g } \right) ^{2}}}. \end{aligned}$$
(16)

Thus, if \(r_g <0\) for any n, one concludes that \(\hat{\beta } _g <0\) for any n, where we have used Assumptions (b) and (c) in “Appendix E”. Since \(y_i <0\) implies \(\bar{y} _g <0\), we conclude that \(\frac{\bar{y} _g }{\hat{\beta } _g }>0\) for any n (see Footnote 3). \(\square \)

By using Lemma 5, Proposition 3 leads to the following corollary.

Corollary 1

For a strictly monotonic increasing sequence \(\left\{ {x_j } \right\} _{j=1}^n \), if \(y_j <0\) for \(j=1,\ldots ,n\), and if there exists an integer \(g=g\left( n \right) \) such that:

(C) \(x_{i-1}<\mu <x_i \) or \(x_i =\mu \), where \(i=g<n\) and \(lim_{n\rightarrow \infty } \frac{g}{n}=0\);

(D) \(r_g<\gamma <0\) for any n;

then one has:

$$\begin{aligned} lim_{n\rightarrow \infty } \hat{\mu } _g =\mu , \end{aligned}$$

where g is uniquely determined by n and \(g<\infty \). This means:

$$\begin{aligned} lim_{n\rightarrow \infty } g=g^{*}. \end{aligned}$$

Proof

Using Proposition 3 and Lemma 5 we complete this proof. \(\square \)

Obviously, Eq. (3) implies \(y_i <0\) for \(i=g,g+1,\ldots ,n\). Therefore, we can employ Corollary 1 to seek \(\hat{\mu } _g \). The steps are as follows:

First, we seek the minimal l satisfying \(r_l <0\) for \(\left\{ {x_i} \right\} _{i=l}^n \) and \(\left\{ {y_i } \right\} _{i=l}^n \). Second, we regress \(\left\{ {y_i } \right\} _{i=l}^n \) on \(\left\{ {x_i } \right\} _{i=l}^n \) to obtain h and \(\hat{\mu } _h \). Third, we test \(r_h \): if \(r_h <0\) holds, we conclude that the regression result \(\hat{\mu } _h =\hat{\mu } _g \) is a valid estimate; if \(r_h \ge 0\), we use \(\left\{ {x_i } \right\} _{i=h}^n \) and \(\left\{ {y_i } \right\} _{i=h}^n \) to repeat steps 1–3. The computation should terminate in a finite number of steps; otherwise, \(\left\{ {x_i } \right\} _{i=1}^n \) and \(\left\{ {y_i } \right\} _{i=1}^n \) do not fit Eq. (3). A code sketch of this search is given below.
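The three-step search can be written as a short routine. This is a sketch under our reading of the steps above (in particular, h is taken to be the index of the first quantile with \(x_i \ge \hat{\mu } \)); the function names and the data format are our own assumptions.

```python
import numpy as np

def corr_and_mu(x, y):
    """Correlation coefficient r and estimate mu_hat = -alpha/beta
    from the OLS regression of y on x over a subsample."""
    beta, alpha = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return r, -alpha / beta

def estimate_mu(x, y, max_iter=20):
    """Sketch of the three-step search based on Corollary 1.

    x, y -- full sample, x strictly increasing, y = ln P(t >= x).
    Returns mu_hat_g, or None if the sample does not fit Eq. (3).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    start = 0
    for _ in range(max_iter):
        # Step 1: smallest l (from 'start') with negative correlation r_l.
        l = next((i for i in range(start, len(x) - 2)
                  if np.corrcoef(x[i:], y[i:])[0, 1] < 0), None)
        if l is None:
            return None
        # Step 2: regress over indices l..n to get mu_hat, then take h as
        # the first index with x_i >= mu_hat and re-estimate mu_hat_h.
        _, mu_hat = corr_and_mu(x[l:], y[l:])
        h = int(np.searchsorted(x, mu_hat))
        r_h, mu_hat_h = corr_and_mu(x[h:], y[h:])
        # Step 3: accept if r_h < 0; otherwise repeat from index h.
        if r_h < 0:
            return mu_hat_h
        start = h
    return None
```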

Table 1 Correlation coefficients and the estimate \(\hat{\mu } _g \) for the United Kingdom

It is easy to check that the filtering procedure in Sect. 3 is in accordance with the three steps above, provided that \(r_g <0\) holds. For simplicity, we only list the correlation coefficients \(r_g \) for the United Kingdom in Table 1; they are all negative. Readers can check the other countries, which also exhibit negative correlation coefficients (see Figs. 2, 3). Therefore, we believe that the estimated values of \(\mu \) reported in Tables S1-S3 are convincing. It is worth mentioning that the assumption \(\varepsilon _j \sim N\left( {0,\sigma ^{2}} \right) \) in Eq. (6) holds only if the high-income samples are adequately removed; because the high-income samples obey a power law, they introduce systematic errors under which \(\varepsilon _j \sim N\left( {0,\sigma ^{2}} \right) \) breaks down. In Sect. 3, we removed the high-income samples (the source of systematic errors) based on the rule of maximized \(R^{2}\) to get the estimate \(\mu _R \). However, the rule of maximized \(R^{2}\) is not the only possible method. In fact, Fig. 1 implies that, for the United Kingdom, we may remove only three high-income quantiles to get \(\hat{\mu } _g \). Remarkably, Proposition 3 implies that \(\hat{\mu } _g \) should be close to \(\mu _R \) if the sample size is large enough. Among our data, the United Kingdom has the most quantiles, and hence the largest sample size (approximately 100). Therefore, the United Kingdom data are best suited for comparing the estimates \(\hat{\mu } _g\) and \(\mu _R\). The results are listed in Table 1, where readers can check that the differences are only of the order of 0.01.

5 Discussion

The empirical results above imply that the exponential income law holds widely in countries all over the world. Because we have investigated 67 countries from different regions, the validity of the exponential income law appears to be robust. Compared to the log-normal and gamma distributions, which have two or more fitting parameters, the exponential law essentially has only one fitting parameter, \(1/\theta \), and produces a more parsimonious fit of the data. More importantly, our exponential law (3) is compatible with the standard model of modern economics (namely the ADGEM); therefore, the fitting parameters \(\mu \) and \(\theta \) have explicit economic meanings. In fact, \(\mu \) denotes the marginal labor-capital return, and it is proportional to the minimum wage (Tao 2017). Concretely, we obtain (Tao 2017):

$$\begin{aligned} \mu =\sigma \cdot \omega -\sigma \cdot r\cdot MRTS_{LK} , \end{aligned}$$
(17)

where \(\sigma \) denotes the marginal employment level, \(\omega \) denotes the minimum wage, r denotes the interest rate, and \(MRTS_{LK} \) denotes the marginal rate of technical substitution between labor and capital. A brief derivation of Eq. (17) can be found in “Appendix D”.

The marginal employment level \(\sigma \) stands for the increase in employment once a firm enters the market (Tao 2017); therefore, it is easy to see that \(\sigma \ge 0\). Thus, Eq. (17) implies that the marginal labor-capital return \(\mu \) is theoretically proportional to the minimum wage \(\omega \). Obviously, the minimum wage \(\omega \), like unemployment compensation, can be regarded as the critical income level at which workers are willing to enter or exit the market. Therefore, we may proxy \(\omega \) by the unemployment compensation.
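For illustration only, with hypothetical values \(\sigma =0.3\), \(\omega =12{,}000\) (in local currency units), \(r=0.05\) and \(MRTS_{LK} =2000\), Eq. (17) gives

$$\begin{aligned} \mu =\sigma \cdot \omega -\sigma \cdot r\cdot MRTS_{LK} =0.3\times 12{,}000-0.3\times 0.05\times 2000=3600-30=3570, \end{aligned}$$

so that, holding \(\sigma \), r and \(MRTS_{LK} \) fixed, a higher minimum wage \(\omega \) translates into a proportionally higher marginal labor-capital return \(\mu \).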

Fig. 4

Statistical fit of the relationship between the marginal labor-capital return (MLCR) and unemployment compensation (UC). The vertical axis displays MLCR, the horizontal axis shows UC. Cross-section datasets come from 26 European countries in the years 2011, 2012, 2013 and 2014. The local currency units (LCU) of some countries are not the euro (EUR), so annual average exchange rates, collected from Eurostat, are used to convert the UC values from LCU to EUR. The slope coefficients equal 0.290, 0.315, 0.331, and 0.320 in the four successive years, and the Pearson correlation coefficients between the two variables are 0.864, 0.904, 0.899 and 0.880, respectively; see Table S4 for details. Both the slope parameters and the Pearson correlation coefficients are highly significant

To test the relationship between \(\mu \) and \(\omega \), we collected unemployment compensation data for 26 European countries for the years 2011 to 2014. Using the computed values of \(\mu \) for the European countries from Table S2, we can directly test whether there is a positive relationship between the marginal labor-capital return and unemployment compensation by OLS regression. The empirical results are shown in Fig. 4 and Table S4 (see Supplementary Material). From these results we find that the marginal labor-capital return \(\mu \) (MLCR in Fig. 4) is strongly positively correlated with the unemployment compensation (UC in Fig. 4), with Pearson correlation coefficients of 0.864, 0.904, 0.899 and 0.880 (from 2011 to 2014). Remarkably, the correlation coefficients are highly significant, with p values \(<0.001\) for all four years, as shown in Table S4. It is worth mentioning that Eq. (17) implies that \(\mu \) decreases with r if \(MRTS_{LK} >0\) (see Footnote 4) (Tao 2017). Recently, Tao (2017) has collected real data on the interest rate r to run cross-section regressions between \(\mu \), \(\omega \) and r. Tao’s empirical results show that the marginal labor-capital return \(\mu \) is indeed negatively related to the interest rate r (Tao 2017).

Given the robust results of our study, some significant policy recommendations can be made: by moderately increasing the level of unemployment compensation, the income inequality originating from the low and middle income classes may be reduced, because the Gini coefficient of the exponential distribution equals \(G=1/\left[ {2\left( {1+\mu /\theta } \right) } \right] \) (see the detailed derivation in Tao et al. 2017). To maintain efficiency and fairness in competitive markets, we propose that unemployment compensation be financed by levying a tax on the high income class. This is because, unlike the high income class, the low and middle income class evolves toward a competitive equilibrium combining efficiency and Rawls’ fairness; a traditional tax policy that artificially changes the income structure of the low and middle income class may harm market efficiency and fairness.
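As a back-of-the-envelope illustration of this policy channel (our own sketch; the parameter values are hypothetical), the Gini coefficient \(G=1/\left[ {2\left( {1+\mu /\theta } \right) } \right] \) falls as \(\mu \) rises with \(\theta \) held fixed:

```python
def gini_exponential(mu, theta):
    """Gini coefficient of the exponential income distribution (2),
    G = 1 / [2 * (1 + mu/theta)]  (Tao et al. 2017)."""
    return 1.0 / (2.0 * (1.0 + mu / theta))

# Hypothetical values: raising mu (e.g. via higher unemployment
# compensation) lowers the Gini coefficient of the low and middle class.
theta = 20000.0
for mu in (0.0, 5000.0, 10000.0):
    print(f"mu = {mu:8.0f}   G = {gini_exponential(mu, theta):.3f}")
# mu =        0   G = 0.500
# mu =     5000   G = 0.400
# mu =    10000   G = 0.333
```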

6 Conclusion

We have shown that the standard Arrow–Debreu general equilibrium model combined with Rawls’ fairness principle naturally produces the exponential distribution of income, which agrees well with the empirical data for 67 countries around the world. These results provide a solid justification for the exponential income distribution within the mainstream economic framework. Furthermore, our findings may have broader socio-economic implications, because the exponential income law is, effectively, a result of natural selection of the likeliest (Whitfield 2007; Tao 2016), i.e. the most probable, distribution. The Arrow–Debreu general equilibrium model describes an ideal institutional environment (analogous to an ecological environment), which permits different income structures. Relative to other structures, the exponential income distribution occurs with the highest probability, and so it represents the survival of the likeliest structure, also called “Spontaneous Order” (Tao 2016). These results are relevant for evolutionary economics (Mackmurdo 1940; Nelson and Winter 1982; Potts 2001; Hodgson 2004; Dopfer 2004; Foster and Metcalfe 2012), which is concerned with the direction of social evolution. The exponential distribution (2) is obtained by maximization of the entropy \(ln\varOmega \) (see “Appendix D”), which indicates the direction of evolution. According to neoclassical economics, the entropy in our model is interpreted as technological progress (Tao 2016), as discussed in “Appendix D”, so higher technological progress is the likeliest direction of social evolution: among all possible social systems, those whose technological level happens to be the highest will be “selected” as survivors. In other words, social systems with lower technological levels are more likely to be eliminated in the process of social evolution. These insights seem to be in accordance with historical facts.