1 Introduction

Empirical studies in the literature have found that national leaders play an important role in economic growth. For example, Glaeser et al. (2004) find that poor countries in the 1960s, mostly governed by dictators, got out of poverty through good policies before they improved their political institutions. Using the sudden death of a leader as an exogenous shock, Jones and Olken (2005) find that the change of leaders has a significant impact on a country’s economic growth. These findings can be contrasted with the thesis that institutions are the more fundamental cause of economic growth (e.g., North and Thomas 1973; Acemoglu et al. 2005). This paper extends the above literature to subnational leaders using data collected on 312 Chinese cities for the period 1994–2010. The advantage of studying subnational leaders is that these leaders face the same national institutional setup, so their role can be isolated from the role of national institutions. As a result, such a study can provide more decisive evidence on the question of whether leaders matter for economic growth.

In most countries, however, subnational leaders do not serve in more than one locality, making comparisons among them impossible because their observed performances are the results of a combination of their own abilities and local conditions. In this regard, China provides a unique opportunity for the study of subnational leaders. In the country, local leaders are engaged in a tournament in which they compete with each other for promotion (Li and Zhou 2005; Xu 2011). In the process, a large number of them are periodically shuffled between localities; those movers, or leaders who are shuffled, connect leaders serving in different localities in such a way that we can compare all the leaders in the connected cities, regardless of when they served there. Tracing the leaders who served in more than one city, we construct connected subsamples of cities that had leaders moving between them. With the largest of those subsamples, we are able to adopt the decomposition method developed by labor economists for linked employer–employee data (Abowd et al. 1999; Bertrand and Schoar 2003) to estimate leaders’ relative contributions to local economic growth (leader effects) and to compare them across cities and over time. This approach is an improvement on the Jones and Olken test, which only accounts for within-locality variations.

One potential problem of our decomposition exercise is that the leader effects thus measured may only pick up the heteroskedastic shocks that the cities received during leaders’ tenures. To rule out this possibility, we conduct several robustness checks including a study of the residuals first applied by Bertrand and Schoar (2003), a placebo test that perturbs leaders’ tenures within a city, and a city-specific AR(1) process.

In addition to studying local leaders’ contribution to economic growth, we explore the link between the estimated leader effects and leaders’ chances of getting promoted. We keep in mind two aims for doing this. The first is to check the consistency of our first-step results. To the extent that the Chinese hierarchy—one that is modeled on the Soviet nomenklatura—promotes local leaders based on their personal abilities (Xu 2011), we should find a positive correlation between leader effects and leaders’ chances of promotion. We can then be more confident that our estimates of the leader effects reflect leaders’ true abilities if these effects have good predictive power for leaders’ chances of promotion.

However, studying promotion of leaders introduces a complication to our econometric strategy. To the extent that leaders’ abilities affect both local economic growth and their chances of promotion, estimating their leader effects and promotion separately would suffer from the problem of simultaneity biases. Inspired by Abowd et al. (2006) and Buchinsky et al. (2010), we then take a system-of-equations approach to estimate the leader effects and their impacts on promotion together.

Our second aim in studying promotion outcomes is to improve on the existing literature on the Chinese promotion tournament. We make two methodological contributions in this regard. First, the existing studies use the average growth rate over a leader’s tenure to predict his promotion. However, the growth rate may not entirely reflect a leader’s own capabilities, which presumably are the factor the government’s organizational department looks at. We are able to infer leader effects from local economic growth rates and use leader effects to predict leaders’ promotion; therefore, we provide a more informative test for the relationship between leader performance and promotion. Second, the existing studies all take a single-equation approach and thus can suffer from simultaneity biases. Our system-of-equations approach can address the problem.

In addition to the two methodological contributions, our data enable us to provide more reliable estimates than those found in the current literature. The existing studies (e.g., Li and Zhou 2005; Xu et al. 2007; Wang et al. 2009) have all studied provincial leaders. One problem with this approach is that the number of observations is limited. In addition, political loyalty can become more important for promotion, and performance less important, as one moves up the hierarchy (Landry et al. 2015). It is found that promotion of provincial leaders to the central level is often influenced by political factors (Opper and Brehm 2007; Shih et al. 2012).

The rest of the paper is organized as follows: Section 2 gives a description of the sources and structure of the data. Section 3 studies leaders’ contribution to local economic growth. We first lay out the baseline econometric specification linking local economic growth rates with the leader effects and describe the problem of indeterminacy implied by the specification. Then we show how a connected sample allows us to overcome the problem. Lastly, we present the empirical results and several robustness tests dealing with the potential heteroskedasticity in the data. Section 4 studies how leader effects influence leaders’ promotion. We first adopt a single-equation approach, and then we estimate leader effects and leaders’ promotion in a system of equations. Section 5 concludes the paper.

2 The data

China has a highly decentralized fiscal system despite its one-party political system (Che et al. 2005; Xu 2011).Footnote 1 There are five levels of government in the country: central, provincial, municipal, county or district, and township. Each level of government has its own independent budgets and independent or shared revenue sources. At each level, two officials assume the highest offices. One is the secretary of the local Communist Party Committee, and the other is the head of the executive branch (e.g., governor at the provincial level and mayor at the city level). In theory, the party secretary is elected by the local Party Congress and the executive officer is elected by the local People’s Congress, the legislative body; in reality, most of them are appointed by the Organizational Department of the Communist Party in the government one level higher. As described in Li and Zhou (2005) and Xu (2011), the process of moving up in the government and party hierarchy can be best characterized by a tournament. That is, lower-level (e.g., county) government officials compete with each other, and those picked by the Organizational Department enter another round of elimination game at a higher-level (e.g., city) until reaching the highest level, the Standing Committee (in the latest round, consisting of seven people) of the Politburo in the Party’s Central Committee. As such, the competition becomes more and more intense when one moves up the hierarchy.

Whereas political connections are an important determining factor in the promotion tournament, personal abilities are equally important, especially at the lower levels (Landry et al. 2015). Although the central government has been trying to include more goals in the criteria of evaluation, economic growth is the dominant goal for most officials. From the perspective of the multitasking theory, this is hardly surprising because economic growth is much easier to measure than other goals. The literature has documented both systematic and anecdotal evidence for the importance of economic growth in the promotion tournament; some authors (e.g., Xu 2011) attribute China’s high economic growth rates to competition among local leaders.Footnote 2 Conversely, higher growth rates of the local economy serve as a good predictor for a leader’s chances of promotion (Li and Zhou 2005). Presumably, the Organizational Department is using the record of economic growth in a leader’s tenure to gauge his or her personal abilities.

In this paper, we study the party secretaries and mayors in the cities. There are four kinds of cities: the four provincial cities (Beijing, Shanghai, Tianjin, and Chongqing), sub-provincial cities (provincial capitals and one or two other big cities in the province), prefectural cities, and county-level cities. The four provincial cities are clearly outliers as they are actually a special form of provincial unit, and the number of county cities is too large to collect complete data. Hence, we study sub-provincial and prefectural cities in this paper, as they are more alike in terms of the rank of government officials. As of 2012, there were 333 such cities in the country.

By law, the mayor is the executive officer of the municipal government; at the same time, the law says that the mayor is under the guidance of the city Communist Party Committee for which the party secretary is the head. In most cities, the party secretary is clearly the most influential figure because important decisions are made in the party committee. However, the secretary’s power is checked by the mayor because, in theory, executive orders should be delivered through the mayor. In the end, the party secretary and the mayor share power in a city. A division of labor that emerges is that the party secretary is in charge of the personnel and other political duties such as maintaining social stability while the mayor is in charge of the daily operation of the government for which economic growth is a top priority. To the extent that the mayor has to rely on the bureaucracy to manage the economy, the mayor’s contribution to local economic growth is tied to the party secretary’s efforts to select more capable subordinates. In reality, the interaction between the party secretary and the mayor takes many forms and the pattern of their contributions to local economic growth cannot be readily parameterized. In our empirical study, we will take a simple approach by treating the two as making separate contributions to local economic growth.Footnote 3

The period covered by our study is 1994–2010. This period was chosen primarily because of the availability of data. It is difficult to get data on city leaders before 1994, and city-level macroeconomic and demographic data beyond 2010 were not made public when this paper was written. Also, the year 1994 was chosen as the beginning year because China started a new revenue-sharing system that year. Before then, the central government shared revenue with provincial governments based on negotiation; since that year, revenue has been shared under preset rules, similar to the federal system adopted in the United States.

Information on the party secretaries and mayors was collected from The China Yearbook of Municipalities, provincial yearbooks and reports from the media, especially the Internet. We then match the leaders to annual macroeconomic data collected from provincial yearbooks by the following rules:Footnote 4

  1. Each city-year observation is matched with one secretary and one mayor.

  2. If there was turnover within a year, we take the leader who stayed for more than 6 months in that year.

  3. If there were multiple turnovers in a year and no leader stayed for more than 6 months, we take the leader with the longest stay in that year.
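The matching rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_leader`, the `(leader_name, months_in_office)` input format, and the tie-breaking behavior (the first listed leader wins a tie) are hypothetical and are not taken from the paper.

```python
def assign_leader(spells):
    """Pick one leader for a single city-year-position cell.

    `spells` is a list of (leader_name, months_in_office) tuples for
    that year. Both fields are hypothetical inputs for illustration.
    """
    if not spells:
        raise ValueError("at least one leader spell is required")
    # Rule 2: a leader who stayed more than 6 months in the year is chosen.
    for name, months in spells:
        if months > 6:
            return name
    # Rule 3: otherwise take the leader with the longest stay.
    # Ties are broken by list order, which is an assumption made here.
    return max(spells, key=lambda s: s[1])[0]


# One turnover in the year: the leader with 8 months is kept.
one_turnover = assign_leader([("A", 8), ("B", 4)])
# Multiple turnovers, nobody above 6 months: the longest stay is kept.
many_turnovers = assign_leader([("A", 3), ("B", 5), ("C", 4)])
```

Applying the function once for the secretary and once for the mayor of each city-year would yield the one-secretary, one-mayor match required by rule 1.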

Because of the limitation of data sources, we were able to collect an unbalanced panel of 2138 leaders in 312 cities with the starting year varying between 1994 and 1998.Footnote 5 For information on promotion, however, the starting year varies from 1998 to 2001, depending on data availability. Subsequently, we will call the whole sample “the long sample” and the sample with information on promotion “the short sample.” Table A1 in the appendix lists the names of cities in our dataset as well as the starting years of detailed information; Fig. 1 then plots them on a map of China. The shaded cities are our sample cities, and the heavily shaded cities are those in our largest connected sample.Footnote 6

Fig. 1 Distribution of sample cities

Among all 2138 leaders in the long sample, 1817 served in only one city, 282 had one switch, 34 had two switches, and the remaining 5 had three switches (Table A2). We call those who served in more than one city in our sample period “the movers.” The total number of movers was 321, or about 15 % of the total number of leaders. Figure 2 shows the distribution of all leaders’ tenures in one city. The average was 3.74 years, lower than the designated tenure of 5 years, also lower than the average tenure of the provincial leaders during the period 1978–2005 which was almost 4 years (Wang and Xu 2008). The median tenure was even shorter, only 3 years.Footnote 7

Fig. 2 Distribution of leader tenures

In a dataset like ours, attrition is unavoidable. Table A3 presents the distribution of the number of years a leader appears in our sample. Half of the leaders appear for fewer than three years while only one-quarter appear for more than five years. There are generally three ways for a leader to leave our sample: being promoted to the provincial or central government, being moved to a city not covered by our sample, and retiring. Following Li and Zhou (2005), we define promotion in the following ways:

  • From a mayor to a party secretary in any city.

  • From an ordinary city to a mayor or party secretary in a sub-provincial city.

  • From any city to a post in the central government or to the provincial government as party secretary, governor, vice secretary, or vice governor.

  • From an ordinary city to the head of a department in the provincial government.

As in Li and Zhou (2005), we treat being moved to the city or provincial legislative bodies (People’s Congress and People’s Political Consultation Conference) as retirement in addition to regular retirement because moves of this form often signal a loss of power and influence within the Chinese system.

Using the sample along with available promotion information in the short sample, we can get a sense of the distribution of attrition. Among the 2378 leader-term pairs, 1326 of them (55.8 %) ended with promotion while 369 (15.5 %) ended with retirement. Among the 2138 leaders in the short sample, 1562 (73.1 %) left our sample before 2010. Whereas the determination of each of the three ways of attrition is not likely to be random, what is pertinent to our study is whether the group of leaders who leave the sample as a whole is systematically different from the group staying in our sample. If it is not, then attrition can be treated as data missing at random from the existing sample.

3 Contribution of leaders to economic growth

3.1 The growth equation

Our main purpose is to compare leaders by their personal abilities. Obviously there is no direct way for us to do that, especially when we do not have information about their education levels and career backgrounds for most of the time. The method we adopt is the fixed-effect method Bertrand and Schoar (2003) use to measure the contributions of managers in different companies. To be sure, what we will get may not reflect leaders’ true abilities, and even when the abilities are correctly measured, they only reflect leaders’ abilities to promote economic growth. This includes both leaders’ native abilities and the efforts they are induced to spend in response to promotion incentives. Because of this, we cannot distinguish between efforts that attend to the long-term welfare of the society and efforts that are only geared toward generating short-term growth. It is noteworthy that the second kind of effort may create distortions in the economy. In the Chinese context, over-investment in physical capital and related structural imbalances could be one of the consequences (Chen and Yao 2011). Our study does not examine the mechanisms and consequences of growth. Nor are we able to distinguish “good” growth that improves long-term social welfare from “bad” growth that drives up short-term GDP figures at the expense of long-term social welfare.

With those caveats in mind, we use “leader effects”, instead of “personal abilities”, to describe what we aim at obtaining. They measure leaders’ abilities to increase annual growth rates of the local economy. To begin with, we note that the economic growth rate of a city in a particular year is related to four unobserved factors in addition to observed covariates, namely, (a) the year fixed effect, (b) the city fixed effect, (c) the party secretary’s leader effect, and (d) the mayor’s leader effect. The year fixed effect is orthogonal to the other three effects and can be identified by including year dummies in a panel regression with the growth rate as the dependent variable. Conversely, the city fixed effect and the two leader effects share the same dimension of data and cannot be readily identified. Our aim is to find a strategy to disentangle these three effects. For that, we start with a discussion of the relationship between the party secretary’s and the mayor’s leader effects.

As we pointed out in the previous section, the party secretary and the mayor share power in a city. With no prior knowledge on how they interact with each other—there would be tremendous variations across cities even if we did have that knowledge—any specification that requires them to complement or substitute with each other would yield biased estimates.Footnote 8 In practice, we assume that they contribute to local growth separately. Technically, we treat them as if they worked in two different but identical cities, so they enter our analysis as two different observations. With this in mind, our econometric specification is the following three-way fixed-effect model:Footnote 9

$$\begin{aligned} y_{i(jt)}=X_{i(jt)}\beta +\theta _i+\psi _{j}+\gamma _{t}+\epsilon _{i(jt)} \end{aligned}$$
(1)

where \(y_{i(jt)}\) is the real GDP growth rate (in decimal form) of city \(j\) in year \(t\) under leader \(i\)’s tenure, \(X_{i(jt)}\) is a set of time-varying controls, \(\theta _i\) is the leader effect of leader \(i\) (either a party secretary or a mayor), \(\psi _{j}\) is city \(j\)’s fixed effect, \(\gamma _{t}\) is the fixed effect for year \(t\), and \(\epsilon _{i(jt)}\) is the random disturbance for city \(j\)’s growth in year \(t\). In \(X_{i(jt)}\), we include per capita GDP of city \(j\) in the starting year of leader \(i\)’s tenure and city population of year \(t\), both in logarithm forms, and the inflation rate (in decimal form) of year \(t\).Footnote 10 Under the maintained assumption of exogeneity

$$\begin{aligned} E(\epsilon _{i(jt)} | X_{i(jt)}, \theta _i, \psi _j, \gamma _t)=0 \end{aligned}$$

Equation (1) is a revised version of the regular growth equation.

Note that by using Eq. (1), the GDP growth rate of a city in a particular year, as well as the corresponding right-hand-side variables other than the leader effect, appears twice in the dataset: once for the party secretary and once for the mayor. In effect, we are stacking together the data of two separate regressions for party secretaries and mayors. The main gain of stacking the data is that it substantially increases the size of the largest connected sample. The size of a connected sample is a convex function of the number of leaders moving between cities. If we estimate mayors and party secretaries separately, the number of movers in each sample is about half of the number of movers in the combined sample, but the size of each sample is reduced to less than half of the size of the combined sample. In fact, the size of the separate samples can be very small depending on the actual circumstances, such as in our case. Later when we conduct robustness checks, we will re-estimate Eq. (1) by giving the mayor and party secretary a set of ad hoc weights for their contributions.
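The stacking described above can be illustrated with a short Python sketch. The record layout (the keys `city`, `year`, `secretary`, `mayor`, `growth`) and the function name `stack_observations` are assumptions made for this illustration and do not come from the paper's data files.

```python
def stack_observations(city_years):
    """Turn each city-year record into two leader-level observations.

    Each input record is a dict with hypothetical keys: 'city', 'year',
    'secretary', 'mayor', and 'growth'. The growth rate (and, in a full
    dataset, the other right-hand-side variables) is duplicated, once
    for the party secretary and once for the mayor.
    """
    stacked = []
    for rec in city_years:
        for position in ("secretary", "mayor"):
            stacked.append({
                "leader": rec[position],
                "position": position,
                "city": rec["city"],
                "year": rec["year"],
                "growth": rec["growth"],
            })
    return stacked


# Two city-years become four leader-year observations.
sample = [
    {"city": "A", "year": 2000, "secretary": "s1", "mayor": "m1", "growth": 0.10},
    {"city": "A", "year": 2001, "secretary": "s1", "mayor": "m2", "growth": 0.12},
]
rows = stack_observations(sample)
```

In the stacked rows, a mover in either position links the cities where that person served, which is why pooling the two positions can enlarge the connected sample.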

Note also that Eq. (1) is built on the idea of decomposition of variance. We rely on the different tenures of leaders having served in the same city so we are able to attribute that city’s economic growth in the relevant periods to these leaders’ personal contributions. In particular, the paired party secretary and mayor are assumed to make separate contributions. Therefore, if a party secretary and a mayor had terms that perfectly coincided (i.e., they either worked in exactly the same years in the same city or moved together to another city), their estimated leader effects would be the same. Fortunately, we do not have any pair of such party secretaries and mayors.

Lastly, it is worth mentioning that higher growth rates do not necessarily imply that a city has higher levels of productivity. They can also be brought about by the accumulation of physical capital and a larger workforce. In particular, labor migration can play a significant role in the Chinese context. City population, included in Eq. (1), can partly account for this. However, many migrant workers do not stay long in a city and thus may be missed in population statistics. The growth thus caused then accrues to our estimates of the leader effects.

In Eq. (1) the growth rates of the two identical cities that the paired mayor and party secretary work for are assumed to be independently drawn from the same distribution. As a robustness check, we can also assume that they are drawn from a bivariate distribution. The mayor and party secretary are still assumed to make separate contributions, but their contributions are correlated. In effect, we estimate the following system of equations:

$$\begin{aligned} y_{i(jt)}= & {} X_{i(jt)}\beta +\theta _i+\psi _{j}+\gamma _{t}+\epsilon _{i(jt)}\nonumber \\ y_{i'(jt)}= & {} X_{i'(jt)}\beta +\theta '_{i'}+\psi '_{j}+\gamma _{t}+\epsilon '_{i'(jt)} \end{aligned}$$
(2)

where \(i\) is the index for mayors and \(i'\) is the index for party secretaries. We assume that \(\epsilon _{i(jt)}\) and \(\epsilon '_{i'(jt)}\) follow a bivariate normal distribution with zero means. Here we allow the city fixed effects in the two equations to be different. Note that by this specification, we need to impose the assumption that the leader effects have an equal mean across mayors and party secretaries to maintain the connected sample. This assumption may be even stronger than the assumptions maintained when we apply the benchmark estimation based on Eq. (1).

3.2 Identification and test strategies

In most panel data analyses, researchers focus on the coefficients of covariates and add fixed effects only as controls to eliminate unobservable within-group-invariant factors. In this paper, we care about the fixed effects themselves. However, we have three sets of fixed effects to estimate whereas the data on economic performance only have two dimensions, that is, city-leader pair and the calendar year, so there is indeterminacy between the city and leader fixed effects. This indeterminacy is evidently reflected by Eq. (1). The dependent variable \(y_{i(jt)}\) is indexed by \(j\) that identifies cities and by \(t\) that identifies the years. In the meantime, the leader effect \(\theta _i\) enters the equation to coincide with the city fixed effect \(\psi _{j}\) for a number of years. That is, the data points corresponding to \(\theta _i\) are a subset of the data points corresponding to \(\psi _{j}\) if leader \(i\) only worked in one city. As a result, the best one can achieve for these leaders is to estimate the sum of \(\theta _i\) and \(\psi _{j}\). However, the literature on employer–employee linked data in labor economics (Abowd et al. 1999, 2002; Abowd and Kramarz 2006; Cornelissen 2008) provides guidance for us to address this problem. Following that literature, we can build connected samples created by leaders who moved between cities and estimate a relative order for \(\theta _i\)’s and \(\psi _{j}\)’s separately.

Figure 3 provides a simple illustration with only one leader at a time in each city. In the figure there are three cities and six leaders. Leaders 1 and 2 only served in city A, leader 3 served in both city A and city B, leader 4 only served in city B, and leaders 5 and 6 only served in city C. City A and city B are connected by leader 3 who served in both cities. Because of that, all the four leaders who served in the two cities are also connected. We then call cities A and B and leaders 1–4 a connected group. In contrast, city C does not have a leader switching to the other two cities, nor does it have a leader coming from the other two cities. So city C and leaders 5 and 6 form a separate group.

Fig. 3 An illustration of identification

As we pointed out previously, normally we can only identify the sum of the leader fixed effect and the city fixed effect \(\omega _{ij} = \theta _i + \psi _{j}\). As a result, the fixed effect of city C \(\psi _{C}\) cannot be separated from the fixed effects of leaders 5 and 6, \(\theta _5\) and \(\theta _6\). So \(\psi _{C}\) cannot be identified. However, the difference of \(\theta _5\) and \(\theta _6\) can be identified because it is equal to \(\omega _{6C} - \omega _{5C}\).

In the connected group, we can do more using the connection created by the mover, leader 3. First, subtracting \(\omega _{3A}\) from \(\omega _{3B}\) we get the difference between city A and city B’s fixed effects \(\psi _{B}-\psi _{A}\). Then subtracting \(\omega _{3A}\) from \(\omega _{1A}\) and \(\omega _{2A}\) we get the difference between leaders 1 and 3 and the difference between leaders 2 and 3, respectively. Finally, subtracting \(\omega _{3B}\) from \(\omega _{4B}\) we get the difference between leaders 4 and 3. With that, we can finally compare all the four connected leaders. However, the values of \(\theta _i\) and \(\psi _{j}\) are not unique. For example, we can add 1 to \(\psi _{j}\) for all \(j\) and subtract 1 from \(\theta _i\) for all \(i\) and leave \(\omega _{ij}=\theta _i+\psi _{j}\) unchanged. So we can only identify the relative order of the leader effects, but not their absolute values.
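The grouping logic in the Fig. 3 example can be sketched as a connected-components search on a graph whose nodes are leaders and cities and whose edges are leader-city spells. The following Python sketch uses a simple union-find structure; it is an illustration only, not the routine used in the paper, and the function name `connected_groups` and the `(leader, city)` input format are assumptions.

```python
def connected_groups(spells):
    """Group cities and leaders that are linked through movers.

    `spells` is an iterable of (leader, city) pairs, one per spell.
    Returns a list of (set_of_cities, set_of_leaders) tuples.
    """
    parent = {}

    def find(x):
        # Return the representative of x's group, registering x if new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Tag nodes so a leader and a city with the same label never collide.
    for leader, city in spells:
        union(("leader", leader), ("city", city))

    groups = {}
    for node in list(parent):
        cities, leaders = groups.setdefault(find(node), (set(), set()))
        kind, name = node
        (cities if kind == "city" else leaders).add(name)
    return list(groups.values())


# The Fig. 3 example: leader 3 moves between cities A and B.
fig3 = [(1, "A"), (2, "A"), (3, "A"), (3, "B"), (4, "B"), (5, "C"), (6, "C")]
groups = connected_groups(fig3)
```

For the Fig. 3 input, the sketch returns one group containing cities A and B with leaders 1 to 4, and a second group containing city C with leaders 5 and 6. Within each group, only differences among the \(\theta _i\)’s and among the \(\psi _{j}\)’s are identified, so a normalization is still needed before estimation.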

The study of both the mayor and the party secretary in a city complicates the identification problem because it adds one more dimension. In this regard, our specification in Eq. (1) helps us out. By this specification, we can treat the pair of party secretary and mayor working in the same city in the same year as if they were working in two separate albeit identical cities. Because they share the same city fixed effect, they can be treated as if working in the same city, so the size of the connected group is greatly increased. In the meantime, because their tenures did not perfectly overlap with each other and they are treated and estimated separately in our econometric setup, their \(\omega _{ij}\) have different values despite sharing the same city fixed effect as a component, and we can separate their leader effects from each other.

In conclusion, we can identify the differences of fixed effects between leaders as well as between cities within a connected group. Moving local officials from city to city increases the size of the connected group, which makes it feasible to compare personal qualities across a larger number of cities. This may be one of the key reasons why the central government in China keeps moving officials across cities and provinces.

In our long sample of 2138 leaders, the 321 movers allow us to identify 20 connected groups plus 38 isolated cities and 253 isolated leaders. Table A4 provides detailed information on these groups. As one can see, among the connected groups, many groups are small. But Group 1 is sufficiently large for our analytical purposes. This group consists of 175 cities, 1196 leaders (among which 218 are movers), and 5403 leader-year pairs (observations); that is, it accounts for 58 % of the whole sample. Thereafter we will mainly analyze this group and simply call it “the connected sample.” In Fig. 1, the connected sample of cities is shaded in a darker color contrasting to other sample cities in a lighter color.

On the basis of the connected sample, we can estimate Eq. (1) to examine whether individual leaders matter. As noted above, we can only identify the differences between cities and between leaders. But to save notation, we set the mean of \(\theta _i\)’s to zero and still use \(\psi _{j}\) and \(\theta _i\) to denote those differences. Then an F-test on the joint significance of \(\theta _i\) suffices for our purpose. This test shares the same idea as Jones and Olken’s, that is, relying on the variation among leaders to answer the question of whether leaders matter. However, the Jones and Olken test is a stronger test than ours. This is primarily because the Jones and Olken test, if it were to be applied to our case, would only consider within-city variations, but our test considers both within-city and between-city variations, so a rejection of the null by the Jones and Olken test definitely implies a rejection of the null by our test. Conversely, if the Jones and Olken test cannot reject the null, it does not mean that leaders do not matter, as Jones and Olken themselves have noticed, because leaders in different cities may perform differently. Failure in rejecting the null by our test does not mean that leaders do not matter either. This is because we can only estimate the differences among leaders and cannot estimate the absolute values of their leader effects. That is, failure in rejecting the null can imply that leaders are equally capable. Because of the dimensional limitation involving leaders and cities, this is by far the best result one can achieve. Our improvements on Jones and Olken’s test thus are twofold. One is that our test has a smaller probability of making a Type II error than Jones and Olken’s, and the other is that we use more information, and therefore failure in rejecting the null is a more decisive indicator that leaders are all the same and do not matter for local economic growth in the sense that shuffling them around would not have any effect.

To see how our tests differ from that in Jones and Olken (2005), we also compose a \(\chi ^2\)-test similar to theirs using only within-city variations. Our data have multiple turnovers of leaders in a single city so the \(PRE\) and \(POST\) dummies in Jones and Olken’s paper are not clearly defined in our case. But consider a city with 3 consecutive leaders with fixed effects \(\theta _{1}\), \(\theta _{2}\) and \(\theta _{3}\). In the first turnover, \(POST-PRE=\theta _{2}-\theta _{1}\); while in the second turnover, \(POST-PRE=\theta _{3}-\theta _{2}\). So the parallel test is \(\theta _{2}=\theta _{1}\) and \(\theta _{3}=\theta _{2}\).

In practice, we can estimate Eq. (1) using the whole long sample and construct a \(J\) statistic similar to that in Jones and Olken (2005):

$$\begin{aligned} J=\frac{1}{Z}\sum _{i=1}^{Z}\frac{(\theta _i-\theta _{i'})^2}{\frac{2\sigma ^2_{\epsilon i}}{(T_i+T_{i'})/2}} \end{aligned}$$

where \(Z\) is the total number of leaders, \(\theta _i-\theta _{i'}\) is the difference in the leader effects of two consecutive leaders for the same city-position, \(\sigma ^2_{\epsilon i}\) is the variance of the error term for city \(i\), and \((T_i+T_{i'})/2\) denotes the average tenure length of the consecutive terms. \(\theta _i-\theta _{i'}\) and \(\sigma ^2_{\epsilon i}\) are replaced with \(\hat{\theta }_i-\hat{\theta }_{i'}\) and \(\hat{\sigma }^2_{\epsilon i}\) from Eq. (1), respectively.

We then execute a \(\chi ^2\)-test with \(Z\) degrees of freedom. If we reject the null, then we can conclude that leaders differ in the effects within a given jurisdiction, as in Jones and Olken; if we cannot reject the null, then we have no evidence that one leader can outperform another in a given city.
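The \(J\) statistic defined above can be sketched in a few lines of Python. This is only an illustration of the formula: the function name `j_statistic` and all input values are hypothetical, and the sketch averages over the turnovers it is given rather than reproducing our estimation.

```python
def j_statistic(turnovers):
    """Average standardized squared change in leader effects across turnovers.

    Each turnover is a tuple (theta_prev, theta_next, sigma2, t_prev, t_next):
    the two estimated leader effects, the error variance, and the two tenure
    lengths in years. All values used below are hypothetical.
    """
    z = len(turnovers)
    total = 0.0
    for theta_prev, theta_next, sigma2, t_prev, t_next in turnovers:
        avg_tenure = (t_prev + t_next) / 2
        # Squared change in leader effects scaled by its approximate variance.
        total += (theta_next - theta_prev) ** 2 / (2 * sigma2 / avg_tenure)
    return total / z


# Two made-up turnovers in a city with three consecutive leaders.
example = [
    (0.00, 0.02, 0.0004, 4, 4),
    (0.02, 0.01, 0.0004, 4, 2),
]
j_value = j_statistic(example)
```

Each term compares the squared change in estimated leader effects with the sampling noise expected over the average tenure, so large values indicate within-city differences between consecutive leaders.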

3.3 Baseline empirical results

We first mimic Jones and Olken’s original J test by using retirement as an exogenous shock.Footnote 11 The premise is that if retirement is sufficiently random, we would observe significantly different performance records of the new leaders if leaders did matter for growth. The two groups of comparison are the terms of retired leaders and the terms of their immediate successors who themselves did not retire in those terms. We first run a simple regression and find that after a city leader retires, the successor can boost local economic growth by 1.0 percentage point. When we repeat Jones and Olken’s J test, the change is also significant. However, we need to be cautious about the validity of this result because retirement may not be exogenous. As will be uncovered later in the paper, city leaders who retire are those who fail to get promoted, and their personal abilities are systematically lower than leaders who get promoted.

We then conduct the modified Jones and Olken test on all the leaders in a city. We do this twice, once using the whole long sample and the other using the connected long sample. For the whole sample, the \(J\) statistic is 2.975 and the p value is 1.000 under 2363 degrees of freedom. As for the connected sample, the statistic is 3.333, and the p value is 0.999 under 1404 degrees of freedom. Thus we cannot reject the null hypothesis and conclude that within-city variations are not large enough to justify the contribution of individual leaders. That is, the Jones and Olken test fails in our dataset. One of the reasons is probably that changing leaders at the national level is a more dramatic event than changing leaders in a city. In addition, government operation at the city level may be more routine and may rely more on bureaucracy than government operation at the national level. This is consistent with Jones and Olken’s finding that a change of leaders only matters in non-democracies, not in democracies. One explanation for this finding is that procedures are more robust in democracies so leaders’ performance is more stable than in non-democracies.

As we pointed out before, we may make a Type II error if we conclude from the Jones and Olken test that leaders do not matter for local economic growth, because leader effects may vary between cities. Next we conduct our F test by estimating Eq. (1) using the connected long sample of 1196 leaders and a total of 5403 observations of leader-year pairs. Following the solution in Cornelissen (2008), we impose the following zero-mean constraint in our estimation:

$$\begin{aligned} \sum \theta _i=0. \end{aligned}$$

By imposing this constraint, it is straightforward to apply the standard F test that \(\theta _i\)’s are all zero. The regression results are presented in Table 1, with ordinary, White heteroskedasticity-robust and province- and city-year spell clustered standard errors. The resulting \(F\)-statistic is \(F(1195, 4014)=1.93\), and the \(p\) value is less than 0.001.Footnote 12 That is, the null hypothesis that leaders do not matter is rejected by a large margin.
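For readers who want the mechanics, a joint \(F\) test of this kind compares the residual sum of squares with and without the leader dummies. The sketch below assumes the textbook form of the statistic; the function name and the residual sums of squares are hypothetical, and only the degrees of freedom (1195 and 4014) come from the text.

```python
def joint_f_statistic(rss_restricted, rss_unrestricted, num_restrictions, resid_df):
    """Textbook F statistic for a set of linear restrictions.

    rss_restricted:   residual sum of squares without the leader dummies
    rss_unrestricted: residual sum of squares with the leader dummies
    num_restrictions: number of restrictions tested (1195 in the text)
    resid_df:         residual degrees of freedom of the full model (4014)
    """
    numerator = (rss_restricted - rss_unrestricted) / num_restrictions
    denominator = rss_unrestricted / resid_df
    return numerator / denominator


# Hypothetical residual sums of squares, for illustration only.
f_value = joint_f_statistic(157.45, 100.0, 1195, 4014)
```

With 1196 leaders and the zero-mean constraint, 1195 leader effects are free, which matches the numerator degrees of freedom reported above.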

Table 1 The growth equation: single-equation estimation

The estimates for population and GDP per capita both return negative coefficients although the coefficient for population is not significant when the standard errors are clustered at the province level. The result for initial GDP per capita is consistent with the convergence hypothesis, and the other result implies a certain burden of a larger population. The estimated coefficient for the inflation rate is also negative and significant, showing that inflation is bad for economic growth.

The role of leaders can also be shown by an analysis of variance, as is done in Graham et al. (2012). Table 2 shows the shares of variance of real GDP growth that the city, year, and personal dummies respectively explain in the connected sample. The city dummies alone explain only 8 % of the total variation, the year dummies explain almost 12 %, and the personal dummies explain 26 %. If we have correctly measured the leader effects, this result shows that leaders have played a significant role in explaining local economic growth.

Table 2 Decomposition of variance for the growth equation

Figure 4 presents the kernel density of the estimated individual leader effects. Table 3 provides summary statistics of their distribution. The standard deviation is relatively small and the kurtosis is large, but the distribution is skewed left, indicating that there is a group of leaders with relatively low personal abilities. Figure 5 then separates party secretaries and mayors. The distribution of mayors weakly dominates that of party secretaries although the gap is not statistically significant. We also compare the distribution of leaders who are observed to have left our sample and the distribution of the whole sample. We do not find any significant difference between them. Thinking back, this should be expected because almost all leaders (except those who were still in office by 2010) eventually left our sample. Therefore, biased attrition should not be a problem for our test.

Fig. 4 Kernel density of leader effects

Fig. 5 Kernel density of leader effects: party secretaries vs. mayors

Table 3 Summary statistics of leader effects

Finally, we estimate Eq. (2) as a seemingly unrelated regression (SUR) model. Then we conduct two \(F\) tests for mayors’ leader effects to be jointly zero and for party secretaries’ leader effects to be jointly zero, respectively. The resulting \(F\)-statistics are 2.19 and 1.84, respectively, whose \(p\) values are both less than 0.001. This confirms our results based on Eq. (1). In subsequent analysis, we will thus focus on Eq. (1).

3.4 Robustness tests

By our identification method, the leader effects are essentially estimated by cities’ average economic growth rates during the respective leaders’ tenures. The null hypothesis is that the residuals in the growth equation, after controlling the right-hand-side variables except the leader fixed effects, should be random draws from the same distribution. Our baseline model in Eq. (1) parameterizes these residuals by the leader fixed effects and our \(F\) test finds that they are not random draws from the same distribution. However, this positive result may arise only because cities received heteroskedastic or spurious time-persistent shocks during leaders’ specific terms (Easterly and Pennings 2014). If that were the case, our estimates of the leader effects would be incidental rather than reflecting leaders’ true abilities. To rule out this possibility, we conduct several robustness checks in this subsection.

The first is to follow Bertrand and Schoar (2003) and study the residuals attributable to leaders. Specifically, we first estimate Eq. (1) using the long sample without the leader fixed effects, and then collapse the residuals at the leader-city level (i.e., by leaders’ tenures in different cities). If leaders’ fixed effects only picked up city-specific heteroskedastic shocks, there should not be any correlation between these collapsed residuals, not even between different tenures of the same leader who moved between cities. Consequently, we select the movers and regress the collapsed residuals of a leader’s term before his move on the collapsed residuals of his term after his move plus a constant.Footnote 13 This regression returns a coefficient of 0.680, which is statistically significant at the 1 percent level. This result means that leaders’ contributions are consistent across the cities they have served; our estimates of the leader effects are not incidental.
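The collapse-and-regress step can be illustrated with a minimal Python sketch. It assumes the residuals from the regression without leader fixed effects are already available as (leader, city, residual) records; the function names, the `moves` mapping and all numbers are hypothetical.

```python
from collections import defaultdict


def collapse_residuals(rows):
    """Average residuals by (leader, city) spell.

    rows: iterable of (leader, city, residual) leader-year records.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for leader, city, resid in rows:
        acc = sums[(leader, city)]
        acc[0] += resid
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def ols_slope_intercept(x, y):
    """Simple OLS of y on x plus a constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mover_regression(rows, moves):
    """Regress pre-move collapsed residuals on post-move collapsed residuals.

    moves: dict leader -> (city_before, city_after), hypothetical input.
    """
    collapsed = collapse_residuals(rows)
    before = [collapsed[(leader, c0)] for leader, (c0, _) in moves.items()]
    after = [collapsed[(leader, c1)] for leader, (_, c1) in moves.items()]
    return ols_slope_intercept(after, before)


# Made-up residuals for three movers.
rows = [
    ("A", "c1", 0.5), ("A", "c1", 0.7), ("A", "c2", 1.0), ("A", "c2", 1.0),
    ("B", "c2", 1.1), ("B", "c3", 2.0),
    ("C", "c3", 1.5), ("C", "c3", 1.7), ("C", "c1", 3.0),
]
moves = {"A": ("c1", "c2"), "B": ("c2", "c3"), "C": ("c3", "c1")}
slope, intercept = mover_regression(rows, moves)
```

A positive slope in such a regression indicates that a leader’s unexplained growth performance carries over from one city to the next.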

Our second robustness check is a placebo test that permutes leaders’ tenures. If our estimates of leader effects only picked up heteroskedastic shocks, we would have no reason to believe that these shocks would form a consistent pattern that follows the cycle of leaders’ tenures. To check this, we randomly permute each leader’s tenures across years within the same city and re-estimate Eq. (1).Footnote 14 The number of possible permutations is extremely large, so a full permutation is not computationally feasible; nor is it necessary. In practice, we conduct several rounds of permutation, with each round consisting of 1000 permutations. We find that the \(F\)-statistic from the true data ranks either first or second among the \(F\)-statistics from any round of permutation. This result gives us more confidence in our baseline results.
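The logic of the placebo test can be sketched as follows. This is a simplified stand-in rather than our procedure: it replaces the \(F\) test on Eq. (1) with a one-way \(F\) statistic on city-demeaned growth, and it shuffles leader labels across a city’s years, which is only one possible reading of the permutation. All names and data are hypothetical.

```python
import random
from collections import defaultdict


def leader_f_stat(records):
    """One-way F statistic for leader groups after demeaning growth by city.

    records: list of (city, leader, growth) tuples.
    """
    by_city = defaultdict(list)
    for city, _, y in records:
        by_city[city].append(y)
    city_mean = {c: sum(v) / len(v) for c, v in by_city.items()}

    groups = defaultdict(list)
    for city, leader, y in records:
        groups[leader].append(y - city_mean[city])

    n, g = len(records), len(groups)
    # City-demeaned values have a grand mean of zero by construction.
    ss_between = sum(len(v) * (sum(v) / len(v)) ** 2 for v in groups.values())
    ss_within = sum(
        sum((y - sum(v) / len(v)) ** 2 for y in v) for v in groups.values()
    )
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def placebo_rank(records, n_perm, seed):
    """Rank of the true F statistic among F statistics from permuted labels."""
    rng = random.Random(seed)
    true_f = leader_f_stat(records)
    by_city = defaultdict(list)
    for idx, (city, _, _) in enumerate(records):
        by_city[city].append(idx)

    exceed = 0
    for _ in range(n_perm):
        shuffled = list(records)
        for city, idxs in by_city.items():
            labels = [records[i][1] for i in idxs]
            rng.shuffle(labels)  # reassign leaders to years within the city
            for i, lab in zip(idxs, labels):
                shuffled[i] = (city, lab, records[i][2])
        # Small tolerance so that ties are not counted as exceeding.
        if leader_f_stat(shuffled) > true_f * (1 + 1e-9):
            exceed += 1
    return 1 + exceed


# Made-up city in which leader A clearly outperforms leader B.
data = [("c1", "A", y) for y in (10.0, 10.2, 9.8, 10.1)]
data += [("c1", "B", y) for y in (1.0, 1.2, 0.8, 1.1)]
rank = placebo_rank(data, n_perm=200, seed=0)
```

A rank of 1 means no permuted assignment produced a larger \(F\) statistic than the true assignment of leaders to years.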

Our third robustness test is to re-estimate Eq. (1) by introducing two city-specific AR(1) processes, one for mayors and the other for party secretaries, to its error term. This exercise allows us to explicitly address heteroskedasticity while estimating the leader fixed effects. Neither the regression result, nor the test result, though, changes in a meaningful way.

As a last robustness test, we re-estimate Eq. (1) by assigning weights to the co-working mayor and party secretary’s contributions. Specifically, let the share of growth contributed by the mayor be \(m\). So the share of the party secretary is \(1-m\). Then, we replace \(y_{i(jt)}\) by \(my_{i(jt)}\) when the observation is for a mayor or \((1-m)y_{i(jt)}\) when the observation is for a party secretary, and re-estimate Eq. (1) four times with \(m\) set to be 40, 45, 55 and 60 %, respectively. Compared with the benchmark, which is equivalent to \(m=50~\%\), the \(F\)-statistics are larger and \(p\) values are smaller. That is, our baseline result is robust to different weights assigned to the contributions of mayors and party secretaries.

4 Leader effects and promotion

4.1 The promotion equation

As we pointed out in the introduction, we have two aims when studying how leader effects affect promotion outcomes. One is to test the sensitivity of our estimates of the leader effects, and the other is to improve on the literature on promotion tournaments. Li and Zhou (2005) find that the average GDP growth rate of a provincial leader’s tenure is a good predictor for the probability of that leader’s promotion and retirement. However, Opper and Brehmm (2007), Wang and Xu (2008), and Shih et al. (2012) provide different results. Opper and Brehmm (2007) define an index of political connections and find that it is a strong predictor for the promotion of provincial leaders whereas the local growth rate has no predictive power. Shih et al. (2012) use a more comprehensive set of indexes for political connections and find that political connections are important for a person to enter the central committee of the Chinese Communist Party. Landry et al. (2015) find that economic performance becomes a less important factor to determine promotion when one moves up to higher levels of government, suggesting that political connections may become more important at higher levels. Yet Wang and Xu (2008)’s results seem to reject the political connection story. Whereas they find that provincial party secretaries and governors who are later promoted to the central government do not significantly outperform others, they also find that the provincial leaders who come from and then go back to the central government underperform the average leader.

The controversy may have a lot to do with findings from many studies’ direct use of the GDP growth rate as the predictor. The GDP growth rate may not be a good indicator for a leader’s personal abilities because it is highly correlated with local conditions, some of which change over time and cannot be accounted for by the provincial fixed effect. One of the regularities concerning promotion is that almost all the members of the Standing Committee of the Politburo either have been directly promoted from or have worked in the few most advanced provinces as well as the three big cities—Beijing, Shanghai, and Tianjin. Because those localities enjoy preferred economic policies during various periods, the significant results in Li and Zhou (2005) may well reflect some particular features of the data rather than a general link between promotion and performance. One component of Opper and Brehmm (2007)’s index of political connections is whether a leader has worked in provinces that a member of the Standing Committee has worked for. So their results may suffer from the same problem of incidental correlation.Footnote 15 Lastly, ministry-level officials in the central government may be sent to provinces only for them to gain local experiences, which could lead to Wang and Xu (2008)’s findings.

Our data and identification strategy allow us to improve on the existing literature. On the one hand, the leader effects estimated from Eq. (1) reflect a leader’s own contributions to a city’s economic performance; on the other hand, the promotion of city leaders is less subject to political considerations and can be more closely linked with leaders’ personal abilities. Therefore, we can be more confident that our estimates of the leader effects reflect leaders’ real abilities if we find that they are good predictors for leaders’ chances of promotion.

Note that we only have data of promotion with the starting year varying from 1998 to 2001 and can only estimate the leader effects of leaders in the connected sample. So our estimation of the relationship between leader effects and promotion is performed for the period 1998–2010 using the connected sample. This allows us to examine 995 leaders.

As a starting point, we note that, at any point of time, there are three possible states for a leader: being promoted, retiring, and staying as a city leader (including moving to another city). We thus adopt two econometric methods to conduct our study. One is to use the linear probability model (LPM) to compare promotion and the other two options for each calendar year.Footnote 16 The LPM has the advantage of being able to provide straightforward and unconditional predictions based on individual estimates alone. The other estimation method we adopt is the ordered probit model (OPM), also for each calendar year. The OPM has the advantage of treating all the three states simultaneously and thus avoiding potential biases in the estimates of the LPM that arise from not accounting fully for the correlation between states. However, it does not provide direct unconditional predictions.Footnote 17

When analyzing the probability of promotion, we also need to be concerned about the spatial scope of comparison. Whereas some leaders have been moved to cities out of their own provinces, it has been more common that city leaders are promoted or shuffled within the same province. Among the 321 movers in the connected long sample, only 34 served in two different provinces; among the 689 leaders who enjoyed promotion, only 29 were promoted outside the province. That is, competition among leaders is mostly restricted within the same province. On the other hand, comparing leaders only within the same city would apparently be too restrictive. Therefore, we add provincial dummies in our regression analyses to account for the fact that city leaders compete within the same province.

In summary, let \(p_{i(jt)}\) be a notional variable describing leader \(i\)’s state in year \(t\) when serving in city \(j\), and we then specify the promotion equation as

$$\begin{aligned} p_{i(jt)} = Z_{i(jt)}\delta +\alpha \theta _i+v_k+\eta _t+u_{i(jt)} \end{aligned}$$
(3)

where \(\theta _i\) is the leader effect as in Eq. (1), \(Z_{i(jt)}\) is a set of controls, \(v_k\) is the provincial fixed effect for province \(k\), \(\eta _t\) is the year fixed effect, and \(u_{i(jt)}\) is a random disturbance. In the LPM, the outcome of \(p_{i(jt)}\) is 1 or 0 standing for either promotion or no promotion; in the OPM, the outcome of \(p_{i(jt)}\) is \(-\)1, 0, and 1 standing for retirement, staying and promotion, respectively. In both the LPM and the OPM, \(Z_{i(jt)}\) includes the following variables: leader age, number of years since a person became a city leader (city tenure), and a dummy variable indicating whether a leader worked in the provincial government (provincial experience).

Age could be the most important factor in addition to ability in determining a leader’s chances of promotion. The promotion tournament constantly eliminates people who reach the age limit for each level in the hierarchy,Footnote 18 so being young can be a big advantage if one wants to move upward in the bureaucratic hierarchy.

Besides age’s direct effect, the effect of leader effects on promotion can also vary by age. All leaders must work in some place in the government at different stages of their political careers. Although we are unable to observe their performance before they serve as city leaders, the Organizational Department can observe and evaluate all their performances along their career paths. Intuitively, personal ability is revealed more clearly the longer a leader serves in the government, and the estimated leader effect reflects more of the true ability. In this regard, we add an interaction of the leader effect and age in the promotion equation.

Lastly, provincial experience is meant to capture the influence of political connections. Having worked in the provincial government should increase a person’s chances of getting promoted if political connections are important. However, it could also be the case that moving from the provincial government to work in a city represents a step downward because it means that the leader could not get promoted in the provincial government.

4.2 Empirical results for single-equation estimation

Table 4 shows the results of the LPM estimated on Eq. (3) when we compare the promoted leaders with those who stayed or retired. In column 1, we find that the leader effect has no significant effect on promotion, and age is negatively correlated with promotion although the effect is not economically significant. Instead, provincial experience is found to be helpful for promotion. Holding other things equal, a leader coming from a provincial position enjoys a higher probability of promotion by 5.1 percentage points on average. This advantage seems to imply that political connections are important. Longer tenure as a city leader also helps. One more year as a city leader increases a person’s chances of promotion by 2.5 percentage points.

Table 4 The promotion equation: linear probability model

As we pointed out in the previous section, leader effects may be heterogeneous across age. We then interact the leader effect with age and re-run the regression. Column 2 of Table 4 presents the results. Now, the impact of the leader effect grows with age. That is, the older the leader, the more significant the role the leader effect plays in predicting the chances of promotion. By the point estimate for the interaction term, if the most capable person in the sample gets one year older, his or her chances of promotion increase by 2.6 percentage points over those of the least capable person in the sample. One of the causes behind this result is perhaps related to the way we estimate the leader effect. It is estimated for a leader’s whole career appearing in our sample. That is, in a sense, it is the lifetime ability of a leader. Therefore, as a leader becomes older and, for that matter, has worked in the government longer, more information is revealed to the Organizational Department that can then make a more credible inference about the leader’s ability. Another possible reason for the result is that competition becomes more intense when leaders become older, which could increase the value of performance in their promotion. As a matter of fact, younger leaders, from the day they assume the municipal positions, are usually designated as political hopefuls who will one day rise up in the party hierarchy. Consequently, subsequent performance may not be a decisive factor determining their promotion.

To capture possible nonlinearities in the effect of age, we replace age by a threshold. We choose 49 to 52 years old (around the median age of 50) as possible thresholds, and present the results in columns 3 through 6 in Table 4. The impact of the leader effect on leaders younger than the threshold age is unstable; it is much more pronounced for leaders older than the threshold age as the coefficients of the interaction terms are all positive and significant. Among the older group of leaders, the most capable person has a chance of promotion 10.8 percentage points higher than the least capable person when the threshold is 49 years old.Footnote 19 This gap becomes 10.2 and 16.7 percentage points when the threshold is raised to 50 and 51 years old, respectively. However, it drops to 10.4 percentage points and becomes insignificant when the threshold is raised to 52 years old. We have also tried thresholds less than 49 years old and larger than 52 years old and found that the significance of personal abilities declines. Therefore, we conclude that promotion outlook is more sensitive to the estimated leader effects for leaders above the median age than those below that.

Table 5 presents the results of the OPM. The results do not change qualitatively compared with those of the LPM in all coefficients. However, the impact of the leader effect has become more significant in both statistical and economic terms. As pointed out by Ai and Norton (2003), the magnitude of the interaction effect in nonlinear models does not equal the marginal effect of the interaction term. An algorithm of marginal effects for interaction terms in nonlinear models with either continuous or dummy variables is developed in their subsequent paper (Norton et al. 2004). Using that method, we derive the marginal effects for the leader effect to influence the chances of promotion. They are 17.3, 16.7, 27.3, and 17.2 percentage points between the most capable leader and the least capable leader in the group of older leaders when the age threshold is 49–52 years old, respectively, with all but the last being statistically significant. To the extent that the OPM provides a more reliable set of estimates, more weights should be given to these results than to those generated by the LPM.

Table 5 The promotion equation: ordered probit model

4.3 Joint estimation of growth and promotion

Although the single-equation estimation in the last subsection is straightforward, it may encounter a problem if the error terms in Eq. (1) (the growth equation) and Eq. (3) (the promotion equation) are correlated. In particular, the promotion equation uses the leader effects estimated from the growth equation to predict promotion and thus will return a biased estimate if the two error terms are correlated. However, the estimates of the leader effects might also be biased because they influence attritions caused by promotion (see Abowd et al. 2010). In this subsection, we would like to form an econometric strategy to integrate the estimation of the two equations in a simultaneous-equation system. Unlike in the baseline estimation where leader effects are estimated from the growth equation and used in the promotion equation, now we estimate the growth and promotion equations simultaneously to take into consideration the correlation between the two error terms of the two equations. To do so, we start with the following unrestricted model that combines Eqs. (1) and (3) in a system of equations:

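$$\begin{aligned} \left\{ \begin{array}{l} y_{i(jt)} = X_{i(jt)}\beta +\theta _i+\psi _j+\gamma _t+\epsilon _{i(jt)} \\ p_{i(jt)} = Z_{i(jt)}\delta +\alpha \theta _i+v_k+\eta _t+u_{i(jt)} \end{array} \right. \end{aligned}$$
(4)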

When the LPM is applied to the second equation, both equations are linear. So we will call the model in Eq. (4) the linear–linear model, or simply the L–L model. We assume that \(\epsilon _{i(jt)}\) and \(u_{i(jt)}\) follow a joint normal distribution:

$$\begin{aligned} \left( \begin{array}{c} \epsilon _{i(jt)} \\ u_{i(jt)} \end{array} \right) \sim \mathcal {N}\left( 0,\Sigma \right) , \quad \Sigma =\left( \begin{array}{cc} \sigma _{11} &{} \sigma _{12} \\ \sigma _{21} &{} 1 \end{array} \right) \end{aligned}$$

Note that the variables in \(X_{i(jt)}\) appear only in the first equation of the equation system (4) so they serve as the instruments for identification. Our identification assumption is that these variables (i.e., initial GDP per capita, city population, and inflation) have no direct effect on a leader’s promotion prospects and are uncorrelated with omitted determinants of a leader’s promotion prospects. To the extent that a leader’s ability to promote economic growth is the most significant factor determining his or her chances of promotion and those variables are closely related to economic growth, this assumption is reasonable.

As we noted before, we only have data for promotion starting around 1998–2001. In our baseline regressions, we can first estimate the leader effects using the connected long sample and then estimate the promotion equation for the period 1998 onward within the connected sample. In the joint estimation of the equation system (4), we can no longer do that. One option is to estimate the largest connected sample since 1998. However, this would drastically reduce the sample size and make the results hard to compare with those of the single-equation approach. Instead, we still estimate on the connected long sample and, at the same time, treat leaders without promotion information as missing data (a total of 201 leaders).

One distinctive feature of the system of equations (4) is that the leader effects have to be estimated jointly from the two equations. It is then clear that a single-equation approach would return biased estimates for both \(\alpha \) and the leader effects themselves if simultaneity exists.Footnote 20 A system-of-equations approach mitigates this shortcoming. We can estimate (4) by the maximum likelihood (ML) method. One caveat is that in deducing the log-likelihood function, we need to add up likelihoods from observations with \(p_{i(jt)}\) observed and missing.

Denote \(\omega _{1ijt}=X_{i(jt)}\beta +\theta _i+\psi _j+\gamma _t\) and \(\omega _{2ijt}=Z_{i(jt)}\delta +\alpha \theta _i+v_k+\eta _t\). The likelihood for each observation where \(p_{i(jt)}\) is observed is

$$\begin{aligned} L_{i(jt)}=\phi _2(y_{i(jt)}-\omega _{1ijt},p_{i(jt)}-\omega _{2ijt};\Sigma ) \end{aligned}$$

where \(\phi _2\) denotes the density function of the bivariate normal distribution. For observations with \(p_{i(jt)}\) missing, only the growth equation contributes, so the likelihood is

$$\begin{aligned} L_{i(jt)}=\phi (y_{i(jt)}-\omega _{1ijt};\sigma _{11}) \end{aligned}$$

where \(\phi \) stands for the density function of the univariate normal distribution. To sum up, the log-likelihood function for the unrestricted model is

$$\begin{aligned} \ln L= & {} \sum _{p_{i(jt)} \text {observed}} \ln \phi _2(y_{i(jt)}-\omega _{1ijt},p_{i(jt)}-\omega _{2ijt};\Sigma )\nonumber \\&+ \sum _{p_{i(jt)} \text {missing}} \ln \phi (y_{i(jt)}-\omega _{1ijt};\sigma _{11}) \end{aligned}$$
(5)
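The structure of Eq. (5) can be illustrated with a short Python sketch that takes the two residuals as given. It assumes the covariance matrix has \(\sigma _{11}\) for the growth error, a unit variance for the promotion error and covariance \(\sigma _{12}\); the function names and the use of `None` to mark a missing promotion outcome are choices made here for illustration.

```python
import math


def log_phi(e, sigma11):
    """Log density of N(0, sigma11) evaluated at e."""
    return -0.5 * (math.log(2 * math.pi * sigma11) + e * e / sigma11)


def log_phi2(e1, e2, sigma11, sigma12):
    """Log density of a zero-mean bivariate normal with Var(e1) = sigma11,
    Var(e2) = 1 and Cov(e1, e2) = sigma12."""
    det = sigma11 - sigma12 ** 2
    quad = (e1 * e1 - 2 * sigma12 * e1 * e2 + sigma11 * e2 * e2) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def log_likelihood(observations, sigma11, sigma12):
    """Sum of log-likelihood contributions.

    observations: list of (growth_resid, promo_resid) pairs, where
    growth_resid = y - omega_1 and promo_resid = p - omega_2, or None
    when the promotion outcome is missing.
    """
    total = 0.0
    for e1, e2 in observations:
        if e2 is None:
            total += log_phi(e1, sigma11)  # growth equation only
        else:
            total += log_phi2(e1, e2, sigma11, sigma12)
    return total


# Hypothetical residuals: two complete observations and one with p missing.
obs = [(0.3, -0.2), (-0.1, 0.4), (0.2, None)]
value = log_likelihood(obs, sigma11=0.5, sigma12=0.1)
```

When \(\sigma _{12}=0\) the bivariate term reduces to the sum of the two univariate log densities, which is a convenient check on the implementation.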

To test whether leaders matter, we can estimate a restricted version of (4) where all the leader effects are restricted to equal zero in the growth equation. As a result, they are also dropped in the promotion equation. We can then perform a likelihood-ratio (LR) test for the null hypothesis imposed on the unrestricted model:

$$\begin{aligned} H_0: \theta _i=0,\quad \text {for all } i. \end{aligned}$$

Then, to examine whether leader effects are important for promotion, we can just test the null hypothesis \(\alpha =0\) in the unrestricted model.
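For reference, the LR statistic for \(H_0\) takes the standard form below, where \(L_U\) and \(L_R\) are notation introduced here for the maximized likelihoods of the unrestricted and restricted models:

$$\begin{aligned} LR=2\left( \ln L_U-\ln L_R\right) \end{aligned}$$

Under the null, \(LR\) is asymptotically \(\chi ^2\)-distributed with degrees of freedom equal to the number of restrictions imposed.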

To make it consistent with our baseline estimation, we also estimate a system where the promotion equation is specified as ordered probit (so the model is denoted L-OP). The model is

$$\begin{aligned} \left\{ \begin{array}{l} y_{i(jt)} = X_{i(jt)}\beta +\theta _i+\psi _j+\gamma _t+\epsilon _{i(jt)} \\ p_{i(jt)}^* = Z_{i(jt)}\delta +\alpha \theta _i+v_k+\eta _t+u_{i(jt)}\\ p_{i(jt)}=\left\{ \begin{array}{l l} -1 &{}\quad p_{i(jt)}^*\le c_1 \\ 0 &{}\quad c_1 < p_{i(jt)}^* \le c_2 \\ 1 &{}\quad p_{i(jt)}^*>c_2 \end{array} \right. \end{array} \right. \end{aligned}$$
(6)

where \(p_{i(jt)}=-1\) denotes retirement, \(p_{i(jt)}=0\) denotes staying as a city-level official, and \(p_{i(jt)}=1\) denotes promotion. The two error terms \((\epsilon _{i(jt)},~u_{i(jt)})\) still follow a bivariate normal distribution specified before. The log-likelihood function can be deduced in the same way as in the L–L model, and is presented in Appendix A. The null hypotheses of the two tests are similar to those of the L–L model.
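The mapping from the latent index to the three outcomes in Eq. (6) can be sketched as follows. The sketch gives only the marginal probabilities implied by a unit-variance normal error \(u_{i(jt)}\), ignoring its correlation with \(\epsilon _{i(jt)}\), so it is not the joint likelihood of Appendix A. The function name and the numerical inputs are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, matching Var(u) = 1


def ordered_probit_probs(index, c1, c2):
    """Return (P(retire), P(stay), P(promote)) for a latent index omega_2.

    index: value of Z*delta + alpha*theta + v_k + eta_t
    c1, c2: cutoffs with c1 < c2
    """
    if not c1 < c2:
        raise ValueError("cutoffs must satisfy c1 < c2")
    p_retire = _STD_NORMAL.cdf(c1 - index)           # p* <= c1
    p_stay = _STD_NORMAL.cdf(c2 - index) - p_retire  # c1 < p* <= c2
    p_promote = 1.0 - _STD_NORMAL.cdf(c2 - index)    # p* > c2
    return p_retire, p_stay, p_promote


# Hypothetical index and cutoffs.
probs = ordered_probit_probs(0.0, -1.0, 1.0)
```

Raising the index, for instance through a larger leader effect when \(\alpha >0\), shifts probability mass from retirement toward promotion.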

4.4 Results of joint estimations

Because the \(\theta \)’s are estimated from personal dummies, convergence of the ML estimation can take a very long time. Instead of estimating the equation systems in (4) and (6) in one shot, we take an iterative approach. Appendix B gives a detailed description of the iterative method; here we provide a brief one. We first use the growth equation to get an initial estimate of each leader’s leader effect \(\hat{\theta }_i^0\). Then we substitute \(\hat{\theta }_i^0\) into the promotion equation and conduct a joint estimation with the \(\theta _i\)’s in the growth equation treated as unknown, using the method developed by Roodman (2011). This returns a new estimate for each leader effect, \(\hat{\theta }_i^1\), as well as an estimate for \(\alpha \). We then repeat the procedure. If in round \(t\) we find that \(\hat{\theta }_i^t\) is sufficiently close to \(\hat{\theta }_i^{t-1}\) for all \(i\), we take \(\hat{\theta }_i^t\) as the final estimate for \(\theta _i\) and take the estimate of \(\alpha \) from the same round as its final estimate.
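The stopping rule of the iterative approach can be sketched generically. The sketch below does not implement the joint estimation itself: `update` is a hypothetical callable standing in for one round of it, and the tolerance and round limit are arbitrary illustration values rather than the ones in Appendix B.

```python
def iterate_leader_effects(theta0, update, tol=1e-6, max_rounds=100):
    """Repeat `update` until every leader effect moves by less than `tol`.

    theta0: dict leader -> initial effect from the growth equation
    update: hypothetical callable mapping current effects to
            (new_effects, alpha_hat) from one round of joint estimation
    """
    theta = dict(theta0)
    for round_no in range(1, max_rounds + 1):
        new_theta, alpha_hat = update(theta)
        # Largest change in any leader effect between consecutive rounds.
        gap = max(abs(new_theta[i] - theta[i]) for i in theta)
        theta = new_theta
        if gap < tol:
            return theta, alpha_hat, round_no
    raise RuntimeError("leader effects did not converge")


def toy_update(theta):
    """Toy stand-in for a joint-estimation round (a contraction toward 2)."""
    return {k: 0.5 * v + 1.0 for k, v in theta.items()}, 0.3


final_theta, final_alpha, rounds = iterate_leader_effects(
    {"a": 0.0, "b": 4.0}, toy_update
)
```

Using the maximum change across leaders as the criterion mirrors the requirement that the estimates be sufficiently close for all \(i\).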

4.4.1 Results of the L–L model

Table 6 lists the results of five estimations for the unrestricted model applied to the equation system in (4). The results of the restricted model are not reported to save space. The results of the unrestricted model are broadly similar to those returned by the single-equation regressions. In estimation (1), we report the results when the leader effect is interacted with age. The LR statistic equals 2189.64 and its \(p\) value is less than 0.001, indicating that leaders do matter for local economic growth.Footnote 21 The coefficient for the interaction term between the leader effect and age is almost the same as that obtained in the single equation estimation. The only change is that the coefficient of provincial experience is no longer significant in the joint estimation. This means that the positive result found in the single-equation estimations may be caused by a spurious correlation between economic growth and promotion.

Table 6 Joint estimations: the L–L model

We also use various age thresholds rather than using age in continuous form, to account for potential nonlinearity. Estimations (2)–(5) in Table 6 present the results. They are also barely different from those provided by the single-equation analysis, and the \(\chi ^2\) statistics all confirm the contributions of local leaders for economic growth.

4.4.2 Results of the L-OP model

Table 7 shows the results of the L-OP model where promotion is estimated by the ordered probit model. All the estimations of Table 6 are repeated. The results are broadly similar to those produced by the single-equation estimations. One significant difference is that the gap between younger and older leaders becomes larger, as shown by the coefficients of the interaction terms between age and the leader effect. In summary, our single-equation and system-of-equations estimations provide consistent and robust results that support the thesis that leaders matter for local economic growth, regardless of whether we adopt the linear probability model or the ordered probit model to account for promotion.

Table 7 Joint estimation: the L-OP model

5 Concluding remarks

In this paper, we study the role of subnational leaders in local economic growth and their promotion using a unique panel dataset collected on city leaders from 25 Chinese provinces. Because the city leaders face homogeneous national institutional settings, we are able to isolate leaders’ personal abilities from the institutional factors that may confound cross-country studies of whether institutional or human factors contribute more to economic growth. We find that individual leaders’ contributions to local economic growth are sufficiently different to make the case that leaders matter for local growth. This result is robust to a series of tests designed to address the problem of heteroskedasticity in the data. Our findings lend weight to the thesis that human factors are important for economic growth.

We also improve on the existing literature on the promotion tournament in China. Using the leader effect estimated for a leader’s contribution to local growth as the predictor for his or her promotion, we refine the approach of earlier studies. In addition, our study sheds new light on the promotion tournament. First, we show both theoretically and empirically that shuffling does more than test leaders in different cities: by moving leaders between localities, it makes them comparable across regions and over time. The frequency of shuffling does not need to be high (recall that only 15% of the leaders in our long sample served in more than one city). Second, age is a pivotal factor determining a leader’s chances of promotion. A leader’s chances diminish quickly as he or she gets older, but at the same time, personal abilities become a stronger factor determining promotion, and this effect reaches its peak around the median age of the sample.

It is worth reminding the reader that our results do not imply that the promotion tournament is an optimal mechanism to induce local leaders’ efforts to improve social welfare. Currently, the yardstick of competition is economic growth. Our results do not exclude the possibility that local leaders sacrifice long-term social welfare, particularly the environment, in exchange for higher rates of short-term growth. Shuffling leaders helps the Organizational Department make better comparisons and judgments on local leaders, but it also shortens those leaders’ tenures in a particular city and thus may induce more short-sighted behavior. Even within the economic sphere, the pursuit of short-term growth may lead to over-investment in physical capital and under-investment in education and health. The structural imbalances that have accompanied China’s fast growth may well be linked to the high-powered incentives presented to Chinese local leaders in their competition for promotion. A more nuanced scheme is needed to motivate local leaders to improve social welfare beyond short-term economic growth.