1 Introduction

For a long time, most neoclassical economists showed little interest in issues of income and wealth inequality. As Milanovic (2013) dramatically put it, “Before the global crisis, income inequality was relegated to the underworld of economics. The motives of those who studied it were impugned. According to Martin Feldstein, the former head of Reagan’s Council of Economic Advisors, such people have been motivated by envy. Robert Lucas, a Nobel prize winner, thought that nothing [is] as poisonous to sound economics as ‘to focus on questions of distribution.’” Yet the issues posed by income (and wealth) inequality have been highlighted in a number of recent books by, inter alios, Stiglitz (2012), Deaton (2013), Piketty (2014) and Atkinson (2015).

Undoubtedly, Piketty (2014) has attracted the most attention and controversy. One of the reasons for the notable impact of Capital in the Twenty-First Century is that it focuses on the great increase in the share of the top one per cent in income and wealth over the last 30 years or so in the USA, the UK, Australia and Canada. The increase has been much less in the other developed European countries. This emphasis on the change in the share of the top one per cent presents a much more dramatic picture of the increase in inequality than, say, a substantial change in inequality recorded by the Gini coefficient.

The figures for the increase in income inequality for the USA over the last three decades are remarkable. The labour compensation of the top one per cent over the period 1979–2007 accounted for 60 per cent of the growth of market-based incomes (38 per cent of post-tax incomes) (Bivens and Mishel 2013). The income of the top one per cent is largely driven by the earnings of chief executive officers (CEOs), not only because they comprise a substantial proportion of the top one per cent but also because there is a comparability effect on the salaries of the other top earners. Consequently, I shall largely concentrate on the pay of CEOs.

In the USA, the annual remuneration of the average CEO increased from just over $800,000 in 1965 to $15.3 million in 2013 (Mishel and Davis 2014). The ratio of the pay of the average CEO to that of the average worker was 20:1 in 1965, peaked at 383:1 in 2000 and was nearly 300:1 in 2013. In the UK, FTSE 100 senior executives today earn 150 times as much as their average employees; in 1998 the figure was about 50.

One reason why, with some exceptions, little attention has been paid to issues of income inequality is that pay, including that of CEOs, is seen as being driven by market forces (Mankiw 2013). Individuals are paid their marginal products. Hence, both the salaries of individuals and the share of income going to labour are largely determined by the technological parameters underlying the aggregate production function. On this view, there is little need to consider the role of institutional factors such as how salaries are determined or the influence of sociological factors or social norms.

However, there are severe theoretical and empirical problems underlying the aggregate production function that vitiate the marginal productivity theory of distribution. These will be discussed after considering Mankiw (2013). Piketty (2014) is sceptical of the relevance of the marginal productivity theory for the determination of the salaries of the top one per cent. Nevertheless, he still considers it applicable for explaining the pay of those undertaking ‘replicable’ work, such as a fast-food server. I show, by means of a simple example, that this still concedes too much to the marginal productivity theory of distribution. I next briefly discuss the problems surrounding the existence of the aggregate production function, drawing especially on the remarkable work of John McCombie and his colleague, Jesus Felipe.Footnote 1 Next, given the rejection of the marginal productivity theory of distribution, I consider how CEO pay is determined in practice. I discuss the way that the attempt to solve the principal–agent problem has paradoxically substantially increased the relative income share of the top one per cent. Finally, I analyse the ‘managerial power approach’ associated principally with the work of Bebchuk and Fried (2004). The last section summarises and concludes.

2 Why Should Income Inequality Be a Matter of Concern? Are Not CEOs Paid Their Marginal Products?

The neoclassical standard explanation of how factors of production are rewarded has been developed from Ricardo’s model of distribution by applying the marginal principle to all factors of production and not just to land (Kaldor 1955–1956). Although the early models concerned themselves with homogeneous labour, it is a small step to apply this methodology at the microeconomic level to individuals.

Consequently, in a nutshell, those workers with higher productivities earn higher incomes that reflect their greater contribution to society. This is determined solely by the technical conditions of production and factors affecting the supply of labour. As Clark (1899) wrote many years ago, “[i]t is the purpose of this work to show that the distribution of income to society is controlled by a natural law, and that this law, if it worked without friction, would give to every agent of production the amount of wealth which that agent creates” (p. v). While Clark’s statement does not imply that this is what every agent necessarily ought to get, it is often implicitly assumed that this is the case (Mankiw 2013). Moreover, the implication is that any attempt to alter the free market distribution of earnings will lead to a ‘great contradiction’, as Okun (1977) termed it, namely a trade-off between equity and efficiency. As altering the distribution of income is likely to reduce the efficiency of the allocation of resources, it, therefore, comes at an economic cost.

A recent statement defending the present distribution of the income of the top one per cent along these lines, albeit with some minor qualifications, is that of Mankiw (2013). Mankiw believes that in a competitive economy individuals are paid their marginal products. For example, in outlining what he sees as the criticism of what he describes as the ‘left’, he writes as follows: “In the standard competitive labor market, a person’s earnings equal the value of his or her marginal product” (p. 32). The normative implications of this are made explicit when he attempts to defend the earnings of the top one per cent along the following lines of the ethical argument of ‘just deserts’. “If the economy were described by a classical competitive equilibrium without any externalities or public goods, then every individual would earn the value of his or her marginal product, and there would be no need for government to alter the resulting income distribution” (Mankiw 2013, p. 32).

Consequently, this may be taken as the neoclassical benchmark. The key, Mankiw (op. cit.) continues, is whether the earnings of the top one per cent reflect their higher (marginal) productivity or represent the extraction of rents. Indeed, he concedes that if the increase in the share of the top one per cent were attributable to successful rent-seeking, he would deplore it. He asserts that, on his own reading of the evidence, the earnings of the top one per cent, and their rapid growth over the last 30 years, are due to their increased productivity.

The evidence Mankiw (2013) offers in support of this is not compelling. He invokes the superstar theory that “changes in technology have allowed a small number of highly educated and exceptionally talented individuals to command superstar incomes in ways that were not possible a generation ago” (Mankiw 2013, p. 13). As examples, he cites Steve Jobs of Apple and the author J.K. Rowling. However, their large incomes are heavily dependent on institutions set up by governments in the form of patents, copyright monopolies and, in the case of Jobs, US state expenditure on R&D (Mazzucato 2013), all of which are the antithesis of the free market. Moreover, such huge salaries are not necessary to persuade individuals to make substantial contributions to society. Just think of the unsung heroes who developed the internet and indeed the role of the US government in facilitating it. Then, there is the Human Genome Project, which makes its results freely available to all, compared with the smaller project of the Celera Corporation, whose aim was to appropriate the private rents from advances in this area. One could go on almost indefinitely. Finally, the share of the top one per cent is dominated by CEOs and the finance sector, not talented innovators.

Mankiw’s second line of reasoning is that the increase in the share is due to the ‘race between education and technology’. This is the hypothesis that skill-biased technical change has increased the demand for skilled relative to unskilled labour and has led to a college premium. On this hypothesis, which Mankiw endorses, rising income inequality has nothing to do with rent-seeking, but simply reflects the operation of supply and demand for labour. Mankiw argues that, while Goldin and Katz (2009) concentrate on the full distribution of income rather than the top one per cent, ‘it is natural to suspect that similar forces are at work’. The share of the top one per cent is considered to follow a U-shaped pattern over time similar to that of the skilled–unskilled wage differential. Unfortunately for this explanation, the college premium flattened out in the 1990s, while the growth of the share of the top one per cent accelerated and bears little resemblance to the path of the college premium. Moreover, the skill-biased explanation cannot explain the fact that there has also been a rapid increase in the share of the top one per cent in capital income (Mishel and Davis 2014). The hypothesis of skill-biased technical change is predicated upon the existence of a well-behaved CES production function and the indirect measure of different types of technical change.Footnote 2 I shall question the foundations of the aggregate production function below.

However, for neoclassical economists, the existence of the concept of the marginal product of labour and the necessary adjunct of the (aggregate) production function is taken as axiomatic. In the language of Lakatos (1970), the latter is part of the ‘hard core’ or, in Kuhnian (1970) terms, it is a paradigmatic heuristic. The role of the marginal product of labour in determining pay is taken for granted and is deemed untestable by fiat. Consequently, the mainstream view has been that income inequality and its changes are not major issues. The former merely reflects differences in the marginal productivities of labour. Moreover, the decline in labour’s aggregate share, which has been observed in many advanced countries, is explained solely in terms of the aggregate production function and the value of the elasticity of substitution, together with changes in the capital-output ratio.
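The neoclassical reasoning in the last sentence can be made explicit. The following is only a sketch, assuming the standard two-factor CES form, which the chapter does not write out; the symbols θ (distribution parameter), σ (elasticity of substitution) and β (capital-output ratio) are introduced here purely for illustration:

$$ Y = \left[\theta K^{\frac{\sigma-1}{\sigma}} + \left(1-\theta\right) L^{\frac{\sigma-1}{\sigma}}\right]^{\frac{\sigma}{\sigma-1}}, \qquad \frac{\partial Y}{\partial K} = \theta \left(\frac{Y}{K}\right)^{\frac{1}{\sigma}} $$

$$ \frac{rK}{Y} = \frac{\partial Y}{\partial K}\,\frac{K}{Y} = \theta\, \beta^{\frac{\sigma-1}{\sigma}}, \qquad \beta \equiv \frac{K}{Y} $$

If capital is assumed to be paid its marginal product, labour’s share is 1 − θβ^((σ−1)/σ). Under these assumptions, a rising capital-output ratio lowers labour’s share only if σ > 1, raises it if σ < 1, and leaves it unchanged in the Cobb–Douglas case σ = 1. The whole explanation therefore rests on the existence of the function itself.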

3 On Piketty’s ‘Illusion of Marginal Productivity’

It is difficult to discuss changes in wealth or income inequality without mentioning Piketty’s (2014) influential Capital in the Twenty-First Century. Piketty (2014) is rightly extremely sceptical of the concept of marginal productivity as an explanation for the determination of wages and salaries of the top one per cent. The hedge fund manager Paulson, for example, earned $3.7 billion in 2007 (Rajan 2010, p. 80). Was this his marginal product? How do we test this proposition? Should the marginal products of a handful of CEOs of the banks that precipitated the Great Recession be regarded as substantially negative over this period? It is worth citing Piketty (2014):

To my mind, the most convincing explanation for the explosion of the very top US incomes is the following. As noted, the vast majority of top earners are senior managers of large firms. It is rather naïve to seek an objective basis for their high salaries in individual “productivity”. When a job is replicable, as in the case of an assembly-line worker or fast food server, we can give an approximate estimate of the “marginal product” that would be realized by adding one additional worker or waiter (albeit with a considerable margin of error in our estimate). But when an individual’s job functions are unique, or nearly so, then the margin of error is much greater. Indeed, once we introduce the hypothesis of imperfect competition into standard economic models (eminently justifiable in this context), the very “individual marginal productivity” becomes hard to define. In fact, it becomes something close to a pure ideological construct on the basis of which justification for higher status can be elaborated. (pp. 330–331; emphasis added)

What is interesting here is that although Piketty dismisses the concept of marginal productivity for senior managers and executives, he seems to consider that theoretically it can be measured for those doing ‘replicable’ jobs, albeit imprecisely. This seems a somewhat contradictory position. As the top one per cent took the vast majority of the increase in income over the last 30 years in the USA, and this had nothing to do with their marginal productivity (which, as Piketty notes, cannot be independently measured), how could the remainder of the labour force be paid their marginal products? Nevertheless, it is a short step from Piketty’s statement to assuming that for these employees with replicable jobs, competitive markets will ensure that they are paid the contribution they make to the economy. However, while the evidence discussed later provides support for Piketty’s arguments regarding CEOs’ pay, I shall argue that even for replicable jobs, the marginal productivity theory, qua a theory, is logically problematical.

To show what, in retrospect, may be seen to be a straightforward point, let us, following Piketty, take the example of a small restaurant managed by the owner. The manager has no idea of the elasticity of demand for his meals, and so undertakes a mark-up pricing policy, à la Kalecki. Prices are determined by a mark-up on the unit costs of labour (the salaries of the waiters and chefs) and the ingredients of the meals together with the other capital costs (energy, rates, etc.). Consequently, total revenue is given by:

$$ p_M M \equiv R \equiv \left(1+\pi\right)\left(wL+I\right) $$
(7.1)

where p_M is the price of a meal (M), R is total revenue and I is the value of the ingredients. The operating profit is equal to Π ≡ π(wL + I). The mark-up is determined by the state of competition from other restaurants and the overall level of affluence in the local area, and it is also influenced by a target for the level of profits. Nominal wages are assumed to be determined by the state of the local labour market. The contribution of value added of the restaurant to output as reported in the national income and product accounts (NIPA) is given by:

$$ R-I\equiv Y\equiv wL+\varPi \equiv wL+\pi \left( wL+I\right) $$
(7.2)

Suppose the restaurant is flourishing and the manager considers it desirable to hire a new waiter to speed up the service, but for the sake of argument, the same number of meals is served. Under this pricing policy, the increase in value added (Y) from adding an extra employee, from Eq. (7.2), is definitionally equal to ∂Y/∂L = (1 + π)w. So, if we interpret ∂Y/∂L as the marginal product of labour, we can see that it exceeds the wage rate by the factor (1 + π). This is because the hiring of the extra waiter, through the pricing policy, automatically increases profits at the same time. Consequently, Π is not held constant as L changes, as the neoclassical marginal productivity theory assumes. Of course, if the manager merely passes on the increased labour cost in the form of an increased price of the meal, then, from Eq. (7.2) and holding Π constant, by definition, ∂Y/∂L ≡ w (it should be noted that the greater price of the meal reflects its increased quality, which includes a better speed of service). But this is not the result of optimization using a well-behaved production function subject to a cost constraint. In fact, changes in the local labour market conditions (such as an increase in the minimum wage) that affect the wage rate of the waiter will also cause his/her supposed marginal productivity to change. But the causation runs from the wage rate to the putative marginal productivity.Footnote 3
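The arithmetic of the restaurant example can be checked with a short numerical sketch. All figures below (wage, ingredient costs, mark-up, staff numbers) are hypothetical and chosen only for illustration; the function names are likewise assumptions, not part of the chapter. The code applies Eq. (7.2) directly and shows that the increment in value added from one extra employee equals (1 + π)w by construction, and that it moves whenever the wage moves.

```python
def value_added(wage, labour, ingredients, markup):
    """Value added Y = wL + pi * (wL + I), following Eq. (7.2)."""
    wage_bill = wage * labour
    return wage_bill + markup * (wage_bill + ingredients)


def increment_from_extra_worker(wage, labour, ingredients, markup):
    """Change in Y when one more employee is hired, other things equal."""
    before = value_added(wage, labour, ingredients, markup)
    after = value_added(wage, labour + 1, ingredients, markup)
    return after - before


# Hypothetical figures, for illustration only.
wage = 20_000.0          # annual wage per employee
ingredients = 100_000.0  # annual cost of ingredients (I)
markup = 0.25            # mark-up (pi)
staff = 5                # current number of employees (L)

delta_y = increment_from_extra_worker(wage, staff, ingredients, markup)
print("increment in value added:", delta_y)
print("(1 + pi) * w:            ", (1 + markup) * wage)

# A higher wage (say, after a rise in the minimum wage) raises the
# measured 'marginal product' one for one: causation runs from w to dY/dL.
higher_wage = 22_000.0
delta_y_higher = increment_from_extra_worker(higher_wage, staff, ingredients, markup)
print("increment after wage rise:", delta_y_higher)
```

The increment does not depend on the number of staff or on the cost of ingredients, only on the wage and the mark-up, which is the sense in which it carries no independent information about productivity.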

It should be noted that this applies to a firm that is selling a marketed product to the private sector. But what about the large (public) sector of the economy where there is no independent measure of aggregate output? Much depends upon the way it is calculated. In the early national accounts, the output was just taken to be equal to the total labour compensation with an arbitrary adjustment for capital costs. In many cases, there are measures of physical outputs (such as the number of operations in hospitals, or number of trials in the judicial system, which can be used), but the problem still arises as to how to price or value them. Attempts in the UK have been made to revise the output measures of government services after the Atkinson Review (2005), but insurmountable problems remain for the testing of marginal productivity.

It should be noted that the accounting identity, Y ≡ wL + rK, where Y is income, holds irrespective of the state of competition, whether or not there are well-defined production functions and whether or not firms optimise. If this accounting identity is partially differentiated with respect to labour, we obtain ∂Y/∂L = w and (∂Y/∂L)(L/Y) = wL/Y = a, where a is labour’s share. The expression (∂Y/∂L)(L/Y) = α is the neoclassical definition of labour’s output elasticity and, under neoclassical production theory, is equal to the wage share if there are competitive markets, a well-behaved aggregate production function and factors are paid their marginal products. But from the definition of the national accounts, α must be definitionally equal to the wage share, a. This led Phelps Brown (1957) to comment that labour’s output elasticity of the production function and the wage share “will be only two sides of the same coin” (p. 557).

On a more pragmatic note, Thurow (1975) in his ‘A Do-it-Yourself Guide to Marginal Productivity’ (pp. 211–230) raises some further problems that occur even if output can be valued independently of the inputs. Other questions include the problems posed by disequilibrium, uncertainty, the presence of increasing returns to scale, whether governments can in principle ever pay their employees according to their marginal productivity and to what extent income benefits influence monetary remuneration. As Adam Smith long ago pointed out, production is characterised by the division of labour. The decisions of, say, a CEO will be influenced by the quality of the decisions of his subordinates, and indeed the outcome of different views in the decision-making process. It makes little sense to try to identify the output of an individual in these and similar circumstances. Clearly, even ignoring the problems of the measurement of the monetary value of output independently of the value of wages, there are many other insuperable difficulties noted by Thurow (1975) in the way of providing an adequate test of the marginal productivity theory. These concerns are shared, inter alios, by Stiglitz (2012, p. 97).

4 The ‘Illusion of the Aggregate Production Function’

It is somewhat paradoxical that Piketty, in spite of his reservations about the marginal productivity theory in explaining the wage rate, nevertheless at times explains the changes in the shares going to capital and labour in terms of an aggregate CES production function. Piketty notes that over the last 30 years or so, capital’s share of income has risen in many countries while the ratio of capital to income has also increased. In terms of conventional neoclassical production theory, this change is simply explained in terms of an aggregate production function where capital and labour are paid their marginal products and the elasticity of substitution is greater than unity.Footnote 4 After discussing the effect of bargaining power on factor shares, this is soon ignored and Piketty considers the role of technology and the production function as an explanation for the changes in the functional distribution of income between capital and labour. However, Piketty’s estimates of the capital stock, which are broadly defined, seem to be overstated and the capital-output ratio has fallen. This implies an elasticity of substitution of less than unity, which empirically seems to be the case (Chirinko 2008; Rowthorn 2014). However, in this approach, there is no role for changes in labour market policies, globalization and so on, to affect the functional distribution of income. It is all down to the technology of production. But is it?

4.1 The Cambridge Capital Theory Controversies and the Aggregation Problem

Although the aggregate production function is now widely used in neoclassical economics, there is a fundamental problem as to whether or not it exists. First, there is the question of whether, even when there are well-defined micro-production functions, these can be aggregated to give an aggregate production function. Fisher (2005), who has done more work on this problem than most, comes to the conclusion that micro-production functions cannot be successfully aggregated.Footnote 5

Related to this are the Cambridge capital theory controversies of the 1950s and 1960s. This important debate, largely between Cambridge, UK, and Cambridge, Massachusetts (MIT), has long been relegated to the history of economic thought, forgotten or treated as an esoteric debate in theory (Birner 2002). The first issue centred on whether the theoretical concept of ‘capital’ as a factor of production had any meaning outside the highly restrictive one-commodity world. The upshot was that the answer was ‘no’. Samuelson (1962) published a paper where he purported to show that a production system with more than one technique of production could be represented by a one-commodity aggregate production function. The capital theory controversies, and they were entirely a matter of theory, proved that this construct was untenable. It was also shown that outside a one-commodity world, an increase in the wage rate was not necessarily associated with an increase in the capital-labour ratio (‘capital reversing’). ‘Reswitching’ can also occur, which is when the same technique of production can be the most profitable at two different interest rates.Footnote 6 While even theoretical debates are rarely conclusive in economics, the force of the Cambridge (UK) critique was conceded by Samuelson (1966). However, the results and implications of this debate have long been forgotten by most economists.

So why are aggregate production functions still so widely used?

4.2 Why Aggregate Production Functions ‘Work’?

One reason is that aggregate production functions ‘work’, in that statistical estimations of them give plausible estimates of the parameters. As Solow once remarked to Fisher, “had Douglas found labor’s share to be 25 per cent and capital’s 75 per cent instead of the other way around, we would not now be discussing aggregate production functions” (Fisher 1971, p. 305).

Most neoclassical economists accept Friedman’s (1953) methodological stance that the realism of the assumptions of a model does not matter; what is important is its predictive ability. Ever since Cobb and Douglas’s (1928) seminal paper, many estimations of aggregate production functions have found good statistical fits with the estimated output elasticities close to the factor shares. This has been taken to show that the aggregation problem and the Cambridge capital controversies are empirically irrelevant. Furthermore, this statistical result is interpreted as an indirect confirmation that factors are paid their marginal products.Footnote 7

However, a difficulty arises from the fact that the aggregate production function is an engineering relationship and should be expressed in physical terms (see, e.g., Ferguson 1971, p. 250). In practice, it has to be estimated using constant-price value data for both output (confusingly, sometimes called the ‘volume’ of output) and the capital stock, that is, value measures deflated by a price index. The (erroneous) presumption is that the results of the physical one-sector production function still follow through unaffected. The accounting identity Y ≡ wL + rK must hold for any state of competition, whether or not there are constant returns to scale and, importantly, even if the aggregate production function does not exist. If the identity is differentiated and then integrated at any point of time, then the result is a Cobb–Douglas relationship given by:

$$ Y \equiv wL + rK \equiv B w^{a} r^{1-a} L^{a} K^{\left(1-a\right)} \equiv A L^{a} K^{\left(1-a\right)} $$
(7.3)

where B is the constant of integration and a and (1 − a) are the factor shares.Footnote 8 Equation (7.3) has no behavioural content at all. When cross-sectional observations are used in the statistical estimation of the Cobb–Douglas, a, (1 − a), w and r may all differ across observations. But generally if one were to estimate a putative Cobb–Douglas production function, the ‘output elasticities’ would be close to the factor shares, which would be misleadingly interpreted as confirming that factors of production are paid their marginal products. If the factor shares differ in the cross-sectional data, then the use of a Box–Cox transformation may suggest that a more flexible functional form, such as the CES relationship, may give a better statistical fit and approximation to the accounting identity.
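The step from the accounting identity to Eq. (7.3) can be sketched as follows, on the assumption that the factor shares a ≡ wL/Y and (1 − a) ≡ rK/Y are constant. Totally differentiating Y ≡ wL + rK and dividing through by Y gives:

$$ \frac{dY}{Y} \equiv a\frac{dw}{w} + \left(1-a\right)\frac{dr}{r} + a\frac{dL}{L} + \left(1-a\right)\frac{dK}{K} $$

Integrating with a constant gives:

$$ \ln Y \equiv \ln B + a\ln w + \left(1-a\right)\ln r + a\ln L + \left(1-a\right)\ln K, \qquad B = a^{-a}\left(1-a\right)^{-\left(1-a\right)} $$

The value of B follows from substituting wL = aY and rK = (1 − a)Y into (wL)^a (rK)^(1−a). Taking antilogarithms yields Eq. (7.3) with A ≡ B w^a r^(1−a). No assumption about technology, competition or optimisation is used at any point; only the identity and the constancy of the shares.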

What about estimates of aggregate production functions using time-series data? Following Felipe and McCombie (2013), we can express the argument as follows where the ‘direction of causation’ runs from the identity to the putative production function:

$$ {Y}_t\equiv {w}_t{L}_t+{r}_t{K}_t\Rightarrow {\widehat{Y}}_t\equiv {a}_t{\widehat{w}}_t+{a}_t{\widehat{L}}_t+\left(1-{a}_t\right){\widehat{r}}_t+\left(1-{a}_t\right){\widehat{K}}_t\Rightarrow $$
$$ {Y}_t=F\left({K}_t,{L}_t,t\right)\Rightarrow \mathrm{Cobb}\hbox{-} \mathrm{Douglas};\mathrm{CES};\mathrm{translog}\ \mathrm{production}\ \mathrm{functions} $$
(7.4)

Expressing the accounting identity in growth rates may yield a variety of functional forms, depending upon how the factor shares vary over time, if in fact they do. For expositional ease, if the factor shares are constant, then the accounting identity may be expressed as:

$$ {Y}_t\equiv {A}_0{e}^{\lambda t}{L}_t^a{K}_t^{\left(1-a\right)} $$
(7.5)

where, as in Eq. (7.3), a is labour’s share and λ ≡ aŵ + (1 − a)r̂, with ŵ and r̂ denoting the growth rates of the wage rate and the rate of profit; that is, the share-weighted growth of the wage rate and the rate of profit is constant. If this is the case, estimating the accounting identity will give a perfect fit to the supposed Cobb–Douglas production function. More generally, the identity will give a good fit to time-series data, provided the weighted logarithm of the wage rate and profit rate can be accurately proxied by a time trend. This will often have to be a non-linear function as the wage rate and the profit rate have a strong cyclical component. The use of a linear time trend can give such poor statistical results that it often gives the impression that a behavioural equation is being estimated. It should be noted that this critique does not just apply to the Cobb–Douglas production function. If the identity has changing factor shares due to, say, the relative change in the bargaining power of firms and workers due to globalization, a better transformation of the accounting identity may be given by a CES relationship as in Eq. (7.4) (Felipe and McCombie 2001; Simon 1979). What are the implications? The use of the aggregate production function to determine the output elasticities and, hence, indirectly test and often supposedly confirm the marginal productivity theory of distribution by comparing them to the factor shares is without foundation.
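The point can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reproduction of any estimation in the literature cited: all numerical values are hypothetical, a denotes labour’s share as in Eq. (7.3), and employment and capital are generated as arbitrary random walks so that no production function is involved. The wage and profit rates are chosen so that the factor shares are constant and the share-weighted logarithm of w and r follows a linear time trend. Value added is then computed from the accounting identity alone, and a ‘Cobb–Douglas production function’ is fitted to it by ordinary least squares using NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
T = 40
t = np.arange(T, dtype=float)

# Hypothetical values, for illustration only.
a = 0.7      # labour's share, held constant
lam = 0.015  # constant share-weighted growth of the wage and profit rates
c = 0.5      # level of the share-weighted log of w and r at t = 0

# Employment and capital follow arbitrary random walks with drift:
# no technological relationship between inputs and output is assumed.
log_L = np.log(100.0) + np.cumsum(rng.normal(0.01, 0.03, T))
log_K = np.log(500.0) + np.cumsum(rng.normal(0.03, 0.05, T))

# Choose w and r so that
#   (i)  wL / rK = a / (1 - a)              (constant factor shares)
#   (ii) a*ln(w) + (1 - a)*ln(r) = c + lam*t (linear trend in weighted prices)
gap = np.log(a / (1.0 - a)) - log_L + log_K   # equals ln(w) - ln(r)
log_w = c + lam * t + (1.0 - a) * gap
log_r = c + lam * t - a * gap

# Value added comes from the accounting identity Y = wL + rK and nothing else.
wage_bill = np.exp(log_w + log_L)
profits = np.exp(log_r + log_K)
Y = wage_bill + profits
labour_share = wage_bill / Y

# Fit ln Y = const + lambda*t + alpha*ln L + beta*ln K by least squares.
X = np.column_stack([np.ones(T), t, log_L, log_K])
y = np.log(Y)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
r_squared = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

print("estimated trend coefficient:", coef[1])
print("estimated 'output elasticity' of labour:", coef[2])
print("estimated 'output elasticity' of capital:", coef[3])
print("R squared:", r_squared)
```

Under these assumptions the estimated ‘output elasticities’ reproduce the factor shares and the fit is essentially perfect, even though the data contain no production function. Replacing condition (ii) with a cyclical path for the weighted prices while still fitting a linear trend would be one way to explore the poorer fits discussed above.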

Piketty is aware of the limitations of the aggregate production function and the role of the paradigm in determining what the legitimate questions are.

All economic concepts, irrespective of how ‘scientific’ they pretend to be, are intellectual constructions that are socially and historically determined, and which are often used to promote certain views, values or interests. […] In particular, the notion of the aggregate capital stock K and of an aggregate production function Y = F(K, L) are highly abstract concepts. From time to time I refer to them. But I certainly do not believe that such gross oversimplified concepts can provide an adequate description of the production structure and the state of property and social relations for any society. (Piketty 2015, p. 70)

Given these conclusions, the logical step is to examine how the pay of, say, the top one per cent is determined in practice, looking at the institutional framework within which these salaries are determined. This involves using a completely different framework and discarding the neoclassical paradigm.

5 The Determination of the Pay of CEOs

The increase in overall inequality in incomes has generally been explained in terms of labour market forces: the increasing wage premium for college graduates, the effect of technical change on the increased demand for skills, the effect of globalization, and the weakening of labour and product market policies and institutions (OECD 2011; Autor 2014). But these explanations, such as those based on the supply and demand for skills, are not adequate to explain the rapid rise of the extreme top end of the earnings distribution. Table 7.1 shows the extraordinary increase in the ratio of CEO pay to the average worker’s pay for the USA over the period 1965–2015. There is a rapid rise in the ratio from 1990 to 2000, followed by a sharp dip associated with the bursting of the dot-com bubble, a recovery and then a short-term decline with the Great Recession. The first obvious problem with the marginal productivity explanation is that the rapid growth of CEO salaries since 1990 is not matched by any increase in the efficiency of firms or the growth of total output. According to the Bureau of Labor Statistics, the growth of US labour productivity was 2.2 per cent per annum over the period 1990–2000, 2.6 per cent per annum over 2000–2007 and a mere 1.2 per cent per annum over 2007–2016.

Table 7.1 CEO-to-worker compensation ratio, 1965–2015 (selected years)

The evidence seems to point to the fact that the increase in the share of the top tail of the distribution has been the result of rent extraction and the pay-setting institutions and not the working of competitive markets (Bivens and Mishel 2013).

Compelling evidence that these high salaries are largely rents is that the increase in the top one per cent in the USA has been mirrored in the UK, Australia and Canada, but not to such an extent in the other advanced countries, such as continental Europe, Korea and Japan. The experiences of Japan, Germany and Sweden, where the share of the top one per cent since the 1930s either depicts an L-shaped curve or is flat, are very different from those of the USA, and the UK, where the pattern of inequality follows a U-shaped curve. Alvaredo et al. (2013) suggest that different institutional arrangements and policies may be the reason why similar countries exhibit ‘such diverging patterns’ in inequality. They maintain that “purely technological stories based solely upon the supply and demand of skills can hardly explain such diverging patterns” (Alvaredo et al. 2013, p. 5).

Arguments in support of the contention that CEOs are paid their marginal products in competitive markets are unconvincing. Kaplan (2012) asks how it is that other groups, such as private corporate lawyers, hedge fund investors and private equity investors, have achieved equally significant increases. He further argues that CEO compensation has risen more slowly than the average incomes of the top households, an argument quoted with approval by Mankiw (2013). But as Bivens and Mishel (2013) and Mishel and Davis (2014) have shown, if one uses earners and not households as the comparator, CEO compensation has risen faster. But even if Kaplan (2012) is correct, how does this necessarily demonstrate that top incomes are determined in a competitive market for talent? The rapid growth of the income of these other groups could be largely the result of comparability with CEOs’ remuneration and influenced by the fact that the pay determination of the top earners has changed since the mid-1970s.

Furthermore, in the USA and the UK, the rapid increases in the size and profits of the financial sector have driven up top salaries in this sector. In 2008, in the USA, the finance sector earned a quarter of GDP and 40 per cent of profits (footnote 9). Philippon and Reshef (2012) have estimated that the most significant factor in determining wages in this sector just prior to the subprime crisis was deregulation. This led for a short time, before the crisis, to an increase in the sector’s profits through a rapid increase in leverage and risk taking, the latter caused by the extensive use of financial instruments such as Residential Mortgage-Backed Securities, Collateralized Debt Obligations and Credit Default Swaps on the Collateralized Debt Obligations. Philippon and Reshef (2012) find that the excess wage in finance, the difference between the amount employees earned in this industry and the amount they were predicted to earn, reached 40 per cent, which can largely be attributed to rents.

But clearly, to understand why CEOs’ income has risen so dramatically, it is necessary to examine how their salaries are determined in practice. There is now a great deal of evidence as to how top executives’ pay is set in reality. As Bebchuk and Fried (2004, 2005) have shown, CEOs’ salaries are determined by supposedly independent remuneration committees and directors on behalf of the shareholders. These committees, which can hardly be described as independent (Bebchuk and Fried 2004), are responsible for setting not only the base salary but also bonus schemes, such as stock options and restricted stock, to incentivise the CEO to act in the best interests of the shareholders (Conyon 2006).

There are basically two competing explanations as to whether this is successful. One view is that ‘optimal contracts’ have been introduced for CEOs, and other highly paid executives, and have largely solved the principal–agent problem. The other view is articulated by Bebchuk and Fried (2003, 2004, 2005), who dismiss the optimal contracts literature, referring to it disparagingly as the ‘official story’ (footnote 10). Their central hypothesis is that the determination of executive pay is the result of a process of remuneration committee capture, whereby the CEOs succeed in setting their own compensation. Bebchuk and Fried (2004) call this ‘the managerial power approach’, which is presented as a more convincing alternative to the optimal contracting theory.

According to the optimal contracting approach, CEOs earn what is termed their ‘reservation utility’, which is the remuneration that prevents them from quitting and going elsewhere. According to the managerial power approach, CEO compensation is set as high as possible, subject to an ‘outrage factor’, which has changed for some reason over time. According to the principal–agent approach, the use of options and restricted shares as a substantial part of a CEO’s salary package is seen as an incentive designed to solve an agency problem: CEOs’ compensation is linked to the financial performance of their firms as reflected in their share valuation. According to the managerial power approach, whatever their rationale, options and restricted stock only transfer rents to executives and do not act as an incentive to adopt value-maximising strategies.

Much of the impetus for the rapid increase in the use of stock options as a substantial part of CEOs’ remuneration came from the work of two influential business economists, Jensen and Murphy (1990a, b). Under the standard belief that financial markets are the best judge of the performance of corporations, they encouraged the remuneration committees of companies to award CEOs high compensation (they thought that, at the time, CEOs were underpaid), using stock options in order to attract and retain the best and most talented individuals and to use monetary incentives to align the conflicting interests of shareholders and managers. This ‘pay for performance’ was seen as the best solution to the principal–agent problem. It aligns shareholders’ and CEOs’ interests because, so the argument goes, CEOs are rewarded only if they pursue the principals’ interests, which will be reflected in the firms’ share price.

This ‘optimal contracting’, which is aligned to the ‘maximizing shareholder value’ approach, has been widely adopted in the USA. The success of the management of a firm was to be judged largely, or solely, in terms of the firm’s share price. Typically, top executives have been given options to buy shares at some time in the future at a price fixed in advance, by which time the share price is likely to be higher, supposedly due to the CEOs’ efforts. It is notable that in 2004, on the basis of evidence of the actual effect of the stock options, Jensen et al. made a complete volte-face (footnote 11). However, by then, it was too late.

Consequently, we have an answer to the question posed above: what was the cause of the dramatic rise in CEOs’ pay over the last 30 years or so? If one were to search for an important, or indeed the most important, proximate factor in the growth of CEO pay relative to the mean wage, one need look no further than the widespread use of stock options. Stock options were introduced in addition to CEOs’ salaries: there was no corresponding reduction in the latter when the options were introduced. Starting from the 1980s, there is a high correlation between CEOs’ remuneration and stock prices. Table 7.1 shows the consequences of the move towards a much greater part of the remuneration of CEOs being tied up with stock options and, hence, being closely correlated with stock prices.

Table 7.2 reports the results of regressing the logarithm of CEOs’ annual compensation on the logarithm of the S&P index over the period 1965–2014. The regression results reveal the strong and statistically significant impact of the growth of the S&P index on that of top executives’ pay, with over 80 per cent of the variation of the latter explained (footnote 12). The regression analysis starts by assessing the estimated impact of the lagged level of the S&P index on CEOs’ annual compensation, both without and with a time trend (columns I and II). The time trend is statistically significant, and the S&P index has a positive and statistically significant effect on the level of CEOs’ pay. The same holds when we control for a structural break. Empirical tests reveal a structural break in 1993: the autonomous growth of CEO compensation is positive and significant both before and after that date, and equal to five and two per cent per annum, respectively (columns III and IV). Finally, it is investigated whether there has been any change in the slope coefficient of the S&P index. The slope increased after 1993, but only by a small amount (column V).

Table 7.2 CEO’s annual compensation and S&P 500 index (1965–2014). OLS regressions
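The kind of specification described for Table 7.2 can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: the exact equations behind the table are not reproduced in the text, so the design (a constant, the lagged log index, a linear trend and a post-1993 change in trend) is a guess at one plausible form, and the data are synthetic and noise-free. The "true" coefficients, including trend growth of 0.05 before the break and 0.02 after it, are chosen merely to echo the magnitudes quoted above; none of the numbers are estimates from actual data.

```python
import math

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (A[i][k] - tail) / A[i][i]
    return b

START_YEAR, END_YEAR, BREAK_YEAR = 1965, 2014, 1993

# SYNTHETIC log stock index (hypothetical, not the actual S&P series):
# a linear drift plus a deterministic wiggle so it is not collinear with time.
log_index = {yr: 4.0 + 0.07 * (yr - START_YEAR) + 0.3 * math.sin(yr)
             for yr in range(START_YEAR - 1, END_YEAR + 1)}

# Assumed "true" coefficients used to generate the synthetic pay series:
# constant, elasticity w.r.t. lagged index, pre-break trend, change in trend.
TRUE_COEF = (1.0, 0.6, 0.05, -0.03)

def design_row(yr):
    """Regressors for one year: constant, lagged log index, trend, post-break trend."""
    trend = float(yr - START_YEAR)
    post_break_trend = float(max(0, yr - BREAK_YEAR))
    return [1.0, log_index[yr - 1], trend, post_break_trend]

years = range(START_YEAR, END_YEAR + 1)
X = [design_row(yr) for yr in years]
# Noise-free synthetic log compensation, so OLS should recover TRUE_COEF.
log_comp = [sum(c * x for c, x in zip(TRUE_COEF, row)) for row in X]

coef = ols(X, log_comp)
print("elasticity w.r.t. lagged index:", round(coef[1], 4))
print("trend growth before break:", round(coef[2], 4))
print("trend growth after break:", round(coef[2] + coef[3], 4))
```

Because the dependent variable and the index are both in logarithms, the coefficient on the lagged index reads as an elasticity, and the trend coefficients read as autonomous growth rates per annum. With real data one would add an error term, standard errors and a formal break test, none of which are attempted here.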

Bebchuk and Grinstein (2005) run regressions attempting to explain the rapid rise of CEO compensation over the period 1993–2003 solely in terms of standard industry variables. They conclude that “the growth in pay levels has gone far beyond what can be explained by the changes in market cap and industry mix” (p. 302).

Why did performance-related pay prove ineffective, and merely lead to rapid increases in CEOs’ remuneration? The answer is that in the USA, the structure of a corporation is such that CEOs have enormous influence over the board of directors, who are supposed to be independent and to supervise the CEOs’ conduct and remuneration. Directors often receive large direct and indirect benefits, which are largely at the CEOs’ discretion. Moreover, there are often interlocking pay committees, with CEOs being on each other’s remuneration committees, even if at several times removed. Consequently, CEOs’ remuneration is effectively mutually determined. There are spillover effects into the public sector, where large pay increases for top managers are justified by reference to comparable private-sector pay, often judged merely by the size of the organization rather than by any reference to its profitability (Bebchuk and Fried 2003).

Bebchuk and Fried (2004) analyse in detail the performance-related pay schemes, with a view to determining whether these more closely resemble the optimal contracting approach (according to the principal–agent theory) or the so-called managerial power approach. They find that the structure of the compensation schemes provides compelling evidence for the managerial power approach. Performance pay in the private sector is often linked to the overall increase in the value of the company’s shares, not to how the company performs relative to the stock market overall. Ideally, CEOs’ compensation should reflect only the degree to which company performance has been affected by their actions. If the value of all shares increases, as happens during a stock market boom, then additional compensation should go only to the CEOs of those companies whose stock prices rose more rapidly than the average. But this never occurs in practice. CEOs receive stock options with a fixed price and can achieve considerable payments from these, even if their stock increases less than the market (Bebchuk and Fried 2004). Moreover, many of the arrangements for CEOs’ pay are far from transparent, which is the opposite of what one would expect if the principal–agent problem were to be minimised.
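The contrast between a fixed-price option and one that rewards only performance relative to the market can be made concrete with a small numerical sketch. The function names, the particular form of indexing (raising the exercise price in line with the market return) and all of the figures are hypothetical, chosen purely for illustration; they are not taken from Bebchuk and Fried (2004) or from any actual compensation contract.

```python
def fixed_strike_payoff(n_options, strike, final_price):
    """Payoff at exercise of conventional options with a fixed exercise price."""
    return n_options * max(0.0, final_price - strike)

def indexed_payoff(n_options, strike, final_price, market_return):
    """Payoff when the exercise price is raised in line with the market return,
    so that only performance above the market is rewarded (one simple,
    assumed form of indexing)."""
    indexed_strike = strike * (1.0 + market_return)
    return n_options * max(0.0, final_price - indexed_strike)

# Hypothetical case: the firm's share price rises from 50 to 60 (20 per cent)
# during a boom in which the market as a whole rises by 40 per cent.
n_options = 100_000
strike = 50.0
final_price = 60.0
market_return = 0.40

fixed = fixed_strike_payoff(n_options, strike, final_price)
indexed = indexed_payoff(n_options, strike, final_price, market_return)

print("fixed-price options pay:", fixed)
print("market-indexed options pay:", indexed)
```

In this hypothetical case the firm underperforms the market, yet the fixed-price options still pay out on the whole of the 10-unit rise in the share price, whereas the indexed options pay nothing.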

The remarkably small number of financial linkages that connect most of the world’s international firms has been demonstrated by Vitali et al. (2011). They used complex network analysis to trace the cross-holdings between 43,060 transnational corporations and found that 147 of these companies had control of 40 per cent of the value of transnational corporations, and 737 had control of 80 per cent. This close interrelationship not only poses severe problems for economic stability but also shows how a very small network of top managers could come to set their own salaries based on a circular notion of comparability (footnote 13).

In other words, according to the evidence, the rapid increase in CEOs’ remuneration has been driven more by rent extraction than by a well-functioning competitive market for senior executives. Moreover, while changes in income distribution need not be a zero-sum game, there is overwhelming evidence that the rise in the share of the top one per cent has been at the expense of the remaining 99 per cent. The relationship between work effort and pay in the neoclassical schema (work is seen merely as a disutility) is over-simplistic. Many CEOs and top earners gain a great deal of utility through the power and prestige of their positions, and it is doubtful whether their work effort would decrease if their earnings were taxed more or their salaries were lower.

6 Summary and Conclusions

The last three or four decades have seen an explosion in the pay not only of CEOs but also of managers in the non-private sector. What was once considered an unacceptable salary for the top earners compared to the average remuneration has now become commonplace. The whole question of the remuneration of top executives and managers is one that involves a consideration of how these payments are determined and of social norms about what is acceptable. These social norms are not those of society as a whole, but rather those of the people involved in the determination of these salaries. Clearly, an important question is how these social norms (or moral outrage) are determined and how and why they change over time.

What is clear, however, is that any defence by neoclassical economists of the rapid increase in the earnings of the top one per cent, based on the notion of marginal productivity and the concept of ‘just deserts’, is untenable. I have highlighted the insurmountable theoretical problems concerning the marginal productivity theory of factor pricing and the related concept of the aggregate production function. But what is also telling is that for the neoclassical approach, grounded in the need for microfoundations and making extensive use of the individual representative agent, it is impossible to test whether the remuneration of a specific individual represents his or her contribution to society. The chapter has considered the way that CEOs are remunerated. It is clear that the rapid increase in their pay, and in that of the top one per cent, represents a change in societal values and in their managerial power, a concept that fits uncomfortably within neoclassical economics. In fact, the debate over the pay of CEOs merely serves to emphasise that the neoclassical approach, in relying on the marginal productivity theory of distribution, does not have a coherent theoretical explanation of wage determination.