Introduction

Aggregation is the practice of combining the multidimensional into a single dimension. It is used in all aspects of science. Aggregation occurs when a physicist sums the mass of many particles, or when an ecologist sums the energy consumption of an ecosystem. The use of aggregation is so commonplace that its epistemology is often given little thought. This is particularly true in economics, a field that tends to hide epistemological questions under a fog of mathematics (Mirowski 1991; Keen 2001). Aggregation is often portrayed as a purely objective process. After all, what could be more objective than the act of adding things up?

This paper highlights some basic problems with aggregation in economics. As I see it, aggregation involves two types of decisions:

  1. Choosing a system boundary;

  2. Choosing a measurement dimension.

When we choose a system boundary, we decide what to include in our measurement, and what to exclude. When we choose a measurement dimension, we decide how to make the incommensurable commensurable. The problem is that both boundary and dimensional choices are subjective: they depend on our goals. Yet these decisions affect the resulting aggregation. Giampietro et al. (2006) call this the “epistemological predicament associated with purposive quantitative analysis”: “the observer always affects what is observed when defining the descriptive domain”.

This paper explores the aggregation predicament, with a specific focus on measurement dimensions. I discuss how dimensional problems affect economic aggregation, and explore the implications for ecological and biophysical economics.

Moving Beyond the ‘Boundary Critique’

Critics of economic aggregation usually focus on boundary decisions. This is understandable. The national accounts are based on dubious boundary choices. For instance, they exclude unpaid domestic work (Messac 2018; Waring 1999). They also exclude environmental degradation, social ‘bads’, resource depletion, and ecosystem services (Daly and Cobb 1994; Daly and Farley 2011; Dixon and Hamilton 1996; Costanza and Daly 1992; Kubiszewski et al. 2013).

I agree that the national accounts use questionable boundaries. I also agree that choosing ‘better’ boundaries seems like a good idea. However, I am concerned that the ‘boundary critique’ distracts us from a more fundamental problem. Economists have based their accounting system on the dimension of monetary value. Yet this dimension is unstable. Prices change over time in divergent ways. This changing meter stick wreaks havoc with objective measurement. Should we reform a system based on such an unreliable dimension? I argue we should not. Instead, we need to ask some more basic questions. What are we trying to sustain? What dimension is appropriate? There are no simple answers. But as long as we focus only on boundaries, we will not ask these important questions.

Goals

This paper has three goals. The first goal is to show how dimensional choices affect aggregation (“The Dimension Problem” section). When we choose dimensions, we choose how to weigh different attributes against one another. The problem is that this choice affects the resulting aggregation. The usefulness of an aggregation thus depends on agreement about the appropriate dimension. If our goals are contested and the relevant dimensions are ambiguous, aggregation should be avoided.

My second goal is to show what goes wrong when we choose monetary value as the aggregation dimension (“Monetary Value: The Changing Meter Stick” section). The national accounts use monetary value to aggregate economic output (among other things). I discuss how dimensional problems undermine this approach. The problem is that prices—our unit of analysis—are unstable over time. This instability wreaks havoc with objective measurement. When we attempt to ‘correct’ for inflation, we must make many subjective decisions. The result is a measure that is riddled with uncertainty. I discuss how this affects attempts at national accounts boundary reforms. I also discuss the implications for economic growth accounting.

My third goal is to highlight how the dimension problem affects economic decision-making (“Aggregation and ‘Optimal’ Decision-Making” section). Neoclassical economists often claim to identify policies that are optimal (i.e. best for everybody). This approach has significantly influenced sustainability policy. Yet it has a simple problem. Optimization requires aggregation. Thus, the search for ‘optimal’ policy inherits all the dimensional pitfalls of aggregation itself. To deal with these problems, I propose a checklist to determine if optimization is appropriate. If the checklist is not met, then the use of optimization is likely pernicious. It gives ethical and moral preferences the appearance of scientific rigor.

I conclude with thoughts about how to address the aggregation dimension problem (“Addressing the Aggregation Problem” section). Although there are no ‘solutions’, there are ways to cope with the problem. For too long, economic aggregation problems have simply been ignored. If economics is to be reintegrated with the natural sciences (Hall et al. 2001), these issues must be addressed.

The Dimension Problem

Aggregation requires making the incommensurable commensurable. We begin with incommensurable items—‘apples’ and ‘oranges’—and then use a common dimension to make them commensurable. The dimension converts qualities into quantities that can then be universally compared.

Throughout this paper, I will speak of dimension choice, a concept that is likely foreign to many natural scientists. In the context of basic science, dimensions are not usually thought of as a ‘choice’; they are usually taken as a given. For instance, if we want to measure inertia, it is taken as a given that we should use the dimension of mass. But what this means is that there is near-unanimous consent that ‘mass’ is the appropriate dimension for measuring inertia. This follows from Newton’s laws, which state that resistance to acceleration (inertia) is proportional to mass. But we should not forget that it has not always been obvious that mass is the relevant dimension for inertia. For instance, on Earth a feather falls more slowly than a brick. Perhaps this means that inertia is related to the dimension of surface area? That we can exclude this possibility (and instead point to the single dimension of mass as the measure of inertia) is an important scientific achievement.

In economics, things are quite different. For instance, there is no well-tested theory that singles out the correct dimension for economic output. In economics, aggregation dimensions are a subjective choice. The dimension problem stems from this choice. Simply put, the subjective choice of dimension affects the aggregation.

When There is No Dimension Problem

Let us begin with instances when there is no dimension problem. This happens when we aggregate items that are identical and unchanging. In this case, we are aggregating items that are already commensurable. Thus, dimensional choices do not affect the aggregation.

This is illustrated in Fig. 1. Here, we imagine aggregating a stock of identical apples. Clearly, the apple stock in Scenario A is half that of Scenario B. This is true no matter what dimensions we use to aggregate apples (mass, volume, energy, etc.). Since all apples are identical, they all share the same attributes. Thus, the choice of attribute does not affect the stock-size ratio between the two scenarios. There is no dimension problem because aggregation reduces to arithmetic.

Fig. 1 Unambiguous aggregation

This type of aggregation is something we rarely do in the real world. Yet it is a common assumption in economic theory. For instance, Solow (1956) begins his famous treatise on economic growth theory by assuming that “There is only one commodity, output as a whole”. His reason for doing so is telling. It is so he can “speak unambiguously of the community’s real income” (ibid). What does he mean by this? Solow is essentially assuming away the dimension problem. In a one-commodity economy, dimensional choices do not affect the measured growth of economic output. Thus, changes in output are completely unambiguous, as are changes in real income.

Solow is not alone in making this assumption. The single-commodity economy is a foundational assumption in neoclassical theory. For instance, Colacchio (2018) observes that “the only case consistent with the [neoclassical] marginal productivity theory is that of a ‘one-commodity’ economy”. Neoclassical economic theory assumes a world in which there is no ambiguity in aggregation. That this assumption is obviously violated in the real world is a telling indictment of the theory.

Illustrating the Dimension Problem

I now move on to the more realistic scenario of aggregating items that are not identical. In this situation, the choice of dimension is not neutral.

The best way to understand the dimension problem is through a two-commodity example. Suppose we are shopkeepers who have a stock of apples and bread slices. Like many shopkeepers, we are not satisfied to state that we have x apples and y slices of bread. Instead, we want to know the size of our total inventory. How do we go about calculating this quantity?

Let us set aside the fact that most shopkeepers care about the monetary value of their stock. (I will deal with monetary value later). Instead, let us assume that we want a physical measure of the size of our stock. This is simple enough to do—we just need to choose a dimension of analysis. Let us choose between dimensions of mass, volume, and energy. Table 1 shows realistic values for the average mass, volume and energy content of apples and bread slices. We simply choose one of these dimensions, and use it to aggregate our total stock.

Table 1 Measuring apples and bread slices using different dimensions

But there is a problem. The choice of dimension is subjective—it depends on our goals. Yet this choice plays a crucial role in determining the measurement results. To understand this dilemma, it is helpful to reflect on what choosing a dimension does. The dimension determines the relative weight assigned to each element. In our example, the dimension determines how we weight apples relative to bread slices. The problem is that different dimensions lead to different weightings. The dimensions of mass, volume, and energy lead to different weightings between apples and bread slices:

$${\text{Mass}}:\,1\,{\text{apple}} = 2.5\,{\text{bread}}\,{\text{slices}} \tag{1}$$

$${\text{Volume}}:\,1\,{\text{apple}} = 2.0\,{\text{bread}}\,{\text{slices}} \tag{2}$$

$${\text{Energy}}:\,1\,{\text{apple}} = 0.5\,{\text{bread}}\,{\text{slices}} \tag{3}$$

These different weightings can lead to divergent measures for the aggregate stock of apples and bread slices. We can illustrate this problem by constructing an indexed time series of the size of our stock. Over a period of 30 h, suppose the individual stock of apples and bread slices changed as shown in Fig. 2a. Assuming that apples and bread slices are uniform, we can state that the stock of bread slices increased by 164%, while the stock of apples decreased by 70%. There is no ambiguity here. We would get the same result no matter what dimension of analysis we chose for each series.

Fig. 2 Aggregating a stock of apples and bread slices. This figure shows how the choice of dimension of analysis affects aggregate measures of quantity. We imagine that a shopkeeper has a stock of apples and bread slices. a shows how the number of apples and bread slices changes over a period of 30 h. b shows three different indexed aggregate measures of the same stock, calculated using dimensions of energy, volume, and mass (with values from Table 1). Different dimensions lead to a different weighting between apples and bread slices, which causes divergent measures for the growth of the aggregate stock. Reproduced with permission from Fig. 8.1 in Nitzan and Bichler (2009)

However, this is not true when we move to an aggregate analysis. Figure 2b shows the indexed growth of the aggregate apple–bread stock. Three time series are shown—one for each dimension of analysis. Note the significant discrepancy between the three series. When measured in terms of caloric energy, the size of our apple–bread stock increases by 86%. Yet when measured in terms of mass, the same stock appears to decrease in size by 3%. What is going on here?

This large discrepancy is caused by our dimensions. When we change dimensions, we change the relative weighting between apples and bread slices. This affects how much we weight the increase in the number of bread slices against the decrease in the number of apples. The result is a divergence in the indexed growth of our stock.
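The divergence can be reproduced with a short calculation. The following Python sketch uses the weightings from Eqs. 1 to 3 and the endpoint changes quoted above (apples down 70%, bread slices up 164%). The starting stock of 100 apples and 100 bread slices is an assumption made purely for illustration, since the text does not state the initial counts.

```python
# Weight of one apple, expressed in bread-slice equivalents (Eqs. 1 to 3).
APPLE_WEIGHT = {"mass": 2.5, "volume": 2.0, "energy": 0.5}

# Assumed starting stock (hypothetical; not given in the text).
apples_start, bread_start = 100.0, 100.0

# Endpoint changes quoted in the text: apples -70%, bread slices +164%.
apples_end = apples_start * (1 - 0.70)
bread_end = bread_start * (1 + 1.64)


def aggregate(apples, bread, apple_weight):
    """Total stock in bread-slice equivalents for one dimension."""
    return apple_weight * apples + bread


def indexed_growth(apple_weight):
    """Ratio of the final aggregate stock to the initial aggregate stock."""
    start = aggregate(apples_start, bread_start, apple_weight)
    end = aggregate(apples_end, bread_end, apple_weight)
    return end / start


growth = {dim: indexed_growth(w) for dim, w in APPLE_WEIGHT.items()}
for dim, ratio in growth.items():
    print(f"{dim}: {100 * (ratio - 1):+.0f}%")
```

Under these assumptions the sketch gives a decline of about 3% for mass and an increase of 86% for energy, in line with the figures reported above, plus an increase of about 8% for volume. The same physical stock yields all three answers; only the weighting differs.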

It might seem reasonable to ask—which index is the ‘correct’ measure of aggregate quantity? However, this question is ill posed. All three measures are correct in a mathematical sense. Instead, we should ask—which measurement is appropriate given our goals? It is here that subjectivity enters the equation. Suppose we want to use our stock to feed a starving population. In this context, caloric energy content seems the most appropriate choice of dimension. But if we wanted to calculate shipping costs, then mass is likely the best dimension. The choice of dimension depends on our goals. Yet it affects the aggregate measurement. This is the crux of the dimension problem.

To summarize, aggregation requires subjective choices about the dimension of analysis. These choices then affect the resulting measurement. Once it is pointed out, the dimension problem is simple to understand. It borders on trivial. Yet it has far-reaching consequences for economics. By the end of this paper, the reader should see a repeating story. Aggregation requires subjective decisions. Unsurprisingly, economists make subjective decisions to aggregate. But here is the unsettling part. They do not acknowledge that these decisions are subjective. Moreover, they do not explore the consequences of making different decisions. This behavior is anathema to good science. It needs to be fixed if we wish to construct a legitimate “science of sustainability” (Dodds 1997).

Monetary Value: The Changing Meter Stick

I move now to a dimension problem that is unique to economics. A defining feature of economics is its use of monetary value as a dimension of analysis. I will first discuss when this is unproblematic. If our interest is in prices themselves, then monetary value is a valid dimension of analysis. However, economists often use prices as a means to measure ‘real’ quantities of production. When used this way, we run into a sea of epistemological problems. The result is irreducible measurement uncertainty.

Prices for their Own Sake? Or Prices for ‘Real’ Quantities?

There are two ways to think about money and prices. The first is to think like a capitalist. The second is to think like an economist.

Capitalists are interested in prices for their own sake. A capitalist does not generally care ‘what’ or ‘how much’ he owns (in any physical sense). Instead, he cares about the monetary value of what he owns. And this is only relevant in comparison to the value of other things. Nitzan and Bichler (2009) observe that capitalists are interested in differential comparison—comparing the monetary value of one thing to another (at a given point in time). If we aggregate monetary value for differential comparison, there are no epistemological problems (although we may raise ethical objections).

In contrast, economists are generally not interested in prices for their own sake (Footnote 1). Instead, they are interested in the ‘real’ sphere of production. Economists want to know ‘what’ and ‘how much’ is produced. Prices are merely the window into this ‘real sphere’, a facade that needs to be removed. Economists suppose that aggregate market value (Y) can be divided into two components, prices (P) and some ‘real’ quantity of production (Q):

$$Y = P \times Q \tag{4}$$

The quantity of production Q is then given by Y/P. This method is how ‘real’ GDP is calculated. We take nominal GDP and ‘adjust’ for inflation using a price index. The result is ‘real’ GDP—the ‘real’ quantity of production.
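As a minimal sketch of this deflation step, the Python snippet below divides a nominal series by a price index, period by period. The numbers and the function name `real_output` are hypothetical and chosen only for illustration; they are not official data or an official procedure.

```python
def real_output(nominal, price_index):
    """Deflate a nominal series: Q = Y / P for each period."""
    if len(nominal) != len(price_index):
        raise ValueError("series must have the same length")
    return [y / p for y, p in zip(nominal, price_index)]


# Hypothetical nominal GDP and price index (base period = 1.0).
nominal_gdp = [100.0, 110.0, 121.0]
price_index = [1.00, 1.05, 1.10]

real_gdp = real_output(nominal_gdp, price_index)
for year, q in enumerate(real_gdp):
    print(f"period {year}: real GDP = {q:.1f}")
```

The arithmetic is trivial once a price index is in hand. All of the difficulty lies in constructing the single series P from many prices that move differently.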

This type of ‘real’ measurement is a foundational goal of the national accounts. Taken at face value it appears to be unproblematic. But when we dig beneath the surface, we find that ‘adjusting’ for inflation requires a host of subjective decisions. The problem is that prices are an unstable unit—a changing meter stick that wreaks havoc with objective measurement.

The Purpose of a Unit

What goes wrong when we aggregate using ‘real’ monetary value? To understand the problem, we need to understand the measurement role of units. A unit’s purpose is to be uniform. Thus, when we use a meter stick, the actual length of the stick is not important. What matters is that every meter stick is as close to the same length as possible. For this reason, scientists take great efforts to precisely define units. For instance, the meter is now defined as the length of the path traveled by light in vacuum during a time interval of 1/299,792,458 of a second (Petley 1983).

A clearly defined unit makes precise measurement possible. Conversely, a poorly defined unit makes precise measurement impossible. Consider the ‘foot’ as a unit of measure. Although it is now precisely defined, the ‘foot’ originates in the practice of using literal human feet to measure length (Dilke and Dilke 1987). Since foot size varies between individuals, we can imagine how this led to uncertainty in measurement. Prices, it turns out, fail the uniformity condition in a spectacular way. This means that they fail as a unit of measure.

Divergent Price Change

The problem is not so much that prices change; it is that prices change in non-uniform ways. I want to emphasize this point, because we often think of inflation as a uniform increase in all prices. If this were true, ‘correcting’ for inflation would be unproblematic. In reality, price change is not uniform. It varies dramatically between different commodities.

Figure 3a, b illustrate this effect using data from the US Consumer Price Index (CPI). Figure 3a shows price changes for ten selected CPI commodities, while Fig. 3b shows price change for all CPI commodities. The range of price change is astonishing. Since 1935, the price of apples increased by a factor of 50, the price of electricity increased by a factor of 7, and the price of TVs actually declined. (TV price decline has mostly to do with quality adjustments, discussed below).

Fig. 3 Divergent price change and divergent measure of real GDP. This figure shows how divergent changes in price affect the measurement of US real GDP. a shows historical price changes in ten selected commodities tracked by the Bureau of Labor Statistics. b shows divergent price change for all CPI commodities. Divergent price change means that the choice of base year has a strong effect on the measurement of real GDP growth, as shown in c. For sources and methods, see the "Appendix"

This divergent price change means that our unit is unstable. The effect on aggregation is the same as when we changed dimensions in our apple–bread example (Fig. 2). Divergent price changes cause the relative weighting between commodities to change with time. This means that our measure of aggregate quantity is affected by our choice of price ‘base-year’. This problem was identified over a century ago by Francis Edgeworth (1887):

If one great group of commodities varies pretty uniformly in one direction, and another in a different direction (or even in the same direction but in a markedly different degree), then the task of restoring the level of prices can no longer be regarded as a purely objective ... problem. (cited in Vining and Elwertowski (1976); emphasis added)

To quantify the scale of the problem, we can calculate the relative standard deviation (Footnote 2) of US price change. Between 1935 and 2013, price change for all CPI commodities had a relative standard deviation of about 40%. This 40% uncertainty makes other poorly defined units look precise in comparison. Consider the unit of the ‘man’, defined as the length of the man doing the measuring. It is hard to imagine doing accurate science with this length unit. Yet the uncertainty in male height is only about 4% (Footnote 3). Thus, the unit of the ‘man’ is about ten times more accurate than using US prices as a unit of measure.
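The comparison can be sketched in a few lines of Python, assuming the common definition of relative standard deviation as the standard deviation divided by the mean. The price-change factors below are hypothetical, chosen so that the result matches the 40% quoted above; they are not CPI data.

```python
import statistics


def relative_std(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# Hypothetical price-change factors for eight commodities (illustrative only).
price_change = [2, 4, 4, 4, 5, 5, 7, 9]
rsd_prices = relative_std(price_change)

# Relative uncertainty in male height, as quoted in the text.
rsd_height = 0.04

print(f"prices: {rsd_prices:.0%}, height: {rsd_height:.0%}")
print(f"ratio: {rsd_prices / rsd_height:.0f}x")
```

With a relative standard deviation of 40% for prices against 4% for height, the ratio is ten, which is the sense in which the ‘man’ is about ten times more accurate.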

Price Instability Leads to Real GDP Uncertainty

The problem with price instability is that it leads to ambiguity when we try to ‘adjust’ for inflation. This leads to ambiguity in time-series based on ‘real’ monetary value. Real GDP is the most ubiquitous such time-series, so I will use it as an example here. The ambiguity in real GDP is not easy to spot. Governments publish only a single official measure of real GDP, and they do not report uncertainty. But if we look under the hood of real GDP calculations, we find significant ambiguity.

Let’s review the problem. To calculate real GDP, we pick a base year and hold prices constant. Prices in this year assign a relative weight to each commodity. We then use these weights to aggregate all commodities into a single measure of economic output. The problem is that prices change in non-uniform ways over time. This means that different base years assign a different weighting between commodities. The choice of base year, therefore, affects the growth of real GDP. Since the choice of base year is subjective, we are left with unavoidable ambiguity in our measure. (This problem is the same as the dimension problem illustrated in Fig. 2. When we change base-year prices, the effect is the same as literally changing dimensions).
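A two-commodity sketch in Python makes the base-year effect concrete. All prices and quantities are hypothetical, as are the names `value_at` and `real_growth`. The only feature that matters is that the two prices change in different directions between the early and late years.

```python
def value_at(prices, quantities):
    """Aggregate value of a bundle of quantities at a fixed set of prices."""
    return sum(prices[good] * quantities[good] for good in quantities)


def real_growth(base_prices, q_start, q_end):
    """Growth factor of 'real' output when prices are held at base-year levels."""
    return value_at(base_prices, q_end) / value_at(base_prices, q_start)


# Hypothetical quantities produced in an early and a late year.
quantities_early = {"food": 100, "computers": 10}
quantities_late = {"food": 110, "computers": 40}

# Hypothetical prices: food becomes dearer, computers become cheaper.
prices_early = {"food": 1.0, "computers": 50.0}
prices_late = {"food": 2.0, "computers": 10.0}

growth_early_base = real_growth(prices_early, quantities_early, quantities_late)
growth_late_base = real_growth(prices_late, quantities_early, quantities_late)

print(f"early base year: output grows by a factor of {growth_early_base:.2f}")
print(f"late base year:  output grows by a factor of {growth_late_base:.2f}")
```

With these made-up numbers, early-year prices imply that output grew by a factor of about 3.5, while late-year prices imply a factor of about 2.1. The quantities are identical in both calculations; only the weights differ.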

Sometimes the base-year effect can be enormous. Nigeria recently switched from a 1990 to a 2010 base year, resulting in a doubling of GDP (Blas and Wallis 2014). A similar doubling of GDP occurred when Ghana changed its base year from 1993 to 2006. Base year revisions in Botswana, Kenya, Tanzania and Zambia have also led to large changes in GDP (Jerven 2012, 2014).

In the US, the uncertainty in real GDP growth is sizable. Figure 3c shows how the choice of base year affects the growth of US real GDP per capita. This analysis indicates a 30% uncertainty in the growth of US GDP per capita over the last 60 years. This estimate is conservative because it does not include the uncertainty involved in quality-change adjustments (discussed below). Interestingly, the official measure of real GDP growth (dashed line) is at the upper end of the uncertainty range. Is this a coincidence? It would be interesting to repeat this analysis for other countries to see if official real GDP measures always lie at the upper end of the base-year uncertainty range. This would indicate systematic bias in government methods.

Of course, measurement uncertainty is an unavoidable part of empirical science. Good science requires being honest and open about measurement uncertainty. The problem in economics is twofold. First, we cannot reduce the uncertainty associated with the base-year problem. This is because the problem resides in the unit itself (prices). So long as price change is divergent, we cannot avoid the base-year problem.

Second, the economics discipline is not open about measurement uncertainty in real GDP. US Government economists are aware of the base-year problem (Footnote 4). But instead of admitting that this leads to uncertainty, they have taken the opposite road. The US government has imposed an official way of hiding the problem. This is called the ‘chain-weighting’ method (Steindel 1995). It involves using a moving average over multiple base years. But this approach does not solve the underlying problem. Indeed, the problem cannot be solved. Prices are an unstable unit of measure, and no amount of mathematical wizardry can change this.
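To make the mechanics concrete, here is a sketch of one common chain-weighting formulation: a chained Fisher quantity index, in which each year-to-year link is the geometric mean of the growth measured at the earlier year's prices and at the later year's prices. Treating this as the variant meant here is an assumption, and all data and function names are hypothetical.

```python
import math


def value_at(prices, quantities):
    """Aggregate value of a bundle of quantities at a fixed set of prices."""
    return sum(prices[good] * quantities[good] for good in quantities)


def fisher_link(p_prev, q_prev, p_next, q_next):
    """Geometric mean of growth at earlier-year and later-year prices."""
    at_prev_prices = value_at(p_prev, q_next) / value_at(p_prev, q_prev)
    at_next_prices = value_at(p_next, q_next) / value_at(p_next, q_prev)
    return math.sqrt(at_prev_prices * at_next_prices)


def chained_index(prices, quantities):
    """Cumulative quantity index built by multiplying successive links."""
    index = [1.0]
    for t in range(1, len(prices)):
        link = fisher_link(prices[t - 1], quantities[t - 1],
                           prices[t], quantities[t])
        index.append(index[-1] * link)
    return index


# Hypothetical three-year economy with divergent price change.
prices = [
    {"food": 1.0, "computers": 50.0},
    {"food": 1.5, "computers": 25.0},
    {"food": 2.0, "computers": 10.0},
]
quantities = [
    {"food": 100, "computers": 10},
    {"food": 105, "computers": 20},
    {"food": 110, "computers": 40},
]

chain = chained_index(prices, quantities)
print([round(x, 3) for x in chain])
```

Each link lands between the two fixed-base answers for that pair of years. The averaging picks a point inside the range of ambiguity, but the range itself remains.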

The Quality-Change Problem

The base-year problem is not the only issue with aggregating using ‘real’ monetary value. We must also measure the changing quality of commodities. Economists adopt the following convention: changes in commodity quality are converted into changes in economic quantity. But how should we measure this quality change?

Consider the example of computers. In 1993, imagine that an economy produced 100 Apple IIe computers. In 2017, the same economy produced 100 iMac Pros. Has economic output remained the same? If not, how much has it changed? To answer this question we must convert computer quality changes into quantity changes. But how should we do this? Here, the dimension problem rears its head again. The quality-to-quantity conversion depends on the dimension of analysis. In terms of mass, computer output has probably declined in our hypothetical economy. But in terms of processing power, computer output has greatly increased. Thus, the choice of quality-change dimension affects our measure of output change.

Again, this is just the dimension problem. How we choose to measure output determines how we measure quality change (and vice versa). The choice then affects our results. But in economics we run into a further problem. Economic output is ostensibly measured using monetary value. But monetary value is unreliable for measuring quality change. Why? Because prices change over time even when commodities stay the same. Price change might reflect a change in a commodity’s quality. But it might also reflect pure inflation. The logical conclusion should be that prices cannot measure quality change.

But this is not the conclusion that most price-index economists reach. Instead, they assume that prices reveal a hidden dimension that itself measures quality change. What is this dimension? It is utility: the pleasure derived from a product. Describing this ‘hedonic’ approach, the US Bureau of Labor Statistics writes:

In Price Index Methodology, hedonic quality adjustment has come to mean the practice of decomposing an item into its constituent characteristics, obtaining estimates of the value of the utility derived from each characteristic, and using those value estimates to adjust prices when the quality of a good changes. (BLS 2010) [emphasis added]

Let’s unpack what is going on here. Price index statisticians are using neoclassical theory to justify a particular way of measuring quality change. The idea is that utility is the relevant dimension of quality change. The problem is that utility is unobservable on its own (Nitzan and Bichler 2009). Instead, it is ‘revealed’ through prices (Samuelson 1938, 1948). This inversion makes the whole process circular. Indeed, Robinson (1962) famously observed that utility is a circular concept: “Utility is the quality in commodities that makes individuals want to buy them, and the fact that individuals want to buy commodities shows that they have utility” [emphasis in original].

This discussion boils down to basic questions about the dimension of economic output. Is it (unobservable) utility, as hedonic quality adjustment implies? Or is it something else entirely? Economists need to take this dimension problem seriously.

The scale of the problem is illustrated in Fig. 4. Here I return to the example of computers. Figure 4a plots measures of computer quality change adopted by eight different OECD nations. Bars show the annual average percentage change in computer quality from 1995 to 2001. (For methods used to derive this data, see the "Appendix"). The dispersion in these measures has little to do with the computers themselves. Computers are produced using a global supply chain. To a first approximation, we can treat them as being the same in all countries. Instead, this dispersion results from the different methods used to measure computer quality change. Summarizing the methods in 2004, the OECD observes:

The United States, Canada, France and Australia employ hedonic methods, and show the fastest rates of price decline. Although a hedonic price index has recently been developed in Germany, and introduced into the consumer price index, the investment deflator shown here is still based on the previous methodology. This explains its slower rate of change. No hedonic adjustment is carried out in Italy and in the United Kingdom. Japan constructs a hedonic producer price index for ICT hardware but it is not clear whether this deflator is also used in the national accounts. (OECD 2004)

Fig. 4 Divergent measures of computer quality change. This figure illustrates the dispersion in national estimates for rate of change of computer quality. a shows computer quality change estimates for eight OECD nations. Bars represent the average annual growth rate of computer quality between 1995 and 2001. b shows how these quality change measurements would affect the growth of computer ‘output’ over 30 years. Assuming the number of computers produced remains the same in each year, the different quality adjustments lead to divergent measures of computer output growth spanning three orders of magnitude

The scale of this quality-change dispersion is deceptively large. Consider what happens when we project it over thirty years. We assume that the number of computers produced in each country remains constant over time. But we continue to apply the quality-change adjustments shown in Fig. 4a. How is computer output affected?

Remember that quality change metrics convert qualities into quantities. If the quality of computers improves by a factor of 2, this gets converted into a factor of 2 growth in computer quantity. Figure 4b shows the results of a 30-year extrapolation. The differing quality-change adjustments lead to a three orders of magnitude disparity in the measured growth of computer output.
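The extrapolation is simple compounding, which a short Python sketch can illustrate. The annual rates below are hypothetical placeholders rather than the national estimates plotted in Fig. 4a, and the function name `compounded_output` is likewise an assumption.

```python
def compounded_output(annual_quality_growth, years=30):
    """Growth factor of measured output when unit counts stay constant
    and only the quality adjustment compounds each year."""
    return (1.0 + annual_quality_growth) ** years


# Hypothetical annual quality-growth rates (not the OECD estimates).
hypothetical_rates = {"no adjustment": 0.00, "moderate": 0.10, "rapid": 0.26}

factors = {label: compounded_output(rate)
           for label, rate in hypothetical_rates.items()}

for label, factor in factors.items():
    print(f"{label}: output grows by a factor of {factor:,.0f}")
```

Under these assumed rates, the same constant stream of computers registers as either no growth at all or roughly a thousandfold increase after 30 years. Small differences in annual adjustment become orders of magnitude once compounded.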

This uncertainty demonstrates a fundamental aspect of the aggregation problem. There is no ‘correct’ way to convert qualities into quantities. Any such conversion depends on our goals, which will determine the dimension we consider appropriate. And different approaches can lead to wildly different measures. This epistemological predicament is not dealt with honestly by economists.

Again and again, subjective dimensional decisions are not recognized as such. Economists assume that utility is the ‘correct’ dimension of quality change. They turn to utility because prices themselves are an unreliable measure of quality change. And yet utility is never actually observed. It is ‘inferred’ from prices, the very unit that proved unreliable in the first place. This whole process serves to mask a myriad of subjective decisions about how quality change is measured (Footnote 5). The result is significant hidden uncertainty in the measure of ‘real’ monetary value.

The Failure of ‘Real’ Monetary Value: Some Implications

My goal in this section has been to show what goes wrong when we use monetary value as the aggregation dimension. Here is a summary:

  1. Prices are an unstable unit;

  2. ‘Correcting’ for this instability requires subjective decisions;

  3. This causes significant uncertainty in measures of ‘real’ monetary value;

  4. Governments do not report this uncertainty.

These problems have important implications for ecological and biophysical economists. I explore some of these below.

Implications for Boundary Reforms of the National Accounts

For those who seek national accounts boundary reforms, the above problems should cause some soul searching. Even when we accept the boundary choices made by the national accounts, the system still fails. A major goal of the national accounts is to objectively measure the growth of economic production. Yet the system cannot deliver this goal. The need to ‘correct’ for inflation causes unavoidable ambiguity in real GDP and other ‘quantity’ measures (such as the capital stock).

Given this ambiguity, is it worth reforming the national accounts to include environmental and social externalities? I argue that it is not. When we do so, we simply increase the level of ambiguity in our measure. Not only do we keep the ambiguity in ‘correcting’ for inflation, we add the even greater ambiguity of valuing non-market items. Moreover, ecological economists often use neoclassical methods for valuing non-market items. At the very least, we need to be aware of the problems with these methods, and investigate how alternative methods would change our results. (For a critical discussion of neoclassical valuation, see Diamond and Hausman 1994; Dore 1996; Eberle and Hayden 1991). When it comes to sustainability issues, I argue that the national accounts are not worth reforming. The problems with using monetary value as the dimension of analysis are simply too severe.

Implications for Economic Growth Accounting

Let us move on and consider how ‘real’ monetary value ambiguity impacts the field of ‘growth accounting’. This field seeks to identify the importance of different factors (such as labor and capital) for driving economic growth. Yet the field has tended to ignore the role of energy (and other natural resources). Ecological and biophysical economists have devoted significant efforts to fixing this situation. Many studies now exist that analyze the role that energy plays in driving the growth of real GDP (Ayres and Warr 2005, 2010; Beaudreau 1998; Cleveland et al. 1984; Hall et al. 2001; Hannon and Joyce 1981; Kummel 1982, 1989; Kummel et al. 1985, 2000; Kaufmann 1992).

But aside from a disregard for natural resources, there is a more basic flaw with growth accounting. The field assumes that real GDP is an unambiguous measure of economic output. But is it? As Fig. 3c shows, there is significant ambiguity in the growth of real GDP. This is because the calculation of real GDP requires innumerable subjective decisions. Given this subjectivity, I argue that the growth of real GDP is not worth explaining.

Consider the following thought experiment. Using the tools of growth accounting, we find that the growth of energy use accounts for 70% of the growth of US real GDP. Suppose that the government then adopts different methodological decisions. These lead to a large revision in GDP growth (as Fig. 3b shows is possible). We find that energy growth now accounts for a very different fraction of real GDP growth. This raises an uncomfortable question. Can the government’s subjective decisions change the role that energy plays in the economy? One hopes not. Instead, the logical conclusion is that we are trying to explain something that is not worth explaining.
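The arithmetic behind this thought experiment can be sketched as follows. The sketch assumes one simple convention, adopted here only for illustration: the share of growth attributed to energy is a fixed output elasticity times energy growth, divided by real GDP growth. The elasticity and the growth rates are hypothetical, chosen so that the first case reproduces the 70% figure.

```python
def energy_share(elasticity, energy_growth, gdp_growth):
    """Fraction of real GDP growth attributed to energy under the assumed convention."""
    return elasticity * energy_growth / gdp_growth


# Hypothetical inputs (not estimates from any study).
elasticity = 0.7        # assumed output elasticity of energy
energy_growth = 0.03    # annual growth of energy use (a biophysical measure)

gdp_growth_original = 0.03   # real GDP growth under the original methodology
gdp_growth_revised = 0.042   # real GDP growth after a methodological revision

share_original = energy_share(elasticity, energy_growth, gdp_growth_original)
share_revised = energy_share(elasticity, energy_growth, gdp_growth_revised)

print(f"original methodology: {share_original:.0%}")
print(f"revised methodology:  {share_revised:.0%}")
```

Energy use is identical in both cases. Only the statistical convention behind real GDP has changed, yet the apparent role of energy falls from 70% to 50%.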

But if we abandon real GDP, what should economic growth theory seek to explain? One possibility is to focus on the growth of biophysical flows. The importance of these flows follows directly from thermodynamic principles (Georgescu-Roegen 1971; Kondepudi and Prigogine 1998; Hall and Klitgaard 2012; Prigogine et al. 1984). Importantly, they can be measured in well-defined biophysical dimensions. Fix (2015) presents a first attempt at this type of approach (focusing on energy). Of course, focusing on biophysical flows does not make the aggregation problem go away (Giampietro et al. 2013). But at the very least, it ensures a stable unit of analysis—something that cannot be said for monetary value.

Summary: A New Old Problem

Although the problems with aggregating ‘real’ monetary value are severe, they are not new. Most were highlighted more than 60 years ago in the ‘Cambridge capital controversy’ (Footnote 6). This was a debate in the 1950s and 1960s between economists in Cambridge, England and Cambridge, Massachusetts. Robinson (1953) began the debate when she asked: in what units is capital measured? This prompted a protracted exchange that culminated in the Cambridge, England school demonstrating that there is no way to measure the quantity of capital independently of prices and distribution (Hodgson 2005).

Unfortunately, the conclusions of the Cambridge capital controversy have been mostly ignored by mainstream economists. Why? The conclusions are likely too discomforting. The national accounts cannot unambiguously measure the growth of the capital stock or economic output. If we accept this critique, it leaves a gaping hole in the heart of economic theory.

Aggregation and ‘Optimal’ Decision-Making

The implications of the aggregation dimension problem are too extensive to explore fully here. But I do want to highlight how this problem affects the search for ‘optimal’ policy. Optimization plays a major role in sustainability policy discussions. But when we view this practice through a dimensional lens, some gaping flaws become evident.

The Search for Social Optimums

Neoclassical economics claims it can identify optimal policy that is ‘best’ for everybody. This claim is so important that it has been cited as a core goal of economics:

“Making optimal use of scarce resources, that is, maximizing subject to constraints, is the central theme of economics” (Dixit 1990) [emphasis added].

Thus, economists have theories for (among other things) optimal taxation (Sandmo 1976), optimal investment (Abel 1983), optimal government size (Karras 1996), optimal economic growth (Koopmans 1965), optimal levels of pollution control (Kwerel 1977), optimal abatement of CO2 emissions (Nordhaus 1992; Goulder and Mathai 2000), optimal use of resources (Burt 1964; Forster 1980), and optimal population size (Eckstein and Wolpin 1985).

These theories may seem arcane, but they have a real impact on government policy. In his work on optimal climate-change policy, William Nordhaus claimed that a “modest carbon tax” was preferable to “rigid emissions stabilization” (Nordhaus 1992). Politicians have used this work to justify the tepid climate policy seen to date (Linden 2018).

Dimensional Choices Affect the Optimum

The problem with using optimization for decision-making is that it requires unidimensional aggregation. Only functions that return a single dimension can be optimized. Functions that return two or more dimensions do not have optimums—they have trade offs. Thus to seek optimal policy, one must decide on a single dimension of analysis. Optimization, therefore, inherits all of the issues associated with aggregation itself. When we seek ‘optimal’ policy, dimensional choices will affect the aggregation and thus the optimum.
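The distinction between an optimum and a set of trade offs can be made concrete with a minimal sketch. The policy names and their two-dimensional scores below are placeholders, not data from any study. With two dimensions, several options survive as non-dominated trade offs. A single ‘best’ option appears only after the dimensions are collapsed with chosen weights, and it changes with the weights.

```python
# Hypothetical policy options scored on two dimensions (higher is better).
# The names and scores are placeholders for illustration only.
policies = {
    "policy_x": (3, 1),
    "policy_y": (2, 2),
    "policy_z": (1, 3),
    "policy_w": (1, 1),
}


def dominates(a, b):
    """True if a is at least as good as b on every dimension and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(options):
    """Options that no other option dominates: the remaining trade offs."""
    return sorted(
        name
        for name, score in options.items()
        if not any(dominates(other, score) for other in options.values())
    )


def best_by_weights(options, weights):
    """Collapse each outcome to a single number using fixed weights, then maximize."""
    return max(
        options,
        key=lambda name: sum(w * s for w, s in zip(weights, options[name])),
    )


trade_offs = non_dominated(policies)              # no single optimum exists here
best_first = best_by_weights(policies, (2, 1))    # first dimension weighted more
best_second = best_by_weights(policies, (1, 2))   # second dimension weighted more
print(trade_offs, best_first, best_second)
```

Three of the four options remain as trade offs. Weighting the first dimension more heavily selects `policy_x`, while weighting the second more heavily selects `policy_z`.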

To illustrate how dimensional choices affect optimization, I return to the example of a stock of apples and bread slices. Suppose we need to maximize our stock by choosing between two scenarios. In Scenario A, we have three apples and two bread slices, while in Scenario B, we have two apples and three bread slices (see Fig. 5). Which stock is larger?

Fig. 5

Measurements of the maximum stock of apples and bread slices. Which scenario (A or B) maximizes the apple–bread stock? This figure shows how the choice of dimension affects the maximizing scenario. When measured in terms of caloric energy (left), Scenario A maximizes the stock. However, when measured in terms of mass, Scenario B maximizes the stock. Calculations use values from Table 1

The problem is that without defining a single dimension to be maximized, this question has no meaning. Scenario A and Scenario B involve an incommensurable trade-off between an additional apple or an additional bread slice. To make a judgment about the maximum stock, we must make the scenarios commensurable. This requires choosing a single dimension of analysis. The problem is that the choice of aggregation dimension affects what we find. As shown in Fig. 5, when we aggregate in terms of energy, we find that Scenario A maximizes the apple–bread stock. However, when we aggregate in terms of mass, Scenario B maximizes the stock. This is because dimensional choice affects the relative weighting between apples and bread slices.
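The reversal described above can be sketched in a few lines. The per-item energy and mass values below are placeholders in arbitrary units. They are not the values of Table 1, and were chosen only so that the ranking matches the one reported for Fig. 5.

```python
# Stocks in each scenario (from the text).
scenarios = {
    "A": {"apple": 3, "bread_slice": 2},
    "B": {"apple": 2, "bread_slice": 3},
}

# Per-item weights for each dimension, in arbitrary units.
# ASSUMPTION: placeholder values, not the figures of Table 1.
per_item = {
    "energy": {"apple": 100, "bread_slice": 80},
    "mass": {"apple": 30, "bread_slice": 40},
}


def aggregate(stock, weights):
    """Collapse a stock of unlike items into one number along a chosen dimension."""
    return sum(count * weights[item] for item, count in stock.items())


def maximizing_scenario(dimension):
    """Return the scenario with the largest aggregate stock in the given dimension."""
    weights = per_item[dimension]
    return max(scenarios, key=lambda name: aggregate(scenarios[name], weights))


for dimension in per_item:
    print(dimension, "->", maximizing_scenario(dimension))
```

The stocks never change between the two calls. Only the relative weight given to an apple versus a bread slice changes, and with it the ‘maximum’.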

As this example illustrates, optimization is affected by subjective dimensional decisions. Different decisions will lead to different ‘optimal’ solutions. The search for ‘optimal’ policy thus depends crucially on our goals and our resulting choice of dimensions.

An Aggregation and Optimization Checklist

The dependence of optimization on pre-analytic decisions means that it must be used carefully. Optimization is a powerful decision-making tool when used appropriately. Unfortunately, it is also a powerful tool for persuasion that can be easily misused. To guard against misuse, I suggest we ask the following questions of any optimization procedure:

  1. Are the underlying goals well defined and uncontested?

  2. Does the dimension of analysis follow unambiguously from the goals?

  3. Does the dimension provide an objective way to weight attributes?

If we answer ‘yes’ to all three questions, then the optimization is likely unproblematic. But if we answer ‘no’ to one or more questions, then the use of optimization is likely pernicious. Let us consider some examples.

Unproblematic Optimization

Suppose we want to design a gasoline engine with fixed horsepower that uses as little fuel as possible. In this case the goal is clear—minimize fuel use for a given level of power output. From this goal, the relevant optimization dimension (gasoline energy input) follows unambiguously. The science of energetics then defines how to measure the energy content in fuel, ensuring that the weighting of attributes is determined objectively. In this situation, optimization is unproblematic.

Vague and Contested Goals

Let us move from this engineering example to the kind of problem that modern policy makers face. Suppose we need to craft climate change policy. What is our goal? Is it to lower greenhouse gas emissions? Or simply lower their growth? Is it to save human lives (now and in the future)? Is it to achieve sustainable economic growth? Is it to maximize the present value of human welfare? In sustainability situations such as this, our goals are rarely well defined. If we cannot agree on goals, then there is no point in searching for an ‘optimal’ solution, since this does not exist without first agreeing on goals.

Ambiguous Dimensions

Often policy makers simply define the goal to be achieved. (This is, after all, what politicians do). So let us consider a specific sustainability goal. Suppose we want to choose the automobile engine (diesel vs. gasoline) that will have the least ‘environmental impact’.

The problem is that this goal does not lead unambiguously to a dimension of analysis. This leads to ambiguity in the ‘best’ choice of automobile. Consider two different interpretations of our goal. If ‘environmental impact’ means carbon emissions, then diesel is the superior technology. Diesel engines are more fuel efficient, and therefore emit less carbon dioxide. However, if ‘environmental impact’ means human health problems caused by particulate matter or nitrogen oxides, then diesel engines are worse than gasoline engines (Ghose 2015).

Unfortunately, this example is not a thought experiment. It has recently played out in Europe. To meet Kyoto obligations, many European countries promoted a rapid switch from gasoline to diesel cars. But policy makers did not consider how this would affect air quality. The widespread adoption of diesel engines led to a predictable rise in particulate matter pollution (Forrest 2017). As a British civil servant put it, the policy choice meant deciding between “killing people today rather than saving lives tomorrow” (Vidal 2015).

Subjective Weighting of Attributes

Let us continue with the gasoline versus diesel engine question. But now we will think like economists. We assume that the relevant optimization dimension must be monetary value. In other words, we will conduct a cost–benefit analysis.

The problem is that monetary value does not objectively weigh different environmental impacts. First, there is the problem that inflation makes prices an unstable unit of analysis. Second, many (if not most) environmental impacts do not have a market price. Further subjective decisions are required to estimate this price.

Consider the impact of different types of emissions. Particulate pollution causes immediate, local deaths. However, carbon emissions will cause future, global deaths from climate change. How should we weight these different outcomes? First, there is the issue of pricing life, which is inherently subjective. Different approaches yield divergent results. Historical valuation data presented in Viscusi and Aldy (2003) has a relative standard deviation of 138%. This is about 35 times the dispersion that exists in adult male height. Then there is the issue of weighting future costs against those in the present. The practice in economics is to ‘discount’ the future. But the choice of discount rate is subjective and can lead to wildly different valuations (Footnote 7).
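The sensitivity to the discount rate can be shown with a minimal sketch using the standard present-value formula, PV = cost / (1 + r)^t. The future cost, the time horizon, and the candidate rates are hypothetical values chosen for illustration.

```python
def present_value(future_cost, rate, years):
    """Discount a cost incurred `years` from now back to the present."""
    return future_cost / (1 + rate) ** years


# Hypothetical inputs: a cost of 1000 (arbitrary monetary units) incurred in 100 years.
future_cost = 1000.0
years = 100
candidate_rates = [0.01, 0.03, 0.05]

present_values = {
    rate: present_value(future_cost, rate, years) for rate in candidate_rates
}
for rate, value in present_values.items():
    print(f"discount rate {rate:.0%}: present value = {value:.1f}")
```

Under these assumptions, moving from a 1% to a 5% discount rate shrinks the present value of the same future cost by more than an order of magnitude. Any cost–benefit comparison between present and future harms inherits this choice.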

Economists continue to debate the ‘correct’ valuation method and the ‘correct’ discount rate. But this misses the point. The problem resides in the dimension itself. Monetary value does not provide an objective way to weight different outcomes. Instead, the analyst must make a host of subjective valuation decisions. These decisions then affect the ‘optimal’ policy. As a result, the ‘optimal’ policy does little more than reflect the preferences of the analyst.

Optimization as a Political Tool

The more we answer ‘no’ to the optimization checklist, the less appropriate it is to seek ‘optimal’ policy. If we answer ‘no’ to all three aggregation questions, then optimization is likely pernicious. Why? It hides subjective trade offs that are otherwise clearly visible. When used this way, optimization serves as a political tool. It takes a political debate over subjective trade offs and turns it into a technical dispute for ‘experts’. This gives political and ethical preferences the appearance of scientific rigor.

Addressing the Aggregation Problem

The crux of the aggregation problem is the subjectivity of comparing the incommensurable. To aggregate, we must make subjective decisions about the dimension of analysis. These decisions then affect our results. There are no ‘solutions’ to the aggregation problem, if by ‘solution’ we mean a way to aggregate that involves no subjective decisions. But there are ways of addressing the problem. I outline three possibilities below.

Avoid Aggregation

One response to the aggregation problem is to simply avoid aggregation. This is appropriate for sustainability issues where the relevant dimension is ambiguous and/or contested. If we cannot agree on the appropriate dimension, this is a sign that we should not be aggregating. Instead, we should leave incommensurable trade offs in their own ‘natural’ dimensions. For instance, we might measure habitat loss in dimensions of area, pollution in dimensions of mass, lives lost in dimensions of individuals, and so on. Ackerman (2008) recommends this route as an alternative to cost–benefit analysis:

Most of the information collected for a cost–benefit analysis is useful under any approach to deliberation. The problems arise only in the final steps of crunching everything into a single bottom-line number: monetizing nonmonetary benefits, discounting future outcomes, and guesstimating the values of important uncertainties all have the effect of distorting and concealing the underlying data.

The advantage of not aggregating is that subjective trade offs remain clearly visible. This allows stakeholders to weight trade offs as they see fit, based on their own preferences.

Use Biophysical Dimensions

If we decide to aggregate, then we must use a dimension with well-defined units. This truism should hardly need stating. Objective measurement requires a precisely defined unit. And yet the majority of economists seem to ignore this fact. They continue to use prices to measure quantities such as economic output and the capital stock. Yet prices are a spectacularly unstable unit. This causes tremendous ambiguity in indexes of quantity derived from the national accounts.

If we value accurate measurement, then we need to stop measuring economic quantities using real monetary value. The obvious alternative is to use biophysical dimensions to measure economic scale. This will remove the problem of poorly defined units. But it means rethinking what we mean by ‘economic growth’ and ‘capital accumulation’. It means we cannot speak of these quantities without first stipulating a dimension. And we should be prepared to find different results when we look at different dimensions.

Be Open About Subjectivity

If we decide to aggregate, then we should be honest about the accompanying subjectivity. This requires being explicit about goals, and reporting dimensional decisions honestly and openly. This allows others to evaluate the reasoning behind the aggregation. It is the reasoning itself that gives the analysis meaning. Acknowledging this fact, Jonathan Nitzan writes:

... any scientific method of measuring ... must, to some extent, be anchored in our initial values. Indeed, it is these initial values which make our analysis worthy in the first place, so they must be clearly identified for that analysis to carry any weight. (Nitzan 1992) [emphasis added]

When our assumptions are presented openly, they can be debated and tested. Ecological and biophysical economists should avoid the path taken by mainstream economics. If we hide our subjective aggregation decisions, or deny that they exist, we embrace the road to pseudoscience.

Conclusions

When the aggregation dimension problem is stated clearly, it borders on trivial. Aggregation requires comparing incommensurable items using a single dimension. How we choose to do this affects the resulting aggregation. This epistemological predicament is simple when identified. And yet it is easily forgotten. Why?

I think the root of the problem is that it is surprisingly easy to become oblivious to our own assumptions. As Feynman (1974) said of science, “The first principle is that you must not fool yourself—and you are the easiest person to fool”. The problem is that ‘getting fooled’ is a sociological process as much as an individual one. Popper (1959) argued that many methodological choices in science are a result of convention. Founding thinkers make subjective decisions that are then adopted as conventions by the rest of the field. As conventions are institutionalized, they begin to appear like ‘objective’ procedures (Nitzan 1992). Over time, we forget the subjective elements of our methods. When this happens, we collectively ‘fool ourselves’.

With regard to aggregation, I have argued that mainstream economists have fooled themselves. They have decided to aggregate economic quantities using the dimension of monetary value. But prices, it turns out, are an unstable unit. As a result, many subjective decisions are required to adjust for price instability. Yet economists have convinced themselves that their methods are objective. As a result, the ambiguity in the national accounts remains hidden from the general observer.

If ecological and biophysical economists want a true ‘science of sustainability’, then we need to question the conventions of mainstream economics. For ecological economists, this has meant questioning the boundary decisions made by the national accounts. But there is a more basic question that we need to ask. Given the many flaws, do we want to keep aggregating economic quantities using ‘real’ monetary value? If not, what dimensions should we use to measure economic output? For that matter, what dimensions should we use to measure sustainability? These questions have no easy answers. But the most important step is recognizing that these dimension questions need to be asked.

Supplementary Material

Supplementary material for this paper is available at the Open Science Framework: https://osf.io/3smra/.