1 Introduction

Per capita GDP has long been used as a proxy measure of well-being. However, it is now widely recognized that income data provide only a partial perspective on the array of factors that affect people’s lives. Given the problems with using GDP per capita as a measure of well-being, many researchers have been searching for alternative measures, and there is broad agreement that such measures should incorporate a wider range of socio-economic conditions than income alone. Drawing upon the recommendations of Stiglitz et al. (2009) for research on economic measurement problems, the OECD identified 11 dimensions as essential to well-being. These dimensions cover material living conditions, such as income and wealth, as well as quality of life (QOL), such as community, environment, and work–life balance, and are explored and analysed in detail by the OECD (2011). The OECD released 11 corresponding well-being indicators, known as the OECD Better Life Index (BLI), which cover the 34 OECD member countries and 2 non-member countries.Footnote 1 However, evaluating overall well-being by summarizing the 11 individual indicators is left to the users of the statistics.Footnote 2

The 11 well-being indicators allow us to compare countries in terms of the comprehensive well-being of their populations. However, an important component is missing from these indicators, namely, sustainability. While the 11 indicators capture the well-being of the current population, a critical question is whether current well-being can be sustained in the future.

The literature and debates on measures of well-being have much in common with those on sustainability.Footnote 3 Levels of well-being are essentially what sustainability advocates would like to sustain, so it is necessary to measure well-being before discussing its sustainability. Conversely, without sustainability concerns, a country that guarantees the current generation better life circumstances by depleting natural resources at the cost of future generations is evaluated similarly to a country that sustains its current well-being into the future, as long as the well-being of the current generation is the same in both countries. This, however, is entirely unconvincing. While the OECD concedes that it is necessary to introduce sustainability concerns into the BLI, it has left this as a future issue. The present study offers a practical proposal for measuring well-being that incorporates sustainability concerns. First, we add a sustainability indicator to the 11 well-being indicators of the BLI. Second, we aggregate these 12 indicators into a composite indicator.

As Dasgupta (2001) and Arrow et al. (2004) argue, the productive base of an economy, which consists of produced and natural capital and intangible assets, determines the well-being of its people. Thus, a shrinking productive base predicts lower well-being for future generations. The World Bank’s adjusted net savings (World Bank 2011), which are considered a good measure of sustainability, capture the change in the productive base. We therefore define the sustainability indicator using adjusted net savings.

Composite indicators are used to measure multidimensional concepts that are characterized by multiple individual indicators. Since individual indicators may move in different directions, the set of individual indicators alone does not provide an overall picture of a multidimensional concept across countries. Among the many techniques for constructing composite indicators, the ‘benefit of the doubt’ (BOD) approach, which has received increasing attention from researchers, avoids subjectivity in the determination of weights (Mahlberg and Obersteiner 2001; Cherchye et al. 2004, 2007; Despotis 2005; OECD 2008). Under BOD, the weights are country-specific and endogenously determined so as to maximize the value of each country’s resulting composite indicator. Thus, larger weights are given to the individual indicators (topics of well-being) on which a country performs well. The core idea is that a good relative score on an individual indicator shows that the country considers that indicator relatively important. Therefore, in international comparisons based on BOD, a country cannot attribute a low composite score to a harmful or unfair weighting scheme.

BOD is rooted in data envelopment analysis (DEA), an established technique for measuring the relative efficiency of decision-making units from the inputs and outputs observed in a sample. DEA measures the efficiency of each unit by its distance from the production frontier, which is spanned by the best-practice units. Formally, BOD is equivalent to the input-oriented DEA model in multiplier form, with all individual indicators as outputs and a ‘dummy input’ equal to one for all countries.

A well-known problem with DEA (and thus with BOD) is that it often fails to fully differentiate the performance of decision-making units, so that some units are ranked equally. Poor discrimination is typical when the sample size is small relative to the number of inputs and outputs. It arises from the way DEA constructs the production frontier by linear programming. Kuosmanen and Johnson (2010) introduce an alternative method for computing the efficiency measure, namely, corrected convex non-parametric least squares (C2NLS).Footnote 4 C2NLS constructs the production frontier by quadratic programming and offers certain advantages over standard DEA: Kuosmanen and Johnson (2010) show that the C2NLS estimates are consistent and asymptotically unbiased and yield a smaller mean-squared error than the corresponding DEA efficiency estimators.

In addition to these advantages, C2NLS has better discriminatory power than DEA, allowing a complete ordering of the efficiency scores of all units in a sample. Several other methods for improving the discriminatory power of DEA have been proposed, of which weight restriction is one of the most widely used. In our application, it would require restricting the weights assigned to pairs of well-being indicators according to their relative importance. This raises two problems. First, common restrictions on weights remove the advantage of the flexible weighting of BOD. Second, deriving the weight restrictions involves subjective judgement, since no widely accepted tool for setting them is available, and it is particularly difficult to reach a consensus on the relative importance of different socio-economic conditions. Therefore, we believe that C2NLS is more appropriate, especially in the context of constructing a composite well-being indicator.Footnote 5 As suggested by Kuosmanen and Johnson (2010) and reported by Kuosmanen et al. (2015), C2NLS can be used to estimate shadow prices, set performance targets, and identify benchmarks in a similar fashion to standard DEA. To the best of our knowledge, this is the first study to apply C2NLS to the construction of composite indicators.Footnote 6

Mizobuchi (2014) applies the BOD method to construct a composite indicator that aggregates the 11 well-being indicators of the BLI. However, that study identifies, but leaves unresolved, the problem that BOD ranks many countries equally. BOD has the known property that adding an indicator cannot decrease the score of the composite indicator. Thus, the more indicators the composite indicator aggregates, the more countries are likely to share the highest rank, weakening the discrimination of country performance in terms of well-being. Since the procedure proposed in the present study is not subject to this limitation, it is better suited to analysing the effect of introducing a new indicator on countries’ performance.

Other than GDP per capita, the United Nations’ Human Development Index (HDI) is the most popular measure of well-being. It is also a composite indicator, but it aggregates fewer aspects than ours, namely income, education, and health. Over the last two decades, a series of papers has introduced sustainability concerns into composite indicators (Desai 1995; Neumayer 2001; Costantini and Monni 2005; Ray 2014), using adjusted net savings or ecological footprints as sustainability indicators. While these studies, like ours, are motivated by integrating sustainability concerns into measures of well-being or human development, their composite indicators are constructed as a simple geometric mean with ad hoc weights that are constant across countries, as adopted for the HDI. By contrast, the C2NLS method allows a more general and flexible weighting scheme that, like BOD, assigns different and favourable weights to each country.

The rest of this paper unfolds as follows. Section 2 discusses two approaches to construct a composite indicator. Section 3 explains the data of well-being indicators and sustainability. Section 4 computes composite indicators under different cases and compares them across countries. Section 5 examines the robustness of our findings. Section 6 concludes.

2 Methodology

The present study aggregates the 11 well-being indicators and a single sustainability indicator of each of 36 countries into composite indicators, in order to compare countries’ performance in terms of well-being while accounting for sustainability concerns. We adopt two approaches to constructing composite indicators, namely, the BOD and C2NLS methods. Since both are versatile enough to apply to a variety of problems and situations, we explain them below in a general setting, independent of the number of countries and underlying individual indicators.

We assume there are \(K\) countries and that the well-being of people in country \(k\) is characterized by a set of \(M\) individual indicators, \(\varvec{y}_{k} = \left( {y_{1,k} , \ldots ,y_{M,k} } \right)'\), with \(y_{m,k}\) representing the value of the \(m\)-th individual indicator of country \(k\). Suppose that some of the \(M\) indicators constituting \(\varvec{y}_{k}\) are sustainability indicators.Footnote 7 BOD aggregates these individual indicators using a weighted sum. We denote the set of weights for country \(k\) by \(\varvec{\mu}_{k} = \left( {\mu_{1,k} , \ldots ,\mu_{M,k} } \right) '\), whose component \(\mu_{m,k}\) represents the weight of the \(m\)-th individual indicator. The composite indicator based on BOD for country \(c\), \(CI_{BOD,c}\), is formulated as follows:

$$CI_{BOD,c} = \max_{\mu_{1,c}, \ldots, \mu_{M,c}} \left\{ \sum_{m=1}^{M} \mu_{m,c}\, y_{m,c} \;\middle|\; \sum_{m=1}^{M} \mu_{m,c}\, y_{m,k} \le 1 \ \text{for } k = 1, \ldots, K; \ \mu_{m,c} \ge 0 \ \text{for } m = 1, \ldots, M \right\} \tag{1}$$

For the international comparison, this procedure is repeated for every country in the sample. The weights \(\left( {\mu_{1,c} , \ldots ,\mu_{M,c} } \right)\) are determined endogenously to maximize the value of the composite indicator for country \(c\). Thus, a larger weight is assigned to an individual indicator on which the country performs well. In this procedure, good performance of country \(c\) on an individual indicator is taken to indicate that the country prioritizes that indicator. Therefore, countries cannot excuse poor performance by pointing to an unfair weighting scheme, because no weights other than those used for their evaluation would improve their position. The first constraint in (1) requires that every country in the sample has a composite indicator no greater than one when the weights most favourable to the evaluated country \(c\) are applied. Thus, the resulting composite indicator for country \(c\) is less than or equal to one.
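The maximization in (1) is a small linear program solved once per country. As an illustration only, the sketch below solves it in pure Python for the special case \(M = 2\) by enumerating the vertices of the two-dimensional feasible weight region; this simplification is ours (a general \(M\) would call for a linear-programming solver), and the function name `bod_score_2d` and the four-country data are hypothetical.

```python
from itertools import combinations


def bod_score_2d(ys, c, tol=1e-9):
    """BOD composite indicator of Eq. (1) for country c, restricted to M = 2.

    ys  -- list of (y1, y2) indicator pairs, one per country (non-negative)
    c   -- index of the evaluated country
    The LP  max mu1*y1c + mu2*y2c  s.t.  mu1*y1k + mu2*y2k <= 1 for all k, mu >= 0
    attains its optimum at a vertex of the feasible region, so we enumerate
    the intersections of every pair of boundary lines a*mu1 + b*mu2 = r.
    """
    # One boundary line per country constraint, plus the axes mu1 = 0 and mu2 = 0.
    lines = [(y1, y2, 1.0) for y1, y2 in ys] + [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    best = 0.0
    for (a1, b1, r1), (a2, b2, r2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel lines have no unique intersection point
        mu1 = (r1 * b2 - r2 * b1) / det  # Cramer's rule
        mu2 = (a1 * r2 - a2 * r1) / det
        feasible = mu1 >= -tol and mu2 >= -tol and all(
            y1 * mu1 + y2 * mu2 <= 1 + tol for y1, y2 in ys
        )
        if feasible:
            best = max(best, mu1 * ys[c][0] + mu2 * ys[c][1])
    return best


# Hypothetical two-indicator data for four countries.
countries = [(2.0, 8.0), (6.0, 6.0), (8.0, 2.0), (3.0, 3.0)]
scores = [bod_score_2d(countries, c) for c in range(len(countries))]
print([round(s, 6) for s in scores])  # [1.0, 1.0, 1.0, 0.5]
```

With these made-up data, three of the four countries attain the maximum value of one, each through a different favourable weight vector, which is a small-scale version of the tied rankings that BOD produces.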

As Mahlberg and Obersteiner (2001) illustrate graphically, an alternative interpretation of \(CI_{BOD,c}\) is possible. Treating the individual indicators \(\varvec{y}\) as outputs and a dummy input equal to one for all countries, \(CI_{BOD,c}\) can be viewed as measuring the productive efficiency of country \(c\).Footnote 8 Strictly speaking, \(CI_{BOD,c}\) is a radial measure of the distance between country \(c\)’s well-being indicators \(\varvec{y}_{c}\) and the production frontier constructed by DEA from the countries’ input and output data. The production frontier represents the best practices for producing well-being. Countries whose well-being indicators \(\varvec{y}\) lie on the frontier are considered the most efficient and are ranked the highest under BOD. The farther a country’s individual indicators lie from the frontier (that is, the closer they are to the origin), the lower its evaluated performance.
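The equivalence with DEA can be checked directly. Assuming the constant-returns-to-scale input-oriented multiplier model, and introducing for illustration only an input weight \(\nu_c\) and inputs \(x_k\) (symbols not used elsewhere in this paper), the DEA problem for country \(c\) reads:

$$\max_{\mu_{1,c}, \ldots, \mu_{M,c},\, \nu_c} \sum_{m=1}^{M} \mu_{m,c}\, y_{m,c} \quad \text{s.t.} \quad \nu_c\, x_c = 1; \quad \sum_{m=1}^{M} \mu_{m,c}\, y_{m,k} - \nu_c\, x_k \le 0 \ \text{for } k = 1, \ldots, K; \quad \mu_{m,c} \ge 0, \ \nu_c \ge 0$$

With the dummy input \(x_k = 1\) for all \(k\), the normalization constraint forces \(\nu_c = 1\), and the remaining constraints become \(\sum_{m=1}^{M} \mu_{m,c}\, y_{m,k} \le 1\) for \(k = 1, \ldots, K\), which is exactly problem (1).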

One problem with BOD is that multiple countries are often located on the production frontier and thus all receive the highest evaluation, so that we cannot distinguish their performance. Such weak discriminatory power is more evident when the number of observations is small relative to the number of underlying indicators. As shown in Sects. 3 and 4, we apply composite indicators to aggregate 12 or 11 indicators for 36 countries, and BOD fails to fully discriminate countries’ performance in this case. Weak discriminatory power is a well-known problem of DEA. C2NLS has a decisive advantage over DEA and BOD in that it improves discriminatory power significantly.

C2NLS is implemented in two stages. First, the production frontier is estimated by convex non-parametric least squares (CNLS). With a dummy input equal to one, the CNLS problem is formulated as follows:

$$\min_{\substack{\varepsilon_{1}, \ldots, \varepsilon_{K}, \\ \varvec{\mu}_{1}, \ldots, \varvec{\mu}_{K}}} \left\{ \sum_{i=1}^{K} \varepsilon_{i}^{2} \;\middle|\; \begin{array}{l} \sum_{m=1}^{M} \mu_{m,i}\, y_{m,i} + \varepsilon_{i} = 1 \ \text{for all } i = 1, \ldots, K; \\ \sum_{m=1}^{M} \mu_{m,i}\, y_{m,i} \ge \sum_{m=1}^{M} \mu_{m,j}\, y_{m,i} \ \text{for all } i, j = 1, \ldots, K; \\ \mu_{m,k} \ge 0 \ \text{for all } m = 1, \ldots, M \text{ and all } k = 1, \ldots, K \end{array} \right\} \tag{2}$$

Let \(\varvec{\mu}_{1}^{*} = ( {\mu_{1,1}^{*} , \ldots ,\mu_{M,1}^{*} })', \ldots ,\varvec{\mu}_{K}^{*} = ( {\mu_{1,K}^{*} , \ldots ,\mu_{M,K}^{*} })'\) be a solution to optimization problem (2). Just as \(CI_{BOD,c}\) is the efficiency measure for country \(c\) based on DEA,Footnote 9 the corresponding efficiency measure based on CNLS, \(CI_{CNLS,c}\), is defined as \(CI_{CNLS,c} = \sum\nolimits_{m = 1}^{M} {\mu_{m,c}^{*} y_{m,c} }\) for all \(c = 1, \ldots ,K\). Second, the efficiency measures are shifted so that the maximum value becomes one. The composite indicator based on C2NLS for country \(c\) is then defined as follows:Footnote 10

$$CI_{C2NLS,c} = CI_{CNLS,c} - \left( \max_{i \in \{1, \ldots, K\}} CI_{CNLS,i} - 1 \right) \tag{3}$$
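The first stage, problem (2), requires a quadratic-programming solver and is not reproduced here. Assuming the first-stage scores \(CI_{CNLS,c}\) are already available, the second stage in Eq. (3) reduces to a uniform shift. The sketch below is illustrative; the function name `c2nls_scores` and the input scores are hypothetical.

```python
def c2nls_scores(cnls_scores):
    """Second C2NLS stage, Eq. (3): shift every CNLS score by the same constant
    so that the largest score becomes exactly one."""
    shift = max(cnls_scores) - 1.0
    return [s - shift for s in cnls_scores]


# Hypothetical first-stage CNLS scores; note that they may exceed one.
cnls = [1.08, 0.95, 1.02, 0.60]
c2nls = c2nls_scores(cnls)
print([round(s, 4) for s in c2nls])  # [1.0, 0.87, 0.94, 0.52]
```

Because the same constant is subtracted from every country, the ordering of countries and the differences between their scores are exactly those of the first-stage CNLS scores.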

We now compare the characteristics of \(CI_{C2NLS}\) with those of \(CI_{BOD}\). \(CI_{C2NLS}\) shares the flexible weighting of \(CI_{BOD}\) in the sense that every country is allowed to adopt favourable weights. However, the weights are determined differently in the two measures. While the weights in \(CI_{BOD}\) are solved independently for each country in (1), the weights in \(CI_{CNLS}\) and \(CI_{C2NLS}\) are solved simultaneously for all countries in (2). Thus, the well-being indicators \(\varvec{y}\) of countries with \(CI_{BOD}\) below one have no impact on the \(CI_{BOD}\) of other countries, whereas the \(CI_{CNLS}\) and \(CI_{C2NLS}\) of every country are affected by the well-being indicators \(\varvec{y}\) of all other countries.

Optimization problem (1) can be formulated alternatively as the following problem (4),Footnote 11 which helps explain the greater discriminatory power of \(CI_{C2NLS}\) compared with \(CI_{BOD}\).

$$\min_{\substack{\varepsilon_{1}, \ldots, \varepsilon_{K}, \\ \varvec{\mu}_{1}, \ldots, \varvec{\mu}_{K}}} \left\{ \sum_{i=1}^{K} \varepsilon_{i}^{2} \;\middle|\; \begin{array}{l} \varepsilon_{i} \ge 0 \ \text{for all } i = 1, \ldots, K; \\ \sum_{m=1}^{M} \mu_{m,i}\, y_{m,i} + \varepsilon_{i} = 1 \ \text{for all } i = 1, \ldots, K; \\ \sum_{m=1}^{M} \mu_{m,i}\, y_{m,i} \ge \sum_{m=1}^{M} \mu_{m,j}\, y_{m,i} \ \text{for all } i, j = 1, \ldots, K; \\ \mu_{m,k} \ge 0 \ \text{for all } m = 1, \ldots, M \text{ and all } k = 1, \ldots, K \end{array} \right\} \tag{4}$$

Equation (4) is simply a sign-constrained variant of the CNLS problem (2). The two problems can be interpreted as follows: both BOD and CNLS maximize the value of each country’s composite indicator by adopting its most favourable weights. While BOD faces the constraint that the resulting composite indicator cannot exceed one, CNLS is free of this constraint, so some countries obtain composite indicators larger than one under CNLS. When multiple countries share the highest BOD value of one, the CNLS approach therefore allows us to differentiate their performance.

3 Data

3.1 OECD Better Life Index

Amid growing interest in alternative approaches to measuring well-being, the OECD launched the Better Life Initiative in 2011 and released a set of 11 well-being indicators covering the 34 OECD member countries, which comprise advanced and emerging economies. The data were updated in 2012, with more dimensions added to the calculation of the indicators. Moreover, the country coverage was expanded beyond the OECD to include Brazil and Russia. We use the most recent data on the individual indicators, released in 2014. The data are cross-sectional for a single year around 2011, as explained later in this subsection.

The 11 individual well-being indicators cover topics that the OECD considers essential to people’s well-being. The individual indicator for each topic is based on between one and four secondary indicators, which are expressed in different units, such as dollars, years, or numbers of people. To compare and aggregate values expressed in different units, the values are normalized using a standard formula that converts the original values of the secondary indicators into numbers between 0 and 10, as follows:Footnote 12

$$\frac{\text{value to convert} - \text{minimum value}}{\text{maximum value} - \text{minimum value}} \times 10 \tag{5}$$
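Eq. (5) is a min-max rescaling. The sketch below applies it to one secondary indicator, assuming that the minimum and maximum are taken over the countries in the sample; the function name `normalize_0_10` and the raw values are hypothetical.

```python
def normalize_0_10(values):
    """Min-max normalization of Eq. (5): map raw values onto the range [0, 10].

    Assumes the minimum and maximum are taken over the countries in `values`.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("Eq. (5) is undefined when all values are equal")
    return [(v - lo) / (hi - lo) * 10 for v in values]


# Hypothetical raw values of one secondary indicator for four countries.
raw = [12.0, 20.0, 16.0, 30.0]
scores = normalize_0_10(raw)
print([round(s, 2) for s in scores])  # [0.0, 4.44, 2.22, 10.0]
```

The worst-performing country always maps to 0 and the best to 10, so the normalized scores are comparable across indicators measured in different units.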

Within each topic, the normalized secondary indicators are averaged with equal weights. For example, the environment topic is constructed from two secondary indicators, water quality and air pollution: their scores are first normalized to the range between 0 and 10 and then aggregated as \(\frac{{\text{water quality score}} + {\text{air pollution score}}}{2}\). The 11 individual indicators and their 24 corresponding secondary indicators are listed below.

1. Income (1.1 Household income; 1.2 Household financial wealth)
2. Jobs (2.1 Employment rate; 2.2 Personal earnings; 2.3 Job security; 2.4 Long-term unemployment rate)
3. Housing (3.1 Rooms per person; 3.2 Housing expenditure; 3.3 Dwellings with basic facilities)
4. Work–life balance (4.1 Employees working very long hours; 4.2 Time devoted to leisure and personal care)
5. Health (5.1 Life expectancy; 5.2 Self-reported health)
6. Education (6.1 Educational attainment; 6.2 Years in education; 6.3 Students’ skills)
7. Community (7.1 Social network)
8. Civic engagement (8.1 Consultation on rule-making; 8.2 Voter turnout)
9. Environment (9.1 Water quality; 9.2 Air pollution)
10. Safety (10.1 Homicide rate; 10.2 Assault rate)
11. Life satisfaction (11.1 Life satisfaction)

Among the 11 individual well-being indicators, the first 3 are categorized under material living conditions and the remaining 8 under QOL. According to the dataset released by the OECD Better Life Initiative, the data years of the underlying secondary indicators range from 2008 to 2013; averaging them with each topic weighted equally gives a year close to 2011. Thus, we consider that the 11 indicators measure each country’s socio-economic situation around 2011.

Table 1 summarizes the statistics of the 11 well-being indicators; the complete data are provided in Table 9 of the Appendix. Consistent with the findings of the OECD (2011, 2013), these tables show that while life is good in many dimensions in some countries, such as Australia, Canada, Denmark, New Zealand, Norway, and Sweden, it is significantly less so in others, such as Chile, Mexico, Portugal, Russia, and Turkey. While the latter group of countries, with the exception of Portugal, is characterized by lower per capita income, the former group does not necessarily comprise the richest countries.

Table 1 Descriptive statistics, OECD well-being indicators

Hereafter, we group countries by per capita GDP to examine the link between well-being and economic development, which per capita GDP reflects well. There are three groups: 4 high-income countries with per capita GDP above USD 45,000; 13 middle-income countries with per capita GDP between USD 30,000 and 40,000; and 19 low-income countries with per capita GDP below USD 30,000.Footnote 13 Table 1 suggests that people’s well-being improves in many respects as income grows. However, this is not always true, especially for some QOL topics, such as community, education, civic engagement, and work–life balance; in these respects, the average person in middle-income countries enjoys a better life than the average person in high-income countries. It is also noteworthy that the life satisfaction indicator, which has the largest standard deviation, varies considerably across countries.

3.2 The World Bank’s Adjusted Net Savings Dataset

Adjusted net savings, also known as genuine savings, are a sustainability indicator published by the World Bank. Their theoretical grounding is the notion that sustainability requires maintaining the stock of the ‘productive base’. The productive base is a broad notion of wealth that includes not only natural resources but also produced (physical) capital and intangible capital, such as human capital and the rule of law. Adjusted net savings measure the change in this total wealth over a given period. As Dasgupta (2001) and Arrow et al. (2004) argue, the productive base is the source of the well-being of future generations. Thus, negative adjusted net savings indicate that future generations will not be given an opportunity set at least as large as that available to the current generation.

The World Bank computes adjusted net savings as follows:

$${\text{Adjusted net savings}}\,=\,{\text{net national savings}} + {\text{education expenditure}}-{\text{natural resource depletion}} - {\text{carbon dioxide damage}}.$$

Net national savings is gross fixed capital formation minus the consumption of fixed capital, which indicates the amount of added produced capital. Education expenditure indicates the amount of added human capital, which makes up the larger share of intangible capital. Natural resource depletion is the sum of net forest depletion, energy depletion, and mineral depletion; together with carbon dioxide damage, it captures the loss of natural capital. Since the productive base consists of produced, natural, and intangible capital, adjusted net savings consist of the changes in these three forms of capital.

Instead of using the adjusted net savings series released by the World Bank, we recompute adjusted net savings excluding education expenditure, as follows:

$${\text{Adjusted net savings}} = {\text{net national savings}} - {\text{natural resource depletion}} - {\text{carbon dioxide damage}} .$$

We exclude education expenditure from the sustainability indicator for two reasons. First, education expenditure is not a good measure of changes in intangible capital.Footnote 14 It captures changes in human capital but misses significant parts of other intangible capital, such as the rule of law and social capital. Second, including education expenditure leads to double counting. An increase in government expenditure on education usually improves people’s life conditions in terms of education, for example, through smaller class sizes or more motivated teachers, and the returns from educational investment are considered more immediate than those from changes in produced and natural capital. Since the education indicator of the BLI already captures the impact of education expenditure, we exclude it from the sustainability indicator to avoid double counting. Finally, we normalize adjusted net savings to the range between 0 and 10 using Eq. (5), which defines the sustainability indicator.
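The two steps that define the sustainability indicator (recomputing adjusted net savings without education expenditure, then applying Eq. (5) across countries) can be sketched as follows. The function names and the three-country component values are hypothetical; all components are assumed to be in the same units (Table 2 reports constant 2011 PPP dollars).

```python
def adjusted_net_savings(net_national_savings, resource_depletion, co2_damage):
    """Adjusted net savings as recomputed here: education expenditure is excluded."""
    return net_national_savings - resource_depletion - co2_damage


def normalize_0_10(values):
    """Eq. (5): min-max normalization onto [0, 10] across countries."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("Eq. (5) is undefined when all values are equal")
    return [(v - lo) / (hi - lo) * 10 for v in values]


# Hypothetical components for three countries:
# (net national savings, natural resource depletion, carbon dioxide damage).
components = [(10.0, 2.0, 1.0), (4.0, 3.0, 0.5), (15.0, 1.0, 1.5)]
ans = [adjusted_net_savings(*row) for row in components]
sustainability = normalize_0_10(ans)
print(ans)  # [7.0, 0.5, 12.5]
print([round(s, 2) for s in sustainability])  # [5.42, 0.0, 10.0]
```

Because the normalization is relative, a country with low but positive adjusted net savings can still receive a sustainability score of 0 if it has the lowest value in the sample.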

Table 2 summarizes the statistics of adjusted net savings and their components, along with the sustainability indicator. Net national savings, which indicate net investment in produced capital, are much larger than natural resource depletion and carbon dioxide damage, and the gap appears to be larger in richer economies. While high-income countries appear to have larger natural resource depletion, their average depletion is smaller than that of middle-income countries once Norway is excluded as an exception. Thus, the results suggest that natural resource depletion generally tends to be lower at higher levels of income.

Table 2 Descriptive statistics, World Bank adjusted net savings, constant 2011 PPP dollars

4 Results

We compute the composite indicators based on BOD (\(CI_{BOD}\)) and C2NLS (\(CI_{C2NLS}\)) in two cases: first, aggregating the 11 well-being indicators, and second, aggregating 12 indicators (the 11 well-being indicators and 1 sustainability indicator). While the values of \(CI_{BOD}\) and \(CI_{C2NLS}\) are at most one in Eqs. (1) and (3), in this section we multiply them by 10 so that their highest attainable value is 10, matching the scale of the BLI well-being indicators. The purpose of this section is to show empirically how the change in methodology and the inclusion of a sustainability indicator change the scores and rankings of the composite indicators.

Table 3 presents the empirical results, containing the scores and rankings of the composite indicators along with the HDI and GDP per capita. To ensure comparability with the composite indicators, we rescale the HDI so that its maximum value is 10, as for the BLI. We then compare the distributions of \(CI_{BOD}\), \(CI_{C2NLS}\), HDI, and GDP per capita across countries. According to Table 4, the mean, the standard deviation, and the range (the difference between the maximum and minimum scores) are roughly comparable for \(CI_{BOD}\), \(CI_{C2NLS}\), and HDI, although \(CI_{BOD}\) has a higher mean than \(CI_{C2NLS}\) and HDI, whose means are similar. Whichever composite indicator we adopt, scores rise with per capita income. However, the difference in \(CI_{BOD}\) and \(CI_{C2NLS}\) between high-income and middle-income countries is much smaller than the difference in GDP per capita.

Table 3 Composite indicators with HDI and GDP per capita
Table 4 Descriptive statistics of composite indicators

Table 3 shows that the weak discriminatory power of BOD is evident in this study. More than 20 of the 36 countries are assigned the highest possible BOD score (one, or 10 after rescaling) when aggregating either 11 or 12 indicators. These are the countries with higher HDI scores and GDP per capita in the sample. BOD clearly fails to differentiate the performance of these countries and to provide an overall picture. Moving from BOD to C2NLS improves the comparison dramatically, and we can completely distinguish countries’ performance. Figures 1 and 2 compare the two composite indicators, \(CI_{C2NLS}\) and \(CI_{BOD}\), along with the CNLS-based measure \(CI_{CNLS}\). Since the difference between \(CI_{C2NLS}\) and \(CI_{CNLS}\) is the same for all countries, comparing \(CI_{C2NLS}\) with \(CI_{BOD}\) illustrates how \(CI_{C2NLS}\) improves on \(CI_{BOD}\) in terms of discriminatory power. \(CI_{C2NLS}\) differentiates the performance of the countries that are ranked equally under \(CI_{BOD}\) while leaving the ranking of the other countries almost unchanged. Thus, while the international comparison of well-being based on \(CI_{C2NLS}\) is similar to that based on \(CI_{BOD}\), \(CI_{C2NLS}\) enables a more detailed comparison.
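How ties at the top translate into shared ranks can be made concrete with a small sketch. It assumes "competition" ranking, in which tied countries share the best rank and the next rank is skipped; the ranking convention actually used in Table 3 is not stated, and the function names and scores below are hypothetical.

```python
def competition_ranks(scores, tol=1e-9):
    """Descending ranks in which tied scores share the best rank (e.g. 1, 1, 1, 4)."""
    return [1 + sum(1 for t in scores if t > s + tol) for s in scores]


def count_tied_at_top(scores, tol=1e-9):
    """Number of countries sharing the highest score."""
    top = max(scores)
    return sum(1 for s in scores if top - s <= tol)


# Hypothetical composite indicators on the 0-1 scale, rescaled by 10 as in this section.
bod = [10 * s for s in [1.0, 1.0, 1.0, 0.5]]
c2nls = [10 * s for s in [1.0, 0.96, 0.91, 0.48]]
print(competition_ranks(bod), count_tied_at_top(bod))      # [1, 1, 1, 4] 3
print(competition_ranks(c2nls), count_tied_at_top(c2nls))  # [1, 2, 3, 4] 1
```

A small tolerance is used because scores produced by numerical optimization rarely tie exactly in floating point.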

Fig. 1 Comparison of composite indicators based on BOD and C2NLS (12 indicators)

Fig. 2 Comparison of composite indicators based on BOD and C2NLS (11 indicators)

Next, we consider how the inclusion of the sustainability indicator changes the composite well-being indicators by comparing the two cases. Since the present study treats sustainability as just 1 of 12 well-being topics, the impact of including the sustainability indicator is modest, and it does not change the scores and rankings of the composite indicators dramatically. Table 4 shows that integrating the sustainability indicator slightly raises the values of the composite indicators on average and tightens their distribution.
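For \(CI_{BOD}\), the rise follows from a general property of (1): adding an indicator cannot lower any country's score. A short argument, which is our own sketch, runs as follows. Let \(CI_{BOD,c}^{(M)}\) denote the value of (1) with \(M\) indicators, and let \(\mu_{1,c}^{*}, \ldots, \mu_{M,c}^{*}\) be an optimal weight vector. Assigning a zero weight to the additional indicator keeps every constraint satisfied,

$$\sum_{m=1}^{M} \mu_{m,c}^{*}\, y_{m,k} + 0 \cdot y_{M+1,k} \le 1 \quad \text{for } k = 1, \ldots, K,$$

so this weight vector is feasible in the \((M+1)\)-indicator problem with an unchanged objective value, and therefore

$$CI_{BOD,c}^{(M+1)} \ge \sum_{m=1}^{M} \mu_{m,c}^{*}\, y_{m,c} = CI_{BOD,c}^{(M)}.$$

The argument relies on the country-by-country maximization in (1). It does not apply to \(CI_{C2NLS}\), whose weights are solved jointly in (2) and whose scores are all shifted by the largest CNLS score in Eq. (3).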

As Figs. 1 and 2 and Table 3 show, Estonia, Israel, Korea, Russia, and Sweden are the five countries whose composite indicator values or rankings rise the most in the sample.Footnote 15 All of them except Sweden are relatively low-income countries with large adjusted net savings relative to their other, lower socio-economic indicators, so the inclusion of the sustainability indicator raises their rankings. Korea raises its ranking significantly under both composite indicators, reflecting a sustainability indicator value much higher than its values for the other well-being indicators. Conversely, Australia, Finland, Germany, Greece, and Japan are the five countries whose composite indicator values or rankings decline the most. All of them except Greece are relatively high-income countries with small adjusted net savings relative to their higher socio-economic indicators. Only Australia and Finland show lower composite indicator values after the inclusion of the sustainability indicator; the other three countries lose rank under \(CI_{C2NLS}\).

Table 5 confirms two points: the usefulness of \(CI_{C2NLS}\) in the present application and the relatively modest impact of including a sustainability indicator. \(CI_{BOD}\) and \(CI_{C2NLS}\) are highly correlated whether 12 or 11 indicators are aggregated. This suggests that \(CI_{C2NLS}\) differentiates the performance of the countries ranked highest under \(CI_{BOD}\) while hardly changing the ranking of the other countries. Composite indicators aggregating 12 indicators are also highly correlated with those aggregating 11 indicators, indicating that the additional sustainability indicator does not significantly change the scores and rankings of the composite indicators.

Table 5 Correlation among composite indicators

In addition, all composite indicators and HDI, which share a similar pattern of distribution, are highly correlated with each other. By contrast, the composite indicators are less correlated with GDP per capita, and this correlation becomes even lower when the sustainability indicator is integrated with the other well-being indicators by aggregating 12 indicators. The search for an alternative welfare measure stems from an acknowledgement of the limitations of GDP per capita as a welfare measure. Judging from the picture of well-being across countries drawn by \(CI_{C2NLS}\), GDP per capita is even more problematic as a measure of sustainable well-being than as a measure of current well-being.Footnote 16
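The correlations discussed above can be computed with a few lines of code. The passage does not state whether Table 5 reports Pearson or rank correlations, so the sketch below computes both; Spearman's coefficient uses average ranks for ties, which matters here because many countries share the top BOD score of one. The five-country scores are invented.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ascending 1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block of ties while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            result[order[pos]] = avg_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five countries; three are tied at 1 under BOD.
ci_bod = [1.0, 1.0, 1.0, 0.92, 0.85]
ci_c2nls = [0.97, 0.99, 0.95, 0.90, 0.83]
print(round(pearson(ci_bod, ci_c2nls), 3), round(spearman(ci_bod, ci_c2nls), 3))
```

Even though \(CI_{C2NLS}\) reorders the three tied countries, the rank correlation stays high because the lower-ranked countries keep their positions, which is the pattern the text describes.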

5 Robustness Check and Discussion

The values of the composite indicators adopted in the present study can be interpreted as measures of productive efficiency. Composite indicators aggregating 12 or 11 well-being indicators for 36 countries correspond to DEA efficiency measures with 12 or 11 outputs, 1 input, and 36 observations. Recently, the asymptotic properties of DEA efficiency scores have been widely examined by explicitly modelling the data-generating process.Footnote 17 Studies such as Korostelev et al. (1995) show that a much larger number of observations is necessary as the number of inputs and outputs increases, in order to avoid large statistical bias and imprecise estimates with wide confidence intervals. By these standards, our sample of 36 observations is relatively small.

Therefore, the present study may suffer from the so-called curse of dimensionality, in which case the empirical results in the previous section could be merely an artefact of statistical noise. In this section, we verify the robustness of our analysis by employing a model with a much smaller number of inputs and outputs. Reducing the number of inputs and outputs reduces the statistical bias associated with the composite indicators, which allows us to evaluate more accurately the methodological advances proposed in the present study, such as the integration of the sustainability indicator and the adoption of C2NLS.
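To make the DEA correspondence concrete: with a single constant (dummy) input, the BOD score of country \(i\) is the optimal value of the linear program \(CI_i = \max_{w \ge 0} \sum_j w_j y_{ij}\) subject to \(\sum_j w_j y_{kj} \le 1\) for every country \(k\). The sketch below assumes this textbook formulation without weight restrictions (the study's exact specification may differ) and solves it by brute-force vertex enumeration in pure Python, which is only practical for tiny examples; the data are invented. It also shows why ties at the top tend to multiply as indicators are added: setting the new weight to zero recovers the old problem, so no country's score can fall.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve the square system a @ w = b by Gauss-Jordan elimination; None if singular."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def bod_score(data, i, tol=1e-9):
    """BOD score of country i: max w.y_i s.t. w.y_k <= 1 for every country k, w >= 0.

    The optimum of a bounded LP lies at a vertex, where k constraints hold with
    equality, so we try every such combination and keep the best feasible one.
    """
    k = len(data[0])
    # Each constraint is (coefficients, rhs), meaning coefficients . w <= rhs.
    cons = [(list(row), 1.0) for row in data]
    cons += [([-1.0 if j == c else 0.0 for j in range(k)], 0.0) for c in range(k)]
    best = 0.0
    for active in combinations(cons, k):
        w = solve_linear([c for c, _ in active], [r for _, r in active])
        if w is None:
            continue
        if all(sum(cj * wj for cj, wj in zip(c, w)) <= r + tol for c, r in cons):
            best = max(best, sum(wj * yj for wj, yj in zip(w, data[i])))
    return best


# Invented, normalised data for five countries and two indicators.
two = [[1.0, 0.2], [0.8, 0.8], [0.2, 1.0], [0.5, 0.5], [0.6, 0.3]]
# The same countries with a third (hypothetical) indicator appended.
three = [row + [extra] for row, extra in zip(two, [0.3, 0.4, 0.6, 1.0, 0.5])]

s2 = [bod_score(two, i) for i in range(len(two))]
s3 = [bod_score(three, i) for i in range(len(three))]
print([round(s, 4) for s in s2])  # the first three countries share the top score of 1
print([round(s, 4) for s in s3])  # no score falls; the fourth country now also reaches 1
```

For real applications an LP solver (for example SciPy's `linprog`) would replace the enumeration, which grows combinatorially with the number of countries and indicators.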

Instead of directly aggregating the 12 indicators into a single number, this section constructs the composite indicator of sustainable well-being in two stages. First, following the classification of the BLI, we aggregate the 11 indicators by simple averaging into two sub-aggregates: an indicator of material living conditions and an indicator of QOL.Footnote 18 Second, we aggregate these two sub-aggregates and the sustainability indicator into a composite indicator. Focusing on the second stage of aggregation, we compare the two aggregation procedures, BOD and C2NLS, and investigate the impact of integrating the sustainability indicator.
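The first stage is plain averaging within groups, after which the second stage sees only two or three outputs. The sketch below assumes the OECD's usual split of the 11 BLI topics (income, jobs, and housing as material living conditions; the remaining eight as QOL) and assumes the indicators are already normalised to a common scale; the topic keys and the country's values are hypothetical.

```python
from statistics import mean

# Assumed split of the 11 BLI topics (hypothetical keys for this sketch):
# three material-living-conditions topics and eight quality-of-life topics.
MATERIAL = ["income", "jobs", "housing"]
QOL = ["health", "work_life_balance", "education", "community", "civic_engagement",
       "environment", "safety", "life_satisfaction"]


def first_stage(indicators):
    """Stage 1: simple averages of (already normalised) indicators within each group."""
    return mean(indicators[t] for t in MATERIAL), mean(indicators[t] for t in QOL)


def second_stage_inputs(indicators, sustainability=None):
    """Outputs passed to BOD or C2NLS in stage 2: the two sub-aggregates and,
    optionally, the sustainability indicator (three outputs instead of two)."""
    material, qol = first_stage(indicators)
    if sustainability is None:
        return [material, qol]
    return [material, qol, sustainability]


# One hypothetical country, indicators normalised to [0, 1].
country = {"income": 0.6, "jobs": 0.9, "housing": 0.6, **{t: 0.5 for t in QOL}}
print(second_stage_inputs(country))
print(second_stage_inputs(country, sustainability=0.8))
```

Because the first stage fixes equal weights within each group, country-specific weighting is confined to the second stage, which is where BOD and C2NLS are compared.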

Table 6 shows that the composite indicators derived in the second stage are highly correlated with the original composite indicators that directly aggregate 12 or 11 indicators.Footnote 19 This suggests that the direct and two-stage constructions evaluate countries’ relative performance similarly. Thus, as verified below, the conclusions drawn from the empirical analysis in the previous section can be expected to remain valid in the setting of the present section, which is less affected by statistical noise.

Table 6 Correlation among composite indicators aggregating sub-aggregates

Table 7 summarizes \(CI_{BOD}\) and \(CI_{C2NLS}\) in the two cases and compares them with HDI and GDP per capita. Unlike in the previous section, the two composite indicators have a smaller mean and a larger standard deviation than HDI.Footnote 20 However, the impact of adopting \(CI_{C2NLS}\) is similar to that found in the previous section. While the composite indicator scores increase with per capita income, the difference in \(CI_{BOD}\) and \(CI_{C2NLS}\) scores between high-income and middle-income countries is much smaller than the difference in GDP per capita. The stronger discriminating power of \(CI_{C2NLS}\) is also confirmed when the number of inputs and outputs is relatively small. Four countries are ranked equally under \(CI_{BOD}\) applied to two indicators, and six are ranked equally under \(CI_{BOD}\) applied to three indicators, whereas \(CI_{C2NLS}\) completely differentiates their performance. This leads to a lower mean for \(CI_{C2NLS}\) than for \(CI_{BOD}\).

Table 7 Descriptive statistics of composite indicators aggregating sub-aggregates
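The tie-breaking that lowers the mean of \(CI_{C2NLS}\) can be illustrated with the correction step that gives C2NLS its name: after a convex nonparametric least squares (CNLS) frontier is fitted, it is shifted by the largest residual, in the spirit of corrected OLS, so that it envelops all observations. Generically, only the observation with the largest residual lies on the shifted frontier, so a single unit attains a score of one. The sketch uses an additive residual and a ratio-based score, which is one convention among several; it does not reproduce the CNLS estimation itself or how the present study sets up the composite-indicator problem, and the fitted values are invented rather than estimated.

```python
def count_top_ties(scores, tol=1e-9):
    """Number of units sharing the maximum score (within a tolerance)."""
    top = max(scores)
    return sum(1 for s in scores if s >= top - tol)


def corrected_scores(observed, fitted):
    """COLS-style correction of a fitted frontier.

    The fitted values are shifted up by the largest residual so that the frontier
    envelops every observation; the score is observed / shifted frontier.
    """
    residuals = [y - f for y, f in zip(observed, fitted)]
    shift = max(residuals)
    return [y / (f + shift) for y, f in zip(observed, fitted)]


# Invented numbers: 'fitted' stands in for CNLS estimates, which would normally
# come from a quadratic-programming solver.
observed = [0.95, 0.90, 0.80, 0.70]
fitted = [0.90, 0.88, 0.82, 0.72]
scores = corrected_scores(observed, fitted)

bod_like = [1.0, 1.0, 1.0, 0.90]  # a BOD-style outcome with three ties at the top
print(count_top_ties(bod_like), count_top_ties(scores))
print([round(s, 3) for s in scores])
```

Units tied at one under BOD receive distinct scores below one here (except the single best unit), which mechanically pulls the mean down.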

Next, we consider how the inclusion of the sustainability indicator changes the composite well-being indicators. Since the present section deals with 2 well-being sub-aggregates instead of 11 indicators, the inclusion of the sustainability indicator has a relatively larger impact than in the previous section. Whereas the mean and standard deviation of \(CI_{C2NLS}\) hardly changed in the previous section, both decrease by around 5% in the present section, as shown in Table 7.

However, Table 8 shows very high correlations between composite indicators aggregating three indicators and those aggregating two indicators. Thus, as in the previous section, the inclusion of the sustainability indicator has little impact on the evaluation of countries’ relative performance in terms of overall well-being. In addition, the usefulness of \(CI_{C2NLS}\) is confirmed whether or not the sustainability indicator is included: \(CI_{BOD}\) and \(CI_{C2NLS}\) are highly correlated whether three or two indicators are aggregated. This suggests that \(CI_{C2NLS}\) differentiates the performance of countries ranked highest under \(CI_{BOD}\) without substantially changing the ranking of the other countries.

Table 8 Correlation among composite indicators of different procedures

Thus, the present section verifies, with a smaller number of underlying indicators, the two main conclusions drawn in the previous section: the usefulness of \(CI_{C2NLS}\) and the relatively modest impact of including a sustainability indicator. However, as Table 8 shows, the empirical results of the two sections differ in one respect. While both composite indicators are more highly correlated with HDI than with GDP per capita, the inclusion of a sustainability indicator affects these correlations differently: it lowers the correlations with HDI and GDP per capita in the previous section but raises them in the present section.Footnote 21 Which finding gives the more accurate picture of reality? Answering this question would require a larger number of observations or further investigation of the first-stage aggregation method.

6 Conclusion

Well-being is a multidimensional concept. The OECD recently specified 11 topics that are essential to people’s well-being and released 11 corresponding well-being indicators. However, the OECD leaves the aggregation of the data to users, and no sustainability indicator is included among the 11 indicators. The present study therefore introduces an additional sustainability indicator derived from the World Bank’s adjusted net savings and aggregates it with the 11 well-being indicators into composite indicators. We adopt two composite indicators based on the BOD and C2NLS approaches. Unlike HDI, both approaches aggregate the individual indicators using country-specific weights that favour each country.

The composite indicator based on BOD is now a standard tool for evaluating multifaceted concepts such as well-being. However, because more than half of the countries are ranked highest when BOD is applied in the present study, BOD fails to distinguish their performance. The composite indicator based on C2NLS, which we introduce here for the first time, gives a cross-country ranking similar to that based on BOD. Moreover, it completely differentiates the performance of countries that are ranked equally under BOD. Thus, C2NLS improves on the BOD approach by enabling a complete cross-country comparison of overall well-being.

We quantify the impact of including the sustainability indicator alongside the other well-being indicators by using composite indicators based on C2NLS. The inclusion of the sustainability indicator has a rather modest effect and does not significantly change the scores and rankings of the composite indicators for many countries. However, the composite indicators of some countries are significantly affected by integrating the sustainability indicator. Each of these countries has a large gap between its sustainability indicator and its other well-being indicators. Countries whose sustainability indicator is much larger than their other well-being indicators, such as Korea, rise in the ranking, whereas countries whose sustainability indicator is much smaller than their other well-being indicators, such as Australia, fall in the ranking.

Our results verify that C2NLS is an indispensable tool for integrating sustainability concerns into a composite well-being indicator. However, the greater discriminatory power of the C2NLS-based composite indicator relative to the BOD-based one does not mean that it captures the state of each country’s sustainable well-being more accurately.Footnote 22 Our proposal to introduce C2NLS is justified only from the practical standpoint that it can completely distinguish the level of sustainable well-being among countries. Future research should investigate a theoretical framework for evaluating composite indicators.Footnote 23