1 Introduction

Measuring individuals’ progress and well-being involves expanding the framework of macro-economic indicators traditionally used as measures of growth. Since the end of the 20th century, several attempts have been made to develop indicators that provide a broader overview of factors associated with economic growth and go beyond the Gross Domestic Product (GDP). In general, the term quality of life (QoL) encompasses the overall life experience of an individual and their personal well-being (Malkina-Pykh and Pykh, 2008). In this paper, we develop a composite regional quality of life indicator, making use of a pre-defined set of individual indicators (measuring the main quality of life dimensions), with two main features. First, reference levels can be considered for each single indicator, and the results are expressed in terms of the relative position of each region with respect to these levels. In particular, this makes it possible to assess the regional quality of life performance at different geographical scales. Second, both compensatory and non-compensatory composite indicators are calculated. This way, apart from obtaining an overall quality of life measure, the weakest performances of each region are easily identified, which is valuable information for policy-makers. To the best of our knowledge, such a methodology has never been used in the field of quality of life assessment.

Originally, the main motivation for developing new QoL metrics was that they should be people-centred. The first attempt to build a measure looking at both economic performance and diverse welfare experiences was made in 1990 with the United Nations’ Human Development Index (HDI). In 2009, a further step was taken by the Commission on the Measurement of Economic Performance and Social Progress, headed by the Nobel laureate Joseph Stiglitz, which intended to propose alternatives to GDP as measures of well-being and social progress (Stiglitz et al., 2009). Following the recommendations of this report, since 2015 the OECD has provided the Better Life Index as a contribution to measuring the well-being of OECD countries and regions (Durand, 2015).

Since that early attempt in 1990, many QoL indices have been developed. Because they provide a single number to assess the performance of complex and multidimensional phenomena, composite indicators are widely used in a broad range of topics, including well-being or quality of life (Nardo et al., 2008; El Gibari et al., 2019). They consist of aggregating a number of individual indicators, each measuring an individual aspect of QoL, into a single composite measure. Some survey papers gathering composite measures have been published. Hagerty et al. (2001), Malkina-Pykh and Pykh (2008) and Costa (2015) review and evaluate QoL composite indicators according to different desirable features, although none of them goes into technical details about the different methodologies used. Only the latter mentions the “exact linear combination of the indicators” as the way to aggregate the individual indicators. Booysen (2002) reviews general composite indicators of development, and identifies four ways of normalising the single indicators: not scaling (if they are originally measured in a common scale), z-scores (based on mean and standard deviation), transforming into ordinal scales, and linear normalisation (typically, range normalisation measuring the position between the minimum, 0, and the maximum value, 1, of all the units considered). Freudenberg (2003) surveys composite indicators of country performance and identifies other normalisation methodologies based just on the maximum value or the mean value (the normalisation is carried out by dividing each original value by the corresponding reference level). With respect to the aggregation process, most of the methods reviewed use the classical linear weighted average, and a few use functional relationships (like principal component analysis).
All these aggregations can be regarded as compensatory, that is, bad values of certain indicators can be compensated by good values of others and therefore, weaknesses can remain unnoticed in the final composite indicator.

In fact, most of the existing methodologies for developing QoL composite indicators use the classical weighted average (additive and, thus, compensatory) as a means to aggregate the indicators. This is, for example, the case in Royuela et al. (2003), Durand (2015), Marchante and Ortega (2006) and Lagas et al. (2015). Other compensatory methods can be found in Greyling and Tregenna (2017) and Patil and Sharma (2020) (principal component analysis), or Karagiannis and Karagiannis (2020) (Benefit of Doubt). In Ivaldi et al. (2014), three types of aggregation are considered: weighted average, factorial (both of them compensatory), and a Borda-type one, based on voting theories, that builds rankings in a non-compensatory fashion. A similar approach (Condorcet-based) is proposed in Goerlich and Reig (2021). With respect to the normalisation scheme, all of them use one of the previously mentioned ones.

As to the methodological discussion, most criticisms of the use of composite indicators relate to the fact that they provide a “big picture” that can lead policy-makers to draw simplistic conclusions (Saisana and Tarantola 2002; Greco et al. 2018). During the last two decades, the use of multi-criteria decision-making tools to construct composite indicators has risen in a wide spectrum of fields, offering methodological alternatives that address the criticisms of the normalisation, weighting and aggregation stages. In particular, two issues are especially relevant. With respect to the normalisation, it is important to notice that, rather than providing absolute measures, we humans are able to measure by comparison, that is, expressing the measurement in terms of a reference unit. Therefore, distance-based normalisations that make use of reference levels for each single indicator are useful, because the results obtained are expressed in terms of the relative position of each unit studied with respect to these levels. In the case of the QoL indicators, these reference levels can be established by experts or policy-makers, who define what is admissible and/or desirable for each indicator. Alternatively, they can be statistically set, taking into account a given number of observations. This makes it possible to measure QoL at different geographical scales, by comparing the performance of a given territory with that of the other territories belonging to the geographical scale chosen. Regarding the aggregation issue, it has been shown (see, e.g. El Gibari et al. 2021) that the joint use of compensatory and non-compensatory schemes produces additional information about possible lines of improvement that can be extremely useful for policy-makers.

In El Gibari et al. (2019), a survey of composite indicators using multicriteria methods is carried out. Some reference-based methodologies are identified, among which we can mention the TOPSIS approach, which uses the best (ideal) and worst (nadir) levels as reference levels (see, e.g. Boggia et al. 2018), Goal Programming, which uses a target value for each indicator (Blancas et al. 2010), and the reference point scheme. With respect to the latter, Ruiz et al. (2011) defined the double reference point scheme, where two reference levels (reservation and aspiration) were used for each indicator. This methodology was later generalized in Ruiz et al. (2019) to allow the use of any number of reference levels, resulting in the Multiple Reference Point Weak-Strong Composite indicator (MRP-WSCI), which has been successfully applied in other fields, like social responsibility (Cabello et al. 2014), university performance (El Gibari et al. 2018), or ease of doing business (Ruiz et al. 2018). Out of the methods reported in El Gibari et al. (2019), only in Boggia et al. (2018) and Ruiz et al. (2011) was it possible to obtain both compensatory and non-compensatory composite indicators, which is also one of the main features of the MRP-WSCI scheme. Later on, Mazziotta and Pareto (2020) proposed a methodology where the non-compensatory and the compensatory indicators form an interval performance measure. Nevertheless, no reference levels are allowed in this methodology.

In recent years, a new approach to regional development in the European Union has been emerging to identify territorial challenges and assist governments in the improvement of policies. The importance of developing metrics at different territorial levels is becoming ever more relevant, as they can capture different patterns that could be hidden at the national level (Garcia-Bernabeu et al., 2020). This new understanding of benchmarking quality of life has been introduced into the EU agenda to set out specific regional needs in a common framework for all EU regions. Among other reasons, this is related to the fact that specific policies could be more effective when designed at the regional level. To complement the GDP measure, the European Union has developed its own framework for assessing the quality of life (QoL) in EU countries (European Commission, 2020). The expert group of researchers coordinated by Eurostat has developed a scoreboard based on 8+1 dimensions: Material living conditions, Productive or main activity, Health, Education, Leisure and social interactions, Economic security and physical safety, Governance and basic rights, Natural and living environment, and Overall experience of life. Thus, the Quality of Life framework adopted by the European Commission reflects that QoL is a multidimensional phenomenon. A relevant contribution to this field is the ESPON QoL - Quality of life measurements and methodology project, which aims to produce evidence about the challenges, achievements and development trends of European regions and cities in relation to quality of life.

Since the quality of life is a multidimensional phenomenon, the purpose of this article is to address the complex issue of constructing a composite indicator to provide an overview of regional quality of life, by choosing a set of reference levels that allow territorial comparisons. More precisely, we present a different perspective on the assessment of the regional QoL, using the aforementioned Multi-Reference Point-based Weak Strong Composite Indicator (MRP-WSCI) approach and further enhance the quality of the sub-national analysis. In particular, we address two research questions that focus on developing a regional QoL composite index. The first question examines how the decision maker could provide preferential information using reference levels to intuitively define different performance intervals for each indicator. As a result, the information provided is much richer, and users can easily interpret the regional QoL composite index’s meaning. Another advantage of such reference levels is to facilitate comparing the performance across different geographical scales. The second question concerns a fundamental issue related to the aggregation of individual indicators, such as compensability. Two different aggregations are proposed: the weak Regional QoL indicator, allowing for full compensation among the single indicators, and the strong Regional QoL indicator, not allowing for any compensation, which provides an additional layer of information for policy-making. In our proposal, in contrast to other works that develop regional quality of life indicators, the use of reference levels makes it easier to compare a region with respect to its own country or with respect to other groups of countries, such as the European Union. In addition, strong and weak composite indicators are developed to measure the quality of life, in order to highlight the weaknesses of each region and to design strategies according to its specific needs. 
We consider that this comparative measurement approach can be a valuable tool to deliver guidance for regional and national level policy-makers.

To gain a better insight into these issues, we study the Spanish regions to assess their quality of life as a test case, using two different geographical scales: the Spanish and the European ones. For this purpose, we use data from the Spanish National Institute of Statistics (INE) for the year 2018. The proposed Regional QoL composite indicator is developed to paint a comprehensive picture of the quality of life in 19 Spanish regions, and to provide warning signals to regional and national policy-makers on the areas where the dimensions of quality of life need further improvements.

With these considerations in mind, this paper is organised in five sections (including the introduction) and an Appendix (containing some supplementary material). In Sect. 2, the method for constructing the regional QoL composite indicator is presented. In Sect. 3, the application to Spanish regions is shown by using a data set of individual indicators of QoL for the year 2018. Next, in Sect. 4, the results of this application are discussed. Finally, Sect. 5 draws some conclusions.

2 Methodology: Reference-Point Based Approach for Regional QoL Assessment

To develop the regional QoL composite index, we assume the European Commission theoretical framework. Therefore, we regard the system of indicators that have been previously selected as already validated and we assume the indicators to have reliable information to be assessed. From the baseline of this theoretical QoL framework, we develop a step-wise methodological approach for constructing the regional QoL composite index using reference levels at different territorial scales. Of course, this approach can be applied to any other system of indicators if so wished. Let us describe in this section how we adapt the MRP-WSCI methodology (Ruiz et al., 2019) to build a regional QoL composite index through the following steps:

  1. Step 1

    Let us denote by n the number of regions \((i=1,2, \dots , n)\), by m the number of QoL indicators \((j=1,2, \dots , m)\) and by \(a_{ij}\) the value of indicator j for region i. Therefore, the \((n\times m)\) decision matrix \(A = (a_{ij})\) contains the whole data set. In most cases, certain indicators are grouped together to capture a more general idea (for example, to measure Safety, the analyst may propose two indicators: perception of physical safety, and crime rates in the region). The number of such groups/dimensions is denoted by l. For each \(k=1,2, \dots , l\), we denote by \(J_k\) the set containing the indices of the indicators belonging to group/dimension k.

  2. Step 2

    The methodology proposed is suitable for the case when certain levels that define performance intervals for each indicator can be provided. For example, we may think that for a given indicator, values over 10 are good, values between 5 and 10 are fair and values under 5 are poor. This is what we call reference levels. In general, it will be assumed that such v levels are given for each indicator j: \(\rho _j^{1},\dots ,\rho _j^{v}\). These values, together with the minimum, \(\rho _j^{0}\), and maximum, \(\rho _j^{v+1}\), values that indicator j can feasibly obtain, form a partition of the range of possible values of the indicator, defined by the \((v+2)\) dimensional vector \(\varvec{\rho }_j=(\rho _j^{0}, \rho _j^{1},\ldots ,\rho _j^{v},\rho _j^{v+1})\). This vector determines, as mentioned before, certain performance sub-ranges for the indicator. These reference levels can be established in an absolute way, as in the example above, or in a relative way, in order to obtain comparative scores. In our case, their choice will be motivated by the statistical spread observed in the indicator at a geographical scale G, typically obtained using different percentiles. For this reason, the reference vector obtained for the geographical scale G will be denoted as \(\varvec{\rho }_j^G=(\rho _j^{G,0}, \rho _j^{G,1},\ldots ,\rho _j^{G,v},\rho _j^{G,v+1})\). For each geographical scale chosen, the results will indicate the relative position of each region with respect to all the regions contained in G, for the given indicator.

    There will be as many families of vectors of reference levels as territorial scales for which the QoL is to be analysed. Thus, if we take, for example, the country of Italy as the geographical scale for comparison purposes, then, \(G=IT\) and the vector of reference levels will be \(\varvec{\rho }_j^{IT}\), whereas if the territorial scale is the European Union, \(G=EU\), the vector of reference levels will be denoted by \(\varvec{\rho }_j^{EU}\).

  3. Step 3

    All the values of A need to be brought down to a common scale, according to the previously defined vectors of reference levels \(\varvec{\rho }_j^G\). We shall denote the values that define this common scale by \(\varvec{\alpha }=(\alpha ^0, \alpha ^1,\ldots ,\alpha ^v,\alpha ^{v+1})\). Notice that, regardless of the indicator or the territorial scale, the vector \(\varvec{\alpha }\) should be the same. For technical reasons, it will be assumed that \(\alpha ^t >0\), \(t=1,\dots , v\).

  4. Step 4

    Compute the normalized matrices for each geographical scale, \(S^G =(s_{ij}^G)\), using the following achievement scalarizing function:

    $$\begin{aligned} s_{ij}^G= & {} s_j(a_{ij},\varvec{\rho }_j^G) = \alpha ^{t-1} + \frac{\alpha ^{t}-\alpha ^{t-1}}{\rho _j^{G,t}-\rho _j^{G,t-1}} (a_{ij} - \rho _j^{G,t-1})\nonumber \\&\qquad \qquad \text {if } a_{ij} \in [\rho _j^{G,t-1}, \rho _j^{G,t}]. \end{aligned}$$
    (1)

    Therefore, the achievement function \(s_{ij}^G\) for indicator j is a piecewise linear function that takes values between \(\alpha ^{t-1}\) and \(\alpha ^{t}\) if the region achieves, for indicator j, values included in the interval \([\rho _j^{G,t-1}, \rho _j^{G,t}]\) of the reference level vector. According to the previous step, if we define two territorial scales, we have to compute two normalized matrices, namely, a \(S^{IT}\) matrix for the country using the \(\varvec{\rho }_j^{IT}\) vector of reference levels and a \(S^{EU}\) one, where we use the European reference levels defined in \(\varvec{\rho }_j^{EU}\).

  5. Step 5

    Determine the weighting rule. For each dimension k, assign weights \(\mu _{j}\), for every \(j \in J_k\), to the indicators belonging to dimension k, and assign a weight \(w_k\) to the corresponding dimension. The choice of weights has a significant effect on the overall ranking of regions. Each weighting method has advantages and disadvantages, and the choice needs to be justified by the composite indicator developer (Greco et al., 2018). Nevertheless, equal weighting is the most common scheme used when constructing composite indicators (OECD, 2008), provided that the system of indicators has been designed accordingly.

  6. Step 6

    Define the aggregation rule from indicators to dimensions (First Level). At this stage, we can obtain two types of indicators. The weak composite indicator of region i in dimension k, \(\delta _{ik} (G)^w\), allowing for full compensation, is obtained using the rule based on weighted additive aggregation. On the other hand, the strong composite indicator, \(\delta _{ik} (G)^s\), does not allow for any compensation and it represents the worst performance of the region in dimension k. The general forms of these two indicators for the geographical scale G are:

    $$\begin{aligned} \delta _{ik} (G)^w= & {} \sum _{j \in J_k} \mu _{j} s_j \left( a_{ij},\varvec{\rho }_j^G \right) , \end{aligned}$$
    (2)
    $$\begin{aligned} \delta _{ik} (G)^s= & {} \min _{j \in J_k} \left\{ s_j \left( a_{ij},\varvec{\rho }_j^G \right) \right\} . \end{aligned}$$
    (3)

    This way, the weak indicator \(\delta _{ik} (G)^w\) provides an overall measure of the performance of region i in dimension k, in the scale defined by vector \(\varvec{\alpha }\). Therefore, it can be interpreted as the position of the region with respect to hypothetical reference levels for the dimension. On the other hand, the strong indicator \(\delta _{ik} (G)^s\) points out the worst indicator of the dimension for region i.

  7. Step 7

    Define the rule to aggregate the dimensions (Second Level). Thus, we obtain the weak (\({\text {R-QoL}}_{i} (G)^W\)) and strong (\({\text {R-QoL}}_{i} (G)^S\)) regional QoL composite indices for each geographical scale G as follows:

    $$\begin{aligned} {\text {R-QoL}}_{i} (G)^W= & {} \sum _{k=1}^{l} w_k \delta _{ik}(G)^w, \end{aligned}$$
    (4)
    $$\begin{aligned} {\text {R-QoL}}_{i} (G)^S= & {} \min _{k=1, \dots , l} \{\delta _{ik}(G)^w\}. \end{aligned}$$
    (5)

    In this case, the global weak indicator \({\text {R-QoL}}_{i} (G)^W\) provides an overall measure of the QoL of region i, while the global strong indicator \({\text {R-QoL}}_{i} (G)^S\) points out its worst dimension.
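Steps 4 to 7 can be sketched in a few lines of Python for a single region. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`achievement`, `dimension_scores`, `global_scores`), the toy data, the clamping of out-of-range values and the convention that higher indicator values are better are all introduced here for illustration only.

```python
from bisect import bisect_right


def achievement(a, rho, alpha):
    """Eq. (1): map a raw value a onto the common scale alpha.

    rho   -- strictly increasing reference levels (rho^0, ..., rho^{v+1})
    alpha -- common-scale values (alpha^0, ..., alpha^{v+1}), same length
    """
    if len(rho) != len(alpha):
        raise ValueError("rho and alpha must have the same length")
    # Assumption: values outside [rho^0, rho^{v+1}] are clamped to the ends.
    a = min(max(a, rho[0]), rho[-1])
    # Find t such that rho[t-1] <= a <= rho[t].
    t = min(max(bisect_right(rho, a), 1), len(rho) - 1)
    slope = (alpha[t] - alpha[t - 1]) / (rho[t] - rho[t - 1])
    return alpha[t - 1] + slope * (a - rho[t - 1])


def dimension_scores(s_row, groups, mu):
    """Eqs. (2)-(3): weak and strong indicators per dimension for one region."""
    weak = {k: sum(mu[j] * s_row[j] for j in J) for k, J in groups.items()}
    strong = {k: min(s_row[j] for j in J) for k, J in groups.items()}
    return weak, strong


def global_scores(weak, w):
    """Eqs. (4)-(5): weak and strong regional indices from weak dimension scores."""
    rqol_weak = sum(w[k] * weak[k] for k in weak)
    rqol_strong = min(weak.values())
    return rqol_weak, rqol_strong


# Toy example (hypothetical numbers): 4 indicators grouped in 2 dimensions.
alpha = [0, 1, 2, 3]
rho = [[0, 5, 10, 20]] * 4            # same reference vector for every indicator
a_row = [15.0, 5.0, 2.5, 7.5]         # raw values of one region
groups = {"A": [0, 1], "B": [2, 3]}   # J_k: indicator indices per dimension
mu = [0.5, 0.5, 0.5, 0.5]             # equal weights within each dimension
w = {"A": 0.5, "B": 0.5}              # equal weights across dimensions

s_row = [achievement(a, r, alpha) for a, r in zip(a_row, rho)]
weak, strong = dimension_scores(s_row, groups, mu)
rqol_weak, rqol_strong = global_scores(weak, w)
```

In this toy case the normalized scores are 2.5, 1.0, 0.5 and 1.5, so dimension B has a weak score of 1.0 but a strong score of 0.5: the strong indicator exposes a poorly performing indicator that the compensatory average partly hides.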

With the aim of comparing the results at different geographical scales, we propose a new index which consists in computing the ratio of the \({\text {R-QoL}}\) composite index measured with respect to the first geographical scale to the \({\text {R-QoL}}\) calculated using the second geographical scale. The ratio is called the “Regional Comparative Advantage (RCA) index” and is defined as follows:

$$\begin{aligned} RCA_i=\frac{{\text {R-QoL}}_{i} (G_1)}{{\text {R-QoL}}_{i} (G_2)} \end{aligned}$$
(6)

Here, \({\text {R-QoL}}_{i}\) can be the corresponding weak or strong composite indicator. It is possible (although not very likely) that \({\text {R-QoL}}_{i} (G_2) = 0\). In this case, \(RCA_i = 1\) if \({\text {R-QoL}}_{i} (G_1) = 0\) and \(RCA_i > 1\) otherwise. Thus, if \(RCA_i>1\), region i has a comparative advantage at the geographical scale \(G_1\), that is, the region achieves a higher overall score when it is measured using the first geographical scale. However, if \(RCA_i<1\), the region has a comparative advantage at the geographical scale \(G_2\), that is, the overall score is better when the region is assessed using the reference values at the second geographical scale.
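The definition in Eq. 6, including the zero-denominator convention, can be written as a short Python function. This is a sketch only: the names `rca` and `advantage` are hypothetical, and since the text only requires \(RCA_i > 1\) when the denominator vanishes and the numerator does not, representing that case by positive infinity is an assumption made here.

```python
import math


def rca(rqol_g1, rqol_g2):
    """Eq. (6): ratio of the index at scale G1 to the index at scale G2."""
    if rqol_g2 == 0:
        # Convention from the text: RCA = 1 if both indices are zero,
        # RCA > 1 otherwise. Assumption: "> 1" is represented by +infinity.
        return 1.0 if rqol_g1 == 0 else math.inf
    return rqol_g1 / rqol_g2


def advantage(value):
    """Name the scale at which the region has a comparative advantage."""
    if value > 1:
        return "G1"
    if value < 1:
        return "G2"
    return "none"


# Hypothetical scores for one region at two geographical scales.
example = rca(2.0, 2.5)
```

With these hypothetical scores the ratio is below 1, so `advantage(example)` reports an advantage at scale \(G_2\).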

3 Monitoring Regional Quality of Life in Spain

In 2019, a monitoring framework for regional quality of life assessment was presented by the National Institute of Statistics (INE) in Spain (INE, 2020) following the EU Quality of Life indicators collection (European Commission, 2020). The framework also proposes a composite indicator using the Adjusted Mazziotta-Pareto Index (AMPI) methodology (Mazziotta and Pareto, 2016) and analyses the evolution of the regions in the period 2008–2018. The European statistics of Quality of Life from the Eurostat database and the INE statistics at the regional level for Spain are considered the core instrument for this analysis. The list of indicators is the one established by INE, following the main guidelines of the indicators scheme defined by Eurostat on the basis of the Quality of Life Expert Working Group. Thus, the number of indicators included in the publication (17) is the same as that proposed by Eurostat as “principal indicators”, in order to synthesise the analysis of the different dimensions that make up the quality of life of individuals into a not very large but consensual number of indicators. The sample used in our research to construct the decision matrix comprises 19 regions and a set of 17 principal indicators for year 2018 grouped in 9 dimensions as shown in Table 1.

Table 1 List of principal indicators by dimension and their direction

According to the second step, we consider two geographical scales to define the reference vectors. First, we assess the regional QoL using Spanish reference values (\(G1=ES\)) from the same database. Next, we consider European reference values (\(G2=EU\)) using information from the Eurostat database. In both cases, we use statistical values to determine four reference values (the minimum, the 30th and 70th percentiles, and the maximum), which define three performance intervals. For year 2018, the resulting reference values are listed in Tables 2 and 3, respectively.
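As an illustration of how such statistical reference levels could be derived for one indicator, the following Python sketch computes the minimum, the 30th and 70th percentiles and the maximum of a list of observations. The function name `reference_levels` and the data are hypothetical, and the percentile definition (deciles with the `inclusive` method of `statistics.quantiles`) is an assumption, since the text does not state which interpolation rule was used.

```python
from statistics import quantiles


def reference_levels(values):
    """Return [min, P30, P70, max] for one indicator at one geographical scale."""
    # Assumption: deciles computed with the 'inclusive' method; other
    # percentile definitions would give slightly different P30 and P70.
    deciles = quantiles(values, n=10, method="inclusive")
    return [min(values), deciles[2], deciles[6], max(values)]


# Hypothetical indicator values for 11 territorial units.
example = [3, 9, 1, 7, 5, 11, 2, 8, 4, 10, 6]
rho = reference_levels(example)
```

For these eleven values the resulting vector is (1, 4, 8, 11). Running the same function on the units of a different geographical scale would give the reference vector for that scale.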

Table 2 Reference levels of the QoL principal indicators in 2018 at Spanish geographical scale \(G1=ES\)
Table 3 Reference levels of the QoL principal indicators in 2018 at European geographical scale \(G2=EU\)

Having defined the common scale, \(\alpha =(0,1,2,3)\), for the achievement functions, we apply Eq. 1 to compute the normalized matrices \(S^{ES}\) and \(S^{EU}\), which are listed in the Appendix as Tables 10 and 11.
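For the common scale \(\alpha =(0,1,2,3)\) and four reference values per indicator (so that \(v=2\)), Eq. 1 can be written out piece by piece. The display below is simply Eq. 1 specialised to this case, assuming \(\rho _j^{G,0}<\rho _j^{G,1}<\rho _j^{G,2}<\rho _j^{G,3}\) are the minimum, the 30th percentile, the 70th percentile and the maximum of indicator j at scale G:

$$\begin{aligned} s_{ij}^G = {\left\{ \begin{array}{ll} \dfrac{a_{ij}-\rho _j^{G,0}}{\rho _j^{G,1}-\rho _j^{G,0}} &{} \text {if } a_{ij} \in [\rho _j^{G,0}, \rho _j^{G,1}], \\ 1+\dfrac{a_{ij}-\rho _j^{G,1}}{\rho _j^{G,2}-\rho _j^{G,1}} &{} \text {if } a_{ij} \in [\rho _j^{G,1}, \rho _j^{G,2}], \\ 2+\dfrac{a_{ij}-\rho _j^{G,2}}{\rho _j^{G,3}-\rho _j^{G,2}} &{} \text {if } a_{ij} \in [\rho _j^{G,2}, \rho _j^{G,3}]. \end{array}\right. } \end{aligned}$$

Read this way, and for an indicator where higher values are better, a score below 1 places the region under the 30th percentile of scale G, a score between 1 and 2 places it between the 30th and 70th percentiles, and a score above 2 places it over the 70th percentile.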

To arrive at a regional QoL composite index, the scalarized values need to be weighted and aggregated. As to the weighting scheme, we consider equal weights as the most suitable option, due to its simplicity and to focus the analysis on the aggregation procedure. Besides, the equal weights system is also the one used by the INE when applying the AMPI method.

The aggregation for the composite index consists of two steps.

  • First Level: from indicators to dimensions. The scalarized values were combined for each QoL dimension to compute, from Eq. 2, the \(\delta _{ik} (ES)^w\) and \(\delta _{ik} (EU)^w\) allowing for full compensability. For the non-compensability approach, the corresponding \(\delta _{ik} (ES)^s\) and \(\delta _{ik} (EU)^s\) are derived from Eq. 3. Tables 12 and 13 in the Appendix display the values for both geographical scales.

  • Second Level: aggregating the dimensions. Finally, applying Eqs. 4 and 5, we obtain the weak and strong QoL composite indices \({\text {R-QoL}}_{i} (ES)^W\), \({\text {R-QoL}}_{i} (EU)^W\), \({\text {R-QoL}}_{i} (ES)^S\) and \({\text {R-QoL}}_{i} (EU)^S\). The corresponding values and rankings are shown in Tables 4 and 5.
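Under equal weighting, and assuming it is applied at both levels (that is, \(\mu _j = 1/|J_k|\) within each dimension and \(w_k = 1/9\) across the nine dimensions), Eqs. 2 and 4 reduce to simple averages:

$$\begin{aligned} \delta _{ik} (G)^w = \frac{1}{|J_k|} \sum _{j \in J_k} s_{ij}^G, \qquad {\text {R-QoL}}_{i} (G)^W = \frac{1}{9} \sum _{k=1}^{9} \delta _{ik}(G)^w. \end{aligned}$$

The strong counterparts (Eqs. 3 and 5) are minima and therefore do not depend on the weights.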

4 Discussion

In this section, we comment on the results obtained for the Regional QoL composite indices, computed as explained in Sect. 3 by adapting the MRP-WSCI approach, and using Spanish and European reference levels as the geographical scales G1 and G2, respectively. First, we obtain the overall ranking for different compensation degrees. Second, we provide a more in-depth analysis of comparative performances when the regional data are aggregated for each dimension. Finally, we make some further considerations about the practical implementation of the methodology.

4.1 Overall Weak and Strong Regional Quality of Life Composite indices

In the most comprehensive approach to assess regional quality of life, we distinguish two main results depending on the choice of the aggregation rule. For the weak aggregation rule, Table 4 shows the \({\text {R-QoL}}^W\) composite index and the ranking position for Spain’s regions assuming full compensability at the second level of aggregation and for the two geographical scales. From this point of view, a poor performance in some indicators is compensated by sufficiently high values in other indicators. In the last column, the Regional Comparative Advantage index (RCA) is also reported. The ranking is headed by the regions of Aragón, Illes Balears and Cantabria at both geographical scales. In contrast, Murcia, Ceuta and Andalucía occupy the lowest positions in the ranking. Notice that, for all regions, the RCA is lower than one, that is, the score is higher when the individual indicators are assessed using European reference levels. For the best-positioned regions, the measurement of quality of life remains at a similar level regardless of the geographical scale chosen, and the value of the RCA is close to 1. However, the RCA tends to decrease as the position of the region in the ranking worsens.

Table 4 Weak R-QoL composite index for Spain’s regions (year 2018) using the ES and EU geographical scales

For the strong aggregation rule, Table 5 displays the results when compensability is not allowed at the second level of aggregation. The top regions when using the Spanish reference levels are Cantabria, Comunidad Valenciana and Cataluña, whereas for the EU geographical scale, the top regions are Comunidad Valenciana, Comunidad Foral de Navarra and Illes Balears. As before, the RCA is reported in the last column. Cantabria, La Rioja, País Vasco, Aragón and Cataluña achieve an \(RCA>1\), which indicates a comparative advantage when Spanish reference levels are used. As a result, for these regions, the rank position is lower under the European reference levels.

Table 5 Strong R-QoL composite index for Spain’s regions (year 2018) using the ES and EU geographical scales

As an illustrative example, we select Comunidad Foral de Navarra to comment on the corresponding results using both the ES and EU geographical scales. As shown in Table 4, from the weak perspective, the region occupies the fourth and fifth positions, respectively, with overall scores of 1.86 (ES) and 1.89 (EU). It can be seen that, in this case, the RCA is close to 1, which means a similar overall (compensatory) performance using ES or EU reference levels. On the other hand, from a strong perspective, the region ranks sixth with a score of 0.85 (ES), which points to the poor behaviour of the Safety dimension. Significantly, it climbs to the second position with a score of 1.29 (EU), as shown in Table 5. In this case, when using European levels, the worst behaviour corresponds to the Leisure and Social Interaction dimension. For this reason, the RCA, with a value of 0.66, reflects a comparative advantage in the performance of the region when EU reference levels are considered. A further analysis is therefore needed by looking at each dimension separately, which will be carried out in the next subsection.
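The RCA values for Comunidad Foral de Navarra can be checked directly from Eq. 6 and the scores quoted above, taking \(G_1=ES\) and \(G_2=EU\). The superscripts W and S on the RCA are notation introduced here to distinguish the weak and strong cases:

$$\begin{aligned} RCA^W = \frac{1.86}{1.89} \approx 0.98, \qquad RCA^S = \frac{0.85}{1.29} \approx 0.66. \end{aligned}$$

Since the scores are rounded to two decimals, ratios computed this way may differ slightly in the last digit from values obtained with unrounded data.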

4.2 Assessing Regional Quality of Life by Dimension

By looking at the nine dimensions separately, the \(\delta _{ik}\) index can indeed reveal patterns which do not directly emerge by looking at the overall composite index. Thus, an analysis by dimension can provide additional insights (see Tables 12 and 13 in the Appendix). As stated in the Handbook on Constructing Composite Indicators (OECD, 2008), the presentation of composite indicators and their visualization affect both the relevance and the interpretability of the results.

When assessing regional QoL by dimension, we considered the use of charts including the weak and strong performance to highlight those dimensions that require particular attention. Longer bars indicate better outcomes, and for each bar, we employ a different colour which is faded for the weak perspective. Further, given the values of the scale used, we consider three performance levels for each QoL dimension: good performance if the score lies between Level 2 and Level 3, moderate performance if it lies between Level 1 and Level 2, and bad performance if it drops below Level 1. For each region, a summary table of the data is provided together with the graph. The complete results for the 19 Spanish regions at the geographical scales ES and EU can be downloaded from the following links: Spanish R-QoL(ES) by dimension, and Spanish R-QoL(EU) by dimension. Let us continue with the example of Comunidad Foral de Navarra.

Figure 1 plots the regional QoL by dimension of Comunidad Foral de Navarra using Spanish reference levels. Looking at each dimension, Comunidad Foral de Navarra shows a good performance for the weak indicator across most issues, such as Education (2.52), Material Living Conditions (2.51), Health (2.35), Overall Satisfaction (2.23) and Governance (2.18). We draw special attention to Material Living Conditions, in which the gap between the weak and strong scores is explained by the varying levels of the indicators within this dimension. In particular, special attention should be paid to the low level of the indicator corresponding to Severe material deprivation (1.30). Next, Employment (1.52) and Environment (1.51) show a moderate compensatory performance and significant discrepancies between the weak and the strong perspectives, stemming from low scores in the indicators on satisfaction with employment and on pollution, grime and other environmental problems. Finally, as previously noted, the bad performance of the Safety dimension (0.85) is remarkable, as is its particularly bad performance from a strong perspective (0.00), which reveals that the region gets the worst assessment across all regions in one of the Safety indicators (in this case, Crime, violence or vandalism in the area).
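The three performance levels used in the charts can be expressed as a simple classification of the dimension scores. The sketch below applies it to the weak ES scores of Comunidad Foral de Navarra quoted above; the function name `performance_level` is hypothetical, and assigning boundary scores (exactly 1 or 2) to the higher level is an assumption, as the text does not specify how ties are handled.

```python
def performance_level(score):
    """Classify a dimension score on the 0-3 scale into three levels."""
    # Assumption: boundary scores (exactly 1 or 2) fall in the higher level.
    if score >= 2:
        return "good"
    if score >= 1:
        return "moderate"
    return "bad"


# Weak dimension scores of Comunidad Foral de Navarra (ES scale) quoted in
# the text; the Leisure and social interactions score is not quoted there.
navarra_es_weak = {
    "Education": 2.52,
    "Material Living Conditions": 2.51,
    "Health": 2.35,
    "Overall Satisfaction": 2.23,
    "Governance": 2.18,
    "Employment": 1.52,
    "Environment": 1.51,
    "Safety": 0.85,
}
levels = {dim: performance_level(s) for dim, s in navarra_es_weak.items()}
```

Applied to these scores, the first five dimensions are classified as good, Employment and Environment as moderate, and Safety as bad, matching the reading given in the text.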

Figure 2 plots the regional QoL by dimension of Comunidad Foral de Navarra using EU reference levels. Within the EU geographical scale, Comunidad Foral de Navarra stands out in Health (2.59), Education (2.56), and Overall Satisfaction (2.49). We also see a moderate performance in the remaining dimensions, as all of them are included in Level 2 with a score between 1 and 2. However, the gaps between the weak and strong perspectives for Employment, Safety and Environment are remarkable. These gaps, again, call for an analysis of which indicator is responsible for each dimension’s under-performance. As can be seen, the compensatory performance in the Safety dimension improves when the region is compared to the rest of the European countries, and this causes Comunidad Foral de Navarra to jump to the second position in the strong indicator when European reference levels are used.
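The gap between the weak and strong perspectives can be made concrete with a small sketch. It assumes that the weak indicator is a weighted arithmetic mean of the achievement values (fully compensatory) and that the strong indicator is their minimum (non-compensatory). The function names are hypothetical, and the two indicator values are invented for illustration only: they are chosen so that their equally weighted mean matches the weak Safety score of 0.85 and their minimum matches the strong score of 0.00 reported for the Spanish scale.

```python
import numpy as np


def weak_indicator(achievements, weights=None):
    """Fully compensatory score: weighted arithmetic mean (equal weights by default)."""
    a = np.asarray(achievements, dtype=float)
    w = np.ones(a.size) if weights is None else np.asarray(weights, dtype=float)
    return float(np.dot(w / w.sum(), a))


def strong_indicator(achievements):
    """Non-compensatory score: the worst achievement value."""
    return float(np.min(np.asarray(achievements, dtype=float)))


# Hypothetical achievement values for two indicators of one dimension.
safety_indicators = [1.70, 0.00]

weak = weak_indicator(safety_indicators)
strong = strong_indicator(safety_indicators)
gap = weak - strong
print(f"weak={weak:.2f}, strong={strong:.2f}, gap={gap:.2f}")
```

A large gap signals that the weak score hides at least one poorly performing indicator, which is the situation the text describes for Employment, Safety and Environment.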

Fig. 1 \(R-QoL (ES)\) by dimension: Comunidad Foral de Navarra (2018)

Fig. 2 \(R-QoL (EU)\) by dimension: Comunidad Foral de Navarra (2018)

In contrast, the region of Cataluña worsens its position in the ranking from the fifth position at the Spanish level to the eighth position at the European level (see Table 4). For example, notice that the employment dimension achieves a higher score when it is assessed using Spanish reference levels (see Fig. 3) than when European reference levels are used (see Fig. 4). This result is consistent with the fact that the Spanish employment rates, as indicated in Table 2, are below the European values in Table 3; for this reason, Cataluña is well positioned in comparison with the Spanish regions, but not in comparison with the rest of the European countries. Other dimensions, like Governance, Safety or Social Relations, also drop when European levels are considered.

Fig. 3 \(R-QoL (ES)\) by dimension: Cataluña (2018)

Fig. 4 \(R-QoL (EU)\) by dimension: Cataluña (2018)

Finally, in order to illustrate the effect of the geographical scale chosen for the reference levels, let us compare the results obtained by Castilla y León and Melilla. The former ranks 10th when Spanish levels are used, and 6th when European levels are used. Conversely, Melilla ranks 7th when Spanish levels are used, and 10th when European levels are used (see Table 4). Figure 5 compares the scores of these two regions in the different dimensions, for the European (left) and Spanish (right) scales. As can be seen, when the European scale is considered, Melilla shows a much better performance than Castilla y León in dimensions 2, 5, 7 and 9. On the other hand, when the Spanish scale is used, the differences in favour of Castilla y León remain broadly similar, while the differences in favour of Melilla are now much smaller. Significantly, in dimension 7 (Governance and basic rights), Castilla y León gets the worst possible value (0) in both scales, but Melilla’s performance is above percentile 70 for the European case, and close to percentile 30 in the Spanish case. Another significant effect can be seen for dimension 9 (Overall experience of life). In this case, Melilla’s performance is over percentile 70 for both scales, but Castilla y León performs at percentile 30 for the European levels, and over percentile 70 for the Spanish levels. This explains the different ranks of these two regions for the two scales used.

Fig. 5 Comparison between the dimension scores for Castilla y León and Melilla (2018)

4.3 Further Considerations on the Practical Implementation of the Methodology

In this subsection we discuss two important issues that need to be taken into account when implementing the proposed methodology. First, we consider the practical relevance of the common scale \(\varvec{\alpha }\) and its possible impact on the results obtained. Second, we study the correlations among the chosen indicators, and we propose a way to take them into account when assessing the weights. We will see that the results obtained in this particular study are quite robust regarding these two issues.

Regarding the common scale, we must take into account that the piece-wise linear achievement function used is a transformation that converts the data distribution into a new one. The use of different scales \(\varvec{\alpha }\) may yield different results, given that some distortion may occur. Therefore, the scale must be chosen carefully. In the particular case when percentiles are used as reference levels, it may seem reasonable to pick \(\varvec{\alpha }\) equal to the percentiles. This way, we would essentially be approximating the transformation of the data into a uniform [0, 1] distribution. As an example, we have computed the weak R-QoL composite indicators, using Spanish reference levels with \(\varvec{\alpha }=(0,0.3,0.7,1)\). This choice adds consistency and ease of interpretation, as the final score can be viewed as a percentage of achievement. We can assume that the results obtained for the other case (European reference levels) would be very similar. On the other hand, the impact on the strong indicator is not significant, given that it just points out the worst indicator or dimension. In Table 6 we can see the results obtained with the new scale, compared to those obtained with the original one. As can be seen, the results are very consistent with our initial proposal: only three regions slightly vary their performance, and, in general, the ranking remains very stable. Nevertheless, the percentile scale does seem easier to interpret.
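The role of the common scale can be sketched as follows. This is only an illustration under stated assumptions: the achievement function is taken to interpolate linearly between the minimum, percentile 30, percentile 70 and maximum of an indicator, mapping them onto \(\varvec{\alpha }\); the original scale is assumed to be (0, 1, 2, 3), as suggested by the three levels used above; and the data and the function name `achievement` are hypothetical.

```python
import numpy as np


def achievement(values, alpha=(0.0, 1.0, 2.0, 3.0), higher_is_better=True):
    """Piece-wise linear achievement function (sketch).

    Maps (min, P30, P70, max) of the indicator onto the common scale alpha.
    Assumes the four reference values are strictly increasing.
    """
    x = np.asarray(values, dtype=float)
    if not higher_is_better:
        x = -x  # flip indicators for which lower values are better
    refs = [x.min(), np.percentile(x, 30), np.percentile(x, 70), x.max()]
    return np.interp(x, refs, alpha)


# Hypothetical indicator values for ten regions (not data from the study).
employment_rate = [52, 55, 58, 60, 61, 63, 66, 70, 74, 80]

original_scale = achievement(employment_rate)
percentile_scale = achievement(employment_rate, alpha=(0.0, 0.3, 0.7, 1.0))

for raw, a, p in zip(employment_rate, original_scale, percentile_scale):
    print(f"{raw:>3}: original={a:.2f}  percentile={p:.2f}")
```

Both scales preserve the ordering of the regions within a single indicator; differences in the composite ranking can only arise once several indicators are aggregated, which is why the comparison in Table 6 is informative.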

Table 6 Comparative analysis of the weak \(R-QoL (ES)\) for different common scales

Secondly, let us discuss the correlation issue, again for the case of the weak R-QoL composite indicator, using Spanish reference levels. We have checked the correlations among the single indicators. The correlation matrix for the original values is reported in Table 14 in the Appendix. As can be seen, the correlation coefficient for the single indicators ranges from -0.81 (1.1 versus 1.4) to 0.75 (1.4 versus 1.3). Nevertheless, we believe that indicators belonging to the same dimension can be highly correlated without this affecting the final results, given the two-stage procedure followed. For this reason, we think it is sensible to maintain the complete list of principal indicators, following the framework proposed by INE, and to carry out a correlation analysis among the composite indicators obtained for the dimensions at the first aggregation stage. As shown in Table 15, in most cases the correlations between the dimensions are low, except for some dimensions such as Education and Overall Experience of Life. Specifically, we see a significant correlation between the dimensions of Education and Material Living Conditions, with a value of 0.802. Furthermore, the correlations of Overall Experience of Life with Employment and Health also seem remarkable, with values of 0.827 and 0.614, respectively.

These high correlations may imply double counting (or, at least, over-weighting) in some cases. In view of this fact, we propose to use the Factor Analysis technique to derive new dimension weights that take these correlations into account. Applying the Kaiser-Meyer-Olkin (KMO) test, we can check that factor analysis is applicable, as the value of KMO is 0.6. Then, the factors that explain the maximum variance and have positive eigenvalues are retained. To clarify the relationship among factors, we use Varimax rotation, assuming that there are no intercorrelations between factors. Therefore, the choice of the rotation method refers to the correlation between factors, not between dimensions. After Varimax rotation of the factor axes, three factors were extracted, which accounted for 69.6% of the total variance (see Table 7).
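For reference, the overall KMO statistic compares the squared correlations between variables with their squared partial correlations. The sketch below uses the standard definition, with the partial correlations obtained from the inverse of the correlation matrix. The data are synthetic (19 rows and 9 columns, mimicking the size of the problem), so the resulting value has no relation to the 0.6 reported above; the function name `kmo` is our own.

```python
import numpy as np


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure for a (cases x variables) array."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)  # partial correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return float(r2 / (r2 + p2))


# Synthetic dimension scores sharing one common component (illustration only).
rng = np.random.default_rng(0)
common = rng.normal(size=(19, 1))
scores = 0.7 * common + rng.normal(size=(19, 9))

kmo_value = kmo(scores)
print(f"KMO = {kmo_value:.3f}")
```

Values close to 1 indicate that the correlations are largely explained by common factors, whereas low values suggest that factor analysis is not appropriate for the data.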

The first factor has high loadings on Employment, Health, Governance and Overall Experience of Life. Factor 2 is mainly dominated by Material Living Conditions and Education. Finally, Factor 3 is formed by Leisure and Social Interaction, Safety, and Environment. The rotated factor loadings are used in Table 8 to construct the dimension weights (the highest loading of each dimension is marked in bold in the table).

Table 7 Factor Analysis of QoL dimensions with overall KMO = 0.6
Table 8 Rotated factor loadings (Varimax) and dimension weights
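One common convention for turning rotated loadings into weights, described in the OECD Handbook on Constructing Composite Indicators, is sketched below. Each dimension is assigned to the factor on which it loads most strongly; its squared loading is scaled by the variance explained by that factor and multiplied by the factor's share of the total explained variance. Whether Table 8 follows exactly this rule is an assumption on our part, and the loadings used here are hypothetical values that merely reproduce the grouping of dimensions described above.

```python
import numpy as np

dimensions = [
    "Material Living Conditions", "Employment", "Health", "Education",
    "Leisure and Social Interaction", "Safety", "Governance",
    "Environment", "Overall Experience of Life",
]

# Hypothetical rotated loadings (rows: dimensions, columns: factors 1 to 3).
loadings = np.array([
    [0.20, 0.85, 0.10],
    [0.80, 0.25, 0.05],
    [0.65, 0.30, -0.10],
    [0.15, 0.90, 0.05],
    [0.10, 0.05, 0.75],
    [-0.20, 0.10, 0.70],
    [0.60, -0.15, 0.20],
    [0.05, 0.20, 0.65],
    [0.90, 0.20, 0.10],
])

squared = loadings ** 2
explained = squared.sum(axis=0)             # variance explained by each factor
factor_share = explained / explained.sum()  # share of the total explained variance
scaled = squared / explained                # squared loadings scaled to unit column sum

dominant = squared.argmax(axis=1)           # factor with the highest loading per dimension
rows = np.arange(len(dimensions))
raw = scaled[rows, dominant] * factor_share[dominant]
weights = raw / raw.sum()                   # rescale so that the weights sum to one

for name, factor, w in zip(dimensions, dominant, weights):
    print(f"{name}: factor {factor + 1}, weight {w:.3f}")
```

Under this rule, a dimension that loads very strongly on its factor receives a larger weight than one with a moderate loading, which is the mechanism behind the uneven weights discussed in the text.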

As can be observed in Table 9, the new ranking remains relatively stable in comparison with the ranking obtained using equal weights (EW), and the top and last regions are the same. On the other hand, as expected, some other regions are affected by the new statistically based weighting technique. The most notably affected regions are País Vasco, Comunidad de Madrid, and Extremadura, for which the ranking varies by four or five positions from one procedure to the other. In general, we think that the weighting scheme may be chosen according to the aims of the decision centre. While taking correlations into consideration is more theoretically sound, one may wonder whether it is reasonable to give such a high weight to the Overall Experience of Life dimension (see Table 8), which is based on subjective assessments, and to give a much lower weight to, for example, the Health dimension, with more objective indicators. Probably for this reason, the European Commission theoretical framework aggregates all the indicators with equal weights (which we have replicated in Sect. 3). In other practical studies, this last dimension (which can be regarded as a summary of all the previous ones) is not considered in the aggregation process. In any case, as already seen, the methodology proposed in this paper can be easily adapted to the decision made in this respect.

Table 9 Comparative analysis of the weak \(R-QoL (ES)\) for weights derived from Factor Analysis (FA) and equal weights (EW)

5 Concluding Remarks

In this paper we propose the use of a multicriteria reference-based methodology (MRP-WSCI) to build composite indicators to rank and study 19 Spanish regions according to their quality of life. These indicators are built using information on nine quality of life domains provided by the Spanish National Institute of Statistics. Three main issues have been taken into account in order to build these composite indicators. First, the methodology allows the use of different geographical scales in order to comparatively assess the quality of life of the regions studied. Second, the study within each geographical scale is carried out using the corresponding statistical reference levels (in this particular case, percentiles 30 and 70 of all the regions of the geographical zone). Therefore, the results obtained are easily interpreted as the relative position of each region with respect to all the regions belonging to the given scale. Finally, two composite indicators are developed for each region, with different compensability levels. The weak (fully compensatory) indicator provides an overall measure of the region’s QoL, as compared to the regions of the geographical scale considered. The strong (non-compensatory) indicator provides information about the worst dimension of each region (again, as compared to the rest of the regions of the scale), thus providing complementary information that would most likely have remained unnoticed with other methodologies.

With respect to the results obtained, several findings can be pointed out:

  • Globally, most of the Spanish regions are at an intermediate position regarding the EU regions, with a moderate overall performance, below percentile 70, but (all of them) over percentile 30 of the EU regions. Only two regions, Aragón and Illes Balears, have an overall performance over the EU percentile 70.

  • Thirteen of the 19 regions perform worse than the EU percentile 30 for at least one dimension. For the 6 remaining regions, their worst dimension performs better than the EU percentile 30, but worse than percentile 70.

  • Compared to the EU regions, the strongest dimensions for the Spanish regions are 3 (Health), 9 (Overall experience of life), 6 (Safety) and 8 (Environment), all of them with an average weak indicator (across all Spanish regions) over EU’s percentile 70. The worst dimension of the Spanish regions is 7 (Governance and basic rights), with an average weak indicator below EU’s percentile 30, and with some regions with a value of 0, meaning that they get the worst values of all EU regions for the two indicators of this dimension. Other poorly performing dimensions are 4 (Education) and 1 (Material Living Conditions), with an average weak indicator slightly over EU’s percentile 30.

With respect to the methodology used, it has been shown that it successfully identifies the potential areas of improvement of each region, at both geographical scales, through the consideration of the strong composite indicator. It is interesting to observe the variation of this indicator when the geographical scale changes. For example, while dimension 7 (Governance and basic rights) is the worst one for 6 Spanish regions at the EU scale, it is the worst one for only 2 regions at the Spanish scale, meaning that the generally low Spanish values in this dimension make some results comparatively better when the Spanish scale is used.

Based on the previous comments, these results could be useful for regional policy design at different scales. In this respect, if the EU sets an objective to improve a given quality of life domain and analyses the regions of a given country, it could use the information provided by the composite indicators with European levels. However, the use of country levels provides more useful guidance for policy-making in the context of regional policies defined by each country’s own government.

Another issue of concern that arises from our analysis is data availability at the regional level and the number of indicators considered in each dimension. In fact, it is not always possible for indicators at different geographical scales to be comparable, even if they are included in the same dimension. Incorporating more indicators per domain would allow a better assessment of the regional quality of life.

In the future, this study could be expanded by considering different geographical scales associated with the choice of reference levels, in order to test the robustness of the composite indicator. In addition, as a future improvement of the proposed methodology, we aim to consider setting the scale of \(\alpha\) values equal to the percentiles, to add more consistency to the results. We also plan to investigate the use of expert opinion weights based on subjective choices. It is evident that the field presents significant research opportunities for both academics and policy-makers interested in constructing regional composite indicators to measure the quality of life beyond GDP.