1 Introduction

The use of composite indicators to measure social phenomena has boomed over recent years across academic, scientific, institutional and political fields. In some cases, these are simple indicators that reflect a certain aspect of reality, while others are aggregated from a combination of several variables. They are also very diverse in terms of their methodological complexity as well as in the degree to which they approximate reality.

The inclusion of gender perspectives in the design of wellbeing indicators, and even more so in that of public social statistics, is an issue that did not begin to draw attention until the mid-eighties, although at that time it was still largely symbolic (Oakley 1991; Beck 1994). The Fourth World Conference on Women held in Beijing in 1995 represents a milestone in this regard (ONU 1995). The conference highlighted the need to “collect, compile, analyse and regularly submit data disaggregated by age, sex and other socioeconomic indicators, including the number of dependents, for use in planning and implementing policies and programs” (par. 206B) and “promote the further development of statistical methods to improve data that relate to women in economic, social, cultural and political development” (par. 208B).

In response, international agencies such as the World Bank, the Economic Commission for Latin America and the Caribbean and the United Nations Development Programme (UNDP) pioneered the creation of gender-sensitive indicators. At the European level, this perspective has also begun to be incorporated into the Treaties (Amsterdam) and Agreements, as well as into the regulations governing the Structural Funds (1993 and 1999 reforms). The same applies at the national and regional levels within Europe.

The construction of any composite indicator requires the availability of data. In the case of gender indicators, this is a major obstacle: the traditional problems of availability that affect any public statistics are compounded by a historical limitation in the gender breakdown of the data generated and, moreover, by the absence of a non-androcentric design of official statistics at the international level.

Apart from some isolated earlier attempts, the breakdown of data by sex dates back to the early 1970s. In 1974, the Eighth World Congress of the International Sociological Association created a working group (now the Research Committee of Women and Society) (Boulding et al. 1976) to analyse and enhance the availability of statistical data on women, resulting in the first data source, the “Handbook of International Data on Women”. Since then, improvements in data availability have been apparent, but some problems still impede gender analysis from a quantitative point of view.

This situation has greatly influenced the development of welfare indicators to incorporate gender issues. In broad terms, the work done in this line has given rise to two types of indicator: simple wellbeing measures disaggregated by gender—so-called gender gap indexes—and measures of gender-sensitive wellbeing created by adjusting aggregate measures of wellbeing to account for gender inequalities.

In 1995, the UNDP proposed the main indices used at the international level: the Gender-related Development Index (GDI) and the Gender Empowerment Measure (GEM). However, these indices have not been exempt from criticism, and there have been several attempts to overcome their major limitations.

The principal objections to these indicators are similar to those raised by McGillivray (1991), McGillivray and White (1993), Srinivasan (1994), Ravallion (1997), and others against the Human Development Index (HDI), which proposed the measurement of welfare from a generic point of view. But there are also some differences in the objections (White 1997; Bardhan and Klasen 1999; Klasen 2006; Dijkstra and Hammer 2000; Dijkstra 2002, 2006). In general, these differences are related to:

  • The choice of dimensions for measuring inequality and the variables for these dimensions (aside from areas especially relevant from the perspective of gender).

  • How to measure the achievements; the indicator measures not gender inequality itself but combinations of absolute levels of achievement and the achievements of women. Moreover, inequality is counted differently for the three variables that comprise the index.

  • The very construction of the index, whose methodology is based on the highly subjective practice of assigning weights.

Thus, the main gender indices proposed over recent years, which are collected in Table 1, differ from the GDI and GEM primarily in their components and have addressed the methodological limitations only to a lesser extent.

Table 1 Some alternative indicators to GDI and GEM

From our point of view, gender mainstreaming implies assessing the implications of any planned action for both women and men, and not merely breaking data down by gender, although this is also essential. Furthermore, the development of a methodology consistent with the requirements of the object of study is essential in the construction of composite indicators.

Considering the above, in this work we propose an indicator that is useful from the standpoint of gender. We are aware of the interest in incorporating components that are not traditionally used in the measurement of wellbeing but are particularly useful from the perspective of gender, such as time spent on unpaid work or leisure, among others (Domínguez-Serrano 2009). In this work, however, our interest focuses on developing a methodology to resolve some of the problems arising from the use of weights. Most indicators of wellbeing share such problems, among them subjectivity in assigning weights and the influence of the units of measurement used in the initial indicators.

In this paper, we focus on these critical issues with the objective of designing a methodology for assessing wellbeing from a gender perspective.

We intend to show the status of men and women separately, not basing the measurement on the quantification of inequalities but seeking a measure that allows the situations of each group to be compared. Also, the proposed methodology allows the definition of endogenous and objective weights assigned to the initial indicators, without the final result being affected by differences in measurement units. From a methodological point of view, the proposed methodology is based on an approach called global efficiency (Despotis 2002, 2005) and the procedure proposed by Zhou et al. (2007). The combination of both allows the definition of linear programming models that provide composite indicators with a unique and endogenous weighting system, allowing discrimination between all the units tested, and it provides an overview of the status of each. This new approach will be called the best-worst global evaluation approach.

Thus, the paper first shows the methodology for constructing the composite indicators and secondly shows an example of the procedure that is applied to all 27 countries of the EU. This illustrates its use, obtaining composite indicators of welfare to independently analyse the situation of women and men. We then present our conclusions.

2 A New Methodology to Obtain Composite Indicators: The Best-Worst Global Evaluation Approach

In recent years, alternative methodologies for obtaining composite indicators from an initial system have been defined (Nardo et al. 2005a). A composite indicator is the mathematical combination of individual indicators that represent different dimensions of the concept whose description is the objective of the initial system (Saisana and Tarantola 2002; Nardo et al. 2005b). These composite indicators provide a multidimensional evaluation of the analysed concept. The construction of composite indicators involves some stages where the analyst must make subjective decisions that can affect the results. The selection of indicators, the way of grouping them from a conceptual point of view, the choice of a normalization method and the indicator weights, among others, are subjective aspects contributed by the analyst.

The construction of composite indicators has been the object of many critiques based on the associated subjectivity. In spite of this, composite indicators have been used in practice as evaluation and analysis instruments to help make decisions. These instruments have many advantages that have justified their wide use. In particular, they are well suited to illustrating and evaluating complex concepts (such as wellbeing or sustainability), facilitating the interpretation of information systems to support decision-makers.

Many methodologies have been advanced for the purpose of constructing composite indicators that reduce the number of decisions that must be adopted by analysts, that is, that decrease the associated subjectivity (Munda 2005; Zarzoza et al. 2005; Messer et al. 2006; Vyas and Kumaranayake 2006; González-Laxe and Castillo 2007; Munda and Nardo 2007; Ramzan et al. 2008). One of these methodologies, called the benefit of the doubt approach, uses Data Envelopment Analysis (DEA) as an instrument to construct composite indicators (Mahlberg and Obersteiner 2001; Bradbury and Rouse 2002; Cherchye and Kuosmanen 2002; Cherchye et al. 2003, 2006, 2007; Murias et al. 2007). Our paper follows this line of research to achieve the stated objectives. The benefit of the doubt approach tries to solve two fundamental problems in the construction of composite indicators: the dependence of the composite indicator values on the normalization method applied and the lack of consensus over the treatment of indicator weights.

Linear programming problems that define DEA models of the benefit of the doubt approach allow us to obtain composite indicators, which fix endogenous indicator weights for each unit without needing to assign them values a priori. Specifically, the weights assigned to each unit are those that yield the greatest composite indicator value, granting a higher weight to indicators in which the unit is better placed relative to the rest of the units. This approach is very flexible because it is not necessary for all units to use the same weight set (Martínez et al. 2005; Murias et al. 2006). Furthermore, this flexibility allows weight values to be adapted to the measurement units of the initial indicators; therefore, the application of a normalization method is not necessary because the composite indicator does not depend on the different units used in the initial system.

Although the benefit of the doubt approach has gained in popularity in recent years, it presents some limitations that must be considered. First, the flexibility associated with obtaining weight values can generate extreme results, such as composite indicators based on one initial indicator or the assignment of a greater weight to an indicator with secondary importance. Other extreme results include obtaining weights equal to zero for the most important indicators or very different weight sets for each unit. Second, the linear programming problems can assign the same composite indicator value to many units, so that it is impossible to discriminate between them. Another limitation is the unit-specific character of the weights obtained, given that these weights do not allow a comparative analysis among all units in the way that a composite indicator with a weight set common to all units does.

In this context, following the research line of the benefit of the doubt approach, our objective is to define new linear programming problems that counteract some limitations of the DEA models that have been used until now. In particular, the objective of the proposed methodology is to obtain composite indicators using a common and unique weight set so that its values enable discriminating between all units and provide a global vision of each unit's situation without relying exclusively on the most favourable evaluation. On the one hand, this methodology is formulated taking as reference the so-called global efficiency approach (Despotis 2002, 2005), which allows defining composite indicators with a common set of weights for all units using DEA models. To that end, the global efficiency approach proposes observing all units from the same point of view, estimating a common set of weights in a manner such that the resulting composite indicator values are as close as possible to the benefit of the doubt values. On the other hand, the new methodology is formulated considering the linear programming problems proposed by Zhou et al. (2007). The aim of these problems is to obtain a composite indicator that provides a consensus global evaluation of each unit using the two sets of weights that are most and least favourable for each unit to be evaluated. Combining both, we define what we call the best-worst global evaluation approach.

The formulation of this approach is as follows. Consider the case where there are n units (for example countries, regions, etc.) the aggregated performances of which are to be evaluated using a composite indicator based on a system of m initial indicators. These initial indicators are quantified using different measurement units. Regarding indicator variability direction, there are two types: positive and negative indicators. Here, positive indicators are those where the greater their values, the better the evaluation will be. The indicators are negative when they fulfill the opposite condition. Without loss of generality, we assume that all initial indicators of the system are positive. Let \( I_{ij} \) denote the value of unit i with respect to indicator j.

In this context, it is necessary to estimate common weight sets for all units in the best and worst possible scenarios and use them to construct a composite indicator with the characteristics described above. For this, the analyst must carry out the following procedure. In the first place, all units are analysed in the best possible scenario, obtaining the maximum composite indicator value for each unit (\( {\text{CI}}_{i}^{*} \)). This is the value corresponding to the benefit of the doubt composite indicator, which is obtained through the following linear programming problem:

$$ \begin{array}{*{20}c} {{\text{CI}}_{{_{i} }}^{*} = } \hfill & {\mathop {\max }\limits_{{\omega_{{_{ij} }}^{*} }} \quad \sum\limits_{j = 1}^{m} {\omega_{{_{ij} }}^{*} \cdot I_{ij} } } \hfill \\ {{\text{s}} . {\text{t}} .} \hfill & {\sum\limits_{j = 1}^{m} {\omega_{{_{ij} }}^{*} } \cdot I_{hj} \le 1\quad \forall \;h \in \left\{ {1,2, \ldots i \ldots ,n} \right\}} \hfill \\ {} \hfill & {\omega_{{_{ij} }}^{*} \ge 0\quad \forall \;j \in \left\{ {1,2, \ldots ,m} \right\}} \hfill \\ \end{array} , $$
(1)

where \( \omega_{{_{ij} }}^{*} \) is the benefit of the doubt weight of the indicator j for unit i in the best scenario.
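As an illustration only, model (1) can be sketched with an off-the-shelf solver. The data matrix `I`, the helper name `benefit_of_the_doubt` and the use of `scipy.optimize.linprog` are assumptions made for this sketch; they are not the data or software used in the paper.

```python
from scipy.optimize import linprog

# Hypothetical data: 4 units (rows) scored on 2 positive indicators (columns).
I = [
    [1.0, 0.5],
    [0.5, 1.0],
    [0.5, 0.5],
    [0.9, 0.9],
]


def benefit_of_the_doubt(I, i):
    """Solve model (1) for unit i; return (CI_i*, optimal weights)."""
    # linprog minimises, so the objective sum_j w_ij * I_ij is negated.
    c = [-v for v in I[i]]
    # One normalisation constraint per unit h: sum_j w_ij * I_hj <= 1.
    res = linprog(c, A_ub=I, b_ub=[1.0] * len(I),
                  bounds=[(0, None)] * len(I[0]), method="highs")
    return -res.fun, list(res.x)


# One linear programme per unit, each with its own most favourable weights.
ci_star = [benefit_of_the_doubt(I, i)[0] for i in range(len(I))]
print([round(v, 4) for v in ci_star])
```

In this toy data set three of the four units should reach the upper bound of one, which illustrates the lack of discrimination discussed earlier; only the third unit, which is dominated in both indicators, scores lower (5/9).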

In this way, the maximum composite indicator value for each unit is obtained, fixing a weight set that is inferred from looking at relative strengths and weaknesses. The method thus assigns a greater weight to indicators in which the analysed unit has a better relative situation. Furthermore, two more features are added to establish the composite indicator scale (Cherchye et al. 2007). On the one hand, a normalization constraint is introduced, stating that no unit in the set (including the evaluated one) has a resulting composite indicator greater than one when applying the optimal weights for the evaluated unit. On the other, a second constraint limits the weights to be non-negative. In this way, given that the composite indicator is a non-decreasing function of the initial indicators, its values are bounded for each unit such that \( 0 \le {\text{CI}}_{i}^{*} \le 1 \).

After obtaining the benefit of the doubt composite indicator for each unit, the analyst must estimate common weights across the set of all units in such a way that the resulting composite indicators are as close as possible to the ideal point defined by previous ones. Thus, a composite indicator providing a global evaluation (called a global composite indicator) in the best scenario for all units is obtained. To estimate these common weights, the following linear programming problem formulated with parameter t is used:

$$ \begin{array}{*{20}c} {\text{Min}} \hfill & {t \cdot \frac{1}{n} \cdot \sum\limits_{i = 1}^{n} {d_{i} + (1 - t) \cdot z} } \hfill \\ {{\text{s}} . {\text{t}} .} \hfill & {\sum\limits_{j = 1}^{m} {\omega_{j}^{*} \quad I_{ij} + d_{i} = CI_{i}^{*} \quad \forall \;i \in \left\{ {1,2, \ldots ,n} \right\}} } \hfill \\ {} \hfill & {d_{i} - z \le 0\quad \forall \;i \in \left\{ {1,2, \ldots ,n} \right\}} \hfill \\ {} \hfill & {\omega_{j}^{*} \ge 0\quad \forall \;j \in \left\{ {1,2, \ldots ,m} \right\}} \hfill \\ {} \hfill & {z \ge 0} \hfill \\ {} \hfill & {d_{i} \ge 0\quad \forall \;i \in \left\{ {1,2, \ldots ,n} \right\}} \hfill \\ \end{array} , $$
(2)

where \( \omega_{j}^{*} \) is the common weight of the indicator j in the best scenario for all units.

Model (2) allows estimating common weights using different norms to quantify the distance between global composite indicators and ideal points obtained previously, depending on the value assigned to parameter t. If t is equal to one, the objective function represents the mean deviation between the global composite indicator and the benefit of the doubt values for all units, so that the distance is measured using the \( L_{1} \) norm. If parameter t is equal to zero, the objective function is reduced to the non-negative variable z and represents the maximal deviation between the above composite indicators. In this case, the distance between them is measured using the \( L_{\infty} \) norm. Varying the parameter t between these two extreme values, model (2) allows us to obtain different sets of common weights, beyond the extreme ones, that minimise the maximal and the mean deviation.
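Model (2) can be sketched in the same way. The following is a minimal illustration under stated assumptions: the data `I` are hypothetical, `ci_star` holds the benefit of the doubt scores that model (1) gives for that data, and `common_weights` is an illustrative helper built on `scipy.optimize.linprog`.

```python
from scipy.optimize import linprog

# Hypothetical data: 4 units x 2 indicators, with the benefit-of-the-doubt
# scores CI_i* that model (1) gives for this data.
I = [[1.0, 0.5], [0.5, 1.0], [0.5, 0.5], [0.9, 0.9]]
ci_star = [1.0, 1.0, 5.0 / 9.0, 1.0]


def common_weights(I, ci_star, t):
    """Solve model (2) for a given t; return (weights, objective value)."""
    n, m = len(I), len(I[0])
    # Decision vector: [w_1..w_m, d_1..d_n, z], all non-negative.
    c = [0.0] * m + [t / n] * n + [1.0 - t]
    # sum_j w_j * I_ij + d_i = CI_i*  for every unit i.
    A_eq = [list(I[i]) + [1.0 if k == i else 0.0 for k in range(n)] + [0.0]
            for i in range(n)]
    # d_i - z <= 0  for every unit i.
    A_ub = [[0.0] * m + [1.0 if k == i else 0.0 for k in range(n)] + [-1.0]
            for i in range(n)]
    res = linprog(c, A_ub=A_ub, b_ub=[0.0] * n, A_eq=A_eq, b_eq=ci_star,
                  bounds=[(0, None)] * (m + n + 1), method="highs")
    return list(res.x[:m]), res.fun


w_max, obj_max = common_weights(I, ci_star, t=0.0)    # maximal deviation only
w_mean, obj_mean = common_weights(I, ci_star, t=1.0)  # mean deviation only
print(w_max, obj_max, obj_mean)
```

With t = 0 the optimum is the smallest achievable maximal deviation z, and with t = 1 it is the smallest mean deviation; intermediate values of t trade one off against the other.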

Using the previous model to estimate the common weights, global composite indicator values evaluating the situation of each unit i with common weights in the best scenario (denoted by \( {\text{GCI}}_{i}^{*} \)) are obtained by carrying out the following procedure. In a first step, the analyst must solve model (2) repeatedly for discrete values of the parameter t such that 0 ≤ t ≤ 1, obtaining different composite indicator values. A practical way to do this is to generate a sequence of equidistant values of t, for example, starting with t = 0.01 and with steps of 0.01 up to 1 (Despotis 2002). Due to the convexity of the proposed problem in model (2), if the same solution is obtained for two values of parameter t (for example, \( t_{1} \) and \( t_{2} \)), then it is valid for all values of \( t \in [t_{1}, t_{2}] \). In this case, it is not necessary to solve model (2) for all values of t. In this way, the analyst’s role is reduced to determining those intervals of parameter t that yield the same estimation of the common weights.

After that, when these intervals are defined, composite indicator values are obtained for each unit i, using the estimated common weights for each value of parameter t (\( {\text{CI}}^{*}(t)_{i} \)), through the following formulation:

$$ {\text{CI}}^{*} (t)_{i} = \sum\limits_{j = 1}^{m} {\omega^{*} (t)_{j} \cdot I_{ij} \quad \forall \;i \in \{ 1,2, \ldots ,n\} } , $$

where \( \omega^{*}(t)_{j} \) is the estimated weight of the indicator j for a concrete value of parameter t.

Finally, the global composite indicator value with common weights in the best scenario for a unit i is equal to the mean value of \( {\text{CI}}^{*}(t)_{i} \):

$$ {\text{GCI}}_{i}^{*} = {\frac{{\sum\limits_{t} {{\text{CI}}^{*} (t)_{i} } }}{{n_{t} }}}, $$
(3)

where \( n_{t} \) is the number of parameter t values with different estimations of common weights.
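The averaging step in Eq. (3) can be sketched as follows. The indicator values and the two weight vectors are hypothetical; in practice the weight vectors would be the distinct solutions found when solving model (2) over the grid of t values.

```python
# Hypothetical inputs: indicator values for 3 units and the distinct
# common-weight vectors found while sweeping t in model (2).
I = [[1.0, 0.5], [0.5, 1.0], [0.9, 0.9]]
distinct_weights = [[0.6, 0.4], [0.5, 0.5]]  # one vector per t-interval


def composite(weights, row):
    """CI*(t)_i = sum_j w*(t)_j * I_ij for one unit."""
    return sum(w * x for w, x in zip(weights, row))


def global_composite(I, distinct_weights):
    """GCI_i* = mean of CI*(t)_i over the n_t distinct weight sets (Eq. 3)."""
    n_t = len(distinct_weights)
    return [sum(composite(w, row) for w in distinct_weights) / n_t
            for row in I]


gci_star = global_composite(I, distinct_weights)
print([round(v, 3) for v in gci_star])
```

Each distinct weight vector is counted once, regardless of the width of the t-interval over which it is optimal, which is how we read the definition of \( n_{t} \) above.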

When all units have been evaluated under the best scenario, the analyst must determine the global composite indicator for the worst possible scenario using a procedure similar to the one above. For this, it is first necessary to obtain the composite indicator for each unit using the worst set of weights (denoted by \( {\text{CI}}_{*i} \)). To obtain these composite indicator values, the following linear programming problem is used:

$$ \begin{array}{*{20}c} {{\text{CI}}_{*i} = } \hfill & {\mathop {\text{Min}}\limits_{{\omega_{*ij} }} \quad \sum\limits_{j = 1}^{m} {\omega_{*ij} \quad I_{ij} } } \hfill \\ {{\text{s}} . {\text{t}} .} \hfill & {\sum\limits_{j = 1}^{m} {\omega_{*ij} \quad I_{hj} \ge 1} \quad \forall \;h \in \left\{ {1,2, \ldots ,n} \right\}} \hfill \\ {} \hfill & {\omega_{*ij} \ge 0\quad \forall \;j \in \left\{ {1,2, \ldots ,m} \right\}} \hfill \\ \end{array} , $$

where \( \omega_{*ij} \) represents the worst possible weight that can be assigned in the case of unit i to indicator j.

In this case, the specific weights accorded to each initial indicator for each unit are endogenously determined considering its relative weaknesses. Specifically, the above model assigns a greater weight to those indicators in which the evaluated unit shows a worse relative situation. The scale for the composite indicators is established by inserting two additional constraints. The first constraint states that all units in the set have a resulting composite indicator greater than or equal to one when applying the optimal weights for the evaluated unit. The second constraint limits the weights to non-negative values. In this way, the composite indicator values have a lower bound such that \( {\text{CI}}_{*i} \ge 1 \).
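The worst-scenario problem mirrors model (1) with the optimisation direction and the normalisation constraints reversed. A minimal sketch, assuming hypothetical data `I` and an illustrative helper `worst_case` built on `scipy.optimize.linprog`:

```python
from scipy.optimize import linprog

# Hypothetical data: 4 units (rows) scored on 2 positive indicators (columns).
I = [[1.0, 0.5], [0.5, 1.0], [0.5, 0.5], [0.9, 0.9]]


def worst_case(I, i):
    """Solve the worst-scenario model for unit i; return CI_{*i}."""
    # Constraints sum_j w_ij * I_hj >= 1 are rewritten as -I w <= -1,
    # because linprog only accepts "less than or equal" inequalities.
    A_ub = [[-v for v in row] for row in I]
    res = linprog(I[i], A_ub=A_ub, b_ub=[-1.0] * len(I),
                  bounds=[(0, None)] * len(I[0]), method="highs")
    return res.fun


ci_worst = [worst_case(I, i) for i in range(len(I))]
print([round(v, 4) for v in ci_worst])
```

Here the scores are bounded below by one, and only a unit that performs well in every indicator (the fourth unit in this toy data) should obtain a value clearly above that bound.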

When the values of \( {\text{CI}}_{*i} \) have been obtained, we proceed to the estimation of common weights in a manner such that the resulting composite indicator is as close as possible to the anti-ideal point defined by the \( {\text{CI}}_{*i} \) obtained for each unit. These estimations are obtained in this case using the following linear programming problem with parameter t, whose objective function allows us to explore different sets of common weights that minimise the maximal and the mean deviation between both:

$$ \begin{array}{*{20}c} {\text{Min}} \hfill & {t \cdot \frac{1}{n} \cdot \sum\limits_{i = 1}^{n} {d_{i} + (1 - t) \cdot z} } \hfill \\ {{\text{s}} . {\text{t}} .} \hfill & {\sum\limits_{j = 1}^{m} {\omega_{*j} \cdot I_{ij} - d_{i} = {\text{CI}}_{*i} \quad \forall \;i \in \left\{ {1,2, \ldots ,n} \right\}} } \hfill \\ {} \hfill & {d_{i} - z \le 0\quad \forall \;i \in \left\{ {1,2, \ldots ,n} \right\}} \hfill \\ {} \hfill & {\omega_{*j} \ge 0\quad \forall \;j \in \left\{ {1,2, \ldots ,m} \right\}} \hfill \\ {} \hfill & {z \ge 0} \hfill \\ {} \hfill & {d_{i} \ge 0\quad \forall \;i \in \left\{ {1,2, \ldots ,n} \right\}} \hfill \\ \end{array} , $$

where \( \omega_{*j} \) is the common weight estimated for indicator j in the worst scenario.

Using this model, we then identify the interval values of the parameter t obtaining the same estimation for common weights. Then utilizing obtained estimations, we define the composite indicator value for each value of parameter t in the worst-case scenario in the following way:

$$ {\text{CI}}_{*} (t)_{i} = \sum\limits_{j = 1}^{m} {\omega_{*} (t)_{j} \cdot I_{ij} \quad \forall \;i \in \left\{ {1,2, \ldots ,n} \right\}} . $$

Thus, global composite indicator values that evaluate the situation of each unit with common weights in the worst scenario (denoted by \( {\text{GCI}}_{*i} \)) are equal to the mean value of \( {\text{CI}}_{*}(t)_{i} \):

$$ {\text{GCI}}_{*i} = {\frac{{\sum\limits_{t} {{\text{CI}}_{*} (t)_{i} } }}{{n_{t} }}} $$

At this point, the global composite indicators obtained allow the situation of each unit to be analysed according to its position in each scenario, for which the analyst must obtain two rankings of the unit set using the composite indicator values. Given the endogenously obtained weights and the reference values used, the global composite indicators allow the situation of each unit to be analysed in terms of its relative strengths (in the best scenario) or weaknesses (in the worst scenario). Thus, the analyst is able to carry out a more realistic analysis of the situation of each unit, observing its positions in the rankings and the variation in its position from one scenario to the other.

Analytical and graphical instruments can be used when the analyst studies differences between rankings. One analytical instrument is Spearman's rho coefficient. This coefficient quantifies the linear association between the two rankings using the rank (ordinal number) that each unit presents in each of them, so that it can be calculated in the following way:

$$ \rho = 1 - {\frac{{6 \cdot \sum\limits_{i = 1}^{n} {\left[ {R({\text{CI}}_{i}^{*} ) - R({\text{CI}}_{*i} )} \right]^{2} } }}{{n \cdot (n^{2} - 1)}}}, $$

where \( R({\text{CI}}_{i}^{*} ) \) is the rank (ordinal number) associated with unit i in the best scenario ranking and \( R({\text{CI}}_{*i} ) \) is the rank (ordinal number) associated with unit i in the worst scenario ranking.

When this coefficient has a value close to one, the best and worst ranks of the compared rankings tend to be paired off. Conversely, when the coefficient is close to −1, the best ranks of one ranking are paired off with the worst ranks of the other. Furthermore, the analyst can observe the average absolute difference in position across units to quantify the existing differences between the compared rankings.
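As a small worked illustration of the coefficient, the following sketch applies the formula to two hypothetical rankings of five units. The function name and the rank lists are assumptions, and the closed-form expression presumes that there are no tied ranks.

```python
def spearman_rho(ranks_best, ranks_worst):
    """Spearman's rho from two rank lists without ties."""
    n = len(ranks_best)
    # Sum of squared rank differences between the two scenarios.
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_best, ranks_worst))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical positions of five units in each scenario.
ranks_best = [1, 2, 3, 4, 5]
ranks_worst = [2, 1, 3, 5, 4]

rho = spearman_rho(ranks_best, ranks_worst)            # similar rankings
rho_reversed = spearman_rho(ranks_best, [5, 4, 3, 2, 1])  # opposite rankings
print(rho, rho_reversed)
```

The first pair of rankings differs only by two swaps of adjacent positions and gives a coefficient of 0.8, whereas fully reversed rankings give −1.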

In the case of the graphical instrument, the analyst can create a bar graph in which the bars represent the position of each unit in each ranking. This graph makes it easy to visualize the degree of stability or instability of the results obtained in each scenario.

Although the above global composite indicators enable one to carry out a more realistic analysis, it is not always possible to extract general conclusions when the situation of a unit set is analysed using both rankings simultaneously. This is true above all when the analysed set is composed of a high number of units. It is easier to comparatively analyse all units using only one ranking that allows the consideration of strengths and weaknesses. For this reason, our objective is to define a consensus composite indicator that provides a global vision using the information of the global composite indicators that were previously obtained.

In a first step, given that the global composite indicators are measured on different scales, we proceed to normalize them using their maximum and minimum values through the following ratios:

$$ {\text{NGCI}}_{i}^{*} = {\frac{{{\text{GCI}}_{i}^{*} - {\text{GCI}}_{\min }^{*} }}{{{\text{GCI}}_{\max }^{*} - {\text{GCI}}_{\min }^{*} }}}\quad {\text{NGCI}}_{*i} = {\frac{{{\text{GCI}}_{*i} - {\text{GCI}}_{*\min } }}{{{\text{GCI}}_{*\max } - {\text{GCI}}_{*\min } }}} $$

In this way, a normalized value close to one indicates that the unit is better placed in the corresponding scenario. Otherwise, if the normalized value is near to zero it represents a worse situation for the unit.

When the global composite indicator values have been normalized, the Consensus Global Composite Indicator (\( {\text{CGCI}}_{i} \)) for each unit is obtained using the following combination:

$$ {\text{CGCI}}_{i} = \lambda \cdot {\frac{{{\text{GCI}}_{i}^{*} - {\text{GCI}}_{\min }^{*} }}{{{\text{GCI}}_{\max }^{*} - {\text{GCI}}_{\min }^{*} }}} + (1 - \lambda ) \cdot {\frac{{{\text{GCI}}_{*i} - {\text{GCI}}_{*\min } }}{{{\text{GCI}}_{*\max } - {\text{GCI}}_{*\min } }}}\quad {\text{with}}\;0 \le \lambda \le 1. $$

This combination yields composite indicators in a manner such that no normalization of the initial indicators is required. In this way, this composite indicator is less subjective and is constructed using common weights that allow discriminating among all analysed units, providing a global vision of the situation of each unit. For all these reasons, the consensus global composite indicator fulfils the objectives set out in this paper.
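The normalization and the convex combination can be sketched in a few lines. The global composite indicator values below are hypothetical, the helper names are illustrative, and the sketch assumes that the maximum and minimum differ in each scenario (otherwise the ratio is undefined).

```python
def min_max(values):
    """Rescale values to [0, 1] using their own minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def consensus(gci_best, gci_worst, lam=0.5):
    """CGCI_i = lam * NGCI_i* + (1 - lam) * NGCI_{*i}, with 0 <= lam <= 1."""
    n_best, n_worst = min_max(gci_best), min_max(gci_worst)
    return [lam * b + (1 - lam) * w for b, w in zip(n_best, n_worst)]


# Hypothetical global composite indicators for three units.
gci_best = [0.9, 0.6, 0.75]   # best scenario
gci_worst = [1.2, 1.0, 1.4]   # worst scenario

cgci = consensus(gci_best, gci_worst, lam=0.5)
print([round(v, 3) for v in cgci])
```

In this toy case the first and third units end up with the same consensus value (0.75) at λ = 0.5, because each leads in one scenario and sits mid-table in the other.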

Now that this new methodology of obtaining composite indicators has been defined, in the following section we apply it to illustrate how it should be used.

3 The Best-Worst Global Evaluation Approach: An Application

3.1 The Model

After defining the model from a theoretical point of view, this section describes its practical application in the context of wellbeing indicators, considering gender as a specific issue.

Regarding gender indicators, there has in recent years been a tendency to analyse women and men separately, as is reflected in the literature (Klasen 2006; Dijkstra 2006; Berenguer and Verdier-Chouchane 2008) and in the forums (2nd Global Forum on Gender Statistics, held in Ghana in January 2009). Thus, the first of the empirical issues in our proposal was to develop one index for men and another for women, thereby capturing the peculiarities of each segment in isolation and avoiding possible distortions arising from the aggregation of groups with specific features and distinct characteristics.

The second aspect to consider was the choice of components or initial indicators that would make up the composite indicator. As mentioned in Sect. 2, this is a controversial issue, as there is no consensus on which variables may be more appropriate for capturing all the features needed to measure wellbeing (McGillivray 1991; Kelley 1991; Anand and Sen 1992; Ravallion 1997). Where the concept is complex and has no clearly defined boundaries, so is its measurement. For this work, therefore, the primary objective of which is to provide a novel method of analysis, we chose those components for which there is a greater consensus at the international level: health status, educational level and standard of living, because they are considered to be the main dimensions of Human Development (UNDP 1990; Neumayer 2001; Bardhan and Klasen 1999; among others). However, this paper will be complemented by future works that take into account a possible expansion of the areas to be considered.

The variables used were: life expectancy at birth, combined gross enrolment rate in primary, secondary and tertiary education, and estimated earned income (Footnote 1), as proposed by the UN definition of the HDI. The statistical source used for the development of the database was the UNDP Report 2007–2008, and the data refer to 2005.

Finally, the countries of the EU were selected as the sample set; the EU comprises territories that retain their own characteristics despite being integrated in a common area.

The goal of the work is, by following these guidelines, to construct separate consensus global composite indicators for men and for women. Thus, using the methodology, we analyse the wellbeing of each group separately.

3.2 Results

The first case to consider is that of men in the scenario described as “the best”. Applying models (1), (2) and (3) in successive phases to the 27 selected countries, we obtain the results shown in Table 2.

Table 2 Development composite indicator for men (best scenario)

Although the whole set of countries can be compared using \( {\text{GCI}}_{i}^{*} \), ten of them could not be compared using the model (1), due to the fact that they presented the same uniform score of \( {\text{CI}}_{i}^{*} \). This indicates that the proposed methodology provides a higher discriminatory power and solves the problem of equality in the efficiency of the traditional DEA models.

Table 2 also shows the different intervals of the parameter t (0 ≤ t ≤ 1) for which the same estimation of the common weights is obtained, together with the value of the indicator that each country obtains in each interval.

Apart from possible problems arising from the quality of the initial components, the composite indicator obtained provides a useful instrument that synthesizes the information provided by each of the variables, overcoming constraints on subjectivity in assigning weights to them. Following the same methodology, Table 3 shows the situation regarding the second scenario considered: “the worst”. It reflects both the possible values of the composite indicator based on the values assigned to the parameter t and the value of the global index. This allows the establishment of a new ranking based on \( {\text{GCI}}_{*i} \).

Table 3 Development composite indicator for men (worst scenario)

Given the manner in which weights are assigned in each scenario, the respective global evaluation indicators allow an overall assessment of each country according to its strengths or weaknesses. Although these measures allow a more realistic analysis, it is complicated to consider multiple rankings simultaneously; thus, it is useful to have a single indicator that reconciles both scenarios. Starting from the global indicator of each scenario, and to avoid differences due to scale, the indicators have been standardized according to their maximum and minimum values. With regard to the value of λ, if decision-makers or analysts have no particular preference, λ = 0.5 seems to be a fairly neutral choice (Zhou et al. 2007). The values of the consensus global composite indicator for males are shown in Table 4.
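As an illustration of the standardization and combination step just described, the following minimal Python sketch rescales each scenario's global indicator by its own minimum and maximum and then forms a convex combination governed by λ. It assumes, in the spirit of Zhou et al. (2007), that the consensus value is λ times the standardized best-scenario indicator plus (1 − λ) times the standardized worst-scenario indicator, and that both indicators are oriented so that larger values are better. The function names and the sample scores are hypothetical and are not taken from the tables.

```python
def min_max(values):
    """Rescale a list of scores to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All units tie: the rescaled score carries no information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def consensus_indicator(best, worst, lam=0.5):
    """Convex combination of the standardized best- and worst-scenario indicators.

    Assumed form (not confirmed by the source): lam weights the best scenario,
    (1 - lam) weights the worst scenario.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    b, w = min_max(best), min_max(worst)
    return [lam * bi + (1.0 - lam) * wi for bi, wi in zip(b, w)]


# Hypothetical global indicator values for three units (illustration only).
best_scores = [4.0, 3.0, 2.0]
worst_scores = [1.0, 3.0, 2.0]
cgci = consensus_indicator(best_scores, worst_scores, lam=0.5)
```

With λ = 0.5 both scenarios count equally; setting λ closer to 1 would emphasise each unit's strengths and λ closer to 0 its weaknesses.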

Table 4 Men consensus global composite indicator

The best ranking positions correspond to two types of situations. First, we find countries that show high values for all the variables considered, such as the Netherlands. Second, we find countries like Sweden, Spain or Greece, whose good position in the ranking is due not only to the good values of each of their components but also to a proper balance between them.

By contrast, countries in Eastern Europe (Estonia, Latvia, Lithuania and Romania) occupy the lowest ranks, mainly due to lower values in their health and standard of living components, although educational indicators are not as low for these countries.

To show the stability of these results and the possible influence of the parameter λ on them, we compute the consensus global composite indicator repeatedly for discrete values of λ such that 0 ≤ λ ≤ 1. A practical way to do this is to generate a sequence of equidistant values in the interval [0,1], starting with λ = 0 and increasing in steps of 0.01 up to 1.
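The sweep over λ can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the consensus value is the λ-weighted combination of the min-max standardized best- and worst-scenario global indicators, and the unit names, scores and helper functions (`min_max`, `ranks`, `rank_trajectories`) are hypothetical.

```python
def min_max(values):
    """Rescale scores to [0, 1] by their own minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def ranks(scores):
    """Rank 1 = highest score; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    position = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        position[i] = rank
    return position


def rank_trajectories(best, worst, steps=100):
    """Return {lam: ranks} for lam = 0, 1/steps, ..., 1 (steps=100 gives 0.01 steps)."""
    b, w = min_max(best), min_max(worst)
    out = {}
    for k in range(steps + 1):
        lam = k / steps  # integer grid avoids accumulated floating-point drift
        scores = [lam * bi + (1 - lam) * wi for bi, wi in zip(b, w)]
        out[lam] = ranks(scores)
    return out


# Hypothetical units and scores, for illustration only.
names = ["A", "B", "C"]
best = [4.0, 3.0, 2.0]
worst = [1.0, 3.0, 2.0]
traj = rank_trajectories(best, worst)

# Largest change in rank position of each unit over the whole sweep.
spread = {
    name: max(r[i] for r in traj.values()) - min(r[i] for r in traj.values())
    for i, name in enumerate(names)
}
```

Plotting each unit's rank against λ gives a stability chart of the kind discussed in this section, and `spread` gives a simple numerical summary of how many positions each unit moves.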

Thus, the position of each country in the ranking corresponding to each indicator is shown in Fig. 1.

Fig. 1 Stability of CGCI for men

Most countries maintain their position while the parameter value varies. The countries that show greater volatility are Italy and Sweden, which improve their position, and Denmark, whose position worsens. The higher the parameter value, the less weight the global worst-case indicator has. In that scenario, the weighting of the education variable is higher, so the positions of countries with a high enrolment rate are negatively affected. By contrast, the weight of the global indicator for the best scenario increases with λ. In this scenario, the life expectancy variable is the main strength of the countries whose positions improve.

Following the same procedure as for men, we get the consensus global composite indicator for women (Tables 5 and 6). The values obtained for λ = 0.5 are shown in Table 7.

Table 5 Development composite indicator for women (best scenario)
Table 6 Development composite indicator for women (worst scenario)
Table 7 Women consensus global composite indicator

The countries that occupy the top positions in the female case are France and Spain. Both countries have above-average values in the three components and are leaders in life expectancy, which also receives the highest weighting in both scenarios. The last positions are again filled by countries in Eastern Europe (Romania, Bulgaria, Hungary, Estonia, Slovakia, Latvia and Lithuania), which had the lowest levels, even among women, in the components of the composite indicator.

The stability analysis shows less variation in the case of women than in that of men, as we can see in Fig. 2.

Fig. 2 Stability of CGCI for women

Thus, countries remain at approximately the same position in the higher and lower ranks. Denmark is the only country that changes significantly (six positions); the rest do not change their rankings by more than three positions. Denmark presents disparate positions in the initial components, making it fall in the rankings while the value of the parameter λ increases. If we focused on the health component, it would occupy 18th place in the ranking. If we prioritised the educational variable, it would occupy the first place and if we prioritised income, second. The weight of education and income variables increases in the best scenario, which is more relevant with increasing λ. This makes the country gradually improve its position in the rankings.

By contrast, as in the case of men, the countries with little or no change in their positions were mainly those with a more balanced distribution among their components.

The results are related to previous findings (Malberg and Obersteiner 2001; PNUD 2007–2008). The new measure is comparable to and highly correlated with the HDI, especially in the case of men. This is consistent with previous analyses that the authors have carried out. The HDI, like other indicators, measures wellbeing for both sexes together, without distinguishing between men and women. However, when the characteristics of men and women are studied separately, in many cases the male results are closer to those obtained with general indicators than the female results are. This indicates that general indicators reflect the underlying androcentric perspective in defining the very concept of wellbeing, which ignores variables that are particularly important in women's situation (Domínguez-Serrano 2009).

There are some countries whose position in the ranking differs from the one obtained with the HDI. That is the case, for example, of Greece (men's ranking) or Ireland (women's ranking), which obtain a better position with the proposed composite indicator. The superiority of the new measure is based on the fact that the weights assumed for the component indicators, as the result of an optimization process, are less arbitrary and contestable than those assumed in other indexes whose weights are based on subjective decisions.

This feature makes the indicator useful as an evaluation and analysis instrument to support political decisions, e.g., resource allocation or benchmarking practices. Depending on the objective, the policy-maker may be interested in using a weaker or stronger criterion to determine the ranking. Thus, the parameter λ may be adjusted to obtain a global consensus composite indicator that prioritizes either strengths or weaknesses.

4 Conclusions

In this paper we study wellbeing measurements with a gender perspective. In this field, composite indicators have been widely accepted as a multidimensional measure and a useful tool for performance monitoring, policy analysis, benchmarking, etc. To counteract the limitations of the wellbeing composite indicator from a gender perspective, we have defined a new composite indicator that shows and evaluates women’s and men’s situations separately. Thus, our objective is to analyse and compare the situation of each group without measuring disparities in a direct way.

To construct this composite indicator, in this paper we have proposed a linear programming problem that defines the best-worst global evaluation approach. This methodology is a combination of the global efficiency approach (Despotis 2002, 2005) and the linear programming problems proposed by Zhou et al. (2007).

The proposed methodology yields composite indicators using a common and unique weight set, such that their values enable discrimination between units and provide a global vision of each one without relying only on the most favourable evaluation. Specifically, the proposed approach uses two sets of common weights: those that are the most and those that are the least favourable for the units. Compared with previous studies, on the one hand, this approach does not require prior knowledge of the weights for the initial indicators, which can be generated endogenously by solving linear programming models. On the other hand, a normalization method is not required to obtain the composite indicator. Thus, the obtained composite indicator is less subjective than others. Also, this new approach helps improve the discriminating power of traditional DEA models, so that all units are fully ranked.

The main advantage of this approach, compared with traditional DEA models, is that composite indicators based on common weights are more comparable with those obtained by international organizations using equal weights. Furthermore, the global composite indicators are additional instruments that allow the analysis of each unit’s situation depending on its relative strengths (in the best-case scenario) or weaknesses (in the worst-case scenario). Thus, the analyst is able to carry out a more realistic analysis of the situation of each unit by observing its positions in the rankings and the variation of its position across both scenarios.

The proposed approach has been applied to developing composite indicators for modelling wellbeing with a gender perspective in the case of EU member countries. For all of them, we have obtained a consensus global composite indicator using the same components as the HDI. The descriptive analysis of the results offers a guide to how this new methodology should be applied in practice, establishing the role of the analyst in each case. Also, we have defined a graphical instrument to analyse the stability of the values of the composite indicator. This instrument can be used to identify unstable countries and, using the weight sets, to determine the causes of this situation.

Nevertheless, as we have pointed out above, it is necessary to work on some issues related to the proposed measure. In further papers we will focus on adding new wellbeing components to the initial indicator system in order to obtain a multidimensional evaluation. In addition, the best-worst global evaluation approach should be refined to simplify the formulation of the global consensus composite indicator. Also, new tools that help to interpret the composite indicator values should be defined. Finally, a review of the literature shows that most studies present a theoretical definition of the composite indicator, but do not use it. In recent decades, there has been an increasing interest in the scientific aspects of developing indicators; however, the integration of these indicators into actual policy-making is only beginning to be addressed. There will thus be a need for further empirical studies to close this gap.