1 Introduction

In recent years, the study of economic evolution regarding the well-being of people and households has become an important research issue because of the possibility to gather and store a large amount of data collected over time (see, e.g., Mazziotta and Pareto 2017).

The rapid progress made in science and technology has contributed to the evidence of an evolving world, and for this reason, new perspectives in knowledge discovery upon economic and social data with a time-oriented perspective are needed. The paradigm of “change mining” (Oliveira and Gama 2012) has arisen as a consequence of this evolution.

Data mining mechanisms that monitor models and patterns over time, compare them, detect changes, and describe these changes have become more and more useful. Having this in mind, some data mining researchers have developed methods and techniques to study the evolution of different phenomena over time (see Aggarwal 2005; Spiliopoulou et al. 2009).

Moreover, some macro-economic statistics such as GDP do not seem to give a detailed picture of the living conditions of ordinary people (see, e.g., Maggino 2017; Chelli et al. 2016).

The perception of credibility and accountability of public policies has been highlighted in recent years because of the financial and economic crisis.

The Organization for Economic Co-operation and Development (OECD) states, “the OECD Framework for Measuring Well-Being and Progress is built around three distinct domains: material conditions, quality of life and sustainability, each with their relevant dimensions” (OECD 2019).

In this paper, we address the problem of monitoring the evolution of well-being in Italy over time. This can help decision makers in different areas make better economic and political decisions. Well-being is measured with a set of indicators, one for each domain of equitable and sustainable well-being (BES, from the Italian Benessere Equo e Sostenibile). They were selected from the ISTAT database (see ISTAT 2019) and refer to the period 2004–2018.

The aim of this research is to understand how equitable and sustainable well-being in the territories (provinces) has changed; this is done by comparing the period before the 2008 crisis with subsequent years. In particular, we are interested not so much in ranking the provinces as in seeing how the various territorial areas have moved over the period and which indicators carry greater weight. For this purpose, the ISTAT database containing the BES territorial indicators was used. This database contains the time series of different indicators, but not all of them are available for the period of interest.

The new idea regards the application of the Multiple Factor Analysis (MFA) to selected indicators for Italian provinces in different periods. MFA is applied to tables in which a set of individuals (one individual = one row) is described by a set of variables (one variable = one column). The advantage of MFA lies in the fact that within the active variables, it can account for a group structure defined by the user. Such data tables are called individuals × variables organized into groups (Escofier and Pages 1998).

The idea is different from that proposed by Mazziotta and Pareto (2017) because our objective is not to construct a composite indicator but rather to see how different indicators (one for each domain) move to analyze and then monitor the evolution of well-being in Italy over time.

In the analysis, we do not use a composite indicator for each domain of the BES (see Mazziotta and Pareto 2019); we only use one of the indicators of each domain, choosing it based on objective criteria.

We performed the analysis using FactoMineR (Lê et al. 2008), a package of the R language.

2 Equitable and Sustainable Well-Being at Local Level

Equitable and sustainable well-being (BES) is a multidimensional approach that identifies 12 well-being domains (ISTAT 2013; ISTAT 2018); for each of them, a set of indicators is given (at NUTS2 level). BES is becoming an increasingly important tool to evaluate the progress of society from an economic, social, and environmental point of view. Consequently, the Italian Economic and Financial Document has included some selected BES indicators since 2017. The interest in BES has been growing over time, especially for Italian provinces and cities (NUTS3) (see Taralli 2013), so in 2018 ISTAT issued for the first time a system of BES indicators at the NUTS3 level.

The BES domains at the local level are the same as those at national level, with an exception made for the Subjective well-being domain because of the lack of subjective indicators at the local level. The 11 domains and the indicators are listed in Table 1, and they belong to the 2019 version of the database (see ISTAT 2019). This version differs from the previous one in some aspects: variable definitions, introduction of new indicators, elimination of some previously available indicators, time span of new indicators, and territorial units (see ISTAT 2019).

Table 1 Time availability of indicators (ISTAT database from 2004 to 2018). Coloured cells refer to data availability; different colours distinguish domains

In 2019, a set of 55 indicators was published; the domains do not all contain the same number of indicators, and almost half of the indicators have no values before 2008. Table 1 shows three domains (Social relations, Landscape and cultural heritage, and Innovation research and creativity) that have no data before 2008. Furthermore, the data for the Social relations domain are missing up to 2014. Some indicators have values only in specific years, namely Voter turnout in European elections and Voter turnout in regional elections (Politics and institutions domain).

3 The Method: Multiple Factor Analysis

The need to simultaneously introduce quantitative and qualitative variables (known as mixed data) as active elements of one factorial analysis is usual in different statistical analyses (see e.g. Bolasco 1999).

The first method suggested within this framework is Classical Canonical Analysis, which in practice is of little use when groups are built from a given set of variables.

The problem of partitioning variables into different subspaces can be solved by using MFA (see, e.g., Pages 2014). It is related to Principal Component Analysis (PCA) and can analyse both quantitative and qualitative data (Escofier and Pages 1998); that is, it can handle multiple data tables that measure sets of variables collected on the same observations, and these variables can be of different types.

In particular MFA is applied to tables in which a set of individuals (one individual = one row) is described by a set of variables (one variable = one column). The basic idea of MFA lies in the fact that within the active variables, it can account for a group structure defined by the user. Such data tables are called individuals × variables organised into groups (Escofier and Pages 1998).

In order to describe the MFA algorithm, one can see it as a “mixture” between a PCA for quantitative variables and a Multiple Correspondence Analysis (MCA) for the qualitative variables.

Its goal is to analyze several data sets measured on the same observations, to obtain a set of common factor scores and to plot each of the original data sets in a two dimensional space.

MFA first computes a PCA of each data table and normalizes the table by dividing all its elements by the first singular value obtained. The normalized data tables are then aggregated into a new table that is analyzed via a non-normalized PCA; this new PCA decomposes the variance of the “compromise” into a set of new orthogonal variables (i.e., the principal components, also often called dimensions, axes, factors, or even latent variables) ordered by the amount of variance that each component explains. The coordinates of the observations on the components are called factor scores; these can be used to plot maps in which the observations are represented as points such that the distances in the map best reflect the similarities between them. The positions of the observations as seen by each individual data table are called partial factor scores and can also be represented as points on the map (Abdi et al. 2013).
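The prescaling step described above can be illustrated with a minimal Python sketch. It is not the FactoMineR implementation used in this paper: the toy tables `group_a` and `group_b`, the function names, and the use of power iteration to obtain the first eigenvalue are all assumptions made for illustration, and the first singular value is taken as the square root of the first eigenvalue of the covariance matrix of the standardized group.

```python
import math

def standardize(table):
    """Center every column and scale it to unit (population) variance."""
    n = len(table)
    columns = []
    for col in zip(*table):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        columns.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*columns)]

def covariance(table):
    """Covariance matrix (1/n) X'X of a table whose columns are centered."""
    n = len(table)
    cols = list(zip(*table))
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in cols] for ci in cols]

def first_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric positive semi-definite matrix (power iteration)."""
    size = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(size)]  # arbitrary starting vector
    for _ in range(iterations):
        new = [sum(matrix[r][c] * vec[c] for c in range(size)) for r in range(size)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    image = [sum(matrix[r][c] * vec[c] for c in range(size)) for r in range(size)]
    return sum(a * b for a, b in zip(vec, image))  # Rayleigh quotient

def mfa_prescale(table):
    """Standardize a group, then divide it by its first singular value."""
    z = standardize(table)
    lambda_1 = first_eigenvalue(covariance(z))
    scaled = [[v / math.sqrt(lambda_1) for v in row] for row in z]
    return scaled, lambda_1

# Hypothetical toy data: 5 individuals described in two groups (e.g. two years).
group_a = [[1.0, 2.0], [2.0, 1.5], [3.0, 3.5], [4.0, 3.0], [5.0, 5.0]]
group_b = [[2.0, 9.0, 1.0], [2.5, 7.0, 2.0], [3.5, 6.0, 2.5],
           [5.0, 4.0, 4.5], [6.0, 3.0, 4.0]]

scaled_a, lambda_a = mfa_prescale(group_a)
scaled_b, lambda_b = mfa_prescale(group_b)

# Merge the prescaled groups side by side and run the global, non-normalized PCA.
merged = [row_a + row_b for row_a, row_b in zip(scaled_a, scaled_b)]
global_lambda_1 = first_eigenvalue(covariance(merged))

print(first_eigenvalue(covariance(scaled_a)))  # close to 1 after prescaling
print(global_lambda_1)  # lies between 1 and the number of groups
```

After prescaling, no group can dominate the first global axis merely because it contains more variables, which is the balancing role the weighting is meant to play.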

In other words, the heart of MFA is a PCA in which weights are assigned to the variables used in the analysis. More precisely, the same weight is assigned to every variable of group j (j = 1,…, J): the reciprocal of the first eigenvalue of the separate PCA of that group. The importance of the dimension represented by a principal component is given by its eigenvalue, which indicates how much of the total inertia (i.e., variance) of the data is explained by this component.
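As a reminder of how an eigenvalue translates into a share of inertia, the usual PCA relation can be written as follows. The symbol \({\tau }_{s}\) is introduced here only for illustration and is not part of the notation used elsewhere in the paper:

$${\tau }_{s}=\frac{{\lambda }_{s}}{\sum_{t}{\lambda }_{t}}, \qquad {\tau }_{1}+{\tau }_{2}=\frac{{\lambda }_{1}+{\lambda }_{2}}{\sum_{t}{\lambda }_{t}}$$

With the shares reported in Sect. 4.2, about 48% for the first dimension and 12.5% for the second, the cumulative share of the first factorial plane is about 60.5%.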

This shows that the inertia of a group represents the individuals’ variability both from the point of view of their deviation from the centre of gravity and from that of the between-individuals distances. Thus, the maximum axial inertia of each group of variables is equal to one.

The influence of the groups of variables in the global analysis must be balanced and the structure of each group must also be respected. The weight assigned to each variable has a simple, direct interpretation. It allows MFA to be regarded as a particular Generalized Canonical Analysis. For each group of variables, MFA associates a “cloud” of individuals and provides a representation of these clouds.

This representation can be obtained in different ways: as a projection of a cloud of points, as a canonical variable, or using another idea, such as that proposed by Pages (2014). This last proposal takes into account the structure of the variables in the J groups (j = 1,…,J) and the MFA weighting given by the reciprocal of the first eigenvalue. This prescaling entails that when a PCA is performed on the merged prescaled data sets, the resulting components reflect a structure common to the data sets.

Given the transition formula of the space of variables into the space of individuals, as written by Pages and Husson (2005), and taking into account the structure of variables in J groups and the weighting of MFA (\(\frac{1}{{\lambda }_{1}^{j}}\) if \({x}_{k}\) belongs to group j), the \({F}_{s}\left(i\right)\), that is, the score of the individual i on the axis (of rank) s is given by:

$${F}_{s}\left(i\right)=\frac{1}{\sqrt{{\lambda }_{s}}}\sum_{j=1}^{J}\frac{1}{{\lambda }_{1}^{j}} \sum_{k=1}^{{K}_{j}}{x}_{ik}{G}_{s}\left(k\right)$$

where:

  • \({K}_{j}\) is the number of variables in group j,

  • \({G}_{s}\left(k\right)\) is the coordinate of the variable k on the axis (of rank) s,

  • \({\lambda }_{s}\) is the eigenvalue associated with the axis (of rank) s,

  • \({\lambda }_{1}^{j}\) is the first eigenvalue of group j,

  • \({x}_{ik}\) is the general term of the data table (row i, column k).

This relationship is very important for interpreting the position of individuals with respect to the variables. We must note that on the graphical displays derived from MFA, each individual appears as the centroid of its partial representations (see Pages and Husson 2005).
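The transition formula above can be evaluated directly once the variable coordinates and eigenvalues are known. The following Python sketch simply transcribes the formula as printed; the function name, the argument layout, and all numerical values are hypothetical and serve only to show how the group weights enter the sum.

```python
import math

def individual_score(x_row, groups, first_eigs, g_s, lambda_s):
    """Score F_s(i) of one individual on axis s, following the transition formula.

    x_row      : values x_ik of individual i (one entry per variable k)
    groups     : for each group j, the list of column indices k belonging to it
    first_eigs : first eigenvalue lambda_1^j of each group j
    g_s        : coordinate G_s(k) of each variable k on axis s
    lambda_s   : eigenvalue of the global analysis associated with axis s
    """
    total = 0.0
    for j, columns in enumerate(groups):
        inner = sum(x_row[k] * g_s[k] for k in columns)
        total += inner / first_eigs[j]  # MFA weight 1 / lambda_1^j
    return total / math.sqrt(lambda_s)

# Hypothetical values: 4 variables split into 2 groups of 2.
x_row = [1.0, -0.5, 0.2, 0.8]
groups = [[0, 1], [2, 3]]
first_eigs = [1.6, 1.25]
g_s = [0.9, -0.8, 0.7, 0.6]
lambda_s = 1.44

score = individual_score(x_row, groups, first_eigs, g_s, lambda_s)
print(score)
```

An individual obtains a high score on an axis when it takes large values on the variables with positive coordinates on that axis and small values on those with negative coordinates, which is why the formula supports reading individuals and variables jointly.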

In the next section, practical and theoretical notions referring to the object of this study are considered.

4 The Application

This section describes the data used and the results of applying MFA. As anticipated, we use the ISTAT database Misure del benessere dei territori. Tavole di dati (2019). Because our aim is to describe how equitable and sustainable well-being has changed, we compare the situation before the 2008 crisis, the situation after the 2011 crisis, and the current one. Based on the considerations presented in Sect. 2, we select from the database the data for Italian provinces in 2007, 2012, and 2017.

We use one indicator for each domain, for the following reasons:

  • the different number of indicators by domain would lead to an initial distortion, resulting in different weights for each domain;

  • the use of a composite indicator for each domain to apply the MFA (see Mazziotta and Pareto 2019) is not appropriate here because in some domains, the indicators are not available for all the years considered;

  • there are cases in which using a single indicator that is comparable over time is the only option, because of the numerous missing data in the table for some domains;

  • a similar approach is used by Ciommi et al. (2017), in which the domains of the territorial BES are described by a single indicator given the limited availability of homogeneous time series.

4.1 The Data

The choice of indicators used in the analysis is based on different criteria: availability over time, discriminant capacity, variability, and the correlation among indicators inside each domain.

The evaluation of the discriminant capacity of the indicators is based on an analysis of variance (ANOVA) that tests for differences among group means; we grouped provinces by socio-economic macro-region (North, Center, and South Italy). Because some indicators did not show a significant difference in mean values among macro-regions, we decided not to use them in the application. We report the selected indicators in Table 2, which shows the domains with one indicator (Economic well-being, Spread of rural tourism facilities, Innovation, research, and creativity) and others with a different number of indicators (from 2 to 5). To choose the best indicator in each domain, we considered the correlations within each domain (Figs. 1, 2, 3, 4, 5, and 6) and the variability indexes (Table 2), i.e., the coefficient of variation and the quartile difference for standardized data. Figures 1, 2, 3, 4, 5, and 6 show the correlation matrices for each domain. Red indicates a positive correlation and blue a negative one; the color intensity increases with the strength of the correlation. Within each domain, we selected the indicators that show both the largest correlation and the highest variability.
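The two screening criteria, discriminant capacity and variability, can be sketched in a few lines of Python. The values below are hypothetical and are not taken from the ISTAT database; the function names are illustrative, and the sketch stops at the F statistic because the standard library does not provide the F distribution needed for a p-value.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    return statistics.pstdev(values) / abs(statistics.fmean(values))

# Hypothetical indicator values for three provinces in each macro-region.
north = [8.0, 9.0, 10.0]
center = [11.0, 12.0, 13.0]
south = [17.0, 18.0, 19.0]

f_stat = one_way_anova_f([north, center, south])
cv = coefficient_of_variation(north + center + south)
print(f_stat, cv)
```

A large F statistic means that the macro-region means differ by much more than the provinces differ within each macro-region, which is the sense in which an indicator discriminates among territories.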

Table 2 Means, coefficients of variation and standardized Q3-Q1 (2007, 2012, 2017) for indicators selected by availability and discriminant capacity

Fig. 1 Correlation matrices for Health indicators, 2007, 2012, 2017

Fig. 2 Correlation matrices for Education indicators, 2007, 2012, 2017

Fig. 3 Correlation matrices for Work and life balance, 2007, 2012, 2017

Fig. 4 Correlation matrices for Politics and institutions, 2007, 2012, 2017

Fig. 5 Correlation matrices for Security, 2007, 2012, 2017

Fig. 6 Correlation matrices for Quality of services, 2007, 2012, 2017

We perform our analysis by making use of different sets of indicators, and we decided to report those providing the best outcome. Table 3 lists the indicators used in the analysis; their outputs are described in Sect. 4.2.

Table 3 Domain, name and description of indicators used in MFA

4.2 The results of the MFA

In the current paper, we use the MFA methodology with three indexes: i = 1,…,110 for the provinces, j = 1, 2, 3 for the years, and k = 1,…,11 for the indicators. We run the MFA using the variables described in the previous section. In the application of the MFA, we use indicators regardless of their polarity because we are not interested in a synthetic indicator obtained by a PCA (Mazziotta and Pareto 2019), but only in detecting the position of the units over time with respect to the selected indicators.

Because in some cases the choice of the indicator in each domain is not straightforward, we decided to repeat the analysis with a different set of indicators. This is the case of two domains and two different variables (Neet and Graduates for Education, Child and Elect for Quality of services). Below we report the application that provides the best results in terms of the explained variance, that is, using the indicators Neet and Elect.

The eigenvalues shown in Table 4 and in Fig. 7 suggest retaining the first two dimensions, which together explain 60.54% of the total variance.

Table 4 Eigenvalues of the first five factors (dimensions)

Fig. 7 Scree plot

The partial analysis (see Table 5) for each of the three years reveals a decrease over time in the variability explained by the first component, highlighting a process of convergence toward more similar values. The second component shows a different behavior.

Table 5 Partial analysis: eigenvalues of the first five components

Figure 8 presents the results of the analysis on the two-dimensional graph. If a variable is well represented (i.e., much of its variability is explained by the factorial dimensions), then its image on the factorial plane approaches the circumference, and the colors visually reinforce this fact. The smaller the angle a variable forms with a factorial dimension, the more it is correlated with that factor and the more it determines the interpretation of the axis. The variables that are well represented in all three years refer to the following BES aspects: Health (Life_exp), Education (Neet), Work and life balance (Unempl), Security (Crimes), Environment (Waste), and Innovation, research and creativity (Cult_Emp). Social relations (Acc_sc) is well represented, but only for the last year, which is also the only one available. Politics and institutions (Women) is well represented in the first two years, although its representation worsens in 2017. Landscape and cultural heritage (Rural) is poorly represented in both 2012 and 2017, and its data are not available for 2007. Economic well-being (Loans) has the worst representation, but it improves over time. Finally, Quality of services (Elect) is not represented very well.
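The link between the position of a variable on the correlation circle and its quality of representation can be made explicit. Assuming standardized quantitative variables, so that the coordinate of variable \({x}_{k}\) on a dimension is its correlation r with that dimension (this notation is introduced here for illustration only), the quality of representation on the first factorial plane is:

$${\mathrm{cos}}^{2}\left(k\right)=r{\left({x}_{k},{F}_{1}\right)}^{2}+r{\left({x}_{k},{F}_{2}\right)}^{2}\le 1$$

This quantity is the squared distance of the variable's point from the origin, so a value close to one places the point near the circumference, while a value close to zero places it near the centre of the plot.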

Fig. 8 Correlations between quantitative variables and dimensions. Quality of representation (cos2)

Focusing on the horizontal axis (dimension 1), we find on the right-hand side the variables positively correlated with dimension 1 (Life_exp, Women, Rural, Waste) and on the left-hand side those negatively correlated with it (Unempl, Elect, Loans, Neet). The first dimension captures the socio-economic and environmental aspects: increasing values of this dimension correspond to an improvement in Health, Education, Work and life balance, Economic well-being, Social relations, Politics and institutions, Landscape and cultural heritage, Environment, and Quality of services. This first dimension explains 48% of the variance.

On the other axis (the vertical one), we observe the variables positively correlated with dimension 2 (Crimes, Cult_Emp). This dimension explains 12.5% of the variance and can be regarded as residual. It is not easy to interpret: it increases with other crimes reported (Crimes) and with a rise in employment in cultural enterprises (Cult_Emp).

The graph in Fig. 9 represents the structure of the unit points (provinces) and their positions in 2007, 2012, and 2017. The black point is the barycenter of three colored points: the red one represents 2007, the green one 2012, and the blue one 2017. The dash is the distance between each year and the barycenter. We observe that the movements seem to be prevalent in the horizontal direction. For example, Roma (point 67) shows a considerable change in dimension 2 but not so much in dimension 1 (socio-economic and environmental aspects), while Milano (point 17) improves in dimension 1 a little from 2007 to 2012 and more so from 2012 to 2017. Dimension 1 decreases a little in the province of Rimini (point 47) from 2007 to 2012 but more so from 2012 to 2017. Napoli (point 72) improves in dimension 1 from 2007 to 2012, while it worsens from 2012 to 2017. Moreover, the situation in 2017 is worse than that in 2007.
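How a barycenter and the distances to it are obtained from the yearly points can be shown with a short Python sketch. The coordinates below are hypothetical and do not correspond to any province in Fig. 9; the function names are illustrative.

```python
import math

def barycenter(points):
    """Unweighted mean of a list of points, coordinate by coordinate."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

# Hypothetical partial points of one province on (dimension 1, dimension 2).
partial = {"2007": (-0.6, 0.3), "2012": (0.0, 0.0), "2017": (0.6, 0.3)}

center = barycenter(list(partial.values()))
# Distance of each yearly point from the barycenter (length of the connecting segment).
distance = {year: math.dist(center, point) for year, point in partial.items()}
# Net movement between the first and the last year, dimension by dimension.
shift = tuple(b - a for a, b in zip(partial["2007"], partial["2017"]))

print(center, distance, shift)
```

In this toy case the shift is entirely along dimension 1, which is the kind of horizontal movement that dominates in the map; a province such as Roma would instead show most of its shift along dimension 2.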

Fig. 9 Individual factor map

Figure 10 shows one relevant aspect. The points representing the provinces are shown in three different colors depending on the macro-region. The graph shows that contiguous provinces are not close in the factorial depiction (see, e.g., the provinces of Lazio indicated by the numbers 65, 66, 67, 68, and 69), and we also see a clear separation along the factorial axis between the provinces in the North and those in the South, with the latter showing worse values for dimension 1.

Fig. 10 Individual factor map. Distribution by macro region (North, Center and South Italy)

5 Conclusions

The aim of the research is to understand how equitable and sustainable well-being in the territories (provinces) has changed, by comparing the period before the 2008 crisis with the subsequent years.

The principal interest is how the various territorial areas have moved over the period and which are the indicators that have a greater weight. For this purpose, the ISTAT database containing the BES territorial indicators is used. We have chosen an indicator for each domain because for some domains, it was the only possible choice, and in other cases, it was not possible to build a composite indicator for each domain for each of the three years considered (2007, 2012, and 2017) because of missing indicators; moreover, the different number of indicators by domain would lead to an initial distortion, resulting in different weights for each domain.

The need to simultaneously introduce quantitative and qualitative variables (known as mixed data) as active elements of one factorial analysis is typical in the problem of variables being partitioned into different subspaces; this can be solved by using MFA. It relates to a PCA, which can analyze both quantitative and qualitative data (i.e., time).

The hypothesis of independence between the variables at different times is satisfied when considering sufficiently distant times (5 years).

The MFA analysis also allows for the use of indicators available in a few years only; this happens for the Social relations domain. The variables considered in the analysis may also be different in the groups identified by the qualitative variable (time). A further characteristic of the method that makes it useful for studying evolution over time is that it is suitable for considering variables that are not available on an annual basis, too.

Our results identify a principal dimension that describes about 48% of the total variability across provinces over time.

By making use of the MFA we have identified the direction in the change and its magnitude with respect to the group of variables that are described with the various dimensions across the considered Italian territorial units.

The new idea regards the application of MFA to the selected indicators for Italian provinces in different periods. MFA is applied to tables in which a set of individuals (one individual = one row) is described by a set of variables (one variable = one column). The advantage of MFA lies in the fact that within the active variables, it can account for a group structure defined by the user. Such data tables are called individuals × variables organised into groups.

A different idea could be to construct a composite indicator for each domain and then apply the MFA to monitor the evolution of BES in Italy over time.