Abstract
This article provides a combined geospatial artificial intelligence-machine learning, geoAI-ML, agent-based, data-driven, technology-rich, bottom-up approach and datasets for capturing the human dimension in climate-energy-economy models. Seven stages were required to conduct this study and build thirteen datasets to characterise and parametrise geospatial agents in 28 regions, globally. Fundamentally, the methodology starts collecting and handling data, ending with the application of the ModUlar energy system Simulation Environment (MUSE), ResidentiAl Spatially-resolved and temporal-explicit Agents (RASA) model. MUSE-RASA uses AI-ML-based geospatial big data analytics to define eight scenarios to explore long-term transition pathways towards net-zero emission targets by mid-century. The framework and datasets are key for climate-energy-economy models considering consumer behaviour and bounded rationality in more realistic decision-making processes beyond traditional approaches. This approach defines energy economic agents as heterogeneous and diverse entities that evolve in space and time, making decisions under exogenous constraints. This framework is based on the Theory of Bounded Rationality, the Theory of Real Competition, the theoretical foundations of agent-based modelling and the progress on the combination of GIS-ABM.
Background & Summary
At the basic level of most climate-energy-economy models, one main assumption governs input treatment, calculations, and the analysis of results. Millions of consumers are deliberately represented as a single agent that takes prices as given and makes rational choices with perfect knowledge of the market, under rational expectations, to maximise welfare subject to budget constraints1; this is also called a hyper-rational representative agent2. To overcome the limitations of representative, homogeneous, hyper-rational agents in traditional climate-energy-economy models (the so-called mainstream), the representation of the human dimension requires the use of empirical, historical, and analytical data. Geospatial big data analytics (the combination of Geographical Information Systems, GIS, and Big Data Analytics) and agent-based modelling (ABM) tools present an opportunity to introduce the human dimension into the analysis in a more realistic manner. These tools can capture the complexities of heterogeneous shaping structures and the diverse shaping attributes of agents that evolve in space and time, and that are driven by bounded rational expectations and exogenous factors. These complexities do not always allow agents to optimise their decisions; however, representing them offers an opportunity for more realistic assessments. The alternative and novel approach presented here, which represents energy economic agents that are heterogeneous, diverse, evolve in space and time, and take decisions under exogenous constraints, is based on (i) the Theory of Bounded Rationality initially described by Simon3,4, discussed and expanded by Petracca5, (ii) the Theory of Real Competition by Shaikh2, (iii) the theoretical foundations of agent-based modelling by Lavoie6, and (iv) the progress on the combination of GIS-ABM suggested by Crooks, et al.7.
The following sections provide an account of how the research was conducted, and how the datasets were calculated. Clear and detailed steps were provided for the community to repeat the research and reproduce the results. Details of the available data sources and other previously validated techniques used in this study are also presented here for reference. The datasets collected here are for 2010, because this is the base year used in most models. Figure 1 illustrates the steps of this research, along with some of the datasets required to conduct this research. In step 5, the global geospatial agent dataset is obtained, and from step 7, the energy supply dataset is calculated after applying the MUSE-RASA model8. To summarise, this section provides an overview of the datasets required (Subsection 1) for the framework design presented here (Subsection 2).
Collecting and Handling Data
Spatially resolved and temporally explicit datasets were collected from a range of sources. Missing gridded data were completed where necessary. Five groups of datasets were identified as follows. (i) Gridded end-use energy data were collected for 95 countries and completed for 165 countries. The methodology to complete the missing data and an initial assessment of the gridded dataset were published in Sachs, et al.9. (ii) Gridded demographic and socioeconomic data were collected from Kummu, et al.10. (iii) Gridded data for the calibration and validation of energy-related datasets were collected from Department for Business EIS11 and ARCONEL12. (iv) SSP2 macroeconomic driver data were collected from Riahi, et al.13. (v) The techno-economic data inputs used in this research are from the MUSE project at Imperial College London’s Sustainable Gas Institute; similar techno-economic data have been used in a series of articles14,15,16,17. In the following sections, more details on the data used in this study are provided.
Gridded end-use energy data
Four gridded datasets of the end-use energy of the residential sector were collected from Sachs, et al.9: (a) space heating, SH; (b) water heating, WH; (c) space cooling, SC; and (d) total energy for heating and cooling, TE. These energy demand datasets have a spatial resolution of 1 km² and an hourly seasonal temporal resolution, as explained in Sachs, et al.9. Figure 1 summarises the end-use energy datasets used in this study. At this point, the energy data are only collected, not processed. In addition to the end-use energy datasets, data representing the energy demand density were collected from Sachs, et al.9. Heat density is defined as the ratio between the heating demanded by customers and the area of interest, which may be a district, neighbourhood, or city. Similarly, the cooling density is defined as the ratio between the cooling demand of the customers and the area of interest. The energy density data are likewise not processed at this point.
Gridded socioeconomic and demographic data
Socioeconomic datasets were collected from Kummu, et al.10 and refer to (a) gross domestic product (GDP) per square kilometre, (b) gross domestic product per capita, GDPpc, per square kilometre, and (c) Human Development Index, HDI, at the city level or most available level. Demographic datasets were collected from CIESIN18 and refer to (d) population count per square kilometre and population density per area of availability.
Gridded data calibration and validation
Because of the extent of this research in terms of the number of countries covered, the main limitation in terms of data calibration and validation is the requirement for large-scale datasets at high spatiotemporal resolution. To address this limitation, data for validation purposes were collected from two countries: the United Kingdom (UK) and Ecuador. The Department for Business EIS11 from the UK and ARCONEL12 from Ecuador provide publicly available data that were used to validate the gridded energy datasets. The validation process is presented in the validation section of Sachs, et al.9 for the UK and in Moya, et al.19 for Ecuador.
SSP2 macroeconomic drivers
The Shared Socioeconomic Pathways (SSPs) macroeconomic driver datasets are quantitative projections of GDP and population developed as part of an Integrated Assessment framework13 at the International Institute for Applied Systems Analysis (IIASA, Austria), together with a range of other research institutions globally. SSPs have been widely adopted by the climate change research community to analyse the consequences of future climate change. O’Neill, et al.20 and Van Vuuren, et al.21 report each of the five scenario narratives and the framework behind each scenario. The matrix used to build the framework combines climate forcing and socioeconomic conditions to describe the situation and evaluate climate impacts, vulnerabilities, adaptation, and mitigation. This research uses the SSP2 scenario datasets for GDP and population. SSP2 is considered a “middle of the road” world, where medium challenges to mitigation and adaptation are assumed22. In the SSP2 scenario, trends in social, economic, and technological development broadly follow their historical patterns23. Although some countries would make relatively good progress (in the Global North), others would fall short of expectations (in the Global South). Thus, global inequality in development and income growth persists, and global population growth is moderate24. This scenario assumes that governments and civil society will work slowly to achieve sustainable development goals. Overall, a decline in the intensity of resource and energy use is expected; however, environmental systems would experience degradation25. SSP2 serves as a starting point to identify the evolution of population and GDP growth in the countries studied in this research.
Technoeconomic data
The technoeconomic dataset refers to the data used for the economic feasibility analysis of technologies in each region of the world. The economic feasibility analysis is a key study for selecting the most appropriate technology from a set of options. These data were developed by Imperial College London’s Sustainable Gas Institute for the MUSE research project15,16,17. Table 1 provides an example of the technoeconomic data used in the MUSE-RASA model for the evaluation of heating technologies. It is also assumed that the interest rate is 10% and that the initial Capital Expenditure (CAPEX) values are in MUS$2010/PJ.
Figure 2 summarises the results of applying the MUSE-RASA framework to obtain the datasets presented herein. A global definition of agent characterisation is provided in terms of GDPpc, HDpc, and HD, as shown in Fig. 2a. Figure 2b presents the global energy demand in the residential sector for the 28 regions in the MUSE-RASA framework. Figure 2c,d presents a snapshot of the geospatial agent distribution in the cities of Mexico and Shanghai. Figure 2e shows the demand for residential heat in terms of the agents’ requirements. These results illustrate the importance of the dataset, along with the rigour and robustness of the systematic approach developed in this study.
Geospatial Big Data Analytics For Spatial Agent Definition
The geospatial agent-based modelling approach of this study comprises five components: (i) agent heterogeneity, (ii) agent diversity, (iii) agent evolution in space and time, (iv) the agent decision-making process, and (v) the influence of exogenous constraints on agent decisions. Geospatial big data analytics, also called spatial data mining, was used to discover hidden knowledge in the large, gridded datasets collected in this research. An unsupervised machine learning technique was applied to classify spatial data points into specific groups according to similar properties, using the geospatial K-means algorithm developed in this research and published in Sachs, et al.26. This method was applied worldwide to the collected datasets.
This article aims to introduce a new Geospatial Agent-Based Modelling Framework called MUSE-RASA. The model has been used to create a large dataset of geospatial agents to assess the impact of the climate-energy-economy system on the residential sector globally, with a focus on reaching the mid-century net zero emission (NZE) target. The model uses geospatial big data analytics to capture the human dimension in the modelling approach, which is limited in traditional models. The MUSE-RASA model uses five components (heterogeneity, diversity, evolution, decision-making, and exogenous constraints) to represent the complexities of agents’ structures, diversity, and evolving attributes, as shown in Fig. 3. The model produces global metrics that can be used to analyse the transition and design policy recommendations. The MUSE-RASA model is an integrated assessment model that combines GIS-based and ABM approaches and is more realistic in representing the complexities of agent behaviour under different constraints.
Methods
This research defines an agent as a group of energy consumers with similar characteristics in terms of heterogeneity, diversity, evolution in space and time, decision-making process, and the influence of exogenous constraints. An agent is spatially defined within a specific zone, enclosed by borders under three heterogeneous characteristics. In each of those zones, a range of parameters is calculated to define the agent diversity and evolution. To do this, AI-ML-based geospatial big data analytics, built on machine learning (a subfield of artificial intelligence, AI), has been systematically applied to a range of datasets. In the following sections, each step of the framework used to produce the datasets8 shared here is described.
Spatial agent definition using machine learning
The Spatial Agent Definition consists of three parts: (1) the spatial characterization of heterogeneity, (2) the spatial parametrisation of diversity, and (3) the spatiotemporal parametrisation of evolution. Figure 4 provides a general description of each of the three parts of the spatial agent definition.
This research defines agent heterogeneity as the shaping structure that conditions agent behaviour, which can comprise historical, social, economic, and cultural structures, according to Schoon and Heckhausen27 and Shaikh2. Here, agent heterogeneity is captured by overlaying more than one gridded layer, where each layer represents one characteristic (see Fig. 4). The layer that emerges from the overlay process represents the shaping structure that defines the limits of the contours and zones where agents shape their behaviour. Examples of layers with spatial characteristics that define the agent structure in the energy field include the agent income level, their minimum energy consumption level, and their propensity to consume energy.
Agent diversity is given by a range of parameters that can be calculated in each zone. Overall, the total values of the parameters of interest are extracted from each layer of available gridded data. Examples of attributes that can be used for agent diversity parametrisation in the energy field are the total heating energy demand, total cooling energy demand, and level of development according to HDI, among others. Finally, the spatiotemporal agent evolution is given by a range of parameters that evolve over time for each of the agent zones defined in the spatial characterisation.
The geospatial K-means Unsupervised Machine Learning approach was applied to build the spatial agent definition framework described above as the main contribution of this research. This section provides the general spatial agent definition framework, which can be used to define agents worldwide using geospatial big data analytics. The Framework has six steps: (i) clustering of gridded data, (ii) reclassification of clustered data, (iii) zone definition, (iv) spatial characterisation of agent heterogeneity, (v) spatial parametrisation of agent diversity, and (vi) spatiotemporal parametrisation of agent evolution.
Clustering of gridded data
In the geospatial k-means clustering approach, the Elbow Method (EM) was applied to define the optimal number of clusters (ONC), which served to define the optimal number of spatial agents, as each cluster becomes a group of people with the same spatial attribute: an agent. EM calculates the Within-Cluster Sum of Squared Errors (WSS) for different numbers of clusters k and chooses the k at which the decrease in WSS first starts to diminish. The elbow is visible in the plot of WSS versus k. Table 2 defines the steps of the Elbow Method algorithm, which is used to define the ONC. The within-cluster variance (or the total within-cluster sum of squares, WSS), W(Ck), of a cluster Ck is defined by the Euclidean distance in Eq. 1.
\(W\left({C}_{k}\right)=\sum _{{x}_{i}\in {C}_{k}}{\Vert {x}_{i}-{\bar{x}}_{k}\Vert }^{2}\quad (1)\)
Where:
- xi is a data point belonging to the cluster Ck.
- \({\bar{x}}_{k}\) is the mean value of the points assigned to the cluster Ck; also called the cluster centroid, and its values are the coordinate-wise average of the data points in Ck.
- {x1, …, xn} is the set of observations; they are vectors, with one (longitude, latitude) coordinate per dimension (e.g., gridded HD).
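The elbow procedure described above can be sketched in a few lines. This is a minimal illustration, assuming scalar attribute values (for example, heat density per grid cell) and a plain Lloyd-style k-means with a deterministic quantile start; the function names `kmeans_1d`, `wss`, and `elbow_curve` and the toy values are hypothetical and are not taken from the published algorithm in Table 2.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on scalar values with a deterministic quantile start."""
    data = sorted(values)
    n = len(data)
    # Hypothetical initialisation: one starting centroid per quantile band.
    centroids = [data[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: (v - centroids[j]) ** 2)
            groups[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(g) / len(g) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def wss(values, centroids):
    """Total within-cluster sum of squares; each value is scored against its nearest centroid."""
    return sum(min((v - c) ** 2 for c in centroids) for v in values)


def elbow_curve(values, k_max):
    """WSS for k = 1..k_max; the elbow is then read off this curve."""
    return {k: wss(values, kmeans_1d(values, k)) for k in range(1, k_max + 1)}


# Toy attribute values forming three obvious groups (illustrative only).
sample = [1, 2, 3, 101, 102, 103, 201, 202, 203]
curve = elbow_curve(sample, 4)
for k, value in curve.items():
    print(k, value)
```

For the toy values, WSS falls steeply up to k = 3 and barely changes afterwards, which is the kind of bend that would be read as the ONC in a plot of WSS versus k.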
Once the ONC is defined, a global clustering is conducted by the application of the spatial K-means algorithm to the attribute/parameter of interest (e.g., HD), as can be seen in Table 3. The main outcome of this stage is the calculation of elements belonging to a cluster Ck, which are defined by lower and upper bounds of each cluster. All the cluster elements are centred around their respective centroids. Then, the lower/upper bounds are defined halfway between each consecutive centroid value. This method defines the limits to which each spatial agent belongs. This was performed for each of the parameters of interest. With one parameter, a spatial agent is defined as an agent with one attribute. In the geospatial agent-based modelling section, more attributes are considered to define the heterogeneous and diverse agents.
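As a rough illustration of the bounds step, the sketch below places each boundary halfway between consecutive centroid values, as the paragraph describes. The function name `cluster_bounds` and the example centroids are hypothetical; the outer limits are assumed to be the minimum and maximum of the gridded dataset.

```python
def cluster_bounds(centroids, x_min, x_max):
    """Return (lower, upper) bounds per cluster, with cuts midway between centroids."""
    ordered = sorted(centroids)
    # One interior cut halfway between each pair of consecutive centroids.
    cuts = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    edges = [x_min] + cuts + [x_max]
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical centroids for one attribute (e.g., heat density).
bounds = cluster_bounds([2.0, 102.0, 202.0], x_min=1, x_max=203)
print(bounds)
```

Each tuple gives the range of attribute values assigned to one spatial agent for that single attribute.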
Reclassification of clustered data
The global reclassification is done by assigning a number, from 1 to k, to the reclassifying ranges (clustered layer) of values of the gridded dataset. This operation reclassifies groups of values into other values. For example, all values between 1 (lower bound) and 100 (upper bound) become 1 (first segment), all values between 101 (lower bound) and 200 (upper bound) become 2 (second segment), and so on, until k segments. The lower and upper bounds used to define the reclassification boundaries were obtained in the previous step by using the geospatial k-means clustering algorithm. A reclassified gridded layer is obtained from this step, which is then used to define the zones (in the literature, also known as polygons or areas) where each agent is located. Table 4 presents the general concept of reclassification of gridded data. This is also visually explained in Fig. 5a.
Where:
- xmin is the minimum value in the gridded dataset.
- xmax is the maximum value in the gridded dataset.
- xi, xii, xn, xn+1 are the elements of each cluster Ck.
- \({X}_{i}=\left\{{X}_{i}{\rm{| }}i\in {\mathbb{R}},1\le i\le k\right\}\)
- Xk is the ONC + 1.
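The reclassification step can be sketched as a lookup of each grid value against the ascending cluster boundaries. This is a minimal sketch, assuming the grid is a list of rows, missing cells are marked with `None`, and upper bounds are inclusive (so 100 maps to class 1 and 101 to class 2, as in the example in the text); the function name `reclassify` is hypothetical.

```python
from bisect import bisect_left


def reclassify(grid, upper_bounds):
    """Map each cell value to a class id 1..k.

    upper_bounds holds the ascending inclusive upper bounds of classes 1..k-1;
    anything above the last bound falls into class k. None marks missing cells.
    """
    return [
        [None if value is None else bisect_left(upper_bounds, value) + 1
         for value in row]
        for row in grid
    ]


# Toy gridded layer with three classes: up to 100, 101 to 200, above 200.
layer = [[1, 100, 101],
         [200, 201, None]]
classes = reclassify(layer, [100, 200])
print(classes)
```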
Zones definition
Once the reclassified layer is obtained, the spatial geometry containing the agents within each reclassified cluster is calculated. The spatial geometry is then defined as a zone containing the agents. A zone is defined as a finite set of polygons formed by the contours/boundaries of all contiguous reclassified clusters, as shown in Fig. 5. For example, Zone 1 is defined by two polygons, as are Zones 3, 4, and k, whereas Zone 2 is defined by a single polygon. Another Zone can be defined by the remaining six polygons, as shown in Fig. 5b.
The general notation used to define a Zone Z with one spatial characteristic chH is presented in Table 5 and illustrated in Fig. 5. This notation is key for the further definition of agents with multiple characteristics, as developed in the following sections. For example, a spatial agent with 2 spatial characteristics would be defined with the use of two zones each with a different spatial characteristic ch1 and ch2: \({Z}_{c{h}_{1},{n}_{m}}\), and \({Z}_{c{h}_{2},{n}_{m}}\). In Fig. 6, the definition of zones for agents with one spatial characteristic ch1 is illustrated. In Table 5, the general notation is also provided for Zones Z with 1 to H spatial characteristics, 1 to n zones, and 1 to m polygons.
Where:
- Z represents a zone, grouping several polygons with similar characteristics to the spatial agent in place. Z is defined by a spatial characteristic ch, a zone number n, and the grouped polygons m; polygons with similar properties form a zone Z.
- ch is a spatial characteristic and varies from 1 to H. These can be GDP, GDPpc, and SH, among others.
- n is the maximum possible number of zones Z in a region or country.
- m is the number of polygons that each zone Z may possess.
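One way to picture the zone definition is as a connected-component search on the reclassified grid: contiguous cells of the same class form a polygon, and all polygons of the same class together form a zone. The sketch below assumes 4-neighbour contiguity and represents each polygon as a list of (row, column) cells rather than a true vector geometry; both choices, and the name `zones_from_classes`, are illustrative assumptions.

```python
from collections import deque


def zones_from_classes(grid):
    """Group contiguous same-class cells into polygons, keyed by class id."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    zones = {}
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] is None:
                continue
            cls = grid[r][c]
            polygon, queue = [], deque([(r, c)])
            seen[r][c] = True
            # Breadth-first flood fill over 4-connected neighbours.
            while queue:
                y, x = queue.popleft()
                polygon.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            zones.setdefault(cls, []).append(polygon)
    return zones


reclassified = [[1, 1, 2],
                [2, 2, 2],
                [1, 3, 3]]
zones = zones_from_classes(reclassified)
for cls, polygons in sorted(zones.items()):
    print(cls, len(polygons))
```

In this toy grid, Zone 1 consists of two polygons while Zones 2 and 3 consist of one polygon each, mirroring the multi-polygon zones described in the text.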
Spatial characterization of agent heterogeneity
Once the zones were defined, the spatial agent heterogeneity was defined by the spatial characterisation. First, a spatial agent is the join of all zones into a multi-polygon zone with a specific characteristic. Second, a spatial agent with one spatial characteristic defines the heterogeneity with a single characteristic. Table 6 provides the definition of a spatial agent SpA with one spatial characteristic M, chM, in any zone n of a region or country (Eq. 6). It is important to clarify that here, the zone n is already grouped into a single multipolygon. The attribute is a quantity based on annual values, consistent with the selection of agents and the available data. Examples of spatial characteristics that define the agent heterogeneity include energy demand per capita, energy density, and GDP per capita, among others.
The spatial characterisation of agent heterogeneity is given by multiple spatial characteristics. To obtain an agent with multiple spatial characteristics, the spatial characterisation approach for one spatial characteristic is applied to more than one reclassified gridded layer. Then, multiple layers are overlaid to calculate a new layer that intrinsically inherits the heterogeneous characteristics of the layers used for the intersection. For example, from the intersection of two layers (within a range of zones), a new layer that represents new heterogeneous zones emerges. These zones determine the limits or boundaries of agents with similar spatial characteristics, each agent having as many characteristics as the number of layers intersected. Figure 7 illustrates the process of the overlaying calculation using two spatial agent characteristics separately (a, b) to end with a new emergent agent with two spatial characteristics (c). A multiple spatial characterisation overlays multiple layers to define the agent heterogeneity.
Equation 7 presents the general representation of a spatial agent SpA with multiple spatial characteristics Mch for a country or a region. The approach used to define spatial agents with multiple spatial characteristics is rooted in the intersection of layers that were previously reclassified using the K-means clustering technique. This definition can be applied to any set of parameters (e.g., GDP, SH, SC, and DH) in the energy field or in any other field where gridded data are available.
Where:
- SpA represents a spatial agent.
- Mch defines the multiple spatial characteristics of a spatial agent.
- ch1 is the first spatial characteristic of the spatial agent.
- chH is the H-th spatial characteristic of the spatial agent.
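The overlay of reclassified layers can be illustrated on raster cells: each cell receives the tuple of class ids from all layers, and every distinct tuple identifies one emergent agent with multiple characteristics. This is a simplified sketch that assumes aligned grids of equal shape and `None` for missing cells; the function name `overlay` and the layer names are hypothetical.

```python
def overlay(*layers):
    """Combine aligned reclassified layers into one layer of class-id tuples."""
    rows, cols = len(layers[0]), len(layers[0][0])
    combined = []
    for r in range(rows):
        row = []
        for c in range(cols):
            key = tuple(layer[r][c] for layer in layers)
            # A cell missing in any layer cannot be characterised.
            row.append(None if None in key else key)
        combined.append(row)
    return combined


# Two toy reclassified layers, e.g. GDPpc classes and heat-density classes.
gdppc_classes = [[1, 1], [2, None]]
hd_classes = [[1, 2], [1, 2]]
agents = overlay(gdppc_classes, hd_classes)
print(agents)
```

The emergent cell label (1, 2), for instance, would denote agents in GDPpc class 1 and heat-density class 2.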
Spatial parametrisation of agent diversity
The spatial parametrisation of agent diversity consists of extracting the total value of a parameter or a range of parameters from the multi-polygon zone of each spatial agent. This means that, in each new emergent zone of Fig. 7c, for example, the total value of a parameter is calculated. Table 7 illustrates the equations used to conduct the agent parametrisation of this study with multiple spatial characteristics. The spatial parametrisation can be applied to spatial agents characterised by one or multiple characteristics. A spatial agent SpA defined from the intersection of multiple spatial characteristics Mch in zone n, zn, with parameter 1, p1 is defined by \(Sp{A}_{Mch,{z}_{n}}\left({p}_{1}\right)=\mathop{\sum }\limits_{i=1}^{k}{p}_{1,i}\), as shown in Table 7.
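The summation above amounts to a zonal total: the parameter values of all cells inside an agent's multi-polygon zone are added together. A minimal sketch follows, assuming an agent layer and a parameter layer on the same grid; the function name `zonal_sum`, the agent labels, and the numbers are illustrative only.

```python
from collections import defaultdict


def zonal_sum(agent_layer, parameter_layer):
    """Total of a gridded parameter over the cells belonging to each agent."""
    totals = defaultdict(float)
    for agent_row, value_row in zip(agent_layer, parameter_layer):
        for agent, value in zip(agent_row, value_row):
            # Skip cells without an agent or without data.
            if agent is not None and value is not None:
                totals[agent] += value
    return dict(totals)


agent_layer = [["A", "A"],
               ["B", None]]
heat_demand = [[1.5, 2.5],
               [4.0, 9.0]]
totals = zonal_sum(agent_layer, heat_demand)
print(totals)
```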
Spatiotemporal parametrisation of agent evolution
The spatiotemporal parametrisation of agent evolution is given by Eq. 11 and consists of the evolution in time t of a parameter or a range of parameters from the multi-polygon zone of each spatial agent. This means that, in each new emergent zone of Fig. 7c, for example, a parameter profile is calculated for a period in time t. Equation 11 illustrates the equation used to parametrise the agent evolution with multiple spatial characteristics. The spatiotemporal parametrisation of agent evolution can be applied to spatial agents characterised by one or multiple characteristics.
Where:
- SpA represents a spatial agent.
- Mch defines the multiple spatial characteristics of a spatial agent.
- ch is the spatial characteristic of the spatial agent.
- z1→n are the zones of the spatial agent.
- p1→q are the multiple evolving parameters of the agent.
- t is the time of the multiple evolving parameters of the agent.
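To make the time dimension concrete, the sketch below builds a parameter profile per agent from one gridded layer per year. It assumes that the agent zones stay fixed while the parameter layers change over time, which is a simplification chosen for illustration; the function name `agent_profiles` and the toy population values are hypothetical.

```python
def agent_profiles(agent_layer, layers_by_year):
    """Return {agent: {year: total}} for a parameter given as one grid per year."""
    profiles = {}
    for year in sorted(layers_by_year):
        for agent_row, value_row in zip(agent_layer, layers_by_year[year]):
            for agent, value in zip(agent_row, value_row):
                if agent is None or value is None:
                    continue
                series = profiles.setdefault(agent, {})
                series[year] = series.get(year, 0.0) + value
    return profiles


agent_layer = [["A", "B"],
               ["A", None]]
population = {
    2010: [[10, 5], [20, 7]],
    2020: [[12, 6], [22, 8]],
}
profiles = agent_profiles(agent_layer, population)
print(profiles)
```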
Agent-based modelling
Here, an agent is defined as an autonomous, heterogeneous, diverse, adaptive decision-making entity within a complex system that interacts with its environment and other agents through prescribed conflicting bounded behavioural rules, shaped by shaping structures and attributes, to produce emergent and complex system-level patterns in space and time. To represent this agent definition, this research has proposed the general framework for the spatial agent definition developed here and has adopted the MUSE ABM framework proposed in Giarola, et al.15, García Kerdan, et al.16, Moya, et al.17, and Moya, et al.28.
MUSE ABM framework
Figure 8 shows the MUSE ABM framework adopted in this study. Exogenous data are required for the model inputs, which are a combination of gridded and national datasets. The MUSE ABM framework defines a decision-making process for each agent based on the 10 parameters listed in Table 8.
Equation 12 illustrates the agent definition in the MUSE ABM framework, in which the ten attributes listed in Table 8 define the agent decision-making process.
Survey-based decision-making parametrisation
This research has also developed three questionnaires to collect primary data directly from main sources through in situ, person-to-person, and online surveys. The first questionnaire was developed by a team of researchers and industry experts to assess the Indian industry sector; details can be found in Moya, et al.17. Table 9 expands the use of survey outputs to the MUSE agent decision-making framework. Each parameter of the agent definition in Eq. 12 is parametrised by a set of answers from the questionnaire (see Table 9). For example, in Question 19, the agent is asked about the main investment decision metric to consider when energy technology investment is required. The answer guides the researcher towards the definition of the first parameter of the agent definition, the investment objective. A similar approach was used for the remaining parameters of the agent definition. This questionnaire and survey experience served to further develop a questionnaire for the residential sector in China and Ecuador. The Spanish version of the survey used for the Ecuadorian case study is available at [https://forms.office.com/r/B93BxJgxX2] and was published in Moya, et al.29, and the Chinese version of the survey is available at [https://www.wjx.cn/vj/w8Xp3UL.aspx].
Geospatial agent-based modelling framework
The components of the geospatial Agent-Based Modelling Framework of this research are characterised and parametrised with five groups of attributes: (1) heterogeneity, (2) diversity, (3) evolution, (4) decision-making, and (5) exogenous constraints. The framework presented in Fig. 9 provides spatially resolved and temporally explicit agent-based model scenarios to assess the long-term sustainable transition of the residential sector globally. This framework captures the human dimension and introduces realism into climate-energy-economy models.
Spatial characterization of heterogeneity
The spatial characterization of agent heterogeneity follows Step (iv) of the general framework for the spatial agent definition presented previously. The attributes used to define the spatially resolved and time-explicit characteristics are presented in Eq. 13 and are explained in Table 10. Figure 10 illustrates the process of capturing agent heterogeneity by overlaying three shaping structures.
Spatial parametrisation of diversity
The spatial parametrisation of agent diversity follows Step (v) of the general framework for the spatial agent definition presented here. The attributes used to define the spatially resolved and time-explicit parameters of diversity are presented in Eq. 14 and are explained in Table 11.
Spatiotemporal parametrisation of evolution
The spatiotemporal parametrisation of agent evolution follows Step (vi) of the general framework for the spatial agent definition presented here. The evolving attributes used to define the spatially resolved and time-explicit parameters of agent evolution are presented in Eq. 15.
Where:
- POP(t) is the population evolution in time.
- GDP(t) is the GDP evolution in time.
Parametrisation of decision-making process
This study adopted the decision-making process approach of the MUSE ABM framework, which is described in Eq. 12 and explained in Table 9.
Exogenous environmental policy constraint
The external limitations imposed by environmental policies are referred to as exogenous constraints, which can prompt individuals to alter their actions while evaluating heating or cooling technology. To investigate this, the study utilized carbon price profiles from 2005 to 2100 suggested in the MUSE model30, with each individual having access to various technologies that could result in varying levels of CO2 emissions. The total cost of carbon is calculated when an individual selects a technology that satisfies its service requirements. This external influence affects the decision-making process of each individual before making the ultimate investment decision.
Scenario definition
In this study, eight scenarios have been developed (see Table 12) to assess each of the five components of the geospatial Agent-Based Modelling Framework presented previously. Heterogeneity (i), diversity (ii), and evolution (iii) follow the definitions previously discussed. For the decision-making component (iv), this research has adopted the Levelised Cost of Energy (LCOE) as the main investment objective of agents when choosing a technology. The calculation of the annual LCOE for each technology includes the required investment expenditures (including financing), the operations and maintenance expenditures, the fuel expenditures, the electricity generation, the discount rate, and the technical life of the system. It is assumed that the agents consider the final LCOE value to make the final decision. For the same decision-making component (iv), scenarios are defined assuming either that agents have unlimited budgets (scenarios 01, 02, 05, and 06) or that agents have budget restrictions (scenarios 03, 04, 07, and 08) according to their GDPpc shaping structure, which is part of the heterogeneity characterisation. The latter is called the multiple-budget system. The heat density restriction (HDR) is also added to the decision-making process. HDR defines the technical and economic feasibility of technologies in agent zones according to the heat density of the zone where the agents are located. For the component of the exogenous environmental policy constraint (v), it is assumed that scenarios 01, 03, 05, and 07 consider carbon price (CP) schemes from Budinis, et al.30. The remaining scenarios do not consider CP schemes in the model.
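The decision logic described above can be sketched as follows. This is not the MUSE-RASA implementation (its formulas are given in Table 13); it is a simplified illustration that assumes an annualised LCOE based on a capital recovery factor with constant yearly costs and output, a budget check against the upfront investment, and a hypothetical minimum heat density per technology standing in for the HDR. All technology names and numbers are invented for the example; only the 10% rate echoes the interest rate stated in the technoeconomic data section.

```python
def capital_recovery_factor(rate, life):
    """Share of an upfront investment charged per year over the technical life."""
    if rate == 0:
        return 1 / life
    growth = (1 + rate) ** life
    return rate * growth / (growth - 1)


def lcoe(capex, om_cost, fuel_cost, annual_output, rate, life):
    """Annualised cost per unit of energy delivered (constant yearly flows assumed)."""
    annual_cost = capex * capital_recovery_factor(rate, life) + om_cost + fuel_cost
    return annual_cost / annual_output


def choose_technology(options, rate, budget=None, heat_density=None):
    """Pick the lowest-LCOE option among those passing budget and heat-density checks."""
    feasible = [
        tech for tech in options
        if (budget is None or tech["capex"] <= budget)
        and (heat_density is None or heat_density >= tech.get("min_heat_density", 0))
    ]
    if not feasible:
        return None
    best = min(feasible, key=lambda t: lcoe(t["capex"], t["om"], t["fuel"],
                                            t["output"], rate, t["life"]))
    return best["name"]


# Hypothetical heating options; units are arbitrary but consistent.
options = [
    {"name": "gas_boiler", "capex": 100, "om": 5, "fuel": 40, "output": 10, "life": 15},
    {"name": "heat_pump", "capex": 300, "om": 5, "fuel": 10, "output": 10, "life": 15},
    {"name": "district_heat", "capex": 150, "om": 2, "fuel": 10, "output": 10,
     "life": 15, "min_heat_density": 50},
]
print(choose_technology(options, rate=0.10, heat_density=100))
print(choose_technology(options, rate=0.10, heat_density=10))
print(choose_technology(options, rate=0.10, budget=200, heat_density=10))
```

With these invented numbers, the dense zone selects district heat, the sparse zone falls back to the heat pump, and the budget-restricted agent in the sparse zone ends up with the gas boiler, which shows how the budget and heat density restrictions can change the outcome of a pure least-cost choice.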
The MUSE-RASA model
The MUSE-RASA model combines the general framework for the spatial agent definition with the MUSE ABM Framework used for the geospatial Agent-Based Modelling Framework explained in the previous section. Figure 11 presents the link between spatially resolved and time-explicit agents with the MUSE ABM algorithm that has been applied in the MUSE-RASA model. The five components that capture realism in the geospatial Agent-Based Modelling Framework explained previously are also illustrated: (1) heterogeneity, (2) diversity, (3) evolution, (4) decision-making, and (5) exogenous constraints of multiple agents within the MUSE-RASA model. The model calculates six outputs for the eight agent-based scenarios to explore the long-term climate-energy-economy transition pathways towards the NZE targets by mid-century, with a focus on the residential sector globally.
Table 13 describes the formulas that have been implemented in the MUSE-RASA model to calculate the outputs of the model. The service demand for space heating (and other residential end-uses) is calculated first. From this, the installed capacity of heating supply technologies required to meet the demand is calculated. Once the technologies are identified, electricity and fuel consumption can be estimated. Finally, the total capital expenditure (CAPEX), the LCOE, and the total emissions are calculated.
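The order of calculation described above (demand, then capacity, then consumption, then costs and emissions) can be sketched as follows. The formulas of Table 13 are not reproduced in this section, so the relations below are generic energy-system approximations assumed purely for illustration; the function name, parameter names, and numbers are hypothetical.

```python
HOURS_PER_YEAR = 8760


def residential_outputs(service_demand_mwh, efficiency, capacity_factor,
                        capex_per_mw, emission_factor_t_per_mwh):
    """Chain demand -> capacity -> fuel use -> CAPEX and emissions (assumed relations)."""
    # Capacity needed to deliver the annual demand at the assumed utilisation.
    capacity_mw = service_demand_mwh / (capacity_factor * HOURS_PER_YEAR)
    # Fuel or electricity input needed to deliver the service demand.
    fuel_mwh = service_demand_mwh / efficiency
    # Investment and emissions scale with capacity and fuel input respectively.
    capex = capacity_mw * capex_per_mw
    emissions_t = fuel_mwh * emission_factor_t_per_mwh
    return {
        "capacity_mw": capacity_mw,
        "fuel_mwh": fuel_mwh,
        "capex": capex,
        "emissions_t": emissions_t,
    }


# Hypothetical inputs for a single agent zone.
print(residential_outputs(service_demand_mwh=8760, efficiency=0.8, capacity_factor=0.5,
                          capex_per_mw=1000, emission_factor_t_per_mwh=0.2))
```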
Data Records
The MUSE-RASA geospatial agent-based modelling framework presents 13 geospatial datasets8: three for the characterisation, two for heterogeneity definition, one for diversity parametrisation, one for evolution parametrisation, two for decision-making parametrisation, one for the estimation of global energy demand in the residential sector, two for spatial cross-validation, and one for the MUSE regions used in this research. Details are presented in Table 14. This research defines characterisation as the process of assigning geospatial boundaries to agents with similar geospatial characteristics, and parametrisation as the process of estimating numeric parameters for those agents within those boundaries. This study includes a survey-based decision-making parametrisation for China and Ecuador in Datasets 8 and 9. To validate the approach, this study employed the spatial cross-validation technique explained in the methodology section. Overall, this study contributes to a better understanding of complex agent systems and provides insights into how to use data in a spatial context for human representation in models.
Global clustered GDPpc [GDPpc_km2_shapes.shp]
This dataset provides a globally clustered GDPpc with respect to the six classes, as shown in Fig. 12 and Table 15. The shape file presents a range of zones with clustered values, regardless of the geographical administrative areas. For example, agents living in zones within GDPpc limit 1 (GDPpc1 = [min, 500], USD/cap*yr) can be in more than one region.
Global clustered HD [HD_km2_shapes.shp]
This dataset provides a globally clustered heat density with respect to the four classes, as shown in Table 16. The shape file presents a range of zones with clustered values, regardless of the geographical administrative areas. For example, agents living in zones within HD limit 2 (HD2 = [1790, 12080], MWh/km2*yr) can be in more than one region.
Global clustered HDpc [HDpc_km2_shapes.shp]
This dataset provides a globally clustered heat demand per capita with respect to the four classes, as shown in Fig. 13 and Table 17. The shape file presents a range of zones with clustered values, regardless of the geographical administrative areas. For example, agents living in zones within HDpc limit 3 (HDpc3 = [3.2, 5.3], MWh/cap*yr) can be in more than one region.
Global agents with two characteristics [Agents_GDPpc_HDpc.shp]
This dataset provides global agent characterisation based on two geospatial characteristics, as shown in Fig. 14 and Table 18. The shape file presents a range of zones that represent the borders or areas where agents with two characteristics interact regardless of geographical administrative areas. For example, agents living in zone A’ 1 belong to areas with GDPpc1 and HDpc1 and are in more than one region globally.
Global agents with three characteristics [Agents_GDPpc_HDpc_HD.shp]
This dataset provides global agent characterisation based on three geospatial characteristics, as shown in Fig. 2 and Table 19. The shape file presents a range of zones that represent the borders or areas where agents with the three characteristics interact, regardless of geographical administrative areas. For example, agents living in zone A2 belong to areas with GDPpc1, HDpc2 and HD1, and are in more than one region globally.
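As a small worked example, the candidate agent zones implied by the three clustered characteristics can be enumerated in Python. The class counts (six for GDPpc, four for HD, four for HDpc) come from the dataset descriptions above. Reading the 96 initial heating demand agents reported for the spatial cross-validation dataset as the product of these counts is an inference rather than a statement from the source, and the label strings are only illustrative.

```python
from itertools import product

# Class labels follow the naming used in the text (e.g. GDPpc1, HDpc2, HD1).
GDPPC_CLASSES = [f"GDPpc{i}" for i in range(1, 7)]  # six GDPpc classes
HDPC_CLASSES = [f"HDpc{i}" for i in range(1, 5)]    # four HDpc classes
HD_CLASSES = [f"HD{i}" for i in range(1, 5)]        # four HD classes

# Every combination of one class per characteristic is a candidate agent zone.
candidate_agents = list(product(GDPPC_CLASSES, HDPC_CLASSES, HD_CLASSES))

print(len(candidate_agents))                           # 6 * 4 * 4 combinations
print(("GDPpc1", "HDpc2", "HD1") in candidate_agents)  # the combination quoted for zone A2
```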
Dataset to define agent diversity [6_global_agents_diversity.csv]
This dataset provides 12 parameters to define agent diversity worldwide, aggregated in 28 regions. All values are given for 2010. Table 20 defines each variable provided in this dataset. Figure 15 provides the global distribution of three of the twelve parameters that define the agent diversity for each of the 28 regions considered in this research.
Dataset to define agent evolution in space and time [7_global_agents_evolution.csv]
This dataset provides the values of GDPpc and Population for each agent zone in each region from 2010 to 2100.
Dataset to define the decision-making process in China [8_China_dm_agents_survey.csv]
This dataset provides a range of variables to define the current status of the residential sector in China in terms of energy consumption and willingness to invest in new energy technologies or retrofitting. Variables are self-explanatory.
Dataset to define the decision-making process in Ecuador [9_Ecuador_dm_agents_survey.csv]
This dataset provides a range of variables to define the current status of the residential sector in Ecuador, in terms of energy consumption and willingness to invest in new energy technologies or retrofitting. Variables are self-explanatory.
Dataset of global energy demand by agents and regions [10_global_agents_demand.csv]
This dataset provides the energy demand in the residential sector worldwide, disaggregated by agents and regions. The demand is further disaggregated into six service demands, as follows: space heating (hspace), water heating (hwater), space cooling (cspace), cooking (cook), lighting (light) and appliances (appl). These demands were used in the eight previously defined scenarios. Figure 2 illustrates this dataset.
Dataset of global geospatial cross-validation [11_spatial_cross_validation.csv]
This dataset provides details of the results of the subclustering approach used in this study. Initially, 96 agents were estimated from the three geospatial characteristics. However, similarities were observed among them, and a subclustering process was applied that reduced the number of heating demand agents from 96 to 20 globally. The dataset reports the optimal number of clusters determined with the Elbow Method, along with the actual final number of clusters per region. It also shows the results of measuring agent compactness after applying the K-means subclustering discussed previously. The percentage of well-grouped data [percentage_of_well_grouped_data in dataset] is based on the usual decomposition of the total deviance (TSS) into the deviance between clusters (BSS) and the deviance within clusters (WSS). Ideally, the subclustering seeks clusters that have the properties of internal cohesion and external separation. Therefore, a BSS/TSS ratio approaching 1 indicates a compact subclustering of agents31. Despite having 96 agents initially, a high percentage of well-grouped data means that the final 20 agents have similar members within each new cluster after the application of the Elbow Method. In the limiting case, if all 96 agents were kept as separate clusters without using the Elbow Method, the BSS/TSS ratio would be 1, thereby achieving 100% compactness. Overall, the separate subclustering conducted for each MUSE-RASA region produced a BSS/TSS ratio greater than 0.975, which means that more than 97% of the initial 96 agents were well grouped into 20 agents. Additionally, the Silhouette coefficient (Si) [ave_sil_width in dataset] has been used to evaluate the goodness of the subclustering. A Si greater than zero indicates that the agents are well grouped, and the closer Si is to 1, the better the clustering. A Si < 0 indicates that agents were placed in the wrong group, and Si = 0 indicates that the agents lie between two clusters.
These two variables are of particular importance for the cross-validation of agent characterisation and further parametrisation.
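The two compactness measures can be reproduced on toy data with a short, dependency-free Python sketch. It follows the standard definitions of the BSS/TSS ratio and the mean Silhouette width; it is not the R code used to build the dataset, and the points, labels, and function names are hypothetical. The Silhouette function assumes at least two clusters and distinct points.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of equally sized tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def bss_tss_ratio(points, labels):
    """Share of total deviance (TSS) explained by the separation between clusters (BSS)."""
    overall = centroid(points)
    tss = sum(dist(p, overall) ** 2 for p in points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    bss = sum(len(m) * dist(centroid(m), overall) ** 2 for m in clusters.values())
    return bss / tss


def mean_silhouette(points, labels):
    """Average Silhouette width; singleton clusters score 0 by convention."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance to own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: two well-separated pairs of points.
POINTS = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
GOOD_LABELS = [0, 0, 1, 1]
BAD_LABELS = [0, 1, 0, 1]

print(bss_tss_ratio(POINTS, GOOD_LABELS), mean_silhouette(POINTS, GOOD_LABELS))
print(bss_tss_ratio(POINTS, BAD_LABELS), mean_silhouette(POINTS, BAD_LABELS))
```

Assigning every point to its own cluster, for example with labels `[0, 1, 2, 3]`, makes BSS equal to TSS. This mirrors the remark above that keeping all 96 agents would give a ratio of 1.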
Dataset of global geospatial cross-validation errors [12_spatial_cross_validation_errors.csv]
This dataset provides the results of the fourth validation process, in addition to the validation previously discussed. The error between the agent parametrisation values and the aggregated parameter at the regional level is provided in this dataset. Errors have been estimated for GDP, GDPpp, TE, SH, WH, Pop and HDI. Overall, the agent parametrisation approach yields a satisfactory global measure of error, as the error is minimal in most agents and regions.
Dataset of global region shapes [13_Regions_shapes.shp]
This dataset provides the MUSE-RASA regions used in this research in a geospatial format [.shp]. The 28 regions of the MUSE model are provided, which have been extensively documented in the literature14,32.
Technical Validation: Spatial Cross-Validation
Four validation processes have been conducted in this research to validate the Geospatial Agent-Based Modelling (G-ABM) Framework, including the characterisation of heterogeneity (clustering and subclustering), and the parametrisation of diversity. First, the G-ABM approach was validated by comparing the official values of the two selected countries with those estimated in this study, as published in Sachs, et al.9. Second, the quality of clustering performed using the spatial K-means algorithm on GDPpc, HD, and HDpc was assessed worldwide. Details of this validation of the spatial characterisation are provided in Moya, et al.28. Third, the subclustering goodness of the final spatial agents was measured using the Silhouette coefficient (Silhouette width), as can be seen in dataset No. 11. Finally, the error of the diversity parametrisation of each agent attribute was also calculated and provided in dataset No. 12. This was performed by comparing the aggregated agent results with the total regional values.
The global number of heating demand agents was reduced from 96 to 20 through the process of subclustering. Figure 16 illustrates the results of measuring the compactness of the agents after applying the K-means subclustering discussed in the Methodology, in the third validation process conducted in this research. The y-axis in Fig. 16 represents the percentage of well-grouped data, which is based on the decomposition of the total deviance (TSS) into the deviance between clusters (BSS) and the deviance within clusters (WSS). Ideally, the subclustering aims to create clusters that exhibit internal cohesion and external separation. Thus, a BSS/TSS ratio approaching 1 indicates a compact subclustering of agents31. Despite initially having 96 agents, a high percentage of well-grouped data indicates that the final 20 agents share similar members within each new cluster after applying the Elbow Method. In other words, if all 96 agents were kept as separate clusters without using the Elbow Method, the BSS/TSS ratio would be 1, achieving 100% compactness. In summary, the separate subclustering performed for each MUSE-RASA region resulted in a BSS/TSS ratio greater than 0.975, indicating that over 97% of the initial 96 agents were effectively grouped into 20 agents. This outcome is particularly significant for the subsequent stages of the research, as agent definition involves specific zones of GDPpc, HD, and HDpc (characterization) with a range of parameters (parametrization).
Figure 17 depicts the validation process for agent parametrization, which is the fourth validation procedure conducted in this research, alongside the previous spatial cross-validation processes. The figure compares the agent parametrization values from this research with the aggregated parameters at the regional level from data sources and shows the disparity between them. It can be observed that in certain regions (CHN, DNK, EU7, ISL, ISR, JPN, KOR, and ZAF), the error is less than 1%. However, for GDP in CAN the error can reach 10%, and for HDI in ATE it can reach 12%. Overall, the agent parametrization approach demonstrates an acceptable level of global error, as the majority of agents and regions exhibit minimal error.
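The comparison between aggregated agent values and regional totals can be expressed as a simple percentage error. The sketch below assumes a relative absolute error and, for per-capita or index variables, a population-weighted mean. The exact error metric and aggregation rule used for dataset No. 12 are not stated in this section, so both functions and all numbers are hypothetical.

```python
def aggregation_error_pct(aggregated_value, regional_reference):
    """Absolute gap between the agent-based aggregate and the regional reference, in percent."""
    return abs(aggregated_value - regional_reference) / abs(regional_reference) * 100.0


def aggregate_extensive(agent_values):
    """Extensive quantities (e.g. population, total energy) are summed over agents."""
    return sum(agent_values)


def aggregate_intensive(agent_values, agent_populations):
    """Intensive quantities (e.g. per-capita values) use a population-weighted mean (assumption)."""
    total_population = sum(agent_populations)
    weighted = sum(v * p for v, p in zip(agent_values, agent_populations))
    return weighted / total_population


# Hypothetical region with three agents.
populations = [40.0, 35.0, 24.0]  # agent populations, arbitrary units
per_capita = [1.0, 2.0, 4.0]      # an intensive attribute per agent

print(aggregation_error_pct(aggregate_extensive(populations), 100.0))
print(aggregation_error_pct(aggregate_intensive(per_capita, populations), 2.0))
```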
Usage Notes
The datasets provided in this study8 are of real importance for researchers exploring the combination of GIS with ABM where socioeconomics and energy demands are needed. The datasets are spatially resolved and temporally explicit, which serves to capture the spatiotemporal dimensions in global model simulations. A range of agents is systematically defined. It is suggested that these datasets be used as inputs in future research on the decarbonisation of the energy system when considering the human dimension.
Stakeholders of the sustainable transition of the climate-energy-economy system can benefit from these datasets in several ways. Decision-makers, policy-makers, firms, civil society, and researchers can identify four potential applications of these datasets in the context of assessing climate-energy-economy transition paths:
1. Agent budget limitations: the datasets presented here8 embed intra-regional differences among energy consumers of the residential sector. This has important implications in climate-energy-economy modelling for designing policies, capturing heterogeneities, diversities, evolution, decision-making and external drivers of energy and economic agents in the assessment.
2. Agents that drive the transition: this research has identified the main agents that will drive the climate-energy-economy transition globally. These agents are characterised and defined with a range of parameters, openly shared in this research8. Stakeholders can select specific agents that meet customised characteristics in order to reach defined goals, such as changing energy-use behaviour or adopting clean, highly efficient technologies.
3. Carbon tax schemes implementation: carbon tax schemes are hard to implement because of their regressive impact on poorer households. This research can contribute towards minimising or eliminating that impact. To accelerate the sustainable transition towards the NZE target by mid-century, this research helps policy-makers and implementers of carbon tax schemes to target agents that can afford such taxes or to develop financial assistance programs for those that cannot.
4. Research and development prioritisation based on heat density: institutions in charge of research, innovation and development of new solutions for a sustainable transition of the global climate-energy-economy system can also benefit from the results of this research8. Additional applications exist for consumers living in zones where district heating technologies are technically feasible because of the high energy density observed there.
Limitations and challenges are also identified in this research. The main limitation of this study is the validation of the decision-making process. Although four systematic spatial cross-validation processes have been conducted for the general framework for the spatial agent definition, there is a lack of data about agent decision-making processes with which to validate any agent investment objective used in future assessments. Informing the decision-making process specifically would require targeted surveys in the location, city, country, or region under study. Examples of surveys carried out for China and Ecuador, to collect primary data to characterise the decision-making process, are presented in this research8. However, conducting a representative survey for all the worldwide regions of this research would be a time- and resource-consuming task. National surveys would enrich the agent disaggregation analysis that this study has proposed. This could apply not only to the residential sector, but also to other sectors such as industry, transport, and agriculture. In this way, the research and datasets presented here can be applied to other sectors accordingly.
Code availability
The algorithms and formulas used in this study have been previously provided. This research used three free and open-source programming platforms: (1) R Statistical Software and Programming Language; (2) Quantum GIS (QGIS) software; and (3) Python software. A range of R packages for geospatial big data analytics used in this research are presented in Bivand33. QGIS is used for data exploration purposes because of its features for viewing, editing, and analysing geospatial data34. Python is the development environment for the MUSE model35. The MUSE-RASA model has been built by integrating the R-based geospatial RASA model with the Python-based MUSE model. The R code used to create the shape files in the RASA model is available upon request with proper justification from the corresponding author. The MUSE model is open-source code available in Giarola, et al.36. Due to sponsorship agreements, the authors are not allowed to make the RASA code publicly available.
References
Nikas, A., Doukas, H. & Papandreou, A. A detailed overview and consistent classification of climate-economy models. Understanding risks and uncertainties in energy and climate policy, 1–54 (2019).
Shaikh, A. Capitalism: Competition, conflict, crises. (Oxford University Press, 2016).
Simon, H. A Behavioral Model of Rational Choice. The Quarterly Journal of Economics 69, 99–118, https://doi.org/10.2307/1884852 (1955).
Simon, H. A. in Utility and Probability (eds Eatwell, J., Milgate, M. & Newman, P.) 15–18 (Palgrave Macmillan UK, 1990).
Petracca, E. Simulating Marx: Herbert A. Simon’s cognitivist approach to dialectical materialism. History of the Human Sciences, 09526951211031143 (2021).
Lavoie, M. Post-Keynesian economics: new foundations. (Edward Elgar Publishing, 2014).
Crooks, A., Malleson, N., Manley, E. & Heppenstall, A. Agent-based modelling and geographical information systems: a practical primer. (Sage, 2018).
Moya, D. et al. MUSE-RASA captures human dimension in climate-energy-economic models via global geospatial agent datasets using AI-ML, Figshare, https://doi.org/10.6084/m9.figshare.c.6630860.v1 (2023).
Sachs, J., Moya, D., Giarola, S. & Hawkes, A. Clustered spatially and temporally resolved global heat and cooling energy demand in the residential sector. Applied Energy 250, 48–62, https://doi.org/10.1016/j.apenergy.2019.05.011 (2019).
Kummu, M., Taka, M. & Guillaume, J. H. Gridded global datasets for gross domestic product and Human Development Index over 1990–2015. Scientific data 5, 180004 (2018).
Department for Business EIS. Lower and Middle Super Output Areas gas consumption 2010, https://www.gov.uk/government/statistics/lower-and-middle-super-output-areas-gas-consumption
ARCONEL. Estadística del Sector Eléctrico, https://www.regulacionelectrica.gob.ec/estadistica-del-sector-electrico/ (2021).
Riahi, K. et al. The shared socioeconomic pathways and their energy, land use, and greenhouse gas emissions implications: an overview. Global environmental change 42, 153–168 (2017).
Giarola, S., Sachs, J., d’Avezac, M., Kell, A. & Hawkes, A. MUSE: An open-source agent-based integrated assessment modelling framework. https://doi.org/10.21203/rs.3.rs-1450486/v1 (2022).
Giarola, S. et al. Challenges in the harmonisation of global integrated assessment models: A comprehensive methodology to reduce model response heterogeneity. Science of The Total Environment 783, 146861, https://doi.org/10.1016/j.scitotenv.2021.146861 (2021).
García Kerdan, I. et al. Modelling cost-effective pathways for natural gas infrastructure: A southern Brazil case study. Applied Energy 255, 113799, https://doi.org/10.1016/j.apenergy.2019.113799 (2019).
Moya, D., Budinis, S., Giarola, S. & Hawkes, A. Agent-based scenarios comparison for assessing fuel-switching investment in long-term energy transitions of the India’s industry sector. Applied Energy 274, 115295, https://doi.org/10.1016/j.apenergy.2020.115295 (2020).
CIESIN, C. f. I. E. S. I. N. C. U. (NASA Socioeconomic Data and Applications Center (SEDAC) Palisades, NY, 2016).
Moya, D. et al. Geospatial and temporal estimation of climatic, end-use demands, and socioeconomic drivers of energy consumption in the residential sector in Ecuador. Energy Conversion and Management 261, 115629 (2022).
O’Neill, B. C. et al. A new scenario framework for climate change research: the concept of shared socioeconomic pathways. Climatic change 122, 387–400 (2014).
Van Vuuren, D. P. et al. A new scenario framework for climate change research: scenario matrix architecture. Climatic Change 122, 373–386 (2014).
O’Neill, B. C. et al. The roads ahead: Narratives for shared socioeconomic pathways describing world futures in the 21st century. Global Environmental Change 42, 169–180, https://doi.org/10.1016/j.gloenvcha.2015.01.004 (2017).
Kc, S. & Lutz, W. The human core of the shared socioeconomic pathways: Population scenarios by age, sex and level of education for all countries to 2100. Global Environmental Change 42, 181–192, https://doi.org/10.1016/j.gloenvcha.2014.06.004 (2017).
Fricko, O. et al. The marker quantification of the Shared Socioeconomic Pathway 2: A middle-of-the-road scenario for the 21st century. Global Environmental Change 42, 251–267 (2017).
Grubler, A. et al. A low energy demand scenario for meeting the 1.5 C target and sustainable development goals without negative emission technologies. Nature energy 3, 515–527 (2018).
Sachs, J., Moya, D., Giarola, S. & Hawkes, A. Global spatially and temporally-resolved heat and cooling energy demand in the residential sector [in press]. Applied Energy (2019).
Schoon, I. & Heckhausen, J. Conceptualizing individual agency in the transition from school to work: A social-ecological developmental perspective. Adolescent Research Review 4, 135–148 (2019).
Moya, D., Giarola, S. & Hawkes, A. in 2021 IEEE International Conference on Big Data (Big Data). 4035–4046 (IEEE).
Moya, D., Copara, D., Amores, J., Muñoz Espinoza, M. & Pérez-Navarro, Á. Characterization of energy consumption agents in the residential sector of Ecuador based on a national survey and geographic information systems for modelling energy systems. Enfoque UTE https://doi.org/10.29019/enfoqueute.801 (2022).
Budinis, S. et al. Can Carbon Capture and Storage Unlock ‘Unburnable Carbon’? Energy Procedia 114, 7504–7515, https://doi.org/10.1016/j.egypro.2017.03.1883 (2017).
Rocci, R., Gattone, S. A. & Vichi, M. A new dimension reduction method: Factor discriminant k-means. Journal of classification 28, 210–226 (2011).
PARIS REINFORCE project. The ModUlar energy system Simulation Environment (MUSE), https://paris-reinforce.epu.ntua.gr/detailed_model_doc/muse (2021).
Bivand, R. CRAN Task View: Analysis of Spatial Data, https://cran.r-project.org/web/views/Spatial.html (2022).
Vitalis, S., Arroyo Ohori, K. & Stoter, J. CityJSON in QGIS: Development of an open‐source plugin. Transactions in GIS 24, 1147–1164 (2020).
Luh, S., Budinis, S., Giarola, S., Schmidt, T. J. & Hawkes, A. Long-term development of the industrial sector–case study about electrification, fuel switching, and CCS in the USA. Computers & Chemical Engineering 133, 106602 (2020).
Giarola, S., Sachs, J., d’Avezac, M., Kell, A. & Hawkes, A. MUSE: An open-source agent-based integrated assessment modelling framework. Energy Strategy Reviews 44, 100964 (2022).
Sachs, J., Meng, Y., Giarola, S. & Hawkes, A. An agent-based model for energy investment decisions in the residential sector. Energy 172, 752–768, https://doi.org/10.1016/j.energy.2019.01.161 (2019).
Kummu, M., Taka, M. & Guillaume, J. H. A. Gridded global datasets for Gross Domestic Product and Human Development Index over 1990–2015. Scientific Data 5, 180004, https://doi.org/10.1038/sdata.2018.4 (2018).
CIESIN, C. Gridded population of the world (GPW), v4. Available at sedac.ciesin.columbia.edu/data/collection/gpw-v4. Accessed May 5, 2016 (2005).
Stierli, M. Credit Suisse Global Wealth Databook 2014, https://www.credit-suisse.com/media/assets/corporate/docs/about-us/research/publications/global-wealth-databook-2014.pdf (2014).
Ürge-Vorsatz, D., Cabeza, L. F., Serrano, S., Barreneche, C. & Petrichenko, K. Heating and cooling energy trends and drivers in buildings. Renewable and Sustainable Energy Reviews 41, 85–98 (2015).
Acknowledgements
Diego Moya has been funded by the Ecuadorian Secretariat for Higher Education, Science, Technology, and Innovation (SENESCYT), Award No. CZ03-35-2017, and supported by The Science and Solutions for a Changing Planet Doctoral Training Partnership, Grantham Institute, the Department of Chemical Engineering’s Sustainable Gas Institute and The Sargent Centre for Process Systems Engineering at Imperial College London. The Technical University of Ambato (UTA), Award No. 1895-CU-P-2017 (Resolución HCU) is also acknowledged for its support. We also acknowledge EPSRC funding via NEUPA (EP/T023031/1) and IDLES (EP/R045518/1). The Institute for Applied Sustainability Research (IIASUR), a non-profit research organisation, supports international research on global sustainability applied to the Global South. Andrew Northern from Imperial College London’s Centre for Academic English (CfAE), and Orlando Sabogal from University College London are acknowledged for their valuable comments during the development of this manuscript. We acknowledge the important comments and suggestions made by the anonymous reviewers to improve the quality, clarity, and rigour of this article. This research was developed during the PhD studies of Dr. Moya at Imperial College London and in collaboration with the coauthors of this study. The editing, submission and revision of this article were carried out during Dr. Moya’s position in Aramco’s TSPD-TOS team. Dr. Moya acknowledges the support and endorsement of Dr. Ali Al-Dawood to submit this article. The views expressed in this paper do not necessarily reflect Saudi Aramco’s official policies and do not reveal confidential data. This paper has been written with the support of the Climate Compatible Growth Programme (#CCG) of the UK’s Foreign, Commonwealth & Development Office (FCDO). The views expressed in this paper do not necessarily reflect the UK government’s official policies.
Author information
Contributions
Diego Moya: Conceptualization, data curation, formal analysis, validation, investigation, methodology, visualisation, writing review and editing, funding acquisition. Dennis Copara: Data curation, visualisation, validation. Alexis Olivo: Data curation, visualisation, writing review and editing. Christian Castro: Formal analysis, investigation, validation. Sara Giarola: Funding acquisition, Investigation, methodology, writing review and editing. Adam Hawkes: Funding acquisition, methodology, writing review and editing, Supervision.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Moya, D., Copara, D., Olivo, A. et al. MUSE-RASA captures human dimension in climate-energy-economic models via global geoAI-ML agent datasets. Sci Data 10, 693 (2023). https://doi.org/10.1038/s41597-023-02529-w
DOI: https://doi.org/10.1038/s41597-023-02529-w