1 Introduction

Elementary flows are essential components of data used for life cycle assessment (LCA). They are used in life cycle inventory (LCI) models to represent use of raw resources in a process and emissions of pollutants and other materials into the environment. Life cycle impact assessment (LCIA) methods provide impact characterization factors for elementary flows to enable impact estimation. Various conventions exist for naming (nomenclature), categorizing, using, and storing elementary flows in LCA data, which causes inconsistencies in the use and implementation of elementary flows when LCI and LCIA data from multiple sources are combined. This is a problem both for human readability and use and for machine management of these data in LCA software and databases, both of which are critical to LCA data interoperability. In this study, we evaluate elementary flows from various data sources against a defined set of criteria to determine clarity, consistency, and interoperability in usage and management. From this review, we define common shortcomings in elementary flow nomenclature and data management and describe how broadly they apply to existing LCA data. We then provide initial recommendations for improvement in elementary flow naming and management, particularly to support interoperability and usage of elementary flows in LCA data for the envisioned Global Network of LCA databases (Canals et al. 2016).

1.1 Background

In general, a flow in life cycle inventory data refers to an input or output to a process. Flows may be of two broad types: elementary flows or intermediate (known as “technosphere”) flows according to ISO 14044 (ISO 14044 2006). Elementary flows may be defined as materials, energy, or space that are taken directly from the environment or released directly back into the environment. Elementary flows appear in LCIA method data, where flows are associated with characterization factors (units of impact per unit of flow) for estimating the impact of a given unit of a particular flow. The calculation of impact assessment results using data from an LCI (a fundamental calculation supporting LCA results) and factors from an LCIA method requires that the elementary flows in these sources correspond or match. LCA software packages often have their own native lists of elementary flows, and software providers generally ensure that the elementary flows in the various LCI and LCIA datasets available in the software match. However, a given software package may have a unique set of elementary flows that does not match any LCI or LCIA source.

Elementary flows generally need to have a minimum of three components to identify them, but may have more:

  1. The name of the material, energy, or space (e.g., “Carbon dioxide” or “freshwater”) that will enter or leave the technosphere. This is commonly called the “substance,” but that term is too limited, so the authors use the term flowable from the ECO LCA ontology (McBride and Norris 2010).

  2. The flow context, which is a set of categories typically describing the environmental context of the flow origin or destination (e.g., “to air”). The name compartment or category is often used for this component, but we use context to provide a broader meaning that includes the flow directionality (e.g., “resource” or “emission”). The categories can be tiered in one or sometimes up to four or five levels.

  3. A flow unit and its associated flow property (e.g., kg/mass). Flow units may be associated with conversion factors that can be used to convert between different units within a flow property (e.g., kg to lbs.) or even between flow properties (e.g., kg to m3).

Each of these individual flow components may be associated with more information, or metadata, in part dependent on what type of flow they are. For instance: flowables, if chemicals, may have a Chemical Abstracts Service number (CAS No.) and be associated with various other intrinsic properties. Other types of flows, like land occupation or raw energy inputs, may not have this additional information. Flows at a minimum should have a flowable, context and unit, and the unique combination of these components may be considered a unique flow, but whether or not it is unique is ultimately determined by the system in which it is used (e.g., LCA software).
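As an illustration only, the three minimum components could be held in a small record such as the Python sketch below. The class name `ElementaryFlow`, its field names, and the `key` method are hypothetical and are not taken from any LCA data format. The sketch assumes that identity depends on flowable, context, and unit only, so optional metadata such as a CAS No. does not change the key.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ElementaryFlow:
    """Hypothetical record for one elementary flow (not a schema from any LCA format)."""
    flowable: str                 # e.g., "Carbon dioxide"
    context: Tuple[str, ...]      # tiered categories, e.g., ("emission", "air")
    unit: str                     # e.g., "kg"
    flow_property: str = "mass"   # the property that the unit measures
    cas_no: Optional[str] = None  # optional metadata, mainly for chemicals

    def key(self) -> Tuple[str, Tuple[str, ...], str]:
        """Identity built from the three minimum components only."""
        return (self.flowable, self.context, self.unit)


a = ElementaryFlow("Carbon dioxide", ("emission", "air"), "kg", cas_no="124-38-9")
b = ElementaryFlow("Carbon dioxide", ("emission", "air"), "kg")
c = ElementaryFlow("Carbon dioxide", ("emission", "water"), "kg")

same_flow = a.key() == b.key()       # metadata differs, key does not
different_flow = a.key() != c.key()  # a different context gives a different flow
```

Whether two records with the same key really are the same flow is still decided by the system that stores them, as noted above; the key is only one possible convention.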

Use of a common nomenclature is often put forth as a systematic way to ensure elementary flow consistency. A nomenclature is a system for naming entities within a realm of knowledge (UP O 2016). Rules for the naming of flowables and contexts may be considered elementary flow nomenclatures. Ideally, if a common nomenclature were used by all LCA data sources, then names for flowables and contexts would be the same. However, flows from two sources with the same name and context nomenclatures may still have different units, ID numbers, or other differences in metadata. Additionally, there may be differences in interpretation of the nomenclature, resulting in differences in the names and contexts and in minor differences such as extra spaces or commas. Alternatively, information may be lost when flows are extracted from native software, which creates unintentional differences in implementation. Lack of harmonization in nomenclature and differences in how software and data providers implement IDs cause disconnects between flows. As an example, one dataset may contain a flow with the name “Nitrous oxide” while another may have a flow with the name “Dinitrogen oxide.” These datasets refer to the same chemical (N2O), but LCA software would interpret them as two independent entities. Furthermore, even “CO2” and “CO₂” (written with a subscript character) are identified as different entities by software tools.
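The matching problem can be made concrete with a short Python sketch. It is not the method used in this study. It assumes that names are compared after Unicode compatibility normalization (NFKC), whitespace collapsing, and case folding, and that a small, hypothetical synonym table (`SYNONYMS`) maps alternative names to one preferred name. The function names are likewise illustrative.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Fold compatibility characters (e.g., subscript digits), collapse whitespace, ignore case."""
    folded = unicodedata.normalize("NFKC", name)
    collapsed = re.sub(r"\s+", " ", folded).strip()
    return collapsed.casefold()


# Hypothetical synonym table; a real one would come from a curated vocabulary.
SYNONYMS = {"dinitrogen oxide": "nitrous oxide"}


def canonical_name(name: str) -> str:
    """Map a raw flowable name to its preferred normalized name."""
    key = normalize_name(name)
    return SYNONYMS.get(key, key)


raw_equal = "CO2" == "CO\u2082"  # plain digit vs. subscript digit
normalized_equal = normalize_name("CO2") == normalize_name("CO\u2082")
synonym_equal = canonical_name("Dinitrogen  oxide ") == canonical_name("Nitrous oxide")
```

A plain string comparison treats the two spellings as different, while the normalized comparison does not. Synonyms still require an explicit mapping, because no string operation can infer that two different names denote the same chemical.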

LCA data providers are currently not using a common list or system of elementary flows. An early activity within the UNEP-SETAC Life Cycle Initiative was the creation of a recommended list of flow exchanges by the Data Availability and Data Quality Workgroup (de Beaufort-Langeveld et al. 2003). Desiring to preserve the autonomy of the user, the Data Availability and Data Quality Workgroup opted to provide a list of parameters with their preferred nomenclature. However, as LCA data have continued to evolve and the number of suppliers has grown and diversified, the number of flows has increased rapidly, and flows are created and managed independently by the various data providers.

ISO 14048 provides limited guidance on the creation of elementary flow nomenclature (ISO 14048 2002). Based on section 7.1, a nomenclature can be one of three types: exclusive, inclusive, or user-defined. Exclusive nomenclature cannot be expanded by users as only specific terms are valid. ISO 14048 requires exclusive nomenclature for the directionality and receiving environment (compartment) for flows. Inclusive nomenclature may be expanded by the user when necessary for a specific application. ISO 14048 recommends that further receiving environment specification information be an inclusive nomenclature. User-defined nomenclature may be adapted as the user sees fit. The UNEP-SETAC recommended list of parameters can be viewed as a user-defined nomenclature with guidelines (de Beaufort-Langeveld et al. 2003).

Recently developed LCA data formats, including ILCD (Wolf et al. 2011) and ecoSpold 2 (Weidema et al. 2013; Hischier and Weidema 2009), use an identification number called a universally unique identifier (UUID) to identify unique flows. Other unique identifiers such as integer numbers are possible and used in some LCA data sources and software tools. These identifiers are commonly used in LCA software to link flows in a flow list with those that occur in process exchanges and in impact methods to enable LCA calculations.
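The role of identifiers in linking inventory flows to impact factors can be sketched as follows. This is a minimal illustration, assuming that an inventory and an LCIA method are both simple mappings keyed by flow UUID; the function name `characterize`, the amounts, and the factors are arbitrary illustrative values, not data from any source reviewed here.

```python
import uuid


def characterize(inventory, factors):
    """Sum amount * factor over flows whose identifiers appear in both mappings."""
    total = 0.0
    unmatched = []
    for flow_id, amount in inventory.items():
        if flow_id in factors:
            total += amount * factors[flow_id]
        else:
            # No matching identifier: the flow silently drops out of the result.
            unmatched.append(flow_id)
    return total, unmatched


flow_a, flow_b, orphan = uuid.uuid4(), uuid.uuid4(), uuid.uuid4()
inventory = {flow_a: 2.0, flow_b: 0.5, orphan: 1.0}  # amount per flow (illustrative)
factors = {flow_a: 1.0, flow_b: 3.0}                 # impact per unit of flow (illustrative)

total, unmatched = characterize(inventory, factors)
```

The flow without a matching identifier contributes nothing to the total, which is why mismatched flow lists can lead to incomplete impact results without any visible error.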

1.2 Purpose and approach

As described above, elementary flows in all LCI and LCIA sources used in a model must correspond, or match, in order to build a functional LCA model. If this is considered in the context of using data from various sources with different elementary flow lists, there is a problem of interoperability of LCA data (Ingwersen 2015). Interoperability of LCA data is a core concern of a recently formed initiative to create a Global Network of LCA databases (GLAD) (Canals et al. 2016). The purpose of this study is to provide a baseline characterization of elementary flows in commonly used LCA datasets in the form of a critical review and to make initial recommendations for how they can be created and managed more effectively to make LCA data more robust and consistent.

Via the nomenclature working group of GLAD, a voluntary team of experts was assembled to gather LCA data sources and perform this review. The team assembled elementary flow lists from LCA data sources and developed a set of criteria for evaluation of the elementary flows in these sources. The results of the evaluation are discussed to extract initial recommendations for best practices in elementary flow creation and management.

2 Methods

The following sections describe the data collection procedures, typology of flows used for analysis, and criteria for flow evaluation.

2.1 Data collection

An attempt was made to gather elementary flows from LCI, LCIA methods, and software sources representing at least three world regions. World regions were determined based on the UNEP/SETAC regional networks with the addition of North America, since this work is in collaboration with North American partners (UNEP-SETAC Life Cycle Initiative 2016). The 12 sources used are: CML v4.5 released in April 2015, accessed on January 22, 2016 (CML); CPM, automatically generated by the CPM LCA Database-SPINE to ILCD format conversion functionality on February 19, 2013 (CPM); Ecoinvent version 3.2 (Wernet et al. 2016); GaBi version SP29 last updated on January 01, 2016 (thinkstep GaBi); IDEA, from implementation of the IDEA database in openLCA in August 2015 (AIST and JEMAI); ILCD and ELCD 3.2, accessed on October 22, 2015 (JRC 2006); OpenLCA version 1.5.0 beta 1 publicly released on March 3, 2016 (Greendelta); ProBas, released on February 12, 2015 (Federal Environment Agency - Germany); ReCiPe version 1.11 released in December 2014, accessed on January 22, 2016 (ReCiPe); SimaPro version 8.05.13, accessed in December 2015 (Pre-Sustainability); TRACI 2.1 (US EPA); and US LCI, accessed in January 2015 (US LCI et al.). The sources used were either publicly available or shared with the project team by other participants in GLAD. All flows were collected in a common template designed to capture the flow and the flow metadata to support analysis. The goal was to capture all available data and metadata for these elementary flows; therefore, additional fields were added to the template when present in one of the sources.

Flow metadata are defined as information critical to identification of the flow that is not included in the flow name, such as flow source, flow UUID, flow context, etc. Table 1 lists the flow data and metadata information that were collected from each source for analysis. Clarifiers are metadata that link the flowable to descriptive terminology. CAS No. and formulas are viewed as metadata linking to an externally defined taxonomy (e.g., CAS No., chemical formula, CORINE Land Use (EEA 1995)), while synonyms link the flowable to either a formal or informal vocabulary that is not always clearly defined. Flow context information is collected in up to three different fields. Not all sources used all or any of the context fields. The context fields are used to collect two types of information: the directionality, which indicates whether a flow is an input (resource) or output (emission), and the environmental compartment (e.g., air, soil, water).

Table 1 Flow data and metadata fields collected for analysis

2.2 Typology

The SETAC Workgroup on Data Availability and Data Quality classified flows based on types (e.g., chemical substances, energy, etc.), providing specific recommendations for flow names based on this classification (Hischier et al. 2003). A similar approach for developing nomenclature has been explained in the Methodology and Overview: Data quality guideline for the ecoinvent database version 3 (Weidema et al. 2013) and the ILCD handbook (EC JRC IES 2010). Edelen and Ingwersen proposed an elementary flow categorization method based on nine types, as shown in Table 2. A modified typology based on the proposed method by Edelen and Ingwersen along with definitions (as shown in Table 2) was created for this evaluation (Edelen and Ingwersen 2015). One of these types was assigned to each flow collected to support type-specific flow evaluation.

Table 2 Elementary flow typology

Flow type definitions include directionality for each group as well as a definition and examples to improve clarity. For example, coal is a mineral of fossilized carbon in the form of a sedimentary rock. Coal is a “Fossil or Nuclear Fuels” input because it is being used as a fuel source, despite being a mineral.

2.3 Evaluation

Evaluation of flows was conducted from the perspective of usability and management. Criteria for each of these perspectives, defined in Table 3, were developed based on general principles from sources in the field of information and knowledge management (Abbas 2010; Gruber 1993; ISO 25964-2 2013; Nickerson et al. 2013; Pellini and Jones 2011). Here, usability is comprised of three principles: (1) clarity, (2) consistency, and (3) extensibility. Gruber defines clarity as the structured application of a naming convention and the use of clearly defined and linked terminology (Gruber 1993). Consistency is the uniform application of conventions within a source. Extensibility is the ability of the nomenclature to be applied to create new flows, while consistently applying a uniform naming convention. Principles deemed relevant for the management perspective of flows are translatability and uniqueness. Translatability is the ability of a nomenclature to be translated between different encoding systems. Uniqueness defines that all flows must have a means of unique identification within a database.

Table 3 Criteria for elementary flow evaluation

For each principle, one or more criteria were developed. Criteria were developed to be binary when possible, allowing for more objective evaluation and automated testing of sources. When automated testing was used, all flows were analyzed. For evaluation questions that were not binary (extensibility), manual evaluation of a subsample of flows was used. A subsample was used to keep the time commitment for manual testing feasible. Up to 50 flows from each source and type were used. The 50 flows were randomly selected using an automated method. In instances where fewer than 50 flows existed, all flows were used.
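The subsampling step might look like the following Python sketch. The exact automated method is not described in the text, so this is an assumption: flows are represented as dictionaries with hypothetical `source` and `type` keys, grouped by that pair, and sampled without replacement using a fixed seed for reproducibility.

```python
import random
from collections import defaultdict


def subsample(flows, n=50, seed=0):
    """Draw up to n flows at random from every (source, type) group."""
    groups = defaultdict(list)
    for flow in flows:
        groups[(flow["source"], flow["type"])].append(flow)

    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    sample = {}
    for key, members in groups.items():
        # Groups with fewer than n flows are used in full.
        sample[key] = list(members) if len(members) <= n else rng.sample(members, n)
    return sample


# Illustrative input: one large group and one small group.
flows = [{"source": "LCI 1", "type": "Energy", "name": f"flow {i}"} for i in range(120)]
flows += [{"source": "LCI 1", "type": "Land", "name": f"land {i}"} for i in range(10)]

sample = subsample(flows)
```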

The criteria are summarized in Table 3. To test clarity, flows were tested for directionality (resource, i.e., input flow, or emission, i.e., output flow); compartment information (an impact assessment compartment, e.g., water, air, soil, ground); the ability to determine whether the flow was an elementary flow; the presence of clarifiers (e.g., synonyms, CAS No., formula); and the inclusion of flow unit, flow property, or flow context within the flowable. Flows were classified as either resources or emissions based on information contained within the flow context (levels 1, 2, or 3). Elementary flows are defined as exchanges with the natural environment, either input flows from the natural environment to the technosphere or output flows from the technosphere to the natural environment. Therefore, elementary flow status was determined first by identifying whether the flow was a resource (input) or an emission (output), based on the metadata in the context fields. Flow context was then evaluated to determine whether a compartment (e.g., air, soil, water) was specified, for example a resource from water or an emission to air. Finally, the metadata in the context fields were used to determine whether resource flows come from the natural environment and go to the technosphere, and whether emission flows come from the technosphere and go to the natural environment. A full list of the context fields and how they were categorized (e.g., input/output/unknown and to/from technosphere or biosphere) can be found in Tables S6–S9 (Electronic Supplementary Material). The flowable was tested to see if it contained flow units, flow properties, or flow context information. All clarity analyses used automated testing of all sample flows.
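The decision logic described above can be summarized in a short Python sketch. It is a simplified reading of the text, not the script used in the study. It assumes that each flow has already been reduced to a direction (`"input"`, `"output"`, or missing) and a single compartment term, and the set of compartment terms treated as belonging to the biosphere (`BIOSPHERE`) is a hypothetical stand-in for the categorization in the supplementary tables.

```python
from typing import Optional

# Hypothetical compartment terms treated as part of the natural environment.
BIOSPHERE = {"air", "water", "soil", "ground", "biosphere"}


def classify(direction: Optional[str], compartment: Optional[str]) -> str:
    """Return 'elementary', 'non-elementary', or 'indeterminable' for one flow."""
    if direction not in {"input", "output"} or not compartment:
        return "indeterminable"      # missing direction or compartment
    term = compartment.strip().lower()
    if term == "technosphere":
        return "non-elementary"      # e.g., resource "from technosphere"
    if term in BIOSPHERE:
        return "elementary"
    return "indeterminable"          # e.g., "unspecified"


results = [
    classify("output", "air"),
    classify("input", "Technosphere"),
    classify(None, "air"),
    classify("output", "unspecified"),
]
```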

Consistency criteria were applied to test consistency within a source, or for some criteria within a type and a source. Consistency tests for consistent formatting (e.g., spacing and capitalization), redundancies in the flowable + context, and internal UUIDs. Formatting errors were analyzed, since additional or improper spacing and different capitalization can complicate automated matching of flows from one source to another and because current guidelines exist for capitalization of flowables (flow names). All consistency testing was automated.

Extensibility was tested by manually reviewing a subsample of flows within sources and types for a clearly defined naming pattern. Translatability was evaluated using a Python script to check flow names, compartments, synonyms, CAS numbers, descriptions, and formulas for characters that are unsupported by any of 85 different character encodings. Uniqueness was evaluated by source and by type as the use of unique identifiers for the flowable, flow unit, flow property, and flow context. Unique identifiers are used as an internal database management strategy to prevent non-unique flows.

3 Results

In total, more than 134,000 elementary flows were collected for analysis from 12 sources. The subsample defined consists of 3645 flows, or 2.7% of all flows. The subsample was only used for the extensibility criteria. Of the 12 sources, five are LCI sources, three are LCIA sources, and four are software sources. The sample consists of more than half—53.8%—LCI sources with about equal numbers of flows from LCIA and software sources (Table 4). Flows were collected from database sources representing North America (2), Europe (9), and Asian Pacific (1) geographic regions. The overwhelming majority of flows—88.9%—come from European sources. The skewed regional representativeness of the data sources reflects the predominance of European databases in the LCA data space. This article focuses on the causal issues of nomenclature interoperability and not on the practices of any one flow provider; therefore, all sources in the results will be referenced by their source type. All original data and evaluations are publicly available and can be found at https://catalog.data.gov/dataset/flow-list-and-test-results.

Table 4 Sample flows by source and type

The application of the typology revealed that 91.8% of all elementary flows collected are categorized as “Element and Compound” or “Group of Chemicals” (Table 4), while all other types range from 0.2 to 2.4%. Although the non-chemical types make up a much smaller percentage of the number of flows, there is little or no guidance on nomenclature for these types of flows. The number of flows varies significantly by source, from 0.3% from LCI 1 to 30.4% from LCI 2. To account for this significant variation all results are presented by flow type and by flow source.

3.1 Input or output?

Flows that are clearly defined in directionality, as either inputs or outputs, improve the clarity for users. The typology defines the “Biological,” “Energy,” “Fossil or Nuclear Fuels,” and “Mineral, Metal or Aggregate” as inputs, or resources, in Table 2. The types “Element and Compound” and “Group of Chemicals” are defined as outputs, or emissions. Tables S1, S2, and S3 (Electronic Supplementary Material) show how each context information provided by the different sources was organized as either being input, output, or unknown, respectively. Manual categorization of flows using the defined typology revealed flows did not all follow the typology directionality definitions, as shown in Fig. 1a. A small percentage of each of the input type flows were categorized as outputs, ranging from 0.8 to 7.6%, from “Mineral, Metal, or Aggregate” and “Biological” types, respectively. The output types, “Element and Compound” and “Group of Chemicals” also contained input flows, just at a smaller percentage, ranging from 0.1 to 0.7%. This higher rate of miscategorization by the non-chemical types could be related to the lesser standardization of nomenclature for these types in comparison to “Element and Compound.” For every type, some flows were missing context information making it impossible to label the directionality. The number of flows that exhibited no clear directionality varied significantly from 99% for LCIA 3 to <1% for multiple sources. A defined typology was used to allow for categorization of flows into types with similar properties. However, context information, which defines whether a flow is either an input or output, did not align with the typology definitions provided in Table 2.

Fig. 1 Flow clarity analysis results. a Results of analysis of whether flows are clearly an input or output, shown by flow type. b Presence of compartments, shown by flow source. c Results of analysis of whether flows are clearly an elementary flow, shown by flow type. d Presence of external identifiers or synonyms, shown by flow type. e Presence of flow metadata, shown by flow source. f Results of analysis of whether flows contain spacing and/or capitalization errors, shown by flow type

3.2 Flow compartment information

All flow context information was analyzed for inclusion of compartment information. Since compartments are imperative for proper impact assessment, the usage of compartments improves clarity. Tables S4, S5, and S6 (Electronic Supplementary Material) define the context information of compartments based on context level 1, context level 2, and context level 3 metadata, respectively. There is an overall high rate of usage of compartment information, 93.8%, with most sources containing compartments for >80% of flowables, as shown in Fig. 1b. The highest rates of usage of compartments in the context information were for the “Element or Compound” and “Group of Chemicals” types, both >90%. All but the “Energy” and “Water” types contained a compartment in >50% of flows. The low rate of compartment information in these two types can be attributed to the definition of the types. Energy is not necessarily viewed as a flowable that would flow from one of the most widely used compartments (i.e., water, air, soil), and water flowables imply a flow from a water compartment.

3.3 Is it truly an elementary flow?

Using the strict definition of elementary flows as exchanges with the biosphere, flows that would be from processes and intended for other processes are technosphere flows, but some are misidentified as elementary flows in the sources. Elementary flow determination was completed using the metadata in the context fields. Flows were analyzed for two types of information: input or output (e.g., emission or resource) and the compartment (e.g., technosphere, biosphere, soil, air, water, and ground). Flows were deemed indeterminable if input, output, or compartment information was missing, unknown (e.g., unspecified), or unclear, and were deemed non-elementary if the information clearly did not describe an elementary flow (e.g., resource “from technosphere,” emission “to technosphere”). Tables S4, S5, and S6 (Electronic Supplementary Material) define the context information that contains both input and output information and a compartment based on context level 1, context level 2, and context level 3 metadata, respectively. For a flow to be determined as an elementary flow, it was required to contain input context information with compartment information indicating it flows from the biosphere, or output context information with a compartment indicating it flows to the biosphere. The biggest issue within the flow lists was not the inclusion of large numbers of non-elementary flows; only 0.3% of all flows were non-elementary. The “Element or Compound” and “Group of Chemicals” types exhibited the lowest rates of non-elementary flows, with 0 and 0.2%, respectively. However, a large share of flows (44.3% of all flows) was classified as indeterminable, as shown in Fig. 1c. Indeterminable flows were missing input/output information, compartment information, or both. Many indeterminable flows, such as those from Software 1, contained partial information but did not contain both input/output information and a recognizable compartment. All three LCIA sources exhibited an extremely low number of flows clearly identifiable as elementary flows, with two sources, LCIA 1 and LCIA 3, being >99% indeterminable, mostly because of the low rate of compartment information in LCIA sources. Overall, the lack of compartment and input/output information shows that most flows are not clearly defined as elementary flows, per the criteria provided. LCIA sources were also significantly less likely to provide context information than any other type of source, showing a general need for LCIA sources and experts to be more engaged in the LCA community to ensure connectivity between LCI flowables and LCIA impacts.

3.4 Clarifiers

In the flow collection process, three fields (i.e., CAS No., chemical formula, and synonym) were identified as clarifiers, or containing information linking the flowable to a vocabulary. Of these, CAS No. and formulas (chemical formulas and CORINE) link to a formal externally defined taxonomy, while synonyms do not link to any formal definitions. The overall majority, 76.9% of all flows, use an externally linked clarifier, while only 32% use synonyms (see Fig. 1d). The overall tendency to rely on externally defined clarifiers improves the clarity. However, for most of the non-elemental flowables, the low rate of clarifiers can lead to redundancies with similar names or flowables that are confusing. Sources that link to an external taxonomy do so frequently; however, one-third of all sources do not use clarifiers at all.

Synonyms are used less often than CAS No. or formulas. The highest rate of usage for synonyms is with the “Element or Compound” type at 36.3%, while all other types use synonyms significantly less frequently, ranging from 0 to 6.0%. Synonyms can be a useful tool, especially when integrating flowables from different sources; however, the infrequent usage of synonyms outside of two sources, LCI 2 and Software 3, leaves little benefit to practitioners. Only European sources use synonyms.

3.5 Data/metadata in flowable

The inclusion of other information such as the flow unit, flow property, and context (e.g., emission or resource label and indication of compartment) in the flowable occurs with 2.8% of all flows. The “Element or Compound” typology exhibited the lowest rate of metadata in the flowable at 0.2%. Metadata are most often included in the name by the LCI 1 source, with 99.7% of all flows containing data, as shown in Fig. 1e. For most sources, the inclusion of metadata in the flowable does not seem to be a significant problem.

3.6 Formatting

Flows were tested for two types of formatting errors: improper spacing and improper capitalization. Spacing errors are defined as either a double space or no space after a comma. Spacing errors occurred at a much lower rate than capitalization errors, as shown in Fig. 1f. The greatest occurrence of spacing errors is within the “Element or Compound” type, with 52.6% of flows containing this error. The high occurrence of spacing errors for the “Element or Compound” type was due to many chemical names being written without a space after the comma (e.g., 1,1,2-tetrafluoride). Outside of the “Element or Compound” type, spacing errors were minimal, at a maximum of 1.5% occurrence within the “Fossil or Nuclear Fuels” type. Capitalization errors were much more significant, ranging from 1.1% of “Fossil or Nuclear Fuels” flows to 48.1% of “Element or Compound” flows. Therefore, spacing errors were considered insignificant in comparison to capitalization errors. Total presence of formatting errors in sources revealed that some sources had no detectable errors while others had errors for up to 98.8% of flows. LCIA sources are more likely to exhibit formatting errors.
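The spacing test as defined here (a double space, or a comma not followed by a space) can be expressed with two regular expressions. The Python sketch below is illustrative rather than the script used for the evaluation, and the function name `has_spacing_error` is hypothetical. The capitalization test is omitted because the text does not state the exact capitalization rule that was applied.

```python
import re

DOUBLE_SPACE = re.compile(r" {2,}")      # two or more consecutive spaces
COMMA_NO_SPACE = re.compile(r",(?=\S)")  # a comma directly followed by a non-space character


def has_spacing_error(flowable: str) -> bool:
    """Flag a flowable that contains a double space or a comma without a following space."""
    return bool(DOUBLE_SPACE.search(flowable) or COMMA_NO_SPACE.search(flowable))


checks = {
    "Ethane, 1,1,2-trichloro-": has_spacing_error("Ethane, 1,1,2-trichloro-"),
    "Carbon  dioxide": has_spacing_error("Carbon  dioxide"),
    "Water, fresh": has_spacing_error("Water, fresh"),
}
```

The first name is flagged only because of the commas inside the locant, which shows why this rule produces a high error rate for systematic chemical names in which locants are conventionally written without spaces.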

3.7 Redundancies

Flows were tested for redundancies in the combination of the flowable and the flow context, since flowables alone are not necessarily unique. Eleven percent of flows were redundant. The greatest numbers of redundancies, 66.1%, were found in the “Other” type, which could be linked to the vague definition of the “Other” category. The high rate of redundancy in the “Fossil or Nuclear Fuels” type is because source LCI 3, which has a redundancy rate of 90.1%, has a very high number of “Fossil or Nuclear Fuels” flows in comparison with all other sources. Redundancies overall do not seem to be a major issue, except in the LCI 3 source. A small percentage—up to 5.8% of each type of flow—is redundant because the same flowable and context are repeated, but with different units. Standardizing units could prevent these redundancies, since units can be converted, especially for the “Fossil or Nuclear Fuels” type. UUID redundancies were not a significant issue with none or <2% in all sources, except 6% in LCI 5 and a high redundancy of 44% in LCI 1, severely limiting the effectiveness of UUIDs as unique identifiers for that source.
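A redundancy test on the combination of flowable and context can be sketched with a counter, as below. This is one plausible reading of the test, not the implementation used in the study. It assumes that flows are dictionaries with hypothetical `flowable`, `context`, and `unit` keys, and that every flow sharing a repeated flowable and context pair counts as redundant.

```python
from collections import Counter


def redundant_keys(flows):
    """Return the (flowable, context) pairs that occur more than once."""
    counts = Counter((f["flowable"], f["context"]) for f in flows)
    return {key for key, n in counts.items() if n > 1}


def redundancy_rate(flows):
    """Share of flows whose (flowable, context) pair is repeated within the list."""
    if not flows:
        return 0.0
    repeated = redundant_keys(flows)
    flagged = sum(1 for f in flows if (f["flowable"], f["context"]) in repeated)
    return flagged / len(flows)


# Illustrative list: the same flowable and context recorded in two units.
flows = [
    {"flowable": "Coal, hard", "context": "resource/ground", "unit": "kg"},
    {"flowable": "Coal, hard", "context": "resource/ground", "unit": "MJ"},
    {"flowable": "Carbon dioxide", "context": "emission/air", "unit": "kg"},
    {"flowable": "Carbon dioxide", "context": "emission/water", "unit": "kg"},
]

rate = redundancy_rate(flows)
```

In this illustrative list, the two coal entries differ only in their unit, which is the kind of redundancy that unit standardization would remove.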

3.8 Pattern analysis

The extensibility of the flows was tested using pattern analysis. Flows were examined by source and type for a pattern within the flowable. Pattern analysis was a manual test to determine the pattern of descriptive information within a flowable. The nomenclature patterns for the LCI 6 source, type “Fossil or Nuclear Fuels,” are shown in Table 5.

Table 5 Example pattern analysis

In this example, three patterns were determined based on the different types of descriptive information included and the order of the information within the flowable name. When using a user-defined nomenclature, flowables may appear multiple times, just with varying levels of specificity. Ideally, any flow type should contain only one pattern.

The number of patterns derived for each source and flow type ranges from 1 to 15. Furthermore, while the type “Element or Compound” resulted in a low number of patterns (mean of 2) and “Fossil or Nuclear Fuels” and “Land” resulted in a higher number of patterns (mean of 7 and 9 patterns, respectively), these results do not correlate with the number of flows analyzed. For instance, in the former, all sources had 50 flows in the sample (the maximum considered), while the two latter had an average of 29 and 42 flows per source in the sample. This is evidence that certain types of elementary flows (e.g., “Element or Compound”) have more aligned use of nomenclature, regardless of the number of flows.

Meanwhile, across sources, no strong correlation was observed between the number of patterns and the number of flows considered in the sample. For example, LCI 4 and Software 1 had the highest number of flows considered (426 and 417 in total, respectively) and the highest number of patterns found (75 and 79 in total, respectively), while LCIA 1 had the lowest number of flows sampled (100) and the lowest number of patterns (5). However, this may be explained by the level of interoperability among and within the sources: LCIA 1 had only two types of flows (“Element or Compound” and “Group of Chemicals”), while the sources with a high number of patterns had flows for all types, i.e., LCI 2, LCI 4, LCI 5, Software 1, Software 2, and Software 3.

3.9 Translatability

The ability to move LCA data from one system to another without losing flow information requires flow translation. One pitfall in converting LCA data is the potential loss of information when changing from one encoding scheme to another if the schemes do not support the same characters. Unsupported characters were analyzed by source and by type. Only 1.1% of all flows contained unsupported characters, and 0.6% of flows contained unsupported characters other than the percent sign, indicating that translatability was not a significant issue. The percent sign itself is not a major concern, since it is unsupported by only three encoding types that are not commonly used. Flows typed as “Element or Compound” contain the largest variety of unsupported characters, with 19 different unsupported characters, while the “Energy” and “Biological” types have only one unsupported character each. The only unsupported character used in all types is the ä.
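
A check of this kind can be sketched in a few lines. The example below reports, for each target encoding, which characters of a flow name cannot be encoded. The encodings listed are illustrative assumptions and are not necessarily those evaluated in this study; the function name and the example flow name are hypothetical.

```python
def unsupported_characters(name, encodings=("ascii", "latin-1", "utf-8")):
    """Map each encoding to the set of characters in `name` it cannot represent."""
    result = {}
    for encoding in encodings:
        unsupported = set()
        for char in name:
            try:
                char.encode(encoding)
            except UnicodeEncodeError:
                unsupported.add(char)
        result[encoding] = unsupported
    return result


# Hypothetical German flow name ("waste heat").
report = unsupported_characters("Abwärme")
for encoding, chars in report.items():
    print(encoding, sorted(chars))
```

Running such a check over a whole flow list before conversion would flag the names that need a substitute character or a wider target encoding.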

3.10 Unique identifiers

The use of unique identifiers was analyzed for data fields (e.g., flowable, flow unit, and flow property) and for context field levels 1 and 2. No sources use UUIDs for the context level 3 field. Overall, flowable UUIDs are used around twice as often as unit or flow property UUIDs. Within a source, UUIDs were mostly used either 100% of the time or not at all. Only one source, Software 1, uses UUIDs for all data fields. Two sources use UUIDs for the flowable and flow unit, while LCI 1 and LCI 3 use UUIDs only for the flow unit and LCI 5 uses UUIDs only for the flowable. Only three sources use either context1 or context2 UUIDs. For LCI 1 and Software 1, less than 100% of flows contained context UUIDs because some types, regardless of source, do not use context UUIDs.
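
The share of flows carrying a UUID in a given field can be computed as sketched below. This is a minimal illustration, assuming each flow is available as a dictionary; the field names (`flowable_uuid`, `unit_uuid`) and the sample records are hypothetical and do not reflect the schema of any reviewed source.

```python
import uuid


def is_uuid(value) -> bool:
    """Return True if `value` parses as a UUID."""
    if not value:
        return False
    try:
        uuid.UUID(str(value))
        return True
    except ValueError:
        return False


def uuid_coverage(flows, fields):
    """Fraction of flows with a valid UUID, per field."""
    total = len(flows)
    return {
        field: (sum(is_uuid(flow.get(field)) for flow in flows) / total if total else 0.0)
        for field in fields
    }


# Hypothetical records for one source.
flows = [
    {"flowable_uuid": "123e4567-e89b-12d3-a456-426614174000", "unit_uuid": ""},
    {"flowable_uuid": "not-a-uuid", "unit_uuid": None},
]
print(uuid_coverage(flows, ["flowable_uuid", "unit_uuid"]))
```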

3.11 Example flow analysis

While complete analysis of the different flow lists reveals the data and metadata trends within the LCA community, this section focuses on a few select flows to highlight the differences and the interoperability issues of elementary flows. Table 6 consists of eight flows from various sources. Flowables (a) and (b) would be considered clear flows by the criteria used in the evaluation, since each of the flows contains directionality and compartment information and defined clarifiers are included in the metadata. However, even these seemingly clear flows contain variations in capitalization, user-defined nomenclature for the context information, and varying formatting for the CAS No. field, decreasing the machine readability and interoperability of these flows. Example flowables (c)–(h) are much less clear by the criteria in this evaluation. None of these flows is linked to any type of clarifier. In fact, the name of (e) suggests that it should be connected to some type of definition. However, since the definition or a clarifier is not provided with the flow, this flow name is unusable unless a user has prior knowledge of this naming convention. Flowables (c) and (d) lack clear directionality, decreasing the clarity of these flows. Flowables (c) and (f) are ambiguously named, preventing any automated characterization in an LCIA assessment. Flowable (g) is in fact not an elementary flow based on the flow context information, since it is a “Resource from Technosphere.” Flowable (i) contains duplicate metadata in the name and in the context information. This complex name increases the likelihood of formatting errors and redundancy of the flow. Overall, the lack of an exclusive nomenclature for the directionality and compartments of flows greatly decreases the interoperability of flows from one source to another, such as between flowable (h) and flowable (i).

Table 6 Example elementary flows analyzed
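
Variation in the formatting of the CAS No. field is one issue that can be reduced mechanically. The sketch below normalizes differently formatted CAS numbers (zero-padded, unhyphenated, or prefixed) to the hyphenated form and validates the check digit, which is the sum of the remaining digits weighted by their position from the right, modulo 10. The function name and the input variants are illustrative assumptions, not formats attributed to any reviewed source.

```python
import re


def normalize_cas(raw):
    """Return a CAS number as 'NNNN-NN-N', or None if it fails validation."""
    digits = re.sub(r"\D", "", raw or "").lstrip("0")
    # A CAS number has 2 to 7 digits, then 2 digits, then 1 check digit.
    if not 5 <= len(digits) <= 10:
        return None
    body, check = digits[:-1], int(digits[-1])
    # Weight each body digit by its position counted from the right (1, 2, 3, ...).
    total = sum(int(d) * i for i, d in enumerate(reversed(body), start=1))
    if total % 10 != check:
        return None
    return f"{body[:-2]}-{body[-2:]}-{check}"


for variant in ["124-38-9", "000124-38-9", "124389", "124-38-8"]:
    print(variant, "->", normalize_cas(variant))
```

Normalizing clarifiers in this way before comparison means that two flows differing only in CAS formatting are no longer treated as different flowables.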

4 Discussion

There has been little progress towards development of a common elementary flow list for LCA data. We speculate that this is in part because elementary flows themselves are more complex than has been assumed, resulting in the numbers of elementary flows used in data growing by orders of magnitude (from 10s to 10,000s in some sources), making flow lists more difficult to manage and less interoperable between sources. Furthermore, creating and maintaining data with a common elementary flow list requires significant resources, which have so far been spent on the creation of databases and software tools mostly by small teams of internal experts, and the different groups have different stakeholders with, in some cases, different requirements. For example, more detailed differentiation of individual flows increases data quality and accuracy, but often leads to additional work in data collection and LCIA method development. Not all users require the same level of detail or have the same level of resources. With this in mind, the high level of coordination needed between the different flow providers and users to develop and maintain a common flow list is likely to be too resource-intensive for a voluntary international body to operate, and a single common list is not likely to be acceptable or desirable to all stakeholders.

It is clear from our analysis that there are issues with existing flow lists in LCI, LCIA, and software sources that may compromise the integrity and reliability of their use by practitioners. There are fundamental problems with some flows that may make them challenging to interpret and apply. While there are practical reasons to convert what are formally technosphere flows into elementary flows in LCA data (for example, wastes intended for treatment, for total waste accounting in LCIA calculations), doing so blurs the definitions in the LCA data model, and other means of this type of accounting should be used. An elementary flow needs to have clear directionality (as an input or output), and this is not always clear from evaluating flow names or contexts. We found that fewer than 60% of flows have clear directionality. Directionality is now largely determined by interpretation of compartments; however, some sources do not provide clear contexts or compartments. Identifying the flowable in a flow is facilitated by the use of a clarifier, such as a CAS number or synonym. These clarifiers are provided by some sources for type “Element or Compound,” but frequently not for other types of flows, in some cases because they are not available. “Element or Compound” flows tend to exhibit higher levels of clarity and consistency and lower numbers of patterns. Connecting flow types with an inclusive nomenclature improves the interoperability of the flowables.

Other issues were found that demonstrated a lack of consistency within a source. Redundancy in flows can hamper both their use and management by bloating the number of flows in a list and increasing the likelihood of confusion by users. Simple errors in the syntax of flow naming, such as extra or missing characters, make manual and automated processing tasks with elementary flows more prone to error.

The use of metadata in flowables (flow names) can seem convenient but is sometimes overused to store information that would be more machine-accessible elsewhere, e.g., compartment information or flow properties. Not all flow lists allow for a systematic, detailed reporting of flow properties though, and compartments are not implemented in all detail in some lists as shown. Nevertheless, the many types of patterns observed in the pattern analysis demonstrate the need for a more detailed structure of flow metadata storage and the development of an inclusive nomenclature to capture specific flow context information. The current use of the flowable to report various metadata can be a hindrance in efforts towards consistent flow matching.

A high number of patterns was found for some flow types. The linking of flows within the “Element or Compound” type to a more clearly defined taxonomy, or inclusive nomenclature such as the CAS registry, helps maintain lower numbers of naming patterns. Further study is needed on how the non-chemical types can be linked to established inclusive or exclusive nomenclatures to minimize patterns within sources.

On the technical side, issues of translatability can seem simplistic, but they show that large practical gains in interoperability can be achieved with little effort. While flow lists can continue to operate in their existing encodings, their maintainers and users need to be aware of future interoperability issues with the growing number of non-English data sources. Translatability between the most common sources was not a problem in our evaluation. The issue of unsupported characters is more important when translating between languages. This study included a German-language source as an example of the unique challenges non-English sources face. To further support the growing diversity within the LCA community and support non-English-speaking countries developing LCA resources, standardizing alternatives for unsupported characters is important.

The findings of this review are not surprising given that no strict rules or guidance are provided for the creation and management of elementary flows in the ISO standards or other international guidance documents (e.g., “Shonan” Guidance Principles). Some database providers and format creators have played a leading role in advancing the description of elementary flows and their components (e.g., ILCD (European Commission 2010), ecoinvent (Wernet et al. 2016)), which has led to more robust flow lists. However, in an emerging global context where data need to become more interoperable between sources, these results show that the many shortcomings and discrepancies will pose challenges for wider use: one regarding interoperability and another regarding data integrity.

The Global LCA Data Access (GLAD) initiative has determined that a fundamental level of LCA data interoperability is flow interoperability. Generally, this implies that flows must be mapped, or translated, from one source to another. There is an ongoing effort to map elementary flows of different LCI databases as a deliverable of the Nomenclature Working Group, concurrent with this critical review. The clarity, consistency, and, of course, translatability of flows will affect the mapping and translation process. The clearer and more concise flows are, the easier it will be to translate them from one flow source to another. If no appropriate match is found, extensibility of the target flow list will be important to be able to create new flows in the target list that are consistent with other flows in that list.
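
To make the mapping step concrete, the sketch below matches flows between two lists using a clarifier (here a CAS number) together with the flow context, and collects the flows for which no match exists. It is a simplified illustration under stated assumptions, not the procedure of the Nomenclature Working Group: the dictionary keys, function names, and example flows are all hypothetical, and real mappings would also need to handle units, synonyms, and flows without clarifiers.

```python
def flow_key(flow):
    """Build a comparison key from a clarifier and the flow context."""
    if not flow.get("cas"):
        return None  # no clarifier: do not guess a match from the name alone
    return (flow["cas"], flow["direction"].lower(), flow["compartment"].lower())


def map_flows(source_flows, target_flows):
    """Return (mapping of source name -> target name, list of unmatched source names)."""
    index = {}
    for flow in target_flows:
        key = flow_key(flow)
        if key is not None:
            index[key] = flow["name"]
    mapped, unmatched = {}, []
    for flow in source_flows:
        key = flow_key(flow)
        if key is not None and key in index:
            mapped[flow["name"]] = index[key]
        else:
            unmatched.append(flow["name"])
    return mapped, unmatched


# Hypothetical flows from two hypothetical sources.
source = [
    {"name": "CO2", "cas": "124-38-9", "direction": "Emission", "compartment": "Air"},
    {"name": "Occupation, forest", "cas": None, "direction": "Resource", "compartment": "Land"},
]
target = [
    {"name": "Carbon dioxide", "cas": "124-38-9", "direction": "emission", "compartment": "air"},
]
mapped, unmatched = map_flows(source, target)
print(mapped, unmatched)
```

The unmatched list corresponds to the case where new flows would need to be created in the target list, which is where its extensibility matters.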

LCI databases often provide LCIA results for their data for the convenience of their users, which means that they regularly have to match LCIA flow lists to their own. Furthermore, software providers often have to translate flows from LCI and LCIA sources to match their own flow lists. They may partially rely on existing matchings provided by LCI sources but, depending on their business strategy, may match both LCI sources and LCIA methods again independently. A study by Lesage et al. (2015) revealed that this can result in loss of data integrity due at least in part to misinterpretation of elementary flows, but also due to inconsistencies in version update timings between different flow sources. This critical review further reveals why this is likely the case, given the many issues with flow clarity and context that could make the flow translation process difficult.

5 Recommendations and conclusions

It is incumbent upon all owners/maintainers of current flow lists to address some of the shortcomings found through this critical review. While few flow lists were originally developed with interoperability in mind, more efforts to prepare for data exchange between different systems will likely be appreciated by the system stakeholders. Making improvements to their own flow lists will not only help their current users but also facilitate interoperability for LCA data networks and increase the overall integrity of LCA data, which will in turn increase the robustness of LCA results. Some recommendations made on the basis of this work are:

  • Apply typology to increase flow clarity.

  • Increase usage of clarifiers (e.g., synonyms, formulas, and CAS No.), especially those linked to exclusive or inclusive nomenclature.

  • Define an exclusive nomenclature for context directionality and compartment (e.g., context 1: emission/resource; context 2: compartment, such as air, water, or soil), as required by ISO 14048 (ISO 14048 2002).

  • Define an inclusive nomenclature for detailed context information (e.g., context 3: detailed compartment information) as recommended by ISO 14048 (ISO 14048 2002).

  • Develop guidance on the proper format and usage of context information.

  • Enforce guidelines for capitalization rules.

  • Establish guidelines for avoiding encoding errors OR establish a common encoding that supports all special characters.

  • Develop guidelines to ensure that metadata (e.g., flow context, flow units, flow property) are captured in the metadata fields and NOT in the flowable name.

  • Use unique identifiers for flowable and flow context information.

  • Set standard units for types such as “Energy” and “Fossil or Nuclear Fuels” to avoid redundancies due to varying units.

  • Define explicit nomenclatures for flowables by type that are inclusive.

  • Improve collaboration across organizations of different source types, especially between the LCI and LCIA communities, to improve interoperability between inventories and impact assessments.
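
Several of these recommendations can be illustrated together in a single record structure. The sketch below is one possible arrangement, offered under stated assumptions rather than as a proposed standard: directionality and compartment are restricted to closed sets of values (an exclusive nomenclature), detailed context and clarifiers sit in their own fields rather than in the flowable name, and each flow carries a unique identifier. All class names, field names, and permitted values are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Direction(Enum):
    """Context 1: closed set of permitted values (illustrative)."""
    EMISSION = "emission"
    RESOURCE = "resource"


class Compartment(Enum):
    """Context 2: closed set of permitted values (illustrative)."""
    AIR = "air"
    WATER = "water"
    SOIL = "soil"


@dataclass(frozen=True)
class ElementaryFlow:
    flowable: str                           # name only, no embedded metadata
    direction: Direction                    # context 1
    compartment: Compartment                # context 2
    unit: str
    sub_compartment: Optional[str] = None   # context 3, detailed context
    cas: Optional[str] = None               # clarifier
    flow_uuid: uuid.UUID = field(default_factory=uuid.uuid4)


flow = ElementaryFlow(
    flowable="Carbon dioxide",
    direction=Direction("emission"),
    compartment=Compartment("air"),
    unit="kg",
    cas="124-38-9",
)
print(flow.flowable, flow.direction.value, flow.compartment.value, flow.flow_uuid)
```

Because the context fields accept only enumerated values, a variant such as "Emission" or "to air" is rejected when the record is built instead of silently creating a near-duplicate flow.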

Recommendations that are more explicit are planned as another output of the Nomenclature Working Group of GLAD.