Introduction

Unconventional natural gas has been heralded as an abundant source of clean energy in North America (Rivard et al. 2014). Unconventional natural gas differs from conventional sources in that the permeability of the gas-bearing formation is too low to allow gas to flow freely in time frames suitable for economic extraction (Myers 2012; Vidic et al. 2013). Hydraulic fracturing (HF), combined with horizontal drilling techniques, is known as the main and most effective approach for inducing the release of gas, increasing gas flowability, and improving gas recovery (Kargbo et al. 2010; Richard 2010). Hydraulic fracturing involves pumping large volumes of fracturing fluid, which consists of water, proppants (typically quartz sand), and various chemical additives, under high pressure into a perforated wellbore to initiate and expand fractures within the adjacent geological formation (Vidic et al. 2013).

The extensive use of HF has triggered significant public concerns over the risks to human health and the environment (Jenner and Lamadrid 2013; Boudet et al. 2014; Soeder et al. 2014; Vengosh et al. 2014). One of the major concerns involves the additives used in fracturing fluids, which could potentially contaminate ground and surface water supplies and pose health risks to surrounding aquatic ecosystems and water resource users (Stringfellow et al. 2014; Vengosh et al. 2014; Kahrilas et al. 2015). Various additives are used in HF operations for different downhole purposes, such as inhibiting the growth of microorganisms, facilitating the transportation of proppants, and preventing pipeline corrosion (Stringfellow et al. 2014; Hurley et al. 2016). Some additives comprise ingredients that have been confirmed as highly persistent in the environment, extremely toxic for aquatic organisms, and having adverse effects on human health (Chen and Carter 2017; Elliott et al. 2017; Luek and Gonsior 2017; Stringfellow et al. 2017; Hu et al. 2018a, b). These additives can potentially contaminate soils and water resources through accidental chemical spills, inappropriate disposal of HF flowback water, and leakage due to well integrity failure (Soeder et al. 2014; Vengosh et al. 2014; Akob et al. 2016; Burton et al. 2016; Kassotis et al. 2016, 2017; Cozzarelli et al. 2017; Orem et al. 2017; Stringfellow et al. 2017).

Encouraged by industry-level directives, the unconventional gas industry is transitioning toward more sustainable chemical use in HF operations for safety reasons and public confidence (Jordan et al. 2010). To facilitate this transition, methods that can effectively measure additives’ environmental and human health (EHH) hazards and produce comparable results are required. Various indexing methods have been developed accordingly to convert multi-dimensional chemical hazard information into numerical indicators and aggregate the indicators into a final index reflective of the amalgamated chemical hazards (Jordan et al. 2010; Hepburn 2012; Verslycke et al. 2014; Hurley et al. 2016). Based on the distribution of the resulting chemical hazard indices, qualitative chemical hazard guidelines (e.g., terms such as “low,” “medium,” and “high” hazard) can be established to assist HF operators in selecting additives with a low EHH effect for more sustainable unconventional gas extraction (Hurley et al. 2016).

Although the indexing methods are capable of condensing chemical hazard information into a single metric that is easy to understand and interpret (Sadiq et al. 2010), they cannot adequately handle uncertainties introduced by the hazard assessment. Like many other environmental guidelines, hazard interpretation relying on hazard guidelines is inherently associated with fuzzy uncertainties. At the cognitive level, fuzzy uncertainties originate from the vagueness and ambiguity of human thoughts and can result in different hazard perceptions for the same qualitative hazard guidelines (Li et al. 2007, 2016). Fuzzy-based techniques are useful tools for handling the uncertainties associated with evaluation guidelines and criteria (Li et al. 2016). For example, the combined use of a chemical hazard indexing system and a fuzzy inference system can help reduce the fuzzy uncertainties resulting from the subjective interpretation of HF chemical hazard indices and generate more reasonable chemical hazard assessment results as compared to the individual chemical hazard indexing method (Hu et al. 2018a).

Another limitation of chemical hazard indexing methods is that they cannot incorporate other important information, such as the use frequency of additives and the certainty of the measured chemical hazard, into the final index for EHH risk assessment. In this study, the term EHH risk is used to describe the possibility estimated for an additive to cause significant adverse consequences to EHH. Risk assessment is the estimation of the frequency and physical consequences of undesirable events (Sadiq et al. 2007a). During HF, additives can cause harm (i.e., consequences) to EHH via chemical spills and the unintended release of fracturing fluids (Rozell and Reaven 2012), while the use frequency of an additive can represent its relative exposure frequency (Hu et al. 2018a). From a risk assessment perspective, if an additive is associated with a high EHH hazard, but it has not been frequently used in HF operations, then the EHH risk of this additive should not be considered high. Moreover, the certainty of the measured EHH hazard, depending on the availability of chemical hazard information, can also affect the additive’s EHH risk implication. If an additive has frequently been used in HF operations, but some of its chemical hazard information is lacking, then the EHH risk of the additive is still difficult to determine.

Although the non-commensurate information of risk factors, such as chemical hazard, use status, and hazard certainty, can be converted into numerical indices using indexing approaches, interpreting these risk factors separately could result in a skewed assessment and lead to inappropriate chemical selection. In our previous study, a fuzzy-based indexing approach was developed to assess the EHH risks of HF additives based on the measured chemical hazard and use frequency (Hu et al. 2018a); however, the certainty of the measured chemical hazard was not considered because all the required chemical hazard information was available for the additives assessed. In real-world HF operations, chemical hazard information is not always completely available due to partially disclosed chemical compositions and/or chemical toxicity data gaps (Hu et al. 2018b). The lack of chemical hazard information could result in different levels of data uncertainty with the measured EHH hazard, which can affect the overall EHH risk implication. Therefore, it is important to develop a new approach that can measure the EHH risk of an additive based on the holistic assessment of its hazard nature, use status, and the certainty level of the measured hazard.

Clustering analysis is an effective data mining technique to divide data points in a large dataset into different groups in such a way that the degree of association between two data points is maximum if they belong to the same group (Čokorilo et al. 2014). Data points are objectively grouped into different clusters based on similar characteristics defined by the multi-dimensional clustering criteria. Thus, clustering analysis is suitable for grouping data points based on the holistic evaluation of multi-dimensional information. This method has been widely used in risk assessments to help understand the risk profiles of objects based on relevant risk factors and identify the objects of high risk concern (Sánchez-Pérez et al. 2004; Iliadis et al. 2010; Čokorilo et al. 2014; Iwańkowicz and Rosochacki 2014; Noiva et al. 2016). This type of analysis can lead to more efficient resource allocation for risk management (Iliadis et al. 2010).

The main objective of this study is to develop a novel methodology by combining fuzzy clustering analysis (FCA) and chemical hazard indexing for EHH risk assessment of HF additives under data uncertainties. The EHH safety index (SI), use frequency index (UFI), and data availability index (DAI) of representative HF additives used in British Columbia (BC), Canada, were obtained from an indexing system. These indices were subjected to the FCA to extract information about the similarities among different additives. The additives were grouped into clusters with different EHH risk implications based on their similarities. The FCA can help identify the additives of high EHH risks that require particular attention in HF operations. The developed methodology can facilitate the transitioning toward more sustainable chemical use in the unconventional gas industry.

Methodology

Framework

The assessment of HF additives was carried out through the framework shown in Fig. 1. The assessment process begins with additive data acquisition. The chemical composition (i.e., ingredient species and mass percentages) and use frequency of additives were identified, and the chemical toxicity data of the identified ingredients were collected. The collected data were converted into three types of indices (i.e., SI, DAI, and UFI) using an indexing system. The additives were then subjected to the FCA based on the values of the three types of indices. As shown in Fig. 1, n additives can be grouped into k clusters using the FCA, and the generated clusters are characterized by different value ranges of the three types of indices. The EHH risks of the generated clusters can be ranked by interpreting the index ranges. The FCA can also measure the degrees of membership (DOM, or μ) of each additive to the generated clusters. The DOM of an additive to different clusters can be converted into a risk index (RI) using a quality-weighted aggregation approach. The EHH risks of additives can be compared according to the calculated RIs.

Fig. 1

Environmental and human health (EHH) risk assessment framework for hydraulic fracturing (HF) additives using a fuzzy clustering analysis approach

Additive use frequency indexing

Additive use data for HF operations in BC from November 2011 to August 2014 were collected from the FracFocus Chemical Disclosure Registry database (FracFocus 2014). The additives were identified by their registered trade names, and the component ingredients of each additive were identified by their Chemical Abstracts Service Registry Numbers (CASRNs). The additives were divided into 13 functional categories, such as friction reducers, iron control agents, and gelling agents, according to their engineered purposes. The downhole purposes of different functional categories can be found in Hu et al. (2018b). Within each category, the representative additives were selected for the assessment according to their use frequencies. The usage of an additive was determined by counting the number of unique instances in which an additive’s trade name, a specific well number, and operation date appeared in the database. The UFI was calculated as the use percentage value of an additive within its functional category. Additives with a UFI < 1 were not considered representative and were not included in the assessment. The detailed information about the selected additives, including their code names, component ingredients, and functional categories, is listed in Appendix I in ESM.
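The counting and percentage steps described above can be illustrated with a minimal sketch. The record layout (category, trade name, well identifier, operation date) and the sample records are hypothetical and are not taken from the FracFocus database; the sketch also assumes that the use percentage is computed against all unique use instances within a functional category.

```python
from collections import defaultdict

def use_frequency_index(records):
    """Return {(category, trade_name): UFI in percent}.

    Each record is a (category, trade_name, well_id, operation_date) tuple;
    this field layout is assumed for illustration only.
    """
    unique_uses = set(records)  # repeated entries of one use instance count once
    per_additive = defaultdict(int)
    per_category = defaultdict(int)
    for category, trade_name, _well, _date in unique_uses:
        per_additive[(category, trade_name)] += 1
        per_category[category] += 1
    return {
        key: 100.0 * count / per_category[key[0]]
        for key, count in per_additive.items()
    }

def representative(ufi, threshold=1.0):
    """Drop additives whose UFI falls below the threshold (UFI < 1 in the text)."""
    return {key: value for key, value in ufi.items() if value >= threshold}

# Hypothetical disclosure records
records = [
    ("friction reducer", "FR-A", "W1", "2012-01-05"),
    ("friction reducer", "FR-A", "W1", "2012-01-05"),  # duplicate entry
    ("friction reducer", "FR-A", "W2", "2012-02-10"),
    ("friction reducer", "FR-B", "W3", "2012-03-15"),
    ("biocide", "BIO-A", "W1", "2012-01-05"),
]
ufi = use_frequency_index(records)
selected = representative(ufi)
```

In this toy input, FR-A accounts for two of the three unique friction reducer uses, so its UFI is about 66.7, and all three additives pass the representativeness threshold.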

Additive hazard indexing

The EHH hazards of the selected additives were measured using an indexing system. The indexing system was modified based on the Hydraulic Fracturing Fluid Greenness Assessment System (HyFFGAS) developed by Hurley et al. (2016). The modified system uses various authoritative chemical toxicity databases in addition to the Material Safety Data Sheets used by the HyFFGAS. Also, the modified system employs the DAI as an indicator of the certainty of the measured EHH hazard (Hu et al. 2018b).

The EHH hazard of an additive is determined by the chemical toxicity and mass percentages of its component ingredients (Hu et al. 2018b). The derived chemical toxicity data were analyzed against several hazard categories (Appendix II in ESM). Human health categories, such as carcinogenicity, were used to develop a human health safety score, and environmental categories, such as aquatic toxicity, were used to generate an environmental health safety score for each ingredient. The hazard categories were selected in light of the Globally Harmonized System of Classification and Labelling of Chemicals (GHS) (United Nations 2015). The two safety scores were assigned equal mathematical weights (0.5) and aggregated to generate an ingredient safety index. The ingredients’ safety indices were further aggregated to generate an SI for the additive according to the ingredients’ normalized weight percentages.

The resultant SI reflects the EHH hazard of an additive on a scale from 0 to 10, where 0 represents the lowest EHH safety (i.e., the highest hazard) and 10 represents the highest EHH safety (i.e., the lowest hazard). In case of missing chemical toxicity data, both the highest and the lowest hazardous scenarios for the data-missing hazard category were considered. Hence, the SIs of additives with missing toxicity data are presented in a range of the minimum and maximum values, representing the highest and lowest possible EHH hazards. The chemical toxicity data of ingredients and the calculated SIs of additives are also listed in Appendix I in ESM. The mean value of the minimum and maximum SIs of each additive was used for the FCA.
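The aggregation steps can be sketched as follows. This is a simplified illustration and not the indexing system itself: the safety scores are taken as given inputs on the 0 to 10 scale, and a missing score is bounded by 0 (worst case) and 10 (best case) at the score level, which is an assumption made here for brevity. All ingredient values are hypothetical.

```python
def ingredient_index(human_score, environmental_score):
    """Equal-weight (0.5 each) aggregation of the two safety scores."""
    return 0.5 * human_score + 0.5 * environmental_score

def additive_si(ingredients):
    """Aggregate ingredient indices by normalized mass percentage.

    `ingredients` is a list of (mass_percent, human_score, environmental_score).
    """
    total = sum(mass for mass, _h, _e in ingredients)
    return sum(mass / total * ingredient_index(h, e) for mass, h, e in ingredients)

def si_range(worst_case, best_case):
    """Return (minimum SI, maximum SI, mean SI used for clustering)."""
    si_min = additive_si(worst_case)
    si_max = additive_si(best_case)
    return si_min, si_max, (si_min + si_max) / 2.0

# Hypothetical additive: ingredient B has no human health data, so its
# human health score is set to 0 (worst case) and 10 (best case) in turn.
worst = [(30.0, 8.0, 6.0), (10.0, 0.0, 4.0)]
best = [(30.0, 8.0, 6.0), (10.0, 10.0, 4.0)]
si_min, si_max, si_mean = si_range(worst, best)
```

For this input the disclosed mass percentages (30 and 10) are normalized to weights of 0.75 and 0.25, giving an SI range of 5.75 to 7.0 and a mean of 6.375.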

Hazard certainty indexing

Missing ingredient information and toxicity data could result in different levels of uncertainty with the measured additive hazard. A DAI was calculated according to Eq. (1) for each additive to assess the extent of chemical toxicity data available for supporting the hazard indexing result. The DAI calculation and data certainty evaluation were established in light of an HF chemical hazard screening system developed by Intrinsik (2013).

$${\text{DAI}} = 100 - \frac{{100 \cdot N_{m} }}{{N_{i} \cdot N_{t} }}$$
(1)

where Nm is the total number of data-missing hazard categories, Ni is the total number of ingredients, and Nt is the total number of hazard categories. The resulting DAI is presented in a range from 0 to 100, where a higher DAI indicates a higher certainty of the measured EHH hazard. The distributions of the three types of indices were evaluated by kernel density estimation (KDE), which is a nonparametric method for estimating the probability density function of a random variable.
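Eq. (1) can be checked with a short sketch. It assumes that Nm counts data-missing hazard categories summed over all ingredients of the additive, which is the reading implied by the denominator Ni · Nt. The number of hazard categories used in the example (8) is hypothetical.

```python
def data_availability_index(n_missing, n_ingredients, n_categories):
    """Eq. (1): DAI = 100 - 100 * Nm / (Ni * Nt)."""
    if n_ingredients <= 0 or n_categories <= 0:
        raise ValueError("ingredient and category counts must be positive")
    if not 0 <= n_missing <= n_ingredients * n_categories:
        raise ValueError("missing count must lie between 0 and Ni * Nt")
    return 100.0 - 100.0 * n_missing / (n_ingredients * n_categories)

# Hypothetical additive: 3 ingredients, 8 hazard categories, 6 missing entries
dai = data_availability_index(n_missing=6, n_ingredients=3, n_categories=8)
```

Here 6 of the 24 ingredient-category entries are missing, so the DAI is 75. An additive with no ingredient information at all gives a DAI of 0, and a fully documented additive gives 100.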

Fuzzy clustering analysis

Prior to the FCA, the calculated UFI, SI, and DAI were normalized to a scale from 0 to 1. FCA assigns each data point in a dataset a DOM to every formed cluster, indicating that each data point belongs to different clusters with different levels of association. The DOM is a unique feature that distinguishes FCA from crisp clustering algorithms, such as k-means clustering and hierarchical clustering. This feature offers FCA great flexibility in risk assessment because it is capable of addressing fuzzy uncertainties (i.e., the concept of partial truth) and is suitable for grouping data points with weakly defined boundaries (Iliadis et al. 2010). A widely accepted fuzzy clustering algorithm is fuzzy c-means. For a dataset \(X = (x_{1} ,x_{2} , \ldots ,x_{n} )\) comprising \(n\) data points, the fuzzy c-means algorithm classifies the data points into p predefined clusters based on measured similarities among the data points. Each cluster has a center ej (j ∈ [1, p]), and the Euclidean distance dij between a data point xi and ej can be calculated as:

$$d_{ij} = \left\| {x_{i} - e_{j} } \right\|$$
(2)

In this study, each additive was considered as a data point defined by three-dimensional values (a, b, c) (i.e., the normalized SI, UFI, and DAI). The Euclidean distance between data points xi (ai, bi, ci) and ej (aj, bj, cj) in a three-dimensional space was calculated as:

$$d_{ij} = \sqrt {(a_{i} - a_{j} )^{2} + (b_{i} - b_{j} )^{2} + (c_{i} - c_{j} )^{2} }$$
(3)
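As a brief numerical illustration of the normalization and of Eq. (3), the sketch below rescales each index by its scale bounds and computes the distance between an additive and a cluster center. Dividing by the scale bounds (0 to 10 for the SI, 0 to 100 for the UFI and DAI) is an assumption; the text states only that the indices were normalized to a scale from 0 to 1. The index values are hypothetical.

```python
import math

# Assumed scale bounds for each index
BOUNDS = {"SI": (0.0, 10.0), "UFI": (0.0, 100.0), "DAI": (0.0, 100.0)}

def normalize(value, lower, upper):
    """Rescale a value linearly from [lower, upper] to [0, 1]."""
    return (value - lower) / (upper - lower)

def to_point(si, ufi, dai):
    """Map raw indices to a normalized data point (a, b, c)."""
    return (
        normalize(si, *BOUNDS["SI"]),
        normalize(ufi, *BOUNDS["UFI"]),
        normalize(dai, *BOUNDS["DAI"]),
    )

def distance(x, e):
    """Eq. (3): Euclidean distance between a data point and a cluster center."""
    return math.sqrt(sum((xi - ei) ** 2 for xi, ei in zip(x, e)))

additive = to_point(si=6.0, ufi=20.0, dai=75.0)  # hypothetical additive
center = to_point(si=3.0, ufi=60.0, dai=75.0)    # hypothetical cluster center
d = distance(additive, center)
```

The two points differ by 0.3 in the normalized SI and by 0.4 in the normalized UFI, so the distance is approximately 0.5.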

At the beginning of fuzzy c-means, random centers (usually with a value of zero) are selected for the clusters. Based on the derived dij, a DOM μj(xi) can be calculated as a measure of the similarity between a data point xi and the jth cluster:

$$\mu_{j} (x_{i} ) = \frac{{(1/d_{ij} )^{2/(\zeta - 1)} }}{{\sum\limits_{k = 1}^{p} {(1/d_{ik} )^{2/(\zeta - 1)} } }}$$
(4)

where ζ is a fuzzification parameter to determine the degree of fuzziness between different clusters. A higher value of ζ will lead to higher fuzziness between clusters. Commonly, ζ takes values between 1.25 and 2 (Cox 2005; Sadiq et al. 2007b). In this study, ζ value was set at 2. The parameter dik is the Euclidean distance between xi and the center of the kth cluster. The new centers of clusters can be calculated as:

$$e_{j}^{{\prime }} = \frac{{\sum\limits_{i} {[\mu_{j} (x_{i} )]^{\zeta } x_{i} } }}{{\sum\limits_{i} {[\mu_{j} (x_{i} )]^{\zeta } } }}$$
(5)

Based on the new \(e_{j}^{\prime}\), \(\mu_{j}(x_{i})\) will be updated. The iteration will continue until the minimum objective function J is achieved:

$$J = \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{p} {[\mu_{j} (x_{i} )]^{\zeta } d_{ij}^{2} } } ;\,\,\,\,\,p \le n$$
(6)
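The iteration defined by Eqs. (4) to (6) can be sketched in plain Python as follows. This is an illustrative implementation and not the R routine used in the study. The initialization (sampling p data points as starting centers), the stopping tolerance, the handling of a point that coincides with a center, and the sample data are all assumptions made for this sketch.

```python
import math
import random

def memberships(points, centers, zeta=2.0):
    """Eq. (4): degree of membership of every point to every cluster."""
    power = 2.0 / (zeta - 1.0)
    result = []
    for x in points:
        dists = [math.dist(x, e) for e in centers]
        if 0.0 in dists:
            # Assumed convention: a point lying on a center belongs fully to it
            hit = dists.index(0.0)
            row = [1.0 if j == hit else 0.0 for j in range(len(centers))]
        else:
            inverse = [(1.0 / d) ** power for d in dists]
            total = sum(inverse)
            row = [value / total for value in inverse]
        result.append(row)
    return result

def update_centers(points, mu, zeta=2.0):
    """Eq. (5): membership-weighted mean of the data points."""
    dims = len(points[0])
    centers = []
    for j in range(len(mu[0])):
        weights = [row[j] ** zeta for row in mu]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dims)
        ))
    return centers

def objective(points, centers, mu, zeta=2.0):
    """Eq. (6): objective function J."""
    return sum(
        mu[i][j] ** zeta * math.dist(x, e) ** 2
        for i, x in enumerate(points)
        for j, e in enumerate(centers)
    )

def fuzzy_c_means(points, p, zeta=2.0, tol=1e-9, max_iter=300,
                  initial_centers=None, seed=0):
    """Alternate Eqs. (4) and (5) until J changes by less than tol."""
    if initial_centers is None:
        # Assumed initialization: p data points drawn at random
        initial_centers = random.Random(seed).sample(points, p)
    centers = list(initial_centers)
    mu = memberships(points, centers, zeta)
    previous = objective(points, centers, mu, zeta)
    for _ in range(max_iter):
        centers = update_centers(points, mu, zeta)
        mu = memberships(points, centers, zeta)
        current = objective(points, centers, mu, zeta)
        if abs(previous - current) < tol:
            break
        previous = current
    return centers, mu

# Hypothetical normalized (SI, UFI, DAI) points forming two loose groups
points = [
    (0.10, 0.10, 0.90), (0.15, 0.20, 0.85), (0.20, 0.10, 0.95),
    (0.80, 0.90, 0.20), (0.90, 0.85, 0.10), (0.85, 0.80, 0.15),
]
centers, mu = fuzzy_c_means(points, p=2)
```

Each row of `mu` holds the DOMs of one data point and sums to 1, so a point near the boundary between two clusters receives comparable memberships to both instead of a single hard label.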

A total of seven clusters were formed for the EHH risk assessment. The FCA process was carried out using the statistical computing software R™ (version 1.0.136). The clustering results were interpreted using violin plots, which are box plots combined with a rotated kernel density plot on each side (Noiva et al. 2016). Based on the value ranges of the three types of indices, the EHH risks of clusters can be interpreted and ranked. After the FCA, seven DOMs (µ1, µ2, µ3, µ4, µ5, µ6, and µ7) can be generated for each additive to show the degrees of similarity between the EHH risk characteristics of the additive and the seven clusters. An RI can be calculated for each additive based on the DOMs using Eq. (7):

$$RI = \sum\limits_{j = 1}^{p} {\mu_{j} } \cdot w_{j}$$
(7)

where wj is the specific quality-ordered weight of cluster j, and p is the total number of clusters. The values of wj were determined by the EHH risk characteristics of different clusters. For example, wj values (0.15, 0.30, 0.45, 0.60, 0.75, 0.80, and 1.00) can be assigned to the seven clusters ordered from the lowest to the highest EHH risk, respectively. The values of wj were assigned subjectively, and they can be modified to accommodate different needs of risk assessors (Lu et al. 1999). Assigning a larger value of wj to the cluster of a higher EHH risk represents a risk-averse attitude of the assessor, while assigning a larger value of wj to the cluster of a lower EHH risk represents a pro-risk attitude of the assessor. If equal weights are assigned to wj, it implies that the assessor is indifferent and represents a compromising or normative attitude (Sadiq et al. 2007b). In this study, a risk-averse attitude is preferred for EHH risk assessment. The calculated RI can be used for comparing the EHH risks of different additives. A higher RI represents a higher EHH risk because the additive has a stronger affiliation with the clusters identified as having high EHH risks.
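Eq. (7) can be illustrated with the example weights given above. The DOM values below are hypothetical, and the sketch assumes that the DOMs are listed in the same order as the weights, that is, by cluster risk rank from lowest to highest and not by cluster number.

```python
def risk_index(dom, weights):
    """Eq. (7): RI as the weighted sum of degrees of membership."""
    if len(dom) != len(weights):
        raise ValueError("one weight is required per cluster")
    return sum(mu * w for mu, w in zip(dom, weights))

# Example weights from the text, ordered from lowest to highest EHH risk
weights = [0.15, 0.30, 0.45, 0.60, 0.75, 0.80, 1.00]

# Hypothetical DOMs of one additive, in the same risk order (they sum to 1)
dom = [0.02, 0.03, 0.05, 0.05, 0.10, 0.15, 0.60]
ri = risk_index(dom, weights)
```

The result is approximately 0.86. Because the DOMs of an additive sum to 1, the RI under these weights always lies between 0.15 (full membership in the lowest-risk cluster) and 1.00 (full membership in the highest-risk cluster).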

Application of FCA in additive selection

Six hypothetical fracturing fluids (F1 to F6) were designed to demonstrate the application of the FCA in additive selection. As shown in Table 1, six fracturing fluids were designed to contain additives from the same functional categories at the same concentrations. The concentrations of additives in different functional categories were designed according to the compiled data from shale fracturing operations (ALL Consulting 2012). The additives contained in fluid F1 were randomly selected from the clusters identified with low EHH risks, while the additives grouped into high EHH risk clusters were randomly selected to produce fluid F3. Fluid F2 consists of additives grouped into the clusters characterized by moderate EHH risks. In comparison, fluids F4, F5, and F6 were designed solely based on the SIs of additives. Fluids F4, F5, and F6 were made of additives with the lowest, medium, and highest SIs within each functional category, respectively. The SI, UFI, DAI, and RI of a fracturing fluid were calculated by aggregating the respective indices of the component additives. A weighted sum method was used for the aggregation, in which the normalized concentration of an additive was used as the weight for that additive (Hu et al. 2018b).
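The weighted sum aggregation for a fracturing fluid can be sketched as below. The concentrations and RI values are hypothetical and do not correspond to any fluid in Table 1; the same function applies to the SI, UFI, and DAI.

```python
def fluid_index(additives):
    """Concentration-weighted sum of an index over the additives in a fluid.

    `additives` is a list of (concentration, index_value) pairs; the
    concentrations are normalized so that the weights sum to 1.
    """
    total = sum(concentration for concentration, _value in additives)
    return sum(concentration / total * value for concentration, value in additives)

# Hypothetical fluid: three additives with their concentrations (%) and RIs
fluid = [(0.08, 0.30), (0.01, 0.80), (0.01, 0.50)]
fluid_ri = fluid_index(fluid)
```

The normalized weights are 0.8, 0.1, and 0.1, so the fluid RI is approximately 0.37. The additive present at the highest concentration dominates the fluid-level index.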

Table 1 Compositions of six hypothetical fracturing fluids

Results and discussion

Additive data overview

A total of 105 representative additives were selected for the risk assessment. According to the disclosed chemical information, these additives consist of 97 known ingredients. It is important to know the EHH hazards of ingredients since they are the basic elements of additives. The EHH hazard information of these ingredients is shown in Fig. 2. It can be seen in Fig. 2a that about 30% of the assessed ingredients are inherently or not readily/rapidly biodegradable in the environment, which might cause long-lasting effects on EHH. Bioaccumulation potential (Fig. 2b) does not seem to be a significant concern because only 6% of the ingredients were identified as having high bioaccumulation potential. As shown in Fig. 2c, the major environmental health hazard identified is that about 10 and 17% of ingredients are GHS Categories 1 and 2 aquatic toxicants (i.e., the EC50 value, representing the concentration of a toxicant at which 50% of the maximum effect is produced, is lower than 1 and 10 mg/L for aquatic organisms, respectively). Additionally, 22% of the ingredients have confirmed or suspected adverse human health effects (Fig. 2d), including carcinogenic, mutagenic, and reproductive toxicities. It was also found that the lack of chemical toxicity data is significant for some hazard categories, such as bioaccumulation potential, germ cell mutagenicity, and reproductive toxicity, as the hazard information of these categories was identified as unknown for > 50% of ingredients. The chemical toxicity data gaps resulted in different levels of uncertainty with the measured additives’ EHH hazards.

Fig. 2

Percentages of ingredients with different environmental and human health (EHH) hazards: a biodegradation, b bioaccumulation potential, c environmental and human toxicity, and d human health effect

The KDE results are shown in Fig. 3. The results indicate that the SIs of additives are most frequently found between 5.5 and 6 (Fig. 3a). This index range indicates that the representative additives are generally of moderate EHH hazards (SI ∈ [5, 8)) according to the hazard scales defined in the indexing system (Hurley et al. 2016). The probability density distribution of DAIs (Fig. 3b) shows that the DAIs are most likely to be found between 50 and 90, indicating medium to high levels of certainty (i.e., DAI ∈ [50, 75] and (75, 100], respectively) with the hazard indexing results (Intrinsik 2013). Although the chemical toxicity data gaps are significant for several hazard categories, the additive hazard indexing results can still provide relatively sufficient EHH hazard information for the EHH risk assessment. Moreover, the KDE of UFIs (Fig. 3c) suggests that most additives account for < 20% of the total usage within their functional category. The relatively low use frequency of individual additives suggests that very few selected additives have been preferentially used over other alternatives within the same functional category. The KDE results characterize the additives as having moderate EHH hazards, relatively high hazard certainty, and low use frequencies. A holistic evaluation of the EHH risks of additives based on the three risk factors calls for the FCA.

Fig. 3

Kernel density estimation (KDE) of a safety indices (SI), b data availability indices (DAI), and c use frequency indices (UFI) of additives
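For readers unfamiliar with KDE, the following minimal sketch shows how a density curve of this kind can be estimated. The Gaussian kernel, the normal-reference bandwidth rule, and the SI values are assumptions made for illustration; the kernel and bandwidth used to produce Fig. 3 are not stated in the text.

```python
import math
import statistics

def gaussian_kde(samples, bandwidth=None):
    """Return a density function estimated with a Gaussian kernel."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed bandwidth choice)
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical SI values on the 0 to 10 scale
si_values = [5.2, 5.6, 5.8, 5.9, 6.1, 7.0]
density = gaussian_kde(si_values)

# Evaluate the density on a grid and locate its highest point
grid = [i * 0.01 for i in range(1001)]
mode = max(grid, key=density)
```

The location of the peak (`mode`) corresponds to the index range in which values are most frequently found, which is how the ranges quoted for Fig. 3 are read from the curves.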

Fuzzy clustering analysis results

Determination of cluster numbers

Prior to the FCA, the optimum number of clusters was determined by evaluating the decrease in within-group sum of squares (SS) as a result of increasing the number of clusters. The fuzzy c-means algorithm calculates the center of each cluster iteratively and stops when the minimum of the objective function J (Eq. 6) is achieved, which can be reflected by the insignificant change of within-cluster (or within-group) SS in two successive iterations (Cox 2005). Lower within-group SS indicates lower within-group dispersion of data points, but higher between-groups dispersion (Tibshirani et al. 2001). Normally, the optimal number of clusters is selected as the number beyond which adding another cluster no longer produces an obvious decrease in within-group SS (Cox 2005). As Fig. 4 shows, the within-group SS decreased significantly up to seven clusters, and the decrease became insignificant afterward. Therefore, seven clusters were determined to be suitable for classifying additives in this study.
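The selection rule can be sketched as follows. Two assumptions are made: the within-group SS is computed from hard labels (for example, each additive assigned to the cluster with its highest DOM), and "insignificant" is expressed as a relative-drop threshold, whereas the study judged the curve in Fig. 4 visually. The SS values in the example are hypothetical and are not read from Fig. 4.

```python
def within_group_ss(points, labels):
    """Sum of squared distances of points to the centroid of their group."""
    groups = {}
    for x, label in zip(points, labels):
        groups.setdefault(label, []).append(x)
    total = 0.0
    for members in groups.values():
        centroid = [sum(column) / len(members) for column in zip(*members)]
        total += sum(
            (value - c) ** 2 for x in members for value, c in zip(x, centroid)
        )
    return total

def elbow(ss_by_k, min_relative_drop=0.05):
    """Return the smallest k after which one more cluster lowers the SS
    by less than `min_relative_drop` of the SS at the smallest k."""
    ks = sorted(ss_by_k)
    reference = ss_by_k[ks[0]]
    for k, k_next in zip(ks, ks[1:]):
        if (ss_by_k[k] - ss_by_k[k_next]) / reference < min_relative_drop:
            return k
    return ks[-1]

# Hypothetical within-group SS for 1 to 9 clusters
ss_curve = {1: 30.0, 2: 20.0, 3: 14.0, 4: 10.0, 5: 7.5,
            6: 5.5, 7: 3.5, 8: 3.2, 9: 3.0}
best_k = elbow(ss_curve)
```

With this illustrative curve, the drop from seven to eight clusters is the first to fall below the threshold, so seven clusters are selected. A different threshold can shift the result, which is why visual inspection of the curve remains a useful check.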

Fig. 4

Variation of within-group sum of squares (SS) as cluster number increases

Index levels of clusters

The three-dimensional scatter plots of additives in seven clusters are shown in Fig. 5. It can be seen that the additives in clusters 6 and 3 have similarly high SIs and DAIs, but their UFIs are different. Both clusters 7 and 1 comprise additives of relatively low SIs, indicating high inherent EHH hazards. The difference in UFIs between clusters 7 and 1 is noticeable. Thus, the additives within the two clusters would have different EHH risk implications. Cluster 4 has the lowest number of additives characterized by high UFIs, while additives in clusters 2 and 5 are generally associated with low UFIs. The boundaries between different clusters are difficult to identify in the scatter plot due to the fuzzy nature of FCA. Thus, violin–box plots are used to interpret the index ranges of different clusters.

Fig. 5

Scatter plots of additives in different clusters

As shown in Fig. 6, the characteristics of the seven clusters are easier to interpret in the violin–box plots than in the scatter plot. The outer shape of a violin–box plot represents all possible results, while the width of the plot indicates the distribution frequency of the results. A wider area in a violin plot indicates that the results are more frequently observed within the value range defined by the area. According to the quartile breaks as well as the minimum and maximum values of each cluster, the values of the three risk factors were classified into five different levels (Table 2). As shown in Table 3, a predominant level is defined for each of the three factors for each cluster. The EHH risk characteristics of each cluster can be interpreted based on the combination of the three factors at different levels. For instance, the common EHH risk characteristics of additives in cluster 1 can be described as medium-level SIs, medium-level UFIs, and high-level DAIs, corresponding to medium-level inherent EHH hazards, medium-level usage, and low-level hazard uncertainty, respectively.

Fig. 6

Violin–box plots of normalized a safety indices (SI), b use frequency indices (UFI), and c data availability indices (DAI) of additives in different clusters (black dots represent outlier data points)

Table 2 Levels of three risk factors corresponding to different index ranges
Table 3 Characteristics of different clusters based on the levels of indices

EHH risk interpretation of clusters

The severity of harm depends on the inherent EHH hazards of the spilled additives, which can be reflected by the calculated SIs. Nevertheless, the exposure frequencies to the additives’ EHH hazards are difficult to measure since they vary significantly for each HF operation, depending on many other risk factors, such as well integrity conditions, HF operation procedure, fracturing fluid formulas, locations and geological characteristics of HF sites, and exposed population (Rozell and Reaven 2012; Gandhi et al. 2018). Quantitative risk assessments are only possible when a detailed historical inventory of the undesired events that have occurred in the study area is available (Ferrari et al. 2016). Documentation of chemical contamination or spills, particularly related to fracturing fluid spills, is commonly absent at most sites (Gandhi et al. 2018). Thus, establishing a universally applicable model to quantify the actual exposure to the EHH hazards of different additives can be a challenge, and a qualitative risk assessment may be more feasible and informative (Rozell and Reaven 2012).

In this study, the hazard exposure frequency was evaluated according to the historical usage of additives. From a risk assessment perspective, a higher UFI suggests a larger chance for the additive to be used in a fracturing fluid, implying a higher possibility of being involved in an unintended release. Hence, a high UFI implies a high frequency of hazard exposure, given that information on other risk factors affecting the exposure frequency is lacking. If a cluster is characterized by low SIs and high UFIs, then this cluster can be considered to be a high EHH risk cluster due to the high inherent hazard and high exposure frequency. Another important factor, the DAI, does not directly reflect the EHH risk of an additive; however, it indicates the certainty level of the measured EHH hazard. For instance, if two additives show similar low SIs (i.e., high hazards) and high UFIs, then the additive with the higher DAI will be assigned the higher risk due to the higher certainty of the high EHH risk. Conversely, if two additives show similar high SIs (i.e., low hazards) and low UFIs, then the additive with the higher DAI will be assigned the lower EHH risk due to the higher certainty of the low EHH risk.

Based on their risk characteristics, the clusters were ranked from the highest EHH risk to the lowest as cluster 7, cluster 2, cluster 4, cluster 1, cluster 5, cluster 3, and cluster 6. Many additives in cluster 7 have low-level SIs, indicating that they are associated with high EHH hazards. The relatively high EHH hazards, combined with the medium-level UFIs and DAIs, suggest that cluster 7 has a high EHH risk. The additives in cluster 2 show a distribution of UFIs and a median SI value similar to those of cluster 7 additives, suggesting that cluster 2 also has a high EHH risk. However, cluster 2 shows a lower-level DAI than cluster 7, indicating that the high EHH risk of cluster 2 is associated with higher uncertainty. Hence, cluster 2 was assigned the second highest EHH risk, following cluster 7. In fact, the additives in cluster 2 have the lowest DAIs among all the clusters, indicating that the chemical toxicity data gap is more significant for cluster 2 additives than for the additives in other clusters.

Although the SIs of the additives in clusters 1, 4, and 5 are similarly distributed around the medium level, the EHH risk of cluster 4 was ranked third due to the high UFIs of the additives. Moreover, the additives in cluster 1 show higher levels (i.e., higher median value) of UFI and DAI, but a lower level of SI than cluster 5 additives, resulting in a higher EHH risk for cluster 1. The additives in cluster 3 are associated with low-to-medium level EHH hazards, and the hazards are confirmed by medium-to-high level DAIs. Also, the additives in cluster 3 are characterized by low-to-medium level UFIs. Thus, cluster 3 was considered to have the second lowest EHH risk. Among all the clusters, cluster 6 was considered to be the cluster of the lowest EHH risk because the additives in this cluster have low EHH hazards, high certainty of the low hazards, and medium UFIs.

EHH risk assessment of additives

The percentages of different additive functional categories in the seven clusters are shown in Table 4. The iron control agent category has the highest percentage (i.e., 30%) of cluster 7 additives, showing the highest EHH risk among all functional categories. These results are in agreement with a previous study which revealed that the high inherent EHH hazard of iron control agents is due to a large number of ingredients identified as carcinogens and GHS Category 1 aquatic toxicants (Hu et al. 2018b). Additionally, friction reducers and gelling agents were frequently found in cluster 2, which is characterized by the lowest level of DAIs, indicating that more chemical information should be procured to increase the understanding of the EHH effects of these two functional categories. The additives within cluster 2 share the feature that very limited information has been disclosed for their component ingredients. As a result, the indexing system generated an SI of 5 and a DAI of 0 for these additives with unavailable ingredient information.

Table 4 Distribution of different additive functional categories in seven clusters

Many biocides were classified into cluster 1, showing moderate EHH risks. This is unexpected because biocides are commonly regarded as an additive category of high EHH hazards due to their designed toxic effects, and they are reported to be subject to more regulation than other additives used in HF in the USA (Camarillo et al. 2016; Stringfellow et al. 2017). The previous work also found that the biocide category is associated with a relatively high EHH hazard and high certainty of the hazard (Hu et al. 2018b). The reason for the moderate EHH risks of biocides is that the individual biocides have not frequently been used in HF operations in BC. The low UFIs lowered the estimated EHH risks of biocides. Other additive categories, such as anti-sludge agents, crosslinkers, scale inhibitors, and demulsifiers, are frequently found in clusters 6 and 3, which are characterized by low EHH risks. The remaining additive categories are frequently found in clusters 4, 1, and 5, which are identified with moderate EHH risks.

The risk profiles of individual additives were compared with the host clusters. As shown in Table 5, the additives in cluster 7 were also assigned DOMs to other clusters. Among them, iron control agent IC4 has the highest DOM (µ7 = 0.839) to cluster 7, indicating that it has the highest similarity to this cluster. A GHS Group 2 carcinogen, trisodium nitrilotriacetate (CASRN: 5064-31-3), was found in IC4, resulting in a low SI for this additive (ECHA 2017). The low SI, combined with the medium UFI and DAI, suggests that IC4 is well fitted in cluster 7.

Table 5 Comparison of DOMs of additives grouped into cluster 7

Friction reducer FR2 contains hydrotreated light distillates (petroleum) (CASRN: 64742-47-8), which pose high environmental health hazards (e.g., not rapidly biodegradable, high bioaccumulation potential, and highly toxic to aquatic organisms) (Comet 2015; ECCC 2017; SA 2017). Additive FR2 also has a medium UFI among the representative friction reducers and a medium DAI due to a missing CASRN for one essential ingredient, thus showing strong similarity to cluster 7. Additionally, other additives with a high DOM to cluster 7 (i.e., µ7 > 0.5) contain ingredients which either are extremely toxic (i.e., GHS Category 1 toxicity) for aquatic ecosystems or have significant adverse effects (i.e., Groups 1 and 2 carcinogenic and mutagenic effects) on human health. These additives also have medium-level UFIs within their respective function categories and medium-level DAIs due to missing ingredient information, showing strong similarity to the risk characteristics of cluster 7. Therefore, the EHH risk profiles of individual additives with a high DOM can be well represented by their host clusters.
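To make the notion of a DOM concrete, the following is a minimal sketch of how degrees of membership are computed in standard fuzzy c-means clustering. It assumes that the FCA uses the usual fuzzy c-means membership formula with Euclidean distance and a fuzzifier of m = 2; the cluster centers and the additive's (SI, UFI, DAI) vector are hypothetical values chosen for illustration, not results of this study.

```python
import math

def memberships(point, centers, m=2.0):
    """Standard fuzzy c-means degrees of membership of one point to each center.

    ASSUMPTION: Euclidean distance and fuzzifier m (m > 1); the study's exact
    FCA settings are not restated here.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center
    # (coincident centers share the membership equally).
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]

# HYPOTHETICAL cluster centers in (SI, UFI, DAI) space.
centers = [(1.0, 3.0, 3.0), (3.0, 3.0, 3.0), (5.0, 3.0, 5.0)]
# HYPOTHETICAL additive close to the first center.
additive = (1.5, 3.0, 3.0)

mu = memberships(additive, centers)
# The host cluster is the one with the highest degree of membership.
host = max(range(len(mu)), key=mu.__getitem__)
print(mu, host)
```

The memberships of an additive sum to one across clusters, so a DOM above 0.5 to one cluster means that the additive resembles that cluster more than all other clusters combined.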

The calculated RIs for additives in different clusters are shown in Fig. 7. Additives in clusters 7 and 2 have high RIs, indicating high EHH risks. The difference in the RIs of additives within these two clusters is not significant (p = 0.765). The use of clusters 7 and 2 additives should be reduced in HF operations. Also, it is recommended to monitor the use of high EHH risk additives and develop emergency response plans for potential spills of these additives. Cluster 4 additives generally have the third highest RIs because of their high UFIs. The additives grouped into clusters 1 and 5 show moderate EHH risks as compared to the additives in other clusters. The difference in the RIs of additives within these two clusters is also not significant (p = 0.594). The additives with moderate EHH risks are suitable substitutes for high EHH risk additives that serve the same function. Clusters 3 and 6 additives are associated with relatively low RIs; in particular, cluster 6 additives show the lowest RIs, and the difference is significant (p < 0.05) as compared to cluster 3 additives. Therefore, cluster 6 additives are recommended for use in HF operations for EHH risk mitigation.

Fig. 7 Comparison of risk indices (RIs) of additives in different clusters
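As an illustration of how an RI can be derived from DOMs, the sketch below computes a membership-weighted average of per-cluster risk scores. This is not necessarily the formula used in this study: the equally spaced cluster scores, the function name `risk_index`, and the example DOMs are all assumptions made for illustration. Only the cluster risk order (7, 2, 4, 1, 5, 3, 6) is taken from the ranking reported in the text.

```python
# Cluster order from highest to lowest EHH risk, as ranked in the text.
RISK_ORDER = [7, 2, 4, 1, 5, 3, 6]

# ASSUMPTION: equally spaced scores, 1.0 for the highest-risk cluster
# down to 0.0 for the lowest-risk cluster.
CLUSTER_SCORE = {
    cluster: 1.0 - rank / (len(RISK_ORDER) - 1)
    for rank, cluster in enumerate(RISK_ORDER)
}

def risk_index(doms):
    """Membership-weighted average of cluster scores.

    `doms` maps a cluster id to the additive's DOM to that cluster.
    HYPOTHETICAL helper, not the study's RI definition.
    """
    total = sum(doms.values())
    if total <= 0:
        raise ValueError("DOMs must sum to a positive value")
    return sum(CLUSTER_SCORE[c] * mu for c, mu in doms.items()) / total

# HYPOTHETICAL additive that mostly belongs to cluster 7.
example_doms = {7: 0.8, 2: 0.1, 4: 0.1}
ri = risk_index(example_doms)
print(round(ri, 3))
```

Under this assumed scoring, an additive that belongs entirely to cluster 6 receives an RI of 0, and one that belongs entirely to cluster 7 receives an RI of 1; additives with split memberships fall in between.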

Application of FCA in additive selection

The EHH risks of six hypothetical fracturing fluids were compared. As shown in Fig. 8, fluid F1 shows the lowest EHH risk. Fluid F4 shows a similar additive composition to F1 as only one additive (i.e., the iron control agent) is different (Table 1), and thus F4 has the same low EHH risk, indicating that selecting additives based on the FCA results shows good consistency with the selection based on the highest SIs. Nevertheless, the difference in EHH risk is noticeable in the pairwise comparisons between fluids F3 and F6 and between F2 and F5. Although F3 and F6 show similar low SIs (i.e., high hazards), the differences in the UFIs and DAIs between the additives contained in the two fluids are obvious. The additives contained in F6 have higher UFIs but lower DAIs, indicating that the high EHH hazard of F6 is of higher uncertainty. As a result of the holistic evaluation of the EHH hazard, exposure frequency, and hazard certainty, F3 was identified with a higher EHH risk than F6. In fact, F3 has the highest EHH risk among all hypothetical fracturing fluids.

Fig. 8 Comparisons of (a) safety indices (SI), (b) use frequency indices (UFI), (c) data availability indices (DAI), and (d) risk indices (RI) of hypothetical fracturing fluids

Similarly, fluids F2 and F5 show comparable medium-to-high level SIs (i.e., low-to-medium hazards), but the UFIs and DAIs of the additives in the two fluids are different, resulting in different risk implications. The low-to-medium level hazard of fluid F2 is associated with higher certainty indicated by the higher DAI, resulting in a lower RI for F2. The RI of fluid F2 is approximately equal to the mean value of RIs of F1 and F3; however, the calculated RI of F5 is close to that of F6, suggesting that the selection of additives solely based on SIs could result in skewed EHH risk assessment results.
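The point that SI-based selection alone can mask differences in risk can be illustrated with a small sketch. All numbers below are hypothetical and are not the values of fluids F1 to F6; averaging the additive-level indices to obtain a fluid-level value is likewise an assumption made for illustration.

```python
from statistics import mean

# HYPOTHETICAL per-additive (SI, RI) pairs for two illustrative fluids.
fluids = {
    "A": [(4.0, 0.30), (3.0, 0.35)],
    "B": [(4.0, 0.60), (3.0, 0.70)],
}

def fluid_profile(additives):
    """Return (mean SI, mean RI) of a fluid.

    ASSUMPTION: fluid-level indices are simple means of additive-level indices.
    """
    sis, ris = zip(*additives)
    return mean(sis), mean(ris)

profiles = {name: fluid_profile(adds) for name, adds in fluids.items()}

# Both fluids look identical when judged by SI alone ...
same_si = profiles["A"][0] == profiles["B"][0]
# ... but the composite RI still separates them.
lower_risk = min(profiles, key=lambda name: profiles[name][1])
print(profiles, same_si, lower_risk)
```

In this toy case, the two fluids are indistinguishable by SI, yet their RIs differ because the RI also reflects use frequency and data availability.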

Conclusions and recommendations

This study developed a new approach using FCA combined with an indexing method to qualitatively assess the EHH risk of HF additives. The indexing method can convert non-commensurate chemical hazard information, chemical toxicity data availability, and additive use frequency into numerical indices, and the FCA can classify the additives into different meaningful groups based on the obtained indices. Seven clusters were generated, and each cluster comprises additives that have similar EHH risk profiles. By interpreting the characteristics of different clusters, the clusters of high EHH risk can be identified. The similarity between individual additives and their host clusters can be evaluated by the respective DOMs. Based on the DOMs to different clusters, the RIs of individual additives can be generated and compared to facilitate the formulation of fracturing fluids with low EHH risks.

The results show that among various additive categories, the iron control agent category has the highest percentage of additives identified with a high EHH risk, while many crosslinkers and anti-sludge agents are associated with relatively low EHH risks. The FCA can also identify the risk factors of concern for each additive. The results show that efforts should be made to disclose more ingredient information for friction reducers and gelling agents to make HF chemical use more transparent. The comparison of EHH risks of hypothetical fracturing fluids shows that selecting additives only based on their hazards could cause information eclipsing as a result of neglecting other important risk factors, such as the use frequency and the certainty of the measured EHH hazards. The combined use of FCA and chemical hazard indexing can generate EHH risk assessment outcomes for additives by holistically evaluating the relevant, readily available risk factors, which provides useful information for additive management in HF operations. Further studies on other risk factors, such as well integrity conditions, HF operation procedure, and locations and geological characteristics of HF sites, are recommended to increase the understanding of the EHH risk of HF operations.