Introduction

Mobilisation of potentially harmful elements (PHE) in the terrestrial environment has been occurring for millions of years through the natural weathering of parent rocks. Anthropogenic activities such as metalliferous mining and smelting, fossil fuel combustion, traffic-related emissions, waste disposal, agricultural practices, the addition of horticultural materials, and warfare and military activities have added to this. Whilst the presence of PHE in the surface environment poses a hazard to human and animal health, it does not become a risk until the PHE are mobilised from their sources and reach a receptor. A relatively simple and widely adopted method to assess trace element pools of differing lability in soils is the use of single and sequential extraction procedures (Abollino et al. 2011; Bacon and Davidson 2008). These procedures provide information on the mobility and availability of metals and other elements, and help to identify their potential negative impact through release into other environmental compartments and through human ingestion.

Despite the relative simplicity of these extraction procedures, it is not feasible to apply them to large geochemical data sets which may consist of hundreds or even thousands of samples which represent regional or national-scale studies (Johnson et al. 2005).

In a study of the bioaccessibility of As in soil, Wragg et al. (2007) have suggested that the fractionation of PHE in soils can be assessed by assuming that soils from a particular region are made up from different chemical inputs, e.g. underlying geology, plant materials, anthropogenic inputs from both diffuse and point source pollution and agricultural inputs. They applied a chemometric mixture resolution algorithm to total major and trace element data from a small-scale study of 20 soil samples to identify the chemical composition and amount of these intrinsic soil components (ISC) as well as the distribution of As between the ISCs.

The objective of this study is to apply a similar ISC chemometric identification procedure to total chemical analysis data from a regional-scale set of 723 soil samples collected as part of the British Geological Survey G-BASE systematic geochemical survey of soils, stream sediments and surface waters (Johnson et al. 2005). The soils consist of 276 urban soils and 447 rural soils collected from in and around the UK town of Northampton. In this study, the focus is on the fractionation of Pb in the Northampton area. The ISC Pb fractionation data were compared to the fractionation of Pb in a subset of 10 urban soils obtained using the CISED sequential extraction test (Cave et al. 2004) and to the bioaccessibility of Pb in a subset of 50 of the urban soils measured using the unified BARGE method (UBM) (Denys et al. 2012).

The overall aim of the study is to combine the information from the regional-scale ISC identification, the urban CISED extraction test and the Pb bioaccessibility testing to provide an overview of the solid-phase fractionation of Pb in the soils of the Northampton area and its relative lability as measured by UBM. The outcomes will be used as an evaluation of a new approach to interpreting geochemical survey data and, more specifically, provide data on the behaviour of Pb in soils in rural and urban environments with specific reference to human exposure.

Materials and methods

Site description

A detailed description of Northampton and its geology is given in a recent study of the bioaccessibility of As in soils in the town (Cave et al. 2013). Briefly, Northampton is a large market town in central England (Fig. 1), with a population of ca. 200,000, located on Jurassic ironstones (Northampton Sand Formation and Marlstone Rock). The industrial development of the county of Northamptonshire, including the growth of Northampton, was supported by opencast quarrying of the abundant iron ore in the middle of the 19th century. There were also considerable leather currying and tanning works, breweries, iron foundries, and brick and tile works distributed across the area. It is, however, the shoemaking industry that is more often associated with Northampton; it was very large, at one time employing 75 % of the population of the county. The surrounding area is largely rural and dominated by agriculture, but a large motorway (M1) runs through the sampling area (Fig. 1).

Fig. 1

Location of the urban and rural samples in and around Northampton. The left-hand map shows the sampling locations at a local scale, and the right-hand map shows the location of the sampling area on a national scale

A comparison of the total Pb content of the soils from the urban and rural samples is shown in a boxplot in Fig. 2. Unsurprisingly, the Pb concentrations in the urban environment are generally higher than in the rural sites (median urban 55.9 mg kg−1, median rural 35.9 mg kg−1). All of the urban soils are below the normal background concentration (NBC) of 820 mg kg−1 for the Urban Domain for English soils, and only four of the samples that came from the rural sampling campaign are above the Principal Domain NBC of 180 mg kg−1 (Ander et al. 2013).

Fig. 2

Box and whisker plots of the Pb concentrations in the urban and rural soils in and around Northampton

Sample collection, preparation and determination of total concentrations

The soil samples were collected according to the BGS Geochemical Baseline Survey of the Environment (G-BASE) sampling procedure (Flight and Scheib 2011). Collection was carried out on a 500-m grid at a density of 4 per km2 in open ground for the urban soils and 1 every 2 km2 for the rural samples. Samples were collected using a hand-held Dutch soil auger; each sample was a composite of five subsamples collected from a depth of 5–20 cm at the centre and four corners of a 20 m square. The total concentrations of ca. 40 elements were determined in the <2 mm size fraction of each topsoil using the sample preparation, X-ray fluorescence spectroscopy and quality control procedures previously described (Allen et al. 2011; Johnson 2011).

Bioaccessible lead

A subset of 50 of the urban samples was selected for bioaccessibility testing to cover the range of total Pb concentrations in the urban sample set. The chosen samples were further sieved to <250 µm as this particle size fraction is considered to be the optimum size to adhere to children’s hands (Duggan et al. 1985; Calabrese et al. 1997). The total elemental concentration within the <250 µm size fraction was determined using the procedure described above. Assessment of the bioaccessible Pb was undertaken using the UBM, a gastrointestinal sequential extraction at physiologically relevant pH, temperature and timescales that emulates fasting conditions in the human gut (to provide a conservative estimate). The UBM is based on the method previously described by Oomen et al. (2002), modified to ensure adequate conservatism, robustness and applicability over a range of soil types and different geographical regions, whilst maintaining its physiological relevance (Wragg et al. 2011). The test incorporates mouth, stomach and small intestinal phases, with pH values of 6.5, 1.2 and 6.3 and extraction times of 60 s, 1 h and 4 h, respectively. The method generates samples for analysis at the end of the gastric-phase extraction, known as ‘stomach’, and at the end of the intestinal-phase extraction, known as ‘stomach and intestine’. Full details of the UBM application to different test soils have been previously described (Broadway et al. 2010; Pelfrene et al. 2011; Roussel et al. 2010; Wragg et al. 2009). As a part of its development, which included tightening the pH requirement of the stomach phase from 1.2–1.5 to 1.2 and a change in the separation step from centrifugation at 3000 g for 5 min to 4500 g for 15 min, the UBM has also been validated against a juvenile swine model for As, Cd and Pb (Caboche 2009; Denys et al. 2012). The validation study utilised a set of 16 different soils contaminated by mining and smelting activities, including the NIST reference material 2710, resulting in a correlation between relative bioavailable As, Cd and Pb and bioaccessibility that was highly significant for both phases. Denys et al. (2012) showed that the slopes of the regression were not significantly different from 1 (based on 95 % confidence interval) and that the regression intercepts were not significantly different from 0.

As there are no standard reference materials available with certified Pb values from physiologically based bioaccessibility methods, quality control of the extractions was monitored by including replicate extractions of the BGS guidance material BGS 102 (Wragg 2009) in the overall extraction scheme. This material had previously been the subject of an international inter-laboratory trial (Wragg et al. 2011), which generated reference values for comparison. Within every sample batch containing a maximum of 10 unknown samples, one sample was extracted in duplicate and the rest only once, and one blank and one quality control soil were included.

The stomach and intestinal extracts were analysed for their bioaccessible Pb contents using a Thermo Elemental ExCell quadrupole ICP-MS instrument in combination with a Cetac ASX-510 autosampler. Details of the analysis and instrumental operating conditions have been previously described by Wragg et al. (2011) and Watts et al. (2008), respectively. The measured Pb concentration in the blank extractions was always less than the calculated method detection limit (five times the average blank measurement (n = 7), equating to 3 mg kg−1). Bioaccessible Pb values for BGS 102 (n = 8) for the gastric phase only, 14.36 ± 2.43 mg kg−1, were in good agreement with consensus values reported previously (Wragg et al. 2011), 15.2 ± 6.6 mg kg−1.

The UBM provides gastric and small intestine-phase samples. Bioaccessible Pb values from the gastric phase tend to be higher (more conservative) and more reproducible because of the low-pH conditions employed (Cave et al. 2011; Wragg et al. 2011). As a result, the bioaccessible Pb values associated with the gastric phase of the UBM are used in further interpretations.

CISED extractions

The samples investigated for their bioaccessible content were further subdivided for determination of the solid-phase fractionation of Pb using the CISED method. The selected samples covered the range of bioaccessible Pb data. Each soil was leached by the CISED sequential extraction using aliquots of deionised water and increasing concentrations (0.01–5.0 M) of aqua regia, following the method detailed by Wragg and Cave (2012). In brief, a 2 g aliquot of soil was accurately weighed onto the 0.45-µm polypropylene filter insert of a Whatman Vectaspin 20® polypropylene centrifuge unit, and a 10 ml volume of the initial extractant was added. The extractant was passed through the test sample using centrifugal force (3000 rpm) and collected for analysis by ICP-AES, and the process was repeated with the next volume of extractant. Seven extractants (deionised water and 0.01, 0.05, 0.10, 0.50, 1.0 and 5.0 M aqua regia) were each applied to each test soil in duplicate to produce a total of 14 extracts per soil for analysis. For the later stages of the CISED extraction (0.1–5.0 M), 0.25, 0.50, 0.75 and 1 ml of hydrogen peroxide were added to the extractants before making up to volume (10 ml). One sample blank and one sample duplicate were extracted per sample batch. Analysis of the extracts for Al, As, Ca, Cd, Co, Cr, Cu, Fe, K, Li, Mg, Mn, Mo, Na, Ni, P, Pb, S, Se, Si, Sr, V and Zn was carried out using a Varian Vista AX CCD simultaneous ICP-AES instrument with a dedicated Varian SPS-5 autosampler. The extraction repeatability was <5 % across all elements analysed; Pb concentrations were below the method limit of detection for all blank samples.

Chemometric data analysis

The self-modelling mixture resolution (SMMR) algorithm is a multivariate statistical technique that deconvolves two-way data from a multi-component mixture into single components; it has been described previously by Cave and co-workers (Cave and Wragg 1997; Cave et al. 2004; Cave 2008). The output from the algorithm is summarised in Fig. 3.

Fig. 3

Diagrammatic representation of the data outputs from the SMMR procedure

Results and discussion

Chemometric data analysis

The SMMR method, implemented in the MATLAB programming language, was applied to two data sets: the total element concentration data (matrix A, Fig. 3), consisting of 723 samples (276 urban soils and 447 rural soils) by 50 elements, and the CISED extract data, consisting of 10 soils with 14 extracts per soil (140 samples) by 22 elements.

For the total element data, the SMMR produces the ISCs and for the CISED extracts, the acid-soluble physico-chemical components. In both instances, the Pb associated with either ISCs or components was determined.

The number of components in the output matrices B and C (Fig. 3) for each of the two data sets was estimated using the minimum Akaike Information Criterion (AIC) (Akaike 1974) as described previously (Wragg and Cave 2012). The AIC provides an objective way of determining which model among a set of models is the most parsimonious, i.e. provides the best fit to the data with a minimum number of model parameters. The model with the lowest AIC is chosen as optimum.
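The selection rule described above can be sketched as follows. This is a minimal illustration only: it assumes the common least-squares form of the AIC for Gaussian errors, which may differ from the formulation used by Wragg and Cave (2012), and the function names, the residual sums of squares and the parameter count per component are all hypothetical.

```python
import math


def aic_least_squares(n_obs, rss, n_params):
    """AIC for a least-squares fit with Gaussian errors (up to an additive constant)."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def select_n_components(n_obs, rss_by_k, params_per_component):
    """Return (best_k, aic_by_k) for the candidate component counts in rss_by_k."""
    aic_by_k = {
        k: aic_least_squares(n_obs, rss, k * params_per_component)
        for k, rss in rss_by_k.items()
    }
    # The model with the lowest AIC is taken as the most parsimonious.
    best_k = min(aic_by_k, key=aic_by_k.get)
    return best_k, aic_by_k


# Hypothetical residual sums of squares for models with 1 to 6 components.
# These numbers are illustrative only and are not taken from the study.
rss_by_k = {1: 900.0, 2: 400.0, 3: 150.0, 4: 120.0, 5: 118.0, 6: 117.0}
best_k, aic_by_k = select_n_components(
    n_obs=200, rss_by_k=rss_by_k, params_per_component=5
)
```

In this toy case the fit improves only marginally beyond four components, so the penalty term outweighs the gain and the four-component model has the lowest AIC.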

Figure 4 shows that 21 ISCs were selected for the 723 samples from the total element data set and 15 components for the CISED extractions of the 10 samples. Multiple linear regression (MLR) modelling, used to examine the relationships between the bioaccessible Pb, the ISCs and components, was carried out using the R programming language (R Development Core Team 2011).

Fig. 4

Selection of the number of components for the SMMR model: a total element data and b CISED extractions

Pb fractionation from total element data

The SMMR analysis suggests that 21 geochemically distinct ISCs can be identified. Using the composition and proportion matrices from the SMMR (Fig. 3), the proportion of the Pb (summed over all 723 samples) associated with each ISC can be calculated. Figure 5 shows the mass of Pb associated with the 15 ISCs containing the highest mass of Pb. The top eight of these ISCs contain 80 % of the Pb summed over all samples. In addition to the outputs from the SMMR analysis, the bioaccessible fraction of Pb in a subset of 50 of the soils from the urban samples is also available. If our receptor of concern is a human and the pathway is by soil ingestion, then establishing a relationship between the Pb-containing ISCs and the bioaccessible fraction of Pb will provide an insight into the relative mobility of Pb in the ISCs.
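The calculation of the Pb associated with each ISC can be sketched as below. The matrix layout is an assumption made for illustration (amounts indexed by sample and component, compositions indexed by component and element); the actual structure of the SMMR output matrices is that shown in Fig. 3. The function names and the toy numbers are hypothetical.

```python
def pb_mass_by_component(amounts, compositions, pb_index):
    """Sum over all samples of the Pb contributed by each component.

    amounts[i][k]      amount of component k in sample i (assumed layout)
    compositions[k][j] fraction of element j in component k (assumed layout)
    """
    n_components = len(compositions)
    totals = [0.0] * n_components
    for row in amounts:
        for k in range(n_components):
            totals[k] += row[k] * compositions[k][pb_index]
    return totals


def components_covering(totals, share):
    """Smallest set of components, ranked by Pb mass, that reaches `share` of the total."""
    ranked = sorted(range(len(totals)), key=lambda k: totals[k], reverse=True)
    grand_total = sum(totals)
    chosen, running = [], 0.0
    for k in ranked:
        chosen.append(k)
        running += totals[k]
        if running >= share * grand_total:
            break
    return chosen


# Toy data: 3 samples, 3 components, 2 elements (index 1 is Pb). Illustrative only.
amounts = [[100.0, 50.0, 10.0], [200.0, 20.0, 30.0], [50.0, 80.0, 60.0]]
compositions = [[0.75, 0.25], [0.5, 0.5], [0.875, 0.125]]
totals = pb_mass_by_component(amounts, compositions, pb_index=1)
top = components_covering(totals, share=0.8)
```

Ranking the components by their summed Pb mass and accumulating until a target share is reached is one simple way to arrive at a statement of the form "the top eight ISCs contain 80 % of the Pb".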

Fig. 5

Pb fractionation in the 15 ISCs containing the highest concentration of Pb. Error bars represent 95th percentile confidence limits

An MLR model was set up with bioaccessible Pb as the dependent variable and the mass of Pb associated with each of the 21 ISCs present in each soil (matrix B, Fig. 3) as the predictor variables. Using a stepwise selection of predictor parameters with AIC as the selection criterion, three ISCs were identified as being significant predictors of bioaccessible Pb (ISC 1, ISC 4 and ISC 7). The final model used a robust MLR model to take into account possible outliers in the data, with the confidence intervals (CIs) on the coefficients and the p values being calculated by bootstrap resampling. The summary statistics for the model are given in Table 1.
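The idea of AIC-driven stepwise selection can be sketched in a few lines. This is not the procedure used in the study, which was carried out in R with a robust MLR fit and bootstrap resampling. The sketch assumes ordinary least squares, forward selection only and a Gaussian least-squares AIC, and every name and data value in it is hypothetical.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(columns, y):
    """Ordinary least-squares fit with an intercept; returns (coefficients, rss)."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve_linear(xtx, xty)
    rss = sum(
        (yi - sum(b * v for b, v in zip(beta, row))) ** 2
        for row, yi in zip(design, y)
    )
    return beta, rss


def aic(n_obs, rss, n_params):
    """Least-squares AIC (Gaussian errors, additive constant dropped)."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def forward_stepwise(predictors, y):
    """Add one predictor at a time for as long as the AIC keeps decreasing."""
    n = len(y)
    selected = []
    _, rss = ols_fit([], y)
    best_aic = aic(n, rss, 1)
    while True:
        scored = []
        for name in predictors:
            if name in selected:
                continue
            cols = [predictors[s] for s in selected + [name]]
            _, rss = ols_fit(cols, y)
            scored.append((aic(n, rss, len(cols) + 1), name))
        if not scored or min(scored)[0] >= best_aic:
            return selected
        best_aic, chosen = min(scored)
        selected.append(chosen)


# Hypothetical data: the response depends on "isc_a" only (plus small noise).
predictors = {
    "isc_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "isc_b": [5.0, 1.0, 4.0, 2.0, 3.0, 5.0, 1.0, 4.0],
    "isc_c": [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0],
}
bioaccessible = [2.1, 3.8, 6.15, 7.9, 10.05, 11.85, 14.2, 15.95]
selected = forward_stepwise(predictors, bioaccessible)
```

With only eight observations the AIC may also admit a noise predictor by chance, which is one reason why a bootstrap check of the final coefficients, as used in the study, is a sensible safeguard.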

Table 1 Summary statistics for the MLR model used to predict Pb bioaccessibility from ISC Pb values

The final MLR model describes 76 % of the variance in the data and suggests that ISCs 1, 4 and 7 contain the bioaccessible Pb in the soil samples. All three ISCs are in the top four Pb-containing ISCs for total Pb (Fig. 5). Although ISC 19 is also in the top four Pb-containing ISCs, it is not found to be a predictor of bioaccessible Pb, leading to the conclusion that the Pb in this ISC is not bioaccessible.

Examination of the coefficients of the model shows that the intercept is not significantly different from 0, suggesting that all of the bioaccessible Pb can be accounted for by the total element data ISCs and that no other constant term is required. ISCs 1, 4 and 7 can contribute a fraction of their Pb content between 0 and a maximum of 1 (0 being no contribution and 1 indicating all the Pb in that component is bioaccessible). Table 1 shows that the coefficients for ISC 1 and ISC 7 are close to 1 (within uncertainty limits) and that ISC 4 is less than 1 (ca. 0.6). This suggests that all of the Pb associated with ISC 1 and ISC 7 is bioaccessible and that a smaller fraction of the Pb in ISC 4 contributes to the total bioaccessible fraction.
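The structure of the fitted model can be written compactly as below. The symbols are introduced here for illustration only and do not appear in the original analysis: the subscript i denotes a soil sample, Pb with an ISC subscript is the mass of Pb associated with that ISC in the sample, and the approximate coefficient values are those quoted in the text rather than the exact entries of Table 1.

```latex
\mathrm{Pb}_{\mathrm{bio},i}
  = \beta_0
  + \beta_1\,\mathrm{Pb}_{\mathrm{ISC1},i}
  + \beta_4\,\mathrm{Pb}_{\mathrm{ISC4},i}
  + \beta_7\,\mathrm{Pb}_{\mathrm{ISC7},i}
  + \varepsilon_i,
\qquad
\beta_0 \approx 0,\quad
\beta_1 \approx \beta_7 \approx 1,\quad
\beta_4 \approx 0.6
```

Each slope coefficient is read as the fraction of the Pb in that ISC that is bioaccessible, which is why physically meaningful values lie between 0 and 1.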

The contributions to the Pb content of the soils of these three ISCs split into those from the rural and the urban areas are shown as box and whisker plots in Fig. 6. If there are higher concentrations of Pb for a particular ISC in a soil for a particular category, then this implies there is a higher amount of that ISC in the soil sample. For example, the Pb contribution for ISC 1 is higher in the urban samples than in the rural samples, and therefore the amount of ISC 1 in urban samples is higher than in the rural samples.

Fig. 6

Boxplots of the Pb associated with ISCs 1, 4 and 7 split into rural and urban populations. The centre line of the box is the median, and the open circles are points greater than three times the interquartile range distant from the median

Figure 6 shows that ISCs 1 and 4 have higher concentrations of Pb in the samples found in the urban environment, whereas ISC 7 shows the opposite trend. ISCs 1 and 7 cover similar concentration ranges (0 to 30 mg kg−1 Pb), whereas ISC 4 has concentrations of Pb up to 563 mg kg−1.

Figures 7, 8 and 9 show the chemical element compositions of the three ISCs set out in decreasing order of percentage contribution. The error bars represent the 95th percentile confidence limits.

Fig. 7

Chemical element composition of ISC 1

Fig. 8

Chemical element composition of ISC 4

Fig. 9

Chemical element composition of ISC 7

In ISC 1 (Fig. 7), the most abundant element is Fe (ca. 40 %) followed by Al (ca. 20 %), Ca (ca. 10 %) and P (ca. 2 %); the median Pb content for this ISC is ca. 0.04 %, but the uncertainty shows that this is quite variable. ISC 1 also contains V, Cr, Zn and Cu at similar concentrations, and Fig. 6 shows that it is more prevalent in the urban soils. This suggests that ISC 1 may be an iron oxyhydroxide; such phases are known to be impure, are able to adsorb metals (Cornell and Schwertmann 1996) and can be formed through anthropogenic processes.

ISC 4 (Fig. 8) is 50 % Ca; Fe and Al are not well defined, but P and Mg are ca. 5 %. Pb makes up about 1 % of this ISC, which also contains similar concentrations of Cr, Zn and Cu and a smaller but significant concentration of Sn. This ISC is again more prevalent in the urban environment and accounts for the largest proportion of the total Pb values >30 mg kg−1 (Fig. 6). The high Ca content could point to this being a carbonate phase, but if this were the case, it would be expected that the carbonate would be fully soluble in the acid environment of the UBM test and that Pb would be fully bioaccessible. The regression analysis (Table 1) suggests that this is not the case. The high concentration of P (ca. 10 %) could suggest a phosphate mineral (Fig. 8), although there is also evidence that Ca can form organo-mineral assemblages in soil (Chen et al. 2014). The exact nature of ISC 4 cannot be clearly defined without further mineral-specific studies. Despite this, the fact that it is more prevalent in the urban environment suggests that it is formed from anthropogenic processes, possibly derived from the deposition of airborne particulates.

ISC 7 (Fig. 9) is predominantly Si (ca. 80 %) with the next most abundant element being Ca (ca. 10 %). The high Si-to-Ca ratio suggests that rather than being a Ca/Si mineral, this is more likely to be a quartz material with carbonate coatings. Further evidence for this comes from the regression of bioaccessible Pb against the Pb content of the ISCs (Table 1), which shows that the Pb associated with ISC 7 is fully bioaccessible. This suggests that the Pb is associated with the carbonate coatings rather than with the more recalcitrant quartz material. The Pb content is ca. 0.1 % with similar concentrations of Zn, Cu and Cr. Unlike ISCs 1 and 4, there appear to be generally higher concentrations of this ISC in the rural samples (Fig. 6).

The MLR modelling of bioaccessible Pb using the ISC data demonstrates that the ISCs derived from SMMR modelling of the total element data can be quantitatively linked to the bioaccessible fraction in the soils and that the three ISCs associated with bioaccessible Pb have plausible chemical signatures that can be linked back to specific physico-chemical components in the soil.

Pb fractionation from the CISED sequential extraction procedure

The CISED sequential extraction identified 15 distinct geochemical components (Fig. 4b). The distribution of Pb among the eight components containing the highest concentration of Pb is shown in Fig. 10, with the majority of the Pb being associated with one component (component 3). The mass of each of the components extracted over the 14 extraction steps was summed for each sample to give a data set of 10 samples by 15 components.

Fig. 10

Pb fractionation in the eight components containing the highest concentration of Pb. Error bars represent 95th percentile confidence limits

In a similar manner to the total element data, an MLR model was set up with the Pb content of the 15 components in each sample as the predictor variables and the bioaccessible fraction as the dependent variable. Using the same variable selection criteria as for the ISC model, only one CISED component (component 3) was shown to be significant in predicting the bioaccessible fraction. Pearson correlation coefficients between CISED component 3 and the ISCs from the total element data for the equivalent 10 samples pick out the three ISCs that were identified as controlling factors for bioaccessible Pb (Table 1) as having the highest positive correlations (ISC 4, 0.993; ISC 7, 0.941; and ISC 1, 0.722), showing a strong link between the ISCs and the CISED component.
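The correlation screening described above can be sketched as follows. The function names and the toy values are hypothetical and serve only to show how candidate ISCs would be ranked by their Pearson correlation with a target component.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(target, candidates):
    """Return (name, r) pairs sorted from highest to lowest correlation with target."""
    scores = [(name, pearson_r(target, values)) for name, values in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy values standing in for Pb in one CISED component and in three ISCs.
component_pb = [1.0, 2.0, 3.0, 4.0, 5.0]
isc_pb = {
    "isc_a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "isc_b": [5.0, 4.0, 3.0, 2.0, 1.0],
    "isc_c": [1.0, 3.0, 2.0, 5.0, 4.0],
}
ranking = rank_by_correlation(component_pb, isc_pb)
```

With only 10 samples, as in the CISED subset, a high coefficient should be read as supporting evidence for a link rather than as proof of one.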

A linear regression of bioaccessible Pb against the Pb associated with component 3 (using orthogonal regression as both measures have similar uncertainties) and bootstrapping to determine the CI on the regression line leads to a slope of 1.02 (95th percentile CI 0.76–1.59) and an intercept of 5.93 (95th percentile CI −14.1 to 26.2), suggesting that the bioaccessible Pb as measured by the UBM is almost entirely accounted for by the Pb associated with CISED component 3 (Fig. 11). Examination of the chemical composition of component 3 (Fig. 12) shows that the error bars for Al, Fe and Ca are all large, and therefore, their contribution to component 3 is not well defined. P and Pb, however, are shown to be present (ca. 10–20 %) as well as Cu and Ba (ca. 2 %).
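A minimal sketch of orthogonal regression with a percentile bootstrap is given below. It assumes equal error variances in both variables (the standard orthogonal, or total least squares, line) and resampling of sample pairs with replacement; the study's own implementation in R may differ in detail. The function names and the paired values are hypothetical.

```python
import math
import random


def orthogonal_fit(x, y):
    """Orthogonal (total least squares) line fit; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, mean_y - slope * mean_x


def percentile_interval(values, level):
    """Percentile interval covering the central `level` fraction of the values."""
    ordered = sorted(values)
    m = len(ordered)
    low = ordered[max(0, int((1 - level) / 2 * m))]
    high = ordered[min(m - 1, int((1 + level) / 2 * m) - 1)]
    return low, high


def bootstrap_orthogonal(x, y, n_boot=2000, level=0.95, seed=0):
    """Bootstrap percentile intervals for the slope and intercept."""
    rng = random.Random(seed)
    n = len(x)
    slopes, intercepts = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            slope, intercept = orthogonal_fit([x[i] for i in idx], [y[i] for i in idx])
        except ZeroDivisionError:
            continue  # degenerate resample with no covariance between x and y
        slopes.append(slope)
        intercepts.append(intercept)
    return percentile_interval(slopes, level), percentile_interval(intercepts, level)


# Hypothetical paired values (mg/kg); illustrative only, not data from the study.
component_pb = [12.0, 25.0, 40.0, 55.0, 90.0, 130.0, 180.0, 240.0, 310.0, 400.0]
bioaccessible_pb = [15.0, 30.0, 38.0, 62.0, 95.0, 120.0, 190.0, 250.0, 300.0, 420.0]
slope, intercept = orthogonal_fit(component_pb, bioaccessible_pb)
slope_ci, intercept_ci = bootstrap_orthogonal(component_pb, bioaccessible_pb)
```

A slope interval that contains 1 together with an intercept interval that contains 0 is the pattern that supports a one-to-one relationship between the two measures.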

Fig. 11

Regression line and bootstrapped 95th percentile confidence limits showing the relationship between bioaccessible Pb and the lead content of CISED component 3

Fig. 12

Chemical element composition of CISED component 3

The fractionation of Pb from the ISC study of the total element data suggests that at least two of the ISCs related to bioaccessible Pb (ISCs 4 and 7) are Ca rich and that one of these (ISC 7) may be associated with carbonates (Figs. 8, 9).

Figure 13 shows the CISED extraction profiles for the subset of 10 samples, showing the 14 extraction points associated with the application of the increasing acid strength in the method, for components 3 and 4. Component 4 accounts for the majority of the Ca found in the samples and is thought to be a calcium carbonate component. Clearly, component 3 has an extraction window at higher acid strengths (extracts 7 to 12 peaking at extract 8) than the extraction window for calcium carbonate (extracts 3 to 10 peaking at extract 5), suggesting that component 3 is not a simple carbonate. The sample showing an anomalous extraction profile for component 4 has higher carbonate content than the other samples and therefore requires higher acid strength to dissolve it from the soil.

Fig. 13

CISED extraction profiles for the calcium carbonate component (Component 4) and the component associated with bioaccessible Pb (Component 3) for the CISED extractions on the 10 soil samples

A possible explanation for component 3 comes from literature studies in which a number of research groups show that Pb derived from anthropogenic sources, where significant concentrations of P also exist, occurs in soils in a relatively stable form as chloropyromorphite (Pb5(PO4)3Cl) or similar pyromorphite minerals (Hettiarachchi and Pierzynski 2004; Cotter-Howells et al. 1994; Scheckel et al. 2005). In particular, this mineral can be found in a specific form in Ca-rich environments where Ca can replace Pb to form chloroapatite [Ca5(PO4)3Cl] (Cotter-Howells et al. 1994). This is in agreement with the composition of component 3, which clearly contains Pb, P and possibly Ca (Fig. 12). All three of the ISCs associated with bioaccessible Pb contain P, Pb and Ca (Figs. 7, 8 and 9), suggesting that fine-grained pyromorphite is formed within each of these ISCs. This would also explain why the CISED identifies only one component (the pyromorphite), whereas the ISC study identified three: the fine-grained pyromorphite contained in all three ISCs is extracted as one common component by the CISED procedure. It could be speculated that the three ISCs have been formed recently and have trapped the anthropogenic Pb in the form of the pyromorphite mineral. Some studies suggest that Pb, in the form of a pyromorphite mineral, is relatively stable and not bioaccessible (Hettiarachchi and Pierzynski 2004; Scheckel et al. 2005). Other studies, however, have shown that pyromorphite solubility is increased in the low-pH environment found in the stomach compartment (Xie and Giammar 2007; Tang et al. 2004) and that the presence of impurities and organic acids also increases its solubility (Debela et al. 2010; Xie and Giammar 2007). This study shows, however, that if the assumption that pyromorphite is being formed is correct, this mineral forms the majority of the bioaccessible Pb fraction as measured by the UBM bioaccessibility method for the samples studied here. Further mineral-specific investigation of the soils is required to confirm these findings.

Conclusions

The study specifically shows that SMMR modelling of total element data from a regional geochemical survey provides meaningful fractionation information on the total Pb content of the soils in terms of ISCs. These data can be quantitatively linked to bioaccessibility measurements, allowing the identification of the ISCs which contain the bioaccessible Pb fraction. Comparison of the ISCs with physico-chemical components derived from the CISED sequential extraction method shows that there are distinct differences between the two methods. The latter method only targets the more labile acid-soluble fractions, whereas the former reflects all forms of Pb regardless of their mobility. Quantitative linkages, however, between the CISED components, bioaccessible Pb and the total element data ISCs have led to a synergistic understanding of the geochemistry of the bioaccessible Pb over the region being studied.

This outcome suggests that SMMR modelling of ISCs combined with sequential extraction fractionation of a subset of soils using the CISED method can provide a powerful tool for studying the fractionation of PHE in soils on a regional scale.