Introduction

Mining has caused severe damage to the environment. In particular, metal mining has led to widespread pollution of soil with potentially toxic elements; its legacy goes back centuries (Dudka and Adriano 1997; UNEP 2001). There are thought to be several million abandoned mines around the world. A rough estimate puts the number at more than 600 000 abandoned mines in South Africa, Australia, UK, USA and Canada alone (IIED 2002). In Mexico, 585 abandoned mines and their associated tailings are recorded in the latest geo-referenced inventory, but that number will almost certainly grow as many more sites are identified and are verified (SEMARNAT 2021). Many mines were abandoned when ore bodies were exhausted or when they became unprofitable. Waste materials were left in piles, exposed to rain and wind, and without vegetative cover they spread their toxic loads into their surroundings for years afterwards (IIED 2002). If nothing is done to prevent it, the waste will continue as a source of potentially toxic elements (PTE) spread by the erosion of tailings, in wind-blown dust, and in drainage water. Leaching of the elements can also acidify the soil, ground water and surface water (Dudka and Adriano 1997). The fates of the elements once in the soil depend to some degree on the nature of the soil itself. Most elements are more mobile in acid soil than in alkaline or calcareous soil (Alloway 2012) and more likely to be leached from the soil.

Metals in tailings are among the most damaging legacies of mining in that they can cascade through the environment into plants and animals and eventually into human food (Cross et al. 2017). As above, if no action is taken then the pollution continues to harm the environment and to threaten the health and safety of both humans and their livestock. The most immediate need in most cases is the assessment of the concentrations of the pollutants and their distributions in the affected land.

The distributions of metal pollutants in soil can vary from one spatial scale to another because of the natural variation in the soil itself and differences in land management (Yun et al. 2020). Farmers, their advisors and agencies responsible for restrictions on land use or remediation need to understand where and on what scale pollutants are spatially distributed to decide how to manage the land safely and to develop suitable strategies and methods for soil remediation.

Table 1 Degrees of freedom for the nested sampling and the corresponding distances
Table 2 Summary statistics of concentrations of the elements and of their transformations. All the concentrations are in mg kg\(^{-1}\), except for Ca and Fe which are % by mass

Mexico is rich in mineral ores. From the viceroyalty of the sixteenth century and into the twentieth century extraction was inefficient, and it left large amounts of metal-rich waste. There was little or no concern for the damage it might do to the environment or for regulation (Douglas and Hansen 2008). Detailed records are few, and the extents of lands affected are largely unknown, both in Mexico as a whole and in Sonora in particular. The tailings deposit around San Felipe de Jesús, the study of which we describe below, is one example of the legacy left by mining. The concentrations of lead (Pb), arsenic (As) and zinc (Zn) in the deposit and nearby soil exceed national and international standards (Del Rio-Salas et al. 2019). These elements can be taken up by plants (Loredo-Portales et al. 2020), and, given that this land is used for agriculture, they represent a serious threat to the safety of food for human consumption. We surveyed this area to map the distributions of the potentially toxic elements to identify where remediation is urgent or desirable and where the mobility of pollutant metals should be studied.

Materials and methods

Case study: San Felipe de Jesús

The study was done in San Felipe de Jesús and Aconchi, Sonora, in northwestern Mexico (Fig. 1). The two towns adjoin one another along the Sonora River within the Sonora River basin. The regional climate is arid (BSO) with average monthly temperature ranging from 12.3 \(^{\circ }\)C in January to 30.4 \(^{\circ }\)C in July, but maximum temperature can reach 47 \(^{\circ }\)C (Brito-Castillo et al. 2010). The average annual precipitation is approximately 481 mm, with a range from 300 to 600 mm. Most of the rain falls in July and August (summer) in short spells (SMN 2020). The natural vegetation is thorn-scrub dominated by leguminous trees and cacti (Martínez-Yrízar et al. 2010).

Mining started in the region in about 1900. Sampling from the mine workings in 1932 gave grades up to 16.21 oz/ton (470 g tonne\(^{-1}\)) silver, 21.7% lead, 29.5% zinc and 27.65% copper. There are no records of production, but as much as 100 tonnes of ore are estimated to have been extracted per day on average. Mining was suspended in 1944 because of low metal prices. It resumed briefly from 1957 to 1959 and again from 1963 to 1968. In 1973, a flotation plant was constructed for processing ore, and that functioned until 1991 (Tietz 2018). The abandoned laboratories still exist, and in them can be seen the remains of the chemicals used to analyse the samples.

Table 3 Eigenvalues of correlation matrix
Table 4 Parameters of spherical models and cross-validation statistics

Waste from the mine was piled in Aconchi, 0.5 km to the south of San Felipe de Jesús (Fig. 1). The pile is 140–160 m across at its base, covering approximately 16 300 m\(^2\), and with a height varying from 2 to 5 m (Espinoza-Madero 2012). The residues in this pile seem to be the main source of pollution in the neighbouring agricultural land. The pile is still completely free of vegetation, is subject to wind erosion during the dry season, and in the summer heavy bursts of rain erode gullies. During the rainy season, a small stream (named El lavadero) connects the pile to the Sonora River. Additionally, efflorescent minerals consisting of white crusts have precipitated on top of the pile by evaporation. These materials can concentrate toxic elements, and are easily soluble and dispersed by wind, contributing to dispersion of the elements into the surrounding environment (Bea et al. 2010; Del Rio-Salas et al. 2019; Loredo-Portales et al. 2020).

We selected for study an area of 900 ha, most of which is agricultural, along the Sonora River (from its northwest corner at 572305.56 E, 3303770.15 N to its southeast corner at 574670.22 E, 3299861.47 N) and close to the abandoned mine tailings deposit at 572717.27 E, 3302399.27 N (Fig. 1). The soil comprises Regosols, Fluvisols, and Phaeozems (1:250 000; INEGI 2005), with 40% or more of sand. It has a pH ranging between 6.1 and 8.7 (in water), electrical conductivity 25 to 342 \(\upmu\) S m\(^{-1}\) (in water), 0.1 to 1.2% of C, and 0.03 to 0.16% of N.

Water is extracted from wells and the Sonora River for irrigation. Agriculture and cattle raising are the most economically important activities in the area. Agriculture is practised on the flood plain of the Sonora River. The main crops for human consumption are groundnuts (Arachis hypogaea), garlic (Allium sativum) and maize (Zea mays), whereas alfalfa (Medicago sativa) and barley (Hordeum vulgare) are the most important forage crops for livestock (SIAP 2019).

Survey

We mentioned above that the environmental damage and risks to the health of both humans and their livestock caused by toxic elements depend mainly on their concentrations and distributions. Our first task at San Felipe de Jesús was to assess these for five potentially toxic elements, namely, lead (Pb), arsenic (As), zinc (Zn), copper (Cu) and manganese (Mn), and to map them. All five had been reported to be present in large concentrations in the mine tailings (Del Rio-Salas et al. 2019; Loredo-Portales et al. 2020); no other potentially toxic elements were found to be present in important concentrations. We added calcium (Ca) to our list for analysis because it might help us to understand the mobility of the toxic metals (Alloway 2012). We also measured the concentration of iron (Fe) since it displays a conservative behaviour in the basin (Calmus et al. 2018). Despite the earlier studies, which focused on the concentrations of the elements in the mine tailings themselves, we knew nothing of the spatial scales of variation of the elements in the agricultural soil and so did not know how densely to sample for mapping, for which we should use kriging, the current best practice. Too sparse sampling could make kriging impracticable for lack of spatial correlation in the data; dense sampling on the other hand might be unnecessarily expensive and exceed the budget. Finding a suitable compromise has been a common problem in environmental science for many years. As Marchant and Lark (2007) pointed out, by sampling in two or more stages one can design efficient surveys for mapping; an initial stage provides rough estimates of the spatial scale(s) of variation, and later stages can fill in the gaps by grid sampling and can be concentrated where the contamination seems most serious.

Principles of nested sampling

Pollutants from abandoned mine tailings are spread by wind and water to varying extents and are not all equally mobile. Their distributions on neighbouring land can be further modified by the way the land is managed. So before one can design a sampling scheme suitable for mapping the distributions one needs to know what the spatial scales of variation are, as Lark et al. (2017) pointed out.

Youden and Mehlich (1937) were the first to propose a spatially nested sampling design to discover the spatial scales of variation in soil. They sampled soil at locations arranged hierarchically into clusters separated by fixed distances but with random orientations. Each distance corresponded to one level of the hierarchy, and at each sampling location they selected two substations, and so on. An analysis of variance of their measurements allowed them to partition the variance of the measured properties into components associated with each level of the design. By accumulating the components in sequence from the smallest to the largest distance one can obtain a crude variogram. The technique lay dormant in soil survey until Webster and Butler (1976) resurrected it for a soil survey in the Southern Tablelands of Australia. In both surveys, the designs were balanced with four levels. Adding more levels to refine the spatial structure while maintaining balance would soon make the technique unaffordable because the size of the sample would double with each added level. Furthermore, the doubled degrees of freedom at the lower levels would be unnecessarily large for estimation of the components of variance for the smallest separating distances.
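The accumulation of components into a crude variogram can be sketched as follows. This is a minimal illustration rather than the procedure of any of the cited authors; the function name `rough_variogram` and the distances and components of variance in the example are hypothetical.

```python
from itertools import accumulate


def rough_variogram(distances, components):
    """Accumulate components of variance from the shortest to the longest distance.

    distances  -- separating distance for each level of the hierarchy
    components -- estimated component of variance for each level, in the same order
    Returns (distance, accumulated variance) pairs sorted by distance.
    """
    if len(distances) != len(components):
        raise ValueError("one component of variance is needed per distance")
    pairs = sorted(zip(distances, components))
    cumulative = accumulate(component for _, component in pairs)
    return [(distance, total) for (distance, _), total in zip(pairs, cumulative)]


# Hypothetical distances (m) and components of variance, for illustration only.
example = rough_variogram([1000.0, 10.0, 100.0, 1.0], [0.25, 0.125, 0.5, 0.125])
for distance, gamma in example:
    print(f"{distance:7.1f} m  {gamma:.3f}")
```

Each accumulated value estimates the semivariance at the corresponding separating distance, which is why the plot of these values against distance serves as a first approximation to the variogram.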

Since then the basic design has been elaborated, sacrificing balance for economy. Oliver and Webster (1987), for example, designed a scheme with five levels but without doubling the sampling at the lowest level, and Atteia et al. (1994) extended the principle to six levels without doubling the sampling in the fifth and sixth levels. More recently Lark (2011) devised a strategy for optimizing such nested schemes (see also Webster and Lark 2013), and Lark et al. (2017) applied it in a survey of heavy metals in the soils near a large tailings dam in Zambia. We adapted the strategy for our survey of the polluted soil at San Felipe de Jesús.

Implementation of nested sampling

Our initial sampling was an unbalanced nested design with six stages with distances increasing in an approximately threefold progression from 3.6 m to 1050 m (3.6, 11, 33, 100, 300 and 1050 m). The first stage comprised eight main centres placed randomly over the region with an average distance between nearest neighbours of approximately 1050 m (Fig. 1). From each main centre, three second-stage sites were chosen 300 m apart on an equilateral triangle (Fig. 2). From each vertex of the triangle, five sites were allocated 100 m away to comprise the third stage. The fourth stage comprised five sites at 33 m separation, and the fifth and sixth stages comprised four sites at 11 m separation and three sites at 3.6 m separation, respectively (Fig. 2). This gave eight main centres with 20 points to each and therefore 160 soil sampling points in all. At each site at any one stage, from the second stage onwards, points were placed on random orientations to comply with the random effects model. The sampling points are shown by red discs in Fig. 1. Once a site was located, we used a GPS receiver (Garmin eTrex10) to geo-reference the point. Table 1 sets out the corresponding analysis of variance for this design.
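The arithmetic of the design and the placement of a point on a random orientation can be sketched as below. The tally follows one reading of the description above that is consistent with the stated 20 points per main centre; the function `place_child`, the local coordinates and the fixed seed are assumptions made for illustration and are not taken from the survey.

```python
import math
import random

# Points added at each separating distance (m) within one main centre,
# under the reading 3 + 5 + 5 + 4 + 3 = 20 (an assumption; see the lead-in).
POINTS_PER_DISTANCE = {300.0: 3, 100.0: 5, 33.0: 5, 11.0: 4, 3.6: 3}
MAIN_CENTRES = 8


def place_child(parent, distance, rng):
    """Return a point at a fixed distance from parent on a uniformly random bearing."""
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    x, y = parent
    return (x + distance * math.sin(bearing), y + distance * math.cos(bearing))


points_per_centre = sum(POINTS_PER_DISTANCE.values())
total_points = MAIN_CENTRES * points_per_centre

rng = random.Random(1)   # fixed seed so that the sketch is repeatable
parent = (0.0, 0.0)      # hypothetical local coordinates in metres
child = place_child(parent, 3.6, rng)

print(points_per_centre, total_points)
print(round(math.dist(parent, child), 6))
```

Fixing the distance and randomizing only the bearing keeps each level of the hierarchy tied to a single separating distance, which is what allows a component of variance to be attributed to that distance.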

At each sampling point in the design we took five samples of topsoil (0–30 cm) at the vertices and centre of a square of 50 cm \(\times\) 50 cm and bulked them. Each sample was put in a paper bag in the field, air-dried in the laboratory and sieved to pass 2 mm. The sieved sample was reduced to 30 mg by coning and quartering, and this sub-sample was milled in an agate ball mill according to EPA protocol (6200). The samples were analysed by a portable X-ray fluorescence spectrometer (XRF, Niton XL3t Ultra) to measure total concentrations of Pb, As, Zn, Cu, Mn, Fe and Ca. Data from the manufacturer assured us of its accuracy, and we verified its accuracy against the reference material NIST-2710a provided by the manufacturer after every 20 samples. There was no significant deviation from known values. The main source of error in the measurement of elements in soil by the technique is the heterogeneity within the soil samples themselves, as Ravansari et al. (2020) have pointed out. To diminish this error, measurements were made in triplicate and mean values calculated. The standard errors are listed in Table 2.

The structure of the sampling can be represented in a table as for an analysis of variance (anova). Table 1 lists the degrees of freedom with the corresponding distances. Our main aim is to estimate the components of variance at these distances, and so we have used residual maximum likelihood (reml) (Patterson and Thompson 1971) for the purpose, because it is more efficient than anova. Lark (2011) sets out the mathematics of the reml solution, and we do not repeat it here. The estimated components of variance were summed to give rough variograms (Fig. 3). Note that the concentrations for all elements except Fe and Ca were transformed to common logarithms to give distributions that were approximately symmetric. The transformations are listed in Table 2 for the whole data (see below).

The variograms deriving from the nested analysis, and shown in Fig. 3, are too rough for use in kriging. We wanted to improve the estimates between 11 and 33 m, and so we added 50 points 20 m away from 50 of the original 160 sampling points on random orientations. These are shown as green stars in Fig. 1. Finally, as one can see in Fig. 1, there were still large gaps between the nests, and we should want to place further points in these gaps for kriging. Otherwise there would be large errors in the kriged predictions. We therefore added a further 51 points at the nodes of a 220-m regular grid wherever nodes lay more than 200 m from a point in the nests. These points are shown as yellow + symbols in Fig. 1.
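The rule for filling gaps can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `gap_filling_nodes`, the origin and alignment of the grid, and the small rectangle with a single existing point are all hypothetical, and the software actually used for this step is not stated here.

```python
import math


def gap_filling_nodes(existing, xmin, xmax, ymin, ymax, spacing=220.0, min_gap=200.0):
    """Return nodes of a regular grid that lie more than min_gap from every existing point."""
    nodes = []
    nx = int((xmax - xmin) // spacing) + 1
    ny = int((ymax - ymin) // spacing) + 1
    for i in range(nx):
        for j in range(ny):
            node = (xmin + i * spacing, ymin + j * spacing)
            # Keep the node only if no previously sampled point is within min_gap.
            if all(math.dist(node, point) > min_gap for point in existing):
                nodes.append(node)
    return nodes


# Hypothetical example: one existing point at the corner of a 440 m x 440 m rectangle.
existing_points = [(0.0, 0.0)]
new_nodes = gap_filling_nodes(existing_points, 0.0, 440.0, 0.0, 440.0)
print(len(new_nodes))
```

In this toy case the grid has nine nodes, and only the node that coincides with the existing point is rejected.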

Samples of topsoil were taken from these additional locations and analysed by X-ray fluorescence spectroscopy in the same way as for the original 160. We thus had measurements for all elements at a total of 261 locations from which to map the concentrations.

Geostatistical analysis: variograms and their modelling

The complete set of data comprised the measured concentrations on (1) soil sampled at sites of the original nested design, (2) a set of sites chosen close to 20 m from any of the previously sampled sites and (3) sites on a grid at 220-m intervals in those parts of the region with large gaps. In all there were 261 sampling sites providing 261 values, bar a few missing ones, for each metal. Table 2 summarizes the data, both on the original scales and after transformation where desirable. Although we did not analyse Fe geostatistically, we include it in the summary and in the principal components analysis (see below), because it helps to understand the calcium pattern.

We computed the Pearson correlation coefficients among the elements and did a principal components analysis on the correlation matrix for reasons that we explain below. The results are summarized in Table 3, from which one sees that almost 85% of the variance lies in the leading two components.

The sites are strongly clustered, one consequence of which is that the experimental variograms computed by the usual method of moments have strong peaks and troughs, which make modelling them uncertain. Marchant et al. (2013) found that in such a situation maximum likelihood estimation is better and gives stable results. It also has the advantage of fitting models over the whole range of the region. We used specifically residual maximum likelihood, reml, for the purpose. Having fitted models in this way, we compared the two most plausible models, exponential and spherical, by cross-validation. We did so by omitting each point in turn and predicting the value there by ordinary kriging from the rest of the data. The validated parameters were then the ones finally used for interpolation and mapping.
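The leave-one-out procedure can be sketched in plain Python as below. It is a minimal illustration of ordinary kriging with a global neighbourhood, not the software used for the survey; the function names, the coordinates, the values and the parameters of the exponential model are all hypothetical.

```python
import math


def exponential_variogram(h, c0, c1, a):
    """Exponential model: nugget c0, correlated variance c1, distance parameter a."""
    if h == 0.0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-h / a))


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for j in range(col, n + 1):
                m[k][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_kriging(coords, values, target, gamma):
    """Return the ordinary kriging prediction and kriging variance at target."""
    n = len(coords)
    # Semivariances among the data, bordered by ones for the unbiasedness constraint.
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(c, target)) for c in coords] + [1.0]
    solution = solve(a, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return prediction, variance


def leave_one_out(coords, values, gamma):
    """Omit each point in turn and predict it from the rest of the data."""
    results = []
    for i in range(len(coords)):
        prediction, variance = ordinary_kriging(
            coords[:i] + coords[i + 1:], values[:i] + values[i + 1:], coords[i], gamma
        )
        results.append((values[i], prediction, variance))
    return results


def gamma(h):
    """Hypothetical fitted model used only for this illustration."""
    return exponential_variogram(h, 0.01, 0.05, 100.0)


coords = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0), (50.0, 50.0)]
values = [2.0, 2.1, 1.9, 2.2, 2.05]
loo = leave_one_out(coords, values, gamma)
```

Each triple of observed value, predicted value and kriging variance returned by `leave_one_out` is what the cross-validation statistics are computed from.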

Table 4 lists the parameter estimates and cross-validation statistics for the spherical models, which fitted best and for which the equation is

$$\begin{aligned} \gamma (h)&= c_0 + c_1 \left\{ \frac{3h}{2r}-\frac{1}{2} \left( \frac{h}{r} \right) ^3 \right\} \quad \hbox {for } 0 < h \le r \\&= c_0 +c_1 \quad \hbox {for } h > r \\&= 0 \quad \hbox {for } h=0 . \end{aligned} \qquad (1)$$

The parameters are the nugget variance, \(c_0\), the variance of the spatially correlated component, \(c_1\), and the range, \(r\). We have treated the variation as isotropic, so that the lag \(h\) is a scalar in distance only.
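Equation (1) translates directly into a short function. The sketch below is illustrative only; the function name and the parameter values in the example are hypothetical and are not the estimates in Table 4.

```python
def spherical_variogram(h, c0, c1, r):
    """Spherical variogram of Eq. (1) with nugget c0, correlated variance c1 and range r."""
    if h < 0.0:
        raise ValueError("lag distance must be non-negative")
    if h == 0.0:
        return 0.0
    if h > r:
        return c0 + c1
    ratio = h / r
    return c0 + c1 * (1.5 * ratio - 0.5 * ratio ** 3)


# Hypothetical parameters: nugget 0.25, correlated variance 0.5, range 100 m.
for lag in (0.0, 50.0, 100.0, 150.0):
    print(lag, spherical_variogram(lag, 0.25, 0.5, 100.0))
```

At \(h = r\) the bracketed term equals 1, so the function reaches the sill \(c_0 + c_1\) continuously and stays there for longer lags.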

The cross-validation statistics are the mean error of prediction (ME), the mean squared error of prediction (MSE) and the mean squared deviation ratio (MSDR), i.e. the mean ratio of the squared deviation to the kriging variance. They are as follows, in which \(z(\mathbf{x}_i)\) is the observed value at \(\mathbf{x}_i\), \({\widehat{Z}}(\mathbf{x}_i)\) is the predicted value there and \(\sigma ^2_\mathrm{K}(\mathbf{x}_i)\) is the kriging variance. The averages are over the \(n\) data.

$$\begin{aligned} \text{ME}&= \frac{1}{n} \sum _{i=1}^n \left\{ z(\mathbf{x}_i)-{\widehat{Z}}(\mathbf{x}_i) \right\} , \\ \text{MSE}&= \frac{1}{n} \sum _{i=1}^n \left\{ z(\mathbf{x}_i)-{\widehat{Z}}(\mathbf{x}_i) \right\} ^2 , \\ \text{MSDR}&= \frac{1}{n} \sum _{i=1}^n \frac{\left\{ z(\mathbf{x}_i)-{\widehat{Z}}(\mathbf{x}_i) \right\} ^2}{\sigma ^2_\mathrm{K}(\mathbf{x}_i)} . \end{aligned}$$

We have added the median of the squared deviation ratio (medSDR):

$$\begin{aligned} \text{ medSDR } \ = \ \text{ median } \left[ \frac{\left\{ z(\mathbf{x}_i)-{\widehat{Z}}(\mathbf{x}_i) \right\} ^2}{\sigma ^2_\mathrm{K}(\mathbf{x}_i)} \right] \ . \end{aligned}$$

The mean errors are all close to zero, which is to be expected; kriging is an unbiased predictor. The mean squared errors are small. The important diagnostic is the MSDR. Ideally this should be 1; i.e. the squared deviation between the observed and predicted value should equal the kriging prediction error variance. The MSDRs for the first five elements listed in Table 4 are all close to 1; that for calcium is also sufficiently close to justify our accepting the model tabulated. The table includes the variances of the data, \(s^2\), for comparison with the sill variances, \(c_0+c_1\), of the models. The median of the squared deviation ratios should be close to 0.455 for a true model. All are less than this value; only that for copper is close.
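The four diagnostics, and the reference value of 0.455 for the median, can be reproduced with the Python standard library. This is a sketch under stated assumptions: the function name and the three observed, predicted and variance values are hypothetical, and the reference value is derived on the assumption that the standardized errors are standard normal, so that their squares follow a chi-squared distribution with one degree of freedom.

```python
from statistics import NormalDist, mean, median


def cross_validation_statistics(observed, predicted, kriging_variances):
    """Return ME, MSE, MSDR and medSDR from leave-one-out cross-validation results."""
    errors = [z - p for z, p in zip(observed, predicted)]
    ratios = [e * e / v for e, v in zip(errors, kriging_variances)]
    return {
        "ME": mean(errors),
        "MSE": mean([e * e for e in errors]),
        "MSDR": mean(ratios),
        "medSDR": median(ratios),
    }


# Median of the square of a standard normal deviate: the square of its upper quartile.
expected_median = NormalDist().inv_cdf(0.75) ** 2

# Hypothetical values, for illustration only.
stats = cross_validation_statistics([1.0, 2.0, 3.0], [1.5, 2.0, 2.5], [0.25, 0.25, 0.25])
print(stats)
print(round(expected_median, 3))
```

The median is less sensitive than the mean to a few large squared deviations, which is why it is a useful complement to the MSDR.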

Results

REML analysis and variograms

Summary statistics of concentrations for the elements are listed in Table 2. Among the elements, Ca had the largest mean concentration (1.81%), while the smallest was for As (20.14 mg kg\(^{-1}\)). The mean concentrations of other metals in order of magnitude were Mn 786 mg kg\(^{-1}\), Zn 233 mg kg\(^{-1}\), Pb 95.9 mg kg\(^{-1}\) and Cu 41.9 mg kg\(^{-1}\). The complete set of data exhibited a wide variation, with total concentrations varying from 0.7 to 4.3% for Ca, 67.7 to 3128 mg kg\(^{-1}\) for Zn, 425 to 1955 mg kg\(^{-1}\) for Mn, 18.6 to 896 mg kg\(^{-1}\) for Pb, 23.1 to 185 mg kg\(^{-1}\) for Cu, and 11.3 to 87.5 mg kg\(^{-1}\) for As. Notice that all except Ca had skewed distributions, and that is why we transformed the concentrations to logarithms to stabilize the variances. At several points, Pb and As exceeded the national guide values (400 and 22 mg kg\(^{-1}\), respectively, DOF 2007).

Fig. 1 The region surveyed and the sampling points in northwestern Mexico

Fig. 2 The unbalanced nested design used: (a) the topological tree of the design; (b) the design as it might appear on the ground. The red point is the main station; blue lines link points 300 m apart, green lines points 100 m apart, purple lines points 33 m apart, orange lines points 11 m apart and black lines points 3.6 m apart

Fig. 3 Variograms from the nested sampling in phase 1

Fig. 4 Variogram models fitted by reml from the whole set of data

The reml analysis of the nested sampling revealed that most of the variance occurs at distances between 33 and 100 m. Figure 3 shows that only small proportions of the variances for As, Cu, Mn and Ca occur at less than 33 m. Nevertheless, as Lark and Marchant (2018) pointed out, it is good practice to include sampling points close to one another to ensure that variograms are well estimated at short lag distances because those estimates have a large effect on the uncertainty of kriging predictions. Therefore, we refined the nested sampling by choosing 50 new sampling points, each 20 m from one of the previous nested points, and then filled the gaps with 51 more points on the grid.

The maps of concentrations

The concentrations were spatially dependent: the variograms of the logarithms of the concentrations of Pb, Zn and Mn and of the concentration of Ca (Fig. 4) were, in general, well structured with small nugget variances. The variograms of As and Cu had proportionately larger nugget variances, mainly, we think, because the error variances in the measurements are proportionately larger. Iron showed no spatial dependence; it seemed to be uniformly distributed in the region.

Figures 5, 6, 7, 8, 9, 10 show the spatial distributions of the concentration of the elements in the soil of agricultural land. All the elements but Ca (Fig. 10) are strongly concentrated around the tailings deposit, particularly to the west, the principal hot spot. Lead, Mn and Zn have similar spatial patterns with four other areas relatively rich in the three elements (Figs. 5, 6 and 7). One such area is in the centre of the region with values 2.85 log\(_{10}\)(mg kg\(^{-1}\)) for Pb, 3.15 log\(_{10}\)(mg kg\(^{-1}\)) for Mn and 2.9 log\(_{10}\)(mg kg\(^{-1}\)) for Zn. Another region relatively rich in these metals is somewhat to the south east of it, though with somewhat smaller concentrations. A fairly narrow belt of land also relatively rich in Pb and Mn extends south from the tailings deposit with concentrations reaching 2.1 log\(_{10}\)(mg kg\(^{-1}\)) for Pb and 2.9 log\(_{10}\)(mg kg\(^{-1}\)) for Mn.

As noted above, As and Cu are concentrated in the hot spot surrounding the tailings. In addition, both are relatively rich in the soil on either side of the Sonora River with concentrations of 1.25–1.30 log\(_{10}\)(mg kg\(^{-1}\)) for As and 1.45–1.50 log\(_{10}\)(mg kg\(^{-1}\)) for Cu. We suggest an explanation below. Elsewhere in the region their concentrations are less.
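To relate the mapped values on the logarithmic scale to the guide values quoted earlier (400 mg kg\(^{-1}\) for Pb and 22 mg kg\(^{-1}\) for As), the contour levels can be back-transformed as in the sketch below. It is a simple illustration with hypothetical names; raising 10 to the power of a kriged logarithm does not include the bias correction that would be needed to estimate the mean concentration on the original scale.

```python
# National guide values quoted in the text (DOF 2007), in mg/kg.
GUIDE_VALUES = {"Pb": 400.0, "As": 22.0}

# Contour levels quoted in the text, in log10(mg/kg).
CONTOUR_LEVELS = {"Pb": [2.85, 2.1], "As": [1.25, 1.30]}


def back_transform(log10_value):
    """Convert a value in log10(mg/kg) to mg/kg by simple exponentiation."""
    return 10.0 ** log10_value


for element, levels in CONTOUR_LEVELS.items():
    for level in levels:
        concentration = back_transform(level)
        exceeds = concentration > GUIDE_VALUES[element]
        print(f"{element}: {level:.2f} -> {concentration:.1f} mg/kg, exceeds guide value: {exceeds}")
```

On this simple reading the central area rich in Pb lies above the guide value for Pb, whereas the contour levels for As along the river lie somewhat below the guide value for As.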

Fig. 5 Map of Pb. The red polygon is the mine tailing, the blue dashed line corresponds to the El lavadero stream, the continuous blue line is the Sonora River, and the yellow lines are the levels of Pb in log\(_{10}\)(mg kg\(^{-1}\)). The wind rose was taken from Del Rio-Salas et al. (2019)

Fig. 6 Map of Mn. The red polygon is the mine tailing, the blue dashed line corresponds to the El lavadero stream, the continuous blue line is the Sonora River, and the yellow lines are the levels of Mn in log\(_{10}\)(mg kg\(^{-1}\)). The wind rose was taken from Del Rio-Salas et al. (2019)

Fig. 7 Map of Zn. The red polygon is the mine tailing, the blue dashed line corresponds to the El lavadero stream, the continuous blue line is the Sonora River, and the yellow lines are the levels of Zn in log\(_{10}\)(mg kg\(^{-1}\)). The wind rose was taken from Del Rio-Salas et al. (2019)

Fig. 8 Map of As. The red polygon is the mine tailing, the blue dashed line corresponds to the El lavadero stream, the continuous blue line is the Sonora River, and the yellow lines are the levels of As in log\(_{10}\)(mg kg\(^{-1}\)). The wind rose was taken from Del Rio-Salas et al. (2019)

Calcium is the most abundant metal that we measured. Its spatial distribution is evidently unrelated to those of the other metals, and it seems unaffected by the tailings (Fig. 10). It is the only element for which concentrations are smaller close to the tailings, where Pb, Zn and Mn are richest, than elsewhere.

The map of kriging variances for Pb, Fig. 11, shows how the prediction errors depend on the positions of the sampling points. The denser the sampling, the smaller the kriging variances. The maps of the kriging variances for the other elements have similar patterns, though the variances themselves are different, of course.

Finally, as noted above, the three elements Pb, Zn and Mn have similar spatial patterns; their patterns differ from those of As and Cu, and all differ substantially from the distribution of Ca. This distinction is neatly summarized in the correlation circle obtained from the principal components analysis (Fig. 12). The metals Pb, Zn and Mn are strongly correlated with one another and appear as a cluster of points close to the extreme right of the circle. Arsenic and Cu appear away from them, upper right, and Ca, evidently fairly closely related to Fe, appears far away in the upper left quadrant.
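The positions of the elements in the correlation circle follow from a standard property of principal components analysis on a correlation matrix, sketched here in notation introduced only for illustration. If \(\lambda _k\) is the \(k\)th eigenvalue of the correlation matrix of \(p\) variables and \(a_{jk}\) is the \(j\)th element of the corresponding unit eigenvector, then the correlation \(r_{jk}\) between variable \(j\) and component \(k\) satisfies

$$\begin{aligned} r_{jk} = a_{jk} \sqrt{\lambda _k} \quad \hbox {and} \quad \sum _{k=1}^{p} r_{jk}^2 = 1 . \end{aligned}$$

It follows that \(r_{j1}^2 + r_{j2}^2 \le 1\), so every element plots inside or on the unit circle, and the nearer a point lies to the circumference the more of that element's variance is accounted for by the first two components. The proportion of the total variance in those two components is \((\lambda _1 + \lambda _2)/p\).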

Discussion

Sampling to the nested design and the analysis of the data provided a sound guide for the subsequent grid survey for mapping. It showed at what spacings most of the variance occurs, which turned out to be at less than 100 m. Plots of the data on a map of the region also showed that the largest concentrations were near the pile of tailings. Those plots and the kriged maps show how the pollutant elements are concentrated around the tailings deposit; that deposit is a hot spot and evidently a major source of pollution. The elements Pb, Mn and Zn show strong spatial similarities that suggest a common transport process. In addition, the concentrations of As and Cu have a spatial pattern associated with the Sonora River, indicating an additional source of pollution. In contrast, Ca is less concentrated around the tailings deposit; it seems unrelated to the mining.

The dispersion of the elements around the tailings pile is likely to have been caused by the combined effects of water and wind. This combination of processes is widespread in arid and semi-arid regions where erosion by wind and water alternate with the changing seasons and interact with each other; it is a process that differs in its effects from those of wind and water separately (Yang et al. 2019). Tuo et al. (2014) found that the combined effect of wind and water erosion of the soil surface (0–1 cm) removed fine particles (\(<0.01\) mm) preferentially, leaving coarser particles (\(>0.05\) mm) in place. This suggests that the pollutant elements have been carried attached to the finer particles in the tailings and spread by this complex process.

Heavy rain, driven by moderate to strong wind, is especially erosive (Marzen et al. 2017), and it is likely to have re-distributed particles from the tailings in the patterns we observe in Figs. 5, 6, 7, 8, 9, 10. The wind roses in those figures show two predominant directions of the wind, namely towards the north-north-east and the south-south-east. Wind speeds ranging from 12 to 38 km hour\(^{-1}\), combined with heavy rain in short spells during summer, are quite sufficient to carry material from the tailings.

Fig. 9 Map of Cu. The red polygon is the mine tailing, the blue dashed line corresponds to the El lavadero stream, the continuous blue line is the Sonora River, and the yellow lines are the levels of Cu in log\(_{10}\)(mg kg\(^{-1}\)). The wind rose was taken from Del Rio-Salas et al. (2019)

Fig. 10 Map of Ca. The red polygon is the mine tailing, the blue dashed line corresponds to the El lavadero stream, the continuous blue line is the Sonora River, and the yellow lines are the levels of Ca in %. The wind rose was taken from Del Rio-Salas et al. (2019)

Fig. 11 Error map of Pb on the logarithmic scale. The red polygon is the mine tailing, the blue dashed line corresponds to the El lavadero stream, the continuous blue line is the Sonora River, and the red lines are contours of the kriging variance of Pb on the logarithmic scale. The red to pink discs are the eight nested sampling nodes; the additional points are shown as green stars and yellow crosses

Fig. 12 Correlations between the elements and the first two principal components plotted in the unit circle

The gully erosion of the tailings is likely to have contributed substantially to the enrichment of metals in the surroundings and towards the Sonora River along the path of the El Lavadero stream. As the soil has a large proportion of sand (40% or more) and contains little organic matter, it has rather few active sites on to which metals can bind, thereby allowing the metals to be transported by leaching in infiltrating water or in run-off. Efflorescent salts in the San Felipe tailings are rich in Pb and Mn in particular (Del Rio-Salas et al. 2019). The fine fractions of these salts are susceptible to wind erosion because of the weak cohesion between particles (Sanchez-Bisquet et al. 2017). Wind carries significant amounts of dust from tailings deposits in the dominant wind direction (Moreno-Brotons et al. 2010), creating trends with increasing distance from tailings (Lark et al. 2017; Djebbi et al. 2017). Thus, Pb and Mn could be dispersed several hundred metres from the tailings in the form of efflorescent salts. Also, the water-soluble salts of Mn, Zn and Pb in the efflorescent deposits could contaminate water from the tailings (Del Rio-Salas et al. 2019). These salts would then be carried via channels into the Sonora River.

When the rain is especially heavy, flooding spreads the pollutants, both in solution and as particles, over the flood plain to generate the spatial patterns that we observe in our study. The maps show where the pollutants are so concentrated that remediation should be considered. They also show where to prioritize further studies on the mobility of the pollutants in the light of other properties of the soil that are likely to enhance or retard mobility.

It is evident in Figs. 8 and 9 that much of the As and Cu derives from the tailings. These elements are concentrated around the El lavadero stream which connects the tailings pile with the Sonora River. Nieva et al. (2021) found that the mineralogical composition of the efflorescent salts depended on climate (specifically climate with alternating dry and wet seasons). They found that during the wet season, copiapite is the dominant mineral in the salts precipitated in the pores of the tailings, where the arsenates substituted the sulfates, converting the copiapite into an As reservoir. This arsenic can be released during the short spells of summer rain. Del Rio-Salas et al. (2019) found that the efflorescent salts of San Felipe de Jesús contain up to 26% of copiapite, so this could be an important process for the spread of As from the tailings.

Arsenic and copper are also spread more widely, with some of their larger concentrations close to the Sonora River (Figs. 8 and 9). It is likely that some of this As has come from spills from mines in the northern sector of the Sonora basin. Gomez-Alvarez et al. (1990, 1993), SEMARNAT (2014) and Silva-Rodriguez (2019) have documented such spills from mine wastes to the north of our region. Some of those discharges were rich in As and Cu, and after attaching themselves to soil and sediment they remained along the river channel (Rivera-Uria et al. 2018). The metal-enriched material could then be re-mobilized during heavy rain and dispersed downstream on the flood plain (Foulds et al. 2014). It is likely therefore that the current spatial pattern of As and Cu arises from mine discharges at various times in the past. Martín-Peinado et al. (2015) reported similar persistent residual pollution (including As and Cu) 15 years after a mine spill in Aznalcóllar, Spain. Although remediation measures were implemented immediately, spilled material from the tailings remained mixed with the soil as a major source of pollution (García-Carmona et al. 2019).

We still need better understanding of metal pollution in this region. Not only have metals from the mine waste polluted the soil, they are also mobile in the soil and likely to be taken up by plants. Loredo et al. (2020) and Morales-Pérez et al. (2021) analysed samples of the soil from close to the front of the mine tailings. They found that Mn, Zn and As are highly mobile in the soil there and that Zn and Pb exceeded the threshold limits of phytoaccessibility. Such assessments need to be extended throughout the 900-ha region where concentrations are now seen to be large.

Conclusions

Our experience of splitting the survey of pollutant metals in the soil at San Felipe de Jesús in northwestern Mexico into two stages shows the merit of preceding grid survey for mapping with a nested design and analysis to establish the scale(s) at which most variance occurs. It allowed us to plan an affordable sampling in the second stage that would provide predictions with acceptable error. Despite the several papers setting out the procedure, and software now embodying reml for the analysis of spatially nested data, the technique seems under-used.

The survey revealed widespread large concentrations of Pb, As, Zn, Cu and Mn in the soil of the region. The maps made by kriging from the sample data show clearly that the largest concentrations are associated with the tailings deposit on the western margin of the region. Concentrations of Pb, Zn and Mn decrease with increasing distance from the deposit, and it seems likely that the metals were transported by wind and water from the tailings. Arsenic and Cu are also concentrated close to the Sonora River, almost certainly with material from mine spills in the north of the basin. Land managers and responsible agencies can now focus on those parts of the region most seriously affected to restrict agriculture and plan feasible remediation.