Introduction

Heavy metals (HMs) occur naturally in soils, and their abundance reflects their contents in the parent rocks from which the soils are formed. Enrichment of soil with HMs suggests that the metals were not supplied by weathering or other soil-forming processes. Such enrichment may originate from numerous external anthropogenic sources, which can be categorized into point and nonpoint sources (Mattigod and Page 1983). Heavy metals are among the most important soil pollutants and are of concern from both agricultural and environmental standpoints (Papazoglou et al. 2007). They are characterized by long biological half-lives and high toxicity in the geomedia and therefore tend to accumulate in the surficial environment, including soils and sediments. Because metals are ubiquitous in time and space, adverse effects on human health are highly probable. Soil contaminated with HMs can produce apparently normal crops that may be unsafe for human or animal consumption, and HMs can accumulate over time in human organs and seriously disturb normal body function.

Multivariate statistical approaches, including principal component analysis (PCA) and cluster analysis (CA), are statistical tools used to separate the sources contributing to observed pollution (Huang et al. 1994). PCA methods have been widely used in geochemical applications to soils (Qiao et al. 2011) to identify pollution sources and to distinguish natural from anthropogenic inputs. Cluster analysis is often coupled with PCA to check the results and to group individual parameters and variables; applications to sediments are common (Ruiz et al. 1998; Hu et al. 2004). In this study, source apportionment was carried out by integrating PCA with multiple linear regression (MLR).

In addition, GIS has been used successfully in environmental pollution studies because of its capabilities for spatial analysis and for visualizing the spatial patterns of HMs. The combination of GIS with multivariate statistical approaches has been used effectively in soil pollution studies (Facchinelli et al. 2001; Zhang et al. 2006) and in the identification of nonpoint pollution sources (Chowdary et al. 2005).

The pollution sources in the study region can be organized into two groups: (1) point sources, which originate from local industrial activities (e.g., private mechanic workshops), transportation activities and sites used for wastewater disposal, and (2) nonpoint sources, which include intensive application of phosphatic fertilizers and pesticides and irrigation with wastewater.

The study aims to:

  • Investigate the concentrations of HMs in the surface soil of an agricultural area using spatial environmental monitoring techniques, and analyze their possible sources by means of multivariate statistical techniques.

  • Estimate the contributions of the potential pollution sources to each soil HM.

  • Assess the type and extent of the heavy metal pollution problem in the study area (i.e., list the heavy metal hot spots) to support future planning of suitable site-specific management by decision makers.

  • Construct geochemical distribution maps of HMs and define their pollution patterns by interpolating the factor scores at the sampling points into continuous surfaces using the inverse distance weighting (IDW) method.

Materials and methods

Study area description

Sohag Governorate is located in Upper Egypt and occupies a major section (about 125 km long) of the Nile Valley. The study area is the stretch of the Nile basin in the middle sector of Sohag Governorate, between latitudes 26°07′00″ and 26°57′00″ N and longitudes 31°20′00″ and 32°14′00″ E. It lies close to the main road junction in the center of Sohag city and is characterized by heavy traffic congestion and crowding on all branches of the transport network. Physiographically, the study area is a flat terrain 16 to 20 km wide (65 m above msl), bordered on both the east and the west by the higher-relief Eocene limestone plateau (Fig. 1). The area has a typical hot desert climate.

Fig. 1

Location map of the study area and sample locations. C and R refer to old cultivated and newly reclaimed soil samples, respectively

Geologically, the rock units exposed in the Sohag area (Fig. 2) consist entirely of a sedimentary sequence ranging in age from Lower Eocene to Recent; the main units are summarized in Table 1.

Fig. 2

Lithological map of the study area showing the main geologic units

Table 1 Main geological units in Sohag area

The agricultural lands of the study area can be differentiated into two main land-use sectors, namely the old cultivated lands and the newly reclaimed lands. The old cultivated lands occupy the Nile floodplain and are distributed along both the eastern and western sides of the River Nile. They are formed of fertile clay and silt sediments used for farming. Following construction of the High Dam, however, deposition of fertile Nile flood silt was severely curtailed, which led to extensive use of chemical fertilizers and other agrochemicals to improve soil productivity; consequently, the addition of various contaminants has become a critical environmental problem. The newly reclaimed lands, in contrast, are developed on the calcareous and sandy wadi deposits and therefore have low productivity. To improve the fertility of these lands and their suitability for reclamation, topsoil (20–40 cm) is usually transported from the more fertile floodplain sediments and spread over the soil surface. In addition, chemical fertilizers and other amendments are used extensively to overcome the low fertility of these soils. Ali (2005) concluded that the cultivated floodplain and the topsoil layer of the reclaimed lands in Sohag Governorate are silty in nature and have similar physicochemical properties.

Soil sampling and preparation

Sixty-eight composite surface soil samples were collected from the uppermost 10 cm and located using a global positioning system (GPS). To obtain representative samples, standard soil sampling procedures were followed (Mason 1992).

Prior to analysis, all samples were air-dried at room temperature, sieved through a 2-mm sieve, stored in polyethylene bags for subsequent analysis and designated as bulk samples. Because HMs are normally more concentrated in the fine mud fraction (<63 μm) than in the sand and coarser fractions, analyzing the fine fraction avoids the influence of soil texture on the determination of metal loads. Accordingly, the fine fraction (<63 μm) was separated from the bulk sample and dried in a thermostatically controlled oven at 105 °C for 24 h.

Soil analyses

Concentrations of the investigated metals (Cd, Co, Cr, Cu, Fe, Mn, Ni, Pb and Zn) were determined after complete dissolution in a mixture of HNO3–HF–HClO4 acids using a Buck Scientific Model 210 VGP atomic absorption spectrophotometer (AAS). The accuracy and precision of the procedure were verified by analyzing the international geological standards SGR-1 and GS-N. Quality control involved the analysis of randomly selected samples, blank samples and national standard samples in each run. Recovery rates for the determined metals were 95–108 %. Because the inorganic carbonate content varied widely among the investigated samples, total metal concentrations are expressed on a calcium carbonate-free basis. Total organic carbon was determined by the Walkley and Black (1934) method.
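The text does not state how the carbonate-free correction was computed. A common convention, assumed here, treats CaCO3 as a metal-free diluent and rescales a bulk concentration C by 100/(100 − CaCO3 %). A minimal Python sketch under that assumption (the function name is illustrative):

```python
def carbonate_free(conc, caco3_pct):
    """Express a bulk concentration on a CaCO3-free basis.

    Assumed convention: carbonate is a metal-free diluent, so
    C_cf = C * 100 / (100 - CaCO3 %).
    """
    if not 0.0 <= caco3_pct < 100.0:
        raise ValueError("CaCO3 content must be in the range [0, 100) %")
    return conc * 100.0 / (100.0 - caco3_pct)


# Hypothetical sample: 120 ug/g Zn in a soil containing 20 % CaCO3
print(carbonate_free(120.0, 20.0))  # 150.0
```

The same correction applies to every metal in a sample, so it changes absolute levels but not the ratios between metals within that sample.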

Data treatment and conditioning for multivariate analysis

In the multi-element chemical analysis of geological materials, elements occur at widely different concentrations, so the data need to be standardized to a comparable mean and variance. However, standardization makes little sense if the data distributions are strongly skewed, as is often the case with applied geochemical data. Most statistical methods assume that observations come from a normally distributed population, but most geological data are not normally distributed; in practice, positive skewness is the norm for many geological variables (Chapman 1976; Rock 1988). The summary statistics of the untransformed single-element geochemical data (Table 2) show positive skewness and kurtosis coefficients greater than zero, and the Anderson–Darling test (Anderson and Darling 1952) applied to each variable shows the same departure from normality.
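The skewness, kurtosis and Anderson–Darling checks described above can be sketched with SciPy, assuming `scipy.stats.skew`, `scipy.stats.kurtosis` (Fisher definition, so a normal distribution gives 0) and `scipy.stats.anderson`. The example values are synthetic lognormal data, not the study data, and the function name is hypothetical. Because `scipy.stats.anderson` reports tabulated critical values rather than a p value, the sketch compares the statistic with the 5 % critical value.

```python
import numpy as np
from scipy import stats


def normality_summary(x, alpha_pct=5.0):
    """Skewness, excess kurtosis and an Anderson-Darling normality check."""
    x = np.asarray(x, dtype=float)
    ad = stats.anderson(x, dist="norm")
    # anderson() tabulates critical values at 15, 10, 5, 2.5 and 1 % significance
    idx = list(ad.significance_level).index(alpha_pct)
    return {
        "skewness": float(stats.skew(x)),
        "excess_kurtosis": float(stats.kurtosis(x)),  # Fisher: normal -> 0
        "ad_statistic": float(ad.statistic),
        "reject_normality": bool(ad.statistic > ad.critical_values[idx]),
    }


# Synthetic, positively skewed "concentrations" (lognormal), not the study data
rng = np.random.default_rng(42)
pb = rng.lognormal(mean=3.0, sigma=0.8, size=68)
print(normality_summary(pb))
```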

Table 2 Descriptive statistics of the total HMs content, organic matter (OM) and calcium carbonate (CaCO3) in the examined surface soil samples (n = 68)

In this study, the common base-10 logarithmic transformation and several other transformations (natural log, square root, reciprocal and Box–Cox) were tested, but the resulting p values indicated that none of them was suitable, so these transformations could not be used. Wright (1989) suggested that in such cases a rank transformation of the data may be helpful. After rank transformation of our data, the p values of all studied variables ranged from 0.046 to 0.054, which confirms the success of this transformation. Accordingly, all bi- and multivariate statistics were calculated on ranked data (Swan and Sandilands 1995).
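A minimal sketch of the rank transformation, assuming SciPy's `rankdata` (tied values receive the average of the ranks they span). One useful consequence is shown at the end: a Pearson correlation computed on rank-transformed data is Spearman's rank correlation, so a correlation matrix of ranked data is a matrix of Spearman coefficients. The values are made up for illustration.

```python
import numpy as np
from scipy import stats


def rank_transform(data):
    """Rank each column (variable) of a samples x variables array.

    Ties receive the average of the ranks they span (scipy's default).
    """
    return stats.rankdata(np.asarray(data, dtype=float), axis=0)


# Made-up values for two variables measured on five samples
cd = np.array([0.4, 0.9, 0.9, 3.5, 12.0])
caco3 = np.array([2.0, 5.0, 4.0, 20.0, 35.0])
ranks = rank_transform(np.column_stack([cd, caco3]))
print(ranks[:, 0])  # the tied 0.9 values share rank 2.5

# Pearson r on ranks equals Spearman's rho on the raw values
r_on_ranks = np.corrcoef(ranks, rowvar=False)[0, 1]
rho, _ = stats.spearmanr(cd, caco3)
print(np.isclose(r_on_ranks, rho))  # True
```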

A correlation matrix (CM) was also used to identify relationships between HM contents and soil properties; the results are presented in Table 4. PCA results were interpreted in terms of the hypothesized source of the HMs (geogenic, anthropogenic or mixed). Varimax rotation was applied because this orthogonal rotation minimizes the number of variables with high loadings on each component and thus facilitates interpretation of the results; see Table 4. Because PCA provides only qualitative information about pollution sources and cannot quantify the contribution of each source type to each variable (Pekey et al. 2004), PCA was integrated with multiple linear regression (MLR) to quantify source contributions (Tauler et al. 2008).
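The PCA-with-varimax step can be sketched in NumPy as follows. This is an illustrative implementation, not the MINITAB or STATISTICA routine used in the study: PCA is taken on the correlation matrix, loadings are eigenvectors scaled by the square root of their eigenvalues, and varimax uses the widely used SVD-based iteration. The synthetic data merely stand in for 68 samples of 11 ranked variables. Because the rotation is orthogonal, the communalities (row sums of squared loadings) are unchanged; only their distribution across components becomes simpler.

```python
import numpy as np


def pca_loadings(data, n_components):
    """PCA on the correlation matrix of a samples x variables array.

    Returns the first n_components loadings (eigenvectors scaled by the
    square root of their eigenvalues) and all eigenvalues, largest first.
    """
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)   # same as PCA on standardized data
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    return loadings, eigvals


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns rotated loadings and the rotation."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - (gamma / p) * rotated @ np.diag(np.sum(rotated**2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient direction
        new_criterion = np.sum(s)
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


def varimax_criterion(loadings):
    """Sum over components of the variance of the squared loadings."""
    return float(np.sum(np.var(loadings**2, axis=0)))


# Synthetic data standing in for 68 samples x 11 rank-transformed variables
rng = np.random.default_rng(0)
ranked = rng.normal(size=(68, 11))
loadings, eigvals = pca_loadings(ranked, n_components=4)
rotated, rotation = varimax(loadings)
print(np.round(rotated, 2))
```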

Table 3 Heavy metal concentrations from the literature (µg g−1)
Table 4 Linear correlation matrix of the “rank”-transformed data

In addition, cluster analysis was applied to classify the HMs into geochemical groups and to identify whether their sources are anthropogenic or natural (Li and Feng 2012). MINITAB® and STATISTICA® 7.0 for Windows were used for the statistical analysis.

GIS and spatial analysis

A geographical information system (GIS) is a system for managing, manipulating, analyzing and presenting geographically related information (Collins et al. 1995). Geochemical maps prepared in a GIS can be used to identify soil contamination hot spots and to assess their potential sources. ArcGIS 9.3 software was used to map the geochemical properties.

Results and discussion

Characteristics of total HMs concentrations in surface soils

Univariate descriptive statistics of the HMs and the main soil parameters are given in Table 2, and the data are plotted as box-and-whisker diagrams in Fig. 3.

Fig. 3

Box-and-whisker plot of the total HMs showing the median, lower and upper quartiles, and minimum and maximum values of the 9 HMs examined in the <63-µm fraction of the surface soil. Fe in % and other metals in µg g−1

As shown in Table 2, HM contents vary distinctly among the sampled soils. The median concentrations of Cd, Co, Cr, Cu, Fe, Mn, Ni, Pb and Zn are 0.89, 39, 162 and 74 µg g−1, 7.20 %, and 1434, 72, 25 and 164 µg g−1, respectively, giving the descending order Fe > Mn > Zn > Cr > Cu > Ni > Co > Pb > Cd. According to Moalla et al. (1998), the total iron content of the Nile sediments of Lake Nasser ranges from 5.1 to 13.1 %. Near the study area, Dekov et al. (1997) reported an average Fe content of 8.1 % in the silt/clay fraction. Cobalt, Ni and Mn occur in the divalent state and are found mostly in ferromagnesian minerals, where their divalent ionic radii allow them to substitute readily for Fe(II) (Goldschmidt 1958). Hence, the higher contents of Fe, Mn, Co and Ni (rock-forming HMs) can be attributed to the lithogenic nature of the Nile sediments, which are rich in clay and mafic minerals. These minerals, in turn, strongly reflect the source rock composition in the hinterland, particularly the Ethiopian basaltic plateau (Omer 1996).

The wide concentration ranges indicate a nonhomogeneous distribution of elements throughout the study area, with anomalous values distributed unevenly. This nonhomogeneity may be due to both natural and anthropogenic loads. In contrast, the lower coefficients of variation (CV%) of Co, Fe, Cr, Ni and Mn suggest that these elements are relatively homogeneously distributed in the surface soils of the area (Zhao et al. 2010; Luo et al. 2012). The CV% values of Pb, OM, CaCO3 and Zn, and partly of Cd and Cu, are clearly higher, suggesting anthropogenic sources (Yang et al. 2016).
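For reference, CV% is the sample standard deviation divided by the mean, multiplied by 100. The sketch below uses only Python's standard library and hypothetical concentrations chosen to contrast a narrowly spread, lithogenic-type element with a widely spread, anthropogenic-type one; the text gives no numerical CV threshold, so none is applied.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (%) = 100 * sample standard deviation / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * stdev(values) / m


# Hypothetical concentrations (ug/g), not the study data
co = [36, 38, 39, 40, 42]    # narrow spread, lithogenic-type behavior
pb = [12, 18, 25, 60, 140]   # wide spread, anthropogenic-type behavior
print(f"Co CV = {cv_percent(co):.1f} %, Pb CV = {cv_percent(pb):.1f} %")
```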

In terms of median values, all metals except Ni show higher concentrations than the global average shale (Turekian and Wedepohl 1961; Fig. 4), whereas the Ni concentration is approximately equal to the shale average. The Fe, Mn and Co values are considerably enriched relative to Turekian and Wedepohl's (1961) average. Such high concentrations, coupled with low standard deviations, suggest a predominantly natural (i.e., indigenous lithologic) source for these elements. The mafic affinity of the source rocks of the cultivated floodplain is responsible for the elevated levels of these metals compared with the average Earth's crust and other world soils (Table 3).
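The comparison with the average shale can be made explicit as the ratio of each study median (from the values given above) to the shale value. The shale values below are those commonly quoted from Turekian and Wedepohl (1961) and should be checked against the original table before reuse.

```python
# Median concentrations reported in the text (Fe in %, others in ug/g)
medians = {"Cd": 0.89, "Co": 39, "Cr": 162, "Cu": 74, "Fe": 7.20,
           "Mn": 1434, "Ni": 72, "Pb": 25, "Zn": 164}

# Average shale, as commonly quoted from Turekian and Wedepohl (1961); same units.
# Verify against the original table before reuse.
shale = {"Cd": 0.3, "Co": 19, "Cr": 90, "Cu": 45, "Fe": 4.72,
         "Mn": 850, "Ni": 68, "Pb": 20, "Zn": 95}

ratios = {el: medians[el] / shale[el] for el in medians}
for el, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{el}: {ratio:.2f}")
```

A ratio above 1 indicates enrichment relative to shale; with these values the Ni ratio is about 1.06, consistent with the statement that Ni is close to the shale average.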

Fig. 4

Comparison of the average concentrations of the measured elements in the investigated soil with the average shale

The relatively high Cr and Cu contents in the floodplain may be attributed to both geogenic and anthropogenic contributions. In contrast, Pb, Zn and Cd show both high concentrations and high CV% values in the surface soil samples, suggesting anthropogenic sources for these elements.

GIS visualization and spatial distribution of soil HMs

The GIS approach was adopted to check the interpretation of the results and to examine the spatial distribution of geochemical anomalies in the study area. Both the rank-transformed data and the factor scores were interpolated by inverse distance weighting (IDW) in ArcGIS 9.3. GIS-based maps were chosen for their capability to visualize spatial relationships between environmental data and other land features (Walsh 1988).
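IDW estimates the value at an unsampled location as a weighted mean of the sampled values, with weights 1/d^p that decay with distance d. The paper does not report the ArcGIS settings, so this NumPy sketch assumes a power p = 2 and uses all sample points rather than a search neighborhood; the function name and coordinates are hypothetical.

```python
import numpy as np


def idw(xy_known, values, xy_query, power=2.0):
    """Inverse distance weighted estimates at query points.

    Weights are 1 / d**power; a query point that coincides with a sample
    returns that sample's value. All samples are used (no search radius).
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.atleast_2d(np.asarray(xy_query, dtype=float))
    # pairwise distances, shape (n_query, n_known)
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    estimates = np.empty(len(xy_query))
    for i, d in enumerate(dist):
        exact = d == 0.0
        if exact.any():
            estimates[i] = values[exact][0]
        else:
            w = 1.0 / d**power
            estimates[i] = np.sum(w * values) / np.sum(w)
    return estimates


# Hypothetical values at the corners of a unit square
pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
vals = [0.0, 1.0, 2.0, 3.0]
print(idw(pts, vals, [(0.5, 0.5), (1, 0), (0.9, 0.9)]))
```

IDW never produces values outside the range of the samples, which is one reason it tends to show the "high value surrounded by lower values" bull's-eye patterns seen in such maps.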

The distribution maps of HMs, carbonate and OM are shown in Fig. 5; areas of high metal content are surrounded by areas of relatively lower content. The surface layer of the reclaimed lands has a significantly higher carbonate content than the cultivated floodplain (Fig. 5), which can be ascribed to continuous mixing with the underlying calcareous subsoil. Anomalously high organic matter levels were recorded at sites near lands irrigated with wastewater or used for manual wastewater disposal. In general, the organic matter content at sites close to the wastewater disposal site at El-Dair is markedly higher than at El-Kola, reflecting the longer duration of wastewater disposal at El-Dair.

Fig. 5

Single-element distribution maps of HMs contents in the agricultural surface soil samples of the study area

Rock-forming HMs (Fe, Mn, Ni and Co) show similar spatial distributions, with higher contents generally following the relatively higher clay content (Ali 2005). The environmentally related HMs (Pb, Cu and Zn), which share similar distribution patterns, are higher in the soils of the central part of the study area than elsewhere. However, their distributions show no apparent similarity to those of OM or total carbonate, except in areas affected by wastewater irrigation and close to wastewater treatment plants. These high-value areas therefore suggest multiple anthropogenic sources of Pb, Cu and Zn, including illegal wastewater irrigation by local inhabitants (adjacent to the wastewater plants at El-Kola and El-Dair). Elsewhere, the higher concentrations of these elements are attributed to small-scale industrial activities and private vehicle repair workshops (close to Sohag city center) and to common agricultural practices such as long-term application of Cu-, Pb- and Zn-based pesticides and fungicides and of manure. The changes in heavy metal concentrations in the adjacent agricultural soils reflect diffuse anthropogenic emissions from urbanized areas. Materials such as gasoline, lubricants, tires and plastics, which contain high concentrations of Pb, Zn and Cu, are the main sources of these diffuse emissions (Callender 2003). Based on the spatial distribution maps and field investigation, heavy metal pollution arising from railway transportation could be identified (Malawska and Wiłkomirski 2000; Liu et al. 2009; Wiłkomirski et al. 2011). In general, high Pb concentrations follow the western side of the railway line in densely populated areas (Fig. 5). High Pb concentrations also occur near the high-traffic main roads crossing Sohag city.
Areas of elevated Pb concentrations coincide with areas of high Zn along the western side of the railway line, indicating that railway transport significantly affects Pb and Zn concentrations in the adjacent cultivated soils, as observed at Sohag city station, El-Monshah station and the rail crossings at Rawafi’a Esawiya and Ahaywa Gharb (Fig. 1). El-Kammar et al. (2009) studied the Nile sediments of Upper Egypt and concluded that Pb-polluted areas are localized and restricted to heavy-traffic zones and artisanal workshops close to the cities. The Cd distribution map closely matches the carbonate distribution pattern, with both increasing away from the Nile toward the reclaimed lands and the desert fringes on both the eastern and western sides. This behavior suggests that carbonate is the main carrier of Cd, a significant part of which may originate from anthropogenic sources related to agricultural practices. The Cr distribution is irregular, with high-value areas scattered randomly across the study area. Little work has been done on Cr in contaminated Egyptian alluvial soils. Soil Cr contamination arises mainly from uncontrolled emissions from artisanal activities such as tanneries and textile industries and from illegal disposal of wastewater, often untreated, into agricultural soils by private wastewater vans.

Correlation between HMs concentration and soil properties

The environmental geochemical conditions and the physicochemical characteristics of elements can be clearly interpreted by studying correlation coefficients (Zhang and Selinus 1998), and correlations between HMs provide information about their sources and pathways. Table 4 presents the correlation coefficients between soil properties and HMs, with Pearson correlation coefficients in the upper right triangle and the corresponding p values, which indicate the significance of each association, in the lower left triangle. Fe, Mn, Co and Ni correlate significantly with each other (p < 0.05) and show negative or negligible correlations with OM and calcium carbonate. These results confirm the natural geogenic source of these metals and reflect the diluting effect of both organic matter and carbonate in the surface soil. On the other hand, negative or negligible correlations were found between Fe and Pb, Zn and Cu, reflecting the anthropogenic origin of the latter elements. Lead, Zn and Cu are strongly correlated with each other (p(Pb, Zn) < 0.001, p(Pb, Cu) < 0.01 and p(Zn, Cu) < 0.001), and they are significantly correlated with OM (p < 0.001, p < 0.05 and p < 0.05, respectively), confirming the important role of OM in their accumulation in the surface soil.
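The layout of Table 4, with r above the diagonal and p values below it, can be reproduced with SciPy's `pearsonr`. Applied to rank-transformed data, as in this study, the coefficients are Spearman's rho. The data are synthetic and the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def corr_with_pvalues(data):
    """Pearson r above the diagonal and two-sided p values below it."""
    data = np.asarray(data, dtype=float)
    n_vars = data.shape[1]
    table = np.eye(n_vars)
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            r, p = stats.pearsonr(data[:, i], data[:, j])
            table[i, j] = r  # upper right: correlation coefficient
            table[j, i] = p  # lower left: significance of that coefficient
    return table


# Synthetic data: "Mn" tracks "Fe"; "Pb" is independent of both
rng = np.random.default_rng(1)
fe = rng.normal(size=68)
mn = fe + 0.3 * rng.normal(size=68)
pb = rng.normal(size=68)
ranked = stats.rankdata(np.column_stack([fe, mn, pb]), axis=0)
table = corr_with_pvalues(ranked)
print(np.round(table, 3))
```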

Cobalt and Mn are closely associated in the soils (p < 0.001) because of their similar chemical properties (Sposito 1989; Huang 1991; Post 1999; Kabata-Pendias and Pendias 2001), so Mn may strongly influence Co behavior in soil. Carbonate correlates very strongly with Cd (p < 0.001), followed by Pb (p < 0.01) and Cr (p < 0.05), whereas its correlations with all other variables are negative or not statistically significant. This indicates that carbonate is the main variable controlling the behavior of Cd and, to a lesser extent, Pb and Cr. In addition, Cd shows significant positive correlations with Pb (p < 0.001), Zn (p < 0.05) and Cr (p < 0.05), indicating a common origin. These relationships, together with the strong positive correlation of Cd with carbonate (p < 0.001) in the surface soil, suggest that Cd is partly affected by anthropogenic activities (i.e., the addition of fertilizers), besides the geogenic effect associated with the carbonate fraction. Thus, the behavior of cadmium is clearly controlled by both the natural soil composition and anthropogenic activities. According to published studies (Lu et al. 2010; Saeedi et al. 2012), a positive correlation between heavy metal variables indicates a common source, mutual dependence and identical behavior during transport. The significant positive correlations of Cr with Co and of Cu with Ni (p < 0.05) also point to both anthropogenic supply and an association with the geogenic composition of the surface soil.

PCA for HMs in the soils and spatial geochemical interpretation

PCA was performed on all eleven parameters (nine HMs, OM and CaCO3) for the 68 sampling locations. Following Liu et al. (2003), only components (factors) with eigenvalues greater than one were retained. Accordingly, four principal components were retained; together they account for 68 % of the total variance, leaving 32 % unexplained. The eigenvalues are displayed in Fig. 6.
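For p standardized variables the total variance equals p, so a component that explains x % of the variance carries a variance of p·x/100. Applied to the percentages reported for the four retained components (the text does not say whether they refer to unrotated or rotated components), this gives values above 1 that sum to about 68 % of the total. The helper applying the eigenvalue-greater-than-one rule is a hypothetical illustration.

```python
def kaiser_retain(eigenvalues):
    """Number of components with eigenvalue > 1 and their cumulative % of variance."""
    ev = sorted(eigenvalues, reverse=True)
    kept = [v for v in ev if v > 1.0]
    return len(kept), 100.0 * sum(kept) / sum(ev)


# Reported % variance of the four retained components (p = 11 standardized variables)
p = 11
pct = {"PC1": 21.47, "PC2": 20.26, "PC3": 15.27, "PC4": 10.95}
implied = {pc: p * share / 100.0 for pc, share in pct.items()}
for pc, var in implied.items():
    print(f"{pc}: variance {var:.2f}")
print(f"cumulative: {sum(pct.values()):.2f} %")
```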

Fig. 6

Scree plot of the different eigenvalues showing that the first four components are the most effective ones. Red circles are drawn for the four retained components. The horizontal line indicates an eigenvalue of 1

The extracted factors are expected to contain critical information for explaining metal contamination in the study region. Factor loadings >0.6 are regarded as significant in interpreting the data. Principal component 1 (PC1) explains 21.47 % of the total variance, with strong positive loadings for carbonate and Cd, moderate positive loadings for Pb (0.460) and Zn (0.549) and a low positive loading for Cr (0.133). These variables are relatively highly correlated with one another (Fig. 7). This association reflects the negative influence of agricultural practices on surface soil quality through excessive use of phosphorus and other inorganic fertilizers. Carbonate minerals play a major role in controlling soil properties and fixing such metals in the soil (McFadden et al. 1991). In oxidizing environments and alkaline soils, Cd behavior is governed by carbonate content, and Cd is likely to precipitate as otavite (CdCO3) as well as CdO and Cd(OH)2 (McBride 1994; Holm et al. 1996). Elzahabi and Yong (2001) concluded that at high pH and carbonate content, HMs are retained in soil mainly as carbonate salts. Lead, Zn and Cd are used in many industrial, urban and agricultural applications (Kabata-Pendias and Pendias 1992; Adriano 2001) and are often found together at HM-contaminated sites. The concentration ranges of HMs (e.g., Cd, Cr, Zn, As and Se) in phosphorus fertilizers clearly exceed their concentration ranges in soil (Mattigod and Page 1983), making phosphorus fertilizers one of the sources of heavy metal input into agricultural systems. On average, phosphate rock contains 11, 25, 188, 32, 10 and 239 µg g−1 of As, Cd, Cr, Cu, Pb and Zn, respectively (Mortvedt and Beaton 1995). Phosphate rock from Morocco and other North African countries, including Egypt, may contain much higher levels of these elements.
These metals are present at significantly high levels in some marketed final products such as triple superphosphate, which leads to their accumulation and increases the contamination potential of the soil, especially for Cd. Abdel Sabour and Rizk (2002) determined the chromium content of raw phosphate rock used in the Egyptian phosphate fertilizer industry (64 µg g−1) and of triple superphosphate products (98–103 µg g−1), and concluded that chromium is expected to accumulate in the long run because of the high application rates of phosphate fertilizers. Hassan et al. (2013) analyzed phosphate rock samples of local origin by XRF and found an average Zn content of 469 µg g−1. Hence, factor 1 represents the effect of P fertilizers and can be considered an anthropogenic factor.

Fig. 7

R-mode component loadings calculated for 11 variables measured on 68 surface soil samples from the study area. Those elements making large relative contributions to the component are indicated

Cobalt, Fe, Mn and Ni are tightly associated, forming a pronounced geogenic trend. These variables have high positive loadings on PC2 (accounting for 20.26 % of the total variance), whereas their loadings on the other PCs are negative or nonsignificant. Metals loaded on this trend show no anomalous levels in most samples (i.e., lower CV%; see Table 2). This association most probably has a natural geogenic source (i.e., it is lithogenically controlled) and corresponds to uncontaminated soil. Therefore, the Co, Fe, Mn and Ni contents are thought to be mostly of lithogenic origin throughout the study area, with no significant anthropogenic supply. Their strong positive mutual correlations (Table 4) confirm this tight co-association and support this interpretation. The positive loading of carbonate on PC2 reflects natural atmospheric input of wind-blown, carbonate-bearing dust originating from the adjacent limestone plateau and its residual soils.

The third component (PC3), which accounts for 15.27 % of the total variance, is formed by three chalcophile elements, Zn, Cu and Pb, together with OM, all of which have high loadings. The complete absence of any geological control supports a wholly anthropogenic origin, so this component apparently reflects anthropogenic sources, namely local artisanal workshops, wastewater irrigation carrying metal contaminants and possible use of inorganic pesticides containing Zn, Cu and Pb. Their high mutual correlations indicate that they are emitted from the same sources.

Chromium is isolated on PC4 (10.95 % of the variance), suggesting ambiguous behavior in the surface soil. Chromium is generally present in soils at low concentrations, but concentrations may be elevated by natural processes and by human activities such as fossil fuel combustion, sludge amendment of soil, fertilizer application and other agricultural practices. Consequently, Cr in the sampled agricultural soils has a mixed source, from both lithogenic and agricultural inputs. The weak positive loadings of Cu, Pb and Zn on this component possibly reflect deposition of particles from railway transport, implying distinctly different sources. In addition, because Cr-containing products are ubiquitous, a significant amount of Cr is introduced into agricultural soils via the application of municipal sewage sludge and from small factory units, including bakeries, car maintenance and washing facilities, and plating and finishing workshops. Most factories and production units are located in Sohag center, in close proximity to agricultural lands. Chromium in ambient air derives from natural sources, industrial and product uses, and the burning of fossil fuels and wood. The textile industry, which uses chromium-containing products to control algae and prevent corrosion and in the dyeing process, discharges effluents that contain chromium (IRPTC 1978).

The graphical projection of the PCA results clarifies the groupings of the geochemical parameters and their close relations. Chromium and copper plot between F1 and F2 and between F2 and F3, respectively (Fig. 8), indicating dual sources contributing to both Cr and Cu concentrations. Therefore, Cr and Cu originate from mixed sources: human activity and/or terrigenous input from the soil.

Fig. 8

Principal component analysis of the whole data set. Correlations between the element concentrations and principal components: a in the projection of factors 1 and 2, b in the projection of factors 1 and 3, and c in the projection of factors 1 and 4

The association of Cu, Cr, Zn, Pb and Cd (positive loadings on PC1; Fig. 8a, c) is caused by common agricultural practices. Such practices increase the concentrations of Cu, Pb and Zn, especially Cu, which is used as an algaecide and fungicide in the form of copper sulfate in viticulture (Mantovi et al. 2003). Phosphorus fertilizers have been estimated to account for more than 50 % of the total cadmium input to soils (De Meeûs et al. 2002). Soil Cr contamination arises mainly from uncontrolled emissions from metallurgical industries, improper disposal of dye, tannery and textile industrial wastes, and incorporation of sewage sludge into agricultural soils, whereas Pb is readily emitted from high-temperature processes such as Pb ore smelting and coal burning and from the use of leaded petrol in automobiles and agricultural machinery.

Figure 9 presents the correlations (R²) between the factor scores of these components. The correlations are negligible, confirming that the factors are essentially independent of each other.

Fig. 9

Correlation matrix plot of the factor scores

Source apportionment using absolute principal component score–multiple linear regression (APCS–MLR) receptor modeling

After identifying the possible pollution sources by PCA, the contribution of each source to the soil pollutants was calculated using the APCS–MLR receptor model (Pekey et al. 2004; Song et al. 2006). Details of the source contributions are shown in Fig. 10. Individually, PC1 (anthropogenic: inorganic fertilizer addition) contributes approximately 43 % to Cd, 31 % to Fe, 30 % to Ni, 22 % to Co and 25 % to Pb, and less than 20 % to Cr, Mn and Cu. PC2 (natural: weathering of source rock) contributes approximately 67 % to Co, 65 % to Ni, 61 % to Mn and 47 % to Fe, and less than 30 % to the remaining elements. PC3 (anthropogenic: contamination from different sources, including industrial discharge) contributes approximately 60 % to Cu, 57 % to Pb, 45 % to Zn and 23 % to Cd, and less than 20 % to the remaining variables. PC4 (mixed) contributes 59 % to Cr, 26 % to Mn and 16 % to Cu, and less than 10 % to Pb, Zn, Fe, Co, Cd and Ni.
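A minimal NumPy sketch of the usual APCS–MLR procedure: standardize the concentrations, compute unit-variance component scores, subtract the scores of an artificial sample with zero concentration of every variable to obtain absolute scores (APCS), then regress each variable on the APCS. Here the mean contribution of component k to variable j is taken as b_kj × mean(APCS_k), expressed as a percentage of the mean concentration, with the intercept as the unexplained part; this is one common convention, and the paper does not state which one was used. For brevity the sketch skips varimax rotation and uses synthetic two-source data with hypothetical profiles.

```python
import numpy as np


def apcs_mlr(conc, n_components):
    """Sketch of APCS-MLR source apportionment (no rotation).

    conc: samples x variables array of concentrations.
    Returns regression coefficients (row 0 = intercept) and the percentage
    contribution of each component (rows) to each variable's mean (columns).
    """
    conc = np.asarray(conc, dtype=float)
    mean, sd = conc.mean(axis=0), conc.std(axis=0, ddof=1)
    z = (conc - mean) / sd
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    # score coefficients that give unit-variance component scores
    w = eigvecs[:, order] / np.sqrt(eigvals[order])
    scores = z @ w
    # scores of an artificial sample with zero concentration of every variable
    z0 = (0.0 - mean) / sd
    apcs = scores - z0 @ w
    # regress every variable on the APCS at once (column 0 = intercept)
    design = np.column_stack([np.ones(len(conc)), apcs])
    coefs, *_ = np.linalg.lstsq(design, conc, rcond=None)
    # mean contribution of component k to variable j: b_kj * mean(APCS_k)
    contrib = coefs[1:] * apcs.mean(axis=0)[:, None]
    pct = 100.0 * contrib / conc.mean(axis=0)
    return coefs, pct


# Synthetic data: two hypothetical sources mixed into four variables
rng = np.random.default_rng(7)
n = 68
sources = np.column_stack([rng.gamma(4.0, 1.0, n), rng.gamma(2.0, 1.0, n)])
profiles = np.array([[10.0, 8.0, 0.5, 1.0],   # "lithogenic" profile
                     [0.5, 1.0, 6.0, 5.0]])   # "fertilizer" profile
conc = sources @ profiles + rng.normal(0.0, 0.5, size=(n, 4))
coefs, pct = apcs_mlr(conc, n_components=2)
print(np.round(pct, 1))
```

Because ordinary least squares with an intercept reproduces the mean of each variable, the component percentages plus the intercept share always sum to 100 % per variable.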

Fig. 10

Spidergram showing the source apportionment results generated by APCS–MLR modeling after Tauler et al. (2008)

Cluster analysis

Cluster analysis was applied alongside PCA to check the results and to group individual parameters and variables. Figure 11 shows the CA results for the nine HMs, carbonate and OM as a dendrogram in which the variables merge into three distinct clusters. These clusters correspond to the factors identified in the previous multivariate statistical analysis. The first cluster is composed of two subclusters: subcluster 1, consisting of CaCO3 and Cd, and subcluster 2, consisting of Cr. The second cluster comprises OM, Zn, Pb and Cu, whereas the third includes Ni, Co, Mn and Fe. Chromium is more closely related to the first cluster (CaCO3 and Cd) than to the other HMs. This result supports the earlier suggestion that the Cr contamination is most likely associated with Cd, which originated from long-term, heavy application of phosphorus fertilizers. Lead, Zn and Cu are very well correlated with each other and form another cluster with OM, indicating that part of the anthropogenic Pb, Zn and Cu is incorporated into the organic fraction. Nickel, Co, Mn and Fe are commonly associated with the soil parent materials. These results are in good agreement with the findings of the PCA.
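R-mode clustering as described in the Fig. 11 caption can be sketched with SciPy's `linkage` (method "ward") and `fcluster`. Each variable is treated as one observation, namely its vector of ranks across all samples. SciPy's Ward linkage works from Euclidean distances, and Ward's merge criterion is the increase in the within-cluster sum of squared Euclidean distances, matching the squared-Euclidean setting in the caption. Because every untied rank vector is a permutation of 1 to n, the distance between two variables shrinks as their Spearman correlation grows. The data are synthetic, built from three hypothetical drivers.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import rankdata


def cluster_variables(data, names, n_clusters):
    """R-mode Ward clustering of the variables (columns) of rank-transformed data."""
    ranks = rankdata(np.asarray(data, dtype=float), axis=0)
    # one observation per variable: its vector of ranks across all samples
    link = linkage(ranks.T, method="ward", metric="euclidean")
    labels = fcluster(link, t=n_clusters, criterion="maxclust")
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(int(label), []).append(name)
    return link, groups


# Synthetic data: three hypothetical drivers, each shared by a group of variables
rng = np.random.default_rng(3)
n = 68
lith, anthro, carb = rng.normal(size=(3, n))
names = ["Fe", "Mn", "Co", "Ni", "Pb", "Zn", "Cu", "OM", "CaCO3", "Cd", "Cr"]
drivers = [lith] * 4 + [anthro] * 4 + [carb] * 3
data = np.column_stack([d + 0.3 * rng.normal(size=n) for d in drivers])
link, groups = cluster_variables(data, names, n_clusters=3)
print(groups)
```

The returned linkage matrix can be passed to `scipy.cluster.hierarchy.dendrogram` with `labels=names` to draw a dendrogram like Fig. 11.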

Fig. 11

Dendrogram produced by clustering 11 variables. Cluster analysis was performed on ranked data using Ward’s hierarchical agglomerative method and squared Euclidean distance measure
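The clustering described in the Fig. 11 caption (ranked data, Ward's method) can be sketched with SciPy as below. This is an illustration only: it assumes a samples × variables matrix, ranks each variable across samples, and treats each variable as one observation to be clustered. SciPy's Ward linkage takes Euclidean input and applies Ward's minimum-variance criterion, which is defined on squared Euclidean distances, so the merge order should match a squared-Euclidean Ward implementation while the dendrogram heights are on a different scale. The helper `cluster_variables` and the synthetic data (three hypothetical latent sources mimicking the three clusters) are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import rankdata


def cluster_variables(data, n_clusters):
    """Ward clustering of the columns (variables) of a samples x variables array."""
    # Rank each variable across the samples (rank-based CA)
    ranks = rankdata(data, axis=0)
    # Rows become variables, so the variables (not the samples) are clustered
    tree = linkage(ranks.T, method="ward", metric="euclidean")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels, tree


# Synthetic example: 11 variables driven by 3 hypothetical latent sources
rng = np.random.default_rng(0)
names = ["CaCO3", "Cd", "Cr", "OM", "Zn", "Pb", "Cu", "Ni", "Co", "Mn", "Fe"]
source = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]  # hypothetical source of each variable
latent = rng.normal(size=(60, 3))
data = latent[:, source] + 0.1 * rng.normal(size=(60, len(names)))

labels, tree = cluster_variables(data, n_clusters=3)
cluster_of = dict(zip(names, labels))
```

The returned `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a figure comparable to Fig. 11.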

Component distribution maps

Inverse distance weighting (IDW) was applied to interpolate the factor scores of the soil samples in order to map the spatial distribution of the HMs and to identify “hot spot” zones (Reghunath et al. 2002; Lambrakis et al. 2004). Because the original data were standardized, the resulting factor scores are in the same form, i.e., they have a mean of zero and a standard deviation of one. Dalton and Upchurch (1978) showed that factor scores are related to the intensity of the chemical process described by each factor. Strongly negative scores (< −1) reflect areas essentially unaffected by the process, whereas scores above +1 reflect the most affected areas, termed “hot spots.” Near-zero scores correspond to areas affected to an average degree by the chemical process of that particular factor (Lawrence and Upchurch 1982).
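The two steps in the paragraph above, interpolating factor scores with IDW and classifying them against the ±1 thresholds, can be sketched as follows. The IDW power parameter is not reported here, so `power=2` is an assumption (a common default), as are projected x/y coordinates; the function names and the small example data set are hypothetical.

```python
import numpy as np


def idw(xy_obs, values, xy_grid, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimate of `values` at each grid point."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    xy_grid = np.asarray(xy_grid, dtype=float)
    values = np.asarray(values, dtype=float)
    # Pairwise distances, shape (n_grid, n_obs)
    d = np.linalg.norm(xy_grid[:, None, :] - xy_obs[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, eps) ** power
    est = (w @ values) / w.sum(axis=1)
    # A grid point that coincides with a sample takes that sample's score
    rows, cols = np.nonzero(d < eps)
    est[rows] = values[cols]
    return est


def classify_scores(scores):
    """Label factor scores: > +1 hot spot, < -1 unaffected, otherwise average."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores > 1.0, "hot spot",
                    np.where(scores < -1.0, "unaffected", "average"))


# Hypothetical example: three samples with PC1 scores, estimated at two points
sample_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
pc1_scores = np.array([2.0, -1.5, 0.2])
grid_xy = np.array([[0.0, 0.0], [0.5, 0.0]])
pc1_grid = idw(sample_xy, pc1_scores, grid_xy)
pc1_classes = classify_scores(pc1_grid)
```

Here the first grid point coincides with a sample and keeps its score of 2.0, so it is classed as a hot spot, while the second point averages neighbors of opposite sign and falls in the "average" class. A higher power makes the map more local, which sharpens isolated hot spots.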

Several hot spots with high score values (>1) were observed in the study area. Most PC1 hot spots were detected in the newly reclaimed lands that have been subjected to heavy fertilizer application. Fine carbonate in these soils acts as a sink for Cd and for other associated HMs released by agricultural activities (e.g., Pb, Zn and Cr). Some hot spots are also located in soils subjected to improper wastewater irrigation (close to the wastewater treatment plants and in fields in the southwest of the study area).

Only one of these hot spots belongs to the old cultivated soils; it is located close to the Sohag city center. Geographically, specific areas of elevated environmentally related HMs (Pb, Cu, Zn) can be identified, mainly adjacent to the densely populated center of Sohag city and along the west side of the railway line (station and rail crossings), where levels are most enhanced (Fig. 12). Two other hot spots occur at sites very close to land used for wastewater disposal and to sites occasionally used for manual wastewater disposal. These two spots can be related neither to agrochemical activities nor to population. Most of these hot spots are located in old cultivated soils.

Fig. 12

IDW spatial distribution maps of the principal component scores (PC1, PC2, PC3 and PC4) of soil HMs in the spatial principal component analysis

Elevated score values of factor 4, which has a high positive loading on Cr and a weak positive loading on Cu (Fig. 7), resemble the distribution of Cr concentrations discussed previously. This group, consisting of Cr and Cu, mainly originated from both natural erosion and nonpoint agricultural pollution, as indicated by the clear separation of Cr and Cu from the other HMs in the PCA (Fig. 8a, c). Among the HMs studied, Cr and Cu show relatively low pollution levels and low spatial variation, which is probably related to the spatial distribution of agricultural land (Figs. 5, 12). Although they differ in their geochemistry, these two metals may behave similarly, sharing the same anthropogenic sources and being trapped and accumulated in the mud fraction of the soils. Half of their hot spots are situated in the reclaimed lands, reflecting the impact of agricultural activities.

Conclusion and recommendations

This is a pioneering study documenting the HM load in the agricultural lands of Sohag. The study revealed that the median values of most HMs, including environmentally relevant ones such as Pb, Cr and Cd, were high relative to the average shale values of Turekian and Wedepohl (1961). These findings have serious implications for public health.

Both geogenic and anthropogenic contributions to HM concentrations in the surface soil were demonstrated. Source identification by PCA, integrated with GIS via factor maps and followed by apportionment using a multiple linear regression (MLR) receptor model, explained the possible pollution sources and separated geogenic from anthropogenic factors and their contributions. Such an approach can serve as an effective tool for agricultural planners and decision makers in assessing the environmental and health-related impacts of HMs in soil.

The present investigation highlights the most probable sources of heavy metal pollution in the soils, whether natural, anthropogenic or related to farming practices. In addition, areas affected by various soil degradation problems (i.e., hot spots) can be clearly located and prioritized on maps. Such regional-scale spatial information is highly valuable to decision makers seeking to sustain or increase crop productivity through appropriate site-specific soil–water management plans and remediation measures.

Based on the identified sources and hot spots, this study recommends the following control measures to improve soil quality in the study region:

  • The environmentally related HMs need to be continuously monitored in the surficial sediments along the whole Nile Valley.

  • Special attention should be paid to sites prone to contamination by anthropogenic activities such as industrialization, urbanization, traffic emissions, agriculture and wastewater disposal.

  • Detailed geochemical maps should be constructed for the various HMs along the Nile Valley to reveal the hot spots displaying anomalous levels of particular metals. Appropriate regulations and arrangements should then be adopted to control and mitigate the adverse health effects of the different contaminants.

  • The relationship between HM levels in the geoenvironmental media (sediments, soil and plants) and widespread environmental diseases needs to be quantified.

  • Sampling points should be concentrated around the identified hot spots, and a second stage of dense sampling should then be carried out to study these sites in more detail.

  • Controls should be introduced on the application of inorganic fertilizers, especially superphosphate.