Introduction

Potentially toxic elements (PTEs) are toxic metals and metalloids that threaten human health, the environment, or both (Adimalla et al., 2020a, 2020b; Agyeman et al., 2020). Pollution of agricultural soils by PTEs from natural and anthropogenic sources has received particular attention (Song et al., 2018; Zheng et al., 2018), and anthropogenic sources contribute even more to soil PTE pollution than natural ones. PTEs are found in virtually all environmental matrices, including soil, plants, and water (Agyeman et al., 2020). In recent years, agricultural land has been polluted by PTEs through diverse pathways, and inputs such as industrial wastewater and vehicle exhaust emissions have worsened the situation (Chen et al., 2015a; Wang et al., 2012). According to Liu et al. (2013), soil PTE pollution is non-biodegradable and persistent, and it has therefore been regarded as a global threat (Mamat et al., 2014).

Many studies have been carried out in recent years in many parts of the world on PTE toxicity, source identification, distribution patterns, and degree of pollution (Adimalla et al., 2020a, 2020b; Mazhari et al., 2018; Rastegari Mehr et al., 2017). Ruiz-Fernández et al. (2019) stressed that PTE concentrations in soil vary in space and time and can become enriched. Atafar et al. (2010) examined the effects of fertilizer use on PTE concentrations in agricultural soil and emphasized that excessive use of manure and phosphate fertilizers increases soil PTE content. According to Li et al. (2014), anthropogenic sources of soil PTEs include agriculture, urbanization, domestic waste, industrialization, and mining, and PTE leaching degrades food quality and poses risks to the ecosystem. The widespread use of fertilizers, manures, and agricultural waste in agricultural soils has the potential to degrade soil quality and damage terrestrial ecosystems (Hu et al., 2019). Both the local soil geochemistry and anthropogenic inputs are important sources of PTE pollution (Song et al., 2018).

Much research has assessed the distribution of PTE sources in agricultural soil and investigated the processes that contribute to reducing PTE pollution there. Estimating PTE pollution with indices (enrichment factor, pollution load index, ecological risk assessment) and applying multivariate analyses such as principal component analysis (PCA), positive matrix factorization (PMF), and the Pearson correlation matrix (PCM) provide quantitative knowledge. The results of pollution indices are considered reliable, allowing soil scientists to propose realistic solutions to ecological problems. Identifying sources, the relationships between PTEs, and the distribution of sources has been a primary technical approach to assessing soil quality in the soil science community, and soil scientists use it most frequently to determine the fractional contribution of each source to PTEs in polluted soil. Numerous authors, such as Rodríguez Martín et al. (2013), Huang et al. (2015), and Lü and He (2018), have indicated that multivariate statistical analysis such as PCA is used extensively worldwide to recognize patterns in PTE sources in soil. To further determine the percentage contribution of each source, many authors apply positive matrix factorization (PMF) or principal component analysis/absolute principal component score analysis-multilinear regression (PCA/APCS-MLR). Researchers such as Xue et al. (2014), Chen et al. (2016), and Agyeman et al. (2020) have used PMF to determine the potential source distribution in agricultural soil.

Receptor models are widely applied in source apportionment studies; well-known approaches include positive matrix factorization (PMF), UNMIX, principal component analysis/absolute principal component score analysis-multilinear regression (PCA/APCS-MLR), and chemical mass balance (CMB). In recent literature, Fei et al. (2020), Hossain Bhuiyan et al. (2021), Salim et al. (2019), Wu et al. (2020), and Zhang et al. (2020) relied mainly on PMF, APCS/PCA-MLR, or both to detect and quantify the sources of PTEs. PMF and APCS/PCA-MLR are preferred over other receptor models because (i) they rely on effective monitoring procedures and build a substantial database, which has become universal practice; (ii) unlike chemical mass balance (CMB), they do not require pre-quantified source profiles (i.e. backward tracking); and (iii) they can cope with large amounts of monitoring data (Lee et al., 2016).

The ability of PMF to apportion PTE sources is not in doubt, but some authors have raised concerns about its efficiency and its ability to minimize error. According to Yuanan et al. (2020), if the PTEs detected in surface soils have undergone significant selective enrichment, PMF may produce incorrect estimates; for instance, elements that come from the same source may no longer be co-apportioned. Similarly, Wu et al. (2020) reported that PMF could not adequately explain the differences in Cr, Cu, Ni, and Pb in surface soils across an entire area where the variance caused by soil parent materials may be large. In addition, Gholizadeh et al. (2016) reported that, for most water quality indicators, the R2 values of the predicted versus observed plots showed a better goodness of fit with APCS-MLR than with PMF for pollution source apportionment in the river waters tested.

PTEs have long been investigated because they pose a serious threat to the environment and can have devastating effects on flora, fauna, and humans. The region under investigation comprises several towns, including Havirov, Terlicko, Trinec, Bystrica, Jablunkov, Mosty u Jablunkova, and Hrcava. Trinec and Vitkovice are important areas for assessing the distribution of PTEs and their ecological implications, and steel industries operate in and around Ostrava. This study combines a pollution index, the enrichment factor (EF), with PMF to determine the source distribution of PTEs. This approach computes source apportionment from the estimated EF values of each PTE rather than from the raw concentration data. The study area is primarily agrarian but is also home to several industries, including steelworks and metalworks. Given its productive agricultural sector and active industrial activities, it is important to determine the health and quality of the soil in the study area. Against this backdrop, the objectives of this study are to (i) evaluate the concentrations of PTEs in agricultural soil at different depths, (ii) compare the pollution levels at different depths, (iii) investigate the spatial distribution of PTE concentrations at different soil depths, and (iv) quantify, identify, and compare the source contributions of PTEs in agricultural soil using the EF-PMF and PMF receptor models.

Materials and methods

The research area is in the Frydek Mistek district of the Moravian–Silesian Region of the Czech Republic (Fig. 1). The district has a rather rugged landscape dominated by the Moravian-Silesian Beskydy, part of the Outer Carpathians and the highest mountains in the district. The Carpathians have an undulating relief of highlands and valleys separated by the main inland depressions. The larger part of the district belongs to the Outer Western Carpathians, and only a small portion in the north and north-west lies outside them. Geologically, the district is predominantly coal-bearing, which made it suitable for mining by the Paskov and Staříč mines, both of which are now closed (Czso, 2019). The study area is located at approximately 49°41′0″ N, 18°20′0″ E, at altitudes between 225 and 327 m above sea level, and has a cold temperate climate with considerable rainfall even in the driest months. Frýdek-Místek has humid, partly wet summers and cool, dry, windy winters that are mostly cloudy. Temperatures typically vary between −5 °C and 24 °C over the year and are rarely below −14 °C or above 30 °C, while average annual precipitation is between 685 and 752 mm (Weather Spark, 2016). The district covers about 1208 km2, of which 39.38% is cultivated land and 49.36% is forest. The soils are clearly distinguished by their colour, structure, and carbonate content. They have medium to fine textures derived from parent materials that are mainly colluvial, alluvial, or aeolian deposits. In some areas, the topsoil and subsoil show mottles, mostly accompanied by concretions and bleaching. The predominant soil types in the region are Cambisols and Stagnosols (Kozák et al., 2010).
These soils are dominant in the Czech Republic, with elevation ranges of 160.6–455.1 m for Stagnosols and 59.6–493.5 m for Cambisols (Vacek et al., 2020).

Fig. 1

Study area map showing sampling points

Soil sampling and soil analysis

A total of 102 soil samples (51 topsoil and 51 subsoil) were collected from agricultural land in the district of Frydek Mistek. Samples were taken on a regular 2 × 2 km grid, with sampling points located using a handheld GPS unit (Leica Zeno 5 GPS), at depths of 0–20 cm (topsoil) and 20–40 cm (subsoil). The samples were packed in Ziploc bags, labelled, and transported to the laboratory, where they were air-dried, ground with a mechanical device (Fritsch disk mill pulverizer), and sieved (< 2 mm). One gram of the dried, homogenized, and sieved soil (< 2 mm) was placed in a labelled Teflon bottle. Seven millilitres of 35% HCl and 3 ml of 65% HNO3 were added to each bottle (using a separate automatic dispenser for each acid), and the cap was gently closed so that the sample could react overnight (aqua regia procedure) (Cools & De Vos, 2016). The mixture was then heated on a hot plate for 2 h to promote digestion and left to cool. It was then filtered, and the supernatant was transferred to a 50-ml volumetric flask and made up to 50 ml with deionized water. The diluted supernatant was filtered into 50-ml PVC tubes. Finally, 1 ml of this solution was further diluted with 9 ml of deionized water and filtered into a 12-ml test tube for the determination of pseudo-total PTE concentrations (Milićević et al., 2017). Metal concentrations were measured by inductively coupled plasma–optical emission spectrometry (ICP–OES; Thermo Fisher Scientific, USA) following standard procedures and protocols. Quality assurance and quality control were ensured by analysing the standard reference material NIST SRM 2711a (Montana II soil) and checking the results against its reference values.
Duplicate analyses were carried out to minimize analytical error.
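The digestion and dilution steps above imply a fixed conversion from the instrument reading to a soil concentration. The minimal Python sketch below illustrates that arithmetic under stated assumptions: ICP–OES reports mg/L in the final 10-fold dilution, the digest is made up to 50 ml, and the sample mass is 1 g, which gives an overall factor of 500 L/kg. The function name is hypothetical and is not part of any instrument software.

```python
def soil_concentration_mg_per_kg(
    measured_mg_per_l: float,
    digest_volume_ml: float = 50.0,    # assumption: digest made up to 50 ml
    secondary_dilution: float = 10.0,  # assumption: 1 ml digest + 9 ml deionized water
    sample_mass_g: float = 1.0,        # assumption: 1 g of sieved soil
) -> float:
    """Convert an ICP-OES reading (mg/L, diluted digest) to mg/kg dry soil (hypothetical helper)."""
    volume_l = digest_volume_ml / 1000.0
    mass_kg = sample_mass_g / 1000.0
    # Undo the secondary dilution, convert mg/L to mg in the digest, then divide by kg of soil.
    return measured_mg_per_l * secondary_dilution * volume_l / mass_kg


# Example: a reading of 0.05 mg/L corresponds to about 25 mg/kg in the soil.
print(round(soil_concentration_mg_per_kg(0.05), 6))
```

If a different digest volume or dilution was used for a given element, only the keyword arguments need to change.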

Pollution indices assessment

The pollution status of agricultural soils must be assessed to determine the impact and toxicity of PTE pollution. Accordingly, the single pollution index (PI), pollution load index (PLI), Nemerow pollution index (PINemerow), potential ecological risk factor (Er), and potential ecological risk index (RI) were used to assess the pollution status of the study area. Huang et al. (2018) and Sawut et al. (2018) argue that such indices reliably measure the status of soil pollution and the degree to which human activity affects the soil environment, and they are widely used to assess PTE pollution in agricultural soil. The local background values of the PTEs in the study area are As (10 mg/kg), Cd (0.2 mg/kg), Cr (70 mg/kg), Cu (25 mg/kg), Ni (30 mg/kg), Mn (545 mg/kg), Pb (50 mg/kg), and Zn (80 mg/kg).

Single pollution index (PI)

The single pollution index (PI) is the ratio of the PTE concentration in soil to its geochemical background value. PI was introduced by Tomlinson et al. (1980) and is given by

$${\text{PI}} = \frac{{C_{n} }}{{B_{n} }}$$
(1)

where Bn is the geochemical background value of the PTE in the soil (mg/kg) and Cn is the concentration of the PTE in the soil (mg/kg). PI is categorized as low (PI ≤ 1), moderate (1 < PI ≤ 3), considerable (3 < PI ≤ 6), or high (PI > 6).
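As an illustration of Eq. (1) and the PI classes, the sketch below computes PI for one PTE from the local background values listed above. It is a minimal Python example, not the authors' code; the function names are hypothetical, and the mean topsoil Cd concentration is used only as a worked number (in the study, PI is computed for each sample).

```python
# Local background values (mg/kg) for the study area, as listed in the text.
BACKGROUND_MG_KG = {
    "As": 10.0, "Cd": 0.2, "Cr": 70.0, "Cu": 25.0,
    "Ni": 30.0, "Mn": 545.0, "Pb": 50.0, "Zn": 80.0,
}


def single_pollution_index(element: str, concentration_mg_kg: float) -> float:
    """PI = Cn / Bn for one PTE (hypothetical helper)."""
    return concentration_mg_kg / BACKGROUND_MG_KG[element]


def classify_pi(pi: float) -> str:
    """Map a PI value to the four classes used in the text."""
    if pi <= 1:
        return "low"
    if pi <= 3:
        return "moderate"
    if pi <= 6:
        return "considerable"
    return "high"


# Example: the mean topsoil Cd concentration (1.65 mg/kg) gives PI = 8.25.
pi_cd = single_pollution_index("Cd", 1.65)
print(round(pi_cd, 2), classify_pi(pi_cd))
```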

Pollution load index (PLI)

PLI is used for the overall assessment of the degree of soil pollution and offers a simple way to express the soil deterioration resulting from the accumulation of PTEs. It was introduced by Tomlinson et al. (1980) and is given by

$${\text{PLI}} = \sqrt[n]{{\text{PI}}_{1} \times {\text{PI}}_{2} \times {\text{PI}}_{3} \times \cdots \times {\text{PI}}_{n}}$$
(2)

where n is the number of analysed PTEs. PLI is categorized as low (PLI ≤ 1), moderate (1 < PLI ≤ 2), high (2 < PLI ≤ 5), or extremely high (PLI > 5).
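PLI is the geometric mean of the single pollution indices of a sample. The minimal Python sketch below computes it in log space (equivalent to the n-th root of the product, but safer against overflow or underflow when many PIs are multiplied) and applies the PLI classes; the function names and input values are hypothetical.

```python
import math


def pollution_load_index(pi_values: list[float]) -> float:
    """Geometric mean of the single pollution indices, computed in log space."""
    if not pi_values or any(v <= 0 for v in pi_values):
        raise ValueError("PLI needs at least one strictly positive PI value")
    return math.exp(sum(math.log(v) for v in pi_values) / len(pi_values))


def classify_pli(pli: float) -> str:
    """Map a PLI value to the four classes used in the text."""
    if pli <= 1:
        return "low"
    if pli <= 2:
        return "moderate"
    if pli <= 5:
        return "high"
    return "extremely high"


# Example with two hypothetical PI values: the geometric mean of 2 and 8 is 4.
pli = pollution_load_index([2.0, 8.0])
print(round(pli, 6), classify_pli(pli))
```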

Nemerow pollution index (PINemerow)

PINemerow expresses the overall degree of soil pollution from the concentrations of all analysed PTEs (Qingjie et al., 2008). In this study, the index was applied to both soil layers (topsoil and subsoil). The formula is given by

$${\text{PI}}_{\text{Nemerow}} = \sqrt{\frac{\left( \frac{1}{n}\sum_{i = 1}^{n} {\text{PI}}_{i} \right)^{2} + {\text{PI}}_{\max}^{2}}{2}}$$
(3)

where \({\text{PI}}_{i}\) is the single pollution index of PTE i, n is the number of PTEs, and \({\text{PI}}_{\max}\) is the maximum single pollution index among all the PTEs. PINemerow values are interpreted as follows: ≤ 0.7, clean; 0.7–1, warning limit; 1–2, slight pollution; 2–3, moderate pollution; and ≥ 3, heavy pollution.
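The sketch below computes the commonly cited form of the Nemerow index, the square root of half the sum of the squared mean PI and the squared maximum PI, and maps it to the classes above. It is a minimal Python illustration with hypothetical function names and inputs; treating a value of exactly 3 as heavy pollution follows the "≥ 3" class in the text.

```python
import math


def nemerow_index(pi_values: list[float]) -> float:
    """PI_Nemerow = sqrt((mean(PI)^2 + max(PI)^2) / 2)."""
    mean_pi = sum(pi_values) / len(pi_values)
    max_pi = max(pi_values)
    return math.sqrt((mean_pi ** 2 + max_pi ** 2) / 2)


def classify_nemerow(value: float) -> str:
    """Map a Nemerow index value to the five classes used in the text."""
    if value <= 0.7:
        return "clean"
    if value <= 1:
        return "warning limit"
    if value <= 2:
        return "slight pollution"
    if value < 3:
        return "moderate pollution"
    return "heavy pollution"


# Example with hypothetical PI values for two PTEs: mean 2, maximum 3, so sqrt(6.5).
value = nemerow_index([1.0, 3.0])
print(round(value, 3), classify_nemerow(value))
```

Because the maximum PI enters the formula directly, a single strongly enriched element (Cd in this study) can raise the index even when the other PTEs are near background.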

Ecological risk assessment (ER and RI)

The potential ecological risk index (RI) is used to assess the degree of ecological risk posed by PTE concentrations in soil. RI was introduced by Hakanson (1980) and is given by

$${\text{RI}} = \mathop \sum \limits_{i = 1}^{n} E_{r }^{i}$$
(4)

where n is the number of PTEs and \(E_{r}^{i}\) is the potential ecological risk factor of PTE i, which is given by

$$E_{r}^{i} = T_{r}^{i} \times {\text{PI}}_{i}$$
(5)

\(T_{r}^{i}\) is the toxicity response coefficient of PTE i (Hakanson, 1980), and \({\text{PI}}_{i}\) is its single pollution index. The toxicity response coefficients used are 30 (Cd), 10 (As), 5 (Cu), 5 (Pb), 2 (Cr), 2 (Zn), 2 (Ni), and 1 (Mn). \(E_{r}^{i}\) has five classes: low risk (\(E_{r}^{i}\) ≤ 40), moderate risk (40 < \(E_{r}^{i}\) ≤ 80), considerable risk (80 < \(E_{r}^{i}\) ≤ 160), high risk (160 < \(E_{r}^{i}\) ≤ 320), and very high risk (\(E_{r}^{i}\) > 320). RI has four classes: low risk (RI ≤ 150), moderate risk (150 < RI ≤ 300), considerable risk (300 < RI ≤ 600), and very high risk (RI > 600).
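To show how the single risk factors add up to RI, the minimal Python sketch below multiplies each PI by the toxicity response coefficients listed above and classifies both quantities. The function names and the example sample (all PTEs at background except Cd) are hypothetical.

```python
# Toxicity response coefficients T_r^i as listed in the text (Hakanson, 1980).
TOXIC_RESPONSE = {"Cd": 30, "As": 10, "Cu": 5, "Pb": 5, "Cr": 2, "Zn": 2, "Ni": 2, "Mn": 1}


def risk_factors(pi_by_element: dict[str, float]) -> dict[str, float]:
    """E_r^i = T_r^i x PI_i for every PTE in the sample."""
    return {el: TOXIC_RESPONSE[el] * pi for el, pi in pi_by_element.items()}


def risk_index(pi_by_element: dict[str, float]) -> float:
    """RI = sum of E_r^i over the analysed PTEs."""
    return sum(risk_factors(pi_by_element).values())


def classify_er(er: float) -> str:
    for limit, label in ((40, "low"), (80, "moderate"), (160, "considerable"), (320, "high")):
        if er <= limit:
            return label
    return "very high"


def classify_ri(ri: float) -> str:
    for limit, label in ((150, "low"), (300, "moderate"), (600, "considerable")):
        if ri <= limit:
            return label
    return "very high"


# Hypothetical sample: every PTE at background (PI = 1) except Cd with PI = 5.
sample = {el: 1.0 for el in TOXIC_RESPONSE}
sample["Cd"] = 5.0
print(risk_factors(sample)["Cd"], risk_index(sample), classify_ri(risk_index(sample)))
```

The large coefficient for Cd means that a moderate Cd enrichment alone can move a sample into a higher RI class, which is consistent with Cd dominating the ecological risk results reported later.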

PMF receptor model

The PMF receptor model was run with the US EPA PMF 5.0 software, as described in the U.S. Environmental Protection Agency user guide (Norris et al., 2014). PMF is a multivariate factor analysis method for solving the chemical mass balance problem, in which the original data matrix X, of dimensions m × n (m samples and n elements), is decomposed as

$$X = {\text{GF}} + E$$
(6)

Here, G (m × p) is the factor contribution matrix, F (p × n) is the factor profile matrix, and E (m × n) is the residual error matrix, whose elements are given by

$$e_{ij} = x_{ij} - \sum_{k = 1}^{p} g_{ik} f_{kj}$$
(7)

where i indexes the samples (1 to m), j indexes the elements (1 to n), and k indexes the sources (1 to p).

The factor contributions and profiles are obtained by minimizing the objective function Q subject to non-negativity constraints; in the US EPA PMF program, the solution is computed with the Multilinear Engine-2 (ME-2) (Paatero, 1999):

$$Q = \sum_{i = 1}^{m} \sum_{j = 1}^{n} \left( \frac{e_{ij}}{u_{ij}} \right)^{2}$$
(8)

where uij is the uncertainty in the jth chemical element for sample i.
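US EPA PMF 5.0 minimizes Q with ME-2, which is not reproduced here. To show what "minimize Q under non-negativity" means in practice, the plain-Python sketch below swaps in a simpler and different algorithm: weighted multiplicative updates (a weighted non-negative matrix factorization) with weights 1/u_ij². It assumes non-negative data and strictly positive uncertainties, has no Fpeak rotation, robust mode, or error estimation, and all function names and toy data are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def pmf_q(x, g, f, u):
    """Q = sum_i sum_j ((x_ij - (GF)_ij) / u_ij)^2 over samples i and elements j."""
    gf = matmul(g, f)
    return sum(((x[i][j] - gf[i][j]) / u[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))


def weighted_nmf(x, u, p, iterations=500, seed=0, eps=1e-12):
    """Non-negative G (m x p) and F (p x n) that reduce Q via weighted multiplicative updates.

    Not ME-2: a simpler weighted NMF used only to illustrate the objective.
    """
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    w = [[1.0 / u[i][j] ** 2 for j in range(n)] for i in range(m)]  # weights 1/u_ij^2
    wx = [[w[i][j] * x[i][j] for j in range(n)] for i in range(m)]
    g = [[rng.uniform(0.1, 1.0) for _ in range(p)] for _ in range(m)]
    f = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(p)]
    for _ in range(iterations):
        # Update G: G <- G * ((W*X) F^T) / ((W*GF) F^T)
        wgf = [[w[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(matmul(g, f))]
        num, den = matmul(wx, transpose(f)), matmul(wgf, transpose(f))
        g = [[g[i][k] * num[i][k] / (den[i][k] + eps) for k in range(p)] for i in range(m)]
        # Update F: F <- F * (G^T (W*X)) / (G^T (W*GF))
        wgf = [[w[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(matmul(g, f))]
        num, den = matmul(transpose(g), wx), matmul(transpose(g), wgf)
        f = [[f[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)] for k in range(p)]
    return g, f


# Toy data: 4 samples x 3 elements built from two non-negative "sources", unit uncertainties.
x_toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [3.0, 4.0, 5.0]]
u_toy = [[1.0] * 3 for _ in range(4)]
g_fit, f_fit = weighted_nmf(x_toy, u_toy, p=2)
print(round(pmf_q(x_toy, g_fit, f_fit, u_toy), 6))
```

Dividing each residual by u_ij means that poorly measured values pull less on the fit, which is the property that distinguishes PMF from an unweighted factorization.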

The uncertainty is calculated from the element-specific method detection limit (MDL) and an error fraction determined from the standard reference material. Because all measured concentrations were above the MDL, the uncertainty was calculated as

$${\text{Unc}} = \sqrt{\left( {\text{Error Fraction}} \times {\text{concentration}} \right)^{2} + \left( 0.5 \times {\text{MDL}} \right)^{2}}$$
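A minimal Python sketch of the above-MDL uncertainty formula is given below. The function name and the example inputs (10 mg/kg, MDL 0.4 mg/kg, 10% error fraction) are hypothetical; values at or below the MDL are rejected rather than modelled, because the text states that none occurred in this dataset.

```python
import math


def pmf_uncertainty(concentration: float, mdl: float, error_fraction: float) -> float:
    """Unc = sqrt((error_fraction * concentration)^2 + (0.5 * MDL)^2), for concentration > MDL."""
    if concentration <= mdl:
        # The text states that all measured values exceeded the MDL, so this case is not modelled.
        raise ValueError("this formula applies only to concentrations above the MDL")
    return math.sqrt((error_fraction * concentration) ** 2 + (0.5 * mdl) ** 2)


# Hypothetical inputs: 10 mg/kg, MDL 0.4 mg/kg, 10% error fraction.
print(round(pmf_uncertainty(10.0, 0.4, 0.10), 4))
```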
(9)

The US EPA PMF 5.0 software also provides an Fpeak rotation, in which a nonzero Fpeak value rotates the solution to explore its rotational ambiguity (Paatero & Hopke, 2009; Paatero et al., 2002). Positive Fpeak values sharpen the F matrix and smear the G matrix, whereas negative Fpeak values sharpen G and smear F.

Enrichment factor (EF) and EF-PMF source apportionment

The enrichment factor (EF) is used to assess the extent to which PTE concentrations in soil are elevated by anthropogenic activities, and it is mostly used to distinguish natural from anthropogenic sources of PTEs (Kowalska et al., 2018). EF normalizes the concentration of each element to that of a reference element and is commonly used to estimate the impact of anthropogenic activities on metal abundance in sediments or soils. EF was calculated for each PTE using the definition of Ergin et al. (1991):

$${\text{EF}} = \frac{{C_{n} /C_{{{\text{ref}}}} }}{{B_{n} /B_{{{\text{ref}}}} }}$$
(10)

where Cn is the concentration of the examined element in the examined environment, Cref is the concentration of the examined element in the reference environment, Bn is the concentration of the reference element in the examined environment, and Bref is the concentration of the reference element in the reference environment. The world average values were used as reference values (Kabata-Pendias, 2011). EF values are interpreted as follows: EF < 2, deficiency to minimal enrichment; 2–5, moderate enrichment; 5–20, significant enrichment; 20–40, very high enrichment; and EF > 40, extremely high enrichment.
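The EF definition and its classes can be sketched in a few lines of Python. The reference element is not named in this section, so the inputs in the example are hypothetical numbers; the function names are also hypothetical, and assigning boundary values (for example exactly 2) to the upper class is an assumption.

```python
def enrichment_factor(c_n: float, c_ref: float, b_n: float, b_ref: float) -> float:
    """EF = (Cn / Cref) / (Bn / Bref), with the symbols defined in the text."""
    return (c_n / c_ref) / (b_n / b_ref)


def classify_ef(ef: float) -> str:
    """Map an EF value to the five enrichment classes used in the text."""
    if ef < 2:
        return "deficiency to minimal enrichment"
    if ef < 5:
        return "moderate enrichment"
    if ef < 20:
        return "significant enrichment"
    if ef < 40:
        return "very high enrichment"
    return "extremely high enrichment"


# Hypothetical values: the examined element is 4x its reference value while the
# reference element equals its own reference value, so EF = 4.
ef = enrichment_factor(c_n=40.0, c_ref=10.0, b_n=2.0, b_ref=2.0)
print(ef, classify_ef(ef))
```

Normalizing by the reference element compensates for natural differences in, for example, grain size or mineralogy between sites, so that EF reflects enrichment beyond what the reference element would predict.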

The source distribution was then determined from the EF values estimated for each PTE at each sampling point. Conventional PMF determines source apportionment from the raw concentration data obtained after field sampling and laboratory analysis. The novelty of this study is that the computed EF values, rather than the raw data, are used to determine the source distribution in the study area, an approach rarely applied with PMF. The resulting hybrid receptor model, EF-PMF, builds on the traditional PMF approach by combining it with the enrichment factor and is given as

$$({\text{EF}})_{{{\text{ij}}}} = \frac{{(C_{n} /C_{{{\text{ref}}}} )_{{{\text{ij}}}} }}{{(B_{n} /B_{{{\text{ref}}}} )_{i} }}$$
(11)

where \((EF)_{ij}\) is the enrichment factor of the jth PTE at the ith sampling site; \((C_{n} /C_{ref} )_{ij}\) is the ratio of the concentration of the examined element in the examined environment (Cn) to its concentration in the reference environment (Cref) for the jth PTE at the ith sampling site; and \((B_{n} /B_{ref} )_{i}\) is the ratio of the concentration of the reference element in the examined environment (Bn) to its concentration in the reference environment (Bref) at the ith sampling site.
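Applied to every site i and PTE j, Eq. (11) yields the matrix that EF-PMF passes to PMF in place of the raw concentrations. The minimal Python sketch below builds that matrix from hypothetical inputs; the function name is hypothetical, and how uncertainties were assigned to the EF inputs is not stated in this section, so that step is left out.

```python
def ef_matrix(
    concentrations: list[list[float]],   # m sites x n PTEs (mg/kg)
    element_refs: list[float],           # Cref for each PTE (e.g. world average values)
    reference_element: list[float],      # Bn at each site
    reference_element_ref: float,        # Bref
) -> list[list[float]]:
    """Return the m x n matrix (EF)_ij used as the PMF input X in EF-PMF."""
    return [
        [(c / element_refs[j]) / (reference_element[i] / reference_element_ref)
         for j, c in enumerate(row)]
        for i, row in enumerate(concentrations)
    ]


# Two hypothetical sites and two PTEs.
x_ef = ef_matrix([[2.0, 4.0], [1.0, 1.0]], [1.0, 2.0], [1.0, 2.0], 1.0)
print(x_ef)
```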

Multiple linear regression model

Multiple linear regression (MLR) describes the relationship between a response variable and several predictor variables through parameters that enter the model linearly and are estimated by least squares. Once the explanatory variables have been selected, the fitted least-squares model serves as a prediction function for the soil attribute of interest. Here, each PTE was used as a response variable, and the factor contributions from each receptor model (PMF and EF-PMF) were used as explanatory variables. The MLR equation is given as

$$y = a + \sum_{i = 1}^{n} b_{i} x_{i} + \varepsilon$$
(12)

where y is the response variable, a is the intercept, n is the number of predictors, \(b_{i}\) is the partial regression coefficient of predictor \(x_{i}\), \(x_{i}\) are the predictors (explanatory variables), and \(\varepsilon\) is the model error, also called the residual. The model was fitted in R using tenfold cross-validation (k = 10) repeated five times.
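The authors fitted the MLR models in R. Purely as an illustration of Eq. (12), the stdlib-only Python sketch below estimates the intercept a and coefficients b_i by least squares via the normal equations, with factor contributions as predictors. All names and data are hypothetical; for larger or ill-conditioned problems a QR-based solver, such as the one used by R's `lm`, is numerically preferable.

```python
def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(predictors: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares estimates [a, b_1, ..., b_n] via the normal equations."""
    design = [[1.0] + list(row) for row in predictors]  # leading column for the intercept a
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict_mlr(coefs: list[float], row: list[float]) -> float:
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Hypothetical factor contributions (two factors) and a PTE generated as y = 1 + 2*f1 + 3*f2.
factors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
pte = [1.0, 3.0, 4.0, 6.0, 8.0]
coefs = fit_mlr(factors, pte)
print([round(c, 6) for c in coefs])
```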

Data partitioning

A random split was used to divide the data into a training dataset (75%, for calibration) and a test dataset (25%, for validation). The training dataset was used to calibrate the regression models, and the test dataset was used to assess their generalization capability (Kooistra et al., 2003). This was done to assess the suitability of the models used to estimate PTE source apportionment. All models were subjected to tenfold cross-validation repeated five times, with the factor contributions from each receptor model used as predictors (explanatory variables) of the target variables. The modelling was carried out in R.
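The partitioning scheme can be sketched with the Python standard library: a random 75/25 split of sample indices, followed by tenfold cross-validation repeated five times on the training part. The study's R code is not given, so the function names and seeds below are hypothetical; only the proportions, k = 10, and five repeats come from the text.

```python
import random


def train_test_split_indices(n: int, test_fraction: float = 0.25, seed: int = 0):
    """Randomly split indices 0..n-1 into calibration (training) and validation (test) sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def repeated_kfold(indices: list[int], k: int = 10, repeats: int = 5, seed: int = 0):
    """Yield (train, validation) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i, validation in enumerate(folds):
            train = [j for other, fold in enumerate(folds) if other != i for j in fold]
            yield train, validation


# 51 samples per soil layer, as in this study.
train_idx, test_idx = train_test_split_indices(51)
splits = list(repeated_kfold(train_idx))
print(len(train_idx), len(test_idx), len(splits))
```

Repeating the cross-validation with different shuffles reduces the dependence of the validation scores on any single fold assignment, which matters with only 38 calibration samples.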

Accuracy assessment and validation

Several validation criteria were used to assess model accuracy and to identify the receptor model (PMF or EF-PMF) best suited to estimating source apportionment. The receptor models were evaluated using the mean absolute error (MAE), root-mean-square error (RMSE), and coefficient of determination (R2). R2 describes the proportion of the variance in the response that is explained by the regression model. RMSE, considered together with the variability of the independent measurements, indicates the predictive capacity of the model, while MAE gives the average magnitude of the prediction error in the units of the measured variable. A higher R2 indicates higher accuracy, and the closer R2 is to 1, the better the model. According to Li et al. (2016), an R2 of 0.75 is considered a satisfactory prediction. For RMSE and MAE, lower values indicate a better model. The validation criteria are defined by the following equations.

Mean absolute error

$${\text{MAE}} = \frac{1}{n}\sum_{i = 1}^{n} \left| Y_{i} - \hat{Y}_{i} \right|$$
(13)

R square

$$R^{2} = 1 - \frac{\sum_{i = 1}^{n} (Y_{i} - \hat{Y}_{i})^{2}}{\sum_{i = 1}^{n} (Y_{i} - \bar{Y})^{2}}$$
(14)

Root-mean-square error

$${\text{RMSE}} = \sqrt{\frac{1}{n}\sum_{i = 1}^{n} (\hat{Y}_{i} - Y_{i})^{2}}$$
(15)

where n is the number of observations, \(Y_{i}\) is the measured response and \(\hat{Y}_{i}\) the predicted response for the ith observation, and \(\bar{Y}\) is the mean of the measured responses.
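The three criteria can be computed with a few lines of standard-library Python, using the standard definitions (absolute value in MAE, square root in RMSE, and the mean of the measured values in the R2 denominator). The function names and the measured/predicted values below are hypothetical.

```python
import math


def mae(y: list[float], y_hat: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y: list[float], y_hat: list[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r_squared(y: list[float], y_hat: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted values: one prediction is off by 1.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(mae(measured, predicted), rmse(measured, predicted), r_squared(measured, predicted))
```

In this example RMSE (0.5) exceeds MAE (0.25) because squaring gives the single large error more weight; comparing the two indicates whether errors are evenly spread or concentrated in a few samples.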

Data analysis

Statistical analyses were performed using KyPlot. US EPA PMF 5.0 was used to estimate source apportionment, and RStudio was used for principal component analysis and the Pearson correlation matrix. Spatial distribution maps of the PTEs were produced in ArcGIS using inverse distance weighting (IDW) interpolation.

IDW spatial interpolation estimates the value of each cell by weighting the measured values at sample points in the vicinity of that cell. It is based on the assumption that points close to each other are more alike than points farther apart, so the influence of a measured value decreases with increasing distance from the sampling location. Nearer points receive larger weights, each weight being proportional to the inverse of the distance raised to the power p, where p is a positive real number (Shukla et al., 2020). The predicted value at the target location is the sum of the measured values multiplied by their normalized weights.
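A minimal Python sketch of the IDW estimate at a single target location is shown below. The power p is not stated in the text, so p = 2 is an assumption here, and the sketch uses every sample point, whereas GIS implementations such as ArcGIS usually also restrict the search radius or number of neighbours. The function name and coordinates are hypothetical.

```python
import math


def idw_estimate(samples: list[tuple[float, float, float]],
                 target: tuple[float, float], power: float = 2.0) -> float:
    """Inverse distance weighted estimate at `target` from (x, y, value) samples."""
    tx, ty = target
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        d = math.hypot(x - tx, y - ty)
        if d == 0.0:
            return value  # target coincides with a sampling point
        w = 1.0 / d ** power  # weight = inverse of distance raised to the power p
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total  # equivalent to normalized weights summing to one


# Two hypothetical sampling points on a line (coordinates in km).
points = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw_estimate(points, (1.0, 0.0)), idw_estimate(points, (0.5, 0.0)))
```

Because IDW is an exact interpolator bounded by the observed range, hotspots on IDW maps always coincide with sampled points; with a 2 × 2 km grid, features smaller than the sampling interval cannot be resolved.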

Results and discussion

PTE concentrations in soil samples

The descriptive statistics of the analysed PTE concentrations in the district of Frydek Mistek are presented in Table 1. The arithmetic mean concentrations of Cr, Cu, Cd, Mn, Ni, Pb, As, and Zn were 27.33, 23.49, 1.65, 672.14, 16.88, 29.93, 4.02, and 81.67 mg/kg, respectively, in the topsoil and 26.88, 23.22, 1.72, 694.29, 16.65, 31.41, 4.91, and 84.24 mg/kg, respectively, in the subsoil. Across the PTEs, the maximum concentrations in the topsoil ranged from 3.21 to 1691.76 mg/kg and the minimum concentrations from 0.95 to 281.93 mg/kg, whereas in the subsoil the maxima ranged from 2.62 to 1581.55 mg/kg and the minima from 0.93 to 213.23 mg/kg. The mean concentrations in the topsoil and subsoil were compared with three reference levels from Kabata-Pendias (2011): the world average value (WAV), the European average value (EAV), and the upper continental crust (UCC). The mean concentrations of Zn and Cd in both soil layers exceeded all three reference values. Table 1 also shows that the mean Mn and Cu concentrations in the study area exceed the EAV, and the mean Pb concentration exceeds the WAV. Likewise, the UCC values of As, Cu, Mn, and Pb are lower than the mean concentrations obtained in the present study.

Table 1 Concentrations of PTEs in the study area: descriptive statistics and geochemical background values

The standard deviations for both soil depths were high, reflecting the strong heterogeneity of PTE concentrations in the study area. The normality of the PTE data distributions was assessed using skewness. According to Chandrasekaran et al. (2015), a skewness between −1 and 1 can be interpreted as a normal distribution, whereas a positive skewness greater than 1 indicates a non-normal distribution. The skewness values of Pb, Zn, As, and Cd were below 1, indicating normally distributed data, whereas Mn, Cr, Cu, and Ni were non-normally distributed, skewed to the right, and leptokurtic.
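The skewness rule above can be illustrated with the moment coefficient of skewness, g1 = m3 / m2^1.5. KyPlot's exact estimator is not stated in the text, so this population-moment form is an assumption (bias-adjusted estimators give slightly different values for small samples); the function names and data are hypothetical.

```python
def sample_skewness(values: list[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def is_approximately_normal(values: list[float]) -> bool:
    """Rule of thumb used in the text: -1 <= skewness <= 1."""
    return -1.0 <= sample_skewness(values) <= 1.0


# Hypothetical data: a symmetric set and a right-skewed set with one high value.
print(sample_skewness([1.0, 2.0, 3.0, 4.0, 5.0]), is_approximately_normal([1.0, 1.0, 1.0, 10.0]))
```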

Nezhad et al. (2015) reported that the coefficient of variation (CV) indicates the degree of variability in PTE concentrations: a CV between 0 and 20% suggests that the PTEs are of natural origin, whereas a CV above 20% implies an influence of anthropogenic activities. A CV ≤ 20% indicates low variability, 20% < CV ≤ 50% moderate variability, 50% < CV ≤ 100% high variability, and CV > 100% exceptionally high variability. The CV of the PTEs in the agricultural soils studied decreased in the following order: As > Cu > Pb > Ni > Mn > Cr > Zn > Cd > Pb. All PTEs showed moderate variability and relatively homogeneous distributions, except As in both soil layers and Cu in the subsoil; As showed high variability in the topsoil, and Cu showed high variability in the subsoil. The high variability of As and Cu indicates a non-homogeneous distribution, which suggests a possible human-related impact.
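The CV classes above can be applied with a short standard-library Python sketch. Whether the study used the sample or the population standard deviation is not stated; the sketch assumes the sample standard deviation (`statistics.stdev`). The function names and concentrations are hypothetical.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """CV (%) = 100 x sample standard deviation / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def classify_cv(cv_percent: float) -> str:
    """Map a CV (%) to the variability classes used in the text."""
    if cv_percent <= 20:
        return "low"
    if cv_percent <= 50:
        return "moderate"
    if cv_percent <= 100:
        return "high"
    return "exceptionally high"


# Hypothetical concentrations (mg/kg): mean 2, sample SD 1, so CV = 50%.
cv = coefficient_of_variation([1.0, 2.0, 3.0])
print(cv, classify_cv(cv))
```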

Pollution characteristics

Single pollution index (PI) and pollution load index (PLI)

The estimated single pollution index revealed a heterogeneous distribution of PTEs in the topsoil and subsoil (Table S1). Of the 51 samples collected from each layer, most showed low pollution levels for all metals except Mn and Cd. Thirteen topsoil samples and nine subsoil samples showed low pollution levels for Mn. In the topsoil, moderate pollution levels were found in 31 samples for Mn, 29 for Pb, 7 for As, 1 for Cr, 4 for Ni, 31 for Zn, 12 for Cd, and 5 for Cu. In the subsoil, the corresponding numbers were 41 for Mn, 35 for Pb, 13 for As, 1 for Cr, 3 for Ni, 33 for Zn, 4 for Cd, and 4 for Cu. Considerable pollution levels of Cd and Mn were found in 37 and 1 topsoil samples, respectively, and in 42 and 1 subsoil samples, respectively. Five subsoil samples showed a high degree of Cd pollution, compared with only 2 topsoil samples. Although pollution levels in both soil layers ranged from moderate to high, the difference between them was significant. The subsoil level of enrichment was higher than the topsoil level for all analysed PTEs except Ni and Cu. The computed PI suggests a downward movement of PTEs from the topsoil to the subsoil, combined with a geogenic contribution. Madrid et al. (2002) and Parveen et al. (2012) reported that PTEs accumulate mostly in the surface layer of the soil (topsoil); our results differ, possibly because leaching has moved part of the accumulated PTEs from the topsoil to the subsoil. The findings of Antoniadis and Alloway (2002) support this interpretation, showing that leaching contributes to the mobility of PTEs from the topsoil to the subsoil.

The computed pollution load index also showed varied pollution levels (Table S1). Forty-eight of the 51 sampled locations in the topsoil and forty-nine in the subsoil showed a low pollution load. Both soil layers also had locations with a moderate pollution load; the topsoil showed a high pollution load at 2 sampled locations, and the subsoil also showed a high pollution load. The PLI_IDW (pollution load index, inverse distance weighting) spatial distribution map (Fig. 2) shows that the pollution load hotspot in the subsoil is centred mainly in the middle of the north-western part of the map. The north-western part of the map is a primarily agrarian community, so this hotspot may be attributed to intensive agriculture in that vicinity. The topsoil also showed a hotspot in the middle of the north-western part of the map, as well as hotspots in the eastern part. Overall, the pollution load in the topsoil appears denser than that in the subsoil. The steel industries and some metalworks are located in the north-eastern and eastern parts of the study area. The PLI_IDW map shows a higher pollution load in the topsoil than in the subsoil, which suggests that anthropogenic activities such as industry, agriculture, vehicular traffic, sewage sludge, and atmospheric deposition contribute significantly to pollution of the upper soil layer (topsoil). These results are consistent with those of Zhu et al. (2006), Boyter et al. (2009), Gąsiorek et al. (2017), Zhu et al. (2017), and Chen and Lu (2018).

Fig. 2

Spatial distribution of the estimated pollution load index values (PLI_IDW)

Potential ecological risk index

The potential ecological risk index varied across both soils, but all the PTEs showed a low ecological risk except cadmium (see Table S2). For cadmium, six topsoil locations fell within the moderate risk category, compared with one in the subsoil. A considerable ecological risk was found at 39 topsoil locations and 42 subsoil locations. A few locations showed a high ecological risk: 6 in the topsoil and 8 in the subsoil.

The risk index (RI) for both soils ranged from low to moderate risk: 31 low-risk and 20 moderate-risk locations in the topsoil, and 27 and 24, respectively, in the subsoil, out of the 51 sampled locations per soil (see Table S2). The RI_IDW spatial distribution map (Fig. 3) shows a higher risk level in the subsoil than in the topsoil, with denser hotspots in the north-eastern and south-eastern parts of the subsoil map. Whatever its specific source, anthropogenic pollution may be responsible for the accumulation of PTEs in the topsoil, from which they migrate to the subsoil (Liu et al., 2016), with an additional contribution from the geogenic source.
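
The element-wise risk factor and the risk index can be sketched as follows, assuming Hakanson's formulation (Er = Tr × PI and RI = ΣEr). The toxic-response factors (Cd 30, As 10, Pb/Cu/Ni 5, Cr 2, Zn/Mn 1) and the class limits are commonly cited literature values, assumed here rather than taken from Table S2. With Tr = 30, even a moderate Cd PI yields an Er above the low-risk limit, which is consistent with Cd being the only PTE outside the low ecological risk category.

```python
# Potential ecological risk sketch, assuming Hakanson's formulation:
#   Er_i = Tr_i * PI_i   and   RI = sum of Er_i over all PTEs.
# Toxic-response factors and class limits are assumed literature values;
# the study's own values may differ.

TOXIC_RESPONSE = {"Cd": 30, "As": 10, "Pb": 5, "Cu": 5, "Ni": 5,
                  "Cr": 2, "Zn": 1, "Mn": 1}


def ecological_risk_factors(pis: dict[str, float]) -> dict[str, float]:
    """Er for each PTE from its single pollution index."""
    return {el: TOXIC_RESPONSE[el] * pi for el, pi in pis.items()}


def risk_index(pis: dict[str, float]) -> float:
    """RI: sum of the element-wise Er values for one sample."""
    return sum(ecological_risk_factors(pis).values())


def _classify(value, limits, labels):
    # labels has one more entry than limits; the last label is open-ended
    for limit, label in zip(limits, labels):
        if value < limit:
            return label
    return labels[-1]


def classify_er(er):
    return _classify(er, [40, 80, 160, 320],
                     ["low", "moderate", "considerable", "high", "very high"])


def classify_ri(ri):
    return _classify(ri, [150, 300, 600],
                     ["low", "moderate", "considerable", "very high"])


# Hypothetical PI values for one sampling location
sample = {"Cd": 3.2, "Pb": 1.4, "Zn": 1.6, "As": 0.9}
er = ecological_risk_factors(sample)
print({el: classify_er(v) for el, v in er.items()}, classify_ri(risk_index(sample)))
```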

Fig. 3

Spatial distribution of the estimated risk index values (RI_IDW)

Nemerow pollution index

According to the Nemerow pollution index, 9 topsoil and 2 subsoil locations were clean, 19 and 25 fell within the warning category, and 22 and 24 were slightly polluted, respectively. Only one sample, from the topsoil, was moderately polluted (see Table S1). Similarly, the spatial distribution map (Fig. 4) showed more hotspots and higher pollution levels in the subsoil than in the topsoil. Pockets of hotspots occurred across the subsoil map except in the north-western area, and the southern part of the subsoil map exhibited relatively sporadic pollution. The pollution patterns in the RI_IDW and Pnemerow_IDW spatial distribution maps are broadly consistent.
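
A minimal sketch of the Nemerow index, assuming the usual form P_N = sqrt((mean PI² + max PI²)/2) and the commonly used class limits 0.7, 1, 2, and 3, which correspond to the clean, warning, slightly polluted, and moderately polluted categories above. The study's exact inputs are not given in this section.

```python
import math
from typing import Sequence

# Nemerow integrated pollution index sketch, assuming
#   P_N = sqrt((mean(PI)^2 + max(PI)^2) / 2)
# with the common class limits 0.7, 1, 2 and 3 (an assumption here).

def nemerow_index(pis: Sequence[float]) -> float:
    """Combine the PI values of one sample, giving extra weight to the maximum."""
    if len(pis) == 0:
        raise ValueError("need at least one PI value")
    mean_pi = sum(pis) / len(pis)
    max_pi = max(pis)
    return math.sqrt((mean_pi ** 2 + max_pi ** 2) / 2)


def classify_nemerow(pn: float) -> str:
    """Assign P_N to an assumed Nemerow pollution class."""
    if pn <= 0.7:
        return "clean"
    if pn <= 1:
        return "warning"
    if pn <= 2:
        return "slightly polluted"
    if pn <= 3:
        return "moderately polluted"
    return "heavily polluted"
```

Because the maximum PI enters the formula directly, a single strongly enriched element such as Cd can lift a sample into a higher class even when the other PTEs are close to background.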

Fig. 4

Spatial distribution of the estimated Nemerow pollution index values (Pnemerow_IDW)

Enrichment factor

The PTEs showed enrichment levels ranging from deficiency or minimal enrichment to significant enrichment (see Table S3). Except for Cd, all PTEs fell within the minimal to moderate enrichment categories in both soil horizons. Minimal enrichment was found at 15 (Mn), 47 (Ni), 24 (Pb), 25 (Zn), 45 (As), 51 (Cr), and 47 (Cu) locations in the topsoil and at 11, 48, 22, 20, 39, 50, and 47 locations, respectively, in the subsoil. Moderate enrichment was found in the topsoil at 34 locations for Mn, 4 for Ni, 27 for Pb, 26 for Zn, 6 for As, 6 for Cd, and 4 for Cu, and in the subsoil at 39 (Mn), 48 (Ni), 29 (Pb), 31 (Zn), 12 (As), 1 (Cr), 1 (Cd), and 4 (Cu). Only Cd reached a significant enrichment level in both soils, at 45 topsoil and 50 subsoil locations. According to Zhang and Liu (2002), EF values of 0.5 ≤ EF ≤ 1.5 indicate that PTE concentrations may derive entirely from crustal materials or natural weathering processes, whereas EF values above 1.5 indicate that a large portion of the PTEs is delivered by non-crustal materials or other sources, such as point and non-point emissions and biota (Sutherland et al., 2000; Zhang & Liu, 2002). Most of the EFs calculated for the sampled areas were above 1.5, suggesting that anthropogenic activities contributed substantially in addition to natural sources. The difference in enrichment between the subsoil and the topsoil may be attributed to the downward transfer of PTEs accumulated in the topsoil; it may also reflect high PTE levels inherited from the parent material, further raised by leaching.
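
The enrichment factor normalizes each PTE to a conservative reference element in both the sample and the background. A minimal sketch, assuming the usual double-ratio definition; this section does not name the reference element, so Fe is used below purely as a hypothetical choice, and the class limits (2, 5, 20, 40) are assumed Sutherland-style categories. The 1.5 cut-off follows the Zhang and Liu (2002) interpretation cited above.

```python
# Enrichment factor (EF) sketch, assuming
#   EF = (C_i / C_ref)_sample / (B_i / B_ref)_background
# The reference element (Fe here) and the class limits are assumptions.

def enrichment_factor(c_i: float, c_ref: float, b_i: float, b_ref: float) -> float:
    """Double-ratio EF for one PTE in one sample."""
    if min(c_ref, b_i, b_ref) <= 0:
        raise ValueError("reference and background values must be positive")
    return (c_i / c_ref) / (b_i / b_ref)


def classify_ef(ef: float) -> str:
    """Assign an EF value to an assumed enrichment category."""
    if ef < 2:
        return "deficiency to minimal"
    if ef < 5:
        return "moderate"
    if ef < 20:
        return "significant"
    if ef < 40:
        return "very high"
    return "extremely high"


def suggests_noncrustal_input(ef: float, threshold: float = 1.5) -> bool:
    """EF above ~1.5 points to non-crustal (e.g. anthropogenic) inputs."""
    return ef > threshold


# Hypothetical Cd and Fe concentrations (mg/kg) for a sample and the background
ef_cd = enrichment_factor(c_i=0.6, c_ref=30000.0, b_i=0.1, b_ref=30000.0)
print(round(ef_cd, 2), classify_ef(ef_cd), suggests_noncrustal_input(ef_cd))
```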

Multivariate analysis of PTEs

Pearson correlation matrix (PCM)

The correlation matrix (Table 2) shows relationships among the PTEs analysed in both soils. In the topsoil, the pairs Pb–Cd, Zn–Cd, and As–Cd were strongly correlated, with r values from 0.7 to 0.81, and Zn–As, As–Pb, Ni–Zn, Ni–Cd, and Ni–Cu were moderately correlated. The subsoil showed no strong correlations, only moderate ones for Zn–Pb, As–Pb, Ni–Cr, Zn–Cd, and As–Cd (r = 0.52–0.61). In general, correlations were stronger in the topsoil than in the subsoil. These correlations suggest that the correlated PTEs at both soil depths may share closely related sources.
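
A Pearson correlation matrix of this kind can be reproduced with NumPy. The sketch below assumes a samples × PTEs array and uses cut-offs of |r| ≥ 0.7 for "strong" and |r| ≥ 0.5 for "moderate", chosen here only to mirror the ranges reported above; the example data are hypothetical. A high r indicates co-variation, and reading it as a shared source remains an interpretation.

```python
import numpy as np

# Pearson correlation sketch: rows are sampling locations, columns are PTEs.
# The "strong" and "moderate" cut-offs are assumptions mirroring the text.

def correlated_pairs(data, names, strong=0.7, moderate=0.5):
    """Return the full r matrix and the labelled element pairs above the cut-offs."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            rij = float(r[i, j])
            if abs(rij) >= strong:
                label = "strong"
            elif abs(rij) >= moderate:
                label = "moderate"
            else:
                continue
            pairs.append((f"{names[i]}-{names[j]}", round(rij, 2), label))
    return r, pairs


# Hypothetical concentrations at four locations for three PTEs
data = np.array([[1, 2, 4], [2, 4, 1], [3, 6, 3], [4, 8, 2]])
r, pairs = correlated_pairs(data, ["Pb", "Cd", "Mn"])
print(pairs)
```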

Table 2 Correlation matrix between the PTEs

Identification of sources based on PCA

The PCA results are displayed in Table 3 and projected in Fig. 5a. Hou et al. (2013) noted that PCA can provide useful indications of PTE pathways and primary sources. Principal components (PCs) with eigenvalues of 1 or greater were retained. On this criterion, PC1 and PC2 were retained, together accounting for 71.89% and 62.76% of the total variance in the topsoil and subsoil, respectively. The grouping of the PTEs in the projection of components 1 and 2 for both soils suggests that Pb, Zn, Cd, and As are mainly anthropogenic pollutants, whereas Mn, Cu, Ni, and Cr are predominantly geogenic. This is consistent with Borůvka et al. (2005), who reported that the positive correlation within a group of elements and their position in the principal component projection indicate their origin. Although some PTEs are predominantly geogenic, anthropogenic inputs appear to augment their enrichment in both soils. PC1 alone explained 46.40% and 36.65% of the variance in the topsoil and subsoil, respectively.
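
The retention rule and the loadings in Table 3 can be sketched as follows. This assumes an unrotated PCA on the correlation matrix (standardized data), with loadings computed as eigenvector × sqrt(eigenvalue), i.e. the correlation between each PTE and each retained PC; whether the study applied a rotation is not stated here. Eigenvector signs are arbitrary, so a PC may come out mirrored.

```python
import numpy as np

# PCA sketch with the Kaiser criterion (retain PCs with eigenvalue >= 1).
# Assumes PCA on the correlation matrix without rotation.

def pca_kaiser(data):
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)   # standardize each PTE
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)             # ascending order
    order = np.argsort(eigvals)[::-1]                   # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= 1.0                               # Kaiser criterion
    explained = 100.0 * eigvals / eigvals.sum()         # % variance per PC
    # Loadings = correlations between each PTE and each retained PC
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    scores = z @ eigvecs[:, keep]                       # per-sample PC scores
    return eigvals, explained, loadings, scores


# Hypothetical data: three co-varying PTEs and two independent ones
rng = np.random.default_rng(0)
base = rng.normal(size=(51, 1))
data = np.hstack([base + 0.1 * rng.normal(size=(51, 3)), rng.normal(size=(51, 2))])
eigvals, explained, loadings, scores = pca_kaiser(data)
print(np.round(explained, 2), loadings.shape)
```

The per-sample scores are the quantity one would interpolate to draw PC maps such as Fig. 5b.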

Table 3 Correlations between PTEs and principal components
Fig. 5

(a) Projection of principal components 1 and 2 for the topsoil (A) and the subsoil (B). (b) Spatial distribution of the estimated principal component values (PCA_IDW)

The principal component values for the PTEs are in the order Zn < Cd < Pb < Ni < Cu < As in the topsoil and Zn < Cd < Pb < Cu < As in the subsoil. In the topsoil, Ni, Pb, Zn, and Cd exhibited high correlations ranging from 0.7 to 0.91, while Cd and Zn also displayed strong positive loadings (Table 3). PC1 appears to reflect predominantly anthropogenic rather than geogenic sources, suggesting pollution arising from farming practices, industrial activities, atmospheric deposition, and manure/fertilizer application, consistent with Chen et al. (2015). The topsoil PC1 map shows a hotspot around the north-eastern part of the map that may be linked to industrial activities (the steel industry). The subsoil PC1 map (PCA_IDW, principal component values interpolated by inverse distance weighting; Fig. 5b) shows a hotspot in the north-western part of the map, which may be attributed to both geogenic and anthropogenic sources. Nearly the entire north-eastern to south-eastern part of the subsoil PC1 map showed moderate PTE levels. Subsoil enrichment is therefore not limited to the geogenic source but also reflects the mobility and leaching of PTEs from the topsoil, in agreement with Borůvka et al. (2005). PC2 explained 25.49% (topsoil) and 26.12% (subsoil) of the total variance, with positive loadings of 0.55 (Mn) and 0.75 (Cr) in the topsoil and 0.63 (Ni), 0.68 (Cr), and 0.45 (Mn) in the subsoil. The PTEs loading positively on PC2 are predominantly geogenic. The PC2 map shows more hotspots and areas of moderate PTE levels in the topsoil than in the subsoil, likely a consequence of anthropogenic activities acting on the topsoil. The north-western and south-western portions of the topsoil PC2 map are mainly agrarian areas where agro-related products are used extensively; together with the geogenic source, this likely accounts for the high PTE levels in that region.

Spatial distribution of PTEs

Adimalla et al. (2019) noted that spatial distribution maps play a crucial role in delineating safe and hazardous areas and provide the basic information needed to prevent and monitor further soil pollution. Spatial maps of As, Zn, Pb, Cr, Ni, Cu, Mn, and Cd for both soil depths are shown in Figs. 6 and 7. Fig. 6 indicates high As concentrations in both soils, with hotspots in the north-eastern and south-eastern parts of the map; the topsoil shows more hotspots in the south-east, which may be attributed to agricultural activities. Hu and Cheng (2013) and Liang et al. (2015) indicated that some phosphate fertilizers contain As and that As is an ingredient of some pesticides, both of which tend to increase As in agricultural soils. The Cd map (Fig. 6) shows higher Cd levels in the subsoil than in the topsoil, with hotspots in the north-eastern part of the map and in the centre of its south-eastern part. The high Cd content may be attributed to multiple sources, such as parent materials and discharges from metalworks and the steel industry, which have leached into the subsoil. Chai et al. (2015) and Sun et al. (2011) reported similar cases in agricultural soil, where relatively high Cd concentrations were attributed to waste discharged by the steel and smelting industries, which ultimately accumulates in the surrounding soil. The topsoil shared a similar hotspot pattern with the subsoil in the north-western part of the map (Fig. 6), an area known predominantly for intensive agricultural activities. In both soils, the Cr hotspots appear mainly geogenic, supplemented by agro-related inputs (agrochemicals).

Copper distribution is more variable in the topsoil than in the subsoil. The hotspots in the south-eastern part, the centre of the north-western part, and the south-western part of the topsoil Cu map appear mainly geogenic, with an additional contribution from vehicular traffic (Fig. 6). This agrees with Zhao et al. (2015), who found that vehicle exhaust deposits large amounts of Cu on the topsoil. Both soils showed spatial variability in Mn concentrations, but the distribution and hotspots in the north-western part of the map were denser in the subsoil than in the topsoil (Fig. 7). The distribution of Mn may be ascribed to the parent material. Nickel showed a hotspot in the north-western part of the topsoil map and in the south-western part of the subsoil map. Both soils showed a sparse distribution of Ni (Fig. 7), which is mainly of geogenic origin.

Pb concentrations in the north-western part of the map are more pronounced in the subsoil than in the topsoil, whereas Pb is denser in the south-eastern part of the topsoil map than in the subsoil. The Pb pollution in the north-western part of the subsoil is mainly agro-related, resulting from mobility and leaching, whereas that in the north-western part of the topsoil may be attributed to the steel industry and metalworks in that vicinity. Pb enrichment in the subsoil may be attributed to the leaching of Pb-based pesticides and fertilizers (e.g., lead arsenate) used on agricultural land; Atafar et al. (2010) argue that fertilizer and pesticide application increases the concentration of Pb in agricultural soil. Zinc distribution varied in both soils (Fig. 7), with hotspots in the north-western part of the subsoil map and in the centre of the eastern part of the topsoil map. Zinc enrichment might be attributed to the non-ferrous metal industry and agricultural practices; Kabata-Pendias (2011) notes that anthropogenic sources of Zn include the metal industry, the steel industry, and agriculture.
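
The maps in Figs. 6 and 7 (and the PLI, RI, Nemerow, and PC maps) are inverse distance weighting (IDW) interpolations. A minimal sketch of IDW, in which each interpolated value is a weighted mean of the sampled values with weights 1/d^power; the power of 2 is an assumption, since the value used for the maps is not stated in this section.

```python
import numpy as np

# Inverse distance weighting (IDW) sketch. The power parameter (2) is an
# assumption; this section does not state the value used for the maps.

def idw(xy_known, values, xy_query, power=2.0, eps=1e-12):
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.atleast_2d(np.asarray(xy_query, dtype=float))
    # Pairwise distances, shape (n_query, n_known)
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    out = np.empty(len(xy_query))
    for k, dk in enumerate(d):
        exact = dk < eps
        if exact.any():
            out[k] = values[exact][0]      # query coincides with a sample point
        else:
            w = 1.0 / dk ** power          # closer samples weigh more
            out[k] = np.sum(w * values) / np.sum(w)
    return out


# Hypothetical sample locations (x, y) and PTE values
points = [(0.0, 0.0), (2.0, 0.0)]
vals = [1.0, 3.0]
print(idw(points, vals, [(1.0, 0.0), (0.5, 0.0)]))
```

Evaluating the function over a regular grid of coordinates yields a raster comparable in spirit to the IDW maps; a higher power makes hotspots more local around individual samples.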

Fig. 6

Spatial distribution of potentially toxic elements using IDW (TS: topsoil; SS: subsoil)

Fig. 7

Spatial distribution of potentially toxic elements using IDW (TS: topsoil; SS: subsoil)

Source analysis using EF-PMF and PMF receptor models

EPA PMF software (version 5.0) was used for both the PMF and the EF-PMF (enrichment factor positive matrix factorization) receptor modelling. The computed EFs and the raw concentrations of the PTEs, together with the corresponding uncertainty data, served as input data for the two receptor models. For both models, the optimal number of factors was determined following the recommended guidelines: the solution was selected where the Q/Qexp ratio declined gradually and the Q value tended to stabilize. Adjusting Fpeak did not appear to improve source resolution. The results of both receptor models are presented in Table 4. A PTE was considered to dominate a factor when 40% or more of its apportionment was assigned to that factor.
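
The quantities used to choose the number of factors can be sketched as below. The uncertainty follows the equation-based form described in the EPA PMF 5.0 user guide (5/6 × MDL at or below the detection limit, otherwise sqrt((error fraction × C)² + (0.5 × MDL)²)); the MDL and error fraction values are hypothetical. Q is the sum of squared uncertainty-scaled residuals, and Qexp = n·m − p(n + m) assumes every species is categorized "strong". The solver `weighted_nmf` is a hypothetical stand-in (weighted multiplicative updates minimizing the same objective), not the ME-2 algorithm that EPA PMF uses internally.

```python
import numpy as np

def pmf_uncertainty(conc, mdl, error_fraction):
    """Equation-based uncertainty (assumed EPA PMF 5.0 form)."""
    conc = np.asarray(conc, dtype=float)
    mdl = np.broadcast_to(np.asarray(mdl, dtype=float), conc.shape)
    above = np.sqrt((error_fraction * conc) ** 2 + (0.5 * mdl) ** 2)
    return np.where(conc <= mdl, 5.0 / 6.0 * mdl, above)


def q_value(X, G, F, U):
    """Q = sum of squared uncertainty-scaled residuals."""
    return float(np.sum(((X - G @ F) / U) ** 2))


def q_expected(n_samples, n_species, n_factors):
    """Qexp = n*m - p*(n + m), assuming all species are 'strong'."""
    return n_samples * n_species - n_factors * (n_samples + n_species)


def weighted_nmf(X, U, n_factors, n_iter=500, seed=0, eps=1e-12):
    """Hypothetical stand-in solver: weighted multiplicative updates, G, F >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    G = rng.uniform(0.1, 1.0, size=(n, n_factors))
    F = rng.uniform(0.1, 1.0, size=(n_factors, m))
    W = 1.0 / U ** 2                                   # weights from uncertainties
    for _ in range(n_iter):
        G *= ((W * X) @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ (W * X)) / (G.T @ (W * (G @ F)) + eps)
    return G, F


def scan_factors(X, U, candidates=(2, 3, 4)):
    """Q/Qexp for each candidate number of factors."""
    n, m = X.shape
    ratios = {}
    for p in candidates:
        G, F = weighted_nmf(X, U, p)
        ratios[p] = q_value(X, G, F, U) / q_expected(n, m, p)
    return ratios
```

In practice one would look for the factor number after which Q/Qexp stops dropping sharply, mirroring the selection rule described above.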

Table 4 Percentage contribution of each factor to PTEs derived from the EF-PMF receptor model

In the subsoil, Factor 1 was dominated by As (77.8%), Cd (50.6%), and Zn (43.7%) in the EF-PMF receptor model and by As (75.7%), Cd (49.3%), Pb (48.8%), and Zn (42.8%) in the PMF receptor model. This pattern of dominance in both receptor models suggests the influence of anthropogenic activities and is consistent with the subsoil PC1 loadings in the PCA projection (see Fig. 5). The four PTEs dominating subsoil Factor 1 (Pb, Zn, Cd, As) point mainly to anthropogenic sources of pollution (such as agrochemicals and industrial activities) delivered through leaching. Previous studies found Pb, Zn, Cd, and As above their tolerable soil limits as a result of activities such as metalworking, the steel industry, industrial waste, and agriculture (Atafar et al., 2010; Belon et al., 2012; Sun et al., 2013; Tang et al., 2017). The spatial distribution maps of Pb, Zn, Cd, and As (Figs. 6 and 7) show that these PTEs are more enriched in the subsoil than in the topsoil. Even though anthropogenic activities act directly on the topsoil, the migration of PTEs to the subsoil, governed by soil chemical and physical properties, plays a major role in subsoil enrichment, and leaching in particular may be responsible for these findings. The dominance and enrichment of As in the subsoil may result from repeated fertilizer application every cropping season to ensure a bumper harvest, with As then leaching from the topsoil to the subsoil under the influence of rainfall, excessive irrigation, organic matter content, and soil texture. Soil physical and chemical properties may considerably alter the migration of PTEs such as Cu and Cd from the topsoil to the subsoil (Slavich et al., 2013). Duan et al. (2015) found higher Pb, Ni, and Cu levels in the subsoil than in the topsoil, because these PTEs leach from the topsoil to the subsoil (Shiva Kumar & Srikantaswamy, 2014).

In the topsoil, Factor 1 was dominated by Cu (61.6%) and Ni (59.5%) in the EF-PMF receptor model and by Cu (71.5%), Ni (40.3%), Pb (40.0%), and Zn (42.7%) in the PMF receptor model. Given the PCA source identification, the dominance of Ni and Cu in the EF-PMF topsoil factor suggests that these elements are mainly geogenic, augmented by anthropogenic inputs. In contrast, the PTEs dominating the PMF topsoil factor (Cu, Ni, Pb, Zn) indicate that topsoil Factor 1 is a composite of anthropogenic and geogenic sources. Topsoil enrichment in the controlling PTEs, especially Ni and Cu, may thus be mainly geogenic, with a contribution from agro-related activities (agrochemicals). Liang et al. (2017) and Yang et al. (2017) found that fertilizer and pesticide use in agricultural fields raises soil copper above its natural content, and Huang et al. (2019) reported that nickel enrichment in the soil surface is mostly of natural origin.
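
The 40% dominance rule applied in this discussion can be sketched from a factor-contribution matrix G (samples × factors) and a factor-profile matrix F (factors × PTEs). The percentage for factor k and PTE j is taken here as the share of the modelled total of PTE j, summed over all samples, that is attributed to factor k; this normalization is an assumption about how the percentages in Table 4 were obtained, and the example matrices are hypothetical.

```python
import numpy as np

# Sketch of the 40% dominance rule. pct[k, j] = share of the modelled total
# of PTE j (summed over samples) attributed to factor k. The normalization
# is an assumption.

def factor_percentages(G, F):
    contrib = G.sum(axis=0)[:, None] * F                 # (n_factors, n_species)
    return 100.0 * contrib / contrib.sum(axis=0, keepdims=True)


def dominant_ptes(G, F, names, threshold=40.0):
    """PTEs whose apportionment to a factor reaches the threshold (%)."""
    pct = factor_percentages(np.asarray(G, dtype=float), np.asarray(F, dtype=float))
    return {
        f"Factor {k + 1}": [(names[j], round(float(pct[k, j]), 1))
                            for j in range(len(names)) if pct[k, j] >= threshold]
        for k in range(pct.shape[0])
    }


# Hypothetical two-factor solution for two PTEs
G = np.ones((2, 2))
F = np.array([[3.0, 1.0], [1.0, 1.0]])
print(dominant_ptes(G, F, ["As", "Cu"]))
```

Because the percentages for one PTE sum to 100% across factors, a PTE can dominate at most two factors under a 40% threshold, which is why some PTEs (for example Pb here) appear in more than one factor.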

In the subsoil, Factor 2 was dominated by Mn (56%) and Pb (41.5%) in the EF-PMF receptor model and by Cr (56.2%) and Mn (57.5%) in the PMF receptor model. In the topsoil, Factor 2 was dominated by As (71.7%), Cd (42.3%), and Pb (40.6%) in the EF-PMF receptor model and by As (88.8%), Cd (52.1%), Pb (54.3%), and Zn (43.1%) in the PMF receptor model. The dominant subsoil PTEs (particularly Cr and Mn) share a similar pathway in both receptor models, which PCA identifies as mainly geogenic, supplemented by anthropogenic activities such as the steel industry. Chromium is widely used for alloying in the steel industry, and its abundance in the topsoil of most study areas indicates industrial activities (Yeung et al., 2003) as well as agricultural practices. Rainfall-driven leaching of Cr from the topsoil augments its concentration in the subsoil. Previous studies have shown that Cr is a significant element discharged by a variety of industrial activities that pollute soils through waste disposal (Choppala et al., 2013; Guan et al., 2018; Pan et al., 2016). Zhang et al. (2016) reviewed 464 articles from China and found that the average Cr content of agricultural soil was 78.94 mg/kg, exceeding the geochemical background level (57.30 mg/kg). Li et al. (2009) and Liu et al. (2015) further indicated that sewage irrigation is likely to increase the Cr content of agricultural soil. Elevated Cr in the topsoil can arise from multiple sources, but in the current study area it is most likely attributable to the steel plant and agricultural activities.

According to Foo et al. (2008) and Yang et al. (2011), manganese (Mn) is widely regarded as one of the most abundant elements in the lithosphere and a major soil element. Excess Mn in the topsoil might be attributed to anthropogenic inputs supplementing the geogenic source. Many soils are enriched in Mn by anthropogenic inputs, which may affect environmental functioning (Herndon et al., 2011).

In the subsoil, the final factor (Factor 3) was dominated by Cr (59.9%), Cu (62.4%), Mn (44%), and Ni (65.2%) in the EF-PMF receptor model and by Cu (59.4%) and Ni (57.5%) in the PMF receptor model. In the topsoil, Factor 3 was dominated by Cr (74.7%), Mn (61.8%), and Pb (41.1%) in the EF-PMF receptor model and by Cr (48.4%) and Mn (48%) in the PMF receptor model. The two models apportioned the dominant PTEs to each factor consistently: both assigned Cu and Ni as dominant PTEs in subsoil Factor 3, and EF-PMF additionally included Cr and Mn. These dominant subsoil PTEs align with the PCA loadings, suggesting a mainly geogenic origin. In the topsoil, both models indicate a blend of anthropogenic and geogenic sources. The spatial distribution of Pb in both the topsoil and the subsoil suggests anthropogenically induced Pb concentrations in the soil. Zhao et al. (2018) indicated that PTE (Cd, Cu, Pb) inputs from chemical fertilizers and manure increased by 3–4% per year. Wang et al. (2006) stated that anthropogenic activities such as metal smelting pollute the environment with PTEs such as Pb, and that rainfall leaching carries these PTEs from the soil surface into the subsurface.

Performance of EF-PMF and PMF

The source analysis suggested that both models apportioned sources efficiently and generally assigned the same PTEs to the same factors. Nevertheless, under the criterion adopted here for a PTE to dominate a factor (an apportionment of 40% or more), the traditional PMF apportioned more PTEs than the novel EF-PMF receptor model. Both models were evaluated by multiple linear regression in terms of source apportionment accuracy, marginal error, and percentage efficiency. An intercomparison was performed by regression analysis to assess how well each model fits (Song et al., 2008; Stanimirova et al., 2011) and which model maximizes efficiency while minimizing error. The validation and efficiency assessment using R², RMSE, and MAE suggested that the novel receptor model performed better than its parent model, PMF (Table 5). According to Singh et al. (2017), RMSE, R², and MAE are the metrics most often used to assess the performance of different modelling techniques. The coefficient of determination (R²) indicated that the two models had comparable apportionment efficiency. However, the root-mean-square error (RMSE) computed for both models and soil depths showed that, of the 8 PTEs assessed (As, Cd, Cr, Cu, Ni, Mn, Pb, and Zn), 7 (As, Cr, Cu, Ni, Mn, Pb, and Zn) had markedly lower errors in the EF-PMF receptor model than in PMF. The mean absolute error (MAE) led to the same result. This implies that the novel EF-PMF model can apportion PTE sources with smaller errors than PMF, and that hybridizing the enrichment factor with the parent PMF model is applicable and efficient for source analysis of PTEs in agricultural soil. In this study, incorporating the enrichment factor strengthened the base PMF model and reduced source apportionment errors.

A similar comparative analysis was performed by Gholizadeh et al. (2016), who concluded from the R² values of observed/predicted plots that, for most water quality variables, the APCS-MLR receptor model gave a better goodness-of-fit for pollution source apportionment than PMF. In contrast, Zhang et al. (2019) compared three receptor models (PMF, PCA-MLR, and Unmix) and, based on R² values, found that PMF gave a better goodness-of-fit than the other two. Larsen and Baker (2003) and Yang et al. (2013) likewise used model accuracy and validation criteria to suggest that PMF was the best model for apportioning PAHs in Huanghuai Plain soils.
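
The three evaluation metrics can be sketched for one PTE as below, given its observed concentrations and the concentrations reconstructed by a receptor model. R² is computed here as the coefficient of determination of a simple linear regression of observed on predicted values (the squared Pearson r), which is one common convention and an assumption about how Table 5 was produced; the helper `better_model` and the example vectors are hypothetical.

```python
import numpy as np

# Sketch of the model-evaluation metrics. R^2 as the squared Pearson r of
# observed vs. predicted values is an assumed convention.

def evaluate(observed, predicted):
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    residual = o - p
    rmse = float(np.sqrt(np.mean(residual ** 2)))   # penalizes large errors more
    mae = float(np.mean(np.abs(residual)))          # average absolute error
    r2 = float(np.corrcoef(o, p)[0, 1] ** 2)        # goodness of fit
    return {"R2": r2, "RMSE": rmse, "MAE": mae}


def better_model(metrics_a, metrics_b):
    """Hypothetical helper: 'A' if model A has the lower RMSE (ties go to 'A')."""
    return "A" if metrics_a["RMSE"] <= metrics_b["RMSE"] else "B"


# Hypothetical observed and reconstructed concentrations for one PTE
observed = [1.0, 2.0, 3.0, 4.0]
model_a = evaluate(observed, [1.0, 2.0, 3.0, 5.0])
model_b = evaluate(observed, [1.0, 2.0, 3.0, 6.0])
print(model_a, better_model(model_a, model_b))
```

R² alone can stay high even when predictions are systematically offset, which is why RMSE and MAE are reported alongside it and why the models can have similar R² but different errors.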

Table 5 Comparison of the EF-PMF and PMF receptor models via multiple linear regression

Conclusion

This study investigated the source contributions of PTEs using a novel EF-PMF receptor model, compared against its parent receptor model (PMF), and further evaluated the pollution characteristics and spatial variability of PTEs in agricultural soil at different depths. PMF has previously been used to quantify source apportionment in sediments, in air quality assessment, and in soils to determine source contributions at different depths.

In both the topsoil and the subsoil, the mean Zn and Cd concentrations exceeded all three geochemical background levels presented in Table 1. Mean Mn and Cu concentrations also exceeded their EAV values, and mean Pb exceeded its WAV value. The upper continental crust (UCC) averages of As, Cu, Mn, and Pb were lower than the mean values of the same PTEs in this study. The pollution indices PI, EF, and ER placed most of the computed PTE values within the low to moderate pollution or risk categories; only Mn and Cd reached considerable or higher pollution or risk levels. The Nemerow pollution index, pollution load index, and risk index revealed pollution levels ranging mainly from low to moderate, with a few samples indicating a high level of ecological risk in both soils. The Pearson correlation matrix (PCM) revealed relationships among some of the metals, and the PCA grouped their sources into two categories: geogenic (Mn, Cu, Ni, Cr) and anthropogenic (Pb, Zn, Cd, As). The spatial distribution maps revealed hotspots for all the PTEs studied, suggesting that the pollution level in the subsoil is considerably higher than that in the topsoil, and pointed to multiple sources, such as agricultural activities, industrial activities, industrial waste, and geogenic sources. Leaching and mobility of PTEs from the topsoil appear to account for the higher PTE enrichment of the subsoil. The EF-PMF and PMF receptor models showed a consistent source apportionment of PTEs in each factor.

The receptor models identified geogenic inputs, the steel industry, agrochemicals, fertilizer application, metalworks, and sewage irrigation as the prominent sources of the PTEs. Some PTEs, such as Cr, derived from a combination of anthropogenic sources (agricultural activities, the steel industry, and industrial sewage irrigation). Cu and Ni dominated Factor 3 in the subsoil and Factor 1 in the topsoil, which is probably attributable to a geogenic source augmented by agricultural inputs such as fertilizers and pesticides. Mn, given its abundance in the lithosphere, is mainly of natural origin in both soils. The assessment of the receptor models using validation and accuracy criteria (RMSE, MAE, and R²) via multiple linear regression suggested that the novel EF-PMF receptor model performed better than its parent model, PMF. The EF-PMF model substantially reduced source apportionment errors: of the 8 PTEs assessed at each soil depth (As, Cr, Cd, Cu, Ni, Mn, Pb, and Zn), 7 (As, Cr, Cu, Ni, Mn, Pb, and Zn) were apportioned with lower error by EF-PMF than by PMF. Hybridizing pollution indices such as the enrichment factor with the PMF receptor model can therefore reduce error and improve efficiency in source apportionment. The outcomes of this study suggest that the EF-PMF receptor model could be applied more widely to polluted soils and other environments. Furthermore, this study highlighted the dire state of soil health and quality in the study area and provides the information stakeholders need to take proactive measures against rising PTE levels in agricultural soils.