Introduction

Trajectory analysis has been widely used to investigate the transport of air pollutants and to identify source areas that affect air pollution at receptor areas, especially on the regional or interregional scale. To deal with large sets of trajectories, a variety of statistical analysis methods have been developed and applied, such as flow climatology, cluster analysis, residence time analysis and conditional probability, concentration fields, redistribution concentration fields, and inverse modeling (Stohl 1998). Tian et al. (2015, 2016) developed new methods, source directional apportionment (SDA) and source regional apportionment (SRA), to quantitatively determine source transport and regional contributions in China based on trajectory analysis.

The potential source contribution function (PSCF) is a trajectory analysis method based on the concept of conditional probability (Ashbaugh et al. 1985; Malm et al. 1986). PSCF can indicate a potential source region contributing to a high air pollutant concentration by using the total number of trajectories over a given geographic region and the number of trajectories with a high air pollutant concentration at the receptor (Ashbaugh et al. 1985). It has been used in several studies on the transport of many kinds of air pollutants (Zeng and Hopke 1989; Cheng et al. 1993a, b, 1996; Gao et al. 1993, 1994, 1996; Fan et al. 1995). Heo et al. (2009) used PSCF to identify possible source areas contributing to elevated secondary particle concentrations (sulfate and nitrate) in Seoul. Nguyen et al. (2015) used PSCF to evaluate the influence of long-range-transported smoke plumes on the highly elevated OC concentrations measured at Gosan, a background site in Korea. Several studies applied PSCF to ambient particulate matter in China (Wang et al. 2004, 2006; Zhang et al. 2011). Several techniques have been developed and applied to reduce the uncertainty in PSCF, for example a weighting function and total PSCF (Hopke et al. 1995; Cheng et al. 1993b).

Biomass burning is a significant source of air pollution in Northeast Asia (Streets and Yarber 2003). There are several types of biomass burning, such as residential burning for heating and cooking (biofuel burning), open burning after harvest, and forest fires. The injection height varies with the type of biomass burning: it is at the surface level for residential burning and at the atmospheric boundary layer level (500–3,000 m in the mid-Korean peninsula; Choi et al. 2008) for open burning. The injection height from forest fires varies with the intensity of the fire. For some forest fires in Siberia in spring, researchers used a simulated vertical level of up to 5 km as the injection height in their modeling study, based on study results from the USA (In and Kim 2010). On the other hand, the injection height from small open burning could be lower than that from forest fires because a small fire releases a lower heat flux.

Polycyclic aromatic hydrocarbons (PAHs) are a class of compounds containing carbon and hydrogen with a fused ring structure containing at least two benzene rings. These compounds are widely distributed in the atmosphere and highly carcinogenic to humans. PAHs are mostly formed during the incomplete combustion and pyrolysis of fossil fuels or wood and from the release of petroleum products. Anthropogenic sources of PAHs include power plants, industrial and domestic combustion, transportation, coke ovens, and biomass burning (Ravindra et al. 2008).

The ratio of each PAH species varies with combustion conditions such as fuel type and combustor temperature. That is, each specific emission source has a characteristic PAH composition profile. Therefore, particulate PAHs are suitable for receptor modeling (Marr et al. 2006; Lee and Kim 2007). Venturini et al. (2014) identified the source contributions of air pollutants including PAHs using positive matrix factorization (PMF) and examined whether the sources were local or regional according to wind direction. Still, in current PSCF analysis studies, all kinds of biomass burning are treated as one (see, for example, Kim et al. 2013). In other words, the conventional PSCF does not consider the injection height of air pollutants.

To differentiate the effects of these, we developed a new method, the three-dimensional potential source contribution function (3D-PSCF), and demonstrated its applicability. First, we developed a simple algorithm to account for the height of a trajectory with a high pollutant concentration in the PSCF. Then, we applied the developed 3D-PSCF to measurements of particulate PAHs in Seoul to distinguish biofuel burning and small open burning from large open burning and forest fires. In doing so, we more clearly identified the emission areas for biofuel burning and small open burning.

Development of 3D-PSCF

Because 3D-PSCF is based on the conventional or 2D-PSCF, we first present the basic algorithm of the 2D-PSCF and explain its characteristics including limitations. Then we present the 3D-PSCF algorithm.

Conventional PSCF (2D-PSCF) and limitations

The PSCF value at each cell on a PSCF plot is the ratio of how frequently the air parcels passing through the cell are associated with polluted air at the receptor site to how often air parcels pass through the cell overall. Cells with high PSCF values have a high possibility of containing emission sources. We briefly introduce the 2D-PSCF method in this section for ease of reference. The description given here largely follows the work of Hopke et al. (1993).

Let N be the total number of trajectory segment endpoints during the whole study period, T. If $n_{ij}$ segment endpoints fall into the ijth cell, the probability of this event, $A_{ij}$, is given by

$$ P\left[A_{ij}\right] = \frac{n_{ij}}{N} $$
(1)

where $P[A_{ij}]$ is a measure of the residence time of a randomly selected air parcel in the ijth cell relative to the time period T.

Suppose the same ijth cell contains a subset of $m_{ij}$ segment endpoints for which the corresponding trajectories arrive at the receptor site when the measured pollutant concentrations are higher than a pre-specified criterion value. In this study, we used the calculated mean value for each species at each site as the criterion value. The probability of this high-concentration event, $B_{ij}$, is given by

$$ P\left[B_{ij}\right] = \frac{m_{ij}}{N} $$
(2)

Like $P[A_{ij}]$, this subset probability is related to the residence time of an air parcel in the ijth cell, but $P[B_{ij}]$ applies only to contaminated air parcels.

The potential source contribution function (PSCF) is defined as

$$ P_{ij} = \frac{P\left[B_{ij}\right]}{P\left[A_{ij}\right]} = \frac{m_{ij}}{n_{ij}} $$
(3)

$P_{ij}$ is the conditional probability that an air parcel that passed through the ijth cell had a high pollutant concentration upon arrival at the receptor site. That is, $P_{ij}$ indicates the probability that a related source is located in the ijth cell and is therefore dimensionless.
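The counting behind Eqs. (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this study: the function names `grid_cell` and `pscf_2d`, the 1° cell size, and the toy trajectories and concentrations are all assumptions made for the example.

```python
from collections import defaultdict
import math


def grid_cell(lat, lon, cell_deg=1.0):
    """Map a latitude/longitude pair to integer (i, j) grid indices."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def pscf_2d(trajectories, concentrations, criterion, cell_deg=1.0):
    """Return {cell: m_ij / n_ij} for every cell holding at least one endpoint.

    trajectories   -- list of trajectories, each a list of (lat, lon) endpoints
    concentrations -- receptor concentration paired with each trajectory
    criterion      -- value above which a trajectory counts as "high"
    """
    n = defaultdict(int)  # n_ij: all endpoints in the cell
    m = defaultdict(int)  # m_ij: endpoints of high-concentration trajectories
    for endpoints, conc in zip(trajectories, concentrations):
        is_high = conc > criterion
        for lat, lon in endpoints:
            cell = grid_cell(lat, lon, cell_deg)
            n[cell] += 1
            if is_high:
                m[cell] += 1
    # N cancels in P[B_ij] / P[A_ij], so only the two counts are needed.
    return {cell: m[cell] / n[cell] for cell in n}


# Hypothetical toy data: three trajectories with two endpoints each.
trajs = [
    [(39.5, 116.5), (38.5, 120.5)],
    [(39.2, 116.8), (38.1, 120.9)],
    [(35.5, 125.5), (38.7, 120.2)],
]
concs = [12.0, 3.0, 9.0]
criterion = sum(concs) / len(concs)  # mean used as the criterion value
pscf = pscf_2d(trajs, concs, criterion)
```

In this toy case the cell visited by only one (high-concentration) endpoint receives a value of 1.0, which already hints at the small-sample limitation of the method.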

The conventional PSCF analysis has several limitations. For instance, when a map grid cell contains only a few trajectories, most of which are associated with poor air quality at the receptor site, the cell could exhibit an exceptionally high PSCF value that is statistically invalid because of the uncertainty associated with the small number of trajectory samples. To address this problem, other researchers have used approaches with a weighting function (Zeng and Hopke 1989; Cheng et al. 1993a, b). However, we did not use a weighting function here because we wanted to evaluate the results of our new algorithm without the influence of other modifications.
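As a hypothetical illustration of this small-sample problem (the counts below are invented for the example and are not taken from this study), compare a sparsely sampled cell with a well-sampled one:

$$ P_{\text{sparse}} = \frac{m_{ij}}{n_{ij}} = \frac{2}{2} = 1.0, \qquad P_{\text{dense}} = \frac{m_{ij}}{n_{ij}} = \frac{120}{200} = 0.6 $$

The sparse cell receives the maximum PSCF value from only two endpoints, although the evidence for a source there is far weaker than for the well-sampled cell.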

Another limitation is that the conventional PSCF cannot distinguish trajectories passing through a cell at different altitudes. We define the injection height of a pollutant as the height up to which the pollutant can penetrate the atmosphere. Air pollutants emitted from a ground-based source are known to be diffused and well mixed within the boundary layer, whose thickness is normally 1 or 2 km (Wallace and Hobbs 2006; Labonne et al. 2007). Altitude studies have shown that the air pollutants emitted from forest fires can be injected into the atmosphere above 2 km (Mazzoni et al. 2007; Labonne et al. 2007; Freitas et al. 2007). On the other hand, the injection heights of air pollutants emitted over cropland were usually below 1 km (Martin et al. 2010; Jian and Fu 2014). Thus, the injection height depends on the source, the transport characteristics of the pollutant, and the prevailing weather conditions. Naturally, an air parcel passing through a cell can pick up a particular pollutant only when it passes through the cell at an altitude lower than the injection height of that pollutant. The conventional PSCF analysis cannot include the effects of injection height for air pollutants emitted from different biomass burning activities. This effect of the third dimension, i.e., the effect of the injection height, is completely neglected in the traditional PSCF analysis, which is why we call it the 2D-PSCF.

3D-PSCF

To include the effect of injection height, we include a threshold altitude in the PSCF calculations. Because we consider not only the horizontal information but also the vertical influence of the injection height in our analysis, we name our method 3D-PSCF.

The primary idea of the 3D-PSCF method is illustrated in Fig. 1. As discussed previously, the conventional 2D-PSCF analysis assumes that the air parcels passing through a cell containing an emission source are indiscriminately affected by the air pollution in that cell (Fig. 1a). On the other hand, a 3D-PSCF analysis considers an air parcel passing through a cell with an emission source to be unaffected by the pollution if it passes through the cell at an altitude higher than the threshold height (Fig. 1b, c). To differentiate a trajectory going through a cell below the threshold height from one above the threshold height, we simply ignore the segment end points that lie above the threshold height during the calculation of each cell’s PSCF value.

Fig. 1

Schematic illustration of the concept of (a) 2D-PSCF and (b, c) 3D-PSCF. According to the 2D-PSCF analysis, trajectory A and trajectory B have the same possibility of being contaminated by a ground source of air pollutants. The 3D-PSCF analysis distinguishes trajectory A from trajectory B. Trajectory A cannot possibly contact the pollutants from the source area because it passes above the threshold height. The part of trajectory A passing above the threshold height (gray) is not counted in the 3D-PSCF calculation. Trajectory B and the black part of Trajectory A are counted as valid because they pass below the threshold height

The algorithm can be summarized as follows. Let N be the total number of trajectories observed during the whole study period. If $n'_{ij}$ trajectories fall into the ijth cell below the threshold height, the probability of this event, $A'_{ij}$, is given by

$$ P\left[A'_{ij}\right] = \frac{n'_{ij}}{N} $$
(4)

The probability of a high-concentration event, $B'_{ij}$, is again calculated using the threshold height:

$$ P\left[B'_{ij}\right] = \frac{m'_{ij}}{N} $$
(5)

where $m'_{ij}$ is the number of trajectories that pass through the ijth cell below the threshold height and arrive at the receptor site at a time when the measured concentration of the pollutant of interest is higher than the pre-specified criterion value (typically, the median of the measured concentrations in the available data set).

The 3D-PSCF ($P_{ij}^{3D}$) is thus defined as

$$ P_{ij}^{3D} = \frac{m'_{ij}}{n'_{ij}} $$
(6)

In other words, $P_{ij}^{3D}$ is the conditional probability that an air parcel that passes through the ijth cell below the threshold height exhibits a high concentration upon arrival at the receptor site. To evaluate the results of 3D-PSCF independently, we did not apply any other techniques to minimize the uncertainty in the PSCF calculation.
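The threshold-height filter of Eqs. (4) to (6) can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: the function name `pscf_3d`, the 1° cell size, the per-endpoint counting, and the toy data are assumptions. Endpoints are taken as (latitude, longitude, height in metres), and any endpoint above the threshold height is skipped before counting.

```python
from collections import defaultdict
import math


def pscf_3d(trajectories, concentrations, criterion, threshold_m, cell_deg=1.0):
    """Return {cell: m'_ij / n'_ij}, counting only endpoints at or below threshold_m.

    trajectories   -- list of trajectories, each a list of (lat, lon, height_m)
    concentrations -- receptor concentration paired with each trajectory
    criterion      -- value above which a trajectory counts as "high"
    threshold_m    -- threshold height in metres
    """
    n = defaultdict(int)  # n'_ij
    m = defaultdict(int)  # m'_ij
    for endpoints, conc in zip(trajectories, concentrations):
        is_high = conc > criterion
        for lat, lon, height_m in endpoints:
            if height_m > threshold_m:
                continue  # above the threshold height: ignored
            cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
            n[cell] += 1
            if is_high:
                m[cell] += 1
    return {cell: m[cell] / n[cell] for cell in n}


# Hypothetical toy data: both trajectories cross the same two cells.
trajs = [
    [(39.5, 116.5, 800.0), (38.5, 120.5, 1200.0)],   # high-concentration day
    [(39.2, 116.8, 2600.0), (38.1, 120.9, 1500.0)],  # low-concentration day
]
concs = [12.0, 3.0]
criterion = 7.5

p3d = pscf_3d(trajs, concs, criterion, threshold_m=2000.0)
# An infinite threshold keeps every endpoint and reproduces the 2D result.
p2d_like = pscf_3d(trajs, concs, criterion, threshold_m=float("inf"))
```

In this toy case the low-concentration trajectory passes over the first cell at 2,600 m, so with a 2,000 m threshold it no longer dilutes that cell's value, whereas the second cell is unchanged.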

Methods

Ambient data

We collected samples at the Yongon campus of Seoul National University, Seoul, Korea (37°35′N, 127°00′E). We carried out sampling for 24 h every third day with no rain from September 2006 to August 2007. Thirteen PAH compounds were identified and quantified: Phen, Anthr, Flt, Pyr, BaA, Chry, BbF, BkF, BeP, BaP, Ind, DahA, and BghiP. The particulate PAH measurements we used are described in detail in Lee et al. (2011).

We applied chemical mass balance (CMB) modeling to our PAH data (Jung et al. 2015). We considered the seven major PAH source profiles in Table 1 (coal for power plants, coke ovens, coal for residential use, gasoline vehicles, diesel vehicles, NG combustion, and biomass burning) and used the resulting contributions of biomass burning. The PAH source profiles were devised by Lee and Kim (2007) [original references for the source profiles: Li et al. (2003) for coal for power plants, coke ovens, and coal for residential use; Rogge et al. (1993a) for gasoline and diesel vehicles; Rogge et al. (1993b) for NG combustion; and Rogge et al. (1998) for biomass burning]. More information about CMB modeling for PAH data is available in Lee and Kim (2007) and Kim et al. (2013).

Table 1 Source composition of individual PAH compounds in the particle phase used for source profiles (Lee and Kim 2007)

Backward trajectory data

We performed backward trajectory analysis for the sampling days using the HYSPLIT4 (Hybrid Single-Particle Lagrangian Integrated Trajectory) model (http://www.arl.noaa.gov/ready/hysplit4.html, NOAA Air Resources Laboratory, Silver Spring, MD, USA) with GDAS (Global Data Assimilation System, operated by the US National Weather Service’s National Centers for Environmental Prediction) as the meteorological data. We calculated the trajectories at hourly intervals from 15:00 UTC, when sampling started, to 15:00 UTC the following day, when sampling finished. For each sampling day between 2006 and 2007, we used six starting heights: 500, 1,000, 1,500, 2,000, 2,500, and 3,000 m. Given 73 sampling days, we mapped 10,950 trajectories to the PSCF.
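The reported total is consistent with the following count, assuming that both the 15:00 UTC start and the 15:00 UTC end of each sampling period are included, which gives 25 hourly start times per sampling day (this inclusive counting is an assumption and is not stated explicitly above):

$$ 25\ \text{start times} \times 6\ \text{starting heights} \times 73\ \text{sampling days} = 10{,}950\ \text{trajectories} $$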

When we carried out research on long-range transport of air pollutants from Siberia and Mongolia, 72 h (3 days) backward trajectories were insufficient for some sampling days. The backward trajectories during sampling days with slow wind speed covered only small parts of China and the Yellow Sea. Thus, to consider long- or middle-range transport of air pollutants, we set the duration time for these data to 120 h (5 days) in the backward trajectory analysis. Figure 2 shows the frequency with which backward trajectories passed through each grid cell on sampling days. Most air parcels traveled over North/Northeast China and passed over the Yellow Sea and then arrived at Seoul, though a few air parcels passed over Japan and East China during the sampling period.

Fig. 2

Backward trajectory frequency during the sampling days between 2006 and 2007

Results and discussion

We applied 2D- and 3D-PSCF to both the total particulate PAH concentrations and the particulate PAH concentrations estimated to be from biomass burning, and compared the major source areas identified by the two methods.

Figure 3 illustrates the 2D-PSCF result for the total particulate PAH concentrations observed at Seoul. Figure 4 shows the 2D-PSCF result for the particulate PAH concentrations estimated to be from biomass burning based on the CMB modeling result. The total particulate PAH concentration 2D-PSCF plot in Fig. 3 displays two major regions as probable primary contributors to high PAH levels in Seoul. The first region is over the northern part of Hebei Province, Beijing and Tianjin, China. The second region is the border area between Dandong, China, and Sinuiju, North Korea. On the other hand, as shown in Fig. 4, the PSCF values for particulate PAHs from biomass burning were high over the northern part of Hebei Province, Beijing and Tianjin, China.

Fig. 3

PSCF plot of total particulate PAHs

Fig. 4

PSCF plot of particulate PAHs emitted from biomass burning calculated from the CMB modeling

Figures 5 and 6 show the 3D-PSCF values for total particulate PAHs and particulate PAHs from biomass burning, respectively, calculated with varying threshold heights. Figure 5 confirms that the major particulate PAH emission areas are similar to those shown in Fig. 3: the north part of Hebei Province, Beijing, Tianjin, and Dandong, Liaoning Province. As the threshold height decreases, however, hot spots appear more clearly at Beijing, the north part of Hebei, and the border area between Liaoning Province and North Korea. Figure 5 displays two main regions that contributed to the elevated concentration of particulate PAHs in Seoul. Region 1 started in the north part of Hebei Province and mainly appeared in Beijing; Region 2 appeared throughout the east part of Liaoning Province, China, and expanded to the border area between Dandong, China, and Sinuiju, North Korea. The two regions are most clearly shown in the PSCF plot with a threshold height of 2,000 m. In contrast, at the threshold height of 1,500 m, the effect of particulate PAH emissions in Beijing decreased and new hot spots appeared in the Inner Mongolian region. Judged by the PSCF values, the 3D method improved the ability of PSCF to indicate source locations compared with the 2D method. For example, the PSCF value at Beijing in Fig. 3 (using 2D-PSCF) was 0.6661. The values became higher as the threshold height was lowered in 3D-PSCF. However, when the threshold height was 1,500 m, the value was lower than in 2D-PSCF because the number of trajectories was too small to give a statistically meaningful result, as shown in Table 2. For the same reason, the new hot spots that appeared in the Inner Mongolian region at a threshold height of 1,500 m in Fig. 5d could be statistically meaningless. Still, with an appropriate threshold height, 3D-PSCF identified major source areas, such as Beijing, more clearly than 2D-PSCF.

Fig. 5

PSCF plot of total particulate PAHs with varying threshold heights: a 3,000 m, b 2,500 m, c 2,000 m, d 1,500 m

Fig. 6

PSCF plot of particulate PAHs from biomass burning with varying threshold heights: a 3,000 m, b 2,500 m, c 2,000 m, d 1,500 m

Table 2 The variation of the PSCF values between 2D- and 3D-PSCF

Region 1 covers high-energy-consumption areas in China. Coal consumption in Beijing, Tianjin, and Hebei Province was more than 9 % of the total coal consumption in China between 2006 and 2007, as shown in Table 3. Liaoning Province consumed 4.6 % of all the coal in China between 2006 and 2007 (China NBS 2010) and contained major coal deposits and coal mines (IEA 1999). Therefore, northerlies or northwesterlies could transport a major fraction of the particulate PAHs emitted from those areas to Seoul. The 3D-PSCF clearly indicated the effects of emissions in Beijing, Tianjin, and Liaoning on particulate PAH concentrations in Seoul, as shown in Fig. 5, whereas 2D-PSCF assigned lower values to those areas and did not indicate an impact from the Inner Mongolian region, as shown in Fig. 3.

Table 3 Coal consumption in China by region (unit: 10,000 tonnes) (China NBS 2010)

In general, air pollutants emitted from the surface mix well in the boundary layer, which is 1–2 km thick. In Seoul, the mixing heights were generally less than 300 m under stable conditions and 2–3 km under well-mixed conditions (Kim et al. 2007). In addition, the injection height of aerosols emitted from biomass burning varies with the conditions of burning, such as burning area, time, and materials (Mazzoni et al. 2007; Labonne et al. 2007; Freitas et al. 2007; Martin et al. 2010; Jian and Fu 2014). However, we considered the upper limit of injection heights for aerosols emitted by biomass burning to be generally about 2 km. In this study, the improvement in identifying the geographic location of an emission source using 3D-PSCF was noticeable at threshold heights of 2,000 m or lower. However, when the threshold was lower than 1,500 m, PSCF values could be overestimated because of a decrease in the total number of trajectories used in the calculation.
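The trade-off between a lower threshold and a smaller sample can be checked by counting how many endpoints survive each threshold. The sketch below is a hypothetical illustration: the function name `retained_endpoint_counts` and the endpoint heights are invented for the example and are not data from this study.

```python
def retained_endpoint_counts(heights_m, thresholds_m):
    """Count the endpoints lying at or below each threshold height."""
    return {t: sum(1 for h in heights_m if h <= t) for t in thresholds_m}


# Hypothetical endpoint heights (m) inside a single grid cell.
heights = [300, 700, 1200, 1600, 1900, 2100, 2400, 2700, 2900, 3500]

# Same threshold heights as the panels of Figs. 5 and 6.
counts = retained_endpoint_counts(heights, [3000, 2500, 2000, 1500])
```

A cell whose retained count drops to a handful of endpoints at the lowest threshold is a candidate for the unreliable PSCF values described above.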

Figure 6 illustrates the PSCF values for particulate PAHs from biomass burning with varying threshold heights. As the threshold height decreased, the major source areas for particulate PAHs from biomass burning expanded over North China, such as Beijing and Hebei Province, indicated as Region 1 in Fig. 6. We suggest that biomass burning in North China significantly influenced the air quality in Seoul because biomass burning occurred widely in North China. The emission of organic carbon (OC) from biomass burning in Northeast China (Heilongjiang Province and Jilin Province), located directly north of North Korea, was 27.5 % of the total OC emissions from biomass burning throughout China (Streets et al. 2003). However, PSCF values for particulate PAHs emitted from biomass burning appear high over North China (i.e., Beijing, Tianjin, Hebei, and part of Shandong) in Fig. 6, possibly because PAH emissions from biomass burning and biofuel consumption were larger in North China than in Northeast China (Zhang and Tao 2008). Compared with the result of the conventional PSCF, Region 1 is clearly displayed in the results with the 2,500 and 2,000 m thresholds in Fig. 6. Furthermore, a new track appeared from the north. With a 1,500 m threshold, the possible source areas appear quite fragmented, and the track is not apparent. Based on the results in Figs. 5 and 6, a height of 2,000 m is an appropriate threshold in 3D-PSCF for finding the source areas of residential biomass burning and small open burning. The PSCF values in Table 2 also confirm this result.

Whereas the PSCF values over China vary considerably between the 3D-PSCF and the conventional 2D-PSCF, we did not observe a similar change over North Korea. However, PSCF values over Pyongyang and the west part of North Korea gradually became distinguishable from those of the surrounding areas as the threshold height decreased. Pyongyang, the capital of North Korea, has approximately 13.6 % of the total population of North Korea and is the country’s center of industry, economy, and transportation (UNEP 2012). Therefore, energy consumption in Pyongyang, including biofuel, might be larger than in the surrounding area. Differences in air pollutant emissions inside North Korea were thus distinguished in 3D-PSCF, whereas the 2D-PSCF did not show a higher value over Pyongyang.

Conclusion and further study direction

We developed 3D-PSCF to apply injection heights to the PSCF calculation. As sources for high particulate PAH concentrations in Seoul, 3D-PSCF clearly distinguished the mega-source areas of Beijing, Tianjin, and Liaoning, especially at 2,000 m, unlike 2D-PSCF. The high potential source areas for biomass fuel burning estimated by 3D-PSCF included Pyongyang, North Korea, which did not appear in the 2D-PSCF results. Lee and Kim (2007) and Kim et al. (2013) evaluated North Korea as a source area of particulate PAHs in Seoul emitted by biomass burning between 2002 and 2003. Using PSCF calculations with threshold heights allowed us to identify the geographical position of large source areas, namely big cities using biomass for fuel.

Nonetheless, this study has some limitations. Statistical errors could occur in the PSCF calculations. We did not use the arbitrary weight function often used in other PSCF calculations to reduce the values of cells with few endpoints. Such statistical errors could be reduced by using more sampling data. In future work, we will add ambient data from between 2010 and 2011. In addition, the inherent uncertainty of CMB results could affect the understanding of the source of biomass fuel burning in the 3D-PSCF calculation. The use of another key marker of biomass burning, such as levoglucosan (1,6-anhydro-β-D-glucopyranose), could reduce that uncertainty. This work would support research quantifying the impact of air pollutants emitted from biomass fuel burning in North Korea on air quality in Seoul.