1 Introduction

Air pollution has become one of the world's leading environmental problems, driven by many factors. Airborne solid particles, liquid droplets, and gases all contribute to air pollution (Kanta et al., 2024; Singh & Singh, 2022; Srivastava, 2022). These gases and particles can originate from factories, dust, pollen, mold spores, volcanoes, wildfires, automobile and truck exhaust, and other sources (Luo et al., 2023; Xu et al., 2023). Particulate matter is one of the air pollutants in the atmosphere (Duarte et al., 2022). Particulate matter with a diameter of 2.5 microns (PM2.5) can penetrate deep into the lung passages and enter the bloodstream, causing severe cardiovascular, cerebrovascular, and respiratory illnesses after both short-term and long-term exposure (Ahmadi et al., 2024; Sharma et al., 2024). Air pollution harms human health and causes many diseases; long-term exposure is linked to cancer and poor neonatal outcomes (Anjum et al., 2024; Johnson et al., 2021). Globally, ambient air pollution is associated with an estimated 4.2 million deaths, primarily from heart disease, stroke, chronic obstructive pulmonary disease, lung cancer and acute respiratory infections (Chen et al., 2020; Santos et al., 2021; Sofwan et al., 2021; Sundram et al., 2022; Tung et al., 2022; Wan Mahiyuddin et al., 2023).

Particulate matter (PM) pollution is driven by rapid urbanization, traffic, biomass burning, industry, sea salt, and road dust or soil (Dhammapala et al., 2022; Gupta et al., 2022; Luo et al., 2024; Rahman et al., 2022; Shang & Luo, 2021; Soleimani et al., 2022). Urban areas have more industrial activity, vehicular traffic, and energy consumption (Akomolafe et al., 2024). Daily traffic congestion in residential areas makes PM2.5 a severe health concern (Singh & Agarwal, 2024). Vehicles, industrial processes, garbage burning, coal-burning power plants, and other natural and human sources generate these particles. Vehicle traffic is one of the leading causes of PM2.5 in urban and suburban areas (Jiang et al., 2023; Soleimani et al., 2022). Spatial change refers to variation from one place to another, that is, a quantity that differs with location or with the size of the area considered. Spatial changes in PM2.5 highlight how particle concentrations differ across various places (Meng et al., 2024). For example, PM concentrations in urban regions have been found to be higher than in rural areas of the same city (Xiao et al., 2020).

Malaysia has two monsoon seasons: the southwest monsoon, from June to September, and the northeast monsoon, from November to March. According to the Malaysian Meteorological Department, inter-monsoon seasons occur from April to May and October to November. The southwest monsoon is the driest time of year over the whole nation, except for Sabah in East Malaysia (Ramli et al., 2024). Most states record their lowest monthly rainfall during this time, and atmospheric conditions in the equatorial region are relatively stable. The northeast monsoon, in contrast, is the rainy season in Malaysia. Monsoon weather systems that develop along with cold air outbreaks from Siberia produce heavy rains that often cause severe flooding in the states of Kelantan, Terengganu, Pahang, and East Johor in Peninsular Malaysia, as well as the state of Sarawak in East Malaysia (Mohtar et al., 2018; Rahman et al., 2015). Haze, which affects Southeast Asian countries every year, also contributes to air pollution (Fang et al., 2024; Gu et al., 2024). Haze episodes are of great concern because they affect Malaysia almost annually. According to the Malaysian Department of Environment (DOE), Malaysia declared emergencies in several regions in 1997, 2005, 2013, and 2015 (Abd Wahid et al., 2018; Jaafar et al., 2018), and the latest episode, in 2019 (Othman et al., 2022), was the worst haze episode on record in Malaysia (Drahman et al., 2024; Fang et al., 2024).

Nonparametric regression estimates the relationship between a dependent variable and predictor variables without relying on assumptions about the functional form of the relationship or the statistical distribution of the data (Yu et al., 2004). Because it adapts to the data, it allows flexible prediction rules when parametric models fail to capture the true relationship between the predictors and the dependent variable. In previous research, Yu et al. (2004) examined the relationship between concentrations of O3, NOx, SO2 and respirable suspended particles (RSP) measured in the vicinity of Hong Kong International Airport (HKIA) and Los Angeles International Airport (LAX). Henry et al. (2002) used nonparametric regression to investigate the relationship between air pollution concentration and wind speed. The results are smooth curves with error bars that make it possible to pinpoint the locations of nearby sources and the wind direction at which concentrations peak; the equations for this approach are presented along with the corresponding confidence intervals. Although nonparametric regression can produce smoothed regression curves with confidence bands, no statistical framework has been developed to test for the presence of directionality in this context, and most conclusions have been based solely on regression plots (Cheng et al., 2015).

Geostatistics, a class of spatial statistical techniques, was established to assess and estimate the values of quantities distributed in space or over the Earth's surface (Gorai et al., 2015). Inverse distance weighting (IDW) is a deterministic approach to multivariate interpolation from a known, scattered set of points. IDW is flexible because the number of sample points can be adjusted for better interpolation (Fattorini et al., 2023). IDW models have also been reported to fit observed data points well, indicating that the results can be used to monitor and forecast PM2.5 concentration (Ismain et al., 2023). The values assigned to unknown sites are a weighted average of the values at the available locations (Aniyikaiye et al., 2024; Kumar & Kumar, 2020; Shukla et al., 2020). Zhang et al. (2018) used IDW, spline, and kriging interpolation to evaluate various air pollutants in Shanghai. Mohammadi et al. (2024) used IDW to interpolate over the Isfahan city region, introduced projected data for 2020, and displayed a pollution map for each month of the year. Jumaah et al. (2019) developed an Air Quality Index (AQI) prediction algorithm for Kuala Lumpur based on meteorological parameters, using IDW.

Air pollution affects human health, especially in developing cities, because of car exhaust, factory fumes, open burning, and similar sources. This study focuses on particulate matter with a diameter of 2.5 microns (PM2.5) because PM2.5 is smaller than PM10, can penetrate deeper into the respiratory system, and is therefore more harmful. In addition, PM10 has long been monitored in Malaysia, whereas PM2.5 monitoring is still new: data collection began in July 2017, giving only about four years of data compared with more than ten years of PM10 data from the DOE. Consequently, there are limited references or studies on PM2.5 concentration in Malaysia compared with developed countries. This study uses nonparametric methods to show the relationship between PM2.5 and trend or seasonal factors during the study period. In addition, to obtain precise information about the study area, the spatial statistical method IDW is used to produce an interpolated map of the study area. The study aims to determine the relationship between PM2.5 and time, including trends and seasonal patterns, and the spatial variation of PM2.5 in the Klang Valley in Peninsular Malaysia. The relationships between variables, trends, and seasonal patterns are characterised using nonparametric regression, while the spatial variation of PM2.5 in September 2019 caused by haze episodes is described for the affected area using IDW. The findings may be applicable to other sites or regions, giving government agencies benchmarks for managing air quality and health issues as well as crucial details on the long-term health hazards faced by the inhabitants of the study area. In general, this study offers significant perspectives for managing air quality and controlling pollution. Preserving public health and promoting sustainable development require an understanding of the spatiotemporal dynamics of PM2.5.

2 Data and Study Area

2.1 Area of Study

Kuala Lumpur, or the Federal Territory of Kuala Lumpur, is one of the fastest-growing cities in Asia and the largest city in Malaysia. It lies 25 miles (40 km) east of Peninsular Malaysia's ocean port, Port Kelang, on the Strait of Malacca, in the country's west-central region, midway along the west coast's tin and rubber belt. It is the nation's most extensive urban region and its transportation, commercial and cultural hub, and it is among the fastest-growing metropolitan regions in Southeast Asia in population and economic development. The Batu Muda and Cheras monitoring stations in Kuala Lumpur are classified as urban. Putrajaya is in west-central Peninsular Malaysia, 15 miles (25 km) south of Kuala Lumpur, and functions as the administrative and judicial capital of Malaysia. The Putrajaya monitoring station is classified as urban because of the high level of activity in the government administration sector, with almost all Malaysian government ministries concentrated there. Selangor is a state in western Peninsular Malaysia facing the Strait of Malacca. The state is level in the west, and the mountains along its eastern edge effectively create the Klang Valley. Selangor, which covers an area of around 8,000 km2, stretches to the north coast of Melaka on the west side of Peninsular Malaysia. It borders the Federal Territories of Kuala Lumpur and Putrajaya and is situated in the centre of Peninsular Malaysia on the west coast. Selangor has several monitoring stations, including Kuala Selangor, Petaling Jaya, Shah Alam, Klang and Banting.

This study focuses on one parameter, PM2.5 concentration. The monthly average PM2.5 concentrations from January 2018 to December 2021 were provided by the Department of Environment (DOE), Malaysia. There are 65 monitoring stations in Malaysia, but this study focuses on the Klang Valley. All stations used are in the Klang Valley, in the state of Selangor and the Federal Territories of Kuala Lumpur and Putrajaya. Peninsular Malaysia, also known as West Malaysia, is divided into central, southern, northern, and eastern zones and forms part of the 13-state federation of Malaysia. Fig. 1 shows the Klang Valley, located in the central zone, on an enlarged map alongside a smaller map of Peninsular Malaysia. Eight stations are involved: Batu Muda (S1) and Cheras (S2) in Kuala Lumpur; Putrajaya (S3) in Putrajaya; and Kuala Selangor (S4), Petaling Jaya (S5), Shah Alam (S6), Klang (S7) and Banting (S8) in Selangor. The stations are further classified by location type as urban (Batu Muda, Cheras, Putrajaya, and Klang), suburban (Kuala Selangor, Shah Alam, Banting) and industrial (Petaling Jaya).

Fig. 1

The map of Peninsular Malaysia showing the Klang Valley

2.2 Collecting and Processing Data

PM2.5 concentrations were gathered from eight continuous air quality monitoring stations (CAQMSs) in Peninsular Malaysia's Klang Valley. The location of each monitoring station is shown in Fig. 1. Pakar Scieno TW Sdn Bhd (Shah Alam, Malaysia) operates every CAQMS on behalf of the Malaysian Department of Environment. The PM2.5 concentrations were measured using a Thermo Scientific tapered element oscillating microbalance (TEOM) 1405 DF (USA). Before submitting the data to the Malaysian Department of Environment, Pakar Scieno TW Sdn Bhd handled all calibration processes and the quality assurance/quality control (QA/QC) of the data (Mohtar et al., 2022; Rusmili et al., 2023).

2.3 The Breakpoint of PM2.5 Concentration in the Air Pollutant Index (API)

The air pollutant index (API) is computed from averaged air quality sensor data and rises with traffic, forest fires, industrial activity, or other sources that increase air pollution. For Malaysian API calculations, it is advised that all pollutant breakpoints adhere to the United States Environmental Protection Agency (USEPA) recommended breakpoints based on health implications (Latif et al., 2014; Rusmili et al., 2023; Usmani et al., 2020). The pollutants are measured over various averaging periods per WHO guidelines. Under the Malaysian Ambient Air Quality Standard (MAAQS), which is based on human health effects, the limit for particulate matter smaller than 2.5 microns (PM2.5) is 35 µg/m3 for a 24 h averaging time and 15 µg/m3 for an annual averaging time. According to the USEPA, the PM2.5 breakpoint categories are Good (0–12.0 µg/m3), Moderate (12.1–35.4 µg/m3), Unhealthy for Sensitive Groups (35.5–55.4 µg/m3), and Unhealthy (55.5–150.4 µg/m3).
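As an illustration of how the breakpoints listed above can be applied, the following minimal Python sketch maps a PM2.5 concentration to its category. The function name `classify_pm25`, the rounding to one decimal place, and the label returned for values beyond the listed ranges are assumptions made for this example and are not part of the API procedure described in the text.

```python
# USEPA PM2.5 breakpoint categories as listed in the text (µg/m3).
# Only the four categories quoted in the text are encoded; values above
# 150.4 µg/m3 are reported with a placeholder label (an assumption).
PM25_BREAKPOINTS = [
    (12.0, "Good"),
    (35.4, "Moderate"),
    (55.4, "Unhealthy for Sensitive Groups"),
    (150.4, "Unhealthy"),
]


def classify_pm25(concentration):
    """Return the breakpoint category for a PM2.5 concentration in µg/m3."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    # Breakpoints are stated to one decimal place, so round before comparing
    # (the rounding rule is an assumption of this sketch).
    value = round(concentration, 1)
    for upper, label in PM25_BREAKPOINTS:
        if value <= upper:
            return label
    return "Above listed range"
```

For example, `classify_pm25(78.6)` falls in the 55.5–150.4 µg/m3 range and therefore returns the Unhealthy label.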

3 Methodology

3.1 Nonparametric Regression

Nonparametric methods have several advantages over parametric ones. Because complex nonlinearities frequently characterize the relationship (Shahbaz et al., 2020) between PM2.5 concentration and time, nonparametric estimators are better suited to modelling these nonlinearities without specifying a functional form (Salibian-Barrera, 2023). When parametric methods are used to describe nonlinearities such as abrupt or smooth structural breaks, dummy variables must be added to the parametric specification. This can cause issues with the asymptotic properties of the estimators, especially in small samples. Nonparametric approaches can produce valid estimates when there are abrupt or smooth structural changes (Shahbaz et al., 2017). Nonparametric regression estimates the mean value of a dependent variable given the values of one or more predictor variables (Yu et al., 2004). The general model is

$$y=m\left(x\right)+\varepsilon$$
(1)

where \(y\) is the response variable (PM2.5), \(x\) is the covariate (time), \(\varepsilon\) is an independent error term with mean 0 and variance \({\sigma }^{2}\), and \(m\left(x\right)\) is the nonparametric regression curve evaluated at \(x\). The null and alternative hypotheses for testing the no-effect model are

$${H}_{0}:E\left({y}_{i}\right)=\mu$$
(2)
$${H}_{1}:E\left({y}_{i}\right)=m({x}_{i})$$
(3)

The pseudo-likelihood ratio test (PLRT), or F test, checks for a relationship between the two variables. If the F statistic exceeds the tabulated F value, the null hypothesis is rejected, and it is concluded that there is a relationship between the predictor and the outcome. Besides the F test, the p-value can be used: a p-value less than 0.05 leads to rejection of the null hypothesis. Graphically, where the curve lies outside the reference band, the no-effect and nonparametric models differ by more than two standard errors, which provides some evidence of a relationship between the two variables (Azzalini et al., 1989; Bowman & Azzalini, 1997; Eubank & LaRiccia, 1993; Huang & Su, 2009).
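The logic of the no-effect test can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure used in the study: the smoother is a Gaussian-kernel (Nadaraya-Watson) estimator with a hypothetical bandwidth `h`, the statistic is the F-type ratio of residual sums of squares under the two hypotheses, and the p-value is obtained by a permutation test swapped in for the tabulated reference distribution. All function names are illustrative.

```python
import math
import random


def kernel_weights(x0, xs, h):
    """Normalised Gaussian kernel weights of the observations xs at point x0."""
    raw = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    total = sum(raw)
    return [r / total for r in raw]


def smooth(xs, ys, h):
    """Kernel estimate of m(x) at every observed x."""
    return [
        sum(w * y for w, y in zip(kernel_weights(x0, xs, h), ys))
        for x0 in xs
    ]


def f_statistic(xs, ys, h):
    """F-type statistic (RSS0 - RSS1) / RSS1 comparing H0 with H1."""
    mean_y = sum(ys) / len(ys)
    fitted = smooth(xs, ys, h)
    rss0 = sum((y - mean_y) ** 2 for y in ys)             # no-effect model
    rss1 = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # smooth model
    return (rss0 - rss1) / rss1


def no_effect_p_value(xs, ys, h, n_perm=500, seed=0):
    """Permutation p-value: shuffling y breaks any relationship with x."""
    rng = random.Random(seed)
    observed = f_statistic(xs, ys, h)
    shuffled = list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(xs, shuffled, h) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

A small p-value from `no_effect_p_value` corresponds to rejecting the null hypothesis of no effect, that is, to evidence that the mean response changes with the covariate.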

The width of the band and the detailed shape of the curve change as the smoothing parameter changes. However, this does not markedly change the indication of where and to what extent the nonparametric estimates exceed the reference band. The reference band stretches from

$$\overline{y}-2\widehat{\sigma}b \quad \text{to} \quad \overline{y}+2\widehat{\sigma}b$$
(4)

where \(b\) denotes the quantity \(\sqrt{\sum_{i}{({v}_{i}-\frac{1}{n})}^{2}}\), \({v}_{i}\) are the smoothing weights, which depend on the value of \(x\), \(n\) is the number of sample points, and \(\widehat{\sigma }\) is the estimated standard deviation.
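The band in Eq. (4) can be computed directly once the weights \({v}_{i}\) are available. The Python sketch below is illustrative only: it assumes Gaussian kernel weights with a hypothetical bandwidth `h`, and it estimates \(\widehat{\sigma }\) from first differences of the ordered responses, which is one common choice and not necessarily the estimator used in this study.

```python
import math


def kernel_weights(x0, xs, h):
    """Normalised Gaussian kernel weights v_i of the observations at x0."""
    raw = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    total = sum(raw)
    return [r / total for r in raw]


def difference_sigma(ys):
    """First-difference estimate of the error s.d. (ys ordered by x)."""
    n = len(ys)
    ss = sum((ys[i + 1] - ys[i]) ** 2 for i in range(n - 1))
    return math.sqrt(ss / (2 * (n - 1)))


def reference_band(xs, ys, h, grid):
    """Return (x, lower, upper, estimate, outside) for each grid point."""
    n = len(ys)
    mean_y = sum(ys) / n
    sigma = difference_sigma(ys)
    band = []
    for x0 in grid:
        v = kernel_weights(x0, xs, h)
        # b = sqrt(sum_i (v_i - 1/n)^2), as defined below Eq. (4)
        b = math.sqrt(sum((vi - 1.0 / n) ** 2 for vi in v))
        estimate = sum(vi * yi for vi, yi in zip(v, ys))
        lower = mean_y - 2 * sigma * b
        upper = mean_y + 2 * sigma * b
        band.append((x0, lower, upper, estimate, not (lower <= estimate <= upper)))
    return band
```

Grid points where the last element is `True` are those where the smooth estimate leaves the band, which is the graphical evidence against the no-effect model.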

The band was calculated at each position using the standard error of the difference between the two curves. The band has a straightforward hypothesis testing interpretation that works for smoothing approaches that introduce bias. It can be beneficial as a graphical tool for locating any differences that may be found or for elucidating why apparent differences do not provide robust evidence of statistical significance for the global comparison of curves. Reference bands for equality are developed and investigated in various contexts (Bowman & Azzalini, 1997; Dong et al., 2023).

Locally estimated scatterplot smoothing (LOESS) is a regression method that draws a smooth line through a scatter plot. It helps to reveal relationships between variables and trends in them. It is a nonparametric regression technique that combines k-nearest-neighbour regression with multiple regression: without assuming a functional form for the data, it fits a curve that follows their central tendency. The smoother detects broad patterns, and it is mainly used with large data sets to show the relationship between two variables. In this study, LOESS was used to smooth monthly changes in PM2.5 concentration at each monitoring station so that the four years could be compared through time series and trend analysis (Bowman & Azzalini, 1997; de Jesus et al., 2020; Khan et al., 2022).
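The combination of nearest-neighbour windows and local regression can be made concrete with a short Python sketch. It is a simplified version under stated assumptions: a local linear fit with tricube weights, a hypothetical `span` parameter giving the fraction of points in each window, and no robustness iterations. It is not the R implementation used to produce the figures.

```python
def loess(xs, ys, span=0.5):
    """Local linear fit with tricube weights, evaluated at each xs[i]."""
    n = len(xs)
    k = min(n, max(3, int(round(span * n))))  # neighbours per local window
    fitted = []
    for x0 in xs:
        # Window half-width: distance to the k-th nearest observation.
        d_max = sorted(abs(x - x0) for x in xs)[k - 1]
        if d_max == 0:
            d_max = 1.0
        sw = swx = swy = swxx = swxy = 0.0
        for x, y in zip(xs, ys):
            u = abs(x - x0) / d_max
            w = (1 - u ** 3) ** 3 if u < 1 else 0.0  # tricube weight
            sw += w
            swx += w * x
            swy += w * y
            swxx += w * x * x
            swxy += w * x * y
        denom = sw * swxx - swx ** 2
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted
```

A larger `span` averages over more months and gives a smoother trend line, while a smaller one follows month-to-month changes more closely.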

3.2 Inverse Distance Weighting (IDW)

Inverse distance weighting (IDW) is one of the most common interpolation methods in geographic information systems (GIS). IDW is an exact interpolator: at a sampled location it returns the observed value. It is a deterministic interpolation technique that estimates the values at unsampled points from the values at nearby locations, weighted only by distance. In IDW, a known point close to an unknown point is given a relatively larger weight, and a known point far away is given a relatively smaller weight (Choi & Chong, 2022). IDW assumes that nearby locations are more closely related to the interpolation location than distant ones. The IDW equations are (Oyana, 2020):

$$z\left(x\right)=\frac{{\sum }_{i=1}^{n}{w}_{i}{z}_{i}}{{\sum }_{i=1}^{n}{w}_{i}}$$
(5)
$${w}_{i}=\frac{1}{{d}_{i}^{2}}$$
(6)

where \(z\left(x\right)\) is the value at the unknown point \(x\), \(n\) is the number of sample points near the target location, \({w}_{i}\) is the weight for sampled point \({x}_{i}\), \({z}_{i}\) is the value at sampled point \({x}_{i}\), and \({d}_{i}\) is the distance from point \({x}_{i}\) to point \(x\).
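Equations (5) and (6) translate almost directly into code. The Python sketch below is illustrative: the function name `idw`, the planar coordinates, and the station values in the usage note are hypothetical, and real station coordinates would first need to be projected so that Euclidean distance is meaningful.

```python
import math


def idw(target, stations, power=2):
    """Inverse distance weighted estimate at target = (x, y).

    stations is a list of ((x, y), value) pairs; power=2 gives the
    weight w_i = 1 / d_i^2 of Eq. (6).
    """
    numerator = 0.0
    denominator = 0.0
    for (sx, sy), value in stations:
        d = math.hypot(target[0] - sx, target[1] - sy)
        if d == 0:
            return value  # target coincides with a sampled point
        w = 1.0 / d ** power
        numerator += w * value
        denominator += w
    return numerator / denominator
```

With two hypothetical stations at (0, 0) and (2, 0) reporting 10 and 30 µg/m3, the estimate at (0.5, 0) is (4 × 10 + (4/9) × 30) / (4 + 4/9) = 12 µg/m3, which shows how strongly the nearer station dominates.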

IDW provides an easy way to predict values of continuous variables at locations where measurements are unavailable. However, it does not capture local peaks or pits well, which can lead to undesirable results in such areas. The IDW approach is also an appropriate and valuable prediction method for spatial air quality modelling when point observations are sparse (Jumaah et al., 2019). IDW interpolation operates explicitly under the presumption that variables at closer locations share more characteristics than those farther apart (Shukla et al., 2020). Because there are few monitoring stations, the surface produced by spatial interpolation might not reflect the actual pollution surface. Despite this, it is a fair estimate of the spatial variation of air pollution.

In this study, R software version 4.2.1 was used. The "pheatmap" and "ggplot2" packages were used to analyse the data and construct the graphs. Temporal patterns of monthly aggregated data at the eight locations were evaluated. QGIS software was used to interpolate and map the study area using the inverse distance weighting (IDW) method. This method is suitable for this analysis because it is appropriate for modelling spatial air quality with few observation points (Halim et al., 2020). The study's framework focuses on analyzing PM2.5 concentration across both time and space. Relationships between PM2.5 concentration, trends, and seasonal fluctuations are established using nonparametric regression for the data period from 2018 to 2021. For the spatial aspect, the IDW approach provides a clearer understanding of spatial variations during the haze episodes in September of the critical year 2019, which experienced significant impacts (Fig. 2).

Fig. 2

The framework of the study

4 Results and Discussion

Table 1 Annual descriptive statistics of PM2.5 concentration, 2018–2021

In 2019, the mean PM2.5 concentration was higher than in other years (Table 1) because of the haze episodes that occurred in that year. In addition, socioeconomic growth is occurring rapidly alongside the development of cities in urban and suburban areas. Under the New Malaysia Ambient Air Quality Standard, the limit for particulate matter smaller than 2.5 microns (PM2.5) is 15 µg/m3 as an annual average and 35 µg/m3 as a 24 h average. The WHO air quality guideline is 10 µg/m3 for one year and 25 µg/m3 for 24 h (Usmani et al., 2020).

Table 2 Hypothesis Testing for eight monitoring stations in Klang Valley

According to Table 2, all stations (S1–S8) are significant, with p-values less than 0.05. The null hypothesis of no effect is therefore rejected, indicating evidence of a relationship between PM2.5 concentration and time. Monthly average data are used to illustrate the trend at each station (S1–S8) from 2018 to 2021. PM2.5 concentration is analysed using LOESS to show the movement of the data, and nonparametric regression is used to check the relationship between PM2.5 concentration and time. A reference band (grey) is placed on the graph to indicate where the nonparametric regression curve should lie under the null hypothesis (Lin et al., 2022; Simonoff & Tsai, 1991). The reference band for the no-effect model of the relationship between PM2.5 concentration and time is based on the data provided by the DOE (Fig. 3).

The time series plot (Fig. 3) shows fluctuations at all stations. In 2019, there is a sudden spike in September at all stations, caused by the haze episodes affecting Southeast Asia, including Malaysia. In 2020, all stations showed a decrease in PM2.5 concentration compared with the previous year. The COVID-19 pandemic began early in 2020, and the government order restricting movement to prevent the spread of the virus had a positive effect on the natural environment and air quality (B. M. Hashim et al., 2021a, 2021b). Although there were fluctuations, PM2.5 concentrations remained lower than in 2018 and 2019. At the end of 2020, concentrations increased from November to December at stations S4, S5, S6, S7, and S8 following the relaxation of the restriction order by the government of Malaysia. In early 2021, PM2.5 increased as economic activities resumed after the government eased or lifted the restriction order so that people could return to their everyday activities (Elengoe, 2020; J. H. Hashim et al., 2021a, 2021b).

Fig. 3

The trend of the monthly mean concentration of PM2.5 from 2018 to 2021. Locally estimated scatterplot smoothing (LOESS) was used to smooth out monthly fluctuations. The shaded areas represent standard errors

The concentration of PM2.5 at the Batu Muda station began to increase in March and decreased in October in both 2018 and 2019, with a high concentration in September 2019. In 2020, the concentration fluctuated from month to month: it increased in March, decreased in April, rose again until June, fell in July, rose in August, declined in September, increased in October, and then dropped until December (Fig. 4(a)). The Cheras station is in an urban, developing area. There, the concentration of PM2.5 started to increase at the beginning of the year. September 2019 shows the highest concentration across the three years. July, August, and September have high PM2.5 concentrations because of the dry season (Fig. 4(b)).

At Putrajaya, the concentration of PM2.5 in 2018 and 2019 increased from January to March and from June to September. Putrajaya is an administrative city where many people live and work, resulting in heavy vehicle use. Its many concrete buildings also cause the area to heat up. Human activity significantly affects urban climate, so conditions in cities can differ considerably from those in nearby rural areas; the climatic effects of urbanisation can be seen in urban heat islands (Salleh et al., 2013). In 2020, the concentration of PM2.5 was lower than in the previous year because the government implemented work-from-home (WFH) arrangements to reduce interaction between people, which minimised the use of transportation (Fig. 4(c)). The Kuala Selangor station has lower PM2.5 concentrations because it is in a rural area. In 2019, the concentration was high compared with 2018 and 2020 because haze affected almost all of Malaysia, especially in September 2019. The concentration increased from the middle of the year because of the dry season from May to August and, except in September 2019, decreased from September to December because of the rainy season (Fig. 4(d)).

At the Petaling Jaya station, concentrations were higher in 2018 and 2019 than in 2020. From January to April, the concentration was between 21 and 32 µg/m3; it decreased in May and increased again from June to August, except in 2019, when it continued to increase until September, before declining at the end of the year. In 2020, the concentration rose and fell throughout the year as successive movement control orders (MCO) gradually allowed people to go out. From November to December 2021, PM2.5 increased as the government began to loosen the restriction orders, allowing some businesses to operate (Fig. 4(e)) (Abdullah et al., 2020). In the urban area, the Shah Alam monitoring station recorded high PM2.5 concentrations in 2018 and 2019. In 2019, concentrations increased from January to March and from June to September. In 2020, the concentration remained below 20 µg/m3 throughout the year (Fig. 4(f)).

At the Klang station, PM2.5 concentrations were similar across the four years. The station is located on the city's outskirts but close to the sea, near Port Klang, which has heavy ship traffic. Smoke released from ships emits particles that pollute the air. Although most human activities were shut down in 2020, Port Klang continued to operate as usual (Fig. 4(g)). The Banting station, located in a suburban area, showed high PM2.5 concentrations in 2018 and 2019. Because of the dry season, concentrations increase from June to September and start declining in October. In 2020, the concentration was lower than in other years because of the COVID-19 outbreak (Fig. 4(h)) (Abdullah et al., 2020; Mohd Nadzir et al., 2021).

Fig. 4

The trend of the monthly mean PM2.5 concentration over 4 years: (a) Batu Muda (S1), (b) Cheras (S2), (c) Putrajaya (S3), (d) Kuala Selangor (S4), (e) Petaling Jaya (S5), (f) Shah Alam (S6), (g) Klang (S7), (h) Banting (S8)

Overall, September 2019 showed high concentrations of PM2.5 at every station. During 2019, haze from Indonesia affected several Southeast Asian countries from February to September. The affected countries were Brunei, Indonesia, Malaysia, the Philippines, Singapore, Thailand, and Vietnam. Malaysia was affected in August, and conditions worsened in September. Haze is a long-term problem that occurs with varying intensity during each regional dry season. It is caused mainly by forest fires from illegal land clearing carried out on behalf of the palm oil industry in Indonesia, especially on the islands of Sumatra and Borneo, which spread rapidly during the dry season. The southwest and northeast monsoons affect Malaysia annually from June to September and from November to March, respectively. June to September is the dry season, when most air masses originate from the west or southwest, particularly from Sumatra, Indonesia (Khan et al., 2016; Othman et al., 2022). In 2020, PM2.5 concentrations were lower than in 2018 and 2019. In April 2020, the first movement control order (MCO 1.0) was in force. At that time, most activities were suspended to keep people at home, reducing infection and human-to-human contact. The MCO lowered air pollution levels by decreasing the number of cars on the road. Reducing vehicle emissions and other local anthropogenic activities will eventually produce a cleaner ambient environment (Elengoe, 2020).

Fig. 5 shows seasonality in the distribution of PM2.5 concentration, with an increase between June and September in all four years at all stations. The southwest monsoon occurs between June and September, and the northeast monsoon between November and March. According to the Malaysian Meteorological Department, inter-monsoon seasons occur in Malaysia from April to May and October to November. PM2.5 levels were higher between June and September, during the southwest monsoon. Ahamad et al. (2014) found less air pollution during the northeast monsoon (wet season) than during the southwest monsoon (dry season). Tai et al. (2010) showed that rainfall and a decrease in temperature together contributed to the reduction in PM2.5 concentration (Usmani et al., 2020). High temperatures usually bring warm weather and reduced relative humidity, which promote local and regional biomass burning and increase the number of particles in the air. The main factor is rising temperature, which causes significant resuspension of particles and high evaporation rates (Juneng et al., 2011; Latif et al., 2014). In addition, wind from the direction of Sumatra, Indonesia, carries biomass-burning emissions, adding to the particulate matter in the atmosphere and contributing to air pollution (Latif et al., 2014). Such haze episodes occurred in 2013 (Ho et al., 2014; Jaafar et al., 2018) and most recently in 2019 (Othman et al., 2022; Zainal et al., 2021), affecting Southeast Asian countries including Malaysia, Thailand, Singapore and Vietnam.
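The seasonal comparison described here amounts to grouping monthly means by monsoon season. The Python sketch below illustrates that grouping; the function names and the input values in the usage note are hypothetical, and because the text lists November under both the inter-monsoon period and the northeast monsoon, the sketch assumes November belongs to the northeast monsoon.

```python
def monsoon_season(month):
    """Map a calendar month (1-12) to the season definitions in the text."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if 6 <= month <= 9:
        return "southwest monsoon"   # June to September (dry season)
    if month >= 11 or month <= 3:
        return "northeast monsoon"   # November to March (wet season)
    return "inter-monsoon"           # April, May and October (assumption)


def seasonal_means(monthly):
    """monthly: iterable of (month, pm25) pairs -> mean PM2.5 per season."""
    totals = {}
    for month, value in monthly:
        season = monsoon_season(month)
        running_sum, count = totals.get(season, (0.0, 0))
        totals[season] = (running_sum + value, count + 1)
    return {season: s / c for season, (s, c) in totals.items()}
```

For instance, hypothetical readings of 30 and 40 µg/m3 in June and July and 20 µg/m3 in December give a southwest monsoon mean of 35 µg/m3 and a northeast monsoon mean of 20 µg/m3.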

Fig. 5

PM2.5 concentration for Station 1- Station 8

The haze episode produced spatial changes in PM2.5 in 2019. To show how the pattern differed over the period, spatial interpolation with the inverse distance weighting (IDW) method was used to map the study area for each month of 2019.

Fig. 6 shows that in 2019 the concentration of PM2.5 in the Klang Valley was moderate from January to July and from October to December. Transportation, manufacturing, and many other sectors release at least small amounts of air pollutants, especially particulate matter, into the atmosphere. As a result of these uncontrolled emissions, the concentration of PM2.5 in the atmosphere stays within the range of 12–35 µg/m3. In March 2019, the map (Fig. 6) shows that the concentration of PM2.5 increased but remained in the moderate range, which is attributed to pre-haze conditions. Zainal et al. (2021) reported that pre-haze pollution originated primarily from Malaysia, with a minor amount from Sumatra. Open burning, automobile emissions, industrial fires, and agricultural practices, including burning peat lands and rice paddies for replanting, all contribute to Malaysia's haze. From April to July, PM2.5 concentrations were lower than in March. In August, the forest fire crisis in Indonesia started and affected many countries in Southeast Asia (Othman et al., 2022). In Malaysia during August, PM2.5 concentrations ranged spatially from 36.4 to 42.8 µg/m3. In September, the concentration of PM2.5 was higher and reached the Unhealthy category based on the USEPA PM2.5 breakpoints (Fig. 6). This is consistent with the heatmap (Fig. 4), in which the concentration of PM2.5 is between 65.1 and 78.6 µg/m3, within the Unhealthy range of the USEPA breakpoints. The primary source of the increase in PM2.5 concentration was a haze event brought on by agricultural operations in Sumatra, Indonesia, which then spread to Malaysia (Zainal et al., 2021).

Fig. 6

The spatial interpolation of PM2.5 concentration using Inverse Distance Weighting (IDW) from January to December 2019

5 Conclusion

The nonparametric regression analysis gave significant results (p-value < 0.05) at all monitoring stations (S1–S8), leading to the rejection of the null hypothesis of no effect. This indicates a relationship between PM2.5 concentration and time. Across all stations, a consistent trend in PM2.5 concentration was observed over the four-year period. Specifically, the yearly trend records PM2.5 concentrations falling within the Unhealthy for Sensitive Groups category (35.5–55.4 µg/m3) of the USEPA breakpoints. However, an anomalous pattern occurred in 2019, with sudden increases in certain months due to haze episodes. PM2.5 concentrations followed a seasonal pattern from 2018 to 2021, rising during the southwest monsoon (June to September) and dropping during the wet season (the northeast monsoon, November to March) and the inter-monsoon season (April to May). Across 2019, most stations fell into the Moderate category based on the PM2.5 breakpoints in the Air Pollutant Index (API). Notably, September 2019 showed unhealthy readings in the monthly heat map spanning 2018 to 2021. Urban areas generally experience higher PM2.5 levels due to increased human activity and development. The Putrajaya station stands out with a high PM2.5 concentration (78.6 µg/m3) compared with other stations. The IDW map highlights PM2.5 concentrations across the area, with higher levels (shown in red) during September 2019 than in other months. It is important to note that data collection for PM2.5 began only in July 2017, so the study period is relatively short. Extending the time period is recommended for future research to improve accuracy and broaden the study's scope.
Overall, examining the temporal and spatial trends in PM2.5 concentrations in the Klang Valley is important for public policy, environmental sustainability, and international efforts to combat air pollution and climate change, in addition to being vital for the health and welfare of the local population.