Introduction

Carbon monoxide (CO) is a colourless, odourless and tasteless gas present in the atmosphere. The primary sources of CO are combustion of fossil fuel, biomass burning and oxidation of methane and biogenic hydrocarbons (Holloway et al. 2000). In the atmosphere, CO can be produced both naturally and anthropogenically. Natural sources include forest fires, oxidation of plant biogenic hydrocarbons (Vadrevu et al. 2013), volcanic eruptions (Simarski 1992), oceans (Ohta 1997; Zuo et al. 1997; Stubbins et al. 2006) and coastal areas (Jones and Amador 1993), whereas anthropogenic sources include combustion of agricultural waste and fossil fuels (Crutzen and Andreae 1990). Although CO is produced in low quantities during normal animal metabolism, it is highly toxic to animals and human beings at moderately high concentrations (~ 35 ppm) (Buchwitz et al. 2005). CO is important in the atmosphere because it controls the photochemical production of ozone through its reactions with the hydroxyl radical (OH) (Brühl and Crutzen 1999). CO thus acts as a major sink of OH. Depending upon the concentration of ambient OH radicals, the atmospheric lifetime of CO may vary from a few weeks to 2 months (Lawrence et al. 2003). The reactions (including photolytic ones) involved in the formation and destruction of CO in the atmosphere are given in Table 3 (Ghosh et al. 2015). Though not itself a greenhouse gas, CO controls the production of major greenhouse gases (methane, ozone and CO2) in the troposphere (IPCC AR5). As a result, CO has an indirect radiative forcing effect (0.18–0.29 W m−2) in the atmosphere and thus enhances global warming (Logan et al. 1981; Thompson 1992; Brühl and Crutzen 1999). All these factors make CO an important air pollutant (Lalitaporn et al. 2013; Manju et al. 2018).
Earlier studies have shown that remote locations in the Northern Hemisphere have nearly double the CO concentrations of those in the Southern Hemisphere (Hamilton and Mansfield 1991).

Table 1 Details of the study grids

Ground-based measurements, owing to their limited spatio-temporal coverage, cannot fully explain the behaviour of CO on a regional basis. A long-term, model-based analysis of multiple datasets, as provided by the MERRA-2 output, is therefore a potential means of understanding the regional contribution of CO in the atmosphere. In the past decade, MERRA-2 reanalysis data have been applied extensively in many research works (Reichle et al. 2017; Wargan et al. 2017). However, very few studies based on reanalysis CO data have been conducted so far in the Southeast Asian region (Amnuaylojaroen et al. 2014). As the spatial and temporal variability of CO is very high, detailed monitoring of these patterns is important for understanding the behaviour of CO on a regional basis.

This paper aims to present a comprehensive study of the spatio-temporal variability of CO over the Eastern Indo-Gangetic Plain and parts of Southeast Asia using 10 years of MERRA-2 reanalysis data. The long-term CO trend over these regions, including some of the major cities of the Eastern Gangetic Plain, is investigated with the objective of examining the annual variations. MERRA-2 surface CO data are compared with ground-based experimental data to study the trends of CO surface concentration in the metropolitan city of Kolkata. The reanalysis data are also used to assess the seasonal variation of CO concentration in all the study regions. In addition to surface concentration, the variation of CO emission, chemical production and chemical loss over the 10-year study period is investigated. Finally, a simple linear model is proposed to evaluate the dependence of surface CO concentration on the corresponding CO emission, production and loss.

Materials and methods

Study region and meteorology

This paper focuses on the long-term variability of CO in the Eastern Indo-Gangetic Plains (IGP) and parts of Southeast Asia, for the reasons detailed hereafter. The availability of diverse resources has made the IGP one of the most densely populated areas of the world. Dynamic industrial and economic activities, both small and large scale, have made this region highly polluted with trace gases and aerosols (Aggarwal et al. 2004; Sheel et al. 2010). This has resulted in exceptionally high CO concentrations in this region, especially in its eastern part. Further details of this study site are discussed in the authors’ previous papers (Ghosh et al. 2013, 2017).

Southeast Asia is geographically bounded by India to the west, China to the north, Oceania and the Pacific Ocean to the east, and Australia and the Indian Ocean to the south. The region experiences frequent seismic and volcanic activity, as it lies at the intersection of geological plates. It is the third most populated region of the world after South and East Asia. Its climate is mainly tropical, hot and humid, with a distinct monsoon season; however, the northern parts (northern Vietnam and the Himalayan region of Myanmar) have a subtropical climate. Even though the region’s economy is primarily dependent on agriculture, it includes the newly industrialised countries of Indonesia, Malaysia, Vietnam, Thailand and the Philippines. The Southeast Asian countries present varied topographies, which makes them interesting study regions from the perspective of CO.

With growing industries and high population density, both the Eastern IGP and the Southeast Asian regions have a large potential for increasing pollutant emissions. In rural areas, combustion of agricultural waste and wood is a primary source of pollutants, whereas in urban and industrialised areas an increasing demand for energy prompts extensive utilisation of fossil fuel (Sathaye et al. 1994). In addition, the many coastal areas and the belt of active volcanic regions have prompted the authors to carry out a comprehensive long-term study of CO in these regions.

Based on preliminary observations of the spatial distribution of CO (as discussed in the “Spatial variation of CO” section), specific regions within the broader study area are selected for detailed study. The study area is broadly divided into three major regions, selected on the basis of the locations of CO hotspots. These three regions are further divided into subsets, based on topography and regional characteristics, as follows:

  • Region A: the industrial belt of the eastern Indo-Gangetic Plains (IGP) and the Indo-Gangetic Delta region.

    • A1: Northern Bihar industrial area; A2: Jharkhand and West Bengal industrial area; A3: West Bengal agricultural area

  • Region B: Jakarta (Indonesia) and the adjoining ocean region.

    • B1: Java Sea adjacent to Jakarta (ocean area); B2: Jakarta (city)

  • Region C: parts of Myanmar, Thailand and Vietnam

    • C1: Forest area of Northern Myanmar; C2: Agricultural area of Central Thailand; C3: Industrial area of Northern Vietnam

Figure 1 represents these three study regions (A, B and C) along with their subsets, where the areas selected for study are defined by rectangles in the figure. Hereafter in the paper, the study zones will be referred to as A, B and C respectively. The grid dimensions and further details of study grids are given in Table 1.

Fig. 1

(inset) Schematic representation of the Southeast Asia map showing the study grids A, B and C. Subsets of A: A1 and A2, thermal power plants of Eastern IGP; A3, agricultural area of central West Bengal. Subsets of B: B1, Java Sea; B2, Jakarta. Subsets of C: C1, forest area of northern Myanmar; C2, agricultural area of central Thailand; C3, industrial area of northern Vietnam

In addition, the surface CO concentration is studied at eleven selected locations in the eastern IGP, mainly within A1 and A2, which are metropolitan or industrial cities with dense populations, in order to gain better insight into the seasonal variations. Details of these sites are discussed elsewhere (Ghosh et al. 2013).

Collection of data

MERRA-2 data

The Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2), developed at the Global Modeling and Assimilation Office (GMAO), is an updated retrospective analysis of the full modern satellite era (Bosilovich et al. 2015). The reanalysis, available from 1980 to the present, is produced with an enhanced data assimilation system which incorporates hyperspectral radiance, GPS radio occultation data, microwave data and numerous other datasets (Rienecker et al. 2011). The data are produced using the Goddard Earth Observing System Model, Version 5 (GEOS-5), a collection of model components that follow the modular architecture of the Earth System Modeling Framework (ESMF). MERRA-2 provides the first long-term global reanalysis dataset of aerosols, which draws on observations from NASA’s MODIS, MISR and AERONET and is computed using the Goddard Chemistry, Aerosol, Radiation and Transport (GOCART) model (Chin et al. 2002). MERRA-2 includes the following emission sources: anthropogenic emissions from AeroCom Phase II (Lasko et al. 2018), and volcanic and biomass burning emissions from the Reanalysis of the Tropospheric Chemical Composition, version 2 (RETRO-2), the Global Fire Emissions Database, version 3.1 (GFED-3.1) and the Quick Fire Emission Dataset, version 2.4r6 (QFED-2.4.r6) (Rienecker et al. 2011). These datasets are freely accessible through the NASA Giovanni portal (disc.sci.gsfc.nasa.gov/giovanni). MERRA-2 is a modernised and improved version of MERRA, the original GEOS-5 reanalysis product (Molod et al. 2015; Takacs et al. 2016). The observations evaluated for MERRA-2 are also described in McCarty et al. (2016).

This paper presents a study of MERRA-2 reanalysis surface CO data from January 2007 to December 2016. The data are monthly, with a spatial resolution of 0.5° × 0.625°. In conjunction with this, MERRA-2 data on columnar CO emission, CO chemical production and CO chemical loss are also collected. In order to elucidate the persistent high concentrations of surface CO in all the subzones of region C, fire product data from MODIS C6 are also considered. Details of the data collected from MERRA-2 are presented in Table 2.

Table 2 Details of MERRA-2 data

Active fire data

Near real-time (NRT) active fire data are collected from NASA’s Fire Information for Resource Management System (FIRMS) to identify the fire hotspots in the Southeast Asian region. FIRMS distributes NRT data from NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS) and Visible Infrared Imaging Radiometer Suite (VIIRS) within 3 h of satellite overpass (Giglio et al. 2016). The data used in the present study are MODIS Collection 6 NRT hotspot/active fire detections (MCD14DL), available online from https://earthdata.nasa.gov/firms.

Ground-based measurement data

In order to compare the surface CO concentrations obtained from the MERRA-2 reanalysis with experimental data, ground-based measurements of surface CO over Kolkata (a station in the Eastern IGP) are collected from the Central Pollution Control Board (CPCB) of India.

Results and discussion

Spatial variation of CO

Figure 2 represents the spatial variation of annual mean CO surface concentrations over the entire study region from January 2007 to December 2016, as derived from the MERRA-2 reanalysis data. These maps help identify several regional hotspots of CO over the 10-year period. Surface CO varies significantly over the major portions of the landmass compared with the variations observed over the oceans. This indicates that the high values are contributed primarily by industries in those regions, such as power plants, integrated steel plants, steam reforming of natural gas, hydrogen production from liquid hydrocarbons, gasification of coal and biomass, and some types of waste-to-energy gasification facilities. The coastal influence, though present, is uniform throughout. Annually averaged values range from as low as 50 ppbv for most of the regions to values exceeding 1000 ppbv in the others. Distinct regions of high CO concentration are observed, based on which the specific study regions and their subgrids are chosen, as mentioned in the “Study region and meteorology” section. The monthly average CO concentration in the eight zones combined ranges from 750 ppbv (C1) to 3781 ppbv (A2). Very high values are consistently observed in A2, A3 and B2 compared with the other regions during the entire period of study. In zone A2, the hotspots are identified as the belt overlapping the industrial regions of Kharagpur, Jamshedpur, Bokaro and Dhanbad. Major thermal power plants and other large- and small-scale industries are located here around the several coal belts. Emissions from these industries and power plants are the major contributors of CO in this area. The CO distribution in these industrial cities is discussed further in the “Surface CO distribution on some specific locations” section. Region A3, the agricultural area of central West Bengal, is well known for growing predominantly paddy throughout the year. The persistently high concentration of CO in this region may have several causes: (a) incomplete oxidation of methane (CH4), which eventually produces CO via a photolytic route (refer to R5–R8 of Table 3); (b) combustion of agricultural waste and (c) advection from adjacent industrial areas.

Fig. 2

Spatial variation of CO surface concentration from 2007 to 2016

Table 3 Reactions involved for formation and destruction of CO

Zone B2, the region surrounding Jakarta, is the only region in the southern part of Southeast Asia where very high levels of CO are observed consistently for the entire study period. Jakarta is the second largest urban area of the world after Tokyo–Yokohama and has experienced rapid economic growth. It is situated on the northwest coast of Java; its northern part is mostly plain land, whereas the southern part is hilly. With the high population density of the metropolitan area, nearly 10 million vehicles drive on its roads daily, and the city is overburdened with transportation problems (Williamson 2007). The soaring population and the large number of automobiles are the main causes of the high amounts of CO throughout the year in this region. Being near the coast, this location also experiences long-range convective transport (inflow) of CO. Besides, Java is one of the most active volcanic regions of the world, and its continuous volcanic eruptions may also contribute to the high columnar CO levels around Jakarta.

Zone C, comprising parts of Myanmar (C1), Thailand (C2) and Vietnam (C3), is located in landlocked regions. The three subzones in this region are chosen for further detailed study for the following reasons. Region C1, in northern Myanmar, is mostly covered by forests. The high concentrations in this area may be explained by large-scale deforestation, forest fires and gaseous transport from nearby sources. C2 and C3 are the rural agricultural land of central Thailand and the industrial area of northern Vietnam respectively. In order to elucidate the persistent high concentrations of surface CO in all the subzones of region C, fire product data from MODIS C6 are also considered. The fire data are output in the form of fire pixels, each carrying a detection confidence value expressed as a percentage (%). Further details about the confidence levels of fire pixel detection can be found in the MODIS Active Fire Product User’s Guide (Giglio 2007). The relation between the range of detection confidence values and the detection confidence class is given in Table 4. Table 5 presents the monthly average fire pixel values for zones C1, C2 and C3. All the values belong to the nominal class, i.e., from 30 to 80. As most values lie towards the upper threshold (above 60), it may be deduced that forest fires and probable agricultural waste burning contribute a considerable amount of CO in the landlocked zone C.

Table 4 Fire pixel detection confidence class and range
Table 5 Maximum, minimum and median monthly fire pixel values for zone C1, C2 and C3 respectively
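
The mapping from detection confidence to confidence class used above can be sketched in a few lines of Python. The nominal range (30 to 80) is taken from the text; the low (below 30) and high (80 and above) bounds, the function name and the sample values are assumptions made for illustration, since Table 4 is not reproduced here.

```python
def confidence_class(confidence_percent: float) -> str:
    """Map a fire-pixel detection confidence (0-100 %) to a class label.

    The nominal range 30-80 follows the text; the low and high bounds
    are assumed for illustration.
    """
    if not 0 <= confidence_percent <= 100:
        raise ValueError("confidence must lie between 0 and 100 %")
    if confidence_percent < 30:
        return "low"
    if confidence_percent < 80:
        return "nominal"
    return "high"


# Hypothetical monthly average confidence values, not data from Table 5.
sample_values = [45.0, 62.0, 71.0, 78.0]
for value in sample_values:
    print(value, confidence_class(value))
```

With values such as these, every month falls in the nominal class, and those above 60 sit in its upper part, which is the situation described for zone C.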

Long-term variation of CO

Figure 3 represents the long-term distribution of CO for all eight study regions. The plots show a sinusoidal pattern for all the study areas, among which zones A1, A2 and A3 have the most pronounced periodicity. These three zones are followed by C1, C2 and C3, where the regular sinusoidal pattern is occasionally disturbed by abrupt changes in some months. The long-term variations of B1 and B2 show no particular pattern, except that they are similar to each other and that B2 has a higher range of values than B1. This may be expected, as B1 overlaps the coastal area near the ocean and is adjacent to B2, a highly populated and industrialised city. In order to interpret this pattern further, the authors have analysed the seasonal variation of CO in all these study areas (see the “Seasonal variation of CO” section). Moreover, the long-term trend in all the study zones does not show much variation even over a span of 10 years; that is, no prominent increase in the intensity of the hotspots can be observed over the years. The 10-year time series plot (Fig. 3) rather shows a slight decreasing trend (trend line not included in this particular plot), indicating that CO surface concentration is decreasing over time. This suggests that, in spite of the rapid economic growth and the increasing urbanisation and industrialisation of these regions, meteorological factors (wind speed, wind direction, turbulence and atmospheric stability) have contributed to regular convective and diffusive transport of CO away from the hotspot regions.

Fig. 3

Long-term variation of surface CO concentration from 2007 to 2016
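
A slight decreasing trend of the kind mentioned for Fig. 3 can be quantified with an ordinary least-squares line through the monthly series. The following is only a minimal sketch: the function name is hypothetical and the series is synthetic (a seasonal cycle plus an assumed small decline), not MERRA-2 data.

```python
import math


def linear_trend(t, y):
    """Return (slope, intercept) of the least-squares line y = slope * t + intercept."""
    n = len(t)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


# Synthetic example: 120 months with a seasonal cycle and a small decline (assumed values).
months = list(range(120))
series = [1500.0 - 0.5 * m + 300.0 * math.sin(2 * math.pi * m / 12) for m in months]

slope, intercept = linear_trend(months, series)
print(f"trend: {slope * 12:.1f} ppbv per year")
```

A strong seasonal cycle can bias a trend fitted this way, so deseasonalising the series first (for example, by using annual means) may be preferable in practice.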

Seasonal variation of CO

The monthly spatial images (not presented in the paper), along with the long-term variation of surface CO concentration, strongly indicate a seasonal variation of CO in all the selected study zones. To analyse this in detail, the seasonal variation of surface CO concentration is plotted in Fig. 4a–c. The three broad zones A, B and C, lying in three different geographical locations, each have distinct seasonal characteristics. Zones A1, A2 and A3, lying in the eastern IGP, have four seasons, namely winter (Jan–Feb), pre-monsoon (Mar–May), monsoon (Jun–Sept) and post-monsoon (Oct–Dec). Figure 4a represents the seasonal variation of regions A1, A2 and A3. The general patterns in all three regions are similar. Among the four seasons, the post-monsoon season has the maximum surface CO concentration in all three regions. The only exceptions are regions A1 and A2 in 2012 and region A3 in 2010, where the seasonal average of monsoon is marginally less than the winter concentrations. The post-monsoon values are followed by the seasonal values of winter, monsoon and pre-monsoon respectively. A notable feature is that the values and patterns of the two types of zones, industrial (A1 and A2) and agricultural (A3), are very similar. A probable reason is that the industrial and agricultural regions lie next to each other, and CO, having a relatively long chemical lifetime (~ 3 to 4 h), can be easily dispersed and distributed by convective transport on a regional basis. The variation of surface CO at some specific locations of regions A1, A2 and A3 is discussed further in the “Surface CO distribution on some specific locations” section.
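
The seasonal averaging described above for the eastern IGP can be sketched as follows. The season boundaries follow the text; the function name, the dictionary layout and the monthly values are hypothetical and serve only to illustrate the grouping.

```python
# Season definitions for the eastern IGP as given in the text (month numbers 1-12).
SEASONS_IGP = {
    "winter": (1, 2),
    "pre-monsoon": (3, 4, 5),
    "monsoon": (6, 7, 8, 9),
    "post-monsoon": (10, 11, 12),
}


def seasonal_means(monthly, seasons=SEASONS_IGP):
    """Average monthly values (dict: month number -> ppbv) over each season."""
    return {
        name: sum(monthly[m] for m in months) / len(months)
        for name, months in seasons.items()
    }


# Hypothetical monthly mean surface CO for one zone and one year (ppbv).
monthly_co = {
    1: 2300.0, 2: 2100.0, 3: 1700.0, 4: 1500.0, 5: 1400.0, 6: 1600.0,
    7: 1700.0, 8: 1750.0, 9: 1900.0, 10: 2400.0, 11: 2600.0, 12: 2500.0,
}

for season, mean_value in seasonal_means(monthly_co).items():
    print(f"{season}: {mean_value:.0f} ppbv")
```

Passing a different season dictionary (for example, the wet and dry seasons of zone B) reuses the same function for the other regions.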

Fig. 4

a Seasonal variation of CO surface concentration from 2007 to 2016 for A1, A2 and A3. b Seasonal variation of CO surface concentration from 2007 to 2016 for B1 and B2. c Seasonal variation of CO surface concentration from 2007 to 2016 for C1, C2 and C3

Figure 4b presents the seasonal variation of surface CO in regions B1 and B2. According to the Köppen classification (Koppen 1990; Essenwanger 2001), Jakarta, the city covered by zone B2, experiences a tropical monsoon climate with only two seasons: wet (Oct–May) and dry (Jun–Sept). Based on this classification, the seasonal distribution of surface CO in B1 and B2 is divided into wet and dry seasons. In both B1 and B2, CO concentrations are higher in the wet (monsoon) season than in the dry season. For B1, not much variation in surface CO concentration is observed. This is expected, as B1 lies entirely over the Java Sea, along the coastal area north of Jakarta. The plot for region B1, which is covered by water, shows the characteristics of a coastal area.

Figure 4c represents the seasonal variation of C1–C3. Region C1, the forest area of northern Myanmar, has a humid subtropical climate with three seasons: summer (Mar–Apr), monsoon (May–Oct) and winter (Nov–Feb). The seasonal average of surface CO concentration is highest in summer, followed by winter and monsoon. This region of Myanmar lies to the south of the northern mountains and is densely forested. In recent times, however, it has been affected by deforestation due to mining and other activities. An additional factor contributing to the high CO concentrations in the summer months is forest fires. The high monthly average fire pixel values in Table 5, lying close to the upper threshold of the nominal confidence class, further support this explanation. Region C2 represents the central plains of Thailand, in the basin of the Chao Phraya River. This part of Thailand belongs to the rice bowl of Southeast Asia, being one of the largest producers of rice in the region. It experiences a tropical savanna climate with three distinct seasons: summer (Mar–May), monsoon (Jun–Oct) and winter (Nov–Feb). The seasonal variation of C2 is plotted based on these seasonal divisions. Contrary to C1, the winter values in C2 are higher than the summer values. However, as in C1, the monsoon CO concentrations are the lowest. The industrial region of northern Vietnam (C3) has primarily two seasons: winter (Oct–Mar) and summer (Apr–Sep). The winter season is cool and dry, whereas summers are hot and rainy. The seasonal variation of CO in C3 is similar to that in C2, as winter concentrations are higher than summer concentrations. Both C2 and C3 are landlocked areas. The higher concentrations in winter may be due to the accumulation of CO from local emission and chemical production, as well as to weak dispersion caused by poor convection during the winter months.

Surface CO distribution on some specific locations

The general statistics of CO surface concentration at different stations of A1, A2 and A3 over the 10 years from 2007 to 2016 are presented in Table 6. The mean concentration is highest over Dhanbad (3710.5 ppbv), followed by Jamshedpur (3330.2 ppbv) and Bokaro (3281.7 ppbv). All these places are industrial hubs, where higher CO levels are expected. The city of Dhanbad, also known as the Coal Capital of India, houses many of the largest coal mines of India (Anand 2006), and several coal washeries and power plants are located here for this reason. Like Dhanbad, Bokaro is one of the most industrialised zones of India, based primarily upon coal. Of these three cities, the variability of surface CO is highest over Jamshedpur, primarily due to incomplete combustion and ineffective flaring of blast furnace gas; the major steel plant in Jamshedpur has coal-fired captive power plants. The mean CO concentration over Jamshedpur was 3438.4 ppbv during 2007–2011, compared with 3222.1 ppbv during 2012–2016, a decrease of 6.3%. This is primarily due to measures that the government has recently enforced to combat air pollution, such as the use of cleaner fuels and the implementation of better engines in vehicles to lower emissions (Mallik and Lal 2014). The mean CO concentration over Kolkata was 1719.9 ppbv during 2007–2011 against 1759.5 ppbv during 2012–2016, which corresponds to a 2.3% rise. Despite various government measures to curb pollution, a disproportionate rise in the number of vehicles, along with heterogeneous traffic conditions, could be the major cause of this rise in CO concentration.

Table 6 Statistics of surface concentration of CO (× 10 ppbv) over selected study regions of Eastern IGP
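
The percentage changes quoted for Jamshedpur and Kolkata follow from the two 5-year means given in the text. A short check, with a hypothetical helper name:

```python
def percent_change(earlier: float, later: float) -> float:
    """Relative change from the earlier to the later period, in percent."""
    return (later - earlier) / earlier * 100.0


# Five-year mean surface CO concentrations (ppbv) quoted in the text.
jamshedpur = percent_change(3438.4, 3222.1)
kolkata = percent_change(1719.9, 1759.5)

print(f"Jamshedpur: {jamshedpur:+.1f} %")  # prints "Jamshedpur: -6.3 %"
print(f"Kolkata: {kolkata:+.1f} %")        # prints "Kolkata: +2.3 %"
```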

The average monthly variation of CO over these selected study locations is presented in Fig. 5. Winter maxima are a common feature of all these locations and may be attributed to factors such as increased emission from the burning of wood, coal or other materials to combat cold, weak photochemical removal and stagnant weather conditions. In the IGP, winters are characterised by a thick layer of fog and haze (Mallik and Lal 2014) and by low wind speeds, which trap pollutants near the surface (surface layer inversion). During summer, however, increased photochemical loss due to higher solar radiation, together with stronger vertical mixing, results in lower CO concentrations. The lowest concentrations are observed during the monsoon season, as a result of strong convection due to the influx of clean south-westerly winds from the Arabian Sea and the Indian Ocean.

Fig. 5

Averaged monthly variations in CO concentrations over different locations during 2007–2016

Comparison of MERRA-2 surface CO data with ground-based data

A comparison of MERRA-2 surface CO data with ground-based experimental data is carried out in order to investigate the trends of CO surface concentration in the polluted atmosphere of Kolkata. Based on data availability, ground-based CO data are collected from the CPCB (Central Pollution Control Board) for 5 years (2012–2016). Several earlier studies of satellite-based measurements or model reanalysis data have presented these data in comparison with ground-based measurements (Chin et al. 2002; Jiang et al. 2017; Wang et al. 2018). Figure 6 compares the monthly mean surface CO from ground-based measurements (black) and MERRA-2 (red) over Kolkata during 2012–2016. It is apparent from the figure that the trend is very similar for both datasets. The significant feature of the plot is that the MERRA-2 reanalysis CO data are always an overestimate: the experimental values are much lower than those produced by MERRA-2.

Fig. 6

Comparative study of surface CO between ground-based measurements (black) and MERRA-2 (red) over Kolkata during 2012–2016
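
A comparison of this kind is often summarised with a mean bias, a root-mean-square error and a correlation coefficient. The sketch below shows one way to compute them; these metrics are not reported in the paper, the function name is hypothetical, and the two short series are invented for illustration.

```python
import math


def comparison_stats(model, observed):
    """Mean bias, RMSE and Pearson r between paired model and observed values."""
    n = len(model)
    if n != len(observed) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    diffs = [m - o for m, o in zip(model, observed)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    m_mean = sum(model) / n
    o_mean = sum(observed) / n
    cov = sum((m - m_mean) * (o - o_mean) for m, o in zip(model, observed))
    var_m = sum((m - m_mean) ** 2 for m in model)
    var_o = sum((o - o_mean) ** 2 for o in observed)
    return {"mean_bias": bias, "rmse": rmse, "pearson_r": cov / math.sqrt(var_m * var_o)}


# Hypothetical monthly means (ppbv); not the Kolkata data shown in Fig. 6.
merra2 = [2100.0, 1900.0, 1500.0, 1200.0]
ground = [1400.0, 1250.0, 900.0, 700.0]
print(comparison_stats(merra2, ground))
```

A positive mean bias together with a correlation close to 1 would correspond to the behaviour described above: a similar trend in both datasets with a consistent overestimate by the reanalysis.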

Effects of emission, chemical production and chemical loss on CO concentration: Interpretation using statistical analysis

In the atmosphere, the presence of any trace species (under steady state) at a location is an outcome of emissions (point, line, area and fugitive sources), net chemical production, chemical loss (mostly through a complex chain of reactions) and convective transport (inflow and outflow). In the lower troposphere, all these factors have a noticeable influence on various atmospheric species. As discussed in the “Surface CO distribution on some specific locations” section, the principal sources of CO in all the specific study zones seem to be the incomplete combustion of fossil fuel and the burning of biomass. The effect of these factors is better understood if the chemical production, chemical loss and emission of CO are studied in further detail. Variations of emission, chemical production and chemical loss from 2007 to 2016 for all eight zones are plotted in Fig. 7a–c. It is evident from these figures that emissions are much higher than chemical production or loss in all the study regions. This implies that emission, mostly due to incomplete combustion resulting from inefficient engineering design, is the principal driving force for the presence of CO in these zones. The long-term average of CO emissions is consistently highest in region B2 (Jakarta) throughout the 10 years of the study period. Jakarta is highly populated, and its rapid economic growth and high transportation burden account for such high emissions. In terms of emission, this region is followed by A2, the industrial area of the IGP, where combustion of fossil fuel in numerous industries is the primary contributor of CO. In all three study regions of the IGP, a periodic modulation of emission, chemical production and chemical loss is observed. This feature clearly indicates a seasonal variation of all three factors, which also accounts for the prominent seasonal variation of surface CO in the regions of the IGP.

Fig. 7

a Variation of emission, chemical production and chemical loss from 2007 to 2016 for A1, A2 and A3. b Variation of emission, chemical production and chemical loss from 2007 to 2016 for B1 and B2. c Variation of emission, chemical production and chemical loss from 2007 to 2016 for C1, C2 and C3
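
The steady-state picture described above can be written as a simple budget for the CO concentration at a location. The notation is an assumption introduced here for illustration and does not appear in the original analysis: C is the concentration, E the emission rate, P the chemical production rate, L the chemical loss rate, and F_in and F_out the convective inflow and outflow.

$$ \frac{dC}{dt}=E+P-L+\left({F}_{in}-{F}_{out}\right)\approx 0 $$

In this form, a statistical model that uses only E, P and L as predictors leaves the transport term unexplained, which is one way to read the poorer fits obtained where transport is important.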

In order to investigate further the contribution of chemical production, emission and chemical loss to the net CO concentration, a statistical analysis is performed using the multiple linear regression (MLR) method. This method models the relationship between a response (dependent) variable, here the MERRA-2-generated surface CO concentration, and three independent variables, here CO emission (x1), chemical production (x2) and chemical loss (x3). The following equation is considered for the MLR analysis:

$$ Y={a}_0+{a}_1{x}_1+{a}_2{x}_2+{a}_3{x}_3+{a}_4{x}_1{x}_2+{a}_5{x}_1{x}_3+{a}_6{x}_2{x}_3+{a}_7{x}_1^2+{a}_8{x}_2^2+{a}_9{x}_3^2 $$
(1)

The parameters of Eq. (1), a0 to a9, are estimated using the MLR technique. Although Eq. (1) contains interaction and quadratic terms, it is linear in the parameters, so the best fit to the observed data is obtained by minimising the sum of the squares of the vertical deviations of the data points from the fitted values.
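
The least-squares estimation of a0 to a9 in Eq. (1) can be sketched in plain Python by building the ten regressors and solving the normal equations. This is an illustration only: the function names are hypothetical, the data are synthetic values generated from assumed coefficients, and the solver used in the original analysis is not specified in the text.

```python
import random


def design_row(x1, x2, x3):
    """Regressors of Eq. (1), in the order of a0 ... a9."""
    return [1.0, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x1, x2 * x2, x3 * x3]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_eq1(xs, y):
    """Least-squares fit of Eq. (1); returns (coefficients a0..a9, R squared)."""
    rows = [design_row(*x) for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coeffs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coeffs, r)) for r in rows]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return coeffs, 1.0 - ss_res / ss_tot


# Synthetic test: 120 "months" generated from assumed coefficients without noise.
random.seed(0)
true_a = [2.0, 1.5, -0.7, 0.3, 0.2, -0.1, 0.05, 0.4, -0.3, 0.1]
xs = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(120)]
y = [sum(a * v for a, v in zip(true_a, design_row(*x))) for x in xs]

coeffs, r2 = fit_eq1(xs, y)
print([round(c, 3) for c in coeffs], round(r2, 3))
```

With real data, scaling the three predictors to comparable ranges before fitting tends to keep the normal equations well conditioned.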

MLR analysis is carried out for each study zone separately using the 10-year dataset, and the results are presented in Table 7. The coefficient of determination, R2, is best for zone C [C1, 0.975; C2, 0.965 and C3, 0.799], followed by A [A1, 0.68; A2, 0.88 and A3, 0.865]. This result indicates that the CO concentration in both regions A and C is contributed primarily by the factors considered in the model. As already discussed in the “Spatial variation of CO” section, the predominant sources of CO in the six subzones of these two regions are incomplete combustion of fossil fuel and agricultural waste burning; other factors are negligible in comparison.

Table 7 Multiple linear regression results

However, in zone B, the coefficients of determination are extremely poor [R2 (B1) 0.433; R2 (B2) 0.344], implying that this model is not suitable for estimating CO surface concentration in the region. This result is in line with the emission and chemical production observations discussed above. Zone B is thus a unique study region, where neither emission nor chemical production can account for the very high concentrations of CO throughout the study period. The uniqueness of B2, the zone encompassing Jakarta, which records the lowest R2 among all eight study regions, is due to its geographical location. It is situated at the mouth of the Ciliwung River on Jakarta Bay, which lies on the northwest coast of Java and is an inlet of the Java Sea. Twelve more rivers also flow through the city. It is well known that coastal eutrophication (from the rivers) greatly increases the nutrient load in estuarine water. This results in increased photoproduction of carbon monoxide in the water, owing to an increase in dissolved organic matter derived from terrestrial areas (Gattuso et al. 1998). CO is also the second most abundant carbon-containing species produced by the photochemistry of marine (coloured) dissolved organic matter (Mopper et al. 1991). With regard to the three major factors, namely emission, production and loss, we can infer the following:

  1.

    Emission: High values occur in all regions with point, line and area sources of CO, such as power plants and integrated steel plants where iron making and steel making go in tandem. There may also be several sources of incomplete combustion, such as non-standard automobiles (auto-rickshaws etc.). The scenario is just the reverse in regions B1, C2 and C3. For B1, there could be land-ocean interactions, and a box-model (MOPITT) output like the one used in MERRA-2 could be insufficient because convective inflow and outflow terms, weighted by the numerical values of wind and the wind-flow pattern, are not accounted for. Here chemistry (refer Table 1) is not the key factor, and convective transport plays a major role. In C2, methane production is substantial, and it is accounted for under production (refer Table 1) rather than emission. In C3, the industrial area of North Vietnam is still developing. Moreover, most of its industries, such as pharmaceuticals, food processing and electronics, do not produce much CO.

  2.

    Production: The net effect of the complex reaction mechanism for the formation and destruction of CO (as per Table 1) could be at work in most of the regions, which have varied sources and sinks. MERRA-2 may also effectively underestimate production because it uses a simple mechanism of CO production.

  3.

    Loss: In terms of chemistry (refer Table 1), the reaction tree used in MERRA-2 is much simpler than reality. Far more complex chemistry takes place across the land-atmosphere, atmosphere-ocean and land-ocean interfaces, and it is neglected in MERRA-2. Transport plays a vital role in the overlapping land-ocean and land-atmosphere regions, and such exchanges are not considered in MERRA-2.

In applying the multiple linear regression method, multi-collinearity (Footnote 1) is not taken into account. In our case, the independent variables are x1 = emission, x2 = production and x3 = loss, with x3 = f(x1, x2) and x2 = f(x1). Both production and loss are correlated with emissions (refer Table 1). However, the correlations are not linear, since we are not sure of the exact order of the reactions by which CO is produced and lost. Transport is the second important factor neglected in this model. These omissions lead to overestimates and underestimates on several occasions.
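
One simple way to screen predictors for multi-collinearity is to inspect their pairwise Pearson correlations, as in the sketch below. This check was not part of the analysis reported here. The functional forms chosen for x2 = f(x1) and x3 = f(x1, x2), the helper name `pearson_r` and all numbers are hypothetical and purely illustrative.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic, illustrative series only; the dependence forms are assumptions.
rng = random.Random(1)
x1 = [rng.uniform(1.0, 5.0) for _ in range(120)]                         # emission
x2 = [0.4 * a ** 1.5 + rng.gauss(0.0, 0.2) for a in x1]                  # production = f(x1)
x3 = [0.3 * a + 0.5 * b + rng.gauss(0.0, 0.2) for a, b in zip(x1, x2)]   # loss = f(x1, x2)

names = ["emission", "production", "loss"]
series = [x1, x2, x3]
corr = {}
for i in range(3):
    for j in range(i + 1, 3):
        r = pearson_r(series[i], series[j])
        corr[(names[i], names[j])] = r
        print(f"r({names[i]}, {names[j]}) = {r:.2f}")
```

Correlations close to ±1 between predictors indicate that the individual regression coefficients are poorly constrained, even when the overall R2 is high. A non-linear dependence can still show up as a strong linear correlation over a limited range, so this screen is only a first check.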

Summary

MERRA-2 reanalysis data for carbon monoxide are successfully used to interpret the CO surface concentration over eight different zones in the eastern IGP and parts of Southeast Asia. These data are used to explore the spatial and seasonal variations of surface CO in these locations. A long-term (10-year) study from January 2007 to December 2016 shows significant spatial variation with prominent hotspots of surface CO. Seasonal variation over these regions shows similar distribution patterns for the same period.

High levels of CO over Dhanbad, Jamshedpur and Bokaro reflect ever-increasing air pollution in these regions and the need for effective strategies to abate the rising emissions. Winter maxima in the surface concentration of CO are observed for all the regions of Southeast Asia. The study regions of the IGP show pre-monsoon minima in seasonal surface CO. A comparison of MERRA-2 reanalysis surface CO data with ground-based measurements for the Kolkata station shows a similar trend in both datasets. However, the reanalysis data overestimate CO concentrations throughout the study period. Photo-production of CO in the atmosphere is primarily due to the photolytic conversion of carbon dioxide and formaldehyde. Formaldehyde photolysis proceeds through two different routes: one is the direct production of CO and H2, and the other is a non-elementary reaction in which HCO is formed as a free-radical intermediate. The latter route is OH•-dependent. Sources of HCHO are not considered in MERRA-2.

Multiple linear regression analysis in all eight study zones suggests that high CO concentration in these zones is primarily due to a combination of local emission, chemical production and chemical loss. However, these three factors alone cannot account for the occurrence of CO at a place, as is evident from the regression results for zone B. Owing to the geographical location of this zone, CO is also photochemically produced from dissolved organic matter. Meteorology has an important influence on the transport and chemistry of various trace gases in the atmosphere. Hence, various meteorological parameters may also play a key role in the presence of CO in this zone.