1 Introduction

Urbanization is happening at a rapid pace. At present, more than half of the world’s population (4.2 billion) resides in urban areas (UN 2018), and this share is expected to reach 68% by 2060 (UNDESA 2018). Globally, there are 1064 large urban areas with populations above 0.5 million; 57.7% of them are in Asia, followed by 12.4% in North America and 11.4% in Africa (Demographia 2018). Around 10% of the world population resides in megacities (with populations over 10 million), which occupy just 0.2% of the earth’s land surface (Demographia 2018). The multitude of developmental activities in urban areas contributes directly to airborne emissions, which in turn increase exposure to pollutants and aggravate the associated health effects. Continuing urbanization also poses a threat to societal grand challenges such as climate change, energy security and urban transport (Blocken 2015).

Air quality in most cities worldwide fails to meet the World Health Organization (WHO) guidelines for safe levels, putting people at risk of respiratory disease and other health problems. Around 91% of the global population lives in places where air pollution exceeds the WHO limits, and WHO (2018) estimates that nearly 4.2 million premature deaths are due to air pollution induced lung cancer, respiratory diseases, stroke and heart disease. Further, by 2060 the mortality is expected to rise to between 6 and 9 million and to cost 1% of global GDP (gross domestic product), which is equivalent to USD 2.6 trillion annually (OECD 2016). The effects of air pollution are alike in the developed and the developing world, but the damage is expected to be higher in developing countries (south and south-east Asia and Africa), owing to poor health care systems and management.

The scientific evaluation of urban air systems is pivotal for better management of the air, the natural resource we breathe. Air quality is measured in many parts of the world with networks of ground monitoring stations. Although the data are accurate, the information is often highly sparse over a region, as the network density is skewed towards specific urban centers or institutes (Mead et al. 2013). Establishing ground based monitoring can be very expensive, and it demands additional resources for regular maintenance and calibration. It is therefore impractical to establish a high density monitoring network for generating spatio-temporal air quality data over a wide region. Against this backdrop, there is an urgent need to augment the existing ground based monitoring networks with numerical air quality simulations, which not only enhance the understanding of the dispersion of air pollutants but also provide high resolution spatio-temporal pollution contours that are useful in devising mitigation measures. Various what-if scenarios can be assessed and visualized with model simulations prior to the actual implementation of policies. For instance, the effectiveness of avenue plantation in curbing kerbside pollution (Pugh et al. 2012; Gromke and Blocken 2015; Li et al. 2016), traffic route optimization for reducing the build-up of pollutants (El Fazziki et al. 2017) and the reduction in O3 due to vegetation (Sicard et al. 2018) can be tested.

Numerical models are applied to simulate pollution contours at various geographical scales: regional or mesoscale (<200 km), urban scale (5–50 km), microscale (<2 km), building scale (<100 m), indoor or component scale (<10 m) (Toparlar et al. 2017) and human scale (<1 m) (Lateb et al. 2016). Apart from the overall meteorology, the modeling becomes more complex as the scale gets finer, especially in terms of the detail required in the geospatial data and its physico-chemical interactions with the atmosphere. Major factors include the effect of built-up geometry on the microclimate (Lateb et al. 2016), eddies induced by thermal interactions (Santiago et al. 2015), interaction with green cover (Jeanjean et al. 2016) and water bodies (Musy et al. 2015), and space-resolved emission inventories (Thunis et al. 2016). Knowledge of the urban setup and its microclimate is an important input for environmental modelers, urban designers and engineers, and policy makers to plan sustainable built environments (Blocken 2015).

In general, fine scale models need to be integrated with coarser scale models to obtain realistic initial and boundary conditions. For instance, urban scale models derive their boundary values of the velocity field, mixing height, ambient temperature, turbulence kinetic energy (TKE) etc. from mesoscale models. The seamless integration of models of varying scales has remained one of the challenges in urban air quality modeling. A framework for modeling near real time urban air quality at high resolution is not available at present for many cities, due to the lack of holistic input data (Kumar et al. 2015) and modeling expertise. The cost of setting up the simulation infrastructure is very low in comparison with establishing a high density ground monitoring network, as the methodology is based on open-source software and freely available geospatial data. However, domain knowledge is essential for developing customized models. In this context, we aim to provide a complete picture of high resolution urban air quality modeling with the following objectives: (a) to provide the current status of the broad aspects associated with urban air quality modeling, (b) to review the existing methods applied in urban air quality modeling and (c) to identify the challenges for further improvement in urban air quality modeling and the way forward. For this purpose, the article is organized into 5 Sections, as follows: Section 1 provides an overall view of the urban microclimate processes and various urban scale models. Section 2 discusses the significance of high resolution geospatial data and the methods for developing it. Section 3 provides a review of various regional scale chemical transport models, in particular models based on the Eulerian approach. The subsections within Section 3 discuss the meteorological models, gridded emission inventory and urban canopy parameterization, as well as the need for computational fluid dynamics models for accurate urban air quality modeling. Validation methods are covered in Section 4.

2 Urban Microclimate

Urban microclimate is very dynamic within the atmospheric boundary layer (ABL), and it is significantly affected by geospatial obstacles and by the thermal properties of buildings, roads, vegetation, water bodies etc. In particular, the wind flow pattern between groups of buildings can be very complex to account for, as it involves the formation of stagnation points and zones of recirculation (Easom 2000; Piringer et al. 2007), which in turn affect the dispersion of pollutants (Chang and Meroney 2001).

The ABL is the lowest region of the troposphere, which is directly influenced by the earth’s surface and experiences rapid fluctuations (on a time scale of about an hour or less) in temperature, wind, moisture and mixing height (Bonner et al. 2010). Dispersion of pollutants is mainly determined by the atmospheric stability class (for instance, classes A-G, ranging from extremely unstable to extremely stable (Pasquill 1961)) and the vertical mixing height (Pul and Holtslag 1994). The vertical mixing height varies in space and time and may range from <100 m to several kilometers (Hennemuth and Lammert 2006). In urban areas, the ABL is aptly called the urban boundary layer (UBL), as it is also significantly influenced by urban features. The bottom portion (roughly 10%) of the UBL is the surface layer (SL), which is composed of two sub-layers: (a) the roughness sub-layer (RSL) nearer to the ground and (b) the inertial sub-layer (ISL) on top of the RSL (Fisher et al. 2006). Within the RSL, the built-up area forms an urban canopy layer (UCL), and the configuration of buildings plays an important role in determining pollutant dispersion at microscales (Huq and Franzese 2013). In contrast, the overall dispersion over the urban area is governed mainly by the characteristic length scales of UBL turbulence (of the order of a couple of meters to a meter (Temel et al. 2018)) rather than by the UCL scales (order <0.3 m (Okaze et al. 2015)), which are influential only nearer to the sources (Franzese and Huq 2011).

Studies on urban microclimate began in the nineteenth century with the work of Luke Howard on ‘The Climate of London’, and early studies observed the differences between rural and urban temperatures. From the 1960s, the Urban Heat Island (UHI) effect was studied with statistical methods, and in the later 1970s energy budget models were developed to explain the UHI (Mills 2014). Urban studies based on flux measurements and scaled-up physical models were prevalent during the 1980s. Numerical models assisted by field campaigns dominated during the 1990s, and from 2000 onwards a surge in the development of more realistic urban climate models was noted (Mills 2014). Urban microclimate studies can be categorized into three major groups: (a) the observational approach, based on wind tunnel tests (Pournazeri et al. 2012; Cochran et al. 2015), thermal imaging (Bechtel et al. 2017), and field measurements (Montazeri and Blocken 2013; Xia et al. 2014); (b) computer based numerical simulations (Mirzaei and Haghighat 2010; Mirzaei 2015), such as semi-empirical methods (based on the Gaussian plume; Chavez et al. 2012) and computational fluid dynamics (CFD) modeling (Temel et al. 2018; Blocken 2014; Miao et al. 2013); and (c) the theoretical approach (Santamouris et al. 2012), in which the effectiveness of various mitigation methods is tested even before their implementation (for instance, the effectiveness of green zones and the influence of urban building materials with varying optical properties (Synnefa et al. 2012) are assessed for mitigating the UHI effect).

Urban microclimate has been studied more by city planners and architects than by environmental modeling groups, and their studies have discussed wind flow around buildings and thermal comfort in sufficient detail. However, these studies did not focus on pollutant dispersion and its validation with measured data. Out of the 183 urban microclimate studies (from 1998 to 2015) reviewed by Toparlar et al. (2017), 105 were validated with measured velocity or temperature fields, and very few touched upon air quality. About 50% of the studies were performed with the ENVI-Met software (Bruse and Fleer 1998), a non-hydrostatic 3D model with surface-plant-air modules (heat and mass transfer, fluid flow, vegetation interactions etc.) that is capable of modeling at a high spatial resolution of the order of 0.5–1 m. Its notable drawbacks are that it employs only a single turbulence model (Yamada and Mellor) and offers limited options for grid generation and wall functions (Middel et al. 2014; Maggioto et al. 2014). With a graphical user interface editor offering limited yet easy-to-use features, ENVI-Met is best suited for architecture applications and urban planning (Paas and Schneider 2016). For instance, the model is used for quantifying urban thermal comfort (Kariminia et al. 2015; Wang et al. 2015) and building energy demand (Yang et al. 2012). Another software, SOLENE-microclimate (Malys et al. 2015), couples a thermo-radiative model with CFD (Code-Saturne); it accounts for the two-way feedback between the microclimate and the built-up area and is mainly designed for building energy simulation studies. The model integrates radiation phenomena with thermo-aeraulic phenomena over the urban morphology (Gros et al. 2014). Urban CFD (Coirier et al. 2005), on the other hand, is designed to simulate the velocity field, turbulence and contaminant transport at building scale. 
The model generates 3D hexahedral computation mesh for CFD simulations from shape file of the city. Yet another model, Urban Canopy Model (UCM) (Kusaka et al. 2001) can be used to calculate the energy and momentum transfers between urban surface and atmosphere with more emphasis on realistic urban geometry than slab models (unified parameterization of urban features, mostly used in mesoscale meteorological models). Effect of building height in uneven incidence of solar radiation and reflection of longwave radiation are included in the model physics. Unlike other models, UCM can be easily integrated with numerical weather models for parameterizing the effects of urban canopy. Kang et al. (2016) have studied the UHI with varying built-up area of a city using the UCM, and Bhati and Mohan (2016) also reported the UHI effect on Delhi using UCM parameterization in a meteorological model.

In order to correctly employ urban microclimate models that include geospatial interactions, many researchers have stressed the need for accurate 3D (3-dimensional) geospatial data (Kariminia et al. 2015). The relevant aspects of geospatial data include the dimensions of the built-up area, road network, vegetation mapping, water bodies and open land, together with their albedo, thermal properties, boundary interactions and others.

3 High Resolution Geospatial Data

High resolution geospatial data featuring the urban landscape is crucial for accurately simulating the urban meteorological conditions for dispersion modeling (Kang et al. 2016). The major challenges in developing a 3D city model, especially in developing countries, are the lack of high resolution geospatial data, city level demographic information, road network data, and building dimensions and geometry. Storage, processing and retrieval of spatial data and its interoperability among various 3D modeling suites is also a bottleneck. Broadly, a 3D model of a city can be developed by three methods, namely (a) photogrammetry, (b) Light Detection and Ranging (LiDAR) based 3D point clouds, and (c) algorithm based or computer simulated methods. Photogrammetry is the science of making measurements from photographs, especially for recovering the exact positions of surface features. A 3D model of the surface can be generated using stereo-photogrammetry, which involves estimating the three-dimensional coordinates of points on an object from measurements made in two or more photographic images taken from different positions (Kraus 2011). High resolution IKONOS, QuickBird (https://www.satimagingcorp.com/satellite-sensors/ikonos/), and other stereo satellite images are used with software packages such as Satellite Image Precision Processing (SAT-PP) and CyberCity Modeler to extract digital surface models (DSM) and 3D city models (Kocaman et al. 2006; Hirschmuller 2008). These methods offer high accuracy, but are very expensive in terms of measuring, processing and storing the information. Micro-aerial-vehicle or drone based surveys are widely used for generating 3D surfaces for engineering, architecture and construction applications (Daftry et al. 2015). However, the scale of operation using drones is very limited, and the method is prone to inaccuracies when reconstructing large-scale objects (Daftry et al. 2015). LiDAR, which is emerging as a cost-effective alternative to traditional surveying techniques, is an optical remote-sensing technique that uses laser light to sample the surface of the earth. The survey produces highly accurate point cloud data that can be processed to obtain the exact 3D geospatial features (Fig. 1a) and DSM (Biljecki et al. 2017; Rottensteiner et al. 2014). Its notable drawbacks are that the data has to be obtained through specific field surveys and that its processing requires substantial computational facilities. The OpenTopography (http://opentopography.org/) portal provides freely available LiDAR data, but its coverage is limited to some parts of the world. Yan et al. (2015) may be referred to for more details on urban land cover classification using LiDAR data. Other methods include the acquisition of 3D city information through sensors mounted on vehicles (Frueh and Zakhor 2003).
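As an illustration of how a point cloud can be reduced to a DSM, the following minimal sketch bins points into a regular grid and keeps the highest elevation in each cell. It is not the workflow of any particular GIS package; the function name `point_cloud_to_dsm`, the cell size and the "highest return per cell" rule are assumptions made for this example, and real pipelines additionally filter noise and interpolate empty cells.

```python
import numpy as np

def point_cloud_to_dsm(points, cell_size):
    """Rasterize an (N, 3) array of x, y, z points into a DSM grid.

    Each cell stores the highest z among the points that fall in it;
    cells that receive no point are left as NaN.
    """
    points = np.asarray(points, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Column/row index of every point relative to the lower-left corner.
    cols = np.floor((x - x.min()) / cell_size).astype(int)
    rows = np.floor((y - y.min()) / cell_size).astype(int)
    dsm = np.full((rows.max() + 1, cols.max() + 1), -np.inf)
    # Unbuffered maximum, so several points in one cell are all considered.
    np.maximum.at(dsm, (rows, cols), z)
    dsm[np.isinf(dsm)] = np.nan
    return dsm
```

Row 0 of the returned array corresponds to the smallest y coordinate, so the array may need to be flipped vertically before it is written to a north-up raster.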

Fig. 1. 3-D geospatial features of a city developed based on (a) LiDAR data (www.opentopography.org) processed using ArcGIS-10.2 and (b) OpenStreetMap data (www.openstreetmap.org) processed using Blender-2.78

In order to achieve uniformity among the multitude of 3D data semantics, in 2012 the Open Geospatial Consortium (OGC, http://www.opengeospatial.org/) issued an open standard for 3D city generation, known as CityGML (https://www.citygml.org/). CityGML is a data model and exchange format for storing digital 3D models of cities and landscapes (Yao et al. 2018). It defines ways to describe most of the common 3D features and objects found in cities (such as buildings, roads, rivers, bridges, and vegetation) and the relationships between them. It also defines different levels of detail (LOD) for the 3D objects, which allows objects to be represented for different applications. The 3D City Database software (3DCityDB, Yao et al. 2018), based on the CityGML standard, provides a platform-independent suite to facilitate the development and deployment of 3D city model applications. Machine learning algorithms are also used in combination with 2D city data to generate its 3D structure using multiple attributes and their combinations, and the results satisfy the accuracy recommendations of CityGML for LOD-1 models (Biljecki et al. 2017). In addition, OpenStreetMap (OSM) (https://www.openstreetmap.org/) data, along with the surface elevation data provided by the Shuttle Radar Topography Mission (SRTM, http://lpdaac.usgs.gov), is used to generate 3D city models (Over et al. 2010). OSM’s database is crowd sourced and needs to be verified before it is used in any study. Although it is relatively easy and inexpensive to develop 3D city models with OSM data (Fig. 1b), the technology is still evolving and needs to be harmonized. In view of this, the United States Special Operations Command (USSOCOM) has conducted a world-wide competition to develop algorithms that use satellite imagery and various open source or commercially available elevation data products to improve upon the state of the art in automated building detection and labeling (https://www.nasa.gov/feature/ussocom-urban-3d-challenge).
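To make the idea of a block-type (LOD-1 style) building model concrete, the sketch below extrudes a 2D footprint polygon to a flat-roofed prism. It is only an illustration: the function names `footprint_area` and `extrude_lod1`, the dictionary layout and the assumption of a single height per building are choices made for this example and are not part of the CityGML schema or of any OSM tool.

```python
def footprint_area(footprint):
    """Planar area of a simple polygon given as [(x, y), ...] (shoelace formula)."""
    area = 0.0
    n = len(footprint)
    for i in range(n):
        x1, y1 = footprint[i]
        x2, y2 = footprint[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def extrude_lod1(footprint, height, ground=0.0):
    """Extrude a footprint into a flat-roofed block with one uniform height."""
    n = len(footprint)
    base = [(x, y, ground) for x, y in footprint]
    roof = [(x, y, ground + height) for x, y in footprint]
    # Each wall is a quad joining edge i of the base to edge i of the roof,
    # stored as indices into the combined vertex list (base first, then roof).
    walls = [(i, (i + 1) % n, n + (i + 1) % n, n + i) for i in range(n)]
    return {
        "vertices": base + roof,
        "walls": walls,
        "volume": footprint_area(footprint) * height,
    }
```

For a 10 m by 20 m rectangular footprint and a height of 9 m, the sketch returns eight vertices, four wall quads and a volume of 1800 cubic meters. The height itself would have to come from an elevation product, a building attribute or an estimate.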

4 Chemical Transport Models

Chemical transport modeling is a collective representation of four major processes: emission of pollutants, transport / dispersion of pollutants (dependent on the transport of meteorological variables), physico-chemical conversion, and deposition. Simulation of all these processes is required to estimate the spatio-temporal profile of various pollutants in order to evaluate the overall air quality. The treatment of the emission inventory, transport, and physico-chemical interactions of air pollutants differs between mesoscale and building scale modeling, as discussed in the subsequent sections.
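The four processes can be illustrated with a deliberately simplified one-dimensional sketch in which they are applied one after another within a time step (operator splitting). This is not the numerical scheme of any model reviewed here; the function name `ctm_step`, the periodic domain, the constant non-negative wind speed, the first-order upwind advection and the single first-order loss rate lumping conversion and deposition are all assumptions made for illustration.

```python
import numpy as np

def ctm_step(c, emission, u, k_diff, loss_rate, dx, dt):
    """Advance a 1D periodic concentration field by one time step.

    c          concentration in each cell
    emission   emission rate in each cell (concentration units per second)
    u          constant wind speed in m/s, assumed non-negative
    k_diff     eddy diffusivity in m^2/s
    loss_rate  first-order loss (conversion plus deposition) in 1/s
    Stability requires u*dt/dx <= 1 and k_diff*dt/dx**2 <= 0.5.
    """
    # 1. Emission: add the mass released during the step.
    c = c + dt * emission
    # 2. Transport by the mean wind: first-order upwind advection.
    c = c - u * dt / dx * (c - np.roll(c, 1))
    # 3. Turbulent dispersion: explicit central-difference diffusion.
    c = c + k_diff * dt / dx**2 * (np.roll(c, 1) - 2.0 * c + np.roll(c, -1))
    # 4. Conversion and deposition: exact solution of first-order decay.
    return c * np.exp(-loss_rate * dt)
```

With `loss_rate` set to zero and no emissions, the scheme conserves the total mass on the periodic domain, which is a convenient sanity check before adding further processes.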

The pioneering works of Taylor (1927) on the measurement of turbulent velocities, Sutton (1953) on micrometeorology, Pasquill (1974) on atmospheric diffusion, Anderson (1969) and Turner (1970) on the determination of dispersion relations, and Pasquill and Smith (1983) on atmospheric diffusion paved the way for the development of air quality dispersion modeling. In general, dispersion models can be broadly classified into three categories (Thunis et al. 2016): (a) Gaussian, (b) Lagrangian, and (c) Eulerian. El-harbawi (2013) also mentions two more categories, Box models and Dense gas models, which are not discussed here. Air quality is also evaluated using various statistical models based on chemometric methods such as principal component analysis, source apportionment, and chemical mass balance, which are applied to pollution data (Tsakovski et al. 2012) in order to identify the latent factors and the contributions of various pollutant sources, and to assess regional air quality (Banerjee et al. 2015; Azid et al. 2015; Yotova et al. 2016).

The Gaussian plume model (Pasquill 1961) is a widely used tool for representing the pollutant concentration over a region generated by a point source under stationary emission and constant meteorological conditions (Zannetti 1990). By integrating the Gaussian plume equation over the geometry of the emission source, it can also be used to simulate downwind pollutant concentrations from line and area sources. Owing to their simplicity, Gaussian plume models are prominently employed to meet statutory requirements for the environmental compliance of developmental activities. For instance, the dispersion of pollutants from thermal power plants (Toja-Silva et al. 2017), cement industries (Khaniabadi et al. 2018) and small scale cottage industries (Yang 2017) has been studied with Gaussian models. Lagrangian models mathematically describe pollutant parcels as particles moving through space according to a random walk process (El-Harbawi 2013). The pollutant particles are treated as a discrete phase, and the path of each particle is tracked. These models are ideal for simulating the long range transport of pollutants at continental scales; for instance, Wen et al. (2012) reported, based on a Lagrangian model, that nearly 30% of the O3 concentration in Ontario, Canada has US origins. Steensen et al. (2013) applied the PUFF model (Searcy et al. 1998), which is based on Lagrangian principles, to study the long range transport of volcanic ash. On the other hand, Pepe et al. (2016) demonstrated the applicability of the Lagrangian model AUSTAL2000 (Janicke 2011) for simulating traffic pollution dispersion in an urban center, which also considers the effect of complex terrain (built-up area). Unlike the Lagrangian approach, Eulerian models track the motion of pollutants with respect to a fixed reference point in space by solving the conservation equations of mass, momentum and energy over a control volume, assuming a continuum of matter. 
The majority of air quality models are of the Eulerian type and can be further classified by scale into two categories: (a) regional scale or mesoscale models (CMAQ, CAMx and others) and (b) building scale models (CFD-based tools such as OpenFOAM, ENVI-Met, Urban-CFD and others).
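For reference, the steady-state Gaussian plume formulation mentioned above can be written as a short function. The sketch uses the common textbook form for a continuous point source with total reflection at the ground; the function name `gaussian_plume` and its argument names are chosen for this example. The dispersion parameters `sigma_y` and `sigma_z` are taken as inputs here, since in practice they are derived from the downwind distance and the atmospheric stability class using empirical curves that are not reproduced in this sketch.

```python
import math

def gaussian_plume(q, u, y, z, h_eff, sigma_y, sigma_z):
    """Concentration from a continuous point source (steady-state Gaussian plume).

    q        emission rate (g/s)
    u        mean wind speed at release height (m/s)
    y, z     crosswind and vertical receptor coordinates (m)
    h_eff    effective release height, stack height plus plume rise (m)
    sigma_y, sigma_z  dispersion parameters at the receptor's downwind distance (m)
    Returns the concentration in g/m^3.
    """
    if u <= 0 or sigma_y <= 0 or sigma_z <= 0:
        raise ValueError("u, sigma_y and sigma_z must be positive")
    lateral = math.exp(-y**2 / (2.0 * sigma_y**2))
    # The second exponential is the image source that models ground reflection.
    vertical = (math.exp(-(z - h_eff)**2 / (2.0 * sigma_z**2))
                + math.exp(-(z + h_eff)**2 / (2.0 * sigma_z**2)))
    return q / (2.0 * math.pi * u * sigma_y * sigma_z) * lateral * vertical
```

For a ground-level source and a ground-level receptor on the plume centerline, the expression reduces to q / (π u σy σz), which is a quick check on any implementation.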

CMAQ, CAMx and CALPUFF are some of the widely used chemical transport models (CTMs) for regional air quality simulations endorsed by the United States Environmental Protection Agency (USEPA). The Community Multiscale Air Quality model (CMAQ, http://www.cmaq-model.org/) is a 3D Eulerian photochemical model developed by USEPA in order to understand the complex atmospheric physico-chemical interactions (Ching and Byun 1999). Earlier studies have validated the capability of CMAQ to accurately simulate various pollutants (NOx, SOx, aerosols and O3) on regional scales (Byun and Schere 2006; Liu et al. 2013). The model is widely used for research and policy assessments across the world (Sharma et al. 2016 (India); de la Paz et al. 2013 (Mediterranean Basin); Chatani et al. 2011 (Japan); Simon et al. 2012 (USA)). The Comprehensive Air Quality Model with Extensions (CAMx, http://www.camx.com/) is a 3D Eulerian model which simulates the spatio-temporal concentration of pollutants (O3, CO, PM, NOx) over domains ranging from regional to urban scale (El-Harbawi 2013). CAMx has also been employed to predict inter-continental O3 transport using global climate models (Nopmongcol et al. 2017), as a part of the Air Quality Modelling Evaluation International Initiative (AQMEII-3). Ciarelli et al. (2017) simulated European air quality (SO2, O3, NO2, CO and PM2.5) using CAMx over a 4-year study period (2006–2009) and observed that the model overestimated the SO2 and O3 concentrations, while it underestimated the NO2 and CO concentrations. Shahbazi et al. (2017) applied the CAMx model along with a meteorological model to study the effect of reduced traffic volume on Tehran’s air quality. The model treats deposition and chemical reactions of pollutants in a more realistic manner (El-Harbawi 2013). The California puff (CALPUFF) model is a dynamic Lagrangian dispersion model which is able to simulate the long-range transport of pollutants over distances ranging from 50 to several hundred kilometers. 
It can model emissions (SO2, NOx, HNO3, NH3, PM and other toxic pollutants) from point, line, area and volume sources. The model requires characteristics of emission source input data and meteorological data such as, wind speed, direction, temperature, cloud cover, mixing height, relative humidity, surface pressure, precipitation, and upper air sounding data. CALPUFF is used for simulating the spatial concentration of emissions from power plants (Jeon and Lee 2015), incineration plants (Cetin Dogruparmak et al. 2018), and industrial complexes (Lee et al. 2014; Affum et al. 2016). Review by El-harbawi (2013) may be referred for further information about various air quality models.

CMAQ, CAMx, CALPUFF and similar models have well established modules for simulating gas phase and aqueous phase reactions, aerosol dynamics, and wet and dry deposition. However, CTMs rely on meteorological fields, which have to be provided by numerical weather models (Baklanov et al. 2014) such as WRF (Weather Research and Forecast; Sharma et al. 2016; Kang et al. 2016) or MM5 (Fifth-Generation Penn State/NCAR Mesoscale Model; Gsella et al. 2014; Zhang et al. 2016). Recent studies by Zhong et al. (2016) for the east Asia region on PM10, O3, SO2 and NO2, Yahya et al. (2017) for the USA on climate change induced air quality scenarios, and Jose et al. (2017) for the Europe region on forest fires are instances of coupled applications of WRF and CTMs. Prior to 2000, most CTMs were offline-coupled, which means that the chemistry is driven one-way by the meteorological variables. Both CMAQ and CAMx have been used as offline-coupled models (Yu et al. 2012; Zhang et al. 2012). Online-coupled models, on the other hand, can simulate meteorology-chemistry interactions and two-way feedback processes, such as the effect of aerosols on weather (through radiation calculations) and the effect of weather variables on aerosols (through optical properties). Online-coupled models represent one-atmosphere processes more realistically, but at the cost of a more complicated model development and computationally expensive simulations (Zhang et al. 2016). Weather Research and Forecast with Chemistry (WRF-Chem, Grell et al. 2005) is the most advanced and widely used online-coupled model at present, and CMAQ version 5.0 and above are also capable of two-way coupling with WRF (Baklanov et al. 2014). Other online-coupled models include the Multi-scale Climate Chemistry Model (MCCM, Grell et al. 2000) and the Gas, Aerosol, Transport and Radiation (GATOR) dispersion model (Jacobson 2001).

4.1 Weather Research and Forecast / Chemistry Model

Weather Research and Forecast (WRF, http://www2.mmm.ucar.edu/wrf/users/) is a non-hydrostatic (with an option for hydrostatic) weather prediction model developed by the National Center for Atmospheric Research (NCAR). The model can simulate meteorological variables such as the 3D wind profile, potential temperature, surface pressure and TKE (with a specific PBL scheme) on scales ranging from a few meters to several kilometers (Skamarock et al. 2008), using initial and boundary conditions obtained from data compiled by the National Centers for Environmental Prediction (NCEP). Similarly, the Penn State University / NCAR mesoscale model (MM5, http://www2.mmm.ucar.edu/mm5/) is a non-hydrostatic / hydrostatic mesoscale model designed to simulate regional scale atmospheric circulation phenomena (Grell et al. 1994). Grell et al. (2005) may be referred to for a comparison between the WRF-Chem and MM5-Chem models.

WRF is a widely used mesoscale weather forecasting model which provides several options for long wave radiation schemes, shortwave radiation schemes, surface layer schemes, land-surface schemes, urban physics, planetary boundary layer schemes and cumulus parameterizations. The WRF-Chem model is packaged along with the WRF framework in order to provide a unified framework in which the meteorological component of the model is fully consistent with the atmospheric component, and both use the same transport grid, time-step, and physics schemes (Grell et al. 2005). The WRF-Chem model supports several gas-phase chemical mechanisms (RADM2, RACM, CB-4 and CBM-Z), photolysis schemes (Madronich, Fast-J and F-TUV), aerosol schemes (MADE/SORGAM, MADE/VBS, MAM, MOSAIC and GOCART), and a plume rise model for treating emissions from wildfires (https://ruc.noaa.gov/wrf/wrf-chem/). For instance, the spatial profile of PM10 (particulate matter below 10 μm) over central India simulated with WRF-Chem (using EDGAR emissions, the Yonsei University (YSU) PBL scheme, the RADM2 gas-phase reaction scheme and the GOCART aerosol scheme) is shown in Fig. 2. The accuracy and sensitivity of WRF-Chem simulations depend significantly upon a multitude of factors such as the PBL parameterization scheme employed (Gunwani and Mohan 2017), the gaseous and aerosol reaction schemes (Georgiou et al. 2018), the representativeness of the emission inventory (Matthias et al. 2018), the urban canopy model scheme (Hu et al. 2010; Xie et al. 2012) and the grid resolution (Kuik et al. 2016).

Fig. 2. Particulate matter (PM10) profile simulated over the central Indian region using WRF-Chem-V3.9 and visualized using QGIS-v3.2

4.2 Gridded Emission Inventory

CTMs require an inventory of pollution sources that is representative of the study area. While the transport equations are solved, the emission value in each grid cell is advected or convected (using the meteorological fields supplied by numerical weather models), diffused and transformed (both physically and chemically), resulting in a realistic spatio-temporal profile of the pollutants. To enable air quality modeling, the emission sources are made available in gridded format with varying spatio-temporal resolutions and speciations. The emission values in each grid cell are usually estimated from gross national estimates and certain proxy variables, such as the spatial distribution of activity / population. ACCMIP (Lamerque et al. 2010), EDGAR (Gutschow et al. 2016), GEIA (Price et al. 1997), POET (Granier et al. 2005) and RETRO (Schultz et al. 2008) are global gridded emission inventories which provide both anthropogenic and biogenic sources. MEGAN (Sindelarova et al. 2014) and CAMS-GLOB-BIO (Sindelarova et al. 2014) provide global biogenic emissions. REAS (Kurokawa et al. 2013) provides anthropogenic emission data for east Asia. The Emissions of atmospheric Compounds and Compilation of Ancillary Data (ECCAD) portal (http://eccad.aeris-data.fr/) hosts emission inventory data from several databases in a well organized fashion at various spatio-temporal resolutions to facilitate air quality modeling.
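The proxy-based estimation described above can be sketched as a proportional allocation: each grid cell receives a share of the national total equal to its share of the proxy variable. The function name `allocate_by_proxy` and the use of a single proxy layer are assumptions for illustration; operational inventories typically combine several sector-specific proxies.

```python
import numpy as np

def allocate_by_proxy(national_total, proxy):
    """Distribute a national emission total over a grid in proportion to a proxy.

    national_total  total emission for the country (e.g. tonnes per year)
    proxy           2D array of a proxy variable per grid cell (e.g. population)
    Returns a 2D array of emissions whose sum equals national_total.
    """
    proxy = np.asarray(proxy, dtype=float)
    total = proxy.sum()
    if total <= 0:
        raise ValueError("proxy must have a positive sum")
    return national_total * proxy / total
```

For example, a national total of 80 units distributed over a population grid `[[1, 3], [0, 4]]` yields `[[10, 30], [0, 40]]`: unpopulated cells receive nothing, and the gridded values add up to the original total.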

Guidelines are provided by the European Environment Agency (EEA 2009) for setting up continental and national scale emission inventories, while there are no such guidelines for setting up urban and local scale emission inventories (Thunis et al. 2016). Usually, urban to local scale air quality studies rely on project specific emission inventories (Kang et al. 2016), which lack uniformity and are difficult to integrate into other studies. However, the Forum for AIR quality MODElling (FAIRMODE 2018) guideline document and the Citeair2 project report (www.citeair.eu/) stress the harmonized compilation of emissions across the EU to avoid irregularities. Under the FAIRMODE framework, a comprehensive emission source database, SPECIEUROPE (Pernigotti et al. 2016), has been developed for the EU along similar lines to EPA’s SPECIATE (Simon et al. 2010) for the USA. Trombetti et al. (2017) have developed a method to downscale national emission aggregates based on fuel consumption, productivity etc. (modeled using the Greenhouse Gas and Air Pollution Interactions and Synergies, GAINS, model) to a grid resolution of 100 m. The accuracy of the emissions and of the source locations is also crucial for avoiding uncertainties in CTM simulations.

The Sparse Matrix Operator Kernel Emissions (SMOKE, Coats Jr 1996) emission modeling system is usually used to convert site specific inventory data (along with the global gridded emissions) into the gridded format suitable for CTMs, especially for CMAQ. Some pre-processing utilities for WRF-Chem (prep_chem_source, http://brams.cptec.inpe.br) also provide an easy conversion of user input emissions (and global gridded emissions) into the required format. Point sources usually consist of stack emissions from industries, line sources represent traffic emissions, and area sources represent fugitive emissions from households, opencast mines, forest fires etc.
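As an illustration of what converting site specific data into a gridded format involves at its simplest, the following Python sketch bins point-source emission rates into the cells of a rectilinear grid. It is not the SMOKE or prep_chem_source procedure, which also handle temporal allocation, speciation and plume rise; the coordinates, rates and grid edges are hypothetical.

```python
import numpy as np

def grid_point_sources(x, y, rate, x_edges, y_edges):
    """Sum point-source emission rates into the cells of a rectilinear grid.

    Returns an array of shape (nx, ny). Sources lying outside the grid
    edges are silently dropped by np.histogram2d.
    """
    gridded, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=rate)
    return gridded

# Hypothetical stack locations (km) and emission rates (g/s)
x = np.array([0.5, 1.5, 1.7, 2.2])
y = np.array([0.5, 0.2, 0.9, 1.5])
rate = np.array([10.0, 4.0, 6.0, 2.5])

# Hypothetical grid of 3 x 2 cells, each 1 km x 1 km
x_edges = np.array([0.0, 1.0, 2.0, 3.0])
y_edges = np.array([0.0, 1.0, 2.0])

emis = grid_point_sources(x, y, rate, x_edges, y_edges)
```

Here the two stacks that fall in the same cell are summed, so the gridded field preserves the total emission rate of all sources inside the domain.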

Apart from feeding gridded emission inventories into regional scale CTMs, plugging time and space resolved emissions into the building scale CTM framework is another major challenge. For ingesting traffic emissions into building scale CFD simulations, the SUMO model (Krajzewicz et al. 2012), which simulates urban traffic, can be used to interface the dynamic traffic emissions from the roads (Borrego et al. 2016). It is also important to consider non-road line sources, including emissions from aircraft within the urban domain. AEDT (Aviation Environmental Design Tool) (Wilkerson et al. 2010) is a tool that estimates full-flight (take-off/landing, ascent/descent and cruise) emissions from global commercial aircraft (Vennam et al. 2017), and several studies have applied it in CTMs of varying scales to understand the effects of aviation emissions on surface air quality (Woody et al. 2015).

4.3 UCM Parameterization

Although WRF captures the overall evolution of meteorological variables, such as the wind profile in the ABL and beyond, it fails to resolve building scale meteorological features, which subsequently influences the performance of fine scale air quality models (Pepe et al. 2016). Nevertheless, UCM parameterization enables WRF to broadly mimic the role of the surface layer in urban meteorology: the UCM approximates the urban built-up as 2D symmetrical street canyons at the sub-grid level and estimates roof, wall and road surface temperatures and fluxes based on the urban fraction (Tewari et al. 2007). UCM schemes are broadly classified into three categories (Baklanov et al. 2016): (a) single-layer canopy schemes, (b) multi-layer canopy schemes (Masson 2006; Martilli 2014) and (c) schemes in which the effect of buildings is explicitly resolved in CFD models (Baklanov and Nuterman 2009; Santiago and Martilli 2010). The first two categories are very simple approximations of the urban canopy and can be easily integrated into meteorological models. In a single-layer UCM, the model calculates radiative and turbulent fluxes, which are used as bottom boundary conditions in the meteorological model. In a multi-layer UCM, several lower atmospheric layers are integrated with the urban scheme, and the atmospheric equations include drag, heat and production source terms in the momentum, temperature and TKE equations (Baklanov et al. 2008). This improves the predictability of micro-climatic variables such as ground surface temperature and wind profile, according to the floor density distribution of the urban built-up (Kondo et al. 2005). However, Kusaka et al. (2001) show that their single-layer UCM output is consistent with the multi-layer canopy model and field observations. Grimmond et al. (2010) compared the accuracy and operational complexity of 33 UCMs in predicting energy and water exchanges in an urban setting and concluded that no model performs best or worst for all fluxes, but some classes of models estimate individual fluxes better. They also concluded that there is no significant difference between the performances of simpler and more complex UCMs. The choice of input parameter values for the UCMs results in large deviations in model performance in simulating the fluxes (Grimmond et al. 2011).

Although urban parameterization is employed through UCMs, mesoscale weather models are not explicitly designed to resolve street scale or building scale phenomena at sub-grid levels (Thunis et al. 2016; Baklanov et al. 2016); this requires CFD models (Fitch et al. 2003; Chen et al. 2013). Although the wind flow at large is governed by mesoscale processes such as thermally induced winds, orographic winds and sea breeze circulations, the transport of meteorological variables and pollutant dispersion within heavily built-up urban areas is influenced by the microscale features of the urban canopy. Also, as discussed earlier, the differential heating between built-up areas, roads, vegetation, water bodies and ambient air influences the dispersion (Ortiz and Friedrich 2013; Santiago et al. 2015; Ortolani and Vitale 2016; Xiao et al. 2018), which is a microscale phenomenon. These limitations further hamper extended studies related to human exposure and other applications (Batterman et al. 2014). At this juncture, the role of a CFD model becomes very important in simulating urban air quality at very high resolution at building or street scale, as it accommodates the site specific emission inventory and micro-meteorological processes.

4.4 CFD Models

The CFD models solve the advection and dispersion equations of gaseous pollutants based on the flow field obtained from the Navier-Stokes equations. For instance, the building scale flow field over a set of buildings simulated using OpenFOAM is shown in Fig. 3. CFD models provide wide scope for implementing various turbulence models and solution methods. Turbulence is explicitly resolved in DNS (Direct Numerical Simulation) models, while it is parameterized in RANS (Reynolds Averaged Navier-Stokes) and LES (Large Eddy Simulation) models using different concepts. OpenFOAM, an open source programmable multi-physics CFD toolbox that is widely accepted among environmental modeling groups (Kadaverugu 2016), offers RANS, LES, hybrid URANS/LES (unsteady) and DNS models.
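To make the advection and dispersion step concrete, the following Python sketch marches a passive scalar forward in time in one dimension, given a prescribed uniform wind speed and a constant diffusivity. It is a toy illustration only and not the OpenFOAM formulation: it assumes a periodic domain, a non-negative wind speed, first-order upwind differencing for advection and central differencing for diffusion. All names and parameter values are hypothetical.

```python
import numpy as np

def advect_diffuse_1d(c0, u, diff, dx, dt, n_steps):
    """March dc/dt + u dc/dx = diff d2c/dx2 forward in time on a periodic 1D grid.

    First-order upwind differencing is used for advection (u >= 0 assumed)
    and second-order central differencing for diffusion.
    """
    if u < 0:
        raise ValueError("this sketch assumes u >= 0 (the upwind stencil looks left)")
    courant = u * dt / dx
    fourier = diff * dt / dx**2
    # Keeps all stencil weights non-negative, so the explicit update is stable
    if courant + 2.0 * fourier > 1.0:
        raise ValueError("time step too large for the explicit scheme")
    c = np.array(c0, dtype=float)
    for _ in range(n_steps):
        left = np.roll(c, 1)    # c[i-1] with periodic wrap-around
        right = np.roll(c, -1)  # c[i+1] with periodic wrap-around
        c = c - courant * (c - left) + fourier * (right - 2.0 * c + left)
    return c

# Hypothetical puff released in cell 25 of a 100-cell domain
c0 = np.zeros(100)
c0[25] = 1.0
c = advect_diffuse_1d(c0, u=1.0, diff=0.1, dx=1.0, dt=0.5, n_steps=20)
```

In this example the puff is carried ten cells downwind while spreading, and the total mass in the domain is conserved. In a building scale CFD model the same balance is solved in three dimensions, with the velocity field and the turbulent diffusivity supplied by the flow solver.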

Fig. 3 Velocity field in x-direction simulated over the buildings of CSIR-NEERI campus using OpenFOAM-v1712

Toparlar et al. (2017) reviewed 176 studies on CFD modeling of urban microclimate reported up to 2015, out of which 96% used RANS, 2.8% used LES and the rest used both methods to solve the governing Navier-Stokes equations. Although LES simulations are known to be accurate (Jeanjean et al. 2017), LES has received less attention owing to its heavy computational requirements and the lack of best practice guidelines (Toparlar et al. 2017). The Yamada and Mellor E-ε (Yamada and Mellor 1975) and standard k-ε (Jones and Launder 1972) models are the most commonly used turbulence models (Toparlar et al. 2017). However, the standard k-ε model is reported to underestimate the TKE in the wake of buildings and to overestimate it near the frontal corners of buildings (Tominaga et al. 2008). Methods like realizable k-ε (Shih et al. 1995) and RNG (renormalization group) k-ε (Yakhot and Orszag 1986) have also been gaining prevalence in recent studies (Toparlar et al. 2017).

Employing CFD models involves meticulous integration of radiative codes (computation of shadows to obtain differential heating of surfaces etc.), wall interactions and vegetation response codes with the transport of meteorological variables and pollutants. Vegetation foliage and its vertical distribution show varying levels of transpiration (Kadaverugu 2015), and the shape and size of the leaves disproportionately affect the obstruction and removal of particulate matter (Janhall 2015). For instance, small hedges are effective in improving air quality in street canyons, unlike tall trees (Abhijith et al. 2017). The thermal interaction of vegetation and water bodies (Robitu et al. 2006) has to be incorporated into the unified CFD domain. Phenomena such as precipitation, cloud formation and radiative interactions are not included in the existing CFD models. Integration of atmospheric chemistry modules into CFD code is another major challenge that is yet to be addressed. However, CFD models resolve heat and mass transport at much higher resolutions, with appropriate boundary conditions received from numerical weather (mesoscale meteorological) models. The modeler has to weigh these trade-offs wisely when selecting a CFD model for a specific application.

On the downside, CFD models are too computationally intensive to be extended over large domains, and simulating atmospheric chemistry within the domain is a challenging task (Stein et al. 2007; Lefebvre et al. 2011, 2013; Beevers et al. 2012; Isakov et al. 2014). Thunis et al. (2016) observed that CFD models for urban air quality are rarely used in Europe, possibly because of their specialized application to very fine scales. Based on a survey, Thunis et al. (2016) also noted that about 30% of air quality studies and 60% of research studies are based on Eulerian models, while Gaussian plume models account for around 25% of air quality studies and 10% of research studies. When the studies are classified according to geographical scale, out of 177 studies reported, 30% represented the regional scale, 30% the urban scale, 15% the local scale and 15% the street scale. The local and street scale air quality simulations are based on CFD models. Hardly any urban air quality studies based on CFD tools were reported from Latin America, Africa and India (Toparlar et al. 2017), although these regions represent a large share of global urbanization.

Tewari et al. (2010) demonstrated improved predictability of urban air quality by coupling the WRF model with a CFD model. Liu et al. (2012) investigated street level traffic pollutant dispersion by coupling Large Eddy Simulations with WRF output for an entire city. Similarly, Miao et al. (2013) coupled a CFD model with WRF to study the airflow and dispersion of pollutants in a complex urban area of Beijing. However, the work was not aimed at validating the dispersion of real life pollutants against measured data. Further, Jensen et al. (2017) simulated air quality at every street in Denmark using a suite of multi-scale air quality models with high resolution emission data. Similar work would not have been possible in other parts of the world owing to the limited availability of high resolution emission data. Recent studies employing CFD and mesoscale models are provided in Table 1.

Table 1 Summary of recent studies featuring coupled meteorological and CFD models in urban air

Apart from the modeling challenges associated with CFD models, their seamless integration with mesoscale weather models and the development of an accurate 3D computational domain of the cities are other major challenges. The problems in multi-scale integration stem from the dissimilarities in the solution approaches of mesoscale and building scale models. For instance, WRF-Chem solves the compressible Euler equations of atmospheric flow (in finite difference with explicit time integration), while OpenFOAM solves (in steady state with RANS) the incompressible Navier-Stokes equations (in finite volume with semi-implicit time integration) using the SIMPLE algorithm (Zheng et al. 2015). Owing to non-matching computational grids (in space and time) and differences in discretization methods, care has to be taken while transferring boundary values between models of varying scales. Velocity components and TKE values for CFD simulations can be obtained directly from the mesoscale grid cells nearest to the building scale domain. However, certain variables such as the momentum diffusion coefficient and the TKE dissipation rate have to be estimated using parameterized expressions (Zheng et al. 2015).
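As an example of such a parameterized estimate, the Python sketch below derives the TKE dissipation rate and the eddy viscosity (momentum diffusion coefficient) from a TKE value using the standard k-ε closure relations, ε = C_μ^(3/4) k^(3/2) / l and ν_t = C_μ k² / ε. These are textbook relations and are not necessarily the expressions used by Zheng et al. (2015); the turbulence length scale l is an assumed input that the modeler has to supply, and the numerical values are hypothetical.

```python
C_MU = 0.09  # standard k-epsilon model constant

def dissipation_rate(k, length_scale, c_mu=C_MU):
    """Estimate the TKE dissipation rate (m2/s3) from TKE k (m2/s2) and a length scale (m)."""
    if k < 0 or length_scale <= 0:
        raise ValueError("k must be non-negative and length_scale positive")
    return c_mu**0.75 * k**1.5 / length_scale

def eddy_viscosity(k, epsilon, c_mu=C_MU):
    """Estimate the turbulent (eddy) viscosity (m2/s) from k and epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return c_mu * k**2 / epsilon

# Hypothetical TKE taken from a mesoscale grid cell and an assumed length scale
k_meso = 0.5          # m2/s2
mixing_length = 10.0  # m, assumption made for illustration
eps_inlet = dissipation_rate(k_meso, mixing_length)
nut_inlet = eddy_viscosity(k_meso, eps_inlet)
```

The resulting values depend strongly on the assumed length scale, which is one reason the transfer of boundary values between the models requires care.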

5 Validation Methods

Both regional and building scale air quality model simulations have to be validated against ground monitored air quality or meteorological data in order to assess the suitability of the methods employed in the simulations. In general, validation methods can be broadly divided into three categories: (a) comparison with ground observations (Liao et al. 2015; Kwak et al. 2015; Sharma et al. 2016), (b) comparison with satellite retrievals (Streets et al. 2013; Steensen et al. 2013; Fernandes et al. 2015) and (c) comparison with other modeled simulations (Zhang et al. 2016). Apart from validation, the measured data is also useful in off-line interpolation, online mapping (nudging), 3D or 4D data assimilation and ensemble methods. Lahoz et al. (2010) may be referred to for a review of assimilation methods. Thunis et al. (2016) noted that nearly 80% of studies reported in the APPRAISAL (http://www.appraisal-fp7.eu) database used ground monitored data for data assimilation, 50% for model validation, 20% for post-processing of the simulations, 13% for setting up boundary conditions and the remaining 17% for model calibration. They also noted that nearly 75% of the studies used data from automated monitoring networks, for instance AERONET (Aerosol Robotic Network, Holben et al. 2001), AURN (Automated Urban Rural monitoring Network, https://uk-air.defra.gov.uk/) in the UK and SAFAR (System of Air Quality and Weather Forecasting And Research, http://safar.tropmet.res.in/) in India.

Mobile air quality laboratories are also gaining importance and are used for evaluating traffic related air pollution (Wang et al. 2009; Zwack et al. 2011; Padro-Martinez et al. 2012) and pedestrian exposure (Rakowska et al. 2014). Unlike stationary observations, mobile laboratory measurements provide both spatially and temporally resolved information, which also complements the validation of CFD simulations of urban air quality (Kwak et al. 2018). In addition, community based air quality sensing using wireless network systems, such as OpenSense (https://gitlab.ethz.ch/tec/public/opensense), which is still in a developing phase, has the potential to support the development and validation of near real time, high resolution urban air quality models.

Monitored data represents an observation at a point location, whereas modeled data is a volume averaged approximation; in this context, the representativeness of the monitored data is a crucial issue in validating the models. Satellite retrievals of atmospheric column concentrations of pollutants such as NOx, SOx, CO, O3, CH4 and aerosols are also used for air quality model assimilation and validation studies. Some of the major satellite sensors (pollutants retrieved) are: MODIS (PM), MISR (PM), AIRS (SO2, CO, CH4, CO2), OMI (O3, NOx, SOx, PM, NMVOC, HCHO), MOPITT (CO, CH4), TES (NH3, CO2), and IASI (SO2, CO, CH4, NMVOC, NH3, CO2). Streets et al. (2013) may be referred to for a further review of emission estimation from satellite observations. The recently launched Sentinel-5 (O3, NO2, SO2, HCHO, CO, CH4, AOD) and Gaofen-5 (first full-spectrum hyper-spectral satellite for comprehensive air pollution monitoring) offer further enhanced atmospheric data. NASA provides a multitude of satellite retrieved air quality products through web platforms such as Giovanni, LAADS, Worldview, LPDAAC, AppEEARS and others, where spatio-temporal data products can be obtained for air quality studies and model validations (https://airquality.gsfc.nasa.gov/resources). The spatial air quality data simulated by the models can be compared with satellite data products either qualitatively (Steensen et al. 2013) or quantitatively using spatial correlation analysis (Kim et al. 2006).
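A quantitative comparison of this kind can be sketched in Python as follows. The sketch assumes that the model grid is an exact integer refinement of the satellite grid, so that the model field can be block-averaged onto the satellite pixels before computing a Pearson correlation over pixels with valid retrievals. Real products usually require reprojection and the application of averaging kernels, which are not shown; the fields below are synthetic.

```python
import numpy as np

def block_mean(field, factor):
    """Average a 2D field over non-overlapping factor x factor blocks."""
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by the coarsening factor")
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))

def spatial_correlation(a, b):
    """Pearson correlation between two fields, ignoring pixels that are NaN in either."""
    a = np.ravel(a)
    b = np.ravel(b)
    valid = np.isfinite(a) & np.isfinite(b)
    return float(np.corrcoef(a[valid], b[valid])[0, 1])

# Synthetic 4 x 4 model field coarsened to a 2 x 2 "satellite" grid
model_fine = np.arange(16.0).reshape(4, 4)
model_coarse = block_mean(model_fine, 2)

# Synthetic retrieval: linearly related to the model, with one missing pixel
satellite = 2.0 * model_coarse + 1.0
satellite[0, 0] = np.nan

r_spatial = spatial_correlation(model_coarse, satellite)
```

Averaging the model output to the footprint of the observation is also one simple way of reducing the representativeness mismatch between the two data sets.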

Model performance is usually assessed by calculating statistical metrics, such as the correlation coefficient, normalized mean bias, mean fractional error and bias, and normalized mean square error, between the modeled and observed environmental variables (Zhong et al. 2016). The open source R (R Core Team 2017) library openair (Carslaw and Ropkins 2012) is widely used for statistical analyses of air quality data and model validations.
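For illustration, these metrics can be computed with a few lines of Python. The sketch below uses commonly quoted definitions (NMB = Σ(M − O) / ΣO, fractional bias and error normalized by (M + O)/2, and NMSE normalized by the product of the modeled and observed means); individual studies, including Zhong et al. (2016), may define them slightly differently. The paired values are hypothetical.

```python
import numpy as np

def evaluation_metrics(observed, modeled):
    """Return common model evaluation statistics for paired observed/modeled values."""
    obs = np.asarray(observed, dtype=float)
    mod = np.asarray(modeled, dtype=float)
    if obs.shape != mod.shape:
        raise ValueError("observed and modeled arrays must have the same shape")
    diff = mod - obs
    pair_sum = mod + obs
    return {
        "r": float(np.corrcoef(obs, mod)[0, 1]),                      # correlation coefficient
        "NMB": float(diff.sum() / obs.sum()),                         # normalized mean bias
        "MFB": float(2.0 * np.mean(diff / pair_sum)),                 # mean fractional bias
        "MFE": float(2.0 * np.mean(np.abs(diff) / pair_sum)),         # mean fractional error
        "NMSE": float(np.mean(diff**2) / (obs.mean() * mod.mean())),  # normalized mean square error
    }

# Hypothetical paired concentrations (e.g. in ug/m3)
observed = [10.0, 20.0, 30.0, 40.0]
modeled = [12.0, 18.0, 33.0, 37.0]
metrics = evaluation_metrics(observed, modeled)
```

In this example the over- and under-predictions cancel, so the normalized mean bias is zero even though the fractional error is not, which shows why bias and error metrics are normally reported together.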

6 Conclusion

Poor air quality is one of the stress factors causing deterioration of human well-being. Urban centers are becoming pollution hot spots due to rapid development and a lack of air quality management strategies. There is an urgent need to evaluate air quality in urban centers and to identify the regions that do not attain permissible limits. Although ground-based monitoring networks provide the status of air quality, they are insufficient for understanding the dispersion dynamics of the pollutants and their interaction with urban features. Regional air quality modeling using mesoscale chemical transport models alone will not provide a complete picture of the dynamics of air quality at local / building scale, as mesoscale models do not explicitly resolve sub-grid level processes. However, they provide the regional background of meteorological and air quality inputs, which can further drive local / building scale CFD models for simulating high resolution urban air quality. Dissimilarities in the solution approaches of mesoscale and local scale models pose challenges for the seamless exchange of variables between the multiscale models.

The community developed mesoscale air quality models have advanced routines for aerosol and gas phase chemistry and for physical processes involving cloud formation, radiation budgeting, soil surface interactions etc. The overall accuracy of the models also depends on the choice of parameterizations, which are specific to the study area and need to be tested on a case by case basis. CFD models, although better at simulating urban microclimate, are not yet mature enough to represent gas-phase and aerosol reactions or the interactions between air and vegetation, built-up areas, roads, water bodies etc. Further, integrating the spatio-temporal variability of emission sources into the urban domain is another challenge. The accuracy of CFD models also depends significantly on the quality of the geospatial data (3D features), and there is a clear shortcoming in the application of the latest technologies for extracting 3D spatial data in developing countries. In particular, 3D surface models generated through algorithms are feasible over large domains and are expected to play a major role in urban resource management, and subsequently to help in managing urban air quality, noise pollution and storm water floods.

The lack of uniformity and of checks on good practices in urban CFD modeling is a major concern. Guidelines should be established in order to harmonize local-scale air quality studies, which are currently carried out in isolation. CFD tools offer a multitude of options for the selection of turbulence models and discretization methods for solving partial and ordinary differential equations. Arriving at the best choice of models for a specific air quality modeling study is a tedious task, as reiterated by many researchers. Hence, validation of CFD simulations is very much required for their acceptance in policy and compliance studies. There is huge scope for developing OpenFOAM libraries for specific urban micro-climate phenomena, enabling a simplified and standardized work-flow for urban studies.

The discipline of urban planning is now drawing the attention of specialists in computational fluid dynamics, meteorology, architecture, civil engineering, physics and chemistry. The broad spectrum of aspects discussed in the current review will help researchers and policy makers appreciate the multidisciplinary nature of urban air quality modeling and thereby plan location specific mitigation strategies.