1 Introduction

Several studies suggest that birth weight is associated with air quality exposure during gestation (Kramer 2003; Glinianaia et al. 2004; Maisonet et al. 2004; Parker et al. 2005; Šrám et al. 2005; Ritz et al. 2007; Darrow et al. 2011), and relate decreasing outdoor air quality with increasing incidence (or prevalence) of adverse birth weight outcomes (low birth weight, defined as birth weight <2,500 g; very low birth weight, defined as birth weight <1,500 g; small for gestational age, defined as birth weight below the 10th percentile for gestational age). In these studies, data on health outcomes and health covariates are usually measured individually, while exposure assignments are usually based on air quality monitoring stations, a problem known in spatial statistics as the change-of-support problem (Young and Gotway 2007). Exposure data can be collected only at the specific places where air monitoring stations are located, and these locations usually do not coincide with health data locations. Air monitoring stations tend to be sparsely located, in places where relatively high concentrations are expected, and exposure data are time series of instantaneous concentrations measured for few pollutants.

One possible approach to overcome these limitations is to use lichen diversity biomonitoring programs. Lichens are symbiotic organisms consisting of fungi and algae or cyanobacteria, and are the most studied bioindicators and biomonitors of air pollution (Branquinho 2001). They have a wide geographic distribution, and their diversity tends to decline, in both frequency and distribution pattern, in polluted areas as a physiological response to the long-term cumulative harmful effects of air pollutants (known and unknown). They have been used to monitor several air pollutants such as metals (Martin and Coughtrey 1982; Garty 2001), dioxins (Augusto et al. 2004, 2007), polycyclic aromatic hydrocarbons (Augusto et al. 2009) or particulate matter (Loppi and Pirintsos 2000; Pinho et al. 2008). A well-known example of their importance in health studies is the work of Cislaghi and Nimis (1997), which showed a good correlation between lichen diversity and lung cancer mortality.

To estimate health exposures, most approaches use statistical models that incorporate spatial information about exposure (Gryparis et al. 2009). However, estimated exposures can be misleading if they do not take into account the uncertainty of exposure estimates. One possible approach to overcome this problem is to estimate standard errors of exposure by analytical methods, but this can be intractable since the amount of uncertainty varies with location (Waller and Gotway 2004). Another approach is to adjust for exposure measurement error by simultaneously collecting data on the exposure surrogate and on a gold standard method, and applying regression calibration methods that incorporate external validation data to correct for possible bias (Thurston et al. 2003; Spiegelman 2010). Recently, Gerharz and Pebesma (2013) reported results on the uncertainty of air pollution measurements using geostatistical unconditional simulation and fine-resolution measurements collected with GPS tracks to assess exposure of individuals travelling on foot through the city of Münster. Waller and Gotway (2004) tried a different approach based on geostatistical simulation, which assumes a model for the spatial autocorrelation of exposure data and uses Monte Carlo simulations to provide individual distributions of exposure and to assess individual uncertainty of exposure. Here we assessed the spatial uncertainty of individual exposures using a similar approach. We conducted a semi-ecological design study, a design widely used to assess effects of air pollution in humans (Kunzli and Tage 1997).
In this kind of analysis, health outcomes and covariates are measured in individuals and exposure assignments are based on an ecological variable (the unit of observation is aggregate rather than individual). We used lichen data as exposure measurements to derive an air quality index during gestation, and geostatistical sequential simulation to derive the uncertainty of exposure measurements. Lichen data were used as an alternative to traditional air quality monitoring stations, given that they cover rural, urban and industrial regions with a higher spatial resolution of exposure measurements than the available air monitoring station data. To incorporate the uncertainty of exposure measurements we used a geostatistical sequential simulation algorithm that provided multiple equally probable realizations of the observed data, reproducing the observed histogram and spatial covariance while honoring the conditioning data (Soares 2001). Each simulation yielded a unique value at each location and represented a measure of exposure. The set of simulated values at each location provided a measure of exposure uncertainty. We used all simulations as input for statistical analysis with generalized linear models (Nelder and Wedderburn 1972). Finally, after fitting all multivariate models on birth weight, we assessed the statistical significance of the exposure parameter using its empirical distribution. The mean and confidence interval (CI) of the empirical distribution were used as bootstrap point and CI estimates of the exposure parameter.

2 Materials and methods

2.1 Studied area

In this study we used data from all mothers who participated in the Gestão Integrada Saúde e Ambiente (GISA) health and environment project and lived in predominantly urban areas during pregnancy. The classification of an urban area is a statistical concept set by the national statistical office to classify parishes in the national territory according to a methodological approach that considers the degree of urbanization, based on population density and land-use plans. Predominantly urban areas in the Coastal Alentejo (Alentejo Litoral) region include two parishes in Alcácer do Sal (Santiago and Santa Maria do Castelo), and the parishes of Grândola, Sines, Santo André and Santiago do Cacém. The predominantly urban areas of Alcácer do Sal and Grândola are situated in the northern part of Coastal Alentejo, while Sines, Santo André and Santiago do Cacém are situated in the central coastal area of the region, close to an important industrial complex comprising mainly petrochemical and energy-related industries. Selecting urban areas for analysis enabled us to remove potential confounding due to rural/urban differences in lifestyles and socio-demographic factors (such as smoking habits, prevalent types of occupation, levels of education or maternal age), known to be relevant when modeling birth weight.

2.2 Air quality data

Outdoor air quality during gestation was assessed through a lichen diversity biomonitoring program, in which the number of lichen species and their frequency, measured in the period 2008–2010 at 84 sites (Fig. 1a), were used to estimate a lichen diversity index value. This index reflects an integrated environmental exposure over time, embracing the biological effect of all pollutants in a synergistic way, even of those pollutants that cannot be measured (Ribeiro et al. 2010). Moreover, it allows the identification of the areas most disturbed by air pollution and provides an overall measurement of air quality, since lichens are exposed to the same complex mixture of pollutants as humans. To obtain this index (where higher lichen diversity index values point to higher air quality) we followed the standard protocol of Asta et al. (2002). Among the several lichen growth forms used as air quality indicators (Llop et al. 2012) we selected fruticose lichens for analysis, since they are known to have a higher surface/volume ratio and are thus more prone to intercept atmospheric pollutants than others (Branquinho 2001).

Fig. 1 a Location of sampling sites and lichen diversity values and b places of residence of participant mothers living in predominantly urban areas

Sampling sites were selected using a stratified random method: the Coastal Alentejo region was divided into a regular grid (each grid cell representing an area of 15 km × 15 km) and a simple random sample of three sampling locations was selected from each grid cell. The average distance between sampling sites is 4,805 m (minimum = 1,500 m; median = 4,766 m; maximum = 10,940 m); the average altitude of sampling sites is 109 m (minimum = 11 m; median = 100 m; maximum = 270 m).

2.3 Health data

Health data were collected at primary health centers by the research team, from pregnancy registries, clinical records and a specific questionnaire administered orally to the mothers involved in this study. The study protocol is available elsewhere (Ribeiro et al. 2010). For data analysis, we categorized health data (Table 1) on maternal characteristics known to have an impact on birth weight: genetic factors (maternal prepregnancy weight, maternal height); demographic and social factors (maternal age, education, occupation); obstetric factors such as parity (having given birth previously), prior preterm births (having previously given birth to a baby or babies at <37 weeks of gestation) or prior low birth weight (having previously given birth to a baby or babies with <2,500 g birth weight); antenatal care factors (first antenatal visit and number of visits during pregnancy); nutritional factors (gestational weight gain); maternal morbidity during pregnancy such as preeclampsia (high blood pressure and protein in the urine after the 20th week of pregnancy), gestational diabetes (high blood sugar diagnosed during pregnancy), uterine bleeding or hypertension; and toxic exposures (to tobacco smoke, active or passive). We also collected the night-time and day-time addresses of mothers during pregnancy, geocoded them and stored them in a geographic information system. Our outcome variable was birth weight. Birth weight is determined by the duration of gestation and the rate of fetal growth (Kramer 1987). Hence, to assess fetal growth considering both, we used exact birth weight percentiles by gestational age (Fenton and Sauve 2007). To perform this conversion, we used the Centers for Disease Control and Prevention growth charts (Kuczmarski et al. 2002) for term births (≥37 weeks of gestation) and Fenton’s fetal-infant growth chart (Fenton 2003) for preterm births (<37 weeks of gestation).

Table 1 Maternal characteristics distribution (n, %), mean birth weight percentile (mbwp) and standard error of the mean (se), by sample subgroups

Categories for health data were defined as suggested in the literature. Age groups were defined according to Fraser et al. (1995), education levels were defined according to Bell et al. (2007) or Parker et al. (2005), and prepregnancy body mass index and gestational weight gain were categorized according to Institute of Medicine (IOM) and National Research Council (NRC) guidelines (2009). Occupation was categorized according to the Portuguese Classification of Occupations 2010. Parity, preeclampsia, gestational diabetes, uterine bleeding and hypertension were categorized according to their occurrence (occurred or not during pregnancy), antenatal care was defined as adequate according to the National Reproductive Health Program criteria, smoking in pregnancy was defined as smoking cigarettes on a daily basis, and environmental tobacco smoke exposure was defined as being exposed to passive smoke on a daily basis during pregnancy.

2.4 Statistical analysis

For the analysis of health data, we calculated descriptive statistics and used Mann–Whitney and Kruskal–Wallis tests to compare mean birth weights between two groups and among three or more (ordered) groups, respectively. We set the significance level at 0.05. To describe spatial patterns of exposure we estimated the direct variogram, and to assess its spatial uncertainty we applied geostatistical modeling and ran a sequential simulation algorithm. To describe the impact of exposure on birth weight, we used each simulation to estimate an exposure parameter using generalized linear models (Nelder and Wedderburn 1972). Finally, we used bootstrap techniques to estimate the mean and confidence intervals of the exposure parameter from the empirical distribution of exposure parameters. We describe each of these steps below. For statistical analysis of data we used PASW 18 (Spss Inc 2010); for variogram analysis we used Variowin 2.21 (Pannatier 1998); and for direct sequential simulation we used GeoMS (Centre for Natural Resources and the Environment 2000). ArcGIS software (Environmental Systems Research Institute 2006) was used to integrate all spatial and tabular data in one work environment, to execute general procedures for data analysis and geoprocessing operations, and to visualize geographic features.

2.4.1 Spatial patterns of exposure

To describe exposure we used the complete dataset of lichen diversity values measured in the Coastal Alentejo region (n = 84) and conducted an exploratory spatial data analysis. We calculated descriptive statistics (Table 2) and estimated spatial patterns of the lichen diversity index value with the variogram function, \( \gamma (.) \), based on the method-of-moments estimator (1), where z are measured values of variable Z, N(h) represents the number of pairs of sampling sites separated by a vector h, and \( s_{\alpha } \) and \( s_{\alpha } + {\mathbf{h}} \) represent sampling sites separated by a vector h.

Table 2 Lichen diversity index value descriptive statistics
$$ \gamma \left( {\mathbf{h}} \right) = \frac{1}{{2N\left( {\mathbf{h}} \right)}}\sum\limits_{\alpha = 1}^{{N({\mathbf{h}})}} {\left[ {z (s_{\alpha } ) - z (s_{\alpha } + {\mathbf{h}} )} \right]^{2} } $$
(1)
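
As an illustration of estimator (1), the following Python sketch computes a method-of-moments variogram by grouping pairs of sites into distance classes. The function name, the pooling of all directions (an omnidirectional variogram) and the equal-width distance classes are assumptions made for this example; it is not a description of how Variowin works internally.

```python
import math


def empirical_variogram(coords, values, lag_width, n_lags):
    """Half the mean squared difference of pairs, per distance class.

    Class k (hypothetical binning) holds pairs whose separation distance
    lies in [k * lag_width, (k + 1) * lag_width).
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dist = math.dist(coords[i], coords[j])
            k = int(dist // lag_width)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    # Classes without pairs have no estimate and are reported as None.
    gamma = [s / (2 * c) if c else None for s, c in zip(sums, counts)]
    return gamma, counts


# Toy data (invented for illustration): three sites on a line.
gamma, counts = empirical_variogram(
    coords=[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    values=[1.0, 2.0, 4.0],
    lag_width=1.0,
    n_lags=3,
)
print(gamma, counts)
```

Reporting the pair counts alongside the estimates is a deliberate choice, since classes with few pairs give unstable variogram values.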

To estimate the variogram function (1) for any distance, we fitted an Exponential model (2) to variogram estimates. This theoretical model belongs to a family of permissible or valid variogram models because \( \gamma (.) \) satisfies the conditionally negative-definiteness property (16, 17).

$$ \gamma \left( {\mathbf{h}} \right) = \left\{ \begin{array}{ll} {0,} & {{\mathbf{h}} = 0} \\ {C_{0} + C_{1} \left[ {1 - \exp \left( { - {{3*\left\| {\mathbf{h}} \right\|} / a}} \right)} \right],} & {{\mathbf{h}} \ne 0} \end{array} \right. $$
(2)

In (2), \( C_{0} \) represents the nugget effect and \( C_{1} \) the partial sill, \( a \) represents the spatial range, and \( \left\| {\mathbf{h}} \right\| \) represents the distance between two locations. To find the parameter estimates that best fit our data we used both visual fit to the observed data and the Indicative Goodness of Fit statistic (Pannatier 1996).

2.4.2 Spatial uncertainty of exposure

The locations of exposure measurements and health assessments do not coincide, as illustrated in Fig. 1. To overcome this issue we used a geostatistical approach and applied the Direct Sequential Simulation algorithm (Soares 2001) to the exposure data. We assume that exposure data are spatial in nature and that the observed exposure data are a realization of a stochastic process:

$$ \left\{ {Z\left( {\mathbf{s}} \right):{\mathbf{s}} \in D} \right\} $$

where exposure variable Z at any spatial location s over spatial domain D is a continuous random variable with cumulative distribution function \( F\left( z \right) = \Pr \left\{ {Z\left( {\mathbf{s}} \right) \le z} \right\} \) and with stationary variogram, \( \gamma (.) \). Direct sequential simulation aims to provide L realizations of exposure, \( \left\{ {z^{(l)} \left( {{\mathbf{s}}_{1} } \right), \ldots ,z^{(l)} \left( {{\mathbf{s}}_{i} } \right), \ldots ,z^{(l)} \left( {{\mathbf{s}}_{n} } \right)} \right\} \), with \( l = 1, \ldots ,L \) at locations \( \left\{ {{\mathbf{s}}_{1} , \ldots ,{\mathbf{s}}_{i} , \ldots ,{\mathbf{s}}_{n} } \right\} \), reproducing histogram and spatial covariance of observed exposure data, \( z\left( {{\mathbf{s}}_{\alpha } } \right) \), with \( \alpha = 1, \ldots ,k \), through the sequential use of conditional distributions:

$$ F\left( {{\mathbf{s}}_{n} ;z_{n} |k + n - 1} \right) \cdot F\left( {{\mathbf{s}}_{n - 1} ;z_{n - 1} |k + n - 2} \right) \cdot \ldots \cdot F\left( {{\mathbf{s}}_{2} ;z_{2} |k + 1} \right) \cdot F\left( {{\mathbf{s}}_{1} ;z_{1} |k} \right) $$

where \( F\left( {{\mathbf{s}}_{n} ;z_{n} |k + n - 1} \right) \) is the conditional cumulative distribution function of \( Z\left( {{\mathbf{s}}_{n} } \right) \) given k observed data and n−1 previously simulated values \( \left\{ {z\left( {{\mathbf{s}}_{1} } \right), \ldots, z\left( {{\mathbf{s}}_{i} } \right), \ldots ,z\left( {{\mathbf{s}}_{n - 1} } \right)} \right\} \). To reproduce the histogram and spatial covariance of the observed exposure data, the direct sequential simulation algorithm proceeds as follows for each simulation l: (1) define a random path visiting all nodes \( {\mathbf{s}}_{i} \) (i = 1,…, n) to be simulated over the spatial domain D; (2) at the current node, estimate the local mean and variance of \( z\left( {{\mathbf{s}}_{i} } \right) \) with the linear simple kriging estimator and kriging variance, conditioned on the observed exposure data \( z\left( {{\mathbf{s}}_{\alpha } } \right) \) and on previously simulated values \( z^{\left( l \right)} \left( {{\mathbf{s}}_{i} } \right) \); (3) draw a value \( z^{\left( l \right)} \left( {{\mathbf{s}}_{i} } \right) \) from the histogram of observed exposure data, within the interval centered on the local mean of \( z\left( {{\mathbf{s}}_{i} } \right) \) with ± one standard deviation of \( z\left( {{\mathbf{s}}_{i} } \right) \); (4) move to the next node on the random path and repeat steps (2) and (3) until all nodes have been visited. At each step, a correction for local bias and local mean was applied, as suggested by Soares (2001). We ran a set of 100 simulations to provide a measure of spatial exposure uncertainty, since the mean and variance of the simulations at each location could be derived from the simulated values.
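
Steps (2) and (3) can be sketched for a single node as follows. This is a simplified illustration under stated assumptions, not the GeoMS implementation: the covariance is taken as the one implied by an exponential variogram without nugget, the global mean is supplied by the caller, the draw is a plain resampling of observed values inside the local interval, and the local bias correction of Soares (2001) is omitted. All function names are hypothetical.

```python
import math
import random


def exp_cov(h, sill, a):
    # Covariance implied by an exponential variogram with no nugget.
    return sill * math.exp(-3.0 * h / a)


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def simple_kriging(target, coords, values, mean, sill, a):
    """Step (2): simple kriging mean and variance at one node."""
    K = [[exp_cov(math.dist(p, q), sill, a) for q in coords] for p in coords]
    k = [exp_cov(math.dist(p, target), sill, a) for p in coords]
    w = solve(K, k)
    est = mean + sum(wi * (v - mean) for wi, v in zip(w, values))
    var = sill - sum(wi * ki for wi, ki in zip(w, k))
    return est, max(var, 0.0)


def draw_from_histogram(observed, local_mean, local_sd, rng):
    """Step (3), simplified: resample an observed value within mean ± sd."""
    window = [v for v in observed if abs(v - local_mean) <= local_sd]
    if not window:
        # Fallback (an assumption): take the closest observed value.
        return min(observed, key=lambda v: abs(v - local_mean))
    return rng.choice(window)


# Toy conditioning data (invented for illustration).
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
values = [5.0, 15.0, 25.0]
est, var = simple_kriging((4.0, 4.0), coords, values, mean=15.0, sill=95.2, a=14.0)
print(draw_from_histogram(values, est, math.sqrt(var), random.Random(0)))
```

In a full simulation, each drawn value would be appended to the conditioning data before the next node on the random path is visited, which is what makes the procedure sequential.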

2.4.3 Exposure model

In our exposure model we assumed 15 h a day spent at the place of residence and 9 h a day spent at the place of work. We did not consider commuting times because their contribution to overall exposure was negligible in the studied region. The exposure of the jth pregnant mother during the gestational period in simulation l (l = 1,…,100), \( E_{j}^{\left( l \right)} \), is calculated as a time-weighted average of the simulated values at the locations where she spent time (3).

$$ E_{j}^{\left( l \right)} = \sum\limits_{i = 1}^{{n_{j} }} {t_{j} \left( {s_{i} } \right)*z_{j}^{\left( l \right)} \left( {s_{i} } \right)} $$
(3)

In (3), \( t_{j} \left( {s_{i} } \right) \) represents the time (as a proportion of the overall time of gestation) spent by the jth mother during pregnancy at location \( s_{i} \) (\( i = 1, \ldots ,n_{j} \)), and \( z_{j}^{\left( l \right)} \left( {s_{i} } \right) \) represents the simulated lichen diversity value in simulation l for the jth mother at location \( s_{i} \).
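
A minimal sketch of Eq. (3) for one mother and one simulation, using the 15 h residence and 9 h work split stated above. The function name and the simulated values are hypothetical, chosen only to show the arithmetic.

```python
def exposure_index(time_fractions, simulated_values):
    """Time-weighted average of simulated values, as in Eq. (3)."""
    if len(time_fractions) != len(simulated_values):
        raise ValueError("one time fraction is needed per location")
    if abs(sum(time_fractions) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(t * z for t, z in zip(time_fractions, simulated_values))


# 15 h/day at the residence and 9 h/day at the workplace.
fractions = [15 / 24, 9 / 24]
# Hypothetical simulated lichen diversity values at the two locations.
simulated = [12.0, 20.0]
print(exposure_index(fractions, simulated))  # 0.625 * 12 + 0.375 * 20 = 15.0
```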

2.4.4 Generalized linear models

To analyze how birth weight is associated with air quality we applied generalized linear models (Nelder and Wedderburn 1972). To measure linear associations between each one of the health covariates and birth weight we estimated the following model:

$$ g(\mu_{j} ) = \hat{\beta }_{0} + \hat{\beta }_{z} Z_{j} ,\quad {\text{where }}j = 1 , 2 ,\ldots , {\text{ J}} $$
(4)

In model (4), \( g(\mu_{j} ) \) is the identity link function, \( \hat{\beta }_{0} , { }\hat{\beta }_{z} \) are estimated parameters and \( Z_{j} \) is the value of one covariate for the jth mother (j = 1,…,J). To evaluate goodness of fit we calculated Akaike’s Information Criterion (AIC) for each univariate model. We tested model (4) against the null model, \( g\left( {\mu_{j} } \right) = \hat{\beta }_{0} \), and used the log-likelihood ratio statistic to assess statistical significance. The results of this analysis helped us select a list of candidate covariates with an impact on birth weight for inclusion in the subsequent multivariate modeling. To measure crude associations between exposure and birth weight we used all the simulations computed previously and fitted L univariate models (5), one per simulation l (l = 1,…,L), to estimate L unadjusted exposure parameters. In model (5), \( g^{\left( l \right)} (\mu_{j} ) \) is the identity link function for simulation l (l = 1,…,L), \( \hat{\beta }_{0}^{\left( l \right)} ,\hat{\beta }_{1}^{\left( l \right)} \) are estimated parameters at simulation l, and \( E_{j}^{\left( l \right)} \) is the exposure model value for the jth mother at simulation l.

$$ g^{\left( l \right)} (\mu_{j} ) = \hat{\beta }_{0}^{\left( l \right)} + \hat{\beta }_{1}^{\left( l \right)} E_{j}^{\left( l \right)} $$
(5)
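
The repeated fitting of model (5) can be sketched as below. The sketch assumes normally distributed errors, in which case a generalized linear model with identity link reduces to ordinary least squares; the text does not state the error distribution, so this is an assumption, and the function names and toy numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def exposure_slopes(exposures_by_simulation, birth_weight_percentiles):
    """One unadjusted exposure parameter per simulation, as in model (5)."""
    return [
        fit_line(exposures, birth_weight_percentiles)[1]
        for exposures in exposures_by_simulation
    ]


# Toy data: two simulations of exposure for three mothers (invented values).
simulations = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
percentiles = [0.2, 0.4, 0.6]
print(exposure_slopes(simulations, percentiles))
```

The health outcome is the same in every fit; only the exposure column changes from one simulation to the next, so the spread of the resulting slopes reflects exposure uncertainty alone.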

We further evaluated associations between exposure and birth weight with multivariate modeling. For each simulation, we started with a null model and performed a forward stepwise procedure where exposure index was forced into the model. During the stepwise selection procedure, candidate covariates were included in the model if their p value was lower than 0.15 and excluded if their p value was higher than 0.20. We used the log-likelihood ratio test for model selection. After completion of the stepwise procedure, we fitted the following multivariate model for simulation l (l = 1,…, L):

$$ g^{\left( l \right)} (\mu_{j} ) = \hat{\beta }_{0}^{\left( l \right)} + \hat{\beta }_{1}^{\left( l \right)} E_{j}^{\left( l \right)} + {\hat{\varvec{\beta }}}_{z}^{\left( l \right)} {\mathbf{Z}}_{j} $$
(6)

where \( g^{\left( l \right)} (\mu_{j} ) \) is the identity link function for simulation l, \( \hat{\beta }_{0}^{\left( l \right)} ,\hat{\beta }_{1}^{\left( l \right)} , { }{\hat{\varvec{\beta }}}_{z}^{\left( l \right)} \) are estimated parameters at simulation l, \( E_{j}^{\left( l \right)} \) is the exposure index of the jth mother during gestation at simulation l and \( {\mathbf{Z}}_{j} \) is a vector of significant covariates included in the model.

2.4.5 Bootstrap mean and confidence intervals

Instead of estimating a mean and a confidence interval for exposure parameter by theoretical analysis, we numerically evaluated them by means of sequential simulation, using non-parametric bootstrap techniques (Efron 1988).

$$ \hat{\beta }_{1} = \frac{1}{L}\sum\limits_{l = 1}^{L} {\hat{\beta }_{1}^{\left( l \right)} } $$
(7)

From the empirical distribution of exposure parameter estimates, we calculated the bootstrap estimate of the exposure parameter (7) and set confidence intervals for \( \hat{\beta }_{1} \) by taking the interval between the \( 100 \cdot \alpha / 2 \) and \( 100 \cdot (1 - \alpha / 2) \) percentiles of the empirical distribution of \( \hat{\beta }_{1}^{\left( l \right)} \), where \( \alpha \) denotes the chosen significance level.
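
A short sketch of Eq. (7) and the percentile interval follows. The percentile rule used here (linear interpolation between order statistics) is an assumption, since the text does not specify one, and the function names are hypothetical.

```python
import math


def percentile(sorted_values, q):
    """q-th percentile (0-100) by linear interpolation between order statistics."""
    pos = (q / 100.0) * (len(sorted_values) - 1)
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def bootstrap_summary(slopes, alpha=0.05):
    """Mean of the per-simulation estimates (Eq. 7) and percentile interval."""
    ordered = sorted(slopes)
    mean = sum(ordered) / len(ordered)
    low = percentile(ordered, 100.0 * alpha / 2.0)
    high = percentile(ordered, 100.0 * (1.0 - alpha / 2.0))
    return mean, low, high


def excludes_zero(low, high):
    # The association is read as significant when the interval excludes zero.
    return low > 0.0 or high < 0.0


# Toy input: 101 evenly spaced values standing in for simulation estimates.
mean, low, high = bootstrap_summary([float(v) for v in range(101)], alpha=0.10)
print(mean, low, high, excludes_zero(low, high))
```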

3 Results

3.1 Health data

We analyzed data on 867 babies of mothers who lived in predominantly urban areas during pregnancy and participated in the GISA project. For two babies, birth weight percentiles could not be calculated because birth weight or weeks of gestation were missing; these two babies were excluded from further analysis.

The sample distribution of birth weight is symmetric (Fig. 2a); the observed mean is 3,190 g (minimum = 635 g, maximum = 4,915 g) and the observed median is 3,200 g. Median gestational age at birth is 39 weeks and the observed mean is 38.8 weeks (minimum = 25 weeks, maximum = 42 weeks). In Fig. 2b the birth weight percentile histogram shows a leptokurtic, unimodal shape and positive asymmetry, with an observed mean of 0.39 (39th percentile). The growth charts used here are based on populations with specific characteristics that differ from our sample population, which explains the positive skewness shown in Fig. 2b. In a clinical context, where birth weight percentiles are used to identify babies at risk or to provide an indication of infant growth, we should have considered another percentile chart of birth weight and gestational age more plausible for our population (less skewed). Since we use birth weight percentiles as a measure of relative birth weight to evaluate associations with air quality, this is not an issue. The main descriptive statistics on maternal characteristics are summarized in Table 1.

Fig. 2 Histogram of a birth weight and b birth weight percentile in GISA sample

Descriptive results did not show significant differences in mean birth weight percentile between categories of the socio-demographic characteristics analyzed (age at pregnancy, education or occupation). There was a significant linear trend across prepregnancy body mass index categories and across gestational weight gain categories. Health covariates referring to past obstetric factors showed significant differences in mean birth weight percentile between categories for parity and occurrence of previous low birth weight, but not for previous preterm births. No pregnancy complication showed a significant difference in mean birth weight percentile. For toxic exposures to tobacco smoke, a significant difference was found between the mean birth weight percentile of babies exposed to maternal smoking during gestation and that of babies not exposed, but not for environmental tobacco smoke (passive smoking).

3.2 Exposure data

The distribution of lichen diversity values is positively skewed, with a long right tail (Fig. 3b). The observed mean of the lichen diversity index value is 13.1 (Table 2).

Fig. 3 a Variogram estimates (black dots) and Exponential model (continuous line): the ordinate represents variogram values and the abscissa represents distance class (in km); b histogram of lichen diversity values

The geographical distribution of samples (Fig. 1) shows lower values in the central area of Coastal Alentejo than in the northern part of the region. To quantify spatial dependence we set an 8 km lag distance and estimated the variogram function based on the method-of-moments estimator. We fitted an Exponential model, frequently applied in earth and environmental sciences (Goovaerts 1997), with the following estimated parameters:

$$\gamma \left( {\mathbf{h}} \right) = \left\{ \begin{array}{ll} {0,} & {{\mathbf{h}} = 0} \\ {95.2\left[ {1 - \exp \left( { - {{3*\left\| {\mathbf{h}} \right\|} /{14}}}\right)} \right],} & {{\mathbf{h}} \ne 0} \end{array} \right.$$

i.e., a range, \( a \), of 14 km and a sill, \( C_{1} \), of 95.2, with no nugget effect (\( C_{0} = 0 \)).

This function enabled a satisfactory fit to sample variogram of lichen diversity index (Fig. 3a).
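
To make the fitted parameters concrete, the sketch below evaluates model (2) with the reported values (sill 95.2, range 14 km, no nugget). The function name and its default arguments are chosen for illustration. With the factor 3 in the exponent, the model reaches 1 − exp(−3), about 95 % of the sill, at a distance equal to the range.

```python
import math


def exponential_variogram(h, c0=0.0, c1=95.2, a=14.0):
    """Exponential model (2); h and a are in km, defaults are the fitted values."""
    if h == 0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-3.0 * h / a))


# Distances in km chosen for illustration.
for h in (0.0, 4.0, 8.0, 14.0, 28.0):
    print(h, round(exponential_variogram(h), 2))

# Share of the sill reached at the range (about 0.95).
print(exponential_variogram(14.0) / 95.2)
```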

Since the locations of exposure measurements and health assessments do not coincide, we conducted a sequential simulation with the exposure data, which provided simulated exposure measurements at health data locations. We ran 100 simulations of air quality using the direct sequential simulation algorithm; Fig. 4 shows the variograms, histograms and maps of three of the simulations generated.

Fig. 4 LDVI results for simulations 19, 79 and 95: (first row) variogram function (continuous line) fitted to variogram estimates (black dots), (middle row) histograms and (last row) map outputs

Global results showed satisfactory matches between histograms and variograms of observed data and simulations.

3.3 Exposure and health data

3.3.1 Overall sample

To measure associations between birth weight and exposure we applied generalized linear models. We began by estimating crude (or unadjusted) parameters for the health covariates on birth weight percentile (Table 3). We computed univariate models (4) for every covariate. Univariate models fitted with prepregnancy BMI, parity, previous low birth weight, gestational BMI gain, gestational diabetes and smoking during pregnancy presented statistically significant parameters. The covariates prepregnancy BMI, gestational BMI gain, gestational diabetes and smoking during pregnancy presented better (smaller) AIC goodness-of-fit measures.

Table 3 Statistical results of univariate generalized linear models fitted to health covariates

To explore crude associations between birth weight and the exposure variable we used all the simulations computed previously, and fitted 100 univariate models (5) to estimate 100 unadjusted exposure parameters on birth weight percentile. A summary of model results is shown in Fig. 5, where the empirical distribution of exposure parameter estimates provided an estimated mean parameter with a positive sign, suggesting that an increase in lichen diversity index is associated with an increase in birth weight percentile.

Fig. 5 Histogram of exposure parameter (univariate models)

We used bootstrap techniques to ascertain 99, 95 and 90 % confidence intervals (CI) for the mean estimated parameter (Table 4).

Table 4 Mean and 99, 95 and 90 % confidence intervals (lower, upper intervals) for exposure parameter

All the confidence intervals for the parameter of interest contain zero, suggesting that the association between simulated lichen diversity index values and birth weight percentile is not statistically significant at the 0.01, 0.05 and 0.1 significance levels, respectively.

3.3.2 Subgroup of babies exposed to gestational tobacco smoke

We further explored associations between birth weight percentile and exposure during pregnancy in the subgroup of babies exposed to gestational tobacco smoke (n = 128). This exploratory analysis was based on the hypothesis that babies exposed to gestational tobacco smoke are more susceptible to lower birth weights if exposed to lower levels of outdoor air quality during gestation. For this subgroup, we repeated the previous step: we used all the simulations computed and fitted 100 univariate models (5) to estimate 100 crude exposure parameters and to provide a mean and confidence intervals of the empirical distribution (Fig. 6).

Fig. 6 Histogram of exposure parameter (univariate models) in subgroup of babies exposed to gestational tobacco smoke

As in the overall sample, the empirical distribution of the parameter of interest provided a positive estimated mean parameter, suggesting that an increase in lichen diversity index is associated with an increase in birth weight percentile among babies exposed to gestational tobacco smoke. Furthermore, in this subgroup the 99 % confidence interval for the mean exposure parameter ranges from 0.0011 to 0.0123 (Table 5), suggesting that the association between exposure and birth weight percentile among babies exposed to gestational tobacco smoke is statistically significant at the 0.01 significance level. In multivariate analysis, we included prepregnancy BMI, previous low birth weight, gestational weight gain, preeclampsia and antenatal care, since these covariates were shown to be significant in this subgroup of the sample (results not shown), and performed a forward stepwise regression procedure for variable selection.

Table 5 Mean and 99 % confidence interval (lower, upper bound) for exposure parameter in subgroup of babies exposed to gestational tobacco smoke

Results (Table 5 and Fig. 7) still indicate that the 99 % confidence interval for the exposure parameter lies between 0.0002 and 0.0118, suggesting a significant association between exposure and birth weight percentile among babies exposed to gestational tobacco smoke, even after controlling for significant covariates.

Fig. 7 Histogram of exposure parameter (multivariate models) in subgroup of babies exposed to gestational tobacco smoke

4 Discussion

In this article we characterized and evaluated the association between birth weight and outdoor air quality during gestation using a lichen diversity index value as a surrogate. The World Health Organization has reported potential effects of air pollutants on birth weight (WHO European Centre for Environment and Health 2005), and several significant results concerning associations between air quality during gestation and birth weight outcomes have been reported (Wang et al. 1997; Dejmek et al. 1999; Bobak 2000; Maisonet et al. 2001; WHO European Centre for Environment and Health 2005; Rogers and Dunlop 2006; Perera 2008).

We presented geostatistical simulation on biomonitoring data as an alternative approach to assess the association between outdoor air quality and birth weight in a large region. To our knowledge this is the first time that the association between air quality and birth weight has been studied using biomonitors. Exposure measurements from ecological biomonitoring data were used as an alternative to data from traditional air quality monitoring stations, in order to increase the spatial resolution of exposure data. Even so, the improved spatial resolution was not enough to match exposure and health data locations. To overcome this problem we could have estimated a map of mean outdoor air quality using a kriging estimator, but it would not provide a measure of exposure uncertainty, since the variance of the kriging estimator depends only on the data configuration and not on the observed data. Instead, we used a geostatistical simulation algorithm that provided several equiprobable maps from which we could draw estimates of individual exposure and a measure of individual exposure uncertainty. We decided to address uncertainty using a spatial uncertainty model (instead of a local uncertainty one) to assess a joint cumulative distribution of several air quality index levels at different locations. Using this measure, the set of independent realizations provided inputs to the generalized linear models that worked as transfer functions, yielding a distribution of response values (the estimated parameters for the air quality index) representing a measure of parameter uncertainty (Goovaerts 1997).
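
The idea of passing each realization through a regression model that acts as a transfer function can be sketched as follows. This is a minimal illustration, not the analysis code of the study: it assumes a Gaussian model with identity link (which reduces to ordinary least squares), the function name `propagate_exposure_uncertainty` is hypothetical, and the "realizations" are synthetic perturbations with independent noise rather than outputs of a geostatistical simulation algorithm, which would be spatially correlated.

```python
import numpy as np


def propagate_exposure_uncertainty(realizations, covariates, outcome):
    """Fit one linear model per simulated exposure map.

    realizations: (n_sim, n_subjects) simulated exposure at each subject's location
    covariates:   (n_subjects, n_cov) adjustment variables
    outcome:      (n_subjects,) response (e.g. birth weight percentile)
    Returns the exposure coefficient from each fit, shape (n_sim,).
    """
    n_sim, n_subjects = realizations.shape
    slopes = np.empty(n_sim)
    for k in range(n_sim):
        # Design matrix: intercept, exposure from realization k, covariates.
        X = np.column_stack([np.ones(n_subjects), realizations[k], covariates])
        beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
        slopes[k] = beta[1]  # coefficient of the exposure column
    return slopes


# Synthetic data for illustration only (all values are assumptions).
rng = np.random.default_rng(0)
n_subjects, n_sim = 200, 50
true_exposure = rng.normal(50.0, 10.0, n_subjects)
covariates = rng.normal(size=(n_subjects, 2))
outcome = (0.5 * true_exposure
           + covariates @ np.array([1.0, -2.0])
           + rng.normal(0.0, 5.0, n_subjects))
# Stand-in for equiprobable simulated maps sampled at subject locations.
realizations = true_exposure + rng.normal(0.0, 3.0, size=(n_sim, n_subjects))

slopes = propagate_exposure_uncertainty(realizations, covariates, outcome)
print("mean exposure coefficient:", slopes.mean())
print("0.5th and 99.5th percentiles:", np.percentile(slopes, [0.5, 99.5]))
```

The spread of `slopes` across realizations is what represents parameter uncertainty in this scheme: each map yields one coefficient, and the collection of coefficients forms the empirical distribution that is then summarized.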

One of the main concerns about using lichen diversity to assess air quality when measuring the impact of air pollution on pregnancy outcome is that lichens tend to reflect long-term air pollution, while the impact of air pollution on pregnancy outcome is expected to derive mainly from shorter-term variations during the pregnancy period itself. We are aware that the main limitation in the use of a lichen index in health studies is that we do not know exactly what specific period of time it reflects. However, we do know that a lichen diversity index reflects the integration of the biological effects of all pollutants (even those we are not yet able to measure) over time, with more emphasis on the recent periods of atmospheric deposition. In the case of sudden air pollution episodes, lichens will either disappear or decrease in abundance, as a physiological response to the harmful effects of air pollutants. The lichen index therefore always reflects the worst pollution conditions, independently of seasonal variations or the occurrence of sudden pollution episodes.

To overcome the limitations of lichen diversity in temporal resolution, we recommend a prior analysis of available air pollution data (using existing air monitoring stations or passive sampling devices) to provide a comprehensive overview of seasonal variations and sudden pollution episodes, which helps to gain insight into the type of air pollution episodes reflected in lichen diversity data. From available data from three air monitoring stations placed in the region (data not shown), we know that neither relevant sudden air pollution episodes nor considerable seasonal variations occurred over the entire region during the observation period. Therefore we conclude that the lichen index is likely reflecting long-term pollution, and we feel comfortable assuming that the impact of air pollution on pregnancy outcome derives mainly from a steady mean air quality during the pregnancy period.

Using the whole sample, we found a positive association between air quality and birth weight. However, this association was not statistically significant. We observed crude significant associations between birth weight and prepregnancy body mass index, parity, previous low birth weight, gestational weight gain, gestational diabetes and in utero exposure to maternal smoking. The roles of maternal age at pregnancy, education and occupation in birth weight have been shown in several articles (Kramer 1987; Fraser et al. 1995; Kristensen et al. 1997; Moreira et al. 2007) and underline the importance of psychosocial factors in birth weight outcomes. Our results did not show significant associations with any of these factors, but we did observe positive associations between birth weight and older age groups, higher levels of education and intellectual workers (versus manual workers), as expected. Constitutional and nutritional factors are well-established major determinants of birth weight (Cnattingius et al. 1998; Schieve and Cogswell 2000; Baeten et al. 2001; Kabiru and Raynor 2004). We found significant positive trends in birth weight across prepregnancy body mass index categories and across gestational weight gain categories. We also found significant associations between birth weight and previous obstetric factors. Nulliparity and occurrence of previous low birth weight were negatively associated with birth weight. These results are in accordance with other reports (Kramer et al. 1990; Shah 2010). Occurrence of previous preterm births was negatively associated with birth weight; however, this trend was not statistically significant. We also analyzed some pregnancy complications shown to have an impact on birth weight (Kramer et al. 1990; Baeten et al. 2001; Kabiru and Raynor 2004). As expected, our results showed a statistically significant association between birth weight and gestational diabetes. For the remaining complications (preeclampsia, uterine bleeding and hypertension), we observed negative associations with birth weight, but without statistical significance.

Gestational exposure to maternal smoking was another determinant studied. It is known that babies exposed to gestational tobacco smoke are more prone to adverse pregnancy outcomes such as low birth weight or small for gestational age (Schieve and Cogswell 2000; Baeten et al. 2001; Moreira et al. 2007). We observed a statistically significant negative association between birth weight and exposure to gestational tobacco smoke. We further performed an exploratory analysis of the association between exposure and birth weight in the subgroup of babies exposed to gestational tobacco smoke. This exploratory analysis was aimed at generating hypotheses concerning the susceptibility of birth weight to additional levels of outdoor air pollution among babies exposed to gestational tobacco smoke. Our findings from the univariate analysis showed that an increase in air quality levels was associated with a modest but significant increase in birth weight. In the multivariate linear regression, we considered prepregnancy BMI, previous low birth weight, gestational weight gain, preeclampsia, antenatal care and air quality as explanatory variables, all known to be determinants of birth weight. Results showed, once again, a modest but significant association between exposure and birth weight among babies exposed to gestational tobacco smoke, even after controlling for significant covariates. This estimated association may have little clinical significance at the individual level, but it may encourage further research (e.g. case–control studies) on the impact on birth weight of additional exposure to lower levels of outdoor air quality among babies exposed to maternal smoking.

5 Conclusions

Semi-ecological studies are a valid research method to assess the effects of air pollution in humans. Within the framework of a semi-ecological study we assessed the association between birth weight and outdoor air quality during gestation, using a lichen diversity index value as a surrogate for exposure. Lichen data were used as an alternative to traditional air quality monitoring stations to achieve higher spatial resolution of exposure measurements. To our knowledge this is the first time that the association between air quality and birth weight has been studied using biomonitors. Instead of using birth weight as the outcome, we used exact birth weight percentiles by gestational age, which provided an optimal assessment of fetal growth. To overcome the limitation derived from misalignment between exposure measurements and health data locations, we used geostatistical simulation, which provided a measure of the exposure's spatial autocorrelation (using a variogram function), an estimate of exposure (using each simulation) and a measure of the exposure's uncertainty (using all simulation outputs). To incorporate uncertainty in our analysis we fitted generalized linear models to the simulation outputs and the health data on birth weights, and assessed the statistical significance of the exposure parameter using non-parametric bootstrap techniques. As expected from the literature review, our results showed statistically significant associations between birth weight and exposure to several known risk factors (prepregnancy body mass index, parity, previous low birth weight, gestational weight gain, gestational diabetes and in utero exposure to maternal smoking). Results also suggest a modest but significant association between birth weight percentile and exposure among babies exposed to gestational tobacco smoke. We think these results could support the hypothesis that additional exposure to low levels of air quality affects the birth weight of babies exposed to maternal smoking.