Introduction

In the last decade, transportation policy has tried out various ‘hard’ policy measures (e.g., physical improvements to transport infrastructure or operations, traffic engineering, control of road space, and changes in price), aiming to stop or at least to reduce the steady increase in car use. However, despite huge financial investments, these ‘hard’ infrastructural initiatives alone seem to have failed to deliver the shifts away from car use that were hoped for and expected (e.g., Stopher 2004).

Perhaps because of these sobering experiences, one can observe growing interest in transport policy focusing on a range of initiatives which are widely described as ‘soft’ policy measures. Typical examples of soft measures are workplace travel plans, personalised travel planning, public transport marketing, and travel awareness campaigns. In a recent review, Cairns et al. (2004) develop ‘low’ and ‘high intensity’ impact scenarios of future soft travel measure implementations. Using evaluation results on the effectiveness of these measures as input, Cairns et al. (2004) estimate in their ‘high intensity’ scenario that a consistent implementation of comprehensive soft-policy programs in the UK may reduce total traffic by up to 11%.

If this claim holds, soft travel measure programs would be a very valuable transport policy approach. Not least because of the results of the Cairns et al. review, the UK Department for Transport (2005) has decided to integrate soft travel measures as a vital part of its local transport strategy and, over the next 10 years, to invest substantial financial resources intended to motivate local authorities to implement soft travel measure programs at the regional and local level.

This underlines the high significance of research synthesis for a transport policy that tries to base its decisions on reliable empirical evidence: politicians are interested less in the results of a single evaluation study than in the general picture emerging from the currently available empirical research findings concerning their decision problem. Besides a critical evaluation of the methodological quality of these findings, they expect research synthesis to answer the following questions: On average, what impact can be expected from a specific policy strategy, and are there important factors moderating the impact of this strategy in specific contexts or for specific target groups?

To answer these questions, each reliable research synthesis has to solve two main problems. The first problem is compiling a representative sample of research results. Even the use of modern bibliographical tools and the Internet does not guarantee that the retrieved study sample is, in the end, representative of all studies conducted on a topic. For example, because negative results are frequently not publicly reported, relying only on results published in journals or as working papers may yield a biased sample of what has actually been found. A research synthesis based on an unrepresentative sample of results would mistakenly be considered to represent the ‘truth’, whereas in fact it is more a biased representation of what has been published.

The second main problem is how to condense and synthesise the compiled body of primary research findings. All the research reviews we know of on the effectiveness of soft transport policy measures use narrative techniques for this purpose: the strengths and weaknesses of the compiled quantitative evaluation results, as well as the general trend emerging from them, are analysed mainly by verbal comparison and discussion. From our viewpoint, the dominance of narrative research synthesis methods in the field is astonishing. After all, the central aim of these reviews is to adequately synthesise the trends in a body of quantitative ‘raw data’. Analysing such quantitative data by verbal comparison and discussion stands in stark contrast to the standards normally demanded in primary research, where most researchers would reject conclusions based on a narrative analysis of a body of quantitative data (e.g., Button and Kerr 1996). The reason for this rejection is that simultaneously processing many pieces of information overstrains our cognitive capabilities. Trying to describe and summarise narratively the complex relationship between a set of explanatory factors and the outcome variable of interest is therefore very susceptible to subjective biases. Using multivariate statistical tools is a more objective and reliable way of solving this task.

The other reason for our scepticism concerning the capability of the narrative approach to synthesise quantitative data adequately relates to the fact that the process by which it reaches its conclusions often remains unclear. Especially in the case of contradictory results, that is, when the compiled database indicates strong variability among the available results, the process by which these contradictory findings are weighted and integrated is frequently not transparent. As the guidelines researchers use within this process remain implicit, other researchers often have difficulty evaluating and replicating their results.

To summarise, there is growing doubt as to whether the narrative approach provides a scientifically defensible way of synthesising a body of quantitative research results.

The present research

The main goal of the present paper is to present quantitative meta-analytical methods as an alternative approach for synthesising quantitative research findings. In simple terms, meta-analysis is the “analysis of analyses” (Hunter and Schmidt 1990), that is, the use of formal statistical techniques to sum up a body of quantitative research findings.

In the field of economics, meta-analysis was first introduced in environmental economics. Florax (2002) shows that approximately 40 meta-analyses appeared between 1980 and 2001, half of them addressing the valuation of pollution and recreation, and one-third being concerned with the nexus of agriculture, land use, and natural resources. Although environmental economics has been leading the way, the concept of meta-analysis is now being picked up in other areas of economics such as labour economics (e.g., Groot and Maassen van den Brink 2000) and industrial organisation (e.g., Fuller and Hester 1998). There is also a growing number of meta-analyses on transportation economics issues such as the value of travel time and demand elasticities (e.g., Button 1995; Goodwin 1992; Wardman 2001; Waters 1995), the technical efficiency of public transit operators (Brons et al. 2005), the impact of urban traffic constraint policies (Button and Kerr 1996), and land use impacts (Button et al. 1995; Button and Rietveld 2000). However, to the best of our knowledge no study has yet tried to use meta-analytical techniques for synthesising research findings on the car reduction effect of soft policy measures.

The first section of our paper presents a short summary of the statistical model underlying the meta-analytical techniques applied later. The second section describes the aims, design, and conclusions of two reviews (Cairns et al. 2002, 2004) narratively synthesising quantitative information from 44 case studies evaluating the effect of a specific soft transport policy measure, work travel plans, on commuters’ car use. In our paper we re-analyse the information reported in these two reviews with quantitative meta-analytical techniques. This allows us to directly compare the conclusions drawn from the narrative synthesis of these data with the conclusions drawn from our meta-analytical synthesis.

We start our analysis with a demonstration of how to use a graphical tool, the funnel plot, for assessing the representativeness of the compiled database. In the second step we statistically analyse the variability of the 44 reported effect sizes and, in a third step, show how to take this variability adequately into account when estimating a pooled weighted mean effect size (ES). In the fourth step we demonstrate how to use weighted mixed-effects meta-regression to analyse quantitatively the influence of organisational, site, or work travel plan characteristics on the effect sizes reported in the evaluation studies. In the last step, we compare our meta-analytical results with the conclusions Cairns et al. (2002, 2004) drew from their narrative reviews.

The statistical model underlying meta-analysis

Taking into account the fact that the ES estimated in different primary studies on the same topic are based on different sample sizes, and hence have different levels of statistical precision, meta-analysis specifies the following statistical model:

$$T_{i} = T_{i}^{\ast} + e_{i},$$
(1)

where $T_i$ is the estimated effect found in a specific primary study, $T_i^{\ast}$ is the “true” effect (obtained if the entire target population were evaluated), and $e_i$ is the error due to estimating on a sample smaller than the population. It is assumed that $e_i$ has a mean of zero and a variance of $v_i$.

Weighting the effect of each primary study included in a meta-analysis by the inverse of $v_i$, $1/v_i$, provides an estimate of the weighted mean ES that takes into account the fact that $v_i$ varies across studies. If sampling variation is the only source of variation in the ES reported by the primary studies, weighting in this manner produces a precise estimate of the mean ES.
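
As a minimal sketch, the inverse-variance weighting described above can be computed as follows; the effect sizes and variances are hypothetical illustrations, not the study data:

```python
import math

# Hypothetical effect sizes T_i and their sampling variances v_i
effects = [0.30, 0.15, 0.25]
variances = [0.010, 0.040, 0.020]

weights = [1.0 / v for v in variances]  # w_i = 1/v_i
mean_es = sum(w * t for w, t in zip(weights, effects)) / sum(weights)
se_mean = math.sqrt(1.0 / sum(weights))  # SE of the pooled mean
```

The standard error of the pooled mean is the square root of the inverse of the summed weights; it is this quantity that later feeds the z-test on the mean ES.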

However, there are two other potential sources of variation that, if present, have to be taken into account when estimating the weighted mean ES. One source has to do with the fact that the estimates are produced over different time periods, for different population groups, in different locations, and so forth. The other source results from the fact that unmeasured or unmeasurable factors often cause additional variation in study effects. Each of these sources of variation can be identified by extending the model described by Eq. (1) in the following way:

$$T_{i}=\beta _{0}+\beta _{1}X_{1}+\beta _{2}X_{2} + \beta _{3}X_{3} + \cdots + \beta _{p}X_{p} + e_{i} + u_{i}$$
(2)

where $\beta_0$ is the model intercept, the $X$s are observed characteristics of the studies that cause variation in the true program effects $T_i^{\ast}$, the $\beta$s are coefficients representing the marginal effects of the characteristics on the true program effect, $e_i$ is the sampling error, and $u_i$ is a random error term with variance $\sigma^2$, representing unmeasured factors causing variation in program effects.

If the $\beta$s are not zero, but $u_i$ is identically zero for all studies, then the model is referred to as a fixed-effects model. For a more formal test of whether $u_i$ is identically zero for all studies, the Q-statistic can be used. This test statistic is given by $Q = \sum w_{i}(T_{i} - \beta_{0} - \beta_{1}X_{1} - \beta_{2}X_{2} - \cdots - \beta_{p}X_{p})^{2}$, where $w_{i} = 1/v_{i}$. Q is approximately distributed as chi-square with $n - p - 1$ degrees of freedom.
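
For the simplest case, a model with no moderators ($p = 0$, so the weighted mean plays the role of $\beta_0$), the Q-statistic can be sketched as follows; the numbers are illustrative, not taken from the reviews:

```python
# Hypothetical effect sizes and sampling variances
effects = [0.30, 0.15, 0.25, 0.05]
variances = [0.010, 0.040, 0.020, 0.015]
weights = [1.0 / v for v in variances]

# Fixed-effects weighted mean (the beta_0 of the intercept-only model)
mu = sum(w * t for w, t in zip(weights, effects)) / sum(weights)

# Homogeneity statistic: Q = sum of w_i * (T_i - mu)^2
Q = sum(w * (t - mu) ** 2 for w, t in zip(weights, effects))
df = len(effects) - 1  # n - p - 1 with p = 0
```

Here Q is compared against the chi-square distribution with `df` degrees of freedom (critical value 7.81 for df = 3 at the 0.05 level); a significant Q indicates that sampling error alone cannot explain the ES variability.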

If the $\beta$s are not zero and $u_i$, as well as $e_i$, varies across studies, then the model is referred to as a mixed-effects model. The weight used in estimating the mixed-effects model is the inverse of the sum of the sampling variance plus the random-effects variance, $1/(v_i + \sigma^2)$. Clearly, the fixed-effects model is a special case of the mixed-effects model.

To estimate the mixed-effects model, an estimate of $\sigma^2$ is needed (e.g., Raudenbush 1994). One procedure, based on a method of moments estimator, uses ordinary least squares (OLS) to estimate the model depicted in Eq. (2) without the random error term. Then the mean square residual from the regression is used to calculate an estimate of $\sigma^2$, based on the following formula:

$$ \sigma^{2}= {\rm MSR} - k/(n-p-1), $$
(3)

where MSR is the mean square residual from the OLS regression and k is a constant given by the following formula (see Raudenbush 1994, p. 319):

$$k = \Sigma v_{i}-{\rm trace}\,\left[{\bf X'VX(X'X)}^{-1}\right]$$
(4)

where the boldface refers to matrix notation for the matrix of the $p$ explanatory variables (the **X**) and the diagonal matrix **V** of the $n$ sampling variances (the $v_i$), and trace is the sum of the diagonal elements of the resulting matrix. After obtaining the estimate of $\sigma^2$, the model is re-estimated by weighted least squares, using $1/(\sigma^2 + v_i)$ as weights.
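
The whole estimation procedure of Eqs. (2)–(4) can be sketched in a few lines; the data below are hypothetical, with a single binary moderator, and NumPy is assumed to be available:

```python
import numpy as np

# Hypothetical data: 5 studies, one binary moderator (illustration only)
T = np.array([0.45, 0.20, 0.15, -0.05, 0.30])  # observed effect sizes
v = np.array([0.010, 0.030, 0.020, 0.040, 0.015])  # sampling variances v_i
X = np.column_stack([np.ones(len(T)), [1.0, 0.0, 1.0, 0.0, 1.0]])

n, p1 = X.shape  # p1 = p + 1, so n - p1 = n - p - 1

# Step 1: OLS fit of Eq. (2) without the random term; mean square residual
beta_ols, *_ = np.linalg.lstsq(X, T, rcond=None)
resid = T - X @ beta_ols
msr = resid @ resid / (n - p1)

# Step 2: constant k from Eq. (4), with V = diag(v_i)
V = np.diag(v)
k = v.sum() - np.trace(X.T @ V @ X @ np.linalg.inv(X.T @ X))
sigma2 = max(msr - k / (n - p1), 0.0)  # Eq. (3), truncated at zero

# Step 3: re-estimate by WLS with weights 1/(sigma^2 + v_i)
w = 1.0 / (sigma2 + v)
W = np.diag(w)
beta_mixed = np.linalg.solve(X.T @ W @ X, X.T @ W @ T)
```

The truncation at zero reflects that a variance component cannot be negative; when the method of moments yields a negative value, the mixed-effects model collapses to the fixed-effects model.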

Good textbooks are now available which describe in more detail the presented model as well as the statistical methods that can be used to estimate the model parameters (e.g., Hedges and Olkin 1985; Rosenthal 1991; Cooper and Hedges 1994; Lipsey and Wilson 2001).

The data-base

As stated above, the present paper aims at a meta-analytical re-analysis of a body of quantitative information taken from 44 case studies evaluating the effectiveness of work travel plans reported in two recently published narrative reviews (Cairns et al. 2002, 2004). Work travel plans are defined as bundles of measures put in place by an employer to encourage more sustainable travel, particularly less single-occupancy car use (e.g., Schreffler and Organizational Coaching 1996). The first review (Cairns et al. 2002) is entitled ‘Making travel plans work’ and was commissioned by the UK Department for Transport. It is the largest and most careful study so far on the car reduction effects of work travel plans in the UK context. The main goal of this review is to analyse the impact of organisational, site, and travel plan characteristics on the effectiveness of work travel plans that exemplify ‘best practice’ on many dimensions. To collect their best practice work travel plan examples, Cairns et al. (2002) used the following three-stage approach: In the first stage, a comprehensive survey by Steer Davies Gleave (2001) served as one key resource for identifying potential best practice candidates. In the second stage, a sample of best practice case studies was selected from this list of potential candidates, covering organisations from a range of sectors (health facilities, private sector companies, government organisations and local authorities), a range of locations (town centre, suburban and rural), and a range of sizes. The key criterion for including an organisation in the review was the existence of evaluation data monitoring the effectiveness of its work travel plan. Table 1 presents the list of the 21 case studies finally included in the review.
To obtain information on potential success factors for these 21 case studies, in the third stage a semi-structured interview was conducted in each included organisation to assess organisational and site characteristics as well as to compile a full list of all implemented travel plan measures.

Table 1 Before / After percentage of staff not commuting with car and the calculation of the effect size (Cohen’s h) for the 21 Work Travel Plans reported by Cairns et al. (2002)

Because the Cairns et al. (2002) review reports empirical evidence from a sample of ‘best practice’ case studies, one may ask whether its findings can be generalised to ‘normal’ travel plans. Fortunately, in a second review Cairns et al. (2004) report an additional 23 case studies evaluating the effectiveness of more ‘normal’ work travel plans. This review is entitled ‘Smarter choices: changing the way we travel’ and was also commissioned by the UK Department for Transport. Because the scope of the Cairns et al. (2004) review is much broader (it summarises evidence on the effectiveness of 11 different soft transport policy measures), it does not provide the detailed organisation, site, and intervention related information reported in the Cairns et al. (2002) review. Table 2 presents the additional 23 case studies taken from the Cairns et al. (2004) review. Because both reviews use the same before/after proportion of car-driving commuters as the effectiveness measure, we can pool the two data sets.

Table 2 Before/after percentage of staff not commuting with car and the calculation of the effect size (Cohen’s h) for the 23 Work Travel Plans reported by Cairns et al. (2004)

Main conclusions of the narrative reviews

We now briefly summarise the central conclusions that Cairns et al. (2002, 2004) draw from their narrative analysis of the compiled data. These conclusions simultaneously form the empirical hypotheses we want to test with quantitative meta-analytical techniques. Cairns et al. (2002) estimate that across the 21 reviewed case studies the introduction of a work travel plan is associated with an average reduction of 14 cars driven to work for every 100 staff. Using the median instead of the mean results in a reduction of 12 cars per 100 staff. For the additional set of 23 case studies, Cairns et al. (2004) estimate an almost identical average car use reduction. However, both reviews report strong variation in the achieved car use reduction: across all 44 studies the change in car use varies from a 52 percentage-point decrease to a 17 percentage-point increase. This variability indicates that site, organisational, or intervention characteristics may have a strong moderating effect on the ES reported by a specific work travel plan.

However, from their narrative analysis of the association between over 60 study descriptors and the ES variability, Cairns et al. (2002) draw the conclusion that very few generalisations can be made: they see little evidence that organisational features such as size, organisational type, proportion of lower-paid staff, proportion of women employed, the age of the workforce, or site location (e.g., rural vs. town centre) are strongly associated with the observed ES variability. They also see little evidence that the no-car alternatives particularly promoted by a work travel plan make a difference: some organisations achieved strong effects by focusing on a range of modes, whilst others were relatively successful by focusing on only one, such as train use or car sharing.

According to Cairns et al. (2002) the only factor that seems to matter is addressing parking. For the 13 travel plans which had addressed parking, either by restricting the number of staff entitled to park in the organisation’s car park, by introducing parking charges or by providing significant payments for giving up parking, the average car use reduction was 24%, and the median was 17%. For the eight travel plans which had not addressed parking, the average car use reduction was 10%, and the median was 9%.

If these conclusions represent an adequate summary of the quantitative information obtained from the analysed case studies, our meta-analytical results should also indicate a strong impact of parking related work travel plan measures whereas the impact of other travel plan features as well as organisational and site related features should be low.

Meta-analytical results

Assessing the representativeness of the data base

As discussed above, assessing how representative the results of the compiled case studies are for the body of actually existing studies is critical for the validity of each research synthesis. In the meta-analytical context, a graphical tool, the funnel plot, is recommended for checking whether representativeness biases are present (e.g., Light et al. 1994). A funnel plot graphically presents the bivariate distribution of sample size versus ES. If no bias is present, this plot should be shaped like a funnel with the spout pointing up, that is, with a broad spread of points for the highly variable small studies at the bottom and a decreasing spread as the sample size increases. However, the mean ES should be the same regardless of sample size: one should be able to draw a vertical line through the mean ES with the points distributed on either side of it for all sample sizes. In other words, the funnel should not be skewed.
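
A funnel plot of this kind can be constructed from the effect sizes and their standard errors; the sketch below uses hypothetical (ES, SE) pairs rather than the 44 actual studies and assumes matplotlib is available:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical (ES, SE) pairs standing in for the retrieved studies
es = [0.10, 0.25, 0.30, 0.15, 0.22, 0.35, 0.05, 0.28]
se = [0.20, 0.15, 0.10, 0.18, 0.08, 0.25, 0.22, 0.12]

precision = [1.0 / s for s in se]  # y-axis: study weight (1/SE)
w = [1.0 / s ** 2 for s in se]     # inverse-variance weights
mean_es = sum(wi * t for wi, t in zip(w, es)) / sum(w)

plt.scatter(es, precision)
plt.axvline(mean_es)  # vertical line through the pooled mean ES
plt.xlabel("Effect size (Cohen's h)")
plt.ylabel("Study weight (1/SE)")
plt.savefig("funnel.png")
```

An unbiased sample should scatter symmetrically around the vertical line, narrowing towards the top; a one-sided gap among the low-precision studies at the bottom is the typical signature of publication bias.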

Figure 1 presents the respective funnel plot of the 44 studies retrieved from the two Cairns et al. reviews. The x-axis represents Cohen’s h, the statistic we use for standardising the ES reported in the single evaluation studies (see below); the y-axis represents the study weight (1/SE), which contains the same information as the sample size. The vertical line goes through the estimated mixed-effects mean ES (see below).

Fig. 1 Funnel plot for the 44 effect sizes (ES) reported in the Cairns et al. (2002, 2004) reviews

As can be seen from Fig. 1, with the exception of three potential outliers (see below), the plot of the 44 studies shows a picture quite consistent with the expected funnel pattern. This result provides no evidence that the representativeness of the database is threatened by severe retrieval or reporting biases.

Defining and calculating the ES statistic

The key to meta-analysis is defining an ES statistic capable of presenting the quantitative findings of a set of studies in a standardised form that permits meaningful numerical comparison and analysis across the studies. In the present meta-analysis we use the before/after proportions of car-driving commuters reported by Cairns et al. (2002, 2004) as the ES measure. To prevent a negative sign of the ES, in a first step we converted these car-use proportions into their corresponding no-car-use proportions (1 minus the car-use proportion, see Tables 1 and 2). Instead of using the change in the before/after no-car-use proportions directly as the ES statistic, statisticians recommend using arcsine-transformed proportions for calculating the ES (e.g., Lipsey and Wilson 2001). Tables 1 and 2 present the arcsine-transformed before/after no-car-use proportions as well as the standardised ES statistic resulting from subtracting the transformed before proportion from the transformed after proportion (so-called Cohen’s h, Cohen 1988). In a second step the squared standard error ($SE^2 = 1/n_{\text{no-car,before}} + 1/n_{\text{no-car,after}}$) was calculated for each ES; its inverse ($w = 1/SE^2$) is used as the study weight when calculating the mean ES. One problem encountered when estimating the standard error was that the two reviews frequently provide incomplete information on the before/after survey samples. To solve this missing-value problem we multiplied the total staff number by the respective before/after proportion of staff not commuting by car for each case study. We view these numbers (see Tables 1 and 2) as the most reliable estimate of the probable sample size and used their inverse for calculating the ES variance. For the Orange (Temple Point) case study this procedure results in an estimated ES variance of 0.015329 (400 × 21% = 84; 400 × 73% = 292; 1/84 + 1/292 = 0.015329).
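
Using the figures the review reports for the Orange (Temple Point) case study (before/after no-car proportions of 21% and 73%, roughly 400 staff), Cohen's h and its sampling variance can be reproduced as follows:

```python
import math

def cohens_h(p_before: float, p_after: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    phi = lambda p: 2.0 * math.asin(math.sqrt(p))
    return phi(p_after) - phi(p_before)

# Orange (Temple Point): 21% / 73% of roughly 400 staff not commuting by car
p_before, p_after, staff = 0.21, 0.73, 400
h = cohens_h(p_before, p_after)        # roughly 1.1, as reported in Table 1

# Reconstructed sample sizes and the sampling variance of h
n_before = staff * p_before            # about 84
n_after = staff * p_after              # about 292
v = 1.0 / n_before + 1.0 / n_after     # about 0.0153
```

The arcsine transform stabilises the variance of a proportion, which is why the variance of h depends only on the two group sizes, not on the proportions themselves.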

Estimating a fixed-effects weighted mean effect size

If one assumes that sampling variation is the only source of ES variability, the above described fixed-effects model provides an adequate estimate of the mean ES. Figure 2 presents a so-called forest plot of the 21 inverse-variance weighted primary ES (black squares) taken from the Cairns et al. (2002) review. The horizontal line through each square represents the 95% confidence interval (CI) of that primary ES. The diamond in the last row of Fig. 2 represents the calculated weighted fixed-effects mean ES. Across all 21 case studies, the fixed-effects mean ES is 0.27. The z-test value of this estimate is 27.56, which exceeds the critical z-value of 1.96 (α-level 0.05). Correspondingly, the 95% CI around the mean ES (0.25 < μ < 0.29) does not include zero. Figure 3 presents the respective forest plot of the 23 ES taken from Cairns et al. (2004). Across these 23 studies the calculated fixed-effects mean ES is 0.18 (z-value = 14.42; 95% CI: 0.16 < μ < 0.21).

Fig. 2 Forest plot of the 21 ES reported in the Cairns et al. (2002) review

Fig. 3 Forest plot of the 23 ES reported in the Cairns et al. (2004) review

The difference between the mean fixed-effects ES estimated for the 21 case studies reported by Cairns et al. (2002) and the 23 case studies reported by Cairns et al. (2004) is statistically significant (Qbetween = 30.43, df = 1; p < 0.001), indicating on average a stronger ES for the 21 ‘best practice’ case studies. Pooling the two data sets results in a fixed-effects mean ES of 0.24 (z-value = 30.61; 95% CI: 0.22 < μ < 0.25).
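
The two-group heterogeneity test can be approximately reconstructed from the reported subgroup means and their 95% CIs; because the published CIs are rounded, the result only approximates the reported Qbetween of 30.43:

```python
# Subgroup estimates as reported (mean and rounded 95% CI bounds)
mean_a, ci_a = 0.27, (0.25, 0.29)  # Cairns et al. (2002), 21 studies
mean_b, ci_b = 0.18, (0.16, 0.21)  # Cairns et al. (2004), 23 studies

# Back out the standard errors as CI half-width / 1.96
se_a = (ci_a[1] - ci_a[0]) / (2 * 1.96)
se_b = (ci_b[1] - ci_b[0]) / (2 * 1.96)

# Two-group heterogeneity test: Q_between ~ chi-square with df = 1
q_between = (mean_a - mean_b) ** 2 / (se_a ** 2 + se_b ** 2)
```

With one degree of freedom the critical chi-square value at the 0.05 level is 3.84, so a value of roughly 30 is highly significant.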

Testing the homogeneity of the primary ES

Even though a simple visual inspection already indicates strong variability in the distribution of the 44 ES, we use the Q statistic for a more formal test of this assumption. As expected, for both data sets the value of the Q statistic is significant (Cairns et al. 2002: Q = 531.87, df = 20, p < 0.001; Cairns et al. 2004: Q = 196.06, df = 22, p < 0.001). Obviously, sampling variability alone cannot explain the ES heterogeneity.

In the meta-analytical context there are different approaches for identifying the sources of this between-study variability. One approach consists in using graphical tools such as box plots to identify potential outliers, that is, single ES which lie considerably above or below the mean of the total ES distribution. The logic behind this approach is that extremely strong or weak ES are probably not representative of the ‘true’ population ES.

For the ES reported in the 44 case studies, the box plot indicates three potential outliers whose ES lie more than two standard deviations above or below the mean of the ES distribution. With an ES of 1.1, the Orange (Temple Point) case study is one of these potential outliers. The Cairns et al. (2002) review reports some information which might explain this unusually high ES: the telecommunication company Orange originally began travel planning work on its two sites in North Bristol. However, during the study period a significant proportion of these staff were relocated to a new building in the centre of Bristol. Thus the strong car use reduction reported by this case study probably reflects not only the effect of the introduced work travel plan but also the effect of this relocation. With an ES of 0.77, the Bluewater shopping centre is the second potential outlier. Again the Cairns et al. (2002) review reports some features which distinguish this case study from the others: to increase its accessibility for customers, Bluewater invested £5 million in a complete remodelling of the local bus network, with major increases in services, waiting and information facilities, ticket discounts, and a month’s free travel for some staff, coupled with parking restrictions. The third potential outlier, HS Prison Birmingham, is taken from the Cairns et al. (2004) review and reports a strong negative ES of −0.64. Unfortunately, Cairns et al. (2004) report no information providing an explanation of this divergent finding.

Deleting the three potential outliers from the pooled data set substantially decreases the value of the Q-statistic (from 758.36 to 296.35); however, this value is still significant (df = 40, p < 0.001), indicating a considerable degree of remaining ES variability. Without the three outliers, the estimated fixed-effects mean ES drops to 0.19 (z-value = 24.17; 95% CI: 0.18 < μ < 0.21). An interesting additional result is that after removing the three outliers the difference between the mean ES estimated separately for the two case study sets is no longer statistically significant (Q = 0.39; df = 1; p = 0.53). Obviously, the higher mean ES reported above for the 21 ‘best practice’ case studies is not substantive but mainly reflects the impact of the three outliers.

Estimating a mixed-effects weighted mean effect size

These analyses indicate that a mixed-effects model is probably the more adequate model for estimating the weighted mean ES. As discussed above, besides sampling error, the study weight $1/(\sigma^2 + v_i)$ calculated under the mixed-effects assumption explicitly takes between-study variability into account. For the 21 ES taken from Cairns et al. (2002), the mixed-effects mean ES is 0.30 (z-value = 5.69; 95% CI: 0.20 < μ < 0.40). This estimate is significant; however, compared with the fixed-effects model the respective z-value is much lower. This result demonstrates that under conditions of significant between-study variability the application of a fixed-effects model drastically inflates the Type I error. Deleting the two potential outliers Orange and Bluewater from the Cairns et al. (2002) data set results in a mixed-effects mean ES of 0.23 (z-value = 7.48; 95% CI: 0.17 < μ < 0.29). For the 23 primary studies taken from Cairns et al. (2004), the mixed-effects mean ES is 0.18 (z-value = 4.01; 95% CI: 0.09 < μ < 0.27). Deleting the potential outlier HS Prison from this data set results in a mixed-effects mean ES of 0.22 (z-value = 5.05; 95% CI: 0.13 < μ < 0.30).
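
For the simplest case, a mean-only model, the mixed-effects estimation sketched in the model section reduces to a few lines; the effect sizes and variances below are hypothetical illustrations:

```python
import math

# Hypothetical effect sizes and sampling variances
T = [0.45, 0.20, 0.15, -0.05, 0.30]
v = [0.010, 0.030, 0.020, 0.040, 0.015]
n = len(T)

# Method-of-moments estimate of the between-study variance
t_bar = sum(T) / n
msr = sum((t - t_bar) ** 2 for t in T) / (n - 1)
k = sum(v) - sum(v) / n                # Eq. (4) with X a column of ones
sigma2 = max(msr - k / (n - 1), 0.0)   # Eq. (3), truncated at zero

# Mixed-effects pooled mean with weights 1/(sigma^2 + v_i)
w = [1.0 / (sigma2 + vi) for vi in v]
mean_me = sum(wi * t for wi, t in zip(w, T)) / sum(w)
se_me = math.sqrt(1.0 / sum(w))
z = mean_me / se_me                    # z-test against zero
```

Because every study weight now includes the same between-study component $\sigma^2$, small studies are down-weighted less severely than under the fixed-effects model, which is why the mixed-effects z-values in the text are so much smaller than their fixed-effects counterparts.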

For the pooled 44 studies, the mixed-effects mean ES is 0.24 (z-value = 6.87; 95% CI: 0.17 < μ < 0.31). Removing the three outliers results in an estimated mixed-effects mean ES of 0.23 (z-value = 9.08; 95% CI: 0.18 < μ < 0.28).

From our viewpoint, the mixed-effects ES of 0.23 provides the most adequate quantitative summary of the car reduction effect reported across the 41 case studies. For a better understanding of what an ES of 0.23 means, we have to relate it to the original metric, in our case the untransformed proportion of car-driving commuters. We therefore used the study weights calculated under the mixed-effects assumption to estimate the weighted untransformed average before proportion of car drivers. Across all 41 studies this proportion is 64 out of 100 staff. Relative to this before proportion, an ES of 0.23 indicates a decrease to 53 out of 100 staff driving to work.
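
This back-transformation from Cohen's h to the original proportion metric can be checked directly, using the reported before proportion of 64 car drivers per 100 staff and the pooled ES of 0.23:

```python
import math

def phi(p: float) -> float:
    """Arcsine transform of a proportion."""
    return 2.0 * math.asin(math.sqrt(p))

def phi_inv(x: float) -> float:
    """Inverse of the arcsine transform."""
    return math.sin(x / 2.0) ** 2

p_car_before = 0.64   # weighted average before proportion of car drivers
h = 0.23              # pooled mixed-effects ES (on the no-car scale)

# Add h on the transformed no-car scale, then map back to a proportion
p_nocar_after = phi_inv(phi(1.0 - p_car_before) + h)
p_car_after = 1.0 - p_nocar_after   # about 0.53, i.e. 53 out of 100 staff
```

Because the arcsine transform is non-linear, the same h corresponds to slightly different percentage-point changes depending on the baseline proportion, which is why the conversion has to be anchored at the weighted average before proportion.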

Exploring predictors of the ES variability

In this central section we test whether our meta-analytical results confirm the conclusions Cairns et al. (2002) draw concerning the impact of case study descriptors on the observed ES variability. Unfortunately, the Cairns et al. (2004) review does not report such information. Thus the following analyses are based only on the 19 case studies (without the outliers Orange and Bluewater) taken from Cairns et al. (2002). In a first step we critically reviewed the over 60 study descriptors reported by Cairns et al. Because many of them contain similar information, or information not central to our research question, we reduced them to a set of 21 study descriptors. In the next step we assigned these 21 descriptors to five more global study descriptor ‘packages’: the first set of three descriptors relates to the monitoring process itself (data problems, year when the monitoring process started, and duration of monitoring in months); the second set of five descriptors relates to characteristics of the organisation introducing the work travel plan (private vs. public organisation, staff size, female/male bias of staff, average income, age); the third set of four descriptors relates to characteristics of the site where the organisation is located (rural vs. urban location, proportion of staff living within 3–5 miles distance, walking access, and cycling access); and the fourth set of five descriptors relates to the measures used by the introduced work travel plan to promote employees’ use of no-car alternatives (the number of measures introduced for promoting bus/rail, cycling, walking, and car sharing, as well as the total amount of money invested in the work travel plan measures).
The last predictor set consists of four descriptors related to parking (the amount of offsite parking, from little to ample; whether less than 100% of staff are entitled to park in the organisation’s own car park; whether those entitled to park are charged on a daily or annual basis; and whether the organisation offers those entitled to park a financial incentive to give up their parking permit).

As discussed above, meta-regression is a statistical tool that can be used for a quantitative multivariate test of how a set of predictors is associated with the observed ES variability. However, in the present study the combination of a small study sample with 21 potential predictors increases the risk of collinearity problems. For this reason we use the following multi-stage approach for conducting the meta-regression analysis. In the first step we calculated the bivariate association between each of the 21 study descriptors and the weighted mixed-effects ES. The results of these bivariate analyses were later used to check the impact of collinearity on the multivariate results (signs and magnitude of coefficients). Also to reduce collinearity problems, in a second step highly correlated predictors (r > 0.60) were excluded. In the third step, a separate multiple meta-regression analysis was performed for each of the five descriptor ‘packages’, using all the study descriptors assigned to that package as predictors of ES variability. In the last step, those study descriptors for which the five separate multiple meta-regression analyses indicated a significant association with ES variability were included as predictors in the final multiple meta-regression model.
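
The collinearity screen in the second step (excluding one of each pair of predictors with r > 0.60) might be sketched as follows; the predictor matrix is simulated, and the greedy keep-first rule is our assumption for illustration, not necessarily the rule used in the study:

```python
import numpy as np

def drop_collinear(X: np.ndarray, names: list, r_max: float = 0.60) -> list:
    """Greedily drop predictors whose pairwise |r| exceeds r_max."""
    keep = list(range(X.shape[1]))
    corr = np.corrcoef(X, rowvar=False)
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            if i in keep and j in keep and abs(corr[i, j]) > r_max:
                keep.remove(j)  # keep the first of each highly correlated pair
    return [names[k] for k in keep]

# Toy predictor matrix (illustration only): x2 nearly duplicates x0
rng = np.random.default_rng(0)
x0 = rng.normal(size=30)
x1 = rng.normal(size=30)
x2 = x0 + rng.normal(scale=0.1, size=30)  # |r(x0, x2)| well above 0.60
X = np.column_stack([x0, x1, x2])

kept = drop_collinear(X, ["x0", "x1", "x2"])
```

Which member of a correlated pair is dropped is a substantive judgment call in practice; the greedy rule above simply keeps whichever predictor appears first.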

For reasons of space it is not possible to document the results of each of these steps in detail. However, because of the significance Cairns et al. attributed to parking, Table 3 presents the results of bivariate meta-regressions analysing, for each of the four parking related study descriptors separately, how it is associated with the distribution of the mixed-effects weighted ES. In the bivariate case the standardised regression coefficient (β) can be interpreted as a bivariate correlation. As can be seen from Table 3, for the two parking related study descriptors ‘amount of offsite parking’ and ‘<100% of staff entitled to parking’, meta-regression provides little evidence that they are systematically associated with the ES distribution. The bivariate correlation of parking charges with the ES distribution is r = 0.20; that is, the existence of parking charges explains 4% of the ES variability. However, this correlation is statistically insignificant. The only parking related study descriptor which shows a substantive association with ES variability is payment for giving up parking. The correlation of this study descriptor with ES variability is r = 0.55. This correlation is statistically significant and explains 30% of the ES variability. However, it could be argued that Cairns et al. (2002) do not assume that one specific parking related descriptor is of special importance, but that addressing parking in general is decisive. For a direct test of this assumption we created a new variable coded one for all case studies addressing parking and zero for all travel plans not addressing parking. As can be seen from Table 3, the correlation between this variable and ES variability is low (r = 0.09) and statistically insignificant.

Table 3 Results of weighted mixed-effects meta-regression models testing the bivariate association between four parking management measures and the work travel plan effect sizes (without the two outliers, N = 19)
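Because the standardised coefficient of a bivariate weighted regression equals the correlation r, the explained-variance shares quoted above follow directly from squaring r; a trivial check of the arithmetic:

```python
# Beta in a bivariate (meta-)regression equals the correlation r, so the
# share of ES variability explained is simply r squared.
r_charges = 0.20  # parking charges (statistically insignificant)
r_pay_alt = 0.55  # payment for giving up parking (significant)

share_charges = f"{r_charges ** 2:.0%}"  # "4%"  of ES variability
share_pay_alt = f"{r_pay_alt ** 2:.0%}"  # "30%" of ES variability
```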

Conducting similar analyses for the other four study descriptor ‘packages’ reduces the set of potentially influential study descriptors from 21 to 10. Table 4 presents the results of the final mixed-effects multivariate meta-regression analysis: Of the 10 predictors statistically significant in the five separate setwise regression analyses, only five remain significant: cycling access, female bias of staff, organisation type, duration of the monitoring process, and incentive payment for giving up parking. More detailed analyses show that work travel plans implemented on sites with poor or average cycling access have stronger ES than travel plans implemented on sites with good or excellent cycling access. Therefore, in Table 4 cycling access was dichotomised, with zero for good/excellent and one for poor/average cycling access. All predictors have positive signs; that is, higher values are associated with greater ES. This means that travel plans implemented on a site with poor/average cycling access, in a public organisation with a proportion of female staff above 70%, with a longer monitoring process, and with payment for giving up parking report the strongest ES. Together the five descriptors explain 73% of the ES variability.

Table 4 Results of the final mixed-effects meta-regression model for the Cairns et al. (2002) data set without the two outliers Orange Temple Point and Bluewater, N = 19

Discussion and conclusion

The goal of our paper is to present meta-analysis as an alternative, statistical approach to synthesising a body of quantitative research findings. To demonstrate this approach we used the information from 44 case studies on the impact of work travel plans on commuters’ car use recently summarised by two narrative reviews (Cairns et al. 2002, 2004). We decided to re-analyse this data set because doing so allows us to compare the conclusions Cairns et al. draw from their narrative synthesis with our meta-analytical results. To prevent misunderstandings: the goal of our paper is not to blame these authors for doing a bad job. On the contrary, the Cairns et al. (2002) review in particular is one of the best narrative reviews we know of on the soft policy topic. These authors have done an excellent job in retrieving and compiling a representative sample of the empirical information on the effects of work travel plans available at that time. The funnel plot we used to check more objectively the representativeness of the data set compiled by Cairns et al. (2002, 2004) provides little evidence that the representativeness of these data is threatened by severe reporting and retrieval biases. These results provide further evidence of the effort and care Cairns et al. invested in compiling a representative data base. However, the funnel plot also indicates that three case studies are better treated as outliers; that is, the results reported by these case studies seem to be influenced so heavily by unique external circumstances that they are probably not representative of the effect one could expect on average from the introduction of a work travel plan.

Thus our main concern relates to the decision of Cairns et al. to use a narrative approach for analysing and synthesising the compiled body of quantitative research findings. From our viewpoint, quantitative meta-analytical techniques provide a more systematic, transparent and powerful alternative for this task. One great advantage of meta-analysis is the formal statistical model it provides for defining, analysing, and treating the heterogeneity of the compiled research results. Taking heterogeneity into account is critical for adequately estimating the general trend of the data as well as for identifying factors associated with the observed variability of the reported research findings. The statistical model described above discriminates three potential heterogeneity sources: heterogeneity due to sampling error, heterogeneity due to systematic between-study differences, and heterogeneity due to the influence of unmeasured or unmeasurable variables. For the Cairns et al. data set our analyses indicate that all three heterogeneity sources are present.

To demonstrate how taking these three heterogeneity sources into account affects the estimated general trend of a data set, we calculated the average unweighted car reduction effect, the fixed-effects weighted car reduction effect, and the mixed-effects weighted car reduction effect across the 41 case studies (without the three outliers). Using the unweighted mean results in an average 10%-point car use reduction, from 65 per 100 staff before to 55 per 100 staff after. Using the fixed-effects approach results in an average 9%-point car use reduction, from 58 per 100 staff before to 49 per 100 staff after. Using the mixed-effects approach results in an average 11%-point car use reduction, from 64 per 100 staff before to 53 per 100 staff after. From a methodological standpoint the mixed-effects approach provides the most defensible estimate of the average car reduction effect across the 41 studies.
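The three pooling rules differ only in the study weights they apply. A minimal sketch, using hypothetical effect sizes, sampling variances and between-study variance tau² rather than the actual 41-study data:

```python
def pooled_means(es, v, tau2):
    """Pooled mean ES under three rules: unweighted, fixed-effects
    (w_i = 1/v_i) and mixed/random-effects (w_i = 1/(v_i + tau2))."""
    unweighted = sum(es) / len(es)
    w_fix = [1.0 / vi for vi in v]
    fixed = sum(wi * ei for wi, ei in zip(w_fix, es)) / sum(w_fix)
    w_mix = [1.0 / (vi + tau2) for vi in v]
    mixed = sum(wi * ei for wi, ei in zip(w_mix, es)) / sum(w_mix)
    return unweighted, fixed, mixed

# Hypothetical two-study example (effect sizes, sampling variances, tau^2):
u, f, m = pooled_means(es=[0.10, 0.30], v=[0.01, 0.04], tau2=0.01)
# u = 0.20; f = 0.14 (the more precise study dominates); m ≈ 0.157
```

Adding tau² flattens the weights, so the mixed-effects mean sits between the unweighted and the fixed-effects estimate, which mirrors the pattern of the three averages reported above.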

Using meta-analysis also allows us to test statistically the probability that the estimated mean effect reflects only random fluctuations, as well as how precise the estimated mean effect is. For our data set the analyses indicate a probability of less than 1% that the car use reduction observed after the introduction of work travel plans reflects only random fluctuation. However, because of the small study sample, the precision of the estimated average effect is low: with a probability of 95%, the ‘true’ average effect of work travel plans lies somewhere between a 9 and a 15%-point car use reduction per 100 staff.
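Both quantities follow from the pooled mean and its standard error under the usual normal approximation. The numbers below are hypothetical stand-ins of the order of magnitude reported in the text, not the actual estimates:

```python
def z_test_ci(mean_es, se, z95=1.96):
    """Two-sided z statistic and 95% confidence interval for a pooled
    mean effect size under the normal approximation."""
    z = mean_es / se
    return z, (mean_es - z95 * se, mean_es + z95 * se)

# Hypothetical inputs: a pooled ES of 0.11 (an 11 %-point reduction)
# with an assumed standard error of 0.015
z, (lo, hi) = z_test_ci(0.11, 0.015)
# z ≈ 7.33, so p << 0.01; the 95% CI runs from about 0.081 to 0.139
```

A small study sample inflates the standard error, which directly widens the interval; this is why the precision of the estimate reported above is low.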

The second main advantage of meta-analysis is that it allows multivariate statistical tools to be used for assessing the association between study descriptors and the heterogeneity of the reported weighted ES. In the present case, the application of this tool leads to different conclusions from those drawn by Cairns et al. (2002) on the basis of their narrative synthesis: These authors view parking as the factor most strongly associated with the ES variability. According to their analysis, work travel plans which addressed parking, either by restricting the number of staff entitled to park in the organisation’s car park, by introducing parking charges or by providing significant payments for giving up parking, on average result in a much stronger car reduction effect than travel plans not addressing parking. Our meta-analytical results provide only limited support for this conclusion: The parking related study descriptors ‘restricting the number of staff entitled to park in the organisation’s car park’, ‘introducing parking charges’ and ‘amount of offsite parking’ are not significantly associated with the ES variability. The only parking related study descriptor substantively associated with the observed ES variability is ‘offering payments for giving up parking’. To summarise, our results indicate that not parking restraints but only positive financial incentives for giving up parking voluntarily are significantly associated with the degree of car use reduction achieved by the 44 analysed work travel plans.

From their narrative data synthesis Cairns et al. (2002) also conclude that there is little evidence of a strong and consistent association between organisational and site characteristics and the observed ES variability. Again, our meta-analytical results paint a different picture: Besides incentives for giving up parking voluntarily, the private or public nature of an organisation, a high proportion of females on the staff, and poor/average cycling access to the site are significantly associated with stronger work travel plan effects. We can only speculate about the reasons behind these results. It may be that the staff of public organisations producing public goods like health and education also feel a stronger commitment to the goal of reducing the negative environmental and health related impacts of car use. The gender effect may reflect that, if an acceptable alternative is available, females are more willing than males to give up their claim to use the family car. At first sight the positive association between poor/average cycling access and stronger work travel plan effects is difficult to understand. However, a possible explanation may be that on these sites the introduction of a work travel plan offers an attractive non-car commuting alternative to those who used to feel uncomfortable using the car but do not regard cycling as an acceptable alternative. Furthermore, our results indicate that features of the monitoring process itself have an influence on the reported work travel plan effects: On average, case studies with a longer monitoring interval report stronger ES. Together the five significant study descriptors explain 73% of the ES variability reported in the Cairns et al. (2002) review.

From our viewpoint the results of the meta-regression provide a valuable tool for practical planning purposes: When a planner is confronted with the problem of how to allocate limited resources optimally among potential work travel plan candidates, she/he can directly use the estimated regression parameters to identify the sites with the highest car use reduction potential. For example, introducing the respective parameters of the University of Bristol into the regression equation ES = −0.0894 + (0.0921 × poor cycling) + (0.0963 × female) + (0.1095 × type) + (0.0024 × month) + (0.1044 × payalt) results in an estimated ES of 0.22, whereas for Plymouth Hospital the estimated ES is 0.50. From our viewpoint such a quantitative equation is the most practical result practitioners can expect from research synthesis.
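This planning use can be sketched directly from the published coefficients. The predictor values in the example below are hypothetical; the text does not list the actual inputs for Bristol or Plymouth:

```python
def predicted_es(poor_cycling, female, public, months, pay_alt):
    """ES predicted by the final meta-regression model, using the
    coefficients reported in the text."""
    return (-0.0894
            + 0.0921 * poor_cycling  # 1 = poor/average cycling access
            + 0.0963 * female        # 1 = proportion of female staff > 70%
            + 0.1095 * public        # 1 = public organisation
            + 0.0024 * months        # duration of monitoring in months
            + 0.1044 * pay_alt)      # 1 = payment for giving up parking

# Hypothetical site: all four dummies set, 12 months of monitoring
es = predicted_es(1, 1, 1, 12, 1)
# -0.0894 + 0.0921 + 0.0963 + 0.1095 + 0.0288 + 0.1044 = 0.3417
```

A planner could evaluate this function once per candidate site and rank the sites by predicted ES.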

However, we also have to mention some weaknesses of the present meta-analysis: In particular, the meta-regression results are based on a very small sample of 19 ‘best practice’ case studies. Thus the regression results reflect the fitting of a linear equation to this idiosyncratic small data set and not the test of a priori formulated hypotheses. Future research has to check whether these results can be replicated with a new, larger data set. A second concern relates to the fact that, because of missing information, we were forced to use proxies for the actual sample sizes on which the reported before–after car use proportions are based. How strongly the use of proxies influences the calculated study weights also remains open.

However, our most fundamental concern relates to the question whether the estimated weighted mean effect sizes can be interpreted as reflecting the causal impact of work travel plans. All 44 evaluation results reported by Cairns et al. (2002, 2004) are based on a weak quasi-experimental evaluation design, namely the one-group pre-post-test design. The inability of this design to exclude the influence of history, maturation, testing, mortality and regression effects as alternative explanations of the observed car use reduction (e.g., Shadish et al. 2002) leaves open the question whether, and if so how much of, this change can be attributed to the causal impact of the implemented work travel plans. Thus one has to be cautious in interpreting the available weak quasi-experimental evidence: The evaluation results may underestimate, but more probably overestimate, the true causal car use reduction effect of work travel plans. The most important task of future research on the effectiveness of work travel plans therefore consists in conducting evaluation studies which use more powerful true experimental designs. From our viewpoint, site-based randomised control-group post-test designs provide a viable alternative to the currently dominant one-group pre-post-test design.

However, the mentioned weaknesses do not relate to the meta-analytical approach of research synthesis itself but to the quality of the available research findings. Research synthesis cannot compensate for poor data quality.

Sebastian Bamberg

is assistant professor (Privatdozent) in the Department of Psychology at the Justus-Liebig-Universität Gießen, Germany. His main research interests are the application of social-psychological action theories to the explanation of pro-environmental behaviors.

Guido Moeser

is a doctoral candidate in the Department for Social and Cultural Sciences at the Justus-Liebig University Gießen, Germany. His principal research interests include research synthesis, statistics and transportation. His Ph.D. thesis (2006) addressed practical applications of research synthesis in the domain of transport policy.