Introduction

The speed at which cyclists travel is a direct consequence of cycling behavior, which is in contrast to public transport, where travel speed is beyond the control of the traveller. As opposed to driving a car, where the upper and lower ranges of travel speeds are restricted by limits, cycling speeds are less regulated. Cycling speed is restricted primarily by the physical condition of the cyclist and varies with the effort put into it. The environment (e.g. weather, wind, slipstreams, road surface and topology) and to some extent bicycle technology are factors that influence the degree to which physical power translates to speed. In addition, safety motives in relation to road infrastructure and volume of traffic influence the speed at which cyclists choose to travel.

These considerations suggest that cycling speed varies greatly across individuals, regions, time of year, road sections and the bicycle itself. Variation in speed within and across studies is documented in Allen et al. (1998).

Cycling speeds influence travel times and therefore play an important role in transport model systems. Route choice models are typically based on the algorithm “choose the fastest available cycling route”. Thus, knowledge about how speed varies across network links is of crucial information in route choice modelling. Cycling speed is also important knowledge in travel mode choice modelling, since travel time savings or losses of the alternative “cycle” influences the choice of whether to cycle or not. All other things equal, a transport model will predict a higher market share of cycling where it can take place at higher speeds.

Yet many transport models (such the Norwegian regional transport models; Madslien et al. 2005) impose a constant cycling speed when route choice is being modelled, thus making travel distance the sole determinant. Some models have route choice dependent on the characteristics of road infrastructure. For instance, Menghini et al. (2010) and Ehrgott et al. (2012) have established route choice models where qualitative information about the road network is among the generalized cost functions underlying the behavioral model. In this case, behavioral changes in route choice due to qualitative improvements in the road network (e.g. separating pedestrians from cyclists) can be predicted. Such route choice models circumvent the need for separated speed models by directly modelling the effect of network characteristics on cyclist behavior without making cycling speed explicit. However, speed models are needed when one wants to predict the effect of changes in the road network on cycling speeds and travel times savings. Information about the latter could be crucial for economic appraisal studies, where travel time changes enter the economic accounting of the user benefits of road infrastructure projects. Applications to economic appraisal, which has a central role in Norwegian transport planning, was a major motivation for the establishing of the speed models presented in this paper.

One reason why so few transport models vary cycling speed across network links is likely a scarcity of comprehensive speed models in the scientific literature. El-Geneidy et al. (2007) were probably the first to study a comprehensive model where speed is described as a function of network link characteristics. Some significant results were identified (see later discussion in “Discussion” section) despite the low sample size of the study. To our knowledge, El-Geneidy et al.’s speed model was never implemented in a transport model. In fact, it seems that most behavior cycle models in which network link information is used within a transport model system are route choice models that do not have an underlying speed model. Models by Menghini et al. (2010) or Ehrgott et al. (2012) use link length (not link speed or link travel time) as one of the explanatory variables in their route choice models.Footnote 1 An alternative approach, where link attributes are first translated into link speed by means of a speed model and then consecutively used in route choice modelling, can be found in Ziemke et al. (2017), whose speed model, however, is limited and not estimated on empirical data.

In this paper, we present speed models (one for bicycle and one for e-bike) describing speed on a network link as a function of several characteristics of infrastructure and topology, as well as the user segment defined by gender, trip purpose and type of bicycle. We regard our model as the most comprehensive cycling speed model in the literature.

Besides its scope, a strength of the model is that it is estimated empirically based on revealed preference data. Thus, the model mirrors real world behavior and is therefore an empirical model as opposed to (1) pure theoretical models that are based on physical laws (see, e.g., Parkin and Rotheram (2010, p. 4) and (2) speed models that use given/assumed parameter values (as in Ziemke et al. 2017).

Another contribution of the paper is the speed modelling of e-bikes utilizing a data set with over 12,000 registered e-bike trips. In only a few previous studies (Schleinitz et al. 2017; Dozza et al. 2016) have separate models for e-bikes been estimated.

Our data were collected in Oslo, which has about 650 000 inhabitants and a GDP per capita of about 100,000 euros. Oslo is one of the wealthiest cities in Europe.Footnote 2 The mode share of cycling for all trips in Oslo was 4% in 2012 (URBANET 2013), and although this has increased in recent years, it is still lower than in most other European capitals. Barriers to cycling are long and cold winters, the hilly topology of the city and a lack of separate cycle lanes and pathways. The latter has recently received much attention in the political debate, resulting in city authorities allocating significant resources to improve cycling infrastructure.

The paper is structured as follows: the data are described in second section, the statistical model in third section, parameter estimates in fourth section, and calibration and implementation of the model in fifth section. Sixth section is a discussion of (a) methodological issues and possible future improvements, (b) how our results compare with the earlier literature and what policy implications can be drawn. Seventh section concludes.

Data

Data collection

Cycling speed, the dependent variable of the model, is measured on the basis of GPS observations. This method is gaining in popularity in cycling research (Menghini et al. 2010; Winters and Teschke 2010; Broach et al. 2012; Dill et al. 2014) and in studies with the primary objective of measuring cycling speed (El-Geneidy et al. 2007; Dozza et al. 2016; Schleinitz et al. 2017).

The GPS observations in this study were recorded by means of the commercial mobile application Sense.Dat (www.dat.nl) downloaded and taken into use by the respondents.

The original data collection was initiated by Fyhri et al. (2016), along with respondents recruited from two samples: (a) 1000 persons who had applied to take part in an e-bike subvention program in OsloFootnote 3 and (b) a sample of 10,000 cyclists from Oslo drawn from a bike insurance register. Data collection was carried out over three rounds. In the first (carried out in January 2016), respondents were asked if they wanted to use the app to record all their travel at a later point. Information about the program and login was sent via email. The second round was an intermediate questionnaire survey for those who opted out of using the app (conducted in April). The final round was carried out from May to June 2016, and consisted of the app data collection and a questionnaire survey (completed between 26 May and 3 June).

A total of 3132 participants (890 and 2242 from the first and second sample, respectively) responded to the first survey. Of these, 1376 participants (376 and 1000) agreed to register their travel with the app, and were sent the information regarding its download and use, as well as a unique user ID for login. Not all those invited took the app into use, so the final sample of app users was 721 (161 vs. 560, respectively) participants.

Table 1 presents an overview of relevant background variables for the participating app users, and compares these with data from the total sample of cycle owners in Oslo, from which the current sample is partially drawn (Fyhri et al. 2016).

Table 1 Background characteristics of participating app users and the bicycle owners in Oslo

Study participants differ in several respects from the broader population of bicycle owners: fewer of them are female and the employment rate among them is higher.

Registration of cycling information and geographical mapping

The app automatically recognized transport mode from the speed and motion pattern. According to the developers, the accuracy of recognition of transport mode is 90%. The app does not distinguish between bicycle and e-bike, however. Identification of the type of bicycle used was therefore based on the purchase date of the e-bike. All bike trips registered after the purchase date are considered e-bike trips.

The purpose of trips was given automatically by the app based on re-occurring places and trips. The user can state the purpose of the trip or correct the guess, in this way future trips being more accurate. However, the distribution of trip purposes in our data was substantially different from traditional survey data, indicating that the automatic registration of trip-purposes in Sense.Dat is not very reliable (see also “Discussion” section).

The app automatically registers when a trip is completed and stores a trip ID to single GPS observations. After data cleaning, the data set consists of the 48,633 cycling trips—36,447 by bike and 12,186 e-bike trips—made by 709 persons identified by a user ID in the mobile application. The number of registered cycle trips per person in the data collection period (1 April to 31 June 2016) ranges from 1 to 411.

The mobile application maps raw GPS observation on Open Street-Map in an initial data cleaning process. This procedure removes noise from the GPS observations by forcing them onto the network links. For the scope of our study, we regard that possible effects/errors of this procedure to be uniform over the data set. We therefore decided to use the map-matched time and location stamps for our speed calculation rather than the one’s from the raw GPS-observations.

For modelling building we used the official road network of the Public Roads administration of Norway (the national road database NVDBFootnote 4). The reason for this is the need to have link information that is consistent with the network used in Norwegian transport models. Therefore, we projected GPS observations on the Oslo region NVDB network,Footnote 5 which consists of close to 50,000 links (see Fig. 1).

Fig. 1
figure 1

The Oslo road network (NVDB)

Projections result in the attachment of link IDs onto single GPS observations,Footnote 6 which previously were marked with a user and trip ID as well as time and location stamps.

Calculation of speed at a link level

Based on this information, we calculate the average speed \(S_{l}\) on a network link l for a given cycling trip m. The following equation is used (indices for trips are suppressed).

$$S_{l} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} d_{li} }}{{\mathop \sum \nolimits_{i = 1}^{n} t_{li} }}$$
(1)

where i = 1 is the first GPS observation for a given trip on link l, i = n is the last GPS observation for a given trip on link l, dli is the distance (in meters) between observation i and i − 1 identified by the related location stamps; we set \(d_{l1} = 0\), \(t_{li}\) is the duration (in seconds) between observation i and i − 1 identified by the related time stamps.Footnote 7

Thus, only GPS observations mapped on a link are used to calculate speed on that link; speeds measured at link transitions are not included. Because speed may decrease at link transitions, due to braking and acceleration, we measure speed on a link level as higher than on a trip level. We therefore need to calibrate the speed model (see “Calibration and implementation” section).

Alternatively, we could have included distance and duration between the first observation of link l and the last observation on link l − 1 (or between the first observation on l + 1 and the last of link l). This would have meant that the speed measurements were more continuous. However, it would also have led to inconsistency in the concept structures of the model, as the estimated effects of characteristics of link l would partly estimate speed associated with link l − 1. This was deemed problematic, especially when l is short or when the last observation on l − 1 is far from the link transition. Note that after projection on the applied network there are links without any (valid) GPS observation, which means that some speed measures would have been based on speed on l − 2, l − 3 etc.

Besides the need to calibrate the speed model, the approach chosen has implications for the total sample size of the estimation model. That is because trip-link pairs with only one valid GPS observation could not be used for estimation (see next section).

Statistical model

We specify the same statistical model for both bicycle and e-bike, but estimate two sets of coefficients by means of separate model runs.

The dependent variable in the model is average speed on link l calculated by Eq. (1). Speed is specified by a function of the user segment, defined by gender g and trip purpose p, link characteristic \(X_{k,l}\) and related coefficients \(\beta_{k}\), as well as an error term \(\varepsilon\).

$$S_{l,g,p} = e^{{\beta_{o} + \beta_{men} * D_{g = 1} + \beta_{work} * D_{p = 1} + \mathop \sum \nolimits_{k} \left( {\beta_{k} *X_{k,l} } \right) + \varepsilon }}$$
(2)

where \(\beta_{o}\) is the constant term, \(D_{g = 1}\) a dummy variable that equals 1 if cyclist is male, \(D_{p = 1}\) a dummy that equals 1 if the cycle trip is registered as a work-related trip. \(X_{k,l}\) is a set of variables describing characteristics of network link l.\(\varepsilon\) are independent and identically (IID) normally distributed error terms.

The main argument for modelling speed with an exponential function is that it allows calibration of the model at a later stage with respect to only a single parameter and without the need to scale the beta-coefficients. After log transformation, the linear regression model can be estimated using the least-squares method in standard statistics programs (here SPSSFootnote 8).

We weight observations by link length (meters), putting higher weights on longer links. As a direct consequence, coefficient estimates are affected more by longer links where speed measures are presumably more robust and where the relationship between link characteristics and link speed is profound.Footnote 9 Weighting also makes sense from a model application point of view, as the speed model should predict average speed (travel times) for entire cycle trips. The model should therefore have a relatively better fit for long links that contribute more to the calculation of travel times on a trip level. To avoid bias in coefficient estimates for explanatory variables that are likely to depend on link length, we applied interaction effects. In final model specification, this was done for the effect of T and X crossings, for which the relative impact of braking and acceleration is likely to depend on link length. Technically, we have simply specified 12 dummy variables based on combinations of the dummy variables for type of crossing and dummy variables for the link length groups.

Having a large database at hand, we decided to exclude from the estimation observations with presumed low data quality. As already mentioned (in “Statistical model” section), the applied methodology of speed calculation at a link level required that we discarded link-trips pairs with only one GPS observation per link. Furthermore, we decided to drop all observations where we did not observe at least 75% of the link length with GPS observations. A few links in our network had an unreasonably high net gradient attached.Footnote 10 We excluded all gradients below − 20% and above 20%, and also observations with unreasonably high (more than 60 km/h) and low speeds (below 5 km/h) on a link. The former was likely due to data issues or to a wrong registration of transport mode, while among the latter cases there were presumably many trips when the cycle was walked (rather than ridden). Finally, after some testing, we excluded very short links (below 10 m) from the estimation, because the data quality, in particular the measurement of gradients, was regarded as inferior for short links.

The applied exclusion criteria are summarized in Table 2.

Table 2 Applied criteria for exclusion of GPS-observations that could be mapped on the NVDB-network (original N = 2,848,560)

These exclusions of unreasonable and “low quality” observations led to clear improvement in the goodness-of-fit in the estimation model. As t-statistics of the estimated parameters remained very high in general, the reduction in the number of observations was not regarded as a problem from a statistical perspective. Most importantly, the parameter estimates became more reasonable (comparing parameter sign and order with prior expectations) after the exclusion.

The following link characteristics are used in the model:

  • Net gradient specified as 18 different dummy variables for ranges of gradients (the level “0–1% gradient” is normalized).

  • Average net gradient of inbound (i.e. the preceding) links (continuous variable).Footnote 11

  • Horizontal curvature (continuous variable).

  • Type of road specified by four dummy variables: cycling path, cycling lane, walk/cycle path and remaining roads (the latter is normalized).

  • Type of crossing specified as 12 dummies depending on (1) T or X cross, (2) at start or end of link, (3) short, middle or long link.

  • Main cycling route alongside major roads in and around the city (one dummy variable).

  • Proxy for traffic density/safety concerns specified as four dummies depending on whether link is in city center and whether the road has a 30 km/h speed limit for cars.Footnote 12

A more detailed overview of the explanatory variables—presented as descriptive statistics—and a short note about their technical coding is given in “Appendix”. Table 5 in the “Appendix” also gives the a-prior expectation of the parameter signs.

Parameter estimation

Because of log transformation of the estimation model, the beta-coefficients (besides the constant term) are interpreted as percentage change given marginal change in the explanatory variable.

Table 3 reports on the estimation results for the bicycle and the e-bike model.

Table 3 Estimation results

The parameter estimate \(\beta_{o}\) lets us calculate the average speeds (prior calibration) given that all explanatory variables are zero. This would be for the reference group of all dummy variables, i.e. Non-city center road with car speed limit > 30 km/h, other types of road with gradient 0 to 1%. We can see that e-bikes have a higher speed and, looking at the confidence intervals, this is clearly significant.

All other things being equal, males cycle 13.0% faster compared to women, while the effect of gender on e-biking is just 4.9% (see “Discussion” section).

Cycling on work-related trips seems to be at higher speeds. As mentioned in “Data” section, there is uncertainty about the reliability of which trip purpose is identified in the mobile application. However, work-related trips are highly significant; they are stable between bicycle and e-bike and follow a priori expectation. We are therefore reassured that the identification of work-related trips worked reasonably well.

The net gradient of the link is one of the most important factors in explaining variation in speed over links. For positive gradients (cycling uphill) there is, as expected, a monotone (and roughly linear) decrease in speed with increasing gradient. With a gradient above 9%, participants cycle 42.7% slower than on links with or without a marginal gradient (0–1%). As expected, the decrease is lower for e-bikes (32.3%).

The effect of the gradient for downhill cycling is non-monotonic. The highest speeds are on average estimated on gradients between 5 and 6%, while speed seems to decrease above 6%. This is probably due to braking motivated by safety concerns. Speed in relation to gradient for bicycle and e-bike is given in Fig. 2, where the values are averages over segments and already calibrated to fit speed on trip level (see next section).

Fig. 2
figure 2

e-Bike and bicycle speed given different gradients

The coefficient continuously measured gradient of inbound links is not readily comparable to coefficients for the dummy variables on gradients on the actual link.Footnote 13 To get the marginal effect of a percentage point increase in gradient of the inbound link, the coefficient estimates of − 0.3936 and − 0.2946 are divided by 100. Interpretation is that an increase in average gradient of inbound links lowers speed for bicycle and e-bike by 0.39 and 0.29%, respectively. Thus, for gradient on the actual link, the effect on speed is lower for e-bike. This highlights how additional power from e-bikes helps to maintain speed when cycling uphill.

Horizontal curvature is measured by an index variable that equals 0 if the link is a straight line and 1 when link length is twice the air distance. The curvature effect on speed going from the latter to the former is estimated at − 22.3% bicycle and − 19.5% e-bike.

Estimates for type of cycling infrastructure are relative to the category “other type of roads”, which entails streets not facilitated for cycling. We can see that highest speeds are estimated for cycling paths, where the cyclist is separated from cars and pedestrians. The relative effects compared to the category “other type of roads” are 10.6% (bicycle) and 12.3% (e-bike).

By comparison, there is a significantly lower effect for cycle lanes marked on roads otherwise used by cars. Here, speeds are estimated to be (just) 8.2 and 7.4% high. On paths shared by pedestrians and cyclists, cycling speed is 6.1% increased for bicycle compared to category “other type of roads”. Interestingly, the effect is significantly higher (8.5%) for e-bikes.

For estimates relating to crossings, there are three general effects: (1) speeds are lower on trips with crossings, (2) speed decreases for X-crossings are higher than for T-crossings, and (3) the effect of crossings is greater for short links and small for long trips. All findings follow a priori expectation. There are no clear effects related to a crossing at the start versus end of a link. Also, the differences between bicycle and e-bike are ambiguous. As discussed in “Discussion” section, estimates for crossings may be underestimated due to the way average trip speeds are measured at link level.

The variable “main cycling route” encompasses three main cycling routes into and around the inner city of Oslo. These routes are often used for longer distance cycling. The infrastructure is mixed in terms of types of road (separated cycling path, cycling lanes and walk/cycle path), but in general of good standard. The estimated relative effects on speed are 11.4% (bicycle) and 9.5% (e-bike).

The last set of explanatory variables (in Table 3) is of dummy variables for city center and car speed limit on the link. Car traffic is often regulated to 30 km/h and below for streets with a high density of pedestrians and/or cyclists. Furthermore, since the density of cyclists and pedestrians is generally higher in the city center, these variables may therefore function as a rough proxy for traffic density and related safety motives. Studying the estimated relative effects, we can see that both the city center and the reduced speed limit for cars have a significantly decreasing effect on cycling speed.

The goodness-of-fit of the model and other variables that might explain the variation in cycling speed are discussed in “Discussion” section.

Calibration and implementation

As mentioned in “Data” section, the method applied to measure average speed on link level (Eq. 1) is an overestimation compared to speeds measured on a trip level. While calibration of the overall level of speed may not be necessary for route choice models that are based purely on travel times (as the fastest route remains the fastest route independently of the overall level of speed), a correct calibration of the model is needed for mode choice modelling and economic appraisal.

In order to calibrate the models for predicting—on average—correct travel times on a trip level, we introduce a calibration factor \(C_{g,p}\) for each segment identified by gender g and trip purpose p. This is done separately for bicycle and e-bike with the following equation:

$$C_{g,p} = \frac{{\frac{1}{{M_{g,p} }}\mathop \sum \nolimits_{m = 1}^{{M_{g,p} }} S_{g,p,m}^{trip} }}{{\frac{1}{{L_{p,g} }}\mathop \sum \nolimits_{l = 1}^{{L_{p,g} }} \hat{S}_{g,p,l}^{link} }} = \frac{{\overline{{S_{g,p}^{trip} }} }}{{\overline{{\hat{S}_{g,p}^{link} }} }}$$
(3)

where \(\overline{{S_{g,p}^{Trip} }}\) is the average speed measured over all trips \(\left( {M_{g,p} } \right)\) of segment g,pFootnote 14 and \(\overline{{\hat{S}_{g,p}^{Link} }}\) are the model predictions of average speed over links \(L_{p,g}\) that we have data for.

For implementing the model, we calculate speed for all segments on all links in the network with the following equation

$$S_{l,g,p} = \left( {e^{{\beta_{o} + \beta_{men} * D_{g = 1} + \beta_{work} * D_{p = 1} + \sum\nolimits_{k} {\left( {\beta_{k} *X_{k,l} } \right)} }} } \right) * C_{g,p} )$$
(4)

Average speeds at trip level and calibration factors for each segment are presented in Table 4.

Table 4 Average speed at trip level and calibration factors

Figure 3a–d show how the average speed on a link calculated with Eq. 3 varies over the network for uphill and downhill links and for bicycle and e-bike, respectively. Results are shown for the biggest segment (bicycle, male, non-work).

Fig. 3
figure 3figure 3

a Predicted speed (km/h) for bicycle in the Oslo network for segment male/non-work: uphill links. b Predicted speed for e-bike in the Oslo network for segment male/non-work: uphill links. c Predicted speed for bicycle in the Oslo network for segment male/non-work: downhill links. d Predicted speed for e-bike in the Oslo network for segment male/non-work: downhill links

The figures show the lower speed in Oslo city center. This relates to the flat geography, but is directly connected with the dummy variables for inner Oslo that were included in the model.

Figure 4a, b shows the overall variation in predicted cycling speed over all user segments and network links for bicycle and e-bike, respectively.

Fig. 4
figure 4

a Histogram of predicted speed (km/h) for bicycle (all network links and population segments included). b Histogram of predicted speed (km/h) for e-bike (all network links and population segments included)

Note that average values of the distribution depicted in Fig. 4a, b (16.3 and 17.7 km/h, respectively) differ from averages measured at trip level (compare Table 4) as calculations at trip level are based only on links where cycling is actually taking place in our data collection.

Discussion

Methodological challenges

We start the discussion section by briefly elaborating the methodological issues of our study and possible improvements for future studies.

It is not easy to recruit people for studies based on GPS tracking and there is an obvious danger of sample selection bias, i.e. that those engaged in cycling and/or tend to cycle fast are more inclined to participate in the study. In addition, they might change behavior, e.g. cycle faster or slower due to social desirability effects (Podsakoff et al. 2003) when they know they are being tracked. This danger applies in our study as well; however, it could be argued that the problem is less pronounced here as we are using an app that tracks all movements/transport modes and is not a typical fitness or dedicated cycle app. Previous survey data of participants from the same sample population as used here have also indicated that their cycling behavior does not differ significantly from that of the general public in terms of self-reported speed choice or risky cycling (Fyhri et al. 2012).

The app’s automatic registration of transport mode and trip purpose has measurement errors that are difficult to quantify. We attempted to circumvent wrongly registered cycling trips by excluding “unrealistic” travel speeds (below 5 and above 60 km/h). As mentioned in “Data” section, the distinction between bicycle and e-bike trip is based on whether the respondents owned an e-bike at the time of the cycle trip, but this indirect measurement is likely to produce incorrect registrations. Given that more registered e-bike trips are actually by (regular) bicycle, the true difference in speed between e-bike and bicycle is potentially higher than estimated in our study. The same applies for the difference between work and non-work trips. Work trips are underreported in our study and if this is due to incorrect registration, the speed gap between work and non-work trips may be underestimated.

The method applied for measuring average link speed (see “Data” section) is likely to underestimate the effect of crossings. Furthermore, the applied network had a poor coding of traffic lights, so we left this information out of the model. This is a clear way of improvement for future studies.

A variable we have information about but which was omitted from the models is cyclist’s age. The relationship between age and speed was found to be non-linear with our data (Flügel et al. 2016), where the highest speeds were on average observed for the middle-age user group (35–55 years). To have included age in the model would therefore have required at least three new dummy variables and would have increased the number of different speed measures in the implemented model from 16 (2^4 combinations of gender, trip purpose, type of cycle and direction) to 48. This was discarded from a practical point of view.Footnote 15

The models presented in “Parameter estmation” section explain around 24% (bicycle) and 17% (e-bike) of the observed variation in cycle speed. Besides missing variables (such as traffic lights and age), a major reason for this relatively low level of explanatory power is likely related to measurement errors in the mapping of the GPS-data to our network. Figure 5 shows a plot of predicted average speed and measured average speed on a link level (here including data from both bicycle and e-bike).

Fig. 5
figure 5

Measured versus predicted speed on a network link level (bicycle and e-bike combined)

From the color codes in Fig. 5 one can see that network links with few observations (the blue ones) are pretty widely scattered, most likely because of measurement errors, while predicted values stay in the reasonable range of 10–30 km/h. For those cases, the model fit seems poor while in fact the model does a good job in “regressing out” the measurement errors in the empirical data.

Lastly, improvements to the statistical modelling are possible. The IID error terms applied are clearly restrictive, as we have repeated observations from identical trips and persons. As a test, we have performed models with fixed effects for respondents, accounting for the correlation in observations within a given persons and aiming to control for some of the unobserved variation on a person-level.Footnote 16 The adjusted R square statistics improved from 0.24 to 0.30 (bicycle) and from 0.17 to 0.26 (e-bike). These improvements may seem moderate and might again indicate that measurement errors are the most prominent source of the error term. The estimated beta-coefficients in these additional models are very close to the model presented in “Parameter estimates” section, except for the dummy variable for gender. This parameter was only identifiable—as gender is constant for a given person—after removing some fixed effects. The estimated value for gender was furthermore deemed unreliable.Footnote 17 The personal fixed effects model was therefore discarded for model implementation.

Discussion of result and implications

Despite the methodological weaknesses mentioned, we are confident that our model is a sound representation of how cycling speed varies with different factors, primarily because sign and order of coefficient estimates follow a priori expectations and seem to fit well with findings in the literature.

It is not surprising that men cycle faster than women—a fact found consistently in the literature (El-Geneidy et al. 2007; Lin et al. 2008; Parkin and Rotheram 2010). While the average speed is significantly different, our data show a high “gender-internal” variation in speed, and 25% of the faster female cyclists have higher speeds than the median male cyclist (Flügel et al. 2016). Our data also show that the gender gap in speed (on average) is clearly reduced for e-bike (around 5% compared to 13% for regular bicycle). Public subsidies for e-bikes (as in Oslo; Fyhri et al. 2016) may therefore be interesting from the perspective of equity and may be seen as a move to get more female travellers into cycling.

The fact that e-bikes go faster than regular bicycles has been documented earlier in the literature (Schleinitz et al. 2017), and also our finding that they do particularly well when cycling uphill. Our modelling of downhill and uphill gradients showed a non-monotone and non-linear relationship which—to our knowledge—has not been measured to this degree of detail. The “braking effect” identified may have interesting implications from a road safety perspective, but needs to be verified in further studies.

Interestingly, evidence in the literature regarding the effects of cycle infrastructure is mixed. In one study, cycling in a separate infrastructure resulted in higher speeds (El-Geneidy et al. 2007), whereas in other studies there is no mention of this (Bernardi and Rupi 2015; Schleinitz et al. 2017). The reason for the disparity is probably the presence of other road users on the infrastructure—pedestrians in particular. Our study shows that speeds are highest on roads where cyclists are kept separate from cars and pedestrians. Cycling speed is on average 20.6 km/h (bicycle) and 19.0 km/h (e-bike) on dedicated cycling paths, and lowest when cycling is not facilitated at all (17.6 and 16.3 km/h). We also estimated values for marked cycling paths (on car streets) to be 19.8 and 18.7 km/h. Estimated speeds in combined pedestrian/cycling streets (18.4, 16.7 km/h) suggest—from a speed evaluation point of view—that it is relatively more important to separate cyclists from pedestrians than cars. This may have interesting implications for transport infrastructure planning, as cyclists, just like other road user groups, need to minimize travel times.

It seems that our average predicted speeds are in the upper range of values in the literature.Footnote 18 In this connection, we have to point out that the concepts “average speed”, “cruising speed”, “speed on flat roads” and “speed identified by the constant term in regression models” are sometimes confounded in the literature, so clear comparison is not easy. In addition, methodological differences in study designs and cultural differences in the study areas are likely to explain differences in measured or predicted average speed. We regard our model as largely transferable to other cities, but a recalibration of the general speed level might be desirable.Footnote 19 It should be noted that cycling culture in Oslo is typical of that of many cities with relatively low cycling shares, in that it is characterized by a relatively high proportion of training oriented cyclists who cycle as a form of exercise (Fyhri et al. 2015), and many workplaces provide lockers and showers that facilitate “high-speed” cycling to work.

On the basis of the speed model and the variation in cycling speed it predicts, we plan to revise and improve the modelling of route and travel mode choices in different Norwegian transport models. We hope that our model contributes to an improved modelling of travel behavior in general and cycling behavior in particular.

Conclusion

Our large-scale collection of GPS data on cycling in Oslo shows that cycling speed varies greatly over network links, user segments and type of bicycle (regular bicycle and e-bike). Significant and expected effects were found for most parameters, including type of infrastructure, type of crossing and link gradient. The assumption of constant link cycling speed in the clear majority of transport models is therefore shown to be restrictive, and an implementation of cycling speed models is expected to increase the precision of transport model forecasts.