1 Introduction

In terms of sustainability, public transit is the only viable alternative to private cars in most urban areas, where journeys are too long to be made by nonmotorized modes of transport. Studies regarding public transit generally result in two propositions: public transit investments should be supported by (1) transport policies that can restrict car usage in cities while improving public transport and (2) land-use planning and urban design policies in order to make the urban neighbourhoods around transit stations more public transport friendly and more walkable. Under the guiding principles of transit-oriented development (TOD), the basic ideas are to design urban cities in a relatively high-density, compact and mixed form and to provide high-quality, efficient public transit services, together with a pedestrian-friendly environment [1]. There is a growing requirement to integrate urban and transport planning, which is already well established in European cities [2, 3].

Among public transit modes, urban rail transit (URT) systems are particularly favoured by planners because they are believed to be more effective in attracting car users. For more sustainable urban transportation, it is necessary to invest in rail transit infrastructure; URT investments also have crucial effects on the economic, social and physical life of cities where the fixed infrastructure results in permanent changes in urban areas. Many studies have reviewed the performance of URT systems, particularly the factors behind the success in terms of passenger ridership level, which is mostly associated with transit station area design parameters; these analyses have primarily been carried out in developed regions, where data on land-use and station area characteristics are more available [4,5,6,7,8,9]. Density, diversity and connectivity are found to be essential factors increasing the usage of particular rail transit stations at a microscale perspective [10]. However, the variability in the definitions and scales of these variables (whether it is a macroscale concept relating to a city or a microscale one regarding the vicinity of the station) causes ambiguity in determining the quantified impacts on ridership.

Studies on the estimation of the success of URT in developing countries are rather limited [11, 12]. Results from developed regions are also difficult to generalize for the developing country context, and travel behaviour can differ due to the availability of alternative transportation modes (size and availability of public transit services, automobile ownership levels, etc.). URT services may have more mesoscale (corridor-based) effects in the early stages of the URT network, which are mostly integrated with other transit (i.e. bus) or paratransit services. Thus, it is necessary to study the scope and extent of the relation between URT ridership and various land-use and public transit supply variables for a developing country case, which is the main goal of this study.

For the city of Ankara, transit usage is decreasing in the face of rapidly increasing automobile usage. Attempts to create a URT network began more than two decades ago, but only two lines were put into service: (1) the first metro rail line, M1, serving a suburban development corridor and (2) the LRT line, A1, serving an urban corridor (see Fig. 1). Despite the significant investment requirements, enlargement of the URT network for Ankara has been continually discussed, which calls for an evaluation of the factors affecting URT ridership in Ankara, based on parameters pertaining to both neighbourhood design and integration with other modes. Detailed land-use and public transit network supply data were collected for 14 selected stations (seven on each of the M1 and A1 lines), excluding those in the central business district (CBD), which represent extreme values due to the overwhelming number of transfer and connection opportunities and commercial density. Even though it brings some limitations, this exclusion provided the opportunity to determine the scope of the impact of land-use and station area design parameters from a more general perspective. While an overall evaluation was performed via the 14-station dataset, corridor-based models were also developed separately.

Fig. 1 Rail transit networks of A1 (green) and M1 (red) lines used in Ankara [13]

The first contribution of the study is the attempt to model the complex relation between URT ridership and spatial design and system factors defined at different scales (macro, meso and micro) for a developing country case. Secondly, correlations between different variables were observed at a statistically significant level, suggesting the need for (a) a more integrated approach in evaluating URT investments and (b) corridor-based evaluations to capture the real dynamics of the relationship between URT network design and ridership. Selecting stations by excluding singular cases (stations in the CBD, stations in industrial zones, etc.) enabled analysis of the effect of the selected parameters in a more controlled manner, thereby increasing the potential for generalization of the findings.

2 Literature Review

2.1 Modelling Studies for Urban Rail Transit Evaluation

Studies investigating the parameters affecting urban rail transit ridership can be grouped into three categories: (1) simulation studies, (2) descriptive studies and (3) multiple statistical analyses [14]. Simulation studies generally focus on travel demand forecasting models to estimate the impact of changes in urban form on travel behaviour [4]. Descriptive studies, on the other hand, are generally used when travel surveys (such as household surveys) have been performed and the effects of the land-use pattern, income levels, the properties of the station areas, etc., are investigated. The main advantage of descriptive studies is that they identify particular travel behaviours and provide insights for further studies [4].

Multiple statistical analysis, on the other hand, provides more opportunity to analyse the relationship between the factors and the ridership level, directly. Loo et al. [4] investigated the ridership level of rail transit systems in Hong Kong and New York City, where different land-use characteristics are present. Multiple regression analyses were performed by gathering independent variables in four headings: (1) land use (total commercial and residential area, total garage floor area, mixed land use, total off-parking floor area), (2) station characteristics (being in the CBD, at a major interchange station, years of operation, etc.), (3) socioeconomic and demographic characteristics of the station area (car ownership per household, population, population size per residential floor area, employment overpopulation, etc.) and (4) intermodal competition (number of bus stops present). The results showed that the station characteristic parameters were found to more significantly affect the ridership level compared with the other independent variables. Stefancic et al. [5] only focused on the land-use properties of the Novi Jelkovec region of Zagreb to determine urban public transport ridership levels. The accessibility matrix of the region was derived and will later be used to analyse the correlation between land use and ridership level. Liu et al. [6] used multiple regression analysis considering the network characteristics of the city of Guangzhou, China. The data were collected from smart cards. It was found that apart from distance to the CBD and transfer possibilities, land use correlated with ridership to a significant extent.

Gutierrez et al. [7] also focused on the parameters affecting the transit ridership in the Madrid Metro via multiple regression analysis. Among ten independent variables, accessibility to stations, mixing land use (diversity) and public bus feeder system were found to be significant parameters, with an R2 of 0.753. However, street density was not found to be significant. Sung and Oh [8] reported that the integration of public buses with rail transit stations was influential for increasing rail ridership of the Seoul Metro. For the follow-up study, Sung et al. [9] used spatial regression analysis to reveal the impact of land use and accessibility around the station on rail transit ridership. The Seoul Metropolitan Region was selected as the case study, and a radius of 750 m around the stations was analysed using land use (commercial, residential, public service, etc.) and accessibility (number of station entrances, distance to nearest station) data. It was found that density around residential areas had the most significant effect on ridership levels, whereas diversity had low consistency. On the other hand, regarding station-level accessibility, the number of entrances and integration with bus systems were positively related to ridership levels. Cervero and Murakami [15] stated that the connectivity and integration of public buses were influential parameters affecting Hong Kong Metro’s ridership levels.

Level of integration, encouragement by other transportation modes, TOD and land-use planning policies are the key elements in increasing the ridership level of rail transit systems [16]. Salzberg et al. [17] examined the success factors behind the high usage of rail systems in Chinese cities. Their primary findings showed that integration of metro systems with existing systems of public transport is the main indicator of the high usage of urban metro systems. They also noted that transit-oriented development and long-term financial sustainability were crucial factors that influence rail ridership level directly. Jaroszynski and Brown [18] examined the ridership level of the LRT system by considering the socioeconomic, planning and operational factors of eight US metropolitan areas. Ranking analyses were carried out, and their findings revealed that socioeconomic factors were not determinative for ridership level; instead, accessibility, multimodal coordination and integration were the important parameters. Furthermore, public transportation integration, promoting park and ride facilities, access modes to LRT stations and schedule coordination were found to be key elements for increasing LRT ridership [16, 19]. Lin and Shin [20] also reported that daily ridership was affected positively by the station floor space area and insignificantly affected by mixed land use and density for 46 metro stations in Taipei City, Taiwan.

Lane et al. [21] collected ridership, demographic and transportation system data from 17 US cities, including 58 commuter rail corridors, 22 light rail corridors and 1218 stations. In contrast to other studies, they defined additional variables, such as reverse commute trips, special transportation hubs/ports and transportation system variables. Multiple regression analysis was performed, and the developed models explained ridership with R2 values of 0.97 for commuter rail transit and 0.92 for light rail transit. Kuby et al. [22] examined the factors affecting the ridership of light rail systems in nine US cities via multiple regression analysis with 12 variables, among which diversity, accessibility and bus transfer locations were found to be significant parameters for non-CBD rail transit stations. On the other hand, the presence of a bus transfer station in CBD locations was not found to be a significant variable. A more comprehensive study was conducted by Liu et al. [23], who evaluated the ridership data of rail transit stations of both light and commuter rail forms for the US State of Maryland. Whereas public bus connectivity, station location in the CBD and distance to nearest station were significant parameters, public bus connectivity was only a significant parameter for the commuter rail system.

In addition to the aforementioned studies, the parameters affecting rail transit ridership were evaluated via principal component analysis (PCA). The aim of this analysis is to reduce the dimensionality and to group the correlated independent variables [24]. Li et al. [12] investigated railway ridership trends. The ridership data were taken from 1950 to 2015, and PCA was implemented for seven independent variables, an approach that resulted in two main components. Nawrocki et al. [25] investigated the factors affecting LRT usage in the USA and Japan. The analysis was performed for 24 rail lines with 1,293 stations in the USA and 21 rail lines with 616 stations in Japan. Nine independent variables (walkability, LRT station spacing, LRT speed, system length, bus connection, the total number of stations in LRT, walking impedance, etc.) were selected, resulting in three main components from PCA. Furthermore, the PCA was supported with multiple regression analysis to derive the mathematical formulation. The results showed that whereas LRT speed and walkability were not statistically significant parameters, system length and number of LRT stations did significantly affect ridership. Similarly, the factors influencing rail transit use were investigated by Brons et al. [26]. Thirty-five independent variables were grouped as eight components, one of which was station accessibility.

2.2 Spatial Parameters in Evaluation of Rail Transit Usage

In the light of the above literature review, an additional focus should be paid to the concepts of spatial parameters that are commonly referred to. In the ‘New Urbanism’ movement, the transit system is the emphasis of the urban system with a high density and mixed land use. Density is a major factor in determining transit system ridership, and with the addition of a mixed land-use pattern, it is possible to create better urban areas, where residents would prefer using public transit systems [27]. This approach includes compact urban areas integrated with the transit systems, such as ‘transit villages’, ‘smart growth’ and ‘TOD’. At a policy level, spatial parameters commonly emphasized in the literature are (1) density, (2) diversity/mixed land use and (3) connectivity and accessibility that affect transit usage (see Table 1 for related studies).

Table 1 Parameters and measures used to study urban rail transit evaluations

Density is the number of people living in a given area, or the ratio of people to land area. It is an essential factor in the decision-making process of transit systems [27,28,29,30,31,32]. In high-density areas, transit usage also increases, whereas in a low-density area, high-capacity transit systems become unattractive, and therefore significant investments are wasted [29]. As density increases and mixed land use becomes constant, people tend to walk and use bicycles or public transit systems compared to areas with low density [30]. It was found that a high concentration of people living and working around 800 m of a station increases the ridership levels [9, 33]. Diversity (mixed land use) is defined as the integration of land use by increasing the proximity of urban activities. It directly affects travel behaviour and sustainable transport mode usage [30]. Street connectivity indicates how densely the streets are connected with each other. The high connectivity of a street network provides an increased level of accessibility for pedestrians. Accessibility is also considered to be an efficient and major measure in the decision stage of urban transport planning [34]. Connectivity is also determined by the presence of sidewalks, pedestrian paths and their continuity. A grid network provides the simplest street pattern and is often emphasized as the preferred model in neotraditional neighbourhood design.

Despite the strong conceptual link between these three concepts and URT ridership, the ambiguity stems from the fact that under these major concepts, it is possible to define measures at different scales (macro to micro), as shown in Table 2. Thus, for density, although it is possible to talk about number of living (or working) units within the walkable region of a station (microscale), it is also possible to discuss the total number of workers per unit area along a URT corridor (mesoscale), or even the compactness of a city can be considered to be a density parameter when cities are compared (macroscale).

Table 2 Different scales used in the study

3 Evaluation Methodology for Urban Rail Transit Station Usage

Rail transit usage is generally studied using the dependent variable of ridership (or a transformation of it) at a station. As for independent variables, among the many parameters used in different categories discussed above, it is highly important to choose the ones that substantially affect the ridership. First, the scales and definitions of the selected variables will be discussed, followed by the mathematical relation formulations.

3.1 Selection of URT Factors

In the planning literature, transit station usage is associated with station area (the walkable area around the station, 500–800 m) parameters at the ‘microlevel’. In this study, an area with a radius of 800 m around the transit station was used as the walking distance. Firstly, a land-use analysis was performed at the microscale, with land-use maps created during the field survey. Additionally, a descriptive analysis was made in order to provide a broad picture, which also helped to reveal insightful data regarding the selected areas. Secondly, density (DN), diversity (DV) and connectivity/accessibility calculations were made. Accessibility and connectivity were measured with two different parameters, the connectivity score (CS) and connectivity index (CI). However, for Ankara, it is important to take into account both mesoscale (along transit corridors) and macroscale (whole city) variables (see Table 3). The ‘mesoscale’ focuses on transit corridors. Thus, to distinguish the characteristics of the two Ankara study corridors, Metro (M1) and Ankaray (A1), a dummy variable for the metro line, Dmetro, was created. Since the public transit supply network characteristics vary along the two corridors, ‘mesoscale’ variables were introduced to capture accessibility via alternative shared-ride systems, namely the frequencies of public bus (Fbus), commuter rail (Frail) and paratransit services (Fdolmus) at every rail station. Finally, a ‘macroscale’ parameter representing the integration between different systems/modes (I) was introduced, which is the summation of frequencies of all modes transferable at rail stations. It should be noted that when URT and other shared-ride mode networks have more complex and network-level connections (as opposed to the current corridor-based design in Ankara’s URT service network), some of the mesolevel variables in this study (such as frequency of bus services) may have a more macrolevel impact than the mesoscale one assumed in this study.

Table 3 Parameters used in evaluating station area design and transit usage in Ankara

3.2 Statistical Evaluation Approach

To evaluate the conceptual literature on urban rail usage, it is important to derive numerical models for the expected relationship between station ridership (\(\sigma\)) and land-use and design parameters, which is the main focus of this study. A comprehensive statistical evaluation included the following analyses (see Fig. 2):

  (i) A correlation analysis is performed between the selected dependent and independent variables to determine whether there are any collinearity situations.

  (ii) PCA is performed to group the variables that behave similarly.

  (iii) Linear regression models are developed to determine the prediction power of a parameter(s) on ridership. In the evaluation of URT characteristics, first, single regression models were developed using the rail station ridership as a dependent variable, which is regressed over selected variables (see Fig. 2). These relations were developed for all 14 station values together, as well as stations within each corridor separately. Later, multiple regression models were attempted, combining all station- and corridor-based variables and an integration variable (macrolevel) (see Fig. 2). In the multiple regression model, a dummy variable representing the corridor effect was also introduced as a separate mesoscale variable; this variable is redundant when multiple regression analysis was developed at a corridor level.

  (iv) Nonlinear regression models are also utilized to improve the numerical model fitting.

Fig. 2 Framework for statistical evaluation of ridership and selected variables

To improve the analyses, the following modelling issues should be considered:

  (i) In complex relationships, such as rail station usage and design parameters, it is also necessary to consider the diminishing effect of the parameters; this simply means that ridership does not increase with the same rate of the selected parameter(s). In such cases, a logarithmic transformation of the ridership may reveal more about the studied relation. In this research, the natural log of the ridership, denoted as Ln(\(\sigma\)), is used as an alternative dependent variable, as shown in Table 3.

  (ii) The outliers (such as both terminal and transfer stations) in the dependent or independent variables can influence the statistical significance of the prediction to a substantial degree; to eliminate these issues, the trimmed data excluding these extreme values are used in the regression analysis (see Table 3 for notations).

  (iii) Stations along different urban rail corridors may have varying behaviours in transit usage. Therefore, this is studied by either introduction of a dummy variable representing the corridor (such as the Dmetro) or using different regression models for various corridors, following the market segmentation principle in econometric models.

Considering all of the issues discussed above, 16 simple regression models were developed using the selected parameters for both ridership and the logarithm of ridership. Such models can be regarded as statistically meaningful if the R2 of the estimation and the confidence levels of the predicted coefficients are high. Stepwise multiple and nonlinear regressions were also used as more sophisticated models. Models were tested for all data (considering the two corridors jointly), as well as for each corridor separately. It should be noted that the corridor-based models with trimmed data had only six station data points, which is a limitation in terms of statistical evaluation; these models are nevertheless included in this study for the sake of completeness of the evaluation approach.
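
As an illustration of the simple regression step, the following minimal sketch fits a one-variable least-squares model to both ridership and Ln(ridership) and reports R2 for each. The function name `simple_ols` and all numerical values are hypothetical and are not taken from the Ankara dataset; the study itself used SPSS. The count of 16 models is consistent with eight candidate parameters regressed against two dependent variables, although the text does not spell this out.

```python
import math


def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical values for seven stations (illustration only).
density = [120.0, 150.0, 90.0, 200.0, 170.0, 110.0, 140.0]
ridership = [210000.0, 260000.0, 180000.0, 330000.0,
             300000.0, 190000.0, 240000.0]

# Same predictor, two alternative dependent variables.
fit_linear = simple_ols(density, ridership)
fit_log = simple_ols(density, [math.log(v) for v in ridership])

print("ridership model:     R2 = %.3f" % fit_linear[2])
print("Ln(ridership) model: R2 = %.3f" % fit_log[2])
```

With only six or seven points per corridor, a high R2 from such a fit should be read together with the significance of the estimated coefficient, as the text notes.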

3.3 Field Data Collection

For independent variables, traffic and transportation data are rarely available for Ankara, which is also an issue for many other urban regions in Turkey. In particular, for the microscale measures in this study, land-use analyses for the vicinity of the stations (800 m) were performed with a team of two persons, between June 2013 and April 2014. For this analysis, base maps were provided from the Ankara Greater Municipality, Cankaya Municipality and Yenimahalle Municipality. The field study at each station was carried out to collect the following data:

  • For density: the building units were counted within 800 m of each station area. This figure was then multiplied by the number of stories and average household size. The result was divided by the total study area to calculate the residential density. For the employment density, the number of employees within the area was divided by the total study area. Total density (residential density plus employment density) was used in the study.

  • For diversity: a list of all commercial, administrative, social, military, health services areas, etc., was made. The number of employees was determined by assuming an average of five employees for small businesses, such as hairdressers, real estate firms, markets and furniture shops, and an average of ten employees for the Ostim industrial site (located around M1 Station 2). For the others, data were obtained from administrative units (State Archive, Ankara University, Gazi University, and so on).

  • For connectivity: two measures are selected as,

    • Connectivity score: a checklist that comprises the presence and quality of lighting, ease of pedestrian activity, parking, landscaping, flat terrain and sidewalks was completed during the field survey while experiencing the site.

    • Connectivity index: the number of roadway links was divided by the number of roadway nodes, as used by [40].

  • Integration data (daily frequency of public bus, dolmus and commuter rail): Daily headway data for all considered modes were obtained from the Ankara Greater Municipality for each mode. Integration (I) is the sum of these frequencies for all modes.
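
The measures listed above can be sketched as a few small functions. This is a minimal illustration, assuming that the study area is the full circular buffer of 800 m around a station and that densities are expressed per km²; the text states neither explicitly. All function names and input values are hypothetical.

```python
import math

BUFFER_RADIUS_KM = 0.8  # 800 m walking distance used in the study


def buffer_area_km2(radius_km=BUFFER_RADIUS_KM):
    """Area of a circular station buffer (assumed shape of the study area)."""
    return math.pi * radius_km ** 2


def residential_density(building_units, stories, household_size, area_km2):
    """Estimated residents per km2: units x stories x household size / area."""
    return building_units * stories * household_size / area_km2


def employment_density(employees, area_km2):
    """Employees per km2 within the station area."""
    return employees / area_km2


def connectivity_index(links, nodes):
    """Connectivity index (CI): roadway links divided by roadway nodes."""
    return links / nodes


def integration(f_bus, f_dolmus, f_rail):
    """Integration (I): sum of daily frequencies of all transferable modes."""
    return f_bus + f_dolmus + f_rail


# Hypothetical station inputs (illustration only).
area = buffer_area_km2()
total_density = (residential_density(400, 5, 3.5, area)
                 + employment_density(2500, area))
print("area (km2):", round(area, 3))
print("total density (persons/km2):", round(total_density, 1))
print("CI:", connectivity_index(180, 120))
print("I:", integration(600, 250, 40))
```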


4 Urban Rail Transit Usage Models for Ankara

In Ankara, there is a light rail transit system called Ankaray (A1), which was opened in 1996 (Fig. 1). It runs between Dikimevi and the intercity bus terminal of Ankara (ASTI) and has 11 stations spanning over a system length of approximately 8.7 km. Ankara Metro line (M1 in Fig. 1) began operating in 1997, connecting the CBD (Kızılay station) to the new residential and industrial areas in the west expansion corridor (towards Batıkent station) that were proposed in the urban development plan (see Fig. 1). Having a total length of 14.6 km and 12 stations (seven underground, three at grades, two elevated), the M1 line serves the Ulus, Yenimahalle, Demetevler, Ostim and Batıkent regions. Although there were three more metro lines under construction, they were not open to service during the study period; thus, the scope of the study includes stations along the M1 and A1 lines, only. Characteristics of all 23 stations along the two study lines are summarized in Table 4.

Table 4 Stations and their characteristics along the metro and LRT lines in Ankara

4.1 URT Station Selection

Considering the major difference in the natures of the two study corridors, the study stations were selected according to the following criteria. Firstly, the stations at or close to the CBD were excluded because (1) they have the potential to attract more ridership due to their location rather than their spatial parameters, which would mislead the study (see Table 4), and (2) explaining the high transit ridership levels of CBD stations would require a different analysis framework with additional research questions or surveys, which was beyond the scope of this study.

Secondly, the Akkopru station (St. 8 on M1) area was not included due to the existence of a particularly large shopping mall located right next to the station, which is a main attraction point, with entertainment, shopping, catering services, etc. Even though they were the endpoints of their lines, Batıkent, ASTI and Dikimevi stations were included in the analysis because they have similar spatial characteristics as the other selected station areas in terms of major land-use parameters. As a result, seven stations from each corridor were selected, as noted in Table 4. Ridership data for these stations were provided by Ankara Greater Municipality.

All data, including the ridership collected at the selected study stations, are presented in Table 5. The descriptive statistics of these variables show that the average daily ridership value is 322,235 trips with a standard deviation of 183,855 trips. Examining these average values, ridership data for Batıkent (700,119 trips) and Dikimevi (675,915) stations stand out as extreme because they are twice as large as the average ridership value. As a result, to observe the potential differences, the analyses were repeated assuming all 14 stations (referred to as ‘all data’) and excluding these outliers (termed ‘trimmed data’) in this study.
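
The outlier judgement can be checked with simple arithmetic on the figures reported above. The sketch below, with hypothetical helper names, computes each terminal station's ratio to the mean and its distance from the mean in standard deviations; it uses only the four values quoted in the text, not the full Table 5.

```python
MEAN_RIDERSHIP = 322235   # average ridership reported in the text
STD_RIDERSHIP = 183855    # standard deviation reported in the text

# Ridership of the two stations flagged as extreme in the text.
flagged = {"Batıkent": 700119, "Dikimevi": 675915}


def ratio_to_mean(value, mean=MEAN_RIDERSHIP):
    """How many times larger a station's ridership is than the mean."""
    return value / mean


def z_score(value, mean=MEAN_RIDERSHIP, std=STD_RIDERSHIP):
    """Distance from the mean in units of the standard deviation."""
    return (value - mean) / std


for name, trips in flagged.items():
    print("%s: %.2f x mean, z = %.2f"
          % (name, ratio_to_mean(trips), z_score(trips)))
```

Both stations come out at slightly more than twice the mean and roughly two standard deviations above it, which is consistent with treating them as the trimmed observations.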

Table 5 Ridership and related spatial parameter values for the two study corridors

4.2 Correlation Between Station Area Design Parameters

The correlation analysis between the ridership \(\sigma\) (and Ln (\(\sigma\)) as a nonlinear alternative) and the selected parameters was performed using SPSS. Table 6 shows each Pearson’s correlation coefficient (r) and the corresponding significance levels. The correlation is assumed to be significant at the p = 0.05 level. The major findings are listed as follows:

  • The ridership (\(\sigma\)) was significantly correlated with Fbus (with Pearson’s correlation coefficient of r = 0.564 and significance level of p = 0.036) and integration (r = 0.547, p = 0.043). However, the logarithm of ridership (Ln (\(\sigma\))) was significantly correlated with Fbus only (r = 0.534, p = 0.049).

  • Density (DN) was significantly related to diversity (DV) with r = 0.559 and p = 0.038.

  • The effect of metro line (Dmetro) was significantly correlated with DV at the stations (r = 0.637 and p = 0.016).

  • Connectivity score (CS) was significantly related to connectivity index (CI) (r = 0.616 and p = 0.019).

  • The integration parameter (I) was highly correlated with the frequencies of other services (Fbus, Fdolmus, Frail) with p < 0.001, as expected.

  • Frequencies of other services (Fbus, Fdolmus, Frail) have a significant correlation among themselves.
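
The significance of the reported ridership correlations can be cross-checked by hand. For n stations, a Pearson coefficient r corresponds to the statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below assumes n = 14 for the all-data case and uses the two-tailed 5% critical value of about 2.179 for 12 degrees of freedom from standard t tables; the function names are hypothetical, and SPSS was the tool actually used in the study.

```python
import math

N_STATIONS = 14        # all-data case
T_CRIT_DF12 = 2.179    # two-tailed 5% critical t value, 12 d.o.f. (standard tables)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing a zero correlation, with n - 2 d.o.f."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Coefficients reported in the text for ridership and Ln(ridership).
reported = {
    "sigma vs Fbus": 0.564,
    "sigma vs I": 0.547,
    "Ln(sigma) vs Fbus": 0.534,
}

for label, r in reported.items():
    t = t_statistic(r, N_STATIONS)
    print("%-18s t = %.3f  significant at 0.05: %s"
          % (label, t, abs(t) > T_CRIT_DF12))
```

All three coefficients exceed the critical value, the last one only marginally, which agrees with the reported p = 0.049.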

Table 6 Correlation analysis result between ridership and independent variables

4.3 Principal Components among Station Area Design Parameters

After detecting the highly correlated variables, a PCA is required to group the station area design parameters acting in a similar manner in terms of predicting the ridership level. However, in interpreting the PCA results, it is crucial to verify the sampling adequacy through the Kaiser–Meyer–Olkin (KMO) value, which should be greater than 0.50, as recommended by Kaiser [44]. The PCA for all data with all eight independent variables and orthogonal rotation (varimax) originally produced a KMO value of 0.42. Therefore, as recommended by Kaiser [44], we removed one of the highly correlated variables (in this case, Fbus, which was more correlated with integration (I) and Fdolmus) and repeated the PCA, which led to an improved KMO value of 0.58, above the acceptable limit. The results of this PCA are presented in Table 7. The SPSS output showed three components with eigenvalues over 1, which together explained 85.65% of the variance (the same total in the initial and rotated solutions). The result of the PCA suggested three components (or three main directions), as follows (see Table 8):

  • Component 1: Fdolmus, Frail and I (public transit supply measures)

  • Component 2: DN and DV parameters (land-use measures)

  • Component 3: CS and CI (walkability measures)

Table 7 Total variance explained from PCA analysis for all data

Hence,

  • The diversity and density variables were not creating different dimensions; they showed similar trends for the case of Ankara lines

  • Walkability measures (connectivity-based ones) around the stations are creating an independent dimension, as stated in the literature

  • Public transit supply (at a meso- or macrolevel) is a separate dimension that should be included in modelling steps.

Finally, for this PCA analysis, Bartlett’s test of sphericity \(\chi^{2} (21) = 42.46\), p < 0.001, also indicated that the results of the PCA analysis were significant.
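
The 21 degrees of freedom are consistent with the seven variables retained after removing Fbus. In the standard form of Bartlett's test (assumed here to be the one reported by SPSS), with n observations, p variables and correlation matrix \(\mathbf{R}\):

```latex
\[
\chi^{2} = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln\lvert\mathbf{R}\rvert,
\qquad
df = \frac{p(p-1)}{2} = \frac{7 \times 6}{2} = 21 .
\]
```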

4.4 Linear Regression Analyses

As a general evaluation, first, the stations from both corridors (M1 and A1) were studied together. The proposed 16 regression models discussed in Sect. 3 from all 14 stations produced particularly low R2 values, as shown in Fig. 3a. An improvement of R2 was not observed even using the trimmed data (\(\sigma_{tr}\)), in which the highest R2 value was obtained when ridership (\(\sigma\)) was regressed over Fdolmus (R2 = 0.318). The success of the linear regression models was certainly increased when they were developed for the study corridors separately (see Fig. 3b, c). This result, by itself, provides evidence of corridor-level differences in explaining station ridership with station area design parameters. For the A1 corridor, the model results suggested the following findings (see Fig. 3b):

  • A higher R2 value (0.660) was obtained when Ln(ridership) was modelled as a function of density with the trimmed data. The same model fitted to all seven stations produced a slightly smaller R2 value of 0.612, but the estimated model coefficients were not significant at the 0.05 level (p = 0.066).

  • A linear relation between ridership and CI was suggested, with an R2 value of 0.592 when all data were considered. In contrast to the density relationship, this model did not produce any statistically significant parameters at the 0.05 level when the trimmed data were used.

  • Fdolmus produced an R2 value of 0.596 for all stations, which was also statistically significant.

Fig. 3 Linear regression analysis results for (a) both corridors, (b) the A1 corridor and (c) the M1 corridor

For the M1 stations, more parameters yielded improved R2 values and statistically significant linear relationships. The major findings can be summarized as follows:

  • Using the trimmed data, diversity explained the ridership level with R2 values of 0.772 and 0.752 when the dependent variables were selected as Ln(ridership) and ridership, respectively, which were statistically significant.

  • Considering all stations of M1, R2 values of 0.697 for ridership and 0.575 for Ln(ridership) were obtained when Fbus was selected as the independent variable, of which the former was statistically significant.

  • When the integration parameter was used as an independent variable, an R2 value of 0.618 was obtained when Ln(ridership) was considered, which produced a statistically significant result.

  • The connectivity score did not produce significant results in any of the linear models.

  • Connectivity index and Fdolmus parameters also produced slightly improved R2 values, but their models were not significant at the 0.05 level.

The linear regression models that best explained the ridership level are presented in Table 9. The results indicated that analysing both corridors jointly did not produce statistically significant results, and the best joint model reached an R2 of only 0.318. However, for the A1 corridor, there was a negative relationship between Ln(ridership) and density when trimmed data were used (see Eq. (2)). Furthermore, whereas CI negatively affected the ridership level when all stations were analysed, Fdolmus influenced the ridership level positively (see Eqs. (3–4) in Table 9). For the M1 corridor, the ridership level was positively affected by Fbus, diversity and integration.
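The single-variable models discussed above follow the usual ordinary least squares recipe, which can be sketched as follows. This is a minimal sketch, not the study's SPSS workflow: the function name `simple_ols` and all station values are hypothetical, chosen only to mimic a negative relation between Ln(ridership) and density for seven stations. Significance is judged here by comparing the slope's t statistic with 2.571, the two-sided 5% critical value of Student's t with 5 degrees of freedom.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - sse / syy
    # The slope's standard error uses n - 2 residual degrees of freedom
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return {"intercept": intercept, "slope": slope,
            "r2": r2, "t_slope": slope / se_slope}


# Hypothetical values for seven stations (NOT the study's data)
density = [120.0, 150.0, 180.0, 210.0, 260.0, 300.0, 340.0]
ridership = [9500.0, 9100.0, 7800.0, 8200.0, 6400.0, 6900.0, 5200.0]
fit = simple_ols(density, [math.log(r) for r in ridership])

# Two-sided 5% critical value of Student's t with 7 - 2 = 5 degrees of freedom
T_CRIT_DF5 = 2.571
significant = abs(fit["t_slope"]) > T_CRIT_DF5
print(fit, significant)
```

With only seven stations per corridor, removing a single station (as in the trimmed data) changes both the degrees of freedom and the critical value, which is one reason why significance can differ between the full and trimmed samples.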

Multiple regression analysis for the M1 corridor only (with all study stations) produced a model with an R2 value of 0.902 (p = 0.010), composed of the variables Fbus (in Component 1) and density (in Component 2), both of which had statistically significant and positive coefficients (see Table 10). Note that, despite its low t score, the constant was kept in the model to represent the effects of other factors. Compared with the single-variable regression using Fbus (see Table 9), this model showed a major increase in R2.
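Because R2 never decreases when a predictor is added, one way to check that such an increase is not merely due to the extra variable is the adjusted R2. The sketch below applies the standard formula to the R2 values reported in the text. The helper name `adjusted_r2` is hypothetical, and n = 7 is an assumption inferred from the 14 study stations, seven of which lie on the A1 corridor; the two models may also differ in their dependent variable, so the comparison is only indicative.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof_resid = n_obs - n_predictors - 1
    if dof_resid <= 0:
        raise ValueError("not enough observations for this many predictors")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof_resid


# R^2 values as reported in the text; n_obs = 7 is an ASSUMPTION (all M1 stations)
single_fbus = adjusted_r2(0.697, n_obs=7, n_predictors=1)   # Fbus only
fbus_density = adjusted_r2(0.902, n_obs=7, n_predictors=2)  # Fbus + density
print(round(single_fbus, 3), round(fbus_density, 3))
```

Under these assumptions the two-variable model still has the higher adjusted value, so the improvement is not explained by the added predictor alone.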

Table 8 Rotated component matrix
Table 9 Best linear regression models for predicting the ridership level
Table 10 A multiple regression model (M1 corridor)

4.5 Nonlinear Regression Analysis

To investigate the relationship between station ridership and design parameters further, nonlinear regression analyses were carried out. Assuming a significance level of 0.05, the nonlinear regression models with higher R2 values than the linear ones are presented in Table 11. For the A1 corridor, the nonlinear models slightly improved the R2 values compared to the linear regressions, but the variables that produced higher R2 values were still Fdolmus, connectivity index and density.

Table 11 Nonlinear regression models for the A1 and M1 corridors

For the M1 corridor, similar variables were observed in the nonlinear models as well. However, although there was no significant linear relation between Ln(ridership) and CS, a cubic relation produced a very high R2 value of 0.970, which was the best regression model obtained in this study.
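A cubic model has four coefficients, so with about seven stations only three residual degrees of freedom remain, and R2 necessarily rises (or stays equal) as the polynomial degree increases. The sketch below illustrates this with a least-squares polynomial fit solved through the normal equations. It is a minimal sketch, not the study's estimation procedure; the helper names (`solve_linear`, `polyfit_r2`) and all CS and Ln(ridership) values are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def polyfit_r2(x, y, degree):
    """Least-squares polynomial fit; returns (coefficients low-to-high, R^2)."""
    powers = range(degree + 1)
    xtx = [[sum(xi ** (i + j) for xi in x) for j in powers] for i in powers]
    xty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in powers]
    coeffs = solve_linear(xtx, xty)
    mean_y = sum(y) / len(y)
    sse = sum((yi - sum(c * xi ** k for k, c in enumerate(coeffs))) ** 2
              for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return coeffs, 1.0 - sse / sst


# Hypothetical connectivity scores and Ln(ridership) for seven stations
cs = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ln_ridership = [8.1, 8.9, 8.6, 8.4, 8.8, 9.3, 9.0]

r2_by_degree = {d: polyfit_r2(cs, ln_ridership, d)[1] for d in (1, 2, 3)}
for degree, r2 in r2_by_degree.items():
    print(degree, round(r2, 3))
```

A high cubic R2 on such a small sample is therefore best read together with the significance of the individual coefficients and the number of stations used.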

4.6 Discussion of the Results

Density is reported in the literature as the factor most strongly influencing rail transit usage. It had a negative relationship with ridership for the A1 line, in both linear and nonlinear models. It had a positive impact on ridership for the M1 line, but only in the multivariate model. Connectivity and accessibility were modelled using the measures CI and CS. Neither measure was statistically significant in a linear relation for the M1 corridor, but the latter was highly significant in a cubic relation (see Table 11), and the former was negatively related to ridership for the A1 corridor (see Table 9), which is a counterintuitive result. Diversity, another important parameter, had a positive relation for the M1 corridor, but it was not statistically significant for the A1 corridor. Such corridor-based behaviour may be due to the following:

  • The M1 line serves the western corridor (developed as a result of controlled decentralization) along which the impact of density would be observed more clearly, whereas the A1 line runs along a former busway corridor that already had a variety of uses developed around the stations and might have a strong influence from the CBD, masking the expected impact of density.

  • In suburban areas that are still developing (such as along the M1), connectivity and accessibility measures appeared to have an increasing impact up to a limit, in a nonlinear fashion, whereas in inner-city corridors (such as the A1), they have a clearly significant impact because the built environment is already well established and transit usage patterns have formed over longer periods of time.

Another significant finding of this study was that, in the context of Ankara, public transport integration and infrastructure are still critically important in the usage of rail transit systems. The availability of feeder systems and the frequency of the public bus and dolmus services that support the urban rail stations are extremely important measures that increase urban rail ridership levels.

Finally, when different planning and design parameters were combined in a multiple regression analysis, no statistically significant results were found for the two systems analysed together. These measures explain variations in stations’ ridership better when each corridor is analysed separately, suggesting that corridor-based characteristics are important and that parameters behave differently for different corridors.

4.7 Limitations of the Study

To study the impact of TOD policy, which is based on land-use parameters (diversity, density and connectivity), this study focused on the factors influencing URT ridership, including the TOD-based ones. Stations in the CBD, which is expected to have a significant effect [6], and those with singular properties (such as industrial land-use zones and large shopping malls in the vicinity) were excluded at the data collection stage because they are expected to increase ridership regardless of station area design or land-use characteristics. However, such stations could be included in future studies to verify the expected high impact on ridership, even though it is highly likely that they would be removed in the modelling stage as outliers. Furthermore, in regions where more stations and land-use data are available, diversity and density variables can be introduced at different levels, whenever the data allow, to test the difference of scale (micro- versus meso- versus macrodiversity).

Additionally, PCA and regression analysis were chosen as the methods of analysis. Even though the number of study stations was limited, it was still sufficient to develop statistically significant models with the selected parameters, shedding light on the research question. However, the sample size (in terms of both the number of stations and the number of independent variables) is not adequate for applying machine learning algorithms to predict ridership. Such more advanced models should be considered when data from more URT stations and more spatial design parameters are available.

5 Conclusions and Further Recommendations

In the planning literature, within the theory of TOD, density, diversity and connectivity are accepted as the main station area design parameters influencing the ridership level for rail transit systems. They can be represented by different measures at the microscale and even interpreted for meso- or macroscales. However, the literature generally does not provide any quantitative measures for their impact. In this study, the impact of the station area design parameters on rail transit usage in Ankara was evaluated quantitatively. The context-specific results differed from the overall results:

  • The station area design parameters were not always linearly related to the ridership, and they did not behave similarly or significantly for every corridor.

  • Improved models using the natural logarithm of the ridership or nonlinear regressions suggested the diminishing effect of some of the parameters.

  • In the case of the Ankara URT systems with two main corridors, ridership at a station was more influenced by the potential of integration with other modes, which had not been addressed in the literature before and may be valid for a city with developing urban rail systems.

Excluding stations with extreme ridership values resulted in better models in some cases, which suggested that location-specific attributes, such as being in the CBD, terminal points and intermodal change points, might have stronger impacts on the ridership than station area design parameters. However, it should be kept in mind that, in this study, the nonlinear models may appear statistically stronger because of the small sample size (the number of rail stations).

In the light of this study, the following recommendations can be made for the planning and design of urban rail transit station areas:

  • In the station area design process, a context-specific evaluation can provide further insights about the systems, their expected usage and their relation with the urban environment. Systems located in the inner city and those in suburban development areas might yield different results. This should be kept in mind while performing analyses on different systems or on systems with stations having varying characteristics. Corridors serving different parts of the city, such as the CBD, suburban or urban areas, may reveal different relations between the ridership and station area design parameters. Rather than treating all transit stations together, they should be divided into similar subgroups (according to transit system type, urban characteristics, ridership levels, etc.) and studied separately following the ‘market segmentation principle’.

  • For developing cities (or cities with developing urban transit systems), analysis and design of transit station areas must be considered in the light of alternative modes and potential integration with them. Integrated transport policies should be well planned and implemented all around the city.

For the case of Ankara, even though the decision regarding the new URT lines and stations is made at a macroscale based on many constraints, it is still possible to evaluate the land-use and public transit supply factors around new URT lines at both micro- and mesoscales, to determine the potential improvements to increase URT ridership. More importantly, the use of smart cards for public transit (both buses and URT services) is increasing; in future studies, it may be more beneficial to take advantage of advanced computational techniques, where even the purpose of a trip (commuting, education, etc.) can be deduced based on the spatiotemporal distribution of the trips made and land-use-based trip generation/attraction attributes (number of origins/destinations, trip chaining potential, etc.).