Introduction

In the contemporary urban landscape, the smooth functioning of freight transportation has become increasingly important, particularly after the surge in e-commerce during the COVID-19 pandemic. It ensures efficient and timely distribution of goods to and from residential and commercial areas as well as freight hubs. However, the escalating freight vehicle kilometers traveled (VKT) have created challenges for urban life, manifesting as road congestion and negative social and environmental impacts. Freight vehicles account for approximately 30% of the total traffic volume in urban environments, with an additional 20% attributed to freight parking and delivery activities (Patier 2002). Similarly, trucks of different classes account for an estimated 10.4% of highway VKT in the United States, which illustrates the pervasive impact of freight transportation on urban infrastructure (FHWA 2021). In 2020, trucks were identified as responsible for 45% of the greenhouse gas emissions in the USA, emerging as the leading contributor among all transport modes (US Environmental Protection Agency 2022).

Numerous investigations studied the multifaceted impacts of bottlenecks within the realm of freight transportation. According to a report from the Canadian Automobile Association, Canada’s worst bottlenecks cost drivers around 11.5 million hours annually (CPCS and HERE Technologies 2017). In freight transportation, bottleneck congestion significantly impacts goods movement, resulting in an inefficient freight road network, delay costs for businesses, and price inflation for consumers. Notably, the Greater Toronto Area (GTA) experiences an annual average of $125 increase in goods and service costs for households, directly attributed to traffic congestion. Moreover, freight traffic congestion adds more than 15% to last-mile delivery costs for Canadian businesses and customers, with the overall annual delay cost for goods movement within the GTA supply chain ranging from $500 to $650 million (Toronto Region Board of Trade 2018). In addition, freight congestion contributes to other externalities such as negative environmental impacts stemming from air pollution and noise (Chin 1996; Profillidis et al. 2014). The severity of traffic bottlenecks is worsening in the targeted study areas such as the United States and Canada, which leads to growing economic, environmental, and social costs (Hale et al. 2016). Furthermore, there are compelling reasons to focus on the broader environmental and social costs of freight bottlenecks at the network level, as failing to do so can result in land-use patterns that not only produce substantial negative externalities affecting local communities but also adversely impact economic activity through excessive constraints on the private sector (Holguin-Veras et al. 2021).

Given the diverse range of consequences of freight bottlenecks, their identification and ranking hold importance in devising policy-making strategies for mitigating disruption in goods movement while considering social and environmental impacts. In this pursuit, harnessing the potential of freight big data emerges as a compelling approach to detect and analyze freight transportation bottlenecks at a much larger scale. As such, this study aims to contribute a perspective by redefining freight bottlenecks at the network level (based on the network topology). To efficiently process network-level freight bottleneck information, the authors developed a parallel-connected components algorithm that enhances runtime and reduces data processing with a two-layer labeling and graph contraction approach. This database-parallel implementation demonstrates a significant reduction of 30% in data processing. Through the case studies, we bridge the gap between the proposed bottleneck identification methodology and practical applications, advancing the understanding of freight bottleneck identification at the network level. The study outcomes can help inform decision-makers, transportation planners, policymakers, and researchers on freight-related work with a novel network view.

The subsequent sections of the paper are organized as follows: “Related Literature” outlines past studies on bottleneck research. “Research Contributions” summarizes the research contribution for this paper. “Methodology” explains the proposed methodology for data-driven, network-level freight bottleneck identification, from definitions to the implementation of the algorithm, as well as the ranking and visualization. “Case Studies and Results” presents case studies and discusses the results of the identification and ranking of freight bottlenecks at various geographical scopes. Finally, “Conclusion” consolidates the findings and implications of our research on freight bottleneck. In addition, we discuss potential avenues for further research.

Related Literature

The analysis of traffic bottlenecks has been a significant area of research since its inception in 1969. For a comprehensive review of traffic bottleneck modeling, interested readers can refer to Li et al. (2020). Freight bottleneck is a unique type of traffic bottleneck that exclusively involves the analysis of truck freight mobility, which accounts for a subset of trips in the road networks (FHWA 2020). Although there has been a limited number of studies that focused on freight bottlenecks, this section provides a review of those studies as well as a selection of relevant studies on traffic bottlenecks. Our selection included those studies that addressed truck traffic as well as those that proposed identification and ranking methodologies with a focus on the data sources. Given the difference in data sources and analytic focus, the definition and identification of freight bottlenecks are not yet widely agreed upon. The selection is aimed to be diverse, incorporating a wide range of bottleneck detection methods based on different data properties and analytical frameworks. We summarize the reviewed literature in Table 1 where the information serves to contextualize the contributions of this paper. The table’s columns list various attributes of bottlenecks identified in multiple studies, including a column labeled ‘topology’ to represent network-level modeling of the freight bottlenecks. Throughout the paper, the terms ‘topological’ and ‘network-level’ will be used interchangeably.

Table 1 Summary of reviewed bottleneck studies

Numerous studies have contributed to the understanding of traffic/freight bottlenecks. Based on research from the Federal Highway Administration, White and Grenzeback (2007) explain how recurring and nonrecurring congestion relates to freight transportation bottlenecks and propose a methodology for freight bottleneck analysis, which includes (1) locating freight bottlenecks, (2) identifying truck volumes for the corresponding bottlenecks, and (3) estimating delay based on truck hours. Cambridge Systematics Inc. (2005) uses data from the Highway Performance Monitoring System (HPMS) to locate highway freight bottlenecks and rank them based on estimated truck hours of delay. Yuan et al. (2014) identify traffic bottlenecks at the micro level for specific roads, considering the number of vehicles in each lane and signal timing. Long et al. (2008) create a traffic simulation model and identify bottlenecks based on average journey velocity (AJV) at road links. Zhao et al. (2013) use truck probe GPS data to identify roadway bottlenecks based on speed and statistical predictability and rank them based on travel time reliability. Yue et al. (2018) use causal congestion trees and graphs to identify urban traffic bottlenecks based on the significance of road segments. Sohail et al. (2019) use cloud-based detection of traffic bottlenecks based on open-sourced OBD-II telematics data (Rettore 2018) and conduct an experiment during rush hour in a small geographical region. Yang et al. (2020) develop a bottleneck pattern identification model based on flow oversaturation probability on expressways in Beijing. Kouchakzadeh (2021) examines the impact of the COVID-19 pandemic on traffic congestion affecting commercial vehicles in identified highway bottlenecks across the Greater Toronto and Hamilton Area (GTHA). This investigation utilizes travel speed data provided by Geotab Inc., adopting a criterion whereby traffic flow moving at less than 66% of the free-flow speed is defined as congested. Zhao et al. (2019, 2021) develop a model for real-time detection of dynamic traffic bottlenecks by finding the most influential geographical grids, that is, those that will cause the greatest number of other grids to become congested in the future according to a congestion diffusion model. Chen et al. (2022) develop a dynamical traffic model for the periphery-downtown network based on the Vickrey bottleneck model (Vickrey 1969). The Vickrey model encapsulates traffic congestion by treating it as a system bottleneck. It allows for a nuanced understanding of traffic congestion by recognizing a limited throughput at a bottleneck and advocating for congestion-based user charges. Li et al. (2022) also integrate the Vickrey bottleneck into a day-to-day dynamic traffic model that accommodates autonomous vehicles.

From the review of the literature regarding the identification and ranking of traffic and freight bottlenecks, we have summarized our findings in Table 1. The different bottleneck identification methodologies can be primarily categorized based on three metrics: (1) capacity, (2) queue, and (3) speed. Capacity-based approaches focus on examining the supply–demand relationship at a road network level where traffic demand exceeds the road capacity, leading to congestion. Queue-based approaches, on the other hand, concentrate on identifying specific nodes or road segments in the network, where vehicles slow down and form queues. Speed-based approaches aim to identify roads where traffic speed falls below a measured benchmark. These three metrics for identifying bottlenecks are highly aligned and interconnected. For instance, road segments with low-speed vehicles caused by reduced speed limits could form queues downstream, significantly influencing the overall flow capacity of the road network (Soriguera et al. 2017). Furthermore, a spatial analysis of bottlenecks allows us to categorize them into two main spatial categories: (1) a moving/dynamic bottleneck, which occurs when slow vehicles disrupt upstream traffic flow, and (2) a stationary bottleneck, which refers to the blockages that may happen at certain parts of the road network (Daganzo 1997). Another categorization of bottlenecks is based on their frequency. Bottlenecks can be further divided into (1) recurrent bottlenecks, which occur frequently and predictably, and (2) non-recurrent bottlenecks, which are not as easy to predict and are more sporadic. Finally, bottlenecks can be analyzed based on the duration of their impact, whether they are long-term bottlenecks that are stable concerning time and geographical location, or short-term bottlenecks that are temporary and not observed as regular traffic disruption (Chen et al. 2014; Qi et al. 2019) (see Fig. 1).

Fig. 1 Types of traffic bottleneck analysis

Given the distinct data sources and lack of research focused exclusively on freight vehicles, the definition and identification of freight bottlenecks are not yet widely agreed upon. While there are many research papers related to general traffic bottlenecks, little has been done specifically focusing on bottlenecks related to freight transportation. Truck freight mobility is different from other modes of transport mainly due to operational restrictions and performance benchmarks. Therefore, according to the Federal Highway Administration (FHWA), the analysis of truck freight transportation is justified as a unique research area (FHWA 2020). Specifically, freight bottleneck is considered the study of the economic loss of goods stuck in traffic (FHWA 2020). There is established research such as American Transportation Research Institute (2022), which focuses on more than 300 highway intersections. However, to the best of the authors’ knowledge, no study has yet addressed the freight bottleneck at a network level and a large scale based on telematics data sources. The cited studies demonstrate diverse approaches employed by researchers and institutions, utilizing various data sources and analytical techniques for bottleneck identification and analysis. In this paper, we expand the definition of freight bottlenecks at the network level focusing on heterogeneous road types along with the bottleneck topology. In addition, our freight bottleneck ranking is based on a transportation utility function that accepts a variety of economic, social, and environmental inputs.

Research Contributions

Based on the review of literature in “Related Literature”, this work aims to advance the knowledge on freight bottleneck identification and ranking in the following three aspects: (1) network-based freight bottleneck definition, (2) parallel algorithm for freight bottleneck identification, and (3) utility-based and freight-specific bottleneck ranking.

  1. Network-based freight bottleneck definition: Existing methods for identifying bottlenecks often restrict their focus to fixed segments or points of the road network with homogeneous road types. This can lead to a fragmented understanding, missing out on the spatial and interconnected dynamics between different road types, especially the crucial links connecting high-freight capacity roads. Addressing this gap, this paper introduces a network-level definition that encompasses the entire road network for bottleneck identification. Our framework not only considers standard freight transportation metrics but also the topology of the freight bottleneck inside the road network.

  2. Parallel algorithm for freight bottleneck identification: By leveraging a novel database-parallel connected components algorithm, we enrich the transportation literature with a method that can be scaled to a very large road network. The implemented algorithm has complexity O(n) and is shown to always converge. In the case studies, the algorithm proves to be scalable and efficient, which facilitates a large-scale and data-driven bottleneck analysis.

  3. Utility-based and freight-specific bottleneck ranking: Traditionally, freight bottleneck analysis has predominantly revolved around metrics like truck hours of delay, often overlooking the more intricate economic and environmental facets of the issue. In this study, we pivot from this singular metric approach and introduce a ranking framework grounded in a utility function. This utility function permits customization, enabling stakeholders to prioritize freight-specific metrics, such as emissions, based on strategic objectives in transportation planning. To further enhance the accuracy of understanding bottleneck-induced economic losses, we present a metric, weight-based temporal delay cost (in “Weight-Based Temporal Delay Cost”), recognizing that trucks of varying sizes and loads in congestion have different levels of impact on the supply chain. In addition, we present freight bottleneck concentration (FBC) metrics (in “Freight Bottleneck Concentration (FBC) Metrics”), which use the network-level information to quantify the spatial extent of freight bottleneck influence on local roads.

Data Description

The proposed methodology requires telematics big data, specifically data from high-accuracy telematics devices that record raw GPS points (latitude and longitude) and timestamps. Equipment that satisfies this requirement includes telematics units, commercial GPS data loggers, and smartphones with advanced GPS functionality. On the spatial side, these devices can provide GPS accuracy of up to 2 m, which is crucial for effective map matching and for the fidelity of the snap-to-road algorithm. On the temporal side, to balance accuracy against data size, the telematics device in this study transmits only critical points based on the curve logging algorithm (Cawse 2005), which checks points of maximum error and sends only the critical points necessary to reconstruct the vehicle trip speed profiles. For the GPS data in this study, the average interval between GPS points ranges from 5 to 30 s, depending on the trip. If the GPS device lacks this capability, we recommend logging data at a fixed interval of at least 5 s and at most 30 s, which captures the speed profile of each trip while limiting the size of data transmission and enables the detection of congestion on the relevant road segments within the network. The methodology also requires the weight class of each vehicle for the proposed delay cost metric in “Weight-Based Temporal Delay Cost”. This big data ensures comprehensive capture of speed profiles for the road network throughout the time buckets in a day, and high-accuracy GPS points with timestamps ensure the effectiveness of the map-matching snap-to-road algorithm for matching telematics data to the road network. For this research, two datasets are used: (1) a map dataset based on the OpenStreetMap (OSM) (OpenStreetMap 2022) geographical database, and (2) an interpolated snap-to-road dataset.

  • The map dataset based on OSM contains graphical information about road networks with nodes and edges, as well as geographical information. Each edge represents a road segment based on OSM, and each node represents the connection between edges. OSM node and edge identification (Id) numbers are used later in “Bottleneck Identification with Parallel Algorithm” for graph representations.

  • The interpolated snap-to-road data are processed from raw telematics data from truck telematics using the curve logging algorithm (Cawse 2005) for efficient transmission of interpolated vehicle traversal information from telematics devices to servers without losing significant details (Saalfeld 1999). The authors incorporate a pre-existing snap-to-road algorithm (Lewis and Liu 2023) adapted for this study that transforms raw GPS points into traversals on a per-trip basis on OSM. While there are also APIs such as (Google Inc. 2021) that can implement snap-to-road functionality, this research utilizes its own tailored algorithm. The data are organized in a relational format, where each observation represents a trip traversal, characterized by variables such as speed, timestamp, direction, road segment length, and the sequence of that segment within the corresponding trip. Additional metadata, including device ID and vehicle weight class, are also captured within each observation. After processing, the dataset contains interpolated vehicle traversal data snapped to the map road segments.
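The idea of transmitting only the critical points of a speed profile can be illustrated with a generic line-simplification routine. The sketch below is a Douglas-Peucker-style stand-in and is not the curve logging algorithm of Cawse (2005); the `(time, speed)` point layout, the function names, and the tolerance value are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical layout: (time in seconds, speed in km/h).
Point = Tuple[float, float]


def interpolation_error(p: Point, start: Point, end: Point) -> float:
    """Absolute gap between p's speed and the straight line from start to end."""
    (t0, v0), (t1, v1), (t, v) = start, end, p
    if t1 == t0:
        return abs(v - v0)
    expected = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return abs(v - expected)


def critical_points(profile: List[Point], tolerance: float) -> List[Point]:
    """Keep only the points needed to rebuild the profile within `tolerance`."""
    if len(profile) < 3:
        return list(profile)
    start, end = profile[0], profile[-1]
    # Locate the interior point with the maximum interpolation error.
    worst_idx, worst_err = 0, 0.0
    for idx in range(1, len(profile) - 1):
        err = interpolation_error(profile[idx], start, end)
        if err > worst_err:
            worst_idx, worst_err = idx, err
    if worst_err <= tolerance:
        # The straight line start -> end is accurate enough.
        return [start, end]
    # Otherwise keep the worst point and simplify both halves recursively.
    left = critical_points(profile[: worst_idx + 1], tolerance)
    right = critical_points(profile[worst_idx:], tolerance)
    return left[:-1] + right


profile = [(0, 50.0), (5, 50.0), (10, 50.0), (15, 20.0), (20, 20.0), (25, 20.0)]
kept = critical_points(profile, tolerance=2.0)
```

In this toy profile the two constant-speed stretches collapse to their end points, so four of the six points are retained while the speed drop between 10 s and 15 s is preserved.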

In addition, in the data aggregation process, we ensure the aggregated dataset for the proposed methodology follows the data privacy policy of the telematics provider (Geotab Inc. 2021).

Methodology

This section presents a comprehensive methodology for identifying and ranking freight bottlenecks, complemented by a parallel-connected components algorithm to enable scalable network-level investigation for bottleneck based on telematics big data. All the variable notations used in the methodology throughout the paper are summarized in Appendix A. The methodology developed in this study begins with a definition of monthly freight bottlenecks, incorporating specific spatial and temporal features. Figure 2 provides a summary of the methodology in five main steps. Step 1 involves the aggregation and processing of raw telematics and road network data, creating aggregated cloud datasets for subsequent analysis. Steps 2 and 3 compute all freight traffic metrics. Step 4 identifies freight bottleneck using a parallel-connected components algorithm. Finally, Steps 5.1 and 5.2 involve the ranking and visualization of the identified freight bottlenecks.

Fig. 2 Flowchart of the proposed freight methodology for freight bottlenecks

Definition for Freight Bottleneck (Monthly Aggregation)

Freight bottlenecks, and traffic bottlenecks in general, have been defined in different ways by previous research studies. The choice of bottleneck identification methodologies is often determined by the nature of the research, as well as the availability, scope, and type of data. Based on the three types of approaches mentioned in “Related Literature”, our model applies the speed-based bottleneck identification approach due to the nature of the telematics dataset (as discussed in “Methodology”). In addition, given the limited visibility of the total number of vehicles on the road, capacity-based detection of freight bottlenecks is not suitable, as it becomes challenging to accurately measure or estimate the road capacity. Similarly, queue-based identification requires comprehensive visibility of the number of vehicles on the road, which is not achievable with the subset of trucks that have telematics installed. However, thanks to the ongoing development in telematics big data, an increasing number of vehicles are connected, which allows a broader analysis of the freight movement with historical and real-time information (Daganzo et al. 2020; BCC Research 2022).

To establish a cohesive, network-level definition based on 1 month of data, we present the following definition that serves as a foundation for the proposed network-based identification and ranking methodology. A freight bottleneck is a sub-network in which freight vehicle movement is disrupted for a given time bucket. The sub-network considered in this definition can encompass any road type (for freight vehicles) within the road network, and the chosen time bucket should be sufficiently granular to avoid encapsulating periodic traffic waves. For instance, daily (24-h) time bucket is impermissible as it contains a full cycle of morning and evening peak traffic periods. By isolating these periods, the methodology can more accurately capture the unique characteristics and dynamics of traffic flow at a more granular level. In this research, a 1-h time bucket for monthly aggregation is implemented. Details of the monthly data aggregations are discussed in “Data Processing and Aggregation”. The definition allows the capture of an additional layer of network-level information for freight movement disruption.
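As an illustration of the time-bucket part of this definition, the following minimal sketch maps a timestamp to a bucket \([a,b)\) and a weekday/weekend flag \(n\). The function name and the encoding \(n=1\) for weekends are assumptions; the paper only states that \(n\) is binary.

```python
from datetime import datetime
from typing import Tuple


def time_bucket(ts: datetime, bucket_hours: int = 1) -> Tuple[int, int, int]:
    """Return (a, b, n): the hour-of-day bucket [a, b) and the weekend flag n.

    Assumption: n = 1 on Saturday/Sunday and n = 0 otherwise.
    """
    if bucket_hours <= 0 or 24 % bucket_hours != 0:
        raise ValueError("bucket_hours must be a positive divisor of 24")
    a = (ts.hour // bucket_hours) * bucket_hours
    b = a + bucket_hours
    n = 1 if ts.weekday() >= 5 else 0  # Monday is 0, Sunday is 6
    return a, b, n


# A Tuesday-morning traversal falls in the 10-11 a.m. weekday bucket.
example = time_bucket(datetime(2022, 3, 15, 10, 42))
```

With the 1-h bucket used in this research, every traversal in a month is assigned to one of 24 weekday buckets or 24 weekend buckets, which keeps the morning and evening peaks in separate groups.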

The identification model uses levels of speed below the minimum of free-flow speed and speed limit as a detection framework, and the delay cost of freight bottlenecks is based on a weight-based temporal delay cost metric detailed in “Weight-Based Temporal Delay Cost”. The speed margin detection discussed in “Freight Traffic Disruption Identification”, along with the freight bottleneck detection algorithm in “Bottleneck Identification with Parallel Algorithm”, groups road segments to form a network-level bottleneck.

Data Processing and Aggregation

Data processing and aggregation are integral in analyzing large datasets, enabling the extraction of meaningful insights, promoting data quality and accuracy, and protecting privacy. The proposed methodology aggregates the two datasets mentioned in “Methodology” to a monthly freight traffic dataset, for 1-h time buckets, providing monthly freight bottlenecks considering data privacy (Duri et al. 2002). The choice of aggregation period is based on operational relevance, freight traffic dynamics, and the trade-off between granularity and data volume. It is worth noting that during the aggregation process, road types unrelated to truck routes in OSM map road classification (OpenStreetMap Wiki contributors 2022) are omitted. As shown in Table 2, the columns in the aggregated monthly dataset can be categorized into groups: period, geography, road segment, time bucket, speed, travel time, and freight load. The aggregation is performed within Google BigQuery based on the formulas shown in Table 2. The variable notations in the table are detailed in Appendix A. For simplicity, the formulas omit the subscripts for various variables by assuming the data are grouped by and aggregated based on the bold metrics in Table 2. For example, average speed \({\overline{v} }_{e,d,a,b,n}\) on road segment \(e\) on direction \(d\) in the time bucket \(\left[a,b\right)\) categorized by weekday or weekend using binary variable \(n\) is simplified to notation \(\overline{v }\).

Table 2 Aggregated monthly freight traffic dataset for \(\forall e \in E\) at time bucket \(t\in \left[a,b\right)\) in direction \(d\), categorized by weekday or weekend using \(n\); the set of all traversals is \(L\)

Measuring delay cost for passenger vehicles often involves the utilization of metrics such as delay per person or vehicle volume. However, for freight transportation involving trucks, evaluating truck delays based on the weights of goods carried and the time delay (Coyle et al. 2011) offers a deeper insight. By utilizing a load-based measurement, one can better quantify the economic implications of delays. This method is especially relevant because trucks with different payload capacities carry significantly different loads of goods. Since the telematics device does not measure payload information for trucks in each trip, we use payload capacity as weight based on the class of the truck as the upper bound for the weight of freight carried. The maximum payload capacity for all classes of vehicles is based on a report from the United States Department of Energy (Transportation Research Board, National Research Council et al. 2010; United States Department of Energy Vehicle Technologies Office 2010). Such an upper bound for truck payload enables the benchmark of delay cost based on weight instead of truck volume on the road.

All the aggregated speed metrics shown in Table 2 are computed based on the spot speed \({\widehat{v}}_{i,e,t,d}\) (maximum observed speed of the vehicle \(i\) in each road segment \(e\) grouped by time buckets \(t\) and weekday or weekend \(d\)) instead of travel speed (measured by \(\frac{l(e)}{{t}_{i,e,d}^{0}-{t}_{i,e,d}^{1}}\)) (Wolshon et al. 2016). Aggregation based on the spot speed throughout the road segment aids the measurement of free-flow speed without traffic control (signals, stop signs, etc.) in “Benchmark Speed”.
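The grouping logic of the monthly aggregation can be sketched in plain Python as follows. This is an illustrative stand-in for the BigQuery aggregation, not the formulas of Table 2: the `Traversal` record, the output field names, and the nearest-rank percentile are all assumptions.

```python
from collections import defaultdict
from math import ceil
from statistics import mean
from typing import Dict, List, NamedTuple, Tuple


class Traversal(NamedTuple):
    """Hypothetical snap-to-road record for one vehicle on one road segment."""
    edge: int                # OSM edge id e
    direction: int           # direction d
    bucket: Tuple[int, int]  # time bucket [a, b) in hours of day
    weekend: int             # binary flag n
    spot_speed: float        # spot speed, km/h
    travel_time: float       # travel time on the segment, s
    payload_kg: float        # payload capacity of the vehicle's weight class


def percentile_85(values: List[float]) -> float:
    """Nearest-rank 85th percentile (an assumed percentile convention)."""
    ordered = sorted(values)
    return ordered[ceil(0.85 * len(ordered)) - 1]


def aggregate(traversals: List[Traversal]) -> Dict[tuple, dict]:
    """Group traversals by (e, d, [a, b), n) and compute per-group metrics."""
    groups = defaultdict(list)
    for tr in traversals:
        groups[(tr.edge, tr.direction, tr.bucket, tr.weekend)].append(tr)
    return {
        key: {
            "samples": len(rows),
            "avg_speed": mean(r.spot_speed for r in rows),
            "p85_speed": percentile_85([r.spot_speed for r in rows]),
            "avg_travel_time": mean(r.travel_time for r in rows),
            "total_payload_kg": sum(r.payload_kg for r in rows),
        }
        for key, rows in groups.items()
    }
```

Each output row corresponds to one road segment, direction, time bucket, and weekday/weekend flag, which is the granularity at which the later speed and delay metrics are evaluated.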

Benchmark Speed

Free-flow speed, also known as operating speed, is the speed at which a vehicle travels under uncongested or ideal conditions (Toole 2009). For bottleneck analysis, a key metric for freight speed margin identification is the benchmark speed, against which the actual speed in different time buckets is compared (Spiller et al. 2017). The margin (speed difference) based on the benchmark speed is fundamental for the speed-based bottleneck identification algorithm in “Bottleneck Identification with Parallel Algorithm”. This model defines the benchmark \({\widetilde{v}}_{e,d,a,b,n}=\text{min}\left(ff{s}_{e,d,a,b,n},{v}_{e,d,a,b,n}^{*}\right)\), where \(ff{s}_{e,d,a,b,n}\) is the free-flow speed and \({v}_{e,d,a,b,n}^{*}\) is the speed limit on the road segment. The methodology filters the vehicle type to include all classes of trucks and exclude passenger cars. The free-flow speed

$$ff{s}_{e,d,a,b,n}=\text{max}\left\{{v}_{e,d,a,b,n}^{.85} \mid \forall \left[a,b\right),n\right\} \tag{1}$$

is calculated as the maximum of all 85th percentile at all time buckets, \([a,b)\), for weekdays or weekends, \(n\), from road segment \(e\) in the direction, \(d\), by the definition from Federal Highway Administration (FHWA) (Toole 2009). To ensure the reliability of the benchmark calculation, we exclude road segments with a sample size of less than 50 for a month, as such instances may be influenced significantly by individual vehicle behavior. The presence of an insufficient minimum sample size for speed detection could lead to inaccurate results (Varsha et al. 2016). After conducting a series of tests with sample size thresholds ranging from 10 to 150, we established that a minimum of 50 instances per month strikes an optimal balance. This threshold minimizes the distortion that outliers can cause in the aggregated data, such as idling instances that disproportionately affect travel time on free-flowing segments. It also allows for the inclusion of congested segments from lower level roads, essential for a complete view of network-wide freight traffic disruptions. Consequently, segments that recorded fewer than 50 trip traversals or no data are excluded from the road network for the purpose of this study.
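Equation (1) and the benchmark definition can be combined in a short sketch. The function names and the dictionary keyed by `(a, b, n)` are assumptions for illustration; the threshold of 50 monthly samples is taken from the text above.

```python
from typing import Dict, Optional, Tuple

MIN_MONTHLY_SAMPLES = 50  # minimum traversals per segment per month (from the text)


def free_flow_speed(p85_by_bucket: Dict[Tuple[int, int, int], float]) -> float:
    """Eq. (1): maximum 85th-percentile speed over all buckets [a, b) and n."""
    return max(p85_by_bucket.values())


def benchmark_speed(
    p85_by_bucket: Dict[Tuple[int, int, int], float],
    speed_limit: float,
    monthly_samples: int,
) -> Optional[float]:
    """min(free-flow speed, speed limit), or None for an excluded segment."""
    if monthly_samples < MIN_MONTHLY_SAMPLES or not p85_by_bucket:
        return None  # too few observations: segment is dropped from the network
    return min(free_flow_speed(p85_by_bucket), speed_limit)


# Keys are (a, b, n); values are 85th-percentile speeds in km/h (made-up numbers).
p85 = {(2, 3, 0): 95.0, (8, 9, 0): 60.0, (17, 18, 0): 55.0}
capped = benchmark_speed(p85, speed_limit=90.0, monthly_samples=320)
```

In this made-up example the overnight bucket gives a free-flow speed of 95 km/h, so the 90 km/h speed limit becomes the benchmark; with a 100 km/h limit the benchmark would be the free-flow speed instead.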

Freight Traffic Disruption Identification

After determining the benchmark speed \({\widetilde{v}}_{e,d,a,b,n}\) for all the road segments within the freight transportation road networks, we employ a speed-based detection approach for freight traffic disruption based on the average speed \({\overline{v} }_{e,d,a,b,n}\). This divides the levels of speed margins against the benchmark speed for freight transportation into four levels based on the ratio of the actual aggregated speed at specific time buckets to the identified benchmark speed in “Benchmark Speed”. In Table 3, freight traffic disruption levels greater than or equal to one are considered freight congestion. In the table, we have chosen to present the speed variables without their subscripts for simplicity. The full notation can be found in Appendix A.

Table 3 Different levels of freight traffic disruption

To retrieve the disrupted network from the database, the model queries a sub-network comprising only those road segments with freight traffic disruption levels equal to or greater than one, so that the algorithm in “Bottleneck Identification with Parallel Algorithm” focuses only on the disrupted road segments. For real-world applications, the thresholds used to detect speed margins could be customized based on regional standards to adapt the methodology to specific contexts and scenarios. In addition, as in “Benchmark Speed”, the methodology excludes road segments with a sample size of less than 50 in a month.
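The ratio-based classification and the sub-network query can be sketched as below. The cut-off values are hypothetical placeholders and are not the thresholds of Table 3; the numbering of the four levels from 0 to 3 and the function names are likewise assumptions. As noted above, the cut-offs are meant to be customized.

```python
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL cut-offs on avg_speed / benchmark_speed, not the values of Table 3.
ASSUMED_CUTOFFS: Tuple[float, ...] = (0.75, 0.50, 0.25)


def disruption_level(
    avg_speed: float,
    benchmark: float,
    cutoffs: Sequence[float] = ASSUMED_CUTOFFS,
) -> int:
    """Return 0 (no disruption) up to len(cutoffs) (most severe)."""
    if benchmark <= 0:
        raise ValueError("benchmark speed must be positive")
    ratio = avg_speed / benchmark
    # The level is the number of cut-offs that the speed ratio falls below.
    return sum(1 for cutoff in cutoffs if ratio < cutoff)


def disrupted_subnetwork(
    segments: Dict[int, Tuple[float, float]],
    cutoffs: Sequence[float] = ASSUMED_CUTOFFS,
) -> List[int]:
    """Ids of segments whose disruption level is at least one."""
    return [
        seg_id
        for seg_id, (avg_speed, benchmark) in segments.items()
        if disruption_level(avg_speed, benchmark, cutoffs) >= 1
    ]


# segment id -> (average speed, benchmark speed) in km/h, made-up values
speeds = {101: (80.0, 90.0), 102: (60.0, 90.0), 103: (10.0, 90.0)}
disrupted = disrupted_subnetwork(speeds)
```

Only the segments returned by `disrupted_subnetwork` would be passed on to the connected components step, which keeps the graph to be labeled much smaller than the full road network.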

Weight-Based Temporal Delay Cost

To measure the severity of freight bottleneck, one of the most widely used metrics, especially for carriers, is delay cost (Margiotta et al. 2015). To measure delay and compare it with other freight bottlenecks, the truck volume-based methods (Cambridge Systematics Inc. 2005; American Transportation Research Institute 2022) may not capture the difference in payload capacity among different classes of trucks. To address this limitation, this paper introduces a novel metric for measuring delay cost based on weight. The unit measure for freight transportation performance is \(\text{kg}\cdot \text{s}\) [\(\text{kilogram}\times \text{second}\)]. The new metric is referred to as weight-based temporal delay cost, denoted by \(C\), and is calculated as follows (notations detailed in Appendix A):

$${C}_{e,d,a,b,n}=\left({\overline{t} }_{e,d,a,b,n}-\frac{l(e)}{{\widetilde{v}}_{e,d,a,b,n}}\right){W}_{e,d,a,b,n} \tag{2}$$

where \({W}_{e,d,a,b,n}=\sum {\widehat{w}}_{i,e,t,d}\). The weight-based temporal delay cost is assessed for a road segment \(e\) in direction \(d\) in the time bucket \([a,b)\) on a weekday or weekend \(n\). This cost is the product of two factors: the total maximum payload capacity of all vehicles traversing the segment, \({W}_{e,d,a,b,n}\), and the time delay on the road segment. The delay is the difference between the average travel time \({\overline{t} }_{e,d,a,b,n}\) and the travel time at the benchmark speed, \(l(e)/{\widetilde{v}}_{e,d,a,b,n}\). For example, the weight-based delay cost of one Class 2 vehicle with a maximum payload capacity of \(4000\text{ lbs}\) (\(1814\text{ kg}\)) on a road segment with an average travel time of 20 s at 10–11 a.m. and a benchmark speed travel time of 15 s is \({C}_{e,d,a,b,n}=(20-15)\times 1814=9070\) \(\text{kg}\cdot \text{s}\). It is worth noting that, because the maximum payload capacity is an upper bound on the carried weight, the resulting value is an upper bound on the weight-based delay cost. In the conclusion section, we discuss possible research directions to incorporate the real carrying load.
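Equation (2) translates directly into a small function. In the sketch below, the function and parameter names are assumptions, and the segment length of 300 m and benchmark speed of 20 m/s are chosen only so that the benchmark travel time equals the 15 s of the worked example.

```python
def delay_cost(
    avg_travel_time_s: float,
    length_m: float,
    benchmark_speed_mps: float,
    total_payload_kg: float,
) -> float:
    """Weight-based temporal delay cost C in kg*s, following Eq. (2)."""
    if benchmark_speed_mps <= 0:
        raise ValueError("benchmark speed must be positive")
    benchmark_time_s = length_m / benchmark_speed_mps  # l(e) / benchmark speed
    return (avg_travel_time_s - benchmark_time_s) * total_payload_kg


# One Class 2 vehicle (1814 kg payload capacity), 20 s average travel time,
# and an assumed 300 m segment at 20 m/s, i.e. 15 s at the benchmark speed.
cost = delay_cost(
    avg_travel_time_s=20.0,
    length_m=300.0,
    benchmark_speed_mps=20.0,
    total_payload_kg=1814.0,
)
```

The call reproduces the 9070 kg·s of the example. The function applies Eq. (2) as written, so a segment whose average travel time is shorter than its benchmark travel time would yield a negative value; such segments are not part of the disrupted sub-network in any case.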

Bottleneck Identification with Parallel Algorithm

Following “Definition for Freight Bottleneck (Monthly Aggregation)”, identifying freight bottlenecks amounts to detecting the sub-networks of the road network that experience freight traffic disruptions. Because a road network is a graph, the freight bottleneck detection model must scale to large graphs within a practical computational time. Serial implementations of the connected components algorithm, such as depth-first search (DFS) or breadth-first search (BFS), are not suitable for large road networks stored in a database. We therefore propose a parallel implementation of the connected components algorithm that runs directly in the cloud database.

The connected components algorithm has been researched and implemented in various domains, notably in image processing for pattern recognition (He et al. 2017). Previous research implemented the parallel-connected components algorithm at a large scale on shared memory devices using OpenMP (Chandra et al. 2001; Niknam et al. 2010; Chapman and Kalyanaraman 2011; Gupta et al. 2014; Slota et al. 2014; Zhang et al. 2020; Manne and Patwary 2022) or distributed memory devices using MPI (Buš and Tvrdík 2001; McLendon III et al. 2005; Plimpton and Devine 2011; Gianinazzi et al. 2018; Message Passing Interface Forum 2021; Lamm and Sanders 2022). Computational and algorithmic analysis is essential due to the immense size of the telematics dataset, which is predominantly stored in a cloud database environment. Therefore, this step implements the algorithm directly in a database environment using Google BigQuery with database parallelism (Google Inc. 2022). In addition, the implementation of the parallel-connected components algorithm can link with all other steps in the model in a cloud database to enable the recursive generation of freight insights monthly as a data pipeline. A core idea behind this approach is to decompose the large-scale road network graph into manageable independent freight bottlenecks sub-graphs (Sigurdsson 2018). Subsequently, for each independent freight bottleneck, attributes can be calculated and ranked based on the algorithm and the pre-computed road segment attributes.

Consider a road network as a directed graph \(G=(V,E)\), where \(V\) is the set of nodes with unique Ids according to OSM. To simplify the graph representation and fit the algorithm implementation, an edge is represented as a 2-element node tuple \((v,u)\), for \(v,u\in V\), using the starting and ending node Ids instead of the edge Id defined in OSM. Theorem 1 states the identification of the connected components in the graph can be captured by the feasible set of freight-congested sub-networks:

Theorem 1

(Fan and Golari 2014). Let \(G=(V,E)\) be a graph with \(m\) connected components \({H}_{1},{H}_{2},\dots ,{H}_{m}\). \({V}_{k}\) is the set of nodes in connected component \({H}_{k}\) and \({E}_{k}\) is the set of edges represented by 2-element tuples \((v,u)\) where \(v,u\in {V}_{k}\), \(k=1,2,\dots ,m\). Let \({z}_{ik}\in \{0,1\}\) be a decision variable such that \({z}_{ik}=1\) if \({v}_{i}\in {V}_{k}\), and \({z}_{ik}=0\) otherwise. Let \({x}_{ij}\) be a decision variable such that \({x}_{ij}=1\) if \({v}_{i}\in {V}_{k}\), \({v}_{j}\in {V}_{k}\), and \(i\ne j\), and \({x}_{ij}=0\) otherwise. The feasible set of the formulation (3)–(10) will find all the connected components in \(G\) such that the vertex set of \({H}_{k}\) can be expressed as \({V}_{k}=\{{v}_{i}\in V:{z}_{ik}=1\}\):

$${x}_{{i}_{1}{i}_{2}}+{x}_{{i}_{2}{i}_{3}}-{x}_{{i}_{1}{i}_{3}}\le 1$$
(3)
$${x}_{{i}_{1}{i}_{2}}-{x}_{{i}_{2}{i}_{3}}+{x}_{{i}_{1}{i}_{3}}\le 1$$
(4)
$$-{x}_{{i}_{1}{i}_{2}}+{x}_{{i}_{2}{i}_{3}}+{x}_{{i}_{1}{i}_{3}}\le 1$$
(5)
$$\forall {i}_{1},{i}_{2},{i}_{3}\;|\;{i}_{1}\ne {i}_{2}\ne {i}_{3},\;{v}_{{i}_{1}},{v}_{{i}_{2}},{v}_{{i}_{3}}\in V$$
(6)
$${x}_{ij}=\sum_{k=1}^{m}{z}_{ik}{z}_{jk}$$
(7)
$$\sum_{k=1}^{m}{z}_{ik}=1$$
(8)
$${z}_{ik},{x}_{ij}\in \{0,1\}$$
(9)
$$\forall i,j\;|\;{v}_{i},{v}_{j}\in V,\;k=1,2,\dots ,m$$
(10)

Theorem 1 proves the completeness of the algorithm based on the network-level formulation of the freight bottleneck, effectively identifying all elements within the defined problem space.
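To make the formulation concrete, the sketch below checks whether a candidate assignment of nodes to components satisfies constraints (3)–(9). It is an illustrative feasibility check only, not a solver, and the function names are hypothetical. It derives \({x}_{ij}\) from \({z}_{ik}\) through Eq. (7) and then tests the assignment constraint (8) and the three triangle inequalities (3)–(5) over all ordered triples of distinct nodes.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple


def membership(components: List[Set[int]], nodes: List[int]) -> Dict[int, List[int]]:
    # z[i][k] = 1 if node i is assigned to component k, else 0.
    return {v: [1 if v in comp else 0 for comp in components] for v in nodes}


def pair_indicator(z: Dict[int, List[int]], nodes: List[int]) -> Dict[Tuple[int, int], int]:
    # Eq. (7): x_ij = sum_k z_ik * z_jk for i != j.
    return {
        (i, j): sum(a * b for a, b in zip(z[i], z[j]))
        for i in nodes for j in nodes if i != j
    }


def satisfies_formulation(components: List[Set[int]]) -> bool:
    nodes = sorted(set().union(*components))
    z = membership(components, nodes)
    x = pair_indicator(z, nodes)
    # Eq. (8): every node belongs to exactly one component.
    if any(sum(z[v]) != 1 for v in nodes):
        return False
    # Eq. (9): x must be binary.
    if any(val not in (0, 1) for val in x.values()):
        return False
    # Eqs. (3)-(6): triangle inequalities over distinct triples.
    for a, b, c in permutations(nodes, 3):
        if x[a, b] + x[b, c] - x[a, c] > 1:
            return False
        if x[a, b] - x[b, c] + x[a, c] > 1:
            return False
        if -x[a, b] + x[b, c] + x[a, c] > 1:
            return False
    return True


print(satisfies_formulation([{1, 2, 3}, {4, 5}]))  # True
print(satisfies_formulation([{1, 2}, {2, 3}]))     # False: node 2 is in two components
```

The triangle inequalities encode transitivity: if \({v}_{{i}_{1}}\) shares a component with \({v}_{{i}_{2}}\), and \({v}_{{i}_{2}}\) with \({v}_{{i}_{3}}\), then \({v}_{{i}_{1}}\) and \({v}_{{i}_{3}}\) must share one as well.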

Proposed Algorithms

Before applying the formulation of Theorem 1 to identify freight bottlenecks with the connected components algorithm (detailed in Algorithm 2), a network of freight traffic disruption is constructed. Algorithm 1 constructs an undirected graph from the road segments with freight traffic disruption identified in “Freight Traffic Disruption Identification”. This approach selects the subset of road segments experiencing freight traffic disruption together with their topological connections, from which the topology of the freight bottleneck can be identified. The choice is rooted in the network-based approach and the relevant metrics: roads with congestion in only one direction do not affect the final ranking of freight bottlenecks, since non-disrupted road segments do not add to the delay of the freight bottleneck sub-network. Another assumption for the road segment topology is that bi-directional open roads are connected and related, while closed roads are disconnected and unrelated. This assumption rests on the premise that identifying network-level (topological) information for bottlenecks requires accurately modeling the inter-connectivity of the road network for open and closed roads. The assumption is also consistent with the data and can be implemented with OSM, as closed roads have unique node and edge Ids for different directions, while open roads have the same node and edge Ids with a binary direction variable, as noted in Table 2.

Algorithm 1 shows the parallel construction of the undirected graph. Based on the connectivity assumption, the graph modeling simplifies the connectivity without losing the road network topological information needed to identify bottlenecks, and the undirected graph can be fed into Algorithm 2 for efficient parallel processing. The algorithm builds the undirected graph through operations on every vertex \(v\in V\). For each neighboring node \(u\in N(v)\), where \(N(v)=\{u|{x}_{vu}=1\}\), if the edge is not bi-directional, the algorithm inserts an additional edge in the opposite direction into the edge set of the original graph.

Algorithm 1 Parallel generation of an undirected graph
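The step described for Algorithm 1 can be sketched in Python as follows. This is an illustration under the assumption that the disrupted road segments are available as a collection of directed \((v,u)\) node-Id tuples. The function name is hypothetical, and a set comprehension stands in for the per-vertex parallel operation that the database would execute.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def to_undirected(edges: Iterable[Edge]) -> Set[Edge]:
    """Return an edge set in which every edge (v, u) also has (u, v)."""
    edge_set = set(edges)
    # For each edge that is not bi-directional, add the opposite direction.
    missing_reverse = {(u, v) for (v, u) in edge_set if (u, v) not in edge_set}
    return edge_set | missing_reverse


directed = [(1, 2), (2, 3), (3, 2)]  # (1, 2) has no reverse edge yet
print(sorted(to_undirected(directed)))
# [(1, 2), (2, 1), (2, 3), (3, 2)]
```

Each edge is inspected independently of the others, which is what makes the step straightforward to parallelize.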

Algorithm 2 describes the implementation of the parallel-connected component algorithm using a parallel 2-layer node labeling and graph contraction. This generates the connectivity information within the network such that all the road segments within the same freight bottleneck carry the same label at the end of the algorithm. The algorithm implements a graph contraction (or edge contraction) (Philips 1989) and 2-layer labeling adopted from a two-pass (or two-scan) connected component labeling (Hernandez-Belmonte et al. 2011). This approach enables its implementation in the database. The 2-layer labeling is the algorithm speed-up tactic implemented for efficient processing (compared to 1-layer labeling). The 2-layer labeling consists of local and global labeling. Local labeling scans the graph to create provisional labels for graph contraction, while global labeling assigns labels at the current iteration to the original graph vertices. All the mapping functions for labeling are stored as partitioned tables, distinguishing them from previous works.

Algorithm 2 Database parallel-connected component algorithm with 2-layer node labeling

The algorithm begins by creating a copy of the original graph, which is reconstructed in every iteration. During each iteration, the graph is contracted based on identical labels (which are determined by the minimum neighboring node Id) until it becomes an empty set, at which point all connected nodes possess the same label. The algorithm initializes the global labeling function \({f}_{g}\) to map each node Id to itself. For every vertex \({v}_{\text{update}}\) in each iteration, the algorithm searches in parallel for the minimum node Id among the neighboring nodes, including the node itself. It then updates the global labeling function \({f}_{g}\) and creates a local labeling function \({f}_{l}\) based on the current minimum neighboring node Id. Based on the local labeling \({f}_{l}\), the algorithm contracts the copied graph by replacing each node Id with the current node label. The edges then form a multiset, with duplicates arising from replacing node Ids with minimum neighboring node Ids. To remove the duplicates and form a new contracted graph, the algorithm takes the support of the multiset \({E}_{\text{update}}\) and removes the self-loops, which yields the new vertex and edge sets after contraction. The algorithm stops when the updated graph \({G}_{\text{update}}\) is empty, which indicates the completion of graph contraction.
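The loop described above can be sketched in plain Python. This is a serial, in-memory illustration of the labeling and contraction logic under stated assumptions, not the database implementation: the function name is hypothetical, Python sets and dictionaries stand in for the partitioned tables, and the per-vertex minimum search that the database runs in parallel is written as an ordinary loop.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def connected_components_by_contraction(edges: Iterable[Edge]) -> Tuple[Dict[int, int], int]:
    """Label every node with the minimum node Id of its component.

    Returns (global labeling f_g, number of iterations).
    """
    # Copy of the graph: symmetric edge set without self-loops.
    e_update: Set[Edge] = {(v, u) for v, u in edges if v != u}
    e_update |= {(u, v) for v, u in e_update}
    # Global labeling f_g starts as the identity on node Ids.
    f_g = {v: v for edge in e_update for v in edge}
    iterations = 0
    while e_update:
        iterations += 1
        # Local labeling f_l: minimum Id over each node and its neighbors.
        f_l: Dict[int, int] = {}
        for v, u in e_update:
            f_l[v] = min(f_l.get(v, v), u)
        # Update the global labels of the original nodes.
        f_g = {node: f_l.get(label, label) for node, label in f_g.items()}
        # Contract: relabel endpoints; the set removes duplicates and the
        # filter removes self-loops.
        e_update = {(f_l[v], f_l[u]) for v, u in e_update if f_l[v] != f_l[u]}
    return f_g, iterations


labels, n_iter = connected_components_by_contraction([(5, 3), (3, 1), (7, 8)])
print(labels)  # {5: 1, 3: 1, 1: 1, 7: 7, 8: 7}  (key order may vary)
print(n_iter)  # 2
```

In this sketch, a path whose node Ids increase along the path, such as `[(1, 2), (2, 3), (3, 4)]`, shrinks by only one node per iteration and needs three iterations, whereas the non-sequential Ids in the first example collapse faster.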

The 2-layer labeling approach in the database parallelism for the connected components algorithm offers two significant advantages. (1) Reduced computational overhead: in database parallelism, table joining and table repartitioning are two computationally expensive operations (Google Inc. 2023). By employing a 2-layer labeling strategy, the data size is reduced as the road network graph contracts with each iteration. This reduction in data size decreases the computational time required for graph contraction, especially when joining the local labeling table with the updated graph table. The reduced computational overhead improves the efficiency of the connected components algorithm and shortens the processing time. (2) Efficient storage utilization: as the algorithm progresses through iterations, the buffer for storing local labeling shrinks, reducing the required storage space. This efficient storage management results in better performance and avoids potential storage bottlenecks that could impact the algorithm execution speed.

Theorem 2 establishes the convergence of the proposed algorithm in finite iterations. It shows that the implementation of Algorithm 2 will work for any road network graph as long as the node Ids in the graph are unique. The uniqueness of node Ids based on OSM and the minimum neighboring node Id labeling strategy guarantee that the parallel-connected components Algorithm 2 terminates in finite iterations, without encountering deadlocks. The proof of Theorem 2 is based on strong induction.

Theorem 2

Let \(G=(V,E)\) be an undirected graph with a set of nodes \(V\) having unique Ids and a set of edges \(E\) represented by 2-element tuples \((v,u)\), where \(v,u\in V\). Then, Algorithm 2 implemented on a finite graph \(G\) (any subset of the whole road network) will always terminate.

The worst-case complexity for the parallel implementation of the connected components algorithm is \(n-1\) iterations, which can be expressed as \(O(n)\), where \(n\) represents the maximum number of nodes across all freight bottlenecks. Consequently, the number of iterations scales linearly with the size of the largest freight bottleneck (in number of nodes) within a given time bucket. However, in practical scenarios, the number of iterations for the parallel-connected components is expected to be significantly lower than the worst case. This is primarily attributed to the non-sequential nature of node Ids along roads in OSM data, which enables much faster graph contraction than sequential node Ids would. An experiment demonstrates this in “Case Studies and Results”.

After executing Algorithm 1 and Algorithm 2, the resulting dataset contains the network-level information for freight bottlenecks represented by a bottleneck Id. The bottleneck Id is assigned to each connected component, corresponding to the minimum node Id of that component. After obtaining the network information for freight transportation bottlenecks, the next step is to merge results into one dataset including benchmark speed from “Benchmark Speed”, freight traffic disruption in “Freight Traffic Disruption Identification”, weight-based temporal delay cost from “Weight-Based Temporal Delay Cost”, and the bottleneck Id in “Bottleneck Identification with Parallel Algorithm” back to the aggregated dataset in Table 2. Then, the dataset contains various metrics along with the topological relationship based on road segments.

Toy Example of Parallel Implementation

Figure 3 shows an example of the parallel implementation of the proposed connected components algorithm based on the graph contractions discussed in Algorithm 2. The tables in Fig. 3 show the corresponding representation of an undirected graph where the sequence of node labels is inconsequential. The label table for the current iteration is determined based on the minimum neighboring node Id. The graph edges represent the road segments that have been identified with freight traffic disruption as discussed in “Freight Traffic Disruption Identification”.

Fig. 3 A toy example of database-parallel implementation of connected components to identify freight transportation bottleneck

In each iteration of the algorithm, the first step identifies the local labeling for the current iteration of the graph. The global labeling is then updated, and the graph is contracted based on the groups of nodes with the same local labeling. In the post-contraction updated graph table, duplicates and self-loop edges are removed, creating a new graph for the subsequent iteration. The algorithm proceeds iteratively until it reaches a point similar to Fig. 3(f), where further graph contraction would lead to an empty graph, i.e., \({G}_{\text{update}}=\varnothing\). The algorithm terminates when the graph table is empty.

Freight Bottleneck Concentration (FBC) Metrics

With the network-level information identified through our proposed methodology, we present the freight bottleneck concentration (FBC) metric to quantify the extent of freight bottleneck influence at the network level on nearby roads. This metric leverages the principle of integrating freight considerations into land-use planning for quantitative assessment of network-level impacts.

Let \(l(H)={\sum }_{e\in H}l(e)\) denote the length of the freight bottleneck \(H\), and \(\text{Conv}(H){\bigcap }G\) represent the road segments from the full road network that fall within the minimum convex hull of the freight bottleneck, which also belongs to the set of freight-significant road segments mentioned in “Benchmark Speed”.

We define the freight bottleneck concentration metric (\({\alpha }_{H}\)) as

$${\alpha }_{H}=\frac{l(H)}{l(\text{Conv}(H)\bigcap G)}$$
(11)

The metric \({\alpha }_{H}\) quantifies the degree to which a given road network is affected by freight bottlenecks, with larger values indicating a higher concentration of freight bottlenecks at a local level. The convex hull is obtained by computing the minimum convex geographical boundary that covers the sub-network of bottleneck \(H\). Then, the road segments that lie inside are filtered through the intersection of \(\text{Conv}(H)\) and the full road network dataset \(G\). Figure 4 shows the sub-network and the minimum convex hull that encloses the freight bottleneck. FBC measures the share of the freight-significant road length within the convex hull that belongs to the bottleneck. At the policy-making level, the FBC metric provides a clear, quantifiable means to gauge the severity and spatial distribution of freight bottlenecks. It aids policymakers in identifying critical sub-networks significantly impacted by these bottlenecks, thereby informing the development of land-use policies that account for network-level freight bottleneck impacts within the affected areas.
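A minimal sketch of Eq. (11) in pure Python is given below. It rests on several assumptions: road segments are straight lines between two points in planar (projected) coordinates, a segment counts as inside \(\text{Conv}(H)\) when both of its endpoints are inside or on the hull, and the bottleneck's endpoints are not all collinear. The function names are hypothetical; a production pipeline would more likely rely on the geospatial functions of the database.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); positive for a counter-clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull: List[Point], p: Point) -> bool:
    # Inside or on the boundary of a counter-clockwise convex polygon.
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def seg_length(seg: Segment) -> float:
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def fbc(bottleneck: List[Segment], network: List[Segment]) -> float:
    """Eq. (11): l(H) / l(Conv(H) intersected with G)."""
    hull = convex_hull([p for seg in bottleneck for p in seg])
    enclosed = [s for s in network if inside(hull, s[0]) and inside(hull, s[1])]
    return sum(map(seg_length, bottleneck)) / sum(map(seg_length, enclosed))


H = [((0.0, 0.0), (4.0, 0.0)), ((4.0, 0.0), (4.0, 3.0))]    # bottleneck, length 7
G = H + [((0.0, 0.0), (4.0, 3.0)),                          # inside hull, length 5
         ((0.0, 3.0), (4.0, 3.0))]                          # partly outside hull
print(fbc(H, G))  # 7 / 12, about 0.583
```

Here the bottleneck covers 7 of the 12 length units of freight-significant road inside its hull. A value of 1 would mean every such road inside the hull is part of the bottleneck.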

Fig. 4 An example of the smallest convex hull enclosing a freight bottleneck

FBC establishes a measure for the extent of the impact of the freight transportation bottlenecks. It incorporates a spatial component through the use of a convex hull, allowing us to consider not only the bottleneck length but also the distribution and relative location of bottlenecks. In addition, FBC allows large-scale comparison, enabling us to assess and compare bottleneck situations across different geographic areas, or over time within the same area. The comparative analysis of the metrics is valuable as it captures the quantifiable sub-network for freight bottlenecks defined in this paper.

Freight Bottleneck Ranking and Visualization

After the identification of freight bottlenecks on a network, the analysis is based on the aggregation of road segment metrics to bottleneck sub-network topology. The final ranking of the freight transportation bottleneck is determined by a utility function \(U(C)\), which can be defined using a set of attributes \(C\) measuring different metrics. The set of attributes includes a variety of metrics that are pivotal for evaluating and understanding freight bottlenecks, as illustrated in Table 4. These metrics extend beyond basic measurements, encompassing diverse aspects of freight movement based on the freight performance measurement metrics defined by the FHWA. The visualization and animation of long-term freight bottleneck progression are generated using the open-source geospatial visualization tool Kepler.gl.

Table 4 Freight bottleneck metrics based on freight performance measurement metric from FHWA (Margiotta et al. 2015; FHWA 2017b)

Table 4 shows the various metrics for ranking freight bottlenecks, which are developed based on US Federal Highway Administration reports (FHWA 2017a). Many of the metrics are known in the field. In the analysis for freight bottlenecks, the authors extrapolate and develop the formulas based on the definition of freight bottleneck in “Definition for Freight Bottleneck (Monthly Aggregation)” with a focus on aggregation based on freight bottleneck sub-network topology. Similar to Table 2, the notations in Table 4 are simplified by omitting part of the subscript, including the temporal indices \(a\), \(b\), and \(n\). The variables in Table 4 are grouped and aggregated by these temporal indices.

The utility function can have a deterministic ranking of all the freight transportation bottlenecks considering different sets of preference parameters (Armstrong 1939; Rader 1963). This allows decision-makers and planners to prioritize their efforts effectively by ranking the freight transportation bottlenecks and allocating capital resources accordingly. By assigning weight parameters to the metric, the function ranks freight bottlenecks differently. The utility function is represented as

$$U(C)={\beta }_{1}{c}_{1}+{\beta }_{2}{c}_{2}+\dots +{\beta }_{n}{c}_{n}$$
(12)

where \({\sum }_{i=1}^{n}{\beta }_{i}=1\) and \({\beta }_{i}\) is the weight parameter for attribute \({c}_{i}\) in the utility function. The delay and social costs are represented by scores \({c}_{i}\in C\) normalized from various bottleneck metrics. The analysis model supports freight bottleneck ranking in a specific geographical region or worldwide.
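The ranking in Eq. (12) can be illustrated with a short Python sketch. Min-max scaling is an assumption made here for illustration, since the text only states that scores are normalized. The function names, metric names, and sample values are likewise hypothetical.

```python
from typing import Dict, List


def normalize(values: List[float]) -> List[float]:
    # Assumption: min-max scaling to [0, 1]; other normalizations are possible.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_bottlenecks(metrics: Dict[str, Dict[str, float]],
                     weights: Dict[str, float]) -> List[str]:
    """Rank bottleneck Ids by U(C) = sum_i beta_i * c_i, highest first."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weight parameters must sum to 1")
    ids = list(metrics)
    utility = dict.fromkeys(ids, 0.0)
    for name, beta in weights.items():
        scores = normalize([metrics[b][name] for b in ids])
        for b, c in zip(ids, scores):
            utility[b] += beta * c
    return sorted(ids, key=lambda b: utility[b], reverse=True)


# Hypothetical bottlenecks with two raw metrics each.
sample = {
    "A": {"delay_cost": 9070.0, "length_km": 2.0},
    "B": {"delay_cost": 4000.0, "length_km": 10.0},
    "C": {"delay_cost": 1000.0, "length_km": 5.0},
}
print(rank_bottlenecks(sample, {"delay_cost": 0.8, "length_km": 0.2}))  # ['A', 'B', 'C']
print(rank_bottlenecks(sample, {"delay_cost": 0.2, "length_km": 0.8}))  # ['B', 'C', 'A']
```

The two calls show how shifting the weight parameters changes the ranking of the same set of bottlenecks.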

Visualization of freight bottlenecks on maps conveys network-level information and the changes of the sub-networks over time. The framework for selecting visualization techniques for freight transportation considers the level of abstraction of the data and the intended objective in order to match a visualization technique (Ma et al. 2022). In the context of the proposed definition of freight bottleneck, the suitable visualization technique is a spatial abstraction of trajectory information to display spatial information, combined with one layer of animation for temporal information. Due to the large number of freight bottlenecks occurring in various time buckets, the visualization implements a filter that lets the user choose how many top-ranked freight bottlenecks are displayed, to avoid visual occlusion.

Figure 5 shows the layout of the visualization tool. After loading the bottleneck dataset, the model can generate visualizations with freight bottleneck network information, time buckets, and bottleneck metrics. The interactive visualization allows users to investigate different road segments, highlighting the bottleneck network topology and displaying the metrics from the dataset. The user can also change the time bucket to explore bottlenecks at different periods.

Fig. 5 Example of using Kepler visualization tool for freight bottlenecks

Case Studies and Results

In this section, we present three case studies of freight bottlenecks using the proposed methodology at different spatial–temporal scales. The case studies for freight bottleneck identification and ranking are based on telematics data from February 2022 provided by Geotab Inc. In February 2022, Geotab had over two million vehicles connected through their telematics devices (Corrente 2022). The telematics dataset contains 170,891,847 freight vehicle trips for the month, and the breakdown of trips by weight class of truck is shown in Fig. 6. The data represent around 5% of the commercial vehicle traffic, depending on geographical location.

Fig. 6 Percentage of trips by different classes of trucks

The first study is on the province of Ontario in Canada, with the primary objective of presenting the visualization and analytical results of our model and benchmarking the computational performance. By intentionally narrowing our focus to just the province of Ontario, we can effectively manage the computation and compare the performance of different algorithms under a controlled scale.

The second case study widens the spatial scale to cover all freight bottlenecks across the US, with an analysis of the top 100 bottlenecks. This broader approach demonstrates the scalability of the methodology and its efficacy when applied on a very large road network. In addition, this large-scale implementation provides an opportunity for model comparison and validation.

The third case study widens the temporal scale to cover the 12 months of the year 2022, with a focus on a comparison analysis of the most congested bottleneck in the US. This study compares the top bottleneck identified using the proposed weight-based temporal delay cost against a widely used freight bottleneck identification and ranking approach from the American Transportation Research Institute (ATRI) (American Transportation Research Institute 2022).

Case Study: Ontario

In this subsection, we explore a case study of freight bottlenecks in Ontario from two main perspectives. First, we present the visualization and ranking of freight bottlenecks using a single metric of weight-based temporal delay cost. Second, we discuss establishing a computational performance benchmark and evaluating our methodology during the implementation of the connected components algorithm.

To demonstrate the visualization and ranking of freight bottlenecks, we employ a single metric of weight-based temporal delay cost. Figure 7 is a snapshot of the visualization for the top-ranked freight bottlenecks in Ontario based on weight-based temporal delay cost. The distribution of the top 20 freight bottlenecks in Ontario is displayed on the timeline. The network information for the most severe bottleneck ranked by weight-based delay cost is shown with part of its metrics. The red circle in the figure marks the weekday freight bottleneck northeast of Toronto, spanning Highway 401 from 8:00 a.m. to 9:00 a.m., together with its freight bottleneck topology.

Fig. 7 Snapshot of animation visualization of top freight bottlenecks in Ontario

Table 5 contains the top 10 freight bottlenecks that the proposed method detected in Ontario. In addition to the metrics indicated in the table, geographical information for all the identified bottlenecks can be easily queried for visualization and analytics. Other numerical metrics proposed in “Freight Bottleneck Ranking and Visualization” are also available for analysis.

Table 5 Top 10 freight bottlenecks in Ontario (with partial metrics), February 2022

To establish a benchmark for the computational performance of our methodology, we also use the Ontario case study as an example. We measure the total bytes processed by the proposed algorithm. Total bytes processed is chosen as the benchmark because computational time varies with the availability of dynamically allocated computation in the cloud database.

Benchmarking the performance of computing network information for freight research is challenging, primarily because no previous freight research has employed this specific method to the best of the author’s knowledge. Therefore, in this experiment, we conducted a comparison between 1-layer and 2-layer labeling strategies for the algorithm. The results demonstrated that the 2-layer labeling strategy reduces the total number of bytes processed by 31.75% compared to the 1-layer labeling strategy while exhibiting similar computational times when using similar computational resources.

The 2-layer parallel-connected components algorithm required 21 iterations for the road network in Ontario, while the maximum number of edges for the detected freight bottlenecks was 1630. Figure 8 illustrates the total bytes processed for each iteration of the connected components algorithm with 1-layer and 2-layer labeling. In the case of 1-layer labeling, the graph contraction step requires joining the full graph table and the label table, resulting in constant total bytes processed. Conversely, as shown in Fig. 8, the 2-layer labeling approach requires an additional step to compute local labeling. However, as the graph contracts, the computational cost decreases rapidly. As a result, the sum of the additional local labeling step and the graph contraction step in the 2-layer connected components algorithms proves to be more efficient.

Fig. 8 The total bytes processed comparison for each iteration with 1-layer and 2-layer labeling

Furthermore, the experiment revealed that the algorithm with 2-layer labeling takes approximately 5 min while 1-layer labeling takes around 7 min. In other words, 1-layer labeling takes about 40% longer, or equivalently 2-layer labeling reduces the run time by nearly 30%, which aligns with the reduction in total bytes processed. The results indicate the computational efficiency of the 2-layer parallel-connected components algorithm, making it a good choice for freight bottleneck identification and analysis within large-scale transportation networks.

Case Study: USA

The second case study analyzes and visualizes the top 100 freight bottlenecks in the US in February 2022. We applied the same methodology to the whole road network, ranked all identified bottlenecks, and selected the top 100. Table 6 presents the top 10 identified freight bottlenecks in the United States. The location information is represented by latitude and longitude, providing the centroid for each bottleneck’s network topology. The centroid for a freight bottleneck is calculated as the arithmetic mean of the discrete geographical points that define the bottleneck topology. In addition, if the freight bottleneck spans multiple states, all the involved states are recorded.
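As a small illustration of the centroid calculation, the sketch below takes the arithmetic mean of a list of (latitude, longitude) points. The function name and sample coordinates are hypothetical, and which points represent a bottleneck (for example node locations) is an assumption made here.

```python
from typing import List, Tuple


def bottleneck_centroid(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Arithmetic mean of (latitude, longitude) pairs."""
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)


# Hypothetical node coordinates of one bottleneck sub-network.
nodes = [(43.0, -79.0), (44.0, -80.0), (45.0, -81.0)]
print(bottleneck_centroid(nodes))  # (44.0, -80.0)
```

A plain mean of latitudes and longitudes is adequate for sub-networks of limited extent. It is not a geodesic centroid and would need care near the antimeridian.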

Table 6 Top 10 freight bottlenecks in the United States (all on weekdays). The location is recorded as the centroid of the bottleneck topology together with the state(s) the bottleneck is in. Additional metrics are shown in the Appendix

For every identified bottleneck within the road network, all corresponding metrics are computed based on the metrics outlined in “Freight Bottleneck Ranking and Visualization”. The network topology and the bottleneck metrics associated with North America’s top 10 most critical freight bottlenecks in Feb. 2022 are visualized and presented in Appendix C.

Case Study: Top Congested US Freight Bottleneck 2022

This case study analyzes and compares the most severe US bottleneck in the year 2022 with results from the top congested freight bottleneck from American Transportation Research Institute (ATRI) (American Transportation Research Institute 2022). Due to the methodology difference, direct and equivalent comparison is unattainable. It is hard to find a benchmark methodology in the literature that deals with large-scale freight bottleneck identification with network-level information. Therefore, we use ATRI as the closest methodology for this case study.

Table 7 compares the methodology proposed in this paper with ATRI’s approach for identifying freight bottlenecks across three dimensions: spatial coverage, temporal resolution, and delay measurement. Our method offers broader spatial coverage, finer temporal granularity, and network information. Whereas ATRI uses yearly aggregation on weekdays at 307 fixed highway locations, the proposed methodology aggregates monthly, encompasses both weekdays and weekends, and spans the full road network.

Table 7 Comparison of the methodology proposed in this paper vs. ATRI’s methodology highlighting spatial coverage, temporal resolution, delay measurement, and advantages

Due to the differences in aggregation horizon (monthly vs. annually), time bucket granularity (1-h vs. 24-h), and identification methodology (dynamic sub-network vs. fixed highway road segments), the comparison aggregates the results of the proposed methodology to the same aggregation horizon and time bucket before ranking the most congested freight bottleneck. First, we apply our methodology to identify freight bottlenecks for all months in the year 2022. Second, we define a bounding box for each of the top 10 ATRI freight bottlenecks based on its location description. Third, we aggregate the 12 sets of monthly bottlenecks from the proposed methodology that intersect geographically with the corresponding ATRI bounding box. Finally, we rank the aggregated bottlenecks by aggregated weight-based temporal delay cost as described in “Weight-Based Temporal Delay Cost”. Given the methodological and data discrepancies, this comparison seeks to account for these differences through aggregation while preserving the innate spatial nature of each approach.
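The aggregation described above can be sketched as follows. This is a simplified illustration under stated assumptions: each monthly bottleneck is approximated by its own bounding box (the intersection could equally be tested on the segment geometries), boxes are `(min_lon, min_lat, max_lon, max_lat)` tuples, and all names and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def boxes_intersect(a: Box, b: Box) -> bool:
    # Two axis-aligned boxes overlap if they overlap on both axes.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def aggregate_yearly(monthly_bottlenecks: List[dict],
                     reference_boxes: Dict[str, Box]) -> List[Tuple[str, float]]:
    """Sum monthly delay costs per reference box and rank, highest first."""
    totals: Dict[str, float] = defaultdict(float)
    for bottleneck in monthly_bottlenecks:
        for name, box in reference_boxes.items():
            if boxes_intersect(bottleneck["bbox"], box):
                totals[name] += bottleneck["delay_cost"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical reference sites and monthly bottlenecks.
sites = {"site_A": (0.0, 0.0, 1.0, 1.0), "site_B": (5.0, 5.0, 6.0, 6.0)}
monthly = [
    {"month": 1, "bbox": (0.5, 0.5, 0.8, 0.8), "delay_cost": 100.0},
    {"month": 2, "bbox": (0.9, 0.9, 1.5, 1.5), "delay_cost": 50.0},
    {"month": 2, "bbox": (5.5, 5.5, 5.6, 5.6), "delay_cost": 120.0},
    {"month": 3, "bbox": (9.0, 9.0, 9.5, 9.5), "delay_cost": 999.0},  # no site
]
print(aggregate_yearly(monthly, sites))  # [('site_A', 150.0), ('site_B', 120.0)]
```

Monthly bottlenecks that do not intersect any reference box are left out of the yearly ranking, which keeps the comparison restricted to the locations both methods cover.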

Table 8 compares the relevant metrics for the top congested bottleneck identified and ranked by ATRI and by our methodology. Both methods identify the same geographical area as the most congested freight bottleneck, and both report similar average speeds for the identified bottlenecks. The speed difference during peak hours can be explained by the different spatial nature of the two methods: ATRI focuses on highway road segments, while the proposed methodology expands to a sub-network of heterogeneous road types with different (often lower) benchmark speeds.

Table 8 The most congested freight bottleneck identified by ATRI and our methodology on a yearly aggregation

Conclusion

This research paper presents a methodology for the network-level identification, ranking, and visualization of freight bottlenecks. The methodology developed in this study begins with the network-level definition of a freight bottleneck, encompassing specific spatial and temporal features. The study established a data-driven, network-based freight bottleneck identification and ranking methodology based on parallel-connected components algorithm and introduced a freight-specific metric to measure the various attributes for freight bottlenecks. Although this research utilizes data from one telematics provider, this methodology can be extended to other data sources at any geographical scale with reasonable database computation time.

Central to the proposed methodology is the implementation of the parallel-connected components algorithm, which creates detailed topological information about the road network, specifically focusing on road segments experiencing freight movement disruption. By implementing a parallel computing structure in the database and conducting real-world case studies, we have demonstrated the effectiveness and operational capability of the methodology.

The broader impact of this research lies in its potential to provide decision-makers, transportation planners, and policymakers with data-driven, network-level insights into freight transportation network performance. Such insights pave the way for approaches to optimize goods movement. By addressing freight bottlenecks, we aim to foster a more resilient and sustainable urban transportation network, ultimately benefiting society, the economy, and the environment.

In this study, a few challenges were identified. Notably, there exists a potential data and statistical bias as the model predominantly relies on data from a single telematics provider, which holds a market share of around 5% depending on regional differences. While the introduced weight-based temporal delay cost is an advancement over truck volume measurements, estimating delay costs based on the monetary value of goods (instead of weight) could offer an even more accurate economic evaluation. Moreover, the integration of factors such as emissions, noise, and safety within the utility function could enrich bottleneck severity assessments.

Potential avenues for future research on this topic include predictive analytics for freight bottleneck identification and mitigation, further analysis of the social and economic costs of freight bottlenecks, and integration of more sources of relevant data, such as weather, events, or road closure data. In addition to the above avenues, future research can study the impact of freight bottlenecks on public policy and regulations related to freight transportation and logistics based on data-driven and network-level insights for bottlenecks.