Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction

Bae, Inhwan; Park, Jin-Hwi; Jeon, Hae-Gon

doi:10.1007/978-3-031-20047-2_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13682)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Modeling the dynamics of people walking is a problem of long-standing interest in computer vision. Many previous works involving pedestrian trajectory prediction define a particular set of individual actions to implicitly model group actions. In this paper, we present a novel architecture named GP-Graph which has collective group representations for effective pedestrian trajectory prediction in crowded environments, and is compatible with all types of existing approaches. A key idea of GP-Graph is to model both individual-wise and group-wise relations as graph representations. To do this, GP-Graph first learns to assign each pedestrian into the most likely behavior group. Using this assignment information, GP-Graph then forms both intra- and inter-group interactions as graphs, accounting for human-human relations within a group and group-group relations, respectively. To be specific, for the intra-group interaction, we mask pedestrian graph edges out of an associated group. We also propose group pooling & unpooling operations to represent a group with multiple pedestrians as one graph node. Lastly, GP-Graph infers a probability map for socially-acceptable future trajectories from the integrated features of both group interactions. Moreover, we introduce a group-level latent vector sampling to ensure collective inferences over a set of possible future trajectories. Extensive experiments are conducted to validate the effectiveness of our architecture, and they demonstrate consistent performance improvements on publicly available benchmarks. Code is publicly available at https://github.com/inhwanbae/GPGraph.



1 Introduction

Pedestrian trajectory prediction attempts to forecast the socially-acceptable future paths of people based on their past movement patterns. These behavior patterns often depend on each pedestrian’s surrounding environments, as well as collaborative movement, mimicking a group leader, or collision avoidance. Collaborative movement, one of the most frequent patterns, occurs when several colleagues form a group and move together. Computational social scientists estimate that up to 70% of the people in a crowd will form groups [40, 48]. They also gather surrounding information and have the same destination [40]. Such groups have characteristics that are distinguishable from those of individuals, maintain rather stable formations, and even provide important cues that can be used for future trajectory prediction [48, 78].

Pioneering works in human trajectory forecasting model the group movement by assigning additional hand-crafted terms as energy potentials [41, 47, 66]. These works account for the presence of other group members and physics-based attractive forces, which are only valid between the same group members. In recent works, convolutional neural networks (CNNs) and graph neural networks (GNNs) show impressive progress modeling the social interactions, including traveling together and collision avoidance [1, 2, 17, 39, 54]. Nevertheless, trajectory prediction is still a challenging problem because of the complexity of implicitly learning individual and group behavior at once.

There are several attempts that explicitly encode the group coherence behaviors by assigning hidden states of LSTM with a summation of other agents’ states, multiplied by a binary group indicator function [6]. However, existing studies have a critical problem when it comes to capturing the group interaction. Since their forecasting models focus more on individuals, the group features are shared at the individual node as illustrated in Fig. 1(a). Although this approach can conceptually capture group movement behavior, it is difficult for learning-based methods to represent it because of the overwhelming number of edges for the individual interactions. This problem becomes increasingly difficult in crowded environments.

To address this issue, we propose a novel general architecture for pedestrian trajectory prediction: GrouP-Graph (GP-Graph). As illustrated in Fig. 1(b), our GP-Graph captures intra-group (members in a group) and inter-group interactions by disentangling input pedestrian graphs. Specifically, our GP-Graph first learns to assign each pedestrian into the most likely behavior group. The group indices of each pedestrian are generated using a pairwise distance matrix. To make the indexing process end-to-end trainable, we introduce a straight-through group back-propagation trick inspired by the Straight-Through estimator [5, 21, 35]. Using the group information, GP-Graph then transforms the input pedestrian graph into both intra- and inter-group interaction graphs. We construct the intra-group graph by masking out edges of the input pedestrian graph for unassociated group members. For the inter-group graph, we propose group pooling & unpooling operations to represent a group with multiple members as one graph node. By applying these processes, the GP-Graph architecture has three advantages: (1) It reduces the complexity of trajectory prediction which is caused by the different social behaviors of individuals, by modeling group interactions. (2) It alleviates inherent scene bias by considering the huge number of unseen pedestrian graph nodes between the training and test environments, as discussed in [8]. (3) It offers a graph augmentation effect with pedestrian node grouping.

Next, through weight sharing with baseline trajectory predictors, we force a hierarchy representation from both the input pedestrian graph and the disentangled interactions. This representation is used to infer a probability map for socially-acceptable future trajectories after passing through our group integration module. In addition, we introduce a group-level latent vector sampling to ensure collective inferences over a set of plausible future trajectories.

To the best of our knowledge, this is the first model that literally pools pedestrian colleagues into one group node to efficiently capture group motion behaviors, and learns pedestrian grouping in an end-to-end manner. Furthermore, GP-Graph has the best performance on various datasets among existing methods when unifying with GNN-based models, and it can be integrated with all types of trajectory prediction models, achieving consistent improvements. We also provide extensive ablation studies to analyze and evaluate our GP-Graph.

2 Related Works

2.1 Trajectory Prediction

Earlier works [18, 38, 42, 66] model human motions in crowds using hand-crafted functions to describe attractive and repulsive forces. Since then, pedestrian trajectory prediction has been advanced by research interest in computer vision. Such research leverages the impressive capacity of CNNs which can capture social interactions between surrounding pedestrians. One pioneering work is Social-LSTM [1], which introduces a social pooling mechanism considering a neighbor’s hidden state information inside a spatial grid. Much of the emphasis in subsequent research has been to add human-environment interactions from a surveillance view perspective [11, 23, 33, 37, 49, 52, 58, 59, 61, 75]. Instead of taking environmental information into account, some methods directly share hidden states of agents between other interactive agents [17, 50, 64]. In particular, Social-GAN [17] takes the interactions via max-pooling in all neighborhood features in the scene, and Social-Attention [64] introduces an attention mechanism to impose a relative importance on neighbors and performs a weighted aggregation for the features.

In terms of graph notations, each pedestrian and their social relations can be represented as a node and an edge, respectively. When predicting pedestrian trajectories, graph representation is used to model social interactions with graph convolutional networks (GCNs) [2, 22, 39, 59], graph attention networks (GATs) [3, 19, 23, 32, 54, 63], and transformers [16, 69, 70]. Usually, these approaches infer future paths through recurrent estimations [1, 9, 16, 17, 26, 50, 74] or extrapolations [2, 31, 39, 54]. Other types of relevant research are based on probabilistic inferences for multi-modal trajectory prediction using Gaussian modeling [1, 2, 30, 39, 54, 55, 65, 69], generative models [11, 17, 19, 23, 49, 58, 75], and a conditional variational autoencoder [9, 20, 26, 27, 29, 36, 50, 60]. We note that these approaches focus only on learning implicit representations for group behaviors from agent-agent interactions.

2.2 Group-Aware Representation

Contextual and spatial information can be derived from group-aware representations of agent dynamics. To accomplish this, one of the group-aware approaches is social grouping, which describes agents in groups that move differently than independent agents.

In early approaches [24, 76, 77], pedestrians can be divided into several groups based on behavior patterns. To represent the collective activities of agents in a supervised manner, a work in [41] exploits conditional random fields (CRF) to jointly predict the future trajectories of pedestrians and their group membership. Yamaguchi et al. [66] harness distance, speed, and overlap time to train a linear SVM to classify whether two pedestrians are in the same group or not. In contrast, a work in [14] proposes automatic detection for small groups of individuals using a bottom-up hierarchical clustering with speed and proximity features.

Group-aware predictors recognize the affiliations and relations of individual agents, and encode their proper reactions to moving groups. Several physics-based techniques represent group relations by adding attractive forces among group members [40, 41, 44, 46, 51, 56, 66]. Although a dominant learning paradigm [1, 4, 43, 62, 73] implicitly learns intra- and inter-group coherency, only two works in [6, 12] explicitly define group information. To be specific, one [6] identifies pedestrians walking together in the crowd using a coherent filtering algorithm [77], and utilizes the group information in a social pooling layer to share their hidden states. Another work [12] proposes a generative adversarial model (GAN)-based trajectory model, jointly learning informative latent features for simultaneous pedestrian trajectory forecasting and group detection. These approaches only learn individual-level interactions within a group, but do not encode their affiliated groups and future paths at the same time. Unlike them, our GP-Graph aggregates a group-group relation via a novel group pooling in the proposed end-to-end trainable architecture without any supervision.

2.3 Graph Node Pooling

Pooling operations are used for features extracted from grid data, like images, as well as graph-structured data. However, there is no geographic proximity or order information in the graph nodes that existing pooling operations require. As alternative methods, three types of graph pooling are introduced: topology-based pooling [10, 45], global pooling [15, 72], and hierarchical pooling [7, 13, 68]. These approaches are designed for general graph structures. However, since human behavior prediction has time-variant and generative properties, it is not possible to leverage the advantages of these pooling operations for this task.

3 Proposed Method

In this work, we focus on how group awareness in crowds is formed for pedestrian trajectory prediction. We start with a definition of a pedestrian graph and trajectory prediction in Sect. 3.1. We then introduce our end-to-end learnable pedestrian group assignment technique in Sect. 3.2. Using group index information and our novel pedestrian group pooling & unpooling operations, we construct a group hierarchy representation of pedestrian graphs in Sect. 3.3. The overall architecture of our GP-Graph is illustrated in Fig. 2.

3.1 Problem Definition

Pedestrian trajectory prediction can be defined as a sequential inference task based on observations of all agents in a scene. Suppose that N is the number of pedestrians in a scene. The history trajectory of each pedestrian $n \in [1, ..., N]$ can be represented as ${\boldsymbol{X}}_n\!=\!\{ (x_n^t, y_n^t)\,|\,t\!\in \![1, ..., T_{obs}] \}$, where $(x_n^t, y_n^t)$ is the 2D spatial coordinate of pedestrian n at a specific time t. Similarly, the ground truth future trajectory of pedestrian n can be defined as ${\boldsymbol{Y}}_n\!=\!\{ (x_n^t, y_n^t)\,|\,t\!\in \![T_{obs}\!+\!1, ..., T_{pred}] \}$.

The social interactions are modeled from the past trajectories of other pedestrians. In general, the pedestrian graph $\mathcal {G}_{ped}\!=\!(\mathcal {V}_{ped}, \mathcal {E}_{ped})$ refers to a set of pedestrian nodes $\mathcal {V}_{ped} = \{ {\boldsymbol{X}}_n\,|\,n\!\in \![1, ..., N] \}$ and edges on their pairwise social interaction $\mathcal {E}_{ped} = \{ e_{i,j}\,|\,i,j\!\in \![1, ..., N] \}$. The trajectory prediction process forecasts their future sequences based on their past trajectory and the social interaction as:

$$\begin{aligned} \widehat{{\boldsymbol{Y}}} = F_\theta \left( {\boldsymbol{X}},\,\mathcal {G}_{ped}\right) \end{aligned}$$

(1)

where $\widehat{{\boldsymbol{Y}}} = \{ \widehat{{\boldsymbol{Y}}}_n\,|\,n\!\in \![1, ..., N] \}$ denotes the estimated future trajectories of all pedestrians in a scene, and $F_\theta (\,\cdot \,)$ is the trajectory generation network.

3.2 Learning the Trajectory Grouping Network

Our goal in this work is to encode powerful group-wise features beyond existing agent-wise social interaction aggregation models to achieve highly accurate human trajectory prediction. The group-wise features represent group members in input scenes as single nodes, making pedestrian graphs simpler. We use a U-Net architecture with pooling layers to encode the features on graphs. By reducing the number of nodes through the pooling layers in the U-Net, higher-level group-wise features can be obtained. After that, agent-wise features are recovered through unpooling operations.

Conventional pooling & unpooling operators work on grid-structured data, like images, and it is not feasible to apply them directly to graph-structured data. Some earlier works attempt to handle this issue [7, 13]. These works focus on capturing global information by removing relatively redundant nodes with graph pooling, and restoring the original shapes by adding dummy nodes with graph unpooling if needed. However, in pedestrian trajectory prediction, each node must keep its identity index information and describe the dynamic property of the group behavior in scenes. For that, we present pedestrian graph-oriented group pooling & unpooling methods. We note that this is the first work to exploit the pedestrian index itself as a group representation.

Learning Pedestrian Grouping. First of all, we estimate the group to which each pedestrian belongs using a Group Assignment Module. Using the history trajectory of each pedestrian, we measure the feature similarity among all pedestrian pairs based on their $L_2$ distance. With this pairwise distance, we pick out all pairs of pedestrians that are likely to be colleagues (affiliated with the same group). The pairwise distance matrix ${\boldsymbol{D}}$ and a set of colleague indices $\varUpsilon $ are defined as:

$$\begin{aligned} {\boldsymbol{D}}_{\,i,j} = \Vert F_\phi ({\boldsymbol{X}}_i) - F_\phi ({\boldsymbol{X}}_j)\Vert ~~~\text {for}~~ i,j \in [1, ..., N], \end{aligned}$$

(2)

$$\begin{aligned} \varUpsilon = \{ \text {pair}(i,\,j)\,|\,i,j \in [1, ..., N], ~i \ne j, ~{\boldsymbol{D}}_{\,i,j} \le \pi \}, \end{aligned}$$

(3)

where $F_\phi (\,\cdot \,)$ is a learnable convolutional layer and $\pi $ is a learnable thresholding parameter.
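As a rough illustration of Eqs. (2) and (3), the following minimal sketch computes the pairwise distance matrix and thresholds it into colleague pairs. It assumes the embedded features $F_\phi ({\boldsymbol{X}}_n)$ are already available as plain coordinate tuples; the function names `pairwise_distances` and `colleague_pairs`, the sample features, and the fixed threshold value are hypothetical and not taken from the released code.

```python
import math
from itertools import combinations

def pairwise_distances(features):
    """Eq. (2): D[i][j] is the L2 distance between the features of i and j."""
    n = len(features)
    return [[math.dist(features[i], features[j]) for j in range(n)]
            for i in range(n)]

def colleague_pairs(dist, threshold):
    """Eq. (3): unordered pairs (i, j), i != j, whose distance is <= threshold."""
    n = len(dist)
    return [(i, j) for i, j in combinations(range(n), 2)
            if dist[i][j] <= threshold]

# Hypothetical embedded features for four pedestrians (stand-ins for F_phi(X_n)).
features = [(0.0, 0.0), (0.3, 0.4), (5.0, 5.0), (5.0, 5.6)]
D = pairwise_distances(features)
pairs = colleague_pairs(D, threshold=1.0)  # threshold plays the role of pi
print(pairs)
```

In the paper the threshold $\pi $ is learned; here it is fixed only to keep the sketch self-contained.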

Next, using the pairwise colleague set $\varUpsilon $, we arrange the colleague members in associated groups and assign their group index. We make a group index set G, which is formulated as follows:

$$\begin{aligned} G = \Big \{ G_k \,|\, G_k = \!\!\bigcup _{(i,j) \in \varUpsilon }\! \{i,\,j\},~~G_a\!\cap G_b = \varnothing ~~\text {for}~ a \ne b \Big \} \end{aligned}$$

(4)

where $G_k$ denotes the k-th group and is the union of each pair set (i, j). This information is used as important prior knowledge in the subsequent pedestrian group pooling and unpooling operators.
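One way to read Eq. (4) is as a connected-components problem over the colleague pairs. The sketch below uses a small union-find structure for this; it is an illustrative assumption rather than the authors' implementation, and it additionally assumes that pedestrians without any colleague form singleton groups. The name `assign_groups` is hypothetical.

```python
def assign_groups(num_pedestrians, pairs):
    """Merge colleague pairs into disjoint groups (connected components)."""
    parent = list(range(num_pedestrians))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Keep the smaller index as the root so group order is stable.
            parent[max(root_i, root_j)] = min(root_i, root_j)

    groups = {}
    for n in range(num_pedestrians):
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())

groups = assign_groups(5, [(0, 1), (1, 2)])
print(groups)  # [[0, 1, 2], [3], [4]]
```

Pairs that share a member are merged transitively, which yields the disjointness condition $G_a\!\cap G_b = \varnothing $ in Eq. (4).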

Pedestrian Group Pooling. Based on the group behavior property that group members gather surrounding information and share behavioral patterns, we group the pedestrian nodes, where the corresponding node’s features are aggregated into one node. The aggregated group features are then stacked for subsequent social interaction capturing modules (i.e., GNNs). Here, the representative feature of each group is obtained via average pooling over its pedestrian nodes. With this feature, we can model the group-wise graph structures, which have far fewer nodes than the input pedestrian graph, as will be demonstrated in Sect. 4.3. We define the pooled group-wise trajectory feature ${\boldsymbol{Z}}$ as follows:

$$\begin{aligned} {\boldsymbol{Z}} = \{{\boldsymbol{Z}}_k\,|\,k \in [1, ..., K]\}, ~~~~~{\boldsymbol{Z}}_k = \frac{1}{|G_k|} \sum _{i\;\!\in \;\!G_k} \!{\boldsymbol{X}}_i, \end{aligned}$$

(5)

where K is the total number of groups in G.
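The average pooling in Eq. (5) can be sketched as follows, assuming each trajectory is a list of (x, y) tuples of equal length and that the groups are given as lists of pedestrian indices. The function name `group_pool` and the sample data are hypothetical.

```python
def group_pool(trajectories, groups):
    """Eq. (5): Z_k is the per-time-step mean of the member trajectories in G_k."""
    pooled = []
    for members in groups:
        steps = len(trajectories[members[0]])
        z_k = []
        for t in range(steps):
            x = sum(trajectories[i][t][0] for i in members) / len(members)
            y = sum(trajectories[i][t][1] for i in members) / len(members)
            z_k.append((x, y))
        pooled.append(z_k)
    return pooled

# Three pedestrians observed for two time steps; 0 and 1 walk side by side.
X = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 2.0), (1.0, 2.0)],
    [(4.0, 4.0), (4.0, 5.0)],
]
G = [[0, 1], [2]]
Z = group_pool(X, G)
print(Z)  # [[(0.0, 1.0), (1.0, 1.0)], [(4.0, 4.0), (4.0, 5.0)]]
```

The pooled graph has one node per group (two here instead of three), and a singleton group keeps its own trajectory unchanged.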

Pedestrian Group Unpooling. Next, we upscale the group-wise graph structures back to their original size by using an unpooling operation. This enables each pedestrian trajectory to be forecast with output agent-wise feature fusion information. In existing methods [7, 13], zero vector nodes are appended into the group features during unpooling. The output of the convolution process on the zero vector nodes fails to exhibit the group properties. To alleviate this issue, we duplicate the group features and then assign them into nodes for all the relevant group members so that they have identical group behavior information. The pedestrian group unpooling operator can be formulated as follows:

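$$\begin{aligned} \widetilde{{\boldsymbol{X}}} = \{\widetilde{{\boldsymbol{X}}}_n\,|\,n \in [1, ..., N]\}, ~~~~~\widetilde{{\boldsymbol{X}}}_n = {\boldsymbol{Z}}_k ~~\text {where}~~ n \in G_k, \end{aligned}$$

(6)

where $\widetilde{{\boldsymbol{X}}}$ is the agent-wise trajectory feature reconstructed from ${\boldsymbol{Z}}$, having the same order of pedestrian indices as in ${\boldsymbol{X}}$.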
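The duplication-based unpooling described above can be sketched in a few lines: every member of a group receives a copy of that group's feature, and the output keeps the original pedestrian order. This is an illustrative sketch under those assumptions; the name `group_unpool` and the placeholder string features are hypothetical.

```python
def group_unpool(pooled, groups, num_pedestrians):
    """Copy each group's feature to all of its members, in pedestrian order."""
    restored = [None] * num_pedestrians
    for z_k, members in zip(pooled, groups):
        for n in members:
            restored[n] = z_k
    return restored

# Placeholder group features; in practice these would be feature tensors.
Z = ["z_a", "z_b"]
G = [[0, 2], [1]]
unpooled = group_unpool(Z, G, 3)
print(unpooled)  # ['z_a', 'z_b', 'z_a']
```

Unlike zero-padding, no node is left without group information, since each pedestrian index maps back to exactly one group.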

Straight-Through Group Estimator. A major hurdle when training the group assignment module in Eq. (4), which is a sampling function, is that index information is not treated as learnable parameters. Accordingly, the group index cannot be trained using standard backpropagation algorithms. This is why existing methods train the group detection task in steps separate from the main trajectory prediction network.

We tackle this problem by introducing a Straight-through (ST) trick, inspired by the biased path derivative estimators in [5, 21, 35]. Instead of making the discrete index set $G_k$ differentiable, we separate the forward pass and backward pass of the group assignment module in the training process. Our intuition for constructing the backward pass is that group members have similar features with closer pairwise distance between colleagues.

In the forward pass, we perform our group pooling over both pedestrian features and the group index from the input trajectory and estimated group assignment information, respectively. For the backward pass, we propose group-wise continuous relaxed features to approximate the group indexing process. We compute the probability that a pair of pedestrians belongs to the same group using the proposed differentiable binary thresholding function $\frac{1}{1+\exp (x-\pi )}$, and apply it on the pairwise distance matrix ${\boldsymbol{D}}$. We then obtain the normalized probability ${\boldsymbol{A}}$ by dividing each probability by the sum of all neighbors’ probabilities. Lastly, we compute a new pedestrian trajectory feature ${\boldsymbol{X}}'$ by aggregating features between group members through the matrix multiplication of ${\boldsymbol{X}}$ and ${\boldsymbol{A}}$ as follows:

$$\begin{aligned} {\boldsymbol{A}}_{\,i,j} = \frac{\frac{1}{1 + \exp \!\big (\frac{{\boldsymbol{D}}_{\,i,j}-\pi }{\tau }\big )}}{\sum _{i=1}^{N} \Big ({\frac{1}{1 + \exp \!\big (\frac{{\boldsymbol{D}}_{\,i,j}-\pi }{\tau }\big )}}\Big )} ~~~\text {for}~~ i,j \in [1, ..., N], \end{aligned}$$

(7)

$$\begin{aligned} {\boldsymbol{X}}' = \langle \, {\boldsymbol{X}} - {\boldsymbol{X}}{\boldsymbol{A}} \,\,\rangle + {\boldsymbol{X}}{\boldsymbol{A}}, \end{aligned}$$

(8)

where $\tau $ is the temperature of the sigmoid function and $\langle \,\cdot \,\rangle $ is the detach (in PyTorch) or stop gradient (in TensorFlow) function which prevents the backpropagation.
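To make Eqs. (7) and (8) more concrete, the following sketch evaluates them numerically in plain Python. It assumes ${\boldsymbol{X}}$ is stored with one column per pedestrian so that ${\boldsymbol{X}}{\boldsymbol{A}}$ is well defined, and it only reproduces the forward values: plain Python has no automatic differentiation, so the stop-gradient is indicated in a comment rather than executed. All function names and sample numbers are hypothetical.

```python
import math

def soft_assignment(dist, pi, tau):
    """Eq. (7): sigmoid-thresholded distances, normalized over i per column j."""
    n = len(dist)
    prob = [[1.0 / (1.0 + math.exp((dist[i][j] - pi) / tau)) for j in range(n)]
            for i in range(n)]
    col_sum = [sum(prob[i][j] for i in range(n)) for j in range(n)]
    return [[prob[i][j] / col_sum[j] for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of a (d x N) and b (N x N)."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def straight_through_forward(x, xa):
    """Forward value of Eq. (8): <X - XA> + XA, numerically equal to X.
    In an autograd framework the bracketed term would be detached, so
    gradients would flow only through the trailing XA term."""
    return [[(xv - av) + av for xv, av in zip(x_row, xa_row)]
            for x_row, xa_row in zip(x, xa)]

# Hypothetical distances: pedestrians 0 and 1 are close, pedestrian 2 is far.
D = [[0.0, 0.5, 3.0],
     [0.5, 0.0, 3.0],
     [3.0, 3.0, 0.0]]
A = soft_assignment(D, pi=1.0, tau=0.1)

# One row per coordinate, one column per pedestrian (2 x 3).
X = [[0.0, 1.0, 9.0],
     [0.0, 1.0, 9.0]]
XA = matmul(X, A)
X_prime = straight_through_forward(X, XA)
```

With these numbers, each column of `A` sums to one, the weight between the distant pedestrian 2 and the others is close to zero, and `X_prime` matches `X` up to floating-point rounding, while `XA` mixes the features of pedestrians 0 and 1.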

To explain Eq. (8) further, in our implementation we replace the input ${\boldsymbol{X}}$ of the pedestrian group pooling module with the new pedestrian trajectory feature ${\boldsymbol{X}}'$. To be specific, the ${\boldsymbol{X}}{\boldsymbol{A}}$ terms cancel out in the forward pass, so that ${\boldsymbol{X}}'$ equals ${\boldsymbol{X}}$ in value and the loss is computed for the trajectory feature ${\boldsymbol{X}}$. In contrast, due to the stop gradient $\langle \,\cdot \,\rangle $, the loss is only backpropagated to ${\boldsymbol{X}}{\boldsymbol{A}}$ in the backward pass. In this way, we can train both the convolutional layer $F_\phi $ and the learnable threshold parameter $\pi $ which are used for the computation of the pairwise distance matrix ${\boldsymbol{D}}$ and the construction of group index set G, respectively.

3.3 Pedestrian Group Hierarchy Architecture

Using the estimated pedestrian grouping information, we reconstruct the initial social interaction graph $\mathcal {G}_{ped}$ in an efficient form for pedestrian trajectory prediction. Instead of the existing complex and complete pedestrian graph, intra- and inter-group interaction graphs capture the group-aware social relation, as illustrated in Fig. 3.

Intra-group Interaction Graph. We design a pedestrian interaction graph that captures relations between members affiliated with the same group. The intra-group interaction graph $\mathcal {G}_{member}\!=\!(\mathcal {V}_{ped}, \mathcal {E}_{member})$ consists of a set of pedestrian nodes $\mathcal {V}_{ped}$ and edges on their pairwise social interaction of group members $\mathcal {E}_{member} = \{ e_{i,j}\,|\,i,j\!\in \![1, ..., N], k\!\in \![1, ..., K], \{i,j\}\!\subset \!G_k \}$. Through this graph representation, pedestrian nodes can learn social norms of internal collision avoidance between group members while maintaining their own formations and on-going directions.
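The edge set $\mathcal {E}_{member}$ can be viewed as a binary mask applied to the full pedestrian adjacency matrix. The sketch below builds such a mask from the group lists and applies it element-wise; it assumes a dense adjacency stored as nested lists and keeps self-loops, since $\{i,j\}\!\subset \!G_k$ also holds for $i = j$. The names `intra_group_mask` and `apply_mask` are hypothetical.

```python
def intra_group_mask(groups, num_pedestrians):
    """mask[i][j] = 1 if pedestrians i and j belong to the same group, else 0."""
    mask = [[0] * num_pedestrians for _ in range(num_pedestrians)]
    for members in groups:
        for i in members:
            for j in members:
                mask[i][j] = 1
    return mask

def apply_mask(adjacency, mask):
    """Zero out every edge of the pedestrian graph that crosses a group border."""
    return [[a * m for a, m in zip(a_row, m_row)]
            for a_row, m_row in zip(adjacency, mask)]

groups = [[0, 1], [2]]
mask = intra_group_mask(groups, 3)
print(mask)  # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]

# Hypothetical edge weights of the complete pedestrian graph.
adjacency = [[1.0, 0.8, 0.2],
             [0.8, 1.0, 0.3],
             [0.2, 0.3, 1.0]]
member_adjacency = apply_mask(adjacency, mask)
```

Only the edges between pedestrians 0 and 1 survive the masking; pedestrian 2 is left with its self-loop.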

Inter-group Interaction Graph. Inter-group interactions (group-group relation) are indispensable to learn social norms between groups as well. To capture various group behaviors such as following a leading group, avoiding collisions and joining a new group, we create an inter-group interaction graph $\mathcal {G}_{group}\!=\!(\mathcal {V}_{group}, \mathcal {E}_{group})$. Here, nodes refer to each group’s features ${\boldsymbol{Z}}_k$ generated with our pedestrian group pooling operation in Eq. (5), and edges mean the pairwise group-group interactions $\mathcal {E}_{group} = \{ \bar{e}_{p,q}\,|\,p,q\!\in \![1, ..., K] \}$.

Group Integration Network. We incorporate the social interactions as a form of group hierarchy into well-designed existing trajectory prediction baseline models, as shown in Fig. 3(b). Meaningful features can be extracted by feeding a different type of graph-structured data into the same baseline model. Here, the baseline models share their weights to reduce the number of parameters while enriching the augmentation effect. Afterward, the output features from the baseline models are aggregated agent-wise, and are then used to predict the probability map of future trajectories using our group integration module. The generated output trajectory $\widehat{Y}$ with the group integration network $F_\psi $ is formulated as:

(9)

Group-Level Latent Vector Sampling. To infer the multi-modal future paths of pedestrians, an additional random latent vector is introduced with an input observation path. This latent vector becomes a factor, determining a person’s choice of behavior patterns, such as acceleration/deceleration and turning right/left. There are two ways to adopt this latent vector in trajectory generation: (1) Scene-level sampling [17] where everyone in the scene shares one latent vector, unifying the behavior patterns of all pedestrians in a scene (e.g., all pedestrians slow down); (2) Pedestrian-level sampling [50] that allocates a different latent vector to each pedestrian, but forces the pedestrians to have different patterns, where the group behavior property is lost.

We propose a group-level latent vector sampling method as a compromise between the two. We use the group information estimated from the GP-Graph to share a latent vector among the members of each group. If two people are not associated with the same group, independent random noise is assigned to each as a latent vector. In this way, it is possible to sample multi-modal trajectories that are independent of other groups’ members and follow the associated group behaviors. The effectiveness of the group-level sampling is visualized in Sect. 4.3.
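A minimal sketch of group-level sampling is given below: one latent vector is drawn per group and copied to every member, so members of the same group share noise while different groups receive independent draws. The standard normal distribution, the latent dimension, and the name `sample_group_latents` are assumptions made for illustration only.

```python
import random

def sample_group_latents(groups, num_pedestrians, latent_dim, rng):
    """Draw one latent vector per group and assign a copy to each member."""
    latents = [None] * num_pedestrians
    for members in groups:
        # Assumed noise distribution: i.i.d. standard normal entries.
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        for n in members:
            latents[n] = list(z)  # copy, so members do not alias one list
    return latents

rng = random.Random(0)  # fixed seed for a reproducible illustration
latents = sample_group_latents([[0, 1], [2]], 3, latent_dim=2, rng=rng)
print(latents[0] == latents[1])  # True: same group, same latent vector
print(latents[0] == latents[2])  # False: different groups, independent noise
```

Scene-level sampling corresponds to passing a single group containing everyone, and pedestrian-level sampling to passing only singleton groups.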

3.4 Implementation Details

To validate the generality of our GP-Graph, we incorporate it into four state-of-the-art baselines: three different GNN-based baseline methods including STGCNN (GCN-based) [39], SGCN (GAT-based) [54] and STAR (Transformer-based) [69], and one non-GNN model, PECNet [36]. We simply replace their trajectory prediction parts with ours. We additionally embed our agent/intra-/inter-graphs on the baseline networks, and compute integrated output trajectories to obtain the group-aware prediction.

For our proposed modules, we initialize the learnable parameter $\pi $ as one, which cuts the total number of nodes roughly in half through the group pooling at the initial training step. Other learnable parameters such as $F_\theta $, $F_\phi $ and $F_\psi $ are randomly initialized. We set the hyperparameter $\tau $ to 0.1 to give the binary thresholding function a steep slope.

To train the GP-Graph architecture, we use the same training hyperparameters (e.g., batch size, train epochs, learning rate, learning rate decay), loss functions, and optimizers as the baseline models. We note that we do not use additional group labels, for an apples-to-apples comparison with the baseline models. Our group assignment module is trained to estimate effective groups for trajectory prediction in an unsupervised manner. Thanks to our powerful Straight-Through Group Estimator, it accomplishes promising results over other supervised group detection networks [7] that require additional group labels.

4 Experiments

In this section, we conduct comprehensive experiments to verify how the grouping strategy contributes to pedestrian trajectory prediction. We first briefly describe our experimental setup (Sect. 4.1). We then provide comparison results with various baseline models for both group detection and trajectory prediction (Sect. 4.2 and Sect. 4.3). We lastly conduct an extensive ablation study to demonstrate the effect of each component of our method (Sect. 4.4).

Table 1. Comparison between GP-Graph architecture and the vanilla agent-wise interaction graph for four state-of-the-art multi-modal trajectory prediction models, Social-STGCNN [39], SGCN [54], STAR [69] and PECNet [36]. The models are evaluated on the ETH [42], UCY [28], SDD [47] and GCS [67] datasets. Gain: performance improvement w.r.t. FDE over the baseline models, Unit for ADE and FDE: meter, Bold: Best.


4.1 Experimental Setup

Datasets. We evaluate the effectiveness of our GP-Graph by incorporating it into several baseline models and checking the performance improvement on public datasets: ETH [42], UCY [28], Stanford Drone Dataset (SDD) [47], and the Grand Central Station (GCS) [67] datasets. The ETH & UCY datasets contain five unique scenes (ETH, Hotel, Univ, Zara1 and Zara2) with 1,536 pedestrians, and the official leave-one-out strategy is used to train and to validate the models. SDD consists of various types of objects with a bird's-eye view, and GCS shows highly congested pedestrian walking scenes. We use the standard training and evaluation protocol [17, 19, 36, 39, 50, 54] in which the first 3.2 s (8 frames) are observed and the next 4.8 s (12 frames) are used for a ground truth trajectory. Additionally, two scenes (Seq-eth, Seq-hotel) of the ETH dataset provide ground-truth group labels. We use them to evaluate how accurately our GP-Graph groups individual pedestrians.

Evaluation Protocols. For multi-modal human trajectory prediction, we follow the standard evaluation protocol of Social-GAN [17], generating 20 samples based on predicted probabilistic distributions, and then choosing the best sample to measure the evaluation metrics. We use the same evaluation metrics as previous works [1, 17, 34, 61] for future trajectory prediction. Average Displacement Error (ADE) computes the average Euclidean distance between a predicted and ground-truth trajectory over all time steps, while Final Displacement Error (FDE) computes the Euclidean distance between the end-point of a prediction and that of the ground-truth. Collision rate (COL) checks the percentage of test cases where the predicted trajectories of different agents run into collisions, and Temporal Correlation Coefficient (TCC) measures the Pearson correlation coefficient of motion patterns between a predicted and ground-truth trajectory. We use both ADE and FDE as accuracy measures, and both COL and TCC as reliability measures in our group-wise prediction. For the COL metric, we average a set of collision ratios over the 20 multi-modal samples.
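The two accuracy measures and the best-of-K selection can be sketched as below for a single pedestrian. The sketch assumes trajectories are lists of (x, y) tuples and that the best ADE and the best FDE are each taken as the minimum over the samples, which is a common convention but an assumption here; the function names are hypothetical.

```python
import math

def ade(pred, gt):
    """Average Euclidean distance between prediction and ground truth."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def fde(pred, gt):
    """Euclidean distance between the final predicted and true positions."""
    return math.dist(pred[-1], gt[-1])

def best_of_k(samples, gt):
    """Minimum ADE and minimum FDE over K sampled trajectories (assumed rule)."""
    return min(ade(s, gt) for s in samples), min(fde(s, gt) for s in samples)

gt = [(1.0, 0.0), (2.0, 0.0)]
samples = [
    [(1.0, 0.0), (2.0, 3.0)],  # exact first step, 3 m off at the end
    [(1.0, 1.0), (2.0, 1.0)],  # 1 m off at every step
]
best_ade, best_fde = best_of_k(samples, gt)
print(best_ade, best_fde)  # 1.0 1.0
```

In the paper's protocol K is 20 and the errors are reported in meters, averaged over all pedestrians in the test set.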

For grouping measures, we use precision and recall values based on two popular metrics, proposed in prior works [6, 12]: A group pair score (PW) measures the ratio between group pairs that disagree on their cluster membership, and all possible pairs in a scene. A Group-MITRE score (GM) is a ratio of the minimum number of links for group members and fake counterparts for pedestrians who are not affiliated with any group.
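As a simplified illustration of pair-based group evaluation, the sketch below computes a generic pairwise precision and recall: a pair of pedestrians counts as positive if both are placed in the same group. This is not necessarily the exact PW or GM definition used in [6, 12]; it is only meant to show how precision and recall behave when a predicted grouping splits a ground-truth group. The function names are hypothetical.

```python
from itertools import combinations

def same_group_pairs(groups):
    """All unordered pedestrian pairs that share a group."""
    return {pair for members in groups
            for pair in combinations(sorted(members), 2)}

def pairwise_precision_recall(pred_groups, gt_groups):
    """Precision and recall of predicted same-group pairs against ground truth."""
    pred = same_group_pairs(pred_groups)
    gt = same_group_pairs(gt_groups)
    true_pos = len(pred & gt)
    precision = true_pos / len(pred) if pred else 1.0
    recall = true_pos / len(gt) if gt else 1.0
    return precision, recall

# The prediction splits pedestrian 2 off from the true group {0, 1, 2}.
precision, recall = pairwise_precision_recall(
    pred_groups=[[0, 1], [2], [3]],
    gt_groups=[[0, 1, 2], [3]],
)
print(precision, recall)
```

Here every predicted pair is correct, so precision is 1.0, while only one of the three true pairs is recovered, so recall is one third.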

4.2 Quantitative Results

Evaluation on Trajectory Prediction. We first compare our GP-Graph with conventional agent-wise prediction models on the trajectory prediction benchmarks. As reported in Table 1, our GP-Graph achieves consistent performance improvements on all the baseline models. Additionally, our group-aware prediction also reduces the collision rate between agents, and shows analogous motion patterns with its ground truth by capturing the group movement behavior well. The results demonstrate that the trajectory prediction models benefit from the group-awareness cue of our group assignment module.

Table 2. Comparison of GP-Graph on SGCN with other state-of-the-art group detection models (Precision/Recall). For fair comparison, the evaluation results are taken directly from [6, 12]. $\mathcal {S}$: Use a loss for supervision, Bold: Best, Underline: Second best.


Evaluation on Group Estimation. We also compare the grouping ability of our GP-Graph with that of state-of-the-art models in Table 2. Our group assignment module trained in an unsupervised manner achieves superior results in the PW precision in both scenes, but shows relatively low recall values over the baseline models.

There are various group interaction scenarios in both scenes, and we found that our model sometimes fails to assign pedestrians into one large group when either a person joins the group or the group splits into both sides to avoid a collision. In this situation, while forecasting agent-wise trajectories, it is advantageous to divide the group into sub-groups or singletons, letting them have different behavior patterns. Although false-negative group links sometimes occur during the group estimation because of this, it is not a big issue for trajectory prediction.

To measure the maximum capability of our group estimator, we additionally carry out an experiment with a supervision loss to reduce the false-negative group links. We use a binary cross-entropy loss between the distance matrix and the ground-truth group label. As shown in Table 2, the performance is comparable to the state-of-the-art group estimation models with respect to the PW and GM metrics. This indicates that our learning-based trajectory grouping network can properly assign groups without needing complex clustering algorithms.
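
A supervision loss of this kind can be sketched as follows. The text does not state how a pairwise distance is turned into a probability, so the mapping `link_probability` (a sigmoid of threshold minus distance) is purely an assumption for illustration, as are the function names and the toy values; this is not the authors' implementation.

```python
import math


def link_probability(distance, threshold):
    """Hypothetical mapping from a pairwise distance to a group-link probability.

    Equals sigmoid(threshold - distance): small distances give values near 1.
    """
    return 1.0 / (1.0 + math.exp(distance - threshold))


def bce_group_loss(distances, same_group, threshold, eps=1e-7):
    """Mean binary cross-entropy over all unordered pedestrian pairs."""
    n = len(distances)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            p = link_probability(distances[i][j], threshold)
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            y = 1.0 if same_group[i][j] else 0.0
            total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0


# Toy scene: pedestrians 0 and 1 are close, pedestrian 2 is far from both.
distances = [[0.0, 0.2, 3.0],
             [0.2, 0.0, 3.1],
             [3.0, 3.1, 0.0]]
labels_match = [[True, True, False],
                [True, True, False],
                [False, False, True]]
labels_mismatch = [[True, False, True],
                   [False, True, True],
                   [True, True, True]]

loss_match = bce_group_loss(distances, labels_match, threshold=1.0)
loss_mismatch = bce_group_loss(distances, labels_mismatch, threshold=1.0)
print(loss_match, loss_mismatch)
```

The loss is lower when the ground-truth links agree with the distances, which is the signal such a supervised variant would use to pull true group members closer in the learned distance space.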

4.3 Qualitative Results

Trajectory Visualization. In Fig. 4, we visualize some prediction results of GP-Graph and other methods. Since GP-Graph estimates the group-aware representations and captures both intra-/inter-group interactions, the predicted trajectories are closer to socially-acceptable trajectories and form more stable behaviors between group members than those of the comparison models. Figure 4 also shows the pedestrians forming a group with our group assignment module. GP-Graph uses movement patterns and proximity information to properly create a group node for pedestrians who will take the same behaviors and walking directions in the future. This simplifies complex pedestrian graphs and eliminates potential errors associated with the collision avoidance between colleagues.

Table 3. Ablation study of various pooling & unpooling operations on SGCN [54] (FDE/COL/TCC). In the case of our Pedestrian Group Pooling & Unpooling, we additionally provide experimental results using the ground-truth group labels (Oracle). Bold: Best, Underline: Second best.

Group-Level Latent Vector Sampling. To demonstrate the effectiveness of the group-level latent vector sampling strategy, we compare ours with two previous strategies: scene-level and pedestrian-level sampling in Fig. 5. Even though the probability maps of pedestrians are well predicted with the estimated group information (Fig. 5(a)), its limitation still remains. For example, all sampled trajectories in the probability distributions lean toward the same directions (Fig. 5(b)) or are scattered with different patterns even within group members, which leads to collisions between colleagues (Fig. 5(c)). Our GP-Graph with the proposed group-level sampling strategy predicts the collaborative walking trajectories of associated group members, which is independent of other groups (Fig. 5(d)).
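
The difference between the three sampling strategies can be sketched with a single helper that draws one latent vector per distinct key. This is a minimal illustration, not the authors' implementation: the helper name `sample_latents`, the standard-normal latent, the latent dimension and the example group assignment are assumptions, and the way a decoder would consume the latent is not modeled.

```python
import random


def sample_latents(keys, latent_dim, rng):
    """Draw one standard-normal latent vector per distinct key.

    Pedestrians that share a key receive the same vector.
    """
    cache = {}
    latents = []
    for key in keys:
        if key not in cache:
            cache[key] = tuple(rng.gauss(0.0, 1.0) for _ in range(latent_dim))
        latents.append(cache[key])
    return latents


group_ids = [0, 0, 1, 2, 2]  # hypothetical group assignment for 5 pedestrians
n = len(group_ids)
rng = random.Random(0)

scene_level = sample_latents([0] * n, 2, rng)        # one vector for the whole scene
pedestrian_level = sample_latents(range(n), 2, rng)  # one vector per pedestrian
group_level = sample_latents(group_ids, 2, rng)      # one vector per group

print(len(set(scene_level)), len(set(pedestrian_level)), len(set(group_level)))
```

With group-level keys, members of the same group share their randomness while different groups are sampled independently, which mirrors the behavior described for Fig. 5(d).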

4.4 Ablation Study

Pooling & Unpooling. To check the effectiveness of the proposed group pooling & unpooling layers, we compare them with different pooling methods including gPool [13] and SAGPool [25] with respect to FDE, COL and TCC. gPool proposes a top-k pooling by employing a projection vector to compute a rank score for each node. SAGPool is similar to the gPool method, but encodes topology information in a self-attention manner. As shown in Table 3, for both gPool and SAGPool, pedestrian features are lost via the pooling operations on unimportant nodes. By contrast, our pooling approach focuses on group representations of the pedestrian graph structure because it is optimized to capture group-related patterns.
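
The contrast between top-k selection and group-wise aggregation can be sketched as follows. The top-k function follows the general recipe of gPool (score each node by a normalized projection, keep the k best, gate them with a sigmoid); the group pooling and unpooling functions assume mean aggregation and feature replication, which are assumptions of this sketch rather than a statement of the authors' exact operators. All names and values are illustrative.

```python
import math


def gpool_top_k(features, projection, k):
    """Top-k node selection in the style of gPool.

    Each node is scored by projecting its feature vector onto `projection`;
    only the k best-scoring nodes survive, gated by a sigmoid of their score.
    """
    norm = math.sqrt(sum(p * p for p in projection))
    scores = [sum(f * p for f, p in zip(row, projection)) / norm for row in features]
    keep = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:k]
    keep.sort()  # restore the original node order
    gated = [[f / (1.0 + math.exp(-scores[i])) for f in features[i]] for i in keep]
    return keep, gated


def group_mean_pool(features, group_ids):
    """Average member features into one node per group (assumed aggregation)."""
    members = {}
    for idx, gid in enumerate(group_ids):
        members.setdefault(gid, []).append(idx)
    dim = len(features[0])
    return {
        gid: [sum(features[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        for gid, idxs in members.items()
    }


def group_unpool(pooled, group_ids):
    """Copy each group's feature back to every member (assumed unpooling)."""
    return [list(pooled[gid]) for gid in group_ids]


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

kept, gated = gpool_top_k(features, projection=[1.0, 0.0], k=2)
pooled = group_mean_pool(features, group_ids=[0, 0, 1, 1])
restored = group_unpool(pooled, group_ids=[0, 0, 1, 1])
print(kept, pooled, len(restored))
```

Here the top-k selection discards pedestrians 2 and 3 entirely, whereas the group-wise variant keeps a contribution from every pedestrian and can return a feature to each of them after unpooling.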

Group Hierarchy Graph. We examine each component of the group hierarchy graph in Table 4. Both intra-/inter-group interaction graphs show a noticeable performance improvement compared to the baseline models, and the inter-group graph with our group pooling operation has the most important role in performance improvement (variants 1 to 4). The best performances can be achieved when all three types of interaction graphs are used with a weight-shared baseline model, which takes full advantage of graph augmentations (variants 4 and 5).

Grouping Method. We introduce a learnable threshold parameter $\pi $ on the group assignment module in Eq. (2) because in practice the total number of groups in a scene can change according to the trajectory feature of the input pedestrian node. To highlight the importance of $\pi $, we test a fixed ratio group pooling with a node reduction ratio of 50%. As expected, the learnable threshold shows lower errors than the fixed ratio of group pooling (variants 5 and 6). This means that it is effective to guarantee the variability of group numbers, since the number can vary even when the same number of pedestrians exists in a scene.
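
Why a threshold lets the number of groups vary can be seen in a small sketch. It assumes that pedestrians whose pairwise distance is at most $\pi $ are linked and that linked pedestrians are merged transitively (connected components); the fixed scalar threshold stands in for the learnable parameter, and the function name and toy distance matrices are hypothetical.

```python
def assign_groups(distances, threshold):
    """Link every pair whose distance is at most `threshold`, then label the
    resulting connected components as groups (union-find)."""
    n = len(distances)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if distances[i][j] <= threshold:
                parent[find(j)] = find(i)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Two scenes with four pedestrians each.
scene_pairs = [[0.0, 0.3, 5.0, 5.0],
               [0.3, 0.0, 5.0, 5.0],
               [5.0, 5.0, 0.0, 0.4],
               [5.0, 5.0, 0.4, 0.0]]
scene_alone = [[0.0 if i == j else 5.0 for j in range(4)] for i in range(4)]

groups_pairs = assign_groups(scene_pairs, threshold=1.0)
groups_alone = assign_groups(scene_alone, threshold=1.0)
print(groups_pairs, groups_alone)
```

Both scenes contain four pedestrians, yet the thresholded assignment yields two groups in the first and four singletons in the second. A fixed 50% node reduction would force two nodes in both cases.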

Additionally, we report results for the group-level latent vector sampling strategy (variants 5 and 7). Since the ADE and FDE metrics are based on best-of-many strategies, there is no difference with respect to numerical performance. However, it allows each group to keep its own behavior patterns and to represent independence between groups, as in Fig. 5.
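
The best-of-many evaluation can be sketched as below. This is a minimal illustration under stated assumptions: trajectories are lists of 2D points, the minimum is taken independently for ADE and FDE (some implementations instead report the FDE of the sample with the best ADE), and the function names and toy values are hypothetical.

```python
import math


def ade(pred, truth):
    """Average displacement error: mean point-wise Euclidean distance."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)


def fde(pred, truth):
    """Final displacement error: distance between the last points."""
    return math.dist(pred[-1], truth[-1])


def best_of_many(samples, truth):
    """Minimum ADE and minimum FDE over a set of sampled trajectories."""
    return min(ade(s, truth) for s in samples), min(fde(s, truth) for s in samples)


truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # drifts away from the ground truth
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],  # stays close to the ground truth
]
min_ade, min_fde = best_of_many(samples, truth)
print(min_ade, min_fde)
```

Because only the closest sample per pedestrian is scored, how samples are coordinated across group members does not change these numbers, which is consistent with the unchanged ADE/FDE reported for the sampling variant.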

Table 4. Ablation study (ADE/FDE). AW, MB, GP, WS, FG and GS denote agent-wise pedestrian graph, intra-group member graph, inter-group graph, weight sharing among different interaction graphs, fixed ratio node reduction of grouping and group-level latent vector sampling, respectively. All tests are performed on SGCN. Bold: Best, Underline: Second best.

5 Conclusion

In this paper, we present a GP-Graph architecture for learning group-aware motion representations. We model group behaviors in crowded scenes by proposing a group hierarchy graph using novel pedestrian group pooling & unpooling operations. We use them for our group assignment module and straight-forward group estimation trick. Based on the GP-Graph, we introduce a multi-modal trajectory prediction framework that can attend to intra-/inter-group interaction features to capture human-human interactions as well as group-group interactions. Experiments demonstrate that our method significantly improves performance on challenging pedestrian trajectory prediction datasets.

References

Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Fei-Fei, L., Savarese, S.: Social LSTM: human trajectory prediction in crowded spaces. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Bae, I., Jeon, H.G.: Disentangled multi-relational graph convolutional network for pedestrian trajectory prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) (2021)
Bae, I., Park, J.H., Jeon, H.G.: Non-probability sampling network for stochastic human trajectory prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Bartoli, F., Lisanti, G., Ballan, L., Del Bimbo, A.: Context-aware trajectory prediction. In: 2018 24th International Conference on Pattern Recognition (ICPR) (2018)
Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013)
Bisagno, N., Zhang, B., Conci, N.: Group LSTM: group trajectory prediction in crowded scenarios. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11131, pp. 213–225. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11015-4_18
Cangea, C., Veličković, P., Jovanović, N., Kipf, T., Liò, P.: Towards sparse hierarchical graph classifiers. arXiv preprint arXiv:1811.01287 (2018)
Chen, G., Li, J., Lu, J., Zhou, J.: Human trajectory prediction via counterfactual analysis. In: Proceedings of International Conference on Computer Vision (ICCV) (2021)
Chen, G., Li, J., Zhou, N., Ren, L., Lu, J.: Personalized trajectory prediction via distribution discrimination. In: Proceedings of International Conference on Computer Vision (ICCV) (2021)
Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Proceedings of the Neural Information Processing Systems (NeurIPS) (2016)
Dendorfer, P., Elflein, S., Leal-Taixé, L.: MG-GAN: a multi-generator model preventing out-of-distribution samples in pedestrian trajectory prediction. In: Proceedings of International Conference on Computer Vision (ICCV) (2021)
Fernando, T., Denman, S., Sridharan, S., Fookes, C.: GD-GAN: generative adversarial networks for trajectory prediction and group detection in crowds. In: Proceedings of Asian Conference on Computer Vision (ACCV) (2018)
Gao, H., Ji, S.: Graph U-Nets. In: Proceedings of the International Conference on Machine Learning (ICML) (2019)
Ge, W., Collins, R.T., Ruback, R.B.: Vision-based analysis of small groups in pedestrian crowds. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2012)
Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., Dahl, G.E.: Neural message passing for quantum chemistry. In: Proceedings of the International Conference on Machine Learning (ICML) (2017)
Gu, T., et al.: Stochastic trajectory prediction via motion indeterminacy diffusion. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., Alahi, A.: Social GAN: socially acceptable trajectories with generative adversarial networks. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Helbing, D., Molnar, P.: Social force model for pedestrian dynamics. Phys. Rev. E 51(5), 4282 (1995)
Huang, Y., Bi, H., Li, Z., Mao, T., Wang, Z.: STGAT: modeling spatial-temporal interactions for human trajectory prediction. In: Proceedings of International Conference on Computer Vision (ICCV) (2019)
Ivanovic, B., Pavone, M.: The trajectron: probabilistic multi-agent trajectory modeling with dynamic spatiotemporal graphs. In: Proceedings of International Conference on Computer Vision (ICCV) (2019)
Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-Softmax. International Conference on Learning Representations (ICLR) (2017)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations (ICLR) (2017)
Kosaraju, V., Sadeghian, A., Martín-Martín, R., Reid, I., Rezatofighi, H., Savarese, S.: Social-BiGAT: multimodal trajectory forecasting using bicycle-GAN and graph attention networks. In: Proceedings of the Neural Information Processing Systems (NeurIPS) (2019)
Lawal, I.A., Poiesi, F., Anguita, D., Cavallaro, A.: Support vector motion clustering. IEEE Trans. Circ. Syst. Video Technol. (TCSVT) 27, 2395–2408 (2017)
Lee, J., Lee, I., Kang, J.: Self-attention graph pooling. In: Proceedings of the International Conference on Machine Learning (ICML) (2019)
Lee, M., Sohn, S.S., Moon, S., Yoon, S., Kapadia, M., Pavlovic, V.: Muse-VAE: multi-scale VAE for environment-aware long term trajectory prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Lee, N., Choi, W., Vernaza, P., Choy, C.B., Torr, P.H.S., Chandraker, M.: Desire: distant future prediction in dynamic scenes with interacting agents. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Lerner, A., Chrysanthou, Y., Lischinski, D.: Crowds by example. Comput. Graph. Forum 26(3), 655–664 (2007)
Li, J., Ma, H., Tomizuka, M.: Conditional generative neural system for probabilistic trajectory prediction. In: Proceedings of IEEE International Conference on Intelligent Robots and Systems (IROS) (2019)
Li, J., Yang, F., Tomizuka, M., Choi, C.: EvolveGraph: multi-agent trajectory prediction with dynamic relational reasoning. In: Proceedings of the Neural Information Processing Systems (NeurIPS) (2020)
Li, S., Zhou, Y., Yi, J., Gall, J.: Spatial-temporal consistency network for low-latency trajectory forecasting. In: Proceedings of International Conference on Computer Vision (ICCV) (2021)
Liang, J., Jiang, L., Murphy, K., Yu, T., Hauptmann, A.: The garden of forking paths: Towards multi-future trajectory prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Liang, J., Jiang, L., Niebles, J.C., Hauptmann, A.G., Fei-Fei, L.: Peeking into the future: predicting future person activities and locations in videos. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Liu, Y., Yan, Q., Alahi, A.: Social NCE: contrastive learning of socially-aware motion representations. In: Proceedings of International Conference on Computer Vision (ICCV) (2021)
Maddison, C.J., Mnih, A., Teh, Y.W.: The concrete distribution: a continuous relaxation of discrete random variables. In: International Conference on Learning Representations (ICLR) (2017)
Mangalam, K., et al.: It is not the journey but the destination: endpoint conditioned trajectory prediction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 759–776. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_45
Marchetti, F., Becattini, F., Seidenari, L., Bimbo, A.D.: Mantra: memory augmented networks for multiple trajectory prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Mehran, R., Oyama, A., Shah, M.: Abnormal crowd behavior detection using social force model. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
Mohamed, A., Qian, K., Elhoseiny, M., Claudel, C.: Social-STGCNN: a social spatio-temporal graph convolutional neural network for human trajectory prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Moussaïd, M., Perozo, N., Garnier, S., Helbing, D., Theraulaz, G.: The Walking Behaviour of Pedestrian Social Groups and Its Impact on Crowd Dynamics. Public Library of Science One (2010)
Pellegrini, S., Ess, A., Van Gool, L.: Improving data association by joint modeling of pedestrian trajectories and groupings. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6311, pp. 452–465. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15549-9_33
Pellegrini, S., Ess, A., Schindler, K., Van Gool, L.: You’ll never walk alone: modeling social behavior for multi-target tracking. In: Proceedings of International Conference on Computer Vision (ICCV) (2009)
Pfeiffer, M., Paolo, G., Sommer, H., Nieto, J.I., Siegwart, R.Y., Cadena, C.: A data-driven model for interaction-aware pedestrian motion prediction in object cluttered environments. In: Proceedings of IEEE International Conference on Robotics and Automation (ICRA) (2018)
Qiu, F., Hu, X.: Modeling group structures in pedestrian crowd simulation. Simul. Model. Pract. Theory 18(2), 190–205 (2010)
Rhee, S., Seo, S., Kim, S.: Hybrid approach of relation network and localized graph convolutional filtering for breast cancer subtype classification. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI) (2018)
Robicquet, A., Sadeghian, A., Alahi, A., Savarese, S.: Learning social etiquette: human trajectory understanding in crowded scenes. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 549–565. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_33
Robicquet, A., Sadeghian, A., Alahi, A., Savarese, S.: Learning social etiquette: human trajectory understanding in crowded scenes. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 549–565. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_33
Rudenko, A., Palmieri, L., Lilienthal, A.J., Arras, K.O.: Human motion prediction under social grouping constraints. In: Proceedings of IEEE International Conference on Intelligent Robots and Systems (IROS) (2018)
Sadeghian, A., Kosaraju, V., Sadeghian, A., Hirose, N., Rezatofighi, H., Savarese, S.: Sophie: an attentive GAN for predicting paths compliant to social and physical constraints. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Salzmann, T., Ivanovic, B., Chakravarty, P., Pavone, M.: Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data. In: Proceedings of European Conference on Computer Vision (ECCV) (2020)
Seitz, M., Köster, G., Pfaffinger, A.: Pedestrian group behavior in a cellular automaton. In: Weidmann, U., Kirsch, U., Schreckenberg, M. (eds.) Pedestrian and Evacuation Dynamics 2012, pp. 807–814. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-02447-9_67
Shafiee, N., Padir, T., Elhamifar, E.: Introvert: Human trajectory prediction via conditional 3d attention. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
Shao, J., Loy, C.C., Wang, X.: Scene-independent group profiling in crowd. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
Shi, L., et al.: SGCN: sparse graph convolution network for pedestrian trajectory prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
Shi, X., et al.: Multimodal interaction-aware trajectory prediction in crowded space. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) (2020)
Singh, H., Arter, R., Dodd, L., Langston, P., Lester, E., Drury, J.: Modelling subgroup behaviour in crowd dynamics dem simulation. Appl. Math. Model. 33(12), 4408–4423 (2009)
Solera, F., Calderara, S., Cucchiara, R.: Socially constrained structural learning for groups detection in crowd. IEEE Trans. Pattern Anal. Mach. Intell. 38, 995–1008 (2016)
Sun, H., Zhao, Z., He, Z.: Reciprocal learning networks for human trajectory prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Sun, J., Jiang, Q., Lu, C.: Recursive social behavior graph for trajectory prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Sun, J., Li, Y., Fang, H.S., Lu, C.: Three steps to multimodal trajectory prediction: Modality clustering, classification and synthesis. In: Proceedings of International Conference on Computer Vision (ICCV) (2021)
Tao, C., Jiang, Q., Duan, L., Luo, P.: Dynamic and static context-aware LSTM for multi-agent motion prediction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 547–563. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58589-1_33
Varshneya, D., Srinivasaraghavan, G.: Human trajectory prediction using spatially aware deep attention models. arXiv preprint arXiv:1705.09436 (2017)
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., Bengio, Y.: Graph attention networks. In: International Conference on Learning Representations (ICLR) (2018)
Vemula, A., Muelling, K., Oh, J.: Social attention: modeling attention in human crowds. In: Proceedings of IEEE International Conference on Robotics and Automation (ICRA) (2018)
Xu, Y., Wang, L., Wang, Y., Fu, Y.: Adaptive trajectory prediction via transferable GNN. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Yamaguchi, K., Berg, A.C., Ortiz, L.E., Berg, T.L.: Who are you with and where are you going? In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2011)
Yi, S., Li, H., Wang, X.: Understanding pedestrian behaviors from stationary crowd groups. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., Leskovec, J.: Hierarchical graph representation learning with differentiable pooling. In: Proceedings of the Neural Information Processing Systems (NeurIPS) (2018)
Yu, C., Ma, X., Ren, J., Zhao, H., Yi, S.: Spatio-temporal graph transformer networks for pedestrian trajectory prediction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 507–523. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58610-2_30
Yuan, Y., Weng, X., Ou, Y., Kitani, K.: AgentFormer: agent-aware transformers for socio-temporal multi-agent forecasting. In: Proceedings of International Conference on Computer Vision (ICCV) (2021)
Zanotto, M., Bazzani, L., Cristani, M., Murino, V.: Online Bayesian nonparametrics for group detection. In: Proceedings of British Machine Vision Conference (BMVC) (2012)
Zhang, M., Cui, Z., Neumann, M., Chen, Y.: An end-to-end deep learning architecture for graph classification. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) (2018)
Zhang, P., Ouyang, W., Zhang, P., Xue, J., Zheng, N.: SR-LSTM: state refinement for LSTM towards pedestrian trajectory prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Zhao, H., Wildes, R.P.: Where are you heading? dynamic trajectory prediction with expert goal examples. In: Proceedings of International Conference on Computer Vision (ICCV) (2021)
Zhao, T., et al.: Multi-agent tensor fusion for contextual trajectory prediction. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Zhong, J., Cai, W., Luo, L., Yin, H.: Learning behavior patterns from video: a data-driven framework for agent-based crowd modeling. In: Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems (AAMAS) (2015)
Zhou, B., Tang, X., Wang, X.: Coherent filtering: Detecting coherent motions from crowd clutters. In: Proceedings of European Conference on Computer Vision (ECCV) (2012)
Zhou, B., Wang, X., Tang, X.: Understanding collective crowd behaviors: learning a mixture model of dynamic pedestrian-agents. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012)

Acknowledgement

This work is in part supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) (No. 2019-0-01842, Artificial Intelligence Graduate School Program (GIST), No. 2021-0-02068, Artificial Intelligence Innovation Hub), the National Research Foundation of Korea (NRF) (No. 2020R1C1C1012635) grant funded by the Korea government (MSIT), Vehicles AI Convergence Research & Development Program through the National IT Industry Promotion Agency of Korea (NIPA) funded by the Ministry of Science and ICT (No. S1602-20-1001), the GIST-MIT Collaboration grant and AI-based GIST Research Scientist Project funded by the GIST in 2022.

Author information

Authors and Affiliations

AI Graduate School, GIST, Gwangju, South Korea
Inhwan Bae, Jin-Hwi Park & Hae-Gon Jeon

Corresponding author

Correspondence to Hae-Gon Jeon.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

About this paper

Cite this paper

Bae, I., Park, JH., Jeon, HG. (2022). Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13682. Springer, Cham. https://doi.org/10.1007/978-3-031-20047-2_16

DOI: https://doi.org/10.1007/978-3-031-20047-2_16
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20046-5
Online ISBN: 978-3-031-20047-2
eBook Packages: Computer Science (R0)
