1 Introduction

Trajectory prediction technology is becoming increasingly important with the development of intelligent society, especially in the applications of autonomous driving and surveillance systems. In autonomous driving, accurate prediction of the future trajectories of pedestrians around the vehicle can enable the vehicle to take corresponding actions in advance to avoid collisions or perform emergency braking [1, 2].

Predicting the future trajectories of pedestrians is difficult because of the complex interactions between pedestrians and their surroundings. Many factors can affect a pedestrian’s trajectory, such as obstacles, other pedestrians, vehicles, traffic signals, and the pedestrian’s own subjective agency. According to [3], 70% of pedestrians tend to walk in groups, and interaction between pedestrians is primarily driven by subjective agency and social norms. Three factors in particular make the task hard: 1. Pedestrians exhibit different social behaviors. For example, pedestrians walking side by side stay together as a group, and a pedestrian walking toward a group avoids crossing through it. 2. Movement itself is random: pedestrians can turn, stop, or start moving at any time. 3. Pedestrians interact with surrounding objects and other pedestrians, and these interactions are too complex and subtle to quantify accurately.

To address these difficulties, Social-LSTM [4] designs a pooling layer to transfer interaction information between pedestrians and then applies a long short-term memory (LSTM) network to predict future trajectories. Following this pattern, later methods [5,6,7,8,9] share interaction information through different mechanisms, such as attention mechanisms and similarity measures. Meanwhile, to reflect the diversity of trajectories, some generative adversarial network (GAN) methods [10,11,12,13,14,15] learn to generate multiple feasible trajectories rather than predict a single one. The works in [16, 17] have also provided useful inspiration.

The above methods share a limitation: they usually assume homogeneity between the training and testing sets, i.e., training and testing are conducted on datasets with the same data distribution. Test results obtained in this way therefore do not generalize and do not carry over to pedestrian trajectory prediction in the real world. The authors of [18] quantitatively and objectively evaluated the potential domain differences between the ETH and UCY experimental datasets. Table 1 provides numerical statistics for five trajectory domains, including the number of pedestrians, walking speed, and acceleration. According to this table, the number of pedestrians differs greatly among the trajectory domains. In terms of movement patterns, the average pedestrian speed in ETH is the highest, almost three times that of HOTEL, and the average pedestrian acceleration in ETH is also the highest, nearly five times that of ZARA2. The E-D and S-D values also reveal large differences between the five trajectory domains. Situations in real life are undoubtedly more complex and diverse, and pedestrian trajectory prediction must cope with different scenarios. In shopping malls, for example, pedestrians are crowded and dense, and their trajectories are more curved, constantly changing to avoid collisions. On sidewalks, by contrast, pedestrians are sparse and almost always walk in the same direction, so their trajectories are mostly straight lines. These two cases represent different scenarios, and a network trained in a shopping mall environment may carry redundant information for sidewalks, which reduces the accuracy of pedestrian trajectory prediction.

Table 1 Statistics of five different scenes, ETH, HOTEL, UNIV, ZARA1, and ZARA2

To overcome these limitations, we add a domain adaptation module and a continual learning module to the original model. First, we replace the aggregation layer by modeling pedestrian trajectories in the source and target domains as spatio-temporal graphs. Then, we use MMD (maximum mean discrepancy) to quantify the distance between the distributions of the source-domain and target-domain datasets, which measures the similarity of the two datasets. The MMD loss is used to reduce the training problems caused by differing data distributions. Finally, EWC (elastic weight consolidation) regularization applies a reward and punishment mechanism to the training of the model parameters, so that knowledge of the old dataset is not forgotten when training on a new one. As a result, our model both generalizes well and is able to overcome catastrophic forgetting.

In summary, our work makes the following contributions:

  1. We propose a more detailed division of pedestrian interactions, which comprehensively considers elements such as distance and direction. We believe that this division can be used widely in pedestrian trajectory prediction.

  2. We embed a domain adaptation module that enhances the learning ability of the model on new datasets by calculating the similarity of the data distributions in the target and source domains.

  3. We introduce a continual learning module that preserves the model's memory of previous data by limiting parameter changes, effectively relieving catastrophic forgetting after the model learns a new dataset.

  4. To the best of our knowledge, this is the first paper to combine domain adaptation and continual learning methods for trajectory prediction, in pursuit of real-world utility for deep learning-based trajectory prediction models.

2 Related work

2.1 Forecasting pedestrian trajectory

Pedestrian trajectory prediction aims to predict the future positions of a target agent based on its past positions and the surrounding environment. Early research used traditional mathematical models [19] for prediction, such as Gaussian models [20, 21] and Markov decision models [22]. However, traditional mathematical models rely heavily on manually annotated prior knowledge, which is costly and lacks accuracy. With the development of deep learning, a large number of deep learning methods have been applied to this problem. In Social-LSTM [4], pedestrians are modeled using recurrent neural networks (RNNs), and a pooling layer integrates the hidden states of pedestrians so that human–human interaction features are shared. However, RNN-based models suffer from vanishing gradients and poor scalability during training, so many works [3, 5,6,7,8,9, 23] have combined them with other networks to improve prediction performance. In addition, considering the subjective initiative and uncertainty of pedestrians’ walking, GAN-based pedestrian trajectory prediction methods have been proposed [10,11,12,13,14,15]. They bring adversarial training to the task and overcome the shortcoming of previous methods, which mostly optimize a distance measure and predict only one average trajectory. Furthermore, trajectory prediction methods based on spatio-temporal graphs [24] have also been widely applied. In the prediction task, the spatio-temporal graph is divided into two dimensions, time and space, which respectively model the pedestrian's historical trajectory and the social interaction between pedestrians. Attention mechanisms have also been introduced into this task [25,26,27]; they encode the different importance of adjacent pedestrians for trajectory prediction to improve prediction accuracy. The attention mechanism breaks the sequential dependence of the RNN and provides a more intuitive way to model the topological structure of pedestrians in a shared space.

2.2 Domain adaptation

Domain adaptation is a type of transfer learning that deals with the problem of different data distributions between the source and target domains. In deep learning, researchers usually assume that the training dataset and the target test dataset have the same data distribution. However, in real life, this assumption is often difficult to satisfy. When there is a large difference in data distribution between the training dataset and the target test dataset, overfitting can easily occur, causing the trained model to perform poorly on the test dataset. To solve this problem, researchers have proposed three methods: feature adaptation, instance adaptation, and model adaptation.

Feature adaptation involves extracting the features of the source and target domains into a common feature space where the distance between the source and target domains is close enough to align them, thus improving the performance of the target domain. Instance adaptation assigns weights to the source domain data that are similar to the target domain data and uses these data to train the model, which performs relatively well on the target domain. Model adaptation finds some parameters for transfer learning to improve the performance of the target domain.

In our task, we use the first method, feature adaptation. We measure the distance between the source and target domains with a closed-form distance metric and use this distance as a loss function in the deep learning network, so that minimizing it aligns the features of the source and target domains. Popular distance metrics include MMD [28], CORAL [29], and adversarial losses [30].

2.3 Continual learning

Deep learning-based AI models have achieved strong performance, even surpassing humans on individual tasks. However, deep learning models are mostly trained on static, identically distributed datasets and cannot adapt or extend their behavior over time. To give deep models a human-like ability to learn multiple tasks and to apply multiple types of knowledge across them, the concept of continual learning [31,32,33] was proposed. Ring (1997) defines continual learning as a process of continual development based on complex environments and behaviors, in which more complicated competencies are built on top of those already learned.

Continual learning on deep neural networks has two goals: one is to deal with the catastrophic forgetting problem [34], which is inherent in neural networks because of their design; the other is to make the trained model more general, meaning that the model can learn new knowledge and retain old knowledge at the same time. Continual learning can be subdivided into four categories: (1) task-incremental CL, (2) class-incremental CL, (3) domain-incremental CL, and (4) task-agnostic CL, which is the most challenging continual learning scenario. In our work, we deal with domain-incremental continual learning. This means that the data arriving at different times belong to the same categories of the same task, but the data arrive in batches and the distribution of the input data changes. Its basic assumptions are therefore: (1) \(P( x^{t} ) \ne P ( x^{t+1} )\), (2) \(\{ y^{t} \} = \{ y^{t+1} \}\), and (3) \(P ( y^{t} ) \ne P ( y^{t+1} )\), with \(P(\cdot )\) denoting a probability distribution.

Domain incremental continual learning is different from domain adaptation, which aims to transfer knowledge from old tasks to new tasks and only considers the generalization ability on new tasks, while domain incremental continual learning needs to overcome catastrophic forgetting and maintain performance on old tasks as well as new ones. Our method combines the characteristics of continual learning and domain adaptation, hoping to inherit the advantages of both.

3 Problem description

Given the observed trajectory \( V^{i} = \left\{ v_{1}^{i},...,v_{\text {obs}}^{i} \right\} \) of a pedestrian i from step \( T_{1}\) to \( T_{\text {obs}} \), we aim to predict its future trajectory \( \bar{V}^{i} = \left\{ \bar{v}_{\text {obs}+1}^{i},...,\bar{v}_{\text {pred}}^{i} \right\} \) from step \(T_{\text {obs}+1}\) to \(T_{\text {pred}}\), where \(v_{t}^{i} = (x_{t}^{i},y_{t}^{i})\in \mathbb {R}^{2} \) denotes the coordinates at time t. Considering all the pedestrians in the scene, the goal is to predict the trajectories of all pedestrians simultaneously with a model \( f(\cdot )\) with parameters \( W^{*}\). The entire representation is:

$$\begin{aligned} \bar{V}=\left\{ f(V^{1},...,V^{N}\mid W^{*}) \right\} , \end{aligned}$$
(1)

where \(\bar{V}\) is the set of future trajectories of all the pedestrians, N denotes the number of pedestrians, and \(W^{*}\) represents the set of all learnable parameters in the model.

4 Our methods

4.1 Social behavior classification

Our model follows Social-STGCNN, which consists of a spatio-temporal graph convolutional neural network (ST-GCN) and a temporal extrapolator convolutional neural network (TXP-CNN). In this model, a set of spatial graphs \(G_{t}\) is first constructed, which represent the relative positions of pedestrians in the scene at each time step t. \(G_{t}\) is defined as \(G_{t} = ( V_{t},E_{t} ) \), where \(V_{t} = \{ v_{i} = ( x_{i}^{t}, y_{i}^{t} ) \mid \forall i\in \{ 1,\ldots ,N \} \} \) is the set of vertices of the graph \(G_{t}\), and \( ( x_{i}^{t},y_{i}^{t} ) \) is the position of pedestrian i at time \(t \in \{ 1,\ldots ,T_{\text {obs}} \} \). \(E_{t} = \{ e^{i,j} \mid \forall i,j\in \{ 1,\ldots ,N \} \} \) is the set of edges of the graph \(G_{t}\), which represent the interaction between node i and node j. To model the strength of the mutual influence between two nodes, an adjacency matrix \( A_{t} \) holding the edge weights needs to be established.

In real life, the strength of the interaction between pedestrians essentially depends on whether their trajectories will intersect. Generally speaking, pedestrians change their walking behavior in two cases: one is subjective initiative, that is, the destination changes; the other is to avoid other pedestrians with whom they may collide. To define \( A_{t} \) more specifically, we propose a new definition method, social behavior classification (SBC). We divide the possible pedestrian configurations into the following categories: 1. Pedestrians walk in groups. 2. Pedestrians are too far apart. 3. Pedestrians walk away from each other, back to back. 4. Pedestrians walk in opposite directions without collision. 5. Pedestrians walk in opposite directions and will collide. 6. Pedestrians walk in different directions without collision. 7. Pedestrians walk in different directions and will collide. In Fig. 1, when the distance between pedestrians is less than r, they are regarded as a group whose members have no influence on each other. We then define a pedestrian circle with radius R and treat everything outside the circle as infinitely far away, so that pedestrians outside the circle have no interaction with the target agent.

Fig. 1 Flowchart of our model. Given the trajectories of the source domain and the target domain, we first construct the spatial feature graph G of both with the social behavior classification method, then extract the spatio-temporal node embedding of graph G through the spatio-temporal graph convolution network ST-GCN, and finally predict the trajectory with TXP-CNN. When training on the target domain, the distribution difference between the source domain and the target domain is calculated first, and the magnitude of the parameter changes during training is used to construct an importance matrix; the MMD loss and the EWC loss are then combined to train on the target domain

Fig. 2 The 7 categories defined by the SBC method. v represents the walking direction of the pedestrian \(V_{i}\) at time T, \(\alpha \) and \(\beta \) represent the angles between the walking directions and the line between the agents, and d represents the distance between the agents

For the latter cases, we introduce the direction angles \( \alpha \) and \( \beta \). When both \( \alpha \) and \( \beta \) are obtuse, the mutual influence is 0. When both \( \alpha \) and \( \beta \) are acute, the mutual influence is positive. However, even when both angles are acute, the mutual influence can still be 0, as in cases 4 and 6 in Fig. 2. To distinguish these cases, we define a new variable \( \gamma \): if the extension lines of the two pedestrians' final velocity directions intersect, a collision is expected and \( \gamma = 1 \); otherwise \( \gamma = 0\). Thus, \(A_{t}\) is defined as:

$$\begin{aligned} A_{t} = \gamma *l(v_{i}, v_{j} )*D(\alpha ,\beta ). \end{aligned}$$
(2)

More specifically:

$$ l(v_{i} ,v_{j} ) = \left\{ \begin{array}{ll} 0, & \parallel v_{i} - v_{j} \parallel _{2} < r \text{ or } \parallel v_{i} - v_{j} \parallel _{2} > R \\ \frac{1}{\parallel v_{i} - v_{j} \parallel _{2}}, & \text{otherwise} \end{array} \right. $$
(3)
$$ D(\alpha ,\beta ) = \left\{ \begin{array}{ll} \cos \alpha \cos \beta , & 0 \le \alpha ,\beta \le \frac{\pi }{2} \\ 0, & \text{otherwise} \end{array} \right. $$
(4)
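The construction of \(A_{t}\) in Eqs. (2)–(4) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not necessarily the implementation behind the reported results: the function names, the radii r = 0.5 and R = 10.0, the treatment of stationary pedestrians, and the handling of exactly parallel walking directions are hypothetical choices. The direction factor follows the rule stated in the text that the influence is nonzero only when both direction angles are acute.

```python
import math


def direction_angle(velocity, to_other):
    """Angle in [0, pi] between a walking direction and the line to the other agent."""
    norm = math.hypot(*velocity) * math.hypot(*to_other)
    if norm == 0.0:
        return math.pi / 2  # assumption: a stationary agent exerts no influence
    dot = velocity[0] * to_other[0] + velocity[1] * to_other[1]
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def collision_flag(p_i, v_i, p_j, v_j, eps=1e-9):
    """gamma: 1 if the forward extensions of the two walking directions meet, else 0."""
    dx, dy = p_j[0] - p_i[0], p_j[1] - p_i[1]
    cross = v_i[0] * v_j[1] - v_i[1] * v_j[0]
    offset = dx * v_i[1] - dy * v_i[0]  # zero when agent j lies on agent i's line
    if abs(cross) < eps:
        # Parallel directions (assumption): only a head-on approach on one line counts.
        approaching = (v_i[0] * dx + v_i[1] * dy) > 0 and (v_j[0] * dx + v_j[1] * dy) < 0
        return 1 if abs(offset) < eps and approaching else 0
    s = (dx * v_j[1] - dy * v_j[0]) / cross  # position of the crossing along ray i
    u = offset / cross                       # position of the crossing along ray j
    return 1 if s > 0 and u > 0 else 0


def sbc_weight(p_i, v_i, p_j, v_j, r=0.5, R=10.0):
    """One entry of A_t: gamma * l(v_i, v_j) * D(alpha, beta)."""
    to_j = (p_j[0] - p_i[0], p_j[1] - p_i[1])
    dist = math.hypot(*to_j)
    if dist < r or dist > R:  # same group, or outside the pedestrian circle
        return 0.0
    alpha = direction_angle(v_i, to_j)
    beta = direction_angle(v_j, (-to_j[0], -to_j[1]))
    if alpha > math.pi / 2 or beta > math.pi / 2:
        return 0.0
    gamma = collision_flag(p_i, v_i, p_j, v_j)
    return gamma * (1.0 / dist) * math.cos(alpha) * math.cos(beta)


def sbc_adjacency(positions, velocities, r=0.5, R=10.0):
    """Adjacency matrix A_t for one time step, with a zero diagonal."""
    n = len(positions)
    return [
        [
            0.0 if i == j else sbc_weight(positions[i], velocities[i],
                                          positions[j], velocities[j], r, R)
            for j in range(n)
        ]
        for i in range(n)
    ]
```

For two pedestrians 4 m apart walking straight toward each other, this sketch yields a weight of 1/4, whereas the same pair walking back to back, standing closer than r, or standing farther apart than R yields 0.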

4.2 Domain adaptation module

Most existing trajectory prediction methods use the same dataset for training, validation, and testing. The data used for training and testing therefore follow the same distribution, which is very different from the real-life situation. To address the problem of differing data distributions, we introduce a domain adaptation module into the model. We use the maximum mean discrepancy (MMD) loss function to measure the distance between the distributions of the source and target domains. The MMD loss is defined as follows:

$$ {\text{MMD}}(X,Y) = \left\| \frac{1}{n}\sum\limits_{i = 1}^{n} \phi (x_{i} ) - \frac{1}{m}\sum\limits_{j = 1}^{m} \phi (y_{j} ) \right\| ^{2} . $$
(5)

The key to MMD is finding a suitable mapping function \(\phi (\cdot ) \). This mapping may differ between tasks and may map into a high-dimensional space, so it is difficult to select or define explicitly. In our method, we therefore use a Gaussian kernel function, \(k (u,v) =e^{-\frac{\parallel u-v \parallel ^{2}}{\sigma }} \), because the Gaussian kernel corresponds to a mapping into an infinite-dimensional space. With this kernel, the MMD loss becomes:

$$ \begin{gathered} {\text{MMD}}(X,Y) = \frac{1}{n^{2}}\sum\limits_{i}^{n} \sum\limits_{i^{\prime }}^{n} k\left( x_{i} ,x_{i^{\prime }} \right) - \frac{2}{nm}\sum\limits_{i}^{n} \sum\limits_{j}^{m} k\left( x_{i} ,y_{j} \right) \hfill \\ \quad \quad \quad \quad \quad \quad + \frac{1}{m^{2}}\sum\limits_{j}^{m} \sum\limits_{j^{\prime }}^{m} k\left( y_{j} ,y_{j^{\prime }} \right) \hfill \\ \end{gathered} $$
(6)
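To make the kernel form of the MMD loss concrete, the following Python sketch evaluates it for two small sets of feature vectors. It is a minimal illustration under assumptions: the function names and the bandwidth value sigma = 1.0 are hypothetical, the features are plain tuples rather than network activations, and the estimator includes the diagonal kernel terms, as the double sums over all index pairs suggest.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """k(u, v) = exp(-||u - v||^2 / sigma) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / sigma)


def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd(source, target, sigma=1.0):
    """Kernel MMD: mean k(x, x') - 2 * mean k(x, y) + mean k(y, y')."""
    return (
        mean_kernel(source, source, sigma)
        - 2.0 * mean_kernel(source, target, sigma)
        + mean_kernel(target, target, sigma)
    )


# Hypothetical toy features: the target is the source shifted by a constant offset.
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
near = [(x + 0.5, y + 0.5) for x, y in source]
far = [(x + 3.0, y + 3.0) for x, y in source]
```

In this toy example the value is zero when both sets coincide and grows as the target set is shifted further from the source, which is the behavior the alignment loss relies on.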

After obtaining the MMD loss, we use it in the loss reward and punishment mechanism to align the distributions and further strengthen the generalization ability of the model, so that the model is not limited to the dataset used for training. This helps in handling the complex data distributions encountered in real life.

4.3 Continual learning module

Catastrophic forgetting has always been a challenge for data-driven models. When a model is trained on a new dataset, it is difficult to guarantee that it still maintains its prediction accuracy on the previously learned dataset. Trajectory prediction faces complex and diverse situations in real life, so when new data enter the model, retaining knowledge of previously encountered situations becomes a top priority. During the training of a neural network, changes in the parameters reflect what the network learns, and the parameters retain the knowledge learned from the dataset. Across different datasets, some parameters change greatly and others change little. A parameter that changes greatly contributes more to learning the dataset, while a parameter that changes little contributes less. Based on this, we believe that previously learned knowledge can be preserved by limiting the variation of specific parameters of the network, as shown in Fig. 3. First, we use the Fisher matrix to measure the importance of parameters. The Fisher matrix is defined as follows:

Fig. 3 Schematic diagram of the learning directions for task B, starting from task A, when using EWC regularization, L2 regularization, and no penalty, respectively. This figure first appeared in [35]

$$\begin{aligned} I (\theta )= E\left[\left( \frac{\partial }{\partial \theta }\text {log}f ( x\mid \theta )\right) ^{2}\mid \theta \right], \end{aligned}$$
(7)

where \( \theta \) represents the learnable parameters. In practice, the Fisher matrix can be calculated from gradients, so the formula can be rewritten as:

$$\begin{aligned} I (\theta ) = \frac{1}{N}\sum _{( x,y)_{i}\in A } \left( \frac{\partial \log L(\theta \mid ( x,y )_{i} ) }{\partial \theta } \right) ^{2}, \end{aligned}$$
(8)

where A represents task A, N is the number of samples \(( x,y)_{i}\) in A, and \( \theta \) represents the learnable parameters.
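The gradient-based estimate can be sketched in a few lines of Python: the importance of each parameter is the average of its squared per-sample gradient over the data of task A. The example below is illustrative only. The two-parameter linear model, the squared-error loss (a Gaussian negative log-likelihood up to a constant), and all function names are assumptions made for the sketch; in the actual model the gradients would come from automatic differentiation.

```python
def squared_error_grad(theta, sample):
    """Gradient of 0.5 * (w * x + b - y)^2 with respect to (w, b) for one sample."""
    w, b = theta
    x, y = sample
    residual = w * x + b - y
    return [residual * x, residual]


def fisher_diagonal(theta, samples, grad_fn):
    """Average of squared per-sample gradients, one importance value per parameter."""
    totals = [0.0] * len(theta)
    for sample in samples:
        for k, g in enumerate(grad_fn(theta, sample)):
            totals[k] += g * g
    return [t / len(samples) for t in totals]


# Hypothetical task-A data (x, y) and parameters (w, b) reached after training on A.
task_a = [(1.0, 2.0), (2.0, 5.0)]
theta_a = [1.0, 0.0]
importance = fisher_diagonal(theta_a, task_a, squared_error_grad)
```

Only the diagonal of the importance matrix is kept, which is all that a per-parameter penalty of the EWC kind requires.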

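After obtaining the importance matrix, we apply the reward and punishment mechanism to the parameters. The EWC loss is defined as:

$$\begin{aligned} L_{\text {ewc}} = \sum _{i=1}^{\text {params}} [ I_{A} ]_{ii} \frac{( \theta _{i}- [\theta _{i} ] _{A}^{*} )^{2} }{2}, \end{aligned}$$
(9)

where \(\theta \) represents the learnable parameters, \([\theta _{i} ] _{A}^{*}\) is the value of the i-th parameter after training on task A, and \(I_{A}\) represents the importance matrix. The loss grows when parameters that are important for task A move away from the values learned on A, so minimizing it limits such changes.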
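A minimal Python sketch of the EWC penalty is given below. It assumes that only the diagonal of the importance matrix is used and writes the penalty with a positive sign, so that minimizing it discourages drift from the task-A parameters; this sign convention, the function name, and the toy values are assumptions for illustration.

```python
def ewc_penalty(theta, theta_star, importance):
    """Sum over parameters of importance_i * (theta_i - theta_star_i)^2 / 2."""
    return sum(
        f * (t - t_star) ** 2 / 2.0
        for t, t_star, f in zip(theta, theta_star, importance)
    )


# Hypothetical values: the first parameter matters much more for task A.
importance = [4.0, 0.1]
theta_star = [0.0, 0.0]

cost_moving_important = ewc_penalty([1.0, 0.0], theta_star, importance)
cost_moving_unimportant = ewc_penalty([0.0, 1.0], theta_star, importance)
```

Moving the important parameter by one unit costs 2.0, while moving the unimportant one by the same amount costs 0.05, so training on the new domain is steered toward directions that matter little for the old one.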

4.4 Objective function

The overall objective function consists of three parts: the prediction loss \(L_{\text {pred}} \) measures the error of the predicted future trajectory, the alignment loss \(L_{\text {mmd}} \) aligns the distributions of the source and target trajectory domains, and the EWC loss \(L_{\text {ewc}} \) constrains the learning direction so that knowledge of the source domain is retained while learning the target domain. The prediction loss \(L_{\text {pred}} \) is the negative log-likelihood:

$$\begin{aligned} L_{\text {pred}} = -\sum _{t=T_{\text {obs}+1} }^{T_{\text {pred}} } \log (P((x_{t}^{i},y_{t}^{i})\mid \hat{\mu }_{t}^{i},\hat{\sigma }_{t}^{i},\hat{\rho }_{t}^{i})). \end{aligned}$$
(10)

The entire model is jointly trained by \(L_{\text {pred}}, L_{\text {mmd}}, L_{\text {ewc}} \), thus we have:

$$\begin{aligned} L = L_{\text {pred}} + \lambda L_{\text {mmd}} + \mu L_{\text {ewc}}, \end{aligned}$$
(11)

where \(\lambda \) and \(\mu \) are hyperparameters for balancing these three terms.
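The following Python sketch shows how the three terms could be combined. It assumes, following Social-STGCNN, that the predicted distribution at each step is a bivariate Gaussian with mean \(\hat{\mu }\), standard deviations \(\hat{\sigma }\), and correlation \(\hat{\rho }\); the function names are hypothetical, and the MMD and EWC values are passed in as precomputed numbers.

```python
import math


def bivariate_gaussian_nll(point, mean, sigma, rho):
    """Negative log-density of a 2-D Gaussian with correlation rho at `point`."""
    dx = (point[0] - mean[0]) / sigma[0]
    dy = (point[1] - mean[1]) / sigma[1]
    one_minus_rho2 = 1.0 - rho * rho
    z = dx * dx + dy * dy - 2.0 * rho * dx * dy
    log_norm = math.log(2.0 * math.pi * sigma[0] * sigma[1] * math.sqrt(one_minus_rho2))
    return log_norm + z / (2.0 * one_minus_rho2)


def prediction_loss(points, means, sigmas, rhos):
    """Sum of per-step negative log-likelihoods over the prediction horizon."""
    return sum(
        bivariate_gaussian_nll(p, m, s, r)
        for p, m, s, r in zip(points, means, sigmas, rhos)
    )


def total_loss(l_pred, l_mmd, l_ewc, lam=1.0, mu=1.0):
    """L = L_pred + lambda * L_mmd + mu * L_ewc."""
    return l_pred + lam * l_mmd + mu * l_ewc
```

With both weights equal to 1, the three terms are simply added; larger values of the weights trade prediction accuracy on the current data for better alignment or stronger retention.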

5 Experiments

In this section, we first present the definition of our proposed new setting as well as the evaluation protocol. Then, we carry out extensive evaluations of our proposed model under this new setting, in comparison with existing methods.

Dataset. Experiments are conducted on two real-world datasets, ETH [36] and UCY [37], which are public and widely used for this task. ETH consists of two scenes named ETH and HOTEL, and UCY consists of three scenes named UNIV, ZARA1, and ZARA2. The datasets contain a large number of interactions between pedestrians and their surroundings, including pedestrian–pedestrian interactions and pedestrian–environment interactions, such as pedestrian crossing, group and individual movements, crowd gathering and dispersal, and collision avoidance. In the scenes of the ETH dataset, most trajectories are simple straight lines, and there is not much social or spatial interaction between pedestrians. In contrast, the scenes of the UCY dataset tend to show more social interactions between pedestrians and more interactions between pedestrians and the surrounding environment.

Experimental settings. We introduce a new experimental setting that treats each scene as a trajectory domain. The model is first trained on one domain (the source), then trained separately on the validation set of each of the other four domains (the targets), and finally tested on both the target and the source domain. Given 5 trajectory domains, we have a total of 20 trajectory prediction tasks: \(A\rightarrow B'/C'/D'/E'\), \(B\rightarrow A'/C'/D'/E'\), \(C\rightarrow A'/B'/D'/E'\), \(D\rightarrow A'/B'/C'/E'\), \(E\rightarrow A'/B'/C'/D'\), where A, B, C, D, and E represent ETH, HOTEL, UNIV, ZARA1, and ZARA2, respectively. This setup is challenging because of the catastrophic forgetting problem and the domain differences.
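The task list can be enumerated programmatically as all ordered pairs of distinct domains, which gives 5 × 4 = 20 tasks. The short Python sketch below does this; the dictionary and function names are hypothetical.

```python
from itertools import permutations

# Letter codes for the five trajectory domains, as used in the task notation.
DOMAINS = {"A": "ETH", "B": "HOTEL", "C": "UNIV", "D": "ZARA1", "E": "ZARA2"}


def enumerate_tasks(domains=DOMAINS):
    """All (source, target) pairs with source != target, in a fixed order."""
    return list(permutations(domains, 2))


tasks = enumerate_tasks()
# Human-readable form, e.g. "ETH -> HOTEL'" for the task written A -> B'.
task_names = [f"{DOMAINS[s]} -> {DOMAINS[t]}'" for s, t in tasks]
```

Each source domain appears in exactly four tasks, once with every other domain as the target.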

Evaluation protocol. To ensure a fair comparison under the new setting, each existing baseline is trained on the training set of the source trajectory domain and the validation set of the target trajectory domain. Specifically, taking \(A\rightarrow B'\) as an example, a baseline is trained on the training set of A and the validation set of B, and then evaluated on the test sets of A and B. Note that the validation and test sets are independent of each other, and there are no overlapping samples between them.

Baselines. Five state-of-the-art methods are compared with our proposed method under the new setting and the evaluation protocol: Social-STGCNN [38], Star [24], LB-EBM [39], SGCN [40], and SocialVAE [41]. Each model corresponds to 20 tasks, for a total of 80 comparison tasks.

Evaluation metrics. The following two metrics are used for performance evaluation, where \( N^{t} \) is the total number of pedestrians in the target trajectory domain, \( \bar{v}_{t}^{i}\) are the predictions, and \( {v_{t}^{i}} \) are the ground-truth coordinates.

Table 2 ADE results of our model in comparison with existing state-of-the-art baselines on 20 tasks
Table 3 FDE results of our model in comparison with existing state-of-the-art baselines on 20 tasks
  • Average displacement error (ADE):

    $$\begin{aligned} ADE = \frac{ {\textstyle \sum _{i=1}^{N^{t}} {\textstyle \sum _{t=T_{\text {obs}+1} }^{T_{\text {pred}}} \parallel v_{t}^{i} - \bar{v}_{t}^{i} \parallel _{2}}}}{N^{t}(T_{\text {pred}} - T_{\text {obs}} ) } . \end{aligned}$$
    (12)
  • Final displacement error (FDE):

    $$\begin{aligned} FDE = \frac{{\textstyle \sum _{i=1}^{N^{t}}}\parallel v_{\text {pred}}^{i} - \bar{v}_{\text {pred}}^{i} \parallel _{2}}{N^{t}} . \end{aligned}$$
    (13)
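Both metrics are straightforward to compute. The Python sketch below is a minimal illustration that assumes each trajectory is a list of (x, y) tuples covering only the predicted steps and that all trajectories have the same length; the function names are hypothetical.

```python
import math


def displacement(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ade(truth, pred):
    """Mean displacement over all pedestrians and all predicted time steps."""
    total = sum(
        displacement(g, p)
        for gt_track, pr_track in zip(truth, pred)
        for g, p in zip(gt_track, pr_track)
    )
    return total / (len(truth) * len(truth[0]))


def fde(truth, pred):
    """Mean displacement at the final predicted time step."""
    total = sum(displacement(g[-1], p[-1]) for g, p in zip(truth, pred))
    return total / len(truth)


# Hypothetical example: two pedestrians, two predicted steps each.
truth = [[(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (0.0, 2.0)]]
pred = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (0.0, 2.0)]]
```

In this example the per-step errors are 0, 1, 5, and 0, so the ADE is 1.5, while the final-step errors are 1 and 0, so the FDE is 0.5.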

Implementation details. As in previous baselines, 8 frames are observed and the next 12 frames are predicted. The number of ST-GCN layers is set to 1 and the number of TXP-CNN layers to 5. In the training phase, the batch size is set to 128, and both \( \lambda \) and \( \mu \) are set to 1. The whole model is trained for 250 epochs with the Adam optimizer. The initial learning rate is 0.01 and is changed to 0.002 after 150 epochs. In the inference phase, 20 predicted trajectories are sampled and the best of the 20 predictions is used for evaluation.
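Some of these settings can be written down as a short Python sketch: the 8/12 split of a track into observed and future frames, the step learning-rate schedule, and the selection of the best of several sampled predictions. The function names are hypothetical, epochs are assumed to be counted from 0, and "best" is assumed to mean the sample with the lowest average displacement to the ground truth.

```python
import math

OBS_LEN, PRED_LEN = 8, 12  # observed frames and predicted frames


def split_track(track, obs_len=OBS_LEN, pred_len=PRED_LEN):
    """Split one pedestrian track into its observed part and its future part."""
    if len(track) < obs_len + pred_len:
        raise ValueError("track is shorter than obs_len + pred_len")
    return track[:obs_len], track[obs_len:obs_len + pred_len]


def learning_rate(epoch, initial=0.01, reduced=0.002, switch_epoch=150):
    """Step schedule: `initial` for the first `switch_epoch` epochs, then `reduced`."""
    return initial if epoch < switch_epoch else reduced


def track_ade(truth, pred):
    """Average displacement between one ground-truth track and one prediction."""
    return sum(math.hypot(g[0] - p[0], g[1] - p[1]) for g, p in zip(truth, pred)) / len(truth)


def best_of_k(truth, samples):
    """Return the sampled prediction closest to the ground truth (assumed criterion)."""
    return min(samples, key=lambda s: track_ade(truth, s))


# Hypothetical track: a pedestrian walking along the x-axis for 20 frames.
track = [(float(t), 0.0) for t in range(20)]
observed, future = split_track(track)
```

During evaluation, `best_of_k` would be called with the 20 sampled trajectories of each pedestrian before the metrics are accumulated.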

5.1 Quantitative analysis

Tables 2 and 3 show the evaluation results of our method and 4 baselines on 20 tasks. From these two tables, we can see that our method outperforms these baselines in some cases. Because the target domain is trained only on its validation set, the amount of target data is much smaller than that of the source domain, which greatly limits the test performance of a model on the target domain. Our model, however, includes a domain adaptation module that accounts for the difference between the data distributions of the target and source domains, so our method achieves better ADE/FDE on the target domain. For the source domain, when the model is trained on the target domain, the learning direction is inevitably biased toward the target domain and knowledge of the source domain is not retained, which is the problem of catastrophic forgetting. Even after training on a target domain with much less data than the source domain, all these baseline methods degrade significantly on the source domain, with drops of roughly 70%. In contrast, because the EWC module controls the parameter changes and retains the knowledge of the source domain, our model still maintains good performance on the source domain, with the drop kept within 15%.

Table 4 ADE/FDE results obtained by replacing the kernel function with the SBC method in Social-STGCNN

To verify the effectiveness of our proposed SBC method, we replace the kernel function in Social-STGCNN with SBC, and the results are shown in Table 4. The results show that our SBC method has a positive effect on pedestrian trajectory prediction. In most cases, we outperform the original kernel function, which means that distance is not the only factor that affects pedestrian interaction, and pedestrians’ social habits, etc., also affect pedestrian trajectories.

Table 5 ADE/FDE results of domain adaptation module and continuous learning module ablation experiments

To verify the effectiveness of each module, we conducted further ablation experiments on the network structure of this paper. We used the ETH dataset as the source dataset and the HOTEL dataset as the target dataset. After training on the source dataset, we obtained the ADE/FDE results on its test set. Then, we trained on the validation set of the target dataset and tested on its test set to verify the effectiveness of the domain adaptation module. Finally, we tested the performance of the model on the test set of the source dataset to verify the effectiveness of the continual learning module.

As shown in Table 5, compared with 1.2, the network structure with the domain adaptation module achieved better performance on the target dataset. However, due to the existence of the domain adaptation module, the network parameters quickly approached the target dataset, resulting in severe knowledge loss on the source dataset and leading to a significant drop in performance when returning to train on the source dataset. Compared with 1.3, the continual learning module effectively mitigated the catastrophic forgetting problem. After training on the target dataset, the network still maintained good predictive performance on the source dataset. Compared with 1.2.3.4, we found that the network could achieve our expected results, that is, to maintain the predictive performance on the source dataset while improving the predictive performance on the target dataset with the combined action of the domain adaptation and continual learning module.

6 Conclusion

In this paper, we propose a unified model that incorporates graph neural networks and temporal convolutional neural networks for future trajectory prediction and adds domain adaptation and continual learning modules to mitigate domain differences and catastrophic forgetting. Extensive experiments demonstrate the advantages of our model in future trajectory prediction. Our work is the first to combine domain adaptation and continual learning to study future trajectory prediction, and it is a step toward applying deep learning-based trajectory prediction in real life. In future work, we will conduct more in-depth research on the factors affecting pedestrian trajectories.