
1 Introduction

Distributed event representations are a widely used machine-readable representation of events, known to capture meaningful features relevant to various applications [1, 2, 13]. Because event texts are very short, capturing their semantic relationships is a challenging task. For example, despite the high lexical overlap between “military launch program” and “military launch missile”, their semantic similarity is limited, whereas “military launch program” and “army starts initiative” display a certain degree of semantic similarity even though they share no lexical overlap.

In previous studies [3, 4, 5, 19], Neural Tensor Networks (NTNs) [16] have commonly been used to construct event representations by composing the constituent elements of an event, i.e., (subject, predicate, object). However, these approaches entail a substantial compositional inductive bias and are inadequate for handling events that possess additional arguments [6]. Recent research has demonstrated the effectiveness of employing powerful PLMs, such as BERT [9], to create flexible event representations instead of static word vector compositions [17, 21]. Nevertheless, using PLMs alone to learn event representations is insufficient for capturing the complicated relationships between events. To tackle this challenge, some researchers have proposed applying graph neural networks to event representation learning, which has been shown to yield better performance [22, 23]. However, graph neural networks often come with high computational complexity, which can make the models difficult to train [24]. To fully capture complicated event relations and efficiently learn event representations, Gao et al. proposed SWCC [6], which leverages contrastive learning [7] to improve the event comprehension ability of PLMs and has achieved state-of-the-art results on event similarity tasks and downstream tasks.

In our work, we argue that PLMs contain a rich amount of event comprehension ability, but previous works did not make full use of it. Inspired by advances in prompt learning [14, 15], we observe that providing task descriptions to PLMs can help elicit the knowledge embedded within them, and this knowledge can then be used to enhance event representation learning. To learn event representations, previous works [6, 21] leverage contrastive learning to improve the event comprehension capacity of PLMs. However, they share three common limitations. First, event texts are relatively short, which differs significantly from the text lengths used during the pre-training of language models. As a result, the distribution of text length between pre-training and event representation learning is inconsistent, which may undermine PLM-based event representation learning. Second, the Predicate-Subject-Object (PSO) word order adopted by PLM-based event representation models [6, 21] differs significantly from the natural language word order used during pre-training [9]. In PSO word order, the inversion of subject and predicate can undermine performance, as the pre-trained MLM knowledge may have a counterproductive effect: since MLM pre-training predicts a masked token based on its context [9], a change in word order also changes the positions of that context. The pre-trained MLM knowledge may therefore become a burden for event representation learning under PSO word order. Finally, the state-of-the-art model uses an MLM loss to prevent forgetting of token-level knowledge while training the event representation in the PLM [6]. However, it only randomly masks a single sub-word, which may not provide sufficient understanding of complex event texts [20].

We are motivated to address the above issues with the goal of eliciting the event comprehension capabilities of PLMs. To this end, we present PromptCL: a Prompt template-based Contrastive Learning framework for event representation learning. To address the first issue, we propose a novel prompt template-based contrastive learning method, which injects into contrastive learning a prompt template, borrowed from prompt learning, that comprises a description of the event components. The injection of the prompt template serves two purposes: it extends the length of event texts and provides semantic guidance to PLMs. To address the second issue, we propose the Subject-Predicate-Object (SPO) word order, which aligns with the natural language word order and avoids subject-predicate inversion. To address the final issue, we present EventMLM, a method that focuses on the structure of events and increases the masking rate. EventMLM not only masks entire words but also masks the complete subject, predicate, or object of the event, training PLMs to understand the relationships between event components. Overall, our study makes the following noteworthy contributions:

  • We propose PromptCL, a simple and effective framework that improves event representation learning using PLMs. To the best of our knowledge, this is the first study that utilizes prompt learning and contrastive learning to elicit event representation abilities from PLMs.

  • We introduce prompt template-based contrastive learning that extends the length of event texts and provides semantic guidance to PLMs. Additionally, we introduce the SPO word order and the EventMLM method, which are designed to train PLMs to comprehend the relationships between event components.

  • Our experimental results demonstrate that our framework outperforms previous state-of-the-art methods on event-related tasks. We conduct a thorough analysis of the proposed methods and demonstrate that they generate similarity scores that are more closely aligned with the ground truth labels.

2 The Proposed Approach

This section details our proposed approach, which aims to enhance event representations by eliciting the event comprehension capabilities of PLMs. As illustrated in Fig. 1, the approach comprises three parts: prompt template-based contrastive learning (left), the SPO word order (middle), and EventMLM (right).

2.1 Prompt Template-Based Contrastive Learning

The proposed prompt template-based contrastive learning method augments an event text by randomly inserting a template according to a Bernoulli distribution. The resulting modified event serves as a positive example during contrastive training.

Prompt Template. Given an event \(x=\{x_s, x_p, x_o\}\), the function \(prompt(\cdot )\) inserts a template into the event with a probability \(\pi \) following a Bernoulli distribution:

$$\begin{aligned} x^+ = prompt(x, \pi ) \end{aligned}$$
(1)

and the resulting prompt-augmented event is denoted as \(x^+\):

$$\begin{aligned} x^+ = \{subject\ is\ \underline{x_s}, predicate\ is\ \underline{x_p}, object\ is\ \underline{x_o}\} \end{aligned}$$
(2)

For example, if \(x=\{military, launch, missile\}\), the augmented event \(x^+\) could be:

$$\begin{aligned} x^+=\{subject\ is\ \underline{military}, predicate\ is\ \underline{launch}, object\ is\ \underline{missile}\} \end{aligned}$$
(3)

The random insertion of the template ensures that the model is trained on a slightly diverse set of events, improving its ability to capture the core semantic meaning of events.
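As a concrete illustration, the sketch below renders the prompt-insertion function of Eqs. (1)–(3) in Python; the function body is our own minimal rendering of the described behavior, not the authors' released code.

```python
import random

def prompt(event, pi=0.2):
    """Insert the prompt template into an event (x_s, x_p, x_o) with probability pi.

    With probability pi the event is rewritten as
    "subject is <x_s>, predicate is <x_p>, object is <x_o>";
    otherwise the plain SPO text is returned unchanged.
    """
    x_s, x_p, x_o = event
    if random.random() < pi:  # Bernoulli(pi) draw
        return f"subject is {x_s}, predicate is {x_p}, object is {x_o}"
    return f"{x_s} {x_p} {x_o}"

# Two independent draws give the two positive views used in Sect. 2.1
event = ("military", "launch", "missile")
x_pos_1 = prompt(event, pi=0.2)
x_pos_2 = prompt(event, pi=0.2)
```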

Fig. 1. Architecture of PromptCL.

Dual Positive Contrastive Learning. To amplify the impact of prompt templates and enhance the diversity of positive examples, we introduce an additional positive sample: an input event \(x_i\) is compared not only with its prompt-augmented text \(x^+_{i,1} = prompt(x_i, \pi )\), but also with another prompt-augmented text \(x^+_{i,2} = prompt(x_i, \pi )\). Drawing inspiration from [10] and building on in-batch negatives [7], we extend the InfoNCE objective [12] to:

$$\begin{aligned} \mathcal {L}_{CL}= \sum _{\alpha =\{h^+_{i,1},h^+_{i,2}\}}-\log {\frac{g(h_i, \alpha )}{g(h_i, \alpha ) + \sum _{k\in {\mathcal {N}(i)}}g(h_i, h_k)}} \end{aligned}$$
(4)

where \(h^+_{i,1}\) and \(h^+_{i,2}\) correspond to the event representations of \(x^+_{i,1}\) and \(x^+_{i,2}\), respectively, \(k\in {\mathcal {N}(i)}\) indexes the in-batch negatives, and \(g(\cdot )\) is defined as \(g(h_i, h_k) = \exp (h^{\top }_i h_k / \tau )\), where \(\tau \in \mathbb {R}^+\) is a temperature parameter.
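A possible PyTorch rendering of Eq. (4) is sketched below. It assumes L2-normalized representations and takes the first augmented views of the other in-batch examples as the negatives \(\mathcal {N}(i)\); both choices are assumptions about one common implementation rather than a description of the authors' code.

```python
import torch
import torch.nn.functional as F

def dual_positive_info_nce(h, h_pos1, h_pos2, tau=0.3):
    """Dual-positive contrastive loss in the spirit of Eq. (4).

    h, h_pos1, h_pos2: (batch, dim) representations of each event x_i and its
    two prompt-augmented views. Vectors are L2-normalized before the dot
    product, and the first views of the other in-batch examples act as the
    negatives N(i) (one possible choice).
    """
    h = F.normalize(h, dim=-1)
    h_pos1 = F.normalize(h_pos1, dim=-1)
    h_pos2 = F.normalize(h_pos2, dim=-1)

    # g(h_i, h_k) = exp(h_i . h_k / tau) for every pair in the batch
    g_neg = torch.exp(h @ h_pos1.t() / tau)            # (batch, batch)
    batch = h.size(0)
    off_diag = ~torch.eye(batch, dtype=torch.bool, device=h.device)
    neg_sum = (g_neg * off_diag).sum(dim=-1)           # sum over k in N(i)

    loss = 0.0
    for h_pos in (h_pos1, h_pos2):                     # two positives per anchor
        g_pos = torch.exp((h * h_pos).sum(dim=-1) / tau)
        loss = loss - torch.log(g_pos / (g_pos + neg_sum))
    return loss.mean()
```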

2.2 Subject-Predicate-Object Word Order

Unlike prior studies [6, 21], which used the PSO word order to construct input events, we adopt the Subject-Predicate-Object (SPO) word order. The event text x consists of three components: the subject \(x_s\), the predicate \(x_p\), and the object \(x_o\). Specifically, we feed the PLM an event text consisting of a sequence of tokens in the following input format:

$$\begin{aligned} \mathrm{[CLS]}\ x_s\ x_p\ x_o\ \mathrm{[SEP]} \end{aligned}$$
(5)

Let \(s = [s_0, s_1, ... , s_L]\) be an input sequence, where \(s_0\) corresponds to the \(\mathrm [CLS]\) token and \(s_L\) corresponds to the \(\mathrm [SEP]\) token. When given an event text as input, a PLM generates a sequence of contextualized vectors:

$$\begin{aligned}{}[v_\mathrm{[CLS]}, v_{s_1} , ... , v_{s_L} ] = \textrm{PLM}(s) \end{aligned}$$
(6)

In many PLMs, the representation of the \(\mathrm [CLS]\) token, denoted by \(v_\mathrm{[CLS]}\), serves as the sequence-level input to downstream tasks. Accordingly, we take the \(\mathrm [CLS]\) representation as the final representation of the input sequence, that is, \(h = v_\mathrm{[CLS]}\).
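A minimal sketch of obtaining \(h = v_\mathrm{[CLS]}\) for an event in SPO word order, assuming the Hugging Face transformers API (the use of this library is our assumption):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

# SPO word order: "[CLS] x_s x_p x_o [SEP]" ([CLS]/[SEP] added by the tokenizer)
inputs = tokenizer("military launch missile", return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# h = v_[CLS]: the contextualized vector of the [CLS] token (position 0)
h = outputs.last_hidden_state[:, 0]   # shape: (1, hidden_size)
```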

2.3 Event-Oriented Masking Language Modeling

To fully utilize the text comprehension ability of PLMs, we present a novel event-oriented masking function, denoted by \(em(\cdot )\), which randomly masks a component of the input event using a uniform distribution. For a given event \(x=\{x_s, x_p, x_o\}\), the resulting masked event is denoted as \(x'\):

$$\begin{aligned} x' = em(x) \end{aligned}$$
(7)

For example, if the predicate \(x_p\) is selected uniformly at random to be masked, we replace it with the special \(\mathrm [MASK]\) token. Note that a component may span several tokens, in which case all of them are replaced with \(\mathrm [MASK]\) tokens.

$$\begin{aligned} x'=\{x_s, \mathrm{[MASK]...[MASK]}, x_o\} \end{aligned}$$
(8)

Our EventMLM method thus differs from previous work [6], which masks only a single token: it not only masks several tokens but also respects the components of the event. Moreover, by focusing on the event structure, it trains the PLM to comprehend the relationships between the components, thereby enhancing the event representation. In the example above, the PLM must accurately predict the masked tokens (the predicate \(x_p\)) by understanding the semantic relationship between the subject \(x_s\) and the object \(x_o\).
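The following sketch shows one way the event-oriented masking function \(em(\cdot )\) could be realized, assuming a BERT tokenizer; the implementation details are illustrative rather than the authors' code.

```python
import random
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def em(event):
    """Event-oriented masking (Eq. 7): uniformly pick one component of
    (x_s, x_p, x_o) and replace all of its sub-word tokens with [MASK]."""
    components = list(event)                      # [x_s, x_p, x_o]
    idx = random.randrange(3)                     # uniform choice of component
    n_subwords = len(tokenizer.tokenize(components[idx]))
    components[idx] = " ".join([tokenizer.mask_token] * n_subwords)
    return " ".join(components)

# e.g. ("military", "launch", "missile") -> "military [MASK] missile"
masked_event = em(("military", "launch", "missile"))
```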

2.4 Model Training

The overall training objective comprises three terms:

$$\begin{aligned} \mathcal {L}_{overall} = \mathcal {L}_{CL} + \mathcal {L}_{EventMLM} + \mathcal {L}_{CP} \end{aligned}$$
(9)

Firstly, we have the prompt template-based contrastive learning loss (\(\mathcal {L}_{CL}\)), which effectively incorporates prompt templates into event representation learning. Secondly, the EventMLM loss (\(\mathcal {L}_{EventMLM}\)) aims to improve the text comprehension ability of PLMs and teaches the model to comprehend the relationships between the components of input events. Finally, we introduce the prototype-based clustering objective (\(\mathcal {L}_{CP}\)) as an auxiliary loss to cluster the events while enforcing consistency between cluster assignments produced for different augmented representations of the input event [6].
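A schematic training step combining the three objectives of Eq. (9) might look as follows; the three loss methods are placeholders standing in for the components described above, not an actual released implementation.

```python
def training_step(batch, model, optimizer):
    """One optimization step over the combined objective of Eq. (9).

    `model` is assumed to expose the three losses described in Sect. 2;
    their implementations (contrastive, EventMLM, prototype-based
    clustering) are placeholders in this sketch.
    """
    loss_cl = model.contrastive_loss(batch)           # prompt template-based CL
    loss_mlm = model.event_mlm_loss(batch)            # EventMLM objective
    loss_cp = model.prototype_clustering_loss(batch)  # auxiliary clustering

    loss = loss_cl + loss_mlm + loss_cp               # unweighted sum, as in Eq. (9)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```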

3 Experiments

Consistent with conventional practices in event representation learning [3, 6, 19, 21], we conduct an analysis of the event representations acquired through our approach on two event similarity tasks and one transfer task.

3.1 Dataset and Implementation Details

For training the event representation models and for the event similarity tasks, we use the datasets released by Gao et al. [6]. For the transfer task, we use the MCNC dataset previously employed by Lee and Goldwasser [11]. Notably, the above datasets explicitly specify the components of each event, so they support arbitrary re-ordering of the components.

Our model is initialized from the BERT-base-uncased checkpoint [9], and we use the \(\mathrm [CLS]\) token representation as the event representation. During training, we employ the Adam optimizer with a batch size of 256. The learning rate for the event representation model is set to 2e-7, the temperature to \(\tau =0.3\), and the probability of prompt template insertion to \(\pi =0.2\).
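For reference, the reported hyperparameters can be gathered into a single configuration, as in the illustrative snippet below.

```python
config = {
    "base_model": "bert-base-uncased",  # initialization checkpoint
    "optimizer": "adam",
    "batch_size": 256,
    "learning_rate": 2e-7,              # event representation model
    "temperature": 0.3,                 # tau in Eq. (4)
    "prompt_insertion_prob": 0.2,       # pi in Eq. (1)
}
```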

3.2 Event Similarity Tasks

Hard Similarity Task. The objective of the hard similarity task is to assess the ability of the event representation model to differentiate between similar and dissimilar events. Weber et al. [19] created a dataset (referred to as "Original") comprising two types of event pairs: one with events that have low lexical overlap but should be similar, and the other with events that have high overlap but should be dissimilar. The dataset consists of 230 event pairs. Ding et al. [3] subsequently expanded this dataset to 1,000 event pairs (denoted as "Extended"). We evaluate performance on this task using Accuracy \((\%)\), i.e., the percentage of instances where the model assigns a higher cosine similarity score to the similar event pair than to the dissimilar one.
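The accuracy metric described above can be computed as in the following sketch, where the event embedding pairs are assumed to be given.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def hard_similarity_accuracy(pairs):
    """Fraction of instances where the similar event pair receives a higher
    cosine score than the dissimilar pair, reported as a percentage.

    `pairs` is a list of tuples (a1, a2, b1, b2) of event embeddings, where
    (a1, a2) should be similar and (b1, b2) dissimilar."""
    hits = [cosine(a1, a2) > cosine(b1, b2) for a1, a2, b1, b2 in pairs]
    return 100.0 * sum(hits) / len(hits)
```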

Transitive Sentence Similarity. This dataset [8] (denoted as "Transitive") comprises 108 pairs of transitive sentences, each containing a single subject, object, and verb (e.g., “military launch missile”). Each pair is annotated with a similarity score ranging from 1 to 7, with higher scores indicating greater similarity between the two events. Following prior work [3, 19, 21], we report Spearman’s correlation \((\rho )\) between the predicted cosine similarity and the manually annotated similarity score.
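Evaluation on this dataset then reduces to a single call to SciPy's Spearman correlation; the numbers below are toy values for illustration only.

```python
from scipy.stats import spearmanr

# Toy values: predicted cosine similarities vs. annotated 1-7 gold scores
predicted = [0.91, 0.35, 0.62, 0.18]
gold = [6.5, 2.0, 5.0, 1.5]

rho, _ = spearmanr(predicted, gold)  # Spearman's correlation used for evaluation
```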

Comparison Methods. In our study, we conduct a comparative analysis of our proposed approach with various baseline methods. We group these methods into four distinct categories:

(1) Neural Tensor Networks: Event-comp [19], Role-factor Tensor [19], Predicate Tensor [19], and NTNIntSent [3] employ Neural Tensor Networks to learn event representations. (2) Pre-trained Language Models: UniFA [21] and UniFA-S [21] are two event representation learning frameworks that leverage PLMs. (3) Graph Neural Networks: HeterEvent [22] and MulCL [23] employ graph neural networks for event representation learning. (4) Contrastive Learning: SWCC [6] is a state-of-the-art framework that builds on a PLM and combines contrastive learning with clustering.

Table 1. Evaluation performance on the similarity tasks. The Hard Similarity Task is represented by the Original and Extended datasets. The Transitive Sentence Similarity is evaluated using the Transitive dataset.

Results. Table 1 presents the results of various methods on the similarity tasks, including hard similarity and transitive sentence similarity. The findings reveal that the proposed PromptCL outperforms the other approaches. Compared to UniFA-S, which simply utilizes PLMs, PromptCL exhibits superior performance due to its prompt template-based contrastive learning, which better exploits the text comprehension ability of PLMs. PromptCL also outperforms state-of-the-art event representation methods such as SWCC, which leverage a PLM and contrastive learning. The observed improvements can be attributed to PromptCL’s thorough exploitation of the PLM’s text comprehension capabilities via its prompt template-based contrastive learning and EventMLM techniques. This finding underscores the limited exploration of text comprehension ability in prior research and the efficacy of our proposed framework, PromptCL.

3.3 Transfer Task

We conduct an evaluation of the generalization ability of event representations on the Multiple Choice Narrative Cloze (MCNC) task, which involves selecting the next event from a small set of randomly drawn events, given a sequence of events. We adopt the zero-shot transfer setting to ensure comparability with prior research [6].

Table 2. Evaluation performance on the MCNC task. *: results reported in Gao et al. [6].

Results. The performance of various methods on the MCNC task is reported in Table 2. The table indicates that the PromptCL method exhibits the highest accuracy on the MCNC task in the unsupervised setting. This result suggests that PromptCL has superior generalizability to downstream tasks compared to other methods in the study. We believe that the use of a prompt template can enhance the generalization capabilities of event representation models, as discussed in the section “Content of prompt”.

4 Analysis

Table 3. Ablation study for several methods evaluated on the similarity tasks. *: degenerate to PSO word order.

Ablation Study. To evaluate the effectiveness of each component of the proposed approach, we conduct an ablation study, presented in Table 3. We first investigate the impact of the prompt template method by setting the probability of inserting templates to zero: removing this component results in a significant drop of 7.9 points on the Extended hard similarity task. Removing the SPO word order component leads to a drop of 1.8 points on the Original hard similarity task. We also study the impact of the EventMLM method; removing it causes a drop of up to 0.06 points on the transitive sentence similarity task. BERT (InfoNCE) denotes a baseline trained with the InfoNCE objective only.

Fig. 2. Effect of prompt insertion probability.

Fig. 3. Alignment loss and uniformity loss (lower is better).

Probability of Prompt Insertion. Figure 2 depicts the influence of the probability of inserting prompt templates during training. We observe that as the insertion probability increases, the model's overall performance on the intrinsic evaluation steadily improves, indicating that inserting prompt templates during training enhances the generalization of event representations on intrinsic evaluation tasks.

Uniformity and Alignment. Figure 3 displays the uniformity and alignment of various event representation models, along with their Transitive Sentence Similarity results. In general, models that exhibit better alignment and uniformity achieve superior performance, confirming the findings of Wang et al. [18]. Additionally, we observe that inserting prompt templates during event representation learning significantly improves alignment compared to the baselines.

Table 4. Semantic clarity of prompts and the corresponding performance.

Content of Prompt. Table 4 illustrates the impact of the prompt content on training. As shown in the table, greater semantic clarity of the prompt results in better performance on the Hard Similarity Tasks. The generalization of event representation models is closely related to the clarity of the prompt template used during training: a clearer prompt template provides stronger semantic guidance, leading to event representation models with better generalization capabilities.

Table 5. A case study on the Extended dataset of Hard Similarity Task.

Case Study. Table 5 presents a case study on several groups of events randomly sampled from the Extended dataset of the Hard Similarity Task. We evaluate how well BERT (InfoNCE) and PromptCL predict the similarity scores of these events; a closer alignment between the predicted and ground truth similarity scores indicates a deeper understanding of the events by the model. The results demonstrate that PromptCL predicts similarity scores that align more closely with the ground truth labels than BERT (InfoNCE). This suggests that the proposed prompt template-based contrastive learning, SPO word order, and EventMLM aid in comprehending short event texts and provide semantic guidance to PLMs.

5 Conclusion

This study presents a novel framework, PromptCL, which improves event representation learning with PLMs without requiring additional features such as the event co-occurrence information used in SWCC. In particular, we introduce a prompt template-based contrastive learning method and the SPO word order, which together elicit the text comprehension ability of PLMs, as well as an EventMLM method that trains the PLM to comprehend the relationships between event components. Our experiments demonstrate that PromptCL achieves superior performance compared to state-of-the-art baselines on several event-related tasks. Moreover, our comprehensive analysis reveals that the use of a prompt leads to enhanced generalization capabilities for event representations.