
1 Introduction

Both entity extraction and relation extraction are fundamental and critical tasks for information extraction in natural language processing. The extracted entity-relation triples can be applied to various downstream tasks, such as automatic knowledge graph construction. Early studies [1, 2] employed a pipeline approach, which extracts entities and relations sequentially. In this pipeline manner, since relation extraction depends on the output of entity extraction, errors such as missing or incorrect entities are propagated to relation extraction and amplified [3].

In recent years, more and more studies have paid attention to joint extraction approaches, which combine entity and relation extraction via multi-task learning and accomplish the two subtasks within one model. Various joint extraction approaches have been proposed. Table-filling-based approaches utilize a table structure to achieve joint extraction [3,4,5,6]; however, they require considerable computational resources during training. Tagging-based approaches [7,8,9,10,11] design novel tagging schemes to extract entities and relations simultaneously, but elaborately designing a complex yet reasonable tagging scheme requires much expertise. Sequence-to-sequence approaches [12] treat joint extraction as a triple generation task: triples are produced by a sequence generation model, which helps solve the relation overlap problem. Nevertheless, casting joint extraction as sequence generation increases exposure bias, since there is no inherent order among triples. To avoid this bias, some studies employed sequence-to-non-sequence approaches: Zhang et al. [13] constructed a seq2tree method, while Sui et al. [14] developed a seq2set method using a non-autoregressive encoder-decoder. These models alleviate the negative impact caused by exposure bias.

However, both sequence-to-sequence and sequence-to-non-sequence approaches employ only one encoder-decoder to construct the features of the two subtasks; the parameters exclusive to each subtask are generally only the final classification parameters. This structure assumes that the features of the two subtasks are mutually compatible and conflict-free. However, Zhong et al. [15] pointed out that feature conflicts between the two subtasks are highly likely, which may significantly limit model performance.

The encoder of a deep learning model can be shared by both subtasks in the encoding phase. This helps the encoder learn to extract more useful information from the input, since multi-task learning integrates the losses of both subtasks during training. In the decoding phase, however, dual decoders can help avoid the feature conflict problem. To this end, a novel model named Dual-Joint-Input-PFN-Decoder is proposed in this paper. It is based on the seq2set structure of SPN4RE [14] and integrates a Dual-Joint-Input-PFN strategy into a dual-decoder. The Dual-Joint-Input-PFN is implemented by two Joint-Input-PFN layers, which are proposed based on the Partition Filter Network (PFN) [16]. The original PFN cannot be used in a dual-decoder directly: first, the feature structure constructed by a dual-decoder differs from that assumed by the original PFN, and second, a dual-decoder needs to construct interactions for both features. The Joint-Input-PFN is therefore proposed by improving the original PFN; it receives two features as input and extracts favorable interaction information from one feature conditioned on the other. To extract interactions that benefit both subtasks simultaneously, the Dual-Joint-Input-PFN strategy is built from a pair of Joint-Input-PFNs. It captures interaction features that are beneficial to both subtasks and ensures that no feature conflict arises during the construction of the interactions. Based on this strategy, the two subtasks are decoded separately with a dual-decoder network incorporating the Dual-Joint-Input-PFN strategy to avoid feature conflicts.

The main contributions of this paper are threefold:

  1) A new Dual-Joint-Input-PFN strategy is proposed, incorporating two Joint-Input-PFN strategies improved from the Partition Filter Network to construct interactions between entity and relation extraction.

  2) A new Dual-Joint-Input-PFN-Decoder model, which integrates the Dual-Joint-Input-PFN strategy into a dual-decoder structure, is proposed to utilize the interactions between entity and relation extraction while reducing feature conflicts.

  3) The proposed model achieves the best performance on two standard datasets compared with state-of-the-art baseline methods, demonstrating its effectiveness.

2 Related Work

The entity-relation extraction task is fundamental to many downstream tasks; its aim is to extract all entity-relation triples from a given sentence. Existing research on joint entity-relation extraction can be divided into pipeline-based models, table-filling-based models, tagging-based models, seq2seq models, and multi-task learning-based models.

The pipeline-based models [1, 2] first extract entities and then classify the relations between them. However, these models easily lead to error accumulation: for instance, if a correct entity is missed during entity extraction, the relations involving that entity cannot be extracted correctly. Moreover, during backpropagation the information used to correct errors can flow only from the relation extraction task to the entity extraction task, not in the reverse direction, so these models fail to exploit the connection information between the two subtasks. The table-filling-based models [3,4,5,6] construct relations between each pair of tokens in a given sentence with the help of a table structure, and perform entity and relation extraction according to the relations between tokens. This structure handles overlapping triples well; however, the size of the table grows quadratically with the sentence length, so these models often consume a large amount of computational resources. The tagging-based models [7,8,9,10,11] devise novel tagging schemes for triple extraction. By adopting different tagging methods, these models can focus on extracting triples with different characteristics, but designing an appropriate tagging strategy often requires meticulous, complex, and time-consuming human effort. The seq2seq models [12, 13, 17] adopt a similar encoder-decoder approach to extract entity-relation triples. They can reuse the strong performance of existing translation models and overcome the entity overlapping problem. Although these models initially suffered from serious exposure bias, continuous improvement has reduced it to a secondary concern. However, feature conflict remains a problem: these models tend to use a single structure for the whole joint extraction task, but the information shared between the two subtasks is not always beneficial to both, and the closer a layer is to the output, the more task-specific the features of the two subtasks become. Therefore, the feature conflicts caused by mixing the two kinds of features can limit model performance. In addition, once the two subtasks are separated, an interaction mechanism must be constructed to ensure that the association information between them is not lost.

Multi-task learning utilizes the connection information between tasks to integrate multiple tasks into a single model, and joint extraction can be regarded as a multi-task learning problem. Wang et al. [3] and Sun et al. [18] built interaction mechanisms between entity extraction and relation extraction, through which the model could capture the connection information of the two tasks and thus improve overall performance. However, these interaction mechanisms did not filter the entity and relation features, and directly fusing the two features to construct interactions led to the feature conflict issue.

In the joint entity-relation extraction task, encoding sentences into appropriate features can further improve model performance. In early research, various networks were utilized as encoders for entity and relation extraction, including CNNs, LSTMs, GRUs, GNNs, and GCNs. With the emergence of large-scale pre-trained language models and their performance breakthroughs in various NLP tasks, more and more models have begun to utilize these language models as encoders or embedding layers for extraction tasks to better capture the semantic information in sentences. BERT [19] is a pre-trained language model trained on a large-scale corpus using a multi-layer Transformer encoder [20].

In this paper, the structure of SPN4RE [14] is used as the backbone for improvement. To address feature conflicts, the decoder in the original structure is extended to a dual-decoder. In addition, we integrate an improved Partition Filter Network strategy into the Dual-Joint-Input-PFN-Decoder model to build an interaction mechanism between entity extraction and relation extraction and strengthen their connection during forward information propagation.

3 Methods

This paper proposes a new Dual-Joint-Input-PFN-Decoder model by taking the sequence-to-set framework of Sui et al. [14] as the backbone structure for jointly extracting entities and relations from sentences. Our model integrates a dual-decoder with the Dual-Joint-Input-PFN strategy, which in turn is implemented by two Joint-Input-PFN strategies, to improve performance. The overall network structure is shown in Fig. 1. The Dual-Joint-Input-PFN-Decoder model generates a fixed-size set of predictions for each sentence; its input is initialized with a fixed number of learnable embeddings, termed triple embeddings. After a sentence is encoded by the encoder, the sentence features are fed into the model to transform the triple embeddings into output features. The model then aligns the output features with the corresponding labels through bipartite matching and computes the loss, as sketched below. The Dual-Joint-Input-PFN-Decoder receives two inputs at the same time to build interactions between entity extraction and relation extraction and is able to decode entity and relation features separately.
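
The following minimal sketch (not the authors' released code) illustrates the set-prediction matching step: the m predicted triple slots are aligned to the gold triples with the Hungarian algorithm before the loss is computed. The cost used here scores only the gold relation label, which is a simplification; the full SPN4RE cost also scores the entity spans.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions_to_gold(rel_logprobs, gold_rels):
    """rel_logprobs: (m, num_rel) log-probabilities predicted for m triple slots.
    gold_rels: list of gold relation ids for one sentence (length <= m).
    Returns (slot index, gold triple index) pairs chosen by bipartite matching."""
    m = rel_logprobs.shape[0]
    cost = np.zeros((m, len(gold_rels)))
    for j, r in enumerate(gold_rels):
        cost[:, j] = -rel_logprobs[:, r]      # lower cost = slot is more confident about this gold relation
    rows, cols = linear_sum_assignment(cost)   # Hungarian matching on the rectangular cost matrix
    return list(zip(rows.tolist(), cols.tolist()))
```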

Fig. 1. The architecture of the proposed Dual-Joint-Input-PFN-Decoder model

Given a sentence \(s = \{ x_1 ,x_2 , \cdots ,x_n \}\), where xi denotes a token and n denotes the length of the sentence, the model encodes s with pre-trained BERT [19]. The encoding output is denoted as \(H^E \in R^{l \times d}\), where l is the length of the encoded sentence including the three special tokens [CLS], [SEP], and [PAD], and d is the dimension of the hidden features.
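
As an illustration only (the tokenizer settings and the cased variant are assumptions not specified in the paper), encoding a sentence with the HuggingFace transformers implementation of base BERT looks as follows; the output H^E has shape (l, d).

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")   # base BERT; cased vs. uncased is an assumption
encoder = BertModel.from_pretrained("bert-base-cased")

sentence = "Barack Obama was born in Honolulu ."
inputs = tokenizer(sentence, return_tensors="pt",
                   padding="max_length", max_length=32)        # adds [CLS]/[SEP] and pads with [PAD]
with torch.no_grad():
    H_E = encoder(**inputs).last_hidden_state                  # shape (1, l, d); d = 768 for bert-base
```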

3.1 The Dual-Joint-Input-PFN Strategy

The original PFN is an interaction strategy built on a multi-task learning framework: it encodes a single feature and partitions it into three types of features, namely features for entity extraction, features for relation extraction, and shared features for entity-relation extraction. In particular, PFN assumes that features useful only for entity extraction are irrelevant or even harmful to relation extraction, and vice versa. The entity-specific (or relation-specific) features are then combined with the shared features to build the entity (or relation) representation. However, the features constructed by a dual-decoder differ from the single feature accepted by PFN. An interaction strategy employed in a dual-decoder needs to receive entity features and relation features simultaneously and select beneficial information from one type of feature conditioned on the other. Due to its structural limitations, PFN cannot meet these requirements.

To improve the PFN, we propose the Dual-Joint-Input-PFN strategy, which is implemented by two Joint-Input-PFN strategies. A Joint-Input-PFN receives two inputs at the same time and utilizes one type of feature to partition and filter the other, thereby obtaining beneficial information. However, a single Joint-Input-PFN is insufficient for constructing both interactions, since it cannot produce interactions that benefit both extraction tasks. For entity extraction, a Joint-Input-PFN utilizes the entity features to select, from the relation features, the parts that are useful for entity extraction; it cannot, however, use the entity features to select the parts of the relation features that are useful for relation extraction. Therefore, we propose the Dual-Joint-Input-PFN strategy, which employs two symmetric Joint-Input-PFNs to generate interactions that benefit both entity extraction and relation extraction, as presented in Fig. 2. Moreover, since the two interactions are constructed separately, the problem of feature conflicts is avoided. Because the Dual-Joint-Input-PFN strategy consists of two identical Joint-Input-PFN structures, one for entity extraction and one for relation extraction, only the Joint-Input-PFN for entity extraction is presented below for illustration.

Fig. 2. The Dual-Joint-Input-PFN strategy

The Joint-Input-PFN receives the entity features \(H_i^{\text{ent-D}}\) and the relation features \(H_i^{\text{rel-D}}\) decoded by the dual-decoder. It also takes the hidden state \(H_{i - 1}^{\text{Joint-Input-PFN}}\) and the cell state \(c_{i - 1}\) from the previous Joint-Input-PFN layer. The Joint-Input-PFN calculates the current cell state \(\tilde{c}_i\) from the relation features \(H_i^{\text{rel-D}}\) and the hidden state \(H_{i - 1}^{\text{Joint-Input-PFN}}\), as shown in Eq. (1):

$$ \tilde{c}_i = \tanh ({\text{Linear}}([H_i^{\text{rel-D}} ;H_{i - 1}^{\text{Joint-Input-PFN}} ])) $$
(1)

Here [;] denotes the concatenation operation. Afterwards, the master gates [21] are employed to select beneficial features from the current cell state \(\tilde{c}_i\), as shown in Eq. (2):

$$ \begin{aligned} & \tilde{p}_{\tilde{c}_i } = {\text{cummax}}({\text{Linear}}([H_i^{\text{ent-D}} ;H_{i - 1}^{\text{Joint-Input-PFN}} ])) \\ & \tilde{q}_{\tilde{c}_i } = 1 - {\text{cummax}}({\text{Linear}}([H_i^{\text{ent-D}} ;H_{i - 1}^{\text{Joint-Input-PFN}} ])) \\ \end{aligned} $$
(2)

The selector contains \(\tilde{p}_{\tilde{c}_i }\), which selects the relation features of the current cell state \(\tilde{c}_i\) that are beneficial or harmful to entity extraction, and \(\tilde{q}_{\tilde{c}_i }\), which selects the relation features of \(\tilde{c}_i\) that are beneficial or irrelevant to entity extraction. After this selection, the relation features of the current cell state \(\tilde{c}_i\) are split into three parts using the \(\rho_{{\text{useful}},\tilde{c}_i }\), \(\rho_{{\text{harmful}},\tilde{c}_i }\), and \(\rho_{{\text{unrelated}},\tilde{c}_i }\) selectors, as shown in Eq. (3). The resulting features that are beneficial, harmful, and irrelevant to entity extraction are given in Eq. (4):

$$ \begin{aligned} & \rho_{{\text{useful}},\tilde{c}_i } = \tilde{p}_{\tilde{c}_i } \cdot \tilde{q}_{\tilde{c}_i } \\ & \rho_{{\text{harmful}},\tilde{c}_i } = \tilde{p}_{\tilde{c}_i } - \rho_{{\text{useful}},\tilde{c}_i } \\ & \rho_{{\text{unrelated}},\tilde{c}_i } = \tilde{q}_{\tilde{c}_i } - \rho_{{\text{useful}},\tilde{c}_i } \\ \end{aligned} $$
(3)
$$ \begin{aligned} & \rho_{\text{ent-useful}} = \rho_{{\text{useful}},\tilde{c}_i } \cdot \tilde{c}_i + \rho_{s,c_{i - 1} } \cdot c_{i - 1} \\ & \rho_{\text{ent-unrelated}} = \rho_{{\text{unrelated}},\tilde{c}_i } \cdot \tilde{c}_i + \rho_{e,c_{i - 1} } \cdot c_{i - 1} \\ & \rho_{\text{ent-harmful}} = \rho_{{\text{harmful}},\tilde{c}_i } \cdot \tilde{c}_i + \rho_{r,c_{i - 1} } \cdot c_{i - 1} \\ \end{aligned} $$
(4)

Here \(\rho_{s,c_{i - 1} }\), \(\rho_{e,c_{i - 1} }\), and \(\rho_{r,c_{i - 1} }\) are selectors that pick out features from the previous cell state \(c_{i - 1}\) and are calculated in the same way as \(\rho_{{\text{useful}},\tilde{c}_i }\), \(\rho_{{\text{unrelated}},\tilde{c}_i }\), and \(\rho_{{\text{harmful}},\tilde{c}_i }\). Similarly, for the relation decoder, another Joint-Input-PFN can be constructed to extract the features \(\rho_{\text{rel-useful}}\) from the entity features that are beneficial to relation extraction.

The features \(\rho_{\text{ent-useful}}\), generated by extracting useful information from the relation features conditioned on the entity features, serve as the interaction features between the two. A skip connection [22] and a linear layer are then employed to produce the features with interactions, \(H_i^{\text{ent-DPFN}}\) and \(H_i^{\text{rel-DPFN}}\), as shown in Eq. (5). Afterwards, these two features are passed to the next dual-decoder layer for further processing.

$$ \begin{gathered} H_i^{\text{ent-DPFN}} = {\text{Linear}}([H_i^{\text{ent-D}} ;\rho_{\text{ent-useful}} ]) \hfill \\ H_i^{\text{rel-DPFN}} = {\text{Linear}}([H_i^{\text{rel-D}} ;\rho_{\text{rel-useful}} ]) \hfill \\ \end{gathered} $$
(5)
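
A minimal PyTorch sketch of one Joint-Input-PFN cell (entity side) is given below under two assumptions not spelled out above: cummax is realized as the cumulative sum of a softmax, as in ON-LSTM/PFN, and the selectors for the previous cell state use their own linear layers over the same gate input. All class and variable names are illustrative, and only the useful part of the partition (first line of Eq. (4)) is kept.

```python
import torch
import torch.nn as nn

def cummax(x):
    # Soft cumulative max used by ON-LSTM/PFN: cumulative sum of a softmax.
    return torch.cumsum(torch.softmax(x, dim=-1), dim=-1)

class JointInputPFN(nn.Module):
    """One Joint-Input-PFN cell for the entity side (Eq. 1-5): the entity
    features gate which parts of the relation features are kept."""
    def __init__(self, d):
        super().__init__()
        self.cell_lin = nn.Linear(2 * d, d)    # Eq. (1): candidate cell state
        self.p_lin = nn.Linear(2 * d, d)       # Eq. (2): master gates for the current cell
        self.q_lin = nn.Linear(2 * d, d)
        self.p_prev_lin = nn.Linear(2 * d, d)  # assumed gates for the previous cell state
        self.q_prev_lin = nn.Linear(2 * d, d)
        self.out_lin = nn.Linear(2 * d, d)     # Eq. (5): fuse with a skip connection

    def forward(self, h_ent, h_rel, h_prev, c_prev):
        c_tilde = torch.tanh(self.cell_lin(torch.cat([h_rel, h_prev], dim=-1)))  # Eq. (1)
        gate_in = torch.cat([h_ent, h_prev], dim=-1)
        p = cummax(self.p_lin(gate_in))                                           # Eq. (2)
        q = 1.0 - cummax(self.q_lin(gate_in))
        rho_useful = p * q                                                        # Eq. (3), useful part
        p_prev = cummax(self.p_prev_lin(gate_in))
        q_prev = 1.0 - cummax(self.q_prev_lin(gate_in))
        rho_prev_useful = p_prev * q_prev                                         # selector for c_{i-1}
        ent_useful = rho_useful * c_tilde + rho_prev_useful * c_prev              # Eq. (4), useful part
        h_out = self.out_lin(torch.cat([h_ent, ent_useful], dim=-1))              # Eq. (5)
        return h_out, c_tilde   # c_tilde is passed on as the next cell state (an assumption)
```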

3.2 The Dual-Joint-Input-PFN-Decoder Model

In existing studies, sequence-to-sequence or sequence-to-non-sequence models employ one decoder to decode entity extraction and relation extraction together. However, as mentioned in [15], sharing one structure in this way may cause feature conflicts. We therefore propose the Dual-Joint-Input-PFN-Decoder model, which incorporates the Dual-Joint-Input-PFN strategy to avoid feature conflicts. Based on the Transformer-based non-autoregressive decoder [23], our model employs two Transformers fused with the proposed Dual-Joint-Input-PFN strategy to decode entity and relation features separately.

The Dual-Joint-Input-PFN-Decoder model consists of two identical decoders, each of which is a non-autoregressive Transformer decoder with k layers. Before decoding, the model takes the triple embeddings, denoted as \(E \in R^{m \times d}\), as input, where m is the maximum number of triples over all sentences. Each Transformer decoder layer contains a self-attention layer, an inter-attention layer, and a feed-forward network (FFN) layer. The forward propagation of each layer of the dual-decoder can be formalized as Eq. (6), where i denotes the i-th decoder layer:

$$ \begin{gathered} H_i^{\text{ent-D}} = {\text{Transformer}}_{{\text{ent}}} (H_{i - 1}^{\text{ent-D}} ,H^E ) \hfill \\ H_i^{\text{rel-D}} = {\text{Transformer}}_{{\text{rel}}} (H_{i - 1}^{\text{rel-D}} ,H^E ) \hfill \\ \end{gathered} $$
(6)

However, this forward propagation of the dual-decoder is completely separated, with no interaction between entity and relation extraction. The Dual-Joint-Input-PFN therefore receives the two inputs \(H_{i - 1}^{\text{ent-D}}\) and \(H_{i - 1}^{\text{rel-D}}\) and generates \(H_{i - 1}^{\text{ent-DPFN}}\) and \(H_{i - 1}^{\text{rel-DPFN}}\) carrying interaction features, as shown in Eq. (7). The forward propagation is then revised to Eq. (8) by replacing the inputs with \(H_{i - 1}^{\text{ent-DPFN}}\) and \(H_{i - 1}^{\text{rel-DPFN}}\), respectively.

$$ (H_{i - 1}^{\text{ent-DPFN}} ,H_{i - 1}^{\text{rel-DPFN}} ) = {\text{Dual-Joint-Input-PFN}}(H_{i - 1}^{\text{ent-D}} ,H_{i - 1}^{\text{rel-D}} ) $$
(7)
$$ \begin{gathered} H_i^{\text{ent-D}} = {\text{Transformer}}_{{\text{ent}}} (H_{i - 1}^{\text{ent-DPFN}} ,H^E ) \hfill \\ H_i^{\text{rel-D}} = {\text{Transformer}}_{{\text{rel}}} (H_{i - 1}^{\text{rel-DPFN}} ,H^E ) \hfill \\ \end{gathered} $$
(8)
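
The sketch below illustrates the dual-decoder forward pass of Eq. (6)-(8) in PyTorch. For brevity the Dual-Joint-Input-PFN is stood in by a simple linear fusion of the two branches; in the full model it would be the pair of Joint-Input-PFN cells sketched in Sect. 3.1. All class, argument, and attribute names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class DualDecoderSketch(nn.Module):
    """Two non-autoregressive Transformer decoder branches that exchange
    interaction features before every layer (Eq. 6-8)."""
    def __init__(self, d=768, k_layers=3, n_heads=8, m_triples=10):
        super().__init__()
        make_layer = lambda: nn.TransformerDecoderLayer(d_model=d, nhead=n_heads, batch_first=True)
        self.ent_layers = nn.ModuleList([make_layer() for _ in range(k_layers)])
        self.rel_layers = nn.ModuleList([make_layer() for _ in range(k_layers)])
        self.triple_emb = nn.Embedding(m_triples, d)          # triple embeddings E in R^{m x d}
        self.pfn_ent = nn.Linear(2 * d, d)                    # placeholder for the Dual-Joint-Input-PFN
        self.pfn_rel = nn.Linear(2 * d, d)

    def forward(self, H_E):                                   # H_E: (batch, l, d) sentence encoding
        batch = H_E.size(0)
        h_ent = self.triple_emb.weight.unsqueeze(0).expand(batch, -1, -1)
        h_rel = h_ent
        for ent_layer, rel_layer in zip(self.ent_layers, self.rel_layers):
            h_ent_pfn = self.pfn_ent(torch.cat([h_ent, h_rel], dim=-1))   # Eq. (7), simplified interaction
            h_rel_pfn = self.pfn_rel(torch.cat([h_rel, h_ent], dim=-1))
            h_ent = ent_layer(h_ent_pfn, H_E)                              # Eq. (8), entity branch
            h_rel = rel_layer(h_rel_pfn, H_E)                              # Eq. (8), relation branch
        return h_ent, h_rel                                   # per-slot entity / relation features
```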

4 Experiment

4.1 Dataset

Our proposed method is evaluated on two standard datasets, WebNLG [24] and NYT [25], which are widely used for the joint entity-relation extraction task, e.g., by Zeng et al. [26]. WebNLG contains 5019 sentences in the training set and 703 sentences in the test set, with 171 predefined relation types. NYT contains 56196 sentences in the training set and 5000 sentences in the test set, with a total of 24 relation types. The experiments were conducted on the version of the NYT dataset released by Zeng et al. [26] (Table 1).

Table 1. Statistics for WebNLG and NYT

4.2 Evaluation Metrics

The evaluation metrics are standard precision, recall and F1-score as follows:

$$ \begin{aligned} & Precision = \frac{TP}{{TP + FP}} \\ & Recall = \frac{TP}{{TP + FN}} \\ & F1 = \frac{2 \times Precision \times Recall}{{Precision + Recall}} \\ \end{aligned} $$
(9)

TP denotes the number of positive instances predicted as positive, FP denotes the number of negative instances predicted as positive, and FN denotes the number of positive instances predicted as negative. A triple is regarded as correct if and only if its head entity, tail entity, and relation type are all correctly matched. There are two ways to evaluate whether an entity is correctly matched, namely partial match and exact match. Partial match means that the first word of the predicted entity is the same as the first word of the gold entity, whereas exact match requires the whole predicted entity to be identical to its gold label. Following the same strategy as previous work, partial match is used on WebNLG and exact match on NYT.
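
A small sketch of this evaluation protocol follows; the function names and the representation of triples as (head, relation, tail) strings are assumptions made for illustration.

```python
def entity_match(pred, gold, mode="exact"):
    # Partial match: only the first word must agree; exact match: the whole span must agree.
    if mode == "partial":
        return pred.split()[0] == gold.split()[0]
    return pred == gold

def triple_prf(pred_triples, gold_triples, mode="exact"):
    """Precision / recall / F1 over (head, relation, tail) triples (Eq. 9)."""
    tp = sum(1 for (h, r, t) in pred_triples
             if any(entity_match(h, gh, mode) and r == gr and entity_match(t, gt, mode)
                    for (gh, gr, gt) in gold_triples))
    fp = len(pred_triples) - tp
    fn = len(gold_triples) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```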

4.3 Baselines

Our methods are compared with the following baseline models:

  1) NovelTagging [9]: The model proposed a new tagging strategy that extracts triples from sentences directly instead of extracting entities and relations separately.

  2) CopyRE [26]: The model employed an encoder-decoder structure in a sequence-to-sequence manner and utilized a copy mechanism to identify entities in sentences.

  3) GraphRel [27]: The model encoded triples using graph neural networks and employed graph structures to construct interactions between entity and relation extraction.

  4) CasRel [7]: The model treated a relation as a mapping from head entities to tail entities, enabling the joint extraction of entity-relation triples.

  5) RIN [18]: Two separate structures were used for entity and relation extraction, with a Recurrent Interaction Network constructing their connection information.

  6) TPLinker [11]: The model proposed a handshaking tagging strategy to extract overlapping triples.

  7) SPN4RE [14]: The model utilized a non-autoregressive model and a bipartite matching loss to implement a sequence-to-set framework, solving the exposure bias issue.

The experiments used the base version of BERT [19] as the encoder. The initial learning rate of BERT was set to 0.00001, while the learning rate of the Dual-Joint-Input-PFN-Decoder was set to 0.00002. The Dual-Joint-Input-PFN-Decoder was composed of 3-layer non-autoregressive Transformers. In addition, dropout with a rate of 0.1 was applied to prevent overfitting, and the gradient was clipped at a maximum of 20 to prevent gradient explosion. Training used the AdamW optimizer, and layer normalization was employed to accelerate training. All experiments were conducted on a server with an Intel Xeon CPU E5-2609, 96 GB of memory, and an RTX 2080Ti.
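
The following sketch reflects these hyper-parameters (two learning rates, AdamW, clipping at 20). The attribute names model.encoder and model.decoder and the loss interface are assumptions, and clipping by norm is one possible reading of the gradient limit described above.

```python
import torch

def build_optimizer(model):
    # model.encoder = BERT, model.decoder = Dual-Joint-Input-PFN-Decoder (assumed attribute names)
    return torch.optim.AdamW([
        {"params": model.encoder.parameters(), "lr": 1e-5},   # BERT learning rate
        {"params": model.decoder.parameters(), "lr": 2e-5},   # decoder learning rate
    ])

def train_step(model, optimizer, batch):
    loss = model(batch)                                        # set-prediction loss via bipartite matching
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=20.0)
    optimizer.step()
    return loss.item()
```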

4.4 The Results

The performance comparison of our proposed model with all baseline models is presented in Table 2. Our model achieved the best performance, with a precision of 0.932, a recall of 0.929, and an F1-score of 0.931 on WebNLG, and a precision of 0.930, a recall of 0.918, and an F1-score of 0.924 on NYT. Compared with SPN4RE, our model improved the F1-score by 0.2% and 0.1% on the WebNLG and NYT datasets, respectively. Moreover, our model showed a clear improvement in precision over the state-of-the-art SPN4RE, by 0.4% and 0.5% on the two datasets, respectively. Compared with models that use interaction mechanisms, our model exceeded RIN by 5.5% and 7.3% in F1-score on WebNLG and NYT, respectively. Compared with models based on complex tagging strategies, our model exceeded TPLinker by 1.2% and 0.4% in F1-score on WebNLG and NYT. These results demonstrate that the Dual-Joint-Input-PFN-Decoder model is able to avoid feature conflicts and build interactions between entity and relation extraction.

Table 2. The results of the performance comparison, where * denotes results of reproduced models, 'partial' denotes partial entity matching, and 'exact' denotes exact entity matching.

To verify the effectiveness of extracting the individual elements of relational triples, we compared entity pair extraction and relation type extraction with SPN4RE. Although the F1-score of entity pair extraction decreased by 0.3%, the F1-score of relation type extraction improved by 0.3%, as shown in Table 3. The improvement in relation extraction also drove a 0.2% improvement in the F1-score of relational triple extraction, indicating that the proposed model matches entity pairs and relations properly.

Table 3. The comparison of our model with the SPN4RE on entity pair extraction and relation type extraction on WebNLG

To verify the effectiveness of extracting various numbers of triples from a sentence, we conducted experiments on sentences containing different numbers of triples, denoted as n. The experimental results are shown in Fig. 3. The best performance is achieved at n = 1, with an F1-score of 0.911. Although the F1-score of our model decreases by 0.4% and 0.9% compared to SPN4RE at n = 2 and n = 3, respectively, it improves by 3% at n = 1. Many existing models do not perform well on triple extraction at n = 1; compared with TPLinker and CasRel, the F1-score of the proposed model at n = 1 is improved by 3.1% and 1.8%, respectively.

Fig. 3. The comparison of various numbers of triples in a sentence on WebNLG

To verify the effectiveness of the Dual-Joint-Input-PFN-Decoder model, ablation experiments were conducted: Ours(w/o interaction) denotes our model without interactions, Ours(PFN) denotes the model with the original PFN, and Ours(Double-Encoder) denotes the model with two separate encoders. The results are shown in Table 4.

Table 4. Results of ablation experiments on WebNLG

The F1-score of Ours(PFN) decreased by 1.9%, verifying that the original PFN is not suitable for our proposed model. The performance of Ours(Double-Encoder) decreased by 1.4%, indicating that using two encoders still lowers performance. In addition, the performance of Ours(w/o interaction) decreased by 0.8%, indicating the importance of building interactions between entity and relation extraction.

5 Conclusion

This paper proposed the Dual-Joint-Input-PFN-Decoder model and the Dual-Joint-Input-PFN strategy to solve the problem of feature conflicts in the original model and to better capture the connection information between the subtasks. The Dual-Joint-Input-PFN-Decoder model separates the construction of the two subtask features, while the Dual-Joint-Input-PFN strategy simultaneously receives two inputs and obtains favorable information from one input based on the other, which enables it to be applied to the dual-decoder. Experiments showed that our model achieved the best performance compared with state-of-the-art baseline models, demonstrating the effectiveness of the method.