1 Introduction

Open-domain conversation aims to meet human needs for dialogue understanding and emotional resonance while keeping the conversation going. However, traditional, purely data-driven multi-turn conversation models often generate simple and repetitive content [15, 19]. To address this issue, previous studies add persona information documents [16] or guide the conversation topic [22] to improve dialogue informativeness and diversity.

Notably, more recent studies investigate external knowledge as additional input to conversations [5, 14, 24, 26, 27], including knowledge graphs (denoted as KGs) [24, 26, 27] and unstructured texts [5, 14]. KG-based methods show that KGs organize information around entities, which makes reasoning easy. Nevertheless, extracting relations to build a knowledge graph usually loses information. Moreover, simply applying and reformulating KG triples often yields less informative responses; e.g., KnowHRL [22] adds keywords from KGs using reasoning strategies to guide topics, but the informativeness of the conversation does not increase significantly. Informative texts, e.g., comments about movies, can provide rich knowledge for generation. However, their unstructured representation requires the language model to perform knowledge selection or attention over the knowledge texts; e.g., SKT [5] designs a complex screening process to use document knowledge. In general, these works cannot avoid the problem that KGs are incomplete or that document processing is complicated. A very recent work, MKST [26], is the first attempt to apply different forms of knowledge in conversation. It extracts the entities mentioned in the sentences and links them to their corresponding entities in KGs as label knowledge. It designs a multi-knowledge-aware encoder to encode label, unstructured, and dialogue information together and generates a response with a knowledge-aware decoder. However, the label knowledge is not obtained through reasoning, so it may not help expand the dialogue further. Moreover, MKST relies on dialogue datasets with background knowledge, e.g., the Wizard-of-Wikipedia dataset.

To address the above problems, we propose a new multi-turn dialogue generation model, the Dynamic Multi-form Knowledge Fusion based Open-domain Chatting Machine (DMKCM). Its goal is to fuse the abundant knowledge in an indexed corpus (a virtual Knowledge Base, or virtual KB) with the information expansion capabilities of a commonsense knowledge graph (commonsense KG) to enrich and expand informativeness in multi-turn conversation. The differences and functions of these two types of external knowledge can be summarized as follows:

  • Virtual KB: This kind of knowledge base is usually an indexed corpus in which each document links to its related documents through keywords. Each document in this base can express a complete meaning.

  • Commonsense KG: This kind of knowledge graph includes the triples \([head\_entity, relation, tail\_entity]\), whose entities are also called commonsense facts. These commonsense facts can enhance language representation in the commonsense aspect and even expand topics with reasoning by traversing entities and relations.

Fig. 1. An example of knowledge fusion in a conversation. Yellow indicates keywords from the 1st hop, and blue indicates entities from the 2nd hop. Red indicates keywords from history virtual knowledge. Differently colored circles and dotted arrows point out the source of latent knowledge in the response. Black arrows indicate the flow of information. (Color figure online)

For DMKCM, we design two branches: a dialogue branch (green blocks on the left of Fig. 2) and a knowledge branch (orange blocks on the left of Fig. 2). The dialogue branch generates responses by exchanging knowledge with the knowledge branch. On the knowledge branch, we apply separate reasoning strategies to the virtual KB and the commonsense KG to get passages (1st hop) and entities (2nd hop) that are related to the current dialogue. The 1st hop serves to enrich the information in the response. The 2nd hop better captures concepts shifted in the conversation structure, which helps generate more meaningful conversations: concepts such as “ ” and “ ” (in “Historical context” and “POST” in Fig. 2) hop to related concepts, e.g., “ ” and “ ”, etc., along commonsense relations, as in “Related Entities” of Fig. 1. This is a typical case in natural conversations. In addition, before the 1st hop, we also expand concepts with the commonsense KG to calculate filtering scores (e.g., from “ ” in “POST” in Fig. 2 to “ ”, etc.). Using these filtering scores to select the results inferred from the virtual KB helps remove potential noise in the reasoning results. When the topic shifts, it is hard to find suitable knowledge in the current 1st hop to generate a response. We find that the history 1st hop (history virtual knowledge) can solve this issue, e.g., “ ” and “ ” in the history virtual knowledge bank are related to the response in Fig. 1. For this reason, history virtual knowledge is dynamically stored in the history virtual knowledge bank and provides knowledge support for the current turn. This helps topics transition better in the current dialogue and also enriches the response to some extent. Our work improves model explainability in knowledge extraction, helps generate informative responses, and expands the topic of conversation to a certain extent. Explainability is important for dialogue in information-oriented scenarios, where a user needs to know how new knowledge in the chatbot’s responses is linked to the knowledge in their utterances, as Fig. 1 shows. Our experiments on two conversation datasets, PersonaChat [25] and DailyDialog [8], demonstrate the effectiveness of DMKCM.

In summary, the following contributions are made in this paper:

  • This paper proposes a novel dialogue generation model, DMKCM, that dynamically fuses multi-form knowledge into generation. To the best of our knowledge, this work is the first attempt to fuse a virtual KB and a commonsense KG into dialogue to get better responses.

  • To adapt to the open-domain dialogue task, we construct a new virtual knowledge base from the commonsense story dataset ROCStories.

  • We find that history virtual knowledge helps generate better responses and provide a new dynamic delayed updating strategy to store and filter history virtual knowledge.

  • The experimental results and cases show the superior performance of our model. Various evaluation metrics indicate that DMKCM not only maximizes the advantages of the acquired knowledge but also helps generate more informative and coherent conversations.

2 Related Work

2.1 Dialogue Generation with External Knowledge

Many works have shown that external knowledge can facilitate dialogue generation. [27] presents a novel open-domain dialogue generation method that demonstrates how large-scale commonsense knowledge can facilitate language understanding and generation. [4] proposes a latent relation language model, a class of language models that parameterize the joint distribution over the words in a document and the relevant entities via knowledge graph relations. As for external documents, [10] incorporates them into response generation in customer service dialogues. GLKS [14] adopts a global-to-local guiding method and uses the dialogue context to filter important n-gram information from the document to guide the generation process. However, knowledge graphs lose facts, and external texts require complicated processing. Both forms of knowledge therefore remain limited in how well they exploit external knowledge.

2.2 Virtual Knowledge Base

A Virtual Knowledge Base (virtual KB) is an indexed corpus, i.e., a corpus treated as a knowledge base containing entities and texts. It has been widely employed in open-domain Question Answering (QA) [2, 3, 11]. A virtual KB accomplishes QA tasks by answering queries with spans from the corpus, which avoids the loss of facts that relation extraction incurs. However, to the best of our knowledge, virtual KBs have not yet been used in open-domain dialogue generation. DrKIT [2] is a state-of-the-art reasoning algorithm for QA on a virtual KB, which traverses textual data like a KB, softly following paths of relations between mentions of entities in the corpus. Inspired by this, we present a novel model, DMKCM, which includes a reasoning strategy based on DrKIT to obtain more information related to the dialogue. To better fit our task, we use a commonsense story corpus, the indexed ROCStories [12], as our virtual KB instead of the specialized Wikipedia.

3 Model

3.1 Overview

The overview of DMKCM is shown in Fig. 2. DMKCM consists of two branches, a dialogue branch and a knowledge branch. The dialogue branch (green blocks on the left of Fig. 2) generates the conversation with an encoder-decoder model and exchanges information with the knowledge branch to improve the informativeness of the response. The knowledge branch (orange blocks on the left of Fig. 2) reasons over, stores, merges, and expands knowledge through the Virtual Knowledge reasoning module (VK-reasoning), the Dynamic Virtual Knowledge memory module (DVK-memory), the Dynamic Virtual Knowledge selector module (DVK-selector), and the Commonsense Knowledge expansion module (CK-expansion).

Fig. 2. Left: the overview of DMKCM, including the dialogue branch (green blocks) and the knowledge branch (orange blocks). Right: details of the knowledge branch, which includes Knowledge Acquisition and Knowledge Selector. Knowledge Acquisition has three modules: VK-reasoning, DVK-memory, and CK-expansion. Knowledge Selector represents the process of DVK-selector, and the merge is given in Eq. 7. (Color figure online)

Before presenting the details of our dialogue generation approach, we first introduce our notation and key concepts. Formally, suppose we have a conversation dataset \(D=\left\{ \left( {C}^{i},{X}^{i}, R^{i}\right) \right\} _{i=1}^{N}\), where \(C^{i}\) represents the conversational context before the i-th turn, \(X^{i}\) is the i-th user utterance, and \(R^{i}\) is the response to \(X^{i}\). The final goal of this task is to estimate a generation probability distribution \(P(R \mid \left[ C,X\right] , K)\) from D. One can then generate a response for \(\left[ C,X\right] \) following \(P(R \mid \left[ C,X\right] , K)\), where \(\left[ C,X\right] \) denotes the concatenation of the context C and the current user utterance X, K is the knowledge from the knowledge branch, and R is the corresponding response. Assume that there are already \(n-1\) turns in a dialogue. We use a Transformer encoder (T_enc) to encode \(X^{n}\) and \(\left[ C^{n},X^{n}\right] \) separately and get the last hidden states \(h_{p}^{n}\) and \(h_{e}^{n}\), which represent the encoded semantic information of \({X}^{n}\) and \(\left[ C^{n},X^{n}\right] \), respectively. Next, we elaborate on the details of each branch.

3.2 Knowledge Branch

First, VK-reasoning retrieves and filters candidate documents related to the user utterance \(X^{n}\) from a virtual KB as the 1st hop. We send \(X^{n}\), the candidate documents, and \(C^{n}\) to CK-expansion, which expands commonsense concepts for a better response. DVK-memory is a dynamic transfer module. It dynamically stores the encoded vectors of the 1st hop from the previous \(n-1\) turns. We call these vectors history virtual knowledge. Then, we filter the history virtual knowledge and send the related part to DVK-selector as a knowledge supplement. Next, DVK-selector dynamically integrates information from the 1st hop and the history virtual knowledge for the decoder. Notably, after this process, the encoded information from the current 1st hop must be written into DVK-memory. Before generating a response, CK-expansion expands the words of the input by traversing a commonsense KG to capture concepts shifted in the conversation structure. These expanded concepts are denoted as the 2nd hop.

Virtual Knowledge Reasoning (VK-Reasoning). To suit our open-domain dialogue task, we select a commonsense story corpus, ROCStories [12], as the source of our virtual KB. First, we index each unique title of ROCStories as an entity in our virtual KB. Then, to simulate the relationship pattern of KGs on text, we traverse each story to connect related titles (entities). We use the latest reasoning algorithm DrKIT [2] to train a reasoning model with our conversation dataset and virtual KB. With this trained model, we retrieve the candidate documents \(K_{D}^{n}\) related to \(X^{n}\). In particular, to obtain documents that are more relevant to \(X^{n}\), we list the related words of each word in \( X ^ {n} \) from a commonsense KG. The number of co-occurrences between a document and this list is taken as the document’s filtering score. We keep the top T candidate documents from \(K_{D}^{n}\) by this score. For convenience, these filtered candidate documents are denoted \(K_{V}^{n}\) (1st hop).
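The filtering step described above can be illustrated with a minimal sketch. It assumes whitespace tokenization, a toy dictionary `neighbors` standing in for the commonsense KG lookup, and a score that counts every document token found in the related-word list; the function names are hypothetical and the DrKIT retrieval that produces the candidates is not modeled.

```python
from typing import Dict, List, Set


def filtering_score(doc_tokens: List[str], related: Set[str]) -> int:
    """Count the document tokens that appear in the related-word list."""
    return sum(1 for tok in doc_tokens if tok in related)


def top_t_documents(
    utterance: str,
    candidates: List[str],
    neighbors: Dict[str, List[str]],  # assumption: toy stand-in for a KG lookup
    t: int,
) -> List[str]:
    """Keep the t candidates that co-occur most with words related to the utterance."""
    related: Set[str] = set()
    for word in utterance.lower().split():
        related.update(neighbors.get(word, []))
    # sorted() is stable, so ties keep the original retrieval order.
    ranked = sorted(
        candidates,
        key=lambda doc: filtering_score(doc.lower().split(), related),
        reverse=True,
    )
    return ranked[:t]


if __name__ == "__main__":
    kg = {"dog": ["pet", "bark", "walk"], "park": ["grass", "walk"]}
    docs = [
        "the cat sat on a mat",
        "i walk my pet on the grass",
        "we heard a bark",
    ]
    print(top_t_documents("my dog loves the park", docs, kg, t=2))
```

Whether repeated tokens should count once or several times is not specified in the text; the sketch counts every occurrence.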

As shown on the right of Fig. 2, we encode \(K_{V}^{n}\) for DVK-selector. Concretely, the candidate documents are encoded one after another by the Transformer encoder. We take the last hidden state \(h_{V_{t}}^{n}\) of the encoder layer as the encoding of the t-th candidate document, where t ranges from 1 to T and T is the total number of candidate documents. Subsequently, we stack these last hidden states into a matrix \(H_{V}^{n}\). The process is as follows:

$$\begin{aligned} H_{V}^{n} = \left[ h_{V_{1}}^{n},h_{V_{2}}^{n}, \ldots , h_{V_{T}}^{n}\right] ^{T}, \end{aligned}$$
(1)
$$\begin{aligned} h_{V_{t}}^{n} = T\_enc\left( K_{V_{t}}^{n}\right) , \left( t=1, \ldots ,T\right) . \end{aligned}$$
(2)

Dynamic Virtual Knowledge Memory (DVK-Memory). When the topic is shifted, e.g., “ ” in “A4” of Table 1, VK-reasoning may find documents about “ ” instead of “weight” and “healthy” which stand for the topic from context. This leads to little support for the current conversation because our model lacks practical knowledge to generate a response if only using the 1st hop knowledge.

Table 1. An Example of Conversation Topic Shift in DailyDialog. The topic words in the dialogue are emphasized in bold, and the words in red indicate the words that deviate from the current dialogue topic.

Thus, we design DVK-memory to dynamically store and filter history virtual knowledge representations, which supplements the 1st hop with virtual knowledge and ultimately helps topics transition more smoothly in the current dialogue. This process applies a dynamic delayed updating strategy (Algorithm 1).

[Algorithm 1: dynamic delayed updating strategy]

We denote \(M=\left\{ M^{n}\right\} _{n=2}^{N}\) as a set of history virtual knowledge representation, where \(M^{n}\) is historical knowledge representation for n-th turn of dialogue.

$$\begin{aligned} M^{n}=\left[ H_{V}^{1}, \ldots , H_{V}^{n-1}\right] ^{T}. \end{aligned}$$
(3)
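The read-before-write ordering implied by Eq. 3 can be sketched as follows. This is only an illustration of the delayed update under stated assumptions: each \(H_{V}^{n}\) is represented as a plain list of row vectors, the class `DVKMemory` and the function `run_turn` are hypothetical names, and the filtering part of the strategy is not modeled.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


class DVKMemory:
    """Toy history bank: stores the encoded 1st-hop matrices of past turns."""

    def __init__(self) -> None:
        self._bank: List[Matrix] = []

    def read(self) -> Matrix:
        """Return M^n: the rows of all stored H_V matrices, stacked in turn order."""
        return [row for h_v in self._bank for row in h_v]

    def write(self, h_v: Matrix) -> None:
        """Append the current turn's H_V so that later turns can see it."""
        self._bank.append(h_v)


def run_turn(memory: DVKMemory, h_v_current: Matrix) -> Matrix:
    """Read the history first, then store the current turn (delayed update)."""
    history = memory.read()      # contains turns 1..n-1 only
    memory.write(h_v_current)    # becomes visible from turn n+1 onwards
    return history


if __name__ == "__main__":
    mem = DVKMemory()
    print(run_turn(mem, [[1.0, 2.0]]))               # first turn: no history
    print(run_turn(mem, [[3.0, 4.0], [5.0, 6.0]]))   # sees only the first turn
```

Because the write happens after the read, the first turn sees an empty history, which matches the special case in which no historical term is available.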

Then, we apply the attention mechanism from [1] to extract from \(M^{n}\) the historical knowledge representation \(h_{m}^{n}\) that is related to the representation of the current user utterance \(h_{p}^{n}\):

$$\begin{aligned} h_{m}^{n}=\sum _{i=2}^{n-1}\alpha _{w,k}^{i}M_{k}^{i}, \end{aligned}$$
(4)
$$\begin{aligned} \alpha _{w,k}^{i}=\frac{exp\left( S_{w,k}^{i}\right) }{\sum _{i=1}^{n}exp\left( S_{w,k}^{i}\right) }, \end{aligned}$$
(5)
$$\begin{aligned} S_{w,k}^{i}=V_{a}^{T}tanh\left( W_{h}\left[ h_{p_{w}}^{i};M_{k}^{i} \right] \right) , \end{aligned}$$
(6)

where \(M_{k}^{i}\) represents the k-th position hidden state of history virtual knowledge \(M^{i}\) and \(h_{p_{w}}^{i}\) is the w-th token vector in \(h_{p}^{i}\). \(V_{a}^{T}\) and \(W_{h}\) are trainable parameters. \(S_{w,k}^{i}\) is the unnormalized attention weight by an attention neural network and \(\alpha _{w,k}^{i}\) is the normalized attention weight from \(S_{w,k}^{i}\).
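As a rough illustration of Eqs. 4 to 6, the sketch below scores each memory row against a single query vector with an additive (tanh) attention and returns the weighted sum. It is a simplification under stated assumptions: one query vector replaces the per-token vectors \(h_{p_{w}}\), the memory is one flat list of rows, and the parameters `w_h` and `v_a` are toy values rather than trained weights.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(
    query: Vector, memory: Matrix, w_h: Matrix, v_a: Vector
) -> Tuple[Vector, Vector]:
    """Return (weights, context) with score_k = v_a . tanh(w_h [query; memory_k])."""
    scores = []
    for m_k in memory:
        hidden = [math.tanh(z) for z in matvec(w_h, query + m_k)]  # concatenation
        scores.append(sum(v * h for v, h in zip(v_a, hidden)))
    weights = softmax(scores)
    dim = len(memory[0])
    context = [sum(w * m_k[d] for w, m_k in zip(weights, memory)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    q = [1.0, 0.0]
    mem = [[5.0, 0.0], [0.0, 0.0]]
    weights, context = additive_attention(q, mem, [[0.0, 0.0, 1.0, 0.0]], [1.0])
    print(weights, context)
```

With the toy parameters in the usage example, the first memory row receives the larger weight because only its first coordinate contributes to the score.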

Dynamic Virtual Knowledge Selector (DVK-Selector). We apply the multi-head attention mechanism (MultiHead) [21] to extract features of the current virtual knowledge \(H_{V}^{n}\) and the historical knowledge \(h_{m}^{n}\) according to the dialogue semantic information \(h_{e}^{n}\). A gate fuses the two, and its result is \(A_{merge}^{n}\). Specifically,

$$\begin{aligned} \begin{aligned} A_{merge}^{n}= \left\{ \begin{array}{ll} \mu A_{V}^{n}+h_{e}^{n}, &{} n=0\\ \mu A_{V}^{n}+(1-\mu )A_{M}^{n}+h_{e}^{n}, &{} n > 0 \\ \end{array} \right. \end{aligned}, \end{aligned}$$
(7)
$$\begin{aligned} A_{M}^{n}=MultiHead\left( h_{e}^{n}, h_{m}^{n}, h_{m}^{n}\right) , \end{aligned}$$
(8)
$$\begin{aligned} A_{V}^{n}=MultiHead\left( h_{e}^{n}, H_{V}^{n}, H_{V}^{n}\right) , \end{aligned}$$
(9)
$$\begin{aligned} \mu = sigmoid\left( W_{g}h_{e}^{n}\right) . \end{aligned}$$
(10)

Here, we use the sigmoid function to get a gating parameter \(\mu \) for fusion, and \(W_{g}\) is a trainable parameter. \( A_{V}^{n}\) denotes the current virtual knowledge features related to \(h_{e}^{n}\), and \(A_{M}^{n}\) denotes the historical knowledge features related to \(h_{e}^{n}\). Because DVK-memory uses a delayed updating strategy, no history is available when n is 0, so the \(h_{m}^{n}\) term is dropped in that case.
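The gated merge of Eqs. 7 and 10 can be sketched in a few lines. The sketch assumes that the MultiHead outputs \(A_{V}^{n}\) and \(A_{M}^{n}\) are already computed and have the same length as \(h_{e}^{n}\), and that \(\mu\) is a scalar obtained from a weight vector `w_g`; the shape of \(W_{g}\) is not stated in the text, so this is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Optional

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_merge(
    h_e: Vector, a_v: Vector, a_m: Optional[Vector], w_g: Vector
) -> Vector:
    """Fuse current and historical knowledge features with a scalar gate.

    a_m is None on the first turn, when no history is available.
    """
    mu = sigmoid(sum(w * h for w, h in zip(w_g, h_e)))  # gate from the dialogue state
    if a_m is None:
        return [mu * v + e for v, e in zip(a_v, h_e)]
    return [mu * v + (1.0 - mu) * m + e for v, m, e in zip(a_v, a_m, h_e)]


if __name__ == "__main__":
    h_e = [1.0, 1.0]
    # A zero weight vector gives mu = 0.5, i.e. an even mix of both sources.
    print(gated_merge(h_e, [2.0, 4.0], [4.0, 0.0], [0.0, 0.0]))
    print(gated_merge(h_e, [2.0, 4.0], None, [0.0, 0.0]))
```

The residual addition of \(h_{e}^{n}\) keeps the dialogue state in the merged representation even when the gate strongly favors one knowledge source.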

Commonsense Knowledge Expansion (CK-Expansion). To expand concepts and further enhance informativeness, we expand entities of \(K_{V}^{n}\), \(C^{n}\) and \(X^{n}\) by searching neighbor nodes on a commonsense KG. We use \(K_C^{n}=(k_h^{n},k_r^{n},k_t^{n})\) to represent the knowledge triples, which connects the original entities and expanded entities. \(k_h^{n}\) is a set of words (entities) from \(K_{V}^{n}\), \(C^{n}\) and \(X^{n}\). \(k_t^{n}\) means expanded entities by the KG. \(k_r^{n}\) is the relation of \(k_h^{n}\) and \(k_t^{n}\) on the KG. Inspired by GCN that can encode graph structure well, we use Multi-layer CompGCN (M_CompGCN) [20] to encode the knowledge triples by combining the node embedding and the relation embedding.

$$\begin{aligned} h_{K_h}^n, h_{K_r}^n, h_{K_t}^n = M\_CompGCN(K_C^{n}). \end{aligned}$$
(11)

We use the dialogue context encoding \(h_e^n\) to compute the degree of attention \(\beta ^i\) with the encoded head \(h_{K_h}^n\) and the relation \(h_{K_r}^n\), and then multiply with the encoded tail \(h_{K_t}^n\). Finally, we get the representation of knowledge triples \(h_{k_C}^n\).

$$\begin{aligned} h_{k_C}^n = \sum _{i=1}^k{\beta ^i h_{k_t}^i}, \end{aligned}$$
(12)
$$\begin{aligned} \beta ^i = Softmax( h_{e}^i[h_{k_h}^i+h_{k_r}^i]), \end{aligned}$$
(13)

where k is the number of the triples.
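Eqs. 12 and 13 can be read as a dot-product attention over triples, sketched below. The sketch assumes that the head, relation, and tail embeddings are given as plain lists (the CompGCN encoder of Eq. 11 is not modeled) and that the score of a triple is the inner product of the context vector with the sum of its head and relation embeddings; `triple_attention` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def triple_attention(
    h_e: Vector, heads: Matrix, relations: Matrix, tails: Matrix
) -> Tuple[Vector, Vector]:
    """Weight each tail embedding by how well head + relation matches the context."""
    scores = [
        sum(e * (h + r) for e, h, r in zip(h_e, head, rel))
        for head, rel in zip(heads, relations)
    ]
    betas = softmax(scores)
    dim = len(tails[0])
    rep = [sum(b * tail[d] for b, tail in zip(betas, tails)) for d in range(dim)]
    return betas, rep


if __name__ == "__main__":
    context = [1.0, 0.0]
    heads = [[1.0, 0.0], [1.0, 0.0]]
    relations = [[0.0, 1.0], [0.0, 1.0]]
    tails = [[2.0, 0.0], [0.0, 2.0]]
    print(triple_attention(context, heads, relations, tails))
```

In the usage example both triples receive the same score, so the result is the average of the two tail embeddings.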

3.3 Generation

We use Transformer Decoder (T_dec) to generate words,

$$\begin{aligned} h_{d}^{n} = T\_dec(y_{t-1}^{n}, A_{merge}^{n}). \end{aligned}$$
(14)

Then, a Controller maps the decoded hidden state \(h_{d}^{n}\) to the vocabulary size and outputs the word probabilities \(P_{v}\) through the Softmax function,

$$\begin{aligned} P_{v} = Softmax(W_{v}h_{d}^{n}). \end{aligned}$$
(15)

In addition, we can also generate knowledgeable words by using knowledge expansion representation encoded in CK-expansion,

$$\begin{aligned} P_{K_{C}} = Softmax(\sum _{i=1}^l{\gamma _i^n h_{k_C}^n}), \end{aligned}$$
(16)
$$\begin{aligned} \gamma _i^n = Softmax(h_{d}^{n}W_{k}h_{K_C}^n). \end{aligned}$$
(17)

We get an attention weight \(\gamma _i^n\) by using the decoded hidden state \(h_{d}^{n}\) to attend to \(h_{k_C}^n\), which makes the model focus on the relevant knowledge triples; we then choose knowledge entities according to the entity probability \(P_{K_{C}}\) obtained by applying the Softmax function to the weighted knowledge.

The final generated words will consider both the distribution of standard vocabulary and the distribution of knowledge entities. We use a soft gate probability \(g_t\) to choose the generated words from standard vocabulary or knowledge entities.

$$\begin{aligned} y_{t} = g_{t} \cdot P_{v} + (1-g_{t}) \cdot P_{K_{C}} \end{aligned}$$
(18)
$$\begin{aligned} g_{t} = \sigma (h_{d}^{n}) \end{aligned}$$
(19)
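The soft gate of Eqs. 18 and 19 mixes two probability distributions. The sketch below assumes that both distributions are given as word-to-probability dictionaries over a shared output space and that the gate is a scalar obtained by projecting \(h_{d}^{n}\) with a weight vector `w_gate`; the projection is an assumption, since Eq. 19 does not show how the hidden state is reduced to a scalar, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def soft_gate(h_d: List[float], w_gate: List[float]) -> float:
    """Scalar gate in (0, 1) computed from the decoder state."""
    logit = sum(w * h for w, h in zip(w_gate, h_d))
    return 1.0 / (1.0 + math.exp(-logit))


def mix_distributions(
    p_vocab: Dict[str, float], p_entity: Dict[str, float], gate: float
) -> Dict[str, float]:
    """Return gate * P_v + (1 - gate) * P_K over the union of both supports."""
    mixed: Dict[str, float] = {}
    for word, prob in p_vocab.items():
        mixed[word] = mixed.get(word, 0.0) + gate * prob
    for word, prob in p_entity.items():
        mixed[word] = mixed.get(word, 0.0) + (1.0 - gate) * prob
    return mixed


if __name__ == "__main__":
    g = soft_gate([1.0, -1.0], [0.0, 0.0])  # zero weights give a gate of 0.5
    print(mix_distributions({"i": 0.6, "like": 0.4}, {"dog": 1.0}, g))
```

Since both inputs sum to one and the gate lies between 0 and 1, the mixture is again a valid probability distribution.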

3.4 Training

To train the proposed model, we minimize the negative log-likelihood

$$\begin{aligned} L_{NLL} = - \frac{1}{N} \sum _{n=1}^{N}\sum _{t=1}^{T}log P(y_{t}^{(n)}|y_{<t}^{(n)},X^{(n)}, K^{(n)}), \end{aligned}$$
(20)

where N is the total number of samples in the dataset, and T is the number of timesteps in the n-th turn response. \(X^{(n)}\) represents the n-th turn user utterance in the dataset, and \(K^{(n)}\) represents the n-th turn knowledge.
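For concreteness, the loss in Eq. 20 can be computed as in the following sketch, which assumes that the model has already produced the probability it assigns to each gold token; `nll_loss` is a hypothetical helper and not part of any library.

```python
import math
from typing import List


def nll_loss(gold_token_probs: List[List[float]]) -> float:
    """Negative log-likelihood averaged over samples.

    gold_token_probs[n][t] is the probability the model assigns to the
    gold token at timestep t of the n-th response.
    """
    total = 0.0
    for response in gold_token_probs:
        for prob in response:
            total -= math.log(prob)
    return total / len(gold_token_probs)


if __name__ == "__main__":
    # Two responses: one with two tokens at probability 0.5, one certain token.
    print(nll_loss([[0.5, 0.5], [1.0]]))
```

Note that the sum runs over all tokens of a response while the average is taken over responses, so longer responses contribute more to the loss.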

4 Experiments

4.1 Dataset

Conversation Corpus: We choose DailyDialog [8] and PersonaChat [25] as our datasets. In our work, each training sample consists of four turns of dialogue, and the statistics of the pre-processed datasets are shown in Fig. 3.

Commonsense Knowledge Corpus: ConceptNet [17] is a semantic network designed to help computers understand the meanings of words that people use. Its English vocabulary contains approximately 1,500,000 nodes.

Source of Virtual KB: ROCStories [12] is a commonsense story corpus that contains 98,161 five-sentence stories.

Fig. 3. Statistics of the two datasets.

4.2 Comparison Method

We compare our model with representative baselines to investigate its effectiveness. The baselines are as follows: (1) Attn-S2S [18]: a classic method that applies simple attention [1] to the input context on top of the sequence-to-sequence model; (2) Transformer [21]: a popular network architecture based solely on attention mechanisms; (3) Dir-VHRED [23]: a recent work based on the latent variable hierarchical recurrent encoder-decoder model that characterizes the latent variables with a Dirichlet distribution instead of the traditional Gaussian distribution; (4) GLKS [14]: the newest dialogue generation model based on unstructured knowledge, which builds a global-to-local knowledge selection method to improve the quality of the selected unstructured knowledge in background-based conversations; (5) CCM [27]: a commonsense knowledge-aware conversation model that leverages commonsense knowledge from ConceptNet [17] through two graph attention mechanisms to facilitate informative response generation; (6) MKST [26]: the latest universal transformer-based architecture, which fuses label and background knowledge in open-domain conversation. Since MKST relies on datasets with background knowledge, we compare it with our model only on PersonaChat, which has background information.

4.3 Implementation Details

We conduct the experiments with a Transformer structure (our baseline) with 8 heads, 6 layers, 512-dimensional hidden states, and a 2-layer GCN model. In the VK-reasoning, we set the number of reasoned candidate documents as 10 and filtered candidate documents as 5. During the CK-expansion, we search the neighbors of nodes and preserve the top 100 neighbor nodes. When processing datasets, the history context we choose is the previous turn of the current conversation. To train the model, we use the Adam optimizer [6] and use Adam-warmup to adjust the learning rate.

4.4 Evaluation Metrics

To analyze and evaluate our model more comprehensively, we use both automatic and human evaluations. Automatic Evaluation: Following previous work, we apply several widely used automatic metrics. Specifically, we adopt PPL, BLEU (BLEU-1,2,3,4) [13], and Distinct-1,2 (Dist-1,2) [7] to reveal the quality, coherence, and diversity of the generated responses. PPL is the perplexity score, which measures the quality of the language model. BLEU calculates word-based precision between a generated response and a gold response. Distinct evaluates the informativeness and diversity of the predicted responses. Human Evaluation: Automatic evaluation metrics are known to have limitations in evaluating human conversations [9]. In our work, we randomly sample 200 test samples for human evaluation. For each response, we define three metrics: (1) Fluency (Flu.), i.e., the degree of fluency and human readability; (2) Informativeness (Inf.), i.e., the degree of knowledge in the response; (3) Appropriateness (App.), i.e., the degree of relevance to the given context. Each response is rated by 3 annotators on a 3-point scale ranging from 0 to 2. We take the average scores as the final results for each metric.
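As an illustration of the Distinct metric mentioned above, the sketch below computes the ratio of unique n-grams to total n-grams over a set of generated responses. It assumes whitespace tokenization and corpus-level counting; the exact tokenization and aggregation used in the experiments are not stated, so these choices are assumptions.

```python
from typing import List, Tuple


def distinct_n(responses: List[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams across all responses."""
    ngrams: List[Tuple[str, ...]] = []
    for response in responses:
        tokens = response.split()
        for i in range(len(tokens) - n + 1):
            ngrams.append(tuple(tokens[i : i + n]))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


if __name__ == "__main__":
    generated = ["i am fine", "i am ok"]
    print(distinct_n(generated, 1))  # 4 unique unigrams out of 6
    print(distinct_n(generated, 2))  # 3 unique bigrams out of 4
```

A model that keeps producing the same safe phrases yields many repeated n-grams and therefore a low Distinct score.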

5 Results and Discussion

5.1 Performance Evaluation

Automatic Evaluation. Table 2 lists the automatic evaluation results for each model. Our model outperforms almost all the baselines on the two corpora. In terms of model quality, our PPL is the lowest, indicating that our generated responses are more grammatical. In terms of coherence, DMKCM has higher BLEU values, demonstrating that our model tends to generate responses that are more similar to the gold responses than the baselines do in most cases. It can be inferred that our model effectively obtains useful information from the historical context and the historical knowledge in memory to help generate a response. On diversity, the Dist-1,2 metrics demonstrate that the models leveraging external knowledge achieve better performance than the knowledge-based model, e.g., GLKS, CCM, MKST, in generating meaningful and diverse responses. According to Table 2, DMKCM is better than MKST, the latest multi-knowledge based dialogue generation model, in terms of these metrics. This indicates the effectiveness of our model in using structured knowledge or unstructured knowledge, or of the fusion method.

Table 2. Automatic evaluation results of the proposed model and the baseline models. Numbers in bold indicate the best-performing model on the corresponding metrics.

Human Evaluation. Figure 4 shows the human evaluation results of DMKCM compared with the baselines as a radar chart. The three vertices of the radar chart represent fluency, informativeness, and appropriateness, respectively. According to the radar chart, DMKCM performs best on the two datasets. In particular, informativeness shows the most obvious advantage over the baselines, which indicates that our fusion of multi-form knowledge is effective and can generate coherent and informative responses.

Fig. 4. Comparison of human evaluation results. Ratings range from 0 to 2, and higher is better.

Table 3. Results of ablation study.

5.2 Ablation Study

As shown in Table 3, we analyze the effectiveness of each module proposed in DMKCM through the following situations: (1) 2Hop: DMKCM drops CK-expansion; (2) Mem and 2Hop: DMKCM drops DVK-memory and CK-expansion; (3) 1Hop and Mem: DMKCM drops VK-reasoning and DVK-memory; (4) 1Hop, Mem, and 2Hop: this is the baseline (Transformer). From the results, we observe that the performance of situation (1) drops sharply compared with the full model. This is within our expectations, since CK-expansion helps capture extra information from the post, which improves the diversity of the generated responses. This result also shows that fusing structured knowledge can effectively help dialogue generation. The metric results of situation (1) are better than those of situation (2), which verifies that retaining history virtual knowledge with DVK-memory effectively helps dialogue generation. Situation (3) is related to unstructured knowledge, and its poor results demonstrate the effectiveness of VK-reasoning and DVK-memory and show that the reasoned virtual knowledge affects response generation. That situation (1) is better than situation (4) also shows that all of the modules play important roles in our model. In summary, the modules of DMKCM designed for fusing structured and unstructured knowledge help response generation in terms of informativeness and coherence.

Table 4. Case study of generated responses.
Fig. 5. Examples of DMKCM.

5.3 Case Study

Sample conversations are shown in Table 4, which indicate that DMKCM can generate better responses than the baselines on the two conversation datasets. Traditional end-to-end models, e.g., Attn-S2S and Dir-VHRED, tend to generate simple, safe, and incoherent responses because they have no access to knowledge. Knowledge-based models, e.g., CCM, GLKS, and MKST, generate informative responses but still produce irrelevant ones. In contrast, DMKCM fuses knowledge from the knowledge graph and the virtual knowledge as a whole and encodes more related information via the DVK-selector and Controller modules, which supports generating more informative and coherent responses. In practice, the effect of these modules can be seen in Fig. 5. DMKCM thus generates more reasonable responses through better use of knowledge.

6 Conclusion

To address the lack of informative responses in multi-turn dialogue generation, we propose a novel model, DMKCM. Existing methods of introducing knowledge into dialogue generation have limitations, so we combine a virtual KB and a commonsense KG to help generate better responses. In addition, we find that history virtual knowledge can improve responses, and we provide a new dynamic delayed updating strategy to store and filter history virtual knowledge. Experimental results on two datasets show that DMKCM can generate more informative dialogue with appropriate content ordering.