
1 Introduction

Several Natural Language Understanding (NLU) systems can be improved by world knowledge. [11] argue that people respond to each other's utterances in a meaningful way not only by paying attention to the latest utterance of the conversational partner, but also by recalling relevant information about the concepts covered in the dialogue and integrating it into their responses. In this sense, they propose end-to-end dialogue systems augmented with common-sense knowledge, integrated in the form of external memory. [9] define a method that aims to increase the accuracy of traditional keyphrase extraction systems by expanding the training set with not-in-text terms obtained from an inference process over world knowledge models. As a final result, the method overcomes the limitation of being unable to identify keyphrases that do not appear in the text and/or in the corpus.

In the context of Artificial Intelligence (AI), world knowledge is commonly referred to as common-sense knowledge: the set of background information that an individual is expected to know or assume, and the ability to use it when appropriate [1, 2, 6]. Several common-sense knowledge bases have been constructed over the past decade, such as ConceptNet [10] and SenticNet [3]. For Portuguese, we highlight InferenceNet [8], which contains the inferential content of concepts, defined and agreed upon in a community or area of knowledge. For instance, when we read the news "João murdered his wife by shooting her to death after an argument on Solon Pinheiro Street", we are able to refute an assertion that the type of weapon used in the crime was a "cold weapon", because we, users of natural language, know the conditions in which the concepts "to shoot" and "to murder" can be used. Another motivating example is illustrated in Fig. 1, where the piece of knowledge "Computer is used to watch movies" is required to generate the best answer in the dialogue.

Fig. 1. Dialogue examples and common-sense representation.

Despite the richness and vastness of common-sense knowledge bases, we argue that common-sense knowledge has to be integrated into target applications (Text Classification, Dialogue Systems, Information Extraction systems, etc.) more effectively. The number of common-sense relations (triples of the form (arg1; semantic relation; arg2)) is huge, and they are spread out across networks, making it difficult to choose which pieces of knowledge are, in fact, relevant. In this work, our goal is to learn which set of common-sense relations best fits the target application by developing an external memory module based on Deep Learning techniques, rather than forcing the system to encode the common-sense relations in model parameters, as in traditional methods.

To make this common-sense knowledge available to target applications, we propose a deep learning model of common-sense knowledge in the Portuguese language that can be easily coupled to NLU systems to improve their performance. More specifically, the model is composed of an LSTM (Long Short-Term Memory) neural network that receives a text from the target application (for example, a user message in a dialogue, a response to a user tweet, or a news text) and selects and learns the best set of common-sense relations to return to the target application, to be considered in the target learning model or system. We implemented the common-sense learning module in two target applications: a Stance Classification system and an end-to-end Dialogue system. In both cases, incorporating the deep learning model improved the results.

2 Background Knowledge

2.1 World Knowledge in NLU Systems

In [11], a general-theme chatbot was developed that uses common-sense knowledge as external information. The authors used a Tri-LSTM encoder with an additional LSTM layer to process the common-sense knowledge, a dataset of 1.4M dialogue pairs from Twitter, and ConceptNet [10] as the common-sense database. In the experimental evaluation, N = 10 candidate responses were passed to the system, of which one was the correct (positive) response and the others were negative. The result achieved for Recall@1 was 77.5%. In [9], the authors proposed to improve the performance of the keyphrase extraction task by expanding the training data with not-in-text terms obtained through an inference process over common-sense knowledge bases. They argue that even words that are not present in the text can be related to it and possibly chosen as keyphrases. The achieved results show a performance improvement of 5% on average for keyphrase extraction.

2.2 Common Sense Knowledge Bases

Two existing common-sense knowledge bases are ConceptNet [10] and InferenceNet [8]. These bases provide world knowledge to Artificial Intelligence applications. InferenceNet [8] provides semantic-inferentialist knowledge for the Portuguese language, with 186,047 concepts related through 842,392 relationships in the format rel(c1, c2). ConceptNet [10] is a knowledge graph that represents relations between words/phrases as assertions; for example, "a dog has a tail" can be represented as (dog, HasA, tail). ConceptNet contains 21 million edges and over 8 million nodes; 83 languages contain at least 10,000 nodes each, and the vocabulary size for Portuguese is 473,709.
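To make the triple format concrete, the sketch below renders (arg1, relation, arg2) assertions as the short natural-language sentences that a knowledge-augmented model can consume. The relation templates are illustrative stand-ins, not ConceptNet's official surface forms.

```python
# Hypothetical sketch: turning ConceptNet-style triples into short sentences.
# The templates below are assumptions for illustration only.
TEMPLATES = {
    "HasA": "{0} has {1}",
    "IsA": "{0} is a {1}",
    "UsedFor": "{0} is used for {1}",
}

def triple_to_sentence(arg1, relation, arg2):
    """Render a (arg1, relation, arg2) assertion as a plain sentence."""
    template = TEMPLATES.get(relation, "{0} " + relation + " {1}")
    return template.format(arg1, arg2)

print(triple_to_sentence("dog", "HasA", "tail"))  # dog has tail
```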

3 A Deep Learning Model of Common Sense Knowledge

In this paper, we propose a deep learning model, the DeepCS model, that assists the use of common-sense knowledge in NLU tasks. Common-sense knowledge bases are vast and rich, and deciding which knowledge to consider in the application is a challenge. Thus, the proposed model retrieves a set of common-sense knowledge from knowledge bases (CSKBs) such as ConceptNet, InferenceNet, and SenticNet, and learns the best combination of relations that can contribute to the target application. Figure 2 presents the general architecture of DeepCS. A target application sends texts from its training dataset - INPUT_A (a question or a tweet) to the common-sense pre-processing module and INPUT_B (the response or a tweet reply) to the DeepCS module - and receives a word vector with the best common-sense sentence. In this architecture, INPUT_A represents the target text and INPUT_B represents the text to be classified; both come from the dataset available for training. INPUT_C represents X common-sense sentences related to INPUT_A. As shown in Fig. 2, pre-trained word embeddings, such as GloVe embeddings [7], are applied to the inputs; that is, each word found in the inputs is replaced by the numeric vector that represents it. INPUT_A and INPUT_B are processed by a neural network architecture that generates their representations. INPUT_A is then multiplied by a matrix W learned by the network, and a Hadamard product is applied between the result and INPUT_B, as shown in Eq. 1. The result, had1, is later combined with the common-sense knowledge returned by DeepCS.

$$\begin{aligned} f(input_a, input_b) = (input_a * W) \circ input_b \end{aligned}$$
(1)
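Eq. 1 can be sketched in a few lines of NumPy: the encoded INPUT_A is projected by the learned matrix W, then fused with the encoded INPUT_B through an element-wise (Hadamard) product. The dimensions and random values below are illustrative assumptions; in the model, W is learned during training.

```python
import numpy as np

d = 4                                  # encoding size (assumption)
rng = np.random.default_rng(0)

input_a = rng.standard_normal(d)       # encoded target text
input_b = rng.standard_normal(d)       # encoded text to be classified
W = rng.standard_normal((d, d))        # learned during training; random here

# Eq. 1: had1 = (input_a * W) o input_b
had1 = (input_a @ W) * input_b         # matrix product, then Hadamard product
```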
Fig. 2. General architecture of the Deep Learning Model of common-sense knowledge.

Pre-processing and Selection of the Common Sense knowledge

As shown in Fig. 3, this module receives INPUT_A and performs tokenization, stop-word and number removal, word lemmatization, and the creation of a vocabulary V. Then, for each word in V, up to N common-sense relations can be returned from the CSKB. These relations compose a list H of common-sense sentences that can be used in the current task. The next step is to relate each example of INPUT_A to a set of sentences listed in H: X common-sense sentences can be chosen for each INPUT_A if they share a word with it. For example, "[[The police]] can [[catch a thief]]" could be related to the INPUT_A example "French police publishes photos of suspect in yesterday's Montrouge shooting. Maybe the same people in Kosher market http://t.co/j5nQIl4Ytu", because both the common-sense relation and the INPUT_A contain the word "police". Finally, the selected common-sense sentences also go through a preprocessing step that performs tokenization, stop-word and number removal, and word lemmatization, and any new words are added to the vocabulary V.
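The selection step above can be sketched as a word-overlap match between each INPUT_A and the list H. The stop-word list is a minimal stand-in and lemmatization is omitted for brevity; these are simplifications of the full pipeline.

```python
# Illustrative sketch of the preprocessing/selection step: build a vocabulary
# from INPUT_A, then keep up to X sentences from H that share a word with it.
STOP_WORDS = {"the", "of", "in", "a", "can"}  # simplified stand-in list

def tokenize(text):
    return [w.lower().strip(".,!?") for w in text.split()]

def preprocess(text):
    """Tokenize, drop stop words and numbers (lemmatization omitted here)."""
    return [t for t in tokenize(text) if t not in STOP_WORDS and not t.isdigit()]

def select_common_sense(input_a, h_list, x=5):
    """Return up to x sentences from H sharing at least one word with INPUT_A."""
    vocab = set(preprocess(input_a))
    related = [s for s in h_list if vocab & set(preprocess(s))]
    return related[:x]

h = ["The police can catch a thief", "A computer is used to watch movies"]
tweet = "French police publishes photos of suspect"
print(select_common_sense(tweet, h))   # only the "police" sentence matches
```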

Fig. 3. Pipeline of the preprocessing and common-sense extraction step.

Processing and Learning of Common Sense Knowledge by the Neural Network Model DeepCS

The stage in which common-sense knowledge is processed is called DeepCS, a neural network designed to learn which knowledge best assists in the classification task. This network receives as INPUT_C a number X of common-sense sentences related to INPUT_A, and also receives INPUT_B. DeepCS returns a representation of the common-sense knowledge with the highest relation to INPUT_B, called MaxSC. With the result MaxSC given by the DeepCS network, it is possible to perform the processes that produce the target application's output; for example, in Fig. 2, the result of the multiplication had1 is added to the knowledge representation MaxSC, and further processing is performed until a classification is returned.

Fig. 4. The DeepCS architecture.

In Fig. 4, the architecture of the DeepCS neural network model is presented in detail. The model receives common-sense sentences that are processed by an LSTM, resulting in the output cs. Then, cs is multiplied by a matrix W, whose values are learned by the network, generating the output csw. In addition to the common-sense relations, the module also receives INPUT_B, which corresponds to the text to be classified. The csw and INPUT_B are multiplied, as shown in Eq. 2.

$$\begin{aligned} f(Input_b, csw) = csw*Input_b \end{aligned}$$
(2)

From the result of Eq. 2, the index of the highest value, called maxIndex, is extracted. Next, a Hadamard product [5] is applied between csw and INPUT_B, as shown in Eq. 3, generating the result had.

$$\begin{aligned} f(csw, Input_b) = csw \circ Input_b \end{aligned}$$
(3)

Finally, the strongest common-sense relation is returned as the value at position maxIndex of had, as shown in Eq. 4. This result is returned by the DeepCS model as MaxSC.

$$\begin{aligned} f(maxIndex, had) = had[maxIndex] \end{aligned}$$
(4)
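Taken together, Eqs. 2-4 amount to scoring each common-sense sentence against INPUT_B and returning the Hadamard-fused row with the highest score. A minimal NumPy sketch, with illustrative shapes and random values standing in for the learned encodings:

```python
import numpy as np

x_sentences, d = 5, 4                  # X sentences, encoding size (assumptions)
rng = np.random.default_rng(1)
csw = rng.standard_normal((x_sentences, d))  # encoded sentences after the W projection
input_b = rng.standard_normal(d)             # encoded text to be classified

scores = csw @ input_b                 # Eq. 2: csw * Input_b (one score per sentence)
max_index = int(np.argmax(scores))     # index of the highest score
had = csw * input_b                    # Eq. 3: Hadamard product, row-wise
max_sc = had[max_index]                # Eq. 4: MaxSC, the representation returned
```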

The DeepCS module is decoupled from the target application and can be reused in several natural language processing tasks, improving the representation and learning of common sense.

4 Experimental Evaluation

4.1 Target Application A - Stance Classification

With the ease of acquiring information over the Internet, it is becoming increasingly important to identify which news is true, in order to avoid the spread of rumors. Stance Classification is a subtask of rumor verification and consists of classifying a user response to a target post (a tweet) as a comment, a denial, a question, or a support. It is believed that analyzing the stance of a text in response to a target helps in the detection of a rumor. We use the dataset provided by SemEval-2019 Task 7 for Stance Classification. This dataset is made up of a news text, a user response to the news text, and a label that indicates whether the user response is a comment, denial, question, or support. The data was taken from the Twitter and Reddit platforms. Figure 5 presents an example of a tweet/reply pair for each of the four classes.

Fig. 5. Examples of the stance classification dataset.

Table 1 presents the statistics of the Stance Classification dataset. It is possible to notice an imbalance between the classes: comment represents more than 50% of the dataset for both training and testing. The common-sense knowledge used in this experiment was downloaded from ConceptNet [10]. With the vocabulary created from the inputs, at most three common-sense relations were retrieved from ConceptNet for each word, which returned 7,980 common-sense relations in total. For each target tweet given as INPUT_A, five common-sense relations were listed in H.

For comparison, two scenarios were tested for the Stance Classification problem. The first scenario uses an LSTM-based neural network model that receives only INPUT_A and INPUT_B as input; it does not use common-sense knowledge. In this scenario, the input words are replaced with 100-dimensional pre-trained GloVe word embeddings [7] and processed by an LSTM. The hyperparameter values for the LSTM layer are 128 units, tanh activation, recurrent dropout of 0.2, dropout of 0.2, and return sequences set to false. Then INPUT_A is multiplied by a matrix W learned by the network. Finally, the inputs are multiplied and passed to a softmax function that returns the model result. Training used a batch size of 3 and the ADAM optimizer. In the second scenario, the DeepCS module is applied by adding a new input for common-sense knowledge. For each input, five common-sense sentences are passed to the module. The common-sense relations are also replaced by 100-dimensional pre-trained GloVe word embeddings and, like the other inputs, processed by a 128-unit LSTM layer. This model was trained for 20 epochs with a batch size of 3.
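A hedged Keras sketch of the first scenario, using the hyperparameters quoted above. The vocabulary size, sequence length, and randomly initialized embeddings are assumptions (the paper uses 100-dimensional pre-trained GloVe vectors), and the shared LSTM and the Dense layer standing in for the learned matrix W are simplifications for brevity.

```python
import numpy as np
from tensorflow.keras import layers, Model

vocab_size, seq_len, emb_dim, n_classes = 1000, 30, 100, 4  # assumptions

input_a = layers.Input(shape=(seq_len,), name="INPUT_A")    # target text (token ids)
input_b = layers.Input(shape=(seq_len,), name="INPUT_B")    # text to be classified
embed = layers.Embedding(vocab_size, emb_dim)               # GloVe weights in the paper

# LSTM with the hyperparameters from the text; shared across inputs for brevity.
lstm = layers.LSTM(128, activation="tanh", dropout=0.2,
                   recurrent_dropout=0.2, return_sequences=False)
enc_a = layers.Dense(128, use_bias=False)(lstm(embed(input_a)))  # INPUT_A times W
enc_b = lstm(embed(input_b))

had1 = layers.Multiply()([enc_a, enc_b])                    # Hadamard product (Eq. 1)
out = layers.Dense(n_classes, activation="softmax")(had1)   # 4 stance classes

model = Model([input_a, input_b], out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Scenario 2 would add a third input for the five common-sense sentences, processed by the DeepCS module before the final classification.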

Table 1. Statistics of the Stance Classification Dataset.
Table 2. Experiments Results with the use of DeepCS module in Stance Classification application target.

Table 2 presents the results obtained in scenarios 1 and 2. Analyzing the values, it is possible to notice that scenario 2 (which uses the DeepCS module) obtained better results than scenario 1, with an increase of 0.02 in the F1-score metric. All classes presented better results in scenario 2, except the class Support; we argue that this class is very similar to the class Comment.

4.2 Target Application B – Chatbot in the Portuguese Language

Previous experiments, presented in [4], showed improved performance in the task of classifying the best answer in a Portuguese-language chatbot when the DeepCS module was applied to this target application. The application consists of a neural network that, given a question, aims to choose the best answer from a set of possible answers; this kind of chatbot is known as retrieval-based. The corpus consisted of dialogues from a chatbot in operation at Clinic SIM, a clinic that operates in the Brazilian Northeast. The corpus was organized with user input, system response, dialogue classification, and common-sense knowledge statements related to the user input. To train the network, both coherent examples, where the system response makes sense given the user input, and incoherent ones, where it does not, are necessary. The training set contained 12,544 examples, divided equally between coherent and incoherent; the test set contained 3,136 examples, half coherent and half incoherent. Table 3 presents the results for scenarios 1 and 2 of the dialogue system. Scenario 1 corresponds to tests made without the use of common-sense knowledge; in scenario 2, common-sense knowledge was added to the neural network model. The results show improvement in all the metrics used, most noticeably in the Recall metric.
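The paper does not detail how the incoherent half of the corpus was built; one standard way to balance coherent and incoherent examples is to re-pair each user input with a response drawn from a different dialogue. A hypothetical sketch of that negative-sampling procedure:

```python
import random

def make_incoherent(pairs, seed=42):
    """Re-pair each (question, response) with a response from another dialogue.

    Hypothetical negative-sampling helper; assumes len(pairs) >= 2.
    """
    rng = random.Random(seed)
    responses = [r for _, r in pairs]
    negatives = []
    for i, (question, _) in enumerate(pairs):
        j = rng.randrange(len(responses))
        while j == i:                     # avoid re-pairing the true response
            j = rng.randrange(len(responses))
        negatives.append((question, responses[j]))
    return negatives

pairs = [("When is the clinic open?", "From 8 am to 6 pm."),
         ("Do you accept my insurance?", "Yes, we accept most plans.")]
print(make_incoherent(pairs))             # mismatched question/response pairs
```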

Table 3. Experiments Results with the use of DeepCS module in Chatbot application target.

5 Conclusion

This paper proposed DeepCS, a neural network model for the use of common-sense knowledge in various Natural Language Processing tasks, called target applications. Despite the richness and vastness of common-sense knowledge bases, we argue that common-sense knowledge has to be integrated into target applications (Text Classification, Dialogue Systems, Information Extraction systems, etc.) more effectively. DeepCS is a module of common-sense knowledge in the Portuguese language that can be easily coupled to NLU systems to improve their performance. It is composed of an LSTM (Long Short-Term Memory) neural network that receives a text from the target application (for example, a user message in a dialogue, a response to a user tweet, or a news text) and selects and learns the best set of common-sense relations to return to the target application, to be considered in the target learning model or system.

In this research, the model was applied to the Stance Classification task, a rumor detection subtask that aims to classify a user reply to a social news post as comment, question, denial, or support. Two scenarios were created for comparison: the first did not use the DeepCS model, receiving only the news post and the reply message, while the second also received common-sense knowledge as input. The use of the DeepCS model as a source of extra information showed a slight performance improvement in the F1-score metric. The DeepCS model was also used in a second target application, a chatbot in the Portuguese language, where the result in scenario 2 was 2% better.

As future work, it would be interesting to analyze the performance of the DeepCS model in other tasks, for example, tasks that use a single input. Another important direction is to test different parameters and architectures for the neural network model. Regarding common-sense knowledge, using different common-sense bases, such as InferenceNet [8], and varying the parameters of the knowledge preprocessing are analyses that can influence the performance of the DeepCS module.