
1 Introduction

Several Natural Language Understanding (NLU) systems can be improved by world knowledge. [11] argue that people respond to each other's utterances in a meaningful way not only by paying attention to the latest utterance of the conversational partner, but also by recalling relevant information about the concepts covered in the dialogue and integrating it into their responses. In this sense, they propose end-to-end dialogue systems augmented with common-sense knowledge, integrated in the form of external memory. [9] define a method that aims to increase the accuracy of traditional keyphrase extraction systems by expanding the training set with not-in-text terms obtained from an inference process over world knowledge models. As a final result, the method overcomes the limitation of being unable to identify keyphrases that do not appear in the text and/or in the corpus.

In the context of Artificial Intelligence (AI), world knowledge is commonly referred to as common-sense knowledge: the set of background information that an individual is expected to know or assume, and the ability to use it when appropriate [1, 2, 6]. Several common-sense knowledge bases have been constructed over the past decade, such as ConceptNet [10] and SenticNet [3]. For Portuguese, we highlight InferenceNet [8], which contains the inferential content of concepts, defined and agreed upon in a community or area of knowledge. For instance, when we read the news "João murdered his wife by shooting her to death after an argument on Solon Pinheiro Street", we are able to refute an assertion that the type of weapon used in the crime was a "cold weapon", because we, users of natural language, know the conditions in which the concepts "to shoot" and "to murder" can be used. Another motivating example is illustrated in Fig. 1, where the piece of knowledge "Computer is used to watch movies" is required to generate the best answer in the dialogue.

Fig. 1. Dialogue examples and common-sense representation.

Despite the richness and vastness of common-sense knowledge bases, we argue that common-sense knowledge has to be integrated into target applications (Text Classification, Dialogue Systems, Information Extraction systems, etc.) more effectively. The number of common-sense relations (triples of the form (arg1; semantic relation; arg2)) is huge, and they are spread out across networks, making it difficult to choose which pieces of knowledge are, in fact, relevant. In this work, our goal is to learn which set of common-sense relations best fits the target application by developing an external memory module based on Deep Learning techniques, rather than forcing the system to encode the common-sense relations in model parameters, as in traditional methods.

To make this common-sense knowledge available to target applications, we propose a deep learning model of common-sense knowledge in the Portuguese language that can be easily coupled to NLU systems to improve their performance. More specifically, the model is composed of an LSTM (Long Short-Term Memory) neural network that receives a text from the target application (for example, a user message in a dialogue, a response to a user tweet, or a news text) and selects and learns the best set of common-sense relations to return to the target application, to be considered in the target learning model or system. We implemented the common-sense learning module in two target applications: a Stance Classification system and an end-to-end Dialogue system. In both cases, incorporating the deep learning model improved the results.

2 Background Knowledge

2.1 World Knowledge in NLU Systems

In [11], a general-theme chatbot was developed that uses common-sense knowledge as external information. The authors used a Tri-LSTM encoder with an additional LSTM layer to process the common-sense knowledge, a dataset of 1.4M dialogue pairs from Twitter, and ConceptNet [10] as the common-sense database. In the experimental evaluation, N = 10 candidate responses were passed to the system, of which one was the correct (positive) response and the others were negative. The result achieved for Recall@1 was 77.5%. In [9], the authors proposed to improve the performance of the keyphrase extraction task by expanding the training data with not-in-text terms obtained through an inference process over common-sense knowledge bases. They argue that even words that are not present in the text can be related to it and possibly chosen as keyphrases. The achieved results show a performance improvement of 5% on average for keyphrase extraction.

2.2 Common Sense Knowledge Bases

Two existing common-sense knowledge bases are ConceptNet [10] and InferenceNet [8]. These bases provide world knowledge to Artificial Intelligence applications. InferenceNet [8] provides semantic-inferentialist knowledge for the Portuguese language, with 186,047 concepts related through 842,392 relationships in the format rel(c1, c2). ConceptNet [10] is a knowledge graph that represents relations between words/phrases as assertions; for example, "a dog has a tail" can be represented as (dog, HasA, tail). ConceptNet contains 21 million edges and over 8 million nodes; 83 languages contain at least 10,000 nodes each, and the vocabulary size for Portuguese is 473,709.
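To make the triple format concrete, the sketch below renders (arg1, relation, arg2) assertions as the short natural-language sentences that a knowledge-augmented model can consume. The relation templates are illustrative stand-ins, not ConceptNet's official surface forms.

```python
# Hypothetical sketch: turning ConceptNet-style triples into short sentences.
# The templates below are assumptions for illustration only.
TEMPLATES = {
    "HasA": "{0} has {1}",
    "IsA": "{0} is a {1}",
    "UsedFor": "{0} is used for {1}",
}

def triple_to_sentence(arg1, relation, arg2):
    """Render a (arg1, relation, arg2) assertion as a plain sentence."""
    template = TEMPLATES.get(relation, "{0} " + relation + " {1}")
    return template.format(arg1, arg2)

print(triple_to_sentence("dog", "HasA", "tail"))  # dog has tail
```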

3 A Deep Learning Model of Common Sense Knowledge

In this paper, we propose a deep learning model, the DeepCS model, that assists the use of common-sense knowledge in NLU tasks. Common-sense knowledge bases are vast and rich, and deciding which knowledge to consider in the application is a challenge. Thus, the proposed model retrieves a set of common-sense knowledge from knowledge bases (CSKBs) such as ConceptNet, InferenceNet, and SenticNet, and learns the best combination of relations that can contribute to the target application. Figure 2 presents the general architecture of DeepCS. A target application sends texts from its training dataset - INPUT_A (a question or a tweet) to the common-sense pre-processing module and INPUT_B (the response or a tweet reply) to the DeepCS module - and receives a word vector with the best common-sense sentence. In this architecture, INPUT_A represents the target text and INPUT_B represents the text to be classified; both come from the dataset available for training. INPUT_C represents X common-sense sentences related to INPUT_A. As shown in Fig. 2, pre-trained word embeddings, such as GloVe embeddings [7], are applied to the inputs; that is, each word found in the inputs is replaced by the numeric vector that represents it. INPUT_A and INPUT_B are processed by a neural network architecture that generates their representations. INPUT_A is then multiplied by a matrix W learned by the network, and a Hadamard product is applied between the result and INPUT_B, as shown in Eq. 1. The result, had1, is later combined with the common-sense knowledge returned by DeepCS.

$$\begin{aligned} f(input_a, input_b) = (input_a * W) \circ input_b \end{aligned}$$
(1)
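Eq. 1 can be sketched in a few lines of NumPy: the encoded INPUT_A is projected by the learned matrix W, then fused with the encoded INPUT_B through an element-wise (Hadamard) product. The dimensions and random values below are illustrative assumptions; in the model, W is learned during training.

```python
import numpy as np

d = 4                                  # encoding size (assumption)
rng = np.random.default_rng(0)

input_a = rng.standard_normal(d)       # encoded target text
input_b = rng.standard_normal(d)       # encoded text to be classified
W = rng.standard_normal((d, d))        # learned during training; random here

# Eq. 1: had1 = (input_a * W) o input_b
had1 = (input_a @ W) * input_b         # matrix product, then Hadamard product
```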
Fig. 2. General architecture of the Deep Learning Model of common-sense knowledge.

Pre-processing and Selection of the Common Sense knowledge

As shown in Fig. 3, this module receives INPUT_A and performs tokenization, stop-word and number removal, word lemmatization, and the creation of a vocabulary V. Then, for each word in V, up to N common-sense relations can be returned from the CSKB. These relations compose a list H of common-sense sentences that can be used in the current task. The next step is to relate each example of INPUT_A to a set of sentences listed in H: X common-sense sentences can be chosen for each INPUT_A if they share a word with it. For example, "[[The police]] can [[catch a thief]]" could be related to the INPUT_A example "French police publishes photos of suspect in yesterday's Montrouge shooting. Maybe the same people in Kosher market http://t.co/j5nQIl4Ytu", because both the common-sense relation and the INPUT_A contain the word "police". Finally, the selected common-sense sentences also go through a preprocessing step that performs tokenization, stop-word and number removal, and word lemmatization, and any new words are added to the vocabulary V.
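The selection step above can be sketched as a word-overlap match between each INPUT_A and the list H. The stop-word list is a minimal stand-in and lemmatization is omitted for brevity; these are simplifications of the full pipeline.

```python
# Illustrative sketch of the preprocessing/selection step: build a vocabulary
# from INPUT_A, then keep up to X sentences from H that share a word with it.
STOP_WORDS = {"the", "of", "in", "a", "can"}  # simplified stand-in list

def tokenize(text):
    return [w.lower().strip(".,!?") for w in text.split()]

def preprocess(text):
    """Tokenize, drop stop words and numbers (lemmatization omitted here)."""
    return [t for t in tokenize(text) if t not in STOP_WORDS and not t.isdigit()]

def select_common_sense(input_a, h_list, x=5):
    """Return up to x sentences from H sharing at least one word with INPUT_A."""
    vocab = set(preprocess(input_a))
    related = [s for s in h_list if vocab & set(preprocess(s))]
    return related[:x]

h = ["The police can catch a thief", "A computer is used to watch movies"]
tweet = "French police publishes photos of suspect"
print(select_common_sense(tweet, h))   # only the "police" sentence matches
```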

Fig. 3. Pipeline of the preprocessing and common-sense extraction step.

Processing and Learning of Common Sense Knowledge by the Neural Network Model DeepCS

The stage in which common-sense knowledge is processed is called DeepCS, a neural network designed to learn which knowledge best assists in the classification task. This network receives as INPUT_C a number X of common-sense sentences related to INPUT_A, and also receives INPUT_B. DeepCS returns a representation of the common-sense knowledge with the highest relation to INPUT_B, called MaxSC. With the result MaxSC given by the DeepCS network, it is possible to perform the processes that produce the target application's output; for example, in Fig. 2, the result of the multiplication had1 is added to the knowledge representation MaxSC, and further processing is performed until a classification is returned.

Fig. 4. The DeepCS architecture.

In Fig. 4, the architecture of the DeepCS neural network model is presented in detail. The model receives common-sense sentences that are processed by an LSTM, resulting in the output cs. Then, cs is multiplied by a matrix W, whose values are learned by the network, generating the output csw. In addition to the common-sense relations, the module also receives INPUT_B, which corresponds to the text to be classified. The csw and INPUT_B are multiplied, as shown in Eq. 2.

$$\begin{aligned} f(Input_b, csw) = csw*Input_b \end{aligned}$$
(2)

From the result of Eq. 2, the index of the highest value, called maxIndex, is extracted. Next, a Hadamard product [5] is applied between csw and INPUT_B, as shown in Eq. 3, generating the result had.

$$\begin{aligned} f(csw, Input_b) = csw \circ Input_b \end{aligned}$$
(3)

Finally, the strongest common-sense relation is returned as the value at position maxIndex of had, as shown in Eq. 4. This result is returned by the DeepCS model as MaxSC.

$$\begin{aligned} f(maxIndex, had) = had[maxIndex] \end{aligned}$$
(4)
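Taken together, Eqs. 2-4 amount to scoring each common-sense sentence against INPUT_B and returning the Hadamard-fused row with the highest score. A minimal NumPy sketch, with illustrative shapes and random values standing in for the learned encodings:

```python
import numpy as np

x_sentences, d = 5, 4                  # X sentences, encoding size (assumptions)
rng = np.random.default_rng(1)
csw = rng.standard_normal((x_sentences, d))  # encoded sentences after the W projection
input_b = rng.standard_normal(d)             # encoded text to be classified

scores = csw @ input_b                 # Eq. 2: csw * Input_b (one score per sentence)
max_index = int(np.argmax(scores))     # index of the highest score
had = csw * input_b                    # Eq. 3: Hadamard product, row-wise
max_sc = had[max_index]                # Eq. 4: MaxSC, the representation returned
```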

The DeepCS module is decoupled from the target application and can be reused in several natural language processing tasks, improving the representation and learning of common sense.

4 Experimental Evaluation

4.1 Target Application A - Stance Classification

With the ease of acquiring information over the Internet, it is becoming increasingly important to identify which news is true, in order to avoid the spread of rumors. Stance Classification is a subtask of rumor verification and consists of classifying a user response to a target post (a tweet) as a comment, a denial, a question, or a support. It is believed that analyzing the stance of a text in response to a target helps in the detection of a rumor. We use the dataset provided by SemEval-2019 Task 7 for Stance Classification. This dataset is made up of a news text, a user response to the news text, and a label that indicates whether the user response is a comment, denial, question, or support. The data was taken from the Twitter and Reddit platforms. Figure 5 presents an example of a tweet/reply pair for each of the four classes.

Fig. 5. Examples of the stance classification dataset.

Table 1 presents the statistics of the Stance Classification dataset. It is possible to notice an imbalance between the classes: comment represents more than 50% of the dataset for both training and testing. The common-sense knowledge used in this experiment was downloaded from ConceptNet [10]. With the vocabulary created from the inputs, at most three common-sense relations were retrieved from ConceptNet for each word, which returned 7,980 common-sense relations in total. For each target tweet given as INPUT_A, five common-sense relations were listed in H.

For comparison, two scenarios were tested for the Stance Classification problem. The first scenario uses an LSTM-based neural network model that receives only INPUT_A and INPUT_B as input; it does not use common-sense knowledge. In this scenario, the input words are replaced with 100-dimensional pre-trained GloVe word embeddings [7] and processed by an LSTM. The hyperparameter values for the LSTM layer are 128 units, tanh activation, recurrent dropout of 0.2, dropout of 0.2, and return sequences set to false. Then INPUT_A is multiplied by a matrix W learned by the network. Finally, the inputs are multiplied and passed to a softmax function that returns the model result. Training used a batch size of 3 and the ADAM optimizer. In the second scenario, the DeepCS module is applied by adding a new input for common-sense knowledge. For each input, five common-sense sentences are passed to the module. The common-sense relations are also replaced by 100-dimensional pre-trained GloVe word embeddings and, like the other inputs, processed by a 128-unit LSTM layer. This model was trained for 20 epochs with a batch size of 3.
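A hedged Keras sketch of the first scenario, using the hyperparameters quoted above. The vocabulary size, sequence length, and randomly initialized embeddings are assumptions (the paper uses 100-dimensional pre-trained GloVe vectors), and the shared LSTM and the Dense layer standing in for the learned matrix W are simplifications for brevity.

```python
import numpy as np
from tensorflow.keras import layers, Model

vocab_size, seq_len, emb_dim, n_classes = 1000, 30, 100, 4  # assumptions

input_a = layers.Input(shape=(seq_len,), name="INPUT_A")    # target text (token ids)
input_b = layers.Input(shape=(seq_len,), name="INPUT_B")    # text to be classified
embed = layers.Embedding(vocab_size, emb_dim)               # GloVe weights in the paper

# LSTM with the hyperparameters from the text; shared across inputs for brevity.
lstm = layers.LSTM(128, activation="tanh", dropout=0.2,
                   recurrent_dropout=0.2, return_sequences=False)
enc_a = layers.Dense(128, use_bias=False)(lstm(embed(input_a)))  # INPUT_A times W
enc_b = lstm(embed(input_b))

had1 = layers.Multiply()([enc_a, enc_b])                    # Hadamard product (Eq. 1)
out = layers.Dense(n_classes, activation="softmax")(had1)   # 4 stance classes

model = Model([input_a, input_b], out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Scenario 2 would add a third input for the five common-sense sentences, processed by the DeepCS module before the final classification.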

Table 1. Statistics of the Stance Classification Dataset.
Table 2. Experiments Results with the use of DeepCS module in Stance Classification application target.

Table 2 presents the results obtained in scenarios 1 and 2. Analyzing the values, it is possible to notice that scenario 2 (which uses the DeepCS module) obtained better results than scenario 1, with an increase of 0.02 in the F1-score metric. All classes presented better results in scenario 2, except the class Support; we argue that this class is very similar to the class Comment.

4.2 Target Application B – Chatbot in the Portuguese Language

Previous experiments, presented in [4], showed improved performance in the task of classifying the best answer in a Portuguese-language chatbot when the DeepCS module was applied to this target application. The application consists of a neural network that, given a question, aims to choose the best answer from a set of possible answers; this kind of chatbot is known as retrieval-based. The corpus consisted of dialogues from a chatbot in operation at Clinic SIM, a clinic that operates in the Brazilian Northeast. The corpus was organized with user input, system response, dialogue classification, and common-sense knowledge statements related to the user input. To train the network, both coherent examples, where the system response makes sense given the user input, and incoherent ones, where it does not, are necessary. The training set contained 12,544 examples, divided equally between coherent and incoherent; the test set contained 3,136 examples, half coherent and half incoherent. Table 3 presents the results for scenarios 1 and 2 of the dialogue system. Scenario 1 corresponds to tests made without the use of common-sense knowledge; in scenario 2, common-sense knowledge was added to the neural network model. The results show improvement in all the metrics used, most noticeably in the Recall metric.
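The paper does not detail how the incoherent half of the corpus was built; one standard way to balance coherent and incoherent examples is to re-pair each user input with a response drawn from a different dialogue. A hypothetical sketch of that negative-sampling procedure:

```python
import random

def make_incoherent(pairs, seed=42):
    """Re-pair each (question, response) with a response from another dialogue.

    Hypothetical negative-sampling helper; assumes len(pairs) >= 2.
    """
    rng = random.Random(seed)
    responses = [r for _, r in pairs]
    negatives = []
    for i, (question, _) in enumerate(pairs):
        j = rng.randrange(len(responses))
        while j == i:                     # avoid re-pairing the true response
            j = rng.randrange(len(responses))
        negatives.append((question, responses[j]))
    return negatives

pairs = [("When is the clinic open?", "From 8 am to 6 pm."),
         ("Do you accept my insurance?", "Yes, we accept most plans.")]
print(make_incoherent(pairs))             # mismatched question/response pairs
```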

Table 3. Experiments Results with the use of DeepCS module in Chatbot application target.

5 Conclusion

This paper proposed DeepCS, a neural network model for the use of common-sense knowledge in various Natural Language Processing tasks, called target applications. Despite the richness and vastness of common-sense knowledge bases, we argue that common-sense knowledge has to be integrated into target applications (Text Classification, Dialogue Systems, Information Extraction systems, etc.) more effectively. DeepCS is a module of common-sense knowledge in the Portuguese language that can be easily coupled to NLU systems to improve their performance. It is composed of an LSTM (Long Short-Term Memory) neural network that receives a text from the target application (for example, a user message in a dialogue, a response to a user tweet, or a news text) and selects and learns the best set of common-sense relations to return to the target application, to be considered in the target learning model or system.

In this research, the model was applied to the Stance Classification task, a rumor detection subtask that aims to classify a user reply to a social news post as comment, question, denial, or support. Two scenarios were created for comparison: the first did not use the DeepCS model, receiving only the news post and the reply message, while the second also received common-sense knowledge as input. The use of the DeepCS model as a source of extra information showed a slight performance improvement in the F1-score metric. The DeepCS model was also used in a second target application, a chatbot in the Portuguese language, where the result in scenario 2 was 2% better.

As future work, it would be interesting to analyze the performance of the DeepCS model in other tasks, for example, tasks that use a single input. Another important direction is to test different parameters and architectures for the neural network model. Regarding common-sense knowledge, using different common-sense bases, such as InferenceNet [8], and varying the parameters of the knowledge preprocessing are analyses that can influence the performance of the DeepCS module.