1 Introduction

A recommender system is an information filtering technology that can be used to predict ratings for items (like products, services, etc.) and/or generate a custom item ranking that may be of interest to the user [1]. In their traditional form, recommender systems consider only the items that users have accessed, bought or evaluated positively, thus ignoring any other information that might enrich the recommendation process. One type of information that may enrich the process is contextual information. For example, when recommending a restaurant to a user, the system may consider the context “Day of the Week”. At weekends the user may prefer snack bars while on other days he/she may prefer less caloric meals.

Context-aware recommender systems address this limitation: unlike traditional systems, they also consider contextual information when generating the set of recommendations. The term “context” may assume different definitions depending on the area in which it is used. In the area of recommender systems, the most widely used definition, which we adopt in our work, was proposed by Dey [8]. According to this author, “Context is any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves.”

Many authors have been working on new context-aware recommender algorithms. However, there is a lack of automatic techniques for extracting context. With the emergence of Web 2.0, users have enriched sites with contextual information through texts in social networks, comments and, mainly, reviews. These reviews are usually textual comments in which users explain why they liked or disliked an item based on their own experiences. According to Chen et al. [7], incorporating important information extracted from reviews can benefit recommender systems by addressing the data sparsity and cold-start problems. Reviews can therefore provide relevant information for recommender systems, including contextual information.

Thus, in this work, we propose a context-aware recommender method based on text mining (CARM-TM) that includes two context extraction techniques: (1) CIET.5\(_{embed}\), a technique based on word embeddings; and (2) RulesContext, a technique based on association rules. CARM-TM makes use of context by running the CAMF algorithm, a context-aware recommender based on matrix factorization. To evaluate our method, we compare it against the MF algorithm, an uncontextual recommender system based on matrix factorization. The evaluation was conducted in a dataset of reviews from Yelp and showed that our method provided better results than the MF algorithm in most cases.

This paper is structured as follows: in Sect. 2, we describe related work on context extraction based on text mining. In Sect. 3, we present our proposal, a context-aware recommender method based on text mining (CARM-TM). We evaluate it and discuss the main findings in Sect. 4. Finally, in Sect. 5, we present our conclusions and future work.

2 Related Work

In this section, we present some related works that extract context from reviews or from other textual sources.

Li et al. [14] investigated available restaurant reviews and four types of contextual information for a meal. They developed algorithms with existing natural language processing tools to extract these types of contextual information from restaurant reviews. Hariri et al. [9] obtained contextual information by mining hotel reviews written by users. Their approach uses a classifier that is trained on sample descriptions and their corresponding contexts.

In [3], Bauman and Tuzhilin presented a method to find relevant contextual information from reviews of users. In this method, the reviews are classified as “specifics” and “generics”, and the context is extracted from the specific reviews by using two methods: “word-based” and “LDA-based”. Chen and Chen [6] extracted contexts employing a keyword matching method.

Kim et al. [10] presented a recommendation system model called Convolutional Matrix Factorization (ConvMF). The model integrates convolutional neural networks into probabilistic matrix factorization in order to capture contextual information (adjacent words) of the documents. Sulthana and Ramasamy [15] proposed an Ontology and Context Based Recommendation System for the book domain that uses a Neuro-Fuzzy Classification approach.

In [16], we proposed CIET.5\(_{embed}\), a textual context extraction technique based on a word embedding model that was used with neighborhood-based contextual recommender systems. This technique is implemented in our CARM-TM, so it is detailed in Sect. 3.4. In addition, in this paper, we propose for the CARM-TM the RulesContext technique, which extracts association rules from user reviews and transforms them into contextual information to be used in recommender systems. In the next section, we present our context-aware recommender method (CARM-TM).

3 Context-Aware Recommender Method Based on Text Mining (CARM-TM)

In this work, we propose the CARM-TM, a context-aware recommender method that uses text mining techniques to extract contextual information from reviews to make recommendations. The CARM-TM, illustrated in Fig. 1, has 5 steps which are explained in the next subsections.

Fig. 1. Overview of the Context-Aware Recommender Method based on Text Mining (CARM-TM).

3.1 Step 1 - Preprocessing

The input of our method is a dataset in which each row contains a user identification, the identification of the item evaluated by the user together with the evaluation value, a textual content containing reviews/opinions about the item, and the date when the evaluation was made.

Step 1 is responsible for preparing the dataset for both the context extraction and the recommendation steps. In this step, the data are filtered to exclude entries without textual content or lacking other important information, such as the user or item identification. In addition, less relevant users, items and reviews are excluded by using the exclusion criteria in [6]: (1) users with 1 review; (2) items with fewer than 15 reviews; and (3) reviews with fewer than 3 sentences. Besides filtering, we also create a file for each review.
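As an illustration, the filtering described above can be sketched as follows. This is a minimal sketch, not the implementation used in this work: the field names (`user`, `item`, `text`), the regular expression used to count sentences, and the order in which the three criteria are applied (a single pass, short reviews first) are all assumptions.

```python
import re
from collections import Counter


def count_sentences(text):
    # Rough sentence count: split on ., ! or ? followed by whitespace or end of text.
    parts = re.split(r"[.!?]+(?:\s+|$)", text)
    return len([p for p in parts if p.strip()])


def filter_reviews(rows, min_user_reviews=2, min_item_reviews=15, min_sentences=3):
    """rows: list of dicts with the (hypothetical) keys user, item, rating, text, date."""
    # Exclude rows without text or without user/item identification.
    rows = [r for r in rows
            if r.get("user") and r.get("item") and (r.get("text") or "").strip()]
    # Criterion (3): exclude reviews with fewer than `min_sentences` sentences.
    rows = [r for r in rows if count_sentences(r["text"]) >= min_sentences]
    # Criteria (1) and (2): count the remaining reviews per user and per item.
    user_counts = Counter(r["user"] for r in rows)
    item_counts = Counter(r["item"] for r in rows)
    return [r for r in rows
            if user_counts[r["user"]] >= min_user_reviews
            and item_counts[r["item"]] >= min_item_reviews]
```

Because removing users can push an item below its threshold (and vice versa), a stricter variant would repeat the last step until no more rows are removed; the text does not state which variant was used.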

3.2 Step 2 - Cleaning

In step 2, the textual content is cleaned in order to eliminate special characters such as @, \(*\), \(\#\) and &. These characters may negatively influence the context extraction process. The cleaned texts can then either pass through a normalizer in step 3 or be used directly by the context extraction technique (step 4).
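A minimal sketch of such a cleaning step is shown below. It assumes that only the four characters named above are removed and that each one is replaced by a space; the complete list of characters handled in this work is not given, so `SPECIAL_CHARS` is illustrative.

```python
import re

# Only the characters named in the text; the full list is an assumption.
SPECIAL_CHARS = re.compile(r"[@*#&]")


def clean_text(text):
    # Replace each special character with a space, then collapse repeated whitespace.
    without_specials = SPECIAL_CHARS.sub(" ", text)
    return re.sub(r"\s+", " ", without_specials).strip()
```

For example, `clean_text("Great #food @ Joe's & friends*")` returns `"Great food Joe's friends"`.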

3.3 Step 3 - Normalization

Normalization is optional and aims to solve problems commonly encountered in texts written by users, such as typos, spelling mistakes and abbreviations. In this work, we used the TextExpansion tool to normalize the texts.

3.4 Step 4 - Context Extraction

The main step of our method is the fourth step, which consists of extracting contextual information from reviews. Here, we can adopt different text mining techniques. For this work, we use the CIET.5\(_{{embed}}\) technique, proposed by us in [16]. In addition, we also propose the RulesContext, a new technique for context extraction that extracts association rules from reviews to be used as contextual information in recommender systems. Both techniques are detailed in the following subsections.

Contextual Information Extraction Technique Based on Word Embeddings (CIET.5\(_{{\varvec{embed}}}\)). Proposed by us in [16], this technique combines two types of representation (bag of words and a word embedding model), which makes it possible to increase the volume and quality of information, the latent relationships among terms in the documents, and the interpretability of the generated text representations. The CIET.5\(_{embed}\) technique is composed of five complementary steps (Fig. 2), which aim to transform a set of text documents into a set of contextualized documents.

Fig. 2. Overview of the Context Extraction Technique CIET.5\(_{embed}\). Adapted from Sundermann et al. [16].

In Fig. 2, the Preparation step submits all documents to textual enrichment, which consists of named entity and concept recognition. In the Delimitation step, the prepared documents are processed to delimit textual scopes, such as paragraphs and sentences. In the Modeling step, a language model based on word embeddings, previously trained on an external source of documents, is retrained with the internal documents. Then, in the Contextualization step, we identify contexts in the internal documents: the terms of each sentence are processed by the language model to find their most related terms, using, for example, the cosine measure. Finally, in the Extraction step, the contexts are extracted from the documents by using a comparative threshold.
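To make the use of the cosine measure and of a threshold more concrete, the following minimal sketch ranks vocabulary terms by their cosine similarity to a given term and keeps those at or above a threshold. The toy vectors, the function names and the `top_n` parameter are illustrative assumptions; the actual CIET.5\(_{embed}\) pipeline is described in [16] and is not reproduced here.

```python
import math


def cosine(u, v):
    # Cosine similarity between two equally sized vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def related_terms(term, embeddings, threshold, top_n=4):
    """Up to `top_n` terms whose similarity to `term` is >= threshold, best first."""
    target = embeddings[term]
    scored = [(other, cosine(target, vec))
              for other, vec in embeddings.items() if other != term]
    scored = [(t, s) for t, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy three-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "dinner": [0.9, 0.1, 0.0],
    "evening": [0.8, 0.2, 0.1],
    "breakfast": [0.1, 0.9, 0.0],
    "parking": [0.0, 0.1, 0.9],
}

candidates = related_terms("dinner", toy_embeddings, threshold=0.5)
```

With these toy vectors, only "evening" is similar enough to "dinner" to pass a threshold of 0.5.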

Contextual Information Extraction Technique Based on Association Rules (RulesContext). Proposed in this work, RulesContext is a technique that extracts association rules from reviews and transforms them into contextual information to be used in context-aware recommender systems.

Association rules are widely used in the literature to find correlations among items in a given database [2]. An association rule is presented in the format \(LHS \rightarrow RHS\), where LHS stands for left-hand side and RHS for right-hand side. Both sides contain sets of items, such that \(LHS \cap RHS\) = \(\emptyset \).

The RulesContext technique is executed in four steps, as illustrated in Fig. 3. In the first step (Separation by Item), the texts are separated by item, i.e. subsets of texts are grouped for each item that can be recommended. Each subset is composed of the texts of the reviews about the item.

In the second step (Preprocessing and Preparation), the texts are preprocessed, i.e. the stopwords are removed and the terms are stemmed. Besides, each subset of texts is transformed into a transaction.

Fig. 3. Overview of the RulesContext technique.

In the third step (Extraction of Association Rules), we extract the association rules from each subset. To extract the rules, we use the Apriori algorithm [2]. This algorithm extracts the rules in two steps, by combining the items in a given dataset and calculating the support and confidence measures for each rule. After extracting the rules, we use the mutual information (MI) measure to evaluate them. MI measures how dependent the items in the LHS and the RHS are. It is presented in Eq. 1, where the lift of a rule is its confidence divided by the support of its RHS.

$$\begin{aligned} MI(LHS \rightarrow RHS) = Support(LHS \cup RHS) \cdot \log (Lift(LHS \rightarrow RHS)). \end{aligned}$$
(1)
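As a worked example of Eq. 1, the sketch below computes support, confidence, lift and MI for a single rule over a handful of toy transactions. The transactions are invented for illustration, the natural logarithm is an assumption (the base is not stated), and the function names are hypothetical; the rule mining itself (Apriori) is not shown.

```python
import math


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def rule_measures(lhs, rhs, transactions):
    """Support, confidence, lift and MI (Eq. 1) for the rule lhs -> rhs."""
    sup_lhs = support(lhs, transactions)
    sup_rhs = support(rhs, transactions)
    sup_rule = support(set(lhs) | set(rhs), transactions)
    if sup_rule == 0:
        # The rule never occurs; treat every measure as zero (0 * log 0 taken as 0).
        return {"support": 0.0, "confidence": 0.0, "lift": 0.0, "mi": 0.0}
    confidence = sup_rule / sup_lhs
    lift = confidence / sup_rhs
    mi = sup_rule * math.log(lift)  # natural logarithm: an assumption
    return {"support": sup_rule, "confidence": confidence, "lift": lift, "mi": mi}


# Toy transactions: each one stands for the terms of one preprocessed text.
toy_transactions = [
    {"friday", "night", "dinner"},
    {"friday", "night"},
    {"sunday", "brunch"},
    {"friday", "dinner"},
]

measures = rule_measures({"friday"}, {"night"}, toy_transactions)
```

Here the rule {friday} → {night} has support 2/4, confidence 2/3 and lift 4/3, so its MI is 0.5 · ln(4/3), a positive value indicating that the two sides co-occur more often than expected under independence.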

After extracting the association rules, in the fourth step (Rules-into-Context Transformation), the rules are transformed into contextual information, as illustrated in Fig. 3.

3.5 Step 5 - Context-Aware Recommender Systems

In the fifth step of the CARM-TM, the contextual dataset generated by one of the previous context extraction techniques is used as input by a context-aware recommender system, together with the user and item data obtained in the Preprocessing step. Contextual information is considered for recommendation according to the type of recommender system that is being used by the method. Latent factor models seek hidden features or patterns in the training data, also called factors, which are used to make the recommendations. Some of the most successful latent factor models are based on matrix factorization techniques, such as the one presented in [11], which combines good accuracy and scalability. For this reason, in this paper, we use the matrix factorization algorithm (MF) [12] as baseline and the context-aware matrix factorization algorithm (CAMF) [1] as context-aware system.

According to Aggarwal [1], the recommendation training data consists of ratings given by users to sets of items, which are organized into the ratings matrix R. This matrix, given m users and n items, is of size \(m \times n\) and the entry \(r_{x_i,y_j}\) corresponds to the rating given by user \(x_i\) to item \(y_j\). The main purpose of matrix factorization is to decompose this matrix, R, into two approximate smaller matrices, X and Y, seeking to find k latent factors, which are hidden features or patterns in the training data, for the m users and the n items, respectively.

Considering that our purpose is to predict the unknown ratings in the matrix R, it is possible to use the inferred matrices X and Y to compute an approximate rating prediction. The predicted rating \(\hat{r}_{x_i, y_j}\) is given by the dot product of the user-factors vector and the item-factors vector, as shown in Eq. 2, such that \({\varvec{x}}_{i}\) corresponds to the factors inferred for user \(x_i\) and \({\varvec{y}}_{j}\) to the factors inferred for item \(y_j\).

$$\begin{aligned} \hat{r}_{x_i,y_j} = {\varvec{x}}_{i} \cdot ({\varvec{y}}_{j})^T \end{aligned}$$
(2)

In order to obtain the factors vectors, the system should minimize Eq. 3, by using the training data (set S) and some optimization algorithm, such as Stochastic Gradient Descent [4] or Alternating Least Squares [13]. Regardless of the chosen algorithm, the parameters k and \(\lambda \) must be optimized. The first parameter corresponds to the number of latent features used to model the recommendation data. It is responsible for making the model simpler or more complex, depending on how much complexity is needed to capture all of the latent dimensions of the input data. The second parameter (\(\lambda \)) is used to weight the regularization constraint, in order to prevent overfitting. This algorithm is usually called MF, i.e. matrix factorization algorithm.

$$\begin{aligned} J = \sum _{(x_i,y_j) \in S} (r_{x_i,y_j} - {\varvec{x}}_{i} \cdot ({\varvec{y}}_{j})^T)^2 + \lambda (\Vert {\varvec{x}}_{i}\Vert ^2 + \Vert {\varvec{y}}_{j}\Vert ^2) \end{aligned}$$
(3)
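Equations 2 and 3 can be illustrated with the following minimal sketch, which fits the factor matrices X and Y with plain Stochastic Gradient Descent. It is not the implementation evaluated in this work: the learning rate, the number of epochs, the random initialization and the toy ratings are assumptions, and the regularization term is applied once per observed rating.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def objective(ratings, X, Y, lam):
    # Eq. 3: squared error plus regularization, accumulated over the observed ratings.
    return sum((r - dot(X[u], Y[i])) ** 2 + lam * (dot(X[u], X[u]) + dot(Y[i], Y[i]))
               for u, i, r in ratings)


def train_mf(ratings, n_users, n_items, k=5, lam=0.1, lr=0.01, epochs=200, seed=0):
    """ratings: list of (user_index, item_index, rating) triples (the set S)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Y = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(X[u], Y[i])  # Eq. 2 gives the prediction
            for f in range(k):
                xu, yi = X[u][f], Y[i][f]
                # Gradient step on one term of Eq. 3 (the factor 2 is folded into lr).
                X[u][f] += lr * (err * yi - lam * xu)
                Y[i][f] += lr * (err * xu - lam * yi)
    return X, Y


# Toy data: 3 users, 2 items, 5 observed ratings (invented for illustration).
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0), (2, 1, 2.0)]
X, Y = train_mf(toy_ratings, n_users=3, n_items=2, k=2, lam=0.01, epochs=500)
predicted = dot(X[2], Y[0])  # rating of user 2 for item 0, absent from the toy data
```

Increasing `k` lets the model capture more latent dimensions, while increasing `lam` shrinks the factor vectors, which is the trade-off tuned in the experiments.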

Matrix factorization techniques are not exclusive to traditional recommender systems. Aggarwal [1] describes a method based on pairwise interactions that is suited to the context-aware recommendation task. The central idea in pairwise interaction algorithms is to decompose the ratings tensor R into n factor matrices, such that the first two correspond to users (U) and items (V) and the others correspond to the contextual variables (\(C_a\), \( 1 \le a \le n-2\)). These new matrices are then used to make the rating prediction (\(\hat{r}_{i, j, c_1, \cdots , c_{n-2}}\)) for user i, item j and contexts \(c_1, \cdots , c_{n-2}\), by multiplying them in a pairwise manner, as shown in Eq. 4.

$$\begin{aligned} \hat{r}_{i, j, c_1, \cdots , c_{n-2}} = (U V^T)_{ij} + (U C_1^T)_{ic_1} + (U C_2^T)_{ic_2} + \cdots + (C_{n-3} C_{n-2}^T)_{c_{n-3}c_{n-2}} \end{aligned}$$
(4)

In order to obtain these matrices, Eq. 5 must be minimized using some optimization algorithm. The parameter \(\lambda \) is used for regularization purposes and the set S consists of the specified ratings. This algorithm is called CAMF, i.e. context-aware matrix factorization algorithm.

$$\begin{aligned} J = \sum _{(i, j, c_1, \cdots , c_{n-2}) \in S} (r_{i, j, c_1, \cdots , c_{n-2}} - \hat{r}_{i, j, c_1, \cdots , c_{n-2}})^2 + \lambda (\Vert U \Vert ^2 + \Vert V \Vert ^2 + \sum _{a=1}^{n-2} \Vert C_a\Vert ^2) \end{aligned}$$
(5)
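The pairwise prediction of Eq. 4 can be sketched as the sum of the dot products between every pair of factor vectors involved in one rating. The function below is a minimal illustration of that formula only; the function names and the toy vectors are assumptions, and the training procedure that minimizes Eq. 5 is not shown.

```python
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict_pairwise(factor_vectors):
    """factor_vectors: latent vectors [u_i, v_j, c_1, ..., c_{n-2}] selected for one rating.

    Returns the sum of the dot products over all pairs of vectors, as in Eq. 4.
    """
    return sum(dot(a, b) for a, b in combinations(factor_vectors, 2))


# Toy example with one contextual variable (n = 3) and two latent factors.
user_vec = [1.0, 0.0]
item_vec = [0.5, 2.0]
context_vec = [1.0, 1.0]
prediction = predict_pairwise([user_vec, item_vec, context_vec])
```

For these toy vectors the three pairwise terms are 0.5 (user and item), 1.0 (user and context) and 2.5 (item and context), so the predicted rating is 4.0. With only the user and item vectors, the function reduces to the prediction of Eq. 2.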

The output of our context-aware recommender method is the recommendations generated by the context-aware recommender systems using the two types of context extracted by the CIET.5\(_{embed}\) and the RulesContext techniques. In the next section we present the empirical evaluation conducted with our proposal.

4 Empirical Evaluation

For this work, we carried out two different evaluations. In the first one, we compare the CARM-TM method, with the CIET.5\(_{embed}\) and the RulesContext techniques, against the uncontextual MF algorithm (baseline). With this evaluation, we aimed to demonstrate the impact of the use of contextual information extracted by the CIET.5\(_{embed}\) and RulesContext techniques in the contextual CAMF recommender systems. Additionally, in the second evaluation, we use the CARM-TM method to compare the CIET.5\(_{embed}\) against the RulesContext, in order to identify which contextual extraction technique provides the best recommendations.

4.1 Dataset

The dataset used in the empirical evaluation was the RecSys dataset from the recommender system challenge ACM RecSysChallenge 2013, which aimed at customizing recommendations for Yelp users. On the Yelp website, users can evaluate businesses through reviews. In these reviews, it is possible to evaluate the item by leaving a rating in the form of stars (from one to five). In addition, the user can write a text explaining his/her opinion about the establishment and the reason for giving a certain rating. The RecSys dataset contains 11,537 items (businesses), 45,980 users and 229,901 reviews.

4.2 Experimental Setup

To measure the predictive ability of the recommender systems, we used the All But One protocol [5] with 10-fold cross validation, where the set of documents was partitioned into 10 subsets. For each fold, we used nine of these subsets for training and the remaining one for testing. The training set \(T_r\) was used to build the recommendation model. For each user in the test set \(T_e\), an item was hidden as a singleton set H, and the remaining items represent the set of observable items O used in the recommendation. Based on 10-fold cross validation, we computed the Mean Average Precision for 10 recommendations (MAP@10). To compare two recommendation algorithms, we applied the two-sided paired t-test with a 95\(\%\) confidence level.
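As an illustration of the metric, the sketch below computes MAP@10 for ranked recommendation lists. With a singleton hidden set H, the average precision of a user reduces to 1/rank of the hidden item when it appears among the top 10 recommendations, and to 0 otherwise. The function names, the toy data and the normalization by min(|H|, k) are assumptions; recommendation lists are assumed to contain no duplicates.

```python
def average_precision_at_k(recommended, hidden, k=10):
    """recommended: ranked list of items; hidden: set of held-out relevant items."""
    if not hidden:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in hidden:
            hits += 1
            score += hits / rank  # precision at the position of each hit
    return score / min(len(hidden), k)


def map_at_k(recs_by_user, hidden_by_user, k=10):
    """Mean of the per-user average precision over all users in the test set."""
    users = list(hidden_by_user)
    total = sum(average_precision_at_k(recs_by_user.get(u, []), hidden_by_user[u], k)
                for u in users)
    return total / len(users)


# Toy example: the hidden item of user "a" is ranked second, that of "b" is missed.
toy_recs = {"a": ["x", "y", "z"], "b": ["p", "q"]}
toy_hidden = {"a": {"y"}, "b": {"r"}}
score = map_at_k(toy_recs, toy_hidden)
```

In this toy example user "a" contributes 1/2 and user "b" contributes 0, giving a MAP@10 of 0.25.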

For the CIET.5\(_{embed}\) technique, we considered the threshold values: 0.25, 0.50 and 1.0. The context sizes were 4 and 10 words. Altogether, we used 6 different configurations (3 threshold values \(\times \) 2 context sizes). These values were adopted according to the best results obtained in our previous work [16].

Regarding the RulesContext technique, we generated the association rules using a minimum support value equal to 10% and a confidence value equal to 50%. To select the most relevant rules, we used cut percentages of the MI measure equal to 50% and 75%. From the most relevant rules, we selected sets with the top 5, 10 and 20 rules, totalling 6 combinations (2 MI values \(\times \) 3 set sizes).

The MF and CAMF algorithms were executed 20 times, varying the values of k and \(\lambda \) (see Sect. 3.5). The values used were: k = 5, 10, 50 and 100; and \(\lambda \) = 1, 10, 100, 150 and 200.
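The 20 executions correspond to the Cartesian product of the 4 values of k and the 5 values of \(\lambda \). A minimal sketch of how such a grid can be enumerated is given below; the variable names are illustrative and the training call itself is omitted.

```python
from itertools import product

K_VALUES = [5, 10, 50, 100]
LAMBDA_VALUES = [1, 10, 100, 150, 200]

# Every (k, lambda) pair evaluated for MF and for CAMF: 4 x 5 = 20 configurations.
configurations = list(product(K_VALUES, LAMBDA_VALUES))
```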

4.3 Results

In this section, we first present the results of our proposal with the CIET.5\(_{embed}\) technique and then with the RulesContext technique, both against the baseline MF. In Tables 1 and 2, the values that are statistically significant (p-value < 0.05) are marked with an asterisk and the highest values are in boldface. At the end, we compare the results between the CIET.5\(_{embed}\) and the RulesContext techniques.

Table 1 presents the results of our proposal, with the CIET.5\(_{embed}\) technique and the CAMF algorithm, against the baseline MF. We refer to each type of contextual information as Size_Threshold. For example, the type of context 4_025 represents the contexts with 4 words that were extracted considering a threshold value of 0.25. In Table 1, we can observe that, in general, our proposal compared favorably against the baseline MF. The contextual information 10_1 provided the best results overall. However, the context 10_025 presented the best value of MAP@10, for the parameters k equal to 5 and \(\lambda \) equal to 150. We emphasize that this combination of parameters yielded the best results in these experiments in particular.

Table 1. Comparing the results (MAP@10) of our proposal, with the CAMF contextual recommender algorithm using contexts extracted by the CIET.5\(_{embed}\) technique, against the results of the baseline MF.

The results of our proposal, with the contexts extracted by the RulesContext technique, are presented in Table 2. There, we refer to each type of contextual information as MI_NumberOfRules. For example, the type of context 50_5 means the top 5 rules (contexts) extracted using a cut percentage of the MI measure equal to 50%. Again, our proposal compared favorably against the baseline. We can observe that the contexts extracted with an MI cut equal to 50% presented the best results in most cases, with the highest value of MAP being provided by the context 50_20.

Table 2. Comparing the results (MAP@10) of our proposal, with the CAMF contextual recommender algorithm using contexts extracted by the RulesContext technique, against the results of the baseline MF.

Analyzing the parameters used in the experiments, we observed that for CIET.5\(_{embed}\), the best result was obtained with a context size equal to 10 and a threshold equal to 0.25. For the RulesContext technique, the MI cut that generated the best results was 50%, that is, using a greater number of rules and selecting the best ones. Regarding the matrix factorization parameters, the best results were obtained with k equal to 5 and \(\lambda \) equal to 150, that is, a model with a low complexity level and a relatively high value of the parameter that controls overfitting.

Finally, we present in Fig. 4 the comparison between the CIET.5\(_{embed}\) and the RulesContext techniques. There, we compare the best MAP@10 (vertical axis) while varying the values of \(\lambda \) and k (horizontal axis). In most cases, the CIET.5\(_{embed}\) performed better than the RulesContext technique. However, in two cases, 10_5 and 150_5, the RulesContext technique was superior. In Fig. 4, we can also observe that our proposal with the context extraction techniques outperformed the baseline MF in all cases.

Fig. 4. Comparing the MAP@10 values between CIET.5\(_{embed}\) and RulesContext techniques.

5 Conclusion and Future Work

In this work, we proposed a context-aware recommender method based on text mining (CARM-TM). Our method uses two context extraction techniques: CIET.5\(_{embed}\), which is based on word embeddings, and RulesContext, which is based on association rules. In this work, our proposal used the CAMF contextual algorithm to generate the contextual recommendations.

The evaluation was conducted by using the Yelp dataset and the uncontextual MF algorithm as baseline. Our method provided better results than the baseline in all cases. Using the CIET.5\(_{embed}\) technique, we obtained good results in most cases. However, the best MAP@10 value was provided using the context 50_10 extracted by the RulesContext technique.

As future work, we will evaluate our proposal with other datasets and context-aware recommender systems. In addition, we will combine both context extraction techniques, CIET.5\(_{embed}\) and RulesContext. We will also work on the proposal of a new method for context extraction by using opinion mining.