1 Introduction

The Internet has left a significant mark on all fields, such as e-commerce, science and technology, education and research, and telecommunication. Over the past couple of decades, research and development in Web services has grown exponentially, accelerated by cutting-edge technologies such as big data and cloud computing. Popular Web service providers such as Netflix, Last.fm and Amazon try to keep their customers satisfied by predicting their interests in the domain by means of recommender systems.

Recommender systems provide personalized recommendations and come in several types according to the strategy used. The first is content based (CB), in which the recommender analyzes the user’s access sequence; the second is collaborative filtering (CF), which aggregates the interests of a user’s neighbors. Over time, hybrid recommenders evolved that combine the features of CB and CF to make better suggestions. Recommenders generally focus on patterns in the customer’s navigation sequence, derived from the user’s past history. The server log file can serve as the source for finding a user’s access patterns under various criteria.

Broadly, two issues can be identified in this scenario. The first is that if the patterns do not capture the true semantics behind the access sequences, the quality of prediction is limited. In other words, the recommender must have domain knowledge to provide meaningful suggestions that satisfy the user. For instance, if the user is accessing a movie portal such as Netflix, the access patterns should be expressed in terms of the genre of the movie rather than its title, assuming that the genre conveys the semantics of a movie page. To achieve this, one has to construct and incorporate knowledge using the ontology of that particular domain, and the real challenge is to construct this knowledge with reasonable efficiency. The second issue is recommending in line with the dynamics of the user’s interest, which drifts from one concept to another.

The patterns identified for a customer, even when enriched with knowledge, describe only a historical user profile. If the user moves to a concept that is not in the pattern, the suggestions will leave the user dissatisfied. For example, assume that the recommender has stored the access pattern concept for a user as “romantic movies” and suggests accordingly, while the user is currently accessing a set of “action movies”. The suggestions then miss the user’s current interest, which shows up as a lag in predictive accuracy.

In this paper, a recommender system is proposed that is focused on resolving the two above-mentioned issues. To accomplish this, the proposed system includes the following tasks:

  • Develop a methodology to construct the domain knowledge to identify the concepts.

  • Develop a model to find the sequence patterns by integrating the knowledge.

  • Propose a recommendation strategy that also identifies the concept drift in the access pattern and suggests accordingly.

The experimental results obtained on benchmark data sets show that the proposed framework performs better than popular existing models, and evaluation with appropriate measures demonstrates the importance of analyzing the user’s interest drift.

The rest of the paper is organized as follows. In Sect. 2, related work is presented. In Sect. 3, the architecture and design details of the proposed recommender strategy are described. Section 4 analyzes the implementation and experimentation part. Finally, Sect. 5 gives the conclusion followed by references.

2 Related Work

Adomavicius and Tuzhilin [1] classified recommenders as content based [2, 3], collaborative filtering [4] and hybrid methods [5, 6]. Although many researchers have concentrated on improving the accuracy of the recommender through the technique used [7], some have treated metrics such as “diversity” (average dissimilarity among all recommendation pairs) [8], individual diversity (average dissimilarity of recommendation pairs limited to a user) [9] and aggregate diversity (average dissimilarity over all users) [10] as being as important as accuracy for satisfying the user. Some works [11] proposed recommenders that use a new kind of metric, “novelty” (the user’s surprise with respect to the time spent searching for a page or item), in evaluating recommendations. However, none of these studies considered the concept of the item or the user, or tried to identify concept drift.

In general, access sequence patterns can be learned by probabilistic algorithms and association analysis [12]. Ezeife and Y. Lu proposed a sequence pattern mining algorithm using a tree structure, called PLWAP-Mine [13], which showed better results than other pattern mining techniques. In [14], Nguyen showed that PLWAP-Mine integrated with the Markov model can enhance mining performance. However, these algorithms consider only the usage history and ignore the semantics entirely, which limits the quality of recommendations.

L. Wei and S. Lei built a model that integrates the ontology with usage mining, so that the patterns are subject oriented rather than item or page oriented, to improve the performance of the recommender [15]. Several ontologies have been constructed for different domains, such as personalized e-learning and software, to generate recommendations from an ontology built with the significant terms of the Web site used [3]. S. Salin and P. Senkul proposed applying these domain concepts to access sequence instances as well, in order to make accurate suggestions [16]. These studies did not consider the dynamics of the ontology instance, nor did they focus on the efficiency of constructing the ontology.

3 Recommender Model with Conceptual Semantics

The proposed model for recommenders based on concepts and their dynamics is defined in three layers, as shown in Fig. 1. In the first layer, the ontology for the domain is constructed by manual, automatic or semi-automatic approaches. Although several techniques are available for constructing an ontology, customized domains still require purpose-specific construction. The informal information available for this task is the Web content itself, but processing such a large volume of content is tedious. Instead, one can use the informal information available in or associated with a Web page, such as its title, tags or URL, to construct the domain knowledge. The second layer of the model finds the patterns in the user’s access sequence. Generally, sequence pattern modeling applies traditional preprocessing techniques to the Web usage log and then finds patterns. The proposed model extends this preprocessing to produce a semantic log, so that concept-oriented patterns can be extracted. The recommendation strategy is defined in the third layer, which uses not only the patterns provided by the second layer to assess the user’s navigation, but also the domain ontology constructed in the first layer to identify any concept drift. Finally, the recommendation strategy in this layer suggests items or Web pages to the user.

Fig. 1 Layered architecture of the proposed recommender system

3.1 Conceptual Knowledge Construction

Here, the title and tags of a Web page are used as the sources from which the concepts that make up the ontology are derived. Generally, the tags and title of a Web page contain key terms that represent the content of the page. The idea is to define concepts from these key terms based on the number of occurrences and combinations of the terms. This is accomplished in steps that define the concepts and the relationships among them.

3.1.1 Defining Concepts

Let {T1, T2…Tn} be the titles and tags of the n pages or items in the domain.

  • Step 1: Apply the stop word removal technique to the titles and tags of the pages, so that only a set of raw terms {w1, w2…wk} is derived from each page title.

  • Step 2: Find frequent terms by means of the association analysis technique which gives the significance of a set of terms that can be assumed as concept Ci.

  • Step 3: From the derived concept set C = {C1, C2…Cm}, identify the most generalized and specialized concepts.

  • Step 4: Identify a relationship from all the concepts to at least one of the mentioned generalized concepts.
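
The four steps above can be sketched as follows. This is only a minimal illustration under stated assumptions: the stop word list is a tiny hand-written one, a plain co-occurrence count stands in for a full association analysis algorithm, and a subset test is used as a hypothetical way to link specialized concepts to generalized ones. The function names, sample titles and the `min_support` threshold are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import combinations

# Assumption: a tiny illustrative stop word list, not the one used in the paper.
STOP_WORDS = {"the", "are", "is", "a", "an", "of", "and", "in"}

def raw_terms(text):
    """Step 1: lowercase, split on whitespace and drop stop words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}

def frequent_concepts(texts, min_support=2, max_size=2):
    """Step 2: keep term sets that occur in at least min_support texts."""
    counts = Counter()
    for text in texts:
        terms = sorted(raw_terms(text))
        for size in range(1, max_size + 1):
            for combo in combinations(terms, size):
                counts[frozenset(combo)] += 1
    return {c: n for c, n in counts.items() if n >= min_support}

def relate(concepts):
    """Steps 3-4 (assumed rule): single-term concepts are treated as
    generalized, and each multi-term concept is linked to the generalized
    concepts it contains."""
    general = [c for c in concepts if len(c) == 1]
    return {c: [g for g in general if g < c] for c in concepts if len(c) > 1}

titles = [
    "the romantic comedy of the year",
    "a romantic comedy in paris",
    "romantic drama",
    "action thriller",
]
concepts = frequent_concepts(titles)
hierarchy = relate(concepts)
```

With these sample titles, "romantic" and "comedy" survive as generalized concepts and the pair "romantic comedy" is linked to both of them, while terms that appear only once are discarded.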

3.2 Sequence Pattern Mining with Concepts

This layer of the model produces the user’s access patterns in terms of the domain ontology, which is constructed and stored with concepts and the relationships among them. The task has two parts: preprocessing and pattern mining.

3.2.1 Preprocessing

The literature groups data preprocessing techniques into four classes (data cleaning, reduction, integration and transformation), all intended to improve the accuracy of mining. Here, the Web log contains the access records of all users over all sessions for a particular period. This log information acts as the raw input for extracting the user’s navigational sequences. Generally, a log record contains the IP address, time stamp of access, title of the page, URL of the page, protocol used, session id and typically the tags of the page. Data reduction is the first step and removes unnecessary fields, such as the protocol name, from the log table. Information such as time stamps or session ids must be transformed into a format that can be processed. Data selection is then applied to the log, so that only the records of the particular user are extracted, together with their session ids (Fig. 2).

Fig. 2 Preprocessing of the log to extract the semantic log
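
The reduction, transformation and selection steps of the traditional preprocessing can be sketched as below. The record layout, the field names, the time stamp format and the use of the IP address to identify a user are all assumptions made for illustration; a real Web log would need its own parsing.

```python
from datetime import datetime

# Assumption: each raw log record is a dict with these hypothetical field names.
raw_log = [
    {"ip": "10.0.0.1", "time": "2015-03-01 10:00:05", "title": "The Notebook",
     "url": "/m/1", "protocol": "HTTP/1.1", "session": "s1", "tags": "romantic drama"},
    {"ip": "10.0.0.2", "time": "2015-03-01 10:01:00", "title": "Die Hard",
     "url": "/m/2", "protocol": "HTTP/1.1", "session": "s7", "tags": "action"},
    {"ip": "10.0.0.1", "time": "2015-03-01 10:02:30", "title": "Before Sunrise",
     "url": "/m/3", "protocol": "HTTP/1.1", "session": "s1", "tags": "romantic"},
]

DROP_FIELDS = {"protocol"}  # data reduction: fields not needed for mining

def preprocess(log, user_ip):
    """Select one user's records, drop unused fields and parse time stamps."""
    selected = []
    for record in log:
        if record["ip"] != user_ip:  # data selection
            continue
        row = {k: v for k, v in record.items() if k not in DROP_FIELDS}
        # data transformation: text time stamp -> datetime object
        row["time"] = datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S")
        selected.append(row)
    # order records by session and then by access time
    return sorted(selected, key=lambda r: (r["session"], r["time"]))

user_log = preprocess(raw_log, "10.0.0.1")
```

The result keeps only the two records of the chosen user, ordered by access time within the session, ready for the semantic steps that follow.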

After the traditional preprocessing, further techniques are applied to transform the log into a semantic log. In this model, stop word removal is applied to the page titles to eliminate terms such as ‘the’, ‘are’ and ‘is’. Thereafter, the term extraction technique considers the raw terms and their combinations by means of the frequency measure. Annotation then represents the records in the user’s input log with concepts: the domain ontology is used to attach the relevant concept label to each record. Finally, the aggregation step presents the access records with their conceptual information.
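
A minimal sketch of the annotation and aggregation steps is given below. It assumes that the domain ontology can be reduced to a flat term-to-concept dictionary, which is a strong simplification; the dictionary contents, the field names and the function names are hypothetical.

```python
# Assumption: a tiny illustrative stop word list.
STOP_WORDS = {"the", "are", "is", "a", "of"}

# Assumption: a toy stand-in for the domain ontology, mapping key terms
# to concept labels.
TERM_TO_CONCEPT = {
    "romantic": "Romance",
    "love": "Romance",
    "action": "Action",
    "war": "Action",
}

def annotate(record):
    """Label one access record with the concepts found in its title and tags."""
    text = f'{record["title"]} {record["tags"]}'.lower()
    terms = [w for w in text.split() if w not in STOP_WORDS]
    concepts = sorted({TERM_TO_CONCEPT[t] for t in terms if t in TERM_TO_CONCEPT})
    return {"session": record["session"], "url": record["url"], "concepts": concepts}

def semantic_log(records):
    """Aggregate annotated records into one concept sequence per session."""
    sessions = {}
    for record in records:
        row = annotate(record)
        sessions.setdefault(row["session"], []).extend(row["concepts"])
    return sessions

records = [
    {"session": "s1", "url": "/m/1", "title": "The Love Letter", "tags": "romantic"},
    {"session": "s1", "url": "/m/2", "title": "The War Is Over", "tags": "action"},
    {"session": "s2", "url": "/m/3", "title": "City of Lights", "tags": ""},
]
log = semantic_log(records)
```

Records whose title and tags contain no known term receive no concept label, so their session contributes nothing to the concept sequence.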

3.2.2 Mining Patterns

Once the semantic log is constructed, a sequential learning method is applied to obtain patterns. The proposed model follows the basic idea of association pattern extraction used by the TITANIC algorithm [14], which performs well in constructing the concept lattice for a user.

The algorithm yields the access sequences of a particular user as a set of concept combinations that preserve their order, which results in a personalized lattice of concepts. The personalized concept lattice is a hierarchy of concepts in a tree structure, constructed using the support and confidence measures.
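
To illustrate how support and confidence can be computed over ordered concept sequences, the sketch below counts consecutive concept transitions across sessions. It is not the TITANIC algorithm and does not build a lattice; it is a deliberately simplified stand-in, and the function name, the sample sessions and the `min_support` threshold are assumptions.

```python
from collections import Counter

def mine_transitions(sessions, min_support=0.5):
    """Return ordered concept pairs (a, b), where b directly follows a,
    with their support and confidence.

    support(a -> b)    = sessions containing the transition / all sessions
    confidence(a -> b) = sessions containing the transition / sessions containing a
    """
    n = len(sessions)
    single = Counter()
    pair = Counter()
    for seq in sessions:
        single.update(set(seq))              # count each concept once per session
        pair.update(set(zip(seq, seq[1:])))  # count each transition once per session
    patterns = {}
    for (a, b), count in pair.items():
        support = count / n
        if support >= min_support:
            patterns[(a, b)] = {"support": support, "confidence": count / single[a]}
    return patterns

sessions = [
    ["Romance", "Comedy", "Drama"],
    ["Romance", "Comedy"],
    ["Action", "Thriller"],
    ["Romance", "Drama"],
]
patterns = mine_transitions(sessions)
```

In this toy data only the transition from "Romance" to "Comedy" reaches the support threshold: it appears in two of four sessions, and in two of the three sessions that contain "Romance".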

Algorithm to find conceptual patterns:

3.3 Recommendation Strategy

The final layer of the architecture deals with the recommendation strategy, which suggests pages using the knowledge gained in the first layer and the patterns from the second layer. The primary task in this step is to identify the recent concept pattern of the current session. If the current concept matches any part of any existing pattern saved for the user, then suggestions are made according to that pattern. If the session matches part of an existing pattern but the current concept does not, then the suggestions switch dynamically from the traditional recommendation path to pages of the current concept. Thus, the concept drift in the user’s interest is identified and the suggestions are changed dynamically by the proposed method.
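
The strategy described above can be sketched as a small decision function. The sketch assumes that stored patterns are ordered lists of concept labels, that "suggesting accordingly" means recommending pages of the next concept in the matched pattern, and that a drift is signalled whenever the current concept appears in no stored pattern. These choices, together with the function name and the sample data, are illustrative assumptions rather than the paper's exact procedure.

```python
def recommend(current_concepts, stored_patterns, pages_by_concept, top_n=3):
    """Return (pages, drift_detected) for the current session.

    current_concepts : concept labels of the current session, oldest first
    stored_patterns  : list of ordered concept lists mined for the user
    pages_by_concept : mapping from concept label to candidate pages
    """
    if not current_concepts:
        return [], False
    current = current_concepts[-1]
    for pattern in stored_patterns:
        if current in pattern:
            # Current concept is covered: follow the stored pattern to its
            # next concept, or stay on the current one at the pattern's end.
            i = pattern.index(current)
            target = pattern[i + 1] if i + 1 < len(pattern) else current
            return pages_by_concept.get(target, [])[:top_n], False
    # No stored pattern covers the current concept: treat it as concept
    # drift and recommend pages of the current concept instead.
    return pages_by_concept.get(current, [])[:top_n], True

stored = [["Romance", "Comedy"]]
pages = {
    "Romance": ["r1", "r2"],
    "Comedy": ["c1", "c2"],
    "Action": ["a1", "a2", "a3", "a4"],
}
on_path = recommend(["Romance"], stored, pages)
drifted = recommend(["Romance", "Action"], stored, pages)
```

In the second call the session started inside a stored pattern but moved to "Action", so the function reports a drift and switches to pages of the current concept.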

4 Experimentation Results

The proposed model is evaluated on two variants of the MovieLens benchmark data set, with 100 k and 1 M ratings. The 100 k variant contains ratings of 1682 movies by 943 users, whereas the 1 M variant contains ratings of 3900 movies by 6040 users. The ratings are on a scale from 1 to 5, where 1 denotes low quality and 5 denotes high quality. To evaluate the methodology, the mean absolute error (MAE) measure is used, and to evaluate the efficiency of recommendations, the hit ratio [6] measure is used:

$$ \mathrm{MAE} = \frac{\sum_{i=1}^{n} \left| ar_{i} - pr_{i} \right|}{n}, \tag{1} $$
$$ \text{Hit ratio} = \frac{Ar}{Re} \tag{2} $$

where {ar1, ar2…arn} are the actual ratings, {pr1, pr2…prn} are the predicted ratings, ‘Ar’ is the total number of recommendations accessed by the user and ‘Re’ is the total number of recommendations. The experiment compared the proposed model with a popular existing usage model, PLWAP, on the two variants of the data set for the top ten recommendations.
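
Both measures in Eqs. (1) and (2) are straightforward to compute. The sketch below uses made-up ratings and item ids purely for illustration; they are not results from the experiments, and the function names are assumptions.

```python
def mean_absolute_error(actual, predicted):
    """Eq. (1): average absolute difference between actual and predicted ratings."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def hit_ratio(recommended, accessed):
    """Eq. (2): share of recommended items that the user actually accessed."""
    if not recommended:
        return 0.0
    hits = sum(1 for item in recommended if item in accessed)  # Ar
    return hits / len(recommended)                             # Ar / Re

# Illustrative values only.
mae = mean_absolute_error([4, 3, 5, 2], [3.5, 3, 4, 3])
ratio = hit_ratio(["m1", "m2", "m3", "m4"], {"m2", "m4", "m9"})
```

Here the absolute errors are 0.5, 0, 1 and 1, giving an MAE of 0.625, and two of the four recommended items were accessed, giving a hit ratio of 0.5.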

Table 1 summarizes the hit ratio and MAE of the proposed model and the existing model. It shows that the proposed model outperforms the PLWAP model in terms of the number of suggestions accessed by the user, that is, suggestions relevant to the user’s interest.

Table 1 Experimentation values for the two recommenders

5 Conclusion

Typical recommenders based on usage history cannot use semantics and do not consider the concept drift in user interest. This paper studied recommenders that work with concepts and that detect the drift in the user’s interest on top of those concepts, in order to keep the user satisfied. The proposed model constructs the ontology, mines the patterns and applies them to the user’s current Web access sequence. The proposed model was evaluated by comparison with popular existing methods, and the results showed that it performed better.