1 Introduction

The prevalence of smart devices has brought more accessibility to different multimedia contents through the social web. Also, the ability to embed our living environment with small and cheap sensors enables and accelerates the development of context-aware recommendation systems that can obtain the most desirable content in a specific context. The combination of real-time access to different multimedia contents and the overwhelming content options available on the social web increases the challenge of building a personalized search model. For these reasons, this field of study, related to helping users browse and consume multimedia contents, has been a challenging issue for researchers over the past few years [33, 40].

Different users have different personalities [26]; likewise, a user’s choice of which multimedia content to consume varies with different parameters, which include emotions and various health conditions [23]. This necessitates the integration of user preferences and physiological context in the development of a recommendation algorithm. Using today’s technology, we can obtain a user’s various physiological functions and monitor them using biomedical sensors, with the help of computer systems.

In this paper, we focus on recommending multimedia content to users based on their instantaneous requirements, which can be represented by categorizing their contexts into physiological and environmental parameters, in order to enhance their experience and comfort level. Although a broad number of contextual parameters can be considered, we only show certain context conditions in this paper. Particularly, we highlight the importance of the physiological aspect of the user’s context during the recommendation process. However, the proposed model can also be applied to accommodate other contextual information that will help personalize the search.

As people share their preferences and information on popular social media sites, they also share important information about the multimedia content they consume, including descriptions of their experience. A large number of users on Last.fm, YouTube, and Facebook upload media content with annotation tags that describe the type of media they upload. Our method benefits from the availability of such social tags by exploring context-based item recommendations and by avoiding the unrealistic need to analyze innumerable items on the Internet in order to detect the context. Accordingly, our recommendation technique offers a feasible way to find relevant media content based on the detected contextual information, and it benefits from the large number of existing online users who contribute to the classification and annotation of different media, which is more effective than relying on feature extraction alone.

Inspired by our earlier work [2], our proposed recommendation model makes the following contributions toward recommendation techniques. First, we utilize existing online social networks by incorporating social tags and rating information in ways that personalize the search for the right content, for a particular detected context. Second, we propose a recommendation algorithm to improve the user experience and satisfaction by the use of a biosignal in the recommendation process.

The rest of this paper is organized as follows: Section 2 briefly reviews the existing work related to context-aware recommendations. Section 3 presents the detection of the user physiological context. Section 4 provides details concerning the proposed context-aware recommendation model. Section 5 highlights some scenarios and the usability of the proposed model by presenting an implemented recommender application. Finally, we present the experimental method and the related results and conclusion in Sections 6 and 7.

2 Related work

In this section, we briefly bring attention to the different existing recommendation techniques, and particularly to the ones based on context-awareness. We also review existing approaches that explore context detection and context-aware computing during the recommendation process.

2.1 Recommendation techniques

In recent years, the development of recommender systems has attracted considerable research interest in an attempt to discover the relation between users and the items they consume. The studies analyzed the hidden relationships between users and items, and proposed several methodologies that became today’s ways of establishing recommendations. The first traditional recommendation approach is Content-Based Filtering (CBF). The CBF approach analyzes the item content and creates a profile for each by assigning features such as type, category, and content special attributes. It then relies on profile matches, which contain user-content explicit ratings of other item profiles. Such an approach is commonly used in building music or movie recommender systems, and needs the user’s historical selection of content in order to predict the ranking of new items.

Another traditional method is Collaborative Filtering (CF), which recommends items based on the relationships between users rather than only between items. This approach relies on identifying similarities between users who share the same interests. As opposed to CBF, CF gives more attention to social aspects and uses the feedback of similar users to predict unknown user-item ratings.

At a later time, researchers combined CBF and CF in order to build much improved versions of these traditional methods. A hybrid method is proposed to overcome a problem that exists in CF, where they only consider users to be close to each other based on the calculated distance, when in reality they might have completely different tastes. Another problem is that new users do not have enough history to feed the recommender system and allow it to find relevant items. This is a well-documented problem known as the cold-start problem [37].

Contextual information boosts the effectiveness of the traditional recommender systems, which consider only the user, items, or their related information to find the next expected item to be selected. For instance, the user may like to listen to a particular type of music while running in the morning and not at other times of the day. Accordingly, if the context of the user has been well detected, the recommendation results might be completely different from the traditional ones, and better suited to the user’s needs [33].

Some studies combine existing recommendation approaches with contextual information. For instance, Woerndl et al. [36] apply a combination of content-based filtering, collaborative filtering, and a hybrid method to the dataset of a mobile application recommender system. The authors present the effectiveness of the hybrid approach as a successful way of accommodating contextual information. Campos et al. [9] presented a recommendation model that combines CBF and CF. Context is also considered as a content feature, and context elements are assumed to be independent, so that a Bayesian network algorithm can be applied to them. Another hybrid approach is presented by Lekakos and Caravelas [22], who recommend movies by analyzing the content and the collaborative relation between users. Bogers [4] proposed a Markov random walk algorithm over a contextual graph on a movie dataset. The user ratings and tags, the movie genre, and the actors are all considered as contextual information in the graph. Cai et al. [6] extracted emotion tags available in a web document in order to match them with music lyrics. With a similar focus, Hyung et al. [18] use textual inputs from the user to find similar documents using Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA), which leads to a better recommendation of music. Another study, which uses features extracted from music as a context for music recommendation, is proposed by Yoon et al. [38]. Han et al. [13] propose a music recommendation model that moves the user’s emotional state from one state to another using the recommended music. The authors use a proposed ontology to model the relationship between music items and the related emotion.

More recently, recommender systems rely on social networks in order to collect vast amounts of information about the user, their multimedia choices and their behavior. For instance, Zangerle et al. [42] present a facilitation approach to collect a user’s music dataset from Twitter, assuming the user’s music player shares such information on Twitter. Pessemier et al. [29] discussed an activity detection in a mobile environment as supportive information for the recommendation process. By using an attached accelerometer on a mobile device, the authors were able to detect four basic activities: running, walking, standing and cycling. Based on the detected activity and on other contextual information such as location and weather, the recommender system can fetch different categories of information such as train schedules and restaurants in the surrounding area. In another study, Kim et al. [20] employed a Hidden Markov Model on a contextual information dataset to recommend the most appropriate menu for healthcare services.

2.2 Physiological-related recommendation

In their study, Liu et al. [23] addressed the effects of music playlist recommendations on the heart while flying on an airplane. Since the airplane environment causes discomfort and stress, the authors used a heart sensor embedded in the aircraft’s seat to monitor heart activity and, in particular, to measure stress. This study [23] focuses on analyzing the airplane environment and the effects of music on the user’s level of stress, which differs from the focus of this paper. Oliver and Flores-Mangas [28] presented the MPTrain system, which uses a smartphone-based accelerometer and a heart sensor to analyze the user’s activities and select appropriate music by changing the user’s selection based on the music rhythm. MPTrain is proposed to help users achieve their exercise goals by speeding up or slowing down their activity. In another study, Nirjon et al. [27] proposed a context-aware recommendation system that uses an earphone device equipped with a heart rate detection sensor. The sensor collects continuous Electrocardiography (ECG) signals and extracts the peaks in order to calculate the average heart rate instantaneously. Similarly, the system extracts, offline, the beats of the music signal (known as the tempo of the music), as well as the frequency of the sound and the energy of the signal.

Our work differs from the studies presented in this section in that our proposed context-aware recommendation model considers the physiological condition of the user as a useful context parameter to enhance the recommendation results. Another difference is that our aim is to recommend items that match the detected context, not to shift the user’s physiological condition from one state to another. In addition, our model does not require the analysis of the content of the media item itself, for example the content of the image or the video. Instead, we rely on the user’s available context.

3 Detecting a User’s physiological context

In order to detect the user’s physiological context, we have to record and monitor the user’s biological signals to determine their physiological or emotional (e.g. mental stress) situation. Even though we collect biological signals, we are not concerned with diagnosing illnesses. Our goal is to use the collected physiological data to better understand the physiological status of users. In fact, in this paper we focus on one particular emotion: mental stress. The best tool to detect the targeted physiological conditions is to monitor the heart using an ECG signal that can measure the Heart Rate Variability (HRV) [7].

Some of the least obtrusive commercial ECG devices are in the form of a chest strap fitted with two electrodes and an electronic circuit. This obliges the user to wear the ECG sensor at all times in order for the application to benefit from physiological or psychological contextual information. Since these devices are wearable, they might be associated with a certain level of discomfort over prolonged periods of use. In this paper, we used a much more convenient and non-invasive device, as shown in Fig. 1. The ECG sensor used is a product from the AliveCor Company, and is connected to a “Samsung Galaxy III” smartphone device. Because the sensor looks just like a protective cover attached to the smart device, an installed Android application can collect the ECG signal. We use the ECG signal to extract the HRV information.

Fig. 1 An image of the single-lead ECG sensor attached to an Android device in our developed prototype application

Since the ECG signal represents the electrical activity of the heart, we can extract the HRV signal by calculating the series of time intervals between two consecutive heart beats or R-waves [27]. Fig. 2 shows a portion of a recorded ECG signal for a couple of minutes. The HRV is known to shed light on the mental stress and infer some conclusions about the conditions of monitored individuals [3, 11, 14, 17].
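As a minimal sketch of this step, the following Python fragment derives the series of R-R intervals from a list of R-peak timestamps. It assumes that the R-peaks have already been located by some peak detector; the function name `rr_intervals` and the sample timestamps are hypothetical and are not taken from the prototype described in this paper.

```python
def rr_intervals(r_peak_times_s):
    """Return the successive R-R intervals (in seconds) from sorted R-peak timestamps."""
    return [later - earlier
            for earlier, later in zip(r_peak_times_s, r_peak_times_s[1:])]


# Hypothetical R-peak timestamps in seconds, as a peak detector might produce them.
r_peaks = [0.00, 0.80, 1.62, 2.41, 3.23]
rr = rr_intervals(r_peaks)  # one interval per pair of consecutive beats
```

The resulting interval series is the signal from which the HRV parameters are subsequently derived.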

Fig. 2 A portion of the original ECG signal recorded using the single-lead ECG sensor

To detect mental stress, the latest measurement collected is compared to a previously recorded benchmark. The benchmarks are typically previous measurements taken during a neutral stress state. The length of the measurement is a key constraint here, since the longer the measurement is, the more reliable the conclusions are. Therefore, it is widely recommended to use ECG measurement records of at least three minutes [14]. Accordingly, the stress detection algorithm assesses the recorded ECG signal and compares the calculated HRV values to the benchmark measurement. Nonetheless, since the reason for measuring the physiological parameters in this study is to recommend suitable multimedia content, such a long measurement reduces the system’s usability and applicability. In other words, asking a user to take a three-minute ECG measurement before the system can recommend a song is unrealistic. Consequently, we decided to minimize the ECG recording time needed to evaluate the physiological parameters in order to increase the application’s practicability. The use of shorter ECG measurements when analyzing mental stress is still a relatively new concept. Therefore, more research is needed in order to assess the accuracy of such an approach compared to longer, more accurate measurement methods [30]. As we have previously argued, the proposed model does not target medical applications, and therefore we are more at liberty to use such an approach. As a result, we are willing to potentially sacrifice some accuracy for the sake of increasing usability. Therefore, for the physiological context detection, we simply use the short-term analysis of heart rate variability proposed in [30] to determine whether the user is potentially in an elevated stress state, a neutral state, or a relaxed state, as in Fig. 3.

Fig. 3 Three different physiological states are proposed as possible user physiological contexts

After collecting the required biosignal, the extracted physiological information is sent along with other contextual parameters to the recommender system. Then, the recommender system recommends content, such as music and movies, that takes the detected physiological state of the user into account.

3.1 Detecting physiological context experiment

We attempted to determine how well the short ECG measurement is able to distinguish between the three main physiological contexts: elevated stress, neutral, and relaxed states. We invited subjects to measure their HRV parameters while asking them to perform three types of activities. The sessions were composed of stress, neutral, and relaxation exercises. For the stress exercise, subjects were asked to perform a Stroop color-word test. The Stroop test has been used in different physiological and psychological studies [21, 25, 34] since it involves sympatho-adrenal activation that is reflected in the subject’s heart and respiration rates [34]. For the neutral exercise, subjects were asked to sit comfortably and try to read an article. We use this exercise as a benchmark to analyze stress and relaxation situations. For the relaxation exercise, subjects were asked to sit comfortably, close their eyes, and listen to relaxing music. Each exercise lasted three minutes, yielding 3 min of ECG data, but only the last 30 s were fed to the physiological detection algorithm to evaluate the performance of the short ECG measurement.

Ten adult subjects participated in the physiological context detection experiment: 6 males and 4 females. The average age was 28.7 years. The experiment was carried out in an office space in our laboratory. An office desk, a laptop, and headphones were provided for each participant. Subjects were seated in front of the laptop while the ECG sensor sent the recorded data to a Java-based computer program.

After collecting the ECG signal, three HRV parameters of interest were extracted to measure the mental stress:

  • Low frequency band of the HRV (LF).

  • High frequency band of the HRV (HF).

  • The ratio of low frequency to high frequency HRV (LF/HF).

Numerous related studies have shown the effectiveness of these HRV parameters for measuring mental stress [1, 30]. In particular, during a stressful situation, the HF component is observed to decrease while the LF/HF ratio increases. The resulting HRV parameters for the 10 subjects during the last 30 s of the three exercises of the experiment are presented in Table 1.

Table 1 The results of detecting the physiological context of the users

The results obtained show, on average for the 10 subjects, an expected increase in the LF/HF component triggered by a stressor activity (with respect to the neutral state). Such an increase in the LF/HF was also observed in earlier documented studies using a similar Stroop test, such as in [32]. In addition, for the same stressor activity, a decrease in the HF component (with respect to the neutral state) was observed in the experiment results, on average for the 10 subjects. Such a decrease has also been confirmed in [8, 17]. On the other hand, for the relaxation activity, we noticed, on average, an expected decrease in the LF/HF component with respect to the neutral state. Nonetheless, we did not find a significant change in the HF component with respect to the neutral state. Therefore, we use the LF/HF exclusively to differentiate between the various physiological (stressed, neutral, or relaxed) states.
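The decision rule implied by this observation can be sketched as follows. The fragment compares the LF/HF ratio of a new measurement with the ratio obtained from the neutral benchmark. The function names and the 20 % tolerance band around the benchmark are illustrative assumptions, not values reported in this study.

```python
def lf_hf_ratio(lf, hf):
    """Ratio of low-frequency to high-frequency HRV power."""
    if hf <= 0:
        raise ValueError("HF power must be positive")
    return lf / hf


def classify_state(lf, hf, baseline_ratio, tolerance=0.2):
    """Label a measurement relative to the neutral benchmark ratio.

    `tolerance` is a hypothetical relative band around the benchmark;
    ratios inside the band are treated as neutral.
    """
    ratio = lf_hf_ratio(lf, hf)
    if ratio > baseline_ratio * (1 + tolerance):
        return "stressed"   # LF/HF rose with respect to the neutral state
    if ratio < baseline_ratio * (1 - tolerance):
        return "relaxed"    # LF/HF fell with respect to the neutral state
    return "neutral"
```

For example, with a neutral benchmark ratio of 2.0, a measurement with LF = 3.0 and HF = 1.0 would be labeled as stressed under this assumed band.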

Subjects were asked to evaluate the three exercises subjectively by giving a rating value that indicates how stressful each exercise was. A rating range from 0 to 10 was used, with 10 indicating the exercise was stressful, and 0 indicating that the exercise was relaxing. Our experimental design of three different exercises distinguishing the required three physiological statuses (stressed, neutral, and relaxed) has been reflected in the results reported in Table 2. Accordingly, the HRV analysis results verify the correlation between the physiological status of subjects performing the three experimental exercises and the observed subjective assessment.

Table 2 The subjective results of user experience during the experiment

4 Preliminaries

Before describing the details of the recommendation model, we introduce a set of definitions to formalize our recommendation problem. In this paper, bold upper-case letters, such as \( \mathbf{A} \), denote matrices, whereas the corresponding lower-case italic letters with two subscript indices, such as \( a_{x,y} \), represent the entries of the matrices. Capital italicized letters, such as \( U \), represent sets, and an upper-case italic letter with one subscript index, such as \( U_x \), represents element \( x \) of the set \( U \). We also write the matrices in the recommendation model in the following format: \( \mathbf{Matrix} = {\left[{\left(\text{element entry}\right)}_{\text{Set } A,\ \text{Set } B}\right]}_{|A|\times |B|} \). For example, the context-item matrix in the model is represented as \( \mathbf{A} = {\left[a_{c,i}\right]}_{|C|\times |I|} \). In addition, to simplify the recommendation problem, we use the term “users” (\( U \)) to represent the set of users, and “items” (\( I \)) to denote the set of media resources that can be recommended, such as music or movies. Table 3 summarizes the notation employed in the rest of this paper.

Table 3 A summary of the notation

4.1 Problem definition

Combining contexts is an important challenge in our recommendation model. Our research problem is to identify different contextual dimensions and deliver media content that best fits the user’s detected context. Suppose we have a group of users who share music content. A user’s information, including their history of rating and their textual annotations given to certain tracks describing their experience as “love”, “relaxing”, “for gym”, etc., are available. We also have other kinds of users who have not previously recorded any annotations toward any music or have not shared their ratings publicly on the internet.

For a list of possible detected conditions represented as context dimensions \( CN \), where \( CN = \{n_1, n_2, n_3, \ldots, n_{|N|}\} \), a list of users \( U = \{u_1, u_2, \ldots, u_{|U|}\} \), a list of context parameters \( C = \{c_1, c_2, \ldots, c_{|C|}\} \) extracted from \( CN \), and a list of available items \( I = \{i_1, i_2, \ldots, i_{|I|}\} \), a recommendation model can be built that predicts the suitability of, or interest in, item \( i \) for user \( u \) given a set of contexts \( c \). The context attributes are then used in the recommendation process to filter and uncover items that are probably of interest to the user in such a context, personalized according to the user’s preferences.

4.2 Constructing the required matrices

To build the matrices required for our recommendation model, we use the example illustrated in Fig. 4. As shown in Fig. 4, we have a list of users \( U = \{u_1, u_2, \ldots, u_5\} \), a list of items \( I = \{i_1, i_2, \ldots, i_{12}\} \), and a list of contexts \( C = \{c_1, c_2, \ldots, c_5\} \). We then construct three main matrices to build the base for our recommendation model: the context-user matrix \( \mathbf{T}_{|C|\times |U|} \), the user-item matrix \( \mathbf{R}_{|U|\times |I|} \), and the context-item matrix \( \mathbf{E}_{|C|\times |I|} \).

Fig. 4 A representation of the three dimensions of the recommendation problem

To build the matrix \( \mathbf{T}_{|C|\times |U|} \), let \( t(c_y, u_x) \) be the number of times a user \( u_x \) consumed items in context \( c_y \). Similarly, in the matrix \( \mathbf{E}_{|C|\times |I|} \), let \( e(c_y, i_x) \) be the number of times item \( i_x \) has been consumed in context \( c_y \). If a user has not consumed any items in a given context, or if an item has never been consumed by any user in a particular context, then \( t(c_y, u_x) = 0 \) or \( e(c_y, i_x) = 0 \), respectively. In addition, if we only consider the frequency of usage for a particular context within the scope of users or items, then the accuracy of the recommendation results might be affected by the number of users who repeatedly use items in a large variety of contexts. In that case, we would neglect how many users have consumed items within that context, as opposed to a small number of users who consumed many items in it. Therefore, we normalize the frequency values to a range between 0 and 1 using the following formulas:

$$ t\left({c}_y,{u}_x\right)=\left(\frac{n_{cu}\left({c}_y,{u}_x\right)}{N_{cy,u}}\right) $$
(1)
$$ e\left({c}_y,{i}_x\right)=\left(\frac{n_{ci}\left({c}_y,{i}_x\right)}{N_{cy,i}}\right) $$
(2)

where \( n_{cu}(c_y, u_x) \) is the number of occurrences of context \( c_y \) in the list of items consumed by \( u_x \), and \( n_{ci}(c_y, i_x) \) is the number of occurrences of context \( c_y \) in the list of contexts in which an item has been consumed (\( f_{y,x} \)), as in Eq. 3. \( N_{cy,u} \) and \( N_{cy,i} \) represent the number of times the context \( c_y \) is used with all items, and the number of times the context \( c_y \) is used by all users, respectively.

$$ {n}_{ci}\left({c}_y,{i}_x\right)=\sum_{y,x}\left({\delta}_{y,x}\,{f}_{y,x}\right),\qquad {\delta}_{y,x}=\begin{cases}1 & {c}_y\ \text{occurred in}\ {i}_x\\ 0 & \text{otherwise}\end{cases} $$
(3)
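To illustrate Eqs. 1 and 2, the sketch below builds the context-user and context-item frequency counts from a small consumption log and normalizes them per context. The log format of (context, user, item) triples, the helper names, and the sample data are assumptions made for illustration only; each context row is divided by the total number of times that context occurs in the log.

```python
from collections import Counter


def count_matrix(pairs, row_labels, col_labels):
    """Count how often each (row, column) pair occurs; rows follow row_labels."""
    counts = Counter(pairs)
    return [[counts[(r, c)] for c in col_labels] for r in row_labels]


def normalize_rows(matrix):
    """Divide each row by its total so entries lie in [0, 1]; all-zero rows stay zero."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


# Hypothetical consumption log: (context, user, item) triples.
events = [
    ("relaxed", "u1", "i1"),
    ("relaxed", "u1", "i2"),
    ("relaxed", "u2", "i1"),
    ("stressed", "u2", "i3"),
]
contexts = ["relaxed", "stressed"]
users = ["u1", "u2"]
items = ["i1", "i2", "i3"]

# Context-user matrix T (Eq. 1) and context-item matrix E (Eq. 2).
T = normalize_rows(count_matrix([(c, u) for c, u, _ in events], contexts, users))
E = normalize_rows(count_matrix([(c, i) for c, _, i in events], contexts, items))
```

In this toy log, user u1 accounts for two of the three consumptions in the relaxed context, so the corresponding entry of T is 2/3.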

Note that we also examined the use of binary values for T and E, but in this case we would not be able to show how often a particular item is used in a specific context. This is because, in the binary case, we would only know that an item has been consumed in that specific context.

The construction of the three matrices (T, R, and E) makes it possible to discover the latent association of items with a particular context and of users with contexts, and accordingly to identify relevant items for a user in a particular context.

5 The recommendation algorithm

In this section, we present the algorithm used to construct a context-aware recommendation model. The underlying idea is that users sharing an item, such as a piece of music, are likely to also share some hidden contextual information. Such contextual information is able to effectively describe the user’s preferences toward their selected items. The analysis of the available contextual information associated with consumed items enables the analysis of items consumed in similar contexts. In general, users who consume certain items in a given list of contexts are more likely to form a contextual pattern that bridges the information gaps between users and new items.

5.1 User-based collaborative filtering

Before we analyze the user’s context, the proposed recommendation algorithm identifies the user’s neighbors using a collaborative filtering technique [5]. The concept behind detecting similar users who share some items is to exploit the list of items consumed by given users in order to find other interesting items consumed by similar users (also called nearest neighbors). To determine the similarity between two users, we use cosine-based similarity. Cosine-based similarity is a widely used approach that takes the two item vectors of users \( u_x \) and \( u_y \) and quantifies their similarity according to the angle between them, as in Eq. 4. To minimize the computational cost, we consider only the top k nearest neighbors for each user. Accordingly, we eliminate the computed similarities of those users who share few items with others, and assign a zero similarity value if the similar user is not among the top k nearest neighbors. We employ the matrix \( \mathbf{S} \), where \( \mathbf{S} = {\mathbf{S}}^{\mathrm{k}} \), to form the user-user similarity matrix. Figure 5 shows an example of constructing the similarity matrix S.

Fig. 5 A representation of the user-user similarity matrix S

$$ {s}_{u_x,{u}_y}=\cos \left({u}_x,{u}_y\right)=\frac{u_x\cdot {u}_y}{{\left\Vert {u}_x\right\Vert}_2\,{\left\Vert {u}_y\right\Vert}_2} $$
(4)
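A minimal sketch of this step is given below: it computes the pairwise cosine similarity between the rows of a small user-item matrix and then keeps, for each user, only the k most similar other users. Two choices in the sketch are assumptions rather than details stated in the text: the diagonal (each user’s similarity to themselves) is retained, and ties between equally similar neighbors are broken by user index. The sample ratings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k_similarity(rows, k):
    """Pairwise cosine similarity between rows, zeroing all but each row's k nearest neighbors."""
    n = len(rows)
    sim = [[cosine(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    pruned = []
    for i in range(n):
        # Rank the other rows by similarity to row i (most similar first).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        keep = set(others[:k]) | {i}  # assumption: keep the self-similarity entry
        pruned.append([sim[i][j] if j in keep else 0.0 for j in range(n)])
    return pruned


# Hypothetical user-item ratings: 3 users (rows) by 3 items (columns).
R = [
    [5, 3, 0],
    [4, 0, 0],
    [0, 0, 2],
]
S_k = top_k_similarity(R, k=1)
```

Because pruning is applied row by row, the resulting matrix is not necessarily symmetric: a user can be among another user’s top k neighbors without the reverse being true.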

5.2 Item-based collaborative filtering

As we employ collaborative filtering to observe the user-user similarities, we employ the user-item matrix R to observe the item-item similarities. According to [10], the item similarity can be computed using collaborative filtering. The idea here is that a user is likely to consume items that are similar to some of what they have already seen in the past. The similarity values can be obtained by measuring the cosine angle between the two column vectors in the matrix R, similar to finding the user-user similarity according to Eq. 5.

$$ {b}_{i_x,{i}_y}=\cos \left({i}_x,{i}_y\right)=\frac{i_x\cdot {i}_y}{{\left\Vert {i}_x\right\Vert}_2\,{\left\Vert {i}_y\right\Vert}_2} $$
(5)

From the resulting similarities, we can form the item-item similarity matrix B. The similarity value \( {b}_{i_x, {i}_y} \) is retained only if \( i_y \) is among the top k nearest neighbors of \( i_x \); otherwise the similarity value is set to zero. Figure 6 shows an illustrated example of computing the item-item similarity.

Fig. 6 A representation of the item-item similarity matrix B

5.3 Extracting latent context-item preferences

Before recommending items to a user, we need to find the latent preferences of that user toward their current context. This can be done by analyzing the hidden preferences of users toward items in a given context. Accordingly, by finding the latent context-item preferences, we capture how often a particular context has occurred with the user’s selection of items that are similar to a given item. We use the matrix T and the transpose of the matrix B, both constructed earlier, to form the new context-item matrix Z. Formally, the matrix Z is the product of the normalized matrix T and the transpose of the top-k matrix B, as in Eq. 6:

$$ \mathbf{Z} = \overline{\mathbf{T}}{\left({\mathbf{B}}^{\mathrm{k}}\right)}^{\mathrm{T}} $$
(6)

where the matrix \( \overline{\mathbf{T}} \) denotes a normalized version of the matrix T, and the matrix \( {\left({\mathbf{B}}^{\mathrm{k}}\right)}^{\mathrm{T}} \) denotes the transposed similarity matrix of the top k nearest items, as explained in Section 5.2. The multiplication of the c-th row by the i-th column yields the latent preference of context c for item i with respect to the item’s k nearest neighbors. Figure 7 shows the details of constructing the new matrix Z.

Fig. 7 An illustration of the process of computing the latent context-item matrix Z

The reason for having normalized values in the matrix T is to reduce the effect of items that were consumed by many users. Without normalization, such items would contribute more to the context-based prediction score than items consumed by only a few users. Using normalization, we can minimize the effects of those items with regard to the detected contexts.
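The product in Eq. 6 can be sketched entry by entry as shown below. Because \( {\mathbf{B}}^{\mathrm{k}} \) is an item-item matrix, the left operand in this sketch has one column per item; we therefore assume that it holds normalized context-item frequencies, which is an interpretation made for the example rather than a statement of the model. The function name and the sample values are hypothetical.

```python
def latent_context_item(left_norm, B_k):
    """Multiply a (contexts x items) matrix by the transpose of the item-item matrix B_k.

    Entry Z[c][i] = sum over j of left_norm[c][j] * B_k[i][j].
    """
    n_items = len(B_k)
    return [
        [sum(row[j] * B_k[i][j] for j in range(n_items)) for i in range(n_items)]
        for row in left_norm
    ]


# Hypothetical normalized context-item frequencies: 2 contexts by 3 items.
left_norm = [
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical top-k item-item similarities; row 2 keeps item 1 as a neighbor,
# but not the other way round, so the transpose matters.
B_k = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.5, 1.0],
]
Z = latent_context_item(left_norm, B_k)
```

In this example, item 2 was never consumed in the first context, yet it receives a non-zero score there because one of its nearest neighbors was.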

5.4 Extracting latent item-user preferences

In this step, we capture the user’s latent preferences toward an item. The main idea is that users in a context consume certain items, and that when they are in the same context in the future, they will likely consume items that are either similar to their preferences or similar to the choices of their nearest neighbors. We use the matrix Y to represent the latent item preferences of a given user \( u_x \), which also include the item preferences of their similar users. We build the matrix Y according to Eq. 7 as follows:

$$ \mathbf{Y}={\left(\overline{\mathbf{R}}\right)}^{\mathrm{T}}{\left({\mathbf{S}}^{\mathrm{k}}\right)}^{\mathrm{T}} $$
(7)

where \( {\left(\overline{\mathbf{R}}\right)}^{\mathrm{T}} \) is the transpose of the normalized original rating matrix R (denoted as D) and \( {\mathbf{S}}^{\mathrm{k}} \) is the top k user-user similarity matrix. The product of the two matrices \( {\left(\overline{\mathbf{R}}\right)}^{\mathrm{T}}{\left({\mathbf{S}}^{\mathrm{k}}\right)}^{\mathrm{T}} \) combines the preferences of the user and their nearest neighbors for a given item.

It is also important at this point to consider the issue of having some users that are more active in rating and consuming different items than other inactive users. This leads to more contributions in the recommendation model from the active users compared to the less active users. Therefore, we normalize the values in matrix D, before the multiplication step, in order to reduce such contribution effects. Figure 8 shows the details of constructing the new matrix Y.
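Written out entry by entry, Eq. 7 weights each neighbor’s normalized rating of an item by that neighbor’s similarity to the target user. The sketch below follows this reading. Normalizing each user’s ratings by their sum is one possible way to damp the influence of very active users and is an assumption here, as are the function name and the sample values.

```python
def latent_item_user(R, S_k):
    """Compute Y = (normalized R) transposed, times S_k transposed.

    Entry Y[i][u] = sum over users v of R_norm[v][i] * S_k[u][v].
    """
    n_users, n_items = len(R), len(R[0])
    # Assumed normalization: divide each user's ratings by that user's total.
    R_norm = []
    for row in R:
        total = sum(row)
        R_norm.append([value / total if total else 0.0 for value in row])
    return [
        [sum(R_norm[v][i] * S_k[u][v] for v in range(n_users)) for u in range(n_users)]
        for i in range(n_items)
    ]


# Hypothetical ratings: 2 users by 3 items; user 0 is far more active than user 1.
R = [
    [5, 5, 0],
    [1, 0, 0],
]
# Hypothetical top-k user-user similarities.
S_k = [
    [1.0, 0.5],
    [0.5, 1.0],
]
Y = latent_item_user(R, S_k)
```

After normalization, the single rating of the less active user carries as much total weight as all the ratings of the active user combined.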

Fig. 8 An illustration of the process of computing the latent item-user matrix Y

5.5 Employing the latent collaborative models for recommendation

We take into consideration the user’s previous items, their nearest neighbors, as well as the similarity of items that have previously been consumed in the detected context. Items with a high ranking score will be recommended to the user. In addition, the recommended items reflect the user’s context within which it is recommended. In order to compute the items’ final rating scores, we use the two previously described models: the latent item-context model, and the latent item-user model. The association of these two models builds the required contextual bridge between users and items. Specifically, the proposed recommendation produces item recommendations relevant to a given context by extracting latent preferences. The calculation of the user-item ranking score is computed by:

$$ Ran{k}_{u,c}(i)=\sum_{c\in m}\alpha {\mathbf{Z}}_{c,i}\times {\mathbf{Y}}_{i,u} $$
(8)

Where \( {\mathbf{Z}}_{c,i} \) is the entry in the c-th context row and the i-th item column of the Z matrix, and \( {\mathbf{Y}}_{i,u} \) is the entry in the i-th row and the u-th column of the Y matrix. The parameter α is an attenuation factor, where α ∈ (0 … 1), that reduces the weight of a less sensitive context. The value of α is set experimentally; details are presented in the parameter tuning part of the performance evaluation in Section 7.5. Items with higher ranking scores are recommended to the user. Figure 9 presents an illustration of the user-item ranking score computation step. As Eq. 8 shows, the ranking score is not limited to a single context; if the user’s query contains multiple contexts, then the sum of the products over those contexts is the ranking score for that item. With regard to the example illustrated in Fig. 4, the final matrix does not have to be physically stored in the database; the ranking scores can be computed on demand by executing a query on the two matrices Z and Y, which reduces the model’s complexity.
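The computation in Eq. 8 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: Z is a contexts × items array, Y is an items × users array, and α is supplied per context through a dictionary. The function names and the toy values are hypothetical.

```python
import numpy as np

def rank_scores(Z, Y, user, contexts, alpha):
    """Eq. 8: sum over the query contexts of alpha * Z[c, i] * Y[i, user] for every item i."""
    Z = np.asarray(Z, dtype=float)
    Y = np.asarray(Y, dtype=float)
    scores = np.zeros(Z.shape[1])
    for c in contexts:
        scores += alpha[c] * Z[c, :] * Y[:, user]
    return scores

def top_n(scores, n):
    """Indices of the n highest-scoring items, best first."""
    return np.argsort(-scores, kind="stable")[:n].tolist()

# Toy data (made up): 2 contexts, 3 items, 2 users.
Z = [[0.2, 0.8, 0.0],
     [0.5, 0.1, 0.4]]
Y = [[1.0, 0.0],
     [0.5, 1.0],
     [0.0, 2.0]]
alpha = {0: 0.3, 1: 0.3}  # assumed: one attenuation weight per context

scores = rank_scores(Z, Y, user=0, contexts=[0, 1], alpha=alpha)
best = top_n(scores, 2)
```

Because the scores are computed from Z and Y at query time, no full user × item × context result needs to be stored, which matches the on-demand computation described above.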

Fig. 9 The final user-item ranking score

6 Architecture and system design

6.1 Application scenario

Today’s smartphone ecosystem has accelerated the development of context-aware recommender systems, particularly in observing user contexts such as weather, location, activity, and time. Smartphones also provide access to enormous multimedia collections and act as a bridge between different sensors and the recommendation engine. Hence, we developed a smartphone application that can be used effectively in a home environment. Consider a user who usually listens to music while studying and who has an exam the next morning. At home, the user uses the application’s ECG interface to capture their heart signal. Depending on the user’s stress level, the application suggests relaxing music according to the user’s preferences. In addition, as explained in the design of the recommendation algorithm, the application explores the preferences of the user’s social friends as well. In this scenario, the application not only recommends music but also adapts the recommendation result to fit the user’s context.

The application prototype is developed in an Android environment. The application can detect the date, time, location, song being played, play count, and heart signal, and it allows other parameters to be added to the collection, as shown in Fig. 10.

Fig. 10 A screenshot of the context-aware recommendation prototype interface

6.2 Design and architecture

The architecture of the introduced recommendation system consists of four main layers: the input/output interface layer, the context management layer, the client-local resources layer, and the server-cloud resource layer, as shown in Fig. 11. The client side is separated into client interfaces and a user-local server. The identification of the user’s context and the processing of all input parameters take place on the user-local side. In addition, the recommendation algorithm, including all required recommendation agents, runs on the user-local server. The server side has a cloud-based design to store additional multimedia resources and social profiles. Figure 12 shows the sequence of the main interactions for a user requesting a song. As a proof of concept, the prototype application synchronizes any media content stored within the user’s Dropbox account. The input/output interface layer handles the collection of the required contextual data and interacts with the user, which includes delivering the recommendation results. The context management layer identifies the context of the user by analyzing the retrieved sensory data. The client-local resources layer stores the user behavior, and selects and evaluates the different parameters that the recommendation algorithm needs in order to function. The Entity Relationship Diagram (ERD) of the client-local resources layer is shown in Fig. 13. The resource contents and the available social profiles are stored in a cloud-based repository.

Fig. 11 The architecture of the context-aware recommendation model

Fig. 12 Interaction diagram of a user requesting a song using the proposed application

Fig. 13 Client-Local Resources Layer Database ERD

7 Experimental evaluation

In this section, we investigate whether the proposed recommendation technique improves item prediction accuracy. The goal of the experiments is to evaluate the accuracy of the proposed approach, which utilizes the user’s context, when recommending different numbers of items. We varied the number of items retrieved each time to measure their relevance to the user’s request.

7.1 Dataset

Finding a publicly available dataset that carries contextual information is a major challenge, and this lack of availability hampers the design of any context-aware recommendation algorithm [40]. Therefore, we crawled our dataset from an online social music database, last.fm. Specifically, music information and annotation data were extracted from the last.fm website. Last.fm is an online social music radio resource that enables its users to subscribe, listen to, and tag their favorite albums, tracks, and artists. The crawled dataset contains 164 users, 626 tracks, 251 contextual tags, and 10,711 overall item-context assignments. Users of last.fm annotate albums and tracks with textual tags. Because users can give any textual description to their favorite tracks and albums, different words can be used to express the same meaning. For instance, four different users may tag item \( i_1 \) with four different tags that have the same meaning: “relaxing music”, “for relax”, “relax”, and “relaxation”. These tags can be grouped together under one annotation: “relaxing”.
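The tag grouping described above can be illustrated with a small lookup. The sketch assumes a hand-curated table of variants; how the groups were actually built is not stated in the text, so `TAG_GROUPS` and `canonical_tag` are hypothetical names that reuse only the “relaxing” example given above.

```python
# Hypothetical, hand-curated groups: canonical annotation -> variant spellings.
TAG_GROUPS = {
    "relaxing": {"relaxing music", "for relax", "relax", "relaxation"},
}

def canonical_tag(raw_tag, groups=TAG_GROUPS):
    """Map a raw user tag to its canonical annotation; unknown tags are only cleaned."""
    cleaned = " ".join(raw_tag.lower().split())  # lowercase and collapse whitespace
    for canonical, variants in groups.items():
        if cleaned == canonical or cleaned in variants:
            return canonical
    return cleaned

raw_tags = ["relaxing music", "For Relax", " relax ", "relaxation"]
grouped = {canonical_tag(t) for t in raw_tags}
```

After grouping, the four raw tags collapse to a single annotation, so the item receives one context assignment instead of four nearly identical ones.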

We also tested the performance of our proposed system on a larger dataset. We used the publicly available MovieLens dataset (www.movielens.org), which was used in [15]. The MovieLens dataset consists of 943 users, 1,682 movies, and 100,000 user-item ratings. This dataset does not have explicit contextual data, but it contains some information that we treat as context in a proof-of-concept experiment: genre (romance, action, adventure, etc.) and user information (age, gender, occupation, and zip code).

7.2 Comparison with other methods

In our evaluation, we present the detailed experimental results of our ranking method in comparison with other benchmark methods.

  • Popular Items (PopItems): As described in [19], the PopItems technique gives more prediction weight to items with a higher count value within a specific context \( c_y \).

  • Item Rank (ItemRank): This technique is a random walk scoring algorithm proposed in [12]. Using a user-item relationship graph, the algorithm estimates the probability of user u visiting item i in a random walk.

  • uMender: This recommendation technique is proposed by J. Su et al. [33]. The algorithm first creates a sub-matrix to find users and items that are similar in the same context condition. Then the algorithm obtains the positive and the negative preferences based on the available rating values. Finally, the algorithm finds frequencies in the negative and positive item sets and computes the related user-item prediction.

  • Collaborative and Content-based Technique (CCbT): Lops et al. [24] proposed a collaborative content-based tag recommendation algorithm. Because we deal with context by utilizing social tags, we compare this technique with our proposed model, as it also analyzes tag annotations.

7.3 Evaluation methodology

The evaluation procedure is divided into two parts: the first part measures the accuracy of the context-based recommendation prediction, using offline experiments on different datasets crawled from online multimedia databases; the second part evaluates the user’s satisfaction with the resulting context-based recommendations after using the proposed prototype application.

7.3.1 Offline experiment

For the experiments in this section, we follow the experimental procedure proposed in [41]. We randomly divided each of the two datasets into two groups: a training set that represents 80 % of the original dataset, and a test set made up of the remaining 20 %. The training set is used to train the recommendation model, while the test set contains randomly withheld items whose associated contexts are used as test queries for each user. Since the algorithm performance might be sensitive to the particular items chosen for the training or the test set, we repeated the run of the algorithms 5 times with different partitioning. The values reported in the experimental results section are the averages over those 5 runs, together with the standard deviation.
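The protocol above (a random 80/20 split, repeated 5 times, reporting mean and standard deviation) can be sketched with the Python standard library. The function names, the use of the run index as a random seed, and the `evaluate` callback are assumptions made for illustration only.

```python
import random
import statistics

def split_80_20(records, seed):
    """Shuffle a copy of the records and cut it into 80 % training and 20 % test."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def repeated_evaluation(records, evaluate, runs=5):
    """Run evaluate(train, test) on `runs` different partitions; return (mean, std)."""
    scores = [evaluate(*split_80_20(records, seed)) for seed in range(runs)]
    return statistics.mean(scores), statistics.stdev(scores)

# Placeholder records and metric; a real run would train and score a recommender.
records = list(range(100))
train, test = split_80_20(records, seed=0)
mean_score, std_score = repeated_evaluation(
    records, evaluate=lambda tr, te: len(te) / len(tr)
)
```

Seeding each run separately keeps the partitions different from one another while making every run reproducible.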

7.3.2 Online experiment

In addition to the offline dataset experiments, we also conducted a subjective user evaluation using the Android prototype application mentioned earlier. Given some contextual knowledge about the user, we computed the suitable items to be recommended according to the given context and displayed them to the user. In order to measure the effectiveness of our approach, the application also presents items retrieved using another algorithm that is not personalized to the user’s context. We used the collaborative filtering algorithm presented in [31] as this second recommendation technique. The goal of this experiment is to determine whether our context-aware prediction method can capture items more relevant to the user’s preferences than the non-context-aware technique.

Each user in the online experiment is asked to evaluate the ten pieces of music recommended in a certain context. Subjects go through different context scenarios, thus they evaluate the recommendation for different context conditions. Subjects browse the recommended music and answer if they like or dislike the music in such a context.

7.4 Evaluation metrics

To measure the algorithm’s retrieval accuracy in our offline experiment, we adopted precision and recall, which are widely used measures of the effectiveness of retrieved recommendations. Precision is the fraction of the recommended items that are relevant to the user, as in Eq. 9 [35]. Recall is the fraction of the items identified as relevant to the user that appear among the recommended items, as in Eq. 10 [35].

$$ precision=\frac{\left|\mathrm{recommended}\cap \mathrm{relevant}\right|}{\left|\mathrm{recommended}\right|} $$
(9)
$$ recall=\frac{\left|\mathrm{recommended}\cap \mathrm{relevant}\right|}{\left|\mathrm{relevant}\right|} $$
(10)
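Eqs. 9 and 10 translate directly into set operations. The following minimal sketch evaluates both at a cut-off k; the function name and the toy item identifiers are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Eqs. 9 and 10 evaluated on the top-k recommended items."""
    top = list(recommended[:k])
    hits = len(set(top) & set(relevant))
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

recommended = ["a", "b", "c", "d", "e"]  # ranked list, best first (made up)
relevant = {"a", "c", "x"}               # withheld test items (made up)

p5, r5 = precision_recall_at_k(recommended, relevant, k=5)
p1, r1 = precision_recall_at_k(recommended, relevant, k=1)
```

With these toy lists, two of the five recommended items are relevant, so precision at 5 is 2/5 and recall at 5 is 2/3.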

For the comparison of our method to the other four benchmark algorithms, we report the Mean Average Precision (MAP) by using Eq. 11, in addition to precision and recall.

$$ MAP=\frac{1}{\left|U\right|}{\displaystyle \sum_{u=1}^{\left|U\right|}}\frac{1}{t_u}{\displaystyle \sum_{n=1}^{t_u}}{P}_n\times {R}_n $$
(11)

Where \( t_u \) is the number of test cases for user u, \( P_n \) is the precision at top n, and \( R_n \) is a binary variable that equals 1 if the item at rank n is relevant and 0 otherwise [16]. The MAP reports the average precision at each top k result [39]. Note that we varied the number of items retrieved (top k values) to measure the ranking positions of the recommended items. For instance, the precision values are reported for each top k (k = 1, k = 5, and k = 10), which show how many of the top 1, top 5, and top 10 items are relevant.
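A small sketch of the MAP computation follows. It adopts the conventional reading of Eq. 11, in which the inner sum runs over the ranked positions of the returned list and \( t_u \) is the number of withheld test items of user u; this reading, the function names, and the toy data are assumptions.

```python
def average_precision(ranked, relevant):
    """Sum of P_n * R_n over the ranked list, divided by the number of test items."""
    hits, total = 0, 0.0
    for n, item in enumerate(ranked, start=1):
        if item in relevant:        # R_n = 1
            hits += 1
            total += hits / n       # P_n, the precision at rank n
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, test_items):
    """Average of the per-user average precision over all users."""
    users = list(rankings)
    return sum(average_precision(rankings[u], test_items[u]) for u in users) / len(users)

rankings = {"u1": ["a", "b", "c"], "u2": ["x", "y"]}   # made-up ranked lists
test_items = {"u1": {"a", "c"}, "u2": {"y"}}            # made-up withheld items
map_score = mean_average_precision(rankings, test_items)
```

For user u1 the relevant items sit at ranks 1 and 3, giving (1/1 + 2/3)/2; for u2 the single relevant item sits at rank 2, giving 1/2. The MAP is the mean of these two values.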

7.5 Parameter tuning

The proposed recommendation model uses α, an attenuation factor, where α ∈ (0 … 1) is used to reduce the weight factor of the contextual effects on the prediction scores. Prior to starting the experiments, we gave α equal values in all contexts in order to run an empirical study. By tuning this parameter, we may increase or decrease the influence of the context on the final scoring value given to an item. Hence, it is critical to correctly set the value of α to improve the recommendation performance.

We first measure the MAP and the Mean Reciprocal Rank (MRR) using different values of α on the Last.fm dataset and then on the Movielens dataset, as shown in Table 4. The MRR measure is computed using Eq. 12:

Table 4 The different weights assigned to α to measure its sensitivity in each dataset
$$ MR{R}_{\left(k=n\right)}=\frac{1}{\left|U\right|}{\displaystyle \sum_{u=1}^{\left|U\right|}}\left(\sum_{i\in \left({t}_u\cap {R}_u^n\right)}\frac{1}{r(i)}\right) $$
(12)

Where \( t_u \) is the set of test cases for user u, \( R_u^n \) is the set of top n returned records, and r(i) is the rank of item i in the returned list, with 1 ≤ r(i) ≤ n. The results in Table 4 show that the best MAP and MRR values are obtained when α = 0.3 on the Last.fm dataset and α = 0.5 on the Movielens dataset.
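Eq. 12 can be sketched as follows. The code follows the formula literally, summing the reciprocal rank of every test item that appears in a user's top n list and then averaging over users; the function name and the toy data are hypothetical.

```python
def mrr_at_n(rankings, test_items, n):
    """Eq. 12: per user, sum 1/r(i) over test items found in the top n; average over users."""
    total = 0.0
    for user, ranked in rankings.items():
        for rank, item in enumerate(ranked[:n], start=1):
            if item in test_items[user]:
                total += 1.0 / rank
    return total / len(rankings)

rankings = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}  # made-up ranked lists
test_items = {"u1": {"b"}, "u2": {"x", "z"}}                # made-up withheld items

mrr3 = mrr_at_n(rankings, test_items, n=3)
mrr1 = mrr_at_n(rankings, test_items, n=1)
```

Here u1 contributes 1/2 and u2 contributes 1 + 1/3 at n = 3, so the result is (1/2 + 4/3)/2 = 11/12. Note that, read literally, this sum can exceed 1 for a user with several test items near the top of the list.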

7.6 The effect of normalization

Prior to running the comparison experiments, we investigated the impact of matrix normalization on the evaluation metrics. We measured MAP and MRR after running the algorithm on normalized and non-normalized matrices. Specifically, we employed two different versions of the context-user matrix D and the context-item matrix E. Tables 5 and 6 report the improvement in MAP and MRR obtained with the normalized matrices over the non-normalized matrices D and E in each dataset. In addition, we conducted a statistical analysis to measure the significance of the improvement of the normalized approach over the non-normalized one. Specifically, we conducted two-tailed paired t-tests under the same conditions of D and E and for the same dataset. The results confirmed that normalization yields a significant improvement in both MAP and MRR values (Table 6).

Table 5 Effect of normalization on the Last.fm dataset
Table 6 Effect of normalization on the Movielens dataset
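The paired t-test mentioned above compares matched measurements, for example the same runs scored with and without normalization. The sketch below computes only the test statistic and its degrees of freedom with the standard library; the sample values are invented placeholders, not results from Tables 5 or 6, and the function name is hypothetical.

```python
import math
import statistics

def paired_t_statistic(baseline, treated):
    """Return (t, degrees_of_freedom) for a paired t-test on matched samples."""
    if len(baseline) != len(treated) or len(baseline) < 2:
        raise ValueError("need two equally sized samples with at least two pairs")
    diffs = [b - a for a, b in zip(baseline, treated)]
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Invented placeholder scores for five matched runs (not taken from the paper).
non_normalized = [0.10, 0.12, 0.11, 0.13, 0.09]
normalized = [0.12, 0.15, 0.12, 0.16, 0.10]

t_value, dof = paired_t_statistic(non_normalized, normalized)
```

The two-tailed p-value is then obtained by comparing |t| against a t-distribution with the returned degrees of freedom.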

7.7 Experimental results

7.7.1 Offline experiment

In this section, we present the results of the comparison between the performance of our approach and the recommendation techniques introduced in Section 7.2. As explained in Section 7.3, we computed the performance of each recommendation approach in accurately retrieving relevant items, as well as their ranking positions in the recommendation list. First, we evaluated the recommendation performance by calculating the precision and recall obtained by our approach and by the four alternative approaches on the Last.fm and Movielens datasets, as shown in Figs. 14 and 15.

Fig. 14 Precision and recall obtained by the 5 algorithms on the Last.fm dataset

Fig. 15 Precision and recall obtained by the 5 algorithms on the Movielens dataset

Figures 14 and 15 depict the precision-recall curves, which show that our proposed approach outperforms the baseline algorithms on both datasets. The number of retrieved items for a user’s request is plotted as the data points of each curve: the leftmost point denotes the top k = 1, whereas the rightmost point denotes the top k = 10. Our approach obtained approximately a 2 %, 3 %, 10 %, and 12 % precision improvement on the Last.fm dataset compared to CCbT, uMender, ItemRank, and PopItems respectively. Our approach also obtained approximately a 3 %, 1 %, 3 %, and 6 % precision improvement on the Movielens dataset compared to CCbT, uMender, ItemRank, and PopItems respectively.

When comparing the reported results, we observed that finding latent assignments of context to items and latent relations of users towards contexts reveals more relevant items than non-context-aware approaches. We also noticed that our approach is relatively affected by the number of context parameters attached to an item, which indicates that the more users consume an item in different contexts, the better the recommendation would be.

We next examine whether the types of users in the dataset affect the sensitivity of the recommendation algorithm. The reason is that both datasets contain users who rated only a few items in different contexts, as well as users who rated many items. Accordingly, we investigated the size of each user’s rating history, which is used for context discovery. We then divided the users in the datasets into three groups, denoted as Active, Normal, and Passive users, depending on the number of rated items. Users who rated 11 or more items are considered active users, and users who rated from five to ten items are considered normal users. Passive users (or cold-start users) are those who rated fewer than five items. We measure the algorithm’s sensitivity to the number of items rated per user. The resulting performance of each recommendation approach on the two datasets is shown in Figs. 16 and 17. As expected, all the algorithms were sensitive to the number of items rated by each user. In particular, the recommendation accuracy increased as the number of items rated by a user increased. Note also that all algorithms achieved considerably lower MAP for cold-start users, because the algorithms do not have enough history to feed the recommendation process. However, detecting the user’s context does help improve the recommendations in such cold-start cases.
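The grouping rule above can be written as a short function. The thresholds are the ones stated in the text; the function name and the sample rating counts are hypothetical.

```python
from collections import Counter

def activity_group(num_rated):
    """Assign a user to Active (11 or more), Normal (5 to 10), or Passive (fewer than 5)."""
    if num_rated >= 11:
        return "Active"
    if num_rated >= 5:
        return "Normal"
    return "Passive"

# Made-up rating counts per user.
rated_counts = {"u1": 25, "u2": 11, "u3": 10, "u4": 5, "u5": 4, "u6": 0}
groups = {user: activity_group(n) for user, n in rated_counts.items()}
group_sizes = Counter(groups.values())
```

Reporting MAP separately for each group, as in Figs. 16 and 17, then only requires filtering the evaluated users by their group label.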

Fig. 16 A comparison of MAP according to the number of items rated by a user in the Last.fm dataset

Fig. 17 A comparison of MAP according to the number of items rated by a user in the Movielens dataset

7.7.2 Online experiment

To provide insight into the performance of our approach on real users, we conducted an online experiment with invited subjects. Following the experimental setup introduced in Section 7.3.2, we invited 15 subjects to participate in evaluating our context-aware recommender prototype application. The subjects were adults, 7 males and 8 females, with an average age of 23.2 years. To eliminate the effects of other contextual parameters, such as the user’s age, education level, native language, and culture, we carefully selected participants who shared the same values for the parameters that are not included as contexts in our application. The experiment was performed outside the laboratory, and each user was given an Android phone to use. All phones needed access to the Internet in order to save the collected data on our cloud server. We used a smaller crawled online dataset, which contained 40 popular artists and 419 musical tracks. We then asked the subjects to rate at least 20 tracks and to tag each track with contextual tags according to their preferences. From these surveys, we obtained 343 item ratings and 1,747 context-item associations. Each subject was asked to listen to a track for one minute and then tag it with one or more words. We followed the same evaluation protocol that was used in similar studies such as the one by Cai et al. [6]. The collected rating and contextual data are used as the ground truth dataset from which the proposed model generates a recommendation list for a given subject. Afterwards, for each user, we detected six different context scenarios and ran the recommendation algorithm to produce 10 recommended items suitable for each particular context. Three of the scenarios reflect the three physiological context dimensions introduced in Section 3. For these, we repeatedly asked the subjects to perform three types of activities: performing a Stroop color-word test; sitting comfortably and trying to read an article; and sitting comfortably with their eyes closed while listening to relaxing music. After performing each activity, subjects were asked to provide their feedback about the resulting recommendation list. As for the other three contextual scenarios, the application detects them based on the information available as well as the information provided by the user. For instance, the date, time, and weather are detected right away by the Android application, while the user has to specify whether he/she is alone, with a partner, or with a family member. Subjects can optionally select a contextual condition, such as studying for an exam or having a romantic date. Based on the information collected, the application randomly chooses three different contexts to use as test queries for the evaluation.

Users could listen to each recommended item and answer whether they liked or disliked the recommended music in the given context. As a result, we received 900 responses indicating whether a user \( u_x \) liked or disliked the retrieved item \( i_y \) in a given context \( c_z \) for each recommendation algorithm. We then computed the average precision of each user over the 6 different contextual scenarios. The results are briefly summarized in Fig. 18. In addition, we conducted a statistical analysis to measure the significance of the improvement of our approach using the two-tailed paired t-test. Our approach achieved a statistically significant improvement over the non-context-aware approach at the 1 % level (p < 0.01).

Fig. 18 Average precision of recommendation retrieval

We further analyzed the subjects’ evaluation of each recommendation list with regard to the three physiological contextual queries. We conducted a two-tailed paired t-test on the responses of subjects in relaxed, stressed, and neutral situations. The tests on the relaxed and stressed conditions both indicated that users have distinct preferences under these two conditions. However, there were no significant differences in the results obtained in the neutral condition test. Hence, we conclude that other contextual information should be included in the query when the physiological condition of the subject does not differ from their normal benchmarks.

8 Conclusion

Sensors attached to smartphones are becoming widely used to support the interactions between the user and different context-aware recommendation systems. In this paper, we identify the user’s context and explore the use of physiological data to enhance the recommendation process. We also demonstrate the importance of using contextual information to provide enhanced recommendation quality and to increase the level of interaction between users and their preferred multimedia contents. Moreover, the advantage of our proposed recommendation model is that it incorporates contextual information by using the social tags available online to explore the latent contexts assigned to items, and by applying CF to find latent context preferences from similar users. Additionally, the proposed model can search and rank items without the need to analyze the item’s content, such as the music lyrics or voice signal, to predict the associated context. The experimental results demonstrate that the proposed context-aware recommendation technique enhances the accuracy of the prediction and provides suitable items for the user’s context.

To improve the recommendation quality, in future work we will address the issue of providing recommendations to a group of users rather than only to individuals. Additionally, we intend to investigate the proposed recommendation approach with additional context parameters on larger datasets. For instance, we can extend this work to different types of resources such as movies, books, and news.