1 Introduction

Recommender systems (RS) help to tackle the information overload issue since they allow users to quickly and easily identify items matching their needs [20]. For several years, RS have been widely adopted by various commercial platforms to enhance the user experience by providing customers with products (movies, books, shoes, etc.) that fit their expectations. RS are mainly organized into content-based methods and Collaborative Filtering (CF)-based methods [53]. Content-based approaches exploit extra information on items and the user profile to perform the recommendation. However, these approaches perform poorly when the target user profile is sparse or when the system does not have enough extra information (user reviews, document content, etc.) on items. Collaborative Filtering (CF) methods are widely adopted in RS since they overcome the limitations of content-based approaches by building the recommendation model on the experiences of a set of users that share similar tastes and needs [34]. For this purpose, interactions between the items and the users of this set help to perform the recommendation without requiring any extra information on items. CF-based methods are categorized into memory-based approaches and model-based approaches. Memory-based algorithms are simple to implement but inaccurate on large datasets, whereas model-based methods are harder to master but remain highly accurate even on large datasets [7].

In the literature, model-based approaches appear as the most widely used algorithms among CF-based methods. They use machine learning techniques to model users’ interests. Model-based approaches are popular due to their valuable recommendation performances. However, they deal poorly with issues such as data sparsity and cold-start problems that can severely affect recommendation quality. Among model-based methods, Matrix Factorization (MF) is a very popular technique largely adopted in several recent powerful RS. This technique consists in decomposing the original high-dimensional data matrix into low-rank latent factor matrices [22]. However, in order to perform the prediction, this decomposition process involves a linear dot product between latent factor matrices that does not accurately model the complex interactions that happen between users and items [1, 11]. Several studies [7, 25, 32, 35] try to tackle this insufficiency of the MF process by adding bias terms to the loss function, also known as the interaction function [11]. Despite the incorporation of bias terms in the interaction function, the linearity of the inner product of the matrix factorization is not completely mitigated. To remedy this, the nonlinearity property of deep neural networks shows encouraging promise.

Thanks to their nonlinearity, deep neural networks appear as a powerful technique to overcome the limitations of the simple inner product involved in the MF process. More recently, Deep Neural Networks (DNN) have proved their effectiveness in several domains such as computer vision, speech recognition, and text processing [8, 9, 14, 38], but they remain under-exploited in recommender systems (RS) compared to the rich literature on MF-based recommender methods. Some recent studies have proposed recommendation approaches based on neural networks, but these methods are built on the baseline matrix factorization model that does not effectively alleviate the issue of the simplicity and linearity of the inner product of latent factors in the interaction function. Consequently, some complex interactions and relationships between users and items are not accurately modeled. To remedy these limitations, we propose a model-based recommendation approach that uses a matrix factorization technique in which the simplicity of the inner product of latent factors is alleviated by a twofold regularization of the interaction function. Moreover, the nonlinearity of complex user-item interactions is captured by employing deep neural networks (DNN).

The recognized effectiveness of Matrix Factorization-based models remains limited because complex interactions between users and items are not accurately modelled by the inner dot product involved in the factorization process. We aim to address this limitation by leveraging the contribution of neural networks. Our proposal is a recommendation method that combines the effectiveness of an improved matrix factorization and the power of deep neural networks. The contributions of this paper are as follows:

  • A matrix factorization based model is proposed and enhanced through a twofold regularization in order to significantly reduce the impact of the linearity of the inner product of latent factors, and therefore increase the accuracy of the user-item interaction modelling.

  • A deep neural network architecture made of a multilayer perceptron and a single-layer perceptron is built to effectively address the linearity issue inherent to the dot product of latent factor matrices in the matrix factorization process.

  • Extensive experiments are conducted on several real-world datasets to highlight the effectiveness of the proposed method and the valuable contribution of deep learning to the recommendation task.

The remainder of this paper is structured as follows: Sect. 2 reviews state-of-the-art recommendation methods, Sect. 3 details the proposed recommendation approach, Sect. 4 presents the conducted experiments and discusses the obtained results, and Sect. 5 concludes this paper and presents our perspectives.

2 Related work

According to the literature, Collaborative Filtering (CF)-based methods are very popular in Recommender Systems (RS). They are mainly categorized into memory-based and model-based approaches [22]. In this section, we survey several CF-based methods and the promises of deep learning applied to RS.

2.1 Memory-based methods

Memory-based methods are easy to understand and simple to implement [53]. They are mainly based on a neighborhood computation process that evaluates inter-user relationships in order to identify like-minded users. Authors in [4] propose a recommendation model that performs reliable recommendations according to three views of reliability measures, namely user-based, item-based, and rating-based reliability measures. In this method, an initial rating prediction is made and is thereafter improved through a neighborhood improvement mechanism for unreliably predicted ratings. Authors in [58] propose a recommendation model that uses an adaptive similarity measure in order to alleviate the data sparsity issue. For this purpose, their similarity measure incorporates an online context-based multi-armed bandit mechanism. Authors in [3] propose a social collaborative filtering approach based on an adaptive neighborhood selection mechanism. A reliability measure evaluates the recommendation credibility, and a confidence model drops unreliable users from the initial neighborhood. The neighborhood is continuously updated in order to refine the prediction. Authors in [2] propose a social recommendation model based on user profiles improved by using virtual ratings. For this purpose, they compute the minimum number of required ratings through a probabilistic mechanism. Their method ensures reliable recommendations thanks to reliable user rating profiles. Authors in [16, 56] propose a recommender system that relies on inter-user similarities computed by using the Spearman Rank Correlation Coefficient (SRCC). They perform predictions as averages of weighted ratings from like-minded users.

Although simple and affordable, memory-based methods deal poorly with large datasets. In addition, they are highly sensitive to the data sparsity issue. Model-based methods have been developed to overcome these limitations.

2.2 Model-based methods

Model-based methods offer the advantage of accuracy and scalability. Several recent high-performance RS [13] use them to efficiently model user-item interactions. Those methods are most often based on Bayesian networks [23, 33, 46, 48, 52], clustering CF [5], latent semantic CF [42] and matrix factorization techniques [30, 32, 36, 40]. Among model-based approaches, those based on matrix factorization are the most popular [28, 57]. Matrix factorization (MF) is a dimensionality reduction technique that decomposes an original high-rank matrix into low-rank latent factor matrices. Afterwards, an optimization method helps to minimize the interaction function. Considering user-item interactions modeled by low-rank latent factor matrices, authors in [22] propose MF-based methods to alleviate the data sparsity problem, while authors in [18] propose an MF-based recommender framework for online recommendation. Authors in [28, 32, 35] propose non-negative MF-based models, while authors in [36] incorporate social information into their recommender framework to effectively feed the MF-based RS. In [29], to address the high computational cost shown by existing recommender models, the authors propose the Nonnegative Latent Factor model (ANLF) based on the Alternating Direction Method (ADM). Their proposal is accurate and highly scalable while showing a low complexity. Authors in [24] develop a trust metric model that is incorporated in the regularization of the matrix factorization process. Similarly, authors in [10] develop a recommendation model that integrates trust social information into the latent feature extraction process performed by using Singular Value Decomposition (SVD). Their proposal is scalable and performs reliable recommendations. Authors in [17] propose an adaptive learning rate function that improves the SVD++ recommendation algorithm. Their proposal enhances recommendation performances while ensuring high scalability for large datasets.

Despite the efficiency and scalability of MF-based methods, their performance is limited because the linear dot product of latent factors involved in the interaction function does not efficiently model complex user-item interactions. To alleviate this limitation, bias terms are most often added to the interaction function. This idea is useful since a well-defined interaction function contributes to improving the recommendation quality. However, it is not enough to completely overcome the insufficiencies of the inner product of latent factors. To address the linearity issue of the inner product in the MF process, some research works exploit the nonlinearity property of deep neural networks.

2.3 Deep learning in recommender systems

For several years, deep learning has succeeded in solving complex tasks. Applied to recommender systems, deep learning brings tremendous opportunities by overcoming limitations of state-of-the-art recommender models [55, 59]. Authors in [54] propose a recommendation model based on a deep neural network. They perform a matrix factorization by using Quadric Polynomial Regression in order to reduce the original user data matrix into low-rank latent feature matrices. In [11, 19, 50], the authors propose recommender systems based on a fusion of a baseline matrix factorization model and a deep neural architecture. Authors in [21, 44] couple a Bayesian approach with a neural architecture to improve recommendation performances. Authors in [39] predict visitor shopping intent by using multilayer perceptron and long short-term memory (LSTM) recurrent neural networks. Their proposal is an accurate and scalable purchasing predictor that supports effective recommendations. Authors in [12] overcome flaws of statistical measures such as the Pearson Correlation Coefficient (PCC) and cosine similarity by proposing a neural attentive model that learns the relative importance of historical items in a user profile and therefore improves recommendation performances. In [26], the authors propose an app recommendation method based on hierarchical neural networks. To accurately model complex user interactions, their model is developed according to different views, namely feature-level attention and view-level attention. Authors in [41] propose a deep collaborative filtering based recommendation model. They use Latent Dirichlet Allocation (LDA) for feature extraction from user reviews in order to compute user similarities. Thereafter, they perform recommendations based on matrix factorization coupled with deep learning. Authors in [6] mine user sentiments from user reviews on e-commerce websites to improve recommendation performances. For this purpose, they model fine-grained user-item interactions by using deep neural networks via an LSTM encoder in order to enhance sentiment-aware representations. Authors in [45] propose a hybrid neural architecture to predict ratings. Their architecture combines an autoencoder and a multilayer perceptron. The autoencoder extracts latent features from both users and items, while the multilayer perceptron models nonlinear and intricate user-item interactions. Authors in [15] propose a deep hybrid recommendation model that consists of a Stacked Denoising Autoencoder-Factorization Machine (SDAE-FM) module for latent feature extraction, a deep neural network module that effectively captures complex nonlinear user-item interactions, and a metric learning module that assesses the relationship between users and items. Besides the efficiency of neural networks and deep learning, important aspects such as security and privacy need to be handled when using neural structures [27].

With its vast capabilities, deep learning brings tremendous flexibility to recommender systems since it can be associated with various conventional models such as matrix factorization, factorization machines, and sparse linear models. Furthermore, its nonlinearity property enables the modeling of intricate user-item interactions that are not linear. Indeed, by using nonlinear activation functions, deep neural networks can accurately capture complex user-item interaction patterns and therefore significantly enhance recommendation performances. However, latent feature extraction is the crucial step prior to capturing nonlinear user-item interactions, and it therefore needs to be performed effectively. The proposed model merges the effectiveness of an enhanced matrix factorization with the power of deep neural networks in order to refine recommendations. The improved matrix factorization performs the latent feature extraction, while the neural architecture models the complex user-item interactions.

The next section presents the proposed method.

3 The proposed recommendation method

In this section, we present a recommendation model and its specifications. The proposed model is an association between an improved Matrix Factorization (MF) model and a novel deep neural architecture.

The next subsection details specifications of our proposal.

3.1 dualDeepMF model presentation

The proposed method relies on a doubly-regularized Matrix Factorization (MF) model that is used for latent feature representation. This MF model is biased by including reliable user-neighborhood terms in order to accurately model user-item interactions. Thereafter, we exploit the nonlinearity property of deep neural networks in order to alleviate the limitations of the linear dot product performed during the MF process. The latent features issued from the MF process feed a deep neural network (see Fig. 1). This neural network is mainly a Multilayer Perceptron (MLP) that consists of an input layer \(L_{in}\) fed by latent features, several hidden layers that provide the nonlinearity of the neural architecture, and an output layer \(L_{out}\). Thereafter, at the merging layer \(L_{merge}\), the outputs of the MLP model are combined with the outputs of the MF model in order to perform the final prediction, and the predicted score is assessed against training instances.

Fig. 1 dualDeepMF Model

The next subsection details the proposed MF model.

3.2 Matrix factorization model

The system hosts the set U of m users that interact with at most n items belonging to the set I. Users \(u \in U\) rate items \(i \in I\) by assigning them values \(r_{ui}\) that express the intensity of the user’s satisfaction. We define the data matrix \(R = {\left[ {{r_{ui}}} \right] _{m \times n}}\) that contains all ratings assigned by users to items. Matrix factorization assumes that the matrix R can be approximated by low-rank latent feature matrices as follows:

$$\begin{aligned} R\approx PQ \Leftrightarrow r_{ui}\approx \sum \limits _{f \in F} {{p_{uf}}{q_{fi}}} , \end{aligned}$$
(1)

where \(P = {\left[ {{p_{uf}}} \right] _{m \times f}}\) is the user latent feature matrix, \(Q = {\left[ {{q_{fi}}} \right] _{f \times n}}\) is the item latent feature matrix and \(f \ll min(m,n)\) is the number of latent features. We slightly modify the approximation expression by adding a bias term \(b_{ui}\) that compensates for user-interaction variations. The approximation formula is reexpressed as follows:

$$\begin{aligned} r_{ui}\approx b_{ui}+\sum \limits _{f \in F} {{p_{uf}}{q_{fi}}} , \end{aligned}$$
(2)

From the factorization process, the loss function is defined as follows:

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _{P,Q} \frac{1}{2}||R - PQ||_F^2\\ \Leftrightarrow \mathop {\min }\limits _{P,Q} \frac{1}{2}\sum \limits _{u \in U} \sum \limits _{i \in I} G_{ui}\Big (r_{ui} - b_{ui} - \sum \limits _{f \in F} p_{uf}q_{fi}\Big )^2 \end{array}, \end{aligned}$$
(3)

where \(||.||_F\) is the Frobenius norm, \(G_{ui}\) is an indicator function equal to 1 when a user rates an item and 0 otherwise.
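As a small illustration, the masked reconstruction error of Eq. (3) can be written in a few lines of NumPy; the bias matrix b used here simply collects the \(b_{ui}\) terms and is an assumption of this sketch.

```python
import numpy as np

def mf_loss(R, G, P, Q, b):
    """Masked squared reconstruction error of Eq. (3), without regularization.

    R: m x n ratings, G: m x n indicator (1 where rated), P: m x f, Q: f x n,
    b: m x n matrix gathering the bias terms b_ui (an assumption of this sketch).
    """
    residual = R - b - P @ Q          # r_ui - b_ui - sum_f p_uf * q_fi
    return 0.5 * np.sum(G * residual ** 2)
```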

A regularization parameter \(\lambda\) is incorporated in the error function in order to alleviate the overfitting problem. For this purpose, the loss function is reexpressed as follows:

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _{P,Q} \frac{1}{2}\sum \limits _{u \in U} \sum \limits _{i \in I} G_{ui}\Big (r_{ui} - b_{ui} - \sum \limits _{f \in F} p_{uf}q_{fi}\Big )^2\\ + \lambda \Big (\sum \limits _{u \in U} ||P_u||^2 + \sum \limits _{i \in I} ||Q_i||^2 + \sum \limits _{u \in U} \sum \limits _{i \in I} b_{ui}^2\Big )\\ \\ \Leftrightarrow \mathop {\min }\limits _{P,Q} \frac{1}{2}\sum \limits _{u \in U} \sum \limits _{i \in I} G_{ui}\Big (r_{ui} - b_{ui} - \sum \limits _{f \in F} p_{uf}q_{fi}\Big )^2\\ + \lambda \Big (\sum \limits _{u \in U} \sum \limits _{f \in F} p_{uf}^2 + \sum \limits _{i \in I} \sum \limits _{f \in F} q_{fi}^2 + \sum \limits _{u \in U} \sum \limits _{i \in I} b_{ui}^2\Big ), \end{array} \end{aligned}$$
(4)

where \(P_u\) and \(Q_i\) are respective latent feature vectors of user u and item i.

We perform a second regularization in order to refine the modeling of user-item interactions. For this purpose, we integrate the neighborhood effect into the loss function. Indeed, like-minded users influence each other since they enjoy the same items. Those users are determined after a similarity assessment that is performed by using a weighted Pearson Correlation Coefficient (PCC) computed as follows:

$$\begin{aligned} sim_{uv}= {\frac{JCC_{uv}*{\sum \limits _{i \in {I_u} \cap {I_v}} {(r_{ui} - \overline{{r_u}} )(r_{vi} - \overline{{r_v}} )} }}{{\sqrt{\sum \limits _{i \in {I_u} \cap {I_v}} {{{(r_{ui} - \overline{{r_u}} )}^2}} } \sqrt{\sum \limits _{i \in {I_u} \cap {I_v}} {{{(r_{vi} - \overline{{r_v}} )}^2}} } }}}, \end{aligned}$$
(5)

where \(\overline{{r_u}}\) and \(\overline{{r_v}}\) are rating averages of both users u and v respectively. \(JCC_{uv}= \frac{{|{I_u} \cap {I_v}|}}{{|{I_u}| + |{I_v}| - |{I_u} \cap {I_v}|}}\) is the Jaccard correlation coefficient that enables the impact of co-rated items in the PCC measure.
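As an illustration, a minimal Python sketch of this Jaccard-weighted PCC could look as follows; the rating averages are taken here over the co-rated items, and the dictionary-based user profiles are an assumption of the sketch rather than a data structure prescribed by the paper.

```python
import numpy as np

def weighted_pcc(ratings_u, ratings_v):
    """Jaccard-weighted Pearson correlation between two users (Eq. 5).

    ratings_u, ratings_v: dicts mapping item id -> rating.
    Returns 0.0 when the users share no items or when a variance is zero.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    # Jaccard coefficient over the two item sets (impact of co-rated items)
    jcc = len(common) / (len(ratings_u) + len(ratings_v) - len(common))
    ru = np.array([ratings_u[i] for i in common])
    rv = np.array([ratings_v[i] for i in common])
    du, dv = ru - ru.mean(), rv - rv.mean()
    denom = np.sqrt((du ** 2).sum()) * np.sqrt((dv ** 2).sum())
    if denom == 0:
        return 0.0
    return jcc * float((du * dv).sum() / denom)
```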

The neighborhood \(Near=\{v \in U|sim_{uv} \ge \gamma \}\) is defined as the set of like-minded users that share similar interests in the same items. It can be valuable to select only users that can effectively refine the prediction process. For this purpose, we define a reliability measure that ensures strong user profiles and therefore a reliable neighborhood.

3.2.1 Reliable neighborhood selection

Let \(C_{uv}\) be the reliability value between users u and v. \(C_{uv}\) expresses how reliable v is according to u. It is computed as follows:

$$\begin{aligned} C_{uv}= 1-\frac{{\sum \limits _{i \in {I_u} \cap {I_v}} {\frac{{|r_{ui} - r_{vi}|}}{{|{I_u} \cap {I_v}|}}}}}{{r_{max}}}, \end{aligned}$$
(6)

where \(I_u\) and \(I_v\) are the sets of items selected by u and v, respectively, and \(r_{max}\) is the maximum rating value. The reliability between u and v decreases when the gap between \(r_{ui}\) and \(r_{vi}\) increases, which translates a divergence of interests between u and v. Conversely, the reliability between u and v increases as the gap between \(r_{ui}\) and \(r_{vi}\) decreases.

We consider users as nodes of a graph in which two users who have co-rated items are linked by an edge. We also consider indirectly reliable users: if v is reliable to u and w is reliable to v, then w is indirectly reliable to u [3]. This indirect reliability evaluation strengthens the resilience of our system to data sparsity. We assess indirect reliability scores as follows:

$$\begin{aligned} C_{uv}= 1-\frac{{d_{uv}+\varepsilon }}{{d_{max}}}, \end{aligned}$$
(7)

where \(d_{uv}\) is the trust propagation distance [31] that refers to the number of nodes (users) existing between u and v, \(d_{max}\) is the maximum allowable distance between two users, and \(\varepsilon \ll 1\) is a positive value that avoids a reliability value equal to 0. Looking at the reliability measure in Eq. (7), the reliability between u and v decreases when \(d_{uv}\) increases and vice-versa.

The reliability measure is finally updated in order to integrate both the direct and indirect aspects of the reliability between users. The reliability measure is reexpressed as follows:

$$\begin{aligned} C_{uv}= (1-\frac{{\sum \limits _{i \in {I_u} \cap {I_v}} {\frac{{|r_{ui} - r_{vi}|}}{{|{I_u} \cap {I_v}|}}}}}{{r_{max}}})(1-\frac{{d_{uv}+\varepsilon }}{{d_{max}}}). \end{aligned}$$
(8)

Given the reliability measure computed above, the reliable neighborhood is defined as \(Near_{rel}=\{v \in U|sim_{uv} \ge \gamma\) and \(C_{uv} \ge \delta \}\).
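A possible implementation of the reliability measures and of the reliable neighborhood selection is sketched below; it reuses the weighted_pcc helper from the previous sketch, and the threshold values gamma and delta, the maximum distance d_max and the dictionary-based inputs are illustrative assumptions rather than values from the paper.

```python
def direct_reliability(ratings_u, ratings_v, r_max=5.0):
    """Direct reliability C_uv from rating gaps over co-rated items (Eq. 6)."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    mean_gap = sum(abs(ratings_u[i] - ratings_v[i]) for i in common) / len(common)
    return 1.0 - mean_gap / r_max

def indirect_reliability(d_uv, d_max=6, eps=1e-3):
    """Indirect reliability from the trust propagation distance d_uv (Eq. 7)."""
    return 1.0 - (d_uv + eps) / d_max

def reliability(ratings_u, ratings_v, d_uv, r_max=5.0, d_max=6, eps=1e-3):
    """Combined direct and indirect reliability (Eq. 8)."""
    return (direct_reliability(ratings_u, ratings_v, r_max)
            * indirect_reliability(d_uv, d_max, eps))

def reliable_neighborhood(u, users, ratings, distances, gamma=0.5, delta=0.5):
    """Near_rel = {v : sim_uv >= gamma and C_uv >= delta}; thresholds are assumed."""
    return [v for v in users if v != u
            and weighted_pcc(ratings[u], ratings[v]) >= gamma
            and reliability(ratings[u], ratings[v], distances[(u, v)]) >= delta]
```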

3.2.2 Matrix factorization regularization

Like-minded users from the reliable neighborhood tend to enjoy similar items. Since user actions on items are described by latent feature spaces, it can be intuitively considered that the divergence between latent vectors needs to be minimized [51] as follows:

$$\begin{aligned} \mathop {\min } \frac{1}{2}{\sum \limits _{v \in Near_{rel}} {{sim_{uv}.C_{uv} }}||P_u-P_v||^2_F} , \end{aligned}$$
(9)

where \(P_u\) and \(P_v\) are respective latent feature vectors of users u and v.

Given the reliable neighborhood-based regularization, the loss function is updated as follows:

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _{P,Q} \frac{1}{2}\sum \limits _{u \in U} \sum \limits _{i \in I} G_{ui}\Big (r_{ui} - b_{ui} - \sum \limits _{f \in F} p_{uf}q_{fi}\Big )^2\\ + \lambda \Big (\sum \limits _{u \in U} ||P_u||^2 + \sum \limits _{i \in I} ||Q_i||^2 + \sum \limits _{u \in U} \sum \limits _{i \in I} b_{ui}^2\Big ) \\ +\omega \sum \limits _{u \in U}\sum \limits _{v \in Near_{rel}} sim_{uv}.C_{uv}\,||P_u-P_v||^2_F\\ \\ \Leftrightarrow \mathop {\min }\limits _{P,Q} \frac{1}{2}\sum \limits _{u \in U} \sum \limits _{i \in I} G_{ui}\Big (r_{ui} - b_{ui} - \sum \limits _{f \in F} p_{uf}q_{fi}\Big )^2\\ + \lambda \Big (\sum \limits _{u \in U} \sum \limits _{f \in F} p_{uf}^2 + \sum \limits _{i \in I} \sum \limits _{f \in F} q_{fi}^2 + \sum \limits _{u \in U} \sum \limits _{i \in I} b_{ui}^2\Big ) \\ +\omega \sum \limits _{u \in U} \sum \limits _{v \in Near_{rel}} sim_{uv}.C_{uv}\sum \limits _{f \in F}(p_{uf}-p_{vf})^2, \end{array} \end{aligned}$$
(10)

where \(\omega\) is an additional parameter that enables the control of the effect of the neighborhood-based regularization.

The interaction function is optimized by using Stochastic Gradient Descent (SGD) since it is a widely adopted optimization method [57]. This method iteratively updates the latent feature spaces along the gradient direction of the loss function. The update rules are expressed as follows:

$$\begin{aligned} \begin{array}{l} \frac{{\partial D}}{{\partial P}} = \sum \limits _{u \in U} \sum \limits _{i \in I} G_{ui}(\tilde{r}_{ui} - r_{ui}) \sum \limits _{f \in F} q_{fi} + \lambda \sum \limits _{u \in U} \sum \limits _{f \in F} p_{uf}\\ + \omega \sum \limits _{u \in U} \sum \limits _{v \in Near_{rel}} sim_{uv}.C_{uv}\sum \limits _{f \in F} (p_{uf} - p_{vf}) \end{array}, \end{aligned}$$
(11)

where D corresponds to the loss function,

$$\begin{aligned}&\frac{{\partial D}}{{\partial Q}} = \sum \limits _{u \in U} \sum \limits _{i \in I} G_{ui}(\tilde{r}_{ui} - r_{ui}) \sum \limits _{f \in F} p_{uf} + \lambda \sum \limits _{i \in I} \sum \limits _{f \in F} q_{fi}, \end{aligned}$$
(12)
$$\begin{aligned}&\frac{{\partial D}}{{\partial b}} = \sum \limits _{u \in U} \sum \limits _{i \in I} G_{ui}(\tilde{r}_{ui} - r_{ui}) +\lambda \sum \limits _{u \in U} \sum \limits _{i \in I} b_{ui}, \end{aligned}$$
(13)

where \(\tilde{r_{ui}}={b_{ui}} + \sum \limits _{f \in F} {{p_{uf}}{q_{fi}}}\).

The optimization process starts with the initialization of P and Q with random positive values. The optimal latent feature matrices that minimize the interaction function are retrieved at the convergence of the algorithm. Iterations are performed following the update rules in Eqs. (11)–(13) and a learning rate \(\alpha\).

Algorithm 1 summarizes the latent feature extraction process.

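The following Python sketch illustrates how Algorithm 1 could be implemented with SGD updates derived from Eqs. (11)–(13); the value of omega, the number of epochs and the dictionary-based neighborhood inputs are assumptions of this sketch, and the bias update follows the standard squared-error gradient.

```python
import numpy as np

def train_dual_mf(R, G, near_rel, sim, rel, f=10, alpha=0.001, lam=0.01,
                  omega=0.01, n_epochs=100, seed=0):
    """Sketch of Algorithm 1: doubly-regularized MF trained with SGD.

    R        : m x n rating matrix, G : indicator matrix (1 where rated),
    near_rel : dict mapping a user to its reliable neighbours,
    sim, rel : dicts mapping (u, v) to similarity and reliability scores.
    omega and n_epochs are assumed values, not reported in the paper.
    """
    np.random.seed(seed)
    m, n = R.shape
    P = 0.1 * np.random.rand(m, f)        # user latent factors (positive init)
    Q = 0.1 * np.random.rand(f, n)        # item latent factors
    b = np.zeros((m, n))                  # per-interaction bias terms

    users, items = np.nonzero(G)
    for _ in range(n_epochs):
        for u, i in zip(users, items):
            pu = P[u].copy()
            err = (b[u, i] + pu @ Q[:, i]) - R[u, i]          # tilde_r - r
            # neighbourhood term pulls P_u towards its reliable neighbours
            neigh = sum(sim[(u, v)] * rel[(u, v)] * (pu - P[v])
                        for v in near_rel.get(u, []))
            P[u]    -= alpha * (err * Q[:, i] + lam * pu + omega * neigh)
            Q[:, i] -= alpha * (err * pu + lam * Q[:, i])
            b[u, i] -= alpha * (err + lam * b[u, i])
    return P, Q, b
```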

3.3 Deep neural network model

The proposed model is a feedforward neural network that consists of an input layer \(L_{in}\), several hidden layers \(L_k\), an output layer \(L_{out}\) and a merging layer \(L_{merge}\). The neural network is fed with latent features in order to predict a score z in a first step. Thereafter, at the merging layer \(L_{merge}\), the prediction of the multilayer perceptron is combined with the outcomes of the doubly-regularized MF model in order to predict the score \({\tilde{x}}\), which is assessed against the training instance x.

The input layer \(L_{in}\) of the neural network is fed by latent features and the input vector is obtained as follows:

$$\begin{aligned} {y_0} = {P_u} \otimes {Q_i}, \end{aligned}$$
(14)

where \(\otimes\) is the dot product operator, \(P_u\) is the user latent vector and \(Q_i\) the item latent vector. The layer \(L_1\) is fed by \(y_0\) and the output vector \(y_1\) of this first hidden layer is computed as follows:

$$\begin{aligned} {y_1} = \rho _{1} (w_1y_0+a_1), \end{aligned}$$
(15)

where \(w_1\) is the weight matrix between the input layer and the first hidden layer \(L_1\), \(a_1\) is the bias vector of layer \(L_1\), and \(\rho _1\) is the activation function that provides the nonlinearity property of the neural network. We use the Swish function proposed by Google Brain [37, 49], which consistently outperforms the widely adopted ReLU function [37]. Unlike the ReLU function, the Swish activation function is smoother and does not abruptly change direction. In addition, it efficiently deals with issues such as vanishing gradients [43]. The Swish activation function is defined as follows:

$$\begin{aligned} Swish(x)=x.sigmoid(x)=\frac{x}{{1 + {e^{ - x}}}}. \end{aligned}$$
(16)
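A one-line implementation is straightforward; the helper name below is ours, and recent TensorFlow releases also provide a built-in version of this activation.

```python
import tensorflow as tf

def swish(x):
    """Swish activation: x * sigmoid(x) (Eq. 16)."""
    return x * tf.sigmoid(x)
```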

Generally, outputs of hidden layers \(L_k\) are denoted by \(y_k\) and obtained as follows:

$$\begin{aligned} y_k=\rho _{k}(w_k(\rho _{k-1}(...\rho _{2}(w_2y_1+a_2)...))+a_k), \end{aligned}$$
(17)

where \(\rho _{k}\), \(w_k\) and \(a_k\) are, respectively, the activation function, the weight matrix and biases of neurons in the hidden layer \(L_k\). For the output layer \(L_{out}\) of the Multilayer Perceptron (MLP), the prediction \(y_{out}\) is trivially computed as follows:

$$\begin{aligned} y_{out}=\rho _{out}(w_{out}y_k+a_{out}), \end{aligned}$$
(18)

where \(\rho _{out}\) is the activation function, which is also Swish, as for the hidden layers.

For the merging layer \(L_{merge}\), the output prediction is performed as a combination between the doubly-regularized MF model and the MLP model. The output at this layer is computed as follows:

$$\begin{aligned} {\tilde{x}}=\rho _{merge}(w_{merge}(y_{out}+{P_u} \otimes {Q_i})+a_{merge}), \end{aligned}$$
(19)

where \(\rho _{merge}\) is the Softmax activation function, which is appropriate for the output of a neural network [8]. The prediction of the proposed model is assessed by using the normalized cross-entropy method [47] through the following cost function:

$$\begin{aligned} E = - \sum \limits _{t = 1}^O {\frac{{{x_t}}}{{{R_{\max }}}}\log \widetilde{{x_t}} + (1 - \frac{{{x_t}}}{{{R_{\max }}}})} \log (1 - \widetilde{{x_t}}), \end{aligned}$$
(20)

where \(R_{\max }\) is the maximum rating value and O is the number of neurons in the merging layer \(L_{merge}\).
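To make the architecture concrete, the sketch below assembles the MLP and the merging layer with Keras, following the layer sizes reported in Sect. 4.1 (32, 32 and 18 hidden units, 8 merging units). The element-wise combination of the latent vectors, the dimension-alignment projection before the merge, and the use of the standard categorical cross-entropy as a stand-in for the normalized cross-entropy of Eq. (20) are assumptions of this sketch rather than specifications from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def swish(x):
    # Swish activation of Eq. (16)
    return x * tf.sigmoid(x)

def build_dual_deep_mf(num_factors=10, merge_units=8):
    """Sketch of the neural part of dualDeepMF (layer sizes from Sect. 4.1)."""
    p_u = layers.Input(shape=(num_factors,), name="user_latent")  # from the MF stage
    q_i = layers.Input(shape=(num_factors,), name="item_latent")
    y0 = layers.Multiply(name="mf_interaction")([p_u, q_i])       # stands in for P_u (x) Q_i

    # Multilayer perceptron with Swish activations (Eqs. 15-18)
    h = layers.Dense(32, activation=swish)(y0)
    h = layers.Dense(32, activation=swish)(h)
    h = layers.Dense(18, activation=swish)(h)
    y_out = layers.Dense(merge_units, activation=swish, name="mlp_out")(h)

    # Merging layer: MLP output combined with the MF interaction (Eq. 19);
    # the projection aligns dimensions and is an assumption of this sketch.
    mf_proj = layers.Dense(merge_units, use_bias=False, name="mf_projection")(y0)
    merged = layers.Add()([y_out, mf_proj])
    x_tilde = layers.Dense(merge_units, activation="softmax", name="merge")(merged)

    return Model(inputs=[p_u, q_i], outputs=x_tilde)

model = build_dual_deep_mf()
# Standard categorical cross-entropy used as a stand-in for Eq. (20).
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
              loss="categorical_crossentropy")
```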

The training stage of the proposed model consists in determining the optimal weights and biases that minimize the cost function in Eq. (20). For this purpose, we use the gradient descent method through the update rules expressed as follows:

$$\begin{aligned} \begin{array}{l} w \leftarrow w - \gamma ( \frac{{\partial E}}{{\partial W}}),\\ a \leftarrow a - \gamma ( \frac{{\partial E}}{{\partial A}}), \end{array} \end{aligned}$$
(21)

where W and A are respectively the weight and bias matrices.

Considering the merging layer \(L_{merge}\), we compute the gradient for the weight \(w_{merge}\) as follows:

$$\begin{aligned} \frac{{\partial E}}{{\partial {w_{merge}}}} = \frac{{\partial E}}{{\partial (T_{merge})}}\frac{{\partial (T_{merge})}}{{\partial {w_{merge}}}} =\frac{{\partial E}}{{\partial (T_{merge})}}{y_{out}}, \end{aligned}$$
(22)

where \(T_{merge}=w_{merge}{y_{out}} + {a_{merge}}\),

$$\begin{aligned} \begin{array}{l} \frac{{\partial E}}{{\partial {T_{merge}}}} = \frac{{\partial E}}{{\partial {\widetilde{x}}}}\frac{{\partial {\widetilde{x}}}}{{\partial {T_{merge}}}}, \quad \text {where}\\ \frac{{\partial E}}{{\partial {\widetilde{x}}}} = \frac{{1 - \frac{x}{{{R_{\max }}}}}}{{1 - {\widetilde{x}}}} - \frac{x}{{{R_{\max }}{\widetilde{x}}}} = \frac{{{R_{\max }}{\widetilde{x}} - x}}{{{R_{\max }}{\widetilde{x}}(1 - {\widetilde{x}})}},\\ \frac{{\partial {\widetilde{x}}}}{{\partial {T_{merge}}}} = \frac{{\partial \,\mathrm {Softmax} ({w_{merge}}({y_{out}} + {P_u} \otimes {Q_i}) + {a_{merge}})}}{{\partial {T_{merge}}}}\\ \qquad = \frac{{\partial \,\mathrm {Softmax} ({T_{merge}} + {w_{merge}}({P_u} \otimes {Q_i}))}}{{\partial {T_{merge}}}}\\ \qquad = {\widetilde{x}}(1 - {\widetilde{x}}), \quad \text {and finally}\\ \frac{{\partial E}}{{\partial {T_{merge}}}} = \frac{{{R_{\max }}{\widetilde{x}} - x}}{{{R_{\max }}}}. \end{array} \end{aligned}$$
(23)

For hidden layers \(L_k\), following the same analysis we have:

$$\begin{aligned} \frac{{\partial E}}{{\partial {T_k}}} = \frac{{\partial E}}{{\partial (T_{k+1})}}\frac{{\partial (T_{k+1})}}{{\partial {T_k}}} , \end{aligned}$$
(24)

where \(\left\{ \begin{array}{l} T_{k}=w_{k}{y_{k-1}} + {a_{k}},\\ T_{k+1}=w_{k+1}{y_{k}} + {a_{k+1}}\\ y_k=Swish(w_ky_{k-1}+a_k)=Swish(T_k), \end{array} \right.\)

\(\begin{array}{l}\frac{{\partial T_{k+1}}}{{\partial {T_k}}} =w_{k+1}(y_k+\frac{{y_k}}{{T_k}}(1-y_k)) \\ \end{array}\)

Finally, Eq. (24) is reexpressed as follows:

$$\begin{aligned} \frac{{\partial E}}{{\partial {T_k}}} = \frac{{\partial E}}{{\partial T_{k+1}}}[w_{k+1}(y_k+\frac{{y_k}}{{T_k}}(1-y_k))] . \end{aligned}$$
(25)

The gradients for bias a are computed as follows:

$$\begin{aligned} \frac{{\partial E}}{{\partial {a_k}}} = \frac{{\partial E}}{{\partial T_{k}}}\frac{{\partial T_{k}}}{{\partial a_{k}}}=\frac{{\partial E}}{{\partial T_{k}}}, \end{aligned}$$
(26)

with \(\frac{{\partial T_{k}}}{{\partial a_{k}}}=\frac{{\partial (w_{k}{y_{k-1}} + {a_{k})}}}{{\partial a_{k}}}=1\).

Update rules of Eq. (21) are reexpressed as follows:

$$\begin{aligned} \begin{array}{l} w_k \leftarrow w_k-\gamma \frac{{\partial E}}{{\partial T_{k}}}y_{k-1},\\ a_k \leftarrow a_k-\gamma \frac{{\partial E}}{{\partial T_{k}}}, \end{array} \end{aligned}$$
(27)

where \(\frac{{\partial E}}{{\partial T_{k}}}\) is computed in Eqs. (24) and (25) for the hidden layers and in Eq. (23) for the merging layer.

3.4 The recommendation

Once the training phase is achieved, the prediction \({\tilde{r}}_{ui}\) for unselected items is computed with the proposed model as follows:

$$\begin{aligned} {\tilde{r}}_{ui}=\mathop {\arg \max (\widetilde{{x_t}})}\limits _t. \end{aligned}$$
(28)

Predicted items are ranked by decreasing order of ratings. Thereafter, the most relevant items are returned to the end-user.
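A possible ranking routine built on Eq. (28) is sketched below; the candidate-item batching, the input convention (user and item latent vectors, as in the earlier Keras sketch) and the top-n cut-off are implementation choices rather than specifications from the paper.

```python
import numpy as np

def recommend_top_n(model, P, Q, user, candidate_items, n=20):
    """Rank unrated items for a user with the trained model (Eq. 28).

    The predicted rating is the index of the most activated output neuron;
    batching all candidates in one predict call is an implementation choice.
    """
    p_u = np.repeat(P[user][None, :], len(candidate_items), axis=0)
    q_i = np.stack([Q[:, i] for i in candidate_items])
    scores = model.predict([p_u, q_i], verbose=0)   # shape: (num candidates, O)
    predicted = scores.argmax(axis=1)               # arg max over output neurons
    order = np.argsort(-predicted)                  # decreasing predicted rating
    return [candidate_items[k] for k in order[:n]]
```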

The next section details the experiments and presents the obtained results.

4 Experiments and results

In this section, the performance of the proposed method is assessed against competing CF-based methods. Experiments are conducted on the real-world datasets described in Table 1.

Table 1 Datasets Specifications

The next subsection presents the experiment process.

4.1 Experiments setup

Experiments are conducted on a computer with an Intel Core i7 processor (2.4 GHz) and 16 GB of RAM, running the Windows 10 operating system. We implement our algorithm in Python 3.6 with the TensorFlow and Keras libraries, in the Spyder environment.

The datasets used contain ratings assigned by users to movies according to their relevance. These datasets are split into a training part and a test part. The test part is gradually increased, causing a decrease in the training part. In this way, we can assess the impact of data sparseness on recommendation precision.

The parameters of the proposed model have been set in order to maximize its performance. The regularization rate has been set to 0.01 and the learning rate to 0.001. We set the number of latent features to 10 and the size of the reliable neighborhood to 15. In the input layer, the number of units is the number of latent features that feed the neural network and that are also used at the merging layer. The neural architecture has 3 hidden layers: the first two hidden layers have 32 units each, the third hidden layer has 18 neurons, and the merging layer has 8 neurons.
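For reference, these settings can be gathered in a small configuration dictionary; the variable names are ours, and only the values come from the text above.

```python
# Hyper-parameters reported in Sect. 4.1; the variable names are ours.
CONFIG = {
    "regularization_rate": 0.01,   # lambda
    "learning_rate": 0.001,        # alpha
    "num_latent_features": 10,     # f
    "neighborhood_size": 15,       # size of Near_rel
    "hidden_units": [32, 32, 18],  # three hidden layers
    "merge_units": 8,              # merging layer L_merge
}
```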

The next subsection presents the metrics used to evaluate performances of the proposed method.

4.2 Evaluation metrics

We evaluate the prediction precision of the proposed method by using the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE), which are widely adopted for this purpose [57]. Low values of the MAE and RMSE indicators express a high prediction precision. MAE and RMSE values are computed as follows:

$$\begin{aligned}&MAE = \frac{{\sum \limits _{(u,i) \in {S_{rec}}} {|r_{ui} - \tilde{r}_{ui}|} }}{N}, \end{aligned}$$
(29)
$$\begin{aligned}&RMSE = \sqrt{\frac{{\sum \limits _{(u,i) \in {S_{rec}}} {(r_{ui} - \tilde{r}_{ui})^2} }}{N}}, \end{aligned}$$
(30)

where \(r_{ui}\) is the true rating of u on item i, and \(\tilde{r}_{ui}\) the predicted rating of u on item i; N is the number of recommended items and \(S_{rec}\) is the set of recommended items.
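For completeness, the two indicators can be computed with a few lines of NumPy, assuming aligned arrays of true and predicted ratings over the recommended items:

```python
import numpy as np

def mae(r_true, r_pred):
    """Mean Absolute Error (Eq. 29)."""
    r_true, r_pred = np.asarray(r_true), np.asarray(r_pred)
    return np.abs(r_true - r_pred).mean()

def rmse(r_true, r_pred):
    """Root Mean Square Error (Eq. 30)."""
    r_true, r_pred = np.asarray(r_true), np.asarray(r_pred)
    return np.sqrt(((r_true - r_pred) ** 2).mean())
```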

We evaluate the recommendation quality by measuring precision and recall. High precision and recall scores indicate a high recommendation quality. Recall measures the proportion of expected items that are actually recommended, while precision measures the proportion of recommended items that are actually expected. Precision and recall are computed as follows:

$$\begin{aligned}&Precision = \frac{|S_{rec} \cap E|}{|S_{rec}|}, \end{aligned}$$
(31)
$$\begin{aligned}&Recall = \frac{|S_{rec} \cap E|}{|E|}, \end{aligned}$$
(32)

where \(S_{rec}\) is the set of recommendations and E the set of expected items.
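These two indicators reduce to simple set operations, assuming the recommendation list and the set of expected items are available as Python collections:

```python
def precision_recall(recommended, expected):
    """Precision and recall of a recommendation list (Eqs. 31-32)."""
    recommended, expected = set(recommended), set(expected)
    hits = len(recommended & expected)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall
```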

4.3 Results and analysis

The dualDeepMF method proposed in this paper is assessed against the state-of-the-art recommendation methods detailed hereafter:

  • IPCC is a memory-based method that relies on the evaluation of inter-item similarities [53]. We apply the trustworthiness evaluation to this approach in order to enhance the recommendation quality.

  • NeuMF is a neural model-based method that combines a generalized matrix factorization with a neural architecture in order to perform the recommendation [11].

  • I-AutoRec is a neural model-based method that performs recommendations by employing an autoencoder to extract latent features that thereafter feed a neural network making predictions for unknown items [57].

Fig. 2 MAE Performances on MovieLens

Fig. 3 RMSE Performances on MovieLens

Fig. 4 MAE Performances on Filmtrust

Fig. 5 RMSE Performances on Filmtrust

4.3.1 Assessment of the item prediction accuracy

Using the MovieLens dataset, Figs. 2 and 3 depict the MAE and RMSE performances of the proposed method. It can be observed that our proposal outperforms the other methods in terms of prediction precision. Using the Filmtrust dataset, Figs. 4 and 5 show that the proposed method has a better prediction accuracy than the others. In addition, it can be observed that the prediction accuracy of all methods globally decreases when the test data part increases; the proposed method nevertheless remains the most accurate, which highlights the robustness and scalability of our proposal under such challenging conditions.

4.3.2 Assessment of the recommendation quality

Using the MovieLens dataset, Figs. 6 and 7 show the precision and recall performances of the proposed method. It can be observed that the precision and recall measures of our proposal are better than those of the other methods. Indeed, the recommendation quality of the proposed method is the highest for 20 recommendations. Beyond 20 recommendations, precision and recall decrease. This can be due to the fact that recommendation lists longer than 20 include items with poor predicted ratings, which therefore act as noisy recommended items.

Using the Filmtrust dataset, Figs. 8 and 9 show that the global precision and recall trends of the proposed method are better than those of the other methods. The recommendation quality of our proposal is the highest for a number of recommendations between 10 and 15. Beyond 15 recommendations, the precision and recall trends decrease. This can be explained by the fact that the additional items behave as noise since they contribute lower ratings.

Fig. 6 Precision Performances on MovieLens

Fig. 7 Recall Performances on MovieLens

Fig. 8 Precision Performances on Filmtrust

Fig. 9 Recall Performances on Filmtrust

4.3.3 Impact of the reliable neighborhood

Figures 10 and 11 show the impact of the size of the reliable neighborhood on both the MovieLens and Filmtrust datasets. It can be observed that the prediction accuracy is affected for a neighborhood size greater than 15. This can be explained by the fact that, for a neighborhood size greater than 15, the reliability of the additional users is poor. Those users are therefore considered as doubtful users since their contribution to the recommendation process degrades the prediction accuracy.

Fig. 10 Impact of the Reliable Neighborhood Size on MAE Performances on MovieLens

Fig. 11 Impact of the Reliable Neighborhood Size on MAE Performances on Filmtrust

5 Conclusion and perspectives

In this paper, we have proposed a recommender system to effectively address information overload. The proposed model relies on an enhanced matrix factorization (MF) coupled with a novel deep neural architecture. The developed MF model is doubly regularized with both bias terms and a reliable user neighborhood in order to accurately model user-item interactions. Thereafter, the nonlinearity of the proposed deep neural network is used to alleviate the limitations of the linear dot product involved in the MF process. Series of experiments have been performed on real-world datasets and show the effectiveness of our proposal compared to state-of-the-art recommendation methods in terms of accuracy and quality of the recommendation.

In the future, to further refine the recommendation, the proposed method could be extended by mining users’ opinions or by analyzing users’ sentiment through their activities on social networks. Words in users’ reviews or comments on social networks can be scored by using a sentiment lexicon in order to transform them into scores expressing the users’ level of appreciation of items. Users’ reviews expressed as satisfaction scores would ease the application of our proposal to predicting users’ expectations about unknown items. The subjects on which users comment can be correlated to items likely to interest the users who left comments about them. For this purpose, recurrent neural networks could be explored in order to mine textual reviews and comments left by users on selected items. Neural networks offer tremendous promise in the recommendation field. However, security and privacy aspects also need to be considered since sensitive training data can be inferred or the system can be vulnerable and wrongly perform predictions.