1 Introduction

Reciprocal Recommender Systems (RRS), which suggest users to each other, can be found in many modern applications, from online dating and recruitment to mentor-mentee matching [1,2,3]. RRS generate recommendations based on the agreement between users’ preferences: a reciprocal recommendation is one in which the preferences of both the user being recommended and the user receiving the recommendation are satisfied. For example, in online dating, a good match arises when both parties find each other agreeable with respect to preferences such as attractiveness, hobbies and profession. Likewise, in online recruitment, both the candidate and the recruiter have preferences, such as salary, expertise and the skills required for the advertised position and possessed by the candidate, that have to be satisfied.

Recent advances in deep learning have delivered outstanding predictive performance and have been widely adopted in reciprocal recommender systems. However, deep learning is often criticized for its ‘black-box’ nature: its inability to ‘explain’ its results in a form suitable for human understanding. Even if we understand the basic mathematical modelling of these deep learning architectures, it is not possible to gain insight into their actual internal workings. We therefore need explicit reasoning tools that can explain the results of the model.

Generating explainable recommendations has become critical due to the demand for transparency, accountability and ethical practice in artificial intelligence applications. In recent years, it has become crucial to offer explanations that justify how recommendations are generated by the algorithm. This enables users to decide judiciously whether, and to what degree, they should accept the recommendations. Generating explainable recommendations is a necessary requirement for increasing users’ trust in the recommendations produced by the system. An explanation can be any information that conveys the fundamental reasons for the decision taken by the model.

Transparency in the design of the model (also known as intrinsic explanation) and post-hoc explanation are two common ways of generating explainable recommendations [4]. Intrinsic explanation refers to understanding how a model functions through its structure, components and training algorithms. Post-hoc explanation focuses on explaining recommendation results, i.e., why a particular outcome is inferred; it facilitates analytic statements (e.g., why an item is recommended to you) and supports recommendation results with visualizations and explanations by example. Figure 1 shows a general categorization of explainable artificial intelligence (XAI) approaches.

Fig. 1

Taxonomy of XAI

Many situations in real life involve difficult choices between acting on available information (exploitation) and learning more about the environment (exploration). This is the exploitation vs. exploration trade-off, which requires balancing reward maximization based on knowledge the agent has already obtained against trying new actions that contribute further knowledge. When generating recommendations, an RRS faces the same choice between exploiting the user’s known preferences and exploring potential, as-yet-unexplored preferences. In essence, when making suggestions, the RRS must leverage its understanding of previously selected people as potential matches for a user while also looking into new options. Figure 2 shows a toy illustration of the exploitation-exploration behaviour people engage in when choosing a compatible partner in online dating: Alice is exploiting her knowledge, while Mary is exploring more options to choose her partner. People may oscillate between these two approaches as they evaluate different potential partners and weigh the pros and cons of each option. Ultimately, the goal is to find the best possible match, which may require both exploiting existing knowledge and exploring new possibilities. General RRS are not capable of resolving this exploitation-exploration dilemma, which limits their ability to present new or informative choices to the user.

Fig. 2

Toy example to illustrate exploitation-exploration in RRS

With the aim of tackling this dilemma, we formulate the reciprocal recommender system as a contextual bandit [5]. In the contextual bandit setting, a learner or agent perceives a context, selects an action, and obtains a numerical loss or reward from the environment [6].

We propose XSiameseBiGRU-UCB, a Siamese neural network model with argumentative explanations. The Siamese architecture in XSiameseBiGRU-UCB transforms raw features, and an upper confidence bound (UCB) term in the last layer of the network addresses the exploitation vs. exploration dilemma in reciprocal recommender systems. XSiameseBiGRU-UCB provides reciprocal recommendations along with personalized and intuitive explanations, using argumentation to generate post-hoc explanations of the generated reciprocal matches. To generate arguments supporting a reciprocal match, our approach uses contextual information available in user profiles. We compute an argument’s degree of support for the claim using the Sorensen-Dice coefficient (SDC) and a Siamese Bi-directional Gated Recurrent Units network semantic model based on Sentence-BERT (SBiGRU-SBERT). We chose SDC to compute similarity between binary vectors because it excludes negative co-occurrences and, in our case, positive matches are more significant than negative matches in the binary vectors. For computing the proximity between textual data from open-ended descriptions in user profiles, we propose SBiGRU-SBERT, which is inspired by our SBiGRU model [7] combined with Sentence-BERT (SBERT).

Our proposed XSiameseBiGRU-UCB combines the predictive strength of deep learning with associative reinforcement learning to find reciprocal recommendations. XSiameseBiGRU-UCB requires no information other than users’ profiles for generating recommendations; the system is thus free from the cold-start problem. Experiments to evaluate XSiameseBiGRU-UCB are performed on four datasets, viz. speed-dating experiment data (Footnote 1), an anonymized dataset from an online dating site (Footnote 2), okCupid profiles (Footnote 3) and online recruitment data (Footnote 4). Our proposed approach exhibits better performance than state-of-the-art algorithms.

The main contributions of XSiameseBiGRU-UCB are summarized below:

  • XSiameseBiGRU-UCB provides argumentative explanations for the generated reciprocal recommendations using the proposed Siamese Bi-directional Gated Recurrent Units network semantic model based on Sentence-BERT (SBiGRU-SBERT) and the Sorensen-Dice coefficient (SDC).

  • XSiameseBiGRU-UCB utilizes multiple aspects, such as demographic information, primary goal, user preferences and the UCB exploratory strategy, to recommend new and informative choices to the user.

  • We provide a formalization, using contextual bandits, of the inherent exploitation-exploration dilemma of reciprocal recommender systems.

  • XSiameseBiGRU-UCB also addresses critical challenges faced by RRS namely cold-start problem, limited availability of users and popularity bias.

  • An integrated loss function that combines binary cross entropy and contrastive loss is used in the training stage to encourage inter-class separability and intra-class compactness.

  • Extensive experiments conducted on real-world datasets and comparisons with state-of-the-art approaches exhibit the efficacy of XSiameseBiGRU-UCB with respect to a number of evaluation criteria.

Section 2 discusses related work, followed by basic concepts of reciprocal recommender systems, argumentation and contextual bandits in Section 3. Section 4 describes our proposed approach, XSiameseBiGRU-UCB. The datasets used, implementation and experimental results are discussed in Section 5. Lastly, we conclude in Section 6, followed by references.

2 Related work

The study of reciprocal recommender systems has grown rapidly due to the increasing interest in online dating, online recruitment and other social matching domains. Moreover, explainable artificial intelligence (XAI) has emerged as a field which aims at making black-box models understandable to augment their effectiveness. We provide a review of reciprocal recommender systems and explainable recommendations.

RECON [1] is one of the best-known preliminary studies on generating reciprocal recommendations in online dating. It computes unidirectional preference scores of users for other users and combines them into a reciprocity score using the harmonic mean. Krzywicki et al. [8] present a two-stage recommendation algorithm that combines collaborative filtering with re-ranking of the recommendations using a decision tree constructed from interaction data as a “critic”. Zheng et al. [9] present a multi-dimensional utility framework using multi-criteria ratings for speed-dating data. Zheng et al. [10] further present a utility-based multi-stakeholder recommendation approach and techniques to learn user expectations when they are not available in the data. Tay et al. [11] estimate the agreement between users for relationship recommendation on regular social networks using deep learning. Neve et al. [12] present collaborative-filtering-based reciprocal recommendations using latent factor models and further propose a hybrid RRS for a social networking service [13]. Kumari et al. [5] propose a multifaceted reciprocal recommendation approach using contextual bandits for online dating. Different from that previous work, in this paper we propose an explainable RRS that generates argumentative explanations using SBiGRU-SBERT and SDC. This work also improves the SiameseNN-UCB model proposed in [5] with bi-directional gated recurrent units and an integrated loss function.

Some existing works in the literature focus on reciprocal recommendations for domains such as online recruitment [2], online learning [14], social network sites [15], skill sharing [13] and mentor-mentee matching [3].

Equality and diversity are becoming increasingly significant in online dating and matrimonial services, which motivates us to highlight these considerations within RRS. A wide variety of research within RRS focuses on online dating for opposite-gender relationships (heterosexual dating) but has not been extended to homosexual dating. Our proposed approach can be applied to single-class (e.g., homosexual online dating) as well as two-disjoint-class (e.g., heterosexual online dating) settings. We also distinguish between direct and indirect reciprocity.

Most existing work on explainable recommender systems focuses on non-reciprocal recommendations, and these explanation approaches consider only the preferences of the user for whom recommendations are being generated [16]. In [17], a matrix factorization model is trained to generate association rules that interpret the recommendations. Such work primarily employs global association rule mining to identify relationships between items, based on item co-occurrence across all user transactions. As a result, the explanations are not personalized: different users receive the same explanation as long as they are recommended the same item and have similar purchase histories. This limits applicability in modern or reciprocal recommender systems, which strive to provide personalized services to users. Additionally, not all recommended items can be associated with other items from the available historical interactions, so in some cases no explanation can be provided, which limits the usefulness of the method. Zhang et al. [18] introduce a news recommendation model that utilizes meta-explanation triplets to provide user-centered and news-centered explanations; the model uses heterogeneous graphs to incorporate various types of side information and improve news recommendation. Explanations are categorized into two sets: one centered around the user (e.g., “users living in the same city”) and another centered around the product (e.g., “news with the same topic”). In this paper, by contrast, we present personalized and intuitive explanations for both parties, along with a degree of support that indicates the extent to which the arguments justify the reciprocal recommendation generated by the proposed model, XSiameseBiGRU-UCB.

Explainable reciprocal recommendations have shown their effectiveness in maximizing the persuasiveness of the generated recommendations [19].

Over the years, argumentation has achieved substantial success in explainable artificial intelligence due to its strong explainability capabilities [20]; the explanations it generates are close to the way humans reason and think. Some existing recommender systems [21, 22] use Defeasible Logic Programming [23]. Bedi et al. [24] propose an interest-based recommender system for generating personalised recommendations. Briguez et al. [22] detail the conditions for movie recommendation using a set of predefined postulates. Naveed et al. [25] develop a formalisation of explanations for generated recommendations based on Toulmin’s model of argumentation [26]. Briguez et al. [27] generate argumentation-based recommendations after modelling rule-based user preferences with Defeasible Logic Programming. Argumentation has also been integrated into a social recommender system [28] to provide reasoning for the generated recommendations using the preferences of neighbours. While the recommender-system works discussed above use argumentation, to the best of our knowledge argumentation has not yet been explored in reciprocal recommender systems. Different from earlier works, in this paper we use factual and evidence arguments based on contextual information, such as demographic information, primary goal, user preferences and exploratory behaviour, to provide customized and comprehensible natural-language explanations. We propose argument-based reasoning in XSiameseBiGRU-UCB to provide explanations using arguments in support of the generated recommendation.

3 Preliminaries

3.1 Reciprocal recommender systems

Reciprocity often plays a primal role in person-to-person relationships: it aids in forming social connections and mutual benefit. The ability to build a good match usually depends on both parties reciprocating. Reciprocity is a well-examined concept in various domains, notably evolutionary psychology and economics, where it is used to study the impact of cooperation in human populations. Our work considers reciprocity in RRS and distinguishes two kinds: direct and indirect reciprocity, both of which can be thought of as key mechanisms for mutual agreement. Direct reciprocity is observed when both parties use their own experiences and preferences to decide whether a request should be sent to connect with the other person; in indirect reciprocity, they also consider the experiences of others. Figure 3 illustrates the different kinds of direct and indirect reciprocity that can be observed in a person-to-person reciprocal recommender system.

Fig. 3

Illustration of different forms of reciprocity which can be observed in RRS

Reciprocal Recommender Systems (RRS)

generate recommendations based on the mutual agreement of users’ preferences and are principally used in domains such as online dating, online recruitment, and academic or research collaboration. Figure 4 shows a general RRS.

RRS intrinsically differ from conventional item-to-user Recommender Systems (RS) in the following aspects:

  • With a conventional RS, the emphasis is on recommending items that the user will find most interesting. A good recommendation for the RRS problem, however, should take into account the interests of both parties, and not just the user receiving the recommendation.

  • In conventional RS, recommending the same item to a large number of users is not an issue. This contrasts with RRS, where the availability of persons is constrained: it is best to refrain from suggesting the same individual to many people. For example, in online dating, people select only a few people, or possibly one person, to date. Moreover, a person who is recommended to too many people may become overwhelmed by the attention and stop replying. Thus, overloading people should be avoided in RRS.

  • Users may quit an RRS site and never come back after getting a successful match, which makes the cold-start issue critical in RRS. In conventional RS, by contrast, users typically remain in the system even after a successful recommendation is made.

Fig. 4

Reciprocal Recommender System

Depending upon the decomposability of the set of users, RRS can be further classified as follows:

Single class RRS

If any user y is a potential match for any other user x, where \(x,y \in U\) and \(x \ne y\), then the RRS is single class.

Two-disjoint class RRS

The set of all users, represented by U, can be partitioned into two disjoint sets \(U_{1}\) and \(U_{2}\) such that if \(x \in U_{1}\) then \(y \in U_{2}\), and vice versa.
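The single-class and two-disjoint-class cases above can be sketched as candidate-set construction. This is a minimal illustrative sketch, not the paper's code; the function name and the `group_of` mapping are our own assumptions:

```python
# Hypothetical sketch of candidate-set construction for the two RRS variants.
# `group_of` is an illustrative mapping user -> group label (e.g., "M"/"F"
# or recruiter/job-seeker); names are not taken from the paper.

def candidates_for(u, users, group_of=None):
    """Return the set of potential matches for user `u`.

    With `group_of` given, the RRS is two-disjoint class and candidates
    come from the *other* group; otherwise it is single class and every
    other user qualifies.
    """
    if group_of is None:
        # single-class RRS: any user other than u
        return {a for a in users if a != u}
    # two-disjoint class RRS: only users in the opposite group
    return {a for a in users if group_of[a] != group_of[u]}
```

For heterosexual online dating, `group_of` would map each user to a gender label; for online recruitment, to recruiter/job-seeker.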

3.2 Basics of argumentation

Argumentation attempts to identify suitable premises and conclusions of reasoning. It is an important and commonly used form of human cognition: we often analyse key arguments and counterarguments while making a decision.

An argument attempts to support or defend beliefs, decisions or claims; it tries to give a logical proof of the claim. In simpler terms, argumentation can be thought of as the justification of one piece of knowledge through another. Knowledge can be expressed as judgments, decisions, hypotheses, concepts, etc. Argument involves supporting, attacking and/or modifying claims so that decision makers may accept or deny them. It also helps in explaining the results of reasoning, usually by identifying related information and generating an explanation. The claim is the knowledge being justified, and the knowledge used in its justification constitutes the arguments, or reasons. For example, consider the following newspaper story (Associated Press 1993): “A recent study found that women are more likely than men to be murdered at work. 40% of the women who died on the job in 1993 were murdered. 15% of the men who died on the job during the same period were murdered.” In this example, the first sentence is a claim, and the next two sentences provide reasons to justify it [29].

In the literature, argumentation is distinguished as monological and dialogical [30]. Monological argumentation involves justifying a claim from a knowledge base containing defeasible premises, whereas dialogical argumentation refers to an interactive process in which a set of entities or agents interact to identify arguments in support of or against a particular claim. A newspaper article by an editor, a speech by a politician at a public rally, or a talk by a resource person in a seminar are situations where one can observe monological argumentation. A plaintiff making a claim and a defendant resisting it in a civil lawsuit, or opposition and ruling-party politicians debating new legislation, are examples where dialogical argumentation occurs.

Argumentation can inherently be based on any kind of information, which may be objective, subjective, or hypothetical [31]. Objective information is data-based, observable, or measurable by everyone involved in the argumentation. Subjective information is based on beliefs or opinions of some of those involved. Hypothetical information is assumed by hypothesis for the sake of constructing arguments. For example, suppose you are asked to review a restaurant's food. There may be certain foods that you subjectively dislike, i.e., foods that you personally don't enjoy. But while critiquing the food, you set your subjective tastes aside and decide on the basis of objective information, such as how it is cooked and seasoned. Monological argumentation is used in various argument-based logics for defeasible reasoning. In this paper, we focus on monological argumentation based on objective information for generating explanations.

3.3 Contextual bandits

Bandit algorithms [32] are a class of learning algorithms that function well in uncertain environments. The name “bandit” comes from the image of a slot machine with multiple options (arms), each yielding its own rewards. At its core, a multi-armed bandit problem is a series of repeated trials in which the agent is presented with a fixed number of options (the “arms”) and receives a reward based on the choice it makes. By balancing the exploration of new options against the exploitation of options already found to be profitable, the algorithm learns the underlying reward distribution of each arm and thus maximizes the cumulative reward over time. The contextual bandit is a particularly valuable variant of the multi-armed bandit problem in which the agent observes an N-dimensional context, or feature vector, before deciding which arm to pull. The objective is to gradually learn the association between context vectors and rewards in order to predict more accurately which action to take in a given context.

Contextual bandits are a good fit for a wide range of situations where the exploitation-exploration trade-off occurs, such as clinical trials, web search and recommender systems. In such settings, an agent has to iteratively make choices that maximize its expected rewards, yet it often lacks knowledge of the reward-generation process and must explore to gain more insight into it. This exploration vs. exploitation dilemma is encountered in many sequential decision problems and can be modelled with contextual bandits [33].

Contextual bandit problems often use optimistic algorithms, which make a deterministic choice at each round based on an optimistic estimate of future rewards. The upper confidence bound (UCB) approach [34], based on the “optimism in the face of uncertainty” principle, is an effective and well-known method for solving contextual bandit problems. UCB-type algorithms are popular for their simplicity and effectiveness, their ability to discard sub-optimal actions, and their relative ease of analysis, which makes them a good choice for complex settings. The UCB strategy balances the exploration-exploitation trade-off by setting an upper bound on the reward of each action, made up of two components: an estimate of the reward, and a term reflecting the level of confidence in that estimate. The strategy chooses the action with the highest UCB. When the agent has low confidence in its reward estimates, the confidence term becomes prominent, promoting exploration; conversely, when all confidence terms are small, the algorithm focuses on exploiting the best available action(s). Arms are selected using an upper confidence bound composed of the mean reward of an arm (\(r_{t,a}\)) and a confidence interval value (\(c_{t,a}\)). LinUCB [6], a well-known contextual bandit algorithm, uses a linear model to predict the expected reward of each arm from its context: \(r_{t,a}\) is modelled as \(x^{T}_{t,a}\theta _{t}\), where \(x_{t,a}\) is the d-dimensional feature vector of a and \(\theta \) is an unknown coefficient vector, and the confidence interval value is \(\alpha _{t}{\Vert x_{t,a}\Vert _{H^{-1}_{t}}}\), with \({\alpha }_{t}> 0\) a tuning parameter that controls the exploration rate. LinUCB uses ridge regression to estimate the unknown coefficients from previous trials. Its arm-selection strategy is expressed by the equation:

$$\begin{aligned} a_{t}=\underset{a\ \in \ A}{\text {argmax}} \{x^{T}_{t,a}\theta _{t} + \alpha _{t}{\Vert x_{t,a}\Vert _{H^{-1}_{t}}}\} \end{aligned}$$
(1)

where \(H_{t}\) matrix is computed as

$$\begin{aligned} H_{t}={\lambda I}_{d}+\sum ^{t}_{i=1}{x_{i,a_{i}}.\ }x^{T}_{i,a_{i}} \end{aligned}$$
(2)
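As an illustration of (1) and (2), a minimal LinUCB sketch in NumPy might look as follows. This is a toy implementation under our own design choices (class name, bookkeeping of the ridge-regression statistics), not code from the paper or [6]:

```python
import numpy as np

# Toy LinUCB sketch following eqs. (1)-(2): theta is the ridge-regression
# estimate, and the bonus term is alpha * ||x||_{H^{-1}}.

class LinUCB:
    def __init__(self, d, alpha=1.0, lam=1.0):
        self.alpha = alpha
        self.H = lam * np.eye(d)   # H_t = lam*I + sum_i x_i x_i^T, eq. (2)
        self.b = np.zeros(d)       # running sum of r * x for ridge regression

    def select(self, X):
        """X: (n_arms, d) matrix of arm contexts; returns the arm maximizing (1)."""
        H_inv = np.linalg.inv(self.H)
        theta = H_inv @ self.b     # ridge-regression estimate of theta_t
        # diag(X H^{-1} X^T) gives ||x_a||^2_{H^{-1}} for every arm at once
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', X, H_inv, X))
        return int(np.argmax(X @ theta + self.alpha * bonus))

    def update(self, x, r):
        self.H += np.outer(x, x)   # accumulate the chosen arm's context
        self.b += r * x
```

With no observations, all arms share the same optimistic bonus; as rewards accumulate, the estimate term dominates and exploration shrinks.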

4 Proposed work

4.1 Problem specifications

4.1.1 Contextual-bandits framework for reciprocal recommendations

We propose contextual bandits formulation for RRS as follows:

Agent: Reciprocal Recommender System algorithm.

Environment: Set of users, U. In a two-disjoint-class RRS, U can be divided into two disjoint sets \(U_{1}\) and \(U_{2}\), such as male and female users in heterosexual online dating or recruiters and job seekers in online recruitment. For a single-class RRS, such as bisexual or homosexual online dating, any user a can be recommended to u, where \(a,u\in U\) and \(a\ne u\).

$$\begin{aligned} U=\left\{ \begin{array}{ll} U_{1} \bigcup \ U_{2} &{} {\textit{two-disjoint class RRS}}\\ U &{} {\textit{single-class RRS}} \\ \end{array}\right\} \end{aligned}$$
(3)

Arm or action: Reciprocal match \(a \in A\) of user u.

$$\begin{aligned} \text {Set of arms, } A=\left\{ \begin{array}{lll} U_{2} &{} {\textit{two-disjoint class RRS and }} u \in U_{1} \\ U_{1} &{} {\textit{two-disjoint class RRS and }} u \in U_{2} \\ U \backslash \{u\} &{} {\textit{single-class RRS}} \\ \end{array}\right\} \end{aligned}$$
(4)

Context: Contextual information of user u and arm a.

Reward: Reward of an arm a, \(r_{t,a}\) \(\in [0,1]\) is 1 if u and a are reciprocal match to each other and 0 otherwise.

We formulate RRS using the contextual bandit framework according to the specifications given below.

For each round \(t \in [T]\), the agent

  1. Observes the context \(s_{t}\in S\), where S denotes the context space.

  2. Selects an arm \(a \in A\) on the basis of rewards observed in prior rounds and receives a reward \(r_{t,a}\), which depends on the user as well as the arm.

  3. Improves its arm-selection strategy using \(s_{t}\), a and \(r_{t,a}\).
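The three-step round above can be sketched as a simple interaction loop. The `agent` interface (`select`, `update`) and the `reciprocal_match` callback are hypothetical names introduced for illustration; they are not from the paper:

```python
# Hedged sketch of the agent-environment loop: observe context, select an
# arm, receive a binary reciprocal-match reward, update the strategy.

def run_rounds(agent, contexts, arms, reciprocal_match, T):
    total_reward = 0
    for t in range(T):
        s_t = contexts[t]                          # 1. observe context s_t
        a = agent.select(s_t, arms)                # 2. select an arm
        r = 1 if reciprocal_match(s_t, a) else 0   # reward in {0, 1}
        agent.update(s_t, a, r)                    # 3. improve the strategy
        total_reward += r
    return total_reward
```

Any arm-selection strategy (e.g., a UCB-based one) can be plugged in through the `agent` object.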

In each round t, the agent observes \(u_{t}\) and each arm a through their d-dimensional feature vectors, \(<x_{u_{t}}>\) and \({<x}_{a}>\) respectively. An arm a is recommended to the user \(u_{t}\), and a reward \(r_{t,a}\) is received by the agent. The agent selects arms so as to maximize the cumulative reward, or equivalently to minimize the regret, given by:

$$\begin{aligned} R_{T}=E[\sum ^{T}_{t=1}(r^{*}_{t,a}- r_{t,a})] \end{aligned}$$
(5)

where \(r^{*}_{t,a}\) denotes the true reward and \(r_{t,a}\) denotes the predicted reward at round t.

At any round t, the reward \(r_{t,a}\) for an arm a, given feature vectors \(<x_{t,u_{t}}>\) and \(<x_{t,a}>\), can be expressed as \(f\left( {\cdot ;}{<x}_{{u}_{t}}>,\ {<x}_{a}>,\ W\right) \), where W denotes an unknown weight matrix.

$$\begin{aligned} r_{t,a} = f\left( {\cdot ;}{<x}_{{u}_{t}}>,\ {<x}_{a}>,\ W\right) \end{aligned}$$
(6)

Figure 5 depicts agent-environment in contextual bandits setting of RRS.

Fig. 5

Agent-environment for contextual bandits setting

4.1.2 Argumentation for explanations of reciprocal recommendations

Argumentation provides reasoning to support a claim with the help of premises. In this paper, we use argumentation to generate post-hoc explanations. The underlying idea is to build a conclusion for the claim by generating a series of arguments that support it. Support for a claim can mean that the claim may be inferred from existing arguments, or that the claim is evidential.

For explanations of reciprocal recommendations, we generate arguments by exploiting contextual information based on various attributes. Factual and evidence arguments are given as explanations to justify the reciprocal recommendation. A factual argument is based on the explicit preferences of users and gives direct reasoning behind the recommendation. An evidence argument is based on the similarity between the user being recommended as a reciprocal match and users liked in the past.

We define the notion of argumentation using: (1) the claim being justified, (2) the support by which the justification is performed, and (3) the degree to which the arguments justify the claim.

Argumentation

Argumentation, in our usage, is defined as a unit comprising a claim, its support arguments and degree of strength for support \((<claim>:<support>:<degree\) of \(support>)\). Support arguments may be explicit or implicit. Figure 6 presents a diagrammatic representation of argumentation. Figure 7 illustrates an example of explanations using argumentation in our usage.
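The (claim : support : degree of support) unit above can be represented, for illustration, by a small data structure; the class, field and method names below are our own assumptions, not the paper's:

```python
from dataclasses import dataclass

# Illustrative container for the (claim : support : degree of support)
# triple; `explain` renders it as a natural-language sentence.

@dataclass
class Argumentation:
    claim: str                 # e.g. "u2 is a reciprocal match for u1"
    support: list              # factual/evidence arguments backing the claim
    degree_of_support: float   # strength of the support, in [0, 1]

    def explain(self):
        reasons = "; ".join(self.support)
        return (f"{self.claim} because {reasons} "
                f"(degree of support: {self.degree_of_support:.2f})")
```

Support arguments may be explicit (stated preferences) or implicit (similarity to past likes), as described above.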

Fig. 6

Illustration of argument to justify a claim

Fig. 7

Illustration of argumentation-based personalized explanation

4.2 Proposed approach: XSiameseBiGRU-UCB

XSiameseBiGRU-UCB is a deep-learning contextual bandit framework with post-hoc, argumentation-based personalized explanations for RRS. The proposed model uses Siamese bi-directional Gated Recurrent Units to transform raw features and a UCB strategy to tackle the inherent exploitation vs. exploration dilemma in RRS. The Siamese architecture supports distance metric learning to derive a similarity metric. The reward function \(f(\cdot )\) is expressed in terms of the feature vectors and a UCB term.

For a pair of users \(u_{t}\) and a (\(u_{t}\in \ U,\ a\ \in \ A,\ \ a\ne u_{t}\)), output of the Siamese bi-directional Gated Recurrent Units network, \({\varphi }_{W}(x_{u_{t}},x_{a};W)\) is defined as follows:

$$\begin{aligned} {\varphi }_{W}(x_{u_{t}},x_{a};W)=\ {(G}_{W}\left( x_{u_{t}}\right) -G_{W}\left( x_{a}\right) ) \end{aligned}$$
(7)

Here, W represents the shared weights of the Siamese architecture, which need to be learnt. \({\varphi }_{W}(x_{u_{t}},x_{a};W)\) is the difference between the outputs of two identical bi-directional Gated Recurrent Units networks, where \(G_{W}\) is defined as:

$$\begin{aligned} G_{W}\left( x_{u_{t}}\right) =\chi \left( x_{u_{t}}\right) +{\alpha }_{t}{\Vert \chi \left( x_{u_{t}};W\right) \Vert }_{H^{-1}_{t-1}} \end{aligned}$$
(8)
$$\begin{aligned} G_{W}\left( x_{a}\right) =\chi \left( x_{a}\right) +{\alpha }_{t}{\Vert \chi \left( x_{a};W\right) \Vert }_{H^{-1}_{t-1}} \end{aligned}$$
(9)

\(\chi \left( .\right) \) is the output of the \(L^{th}\) hidden layer.

Arm-selection strategy to select the best-suited arm is then given by:

$$\begin{aligned} a_{t}=\underset{a\ \in \ A}{\text {argmin}} \left\{ \varphi _{W}\left( x_{u_{t}},x_{a};W\right) \right\} \end{aligned}$$
(10)

\(\Vert . \Vert _{H}\) is Mahalanobis distance:

$$\begin{aligned} {\Vert x\Vert }_{H}=\ \sqrt{x^{T}Hx} \end{aligned}$$
(11)

\({\alpha }_{t}\ >\ 0\) is used to control exploration rate and \(H_{t}\) matrix is computed as:

$$\begin{aligned} H_{t}={\lambda I}_{d}+\sum \limits ^{t}_{i=1}{\chi \left( x_{{i,u}_{i}};W_{i-1}\right) .\ \chi {\left( x_{{i,a_{i}}_{\ }};W_{i-1}\right) }^{T}} \end{aligned}$$
(12)

\(\lambda > 0\) and \(I_{d}\) is an identity matrix of d-dimension. Algorithm 1 describes in detail the steps followed while generating reciprocal recommendations.
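Equations (7)-(12) can be sketched as follows. Two illustrative choices are ours, not the paper's: since (10) minimizes over the vector-valued \(\varphi _{W}\), we take its Euclidean norm to obtain a scalar score, and a placeholder `chi` callable stands in for the BiGRU's L-th hidden-layer output:

```python
import numpy as np

# Toy sketch of the arm-selection step in eqs. (7)-(12). `chi` is a stand-in
# for the shared encoder's hidden-layer output (here any vector-valued map).

def mahalanobis(x, H_inv):
    return float(np.sqrt(x @ H_inv @ x))   # ||x||_{H^{-1}}, cf. eq. (11)

def select_arm(chi, x_u, X_arms, H_inv, alpha):
    # G_W(x) = chi(x) + alpha * ||chi(x)||_{H^{-1}}, eqs. (8)-(9)
    g_u = chi(x_u) + alpha * mahalanobis(chi(x_u), H_inv)
    scores = []
    for x_a in X_arms:
        g_a = chi(x_a) + alpha * mahalanobis(chi(x_a), H_inv)
        # phi_W is the difference of the two branches, eq. (7);
        # we score arms by its Euclidean norm (our scalarization of eq. (10))
        scores.append(float(np.linalg.norm(g_u - g_a)))
    return int(np.argmin(scores))
```

With `alpha = 0` this reduces to pure exploitation: the arm whose embedding is closest to the user's is chosen.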

We integrate binary cross-entropy (18) with contrastive loss (19) to measure the prediction error while generating reciprocal recommendations. The two measures are complementary: contrastive loss focuses on minimizing intra-class variation, while cross-entropy loss guides the maximization of inter-class variation. We wish to learn a function f(.) that maps \(u_{t}\) and a into embedding vectors such that similar samples have similar embeddings and dissimilar samples have distant embeddings. For a pair of inputs, contrastive loss minimizes the embedding distance when they belong to the same class and maximizes it otherwise; it thus reduces intra-class variation by pulling feature vectors of the same class together. Our loss function (17) therefore increases inter-class separability while also improving intra-class compactness.
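A minimal sketch of such an integrated loss for a single pair, assuming an additive combination with weight `beta` and margin `m` (illustrative hyper-parameters and form, not the paper's eqs. (17)-(19)):

```python
import math

# Hedged sketch: binary cross-entropy plus the classic margin-based
# contrastive loss, combined additively.

def integrated_loss(y, p, dist, m=1.0, beta=1.0, eps=1e-12):
    """y: label (1 = reciprocal match, 0 = no match),
    p: predicted match probability, dist: embedding distance of the pair."""
    bce = -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    # same-class pairs are pulled together, different-class pairs pushed
    # apart up to the margin m
    contrastive = y * dist ** 2 + (1 - y) * max(0.0, m - dist) ** 2
    return bce + beta * contrastive
```

The contrastive term grows with the distance for matching pairs and vanishes for non-matching pairs that are already separated by the margin.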


With the proposed approach XSiameseBiGRU-UCB, after a list of reciprocal recommendations has been generated, users are assisted with argumentation that provides intuitive and personalized explanations. We utilize a post-hoc explanation approach to provide explainability of the reciprocal recommendation results.

The proposed methodology for explaining the reciprocal recommendations is presented in two algorithms (Algorithms 2 and 3). Algorithm 2 gives a detailed description of the proposed post-hoc argumentation-based explanations. It explains why a user \(u_{2}\) is a potential match for another user \(u_{1}\). To generate arguments in support of \(u_{2}\), Algorithm 2 computes proximity scores between the multi-criteria attributes desired by \(u_{1}\) and those possessed by \(u_{2}\) (\(ArgDec_{u_{1},u_{2}}^{(m)}\)), their demographic attributes (\(ArgDec_{u_{1},u_{2}}^{(d)}\)), the availability (goal and frequency of attending the event) of \(u_{1}\) and \(u_{2}\) (\(ArgDec_{u_{1},u_{2}}^{(fg)}\)) and their hobbies (\(ArgDec_{u_{1},u_{2}}^{(hobbies)}\)). It also computes proximity scores between the multi-criteria, demographic and availability attributes of \(u_{2}\) and those of users in whom \(u_{1}\) has shown interest in the past (\(ArgDec_{u_{1},u_{2}}^{(mDated)}\), \(ArgDec_{u_{1},u_{2}}^{(dDated)}\), \(ArgDec_{u_{1},u_{2}}^{(fgDated)}\)). On the basis of these proximity scores, it generates arguments in support of \(u_{2}\). The following proximity measures are used in Algorithm 2 to compute the degree of strength of support:

Table 1 Contingency table for binary vectors \(x_{1}\) and \(x_{2}\) (\(1\le i \le |x_{1}|\))
  1.

    Sorensen-Dice coefficient (SDC): SDC was developed by Sorensen [35] and Dice [36]. We use SDC as proximity measure for binary vectors. Table 1 shows the 2-by-2 contingency table corresponding to two binary vectors \(x_{1}\) and \(x_{2}\) where \(f_{11}\) is the number of binary features or bits equal to 1 in both \(x_{1}\) and \(x_{2}\), \(f_{10}\) is the number of binary features or bits equal to 1 in \(x_{1}\) and 0 in \(x_{2}\), \(f_{01}\) is the number of binary features or bits equal to 1 in \(x_{2}\) and 0 in \(x_{1}\) and \(f_{00}\) is the number of binary features or bits equal to 0 in both \(x_{1}\) and \(x_{2}\). SDC can then be expressed over binary vectors \(x_{1}\) and \(x_{2}\) as follows:

    $$\begin{aligned} \mathcal {S}_{\text {D}}(x_{1}, x_{2}) = \frac{2 {f_{11}}}{2 {f_{11}} + {f_{10}}+ {f_{01}}}. \end{aligned}$$
    (13)
  2.

    SBiGRU-SBERT (Siamese Bi-directional Gated Recurrent Units network semantic model based on Sentence-BERT): We use the SBiGRU model [7] with Sentence-BERT (SBERT) to compute the semantic similarity between textual data from open-ended descriptions in user profiles. SBiGRU-SBERT takes embedded vectors from the SBERT model as input and predicts the semantic similarity of the input text pairs. It is a siamese neural network comprising two identical sub-networks with shared weights and parameters. Each sub-network has an embedding layer, a 2-layer bi-directional GRU, a concatenation layer and a densely connected feed-forward neural network. The representations of each data point obtained from the bi-directional GRU are concatenated and then passed through two hidden layers. Finally, the semantic similarity score between the input data points is computed with a sigmoid activation function. Figure 8 illustrates the framework.

Fig. 8 Siamese Bi-directional Gated Recurrent Units network semantic model based on Sentence-BERT (SBiGRU-SBERT)
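Returning to the first proximity measure above, Eq. (13) can be sketched as a short function; the hobby bit-vectors in the usage example are hypothetical, and treating two all-zero vectors as identical is our assumption for the degenerate case.

```python
def sorensen_dice(x1, x2):
    """Sorensen-Dice coefficient over two equal-length binary vectors (Eq. 13)."""
    assert len(x1) == len(x2)
    f11 = sum(a == 1 and b == 1 for a, b in zip(x1, x2))  # bits set in both
    f10 = sum(a == 1 and b == 0 for a, b in zip(x1, x2))  # set only in x1
    f01 = sum(a == 0 and b == 1 for a, b in zip(x1, x2))  # set only in x2
    denom = 2 * f11 + f10 + f01
    return 2 * f11 / denom if denom else 1.0  # assumption: two all-zero vectors match

# e.g. hobbies encoded as presence/absence bits for two users
score = sorensen_dice([1, 1, 0, 1], [1, 0, 0, 1])  # 2*2 / (4 + 1 + 0) = 0.8
```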

The goal of Algorithm 3 is to generate explanations for reciprocal recommendations, i.e., to explain why both parties would benefit from the match. For each user \(u_{i}\) in the list of reciprocal recommendations for \(u_{1}\) generated by Algorithm 1, it uses Algorithm 2 to generate supporting arguments justifying the claim that \(u_{i}\) is a potential match for \(u_{1}\) (eXA(\(u_{1},u_{i}\))) and vice versa (eXA(\(u_{i},u_{1}\))).

Figure 9 shows workflow of XSiameseBiGRU-UCB. Figures 10, 11, 12 and 13 show snapshots of explanations generated by XSiameseBiGRU-UCB on datasets D1 to D4.

Fig. 9 Workflow of XSiameseBiGRU-UCB

Fig. 10 A snapshot of explanations generated by XSiameseBiGRU-UCB for speed-dating experiment data [D1]

Fig. 11 A snapshot of explanations generated by XSiameseBiGRU-UCB for online dating data [D2]

Fig. 12 A snapshot of explanations generated by XSiameseBiGRU-UCB for online recruitment data [D3]

Fig. 13 A snapshot of explanations generated by XSiameseBiGRU-UCB for okCupid data [D4]

5 Experimental study

5.1 Dataset used

We conducted experiments with the following real-world datasets in order to validate XSiameseBiGRU-UCB.

[D1]:

Speed-dating experiment data: The data was collected by Ray Fisman and Sheena Iyengar [37]. The dataset has 8,378 records containing demographic details of users along with the expectations and experiences of their dating behaviour.

[D2]:

An anonymized dataset from a heterosexual online dating site [38]. There are 548,395 users in total: 344,552 male and 203,843 female. Each user profile has 35 attributes, such as user ID, gender, birth year, work location, education level and mate requirements.

[D3]:

Anonymized results of the 2021 Stack Overflow Developer Survey are used for candidate profiles. Candidate profiles consist of information such as current employment status, location of residence, and information pertaining to education, work, career, technology and demographics. Job profiles are obtained from Indeed.com and comprise 10,000 Software Engineer job offers from different companies. Each job profile contains name, company, city, ratings, summary and date.

[D4]:

okCupid profiles: The data has 59,947 rows and 31 columns containing information such as open-text essays related to an individual's preferences, personal descriptions, profession, diet, education and languages spoken.

Table 2 shows the specifics of the datasets used in this paper.

Table 2 Specifics of the datasets used

5.2 Dataset preprocessing

Data preprocessing is a crucial step to transform the raw data into a form suitable to be fed into a model.

5.2.1 Data discretization and one-hot encoding

Data discretization and one-hot encoding were performed on the categorical attributes of datasets D1 to D4 described in Section 5.1. Data discretization refers to the conversion of continuous values into a finite set of intervals. We discretized the demographic attributes and then applied one-hot encoding to transform the discretized attributes into one-hot encoded vectors.

One-hot encoding transforms nominal or categorical features into vectors by placing a 1 or 0 in each column, depending on whether the original categorical value matches the column header. We used scikit-learn's OneHotEncoder. Since we discretized the categorical attributes before one-hot encoding, the problem of creating too many new dummy variables was reduced. We selected the one-hot encoding scheme because there is no quantitative relationship between the discretized attribute values; ordinal encoding schemes, which allow the model to assume an ordering between values, can therefore result in unexpected outcomes and poor performance. A common practice with one-hot encoding is to drop one of the encoded columns for each categorical feature. However, dropping a column introduces a bias towards the dropped level under regularization, and multicollinearity is rarely an issue with neural networks [39]. We therefore kept all the one-hot encoded columns for each categorical feature.
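A minimal NumPy sketch of this discretize-then-encode pipeline (the paper uses scikit-learn's OneHotEncoder; the age values and bin edges below are hypothetical, and no column is dropped, as discussed above):

```python
import numpy as np

def discretize(values, bins):
    # map continuous values to interval indices (right-open bins)
    return np.digitize(values, bins)

def one_hot(indices, n_categories):
    # full one-hot encoding; every category keeps its own column
    out = np.zeros((len(indices), n_categories), dtype=int)
    out[np.arange(len(indices)), indices] = 1
    return out

ages = np.array([19, 26, 34, 52])               # hypothetical demographic attribute
idx = discretize(ages, bins=[25, 35, 45])        # 0: <25, 1: 25-34, 2: 35-44, 3: >=45
encoded = one_hot(idx, n_categories=4)
```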

5.2.2 Textual data preprocessing and transformation

Textual data preprocessing included converting text to lower case and removing punctuation, line breaks, extra whitespace and stop words. Short forms and abbreviations were expanded, followed by stemming and lemmatization.

After the preprocessing phase, Sentence-BERT (SBERT), proposed by Nils Reimers and Iryna Gurevych [40], was used for sentence embedding. SBERT is a modification of the BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings. A pooling operation is added to the output of BERT to obtain fixed-size sentence embeddings. Figure 14 depicts the SBERT architecture.

Fig. 14 Architecture of SBERT

We used 10-fold cross-validation to construct the train and test sets. The data was divided into 10 equal subsets; the model was trained on 9 of these subsets and tested on the remaining one. This process was repeated until each subset had been used as the test set, and the average performance over all 10 folds was taken as the cross-validated performance.
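The fold construction can be sketched as follows; the placeholder metric stands in for the actual model fitting and evaluation, which are omitted.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

scores = []
for train_idx, test_idx in k_fold_indices(100, k=10):
    # fit the model on train_idx and evaluate on test_idx (model code omitted);
    # the placeholder below just records the test-fold fraction
    scores.append(len(test_idx) / 100)
mean_score = sum(scores) / len(scores)   # cross-validated performance
```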

5.3 Experimental study

5.3.1 Evaluation metrics

Precision@K, recall@K and F1-score@K are used to assess the reciprocal recommendations obtained with our proposed model.

Precision@K is the fraction of the top-K retrieved reciprocal recommendations that are relevant. Recall@K is the fraction of relevant reciprocal matches that appear in the top-K retrieved reciprocal recommendations.

$$\begin{aligned} Precision @ K = \frac{|relevant \cap retrieved |}{|retrieved |} \end{aligned}$$
(14)
$$\begin{aligned} Recall @ K = \frac{|relevant \cap retrieved |}{|relevant |} \end{aligned}$$
(15)

F1-score@K is given by:

$$\begin{aligned} F1 - score@K = \frac{2 * Precision@K * Recall@K}{Precision@K + Recall@K} \end{aligned}$$
(16)
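Equations (14)-(16) can be sketched as a single function; the user IDs in the usage example are hypothetical.

```python
def precision_recall_f1_at_k(recommended, relevant, k):
    """Precision@K, Recall@K and F1-score@K (Eqs. 14-16)."""
    top_k = set(recommended[:k])                 # retrieved set
    hits = len(top_k & set(relevant))            # |relevant ∩ retrieved|
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

# ranked reciprocal recommendations vs. the user's true reciprocal matches
p, r, f1 = precision_recall_f1_at_k(["u7", "u2", "u9", "u4"], ["u2", "u4", "u5"], k=3)
```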

Our loss function is mathematically formulated as follows:

$$\begin{aligned} \mathcal {L} = \alpha \mathcal {L}_{bcel} + \beta \mathcal {L}_{conl} \end{aligned}$$
(17)

where \(\alpha \) and \(\beta \) are hyperparameters to weigh the contributions of the losses \(\mathcal {L}_{bcel}\) and \(\mathcal {L}_{conl}\) such that \(\alpha + \beta = 1\).

In round t, the algorithm observes a context-action pair \((s_{t},a)\), predicts a reciprocal compatibility score between \(u_{t}\) and a, denoted \(rCS(u_{t},a) \in [0,1]\), and then receives a reward \(r_{t,a}^{*}\). Each input paired sample \((u_{t},a)\) has a corresponding reward \(r_{t,a}^{*}\). The binary cross-entropy loss is given by:

$$\begin{aligned} \mathcal {L}_{bcel}=-\frac{1}{|U|}\sum _{\forall u_{t}\in U} \left[ r_{t,a}^{*} \log (rCS(u_{t},a)) + (1-r_{t,a}^{*})\log (1-rCS(u_{t},a))\right] \end{aligned}$$
(18)

where U is the set of all users and \(|U |\) denotes the total number of users.

Contrastive loss is formulated as follows:

$$\begin{aligned} \mathcal {L}_{conl} = \frac{1}{2M} \sum _{u_{t} \in U, a \in A}{\mathbbm {1}[u_{t} \sim a] ||f(u_{t}) - f(a) {||_{2}^{2}} + \mathbbm {1}[u_{t} \not \sim a] max(0,(m-||f(u_{t}) - f(a) {||_{2}^{2}})) } \end{aligned}$$
(19)

In the above equation, M is the batch size of the input-pair data, and m is a margin hyperparameter that defines the lower bound on the distance between dissimilar samples. \(\mathbbm {1}\) is the indicator function: it returns 1 if the two candidate samples \((u_{t}, a)\) are a reciprocal match and 0 otherwise.
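Equations (17)-(19) can be sketched as a single NumPy function. The default weights \(\alpha = 0.3\), \(\beta = 0.7\) follow the values reported in Section 5.3.2; the argument names and the clipping constant are our own.

```python
import numpy as np

def combined_loss(scores, rewards, d_sq, same, alpha=0.3, beta=0.7, m=1.0):
    """L = alpha * BCE + beta * contrastive loss (Eqs. 17-19).

    scores : predicted reciprocal compatibility scores rCS(u_t, a) in (0, 1)
    rewards: observed rewards r* in {0, 1}
    d_sq   : squared embedding distances ||f(u_t) - f(a)||^2 per pair
    same   : 1 if the pair is a reciprocal match, else 0
    """
    scores = np.clip(scores, 1e-7, 1 - 1e-7)       # numerical safety for log()
    bce = -np.mean(rewards * np.log(scores) + (1 - rewards) * np.log(1 - scores))
    # 1/(2M) * sum of pull-together / push-apart terms, Eq. (19)
    con = np.mean(same * d_sq + (1 - same) * np.maximum(0.0, m - d_sq)) / 2
    return alpha * bce + beta * con

loss = combined_loss(np.array([0.5]), np.array([1.0]), np.array([0.0]), np.array([1.0]))
```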

Beyond accuracy metrics, we measured aggregate diversity and the Gini coefficient to assess coverage and concentration, respectively [41]. These metrics are used to assess the results of XSiameseBiGRU-UCB in mitigating popularity bias.

Aggregate diversity measures the total number of distinct users among the top-N users recommended across all potential arms and is formally defined as:

$$\begin{aligned} Aggregate~Diversity = \frac{|\bigcup _{u \in U} RR_{u} |}{|A |} \end{aligned}$$
(20)

where A refers to the set of arms from which recommendations can be predicted for u, and \(RR_{u}\) represents the list of users recommended to u.

The Gini coefficient is 0 in the ideal situation where every user is recommended equally often, and 1 under extreme inequality of recommendation frequency. It is computed from the frequencies \(v_{i}\) and \(v_{j}\) with which users i and j are recommended, over n recommended users, and is formally defined as follows:

$$\begin{aligned} Gini = \frac{\sum _{i=1}^{n} \sum _{j=1}^{n} |v_{i} - v_{j} |}{2 \sum _{i=1}^{n} \sum _{j=1}^{n} v_{j}} \end{aligned}$$
(21)
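Equations (20)-(21) can be sketched as follows; the toy recommendation lists in the usage example are hypothetical.

```python
def aggregate_diversity(rec_lists, n_arms):
    """Fraction of distinct arms appearing in any recommendation list (Eq. 20)."""
    distinct = set().union(*rec_lists.values())
    return len(distinct) / n_arms

def gini(counts):
    """Gini coefficient over recommendation frequencies (Eq. 21): 0 = perfectly even."""
    n = len(counts)
    total = sum(abs(vi - vj) for vi in counts for vj in counts)
    return total / (2 * n * sum(counts))

# two users, four possible arms; arm "d" is never recommended
div = aggregate_diversity({"u1": ["a", "b"], "u2": ["b", "c"]}, 4)
```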

Fig. 15 Precision@K of Pizzato et al. [1], Zheng et al. [9], COUPLENET [11], DeepFM [44], biDeepFM [45], Kleinerman et al. [43], SiameseNN-UCB [5] and XSiameseBiGRU-UCB

Fig. 16 Recall@K of Pizzato et al. [1], Zheng et al. [9], COUPLENET [11], DeepFM [44], biDeepFM [45], Kleinerman et al. [43], SiameseNN-UCB [5] and XSiameseBiGRU-UCB

Fig. 17 F1-score@K of Pizzato et al. [1], Zheng et al. [9], COUPLENET [11], DeepFM [44], biDeepFM [45], Kleinerman et al. [43], SiameseNN-UCB [5] and XSiameseBiGRU-UCB

We propose mean exploration precision and mean exploration recall to assess XSiameseBiGRU-UCB with respect to exploration, i.e., reciprocal match discovery: how well our model helps users discover new reciprocal matches. For any user u, let \(RR_{u}\) be the list of reciprocal recommendations generated and \(neX_{u}\) the list of reciprocal recommendations which cannot be explained using factual or evident arguments.

$$\begin{aligned} Mean~Exploration~Precision~(MEP) = \frac{\sum _{u} {\frac{|neX_{u} \cap RR_{u}|}{|RR_{u} |}}}{|U |} \end{aligned}$$
(22)
$$\begin{aligned} Mean~Exploration~Recall~(MER) = \frac{\sum _{u} {\frac{|neX_{u} \cap RR_{u}|}{|neX_{u} |}}}{|U |} \end{aligned}$$
(23)
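A minimal sketch of Eqs. (22)-(23). The handling of users with an empty \(neX_{u}\) is our assumption (they contribute zero to MER), and the toy data is hypothetical.

```python
def mean_exploration_metrics(rr, nex):
    """Mean Exploration Precision / Recall (Eqs. 22-23).

    rr : user -> list of reciprocal recommendations RR_u
    nex: user -> recommendations with no factual supporting arguments neX_u
    """
    n = len(rr)
    mep = sum(len(set(nex[u]) & set(rr[u])) / len(rr[u]) for u in rr) / n
    # assumption: users with empty neX_u contribute 0 to MER
    mer = sum(len(set(nex[u]) & set(rr[u])) / len(nex[u]) for u in rr if nex[u]) / n
    return mep, mer

mep, mer = mean_exploration_metrics(
    {"u1": ["a", "b"], "u2": ["c", "d"]},
    {"u1": ["b"], "u2": []},
)
```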

To evaluate the results of explainability, we assessed the generated explanations using the feature matching ratio [42], which indicates whether a given feature is included in a generated explanation. It is defined as:

$$\begin{aligned} Feature~Matching~Ratio = \frac{1}{N}\sum _{u,a}\mathbbm {1}(f_{u,a}\in eX_{u,a}) \end{aligned}$$
(24)

Here, N denotes the total number of explanations, \(eX_{u,a}\) denotes the generated explanation for a user-arm pair (having degree of strength of support \(\ge \) the threshold value) and \(f_{u,a}\) is the given feature. \(\mathbbm {1}(x)\) is the indicator function.
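Equation (24) can be sketched as below; the pair keys and feature names are hypothetical, with each explanation represented as a set of the features it mentions.

```python
def feature_matching_ratio(explanations, features):
    """Fraction of (user, arm) pairs whose given feature appears in the explanation (Eq. 24)."""
    n = len(explanations)
    return sum(features[pair] in ex for pair, ex in explanations.items()) / n

fmr = feature_matching_ratio(
    {("u1", "a1"): {"hobbies", "age"}, ("u2", "a3"): {"location"}},
    {("u1", "a1"): "hobbies", ("u2", "a3"): "salary"},
)  # only the first explanation contains the queried feature
```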

5.3.2 Hyperparameter sensitivity analysis

We performed experiments to examine the effect of batch size, loss function, optimizer, and learning rate of XSiameseBiGRU-UCB.

A. Batch Size

We experimented with different batch sizes (Table 3). With a batch size of 64, validation accuracy (val_acc) was highest and validation loss (val_loss) was lowest across all the datasets used except D1.

B. Loss function

We combined binary cross-entropy and contrastive loss to improve the discriminative capability of the proposed model. The loss weights \(\alpha \) and \(\beta \) were varied in the range [0, 1] to find an optimal balance between the two losses. We empirically set \(\alpha \) and \(\beta \) to 0.3 and 0.7, respectively, as these values gave the best performance.

Table 3 Ablation study of batch size in XSiameseBiGRU-UCB
Table 4 Ablation study with various optimizers and learning rates in XSiameseBiGRU-UCB

C. Optimizer and learning rate

Table 4 shows the validation accuracy and loss values of XSiameseBiGRU-UCB with various optimizers and learning rates. The highest accuracy was achieved with the Adam optimizer and a learning rate of \(10^{-2}\). Validation loss was lowest for datasets D1, D2 and D4 with the Adam optimizer and a \(10^{-2}\) learning rate.

5.3.3 Experimental results

XSiameseBiGRU-UCB was compared with the following state-of-the-art algorithms:

  • RECON [1]: It is a content-based RRS which computes reciprocal compatibility scores of users on the basis of their profiles and actions.

  • Zheng et al. [9]: A content-based RRS model on the basis of multi-criteria utility theory.

  • COUPLENET [11]: A deep learning Siamese architecture with Gated Recurrent Units (GRU) encoders.

  • Kleinerman et al. [43]: The authors present two explanation methods, transparent and correlation-based, to provide explanations for the recommendations generated by RECON.

  • DeepFM [44]: Feature learning and factorization machines are integrated for generating recommendations.

  • biDeepFM [45]: A multi-objective learning strategy that jointly evaluates the likelihood that an applicant will send an expression of interest to the employer and that the employer will reciprocate. The harmonic mean is used to combine the unilateral preferences \(y_{u}^{\prime }\) and \(y_{v}^{\prime }\) obtained from biDeepFM [46].

  • SiameseNN-UCB [5]: It follows Siamese architecture and uses UCB to tackle exploitation-exploration dilemma.

XSiameseBiGRU-UCB, COUPLENET [11], DeepFM [44], biDeepFM [45] and SiameseNN-UCB [5] were implemented in Python using TensorFlow. XSiameseBiGRU-UCB, COUPLENET, DeepFM and SiameseNN-UCB were trained with the Adam [47] optimizer and a learning rate of \(10^{-2}\), since these values consistently yielded better results for these approaches. The batch size was set to 64. An early-stopping callback was used to stop training when no improvement in the validation loss was observed for fifteen consecutive epochs (patience value). A dropout of 0.3 was used for all the above-mentioned deep neural network-based models. For DeepFM, a 3-layer perceptron with the relu activation function was used, with a dropout value of 0.8 and a "constant" network shape as specified by the authors [44]. The same architecture was adopted for biDeepFM, with an additional sigmoid extension; biDeepFM was trained using weighted LogLoss with gradient descent as in [45].

We computed precision@K, recall@K and F1-score@K for a threshold value of 0.7, i.e., top-K reciprocal recommendations with a reciprocal compatibility score of at least 0.7, for different values of K (1, 3, 5 and 10). The threshold was reduced to 0.6 to obtain the top-20 (K=20) recommendation results. As evident from Figures 15, 16 and 17, there was a trade-off between precision and recall. Figures 15-17 also show that XSiameseBiGRU-UCB outperforms the baselines in terms of precision@K, recall@K and F1-score@K.

Table 5 Performance comparison results with respect to aggregate Diversity and Gini coefficient
Table 6 Performance comparison results with respect to Mean Exploration Precision and Mean Exploration Recall
Table 7 Feature Matching Ratio of XSiameseBiGRU-UCB

Generating accurate recommendations alone is not always effective or profitable. Recommending only popular users as reciprocal matches may give high accuracy, but it also degrades other desirable aspects such as recommendation diversity. Previous studies have shown that, beyond accuracy, metrics such as diversity and the Gini coefficient should also be assessed for an effective recommender system [48]. To examine the effectiveness of XSiameseBiGRU-UCB, we therefore computed aggregate diversity and the Gini coefficient against the baselines. A low aggregate diversity indicates that the reciprocal recommender system recommends only a small fraction of the users to target users, which may negatively impact the remaining users; high aggregate diversity values indicate a more even distribution of users across recommendation lists. The Gini coefficient is another widely used metric for examining whether recommendations are fairly distributed across all recommended users; a larger value indicates a stronger concentration of the recommendations, e.g., on popular users. Table 5 shows the results for aggregate diversity and the Gini coefficient on the datasets used, and highlights that XSiameseBiGRU-UCB outperforms the baselines on these metrics, achieving better coverage and a more even distribution, as shown by its low Gini coefficient.

To evaluate XSiameseBiGRU-UCB in terms of exploration of the recommended users, we used the proposed mean exploration precision and mean exploration recall. Table 6 shows the results on all the datasets used. Empirically, both metrics were better for XSiameseBiGRU-UCB than for the baselines.

The feature matching ratio is used to evaluate the generated explanations at the feature level. Table 7 shows the feature matching ratio of the generated explanations for K = 1, 3, 5, 10 and 20. As shown in Table 7, more than 62% of features are included in the generated explanations.

6 Conclusion

An effective reciprocal recommender system should generate reciprocal recommendations along with personalized, intuitive explanations. In this paper, we proposed XSiameseBiGRU-UCB, which provides post-hoc argumentation-based explanations for reciprocal recommendations generated using a contextual bandits framework; the framework addresses the exploitation-exploration dilemma in RRS. We provided personalized explanations using arguments in support of the claim, i.e., the generated reciprocal recommendations, drawing on the contextual information available in user profiles. The degree of support of each argument was computed with the Sorensen-Dice coefficient (SDC) and the Siamese Bi-directional Gated Recurrent Units network semantic model based on Sentence-BERT (SBiGRU-SBERT). With XSiameseBiGRU-UCB, we addressed single and two-disjoint-class RRS with direct as well as indirect reciprocity, incorporating aspects such as demographics, interests, exploratory behaviour and key intent along with the user's stated preferences.

XSiameseBiGRU-UCB was compared against state-of-the-art approaches on four real-world datasets. The results highlight its effectiveness with respect to a number of evaluation criteria: precision, recall, F1-score, aggregate diversity, Gini coefficient, mean exploration precision, mean exploration recall and feature matching ratio.