1 Introduction

Modern search engines rely on machine-learned methods for ranking the results matching a given query. Training and evaluation of models for ranking is commonly known as Learning-to-Rank (LTR) [18]. There are two common approaches for collecting the data for LTR - human judgements and implicit user feedback. For human judgements, samples of documents are gathered for a sample of queries and sent to human judges, who analyze and label each document. The labels can be as simple as relevant vs. not relevant or can involve more levels of relevance. This labeled data is then used for training and/or evaluation of LTR models. Collecting human-judged data can be expensive, time consuming, and often infeasible. On the other hand, data from implicit user feedback, such as clicks, is essentially free and abundant. For that reason it is often the preferred method for collecting data for LTR. A major drawback of this method is that the data can be heavily biased. For example, users can only click on documents that have been shown to them (presentation bias) and are more likely to click on higher-ranked documents (position bias). A lot of work in the LTR literature has focused on accounting for and removing these biases. In particular, the recent paper by Joachims et al. [16] has proposed a framework for systematically removing the biases from user feedback data. Following the title of the paper, we will refer to this framework as Unbiased Learning-to-Rank. Specifically, the authors have focused on removing the position bias by first estimating the click propensities and then using the inverse propensities as weights in the loss function. They have shown that this method results in an unbiased loss function and hence an unbiased model.

Unbiased Learning-to-Rank is an appealing method for removing the inherent biases. However, to apply it one needs to first get a reliable estimate of click propensities. The method proposed in [16] uses result randomization in the live search engine to estimate propensities. This can negatively impact the quality of the search results, which will in turn result in poor user experience and potential loss of revenue for the company [21]. It also adds bookkeeping overhead. Wang et al. [21] have proposed a regression-based Expectation Maximization (EM) method for estimating click propensities which does not require result randomization. However, this method uses the ranking features to estimate relevances and can result in a biased estimate of propensities unless the relevance estimates are very reliable, which is difficult to achieve in practice.

In this paper we propose a novel method for estimating click propensities without any intervention in the live search results page, such as result randomization. We use query-document pairs that appear more than once at different ranks to estimate click propensities. In comparison to the EM-based algorithm in [21] our method does not rely on modeling the relevance using ranking features. In fact, we completely eliminate the relevances from the likelihood function and directly estimate the propensities by maximizing a simple likelihood function.

Agarwal et al. [1] have proposed a similar approach for estimating propensities without interventions, developed in parallel with our work. The approach relies on having multiple different rankers in the system, such as during A/B tests. They also derive a likelihood function to estimate the propensities, called the AllPairs estimator, which depends on terms for all combinations of rank pairs. In comparison to the method in [1], our method is more general and does not rely on having multiple rankers in the system. Although requiring multiple rankers is better than an intervention, it may still have a similar cost. For example, a different ranker could result in a different user experience and extra bookkeeping overhead. In contrast, our proposed approach leverages the organic ranking variation due to time-dependent features and does not incur extra costs. That said, our method can naturally take advantage of multiple rankers, if available. More importantly, our likelihood function depends on the propensities only, rather than on terms for all combinations of rank pairs. The number of unknown parameters to estimate for our method is linear, rather than quadratic, in the number of ranks, which is a major advantage. Our method can therefore give reliable estimates for much lower ranks using much less data.

We first test our method on simulated data, where it accurately recovers the true propensities. We then apply our method to actual data from eBay search logs to estimate click propensities for both web and mobile platforms and compare them with estimates obtained using the EM method [21]. Finally, we use our estimated propensities to train an unbiased learning-to-rank model for eBay search and compare it with two baseline models - one which does not correct for position bias and one which uses EM-based estimates for bias correction. Our results show that both unbiased models significantly outperform the “biased” baseline on our offline evaluation metrics, with our model also outperforming the one based on the EM method [21].

The main novel contributions of this work can be summarized as follows:

  • We present a new approach for directly estimating click propensities without any interventions in live search. Compared with other approaches in the literature [1, 21], our approach does not require multiple rankers in the system or large amounts of data for each pair of ranks from different rankers. Moreover, our proposal gives direct estimates of the propensity without having to model relevance. This makes our approach more robust and general.

  • Under a mild assumption we derive a simple likelihood function that depends on the propensities only. This allows for propensity estimation for much lower ranks. We also verify the validity of the method through simulations.

  • We estimate propensities up to rank 500 using our method for a large eCommerce search engine. This is a much lower rank than previous methods in the literature have been able to obtain (around rank 20). This may not be important for some search engines but is especially important in the eCommerce domain where people typically browse and purchase items from much lower ranks than for web search.

  • To the best of our knowledge this is the first paper to do a detailed study of the unbiased learning-to-rank approach for eCommerce search.

The rest of the paper is organized as follows. In Sect. 2 we discuss some of the related work in the literature. In Sect. 3 we introduce our method for estimating click propensities. In Sect. 4 we apply our method to eBay search logs and estimate propensities for web and mobile search, and compare them with EM-based estimates. In Sect. 5 we train and evaluate unbiased learning-to-rank models for eBay search using our estimated propensities as well as the propensities estimated with the EM method [21], and show that our model outperforms both baselines - one without position bias correction and one with bias correction using estimates from the EM method. We summarize our work in Sect. 6 and discuss future directions for this research. The derivation of our likelihood function is presented in Appendix A. Finally, in Appendix B we apply our method to simulated data and show that we are able to obtain reliable estimates of the “true” simulated propensities.

2 Related Work

Implicit feedback such as clicks is commonly used to train user-facing machine-learned systems such as ranking or recommender systems. Clicks are preferred over human-judged labels as they are plentiful, readily available, and collected in a natural environment. However, such user behavior data can only be collected for the items shown to the users, which injects a presentation bias into the collected data. This affects machine-learned systems because they treat user feedback as positive and negative training examples. Since it is not feasible to present many choices to the user, feedback is available only on selected samples, and we cannot get an accurate estimate of positives and negatives for training, which degrades the performance of these systems. The situation is aggravated by the fact that user feedback depends not only on what was presented but also on where the item was presented. This is a subclass of presentation bias called position bias. Joachims et al. [16] proved that if the collected user behavior data accurately discounts the position bias then the learned system will be the same as one learned on true relevance signals.

Several approaches have been proposed to de-bias the collected user behavior data. One of the most common approaches is the use of click models. Click models encode hypotheses about user behavior, and the true relevance is then estimated by maximizing the likelihood of the collected clicks. There are several types of click models. One such model is the random click model (RCM) [9], where every document is assumed to have the same probability of being clicked; that single probability is the model's only parameter. In the rank-based click-through rate model (RCTR) the click probability of a document is assumed to depend on its rank, so the number of model parameters equals the number of ranks in the ranking system. Another model is the document-based CTR model (DCTR) [8], where click-through rates are estimated for each query-document pair, so the number of model parameters equals the number of query-document pairs; this model is prone to overfitting as the number of parameters grows with the size of the training data. The most commonly used click models are the position-based model (PBM) [8, 15] and the cascade model (CM) [8]. In PBM the hypothesis is that a document is clicked only if it is observed and the user finds it attractive or relevant. In CM the hypothesis is that the user scans the ranked list sequentially from top to bottom and clicks when a document is found to be relevant: the top document is always observed, and each subsequent document is observed only if the previous ones were observed and not deemed relevant. Our proposed method makes a hypothesis similar to the position-based model: the observation probability depends only on the rank, and the click probability given observation depends only on the query-document pair. However, our approach is to learn the click propensities directly instead of learning the true relevance by maximizing the likelihood of the collected clicks.
More advanced click models have also been proposed, such as the user browsing model (UBM) [9], the dependent click model (DCM) [12], the click chain model (CCM) [11], and the dynamic Bayesian network model (DBN) [6]. Chuklin et al. [7] provide a comprehensive overview of click models.

Click models are trained on the collected user behavior data. Interleaving is another option, deployed at the time of data collection. In interleaving, different ranked lists are merged together and presented to the user. By comparing the clicks on the swapped results one can learn the unbiased user preference. Different methods for interleaving have been proposed. In the balanced interleave method [17] a new interleaved ranked list is generated for every query. The document constraint method [13] accounts for the relation between documents. Hofmann et al. [14] proposed a probabilistic interleaving method that addressed some of the drawbacks of the balanced interleave method and the document constraint method. One practical limitation of interleaving is that the experimentation platform in eCommerce companies is often not dedicated to search: it supports A/B testing for all teams, such as checkout and advertisements. Interleaved ranked lists may therefore not be supported, as they are pertinent only to search ranking.

A more recent approach to addressing presentation bias is unbiased learning-to-rank, in which click propensities are estimated and the inverse propensities are then used as weights in the loss function. Click propensities are estimated by presenting the same items at different ranks, which accounts for click biases without explicitly estimating the query-document relevance. Click propensity estimation can be done either randomly or in a more principled manner. Radlinski et al. [19] presented the FairPairs algorithm, which randomly flips pairs of results in the ranking presented to the user; they called it randomization with minimal invasion. Carterette et al. [4] also presented a minimally invasive algorithm for offline evaluation. Joachims et al. [16] proposed randomized interventions to estimate the propensity model. Radlinski et al. [20], on the other hand, proposed altering the ranking in a more informed manner using Multi-Armed Bandits. The main drawback of randomization for propensity estimation is that it can cause a bad user experience, bookkeeping overhead, and a potential loss in revenue. Wang et al. [21] proposed a method to estimate propensities without randomization using the EM algorithm. In most of the existing methods propensity estimation is done first; once the propensities are learned, an unbiased ranker is trained using them. Recently, Ai et al. [2] proposed a dual learning algorithm that learns an unbiased ranker and the propensities together.
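For illustration, the standard (non-regression) EM updates for the position-based model can be sketched in a few lines. This is only a simplified sketch: Wang et al. [21] replace the relevance update with a regression on ranking features, and the log entries and initial values below are made up.

```python
# Minimal EM for the position-based model. Each log entry is
# (rank, pair_id, clicked); propensities p[r] and relevances z[pair]
# are hypothetical starting values updated from the posterior of the
# latent "observed" / "relevant" events.
logs = [(1, 0, 1), (2, 0, 0), (1, 1, 0), (2, 1, 1), (1, 0, 1), (2, 1, 0)]
n_ranks, n_pairs = 2, 2
p = [0.5] * n_ranks          # propensities p_r
z = [0.5] * n_pairs          # relevances z_j

for _ in range(50):
    p_stats = [[0.0, 0] for _ in range(n_ranks)]
    z_stats = [[0.0, 0] for _ in range(n_pairs)]
    for r, j, c in logs:
        pr, zj = p[r - 1], z[j]
        # E-step: a click implies observed and relevant; given no click,
        # compute the posterior of each latent event.
        post_o = 1.0 if c else pr * (1 - zj) / (1 - pr * zj)
        post_r = 1.0 if c else zj * (1 - pr) / (1 - pr * zj)
        p_stats[r - 1][0] += post_o; p_stats[r - 1][1] += 1
        z_stats[j][0] += post_r; z_stats[j][1] += 1
    # M-step: new estimates are the posterior means.
    p = [s / n for s, n in p_stats]
    z = [s / n for s, n in z_stats]
```

The regression-based variant of [21] fits a model z(features) in the M-step instead of per-pair averages, which is what makes the relevance estimates (and hence the propensities) sensitive to the quality of the ranking features.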

3 Propensity Estimation Method

The method proposed by Joachims et al. [16] for estimating click propensities is to run an experimental intervention in the live search engine, where the documents at two selected ranks are swapped. By comparing the click-through rates at these ranks before and after swapping one can easily estimate the ratio of propensities at these ranks (only the ratio of propensities is needed for removing the position bias [16]). Here we propose a novel methodology for estimating click propensities without any intervention. For some search engines, especially in eCommerce, the same query-document pair may naturally appear more than once at different ranks. Using the click data on such documents we can accurately estimate click propensities. The same query-document pair is not required to appear at different ranks a large number of times.
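As a toy numerical illustration of the swap-based estimate (all CTR values below are made up): since the relevance term is the same for the same documents before and after the swap, it cancels in the ratio.

```python
# Hypothetical CTRs for the swap intervention of [16]: documents placed
# at rank k, then the same documents after being swapped to rank l.
# The relevance factor cancels, so the CTR ratio estimates p_l / p_k.
ctr_before_swap_at_k = 0.12
ctr_after_swap_at_l = 0.03
propensity_ratio = ctr_after_swap_at_l / ctr_before_swap_at_k  # p_l / p_k
```

Our method below recovers the same kind of ratio, but from naturally occurring rank variation rather than from a deliberate swap.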

We model clicks by the following simple model (also used in [16]): the probability of a click on a given document is the product of the probability of observing the document and the probability of clicking on the document for the given query, given that it has been observed. We assume that the probability of observing a document depends only on its rank, and that the probability of clicking on an observed document depends only on the query and the document. Mathematically:

$$\begin{aligned} \begin{aligned} p(c=1|q,y)&=p(o=1|q,y)p(c=1|q,y,o=1)\\&=p(o=1|rank(y))p(c=1|q,y,o=1)\\&=p_{rank(y)}p(c=1|q,y,o=1) \end{aligned} \end{aligned}$$
(1)

where q denotes a query, y denotes a document, c denotes a click (0 or 1), o denotes observation (0 or 1), and \(p_i\) denotes the propensity at rank i.
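The click model (1) can be sketched in a few lines of Python. The propensity curve and relevance values below are made-up placeholders for illustration, not estimates from any data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy position-based click model (Eq. 1): hypothetical propensities that
# decay with rank, and per-pair "relevances" z_j = p(c=1 | q, y, o=1).
propensity = 1.0 / np.arange(1, 11)       # assumed p_i for ranks 1..10
z = rng.uniform(0.05, 0.3, size=100)      # assumed z_j for 100 pairs

def simulate_click(j, r):
    # Eq. (1): p(c=1) = p_rank(y) * p(c=1 | q, y, o=1)
    p_click = propensity[r - 1] * z[j]
    return rng.random() < p_click
```

Simulations of exactly this kind are what we use in Appendix B to verify that the estimation method recovers the propensities it was fed.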

Let us assume that our data D consists of N query-document pairs \(x_j\) for \(j\in [1,N]\). For a query-document pair \(x_j\) we will denote the probability of clicking on the document after observing it by \(z_j\). For each query-document pair \(x_j\) we have a set of ranks \(r_{jk}\) where the document has appeared for the query, and clicks \(c_{jk}\) denoting if the document was clicked or not (1 or 0) when it appeared at rank \(r_{jk}\), for \(k\in [1,m_j]\). Here we assume that the query-document pair \(x_j\) has appeared \(m_j\) separate times. For now we do not assume that \(m_j\) must be greater than 1 - it can be any positive integer.

The probability of a click for query-document pair \(x_j\) where the document appeared at rank \(r_{jk}\) is, according to (1), \(p(c=1)=p_{r_{jk}}z_j\). It follows that \(p(c=0)=1-p_{r_{jk}}z_j\). We can now introduce the following likelihood function:

$$\begin{aligned} \mathcal {L}(p_i,z_j|D)=\prod _{j=1}^N\prod _{k=1}^{m_j}\left[ c_{jk}p_{r_{jk}}z_j+(1-c_{jk})(1-p_{r_{jk}}z_j)\right] \,. \end{aligned}$$
(2)

Here the parameters are the propensities \(p_i\) and the “relevances” \(z_j\) (relevance here means probability of clicking for a given query-document pair assuming that the document has been observed). Theoretically, the parameters can be estimated by maximizing the likelihood function above. However, this can be challenging due to the large number of parameters \(z_j\). In fact, we are not even interested in estimating the \(z_j\) - we only need to estimate the propensities \(p_i\), and the \(z_j\) are nuisance parameters.

The likelihood function above can be simplified under mild and generally applicable assumptions. Firstly, only query-document pairs that appeared at multiple different ranks and got at least one click are of interest. This is because we need to compare click activities for the same query-document pair, with the same “relevance”, at different ranks to gain useful information about the propensities. Secondly, we assume that the overall click probabilities are not large (i.e. not close to 1). We discuss this assumption in detail in Appendix A. As we will see in Sect. 4, this is a reasonable assumption for eBay search. The assumption is generally valid for lower ranks (below the top few), and in Appendix A we discuss how to make small modifications to the data in case it is violated for the topmost ranks. We also discuss alternative approaches for estimating the click propensities for cases where our assumption might not work very well (our simulation methodology of Appendix B can be used to verify the validity of the assumption).

The likelihood can then be simplified to take the following form:

$$\begin{aligned} \log \mathcal {L}(p_i|D)=\sum _{j=1}^N\left( \log (p_{r_{jl_j}})-\log \sum _{k=1}^{m_j}p_{r_{jk}}\right) \,. \end{aligned}$$
(3)

The detailed derivation is presented in Appendix A; here \(l_j\) denotes the index of the appearance at which the query-document pair \(x_j\) received its click. Note that the simplified likelihood function (3) depends only on the propensities, which is one of the most important contributions of this work. By maximizing the likelihood function above we can obtain an estimate of the propensities. Because the likelihood function depends on the propensities only, we can estimate the propensities down to much lower ranks than previously done in the literature, without having to rely on a large amount of data.
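A minimal sketch of maximizing (3), using plain gradient ascent on a toy click log. Only propensity ratios are identifiable, so we pin \(p_1=1\); the log below is made up, with each pair appearing at two ranks and clicked at exactly one of them, as in Sect. 4.

```python
import numpy as np

# Toy click log: for each query-document pair, the list of ranks at which
# it appeared and the index of the appearance that received the click.
data = ([([1, 2], 0)] * 3 + [([1, 2], 1)]
        + [([2, 3], 0)] * 2 + [([2, 3], 1)]
        + [([3, 4], 0)] * 2 + [([3, 4], 1)])
n_ranks = 4

# Maximize Eq. (3) by gradient ascent on theta_i = log p_i.
theta = np.zeros(n_ranks)
for _ in range(2000):
    grad = np.zeros(n_ranks)
    for ranks, clicked in data:
        idx = np.array(ranks) - 1
        w = np.exp(theta[idx])
        w /= w.sum()                      # softmax over the appearances
        grad[idx[clicked]] += 1.0         # d/dtheta of log p_{r_{j l_j}}
        np.add.at(grad, idx, -w)          # d/dtheta of -log sum_k p_{r_{jk}}
    theta += 0.05 * grad
    theta[0] = 0.0                        # gauge choice: pin p_1 = 1

propensities = np.exp(theta)
```

For this particular log the maximizer can be worked out by hand from the stationarity conditions: \(p=(1,\,1/3,\,1/6,\,1/12)\), which the gradient ascent recovers. Note the likelihood is concave in the \(\theta_i\), so a simple first-order method suffices even at the scale of 500 ranks.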

4 Click Propensities for eBay Search

In this section we apply the method developed above on eBay search data to estimate propensities. For comparison, we also estimate the propensities using the EM method [21].

We collected a small sample (0.2%) of queries for four months of eBay search traffic. For each query we keep the top 500 items (in this work we use the terms “item” and “document” interchangeably). There are multiple sort types on eBay (such as Best Match, Price Low to High, Time Ending Soonest) and click propensities may differ for different sort types. In this paper we present our results on Best Match sort, and hence we keep only queries for that sort type. Furthermore, there are multiple different platforms for search (such as a web browser or a mobile app) which can have different propensities. We separate our dataset into two platforms - web and mobile, and estimate click propensities for each platform separately. For web queries we estimate the propensities for list view with 50 items per page (the most common option).

Next, we identify identical query-document pairs and find cases where the document appeared at multiple different ranks. We apply certain filters to ensure that the “relevance” of the document has not changed for the query between multiple appearances, so that different click probabilities are due only to different ranks. Namely, we check that the price of the item has not changed and exclude auction items (since their relevance depends strongly on the current bid and the amount of time left). We also keep only the same query-document pairs from the same day, to make sure that seasonality effects do not affect the popularity of the item. For the query side we identify two queries to be the same if they have the same keywords, as well as the same category and aspect (such as color, size) constraints. We then keep only those query-document pairs that appeared at two different ranks and got one click at one rank and no click at the other. We have also verified our assumption of not very large click probabilities for our dataset. Note that the validity of the assumption is also verified through simulations in Appendix B, where the simulated data has click-through rates similar to the actual eBay data.

Fig. 1.

Click propensity estimated for eBay search for web data (left) and mobile data (right). The solid blue line is the direct estimation of propensities for each rank, the red dashed line is the estimation using interpolation, and the black dotted curve is the estimation using the EM method. For comparison, on the right side we also plot the propensities for web data using interpolation in solid green, which is the same as the red dashed line from the left side. (Color figure online)

We first estimate propensities for web queries. Our dataset consists of about 40,000 query-item pairs, each of which appeared at two different ranks and received a click at one of the ranks. We use two methods for estimating propensities - direct and interpolation. In the direct method we treat the propensity at each rank as a separate parameter. We therefore get 500 different parameters to estimate. In the interpolation method we fix a few different ranks and use the propensities at those ranks as our parameters to estimate. The propensities for all the other ranks are computed as a linear interpolation in the log-log space, i.e. we approximate the log of the propensity as a linear function of the log of the rank. This results in the propensity being a power law of the rank. For the interpolation method our fixed ranks are 1, 2, 4, 8, 20, 50, 100, 200, 300, and 500. We choose a denser grid for higher ranks since there is more data and less noise for higher ranks, and the propensities can be estimated more accurately.
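The interpolation scheme can be sketched as follows. The knot ranks are the ones listed above, while the knot propensity values here are hypothetical power-law placeholders standing in for the likelihood estimates.

```python
import numpy as np

# Knot ranks used for the interpolation method; the knot values below
# are assumed placeholders (the real ones come from maximizing Eq. 3).
knot_ranks = np.array([1, 2, 4, 8, 20, 50, 100, 200, 300, 500])
knot_log_p = np.log(knot_ranks ** -0.8)   # assumed power-law-like shape

def interp_propensity(rank):
    # Linear interpolation of log p against log rank: between knots the
    # propensity is a power law of the rank.
    return np.exp(np.interp(np.log(rank), np.log(knot_ranks), knot_log_p))

p = interp_propensity(np.arange(1, 501))  # smooth curve for all 500 ranks
```

Parametrizing only the knots reduces the 500 direct parameters to 10, which is why the interpolated curve is much less noisy at deep ranks where data is sparse.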

Our resulting propensity for web search is shown in Fig. 1 (left). The solid blue line shows the propensities estimated through the direct method, and the red dashed curve shows the propensities estimated through interpolation. Even though we estimate propensities up to rank 500, we plot them only up to rank 200 so that the higher ranks can be seen more clearly. The red dashed curve passes smoothly through the blue solid curve, which is reassuring. Note that the red dashed curve is not a fit to the blue one. The two are estimated directly from the data. For the blue curve the parameters are all of the propensities at each rank, whereas for the red dashed curve we only parametrize the propensities at select ranks and interpolate in between. We then maximize the likelihood for each case to estimate the parameters. The fact that the red dashed line appears to be a smooth fit to the solid blue shows that the interpolation method is useful in obtaining a smooth and less noisy propensity curve which is still very close to the direct estimation.

The propensities estimated from eBay mobile search data are shown in Fig. 1 (right). As in the left plot (web data), the blue solid curve shows direct estimation, and the red dashed curve shows estimation using interpolation. For comparison, we plot the propensities from web using interpolation in solid green. The blue solid curve shows a certain periodicity - the propensities drop sharply near rank 25, go back up at rank 40, drop again around rank 65, go back up at rank 80, and so on. This reflects the way results are loaded in mobile search - 40 at a time. The blue curve seems to indicate that users observe the results at higher ranks with the usual decrease in interest, then tend to scroll faster towards the bottom of the batch, skipping those results, and regain interest as the new batch is loaded. The red dashed curve matches the blue one reasonably well, but it fails to capture the periodic dips. This is due to our choice of knots for the linear spline. One can use the blue curve to choose new knot locations and obtain a better interpolation for the propensities. The green solid curve matches the blue one fairly well except for the dips. This means that the propensities for web and mobile are very similar, apart from the periodic dips for mobile. The web results are shown 50 items per page, but we have not found any periodic dips for web search. Perhaps this indicates that for web search users do not tend to scroll quickly towards the end of the page and then regain interest as a new page is loaded. The smooth decline in propensities indicates that for web search users steadily lose interest as they scroll down, and the number of items per page does not affect their behavior.

We have also estimated propensities using the regression-based EM method by Wang et al. [21]. The results are plotted with black dotted lines in Fig. 1. The two methods are very different and use different kinds of data, so a fully fair comparison is difficult. However, we have used datasets of similar sizes with similar numbers of queries to make the comparison as fair as possible. For the regression method we have used gradient boosted decision trees [10] with our top 25 ranking features. The estimates obtained with the EM method are in general higher than the estimates using our method. We have obtained similar periodicity patterns for mobile data from both methods, which is reassuring. We do not have the ground truth for comparison since we have not performed any randomization experiments. However, our simulations in Appendix B show that our method’s predictions are close to the ground truth. We have also used these estimates in Sect. 5 to train unbiased learning-to-rank models and have obtained better offline metrics using our estimates compared to the EM-based estimates.

5 Unbiased Learning-to-Rank Models

In this section we study the improvement in ranking models by using the estimated click propensities for eBay search data. Previous studies have consistently shown that unbiased learning-to-rank models significantly improve ranking metrics compared to their biased counterparts. Specifically, Joachims et al. [16] have shown that an unbiased learning-to-rank model significantly improves the average rank of relevant results for simulated data. Furthermore, they have performed an online interleaving experiment on a live search engine for scientific articles, which resulted in a significant improvement for the unbiased model. Wang et al. [21] have shown an improvement in MRR (Mean Reciprocal Rank) for the unbiased learning-to-rank models for personal search.

We train ranking models to check whether unbiased ranking models improve over their biased counterparts and to compare our method of propensity estimation to the EM method. For our training data we collect a sample of about 40,000 queries which have received at least one click. The sample is collected from four days of search logs. We train listwise ranking models using the LambdaMART algorithm [3]. We use the DCG metric [18] as our loss function. We define \(\mathrm {rel}_{ij}\) to be 1 if document j was clicked for query i, and 0 otherwise. We train three models - one without position bias correction (baseline biased), one with position bias correction using propensity estimates from the EM method (baseline EM), and finally a model with position bias correction using propensity estimates from our method (proposed method). All models use DCG as a loss function, with baseline biased using no position bias correction and the other models using inverse-propensity-weighted relevances [16]. We use the propensities estimated for eBay web search as shown in Fig. 1 (left) - the red dashed curve for the proposed method and the black dotted curve for baseline EM. Our training and test data are also from web search (i.e. browser) only. We use the same 25 features, selected from our top ranking features, for all models. We use the same hyperparameters for all the models: the number of trees is 100 and the shrinkage is 0.1 (we have fixed the number of trees and tuned the shrinkage for the baseline model; the same values are then applied to all models).
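The inverse-propensity weighting of the relevance labels can be illustrated as follows; the power-law propensity values here are hypothetical stand-ins for the estimates of Sect. 4.

```python
# Inverse-propensity weighting [16]: a click observed at rank r
# contributes rel / p_r to the DCG-based loss, so clicks at lower
# (deeper) ranks, which are less likely to be observed, are up-weighted.
# The propensity values below are assumed, not our fitted estimates.
propensity = {r: r ** -0.8 for r in range(1, 501)}

def ipw_relevance(clicked, rank_shown):
    # Unclicked documents keep relevance 0; clicked ones get 1 / p_r.
    return clicked / propensity[rank_shown]
```

The weighted labels are then fed to the listwise learner in place of the raw click labels, leaving the training algorithm itself unchanged.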

Our test data contains a sample of about 10,000 queries from four days of eBay search logs. Since the test data has the same position bias as the training data, we cannot rely on standard ranking metrics such as DCG, NDCG (Normalized Discounted Cumulative Gain), or MRR (Mean Reciprocal Rank). Another option would be to use inverse-propensity-weighted versions of these metrics to remove the presentation bias. However, the true propensities are unknown to us, and we obviously cannot use estimated propensities for evaluation since part of the evaluation is checking whether our propensity estimates are good. For that reason we choose a different approach for evaluation. Namely, we fix the rank of items in the test data, i.e. we select items from different queries that appeared at a given fixed rank. By selecting the items from a fixed rank in the evaluation set we effectively eliminate position bias, since all of the items are affected by position bias in the same way (the observation probability is the same for all the items because the rank is the same). Then we compare the ranking models as classifiers for those items, which means that we evaluate how well the models can distinguish items that were clicked from ones that were not. We use AUC (Area Under the Receiver Operating Characteristic Curve) as our evaluation metric.
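A minimal sketch of this fixed-rank evaluation, with hypothetical (rank shown, model score, clicked) records and a small hand-rolled AUC:

```python
# Hypothetical evaluation records: (rank_shown, model_score, clicked).
records = [
    (1, 2.1, 1), (1, 1.7, 1), (1, 0.4, 0), (1, 0.9, 0),
    (2, 1.5, 1), (2, 0.3, 0), (2, 0.8, 0), (2, 1.1, 1),
]

def auc_at_fixed_rank(records, rank):
    # Keep only items shown at the given rank, so the observation
    # probability is identical for all of them, then score the model
    # as a click/no-click classifier.
    pos = [s for r, s, c in records if r == rank and c == 1]
    neg = [s for r, s, c in records if r == rank and c == 0]
    # AUC = P(score of a random positive > score of a random negative),
    # counting ties as one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Evaluating AUC separately at each fixed rank avoids propensity-weighted metrics, whose weights are themselves the quantity under evaluation.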

Table 1. AUC improvement of the proposed method compared to two baselines - baseline biased and baseline EM [21]. The validation set contains documents from a fixed rank, shown in the first column. The next two columns show the improvements in AUC. Error bars are obtained using 1,000 bootstrap samples of the test data - we show the mean and standard deviation of the improvement over the bootstrap samples.

The results are presented in Table 1, where we show results for fixed ranks 1, 2, 4, 8, 16, and 32. To estimate the statistical significance of the improvements we have computed them on 1,000 bootstrap samples of the test data. In Table 1 we show the mean and standard deviation over the bootstrap samples (the distribution of the results on the bootstrap samples is close to Gaussian, as expected, so the mean and standard deviation are enough to describe the full distribution). As we can see, for all ranks the proposed method outperforms both baselines: both unbiased models significantly outperform baseline biased, and our proposed method outperforms baseline EM as well. The improvements are statistically significant for all ranks except rank 32, where they are not as large; for ranks below 32 the improvements become minor.

6 Summary and Future Work

In this work we have introduced a new method for estimating click propensities for eCommerce search without randomizing the results during live search. Our method uses query-document pairs that appear more than once and at different ranks. Although we have used eCommerce search as our main example, the method is general and can be applied to any search engine for which the ranking naturally changes over time. The clear advantage of our method over result randomization is that it does not affect live search results, which can have a negative impact on the search experience, as has been shown in the literature [21]. We have compared our method to the EM (Expectation Maximization) based method proposed in [21] and have shown that our proposed method outperforms the EM-based method on eBay data. Another approach for direct estimation of propensities was proposed in parallel with our work [1]. However, our method has a few clear advantages, such as not relying on multiple rankers in the system and not requiring a large amount of data for each pair of ranks. This has allowed us to estimate propensities up to ranks that are much lower than previously computed in the literature. Our proposed approach is robust and we believe that it will find widespread use for unbiased learning-to-rank modeling, especially in the eCommerce domain.

We have used simulated data to show that our method can give accurate estimates of the true propensities. We have applied our method to eBay search results to separately estimate propensities for web and mobile search. We have also trained ranking models and compared the performance of the unbiased model using the estimated propensities to two baselines - one without bias correction and one that corrects position bias using estimates from the EM method. Using a validation dataset of documents from a fixed rank we have shown that our unbiased model outperforms both baselines in terms of the AUC metric.

The focus of this work is propensity estimation from query-document pairs that appear at multiple different ranks. Importantly, we have addressed the case where the same query-document pair appears only a few times at different ranks (as few as twice). This method can be generalized to use query-document pairs that appeared at a single rank only, by incorporating appropriate priors and using Gibbs sampling to estimate the posterior distribution of the propensities. We plan to study this approach in future work. We are also planning to estimate and compare propensities for different classes of queries (such as queries for electronics versus fashion categories) and user demographics, as well as different sort types, such as sort by price.