1 Introduction

Information Retrieval (IR) in the microblogosphere, such as Twitter, has attracted increasing research attention along with the rapid development of social media. To explore information seeking behavior in the microblogosphere, TREC first introduced a Real-Time Search Task (RTST) in 2011 [15], which can be summarized as “At time T, give me the most relevant tweets about topic X”.

However, it is inherently challenging to develop an effective real-time IR platform in the context of the microblogosphere. First, in contrast to traditional web search, real-time search usually faces severe vocabulary mismatch. Since tweets are very short, there is a large risk that query terms fail to match any word observed in relevant tweets. This problem is especially severe when people search for entities, which often have several alternative aliases. Besides, real-time search usually expresses an information need about something that is happening right now. It is therefore crucial for the IR approach to boost recent tweets relevant to the given topic. This real-time information need requires search engines to trade off between recency and the relevance score computed between the query and the tweet.

Query Expansion (QE) methods based on pseudo-relevance feedback (PRF) [11, 13, 18] are widely used in microblog search to mitigate the problems mentioned above. However, these methods rely heavily on the assumption that the top ranked documents in the initial search are relevant and contain good words for query expansion. In the real world, this assumption does not always hold in the microblogosphere [4, 14]; consider, for example, a query containing proper nouns that are difficult to interpret. Moreover, even if the top ranked documents are highly relevant to the topic, it is still very likely that they contain numerous topic-unrelated words due to the informality of tweet content [14].

To overcome the limitations of existing methods, we utilize Freebase as the knowledge source to infer more topic-related context information for each query. Freebase is a practical, scalable tuple database used to organize general human knowledge [3], covering a large amount of knowledge from different aspects (domains) in a hierarchical structure. In this paper, we propose a knowledge query generation method, in which we first match related concepts in Freebase with respect to the query, and then extract useful terms from different properties of the concepts to generate the knowledge query. By interpolating the original query with the knowledge query, we can better reflect the user’s information need.

To further incorporate the temporal evidence in the microblogosphere, we follow the work of Li and Croft [9] and incorporate a prior distribution based on document recency into the language modeling framework. More specifically, while selecting top knowledge terms from Freebase with an association based method, we assign each top ranked pseudo-relevance document a time prior, so that words appearing in more recent documents receive higher probability.

The main contributions of this paper are: (1) we propose a novel approach that generates knowledge terms from Freebase to expand the original query, which results in a better understanding of the information need; (2) we incorporate temporal evidence into our QE method to trade off between relevance and recency; (3) we perform a set of experiments on two official Twitter test collections published by TREC to compare our proposed method with state-of-the-art baselines. The experimental results demonstrate that our approach yields significantly better retrieval performance.

2 Related Work

QE methods based on PRF assume that the most frequent terms in the pseudo-relevance documents are useful, which may not hold in practice. Cao et al. [4] therefore integrated a term classification process to predict the effectiveness of expansion terms. Miyanishi et al. [14] proposed manual tweet selection feedback to improve retrieval performance. However, this method sometimes fails due to the content redundancy of tweets, which contain meaningless words that may degrade search results. Several approaches have been proposed to use external resources such as Wikipedia, WordNet and ConceptNet to improve query expansion [7, 17]. Medelyan et al. [10] explored the possibility of using Wikipedia’s articles as an external corpus to expand ad-hoc queries and showed that Wikipedia is especially useful for improving weak queries that PRF alone cannot help. Pan et al. [16] proposed using Dempster-Shafer evidence theory to measure the certainty of expansion terms drawn from the Freebase structure. To the best of our knowledge, ours is the first work to apply Freebase-based query expansion to microblog search.

Previous work has shown that temporal evidence can be incorporated into IR [5, 6]. Li and Croft [9] introduced a document prior based on recency into the language modeling framework for retrieval. Liang et al. [11] proposed a temporal re-ranking component to evaluate the temporal aspects of documents. Miyanishi et al. [14] assumed that similar temporal models share similar temporal properties and proposed a query-document dependent temporal relevance model. Albakour et al. [1] introduced a decay factor to balance short-term and long-term interests for a given topic.

3 Proposed Methods

Given the RTST, we assume that a query Q is obtained as a sample from a generative model \(\hat{\theta }_Q\), while a document D is generated by a model \(\hat{\theta }_D\). With \(\hat{\theta }_Q\) and \(\hat{\theta }_D\) denoting the estimated query and document language models respectively, following [8], the relevance score of D with respect to Q can be computed by the following negative KL-divergence function:

$$\begin{aligned} S(Q,D) = -D(\hat{\theta }_Q||\hat{\theta }_D) \propto \sum _{w \in V} P(w|\hat{\theta }_Q)\cdot \log {P(w|\hat{\theta }_D)} \end{aligned}$$
(1)

Within this ranking formula, the retrieval problem is essentially reduced to estimating \(\hat{\theta }_Q\) and \(\hat{\theta }_D\). In principle, any language model can be used for the query and the document, which makes the framework very flexible.
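As a concrete illustration, the following minimal Python sketch computes the rank-equivalent score of Eq. 1. It assumes both models are represented as word-to-probability dictionaries and that the document model is already smoothed so the logarithm is defined for every scored word; both representation choices are ours, not from the paper.

```python
import math

def kl_score(theta_q, theta_d):
    """Rank-equivalent negative KL-divergence S(Q, D) from Eq. 1.

    theta_q, theta_d: dicts mapping words to probabilities. theta_d is
    assumed to be smoothed (e.g. Dirichlet), so every word with nonzero
    query probability has nonzero document probability.
    """
    return sum(p_q * math.log(theta_d[w])
               for w, p_q in theta_q.items() if p_q > 0)
```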

The starting point of our study is to infer more topic-related context for the query with the help of Freebase. In this section, we first explain why we choose Freebase as our knowledge base. Based on the characteristics of Freebase, we then describe our proposed knowledge query generation method in detail. Further improvements can be obtained by combining the knowledge-based query expansion with a model-based pseudo-relevance feedback method.

3.1 Why We Choose Freebase

Freebase is a large collaborative knowledge base consisting of data harvested from sources such as the Semantic Web and Wikipedia, as well as data contributed by community members. In Freebase, human knowledge is described by structured categories, known as types, and each type has a number of defined properties. In this way, Freebase combines the scalability of structured databases with the diversity of collaborative wikis into structured general human knowledge [3]. This structured knowledge has two advantages over semi-structured or plain content: (1) when searching Freebase (via its API), different types can be integrated for a more accurate concept, and some types such as name and alias are more important; (2) when generating knowledge terms, we can treat different types and their corresponding properties as different evidence sources.

3.2 Generation of Knowledge Query

We generate the knowledge query based on types, aiming to extract terms from different properties for a given query. The basic steps of our proposed method are:

  • Concept Match. We select topic-related concepts with the help of the Freebase API. Taking the query “Mila Kunis in Oz Movie” as an example, we match the two concepts “Mila Kunis” and “The Wizard of Oz” in Freebase.

  • Term Selection. Freebase describes the human knowledge of a given concept using types and properties. For some important meta types such as alias, name and notable_for, we directly add terms from these types’ properties to the knowledge query. For other types (i.e. description and domain-specific types), we adopt an association based term selection method to extract the top K topic-related terms.

  • Domain Filtering. A concept in Freebase may be involved in several domains, such as TV, book and celebrities. When searching microblogs about an entity (or concept) during a specific time period, users usually focus on one specific domain of that entity. Thus, aside from the common domain, we only use Freebase knowledge from one specific domain.

We then treat the top ranked knowledge terms from the selected properties equally to form a new knowledge query \(Q_{fb}\). The knowledge query model \(\hat{\theta }_{Q_{fb}}\) is then interpolated with the original query model \(\hat{\theta }_Q\):

$$\begin{aligned} P(w|\hat{\theta }_{Q_1}) = (1-\alpha ) \cdot P(w|\hat{\theta }_Q) + \alpha \cdot P(w|\hat{\theta }_{Q_{fb}}) \end{aligned}$$
(2)

where \(\alpha \in [0,1]\) is the weighting parameter to control the influence of the knowledge query. Both \(\hat{\theta }_Q\) and \(\hat{\theta }_{Q_{fb}}\) are estimated according to the maximum likelihood estimator.
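A direct rendering of Eq. 2, again under our assumption that query models are stored as word-to-probability dictionaries:

```python
def interpolate_query(theta_q, theta_q_fb, alpha=0.5):
    """Eq. 2: mix the original query model with the knowledge query model."""
    vocab = set(theta_q) | set(theta_q_fb)
    return {w: (1 - alpha) * theta_q.get(w, 0.0)
               + alpha * theta_q_fb.get(w, 0.0)
            for w in vocab}
```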

Concept Match. We now describe our concept match algorithm in detail, which consists of two steps:

  1. Noun Phrase Detection. For a given query Q, we first split Q by whitespace to obtain a sequence of words \(q_1, q_2, \cdots q_n\). Part-of-speech tagging is then performed on each word, and all noun phrases are extracted from the original query with a rule-based method [2].

  2. Maximum Match. We regard each noun phrase as a new query and retrieve related concepts as described in Algorithm 1. The FreebaseSearch function searches the given query in Freebase and returns the top ranked concept if one is found. The match process ends when a related concept is found or when none of the individual words yields a match.

(Algorithm 1: maximum match concept search)
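Since Algorithm 1 itself is not reproduced here, the sketch below shows one plausible reading of the maximum match step: try the longest word span of the noun phrase first and shrink the span until a concept matches or even single words fail. The freebase_search stub is purely illustrative and stands in for a call to the Freebase search API.

```python
def freebase_search(query):
    """Stand-in for the Freebase search API: return the top ranked
    concept for `query`, or None if nothing matches."""
    raise NotImplementedError  # illustrative stub

def maximum_match(noun_phrase):
    """Greedy longest-span matching of a noun phrase against Freebase."""
    words = noun_phrase.split()
    for span in range(len(words), 0, -1):        # longest spans first
        for start in range(len(words) - span + 1):
            concept = freebase_search(" ".join(words[start:start + span]))
            if concept is not None:
                return concept
    return None                                   # no single word matched
```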

Term Selection. For each concept returned by the Freebase API, the search result provides different types and corresponding properties that reflect different aspects of the concept. Some types (i.e. meta types) are very general and precise, such as alias, name and notable_for in the common domain; for these types we directly add the property terms to the knowledge query. For other types, such as description and domain-specific ones whose properties contain lengthy text, an association based term selection method is used to extract topic-related knowledge terms. Effective term selection is an important issue for any automatic query expansion technique. In microblog retrieval, a good expansion term should satisfy the following criteria:

  1. The term should be semantically associated with the concept from the original query;

  2. The term extracted from Freebase should be suitable for the local corpus and should provide useful information for the query;

  3. As the user’s intent may change and events related to the given topic develop over time, the ranking function should favor short-term words that are mostly used in recent tweets.

The candidate terms extracted from Freebase already meet the first criterion to some extent. To satisfy the second criterion, we score the candidate terms with an association based method over the top N ranked pseudo-relevance documents (PRD):

$$\begin{aligned} Score(w) = \sum _{D\in PRD} P(D)\cdot P(w|D)\cdot \prod _{i=1}^{n}P(q_i|D) \end{aligned}$$
(3)

where P(D) is the document prior, usually assumed to be uniform, and \(\prod _{i=1}^{n}P(q_i|D)\) is the query likelihood given the document model, traditionally computed with Dirichlet smoothing. To further meet the third criterion, we follow the work of [9] and incorporate temporal evidence into the document prior in Eq. 3 using an exponential distribution:

$$\begin{aligned} P(D|T_D) = r\cdot e^{-r(T_Q-T_D)} \end{aligned}$$
(4)

where r is the exponential parameter that controls the temporal influence, \(T_Q\) is the query issue time and \(T_D\) is the tweet post time. Both \(T_Q\) and \(T_D\) are measured in fractions of days. Note that \(T_D\) is always less than \(T_Q\), since we cannot use future evidence. Finally, we select the top K scored words from the common description and domain-specific properties to form the knowledge query \(Q_{fb}\), along with the terms extracted from the meta properties.
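Putting Eqs. 3 and 4 together, a minimal sketch of the association based scoring with the recency prior might look as follows. Here each document model is assumed to be a Dirichlet-smoothed word distribution stored as a dictionary, and times are fractional days as in the paper.

```python
import math

def time_prior(t_query, t_doc, r=0.01):
    """Eq. 4: exponential recency prior, with times in fractional days."""
    return r * math.exp(-r * (t_query - t_doc))

def score_term(w, query_terms, prd, t_query, r=0.01):
    """Eq. 3 with the uniform document prior replaced by the recency prior.

    prd: list of (doc_model, post_time) pairs over the top N
    pseudo-relevance documents; doc_model maps words to (smoothed)
    probabilities.
    """
    score = 0.0
    for doc_model, t_doc in prd:
        query_likelihood = 1.0
        for q in query_terms:                    # prod_i P(q_i | D)
            query_likelihood *= doc_model.get(q, 0.0)
        score += (time_prior(t_query, t_doc, r)
                  * doc_model.get(w, 0.0) * query_likelihood)
    return score
```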

Domain Filtering. One advantage of Freebase is that it provides the domain information of a given concept. A domain is a collection of types sharing a namespace (e.g. TV, book and celebrities). As mentioned above, only properties from the common domain and the top ranked domain are used for term selection. Hence, a domain ranking function is needed to rank all the domains of a given concept. We first define the domain language model \(\hat{\theta }_{dm}\) as follows:

$$\begin{aligned} P(w|\hat{\theta }_{dm}) = \frac{c(w,dm)}{\sum _{w' \in dm}c(w', dm)} \end{aligned}$$
(5)

where \(c(w,dm)\) is the count of word w occurring in the domain-related properties. Then, we compute the negative KL-divergence score between the domain language model and the document language model estimated on the top N ranked PRD using the empirical word distribution:

$$\begin{aligned} Score (dm) = -D(\hat{\theta }_{dm} || \hat{\theta }_{PRD}) \propto \sum _{w \in dm} P(w|\hat{\theta }_{dm})\cdot \log {P(w|\hat{\theta }_{PRD})} \end{aligned}$$
(6)

Note that we use the Dirichlet smoothing method when estimating \(\hat{\theta }_{PRD}\). With this formula, we can rank all the domains and choose terms from the top ranked domain during term selection.
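A compact sketch of the domain ranking in Eqs. 5 and 6, assuming each domain's property text is available as a token list and theta_prd is the smoothed PRD model (so every domain word has nonzero probability):

```python
import math
from collections import Counter

def domain_model(domain_tokens):
    """Eq. 5: maximum likelihood word distribution over the
    domain-related property text (a list of tokens)."""
    counts = Counter(domain_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def domain_score(theta_dm, theta_prd):
    """Eq. 6: rank-equivalent negative KL-divergence; higher is better."""
    return sum(p * math.log(theta_prd[w]) for w, p in theta_dm.items())
```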

3.3 Mixture Feedback Model

With the knowledge query in place, we believe the information need is expressed more clearly, which should lead to higher precision in the top retrieved documents. Based on this hypothesis, we further apply model-based feedback to update the query representation. More specifically, we update \(\hat{\theta }_{Q_1}\) with the simple mixture model \(\hat{\theta }_{F}\) which is widely used in microblog retrieval [11, 18]:

$$\begin{aligned} P(w|\hat{\theta }_{Q_2}) = (1-\beta ) \cdot P(w|\hat{\theta }_{Q_1}) + \beta \cdot P(w|\hat{\theta }_{F}) \end{aligned}$$
(7)

where \(\beta \in [0,1]\) is a weighting parameter to control the amount of model-based feedback.

Model-based feedback assumes each feedback document is generated by mixing the query topic model \(\hat{\theta }_{F}\) with the collection language model \(\hat{\theta }_C\). Under this simple mixture model, the log-likelihood of the feedback documents F is:

$$\begin{aligned} \log {P(F|\hat{\theta }_{F})} = \sum _{w } {c(w,F}) \cdot \log ( (1-\lambda ) \cdot P(w|\hat{\theta }_{F})+\lambda \cdot P(w|\hat{\theta }_C) ) \end{aligned}$$
(8)

where \(c(w,F)\) is the count of word w occurring in the set of feedback documents F. We follow the work of [18] and estimate \(\hat{\theta }_{F}\) with the EM algorithm, using a fixed smoothing parameter \(\lambda =0.5\). Whether or not the query finds knowledge terms in Freebase, the query model is updated by this model-based feedback.
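For reference, a minimal sketch of that EM procedure in the style of [18]; the fixed iteration count here is our own choice rather than the paper's:

```python
def mixture_feedback_em(counts, theta_c, lam=0.5, iters=30):
    """Estimate P(w | theta_F) for the simple mixture model of Eq. 8.

    counts: dict of c(w, F) over the feedback documents.
    theta_c: collection language model (word -> probability).
    """
    theta_f = {w: 1.0 / len(counts) for w in counts}   # uniform init
    for _ in range(iters):
        # E-step: probability that an occurrence of w was generated
        # by theta_F rather than by the collection model.
        t = {w: (1 - lam) * theta_f[w]
                / ((1 - lam) * theta_f[w] + lam * theta_c.get(w, 1e-12))
             for w in counts}
        # M-step: re-estimate theta_F from the expected counts.
        norm = sum(counts[w] * t[w] for w in counts)
        theta_f = {w: counts[w] * t[w] / norm for w in counts}
    return theta_f
```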

4 Evaluation

4.1 Experimental Setup

Data Set. Two corpora (the Tweets11 and Tweets13 collections) are used in our experiments, crawled via the official API released by the TREC organizers [12]. The Tweets11 collection contains a sample of about 16 million tweets, ranging from January 24, 2011 to February 8, 2011, while the Tweets13 collection contains about 259 million tweets, ranging from February 1, 2013 to March 31, 2013. Tweets11 is used to evaluate the proposed real-time Twitter search systems over the 50 official topics of the TREC’11 Microblog track and the 60 official topics of the TREC’12 Microblog track, respectively. Tweets13 is used to evaluate the systems over the 60 official topics of the TREC’13 Microblog track. In our experiments, the topics for Tweets13 are used for tuning the parameters, and we then use the best parameter settings to evaluate our methods on the topics for Tweets11.

The tweets and their corresponding topic information were preprocessed in several ways. We first discarded non-English tweets using a language detector named ldig. Second, in conformance with the track’s guidelines, all simple retweets were removed by deleting documents beginning with the string ‘RT’. Moreover, we crawled all shortened URLs and added the web page title to the original tweet. Finally, each tweet was stemmed using the Porter algorithm and stopwords were removed using the InQuery stopword list.
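A simplified sketch of two of these steps (retweet removal, then stemming with stopword removal); language detection with ldig and URL title crawling are omitted, and NLTK's Porter stemmer and English stopword list stand in for the exact Porter implementation and InQuery list used in the paper:

```python
# Requires: pip install nltk; then nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop = set(stopwords.words("english"))   # stand-in for the InQuery list

def preprocess(tweet_text):
    """Drop simple retweets; lowercase, remove stopwords, and stem."""
    if tweet_text.startswith("RT"):
        return None                       # simple retweet, discarded
    return [stemmer.stem(tok) for tok in tweet_text.lower().split()
            if tok not in stop]
```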

Evaluation Metric. In the TREC Microblog track, tweets were judged on a three-point scale [15]: irrelevant (labeled 0), minimally relevant (labeled 1), and highly relevant (labeled 2). The main evaluation metrics are Mean Average Precision (MAP) over the top 1000 documents and Precision at N (P@N), both widely used in IR. We report MAP and P@30 with respect to allrel (i.e. the set of tweets judged highly or minimally relevant).

Baselines. To demonstrate the performance of our proposed method, we compare our knowledge-based query expansion methods with several baselines. The simple KL-divergence retrieval model (denoted SimpleKL) [19] is our first baseline: we estimate \(\hat{\theta }_Q\) and \(\hat{\theta }_D\) with the empirical word distribution and use Dirichlet smoothing for the document model. Throughout this paper, we set the Dirichlet smoothing parameter \(\mu = 100\). We use the Simple Mixture Model [18] (denoted QESMM) as our second baseline, optimizing the number of feedback documents to 5 and the number of terms in the feedback model to 100; the interpolation parameter \(\beta \) is set to 0.9. We also compare our method with the state-of-the-art real-time ranking model (denoted RTRM) under the language modeling framework, proposed by [11]. RTRM uses two-stage pseudo-relevance feedback query expansion to estimate the query language model and expands documents containing shortened URLs. We tune the parameters of all models with the TREC’13 topics on the Tweets13 corpus.

4.2 Experimental Results

We conduct several experiments to measure the effects of our query expansion methods. For our knowledge-based query expansion method, we label the variant with query model \(\hat{\theta }_{Q_1}\) as QEFB, and the one with \(\hat{\theta }_{Q_2}\) as QEFB+SMM. When selecting knowledge terms from the Freebase description and domain-specific properties, we set the number of top ranked PRD N to 100 and the number of expanded terms K to 10. Following Li and Croft [9], unless otherwise specified, we set the parameter r to 0.01. \(\alpha \) in Eq. 2 is set to 0.5, which means we regard the original query and the knowledge query as equally important. The query expansion parameters of the mixture feedback model are set as in QESMM. All parameters are tuned with the TREC’13 topics. We then test the optimized models on the Tweets11 corpus with the TREC’11 and TREC’12 topics.

Table 1. The performance comparison of different query expansion methods. The best performances are marked in bold.

Table 1 shows the performance comparison of different query expansion methods. To test for statistical significance, we used a paired t-test; \(\dag \) and \(\ddag \) indicate that the corresponding improvements over SimpleKL and QESMM are statistically significant (\(p<0.05\)), respectively. Note that all the methods listed in the table estimate the document model as in SimpleKL. As we can see, all the query expansion methods achieve significant MAP and P@30 improvements over SimpleKL, which indicates the importance of query expansion in microblog retrieval. When the query is expanded with the Freebase knowledge query, our approach retrieves more relevant documents in the top results. We can further improve retrieval performance by combining the knowledge-based expansion method with the mixture feedback model. Our knowledge-based query expansion method QEFB+SMM achieves the best retrieval performance on all three topic sets. More specifically, for the TREC’12 topics, QEFB+SMM improves MAP over SimpleKL and QESMM by 23.83 % and 12.35 %, respectively, while the corresponding P@30 improvements are 14.93 % and 11.56 %. For the TREC’11 topics, QEFB+SMM raises MAP over SimpleKL and QESMM by 18.73 % and 9.36 %, respectively, while the corresponding P@30 improvements are 13.97 % and 4.04 %. Our method also beats the state-of-the-art baseline RTRM, which performed well in the Microblog track.

4.3 Discussion

Several parameters in our proposed approach can affect system performance. In this section, we analyze the robustness of the parameter settings in the knowledge-based query expansion method. All experiments in this section are run on the TREC’13 topics, which are used for parameter selection.

Effects of Knowledge Query. For query modeling, we propose using the knowledge query to make the information need more comprehensible. Several factors affect the quality of the knowledge terms: (1) whether the maximum match algorithm can find topic-related concepts in Freebase; (2) whether the Freebase structured information (i.e. the type and domain information used in term selection and domain filtering) is helpful; (3) how the number of knowledge terms K affects retrieval performance; and (4) how the number of pseudo-relevance documents N used for term selection and domain selection affects retrieval performance.

To answer the first question, we create the run QEManualFB, in which we manually select the concept from Freebase for each query. For the second question, we discard the structured information of Freebase, select terms from all the properties and treat all properties equally. We label this run QEWiki.

Fig. 1. Sensitivity to the selected knowledge term number K.

Fig. 2. Sensitivity to the PRD number N for knowledge term selection.

Figure 1 shows the MAP and P@30 scores of all models for different K with fixed \(N=100\). MAX means that all candidate terms satisfying \(Score(w) > 0\) in Eq. 3 are selected. We can see that although QEFB is not better than QEManualFB, the performance gap between them is small, which verifies the effectiveness of our concept match algorithm. QEFB is consistently better than QEWiki, which demonstrates the importance of the structured information in Freebase. Moreover, when K is set around 10, QEFB reaches its optimal retrieval performance, significantly better than SimpleKL, which indicates the effectiveness of the association based term selection method.

We then fix the term number K to 10 and vary the PRD number N for term selection and domain filtering. Figure 2 shows the MAP and P@30 scores of our QEFB model against different values of N. We can observe that when N is larger than 100, the performance changes only slightly. This means that the top 100 pseudo-relevance documents provide enough information for selecting good knowledge terms from the description property.

Effects of the Interpolation Coefficients. Recall that we first expand the query with the knowledge query, and then further expand the updated query \(\hat{\theta }_{Q_1}\) with model-based feedback. The first-stage query expansion is controlled by the coefficient \(\alpha \), while the second-stage expansion is controlled by \(\beta \). Figure 3 (left) shows how the performance of QEFB (\(N=100, K=10\)) changes with \(\alpha \). When \(\alpha =0\), QEFB reduces to the baseline method SimpleKL; when \(\alpha =1\), we completely ignore the original query and use only the knowledge query. We can observe that QEFB outperforms SimpleKL when \(\alpha \) is no greater than 0.7, with optimal performance around \(\alpha =0.5\). Figure 3 (right) shows how the performance of QEFB+SMM changes with \(\beta \). When \(\beta =0\), QEFB+SMM reduces to QEFB. The second-stage expansion appears more robust and is consistently better than QEFB. After knowledge-based query expansion, the query is more comprehensible and retrieves more related tweets at the top, which leads to a further improvement with traditional model-based feedback.

Fig. 3. Sensitivity to the first-stage knowledge query expansion coefficient \(\alpha \) (left) and the second-stage mixture feedback interpolation coefficient \(\beta \) (right).

5 Conclusion

In this study, we proposed knowledge-based query expansion to address the problems of microblog search. With knowledge terms derived from Freebase, queries in the microblogosphere become more comprehensible, and thus more relevant documents can be retrieved. Since the knowledge terms from Freebase must also co-occur with query terms in the PRD, the method has the potential to alleviate the topic drift often induced by knowledge-based QE. Freebase’s structured information is also well utilized in the knowledge query generation procedure. Moreover, we incorporated temporal evidence into the query representation, so the proposed method favors recent tweets, satisfying the real-time information need in microblog retrieval. Our thorough evaluation, using two standard TREC collections, demonstrates the effectiveness of the proposed method.