Evidential Top-k Queries Evaluation: Algorithms and Experiments

Bousnina, Fatma Ezzahra; Chebbah, Mouna; Bach Tobji, Mohamed Anis; Hadjali, Allel; Ben Yaghlane, Boutheina

doi:10.1007/978-3-319-91473-2_35

Fatma Ezzahra Bousnina^16,18,
Mouna Chebbah¹⁷,
Mohamed Anis Bach Tobji¹⁷,
Allel Hadjali¹⁸ &
…
Boutheina Ben Yaghlane¹⁹

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 853))

Included in the following conference series:

International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems

1613 Accesses

Abstract

Top-k queries represent a vigorous tool to rank-order answers and return only the most interesting ones. ETop-k queries were introduced to discriminate answers in the context of evidential databases. Due to their interval degrees, such answers seem to be difficult to rank-order and to interpret. Two methods of ranking intervals were proposed in the evidential context. This paper presents an efficient implementation of these methods and discusses the experimental results obtained.

Access provided by CONRICYT-eBooks. Download conference paper PDF

On Top-k Retrieval for a Family of Non-monotonic Ranking Functions

Generic Top-k Query Processing with Breadth-First Strategies

On the Top-k Retrieval Problem for Ontology-Based Access to Databases

Keywords

1 Introduction

Querying imperfect databases received a lot of importance recently with the emergence of domains like sensor networks, data cleaning, recommendation and recommender systems, etc. Indeed, information generated from this type of applications is obviously pervaded with imperfection (uncertainty, imprecision, ignorance...). That is why, database models that handle imperfect data were introduced (Probabilistic, possibilistic and evidential databases [1, 3, 5, 6, 16]). The latter, models several types of imperfect data but also perfect information using theory of belief functions. In database management, querying is a fundamental step. As consequence, multiple types of ‘imperfect’ queries were introduced. We name the evidential skyline [9], the extended relational queries [1] and the evidential Top-k queries [4].

In general, Top-k queries are needed in real world applications. For an example, movies, music and books are ordered by the preferred ones, researchers by their H-index, etc. Imperfect top-k queries can be very challenging when it comes to their semantics but also when it comes to their practical implementation.

In this paper, we present two algorithms of evidential Top-k queries: The first named NaiETop-k is based on the computation of the preference degrees (called evidential scores) as introduced in [18] and adapted in [4]. The second named OptETop-k is based on an optimized version of the preference degree calculation justified by the complementarity property as detailed in [4]. The proposed implementation allows the ranking of all evidential scores and finally, it provides the k most interesting results among all rank-ordered answers.

Table 1 is an example of an evidential table that stores some users’ preferences about books: $b_1, b_2, b_3, b_4$. This relation includes three attributes: The ID which is a unique reader identifier. The BookRate that includes the reader’s appreciations about one book and/or several books modeled through the belief functions theory (in this context only few researches addressed the issue of preference elicitation using this theory [2, 11]). The CL which is a specific attribute to evidential databases that stores intervals of confidence about user’s responses.

Table 1. Books appreciations’ table: EDB

Full size table

This paper is organized as follows: we recall, in Sect. 2 some basic concepts about the belief functions theory and evidential databases. In Sect. 3, we remind needs and challenges of evidential Top-k queries and we present the mathematical materials to compute and compare Evidential Scores and Preference Degrees. Section 4 is dedicated to the presentation of proposed algorithms. Experiments and results are shown in Sect. 5. Section 6 is devoted to the conclusion and the future works.

2 Evidence Theory and Evidential Databases

Evidence theory named the belief functions theory or the Dempster-Shafer theory [7, 8, 17], is a powerful tool to model ignorance and to represent uncertain, imprecise and inconsistent information.

In the theory of belief functions, a set $\varTheta = \{ \theta _1, \theta _2,\ldots , \theta _n\}$ is a finite, non empty and exhaustive set of n elementary and mutually exclusive hypotheses related to a given problem. The set $\varTheta $ is called the frame of discernment or universe of discourse.

The power set $2^{\theta }= { \{\varnothing , \theta _1, \theta _2,\ldots ,\theta _n,\{\theta _1, \theta _2\},\ldots ,\{\theta _1, \theta _2,\ldots , \theta _n \} \}}$ is the set of all subsets of $\varTheta $.

A mass function, noted m, is a mapping from $2^\varTheta $ to the interval [0, 1]. The basic belief mass of an hypothesis x is noted m(x), it represents the belief on the truth of that hypothesis x. A mass function is also called basic belief assignment (bba). It is formalized such that:

$$\begin{aligned} \displaystyle {\sum _{x\subseteq \varTheta }}m^{\varTheta }(x)=1 \end{aligned}$$

(1)

If $m^\varTheta (x)>0$, x is called focal element. The set of all focal elements is denoted F and the couple $\{F,m\}$ is called body of evidence.

The belief function, denoted bel, is the minimal degree of support committed exactly to x such that:

$$\begin{aligned} bel(x) = \displaystyle {\sum _{y\subseteq x; y \ne \varnothing } m^\varTheta (y)} \end{aligned}$$

(2)

The plausibility function, denoted pl, is the maximal degree of support committed exactly to x such that:

$$\begin{aligned} pl(x)= \sum _{y \subseteq \varTheta ; x \cap y \ne \varnothing } m^\varTheta (y) \end{aligned}$$

(3)

An evidential database, denoted EDB, stores different types of data using the belief functions theory as shown in Table 2.

Table 2. The different types of information modeled in the evidential database

Full size table

Definition 1

[Compact Evidential Database]

An EDB has N objects and A attributes. An evidential value, noted $V_{la}$, is the value of an attribute a ($1\le a\le A$) for an object l ($1\le l\le N$) that represents a basic belief assignment.

$$\begin{aligned} V_{la} : 2^{\varTheta _{a}} \rightarrow [0,1] \end{aligned}$$

(4)

$$\begin{aligned} with \,\,m^{\varTheta _{a}}_{la}(\varnothing ) = 0 \quad and \quad \displaystyle {\sum _{x \subseteq \varTheta _a}m^{\varTheta _{a}}_{la}(x)=1} \end{aligned}$$

(5)

The set of focal elements relative to the bba $V_{la}$ is noted $F_{la}$ such that:

$$\begin{aligned} F_{la} = \{x \subseteq \varTheta _a / m_{la}(x) > 0\} \end{aligned}$$

(6)

A confidence level, CL, is a specific attribute that includes intervals. Each one represents the confidence about its object l in the evidential database. The confidence level is a pair of belief and plausibility [bel; pl] reflecting the pessimistic and the optimistic degrees of support about each object’ existence in the database [1, 14, 15].

Multiple types of queries can be applied over an EDB like the extended relational operators (select, project, join...) [1, 14, 15], skyline queries [9, 10] and ranking queries [4].

3 Evidential Top-k Querying

Top-k queries represent a mighty tool to order queries’ results and give only the most interesting answers. Top-k queries were firstly introduced in the multimedia systems [12, 13]. They use a score function to rank answers where only results with the highest scores are returned.

Evidential Top-k queries, denoted ETop-k, rank answers using an evidential score function and return the most interesting ones (with the highest scores). Contrary to usual top-k queries that give a ranking based on a score function with precise values, the ETop-k queries give answers based on a score function with intervals. The latter reflect the minimal and the maximal amounts of confidence about each answer.

Definition 2

[Evidential Score]

Let $R_i$ be a response generated from processing a query Q over an evidential database EDB of a size N and let $S(R_i)$ be the score function of that answer $R_i$ and $bel(R_i)$ and $pl(R_i)$ are respectively its belief and plausibility in the table, such that:

$$\begin{aligned} S(R_i)= [ bel(R_i); pl(R_i)] \end{aligned}$$

(7)

$$\begin{aligned} where \quad bel(R_i) = \dfrac{ \sum ^N_{l=1} bel_l(R_i) * bel_l}{N} \end{aligned}$$

$$\begin{aligned} pl(R_i) = \dfrac{ \sum ^N_{l=1} pl_l(R_i) * pl_l}{N} \end{aligned}$$

The belief of an answer, $bel(R_i)$, is a disjunction of the response’s beliefs in each object of the database. The belief of a response in one object l, denoted $bel_l$, is the product of its belief in the attribute and the belief of that object. Same for the plausibility of an answer, $pl(R_i)$. It is the disjunction of the response’s plausibilities in each object of the database where the plausibility of a response in one object l, denoted $pl_l$ is the product of its plausibility in the attribute and the plausibility of that object [1, 14].

Example 1

The Top-k query processed over the evidential database of Table 1 is the following [4]:

Q: SELECT BookRate FROM EDB ORDER BY S(BookRate) LIMIT k;

Four possible responses are computed using the evidential score as detailed in Definition 2:

S($b_1$) = [bel($b_1$); pl($b_1$)] = [0,0375; 0.325]
S($b_2$) = [bel($b_2$); pl($b_2$)] = [0,0375; 0.525]
S($b_3$) = [bel($b_3$); pl($b_3$)] = [0,125; 0.65]
S($b_4$) = [bel($b_4$); pl($b_4$)] = [0,0375; 0.1]

Often a top-k query processed over an evidential database gives a large number of results. These latter need to be ranked in order to respond to the objective of the given query. In the evidential case, the result is a set of intervals that must be compared. Two methods were introduced to compare interval results in EDBs’ context [4, 18].

(i)
The first method was introduced in [18] and adapted in [4]. It is about computing degrees of preference of two intervals and then compare their results to deduce the rank based on three cases:

Definition 3

[Preference Degree]

Let $S(R_i) = [bel_i;pl_i]$ and $S(R_j) = [bel_j;pl_j]$ be two evidential scores. Each one is an interval composed of a belief degree and a plausibility degree. The degree of one interval to be greater than the other one is called a degree of preference and denoted P.

The degree of preference that $S(R_i) > S(R_j)$ is defined such that:

$$\begin{aligned} P(S(R_i) > S(R_j)) = \dfrac{max(0,pl_i - bel_j)- max(0,bel_i - pl_j)}{(pl_i - bel_i) + (pl_j - bel_j)} \end{aligned}$$

(8)

The degree of preference that $S(R_i) < S(R_j)$ is defined such that:

$$\begin{aligned} P(S(R_i) < S(R_j)) = \dfrac{max(0,pl_j - bel_i)- max(0,bel_j - pl_i)}{(pl_i - bel_i) + (pl_j - bel_j)} \end{aligned}$$

(9)

The different cases of comparing intervals $S(R_i)$ and $S(R_j)$ are as follows:

If $P(S(R_i)> S(R_j))> P(S(R_j) > S(R_i))$, then $S(R_i)$ is said to be superior to $S(R_j)$, denoted by $S(R_i) \succ S(R_j)$.
If $P(S(R_i)> S(R_j)) = P(S(R_j) > S(R_i)) = 0.5$, then $S(R_i)$ is said to be indifferent to $S(R_j)$, denoted by $S(R_i) \sim S(R_j)$.
If $P(S(R_j)> S(R_i))> P(S(R_i) > S(R_j))$, then $S(R_i)$ is said to be inferior to $S(R_j)$, denoted by $S(R_i) \prec S(R_j)$.

(ii)
The second method optimizes the first one using the complementarity proof*, results are compared in order to deduce their rank [4]:

Definition 4

[Optimized Preference Degree]

Let $S(R_i)=[bel_i;pl_i]$ and $S(R_j)=[bel_j;pl_j]$ be two evidential scores. Every interval is composed of degrees of belief (bel) and plausibility (pl) and P is the calculated preference degree.

$$\begin{aligned} P(S(R_i) > S(R_j)) = \dfrac{max(0,pl_i - bel_j)- max(0,bel_i - pl_j)}{(pl_i - bel_i) + (pl_j - bel_j)} = \lambda \end{aligned}$$

(10)

The different cases of comparing intervals $S(R_i)$ and $S(R_j)$ are as follows:

If $\lambda > 0.5$ then $S(R_i) \succ S(R_j)$.
If $\lambda = 0.5$, then $S(R_i) \sim S(R_j)$.
If $\lambda < 0.5$ then $S(R_i) \prec S(R_j)$.

Cases that permit to minimize computations before using Definitions 3 or 4 are illustrated in Fig. 1.

Note here the importance of transitivity** property detailed in [18] to give the final ranking.

4 Implementation of Evidential Top-k Query

As the best of our knowledge, there is no implementation of evidential top-k queries. Indeed, we present in this paper an object-oriented implementation of two methods to rank evidential scores (the evidential intervals). The first method is naive, it consists on computing the preference degree through three steps each time: (a) it computes the preference degree that the first interval is superior to the second one and then (b) it computes the preference degree that the first interval is inferior to the second one. Finally, (c) it compares results and give the partial rank. This algorithm is presented in Table 3.

The second method is an optimization of the first one. Indeed, it consists on computing only in one step the preference degree and then deduce the partial order between two intervals. This algorithm is detailed in Table 4.

Finally, the last order is treated using a sorting algorithm, that ranks all evidential intervals and provide the k most interesting ones. The presented implementations offer two methods of evidential intervals’ ranking. Both algorithms use the object-oriented paradigm for its programming benefits.

Table 3. ETop-k naive algorithm

Full size table

Table 4. ETop-k optimized algorithm

Full size table

5 Experimental Study

In this section, we evaluate both algorithms from a performance point of view. We used a windows 10 operating system with 2.10 GHz CPU and 4 GB RAM. We also used Java programming language and NetBeans platform.

5.1 Data Sets

We used synthetic data sets with the following parameters (a) N the size of the database, (b) S the evidential score which is an interval of belief and plausibility [Bel; PL] with BEl, PL $\in $ [0;1] and BEL $\le $ PL^{Footnote 1}.

To generate a synthetic evidential database, the used algorithm uses a procedure that generates a synthetic S. Indeed, the procedure computes randomly a fixed number of evidential scores in the interval [0, 1]. Then one of the algorithms (naive or optimized) are processed in order to compare intervals. Finally, a sorting function is used to provide the final complete ranking of all intervals. Note that each interval is associated to a specific and unique item in the evidential database. In our example, the item is a specific book.

Table 5. Impact of the database size for methods: NaiTopK and OptTopK

Full size table

Experiments showed interesting results from a performance point of view. In fact, we varied the database size parameter (N) from 10 to 3000. The execution time did not exceed 4 min and 50 s for both algorithms. Results are presented in Table 5. Both algorithms showed interesting results. Moreover, OptETopK gave better ones as shown in Fig. 2. For example, OptETopK ranked 1500 tuples in 69 s against 60 s for NaiETopKNote. Note that complexity depends also on the intervals’ nature generated randomly as detailed theoretically in Sect. 3.

6 Conclusion

Throughout this paper, we presented an implementation of the Evidential Top-k query, ETop-k. In fact, we proposed two algorithms NaiETopK and OptETopK. Both methods showed interesting results when we varied the database size but OptETopK showed best performance in practice as shown theoretically in [4]. The proposed implementation is an important achievement of the evidential Top-k querying fitting the semantics of returning the k most credible answers.

Other types of queries, in the evidential context, like aggregation, range, threshold remain as a promising future works.

Notes

1.
Bel and Pl are two functions defined in the object-relational implementation of evidential databases in [5].

References

Bell, D.A., Guan, J.W., Lee, S.K.: Generalized union and project operations for pooling uncertain and imprecise information. Data Knowl. Eng. (DKE) 18, 89–117 (1996)
Article Google Scholar
Yaghlane, A.B., Denœux, T., Mellouli, K.: Elicitation of expert opinions for constructing belief functions. In: Uncertainty and Intelligent, Information Systems, pp. 75–88 (2008)
Chapter Google Scholar
Bousnina, F.E., Bach Tobji, M.A., Chebbah, M., Liétard, L., Ben Yaghlane, B.: A new formalism for evidential databases. In: Esposito, F., Pivert, O., Hacid, M.-S., Raś, Z.W., Ferilli, S. (eds.) ISMIS 2015. LNCS (LNAI), vol. 9384, pp. 31–40. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25252-0_4
Chapter Google Scholar
Bousnina, F.E., Chebbah, M., Bach Tobji, M.A., Hadjali, A., Ben Yaghlane, B.: On top-k queries over evidential data. In: 19th International Conference on Enterprise Information Systems (ICEIS), Porto, Portugal, vol. 1, pp. 106–113 (2017)
Google Scholar
Bousnina F., Chebbah M., Bach Tobji M., Hadjali A. and Ben Yaghlane B.: Object-relational implementation of evidential databases. In: 1st International Conference on Digital Economy (ICDEc), La Marsa, Tunisia, pp. 80–87 (2016)
Google Scholar
Cavallo, R., Pittarelli, M.: The theory of probabilistic databases. In: Proceedings of the 13th VLDB Conference, Brighton, UK, pp. 71–81 (1987)
Google Scholar
Dempster, A.P.: Upper and lower probabilities induced by a multiple valued mapping. Ann. Math. Stat. 38(2), 325–339 (1967)
Article Google Scholar
Dempster, A.P.: A generalization of Bayesian inference. J. R. Stat. Soc. Ser. B 30, 205–247 (1968)
MathSciNet MATH Google Scholar
Elmi, S., Benouaret, K., Hadjali, A., Bach Tobji, M.A., Ben Yaghlane, B.: Computing skyline from evidential data. In: Straccia, U., Calì, A. (eds.) SUM 2014. LNCS (LNAI), vol. 8720, pp. 148–161. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11508-5_13
Chapter Google Scholar
Elmi, S., Benouaret, K., HadjAli, A., Bach Tobji, M.A., Ben Yaghlane, B.: Requêtes skyline en présence des données évidentielles. In: Extraction et Gestion des Connaissances (EGC), pp. 215–220 (2015)
Google Scholar
Ennaceur, A., Elouedi, Z., Lefevre, E.: Multi-criteria decision making method with belief preference relations. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 22(04), 573–590 (2014)
Article MathSciNet Google Scholar
Fagin, R.: Combining fuzzy information from multiple systems. In: 15th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Montreal, Canada, pp. 216–226. ACM (1996)
Google Scholar
Fagin, R.: Fuzzy queries in multimedia database systems. In: 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Seattle, WA, USA, pp. 1–10. ACM (1998)
Google Scholar
Lee, S.K.: An extended relational database model for uncertain and imprecise information. In: 18th Conference on Very Large Data Bases (VLDB), Canada, pp. 211–220 (1992)
Google Scholar
Lee, S.K.: Imprecise and uncertain information in databases: an evidential approach. In: 8th International Conference on Data Engineering (ICDE), Arizona, USA, pp. 614–621 (1992)
Google Scholar
Prade, H., Testemale, C.: Generalizing database relational algebra for the treatment of incomplete or uncertain information and vague queries. Inf. Sci. 34(2), 115–143 (1984)
Article MathSciNet Google Scholar
Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)
MATH Google Scholar
Wang, Y.-M., Yang, J.-B., Dong-Ling, X.: A preference aggregation method through the estimation of utility intervals. Comput. Oper. Res. 32(8), 2027–2049 (2005)
Article Google Scholar

Download references

Author information

Authors and Affiliations

LARODEC, Institut Supérieur de Gestion, Université de Tunis, Tunis, Tunisie
Fatma Ezzahra Bousnina
ESEN, Univ. Manouba, Manouba, Tunisie
Mouna Chebbah & Mohamed Anis Bach Tobji
LIAS, ENSMA - Université de Poitiers, Poitiers, France
Fatma Ezzahra Bousnina & Allel Hadjali
Institut des Hautes Etudes Commerciales, Université de Carthage, Tunis, Tunisie
Boutheina Ben Yaghlane

Authors

Fatma Ezzahra Bousnina
View author publications
You can also search for this author in PubMed Google Scholar
Mouna Chebbah
View author publications
You can also search for this author in PubMed Google Scholar
Mohamed Anis Bach Tobji
View author publications
You can also search for this author in PubMed Google Scholar
Allel Hadjali
View author publications
You can also search for this author in PubMed Google Scholar
Boutheina Ben Yaghlane
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Fatma Ezzahra Bousnina .

Editor information

Editors and Affiliations

Universidad de Cádiz, Cádiz, Cadiz, Spain
Jesús Medina
Universidad de Málaga, Málaga, Málaga, Spain
Manuel Ojeda-Aciego
Universidad de Granada, Granada, Spain
José Luis Verdegay
Universidad de Granada, Granada, Spain
David A. Pelta
Universidad de Málaga, Málaga, Málaga, Spain
Inma P. Cabrera
LIP6, Université Pierre et Marie Curie, CNRS, Paris, France
Bernadette Bouchon-Meunier
Iona College, New Rochelle, New York, USA
Ronald R. Yager

A Appendix

Proof

*Complementarity:

$$\begin{aligned} P(S(R_i) < S(R_j))&= \dfrac{max(0,pl_j - bel_i)- max(0,bel_j - pl_i)}{(pl_i - bel_i) + (pl_j - bel_j)} \end{aligned}$$

$$\begin{aligned} P(S(R_j) < S(R_i))&= \dfrac{max(0,pl_i - bel_j)- max(0,bel_i - pl_j)}{(pl_i - bel_i) + (pl_j - bel_j)} \end{aligned}$$

$P(S(R_i)< S(R_j))+ P(S(R_j) < S(R_i))$

$$\begin{aligned}&= \dfrac{max(0,pl_j - bel_i)- max(0,bel_j - pl_i)}{(pl_i - bel_i) + (pl_j - bel_j)}\\&+ \dfrac{max(0,pl_i - bel_j)- max(0,bel_i - pl_j)}{(pl_i - bel_i) + (pl_j - bel_j)}\\&= \dfrac{max(0,pl_j - bel_i)- 0 + max(0,pl_i - bel_j)- 0}{(pl_i - bel_i) + (pl_j - bel_j)}\\&= \dfrac{ pl_j - bel_i + pl_i - bel_j }{ pl_i - bel_i + pl_j - bel_j} = 1 \end{aligned}$$

$P(S(R_i)< S(R_j))+ P(S(R_j) < S(R_i)) = 1 $

Property 1

**Transitivity

Let $S(R_i) = [bel_i;pl_i] $, $S(R_j) = [bel_j;pl_j]$ and $S(R_k)= [bel_k;pl_k]$ be three intervals. If $S(R_i) \succ S(R_j)$ and $ S(R_j) \succ S(R_k)$ then $S(R_i) \succ S(R_k)$.

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bousnina, F.E., Chebbah, M., Bach Tobji, M.A., Hadjali, A., Ben Yaghlane, B. (2018). Evidential Top-k Queries Evaluation: Algorithms and Experiments. In: Medina, J., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations. IPMU 2018. Communications in Computer and Information Science, vol 853. Springer, Cham. https://doi.org/10.1007/978-3-319-91473-2_35

Download citation

DOI: https://doi.org/10.1007/978-3-319-91473-2_35
Published: 18 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91472-5
Online ISBN: 978-3-319-91473-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Evidential Top-k Queries Evaluation: Algorithms and Experiments

Abstract

Similar content being viewed by others

On Top-k Retrieval for a Family of Non-monotonic Ranking Functions

Generic Top-k Query Processing with Breadth-First Strategies

On the Top-k Retrieval Problem for Ontology-Based Access to Databases

Keywords

1 Introduction