Abstract
Query processing in the uncertain database has played an important role in many real-world applications due to the wide existence of uncertain data. Although many previous techniques can correctly handle precise data, they are not directly applicable to the uncertain scenario. In this article, we investigate and propose a novel query, namely probabilistic top-k star (PTkS) query, which aims to retrieve k objects in an uncertain database that are “closest” to a static/dynamic query point, considering both distance and probability aspects. In order to efficiently answer PTkS queries with a static/moving query point, we propose effective pruning methods to reduce the PTkS search space, which can be seamlessly integrated into an efficient query procedure. Finally, extensive experiments have demonstrated the efficiency and effectiveness of our proposed PTkS approaches on both real and synthetic data sets, under various parameter settings.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Antova, L., Jansen, T., Koch, C., Olteanu, D.: Fast and simple relational processing of uncertain data. In: Proceedings 24th International Conference on Data Engineering. (2008)
Beckmann, N., Kriegel, H., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. (1990)
Benetis, R., Jensen, C., Karciauskas, G., Saltenis, S.: Nearest neighbor and reverse nearest neighbor queries for moving objects. In: TimeCenter Technical Report. (2002)
Böhm, C., Pryakhin, A., Schubert, M.: The Gauss-tree: efficient object identification in databases of probabilistic feature vectors. In: Proceedings 22th International Conference on Data Engineering. (2006)
Borzsonyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings of the 17th International Conference on Data Engineering. (2001)
Chen, J., Cheng, R.: Efficient evaluation of imprecise location-dependent queries. In: Proceedings of the 23th International Conference on Data Engineering. (2007)
Chen, L., Lian, X.: Dynamic skyline queries in metric spaces. In: Proceedings of the International Conference on Extending Database Technology. (2008)
Chen, L., Ozsu, M.T., Oria, V.: Robust and fast similarity search for moving object trajectories. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. (2005)
Cheng, R., Chen, J.: Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: Proceedings of the 24th International Conference on Data Engineering. (2008)
Cheng, R., Kalashnikov, D., Prabhakar, S.: Querying imprecise data in moving object environments. IEEE Trans. Knowl. Data Eng. 16(9) (2004)
Cheng, R., Kalashnikov, D.V., Prabhakar, S.: Evaluating probabilistic queries over imprecise data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. (2003)
Cheng, R., Xia, Y., Prabhakar, S., Shah, R., Vitter, J.: Efficient indexing methods for probabilistic threshold queries over uncertain data. In: Proceedings of the 30th International Conference on Very Large Data Bases. (2004)
Cheng, R., Zhang, Y., Bertino, E., Prabhakar, S.: Preserving user location privacy in mobile data management infrastructures. In: Privacy Enhancing Technologies. (2006)
Deng, K., Zhou, X., Shen, H.T.: Multi-source skyline query processing in road networks. In: Proceedings of the 23th International Conference on Data Engineering. (2007)
Faradjian, A., Gehrke, J., Bonnet, P.: Gadt: a probability space ADT for representing and querying the physical world. In: Proceedings of the 18th International Conference on Data Engineering. (2002)
Grinstead C.M., Snell J.L.: Introduction to Probability. American Mathematical Society (AMS), USA (1997)
Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. (1984)
Hua, M., Pei, J., Zhang, W., Lin, X.: Efficiently answering probabilistic threshold top-k queries on uncertain data. In: Proceedings of the 24th International Conference on Data Engineering. (2008)
Jovanovic-Dolecek, G.: Demo program for central limit theorem. In: Circuits and Systems. (1997)
Kang, J.M., Mokbel, M.F., Shekhar, S., Xia, T., Zhang, D.: Continuous evaluation of monochromatic and bichromatic reverse nearest neighbors. In: Proceedings of the 23th International Conference on Data Engineering. (2007)
Kossmann, D., Ramsak, F., Rost, S.: Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th International Conference on Very Large Data Bases. (2002)
Kriegel, H.-P., Kunath, P., Pfeifle, M., Renz, M.: Probabilistic similarity join on uncertain data. In: International Conference on Database Systems for Advanced Applications. (2006)
Kriegel, H.-P., Kunath, P., Renz, M.: Probabilistic nearest-neighbor query on uncertain objects. In: International Conference on Database Systems for Advanced Applications. (2007)
Lian, X., Chen, L.: Monochromatic and bichromatic reverse skyline search over uncertain databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. (2008)
Lian, X., Chen, L.: Probabilistic ranked queries in uncertain databases. In: Proceedings of the International Conference on Extending Database Technology. (2008)
Lian, X., Chen, L.: Top-k dominating queries in uncertain databases. In: Proceedings of the International Conference on Extending Database Technology. (2009)
Ljosa, V., Singh, A.K.: APLA: indexing arbitrary probability distributions. In: Proceedings of the 23th International Conference on Data Engineering. (2007)
Ljosa, V., Singh, A.K.: Top-k spatial joins of probabilistic objects. In: Proceedings of the 24th International Conference on Data Engineering. (2008)
Mokbel, M.F., Chow, C.-Y., Aref, W.G.: The new casper: query processing for location services without compromising privacy. In: Proceedings of the 32nd International Conference on Very Large Data Bases. (2006)
Ng, R.T., Han, J.: Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th International Conference on Very Large Data Bases. (1994)
Papadias D., Tao Y., Fu G., Seeger B.: Progressive skyline computation in database systems. ACM Trans. Database Syst. 30(1), 41–82 (2005)
Papadimitriou, S., Li, F., Kollios, G., Yu, P.S.: Time series compressibility and privacy. In: Proceedings of the 33rd International Conference on Very Large Data Bases. (2007)
Pei, J., Jiang, B., Lin, X., Yuan, Y.: Probabilistic skylines on uncertain data. In: Proceedings of the 33rd International Conference on Very Large Data Bases. (2007)
Re, C., Dalvi, N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: Proceedings of the 23th International Conference on Data Engineering. (2007)
Sharifzadeh, M., Shahabi, C.: The spatial skyline queries. In: Proceedings of the 32nd International Conference on Very Large Data Bases. (2006)
Singh, S., Mayfield, C., Shah, R., Prabhakar, S., Hambrusch, S., Neville, J., Cheng, R.: Database support for probabilistic attributes and tuples. In: Proceedings of the 24th International Conference on Data Engineering. (2008)
Soliman, M.A., Ilyas, I.F., Chang, K.C.: Top-k query processing in uncertain databases. In Proceedings of the 23th International Conference on Data Engineering. (2007)
Tan, K.-L., Eng, P.-K., Ooi, B.C.: Efficient progressive skyline computation. In: Proceedings of the 27th International Conference on Very Large Data Bases. (2001)
Tao, Y., Cheng, R., Xiao, X., Ngai, W.K., B.K., Prabhakar, S.: Indexing multi-dimensional uncertain data with arbitrary probability density functions. In: Proceedings of the 31st International Conference on Very Large Data Bases. (2005)
Tao, Y., Papadias, D., Lian, X.: Reverse kNN search in arbitrary dimensionality. In: Proceedings of the 30th International Conference on Very Large Data Bases. (2004)
Tao, Y., Papadias, D., Lian, X., Xiao, X.: Multidimensional reverse kNN search. In: The VLDB Journal (2005)
Theodoridis, Y., Sellis, T.: A model for the prediction of R-tree performance. In: Proceedings of the ACM SIGACT-SIGMOD Symposiam on Principles of Database Systems. (1996)
Xue, W., Luo, Q., Chen, L., Liu, Y.: Contour map matching for event detection in sensor networks. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. (2006)
Yi, K., Li, F., Srivastava, D., Kollios, G.: Efficient processing of top-k queries in uncertain databases. In: Proceedings of the 24th International Conference on Data Engineering. (2008)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Lian, X., Chen, L. Shooting top-k stars in uncertain databases. The VLDB Journal 20, 819–840 (2011). https://doi.org/10.1007/s00778-011-0225-y
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-011-0225-y