Abstract
People recently are interested in a new operator, called skyline [3], which returns the objects that are not dominated by any other objects with regard to certain measures in a multi-dimensional space. Recent work on the skyline operator [3,15,8,13,2] focuses on efficient computation of skylines in large databases. However, such work gives users only thin skylines, i.e., single objects, which may not be desirable in some real applications. In this paper, we propose a novel concept, called thick skyline, which recommends not only skyline objects but also their nearby neighbors within ε-distance. Efficient computation methods are developed including (1) two efficient algorithms, Sampling-and-Pruning and Indexing-and-Estimating, to find such thick skyline with the help of statistics or indexes in large databases, and (2) a highly efficient Microcluster-based algorithm for mining thick skyline. The Microcluster-based method not only leads to substantial savings in computation but also provides a concise representation of the thick skyline in the case of high cardinalities. Our experimental performance study shows that the proposed methods are both efficient and effective.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Bentley, J.L., Clarkson, K.L., Levine, D.B.: Fast linear expected-time alogorithms for computing maxima and convex hulls. In: SODA 1990 (1990)
Balke, W.-T., Güntzer, U., Zheng, J.X.: Efficient Distributed Skylining for Web Information Systems. In: EBDT 2004 (2004)
Borzsonyi, S., Kossmann, D., Stocker, K.: The Skyline Operator. In: ICDE 2001 (2001)
Cormen, T., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. The MIT Press, Cambridge (2001)
Hinneburg, A., Keim, D.A.: Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering. In: VLDB 1999, pp. 506–517 (1999)
Jin, W., Tung, K.H., Han, J.: Mining Top-n Local Outliers in Very Large Databases. In: KDD 2001 (2001)
Kung, H.T., et al.: On finding the maxima of a set of vectors. JACM 22(4) (1975)
Kossmann, D., Ramsak, F., Rost, S.: Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. In: VLDB 2002 (2002)
Matousek, J.: Computing dominances in En. Inf. Process. Lett. 38(5) (1991)
Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge Univ. Press, Cambridge (1995)
Nielsen, F.: Output-sensitive peeling of convex and maximal layers. Thesis (1996)
Preparata, F.P., Shamos, M.I.: Computational Geometry: An Introduction. Springer, Heidelberg (1985)
Papadias, D., Tao, Y., Fu, G., Seeger, B.: An Optimal and Progressive Algoroithm for Skyline Queries. In: SIGMOD 2003 (2003)
Stojmenovic, I., Miyakawa, M.: An Optimal Parallel Algorithm for Solving the Maximal Elements Problem in the Plane. Parallel Computing 7(2) (1988)
Tan, K., et al.: Efficient Progressive Skyline Computation. In: VLDB 2001 (2001)
Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: an efficient data clustering method for very large databases. In: SIGMOD 1996 (1996)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, W., Han, J., Ester, M. (2004). Mining Thick Skylines over Large Databases. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. PKDD 2004. Lecture Notes in Computer Science(), vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_25
Download citation
DOI: https://doi.org/10.1007/978-3-540-30116-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive