Abstract
The maximal vector problem is to identify the maximals over a collection of vectors. This arises in many contexts and, as such, has been well studied.The problem recently gained renewed attention with skyline queries for relational databases and with work to develop skyline algorithms that are external and relationally well behaved. While many algorithms have been proposed, how they perform has been unclear. We study the performance of, and design choices behind, these algorithms. We prove runtime bounds based on the number of vectors N and the dimensionality K. Early algorithms based on divide and conquer established seemingly good average and worst-case asymptotic runtimes. In fact, the problem can be solved in \(\mathcal{O}(KN)\) average-case (holding K as fixed). We prove, however, that the performance is quite bad with respect to K. We demonstrate that the more recent skyline algorithms are better behaved, and can also achieve \(\mathcal{O}(KN)\) average-case. While K matters for these, in practice, its effect vanishes in the asymptotic. We introduce a new external algorithm, LESS, that is more efficient and better behaved. We evaluate LESS’s effectiveness and improvement over the field, and prove that its average-case running time is \(\mathcal{O}(KN)\).
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Balke W.T., Güntzer U. (2004) Multi-objective query processing for databas systems. In: Nascimento M.A., Özsu M.T., Kossmann D., Miller R.J., Blakeley J.A., Schiefer K.B. (eds). Proceedings of the 30th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann, Toronto, Canada, pp. 936–947
Balke, W.T., Güntzer, U.: Supporting skyline queries on categorical data in web information systems. In: IASTED International Conference on Internet and Multimedia Systems and Applications (IMSA 2004), pp. 1–6 (2004)
Balke, W.T., Güntzer, U.: Efficient skyline queries under weak pareto dominance. In: IJCAI-05 Multidisciplinary Workshop on Advances in Preference Handling (Preference 2005), pp. 1–7 (2005)
Barndorff-Nielsen O., Sobel M. (1966) On the distribution of the number of admissible points in a vector random sample. Theory Probab Appl 11(2): 249–269
Bentley, J.L., Clarkson, K.L., Levine, D.B.: Fast linear expected-time algorithms for computing maxima and convex hulls. In: Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 179–187. ACM/SIAM (1990)
Bentley J.L., Kung H.T., Schkolnick M., Thompson C.D. (1978) On the average number of maxima in a set of vectors and applications. JACM 25(4): 536–543
Blum M., Floyd R.W., Pratt V., Rivest R.L., Tarjan R.E. (1973) Time bounds for selection. J. Comput. Syst. Sci. 7(4): 448–461
Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings of the 17th ICDE, pp. 421–430 (2001)
Buchta C. (1989) On the average number of maxima in a set of vectors. Inf. Process. Lett. 33, 63–65
Chan, C.Y., Eng, P.K., Tan, K.L.: Efficient processing of skyline queries with partially-ordered domains. In: ICDE, pp. 190–191 (2005)
Chan, C.Y., Eng, P.K., Tan, K.L.: Stratified computation of skylines with partially-ordered domains. In: SIGMOD Conference, pp. 203–214 (2005)
Chaudhuri, S., Dalvi, N., Raghav, K.: Robust cardinality and cost estimation for skyline operator. In: ICDE (To appear, 2006)
Chomicki J. (2002) Querying with intrinsic preferences. In: Jensen C.S., Jeffery K.G., Pokorný J., Saltenis S., Bertino E., Böhm K., Jarke M. (eds). Proceedings of the 8th International Conference on Extending Database Technology (EDBT), LNCS 2287. Springer, Prague, Czech Republic, pp. 34–51
Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. Technical. Report 04, Computer Science, York University, Toronto, Ontario, Canada (2002)
Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: Proceedings of the 19th International Conference on Data Engineering (ICDE), pp. 717–719 (2003). See [14] for a longer version
Chomicki J., Godfrey P., Gryz J., Liang D. (2005) Skyline with presorting: Theory and optimization. In: Klopotek M.A., Wierzchon S.T., Trojanowski K. (eds). Proceedings of the Intelligent Information Systems Conference (IIS): New Trends in Intelligent Information Processing and Web Mining, Advances in Soft Computing. Springer, Gdansk, Poland, pp. 593–602
Ciaccia, P.: Evaluating preferences with non-transitive preferences. Presentation at the Dagstuhl Seminar 04271 (Preferences: Specification, Inference, Applications) (2004)
Eng P.K., Ooi B.C., Tan K.L. (2003) Indexing for progressive skyline computation. Data Knowl. Eng. 46(2): 169–201
Godfrey P. (2004) Skyline cardinality for relational processing. In: Seipel D., Torres J.M.T. (eds) Proceedings of the 3rd International Symposium on Foundations of Information and Knowledge Systems (FoIKS). Springer, Wilhelminenberg Castle, Austria, pp. 78–97
Godfrey P., Shipley R., Gryz J. (2005) Maximal vector computation in large data sets. In: Böhm K., Jensen C.S., Haas L.M., Kersten M.L., Larson P.Å., Ooi B.C. (eds) Proceedings of the 31st International Conference on Very Large Data Bases (VLDB 2005). ACM, Trondheim, Norway, pp. 229–240
Hellerstein J.M., Avnur R., Chou A., Hidber C., Olston C., Raman V., Roth T., Haas P.J. (1999) Interactive data analysis: The control project. IEEE Comput. 32(8):51–59
Huang, Z., Jensen, C.S., Lu, H., Ooi, B.C.: Skyline queries against mobile lightweight devices in manets. In: ICDE (To appear, 2006)
Jin, W., Han, J., Ester, M.: Mining thick skylines over large databases. In: PKDD, pp. 255–266 (2004)
Kossmann, D., Ramask, F., Rost, S.: Shooting stars in the sky: An online algorithm for skyline queries. In: Proceedings of 28th International Conference on Very Large Data Bases (VLDB-2002), pp. 275–286 (2002)
Kung H.T., Luccio F., Preparata F.P. (1975) On finding the maxima of a set of vectors. JACM 22(4): 469–476
Lin, X., Yuan, Y., Wang, W., Lu, H.: Stabbing the sky: efficient skyline computation over sliding windows. In: ICDE, pp. 502–513 (2005)
Papadias, D., Tao, Y., Fu, G., Seeger, B.: An optimal and progressive algorithm for skyline queries. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 467–478. ACM Press, Newyork (2003)
Papadias D., Tao Y., Fu G., Seeger B. (2005) Progressive skyline computation in database systems. ACM Trans. Database Syst. 30(1): 41–82
Pei, J., Jin, W., Ester, M., Tao, Y.: Catching the best views of skyline: a semantic approach based on decisive subspaces. In: VLDB, pp. 253–264 (2005)
Tan K.L., Eng P.K., Ooi B.C. (2001) Efficient progressive skyline computation. In: Apers P.M.G., Atzeni P., Ceri S., Paraboschi S., Ramamohanarao K., Snodgrass R.T. (eds). Proceedings of 27th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann, Rome, Italy, pp. 301–310
Tao, Y., Xiao, X., Pei, J.: SUBSKY: efficient computation of skylines in subspaces. In: ICDE (to appear, 2006)
Torlone, R., Ciaccia, P.: Finding the best when it’s a matter of preference. In: Ciaccia, P., Rabitti, F., Soda, G.: (eds) The 10th Italian National Conference on Advanced Data Base Systems (SEBD 2002), pp. 347–360 (2002)
Torlone, R., Ciaccia, P.: Which are my preferred items? In: Workshop on Recommendation and Personalization in eCommerce (RPEC), pp. 1–9. Malaga, Spain (2002)
Yuan, Y., Lin, X., Liu, Q., Wang, W., Yu, J.X., Zhang, Q.: Efficient computation of the skyline cube. In: VLDB, pp. 241–252 (2005)
Author information
Authors and Affiliations
Corresponding author
Additional information
Part of this work was conducted at William & Mary where Ryan Shipley was a student and Parke Godfrey was on faculty while on leave of absence from York.
Rights and permissions
About this article
Cite this article
Godfrey, P., Shipley, R. & Gryz, J. Algorithms and analyses for maximal vector computation. The VLDB Journal 16, 5–28 (2007). https://doi.org/10.1007/s00778-006-0029-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-006-0029-7