Abstract
Web associations are valuable patterns because they provide useful insights into the browsing behavior of Web users. However, current techniques for mining Web association patterns have two major drawbacks: they cannot detect interesting negative associations in the data, and they fail to account for the impact of site structure on the support of a pattern. To address these issues, a new data mining technique called indirect association is applied to Web click-stream data. The idea is to find pairs of pages that are negatively associated with each other but positively associated with another set of pages, called the mediator. These pairs of pages are said to be indirectly associated via their common mediator. Indirect associations are interesting patterns because they represent the diverse interests of Web users who share a similar traversal path. These patterns are not easily found using existing data mining techniques unless the groups of users are known a priori. The effectiveness of indirect association is demonstrated using Web data from an academic institution and an online Web store.
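The idea of an indirect association can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that "negatively associated" can be approximated by a low joint support for the page pair and "positively associated" by a high joint support with the mediator. The function names, the thresholds `pair_max` and `mediator_min`, and the toy sessions are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def support(sessions, pages):
    """Fraction of sessions that contain every page in `pages`."""
    pages = set(pages)
    return sum(pages <= s for s in sessions) / len(sessions)


def indirect_pairs(sessions, mediator, pair_max=0.1, mediator_min=0.3):
    """Return page pairs that rarely co-occur but each co-occur with `mediator`.

    Assumption: plain support stands in for the association measure;
    both thresholds are illustrative, not values from the paper.
    """
    mediator = set(mediator)
    candidates = sorted(set().union(*sessions) - mediator)
    found = []
    for a, b in combinations(candidates, 2):
        rarely_together = support(sessions, {a, b}) < pair_max
        a_with_m = support(sessions, mediator | {a}) >= mediator_min
        b_with_m = support(sessions, mediator | {b}) >= mediator_min
        if rarely_together and a_with_m and b_with_m:
            found.append((a, b))
    return found


# Hypothetical click-stream sessions, each reduced to the set of pages visited.
sessions = [
    {"home", "catalog", "laptops"},
    {"home", "catalog", "laptops"},
    {"home", "catalog", "phones"},
    {"home", "catalog", "phones"},
    {"home", "about"},
]

result = indirect_pairs(sessions, {"home", "catalog"})
print(result)
```

In this toy data, visitors who pass through the home and catalog pages split into two groups that never view both product pages, which is the kind of diverging interest along a shared traversal path that the abstract describes. A fuller treatment would search for mediators rather than take one as input and would likely use a dependence measure instead of raw support.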
This work was partially supported by NSF grant # ACI-9982274 and by Army High Performance Computing Research Center contract number DAAD19-01-2-0014. The content of this work does not necessarily reflect the position or policy of the government and no official endorsement should be inferred. Access to computing facilities was provided by AHPCRC and the Minnesota Supercomputing Institute.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tan, PN., Kumar, V. (2002). Mining Indirect Associations in Web Data. In: Kohavi, R., Masand, B.M., Spiliopoulou, M., Srivastava, J. (eds) WEBKDD 2001 — Mining Web Log Data Across All Customers Touch Points. WebKDD 2001. Lecture Notes in Computer Science, vol 2356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45640-6_7
DOI: https://doi.org/10.1007/3-540-45640-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43969-1
Online ISBN: 978-3-540-45640-7
eBook Packages: Springer Book Archive