Abstract
With the development of the Internet, frequent pattern mining has been extended to more complex patterns, as in tree mining and graph mining. Such applications arise in complex domains such as bioinformatics and web mining. In this paper, we present a novel algorithm, named Chopper, to discover frequent subtrees from ordered labeled trees. An extensive performance study shows that the newly developed algorithm outperforms TreeMiner V, one of the fastest methods proposed previously, in mining large databases. At the end of this paper, potential improvements to Chopper are noted.
Additional information
This paper is supported by the Key Program of the National Natural Science Foundation of China (Grant No. 69933010) and the National High-Tech Development 863 Program of China (Grant Nos. 2002AA4Z3430 and 2002AA231041).
Chen Wang was born in 1976. He received his B.E. degree and M.S. degree in computer science from Soochow University in 1999 and 2002 respectively. He is currently a Ph.D. candidate in computer science at Fudan University. His research interests include data mining, database and knowledge base.
Qing-Qing Yuan was born in 1978. She received her B.E. degree and M.S. degree in computer science from Fudan University in 2000 and 2003 respectively. Her research interests include data mining, database and knowledge base.
Hao-Feng Zhou was born in 1975. He received his B.E. degree in computer science from Shanghai University in 1997, his M.S. degree and Ph.D. in computer science from Fudan University in 2000 and 2003 respectively. His research interests include data mining, database and knowledge base.
Wei Wang was born in 1970. He received the M.S. degree in 1992 and the Ph.D. degree in 1998. He is now an associate professor in the Dept. of Computing and Information Technology, Fudan University. His main research areas include spatial-temporal databases, constraint databases, index technology and semistructured databases.
Bai-Le Shi was born in 1935. He received the M.S. degree in 1956. He is now a chief professor in the Dept. of Computing and Information Technology, Fudan University. His main research areas include object-oriented databases, knowledge databases and digital libraries.
About this article
Cite this article
Wang, C., Hong, MS., Wang, W. et al. Chopper: Efficient algorithm for tree mining. J. Comput. Sci. & Technol. 19, 309–319 (2004). https://doi.org/10.1007/BF02944901