Pattern Mining Algorithms

Millham, Richard; Agbehadji, Israel Edem; Yang, Hongji

doi:10.1007/978-981-15-6695-0_4

Richard Millham⁶,
Israel Edem Agbehadji⁶ &
Hongji Yang⁷

Part of the book series: Springer Tracts in Nature-Inspired Computing ((STNIC))

474 Accesses
2 Citations

Abstract

In this chapter, we first look at patterns with their relevance of discovery to business. We then do a survey and evaluation, in terms of advantages and disadvantages, of different mining algorithms that are suited for both traditional and big data sources. These algorithms include those designed for both sequential and closed sequential pattern mining for both the sequential and parallel processing environments.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 149.00; Price excludes VAT (USA)

Softcover Book: USD 199.99; Price excludes VAT (USA)

Hardcover Book: USD 199.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Pattern Mining: Current Challenges and Opportunities

Sequential Pattern Mining

Frequent Pattern Mining in Data Streams

References

Aggarwal, C. C., & Han, J. (2014). Frequent pattern mining. Springer International Publishing Switzerland. Available https://doi.org/10.1007/978-3-319-07821-2_3.
Aggarwal, S., & Rani, B. (2013). Optimization of association rule mining process using Apriori and ant colony optimization algorithm.
Google Scholar
Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. In: Proceedings of International Conference Data Engineering (ICDE ’95) (pp. 3–14).
Google Scholar
Ayres, J., Gehrke, J., Yiu, T., & Flannick, J. (2002). Sequential pattern mining using a bitmap representation. In: Proceedings of ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (SIGKDD’ 02) (pp. 429–435).
Google Scholar
Cheng, S., Shi, Y., Qin, Q., & Bai, R. (2013). Swarm intelligence in big data analytics. Berlin: Springer.
Book Google Scholar
Cong, S., Han, J., & Padua, D. (2005). Parallel mining of closed sequential patterns. Available http://hanj.cs.illinois.edu/pdf/kdd05_parseq.pdf.
Gebali, F. (2011). Algorithms and parallel computing. Hoboken, NJ: Wiley.
Book Google Scholar
Han, J., & Kamber, M. (2006). Data mining concepts and techniques. Morgan Kaufmann.
Google Scholar
Han, J., Cheng, H., Xin, D. & Yan, X. (2007). Frequent pattern mining: current status and future directions. Data mining and knowledge discovery, 15(1), 55–86.
Google Scholar
Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., & Hsu, M. C. (2000). FreeSpan: Frequent pattern projected sequential pattern mining. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’00) (pp. 355–359).
Google Scholar
Han, J., Wang, J., Lu, Y., & Tzvetkov, P. (2002). Mining top-k frequent closed patterns without minimum support. In Proceedings of IEEE ICDM Conference on Data Mining.
Google Scholar
Hand, D., Mannila, H., & Smyth, P. (2001). Principles of data mining. London: The MIT Press.
Google Scholar
Hirate, Y., Iwahashi, E., & Yamana, H. (2004). TF2P-growth: An efficient algorithm for mining frequent patterns without any thresholds. http://elvex.ugr.es/icdm2004/pdf/hirate.pdf.
Huang, K., Chang, C., Tung, J., & Ho, C. (2006). COBRA: Closed sequential pattern mining using bi-phase reduction approach.
Google Scholar
Iglesia, B., & Reynolds, A. (2005). The use of meta-heuristic algorithms for data mining.
Google Scholar
Kumar, V., Xindong, W., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., et al. (2007). Top 10 algorithms in data mining. London: Springer.
Google Scholar
Liu, Y., & Guan, Y. (2008). Fp-growth algorithm for application in research of market basket analysis. In 2008 IEEE International Conference on Computational Cybernetics (pp. 269–272). IEEE.
Google Scholar
Luna, J. M., Romero, R. J., & Ventura, S. (2011). Design and behavior study of a grammar-guided genetic programming algorithm for mining association rules. London: Springer.
Google Scholar
Oweis, N. E., Fouad, M. M, Oweis, S. R., Owais, S. S., & Snasel, V. (2016). A novel mapreduce lift association rule mining algorithm (MRLAR) for big data. International Journal of Advanced Computer Science and Applications (IJACSA), 7(3).
Google Scholar
Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., et al. (2001). PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the International Conference on Data Engineering (ICDE) (pp. 215–224).
Google Scholar
Qiao, S., Li, T., Peng, J., & Qiu, J. (2010). Parallel sequential pattern mining of massive trajectory data.
Google Scholar
Rajaraman, A., & Ullman, J. (2011). Mining of massive datasets. Cambridge University Press.
Google Scholar
Raju, V. P., & Varma, G. P. S. (2015). Mining closed sequential patterns in large sequence databases. International Journal of Database Management Systems (IJDMS), 7(1).
Google Scholar
Riondato, M., DeBrabant, J. A., Fonseca, R., & Upfal, E. (2012). PARMA: A parallel randomized algorithm for approximate association rules mining in MapReduce. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (pp. 85–94). ACM.
Google Scholar
Shintani, T., & Kitsuregawa, M. (1998). Mining algorithms for sequential patterns in parallel: Hash based approach. In Proceedings of Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining (pp. 283–294).
Google Scholar
Song, M., & Rajasekaran, S. (2006). A transaction mapping algorithm for frequent itemsets mining. IEEE Transactions on Knowledge and Data Engineering, 18(4).
Google Scholar
Tu, V., & Koh, I. (2010). A tree-based approach for efficiently mining approximate frequent itemset.
Google Scholar
Wang, J., Han, J., & Li, C. (2007). Frequent closed sequence mining without candidate maintenance. IEEE Transactions on Knowledge and Data Engineering, 19(8), 1042–1056.
Article Google Scholar
Wang, J., Zhang, L., Liu, G., Liu, Q., & Chen, E. (2014). On top-k closed sequential patterns mining. In 11th International Conference on Fuzzy Systems and Knowledge Discovery.
Google Scholar
Yan, X., Han, J., & Afshar, R. (2003). CloSpan: Mining closed sequential patterns in large databases. In: Proceedings of SIAM International Conference on Data Mining (SDM ’03) (pp. 166–177).
Google Scholar
Yang, T., Lin, Q., & Jin, R. (2015). Big data analytics: Optimization and randomization. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2327–2327). Available on https://homepage.cs.uiowa.edu/~tyng/kdd15-tutorial.pdf.
Yin, J., Zheng, Z., Cao, L., Song, Y., & Wei, W. (2013). Efficiently mining top-k high utility sequential patterns. In Proceedings of 2013 IEEE 13th International Conference on Data Mining (pp. 1259–1264).
Google Scholar
Zaki, M. J. (2001). Parallel sequence mining on shared-memory machines. Journal of Parallel and Distribution Computing, 61(3), 401–426.
Article Google Scholar
Zhenxin, Z., & Jiaguo, L. (2009). Closed sequential pattern mining algorithm based positional data. In Advanced Technology in Teaching—Proceedings of the 3rd International Conference on Teaching and Computational Science (WTCS) (pp. 45–53).
Google Scholar

Download references

Author information

Authors and Affiliations

ICT and Society Research Group, Durban University of Technology, Durban, South Africa
Richard Millham & Israel Edem Agbehadji
Department of Informatics, University of Leicester, Leicester, England, UK
Hongji Yang

Authors

Richard Millham
View author publications
You can also search for this author in PubMed Google Scholar
Israel Edem Agbehadji
View author publications
You can also search for this author in PubMed Google Scholar
Hongji Yang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Richard Millham .

Editor information

Editors and Affiliations

University of Macau, Taipa, China
Simon James Fong
Durban University of Technology, Durban, South Africa
Richard C. Millham

Appendix: Summary on Mining Algorithms

Author	Algorithm	Mining approach	Approach used	Advantages	Limitations
Aggarwal and Han (2014)		Apriori-based methods			However, a candidate-generation-and-test strategy produces a large number of candidate sequences and also requires more database scan (Tu and Koh 2010) when there are long patterns
Han et al. (2000)		Pattern-growth methods		Without candidate generation compressed database structure which is smaller than original dataset
Han et al. (2000)	FreeSpan	Pattern-growth methods	sequential patterns by partitioning
Pei et al. (2001)	PrefixSpan	Pattern-growth methods	Pseudo-projection technique for constructing projected databases	Does not generate-and-test any candidate sequence that do not exist in a projected database	Projected database requires more storage space, extra time is required to scan the projected database
		Sequential pattern mining	Vertical format-based methods	Fast computation of support counting
Zaki (2001)	SPADE	Sequential pattern mining	Vertical format-based methods Either breadth-first or depth-first manner		Consumes more memory
Ayres et al. (2002)	SPAM	Sequential pattern mining	Vertical format-based methods Traverses the sequence tree in a depth-first manner	A vertical bitmap of the database	Consumes more memory space
Agrawal and Srikant (1995)	AprioriAll	Sequential pattern mining	Horizontal format-based method		Consumes more memory space
		Closed sequential pattern mining		Efficient use of search space pruning, reduced number of pattern, find more interesting patterns
Yan et al. (2003)	CloSpan	Closed sequential pattern mining	Prefix sequence lattice, post-pruning		Huge search space for checking the closure of new patterns
Wang et al. (2007)	BIDE	Closed sequential pattern mining	Depth-first search order. perform closure checking (bidirectional Extension)	Without candidate maintenance, does not keep track of historical closed sequential patterns	Multiple database scan, more computational time
Huang et al. (2006)	COBRA	Closed sequential pattern mining	Bi-phase Reduction Approach, item encoding, pruning methods (LayerPruning and ExtPruning), vertical and horizontal database formats	Reduce searching space	Requires large memory space
Raju and Varma (2015	ClaSP	Closed sequential pattern mining	Vertical database format, Frequent Closed Candidates, recursive post-pruning (CheckAvoidable for pruning the search)		Requires more main memory
Han et al. (2002)		Top-K closed sequential pattern mining	Descending order of support	Without specifying minimum support	Users must decide the value of k, prior knowledge of database required
Hirate et al. (2004)	TF²P-growth	Top-K closed sequential pattern mining	Descending order of support	Does not require the user to set any threshold value k, output of frequent patterns to user sequentially and in chunks	Time-consuming to be checking all chunk size
Wang et al. (2014)	BI-TSP	Top-K closed sequential pattern mining	bidirectional checking scheme, minimum length constraint, dynamically increase support of k
Raju and Varma (2015)	CSpan	Top-K closed sequential pattern mining	Depth-first search, occurrence checking method for early detection of closed sequential patterns, constructs the projected database		Projected database requires more storage space, extra time is required to scan the projected database

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Millham, R., Agbehadji, I.E., Yang, H. (2021). Pattern Mining Algorithms. In: Fong, S., Millham, R. (eds) Bio-inspired Algorithms for Data Streaming and Visualization, Big Data Management, and Fog Computing. Springer Tracts in Nature-Inspired Computing. Springer, Singapore. https://doi.org/10.1007/978-981-15-6695-0_4

Download citation

DOI: https://doi.org/10.1007/978-981-15-6695-0_4
Published: 26 August 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6694-3
Online ISBN: 978-981-15-6695-0
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics

Pattern Mining Algorithms

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Pattern Mining: Current Challenges and Opportunities

Sequential Pattern Mining

Frequent Pattern Mining in Data Streams

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendix: Summary on Mining Algorithms

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Pattern Mining Algorithms

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Pattern Mining: Current Challenges and Opportunities

Sequential Pattern Mining

Frequent Pattern Mining in Data Streams

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendix: Summary on Mining Algorithms

Appendix: Summary on Mining Algorithms

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation