Abstract
In this chapter, we first look at patterns with their relevance of discovery to business. We then do a survey and evaluation, in terms of advantages and disadvantages, of different mining algorithms that are suited for both traditional and big data sources. These algorithms include those designed for both sequential and closed sequential pattern mining for both the sequential and parallel processing environments.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Aggarwal, C. C., & Han, J. (2014). Frequent pattern mining. Springer International Publishing Switzerland. Available https://doi.org/10.1007/978-3-319-07821-2_3.
Aggarwal, S., & Rani, B. (2013). Optimization of association rule mining process using Apriori and ant colony optimization algorithm.
Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. In: Proceedings of International Conference Data Engineering (ICDE ’95) (pp. 3–14).
Ayres, J., Gehrke, J., Yiu, T., & Flannick, J. (2002). Sequential pattern mining using a bitmap representation. In: Proceedings of ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (SIGKDD’ 02) (pp. 429–435).
Cheng, S., Shi, Y., Qin, Q., & Bai, R. (2013). Swarm intelligence in big data analytics. Berlin: Springer.
Cong, S., Han, J., & Padua, D. (2005). Parallel mining of closed sequential patterns. Available http://hanj.cs.illinois.edu/pdf/kdd05_parseq.pdf.
Gebali, F. (2011). Algorithms and parallel computing. Hoboken, NJ: Wiley.
Han, J., & Kamber, M. (2006). Data mining concepts and techniques. Morgan Kaufmann.
Han, J., Cheng, H., Xin, D. & Yan, X. (2007). Frequent pattern mining: current status and future directions. Data mining and knowledge discovery, 15(1), 55–86.
Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., & Hsu, M. C. (2000). FreeSpan: Frequent pattern projected sequential pattern mining. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’00) (pp. 355–359).
Han, J., Wang, J., Lu, Y., & Tzvetkov, P. (2002). Mining top-k frequent closed patterns without minimum support. In Proceedings of IEEE ICDM Conference on Data Mining.
Hand, D., Mannila, H., & Smyth, P. (2001). Principles of data mining. London: The MIT Press.
Hirate, Y., Iwahashi, E., & Yamana, H. (2004). TF2P-growth: An efficient algorithm for mining frequent patterns without any thresholds. http://elvex.ugr.es/icdm2004/pdf/hirate.pdf.
Huang, K., Chang, C., Tung, J., & Ho, C. (2006). COBRA: Closed sequential pattern mining using bi-phase reduction approach.
Iglesia, B., & Reynolds, A. (2005). The use of meta-heuristic algorithms for data mining.
Kumar, V., Xindong, W., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., et al. (2007). Top 10 algorithms in data mining. London: Springer.
Liu, Y., & Guan, Y. (2008). Fp-growth algorithm for application in research of market basket analysis. In 2008 IEEE International Conference on Computational Cybernetics (pp. 269–272). IEEE.
Luna, J. M., Romero, R. J., & Ventura, S. (2011). Design and behavior study of a grammar-guided genetic programming algorithm for mining association rules. London: Springer.
Oweis, N. E., Fouad, M. M, Oweis, S. R., Owais, S. S., & Snasel, V. (2016). A novel mapreduce lift association rule mining algorithm (MRLAR) for big data. International Journal of Advanced Computer Science and Applications (IJACSA), 7(3).
Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., et al. (2001). PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the International Conference on Data Engineering (ICDE) (pp. 215–224).
Qiao, S., Li, T., Peng, J., & Qiu, J. (2010). Parallel sequential pattern mining of massive trajectory data.
Rajaraman, A., & Ullman, J. (2011). Mining of massive datasets. Cambridge University Press.
Raju, V. P., & Varma, G. P. S. (2015). Mining closed sequential patterns in large sequence databases. International Journal of Database Management Systems (IJDMS), 7(1).
Riondato, M., DeBrabant, J. A., Fonseca, R., & Upfal, E. (2012). PARMA: A parallel randomized algorithm for approximate association rules mining in MapReduce. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (pp. 85–94). ACM.
Shintani, T., & Kitsuregawa, M. (1998). Mining algorithms for sequential patterns in parallel: Hash based approach. In Proceedings of Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining (pp. 283–294).
Song, M., & Rajasekaran, S. (2006). A transaction mapping algorithm for frequent itemsets mining. IEEE Transactions on Knowledge and Data Engineering, 18(4).
Tu, V., & Koh, I. (2010). A tree-based approach for efficiently mining approximate frequent itemset.
Wang, J., Han, J., & Li, C. (2007). Frequent closed sequence mining without candidate maintenance. IEEE Transactions on Knowledge and Data Engineering, 19(8), 1042–1056.
Wang, J., Zhang, L., Liu, G., Liu, Q., & Chen, E. (2014). On top-k closed sequential patterns mining. In 11th International Conference on Fuzzy Systems and Knowledge Discovery.
Yan, X., Han, J., & Afshar, R. (2003). CloSpan: Mining closed sequential patterns in large databases. In: Proceedings of SIAM International Conference on Data Mining (SDM ’03) (pp. 166–177).
Yang, T., Lin, Q., & Jin, R. (2015). Big data analytics: Optimization and randomization. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2327–2327). Available on https://homepage.cs.uiowa.edu/~tyng/kdd15-tutorial.pdf.
Yin, J., Zheng, Z., Cao, L., Song, Y., & Wei, W. (2013). Efficiently mining top-k high utility sequential patterns. In Proceedings of 2013 IEEE 13th International Conference on Data Mining (pp. 1259–1264).
Zaki, M. J. (2001). Parallel sequence mining on shared-memory machines. Journal of Parallel and Distribution Computing, 61(3), 401–426.
Zhenxin, Z., & Jiaguo, L. (2009). Closed sequential pattern mining algorithm based positional data. In Advanced Technology in Teaching—Proceedings of the 3rd International Conference on Teaching and Computational Science (WTCS) (pp. 45–53).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Appendix: Summary on Mining Algorithms
Appendix: Summary on Mining Algorithms
Author | Algorithm | Mining approach | Approach used | Advantages | Limitations |
---|---|---|---|---|---|
Aggarwal and Han (2014) | Apriori-based methods | However, a candidate-generation-and-test strategy produces a large number of candidate sequences and also requires more database scan (Tu and Koh 2010) when there are long patterns | |||
Han et al. (2000) | Pattern-growth methods | Without candidate generation compressed database structure which is smaller than original dataset | |||
Han et al. (2000) | FreeSpan | Pattern-growth methods | sequential patterns by partitioning | ||
Pei et al. (2001) | PrefixSpan | Pattern-growth methods | Pseudo-projection technique for constructing projected databases | Does not generate-and-test any candidate sequence that do not exist in a projected database | Projected database requires more storage space, extra time is required to scan the projected database |
Sequential pattern mining | Vertical format-based methods | Fast computation of support counting | |||
Zaki (2001) | SPADE | Sequential pattern mining | Vertical format-based methods Either breadth-first or depth-first manner | Consumes more memory | |
Ayres et al. (2002) | SPAM | Sequential pattern mining | Vertical format-based methods Traverses the sequence tree in a depth-first manner | A vertical bitmap of the database | Consumes more memory space |
Agrawal and Srikant (1995) | AprioriAll | Sequential pattern mining | Horizontal format-based method | Consumes more memory space | |
Closed sequential pattern mining | Efficient use of search space pruning, reduced number of pattern, find more interesting patterns | ||||
Yan et al. (2003) | CloSpan | Closed sequential pattern mining | Prefix sequence lattice, post-pruning | Huge search space for checking the closure of new patterns | |
Wang et al. (2007) | BIDE | Closed sequential pattern mining | Depth-first search order. perform closure checking (bidirectional Extension) | Without candidate maintenance, does not keep track of historical closed sequential patterns | Multiple database scan, more computational time |
Huang et al. (2006) | COBRA | Closed sequential pattern mining | Bi-phase Reduction Approach, item encoding, pruning methods (LayerPruning and ExtPruning), vertical and horizontal database formats | Reduce searching space | Requires large memory space |
Raju and Varma (2015 | ClaSP | Closed sequential pattern mining | Vertical database format, Frequent Closed Candidates, recursive post-pruning (CheckAvoidable for pruning the search) | Requires more main memory | |
Han et al. (2002) | Top-K closed sequential pattern mining | Descending order of support | Without specifying minimum support | Users must decide the value of k, prior knowledge of database required | |
Hirate et al. (2004) | TF2P-growth | Top-K closed sequential pattern mining | Descending order of support | Does not require the user to set any threshold value k, output of frequent patterns to user sequentially and in chunks | Time-consuming to be checking all chunk size |
Wang et al. (2014) | BI-TSP | Top-K closed sequential pattern mining | bidirectional checking scheme, minimum length constraint, dynamically increase support of k | ||
Raju and Varma (2015) | CSpan | Top-K closed sequential pattern mining | Depth-first search, occurrence checking method for early detection of closed sequential patterns, constructs the projected database | Projected database requires more storage space, extra time is required to scan the projected database |
Rights and permissions
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Millham, R., Agbehadji, I.E., Yang, H. (2021). Pattern Mining Algorithms. In: Fong, S., Millham, R. (eds) Bio-inspired Algorithms for Data Streaming and Visualization, Big Data Management, and Fog Computing. Springer Tracts in Nature-Inspired Computing. Springer, Singapore. https://doi.org/10.1007/978-981-15-6695-0_4
Download citation
DOI: https://doi.org/10.1007/978-981-15-6695-0_4
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6694-3
Online ISBN: 978-981-15-6695-0
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)