Skip to main content

Abstract

In this chapter, we first look at patterns with their relevance of discovery to business. We then do a survey and evaluation, in terms of advantages and disadvantages, of different mining algorithms that are suited for both traditional and big data sources. These algorithms include those designed for both sequential and closed sequential pattern mining for both the sequential and parallel processing environments.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 149.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 199.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 199.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  • Aggarwal, C. C., & Han, J. (2014). Frequent pattern mining. Springer International Publishing Switzerland. Available https://doi.org/10.1007/978-3-319-07821-2_3.

  • Aggarwal, S., & Rani, B. (2013). Optimization of association rule mining process using Apriori and ant colony optimization algorithm.

    Google Scholar 

  • Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. In: Proceedings of International Conference Data Engineering (ICDE ’95) (pp. 3–14).

    Google Scholar 

  • Ayres, J., Gehrke, J., Yiu, T., & Flannick, J. (2002). Sequential pattern mining using a bitmap representation. In: Proceedings of ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (SIGKDD’ 02) (pp. 429–435).

    Google Scholar 

  • Cheng, S., Shi, Y., Qin, Q., & Bai, R. (2013). Swarm intelligence in big data analytics. Berlin: Springer.

    Book  Google Scholar 

  • Cong, S., Han, J., & Padua, D. (2005). Parallel mining of closed sequential patterns. Available http://hanj.cs.illinois.edu/pdf/kdd05_parseq.pdf.

  • Gebali, F. (2011). Algorithms and parallel computing. Hoboken, NJ: Wiley.

    Book  Google Scholar 

  • Han, J., & Kamber, M. (2006). Data mining concepts and techniques. Morgan Kaufmann.

    Google Scholar 

  • Han, J., Cheng, H., Xin, D. & Yan, X. (2007). Frequent pattern mining: current status and future directions. Data mining and knowledge discovery, 15(1), 55–86.

    Google Scholar 

  • Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., & Hsu, M. C. (2000). FreeSpan: Frequent pattern projected sequential pattern mining. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’00) (pp. 355–359).

    Google Scholar 

  • Han, J., Wang, J., Lu, Y., & Tzvetkov, P. (2002). Mining top-k frequent closed patterns without minimum support. In Proceedings of IEEE ICDM Conference on Data Mining.

    Google Scholar 

  • Hand, D., Mannila, H., & Smyth, P. (2001). Principles of data mining. London: The MIT Press.

    Google Scholar 

  • Hirate, Y., Iwahashi, E., & Yamana, H. (2004). TF2P-growth: An efficient algorithm for mining frequent patterns without any thresholds. http://elvex.ugr.es/icdm2004/pdf/hirate.pdf.

  • Huang, K., Chang, C., Tung, J., & Ho, C. (2006). COBRA: Closed sequential pattern mining using bi-phase reduction approach.

    Google Scholar 

  • Iglesia, B., & Reynolds, A. (2005). The use of meta-heuristic algorithms for data mining.

    Google Scholar 

  • Kumar, V., Xindong, W., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., et al. (2007). Top 10 algorithms in data mining. London: Springer.

    Google Scholar 

  • Liu, Y., & Guan, Y. (2008). Fp-growth algorithm for application in research of market basket analysis. In 2008 IEEE International Conference on Computational Cybernetics (pp. 269–272). IEEE.

    Google Scholar 

  • Luna, J. M., Romero, R. J., & Ventura, S. (2011). Design and behavior study of a grammar-guided genetic programming algorithm for mining association rules. London: Springer.

    Google Scholar 

  • Oweis, N. E., Fouad, M. M, Oweis, S. R., Owais, S. S., & Snasel, V. (2016). A novel mapreduce lift association rule mining algorithm (MRLAR) for big data. International Journal of Advanced Computer Science and Applications (IJACSA), 7(3).

    Google Scholar 

  • Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., et al. (2001). PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proceedings of the International Conference on Data Engineering (ICDE) (pp. 215–224).

    Google Scholar 

  • Qiao, S., Li, T., Peng, J., & Qiu, J. (2010). Parallel sequential pattern mining of massive trajectory data.

    Google Scholar 

  • Rajaraman, A., & Ullman, J. (2011). Mining of massive datasets. Cambridge University Press.

    Google Scholar 

  • Raju, V. P., & Varma, G. P. S. (2015). Mining closed sequential patterns in large sequence databases. International Journal of Database Management Systems (IJDMS), 7(1).

    Google Scholar 

  • Riondato, M., DeBrabant, J. A., Fonseca, R., & Upfal, E. (2012). PARMA: A parallel randomized algorithm for approximate association rules mining in MapReduce. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (pp. 85–94). ACM.

    Google Scholar 

  • Shintani, T., & Kitsuregawa, M. (1998). Mining algorithms for sequential patterns in parallel: Hash based approach. In Proceedings of Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining (pp. 283–294).

    Google Scholar 

  • Song, M., & Rajasekaran, S. (2006). A transaction mapping algorithm for frequent itemsets mining. IEEE Transactions on Knowledge and Data Engineering, 18(4).

    Google Scholar 

  • Tu, V., & Koh, I. (2010). A tree-based approach for efficiently mining approximate frequent itemset.

    Google Scholar 

  • Wang, J., Han, J., & Li, C. (2007). Frequent closed sequence mining without candidate maintenance. IEEE Transactions on Knowledge and Data Engineering, 19(8), 1042–1056.

    Article  Google Scholar 

  • Wang, J., Zhang, L., Liu, G., Liu, Q., & Chen, E. (2014). On top-k closed sequential patterns mining. In 11th International Conference on Fuzzy Systems and Knowledge Discovery.

    Google Scholar 

  • Yan, X., Han, J., & Afshar, R. (2003). CloSpan: Mining closed sequential patterns in large databases. In: Proceedings of SIAM International Conference on Data Mining (SDM ’03) (pp. 166–177).

    Google Scholar 

  • Yang, T., Lin, Q., & Jin, R. (2015). Big data analytics: Optimization and randomization. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2327–2327). Available on https://homepage.cs.uiowa.edu/~tyng/kdd15-tutorial.pdf.

  • Yin, J., Zheng, Z., Cao, L., Song, Y., & Wei, W. (2013). Efficiently mining top-k high utility sequential patterns. In Proceedings of 2013 IEEE 13th International Conference on Data Mining (pp. 1259–1264).

    Google Scholar 

  • Zaki, M. J. (2001). Parallel sequence mining on shared-memory machines. Journal of Parallel and Distribution Computing, 61(3), 401–426.

    Article  Google Scholar 

  • Zhenxin, Z., & Jiaguo, L. (2009). Closed sequential pattern mining algorithm based positional data. In Advanced Technology in Teaching—Proceedings of the 3rd International Conference on Teaching and Computational Science (WTCS) (pp. 45–53).

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Richard Millham .

Editor information

Editors and Affiliations

Appendix: Summary on Mining Algorithms

Appendix: Summary on Mining Algorithms

Author

Algorithm

Mining approach

Approach used

Advantages

Limitations

Aggarwal and Han (2014)

 

Apriori-based methods

  

However, a candidate-generation-and-test strategy produces a large number of candidate sequences and also requires more database scan (Tu and Koh 2010) when there are long patterns

Han et al. (2000)

 

Pattern-growth methods

 

Without candidate generation

compressed database structure which is smaller than original dataset

 

Han et al. (2000)

FreeSpan

Pattern-growth methods

sequential patterns by partitioning

  

Pei et al. (2001)

PrefixSpan

Pattern-growth methods

Pseudo-projection technique for constructing projected databases

Does not generate-and-test any candidate sequence that do not exist in a projected database

Projected database requires more storage space, extra time is required to scan the projected database

  

Sequential pattern mining

Vertical format-based methods

Fast computation of support counting

 

Zaki (2001)

SPADE

Sequential pattern mining

Vertical format-based methods

Either breadth-first or depth-first manner

 

Consumes more memory

Ayres et al. (2002)

SPAM

Sequential pattern mining

Vertical format-based methods

Traverses the sequence tree in a depth-first manner

A vertical bitmap of the database

Consumes more memory space

Agrawal and Srikant (1995)

AprioriAll

Sequential pattern mining

Horizontal format-based method

 

Consumes more memory space

  

Closed sequential pattern mining

 

Efficient use of search space pruning, reduced number of pattern, find more interesting patterns

 

Yan et al. (2003)

CloSpan

Closed sequential pattern mining

Prefix sequence lattice, post-pruning

 

Huge search space for checking the closure of new patterns

Wang et al. (2007)

BIDE

Closed sequential pattern mining

Depth-first search order. perform closure checking (bidirectional Extension)

Without candidate maintenance, does not keep track of historical closed sequential patterns

Multiple database scan, more computational time

Huang et al. (2006)

COBRA

Closed sequential pattern mining

Bi-phase Reduction Approach, item encoding, pruning methods (LayerPruning and ExtPruning), vertical and horizontal database formats

Reduce searching space

Requires large memory space

Raju and Varma (2015

ClaSP

Closed sequential pattern mining

Vertical database format, Frequent Closed Candidates, recursive post-pruning (CheckAvoidable for pruning the search)

 

Requires more main memory

Han et al. (2002)

 

Top-K closed sequential pattern mining

Descending order of support

Without specifying minimum support

Users must decide the value of k, prior knowledge of database required

Hirate et al. (2004)

TF2P-growth

Top-K closed sequential pattern mining

Descending order of support

Does not require the user to set any threshold value k, output of frequent patterns to user sequentially and in chunks

Time-consuming to be checking all chunk size

Wang et al. (2014)

BI-TSP

Top-K closed sequential pattern mining

bidirectional checking scheme, minimum length constraint, dynamically increase support of k

  

Raju and Varma (2015)

CSpan

Top-K closed sequential pattern mining

Depth-first search, occurrence checking method for early detection of closed sequential patterns, constructs the projected database

 

Projected database requires more storage space, extra time is required to scan the projected database

Rights and permissions

Reprints and permissions

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Millham, R., Agbehadji, I.E., Yang, H. (2021). Pattern Mining Algorithms. In: Fong, S., Millham, R. (eds) Bio-inspired Algorithms for Data Streaming and Visualization, Big Data Management, and Fog Computing. Springer Tracts in Nature-Inspired Computing. Springer, Singapore. https://doi.org/10.1007/978-981-15-6695-0_4

Download citation

Publish with us

Policies and ethics