Abstract
Most of the data generated by social media, the Internet of Things, and similar sources is semi-structured or unstructured. XML is a leading semi-structured data format commonly used across platforms. XML clustering is an active research area, and because of its complexity it remains a challenging problem in data analytics, especially when Big Data is considered. In this paper, we focus on clustering XML documents based on their structure. We propose a novel method for representing XML documents, the Compressed Representation of XML Tree, which follows the concept of the frequent pattern tree structure. Clustering is then carried out on the proposed structure with a new algorithm, TreeXP, which follows the XPattern framework. The proposed representation and clustering algorithm are compared with the well-established PathXP algorithm and are found to give the same performance while requiring much less time.
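The abstract does not spell out how the compressed representation is built, so the following is only a minimal sketch of the general idea behind a frequent-pattern-tree-style structure: root-to-leaf tag paths from several XML documents are merged into one prefix tree, so that shared prefixes are stored once. The names `root_to_leaf_paths`, `PrefixNode`, and `build_prefix_tree` are hypothetical, and the sketch should not be read as the authors' Compressed Representation of XML Tree or as the TreeXP algorithm.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return the set of distinct root-to-leaf tag paths in one document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        children = list(node)
        if not children:
            paths.add(path)  # a leaf closes one structural path
        for child in children:
            stack.append((child, path + (child.tag,)))
    return paths


class PrefixNode:
    """Hypothetical node type: a tag, the ids of documents that pass
    through this node, and child nodes keyed by tag."""

    def __init__(self, tag):
        self.tag = tag
        self.doc_ids = set()
        self.children = {}


def build_prefix_tree(documents):
    """Merge the paths of all documents so shared prefixes are stored once."""
    top = PrefixNode(None)  # artificial root above all document roots
    for doc_id, xml_text in enumerate(documents):
        for path in root_to_leaf_paths(xml_text):
            node = top
            for tag in path:
                node = node.children.setdefault(tag, PrefixNode(tag))
                node.doc_ids.add(doc_id)
    return top


# Illustrative toy documents (not taken from the paper's datasets).
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/><email/></author></book>",
    "<article><title/></article>",
]
tree = build_prefix_tree(docs)
book = tree.children["book"]
print(sorted(book.doc_ids))
print(sorted(book.children["author"].children))
```

In this toy run, the two `book` documents share the `book/author/name` prefix, so it is stored once with both document ids attached. A structure-based clustering step could then group documents by the path patterns they have in common, though how TreeXP does this is not described in the abstract.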
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Accottillam, T., Remya, K.T.V., Raju, G. (2021). TreeXP—An Instantiation of XPattern Framework. In: Jat, D.S., Shukla, S., Unal, A., Mishra, D.K. (eds) Data Science and Security. Lecture Notes in Networks and Systems, vol 132. Springer, Singapore. https://doi.org/10.1007/978-981-15-5309-7_7
DOI: https://doi.org/10.1007/978-981-15-5309-7_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5308-0
Online ISBN: 978-981-15-5309-7
eBook Packages: Intelligent Technologies and Robotics (R0)