Abstract
This paper proposes a novel method for subtopic segmentation of Web documents. Subtopic segmentation can improve retrieval results. The proposed method segments subtopics hierarchically and identifies the boundary of each subtopic. Using a term frequency matrix, the method measures the similarity between adjacent blocks, such as paragraphs or passages. In an experiment on real-world samples, the macro-averaged precision and recall reach 73.4% and 82.5%, and the micro-averaged precision and recall reach 72.9% and 83.1%. Moreover, the method is equally effective for other Asian languages such as Japanese and Korean, as well as for Western languages.
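The idea of comparing adjacent blocks through their term frequencies can be illustrated with a minimal sketch. The abstract does not state which similarity measure, tokenizer, or boundary rule the authors use, so the cosine similarity, the regular-expression tokenizer, and the fixed `threshold` below are all assumptions made for illustration. The sketch is also flat rather than hierarchical, and its function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(block):
    """Count word tokens in one block (one row of a term frequency matrix).

    Assumption: a simple lowercase alphanumeric tokenizer; the paper's
    own tokenization is not described in the abstract.
    """
    return Counter(re.findall(r"[a-z0-9]+", block.lower()))


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two sparse term frequency vectors."""
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty block shares nothing with its neighbour
    return dot / (norm_a * norm_b)


def adjacent_similarities(blocks):
    """Similarity between each block and the block that follows it."""
    tfs = [term_frequencies(b) for b in blocks]
    return [cosine_similarity(tfs[i], tfs[i + 1]) for i in range(len(tfs) - 1)]


def find_boundaries(blocks, threshold=0.2):
    """Return each index i where a boundary falls between block i and i + 1.

    Assumption: a boundary is declared wherever adjacent similarity drops
    below a fixed threshold; the value 0.2 is arbitrary.
    """
    sims = adjacent_similarities(blocks)
    return [i for i, s in enumerate(sims) if s < threshold]
```

For the four blocks "cats chase mice", "mice fear cats", "stocks and bonds fell", and "bonds and stocks rallied", the only pair with no shared terms is the second and third, so `find_boundaries` reports a single boundary after the block at index 1.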
Additional information
Foundation item: Supported by the National High Technology Research and Development Program of China (2002AA119050)
Biography: ZHANG Yun-tao (1971-), male, Lecturer, Ph.D. candidate, research direction: text information processing and data mining.
About this article
Cite this article
Yun-tao, Z., Ling, G. & Yong-cheng, W. Hierarchical subtopic segmentation of web document. Wuhan Univ. J. Nat. Sci. 11, 47–50 (2006). https://doi.org/10.1007/BF02831702
DOI: https://doi.org/10.1007/BF02831702