WordNet-based lexical semantic classification for text corpus analysis

Long, Jun; Wang, Lu-da; Li, Zu-de; Zhang, Zu-ping; Yang, Liu

doi:10.1007/s11771-015-2702-8

Published: 08 May 2015

Volume 22, pages 1833–1840 (2015)

Journal of Central South University

Jun Long (龙军)¹,
Lu-da Wang (王鲁达)¹,
Zu-de Li (李祖德)¹,
Zu-ping Zhang (张祖平)¹ &
Liu Yang (杨柳)²

Abstract

Many text classification methods rely on statistical term measures for document representation. Such representations ignore the lexical semantic content of terms and the mutual information that can be distilled from it, which leads to classification errors. This work proposes a document representation method, the WordNet-based lexical semantic vector space model (VSM), to address the problem. Using WordNet, the method constructs a data structure of semantic-element information to characterize lexical semantic content, and adjusts EM modeling to disambiguate word stems. Then, in the lexical-semantic space of the corpus, a lexical-semantic eigenvector is built for each document by calculating the weight of each synset, and is applied to the widely recognized neighbor-weighted k-nearest neighbor (NWKNN) algorithm. On the Reuters-21578 text corpus and a lexical-replacement variant of it, experimental results show that the lexical-semantic eigenvector outperforms the TF-IDF-based term-statistic eigenvector in both F1 measure and dimension scale. Because the method produces document representation eigenvectors, it has broad prospects for classification applications in text corpus analysis.
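To make the contrast between term-based and synset-based document vectors concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a small hand-written stem-to-synset table (`TOY_SYNSETS`, hypothetical, standing in for a real WordNet lookup), uses raw counts as weights rather than the paper's synset weighting, and skips disambiguation entirely because each toy stem has exactly one synset.

```python
import math
from collections import Counter

# Hypothetical stem -> synset-id table standing in for a WordNet lookup.
# Every stem has exactly one synset here, so no disambiguation step is needed.
TOY_SYNSETS = {
    "car": "auto.n.01",
    "automobile": "auto.n.01",
    "buy": "purchase.v.01",
    "purchase": "purchase.v.01",
    "price": "price.n.01",
    "cost": "price.n.01",
}


def term_vector(tokens):
    """Term-statistic representation: one dimension per surface term."""
    return Counter(tokens)


def synset_vector(tokens):
    """Synset representation: one dimension per synset (assumed raw-count weight).

    Tokens missing from the toy table fall back to their own surface form.
    """
    return Counter(TOY_SYNSETS.get(token, token) for token in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_sq = sum(x * x for x in u.values()) * sum(x * x for x in v.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


# Two documents that differ only by lexical replacement (synonym substitution).
doc_a = ["buy", "car", "price"]
doc_b = ["purchase", "automobile", "cost"]

term_sim = cosine(term_vector(doc_a), term_vector(doc_b))
synset_sim = cosine(synset_vector(doc_a), synset_vector(doc_b))

# Dimension of the shared space under each representation.
term_dims = len(set(doc_a) | set(doc_b))
synset_dims = len(set(synset_vector(doc_a)) | set(synset_vector(doc_b)))

print("term similarity:", term_sim, "dimensions:", term_dims)
print("synset similarity:", synset_sim, "dimensions:", synset_dims)
```

In this toy setting the two documents share no surface terms but map to the same synsets, which illustrates why a synset-level space can be both smaller and more robust to lexical replacement than a term-level one.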

Author information

Authors and Affiliations

School of Information Science and Engineering, Central South University, Changsha, 410075, China
Jun Long (龙军), Lu-da Wang (王鲁达), Zu-de Li (李祖德) & Zu-ping Zhang (张祖平)
School of Software, Central South University, Changsha, 410075, China
Liu Yang (杨柳)

Corresponding author

Correspondence to Lu-da Wang (王鲁达).

Additional information

Foundation item: Project (2012AA011205) supported by the National High-Tech Research and Development Program (863 Program) of China; Projects (61272150, 61379109, M1321007, 61301136, 61103034) supported by the National Natural Science Foundation of China; Project (20120162110077) supported by the Research Fund for the Doctoral Program of Higher Education of China; Project (11JJ1012) supported by the Excellent Youth Foundation of Hunan Scientific Committee, China

About this article

Cite this article

Long, J., Wang, Ld., Li, Zd. et al. WordNet-based lexical semantic classification for text corpus analysis. J. Cent. South Univ. 22, 1833–1840 (2015). https://doi.org/10.1007/s11771-015-2702-8

Received: 21 March 2014
Accepted: 11 October 2014
Published: 08 May 2015
Issue Date: May 2015
DOI: https://doi.org/10.1007/s11771-015-2702-8
