Abstract
The rapid growth of online social media in the form of collaboratively created content presents new opportunities and challenges to both producers and consumers of information. Given the large volume of data produced by various social media services, text analytics provides an effective way to meet users’ diverse information needs. In this chapter, we first introduce the background of traditional text analytics and the distinct aspects of textual data in social media. We then discuss research progress in applying text analytics to social media from different perspectives, and show how to improve existing approaches to text representation in social media using real-world examples.
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Hu, X., Liu, H. (2012). Text Analytics in Social Media. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_12
DOI: https://doi.org/10.1007/978-1-4614-3223-4_12
Published:
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-3222-7
Online ISBN: 978-1-4614-3223-4
eBook Packages: Computer Science, Computer Science (R0)