Abstract
Social networks are rich in various kinds of content, such as text and multimedia. The ability to apply text mining algorithms effectively to this text data is critical for many applications, including keyword search, classification, and clustering. While search and classification are well-known applications in many scenarios, social networks have a much richer structure in terms of both text and links. Much of the work in this area uses either the text content alone or the linkage structure alone. However, many recent algorithms combine linkage and content information for mining purposes, and in many cases this combination yields considerably more effective results than a system based on either source alone. This paper surveys such algorithms and the advantages observed from using them in different scenarios. We also present avenues for future research in this area.
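The sketch below illustrates what combining linkage and content information can mean in practice. It scores a pair of nodes by a weighted mix of two measures. Text similarity is the cosine of bag-of-words count vectors. Link similarity is the Jaccard overlap of neighbor sets. This is a minimal sketch under stated assumptions and not the method of any specific algorithm surveyed in the chapter. The weight `alpha`, the function names, and the toy graph and texts are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def text_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0

def link_jaccard(graph: dict, u: str, v: str) -> float:
    """Jaccard overlap of the neighbor sets of u and v (linkage only)."""
    nu, nv = graph.get(u, set()), graph.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def combined_similarity(graph: dict, texts: dict, u: str, v: str,
                        alpha: float = 0.5) -> float:
    """Hypothetical linear blend: alpha weights content, (1 - alpha) weights links."""
    return alpha * text_cosine(texts[u], texts[v]) + (1 - alpha) * link_jaccard(graph, u, v)

# Hypothetical toy social network: undirected adjacency stored as sets.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
}
# Hypothetical text attached to each node (e.g. profile or post content).
texts = {
    "alice": "graph mining and text clustering",
    "bob": "text clustering of social media posts",
    "carol": "cooking recipes",
    "dave": "graph mining algorithms",
}

if __name__ == "__main__":
    for u, v in [("alice", "bob"), ("alice", "dave"), ("alice", "carol")]:
        print(u, v, round(combined_similarity(graph, texts, u, v), 3))
```

Setting `alpha` to 1 gives a purely content-based score, and setting it to 0 gives a purely link-based one. This mirrors the contrast the abstract draws. The algorithms surveyed in the chapter may combine the two sources in more elaborate ways than a fixed linear blend.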
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Aggarwal, C.C., Wang, H. (2011). Text Mining in Social Networks. In: Aggarwal, C. (eds) Social Network Data Analytics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-8462-3_13
DOI: https://doi.org/10.1007/978-1-4419-8462-3_13
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-8461-6
Online ISBN: 978-1-4419-8462-3
eBook Packages: Computer Science, Computer Science (R0)