Abstract
This paper describes a comparative study of STASIS and LSA, two measures of semantic similarity that can be applied to short texts for use in Conversational Agents (CAs). CAs are computer programs that interact with humans through natural language dialogue. Business organizations have spent large sums of money in recent years developing them for online customer self-service, but achievements have been limited to simple FAQ systems. We believe this is due to the labour-intensive process of scripting, which could be reduced radically by the use of short-text semantic similarity measures. “Short texts” are typically 10-20 words long and need not be grammatically correct sentences; examples include spoken utterances and text messages. We also present a benchmark data set of 65 sentence pairs with human-derived similarity ratings. This data set is the first of its kind developed specifically to evaluate such measures, and we believe it will be valuable to future researchers.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
O’Shea, J., Bandar, Z., Crockett, K., McLean, D. (2008). A Comparative Study of Two Short Text Semantic Similarity Measures. In: Nguyen, N.T., Jo, G.S., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems: Technologies and Applications. KES-AMSTA 2008. Lecture Notes in Computer Science, vol 4953. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78582-8_18
DOI: https://doi.org/10.1007/978-3-540-78582-8_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78581-1
Online ISBN: 978-3-540-78582-8
eBook Packages: Computer Science (R0)