Abstract
Retrieving documents by querying their mathematical content directly can be useful in various domains, including education, engineering, patent research, physics, and medical sciences. Unlike words in text retrieval, however, mathematical symbols in isolation carry little semantic information, so the structure of an expression must be considered as well. Unfortunately, using structure to calculate the relevance scores of documents results in ranking algorithms that are computationally more expensive than the typical ranking algorithms employed for text documents. As a result, current math retrieval systems either limit themselves to exact matches or ignore structure completely; they sacrifice either recall or precision for efficiency. We propose instead an efficient end-to-end math retrieval system based on a structural similarity ranking algorithm. We describe novel optimization techniques to reduce the index size and the query processing time, and we experimentally validate our system in terms of correctness and efficiency. Thus, with the proposed optimizations, mathematical content can be fully exploited to rank documents in response to mathematical queries.
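The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the ranking algorithm of the proposed system; the nested-tuple encoding of expressions and all function names below are hypothetical, chosen only for illustration. The sketch contrasts the two extremes the abstract mentions: a structure-blind comparison of symbols and an exact structural match.

```python
from collections import Counter

# Assumed encoding (not from the paper): an expression is a nested tuple
# (operator, child, child, ...); leaves are plain strings.
X_SQUARED = ("pow", "x", "2")   # x^2
TWO_TO_X = ("pow", "2", "x")    # 2^x


def symbols(expr):
    """Yield every symbol (operator or leaf) in the tree, ignoring structure."""
    if isinstance(expr, str):
        yield expr
    else:
        yield expr[0]
        for child in expr[1:]:
            yield from symbols(child)


def same_symbol_bag(a, b):
    """Structure-blind comparison: same multiset of symbols in both trees."""
    return Counter(symbols(a)) == Counter(symbols(b))


def same_structure(a, b):
    """Structure-aware comparison at its strictest: exact ordered-tree match."""
    return a == b
```

Under this encoding, x^2 and 2^x are indistinguishable by their symbols alone, yet they differ structurally. A structure-blind comparison therefore loses precision, while an exact match loses recall because it rejects every near miss. A structural similarity measure is meant to sit between these two extremes.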
Financial assistance for the research was provided by the Natural Sciences and Engineering Research Council of Canada, Mprime, and the University of Waterloo.
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kamali, S., Tompa, F.W. (2013). Structural Similarity Search for Mathematics Retrieval. In: Carette, J., Aspinall, D., Lange, C., Sojka, P., Windsteiger, W. (eds) Intelligent Computer Mathematics. CICM 2013. Lecture Notes in Computer Science, vol 7961. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39320-4_16
DOI: https://doi.org/10.1007/978-3-642-39320-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39319-8
Online ISBN: 978-3-642-39320-4