Abstract
A major challenge for next-generation data mining systems is creative knowledge discovery from diverse and distributed data sources. An important part of this task is information fusion: merging diverse, mainly unstructured representations into a single knowledge format. This chapter focuses on merging information available in text documents into an information network, a graph representation of knowledge. The problem addressed is how to efficiently and effectively produce an information network from large text corpora drawn from at least two diverse, seemingly unrelated domains. The goal is to produce a network with the highest potential for revealing as yet unexplored cross-domain links that could lead to new scientific discoveries. The work focuses on better identification of important domain-bridging concepts, which are promoted to core nodes around which the rest of the network is formed. The evaluation is performed by replicating a discovery made on medical articles in the migraine-magnesium domain.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2012 The Author(s)
About this chapter
Cite this chapter
Juršič, M., Sluban, B., Cestnik, B., Grčar, M., Lavrač, N. (2012). Bridging Concept Identification for Constructing Information Networks from Text Documents. In: Berthold, M.R. (ed.) Bisociative Knowledge Discovery. Lecture Notes in Computer Science, vol 7250. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31830-6_6
DOI: https://doi.org/10.1007/978-3-642-31830-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31829-0
Online ISBN: 978-3-642-31830-6
eBook Packages: Computer Science (R0)