Abstract
This paper describes a method for automatically detecting semantic relations between concept nodes of a networked ontological knowledge base by analyzing matrices of the semantic-syntactic valences of words. These matrices are obtained by nonnegative factorization of tensors of the syntactic compatibility of words. The tensors, in turn, are built by frequency analysis of the syntactic structures of sentences drawn from large text corpora of English Wikipedia and Simple English Wikipedia entries.
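The abstract describes the pipeline only at a high level, so the following is a rough, hedged sketch rather than the authors' procedure. It assumes the compatibility tensor has three modes (verb × subject × object) whose cells count how often a parsed triple occurs, and it factors that tensor with a nonnegative CP/PARAFAC decomposition using Lee–Seung-style multiplicative updates for the squared Frobenius loss. The mode layout, the toy triples, and the hypothetical function names (`build_tensor`, `ntf`, `mttkrp`, `gram`, `reconstruction_error`) are illustrative assumptions. One plausible reading is that each column of a factor matrix acts as a latent valence slot, with its largest entries marking the words that typically fill it.

```python
import random
from collections import Counter


def build_tensor(triples):
    """Count (verb, subject, object) triples into a sparse 3-way tensor.

    Assumption: the tensor modes are verb, subject, object. Returns
    {(i, j, k): count}, the tensor shape, and the sorted vocabulary per mode.
    """
    counts = Counter(triples)
    vocabs = [sorted({t[m] for t in counts}) for m in range(3)]
    index = [{w: n for n, w in enumerate(v)} for v in vocabs]
    X = {tuple(index[m][t[m]] for m in range(3)): float(c) for t, c in counts.items()}
    return X, tuple(len(v) for v in vocabs), vocabs


def gram(F):
    """R x R Gram matrix F^T F of a factor matrix stored as a list of rows."""
    R = len(F[0])
    return [[sum(row[a] * row[b] for row in F) for b in range(R)] for a in range(R)]


def mttkrp(X, factors, mode, rank):
    """X unfolded along `mode` times the Khatri-Rao product of the other two factors."""
    o1, o2 = (m for m in range(3) if m != mode)
    out = [[0.0] * rank for _ in factors[mode]]
    for idx, x in X.items():  # only nonzero counts contribute
        a, b = factors[o1][idx[o1]], factors[o2][idx[o2]]
        row = out[idx[mode]]
        for r in range(rank):
            row[r] += x * a[r] * b[r]
    return out


def ntf(X, shape, rank, iters=200, seed=0, eps=1e-12):
    """Nonnegative CP (PARAFAC) decomposition by multiplicative updates."""
    rng = random.Random(seed)
    # Strictly positive random start keeps every factor entry nonnegative.
    factors = [[[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)] for n in shape]
    for _ in range(iters):
        for mode in range(3):
            o1, o2 = (m for m in range(3) if m != mode)
            # Denominator uses F * (G1 Hadamard G2), i.e. the model tensor's MTTKRP.
            G1, G2 = gram(factors[o1]), gram(factors[o2])
            num = mttkrp(X, factors, mode, rank)
            for i, row in enumerate(factors[mode]):
                den = [sum(row[s] * G1[s][r] * G2[s][r] for s in range(rank)) for r in range(rank)]
                for r in range(rank):
                    row[r] *= num[i][r] / (den[r] + eps)
    return factors


def reconstruction_error(X, shape, factors):
    """Frobenius norm of the difference between X and its CP reconstruction."""
    A, B, C = factors
    rank = len(A[0])
    err = 0.0
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                xhat = sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
                err += (X.get((i, j, k), 0.0) - xhat) ** 2
    return err ** 0.5


if __name__ == "__main__":
    # Toy triples standing in for parsed Wikipedia sentences (hypothetical data).
    triples = ([("eat", "cat", "fish")] * 5 + [("eat", "dog", "meat")] * 4
               + [("drive", "man", "car")] * 6 + [("drive", "woman", "truck")] * 3)
    X, shape, vocabs = build_tensor(triples)
    A, B, C = ntf(X, shape, rank=2)
    for r in range(2):
        # Strongest verb, subject and object for latent slot r.
        best = [vocab[max(range(len(F)), key=lambda i: F[i][r])]
                for vocab, F in zip(vocabs, (A, B, C))]
        print(f"component {r}: {best}")
    print("residual:", round(reconstruction_error(X, shape, [A, B, C]), 3))
```

Multiplicative updates are used here because they keep factors nonnegative without projection and, for each mode with the other two fixed, do not increase the squared error; a production system on Wikipedia-scale counts would instead use a sparse, vectorized (or GPU) implementation.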
Translated from Kibernetika i Sistemnyi Analiz, No. 3, pp. 3–16, May–June, 2014.
Cite this article
Anisimov, A.V., Marchenko, O.O. & Vozniuk, T.G. Determining Semantic Valences of Ontology Concepts by Means of Nonnegative Factorization of Tensors of Large Text Corpora. Cybern Syst Anal 50, 327–337 (2014). https://doi.org/10.1007/s10559-014-9621-9