
1 Introduction

The relationships between words, their representation in text, concepts in the mind, and objects in the real world have been a source of inquiry for many centuries. Empirical, distributional paradigms have been shown to successfully derive human-like estimates of semantic distance from large text corpora, and recent developments in this area have enabled the enrichment of distributional models with structural information, such as the relative position of terms [1, 2], and orthographic information describing the configuration of characters from which words are composed [3, 4]. This paper extends these beginnings in three principal ways. First, we propose a new and very simple method for encoding structure into semantic vectors, using a quantization of the space between two extreme ‘demarcator vectors’. This vector generation method performs well in experiments and has some key computational advantages. Second, the method is general enough to be applied within a wide range of Vector Symbolic Architectures (VSAs), including Circular Holographic Reduced Representations (CHRRs), which use complex vectors [5]. Third, we investigate some higher-level compositions as well, demonstrating early results with compositional representations for sentences.

Each of these developments is related to quantum interaction as follows. Demarcator vector generation is similar in a sense to the derivation of orbital angular momentum values in quantum mechanics, in that the two use the same mathematics. The applicability of the method within many VSAs, including those over complex vector space, makes it available within algebras that are particularly related to generalized quantum models. Finally, compositional methods are important in research on generalized quantum structures, and the way many levels of representation are combined seamlessly in the compositional models presented here should be of interest to researchers in the field.

2 Orthogonality in Distributional and Orthographic Models

Geometric methods of distributional semantics derive vector representations of terms from electronic text, such that terms that occur in similar contexts will have similar vectors [6, 7]. While such models have been shown to approximate human performance on a number of cognitive tasks [8, 9], they generally do not take into account structural elements of language, and consequently have been referred to, at times critically, as “bags of words” models. Emerging approaches to semantic space models have leveraged reversible vector transformations to encode additional layers of meaning into vector representations of terms and concepts. Examples include the encoding of the relative position of terms [1, 10], syntactic information [11], and orthographic information [4].

The general approach depends upon generating vector representations for terms, and associating reversible vector transformations with properties of interest. In accordance with terminology developed in [12], we will refer to the vector representations of atomic components such as terms as elemental vectors. Elemental vectors are constructed using a randomization procedure such that they have a high probability of being mutually orthogonal, or close-to-orthogonal. This adds robustness to the model, by making it highly improbable that elemental vectors will be confused with one another, despite the distortion that occurs during training. However, it also introduces the implicit assumption that elemental vectors are unrelated to one another, which means that models generated in this way must be composed of discrete elements.
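The following Python sketch (illustrative only, not the implementation evaluated in this paper) shows why randomly generated elemental vectors are near-orthogonal with high probability; the dimensionality and sparsity values match those of the real-vector scheme described in Sect. 3.1.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_elemental(dim=1000, seeds=10):
    # Sparse ternary elemental vector: a few randomly placed +1/-1 entries.
    v = np.zeros(dim)
    idx = rng.choice(dim, size=seeds, replace=False)
    v[idx] = rng.choice([-1.0, 1.0], size=seeds)
    return v

a, b = random_elemental(), random_elemental()
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine(a, b) = {cosine:.3f}")  # typically 0 or very close to it
```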

This limitation notwithstanding, this approach has allowed for the integration of structured information into distributional models of meaning. From the perspective of cognitive psychology, this is desirable as it presents the possibility of a unified term representation that can account for a broad range of experimental phenomena. Recent work in this area has leveraged circular convolution to generate vectors representing the orthographic form of words [3], and to integrate these with a geometric model of distributional semantics [4]. Vector representations of orthographic word form are generated by using circular convolution to create bound products representing the component bigrams of the term concerned, including non-contiguous bigrams. Kachergis et al. give the following example (\(\circledast \) indicates binding using circular convolution) [4]:

$$\begin{aligned} word =&\ w + o + r + d \\ +&\ w \circledast o + o \circledast r + r \circledast d + (w \circledast \_) \circledast r + (w \circledast \_) \circledast d + (o \circledast \_) \circledast d \\ +&\ (w \circledast o) \circledast r + ((w \circledast \_) \circledast r) \circledast d + ((w \circledast o) \circledast \_) \circledast d + (o \circledast r) \circledast d \end{aligned}$$

The vector representation for the term “word” is generated by combining a set of vectors representing unigrams, bigrams, and trigrams of characters. It is a characteristic of the model employed that each of these vectors has a high probability of being mutually orthogonal, or close-to-orthogonal. So, for example, the vectors representing the trigrams \(((w \circledast \_ ) \circledast r)\) and \(((w \circledast o ) \circledast r)\) will be dissimilar from one another. Consequently it is necessary to explicitly encode all of the n-grams of interest, including gapped trigrams (such as “w_r”), to provide flexibility. From the perspective of computational complexity, this is not ideal, as the number of representational units that must be generated and encoded is at least quadratic in the length of the sequence.
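As a sketch of how such bound products might be computed (assuming Plate-style circular convolution over real-valued random vectors, rather than the exact implementation of [3, 4]), the following fragment shows that the two trigrams mentioned above are indeed encoded as unrelated vectors:

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 1024

def elemental(dim=1024):
    # Dense random vector; such vectors are near-orthogonal in high dimensions.
    return rng.standard_normal(dim) / np.sqrt(dim)

def cconv(x, y):
    # Circular convolution, computed via the FFT.
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

w, o, r = elemental(), elemental(), elemental()
gap = elemental()  # placeholder vector for a skipped character

wor = cconv(cconv(w, o), r)    # trigram "wor"
w_r = cconv(cconv(w, gap), r)  # gapped trigram "w_r"

cos = wor @ w_r / (np.linalg.norm(wor) * np.linalg.norm(w_r))
print(f"cosine(wor, w_r) = {cos:.3f}")  # near zero: the two trigrams share no similarity
```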

Rather than encoding character position precisely (with respect to some other character, or the term itself), alternative models of orthographic representation allow for a degree of uncertainty with respect to letter position. These approaches measure the relatedness between terms on the basis of the similarity between probability distributions assigned to the positions of each matching character [13, 14], providing a more flexible measure of similarity. However, on account of the constraints we have discussed, only orthographic representations based on discrete bigrams or the exact position of characters have to date been combined with other sorts of distributional information in an attempt to generate a holistic representation [3, 15]. Imposing near-orthogonality adds robustness, but also necessitates discarding potentially useful structural information, namely the proximity between character positions within a word. Consequently, we have selected the generation of orthographic representations as an example application through which to illustrate the utility of our approach.

The paper proceeds as follows. First, we present the mathematical language we will use to describe the operators provided by VSAs, a family of representational approaches based on reversible vector transformations [16]. Next, we will describe an approach we have developed through which the distance between elemental vectors, and hence bound products, can be predetermined. In the context of an illustrative application for orthographic modeling, we show that this approach permits the encoding of structural information to do with proximity, rather than absolute position, into a distributional model. We then discuss a relationship between this approach and quantum mechanics, and conclude with some experimental results and example applications.

3 Mathematical Structure and Methods

3.1 Vector Symbolic Architectures (VSAs)

The reversible vector transformations we have discussed are a distinguishing feature of a family of representational approaches collectively known as VSAs [16]. In our experiments the VSAs we will use are Kanerva’s Binary Spatter Code (BSC), which uses binary vectors [17], and Plate’s CHRR [5], which uses complex vectors where each dimension represents an angle between \(-\pi \) and \(\pi \), using the implementation developed in [10]. In addition, we will use an approach based on permutation of real vectors [2].

Binding is the primary operation facilitated by VSAs (in addition to standard operators for vector superposition and vector comparison). Binding is a multiplication-like operator through which two vectors are combined to form a third vector C that is dissimilar from either of its component vectors A and B. We will use the symbol “\(\otimes \)” for binding, and the symbol “\(\oslash \)” for the inverse of binding for the remainder of this paper. It is important that this operator be invertible: if \(\mathrm{C} = \mathrm{A} \otimes \mathrm{B}\), then \(\mathrm{A} \oslash \mathrm{C} = \mathrm{A} \oslash (\mathrm{A} \otimes \mathrm{B}) = \mathrm{B}\). In some models, this recovery may be approximate, but the robust nature of the representation guarantees that \(\mathrm{A} \oslash \mathrm{C}\) is similar enough to B that B can easily be recognized as the best candidate for \(\mathrm{A} \oslash \mathrm{C}\) in the original set of concepts. Thus the invertible nature of the bind operator facilitates the retrieval of the information it encodes.

In the case of the BSC, elemental vectors are initialized by randomly assigning 0 or 1 to each dimension with equal probability. Pairwise exclusive or (\({\textsc {XOR}}\)) is used as a binding operator: \(\mathrm {X} \otimes \mathrm {Y} = \mathrm {X}{\textsc {\ XOR\ }}\mathrm {Y}\). As it is its own inverse, the binding and decoding processes are identical (\(\otimes = \oslash \)). For superposition, the BSC employs a majority vote: if the component vectors have more ones than zeros in a dimension, this dimension will have a value of one, with ties broken at random.
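A minimal sketch of the BSC operators just described (the dimensionality here is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(7)
dim = 10_000

def random_bsc(dim=10_000):
    # Elemental vector: 0 or 1 in each dimension with equal probability.
    return rng.integers(0, 2, size=dim, dtype=np.uint8)

def bind(x, y):
    # XOR binding; the operation is its own inverse.
    return np.bitwise_xor(x, y)

def superpose(vectors):
    # Majority vote per dimension, with ties broken at random.
    sums = np.sum(np.asarray(vectors, dtype=np.int32), axis=0)
    out = (sums * 2 > len(vectors)).astype(np.uint8)
    ties = (sums * 2 == len(vectors))
    out[ties] = rng.integers(0, 2, size=int(ties.sum()))
    return out

A, B = random_bsc(), random_bsc()
C = bind(A, B)
assert np.array_equal(bind(A, C), B)  # exact recovery: A decoded from C yields B
```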

In CHRR, binding through circular convolution is accomplished by pairwise multiplication: \(X \otimes Y = \{ X_1Y_1, X_2Y_2, \ldots , X_{n-1} Y_{n-1}, X_nY_n \} \), which is equivalent to addition of the phase angles of the circular vectors concerned. Binding is inverted by binding to the inverse of the vector concerned: \(X \oslash Y = X \otimes Y^{\mathrm {-1}}\), where the inverse of a vector is its complex conjugate. Elemental vectors are initialized by randomly assigning a phase angle to each dimension. Superposition is accomplished by pairwise addition of the unit circle vectors, and normalization of the result for each circular component. In the implementation used in our experiments, normalization occurs after training concludes, so the sequence in which superposition occurs is not relevant.
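A corresponding sketch for CHRR (unit phasors per dimension; illustrative only):

```python
import numpy as np

rng = np.random.default_rng(11)
dim = 500

def random_chrr(dim=500):
    # Each dimension is a unit phasor with a random angle in (-pi, pi].
    return np.exp(1j * rng.uniform(-np.pi, np.pi, size=dim))

def bind(x, y):
    # Elementwise multiplication adds the phase angles.
    return x * y

def release(x, y):
    # Binding with the complex conjugate inverts the operation.
    return x * np.conj(y)

A, B = random_chrr(), random_chrr()
C = bind(A, B)
recovered = release(C, A)
sim = np.real(np.vdot(recovered, B)) / dim
print(f"similarity(recovered, B) = {sim:.3f}")  # ~1.0: exact recovery for unit phasors
```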

Our real vector implementation follows the approach developed by Sahlgren and his colleagues [2], and differs from the binary and complex implementations, in that elemental vectors are “bound” to permutations, rather than to other vectors. Elemental vectors are constructed by creating a high-dimensional zero vector (on the order of 1000 dimensions), and setting a small number of the dimensions of this vector (on the order of 10) to either +1 or \(-1\) at random. The permutations utilized consist of shifting all of the elements of a given vector \(n\) positions to the right, where each value \(n\) is assigned to, or derived from, the information it is intended to encode. In the case of our orthographic model, this information consists of the character occurring in a particular position, so we have used the ASCII value of the character concerned as \(n\). Binding is reversed by permuting all of the elements of the vector \(n\) positions to the left. Superposition is accomplished by adding the vectors concerned, and normalizing the result.
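A sketch of this permutation-based binding (circular shifts standing in for the permutations, as in the description above; illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)

def elemental(dim=1000, seeds=10):
    # Sparse ternary vector: a few randomly placed +1/-1 entries.
    v = np.zeros(dim)
    idx = rng.choice(dim, size=seeds, replace=False)
    v[idx] = rng.choice([-1.0, 1.0], size=seeds)
    return v

def bind(vec, n):
    # "Bind" by shifting all elements n positions to the right (circularly).
    return np.roll(vec, n)

def release(vec, n):
    # Invert by shifting n positions to the left.
    return np.roll(vec, -n)

E_w = elemental()
bound = bind(E_w, ord('w'))  # e.g. use the ASCII code of 'w' as the shift n
assert np.array_equal(release(bound, ord('w')), E_w)
```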

In all models, the “random” initiation of elemental vectors is rendered deterministic by seeding the random number generator with a hash value derived from a string or character of interest following the approach developed in [18]. This retains the property of near-orthogonality where desired, while ensuring that random overlap between elemental vectors is consistent across experiments.
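The hash and seed-derivation scheme below are illustrative assumptions (the approach of [18] is not reproduced here); the point is simply that the same token always yields the same "random" elemental vector:

```python
import hashlib
import numpy as np

def seeded_rng(token: str) -> np.random.Generator:
    # Derive a reproducible seed from the token, so the elemental vector for,
    # say, 'w' is identical across runs and experiments.
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return np.random.default_rng(int.from_bytes(digest[:8], "little"))

def elemental(token: str, dim: int = 1000, seeds: int = 10) -> np.ndarray:
    rng = seeded_rng(token)
    v = np.zeros(dim)
    idx = rng.choice(dim, size=seeds, replace=False)
    v[idx] = rng.choice([-1.0, 1.0], size=seeds)
    return v

assert np.array_equal(elemental("w"), elemental("w"))  # deterministic across calls
```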

Fig. 1. Interpolation to generate five demarcator vectors. P(1) = probability of 1.

3.2 Measured Similarity

The first step in our approach involves generating a set of vectors that are a fixed distance apart, which we will refer to as demarcator vectors \((D({\mathrm {position}}))\), as illustrated in Fig. 1. The first pair of demarcator vectors are conventional elemental vectors \(D(\alpha )\) and \(D(\omega )\), constructed randomly such that they have a high probability of being mutually orthogonal or close-to-orthogonal. To ensure orthogonality with certainty, we render \(D(\omega )\) orthogonal to \(D(\alpha )\) using the quantum negation procedure, or its binary approximation, described in [19] and [20] respectively. The remaining demarcator vectors are generated by interpolation. In the continuous vector spaces, this is accomplished by subdividing the \(90^{\circ }\) angle between \(D(\alpha )\) and \(D(\omega )\) and generating the corresponding unit vectors. In binary vector space, this is accomplished by weighting the probability of assigning a 1, where \(D(\alpha )\) and \(D(\omega )\) disagree, in accordance with the desired distance between the new demarcator vector and each of these extremes (Footnote 1). As the vectors representing adjacent numbers are approximately equidistant, the distances between vector pairs representing numbers the same distance apart should also be approximately equal (e.g. \(sim(D(1),D(2)) \approx sim(D(2),D(3))\)).
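The following sketch illustrates demarcator generation under these descriptions. In the real case, exact orthogonalization is obtained by projection (standing in for the quantum negation procedure of [19]); in the binary case, the shared per-bit threshold is one way to realize the weighted assignment described above, and is an assumption rather than a description of the released implementation.

```python
import numpy as np

rng = np.random.default_rng(5)

def real_demarcators(dim=1000, n=5):
    # D(alpha) and D(omega): random unit vectors, made exactly orthogonal by
    # projecting the D(alpha) component out of D(omega).
    d_alpha = rng.standard_normal(dim)
    d_alpha /= np.linalg.norm(d_alpha)
    d_omega = rng.standard_normal(dim)
    d_omega -= (d_omega @ d_alpha) * d_alpha
    d_omega /= np.linalg.norm(d_omega)
    # Remaining demarcators: unit vectors subdividing the 90-degree angle.
    return [np.cos(t) * d_alpha + np.sin(t) * d_omega
            for t in np.linspace(0.0, np.pi / 2, n)]

def binary_demarcators(dim=32000, n=5):
    # D(omega) disagrees with D(alpha) in roughly half of its bits, so the two
    # extremes are approximately orthogonal under the binary similarity metric.
    d_alpha = rng.integers(0, 2, size=dim, dtype=np.uint8)
    disagree = rng.random(dim) < 0.5
    d_omega = np.where(disagree, 1 - d_alpha, d_alpha).astype(np.uint8)
    # A shared per-bit threshold means demarcator i takes the D(omega) bit in a
    # fraction i/(n-1) of the disagreeing positions, with nested sets of flipped
    # bits, so similarity falls off linearly with distance between demarcators.
    u = rng.random(dim)
    out = []
    for i in range(n):
        p = i / (n - 1)
        take_omega = disagree & (u < p)
        out.append(np.where(take_omega, d_omega, d_alpha).astype(np.uint8))
    return out

D = real_demarcators()
print(round(D[0] @ D[1], 2), round(D[1] @ D[2], 2))  # both ~0.92, cf. Table 1

B = binary_demarcators()
sim01 = 2 * (0.5 - np.mean(B[0] != B[1]))
sim12 = 2 * (0.5 - np.mean(B[1] != B[2]))
print(round(sim01, 2), round(sim12, 2))              # both ~0.75, cf. Table 1
```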

Table 1 illustrates the pairwise similarities between a set of five demarcator vectors constructed in this manner. Vectors of dimensionality 32,000, 500, and 1,000 are used in the binary, complex, and real cases respectively. These dimensions were chosen so as to normalize the space requirements of the stored vectors across models, and were retained in our subsequent experiments. In all cases, the relatedness between demarcator vectors a fixed distance apart is approximately equal. For example, the similarity between all pairs of demarcator vectors two positions apart (e.g. 1 and 3) is approximately 0.5 in the binary implementation, and 0.71 in the complex and real vector implementations. In the binary implementation, the difference in relatedness is proportional to the difference in demarcator position. This is not the case in the complex or real implementations, where the drop in similarity between demarcator vectors becomes progressively steeper. This is an artifact of the metric used to measure similarity in each case: with binary vectors, \(2 \times (0.5 - {\mathrm { normalized\ Hamming\ distance}})\) is used, while with continuous vectors the cosine metric is used. While a proportional decrement could be obtained by measuring the angle between continuous vectors directly (i.e. taking the arccos of the cosine value), we will retain the use of the cosine metric for our experiments.

Table 1. Pairwise similarity between demarcator vectors

3.3 Encoding Orthography

Using controlled degrees of non-orthogonality, we can encode information about the positions of letters in words into their vector representations. Like spatial encoding [14] and the overlap model [13], our approach is based upon measuring the difference in position between matching characters. This is accomplished by creating elemental vectors for characters, and binding them to demarcator vectors representing positions. For example, the orthographic vector for the term “word” is constructed as follows:

$$\begin{aligned} S({\mathrm {word}}) = E(\mathrm {w}) \otimes D(1) + E(\mathrm {o}) \otimes D(2) + E(\mathrm {r}) \otimes D(3) + E(\mathrm {d}) \otimes D(4) \end{aligned}$$

As the elemental vectors for characters are mutually orthogonal or near-orthogonal, bound products derived from different characters will be orthogonal or near-orthogonal also. For example, in the complex vector space used to generate Table 1, \(sim(E(\mathrm {w}) \otimes D(1) , E(\mathrm {q}) \otimes D(1)) = 0\). Furthermore, the distance between bound products containing the same character will approximate the distance between their demarcator vectors. For example, in the real vector space used to generate Table 1, \(sim(E(\mathrm {w}) \otimes D(1) , E(\mathrm {w}) \otimes D(2)) = 0.92 = sim(D(1), D(2))\). Ultimately, the similarity between a pair of terms is derived from the distance between their matching characters. If this distance is generally low, the orthographic similarity between these terms will be high (Footnote 2). Thus, the models so generated are innately tolerant to variations such as transposition, insertion and deletion of sequence elements.
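The following self-contained sketch mirrors the real-vector variant: demarcator vectors for positions are interpolated between two orthogonal extremes, and each position's demarcator is bound to its character by a circular shift derived from the ASCII code, as in Sect. 3.1. It is illustrative rather than the released implementation.

```python
import numpy as np

rng = np.random.default_rng(13)
DIM, N_POS = 1000, 8

# Demarcator vectors for positions 1..N_POS (real case: angle subdivision).
d_alpha = rng.standard_normal(DIM)
d_alpha /= np.linalg.norm(d_alpha)
d_omega = rng.standard_normal(DIM)
d_omega -= (d_omega @ d_alpha) * d_alpha
d_omega /= np.linalg.norm(d_omega)
D = [np.cos(t) * d_alpha + np.sin(t) * d_omega
     for t in np.linspace(0.0, np.pi / 2, N_POS)]

def orthographic_vector(word: str) -> np.ndarray:
    # Bind each position's demarcator to its character by a character-specific
    # circular shift (the ASCII code), then superpose and normalize.
    s = np.zeros(DIM)
    for pos, ch in enumerate(word):
        s += np.roll(D[pos], ord(ch))
    return s / np.linalg.norm(s)

def sim(a: str, b: str) -> float:
    return float(orthographic_vector(a) @ orthographic_vector(b))

print(round(sim("word", "word"), 2))  # 1.0
print(round(sim("word", "wrod"), 2))  # high: transposed characters stay close in position
print(round(sim("word", "ward"), 2))  # lower: one character does not match at all
print(round(sim("word", "lamp"), 2))  # near zero: no matching characters
```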

Fig. 2. Orbital angular momentum vectors, as derived from quantized states along a given axis (left), and a related strategy for encoding positions as vectors

4 Demarcator Vectors and Orbital Angular Momentum

Evenly distributing normalized vectors between two orthogonal vectors is one of many strategies we could adopt to generate demarcator vectors. To generalize this process, we can describe it as follows:

  1. Construct a line in the vector space with a given starting point and direction.

  2. Place demarcators along this line using some dividing strategy.

  3. (Optional) Project demarcator vectors onto the unit circle to normalize them.

Two examples of this strategy are illustrated in Fig. 2. In the example on the left, the generating line is the vertical axis, the dividing strategy is to mark points at even intervals along this axis, and the projection strategy is to project orthogonally from the vertical axis onto the unit sphere (Footnote 3). In the example on the right, we have chosen a generating line parallel to the vertical axis, marked points at even intervals, and projected onto the unit sphere using standard vector normalization. The left-hand strategy will be familiar to some readers: this is precisely the way orbital angular momentum states are generated in quantum mechanics. We refer the reader to a text on quantum mechanics for this derivation, e.g., [21, Ch. 14]: the process includes solving a wave equation in three dimensions, exploring the angular momentum operator and the commutator relations between its components, and noting that each point on the axis can be mapped to many points on the outer sphere (this ambiguity corresponding to the fact that measuring the component of angular momentum along one axis must leave the components along the other axes undetermined, according to the Uncertainty Principle). The important point for this discussion is that it is quite standard to derive non-orthogonal vectors for states in this fashion, not only in generalized quantum structures but in quantum mechanics itself. The underlying spherical harmonic functions involved in angular momentum are orthogonal to one another under pairwise integration, but lead to several possible non-orthogonal angular momentum vectors.
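A two-dimensional sketch of the generalized recipe, following the right-hand strategy of Fig. 2; the particular starting point and direction are illustrative assumptions, chosen so the geometry is easy to inspect.

```python
import numpy as np

def demarcators_along_line(start, direction, n, normalize=True):
    # Place n demarcators at even intervals along a line, then (optionally)
    # project them onto the unit sphere by normalizing.
    points = [start + t * direction for t in np.linspace(0.0, 1.0, n)]
    if normalize:
        points = [p / np.linalg.norm(p) for p in points]
    return points

# A generating line parallel to the vertical axis, offset by one unit,
# marked at even intervals and normalized.
D = demarcators_along_line(np.array([1.0, 0.0]), np.array([0.0, 2.0]), 5)

print(round(D[0] @ D[1], 3))  # positions 1 and 2
print(round(D[2] @ D[3], 3))  # positions 3 and 4: closer together after normalization
```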

The ambiguity of the projection onto the unit sphere in the angular momentum model in more than two dimensions is problematic, and we expect the strategy on the right to be simpler in practice. Note also that neither strategy distributes vectors evenly around the unit sphere: for example, in the right-hand strategy, the vectors representing positions 3 and 4 are closer to each other than the vectors representing positions 1 and 2. Such flexibility to vary the pairwise similarities between positions along a string is a desirable property, because changes at the beginning of a word may be more significant than changes in the middle [13]. We note also that in our current implementation in the Semantic Vectors package, this generalized strategy works well for real-valued and complex-valued vectors, but is as yet underspecified for binary-valued vectors because (for example) ‘the point half-way between \(A\) and \(B\)’ is multiply defined using Hamming distance [22]. We are currently investigating appropriate strategies to bring binary vectors into this generalized description.

5 Applications: Orthography, Morphology, Sentence Similarity

Table 2 provides examples of nearest-neighbor search based on orthographic similarity in real, complex and binary vector spaces derived from the widely-used Touchstone Applied Science Associates, Inc. (TASA) corpus. Terms in the corpus that occurred between 5 and 15,000 times were included as candidates for retrieval. The dimensionality of the real, complex and binary vector spaces was 1,000, 500 and 32,000 respectively. Our approach successfully recovers orthographically related terms, including terms containing substrings of the original term (“dominic” vs. “condominium”); insertions (“orthography” vs. “orthophotography”); substitutions (“angular” vs. “annular”); and transpositions of characters (“wahle” vs. “whale”). While not shown in the table on account of space constraints, the three models produced similar sets of results for the same cue term.

Table 2. Orthographic similarity
Table 3. Comparison with benchmark conditions from [15]. 1X = original dimensionality (used for Table 2). 2X = twice original dimensionality. nb = no binding. \(\mathrm{ED} =1 - \frac{\mathrm {edit\ distance}}{\mathrm {combined\ length}}.\)

While we would be hesitant to propose the simple model of orthographic representation we have developed as a cognitive model of lexical coding, it is interesting to note that it does conform to the majority of a set of constraints abstracted from lexical priming data by Hannagan and his colleagues [15]. We will describe these constraints in brief here, but refer the interested reader to [15] for further details. The constraints are as follows: (1) Stability: a string should be most similar to itself (\({\mathrm {sim}} \ge 0.95\)); (2) Edge Effects: substitutions at the edges of strings should be more disruptive; (3) Local Translocations (TL): transposing adjacent characters should be less disruptive than substituting both of them; (4) Global TL: transposing all adjacent characters should be maximally disruptive; (5) Distal TL: transposing non-adjacent characters should be more disruptive than substituting one, but less than substituting both; (6) Compound TL: TL and substitution combined should be more disruptive than substitution alone; (7) Distinct Relative Position (RP): removing some characters should preserve some similarity; and (8) Repeated RP: removing a repeated or a non-repeated letter should be equally disruptive (Footnote 4). Each constraint is accompanied by a set of test cases, consisting of paired strings, and the degree to which a model meets the constraint is determined from the estimated similarities between these pairs, and the relationships between them.
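To make the evaluation procedure concrete, the following sketch applies the simplified real-vector model from the earlier sketches to two of these constraints. The test strings imitate the style of the benchmark in [15] but are not the benchmark items themselves.

```python
import numpy as np

rng = np.random.default_rng(17)
DIM, N_POS = 1000, 8

d_alpha = rng.standard_normal(DIM)
d_alpha /= np.linalg.norm(d_alpha)
d_omega = rng.standard_normal(DIM)
d_omega -= (d_omega @ d_alpha) * d_alpha
d_omega /= np.linalg.norm(d_omega)
D = [np.cos(t) * d_alpha + np.sin(t) * d_omega
     for t in np.linspace(0.0, np.pi / 2, N_POS)]

def ortho(word: str) -> np.ndarray:
    s = sum(np.roll(D[i], ord(c)) for i, c in enumerate(word))
    return s / np.linalg.norm(s)

def sim(a: str, b: str) -> float:
    return float(ortho(a) @ ortho(b))

base = "12345678"
# Stability: a string should be most similar to itself.
print(sim(base, base) >= 0.95)                        # True
# Local TL: transposing adjacent characters should be less disruptive
# than substituting both of them.
print(sim(base, "12435678") > sim(base, "12dd5678"))  # True
```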

The extent to which our models meet these constraints is shown in Table 3. Estimates based on all models consistently meet all constraints aside from those related to Edge Effects and the Global and Distal TL constraints (in the latter case because translocating characters one position apart is less disruptive than substituting one of them). This represents a better fit to these constraints than comparable models based on letter distribution only (labeled “nb”, i.e. no binding). It also represents a better fit than the majority of approaches evaluated against this benchmark previously [15, 23], providing motivation for the further evaluation of a more developed model in the future. While the real model appeared to meet the edge-effect-related constraint at its original dimensionality, this finding did not hold at higher dimensionality, and was most likely produced by random overlap. This is not surprising given that our current model does not address edge effects. As suggested in Sect. 4, one way to address this issue would be to increase the distance between peripheral demarcator vectors, a customization we plan to evaluate in future work.

Table 4. Combining orthographic and semantic similarity

The results in Table 4 were obtained by superposing the orthographic vector for each term from the TASA corpus with a semantic vector for the same term generated using the permutation-based approach described in [2], with a 2+2 sliding window. As anticipated by previous work combining orthographic and semantic relatedness [24, 25], the examples suggest that this model is able to find associations between morphologically related terms, including English verb roots and past-tense forms related by non-affixal patterns, such as “bring:brought”. The combination of semantics and shared characters is evident in other examples, such as “think:intend”. However, this sensitivity to morphological similarity comes at the cost of introducing spurious similarity when common letter patterns do not have semantic significance.
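A sketch of the superposition step; equal weighting of the normalized components is an assumption here, as the weighting is not specified above.

```python
import numpy as np

def combined_vector(orthographic: np.ndarray, semantic: np.ndarray) -> np.ndarray:
    # Superpose the two normalized components and renormalize, so that both
    # orthographic and distributional similarity contribute to retrieval.
    o = orthographic / np.linalg.norm(orthographic)
    s = semantic / np.linalg.norm(semantic)
    c = o + s
    return c / np.linalg.norm(c)
```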

Table 5. Retrieval of sentences from the TASA corpus in complex (first two examples) and binary (second two examples) spaces

6 Conclusion

In this paper we have introduced a novel approach through which the near-orthogonality of elemental vectors is deliberately violated to introduce measured similarity into semantic space. While illustrated primarily through orthographic modeling, the approach is general in nature and can be applied in any situation in which a representation of sequence that is tolerant to variation is desired. Furthermore, this approach may enable the generation of holistic representations combining distributional and spatial information, a direction we plan to explore in future work. To facilitate further experimentation, our real, binary and complex orthographic vector implementations have been released as components of the open source Semantic Vectors package [27, 28].