1 Introduction

Semantic similarity between concepts is a common problem in many applications such as natural language processing, text categorization, text clustering, information retrieval, and word sense disambiguation [1, 9, 12, 36, 37, 55, 57]. However, judging the semantic similarity of different concepts is a routine yet deceptively complex task: to perform it, people need to draw on an immense amount of background knowledge about the concepts. Typical knowledge sources are search engines [15], topical directories such as the Open Directory Project [46], well-defined semantic networks such as WordNet [24, 43], more domain-dependent ontologies [67, 74] such as the Gene Ontology [17] and the biomedical ontologies MeSH and SNOMED CT [4, 69], Wikipedia [34, 37], and Linked Data [13, 56]. Many semantic similarity measures have been proposed in recent years; according to the concrete knowledge sources exploited and the way in which they are used, various families of measures can be distinguished [29, 71, 72]. Semantic similarity measures can be classified into four main categories [37, 47, 51]: (1) distance-based models that are based on the structural representation of the underlying context; (2) feature-based models that define concepts or entities as sets of features; (3) statistical methods that consider statistics derived from the underlying context; and (4) hybrid models that combine the three basic categories. Concretely, distance-based models, also referred to as edge-counting or path-based methods, define similarity as a function of the distance between concepts [51, 62]. Feature-based methods assume that concepts can be represented as sets of features and assess the similarity of concepts based on the commonalities among their feature sets: any increase in common features results in a higher similarity score and any decrease in shared features results in a lower one [51, 80]. Statistical similarity measures incorporate statistics derived from various aspects of the underlying domain into the similarity computation.

It is worth noting that all the measures mentioned above are specific computation methods that exploit particular knowledge sources such as WordNet [20], Wikipedia [48], or Linked Data [10], or particular mathematical tools such as information content (IC) [63], pointwise mutual information (PMI) [14], or latent semantic analysis (LSA) [19]. Furthermore, for the same kind of knowledge source, different computation approaches for semantic similarity need different contents of the knowledge source. For example, among Wikipedia-based similarity measures, IC-based measures need the category structure of Wikipedia, whereas feature-based methods need the articles (e.g., the redirect pages and hyperlinks) of Wikipedia. In fact, novel computation approaches for the semantic similarity of concepts can be proposed by exploiting new knowledge sources or mathematical tools. Clearly, there are some issues in existing research. Firstly, although there are many computation approaches for semantic similarity, there is no unified framework for these methods; therefore, in practical applications it is difficult for users to choose a computation method for the semantic similarity of concepts. Secondly, if two concepts A and B belong to two heterogeneous knowledge sources, the semantic similarity between A and B cannot be computed using existing methods. For example, if A ∈ WordNet, A ∉ DBpedia, B ∈ DBpedia, and B ∉ WordNet, existing approaches cannot compute the semantic similarity Sim(A, B). Of course, if the two concepts A and B belong to two homogeneous knowledge sources, such as two different domain ontologies built in the same language, the value of Sim(A, B) can be computed using existing methods such as [8, 71].

To fill these gaps, this paper presents an extensive study of the semantic similarity of concepts, from which a unified framework for semantic similarity computation is derived. It should be noted that Cross et al. [18] and Harispe et al. [32] have studied the unified-framework issue for semantic similarity measures. However, their work differs from the research in this paper: Cross et al. [18] and Harispe et al. [32] present a framework for unifying ontology-based semantic similarity measures, whereas we propose a unified framework for semantic similarity measures over multiple heterogeneous knowledge sources [68] such as WordNet [20], ontologies [77], Wikipedia [48], and Linked Data [10]. Based on our framework for the semantic similarity of concepts, we give some generic and flexible approaches to semantic similarity measures resulting from instantiations of the framework. The main contributions of this paper are as follows:

  • The semantic representation and a unified framework for semantic similarity computation of concepts are presented.

  • Some generic and flexible approaches to semantic similarity measures of concepts resulting from instantiations of the framework are provided.

  • Several new approaches to semantic similarity computation of concepts that existing methods cannot measure are proposed.

It is worth mentioning that semantic similarity measures can also be used in multimedia systems such as multimedia databases and retrieval, personalized electronic journals, multimedia encyclopedias, digital libraries, executive information systems, and multimedia documents. For example, in multimedia (e.g., image, audio, or video) retrieval with text annotations, we may use the semantic similarity of text to assist multimedia retrieval, where the computation of the semantic similarity of text can be implemented by exploiting semantic similarity measures of concepts. As another example, digital libraries and multimedia documents contain large amounts of image, audio, video, and text data; in a similar manner, we can also utilize the semantic similarity of text to assist the processing (e.g., retrieval, classification, recommendation, mining, and analysis) of digital libraries and multimedia documents. That is, semantic similarity measures of concepts are also relevant to multimedia systems.

The rest of the paper is organized as follows. In the next section, we briefly review related work on semantic similarity measures. Section 3 presents our unified framework for the semantic similarity computation of concepts, including the semantic representation of concepts and a framework for semantic similarity computation. In Section 4, we investigate several similarity measures resulting from instantiations of the framework. Section 5 presents the details of the experiments and the evaluation of our approaches. Finally, in Section 6, we draw our conclusions and present some perspectives for future research.

2 Related work

As a fundamental concept in theories of perception, behavior, social bonding, learning, and judgment, the notion of similarity has been extensively studied for several decades. Many researchers have endeavored to understand and represent the way humans judge the similarity of two or more objects [12, 27, 51, 53, 76, 80]. Semantic similarity reflects the relationship between the meanings of two concepts (words, entities, or terms), sentences (or short texts), or documents (or texts) [21, 31, 54, 59]. The literature on semantic similarity measures is very extensive; thus, we focus only on the measures that are evaluated in this work, that is, this section gives an overview of methods for measuring the semantic similarity of concepts.

As stated in Section 1, semantic similarity between concepts (semantic similarity for short) can be computed based on a set of factors derived typically from a knowledge representation model. Depending on the structure of the application context and its knowledge representation model, various similarity measures have been proposed and different families of methods can also be identified [51, 71]. These families are [37, 47, 51, 58]: (1) distance-based similarity measures; (2) feature-based similarity measures; (3) statistical similarity measures; and (4) hybrid similarity measures.

2.1 Distance-based similarity measures

Modern research in this area starts with the work presented by Rada et al. [62]. Concretely, Rada et al. propose to use the length of the shortest path between concepts as a measurement of distance. Formally, their definition of conceptual distance is as follows:

$$ Dist\left(A,B\right)=\mathrm{minimum}\ \mathrm{number}\ \mathrm{of}\ \mathrm{edges}\ \mathrm{separating}\ a\ \mathrm{and}\ b, $$

where A and B are the two concepts represented by the nodes a and b, respectively, in an is-a semantic net [24].

The distance measure is converted to a similarity measure by subtracting the path length from the maximum possible path length, which can be shown in the following equation:

$$ Sim\left(A,B\right)=2\times {Distance}_{max}- Dist\left(A,B\right), $$

where Distancemax is the maximum possible path length [24].

The work proposed by Rada et al. [62] opens up the family of edge-counting semantic measures and shows that conceptual distance (or similarity) between concepts in a semantic network is proportional to the length of the path that links them [38]. The ideas of Rada et al. are followed by other works such as Wu and Palmer [82], Leacock and Chodorow [39], Hirst and St-Onge [33], Li et al. [41], Pedersen et al. [56], and Garla and Brandt [25] which also propose similarity measures based on features derived from the length of shortest path between concepts. For example, the metric presented by Wu and Palmer [82] relies on the fact that in is-a hierarchies, concepts that are more distant from the root are more specific than the ones that are near the root. Formally, the conceptual similarity between concepts A and B is defined as follows:

$$ Sim\left(A,B\right)=\frac{2\times {N}_3}{N_1+{N}_2+2\times {N}_3}, $$

where N1 (N2) is the number of edges on the path from A (B) to LCS(A, B), N3 is the number of edges on the path from LCS(A, B) to the root, and LCS(A, B) denotes the least common subsumer (LCS) of concepts A and B [24].

Leacock and Chodorow [39] propose a non-linear adaptation of Rada’s distance to define the similarity measure:

$$ Sim\left(A,B\right)=-\log \left(\frac{Dist\left(A,B\right)}{2\times \mathit{\operatorname{Max}}\_ depth}\right), $$

where Max_depth is the longest of the shortest paths linking a concept to the root concept that subsumes all the others [32]. It should be noted that the non-linear adaptation here refers to a logarithmic function of Rada’s distance, whereas the adaptation in [6] refers to runtime/semantic adaptation and management of software to support source-code semantic flexibility.

Garla and Brandt [25] give a proposal for the normalization of the metric of Leacock and Chodorow to the unit interval as follows [3]:

$$ Sim\left(A,B\right)=1-\frac{\log \left( Dist\left(A,B\right)\right)}{\log \left(2\times \mathit{\operatorname{Max}}\_ depth\right)}. $$

Li et al. [41] introduce a family of ten different parametric similarity measures whose core idea is to break the overall similarity function down into a linear or nonlinear combination of base functions, where each base function relies on a different taxonomical feature such as the length of the shortest path between concepts or the depth of the lowest common ancestor [38]. One of the best measures among them is shown in the following equation:

$$ Sim\left(A,B\right)={e}^{-\alpha \ast Dist\left(A,B\right)}\cdotp \frac{e^{\beta h}-{e}^{-\beta h}}{e^{\beta h}+{e}^{-\beta h}}, $$

where Dist(A, B) is the number of edges separating A and B, h is the depth of LCS of A and B, α and β are parameters scaling the contribution of Dist(A, B) and h, α ≥ 0 and β > 0.
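To make the path-based family concrete, the short Python sketch below implements the Rada, Leacock–Chodorow, and Li et al. measures on a toy is-a hierarchy. The hierarchy, the α and β values, and the Distance_max/Max_depth stand-ins are illustrative assumptions, not part of the original proposals.

```python
# Illustrative sketch (not the original authors' code) of three path-based
# measures on a toy is-a hierarchy encoded as a child -> parent map.
import math

PARENT = {"animal": "entity", "plant": "entity",
          "dog": "animal", "cat": "animal", "rose": "plant"}

def ancestors(c):
    """Return [c, parent(c), ..., root]."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def dist(a, b):
    """Dist(A, B): number of edges on the shortest path through the LCS."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    common = set(anc_a) & set(anc_b)
    return min(anc_a.index(c) + anc_b.index(c) for c in common)

def depth(c):
    return len(ancestors(c)) - 1

MAX_DEPTH = max(depth(c) for c in PARENT)   # assumed stand-in for Max_depth
DISTANCE_MAX = 2 * MAX_DEPTH                # longest possible path in the toy tree

def sim_rada(a, b):
    return 2 * DISTANCE_MAX - dist(a, b)    # Sim = 2 * Distance_max - Dist(A, B)

def sim_leacock_chodorow(a, b):
    return -math.log(dist(a, b) / (2 * MAX_DEPTH))   # assumes Dist(A, B) > 0

def sim_li(a, b, alpha=0.2, beta=0.6):
    lcs = next(c for c in ancestors(a) if c in set(ancestors(b)))
    h = depth(lcs)                          # depth of the LCS
    # tanh(beta*h) equals (e^{beta h} - e^{-beta h}) / (e^{beta h} + e^{-beta h})
    return math.exp(-alpha * dist(a, b)) * math.tanh(beta * h)

print(sim_rada("dog", "cat"), sim_leacock_chodorow("dog", "cat"), sim_li("dog", "cat"))
```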

2.2 Feature-based similarity measures

Feature-based methods assume that concepts can be represented as sets of features. They assess the similarity of concepts based on the commonalities among their feature sets: any increase in common features among concepts results in a higher similarity score and any decrease in shared features results in lower levels of similarity [51, 80]. For discrete-valued vectors, similarity measures are inspired by the comparison of sets and their cardinalities. Some common set-inspired similarity measures for discrete-valued vectors include [45]:

$$ \mathrm{Jaccard}\ \mathrm{coefficient}\ Jaccard\left(A,B\right)=\frac{\mid A\cap B\mid }{\mid A\cup B\mid }, $$
$$ \mathrm{Dice}\ \mathrm{coefficient}\ Dice\left(A,B\right)=\frac{2\times \mid A\cap B\mid }{\left|A\right|+\mid B\mid }, $$
$$ \mathrm{Salton}\ \mathrm{Cosine}\ \mathrm{coefficient}\ SaltonCosine\left(A,B\right)=\frac{\mid A\cap B\mid }{\sqrt{\left|A\right|\times \mid B\mid }}, $$

where A and B denote the sets of features that correspond to concepts a and b.

The Tversky ratio model [80] is defined as a weighted variant over the complement of the symmetric difference between the feature sets of two concepts and considers the distinctive characteristics of each concept (the features of one concept which are not part of the other):

$$ Tversky\left(A,B\right)=\frac{\mid A\cap B\mid }{\left|A\cap B\right|+\alpha \left|A-B\right|+\beta \mid B-A\mid}\mathrm{for}\ \alpha, \beta >0, $$

where α and β represent the relative contribution of unique features of A and B in the similarity value, respectively. The α and β parameters can be used to reflect the symmetric or asymmetric nature of a given context: if α = β then Tversky(A, B) = Tversky(B, A) thus, the similarity comparison is symmetric, otherwise, it is asymmetric (i.e., Tversky(A, B) ≠ Tversky(B, A)) [51].
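As a small illustration, the sketch below implements the Jaccard, Dice, and Tversky coefficients directly over Python sets; the example feature sets are invented for demonstration, and α = β = 0.5 is just one symmetric setting.

```python
# Illustrative set-based feature similarities; the feature sets are made up.
def jaccard(A, B):
    return len(A & B) / len(A | B)

def dice(A, B):
    return 2 * len(A & B) / (len(A) + len(B))

def tversky(A, B, alpha=0.5, beta=0.5):
    common = len(A & B)
    return common / (common + alpha * len(A - B) + beta * len(B - A))

features_car = {"wheel", "engine", "door", "seat"}
features_bike = {"wheel", "seat", "pedal"}
print(jaccard(features_car, features_bike))   # 2 shared features out of 5 -> 0.4
print(dice(features_car, features_bike))
print(tversky(features_car, features_bike))   # symmetric because alpha == beta
```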

From the perspective of set theory, the meaning of the Tversky measure is clear and well founded. However, the feature sets associated with each concept cannot be derived directly from an ontology, which is a serious drawback for its practical implementation [38]. With the aim of bridging this gap in the Tversky measure, Sanchez et al. [73] introduce a feature-based dissimilarity measure which uses the taxonomical ancestors of the concepts as their feature sets:

$$ Dis\left(A,B\right)={\log}_2\left(1+\frac{\left|\varphi (A)-\varphi (B)\right|+\left|\varphi (B)-\varphi (A)\right|}{\left|\varphi (A)-\varphi (B)\right|+\left|\varphi (B)-\varphi (A)\right|+\left|\varphi (A)\cap \varphi (B)\right|}\right), $$

where φ(C) = {D ∈ AllCons | C ≤ D}, AllCons is the set of concepts of a given ontology, and ≤ is a binary relation (i.e., concept subsumption).

The definition of the set of features such as the set of synonyms (called synsets in WordNet), definitions (i.e., glosses, containing textual descriptions of word senses), and the set of subconcepts (or subclasses, subcategories) is crucial in feature-based measures.

The Rodriguez and Egenhofer measure [65] is computed as the weighted sum of the similarities between the synsets, features (e.g., meronyms, attributes, etc.), and semantic neighborhoods (those linked via semantic pointers) of two concepts A and B:

$$ Sim\left(A,B\right)=w\cdotp {S}_{synsets}\left(A,B\right)+u\cdotp {S}_{features}\left(A,B\right)+v\cdotp {S}_{neighborhoods}\left(A,B\right)\ \mathrm{for}\ w,u,v\ge 0. $$

Weights assigned to w, u, and v depend on the characteristics of the ontologies. Only common specification components can be used in a similarity assessment. Their respective weights add up to 1.0.

X-Similarity [58] relies on matching between synsets and term description sets. The term description sets contain words extracted by parsing term definitions (“glosses” in WordNet or “scope notes” in MeSH). Two terms are similar if their synsets, their description sets, or the synsets of the terms in their neighborhoods (e.g., more specific and more general terms) are lexically similar. The similarity function is expressed as follows:

$$ \kern1em Sim\left(A,B\right)=\left\{\begin{array}{c}1,\kern9.5em \mathrm{if}\ {S}_{synsets}\left(A,B\right)>0\\ {}\max \left\{{S}_{neighborhoods}\left(A,B\right),{S}_{descriptions}\left(A,B\right)\right\},\kern0.5em \mathrm{if}\ {S}_{synsets}\left(A,B\right)=0\end{array}\right.. $$

Jiang et al. [37] investigate some feature-based approaches to the semantic similarity assessment of concepts using Wikipedia and give the following framework for feature-based similarity using the synonym sets, gloss sets, anchor sets, and category sets of Wikipedia concepts:

$$ Sim\left(A,B\right)={S}_{concepts}\left({S}_{synonyms}\left({Synonyms}_A,{Synonyms}_B\right),{S}_{glosses}\left({Glosses}_A,{Glosses}_B\right),{S}_{anchors}\left({Anchors}_A,{Anchors}_B\right),{S}_{categories}\left({Categories}_A,{Categories}_B\right)\right). $$
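The sketch below shows one way such component-wise feature measures could be combined, in the spirit of the weighted sum of Rodriguez and Egenhofer and the framework of Jiang et al.; the component overlap function, the toy feature sets, and the weights are all illustrative assumptions.

```python
# A possible weighted combination of component feature similarities; the
# components, weights, and toy feature sets below are assumptions.
def overlap(A, B):
    return len(A & B) / len(A | B) if A | B else 0.0

def weighted_feature_sim(a, b, weights=(0.4, 0.3, 0.3)):
    w, u, v = weights                          # w + u + v = 1.0
    return (w * overlap(a["synsets"], b["synsets"])
            + u * overlap(a["features"], b["features"])
            + v * overlap(a["neighborhood"], b["neighborhood"]))

car = {"synsets": {"car", "auto"}, "features": {"wheel", "engine"},
       "neighborhood": {"vehicle", "truck"}}
bus = {"synsets": {"bus", "autobus"}, "features": {"wheel", "engine", "aisle"},
       "neighborhood": {"vehicle", "coach"}}
print(weighted_feature_sim(car, bus))
```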

2.3 Statistical similarity measures

Statistical similarity measures incorporate statistics derived from various aspects of the underlying domain into the similarity computation [51]. Several approaches use the popularity of terms in a document as a measure of their informativeness and use this as a basis for measuring the similarity [34, 38, 42, 51, 63, 64, 71, 72]. These approaches are also known as Information Content (IC)-based measures.

Resnik [63] proposes an IC-based method which is not sensitive to the problem of varying link distances. He assumes that the information shared by two concepts is indicated by the IC of the concept that subsumes them in a semantic net (e.g., WordNet) [24]:

$$ Sim\left(A,B\right)= IC\left( LCS\left(A,B\right)\right), $$

where IC(C) = −log(p(C)) and p(C) is the probability of encountering an instance of concept C in a given corpus (e.g. Brown Corpus).

Resnik’s metric has two problems: any pair of concepts (words) with the same LCS has the same semantic similarity, and the similarity between identical concepts (words) is not equal to one [24]. To correct these problems, Lin [42] and Jiang and Conrath [35] propose their own methods. Jiang and Conrath express their metric as follows [35, 38]:

$$ {\displaystyle \begin{array}{c} Distance\left(A,B\right)= IC(A)+ IC(B)-2\times IC\left( LCS\left(A,B\right)\right)\ \mathrm{and}\\ {} Sim\left(A,B\right)=1-\frac{Distance\left(A,B\right)}{2}.\end{array}} $$

Lin’s similarity function [42] is expressed as follows:

$$ Sim\left(A,B\right)=\frac{2\times IC\left( LCS\left(A,B\right)\right)}{IC(A)+ IC(B)}. $$
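For illustration, the sketch below computes the Resnik, Lin, and Jiang–Conrath similarities from precomputed IC values on a toy taxonomy; the IC numbers are invented (and kept in [0, 1], which the 1 − Distance/2 normalization above implicitly assumes) rather than estimated from a corpus.

```python
# Illustrative IC-based measures; the taxonomy and IC values are assumptions,
# with IC normalised to [0, 1].
IC = {"entity": 0.0, "animal": 0.35, "dog": 0.81, "cat": 0.78}
PARENT = {"animal": "entity", "dog": "animal", "cat": "animal"}

def lcs(a, b):
    """Least common subsumer: first ancestor of a that also subsumes b."""
    anc_b, node = set(), b
    while node:
        anc_b.add(node)
        node = PARENT.get(node)
    node = a
    while node not in anc_b:
        node = PARENT[node]
    return node

def sim_resnik(a, b):
    return IC[lcs(a, b)]

def sim_lin(a, b):
    return 2 * IC[lcs(a, b)] / (IC[a] + IC[b])

def sim_jiang_conrath(a, b):
    distance = IC[a] + IC[b] - 2 * IC[lcs(a, b)]
    return 1 - distance / 2

print(sim_resnik("dog", "cat"), sim_lin("dog", "cat"), sim_jiang_conrath("dog", "cat"))
```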

Recently, there has been much research on IC-based semantic similarity measures [4, 34, 51]. For example, Jiang et al. [34] present several new methods for computing the IC of a concept and the similarity between two concepts drawn from the Wikipedia category structure. Since the Wikipedia category structure is a graph, the semantic similarity between concepts can naturally be assessed by extending traditional information-theoretic (i.e., IC-based) approaches.

All IC-based similarity measures require an IC model, that is, a function that assigns an IC value to each concept [38]. Besides the corpus-based IC models [24, 35, 38, 42], some intrinsic IC models have been developed. The pioneering work is the intrinsic IC model of Seco et al. [75], and several new intrinsic IC models have also been proposed [28, 49, 70, 72]. For example, in a recent work, Sanchez et al. [72] propose estimating the IC value of a concept C from the ratio between the number of leaves of the taxonomical hierarchy under C (as a measure of C’s generality) and the number of taxonomical subsumers above C, including itself (as a measure of C’s concreteness). Formally,

$$ IC(C)=-\log \left(\frac{\frac{\mid leaves(C)\mid }{\mid subsumers(C)\mid }+1}{\mathit{\max}\_ leaves+1}\right), $$

where leaves(C) is the set of concepts found at the end of the taxonomical tree under concept C and subsumers(C) is the complete set of taxonomical ancestors of C including itself. The ratio is normalized by the least informative concept (i.e., the root of the taxonomy), for which the number of leaves is the total number of leaves in the taxonomy (max_leaves) and the number of subsumers including itself is 1. To produce values in the range [0, 1] (i.e., in the same range as the original probability) and avoid log(0) values, 1 is added to the numerator and denominator.
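A minimal sketch of this intrinsic IC model is given below; the toy taxonomy is an assumption, and a leaf concept is taken to count itself as its only leaf (one common implementation convention).

```python
# Illustrative intrinsic IC in the style of Sanchez et al.; toy taxonomy assumed.
import math

PARENT = {"animal": "entity", "plant": "entity",
          "dog": "animal", "cat": "animal", "rose": "plant"}

def subsumers(c):
    """Taxonomical ancestors of c, including c itself."""
    result = {c}
    while c in PARENT:
        c = PARENT[c]
        result.add(c)
    return result

def leaves(c):
    """Leaves found under c (c itself if it has no children)."""
    children = [x for x, p in PARENT.items() if p == c]
    if not children:
        return {c}
    return set().union(*(leaves(x) for x in children))

MAX_LEAVES = len(leaves("entity"))        # total number of leaves in the taxonomy

def ic(c):
    ratio = len(leaves(c)) / len(subsumers(c))
    return -math.log((ratio + 1) / (MAX_LEAVES + 1))

print(ic("entity"), ic("animal"), ic("dog"))   # IC grows from the root towards leaves
```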

Other approaches such as pointwise mutual information (PMI) [14] and vector-based methods such as latent semantic analysis (LSA) [19] and explicit semantic analysis (ESA) [23] can be classified as statistical semantic similarity measures as they use functions of term frequency for computing the similarity [51].

2.4 Hybrid similarity measures

A number of approaches can be classified as hybrid methods: they are based on combinations of some of the methods presented above. For example, Pirro [60] presents a similarity metric combining the feature-based and information-theoretic views of similarity. In particular, the proposed metric exploits the notion of intrinsic IC, which quantifies IC values by scrutinizing how concepts are arranged in an ontological structure. Meng et al. [50] introduce a variant of the Lin measure [42]; concretely, their similarity measure is a hybrid that combines the Lin IC-based measure with a power factor based on the shortest path length between concepts. In is-a taxonomies, intrinsic IC (IIC) [75] incorporates the number of subclasses of a concept for estimating its information content: the higher the number of subclasses of a term, the lower its informativeness [51]. IIC has also been combined with feature-based [60] and edge-counting methods [61, 78]. Gao et al. [24] propose an approach to calculating the semantic similarity between word pairs based on WordNet; specifically, they present an approach to semantic similarity measurement that is based on edge counting and IC theory.

3 A framework for semantic similarity computation

To compute the semantic similarity Sim(A, B) of two concepts A and B, we first need to obtain some related information, such as synonyms or taxonomy structures of A and B, from certain knowledge sources such as WordNet [20] or domain ontologies [77]. For example, if users want to evaluate Sim(A, B) using IC-based measures, they must have a taxonomy structure T (or two homogeneous taxonomy structures T1 and T2) such that A, B ∈ T (or A ∈ T1 and B ∈ T2). If A and B belong to two different heterogeneous knowledge sources, such as A ∈ WordNet and B ∈ DBpedia, Sim(A, B) cannot be computed using existing IC-based methods. Similarly, to compute Sim(A, B), distance-based measures and feature-based measures also need some related information about A and B; if this related information comes from different knowledge sources, existing distance-based or feature-based measures also cannot compute Sim(A, B). On the other hand, when we compute Sim(A, B), the more related information about A and B we have, the more accurate the resulting value of Sim(A, B) can be. Therefore, we need to gather as much related information about A and B as possible from different knowledge sources in order to better compute Sim(A, B). For instance, we can obtain the synonyms or taxonomy structures of A (or B) from WordNet [20], domain ontologies [77], Wikipedia [34, 37], DBpedia [11], or YAGO [79]. Obviously, we have to integrate this related information about A and B, which comes from different (heterogeneous) knowledge sources. To this end, we first present the notion of the semantic representation of concepts. We then give a framework for semantic similarity computation based on this semantic representation.

3.1 Semantic representation of concepts

How should we represent a concept for semantic similarity computation? Because the semantic information of a concept may come from multiple knowledge sources and, in particular, because new knowledge sources may emerge with the development of information technology, we need a flexible way to represent the semantic information of a concept. Let us consider an example.

  • Example 1. Consider a concept C1 = Artificial Intelligence. Clearly, from WordNet, Wikipedia, and DBpedia we know that C1 ∈ WordNet, C1 ∈ Wikipedia, and C1 ∈ DBpedia. From WordNet we know that the set of synonyms of C1 is synonyms(C1) = {AI}. From Wikipedia or DBpedia we have that the set of synonyms of C1 is synonyms(C1) = {AI, Machine Intelligence, Cognitive System, Computational Rationality, Soft AI, …}. Similarly, from WordNet we also know that C1 has a taxonomy structure (tree structure) TSWordNet(C1) (see Fig. 1), and from Wikipedia or DBpedia, respectively, C1 has a taxonomy structure (graph structure) TSWikipedia(C1) (see Fig. 2) or a knowledge network (graph structure) TSDBpedia(C1) (see Fig. 3). Of course, we can also get other semantic information such as glosses for C1 from WordNet, Wikipedia, DBpedia, or YAGO.

Fig. 1
figure 1

Taxonomy structure of Artificial Intelligence in WordNet

Fig. 2
figure 2

Taxonomy structure of Artificial Intelligence in Wikipedia

Fig. 3
figure 3

Knowledge network of Artificial Intelligence in DBpedia

Consider another concept C2 = Semantic Web. From WordNet, Wikipedia, and DBpedia we know that C2 ∉ WordNet, C2 ∈ Wikipedia, and C2 ∈ DBpedia. Clearly, we cannot obtain semantic information such as the synonyms or taxonomy structure of C2 from WordNet; however, this information can be obtained from Wikipedia or DBpedia.

Now we propose the definition of semantic representation of concepts.

  • Definition 1. Let con be a concept. The semantic representation of concept con is defined as follows:

$$ con=\left\langle {SI}_1(con),{SI}_2(con),\dots, {SI}_n(con)\right\rangle, $$

where the ith semantic information SIi(con) of con (1 ≤ i ≤ n) is as below:

$$ SIi(con)=\left\langle \left\langle {KS}_{i_1}:{value}_{i_1}\right\rangle, \left\langle {KS}_{i_2}:{value}_{i_2}\right\rangle, \dots, \left\langle {KS}_{i_m}:{value}_{i_m}\right\rangle \right\rangle, $$

where \( {KS}_{i_j} \) (1 ≤ j ≤ m) means the jth knowledge source of SIi(con), and \( {value}_{i_j} \) is the value of SIi(con) from \( {KS}_{i_j} \) in 〈\( {KS}_{i_j} \):\( {value}_{i_j} \)〉.

The semantic representation of concept con can be shown in Fig. 4.

Fig. 4
figure 4

Semantic representation of concepts

To understand Definition 1, let us see a simple example.

  • Example 2. From Example 1 we have the following:

$$ Artificial\ Intelligence=\left\langle glosses\left( Artificial\ Intelligence\right), synonyms\left( Artificial\ Intelligence\right),\dots, taxonomy\ structure\left( Artificial\ Intelligence\right)\right\rangle, $$

where glosses, synonyms, …, and taxonomy structure represent the titles of all semantic information of Artificial Intelligence, and

glosses(Artificial Intelligence) = 〈〈WordNet: the branch of computer science that deal with writing computer programs that can solve problems creatively, …〉,

…,

〈Wikipedia: Artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, …〉〉,

synonyms(Artificial Intelligence) = 〈〈WordNet: {AI}〉, …, 〈Wikipedia: {AI, Machine Intelligence, Cognitive System, Computational Rationality, Soft AI, …}〉〉,

taxonomy structure(Artificial Intelligence) = 〈〈WordNet: TSWordNet〉, …, 〈Wikipedia: TSWikipedia〉〉.

  • Remark 1. The semantic representation of concepts in Definition 1 is a flexible representation mechanism. On the one hand, we do not fix the number and kinds of semantic information of a concept; that is, users may add different semantic information such as hyponyms (or sub-concepts), hypernyms (or super-concepts), categories, paths, or seealso relations to the semantic representation of concepts.

On the other hand, for any semantic information of concepts, we may obtain its value from multiple knowledge sources such as WordNet, domain ontologies (e.g., MeSH [44] or SNOMED CT [40]), Wikipedia, DBpedia, or YAGO. It is worth noting that the types of the values of different semantic information may differ; for instance, the values of synonyms, glosses, and taxonomy structure are of type set, string, and tree (graph), respectively. Clearly, for some semantic information, its values from multiple knowledge sources can be integrated (merged). For example, the values of synonyms from different knowledge sources can be combined by using the union operation of set theory, and the values of glosses from multiple knowledge sources may be merged by using string concatenation. We call such semantic information operable (denoted by ⊕, see Definition 2). Of course, some semantic information, such as taxonomy structure, is inoperable.

For the sake of convenience, we use string to represent the types of values of all semantic information. Our notation for the encoding of the value v of semantic information into its representation as a string is 〈v〉 such as 〈TSWordNet〉 and 〈TSWikipedia〉.

Definition 2. Let 〈SI1(con), SI2(con), …, SIn(con)〉 be the semantic representation of a concept con, where SIi(con)=〈〈\( {KS}_{i_1} \):\( {value}_{i_1} \)〉, 〈\( {KS}_{i_2} \):\( {value}_{i_2} \)〉, …, 〈\( {KS}_{i_m} \):\( {value}_{i_m} \)〉〉. If SIi(con) is operable, its values \( {value}_{i_1} \), \( {value}_{i_2} \), …, and \( {value}_{i_m} \) from \( {KS}_{i_1} \), \( {KS}_{i_2} \), …, and \( {KS}_{i_m} \), respectively, can be merged by the following operator:

valuei=\( {value}_{i_1} \)\( {value}_{i_2} \)⊕…⊕\( {value}_{i_m} \), where ⊕ denotes integration (or combination) operator of multiple values of same type such as ∪ for sets and + for strings.

SIi(con) is extended as follows:

SIi(con)=〈〈\( {KS}_{i_1} \):\( {value}_{i_1} \)〉, 〈\( {KS}_{i_2} \):\( {value}_{i_2} \)〉, …, 〈\( {KS}_{i_m} \):\( {value}_{i_m} \)〉, 〈\( {KS}_{i_1},{KS}_{i_2},\dots, {KS}_{i_m} \): valuei〉〉.

In fact, for any {〈\( {KS}_{i_s} \):\( {value}_{i_s} \)〉, …, 〈\( {KS}_{i_t} \):\( {value}_{i_t} \)〉}⊆{〈\( {KS}_{i_1} \):\( {value}_{i_1} \)〉, 〈\( {KS}_{i_2} \):\( {value}_{i_2} \)〉, …, 〈\( {KS}_{i_m} \): \( {value}_{i_m} \)〉}, we may have the following:

valuei’=\( {value}_{i_s} \)⊕…⊕\( {value}_{i_t} \),

SIi(con) can be extended as SIi(con)=〈〈\( {KS}_{i_1} \):\( {value}_{i_1} \)〉, …, 〈\( {KS}_{i_m} \):\( {value}_{i_m} \)〉, 〈\( {KS}_{i_s},\dots, {KS}_{i_t} \): valuei’〉〉.

  • Example 3. From Example 2 we know that the glosses of Artificial Intelligence can be merged as follows:

    〈WordNet, …, Wikipedia: “the branch of computer science that deal with writing computer programs that can solve problems creatively, …”+…+“Artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, …”〉.
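As a concrete (and purely illustrative) rendering of Definitions 1 and 2, the sketch below stores a concept as a nested dictionary mapping each semantic-information item to its per-knowledge-source values, and folds the values of an operable item with a user-supplied ⊕ combiner; the abbreviated values are taken from Examples 1–3, and the dictionary layout itself is an assumption.

```python
# A possible encoding of the semantic representation of a concept (Definition 1)
# and of the merge operator for operable semantic information (Definition 2).
concept_ai = {
    "glosses": {                # operable: strings, merged by concatenation
        "WordNet":   "the branch of computer science that deal with writing computer programs ...",
        "Wikipedia": "Artificial intelligence (AI), sometimes called machine intelligence, ...",
    },
    "synonyms": {               # operable: sets, merged by union
        "WordNet":   {"AI"},
        "Wikipedia": {"AI", "Machine Intelligence", "Cognitive System"},
    },
    "taxonomy structure": {     # inoperable: one structure per knowledge source
        "WordNet":   "TS_WordNet",
        "Wikipedia": "TS_Wikipedia",
    },
}

def merge(per_source_values, combine):
    """Fold the per-source values of an operable SI with the given combiner."""
    values = iter(per_source_values.values())
    merged = next(values)
    for value in values:
        merged = combine(merged, value)
    return merged

all_synonyms = merge(concept_ai["synonyms"], lambda a, b: a | b)        # set union
all_glosses = merge(concept_ai["glosses"], lambda a, b: a + " + " + b)  # concatenation (the + of Example 3)
print(all_synonyms)
```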

3.2 A framework for semantic similarity computation

Given two concepts A and B, we first need to obtain their semantic information in order to compute the semantic similarity between them. Clearly, we can get this semantic information from the semantic representations 〈SI1(A), SI2(A), …, SIn(A)〉 and 〈SI1(B), SI2(B), …, SIn(B)〉 of A and B, respectively. Because A and B carry a lot of semantic information, we can design different similarity computation methods by using different semantic information. For example, feature-based measures need features such as glosses, synonyms, hyponyms (sub-concepts), hypernyms (super-concepts), or categories, whereas IC-based measures need a certain taxonomy structure (tree structure or graph structure). To unify these similarity measures (e.g., distance-based, feature-based, or IC-based measures) between two concepts, we need a framework for semantic similarity measures.

Definition 3. Let A = 〈SI1(A), SI2(A), …, SIn(A)〉 and B = 〈SI1(B), SI2(B), …, SIn(B)〉 be the semantic representations of two concepts, where SIi(A)=〈〈\( {KS}_{i_1} \):\( {value}_{i_1} \)〉, 〈\( {KS}_{i_2} \):\( {value}_{i_2} \)〉, …, 〈\( {KS}_{i_m} \):\( {value}_{i_m} \)〉〉 and SIi(B)=〈〈\( {KS}_{i_1} \):\( {value_{i_1}}^{\prime } \)〉, 〈\( {KS}_{i_2} \):\( {value_{i_2}}^{\prime } \)〉, …, 〈\( {KS}_{i_m} \):\( {value_{i_m}}^{\prime } \)〉〉. The semantic similarity between A and B, denoted Sim(A, B), is a function Sim: CON×CON → [0, 1] and is defined as follows:

$$ Sim\left(A,B\right)={Sim}_{concepts}\left({Sim}_{SI_1}\left({ESetSI}_1,{ESetSI}_1^{\prime}\right),{Sim}_{SI_2}\left({ESetSI}_2,{ESetSI}_2^{\prime}\right),\dots, {Sim}_{SI_n}\left({ESetSI}_n,{ESetSI}_n^{\prime}\right)\right), $$

where (1) \( {Sim}_{SI_i} \)(ESetSIi, ESetSIi′) (1 ≤ i ≤ n) is the similarity measure of the semantic information SIi(A) and SIi(B); concretely, \( {Sim}_{SI_i} \) is the function \( {Sim}_{SI_i} \): SetSIi × SetSIi′ → [ai, bi], where ai, bi ∈ R+ ∪ {0}, ai ≤ bi, and R+ ∪ {0} denotes the set of non-negative real numbers.

(2) Simconcepts is the function Simconcepts: [a1, b1] × … × [an, bn] → [0, 1].

(3) CON stands for the set of all concepts, and SetSIi and SetSIi′ denote the sets of all values of the semantic information SIi(A) and SIi(B), respectively; formally, SetSIi = {〈\( {value}_{i_1} \)〉∪〈\( {value}_{i_2} \)〉∪…∪〈\( {value}_{i_m} \)〉} and SetSIi′={〈\( {value_{i_1}}^{\prime } \)〉∪〈\( {value_{i_2}}^{\prime } \)〉∪…∪〈\( {value_{i_m}}^{\prime } \)〉}, ESetSIi ∈ SetSIi, and ESetSIi′ ∈ SetSIi′.

  • Example 4. Let A and B be two concepts, A = 〈glosses(A), synonyms(A), taxonomy(A)〉 and B = 〈glosses(B), synonyms(B), taxonomy(B)〉 be semantic representation of concepts A and B, where

    $$ {\displaystyle \begin{array}{c} glosses(A)=\left\langle \left\langle WordNet:{g}_{WordNet}(A)\right\rangle, \left\langle Wikipedia:{g}_{Wikipedia}(A)\right\rangle, \left\langle DBpedia:{g}_{DBpedia}(A)\right\rangle \right\rangle, \\ {} glosses(B)=\left\langle \left\langle WordNet:{g}_{WordNet}(B)\right\rangle, \left\langle Wikipedia:{g}_{Wikipedia}(B)\right\rangle, \left\langle DBpedia:{g}_{DBpedia}(B)\right\rangle \right\rangle, \\ {} synonyms(A)=\left\langle \left\langle WordNet:{s}_{WordNet}(A)\right\rangle, \left\langle Wikipedia:{s}_{Wikipedia}(A)\right\rangle, \left\langle DBpedia:{s}_{DBpedia}(A)\right\rangle \right\rangle, \\ {} synonyms(B)=\left\langle \left\langle WordNet:{s}_{WordNet}(B)\right\rangle, \left\langle Wikipedia:{s}_{Wikipedia}(B)\right\rangle, \left\langle DBpedia:{s}_{DBpedia}(B)\right\rangle \right\rangle, \\ {} taxonomy(A)=\left\langle \left\langle WordNet:{t}_{WordNet}(A)\right\rangle, \left\langle Wikipedia:{t}_{Wikipedia}(A)\right\rangle, \left\langle DBpedia:{t}_{DBpedia}(A)\right\rangle \right\rangle, and\\ {} taxonomy(B)=\left\langle \left\langle WordNet:{t}_{WordNet}(B)\right\rangle, \left\langle Wikipedia:{t}_{Wikipedia}(B)\right\rangle, \left\langle DBpedia:{t}_{DBpedia}(B)\right\rangle \right\rangle \end{array}} $$

By Definition 3, we have the following:

$$ Sim\left(A,B\right)={Sim}_{concepts}\left({Sim}_{glosses}\left({g}_{WordNet}(A)\cup {g}_{Wikipedia}(A)\cup {g}_{DBpedia}(A),{g}_{WordNet}(B)\cup {g}_{Wikipedia}(B)\cup {g}_{DBpedia}(B)\right),{Sim}_{synonyms}\left({s}_{WordNet}(A)\cup {s}_{Wikipedia}(A)\cup {s}_{DBpedia}(A),{s}_{WordNet}(B)\cup {s}_{Wikipedia}(B)\cup {s}_{DBpedia}(B)\right),{Sim}_{taxonomy}\left({t}_{WordNet}(A)\cup {t}_{Wikipedia}(A)\cup {t}_{DBpedia}(A),{t}_{WordNet}(B)\cup {t}_{Wikipedia}(B)\cup {t}_{DBpedia}(B)\right)\right). $$

From Definition 3 and Example 4 we know that the framework for semantic similarity measures is very generic. For any similarity function \( {Sim}_{SI_i} \): SetSIi × SetSIi′ → [ai, bi], there are many concrete implementation methods. Formally, for any {〈\( {value}_{i_s} \)〉, …, 〈\( {value}_{i_t} \)〉}⊆{〈\( {value}_{i_1} \)〉, 〈\( {value}_{i_2} \)〉, …, 〈\( {value}_{i_m} \)〉}, we can define a similarity function as follows from the perspective of knowledge sources:

$$ {Sim}_{SI_i}:\kern0.5em \left\{<{value}_{i_s}>\cup \dots \cup <{value}_{it}>\right\}\times \left\{<{value}_{i_s}\hbox{'}>\cup \dots \cup <{value}_{i_t}\hbox{'}>\right\}\to \left[{a}_i,{b}_i\right] $$

For example, in Example 4, part of the definition of the function Simglosses can be given as follows:

$$ {Sim}_{glosses}:{g}_{WordNet}(A)\times {g}_{WordNet}(B)\to \left[a,b\right],\kern0.5em {g}_{WordNet}(A)\times {g}_{Wikipedia}(B)\to \left[a,b\right],\mathrm{or}\ \left({g}_{WordNet}(A)\cup {g}_{Wikipedia}(A)\right)\times \left({g}_{WordNet}(B)\cup {g}_{Wikipedia}(B)\right)\to \left[a,b\right]. $$

From the perspective of mathematical tools of semantic similarity measures, we may use different mathematical tools such as IC [63], PMI [14], LSA [19], ESA [23], or Jaccard and Dice coefficients [45] for \( {sim}_{SI_i} \)(ESetSIi, ESetSIi′) (1 ≤ i ≤ n) in Definition 3. For instance, we can define Simglosses and Simsynonyms using ESA, Jaccard or Dice coefficients, and define Simtaxonomy using IC.

Lastly, the function Simconcepts in Definition 3 is also very flexible. Generally speaking, we may implement Simconcepts by introducing some simple functions such as max, min, or average.
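To make the framework tangible, the sketch below instantiates Definition 3 with two semantic-information items (glosses and synonyms), simple word-overlap and Jaccard functions as the per-SI similarities, and max as Simconcepts; every concrete choice here (the similarity functions, the aggregator, and the toy concept data) is an assumption used only for illustration, not the prescribed instantiation.

```python
# An illustrative instantiation of the framework of Definition 3.
def jaccard(s1, s2):
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0

def gloss_sim(g1, g2):
    return jaccard(set(g1.lower().split()), set(g2.lower().split()))

def sim(concept_a, concept_b, aggregate=max):
    """Sim(A, B) = Sim_concepts(Sim_glosses(...), Sim_synonyms(...))."""
    merged_gloss_a = " ".join(concept_a["glosses"].values())
    merged_gloss_b = " ".join(concept_b["glosses"].values())
    merged_syn_a = set().union(*concept_a["synonyms"].values())
    merged_syn_b = set().union(*concept_b["synonyms"].values())
    scores = [gloss_sim(merged_gloss_a, merged_gloss_b),
              jaccard(merged_syn_a, merged_syn_b)]
    return aggregate(scores)       # Sim_concepts: max, min, average, ...

A = {"glosses": {"WordNet": "intelligence demonstrated by machines"},
     "synonyms": {"WordNet": {"AI"}, "Wikipedia": {"AI", "machine intelligence"}}}
B = {"glosses": {"Wikipedia": "the simulation of human intelligence by machines"},
     "synonyms": {"DBpedia": {"machine intelligence"}}}
print(sim(A, B))
```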

Now we give the implementation method of the framework for semantic similarity measures.

figure a
  • Remark 2. In Algorithm 1, the sets {SI1, SI2, …, SIn} and {KS1, KS2, …, KSm} can be specified by users or experts. The value of SIi(A) and SIi(B) may be obtained from knowledge sources automatically. In fact, we may obtain the values of SIi(A) and SIi(B) offline. If we cannot get 〈KSj:\( {value}_{i_j} \)〉 (resp., 〈KSj:\( {value}_{i_j}^{\prime } \)〉) of SIi(A) (resp., SIi(B)), we may assign 〈KSj:\( {value}_{i_j} \)〉=ϕ (resp., 〈KSj:\( {value}_{i_j}^{\prime } \)〉=ϕ). In Step (5) of Algorithm 1, we can assign lots of similarity functions for each SIi∈{SI1, SI2, …, SIn} in theory. However, we can selectively set up similarity functions according to the complementarity of knowledge sources in practical applications.

For example, let us consider knowledge sources {WordNet, Wikipedia, MeSH}. It is well known that WordNet is a large lexical database, Wikipedia is a free online encyclopedia, and MeSH is a hierarchically-organized terminology for indexing and cataloging of biomedical information. Clearly, these are three complementary knowledge sources. If we only consider semantic information glosses and taxonomy (see Example 4), we may set up the following similarity functions:

$$ {Sim}_{glosses}:{glosses}_{WordNet}(A)\times {glosses}_{WordNet}(B)\to \left[a,b\right],{Sim}_{glosses}:{glosses}_{Wikipedia}(A)\times {glosses}_{Wikipedia}(B)\to \left[a,b\right],{Sim}_{glosses}:{glosses}_{WordNet}(A)\cup {glosses}_{Wikipedia}(A)\times {glosses}_{WordNet}(B)\cup {glosses}_{Wikipedia}(B)\to \left[a,b\right],{Sim}_{taxonomy}:{taxonomy}_{WordNet}(A)\times {taxonomy}_{WordNet}(B)\to \left[a,b\right],{Sim}_{taxonomy}:{taxonomy}_{MeSH}(A)\times {taxonomy}_{MeSH}(B)\to \left[a,b\right],\mathrm{and}\kern0.17em {Sim}_{taxonomy}:{taxonomy}_{Wikipedia}(A)\times {taxonomy}_{Wikipedia}(B)\to \left[a,b\right]. $$

If A ∈ Wikipedia, A ∉ WordNet, A ∉ MeSH, B ∈ MeSH, B ∉ WordNet, and B ∉ Wikipedia, then we can also give the following similarity function:

$$ {Sim}_{taxonomy}:{taxonomy}_{Wikipedia}(A)\times {taxonomy}_{MeSH}(B)\to \left[a,b\right]. $$

Obviously, all existing methods of similarity computation can be obtained by instantiating the framework (Definition 3); that is, all existing approaches to similarity measures (including distance-based measures, feature-based measures, statistical measures, and hybrid measures, see Section 2 for more details) can result from instantiations of the framework. Concretely, existing similarity measures consider only one knowledge source such as WordNet, Wikipedia, a domain ontology, or DBpedia; thus, in Step (5) of Algorithm 1 there is only one kind of similarity function for each SIi∈{SI1, SI2, …, SIn}. Clearly, in addition to the existing similarity computation methods, we can obtain many new similarity measures by instantiating the framework; in particular, we may obtain some new approaches to similarity measures that existing methods cannot deal with by introducing multiple knowledge sources.

4 Some approaches for measuring semantic similarity

In Section 3, our framework for the semantic similarity of concepts was proposed. In this section, we give some generic and flexible approaches to similarity measures by instantiating the framework. As stated in Section 3, all existing approaches can result from instantiations of our framework; the instantiation method is as follows:

figure b

In what follows, we present some new similarity measures that existing methods cannot deal with by instantiating the framework. Similarly to existing similarity measures, we also give three families of similarity measure methods: (1) IC-based similarity measures; (2) distance-based similarity measures; and (3) feature-based similarity measures. Based on these three similarity measure families, we will naturally get hybrid similarity measures.

4.1 IC-based measures under multiple knowledge sources

In the framework in Definition 3 or Algorithm 1, to implement IC-based similarity measures, we need one or multiple taxonomy structures (tree structures or graph structures). Suppose that A and B are two concepts, KS1, KS2, …, KSm are knowledge sources, and T1, T2, …, Tm are taxonomy structures in KS1, KS2, …, KSm, respectively.

If there exists a taxonomy structure Ti (1 ≤ i ≤ m) such that A, B ∈ Ti, it is easy to get the LCS (least common subsumer) of A and B in Ti. Furthermore, we can compute Sim(A, B) by using IC-based similarity measure methods (see Section 2.3). However, if there does not exist any taxonomy structure Ti (1 ≤ i ≤ m) such that A, B ∈ Ti, that is, for any taxonomy structure Ti (1 ≤ i ≤ m), either A ∈ Ti and B ∉ Ti, or A ∉ Ti and B ∈ Ti, how should we compute Sim(A, B) by using IC-based measures (or how can we find the LCS of A and B by using KS1, …, KSm)? To solve this problem, we propose some new IC-based similarity measures for concepts.

Without loss of generality, suppose that all knowledge sources that we consider are the set AllKS = {KS1, KS2, …, KSm}, and there exist some knowledge sources KSA = {KSk, KSk+1, …, KSl} ⊆ AllKS and KSB = {KSs, KSs+1, …, KSt} ⊆ AllKS such that for any KSi ∈ KSA and KSj ∈ KSB we have the following:

A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti, where T1, T2, …, Tm are the taxonomy structures of KS1, KS2, …, KSm, respectively.

Obviously, there is no LCS for A and B in Ti (or Tj), thus, we cannot compute Sim(A, B) only by considering Ti (or Tj). Now we give some methods for Sim(A, B) by considering both Ti and Tj.

  • Definition 4. Let T be a taxonomy structure and concept subsumption (<T) be a binary relation <T: CON×CON, where CON is the set of all concepts and A <T C means that A is a subconcept of C or C is a parent concept of A in T. A <T C iff C >T A, that is, A >T C means that A is a parent concept of C or C is a subconcept of A in T. A ≤T C iff A <T C or A = C (i.e., A and C are two identical concepts). A ≥T C iff A >T C or A = C. We define the sets of subconcepts, superconcepts, hyponyms, and hypernyms of a concept A ∈ CON w.r.t. T as follows:

$$ subconcepts\kern0.55mm \left(A,T\right)=\left\{C\in \kern0.55mm CON\kern0.55mm |\ C<{}_TA\right\}; superconcepts\left(A,T\right)=\left\{C\in CON|\ C>{}_TA\right\}; hyponyms\left(A,T\right)=\left\{C\in CON|\exists {C}_1,{C}_2,\dots, {C}_{n-1},{C}_n\in CON\wedge n\ge 2\wedge {C}_1=A\wedge {C}_n=C\wedge {C}_1>{}_T{C}_2\wedge \dots \wedge {C}_{n-1}>{}_T{C}_n\wedge {C}_1\ne {C}_2\ne \dots \ne {C}_{n-1}\ne {C}_n\right\}; hypernyms\left(A,T\right)=\left\{C\in CON|\exists {C}_1,{C}_2,\dots, {C}_{n-1},{C}_n\in CON\wedge n\ge 2\wedge {C}_1=A\wedge {C}_n=C\wedge {C}_1<{}_T{C}_2\wedge \dots \wedge {C}_{n-1}<{}_T{C}_n\wedge {C}_1\ne {C}_2\ne \dots \ne {C}_{n-1}\ne {C}_n\right\}. $$

Clearly, we have that subconcepts(A, T) ⊆ hyponyms(A, T) and superconcepts(A, T) ⊆ hypernyms(A, T).

  • Definition 5. Let A, B ∈ CON be two different concepts (i.e., A ≠ B) and T be a taxonomy structure. The set of walks between A and B w.r.t. T can be defined as follows:

walks(A, B, T) = {〈C1, C2, …, Cn〉 | C1, C2, …, Cn ∈ CON ∧ C1 = A ∧ Cn = B ∧ (∀1 ≤ i < n, Ci ∈ superconcepts(Ci+1, T)) ∧ C1 ≠ C2 ≠ … ≠ Cn-1 ≠ Cn}.

  • Definition 6. Let Ti and Tj be two taxonomy structures, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The set of common ancestors of A and B w.r.t. Ti and Tj is defined as follows:

$$ CommonAnc\left(A,B,{T}_i,{T}_j\right)=\left\{C\in CON|\ C\in hypernyms\left(A,{T}_i\right)\wedge C\in hypernyms\left(B,{T}_j\right)\right\}. $$
  • Definition 7. Let Ti and Tj be two taxonomy structures, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The set of GCS (Good Common Subsumers) of A and B w.r.t. Ti and Tj can be defined as follows:

    $$ GCS\ \left(A,B,{T}_i,{T}_j\right)=\left\{C\in CON|C\in CommonAnc\left(A,B,{T}_i,{T}_j\right)\wedge {p}_1\in \kern0.55em walks\ \left(C,A,{T}_i\right),{p}_2\in walks\left(C,B,{T}_j\right),|{p}_1|+|{p}_2|={\min}_{D\in CommonAnc\left(A,B,{T}_i,{T}_j\right),{p}^{\prime}\in walks\left(D,A,{T}_i\right),{p}^{\hbox{'}\hbox{'}}\in walks\left(D,B,{T}_j\right)}\left\{|p\hbox{'}|+|p^{{\prime\prime} }|\right\}\right\}, $$

where |p| is the length of walk p, i.e., if p = 〈c1, c2, …, cn + 1〉, then |p| = |〈c1, c2, …, cn + 1〉| = n.
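A possible implementation sketch of Definitions 6 and 7 follows: each taxonomy is encoded (as an assumption) as a child-to-parents map, the hypernyms of a concept are collected together with their shortest walk lengths, and the GCS set keeps the common ancestors that minimize the summed walk lengths to A and B.

```python
# Illustrative computation of CommonAnc (Definition 6) and GCS (Definition 7);
# taxonomies are assumed to be given as child -> set-of-parents maps.
def hypernyms_with_depth(concept, taxonomy):
    """Map every hypernym of `concept` to its shortest upward walk length."""
    depths, frontier = {}, {concept: 0}
    while frontier:
        next_frontier = {}
        for node, d in frontier.items():
            for parent in taxonomy.get(node, ()):
                if parent not in depths or depths[parent] > d + 1:
                    depths[parent] = d + 1
                    next_frontier[parent] = d + 1
        frontier = next_frontier
    return depths

def gcs(a, b, t_i, t_j):
    anc_a = hypernyms_with_depth(a, t_i)
    anc_b = hypernyms_with_depth(b, t_j)
    common = set(anc_a) & set(anc_b)               # CommonAnc(A, B, T_i, T_j)
    if not common:
        return set()
    best = min(anc_a[c] + anc_b[c] for c in common)
    return {c for c in common if anc_a[c] + anc_b[c] == best}

# toy taxonomies: A appears only in T_i, B only in T_j
T_i = {"artificial_intelligence": {"computer_science"}, "computer_science": {"science"}}
T_j = {"semantic_web": {"world_wide_web"}, "world_wide_web": {"computer_science"},
       "computer_science": {"science"}}
print(gcs("artificial_intelligence", "semantic_web", T_i, T_j))  # {'computer_science'}
```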

Based on the GCS for two concepts in two taxonomy structures (Definition 7), we can present some new IC-based measures under multiple knowledge sources by extending traditional IC-based similarity measures (see Section 2.3) [35, 41, 42, 63, 72]. To compute semantic similarity of two concepts A and B using IC-based measures, we firstly need to give some approaches to IC computation for concepts.

  • Definition 8. Let A ∈ CON be a concept and T be a taxonomy structure. The first IC of A w.r.t. T is defined as follows:

$$ I{C}_{fir}\left(A,T\right)=1-\frac{\log\ \left(| hyponyms\left(A,T\right)|+1\right)}{\log\ \left(|{CON}_T|\right)}, $$

where CONT denotes the set of all concepts in T.

In fact, ICfir(A, T) is an extension of the IC model of Seco et al. [75].

  • Definition 9. Let A ∈ CON be a concept and T be a taxonomy structure. The depth depth(A, T) of A in T is defined as follows:

$$ depth\left(A,T\right)=\max \left\{|p|\ |\ p\in walks\left( root(T),A,T\right)\right\},\mathrm{where}\ root(T)\ \mathrm{is}\ \mathrm{the}\ \mathrm{root}\ \mathrm{of}\ T $$
  • Definition 10. Let A ∈ CON be a concept and T be a taxonomy structure. The set of leaves of A in T is defined as follows:

$$ leaves\left(A,T\right)=\left\{C\in CON|\ C\in hyponyms\left(A,T\right)\wedge hyponyms\left(C,T\right)=\phi \right\} $$

Furthermore, we define the following:

$$ maxleaves(T)= leaves\left( root(T),T\right)\ \mathrm{and}\ maxdepth(T)=\max \left\{|p|\ |\ p\in walks\left( root(T),A,T\right),A\in maxleaves(T)\right\}. $$

By extending the IC definitions of Zhou et al. [83] and Sanchez et al. [72], we can propose the following approaches to IC computation.

  • Definition 11. Let A ∈ CON be a concept and T be a taxonomy structure. The second and third ICs of A w.r.t. T are defined as follows:

    $$ I{C}_{sec}\left(A,T\right)=\gamma \left(1-\frac{\log\ \left(| hyponyms\left(A,T\right)|+1\right)}{\log\ \left(|{CON}_T|\right)}\right)+\left(1-\upgamma \right)\left(\frac{\log\ \left( depth\left(A,T\right)+1\right)}{\log\ \left( maxdepth(T)\right)}\right),\kern0.5em {IC}_{thi}\ \left(A,T\right)=-\log \left(\frac{\frac{\mid leaves\left(A,T\right)\mid }{\mid hypernyms\left(A,T\right)\cup \left\{A\right\}\mid }+1}{\mid maxleaves(T)\mid +1}\right) $$

where γ is a tuning factor that adjusts the weight of the two features involved in the IC computation. We use γ = 0.5 by default.
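The sketch below gives one possible reading of the IC models of Definitions 8 and 11 on a toy child-to-parent taxonomy; the taxonomy itself and γ = 0.5 are assumptions, and set sizes are used for the hyponym and leaf counts.

```python
# Illustrative IC models IC_fir, IC_sec and IC_thi (Definitions 8 and 11);
# the toy taxonomy is an assumption.
import math

PARENT = {"animal": "entity", "plant": "entity",
          "dog": "animal", "cat": "animal", "rose": "plant"}
ALL_CONCEPTS = set(PARENT) | set(PARENT.values())

def hyponyms(c):
    direct = {x for x, p in PARENT.items() if p == c}
    result = set(direct)
    for child in direct:
        result |= hyponyms(child)
    return result

def hypernyms(c):
    result = set()
    while c in PARENT:
        c = PARENT[c]
        result.add(c)
    return result

def depth(c):
    return len(hypernyms(c))                 # walk length from the root down to c

def leaves(c):
    return {x for x in hyponyms(c) if not hyponyms(x)}

MAX_DEPTH = max(depth(c) for c in ALL_CONCEPTS)
MAX_LEAVES = len(leaves("entity"))

def ic_fir(c):
    return 1 - math.log(len(hyponyms(c)) + 1) / math.log(len(ALL_CONCEPTS))

def ic_sec(c, gamma=0.5):
    return (gamma * ic_fir(c)
            + (1 - gamma) * math.log(depth(c) + 1) / math.log(MAX_DEPTH))

def ic_thi(c):
    ratio = len(leaves(c)) / (len(hypernyms(c)) + 1)   # subsumers include c itself
    return -math.log((ratio + 1) / (MAX_LEAVES + 1))

print(ic_fir("dog"), ic_sec("animal"), ic_thi("dog"))
```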

Now we propose some new approaches to semantic similarity measures for concepts under multiple knowledge sources by using GCS (Definition 7) and IC (Definitions 8 and 11). It is worth noting that we can obtain lots of new IC-based measures by extending traditional IC-based similarity measures. In this paper we only extend some classical IC-based measures.

  • Definition 12. Let Ti and Tj be two taxonomy structures, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The IC-based semantic similarity SimIC1ord between A and B w.r.t. Ti and Tj can be defined as:

$$ SimIC{1}_{ord}\left(A,B, Ti, Tj\right)={\max}_{C\in GCS\left(A,B,{T}_i,{T}_j\right)}\left\{\max \left\{{IC}_{ord}\left(C,{T}_i\right),{IC}_{ord}\left(C,{T}_j\right)\right\}\right\}, $$

where ICord = ICfir, ICsec, or ICthi. For example, if ICord = ICfir, SimIC1ord means SimIC1fir.

Clearly, SimIC1ord is an extension of Resnik’s metric [63].

By extending Lin’s metric [42], we can present another similarity measure for concepts.

  • Definition 13. Let Ti and Tj be two taxonomy structures, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The IC-based semantic similarity SimIC2ord between A and B w.r.t. Ti and Tj can be defined as:

$$ SimIC{2}_{ord}\left(A,B,{T}_i,{T}_j\right)={\max}_{C\in GCS\left(A,B,{T}_i,{T}_j\right)}\left\{\frac{2\times \max \left\{{IC}_{ord}\left(C,{T}_i\right),{IC}_{ord}\left(C,{T}_j\right)\right\}}{IC_{ord}\left(A,{T}_i\right)+{IC}_{ord}\left(B,{T}_j\right)}\right\}, $$

where ICord = ICfir, ICsec, or ICthi.

Obviously, we can also define a similarity measure SimIC3ord by extending Jiang and Conrath’s metric [35].

  • Definition 14. Let Ti and Tj be two taxonomy structures, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The IC-based semantic similarity SimIC3ord between A and B w.r.t. Ti and Tj can be defined as:

$$ SimIC{3}_{ord}\left(A,B,{T}_i,{T}_j\right)=1-\frac{Distance\left(A,B,{T}_i,{T}_j\right)}{2}, $$

where

$$ Distance\left(A,B,{T}_i,{T}_j\right)={IC}_{ord}\left(A,{T}_i\right)+{IC}_{ord}\left(B,{T}_j\right)-2\times {\max}_{C\in GCS\left(A,B,{T}_i,{T}_j\right)}\left\{\max \left\{{IC}_{ord}\left(C,{T}_i\right),{IC}_{ord}\left(C,{T}_j\right)\right\}\right\} $$

and ICord = ICfir, ICsec, or ICthi.

From Definitions 12-14 we know that SimIC1ord, SimIC2ord, and SimIC3ord are based on two knowledge sources. In fact, we need multiple knowledge sources in practical applications in order to obtain better results. Therefore, we have to give some similarity measures for multiple knowledge sources.

  • Definition 15. Let AllTS = {T1, T2, …, Tm} be all taxonomy structures, TSA = {Tk, Tk+1, …, Tl} ⊆ AllTS and TSB = {Ts, Ts+1, …, Tt} ⊆ AllTS. For any Ti ∈ TSA and Tj ∈ TSB, we have that A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The IC-based semantic similarity measures SimIC1Mord, SimIC2Mord, and SimIC3Mord between A and B w.r.t. multiple taxonomy structures TSA and TSB can be defined as:

    $$ {\displaystyle \begin{array}{c} SimIC1{M}_{ord}\left(A,B, TSA, TSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimIC{1}_{ord}\left(A,B,{T}_i,{T}_j\right)\right\},\\ {} SimIC2{M}_{ord}\left(A,B, TSA, TSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimIC{2}_{ord}\left(A,B,{T}_i,{T}_j\right)\right\},\mathrm{and}.\\ {} SimIC3{M}_{ord}\left(A,B, TSA, TSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimIC{3}_{ord}\left(A,B,{T}_i,{T}_j\right)\right\}.\end{array}} $$

The IC-based semantic similarity measure SimIC between A and B w.r.t. TSA and TSB and all baseline measures can be defined as:

$$ SimIC\left(A,B, TSA, TSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ Sim\mathrm{I}C{1}_{ord}\left(A,B,{T}_i,{T}_j\right), SimIC{2}_{ord}\left(A,B,{T}_i,{T}_j\right), SimIC{3}_{ord}\left(A,B,{T}_i,{T}_j\right)\right\}. $$

In order to compare the values of different similarities SimIC1ord, SimIC2ord, and SimIC3ord, we normalize the value of each similarity.
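A compact sketch of Definitions 12–15 is given below. It is parameterized over two assumed helper callables, an IC model ic(concept, taxonomy) (any of ICfir, ICsec, ICthi) and a gcs(a, b, t_i, t_j) function returning the GCS set of Definition 7, so that the three extended baselines and their multi-taxonomy maximization can be expressed uniformly; the signatures are assumptions made for this sketch.

```python
# Illustrative SimIC1/SimIC2/SimIC3 (Definitions 12-14) and their extension to
# multiple taxonomy structures (Definition 15); `ic` and `gcs` are assumed
# callables, e.g. the IC models and the GCS computation sketched earlier.
def sim_ic1(a, b, t_i, t_j, ic, gcs):
    # Resnik-style: IC of the most informative good common subsumer
    return max(max(ic(c, t_i), ic(c, t_j)) for c in gcs(a, b, t_i, t_j))

def sim_ic2(a, b, t_i, t_j, ic, gcs):
    # Lin-style: shared IC relative to the ICs of A and B (the denominator is
    # constant over C, so the maximisation reduces to SimIC1)
    return 2 * sim_ic1(a, b, t_i, t_j, ic, gcs) / (ic(a, t_i) + ic(b, t_j))

def sim_ic3(a, b, t_i, t_j, ic, gcs):
    # Jiang-Conrath-style: 1 - Distance/2, with the GCS playing the LCS role
    distance = ic(a, t_i) + ic(b, t_j) - 2 * sim_ic1(a, b, t_i, t_j, ic, gcs)
    return 1 - distance / 2

def sim_ic_multi(a, b, ts_a, ts_b, ic, gcs):
    # Definition 15: maximise over all admissible taxonomy pairs and baselines
    return max(measure(a, b, t_i, t_j, ic, gcs)
               for t_i in ts_a for t_j in ts_b
               for measure in (sim_ic1, sim_ic2, sim_ic3))
```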

  • Remark 3. In Definition 15, SimIC1Mord, SimIC2Mord, and SimIC3Mord are extensions of SimIC1ord, SimIC2ord, and SimIC3ord, respectively. That is, SimIC1ord, SimIC2ord, and SimIC3ord are based on two taxonomy structures, and SimIC1Mord, SimIC2Mord, and SimIC3Mord are based on multiple taxonomy structures.

On the other hand, if we give some new IC computation approaches (e.g., ICfou), SimIC1ord, SimIC2ord, and SimIC3ord can be expanded accordingly (e.g., SimIC1fou, SimIC2fou, SimIC3fou). Furthermore, SimIC1Mord, SimIC2Mord, and SimIC3Mord also can be expanded accordingly (e.g., SimIC1Mfou, SimIC2Mfou, SimIC3Mfou). Obviously, if we consider other baseline measures, we also can obtain some new similarity measures such as SimIC4ord and SimIC4Mord by instantiating our framework.

The similarity measure SimIC can be based on multiple taxonomy structures and baseline measures, clearly, it is easy to extend SimIC when we add new similarity measures for two or multiple taxonomy structures. For example, if a new measure SimIC4ord is provided, SimIC can be expanded as follows:

$$ SimIC\left(A,B, TSA, TSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimIC{1}_{ord}\left(A,B,{T}_i,{T}_j\right), SimIC{2}_{ord}\left(A,B,{T}_i,{T}_j\right), SimIC{3}_{ord}\left(A,B,{T}_i,{T}_j\right), SimIC{4}_{ord}\left(A,B,{T}_i,{T}_j\right)\right\}. $$

Lastly, it is worth noting that the condition of Definition 15 can be relaxed as follows:

Let AllTS = {T1, T2, …, Tm} be all taxonomy structures, TSA = {Tk, Tk+1, …, Tl} ⊆ AllTS and TSB = {Ts, Ts+1, …, Tt} ⊆ AllTS. For any Ti ∈ TSA and Tj ∈ TSB, we have that A ∈ Ti and B ∈ Tj.

If TSA ∩ TSB ≠ ϕ, traditional IC-based measures under one taxonomy structure are included in the framework of Definition 15. For example, if Tu ∈ TSA ∩ TSB, SimICNord(A, B, Tu, Tu) (N = 1, 2, 3) is based on one taxonomy structure.

The relationships among all definitions of IC-based measures under multiple knowledge sources are shown in Fig. 5.

Fig. 5
figure 5

The relationships among all definitions of IC-based measures

4.2 Distance-based measures under multiple knowledge sources

Similarly to IC-based measures under multiple knowledge sources (see Section 4.1), in the framework of Definition 3 or Algorithm 1 we also need one or multiple taxonomy structures (tree structures or graph structures) in order to implement distance-based similarity measures. Assume that A and B are two concepts, KS1, KS2, …, KSm are knowledge sources, and T1, T2, …, Tm are taxonomy structures in KS1, KS2, …, KSm, respectively. Clearly, if there exists a taxonomy structure Ti (1 ≤ i ≤ m) such that A, B ∈ Ti, it is easy to compute Sim(A, B) by using distance-based similarity measures (see Section 2.1). However, if there does not exist any taxonomy structure Ti (1 ≤ i ≤ m) such that A, B ∈ Ti, we need some new distance-based similarity measures.

Let all knowledge sources be the set AllKS = {KS1, KS2, …, KSm}. Suppose that KSA = {KSk, KSk+1, …, KSl} ⊆ AllKS and KSB = {KSs, KSs+1, …, KSt} ⊆ AllKS, and for any KSi ∈ KSA and KSj ∈ KSB we have that A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. Obviously, there is no path between A and B within Ti (or Tj) alone; thus, we cannot compute Sim(A, B) by considering only Ti (or Tj). Now we give some methods for Sim(A, B) by considering both Ti and Tj.

Assume that parts of Ti and Tj are shown in Fig. 6.

Fig. 6
figure 6

Taxonomy structures Ti and Tj

Obviously, if there exists a concept C such that C ∈ Ti, C ∈ Tj, C is a super-concept of A in Ti, and C is also a super-concept of B in Tj (see Fig. 6), we can find a path between A and B across Ti and Tj; formally, the path is made up of two paths A → C (the bold solid line in Ti) and B → C (the bold solid line in Tj), that is, there are four edges between A and B in this path.

Similarly, if there exists a concept D such that D ∈ Ti, D ∈ Tj, D is a sub-concept of A in Ti, and D is also a sub-concept of B in Tj (see Fig. 6), we may also find another path between A and B across Ti and Tj; formally, the path is made up of two paths A → D (the bold dotted line in Ti) and B → D (the bold dotted line in Tj), that is, there are five edges between A and B in this new path.

Furthermore, we can compute Sim(A, B) by making use of these paths. Obviously, a problem arises here: how do we obtain the common super-concept or common sub-concept that we need, such as C and D in Fig. 6? There may be multiple common super-concepts or common sub-concepts; for example, both C and E are super-concepts of A (resp., B) in Ti (resp., Tj), and both D and F are sub-concepts of A (resp., B) in Ti (resp., Tj) in Fig. 6. Clearly, we need to find the shortest path between concepts A and B in the two taxonomy structures. To get the shortest path, we first introduce some notions.

  • Definition 16. Let T be a taxonomy structure (directed graph) and concept reachability (→T) be a binary relation →T: CON×CON, where CON is the set of all concepts and A →T C means that there is an edge e from A to C, that is, e is associated with the ordered pair (A, C) in T. A ←T C iff C →T A, that is, A ←T C means that there is an edge associated with the ordered pair (C, A) in T. We define the set of relatedconcepts of a concept A ∈ CON w.r.t. T as follows:

relatedconcepts(A, T) = {C ∈ CON | ∃C1, C2, …, Cn-1, Cn ∈ CON ∧ n ≥ 2 ∧ C1 = A ∧ Cn = C ∧ ((C1 →T C2 ∧ … ∧ Cn-1 →T Cn) ∨ (C1 ←T C2 ∧ … ∧ Cn-1 ←T Cn)) ∧ C1 ≠ C2 ≠ … ≠ Cn-1 ≠ Cn}.

  • Definition 17. Let Ti and Tj be two taxonomy structures, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The set of common concepts of A and B w.r.t. Ti and Tj is defined as follows:

$$ CommonCon\left(A,B,{T}_i,{T}_j\right)=\left\{C\in CON|\ C\in relatedconcepts\left(A,{T}_i\right)\wedge C\in relatedconcepts\left(B,{T}_j\right)\right\}. $$
  • Definition 18. Let A, B ∈ CON be two different concepts (i.e., A ≠ B) and T be a taxonomy structure. The set of paths between A and B w.r.t. T can be defined as follows:

$$ paths\left(A,B,T\right)=\left\{\left\langle {C}_1,{C}_2,\dots, {C}_n\right\rangle |\ {C}_1,{C}_2,\dots, {C}_n\in CON\wedge {C}_1=A\wedge {C}_n=B\wedge \left(\left(\forall 1\le i<n,{C}_i\to {}_T{C}_{i+1}\right)\vee \left(\forall 1\le i<n,{C}_i\leftarrow {}_T{C}_{i+1}\right)\right)\wedge {C}_1\ne {C}_2\ne \dots \ne {C}_{n-1}\ne {C}_n\right\}. $$

Now we can give the shortest and longest paths between concepts A and B in two taxonomy structures.

  • Definition 19. Let Ti and Tj be two taxonomy structures, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The sets of shortest paths spaths and longest paths lpaths between A and B w.r.t. Ti and Tj can be defined as follows:

$$ spaths\left(A,B,{T}_i,{T}_j\right)=\left\{\left\langle {C}_1,{C}_2,\dots, {C}_n\right\rangle \mid {C}_1,{C}_2,\dots, {C}_n\in CON\wedge {C}_1=A\wedge {C}_n=B\wedge \exists C\in CommonCon\left(A,B,{T}_i,{T}_j\right),{p}_1\in paths\left({C}_1,C,{T}_i\right),{p}_2\in paths\left({C}_n,C,{T}_j\right),\left|{p}_1\right|+\left|{p}_2\right|={\min}_{D\in CommonCon\left(A,B,{T}_i,{T}_j\right),{p}^{\prime}\in paths\left(A,D,{T}_i\right),{p}^{\prime \prime}\in paths\left(B,D,{T}_j\right)}\left\{\left|{p}^{\prime}\right|+\left|{p}^{\prime \prime}\right|\right\}\right\}, $$

$$ lpaths\left(A,B,{T}_i,{T}_j\right)=\left\{\left\langle {C}_1,{C}_2,\dots, {C}_n\right\rangle \mid {C}_1,{C}_2,\dots, {C}_n\in CON\wedge {C}_1=A\wedge {C}_n=B\wedge \exists C\in CommonCon\left(A,B,{T}_i,{T}_j\right),{p}_1\in paths\left({C}_1,C,{T}_i\right),{p}_2\in paths\left({C}_n,C,{T}_j\right),\left|{p}_1\right|+\left|{p}_2\right|={\max}_{D\in CommonCon\left(A,B,{T}_i,{T}_j\right),{p}^{\prime}\in paths\left(A,D,{T}_i\right),{p}^{\prime \prime}\in paths\left(B,D,{T}_j\right)}\left\{\left|{p}^{\prime}\right|+\left|{p}^{\prime \prime}\right|\right\}\right\}. $$

Furthermore, we define the maximum path length (distance) w.r.t. Ti and Tj as follows:

maxdistance(Ti, Tj) = max{|p| | p ∈ lpaths(A, B, Ti, Tj), ∀A ∈ Ti, ∀B ∈ Tj},

where |p| is the length of path p, i.e., if p = 〈c1, c2, …, cn+1〉, then |p| = |〈c1, c2, …, cn+1〉| = n.
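To make Definitions 16–19 concrete, the following sketch computes related concepts, common concepts, and the length of a shortest cross-taxonomy path by breadth-first search. It assumes a hypothetical representation in which each taxonomy structure is a dict mapping every concept to the set of its direct super-concepts; the function and variable names are ours, not part of the formal framework.

```python
from collections import deque

def invert(parents):
    """Derive the child map (direct sub-concepts) from the parent map."""
    children = {}
    for concept, supers in parents.items():
        for sup in supers:
            children.setdefault(sup, set()).add(concept)
    return children

def monotone_distances(concept, adj):
    """BFS distances from `concept` following edges in one direction only,
    i.e. along the purely ascending or purely descending paths of Definition 18."""
    dist, queue = {concept: 0}, deque([concept])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def related_concepts(concept, parents):
    """relatedconcepts(A, T) of Definition 16: all ancestors and descendants of A."""
    up = monotone_distances(concept, parents)
    down = monotone_distances(concept, invert(parents))
    return (set(up) | set(down)) - {concept}

def shortest_cross_path_length(A, B, Ti, Tj):
    """Length of a path in spaths(A, B, Ti, Tj) (Definition 19): the minimum,
    over the common concepts of Definition 17, of the monotone distance from A
    in Ti plus the monotone distance from B in Tj."""
    def best_distances(concept, parents):
        up = monotone_distances(concept, parents)
        down = monotone_distances(concept, invert(parents))
        return {c: min(up.get(c, float("inf")), down.get(c, float("inf")))
                for c in set(up) | set(down)}
    dist_A = best_distances(A, Ti)
    dist_B = best_distances(B, Tj)
    common = (set(dist_A) & set(dist_B)) - {A, B}  # CommonCon(A, B, Ti, Tj)
    if not common:
        return None  # no common super- or sub-concept, hence no cross path
    return min(dist_A[c] + dist_B[c] for c in common)
```

For the situation of Fig. 6, the common super-concept C yields a length of four edges (two in each taxonomy) and the common sub-concept D a length of five, so the shortest cross-taxonomy path has length 4.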

Based on the shortest path between two concepts in two taxonomy structures (Definition 19), we can present some new distance-based measures under multiple knowledge sources by extending traditional distance-based similarity measures (see Section 2.1) [25, 39, 41, 62, 82].

  • Definition 20. Let Ti and Tj be two taxonomy structures, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The distance-based semantic similarity SimDis1 between A and B w.r.t. Ti and Tj can be defined as:

$$ SimDis1\left(A,B,{T}_i,{T}_j\right)=2\times maxdistance\left({T}_i,{T}_j\right)-\mid p\mid, $$

where p ∈ spaths(A, B, Ti, Tj).

Clearly, SimDis1 is an extension of the metric of Rada et al. [62].
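Under the same hypothetical representation, SimDis1 reduces to a one-liner over the shortest cross-taxonomy path length computed by the helper above; maxdistance(Ti, Tj) is assumed to be precomputed and passed in.

```python
def sim_dis1(A, B, Ti, Tj, max_distance):
    """SimDis1 (Definition 20): 2 * maxdistance(Ti, Tj) - |p|, where p is a
    shortest path between A and B across Ti and Tj (Definition 19)."""
    p_len = shortest_cross_path_length(A, B, Ti, Tj)
    if p_len is None:
        return 0.0  # assumption: no connecting path is treated as zero similarity
    return 2 * max_distance - p_len
```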

  • Definition 21. Let Ti and Tj be two taxonomy structures, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The distance-based semantic similarity SimDis2 between A and B w.r.t. Ti and Tj can be defined as:

$$ SimDis2\left(A,B,{T}_i,{T}_j\right)=\frac{2\times {N}_3\left(A,B,{T}_i,{T}_j\right)}{{N}_1\left(A,{T}_i\right)+{N}_2\left(B,{T}_j\right)+2\times {N}_3\left(A,B,{T}_i,{T}_j\right)}, $$

where

$$ {N}_1\left(A,{T}_i\right)=\max \left\{|p|\mid p\in walks\left(C,A,{T}_i\right),C\in GCS\left(A,B,{T}_i,{T}_j\right)\right\}, $$
$$ {N}_2\left(B,{T}_j\right)=\max \left\{|p|\mid p\in walks\left(C,B,{T}_j\right),C\in GCS\left(A,B,{T}_i,{T}_j\right)\right\}, $$
$$ {N}_3\left(A,B,{T}_i,{T}_j\right)=\max \left\{|p|\mid p\in walks\left( root\left({T}_i\right),C,{T}_i\right)\vee p\in walks\left( root\left({T}_j\right),C,{T}_j\right),C\in GCS\left(A,B,{T}_i,{T}_j\right)\right\}. $$

Similarly to Wu and Palmer's metric [82], SimDis2 (Definition 21) is based on is-a hierarchies, where walks and GCS are defined in Section 4.1 (Definitions 5 and 7). Obviously, SimDis2 is an extension of Wu and Palmer's metric [82].
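A sketch of SimDis2 under the assumption that walks and GCS from Section 4.1 (Definitions 5 and 7, not repeated here) are available as helpers returning, respectively, the walks between two concepts in a taxonomy and the set of greatest common subsumers of A and B; len(w) below stands for the walk length |w|.

```python
def sim_dis2(A, B, Ti, Tj, walks, gcs, root):
    """SimDis2 (Definition 21), an extension of the Wu and Palmer metric.
    `walks`, `gcs` and `root` are assumed helpers corresponding to
    Definitions 5 and 7 of Section 4.1."""
    subsumers = gcs(A, B, Ti, Tj)
    n1 = max(len(w) for C in subsumers for w in walks(C, A, Ti))
    n2 = max(len(w) for C in subsumers for w in walks(C, B, Tj))
    n3 = max(len(w) for C in subsumers
             for w in list(walks(root(Ti), C, Ti)) + list(walks(root(Tj), C, Tj)))
    return 2.0 * n3 / (n1 + n2 + 2.0 * n3)
```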

We can define the following similarity measure SimDis3 by extending Leacock and Chodorow's metric [39].

  • Definition 22. Let Ti and Tj be two taxonomy structures, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The distance-based semantic similarity SimDis3 between A and B w.r.t. Ti and Tj can be defined as:

$$ SimDis3\left(A,B,{T}_i,{T}_j\right)=-\log \left(\frac{\mid p\mid }{2\times \max \left\{|{p}_1|,|{p}_2|\right\}}\right), $$

where p ∈ spaths(A, B, Ti, Tj), p1 ∈ lrpaths(Ti), p2 ∈ lrpaths(Tj),

$$ lrpaths\left({T}_i\right)=\left\{p\mid p\in paths\left( root\left({T}_i\right),C,{T}_i\right),C\in {T}_i,|p|={\max}_{D\in {T}_i}\left\{|{p}^{\prime}|\mid {p}^{\prime}\in paths\left( root\left({T}_i\right),D,{T}_i\right)\right\}\right\}, $$
$$ lrpaths\left({T}_j\right)=\left\{p\mid p\in paths\left( root\left({T}_j\right),C,{T}_j\right),C\in {T}_j,|p|={\max}_{D\in {T}_j}\left\{|{p}^{\prime}|\mid {p}^{\prime}\in paths\left( root\left({T}_j\right),D,{T}_j\right)\right\}\right\}. $$

Similarly to the metric of Garla and Brandt [25], we can also normalize SimDis3 to the unit interval as follows.

  • Definition 23. Let Ti and Tj be two taxonomy structures, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The distance-based semantic similarity SimDis4 between A and B w.r.t. Ti and Tj can be defined as:

$$ SimDis4\left(A,B,{T}_i,{T}_j\right)=1-\frac{\log\ \left(|p|\right)}{\log\ \left(2\times \max \left\{|{p}_1|,|{p}_2|\right\}\right)}, $$

where p ∈ spaths(A, B, Ti, Tj), p1 ∈ lrpaths(Ti), and p2 ∈ lrpaths(Tj).
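A minimal sketch of SimDis3 and its normalized variant SimDis4, reusing the hypothetical helper above; depth_i and depth_j stand for the lengths |p1| and |p2| of the longest root-to-leaf paths of Ti and Tj (the lrpaths of Definition 22), assumed precomputed.

```python
import math

def sim_dis3(A, B, Ti, Tj, depth_i, depth_j):
    """SimDis3 (Definition 22): -log(|p| / (2 * max(|p1|, |p2|)))."""
    p_len = shortest_cross_path_length(A, B, Ti, Tj)  # assumes a cross path exists
    return -math.log(p_len / (2.0 * max(depth_i, depth_j)))

def sim_dis4(A, B, Ti, Tj, depth_i, depth_j):
    """SimDis4 (Definition 23): the variant of SimDis3 normalised to [0, 1]."""
    p_len = shortest_cross_path_length(A, B, Ti, Tj)  # assumes a cross path exists
    return 1.0 - math.log(p_len) / math.log(2.0 * max(depth_i, depth_j))
```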

Obviously, we can define a similarity measure SimDis5 by extending the metric of Li et al. [41].

  • Definition 24. Let Ti and Tj be two taxonomy structures, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The distance-based semantic similarity SimDis5 between A and B w.r.t. Ti and Tj can be defined as:

$$ SimDis5\left(A,B,{T}_i,{T}_j\right)={e}^{-\alpha \times \mid p\mid}\cdotp \frac{e^{\beta h}-{e}^{-\beta h}}{e^{\beta h}+{e}^{-\beta h}}, $$

where p ∈ spaths(A, B, Ti, Tj), h = max{|p′| | p′ ∈ walks(root(Ti), C, Ti) ∨ p′ ∈ walks(root(Tj), C, Tj), C ∈ GCS(A, B, Ti, Tj)}, α ≥ 0, and β > 0. In our experiments, we use the same optimal parameters as in [41], i.e., α = 0.2 and β = 0.6.
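A sketch of SimDis5, where h (the depth of the deepest greatest common subsumer, as defined above) is assumed to be precomputed; note that (e^(βh) − e^(−βh)) / (e^(βh) + e^(−βh)) is simply tanh(βh).

```python
import math

def sim_dis5(A, B, Ti, Tj, h, alpha=0.2, beta=0.6):
    """SimDis5 (Definition 24): e^(-alpha * |p|) * tanh(beta * h), with the
    parameters alpha = 0.2 and beta = 0.6 of Li et al. [41]."""
    p_len = shortest_cross_path_length(A, B, Ti, Tj)  # assumes a cross path exists
    return math.exp(-alpha * p_len) * math.tanh(beta * h)
```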

In Definitions 20–24, SimDis1, SimDis2, SimDis3, SimDis4, and SimDis5 are based on two knowledge sources. We now give the corresponding similarity measures for multiple knowledge sources.

  • Definition 25. Let AllTS = {T1, T2, …, Tm} be the set of all taxonomy structures, TSA = {Tk, Tk + 1,…, Tl} ⊆ AllTS and TSB = {Ts, Ts + 1,…, Tt} ⊆ AllTS. For any Ti ∈ TSA and Tj ∈ TSB, we have that A ∈ Ti, A ∉ Tj, B ∈ Tj, and B ∉ Ti. The distance-based semantic similarity measures SimDis1M, SimDis2M, SimDis3M, SimDis4M, and SimDis5M between A and B w.r.t. multiple taxonomy structures TSA and TSB can be defined as:

$$ SimDis1M\left(A,B, TSA, TSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimDis1\left(A,B,{T}_i,{T}_j\right)\right\}, $$
$$ SimDis2M\left(A,B, TSA, TSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimDis2\left(A,B,{T}_i,{T}_j\right)\right\}, $$
$$ SimDis3M\left(A,B, TSA, TSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimDis3\left(A,B,{T}_i,{T}_j\right)\right\}, $$
$$ SimDis4M\left(A,B, TSA, TSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimDis4\left(A,B,{T}_i,{T}_j\right)\right\}\mathrm{and} $$
$$ SimDis5M\left(A,B, TSA, TSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimDis5\left(A,B,{T}_i,{T}_j\right)\right\}. $$

The overall distance-based semantic similarity measure SimDis between A and B w.r.t. TSA and TSB, which combines all the above measures, can be defined as:

$$ SimDis\left(A,B, TSA, TSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\Big\{ SimDis1\left(A,B,{T}_i,{T}_j\right), SimDis2\left(A,B,{T}_i,{T}_j\right), SimDis3\left(A,B,{T}_i,{T}_j\right), $$
$$ SimDis4\left(A,B,{T}_i,{T}_j\right), SimDis5\left(A,B,{T}_i,{T}_j\right)\Big\}. $$

In order to compare the values of the different similarities SimDis1, SimDis2, SimDis3, SimDis4, and SimDis5, we also normalize the value of each similarity. Similarly to Definition 15 (see Remark 3), the distance-based measure SimDis is also a generic and flexible approach.
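The maximisation in Definition 25 and the overall SimDis measure can be sketched generically as follows; each pairwise measure is assumed to be already bound to any extra parameters it needs (for example via functools.partial) and to return normalised values.

```python
from functools import partial

def sim_m(A, B, TS_A, TS_B, pair_measure):
    """Definition 25 pattern: maximum of a pairwise measure over all taxonomy
    pairs (Ti, Tj) with Ti in TS_A and Tj in TS_B."""
    return max(pair_measure(A, B, Ti, Tj) for Ti in TS_A for Tj in TS_B)

def sim_dis(A, B, TS_A, TS_B, pair_measures):
    """The overall SimDis: maximum over all (normalised) pairwise measures."""
    return max(sim_m(A, B, TS_A, TS_B, m) for m in pair_measures)

# Hypothetical usage:
# sim_dis(A, B, TS_A, TS_B,
#         [partial(sim_dis1, max_distance=20), partial(sim_dis5, h=6)])
```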

The relationships among all definitions of distance-based measures under multiple knowledge sources are shown as Fig. 7.

Fig. 7 The relationships among all definitions of distance-based measures

4.3 Feature-based measures under multiple knowledge sources

Unlike IC-based or distance-based similarity measures, feature-based measures assess similarity between concepts as a function of their properties (i.e., features). Therefore, in the framework in Definition 3 or Algorithm 1, for each concept we need one or multiple knowledge sources in order to get its properties (i.e., features). Assume that A and B are two concepts, KS1, KS2, …, and KSm are knowledge sources. Clearly, if there exists a knowledge source KSi (1 ≤ i ≤ m) such that the features of A and B can be obtained from KSi, it is easy to compute Sim(A, B) by using traditional feature-based similarity measures (see Section 2.2). However, if there does not exist any knowledge source KSi (1 ≤ i ≤ m) that can provide the features of A and B at the same time, we need some new feature-based similarity measures.

Let the set of all knowledge sources be AllKS = {KS1, KS2, …, KSm}. Suppose that KSA = {KSk, KSk + 1,…, KSl} ⊆ AllKS and KSB = {KSs, KSs + 1,…, KSt} ⊆ AllKS, and that for any KSi ∈ KSA and KSj ∈ KSB we have A ∈ KSi, A ∉ KSj, B ∈ KSj, and B ∉ KSi. Obviously, we cannot compute Sim(A, B) by considering only KSi (or KSj). We now give some methods for Sim(A, B) that consider both KSi and KSj.

  • Definition 26. Let KSi and KSj be two knowledge sources, A, B ∈ CON be two different concepts (i.e., A ≠ B), A ∈ KSi, A ∉ KSj, B ∈ KSj, and B ∉ KSi. Assume that all features that we consider are {fea1, fea2, …, fean}, i.e., the semantic representation of A and B is as follows:

$$ A=\left\{{fea}_1(A),{fea}_2(A),\dots, {fea}_n(A)\right\}\ \mathrm{and}\ B=\left\{{fea}_1(B),{fea}_2(B),\dots, {fea}_n(B)\right\}, $$

where the value of feau(A) (resp., feau(B)) (1 ≤ u ≤ n), which comes from KSi (resp., KSj), is as follows:

feau(A) = 〈KSi: \( {value}_{i_u} \)〉 (resp., feau(B) = 〈KSj: \( {value}_{j_u} \)〉).

The feature-based semantic similarity framework SimFea between A and B w.r.t. KSi and KSj can be defined as:

$$ SimFea\left(A,B,K{S}_i,K{S}_j\right)=\max \left\{{Sim}_1\left({value}_{i_1},{value}_{j_1}\right),\dots, {Sim}_n\left({value}_{i_n},{value}_{j_n}\right)\right\} $$

In this paper, we only consider four kinds of features, i.e., glosses, synonyms, hyponyms (or sub-concepts), and hypernyms (or super-concepts). Thus, SimFea is instantiated as follows:

$$ SimFea\left(A,B,{KS}_i,{KS}_j\right)=\max \left\{{Sim}_{glosses}\left({glosses}_i(A),{glosses}_j(B)\right),{Sim}_{synonyms}\left({synonyms}_i(A),{synonyms}_j(B)\right),{Sim}_{hyponyms}\left({hyponyms}_i(A),{hyponyms}_j(B)\right),{Sim}_{hypernyms}\left({hypernyms}_i(A),{hypernyms}_j(B)\right)\right\}, $$

where A = {〈KSi: glossesi(A)〉, 〈KSi: synonymsi(A)〉, 〈KSi: hyponymsi(A)〉, 〈KSi: hypernymsi(A)〉} and B = {〈KSj: glossesj(B)〉, 〈KSj: synonymsj(B)〉, 〈KSj: hyponymsj(B)〉, 〈KSj: hypernymsj(B)〉}.

Simglosses, Simhyponyms, and Simhypernyms are defined using the Jaccard index, the Dice (Sørensen) coefficient, and Salton's cosine measure (as instantiated in SimFea1–SimFea3 below). Simsynonyms is defined as follows:

$$ {Sim}_{synonyms}\left({synonyms}_i(A),{synonyms}_j(B)\right)=\left\{\begin{array}{ll}1,& \mathrm{if}\ {synonyms}_i(A)\cap {synonyms}_j(B)\ne \varnothing, \\ 0,& \mathrm{if}\ {synonyms}_i(A)\cap {synonyms}_j(B)=\varnothing .\end{array}\right. $$

Therefore, we can define the following three kinds of feature-based semantic similarity between A and B w.r.t. KSi and KSj:

$$ SimFea1\left(A,B,{KS}_i,{KS}_j\right)=\max \left\{{Sim}_{synonyms}\left({synonyms}_i(A),{synonyms}_j(B)\right), Jaccard\left({glosses}_i(A),{glosses}_j(B)\right), Jaccard\left({hyponyms}_i(A),{hyponyms}_j(B)\right), Jaccard\left({hypernyms}_i(A),{hypernyms}_j(B)\right)\right\}, $$
$$ SimFea2\left(A,B,{KS}_i,{KS}_j\right)=\max \left\{{Sim}_{synonyms}\left({synonyms}_i(A),{synonyms}_j(B)\right), Dice\left({glosses}_i(A),{glosses}_j(B)\right), Dice\left({hyponyms}_i(A),{hyponyms}_j(B)\right), Dice\left({hypernyms}_i(A),{hypernyms}_j(B)\right)\right\}, $$
$$ SimFea3\left(A,B,{KS}_i,{KS}_j\right)=\max \left\{{Sim}_{synonyms}\left({synonyms}_i(A),{synonyms}_j(B)\right), SaltonCosine\left({glosses}_i(A),{glosses}_j(B)\right), SaltonCosine\left({hyponyms}_i(A),{hyponyms}_j(B)\right), SaltonCosine\left({hypernyms}_i(A),{hypernyms}_j(B)\right)\right\}, $$

where synonymsi(A), synonymsj(B), hyponymsi(A), hyponymsj(B), hypernymsi(A), and hypernymsj(B) are sets of concepts (or terms), and glossesi(A) and glossesj(B) are sets of words extracted by parsing the glosses of A and B, respectively.
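A sketch of the feature-based measures SimFea1–SimFea3 under the assumption that the features of each concept have already been extracted from its knowledge source into a dict of four sets (glosses, synonyms, hyponyms, hypernyms); the dict layout and helper names are ours.

```python
def jaccard(X, Y):
    return len(X & Y) / len(X | Y) if X | Y else 0.0

def dice(X, Y):
    return 2.0 * len(X & Y) / (len(X) + len(Y)) if X or Y else 0.0

def salton_cosine(X, Y):
    return len(X & Y) / ((len(X) * len(Y)) ** 0.5) if X and Y else 0.0

def sim_synonyms(syn_A, syn_B):
    """1 if the synonym sets share at least one element, otherwise 0."""
    return 1.0 if syn_A & syn_B else 0.0

def sim_fea(A_feats, B_feats, set_measure):
    """SimFea1/2/3 pattern: maximum of the synonym indicator and the chosen
    set measure applied to glosses, hyponyms and hypernyms."""
    return max(
        sim_synonyms(A_feats['synonyms'], B_feats['synonyms']),
        set_measure(A_feats['glosses'], B_feats['glosses']),
        set_measure(A_feats['hyponyms'], B_feats['hyponyms']),
        set_measure(A_feats['hypernyms'], B_feats['hypernyms']),
    )

# SimFea1, SimFea2 and SimFea3 then correspond to using jaccard, dice and
# salton_cosine as the set measure, respectively.
```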

In Definition 26, SimFea1, SimFea2, and SimFea3 are based on two knowledge sources. Now we give some similarity measures for multiple knowledge sources.

  • Definition 27. Let AllKS = {KS1, KS2, …, KSm} be the set of all knowledge sources, KSA = {KSk, KSk + 1,…, KSl} ⊆ AllKS and KSB = {KSs, KSs + 1,…, KSt} ⊆ AllKS. For any KSi ∈ KSA and KSj ∈ KSB, we have that A ∈ KSi, A ∉ KSj, B ∈ KSj, and B ∉ KSi. The feature-based semantic similarity measures SimFea1M, SimFea2M, SimFea3M, SimFea4M, SimFea5M, and SimFea6M between A and B w.r.t. multiple knowledge sources KSA and KSB can be defined as:

    $$ {\displaystyle \begin{array}{c} SimFea1M\left(A,B, KSA, KSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimFea1\left(A,B,{KS}_i,{KS}_j\right)\right\}\\ {} SimFea2M\left(A,B, KSA, KSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimFea2\left(A,B,{KS}_i,{KS}_j\right)\right\}\\ {}\begin{array}{c} SimFea3M\left(A,B, KSA, KSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimFea3\left(A,B,{KS}_i,{KS}_j\right)\right\}\\ {} SimFea4M\left(A,B, KSA, KSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimFea4\left(A,B,{KS}_i,{KS}_j\right)\right\}\\ {}\begin{array}{c} SimFea5M\left(A,B, KSA, KSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimFea5\left(A,B,{KS}_i,{KS}_j\right)\right\}\\ {} SimFea6M\left(A,B, KSA, KSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\left\{ SimFea6\left(A,B,{KS}_i,{KS}_j\right)\right\}\end{array}\end{array}\end{array}} $$

where SimFea4(A, B, KSi, KSj) = max{Simsynonyms(synonyms(A), synonyms(B)), Jaccard(glosses(A), glosses(B)), Jaccard(hyponyms(A), hyponyms(B)), Jaccard(hypernyms(A), hypernyms(B))},

SimFea5(A, B, KSi, KSj) = max{Simsynonyms(synonyms(A), synonyms(B)), Dice(glosses(A), glosses(B)), Dice(hyponyms(A), hyponyms(B)), Dice(hypernyms(A), hypernyms(B))},

SimFea6(A, B, KSi, KSj) = max{Simsynonyms(synonyms(A), synonyms(B)), SaltonCosine(glosses(A), glosses(B)), SaltonCosine(hyponyms(A), hyponyms(B)), SaltonCosine(hypernyms(A), hypernyms(B))},

glosses(A) = glossesk(A)∪…∪glossesl(A),

glosses(B) = glossess(B)∪…∪glossest(B),

synonyms(A) = synonymsk(A)∪…∪synonymsl(A),

synonyms(B) = synonymss(B)∪…∪synonymst(B),

hyponyms(A) = hyponymsk(A)∪…∪hyponymsl(A),

hyponyms(B) = hyponymss(B)∪…∪hyponymst(B),

hypernyms(A) = hypernymsk(A)∪…∪hypernymsl(A),

hypernyms(B) = hypernymss(B)∪…∪hypernymst(B).

The overall feature-based semantic similarity measure SimFea between A and B w.r.t. KSA and KSB, which combines all the above measures, can be defined as:

$$ SimFea\left(A,B, KSA, KSB\right)={\max}_{i=k}^l{\max}_{j=s}^t\Big\{ SimFea1\left(A,B,{KS}_i,{KS}_j\right), SimFea2\left(A,B,{KS}_i,{KS}_j\right), SimFea3\left(A,B,{KS}_i,{KS}_j\right), SimFea4\left(A,B,{KS}_i,{KS}_j\right), SimFea5\left(A,B,{KS}_i,{KS}_j\right), SimFea6\left(A,B,{KS}_i,{KS}_j\right)\Big\}. $$

In order to compare the values of different similarities SimFea1, SimFea2, SimFea3, SimFea4, SimFea5, and SimFea6, we also normalize the value of each similarity. Similarly to Definitions 15 and 25, the feature-based measure SimFea is also a generic and flexible approach.
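A sketch of the multi-source variants: for SimFea4M–SimFea6M the features of each concept are first unioned over all of its knowledge sources (Definition 27), after which the pairwise computation above is reused; the per-source feature dicts and their layout are again our assumption.

```python
FEATURE_KEYS = ('glosses', 'synonyms', 'hyponyms', 'hypernyms')

def union_features(per_source_feats):
    """Union the feature sets of a concept over all of its knowledge sources,
    e.g. glosses(A) = glosses_k(A) ∪ … ∪ glosses_l(A) in Definition 27."""
    merged = {key: set() for key in FEATURE_KEYS}
    for feats in per_source_feats:
        for key in FEATURE_KEYS:
            merged[key] |= feats.get(key, set())
    return merged

def sim_fea_m(A_sources, B_sources, set_measure):
    """SimFea4M/5M/6M pattern: apply the pairwise measure to the unioned
    feature sets; the maximisation over source pairs collapses because the
    unions no longer depend on a particular (KSi, KSj)."""
    return sim_fea(union_features(A_sources), union_features(B_sources), set_measure)
```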

The relationships among all definitions of feature-based measures under multiple knowledge sources are shown as Fig. 8.

Fig. 8 The relationships among all definitions of feature-based measures

So far, some generic and flexible approaches to similarity measures of concepts (including IC-based, distance-based, and feature-based measures) have been presented. As stated in Section 1, semantic similarity between concepts can be applied to many fields such as multimedia databases, multimedia encyclopedias, digital libraries, and multimedia documents. The application architecture is shown in Fig. 9.

Fig. 9 Application architecture of semantic similarity measures

5 Experiments and evaluation

In this section we discuss the evaluation problem of our similarity measures (see Section 4). Section 5.1 introduces some experimental datasets and evaluation metrics. Section 5.2 gives our experimental results. Lastly, in Section 5.3, we discuss and analyze the experimental results.

5.1 Experimental datasets and evaluation metrics

We collect several publicly available gold-standard benchmarks for evaluating concept semantic similarity, including the most commonly used conventional benchmarks and some recently updated ones. The benchmarks used in the experiments are described below.

  (1) The WS353 [22] benchmark contains 353 word pairs; 13 to 16 human subjects were asked to assign each pair a numerical similarity score between 0.0 and 10.0 (0 means totally unrelated and 10 means very closely related). In fact, this benchmark measures general relatedness rather than similarity because it considers other semantic relations (e.g., antonyms are considered similar).

  (2) The WordSim-353 [2] benchmark is a subset of WS353. WS353 is divided into two subsets: the first concerns relatedness, while the second focuses on similarity. We use only the second subset, named WordSim-353, in our experiments. It contains 203 word pairs and has been identified by its authors as suitable specifically for evaluating semantic similarity.

  (3) The R&G [66] benchmark is the first and most widely used benchmark containing human assessments of word similarity. It resulted from an experiment conducted in 1965 in which a group of 51 students (all native English speakers) assessed the similarity of 65 pairs of ordinary English nouns. The 51 subjects were asked to judge the similarity of meaning of each word pair on a scale from 0.0 (completely dissimilar) to 4.0 (highly synonymous). The benchmark focuses on semantic similarity and ignores any other semantic relationships between the words.

  (4) The M&C [52] benchmark contains 30 word pairs. It replicated the R&G experiment in 1991 using a subset of 30 noun pairs, whose similarity was judged by 38 human subjects.

  (5) The Jiang-1 [37] and Jiang-2 [34] benchmarks each contain 30 pairs of real-world Wikipedia concepts. The similarity of each concept pair was assessed by 10 students and 10 teachers on a scale from 0 (semantically unrelated) to 4 (highly synonymous). After a normalization process, each of the 30 concept pairs is rated with the average of the similarity values provided by the students and the teachers. These two benchmarks can therefore be used to evaluate the accuracy of our approaches, and we use them in this work.

Each benchmark described above contains a list of triples comprising two words and a similarity score denoting the word similarity judged by humans. Concretely, we select 203 word pairs from WordSim-353, 65 word pairs from R&G, 30 word pairs from M&C, 30 word pairs from Jiang-1, and 30 word pairs from Jiang-2 in our experiments.

It is well known that an objective evaluation of the accuracy of semantic similarity functions is difficult because the notion of similarity is subjective. Generally, similarity measures are evaluated by means of standard benchmarks of word pairs whose similarity has been assessed by a group of human experts [37]. However, in this paper we evaluate new approaches that measure similarity under multiple knowledge sources, a setting that existing similarity computation methods cannot deal with (traditional methods are generally based on one knowledge source). In particular, for any word pair (or concept pair) (A, B), A and B belong to different knowledge sources, whereas in traditional methods A and B belong to the same knowledge source. Therefore, comparing the proposed methods against standard benchmarks imposes some challenges and requires some modifications and adjustments in order to make such a comparison meaningful. The comparative experiments have been grouped into three parts.

Firstly, we evaluate our methods over five benchmarks, namely M&C, R&G, WordSim-353, Jiang-1, and Jiang-2, and two kinds of knowledge sources, namely Wikipedia and WordNet. To evaluate our methods objectively, for any concept pair (A, B) we require that the value of A comes from Wikipedia and the value of B comes from WordNet.

Secondly, we develop a benchmark Jiang-3 and then use it to evaluate the accuracy of our proposals. For comparison purposes, we select 30 pairs of real-world concepts extracted from some widely used knowledge sources, i.e., Wikipedia, WordNet, Medical Subject Headings (MeSH), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Our benchmark Jiang-3 is shown in Table 1. The similarity of each concept pair was assessed by 10 students and 10 teachers in biomedical fields on a scale from 0 (semantically unrelated) to 4 (highly synonymous). After a normalization process, each of the 30 concept pairs is rated with the average of the similarity values provided by the students and the teachers. To evaluate our methods objectively, for any concept pair (A, B) we require that A ∈ Wikipedia, A ∈ WordNet, A ∉ MeSH, A ∉ DO, A ∉ HPO, B ∈ MeSH, B ∈ DO, B ∈ HPO, B ∉ Wikipedia, and B ∉ WordNet.

Table 1 Our benchmark Jiang-3

Lastly, in our benchmark Jiang-3 there are five kinds of knowledge sources, i.e., Wikipedia, WordNet, MeSH, DO, and HPO. Clearly, Wikipedia and WordNet are general-purpose knowledge sources, whereas MeSH, DO, and HPO are domain-dependent knowledge sources (biomedical ontologies). To evaluate the accuracy of our proposals in another setting, we build another two benchmarks, Jiang-4 and Jiang-5, by using the knowledge sources MeSH, DO, HPO, Gene Ontology (GO), and Ontology for Biomedical Investigations (OBI). In our benchmark Jiang-4 there are 30 pairs of real-world concepts extracted from three kinds of knowledge sources, i.e., MeSH, DO, and HPO. Jiang-4 is shown in Table 2. For any concept pair (A, B), we require that A ∈ MeSH, A ∈ HPO, A ∉ DO, B ∈ DO, B ∉ MeSH, and B ∉ HPO. In our benchmark Jiang-5 there are 30 pairs of real-world concepts extracted from three kinds of knowledge sources, i.e., MeSH, GO, and OBI. Jiang-5 is shown in Table 3. For any concept pair (C, D), we require that C ∈ MeSH, C ∉ GO, C ∉ OBI, D ∈ GO, D ∈ OBI, and D ∉ MeSH.

Table 2 Our benchmark Jiang-4
Table 3 Our benchmark Jiang-5

Different knowledge sources have different semantic information, such as concept taxonomies and distributions of instances over concepts. In this work we apply different combinations of knowledge sources to different benchmarks and express the semantics of concepts by integrating different semantic information. To further illustrate this, Fig. 10 describes the relations among the seven knowledge sources considered in our experiments and the eight benchmarks. The mark “1” on an arrow from a knowledge source to a benchmark indicates that the first concept in each pair of the benchmark is computed in the corresponding knowledge source. Similarly, the mark “2” indicates that the second concept in each pair is computed in the corresponding knowledge source. For example, the first concept in each pair of the Jiang-3 benchmark is computed on WordNet and Wikipedia, and the second concept is computed on HPO, DO, and MeSH.

Fig. 10 The relations among the seven knowledge sources (WordNet, Wikipedia, OBI, GO, MeSH, DO, and HPO) and the eight benchmarks (M&C, R&G, WordSim-353, Jiang-1, Jiang-2, Jiang-3, Jiang-4, and Jiang-5); the connections from knowledge sources to benchmarks show the components of each benchmark

The knowledge sources WordNet and Wikipedia are used in measuring the semantic similarities of concept pairs in the M&C, R&G, WordSim-353, Jiang-1, Jiang-2, and Jiang-3 benchmarks. WordNet organizes lexical information into meanings (senses) and synsets (sets of synonymous words in a specific context) [5]. Each synset has a gloss that defines the concept. Hypernymy is a relation that organizes noun synsets into a lexical inheritance taxonomy. In this taxonomy, a subordinate term inherits the basic features of its superordinate term and adds its own distinctive features to form its meaning. Wikipedia is a free, online, multilingual knowledge source that is collaboratively maintained by volunteers and known to have good coverage [30]. At the bottom of each Wikipedia page, all assigned categories are listed with links to the category pages. These categories are connected to form the Wikipedia Category Graph (WCG). Wikipedia categories and their relations do not have explicit semantics like WordNet. The Wikipedia categorization system does not form a taxonomy like the WordNet “is-a” taxonomy with a full subsumption hierarchy, but rather a thematically organized thesaurus. For example, Computer systems is categorized under the upper categories Technology systems (is-a) and Computer hardware (has-part).

The knowledge source MeSH is used in measuring the semantic similarities of concept pairs in the Jiang-3, Jiang-4, and Jiang-5 benchmarks. MeSH organizes biomedical concepts in a meaningful way with explicit semantic relations. It consists of single- and multi-word terms that are used to index and catalog the medical literature [16]. Among its relations [5], we use the MeSH “is-a” taxonomy. The knowledge sources DO and HPO are used in measuring the semantic similarities of concept pairs in the Jiang-3 and Jiang-4 benchmarks. The DO has been developed as a standardized ontology for human disease with the purpose of providing the biomedical community with consistent, reusable, and sustainable descriptions of human disease terms, phenotype characteristics, and related medical vocabulary disease concepts. The DO also semantically integrates disease and medical vocabularies through extensive cross-mapping of terms to the MeSH thesaurus. The HPO provides a structured vocabulary for phenotypic traits and their effects on commonly encountered human diseases [81]. Its aim is to offer a well-structured vocabulary for these traits so that they can be easily studied and searched in the medical sciences, raising awareness of the traits and of how they can damage a person's health and body organs. The HPO currently contains over 13,000 terms for traits and characteristics and over 156,000 annotations to hereditary diseases. Each term describes a phenotypic abnormality such as Atrial septal defect.

The knowledge sources GO and OBI are used in measuring semantic similarities of concept pairs in Jiang-5 benchmark. The GO provides an ontology to describe attributes of gene products in three non-overlapping domains of molecular biology [26]. It includes several of the world’s major repositories for plant, animal and microbial genomes. Within each ontology, terms have free text definitions and stable unique identifiers. The vocabularies are structured in a classification that supports “is-a” and “part-of” relationships. The OBI is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted [7]. It imports parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meanings. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services.

The accuracy of the similarity computation is quantified by computing the correlation between the human judgments and the results provided by the computerized measures. This enables an objective evaluation of the different similarity computation methods. The correlation between two variables is the degree to which there is a relationship between them; it is usually expressed as a coefficient that measures the strength of the relationship. Our experiments use two measures of correlation: the Pearson correlation coefficient and the Spearman correlation coefficient. Pearson reflects the linear correlation between the measured results and the human judgments. Spearman compares the measured results with the human judgments based on their rankings.
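For reference, the correlation computation itself is straightforward; the sketch below uses SciPy and assumes that the human scores and the scores of one measure for a benchmark are given as equal-length lists.

```python
from scipy.stats import pearsonr, spearmanr

def evaluate(human_scores, measure_scores):
    """Pearson captures the linear correlation between a measure's scores and
    the human judgments; Spearman correlates their rankings instead."""
    pearson, _ = pearsonr(human_scores, measure_scores)
    spearman, _ = spearmanr(human_scores, measure_scores)
    return pearson, spearman

# Hypothetical usage on one benchmark:
# evaluate([3.92, 3.84, 0.42], [0.95, 0.88, 0.10])
```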

5.2 Experimental results

Regarding the environment of our evaluation, we use the Wikipedia dump released on April 20, 2018, WordNet 3.1, the 2018 release of MeSH, the GO release of May 31, 2018, the DO release of May 15, 2018, the HPO release of March 9, 2018, and the OBI release of April 29, 2016. We use JWPL (Java Wikipedia Library), Java (JDK 1.8), and MySQL to implement our algorithms, which measure similarity by the formulas given in Section 4. According to our statistics, there are 47,204 concepts in GO, 11,192 concepts in DO, 3,337 concepts in OBI, 13,544 concepts in HPO, 28,938 concepts in MeSH, 1,679,499 concepts in Wikipedia, and 147,479 concepts in WordNet.

Tables 4 and 5 report some results related to the development of the Jiang-3, Jiang-4, and Jiang-5 benchmarks. To evaluate the similarity of concepts that come from different knowledge sources, common concepts are the key factor in both the path-based and the IC-based approaches proposed in this paper. In fact, common concepts are the elements of the intersections of the corresponding concept sets of different knowledge sources. Accordingly, we list the numbers of elements in the intersections of the concept sets of the seven knowledge sources MeSH, DO, HPO, OBI, GO, Wikipedia, and WordNet in Table 4. We take different combinations of knowledge sources to build the benchmarks Jiang-3, Jiang-4, and Jiang-5, and list the numbers of concept pairs generated by the different combinations in Table 5. According to the numbers of pairs that have common ancestors or children and the numbers of pairs that perform well on all three types of approaches proposed in this paper, we adopt the last three division schemes and extract 50 concept pairs from each scheme to generate the Jiang-3, Jiang-4, and Jiang-5 benchmarks, respectively.

Table 4 Numbers of the concepts in the intersections of seven considered knowledge sources
Table 5 The details of the concept pairs in different combinations of knowledge sources

The second (M&C), third (R&G), fourth (WordSim-353), fifth (Jiang-1), sixth (Jiang-2), seventh (Jiang-3), eighth (Jiang-4), and ninth (Jiang-5) columns in Table 6 show the Pearson correlation coefficients of the different measures with human judgments.

Table 6 Results on Pearson correlation with human judgments of similarity measures

The second (M&C), third (R&G), fourth (WordSim-353), fifth (Jiang-1), sixth (Jiang-2), seventh (Jiang-3), eighth (Jiang-4), and ninth (Jiang-5) columns in Table 7 show the Spearman correlation coefficients of the different measures with human judgments.

Table 7 Results on Spearman correlation with human judgments of similarity measures

5.3 Discussion and analysis

Now we analyze and discuss the experimental results (see Tables 6 and 7) from four different aspects: (1) the influence of knowledge sources, (2) the influence of benchmarks, (3) the differences among the three kinds of measures (IC-based, distance-based, and feature-based measures), and (4) the performance of the three most generic and flexible measures: SimIC, SimDis, and SimFea.

5.3.1 Influence of knowledge sources

The results in Tables 6 and 7 show that most of the Pearson and Spearman correlation coefficients on the benchmarks M&C, R&G, Jiang-1, and Jiang-3 are better than those on the benchmarks WordSim-353, Jiang-2, Jiang-4, and Jiang-5. This indicates that domain-independent knowledge sources like Wikipedia and WordNet perform better in measuring similarities among both general and specialized concepts. The reason is that the semantic information of the concepts in M&C, R&G, Jiang-1, and Jiang-3 is computed on Wikipedia and WordNet, whereas Jiang-4 and Jiang-5 are computed on five biomedical knowledge sources. Since these are biomedical ontologies, their descriptions of the same word often differ from those of an encyclopedia. For example, the glosses of the same concept in HPO and WordNet differ from each other, and the semantic information in WordNet contains more features.

GO, DO, OBI, HPO, and MeSH are all domain-specific ontologies that describe concepts in a specialized, professional way, whereas Wikipedia and WordNet describe concepts in a more general way. This raises a problem: the features of the same concept extracted from different knowledge sources differ, and some features may even be empty. Since our methods compute the semantic similarity between concepts based on these features, this causes the differences in our results.

5.3.2 Influence of benchmarks

Eight benchmarks are used in our experiments. For the first five benchmarks, Tables 6 and 7 show that the Pearson and Spearman correlation coefficients on M&C, R&G, and Jiang-1 are relatively better than those on WordSim-353 and Jiang-2. For all concept pairs in these five benchmarks, we measure the semantic information of one concept of each pair on WordNet and that of the other concept on Wikipedia. M&C is a subset of R&G with re-labeled human judgments. All concepts in the three benchmarks M&C, R&G, and Jiang-1 are ordinary English nouns, so they are fully described both in lexical databases like WordNet and in encyclopedias like Wikipedia. The characteristics of these benchmarks (M&C, R&G, and Jiang-1) and knowledge sources (WordNet and Wikipedia) make the results on M&C, R&G, and Jiang-1 relatively good. Jiang-2 contains pairs of real-world Wikipedia concepts, and over half of them do not appear in the WordNet taxonomy structure; the correlation coefficients on the Jiang-2 benchmark do not exceed 0.5 in Tables 6 and 7. WordSim-353 is a dataset for measuring semantic relatedness between words (concepts), so the correlation coefficients for the semantic similarity task on it are not good when using WordNet and Wikipedia.

The Pearson correlation coefficients on Jiang-3 are a little better than those on Jiang-4 and Jiang-5, and the Spearman correlation coefficients are much better than those on both Jiang-4 and Jiang-5: the best Spearman correlation coefficient is higher than 0.98 on Jiang-3 but lower than 0.52 on both Jiang-4 and Jiang-5. The first reason may be that the diversity of the concept pairs in Jiang-3 is much higher, since Jiang-3 involves both biomedical ontologies and common knowledge sources, which can provide more semantic information. The second reason may be the lower completeness of the semantic information of the concepts in Jiang-4 and Jiang-5: the information contained in Jiang-4 and Jiang-5 is much more specialized, whereas that in Jiang-3 is much more extensive.

5.3.3 Influence of measures

Three kinds of measures, i.e., IC-based, distance-based, and feature-based measures, are proposed in this paper. For the IC-based measures, the measures SimIC1Mfir, SimIC2Mthi, SimIC3Mfir, and SimIC3Msec perform well on the Pearson results, while the other six measures (SimIC1Mthi, SimIC1Msec, SimIC2Mfir, SimIC2Msec, SimIC3Mthi, and SimIC) do not. Meanwhile, the measures SimIC1Mfir and SimIC2Mthi perform relatively better on the Spearman results than the other eight measures (SimIC1Msec, SimIC1Mthi, SimIC2Mfir, SimIC2Msec, SimIC3Mfir, SimIC3Msec, SimIC3Mthi, and SimIC). These four measures (SimIC1Mfir, SimIC2Mthi, SimIC3Mfir, and SimIC3Msec) involve the three approaches to IC computation and the three IC-based similarity measurement methods introduced in Section 4.1, which illustrates that all the IC-based measures are feasible if they are adopted appropriately. The measure SimIC3Msec outperforms the other measures with Pearson correlation coefficients of 0.805, 0.740, and 0.723 on M&C, R&G, and Jiang-1, respectively. For the Spearman results, the measure SimIC2Mthi outperforms the other measures with 0.644 on M&C and 0.763 on Jiang-3. These results confirm that statistical similarity measures such as IC-based measures are effective on multiple heterogeneous taxonomy structures.

For the distance-based measures, all measures obtain good correlation coefficients except SimDis1M. A major reason for the poor result of SimDis1M is that the depths of the knowledge sources differ greatly from each other: the same shortest path length considered in Definition 19 represents different similarity values in different knowledge sources. Figure 11 shows an example illustrating the different cases of the same shortest path length for two concept pairs (A, B) and (A′, B′). The shortest path between A and B in Ti and Tj has length 4, as does the shortest path between A′ and B′, but the pairs are not equally similar since the maximum depths of Ti and Tj are 10 and 5, respectively. Thus, path lengths in different taxonomy structures carry different semantic meanings. In contrast, the measure SimDis5M obtains Pearson correlation coefficients of 0.822, 0.738, 0.442, 0.699, 0.567, and 0.416 on the M&C, R&G, WordSim-353, Jiang-1, Jiang-4, and Jiang-5 benchmarks, respectively. These results show the feasibility of computing the semantic similarity of concepts from the distances among them on multiple taxonomy structures.

Fig. 11 Taxonomy structures Ti and Tj

Most of the correlation coefficients of the feature-based measures listed in Tables 6 and 7 are positive. The measures SimFea1M and SimFea2M obtain nearly the same correlation coefficients on all benchmarks, which indicates that the choice between the Jaccard and Dice set operations in the feature similarity computation influences the results only slightly. For SimFea3M, both the Pearson and the Spearman results are relatively lower than those of SimFea1M and SimFea2M. The measures SimFea4M, SimFea5M, and SimFea6M combine the features from multiple knowledge sources before computing similarities. However, comparing the performance of SimFea1M and SimFea4M on all benchmarks, we find a significant decrease from the former to the latter on M&C, R&G, and Jiang-1, but an increase on Jiang-3 and Jiang-5. Analogous decreases also appear in the measure pairs (SimFea2M, SimFea5M) and (SimFea3M, SimFea6M). This illustrates that computing similarity on aggregated features from multiple knowledge sources does not always perform better than considering the features of each knowledge source separately.

5.3.4 Our most generic and flexible approaches

The similarity computed by the measure SimIC depends entirely on the maximum similarity of the other nine IC-based measures (SimIC1Mfir, SimIC1Msec, SimIC1Mthi, SimIC2Mfir, SimIC2Msec, SimIC2Mthi, SimIC3Mfir, SimIC3Msec, and SimIC3Mthi). To reduce the deviations among different IC-based measures, we normalize the similarities of each measure before computing SimIC. However, the resulting correlation coefficients in Tables 6 and 7 are not good on any benchmark, which means it is improper to compare different IC-based measures by their similarity values and to set SimIC to the maximum similarity. The major reason may be that similarity values produced by different measures are not directly commensurable.

Similarly to SimIC discussed above, the composite measures SimDis and SimFea also have lower correlation coefficients than individual measures such as SimDis5M and SimFea2M. This shows that simply taking the maximum over different semantic similarity measures can hardly improve the performance of similarity computation.

6 Conclusion

The final goal of computerized similarity measures is to accurately mimic human judgments about semantic similarity. Similarity measures are currently used in many areas such as natural language processing, information retrieval, and word sense disambiguation. In this paper, some limitations of the existing similarity measures are identified (see Section 1): for example, there is no unified framework for existing methods, and existing approaches cannot compute the similarity of two concepts that come from two different knowledge sources. To tackle these problems, this paper presents an extensive study of the semantic similarity of concepts and a unified framework for semantic similarity computation. Based on our framework, we give some generic and flexible approaches to semantic similarity measures resulting from instantiations of the framework. In particular, by introducing multiple knowledge sources we obtain some new similarity measures that existing methods cannot provide. The evaluation, based on three widely used benchmarks and five benchmarks developed by ourselves, supports our intuitions with respect to human judgments. Some methods proposed in this paper have a good correlation with human judgments and constitute effective ways of determining the semantic similarity between concepts.

With the development of deep learning, semantic similarity measures can also be implemented by exploiting techniques such as long short-term memory (LSTM) networks and attention-based approaches combined with Word2Vec. As future work, we plan to further explore semantic similarity computation using deep learning technologies. In addition, we will theoretically and empirically investigate a unified framework for the semantic relatedness of concepts. It is also desirable to apply our similarity measures to text or short-text search tasks (semantic search for texts or short texts).