Structural similarity measure between UML class diagrams based on UCG

Yuan, Zhongchen; Yan, Li; Ma, Zongmin

doi:10.1007/s00766-019-00317-w

Original Article
Published: 18 June 2019

Volume 25, pages 213–229, (2020)

Requirements Engineering

Zhongchen Yuan¹,
Li Yan² &
Zongmin Ma²

Abstract

In software reuse, the reuse of UML class diagrams produced in the design phase has received growing attention because of their strong influence on the subsequent development process. Reuse is based on similarity, and the similarity between class diagrams has both semantic and structural aspects. Existing work focuses on semantic similarity, while structural similarity has received little attention. The structure of a class diagram can be divided into two aspects: intra-structure and inter-structure. The intra-structure refers to the composition of each class, and the inter-structure is given by the relationships between classes. The structural similarity measure should therefore cover both aspects. In this paper, we propose a graph named UML class graph (UCG) to represent a class diagram for the structural similarity measure. An algorithm based on the UCG Maximum Common Subgraph Sequence is proposed for the inter-structure similarity measure, and a UCG edit distance is proposed for the intra-structure similarity measure. The experimental results show that the proposed approach is effective both within a domain and across domains.

1 Introduction

Software reuse can save development cost and time and thereby improve the software development process [1]. With the increasing complexity of software, reuse is no longer limited to code but extends to every phase of the software life cycle, including design, testing and even maintenance [2, 3]. Software design has an enormous influence on the subsequent development process [4, 5], so the reuse of software design is promising. Class diagrams produced in the design phase clearly show the static structure of a system by modeling objects and the relationships between them [6]. The reuse of class diagrams has therefore received growing attention [7, 8]. The reuse architecture of class diagrams is shown in Fig. 1.

Fig. 1 shows that the reuse architecture of class diagrams contains four stages. The original class diagrams are retrieved, adjusted and then applied to new projects. The newly developed class diagrams are finally added to the repository for future reuse. Among these stages, retrieval, which is based on a similarity measure, is the key. Existing work on similarity measures focuses on semantics [9]. However, a class diagram contains not only semantics but also structure [10]. Class diagrams for modeling a software system are generally created by a team of developers who may have different experience and knowledge backgrounds. It is therefore common that the created class diagrams are not exactly consistent, even within the same project.

Consider an example. Suppose that the query class diagram shown in Fig. 2a is given as input. With a semantics-based retrieval, the class diagrams containing Fig. 2a or Fig. 2b would both be retrieved from the reuse repository. The retrieved class diagrams may nevertheless have different structures because of their different design concerns: Fig. 2a is a student-centered design, whereas Fig. 2b is a lesson-centered design. It is possible that an application requires only the class diagrams containing Fig. 2a, together with their related artifacts. If the structural information of the query class diagram were taken into account, the class diagrams containing Fig. 2b would not appear in the retrieval list. Consider another example. For the query class diagram shown in Fig. 3, which models the composition of a computer, the reuse repository may not contain any class diagram that models the same project. As a result, a semantics-based retrieval would return no class diagrams. However, the repository may contain structurally similar class diagrams from different projects (e.g., the class diagram modeling a vehicle composition in Fig. 4), which can serve as a useful reference for constructing new related class diagrams. Therefore, in addition to the semantics of class diagrams, the retrieval of class diagrams also needs to consider their structures for structural reuse. The key to structural retrieval for structural reuse is the structural similarity measure.

So far, much attention has been paid to the semantic similarity measure of class diagrams, while little work has been carried out on their structural similarity measure. In this paper, we concentrate on the structural similarity measure of class diagrams. For this purpose, we propose a graph model named UCG (UML class graph) to represent a class diagram. On the basis of the UCG model, we propose algorithms for the structural similarity measure of class diagrams. The main contributions of this paper are summarized as follows.

(1) We propose to consider the reuse of class diagrams from a structural perspective.
(2) We propose a structural similarity measure method for structural reuse, in which a UCG is proposed to represent a class diagram, an algorithm based on the UCG Maximum Common Subgraph Sequence (UMCSS) is proposed for the inter-structure similarity measure, and a UCG edit distance is proposed for the intra-structure similarity measure.
(3) We carry out an experiment to show the effectiveness of the proposed method.

The rest of this paper is organized as follows. The related work is presented in Sect. 2. Section 3 presents the generic procedure of model transformation, formally defining UML class diagram and UML class graph and providing the transformation rules. The structural similarity measure between UML class graphs is proposed in Sect. 4. Section 5 presents an experiment and analyzes the experimental results. Section 6 concludes this paper.

2 Related work

Since the reuse of software artifacts (e.g., code, components and design models) became valued, progress has mainly been made on semantic similarity [11,12,13,14,15,16,17,18,19,20]. In the most commonly used approach, a reusable artifact is described by a few features, each feature is assigned a value, and the similarity between artifacts is then calculated from the differences between features [11, 13, 16,17,18, 20]. The definition and assignment of features is generally a manual process that requires considerable domain knowledge, and searching for reusable artifacts is keyword-based. In [21], a method called case-based reasoning is proposed, in which previous experiences are described as cases (problems and solutions) stored in a case library. Given a query condition, the most similar cases are retrieved and then adapted for reuse in a new project. With the development of the Semantic Web, more ontologies (e.g., WordNet) [22] have been developed and applied to fields such as knowledge engineering and information retrieval [23]. Ontology-based similarity measures are proposed in [24, 25], in which domain and application ontologies are combined to improve the accuracy of the semantic similarity measure [15]. In [15, 19, 20], a relationship is usually represented as a vector of end class and type, and the distance between vectors is used to measure the similarity between relationships; this is essentially a kind of semantic measure and is applicable only within the same project. A few methods have also been proposed for the structural similarity measure [19, 26,27,28,29,30]. In [19, 28], neighborhood information is used to measure the similarity between relationships. In [29], a sequence diagram is represented as a conceptual graph for the similarity measure, in which an object name corresponds to a vertex and a message corresponds to an edge. The matching is then based on the labels of vertices and the names of edges, which falls into the semantic similarity category.
In [30], a state machine diagram is represented as a digraph, and the similarity measure is based on an adjacency matrix representation of the different edges. In [27], a model query language is designed to rewrite a class diagram for structural matching, where a depth-first algorithm is applied to search for the maximum common parts. When the number of relationships contained in the class diagrams is small, this approach can work well because few common substructures exist among them. As the size of the class diagrams increases, there may be more than one common substructure, and using this method to calculate the structural similarity becomes inaccurate. In addition, a text-based representation is not well suited to class diagrams because it does not show their structure intuitively. A graphical and accurate approach is therefore desirable for the structural similarity measure between class diagrams.

The structure of a class diagram can be divided into two aspects: intra-structure and inter-structure. The intra-structure refers to the composition of each class, and the inter-structure is given by the relationships between classes. Both are considered in this paper. We apply a graph [29, 30] to represent a class diagram for the structural similarity measure. The vertices and edges of a UCG are classified into different types, and the structural matching is based on edge tags rather than vertices. A UMCSS-based algorithm is proposed for the inter-structure similarity measure, and a UCG edit distance is proposed for the intra-structure similarity measure. The feature vector methods [11, 13, 16,17,18, 20, 24, 25] and the vertex label methods [29, 30] focus on semantics rather than the actual structure. In contrast to these semantics-based methods, the method proposed in this paper ignores the semantics (end classes) and matches only on the tags of edges. It is therefore structural matching in nature and can be applied to structural reuse both within a domain and across domains. Compared with the model query language method proposed in [27], our method considers further common substructures in addition to the maximum common substructure, which can improve accuracy, especially for the similarity measure between large class diagrams. Additionally, the graphical representation of a class diagram's structure is more intuitive than a text representation.

3 Model transformation

OMG (Object Management Group) defines a standard DTD (Document Type Definition) for UML model files, and a UML model is described in an XMI (XML Metadata Interchange) document based on this DTD standard [31]. The structural similarity measure between class diagrams can be treated as a model matching problem. There are two strategies for solving model matching. The first is to design algorithms directly on the model, and the second is to transform the model into another model and then design algorithms on the new model. Here we choose the latter. In this paper, a graph called UCG is proposed to represent a UML class diagram (denoted as UCD) for the structural similarity measure. The procedure is described in Fig. 5.

This process consists of three steps. The first step, parsing XMI, obtains all elements of the class diagram. Any XML parser based on SAX (Simple API for XML) can be used to parse the XMI model file and obtain the elements (i.e., classes, attributes, operations and relationships) [32]. The elements obtained by parsing prepare the ground for formalizing the class diagram. To transform a UCD into a UCG, the transformation rules need to be defined and the structural information of the UCD must be fully reflected in the UCG. On this basis, the structural similarity between UCDs is converted into the structural similarity between UCGs. Finally, algorithms are proposed for the structural similarity measure.
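The parsing step can be illustrated with a minimal sketch based on Python's standard `xml.sax` module. The element and attribute names used here (`Class`, `Attribute`, `Operation`, `Parameter`, `Relationship`, `returns`, `source`, `target`) are hypothetical simplifications chosen for illustration, as are the data types in the sample; real XMI files use tool- and version-specific tags, so the handler would need to be adapted.

```python
import xml.sax


class ClassDiagramHandler(xml.sax.ContentHandler):
    """Collects classes, attributes, operations, parameters and relationships.

    All tag and attribute names are hypothetical, not real XMI vocabulary.
    """

    def __init__(self):
        super().__init__()
        self.classes = {}        # class name -> {"attributes": [...], "operations": [...]}
        self.relationships = []  # (source class, relationship type, target class)
        self._current_class = None
        self._current_op = None

    def startElement(self, name, attrs):
        if name == "Class":
            self._current_class = attrs["name"]
            self.classes[self._current_class] = {"attributes": [], "operations": []}
        elif name == "Attribute" and self._current_class is not None:
            self.classes[self._current_class]["attributes"].append(
                (attrs["name"], attrs.get("type")))
        elif name == "Operation" and self._current_class is not None:
            self._current_op = {"name": attrs["name"],
                                "returns": attrs.get("returns"),
                                "parameters": []}
            self.classes[self._current_class]["operations"].append(self._current_op)
        elif name == "Parameter" and self._current_op is not None:
            self._current_op["parameters"].append((attrs["name"], attrs.get("type")))
        elif name == "Relationship":
            self.relationships.append((attrs["source"], attrs["type"], attrs["target"]))

    def endElement(self, name):
        if name == "Class":
            self._current_class = None
        elif name == "Operation":
            self._current_op = None


# Simplified stand-in for an XMI file describing the Teacher/Professor fragment.
SAMPLE = b"""<Model>
  <Class name="Teacher">
    <Attribute name="ID" type="int"/>
    <Attribute name="name" type="String"/>
    <Operation name="teach" returns="void">
      <Parameter name="class" type="String"/>
    </Operation>
  </Class>
  <Class name="Professor">
    <Attribute name="degree" type="String"/>
    <Attribute name="title" type="String"/>
  </Class>
  <Relationship source="Professor" type="generalization" target="Teacher"/>
</Model>"""


def parse_model(data: bytes) -> ClassDiagramHandler:
    """Parse the document and return the handler holding the extracted elements."""
    handler = ClassDiagramHandler()
    xml.sax.parseString(data, handler)
    return handler
```

Calling `parse_model(SAMPLE)` yields a handler whose `classes` and `relationships` fields hold the elements needed for the formalization that follows.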

UCD and UCG are formally defined, and then, the transformation rules from UCD to UCG are summarized in the following subsections.

3.1 UML class diagram

A UML class diagram is used to model the static structure of a system and consists of classes and relationships between classes [6]. As an abstract representation of a set of objects with the same properties, a class (Fig. 6) is composed of attributes and operations. Relationships between classes are mainly classified into six categories: association, generalization, dependency, aggregation, composition and realization. The example shown in Fig. 7 is a fragment of a class diagram from an education domain. It contains two classes named “Teacher” and “Professor,” and one generalization relationship, indicating that class “Professor” inherits from class “Teacher.”

Definition 1

We use a 5-tuple to formally define a UML class diagram: UCD = (C, A, O, P, R).

(1) C is a set of classes, where C = {c₁, c₂, c₃, …, c_k} and c_i is a class;
(2) A is a set of attribute sets, where A = {A₁, A₂, …, A_k}, A_i = {a_i1, a_i2, …, a_im} is the set of attributes contained in class c_i, and a_ij is the jth attribute of class c_i;
(3) O is a set of operation sets, where O = {O₁, O₂, …, O_k}, O_i = {o_i1, o_i2, o_i3, …, o_in} is the set of operations contained in class c_i, and o_ij is the jth operation of class c_i;
(4) P is a set of all the parameters, where P = {P₁, P₂, …, P_k}, P_i = {P_i1, P_i2, …, P_in} collects the parameters of all the operations of class c_i, P_ij = {p¹_ij, p²_ij, p³_ij, …, p^t_ij} is the set of parameters of operation o_ij, and p^t_ij is the tth parameter of operation o_ij;
(5) R is a set of relationships, where R = {r_ij | 1 ≤ i, j ≤ |C| and i ≠ j}, r_ij = (c_i, t_x, c_j) is a relationship between classes c_i and c_j, t_x ∊ T is the type of r_ij, and T = {t₁, t₂, t₃, t₄, t₅, t₆} is the set of relationship types. Here t₁, t₂, t₃, t₄, t₅ and t₆ correspond to association, generalization, aggregation, composition, dependency and realization, respectively.

For the class diagram in Fig. 7, the two classes “Teacher” and “Professor” are denoted as c₁ and c₂, respectively; for class “Teacher,” attribute “ID” is denoted as a₁₁, attribute “name” is denoted as a₁₂, operation “teach” is denoted as o₁₁, and parameter “class” is denoted as p¹₁₁; similarly, the attributes “degree” and “title” of class “Professor” are denoted as a₂₁ and a₂₂, respectively; the generalization relationship between class “Teacher” and “Professor” is then denoted as r₂₁, r₂₁ = (c₂, t₂, c₁).
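As an illustration of Definition 1, the 5-tuple can be sketched as a small Python data structure and instantiated with the Fig. 7 fragment described above. The class name `UCD`, the dictionary layout and the `check` helper are assumptions made for this sketch; only the tuple components and the type codes t₁ to t₆ come from the definition.

```python
from dataclasses import dataclass, field

# Relationship types t1..t6 in the order given in Definition 1.
REL_TYPES = {
    "t1": "association", "t2": "generalization", "t3": "aggregation",
    "t4": "composition", "t5": "dependency", "t6": "realization",
}


@dataclass
class UCD:
    """UCD = (C, A, O, P, R), stored with names instead of indices (illustrative)."""
    C: list                                   # class names c_1 .. c_k
    A: dict = field(default_factory=dict)     # class name -> list of attribute names
    O: dict = field(default_factory=dict)     # class name -> list of operation names
    P: dict = field(default_factory=dict)     # (class, operation) -> list of parameter names
    R: list = field(default_factory=list)     # relationships (c_i, t_x, c_j)

    def check(self) -> bool:
        """Verify the side conditions on R: known classes, i != j, t_x in T."""
        known = set(self.C)
        return all(
            src in known and dst in known and src != dst and t in REL_TYPES
            for src, t, dst in self.R
        )


# The Fig. 7 fragment: Professor (c2) generalizes to Teacher (c1), r21 = (c2, t2, c1).
fig7 = UCD(
    C=["Teacher", "Professor"],
    A={"Teacher": ["ID", "name"], "Professor": ["degree", "title"]},
    O={"Teacher": ["teach"], "Professor": []},
    P={("Teacher", "teach"): ["class"]},
    R=[("Professor", "t2", "Teacher")],
)
```

Storing names rather than subscripts keeps the sketch readable; the subscripted notation of the definition (a₁₁, o₁₁, p¹₁₁) corresponds to positions in these lists.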

3.2 UML class graph

A graph is an ordered pair (V, E), where V is a set of vertices and E ⊆ V × V is a set of edges, each connecting two vertices [33]. As a powerful modeling tool, graphs are applied in a range of fields, from computer networks to biomedical science [34]. A core issue in graph applications is model matching [35]. The structure of a UCD is similar to a graph: classes of a UCD correspond to vertices and relationships of a UCD correspond to edges. A graph is therefore chosen to represent a UCD for the structural similarity measure. In this section, we propose a UCG to represent a UCD. Unlike a general digraph, a UCG consists of various types of vertices and edges that correspond to the different elements of a UCD.

Definition 2

A UML class graph is defined as UCG = (V, E, L).

(1) V denotes all vertices of a UCG, where V = CV ∪ AV ∪ OV ∪ PV.
- CV is a set of class vertices and CV = {cv₁, cv₂, …, cv_k}, where cv_i is the ith class vertex.
- AV is a set of sets of attribute vertices and AV = {AV₁, AV₂, …, AV_k}, where AV_i = {av_i1, av_i2, …, av_im} is the set of attribute vertices connected to class vertex cv_i and av_ij is the jth attribute vertex.
- OV is a set of sets of operation vertices and OV = {OV₁, OV₂, …, OV_k}, where OV_i = {ov_i1, ov_i2, …, ov_in} is the set of operation vertices connected to class vertex cv_i and ov_ij is the jth operation vertex.
- PV is a set of all parameter vertices and PV = {PV₁, PV₂, …, PV_k}, where PV_i = {PV_i1, PV_i2, …, PV_in} collects the parameter vertices connected to all operation vertices of class vertex cv_i, PV_ij = {pv¹_ij, pv²_ij, …, pv^f_ij} is the set of parameter vertices connected to the operation vertex ov_ij, and pv^t_ij is the tth parameter vertex.
(2) E denotes all edges of a UCG, where E = AE ∪ OE ∪ PE ∪ RE.
- AE ⊆ CV × AV is a set of attribute edge sets and AE = {AE₁, AE₂, …, AE_k}, where AE_i = {ae_i1, ae_i2, …, ae_im} denotes the set of attribute edges connected to class vertex cv_i and ae_ij = (cv_i, av_ij) is an attribute edge from cv_i to av_ij.
- OE ⊆ CV × OV is a set of operation edge sets and OE = {OE₁, OE₂, …, OE_k}, where OE_i = {oe_i1, oe_i2, …, oe_in} denotes the set of operation edges connected to class vertex cv_i and oe_ij = (cv_i, ov_ij) is an operation edge from cv_i to ov_ij.
- PE ⊆ OV × PV is a set of parameter edges and PE = {PE₁, PE₂, …, PE_k}, where PE_i = {PE_i1, PE_i2, …, PE_in}, PE_ij = {pe¹_ij, pe²_ij, …, pe^f_ij}, and pe^t_ij = (ov_ij, pv^t_ij) is a parameter edge from ov_ij to pv^t_ij.
- RE ⊆ CV × CV is a set of relationship edges and RE = {re_ij | 1 ≤ i, j ≤ |CV| and i ≠ j}, where re_ij = (cv_i, e_x, cv_j) is a relationship edge from cv_i to cv_j, e_x ∊ ET is the tag of re_ij and ET = {e₁, e₂, e₃, e₄, e₅, e₆} is the set of relationship edge tags.
(3) L is a label function, which denotes the label of a vertex, L = L^C + L^A + L^O + L^P. L^C(cv_i), L^A(av_ij), L^O(ov_ij) and L^P(pv^t_ij) denote the labels of class vertex cv_i, attribute vertex av_ij, operation vertex ov_ij and parameter vertex pv^t_ij, respectively.

In a general digraph, vertices are distinguished by their labels and all edges are regarded as identical except for their weights. The vertices and edges of a UCG, however, are of different types (as defined above). Each type of element plays a different role in an object that is composed of several different types of elements. These types of vertices and edges are given different tags in Table 1 to distinguish them from each other.

Table 1 Element tags of UCG

In the real world, the elements that make up an object are usually of multiple types rather than a single type, so modeling tools like UCG have a wide range of applications. Consider an application of UCG to network topology design. In Fig. 8, a higher bandwidth, say e₁, is designed between two key nodes as the backbone, and a relatively low bandwidth, say e_a and e_o, is assigned between a key node and a general node. A class vertex is a key node, while attribute vertices and operation vertices are general nodes, which are distinguished from each other by different colors. In addition, different bandwidths are drawn as edges with different line weights. The same idea can be applied to highway construction planning, where higher-quality roads should be built between key cities and the standards for roads among other cities are less demanding.

3.3 Transformation rules

Transformation rules from UCD to UCG are proposed in this section. Here the UCG is applied for measuring the structural similarity instead of a complete matching. So, we do not consider the multiplicity of relationship here. The related permissions (e.g., public, private, and protected) of attribute and operation are also ignored in this paper. In the following, we present the detailed transformation rules.

Rule 1: class → class vertex
Class c_i in a UCD is transformed into a class vertex cv_i in a UCG, and the name of class c_i becomes the label L^C(cv_i) of cv_i.
Rule 2: attribute → attribute vertex and attribute edge
Attribute a_ij of class c_i in a UCD is transformed into an attribute vertex av_ij in a UCG, and the name of a_ij becomes the label L^A(av_ij) of av_ij. An attribute edge ae_ij is then created between cv_i and av_ij, directed from cv_i to av_ij. The type of attribute a_ij is assigned to the tag e_a of the attribute edge as a mark (e.g., ta₁, ta₂, …, ta_n).
Rule 3: operation (parameter) → operation vertex and operation edge (parameter vertex and parameter edge)
Operation o_ij of class c_i in a UCD is transformed into an operation vertex ov_ij in a UCG. An operation edge oe_ij is then created between cv_i and ov_ij, directed from cv_i to ov_ij. The name of o_ij becomes the label L^O(ov_ij) of the operation vertex ov_ij, and the return type of operation o_ij is assigned to the tag e_o of operation edge oe_ij as a mark (e.g., rt₁, rt₂, …, rt_n). Unlike an attribute, an operation may contain parameters. A parameter is defined by both a name and a type. A parameter is handled in a similar way to an attribute, except that the parameter edge is created between an operation vertex and a parameter vertex. Parameter p^t_ij in a UCD is thus transformed into a parameter vertex pv^t_ij in a UCG. A parameter edge pe^t_ij is then created between ov_ij and pv^t_ij, directed from ov_ij to pv^t_ij. The name of parameter p^t_ij becomes the label L^P(pv^t_ij) of parameter vertex pv^t_ij, and the type of parameter p^t_ij is assigned to the tag e_p of parameter edge pe^t_ij as a mark (e.g., tp₁, tp₂, …, tp_n).
Rule 4: relationship → relationship edge
Relationship r_ij between classes c_i and c_j in a UCD is transformed into a relationship edge re_ij between class vertices cv_i and cv_j in a UCG. The direction and tags of relationship edges are detailed in Fig. 9.
Fig. 9 The direction setting of relationship edges

With the transformation rules, the UCD in Fig. 7 is converted into the UCG in Fig. 10. Different types of vertices are drawn in different colors to distinguish them from each other.
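Rules 1 to 4 can be sketched as a single Python function. This is only an illustrative reading of the rules: the input layout, the vertex identifiers (`cv1`, `av1_1`, ...) and the representation of an edge tag as a `(kind, type)` pair are assumptions of the sketch, the data types in the example are hypothetical, and relationship edges simply keep the direction of the input triple, since the direction settings of Fig. 9 are not reproduced in the text.

```python
# Relationship tags e1..e6, paired with types t1..t6 in the order of Definition 1.
REL_TAGS = {
    "association": "e1", "generalization": "e2", "aggregation": "e3",
    "composition": "e4", "dependency": "e5", "realization": "e6",
}


def ucd_to_ucg(classes, relationships):
    """Return (vertices, edges) of a UCG built from a dictionary-based UCD.

    vertices: vertex id -> (vertex kind, label)
    edges:    list of (source id, tag, target id)
    """
    vertices, edges, class_vertex = {}, [], {}
    for i, (cname, body) in enumerate(classes.items(), start=1):
        cv = f"cv{i}"
        vertices[cv] = ("class", cname)                      # Rule 1
        class_vertex[cname] = cv
        for j, (aname, atype) in enumerate(body.get("attributes", []), start=1):
            av = f"av{i}_{j}"
            vertices[av] = ("attribute", aname)              # Rule 2
            edges.append((cv, ("e_a", atype), av))
        for j, op in enumerate(body.get("operations", []), start=1):
            ov = f"ov{i}_{j}"
            vertices[ov] = ("operation", op["name"])         # Rule 3 (operation)
            edges.append((cv, ("e_o", op["returns"]), ov))
            for t, (pname, ptype) in enumerate(op.get("parameters", []), start=1):
                pv = f"pv{i}_{j}_{t}"
                vertices[pv] = ("parameter", pname)          # Rule 3 (parameter)
                edges.append((ov, ("e_p", ptype), pv))
    for src, rtype, dst in relationships:                    # Rule 4
        edges.append((class_vertex[src], REL_TAGS[rtype], class_vertex[dst]))
    return vertices, edges


# The Teacher/Professor fragment; the attribute and return types are made up.
classes = {
    "Teacher": {
        "attributes": [("ID", "int"), ("name", "String")],
        "operations": [{"name": "teach", "returns": "void",
                        "parameters": [("class", "String")]}],
    },
    "Professor": {"attributes": [("degree", "String"), ("title", "String")]},
}
vertices, edges = ucd_to_ucg(classes, [("Professor", "generalization", "Teacher")])
```

For this fragment the result has eight vertices (two class, four attribute, one operation, one parameter) and seven edges, one of which is the relationship edge tagged e₂ from `cv2` to `cv1`.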

All the elements of a UCD can be transformed into corresponding vertices and edges of a UCG based on the above transformation rules, so that the structure of a UCD is represented as the structure of a UCG. The following is a summary of the model transformation.

For UCD = (C, A, O, P, R):

$$\begin{aligned}
& \forall c_{i} \in C \; (1 \le i \le n) \Rightarrow \exists\, cv_{i} \in CV + L^{C}(cv_{i}) \\
& \forall a_{ij} \in A_{i} \; (1 \le i \le n) \Rightarrow \exists\, av_{ij} \in AV_{i} + L^{A}(av_{ij}) + ae_{ij}(e_{a}) \in AE_{i} \\
& \forall o_{ij} \in O_{i} \; (1 \le i \le n) \Rightarrow \exists\, ov_{ij} \in OV_{i} + L^{O}(ov_{ij}) + oe_{ij}(e_{o}) \in OE_{i} \\
& \forall p_{ij}^{f} \in P_{ij} \; (1 \le i \le n,\ 1 \le j \le |O_{i}|) \Rightarrow \exists\, pv_{ij}^{f} \in PV_{ij} + L^{P}(pv_{ij}^{f}) + pe_{ij}^{f}(e_{p}) \in PE_{ij} \\
& \forall r_{ij}(t_{m}) \in R \; (1 \le i, j \le n) \Rightarrow \exists\, re_{ij}(e_{m}) \in RE
\end{aligned}$$

Then,

$$\begin{array}{*{20}l} {AV = \left\{ {AV_{1} ,AV_{2} , \ldots ,AV_{n} } \right\}} \hfill \\ {OV = \left\{ {OV_{1} ,OV_{2} , \ldots ,OV_{n} } \right\}} \hfill \\ {PV = \left\{ {PV_{1} ,PV_{2} , \ldots ,PV_{n} } \right\}\;\text{and}\;PV_{i} = \left\{ {PV_{i1} ,PV_{i2} , \ldots ,PV_{in} } \right\}} \hfill \\ \end{array}$$

and

$$\begin{array}{*{20}l} {AE = \left\{ {AE_{1} , AE_{2} , \ldots ,AE_{n} } \right\}} \hfill \\ {OE = \left\{ {OE_{1} ,OE_{2} , \ldots ,OE_{n} } \right\}} \hfill \\ {PE = \left\{ {PE_{1} , PE_{2} , \ldots ,PE_{n} } \right\}\;\text{and}\;PE_{i} = \left\{ {PE_{i1} , PE_{i2} , \ldots ,PE_{in} } \right\}} \hfill \\ \end{array}$$

So,

$$\begin{array}{*{20}c} {CV \cup AV \cup OV \cup PV \Rightarrow V} \\ {AE \cup OE \cup PE \cup RE \Rightarrow E} \\ \end{array}$$

and

$$L^{C} + L^{A} + L^{O} + L^{P} \Rightarrow L$$

Hence,

$$\left( {V, \, E, \, L} \right) \Rightarrow UCG$$

4 Structural similarity measure

The inter-structure of a UCG can be thought of as the structure that remains after deleting attribute vertices (edges), operation vertices (edges) and parameter vertices (edges); it corresponds to the skeleton of a class diagram. The inter-structure of a UCG plays a decisive role in the structural similarity measure. The intra-structure of a UCG is expressed by the elements (i.e., attribute vertices, operation vertices and parameter vertices) connected to a class vertex; it corresponds to the composition of a class in a UCD.

The structural similarity measure quantifies the structural difference. The similarity value is limited to [0, 1], where 0 means completely different and 1 means identical. Because a UCG consists of different types of vertices and edges, structures can only be matched and compared among elements of the same type. We have the following correspondences: class vertex to class vertex, attribute vertex (edge) to attribute vertex (edge), operation vertex (edge) to operation vertex (edge), parameter vertex (edge) to parameter vertex (edge), and relationship edge to relationship edge. The structural matching is based on the tags of edges rather than on vertices: the same tag indicates the same structure, and different tags indicate different structures. The structural similarity measure between two UCGs is defined as follows.

$$Sim(g_{1}, g_{2}) = \theta * simInter(g_{1}, g_{2}) + (1 - \theta) * simIntra(g_{1}, g_{2}) \tag{1}$$

Here simInter and simIntra denote the similarity of inter-structure and the intra-structure, respectively, and θ is the weighting factor (θ is limited to [0, 1] and usually close to 0.9).
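Formula (1) is a convex combination of the two partial similarities, which the following minimal sketch makes explicit. The function name, the range checks and the default θ = 0.9 (taken from the remark that θ is usually close to 0.9) are choices made for this illustration, and the input values in the usage note are made up.

```python
def structural_similarity(sim_inter: float, sim_intra: float, theta: float = 0.9) -> float:
    """Formula (1): theta * simInter + (1 - theta) * simIntra, all values in [0, 1]."""
    for name, value in (("sim_inter", sim_inter), ("sim_intra", sim_intra), ("theta", theta)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return theta * sim_inter + (1.0 - theta) * sim_intra
```

With hypothetical inputs simInter = 0.8 and simIntra = 0.5, the default weighting gives 0.9 · 0.8 + 0.1 · 0.5 = 0.77, which shows how strongly the inter-structure dominates the result when θ is close to 0.9.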

4.1 Preliminary knowledge

Maximum Common Subgraph (denoted as MCS) and Edit Distance (denoted as ED) are frequently used in graph matching [36, 37]. The UCG maximum common subgraph and the UCG edit distance are first proposed in this section and then applied to the inter-structure similarity measure and the intra-structure similarity measure, respectively.

4.1.1 UCG maximum common subgraph

Here UCG Maximum Common Subgraph is from the inter-structure of UCG, which is only applied to the inter-structure similarity measure. Obtaining UCG Maximum Common Subgraph is based on the tags of relationship edges, instead of class vertices. Firstly, UCG Maximum Common Subgraph is defined and then UCG Maximum Common Subgraph List and UCG Maximum Common Subgraph Tree are proposed, respectively.

Definition 3 (UCG Maximum Common Subgraph)

Let ucg₁ and ucg₂ be two UCGs. Suppose that g is a UCG with g ⊆ ucg₁ and g ⊆ ucg₂, and that there is no UCG g′ with g′ ⊆ ucg₁, g′ ⊆ ucg₂ and |g′| > |g|, where |g| denotes the number of relationship edges in g. Then g is called a UCG Maximum Common Subgraph (denoted as UMCS) between ucg₁ and ucg₂.

Here, the size of a UMCS is measured by the number of relationship edges it contains. There may be more than one UMCS, especially for larger UCGs. Assume that g₁, g₂, …, g_m are the UMCS between ucg₁ and ucg₂. These UMCS constitute a list called the UMCS List (denoted as UMCSL), and we have UMCSL₁ = {UMCS¹₁, UMCS¹₂, UMCS¹₃, …, UMCS¹_m}, where g_i is denoted as UMCS¹_i. For each UMCS¹_i in UMCSL₁, we can obtain the UMCS between the remainders (ucg₁–UMCS¹_i) and (ucg₂–UMCS¹_i); together they form UMCSL₂ = {UMCS²₁₁, UMCS²₁₂, …, UMCS²_m1, UMCS²_m2, …, UMCS²_mn}. This process is repeated until no UMCS exists between the remainders of ucg₁ and ucg₂. All these UMCSL are inserted into a UMCS Tree, shown in Fig. 11. The UMCS Tree is initialized with an empty root node.

4.1.2 UCG edit distance

The basic idea of graph edit distance comes from string edit distance [38]: it is the minimum total cost of the edit operations needed to transform one graph into another. The edit distance between two graphs g₁ and g₂ is defined as follows.

$$\text{GED}(g_{1}, g_{2}) = \min_{1 \le j \le m} \sum_{i = 1}^{k} cost(e_{i}), \quad e_{1}, \ldots, e_{k} \in p_{j}(g_{1}, g_{2}) \tag{2}$$

Here, cost (e_i) denotes the cost of edit operation e_i and p_j (g₁, g₂) denotes an edit path for transforming g₁ into g₂. There may be multiple edit paths for transforming g₁ to g₂ and the edit distance is to find the path whose edit cost is the least. A standard set of edit operations generally includes insertion, deletion and substitution of both vertices and edges. In this paper, UCG edit distance is proposed and applied to the intra-structure similarity measure, in which only two operations are allowed: insertion and deletion. The label of vertex is ignored when the edit distance is calculated. The reason is that we are talking about structure, not semantics. The edit operations of UCG are summarized in Table 2.

Table 2 UCG editing operations

On the basis of Table 2, we define the UCG edit distance as follows.

$$\text{UCGED}(g_{1}, g_{2}) = x_{1} * \text{IC}_{1} + x_{2} * \text{IC}_{2} + x_{3} * \text{IC}_{3} + y_{1} * \text{DC}_{1} + y_{2} * \text{DC}_{2} + y_{3} * \text{DC}_{3} \tag{3}$$

Here, the coefficients x₁, x₂, x₃, y₁, y₂ and y₃ are the numbers of times the corresponding edit operations are applied. Note that the insertion and deletion operations applied to the same kind of object are assigned the same edit cost, that is, IC₁ = DC₁, IC₂ = DC₂ and IC₃ = DC₃. The formula above can therefore be restated as follows.

$$\text{UCGED}(g_{1}, g_{2}) = (x_{1} + y_{1}) * \text{IC}_{1} + (x_{2} + y_{2}) * \text{IC}_{2} + (x_{3} + y_{3}) * \text{IC}_{3} \tag{4}$$

Let us look at an example shown in Fig. 12, where the UCG in Fig. 12a is matched to UCG in Fig. 12b. We calculate the edit distance from UCG in Fig. 12a to UCG in Fig. 12b based on the formula (4).

After deleting the operation vertex ov₁₁ and its operation edge oe₁₁, inserting an attribute vertex av₁₂ and its attribute edge ae₁₂ at cv₁, and adding two operation vertices ov₂₁ and ov₂₂ with their operation edges oe₂₁ and oe₂₂ at cv₂, the UCG in Fig. 12a has the same structure as the UCG in Fig. 12b. The edit path is shown from Step (1) to Step (4) in Fig. 13, and the UCG edit distance is UCGED (a, b) = IC₁ + 3IC₂.
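The arithmetic of formula (4) for this example can be checked with a short sketch. Two things are assumptions here: the index order (1 = attribute, 2 = operation, 3 = parameter) is inferred from the worked example, because Table 2 is not reproduced in the text, and the numeric costs are made up. The helper `edit_counts`, which derives insertion and deletion counts from multisets of edge tags, is likewise only one plausible way to obtain the coefficients.

```python
from collections import Counter


def edit_counts(source_tags, target_tags):
    """Insertions and deletions needed to turn one multiset of edge tags into another."""
    src, tgt = Counter(source_tags), Counter(target_tags)
    insertions = sum((tgt - src).values())   # tags missing from the source
    deletions = sum((src - tgt).values())    # tags surplus in the source
    return insertions, deletions


def ucged(x, y, ic):
    """Formula (4): sum over i of (x_i + y_i) * IC_i.

    x, y: insertion and deletion counts; ic: costs per element kind.
    Index order (attribute, operation, parameter) is an assumption.
    """
    return sum((xi + yi) * ci for xi, yi, ci in zip(x, y, ic))


# Counts read off the narrated edit path: one attribute inserted,
# two operations inserted, one operation deleted, no parameter edits.
x = (1, 2, 0)
y = (0, 1, 0)
costs = (1.0, 2.0, 3.0)            # hypothetical IC_1, IC_2, IC_3
distance = ucged(x, y, costs)      # IC_1 + 3 * IC_2 with these costs
```

With the hypothetical costs above, `distance` equals 1.0 + 3 · 2.0 = 7.0, matching the symbolic result IC₁ + 3IC₂.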

4.2 Similarity measure

Similarity is based on the common parts of the objects being matched. Consider an example. Two UCGs g₁ and g₂, transformed from UML class diagrams in an education domain and shown in Fig. 14, have similar structures. To save space, we show only the inter-structure of g₁ and g₂ and omit the labels of the vertices. Note that the same identifiers of class vertices in g₁ and g₂ (e.g., cv₁, cv₂, …, cv₆) do not mean that these vertices are identical. Also to save space, the intra-structures are not drawn; instead, the distributions of attribute vertices (edges) and operation (parameter) vertices (edges) connected to each class vertex in g₁ and g₂ are listed in Tables 3 and 4, respectively. In this section, the inter-structure similarity and the intra-structure similarity are discussed in turn.

Table 3 Distribution of attribute vertices and operation (parameter) vertices in g₁

Table 4 Distribution of attribute vertices and operation (parameter) vertices in g₂

4.2.1 Inter-structure similarity

The UMCS tree provides a way to use common parts to measure the inter-structure similarity. Each path from the root to a leaf node constitutes a UMCS sequence (denoted UMCSS), and a preorder traversal of the UMCS tree yields all UMCSS. We have UMCSS_i = {UMCS¹_j, UMCS²_jp, …, UMCS^w_jp_….k}, where |UMCS¹_j| ≥ |UMCS²_jp| ≥ … ≥ |UMCS^w_jp_….k|. The UMCSS_i with the largest size |UMCSS_i|, as given by formula (6), is chosen to measure the inter-structure similarity between two matched UCGs; more than one UMCSS_i may attain this maximum. The similarity is defined as follows.

$$SimInter\left( ucg_{1}, ucg_{2} \right) = \frac{\max \left( \left| \text{UMCSS}_{1} \right|, \left| \text{UMCSS}_{2} \right|, \ldots, \left| \text{UMCSS}_{n} \right| \right)}{\min \left( \left| ucg_{1} \right|, \left| ucg_{2} \right| \right)}$$

(5)

$$\left| {\text{UMCSS}_{i} } \right| = \sum\nolimits_{{\text{UMCS} \in \text{UMCSS}_{i} }} {\left| {\text{UMCS}} \right|}$$

(6)
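Formulas (5) and (6) can be sketched as a traversal of the UMCS tree. The class `UMCSNode` and the functions `umcss_sizes` and `sim_inter` are hypothetical names introduced for illustration, and each node is assumed to store only the size |UMCS| of its subgraph. The example tree mirrors the worked example later in this section, where one path has sizes 3 and 1; the size 3 for the other first-level UMCS is an assumption (both are maximum common subgraphs), and only min(|g₁|, |g₂|) = 5 is given in the text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class UMCSNode:
    size: int = 0  # |UMCS| stored at this node; 0 for the empty root
    children: List["UMCSNode"] = field(default_factory=list)


def umcss_sizes(node: UMCSNode, prefix: int = 0) -> Iterator[int]:
    """Preorder traversal yielding |UMCSS_i| for every root-to-leaf path (formula 6)."""
    total = prefix + node.size
    if not node.children:
        yield total
    for child in node.children:
        yield from umcss_sizes(child, total)


def sim_inter(root: UMCSNode, size_g1: int, size_g2: int) -> float:
    """Formula (5): the largest |UMCSS_i| divided by the size of the smaller UCG."""
    return max(umcss_sizes(root)) / min(size_g1, size_g2)


# Two first-level UMCS (assumed size 3 each); the second has one child of size 1.
root = UMCSNode(children=[UMCSNode(3), UMCSNode(3, [UMCSNode(1)])])
print(list(umcss_sizes(root)))  # [3, 4]
print(sim_inter(root, 5, 6))    # 0.8 (the 6 is a placeholder; only the minimum matters)
```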

An important task now is to create the UMCS tree. The procedure is described in Algorithm 1.

The UMCS tree t is initialized with an empty (NULL) root node. In Step 1, mcsl is used to store the list of UMCS (UMCSL) between g₁ and g₂. Constructing the UMCS tree is a recursive process of repeatedly obtaining a UMCSL and inserting it into the tree (Step 1 to Step 7) until no further UMCSL can be obtained (Step 10). As Algorithm 1 shows, creating the UMCS tree requires the UMCSL to be obtained first, and we propose Algorithm 2 for this purpose.
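The recursion described for Algorithm 1 might be sketched as follows. This is not the paper's pseudocode: the names `TreeNode`, `remainder` and `build_umcs_tree` are hypothetical, a graph is reduced to a set of tagged relationship edges, and the routine that finds the UMCS list is passed in as a parameter `find_umcsl`. Removing a UMCS is interpreted here as removing its class vertices, so that every edge touching them disappears, which follows the remark in the worked example that a relationship edge depends on the class vertices at both ends.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Tuple

Edge = Tuple[str, str, str]          # (source class vertex, target class vertex, relationship tag)
Graph = FrozenSet[Edge]
UMCS = Tuple[Graph, Graph]           # (matched edges in g1, matched edges in g2)
Finder = Callable[[Graph, Graph], List[UMCS]]


@dataclass
class TreeNode:
    umcs: Optional[UMCS] = None      # None marks the empty root
    children: List["TreeNode"] = field(default_factory=list)


def remainder(g: Graph, matched: Graph) -> Graph:
    """Drop the matched class vertices; edges incident to them no longer exist."""
    gone = {v for (u, w, _tag) in matched for v in (u, w)}
    return frozenset(e for e in g if e[0] not in gone and e[1] not in gone)


def _grow(node: TreeNode, g1: Graph, g2: Graph, find_umcsl: Finder) -> None:
    # find_umcsl must return only non-empty UMCS, otherwise the graphs never shrink.
    for umcs in find_umcsl(g1, g2):  # an empty list ends the recursion
        child = TreeNode(umcs)
        node.children.append(child)
        _grow(child, remainder(g1, umcs[0]), remainder(g2, umcs[1]), find_umcsl)


def build_umcs_tree(g1: Graph, g2: Graph, find_umcsl: Finder) -> TreeNode:
    """Insert every UMCS of the current remainders as a child, then recurse."""
    root = TreeNode()
    _grow(root, g1, g2, find_umcsl)
    return root
```

Passing the finder as an argument keeps the tree construction independent of how the maximum common subgraphs are actually searched.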

Algorithm 2 performs a depth-first search. Here S is a state space that stores the common subgraph between g₁ and g₂ under construction, that is, a fragment of the UMCS to be formed. There may be more than one UMCS, so mcsl is used to store all of them. S and mcsl are initialized as empty (Step 1 and Step 2). Then a relationship edge re_ij from g₁ is considered for addition to S: it is necessary to check whether the common subgraph represented by the current state S can be extended by adding re_ij. If the extension succeeds, the new state S replaces the old one. If the current partial solution is larger than the stored solution, it becomes the new stored solution and is inserted into mcsl (Step 4 to Step 11). saveCurrentMCS, clearMCSL and insertMCSL are three functions that save a UMCS to mcsl, clear mcsl and insert a UMCS into mcsl, respectively. If the current partial solution has the same size as the stored solution and is not already contained in mcsl, it is appended to mcsl as another UMCS (Step 12 to Step 13), and the search for the next UMCS continues. In Step 17, backState(S) restores the previous state of S.

It is well known that finding the MCS of two graphs is an NP-hard problem, but the actual computation time is still acceptable in many applications, because the graphs encountered in practice usually differ from the worst cases of general graphs. For a UCG, the characteristics of nodes and edges can often be used to reduce the search time dramatically [39]. Figure 15 shows the best and worst cases that may occur in the inter-structure similarity measure.

In the best case, each relationship edge of G₁ matches exactly one relationship edge of G₂, as shown in Fig. 15a, and the UMCS is easily obtained. In the worst case, shown in Fig. 15b, all relationship edges in both G₁ and G₂ have the same tag. The UCG then degenerates into a general digraph, and obtaining the UMCS becomes an NP-hard problem. Such a worst case is very unlikely to occur, because a UCG is transformed from a UCD and it is implausible that all relationships in a UCD are the same. Generally, the average number of class vertices of a UCG is not more than 30 [40]. A UCG is therefore not a large graph, and the worst-case running time remains tolerable. The basic idea for obtaining the UMCS in this paper comes mainly from McGregor [36]. The difference is that our UMCS search starts from edges instead of vertices.

We now calculate the inter-structure similarity between g₁ and g₂ in Fig. 14 using the proposed algorithm. First, a UMCS tree is created. It is initialized with a root node that does not contain any vertices or edges. The process is as follows:

(1) Obtaining UMCSL₁ between g₁ and g₂

Two UMCS between g₁ and g₂ can be obtained, which are shown in Fig. 16 as (a) UMCS¹₁ and (b) UMCS¹₂ circled with a dotted rectangle and ellipse, respectively. We have UMCSL₁ = {UMCS¹₁, UMCS¹₂}. All these elements in UMCSL₁ are inserted into UMCS tree.

(2) Searching UMCSL₂ between the remainders of g₁ and g₂

The remainders g₁ − UMCS¹₁ and g₂ − UMCS¹₁, as well as g₁ − UMCS¹₂ and g₂ − UMCS¹₂, are shown in Fig. 17.

The vertices marked by dotted lines have become part of the existing UMCS, such as cv₁ and cv₅ in Fig. 17a. A relationship edge exists only if the class vertices at both of its ends exist. There is no complete relationship edge left in g₁ − UMCS¹₁, although a few unmatched relationship edges remain in g₂ − UMCS¹₁, as shown in Fig. 17b. So, no UMCS exists between g₁ − UMCS¹₁ and g₂ − UMCS¹₁. A UMCS between g₁ − UMCS¹₂ and g₂ − UMCS¹₂ can be found easily; it is circled with a dotted rectangle and denoted UMCS²₂₁ in Fig. 18. That is, UMCSL₂ = {UMCS²₂₁}. The search then stops because no relationship edge is left in the remainder g₁ − UMCS¹₂ − UMCS²₂₁. As shown in Fig. 19, the element of UMCSL₂ is also inserted into the UMCS tree.

Two paths exist in the UMCS tree: UMCSS₁ = {UMCS¹₁} and UMCSS₂ = {UMCS¹₂, UMCS²₂₁}, where |UMCSS₂| > |UMCSS₁|. That is, the inter-structure similarity between g₁ and g₂ is measured by UMCSS₂. Using formulas (5) and (6), the inter-structure similarity is calculated as follows.

$$SimInter\left( g_{1}, g_{2} \right) = \frac{\left| \text{UMCS}_{2}^{1} \right| + \left| \text{UMCS}_{21}^{2} \right|}{\min \left( \left| g_{1} \right|, \left| g_{2} \right| \right)} = (3 + 1)/5 = 0.80$$

The corresponding class vertices matching pairs in the inter-structure similarity are described in Table 5.

Table 5 Class vertices matching pairs in the inter-structure similarity

Here the relationship edges re₂₁ and re₃₁ of g₁ carry the same tag. Matching pairs 2 and 3 can therefore be adjusted so that g₁.cv₂ is matched to g₂.cv₇ and g₁.cv₃ to g₂.cv₄.

4.2.2 Intra-structure similarity

Frequently, more than one UMCSS yields the same inter-structure similarity value. For example, suppose umcss₁ and umcss₂ both exist between ucg₁ and ucg₂ with |umcss₁| = |umcss₂|, as shown in Fig. 20, so that either gives the same inter-structure similarity. In this case, the intra-structure similarity decides which of umcss₁ and umcss₂ is taken as the final answer.

In this paper, we apply the UCG edit distance discussed in Sect. 4.1.2 to the intra-structure similarity measure. The intra-structure similarity builds on the inter-structure similarity, since it is computed over the class vertices matched there. It is captured from three aspects: attribute vertex (edge), operation vertex (edge) and parameter vertex (edge). To limit the intra-structure similarity value to [0, 1], it is defined as follows.

$$\begin{aligned} SimIntra\left( g_{1}, g_{1}^{\prime} \right) & = \alpha * \left( 1 - \frac{\left( x_{1} + y_{1} \right) * \text{IC}_{1}}{\sum_{mcsg_{i} \in g_{1}, mcsg_{j} \in g_{1}^{\prime}} \sum_{\text{AV}_{i} \in mcsg_{i}, \text{AV}_{j} \in mcsg_{j}} \max \left( \left| \text{AV}_{i} \right|, \left| \text{AV}_{j} \right| \right)} \right) \\ & \quad + \beta * \left( 1 - \frac{\left( x_{2} + y_{2} \right) * \text{IC}_{2}}{\sum_{mcsg_{i} \in g_{1}, mcsg_{j} \in g_{1}^{\prime}} \sum_{\text{OV}_{i} \in mcsg_{i}, \text{OV}_{j} \in mcsg_{j}} \max \left( \left| \text{OV}_{i} \right|, \left| \text{OV}_{j} \right| \right)} \right) \\ & \quad + \gamma * \left( 1 - \frac{\left( x_{3} + y_{3} \right) * \text{IC}_{3}}{\sum_{mcsg_{i} \in g_{1}, mcsg_{j} \in g_{1}^{\prime}} \sum_{\text{OV}_{i} \in mcsg_{i}, \text{OV}_{j} \in mcsg_{j}} \sum_{\text{PV}_{ik} \in \text{OV}_{i}, \text{PV}_{jw} \in \text{OV}_{j}} \max \left( \left| \text{PV}_{ik} \right|, \left| \text{PV}_{jw} \right| \right)} \right) \end{aligned}$$

(7)

Here, g₁ and g′₁ are a matching pair in UMCSS_i, taken from ucg₁ and ucg₂, respectively. The parameters α, β and γ are weighting factors (α + β + γ = 1) that give the weight of each part in the intra-structure similarity. Generally, α is close to β and both are larger than γ. They are determined by the importance of the attributes, operations and parameters contained in a class. The edit cost of every operation is set to 1, that is, IC₁ = 1, IC₂ = 1 and IC₃ = 1, so the edit distance is measured only by the number of times each edit operation is applied.

Using formula (7) to calculate the intra-structure similarity of UMCSS₂ in Fig. 19, we obtain the following result.
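The structure of formula (7) can be illustrated with a short sketch. It makes simplifying assumptions that are not part of the paper: each matched class vertex is described only by its total numbers of attribute, operation and parameter vertices, the edit count of each kind is the absolute difference of these totals, and a term with an empty denominator is treated as fully similar. The function name `sim_intra` and the example counts are hypothetical.

```python
from typing import Sequence, Tuple

# (attribute vertices, operation vertices, parameter vertices) of one class vertex
Counts = Tuple[int, int, int]


def sim_intra(
    pairs: Sequence[Tuple[Counts, Counts]],
    weights: Sequence[float] = (0.4, 0.5, 0.1),   # alpha, beta, gamma
    ic: Sequence[float] = (1.0, 1.0, 1.0),        # IC1, IC2, IC3
) -> float:
    """Weighted sum of 1 - (edit cost / sum of larger counts) over the three kinds."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    total = 0.0
    for k, (weight, cost) in enumerate(zip(weights, ic)):
        edits = sum(abs(c1[k] - c2[k]) for c1, c2 in pairs)   # x_k + y_k
        denom = sum(max(c1[k], c2[k]) for c1, c2 in pairs)
        term = 1.0 if denom == 0 else 1.0 - edits * cost / denom
        total += weight * term
    return total


# One hypothetical matched pair: 3 vs 4 attributes, 2 vs 2 operations, 1 vs 2 parameters.
print(sim_intra([((3, 2, 1), (4, 2, 2))]))
```

With unit edit costs each term stays within [0, 1], because the number of edits of a kind can never exceed the sum of the larger counts.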

$$SimIntra\left( g_{1}, g_{2} \right) = 0.4 * 0.8065 + 0.5 * 0.8571 + 0.1 * 0.8500 = 0.8362$$

Here, α, β and γ are set to 0.4, 0.5 and 0.1, respectively. When matching pairs 2 and 3 are adjusted as described above, a different intra-structure similarity value, 0.7895, is obtained. The matching with the larger similarity value, 0.8362, is therefore accepted. The final structural similarity value between g₁ and g₂ is:

$$Sim\left( {g_{1} ,g_{2} } \right) = 0.90*0.8000 + 0.10*0.8362 = 0.8036$$

Here, the weighting factor θ is set to be 0.9.
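The numbers above suggest that the final value is a weighted sum of the two parts, with θ applied to the inter-structure similarity and 1 − θ to the intra-structure similarity. The sketch below encodes that reading; the function name `structural_similarity` is hypothetical, and the candidate selection simply keeps the matching with the larger intra-structure value, as done in the example.

```python
def structural_similarity(sim_inter: float, sim_intra: float, theta: float = 0.9) -> float:
    """Combine the two parts as theta * SimInter + (1 - theta) * SimIntra."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * sim_inter + (1.0 - theta) * sim_intra


# Both candidate matchings share SimInter = 0.80; the larger SimIntra is accepted.
candidates = [0.8362, 0.7895]
best_intra = max(candidates)
print(round(structural_similarity(0.80, best_intra), 4))  # 0.8036
```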

5 Experiment

In this section, we design an experiment to evaluate our proposed approach. A prototype system was implemented in Java and run on a computer (i5 2.5 GHz CPU, 8 GB RAM) running Windows 7. Microsoft SQL Server 2008 is used to store the UML class diagrams for the experiment. The experiment is intended to show that:

(1) our proposed approach is suitable for UML class diagrams with various sizes,
(2) our proposed approach is not limited by the modeling field, and
(3) our proposed approach is more accurate than other methods.

5.1 Experimental data

The class diagrams used in the experiment are from projects developed by software companies, which are divided into two parts: query class diagrams and target class diagrams. We calculate the structural similarity values between query class diagrams and target class diagrams. The description of the class diagrams used in the experiment is shown in Table 6.

Table 6 The description of class diagrams used in the experiment

All query class diagrams are from the same domain, “Education,” and they are classified into two categories by size. The sizes of the query class diagrams in the first category, denoted QC₁, vary from 10 to 15, and those in the second category, denoted QC₂, from 20 to 25. Each category contains 5 query class diagrams. The target class diagrams are partitioned from two different perspectives. Viewed by modeling field, they are divided into two categories of 15 class diagrams each. In the first category, denoted TFC₁, all target class diagrams are from “Education” and describe the same or similar projects as the query class diagrams. In the second category, denoted TFC₂, the target class diagrams are from “Company,” a completely different modeling field, but they are still similar in structure. Viewed by size, the target class diagrams are again divided into two categories of 15 class diagrams each. The size of each target class diagram in the first category, denoted TSC₁, is 10–15, and the sizes in the second category, denoted TSC₂, vary from 20 to 25.

5.2 Results analysis

In the experiment, we applied three structure (relationship) similarity measure methods: semantics-based relationship matching (Semantics for short), model query language-based pattern matching (Query Language for short) and our proposed approach (MCSS for short). The first two methods are described in [15, 27]. Each query class diagram is matched to all target class diagrams, and all the structural similarities are calculated by these three methods. In our proposed MCSS, the weighting factors θ, α, β and γ are set to 0.9, 0.4, 0.5 and 0.1, respectively. In the semantics-based method, the weights of relationship type and end class are both set to 0.5 when a relationship is matched.

To assess these three methods, we also invited five experts, all software engineers with rich experience in software design. The experts were asked to compare the query class diagrams with the target class diagrams and to answer the same question for each comparison: “how structurally similar are these two class diagrams?”. For each comparison, an expert provided a value in [0, 1] indicating the degree of structural similarity of the two class diagrams, where 0 means that the two models are completely different and 1 means that they are completely identical. Given that there are two categories of query class diagrams with 10 query models in total and 30 target models, each expert made 300 comparisons. Finally, we compared the results obtained by the three methods with the results given by the experts. To avoid listing large amounts of data, the similarity values obtained by matching a set of query class diagrams against a given target class diagram are averaged.
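The averaging step can be sketched as below. The function `average_per_target` follows the description above; the second function, `mean_abs_gap`, is only one possible way to express how close a method is to the expert values and is an assumption, since the paper does not name a specific closeness measure. All identifiers and all numbers in the example are hypothetical and are not experimental data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Tuple


def average_per_target(scores: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Average the similarity of all query diagrams matched against each target diagram."""
    by_target = defaultdict(list)
    for (_query, target), value in scores.items():
        by_target[target].append(value)
    return {target: mean(values) for target, values in by_target.items()}


def mean_abs_gap(method: Dict[str, float], expert: Dict[str, float]) -> float:
    """Assumed closeness measure: mean absolute difference to the expert averages."""
    return mean(abs(method[t] - expert[t]) for t in expert)


# Hypothetical (query, target) scores for one method and hypothetical expert averages.
method_scores = {("q1", "t1"): 0.8, ("q2", "t1"): 0.6, ("q1", "t2"): 0.5, ("q2", "t2"): 0.5}
expert_avg = {"t1": 0.75, "t2": 0.5}
print(mean_abs_gap(average_per_target(method_scores), expert_avg))
```

A smaller gap would indicate a method whose averaged values lie closer to the expert judgments.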

For query and target class diagrams from the same modeling field (Figs. 21 and 22), the results obtained by these methods are close, except for individual values. This is easy to understand: because the query and target class diagrams describe the same or similar projects, most structural similarity values are high (≥ 0.5) and only a few are low (≤ 0.3). In particular, Fig. 21 shows that the structural similarity values are almost the same, which can be explained by the small size of the query class diagrams: within the same modeling field, there are no common substructures beyond the maximum common substructure.

Figs. 23 and 24 show, however, that the results of the three methods differ significantly across different modeling fields. The results of the semantics method are significantly smaller than those of the other two methods. The reason is that the semantics method considers both the relationship type and the end classes when a relationship is matched; the low semantic similarity between class names from different modeling domains leads to low similarity values, and most structural similarity values obtained by the semantics method are low (≤ 0.5). The semantics method is therefore severely affected by the modeling field. When the query and target class diagrams are from the same domain, however, it gives almost the same results as the query language method, regardless of the size of the class diagrams being matched.

The query language method, in turn, is affected by the size of the class diagrams being matched. When the matched class diagrams are small and close in size, Fig. 25 shows that the query language method and MCSS give almost the same results. Fig. 26 shows, however, that the two methods differ significantly for large class diagrams, and for some matched pairs the values obtained with MCSS are higher than those obtained with the query language method. The reason is that MCSS considers further common substructures between the matched class diagrams in addition to the maximum common substructure, which is all that the query language method considers. The results of the semantics-based method are not shown here because that method is affected by the modeling domain rather than by the size of the class diagrams.

The experimental results above indicate that our proposed algorithm is applicable to UML class diagrams of different sizes and modeling fields. As shown in Figs. 27 and 28, from either perspective the results obtained by our proposed MCSS are closer to the results given by the experts than those of the other two methods.

6 Conclusions

In software reuse, the reuse of UML class diagrams produced in the design phase has become a major concern. Existing work on the reuse of class diagrams focuses mainly on semantic reuse, and structural reuse has rarely been addressed. This paper approaches the reuse of class diagrams from another angle, namely structure. The core of structural reuse is the structural similarity measure. We propose to use a UML class graph to represent a UML class diagram for the purpose of structural similarity measurement. The structure is considered from two aspects: inter-structure and intra-structure. An algorithm based on UMCSS is proposed for the inter-structure similarity, and the UCG edit distance is proposed and applied to the intra-structure similarity. The experimental results show that our proposed method is effective and closer to the results given by experts. Note that we do not claim this to be a paradigm for conceptual modeling; it is only one available approach.

In future work, we will investigate several issues. First, improving the efficiency of the similarity measure is an important concern. In this direction, filtering on some feature values may reduce the number of comparisons, since UML class diagrams consist of various kinds of relationships. Second, we will consider other methods, such as unit structural matching. A UML class graph can be split into unit structures, and the final structural similarity can then be obtained by merging the similarities of the unit structures. Third, transforming UML class diagrams into other data models (e.g., an XML model) may be a possible route to the structural similarity measure. Finally, to improve the matching accuracy, we will consider combining structural similarity and semantic similarity for reuse.

References

1. Krueger CW (1992) Software reuse. ACM Comput Surv 24(2):131–183
2. Prieto-Diaz R (1993) Status report: software reusability. IEEE Softw 10(3):61–66
3. Prieto-Diaz R (1993) Software reuse: issues and experiences. Am Progr 6(8):10–18
4. Mili H, Mili F, Mili A (1995) Reusing software: issues and research directions. IEEE Trans Softw Eng 22(6):528–562
5. Kim Y, Stohr EA (1998) Software reuse: survey and research directions. J Manag Inf Syst 14(4):113–147
6. Medvidovic N et al (2002) Modeling software architectures in the unified modeling language. ACM Trans Softw Eng Methodol 11(1):2–57
7. Arango G, Schoen E, Pettengill R (1993) Design as evolution and reuse. In: Proceedings of the second international workshop on advances in software reuse, pp 9–18
8. Ali FM, Du W (2004) Toward reuse of object-oriented software design models. Inf Softw Technol 46(15):499–517
9. Adamu A, Zainon WMNW (2016) A review of UML model retrieval approaches. Indian J Sci Technol 9(46):384–390
10. Object Management Group (2005) Unified Modeling Language: Superstructure V2.0
11. Reiss SP (2009) Semantics-based code search. In: Proceedings of the 31st international conference on software engineering, IEEE Computer Society, pp 243–253
12. Kim J et al (2010) Towards an intelligent code search engine. In: Proceedings of the twenty-fourth AAAI conference on artificial intelligence, pp 1358–1363
13. Alnusair A, Zhao T (2010) Component search and reuse: an ontology-based approach. In: Proceedings of 2010 IEEE international conference on information reuse and integration, pp 258–261
14. McMillan C et al (2012) Exemplar: a source code search engine for finding highly relevant applications. IEEE Trans Softw Eng 38(5):1069–1087
15. Robles K et al (2012) Towards an ontology-based retrieval of UML Class Diagrams. Inf Softw Technol 54(1):72–86
16. Salami HO, Ahmed M (2013) Class diagram retrieval using genetic algorithm. In: Proceedings of 12th international conference on machine learning and application, vol 2, pp 96–101
17. Al-Khiaty MAR, Ahmed M (2014) Similarity assessment of UML class diagrams using a greedy algorithm. In: Proceedings of 2014 international computer science and engineering conference (ICSEC2014), IEEE, pp 228–233
18. Al-Khiaty MAR, Ahmed M (2014) Similarity assessment of UML class diagrams using simulated annealing. In: Proceedings of 2014 5th international conference on software engineering and service science, IEEE, pp 19–23
19. Al-Khiaty MAR, Ahmed M (2016) UML class diagrams: similarity aspects and matching. Lect Notes Softw Eng 4(1):41–47
20. Oksana N et al (2015) An approach to compare UML class diagrams based on semantical features of their elements. In: Proceedings of the tenth international conference on software engineering advances, pp 147–153
21. Gomes P et al (2004) Using WordNet for case-based retrieval of UML models. AI Commun 17(1):13–23
22. Miller G (1998) WordNet: an electronic lexical database. MIT Press, Cambridge
23. Kara S et al (2012) An ontology-based retrieval system using semantic indexing. Inf Syst 37(4):294–305
24. Cordi V, Lombardi P, Martelli M, Mascardi V (2005) An ontology-based similarity between sets of concepts. In: Proceedings of WOA, pp 6–21
25. Meng L, Huang R, Junzhong G (2013) A review of semantic similarity measures in wordnet. Int J Hybrid Inf Technol 6(1):1–12
26. Lucrédio D, Fortes RPM, Whittle J (2012) MOOGLE: a metamodel-based model search engine. Softw Syst Model 11(2):183–208
27. Zhang X, Chen H, Zhang T (2012) An UML model query method based on structure pattern matching. In: Proceedings of international conference on trustworthy computing and services. Springer, Berlin, Heidelberg, vol 320, pp 506–513
28. Qiu DH, Li H, Sun JL (2013) Measuring software similarity based on structure and property of class diagram. In: Proceedings of 2013 sixth international conference on advanced computational intelligence, IEEE, pp 75–80
29. Salami HO, Ahmed M (2014) Retrieving sequence diagrams using genetic algorithm. In: Proceedings of 2014 11th international joint conference on computer science and software engineering, IEEE, pp 324–330
30. Ahmed M, Salami HO (2015) Behavior-based retrieval of software. Afr J Comput ICT 8(1):95–102
31. Routledge N, Bird L, Goodchild A (2002) UML and XML schema. In: Proceedings of 2002 thirteenth Australasian database conference DBLP on database technologies, pp 157–166
32. Grose TJ, Doney GC, Brodsky SA (2002) Mastering XMI Java Programming with XMI, XML and UML, vol 20. Wiley, Hoboken
33. Bondy JA, Murty USR (1976) Graph theory with applications, vol 290. Macmillan, London
34. Bunke H (2000) Graph matching: theoretical foundations, algorithms, and applications. Proc Vision Interface 2000:82–88
35. Conte D et al (2004) Thirty years of graph matching in pattern recognition. Int J Pattern Recognit Artif Intell 18(3):265–298
36. Derek G, Gotlieb CC (1970) An efficient algorithm for graph isomorphism. J ACM 17(1):51–64
37. McKay BD (1981) Practical graph isomorphism. J Symb Comput 60(1):94–112
38. Gao X et al (2010) A survey of graph edit distance. Pattern Anal Appl 13(1):113–129
39. Bunke H, Shearer K (1998) A graph distance metric based on the maximal common subgraph. Pattern Recognit Lett 19(3–4):255–259
40. Bunke H, Messmer BT (1995) Efficient attributed graph matching and its application to image analysis. In: Proceedings of international conference on image analysis and processing, Springer-Verlag, vol 974, pp 45–55

Acknowledgements

This work was supported in part by National Natural Science Foundation of China (61772269 and 61370075).

Author information

Authors and Affiliations

School of Software, Northeastern University, Shenyang, 110819, China
Zhongchen Yuan
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
Li Yan & Zongmin Ma

Corresponding author

Correspondence to Zongmin Ma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yuan, Z., Yan, L. & Ma, Z. Structural similarity measure between UML class diagrams based on UCG. Requirements Eng 25, 213–229 (2020). https://doi.org/10.1007/s00766-019-00317-w

Received: 19 October 2018
Accepted: 10 June 2019
Published: 18 June 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s00766-019-00317-w
