1 Introduction

The modern approach to the development of information systems involves the use of intelligent automated tools that support the design process by making it possible to search for and reuse successful architectural solutions based on a common semantic representation of subject and design knowledge. The knowledge in such tools is currently usually represented in the form of ontologies. Ontology development is a long and resource-intensive process that requires specialists with competencies in ontological engineering and software development.

Developers of information systems, as a rule, do not have sufficient knowledge of the subject area of the project. The documents governing the subject area do not always record all the accepted semantic meanings of entities and relationships. Creating a knowledge base that makes it possible to take into account and reuse successful design solutions will reduce the time of design and development, as well as the number of semantic errors. The main problem of using ontologies in the process of developing information systems is the high requirements for knowledge of the internal structure of ontologies.

The importance of formalizing concepts of the subject area for the development of an information system has led to the emergence of special design languages, which include the formalization of domain concepts (entities). The most common design tools are based on the Unified Modeling Language (UML) [1].

Class diagrams in UML notation and Java source code are used as input data for the design support system. UML diagrams are applicable to the description of specific features of information systems, for example:

  • classes that make up the architecture of the information system;

  • tables and relationships of a database schema;

  • properties and characteristics of user computers and servers for describing a physical deployment, etc.

The implementation and use of knowledge-based intelligent systems are relevant in all problem areas nowadays [9, 10]. At the moment, a lot of researchers use the ontological approach for the organization of the knowledge bases of expert and intelligent systems: M. Gao, C. Liu [11], D. Bianchini [12], N. Guarino [3], G. Guizzardi [13], R.A. Falbo [14], G. Stumme [15], N.G. Yarushkina [18], T.R. Gruber [16], A. Maedche [17].

Fernando Bobillo and Umberto Straccia proposed a fuzzy ontology description format [2]. At this stage in the development of information technology, a large number of open source software systems have been created in various subject areas. Reusing modules of open software systems will significantly reduce the time spent on software development.

Currently, the strong influence of the characteristics of the problem area within which software systems are developed leads to the frequent use of the domain-based development methodology (Domain Driven Development, DDD). This methodology is based on the object-oriented programming paradigm and involves various types of testing that check the quality of the source code. However, this methodology does not take into account the correctness of the model from the point of view of the features of the subject area. This type of error control is carried out by project managers and leading developers. Forming an ontological representation of the model makes it possible to detect errors in the perception of the subject area. Knowledge is captured in the ontology in the OWL (Web Ontology Language) [4] format with a predefined TBox.

Class diagrams in UML notation and Java source code are used as input to the design support system. Solving the problem of design support for information systems consists of the following tasks:

  1. building a model for representing an information system design as the content of a knowledge base;

  2. developing a method of ontological indexing of class diagrams in UML notation and of projects by their source code;

  3. developing search methods for effective design solutions in the content of the knowledge base.

2 Software Product Design Model for Representation in the Knowledge Base

Project documentation includes diagrams formalized in UML notation. To solve the problem of the intellectual analysis of design diagrams, it is necessary to formalize the rules of the notation in the knowledge base. Such knowledge allows the identification of design patterns and architectural software solutions used in various projects. This makes it possible to search for projects with similar architectural solutions and approaches to the implementation of modules of information systems.

An ontology in OWL format is used as the basis of the knowledge base of the design process support system. The W3C consortium recommends using the \(\mathcal {SHOINF(D)}\) formalism [5,6,7] for the OWL language group (OWL Lite, OWL DL, OWL Full) as the logical basis of the ontology description language.

In the context of the \(\mathcal {SHOINF(D)}\) description logic, the ontology is a knowledge base of the following form [8]:

$$\begin{aligned} KB = \lbrace TBox, ABox \rbrace , \end{aligned}$$
(1)

where TBox — a set of terminological axioms representing general knowledge about the concepts of the knowledge base and their relationships;

ABox — a set of statements (facts) about individuals.

2.1 TBox Axioms of Information Systems Design Ontology

The terminology of project diagrams is divided into the logical representation of the UML notation and the logical representation of design patterns.

$$\begin{aligned} \begin{array}{l} Relationship \sqsubseteq \top ; \\ Dependency \sqsubseteq Relationship; \\ Association \sqsubseteq Relationship; \\ Generalization \sqsubseteq Relationship; \\ Realization \sqsubseteq Relationship \sqcap \exists \, \\ startWith.Class \sqcap \exists \, endWith.Interface; \end{array} \end{aligned}$$
(2)

where startWith and endWith are roles.

The main classes can be represented:

$$\begin{aligned} \begin{array}{l} Thing \sqsubseteq \top \sqcap \forall \, hasAName.String; \\ StructThing \sqsubseteq Thing; \\ AnnotThing \sqsubseteq Thing; \\ Note \sqsubseteq AnnotThing \sqcap \exists \, connectedTo.Association; \\ Class \sqsubseteq StructThing; \\ Object \sqsubseteq StructThing \sqcap \exists \, \\ isObjectOf.Class \sqcap \forall \, isObjectOf.Class; \\ Interface \sqsubseteq StructThing; \\ SimpleClass \sqsubseteq Class; \\ AbstractClass \sqsubseteq Class, \end{array} \end{aligned}$$
(3)

where hasAName, isObjectOf, and connectedTo are roles;

String is a concrete domain.
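As an illustration only, the subsumption axioms of expressions 2 and 3 can be held in a plain lookup table. The sketch below is an assumption about one possible in-memory form, not the authors' implementation; the names `SUBCLASS_OF` and `is_subconcept` are hypothetical, and role restrictions are deliberately left out.

```python
# Direct subsumption axioms from expressions 2 and 3 (child -> parent).
# Role restrictions (e.g. "exists startWith.Class") are omitted in this sketch.
SUBCLASS_OF = {
    "Dependency": "Relationship",
    "Association": "Relationship",
    "Generalization": "Relationship",
    "Realization": "Relationship",
    "StructThing": "Thing",
    "AnnotThing": "Thing",
    "Note": "AnnotThing",
    "Class": "StructThing",
    "Object": "StructThing",
    "Interface": "StructThing",
    "SimpleClass": "Class",
    "AbstractClass": "Class",
}


def is_subconcept(concept: str, ancestor: str) -> bool:
    """Return True if `concept` equals `ancestor` or is subsumed by it."""
    current = concept
    while current is not None:
        if current == ancestor:
            return True
        # Walk one step up the hierarchy; None means the top was reached.
        current = SUBCLASS_OF.get(current)
    return False
```

With this table, `is_subconcept("SimpleClass", "Thing")` holds because SimpleClass, Class, StructThing, and Thing form a chain, while `is_subconcept("Note", "Class")` does not.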

Class attributes and methods are represented as follows:

$$\begin{aligned} \begin{array}{l} Attribute \sqsubseteq \top \sqcap \exists \, \\ hasAAttrName.String \sqcap \exists \, isAPartOf.Class; \\ Method \sqsubseteq \top \sqcap \exists \, \\ hasAMethName.String \sqcap \exists \, isAPartOf.Class, \end{array} \end{aligned}$$
(4)

where hasAAttrName and hasAMethName — relationships “has attribute/method name”.

Consider the terminology of design patterns associated with the logical representation of design diagram notation (using the UML class diagram as an example):

$$\begin{aligned} \begin{array}{c} Template \sqsubseteq \top \sqcap \exists \, \\ hasATempName.String \sqcap \exists \, hasAExpValue.Double; \\ SomeTemplate \sqsubseteq Template, \end{array} \end{aligned}$$
(5)

where hasATempName — role “has a design pattern name”;

hasAExpValue — role “has value of expression”;

Double — concrete domain.

Each design pattern in each specific project has a certain degree of expression.

The hierarchy of concepts of the developed ontology is presented in Fig. 1.

Fig. 1. The hierarchy of concepts of the developed ontology in the Protégé editor

The hierarchy of properties (DataTypeProperty and ObjectProperty) of the developed ontology is presented in Fig. 2.

Fig. 2. Hierarchy of DataTypeProperty and ObjectProperty of the developed ontology in the Protégé editor

2.2 ABox Axioms of Information Systems Design Ontology

Fig. 3 shows an example of the “Delegation” design pattern, which is represented by the following set of ABox facts:

$$\begin{aligned} \begin{array}{c} class1 :SimpleClass \quad class2 :SimpleClass; \\ attribute1 :Attribute \quad object1 :Object; \\ method1 :Method \quad method2 :Method; \\ relation1 :Association; \\ \left( method1, name1 :String \right) :hasAMethName; \\ \left( method2, name2 :String \right) :hasAMethName; \\ \left( attribute1, class1 \right) :isAPartOf \quad \left( object1, class2 \right) :isObjectOf; \\ \left( object1, attribute1 \right) :owl:sameAs \quad \left( method1, class1 \right) :isAPartOf; \\ \left( method2, class2 \right) :isAPartOf; \\ \left( relation1, class1 \right) :startWith \quad \left( relation1, class2 \right) :endWith. \end{array} \end{aligned}$$
(6)

In the knowledge base ontology, the ABox fact set includes all the facts about the design patterns used. Then, in the indexing process, the facts from ABox are compared with the facts extracted from the design diagrams and the degree of expression for each ontology template is determined.
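For readers who prefer a data-structure view, the facts of expression 6 can be written down as two sets: concept assertions and role assertions. This is a minimal sketch under the assumption that facts are stored as plain tuples; the `ABox` type and the `DELEGATION` constant are hypothetical names, and the role name isAPartOf follows expression 4.

```python
from typing import NamedTuple


class ABox(NamedTuple):
    """A set of facts about individuals (hypothetical container type)."""
    concepts: frozenset  # pairs (individual, concept)
    roles: frozenset     # triples (subject, role, object)


# Facts of expression 6, transcribed one-to-one.
DELEGATION = ABox(
    concepts=frozenset({
        ("class1", "SimpleClass"), ("class2", "SimpleClass"),
        ("attribute1", "Attribute"), ("object1", "Object"),
        ("method1", "Method"), ("method2", "Method"),
        ("relation1", "Association"),
    }),
    roles=frozenset({
        ("method1", "hasAMethName", "name1"),
        ("method2", "hasAMethName", "name2"),
        ("attribute1", "isAPartOf", "class1"),
        ("object1", "isObjectOf", "class2"),
        ("object1", "owl:sameAs", "attribute1"),
        ("method1", "isAPartOf", "class1"),
        ("method2", "isAPartOf", "class2"),
        ("relation1", "startWith", "class1"),
        ("relation1", "endWith", "class2"),
    }),
)
```

Stored this way, the pattern consists of 7 concept assertions and 9 role assertions, which is the kind of count that indexing later compares against the facts extracted from a project.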

Fig. 3. Ontology structure (including design pattern example)

3 Ontological Representation of Design Patterns

Formally, the ontological representation of a design pattern can be written as follows:

$$\begin{aligned} OV^{tmp}_{i} = \lbrace C, D, R^{same\_as} \rbrace , \end{aligned}$$
(7)

where C is the set of individuals in the knowledge base;

D is the set of relationships between the elements of the \(i\)-th design pattern, presented as knowledge base individuals;

\(R^{same\_as}\) is the set of equivalence relationships between knowledge base individuals.

The “Builder” design pattern is one of the most commonly used patterns in industrial software development. “Builder” is a creational design pattern that makes it possible to create complex composite objects. Figure 4 shows the UML class diagram of the Builder design pattern drawn in Visual Paradigm.

Fig. 4. Class diagram of design pattern “Builder”

Representation of the “Builder” design pattern as a fragment of a knowledge base ontology is defined by the following concept instances:

$$\begin{aligned} \begin{array}{c} Builder.Client :SimpleClass \\ Builder.Director :SimpleClass \\ Builder.ConcreteBuilder :SimpleClass \\ Builder.Product :SimpleClass \\ Builder.AbstractBuilder :AbstractClass \\ Builder.Client\_AbstractBuilder :Association \\ Builder.Client\_Director :Association \\ Builder.Client\_IProduct :Association \\ Builder.ConcreteBuilder\_Product :Association \\ Builder.ConcreteBuilder\_AbstractBuilder :Generalization \\ Builder.Product\_IProduct :Realization \end{array} \end{aligned}$$
(8)

The ontology fragment presented above has the form shown in Fig. 5.

Fig. 5. An example of an ontological representation of the Builder design pattern

The function of searching for similar projects in the knowledge base uses a metric to calculate the similarity between designed (in UML notation) and already implemented software projects.

4 Determining the Design Pattern Expression in the Information System Project

To calculate the similarity measure of software projects, the following measure of expression of a design pattern in a software project is proposed:

$$\begin{aligned} \mu ^{prj} \left( tmp_{i} \right) = \frac{ \left| C^{prj} \cap C^{tmp_{i}} \right| + \left| R^{prj} \cap R^{tmp_{i}} \right| }{ \left| C^{tmp_{i}} \right| + \left| R^{tmp_{i}} \right| }, \end{aligned}$$
(9)

where \(\left| C^{prj} \cap C^{tmp_{i}} \right| \) is the number of matching individuals in the ontological representation of the \(i\)-th knowledge base design pattern and the ontological representation of the software project;

\(\left| R^{prj} \cap R^{tmp_{i}} \right| \) is the number of matching relationships in the ontological representation of the \(i\)-th knowledge base design pattern and the ontological representation of the software project;

\(\left| C^{tmp_{i}} \right| \) is the number of individuals in the ontological representation of the \(i\)-th knowledge base design pattern;

\(\left| R^{tmp_{i}} \right| \) is the number of relationships in the ontological representation of the \(i\)-th knowledge base design pattern.
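Expression 9 is a simple ratio and can be checked by hand. The following sketch only restates the formula; the function name `expression_measure` and the sample counts are hypothetical, chosen for illustration.

```python
def expression_measure(matched_individuals: int, matched_relationships: int,
                       template_individuals: int, template_relationships: int) -> float:
    """Expression 9: share of the pattern's facts that are found in the project."""
    total = template_individuals + template_relationships
    if total == 0:
        raise ValueError("the design pattern has no facts")
    return (matched_individuals + matched_relationships) / total


# Hypothetical pattern with 7 individuals and 9 relationships, of which
# 5 individuals and 6 relationships are matched in the project.
mu = expression_measure(5, 6, 7, 9)  # (5 + 6) / (7 + 9) = 0.6875
```

A fully matched pattern gives 1, and a pattern with no matching facts gives 0.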

The number of facts (\(\left| C^{tmp_{i}} \right| \) and \(\left| R^{tmp_{i}} \right| \)) in the ontological representation of the \(i\)-th design pattern \(tmp_{i}\) is determined by counting them directly. To calculate the number of facts (\(\left| C^{prj} \right| \) and \(\left| R^{prj} \right| \)) in the ontological representation of a software project, the following algorithm was developed:

Step 1. Convert the UML class diagram of the project \(prj_{j}\) to a set of facts \(ABox^{prj}\):

$$\begin{aligned} \begin{aligned}&elem^{prj}_{k} :Concept \\&\left( elem^{prj}_{k}, elem^{prj}_{s} \right) :Role, \end{aligned} \end{aligned}$$

where Concept is a concept of the knowledge base, defined in the TBox;

Role is a role, defined in the TBox;

\(elem^{prj}_{k}\) is the \(k\)-th individual of an ontology concept, extracted from the diagram.
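Step 1 can be pictured with a toy converter. The sketch below assumes that the class diagram has already been parsed into a mapping of class names to concepts and a list of association endpoints; this input shape, the function name `diagram_to_abox`, and the generated individual names `relation1`, `relation2`, ... are all hypothetical.

```python
def diagram_to_abox(classes: dict, associations: list):
    """Turn a parsed class diagram into concept and role assertions.

    classes:      {class name: concept name}, e.g. {"Client": "SimpleClass"}
    associations: [(source class, target class), ...]
    Returns (concepts, roles), where concepts maps individuals to concepts
    and roles is a set of (subject, role, object) triples.
    """
    concepts = dict(classes)
    roles = set()
    for index, (source, target) in enumerate(associations, start=1):
        relation = f"relation{index}"
        # Each association becomes its own individual with two role facts.
        concepts[relation] = "Association"
        roles.add((relation, "startWith", source))
        roles.add((relation, "endWith", target))
    return concepts, roles
```

Only associations are handled here; other relationship kinds from expression 2 would be added in the same way.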

Step 2. Define the set of base classes in \(ABox^{prj}\) with respect to the \(i\)-th design pattern \(tmp_{i}\).

A base class is an individual \(elem^{prj}_{k}\) of the concept Class (or of one of its child concepts) from \(ABox^{prj}\) that corresponds to some individual \(elem^{tmp}_{k} :Class\) from \(ABox^{tmp_{i}}\) and for which the number of facts coinciding with the \(tmp_{i}\) pattern is maximal:

$$\begin{aligned} elem^{prj}_{k} :Concept, \quad \left( elem^{prj}_{k}, * \right) :Role, \quad \left( *, elem^{prj}_{k} \right) :Role. \end{aligned}$$
(10)

Step 3. Calculate the number of true facts. A fact is true if there is a correspondence between the class individuals of the design pattern \(tmp_{i}\) and those of the class diagram of the project \(prj_{j}\):

$$\begin{aligned} \forall \, k :elem^{tmp}_{k} \leftrightarrow elem^{prj}_{k}. \end{aligned}$$

This algorithm of the class diagram indexing is performed for each design pattern available in the knowledge base ontology.
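The matching idea behind steps 2 and 3 can be illustrated with a brute-force search. This is a stand-in under stated assumptions, not the authors' procedure: it tries every one-to-one assignment of pattern individuals to project individuals of the same concept instead of first selecting base classes, it ignores child concepts, and it is only practical for small patterns. The function name and the tuple-based fact format are hypothetical.

```python
from itertools import product


def count_matching_facts(template_concepts: dict, template_roles: set,
                         project_concepts: dict, project_roles: set):
    """Return (matched individuals, matched relationships) for the best mapping."""
    individuals = sorted(template_concepts)
    # Candidates: project individuals of the same concept, or None (unmatched).
    candidates = [
        [p for p, c in project_concepts.items() if c == template_concepts[t]] + [None]
        for t in individuals
    ]
    best = (0, 0)
    for choice in product(*candidates):
        used = [p for p in choice if p is not None]
        if len(used) != len(set(used)):
            continue  # the mapping must be one-to-one
        mapping = dict(zip(individuals, choice))
        # A role fact is matched when both mapped ends carry the same role.
        matched_roles = sum(
            1 for s, role, o in template_roles
            if mapping.get(s) is not None and mapping.get(o) is not None
            and (mapping[s], role, mapping[o]) in project_roles
        )
        score = (len(used), matched_roles)
        if sum(score) > sum(best):
            best = score
    return best
```

The two returned counts play the role of the numerator terms of expression 9.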

After calculating the measure of the expression of each selected design pattern in each of the considered software projects, it becomes possible to calculate the measure of similarity between software projects using one of three metrics.

5 Metrics for Measuring Architectural and Semantic Similarity of Software Projects

The first metric calculates the similarity measure from the most strongly expressed design pattern in each of the projects:

$$\begin{aligned} \mu ^{1} \left( prj_{i}, prj_{j} \right) = \bigvee _{tmp_{k} \in \left( prj_{i} \cap prj_{j} \right) } \mu ^{prj} \left( tmp_{k} \right) , \end{aligned}$$
(11)

where \(prj_{i}, prj_{j}\) are the ontological representations of the UML class diagrams of the \(i\)-th and \(j\)-th projects, respectively;

\(\mu ^{prj} \left( tmp_{k} \right) \) is the measure of expression of the \(k\)-th design pattern in a project (expression 9).
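One possible reading of expression 11 is sketched below. The expression does not say how the two per-project values of a shared pattern are combined, so the sketch assumes that the smaller of the two is used before taking the maximum; this choice, the function name `similarity_max`, and the sample values are assumptions.

```python
def similarity_max(mu_i: dict, mu_j: dict) -> float:
    """Maximum over shared patterns of the (assumed) pairwise minimum.

    mu_i, mu_j: {pattern name: measure of expression} for each project.
    """
    shared = mu_i.keys() & mu_j.keys()
    # ASSUMPTION: a shared pattern contributes min(mu_i, mu_j).
    return max((min(mu_i[t], mu_j[t]) for t in shared), default=0.0)


# Hypothetical measures for two projects.
project_a = {"Interface": 1.0, "Builder": 0.4}
project_b = {"Interface": 1.0, "Observer": 0.5}
score = similarity_max(project_a, project_b)  # Interface is full in both
```

Under this reading, any pattern that is fully expressed in both projects drives the metric to 1.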

This metric demonstrates good results for a relatively small number of complex combined design patterns. Such design patterns are based on the subject area and, to a lesser extent, correspond to design patterns in the usual sense of industrial programming.

The second metric extends the first (the expression 11) and takes into account the degree of expression of design patterns that exceeds a certain threshold value. A threshold value of 0.3 was chosen experimentally. If the measure of the expression of the design pattern is less than 0.3, we can conclude that there is no design pattern in the software project, and as a result, such a design pattern should be excluded from consideration:

$$\begin{aligned} \mu ^{2} \left( prj_{i}, prj_{j} \right) = \frac{ \bigvee _{tmp_{k} \in \left( prj_{i} \cap prj_{j} \right) \ge 0.3} \mu ^{prj} \left( tmp_{k} \right) }{ N }, \end{aligned}$$
(12)

where N — number of design patterns with expression measure more than 0.3 for each project.
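The thresholding step can be sketched separately from the metric itself. The code below only selects the patterns that count as present in both projects and derives N from them; treating a value of exactly 0.3 as present is an assumption, as are the function name and the sample values.

```python
THRESHOLD = 0.3  # value chosen experimentally in the text


def patterns_present_in_both(mu_i: dict, mu_j: dict,
                             threshold: float = THRESHOLD) -> list:
    """Patterns whose measure of expression reaches the threshold in both projects."""
    return sorted(
        t for t in mu_i.keys() & mu_j.keys()
        if mu_i[t] >= threshold and mu_j[t] >= threshold
    )


# Hypothetical measures: Builder drops out because it is below 0.3 in project i.
mu_i = {"Interface": 1.0, "Builder": 0.2, "Adapter": 0.5}
mu_j = {"Interface": 0.8, "Builder": 0.9, "Adapter": 0.3}
kept = patterns_present_in_both(mu_i, mu_j)
n = len(kept)
```

Here `kept` contains Adapter and Interface, so N equals 2.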

The third metric is similar to the second metric (expression 12), but imposes an additional condition on the contribution of the measure of expressiveness of the design pattern (\(\tilde{\mu }^{prj}\)) to the measure of architectural similarity of projects:

$$\begin{aligned} \mu ^{3} \left( prj_{i}, prj_{j} \right) = \frac{ \bigvee _{tmp_{k} \in \left( prj_{i} \cap prj_{j} \right) \ge 0.3} \tilde{\mu }^{prj} \left( tmp_{k} \right) }{ N }, \end{aligned}$$
(13)

where \(\tilde{\mu }^{prj} \left( tmp_{k} \right) \) is the weighted measure of expression of the design pattern \(tmp_{k}\) in the software project prj.

The weighted measure of expression \(\tilde{\mu }^{prj} \left( tmp_{k} \right) \) of the \(k\)-th design pattern \(tmp_{k}\) in the project prj is the measure of expression (expression 9) normalized by the number of elements of the design pattern with the largest set of elements:

$$\begin{aligned} \tilde{\mu }^{prj} \left( tmp_{i} \right) = \frac{ \left| C^{prj} \cap C^{tmp_{i}} \right| + \left| R^{prj} \cap R^{tmp_{i}} \right| }{ \bigvee _{tmp_{k} \in ABox} \left( \left| C^{tmp_{k}} \right| + \left| R^{tmp_{k}} \right| \right) }, \end{aligned}$$
(14)

This modification allows taking into account the complexity of the internal structure of the design pattern when calculating the similarity measures of software projects.

A design pattern consisting of 20 elements that is fully expressed in projects \(prj_{i}\) and \(prj_{j}\) will contribute 4 times more weight to \(\mu ^{3} \left( prj_{i}, prj_{j} \right) \) than a design pattern that consists of 5 elements and also has a degree of expression equal to 1.
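The factor of 4 follows directly from expression 14 and can be reproduced with a few lines. The sketch assumes that the largest pattern in the knowledge base has 20 elements; the function name `weighted_expression` and the pattern labels are hypothetical.

```python
def weighted_expression(matched_facts: int, template_sizes: dict) -> float:
    """Expression 14: matched facts divided by the size of the largest pattern."""
    largest = max(template_sizes.values())
    return matched_facts / largest


# Two fully expressed patterns: every fact of each pattern is matched.
sizes = {"large_pattern": 20, "small_pattern": 5}
large = weighted_expression(20, sizes)  # 20 / 20
small = weighted_expression(5, sizes)   # 5 / 20
```

The large pattern gets a weighted measure of 1 and the small one 0.25, which is the fourfold difference.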

6 Experiment in Finding Design Patterns in Public Projects from GitHub

To test the proposed approach to detecting design patterns in projects, an experiment was conducted whose purpose was to search for design patterns in projects hosted on GitHub. For the experiment, information on 10 design patterns was added to the ontology: Delegation, Interface, Abstract superclass, Builder, Adapter, Singleton, Bridge, Façade, Decorator, Observer.

The “vk api” query returned 108 projects that work with the API of the social network “VKontakte”. VKontakte is the most widely used social network in Russia.

The “design patterns” query returned a sample of 6516 projects. This query was needed to verify the operation of the system on projects with an increased content of design patterns.

For testing, the sample was limited to the first 100 projects for each query. Search results for design patterns are presented as bar charts in Figs. 6 and 7.

Fig. 6. Search results for templates among projects received by request “vk api”

Fig. 7. Search results for templates among projects received by request “design patterns”

Selected design patterns differ in the number of elements and the relationships between them. The number of elements varies from 3 to 20.

In this experiment, only projects developed in the Java programming language were considered.

Since the total number of Java projects on GitHub is very large, it was necessary to limit the selection of projects for the experiment. The experiment showed a high frequency of use of the Delegation, Interface, Abstract superclass and Façade patterns. This result is explained by the simple structure of these patterns: a relatively small number of structural elements and, as a result, of relationships. These design patterns may have been used unconsciously by developers, or they may coincide in structure with part of a more complex pattern.

The Builder, Adapter, Bridge, Decorator, and Observer design patterns were found relatively rarely in the control group of projects. The rarity of these patterns is due to their complex structure, which contains a large number of elements.

7 Search Experiments Results for Structurally Similar Software Projects

To determine the measure of structural similarity between two projects, it is necessary to calculate the measure of expression of each design pattern in both projects (expressions 9 and 14).

In this experiment, all projects were downloaded from the open GitHub repository. The projects were selected by the keywords “public API”, “social network”, and “vkontakte”, which makes it possible to determine whether a project belongs to the subject area of working with the API of the social network “VKontakte”: Android-MVP, cordova-social-vk, cvk, DroidFM, VK-Small-API, VKontakteAPI, VK_TEST.

Table 1 shows the measure of expression of each considered design pattern in all projects of the experimental sample. The similarity scores are normalized from 0 to 1.

Table 1. The expression measure of design patterns in projects

The estimates of the similarity measure for the first metric (expression 11) are always equal to 1, because this metric selects the design pattern with the maximum measure of expression for each of the two compared projects. Since, for example, the Abstract superclass, Interface, and Delegation design patterns consist of a relatively small number of elements, they are highly expressed in a large number of projects.

The estimates of the project similarity measure by the second (expression 12) and third (expression 13) metrics are presented in Tables 2 and 3, respectively.

Table 2. Measures of structural similarity of software products in the second metric
Table 3. Measures of structural similarity of software products in the third metric

The degrees of similarity of the projects obtained in these experiments are very high, which can be explained by two features of this experiment. Design patterns with an expression measure of less than 0.3 in at least one of the compared projects were excluded. It is assumed that a design pattern with an expression measure of less than 0.3 is not present in the project. Taking into account design patterns with a small degree of expression would significantly decrease the similarity value for any pair of projects as the number of design patterns grows.

The considered metrics for calculating project similarity are based on a single computational principle and represent its consistent development. The third metric is the most universal for projects and design patterns of different sizes but much more parametrized.

Design patterns can be implemented in projects in various ways. This problem can be solved in two ways:

  1. Using the corporate standard of the company.

  2. When using projects from open sources, forming two or more alternative representations of the design pattern in the ontology and considering them equivalent.
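The second option can be sketched as follows. The text does not say how equivalent representations are combined, so the sketch assumes that the best-matching variant determines the measure of expression of the pattern; the function name and the variant labels are hypothetical.

```python
def best_variant_measure(measures_by_variant: dict) -> float:
    """Return the highest measure of expression among equivalent variants.

    ASSUMPTION: the variants are interchangeable, so the best match wins.
    """
    return max(measures_by_variant.values(), default=0.0)


# Hypothetical measures of two alternative representations of one pattern.
singleton = best_variant_measure({"variant_1": 0.4, "variant_2": 0.9})
```

With this choice, a project is not penalized for implementing a pattern in a way that differs from one particular representation.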

8 Conclusion

In the course of this research, the following results were obtained:

  1. An ontologically-oriented model of the UML diagram language and an ontological model of the design pattern.

  2. Architectural similarity measures for software projects; measures of expressiveness of the design pattern in the considered software products.

  3. An algorithm for transforming a class diagram in UML notation into an ontology of the OWL format.

  4. An algorithm for transforming source code in the Java programming language into an ontology of the OWL format.

Thus, the proposed approach to supporting the design process allows the use of successful design solutions in the development of new software projects, thereby reducing the design process time and increasing the quality of the resulting solutions.