
1 Introduction

One of the most important and complex issues in artificial intelligence is the representation and processing of knowledge. The basic models of knowledge representation are production rules, frames, logic, and semantic networks. An intelligent system can be developed using any of these models; therefore, before starting development, we must choose the proper model [1].

The first production model was offered by Post in 1943. It is based on rules that represent knowledge in the form of sentences like "If (condition) then (action)". The production model has the disadvantage that once a sufficiently large number of productions (several hundred) has accumulated, they begin to contradict each other [2].

A frame is a method of knowledge representation in artificial intelligence: a scheme of action in a real situation. Marvin Minsky originally introduced the term "frame" in the 1970s to denote a knowledge structure for the perception of spatial scenes. It is an abstract model of an image, the minimal possible description of the essence of any object, phenomenon, event, situation, or process [3].

The main idea behind logical models of knowledge representation is that all the information needed by an application is viewed as a collection of facts and statements presented as formulas in some logic. Knowledge is stored as a set of such formulas, and the generation of new knowledge reduces to the execution of inference procedures. A formal theory underlies logical models of representation [5].

A semantic network is a domain information model in the form of a directed graph. In a semantic network the vertices are the meanings in the database, and the directed arcs express the relations between these meanings. Thus, a semantic network reflects the semantics of the domain in the form of meanings and relationships [5].

In the processing of knowledge, search methods based on the predicate calculus are commonly used (the modus ponens rule, conjunction, disjunction, negation, etc.) [6]. Forward and backward chaining is also used in production-type expert systems (depth-first search strategies, breadth-first search strategies, splitting into sub-tasks, the α-β algorithm, etc.) [7]. Increasingly, knowledge processing methods based on frame systems (demons, attached procedures, inheritance) have come into use [8].

A very interesting approach is to use multidimensional arrays as the repository for the knowledge base. Multidimensional arrays are arrays whose elements are themselves arrays. The definition of a multidimensional array must contain information about the type, the number of dimensions, and the number of elements in each dimension. The elements of a multidimensional array are arranged in memory in ascending order of the rightmost index. Variables, both local and global, can be either simple or indexed structures. Global variables, or globals, as stored data, form the basis of so-called direct access [9]. A global is a data structure, usually multidimensional, stored in a database, which can be processed by different processes in a multiuser environment. The multidimensionality of the data is realized through indexes, which is why we speak of an indexed variable.
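As a minimal illustration (a Python sketch, not Caché itself), a multidimensional global can be modeled as a map from index tuples to values; the class and subscript names here are invented for illustration:

```python
class Global:
    """A multidimensional global modeled as a map from index tuples to values."""

    def __init__(self):
        self.data = {}

    def set(self, *args):
        # The last argument is the value, the rest are the subscripts,
        # mirroring SET ^g(i1, i2) = value.
        *indexes, value = args
        self.data[tuple(indexes)] = value

    def get(self, *indexes):
        return self.data.get(indexes)

    def children(self, *prefix):
        """Direct sub-indexes under a prefix (like iterating one level down)."""
        n = len(prefix)
        return sorted({k[n] for k in self.data if len(k) > n and k[:n] == prefix})


g = Global()
g.set("111587", "weight", "2 kg")
g.set("111587", "material", "iron")
print(g.get("111587", "weight"))   # 2 kg
print(g.children("111587"))        # ['material', 'weight']
```

The nested subscripts give the same sparse, tree-shaped addressing that the paper relies on for storing frames.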

Global data is stored in a B*-tree. A tree that has the same number of sub-levels in each of its sub-trees is called balanced (a balanced tree, hence B-tree). A tree in which each key points to the data block containing the required entry is called a B*-tree; it enables the integration of the pointer area and the data area. A B*-tree can be thought of as a network of graphs. A hypergraph H(V, E) is a pair, where V is the set of vertices, \( V = \{v_i\} \), \( i \in I = \{1, 2, \ldots, n\} \), and E is the set of edges, \( E = \{e_j\} \), \( j \in J = \{1, 2, \ldots, m\} \); each edge is a subset of V. A vertex \( {\text{v}} \) and an edge \( {\text{e}} \) are called incident if \( {\text{v}} \in {\text{e}} \). For \( {\text{v}} \in {\text{V}} \), \( {\text{d}}({\text{v}}) \) denotes the number of edges incident to \( {\text{v}} \); \( {\text{d}}({\text{v}}) \) is called the degree of the vertex \( {\text{v}} \). The degree of an edge \( {\text{e}} \), denoted \( {\text{r}}({\text{e}}) \), is the number of vertices incident to that edge. A hypergraph \( {\text{H}} \) is r-homogeneous if all its edges have the same degree \( {\text{r}} \). An ontology is a comprehensive and detailed formalized view of a specific domain using a conceptual framework consisting of concept instances (classes), attributes (properties), functions (operations), axioms (facts), and links. To construct the ontological model we will use an extended base semantic hypergraph (XBSH). The nodes in these graphs represent the semantic attributes (properties and functions) of objects or entities, and the arcs represent the relationships between them. Note that the XBSH structure is similar to the object-oriented programming paradigm; therefore XBSH can be used to describe application software that can answer customer questions about the knowledge base.
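The hypergraph definitions above can be sketched directly; the vertices and edges below are invented for illustration:

```python
# A hypergraph H(V, E): each edge is a subset of V. d(v) counts the edges
# incident to vertex v, and r(e) is the degree (size) of an edge.

V = {"v1", "v2", "v3", "v4"}
E = {"e1": {"v1", "v2", "v3"}, "e2": {"v2", "v4"}}


def d(v):
    """Degree of vertex v: the number of edges incident to v."""
    return sum(1 for edge in E.values() if v in edge)


def r(e):
    """Degree of edge e: the number of vertices incident to e."""
    return len(E[e])


def is_r_homogeneous():
    """H is r-homogeneous when all edges have the same degree r."""
    return len({r(e) for e in E}) == 1


print(d("v2"), r("e1"), is_r_homogeneous())  # 2 3 False
```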
Concepts in the hypergraph structure are described in trees that are converted into mathematical formulas. There are a number of papers that describe semantic graphs. Zhen L. and Jiang Z. describe a model of a semantic hypergraph as a "hypergraph based on the semantic web" that can provide more sophisticated semantic networks and a more efficient data structure for storing knowledge in repositories.

Weights of the vertices of the extended base semantic hypergraph: \( K = \{k_a\} \), \( a \in A = \{1, 2, \ldots, b\} \), where \( k_a = \{S_a, V_a, E_a\} \); \( S_a = \bigcup\limits_{j = 1}^{ks} {s_{j}^{a} } \) is the set of properties of the class, \( V_a = \bigcup\limits_{j = 1}^{kv} {v_{j}^{a} } \in k_a \) is the set of class instances, and \( E_a = \bigcup\limits_{j = 1}^{ke} {e_{j}^{a} } \in k_a \) is the set of semantic arcs incident to the class; here ks is the number of properties of the class, kv the number of class instances, and ke the number of semantic arcs incident to the class. A vertex-instance can be presented as a triple \( v_i = \{k_i, S_i, E_i\} \), where \( k_i \) is the parent class, i.e. \( v_i \in k_i \); \( S_i = \bigcup\limits_{j = 1}^{ks} {s_{j}^{i} } \) is the set of instance properties, and \( E_i = \bigcup\limits_{j = 1}^{ke} {e_{j}^{i} } \in v_i \) is the set of semantic arcs incident to the instance; here ks is the number of instance properties and ke the number of semantic arcs incident to the class instance. We have changed this triple, \( k_a = \{S_a, V_a, E_a\} \), to the quintuple \( k_a = \{S_a, F_a, I_a, V_a, E_a\} \), where \( F_a \) is the set of class functions and \( I_a \) the set of encapsulations in the class. A class vertex-instance can then be presented as the quintuple \( v_i = \{k_i, S_i, F_i, I_i, E_i\} \), where \( F_i \) is the set of instance functions and \( I_i \) the set of instance encapsulations.
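A minimal sketch of these quintuples as data structures; the Python class and field names are illustrative, the paper itself defines only the mathematical structure:

```python
from dataclasses import dataclass, field


@dataclass
class XBSHClass:
    """The class quintuple k_a = {S_a, F_a, I_a, V_a, E_a}."""
    name: str
    properties: set = field(default_factory=set)      # S_a
    functions: set = field(default_factory=set)       # F_a
    encapsulations: set = field(default_factory=set)  # I_a
    instances: list = field(default_factory=list)     # V_a
    arcs: set = field(default_factory=set)            # E_a


@dataclass
class XBSHInstance:
    """The instance quintuple v_i = {k_i, S_i, F_i, I_i, E_i}."""
    parent: XBSHClass                                 # k_i, so that v_i ∈ k_i
    properties: dict = field(default_factory=dict)    # S_i
    functions: set = field(default_factory=set)       # F_i
    encapsulations: set = field(default_factory=set)  # I_i
    arcs: set = field(default_factory=set)            # E_i


device = XBSHClass("device", properties={"weight", "material"})
d1 = XBSHInstance(device, properties={"weight": "2 kg"})
device.instances.append(d1)  # register v_i in its parent class
```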

The extension of the semantic hypergraph has the same character as the extension of a context-free grammar to an attribute grammar. That is, if G = (N, T, P, S) is a context-free grammar, an attribute grammar is defined as AG = (N, T, P, S, AS, AI, R), where N is the set of nonterminals, T is the set of terminals (disjoint from N), P is the set of rules, S is the initial nonterminal, AS is a finite set of synthesized attributes, AI is a finite set of inherited attributes (disjoint from AS), and R is a finite set of semantic rules.

Attribute grammars were an entirely new mathematical tool that describes not only the structure of language units but also their attributes (semantic features); this is a well-known classical scientific result of Donald Knuth. The extension of the semantic hypergraph should be understood in the same way: we have added two new components that describe the set of functions of a class and the set of class encapsulations. Since there is no established name for this extended semantic hypergraph, let us call it an extended base semantic hypergraph. Knowledge stored in globals we will handle both as structures and as an ontology based on the XBSH.

2 Knowledge Presentation

A global is a data structure in the InterSystems Caché OODB, usually multidimensional, stored in a database, which can be processed by various processes in a multiuser environment. To evaluate globals, we develop a catalog of electronic devices. For this we introduce the global ^device, which on the first level holds the article, ^device(article), and on the second level the device properties, ^device(article, property); the property level may comprise n sublevels of device properties, ^device(article, property, subproperty, …, n). An article can also have functions: the key string *func* marks this branch, ^device(article, *func*), and its sublevels contain the functions of the object (in this case the device), ^device(article, *func*, function). As with device properties, the function level may include n sublevels, ^device(article, *func*, function, subfunction, …, n). Thus, we have used the multidimensional data representation provided by globals.

After specifying the properties and functions of the devices, we need to connect the devices by their article numbers. For this purpose the key string "rel" is used: ^device("rel", article1, article2) defines a direction from article1 to article2, so the relation arcs are directed. To set a bidirectional relation arc, an entry of the form ^device("rel", article2, article1) must also be created. The levels of this branch contain the articles of the related devices, and the values of these levels contain the names of the connecting arcs separated by "/".
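A sketch of this layout in Python, with nested dictionaries standing in for Caché globals (the article numbers mirror the example that follows; everything else is illustrative):

```python
# ^device modeled as nested dictionaries. The "rel" branch stores directed
# arcs whose value holds the "/"-separated arc names, as in
# ^device("rel", 111587, 111588) = "SameCompany/MadeOfIron".

device = {
    111587: {"weight": "2 kg", "*func*": {"powerOn": "..."}},
    111588: {"weight": "3 kg"},
    "rel": {111587: {111588: "SameCompany/MadeOfIron"}},
}


def arcs(a, b):
    """Names of the directed arcs from article a to article b."""
    value = device.get("rel", {}).get(a, {}).get(b)
    return value.split("/") if value else []


print(arcs(111587, 111588))  # ['SameCompany', 'MadeOfIron']
print(arcs(111588, 111587))  # [] - the reverse arc must be stored explicitly
```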

For example, consider \( ^{ \wedge } device\left( {''rel'',111587,111588} \right) = ''SameCompany / MadeOfIron'' \); from this the adjacency matrix is prepared and the arc connections are drawn in Fig. 1.

Fig. 1. Adjacency matrix and relations between nodes.

Functions in the global are written as follows:

Thus, we have created globals that store knowledge about the devices. We collect these globals into structured frames in a semantic network in our knowledge base about the devices. In Fig. 2 we gather our globals into a network.

Fig. 2. Representation of knowledge about the devices in structured form.

The semantic network built on frames is very similar to the structure of objects used in the object-oriented programming paradigm. In a semantic network with a complex structure, where properties and functions are multidimensional, it is difficult to perform semantic search. Therefore it is necessary to define a mathematical model for it and then build a semantic search algorithm on top of it. A hypergraph is a pair \( {\text{H}}({\text{V}}, {\text{E}}) \), but since our knowledge includes, in addition to properties and relations, functions with their own internal nesting, a hypergraph of this type does not suit us. Therefore we propose to extend the hypergraph by adding functions, obtaining the extended base semantic hypergraph (XBSH). We use \( H(A, P,F,R) \), where H is the XBSH, A the concept identifiers, P the properties, F the functions, and R the relations (semantic arcs). \( A = \{ A_{1} , A_{2} , \ldots ,A_{n} \} \), \( P = \{ P_{1} , P_{2} , \ldots ,P_{n} \} \), \( F = \{ F_{1} , F_{2} , \ldots ,F_{n} \} \), \( R = \{ R_{1} , R_{2} , \ldots ,R_{n} \} \), where \( P_{1} = \{ P_{11} , P_{12} , \ldots ,P_{1k} \} \), \( P_{2} = \{ P_{21} , P_{22} , \ldots ,P_{2k} \} \), …, \( P_{n} = \{ P_{n1} , P_{n2} , \ldots ,P_{nk} \} \), and likewise \( F_{1} = \{ F_{11} , F_{12} , \ldots ,F_{1k} \} \), \( F_{2} = \{ F_{21} , F_{22} , \ldots ,F_{2k} \} \), …, \( F_{n} = \{ F_{n1} , F_{n2} , \ldots ,F_{nk} \} \). The composition is \( A(x) \circ P(y) \circ F(z) \circ R(w) \). For data processing (finding a path in the graph) we used Dijkstra's algorithm, with the following notation: A is the set of vertices in the graph; R is the set of edges; w is the weight of every edge, in this case the constant 1; a is the vertex from which the search starts; U is the set of visited vertices; d[u] is the length of the shortest path from a to vertex u when the algorithm finishes; l[u] is the shortest path itself from a to vertex u when the algorithm finishes. Algorithm:

Assign \( {\text{d}}\left[ {\text{a}} \right] \leftarrow 0 \), \( {\text{l}}\left[ {\text{a}} \right] \leftarrow {\text{a}} \). For all \( {\text{u }} \in {\text{A}} \) different from \( {\text{a}} \), assign \( {\text{d}}\left[ {\text{u}} \right] \leftarrow \infty \). While \( \exists {\text{v }} \notin {\text{U}} \): let \( {\text{v}} \notin {\text{U}} \) be the vertex with minimum \( {\text{d}}[{\text{v}}] \) and write \( {\text{v}} \) into \( {\text{U}} \); then, for all \( {\text{u}} \notin {\text{U}} \) such that \( {\text{vu}} \in {\text{R}} \), if \( {\text{d}}\left[ {\text{u}} \right] > {\text{d}}\left[ {\text{v}} \right] + {\text{w}} \), set \( {\text{d}}\left[ {\text{u}} \right] \leftarrow {\text{d}}\left[ {\text{v}} \right] + {\text{w}} \) and \( {\text{l}}\left[ {\text{u}} \right] \leftarrow {\text{l}}\left[ {\text{v}} \right],{\text{u}} \).
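The steps above can be sketched in Python for unit edge weights (w = 1); the sample graph is invented for illustration:

```python
import heapq


def dijkstra(graph, a):
    """Shortest unit-weight paths from vertex a; graph maps vertex -> successors."""
    d = {u: float("inf") for u in graph}  # d[u]: shortest distance from a to u
    l = {u: [] for u in graph}            # l[u]: the shortest path itself
    d[a] = 0
    l[a] = [a]
    heap = [(0, a)]
    visited = set()                       # the set U of visited vertices
    while heap:
        _, v = heapq.heappop(heap)        # v: unvisited vertex with minimal d[v]
        if v in visited:
            continue
        visited.add(v)
        for u in graph[v]:
            if d[u] > d[v] + 1:           # w = 1 for every edge
                d[u] = d[v] + 1
                l[u] = l[v] + [u]         # l[u] <- l[v], u
                heapq.heappush(heap, (d[u], u))
    return d, l


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
d, l = dijkstra(graph, "a")
print(d["d"], l["d"])  # 2 ['a', 'b', 'd']
```

Unreachable vertices keep the initial distance \( \infty \), matching the "no path" case reported later in the paper.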

3 Processing Data

For a software implementation of the knowledge base we recorded test data about the devices in the global ^device. The contents can be seen in the system Management Portal, as shown in Fig. 3.

Fig. 3. Display of the globals in the Management Portal.

We gathered the data from the global into frames and translated them into JavaScript Object Notation (JSON) using the Caché Object Script (COS) programming language in order to send the data to the client side; the JSON is shown in Fig. 4.

Fig. 4. Detail of the JSON data.
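The paper performs this serialization in COS; as an analogue, the same step can be sketched in Python with the standard json module (the frame contents below are invented for illustration):

```python
import json

# A hypothetical frame for one device, shaped like the globals described above.
frame = {
    "article": 111587,
    "properties": {"weight": "2 kg", "material": "iron"},
    "functions": ["powerOn"],
    "relations": [{"to": 111588, "arcs": ["SameCompany", "MadeOfIron"]}],
}

payload = json.dumps(frame)       # serialized string sent to the client side
restored = json.loads(payload)    # what the browser-side code would parse
print(restored["properties"]["weight"])  # 2 kg
```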

The data, as frames in JSON format, are then displayed in the client browser. For this we use the JavaScript programming language with the AngularJS framework, as shown in Fig. 5.

Fig. 5. Display of the data as an XBSH.

Next, we test the algorithm described above on the data. The input to the processing algorithm is the name of a vertex. The program finds the shortest path to every vertex connected to it; the result of the algorithm is shown in Fig. 6.

Fig. 6. The result of the traversal algorithm (there is no path to the vertex ^a).

As a rule, a knowledge base processing algorithm should solve problems posing questions like "List all the possible options…", "How many ways are there to…", "Is there any way to…", "Does the object exist…", etc. We plan to improve the knowledge-processing algorithm to that level.

The task of filling and querying the knowledge base is closely associated with NLP problems: it is necessary to convert text in a natural language into the language of the knowledge base. First, an algorithm for normalizing simple sentences is needed, then syntactic and morphological analyzers. We also need an algorithm able to identify concepts in the text, the attributes and actions of these concepts, and the relationships between concepts. The names of knowledge base vertices are nouns. Vertex properties come from articles, adjectives, pronouns, numerals, adverbs, and prepositions. Conjunctions are used for the normalization of sentences. Relations and functions come from verbs (the interactions between concepts, i.e. XBSH vertices). Functions can change property values when something happens, such as a condition being met, and they can work closely with the properties and functions of other vertices. Work on NLP is described below.

4 Evaluation Results

To evaluate the results, we increased the number of vertices (with their properties and functions) and the number of links between them, and measured the time taken by the Dijkstra algorithm described above to visit all the vertices; the results are given in Table 1 and Figs. 7 and 8. The experiments were conducted on a computer with an Intel(R) Pentium(R) CPU B960 at 2.20 GHz and 4 GB of RAM.

Table 1. The results of the experiment.
Fig. 7. Execution time of the algorithm: the X-axis is the number of vertices, the Y-axis the completion time in seconds.

Fig. 8. Execution time of the algorithm: the X-axis is the number of relations between nodes, the Y-axis the completion time in seconds.

As we can see, the first graph (varying the number of vertices) and the second graph (varying the number of relations between nodes) have a similar shape. That is, increasing the number of vertices and increasing the number of relations between nodes change the running time of the algorithm in a similar way, although the two factors do not affect it equally. For example, in experiments 10-13 the number of vertices did not change but the number of relations between them grew, and the processing time increased accordingly. Conversely, in experiments 6-10 the number of vertices increased while the number of relations between them stayed constant, which sometimes even resulted in a lower running time. We then computed the dependence between the changes in the number of vertices, the number of connections between nodes, and the processing time of the algorithm. The most widely known measure of the degree of linear correlation between variables is Pearson's correlation coefficient, defined as \( {\text{r}}_{\text{xy}} = \frac{{\mathop \sum \nolimits ({\text{x}}_{\text{i}} - {\bar{x}})\, \times \,({\text{y}}_{\text{i}} - {\bar{y}})}}{{\sqrt {\mathop \sum \nolimits \left( {{\text{x}}_{\text{i}} - {\bar{x}}} \right)^{2} \, \times \,\mathop \sum \nolimits \left( {{\text{y}}_{\text{i}} - {\bar{y}}} \right)^{2} } }} \), where \( {\text{x}}_{\text{i}} \) is a value of the variable \( {\text{X}} \); \( {\text{y}}_{\text{i}} \) is a value of the variable \( {\text{Y}} \); \( {\bar{x}} \) is the arithmetic mean of \( {\text{X}} \); \( {\bar{y}} \) is the arithmetic mean of \( Y \). We then attempt a performance prediction. The correlation coefficients were computed for the data of the two graphs (Figs. 7 and 8). Pearson's correlation coefficient for the first graph: 0.8737. Pearson's correlation coefficient for the second graph: 0.9458.
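The coefficient can be computed directly from the formula above; the sample data below are illustrative, not the paper's measurements:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient r_xy for paired samples xs, ys."""
    n = len(xs)
    mx = sum(xs) / n                      # arithmetic mean of X
    my = sum(ys) / n                      # arithmetic mean of Y
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den


xs = [1, 2, 3, 4, 5]            # e.g. number of vertices
ys = [1.1, 1.9, 3.2, 3.9, 5.1]  # e.g. running time in seconds
print(pearson(xs, ys))          # close to 1: strong linear correlation
```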
With 100,000,000 (100 million) vertices, the algorithm on a similar machine would complete its work in ≈7,500 s. With 1,000,000 (one million) links between nodes, it would complete in ≈14,000 s. For Dijkstra's algorithm the critical factor is not the number of vertices in the graph but the number of relations between the nodes. The figure of 14,000 s is an excellent result for such modest computing power. This was achieved by structuring the knowledge into frames: the number of nodes was reduced while the semantics retained its great expressive power. For each vertex in the XBSH it is necessary to create a thesaurus, which will be used in smart learning and smart control of knowledge. These data (hypernyms, hyponyms, synonyms, and meronyms) will in the future give us latent semantic links between nodes and their properties; they are needed for advanced knowledge base processing methods.

5 Conclusion

The proposed storage method (XBSH) reduces the number of vertices in a semantic network, thereby increasing computing performance. A given subject area usually contains no more than 1,000 objects and subjects; that is, traversing 1,000 vertices in depth and in width takes little time, from 0.241899 s to 10.054957 s. In future work we propose a new data search method; we will improve the client side for entering data into the knowledge base, improve the knowledge search method (search by name, by properties, and by relations between nodes), connect NLP tools for a self-learning knowledge base (automatic completion of knowledge through analysis of natural-language text), and try to enhance the expressive power of the XBSH (setting weights on relations and properties using the mathematical apparatus of fuzzy logic).