1 INTRODUCTION

Semantic relations in subject areas, and the representation of the resulting taxonomies in digital resources as ontologies, have been studied extensively. Many studies are devoted to data integration aimed at the most complete digital representation of knowledge on the Web. Mathematics, as a tool of many sciences, is especially in demand in the digital environment at all stages of education, from school to university and beyond. For the mathematical sciences and their applications, general educational, training, and professional resources are being created. Professional resources are intended for specialists, but, like the others, the more popular among them can be used to acquire expert knowledge.

Among general ‘‘educational’’ resources, Wikipedia is the most popular and Britannica the most authoritative, since it provides verified encyclopedic information. Educational resources also include numerous specialized manuals and textbooks held in university digital libraries, for example, the library of Lomonosov Moscow State University, the Library for Natural Sciences of the RAS, and many others. Learning resources can include library resources and vice versa, but they can also be developed separately. Among the most developed resources in mathematics are Zentralblatt, Math-Net.Ru, the English version of the Mathematical Encyclopedia edited by I. M. Vinogradov [1], and the Mathematical Encyclopedia of the University of Cambridge. Most of these are integrated resources: they contain dictionaries and thesauri of subject areas together with links to the literature of the subject area. The technology for creating such information resources is based on ontological design. An example of this approach is the system built on the ontology of mathematical knowledge presented in [2, 3, 15, 16]. A subject-area ontology makes it possible to combine various sources and integrate them into the unified content of a digital library. Here, a digital library based on ontological design means the entire set of semantically related data presented in the content of a digital resource [2–8].

This paper considers a methodology for building an ontology of an applied subject area from the content of a digital semantic library. The approach integrates subject-area data into a single semantic structure that combines the links of dictionaries, thesauri, encyclopedias, and actual scientific publications of a thematic journal.

The approach is demonstrated on applications of the basic equations of mathematical physics and special functions to problems of composite materials and structures. Specifically, one of the central problems of elasticity theory, the ‘‘Lame problem’’, was chosen, and the semantic links of articles within the content of the digital library were established. This made it possible to solve one of the main search tasks for an applied subject area: finding a publication using all library metadata, including classifiers, keywords, synonyms, and formulas.

The article describes the general approach to ontological modeling of a subject area and its subdomain and discusses an example of its implementation by means of a digital library. The paper is structured as follows. Section 2 describes the subject area. Section 3 covers the modeling of its subdomain on the basis of a general ontology: Section 3.1 defines the purpose of such modeling, Section 3.2 describes the set of vocabularies used to delimit the subdomain, Section 3.3 defines the thesaurus extension that describes the subdomain, and Section 3.4 describes the approach used to incorporate new data into the general ontology on the basis of its extended thesaurus. Section 4 gives an example of including subdomain data using the extended domain thesaurus, and Section 5 concludes.

2 ABOUT THE SUBJECT AREA OF MATHEMATICS IN THE CONTEXT OF THE DIGITAL LIBRARY

Over several decades of digitalization, the term ‘‘ontology’’ has spread into various fields of knowledge. Researchers and specialists in many fields develop and apply ontologies to represent subject areas in information systems. The term, which originates in philosophy, was adopted by the artificial intelligence research community to describe areas of knowledge formally [9, 10].

A description of knowledge can be regarded as an ontology only if it has the following mandatory properties:

(1) a finite, controlled dictionary of concepts and terms that rules out their ambiguous interpretation; and

(2) a strict hierarchy of subclass relations among the concepts and terms that describe the knowledge of the subject area [9–13].
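Both properties can be checked mechanically on a candidate vocabulary. The Python sketch below assumes two toy structures that are not taken from the paper: a map from each term to the concepts it denotes, and a map from each concept to its parent concepts. A term linked to several concepts violates property (1); a cycle in the parent map violates property (2). Acyclicity is necessary but not sufficient for a strict hierarchy.

```python
from collections import defaultdict

# Hypothetical toy structures, not the LibMeta storage format:
#   term_to_concepts: term -> ids of the concepts it denotes
#   subclass_of:      concept -> ids of its parent concepts


def ambiguous_terms(term_to_concepts: dict[str, list[str]]) -> list[str]:
    """Property (1): a term linked to more than one concept is ambiguous."""
    return sorted(t for t, cs in term_to_concepts.items() if len(set(cs)) > 1)


def has_cycle(subclass_of: dict[str, list[str]]) -> bool:
    """Property (2): a strict subclass hierarchy must contain no cycles.

    Depth-first search with three colours: GREY marks nodes on the current
    path, so reaching a GREY node again means the path has closed a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color: dict[str, int] = defaultdict(int)  # every node starts WHITE

    def visit(node: str) -> bool:
        color[node] = GREY
        for parent in subclass_of.get(node, []):
            if color[parent] == GREY:
                return True
            if color[parent] == WHITE and visit(parent):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(subclass_of))
```

For instance, with hypothetical data in which ‘‘Lame equation’’ is linked both to the equations of elasticity theory and to the ordinary differential equation of the theory of special functions, `ambiguous_terms` flags the term, signalling that the two senses need separate descriptors.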

Ontologies also support inference operations that reveal new semantic relationships, owing to the formal logic languages that underlie them. An important element of an ontology is therefore its set of axioms, which make it possible to derive new knowledge from existing relationships by means of reasoning.
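Inference of this kind can be illustrated with a naive forward-chaining sketch. It hard-codes two RDFS entailment rules for subclassing (transitivity of `subClassOf`, rdfs11, and inheritance of class membership, rdfs9) over plain string triples. The predicate spellings and the example concepts are hypothetical; a real system would use an RDFS/OWL reasoner rather than this loop.

```python
Triple = tuple[str, str, str]


def infer(triples: set[Triple]) -> set[Triple]:
    """Naive forward chaining to a fixpoint with two RDFS-style rules:
      (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)   [rdfs11]
      (x type A),       (A subClassOf B)  =>  (x type B)         [rdfs9]
    """
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "subClassOf"}
        new = {(a, "subClassOf", c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, "type", b) for x, p, a in facts if p == "type"
                for a2, b in sub if a == a2}
        if new <= facts:  # nothing new was derived: fixpoint reached
            return facts
        facts |= new


if __name__ == "__main__":
    # Hypothetical concepts: a generalized Lame problem is a Lame problem,
    # which is an elasticity problem; problem_1 is stated in some paper.
    kb = infer({
        ("GeneralizedLameProblem", "subClassOf", "LameProblem"),
        ("LameProblem", "subClassOf", "ElasticityProblem"),
        ("problem_1", "type", "GeneralizedLameProblem"),
    })
    for fact in sorted(kb):
        print(fact)
```

The derived facts include `("problem_1", "type", "ElasticityProblem")`, which is stated nowhere in the input: this is the ‘‘new knowledge’’ that the axioms make available to search.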

To date, a large number of ontologies covering various aspects of human activity have been developed for various subject areas. This paper considers the scientific field ‘‘Mathematics’’ and the construction of an ontology for one of its applied sections, problems of mathematical physics, based on the available data sources and library content [14, 15].

Note that an important feature of existing ontologies is that they can be reused to create new ones. This work builds on previously developed ontologies and thesauri that describe the scientific knowledge of the subject area ‘‘Mathematics’’ at various levels of detail [2, 14–17]. A generalized domain ontology model makes it possible to focus on the metadata needed to design specific data structures for different scientific subject areas and to identify common approaches to managing and processing these data. This approach makes it possible to structure and link various resources, to extract and contextualize a variety of data from them, and thereby to turn the data into knowledge [9]. The subject area is defined by a general knowledge model within which various data sources can be integrated and which uses taxonomies of concepts and terms verified by recognized experts in the field. The Mathematical Encyclopedia [16] and the ontology of the scientific subject area underlying the LibMeta semantic library [17], together with the industry classifiers MSC [18] and UDC, were used as the basic taxonomy.

3 ABOUT THE METHODOLOGY AND PROBLEMS OF ONTOLOGY CONSTRUCTION IN A PARTICULAR CASE

Despite the long history of ontological design, there is still no single standard for building ontologies; instead, there are various methodologies [22, 23, 31]. They define concepts at different levels of conceptualization but agree on the main sequence of steps needed to build an ontology: defining its purpose, identifying the main top-level concepts and their properties, and identifying the relationships between them.

In practice, once the goals and objectives of the ontology have been defined, three main processes are distinguished:

(1) the collection of data from sources and their ontologization;

(2) the representation of knowledge in the form of taxonomies; and

(3) the implementation and construction of a formal model suitable for machine use in a specific task [23].

3.1 Determination of Goals

The ontology of applications of the subject area is developed to integrate data describing applications of mathematics with the scientific research on these applications reflected in publications. As has been repeatedly noted [2, 4, 10], there is a gap between the representation of knowledge in digital resources and its reflection in bibliographic resources. Examples of integrating mathematical knowledge with publications are Zentralblatt, the English version of the Mathematical Encyclopedia, and the mathematical digital ecosystem OntoMath [19–21], whereas numerous bibliographic resources [2–6, 14] are examples of separate databases. This difference naturally reflects the different goals of these developments. Nevertheless, the scientific community needs precisely this kind of data integration, so that a user working within a digital subject area can also find publications on the chosen topic. This becomes possible in a common semantic library in which the terminological links of the subject area are complemented by links to publications.

This is especially important for interdisciplinary studies, which do not always fit the classical classification and are therefore often hard to find. Modern applications of the classical equations of mathematical physics are such an area; some researchers propose to single them out as a section of ‘‘new applied mathematics’’. As long as this section does not exist and its scope is not even defined, building an ontology of the applied area of mathematics remains an urgent task of ontological design.

Having established the need for an ontology of the applied subject area of mathematical physics, we can state its purpose: to support the use of a digital library of books, journals, publications, and other scientific materials and resources for research and education. On the one hand, these resources form the content of the library; on the other hand, they are sources of knowledge in this area that need to be classified and categorized. Such an ontology will enrich the data with horizontal and vertical semantic links and reveal implicit links, for example, between problems and their applications and between persons associated with these resources. It will also make it possible to enrich already linked resources mutually by assigning missing classifier codes or keywords. Solving this problem within the semantic library leads, in particular, to a thesaurus of the subject area that will expand and fill up as information accumulates in the library.

3.2 Ontologization

The main data sources fall into two broad categories. The first comprises journals and scientific publications, which show how the subject area develops over time; the second comprises textbooks, monographs, dictionaries, and classifiers, which contain the basic terminology of the subject area. As sources of the first type, we used publications of the ‘‘MMKM’’ journal over the past 25 years, as well as thesauri, encyclopedia articles, and publications previously accumulated in the LibMeta library; the corpus contains about 10,000 texts in total. As sources of the second type, we used books, textbooks, monographs, and terminological dictionaries recognized by experts in the subject area:

  • the textbook by academicians of the Russian Academy of Sciences A. A. Samarsky and A. N. Tikhonov [24], from which about 400 concepts describing the main problems were extracted;

  • the classification of academician of the Russian Academy of Sciences L. I. Sedov [25], on which the description of continuum mechanics is based; it is used as a dictionary of basic terms and includes about 1200 terms;

  • the classification of academician of the Russian Academy of Sciences V. V. Vasiliev [26–28] (in English), on which the description of composite mechanics is based; about 2500 terms were extracted from it and used as a dictionary of the subject area under consideration;

  • a glossary of terms from the section ‘‘Fullerenes’’ and related areas [29], which focuses on the usage of terms in Russian-language monographs, educational literature, scientific articles, and electronic sources and contains more than 850 terms in Russian;

  • a glossary of terms from the section ‘‘Nanotubes’’ and related fields [30] with the same focus, containing over 1000 terms in Russian;

  • a dictionary of polymer composites based on GOST 32794-2014, containing about 500 terms presented in several languages.

Note that the works of L. I. Sedov and V. V. Vasiliev are devoted to the equations of elasticity theory, special functions of mathematical physics, and their applications to specific problems. The glossaries based on them are used to account for both historical and modern connections in the selected subject area.

These sources were used to determine the main terms and concepts of the subject area, from which its thesaurus is formed.
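Merging several dictionaries into one term index requires deciding when two entries are ‘‘the same’’ term and remembering where each came from. A minimal sketch, assuming each source is available as a plain list of strings; the source names are hypothetical labels, and the normalization (Unicode NFC, whitespace collapsing, case folding) is deliberately simple.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(term: str) -> str:
    """Map spelling variants of a term to one key: NFC, single spaces, casefold."""
    term = unicodedata.normalize("NFC", term)
    return re.sub(r"\s+", " ", term).strip().casefold()


def merge_dictionaries(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Build {normalized term: names of the sources that contain it}."""
    index: dict[str, set[str]] = defaultdict(set)
    for source_name, terms in sources.items():
        for term in terms:
            index[normalize(term)].add(source_name)
    return dict(index)


if __name__ == "__main__":
    # Hypothetical excerpts standing in for the dictionaries listed above.
    merged = merge_dictionaries({
        "sedov": ["Lame  equation", "stress tensor"],
        "vasiliev": ["lame equation", "laminate"],
    })
    for term, where in sorted(merged.items()):
        print(term, sorted(where))
```

Terms attested in several sources are natural candidates for the core of the thesaurus. No lemmatization is done, so inflected Russian forms of one term remain separate keys; a morphological analyzer would be needed to merge them.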

3.3 Taxonomies

Along with building an ontology, it becomes necessary to restrict it to a specific scientific subject area. To do this, a set of concepts used to describe this subject area is introduced, and the corresponding domain terms are associated with these concepts. Most often, these terms are organized as a taxonomy that supports relationships between them. The structure of this taxonomy can vary in complexity depending on the area being modeled and, if necessary, can be a full-fledged thesaurus with the whole range of relationships. In what follows, we refer to thesauri as the means of organizing concepts (knowledge). Terms organized in this way can be used to process the available resources, and links between concepts and resources arise in the process. The scientific data of a subject area are understood as the set of verified concepts of the scientific subject area together with the identified links between them and resources [11–13].

Note that a thesaurus of the subject area can be produced either by an expert group or by automated means. Compiling domain thesauri, as well as methods for identifying relationships, is beyond the scope of this work.

Figure 1 shows an example of a taxonomy with relationships from a dictionary of special functions. The fragment is written in RDF/XML syntax and reflects the connection of the term with the classifier and the mathematical encyclopedia in the LibMeta library.

Fig. 1. Fragment of the links of the term ‘‘Lame’’ from the semantic library.
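The fragment in Fig. 1 is not reproduced in this text. As a hedged illustration of what such an RDF/XML description can look like, the sketch below serializes one concept with Python's standard `xml.etree.ElementTree`. It assumes the W3C SKOS vocabulary (`skos:Concept`, `skos:prefLabel`, `skos:related`, `skos:notation`) and a hypothetical base URI; LibMeta's actual ontology terms and identifiers may differ.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
BASE = "http://example.org/libmeta/"  # hypothetical base URI, not LibMeta's

ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def concept_to_rdfxml(concept_id: str, labels: dict[str, str],
                      related: list[str], notations: list[str]) -> str:
    """Serialize one thesaurus concept as a SKOS concept in RDF/XML.

    labels:    {language tag: preferred label}
    related:   ids of linked concepts (e.g. encyclopedia articles)
    notations: classifier codes attached to the concept
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    concept = ET.SubElement(root, f"{{{SKOS}}}Concept",
                            {f"{{{RDF}}}about": BASE + concept_id})
    for lang, text in labels.items():
        label = ET.SubElement(concept, f"{{{SKOS}}}prefLabel", {XML_LANG: lang})
        label.text = text
    for other in related:
        ET.SubElement(concept, f"{{{SKOS}}}related",
                      {f"{{{RDF}}}resource": BASE + other})
    for code in notations:
        ET.SubElement(concept, f"{{{SKOS}}}notation").text = code
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(concept_to_rdfxml(
        "lame_equation",
        {"en": "Lame equation", "ru": "Ламе уравнение"},
        ["encyclopedia/lame_function"],
        ["MSC 33E10", "UDC 517.589"],
    ))
```

When the linked concept belongs to a different scheme (the encyclopedia rather than the thesaurus), SKOS provides the mapping properties `skos:relatedMatch` and `skos:exactMatch`, which may describe such cross-resource links more precisely than `skos:related`.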

3.4 Ontology Development

The ontology of the applied subject area of mathematical physics is built on the ontology of the LibMeta semantic library. The ontology of the subject area ‘‘Mathematics’’, including a terminological description based on the ‘‘Mathematical Encyclopedia’’, had already been created. This makes it possible to describe data retrieved from new sources with the existing ontology of information resources (publications, persons, problems, and related formulas) and to represent domain taxonomies and their relationships with the thesaurus.

When defining relationships and concepts of the subject area within the framework of the ontology of the semantic library, three approaches are possible:

  • A ‘‘top-down’’ approach, in which the design of concepts and relationships starts from the top-level concepts.

  • A ‘‘bottom-up’’ approach, in which the design starts from the lowest-level data, which are grouped to form more general concepts, and so on.

  • A ‘‘combined’’ approach, used when the basic concepts of a particular subject area have already been formulated and its data cleaned and partially structured (for example, presented as separate taxonomies), and disparate data within the subject area must be linked while the subject area is enriched and the included resources are specialized and refined.

We used the ‘‘combined’’ approach. The top-level concepts of the subject area and the concepts needed to describe the structure and relationships of its thesaurus had been identified earlier, when the library was designed, and data cleaning was performed as part of preparing the data for loading.

4 REVEALING THE CONNECTIONS OF PROBLEMS AND SOLUTIONS OF APPLIED PROBLEMS WITH THE CONTENT OF THE SEMANTIC LIBRARY

To expand the content of the semantic library with a new subject area, it is necessary to identify the links between the applied area and the concepts of the mathematical encyclopedia, as well as links with the dictionaries, thesauri, and classifiers already accumulated. We considered the subject area of continuum mechanics and its applications to composite materials.

The sources were analyzed, entries of the lexico-semantic index for the equations of elasticity theory were compiled, and an array of formulas for the selected subject area was created.
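As an illustration of the kind of formula such an array holds, the standard Lamé equations of linear isotropic elasticity are given here in the vector and index forms that also appear among the keywords of the thesaurus entry for the Lame problem. Here u is the displacement vector, λ and μ are the Lamé parameters, ρ is the density, and f is the body force per unit volume; repeated indices are summed and a comma denotes partial differentiation. These are the textbook forms, not a copy of the authors' formula array.

```latex
% Equilibrium, vector form (setting f = 0 gives the homogeneous equation)
\mu\,\nabla^{2}\mathbf{u} + (\lambda+\mu)\,\nabla(\nabla\cdot\mathbf{u}) + \mathbf{f} = \mathbf{0}

% Equilibrium, index form (i = 1, 2, 3)
\mu\,u_{i,jj} + (\lambda+\mu)\,u_{j,ji} + f_i = 0

% Dynamics
\mu\,\nabla^{2}\mathbf{u} + (\lambda+\mu)\,\nabla(\nabla\cdot\mathbf{u}) + \mathbf{f} = \rho\,\ddot{\mathbf{u}}
```

The left-hand side without f defines the Lamé operator, another keyword of the same entry; the index form follows from the vector form because the i-th component of ∇(∇·u) is u_{j,ji}.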

Tables 1 and 2 give two examples of lexico-semantic index entries of the subject thesaurus ‘‘Equations of elasticity’’: for the term ‘‘Lame equation’’ (Table 1) and for the term ‘‘Lame generalized boundary value problem for the gradient theory of elasticity of isotropic bodies’’ (Table 2).

Table 1 Descriptor ‘‘Lame equation’’
Table 2 Descriptor ‘‘Lame generalized boundary value problem for the gradient theory of elasticity of isotropic bodies’’

As an example, we present a small part of the thesaurus devoted to the Lame problem [32–34]. This entry of the lexico-semantic index, for the subject area of continuum mechanics (section on composite materials), corresponds to the concept ‘‘Lame generalized boundary value problem for the gradient theory of elasticity of isotropic bodies’’. A thesaurus entry consists of a title, references to synonyms, references to related concepts, and references to the literature from which the thesaurus was compiled; it also includes a set of mathematical formulas related to the concept.
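The entry structure just described maps naturally onto a record type. The sketch below is a hypothetical in-memory model, not LibMeta's schema: field names are assumptions, formulas are kept as LaTeX strings, and `search_terms` collects the strings a text matcher would look for.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """Hypothetical model of one lexico-semantic index entry."""
    descriptor: str                                      # entry title
    synonyms: list[str] = field(default_factory=list)    # ascriptors
    related: list[str] = field(default_factory=list)     # related concepts
    references: list[str] = field(default_factory=list)  # REF: source literature
    keywords: list[str] = field(default_factory=list)    # KW
    formulas: list[str] = field(default_factory=list)    # LaTeX strings

    def search_terms(self) -> list[str]:
        """Descriptor, synonyms and keywords, case-insensitively deduplicated
        while keeping their first-seen order."""
        seen: set[str] = set()
        terms: list[str] = []
        for term in [self.descriptor, *self.synonyms, *self.keywords]:
            key = term.casefold()
            if key not in seen:
                seen.add(key)
                terms.append(term)
        return terms


if __name__ == "__main__":
    entry = ThesaurusEntry(
        "Lame equation",
        keywords=["Lame operator", "vector form of the Lame equation"],
        formulas=[r"\mu u_{i,jj} + (\lambda+\mu) u_{j,ji} + f_i = 0"],
    )
    print(entry.search_terms())
```

Keeping keywords and formulas on the same record lets one entry drive both keyword search and formula search, which is how the paper describes using the thesaurus.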

For brevity, the values of the fields ‘‘REF’’, ‘‘KW’’, and ‘‘NOTE’’ of Table 2 are given below as running text rather than as a table:

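REF: Ильюшин А.А. Механика сплошной среды. М.: Изд-во Моск. ун-та, 1990. 310 с.

REF: Ilyushin A.A. Continuum Mechanics. Moscow: Moscow Univ. Press, 1990. 310 p. (in Russian).

REF: Волков-Богородский Д.Б., Евтушенко Ю.Г., Зубов В.И., Лурье С.А. Численно-аналитический учет масштабных эффектов при расчете деформаций нанокомпозитов с использованием блочного метода мультиполей // Вычислительная математика и математическая физика, 2006, т. 46, № 7, С. 1318–1337.

REF: Volkov-Bogorodskii D.B., Evtushenko Yu.G., Zubov V.I., Lurie S.A. Numerical-analytical accounting for scale effects in calculating deformations of nanocomposites using the block multipole method // Zhurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki, 2006, vol. 46, no. 7, pp. 1318–1337 (in Russian).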

KW: Ламе уравнение обобщенное, краевая задача для Ламе уравнения обобщенного, математическая модель межфазного слоя в механике материалов, модель композитных структур с микро- и нано-включениями, модель тонкопленочных наноструктур, представление Нейбера–Папковича обобщенное, Ламе неоднородное уравнение динамики, Ламе однородное уравнение, Ламе оператор, векторная форма Ламе уравнения, индексная форма Ламе уравнения, решение уравнения Ламе представление Галеркина, Ламе волновое уравнение, Ламе полином, Ламе функция.

KW: Lame equation generalized, boundary value problem for the Lame equation generalized, mathematical model of the interfacial layer in mechanics of materials, model of composite structures with micro- and nano-inclusions, model of thin-film nanostructures, generalized Neuber–Papkovich representation, Lame inhomogeneous equation of dynamics, Lame homogeneous equation, Lame operator, vector form of the Lame equation, index form of the Lame equation, solution of the Lame equation Galerkin representation, Lame wave equation, Lame polynomial, Lame function

NOTE: Ламе уравнение обобщенное и краевая задача для Ламе уравнения обобщенного определяет математическую модель межфазного слоя в механике материалов или модель композитных структур с микро- и нано-включениями и модель тонкопленочных наноструктур.

NOTE: The generalized Lame equation and the boundary value problem for the generalized Lame equation determine the mathematical model of the interfacial layer in the mechanics of materials, the model of composite structures with micro- and nano-inclusions, or the model of thin-film nanostructures.
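Because the entry uses labeled fields, it can be read back programmatically. A minimal sketch, assuming one field per line in the form `FIELD: value` and comma-separated keywords. Both assumptions fit the entry shown here, but they are not a documented LibMeta export format.

```python
import re
from collections import defaultdict

# Assumed layout: each field starts a line as "REF:", "KW:" or "NOTE:".
FIELD = re.compile(r"^(REF|KW|NOTE):\s*(.*)$")


def parse_entry(text: str) -> dict[str, list[str]]:
    """Group field values by name; KW values are split into single keywords."""
    fields: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if not match:
            continue  # blank lines or unlabeled text
        name, value = match.groups()
        if name == "KW":
            keywords = value.rstrip(".").split(",")
            fields[name].extend(k.strip() for k in keywords if k.strip())
        else:
            fields[name].append(value)
    return dict(fields)


if __name__ == "__main__":
    sample = (
        "KW: Lame equation generalized, Lame operator, Lame function\n"
        "\n"
        "NOTE: The generalized Lame equation determines the model ...\n"
    )
    print(parse_entry(sample))
```

Splitting KW on commas works for the keywords above, none of which contains a comma; keywords that do would need a different delimiter.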

The set of keywords of a thesaurus entry is compiled from the reference books and dictionaries listed in Section 3.2. Together with the descriptor and the ascriptors (non-preferred synonyms) of the concept, these keywords are used primarily to find related materials in the library content, that is, to identify links by text analysis.
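A minimal sketch of the matching step, assuming plain-text documents. It counts case-insensitive, whole-phrase occurrences of each descriptor, synonym, or keyword, and lets a line break or several spaces inside the text match a single space in the term. The function name and behavior are illustrative assumptions, not the library's implementation.

```python
import re


def find_matches(text: str, terms: list[str]) -> dict[str, int]:
    """Count case-insensitive whole-phrase occurrences of each term.

    Whitespace inside a term matches any run of whitespace in the text,
    and the lookarounds stop 'Lame' from matching inside 'Lameness'.
    """
    counts: dict[str, int] = {}
    for term in terms:
        parts = [re.escape(p) for p in term.split()]
        pattern = r"(?<!\w)" + r"\s+".join(parts) + r"(?!\w)"
        n = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if n:
            counts[term] = n
    return counts


if __name__ == "__main__":
    text = "The generalized Lame\nequation is solved with the Lame operator."
    print(find_matches(text, ["Lame equation", "Lame operator", "Lame function"]))
```

Exact phrase matching misses inflected Russian forms (for example, ‘‘уравнения Ламе’’ versus ‘‘уравнение Ламе’’); lemmatizing both the text and the terms before matching would address this.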

For the concept ‘‘Lame equation’’, subsets of related concepts were first identified in the mathematical encyclopedia and the dictionary of special functions (see Fig. 2). These concepts are in turn associated with elements of industry classifiers, which can serve as guidelines when classifying publications that lack classifier codes.

Fig. 2. Scheme of the inclusion of a publication into the subject area of the semantic library.

In this case, for example, the candidate codes are:

(MSC) 33E10: Lamé, Mathieu, and spheroidal wave functions

(UDC) 517.589: Other special functions and special numbers

This list can be extended by treating the relationships as transitive and by extracting data from the library content that is in turn associated with these classifier elements.
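Extending the list transitively amounts to walking the link graph outward from a concept and collecting the codes attached to the nodes reached. A minimal sketch, assuming the links and code assignments are available as plain dictionaries. The node identifiers and the `max_hops` cut-off are hypothetical; the cut-off is a design choice that keeps codes of distant concepts from being proposed.

```python
from collections import deque


def candidate_codes(start: str,
                    links: dict[str, list[str]],
                    codes: dict[str, set[str]],
                    max_hops: int = 2) -> dict[str, int]:
    """Breadth-first walk over concept links, collecting classifier codes.

    Returns {code: number of hops at which the code was first reached};
    breadth-first order guarantees that this is the shortest distance.
    """
    found: dict[str, int] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for code in codes.get(node, set()):
            found.setdefault(code, dist)
        if dist == max_hops:
            continue  # do not follow links beyond the hop limit
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return found


if __name__ == "__main__":
    # Hypothetical graph: thesaurus concept -> encyclopedia / dictionary nodes.
    links = {"thes:LameEquation": ["enc:LameEquation", "spec:LameFunction"]}
    codes = {"enc:LameEquation": {"MSC 33E10"},
             "spec:LameFunction": {"UDC 517.589"}}
    print(candidate_codes("thes:LameEquation", links, codes))
```

The hop count returned with each code can serve as a simple confidence signal: codes reached in one hop are stronger candidates than those reached through longer chains.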

In a new array of more than a thousand publications added to the library, the constructed links and thesaurus concepts made it possible to identify 27 publications dealing with related problems. The text analysis also identifies the main parts of a text, in which keyword matches are the most valuable and indicate with high probability that the publication describes problems of this type.
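One simple way to make matches in the main parts of a text count more is to weight them by section. The sketch below assumes each publication is already split into named sections. The section names, the weights, and the threshold are hypothetical, since the paper does not state its weighting scheme, and matching here is plain case-insensitive substring counting.

```python
# Hypothetical weights: the paper does not publish its weighting scheme.
SECTION_WEIGHTS = {"title": 5.0, "abstract": 3.0, "keywords": 3.0, "body": 1.0}


def relevance(sections: dict[str, str], terms: list[str],
              weights: dict[str, float] = SECTION_WEIGHTS) -> float:
    """Weighted count of case-insensitive substring matches of the terms;
    sections without an explicit weight are weighted like the body."""
    score = 0.0
    for name, text in sections.items():
        lowered = text.casefold()
        hits = sum(lowered.count(term.casefold()) for term in terms)
        score += weights.get(name, weights["body"]) * hits
    return score


def rank(publications: dict[str, dict[str, str]], terms: list[str],
         threshold: float) -> list[tuple[str, float]]:
    """Publications whose score reaches the threshold, best first."""
    scored = [(pid, relevance(secs, terms)) for pid, secs in publications.items()]
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: -p[1])


if __name__ == "__main__":
    pubs = {
        "a": {"title": "On the generalized Lame equation", "body": "..."},
        "b": {"title": "Fullerene synthesis", "body": "the Lame equation once"},
    }
    print(rank(pubs, ["Lame equation"], threshold=2.0))
```

With these weights, a single match in a title outranks several passing mentions in the body, which reflects the paper's observation that matches in the main parts of a text are the strongest evidence.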

In parallel, identifying the links between thesaurus elements and the mathematical encyclopedia makes it possible to add hierarchical links between encyclopedia concepts on the basis of their links in the thesaurus.
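A minimal sketch of this step, assuming the thesaurus hierarchy is available as (narrower, broader) term pairs and the thesaurus-to-encyclopedia matches as a dictionary. A thesaurus link is lifted to the encyclopedia only if both of its ends are matched to distinct encyclopedia concepts and the encyclopedia does not already contain the link. All identifiers are hypothetical, and the output is a list of proposals for expert review rather than automatic edits.

```python
def lift_hierarchy(broader: set[tuple[str, str]],
                   mapping: dict[str, str],
                   existing: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Propose (narrower, broader) links between encyclopedia concepts.

    broader:  (narrower term, broader term) pairs from the thesaurus
    mapping:  thesaurus term -> encyclopedia concept it was matched to
    existing: hierarchical links the encyclopedia already contains
    """
    proposals: set[tuple[str, str]] = set()
    for narrow, broad in broader:
        a, b = mapping.get(narrow), mapping.get(broad)
        if a is not None and b is not None and a != b:
            proposals.add((a, b))
    return proposals - existing


if __name__ == "__main__":
    broader = {("generalized Lame equation", "Lame equation"),
               ("Lame operator", "Lame equation")}
    mapping = {"generalized Lame equation": "enc:GeneralizedLame",
               "Lame equation": "enc:Lame"}
    print(lift_hierarchy(broader, mapping, existing=set()))
```

Terms without an encyclopedia counterpart, such as ‘‘Lame operator’’ in this toy example, are skipped. Such terms are natural candidates for new encyclopedia entries rather than for new links.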

Thus, for the Lame equation, the revealed connections made it possible to show the place of this concept in the subject areas of mathematics and mechanics within our data set. The relationships were analyzed using keywords, classifiers, and formulas.

In Fig. 2, gray rectangles denote concepts of the constructed thesaurus, ovals with a solid outline denote concepts of the mathematical encyclopedia, ovals with a dotted outline denote concepts from the dictionary of special functions, gray ovals denote the extracted classifier codes, and plain rectangles denote the corresponding publications.

5 CONCLUSIONS AND FURTHER RESEARCH

This work addresses the processing of poorly (or insufficiently) structured information, such as archival articles of national specialized journals. Such articles often lack the dedicated sections that are standard in scientific publications today, such as abstracts, keywords, and classifier codes, which makes them hard to find. Nevertheless, they have scientific value as part of the applied field. Modern semantic processing tools make it possible to analyze these texts and add them to a digital library in the relevant subject areas. For one applied section of problems of mathematical physics, this paper shows how arrays of publications of the ‘‘MMKM’’ journal are included into the ontology of the semantic library on the basis of the available data sources and library content. Links of the new array of publications with the mathematical encyclopedia and classifiers were obtained, and the main concepts and keywords of the journal's local subject area were identified, which will make it possible to compile a local thesaurus of the journal in the future. As a result, archival articles acquire additional properties and can be retrieved by search queries thanks to the obtained semantic links.

Work on the semantic library for the subject area under consideration is ongoing. Further steps will aim at building new links between the main problems of the subject area and their applications. To improve the description of the subject area in the text corpus, we plan to process newly received journal texts, identify their keywords, and compare them with the subject-area thesaurus built from the sources described in this article. On the one hand, this yields a more accurate description of the subject area; on the other hand, the new data allow us to assist the expert by automating the expansion of the thesaurus with new concepts and keywords. The result will be a marked-up, verified corpus of texts on a given topic that can be used in text processing tasks based on modern machine learning methods for analyzing mathematical texts and formulas (especially in Russian).