1 INTRODUCTION

An analysis of current trends in the development of high-performance computing for the mathematical modeling of complex systems allows us to draw several important conclusions. Usually, the mathematical modeling of such systems is related to solving NP-hard problems and requires high-performance computing systems. However, such systems can differ significantly in their computational capabilities, hardware and software platforms, system architectures, interconnects, user interfaces, and other components. End-users often face various difficulties in using them [1], especially when they switch to a new system:

• The need to consider the specific features of the subject domain, including the use of weakly structured data, the presence of different uncertainties, changes in problem sizes, etc.,

• Implementing experiments in heterogeneous systems, for example, the adaptation of problem-solving algorithms to different architectures,

• Predicting the performance and makespan of applications.

There is a strong demand for a universal information and computing environment that provides instrumental support for preparing and carrying out large-scale experiments in a heterogeneous environment and for its specialization to a specific subject domain [2]. Such an environment must expand the spectrum of models, algorithms, applications, and subject databases used. In addition, it has to enable various scientific groups to participate in the preparation and carrying out of experiments using the necessary resources.

Clouds, grids, and public access supercomputing centers are three organizational and functional platforms of parallel and distributed computing that are highly sought-after by the scientific community [3].

Grid-systems provide standard software and services for the collaborative sharing and management of federated, geographically distributed resources for computing and data processing [4].

Cloud platforms provide elasticity and scalability of computation resources, data storage, or network bandwidth with the required quality [5]. However, end-users are forced to deliver, deploy, and manage the necessary software and data on top of the allocated infrastructure. Moreover, the use of cloud resources and services is often more expensive than carrying out experiments in grid-systems.

Public access supercomputer centers have the following distinctive advantages [6]:

• Provision of paid and free resources and services according to administrative policies,

• Providing basic system and application software,

• Providing unified and tightly regulated access to computing resources,

• Evaluation of the effectiveness of computing resources,

• Technical and methodological support for the end users.

In general, the resources of public access supercomputer centers are non-dedicated and shared by all users of the center. However, users may want to use grid and cloud resources together to solve complex problems [7]. Differences in the methods of accessing and using the above-listed resources significantly limit the scope of high-performance computing. Thus, the integration of grid and cloud computing, including the resources of public access supercomputing centers, can provide tremendous opportunities for the implementation of mass large-scale scientific experiments [8].

In this paper, we propose new features of the Orlando Tools framework (OT) [9]. We apply OT as the basis of an integrated computing environment for developing and applying distributed applied software packages. It is assumed that the environment unifies the three platforms discussed above: clouds, grids, and public access supercomputing centers.

Distributed applied software packages represent a special class of scientific applications [10]. In such packages, the software has a modular structure. Package modules implement algorithmic knowledge about solving problems.

An abstract program (problem-solving scheme) describes the set of modules used to solve a problem and the order of their execution. A directed acyclic graph (DAG) is applied to reflect the relations and data flow between modules.

The concept of a problem-solving scheme corresponds to an abstract scientific workflow [11]. The specification of a problem-solving scheme (computational job) for meta-schedulers describes information about the package modules used, their input and output data, and requirements on computational resources, data storage, and the communication network.
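For illustration, a problem-solving scheme can be viewed as a DAG whose vertices are modules and whose edges are data dependencies between their parameters. The following minimal Python sketch (the scheme, the module names, and the function are our own illustration, not the actual OT job format) derives a tiered-parallel execution order from such a description.

```python
# A minimal sketch (our own illustration, not the actual OT job format) of a
# problem-solving scheme as a DAG: vertices are modules, edges are data flows
# derived from shared input/output parameters.
from collections import defaultdict, deque

scheme = {  # hypothetical modules with their input and output parameters
    "prepare_data": {"in": {"scenario"}, "out": {"model"}},
    "decompose":    {"in": {"model"}, "out": {"submodels"}},
    "solve":        {"in": {"submodels"}, "out": {"solutions"}},
    "analyze":      {"in": {"solutions"}, "out": {"report"}},
}

def execution_order(scheme):
    """Return a tiered-parallel order of modules (Kahn's topological sort)."""
    producers = {p: m for m, io in scheme.items() for p in io["out"]}
    deps = {m: {producers[p] for p in io["in"] if p in producers}
            for m, io in scheme.items()}
    indegree = {m: len(d) for m, d in deps.items()}
    users = defaultdict(set)
    for m, d in deps.items():
        for dep in d:
            users[dep].add(m)
    tiers, ready = [], deque(m for m, k in indegree.items() if k == 0)
    while ready:
        tier = list(ready)
        ready.clear()
        tiers.append(tier)
        for m in tier:
            for u in users[m]:
                indegree[u] -= 1
                if indegree[u] == 0:
                    ready.append(u)
    return tiers

print(execution_order(scheme))
# [['prepare_data'], ['decompose'], ['solve'], ['analyze']]
```

Modules in the same tier have no mutual data dependencies and can therefore be executed in parallel.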

In contrast with scientific workflow systems, the packages are associated with solving a class of problems in a certain subject domain considering several conditions. The important ones include the changes in algorithmic knowledge, presence of various categories of package users, and use of integrated heterogeneous resources based on various parallel and distributed computing paradigms.

The new features of OT provide the processing of weakly structured data; the automation of the assembly, debugging, and joint testing of package modules; the prediction of module runtimes; and their classification on heterogeneous resources. As a result, they improve the quality of resource allocation.

The advantages of this subsystem are demonstrated and validated in practice by the development of two versions of the package for solving the problem of the sustainable development of the energy sector of Vietnam: with and without this subsystem [12, 13].

In the first case, the results show a significant reduction of the overall time for preparing and conducting experiments.

The remainder of the paper is structured as follows. Section 2 provides a brief overview of approaches related to the integration and virtualization of resources with different computational characteristics into a unified heterogeneous environment. In addition, we describe the current state of the continuous integration field related to the development of scientific applications. In Section 3, we consider OT for the development of scientific applications. We discuss a new subsystem for the continuous integration, delivery, and deployment of applied and system software. Section 4 demonstrates mitigating uncertainty in mapping jobs to resources based on job classification and administrative skills. Section 5 describes an example of job flow management for a scalable application that solves a complex practical problem. The gained experience is discussed in Section 6. Finally, Section 7 concludes the paper.

2 RELATED WORK

An integrated software environment to support large-scale scientific experiments using various paradigms of high-performance computing is not discussed in depth in the scientific literature. Nevertheless, we can distinguish several interesting projects of great practical importance for the implementation of collaborative computing.

The modular toolbox Distributed and Unified Numeric Environment (DUNE) for solving partial differential equations is presented in [14]. It provides mechanisms for efficient high-performance computing in scientific applications.

Integration of developed applications, for example, Computer-Aided Engineering systems, into a distributed computing environment can be performed on the basis of the service-oriented approach proposed in [15].

A new algorithm for computation managing in such an environment has been developed in [16].

The SPPEXA project supports various studies in the integrated development of computational algorithms, system software, applied software, data management and exploration, programming environments, and software tools [17]. A wide spectrum of scientific institutions is involved in the project.

Afgan et al. [18] propose the web-based Galaxy platform for a scientific analysis of large biomedical datasets in genomics, proteomics, metabolomics, and imaging. Many scientists across the world actively use it for large-scale experiments.

Ananthakrishnan et al. [19] provide the elastic cloud platform Globus for the implementation, deployment, and operation of services to support similar experiments. Similar tools implemented using the multi-agent approach are offered in [20].

Often, scientists need to execute the composition of various services with certain quality criteria. The problem of creating such a composite combination of services is NP-hard. QoS-aware cloud service composition using eagle strategy is presented in [21].

The concept and functioning principles of an integrated computing environment are proposed in [2]. The authors provide the development tools for implementing the main stages of large-scale computational experiments and supporting a long life cycle of applied and system software. However, existing projects leave many problems unresolved. The most important ones include [2, 17]:

• Study of application functioning in heterogeneous resources,

• Designing applications considering the architecture of the environment,

• Applying middleware to support the interaction between applications and operating systems,

• Integration of capabilities of middleware and applications for managing jobs and resources,

• Intellectualization of job management,

• Development of flexible interfaces for various categories of users to access application components,

• Automation of the controlling, building, and testing of system and application software,

• Elicitation, analysis, and transformation of subject data.

These problems are also relevant for scientific workflows [22]. Their solution facilitates mitigating the uncertainty of the workflow distribution.

Scientific workflow systems and distributed applied software packages have several differences. Usually, scientific workflow applications such as LIGO [23], Montage [24], CyberShake [25], SIPHT [26], or Epigenomics [27] are oriented toward solving a single problem in some subject domain. A package is designed to solve a class of problems. Problem-solving schemes reflect the variety of algorithms used in the package. Each workflow has little variability [28]. Usually, only the input data are changed. Thus, there is no need to support the continuous integration of their software.

On the contrary, packages are actively developed by their users. Permanent changes in algorithmic knowledge are expected.

In the packages, new problem-solving schemes can be designed and existing schemes can be modified. In [29], we provide an analysis of the state of the art in this field. It shows that there are currently no tools that would allow us to implement the continuous integration of software for distributed applied software packages.

To solve this problem, the development of new specialized tools is required. Such tools should implement the special functions of continuous integration caused by the package specifics and interact with existing continuous integration tools that implement traditional functions. A comprehensive comparative analysis of the capabilities of known continuous integration tools shows that the GitLab system is most suitable for implementing such interaction [30].

There are various categories of package users (developers, administrators, and end-users). Developers create and modify the package. End-users apply it to solve the class of problems provided by the package. They can formulate new problems and create various schemes for solving them using existing, modified, or newly added applied software. Administrators determine resource use policies for running the package. Users may have various affiliations and be located in different, geographically distant regions.

Unlike most modern scientific workflows [31], packages are developed to run in a heterogeneous distributed computing environment that can integrate resources of cloud platforms, grid-systems, and public access supercomputing centers.

Resources that are integrated within such an environment can be dedicated and non-dedicated. Scientific workflow systems usually apply traditional virtualization tools [32]. Their capabilities are not sufficient to integrate the aforementioned resources into a unified virtual environment.

A detailed comparative analysis of virtualization systems is presented in [12]. It shows that, in a dynamic environment, it is necessary to apply middleware to support effective interaction with various hypervisors and container management systems. One such middleware that can be used on dedicated resources is the OpenStack platform [33]. However, for non-dedicated resources, it is necessary to develop a specialized tool for launching virtual machines (VMs).

The aforementioned differences cause various types of uncertainties that arise in computation planning and resource allocation. Among them are uncertainties in resource capabilities, job characteristics, environment states, preferences of resource owners and users, quality of services, etc. From the uncertainty point of view, the main challenges in tackling scientific applications at scale are computation planning and resource allocation. Studies on improving computation planning and resource allocation by decreasing the above-listed uncertainties are actively developing.

Methods for the analysis of workflow structures and their decomposition are considered in [34–36]. The workflow analysis allows clarifying the properties of components. The decomposition is carried out in order to rationally distribute the components over resources.

In addition, the automation of planning workflows (problem-solving schemes) for non-procedural formulations of problems makes it possible to significantly simplify the selection of applications needed for solving problems when there is uncertainty in the software use [37].

A new algorithm for decreasing the jobs distribution uncertainty of problem-oriented applications is proposed in [38]. It considers both the specifics of problems being solved and features of available resources.

Ramírez-Velarde et al. [39] study adaptive grid-resource allocation under uncertainty. They propose a special model of execution delay for job makespan prediction and a new strategy for adaptive stochastic allocation of jobs.

Tchernykh et al. [40] propose an approach to modeling cloud computing with uncertainty, addressing resource provisioning in a hybrid public-private cloud environment. This study improves the efficiency of cloud resource provisioning considering the problems of confidentiality, integrity, and availability.

Babenko et al. [41] propose arithmetic error correction codes to reduce uncertainty in completion time, deadlines, reliability, and cost. This approach focuses on redundant computing systems.

The uncertainty of quality of services associated with coordinating the preferences of resource owners and users can be successfully addressed using multi-agent technologies. For example, Singh [42] describes a Multi-Agent System (MAS) to support scalability in cloud computing by finding clouds that satisfy user requests when the currently used cloud becomes overloaded. Kalyaev et al. [43] present multi-agent scheduling in cloud computing environments considering job completion deadlines. The improvement in multi-agent scheduling is due to emergent knowledge arising in the process of local interactions of agents.

Computation planning and resource allocation within an unspecialized heterogeneous distributed computing environment become increasingly complex. Often, practical situations are so complicated that they simply have no workable deterministic solution obtainable in an acceptable time.

At the same time, probabilistic methods provide simple and effective prediction mechanisms using retrospective statistical data and expert judgment for the future. For example, Prieto et al. [44] demonstrate that a probabilistic approach to the decentralized management of a computing network allows them to solve the challenges of efficient resource usage, scalability, robustness, and adaptability.

In this paper, we present a new subsystem of the OT framework. This framework is the foundation of an integrated computing environment for packages. The subsystem implements the necessary tools for mitigating uncertainty in the distribution of jobs to resources.

3 OT: FOUNDATION FOR INTEGRATED COMPUTING ENVIRONMENT

The architecture of the integrated computing environment is shown in Fig. 1. The OT subsystems form its basis. External information and computation systems and resources are used in developing and applying distributed applied software packages.

Fig. 1. The architecture of an integrated computing environment.

The following main subsystems of OT are distinguished:

• A user interface that provides a variety of ways for different user categories to access the package components, subsystems, and information and computation resources,

• Designer of a conceptual model of the environment that implements the knowledge specification about the subject domain of solved problems,

• Designer of module libraries that supports the development and modification of applied software for packages,

• Software manager, which implements automation of the version control, testing, delivery, and deployment of both the applied software of packages and system software of the environment,

• Computation manager that implements computing management on heterogeneous resources of the environment,

• Environment state manager that observes resources,

• API for access to external information and computation systems, and resources.

Knowledge about the subject domains of packages and information about environment resources and computation systems are stored in the corresponding sections of the OT knowledge base. To elicit such knowledge, various databases can be used. The source data and computation results are stored in computation databases. In the process of computing, additional storage can be used.

3.1 User Interface

Components of the OT subsystems are represented by services. We implemented web-based access for users to the operations. Services that represent components accessed by other software entities (other components, package modules, or external software systems) have an additional description in the Web Services Description Language (WSDL) [45].

3.2 Conceptual Model Designer

A conceptual model of the environment formed by this designer consists of three parts. The first part describes the algorithmic knowledge about solving problems of the subject domain with the package. The second part contains information about infrastructural objects of the environment. Knowledge about the continuous integration, delivery, and deployment of software on the environment resources forms the third part of the model.

Traditionally, algorithmic knowledge is the result of a structural analysis of the subject domain. In OT, its description is formed using a special set of concepts. Among them are such basic concepts as parameters, operations, modules, problem-solving schemes, etc.

Parameters represent the characteristics of the subject domain necessary in the problem-solving process. Operations are defined over a set of parameters. They associate a subset of input parameters, whose values must be known, with another subset of output parameters, whose values need to be calculated. Information relations between operations are determined based on their input and output parameters. Productions determine the logical conditions of applying operations in the problem-solving process. These conditions take into account the intermediate results of computations and the state of the environment resources.

Package developers and end-users create problem-solving schemes based on procedural or non-procedural problem formulations. These formulations rely on the information relations between operations. A procedural formulation is a list of operations required to solve the problem, partially ordered within a tiered-parallel form of the DAG. A non-procedural formulation is represented in the following form: “Using the initial subset of parameters, whose values are known, calculate the parameter values of the target subset.” In the case of such a problem formulation, the list of operations required to solve the problem and their partial order are determined automatically.
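As an illustration of the non-procedural case, the following Python sketch (our own simplification, not OT code) selects and orders operations by forward chaining over their input and output parameters; it may include operations that are runnable but not strictly required, whereas the actual planner prunes such operations.

```python
# A simplified sketch (not OT code) of planning for a non-procedural
# formulation: choose operations whose inputs are already known and repeat
# until the target parameters are covered.
def plan(operations, known, targets):
    """operations: {name: (inputs, outputs)}; returns an ordered list of operations."""
    known, steps, remaining = set(known), [], dict(operations)
    while not set(targets) <= known:
        runnable = [op for op, (ins, _) in remaining.items() if set(ins) <= known]
        if not runnable:
            raise ValueError("targets are unreachable from the initial parameters")
        for op in runnable:
            steps.append(op)
            known |= set(remaining.pop(op)[1])
    return steps

# Hypothetical operations with their input and output parameters.
ops = {
    "select_model":   (["scenario"], ["model"]),
    "decompose":      (["model"], ["submodels"]),
    "solve_submodel": (["submodels"], ["solutions"]),
}
print(plan(ops, known=["scenario"], targets=["solutions"]))
# ['select_model', 'decompose', 'solve_submodel']
```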

Modules are associated with software implementations of operations. Thus, each problem-solving scheme through its operations determines a set of modules necessary for the problem-solving process implementation.

A module description can include the following information: the module source code; the programming language; the type and purpose of input, output, and transit parameters; methods for passing parameters; handling of non-standard execution; the required compiler and its options; the call format, etc.

We present new features of OT for the automation of version control and testing of software. In this regard, the module description is expanded with additional information. Now, the description includes the following additions: links to repositories with source and binary module files, scenarios for building and testing modules, information about test data, and methods of delivery and deployment.
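For illustration only, an extended module description of this kind might look as follows; the field names and values are our assumptions, not the exact OT schema.

```python
# A hypothetical module description (field names and values are illustrative,
# not the exact OT schema), reflecting the additions described above.
module_description = {
    "name": "submodel_solver",
    "language": "C++",
    "compiler": {"name": "g++", "options": ["-O2"]},
    "parameters": {"input": ["submodel"], "output": ["solution"], "transit": []},
    "call_format": "solver --in {submodel} --out {solution}",
    # Links to repositories with source and binary module files
    "repositories": {
        "source": "https://gitlab.example.org/package/solver.git",
        "binaries": "https://gitlab.example.org/package/solver/-/packages",
    },
    # Scenarios for building and testing, and information about test data
    "ci": {"build": "scripts/build.sh", "test": "scripts/test.sh",
           "test_data": "data/tests/"},
    # Methods of delivery and deployment
    "deployment": {"method": "installation_package", "targets": ["cluster", "vm"]},
}
```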

OT supports testing modules both individually and as part of problem-solving schemes. The conceptual model designer of OT provides the development of the necessary scripts for external systems. These scripts implement scenarios of continuous integration, delivery, and deployment of software.

Information about infrastructure objects of the environment includes the characteristics of nodes, network devices, network topology, and other facilities. In addition, these characteristics include statistics on software and hardware failures. Information about administrative policies defined for resources is taken into account. It includes data on users and their problems, access rights to resources and quotas for their use, properties of resource management systems, their characteristics, and disciplines of computation dispatching.

Information about the hardware and software infrastructure and administrative policies is elicited using various information and computation systems, controlling and measuring means, and monitoring systems. It is reflected in the conceptual model using system parameters, operations, and modules.

A description of the conceptual model is stored in the OT knowledge base. Sections of the knowledge base are distributed between OT, a meta-monitoring system, and external systems for continuous integration, delivery, and deployment of software.

3.3 Module Libraries Designer

The applied software of packages can be formed on the basis of existing software libraries or created directly by package developers. In both cases, the module description is included in the conceptual model of the environment. OT provides a text editor for the module source code and tools for interaction with external systems for the building, debugging, and testing of modules.

The system software of packages is formed on the basis of the OT system libraries.

3.4 Software Manager

The software manager is a new subsystem of OT. It is designed to identify and partially eliminate potential problems during the continuous integration, delivery, and deployment of new versions of both the applied and system software of packages. Its main functions include the following operations:

• Interacting with repositories of source and binary files of the applied and system software, including the support of its version control,

• Automation of building and testing the applied and system modules individually,

• Creating the VM images for executing modules,

• Automation of testing the applied modules as part of problem-solving schemes,

• Predicting the runtime of the applied modules in the environment nodes,

• Automation of the delivery and deployment of the applied and system modules in the environment.

The general scheme of continuous integration, delivery, and deployment of applied and system software is shown in Figure 2.

Fig. 2. Scheme of the continuous integration, delivery, and deployment of software.

New tools for a meta-description of all the above-listed operations are first implemented in the conceptual model designer. Information about node performance obtained during the testing of modules allows us to determine how well the resource performance matches the module execution criteria.

This determination is performed by the environment administrators in advance, before resource allocation starts in the process of computational experiments. It ensures a significant mitigation of uncertainty in the subsequent distribution of jobs to resources.

3.5 Computation Manager

The computation manager includes components for representing problem formulations and creating problem-solving schemes, a generator of jobs for executing the created schemes, a scheduler, and an interpreter. The scheduler decomposes the initial problem-solving schemes represented in jobs into subschemes in order to optimize the distribution of the computing and communication load on the environment resources.

The interpreter executes the schemes and subschemes for solving problems.

In addition, the computation manager can transfer jobs for processing to MAS [46]. It is implemented using the Java Agent DEvelopment framework (JADE) [47].

This MAS belongs to the class of meta-schedulers such as GridWay [48] or Condor DAGMan [49]. At the same time, unlike similar tools, it allows us to control job flows in an environment that integrates the resources of cloud platforms, grid-systems, and public access supercomputer centers.

The MAS includes agents for classifying jobs and for monitoring and dispatching environment resources, as well as agents directly representing the integrated resources (resource agents). Resource agents come together in a new virtual community to execute each job.

The condition for including an agent in a community for executing a specific job is that the agent possesses resources with the capabilities necessary to execute the modules of this job.

The allocation of resources by agents is shown in Fig. 3. The classification agent determines the class of the job and forms a virtual community of resource agents. The job classification system is discussed in detail in [50]. The classified job is queued to the dispatching agent.

Fig. 3. Resource allocation by agents.

In processing the job queue, the dispatching agent decomposes jobs into subjobs, which are then distributed among the segments of the integrated computing environment representing cloud platforms, grid-systems, and public access supercomputer centers.

When the modules of subjobs are executed, special VMs are launched. The software manager creates the images of these VMs in advance. Running VMs are combined into a unified virtual runtime environment for a package.

Cloud resources are managed using the OpenStack platform. To start VMs on the resources of grid-systems and public access supercomputer centers, new jobs are formed for the Local Resource Managers (LRMs) installed on these resources. These jobs are sent to the common queues of the LRMs. To speed up job execution, nodes with free slots are searched for in the LRM job service queues during resource allocation. Each slot determines the number of cores that are used or can be used to execute a job over a period of time. Overall, the agents provide mitigation of uncertainty in the distribution of jobs to resources. Such mitigation is carried out automatically in four stages.

At the first stage, the primary classification of a job is implemented by analyzing the job specification using a special characteristic function. The formal correspondence of the job characteristics and their values to the characteristics of the predefined classes and their value domains is checked. As a result of the primary classification, a job may belong to several classes.

The selection of resources corresponding to the job classes is performed at the second stage.

At the third stage, the secondary classification of a job is performed based on the analysis of statistical information about the computational history of job executions and their correspondence to the predefined classes. The analysis results are used to adjust the outcomes of the primary classification.

At the last stage, the results of the secondary classification are refined using the previously obtained data in testing applied software by the software manager. For this purpose, we can also use information about the predicted runtime of modules.

The agent for classifying jobs performs the first two stages. The agent for dispatching environment resources carries out the last two stages. Preliminarily, administrators define the necessary job classes in the classification system. Then they establish the correspondence of all their resources to the defined classes. Administrators make these decisions based on their practical experience.

The final job distribution over resources is performed by resource agents using a tender of computational works (jobs) [51]. Within the tender, bidding is based on the sealed-bid second-price Vickrey auction [52]. It is held in one round. During the bidding, additional conditions for job execution are taken into account. These include the preferences of resource owners and the criteria for problem-solving efficiency determined by users of the environment.
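The core of the sealed-bid second-price auction can be illustrated by the following Python sketch; the bid values and the convention that a lower bid is a better offer are our illustrative assumptions, not the exact tender protocol of the MAS.

```python
# A minimal sketch of a one-round sealed-bid second-price (Vickrey) auction
# among resource agents. We assume a lower bid is a better offer; the actual
# tender additionally accounts for owner preferences and efficiency criteria.
def vickrey_winner(bids):
    """bids: {agent: bid}; the lowest bidder wins at the second-lowest bid."""
    ranked = sorted(bids.items(), key=lambda item: item[1])
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, price

print(vickrey_winner({"grid_agent": 140.0, "cloud_agent": 95.0, "hpc_agent": 110.0}))
# ('cloud_agent', 110.0): the cloud agent wins at the second-best price.
```

Charging the winner the second-best price encourages agents to bid their true execution cost, which is why the Vickrey scheme suits a one-round tender.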

3.6 Environment State Manager

This manager receives information about the environment state from a special meta-monitoring system implemented by applying a multi-agent approach [46]. The meta-monitoring system collects information about the state characteristics of environment infrastructure objects using the controlling and measuring means (Sensors, APC PowerChute, IPMI, SMART, etc.) and local monitoring systems (Ganglia, Nagios, Zabbix, ZenOSS, etc.). In addition, it observes the performance parameters of control systems that operate on these objects.

Based on the information collected, the meta-monitoring system assesses the current computing situation, predicts its development, and generates control actions on infrastructure objects and control systems in order to prevent or partially eliminate failures of software and hardware. In case of a critical situation, when such actions cannot be performed automatically, the meta-monitoring system sends a notification to the environment administrator.

3.7 Access to Information and Computation Systems and Resources

OT provides web-oriented access for users to the information and computation systems and resources of the integrated computing environment. The framework interacts with these systems and resources using the protocols they support.

4 MITIGATING UNCERTAINTY

We consider uncertainty in terms of a deficiency of the available knowledge relative to the knowledge needed to make a near-optimal decision. In our case, the necessary knowledge is information about module runtimes on heterogeneous nodes of the environment, the classification of jobs in accordance with their characteristics, and the correspondence of resource capabilities to job classes.

4.1 Evaluating the Module Runtime

We developed a special model for evaluating the runtime of the modules in a problem-solving scheme. In the process of testing, a module is run on a reference node within a software environment for profiling programs. We execute the module and determine the time it spends working with the various components of the node, taking into account the node's computational characteristics. The module runtime on the target node is then predicted, with some error, by comparing the characteristics of the reference and target nodes.

Within the developed model, the sets \(CR = \{cr_1, cr_2, \ldots, cr_m\}\) and \(CT = \{ct_1, ct_2, \ldots, ct_m\}\) of characteristics of the reference and target nodes are formed. A one-to-one correspondence is set between \(cr_i\) and \(ct_i\), \(i = \overline{1,m}\).

We form the set \(P = \{p_1(d), p_2(d), \ldots, p_n(d)\}\). The element \(p_j(d)\), \(j = \overline{1,n}\), reflects the computational load of a node when the module is run. Among the load indicators are the number of integer operations performed, the number of floating-point operations, the number of accesses to the main memory and cache memory of different levels, the number of misses of such accesses, the number of disk read and write sessions, etc. The load depends on the data d processed by the module.

Then, the functions \(f_l(CR, P, g(d))\), \(l = \overline{1,k}\), are defined. They calculate the operating time of the \(l\)th component of a node when it performs the computational load. The parameter \(g(d)\) is a manually fitted interpolation function that experimentally determines the dependence of \(\hat{T}_r(d)\) on the amount of data \(d\).

The module runtime on the reference node is evaluated by the following expression:

$$\hat{T}_r(d) = \sum_{l = 1}^{k} f_l(CR, P, g(d)).$$

The error \(\varepsilon\) of this evaluation is determined from the difference between \(T_r(d)\) and \(\hat{T}_r(d)\): \(\varepsilon = T_r(d) - \hat{T}_r(d)\). A theoretical evaluation of the upper and lower bounds of the error \(\varepsilon\) is not given since, in each particular case, it depends on the implementation of the functions \(f_l(CR, P, g(d))\).

The time \(\hat{T}'_t(d)\) of the module run on the target node is evaluated by the following expressions:

$$\hat{T}'_t(d) = \hat{T}_t(d) + \varepsilon \frac{\hat{T}_r(d)}{\hat{T}_t(d)},$$
$$\hat{T}_t(d) = \sum_{l = 1}^{k} f_l(CT, P, g(d)).$$
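The following Python sketch illustrates the expressions above; the component functions \(f_l\), the interpolation \(g\), and the numerical values are illustrative stand-ins rather than the functions actually fitted in OT.

```python
# A minimal sketch of the runtime model above; f_l, g, and the numbers are
# illustrative stand-ins, not the functions actually fitted in OT.
def predict_target_runtime(f_list, CR, CT, P, g, d, T_r_measured):
    """Return (T_r_hat, eps, T_t_hat_corrected) following the expressions above."""
    T_r_hat = sum(f(CR, P, g(d)) for f in f_list)   # estimate on the reference node
    eps = T_r_measured - T_r_hat                    # evaluation error
    T_t_hat = sum(f(CT, P, g(d)) for f in f_list)   # estimate on the target node
    return T_r_hat, eps, T_t_hat + eps * T_r_hat / T_t_hat

# Toy example: a single "CPU" component whose time is the profiled operation
# count times the load factor, divided by the core frequency (characteristic 0).
f_cpu = lambda C, P, load: P["ops"] * load / C[0]
g = lambda d: d                 # assume the load grows linearly with the data size
CR, CT = [2.3e9], [2.1e9]       # reference and target core frequencies, Hz
P = {"ops": 5.0e9}              # profiled operations per unit of data
print(predict_target_runtime([f_cpu], CR, CT, P, g, d=4, T_r_measured=9.1))
```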

To evaluate the module runtime \(\hat{T}'_t(d)\) on the target node, we propose an algorithm that includes the following steps:

• Defining the reference and target nodes,

• Determining the characteristics \(c{{r}_{1}},c{{r}_{2}}, \ldots ,c{{r}_{m}}\) of the reference node based on the knowledge about resources from the conceptual model of the environment,

• Determining the characteristics \(c{{t}_{1}},c{{t}_{2}}, \ldots ,c{{t}_{m}}\) of the target node based on the knowledge about resources from the conceptual model of the environment,

• Preparing the data d in the computation database of the OT,

• Deploying the profiler, module, and prepared data on the reference node using the software manager,

• Starting the profiler on the reference node,

• Launching the module from the profiler environment,

• Executing the module on the reference node,

• Getting the values of profiler counters,

• Determining the real module runtime \({{T}_{r}}\left( d \right)\) using the obtained values,

• Calculating \({{p}_{1}}\left( d \right),{{p}_{2}}\left( d \right), \ldots ,{{p}_{n}}\left( d \right)\) using the obtained values,

• Evaluating the module runtime \({{\hat {T}}_{r}}\left( d \right)\) on the reference node,

• Evaluating the module runtime \({{\hat {T}}_{t}}\left( d \right)\) on the target node,

• Calculating the error ε,

• Evaluating the module runtime \(\hat{T}'_t(d)\) on the target node.

In the algorithm, the characteristics \(cr_1, cr_2, \ldots, cr_m\), \(ct_1, ct_2, \ldots, ct_m\), the components of \(d\), the module runtime \(T_r\), the evaluations \(\hat{T}_r\) and \(\hat{T}'_t\), and the error \(\varepsilon\) are system parameters. The functions \(p_1, p_2, \ldots, p_n\), \(f_1, f_2, \ldots, f_k\), \(g\), and the implementations of the expressions for calculating \(\hat{T}_r\), \(\hat{T}'_t\), and \(\varepsilon\) are system modules. The system parameters and modules are described in the conceptual model of the environment.

The evaluation of the module runtime obtained with the help of the proposed algorithm is rather approximate. In addition, the automatic selection of the interpolation function g(d) is difficult to implement for some classes of modules. However, in some cases, such an evaluation can significantly improve end-user evaluations and evaluations based on the computational history.
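As an example of a simple, manually chosen interpolation, \(g(d)\) can be fitted to a few profiling runs with different data sizes; the sizes, loads, and the choice of a first-degree polynomial below are our illustrative assumptions.

```python
# A minimal sketch (illustrative data) of fitting the interpolation g(d) from
# several profiling runs with different data sizes, using a linear polynomial.
import numpy as np

d_samples = np.array([1, 2, 4, 8, 16])                 # data sizes of test runs
load_samples = np.array([1.1, 2.0, 4.2, 8.1, 16.3])    # profiled loads per run
g = np.poly1d(np.polyfit(d_samples, load_samples, deg=1))
print(float(g(32)))   # extrapolated load for a larger data size
```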

4.2 Mitigating Uncertainty in the Job Distribution by Resources

To evaluate the degree of uncertainty mitigation in the distribution of jobs to resources, we apply informational binary entropy. This parameter is often used to measure the uncertainty of a complex system. It is assumed that a decrease in entropy leads to a mitigation of uncertainty.

Let \(m\) classes be found from the results of the primary classification of a job, i.e., the job satisfies the characteristics of these classes. The environment includes \(n\) heterogeneous resources. One class can correspond to \(k\) resources, \(k \in \overline{1,n}\). Information about the correspondence between classes and resources is represented by a Boolean matrix \(Y\) of dimension \(n \times m\). The matrix element \(y_{ji} = 1\) (\(y_{ji} = 0\)) shows that the \(j\)th resource corresponds (does not correspond) to the \(i\)th class.

We define the entropy \(E^c\), which reflects the degree of uncertainty in the job belonging to one of the classes, as follows:

$$E^c(x_c) = \sum_{i = 1}^{m} p_i(x_c)\, E_i^c(x_c),$$

where \(x_c\) is a random event that determines that the job belongs to one of the \(m\) classes, \(p_i(x_c)\) is the probability of such an event relative to the \(i\)th class, \(\sum_{i=1}^{m} p_i(x_c) = 1\), and \(E_i^c(x_c)\) is the entropy corresponding to this event. The value of \(E_i^c(x_c)\) is calculated as follows:

$$E_i^c(x_c) = \begin{cases} 0, & \text{if } p_i(x_c) = 0, \\ -\log_2 p_i(x_c) & \text{otherwise}. \end{cases}$$

In the same way, we define the entropy \(E^r\), which reflects the degree of uncertainty in allocating one of the resources to the job. It takes the following form:

$$E^r(x_r) = \sum_{j = 1}^{n} p_j(x_r)\, E_j^r(x_r),$$

where \(x_r\) is a random event that leads to the allocation of one of the \(n\) resources to the job, \(p_j(x_r)\) is the probability of such an event relative to the \(j\)th resource, and \(E_j^r(x_r)\) is the entropy corresponding to this event. The values \(p_j(x_r)\) and \(E_j^r(x_r)\) are calculated as follows:

$$p_j(x_r) = \sum_{\forall l:\, y_{jl} = 1} p_l(x_c) \Bigg/ \sum_{j = 1}^{n} \sum_{\forall l:\, y_{jl} = 1} p_l(x_c), \quad l \in \overline{1,m}, \quad \sum_{j = 1}^{n} p_j(x_r) = 1,$$
$$E_j^r(x_r) = \begin{cases} 0, & \text{if } p_j(x_r) = 0, \\ -\log_2 p_j(x_r) & \text{otherwise}. \end{cases}$$
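For clarity, the two entropies can be computed as in the following Python sketch; the class probabilities and the matrix Y used in the sketch are a toy example, not the case study from Table 1.

```python
# A minimal sketch of computing E^c and E^r; the probabilities and the matrix Y
# below are a toy example, not the case study from Table 1.
from math import log2

def entropy(probs):
    return sum(-p * log2(p) for p in probs if p > 0)

def resource_probs(p_classes, Y):
    """Y[j][i] == 1 iff the j-th resource corresponds to the i-th class."""
    raw = [sum(p for p, y in zip(p_classes, row) if y) for row in Y]
    total = sum(raw)
    return [r / total for r in raw]

p_c = [0.25, 0.25, 0.25, 0.25]         # four classes, equal probabilities
Y = [[1, 1, 1, 0],                     # two resources
     [0, 0, 1, 1]]
p_r = resource_probs(p_c, Y)
print(entropy(p_c), entropy(p_r))      # E^c = 2.0, E^r ~ 0.97
```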

Let us consider an illustrative example of mitigating uncertainty in the distribution of jobs to resources.

Suppose that a job was sent into the environment to execute a module specified in this job. According to the results of the primary classification, 8 classes were found whose characteristics are satisfied by the job. These classes differ among themselves in the value domain of one characteristic that determines the admissible job makespan. The value of this characteristic is not specified in the job specification.

The environment includes 3 heterogeneous resources (3 clusters, the nodes of which differ in their computational characteristics). The correspondence of resources to classes is represented by a matrix

$$Y = \begin{bmatrix} 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

There is a computational history of performing such a job on the environment resources.

Table 1 shows the stage numbers and the values \(p_1(x_c)\), \(p_2(x_c)\), …, \(p_8(x_c)\), \(E^c\), \(p_1(x_r)\), …, \(p_3(x_r)\), and \(E^r\) for each stage of mitigating uncertainty in the distribution of jobs to resources.

Table 1. Event probabilities and entropy parameters

At the first stage, the probabilities of the job belonging to one of the 8 classes are equal to each other, just as the probabilities of allocating one of the 3 resources to the job are equal. In this case, \(E^c = 3.00\) and \(E^r = 1.06\).

At the next stage, the two resources most suitable for the job execution were selected based on the expert experience of the resource administrators. In this case, the degree of uncertainty in allocating resources to the job decreases to \(E^r = 0.99\).

At the third stage, the secondary classification of the job, based on the analysis of statistical information about the computational history, concretizes the probabilities of the job belonging to the classes. The value of \(E^c\) becomes 1.14. In accordance with this, the probabilities of allocating resources to the job change. The degree of uncertainty in allocating resources to the job is significantly reduced to \(E^r = 0.09\).

At the last stage, the results of the secondary classification are refined based on the predicted module runtime or previously obtained data of its testing. As a result, \({{E}^{c}} = 0.34\) and \({{E}^{r}} = 0.09\). Thus, resource 1 is allocated to execute the job.

It should be noted that, according to the results of the secondary classification, jobs could belong to wider sets of classes in comparison with the refined classification based on module testing. This is because the computational history may contain information about modules executed in debugging mode or with incorrect data. In this case, some job characteristics, for example, the makespan, may go beyond the boundaries of the value domains of the classes that really correspond to these jobs.

5 EXPERIMENTAL ANALYSIS

5.1 Package for Solving the Problem of Searching for Directions of Sustainable Development of the Energy Sector

The search for directions of sustainable development of both the energy sector as a whole and individual energy systems is a typical problem. The need to solve it arises repeatedly in the process of their operation and development. To solve this problem, the Corrective package was developed. It has been used to study the sustainable development of the energy sector of Vietnam [53].

We study the energy sector development taking into account all combinations of large-scale disturbances that can lead to the insufficient provision of end-consumers with energy resources. Such a study involves a large number of possible states of the energy sector and difficulties in analyzing these states by classical methods.

In this regard, in the package, we use a new approach to the study of the energy sector through combinatorial modeling. It is based on considering various combinations of possible energy sector states and transitions between them for a given period T. Since the entire set of possible states is very large, only a set of admissible states is examined.

We represent a set of states as a directed graph \(G = \langle V, E \rangle\), where \(V\) is a set of vertices (possible states of the energy sector) and \(E\) is a set of edges (transitions between states). Edges are directed from the state \(s_0\) at the initial time \(\tau_0\) towards the states at the next time moment \(\tau_1\), and further towards the states at the next time moments \(\tau_2, \tau_3, \ldots\) (Fig. 4). Each vertex reflects a state of interconnected objects of the energy sector. Such a state includes the operation, reconstruction, modernization, and creation of objects. Each object performs one or more technological operations (storing, production, processing, transformation, and transportation of energy resources).

Fig. 4. Fragment of the graph \(G\).

The combinatorial analysis is carried out by examining various combinations of object states and transitions between states for the period \(T\). The objective is to find a subgraph of the graph \(G\) that includes the paths with the minimum cost of energy sector development.

The general scheme for problem-solving is shown in Figure 5.

Fig. 5. Problem-solving scheme.

The operation \(o_5\) prepares the scenarios for the development of the energy sector of Vietnam. It forms the criteria for evaluating the energy sector functioning, the list of actions for its development, the set of organizational and functional constraints, the simulated period, and other initial parameters.

Then, based on the prepared data, the operation \(o_1\) selects a scenario and constructs a variant of the corresponding sector development model. The operation \(o_2\) decomposes the model into submodels taking into account various logical and balance constraints. The submodels are examined in parallel using instances of the operation \(o_3\).

In each case, we solve the following linear programming problem:

$$f(S) = CX + P(R - Y) \to \min,$$
$$S = \langle A, X, Y, C, P, D, R, U, W \rangle,$$
$$AX - Y \geqslant 0,\quad X \leqslant D,\quad Y \leqslant R,$$

where the problem parameters are interpreted as follows:

S is the generalized state of the energy sector,

C is the vector of specific costs for the technological methods of functioning the existing, reconstructed, modernized, and newly constructed energy sector objects,

A is the matrix of technological coefficients of production, processing, transformation, and transportation of resources, \({{a}_{{ij}}} = {{q}_{{ij}}}\left( {U,W} \right)\),

D is a vector that determines the technically possible intensities of applying technological methods of objects functioning (meaning production and transport of resources), \({{d}_{i}} = {{h}_{i}}\left( U \right)\),

R is a vector that defines the required volume of the resource consumption,

P is the vector of specific losses due to lack of resources,

U is the vector of actions for the energy sector development,

\(W\) is the vector of disturbances.

The first term, CX, in the objective function represents the costs associated with the energy sector functioning. The second term, \(P(R - Y)\), reflects the damage from the insufficient supply of resources to consumers due to disturbances. Such disturbances can include large-scale technological accidents, natural disasters, and terrorist acts.
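For illustration, the linear programming problem above can be solved with an off-the-shelf solver. In the toy sketch below (not the actual Vietnam model), we read X as the vector of intensities of the technological methods (bounded by D) and Y as the vector of resource volumes supplied to consumers (bounded by R); the variables are stacked into one vector, and the constraint \(AX - Y \geqslant 0\) is rewritten as \(-AX + Y \leqslant 0\).

```python
# A toy instance (not the actual energy-sector model) of the LP above, solved
# with SciPy: z = [X, Y], minimize C*X + P*(R - Y) subject to AX - Y >= 0,
# 0 <= X <= D, 0 <= Y <= R.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.5],        # technological coefficients (2 resources x 2 methods)
              [0.0, 1.0]])
C = np.array([3.0, 2.0])         # specific costs of the technological methods
P = np.array([50.0, 40.0])       # specific losses due to lack of resources
D = np.array([10.0, 8.0])        # technically possible intensities
R = np.array([6.0, 5.0])         # required resource consumption

n, m = A.shape
c = np.concatenate([C, -P])                  # C*X - P*Y (P*R is a constant)
A_ub = np.hstack([-A, np.eye(n)])            # -AX + Y <= 0
b_ub = np.zeros(n)
bounds = [(0, d) for d in D] + [(0, r) for r in R]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
X, Y = res.x[:m], res.x[m:]
print("cost:", C @ X + P @ (R - Y), "X:", X, "Y:", Y)
```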

The operation o4 integrates the simulation results, performs their multicriteria analysis, and generates a set of rational ways of the energy sector development for each scenario.

Modules \(m_1\)–\(m_5\) implement operations \(o_1\)–\(o_5\). Their frequent modification is due to the evolution of the algorithmic knowledge of the package. In particular, it is caused by the application of new criteria for evaluating the energy sector functioning, the expansion of the list of actions for its development, and the use of additional constraints on the processes of functioning and development of this sector. In addition, the development of the hardware and software of the environment often leads to the need for recompiling, rebuilding, and retesting modules with these updates.

In this regard, we consider two basic versions, \(v_1\) [12] and \(v_2\) [13], of the package for solving the problem of searching for directions of the sustainable development of the energy sector of Vietnam. The second version is developed using the new software manager of OT.

5.2 Computational Resources

Problem-solving using both of the aforementioned versions of the package was performed in the integrated computing environment that includes the following resources:

• Cloud-based virtualized resources under OpenStack management: 5 dedicated nodes (2 × 16-core AMD Opteron 6276 CPUs, 2.3 GHz, 64 GB RAM) of the HPC-cluster of the public access computer center Irkutsk Supercomputer Center of SB RAS [54],

• Non-dedicated nodes (2 × 18-core Intel Xeon X5670 CPUs, 2.1 GHz, 128 GB RAM) of the HPC-cluster,

• Grid-resources: remote personal computers.

The dedicated resources are intended only for solving the problem under consideration. Non-dedicated resources are shared by all users of the public access computer center. They are used in accordance with its administrative policies and quotas. Users who have already been allocated dedicated resources can use only those non-dedicated resources on which there are free slots in the LRM schedule. In the process of solving the problem in the integrated computing environment, its grid-resources can also be used by their owners.

5.3 Preparation of a Computational Experiment

In general, software modification includes the following stages:

• Modification of the source code of applied and system software, including searching for the correct version in the repositories on the GitLab server, cloning the necessary repository to the OT server or developer server, launching the programming environment, editing the code, and saving the modified code to the GitLab server,

• Automated build of software in accordance with the scenarios predefined in the conceptual model designer for external continuous integration systems,

• Modification of test data including its search on the GitLab server or testing server, launching the data processing system, and editing them,

• Modification of scenarios of continuous integration, delivery, and deployment of software using the conceptual model designer,

• Automated software testing in accordance with the predefined scenarios,

• Automated build of software installation packages in accordance with scenarios of its continuous integration, delivery, and deployment,

• Modification of the conceptual model (parameters, operations, modules, productions, problem-solving schemes, and relationships between them) of the environment, translation of the resulting modification into an XML-description, and verification of its completeness and correctness,

• Automatic deployment of software on resources using installation packages,

• Automated testing of software on resources in accordance with the predefined scenarios including an evaluation of the resource performance during execution,

• Automated testing of package schemes in accordance with the predefined scenarios.

Usually, these stages are repeated many times (up to several times a day) in the process of modifying applied and system software. When developing version \({{{v}}_{1}}\), all of them were executed in non-automated mode.

Evaluations of the average time taken to modify the software of the package are represented in Fig. 6. They include evaluations of the overall time costs (Figs. 6a, 6c) and the developer time costs (Figs. 6b, 6d) at the modification stages for one module with the software manager (version \(v_2\)) and without it (version \(v_1\)).

Fig. 6. Evaluations of the overall time (a, c) and developer time (b, d) in minutes at the modification stages for one module with and without the software manager, respectively.

Figure 6a and Figure 6c reflect the time spent on the stages associated with the modification of the algorithmic knowledge of the package. Their implementation is based on the joint use of OT and external continuous integration systems. The remaining evaluations related to modifying the package only with OT are shown in Fig. 6b,d.

The results represented in Fig. 6 show that the use of the software manager provides a significant decrease in both the time spent by a software developer and the overall time taken to prepare the experiment. This is due to the following factors:

• Reducing the time spent searching for the software source code in repositories,

• Automation of compilation, testing, and building installation packages, deployment and testing of software on resources, and testing problem-solving schemes,

• Reducing the time of compilation, testing, and building installation packages by moving these stages to higher-performance resources,

• Improving the computation reliability in the problem-solving process by preliminary testing software on heterogeneous resources,

• Partial automation of the process of modifying the conceptual model of the environment when changing software specifications in repositories.

Figure 7 shows evaluations of the time spent by the software developer and the overall time taken to prepare and carry out experiments for version \(v_2\) with and without the software manager. The evaluations are calculated on a cumulative basis and shown in minutes in accordance with the software modification sequence. Obviously, the use of the software manager provides a significant decrease in the time of preparing and carrying out experiments.

Fig. 7. Cumulative time evaluations.

5.4 Computational Experiment

We compare the module execution parameters of the two versions \(v_1\) and \(v_2\) when solving typical problems characterized by the same computational complexity. In the process of problem-solving, data of the same structure were used. These data were obtained on the basis of statistical information on the functioning of the energy sector of Vietnam in different periods.

In Table 2, we show the following problem-solving parameters for one scenario of the energy sector development:

Table 2. Module execution parameters

\({{n}_{s}}\) is the number of module launches,

\({{t}_{{avg}}}\) is the average module runtime in seconds,

\({{n}_{f}}\) is the number of module failures,

\({{n}_{r}}\) is the number of module restarts,

k is the coefficient of utilization of the node components at the module execution.

The time of parallel processing of submodels for the energy sector development by instances of the module \(m_3\) constitutes the main part of the problem-solving makespan. The results represented in Table 2 show that the number of failures for the instances of module \(m_3\) in version \(v_2\) has decreased. Accordingly, the number of restarts has been reduced.

The optimization of the module \(m_3\) operation in version \(v_2\) and the elimination of failures in its functioning led to a significant improvement in the coefficient \(k\) for this module, by 10.19%. The resource utilization coefficients for the execution of the remaining modules of version \(v_2\) are also improved.

Figure 8 shows the times \(t_2\) and \(t_3\) of computations in minutes when using the modules of versions \(v_1\) and \(v_2\) to solve the problem. In both cases, dedicated and non-dedicated resources were used jointly. The job distribution was carried out under MAS management.

Fig. 8. Problem-solving makespan.

In addition, the evaluated time \(t_1\) for problem-solving only on dedicated resources when using the modules of version \(v_1\) is given. In this case, the distribution of jobs was carried out under OpenStack management.

In the case of version \(v_1\), the use of additional non-dedicated resources made it possible to reduce the problem-solving time by 18% in comparison with the evaluated time of problem-solving on dedicated resources only. The use of version \(v_2\) reduced this time by an additional 5%.

5.5 Use of Non-dedicated Resources

In this section, we consider the advantages of the multi-agent management of distributed computing when dedicated and non-dedicated resources are applied jointly.

The module \(m_3\) evaluates states every 10 seconds on average. Figure 9 shows the distribution density of the module \(m_3\) runtime for both package versions. In both cases, the runtime mostly lies between 5 and 10 seconds. This enables agents to balance the resource load well.

Fig. 9. The distribution density of the execution time.
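The way such a runtime distribution can be estimated from execution logs is sketched below (Python); the logged runtimes are randomly generated placeholders, not the data behind Fig. 9.

import random

random.seed(0)
# Hypothetical log of m3 runtimes in seconds (placeholder data)
runtimes = [random.uniform(4.0, 12.0) for _ in range(1000)]

# Empirical share of runtimes falling into each interval
bins = [(0.0, 5.0), (5.0, 10.0), (10.0, 15.0)]
for lo, hi in bins:
    share = sum(lo <= t < hi for t in runtimes) / len(runtimes)
    print(f"[{lo:.0f}, {hi:.0f}) s: {share:.1%}")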

The estimated problem-solving makespan on a single core is more than 60 days. Applying only virtualized resources with 160 VMs reduces the makespan to about 9 hours for both package versions. This is achieved in part by starting VMs in non-dedicated resources, to which a part of the computational load is re-allocated.
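This figure is consistent with near-linear scaling of the single-core estimate over 160 VMs:

\[
\frac{60\ \text{days} \times 24\ \text{h/day}}{160\ \text{VMs}} = 9\ \text{h}.
\]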

Figure 10 shows the numbers of free and used slots for every hour of problem-solving for both package versions.

Fig. 10. The number of slots per hour.

The numbers of free slots in the LRM schedule of non-dedicated resources are shown in green; the numbers of used slots are shown in light green.

The distribution densities of the number of cores in slots are shown in Fig. 11. Most of the slots have 32 cores.

Fig. 11. The distribution density of the number of processors in slots.

Time intervals of the used slots are represented in Fig. 12. Many slots have time intervals that are insufficient for the effective launch of VMs. An exponential fit is used in the histograms.

Fig. 12. The distribution density of slot time intervals.

We used 16 of the 55 free slots for \(v_1\) and 17 of the 56 free slots for \(v_2\). For \(v_1\) (\(v_2\)), 15 (18) free slots occurred in nodes where jobs were already executing, and starting VMs there could negatively affect the execution time of those jobs. Another 24 (21) slots existed for too short a time, so the efficiency of running VMs in them would be extremely low. In the best case, three slots were used at a time, and the maximum number of additional cores was 96 in each scenario. At the same time, the average processor utilization in non-dedicated resources was about 90% in both cases.
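The slot analysis above can be summarized as a simple filtering step. The sketch below (Python) uses illustrative field names and an assumed minimum-interval threshold, since the exact criteria are not specified here.

from dataclasses import dataclass
from typing import List

@dataclass
class Slot:
    node: str
    cores: int
    interval_s: float   # time interval of slot availability, seconds
    node_busy: bool     # jobs are already executing on the node

MIN_INTERVAL_S = 600.0  # assumed threshold for an effective VM launch

def usable_slots(free_slots: List[Slot]) -> List[Slot]:
    # Discard slots on busy nodes (VM startup may slow the running jobs)
    # and slots that exist for too short a time.
    return [s for s in free_slots
            if not s.node_busy and s.interval_s >= MIN_INTERVAL_S]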

For jobs executing the module \(m_3\), we have created a special job class. It includes characteristics of jobs that can be executed in both dedicated and non-dedicated resources. Information about these characteristics is represented in Table 3.

Table 3. Job class characteristics

Table 3 shows the following data: the job characteristic, its type, domain, unit, the attribute "mandatory" or "optional", the specialized domain for the class, and an example of the characteristic value in the job specification. In this specification, characteristics 1-3 represent information about the executed file.

Characteristics 4-6 describe the requested resources. The selection of the right VM image is provided by characteristics 7 and 8. The middleware in the VM is determined by the system variable "framework", which was equal to "DISCOMP" for \(v_1\) and to "OT" for \(v_2\). Characteristics 9 and 10 determine restrictions on slots in the LRM schedule. Note that characteristics 11-13 are optional; therefore, their values may not be defined in a job specification.
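A hypothetical job specification for this class is sketched below (Python). The field names are illustrative and do not reproduce the exact characteristics of Table 3; only the "framework" values are taken from the text.

# Illustrative job specification for the m3 job class (field names are assumptions)
m3_job_spec = {
    # characteristics 1-3: the executed file
    "executable": "m3",
    "executable_path": "/path/to/package/bin",   # hypothetical path
    "arguments": "--scenario scenario.dat",
    # characteristics 4-6: requested resources
    "cores": 32,
    "memory_gb": 64,
    "walltime_s": 600,
    # characteristics 7-8: VM image selection
    "vm_image": "energy-package-image",
    "framework": "OT",                            # "DISCOMP" for v1, "OT" for v2
    # characteristics 9-10: slot restrictions in the LRM schedule
    "min_slot_cores": 32,
    "min_slot_interval_s": 600,
    # characteristics 11-13 are optional and may be omitted
}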

The job classification determines restrictions on slots in the job class specification in advance. Using this knowledge, agents then automatically select slots that ensure the efficient use of resources.
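Given such restrictions in a class specification, agent-side slot selection can be sketched as follows (Python, illustrative field names and values):

# Restrictions taken from a job class specification (illustrative values)
job_class_restrictions = {"min_slot_cores": 32, "min_slot_interval_s": 600}

def slot_matches(slot: dict, restrictions: dict) -> bool:
    return (slot["cores"] >= restrictions["min_slot_cores"]
            and slot["interval_s"] >= restrictions["min_slot_interval_s"]
            and not slot["node_busy"])

free_slots = [
    {"cores": 32, "interval_s": 900.0, "node_busy": False},   # suitable
    {"cores": 16, "interval_s": 1200.0, "node_busy": False},  # too few cores
    {"cores": 32, "interval_s": 300.0, "node_busy": True},    # busy node, short interval
]
selected = [s for s in free_slots if slot_matches(s, job_class_restrictions)]
print(len(selected))  # 1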

For dedicated virtualized nodes, we achieve close to linear speedup. In the case of \(v_1\), job re-allocation to non-dedicated resources reduces the problem-solving time by about 18% in comparison with the OpenStack platform, which cannot provide dynamic elasticity of VM provisioning in non-dedicated resources. In the case of \(v_2\), this indicator is improved by a further 5% in comparison with \(v_1\).

6 PRACTICAL USE

OT is applied in the Irkutsk Supercomputer Center of SB RAS. It is usually used to support large-scale computational experiments within initiative scientific research.

At the same time, OT has also been successfully used to solve many practical problems, including the following:

• Studying the development scenarios of the energy sector of Vietnam [13, 53],

• Analysis of the vulnerability of critical infrastructures in the energy sector of Russia [9],

• Optimization of warehouse logistics [55],

• Search for global minima of multiextremal functions [56].

In addition, students of Irkutsk State University actively use OT within the training course Parallel Computing. They receive both group training in solving typical problems with traditional parallel programming technologies and individual training in solving specialized problems with the OT technologies.

The community of the OT users includes the following groups:

• A team of scientists developing and supporting the framework,

• Developers, administrators, and end-users of distributed applied software packages that were developed for solving the aforementioned practical problems,

• Students.

7 CONCLUSIONS

OT provides a unified environment that integrates clouds, grids, and public access supercomputer centers. It includes tools for the development and use of distributed applied software packages based on modern methods within the paradigms of parallel, distributed, and virtualized computing.

In this paper, we present new tools of the framework for mitigating the uncertainty of job allocation to resources. Our contribution is manifold.

We define uncertainty as a deficiency of the available knowledge relative to the knowledge needed to make a near-optimal job distribution decision.

We provide the elicitation of the additional knowledge needed to estimate module runtimes in different nodes of a heterogeneous integrated computing environment.

We propose methods for classifying jobs in accordance with their characteristics and for matching resource capabilities to job classes.

We extend systems for the continuous integration, delivery, and deployment of software for scalable scientific applications (distributed applied software packages) with workflows.

However, some problems cannot be solved with the help of such systems. To this end, we have implemented our own tools for the meta-description of all processes of continuous integration, delivery, and deployment of both applied and system software. In comparison with existing systems, these tools support modification of the conceptual model of the integrated computing environment and automation of planning and testing of problem-solving schemes.

We have provided the results of a comprehensive comparative analysis of two modifications of the distributed applied software package. These versions were developed, with and without the proposed tools, for studying directions of the sustainable development of the energy sector of Vietnam. The results show the benefits of using the developed tools: in particular, the reliability of distributed computing is significantly improved and the problem-solving time is reduced.

Our further studies are related to the use of OT in creating digital twins of important infrastructure objects of the Baikal natural territory. The aim is to optimize the application of advanced environmentally friendly technologies at these objects.

OT is based on the technology of creating applied software packages, which has been under development at the institute since the 1980s. It is promoted by the Matrosov Institute for System Dynamics and Control Theory of SB RAS.