1 Introduction

Having emerged with the advent of Industry 4.0 [1], the term digital economy has become a full-fledged concept embraced by scholars and practitioners alike. To date, the term denotes an economy based on the implementation of computer technologies and techniques in economic activities. Big Data and Artificial Intelligence (AI) transform the environment, traditional concepts (smart homes, unmanned vehicle systems, smart contracts, and blockchains (distributed ledgers)), economic activities, and industries into a qualitatively new state (smart agriculture, digital healthcare, etc.). At the same time, these activities rely on data exchange and decision support systems as vital architectural components required to respond efficiently to possible challenges.

The operational features that are crucial for the reliable operation of Industry 4.0 computer technologies include:

  (1) processing large amounts of data, taking into account not only the Big effect but also the Open effect, which consists in the regular updating of the current database with new information;

  (2) dealing with the time constraints imposed on the Data Analysis (DA) and decision-making time, the so-called process-real time.

The development and implementation of Industry 4.0 technologies have shown that, in the conditions under consideration, Big Data analysis can be carried out effectively and reliably only by using Artificial Intelligence techniques (this term representing a general name for the relevant field of research and development). Thus, a problem-oriented approach using Intelligent Data Analysis (IDA), that is, Big Data analysis performed by computer systems that simulate and enhance the cognitive capabilities of an expert researcher, is an effective means to respond to these kinds of challenges.

The need to develop approaches, techniques, and means that ensure reliable operation of infrastructure components and help protect and maintain business continuity in the complex technologies employed in Industry 4.0 appears to be a critical prerequisite for appropriate decision-making in the industry.

When developing techniques for decision-making systems, it is vital to take into account the following requirements [2,3,4,5,6,7, etc.]:

  • supported by IDA techniques, guidelines, conclusions, and recommendations for Decision-Makers (DM) must be transparent, that is, meaningfully interpretable and clearly explained in terms and concepts adopted in the analyzed subject area, because the decision-maker takes full responsibility for the consequences that arise as a result of their implementation;

  • conclusions and recommendations provided by decision support systems (DSS) should remain stable when updated with new evidence on the control object; the causal factors in the DA and DSS, which determine the presence of the studied effect or phenomenon, need to be identified. Consistent refinement of ideas about such causal influences, based on constantly updated empirical data on the behavior of the control object, allows us to reliably and effectively maintain the business continuity mode of the relevant technologies and end solutions in Industry 4.0;

  • Big Data processed under process-real time constraints requires special efforts to control and optimize the inspection of possible alternatives when searching for appropriate management decisions (more formally speaking, targeted control of computations to identify a reliable alternative).

Previous studies have shown that even when we deal with relatively small initial data (see [3,4,5,6,7, etc.] for mathematical problems of the diagnostic type), some computationally hard—provably intractable—combinatorial problems arise [6, etc.]. In this case, a critical role belongs to various methods for controlling the inspection of case studies when searching for a reliable alternative (the development of effective heuristics for finding particular solutions, of approximate methods, etc.).

The techniques discussed seem promising for diagnostics and support in large technical systems such as data centers and telecommunication infrastructure complexes. These techniques are efficient for fraud protection in the financial sector and for decision-making support under force majeure in the industrial sector. The latter includes crisis management and support for the survivability (business continuity) of the control object under conditions of failures and other incidents that disrupt its operation.

2 The Scientific Problem

From a formal point of view, the industrial applications described above tackle specific diagnostic tasks [3,4,5,6,7, etc.]. In this case, the analyzed data represent descriptions of samples with the presence (or, conversely, the absence) of the phenomenon or effect under study—the so-called precedents. The initial database, regularly updated with descriptions of new precedents of the controlled object behavior, can be summarized in the table Precedents × Features, where a concrete value is assigned to each of the observed features of a specific precedent. The purpose of the IDA is to identify the causal factors that determine the presence/absence of the analyzed effect/phenomenon. Reconstruction of causal factors from the collected empirical data ensures the effectiveness and reliability of decisions based on the conclusions and recommendations derived by Intelligent Data Analysis, owing to a targeted search over the identified causal factors.
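To make the Precedents × Features structure concrete, the following minimal sketch (in Python, using pandas) shows one possible representation of a regularly updated Fact Base; the feature names, values, and the pandas-based layout are illustrative assumptions and not part of the cited works.

```python
# A minimal sketch of the Precedents x Features table; all names and values
# here are illustrative assumptions, not data from the cited studies.
import pandas as pd

# Each row describes one precedent; the "effect" column marks the presence
# or absence of the analyzed phenomenon; missing observations stay as NA.
fact_base = pd.DataFrame(
    {
        "feature_a": [1, 0, 1, None],
        "feature_b": [0.7, 0.2, 0.9, 0.4],
        "feature_c": ["high", "low", "high", "low"],
        "effect":    [True, False, True, False],
    },
    index=["precedent_1", "precedent_2", "precedent_3", "precedent_4"],
)

def update_fact_base(fb: pd.DataFrame, new_precedents: pd.DataFrame) -> pd.DataFrame:
    """Append descriptions of newly observed precedents (the 'Open' effect
    mentioned in the Introduction: the database is regularly extended)."""
    return pd.concat([fb, new_precedents])
```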

In Industry 4.0, the management of complex industrial facilities involves risk assessment for the further development of the current situation—the state of the control object that can evolve in various possible ways. Simultaneously, it is vital to consider several requirements and restrictions that vary from the desire to enhance profits to the need to maintain business continuity by ensuring the operational stability of high-tech business structures, repairing various technical failures, and countering disruptive impacts. In this case, decision-making should rely on specific control systems that use built-in models to control an object’s behavior in the changing environment. Here, we deal with solutions of two types. The first type uses stationary models of large objects and of the cooperation of objects, ported on large geographically distributed computing systems (see, for example, [8, 9] and others). The second type includes various mobile and onboard systems, taking into consideration their limitations in functionality and performance.

The computer models for the control systems described above are as follows. (1) Mathematical models that now have classical status use systems of balance-type relations, e.g., models based on systems of equations that describe various effects in physics, economic cross-production or cross-industrial balances, etc. (2) Interpolation-extrapolation models implement learning procedures based on precedents: machine learning, data mining, etc.

To date, the use of balance models prevails in cases with a sufficiently small set of influence factors (possible causes), e.g., in the characteristic description of effects and phenomena in natural science (physics in particular) and technology (technical diagnostics in particular). The efficient use of such mathematical models in Big Data analysis of multi-causal effects and phenomena is significantly limited, both analytically and numerically. Consider, in particular, the objectives of sociological studies that should account for the typology of society, the identification of rational grounds for electoral decisions in various social groups, etc., or the purposes of modern high-tech medical diagnostics. It is precisely these restrictions that have become the most significant driver for the development and improvement of interpolation-extrapolation models, which use training samples of precedent descriptions under dynamically changing conditions of the subject under study with its characteristic effects and phenomena. The COVID-19 pandemic represents such a situation: the virus is subject to constant mutations, and the need exists to promptly analyze its impacts on people and the environment (pets, etc.).

The mechanism by which this class of models functions is the interpolation of the training set of precedents with the presence (and/or absence) of the studied phenomenon/effect by empirical dependencies of one kind or another, followed by checking whether the dependencies found in this way extend to newly analyzed precedents in order to predict the presence (or, vice versa, the absence) of the studied target effects/phenomena.
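The interpolation-then-extrapolation loop described above can be illustrated with the following hedged sketch; a decision tree is used here purely as a placeholder for "empirical dependencies of one kind or another" and does not represent the specific models referenced in this chapter.

```python
# Schematic interpolation-extrapolation loop; the decision tree is only a
# placeholder for an empirical dependency fitted on the training precedents.
from sklearn.tree import DecisionTreeClassifier

def fit_and_extrapolate(X_train, y_train, X_new):
    # Interpolate the training set of precedents (presence/absence of the
    # studied effect) by an empirical dependency.
    model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
    # Extrapolate the found dependency to newly analyzed precedents to
    # predict the presence (or absence) of the target effect.
    return model.predict(X_new)
```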

To date, a topic of general interest is the personalization of products and services, which is a systemic feature of Industry 4.0. The individual settings and the need to classify consumers into homogeneous groups give rise to the issue of finding a balance between individual and group needs. We mean the following: can an individual consumer's need simultaneously be a commercially significant need of a group of consumers that, after the costs are covered, can bring a profit? Where should we draw economically meaningful boundaries between homogeneous groups that retain, to a certain extent, their characteristics? So, researchers set the task of quickly identifying economically significant homogeneous groups of consumers of Industry 4.0 products and services. Various network communities and collaborations [10,11,12, etc.] have become a topical research objective. It became clear that at the technical—procedural, algorithmic—level, identification of the boundaries between homogeneous groups of consumers in terms of properties such as needs and interests has several specific features. First, it is necessary to promptly analyze large amounts of initial data (Big Data) to identify the sought-for patterns under the specified time constraints for data analysis and decision-making. It is required to process many alternatives (candidates for the group in question) within constrained time intervals because of the specifics of the problem being solved. An example of a successful solution to the issue under discussion is the well-known story of voting in the UK and USA associated with the Cambridge Analytica company [13].

Apart from deciding whether a specific object belongs to a group of similar objects, it is vital to explain why we assign it to this group, as this helps to suggest a course of action, select a treatment scheme based on a medical diagnosis, develop a repair plan based on the results of technical diagnostics, take measures to counteract cyber attacks, etc. A lack of consideration of the causal factors can lead to undesirable effects—the inability to resist threats efficiently.

Today, it is evident that solving this class of problems in industrial applications requires artificial intelligence computer systems. Such solutions rely on IDA and machine learning for supporting decision-makers, and on professional collaborations (including their operational formation and reorganization in response to regularly occurring changes in the functional environment) as efficient techniques for Industry 4.0 [10,11,12, etc.], given time limits for data analysis and decision-making.

3 The Scientific and Practical Relevance of the Discussed Problem

Modern high-tech medicine as a research and development area uses Industry 4.0 technology to provide effective treatment. The use of modern techniques and technologies for medical decision-making support is a universal component of the inclusive growth and development of national digital economies. Using these tools makes it possible to create favorable conditions for improving the quality of life and ensuring equality of opportunities for all social groups of the population, regardless of geographical location. Evidence-based medicine aims at personalization, proactivity, and prevention of medical interventions in conditions where it is necessary to analyze Big Data within a restricted time. The requirements to ensure sufficient evidence for decision-making and, finally, to enable decision-makers to be responsible for the consequences of the decisions made present strong arguments in favor of choosing medicine as a model for illustrating the approaches discussed in this chapter.

In high-tech medicine, the data required for successful diagnosis and subsequent therapy comprise a complete personalized record of a patient. It includes the anamnesis and follow-up data (immune status, heredity, genomic and metabolic data, etc.), exhaustive lists of causal factors for the analyzed effects and pathologies, and the dynamics of changes in the observed indicators over time. Here, computer techniques for data analysis lead to improved diagnosis and assessment of medical therapeutic effects and possible complications, side effects, etc.

An example of a mathematical technique adequate to the peculiarities of the performed computer DA (see [3,4,5,6] and others) is the procedural construction of the so-called Characteristic Functions (CF), which enables developing empirical theories based on regularly updated experimental data. Each of these theories is a consistent set of formulae, empirical dependencies that interpolate the available training samples of precedents—the Database of Facts (Fact Base). The theory describes all the facts available—the precedents from the Fact Base—as the logical consequences of the assumptions that form this evidence-based theory. These statements are accepted as valid because they are incontestable on the current samples of precedents from the interpolated Fact Base. At the same time, the Fact Base updated with descriptions of new precedents requires a re-evaluation of the acceptability of the assumptions that form the current evidence-based theory, which is thus a dynamically changing structure reflecting the evolution of our knowledge about the subject area under study.

Within the framework of these theories, verification of the uncontested conclusions can be completed using an argumentation scheme for assessing the acceptability of conclusions. The PRO arguments, empirical dependencies responsible for the presence of the target effect/phenomenon, are analyzed against the current Fact Base. The PRO arguments are opposed by the CONTRA arguments: facts and dependencies characterizing the precedents where the target effect is absent. Thus, medicine, a subject area regularly updated with new evidence and knowledge, is subject to formalization based on the principles and techniques of open theories. The latter represent expandable collections of empirical evidence (facts) and knowledge represented by local empirical dependencies.
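A schematic reading of this argumentation scheme is sketched below; the data structures (dependencies as Boolean predicates over a case description) are assumptions made for illustration, not the formal apparatus of the cited works.

```python
# A hedged sketch of the PRO/CONTRA acceptability check: a conclusion is
# treated as uncontested when at least one PRO dependency applies to the case
# and no actual CONTRA argument does. Data structures are illustrative only.
from typing import Any, Callable, Dict, List

Case = Dict[str, Any]                 # one precedent / patient description
Dependency = Callable[[Case], bool]   # an empirical dependency (PRO or CONTRA)

def conclusion_is_acceptable(case: Case,
                             pro: List[Dependency],
                             contra: List[Dependency]) -> bool:
    """Assess whether the diagnostic conclusion for `case` is acceptable."""
    return any(d(case) for d in pro) and not any(d(case) for d in contra)
```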

4 Competitive Advantages of the Approach

By definition, a Characteristic Function (CF) is a logical condition that binds especially salient elements of case descriptions—examples and counterexamples of a diagnosed effect/phenomenon from the current Fact Base (FB)—and takes (a code-level sketch of this defining property follows the list):

  • the value that is true on all facts (examples) of the current FB, characterized by the presence of the analyzed target property (diagnosed phenomenon),

  • the value that is false on all facts (counterexamples) of the current FB, characterized by the absence of the analyzed target property.
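The defining property above translates directly into a simple check; the following sketch assumes cases are represented as attribute-value dictionaries and candidate conditions as Boolean predicates, which is an illustrative choice rather than the formalization used in [3, 5, 6].

```python
# A minimal sketch of the CF definition: a candidate condition is a CF for
# the current FB iff it is true on every example and false on every
# counterexample. Representations here are illustrative assumptions.
from typing import Any, Callable, Dict, List

Case = Dict[str, Any]
Condition = Callable[[Case], bool]

def is_characteristic_function(cond: Condition,
                               examples: List[Case],
                               counterexamples: List[Case]) -> bool:
    return (all(cond(c) for c in examples)
            and not any(cond(c) for c in counterexamples))

# Hypothetical candidate: a conjunction of especially salient description elements.
cf_candidate = lambda case: bool(case.get("factor_x")) and not bool(case.get("factor_y"))
```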

Papers [3, 5, 6] give a fairly detailed consideration of the procedures for generating CFs for a given FB. The mathematical properties of the CFs procedural toolbox in their most general form are as follows:

  1. in the general case, the exponentially fast growth of the size of the classes of generated empirical dependencies with a linear increase in the number of elements in the original training sample of precedents;

  2. as a consequence, the lack of opportunities in many significant applications to identify all conflicting pairs of empirical dependencies (which lead to controversy in the diagnosis) by the so-called “brute-force” method, i.e., by exhaustive enumeration of all candidates;

  3. at the same time, the existence of efficient, fast (polynomially complex) solutions to some key combinatorial problems.

In a formalized form, this can be summarized in the following statements (a simplified code sketch of the first statement is given after the list):

  • the problem of checking the causal representativeness of the current training sample of FB precedents (see [5, 6, etc.] on the non-emptiness of the set of CFs formed on its base) is effectively solvable: there is an algorithm of polynomial computational complexity that generates its solution;

  • the problem of enumeration of all CFs formed on a specific FB is enumeratively complete [5, 6];

  • the problem of enumeration of all elements in the set CF(FB ∪ ∆FB) formed on the current FB upgraded by descriptions of new precedents ∆FB is enumeratively complete [5, 6] (with the effective solvability of the non-emptiness—causal representativeness—checking problem for such a set CF(FB ∪ ∆FB));

  • the problem of determining the number of elements in the set of characteristic functions that are true on a newly diagnosed patient—extrapolated to their description—is enumeratively complete (with the effective solvability of the non-emptiness—causal representativeness—problem for the set of such CFs).
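As a hedged illustration of why the causal-representativeness (non-emptiness) check can be polynomial while full enumeration is not: under the simplest Boolean quantification, a separating condition exists whenever no example coincides with a counterexample on all observed features, and this can be verified by pairwise comparison. The sketch below is an assumption-level simplification, not the published algorithm from [5, 6].

```python
# Polynomial-time sketch of the non-emptiness (causal representativeness)
# check under a strong simplifying assumption: the set of CFs is non-empty
# iff no example is indistinguishable from some counterexample.
from typing import Any, Dict, List

Case = Dict[str, Any]

def is_causally_representative(examples: List[Case],
                               counterexamples: List[Case]) -> bool:
    # O(|examples| * |counterexamples|) pairwise comparison.
    return all(ex != cx for ex in examples for cx in counterexamples)
```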

5 Results

5.1 Intelligent Data Analysis and Medical Diagnosis Based on Empirical Data

Using the example of identification and prediction of the post-treatment radiation effect, the so-called pseudoprogression of brain tumors (PsP) (see, for example, [14,15,16,17,18,19,20] and others), we will show how the presented IDA technique works in medical diagnostics. Pseudoprogression refers to temporary changes in the tumor and perifocal tissues observed on magnetic resonance imaging (MRI). It is challenging to differentiate post-radiation treatment effects from tumor recurrence by MRI alone, so it is important to understand the radiological and clinical presentation that distinguishes these two entities in order to guide management. A radiation-induced increase in tumor size and in the contrast-enhancing lesion usually occurs shortly after radiation therapy and resolves spontaneously, without treatment, in the majority of cases. The observed changes that characterize pseudoprogression are as follows:

  • increased accumulation of contrast enhancement with or without the border expansion (T1 c/i),

  • increased edema (swelling) of the tumor and perifocal tissues (T2/FLAIR),

  • an increase in tumor size due to a cyst and/or a solid component.

From the point of view of data analysis and medical decision support, pseudo-progression is specific primarily by the lack of accurate data that can reliably determine:

  • the time of PsP development,

  • the duration of PsP occurrence,

  • diagnostic standards for PsP that differentiate PsP from tumor progression,

  • recommendations for the treatment of patients with PsP, and prognostic factors for PsP development.

The study of the PsP effects with the aid of IDA focuses on the three main objectives that include the diagnosis of the PsP effect, the diagnosis that differentiates PsP from tumor progression, and the identification of the so-called markers of pseudoprogression that allow for an uncontested diagnostic conclusion about the presence of PsP in a new patient.

The initial data (FB) of the study consist of descriptions of 410 patients of the N.N. Burdenko National Medical Research Center of Neurosurgery collected over about 15 years. The Database of Facts comprises data on 67 precedents of the PsP effect (described in the terminology we have adopted above) and 343 precedents of its absence (counterexamples). Each of the precedents is described by 150 parameters, quantified by values (Boolean, numerical, etc.), but some parameters lack data for some patients. Although the FB is regularly updated, for obvious reasons this occurs at a slow pace, with only several dozen descriptions of new patients added annually.

Thus, in terms of mathematical models and methods of computer data analysis, we have to deal with a training sample of a limited size. At the same time, when processing these data, we need to operate with real Big Data effects. It is clear that even if we consider only the Boolean quantification of the data on precedents (when a description attribute significant for PsP is either involved (has an impact) or not), we should deal with more than 2^150 combinations of values for the parameters under consideration. We have to identify, among all the descriptions of the current FB precedents, the meaningful combinations of impact factors—the empirical dependencies—which determine (see above) the occurrence of the target effect—PsP of the tumor.
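A one-line computation makes the scale of this search space explicit (the figure follows directly from the Boolean quantification of 150 parameters):

```python
# Under purely Boolean quantification, 150 description parameters yield
# 2**150 possible value combinations, which rules out brute-force enumeration.
n_parameters = 150
print(f"{2 ** n_parameters:.3e} combinations")   # ~1.427e+45
```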

Some knowledge about the nature of pseudoprogression occurrence in the situation under consideration can be obtained using traditional statistical tools for testing medical empirical data. The literature analyzing the risks of PsP development (see, in particular, [17,18,19] and others) found that a statistically significant indicator, characterizing an empirical correlation, is the presence of a cyst in a tumor with infratentorial localization in patients over 11 years of age in the FB under study. However, the approach that uses statistical tests to analyze the sets of correlations between the parameter values in the descriptions of the current FB cases does not solve vital questions concerning the accuracy of diagnostic conclusions and the personalization of recommendations based on possible correlations of the observed parameter values. It does not provide meaningful explanations and informal interpretations of diagnostic decisions based on the identified correlations.

Papers [4, 5, 7] present a detailed description of the IDA results; in particular, the analysis enabled us to separate the examples and counterexamples (descriptions of precedents characterizing the presence and absence of the PsP effect) by meaningfully interpreted conditions, based on an explicit form of the Characteristic Function, and to give sufficiently detailed (multifactorial) descriptions of markers of the development of pseudoprogression. One of the empirically established results is the structure, common to all Characteristic Functions formed on the given FB, in which two groups of impact factors are present:

  • the presence of those causal influences that force the emergence of PsP (the PRO arguments that provide sufficient grounds for accepting the results of the performed IDA), and

  • the absence of those impact factors that indicate any deviations from the development of events typical for the PsP effect (the absence of actual CONTRA arguments in assessing the sufficiency of grounds to consider the generated IDA results acceptable).

5.2 The Proposed Approach and COVID-19 Pandemic-Induced Medical Services Transformation

In the context of the discussed features of Industry 4.0, we have to agree that the personalization of decisions and recommendations has fundamental significance. The current situation with COVID-19 viruses is one of the strongest arguments in favor of prioritizing personalization. Indeed, vaccination as a universal (proposed for everyone) means of tackling a pandemic has both clear advantages and disadvantages. Combating COVID-19 has taught us to perceive its effects on people as a combined multi-stage pathology. At the first stage, the virus establishes itself in the human body; then, at the second stage, the immune system responds to the virus; and, at the third stage, we have to deal with various complications, the consequences of the first and second stages. It is undeniable that the multidimensionality of the second and third stages (the variety of personalized characteristics of the course of these stages of the disease in different patients) is the prominent argument in favor of the resource futility of attempts to develop a universal remedy (medicinal product) for the second and third stages. There is too much variety in the individual responses of various patients to the effects of COVID-19 viruses.

Given this variability, the development of a highly targeted drug that would take individual responses to the infection into account would require enormous resources and time to conduct appropriate laboratory and clinical tests. At the same time, the formation of a typology of patients at the first stage of the infection, and the elaboration of treatment schemes that target homogeneous groups of patients, is one of the few (if not the only) alternatives for personalized COVID-19 treatment. Identification of typologies, which would start with express diagnostics at the early stages of the disease, based on the constantly growing evidence (fact base) of clinical practice (both positive and negative), is an obvious tactic for improving therapeutic measures in the current situation. Moreover, we note that the clinical practice described above does not need to counteract the destructive activities of various COVID dissidents and all kinds of anti-vaxxers.

Procedurally, the identification and maintenance of the proposed typologies, taking into account the dynamics of the epidemic, is a practice whose methodology and relevant IT tools have been elaborated in detail (see, for example, the case of Cambridge Analytica [13], etc.). Speaking about the latest approaches and mathematical models developed to identify typologies of the type under discussion, one can consider the experience of using artificial intelligence and data mining (see, in particular, [21] and others).

Various research collaborations could play a critically important role in the identification of the discussed typologies. Cross-national and cross-border integration of regularly updated empirical data from research teams can allow us to fine-tune the personalized characteristics of each group of patients in the typologies identified. An example of a setting is the ability to manage the balance between the size of the group and the information content of its description, which is critically important for the effectiveness of the relevant recommendations and treatment measures, taking into account the limitations on the resources available when fighting emerging pathologies. In this context, the growing importance of network collaborations accumulated in the framework of Industry 4.0 is indisputable [10,11,12].

Identification of the proposed typologies requires not only classifying the studied precedents but also highlighting the causal grounds for the classification. The informativeness of the description of causal influences, which allows for a meaningful explanation and interpretation of the emerging typologies, provides opportunities to use the characteristic features of the formed classes of similarity of the analyzed precedents when tackling the related pathologies and threats. Medicine, technical diagnostics, fraud protection in the financial sector, cybersecurity, etc. are some of the many areas that can use such types of causal relations as a basis for countermeasures that primarily target the causes of the identified threats.

The approach we have considered using the example of combating COVID-19, covering the personalized characteristics of the objects of analysis, the support of collaborations, and the formation of typologies, can also be extended to several other areas characteristic of Industry 4.0. Along with the previously mentioned medical and technical diagnostics, these areas include providing business continuity of large infrastructure solutions for Industry 4.0, fraud protection in banking and finance (including the identification of and protection against insider activities, etc.), analysis of public opinion (typology of society, identification of rational grounds for the electoral choice of various social groups, etc.), and others.

6 Conclusion

Research programs undertaken in industrialized countries [22, etc.], which focus on support and development of the technology initiatives of Industry 4.0, draw our attention to the development of mathematical techniques and computer systems for the automated formation of evidence-based theories of the type discussed above. The AI techniques allow us to effectively manage large amounts of empirical data from various sources: data recorded by various indicators and other measuring equipment, production and operation monitoring data, etc. AI dealing with regularly updated databases and promptly utilizing the outcomes in decision-making support systems ensures an adequate level of situational awareness for decision-makers and allows predicting the further development of the current situation.

Here, we assign the fundamental role to:

  (a) methods and technologies of computer-oriented representation of knowledge about the analyzed subject area in the form of dynamically changing logical theories (consistent sets of formulas—empirical dependencies) of a special type, the so-called partial empiric (evidence-based) theories. At the same time, such partial (empirical, regularly reorganized/reformed with the arrival of new empirical data) theories are used as a computer-oriented technique for supporting decision-makers’ situational awareness;

  (b) methods and technologies utilized for predicting the development of the current situation analyzed by the decision-maker (including the identification of potential risks that may arise from making decisions based on the suggested alternatives) by extending the existing empirical dependencies;

  (c) AI systems that provide just-in-time automated generation and maintenance/modification of evidence-based theories targeting a wide range of users, varying from heads of local business units to analysts in large centers for data analysis and decision support.