1 Introduction

The development of information and communication technologies offers an opportunity to transform traditional production into smart production. This digital transformation significantly impacts production processes and drives the fourth industrial revolution - Industry 4.0 [1].

The development of Industry 4.0 technologies (e.g. Internet of Things (IoT), Cyber-Physical Systems (CPS), Cloud Computing and Artificial Intelligence) results in a constant increase in the amount of data obtained during production processes. Moreover, the large amount of raw data generated in production systems, called Big Data [2, 3], is becoming key to optimizing production processes and improving competitiveness [4, 5]. In this way, the data acquired across the product life-cycle can be converted into knowledge with a positive impact on all aspects of production [6].

However, Big Data, for all the positive expectations, also brings challenges that production companies have not encountered until now. Specifically, the biggest challenge that today’s production faces is the processing and analysis of Big Data. Consequently, Big Data speeds up the development of data analytics technology, which detects hidden information in data collected by different machines and devices in production processes using advanced analytical techniques (advanced statistical analysis, machine learning and expert systems). In this way, the emphasis is put on eliminating the issues that arise in production systems [7]. Another challenge of using Big Data is that very often it cannot be processed and analyzed using existing software applications and personal computers due to insufficient processing power [4]. Therefore, new technologies, such as Cloud and Edge Computing, apply advanced data analytics techniques to detect hidden information and so overcome the problems of processing and analyzing the generated data.

The discovery of hidden information in raw data is enabled by real-time predictive data analytics. Based on real-time predictive data analytics, real-time decision-making models are applied in the production system [8], transforming a reactive production system into a predictive one. Predictive production systems enable proactive behavior, making it possible to anticipate an error before it occurs in the production process and to take appropriate actions immediately to avoid it [9]. Thus, we argue that in the near future of Industry 4.0 real-time data analytics will play a major role in the development of predictive production systems able to process and analyse data as it is generated [10].

The present research contributes to the body of real-time data analytics literature by proposing a conceptual framework for the development of a real-time predictive model for data analytics. Notably, the proposed framework represents the basis for the development of real-time predictive models based on datasets collected from the production process.

The present paper is organized as follows. Section 2 provides a theoretical background of real-time data analytics for smart production systems and details the research method. Section 3 proposes a framework for real-time data analytics for Industry 4.0. Finally, Section 4 derives conclusions and outlines future research.

2 Theoretical Background and Research Method

This section summarizes the state of the art of research in the subject fields, namely: Industry 4.0, predictive production systems, IoT, and real-time data analytics for smart production. Further, the research method is described.

2.1 Background: Real-Time Data Analytics for Smart Production

Industry 4.0 -

Recently, Industry 4.0 has become one of the major research and development topics for industry and academia [7, 8]. The concept of Industry 4.0 can be defined in different ways, depending on the point of view and field of inquiry [1, 7]. The present research views Industry 4.0 through the lens of industrial production processes and the application of data science based on data analytics, and uses the following definition: “Industry 4.0 is a concept that aims to increase production processes efficiency by collecting and interpreting data using data analytic techniques (e.g. data mining, statistical analysis etc.) and developing of predictive models based on which real-time decisions are made and executed” [11].

Predictive Production Systems -

Intelligent production systems represent a new generation of production systems that include the hardware parts of equipment and machines as well as software components [2, 6]. These production systems use techniques such as expert systems, fuzzy logic, statistics and machine learning [3] for production process control [2]. One of the biggest challenges of production process control is the transition from a reactive to a proactive production system. On the one hand, reactive production systems apply a posteriori inspections to detect defects (i.e. machine failures or defective products) at the end of the production process and reactively implement changes in the production system [2]. On the other hand, proactive production systems use a predictive (i.e. a priori) approach that aims to prevent machine failures, or defective products from being manufactured, in the first place [2]. Predictive production systems are a type of proactive Industry 4.0 intelligent production system [4]. Accordingly, predictive production systems predict the occurrence of an error inside the system and take adequate actions to prevent the error from occurring at all [4]. Real-time data analytics represents the future of predictive production systems since it enables data to be processed and analysed while it is generated [10].

Internet of Things (IoT) -

The increasing availability of production data is changing the way decisions on predictive maintenance and quality improvement are taken in industry using data analytics methods [12]. The implementation of an advanced Industry 4.0 technology, namely the Internet of Things (IoT), allows higher availability of production data due to its ability to connect different devices, communication technologies, sensor networks, Internet protocols, tags for RFID devices and so on [13]. The IoT, combined with data analytics, can enable predictive production and networked production environments [4] for real-time data analytics [10].

Real-Time Data Analytics -

Data analytics, as a part of the data science field, is a practice for revealing hidden information in data collected from various devices, which in turn enables real-time decision making in production systems through real-time data analytics [7]. Real-time data analytics, as a part of data analytics, refers to analytical techniques where data is processed and analysed while it is generated, in real-time [10] or near real-time [14]. The available real-time data analytics research [15, 16] focuses mainly on real-time monitoring and quality control without predictive abilities. Moreover, a recent systematic literature review [11] showed that data analytics implementation challenges are the most frequently addressed in the Industry 4.0 literature. Specifically, “the inability to achieve real-time maintenance” is one of these challenges ([11] based on [17]). The present research adds to the discussion in this literature stream.
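To make the notion of analysing data "while it is generated" more concrete, the following minimal sketch processes sensor readings one at a time and flags unusual values as they arrive. It is an illustration under stated assumptions, not a method taken from the cited literature: the use of Welford's online mean and variance update, the names `RunningStats` and `flag_stream`, the threshold `k`, the `warmup` length and the sample readings are all hypothetical choices. Note that it covers only the monitoring side of real-time analytics and has no predictive ability.

```python
import math


class RunningStats:
    """Welford's online mean/variance, updated one reading at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two readings are seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_stream(readings, k=3.0, warmup=5):
    """Yield (reading, is_unusual) for each reading as it arrives.

    A reading is flagged when it lies more than k standard deviations from
    the mean of all earlier readings (k and warmup are assumed values).
    """
    stats = RunningStats()
    for x in readings:
        unusual = (
            stats.n >= warmup
            and stats.std > 0
            and abs(x - stats.mean) > k * stats.std
        )
        yield x, unusual
        stats.update(x)  # the reading joins the history only after being checked


# Hypothetical readings from a single production parameter.
readings = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 15.0, 10.1]
flagged = [x for x, unusual in flag_stream(readings) if unusual]
print(flagged)  # [15.0]
```

Because each reading is handled in constant time and no history is stored, a sketch of this kind can keep pace with the stream; a predictive model would replace the simple threshold rule.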

2.2 Research Method

According to Phaal et al. [18] the term “conceptual” implies “concerned with the abstraction or understanding of a situation”, while “framework” implies “supports understanding and communication of structure and relationship within a system for a defined purpose” (Phaal et al. [18] based on Shehabuddeen et al. [19]). Thus, conceptual frameworks “support understanding of an issue or area of study, provide structure, and support decision making and action” [18]. Conceptual frameworks are “needed to guide thinking about technology management, based on well-founded theoretical principles” [20].

The present research proposes a conceptual framework [18, 20] for the application of real-time data analytics for Industry 4.0. It does so by reviewing the relevant literature and conceptualizing a framework based on the findings. The proposed framework for real-time data analytics is composed of three parts, namely: production system characteristics, dataset characteristics, and IoT network infrastructure. Notably, the research presented is supported and additionally informed by the researchers’ insights obtained during the implementation of Industry 4.0 in industrial practice [21].

3 Proposed Framework for Real-Time Data Analytics for Industry 4.0

In the present research, we propose a framework for the application of real-time data analytics for Industry 4.0 in a production system (Fig. 1). The proposed framework is composed of: production system characteristics, dataset characteristics, and IoT network infrastructure.

Fig. 1. Proposed framework for real-time data analytics

Production System Characteristics -

Notably, we need to know and understand the production system characteristics in order to determine whether it is possible to develop a real-time predictive data analytics model for the observed production system [22]. Without them it is very hard to envisage a real-time predictive model, because in general a predictive model is built on the characteristics of the production system it is meant to serve. Notably, a production system that moves towards the implementation of real-time data analytics tends to have a high level of automation and a high level of digitalization, which together enable real-time data acquisition [23] (Fig. 1). Expert knowledge must not be neglected either: complete openness, sincerity, tight collaboration and constant communication between industry and researchers are needed to obtain the synergy [24] required for achieving a real-time predictive production system. In this collaboration, industry practitioners provide the know-how on the observed production process, while researchers develop predictive models and test them in production systems [24] to improve the performance of the processes using real-time data analytics. The dataset is then formed from the data acquired from the production system.

Dataset Characteristics -

The basic characteristics of the acquired dataset are data volume, data type, data structure, data values and data change rate. In the following we elaborate on all five data characteristics needed for the implementation of real-time data analytics.

  • Data volume - It is generally accepted that a large amount of data is necessary for the development of a predictive model for real-time data analytics [25]. However, recent research [26] points out the importance of small datasets in developing predictive models. Therefore, we argue that a small dataset, carefully selected on the basis of experts’ knowledge, can provide predictions whose accuracy is comparable with that of predictive models developed on a medium or large dataset.

  • Data type - The development of a predictive model capable of real-time data analysis largely depends on the type of data collected over a period of time in production systems [5]. Therefore, it is necessary to select a dataset without incomplete, homogeneous or noisy data, which can degrade the quality of a dataset [27, 28]. Thus, a high-quality dataset is crucial for developing an effective predictive model. A high-quality dataset is considered to consist of a certain number of samples from the periods when the production system worked without problems, interruptions and difficulties, as well as a certain number of samples from the periods when problems occurred in the system [1]. The former are defined as non-fault data samples, while the latter are defined as fault data samples. Notably, the difficulty lies in finding a balance between the number of non-fault data samples and the number of fault data samples in order to obtain a varied dataset.

  • Data structure - Production data is usually not in a format that can be used directly for data analysis [29]. Nevertheless, predictive models require a structured dataset, that is, a dataset in the form of a two-dimensional matrix of rows and columns [30]. The rows of the matrix represent all identified independent parameters, as well as one dependent parameter determined by expert knowledge, while the columns correspond to each individual case (sample) recorded during the production process. Even though the dataset needs to be in structured form, in the majority of cases the collected dataset is available in semi-structured or unstructured form. In that case, additional effort is needed to convert the data into a usable structured form.

  • Data values - A production process can generate different kinds of values: categorical, continuous, or a combination of the two. On the one hand, it is suggested that categorical variables be converted into binary values (i.e. the value of the variable is either 0 or 1). If a categorical variable has more than two levels, a series of dummy variables should be used, each indicating the presence of one level [31]. On the other hand, continuous variables are numeric variables that can take an infinite number of values between any two values. In other words, continuous values are numeric values [32]. Therefore, when collecting a dataset for the development of a predictive model for real-time data analytics, the best choice is to use the continuous values of a production parameter.

  • Data change rate - Nowadays, large volumes of data are generated daily at a high rate from heterogeneous sources during the production process due to the use of fast production equipment. This leads to a high data change rate, i.e. the frequency at which data is generated, captured, and shared [33]. Thus, the data arrives as a stream and must be analyzed in real-time. This makes it difficult to develop a predictive model for real-time data analytics based on data that changes at a high rate. Therefore, in order to develop a reliable predictive model for real-time data analytics, data that changes at a slow rate represents the better option.
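The data structure, data values and data type characteristics above can be illustrated with a short sketch that turns semi-structured records into a structured matrix, dummy-encodes a categorical parameter, and counts fault and non-fault samples. This is only an illustration under assumptions: the records, the parameter names (`temperature`, `material`, `fault`) and the helper functions `dummy_encode` and `to_matrix` are hypothetical and do not come from the cited works. The matrix follows the layout described above, with parameters as rows and samples as columns; many analysis libraries expect the transposed layout.

```python
from collections import Counter

# Hypothetical semi-structured records; names and values are made up.
records = [
    {"temperature": 71.5, "material": "A", "fault": 0},
    {"temperature": 73.0, "material": "B", "fault": 0},
    {"temperature": 88.2, "material": "C", "fault": 1},
    {"temperature": 70.9, "material": "A", "fault": 0},
]


def dummy_encode(values):
    """Return one 0/1 indicator list per level of a categorical parameter."""
    levels = sorted(set(values))
    return {
        f"is_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


def to_matrix(records, continuous, categorical, target):
    """Return (row_labels, matrix): rows are parameters, columns are samples."""
    rows = {name: [float(r[name]) for r in records] for name in continuous}
    for name in categorical:
        encoded = dummy_encode([r[name] for r in records])
        for label, indicator in encoded.items():
            rows[f"{name}_{label}"] = indicator
    rows[target] = [r[target] for r in records]  # dependent parameter goes last
    labels = list(rows)
    return labels, [rows[label] for label in labels]


labels, matrix = to_matrix(records, ["temperature"], ["material"], "fault")

# Fault vs. non-fault balance, read from the dependent-parameter row.
balance = Counter(matrix[-1])

for label, row in zip(labels, matrix):
    print(f"{label:>14}: {row}")
print("non-fault samples:", balance[0], "fault samples:", balance[1])
```

In this toy dataset the three-level `material` parameter becomes three dummy rows, and the balance count shows three non-fault samples against one fault sample, which is the kind of imbalance the data type discussion warns about.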

IoT Network Infrastructure -

Given the formed dataset and the characteristics of the production system, we need to choose the IoT network infrastructure. The possibility of developing a predictive model for real-time data analytics depends on the right choice of IoT network infrastructure, i.e. Cloud Computing or Edge Computing. Bajic et al. [34] compare Cloud and Edge Computing and stress that these technologies do not rule one another out, but complement each other. However, on the one hand, the disadvantage of Cloud Computing lies in its longer processing and computing times as well as a slower response time, which prevents the implementation of real-time data analytics [34]. On the other hand, Edge Computing enables data to be processed at the network edge, close to where it is generated, reducing the distance that data must travel on the network and thereby enabling real-time data analytics implementation [34].
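The response-time argument above can be sketched as a simple feasibility check: an infrastructure option is suitable for real-time analytics only if its total response time fits within the time available before a decision is needed. The sketch below rests entirely on assumptions. The `Deployment` class, the split of response time into network round trip and compute time, and all numeric values are hypothetical placeholders, not measurements from [34] or any other source.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    round_trip_ms: float  # network time to reach the compute node and back
    compute_ms: float     # time to evaluate the predictive model on one sample

    def response_ms(self) -> float:
        """Total time from data generation to an available prediction."""
        return self.round_trip_ms + self.compute_ms


def meets_deadline(option: Deployment, deadline_ms: float) -> bool:
    """True if the option responds within the real-time deadline."""
    return option.response_ms() <= deadline_ms


# Placeholder figures for illustration only.
cloud = Deployment("cloud", round_trip_ms=120.0, compute_ms=15.0)
edge = Deployment("edge", round_trip_ms=2.0, compute_ms=40.0)
deadline_ms = 100.0  # assumed time available before a decision is needed

feasible = [d.name for d in (cloud, edge) if meets_deadline(d, deadline_ms)]
print(feasible)  # ['edge']
```

With these placeholder values only the edge option meets the deadline, because the shorter network distance outweighs its slower computation. With a looser deadline both options would qualify, which is consistent with the view that the two technologies complement each other.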

4 Conclusions and Future Work

The present research develops a framework for real-time data analytics (Fig. 1). It does so by analyzing the relevant literature and subsequently conceptualizing a corresponding framework. The framework is meant to serve as a starting point for companies that would like to implement real-time data analytics as a part of their transformation towards Industry 4.0. The proposed framework is composed of three parts, namely: production system characteristics, dataset characteristics, and IoT network infrastructure. The research analyzes the dataset characteristics in particular detail.

In future work, the developed conceptual framework for real-time data analytics will be further refined and elaborated. Additionally, the research will move towards testing the proposed framework with companies interested in implementing real-time data analytics.