1 Introduction

Industry 4.0 is considered the “fourth industrial revolution” that may fully automate the production process in the manufacturing industry. In essence, it is based on large-scale digitalization of manufacturing, where machines and humans are connected as a collaborative community, generating large volumes of complex data—often referred to as big data. The core idea is to collect, consolidate, and analyze data across the entire manufacturing process in real-time. The hope is that analyzing data captured from industrial processes leads to a better understanding of production processes and, in turn, supports their improvement. Predicting the required maintenance intervals of production machinery, for example, can lower operational costs. Another example is the identification of factors that influence production quality, which may be used for quality control.

Current production facilities are typically equipped with sensors that can collect relevant process data. Together with communication and processing capabilities, sensor systems can also exchange data machine-to-machine or machine-to-human, or perform on-the-fly data analysis at the sensor (edge computing). This data should support the machines in identification tasks (e.g., tracing parts and sub-assemblies), in adapting production to changing requirements and individual needs, and in ensuring product flexibility. Yet, raw production data alone does not provide valuable information; domain experts are needed to extract insights from it. The volume of data generated in a production process, however, can be overwhelming. First, the data has to be monitored and recorded using methods that can handle huge datasets. The next stage includes the analysis of the data (often in real-time) in order to, e.g., (i) identify undetected process correlations, (ii) forecast the production quality, and (iii) perform root cause analysis of failures or problems. A promising application for understanding production data is the use of visual data science tools and methods that effectively combine machine intelligence with human intelligence and interactive data exploration. Such interactive approaches can help domain experts gain insight from manufacturing data, identify interesting patterns, and extract actionable information.

Recently, Zhou et al. [65] identified application opportunities and benefits of visual data science in different industrial sectors, such as automotive and energy, and for key operations, such as replacement and creation. Researchers in visual data science address this application space with a growing set of techniques that effectively exploit the power of data analytics methods and human information processing capabilities. Through this combined use of the human visual system and analytical methods, it becomes possible to push the limits of analyzing management and production process data in complex industrial scenarios. In this chapter, we first introduce the methodology and goals of visual data science and discuss the infrastructure needed to implement the technology in concrete applications in Sect. 2. In Sect. 3, we review example visual data science solutions for selected industrial applications, such as production planning, quality control, and condition monitoring. In Sect. 4, we discuss future directions and conclude this chapter.

2 Foundations of Visual Data Science and Challenges

The main idea in visual data science (VDS), also discussed as visual analytics (VA) or visual data analysis, is to support the exploration, understanding, and explanation/communication of relevant patterns in data. It is often used for tasks such as data-driven decision-making, monitoring, and steering of analytical processes. It builds, among others, on concepts from data visualization, human-computer interaction, and computational data analysis. The latter includes approaches from statistics (e.g., from exploratory data analysis) and machine learning, including specific deep learning techniques and artificial intelligence. For this article, we subsume the latter approaches simply as computational data analysis. For a more detailed discussion of the respective terminology, we refer to the overview given by Cao [12].

Visual data science approaches typically integrate data visualization with computational data analysis techniques into interactive systems. In these systems, users explore data to solve tasks in an interactive, sometimes open-ended process. We start by surveying fundamental concepts of interactive data visualization (Sect. 2.1) and visual data science (Sect. 2.2). Then, we give an overview of the data to be analyzed, its complexities (Sect. 2.3), and challenges (Sect. 2.4)—all through the perspective of visual data analysis. In addition, we review the technical infrastructure, ranging from off-the-shelf tools to libraries, which can help in building visual data science solutions for industrial applications (Sect. 2.5).

2.1 Interactive Data Visualization

Interactive data visualization aims to find suitable visual representations of data, such that important data properties can be effectively and efficiently perceived by users [13, 36, 57]. The idea is to leverage the capabilities of human visual perception to process large amounts of visual information and to link it to cognitive processes (“Using vision to think”) [49]. The integration of interaction techniques [64] allows users to dynamically explore the data by interacting with its visual representations. This way, users can, for instance, verify hypotheses about the data, dynamically select and filter data items, and change visual encodings to reveal patterns.

The capabilities of visualizations are determined by their specific visualization designs, which are composed of visual marks and their geometric and appearance properties as main elements. Marks are essentially geometric shapes of type point, line, area, or three-dimensional marks, as illustrated in Fig. 1a. The appearance of the marks is controlled by visual channels that encode the properties of the marks. The classification of the visual channels was proposed by Bertin [8], who originally called them visual variables. Bertin’s list of visual channels comprised position, size, shape, value, color, orientation, and texture. Later, Mackinlay [34] extended the list with the channels length, angle, slope, area, volume, density, color saturation, color hue, connection, and containment. Figure 1 shows a selection of marks whose appearance is modified by means of visual channels. Users can create visualizations by mapping data dimensions to these marks and channels.
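To make this mapping concrete, the following sketch uses the Altair library (which generates Vega-Lite specifications; the data and column names are hypothetical) to map the attributes of a small table to a point mark and to the position, color hue, and size channels:

```python
# A minimal sketch of visual encoding: data attributes are mapped to a point
# mark and to visual channels (x/y position, color hue, size).
# The column names are hypothetical placeholders.
import altair as alt
import pandas as pd

df = pd.DataFrame({
    "cycle_time": [41.2, 39.8, 44.1, 40.5],
    "temperature": [212, 208, 220, 210],
    "machine": ["M1", "M2", "M1", "M2"],
    "output": [980, 1010, 930, 1000],
})

chart = alt.Chart(df).mark_point().encode(
    x="cycle_time",    # position on the x-axis
    y="temperature",   # position on the y-axis
    color="machine",   # color hue for a categorical attribute
    size="output",     # size for a quantitative attribute
)
chart.save("encoding_example.html")  # writes a standalone interactive chart
```

Changing the mark type or re-assigning attributes to different channels yields alternative designs of the same data, which is exactly the design space discussed above.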

Fig. 1  Elementary mark types (a) and geometric and appearance properties (b). By mapping data properties to marks, visualization designs can be constructed. Figures reused from [36] courtesy of Tamara Munzner

Effective visual designs should convey the data as accurately as possible to users, allowing a focus on the most important data features while scaling with the volume of data (see also Sect. 2.4). Visual design is both a science, in measuring and comparing the effectiveness and efficiency of designs, and an art, in creating designs. Over the years, researchers in cartography, statistics, and computer science have formalized perceptual guidelines that a visual designer should consider when defining effective visual representations [8, 23, 34]. Guidelines comprise criteria for visual encoding, perceptually-motivated rankings, and characteristics of the visual channels. Mackinlay, for instance, developed a formal visual encoding language to generate graphical presentations for relational information [34]. He defined expressiveness and effectiveness as the main principles to follow in visualization. Expressiveness reflects how accurately a graphical language can encode the desired information. Effectiveness describes how well the graphical language exploits the output medium’s capabilities and the human visual system. Based on such studies, the most effective visual channels for interpreting data from visualizations include position, length, orientation, area, and depth.

2.2 Integrating Data Visualization with Data Science

Visualization techniques allow users to get an overview of data sets, interactively explore data, and gain insights. Building on visualization techniques, visual analytics or visual data science systems bridge between interactive data visualization on the one hand and algorithmic approaches for data analysis (data mining, data science) [30, 54, 55] on the other hand. Both components can be integrated to form highly interactive, powerful data analysis systems (see Fig. 2, left part). The goal is to leverage the joint capabilities of analysts, their background knowledge, and the advantages of automatic data analysis such as pattern search, clustering, and classification. The inclusion of machine learning techniques into visualization aims to address several goals, including scalability to large data sets that could not be exhaustively inspected by visualizing the raw data. The result is an encompassing analytical process, going from discovering single findings in data, to forming insights and hypotheses about data, and eventually, arriving at actionable results and decision support [44] (see also Fig. 2, right part).

Fig. 2  The visual knowledge generation model. It suggests that analysts can obtain knowledge from data by integration of data analysis (modeling) and interactive visualization. The knowledge generation process works by obtaining findings and insights, which can lead to hypotheses and actions. Figure reused from [44] with permission

To date, many data science methods have been proposed—including techniques for data reduction, clustering, classification, and prediction [22]—and combined with visualization to form effective visual analysis systems [19, 33]. Recently, artificial intelligence (AI)-based methods, in particular deep neural networks, have also gained much popularity. These methods have been successfully applied in many real-time decision-making use cases, including autonomous driving, recommender systems, and automated diagnosis. AI is attracting growing interest in industry too, since it is perceived as a powerful technology for (i) identifying undetected process correlations, (ii) forecasting the production quality, and (iii) performing root-cause analysis of failures or problems. AI-based approaches can show superior results in many learning domains compared to more traditional approaches.
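As a minimal sketch of such an integration (assuming scikit-learn and matplotlib, with synthetic process measurements), an algorithmic clustering result can be mapped directly onto a visual representation that an analyst can inspect and question:

```python
# A minimal sketch: k-means clustering (computational analysis) combined with a
# scatter plot colored by cluster (visual representation). Data is synthetic.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical process measurements: two numeric features per production run.
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(100, 2))
               for c in ((0, 0), (3, 1), (1, 4))])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels, s=15)
plt.xlabel("feature 1")
plt.ylabel("feature 2")
plt.title("Production runs grouped by k-means")
plt.show()
```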

How an AI algorithm operates is, in many cases, a black box. Even for experts, it can be hard to answer why AI algorithms make certain decisions. Usually, this is due to the large number of parameters and the complex data structures involved. This lack of transparency can be a key problem in industrial practice and may hinder AI-based methods from being actively used as part of real decision-making processes. However, the issue can be addressed by making machine knowledge explainable and understandable. Recent work focuses on two methods to provide explainable AI (xAI): transparency and post-hoc interpretation [61]. The former renders the AI algorithm transparent, showing how the algorithm functions internally. The latter provides explanations of its behavior. As a result, practitioners are able to understand, e.g., what the algorithm has predicted and why. The biggest challenge for xAI, however, is to provide explanations that are interpretable by humans. Visualization offers many opportunities to help AI-based systems become understandable and explainable, as discussed by Hohman et al. [24].
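A simple form of post-hoc interpretation can be sketched as follows, assuming scikit-learn: permutation feature importance estimates how strongly a trained black-box model relies on each input, and the result lends itself to a direct visual summary (the feature names and data are hypothetical):

```python
# A minimal sketch of post-hoc interpretation via permutation feature importance.
# The model, features, and quality label are synthetic placeholders.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
# Hypothetical quality label that depends mostly on the first two features.
y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
features = ["pressure", "temperature", "speed", "humidity"]  # hypothetical names
plt.bar(features, result.importances_mean)
plt.ylabel("importance (mean drop in score)")
plt.show()
```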

2.3 Data Types and Characteristics

A key prerequisite to employ visual analytics techniques for practical data analysis is to model the input data. In order to select suitable analysis and visualization techniques, it needs to be clear which types of data are input to the analysis. In the following, we review a selection of data types that are typically found in industrial applications and for which the scientific community has developed a broad variety of visual representations and interaction techniques. For additional details and visualization examples, see the linked references to textbooks and surveys. Note that this selection of data is not intended to be complete but follows a pragmatic approach linking to applications. The distinction of visualization techniques by data type is often used in visualization research, dating back to Shneiderman’s data type by task taxonomy [49].

Tabular data consists of rows and columns. In a simple flat table, each row shows an item, and each column is an attribute describing the item. An item is an entity representing a city, a person, or a shop, for example. An attribute is a specification that can be measured, observed, or logged, such as age, price, and temperature. A combination of a row (=item) and a column (=attribute) represents a cell that contains a value of that pair. However, multidimensional tables have a more complex structure: the data values are ordered in hierarchies [36]. Appropriate sorting of tables is typically required to detect patterns and relationships in the data [6].
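Both table forms can be sketched with pandas (all values are hypothetical): a flat table whose rows can be sorted by an attribute, and a multidimensional table whose columns are ordered in a hierarchy:

```python
# A minimal sketch of a flat table and a table with hierarchically ordered
# columns; sorting helps reveal patterns. Values are hypothetical.
import pandas as pd

flat = pd.DataFrame({
    "machine": ["M1", "M2", "M3"],      # items (rows)
    "temperature": [212, 208, 220],     # attributes (columns)
    "output": [980, 1010, 930],
})
print(flat.sort_values("output", ascending=False))  # sort rows by an attribute

# Multidimensional table: columns ordered in a hierarchy (shift -> sensor).
cols = pd.MultiIndex.from_product([["early", "late"], ["temp", "pressure"]],
                                  names=["shift", "sensor"])
multi = pd.DataFrame([[212, 5.1, 215, 5.3],
                      [208, 4.9, 207, 5.0]],
                     index=["M1", "M2"], columns=cols)
print(multi.sort_index(axis=1))  # order columns along the hierarchy
```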

Time-dependent data occurs ubiquitously in many applications, for example, when measuring resource consumption over a production cycle, or in sales data. In the context of visualization, the temporal aspect is particularly challenging because of the unique characteristics of time. Time has hierarchical levels of granularity (60 s = 1 min, 60 min = 1 h, 24 h = 1 day, etc.) with irregular divisions (a month consists of 28 to 31 days) and cyclic patterns (e.g., seasons of a year), and it cannot be perceived by humans directly [2]. Depending on the size, dimensionality, and resolution of time series data, many scalable visualization techniques have been proposed [2]. Examples are time-charts, pixel-oriented, and glyph-based representations.
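The following sketch, assuming pandas and synthetic per-minute sensor readings, illustrates two of these aspects: aggregating to a coarser level of granularity and extracting a cyclic (hour-of-day) pattern that could then be visualized:

```python
# A minimal sketch of handling temporal granularity and cyclic patterns.
# The power readings are synthetic.
import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-01", periods=7 * 24 * 60,
                    freq=pd.Timedelta(minutes=1))   # one week, per minute
rng = np.random.default_rng(2)
power = 50 + 10 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24) \
        + rng.normal(0, 2, len(idx))
series = pd.Series(power, index=idx, name="power_kw")

hourly = series.resample(pd.Timedelta(hours=1)).mean()    # coarser granularity
daily_profile = hourly.groupby(hourly.index.hour).mean()  # cyclic daily pattern
print(daily_profile.round(1))
```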

Spatial data comprises data that is localized in a reference frame of some kind. For instance, geographic maps can encode information on land usage, transportation facilities, or addresses of customers. 2D and 3D shape information can be used for describing parts to be produced in an assembly line. Spatial data often arises in engineering or simulation tasks to describe the physical properties of a space or product, for example, air turbulence caused by a jet engine design. Note that spatial data can include any specific data type, as long as it is localized. Air turbulence, for instance, can be described as a time-dependent flow (or vector) field. Spatially moving objects give rise to trajectories, the visual analysis of which is an important application [39].

Graphs (or networks) represent data by nodes, links connecting the nodes, and descriptive data associated with both. They are an important model to represent structured information. Examples in industrial applications include flows of input factors, product hierarchies, and marketing networks. Visual analysis techniques for graphs include representations as node-link diagrams based on various layout techniques and more abstract representations, including adjacency matrices [31, 40]. Many analysis tasks in real-world graphs depend on analyzing and exploring changes in both the topology of the graph and data attributes associated with the nodes and edges [40]. Dynamic graph visualization is required to address this aspect [4].
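As a small illustration, assuming the networkx library and a hypothetical production flow, the same graph can be prepared both for a node-link layout and as an adjacency matrix:

```python
# A minimal sketch of the two graph representations mentioned above.
# The production-flow edges are hypothetical.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("raw material", "casting"),
    ("casting", "rolling"),
    ("rolling", "inspection"),
    ("casting", "inspection"),   # hypothetical rework path
])

pos = nx.spring_layout(G, seed=0)   # node coordinates for a node-link diagram
A = nx.to_numpy_array(G)            # adjacency matrix representation
print(list(G.nodes()))
print(A)
```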

Textual data may stem from textual communication, reporting, feedback captured from customers, technical specifications of products and patents, etc. Text visualization techniques [27] tackle the challenging problem of identifying key information in a large text collection. Different analysis perspectives are possible, focusing on text features, text structure, or names and entities, for example.

Image, video, and audio data also regularly occur in industrial processes, e.g., during quality control for optical, X-ray, ultrasonic material inspection, or visual surveillance of production processes. Appropriate techniques, e.g., from multimedia signal processing and computer vision, can be applied to transform the input data to one or several of the above data types, for example, to represent numeric measurements or events. For an overview of video visualization methods, see the survey by Borgo et al. [9].

In practice, data analysis often requires integrating data of different types, sources, and qualities. In many cases, data transformations are needed to apply data analysis and visualization, e.g., to describe customer transactions by selected numeric features by which they can be grouped and compared. In the following, we review important challenges in the context of visual data science.

2.4 Challenges in Creating Visual Data Analysis Applications

We next discuss a set of important challenges pertaining to data properties and complexity of user tasks in visual data analysis.

The Three (or four) V’s of (big) Data

As discussed in Sect. 2.3, application data often comes in different forms and in large volumes, often referred to as big data. It is a term that is not only used by domain experts, but has also entered the mainstream vocabulary. In fact, big data has become a subject and driving factor of enormous importance in both academia and industry. The first association with big data is often the data volume, although the term covers a much broader set of aspects. Even though many definitions have been proposed over the years, the most established one is the 3 Vs of big data—volume, velocity, and variety—formulated by the Gartner Group [32].

The first V, data volume, is a moving target. As technology develops, we continue to increase the volume of data we collect. Scientists and data science practitioners have for many decades been faced with the challenge of making sense of more data than they can process using the methods and technologies available. This implies that a specific visualization will most likely not be able to display the data at the highest level of detail on the given output device. Consequently, data aggregation and filtering methods are needed to deal with this aspect.

The second V, data velocity, refers to the speed at which the data is generated and needs to be processed. For instance, analyzing live streaming data from production machines or social media is more difficult than processing hourly measurements acquired by weather stations stored in a static file.

The third V, data variety, addresses data heterogeneity in terms of type, source, and format. Detecting patterns by automatically clustering a homogeneous table with thousands of rows and columns might be more straightforward than finding correlations in a small but heterogeneous table in which the columns have different semantics and attribute types. The problem becomes even more difficult when one needs to make sense of multiple interconnected datasets—possibly even mixing various data types. An example is high-dimensional production data that needs to be investigated along the production process.

In addition, data veracity may be considered a fourth aspect—extending the definition to become the 4 Vs of Big Data. It pertains to the quality of the data, e.g., precision and completeness of measurements, as well as its trustworthiness and origin. It is another decisive factor, as only analysis of accurate and relevant data can lead to relevant insights and decisions.

User Tasks and Personalization

Besides technical challenges in data visualization and analysis, there is also the challenge of supporting the many different possible user tasks (or goals) in data analysis. Models of the visual data analysis process identify a variety of user goals in understanding data [36], e.g., identifying trends and outliers, classification, prediction, comparison, and many more. In addition, the background knowledge and expertise of users may vary. Hence, visual data analysis systems must be flexible and support different tasks and types of users. To this end, it is interesting to develop adaptive systems. Such custom visual analytics systems build upon the idea that the choice of visual representation depends to a large extent on the visual perception and interests of the user. Designing visual data analysis with the user’s preferences and interests in mind is a research topic that has received increasing attention in recent years. Current research mainly focuses on using either explicit user feedback [5, 10, 38], provided in the form of ratings (representing the user’s visual preferences) or tags (representing the user’s topics of need), or gaze movements [48, 50, 51], to help identify visualizations that best address the users’ preferences, expertise, and tasks.

User Guidance

Visual data science approaches often require a significant level of visual and data analytical skills from the user. However, users may not possess such skills right away and hence have difficulties analyzing the data. Also, domain experts within a specific area may have little or no knowledge about visual data analysis. Therefore, companies or institutes employ data analysts with visualization and analysis skills but little domain knowledge in manufacturing processes, which might impede the decision-making process. One possible solution to tackle this issue is to guide users throughout the data exploration process. Ceneda et al. [14, 15] define guidance as a “computer-assisted process that aims to actively resolve a knowledge gap during an interactive visual analytics session”. The key factor in guided analytics is to figure out exactly what the users’ needs and preferences are and which steps to take to address them. In this context, guidance can be provided to recommend appropriate visualizations and analytical steps [37].

2.5 Under the Hood: Visual Data Science Infrastructure

Technology Stack

As in any sector, technology moves fast, so any discussion of specific tools and libraries will quickly become outdated. In this section, our aim is therefore to briefly outline the technology stack to help readers better understand the spectrum that ranges from using off-the-shelf tools, through employing declarative libraries, to using programming languages for creating tailored visual analysis tools designed for a specific, well-defined purpose. Figure 3 illustrates the technology stack. Off-the-shelf tools at the top of the stack, such as Tableau, Microsoft Power BI, and Spotfire, have the advantage that they are easy to use and do not require programming skills from domain experts. However, they are limited in the sense that they support a fixed set of data types, visualization techniques, and analytical capabilities—although extension and plug-in mechanisms alleviate this limitation. Behrisch et al. [7] provide a comprehensive overview of commercial visual analysis tools, evaluating their performance, available features, and usability.

An alternative to using off-the-shelf tools is to employ static and interactive plotting libraries, for instance as part of notebook environments such as Jupyter Notebook or R Markdown. Example libraries are Vega and Vega-Lite, Chart.js, and plot.ly. Finally, making use of high-level programming libraries for creating specialized visual analysis tools is the most expressive option, but also the one that requires the highest effort and skills. For instance, D3.js is a popular JavaScript library that allows developers to flexibly create web-based visualizations.
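As a brief illustration of this middle layer of the stack, the following lines (using plotly.express; the data and column names are made up) create an interactive chart inside a notebook or export it to a standalone HTML file:

```python
# A minimal sketch of declarative, interactive plotting in a notebook.
# Data and column names are hypothetical.
import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=48,
                               freq=pd.Timedelta(hours=1)),
    "energy_kwh": range(48),
    "line": ["A", "B"] * 24,
})

fig = px.line(df, x="timestamp", y="energy_kwh", color="line",
              title="Energy consumption per production line")
fig.write_html("energy.html")  # or fig.show() inside a notebook
```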

Fig. 3  The visual analysis technology stack with off-the-shelf tools at the top, declarative static and interactive visualization libraries in the middle, and high-level programming libraries at the bottom. While the ease of use increases from the bottom to the top, the required skills, expressiveness, and tailoring possibilities decrease. The figure is adapted from Jeff Heer’s keynote given at the OpenVis Conference in 2015

General Purpose vs. Tailored Tools

General-purpose visualization tools are effective for answering a broad set of analysis questions. In contrast, answering (domain-)specific questions often requires tailored tools that are designed for a small set of specialized users. This is frequently the case for ill-defined domain-specific problems that need to be investigated by means of interrelated, heterogeneous datasets, as often encountered in data-driven sciences. Visual analytics researchers can contribute to solving domain-specific research questions by designing and building tailored visual analysis solutions and tools. Figure 4 illustrates the relationship between the type of questions to be asked and the number of potential users. The more specific the questions are, the lower the number of users that can benefit from the tool becomes. Our advice is to rely on proven and tested general-purpose tools. Nevertheless, if that is not possible and the problem to be solved is important enough, it is worth investing time and money in creating highly specialized, domain-specific solutions.

Fig. 4  Relationship between the type of analysis questions and the number of users (adapted from [45]). The more specific the question, the lower the number of potential users. General-purpose tools are designed for answering general questions, while customized visualization tools are able to address specific questions that are only relevant for a small set of highly specialized domain experts

Dashboards: Multiple Coordinated Views

An important requirement for the visual analysis of heterogeneous data is that the analyst is able to evaluate, compare, and interpret related data subsets shown in various visual representations and at different levels of granularity (i.e., complete datasets, groups of items, or single items). Multiple Coordinated Views (MCV) [43] is an established and powerful concept that addresses this requirement by linking multiple juxtaposed views. Nowadays, this concept is colloquially referred to as a dashboard.

The coordination of views refers to the principle that operations triggered in one view are immediately reflected in all other views. This coordination can concern data operations, such as filters and selections that result in a synchronized highlight of items (known as linking & brushing) or synchronized view operations, such as pan, rotate, and zoom. The views to be linked can show the same data subset using different visualization techniques, different data subsets encoded by the same visualization technique, or combinations thereof—also at various levels of granularity.
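The following sketch, assuming Altair 5 and hypothetical sensor data, shows the core of this coordination principle: an interval selection (brush) in a scatter plot filters a juxtaposed bar chart:

```python
# A minimal sketch of two coordinated views with linked brushing.
# Data and column names are hypothetical; Altair 5 is assumed.
import altair as alt
import pandas as pd

df = pd.DataFrame({
    "temperature": [210, 215, 208, 220, 212, 218],
    "pressure": [5.1, 5.4, 4.9, 5.6, 5.0, 5.5],
    "machine": ["M1", "M2", "M1", "M3", "M2", "M3"],
})

brush = alt.selection_interval()   # rectangular brush shared between the views

scatter = alt.Chart(df).mark_point().encode(
    x="temperature", y="pressure", color="machine"
).add_params(brush)

bars = alt.Chart(df).mark_bar().encode(
    x="machine", y="count()"
).transform_filter(brush)          # only brushed items are counted

(scatter | bars).save("linked_views.html")  # juxtaposed, coordinated views
```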

Dashboards are an integral part of many state-of-the-art visual analysis tools. However, designing effective dashboards is not a trivial task and many different factors need to be considered for this purpose [20, 46]. In the following section, we discuss selected visual analysis solutions that are specifically tailored to the needs of real-world industrial use cases—going beyond standard off-the-shelf dashboards.

3 Selected Visual Data Science Approaches for Industrial Data Analysis

Visual data science methods are increasingly applied to solve problems in many domains and disciplines. Stakeholders from industry also show strong interest in these methods, and researchers have started to develop concepts and applications for visual analysis of industrial data. An overview of a number of techniques is provided by Zhou et al. [65]. The authors group the approaches by industrial sectors (automotive, energy, etc.), and phases of the process (production, service, etc.) and discuss a number of representative works.

In this section, we also give an exemplary overview of approaches. To this end, we chose a number of key industrial operation tasks, and for each one, we selected exemplary approaches from the literature and our own research. While we cannot claim completeness of the operations or approaches, we have striven to achieve a representative overview. Operations partly overlap and several of the approaches could be applied to more than one operation. For example, production planning operations also depend on and influence condition monitoring and quality control operations. Research in this area is active and the space of known solutions is steadily growing.

3.1 Production Planning

In production planning, the goal is to plan resource allocation to provide efficient production or service, subject to dependencies and constraints. In one application example from the metal industry, Wu et al. [58] propose to use abstract graphical elements to represent the smelting furnace and heating oven for metal ingot casting, in order to help the engineers involved achieve a better understanding of the synchronous relationship between the scheduled capacity and the load of these two components. Jo et al. [29] focus on visualizing manufacturing schedules (i.e., plans to manufacture a product) used in semiconductor facilities. They present LiveGantt, an interactive schedule visualization tool that supports temporal, product, and resource filtering to help users explore large and highly concurrent production schedules from various perspectives. Although a case study demonstrated the efficacy of LiveGantt, it suffers from scalability issues when applied to large manufacturing schedules. To tackle this issue, the authors proposed more advanced visualization techniques, such as horizon graphs.

ViDX [62] is a visual analytics system that visualizes the processing time and status of work stations in automatic assembly lines, allowing production planners to explore the production data for identifying inefficiencies, locating anomalies, and defining hypotheses about their causes and effects. The solution maps the production lines and dependencies to a dense flow chart enriched with time series and metadata on the production (see Fig. 5a). ViDX scales well to real-time analysis for small data volumes, but not to year-long data. Moreover, the visualizations are not responsive and do not easily adapt to different user devices.

Fig. 5  Example visual analytics approaches for selected industrial data analysis problems. (a) Visual analysis of assembly sequences. Figure reused from [62] with permission. (b) The PAVED system [16] uses interactive parallel coordinate plots for the exploration of engineering solutions in a multi-criterion optimization process. The approach resulted from a design study with domain experts. Figure courtesy of Lena Cibulski

A common problem in production planning is that there are typically multiple objectives (goals) given, e.g., cost, time, and quality, but trade-offs exist which prevent all goals from being optimized simultaneously. Hence, production planners need to decide on a single solution from the efficient solution space (the Pareto set). The PAVED system [16] was created following a design study in the motor construction industry. The approach supports the visual exploration of solutions, using multidimensional data visualization and appropriate interaction facilities (see Fig. 5b).
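To illustrate the underlying idea (not the PAVED system itself), the following sketch computes the Pareto set among hypothetical candidate plans with three objectives to be minimized and shows the trade-offs in a basic parallel-coordinates plot:

```python
# A minimal sketch: identify Pareto-efficient candidate plans and view their
# trade-offs with parallel coordinates. All values are synthetic.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(3)
plans = pd.DataFrame(rng.uniform(0, 1, size=(50, 3)),
                     columns=["cost", "time", "defect_rate"])

def pareto_mask(values):
    """True for rows not dominated by any other row (all objectives minimized)."""
    flags = []
    for v in values:
        dominated = np.any(np.all(values <= v, axis=1) & np.any(values < v, axis=1))
        flags.append(not dominated)
    return np.array(flags)

mask = pareto_mask(plans.to_numpy())
plans["set"] = np.where(mask, "Pareto", "dominated")
parallel_coordinates(plans, class_column="set", color=["#cccccc", "#d62728"])
plt.show()
```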

3.2 Quality Control

Inspection of product and service quality is another important task that needs to be done regularly to ensure reliable output. As an example from the metal industry, the ADAM system [28] provides visual analysis of quality properties of aluminum plates produced by a casting and rolling process. The quality is measured by ultrasonic analysis of the metal plates, which records indications of inclusions in the metal by position and size. The ADAM tool represents the inclusion data in a scatter plot (see Fig. 6, right) from which densities and distributions of inclusions can be readily perceived. The interface allows users to efficiently browse through large data sets and compare inclusion distributions for different production runs. ADAM also enables users to compare inclusion analysis results with production parameters, such as alloy recipe and cast control parameters. To this end, an array of data displays (see Fig. 6, left) is linked to the inclusion view, and interactive selection and highlighting allow users to search for outstanding patterns and possible correlations. However, ADAM has not been fully evaluated yet. Therefore, we cannot draw any conclusions about the efficacy and scalability of this tool.

Fig. 6  Inspection of unwanted material inclusions for product quality monitoring (right) and identification of possible influencing factors (left) in the ADAM system [28]. Figure courtesy of Nikolina Jekic

3.3 Equipment Condition Monitoring

Equipment monitoring plays an important role in industrial applications and refers to observing a process, system, or machine with the goal of guaranteeing its expected functioning. Pure monitoring tasks allow operators or other personnel to inspect live streaming data coming from sensors or other data sources. Monitoring solutions frequently present the latest data in the context of historical data or provide predictions to assist users in judging or planning future operations (see the section on predictive maintenance below). As an example, Wu et al. [59] present an interactive visual analytics system with a semi-supervised framework that supports equipment condition monitoring (see Fig. 7a). The idea is two-fold. Monitoring of operations is supported by visualizing the correlations of sensors, where changes can be noted in near-real-time and can inform domain experts in an exploratory way. Furthermore, a semi-supervised approach learns about normal states of operation from user labeling of production sensor data. A classifier is trained, which can report deviations from the normal situation, again informing condition monitoring experts, or potentially triggering detection measures. However, the tool can face scalability issues when there are too many sensors. To tackle this issue, the authors propose to first group sensors into modules, then modules into super modules, and encode the statistical information of these super modules. Only the super module selected by the user is then visualized.

Anomaly detection is an important aspect of monitoring the condition of equipment or the whole system. A key challenge in anomaly detection is that possible anomalies are hard to specify in advance, since their detection is situation-dependent and previously unknown or unexpected anomalies may occur. Anomalies frequently need to be detected in real-time. Dutta et al. [17] use a comparative visualization technique to analyze the spatio-temporal evolution (variations of distributions, statistically anomalous regions in the data) of rotating stall in a jet engine simulation. They use a heat map to visualize the anomaly detection results and a 2D plot to show the evolution of anomalous regions of the jet engine stall, both allowing the engineers to verify the performance of the jet engine design prototype and improve its structure. Janetzko et al. [26] provide a visual analytics tool to report on anomalies found in multi-variate industrial energy consumption data and guide the user to the important time points. The algorithm used to detect the anomalies does not require computationally expensive calculations, which makes it possible to recognize sudden unexpected changes in power consumption. Likewise, Maier et al. [35] present a visual anomaly detection approach that guides the user to anomalies in time series from production plants. To reduce the users’ workload in identifying the anomalies, the dimensionality of the dataset is reduced using principal component analysis. While this approach reduces the information space to the most important dimensions, it may also result in not all anomalies being represented in the visualization.
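The PCA idea mentioned above can be sketched as follows (assuming scikit-learn and synthetic sensor data): readings are projected onto a few principal components, and the reconstruction error serves as an anomaly score that can subsequently be visualized over time:

```python
# A minimal sketch: PCA-based dimensionality reduction with reconstruction
# error as anomaly score. Data is synthetic; thresholds would be chosen
# together with domain experts.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
normal = rng.normal(size=(300, 20))      # 300 time points, 20 sensors
observed = normal.copy()
observed[150, 5:10] += 8                 # inject a deviation at t = 150

pca = PCA(n_components=3).fit(normal)
reconstructed = pca.inverse_transform(pca.transform(observed))
score = np.linalg.norm(observed - reconstructed, axis=1)

threshold = score.mean() + 3 * score.std()
print("anomalous time points:", np.where(score > threshold)[0])
```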

Sedlmair et al. [47] proposed a visual analytics tool for the automotive industry which combines visualization techniques with an anomaly detection algorithm, allowing engineers to explore anomalies in messages from the in-car communication network. Although effective, a field study revealed scalability issues of the tool that hinder its use by engineers in their daily work. In our own previous work, we considered anomaly detection in multivariate time series of engine test bed cycles [52]. Anomalies are declared if the measurements in one cycle differ by more than a certain degree from expected values based on previous cycles. Various anomaly detection methods are implemented. Users can interactively explore the detections on multiple levels of detail, including cycle glyphs and correlation matrices (see Fig. 7b, top), as well as line-chart-based details (see Fig. 7b, bottom). However, plotting up to thousands of cycles can be an issue for glyph and matrix representations. To tackle this scalability issue, advanced filtering and reordering techniques could be used. Also, in order to include the users’ preferences and interests in this selection process, methods for preference elicitation could be implemented.
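The core rule of such cycle-based detection can be written in a few lines (a simplified sketch with synthetic data, not the actual system): each cycle is compared against the mean of the previous cycles, and cycles whose average deviation exceeds a threshold are flagged:

```python
# A minimal sketch of cycle-based anomaly detection: compare each cycle to the
# expected values derived from previous cycles. Data is synthetic.
import numpy as np

rng = np.random.default_rng(5)
cycles = rng.normal(size=(40, 200))     # 40 cycles, 200 measurements each
cycles[37] += 1.5                       # hypothetical deviating cycle

deviations = []
for i in range(5, len(cycles)):         # start once a few reference cycles exist
    expected = cycles[:i].mean(axis=0)  # expected values from previous cycles
    deviations.append((i, np.abs(cycles[i] - expected).mean()))

baseline = np.array([d for _, d in deviations[:10]])
threshold = 1.5 * baseline.mean()       # simple threshold; tuned with experts in practice
print("flagged cycles:", [i for i, d in deviations if d > threshold])
```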

Acoustic testing procedures are a common technique in condition testing. The IRVINE system [18] is based on acoustic analysis of test objects, e.g., engines during analysis on a testbed. The system visualizes the obtained spectrograms and allows users to compare them across different tests to find deviations which may explain product errors. IRVINE features an annotation tool which allows users to record observations for future reference. In addition, the system provides cluster analysis to cope with increasingly large amounts of acoustic test data, supporting scalability. Figure 8 illustrates the interactive views provided.
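A basic building block of such acoustic analysis can be sketched with SciPy (the signal here is synthetic and the parameters purely illustrative): a spectrogram is computed from a recorded signal and rendered as a heat map that could then be compared across tests:

```python
# A minimal sketch: compute and display a spectrogram of a synthetic signal.
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

fs = 16_000                                  # sampling rate in Hz
t = np.arange(0, 2.0, 1 / fs)                # two seconds of signal
audio = np.sin(2 * np.pi * 440 * t) \
        + 0.3 * np.random.default_rng(6).normal(size=t.size)

f, times, Sxx = signal.spectrogram(audio, fs=fs, nperseg=1024)
plt.pcolormesh(times, f, 10 * np.log10(Sxx + 1e-12))  # power in dB
plt.xlabel("time [s]")
plt.ylabel("frequency [Hz]")
plt.show()
```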

Another relevant work in the area of condition monitoring is by Post et al. [42], who provide a series of interactive visualizations that show the process data generated by a complex production system. In addition, the visualizations are used to highlight bottlenecks or excess machining capacities, hence guiding the user to interesting locations and events. This guidance is particularly important, as it helps users to focus on single products or machines and readily identify the critical issues that affect them.

Fig. 7  (a) Equipment condition monitoring based on machine state classification. Figure reused from [59] with permission. (b) Anomaly detection in sensor data streams using glyphs, a correlation matrix, and line chart inspection for details [52]. Figure courtesy of Josef Suschnigg

3.4 Predictive Maintenance

Predictive maintenance (PdM) refers to monitoring the performance and condition of equipment and machines in industrial environments during the normal production process and implementing methods to reduce malfunctions, failures, and errors. In order to detect such failures and performance issues, PdM uses condition monitoring tools that provide early warnings of faults or degradation (see Sect. 3.3). As an application example, the design limitations of complex products, such as products in aerospace, are usually found after the physical prototype is manufactured or is in use. It stands to reason that this causes delays and high costs in production. To tackle this issue, Peng et al. [41] and Abate et al. [1] propose visual analytics frameworks that are designed to help product designers and maintainability technicians simulate and evaluate the product life cycle. As a result, the product designer can iteratively adjust the product schemes and hence reduce development cycle time and costs. Peng et al. present a systematic approach for a visualization system but lack an evaluation with end users. Abate et al. [1] present a virtual reality (VR) system that supports interactions with the environment in which maintenance activities can be simulated. The usability and usefulness of the system have been assessed with a user study, revealing increased user performance and satisfaction when performing a maintenance task with the proposed system. Canizo et al. [11] provide methods to predict failures of wind turbines. The methods are executed in a cloud computing environment to tackle scalability issues and make predictions in real-time. Given that each wind farm has its own configuration and turbines, Canizo et al. further provide a dashboard that shows the geographical location of the turbines together with notifications about the predictions and real-time status information of the turbines. Finally, Wörner et al. [60] propose a visual analytics tool that visualizes diagnostic machine data from the manufacturing domain, helping experts to judge whether or not specific parts or elements of the machines are behaving as expected or need to be repaired or replaced. The system has only been tested with a small data set and is therefore difficult to assess with regard to its scalability and usefulness.

3.5 Causality Analysis

Machine learning is increasingly used in industrial scenarios to investigate the statistical associations between production parameters for generating predictions or event forecasts. Although powerful, such methods often fail to answer the critical question “What influences X?” [21]. Such questions can be answered with causality tools. Nevertheless, the causal structure of a production process is often too complex for users to follow and understand by only looking at the outcome of the algorithms. The visualization community aims to address this problem by providing tools that help users to easily perceive and understand the complex structure of causal relations. A commonly used tool to display the causal relations between parameters is a directed graph [25, 63], while recent research has shown promising results with interactive path diagrams and parallel coordinates used to expose the flow of causal sequences [56] (see Fig. 8). Graph visualizations, path diagrams, and parallel coordinates are common techniques for visualizing high-dimensional data and therefore achieve higher efficiency and scalability for visualizing causal relationships between parameters than other visualization techniques.

Fig. 8  (a) The IRVINE system [18] supports inspecting large amounts of acoustic test data, providing an overview of clusters of acoustic profiles of test objects. Test engineers can compare acoustic profiles, annotate observations for hints of errors, and compare measurement details. Figure courtesy of Joscha Eirich. (b) The causal structure investigator interface with interactive path diagrams (see mark b) for visualizing the causal relations, and parallel coordinates (see mark c) for observing data partitions and identifying causal models potentially hidden in the data. Figure reused from [56] with permission

4 Conclusions and Future Directions

We gave a compact overview of goals, fundamental concepts, and existing solutions in visual data science in the light of industrial applications. Such approaches integrate computational data analysis methods with interactive data visualization, aiming to support data understanding and pattern discovery. There are many application opportunities in industrial data analysis, pertaining to tasks such as production planning, equipment condition monitoring, quality control, and anomaly detection. We discussed the main data types and challenges in data acquisition. By means of application examples, we highlighted selected results from the state of the art, and closed with a selection of promising future directions.

There are many opportunities to leverage the potential of visual data science approaches for industrial data analysis problems. In order to advance in this field, many research challenges still remain to be tackled. For generic challenges and application opportunities in visual data science, we refer to a recent overview by Andrienko et al. [3].

In the following, we outline a number of challenges specific to industrial visual data analysis. First, we observe that the trend towards Industry 4.0 is characterized by continuous digitalization of industrial operations. Notably, new data sources for monitoring production and operation are added continuously. Hence, when developing visual data science applications for industry, we cannot work with a stable set of data sources but need to be able to flexibly integrate additional data types into running processes. This means that agile development environments are needed, and long-lasting research planning processes may not be applicable. We observe from our project work that, due to the continuous integration of data sources and sensors, data heterogeneity is increasing in terms of data formats, sizes, resolutions, and qualities. While industrial standards already exist, data heterogeneity and complexity are expected to remain a permanent challenge due to the heterogeneity of equipment manufacturers.

Data ownership and governance problems may also occur in projects. For example, both equipment manufacturers and operators are interested in obtaining and analyzing equipment data. However, access to the data may be complicated due to competitive relationships and data privacy constraints.

In addition to data-driven approaches for understanding industrial processes, rich knowledge also exists in the form of industrial process management engines, containing domain-specific production and engineering knowledge as production programs and historic production records. In our opinion, how to link and integrate process engines and operational data analysis applications is a challenge that offers promising application potential. For more discussion on this, we refer to the work by Thalmann et al. [53].