
1 Introduction

Many real-world tasks are characterized by uncertainty and probabilistic data that are hard for humans to understand and to process. Machine learning and knowledge extraction [46] help turn such data into useful information for a wide spectrum of applications such as image recognition, scene understanding, and decision-support systems, enabling new use cases across a broad range of domains.

The success of various machine learning methods, in particular Deep Neural Networks (DNNs), on challenging problems of computer vision and pattern recognition has led to a “Cambrian explosion” in the field of Artificial Intelligence (AI). In many application areas, AI researchers have turned to deep learning as the solution of choice [54, 97]. A characteristic of this development is the acceleration of progress in AI over the last decade, which has produced AI systems that are strong enough to raise serious ethical and societal acceptance questions. Another characteristic is the way such systems are engineered. Above all, there is an increasing interconnection of traditionally separate disciplines such as data analysis, model building, and software engineering. In particular, data-driven AI methods such as DNNs allow data to shape the models and the software systems that operate them. System engineering of AI-driven software therefore faces novel challenges at all stages of the system lifecycle [51]:

  • Key Challenge 1: AI intrinsic challenges due to peculiarities or shortcomings of today’s AI methods; in particular, current data-driven AI is characterized by:

    • data challenge in terms of quality assurance and procurement;

    • challenge to integrate expert knowledge and models;

    • model integrity and reproducibility challenge due to unstable performance profiles triggered by small variations in the implementation or input data (adversarial noise);

  • Key Challenge 2: Challenges in the process of AI system engineering ranging from requirements analysis and specification to deployment including

    • testing, debugging and documentation challenges;

    • challenge to consider the constraints of target platforms at design time;

    • certification and regulation challenges resulting from highly regulated target domains such as in a bio-medical laboratory setting;

  • Key Challenge 3: Interpretability and trust challenge in the operational environment, in particular

    • trust challenge in terms of lack of interpretability and transparency by opaque models;

    • challenges posed by ethical guidelines;

    • acceptance challenge in terms of societal barriers to AI adoption in society, healthcare or working environments;

2 Key Challenges on System Engineering Posed by Data-Driven AI

2.1 AI Intrinsic Challenges

There are peculiarities of deep learning methods that affect the correct interpretation of the system’s output and the transparency of the system’s configuration.

Lack of Uniqueness of Internal Configuration: First of all, in contrast to traditional engineering, there is a lack of uniqueness of the internal configuration, which causes difficulties in model comparison. Systems based on machine learning, in particular deep learning models, are typically regarded as black boxes. However, it is not simply the complex nested non-linear structure which matters, as often pointed out in the literature, see [86]. There are mathematical and physical systems which are also complex, nested, and non-linear, and yet interpretable (e.g., wavelets, statistical mechanics). It is a remarkable and unexpected phenomenon that such deep networks become easier to optimize (train) with an increasing number of layers, and hence complexity, see [100, 110]; more precisely, it becomes easier to find a reasonable sub-optimum out of many equally good possibilities. As a consequence, and in contrast to classical engineering, we lose uniqueness of the internal optimal state.

Lack of Confidence Measure: A further peculiarity of state-of-the-art deep learning methods is the lack of a confidence measure. In contrast to Bayesian approaches to machine learning, most deep learning models do not offer a justified measure of the model’s uncertainty. In classification models, for example, the probability vector obtained in the top layer (predominantly a softmax output) is often interpreted as model confidence, see, e.g., [26] or [35]. However, functions like softmax can yield extrapolations with unjustifiably high confidence for points far from the training data, providing a false sense of safety [39]. It therefore seems natural to introduce the Bayesian approach to DNN models as well. The resulting uncertainty measures (or, synonymously, confidence measures) rely on approximations of the posterior distribution over the weights given the data. As a promising approach in this context, variational techniques, e.g., based on Monte Carlo dropout [27], turn these Bayesian concepts into computationally tractable algorithms. The variational approach relies on the Kullback-Leibler divergence for measuring the dissimilarity between distributions. As a consequence, the resulting approximating distribution becomes concentrated around a single mode, underestimating the uncertainty beyond this mode. Thus, the resulting measure of confidence for a given instance remains unsatisfactory, and there might still be regions with misinterpreted high confidence.
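
As a concrete illustration, the following minimal sketch (not taken from the cited works) shows how Monte Carlo dropout can be used to obtain an approximate confidence signal by keeping dropout active at prediction time and averaging over several stochastic forward passes; the model, layer sizes, and sample count are arbitrary placeholders.

```python
# Minimal sketch of Monte Carlo dropout uncertainty estimation (cf. [27]):
# keep dropout stochastic at inference time and average T forward passes.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 3),
)

def mc_dropout_predict(model, x, n_samples=50):
    model.train()  # keeps the dropout layers active during prediction
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    mean = probs.mean(dim=0)   # predictive mean over the stochastic passes
    std = probs.std(dim=0)     # spread across passes as a rough confidence signal
    return mean, std

x = torch.randn(8, 20)         # dummy batch of 8 inputs
mean, std = mc_dropout_predict(model, x)
```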

Lack of Control of High-Dimensionality Effects: Further, there is the still unsolved problem of the lack of control of high-dimensionality effects. There are high-dimensional effects which are not yet fully understood in the context of deep learning, see [31] and [28]. Such high-dimensional effects can cause instabilities, as illustrated, for example, by the emergence of so-called adversarial examples, see, e.g., [3, 96].
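
To make the phenomenon concrete, the sketch below constructs an adversarial perturbation with the fast gradient sign method, one standard construction (not necessarily the one studied in [3, 96]); the model and data are placeholders.

```python
# Hedged sketch: a tiny gradient-sign perturbation of the input that is
# designed to increase the classification loss and may flip predictions.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)                 # stand-in classifier
x = torch.randn(4, 10)                   # dummy inputs
y = torch.tensor([0, 1, 0, 1])           # dummy labels

def fgsm_perturb(model, x, y, eps=0.03):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # one step in the input direction that maximally increases the loss
    return (x_adv + eps * x_adv.grad.sign()).detach()

x_adv = fgsm_perturb(model, x, y)
print((model(x).argmax(1) != model(x_adv).argmax(1)).sum())  # flipped predictions
```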

2.2 AI System Engineering Challenges

In data-driven AI systems there are two equally consequential components: software code and data. However, some input data are inherently volatile and may change over time. It is therefore important that these changes can be identified and tracked to fully understand the models and the final system. As a result, the development of such data-driven systems combines all the challenges of traditional software engineering with specific machine learning problems that cause additional hidden technical debt [87].

Theory-Practice Gap in Machine Learning: The design and test principles of machine learning are underpinned by statistical learning theory and its fundamental theorems, such as Vapnik’s theorem [99]. The theoretical analysis relies on idealized assumptions, for instance that the data are drawn independently and identically distributed (i.i.d.) from the same probability distribution. As outlined in [81], however, this assumption may be violated in typical applications such as natural language processing [48] and computer vision [106, 108].

This problem of dataset shift can result from the way input characteristics are used, from the way training and test sets are selected, from data sparsity, from shifts in the data distribution due to non-stationary environments, and also from changes in activation patterns within layers of deep neural networks. Such a dataset shift can cause misleading parameter tuning when performing test strategies such as cross-validation [58, 104].

This is why engineering machine learning systems largely relies on the skill of the data scientist to examine and resolve such problems.
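
One simple, widely used diagnostic for such a shift (illustrative, not the methodology of the cited works) is to train a "domain classifier" that separates training inputs from test inputs; a cross-validated AUC well above 0.5 signals covariate shift.

```python
# Hedged sketch of a dataset-shift check via a domain classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def shift_score(X_train, X_test):
    X = np.vstack([X_train, X_test])
    d = np.concatenate([np.zeros(len(X_train), dtype=int),
                        np.ones(len(X_test), dtype=int)])  # domain labels
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X, d, cv=5, scoring="roc_auc").mean()

rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(500, 10))
X_te = rng.normal(0.5, 1.0, size=(500, 10))   # shifted test distribution
print(shift_score(X_tr, X_te))                # noticeably above 0.5
```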

Data Quality Challenge: While much of the research in machine learning and its theoretical foundations has focused on improving the accuracy and efficiency of training and inference algorithms, less attention has been paid to the equally important practical problem of monitoring the quality of the data supplied to machine learning [6, 19]. Heterogeneous data sources, the occurrence of unexpected patterns, and large amounts of schema-free data in particular pose additional problems for data management, which directly impact data extraction from multiple sources, data preparation, and data cleansing [7, 84].

For data quality issues, the situation is similar to the detection of software bugs. The earlier the problems are detected and resolved, the better for model quality and development productivity.

Configuration Maintenance Challenge: ML system developers usually start from ready-made, pre-trained networks and try to optimize their execution on the target processing platform as much as possible. This practice is prone to the entanglement problem [87]: if changes are made to an input feature, the meaning, weighting, or use of the other features may also change. This means that machine learning systems must be designed so that changes in feature engineering and selection are easily tracked. Especially when models are constantly revised and subtly changed, tracking configuration updates while maintaining the clarity and flexibility of the configuration becomes an additional burden.
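
As an illustration of such tracking (a hypothetical helper, not a tool from [87]), one can fingerprint the feature-engineering configuration and store the hash with the model metadata, so that any change to features or preprocessing becomes visible.

```python
# Hypothetical sketch: fingerprint the feature-engineering configuration so
# that feature or preprocessing changes are tracked alongside the model.
import hashlib
import json

def config_fingerprint(feature_config: dict) -> str:
    canonical = json.dumps(feature_config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

feature_config = {
    "features": ["temperature", "pressure", "vibration_rms"],
    "scaling": "standard",
    "imputation": "median",
}
model_metadata = {
    "model": "gradient_boosting_v3",          # illustrative model identifier
    "feature_config_hash": config_fingerprint(feature_config),
}
print(model_metadata)
```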

Deployment Challenge: The design and training of the learning algorithm and the inference of the resulting model are two different activities. Training is very computationally intensive and is usually conducted on a high-performance platform [103]. It is an iterative process that leads to the selection of an optimal algorithm configuration, usually known as hyperparameter optimization, with accuracy as the only major design goal [105]. While the training process is usually conducted offline, inference very often has to deal with real-time constraints, tight power or energy budgets, and security threats. This dichotomy creates the need for multiple design re-spins (before a successful integration), potentially leading to long tuning phases, overloading the designers and producing results that depend heavily on their skills. Despite the variety of resources available, optimizing these heterogeneous computing architectures for performing low-latency and energy-efficient DL inference tasks without compromising performance is still a challenge [5].

2.3 Interpretability and Trust Challenge

In contrast to traditional computing, AI can now perform tasks that previously only humans were able to do. As such, it has the potential to revolutionize every aspect of our society, and the impact is far-reaching. First, with the increasing spread of AI systems, the interaction between humans and AI will increasingly become the dominant form of human-computer interaction [1]. Second, this development will shape the future workforce. PwC (see Footnote 1) predicts a relatively low displacement of jobs (around 3%) in the first wave of AI, but this could increase dramatically to up to 30% by the mid-2030s. Human-centered AI has therefore started to come to the forefront of AI research, based on postulated ethical principles for protecting human autonomy and preventing harm. Recent initiatives at the national (see Footnote 2) and supra-national (see Footnote 3) level emphasize the need for research on trusted AI.

Interpretability Challenge: Essential aspects of trusted AI are explainability and interpretability. Interpretability is about being able to discern the mechanics of a model without necessarily knowing why it behaves that way, whereas explainability means being able to quite literally explain what is happening, for example by referring to mechanical laws. It is well known that the great successes of machine learning in recent decades in terms of applicability and acceptance are relativized by the fact that they can be explained less easily as the complexity of the learning model increases [44, 60, 90]. Explainability of the solution is thus increasingly perceived as an inherent quality of the respective methods [9, 15, 33, 90]. Particularly in the case of deep learning methods, attempts to interpret the predictions by means of the model parameters fail [33]. The necessity to obtain not only increasing prediction accuracy but also an interpretation of the solutions determined by ML or deep learning arises at the latest with the ethical [10, 76], legal [13], psychological [59], medical [25, 45], and sociological [111] questions tied to their application. The common element of these questions is the demand to clearly interpret the decisions proposed by artificial intelligence (AI). The complex of problems that derives from this demand on artificial intelligence for explainability, transparency, trustworthiness, etc. is generally described by the term Explainable Artificial Intelligence, synonymously “Explainable AI” or “XAI”. Its broad relevance can be seen in the interdisciplinary nature of the scientific discussion currently taking place on terms such as interpretation and explanation and on refined notions such as causability and causality in connection with AI methods [30, 33, 42, 43].

Trust Challenge: In contrast to interpretability, trust is a much more comprehensive concept. Trust is linked to the uncertainty about a possible malfunction or failure of the AI system, as well as to the circumstances of delegating control to a machine as a “black box”. Predictability and dependability of AI technology, as well as an understanding of the technology’s operations and the intentions of its creators, are essential drivers of trust [12]. Particularly in critical applications, the user wants to understand the rationale behind a classification and under which conditions the system can be trusted and when not. Consequently, AI systems must make it possible to take these human needs for trust and social compatibility into account. On the other hand, we have to be aware of the limitations and peculiarities of state-of-the-art AI systems. Currently, the topic of trusted AI is discussed in different communities at different levels of abstraction:

  • in terms of high-level ethical guidelines (e.g., ethics boards such as algorithmwatch.org (see Footnote 4), the EU’s Draft Ethics Guidelines (see Footnote 5));

  • in terms of regulatory postulates for current AI systems regarding, e.g., transparency (working groups on standardization, e.g., ISO/IEC JTC 1/SC 42 on artificial intelligence (see Footnote 6));

  • in terms of improved features of AI models (above all by the explainable AI community [34, 41]);

  • in terms of trust modeling approaches (e.g., by the multi-agent systems community [12]).

In view of the model-intrinsic and system-technical challenges of AI pointed out in Sects. 2.1 and 2.2, the gap between the envisioned high-level ethical guidelines of human-centered AI and the state of the art of AI systems becomes evident.

3 Approaches, In-Progress Research and Lessons Learned

In this section we discuss ongoing research that addresses the challenges outlined in the previous section, comprising:

  1. Automated and Continuous Data Quality Assurance, see Sect. 3.1;

  2. Domain Adaptation Approach for Tackling Deviating Data Characteristics at Training and Test Time, see Sect. 3.2;

  3. Hybrid Model Design for Improving Model Accuracy, see Sect. 3.3;

  4. Interpretability by Correction Model Approach, see Sect. 3.4;

  5. Software Quality by Automated Code Analysis and Documentation Generation, see Sect. 3.5;

  6. The ALOHA Toolchain for Embedded Platforms, see Sect. 3.6;

  7. Human AI Teaming as Key to Human Centered AI, see Sect. 3.7.

3.1 Approach 1: Automated and Continuous Data Quality Assurance

In times of large and volatile amounts of data, which are often generated automatically by sensors (e.g., in smart home solutions for housing units or in industrial settings), it is especially important to (i) automatically and (ii) continuously monitor the quality of the data [22, 88]. A recent study [20] shows that continuous monitoring of data quality is supported by only very few software tools. In the open-source area these are Apache Griffin (see Footnote 7), MobyDQ (see Footnote 8), and QuaIIe [21]. Apache Griffin and QuaIIe implement data quality metrics from the reference literature (see [21, 40]), whereby most of them require a reference database (gold standard) for their calculation. MobyDQ, on the other hand, is rule-based, with a focus on data quality checks along a pipeline, where data is compared between two different databases. Since existing open-source tools were insufficient for the permanent measurement of data quality within a database or a data stream used for data analysis and machine learning, we developed the Data Quality Library (DaQL). DaQL allows the extensive definition of data quality rules, based on the newly developed DaQL language. These rules do not require reference data, and DaQL has already been used for an ML application in an industrial setting [19]. However, to ensure their validity, the rules for DaQL are created manually by domain experts.
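
The snippet below sketches the kind of reference-free, rule-based check such a tool supports; the rule definitions, column names, and thresholds are hypothetical illustrations written in plain Python/pandas, not actual DaQL language syntax.

```python
# Generic rule-based data quality monitoring sketch (illustrative only).
import pandas as pd

rules = [
    {"column": "temperature", "check": lambda s: s.between(-40, 125)},
    {"column": "timestamp",   "check": lambda s: s.notna()},
    {"column": "sensor_id",   "check": lambda s: s.str.match(r"S\d{4}")},
]

def evaluate_rules(df: pd.DataFrame, rules) -> pd.DataFrame:
    results = []
    for rule in rules:
        ok = rule["check"](df[rule["column"]])
        results.append({"column": rule["column"],
                        "violations": int((~ok).sum()),
                        "violation_rate": float((~ok).mean())})
    return pd.DataFrame(results)

df = pd.DataFrame({
    "temperature": [21.5, 300.0, 19.0],
    "timestamp": pd.to_datetime(["2021-01-01", None, "2021-01-02"]),
    "sensor_id": ["S0001", "S0002", "bad_id"],
})
print(evaluate_rules(df, rules))
# A continuous monitor would run evaluate_rules() on every new batch or
# window of the data stream and raise an alert when violation rates spike.
```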

Lesson Learned: In the literature, data quality is typically defined with the “fitness for use” principle, which illustrates the high contextual dependency of the topic [11, 102]. Thus, one important lesson learned is the need for more research into the automated generation of domain-specific data quality rules. In addition, the integration of contextual knowledge (e.g., the respective ML model using the data) needs to be considered. Here, knowledge graphs offer a promising solution, which indicates that knowledge about the quality of data is part of the bigger picture outlined in Approach (and lesson learned) 7: the usage of knowledge graphs to interpret the quality of AI systems. In addition to the measurement (i.e., detection) of data quality issues, we consider research into the automated correction (i.e., cleansing) of sensor data as an additional challenge [18], especially since automated data cleansing poses the risk of introducing new errors into the data (cf. [63]), which is particularly critical in enterprise settings.

3.2 Approach 2: The Domain Adaptation Approach for Tackling Deviating Data Characteristics at Training and Test Time

In [106] and [108] we introduce a novel distance measure, the Central Moment Discrepancy (CMD), for aligning probability distributions in the context of domain adaptation. Domain adaptation algorithms are designed to minimize the misclassification risk of a discriminative model for a target domain with little training data by adapting a model from a source domain with a large amount of training data. Standard approaches measure the adaptation discrepancy by means of distance measures between the empirical probability distributions in the source and target domain, which in our setting correspond to training time and test time, respectively. In [109] we show that our CMD approach, refined by practice-oriented information-theoretic assumptions on the involved distributions, yields a generalization of the conditions of Vapnik’s theorem [99].

As a result, we obtain quantitative generalization bounds for recently proposed moment-based algorithms for unsupervised domain adaptation, which perform particularly well in many practical tasks [74, 95, 106, 107, 108].
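
For illustration, a simplified empirical estimate of the CMD between two samples can be computed as follows; the order K and the assumed value range [a, b] are illustrative, and in the cited works the measure is applied to hidden network activations as a regularizer rather than to raw data.

```python
# Sketch of an empirical central moment discrepancy between two samples:
# sum of coordinate-wise differences of means and higher-order central
# moments, assuming features bounded in [a, b] (cf. [106, 108]).
import numpy as np

def cmd(X, Y, K=5, a=0.0, b=1.0):
    dx, dy = X.mean(axis=0), Y.mean(axis=0)
    dist = np.linalg.norm(dx - dy) / abs(b - a)
    for k in range(2, K + 1):
        cx = ((X - dx) ** k).mean(axis=0)   # k-th central moments, per feature
        cy = ((Y - dy) ** k).mean(axis=0)
        dist += np.linalg.norm(cx - cy) / abs(b - a) ** k
    return dist

rng = np.random.default_rng(0)
source = rng.uniform(0, 1, size=(1000, 8))
target = rng.beta(2, 5, size=(1000, 8))     # differently shaped distribution
print(cmd(source, target))
```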

Lesson Learned: It is interesting that moment-based probability distance measures are among the weakest of those utilized in machine learning and, in particular, in domain adaptation. Weak in this setting means that convergence in the stronger distance measures entails convergence in the weaker ones. Our lesson learned is that a weaker distance measure can be more robust than stronger distance measures. At first glance, this observation might appear counter-intuitive. On a second look, however, it becomes intuitive that the minimization of stronger distance measures is more prone to the effect of negative transfer [77], i.e., the adaptation of source-specific information that is not present in the target domain. Further evidence can be found in the area of generative adversarial networks, where the alignment of distributions by strong probability metrics can cause problems of mode collapse, which can be mitigated by choosing weaker similarity concepts [17]. Thus, it is better to abandon stronger concepts of similarity in favour of weaker ones and to use stronger concepts only if they can be justified.

3.3 Approach 3: Hybrid Model Design for Improving Model Accuracy by Integrating Expert Hints in Biomedical Diagnostics

For diagnostics based on biomedical image analysis, image segmentation serves as a prerequisite step to extract quantitative information [70]. If, however, segmentation results are not accurate, quantitative analysis can lead to results that misrepresent the underlying biological conditions [50]. To extract features from biomedical images at the single-cell level, robust automated segmentation algorithms have to be applied. In the Austrian FFG project VISIOMICS (see Footnote 9), which is devoted to cell analysis, we tackle this problem by following a cell segmentation ensemble approach, consisting of several state-of-the-art deep neural networks [38, 85]. In addition, to overcome the lack of training data, which is very time-consuming to prepare and annotate, we utilize a Generative Adversarial Network (GAN) approach for artificial training data generation [53] (see Footnote 10). The underlying dataset was also published [52] and is available online (see Footnote 11). Particularly for cancer diagnostics, clinical decision-making often relies on timely and cost-effective genome-wide testing. Similar to biomedical imaging, classical bioinformatics algorithms often require manual data curation, which is error-prone and extremely time-consuming, and thus has negative effects on time and cost efficiency. To overcome this problem, we developed the DeepSNP (see Footnote 12) network to learn from genome-wide single-nucleotide polymorphism array (SNPa) data and to classify the presence or absence of genomic breakpoints within large genomic windows with high precision and recall [16].
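
As a simple illustration of the ensemble idea, the binary masks predicted by several segmentation networks can be fused by pixel-wise majority voting; this is only one possible fusion scheme, and the actual combination used in [38, 85] may differ.

```python
# Hedged sketch: pixel-wise majority vote over an ensemble of binary masks.
import numpy as np

def majority_vote(masks: np.ndarray) -> np.ndarray:
    """masks: array of shape (n_models, H, W) with values in {0, 1}."""
    return (masks.mean(axis=0) >= 0.5).astype(np.uint8)

# dummy predictions of five ensemble members on a 64x64 image
masks = np.stack([
    np.random.default_rng(i).integers(0, 2, size=(64, 64)) for i in range(5)
])
fused = majority_vote(masks)
```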

Lesson Learned: First, it is crucial to rely on expert knowledge when it comes to data augmentation strategies. This becomes more important the more complex the data is (a high number of nuclei and overlapping nuclei). Less complex images do not necessarily benefit from data augmentation. Second, by introducing so-called localization units, the network gains the ability to exactly localize anomalies in terms of genomic breakpoints, despite never having seen their exact location during training. In this way we have learned that localization and attention units can be used to significantly ease the effort of annotating data.

3.4 Approach 4: Interpretability by Correction Model Approach

Last year, at a symposium on predictive analytics in Vienna [93], we introduced an approach to the problem of formulating interpretability of AI models for classification or regression problems [37] with a given basic model, e.g., in the context of model predictive control [32]. The basic idea is to root the problem of interpretability in the basic model by considering the contribution of the AI model as a correction of this basic model; the approach is referred to as “Before and After Correction Parameter Comparison (BAPC)”. The idea of a small correction is a common approach in mathematics in the field of perturbation theory, for example of linear operators. In [91, 92] the idea of small-scale perturbation (in the sense of linear algebra) was used to give estimates of the return probability of a random walk on a percolation cluster. The notion of “small influence” appears here in a similar way via the measures of determination of the AI model compared to the basic model.

According to BAPC, an AI-based correction of a solution of these problems, which is previously provided by a basic model, is interpretable in the sense of this basic model if its effect can be described by the basic model’s parameters; this effect refers to the estimated target variables of the data. In other words, an AI correction is interpretable in the sense of a basic model exactly when the accompanying change in the estimate of the target variable can be characterized by the solution of the basic model under the corresponding parameter changes. The basic idea of the approach is thus to apply the explanatory power of the basic model to the correcting AI method, in that its effect can be formulated with the help of the parameters of the basic model. BAPC’s ability to use the basic model to predict the modified target variables makes it a so-called surrogate [9].

The proposed solution for the interpretation of the AI correction is, of course, limited from the outset by the interpretation horizon of the basic model. Furthermore, it must be assumed that the basic model is too weak to describe the phenomena underlying the correction in accordance with the actual facts. We therefore distinguish between explainability and interpretability and, with the definition of interpretability in terms of the basic model introduced above, we do not claim to always be able to explain, but rather to be able to describe (i.e., interpret) the correction as a change of the solution using the basic model. This is achieved by means of the features used in the basic model and their modified parameters. As with most XAI approaches (e.g., the feature importance vector [33]), the goal is to find the most significant changes in these parameters.
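
The following deliberately simplified toy sketch illustrates the before/after idea on a regression problem; it is not the authors' BAPC implementation, and the basic model (linear regression) and the AI correction (gradient boosting on the residuals) are chosen only for illustration.

```python
# Toy sketch of the before/after parameter comparison idea: fit a basic
# model, let an AI model correct its residuals, re-fit the basic model on
# the corrected predictions, and read the correction off the parameter shift.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.3 * np.sin(5 * X[:, 2]) \
    + rng.normal(0, 0.05, 500)

basic = LinearRegression().fit(X, y)                        # "before" parameters
residuals = y - basic.predict(X)
correction = GradientBoostingRegressor().fit(X, residuals)  # AI correction

y_corrected = basic.predict(X) + correction.predict(X)
basic_after = LinearRegression().fit(X, y_corrected)        # "after" parameters

print("before:", basic.coef_)
print("after: ", basic_after.coef_)   # the parameter shift describes the
                                      # (small) effect of the AI correction
```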

Lesson Learned: This approach is work in progress and will be tackled in detail in the upcoming Austrian FFG research project “inAIco”. As a lesson learned, we appreciate the BAPC approach as a result of interdisciplinary research at the intersection of mathematics, machine learning, and model predictive control. We expect that the approach generally only works for “small” AI corrections. It must be possible to formulate conditions on the size (i.e., the “smallness”) of the AI correction under which the approach will work in any case. However, it is an advantage of our approach that interpretability does not depend on human understanding (see the discussion in [33] and [9]). An important aspect is its mathematical rigour, which avoids the accusation of “quasi-scientificity” (see [57]).

3.5 Approach 5: Software Quality by Code Analysis and Automated Documentation

Quality assurance measures in software engineering include, e.g., automated testing  [2], static code analysis  [73], system redocumentation  [69], or symbolic execution  [4]. These measures need to be risk-based  [23, 83], exploiting knowledge about system and design dependencies, business requirements, or characteristics of the applied development process.

AI-based methods can be applied to extract knowledge from source code or test specifications to support this analysis. In contrast to manual approaches, which require extensive human annotation work, machine learning methods have been applied to various extraction and classification tasks, such as comment classification of software systems, with promising results [78, 89, 94].

Software engineering approaches contribute to automating (i) AI-based system testing, e.g., by means of predicting fault-prone parts of the software system that need particular attention [68], and (ii) system documentation, to improve software maintainability [14, 69, 98] and to support re-engineering and migration activities [14]. In particular, we developed a feedback-directed testing approach that derives tests from interacting with a running system [61], which we have successfully applied in various industry projects [24, 82]. In an ongoing redocumentation project [29], we automatically generate parts of the functional documentation, containing business rules and domain concepts, as well as all of the technical documentation.
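
As an illustration of the kind of comment classification mentioned above (toy data and labels, not from the cited projects), a simple bag-of-words classifier already conveys the principle.

```python
# Minimal sketch: classifying source-code comments, e.g. into business-rule
# vs. technical comments, with a bag-of-words model (illustrative data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "# discount applies only to orders above 100 EUR",
    "# TODO refactor this loop",
    "# customers under 18 must not be invoiced",
    "# temporary workaround for driver bug",
]
labels = ["business_rule", "technical", "business_rule", "technical"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(comments, labels)
print(clf.predict(["# shipping is free for premium customers"]))
```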

Lesson Learned: Keeping documentation up to date is essential for the maintainability of frequently updated software and for minimising the risk of technical debt due to the entanglement of data and sub-components of machine learning systems. The lesson learned is that machine learning itself can be utilized for this problem, namely for establishing rules for detecting and classifying comments (with an accuracy above 95%) and integrating them when generating readable documentation.

3.6 Approach 6: The ALOHA Toolchain for Embedded Platforms

In [66] and [65] we introduce ALOHA, an integrated tool flow that tries to make the design of deep learning (DL) applications and their porting to embedded heterogeneous architectures as simple and painless as possible. ALOHA is the result of interdisciplinary research funded by the EU (see Footnote 13). The proposed tool flow aims at automating different design steps and reducing development costs by bridging the gap between the DL algorithm training and inference phases. The tool considers hardware-related variables as well as security, power efficiency, and adaptivity aspects during the whole development process, from pre-training hyperparameter optimization and algorithm configuration to deployment. According to Fig. 1, the general architecture of the ALOHA software framework [67] consists of three major steps:

  • (Step 1) algorithm selection,

  • (Step 2) application partitioning and mapping, and

  • (Step 3) deployment on target hardware.

Fig. 1. General architecture of the ALOHA software framework. Nodes in the upper part of the figure represent the key inputs of the tool flow specified by the users; for details see [67].

Starting from a user-specified set of input definitions and data, including a description of the target architecture, the tool flow generates a partitioned and mapped neural network configuration, ready for the target processing architecture, that also optimizes predefined criteria. The optimization criteria include application-level accuracy, the required security level, inference execution time, and power consumption. A RESTful microservices approach allows each step of the development process to be broken down into smaller, completely independent components that interact and influence each other through the exchange of HTTP calls [71]. The implementations of the various components are managed using a container orchestration platform. The ONNX (Open Neural Network Exchange, see Footnote 14) standard is used to exchange deep learning models between the different components of the tool flow.

In Step 1, a Design Space comprising admissible model architectures for hyperparameter tuning is defined. This Design Space is configured via satellite tools that evaluate the fitness with respect to the predefined optimization criteria, such as accuracy (by the Training Engine), robustness against adversarial attacks (by the Security evaluation tool), and power (by the Power evaluation tool). The optimization is based on (a) hyperparameter tuning using a non-stochastic infinite-armed bandit approach [55], and (b) a parsimonious inference strategy that aims to reduce the bit depth of the activation values from initially 8 bit to 4 bit by iterative quantization and retraining steps [47]. The optimization in Step 2 exploits a genetic algorithm to explore the design space, requiring evaluation of the candidate partitioning and mapping schemes by the satellite tools Sesame [80] and the Architecture Optimization Workbench (AOW) [62].
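
The sketch below shows plain uniform quantization of activation values, the basic operation behind the parsimonious inference step; the actual iterative quantize-and-retrain procedure of [47] is more involved.

```python
# Hedged sketch: uniform quantization of activations to a given bit depth,
# followed by de-quantization to inspect the introduced error.
import numpy as np

def quantize(activations: np.ndarray, n_bits: int) -> np.ndarray:
    levels = 2 ** n_bits - 1
    a_min, a_max = activations.min(), activations.max()
    scale = (a_max - a_min) / levels
    q = np.round((activations - a_min) / scale)   # integer grid in [0, levels]
    return q * scale + a_min                      # de-quantized values

acts = np.random.default_rng(0).normal(size=(4, 16)).astype(np.float32)
print(np.abs(acts - quantize(acts, 8)).max())     # small error at 8 bit
print(np.abs(acts - quantize(acts, 4)).max())     # larger error at 4 bit
```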

The gain in performance was evaluated in terms of the inference time needed to execute the modified model on NEURAghe [64], a Zynq-based processing platform that contains both a dual ARM Cortex-A9 processor (667 MHz) and a CNN accelerator implemented in the programmable logic. A statistical analysis of the switching activity of our reference models showed that, on average, only about 65% of the kernels are active in the layers of the network throughout the target validation data set. The resulting model loses only 2% accuracy (baseline 70%) while achieving an impressive 48.31% reduction in terms of FLOPs.

Lesson Learned: Following the standard training procedure, deep models tend to be oversized. This research shows that some of the CNN layers operate in a static or close-to-static mode, enabling the permanent pruning of the redundant kernels from the model. However, the second optimization strategy, dedicated to parsimonious inference, turns out to be more effective for pure software execution, since it more directly deactivates operations in the convolution process. All in all, this study shows that there is a lot of potential for optimisation and improvement compared to standard deep learning engineering approaches.

3.7 Approach 7: Human AI Teaming Approach as Key to Human Centered AI

In [36], we introduce an approach to human-centered AI in working environments utilizing knowledge graphs and relational machine learning [72, 79]. This approach is currently being refined in the ongoing Austrian project Human-centred AI in digitised working environments (AI@Work). The discussion starts with a critical analysis of the limitations of current AI systems, whose learning/training is restricted to predefined structured data, mostly vector-based with a pre-defined format. We therefore need an approach that overcomes this restriction by utilizing relational structures by means of a knowledge graph (KG), which allows relevant context data to be represented for linking ongoing AI-based and human-based actions on the one hand and process knowledge and policies on the other. Figure 2 outlines this general approach, where the knowledge graph is used as an intermediate representation of linked data to be exploited for improving the machine learning system, respectively the AI system.

Fig. 2. A knowledge-graph approach to enhance vector-based machine learning in order to support human AI teaming by taking context and process knowledge into account.

Methods applied in this context will include knowledge graph completion techniques that aim at filling in missing facts within a knowledge graph [75]. The KG will flexibly allow tying together contextual knowledge about the team of involved human and AI-based actors, including interdependence relations, skills, and tasks, together with application and system process knowledge and organizational knowledge [49]. Relational machine learning will be developed in combination with an updatable knowledge graph embedding [8, 101]. This relational ML will be exploited for analysing and mining the knowledge graph for the purpose of detecting inconsistencies, curation and refinement, providing recommendations for improvements, and detecting compliance conflicts with predefined behavioural policies (e.g., ethics or safety policies). The system will learn from the environment, user feedback, changes in the application, or deviations from committed behavioural patterns, in order to react by providing updated recommendations or triggering actions in case of compliance conflicts. However, the construction of the knowledge graph and keeping it up to date is a critical step, as it usually involves laborious efforts for knowledge extraction, knowledge fusion, knowledge verification, and knowledge updates. To address this challenge, our approach pursues bootstrapping strategies for knowledge extraction based on recent advances in deep learning and embedding representations as promising methods for matching knowledge items represented in diverse formats.
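
As an illustration of the kind of embedding such relational ML builds on (toy entities and relations with hypothetical names), a TransE-style scoring function ranks the plausibility of knowledge-graph triples; the project may of course use different embedding models.

```python
# Illustrative sketch: TransE-style triple scoring ||h + r - t|| on randomly
# initialized toy embeddings; in practice the embeddings are trained so that
# observed triples score low and implausible ones score high.
import numpy as np

rng = np.random.default_rng(0)
dim = 16
entities = {name: rng.normal(size=dim)
            for name in ["worker_1", "task_A", "robot_2"]}
relations = {name: rng.normal(size=dim)
             for name in ["assigned_to", "assists"]}

def transe_score(head, relation, tail):
    # lower score = triple considered more plausible
    return np.linalg.norm(entities[head] + relations[relation] - entities[tail])

print(transe_score("worker_1", "assigned_to", "task_A"))
```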

Lesson Learned: As pointed out in Sect. 2.3, there is a substantial gap between the current state of the art of AI systems and the requirements posed by ethical guidelines. Future research will rely much more on machine learning on graph structures. Fast, updatable knowledge graphs and related knowledge graph embeddings might be a key towards ethics by design, enabling human-centered AI.

4 Discussion and Conclusion

This paper can only give a brief glimpse of the broad field of AI research in connection with the application of AI in practice. The associated research is indeed inter- and even transdisciplinary [56]. In any case, we come to the conclusion that a discussion on “Applying AI in Practice” needs to start with its theoretical foundations and a critical discussion about the limitations of current data-driven AI systems, as outlined in Sect. 2.1. Approach 1, Sect. 3.1, and Approach 2, Sect. 3.2, help to adhere to the theoretical prerequisites: Approach 1 contributes by reducing errors in the data, and Approach 2 by extending the theory itself, relaxing its preconditions and bringing statistical learning theory closer to the needs of practice. However, building such systems and addressing the related challenges as outlined in Sect. 2.2 requires a range of skills from different fields, predominantly model building and software engineering know-how. Approach 3, Sect. 3.3, and Approach 4, Sect. 3.4, contribute to model building: Approach 3 by creatively adopting novel hybrid machine learning model architectures, and Approach 4 by means of system theory, investigating AI as an addendum to a basic model in order to establish a notion of interpretability in a strict mathematical sense. Every model applied in practice must be coded in software. Approach 5, Sect. 3.5, outlines helpful state-of-the-art approaches in software engineering for keeping the engineered software in traceable and reusable quality, which becomes more and more important with increasing complexity. Approach 6, Sect. 3.6, is an integrative approach that takes all the aspects discussed so far into account by proposing a software framework that supports the developer in all these steps when optimizing an AI system for an embedded platform. Finally, the challenge of human-centered AI as outlined in Sect. 2.3 is somewhat beyond the current state of the art. While Key Challenges 1 and 2 require, above all, progress in the respective disciplines, Key Challenge 3, addressing “trust”, will in the end require a mathematical theory of trust, that is, a trust modeling approach at the level of system engineering that also takes the psychological and cognitive aspects of human trust into account. Approach 7, Sect. 3.7, contributes to this endeavour through its conceptual approach to human AI teaming and its analysis of the prerequisites from relational machine learning.