1 Introduction

With the data science boom, data analytics-driven solutions are in increasing demand. From government service delivery to commercial transactions and specialized decision support systems, domain users are looking to integrate "big data" and advanced analytics into their business operations in order to become more analytics-driven in their decision making, also known as predictive decision making [1]. Machine Learning (ML) research has been expanding rapidly to meet these expectations. Furthermore, studies have found that human–machine collaboration can outperform either humans or machines working alone [2], which encourages such collaboration. However, ML technologies currently face prolonged challenges with user acceptance of delivered solutions, as well as system misuse, disuse, or even failure [3,4,5].

On the other hand, state-of-the-art human–machine systems aim to incorporate the contributions of all stakeholders, including end users, from early design and content development through to deployment, to help ensure that the result meets user needs [6, 7]; this is also known as participatory design [8]. However, current ML research focuses mostly on algorithm development and pays little attention to end users, limiting the impact of ML on real-world applications [9]. For example, to many users who have little knowledge of ML technologies (referred to as non-ML users), an ML-based predictive decision making system is like a "black box": they simply provide their source data and (after selecting some menu options on screen) colorful viewgraphs and/or recommendations are displayed as output [5, 10]. It is neither clear nor well understood how trustworthy this output is, or how uncertainties are handled by the underlying algorithmic procedures. As the ultimate frontline users of ML-based intelligent systems, humans are the key stakeholders, and human factors are essential in extracting and delivering more sensible and effective insights from ML technologies [11]. Human–machine trust [12] is considered the core of all these human-relevant aspects.

One of the most widely cited definitions of trust, from Lee and See [13], describes it as "the attitude that an agent will help achieve an individual's goals in a situation characterized by uncertainty and vulnerability". This definition shows that uncertainty is tightly coupled to trust. In machine learning, inputs to ML models are often historical records or samples of events, and are usually not precise descriptions of those events. ML models are also imperfect abstractions of reality. Therefore, imprecision and uncertainty are unavoidably associated with ML outputs, and hence with the decisions based on them. In human–machine interactions, uncertainty often hinders sense-making and task performance: on the machine side, uncertainty builds up within the system itself; on the human side, these uncertainties often result in a "lack of knowledge for trust" or in "over-trust". A user who completely ignores uncertainties and places complete faith in an ML-based system may be risking too much. On the other hand, perceiving an ML-based system as highly uncertain could dismiss its considerable potential. Adobor [14] showed that a certain amount of uncertainty is necessary for trust to emerge, but that beyond a threshold, an increase in uncertainty can lead to a reduction in trust. This suggests that there may be correlations between uncertainty and trust.

Moreover, Parasuraman et al. [15] showed that human cognition constructs such as Cognitive Load (CL) are often invoked in considerations of function allocation and the design of automated systems. The construct of cognitive load is based on models of human working memory, which hold that humans have limited capacity to process information. Cognitive load is a variable that attempts to quantify the extent of the demands a task imposes on working memory [16]. For example, in modern complex high-risk domains, users often need to make decisions in limited time, and therefore frequently make decisions under high cognitive load in addition to facing trust issues. It has been found that higher cognitive load worsens the situation with respect to trust building [17]. Biros et al. [18] showed that humans become overly dependent on automated systems when they experience high cognitive load. The effectiveness of human–machine collaboration, including remote collaboration, is also affected by cognitive load [19]. However, it is still not clear how trust varies under both high cognitive load and various uncertainty conditions.

Recent studies have also shown that individual differences in personality traits contribute to differences in trust. For example, a probability model was proposed to examine the effect of personality traits on trust in automation [20]. Since predictive decision making involves considerable human cognitive effort, understanding the effects of different personality traits on user trust will help in designing more effective personalized intelligent user interfaces for human–machine collaboration. However, little work has been done on the effects of personality traits on trust in predictive decision making, especially under uncertainty and cognitive load conditions.

Since the development of trust is affected by an interplay of characteristics of the human, the machine, and the operational environment [21], Hancock et al. [22] and Schaefer et al. [23] proposed a conceptual organization of trust influences highlighting crucial factors in trust development. This paper adapts this conceptual organization to the predictive decision making scenario and examines the effects of human, machine, and environment factors on trust in human–machine collaborations. We specifically focus on personality traits and the uncertainty of ML models as the key human and machine factors respectively. Furthermore, cognitive load, introduced through a secondary cognitive task in our predictive decision making scenario, is used as an environment factor, allowing us to examine how these three factors affect trust in predictive decision making. The Big Five personality model is used to identify users' personality traits. Two uncertainty types, risk and ambiguity, are presented with the predictive model results in a decision making scenario. This follows the user-centred method and approach to designing a cognitive system (as reviewed by [24]), using a simulation of water pipe failure prediction as a case study. The study shows that different personality traits affect user trust in predictive decision making differently under different uncertainty presentations and cognitive load levels. A framework of a user trust feedback loop is proposed to incorporate the study results into human–machine collaborations. The investigation results can be used to support the participatory design of personalized user interfaces for human–machine collaboration.

2 Related work

2.1 Human factors and machine learning

Human factors are indispensable components of data science solutions. Scantamburlo [25] outlined potential risks underlying the ML process and suggested that ML methods require an in-depth analysis of the human factors involved throughout the implementation of a system. Watanabe [26] argued that judging ML results as "right" or "wrong" is an activity that comes after apprehension and requires distinctly human intervention [25]. On the other hand, Wagstaff [9] addressed the problem of making ML acceptable by presenting a three-stage model of ML research. She argued that current ML research mostly focuses on the second stage, developing new ML algorithms, but neglects persuading users to adopt ML techniques, which would ultimately improve the impact of these techniques. Fiebrink et al. [27] investigated the design of end-user interfaces for ML in real-time application domains such as music composition and performance. They built a software tool (named Wekinator) that allows end users to apply supervised learning to create custom, interactive, real-time systems. This type of approach, in which a human user steers model behaviours through iterative and strategic changes to the training data, is called "interactive machine learning" [28].

However, such interactive ML does not consider user cognitive responses such as trust. This paper aims to set up a feedback loop by incorporating user trust into the user interface. Such a feedback loop allows the contributions of users to be considered throughout the pipeline, from ML model development to the deployment of ML results.

2.2 Uncertainty and trust

Research on human–machine trust and similar cognitive engineering constructs has a rich history [15]. Winkler [29] demonstrated the importance of communicating uncertainties in predictions, arguing that the consideration of uncertainty is necessary for making rational decisions. It was also found that presenting automation uncertainty information helped an automation system receive higher trust ratings and increased acceptance of the system [30]; such a display might improve the acceptance of fallible systems and further enhance human–automation cooperation. However, it remains unclear whether different types of uncertainty (e.g. risk and ambiguity) affect trust building, and if so, how. Here, risk refers to situations with a known distribution of possible outcomes, while ambiguity refers to situations where outcomes have unknown probabilities. The two forms of uncertainty are supported by distinct neural mechanisms [31].

LeClerc and Joslyn [32] demonstrated that adding a probabilistic uncertainty estimate to public weather forecasts improved both decision quality and compliance. Kantowitz et al. [33] investigated the effect of different levels of uncertainty on user trust in a driving environment, showing that information with no or low uncertainty yielded better driver performance and subjective opinion than information with high uncertainty. Uggirala et al. [34] showed that trust relates to competence and is inversely related to uncertainty, meaning that an increase in uncertainty decreases trust in systems. Helldin et al. [35] found that presenting the uncertainty of a car's ability to drive autonomously decreased trust in the autonomous system compared to the situation without uncertainty presentation, indicating a more proper trust calibration in automation. Kunze et al. [36] found that uncertainty communication in autonomous systems helps operators calibrate their trust and gain situation awareness prior to critical situations, resulting in safer takeovers. Trust calibration refers to the agreement between the user's trust in the automation and the automation's actual capabilities.

However, little work has been done on the effects of uncertainty, especially different types of uncertainty on user trust in predictive decision making.

2.3 Personality traits and trust

The Big Five personality model is one of the most widely used taxonomies of personality traits [37] and is considered a comprehensive way of measuring a person's personality [37, 38]. The five primary factors of personality are Extroversion, Agreeableness, Conscientiousness, Neuroticism, and Openness to experience. They hold across cultures and languages, suggesting that they capture a human universal [39]. These Big Five personality traits are therefore also expected to shape the propensity to trust. Freitag and Bauer [40] investigated the impact of personality traits on trust in strangers and friends, showing that Conscientiousness and Openness are important traits for the development of trust in both friends and strangers, while Agreeableness is related to trust in strangers. Fahr and Irlenbusch [41] found that personality traits based on Cattell's 16-PF-R questionnaire [42] were linked to observed behavior in a trust game, showing that individuals with low scores in anxiety (close to Neuroticism in the Big Five model) were particularly suited to enhancing trust between organizations. Cho et al. [20] proposed a probability model to examine the effect of personality traits on trust and found that Agreeableness and Neuroticism have significant effects. It was also found that individuals high in Agreeableness or Conscientiousness had higher trust in automation [43]. Levine et al. [44] explored personality traits and demonstrated that people high in guilt-proneness are more likely to be trustworthy. However, little work has been done on how personality traits affect user trust in predictive decision making under variations of uncertainty and cognitive load.

Using a case study of predictive decision making for water pipe failure budget planning, this paper investigates user trust changes under variations of both uncertainty type and cognitive load level, with a focus on the effects of personality traits on those trust changes.

3 Hypotheses

Risk uncertainty has known probabilities. With the help of these known probabilities, people can be expected to make better, well-informed decisions quickly. Ambiguity uncertainty has unclear or unknown probabilities (risk and ambiguity uncertainty are discussed in more detail in Sect. 5). Therefore, we pose the following hypotheses:

  • H1 Ambiguity uncertainty will lead to an increase in trust only under low cognitive load conditions, where people have sufficient cognitive capacity to process information;

  • H2 Presentation of ambiguity uncertainty under high load conditions will lead to a decrease in trust.

Predictive decision making requires careful reasoning, especially under uncertainty. With this in mind, we pose the following hypotheses from the perspective of uncertainty, drilling down into users' personality trait groups:

  • H3 Agreeable individuals are cooperative, helpful, nurturing [45], and are more likely to trust in uncertain situations [46]. We expect a positive effect of high Agreeableness on trust in predictive decision making when uncertainty is difficult to distinguish (ambiguity);

  • H4 Neurotic people are highly anxious, insecure, and sensitive. We expect that people with high Neuroticism will have high trust when uncertainty probabilities are known (risk uncertainty). People with low Neuroticism will tend to have high trust under ambiguity uncertainty;

  • H5 Extravert individuals show tendency to be outgoing, amicable, assertive, energetic, and friendly [45]. Extroversion is positively related to better managing phishing emails and highly extroverted people are less likely to be phished (less trust in phishing) [47]. Similarly, we assume people with high Extroversion will tend to have high trust under ambiguity uncertainty, while people with low Extroversion (i.e. Introversion) will tend to have high trust under risk uncertainty;

  • H6 Conscientious individuals are rational, responsible, informed, organized, and persevering [45, 48]. People with high Conscientiousness will tend to have high trust in predictive decision making regardless of uncertainty types;

  • H7 People with high Openness are curious and seek new experiences [48], while people with low Openness are often much more cautious and conservative and may struggle with abstract thinking [49]. Therefore, we expect a positive effect of high Openness on trust when uncertainty is hard to distinguish (ambiguity). However, when uncertainty has known probabilities (risk), people with low Openness are cautious about risk and will tend to have high trust in predictive decision making.

From the perspective of cognitive load, we pose the following hypotheses:

  • H8 Different personality traits will affect trust across cognitive load levels differently. When people have enough cognitive resources to process information (low cognitive load), people with low Agreeableness, low Neuroticism, high Extraversion, high Conscientiousness, and high Openness will show higher trust under ambiguity uncertainty;

  • H9 When people do not have enough cognitive resources to process information (high cognitive load), people with high Neuroticism and low Extraversion will show higher trust when no uncertainty information is presented.

4 Method

This section presents a framework of a user trust feedback loop in predictive decision making in order to demonstrate the relations among uncertainty, cognitive load, personality traits, and trust in predictive decision making. A case study of water pipe failure prediction is introduced as the basis of a user experiment.

4.1 Framework of user trust feedback loop in predictive decision making

We present a framework of a user trust feedback loop in predictive decision making (see Fig. 1). In this framework, when an ML-based predictive decision making task with certain decision factors such as uncertainty is exposed to users, user responses (e.g. physiological and behavioral responses and subjective ratings) during decision making are recorded. The responses recorded during task time are then used to build user trust models in order to predict/classify user trust. This trust information is fed into a user trust adaptation model, where the parameters of the ML models for predictive decision making and their presentation are modulated based on the current user trust level. A new task session based on the modulated models and presentations is then conducted to enhance user trust in predictive decision making tasks. Human characteristics such as personality traits, together with uncertainty and cognitive load, play significant roles in trust in this framework. This paper specifically focuses on investigating the effects of personality traits on user trust under uncertainty and cognitive load (see Fig. 1).

Fig. 1

Framework of user trust feedback loop in predictive decision making in human–machine collaborations

4.2 Case study

This research used water pipe failure prediction as a case study for predictive decision making (replicated in a lab environment). Water supply networks constitute one of the most crucial and valuable urban assets. The combination of growing populations and aging pipe networks requires water utilities to develop advanced risk management strategies in order to maintain their distribution systems in a financially viable way [5]. Pipes are characterized by different attributes, referred to as features, such as laid year, material, diameter size, etc. Given historical pipe failure data, the future water pipe failure rate can be predicted with respect to the inspected length of the water pipe network [5]. Such models are used by utility companies for budget planning and pipe maintenance. However, different models under various uncertainty conditions may yield different possible budget plans, especially for users with different personality traits. The experiment was therefore set up to determine how personality traits, uncertainty conditions, and cognitive load levels influence user trust during the decision process.

5 Experiment

5.1 Experimental data

In this study, predictive models were simulated based on different pipe features (e.g. size or laid year), with reference to the Hierarchical Beta Process (HBP) and Weibull models used in water pipe failure prediction [5]. A model performance curve was presented to let participants evaluate different models. Model performance is the functional relationship between the inspected length of the network and the percentage of failures detected by the model. Figure 2 shows the performance of two sample models, where the "blue model" outperforms the "red model" because the former detects more failures than the latter for a given pipe length (horizontal axis). This study assumes that an ML model is a "black box" and that users directly obtain the model performance based on their input data. This is consistent with real-world ML applications, where users do not learn how ML algorithms process the data to produce predictions.

Fig. 2

Performance curves of ML models without uncertainty (control task)

ML models are usually imperfect abstractions of reality. As a result, imprecision can occur in the prediction due to model uncertainty. Model uncertainty here refers to an interval within which the true value of a measured quantity would lie. For example, in Fig. 3a, when inspecting 20% of the pipes by length, the uncertainty interval of the failure rate is [46%, 60%] for the blue model and about [15%, 25%] for the red model; the red model is said to have less uncertainty than the blue model because it has the smaller uncertainty interval.

Fig. 3

Predictive models with uncertainty: a non-overlapping models (risk uncertainty), and b overlapping models (ambiguity uncertainty)

Model uncertainty usually spans a band in the model performance diagram, as shown in Fig. 3. When model uncertainty is considered, the relationship between two models falls into one of two cases (see Fig. 3): non-overlapping models (Fig. 3a), where the uncertainty associated with the models is referred to as risk uncertainty, and overlapping models (Fig. 3b), where it is referred to as ambiguity uncertainty. In Fig. 3b, the interval of the model with lower uncertainty is subsumed in the interval of the model with higher uncertainty, whereas in Fig. 3a the two bands are disjoint. The control task presented only point prediction lines (see Fig. 2) with no uncertainty. Risk uncertainty was presented by models with non-overlapping uncertainty bands (Fig. 3a) and ambiguity by overlapping ones (Fig. 3b).
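The distinction between the two uncertainty conditions can be expressed programmatically. The following minimal sketch (in Python; the function name and the second pair of interval values are illustrative assumptions, not part of the study software) classifies a pair of uncertainty bands at a fixed inspected length as risk (disjoint) or ambiguity (overlapping):

```python
def classify_uncertainty(band_a, band_b):
    """Classify the relation between two models' uncertainty intervals
    at a fixed inspected length. Each band is a (low, high) tuple of
    failure-detection rates. Disjoint bands correspond to the risk
    condition; overlapping bands to the ambiguity condition."""
    a_low, a_high = band_a
    b_low, b_high = band_b
    if a_high < b_low or b_high < a_low:
        return "risk"
    return "ambiguity"

# Intervals from the Fig. 3a example at 20% inspected length:
print(classify_uncertainty((0.46, 0.60), (0.15, 0.25)))  # risk
# A hypothetical subsumed interval, as in Fig. 3b:
print(classify_uncertainty((0.30, 0.60), (0.35, 0.45)))  # ambiguity
```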

5.2 Task design

According to the water pipe failure prediction framework, we investigated the decisions made by users under various conditions. Each user was asked to make a budget plan, i.e. a budget in terms of the pipe length to be inspected, using the failure prediction models learned from the historical pipe failure records. Two ML models were provided for each estimation task. Participants made decisions by selecting one of the two presented ML models and then making a budget estimate based on the selected model. The budget estimate needed to meet the following requirements:

  • To inspect as short length of pipes as possible (low cost);

  • To be as precise in budget estimate as possible (higher accuracy would reflect greater confidence in estimation).

In this study, a module named the Automatic Predictive Assistant (APA) was introduced to participants as a new module in its 'under testing' phase. The APA is a simulated module which reads the information provided by the ML models and then recommends a typical decision to the participant. Participants could choose to trust, modify, or totally ignore the APA's recommendations. Each participant needed to evaluate whether he/she trusted the estimation recommended by the APA; if not (modifying or ignoring it), he/she was asked to provide his/her own estimation. Figure 4 shows a screenshot of a task performed in the study.

Fig. 4

Screenshot of a task performed in the study

In summary, each task is divided into the following steps:

  (1) The participant is first asked to study the ML performance diagram (in the middle of the screenshot in Fig. 4) and answer questions on model performance and uncertainty to validate his/her understanding of the information presented.

  (2) Next, the APA recommendations (at the right side of the screenshot in Fig. 4) are displayed, and again the user's understanding is validated with questions.

  (3) Finally, the participant is required to estimate the budget. If he/she does not trust the recommendations from the APA, he/she is required to provide his/her own estimation (at the left side of the screenshot in Fig. 4) based on the ML performance. Subjective trust ratings are obtained immediately after this step.

Participants were encouraged to reach the best budget estimates they could as quickly as possible. Cognitive load was introduced by asking participants to remember a random digit sequence for the duration of the task and to recite it afterwards. This dual-task load-inducing technique is widely used in decision making scenarios [50]. The cognitive load level was determined by the number of digits to be remembered. Four cognitive load levels were applied in this study, denoted CL1, CL2, CL3, and CL4 from low to high, corresponding to three, five, seven, and nine random digits respectively. The number of digits for each level was based on a series of pilot experiments in the study.
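The digit-span secondary task described above can be sketched as follows. The function names and structure are illustrative assumptions rather than the study's actual experiment code; only the digit counts per level come from the paper:

```python
import random

# Digit counts per cognitive load level, as reported in the study.
DIGITS_PER_LEVEL = {"CL1": 3, "CL2": 5, "CL3": 7, "CL4": 9}

def make_digit_span(level, rng=random):
    """Generate the random digit sequence that a participant must hold
    in working memory for the duration of one task at the given level."""
    return "".join(str(rng.randint(0, 9))
                   for _ in range(DIGITS_PER_LEVEL[level]))

def recall_correct(shown, recited):
    """Score the participant's recitation after the task."""
    return shown == recited

seq = make_digit_span("CL4")
print(len(seq))  # 9 digits under the highest load level
```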

There were three different uncertainty visualizations: no uncertainty (control), non-overlapping uncertainty (risk), and overlapping uncertainty (ambiguity). Each condition was performed under four different cognitive load levels, and each task was performed for three rounds. Altogether, 36 estimation tasks (3 uncertainty conditions × 4 cognitive load levels × 3 rounds) were conducted by each subject. Three additional training tasks were also conducted by each subject before the formal tasks. The order of tasks was randomized during the experiment to avoid any ordering bias.
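The resulting 3 × 4 × 3 design can be sketched as a per-subject schedule generator; the names and data structure are again illustrative assumptions, not the study's software:

```python
import itertools
import random

UNCERTAINTY = ["control", "risk", "ambiguity"]  # 3 uncertainty conditions
LOAD_LEVELS = ["CL1", "CL2", "CL3", "CL4"]      # 4 cognitive load levels
ROUNDS = 3                                      # 3 rounds per combination

def build_schedule(seed=None):
    """Enumerate the 36 formal estimation tasks and shuffle their order
    for one subject, mirroring the per-subject randomization used to
    avoid ordering bias."""
    tasks = [
        {"uncertainty": u, "load": cl, "round": r}
        for u, cl, r in itertools.product(UNCERTAINTY, LOAD_LEVELS,
                                          range(1, ROUNDS + 1))
    ]
    random.Random(seed).shuffle(tasks)
    return tasks

schedule = build_schedule(seed=42)
print(len(schedule))  # 36
```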

5.3 Participants and apparatus

Forty-two (42) participants (10 female) were recruited, with a mean age of 30.4 ± 8.5 years. All were asked to make predictive decisions (using historical data visualized on screen) about the optimal length of pipe to be inspected (and thus the budget estimate) in order to minimize water pipe failures. Participants were paid AUD10, AUD5, or a chocolate bar depending on their decision performance. Information was presented on a 21-inch Dell widescreen monitor.

5.4 Data collection

After each decision making task, participants were asked to rate their trust in the recommendations on a 9-point Likert scale (1 = least trust, 9 = most trust). Subjective cognitive load ratings for each task were also collected on a 9-point Likert scale (1 = least mental effort, 9 = most mental effort) for load validation purposes. In addition, the personality traits of each participant were collected before the tasks using the Ten-Item Personality Inventory (TIPI) [37]. Physiological signals such as galvanic skin response (GSR) and behavioral information such as mouse movements during task time were also collected.

6 Analysis of subjective ratings

In this study, a two-way ANOVA was conducted to examine the effects of uncertainty and cognitive load on trust. No statistically significant interaction between uncertainty and cognitive load was found. Therefore, in this section we analyze the main effects of uncertainty and cognitive load on trust separately.

We aim to understand: (1) the effects of uncertainty on user trust under a given cognitive load level, and (2) the effects of cognitive load on user trust under a given uncertainty condition. For each aim, we first performed a Friedman test and then followed it up with post hoc analysis using Wilcoxon signed-rank tests (with a Bonferroni correction) to analyze differences in participants' trust responses under a fixed condition (e.g. trust changes across uncertainty types under fixed CL1). In this analysis, we are only interested in the extreme load levels administered, namely CL1 (the lowest) and CL4 (the highest), as they are the most relevant for automated cognitive load management [16]. Trust values were normalized per subject to minimize individual differences in rating behavior (see Eq. 1):

$$ T_{i}^{N} = \frac{{T_{i} - T_{i}^{ \text{min} } }}{{T_{i}^{ \text{max} } - T_{i}^{ \text{min} } }} $$
(1)

where \( T_{i} \) and \( T_{i}^{N} \) are the original trust rating and the normalized trust rating respectively from user i, and \( T_{i}^{ \text{min} } \) and \( T_{i}^{ \text{max} } \) are the minimum and maximum trust ratings respectively from user i over all of his/her tasks.
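Eq. 1 amounts to per-subject min–max normalization, which might be implemented as follows (a minimal sketch; the guard for a subject who never varies his/her rating is our assumption, not specified in the paper):

```python
def normalize_trust(ratings):
    """Per-subject min-max normalization of trust ratings (Eq. 1).

    `ratings` holds one subject's raw 9-point trust ratings over all of
    his/her tasks; each rating is mapped into [0, 1] relative to that
    subject's own minimum and maximum."""
    t_min, t_max = min(ratings), max(ratings)
    if t_max == t_min:  # assumed handling: subject never varied
        return [0.0 for _ in ratings]
    return [(t - t_min) / (t_max - t_min) for t in ratings]

# A subject's ratings are mapped onto a common [0, 1] scale:
print(normalize_trust([3, 5, 9, 3, 7]))  # [0.0, 0.333..., 1.0, 0.0, 0.666...]
```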

6.1 Trust and uncertainty

Figure 5 shows normalized trust values over the uncertainty treatments under each fixed CL level. For each given uncertainty condition, we analyzed trust differences among the CL levels. Statistically significant differences in trust among CL levels were found only under ambiguity uncertainty, not under the control or risk conditions.

Fig. 5

Trust over uncertainty presented: Control (No Uncertainty), Risk (Non-Overlapping Uncertainty) and Ambiguity (Overlapping Uncertainty)

When participants experienced ambiguity uncertainty (the rightmost group of columns in Fig. 5), a Friedman test across CL conditions showed a statistically significant difference in trust among the four CL levels, \( {{\chi }}^{2} \left( 3 \right) = 14.455 \), p = .002. Post hoc Wilcoxon tests (with a Bonferroni correction, significance level α < .013) were then applied to find pair-wise differences in trust between levels. The adjusted alpha level of .013 was calculated by dividing the original alpha of .05 by 4, as there were four load level conditions. The post hoc tests found that under the ambiguity condition, participants had significantly lower trust under high cognitive load (CL4) than under low load (CL1), Z = 822.0, p < .001, which confirms our hypothesis (H2).
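The testing procedure used here and in the following analyses (an omnibus Friedman test followed by Bonferroni-corrected pair-wise Wilcoxon signed-rank tests) can be sketched with SciPy. The data layout (subjects × conditions) and the function name are our assumptions; the alpha adjustment mirrors the .05 / 4 ≈ .013 correction described above:

```python
import numpy as np
from scipy import stats

def friedman_with_posthoc(ratings, labels, alpha=0.05):
    """Friedman test across related conditions, followed by pair-wise
    Wilcoxon signed-rank tests with a Bonferroni-corrected alpha.

    `ratings` is an (n_subjects x n_conditions) array of normalized
    trust ratings; `labels` names the conditions (e.g. CL1..CL4).
    The correction divides alpha by the number of conditions, matching
    the .05 / 4 adjustment used in the paper."""
    chi2, p = stats.friedmanchisquare(*ratings.T)
    n_cond = ratings.shape[1]
    alpha_adj = alpha / n_cond
    pairwise = {}
    if p < alpha:  # only drill down when the omnibus test is significant
        for i in range(n_cond):
            for j in range(i + 1, n_cond):
                _, p_ij = stats.wilcoxon(ratings[:, i], ratings[:, j])
                pairwise[(labels[i], labels[j])] = (p_ij, p_ij < alpha_adj)
    return chi2, p, alpha_adj, pairwise
```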

6.2 Trust and cognitive load

Figure 6 shows normalized trust values over cognitive load levels. For each given CL level, we analyzed trust differences among the uncertainty conditions. Friedman tests at the lowest (CL1) and highest (CL4) load levels both showed statistically significant differences in trust among the three uncertainty conditions, \( {{\chi }}^{2} \left( 2 \right) = 10.492 \), p = .005 and \( {{\chi }}^{2} \left( 2 \right) = 5.972 \), p = .05 respectively. Post hoc Wilcoxon tests (with a Bonferroni correction, significance level p < .017) were then applied to find pair-wise differences between uncertainty conditions. The adjusted alpha level of .017 was calculated by dividing the original alpha of .05 by 3, as there were three uncertainty conditions to test.

Fig. 6

Trust over cognitive load levels

The post hoc tests found that under the low load condition (CL1), participants had significantly higher trust in decisions under ambiguity (Z = 736.5, p = .002) than under the risk condition (Fig. 6, leftmost group of three columns). In contrast, under the high cognitive load condition (CL4), participants showed significantly lower trust in decisions under risk uncertainty (Z = 1177.5, p = .003) than without uncertainty information (control condition). These findings support the idea that ambiguity uncertainty can be readily processed by users only under low cognitive load conditions, where it leads to an increase in trust in predictive decision making, as we expected (H1).

7 Personality traits and trust

7.1 Personality traits

The five personality trait scores of each participant were derived from the collected TIPI values [37]. Figure 7 shows the distribution of participants across the five personality traits in the study.

Fig. 7

Distribution of participants with different personality traits

In this study, a two-way ANOVA was conducted to examine the effects of uncertainty and cognitive load on trust under each personality trait condition. No statistically significant interactions between uncertainty and cognitive load were found under any personality trait condition except low Agreeableness (p = .039). We therefore analyze the main effects of uncertainty and cognitive load on trust under each personality trait condition in this section.

Similar to the analysis in Sect. 6, this section analyzes: (1) the effects of uncertainty on user trust at a given cognitive load level, and (2) the effects of cognitive load on user trust under a given uncertainty condition, but separately for different personality traits of participants. Friedman tests followed by post hoc Wilcoxon signed-rank tests (with a Bonferroni correction, as before) are used to understand how personality traits affect trust perception under uncertainty and cognitive load conditions.

7.2 Trust and uncertainty

Table 1 summarizes the statistical analysis of trust variations over uncertainty under different personality traits (“NO” means no significant differences were found). For each personality trait, we analyze trust over uncertainty under both “High” and “Low” trait levels with a Friedman test followed by post hoc Wilcoxon signed-rank tests (with a Bonferroni correction under a significance level set at α < .013). Table 1 shows that different personality traits affected trust over uncertainty differently. For example, under ambiguity uncertainty, participants with low Neuroticism, high Conscientiousness, high Openness, and low Agreeableness showed higher trust under low cognitive load than under high cognitive load (see Fig. 8 for an example), which confirms hypotheses H4, H6, and H7 respectively, but not H3: low Agreeableness, rather than high Agreeableness, resulted in high trust under ambiguity uncertainty. This may be because predictive decision making requires more careful reasoning work, and low Agreeableness helped to boost trust. However, despite the significant differences in trust for participants with low Extraversion, the post hoc tests did not find any significant pairwise differences in trust over cognitive load levels (p > .013), perhaps because of the relatively small number of subjects. Furthermore, under risk uncertainty, participants with low Extraversion showed significantly higher trust under low cognitive load than under high cognitive load, as we expected (H5). However, despite significant differences in trust for participants with high Neuroticism, high Conscientiousness, and low Openness under risk uncertainty, the post hoc tests did not find any significant pairwise differences in trust over cognitive load levels (again perhaps because of the relatively small number of subjects in this study).
A further check found that the mean trust ratings under risk uncertainty were still higher under low load than under high load for high Neuroticism, high Conscientiousness, and low Openness, which still supports hypotheses H4, H6, and H7 respectively.

Table 1 Trust over uncertainty with different personality traits
Fig. 8

Trust over uncertainty presented (low Agreeableness)

The results suggest that low Agreeableness, low Neuroticism, high Conscientiousness, and high Openness affected trust significantly under ambiguity uncertainty; high Neuroticism, high Conscientiousness, low Extraversion, and low Openness affected trust significantly under risk uncertainty.

7.3 Trust and cognitive load

Table 2 summarizes the statistical analysis of trust variations over cognitive load with different personality traits. For each personality trait, we analyze trust over cognitive load under both “High” and “Low” trait levels with a Friedman test followed by post hoc Wilcoxon signed-rank tests (with a Bonferroni correction under a significance level set at α < .017). Table 2 shows that different personality traits affected trust over cognitive load differently. For example, under low cognitive load (CL1), participants with low Agreeableness and low Neuroticism showed significantly higher trust under ambiguity uncertainty than under both the control and risk conditions (see Fig. 9 for an example), while participants with high Extraversion, high Conscientiousness, and high Openness showed significantly higher trust under ambiguity uncertainty than under risk uncertainty. Furthermore, under high cognitive load (CL4), participants with high Neuroticism and low Extraversion showed significantly higher trust under the control condition than under ambiguity uncertainty.

Table 2 Trust over cognitive load with personality traits
Fig. 9

Trust over cognitive load levels with low Neuroticism

These results suggest that under low cognitive load (CL1), low Agreeableness, low Neuroticism, high Extraversion, high Conscientiousness, and high Openness enhance trust under ambiguity uncertainty, as we expected (H8). Under high cognitive load, high Neuroticism and low Extraversion promote trust when no uncertainty is presented, as we expected (H9).

8 Discussion

Intelligent decision support aids based on predictive decision making are becoming widely available. However, because of the inherent uncertainties in machine learning, user trust plays a significant role in the effectiveness of machine learning in real-world applications. This study investigated the effects of personality traits on user trust in predictive decision making, particularly under variations in uncertainty and cognitive load. Such an investigation supports effective monitoring of user trust in predictive decision making and thereby improves communication between humans and machine learning systems. The results have significant implications for participatory design in predictive decision making.

Looking at the overall trust ratings across personality traits (see Fig. 10), participants with low Openness showed the highest trust in predictive decision making, followed by those with low Conscientiousness, low Extraversion, and high Neuroticism. People with low Openness are often much more traditional and may struggle with abstract thinking [49], yet they showed the highest trust here. By contrast, it has been argued that people with high Openness can be expected to have a high level of trust due to their tolerant and open-minded nature [51]. This may be because trust between people rests mainly on faith in others, where an open-minded nature helps boost trust, whereas trust in predictive decision making requires much more careful reasoning work, which low Openness could benefit. The following sections drill down further to show the effects of personality traits on trust over uncertainty and cognitive load respectively.

Fig. 10

Overall trust ratings across different personality traits

8.1 Uncertainty on trust

Generally, in a predictive decision making scenario, humans must make future-oriented decisions based on information or recommendations presented on screen by an ML model that mostly works on data behind the scenes (appearing to the user like a black box). Since these decisions concern the future, there are no absolutely correct answers, only better and more appropriate ones based on a more precise understanding of the underlying data presented during the decision making process. Therefore, better presentation and adequate communication of the uncertainty inherent in the underlying ML process can improve the user's trust in the system and lead to better, more effective decisions. In our case, we experimented with visualizing and communicating two forms of uncertainty: risk and ambiguity. Risk is a form of uncertainty in which all probabilities related to outcomes are known; with the help of these known probabilities, the user can be expected to make better and well-informed decisions quickly. Such risk-type uncertainty was represented by non-overlapping models (see Fig. 3a). The other type of uncertainty we experimented with was ambiguity, represented by overlapping models (see Fig. 3b), where the probabilities of outcomes were either unknown or not clearly stated. The visuals presented in the control condition are straightforward and uncomplicated, but this simplicity comes at the cost of hiding the uncertainty inherent in the ML models. Once attempts are made to communicate the uncertainty, trust seems to increase from the control to the ambiguity condition only under low cognitive load, and to decrease under high cognitive load (see Fig. 5).

When personality traits are considered, it was found that low Agreeableness, low Neuroticism, high Conscientiousness, and high Openness affected trust significantly under ambiguity uncertainty, while high Neuroticism, low Extraversion, high Conscientiousness, and low Openness affected trust significantly under risk uncertainty (see Fig. 11 for an example of low Extraversion resulting in high trust under risk uncertainty). This phenomenon seems to run counter to claims about trust between people that high Extraversion results in higher trust because of the desire of people with this trait for social interaction and communication [40]. This may be because predictive decision making requires careful, abstract reasoning rather than social interaction.

Fig. 11

Trust ratings across different personality traits under risk uncertainty

An important lesson here for improving trust is to assign decision making tasks to users according to both their personality traits and the uncertainty conditions. The results of this study provide guidelines for such task assignment.

8.2 Cognitive load on trust

It is well known that human performance can be significantly affected by high cognitive load [16]. Cognitive load is the load imposed on working memory when a user engages in a cognitive problem. In our case, trust in predictive decision making is influenced by a cognitive phase in which the user tries to make sense of the model information presented. Since the decision making task was softly time-bound, the user had to make efficient use of available cognitive resources to complete the task. In this study, we looked at the extreme conditions in which the most cognitive resources were expected to be available (CL1) and the fewest (CL4). It was found that ambiguity uncertainty can be readily processed by users only under low cognitive load (see Sect. 6.2). Under low cognitive load (implying greater availability of cognitive resources), users felt more confident in analyzing and interpreting the ambiguity uncertainty and therefore appeared to trust the judgement/recommendation of the predictive assistant, as it made more sense to them. Under high cognitive load, however, users might find themselves at the edge of their working memory capacity; limited cognitive resources would result in less understanding of the ambiguity uncertainty, which in turn is reflected in reduced trust in the system. This phenomenon is in line with findings that the better a person understands a system and its workings, the more willing the person is to trust it [52].

When we drilled down further into personality trait groups, it was found that low Neuroticism, high Extraversion, and high Openness promoted higher trust, in line with the arguments in [40], but only under low cognitive load (CL1) with the ambiguity uncertainty condition in predictive decision making (see Sect. 7.2). Contrary to the arguments in [40], low Agreeableness and high Conscientiousness also promoted higher trust under low cognitive load with ambiguity uncertainty. This may be because trust in predictive decision making requires more accurate reasoning than trust between people. Furthermore, our study found that under high cognitive load (CL4), high Neuroticism and low Extraversion promoted higher trust when no uncertainty was presented.

These findings suggest that personality traits affect trust in predictive decision making differently at different cognitive load levels. Therefore, approaches for improving trust need to consider both the cognitive demands of tasks and users' personality traits at the same time; for example, users with high Neuroticism and low Extraversion may be assigned predictive decision making tasks in highly critical situations without uncertainty presentation.

8.3 Implications in participatory design in human–machine collaborations

Overall, uncertainty presentation can lead to increased trust, but only under low cognitive load conditions, when users have sufficient cognitive resources to process the information. Presenting uncertainty under high cognitive load, when cognitive resources are in short supply, can lower trust in the system and its recommendations. Furthermore, different personality traits affect trust differently under different uncertainty and cognitive load conditions. For predictive decision making tasks with different cognitive load requirements and uncertainty conditions, users should therefore be assigned according to their personality traits, and the results of this work provide guidelines for such assignment. For example, under low cognitive load with ambiguity uncertainty, people with low Agreeableness, low Neuroticism, high Extraversion, high Conscientiousness, and high Openness should be assigned predictive decision making tasks, while under low cognitive load with risk uncertainty, people with high Neuroticism, low Extraversion, high Conscientiousness, and low Openness should be assigned such tasks. Under high cognitive load, people with high Neuroticism and low Extraversion should be recruited to conduct predictive decision making without uncertainty presentation.

These findings can be integrated into the framework of the trust feedback loop (see Fig. 1) for participatory design in human–machine collaborations for predictive decision making. According to the findings, users' personality traits first need to be identified in the framework. The effects of personality traits on trust under both uncertainty and cognitive load conditions are then considered in the trust modelling and trust adaptation components of the framework. Such a user trust adaptation loop not only helps improve user acceptance of ML solutions, but also demonstrates a novel participatory design for ML-based solutions through the monitoring of user participation in the overall pipeline.

In order to incorporate these findings into ML-based applications, the user interface for an ML-based intelligent system needs to include the following components:

  • A user personality trait identification module;

  • Components which visualize uncertainty of ML models;

  • Feedback on user trust and cognitive load levels that allows users to be aware of their cognition status in order to adapt decision factors accordingly.

These components are incorporated into the framework of the user trust feedback loop, thereby introducing trust into the predictive decision making process and allowing for efficient and informed decisions in human–machine collaborations. From this perspective, revealing and adapting user trust in ML models helps to make “black box” ML models transparent without directly explaining how ML algorithms process data through visualizations or feature contributions, as other work does [53], where domain users still have difficulty understanding complex visualizations and abstract concepts. Revealing and adapting user trust in a predictive decision making scenario is more meaningful for both ML researchers and domain users, and therefore helps to improve the acceptance of ML solutions.
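The three components can be sketched as a minimal interface; all class and method names below are illustrative, and the load threshold of 0.5 is an arbitrary placeholder, not a value from the study:

```python
# Hypothetical sketch of the three UI components above, wired into a simple
# trust feedback loop. Names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class TraitProfile:
    """Component 1: identified personality traits (e.g. TIPI scores)."""
    extraversion: float
    agreeableness: float
    conscientiousness: float
    neuroticism: float
    openness: float

class TrustFeedbackLoop:
    def __init__(self, profile: TraitProfile):
        self.profile = profile
        self.trust = 0.5           # normalized trust estimate in [0, 1]
        self.cognitive_load = 0.0  # normalized load estimate in [0, 1]

    def show_uncertainty(self) -> bool:
        """Component 2: only visualize ML-model uncertainty when the
        estimated cognitive load leaves resources to process it."""
        return self.cognitive_load < 0.5

    def update(self, trust_rating: float, load_estimate: float) -> None:
        """Component 3: feed measured trust and load back into the loop."""
        self.trust = trust_rating
        self.cognitive_load = load_estimate

loop = TrustFeedbackLoop(TraitProfile(3.5, 4.0, 5.5, 2.0, 6.0))
loop.update(trust_rating=0.8, load_estimate=0.7)
print(loop.show_uncertainty())  # high load suppresses the uncertainty display
```

In a full implementation, the decision in `show_uncertainty` would also condition on the trait profile, following the trait-specific findings summarized above.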

Although this study was based on a collaboration setting in which a human interacted with locally displayed information, the findings could be extended to human–machine remote collaborations, in which a human interacts with information displayed from a remote site. The results of the study can help design user interfaces that provide appropriate information to users with specific personality traits for effective human–machine remote collaborations [54, 55].

However, this study used only one type of visual representation of uncertainty. It is not clear how participants would respond to uncertainty communicated through other visualization methods or numerical representations. Furthermore, only two ML models were compared in the uncertainty presentation; it is also unclear how participants would respond if the uncertainty visualization involved more than two ML models. These are possible directions for future research.

In summary, this study showed that personality traits have significant effects on trust, and that participatory design for ML-based solutions needs to consider users' personality traits in addition to uncertainty and cognitive load conditions. These findings offer at least three benefits for applications:

  • To provide guidelines on participatory design for human–machine collaborations by considering personality traits, uncertainty and cognitive load;

  • To design personalized intelligent user interfaces for ML-based applications. A user interface that shows user trust in predictive decision making in real time would help users make informed decisions effectively;

  • To make ML transparent in ML research by measuring the user trust level based on ML output.

9 Conclusions and future work

This paper investigated the effects of personality traits on trust in human–machine collaborations under uncertainty and cognitive load conditions. A user study found that both the type and the level (high or low) of personality traits affected user trust in predictive decision making. Furthermore, for a given personality trait, user trust perceptions differed across uncertainty types and cognitive load levels. A trust feedback loop framework was proposed to integrate these findings into participatory design for human–machine collaborations. Our findings fill a significant gap in trust in predictive decision making by introducing personality traits into ML-based solutions.

Our future work will focus on investigating users' physiological and behavioral variations over personality traits under both uncertainty and cognitive load conditions in human–machine collaborations. Such an investigation will support personalized user interfaces that dynamically adjust trust levels in human–machine collaborations.