1 Introduction

Inference through knowledge-driven approaches has been researched extensively in the field of Artificial Intelligence. Among such approaches, computational argumentation has recently emerged as a solid theoretical research discipline for defeasible reasoning and inference under uncertainty. Unfortunately, there is a lack of studies examining its impact in real-world settings, with real knowledge characterised by uncertainty, incompleteness and contradictions. In certain settings, such as health care, large amounts of data are not always available, due to the difficulty of gathering them and to privacy issues. Nonetheless, inferences have to be made. Knowledge-driven approaches are likely better suited to such cases than data-driven approaches because they rely upon knowledge-bases elicited from human experts rather than automatically extracted from data. Various quantitative approaches to reasoning under uncertainty exist. One of these is fuzzy reasoning, which allows a robust representation of linguistic information and provides designers with computational tools to describe incomplete, inconsistent or ambiguous knowledge.

In this research, the inferential capacity of computational argumentation is compared against that of non-monotonic fuzzy reasoning. The domain chosen for this comparison is survival prediction using biological markers. Biomarkers can be defined as features of the state or condition of a human which can be objectively measured and assessed as indicators of normal or abnormal biological processes [10]. This domain was chosen because of the availability of a small dataset built over a number of months by a medical doctor, who also provided an extensive knowledge-base. The research question under investigation is: to what extent can computational argumentation enhance the prediction of survival in the elderly using biomarker features when compared to non-monotonic fuzzy reasoning?

The remainder of this paper is organised as follows: Sect. 2 introduces related work on computational argumentation, non-monotonic fuzzy reasoning and biomarkers. The design of a comparative experiment and the methods for the development of argument-based and fuzzy reasoning-based models are detailed in Sect. 3. Section 4 provides the results followed by a discussion while Sect. 5 concludes the study suggesting future avenues of research.

2 Related Work

Many approaches in the field of Artificial Intelligence (AI) have been studied for dealing with quantitative reasoning under uncertainty. Among them, Fuzzy Logic and Argumentation Theory (AT) have already been used for modeling non-monotonic (defeasible) reasoning, a type of reasoning characterised by incomplete, contradicting and uncertain knowledge.

Argumentation Theory (AT) provides computational models for the implementation of defeasible reasoning [14], that is, reasoning in which a conclusion can be retracted in the light of new evidence. It has become progressively central in the AI domain for implementing non-monotonic reasoning [2, 5]. Furthermore, it is gaining momentum thanks to its greater capacity to justify, retrace and make transparent its inferences [15, 16]. Recent works [19, 20] show how different knowledge-bases can be translated into different argument-based models following a 5-layer schema upon which argumentation systems are generally built [13]. This schema includes the definition of the internal structure of arguments, of the attacks and of the resolution of conflicts, as well as the computation of the dialectical status of arguments and the production of a final justifiable inference (the schema adopted in this study and detailed in Sect. 3.3).

Fuzzy reasoning is well suited for modelling linguistic information and handling uncertain, imprecise knowledge, providing a powerful framework for reasoning. However, not much work has been carried out on non-monotonic fuzzy reasoning. A few works have proposed possible approaches for handling non-monotonicity. For example, in [4] the resolution of conflicting rules was tackled by aggregating their conclusions with an averaging function, while in [9] a rule-base compression method is proposed for reducing the set of non-monotonic rules. A third approach can be found in [21]. It makes use of Possibility Theory [7] as a mechanism to resolve conflicting information. Possibility Theory generalises the traditional fuzzy system in the sense that propositions have not one, but two truth values: possibility and necessity. Both are values within [0, 1] \(\in \mathbb {R}\): the first indicates the extent to which data fail to refute the truth of a proposition, while the second indicates the extent to which data support its truth.

An example of a domain where inferences have to be made under conditions of uncertain, incomplete and contradicting knowledge is health care. Here, for example, mortality of elderly individuals has to be predicted, and it is mainly caused by non-communicable diseases such as cardiovascular disease [12]. Prognostic information is therefore of essential value for clinical decision making, which in turn is useful for the development of advance care planning for higher-risk patients [11]. Some works have tackled this problem and attempted to use new biomarkers in the prediction of mortality. [6] compares a few biomarkers, such as homocysteine, against other classic risk scores for predicting cardiovascular mortality in older people. In another example [1], the use of blood-borne biomarkers is explored as a potential predictor of mortality risk. Nonetheless, the validation of biomarkers as prognostic factors is still an open issue [22] given the uncertainty of the knowledge applied. Also, when predicting mortality, the available evidence might be partial and conflicting, adding burden to the decision-making process.

3 Design and Methodology

In order to answer the research question, a primary research study was designed. It includes a comparison between the inferences produced by AT and non-monotonic fuzzy reasoning within the biomarker domain. A knowledge-base on mortality risk factors in the elderly, produced by an expert in the field, was employed for the development of non-monotonic fuzzy reasoning and argument-based models. Both approaches first require that the knowledge-base is translated into logical expressions that can be adapted as computational rules or arguments. Three main units compose the non-monotonic fuzzy reasoning models: (1) a fuzzification module, (2) an inference engine and (3) a defuzzification module (Fig. 1 left). The argument-based models are structured over 5 layers, as proposed in [13] (Fig. 1 right): (1) definition of the structure of arguments, (2) definition of their conflicts, (3) their evaluation, (4) the computation of the dialectical status of each argument and (5) their final accrual. The inferences produced by AT and fuzzy reasoning were compared by assessing their true positive (TPR) and false positive (FPR) rates on a dataset of 93 elderly patients described by 51 biomarkers (feature set). This data was obtained in a primary health care European hospital, and the survival status of the 93 patients was recorded 5 years after data collection. The design of the research is summarised in Fig. 1.

Fig. 1. Evaluation strategy schema.

3.1 Knowledge-Base

Fifty-one biomarkers were described by a clinician and their association with mortality risk levels was defined. Each description was encapsulated in one or more sentences to facilitate its adaptation into formal rules and formal arguments. Six of the 51 biomarkers were discarded given the contradictory information in their descriptions. For instance, consider the description given for serum iron (iron in the blood when red blood cells and clotting factors have been removed) and its respective encapsulation:

  • Description 1: ‘Testing serum iron is a part of complete blood count test. According to available knowledge, both, lower and upper extremes of the interval values, recorded in the sample, might be unbeneficial for survival.’

  • Encapsulation 1: low or high serum iron imply unbeneficial survival.

Mortality risks were subsequently classified into five different categories: no risk (\(r_1\)), low risk (\(r_2\)), medium risk (\(r_3\)), high risk (\(r_4\)) and extreme risk (\(r_5\)). This classification was deduced from natural language descriptions such as: “may be non beneficial for survival”, “major cause of mortality” and “unbeneficial for survival”. Encapsulation 1 can then be extended to:

  • Encapsulation 2: low or high serum iron imply low risk (\(r_2\)).

Contradictions and preferences among biomarkers were also provided by the interviewed domain expert. Since the full knowledge-base is vast, due to space limitations in this paper it is made available online.Footnote 1

3.2 Non-monotonic Fuzzy Reasoning Models

Fuzzification Module. Rules of the form “IF ... THEN ...” were constructed from the encapsulated descriptions. This is a straightforward process, exemplified by the definition of rule R1 given Encapsulation 2:

  • R1: IF low serum iron OR high serum iron THEN \(r_2\).

Fuzzy membership functions (FMF) were defined for linguistic variables such as low serum iron and high serum iron. Not all biomarkers had a fuzzy representation; those without one were incorporated into the fuzzy models as crisp variables (membership grade always 0 or 1). These include, for example, categorical biomarkers, such as hypertension, or numerical biomarkers with a strict threshold for their different levels, such as high-density lipoprotein cholesterol. Twenty-one of the 45 biomarkers could be modelled as fuzzy variables and had an FMF defined by the domain expert. Figure 2 depicts an example of FMFs for low and high serum iron. The categories representing the five mortality risks also had an associated FMF (Fig. 3). Due to space limitations, the full list of FMFs can be found in the online knowledge-base (see footnote 1).
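As an illustration of how such membership functions can be operationalised, the following minimal Python sketch evaluates a triangular and a linear (ramp) membership function; the breakpoints are hypothetical placeholders, since the expert-defined parameters are only reported in the online knowledge-base.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside [a, c], 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def ramp(x, a, b):
    """Linear (ramp) membership: 0 below a, 1 above b, linear in between."""
    if x <= a:
        return 0.0
    return 1.0 if x >= b else (x - a) / (b - a)

# Hypothetical breakpoints for serum iron; the expert-defined values
# are those given in the online knowledge-base (see footnote 1).
low_serum_iron  = lambda x: triangular(x, 2.0, 8.0, 14.0)
high_serum_iron = lambda x: ramp(x, 25.0, 35.0)

print(low_serum_iron(10.0), high_serum_iron(30.0))  # e.g. 0.66..., 0.5
```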

Fig. 2. Membership function for low serum iron (triangular) and high serum iron (linear).

Fig. 3. Triangular membership functions for risks \(r_{2-4}\) and linear membership functions for risks \(r_1\) and \(r_5\).

Inference Engine. Once the knowledge-base has been fully operationalised in the fuzzification module, the model can be extended to perform fuzzy inferences. Due to the large amount of contradicting information in the knowledge-base, a mechanism for resolving contradictions was required. An example of a contradiction involving serum insulin (INS) and the waist-to-hip ratio (w/h) exists:

  • Contradiction 1: IF low INS THEN w/h is not high.

This information indicates that if INS is low then any rule whose antecedent contains “high w/h” is being refuted and its truth value should be re-evaluated. For example:

  • R2: IF high w/h THEN low risk (\(r_2\))

  • Exception 1: low INS refutes R2.

A possible approach for dealing with these types of exceptions is the use of Possibility Theory. The work in [21] presents an implementation of fuzzy reasoning with rule-based systems. It expands the usual fuzzy system by using not one but two truth values, named possibility (Pos) and necessity (Nec). The possibility of a proposition can be seen as the extent to which data fail to refute its truth, whereas its necessity can be seen as the extent to which data support its truth. Both possibility and necessity lie in the range [0, 1] \(\in \mathbb {R}\). The possibility of a proposition can also be seen as the upper bound of the respective necessity (Pos \(\ge \) Nec). Note that in a regular fuzzy system necessity represents the membership grade of a proposition and possibility is always 1 for all propositions. The effect on the necessity of a proposition A of a set of propositions Q which refute A is derived in [21] and given by:

$$\begin{aligned} Nec(A) = min (Nec(A), \lnot Nec(Q_1), \ldots , \lnot Nec(Q_n)) \end{aligned}$$
(1)

where \(\lnot Nec(Q) = 1 - Nec(Q)\). In this study, supporting information is not considered; only attempts to refute information are. Thus, Eq. (1) can deal with the contradictions in the knowledge-base when the membership grade of a proposition is interpreted as its necessity. It is important to highlight that the approach developed in [21] was inspired by a multi-step forward-chaining reasoning system. On the contrary, in this study the reasoning is done in a single step: data is imported and all rules are fired at once. However, in order to resolve the conflicting information, it is possible to organise exceptions in a tree structure in which the consequent of an exception is the antecedent of the next exception. In this way Eq. (1) can be applied from the root or roots to the leaves. The drawback is that cycles are not allowed, a situation that does not occur in the knowledge-base considered in this study. Finally, the effect of Exception 1 on the truth value of R2 is:

  • Truth value R2 = Nec(high w/h) = min (Nec(high w/h), 1 - Nec(low INS)).

Nec(high w/h) is the membership grade of the linguistic variable high of the biomarker w/h. Note that if Nec(low INS) = 0 then Exception 1 has no impact on R2, and if Nec(low INS) = 1 the new truth value of R2 is 0. Values between 0 and 1 indicate that R2 is partially refuted. The truth value of R2 represents the truth value of low risk (\(r_2\)) in this respective rule.
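As a minimal sketch of this conflict-resolution step (assuming, as above, that membership grades are interpreted as necessities; the numerical values are purely illustrative), Eq. (1) can be applied as follows:

```python
def resolve_exceptions(nec_a, nec_refuters):
    """Eq. (1): Nec(A) = min(Nec(A), 1 - Nec(Q1), ..., 1 - Nec(Qn)),
    where the Qi are the propositions refuting A."""
    return min([nec_a] + [1.0 - q for q in nec_refuters])

# Exception 1: low INS refutes R2 (IF high w/h THEN low risk).
nec_high_wh = 0.8   # membership grade of 'high' for the w/h ratio (illustrative)
nec_low_ins = 0.3   # membership grade of 'low' for INS (illustrative)

truth_R2 = resolve_exceptions(nec_high_wh, [nec_low_ins])
print(truth_R2)     # min(0.8, 0.7) = 0.7 -> R2 is partially refuted
```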

Table 1. T-norms and t-conorms employed for two propositions a and b.

Having a mechanism to resolve conflicts, fuzzy logic operators can now be used to aggregate the antecedents of each rule and to aggregate the consequents inferring the same category of mortality risk. Traditional fuzzy operators are selected for investigation: Zadeh, Product and Lukasiewicz. Table 1 lists the t-norms and t-conorms (fuzzy AND and fuzzy OR respectively) for each of them. Antecedents might employ OR and/or AND, while consequents (mortality risks) are aggregated by the OR operator. For instance, the truth value of \(r_2\) in a context where only R1 and R2 infer \(r_2\) is “Nec(R1) OR Nec(R2)”.
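The three operator pairs of Table 1 correspond to standard formulations of these t-norms and t-conorms; the sketch below (with illustrative membership grades) shows how the antecedents of a rule and the consequents inferring the same risk could be aggregated.

```python
# Standard t-norm (fuzzy AND) / t-conorm (fuzzy OR) pairs for a, b in [0, 1].
OPERATORS = {
    "Zadeh":       (lambda a, b: min(a, b),           lambda a, b: max(a, b)),
    "Product":     (lambda a, b: a * b,               lambda a, b: a + b - a * b),
    "Lukasiewicz": (lambda a, b: max(0.0, a + b - 1), lambda a, b: min(1.0, a + b)),
}

t_and, t_or = OPERATORS["Zadeh"]

# R1: IF low serum iron OR high serum iron THEN low risk (r2)
nec_R1 = t_or(0.4, 0.1)          # OR over the antecedents of R1 (illustrative grades)
nec_R2 = 0.7                     # truth value of R2 after Exception 1
truth_r2 = t_or(nec_R1, nec_R2)  # consequents inferring r2 are aggregated with OR
print(truth_r2)                  # 0.7 with the Zadeh operators
```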

Defuzzification Module. The output of the inference engine is a graphical representation of the aggregation of the consequents (\(r_{1-5}\)) of the rules, as depicted in the example of Fig. 4. Several methods can be used for calculating a single defuzzified scalar. Two are selected here: mean of max and centroid. The former returns the average of all elements (in this case mortality risks) with maximal membership grade. The latter returns the coordinates (x, y) of the centre of gravity of the geometric shape formed by the aggregation of the FMFs of the mortality risks (example in Fig. 4). In summary, a set of models is constructed with different fuzzy logic operators and defuzzification techniques (Table 2). Each designed model produces a single scalar in the range \([0, 100] \in \mathbb {R}\) as a final inference. However, besides this output, a final inference has to be produced for predicting mortality: death or survival. Several cutoffs of the scalar output are automatically applied to investigate how best to separate the two possible outcomes.
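A sketch of the two defuzzification strategies is given below, assuming the aggregated output is available as membership grades over a discretised universe [0, 100]; the aggregated shape used here is purely illustrative.

```python
import numpy as np

def mean_of_max(universe, membership):
    """Average of the elements (mortality risks) with maximal membership grade."""
    return float(universe[membership == membership.max()].mean())

def centroid_x(universe, membership):
    """x-coordinate of the centre of gravity of the aggregated shape."""
    return float((universe * membership).sum() / membership.sum())

universe = np.linspace(0.0, 100.0, 1001)
# Illustrative aggregated shape (in the models it is the OR of the clipped risk FMFs).
aggregated = np.minimum(universe / 60.0, 1.0)

scalar_mom = mean_of_max(universe, aggregated)
scalar_cog = centroid_x(universe, aggregated)
print(scalar_mom, scalar_cog)  # the scalar is then thresholded into death/survival
```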

Fig. 4. Example of inference graph with truth values of \(r_1\) = 0 and \(r_{2-5}\) = 1. The coordinates of the centroid are (58.52, 0.34) and the mean of max is 62.5.

Table 2. Set-up of the fuzzy models designed.

3.3 Argument-Based Models

The definition of argument-based models follows the 5-layer modelling approach proposed in [13] (depicted in Fig. 1 right).

Layer 1 - Definition of the Structure of Arguments. The first step consists of the construction of forecast arguments. They can be represented as:

- A: premise\(_1\), ..., premise\(_n\) \(\rightarrow \) conclusion

This structure is composed of a set of premises, related to some biomarkers, from which a conclusion can be deduced by applying an inference rule \(\rightarrow \). These are defeasible arguments: informally, if the set of premises holds, then the conclusion presumably holds. Here conclusions are represented by the 5 categories of mortality risk, \(r_{1-5}\) (Sect. 3.1). Arguments are constructed from the encapsulated descriptions of biomarkers provided by the domain expert. For example, the following argument is derived from Encapsulation 2:

- A1: low serum iron OR high serum iron \(\rightarrow \) low risk (\(r_2\))

Layer 2 - Definition of the Conflicts of Arguments. The objective here is to model possible inconsistencies among arguments. Mitigating arguments [17] are constructed using the notion of attack. These are formed by a set of premises and an attack relation \(\Rightarrow \) to an argument B (forecast or mitigating):

- UA: premise\(_1\), ..., premise\(_n\) \(\Rightarrow \) B

Different typologies of mitigating arguments can be found in [18]. However, only the notion of undercutting attack is employed in this study. It defines an exception by which the application of the knowledge carried by the attacked argument is no longer allowed. Below is an example of a forecast argument and of a mitigating argument derived from Contradiction 1:

- A2: high w/h \(\rightarrow \) low risk (\(r_2\) )       - UA1: low INS \(\Rightarrow \) A2

Differently from the conflict resolution strategy described in Sect. 3.2, an undercutting attack does not allow partial refutation but only full refutation, whereby its target argument is always discarded. The set of arguments (forecast and mitigating) and the set of undercutting attacks originated from mitigating arguments form an argumentation framework (AF) (example in Fig. 5-Left).

Fig. 5. Argumentation framework (Left): graphical representation of the knowledge-base employed in this primary research. Nodes are arguments, directed edges are attacks. Sub-argumentation framework (Right): activated arguments (blue nodes) and surviving attacks for one record of the dataset. (Color figure online)

Layer 3 - Evaluation of the Conflicts of Arguments. The knowledge-base operationalised as an AF can now be elicited with real data. Arguments whose premises evaluate to true are activated, otherwise they are discarded. Attacks between activated arguments are considered valid. From the activated arguments and valid attacks a sub-argumentation framework (sub-AF) emerges (Fig. 5-Right).
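The sketch below illustrates, with hypothetical premises, how an AF could be represented and how a sub-AF is derived for one record: only arguments whose premises hold are activated, and only attacks between activated arguments survive.

```python
# Each argument: a premise (predicate over a record) and the risk it supports
# (None for mitigating arguments). Premises and record fields are illustrative.
ARGUMENTS = {
    "A1":  (lambda r: r["serum_iron_low"] or r["serum_iron_high"], "r2"),
    "A2":  (lambda r: r["wh_high"], "r2"),
    "UA1": (lambda r: r["ins_low"], None),   # mitigating argument from Contradiction 1
}
ATTACKS = {("UA1", "A2")}                    # UA1 undercuts A2

def build_sub_af(record):
    activated = {a for a, (premise, _) in ARGUMENTS.items() if premise(record)}
    valid = {(x, y) for (x, y) in ATTACKS if x in activated and y in activated}
    return activated, valid

record = {"serum_iron_low": False, "serum_iron_high": True,
          "wh_high": True, "ins_low": True}
print(build_sub_af(record))   # ({'A1', 'A2', 'UA1'}, {('UA1', 'A2')})
```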

Layer 4 - Definition of the Dialectical Status of Arguments. Given a sub-AF, acceptability semantics are applied to compute the dialectical status of each argument (accepted or rejected). Each record of the dataset activates a different sub-AF and thus the semantics have to be run for each of them. Among well-known semantics such as grounded and preferred [8], the grounded semantics is employed here. It returns exactly one extension (set) of arguments which is conflict-free (it can be empty) and represents the least questionable set of arguments. Besides the grounded semantics, a ranking-based semantics is also employed in this study. Its goal is to rank-order arguments from the most to the least acceptable. Note that, with a ranking-based semantics, arguments supporting different conclusions (here mortality risks) can be part of the same extension since they are simply ranked. Here, the categorizer semantics has been selected [3]. It ranks arguments based on their direct attacks, in such a way that attacks from non-attacked arguments are stronger than attacks from arguments that are themselves attacked multiple times. The detailed implementation of the categorizer semantics can be found in [3]. Figure 6 shows an example of a sub-AF evaluated by the grounded and categorizer semantics. Note that arguments attacked only by rejected arguments can still be rejected under the categorizer semantics.
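A minimal implementation of the two semantics over a sub-AF is sketched below (not the code used in the study): the grounded extension is computed as the least fixpoint of the characteristic function, and the categorizer score of [3] lowers the rank of an argument according to the scores of its direct attackers.

```python
def grounded_extension(args, attacks):
    """Iterate the characteristic function from the empty set: an argument is
    added when each of its attackers is attacked by an already accepted one."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}
    accepted = set()
    while True:
        defended = {a for a in args
                    if all(any((d, b) in attacks for d in accepted)
                           for b in attackers[a])}
        if defended == accepted:
            return accepted
        accepted = defended

def categorizer(args, attacks, iterations=200):
    """Categorizer ranking: Cat(a) = 1 / (1 + sum of Cat over a's attackers)."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}
    cat = {a: 1.0 for a in args}
    for _ in range(iterations):
        cat = {a: 1.0 / (1.0 + sum(cat[b] for b in attackers[a])) for a in args}
    return cat

sub_af_args, sub_af_attacks = {"A1", "A2", "UA1"}, {("UA1", "A2")}
print(grounded_extension(sub_af_args, sub_af_attacks))   # {'A1', 'UA1'}
print(categorizer(sub_af_args, sub_af_attacks))           # A2 ranked lowest
```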

Fig. 6. Argumentation framework: acceptable arguments computed by the grounded semantics (left) and the categorizer semantics (right). Blue nodes are activated mitigating arguments which do not support a conclusion, so they are neither accepted nor rejected. Red and green nodes are forecast arguments, rejected and accepted respectively. (Color figure online)

Layer 5 - Accrual of Acceptable Arguments. The last stage of the reasoning process is the production of a final inference (here a single scalar), defined by the accrual of the accepted forecast arguments. Mitigating arguments do not support a conclusion, so their role ends with their contribution to the resolution of conflicts. Each accepted forecast argument supports one mortality risk. In this case mortality risks have crisp values: \(r_1 = 0, r_2 = 25, r_3 = 50, r_4 = 75, r_5 = 100\). It is important to highlight that there are no correct values for mortality risks; for comparison purposes, the argument-based models adopt the same values designed by the domain expert for the fuzzy membership functions (for the consequents of rules). In this research, the final scalar is set equal to the risk value supported by the highest number of accepted forecast arguments. In case of a tie, the average of the tied risk values is returned. As in the defuzzification unit of the fuzzy reasoning approach, several cutoffs of the scalar inference are automatically applied to separate the possible outcomes (death or survival).
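A sketch of this accrual step follows; the behaviour when no forecast argument is accepted is not specified here, so returning None in that case is an assumption.

```python
from collections import Counter

RISK_VALUE = {"r1": 0, "r2": 25, "r3": 50, "r4": 75, "r5": 100}

def accrue(accepted_risks):
    """Return the risk value supported by the most accepted forecast arguments,
    averaging the tied values in case of a tie."""
    if not accepted_risks:
        return None  # assumption: undefined when no forecast argument is accepted
    counts = Counter(accepted_risks)
    top = max(counts.values())
    tied = [RISK_VALUE[r] for r, c in counts.items() if c == top]
    return sum(tied) / len(tied)

print(accrue(["r2", "r2", "r4"]))  # 25.0: r2 is supported by most arguments
print(accrue(["r2", "r4"]))        # 50.0: tie between r2 and r4, average returned
```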

4 Results

Data from 93 elderly patients and 51 different biomarkers was obtained from a primary health care European hospital over a time span of two years.Footnote 2 This was used to instantiate the argument-based models employing the grounded and categorizer semantics, as well as the fuzzy reasoning models listed in Table 2 (Sect. 3.2). The percentages of death and survival records are 39% and 61% respectively, so the dataset is not perfectly balanced. The evaluation metrics selected were the true positive rate (TPR) and false positive rate (FPR), which can be visualised by a Receiver Operating Characteristic (ROC) curve and compared according to the Area Under the Curve (AUC). Different thresholds to separate the two types of inference produced (death and survival) are automatically generated, providing one TPR and one FPR for each model and each cutoff. The AUC of the Precision-Recall (PR) curve is also investigated. This has been chosen because of the imbalanced distribution of the ground truth (death or survival). In this case the positive predictive value (the fraction of patients who had an inference of death and actually died) is plotted against the true positive rate. Figure 7 depicts the results of the comparison between all the designed models. The fuzzy reasoning models have very low AUC for both the ROC curve (between 0.284 and 0.306) and the PR curve (between 0.232 and 0.264), which suggests a low inferential capacity for death regardless of the cutoff employed. In addition, the similar AUC among all fuzzy models indicates that the different fuzzy logic operators and defuzzification techniques had minimal impact on the final inferences produced. As for the argument-based models, it is possible to observe a higher AUC for the ROC and PR curves: 0.494 and 0.371 respectively for the model employing the grounded semantics, and 0.502 and 0.377 for the model employing the categorizer semantics, which is considerably better than non-monotonic fuzzy reasoning.
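This evaluation procedure can be reproduced along the following lines (a sketch using scikit-learn with purely illustrative labels and scores, not the experimental data): each model's scalar output is swept over the automatically generated cutoffs to obtain the ROC and PR curves and their AUC.

```python
import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve, auc

# Illustrative data: 1 = death, 0 = survival; scores are scalar inferences in [0, 100].
y_true = np.array([1, 0, 1, 0, 0, 1, 0, 1])
scores = np.array([70, 20, 55, 40, 10, 80, 65, 35])

fpr, tpr, _ = roc_curve(y_true, scores)                  # TPR/FPR over all cutoffs
precision, recall, _ = precision_recall_curve(y_true, scores)

print("ROC AUC:", auc(fpr, tpr))
print("PR AUC:", auc(recall, precision))
```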

Fig. 7. True positive rate by false positive rate (left) and positive predictive value by true positive rate (right), for fuzzy and argument-based models for different cutoffs in the range \([0, 100] \in \mathbb {R}\). The AUC is presented next to each model's name (top).

4.1 Discussion

The AUC of the ROC curve for the fuzzy reasoning models shows a worse performance when compared to that of the argument-based models (approximately 67% lower on average). One factor that can likely explain the better performance of argumentation is its superior capacity for conflict resolution, and thus its better handling of non-monotonicity as well as its capturing and representation of defeasible information. Another factor that might explain the lower performance of the fuzzy reasoning models is the high number of crisp variables present in the knowledge-base. These variables can hide the vagueness associated with the information, undermining the capacity of fuzzy reasoning models to capture non-monotonic reasoning. In relation to the PR curve, the peak of 0.5 positive predictive value (for models F1, F3, F4, F6) suggests that the models based upon fuzzy reasoning are able to achieve a higher fraction of correct death inferences, but only at a very low true positive rate. In other words, AT presents a more robust fraction of correct death inferences when the true positive rate is higher, which is a clear advantage in the prediction of mortality.

Nonetheless, it is also important to highlight that the AUC of the ROC curve for all models is very similar to the area associated with a random binary classifier (0.5). Although one can argue that this is very poor, the random classifier does not give any insight into the inferences produced, so such a comparison is not particularly informative. Moreover, the findings here are in line with a previous work showing that not even data-driven approaches for classifying mortality, using the same dataset employed in this research, could significantly outperform a random classifier [20]. This indeed suggests that the available knowledge is actually incomplete, uncertain and fragmented. Further work can be done to extend the current knowledge-base with additional information, and the argument-based approach described in this study can support such a task. For example, the cases that have been predicted incorrectly can be further analysed individually. Since the concept of argument is used consistently across the layers of the defeasible argumentative approach, the retracement and explanation of its inferences is easier. Thus, it is easier for a non-expert to grasp whether something went wrong or whether some additional information is needed. If this additional information becomes available, it can be added to the previous knowledge-base and the inferential process can be repeated. This task is more intuitive for a non-expert when compared to the fuzzy reasoning approach, which employs fuzzification and defuzzification mechanisms that are not really intuitive.

5 Conclusion and Future Work

This study presented a comparison of the inferential capacity of different reasoning models built with defeasible argumentation and non-monotonic fuzzy logic. These models were constructed upon an extensive knowledge-base gathered from an expert in the domain of elderly survival prediction using biomarkers and were aimed at inferring the death or survival of elderly people. This knowledge-base was based upon assumptions and intuitions and was highly characterised by incompleteness, conflicting information and uncertainty. The argument-based models were constructed following a 5-layer schema upon which argumentation systems are generally built: from the definition of arguments and attacks to the resolution of their conflicts, the computation of their dialectical status and their final accrual towards a final inference. The fuzzy reasoning models adopted Possibility Theory for modelling conflicts among the designed rules. This allowed the expansion of a usual fuzzy system by using not one but two truth values for a proposition, namely possibility and necessity. The metrics selected for the investigation of the inferential capacity of the designed models were the true positive rate, the false positive rate and the positive predictive value. Findings showed how the argument-based models outperformed the fuzzy reasoning models. Future work will focus on the replication of this study by evaluating the impact of other argument-based acceptability semantics on the computation of the dialectical status of arguments and their final accrual. Other experts will be interviewed to build additional knowledge-bases for the same problem. This will help strengthen the current findings and better demonstrate the impact of argumentation for defeasible inference across different knowledge-bases. Eventually, the explainability of defeasible argumentation and its capacity to present justifiable inferences will be investigated more precisely.