Definition of the Subject

Scientific analysis of decision problems aims at giving the decision maker (DM) a recommendation concerning a set of objects (also called alternatives, solutions, acts, actions, options, candidates, …) evaluated from multiple points of view considered relevant for the problem at hand and called attributes (also called features, variables, criteria, objectives, …). For example, a decision can concern:

  1) diagnosis of pathologies for a set of patients, where patients are objects of the decision, and symptoms and results of medical tests are the attributes,

  2) assignment of enterprises to classes of risk, where enterprises are objects of the decision, and financial ratio indices and other economic indicators, such as the market structure, the technology used by the enterprise and the quality of management, are the attributes,

  3) selection of a car to be bought from among a given set of cars, where cars are objects of the decision, and maximum speed, acceleration, price, fuel consumption, comfort, color and so on, are the attributes,

  4) ordering of students applying for a scholarship, where students are objects of the decision, and scores in different subjects are the attributes.

The following three main categories of decision problems are typically distinguished [44]:

  • classification, when the decision aims at assigning each object to one of predefined classes,

  • choice, when the decision aims at selecting the best objects,

  • ranking, when the decision aims at ordering objects from the best to the worst.

Looking at the above examples, one can say that 1) and 2) are classification problems, 3) is a choice problem and 4) is a ranking problem.

The above categorization can be refined by distinguishing two kinds of classification problems: taxonomy, when the value sets of attributes and the predefined classes are not preference ordered, and ordinal classification (also known as multiple criteria sorting), when the value sets of attributes and the predefined classes are preference ordered [12]. In the above examples, 1) is a taxonomy problem and 2) is an ordinal classification problem. If value sets of attributes are preference ordered, they are called criteria, otherwise they keep the name of attributes. For example, in a decision regarding the selection of a car, its price is a criterion because, obviously, a low price is better than a high price. Instead, the color of a car is not a criterion but simply an attribute, because red is not intrinsically better than green. One can imagine, however, that also the color of a car could become a criterion if, for example, a DM would consider red better than green.

Introduction

Scientific support of decisions makes use of a more or less explicit model of the decision problem. The model relates the decision to the characteristics of the objects expressed by the considered attributes. Building such a model requires information about conditions and parameters of the aggregation of multi-attribute characteristics of objects. The nature of this information depends on the adopted methodology: prices and interest rates for cost-benefit analysis, cost coefficients in objectives and technological coefficients in constraints for mathematical programming, a training set of decision examples for neural networks and machine learning, substitution rates for a value function of multi-attribute utility theory, pairwise comparisons of objects in terms of intensity of preference for the analytic hierarchy process, attribute weights and several thresholds for ELECTRE methods, and so on (see the state-of-the-art survey [4]). This information has to be provided by the DM, possibly assisted by an analyst.

Very often this information is not easily definable. For example, this is the case of the price of many immaterial goods and of the interest rates in cost-benefit analysis, or the case of the coefficients of objectives and constraints in mathematical programming models. Even if the required information is easily definable, like a training set of decision examples for neural networks, it is often processed in a way which is not clear for the DM, such that (s)he cannot see the exact relations between the provided information and the final recommendation. Consequently, very often the decision model is perceived by the DM as a black box whose result has to be accepted because the analyst's authority guarantees that the result is “right”. In this context, the aspiration of the DM to find good reasons to make decisions is frustrated and raises the need for a more transparent methodology in which the relation between the original information and the final recommendation is clearly shown. Such a transparent methodology has been called a glass box [32]. Its typical representative uses a learning set of decision examples as the input preference information provided by the DM, and expresses the decision model in terms of a set of “if …, then …” decision rules induced from the input information. On the one hand, the decision rules are explicitly related to the original information and, on the other hand, they give understandable justifications for the decisions to be made.

For example, in case of a medical diagnosis problem, the decision rule approach requires as input information a set of examples of previous diagnoses, from which some diagnostic rules are induced, such as “if there is symptom α and the test result is β, then there is pathology γ”. Each such rule is directly related to examples of diagnoses in the input information, where there is symptom α, test result β and pathology γ. Moreover, the DM can verify easily that in the input information there is no example of diagnosis where there is symptom α, test result β but no pathology γ.

The rules induced from the input information provided in terms of exemplary decisions represent a decision model which is transparent for the DM and enables understanding of the reasons for his/her past decisions. The acceptance of the rules by the DM justifies, in turn, their use for future decisions.

The induction of rules from examples is a typical approach of artificial intelligence. This explains our interest in rough set theory [38,39], which proved to be a useful tool for analysis of vague descriptions of decision situations [41,48]. The rough set analysis aims at explaining the values of some decision attributes, playing the role of “dependent variables”, by means of the values of condition attributes, playing the role of “independent variables”. For example, in the above diagnostic context, data about the presence of a pathology are given by decision attributes, while data about symptoms and tests are given by condition attributes. An important advantage of the rough set approach is that it can deal with partly inconsistent examples, i.e. cases where the presence of different pathologies is associated with the presence of the same symptoms and test results. Moreover, it provides useful information about the role of particular attributes and their subsets, and prepares the ground for representation of knowledge hidden in the data by means of “if …, then …” decision rules.

The classical rough set approach proposed by Pawlak [38,39] cannot deal, however, with preference order in the value sets of condition and decision attributes. For this reason, the classical rough set approach can deal with only one of the four decision problems listed above – classification of taxonomy type. To deal with ordinal classification, choice and ranking, it is necessary to generalize the classical rough set approach, so as to take into account preference orders and monotonic relationships between condition and decision attributes. This generalization, called Dominance-based Rough Set Approach (DRSA), has been proposed by Greco, Matarazzo and Slowinski [12,14,15,18,21,49].

Classical Rough Set Approach to Classification Problems of Taxonomy Type

Information Table and Indiscernibility Relation

Information regarding classification examples is supplied in the form of an information table, whose separate rows refer to distinct objects, and whose columns refer to the different attributes considered. This means that each cell of this table indicates an evaluation (quantitative or qualitative) of the object placed in the corresponding row by means of the attribute in the corresponding column. Formally, an information table is the 4-tuple \( { \textbf{\textit{S}} = \langle U,Q,V,v \rangle } \), where U is a finite set of objects, called universe, \( { Q=\{q_{1}, \dots,q_{n}\} } \) is a finite set of attributes, \( { V_q } \) is a value set of the attribute q, \( { V = \bigcup\nolimits_{q \in Q}{V_q} } \), and \( { v\colon U\times Q \to V } \) is a total function such that \( { v(x,q) \in V_q } \) for each \( { q \in Q,\, x \in U } \), called information function.

Therefore, each object x of U is described by a vector (string) \( { \mathrm{Des}_{Q}(x)=[v(x,q_{1}), \dots,v(x,q_{n})] } \), called the description of x in terms of the evaluations on the attributes from Q; it represents the available information about x. Obviously, \( { x \in U } \) can be described in terms of any non-empty subset \( { P \subseteq Q } \).

To every (non-empty) subset of attributes \( { P \subseteq Q } \) there is associated an indiscernibility relation on U, denoted by \( { I_P } \):

$$I_{P} = \{(x,y) \in U\times U\colon\; v(x,q) = v(y,q)\,,\; \forall q \in P\}\:.$$

If \( { (x,y) \in I_{P} } \), it is said that the objects x and y are P-indiscernible. Clearly, the indiscernibility relation thus defined is an equivalence relation (reflexive, symmetric and transitive). The family of all the equivalence classes of the relation \( { I_P } \) is denoted by \( { U / I_{P} } \), and the equivalence class containing object \( { x \in U } \), by \( { I_{P}(x) } \). The equivalence classes of the relation \( { I_P } \) are called P-elementary sets.
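
To make the definition concrete, the following Python sketch (an illustration added here, not part of the original formulation; the table content is invented for the example) computes the P-elementary sets of a small information table by grouping objects with equal descriptions on P:

```python
from collections import defaultdict

# A toy information table: v(x, q) stored as a nested dictionary.
table = {
    "x1": {"q1": "a", "q2": 0},
    "x2": {"q1": "a", "q2": 0},
    "x3": {"q1": "b", "q2": 1},
    "x4": {"q1": "b", "q2": 0},
}

def elementary_sets(table, P):
    """Partition of U into P-elementary sets (equivalence classes of I_P)."""
    classes = defaultdict(set)
    for x, description in table.items():
        # Objects with equal descriptions on P are P-indiscernible.
        classes[tuple(description[q] for q in P)].add(x)
    return list(classes.values())

print(elementary_sets(table, ["q1"]))        # [{'x1', 'x2'}, {'x3', 'x4'}]
print(elementary_sets(table, ["q1", "q2"]))  # finer: [{'x1', 'x2'}, {'x3'}, {'x4'}]
```

Note that enlarging P can only refine the partition, which is the mechanism behind the finer elementary sets observed later in the traffic-sign example.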

Approximations

Let S be an information table, X a non-empty subset of U and \( { \emptyset \ne P \subseteq Q } \). The P-lower approximation and the P-upper approximation of X in S are defined, respectively, as:

$$\underline{P}\left(X \right) = \left\{{x \in U\colon I_P \left(x \right) \subseteq X} \right\}\: ,$$
$$\overline{P}\left(X \right) = \left\{{x \in U\colon I_P \left(x \right) \cap X} \neq \emptyset\right\}\:.$$

The elements of \( { \underline{P}\left(X \right) } \) are all and only those objects \( { x \in U } \) which belong to the equivalence classes generated by the indiscernibility relation \( { I_P } \), contained in X; the elements of \( { \overline P \left(X \right) } \) are all and only those objects \( { x \in U } \) which belong to the equivalence classes generated by the indiscernibility relation \( { I_P } \), containing at least one object belonging to X. In other words, \( { \underline{P}\left(X \right) } \) is the largest union of the P-elementary sets included in X, while \( { \overline P \left(X \right) } \) is the smallest union of the P-elementary sets containing X.

The P-boundary of  X in S, denoted by \( { Bn_{P}(X) } \), is defined by

$$Bn_{P}(X) = \overline P \left(X \right)-\underline{P}\left(X \right)\:.$$

The following inclusion relation holds:

$$\underline{P}\left(X \right) \subseteq X \subseteq \overline P \left(X \right)\:.$$

Thus, in view of the information provided by P, if an object x belongs to \( { \underline{P}\left(X\right) } \), then it certainly belongs to set X, while if x belongs to \( { \overline P\left(X \right) } \), then it possibly belongs to set X. \( { Bn_{P}(X) } \) constitutes the “doubtful region” of X: nothing can be said with certainty about the membership of its elements to set X, using the subset of attributes P only.

Moreover, the following complementarity relation is satisfied:

$$\underline{P}\left(X \right)=U-\overline{P}(U-X)\:.$$

If the P-boundary of set X is empty, \( { Bn_{P}(X)=\emptyset } \), then X is an ordinary (exact) set with respect to P, that is, it may be expressed as a union of a certain number of P-elementary sets; otherwise, if \( { Bn_{P}(X) \ne \emptyset } \), set X is an approximate (rough) set with respect to P and may be characterized by means of the approximations \( { \underline{P}\left(X \right) } \) and \( { \overline P \left(X \right) } \). The family of all sets \( { X \subseteq U } \) having the same P-lower and P-upper approximations is called the rough set.

The quality of the approximation of set X by means of the attributes from P is defined as

$$\gamma_{P}(X)= \frac{\left| {\underline{P}\left(X \right)} \right|}{\left| X \right|}\: ,$$

such that \( { 0 \le \gamma_{P}(X) \le 1 } \). The quality \( { \gamma_P(X) } \) represents the relative frequency of the objects correctly classified using the attributes from P.

The definition of approximations of a set \( { X \subseteq U } \) can be extended to a classification, i.e. a partition \( Y = \left\{{Y_1 , \dots, Y_m} \right\} \) of U. The P-lower and P-upper approximations of Y in S are defined by the sets \( \underline{P} (Y) = \left\{ {\underline{P} (Y_1) , \dots,\underline{P} (Y_m)} \right\} \) and \( { \overline {P} (Y) = \left\{{\overline {P} (Y_1) , \dots,\overline {P} (Y_m)} \right\} } \), respectively. The coefficient

$$\gamma_P \left(Y \right) = \frac{\sum\nolimits_{i = 1}^m {\left| {\underline{P} (Y_i)} \right|}}{\left| U \right|}\:$$

is called the quality of the approximation of classification Y by set of attributes P, or in short, the quality of classification. It expresses the ratio of all P-correctly classified objects to all objects in the system.
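
The approximations and both quality coefficients can be computed directly from the elementary sets. The sketch below is an illustrative Python implementation (not a reference library); the data reproduce Table 1 of the traffic-sign example discussed later, as far as it can be reconstructed from the rules given there:

```python
def elementary_sets(table, P):
    classes = {}
    for x, description in table.items():
        classes.setdefault(tuple(description[q] for q in P), set()).add(x)
    return list(classes.values())

def lower_approx(table, P, X):
    # Union of the P-elementary sets entirely included in X.
    return set().union(*(E for E in elementary_sets(table, P) if E <= X))

def upper_approx(table, P, X):
    # Union of the P-elementary sets having a non-empty intersection with X.
    return set().union(*(E for E in elementary_sets(table, P) if E & X))

def boundary(table, P, X):
    return upper_approx(table, P, X) - lower_approx(table, P, X)

def quality_of_classification(table, P, Y):
    # Y is a partition of U given as a list of decision classes.
    return sum(len(lower_approx(table, P, Yi)) for Yi in Y) / len(table)

# Signs a-d described by shape S and principal color PC (cf. Table 1).
table = {
    "a": {"S": "triangle", "PC": "yellow"},
    "b": {"S": "circle",   "PC": "white"},
    "c": {"S": "circle",   "PC": "blue"},
    "d": {"S": "circle",   "PC": "blue"},
}
W, I, O = {"a"}, {"b", "c"}, {"d"}

print(lower_approx(table, ["S", "PC"], I))                      # {'b'}
print(upper_approx(table, ["S", "PC"], I))                      # {'b', 'c', 'd'}
print(quality_of_classification(table, ["S", "PC"], [W, I, O])) # 0.5
```

The printed values agree with the approximations and the quality of 1/2 derived for Table 1 in the example section below.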

Dependence and Reduction of Attributes

An issue of great practical importance is the reduction of “superfluous” attributes in an information table. Superfluous attributes can be eliminated without deteriorating the information contained in the original table.

Let \( { P \subseteq Q } \) and \( { p \in P } \). It is said that attribute p is superfluous in P with respect to classification Y if \( { \underline{P}(Y) =\underline{(P-\{p\})}(Y) } \); otherwise, p is indispensable in P. The subset of Q containing all the indispensable attributes is known as the core.

Given classification Y, any minimal (with respect to inclusion) subset \( { P \subseteq Q } \), such that \( { \underline{P}(Y) =\underline{Q}(Y) } \), is called a reduct. It specifies a minimal subset P of Q which keeps the quality of classification at the same level as the whole set of attributes, i.e. \( { \gamma_{P} \left(Y \right) = \gamma_Q \left(Y \right) } \). In other words, the attributes that do not belong to the reduct are superfluous with respect to the classification Y of objects from U.

More than one reduct may exist in an information table and their intersection gives the core.
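
For small attribute sets, reducts and the core can be found by brute force, checking every subset of Q against the quality of classification; a minimal sketch follows (illustrative only — practical systems use dedicated algorithms; the secondary colors in the table are reconstructed from the rules of the traffic-sign example, with the secondary color of sign a assumed to be red):

```python
from itertools import combinations

def elementary_sets(table, P):
    classes = {}
    for x, description in table.items():
        classes.setdefault(tuple(description[q] for q in P), set()).add(x)
    return list(classes.values())

def gamma(table, P, Y):
    # Quality of approximation of classification Y by attributes P.
    elem = elementary_sets(table, P)
    return sum(len(E) for Yi in Y for E in elem if E <= Yi) / len(table)

def reducts(table, Y):
    Q = sorted(next(iter(table.values())))
    full = gamma(table, Q, Y)
    candidates = [frozenset(P) for r in range(1, len(Q) + 1)
                  for P in combinations(Q, r)
                  if gamma(table, list(P), Y) == full]
    # A reduct is minimal: no other candidate is strictly contained in it.
    return [P for P in candidates if not any(o < P for o in candidates)]

def core(table, Y):
    reds = reducts(table, Y)
    return frozenset.intersection(*reds) if reds else frozenset()

# Signs a-d described by S, PC and SC (cf. Table 2 of the example below).
table = {
    "a": {"S": "triangle", "PC": "yellow", "SC": "red"},    # SC of a assumed
    "b": {"S": "circle",   "PC": "white",  "SC": "red"},
    "c": {"S": "circle",   "PC": "blue",   "SC": "red"},
    "d": {"S": "circle",   "PC": "blue",   "SC": "white"},
}
Y = [{"a"}, {"b", "c"}, {"d"}]  # classes W, I, O
print(reducts(table, Y))        # [frozenset({'PC', 'SC'}), frozenset({'S', 'SC'})]
print(core(table, Y))           # frozenset({'SC'})
```

The output matches the reducts {PC, SC} and {S, SC} and the core {SC} identified in the example section.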

Decision Table and Decision Rules

In the information table describing examples of classification, the attributes of set Q are divided into condition attributes (set \( { C \ne \emptyset } \)) and decision attributes (set \( { D \ne \emptyset } \)), where \( { C \cup D=Q } \) and \( { C \cap D=\emptyset } \). Such an information table is called a decision table. The decision attributes induce a partition of U deduced from the indiscernibility relation \( { I_D } \) in a way that is independent of the condition attributes. D-elementary sets are called decision classes, denoted by \( { Cl_t, t=1, \dots,m } \). The partition of U into decision classes is called classification \( { \textbf{\textit{Cl}} = \{ Cl_1, \dots,Cl_m \} } \). There is a tendency to reduce the set C while keeping all important relationships between C and D, in order to make decisions on the basis of a smaller amount of information. When the set of condition attributes is replaced by one of its reducts, the quality of approximation of the classification induced by the decision attributes does not deteriorate.

Since the aim is to underline the functional dependencies between condition and decision attributes, a decision table may also be seen as a set of decision rules. These are logical statements of the type “if …, then …”, where the premise (condition part) specifies values assumed by one or more condition attributes (description of C-elementary sets) and the conclusion (decision part) specifies an assignment to one or more decision classes. Therefore, the syntax of a rule is the following:

“if \( { v(x,q_{1})=r_{q_1} } \) and \( { v(x,q_{2})=r_{q_2} } \) and … and \( { v(x,q_{p})=r_{q_p} } \), then \( { x \text{ belongs to decision class } Cl_{j_1} } \) or \( { Cl_{j_2} } \) or … or \( { Cl_{j_k} } \)”,

where \( \{q_{1}, q_2, \dots,q_{p}\} \subseteq C, (r_{q_1},r_{q_2}, \dots,r_{q_p}) \in V_{q_1}\times V_{q_2}\times \dots \times V_{q_p} \) and \( { Cl_{j_1}, Cl_{j_2}, \dots, Cl_{j_k} } \) are some decision classes of the considered classification Cl. If the consequence is univocal, i.e. \( { k=1 } \), then the rule is univocal, otherwise it is approximate.

An object \( { x \in U } \) supports decision rule r if its description matches both the condition part and the decision part of the rule. Decision rule r covers object x if the description of x matches the condition part of the rule. Each decision rule is characterized by its strength, defined as the number of objects supporting the rule. In the case of approximate rules, the strength is calculated for each possible decision class separately.
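
These notions translate directly into checks on the decision table. The sketch below (illustrative Python; the rule encoding is invented for this example) verifies covering and support for the approximate rule #3'' of the traffic-sign example presented later, and computes its strength per decision class:

```python
# Rule #3'' of the example below: "if PC = blue, then Class = I or O".
rule = {"conditions": {"PC": "blue"}, "classes": {"I", "O"}}

# Signs a-d with their decision classes (cf. Table 1).
table = {
    "a": {"S": "triangle", "PC": "yellow", "Class": "W"},
    "b": {"S": "circle",   "PC": "white",  "Class": "I"},
    "c": {"S": "circle",   "PC": "blue",   "Class": "I"},
    "d": {"S": "circle",   "PC": "blue",   "Class": "O"},
}

def covers(rule, obj):
    # The object matches the condition part of the rule.
    return all(obj[q] == v for q, v in rule["conditions"].items())

def supports(rule, obj):
    # The object matches both the condition and the decision part.
    return covers(rule, obj) and obj["Class"] in rule["classes"]

print([x for x, obj in table.items() if supports(rule, obj)])  # ['c', 'd']
# For an approximate rule, strength is computed per decision class:
print({cl: sum(covers(rule, o) and o["Class"] == cl for o in table.values())
       for cl in sorted(rule["classes"])})                     # {'I': 1, 'O': 1}
```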

If a univocal rule is supported by objects from the lower approximation of the corresponding decision class only, then the rule is called certain or deterministic. If, however, a univocal rule is supported by objects from the upper approximation of the corresponding decision class only, then the rule is called possible or probabilistic. Approximate rules are supported, in turn, only by objects from the boundaries of the corresponding decision classes.

Procedures for the generation of decision rules from a decision table use an inductive learning principle. The objects are considered as examples of classification. In order to induce a decision rule with a univocal and certain conclusion about the assignment of an object to decision class X, the examples belonging to the C-lower approximation of X are called positive and all the others negative. Analogously, in case of a possible rule, the examples belonging to the C-upper approximation of X are positive and all the others negative. Possible rules are characterized by a coefficient, called confidence, telling to what extent the rule is consistent, i.e. what is the ratio of the number of positive examples supporting the rule to the number of examples belonging to set X according to decision attributes. Finally, in case of an approximate rule, the examples belonging to the C-boundary of X are positive and all the others negative. A decision rule is called minimal if removing any attribute from the condition part gives a rule covering also negative objects.

The existing induction algorithms use one of the following strategies [55]:

  (a) Generation of a minimal representation, i.e. a minimal set of rules covering all objects from a decision table,

  (b) Generation of an exhaustive representation, i.e. all rules for a given decision table,

  (c) Generation of a characteristic representation, i.e. a set of rules covering relatively many objects, however, not necessarily all objects from a decision table.

Explanation of the Classical Rough Set Approach by an Example

Suppose that one wants to describe the classification of basic traffic signs to a novice. There are three main classes of traffic signs corresponding to:

  • Warning (W),

  • Interdiction (I),

  • Order (O).

Then, these classes may be distinguished by such attributes as the shape (S) and the principal color (PC) of the sign. Finally, one can consider a few examples of traffic signs, like those shown in Table 1. These are:

  a) Sharp right turn,

  b) Speed limit of 50 km/h,

  c) No parking,

  d) Go ahead.

Table 1 Examples of traffic signs described by S and PC

The rough set approach is used here to build a model of classification of traffic signs to classes W, I, O on the basis of attributes S and PC. This is a typical problem of taxonomy.

One can remark that the sets of signs indiscernible by “Class” are:

$$\mathrm{W}={\{a\}}_\mathrm{Class}\,,\quad \mathrm{I}={\{b,c\}}_\mathrm{Class}\,,\quad \mathrm{O}={\{d\}}_\mathrm{Class}\: ,$$

and the sets of signs indiscernible by S and PC are as follows:

$$ {\{a\}}_{\mathrm{S,PC}}\,,\quad {\{b\}}_\mathrm{{S,PC}}\,,\quad {\{c,d\}}_\mathrm{{S,PC}}\:. $$

The above elementary sets are generated, on the one hand, by the decision attribute “Class” and, on the other hand, by the condition attributes S and PC. The elementary sets of signs indiscernible by “Class” are denoted by \( { \{\cdot\}_\mathrm{Class} } \) and those by S and PC are denoted by \( { \{\cdot\}_\mathrm{{S,PC}} } \). Notice that \( { \mathrm{W}=\{a\}_\mathrm{Class} } \) is characterized precisely by \( { \{a\}_\mathrm{{S,PC}} } \). In order to characterize \( { \mathrm{I}=\{b,c\}_\mathrm{Class} } \) and \( { \mathrm{O}=\{d\}_\mathrm{Class} } \), one needs \( { \{b\}_\mathrm{{S,PC}} } \) and \( { \{c,d\}_\mathrm{{S,PC}} } \); however, only \( { \{b\}_\mathrm{{S,PC}} } \) is included in \( { \mathrm{I}=\{b,c\}_\mathrm{Class} } \), while \( { \{c,d\}_\mathrm{{S,PC}} } \) has a non-empty intersection with both \( { \mathrm{I}=\{b,c\}_\mathrm{Class} } \) and \( { \mathrm{O}=\{d\}_\mathrm{Class} } \). It follows from this characterization that, by using condition attributes S and PC, one can characterize class W precisely, while classes I and O can only be characterized approximately:

  • Class W includes sign a certainly and possibly no other sign than a,

  • Class I includes sign b certainly and possibly signs \( { b, c } \) and d,

  • Class O includes no sign certainly and possibly signs c and d.

The terms “certainly” and “possibly” refer to the absence or presence of ambiguity between the description of signs by S and PC on the one side, and by “Class” on the other side. In other words, using the description of signs by S and PC, one can say that all signs from elementary sets \( { \{\cdot\}_\mathrm{{S,PC}} } \) included in elementary sets \( { \{\cdot\}_\mathrm{Class} } \) belong certainly to the corresponding class, while all signs from elementary sets \( { \{\cdot\}_\mathrm{{S,PC}} } \) having a non-empty intersection with elementary sets \( { \{\cdot\}_\mathrm{Class} } \) belong to the corresponding class only possibly. The two sets of certain and possible signs are, respectively, the lower and upper approximation of the corresponding class by attributes S and PC:

$$\begin{aligned}&\mathrm{lower}{\_}\mathrm{approx.}_\mathrm{{S,PC}}(\mathrm{W})=\{a\},\\&\qquad\qquad\qquad\qquad \mathrm{upper}{\_}\mathrm{approx.}_\mathrm{{S,PC}}(\mathrm{W})=\{a\},\\&\mathrm{lower}{\_}\mathrm{approx.}_\mathrm{{S,PC}}(\mathrm{I})=\{b\},\\&\qquad\qquad\qquad\qquad \mathrm{upper}{\_}\mathrm{approx.}_\mathrm{{S,PC}}(\mathrm{I})=\{b,c,d\},\\&\mathrm{lower}{\_}\mathrm{approx.}_\mathrm{{S,PC}}(\mathrm{O})=\emptyset,\\&\qquad\qquad\qquad\qquad \mathrm{upper}{\_}\mathrm{approx.}_\mathrm{{S,PC}}(\mathrm{O})=\{c,d\}.\end{aligned}$$

The quality of approximation of the classification by attributes S and PC is equal to the number of all the signs in the lower approximations divided by the number of all the signs in the table, i.e. \( { 1/2 } \).

One way to increase the quality of the approximation is to add a new attribute so as to decrease the ambiguity. Let us introduce the secondary color (SC) as a new condition attribute. The new situation is now shown in Table 2.

Table 2 Examples of traffic signs described by S, PC and SC

As one can see, the sets of signs indiscernible by S, PC and SC, i.e. the elementary sets \( { \{\cdot\}_\mathrm{{S,PC,SC}} } \), are now:

$$\{a\}_\mathrm{{S,PC,SC}}\,,\quad \{b\}_\mathrm{{S,PC,SC}}\,,\quad \{c\}_\mathrm{{S,PC,SC}}\,,\quad\{d\}_\mathrm{{S,PC,SC}}\:.$$

It is worth noting that the elementary sets are finer than before and this enables the ambiguity to be eliminated. Consequently, the quality of approximation of the classification by attributes S, PC and SC is now equal to 1.

A natural question occurring here is to ask if, indeed, all three attributes are necessary to characterize precisely the classes W, I and O. When attribute S or attribute PC is eliminated from the description of the signs, the elementary sets \( { \{\cdot\}_\mathrm{{PC,SC}} } \) or \( { \{\cdot\}_\mathrm{{S,SC}} } \) are defined, respectively, as follows:

$$\{a\}_\mathrm{{PC,SC}}\,,\quad \{b\}_\mathrm{{PC,SC}}\,,\quad \{c\}_\mathrm{{PC,SC}}\,,\quad \{d\}_\mathrm{{PC,SC}}\: ,$$
$$ \{a\}_\mathrm{{S,SC}}\,,\quad \{b,c\}_\mathrm{{S,SC}}\,,\quad \{d\}_\mathrm{{S,SC}}\:. $$

Using any one of the above families of elementary sets, it is possible to characterize (approximate) classes W, I and O with the same quality (equal to 1) as when using the elementary sets \( { \{\cdot\}_\mathrm{{S,PC,SC}} } \) (i.e. those generated by the complete set of three condition attributes). Thus, the answer to the above question is that the three condition attributes are not all necessary to characterize precisely the classes W, I and O. It is, in fact, sufficient to use either PC and SC or S and SC. The subsets of condition attributes \( { \{\mathrm{PC, SC}\} } \) and \( { \{\mathrm{S, SC}\} } \) are called reducts of \( { \{\mathrm{S, PC, SC}\} } \) because they have this property. Note that the identification of reducts enables us to reduce the description of the signs in the table to the relevant attributes only.

Other useful information can be generated from the identification of reducts by taking their intersection. This is called the core. In our example, the core contains attribute SC. This tells us that it is clearly an indispensable attribute, i.e. it cannot be eliminated from the description of the signs without decreasing the quality of the approximation. Note that the other attributes from the reducts (i.e. S and PC) are exchangeable. If there happened to be some other attributes which were not included in any reduct, then they would be superfluous, i.e. they would not be useful at all in the characterization of the classes W, I and O.

If, however, column S or PC is eliminated from Table 2, then the resulting table is not a minimal representation of knowledge about the classification of the four traffic signs. Note that, in order to characterize class W in Table 2, it is sufficient to use the condition “S = triangle”. Moreover, class I is characterized by two conditions (“S = circle” and “SC = red”) and class O is characterized by the condition “SC = white”. Thus, the minimal representation of this information system requires only four conditions (rather than the eight conditions that are present in Table 2 with either column S or PC eliminated). This representation corresponds to the following set of decision rules which may be seen as a classification model discovered in the data set contained in Table 2 (in braces there are symbols of signs covered by the corresponding rule):

$$\begin{aligned}rule\, \#1: \textit{if }\, & \mathrm{S}=\mathrm{triangle}, && \textit{then } \mathrm{Class}=\mathrm{W} \quad && \{a\}\\rule\, \#2: \textit{if }\, & \mathrm{S}=\mathrm{circle} \\\textit{and}\;\; & \mathrm{SC}=\mathrm{red}, && \textit{then } \mathrm{Class}=\mathrm{I} \quad && \{b,c\}\\rule\, \#3: \textit{if }\, & \mathrm{SC}=\mathrm{white}, && \textit{then } \mathrm{Class}=\mathrm{O} \quad && \{d\}\:.\end{aligned}$$

This is not the only representation, because an alternative set of rules is:

$$\begin{aligned}rule\, \#1^{\prime}: \textit{if }\, & \mathrm{PC}=\mathrm{yellow}, && \textit{then } \mathrm{Class}=\mathrm{W} \quad && \{a\}\\rule\, \#2^{\prime}: \textit{if }\, & \mathrm{PC}=\mathrm{white}, && \textit{then } \mathrm{Class}=\mathrm{I} \quad && \{b\}\\rule\, \#3^{\prime}: \textit{if }\, & \mathrm{PC}=\mathrm{blue},\\\textit{and}\;\; & \mathrm{SC}=\mathrm{red}, && \textit{then } \mathrm{Class}=\mathrm{I} \quad && \{c\}\\rule\, \#4^{\prime}: \textit{if }\, & \mathrm{SC}=\mathrm{white}, && \textit{then } \mathrm{Class}=\mathrm{O} \quad && \{d\}\:.\end{aligned}$$

It is interesting to come back to Table 1 and to ask what decision rules represent this information system. As the description of the four signs by S and PC is not sufficient to characterize exactly all the classes, it is not surprising that not all the rules will have an unambiguous decision. Indeed, the following decision rules can be induced:

$$ \begin{aligned} rule\, \#1^{\prime\prime}: \textit{if }\, & \mathrm{S}=\mathrm{triangle}, && \textit{then }\mathrm{Class}=\mathrm{W} && \{a\}\\ rule\, \#2^{\prime\prime}: \textit{if }\, &\mathrm{PC}=\mathrm{white}, && \textit{then } \mathrm{Class}=\mathrm{I} &&\{b\}\\ rule\, \#3^{\prime\prime}: \textit{if }\, &\mathrm{PC}=\mathrm{blue}, && \textit{then } \mathrm{Class}=\mathrm{I} \text{ or } \mathrm{O} && \{c,d\}\:.\end{aligned} $$

Note that these rules can be induced from the lower approximations of classes W and I, and from the set called the boundary of both I and O. Indeed, for exact rule \( { \#1^{\prime\prime} } \), the supporting example is in \( { \mathrm{lower}{\_}\mathrm{approx.}_\mathrm{{S,PC}}(\mathrm{W})=\{a\} } \); for exact rule \( { \#2^{\prime\prime} } \) it is in \( { \mathrm{lower}{\_}\mathrm{approx.}_\mathrm{{S,PC}}(\mathrm{I})=\{b\} } \); and the supporting examples for approximate rule \( { \#3^{\prime\prime} } \) are in the boundary of classes I and O, defined as:

$$\begin{aligned}[b]&\mathrm{boundary}_\mathrm{{S,PC}}(\mathrm{I})\\ &\quad =\mathrm{upper}{\_}\mathrm{approx.}_\mathrm{{S,PC}}(\mathrm{I})-\mathrm{lower}{\_}\mathrm{approx.}_\mathrm{{S,PC}}(\mathrm{I})\\&\quad = \{c,d\}\: ,\end{aligned}$$
$$\begin{aligned}[b]&\mathrm{boundary}_\mathrm{{S,PC}}(\mathrm{O})\\&\quad = \mathrm{upper}{\_}\mathrm{approx.}_\mathrm{{S,PC}}(\mathrm{O})-\mathrm{lower}{\_}\mathrm{approx.}_\mathrm{{S,PC}}(\mathrm{O})\\&\quad = \{c,d\}\:.\end{aligned}$$

As a result of the approximate characterization of classes W, I and O by S and PC, an approximate representation in terms of decision rules is obtained. Since the quality of the approximation is \( { 1/2 } \), the exact rules (\( { \#1^{\prime\prime} } \) and \( { \#2^{\prime\prime} } \)) cover one half of the examples and the other half is covered by the approximate rule (\( { \#3^{\prime\prime} } \)). Now, the quality of approximation by S and SC, or by PC and SC, was equal to 1, so all examples were covered by exact rules (#1 to #3, or #1' to #4', respectively).

One can see, from this simple example, that the rough set analysis of data included in an information system provides some useful information. In particular, the following results are obtained:

  • A characterization of decision classes in terms of the chosen condition attributes through lower and upper approximations.

  • A measure of the quality of approximation which indicates how good the chosen set of attributes is for approximation of the classification.

  • The reducts of condition attributes including all relevant attributes. At the same time, superfluous and exchangeable attributes are also identified.

  • The core composed of indispensable attributes.

  • A set of decision rules which is induced from the lower and upper approximations of the decision classes. This constitutes a classification model for a given information system.

Dominance-Based Rough Set Approach to Ordinal Classification Problems

Dominance-Based Rough Set Approach (DRSA)

Dominance-based Rough Set Approach (DRSA) has been proposed by the authors to handle background knowledge about ordinal evaluations of objects from a universe, and about monotonic relationships between these evaluations, e.g. “the larger the mass and the smaller the distance, the larger the gravity” or “the greater the debt of a firm, the greater its risk of failure”. Such knowledge is typical for data describing various phenomena. It is also characteristic for data concerning multiple criteria decision or decision under uncertainty, where the order of value sets of condition and decision attributes corresponds to increasing or decreasing preference. In case of multiple criteria decision, the condition attributes are called criteria.

Let us consider a decision table including a finite universe of objects (solutions, alternatives, actions) U evaluated on a finite set of criteria \( { F=\{f_1, \dots,f_n\} } \), and on a single decision attribute d. The set of the indices of criteria is denoted by \( { I=\{1, \dots, n\} } \). Without loss of generality, \( { f_i\colon U \rightarrow \Re } \) for each \( { i=1, \ldots, n } \), and, for all objects \( { x,y \in U,\, f_i(x) \ge f_i(y) } \) means that “x is at least as good as y with respect to criterion \( { f_i } \)”, which is denoted by \( { x\succeq_{i} y } \). Therefore, it is supposed that \( { \succeq_i } \) is a complete preorder, i.e. a strongly complete and transitive binary relation, defined on U on the basis of evaluations \( { f_i(\cdot) } \). Furthermore, decision attribute d makes a partition of U into a finite number of decision classes, \( { \textbf{Cl}=\{Cl_{1}, \ldots, Cl_{m}\} } \), such that each \( { x \in U } \) belongs to one and only one class \( { Cl_{t}, t=1, \ldots, m } \). It is assumed that the classes are preference ordered, i.e. for all \( { r,s=1, \ldots, m } \), such that \( { r > s } \), the objects from \( { Cl_{r} } \) are preferred to the objects from \( { Cl_{s} } \). More formally, if \( { \succeq } \) is a comprehensive weak preference relation on U, i.e. if for all \( { x,y \in U,\, x{\succeq}y } \) reads “x is at least as good as y”, then it is supposed that

$$[x{\in}Cl_{r},\, y {\in} Cl_{s},\, r > s] \Rightarrow x{\succ} y\: ,$$

where \( { x{\succ} y } \) means \( { x{\succeq} y } \) and not \( { y {\succeq} x } \).

The above assumptions are typical for consideration of an ordinal classification (or multiple criteria sorting) problem. Indeed, the decision table characterized above includes examples of ordinal classification which constitute the input preference information to be analyzed using DRSA.

The sets to be approximated are called upward union and downward union of decision classes, respectively:

$$Cl_t^\ge = \bigcup_{s \ge t} Cl_s , \quad Cl_t^\le = \bigcup_{s \le t} Cl_s ,\quad t=1, \dots,m\:.$$

The statement \( { x \in Cl_t^\ge } \) reads “x belongs to at least class \( { Cl_t } \)”, while \( { x \in Cl_t^\le } \) reads “x belongs to at most class \( { Cl_{t} } \)”. Let us remark that \( { Cl_1^\ge =Cl_m^\le =U,\, Cl_m^\ge = Cl_{m} } \) and \( { Cl_1^\le = Cl_{1} } \). Furthermore, for \( { t=2, \dots, m } \),

$$ Cl_{t - 1}^\le = U-Cl_t^\ge\quad\text{and}\quad Cl_t^\ge = U-Cl_{t - 1}^\le\:. $$
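
A small sketch may help fix these definitions (illustrative Python; the class assignment is invented). It builds the unions from a class-index function and checks the complementarity relation above:

```python
# Each object is assigned the index of its decision class (1 = worst, m = best).
cls = {"x1": 1, "x2": 2, "x3": 2, "x4": 3}
U = set(cls)
m = max(cls.values())

def upward(t):   # Cl_t^>= : objects belonging to class t or better
    return {x for x, c in cls.items() if c >= t}

def downward(t): # Cl_t^<= : objects belonging to class t or worse
    return {x for x, c in cls.items() if c <= t}

# Complementarity: Cl_t^>= = U - Cl_{t-1}^<= for t = 2, ..., m.
assert all(upward(t) == U - downward(t - 1) for t in range(2, m + 1))
print(upward(2))   # {'x2', 'x3', 'x4'}
```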

The key idea of DRSA is the representation (approximation) of upward and downward unions of decision classes by granules of knowledge generated by criteria. These granules are dominance cones in the space of criteria values.

x dominates y with respect to a set of criteria \( { P \subseteq I } \) (shortly, x P-dominates y), denoted by \( { x {D_P} y } \), if for every criterion \( { i \in P,\, f_i(x) \ge f_i(y) } \). The relation of P-dominance is reflexive and transitive, i.e. it is a partial preorder.

Given a set of criteria \( { P \subseteq I } \) and \( { x \in U } \), the granules of knowledge used for approximation in DRSA are:

  • a set of objects dominating x, called P-dominating set, \( { D_P^+ \left(x \right)=\{y \in U\colon y {D_P} x\} } \),

  • a set of objects dominated by x, called P-dominated set, \( { D_P^ - \left(x \right)=\{y \in U\colon x {D_P} y\} } \).

Let us recall that the dominance principle requires that an object x dominating object y on all considered criteria (i.e. x having evaluations at least as good as y on all considered criteria) should also dominate y on the decision (i.e. x should be assigned to at least as good a decision class as y). Objects satisfying the dominance principle are called consistent, and those which are violating the dominance principle are called inconsistent.

The P-lower approximation of \( { Cl_t^\ge }\), denoted by \( {\underline{P}\left(Cl_t^\ge\right) } \), and the P-upper approximation of \( { Cl_t^\ge }\), denoted by \( { \overline{P}\left(Cl_t^\ge\right) } \), are defined as follows \( { (t=2,\dots, m) } \):

$$\begin{aligned}\underline{P}\left(Cl_t^\ge \right)& = \left\{x \in U\colon D_P^+ (x) \subseteq Cl_t^\ge\right\}, \\\overline {P}\left(Cl_t^\ge \right)& = \left\{x \in U\colon D_P^- (x) \cap Cl_t^\ge \ne \emptyset\right\}\:.\end{aligned}$$

Analogously, one can define the P-lower approximation and the P-upperapproximation of \( { Cl_t^\le } \) as follows (\( { t=1, \dots,m-1 } \)):

$$\begin{aligned}\underline{P}\left(Cl_t^\le \right) &= \left\{x \in U\colon D_P^- (x) \subseteq Cl_t^\le\right\}, \\\overline{P}\left(Cl_t^\le \right) &= \left\{x \in U\colon D_P^+ (x) \cap Cl_t^\le \ne \emptyset\right\}\:.\end{aligned}$$
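
Under the stated definitions, the dominance cones and the approximations of upward unions can be computed directly. The following self-contained sketch (illustrative data and names; not the authors' software) also derives the quality of approximation introduced below:

```python
# Toy ordinal data: criterion evaluations and a class index (1 = worst ... m = best).
data = {
    "x1": {"f": (3, 4), "cls": 3},
    "x2": {"f": (3, 3), "cls": 2},
    "x3": {"f": (2, 3), "cls": 3},  # x2 dominates x3 yet has a worse class
    "x4": {"f": (1, 1), "cls": 1},
}
U = set(data)
P = (0, 1)  # indices of the considered criteria

def dominates(x, y, P):
    # x P-dominates y: at least as good on every criterion of P.
    return all(data[x]["f"][i] >= data[y]["f"][i] for i in P)

def D_plus(x, P):   # P-dominating set of x
    return {y for y in U if dominates(y, x, P)}

def D_minus(x, P):  # P-dominated set of x
    return {y for y in U if dominates(x, y, P)}

def upward(t):      # Cl_t^>=
    return {x for x in U if data[x]["cls"] >= t}

def lower_up(t, P):
    return {x for x in U if D_plus(x, P) <= upward(t)}

def upper_up(t, P):
    return {x for x in U if D_minus(x, P) & upward(t)}

print(lower_up(3, P))  # {'x1'}: the x2/x3 inconsistency excludes x3
print(upper_up(3, P))  # {'x1', 'x2', 'x3'}

def gamma(P):
    # Quality of approximation: share of P-consistent objects (see below).
    m = max(d["cls"] for d in data.values())
    bnd = set().union(*(upper_up(t, P) - lower_up(t, P) for t in range(2, m + 1)))
    return len(U - bnd) / len(U)

print(gamma(P))        # 0.5: x2 and x3 violate the dominance principle
```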

The P-lower and P-upper approximations so defined satisfy the following inclusion properties, for all \( { P \subseteq I } \):

$$\underline{P}\left(Cl_t^\ge \right) \subseteq Cl_t^\ge \subseteq \overline {P}\left(Cl_t^\ge\right), \quad t =2, \ldots , m\: ,$$
$$\underline{P}\left(Cl_t^\le \right) \subseteq Cl_t^\le \subseteq \overline{P}\left(Cl_t^\le\right), \quad t=1, \ldots , m-1\:.$$

The P-lower and P-upper approximations of \( { Cl_t^\ge } \) and \( { Cl_t^\le } \) have an important complementarity property, according to which,

$$\begin{aligned}&\underline{P}\left(Cl_t^\ge \right)=U-\overline{P}\left(Cl_{t - 1}^\le \right) \quad\text{and}\\&\overline{P}\left(Cl_t^\ge \right)=U-\underline{P}\left(Cl_{t - 1}^\le \right)\,,\quad t=2, \dots, m,\\&\underline{P}\left(Cl_t^\le \right)=U-\overline{P}\left(Cl_{t + 1}^\ge \right) \quad\text{and}\\&\overline{P}\left(Cl_t^\le \right)=U-\underline{P}\left(Cl_{t + 1}^\ge \right)\,,\quad t=1, \dots, m-1\:.\end{aligned}$$

The P-boundary of \( { Cl_t^\ge } \) and \( { Cl_t^\le } \), denoted by \( { Bn_{P}\left(Cl_t^\ge\right) } \)and \( { Bn_{P}\left(Cl_t^\le\right) } \), respectively, are defined as follows:

$$\begin{aligned}Bn_{P}\left(Cl_t^\ge\right)&=\overline{P}\left(Cl_t^\ge\right)-\underline{P}\left(Cl_t^\ge \right),\quad t=2, \ldots , m,\\Bn_{P}\left(Cl_t^\le\right)&=\overline{P}\left(Cl_t^\le \right)-\underline{P}\left(Cl_t^\le\right), \quad t=1, \ldots , m-1.\end{aligned}$$

Due to the above complementarity property, \( { Bn_{P}\left(Cl_t^\ge\right)= Bn_{P}\left(Cl_{t - 1}^\le\right) } \), for \( { t=2, \dots,m } \).

For every \( { P \subseteq C } \), the quality of approximation of the ordinal classification Cl by a set of criteria P is defined as the ratio of the number of objects P-consistent with the dominance principle and the number of all the objects in U. Since the P-consistent objects are those which do not belong to any P-boundary \( { Bn_{P}\left(Cl_t^\ge\right), t =2, \ldots, m } \), or \( { Bn_{P}\left(Cl_t^\le\right), t = 1, \ldots, m-1 } \), the quality of approximation of the ordinal classification Cl by a set of criteria P can be written as

$$ \begin{aligned}[b] \gamma_{P}(\textbf{\textit{Cl}}) &= \frac{\left| U-\left(\bigcup_{t = 2, \ldots, m}Bn_{P}\left(Cl_{t}^{\ge}\right)\right)\right|}{|U|}\\ &=\frac{\left|U-\left(\bigcup_{t =1, \ldots, m-1} Bn_{P}\left(Cl_{t}^{\le}\right)\right)\right|}{|U|}\:.\end{aligned} $$

\( {\gamma_{P}(\textit{\textbf{Cl}}) } \) can be seen as a degree of consistency of the objects from U, when P is the set of criteria and Cl is theconsidered ordinal classification.

Each minimal (with respect to inclusion) subset \( { P \subseteq C } \) such that \( { \gamma_{P} (\textit{\textbf{Cl}})= \gamma_{C} (\textit{\textbf{Cl}}) } \) is called a reduct of Cl, and is denoted by \( { \text{RED}_{\textbf{\textit{Cl}}} } \). Let us remark that for a given set U one can have more than one reduct. The intersection of all reducts is called the core, and is denoted by \( { \text{CORE}_{\textit{\textbf{Cl}}} } \). Criteria in \( { \text{CORE}_{\textit{\textbf{Cl}}} } \) cannot be removed from consideration without deteriorating the quality of approximation. This means that, in set C, there are three categories of criteria:

  • indispensable criteria included in the core,

  • exchangeable criteria included in some reducts, but not in the core,

  • redundant criteria, neither indispensable nor exchangeable, and thus not included in any reduct.

The dominance-based rough approximations of upward and downward unions of decision classes can serve to induce a generalized description of objects in terms of “if …, then …” decision rules. For a given upward or downward union of classes, \( { Cl_{t}^{\ge} } \) or \( { Cl_{s}^{\le} } \), the decision rules induced under a hypothesis that objects belonging to \( { \underline {P}({Cl_{t}^{\ge}}) } \) or \( { \underline {P}\left(Cl_{s}^{\le}\right) } \) are positive examples, and all the others are negative, suggest a certain assignment to “class \( { Cl_{t} } \) or better”, or to “class \( { Cl_{s} } \) or worse”, respectively. On the other hand, the decision rules induced under a hypothesis that objects belonging to \( { \overline {P}\left(Cl_{t}^{\ge}\right) } \) or \( { \overline {P}\left(Cl_{s}^{\le}\right) } \) are positive examples, and all the others are negative, suggest a possible assignment to “class \( { Cl_{t} } \) or better”, or to “class \( { Cl_{s} } \) or worse”, respectively. Finally, the decision rules induced under a hypothesis that objects belonging to the intersection \( { \overline{P}(Cl_{s}^{\le}) \cap \overline {P}(Cl_{t}^{\ge}) } \) are positive examples, and all the others are negative, suggest an approximate assignment to some classes between \( { Cl_{s} } \) and \( { Cl_{t} } \) \( { (s<t) } \).

In the case of preference ordered description of objects, set U is composed of examples of ordinal classification. Then, it is meaningful to consider the following five types of decision rules:

  1) certain \( { D_{\ge} } \)-decision rules, providing lower profile descriptions for objects belonging to \( { \underline{P} ({Cl_{t}^{\ge}}) } \): if \( f_{i_1}(x)\ge r_{i_1} \) and … and \( { f_{i_p}(x)\ge r_{i_p} } \), then \( x \in Cl_{t}^{\ge}, \{i_1, \ldots , i_p\}\subseteq I, t=2, \ldots, m, r_{i_1}, \ldots, r_{i_p} \in \Re \);

  2) possible \( { D_{\ge} } \)-decision rules, providing lower profile descriptions for objects belonging to \( { \overline{P} ({Cl_{t}^{\ge}}) } \): if \( f_{i_1}(x)\ge r_{i_1} \) and … and \( { f_{i_p}(x)\ge r_{i_p} } \), then x possibly belongs to \( Cl_{t}^{\ge}, \{i_1, \ldots , i_p\}\subseteq I, t=2, \ldots, m, r_{i_1}, \ldots, r_{i_p} \in \Re \);

  3) certain \( { D_{\le} } \)-decision rules, providing upper profile descriptions for objects belonging to \( { \underline{P} ({Cl_{t}^{\le}}) } \): if \( f_{i_1}(x)\le r_{i_1} \) and … and \( { f_{i_p}(x)\le r_{i_p} } \), then \( x \in Cl_{t}^{\le}, \{i_1, \ldots , i_p\}\subseteq I, t=1,\ldots, m-1, r_{i_1}, \ldots, r_{i_p} \in \Re \);

  4) possible \( { D_{\le} } \)-decision rules, providing upper profile descriptions for objects belonging to \( { \overline{P} ({Cl_{t}^{\le}}) } \): if \( f_{i_1}(x)\le r_{i_1} \) and … and \( f_{i_p}(x)\le r_{i_p} \), then x possibly belongs to \( Cl_{t}^{\le}, \{i_1, \ldots , i_p\}\subseteq I, t=1, \ldots, m-1, r_{i_1}, \ldots, r_{i_p} \in \Re \);

  5) approximate \( { D_{\ge\le} } \)-decision rules, providing simultaneously lower and upper profile descriptions for objects belonging to \( { Cl_{s} \cup Cl_{s+1} \cup \ldots \cup Cl_{t} } \), without possibility of discerning to which class: if \( { f_{i_1}(x)\ge r_{i_1} } \) and … and \( { f_{i_k}(x)\ge r_{i_k} } \) and \( { f_{i_{k+1}}(x)\le r_{i_{k+1}} } \) and … and \( { f_{i_p}(x)\le r_{i_p} } \), then \( x \in Cl_{s} \cup Cl_{s+1} \cup \ldots \cup Cl_{t}, \{i_1, \ldots , i_p\}\subseteq I, s,t \in \{1, \ldots, m\}, s < t, r_{i_1}, \ldots, r_{i_p} \in \Re \).

In the premise of a \( { D_{\ge\le} } \)-decision rule, there can be “\( { f_{i}(x)\ge r_{i} } \)” and “\( { f_{i}(x)\le r^{\prime}_{i} } \)”, where \( { r_{i} \le r^{\prime}_{i} } \), for the same \( { i \in I } \). Moreover, if \( { r_{i} = r^{\prime}_{i} } \), the two conditions boil down to “\( { f_i(x) = r_{i} } \)”.

Since a decision rule is a kind of implication, a minimal rule is understood as an implication such that there is no other implication with a premise of at least the same weakness (in other words, a rule using a subset of elementary conditions and/or weaker elementary conditions) and a conclusion of at least the same strength (in other words, a \( { D_{\ge} } \)- or a \( { D_{\le} } \)-decision rule assigning objects to the same union or sub-union of classes, or a \( { D_{\ge\le} } \)-decision rule assigning objects to the same or smaller set of classes).

The rules of type 1) and 3) represent certain knowledge extracted from data (examples of ordinal classification), while the rules of type 2) and 4) represent possible knowledge; the rules of type 5) represent doubtful knowledge, because they are supported by inconsistent objects only.

Moreover, the rules of type 1) and 3) are exact if they do not cover negative examples, and they are probabilistic otherwise. In the latter case, each rule is characterized by a confidence ratio, representing the probability that an object matching the premise of the rule also matches its conclusion.

Given a certain or possible \( { D_{\ge} } \)-decision rule \( { r \equiv } \) “if \( { f_{i_1}(x)\ge r_{i_1} } \) and … and \( { f_{i_p}(x)\ge r_{i_p} } \), then \( { x \in Cl_{t}^{\ge} } \)”, an object \( { y \in U } \) supports r if \( { f_{i_1}(y)\ge r_{i_1} } \) and … and \( { f_{i_p}(y)\ge r_{i_p} } \). Moreover, object \( { y \in U } \) supporting decision rule r is a base of r if \( { f_{i_1}(y) = r_{i_1} } \) and … and \( { f_{i_p}(y) = r_{i_p} } \). Similar definitions hold for certain or possible \( { D_{\le} } \)-decision rules and approximate \( { D_{\ge \le} } \)-decision rules. A decision rule having at least one base is called robust. Identification of supporting objects and bases of robust rules is important for interpretation of the rules in the multiple criteria decision analysis perspective. The ratio of the number of objects supporting a rule to the number of all considered objects is called the relative support of the rule. The relative support and the confidence ratio are basic characteristics of a rule; however, some Bayesian confirmation measures reflect the attractiveness of a rule much better [25].
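
The definitions of support, base and relative support can be checked mechanically; a minimal Python sketch with invented thresholds and evaluations:

```python
# A D_>=-rule "if f_1(x) >= 2 and f_2(x) >= 3, then x in Cl_t^>=",
# encoded as a mapping from criterion index to threshold r_i.
rule = {0: 2, 1: 3}

# Illustrative evaluations f_i(y) of three objects.
evals = {
    "y1": (2, 3),  # matches every threshold with equality -> a base of the rule
    "y2": (3, 4),  # supports the rule
    "y3": (1, 5),  # fails on criterion 0, hence does not support the rule
}

def supports(y):
    return all(evals[y][i] >= r for i, r in rule.items())

def is_base(y):
    return supports(y) and all(evals[y][i] == r for i, r in rule.items())

robust = any(map(is_base, evals))                    # the rule has a base
relative_support = sum(map(supports, evals)) / len(evals)
print(robust, relative_support)                      # True 0.666...
```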

A set of decision rules is complete if it covers all considered objects (examples of ordinal classification) in such a way that consistent objects are re-assigned to their original classes, and inconsistent objects are assigned to clusters of classes referring to this inconsistency. A set of decision rules is minimal if it is complete and non-redundant, i.e. exclusion of any rule from this set makes it incomplete.

Note that the syntax of decision rules induced from rough approximations defined using dominance cones consistently uses this type of granules. Each condition profile defines a dominance cone in the p-dimensional condition space \( { \Re^p\ (p \leq n) } \), and each decision profile defines a dominance cone in the one-dimensional decision space \( { \{1, \ldots,m\} } \). In both cases, the cones are positive for \( { D_{\ge} } \)-rules and negative for \( { D_{\le} } \)-rules.

Let us also remark that dominance cones corresponding to condition profiles can originate in any point of \( { \Re^n } \), without the risk of being too specific. Thus, contrary to traditional granular computing, the condition space \( { \Re^n } \) need not be discretized.

Procedures for rule induction from dominance-based rough approximations have been proposed in [17].

In [10], a new methodology for the induction of monotonic decision trees from dominance-based rough approximations of preference ordered decision classes has been proposed.

Application of DRSA to decision related problems goes far beyond ordinal classification. In [27], DRSA has been used for decision support involving multiple decision makers, and in [28], DRSA has been applied to case-based reasoning. The following sections present applications of DRSA to multiple criteria choice and ranking, to decision under uncertainty and to interactive multiobjective optimization. The surveys [24,26,29,51,52] include other applications of DRSA.

Example Illustrating DRSA in the Context of Ordinal Classification

This subsection presents a didactic example which illustrates the main concepts of DRSA. Let us consider the following ordinal classification problem. Students of a college must obtain an overall evaluation on the basis of their achievements in Mathematics, Physics and Literature. The three subjects are clearly criteria (condition attributes) and the comprehensive evaluation is a decision attribute. For simplicity, the value sets of the criteria and of the decision attribute are the same, and they are composed of three values: bad, medium and good. The preference order of these values is obvious. Thus, there are three preference ordered decision classes, so the problem belongs to the category of ordinal classification. In order to build a preference model of the jury, DRSA is used to analyze a set of exemplary evaluations of students provided by the jury. These examples of ordinal classification constitute an input preference information presented as decision table in Table 3.

Note that the dominance principle obviously applies to the examples of ordinal classification,since an improvement of a student's score on one of three criteria, with other scores unchanged,should not worsen the student's overall evaluation, but rather improve it.

Table 3 Exemplary evaluations of students (examples of ordinal classification)

Observe that student S1 has not worse evaluations than student S2 on all the considered criteria; however, the overall evaluation of S1 is worse than the overall evaluation of S2. This contradicts the dominance principle, so the two examples of ordinal classification, and only those, are inconsistent. The quality of approximation of the ordinal classification represented by the examples in Table 3 is equal to 0.75.

One can observe that, as a result of reducing the set of considered criteria, i.e. the set of considered subjects, some new inconsistencies can occur. For example, removing from Table 3 the evaluation on Literature, one obtains Table 4, where S1 is inconsistent not only with S2, but also with S3 and S5. In fact, student S1 has not worse evaluations than students S2, S3 and S5 on all the considered criteria (Mathematics and Physics); however, the overall evaluation of S1 is worse than the overall evaluations of S2, S3 and S5.

Table 4 Exemplary evaluations of students excluding Literature

Observe, moreover, that removing from Table 3 the evaluations on Mathematics, one obtains Table 5, where no new inconsistencies occur compared to Table 3.

Table 5 Exemplary evaluations of students excluding Mathematics

Similarly, after removing from Table 3 the evaluations on Physics, one obtains Table 6, where no new inconsistencies occur compared to Table 3.

Table 6 Exemplary evaluations of students excluding Physics

The fact that no new inconsistency occurs when Mathematics or Physics is removed means that the subsets of criteria {Physics, Literature} or {Mathematics, Literature} contain sufficient information to represent the overall evaluation of students with the same quality of approximation as using the complete set of three criteria. This is not the case, however, for the subset {Mathematics, Physics}. Observe, moreover, that subsets {Physics, Literature} and {Mathematics, Literature} are minimal, because no other criterion can be removed from them without new inconsistencies occurring. Thus, {Physics, Literature} and {Mathematics, Literature} are the reducts of the complete set of criteria {Mathematics, Physics, Literature}. Since Literature is the only criterion which cannot be removed from any reduct without introducing new inconsistencies, it constitutes the core, i.e. the set of indispensable criteria. The core is, of course, the intersection of all reducts, i.e. in our example:

$$ \{\text{Literature}\}\\ \; = \{\text{Physics, Literature}\} \cap \{\text{Mathematics, Literature}\}. $$

In order to illustrate in a simple way the concept of rough approximation, let us confine our analysis to the reduct {Mathematics, Literature}. Let us consider student S4. His positive dominance cone \( { D_{\{\text{Mathematics},\,\mathrm{Literature}\}}^+(S4) } \) is composed of all the students having evaluations not worse than his on Mathematics and Literature, i.e. of all the students dominating him with respect to Mathematics and Literature. Thus,

$$D_{\{\text{Mathematics, Literature}\}}^+(S4)=\{S4, S5, S6\}\:.$$

On the other hand, the negative dominance cone of student S4, \( { D_{\{\text{Mathematics, Literature}\}}^-(S4) } \), is composed of all the students having evaluations not better than his on Mathematics and Literature, i.e. of all the students dominated by him with respect to Mathematics and Literature. Thus,

$$D_{\{\text{Mathematics, Literature}\}}^-(S4)=\{S1, S2, S3, S4, S7, S8\}\:.$$

Similar dominance cones can be obtained for all the students from Table 6. For example, for S2, the dominance cones are

$$D_{\{\text{Mathematics, Literature}\}}^+(S2)=\{S1, S2, S3, S4, S5, S6\}$$

and

$$D_{\{\text{Mathematics, Literature}\}}^-(S2)=\{S2, S7\}\:.$$

The rough approximations can be calculated using dominance cones. Let us consider, for example, the lower approximation of the set of students having a “good” overall evaluation, \( { \underline P(Cl_{\mathrm{good}}^{\ge}) } \), with P = {Mathematics, Literature}. Notice that \( { \underline P(Cl_{\mathrm{good}}^{\ge})=\{S4, S5, S6\} } \), because the positive dominance cones of students S4, S5 and S6 are all included in the set of students with an overall evaluation “good”. In other words, this means that there is no student dominating S4 or S5 or S6 while having an overall evaluation worse than “good”. From the viewpoint of decision making, this means that, taking into account the available information about evaluation of students on Mathematics and Literature, the fact that student y dominates S4 or S5 or S6 is a sufficient condition to conclude that y is a “good” student.

The upper approximation of the set of students with a “good” overall evaluation is \( { \overline P(Cl_{\mathrm{good}}^{\ge})=\{S4, S5, S6\} } \), because the negative dominance cones of students S4, S5 and S6 have a nonempty intersection with the set of students having a “good” overall evaluation. In other words, this means that for each one of the students S4, S5 and S6, there is at least one student dominated by him with an overall evaluation “good”. From the point of view of decision making, this means that, taking into account the available information about evaluation of students on Mathematics and Literature, the fact that student y dominates S4 or S5 or S6 is a possible condition to conclude that y is a “good” student.

Let us observe that for the set of criteria P = {Mathematics, Literature}, the lower and upper approximations of the set of “good” students are the same. This means that the examples of ordinal classification concerning this decision class are all consistent. This is not the case, however, for the examples concerning the union of decision classes “at least medium”. For this upward union the rough approximations are \( { \underline P(Cl_{\mathrm{medium}}^{\ge})=\{S3, S4, S5, S6\} } \) and \( { \overline P(Cl_{\mathrm{medium}}^{\ge})=\{S1, S2, S3, S4, S5, S6\} } \). The difference between \( { \overline P(Cl_{\mathrm{medium}}^{\ge}) } \) and \( { \underline P(Cl_{\mathrm{medium}}^{\ge}) } \), i.e. the boundary \( { Bn_P(Cl_{\mathrm{medium}}^{\ge})=\{S1, S2\} } \), is composed of students with inconsistent overall evaluations, which has already been noticed above. From the viewpoint of decision making, this means that, taking into account the available information about evaluation of students on Mathematics and Literature, the fact that student y is dominated by S1 and dominates S2 is a condition to conclude that y can obtain an overall evaluation “at least medium” with some doubts.

Until now, rough approximations of only upward unions of decision classes have been considered. It is interesting, however, to calculate also rough approximations of downward unions of decision classes. Let us consider first the lower approximation of the set of students having an “at most medium” overall evaluation, \( { \underline P(Cl_{\mathrm{medium}}^{\le}) } \). Observe that \( { \underline P(Cl_{\mathrm{medium}}^{\le})=\{S1, S2, S3, S7, S8\} } \), because the negative dominance cones of students S1, S2, S3, S7, and S8 are all included in the set of students with overall evaluation “at most medium”. In other words, this means that there is no student dominated by S1 or S2 or S3 or S7 or S8 while having an overall evaluation better than “medium”. From the viewpoint of decision making, this means that, taking into account the available information about evaluation of students on Mathematics and Literature, the fact that student y is dominated by S1 or S2 or S3 or S7 or S8 is a sufficient condition to conclude that y is an “at most medium” student.

The upper approximation of the set of students with an “at most medium” overall evaluation is \( { \overline P(Cl_{\mathrm{medium}}^{\le})=\{S1, S2, S3, S7, S8\} } \), because the positive dominance cones of students S1, S2, S3, S7, and S8 have a nonempty intersection with the set of students having an “at most medium” overall evaluation. In other words, this means that for each one of the students S1, S2, S3, S7, and S8, there is at least one student dominating him with an overall evaluation “at most medium”. From the viewpoint of decision making, this means that, taking into account the available information about evaluation of students on Mathematics and Literature, the fact that student y is dominated by S1 or S2 or S3 or S7 or S8 is a possible condition to conclude that y is an “at most medium” student.

Finally, the lower and upper approximations of the set of students having a “bad” overall evaluation are \( { \underline P(Cl_{\mathrm{bad}}^{\le})=\{S7, S8\} } \) and \( { \overline P(Cl_{\mathrm{bad}}^{\le})=\{S1, S2, S7, S8\} } \). The difference between \( { \overline P(Cl_{\mathrm{bad}}^{\le}) } \) and \( { \underline P(Cl_{\mathrm{bad}}^{\le}) } \), i.e. the boundary \( { Bn_P(Cl_{\mathrm{bad}}^{\le})=\{S1, S2\} } \), is composed of students with inconsistent overall evaluations, as has already been noticed above. From the point of view of decision making, this means that, taking into account the available information about evaluation of students on Mathematics and Literature, the fact that student y is dominated by S1 and dominates S2 is a condition to conclude that y can obtain an overall evaluation “bad” with some doubts. Observe, moreover, that \( { Bn_P(Cl_{\mathrm{medium}}^{\ge})=Bn_P(Cl_{\mathrm{bad}}^{\le})=\{S1, S2\} } \).

Given the above rough approximations with respect to the set of criteria P = {Mathematics, Literature}, one can induce a set of decision rules representing the preferences of the jury. The idea is that evaluation profiles of students belonging to the lower approximations can serve as a base for some certain rules, while evaluation profiles of students belonging to the boundaries can serve as a base for some approximate rules. The following decision rules have been induced (in parentheses are the ids of the students supporting the corresponding rule; the student serving as the rule base is underlined):

rule 1):

if the evaluation on Mathematics is (at least) good, and the evaluation onLiterature is at least medium, then the overall evaluation is (at least) good, \( { \{\underline{S4},S5,S6\} } \),

rule 2):

if the evaluation on Mathematics is at least medium, and the evaluation onLiterature is at least medium, then the overall evaluation is at least medium, \( { \{\underline{S3}, S4, S5,S6\} } \),

rule 3):

if the evaluation on Mathematics is at least medium, and the evaluation onLiterature is (at most) bad, then the overall evaluation is bad or medium, \( { \{ S1, \underline{S2}\} } \),

rule 4):

if the evaluation on Mathematics is at least medium, then the overall evaluationis at least medium, \( { \{\underline{S2, S3},S4,S5, S6\} } \),

rule 5):

if the evaluation on Literature is (at most) bad, then the overall evaluation isat most medium, \( { \{\underline{S1, S2, S7}\} } \),

rule 6):

if the evaluation on Mathematics is (at most) bad, then the overall evaluation is(at most) bad, \( { \{\underline{S7,S8}\} } \).

Notice that rules 1), 2) and 4)–6) are certain, while rule 3) is an approximate one. These rules represent knowledge discovered from the available information. In the current context, the knowledge is interpreted as a preference model of the jury. A characteristic feature of the syntax of decision rules representing preferences is the use of expressions “at least” or “at most” a value; in the case of extreme values (“good” and “bad”), these expressions are put in parentheses because there is no value above “good” or below “bad”.

Even if one can represent all the knowledge using only one reduct of the set of criteria (as is the case with P = {Mathematics, Literature}), when considering a larger set of criteria than a reduct, one can obtain a more synthetic representation of knowledge, i.e. the number of decision rules or the number of elementary conditions, or both, can get smaller. For example, considering the set of all three criteria, {Mathematics, Physics, Literature}, one can induce a set of decision rules composed of the above rules 1), 2), 3) and 6), plus the following:

rule 7):

if the evaluation on Physics is at most medium, and the evaluation on Literatureis at most medium, then the overall evaluation is at most medium,\( { \{S1,S2,\underline{S3},S7,S8\} } \).

Thus, the complete set of decision rules induced from Table 3 is composed of 5 instead of6 rules.

Once accepted by the DM, these rules represent his/her preference model. Assuming that rules 1)–7) in our example represent the preference model of the jury, it can be used to evaluate new students. For example, student S9, who is “medium” in Mathematics and Physics and “good” in Literature, would be evaluated as “medium” because his profile matches the premise of rule 2), having as consequence an overall evaluation at least “medium”. The overall evaluation of S9 cannot be “good”, because his profile does not match any rule having as consequence an overall evaluation “good” (in the considered example, the only rule of this type is rule 1), whose premise is not matched by the profile of S9).
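To make this rule-application step concrete, the following minimal Python sketch matches a new student's profile against some of the certain “at least” rules induced above; the integer encoding of the ordinal grades and the function names are hypothetical, introduced only for illustration.

    GRADE = {"bad": 0, "medium": 1, "good": 2}

    # Certain "at least" rules 1), 2) and 4) from the example:
    # each rule is a pair (conditions, lower bound on the overall evaluation).
    AT_LEAST_RULES = [
        ({"Mathematics": "good", "Literature": "medium"}, "good"),      # rule 1)
        ({"Mathematics": "medium", "Literature": "medium"}, "medium"),  # rule 2)
        ({"Mathematics": "medium"}, "medium"),                          # rule 4)
    ]

    def matched_conclusions(profile):
        """Overall evaluations warranted by the matching 'at least' rules."""
        return [cls for conds, cls in AT_LEAST_RULES
                if all(GRADE[profile[c]] >= GRADE[v] for c, v in conds.items())]

    # Student S9: "medium" in Mathematics and Physics, "good" in Literature.
    s9 = {"Mathematics": "medium", "Physics": "medium", "Literature": "good"}
    print(matched_conclusions(s9))  # ['medium', 'medium']: rules 2) and 4) fire, rule 1) does not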

DRSA on a Pairwise Comparison Table for Multiple Criteria Choice and Ranking Problems

Multiple Criteria Choice and Ranking Problems

Ordinal classification decisions are based on absolute evaluations of objects on multiple criteria, while multiple criteria choice and ranking decisions are based on preference relations between objects. A decision table including examples of ordinal classification does not contain information about preference relations between objects; thus, in order to apply DRSA to multiple criteria choice and ranking problems, a different representation of the input preference information is needed.

To handle binary relations within the rough set approach, it has been proposed in [14,15] to operate on a so-called pairwise comparison table (PCT), i.e. a decision table including pairs of objects for which multiple criteria evaluations and a comprehensive preference relation are known. The PCT represents preference information provided by the DM in the form of decision examples (pairwise comparisons of objects).

Similarly to ordinal classification, decision examples concerning multiple criteria choice and ranking may be inconsistent with respect to the dominance principle; however, the interpretation of the inconsistency is different: it occurs when the preferences of a pair of objects, say \( { (a,b) } \), on all considered criteria are not weaker than the preferences of another pair of objects, say \( { (c,d) } \), on these criteria, yet the comprehensive preference of object a over object b is weaker than the comprehensive preference of object c over object d.

The Pairwise Comparison Table

Similarly to the ordinal classification, let us consider a finite set of criteria \( { F=\{f_1, \dots, f_n\} } \), the set of their indices \( { I=\{1, \dots, n\} } \) and a finite universe of objects (actions, solutions, alternatives) U. For any criterion \( { i\in I } \), let T i be a finite set of binary relations defined on U on the basis of the evaluations of objects from U with respect to the considered criterion i, such that \( { \forall (x,y) \in U\times U } \) exactly one binary relation \( { t \in T_{i} } \) is verified; \( { t \in T_{i} } \) has the meaning of a preference relation for a pair of objects on a particular criterion i. More precisely, given the value set V i of \( { i \in I } \), if \( { v^{\prime}_{i}, v^{\prime\prime}_{i} \in V_{i} } \) are the respective evaluations of \( { x,y \in U } \) on criterion i, and \( { (x,y) \in t }\), with \( { t \in T_{i} }\), then for each \( { w,z \in U }\) having the same evaluations \( { v^{\prime}_{i},v^{\prime\prime}_{i} } \) on \( { i, (w,z) \in t }\). For interesting applications it should be \( {|T_{i}| \ge 2,\,\forall i \in I } \). Furthermore, let T d be a set of binary relations defined on U, such that at most one binary relation \( { t \in T_{d} } \) is verified \( { \forall (x,y) \in U\times U;\; t \in T_{d} } \) has the meaning of a comprehensive preference relation for a pair of objects (comprehensive pairwise comparison).

The preference information provided by the DM has the form of pairwise comparisons of some reference objects from \( { B \subseteq U } \). These decision examples are presented in the pairwise comparison table (PCT), defined as the information table \( \textbf{\textit{S}}_\mathrm{{PCT}}=\langle \hat{B},F \cup \{d\}, T_{F} \cup T_{d}, g\rangle \), where \( { \hat{B} \subseteq B\times B } \) is a non-empty set of exemplary pairwise comparisons of reference objects, \( { T_{F}=\bigcup_{i \in I} {T_i} } \), d is a decision corresponding to the comprehensive pairwise comparison (comprehensive preference relation), and \( g\colon \hat{B}\times (F \cup \{d\}) \to T_{F} \cup T_{d} \) is a total function such that \( { g[(x,y),i] \in T_{i},\, \forall (x,y) \in \hat{B} } \) and \( { \forall i \in I } \), and \( g[(x,y),d] \in T_{d},\, \forall (x,y) \in \hat{B} \). It follows that for any pair of reference objects \( { (x,y) \in \hat{B} } \), one and only one binary relation \( { t \in T_{d} } \) is verified. Thus, T d induces a partition of \( { \hat{B} } \). In fact, the information table \( { \textbf{\textit{S}}_\mathrm{{PCT}} } \) can be seen as a decision table, since the set of considered criteria F and the decision d are distinguished.

It is assumed that the exemplary pairwise comparisons provided by the DM can be represented in terms of graded preferencerelations (for example “very weak preference”,“weak preference”, “strict preference”, “strong preference”, “very strong preference”) \( { P_i^h\colon \forall i \in I } \) and \( { \forall (x,y) \in U\times U } \),

$$T_{i} = {\left\{P_i^h, h \in H_{i}\right\}}\: ,$$

where H i is a particular subset of the integers and

  • \( { xP_i^h y, h > 0 } \), meansthat object x is preferred to object y by degree h with respect tocriterion i,

  • \( { xP_i^h y, h<0 } \), meansthat object x is not preferred to object y by degree h with respect tocriterion i,

  • \( { xP_i^0 y } \) means that object x is similar (asymmetrically indifferent) to object y withrespect to criterion i.

Of course, \( { \forall i \in I } \) and \( { \forall (x,y) \in U\times U } \), it holds:

$$\left[xP_i^h y, h > 0\right]\Rightarrow \left[yP_i^kx, k \le 0\right]\:.$$

The set of binary relations T d may be defined in a similar way, but \( { xP_d^hy } \) means thatobject x is comprehensively preferred to object y by degree h.

Technically, the modeling of the binary relation \( { P_i^h } \), i.e. the assessment of h, can be organized as follows:

  • first, it is observed that criterion i is a function \( { f_i\colon U \rightarrow \Re } \) increasingwith respect to the preferences on i, for each \( { i=1, \ldots, n } \),

  • then, for each \( { i=1, \ldots, n }\), it is possible to define a function \( {k_{i}\colon \Re^{2} \to \Re } \) which measures the strength of the preference (positive or negative) of x over y (e.g. \( { k_{i}[f_{i}(x),f_{i}(y)] = f_{i}(x)-f_{i}(y) } \)); it should satisfy the following properties \( {\forall x,y,z \in U } \):

    i) \( f_{i}(x) > f_{i}(y) \Leftrightarrow k_{i}[f_{i}(x),f_{i}(z)] > k_{i}[f_{i}(y), f_{i}(z)] \),

    ii) \( f_{i}(x) > f_{i}(y) \Leftrightarrow k_{i}[f_{i}(z),f_{i}(x)] < k_{i}[f_{i}(z), f_{i}(y)] \),

    iii) \( f_{i}(x)=f_{i}(y) \Leftrightarrow k_{i}[f_{i}(x), f_{i}(y)]=0 \),

  • next, the domain of k i can be divided into intervals, using a suitable set ofthresholds \( { \Delta_{i},\, \forall i \in I } \); these intervals are numbered in such a way that\( { k_{i}[f_{i}(x),f_{i}(y)]=0 } \) belongs to interval no. 0,

  • the value of h in the relation \( { xP_i^hy } \) is then equal to the number of the intervalincluding \( { k_{i}[f_{i}(x),f_{i}(y)] } \), for any \( { (x,y) \in U\times U } \).

Actually, property iii) can be relaxed in order to obtain a more general preference model which,for instance, does not satisfy preferential independence.
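As an illustration of this procedure, here is a minimal Python sketch; it assumes, purely for the example, that k_i is the plain difference f_i(x) − f_i(y) (the case mentioned above) and that the thresholds in Δ_i are symmetric positive interval bounds, which is an added assumption.

    import bisect

    def preference_degree(fx, fy, thresholds):
        """Degree h in x P_i^h y, assuming k_i = f_i(x) - f_i(y) and symmetric
        intervals delimited by the increasing positive bounds in 'thresholds'."""
        k = fx - fy
        h = bisect.bisect_right(thresholds, abs(k))  # interval number of |k|
        return h if k >= 0 else -h                   # k = 0 falls in interval no. 0

    # Example with two thresholds, so that H_i = {-2, -1, 0, 1, 2}:
    print(preference_degree(200, 192, [5, 15]))  # 1  (weak positive preference)
    print(preference_degree(180, 200, [5, 15]))  # -2 (strong negative preference)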

To simplify the presentation, let us consider a PCT where the set T d is composed of two binaryrelations defined on U:

  • x outranks y (denotation xSy or \( { (x,y) \in S } \)), where \( { (x,y) \in \hat{B} } \),

  • x does not outrank y (denotation \( { xS^{c}y } \) or \( { (x,y) \in S^{c} } \)), where \( { (x,y) \in \hat{B} } \),

and \( { S \cup S^{c}=\hat{B} } \), where “x outranks y” means “x is at least as good as y”;observe that the binary relation S is reflexive, but neither necessarily transitive norcomplete. In [8], a more general PCT was considered, where the set T d is composed ofmulti-graded binary relations defined on U.

Approximation by Means of Graded Dominance Relations

Let \( H_{P}=\bigcap_{i \in P} {H_i},\, \forall P \subseteq I\). Given \( { P \subseteq I }\) and \( h \in H_{P}, \forall(x,y) \in U\times U \) it is said that x positivelydominates y by degree h with respect to criteria from P iff \( { xP_i^{c_i} y }\) with \( { c_i \ge h,\, \forall i \in P }\). Analogously, \( \forall (x,y) \in U\times U, x\) negatively dominates y by degree h with respect to criteria from P iff \( { xP_i^{c_i} y } \) with \({ c_i \le h,\, \forall i\in P } \). Therefore, each \( { P \subseteq I } \) and \({ h \in H_{P} } \) generate two binary relations (possibly empty) on U, calledpositive P-dominance by degree h(denotation \( { D_{+P}^h }\)) and negative P-dominance by degree h (denotation \( { D_{-P}^h } \)), respectively. They satisfy the followingconditions:

  1. (P1)

    if \( { (x,y) \in D_{+P}^h }\), then \( { (x,y) \in D_{+R}^k }\) for each \( { R \subseteq P }\) and \( { k \le h }\);

  2. (P2)

    if \( { (x,y) \in D_{-P}^h }\), then \( { (x,y) \in D_{-R}^k } \) for each \( { R \subseteq P }\) and \( { k \ge h }\).

In [15], it has been proposed to approximate the outranking relation S by means of the dominance relation \( { D_{+P}^h }\). Therefore, S is considered a  rough binary relation.

The P -lower approximation of S (denotation \( { \underline{P}(S) }\)) and the P -upper approximation of S (denotation \( { \overline{P}(S) }\)) are defined, respectively, as:

$$\underline{P}(S) =\bigcup\left\{D_{+P}^h \cap \hat{B}\colon h \in H_P,\ \left(D_{+P}^h \cap \hat{B}\right)\subseteq S\right\}\: ,$$
$$\overline{P}(S) =\bigcap\left\{D_{+P}^h \cap \hat{B}\colon h \in H_P,\ \left(D_{+P}^h \cap \hat{B}\right)\supseteq S\right\}\:.$$

\( { \underline{P}(S) } \) may be interpreted as the dominance relation \( { D_{+P}^h } \) havingthe largest intersection with \( { \hat{B} } \) included in the outranking relation S, and\( { \overline{P}(S) } \) as the dominance relation \( { D_{+P}^h } \) including S and having the smallestintersection with \( { \hat{B} } \).

Analogously, it is possible to approximate the relation S c by means of the dominance relation \( { D_{-P}^h } \). Observe that, in general, the definitions of the approximations of S and S c do not satisfy the condition of complementarity, i.e. it is not true, in general, that \( { \underline{P}(S) } \) is equal to \( { \hat{B}-\overline{P}(S^{c}) } \) and that \( { \underline{P}(S^{c}) } \) is equal to \( { \hat{B}-\overline{P}(S) } \). This is because S and S c are approximated using two different relations, \( { D_{+P}^h } \) and \( { D_{-P}^h } \), respectively. Nevertheless, the approximations thus obtained constitute a good basis for the generation of simple decision rules.
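A minimal Python sketch of these approximations follows; the dictionary encoding of the PCT is hypothetical, and the convention that the upper approximation defaults to B̂ when no dominance cone includes S is an added assumption.

    def pos_dominance(pct, P, h):
        """D_{+P}^h restricted to B-hat: pairs whose degree is >= h on every i in P.
        pct maps (x, y) -> {'degrees': {i: h_i}, 'S': True/False}."""
        return {xy for xy, row in pct.items()
                if all(row["degrees"][i] >= h for i in P)}

    def lower_upper_S(pct, P, H_P):
        S = {xy for xy, row in pct.items() if row["S"]}
        lower, upper = set(), set(pct)
        for h in H_P:
            cone = pos_dominance(pct, P, h)
            if cone <= S:   # cone included in S contributes to the lower approximation
                lower |= cone
            if cone >= S:   # cone including S is intersected into the upper approximation
                upper &= cone
        return lower, upper

    # Toy PCT with a single criterion (i = 1) and degrees in H_1 = {-1, 0, 1}:
    pct = {("a", "b"): {"degrees": {1: 1},  "S": True},
           ("a", "a"): {"degrees": {1: 0},  "S": True},
           ("b", "a"): {"degrees": {1: -1}, "S": False}}
    print(lower_upper_S(pct, P=[1], H_P=[-1, 0, 1]))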

Decision Rules

It is possible to represent preferences of the DM revealed in terms of exemplary pairwisecomparisons contained in a given PCT, using decision rules. Since approximations of S and S cwere made using graded dominance relations, it is possible to induce decision rules beingpropositions of the following type:

  • \( { D_{+ +} } \)-decision rule: if \( { x D_{+P}^hy } \), then xSy,

  • \( { D_{+ -} } \)-decision rule: if not \( { x D_{+P}^h y } \), then \( { xS^{c}y } \),

  • \( { D_{- +} } \)-decision rule: if not \( { x D_{-P}^h y } \), then xSy,

  • \( { D_{- -} } \)-decision rule: if \( { x D_{-P}^h y } \), then \( { xS^{c}y } \),

where P is a non-empty subset of I. Therefore, for example, a \( { D_{+ +} } \)-decision rule is a proposition of the type: “if x positively dominates y by degree h with respect to criteria from P, then x outranks y”.

A constructive definition of these rules may be given: each rule is a kind of implication supported by the existence of at least one pair of objects from \( { \hat{B} } \) satisfying the corresponding proposition listed above, and by the absence of pairs from \( { \hat{B} } \) contradicting it. Thus, for example, if

  • there exists at least one pair \( { (w,z) \in \hat{B} } \) such that \( { w D_{+P}^h z } \) and wSz, and

  • there does not exist any pair \( { (v,u) \in \hat{B} } \) such that \( { v D_{+P}^h u } \) and \( { vS^{c}u } \),

then “if \( { x D_{+P}^h y } \), then xSy” is accepted as a \( { D_{++} } \)-decision rule.

A \( { D_{+ +} } \)-decision rule “if \( { x D_{+P}^h y }\), then xSy” is said to be minimal if there does not exist any other rule “if \( { x D_{+R}^k y }\), then xSy” such that \( { R \subseteq P } \) and \( { k \le h } \). Analogous definitions hold for the other cases. In other words, a minimal decision rule is a kind of implication for which there is no other implication whose premise is at least as weak and whose conclusion is at least as strong.

The following results show connections of the decision rules with the P-lower and P-upperapproximations of S and S c [15]:

  • \( { D_{+ +} } \)-minimal decision rule “if \( { x D_{+P}^h y } \), then xSy” issupported by pairs of objects belonging to \( \underline{P}(S)=D_{+ P}^h \cap \hat{B} \),

  • \( { D_{- -} } \)-minimal decision rule “if \( { x D_{-P}^h y } \), then \( { xS^{c}y } \)” issupported by pairs of objects belonging to \( \underline{P}(S^{c})=D_{- P}^h \cap \hat{B} \),

  • \( { D_{+ -} } \)-minimal decision rule “if not \( { x D_{+P}^h y } \), then \( { xS^{c}y } \)” is supported by pairs of objects belonging to \( \overline{P}(S)=D_{+ P}^h \cap \hat{B} \),

  • \( { D_{- +} } \)-minimal decision rule “if not \( { x D_{-P}^h y } \), then xSy” is supported by pairs of objects belonging to \( \overline{P}(S^{c})=D_{- P}^h \cap \hat{B} \).

Application of the Decision Rules and Final Recommendation

In order to obtain a recommendation in the multiple criteria choice or ranking problems with respect to a set of objects \( { M \subseteq U } \), the decision rules induced from theapproximations of S and S c (defined with respect to reference objects from B) should be applied on set\( { M\times M } \). The application of the rules toany pair of objects \( { (u,v) \in M\times M } \)establishes the presence \( { (uSv) } \) or theabsence \( { (uS^{c}v) } \) of outranking withrespect to \( { (u,v) } \). More precisely,

  • from \( { D_{+ +} }\)-decision rule “if \( { x D_{+P}^h y } \) then xSy” and from \( { u D_{+P}^h v }\), one concludes uSv,

  • from \( { D_{+ -} }\)-decision rule “if not \( { x D_{+P}^h y } \) then \( { xS^{c}y } \)” and from not \( { u D_{+P}^h v } \), one concludes \( { uS^{c}v } \),

  • from \( { D_{- +} }\)-decision rule “if not \( { x D_{-P}^h y } \), then xSy” and from not \( { uD_{-P}^h v } \), one concludes uSv,

  • from \( { D_{- -} }\)-decision rule “if \( { x D_{-P}^h y } \), then \( { xS^{c}y } \)” and from \( { u D_{-P}^h v } \), one concludes \( { uS^{c}v } \).

After the application of the decision rules to each pair of objects \( { (u,v) \in M\times M } \), one ofthe following four situations may occur:

  • uSv and not \( { uS^{c}v } \), that is true outranking (denotation \( { uS^{\text{T}}v } \)),

  • \( { uS^{c}v } \) and not uSv, that is false outranking (denotation \( { uS^{\text{F}}v } \)),

  • uSv and \( { uS^{c}v } \), that is contradictory outranking (denotation \( { uS^{\text{K}}v } \)),

  • not uSv and not \( { uS^{c}v } \), that is unknown outranking (denotation \( { uS^{\text{U}}v } \)).

The four above situations, which together constitute the so-called four-valued outranking(see [56]), have been introduced to underline the presence and the absenceof positive and negative reasons for the outranking. Moreover, they make itpossible to distinguish contradictory situations from unknown ones.

The following theorem underlines the operational importance of the minimal decisionrules [15]: The application of all the decision rules obtained for a given\( { \textbf{\textit {S}}_\mathrm{{PCT}} } \) to a pair of objects \( { (u,v) \in M\times M } \) results in the sameoutranking relations S and S c as those obtained from the application of the minimaldecision rules only. Therefore, the set of the minimal decision rules totallycharacterizes the preferences of the DM contained in \( { \textbf{\textit{S}}_\mathrm{{PCT}} } \).

A final recommendation can be obtained upon a suitable exploitation of the presence andthe absence of outranking S and S c on M. A possible exploitation procedure consists incalculating a specific score, called Net Flow Score, for each object \( { x \in M } \):

$$S_\mathrm{{nf}}(x) = S^{+ +}(x) - S^{+ -}(x) + S^{- +}(x) - S^{- -}(x)\: ,$$

where

$$\begin{aligned}
S^{+ +}(x) &= \vert \{y \in M\colon \text{there is at least one decision rule which affirms } xSy\}\vert \: ,\\
S^{+ -}(x) &= \vert \{y \in M\colon \text{there is at least one decision rule which affirms } ySx\}\vert\: ,\\
S^{- +}(x) &= \vert \{y \in M\colon \text{there is at least one decision rule which affirms } yS^{c}x\}\vert\: ,\\
S^{- -}(x) &= \vert \{y \in M\colon \text{there is at least one decision rule which affirms } xS^{c}y\}\vert\:.
\end{aligned}$$

The recommendation in multiple criteria ranking problems consists of the total preorder determined by \( { S_\mathrm{{nf}}(x) } \) on M; in multiple criteriachoice problems it consists of the object(s) \( { x^* \in M }\) such that \( S_\mathrm{{nf}}(x^*)=\mathop {\max}_{x \in M} S_\mathrm{{nf}}(x) \).
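This exploitation step can be sketched in a few lines of Python; the predicate affirms(u, v, rel) is hypothetical and stands for “at least one induced decision rule affirms u rel v”.

    def net_flow_scores(M, affirms):
        """Net Flow Score S_nf(x) = S++(x) - S+-(x) + S-+(x) - S--(x)."""
        scores = {}
        for x in M:
            spp = sum(1 for y in M if affirms(x, y, "S"))   # rules affirming xSy
            spm = sum(1 for y in M if affirms(y, x, "S"))   # rules affirming ySx
            smp = sum(1 for y in M if affirms(y, x, "Sc"))  # rules affirming y Sc x
            smm = sum(1 for y in M if affirms(x, y, "Sc"))  # rules affirming x Sc y
            scores[x] = spp - spm + smp - smm
        return scores

    # Toy usage: ranking by decreasing score; choice is the maximal score.
    AFFIRMED = {("a", "b", "S"), ("b", "a", "Sc")}
    scores = net_flow_scores(["a", "b"], lambda u, v, rel: (u, v, rel) in AFFIRMED)
    print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'b']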

The procedure described above has been characterized with reference to a number of desirableproperties in [13].

Approximation by Means of Multi-graded Dominance Relations

The graded dominance relation introduced above assumes a common grade of preference for all theconsidered criteria. While this permits a simple calculation of the approximations and of theresulting decision rules, it is lacking in precision. A dominance relation allowing a differentdegree of preference for each considered criterion (multi-graded dominance ) gives a far moreaccurate picture of the preference information contained in the pairwise comparison table\( { \textbf{\textit{S}}_\mathrm{{PCT}} } \) [14,16,18].

More formally, given \( P \subseteq I\, (P \ne \emptyset), (x,y),(w,z)\in U\times U, (x,y) \) is said to dominate \( { (w,z) } \) with respect to criteria from P (denotation \( { (x,y)D_{P}(w,z) } \)) if x is preferred to y at least as strongly as w is preferred to z with respect to each \( { i \in P} \). Precisely, “at least as strongly as” means “by at least the same degree”, i.e. \( { h_i \ge k_i } \), where \( h_i, k_i \in H_{i},\, xP_i^{h_i}y \) and \( wP_i^{k_i}z, \forall i \in P\). Let \( { D_{\{i\}} }\) be the dominance relation confined to the single criterion \( { i \in P } \). The binary relation \( { D_{\{i\}} } \) is reflexive \( ((x,y)D_{\{i\}}(x,y), \forall (x,y) \in U \times U) \), transitive \( { ((x,y)D_{\{i\}}(w,z) } \) and \( { (w,z)D_{\{i\}}(u,v) } \) imply \( (x,y)D_{\{i\}}(u,v), \forall (x,y),(w,z),(u,v) \in U\times U)\), and complete \( { ((x,y)D_{\{i\}}(w,z) }\) or \( (w,z)D_{\{i\}}(x,y),\forall (x,y), (w,z) \in U\times U) \). Therefore, \( { D_{\{i\}} } \) is a complete preorder on \( { U\times U } \). Since the intersection of complete preorders is a partial preorder and \( D_{P}=\bigcap_{i \in P} {D_{\{i\}}}, P \subseteq I\), the dominance relation D P is a partial preorder on \( { U\times U }\).

Let \( { R \subseteq P \subseteq I } \) and \( { (x,y),(u,v) \in U \times U } \); then the following implicationholds:

$$(x,y)D_{P}(u,v) \Rightarrow (x,y)D_{R}(u,v)\:.$$

Given \( { P \subseteq I } \) and \( { (x,y) \in U \times U } \), let us introduce the positive dominance set (denotation \( { D_P^+(x,y) } \)) and the negative dominance set (denotation \( { D_P^-(x,y) } \)):

$$D_P^+(x,y) = \{(w,z)\in U\times U\colon (w,z)D_{P}(x,y)\}\: ,$$
$$ D_P^-(x,y) = \{(w,z) \in U\times U\colon (x,y)D_{P}(w,z)\}\:. $$

Using the dominance relation D P , it is possible to define the P-lower and P-upper approximations of the outranking relation S with respect to \( { P \subseteq I } \), respectively, as:

$$\underline{P}(S) = \left\{(x,y) \in \hat{B}\colon D_P^+ (x,y) \subseteq S\right\}\: ,$$
$$\overline{P}(S) =\bigcup_{(x,y) \in S} {D_P^+ (x,y)}\:.$$

Analogously, it is possible to define the approximations of S c:

$$\underline{P}(S^{c})=\left\{(x,y) \in \hat{B}\colon D_P^- (x,y) \subseteq S^{c}\right\}\: ,$$
$$\overline{P}(S^{c})=\bigcup_{(x,y) \in S^c} {D_P^- (x,y)}\:.$$

It may be proved that

$$\underline{P}(S) \subseteq S \subseteq \overline{P}(S)\:$$
$$\underline{P}(S^{c}) \subseteq S^{c} \subseteq \overline{P} (S^{c})\:.$$

Furthermore, the following complementarity properties hold:

$$\underline{P}(S) = \hat{B}-\overline{P} (S^{c}), \quad \overline{P}(S)=\hat{B}-\underline{P}(S^{c})\: ,$$
$$\underline{P}(S^{c})=\hat{B}-\overline{P}(S), \quad \overline{P}(S^{c})=\hat{B}-\underline{P}(S)\:.$$

The P-boundaries (P-doubtful regions) of S and S c are defined as

$$Bn_{P}(S)=\overline{P}(S)-\underline{P}(S),\quad Bn_{P}(S^{c})=\overline{P}(S^{c})-\underline{P}(S^{c})\:.$$

It is easy to prove that \( { Bn_{P}(S)=Bn_{P}(S^{c}) } \).
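A Python sketch of these definitions is given below; pairs of B̂ are encoded, hypothetically, by their vectors of preference degrees, and the dominance sets are computed within B̂ only (the definitions above range over U×U).

    def dominates(deg1, deg2, P):
        """(x,y) D_P (w,z): degree at least as high on every criterion in P."""
        return all(deg1[i] >= deg2[i] for i in P)

    def approximations(pct, P):
        """Lower/upper approximations of S and Sc by the dominance relation D_P.
        pct maps (x, y) -> {'degrees': {i: h_i}, 'S': True/False}."""
        B = list(pct)
        S  = {xy for xy in B if pct[xy]["S"]}
        Sc = {xy for xy in B if not pct[xy]["S"]}
        d_plus  = {xy: {wz for wz in B
                        if dominates(pct[wz]["degrees"], pct[xy]["degrees"], P)}
                   for xy in B}
        d_minus = {xy: {wz for wz in B
                        if dominates(pct[xy]["degrees"], pct[wz]["degrees"], P)}
                   for xy in B}
        lower_S  = {xy for xy in B if d_plus[xy] <= S}
        upper_S  = set().union(*(d_plus[xy] for xy in S)) if S else set()
        lower_Sc = {xy for xy in B if d_minus[xy] <= Sc}
        upper_Sc = set().union(*(d_minus[xy] for xy in Sc)) if Sc else set()
        return lower_S, upper_S, lower_Sc, upper_Sc

On this encoding the boundary Bn_P(S) is simply the set difference of the upper and lower approximations of S, and the complementarity properties stated above can be checked directly.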

The concepts of quality of approximation, reducts and core can be extended also to theapproximation of the outranking relation by multi-graded dominance relations. In particular,

$$\gamma_P = \frac{\left| {\underline{P}(S) \cup \underline{P}(S^c)} \right|}{\left| \hat{B}\right|}\:$$

defines the quality of approximation of S and S c by \( { P \subseteq I } \). It expresses the ratio of all pairs of objects \( { (x,y) \in \hat{B} } \) correctly assigned to S and S c by the set P of criteria, to all the pairs of objects contained in \( { \hat{B} } \). Each minimal subset \( { P^{\prime} \subseteq P } \) such that \( { \gamma_{P^{\prime}} =\gamma_P } \) is called a reduct of P (denotation \( { RED_{PCT}(P) } \)). Let us remark that \( { \textbf{\textit{S}}_\mathrm{{PCT}} } \) can have more than one reduct. The intersection of all reducts is called the core (denotation \( { CORE_{PCT}(P) } \)).
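Given the approximations function sketched above, the quality of approximation and a brute-force search for reducts can be written as follows; the search is exponential in |I|, which is harmless for tables of the size used in this article.

    from itertools import combinations

    def quality(pct, P):
        """gamma_P: share of pairs of B-hat assigned without doubt to S or Sc."""
        lower_S, _, lower_Sc, _ = approximations(pct, P)
        return len(lower_S | lower_Sc) / len(pct)

    def reducts(pct, I):
        """All minimal P' included in I preserving gamma_I (brute force)."""
        g_full, found = quality(pct, list(I)), []
        for r in range(1, len(I) + 1):
            for P in combinations(I, r):
                if not any(set(Q) <= set(P) for Q in found) \
                   and quality(pct, list(P)) == g_full:
                    found.append(P)
        return found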

Using the approximations defined above, it is then possible to induce a generalized description of the preference information contained in a given \( { \textbf{\textit{S}}_\mathrm{{PCT}} } \) in terms of suitable decision rules. The syntax of these rules is based on the concept of upward cumulated preferences (denotation \( { P_i^{\ge h} } \)) and downward cumulated preferences (denotation \( { P_i^{\le h} } \)), having the following interpretation:

  • \( { xP_i^{\ge h} y } \) means “x is preferred to y with respect to i by at leastdegree h”,

  • \( { xP_i^{\le h} y } \) means “x is preferred to y with respect to i by at mostdegree h”.

The exact definition of the cumulated preferences, for each \( { (x,y) \in U\times U, i \in I } \) and \( { h \in H_{i} } \), is the following:

  • \( { xP_i^{\ge h} y } \) if \( { xP_i^k y } \), where \( { k\in H_{i} } \) and \( { k \ge h } \),

  • \( { xP_i^{\le h} y } \) if \( { xP_i^k y } \), where \( { k\in H_{i} } \) and \( { k \le h } \).

Using the above concepts, three types of decision rules can be obtained:

  1. 1.

    \( { D_{\ge} } \) – decision rules, being statements of the type:

    $$\textit{if } xP_{i1}^{\ge h(i1)} y \textit{ and } xP_{i2}^{\ge h(i2)} y \textit{ and } \dots xP_{ip}^{\ge h(ip)} y, \textit{ then } xSy\:,$$

    where \( { P=\{i1, \dots, ip\} \subseteq I } \) and \( (h(i1), \dots,h(ip)) \in H_{i1}\times \dots \times H_{ip} \); these rules are supported by pairs of objects from the P-lower approximation of S only;

  2. 2.

    \( { D_{\le} } \) – decision rules, being statements of the type:

    $$\textit{if } xP_{i1}^{\le h(i1)} y \textit{ and } xP_{i2}^{\le h(i2)}y \textit{ and } \dots xP_{ip}^{\le h(ip)} y, \textit{ then } xS^{c}y\:,$$

    where \( { P=\{i1, \dots, ip\} \subseteq I } \) and \( (h(i1), \dots,h(ip)) \in H_{i1}\times \dots \times H_{ip} \); these rules are supported by pairs of objects from the P-lower approximation of S c only;

  3. 3.

    \( { D_{\ge \le} } \) – decision rules, being statements of the type:

    $$\begin{aligned}[b]\textit{if } & xP_{i1}^{\ge h(i1)} y \textit{ and } xP_{i2}^{\ge h(i2)} y \textit{ and } \dots xP_{ik}^{\ge h(ik)} y \;\; \textit{and }\\&xP_{ik + 1}^{\le h(ik + 1)} y \textit{ and } \dots xP_{ip}^{\le h(ip)}y, \textit{ then } xSy \textit{ or } xS^{c}y\:,\end{aligned}$$

    where \( O^{\prime}=\{i1, \dots,ik\} \subseteq I, O^{\prime\prime}=\{ik+1, \dots,ip\} \subseteq I, P=O^{\prime} \cup O^{\prime\prime}, O^{\prime} \) and \( { O^{\prime\prime} } \) not necessarily disjoint,\( (h(i1), \dots,h(ip)) \in H_{i1}\times H_{i2} \dots \times H_{ip} \); these rules are supported byobjects from the P-boundary of S and S c only.

Dominance Without Degrees of Preference

The degree of graded preference considered in Subsect. “The Pairwise Comparison Table” is defined on a quantitative scale of the strength of preference \( { k_{i}, i \in I } \). However, in many real-world problems, the existence of such a quantitative scale is rather questionable. Roy [45] distinguishes the following cases:

  • Preferences expressed on an ordinal scale : this is the case where the difference between two evaluations has no clearmeaning;

  • Preferences expressed on a quantitative scale : this is the case where the scale is defined with reference toa unit clearly identified, such that it is meaningful to consider an origin (zero) of the scale and ratios between evaluations (ratioscale);

  • Preferences expressed on a numerical non-quantitative scale : this is an intermediate case between the previous two; there are two well-knownparticular cases:

    • Interval scale, where it is meaningful to compare ratios between differences of pairs ofevaluations,

    • Scale for which a complete preorder can be defined on all possible pairs ofevaluations.

The preference scale has also been considered within economic theory (e.g. [47]), where cardinal utility is distinguished from ordinal utility: the former deals with a strength of preference, while, for the latter, this concept is meaningless. From this point of view, preferences expressed on an ordinal scale refer to ordinal utility, while preferences expressed on a quantitative scale or a numerical non-quantitative scale deal with cardinal utility.

The strength of preference k i and, therefore, the graded preference considered in Subsect. “Approximation by Means of Multi-graded Dominance Relations”, is meaningful when the scale is quantitative or numerical non-quantitative. If the information about k i is not available, then it is possible to define a rough approximation of S and S c using a specific dominance relation between pairs of objects from \( { U\times U } \), defined on an ordinal scale represented by evaluations \( { f_{i}(x) } \) on criterion i, for \( { x \in U }\) [14]. Let us explain this latter case in more detail.

Let I O be the set of criteria expressing preferences on an ordinal scale, and I N the set of criteria expressing preferences on a quantitative scale or a numerical non-quantitative scale, such that \( { I^{\mathrm{O}} \cup I^{\mathrm{N}}=I } \) and \( { I^{\mathrm{O}} \cap I^{\mathrm{N}}=\emptyset } \). Moreover, for each \( { P \subseteq I,\, P^{\mathrm{O}} } \) denotes the subset of P composed of criteria expressing preferences on an ordinal scale, i.e. \( { P^{\mathrm{O}}=P \cap I^{\mathrm{O}} } \), and P N the subset of P composed of criteria expressing preferences on a quantitative scale or a numerical non-quantitative scale, i.e. \( { P^{\mathrm{N}}=P \cap I^{\mathrm{N}} } \). Of course, for each \( { P \subseteq I, P=P^{\mathrm{N}} \cup P^{\mathrm{O}} } \) and \( { P^{\mathrm{O}} \cap P^{\mathrm{N}}=\emptyset } \).

If \( { P=P^{\mathrm{N}} } \) and \( { P^{\mathrm{O}}=\emptyset } \), then the definition of dominance is the same as in the case of multi-graded dominance (Subsect. “Approximation by Means of Multi-graded Dominance Relations”). If \( { P=P^{\mathrm{O}} } \) and \( { P^{\mathrm{N}}=\emptyset } \), then, given \( { (x,y),(w,z) \in U\times U } \), the pair \( { (x,y) } \) is said to dominate the pair \( { (w,z) } \) with respect to P if, for each \( i \in P, f_{i}(x) \ge f_{i}(w) \) and \( { f_{i}(z) \ge f_{i}(y) } \). Let \( { D_{\{i\}} } \) be the dominance relation confined to the single criterion \( { i \in P^{\mathrm{O}} } \). The binary relation \( { D_{\{i\}} } \) is reflexive \( ((x,y)D_{\{i\}}(x,y),\, \forall (x,y) \in U\times U) \), transitive \( ((x,y)D_{\{i\}}(w,z) \) and \( { (w,z)D_{\{i\}}(u,v) } \) imply \( (x,y)D_{\{i\}}(u,v),\, \forall (x,y),(w,z),(u,v) \in U\times U) \), but non-complete (it is possible that not \( { (x,y)D_{\{i\}}(w,z) } \) and not \( { (w,z)D_{\{i\}}(x,y) } \) for some \( { (x,y),(w,z) \in U\times U } \)). Therefore, \( { D_{\{i\}} } \) is a partial preorder. Since the intersection of partial preorders is also a partial preorder and \( D_{P}=\bigcap_{i \in P} {D_{\{i\}}},\, P=P^{\mathrm{O}} \), the dominance relation D P is also a partial preorder. If some criteria from \( { P \subseteq I } \) express preferences on a quantitative or a numerical non-quantitative scale and others on an ordinal scale, i.e. if \( { P^{\mathrm{N}} \ne \emptyset } \) and \( { P^{\mathrm{O}} \ne \emptyset } \), then, given \( { (x,y),(w,z) \in U\times U } \), the pair \( { (x,y) } \) is said to dominate the pair \( { (w,z) } \) with respect to criteria from P, if \( { (x,y) } \) dominates \( { (w,z) } \) with respect to both P N and P O. Since the dominance relation with respect to P N is a partial preorder on \( { U \times U } \) (because it is a multi-graded dominance) and the dominance with respect to P O is also a partial preorder on \( { U \times U } \) (as explained above), the dominance D P , being the intersection of these two dominance relations, is also a partial preorder. In consequence, all the concepts introduced in the previous subsection can be restored using this specific definition of the dominance relation.
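The mixed definition of dominance can be sketched as follows; the encodings are hypothetical: preference degrees for the criteria in P^N, and raw ordinal evaluations, coded numerically in a preference-increasing way, for those in P^O.

    def dominates_mixed(xy, wz, degrees, evals, P_N, P_O):
        """(x,y) D_P (w,z) for P = P_N together with P_O.

        degrees[(x, y)][i]: degree h of x over y on cardinal criterion i in P_N;
        evals[x][i]: numerically coded ordinal evaluation f_i(x) for i in P_O."""
        x, y = xy
        w, z = wz
        cardinal = all(degrees[xy][i] >= degrees[wz][i] for i in P_N)
        ordinal  = all(evals[x][i] >= evals[w][i] and evals[z][i] >= evals[y][i]
                       for i in P_O)
        return cardinal and ordinal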

Using the approximations of S and S c based on the dominance relation defined above, it is possible to induce a generalized description of the available preference information in terms of decision rules. These decision rules are of the same type as the rules already introduced in the previous subsection; however, the conditions on criteria from I O are expressed directly in terms of evaluations belonging to the value sets of these criteria. Let \( {F_{i}=\{f_{i}(x), x \in U\}, i \in I^{\mathrm{O}} } \). The decision rules have in this case the following syntax:

  1. 1.

    \( { D_{\ge} } \)-decision rule, being a statement of the type:

    $$\begin{aligned}[b]& \textit{if } xP_{i1}^{\ge h(i1)}y \textit{ and } \dots xP_{ie}^{\ge h(ie)}y \textit{ and } f_{ie + 1}(x) \ge r_{ie + 1}\\ &\textit{ and }f_{ie + 1}(y) \le s_{ie + 1} \textit{ and } \dots f_{ip}(x) \ge r_{ip}\\&\textit{ and } f_{ip}(y) \le s_{ip}, \textit{ then } xSy\:,\end{aligned}$$

    where \( P=\{i1, \dots,ip\} \subseteq I, P^{\mathrm{N}}=\{i1, \dots,ie\}, P^{\mathrm{O}}= \{ie+1, \dots, ip\}, (h(i1), \dots, h(ie)) \in H_{i1}\times \dots \times H_{ie} \) and \( (r_{ie + 1}, \dots,r_{ip}), (s_{ie + 1}, \dots, s_{ip}) \in F_{ie + 1}\times \dots \times F_{ip} \); these rules are supported bypairs of objects from the P-lower approximation of S only;

  2. 2.

    \( { D_{\le} } \)-decision rule, being a statement of the type:

    $$\begin{aligned}[b]& \textit{if } xP_{i1}^{\le h(i1)}y \textit{ and } \dots xP_{ie}^{\le h(ie)}y \textit{ and } f_{ie + 1}(x) \le r_{ie + 1}\\ &\textit{ and }f_{ie + 1}(y) \ge s_{ie + 1} \textit{ and } \dots f_{ip}(x) \le r_{ip}\\&\textit{ and } f_{ip}(y) \ge s_{ip}, \textit{ then } xS^{c}y\:,\end{aligned}$$

    where \( P=\{i1, \dots,ip\} \subseteq I, P^{\mathrm{N}}=\{i1, \dots, ie\}, P^{\mathrm{O}}=\{ie+1, \dots,ip\}, (h(i1), \dots, h(ie)) \in H_{i1}\times \dots \times H_{ie} \) and \( { (r_{ie + 1}, \dots,r_{ip}) } \),\( (s_{ie + 1}, \dots,s_{ip}) \in F_{ie + 1}\times \dots \times F_{ip} \); these rules are supportedby pairs of objects from the P-lower approximation of S c only;

  3. 3.

    \( { D_{\ge \le} } \)-decision rule, being a statement of the type:

    $$\begin{aligned}[b]& \textit{if } x P_{i 1}^{\ge h(i 1)}y \textit{ and } \dots x P_{ie}^{\ge h(i e)}y \textit{ and } x P_{i e+1}^{\le h(i e+1)}y \dots\\&\;\;x P_{if}^{\le h(if)}y \textit{ and } f_{if + 1}(x) \ge r_{if + 1}\\&\textit{ and } f_{if + 1}(y) \le s_{if + 1} \textit{ and } \dots f_{ig}(x) \ge r_{ig}\\& \textit{ and } f_{ig}(y) \le s_{ig} \textit{ and } f_{ig + 1}(x)\le r_{ig + 1}\\& \textit{ and } f_{ig + 1}(y) \ge s_{ig + 1} \textit{ and }\dots f_{ip}(x) \le r_{ip}\\ \textit{ and } & f_{ip}(y) \ge s_{ip},\textit{ then } xSy \textit{ or } xS^{c}y\:,\end{aligned}$$

    where \( O^{\prime}=\{i1, \dots,ie\} \subseteq I, O^{\prime\prime}=\{ie+1, \dots,if\} \subseteq I,\, P^{\mathrm{N}}=O^{\prime} \cup O^{\prime\prime}, O^{\prime} \) and \( { O^{\prime\prime} } \) not necessarilydisjoint, \( P^{\mathrm{O}}=\{if+1, \dots,ip\}, (h(i1), \dots,h(if)) \in H_{i1}\times \dots \times H_{if} \)and \( (r_{if + 1}, \dots,r_{ip}),(s_{if + 1}, \dots,s_{ip}) \in F_{if + 1}\times \dots \times F_{ip} \); these rules are supported by pairs of objects from the P-boundary of S and S conly.

Example Illustrating DRSA in the Context of Multiple Criteria Choice and Ranking

The following example illustrates DRSA in the context of multiple criteria choice and ranking. Sixwarehouses have been described by means of four criteria:

  • f 1, capacity of the sales staff,

  • f 2, perceived quality of goods,

  • f 3, high traffic location,

  • f 4, warehouse profit or loss.

The components of the information table S are: \( U=\{1,2,3,4,5,6\}, F=\{f_{1},f_{2},f_{3},f_{4}\}, I=\{1,2,3,4\}, F_{1}=\{\text{high, medium, low}\}, F_{2}=\{\text{good, medium}\}, F_{3}=\{\text{no, yes}\}, F_{4}=\{\text{profit, loss}\} \); the evaluations \( { f_i(x) } \) are given in Table 7, e.g. \( { f_1(1)=\text{high} } \), \( { f_2(1)=\text{good} } \), and so on.

Table 7 Information table of the illustrative example

It is assumed that the DM agrees to express preferences with respect to criteria \( { f_{1}, f_{2}, f_{3} } \) on a numerical non-quantitative scale, for which a complete preorder can be defined on all possible pairs of evaluations. According to this assumption, in order to build the PCT, as described in Subsect. “Approximation by Means of Multi-graded Dominance Relations”, the DM specifies sets of possible degrees of preference; for example, \( { H_{1} =\{-2,-1,0,1,2\}, H_{2} =\{-1,0,1\}, H_{3} =\{-1,0,1\} } \). Therefore, with respect to f 1, there are the following preference relations \( { P_1^h } \):

  • \( { xP_1^2y } \) (and \( { yP_1^{-2}x } \)), meaning that x is preferred to y with respect to f 1, if \( { f_1(x)=\text{high} } \) and \( { f_1(y)=\text{low} } \),

  • \( { xP_1^1y } \) (and \( { yP_1^{-1}x } \)), meaning that x is weakly preferred to y with respect to f 1, if \( { f_1(x)=\text{high} } \) and \( f_1(y)=\text{medium} \), or \( { f_1(x)=\text{medium} } \) and \( { f_1(y)=\text{low} } \),

  • \( { xP_1^0y } \) (and \( { yP_1^{0}x } \)), meaning that x is indifferent to y with respect to f 1, if \( { f_1(x)=f_1(y) } \).

Analogously, with respect to f 2 and f 3, there are the following preference relations\( { P_2^h } \) and \( { P_3^h } \):

  • \( { xP_2^1y } \) (and \( { yP_2^{-1}x } \)), meaning that x is weakly preferred to y with respect to f 2, if \( { f_2(x)=\text{good} } \) and \( { f_2(y)=\text{medium} } \),

  • \( { xP_2^0y } \) (and \( { yP_2^{0}x } \)), meaning that x is indifferent to y with respect to f 2, if \( { f_2(x)=f_2(y) } \),

  • \( { xP_3^1y } \) (and \( { yP_3^{-1}x } \)), meaning that x is weakly preferred to y with respect to f 3, if \( { f_3(x)=\text{yes} } \) and \( { f_3(y)=\text{no} } \),

  • \( { xP_3^0y } \) (and \( { yP_3^{0}x } \)), meaning that x is indifferent to y with respect to f 3, if \( { f_3(x)=f_3(y) } \).

As to the comprehensive preference relation, the DM considers that, given two different warehouses \( x,y \in U=\{1,2,3,4,5,6\} \), if x makes a profit and y makes a loss, then xSy and \( { yS^{c}x } \). Moreover, the DM accepts xSx for each warehouse x. As to warehouses x and y which both make a profit or both make a loss, the DM abstains from judging whether xSy or \( { xS^{c}y } \). Therefore, the set of exemplary pairwise comparisons supplied by the DM is \( \hat{B}=\{\)(1,1), (1,2), (1,4), (1,5), (2,1), (2,2), (2,3), (2,6), (3,2), (3,3), (3,4), (3,5), (4,1), (4,3), (4,4), (4,6), (5,1), (5,3), (5,5), (5,6), (6,2), (6,4), (6,5), (6,6)}.
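The construction of B̂ can be reproduced mechanically; in the sketch below the profit/loss pattern (warehouses 1, 3 and 6 make a profit) is inferred from the pairs listed above, since Table 7 itself is not reproduced here.

    # Hypothetical encoding: True = the warehouse makes a profit.
    profit = {1: True, 2: False, 3: True, 4: False, 5: False, 6: True}

    B_hat, S, Sc = [], [], []
    for x in sorted(profit):
        for y in sorted(profit):
            if x == y:                          # xSx accepted by the DM
                B_hat.append((x, y)); S.append((x, y))
            elif profit[x] and not profit[y]:   # profit vs loss: xSy
                B_hat.append((x, y)); S.append((x, y))
            elif profit[y] and not profit[x]:   # loss vs profit: x Sc y
                B_hat.append((x, y)); Sc.append((x, y))
            # both profit or both loss (x != y): the DM abstains, pair omitted

    print(len(B_hat), len(S), len(Sc))  # 24 15 9, as in the text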

At this stage, the PCT can be built as shown in Table 8.

Table 8 Pairwise comparison table

The I-lower approximations, the I-upper approximations andthe I-boundaries of S and S c obtained by means of multi-graded dominance relations are asfollows:

  • \( \underline{I}(S) = \{ \)(1,2), (1,4), (1,5), (3,4), (6,2), (6,4), (6,5)},

  • \( \overline{I}(S)=\{ \)(1,1), (1,2), (1,4), (1,5), (2,2), (2,3), (3,2), (3,3), (3,4), (3,5), (4,4), (5,3), (5,5), (6,2), (6,4), (6,5), (6,6)},

  • \( \underline{I}(S^{c})= \{ \)(2,1), (2,6), (4,1), (4,3), (4,6), (5,1), (5,6)},

  • \( \overline{I}(S^{c}) = \{ \)(1,1), (2,1), (2,2), (2,3), (2,6), (3,2), (3,3), (3,5), (4,1), (4,3), (4,4), (4,6), (5,1), (5,3), (5,5), (5,6), (6,6)},

  • \( Bn_{I}(S) = Bn_{I}(S^{c}) = \{ \)(1,1), (2,2), (2,3), (3,2), (3,3), (3,5), (4,4), (5,3), (5,5), (6,6)}.

Therefore, the quality of approximation is equal to 0.58. Moreover, there is only one reduct, which is also the core, i.e. \( {RED_{S}(I)=CORE_{S}(I)=\{1\} } \).

Finally, the following decision rules can be induced (within parentheses there are the pairs ofobjects supporting the rule):

  • if \( { xP_1^{\ge 1}y }\), then xSy (or, in words, if x is at least weakly preferred to y with respect to f 1, then x outranks y), ((1,2), (1,4), (1,5), (3,4), (6,2), (6,4), (6,5)),

  • if \( { xP_1^{\le -1}y }\), then \( {xS^{c}y } \) (or, in words, if y is at least weakly preferred to x with respect to f 1, then x does not outrank y), ((2,1), (2,6), (4,1), (4,3), (4,6), (5,1), (5,6)),

  • if \( { xP_1^{\ge 0}y }\) and \( { xP_1^{\le 0}y }\) (i.e. if \( { xP_1^{0}y } \)), then xSy or \( { xS^{c}y }\) (or, in words, if x and y are indifferent with respect to f 1, then x outranks y or x does not outrank y), ((1,1), (2,2), (2,3), (3,2), (3,3), (3,5), (4,4), (5,3), (5,5), (6,6)).

Let us assume now that the DM agrees to express preferences with respect to criteria f 1, \( { f_{2}, f_{3} } \) on an ordinal scale of preference, for which there is only information about a partial preorder on all possible pairs of evaluations. In this case, S and S c can be approximated in the way described in Subsect. “Dominance Without Degrees of Preference”, i.e. without considering degrees of preference. The I-lower approximations, the I-upper approximations and the I-boundaries of S and S c are as follows:

  • \( { \underline{I}(S) = \{ }\)(1,1), (1,2), (1,4), (1,5), (3,4), (4,4), (6,2), (6,4), (6,5), (6,6)},

  • \( { \overline{I}(S) = \{ } \)(1,1), (1,2), (1,4), (1,5), (2,2), (2,3), (3,2), (3,3), (3,4), (3,5), (4,4), (5,3), (5,5), (6,2), (6,4), (6,5), (6,6)},

  • \( { \underline{I}(S^{c})= \{ } \)(2,1), (2,6), (4,1), (4,3), (4,6), (5,1), (5,6)},

  • \( { \overline{I}(S^{c}) = \{ } \)(2,1), (2,2), (2,3), (2,6), (3,2), (3,3), (3,5), (4,1), (4,3), (4,6), (5,1), (5,3), (5,5), (5,6)},

  • \( { Bn_{I}(S) = Bn_{I}(S^{c}) = \{ } \)(2,2), (2,3), (3,2), (3,3), (3,5), (5,3), (5,5)}.

Let us observe that the pairs (1,1), (4,4) and (6,6) belong now to the I-lower approximation of S and are not contained in the I-boundaries. Therefore, the quality of approximation is equal to 0.71. Moreover, there is still only one reduct, which is also the core, i.e. again \( {RED_{S}(I)=CORE_{S}(I)=\{1\} } \).

The following decision rules are induced from the above approximations and boundaries (withinparentheses there are the pairs of objects supporting the rule):

  • if \( { f_1(x) } \) is at least high and \( { f_1(y) } \) is at most high, then \( { xSy, } \) ((1,1), (1,2), (1,4), (1,5), (6,2), (6,4), (6,5), (6,6)),

  • if \( { f_1(x) } \) is at least low and \( { f_1(y) } \) is at most low, then \( { xSy, } \) ((1,4), (3,4), (4,4), (6,4)),

  • if \( { f_1(x) } \) is at most medium and \( { f_1(y) } \) is at least high, then \( { xS^{c}y, } \) ((2,1), (2,6), (4,1), (4,6), (5,1), (5,6)),

  • if \( { f_1(x) } \) is at most low and \( { f_1(y) } \) is at least medium, then \( { xS^{c}y, } \) ((4,1), (4,3), (4,6)),

  • if \( { f_1(x) } \) is at least medium and \( { f_1(y) } \) is at most medium and \( { f_1(x) } \) is at most medium and \( { f_1(y) } \) is at least medium (i.e. if \( { f_1(x) } \) is equal to medium and \( { f_1(y) } \) is equal to medium), then xSy or \( { xS^{c}y, } \) ((2,2), (2,3), (3,2), (3,3), (3,5), (5,3), (5,5)).

DRSA for Decision Under Uncertainty

Basic Concepts

To apply the rough set approach to decision under uncertainty, the following basic elements must beconsidered:

  • a set \( { S=\{s_1, s_2,\dots , s_u\} }\) of states of the world, or simply states,which are supposed to be mutually exclusive and collectively exhaustive,

  • an a priori probability distribution P over the states of the world: more precisely, theprobabilities of states \( { s_1, s_2,\dots , s_u }\) are \( { p_1, p_2,\dots , p_u }\), respectively (\( p_1+ p_2+\dots + p_u=1, p_i\ge0, i=1,\dots ,u \)),

  • a set \( { A=\{a_1, a_2,\dots , a_m\} }\) of acts,

  • a set \( { X=\{x_1, x_2,\dots , x_r\} }\) of outcomes or consequences expressed in monetaryterms (\( { X\subseteq \Re }\)),

  • a function \( { g\colon A\times S \rightarrow X }\) assigning to each act-state pair \( { (a_i,s_j)\in A \times S } \) an outcome \( { x \in X }\),

  • a set of classes \( {\textit{\textbf{Cl}}=\{Cl_1, Cl_2, \ldots, Cl_p\} } \), such that \( { Cl_1 \cup Cl_2 \cup \ldots \cup Cl_p=A, Cl_r \cap Cl_q=\emptyset } \) for each \( { r,q \in \{1,\dots ,p\} } \) with \( { r\neq q } \); the classes of Cl are preference-ordered according to the increasing order of their indices,

  • a function \( { e\colon A \rightarrow\textit{\textbf{Cl}} } \) assigning each act \( {a_i\in A } \) to a class \( { Cl_j \in\textit{\textbf{Cl}} } \).

In this context, two different types of dominance can be considered:

  1. 1)

    Classical dominance:given \( { a_p,a_q\in A, a_p } \) dominates a q iff, for each possible state of the world, act a p gives an outcome at least as good as act a q ; more formally, \( { g(a_p,s_j) \ge g(a_q,s_j) }\), for each \( { s_j \in S }\),

  2. 2)

    Stochastic dominance:given \( { a_p,a_q \in A, a_p } \) dominates a q iff, for each outcome \( { x \in X }\), act a p gives an outcome atleast as good as x with a probability at least as great as the probability that act a q gives the same outcome, i.?e. for all \( { x \in X } \),

    $$P[S(a_p,x)] \ge P[S(a_q,x)]\:$$

    where, for each \( { (a_i,x)\in A \times X } \), \( { S(a_i,x) = \{s_j \in S\colon g(a_i,s_j)\ge x\} } \).

In [19], it has been shown how to apply stochastic dominance in this context. On the basis of an a priori probability distribution P, one can assign to each subset of states of the world \( { W\subseteq S\; (W \neq \emptyset } \)) the probability P(W) that one of the states in W is verified, i.e. \( { P(W) = \sum_{i: s_i \in W}p_i } \), and then build up the set \( \Pi \) of all possible values P(W), i.e.

$$\Pi = \{\pi \in [0,1]\colon \pi = P(W), W \subseteq S\}\:.$$
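The set \( \Pi \) is finite and can be enumerated directly, as in this small sketch; exact rational arithmetic is used to avoid floating-point noise, and the empty event is included so that 0 belongs to \( \Pi \), matching the use of \( \pi_{(0)}=0 \) later in the text.

    from fractions import Fraction
    from itertools import combinations

    def all_event_probabilities(p):
        """Pi: the values P(W) over all subsets W of the states."""
        states = range(len(p))
        return sorted({sum((p[i] for i in W), Fraction(0))
                       for r in range(len(p) + 1)
                       for W in combinations(states, r)})

    print(all_event_probabilities([Fraction(1, 6)] * 6))
    # [Fraction(0, 1), Fraction(1, 6), Fraction(1, 3), ..., Fraction(1, 1)]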

Let us define the following functions \( { z\colon A \times S \rightarrow \Pi } \) and \( { z^{\prime}\colon A \times S \rightarrow \Pi } \) assigning to each act-state pair \( { (a_i,s_j)\in A \times S } \) a probability \( { \pi \in \Pi } \), as follows:

$$z(a_i,s_j) = \sum_{r: g(a_i,s_r)\ge g(a_i,s_j)} p_r\: ,$$

and

$$ z^{\prime}(a_i,s_j) = \sum_{r: g(a_i,s_r)\le g(a_i,s_j)} p_r\:. $$

Therefore, \( { z(a_i,s_j) } \) represents theprobability of obtaining an outcome whose value is at least \( { g(a_i,s_j) }\) by act a i . Analogously,\( { z^{\prime}(a_i,s_j) } \) represents theprobability of obtaining an outcome whose value is at most \( { g(a_i,s_j) }\) by act a i . On the basis offunction \( { z(a_i,s_j) } \), function\( { \rho\colon A \times \Pi \rightarrow X } \) canbe defined as follows:

$$\rho(a_i,\pi) = \max_{j\colon z(a_i,s_j)\ge \pi} g(a_i ,s_j)\:.$$

Thus, \( { \rho(a_i,\pi)=x } \) means that the outcome obtained by act a i is greater than or equal to x with a probability at least \( \pi \) (i.e. a probability \( \pi \) or greater). On the basis of function \( { z^{\prime}(a_i,s_j) } \), the function \( { \rho^{\prime}\colon A \times \Pi \rightarrow X } \) can be defined as follows:

$$\rho^{\prime}(a_i,\pi) = \min_{j\colon z^{\prime}(a_i,s_j)\ge \pi} g(a_i ,s_j)\:.$$

\( { \rho^{\prime}(a_i,\pi)=x } \) means that the outcome obtained by act a i is smaller than or equal to x with a probability at least \( \pi \).

Let us observe that information given by \( { \rho(a_i,\pi) } \) and \( { \rho^{\prime}(a_i,\pi) } \) isrelated. In fact, if the elements of \( \Pi,\, 0=\pi_{(0)}, \pi_{(1)}, \pi_{(2)} , \ldots, \pi_{(d)}=1\; (d=|\Pi|) \), are reordered in such a way that \( 0=\pi_{(0)} \le \pi_{(1)} \le \pi_{(2)} \le \ldots \le \pi_{(d)}=1 \), then

$$\rho\left(a_i,\pi_{(j)}\right) = \rho^{\prime}\left(a_i,1-\pi_{(j-1)}\right)\:.$$

Therefore, \( { \rho(a_i,\pi_{(j)}) \le x } \) is equivalent to \( \rho^{\prime}(a_i,1-\pi_{(j-1)}) \le x \), for \( a_i\in A, \pi_{(j-1)},\pi_{(j)} \in \Pi, x \in X \). This implies that the analysis of the possible decisions can be equivalently conducted on values of either \( { \rho(a_i,\pi) } \) or \( { \rho^{\prime}(a_i,\pi) } \). However, from the point of view of representation of results, it is interesting to consider both values \( { \rho(a_i,\pi) } \) and \( { \rho^{\prime}(a_i,\pi) } \). The reason is that, contrary to intuition, \( { \rho(a_i,\pi) \le x } \) is not equivalent to the statement that by act a i the outcome is smaller than or equal to x with a probability at least \( \pi \). The following example clarifies this point. Let us consider a game a consisting of rolling a die, in which if the result is 1, then the gain is $1, if the result is 2, then the gain is $2, and so on. Suppose, moreover, that the die is fair and thus each result is equiprobable with probability 1/6. The values of \( { \rho(a,\pi) } \) for all possible values of probability are:

$$\rho(a, 1/6) = \$6, \quad \rho(a, 2/6) = \$5, \quad \rho(a, 3/6) = \$4\: ,$$
$$\rho(a, 4/6) = \$3, \quad \rho(a, 5/6) = \$2, \quad \rho(a, 6/6) = \$1\:.$$

Let us remark that \( { \rho(a, 5/6) \le \$3 }\) (indeed, \( \rho(a, 5/6) = \$2,\) and thus \( { \rho(a, 5/6) \le \$3 }\) is true), however, this is not equivalent to the statement that by act a theoutcome is smaller than or equal to $3 with a probability at least 5/6. In fact, this is false because this probability is 3/6 (related to results 1, 2and 3). Analogously, the values of \( { \rho^{\prime}(a, \pi) }\) for all possible values of probability are:

$$\rho^{\prime}(a, 1/6) = \$1, \quad \rho^{\prime}(a, 2/6) = \$2, \quad \rho^{\prime}(a, 3/6) = \$3\: ,$$
$$ \rho^{\prime}(a, 4/6) = \$4, \quad \rho^{\prime}(a, 5/6) = \$5, \quad \rho^{\prime}(a, 6/6) = \$6\:.$$

Let us remark that \( { \rho^{\prime}(a,5/6) \ge \$3 } \) (indeed, \( { \rho^{\prime}(a, 5/6)= \$5,} \) and thus \( { \rho^{\prime}(a, 5/6) \ge\$3 } \) is true), however, this is not equivalent to the statement that by act a the outcome is greater than or equal to $3 with a probability at least 5/6. In fact, this is false because this probability is 4/6 (related to results 3, 4, 5 and 6). Therefore, in the context of stochastic acts, an outcome expressed in positive terms refers to \( { \rho(a, \pi) } \) giving a lower bound of an outcome (“for act a there is a probability \( \pi \) to gain at least \( {\rho(a, \pi)} \)”), while an outcome expressed in negative terms refers to \( { \rho^{\prime}(a,\pi) } \) giving an upper bound of an outcome (“for act a there is a probability \( \pi \) to gain at most \( {\rho^{\prime}(a,\pi) } \)”).

Given \( { a_p,a_q \in A, a_p } \) stochastically dominates a q if and only if \( { \rho(a_p,\pi) \ge\rho(a_q,\pi) } \) for each \( { \pi \in \Pi }\). This is equivalent to the statement: given \( {a_p,a_q \in A, a_p } \) stochastically dominates a q if and only if \( { \rho^{\prime}(a_p,\pi) \ge\rho^{\prime}(a_q,\pi) } \) for each \( { \pi \in\Pi } \).

For example, consider the game \( { a^* } \) consisting of rolling a die, in which if the result is 1, then the gain is $7, if the result is 2, then the gain is $6, and so on until the case in which the result is 6 and the gain is $2. In this case game \( { a^* } \) stochastically dominates game a because \( { \rho(a^*, 1/6) = \$7 } \) is not smaller than \( \rho(a, 1/6) = \$6,\, \rho(a^*, 2/6) = \$6 \) is not smaller than \( { \rho(a, 2/6) = \$5 } \), and so on. Equivalently, game \( { a^* } \) stochastically dominates game a because \( { \rho^{\prime}(a^*, 1/6) = \$2 } \) is not smaller than \( \rho^{\prime}(a, 1/6) = \$1, \rho^{\prime}(a^*, 2/6) = \$3 \) is not smaller than \( \rho^{\prime}(a, 2/6) = \$2 \), and so on.
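The functions \( \rho \) and \( \rho^{\prime} \) and the stochastic dominance test can be verified numerically; the following sketch reproduces the dice values listed above (the function names are introduced only for illustration).

    from fractions import Fraction

    def rho(gains, probs, pi):
        """Largest x such that the outcome is >= x with probability >= pi."""
        return max(x for x in gains
                   if sum((q for y, q in zip(gains, probs) if y >= x),
                          Fraction(0)) >= pi)

    def rho_prime(gains, probs, pi):
        """Smallest x such that the outcome is <= x with probability >= pi."""
        return min(x for x in gains
                   if sum((q for y, q in zip(gains, probs) if y <= x),
                          Fraction(0)) >= pi)

    sixth  = [Fraction(1, 6)] * 6
    a      = [1, 2, 3, 4, 5, 6]    # game a: the gain equals the result of the die
    a_star = [7, 6, 5, 4, 3, 2]    # game a*, as in the text

    for k in range(1, 7):
        pi = Fraction(k, 6)
        assert rho(a, sixth, pi) == 7 - k     # $6, $5, ..., $1
        assert rho_prime(a, sixth, pi) == k   # $1, $2, ..., $6
        assert rho(a_star, sixth, pi) >= rho(a, sixth, pi)  # a* dominates a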

DRSA can be applied in the context of decision under uncertainty considering as the set of objects U the set of acts A, as the set of criteria (condition attributes) I the set \( \Pi \), as the decision attribute \( { \{d\} } \) the classification Cl, as the value set of all criteria the set X, and as information function a function f such that \( { \textbf{\textit{f}}(a_i, \pi) = \rho(a_i, \pi) } \) and \( { \textbf{\textit{f}}(a_i,cl)=e(a_i) } \). Let us observe that due to the equivalence \( { \rho(a_i,\pi_{(j)}) = \rho^{\prime}(a_i,1-\pi_{(j-1)}) } \), one can also consider the information function \( { \textbf{\textit{f}}^{\prime}(a_i, \pi) = \rho^{\prime}(a_i,\pi) } \).

The aim of the rough set approach to preferences under uncertainty is to explain the preferences of the DM represented by the assignments of the acts from A to the classes from Cl in terms of stochastic dominance, expressed by means of the function \( \rho \). The syntax of decision rules obtained from this rough set approach is as follows:

  1. 1)

    \( { D_{\ge} }\)-decision rules with the following syntax: “if \( { \rho(a, p_{h_1}) \ge x_{h_1} } \) and …, and \( { \rho(a, p_{h_z})\ge x_{h_z} } \), then \( { a \in Cl^{\ge}_r } \)” (i.e. “if by act a the outcome is at least \( { x_{h_1} }\) with probability at least \( { p_{h_1} }\), and …, and the outcome is at least \( {x_{h_z} } \) with probability at least \( { p_{h_z} }\), then \( { a \in Cl^{\ge}_r }\)”) where \( { p_{h_1}, \dots, p_{h_z} \in\Pi, x_{h_1}, \dots, x_{h_z} \in X } \) and \( { r\in \{2, \dots,p\} } \);

  2. 2)

    \( { D_{\le} }\)-decision rules with the following syntax: “if \( { \rho^\prime(a, p_{h_1}) \le x_{h_1} } \) and …, and \( { \rho^\prime(a, p_{h_z}) \le x_{h_z} } \), then \( { a \in Cl^{\le}_r } \)” (i.e. “if by act a the outcome is at most \( { x_{h_1} }\) with probability at least \( { p_{h_1} }\), and …, and the outcome is at most \( {x_{h_z} } \) with probability at least \( { p_{h_z} }\), then \( { a \in Cl^{\le}_r }\)”) where \( { p_{h_1}, \dots, p_{h_z} \in\Pi, x_{h_1}, \dots, x_{h_z} \in X } \) and \( { r\in \{1, \dots,p-1\} } \);

  3. 3)

    \( { D_{\ge\le} }\)-decision rules with the following syntax: “if \( { \rho(a, p_{h_1}) \ge x_{h_1} } \) and …, and \( { \rho(a, p_{h_w}) \ge x_{h_w} } \) and \( { \rho^\prime(a, p_{h_{w+1}}) \le x_{h_{w+1}} }\) and …, and \( {\rho^\prime(a, p_{h_z}) \le x_{h_z} } \), then \( { a \in Cl_s \cup Cl_{s+1} \cup \dots \cup Cl_t } \)” (i.e. “if by act a the outcome is at least \( { x_{h_1} }\) with probability at least \( { p_{h_1} }\), and …, and the outcome is at least \( {x_{h_w} } \) with probability at least \( { p_{h_w} }\), and the outcome is at most \( { x_{h_{w+1}} }\) with probability at least \( { p_{h_{w+1}} }\), and …, and the outcome is at most \( {x_{h_z} } \) with probability at least \( { p_{h_z} }\), then \( { a \in Cl_s \cup Cl_{s+1} \cup \dots \cup Cl_t } \)”) where \( { p_{h_1}, \dots, p_{h_w}, p_{h_{w+1}}, \dots, p_{h_z} \in \Pi,\, x_{h_1}, \dots, x_{h_z} \in X } \) and \( { s,t \in \{1, \dots,p\} } \), such that \( { s<t } \).

According to the meaning of \( { \rho(a_i, p) } \) and \( { \rho^\prime(a_i, p) } \) discussed above, \( { D_{\ge} } \)-decision rules are expressed in terms of \( { \rho(a_i, p) } \), \( { D_{\le} } \)-decision rules are expressed in terms of \( { \rho^\prime(a_i, p) } \), and \( { D_{\ge\le} } \)-decision rules are expressed in terms of both \( { \rho(a_i, p) } \) and \( { \rho^\prime(a_i, p) } \). Let us observe that, due to the equivalence \( \rho(a_i,\pi_{(j)}) = \rho^{\prime}(a_i,1-\pi_{(j-1)}) \), all the above decision rules can be expressed equivalently in terms of values of \( { \rho(a_i, p) } \) or of \( { \rho^\prime(a_i, p) } \). For example, a \( { D_{\ge} } \)-decision rule \( r_\ge(\rho)= \) “if \( \rho(a, p_{h_1})\ge x_{h_1} \) and …, and \( \rho(a, p_{h_z})\ge x_{h_z} \), then \( a \in Cl^{\ge}_r \)” can be expressed in terms of \( { \rho^\prime(a_i,p) } \) as \( r_\ge(\rho^\prime)= \) “if \( \rho^\prime(a, p^\ast_{h_1})\ge x_{h_1} \) and …, and \( \rho^\prime(a, p^{\ast}_{h_z})\ge x_{h_z} \), then \( a \in Cl^{\ge}_r \)”, where, if \( p_{h_r}=\pi(j_r) \), then \( p^\ast_{h_r}=1-\pi(j_r-1) \), with \( r=1, \dots,z \), and \( 0=\pi(0),\pi(1),\pi(2), \dots,\pi(|P|)=1 \) reordered in such a way that \( 0=\pi(0) \le \pi(1) \le \pi(2) \le \dots \le \pi(|P|)=1 \). Analogously, a \( { D_{\le} } \)-decision rule \( r_\le(\rho^\prime)= \) “if \( \rho^\prime(a, p_{h_1})\le x_{h_1} \) and …, and \( \rho^\prime(a, p_{h_z})\le x_{h_z} \), then \( a \in Cl^{\le}_r \)” can be expressed in terms of \( { \rho(a_i,p) } \) as \( r_\le(\rho)= \) “if \( { \rho(a, p^\ast_{h_1})\le x_{h_1} } \) and …, and \( { \rho(a, p^{\ast}_{h_z})\le x_{h_z} } \), then \( a \in Cl^{\le}_r \)”, where, if \( p_{h_r}=\pi(j_r) \), then \( p^\ast_{h_r}=1-\pi(j_r-1) \), with \( r=1, \dots,z \), and the same reordering of \( 0=\pi(0),\pi(1),\pi(2), \dots,\pi(|P|)=1 \).
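
The probability-level conversion used above can be written down directly; the grid below is a hypothetical \( \Pi \), chosen only to exercise the formula \( p^\ast = 1-\pi(j-1) \) for \( p = \pi(j) \).

```python
# Sketch of the threshold conversion between the rho- and rho'-scales:
# if p = pi(j) on the ordered grid 0 = pi(0) <= ... <= pi(m) = 1,
# then p* = 1 - pi(j-1).
def convert_level(p, grid):
    """grid is the sorted list [pi(0), ..., pi(m)] with grid[0]=0, grid[-1]=1."""
    j = grid.index(p)            # position of p = pi(j); assumes j >= 1
    return 1 - grid[j - 1]

grid = [0.0, 0.25, 0.60, 1.0]    # hypothetical Pi, for illustration only
print(convert_level(0.25, grid))  # 1 - pi(0) = 1.0
print(convert_level(1.00, grid))  # 1 - pi(2) = 0.4
```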

Let us observe, however, that \( { r_{\ge}(\rho) } \) is a much more natural and meaningful expression than \( { r_{\ge}(\rho^\prime) } \), just as \( { r_{\le}(\rho^\prime) } \) is a much more natural and meaningful expression than \( { r_{\le}(\rho) } \). Another useful remark concerns the minimality of rules, related to the specific intrinsic structure of stochastic dominance. Let us consider the following two decision rules:

  • \( { r1 \equiv } \) “if by act a the outcome is at least 100 with probability at least 0.25, then a is at least good”,

  • \( { r2 \equiv } \) “if by act a the outcome is at least 100 with probability at least 0.50, then a is at least good”.

\( { r1 } \) and \( { r2 } \) can both be induced from the analysis of the same information table, because they involve different criteria (condition attributes). In fact, \( { r1 } \) involves attribute \( { \rho(a,0.25) } \) (it can be expressed as “if \( { \rho(a,0.25)\ge 100 } \), then a is at least good”), while \( { r2 } \) involves attribute \( { \rho(a,0.50) } \) (it can be expressed as “if \( { \rho(a,0.50)\ge 100 } \), then a is at least good”). Considering the structure of stochastic dominance, the condition part of rule \( { r1 } \) is the weaker one. In fact, rule \( { r1 } \) requires the cumulated outcome to be at least 100 with probability at least 0.25, while rule \( { r2 } \) requires the same outcome with a greater probability, 0.5 against 0.25. Since the decision part of these two rules is the same, \( { r1 } \) is minimal among these two rules. From a practical point of view, this observation says that, if one induces decision rules using the algorithms designed for DRSA, it is necessary to further filter the obtained results in order to remove rules which are not minimal in the specific context of the DRSA analysis based on stochastic dominance.
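
This extra filtering step admits a simple computational reading; the dictionary encoding of a \( { D_{\ge} } \)-rule premise as (probability, threshold) pairs is an assumption made here for illustration.

```python
# Sketch of the additional minimality filter: a D>= rule is redundant if a rule
# with the same conclusion has a premise that is implied by it. Since rho(a, p)
# is non-increasing in p, the condition "rho(a, p') >= x'" implies
# "rho(a, p) >= x" whenever p' >= p and x' >= x.
def premise_implied_by(weaker, stronger):
    """True if every condition of `weaker` is implied by some condition of `stronger`."""
    return all(any(p2 >= p1 and x2 >= x1 for p2, x2 in stronger.items())
               for p1, x1 in weaker.items())

r1 = {0.25: 100}   # "outcome >= 100 with prob >= 0.25  ->  at least good"
r2 = {0.50: 100}   # "outcome >= 100 with prob >= 0.50  ->  at least good"
if premise_implied_by(r1, r2):
    print("r2 is redundant: keep only the minimal rule r1")
```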

Example Illustrating DRSA in the Context of Decision Under Uncertainty

The following example illustrates the approach. Let us consider

  • a set \( { S=\{s_1, s_2, s_3\} } \) of states of the world,

  • an a priori probability distribution P over the states of the world defined as follows:\( p_1=0.25, p_2=0.35, p_3=0.40 \),

  • a set \( { A=\{a_1, a_2, a_3, a_4, a_5, a_6\} } \) of acts,

  • a set \( { X=\{0, 10, 15, 20, 30\} } \) of consequences,

  • a set of classes \( { \textit{\textbf{Cl}}=\{Cl_1, Cl_2, Cl_3\} } \), where \( { Cl_1 } \) is the set of bad acts, \( { Cl_2 } \) is the set of medium acts, and \( { Cl_3 } \) is the set of good acts,

  • a function \( { g\colon A \times S \rightarrow X } \) assigning to each act-state pair \( { (a_i, s_j)\in A \times S } \) a consequence \( { x_h\in X} \), and a function \( { e\colon A \rightarrow \textit{\textbf{Cl}} } \) assigning each act \( { a_i\in A } \) to a class \( { Cl_j \in \textit{\textbf{Cl}} } \), as presented in Table 9.

Table 9 Acts, consequences and assignment to classes from Cl

Table 10 shows the values of function \( { \rho(a_i, p) } \).

Table 10 Acts, values of function \( { \rho(a_i, p) } \) and assignment to classes fromCl
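
How the entries of Table 10 arise from the data of Table 9 can be sketched as follows; the consequence vector used for \( { a_3 } \) is a hypothetical reconstruction consistent with the reading given below (20 on \( s_1 \), 0 on \( s_2 \), 15 on \( s_3 \)), not a quotation of Table 9.

```python
# Sketch: derive the rho(a, p) profile of an act from its consequences g(a, s_j)
# and the a priori probabilities p_j of the states.
def rho_profile(outcomes, probs):
    """For each cumulative probability level, return the largest outcome x such
    that the outcome is at least x with probability at least that level."""
    ranked = sorted(zip(outcomes, probs), key=lambda t: -t[0])
    profile, cum = {}, 0.0
    for x, p in ranked:
        cum += p
        profile[round(cum, 2)] = x       # rho(a, cum) = x
    return profile

print(rho_profile(outcomes=[20, 0, 15], probs=[0.25, 0.35, 0.40]))
# {0.25: 20, 0.65: 15, 1.0: 0}  -- matching rho(a3, 0.25)=20 and rho(a3, 0.65)=15;
# since 0.65 < 0.75, the row 0.75 gives rho(a3, 0.75)=0, as read below.
```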

Table 10 is the decision table on which DRSA is applied. Let us give some examples of the interpretation of the values in Table 10. The column of act \( { a_3 } \) can be read as follows:

  • the value 20 in the row corresponding to 0.25 means that the outcome is at least 20 with a probability of at least 0.25,

  • the value 15 in the row corresponding to 0.65 means that the outcome is at least 15 with a probability of at least 0.65,

  • the value 0 in the row corresponding to 0.75 means that the outcome is at least 0 with a probability of at least 0.75.

Analogously, the row corresponding to 0.65 can be read as follows:

  • the value 10 relative to \( { a_1 } \) means that by act \( { a_1 } \) the outcome is at least 10 with a probability of at least 0.65,

  • the value 20 relative to \( { a_2 } \) means that by act \( { a_2 } \) the outcome is at least 20 with a probability of at least 0.65,

  • and so on.

Applying DRSA, the following upward union and downward union of classes are approximated:

  • \( { Cl_2^\ge=Cl_2 \cup Cl_3 } \), i.e. the set of the acts at least medium,

  • \( { Cl_3^\ge=Cl_3 } \), i.e. the set of the acts (at least) good,

  • \( { Cl_1^\le=Cl_1 } \), i.e. the set of the acts (at most) bad,

  • \( { Cl_2^\le=Cl_1 \cup Cl_2 } \), i.e. the set of the acts at most medium.

The first result of the DRSA analysis was the discovery that the decision table (Table 10) is not consistent. Indeed, Table 10 shows that act \( { a_4 } \) stochastically dominates act \( { a_3 } \), yet act \( { a_3 } \) is assigned to a better class (medium) than act \( { a_4 } \) (bad). Therefore, act \( { a_3 } \) cannot be assigned without doubt to the union of the at least medium classes, just as act \( { a_4 } \) cannot be assigned without doubt to the (at most) bad class. In consequence, the lower and upper approximations of \( { Cl_2^\ge, Cl_3^\ge } \) and \( { Cl_1^\le, Cl_2^\le } \) are equal, respectively, to the sets listed below (a computational sketch of these approximations follows the list):

  • \( { \underline{I}\left(Cl_2^\ge\right)=\{a_1,a_2,a_5,a_6\}=Cl_2^\ge -\{a_3\} } \),

  • \( { \overline{I}\left(Cl_2^\ge\right)=\{a_1,a_2,a_3,a_4,a_5,a_6\}=Cl_2^\ge \cup \{a_4\} } \),

  • \( { \underline{I}\left(Cl_3^\ge\right)=\{a_1,a_6\}=Cl_3^\ge } \),

  • \( { \overline{I}\left(Cl_3^\ge\right)=\{a_1,a_6\}=Cl_3^\ge } \),

  • \( { \underline{I}\left(Cl_1^\le\right)=\emptyset=Cl_1^\le-\{a_4\} } \),

  • \( { \overline{I}\left(Cl_1^\le\right)=\{a_3,a_4\}=Cl_1^\le \cup \{a_3\} } \),

  • \( { \underline{I}\left(Cl_2^\le\right)=\{a_2,a_3,a_4,a_5\}=Cl_2^\le } \),

  • \( { \overline{I}\left(Cl_2^\le\right)=\{a_2,a_3,a_4,a_5\}=Cl_2^\le } \).
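The sketch announced above (hypothetical profiles, illustrative only) computes dominance-based approximations directly from the definitions: the lower approximation of an upward union collects the acts whose set of dominating acts is included in the union, and the upper approximation the acts whose set of dominated acts intersects it.

```python
# Sketch of dominance-based approximations: act b dominates act a if its rho
# profile is at least as good at every probability level of the common grid.
def dominates(pb, pa):
    return all(pb[p] >= pa[p] for p in pa)

def approximations(profiles, union):
    acts = set(profiles)
    lower = {a for a in acts
             if {b for b in acts if dominates(profiles[b], profiles[a])} <= union}
    upper = {a for a in acts
             if {b for b in acts if dominates(profiles[a], profiles[b])} & union}
    return lower, upper

# Two hypothetical acts with identical profiles but different class assignments
# reproduce the kind of inconsistency found above for a_3 and a_4:
profiles = {"a3": {0.25: 20, 1.0: 0}, "a4": {0.25: 20, 1.0: 0}}
print(approximations(profiles, union={"a3"}))   # (set(), {'a3', 'a4'})
```
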

Since there are two inconsistent acts (\( { a_3,a_4 } \)) out of a total of six acts, the quality of approximation of the ordinal classification is equal to 4/6. The second discovery was one reduct of criteria (condition attributes) ensuring the same quality of approximation as the whole set \( \Pi \) of probabilities: \( { \mathrm{RED}_{\textit{\textbf{Cl}}}=\{0.25, 0.75, 1\} } \). This means that the preferences of the DM can be explained using only the probabilities from \( { \mathrm{RED}_{\textit{\textbf{Cl}}} } \). \( { \mathrm{RED}_{\textit{\textbf{Cl}}} } \) is also the core, because no probability value in \( { \mathrm{RED}_{\textit{\textbf{Cl}}} } \) can be removed without deteriorating the quality of approximation. The third discovery was a set of minimal decision rules describing the DM's preferences (within parentheses are given a verbal interpretation of the corresponding decision rule and the supporting acts):

  1. 1)

    if \( { \rho(a_i, 0.25) \ge 30 } \), then \( { a_i \in Cl_3^\ge } \) (if the probability of gaining at least 30 is at least 0.25, then act \( { a_i } \) is at least good) (\( { a_1 } \)),

  2. 2)

    if \( { \rho(a_i, 0.75) \ge 20 } \) and \( { \rho(a_i, 1) \ge 10 } \), then \( { a_i \in Cl_3^\ge } \) (if the probability of gaining at least 20 is at least 0.75 and the probability of gaining at least 10 is (at least) 1, i.e. the gain is surely at least 10, then act \( { a_i } \) is at least good) (\( { a_6 } \)),

  3. 3)

    if \( { \rho(a_i, 1) \ge 10 } \), then \( { a_i \in Cl_2^\ge } \) (if the probability of gaining at least 10 is (at least) 1, i.e. the gain is surely at least 10, then act \( { a_i } \) is at least medium) (\( { a_1,a_5,a_6 } \)),

  4. 4)

    if \( { \rho(a_i, 0.75) \ge 20 } \), then \( { a_i \in Cl_2^\ge } \) (if the probability of gaining at least 20 is at least 0.75, then act \( { a_i } \) is at least medium) (\( { a_2,a_6 } \)),

  5. 5)

    if \( { \rho(a_i, 0.25) \le 20 } \) (i.e. \( \rho^\prime(a_i, 1) \le 20 \)) and \( \rho(a_i, 0.75) \le 15 \) (i.e. \( { \rho^\prime(a_i, 0.35) \le 15 } \)), then \( { a_i \in Cl_2^\le } \) (if the probability of gaining at most 20 is (at least) 1, i.e. one surely gains at most 20, and the probability of gaining at most 15 is at least 0.35, then act \( { a_i } \) is at most medium) (\( { a_3,a_4, a_5 } \)),

  6. 6)

    if \( { \rho(a_i, 1) \le 0 } \) (i.e. \( { \rho^\prime(a_i, 0.25) \le 0 } \)), then \( { a_i \in Cl_2^\le } \) (if the probability of gaining at most 0 is at least 0.25, then act \( { a_i } \) is at most medium) (\( { a_2,a_3,a_4 } \)),

  7. 7)

    if \( { \rho(a_i, 1) \ge 0 } \) and \( { \rho(a_i, 1) \le 0 } \) (i.e. \( { \rho(a_i, 1)=0 } \)) and \( \rho(a_i, 0.75) \le 15 \) (i.e. \( \rho^\prime(a_i, 0.35) \le 15 \)), then \( a_i \in Cl_1 \cup Cl_2 \) (if the worst outcome is exactly 0, i.e. the probability of gaining at least 0 is 1 and the probability of gaining at most 0 is at least 0.25, and the probability of gaining at most 15 is at least 0.35, then act \( { a_i } \) is at most medium) (\( { a_2,a_3,a_4 } \)).

Minimal sets of minimal decision rules represent the most concise and non-redundant knowledge contained in Table 9 (and, consequently, in Table 10). The above minimal set of 7 decision rules uses 3 attributes (probabilities 0.25, 0.75 and 1) and 11 elementary conditions, i.e. 26% of the descriptors from the original data table (Table 10). Of course, this is only a didactic example. Decision rule representations of larger sets of exemplary acts from real applications are usually even more synthetic in the sense of the percentage of used descriptors from the original decision table.

Multiple Criteria Decision Analysis Using Association Rules

In multiple criteria decision analysis, the DM is often interested in relationships between attainable values of criteria. This information is particularly useful in multiobjective optimization (see [32] and Sect. “Interactive Multiobjective Optimization Using DRSA (IMO-DRSA)”). For instance, in a car selection problem, one can observe that in the set of considered cars, if the maximum speed is at least 200 km/h and the time to reach 100 km/h is at most 7 s, then the price is not less than 40,000 $ and the fuel consumption is not less than 9 liters per 100 km. These relationships are association rules whose general syntax, in case of minimization of criteria \( { f_i, i \in I } \), is:

“if \( { f_{i_1}(x) \le r_{i_1} } \) and … and \( { f_{i_p}(x) \le r_{i_p} } \), then \( { f_{i_{p+1}}(x) \ge r_{i_{p+1}} } \) and … and \( { f_{i_q}(x) \ge r_{i_q} } \)”, where \( { \{i_1, \dots , i_q\}\subseteq I, r_{i_1} } \), …, \( { r_{i_q} \in \Re } \).

If criterion \( { f_i, i \in I } \), should be maximized, the corresponding condition in the association rule should be reversed, i.e. in the premise the condition becomes \( { f_i(x) \ge r_i } \), and in the conclusion it becomes \( { f_i(x) \le r_i } \).

Given an association rule \( { r \equiv } \) “if \( { f_{i_1}(x) \le r_{i_1} } \) and … and \( { f_{i_p}(x) \le r_{i_p} } \), then \( { f_{i_{p+1}}(x) \ge r_{i_{p+1}} } \) and … and \( { f_{i_q}(x) \ge r_{i_q} } \)”, an object \( { y \in U } \) supports r if \( { f_{i_1}(y) \le r_{i_1} } \) and … and \( { f_{i_p}(y) \le r_{i_p} } \) and \( { f_{i_{p+1}}(y) \ge r_{i_{p+1}} } \) and … and \( { f_{i_q}(y) \ge r_{i_q} } \). Moreover, an object \( { y \in U } \) supporting rule r is a base of r if \( { f_{i_1}(y) = r_{i_1} } \) and … and \( { f_{i_p}(y) = r_{i_p} } \) and \( { f_{i_{p+1}}(y) = r_{i_{p+1}} } \) and … and \( { f_{i_q}(y) = r_{i_q} } \). An association rule having at least one base is called robust.
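
The support, base and robustness notions just defined admit a direct computational reading; the dictionary encoding of rules below is an illustrative assumption, with all involved criteria taken as minimized.

```python
# Sketch: support and base tests for an association rule with premise
# conditions f_i(y) <= r_i and conclusion conditions f_i(y) >= r_i
# (criteria assumed to be minimized; reverse the inequalities otherwise).
def supports(y, premise, conclusion):
    return (all(y[i] <= r for i, r in premise.items()) and
            all(y[i] >= r for i, r in conclusion.items()))

def is_base(y, premise, conclusion):
    # A base supports the rule with every inequality satisfied as an equality.
    return all(y[i] == r for i, r in {**premise, **conclusion}.items())

def is_robust(universe, premise, conclusion):
    # Robust: at least one object of the universe is a base of the rule.
    return any(is_base(y, premise, conclusion) for y in universe)
```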

An association rule \( { r \equiv } \) “if \( { f_{i_1}(x) \le r_{i_1} } \) and … and \( { f_{i_p}(x) \le r_{i_p} } \), then \( { f_{i_{p+1}}(x) \ge r_{i_{p+1}} } \) and … and \( { f_{i_q}(x) \ge r_{i_q} } \)” holds in universe U if:

  1. 1)

    there is at least one \( { y \in U } \) supporting r,

  2. 2)

    r is not contradicted in U, i.e. there is no \( { z \in U } \) such that \( { f_{i_1}(z) \le r_{i_1} } \) and … and \( { f_{i_p}(z) \le r_{i_p} } \), while not \( { f_{i_{p+1}}(z) \ge r_{i_{p+1}} } \) or … or \( { f_{i_q}(z) \ge r_{i_q} } \).

Given the two association rules:

  • \( { r_1\equiv } \)“if \( { f_{i_1}(x) \le r^1_{i_1} } \) and … and \( { f_{i_p}(x) \le r^1_{i_p} } \),then \( { f_{i_{p+1}}(x) \ge r^1_{i_{p+1}} } \) and … and \( { f_{i_q}(x) \ge r^1_{i_q} } \)”,

  • \( { r_2\equiv } \)“if \( { f_{j_1}(x) \le r^2_{j_1} } \) and … and \( { f_{j_s}(x) \le r^2_{j_s} } \),then \( { f_{j_{s+1}}(x) \ge r^2_{j_{s+1}} } \) and … and \( { f_{j_t}(x) \ge r^2_{j_t} } \)”,

rule \( { r_1 } \) is not weaker than rule \( { r_2 } \), denoted by \( { r_1 \underline \triangleright r_2 } \), if:

  1. α)

    \( { \{i_1, \dots ,i_p\} \subseteq \{j_1, \dots,j_s\} } \),

  2. β)

    \( { r^1_{i_1} \ge r^2_{i_1}, \dots, r^1_{i_p} \ge r^2_{i_p} } \),

  3. γ)

    \( { \{i_{p+1}, \dots ,i_q\} \supseteq \{j_{s+1}, \dots,j_t\} } \),

  4. δ)

    \( { r^1_{j_{s+1}} \ge r^2_{j_{s+1}}, \dots, r^1_{j_t} \ge r^2_{j_t} } \).

Conditions β) and δ) are formulated for criteria \( { f_i } \) to be minimized. If criterion \( { f_i } \) should be maximized, the corresponding inequalities should be reversed, i.e. \( { r^1_i \le r^2_i } \) in condition β) as well as in condition δ). Notice that \( { \underline \triangleright } \) is a binary relation on the set of association rules which is a partial preorder, i.e. it is reflexive (each rule is not weaker than itself) and transitive. The asymmetric part of the relation \( { \underline \triangleright } \) is denoted by \( { \triangleright } \), and \( { r_1 \triangleright r_2 } \) reads “\( { r_1 } \) is stronger than \( { r_2 } \)”.
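
Conditions α) to δ) translate directly into code; the tuple-of-dictionaries encoding below is an assumption made for illustration, with all criteria to be minimized.

```python
# Sketch of the "not weaker than" relation between two association rules
# (criteria to be minimized; flip the threshold comparisons for maximized ones).
def not_weaker(r1, r2):
    """Each rule is (premise, conclusion) with {criterion: threshold} dicts.
    r1 is not weaker than r2 iff r1 uses no more premise conditions, with
    thresholds no more demanding, and no fewer conclusion conditions, with
    thresholds no weaker."""
    p1, c1 = r1
    p2, c2 = r2
    return (set(p1) <= set(p2) and all(p1[i] >= p2[i] for i in p1) and   # alpha, beta
            set(c1) >= set(c2) and all(c1[j] >= c2[j] for j in c2))      # gamma, delta

r1 = ({"f1": 7}, {"price": 40000, "fuel": 9})
r4 = ({"f1": 7}, {"price": 40000})
print(not_weaker(r1, r4))   # True: r1's conclusion contains r4's (condition gamma)
```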

For example, consider the following association rules:

  • \( { r_1 \equiv } \) “if the maximum speed is at least 200 km/h and the time to reach 100 km/h is at most 7 s, then the price is not less than 40,000 $ and the fuel consumption is not less than 9 liters per 100 km”,

  • \( { r_2 \equiv } \) “if the maximum speed is at least 200 km/h and the time to reach 100 km/h is at most 7 s and the horse power is at least 175 kW, then the price is not less than 40,000 $ and the fuel consumption is not less than 9 liters per 100 km”,

  • \( { r_3 \equiv } \) “if the maximum speed is at least 220 km/h and the time to reach 100 km/h is at most 7 s, then the price is not less than 40,000 $ and the fuel consumption is not less than 9 liters per 100 km”,

  • \( { r_4 \equiv } \) “if the maximum speed is at least 200 km/h and the time to reach 100 km/h is at most 7 s, then the price is not less than 40,000 $”,

  • \( { r_5 \equiv } \) “if the maximum speed is at least 200 km/h and the time to reach 100 km/h is at most 7 s, then the price is not less than 35,000 $ and the fuel consumption is not less than 9 liters per 100 km”,

  • \( { r_6 \equiv } \) “if the maximum speed is at least 220 km/h and the time to reach 100 km/h is at most 7 s and the horse power is at least 175 kW, then the price is not less than 35,000 $”.

Let us observe that rule \( { r_1 } \) is stronger than each of the other five rules, for the following reasons:

  • \( { r_1 \triangleright r_2 } \) for condition α), because, all things being equal elsewhere, in the premise of \( { r_2 } \) there is an additional condition: “the horse power is at least 175 kW”,

  • \( { r_1 \triangleright r_3 } \) for condition β), because, all things being equal elsewhere, in the premise of \( { r_3 } \) there is a condition with a worse threshold value: “the maximum speed is at least 220 km/h” instead of “the maximum speed is at least 200 km/h”,

  • \( { r_1 \triangleright r_4 } \) for condition γ), because, all things being equal elsewhere, in the conclusion of \( { r_4 } \) one condition is missing: “the fuel consumption is not less than 9 liters per 100 km”,

  • \( { r_1 \triangleright r_5 } \) for condition δ), because, all things being equal elsewhere, in the conclusion of \( { r_5 } \) there is a condition with a worse threshold value: “the price is not less than 35,000 $” instead of “the price is not less than 40,000 $”,

  • \( { r_1 \triangleright r_6 } \) for conditions α), β), γ) and δ), because all the weak points for which rules \( { r_2, r_3, r_4 } \) and \( { r_5 } \) are weaker than rule \( { r_1 } \) are present in \( { r_6 } \).

An association rule r is minimal if there is no other rule stronger than r with respect to \( { \triangleright } \). An algorithm for induction of association rules from preference ordered data has been presented in [20].

Interactive Multiobjective Optimization Using DRSA (IMO-DRSA)

This section presents a recently proposed method for Interactive Multiobjective Optimization using the Dominance-based Rough Set Approach (IMO-DRSA) [32]. Assuming that the objective functions \( { f_j,\; j=1,\dots ,n } \), are to be minimized, the method is composed of the following steps.

Step 1.:

Generate a representative sample of solutions from the currently considered part of the Pareto optimal set.

Step 2.:

Present the sample to the DM, possibly together with association rules showing relationships between attainable values of objective functions in the Pareto optimal set.

Step 3.:

If the DM is satisfied with one solution from the sample, then this is the most preferred solution and the procedure stops. Otherwise continue.

Step 4.:

Ask the DM to indicate a subset of relatively “good” solutions in the sample.

Step 5.:

Apply DRSA to the current sample of solutions classified into “good” and “others”, in order to induce a set of decision rules with the following syntax: “if \( { f_{j_1}(\textbf{x}) \le \alpha_{j_1} } \) and … and \( { f_{j_p}(\textbf{x}) \le \alpha_{j_p} } \), then solution x is good”, \( { \left\{j_1, \dots, j_p\right\} \subseteq \left\{1, \dots, n\right\} } \).

Step 6.:

Present the obtained set of rules to the DM.

Step 7.:

Ask the DM to select the decision rules most adequate to his/her preferences.

Step 8.:

Adjoin the constraints \( f_{j_1}(\textbf{x}) \le \alpha_{j_1},\dots, f_{j_p}(\textbf{x}) \le \alpha_{j_p} \) coming from the rules selected in Step 7 to the set of constraints imposed on the Pareto optimal set, in order to focus on the part that is interesting from the point of view of the DM's preferences.

Step 9.:

Go back to Step 1.

In a sequence of iterations, the method explores the Pareto optimal set of a multiobjective optimization problem, or an approximation of this set. In the calculation stage (Step 1), any multiobjective optimization method which finds the Pareto optimal set or its approximation, such as Evolutionary Multiobjective Optimization methods, can be used. In the dialogue stage of the method (Steps 2 to 7), the DM is asked to select a decision rule induced from his/her preference information, which is equivalent to fixing some upper bounds for the minimized objective functions \( { f_j } \).
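
The overall dialogue can be summarized by the following skeleton; every routine passed in is a hypothetical placeholder standing for the sampling, dialogue and rule-induction components described above, not part of the published method's software.

```python
# High-level skeleton of the IMO-DRSA loop (a sketch under stated assumptions).
def imo_drsa(generate_sample, ask_dm, induce_rules, select_rule):
    constraints = []                               # bounds accumulated over iterations
    while True:
        sample = generate_sample(constraints)      # Step 1: sample the Pareto set
        chosen = ask_dm(sample)                    # Steps 2-4: DM inspects the sample
        if chosen is not None:
            return chosen                          # Step 3: DM satisfied -> stop
        good = [s for s in sample if s["evaluation"] == "good"]
        rules = induce_rules(sample, good)         # Step 5: DRSA on good vs. others
        rule = select_rule(rules)                  # Steps 6-7: DM picks a rule
        constraints.extend(rule["conditions"])     # Step 8: add f_j(x) <= alpha_j
```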

In Step 1, the representative sample of solutions from the currently considered part of the Pareto optimal set can be generated using one of the existing procedures, such as [33,57,58]. It is recommended to use a fine-grained sample of representative solutions to induce association rules; however, the sample of solutions presented to the DM in Step 2 should be much smaller (about a dozen) in order to avoid an excessive cognitive effort of the DM. Otherwise, the DM would risk giving unreliable information.

The association rules presented in Step 2 help the DM in understanding what (s)he can expect from the optimization problem. More precisely, any association rule

“if \( { f_{i_1}(x) \le r_{i_1} } \) and … and \( { f_{i_p}(x) \le r_{i_p} } \), then \( { f_{i_{p+1}}(x) \ge r_{i_{p+1}} } \) and … and \( { f_{i_q}(x) \ge r_{i_q} } \)”, where \( { \{i_1, \dots, i_q\}\subseteq I, r_{i_1}, \dots, r_{i_q} \in \Re } \)

says to the DM that, if (s)he wants to attain values of the objective functions \( { f_{i_1}(x) \le r_{i_1} } \) and … and \( { f_{i_p}(x) \le r_{i_p} } \), then (s)he cannot reasonably expect to obtain values of the objective functions \( { f_{i_{p+1}}(x) < r_{i_{p+1}} } \) and … and \( { f_{i_q}(x) < r_{i_q} } \).

With respect to the ordinal classification of solutions into the two classes of “good” and “others”, observe that “good” in fact means “relatively good”, i.e. better than the rest. In case the DM refuses to classify any solution as “good”, one can ask the DM to specify some minimal requirements of the type \( { f_{j_1}(\textbf{x}) \le \alpha_{j_1} } \) and … and \( { f_{j_p}(\textbf{x}) \le \alpha_{j_p} } \) for “good” solutions. These minimal requirements give some constraints that can be used in Step 8, in the same way as the analogous constraints coming from selected decision rules.

The rules considered in Step 5 have a syntax corresponding to minimization of objective functions. In case of maximization of an objective function \( { f_j } \), the condition concerning this objective in the decision rule should have the form \( { f_j(\textbf{x}) \ge \alpha_j } \).

Remark, moreover, that the Pareto optimal set reduced in Step 8 by the constraints \( { f_{j_1} (\textbf{x})\le \alpha_{j_1}, \dots , f_{j_p}(\textbf{x}) \le \alpha_{j_p} } \) is certainly not empty if these constraints come from one decision rule only. Since robust rules (see the glossary of the Dominance-Based Rough Set Approach (DRSA)) are considered, the threshold values \( { \alpha_{j_1}, \dots, \alpha_{j_p} } \) are values of the objective functions of some solutions from the Pareto optimal set. If \( { \{j_1, \dots , j_p\}=\{1, \dots , n \} } \), i.e. \( { \{j_1,\dots , j_p \} } \) is the set of all objective functions, then the new reduced part of the Pareto optimal set contains only one solution x such that \( { f_1(\textbf{x})=\alpha_1, \dots , f_n(\textbf{x})=\alpha_n } \). If \( { \{j_1, \dots , j_p \} \subset \{1, \dots , n \} } \), i.e. \( { \{j_1, \dots , j_p \} } \) is a proper subset of the set of all objective functions, then the new reduced part of the Pareto optimal set contains the solutions satisfying the conditions \( { f_{j_1} (\textbf{x})\le \alpha_{j_1} } \) and … and \( { f_{j_p} (\textbf{x})\le \alpha_{j_p} } \); since the considered rules are robust, there is at least one solution x satisfying these constraints. When the Pareto optimal set is reduced in Step 8 by constraints \( { f_{j_1} (\textbf{x})\le \alpha_{j_1}, \dots , f_{j_p}(\textbf{x}) \le \alpha_{j_p} } \) coming from more than one rule, it is possible that the resulting reduced part of the Pareto optimal set is empty. Thus, before passing to Step 9, it is necessary to verify that the reduced Pareto optimal set is not empty. If it is empty, the DM is required to revise his/her selection of rules. The DM can be supported in this task by information about the minimal sets of constraints \( { f_{j} (\textbf{x})\le \alpha_{j} } \), coming from the considered decision rules, that have to be removed in order to get a non-empty part of the Pareto optimal set.
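
When the thresholds of several selected rules are adjoined at once, the non-emptiness verification mentioned above can be carried out with a plain feasibility check; the sketch below uses scipy's linprog with a zero objective, an illustrative choice rather than the method's prescribed implementation.

```python
# Sketch of the feasibility check: is {x : A_ub @ x <= b_ub, within bounds}
# non-empty once the constraints from all selected rules are adjoined?
import numpy as np
from scipy.optimize import linprog

def reduced_region_nonempty(A_ub, b_ub, bounds):
    """A_ub is a 2D numpy array; a zero objective turns the LP into a pure
    feasibility problem, so success means the reduced region is non-empty."""
    res = linprog(c=np.zeros(A_ub.shape[1]), A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.success
```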

The constraints introduced in Step 8 are maintained in the following iterations of the procedure; however, they cannot be considered irreversible. Indeed, the DM can come back to the Pareto optimal set considered in one of the previous iterations and continue from that point. This is in the spirit of a learning oriented conception of interactive multiobjective optimization, i.e. it agrees with the idea that the interactive procedure permits the DM to learn about his/her preferences and about the “shape” of the Pareto optimal set.

Example Illustrating IMO-DRSA in the Context of Multiobjective Optimization

To illustrate the interactive multiobjective optimization procedure based on DRSA, a product mix problem is considered. There are three products, A, B and C, which are produced in quantities denoted by \( x_A, x_B \) and \( x_C \), respectively. The unit prices of the three products are \( { p_A=20,} {p_B=30,} {p_C=25 } \). The production process involves two machines. The production times of A, B, C on the first machine are \( { t_{1A}=5, t_{1B}=8, t_{1C}=10 } \), and on the second machine they are \( { t_{2A}=8,} {t_{2B}=6,} {t_{2C}=2 } \). Two raw materials are used in the production process. The first raw material has a unit cost of 6, and the quantity required for the production of one unit of A, B and C is \( { r_{1A}=1, r_{1B}=2 } \) and \( { r_{1C}=0.75 } \), respectively. The second raw material has a unit cost of 8, and the quantity required for the production of one unit of A, B and C is \( { r_{2A}=0.5, r_{2B}=1 } \) and \( { r_{2C}=0.5 } \), respectively. Moreover, the market cannot absorb a production greater than 10, 20 and 10 units for A, B and C, respectively. To decide how much of A, B and C should be produced, the following objectives have to be taken into account:

  • Profit (to be maximized),

  • Time (total production time on two machines – to be minimized),

  • Production of A (to be maximized),

  • Production of B (to be maximized),

  • Production of C (to be maximized),

  • Sales (to be maximized).

Table 11 A sample of Pareto optimal solutions proposed in the first iteration

The above product mix problem can be formulated as the following multiobjective optimizationproblem:

$$\begin{aligned}&\mathrm{Maximize}\\& 20x_A + 30x_B + 25x_C - (1x_A + 2x_B + 0.75x_C)6\\ &\qquad - (0.5x_A + 1x_B + 0.5 x_C)8 \quad [\mathrm{Profit}]\: ,\end{aligned}$$
$$\text{Minimize}\quad 5x_A + 8x_B + 10x_C + 8x_A + 6x_B + 2x_C \quad [\mathrm{Time}]\: ,$$
$$\text{Maximize}\quad x_A \quad [\text{Production of $A$}]\: ,$$
$$\text{Maximize}\quad x_B \quad [\text{Production of $B$}]\: ,$$
$$\text{Maximize}\quad x_C \quad [\text{Production of $C$}]\: ,$$
$$\text{Maximize}\quad 20x_A + 30x_B + 25x_C \quad [\mathrm{Sales}]\: ,$$

subject to:

$$x_A\le 10\:, \quad x_B\le 20\:, \quad x_C\le 10 \\\relax [\text{Market absorption limits}]\:,$$
$$x_A\ge0\:, \quad x_B\ge0\:, \quad x_C\ge0 \\\relax [\text{Non-negativity constraints}]\:.$$
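
As a sketch of how one representative solution of this MOLP can be computed (an illustrative use of scipy's linprog, not necessarily the tool used in [32]), the following maximizes Profit while capping Time at a satisfying level, here the 150 appearing in the first iteration below.

```python
# One representative Pareto optimal solution of the product mix problem:
# maximize Profit subject to Time <= 150 and the market absorption limits.
# scipy's linprog minimizes, so the Profit coefficients are negated.
import numpy as np
from scipy.optimize import linprog

profit = np.array([10.0, 10.0, 16.5])   # 20-6-4, 30-12-8, 25-4.5-4 per unit of A, B, C
time_  = np.array([13.0, 14.0, 12.0])   # 5+8, 8+6, 10+2 machine time per unit

res = linprog(c=-profit,
              A_ub=time_.reshape(1, -1), b_ub=[150.0],
              bounds=[(0, 10), (0, 20), (0, 10)])   # market absorption limits
print(res.x, -res.fun)   # ~[2.31, 0, 10] with Profit ~188.08,
                         # matching the bound of association rule 2) below
```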

A sample of representative Pareto optimal solutions has been calculated and proposed to the DM. Observe that the considered problem is a Multiple Objective Linear Programming (MOLP) problem, and thus representative Pareto optimal solutions can be calculated using classical linear programming, looking for the solutions optimizing each one of the considered objectives, or fixing all the considered objective functions but one at a satisfying value and looking for the solution optimizing the remaining objective function. The set of representative Pareto optimal solutions is shown in Table 11. Moreover, a set of potentially interesting association rules has been induced from the sample and presented to the DM. These rules represent strongly supported relationships between attainable values of objective functions. The association rules are the following (within parentheses there are the ids of the solutions supporting each rule):

  1. 1)

    if Time \( { \le 140 } \), then Profit \( { \le 180.38 } \) and Sales \( { \le 280.77 \,(s1, s2, s3, s4, s12, s13) } \),

  2. 2)

    if Time \( { \le 150 } \), then Profit \( { \le 188.08 } \) and Sales \( { \le 296.15 \,(s1, s2, s3, s4, s5, s6, s7, s8, s9, s12, s13) } \),

  3. 3)

    if \( { x_B \ge 2 } \), then Profit \( { \le209.25 } \) and \( { x_A \le 6 } \) and \( { x_C \le 7.83 \,(s4, s5, s6, s9, s10, s11, s12, s13) } \),

  4. 4)

    if Time \( { \le 150 } \), then \( x_B \le 3 \,(s1, s2, s3, s4, s5, s6, s7, s8, s9, s12, s13) \),

  5. 5)

    if Profit \( { \ge 148.38 } \) and Time \( { \le 150 } \), then \( x_B \le 2 \,(s1, s2, s3, s5, s7, s8) \),

  6. 6)

    if \( { x_A \ge 5 } \), then Time \( { \ge 150 \,(s5, s6, s8, s9, s10, s11) } \),

  7. 7)

    if Profit \( { \ge 127.38 } \) and \( { x_A \ge 3 } \), then Time \( \ge 130 \,(s4, s5, s6, s8, s9, s10, s11, s12) \),

  8. 8)

    if Time \( { \le 150 } \) and \( { x_B \ge 2 } \), then Profit \( \le 148.38 \,(s4, s5, s6, s9, s12, s13) \),

  9. 9)

    if \( { x_A\ge 3 } \) and \( { x_C \ge 4.08 } \), then Time \( \ge 130 \,(s4, s5, s8, s10, s11, s12) \),

  10. 10)

    if Sales \( { \ge 265.38 } \), then Time \( \ge 130 \,(s2, s3, s4, s5, s6, s7, s8, s9, s10, s11) \).

Then, the DM has been asked if (s)he was satisfied with one of the proposed Pareto optimal solutions. Since his/her answer was negative, (s)he was requested to indicate a subset of relatively “good” solutions, which are indicated in the “Evaluation” column of Table 11.

Taking into account the ordinal classification of Pareto optimal solutions into “good” and “others” made by the DM, twelve decision rules have been induced from the lower approximation of “good” solutions. The frequency of the presence of objectives in the premises of the rules gives a first idea of the importance of the considered objectives. These frequencies are the following:

  • Profit: \( { \frac{4}{12} } \),

  • Time: \( { \frac{12}{12} } \),

  • Production of \( { A: \frac{7}{12} } \),

  • Production of \( { B: \frac{4}{12} } \),

  • Production of \( { C: \frac{5}{12} } \),

  • Sales: \( { \frac{5}{12} } \).

The following potentially interesting decision rules were presented to the DM:

  1. 1)

    if Profit \( { \ge 140.5 } \) and Time \( { \le 150 } \) and \( { x_B \ge 2 } \), then product mix is good (\( { s4,s5,s9 } \)),

  2. 2)

    if Time \( { \le 140 } \) and \( { x_A \ge 1.538 } \) and \( { x_C \ge 10 } \), then product mix is good (s3),

  3. 3)

    if Time \( { \le 150 } \) and \( { x_B \ge 2 } \) and \( { x_C \ge 4.75 } \), then product mix is good (s4, s5),

  4. 4)

    if Time \( { \le 140 } \) and Sales \( { \ge 272.9167 } \), then product mix is good (s3, s4),

  5. 5)

    if Time \( { \le 150 }\) and \( { x_B \ge 2 }\) and \( { x_C \ge 3.67 }\) and Sales \( { \ge271.67 }\), then product mix is good (s4, s5, s9).

Among these decision rules, the DM has selected rule 1) as the most adequate to his/her preferences. This rule makes it possible to define the following constraints reducing the feasible region of the production mix problem:

  • \( 20x_A + 30x_B + 25x_C - (x_A + 2x_B + 0.75x_C)6- (0.5x_A + x_B + 0.5x_C)8 \ge 140.5 \) [Profit \( { \ge 140.5 } \)],

  • \( { 5x_A+8x_B +10x_C+ 8x_A+ 6x_B+2x_C \le 150 } \) [Time \( { \le 150 } \)],

  • \( { x_B \ge 2 } \) [Production of \( { B \ge 2 } \)].

These constraints have been considered together with the original constraints of the production mix problem, and a new sample of representative Pareto optimal solutions, shown in Table 12, has been calculated and presented to the DM, together with the following potentially interesting association rules:

  1. 1')

    if Time \( { \le 140 } \), then Profit \( { \le 174 } \) and \( { x_C \le 9.33 } \) and Sales \( { \le 293.33 } \) (s5', s6', s7', s8', s9', s10', s11', s12'),

  2. 2')

    if \( { x_A \ge 2 } \), then \( { x_B \le 3 } \) and Sales \( \le 300.83 \) (s2', s3', s4', s6', s7', s9'),

  3. 3')

    if \( { x_A \ge 2 } \), then Profit \( { \le 172 } \) and \( { x_C \le 8 } \) (s2', s3', s4', s6', s7', s9'),

  4. 4')

    if Time \( { \le 140 } \), then \( { x_A \le 2 } \) and \( { x_B \le 3 } \) (s5', s6', s7', s8', s9', s10', s11', s12'),

  5. 5')

    if Profit \( { \ge 158.25 } \), then \( { x_A \le 2 } \) (s1', s3', s4', s5', s6', s8'),

  6. 6')

    if \( { x_A \ge 2 } \), then Time \( { \ge 130 } \) (s2', s3', s4', s6', s7', s9'),

  7. 7')

    if \( { x_C \ge 7.17 } \), then \( { x_A \le 2 } \) and \( { x_B \le 2 } \) (s1', s3', s5', s6', s8', s10'),

  8. 8')

    if \( { x_C \ge 6 } \), then \( { x_A \le 2 } \) and \( { x_B \le 3 } \) (s1', s3', s4', s5', s6', s7', s8', s9', s10', s11', s12'),

  9. 9')

    if \( { x_C \ge 7 } \), then Time \( { \ge 125 } \) and \( { x_B \le 2 } \) (s1', s3', s5', s6', s8', s10', s11'),

  10. 10')

    if Sales \( { \ge 280 } \), then Time \( { \ge 140 } \) and \( { x_B \le 3 } \) (s1', s2', s3', s4', s5', s7'),

  11. 11')

    if Sales \( { \ge 279.17 } \), then Time \( { \ge 140 } \) (s1', s2', s3', s4', s5', s6', s7'),

  12. 12')

    if Sales \( { \ge 272 } \), then Time \( { \ge 130 } \) (s1', s2', s3', s4', s5', s6', s7', s8').

The DM has been asked again if (s)he was satisfied with one of the proposed Pareto optimal solutions. Since his/her answer was negative, (s)he was requested again to indicate a subset of relatively “good” solutions, which are indicated in the “Evaluation” column of Table 12.

Table 12 A sample of Pareto optimal solutions proposed in the second iteration

Taking into account the ordinal classification of Pareto optimal solutions into “good” and“others”, made by the DM, eight decision rules have been induced from the lower approximation of“good” solutions. The frequencies of the presence of objectives in the premises of the rules arethe following:

  • Profit: \( { \frac{2}{8} } \),

  • Time: \( { \frac{1}{8} } \),

  • Production of \( { A\colon \frac{5}{8} } \),

  • Production of \( { B\colon \frac{3}{8} } \),

  • Production of \( { C\colon \frac{3}{8} } \),

  • Sales: \( { \frac{2}{8} } \).

The following potentially interesting decision rules were presented to the DM:

  1. 1)

    if Time \( { \le 125 } \) and \( { x_A \ge 1 } \), then product mix is good (\( { s11^{\prime},s12^{\prime} } \)),

  2. 2)

    if \( { x_A \ge 1 } \) and \( { x_C \ge 7 } \), then product mix is good (\( { s3^{\prime},s6^{\prime},s11^{\prime} } \)),

  3. 3)

    if \( { x_A \ge 1.5 } \) and \( { x_C \ge 6.46 } \), then product mix is good (\( { s3^{\prime},s4^{\prime},s6^{\prime},s12^{\prime} } \)),

  4. 4)

    if Profit \( { \ge 158.25 } \) and \( { x_A \ge 2 } \), then product mix is good (\( { s3^{\prime},s4^{\prime},s6^{\prime} } \)),

  5. 5)

    if \( { x_A \ge 2 } \) and Sales \( { \ge 300 } \), then product mix is good (\( { s3^{\prime},s4^{\prime} } \)).

Among these decision rules, the DM has selected rule 4) as the most adequate to his/her preferences. This rule makes it possible to define the following constraints reducing the Pareto optimal set of the production mix problem:

  • \( 20x_A + 30x_B + 25x_C - (x_A + 2x_B + 0.75x_C)6- (0.5x_A + x_B+ 0.5x_C)8 \ge 158.25 \) [Profit \( { \ge 158.25 } \)],

  • \( { x_A \ge 2 } \) [Production of \( { A \ge 2 } \)].

Let us observe that the first constraint just strengthens the analogous constraint introduced in the first iteration (Profit \( { \ge 140.5 } \)).

Considering the new set of constraints, a new sample of representative Pareto optimal solutions shown in Table 13 has been calculated and presented to the DM, together with the following potentially interesting association rules:

  1. 1'')

    if Time \( { \le 145 } \), then \( { x_A \le 2 } \) and \( { x_B \le 2.74 } \) and Sales \( { \le 290.2 \,(s2^{\prime\prime}, s3^{\prime\prime}, s4^{\prime\prime}) } \),

  2. 2'')

    if \( { x_C\ge 6.92 } \), then \( { x_A \le 3 } \) and \( { x_B \le 2 } \) and Sales \( { \le 292.92 \,(s3^{\prime\prime}, s4^{\prime\prime}, s5^{\prime\prime}) } \),

  3. 3'')

    if Time \( { \le 145 } \), then Profit \( { \le 165.13 } \) and \( { x_A \le 2 } \) and \( { x_C \le 7.58 \,(s2^{\prime\prime}, s3^{\prime\prime}, s4^{\prime\prime}) } \),

  4. 4'')

    if \( { x_C\ge 6.72 } \), then \( { x_B \le 2.74 \,(s2^{\prime\prime}, s3^{\prime\prime}, s4^{\prime\prime}, s5^{\prime\prime}) } \),

  5. 5'')

    if Sales \( { \ge 289.58 } \), then Profit \( { \le 165.13 } \) and Time \( { \ge 145 } \) and\( { x_C \le7.58 \,(s1^{\prime\prime}, s2^{\prime\prime}, s3^{\prime\prime}, s5^{\prime\prime}) } \).

The DM has been asked again if (s)he was satisfied with one of the Pareto optimal solutions presented in Table 13, and this time (s)he declared that solution \( { s3^{\prime\prime} } \) is satisfactory for him/her. This ends the interactive procedure.

Table 13 A sample of Pareto optimal solutions proposed in the third iteration

Characteristics of the IMO-DRSA

The interactive procedure presented in Sect. “Interactive Multiobjective Optimization Using DRSA (IMO-DRSA)” can be analyzed from the point of view of input and output information. As to the input, the DM gives preference information by answering easy questions related to the ordinal classification of some representative solutions into two classes (“good” and “others”). Very often, in multiple criteria decision analysis in general, and in interactive multiobjective optimization in particular, the preference information has to be given in terms of preference model parameters, such as importance weights, substitution rates and various thresholds (see [6] for Multiple Attribute Utility Theory and [1,5,34,46] for outranking methods; for some well-known interactive multiobjective optimization methods requiring preference model parameters, see the Geoffrion–Dyer–Feinberg method [9], the method of Zionts and Wallenius [60,61] and the Interactive Surrogate Worth Tradeoff method [2,3], requiring information in terms of marginal rates of substitution, the reference point method [58], requiring a reference point and weights to formulate an achievement scalarizing function, and the Light Beam Search method [33], requiring information in terms of weights and indifference, preference and veto thresholds, the typical parameters of ELECTRE methods). Eliciting such information requires a significant cognitive effort on the part of the DM. It is generally acknowledged that people often prefer to make exemplary decisions and cannot always explain them in terms of specific parameters. For this reason, the idea of inferring preference models from exemplary decisions provided by the DM is very attractive. The output result of the analysis is a model of preferences in terms of “if …, then …” decision rules, which is used to reduce the Pareto optimal set iteratively, until the DM selects a satisfactory solution. The decision rule preference model is very convenient for decision support, because it gives argumentation for preferences in a logical form which is intelligible for the DM, and it identifies the Pareto optimal solutions supporting each particular decision rule. This is very useful for a critical revision of the original ordinal classification of representative solutions into the two classes of “good” and “others”. Indeed, the decision rule preference model speaks the same language as the DM, without any recourse to technical terms like utility, tradeoffs, scalarizing functions and so on.

All this implies that IMO-DRSA gives transparent feedback organized in a learning oriented perspective, which permits considering this procedure a “glass box”, in contrast to the “black box” characteristic of many procedures that give a final result without any clear explanation. Note that with the proposed procedure, the DM learns about the shape of the Pareto optimal set using the association rules. They represent relationships between attainable values of objective functions on the Pareto optimal set in logical and very natural statements. The information given by association rules is as intelligible as the decision rule preference model, since they speak the language of the DM and permit him/her to identify the Pareto optimal solutions supporting each particular association rule.

Thus, decision rules and association rules give an explanation and a justification of the final decision, which does not result from a mechanical application of a certain technical method, but rather from a mature conclusion of a decision process based on the active intervention of the DM.

Observe, finally, that the decision rules representing preferences and the association rules describing the Pareto optimal set are based on ordinal properties of objective functions only. Differently from methods involving some scalarization (almost all existing interactive methods), the proposed procedure never aggregates the objectives into a single value, avoiding operations (such as averaging, weighted sums, different types of distance, achievement scalarization) which are always arbitrary to some extent. Observe that one could use a method based on a scalarization to generate the representative set of Pareto optimal solutions; nevertheless, the decision rule approach would continue to be based on ordinal properties of objective functions only, because the dialogue stage of the method operates on ordinal comparisons only. In the proposed method, the DM gets clear arguments for his/her decision in terms of “if …, then …” decision rules, and verifying whether a proposed solution satisfies these decision rules is particularly easy. This is not the case for interactive multiobjective optimization methods based on scalarization. For example, in the methods using an achievement scalarizing function, it is not evident what it means for a solution to be “close” to the reference point. How can the choice of the weights used in the achievement function be justified? What is their interpretation? Observe, instead, that the method proposed in this chapter operates on data using ordinal comparisons, which are not affected by any increasing monotonic transformation of scales, and this ensures the meaningfulness of results from the point of view of measurement theory (see e.g. [42]).

With respect to the computational aspects of the method, notice that the decision rules can be calculated efficiently, in a few seconds only, using the algorithms presented in [17,20]. When the number of objective functions is not too large to be effectively controlled by the DM (let us say seven plus or minus two, as suggested by Miller [37]), the decision rules can be calculated in a fraction of a second. In any case, the computational effort grows exponentially with the number of objective functions, but not with the number of considered Pareto optimal solutions, which can increase with no particularly negative consequence on calculation time.

Conclusions

Rough set theory is a mathematical tool for dealing with granularity of information and possible inconsistencies in the description of objects. Considering this description as input data about a decision problem, the rough set approach permits structuring this description into lower and upper approximations, corresponding to certain and possible knowledge about the problem. Induction algorithms run on these approximations discover, in turn, certain and possible decision rules that facilitate an understanding of the DM's preferences, and that enable a recommendation concordant with these preferences.

The original version of the rough set approach, based on the indiscernibility or similarity relation, and the typical induction algorithms considered within machine learning, data mining and knowledge discovery, deal with data describing problems of taxonomy-type classification, i.e. problems where neither the attributes describing the objects, nor the classes to which the objects are assigned, are ordered. On the other hand, multiple criteria decision making deals with problems where descriptions (evaluations) of objects by means of attributes (criteria), as well as decisions in classification, choice and ranking problems, are ordered. Moreover, in data describing multiple criteria decision making, there exist monotonic relationships between conditions and decisions, like “the bigger the house, the more expensive it is”. The generalization of the rough set approach and of the induction algorithms to problems in which order properties and monotonic relationships are important gave birth to the Dominance-based Rough Set Approach (DRSA), which made a breakthrough in scientific decision support.

The main features of DRSA are the following:

  • preference information necessary to deal with any multiple criteria decision problem, or with decision under uncertainty, is asked of the DM just in terms of exemplary decisions,

  • the rough set analysis of preference information supplies some useful elements of knowledgeabout the decision situation; these are: the relevance of attributes or criteria, the minimalsubsets of attributes or criteria (reducts) conveying the relevant knowledge contained in theexemplary decisions, the set of indispensable attributes or criteria (core),

  • DRSA can deal with preference information concerning taxonomy-type classification, ordinalclassification, choice, ranking, multiobjective optimization and decision under uncertainty,

  • the preference model induced from preference information structured by DRSA is expressed ina natural and comprehensible language of “if …, then …”decision rules,

  • suitable procedures have been proposed to exploit the results of the application of the decision rule preference model on a set of objects or pairs of objects in order to work out a recommendation,

  • no prior discretization of quantitative condition attributes or criteria is necessary,

  • heterogeneous information (qualitative and quantitative, ordered and non-ordered, nominaland ordinal, quantitative and numerical non-quantitative scales of preferences) can be processedwithin DRSA,

  • the proposed methodology fulfills some desirable properties for both rough set approach (theapproximated sets include lower approximation and are included in upper approximation, and thecomplementarity property is satisfied), and for multiple criteria decision analysis (the decisionrule preference model is formally equivalent to the non-additive, non-transitive andnon-complete conjoint measurement model, and to a more general model for preferences defined onall kinds of scales),

  • the decision rule preference model resulting from DRSA is more general than all existingmodels of conjoint measurement, due to its capacity of handling inconsistent preferences (a newmodel of conjoint measurement is formally equivalent to the decision rule preference model handlinginconsistencies),

  • the decision rule preference model fulfills the postulate of transparency andinterpretability of preference models in decision support; each decision rule can be clearlyidentified with those parts of the preference information (decision examples) which support therule; the rules inform the DM in a quasi-natural language about the relationships betweenconditions and decisions; in this way, the rules permit traceability of the decision supportprocess and give understandable justifications for the decision to be made,

  • the proposed methodology is based on elementary concepts and mathematical tools (sets andset operations, binary relations), without recourse to any algebraic or analytical structures.

The decision rules entering the preference model have a special syntax which involves partialevaluation profiles and dominance relations on these profiles. The traditional preference models,which are the utility function and the outranking relation, can be represented in terms ofequivalent decision rules. The clarity of the rule representation of preferences enables one to seethe limits of these aggregation functions. Several studies [22,23,26,50] presented anaxiomatic characterization of all three kinds of preference models in terms of conjoint measurementtheory and in terms of a set of decision rules. The given axioms of “cancellation property” typeare the weakest possible. In comparison to other studies on the characterization of aggregationfunctions, these axioms do not require any preliminary assumptions about the scales ofcriteria. A side-result of these investigations is that the decision rule preference model is themost general among the known aggregation functions.

Future Directions

The article shows that the Dominance-based Rough Set Approach is a very powerful tool for decisionanalysis and support. Its potential goes, however, beyond the theoretical frame considered in thisarticle. There are many possibilities of applying DRSA to real life problems. The non-exhaustivelist of potential applications includes:

Decision support in medicine::

In this area there are already many interesting applications (see, e.g., [35,36,40,59]); however, they exploit the classical rough set approach. Applications requiring DRSA, which handle ordered value sets of medical signs, as well as monotonic relationships between the value of signs and the degree of gravity of a disease, are in progress;

Customer satisfaction survey::

Theoretical foundations for the application of DRSA in this field are available in [30]; however, a fully documented application is still missing;

Bankruptcy risk evaluation::

This is a field of many potential applications, as can be seen from the promising results reported, e.g., in [12,53,54]; however, a wider comparative study involving real data sets is needed;

Operational research problems, such as location, routing, scheduling or inventory management::

These are problems formulated either in terms of classification of feasible solutions (see, e.g., [11]), or in terms of interactive multiobjective optimization, for which the IMO-DRSA [32] procedure is suitable;

Finance::

This is a domain where DRSA for decision under uncertainty has to be combined with interactive multiobjective optimization using IMO-DRSA; some promising results going in this direction have been presented in [31];

Ecology::

Assessment of the impact of human activity on the ecosystem is a challenging problem for which the presented methodology is suitable; the applications to date are based on the classical rough set concept (see, e.g., [7,43]); however, it seems that DRSA, handling ordinal data, has a greater potential in this field.