
1 Introduction

Numerous empirical studies yield multidimensional data, say x_i, i = 1,…,n, with x_i = {x_ik}, where i denotes the observations, or objects, and k = 1,…,m denotes the descriptive variables, features, or criteria. We would often like to put these data into a ranking-type structure, i.e. to order the items i. This means, implicitly, obtaining a sequence of “ranks” {o_i}, corresponding to i = 1,…,n, where the o_i are natural numbers ranging from 1 to at most n. (We assume that the x_ik are the values of measurements regarding certain criteria, numbered k, k ∈ K = {1,…,m}, and that in all cases “the more, the better”. None of these assumptions limits the generality of the considerations.) Yet the sheer multiplicity of dimensions prohibits, as a rule, a straightforward ordering of the data items. This is the obvious consequence of situations in which x_ik > x_jk for some i, j and a definite subset of k ∈ K, while x_ik < x_jk for the same i, j and another subset of k′ ∈ K.

Thus, we very often stop at a poset as the result of the analysis. It encompasses all the cases where x_ik ≥ x_jk for all k ∈ K and leaves out all the others, and it is illustrated by the respective Hasse diagram.
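The construction of such a poset can be sketched in a few lines. The following is a minimal illustration, not taken from any particular software: the function names and the four toy objects are assumptions made here, and objects with identical value vectors are simply treated as unrelated.

```python
from itertools import permutations


def dominates(a, b):
    """True if a >= b on every criterion and a != b ("the more, the better")."""
    return all(x >= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def dominance_pairs(data):
    """All ordered pairs (i, j) such that object i dominates object j."""
    return {(i, j) for i, j in permutations(data, 2)
            if dominates(data[i], data[j])}


def hasse_edges(data):
    """Cover relations: i dominates j and no object lies strictly in between."""
    rel = dominance_pairs(data)
    return {(i, j) for (i, j) in rel
            if not any((i, h) in rel and (h, j) in rel for h in data)}


# Hypothetical toy data: four objects measured on m = 2 criteria.
data = {"a": (3, 3), "b": (2, 1), "c": (1, 2), "d": (1, 1)}
relation = dominance_pairs(data)
edges = hasse_edges(data)
```

Here b and c are incomparable (b is better on the first criterion, c on the second), so the Hasse diagram is a "diamond" with a on top and d at the bottom, and the pair (a, d) is dropped from the diagram because it follows by transitivity.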

There are—indeed numerous—situations, though, in which we would like to go beyond the poset “skeleton” and endow it with “flesh”, up to construction of a complete order, perhaps with some additional characterisation. (A feasible alternative might be a kind of information, resulting from the poset processing, that is effectively “close enough” to the actual ranking.)

We argue here that such extension of a poset may be legitimate, and a shorthand analysis is provided of why and how one could go about it, based on the essential properties of the analytical tasks in general.

2 Why Not?

There exist serious reservations about going beyond the poset structure as the result of an ordering-oriented study for the given set of data items {x_i}.

The primary one is that the empirical data do not contain any other information than that corresponding to the poset obtained. If we go further away from this point, first of all by adding (i,j) edges that do not exist on the original Hasse diagram, and especially toward the complete order, then we are unfaithful to the data. There is, actually, an often justified suspicion of manipulation, motivated by “political” interests, behind the operations, leading from the original data-based poset to some complete order. This suspicion may, of course, be well founded.

The second reservation refers to the fact that while forming a poset from the initial {x_i} data is straightforward and unambiguous, virtually all approaches meant to go beyond it either involve subjectivity, or have to refer to data that may have little to do with the original empirical data used in the study.

It is largely in view of these two types of reservations that the technique of counting the consistent linear extensions for a poset is advocated, which, even if still arbitrary, appears to be a possibly neutral operation, based only on the relation between the given poset and the structure of the entire lattice.
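To make the counting idea concrete, here is a brute-force sketch that enumerates the linear extensions of a small poset and derives average ranks from them. It assumes, as the counting technique does, that all extensions are equally weighted. The function names and the toy "diamond" poset are illustrative assumptions, and enumeration over all permutations is only feasible for a handful of objects.

```python
from itertools import permutations


def linear_extensions(items, relation):
    """Yield every complete order (best first) consistent with the poset.

    `relation` is a set of pairs (i, j) meaning "i is above j".
    """
    for order in permutations(items):
        position = {item: p for p, item in enumerate(order)}
        if all(position[i] < position[j] for i, j in relation):
            yield order


def average_ranks(items, relation):
    """Mean rank (1 = best) of each item over all linear extensions."""
    extensions = list(linear_extensions(items, relation))
    return {item: sum(ext.index(item) + 1 for ext in extensions) / len(extensions)
            for item in items}


# Hypothetical toy poset: a above b and c, both of which are above d.
items = ["a", "b", "c", "d"]
relation = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
extensions = list(linear_extensions(items, relation))
ranks = average_ranks(items, relation)
```

For this poset there are two linear extensions, (a, b, c, d) and (a, c, b, d), so the incomparable objects b and c share the average rank 2.5. The method adds no information that would separate them.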

3 But Perhaps…

On the other hand, there are quite obvious, and at that quite numerous and diverse, reasons for insisting on complementing a poset to completeness, or at least on transforming it in a definite direction and manner. These result from considerations associated with the aspects roughly illustrated in Fig. 1.

Fig. 1 The environment of the studies, leading to data structures, including posets

3.1 The Purpose and the Utility

First is the sheer utility: it may be that the very objective of the endeavour from which the data originate includes the determination of a (possibly) complete order, for quite practical purposes. The lack of such a structure may mean a failure and a loss in economic or social terms.

This argument involves a much broader background, involving such notions as: a problem, an image (model, theory, perception) of the problem, the need to deal with it (to resolve it), the need to cognise it (to identify its structure and mechanism), the need to apply definite means to resolve the problem, based on the cognition of the problem, on the purpose (the objective), and the instruments we operate. All these enter the classical decision-making loop of Fig. 2.

Fig. 2 The classical decision-making loop, in which the notions referred to appear

If our purpose does not involve (imply) a decision or a policy, i.e. an action regarding the problem, then cognition may be the last step in the procedure, and we might not need anything more than a poset when we compare some objects or states, especially when the sheer binary relations between the values x_ik matter more than the values themselves.

This last remark is quite telling. The simplest pertinent situation is outlined in Example 1 (which will be continued further on, through the addition of consecutive aspects).

So, if there is a purpose requiring action, based on a decision or policy, then not only ordering may be required, but also measurement of the quantities x_i and their transformations (mappings). An illustration of a situation with an explicit structure of purpose and instruments is provided below, with a hint at an important proposition.

Assume we deal with two dimensions of “wealth”: k = 1 (income) and k = 2 (“usable wealth”), the latter meaning a lump assessment of the value of assets, considering mainly their utility and only secondarily their market value (e.g. a car as a means of transport, not as a saleable good; ownership of a dwelling with its equipment being the primary instance). Even though there is a high correlation between the two dimensions, there are numerous cases in which households with lower incomes have ampler “usable wealth” at their disposal. This is especially important when such situations occur close to the border of the derived deprivation function, D(·), namely near D(x) = 0, whether we speak of x_·1 or of x_·2. If the authorities have only one instrument at their disposal, the general subsidy, then a single “reaction” (decision) function S(D_1, D_2) has to be developed, meaning, in fact, an appropriate weighting (implicit or explicit) of the two dimensions. Now, assume that the authorities can deploy a second instrument, say, a non-transferable allowance for housing costs. We then deal with S_1(D_1, D_2) and S_2(D_1, D_2). The fundamental question is: how are the two pairs of dimensions interrelated? Most conveniently, the dimensions k would correspond directly to the instruments. If such a correspondence existed, even in the form of a demonstrable correlation, then the task of the authorities would be straightforward, and no additional analysis, beyond (two) unidimensional rankings, would be necessary.

Thus, it is obvious that when we have more than one instrument at our disposal, there is room for finding plausible structures other than a single ranking. This is the case in many situations where, in fact, not just a single ordering is required but rather some structure, obtained from the poset, which corresponds on the one hand to the “measurements” made and on the other to the ultimate purpose of the exercise, which need not be unidimensional.

3.2 The Data Themselves

The second broad motivation to go beyond the raw poset obtained from the collected data comes, paradoxically, from the data themselves. There are, indeed, several aspects, or sources, of this kind of motivation:

  1. The data which served to set up the poset are often, if not always, charged with definite uncertainty, coming from various origins. A trivial illustration of this fact is provided in Table 1.

    Table 1 Some examples of the sources and character of uncertainty in data specification

  2. There exists a definite “model” or “theory”, which was, at least to some extent, the background for the study considered, the “model” or “theory” resulting, at the minimum level, in (a) the selection and specification of criteria (variables), k = 1,…,m, and (b) the ways in which they are measured, or “evaluated”. Thus, these criteria or variables are not some jack-in-the-box entities, about which we know nothing, and which cannot be subject to any operations, analytical or manipulative. They are an internally consistent fragment of the perception of the reality on which we might wish to act, based on the results of the study at hand.

  3. Accompanying the “model” or “theory” there are empirical data, which are parallel to and intertwined with those that formed the basis for formulating the model or theory, but also for undertaking the study under consideration. These data offer a “logic” of their own, through statistical or other relations between them, but also through their characteristics in terms of uncertainty. Here, also, belongs the issue which is very often formulated as one of the main reproaches against the “pure poset” approach, namely the absence of a scale of values of the variables (criteria), whenever they are not binary or strictly nominal.

Figure 3 shows the notions related both to data themselves (metadata, complementary data, etc.) and to the broader environment, partly introduced in connection with Fig. 2. The difference between Figs. 2 and 3 consists also in the fact that Fig. 3 presents more of the actual “data processing” aspect than of the general “thought framework”, oriented at the problem and the potential solution to it (a “policy” or a “decision”), with respect to which the poset or another structure, resulting from the study, constitutes only an instrument.

Fig. 3 Scheme of interrelations between the components of the study, to be considered when analysing the ordering of observations

4 So What?

None of the aspects listed above can be simply shrugged off. Depending upon the case, these aspects intervene in various manners and with varying importance. This chapter is not meant to provide any definite methodological proposals or solutions: the problems touched upon differ so widely in their structure and character that dozens of theories and methodologies may not suffice to encompass all the situations, technical variants aside. Thus, we shall concentrate on some types of situations, and offer comments related to them.

In the light of the above, it is obvious that an approach that tries to go beyond the “pure poset” result must first of all account for the aspects mentioned, in order to avoid the trap of pursuing technical correctness at the expense of the actual issues.

4.1 The Purpose and the Policy Instruments

As said, if our purpose is just “to know”, and there is no possibility of developing and verifying a true-to-life model of the phenomenon, obtaining a poset may indeed be the justified terminal result. The situation may turn out to be similarly straightforward, although leading to different results (e.g. single-dimension orders), when we explicitly account for more than one “resulting dimension” (instrument to be applied). For this, though, additional assumptions have to be satisfied, first of all concerning the relations between the criteria of assessment and the instruments envisaged. There may exist cases in which the presence of multiple instruments, corresponding to potential (or “required”) multiple rankings, even if not leading to the ultimate simplification in the form of single-dimensional complete orders, may allow for a “disentangling” of the poset, in the sense of obtaining more than one structure, each of the resulting structures being closer to a complete order than the original poset, without any additional operations. Yet there might also exist situations in which the relations mentioned lead to more complex issues and potential structures, and the initial problem remains unsolved.

The analysis of these potential situations is not only beyond the scope of the present note, but may also turn out to be highly complicated, at least for more general cases. Still, the possibility of simplification should be kept in mind when designing the respective study and when proceeding with analysis of data. This issue is quite closely associated with the subsequent ones, which refer to the consideration of a “model” or “theory”, standing behind the study.

4.2 The “Model” Behind the Study

Whenever a study, including data collection, is undertaken, there must exist some concept, underlying the very launching of the study, as well as the potential “decision” or “policy”, possibly together with the instruments considered. While this is obvious, the scope and the degree of precision of such models range extremely broadly: from situations, where very little is known or assumed, up to those, where definite, well-grounded hypotheses are being verified against a broader knowledge of the respective domain. Yet, even in the former situations, it cannot be accepted that our entire knowledge consists in saying “there is a problem”: this knowledge, actually, led to the specification of the objects, variables, methods of measurement, etc. It also contributed, in a vast majority of cases, to some evaluation framework, that is—variables turned into criteria, along with respective scales.

Even if we neglect the models involving definite hypotheses (e.g. “the poor can be classified into two classes, class A, for which…, and class B, …”), and the aspect of instruments/policies, there must exist some “minimum specification”, originating from the most primitive perception of a given problem. This minimum specification almost certainly involves some concept either directly involving comparability or leading to a possibility of comparison. Let us consider this using an example from Table 1. Assume, namely, that we consider the case of batteries, and we deal with four, otherwise quite the same, batteries, as exemplified in Table 2.

Table 2 An illustrative example: four kinds of batteries

It is obvious that we are not capable of simply ordering these objects, but, on the other hand, we can formulate and ask questions that help us come at least closer to the ultimate linear order. We should keep in mind that we deal with a one-at-a-time situation: it would be different if we were buying batteries sequentially. Then, the optimum decisions could take the form of a closed-loop strategy, based on the results from the preceding decision, as in Fig. 2 (with the incremental objective function based on the cost of duration over time).

The types of questions, and the potential answers, form part of the usual decision-making process, and are, indeed, applied both in the proper context of poset-based analysis (as in the proposal from Tagliabue et al. 2015) and, more generally, in the context of multicriteria decision-making (as in Kaliszewski and Miroforidis 2014, or Kaliszewski et al. 2014).

4.3 The Statistical Features of the Data

It is also frequent that an important aspect of the study is constituted by the statistical—in the popular sense—characterisation of the objects to be ordered. This usually means that we deal with the numbers of observations, or occurrences, of a given object. This has an obvious relation to the model or theory that we may have. We often interpret such statistics as some reflections of probabilities, resulting from the “inner” working of the process or system. This aspect, again, can by no means be overlooked, even if our goal is just to (somehow) order the objects.

Thus, if the statistics reflect some reality, inherent to the system at hand, and there are significant differences among the numbers of occurrences, then not all linear orderings of the objects considered should be taken as equally probable. Even though not a straightforward exercise (additional assumptions have to be made), the statistics ought to be used to determine the probabilities (“weights”) of the particular linear extensions, in the approaches as described and analysed in Lerche et al. (2003), Patil and Taillie (2005), or De Loof et al. (2008). The same applies to the counting approach.

Thus, we would not be introducing the “weights” through some subjectively designed back door: they are a direct reflection of the data, coming from the same study, having the very same degree of “legitimacy”, and a similar, or even higher, level of reliability (up to a truly well-based statistical analysis of the distribution of particular “paths”, i.e. extensions, through the entire set of possible states).

Let us also indicate that the “statistical” aspect of a problem or study may entail a plethora of different problem structures, calling for entirely different approaches, even if in virtually each of them a poset structure might be obtained. So, in particular, we may deal with unique, separate cases (e.g. in the characterisation of a set of chemical compounds, or a set of tender offers), or with a sample, and possibly even an entire population, in which certain “states” appear, their numbers of appearances often differing widely, while other ones do not appear, or have very low (“exceptional”) frequencies. As an illustration, we give a realistic, though quite stylised, example of school classification (assessment of pupils) in the lower grades of primary school in Poland.

Thus, assume pupils are classified with respect to four broadly conceived domains: 1. Behaviour, social attitude, cooperation; 2. Humanities; 3. Sciences; 4. Physical and technical exercises. The assessments are made on the 5-point scale: 5—very good, 4—good, 3—sufficient, 2—insufficient, 1—very bad. In a class of 25 pupils, we may have the “statistics” of assessments as in Table 3.

Table 3 An example of feasible pupil assessments for a class of 25 pupils (grades from 1 to 5 in four domains: numbers of pupils)

First, there are far fewer objects than possible states (25 vs. 625, or rather, in terms of states, 13 vs. 625, see Fig. 4). Then, most importantly, there is a clear “statistical” nature to such an exercise. Actually, if we have, say, 12 binary variables and a sample or population of, say, 10,000 items, there is definitely a high probability that some of the possible 2^12 = 4096 states will be “empty”, or “close to empty”, while others might group quite a number of items.
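The claim about empty states can be checked with a rough calculation. The sketch below assumes that each item falls independently and uniformly at random into one of the possible states. This uniformity is an assumption introduced here purely for illustration, and it is the most favourable case: any unevenness in the state probabilities can only increase the expected number of empty states. The function name is hypothetical.

```python
def expected_empty_states(n_states, n_items):
    """Expected number of unoccupied states when each item falls
    independently and uniformly at random into one of n_states states."""
    return n_states * (1 - 1 / n_states) ** n_items


# 12 binary variables and 10,000 items, as in the text.
empty_binary = expected_empty_states(2 ** 12, 10_000)

# The pupils' example: 5 marks in 4 domains (5**4 = 625 states), 25 pupils.
empty_pupils = expected_empty_states(5 ** 4, 25)
```

Even under this favourable assumption, a few hundred of the 4096 binary states are expected to stay empty, and in the pupils' example at most 25 of the 625 states can be occupied at all.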

Fig. 4 Illustration of the partial order for the data of Table 3

Several regularities, well known to Polish teachers, parents and children alike, appear in the data. One of them is the “shortness” or “flatness” of the distribution. Another, known also in other countries, is the relation between the marks for humanities and sciences. Here, for 14 pupils the assessments are equal, for three the marks for humanities are lower than for sciences, and for eight the opposite holds. A hint for linear extensions, indeed.

Actually, in order to specify all the linear extensions in this case, we need only altogether 22 evaluation “states”, as listed in Table 4 below.

Table 4 Assessment “states” from Table 3, complemented with the missing ones for full extensions

Note that both the “a priori” model or theory and the direct implications of the “statistics” of the empirical results constitute a different basis for potential processing of the results, including poset extensions, than those mentioned in regard to the mutual or “absolute” importance of criteria, or variables in Sect. 4.2, and also those from Bruggemann and Carlsen (2015).

4.4 Representations of Uncertainty: Just One Hint

Another important aspect of the perception outlined is that of the direct representation of uncertainty. This representation might be statistical or probabilistic, given sufficient knowledge and an adequate sample or population. Yet, in the situation we are dealing with here, we can hardly afford such assumptions (were this not the case, we would not have to face the issue that we are trying to resolve). In such cases one of the ways out is to use a fuzzy set-based representation, which we shall illustrate for the example of the previous section.

Thus, we can assume that the actual meaning of the assessment marks for humanities, sciences and exercises is as shown in Fig. 5, while for behaviour and cooperation it is as in Fig. 6. The meaning of these definitions of marks is that “there is no precise statement of a mark X corresponding to the level of quality = X, for at least two reasons: first, the marks are strongly discrete, while the actual assessment concerns a continuum; second, various aspects (criteria) of assessment are involved (effort, diligence, skill, knowledge,…); so, the actual mark X corresponds to a fuzzy number X*”. We assume consistency in these quasi-definitions (e.g. for two neighbouring “marks” only one can be equal to 1 at any point), although this may not be necessary at all.

Fig. 5 Potential fuzzy “definitions” of the marks for humanities, sciences, and exercises

Fig. 6 Potential fuzzy “definitions” of the marks for behaviour and cooperation
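One simple way to encode such fuzzy “definitions” is through trapezoidal membership functions. The sketch below is only an assumed illustration: the shapes are not those of Figs. 5 and 6, and the breakpoints (core of width 0.5 and support of width 1.5 around each mark) are hypothetical values chosen so that the cores of neighbouring marks do not overlap, in line with the consistency assumption above.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoidal fuzzy number (a <= b <= c <= d).

    Rises linearly on [a, b], equals 1 on [b, c], falls linearly on [c, d].
    """
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def mark_membership(x, mark):
    """Hypothetical fuzzy mark: core [mark - 0.25, mark + 0.25],
    support [mark - 0.75, mark + 0.75]. Not taken from Figs. 5 and 6."""
    return trapezoid(x, mark - 0.75, mark - 0.25, mark + 0.25, mark + 0.75)


# An underlying "true" quality level lying between marks 3 and 4.
quality = 3.5
degrees = {m: mark_membership(quality, m) for m in range(1, 6)}
```

With these assumed shapes, a quality level of 3.5 belongs to the marks 3 and 4 to the degree 0.5 each and to no other mark, which expresses the idea that a discrete mark covers a stretch of the underlying continuum.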

What may be the consequences of such a character of data for the ordering? Comparing fuzzy numbers is possible (see, e.g. Brunelli and Mezei 2013) even if—like many operations over fuzzy sets—quite heavily loaded with arbitrariness. So, it would be possible, in principle, to employ such data in both linear extensions and the counting approach (the issue of scalability left apart), but either under very stringent conditions (e.g. all the fuzzy sets, representing numbers, have the same form, and all of the comparisons take, therefore, standard forms) or in a very simple manner (e.g. fractional weights).

On the other hand, though, it would be quite feasible to aggregate such numbers (e.g. like in clustering), and also make a straightforward ordering. The operations on fuzzy sets, representing variable values, and their collections, representing objects, would have to involve a high degree of arbitrariness, but such a possibility exists, and is indeed being made use of.

4.5 Back to the Essential Issue

Thus, the basic question arises: how much do we, in fact, sacrifice, either way? What is the cost of losing the quantitative measurements, or of disregarding the known or hypothesised interrelations? And vice versa: how big is the risk of introducing arbitrariness into the “raw” result, and can we measure the degree of arbitrariness? And so on.

Some of the pertinent questions shall remain unanswered, or answers would only be very superficial. This means simply trying out various techniques, allowing for the possibly effective extension of the poset towards the complete order, and, eventually, those that might lead straight towards the complete order from the initial data, while violating as little as possible the formal requirements, and taking into account to the maximum extent the data available and the associated knowledge. This means, also, that we should try to find the answers to such questions as:

  • Can we establish conditions for equivalence between the various procedures, related to poset, based on the output from definite data, these procedures aiming at obtaining the complete order?

  • If not, what would be the results of comparison of the obtained complete orders, including the properties of the respective structures, as related to the initial poset?

This involves, in particular, the issue of:

  • Potential parallelism and differences between the counting of linear extensions consistent with the poset (assumption of equal probabilities!) and the solution to the relational programming problem (see Owsiński 2011), the former possibly enriched with the assessment of different probabilities of the extensions, the latter being a way to obtain the Kemeny-median-like structure from the data.
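As a point of comparison, a Kemeny-median-like order can be obtained for very small problems by brute force. The sketch below is an assumed illustration only, not the relational programming formulation of Owsiński (2011): it searches all complete orders for the one with the fewest disagreements with the individual criteria, each criterion counting equally. The function names and the toy data are hypothetical.

```python
from itertools import combinations, permutations


def disagreements(order, data):
    """Count (pair, criterion) cases in which `order` places i above j
    although criterion k gives i a strictly lower value than j."""
    total = 0
    for i, j in combinations(order, 2):  # i precedes j in `order`
        total += sum(1 for xi, xj in zip(data[i], data[j]) if xi < xj)
    return total


def median_orders(data):
    """Minimum number of disagreements and all complete orders attaining it."""
    scored = [(disagreements(order, data), order) for order in permutations(data)]
    best = min(score for score, _ in scored)
    return best, [order for score, order in scored if score == best]


# Hypothetical toy data: b and c are incomparable in the poset,
# but b is better on two of the three criteria.
data = {"a": (3, 3, 3), "b": (2, 2, 1), "c": (1, 1, 2), "d": (1, 1, 1)}
best, orders = median_orders(data)
```

In this toy case the median order is unique and places b above c, because two criteria out of three favour b. Equal-probability counting of linear extensions would treat b and c symmetrically here, so the two procedures need not coincide, and the difference stems from the implicit equal weighting of the criteria.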

5 A Real-Life Example Without Constructive Conclusions

5.1 The Project Outline

In 2011–2014 a modelling project was carried out at the Systems Research Institute, Polish Academy of Sciences, in collaboration with the Institute of Geography and Spatial Organization of the Academy, on the development and implementation of forecasting models for the variables, characterising the socio-economic situation of the capital province of Masovia (for an ampler description, see Czapiewski et al. 2016).

The list of modelling domains was specified by the commissioning institution, Masovian Bureau of Regional Planning. Altogether some 70 indicators were developed, along with the respective models—usually relatively simple econometric models. All this had to be done for each of the 314 municipalities of the province and for 15 years. The biggest was, of course, the demographic model, producing at each run some half a million numbers (municipalities, age groups, sexes, etc.) (Table 5).

Table 5 The list of domains and indicators of the project illustrated

For a number of domains, model developers were asked by the commissioning agency to provide a “synthetic indicator”, given the presence of, say, three to five indicators, oriented at definite phenomena. This was done in some cases, but, generally, the reaction of the respective model authors was negative (“how do we put variables X and Y together, if they express completely different phenomena/processes?”). Yet it must be admitted that these same authors often made quite arbitrary choices when selecting variables and developing their particular models. The question would therefore be quite justified: how does one compare these two kinds of arbitrariness, and if one is to be accepted, why should the other one not be?

5.2 An Illustrative Case

In one of several such cases of a “need to develop a synthetic indicator”, in domain no. 9: Technical infrastructure, there were three variables, corresponding to the shares of inhabitants of the municipalities (in %) with access to the water supply system, the sewage system, and a water treatment plant. So, all values ranged between 0 and 100. The author of the model in question declined to provide an integrative indicator of the “state of technical infrastructure”.

Yet, it was easily shown that in the population of 314 municipalities of the province there are some—otherwise absolutely obvious—regularities, such as:

  • Highly stable and mostly significant correlations between all three variables.

  • These correlations needing a correction in view of the 100 % upper bound on values.

  • High correlation with population density.

  • In view of the latter, the “synthetic indicator” had to be corrected for density/urban character.

Altogether, the conclusions and the results from this microstudy were as follows:

  1. The three variables could be replaced relatively safely by one (in just a few cases doubts might have arisen), most handily their average.

  2. The proper “quality indicator” was taken as a function of the former, with an experimentally established divisor, based on population density and population number.

We never discussed this approach with the author of the model, who did not consent to participate in the exercise.

6 Some Conclusions

Figure 7 summarises the conclusions, according to which the exercise of ordering objects is (almost) always just a piece of a broader process or system, and should be viewed, and treated, as such. Hence, the direct results, while valid in themselves, are a part of a broader perspective, and therefore can be processed, under definite assumptions, so as to provide the feasible and necessary information.

Fig. 7 General framework for the data analytic studies involving possible ordering structures