Introduction

The aim of this paper is to provide a Discrete Mathematics point of view of some statistical applications. Our two main lines of thought are Design Theory and statistical distances. Design Theory attracts interest from group theory and projective geometry; it is discussed here, with some emphasis given to Latin squares. We also recall the theory of ideals and provide some aspects of probability theory that, we believe, deserve more attention.

The notion of distance is fundamental in Statistics. In mathematical analysis, especially in metric spaces, the distance serves as a criterion to check the convergence of a sequence, while a sequence in Statistics (a typical example being the Robbins–Monro iterative scheme) is required to converge in distribution, usually to the normal distribution; see [13] for details.

The reason is that such sequences can provide maximum likelihood estimators (MLE), remaining within the classical statistical framework, while other methods might not.

The notion of “concept,” associated with “objects” and “attributes,” is introduced, from which the idea of a set-theoretic distance, in a Discrete Mathematics sense, emerges.

Geometric methods, and therefore distance metrics methods, are adopted in various problems in Statistics. In optimal Experimental Design Theory for the continuous case, geometric methodologies are considered on the induced design space and the relative geometrical aspects have been discussed by Kitsos et al. [20]. For the discrete case, the geometrical approach is tackled in a compact form in this paper.

In principle, the usual (geometrical) distance metric in Statistics is considered to be the Euclidean distance, based on the \(\ell_2\) norm, but this is not the case for Discrete Mathematics; see section “Finite Geometry and Design Theory.” In section “Discrete Mathematics and Statistics,” the relation between Discrete Mathematics and Statistics is developed. The Experimental Design Theory is also discussed, especially the Latin squares. Moreover, a finite geometry approach is also developed in a compact form. In section “Discrete Mathematics and Probability,” the applications of Discrete Mathematics to Probability are presented, while in section “Discrete Distance Measures,” certain distance measures are discussed.

Discrete Mathematics and Statistics

Introduction

Discrete Mathematics offers a strong background to statistical problems, especially to the Design Theory. We shall trace some of these applications, bringing practice with theory. Consider the practical problem where a manufacturer is developing a new product. Let us assume that he/she wishes to evaluate v varieties (of the product) and asks a number of consumers to test them. However, it seems impossible in practice to ask each consumer to test all the varieties. Therefore, two lines of thought might be adopted:

  1. 1.

    Each consumer tests the same number of varieties, say k < v.

  2. 2.

    Each variety should be tested by r consumers.

The above problem gives rise to the following generalization: Let X be any set of size v, i.e., v := |X|. We say that a set \(\mathcal B\) of k-subsets of X is a design with parameters v, k, r, denoted by D(v, k, r), when each member x ∈ X belongs to exactly r ≤ v sets of \(\mathcal B\). In Design Theory, a subset \(B\in \mathcal B\) is called a block of the design under investigation.

Suppose now that C denotes any set of k-subsets of X with v := |X|. In general, we say that the marks (i.e., the readings of an experiment) are members of the set

$$\displaystyle \begin{aligned}\mathfrak C := \{(x,C)\}_{x\in C}. \end{aligned} $$
(1)

What in Statistics is known as a replication of a value x is the row total r(x) := #({C :  x occurs in C}). The column total is k in all the cases by the definition of C. Therefore, it is easy to see that

$$\displaystyle \begin{aligned}\sum_{x\in X}r(x) = k|C|.\end{aligned} $$
(2)

Example 1

The above discussion can be visualized with Table 1 where X := {1, 2, …, 6}. Notice that \(\sum r(x) = 12\).

Table 1 The table of Example 1 with k = 3, |C| = 4
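As a quick illustration of relation (2), the following minimal sketch (Python; the four 3-subsets below are a hypothetical choice and need not coincide with the blocks of Table 1) computes the replications r(x) over X = {1, …, 6} and checks that they sum to k|C| = 12.

```python
# Hypothetical collection C of k-subsets of X = {1, ..., 6}, with k = 3 and |C| = 4.
X = set(range(1, 7))
C = [{1, 2, 3}, {1, 4, 5}, {2, 4, 6}, {3, 5, 6}]
k = 3

# Replication r(x): the number of members of C in which x occurs.
r = {x: sum(1 for block in C if x in block) for x in X}

print(r)                            # here every r(x) equals 2
print(sum(r.values()), k * len(C))  # both sides of relation (2): 12 and 12
```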

In principle, when we are working on a statistical design D(v, k, r), then r(x) := r, and since we are working with the blocks of \(\mathcal B\), with b := |\(\mathcal B\)|, relation (2) is then reduced to vr = bk, and hence

$$\displaystyle \begin{aligned} b = \frac{vr}{k}\le\left(\begin{smallmatrix}\displaystyle v\\ {} \displaystyle k\end{smallmatrix}\right).\end{aligned} $$
(3)

In general, it can be proved that there is a design D(v, k, r) if and only if (iff) k | vr and

$$\displaystyle \begin{aligned} \frac{vr}{k}\le\left(\begin{smallmatrix}\displaystyle v\\ {} \displaystyle k\end{smallmatrix}\right). \notag\end{aligned} $$

The condition that each object (i.e., element of X) belongs to the same number of blocks can be strengthened. In particular, it can be required that a pair of objects or, more generally, any t objects taken at a time belong to the same number of blocks: this is known as a t-design.

Let X be a set with v := |X|. Then a set B of k-subsets of X is said to be a t-design, denoted by D(v, k, r t), iff for each t-subset T of X, the number of blocks which contain T is a constant r t. It is interesting to notice that

  1. i.

    If B is a t-design, it is also an s-design, 1 ≤ s ≤ t − 1.

  2. ii.

    If B is a D(v, k, r t) t-design, it is then also a D(v, k, r s) s-design, with

    $$\displaystyle \begin{aligned}r_s = r_t\frac{(v-s)(v-s-1)\cdots(v-t+1)}{(k-s)(k-s-1)\cdots(k-t+1)}.\end{aligned} $$
    (4)
  3. iii.

    For 0 ≤ s ≤ t − 1, it is required that

    $$\displaystyle \begin{aligned}(k-s)(k-s-1)\cdots(k-t+1) \big| r_t(v-s)(v-s-1)\cdots(v-t+1). \notag\end{aligned} $$
  4. iv.

    A recursive formula holds, namely

    $$\displaystyle \begin{aligned}r_s = r_{s+1}\frac{v-s}{k-s},\quad 0\le s\le t-1. \notag\end{aligned} $$

  5. v.

    If b = |B|, then

    $$\displaystyle \begin{aligned}b = r_0 = r_1\frac{v}{k}. \notag\end{aligned} $$

Usually a 2-design with k = 3 and r 2 = 1 is called a Steiner Triple System (STS). Such a system can be constructed by considering two words, say u 1 and u 2, both of length n in the alphabet {0, 1}. Let u 1 + u 2 denote the word obtained by adding the corresponding digits of u 1 and u 2 modulo 2. Then, if we consider X to be the set of all such words with the exception of 00…0, the set of all 3-subsets of X of the form {u 1, u 2, u 1 + u 2} is a 2-design D(2^n − 1, 3, 1), i.e., an STS with 2^n − 1 varieties.
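The STS construction just described is easy to carry out mechanically; the following sketch (Python, with the word addition taken coordinate-wise modulo 2, as assumed above) builds the blocks {u 1, u 2, u 1 + u 2} for n = 3 and checks the defining property of D(2^n − 1, 3, 1), namely that every pair of distinct words lies in exactly one block.

```python
from itertools import combinations, product

n = 3
# X: all nonzero binary words of length n.
X = [w for w in product((0, 1), repeat=n) if any(w)]

def add(u, v):
    # Coordinate-wise addition modulo 2 of two binary words.
    return tuple((a + b) % 2 for a, b in zip(u, v))

# Blocks {u1, u2, u1 + u2} over all pairs of distinct words.
blocks = {frozenset({u, v, add(u, v)}) for u, v in combinations(X, 2)}

# Every pair of distinct words lies in exactly one block (r_2 = 1).
assert all(sum(1 for B in blocks if {u, v} <= B) == 1
           for u, v in combinations(X, 2))
print(len(X), len(blocks))  # v = 2**n - 1 = 7 varieties and b = 7 blocks
```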

Latin Squares

In Statistics, and especially in Experimental Design Theory, Latin squares (LS), as proposed by Fisher in [5], play an important role; see [2, 23, 24] and [3] among others. The traditional example is the following: Suppose we have to plan an agricultural experiment in which four new kinds of fertilizers, say A, B, C, and D, are to be tested in a square field. The “scheme” has to be as in Table 2 in order to describe an LS.

Table 2 A 4 × 4 Latin square L 1

In principle, an LS of order n is an n × n array in which each of the n elements occurs once in each row and once in each column. The statistical term LS comes from the fact that R.A. Fisher used “Latin letters” to cover the “square” field in his agricultural experiments.

We shall denote with L(i, j) the (i, j) entry of an LS L, while the labels for the rows and columns of L, as well as all its entries L(i, j), shall be considered as \(\mathbb Z_m\)-valued, with \(\mathbb Z_m\) being the set of nonnegative integers modulo m.

Given two Latin squares of the same order n, say L 1 and L 2, the orthogonality relation between L 1 and L 2 can be defined. Indeed, L 1 ⊥ L 2 iff for every pair of symbols \(k,l\in\mathbb Z_n\) there is exactly one position (i, j) for which

$$\displaystyle \begin{aligned} L_1(i,j) = k,\ \ \mbox{and}\ \ L_2(i,j) = l. \notag \end{aligned} $$

Following this terminology, Euler in 1786 was the first to suggest constructing an orthogonal pair of Latin squares of order six. He was unable to solve this problem. It is now known that no such pair exists. Actually, the problem that Euler posed was:

Given 36 officers of 6 different ranks from 6 different regiments, can they be arranged in a square in such a way that each row and each column contains one officer of each rank and one officer from each regiment?

What is there to admire? The physical problem itself or the underlying mathematical insight which tries to tackle the problem? We shall come back to this problem later.

Although Euler’s problem has no solution, there is a way of constructing orthogonal LS. Theorem 2 below offers a method of constructing orthogonal LS (OLS) based on the properties of \(\mathbb Z_p\), with p being a prime.

Theorem 1

For each v ≥ 2, the v × v array defined by L(i, j) = i + j, with \(i,j\in\mathbb Z_v\) and addition in \(\mathbb Z_v\), is an LS.

See Appendix for the proof.

Theorem 2

Let p be a prime and let \(a\in\mathbb Z_p\), a ≠ 0, be given. Then, the rule

$$\displaystyle \begin{aligned}L_a(i,j) = ai+j,\quad i,j\in\mathbb Z_p, \end{aligned} $$
(5)

defines an LS. Furthermore, for given b ≠ a, \(a,b\in\mathbb Z_p\setminus\{0\}\), it holds that

$$\displaystyle \begin{aligned}L_b\perp L_a. \end{aligned} $$
(6)

See Appendix for the proof.

Example 2

The LS, say L 2, of Table 3 below is orthogonal to the LS L 1 of Table 2, as the 16 pairs (A, A), (A, B), …, (D, D) each occur in exactly one of the 16 positions, i.e., L 2 ⊥ L 1 by the definition of LS orthogonality.

Table 3 A 4 × 4 Latin square L 2

Based on Theorem 2, we can obtain a set of p − 1 mutually orthogonal LS (MOLS) of order p (MOLSp) for each prime p.

Example 3

For p = 3 we get two MOLS3, i.e.,

$$\displaystyle \begin{aligned}L_1:\ \begin{array}{c@{\quad }c@{\quad }c}0 &1 &2\\ 1 &2 &0\\ 2 &0 &1\end{array}\ \ \ \mbox{and}\ \ \ L_2:\ \begin{array}{c@{\quad }c@{\quad }c}0 &1 &2\\ 2 &0 &1\\ 1 &2 &0\end{array} \notag\end{aligned} $$
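A minimal sketch (Python) of the construction behind Theorem 2 and Example 3: for a prime p it forms the p − 1 squares L_a(i, j) = ai + j (mod p) and verifies that they are mutually orthogonal, i.e., that every ordered pair of symbols occurs at exactly one position.

```python
from itertools import combinations, product

p = 5  # any prime

# L_a(i, j) = a*i + j (mod p) for a = 1, ..., p - 1, as in Theorem 2.
squares = {a: [[(a * i + j) % p for j in range(p)] for i in range(p)]
           for a in range(1, p)}

def orthogonal(L1, L2, n):
    # L1 is orthogonal to L2 iff the n*n ordered pairs of entries are all distinct.
    pairs = {(L1[i][j], L2[i][j]) for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n

assert all(orthogonal(squares[a], squares[b], p)
           for a, b in combinations(range(1, p), 2))
print(p - 1, "mutually orthogonal Latin squares of order", p)
```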

So far, we discussed that for a prime p, using the properties of the field \(\mathbb Z_p\), it is possible to construct a set of p − 1 MOLSp. The same result holds if we replace p with a prime power q := p^r, \(r\in\mathbb N\). Indeed:

Theorem 3

For q = p^r being a prime power of p, it is possible to construct q − 1 mutually orthogonal Latin squares of order q (MOLSq).

Proof

Apply Theorem 2 where \(\mathbb Z_p\) is now replaced by the finite field with q elements, its q − 1 nonzero elements playing the role of the nonzero elements of \(\mathbb Z_p\).

In practice, given any order n, we would like to construct n − 1 MOLS. Due to Theorem 3, the arising question is:

Question

Is it possible to construct n − 1 mutually orthogonal Latin squares of order n (MOLSn) when n is not a prime power?

Answer

In Discrete Mathematics and Statistics, this is one of the most well-known unsolved problems. For the case n = 6, it is already known that there is no set of 5 MOLS6 (recall Euler’s problem mentioned earlier, which has no solution).

Another approach to Design Theory, in the context of Discrete Mathematics, is through geometry, and in particular through finite geometry on projective planes. Recall that there are two main approaches to finite plane geometry: affine and projective. In an affine plane, the normal sense of parallel lines is valid. In a projective plane, by contrast, any two lines intersect at a unique point, so parallel lines do not exist. Finite affine plane geometry and finite projective plane geometry can be described by simple axioms. An affine plane geometry is a nonempty set X (whose elements are called “points”), along with a nonempty collection L of subsets of X (whose elements are called “lines”), such that:

  1. (a)

    For every two distinct points, there is exactly one line that contains both points.

  2. (b)

    Given a line \(\ell \) and a point p outside \(\ell \), there exists exactly one line ℓ′ containing p such that \(\ell \cap \ell ' = \varnothing \) (Playfair’s axiom).

  3. (c)

    There exists a set of four points, no three of which belong to the same line.

The last axiom ensures that the geometry is not trivial (either empty or too simple to be of interest, such as a single line with an arbitrary number of points on it), while the first two specify the nature of the geometry. Note that the simplest affine plane contains only four points; it is called the affine plane of order 2, where the order of an affine plane is the number of points on any line. Since no three of its points are collinear, any pair of points determines a unique line, and so this plane (of four points) contains six lines. It corresponds to a tetrahedron where nonintersecting edges are considered “parallel,” or to a square where not only opposite sides but also diagonals are considered “parallel”; see Fig. 1.

Fig. 1
figure 1

A finite affine plane of order 2. The two diagonal lines do not intersect

In optimal Experimental Design Theory, a geometry is constructed via the induced design space; see [12, 27] and [18] among others. In this paper, the geometric approach is realized in the following through a finite field GF(q) with q elements.

Finite Geometry and Design Theory

Consider the plane GF(q) × GF(q), where GF(q) is the finite field with q elements. The “coordinate” or analytic plane geometry of \(\mathbb R^2\) remains valid for the elements of GF(q) × GF(q). As all the algebraic “manipulations” hold in GF(q), the sense of “line” and “plane” for a finite number of “points” is also apparent there. In particular, the lines are the blocks of a 2-design on the set of points; see Theorem 4 below. Thus, a line satisfies the analytic expression ax + by + c = 0, where a, b, c ∈ GF(q) with (a, b) ≠ (0, 0).

Theorem 4

Consider the plane GF(q) × GF(q), equipped with lines as above. Then, the lines are the blocks of a 2-design D(v, k, r 2) on the set of points. In particular, the design is D(q 2, q, 1).

See Appendix for the proof.

The D(q 2, q, 1) design, described in Theorem 4, is usually known as the affine plane over GF(q) (see also the proof in Appendix). Recall the recursive relation of point (iv) above. For the 2-design above, i.e., for t := 2, it is

$$\displaystyle \begin{aligned}r_1 = r_2\frac{v-2+1}{k-2+1} = r_2\frac{v-1}{k-1} = 1\times\frac{q^2-1}{q-1} = q+1.\end{aligned} $$
(7)

According to property (v) above, it is

$$\displaystyle \begin{aligned}b = r_0 = r_1\frac{v}{k} = (q+1)\frac{q^2}{q} = q(q+1).\end{aligned} $$
(8)

Example 4

Let q = 3, i.e., the field GF(3) is under consideration. There are v = q 2 = 9 points and k = q = 3. Those nine points, say P i, i = 1, 2, …, 9, are

$$\displaystyle \begin{aligned}\begin{array}{l@{\quad }l@{\quad }l} P_1 = (0,0), &P_2 = (0,1), &P_3 = (0,2), \\ P_4 = (1,0), &P_5 = (1,1), &P_6 = (1,2), \\ P_7 = (2,0), &P_8 = (2,1), &P_9 = (2,2). \notag\end{array}\end{aligned} $$

The b = r 0 = q(q + 1) = 12 lines, say \(\ell _i\), i = 1, 2, …, 12, are presented in Table 4:

Table 4 Lines of Example 4

Notice that the 12 lines form four classes of three lines each, namely

$$\displaystyle \begin{aligned}\big\{(\ell_1,\ell_2,\ell_3),(\ell_4,\ell_5,\ell_6),(\ell_7,\ell_8,\ell_9), (\ell_{10},\ell_{11},\ell_{12})\big\}. \notag\end{aligned} $$

Within each class the parallel lines have no intersection (common point) between them, while two lines from different classes intersect at exactly one common point (as in the Euclidean case of \(\mathbb R^2\)). However, if we adopt the projective geometry approach, i.e., require that every two lines always have one common point, we arrive at a finite version of projective geometry [4] and its relation with Design Theory. For q a prime power, Theorem 5 below gives the projective plane over GF(q), in comparison with the affine plane over GF(q) of Theorem 4.
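Before passing to the projective case, here is a minimal computational sketch (Python; q is assumed prime so that arithmetic modulo q gives a field) that enumerates the lines of the affine plane of Example 4 and verifies the D(q 2, q, 1) parameters of Theorem 4.

```python
from itertools import combinations, product

q = 3  # assumed prime, so that Z_q is a field

points = list(product(range(q), repeat=2))

# Each line ax + by + c = 0 (a, b not both zero) stored as the set of its points.
lines = set()
for a, b, c in product(range(q), repeat=3):
    if (a, b) != (0, 0):
        lines.add(frozenset(P for P in points if (a * P[0] + b * P[1] + c) % q == 0))

assert len(points) == q ** 2                  # v = q^2 points
assert len(lines) == q * (q + 1)              # b = q(q + 1) lines
assert all(len(l) == q for l in lines)        # each block (line) has k = q points
assert all(sum(1 for l in lines if {P, Q} <= l) == 1
           for P, Q in combinations(points, 2))   # r_2 = 1
print("affine plane:", len(points), "points,", len(lines), "lines")
```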

Theorem 5

For any prime power q, there is a 2-design D(v, k, r 2) = D(q 2 + q + 1, q + 1, 1). This particular design has the additional property that any two blocks have just one member in common.

See Appendix for the proof.

Calculating r 1 and r 0 = b, due to relations (iv) and (v) above, it holds

$$\displaystyle \begin{aligned}\begin{array}{r*{20}l}r_1 & = \frac{v-1}{k-1}r_2 = \frac{\big(q^2+q+1\big)-1}{(q+1)-1}\times1 = q+1, {} \end{array}\end{aligned} $$
(9a)
$$\displaystyle \begin{aligned}\begin{array}{r*{20}l} b = r_0 & = \left(\frac{v}{k}\right)r_1 = q^2+q+1. {}\end{array}\end{aligned} $$
(9b)

Note the similarity between (7)–(8) and (9a)–(9b). Moreover, we can notice that:

  • There are q 2 + q + 1 points and q 2 + q + 1 lines.

  • Each line contains q + 1 points and each point belongs to q + 1 lines.

  • Any two points belong to one common line and any two lines have one common point.

Example 5

Consider the affine plane over GF(3), as in Example 4. We need the parallel lines to meet, so we add to each line a new (arbitrary) point, which corresponds to the projective geometry’s point at infinity or infinity point. In particular,

$$\displaystyle \begin{aligned} \ell_i^{\prime} & := \ell_i\cup\{X_1\},\quad i = 1,2,3, \notag\\ \ell_i^{\prime} & := \ell_i\cup\{X_2\},\quad i = 4,5,6, \notag\\ \ell_i^{\prime} & := \ell_i\cup\{X_3\},\quad i = 7,8,9, \notag\\ \ell_i^{\prime} & := \ell_i\cup\{X_4\},\quad i = 10,11,12. \notag\end{aligned} $$

A new line \(\ell_\infty := \{X_1, X_2, X_3, X_4\}\), containing the newly introduced points X i, i = 1, 2, 3, 4, is then introduced and called the line at infinity or infinity-line. Therefore, the projective plane over GF(3) has in total 13 lines, i.e., \(\ell _i^{\prime }\), i = 1, 2, …, 12, and \(\ell_\infty\), and 13 points, i.e., the given P i, i = 1, 2, …, 9, and X j, j = 1, 2, 3, 4. Each line contains four points, i.e., each block contains four elements, and each pair of points belongs to exactly one line. Hence, the 2-design D(13, 4, 1) is obtained.
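The extension of Example 5 can be checked in the same spirit; the sketch below (Python, with q = 3 again assumed prime) adds one new point per parallel class plus the line at infinity and verifies the D(q 2 + q + 1, q + 1, 1) parameters of Theorem 5.

```python
from itertools import combinations, product

q = 3  # assumed prime, so that Z_q is a field
points = list(product(range(q), repeat=2))

# Affine lines over Z_q, grouped into parallel classes by the normalized ratio (a : b).
classes = {}
for a, b, c in product(range(q), repeat=3):
    if (a, b) != (0, 0):
        line = frozenset(P for P in points if (a * P[0] + b * P[1] + c) % q == 0)
        direction = (1, (b * pow(a, q - 2, q)) % q) if a else (0, 1)
        classes.setdefault(direction, set()).add(line)

# One new point per parallel class; every line of the class is extended by it,
# and the new points together form the line at infinity.
proj_lines = {line | {("inf", d)} for d, cls in classes.items() for line in cls}
proj_lines.add(frozenset(("inf", d) for d in classes))
proj_points = set(points) | {("inf", d) for d in classes}

assert len(proj_points) == len(proj_lines) == q * q + q + 1   # 13 for q = 3
assert all(len(l) == q + 1 for l in proj_lines)               # k = q + 1
assert all(sum(1 for l in proj_lines if {P, Q} <= l) == 1
           for P, Q in combinations(proj_points, 2))          # r_2 = 1
print("projective plane:", len(proj_points), "points and lines")
```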

Applications of Experimental Design

In practice, a complete randomized block design (CRBD) is analyzed as a two-way ANalysis Of VAriance (ANOVA); see the pioneering work [26]. The balanced incomplete block design needs a special ANOVA to analyze the collected data, while an incomplete general block design is analyzed through regression analysis; see [8] among others. A (complete) Latin square is analyzed in industry through a three-way ANOVA. That is, ANOVA and regression analysis are adopted to analyze real data with the assistance of appropriate software; see [15] among others.

The theoretical insight of experimental design provides food for thought for different branches of mathematics. We have tried to present some of them in a compact form. The experimenter often faces the need for a strong mathematical background when analyzing a 2^n factorial experiment, defined as in [30], and especially a fraction of it. When a confounded experiment is considered, one may adopt Kempthorne’s number-theoretic method [11]; see [9] among others for a number-theoretic development. A “traditional” example is the following:

Example 6

Construct two blocks in a 2^3 factorial experiment, so that the term ABC is confounded. We then have the linear combination L = X 1 + X 2 + X 3, with the value of L evaluated as follows:

$$\displaystyle \begin{aligned}\begin{array}{rll} (1) &= \mathrm{A}^0\mathrm{B}^0\mathrm{C}^0, &L = 0+0+0 = 0, \\ a &= \mathrm{A}^1\mathrm{B}^0\mathrm{C}^0, &L = 1+0+0 = 1, \\ b &= \mathrm{A}^0\mathrm{B}^1\mathrm{C}^0, &L = 1, \\ ab &= \mathrm{A}^1\mathrm{B}^1\mathrm{C}^0, &L = 2 = 0 (\mathrm{mod}2), \\ \end{array} \notag\end{aligned} $$
$$\displaystyle \begin{aligned}\begin{array}{rll} c &=\mathrm{A}^0\mathrm{B}^0\mathrm{C}^1, &L = 1, \\ ac &= \mathrm{A}^1\mathrm{B}^0\mathrm{C}^1, &L = 2 = 0 (\mathrm{mod}2), \\ bc &= \mathrm{A}^0\mathrm{B}^1\mathrm{C}^1, &L = 2 = 0 (\mathrm{mod}2), \\ abc &= \mathrm{A}^1\mathrm{B}^1\mathrm{C}^1,\quad &L = 3 = 1 (\mathrm{mod}2).\end{array} \notag\end{aligned} $$

So, for

$$\displaystyle \begin{aligned}&L = 0\ \ \mbox{the block is}\ \ (1),ab,ac,bc, \notag\\ &L = 1\ \ \mbox{the block is}\ \ a,b,c,abc. \notag\end{aligned} $$

Therefore, if we decide to apply a \(\frac {1}{2}2^3 = 2^{3-1}\) experiment, i.e., a half 23 factorial experiment, we have to choose one of the two blocks as above.
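The block assignment of Example 6 amounts to evaluating the defining contrast L modulo 2 over all treatment combinations; a short illustrative sketch (Python) reproduces it.

```python
from itertools import product

def label(x):
    # Standard factorial notation: "(1)" for the all-low combination, else e.g. "ac".
    return "".join(f for f, e in zip("abc", x) if e) or "(1)"

blocks = {0: [], 1: []}
for x in product((0, 1), repeat=3):          # the 8 treatment combinations
    L = sum(x) % 2                           # defining contrast L = X1 + X2 + X3 (mod 2)
    blocks[L].append(label(x))

print(blocks[0])   # ['(1)', 'bc', 'ac', 'ab'] -- the L = 0 block
print(blocks[1])   # ['c', 'b', 'a', 'abc']    -- the L = 1 block
```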

Different rules have been proposed to handle confounding. For Fisher’s multiple confounding rule, see [23]. The violation of the structure of a 2^n factorial design by adding center points underlies EVolutionary OPeration (EVOP) [1]. One then moves to a new “model” by adding more points, i.e., 2^n + center + “star” points, to study response surface exploration; see [25] among others.

The nonlinear (optimal) experimental design, as studied by Kitsos [12], has no relation with the elegant mathematical approach of Fisher. Nonlinear Design Theory suffers from parameter dependence [12], and the practical solution is to adopt a sequential design; see [6]. The induced design space offers the possibility of a geometrical consideration [18, 20]. The compromise of a quasi-sequential approach [14] was also proposed, while some techniques based on polynomial root-finding for nonlinear problems were studied in [28].

This section offers a brief overview of the development of the experimental design topic, the backbone of Statistics.

Discrete Mathematics and Probability

As we already discussed in section “Discrete Mathematics and Statistics,” discrete mathematical ideas appear to have an aesthetic appeal in Statistics. Especially during Fisher’s and Kolmogorov’s era, there was an attempt to develop a theoretical discrete framework to shed new light on statistical problems. In this section, we show the influence of Discrete Mathematics on other research areas related to Statistics. The measure-theoretic approach of Kolmogorov had a competitor from algebra, i.e., probability algebra, which was never successfully formulated. The measure-theoretic context of probability was completely accepted. But, still, the algebraic definition of a probability space deserves some further attention, especially when the foundation needs some discrete treatment.

Probability algebra (PA), as introduced in [10], is defined to be a pair (α, P), with \(\mbox{{$\alpha $}}\ne \varnothing \), whose elements A, B, C, … ∈ α are called events; P is a real function on α called probability. The binary operations A ∨ B and A ∧ B and the unary operation A c (not A) equip α with the algebraic structure of a Boolean algebra. For the probability P we assume that:

  1. i.

    P is positive definite, i.e.,

    $$\displaystyle \begin{aligned}\mathrm P(A)\ge0\ \ \mbox{for every}\ \ A\in\mbox{{$\alpha$}},\ \ \mbox{and}\ \ \mathrm P(A) = 0 \Longleftrightarrow A = \varnothing\in\mbox{{$\alpha$}} .\notag\end{aligned} $$
  2. ii.

    P is normed i.e.,

    $$\displaystyle \begin{aligned}\mathrm P(E) = 1\ \ \mbox{where}\ \ E\in\mbox{{$\alpha$}}\ \ \mbox{is the unitary element}. \notag\end{aligned} $$
  3. iii.

    P is additive, i.e.,

    $$\displaystyle \begin{aligned}\mathrm P(A\vee B) = \mathrm P(A)+\mathrm P(B)\ \ \mbox{when}\ \ A\wedge B = \varnothing. \notag\end{aligned} $$

Let β be a Boolean sub-algebra of α. Then the restriction of the function P to β is a probability on β. If \(\mbox{{$\alpha $}} := \{\varnothing ,E\}\) with P(E) = 1, \(\mathrm P(\varnothing ) = 0\), the probability algebra (α, P) is called improper.

The terms probability space and Borel probability field, introduced by Kolmogorov in his pioneering work [22], are also constructed through the algebraic approach, while the distribution function was also well defined; see [10, Theorem 5.4].

For given probability algebras (α 1, P 1) and (α 2, P 2) with α i, i = 1, 2, Boolean algebras, consider the isomorphism

$$\displaystyle \begin{aligned}\varphi:\ \mbox{{$\alpha$}}_1\longrightarrow\mbox{{$\alpha$}}_2,\ \ \mbox{where}\ \ A\longmapsto\varphi(A). \notag\end{aligned} $$

Then, we say that the two probability algebras are isometric iff

$$\displaystyle \begin{aligned}P_1(A) = P_2\big(\varphi(A)\big),\quad A\in\mbox{{$\alpha$}}_1. \notag\end{aligned} $$

Example 7

Let A := {α 1, α 2, …, α n}, n ≥ 2. We define α n to be the class of all subsets of A, forming a Boolean algebra. Let P i, i = 1, 2, …, n, with 0 < P i < 1 and ∑ i P i = 1, be associated with α i, i = 1, 2, …, n. For every subset of A of the form \(\big \{\mbox{{$\alpha $}}_{\ell _1},\mbox{{$\alpha $}}_{\ell _2},\dots ,\mbox{{$\alpha $}}_{\ell _k}\big \}\), we define the probability P as follows:

$$\displaystyle \begin{aligned}\mathrm P\Big(\big\{\mbox{{$\alpha$}}_{\ell_1},\mbox{{$\alpha$}}_{\ell_2},\dots,\mbox{{$\alpha$}}_{\ell_k}\big\}\Big) := \mathrm P_{\ell_1}+\mathrm P_{\ell_2}+\dots+\mathrm P_{\ell_k}. \notag\end{aligned} $$

Then (α n, P) is a probability algebra with 2n elements, provided that \(\mathrm P(\varnothing ) = 0\).
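Example 7 can be mimicked numerically; the sketch below (Python, with hypothetical atom probabilities P i) builds the 2^n subsets of A, defines P as the sum of the atom probabilities, and checks the axioms (i)–(iii).

```python
from itertools import combinations

atoms = ["a1", "a2", "a3"]
p = {"a1": 0.2, "a2": 0.3, "a3": 0.5}     # hypothetical P_i, 0 < P_i < 1, summing to 1

# The Boolean algebra: all 2**n subsets of the set of atoms.
events = [frozenset(s) for r in range(len(atoms) + 1)
          for s in combinations(atoms, r)]

def P(event):
    # Probability of an event = sum of the probabilities of its atoms.
    return sum(p[a] for a in event)

E = frozenset(atoms)                                 # the unit element
assert all(P(A) >= 0 for A in events)                # (i)   positivity, P(A) >= 0
assert abs(P(E) - 1.0) < 1e-12                       # (ii)  normed, P(E) = 1
assert all(abs(P(A | B) - (P(A) + P(B))) < 1e-12     # (iii) additivity on disjoint events
           for A in events for B in events if not (A & B))
print(len(events), "events; P defines a probability algebra")
```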

With the above, we tried to provide some elements of the algebraic foundations of probability. Problems such as convergence in stochastic spaces, expectations of random variables, moments, etc. can be defined appropriately within this algebraic approach to probability [10]. The multivariate problem, the sequential line of thought, and other statistical fields have not been tackled yet.

Algebraic Approach to Concept

We introduce now the term concept through lattice theory. Recall that a lattice is an abstract order structure. It consists of a partially ordered set in which every two elements have a unique supremum (also called a least upper bound or join) and a unique infimum (also called a greatest lower bound or meet). An example is given by the natural numbers, partially ordered by divisibility, for which the unique supremum is the least common multiple and the unique infimum is the greatest common divisor.

The main question, from a statistical point of view, and not only, might be: why lattices? When the study of hierarchies is one of the targets of the research, the hierarchy of concepts can prove dominant, using subconcepts and superconcepts. A typical example of a subconcept is “human with a disease,” whose superconcept is “human being,” which in turn is a subconcept of the superconcept “being.” The concept has to be determined from all the objects belonging to the concept under consideration as well as from all attributes necessarily valid for the defined objects. Usually the experimenter is not expected to consider all objects and attributes describing the given concept, i.e., certain sets of objects and attributes are initially fixed.

A concept is a set function ϕ from a set O, whose elements are called objects, to another set A, whose elements are called attributes. We shall use the notation (O, A).

The above definition of a concept provides the mathematical insight of the expression: the “object” O has “attributes” A. Let now all the objects under consideration form the set \(\mathfrak O\), which has a finite number of elements. Similarly, all the attributes form the set \(\mathfrak A\), which also has a finite number of elements. We emphasize that ϕ(O) does not define a unique set A while, equivalently, ϕ −1(A) does not define a unique set O. Based on the collected data, A, O, and ϕ can be defined appropriately.

The data we examine (e.g., any qualitative attributes, “yes” or “no” to a given disease, to human or animals, or the level of quality of an industrial product) act as a generator ϕ of concepts:

$$\displaystyle \begin{aligned}\phi:\ \mathfrak O\longrightarrow\mathfrak A ,\quad \phi(O) = A.\end{aligned} $$
(10)

In the above discussion, the set \(\mathfrak O\) may represent a set of “humans” or “animals,” O can represent “humans with a disease,” and ϕ(O) = A = {0, 1} with 0 := “no” and 1 := “yes”.

When we are interested to create a new concept, we must consider the simple laws of set theory.

The concept union \( {\bullet } \bigcup \,\) of two concepts is a new concept of the form \((O_1,A_1) {\bullet } \bigcup \,(O_2,A_2) := (O_1\cup O_2,A_1\cap A_2)\), \((O_1,A_1),(O_2,A_2)\in \mathfrak C\). The concept intersection \( {\bullet } \bigcap \,\) is defined, respectively, as \((O_1,A_1) {\bullet } \bigcap \,(O_2,A_2) := (O_1\cap O_2,A_1\cup A_2)\), \((O_1,A_1),(O_2,A_2)\in \mathfrak C\); see [17].

It is easy to verify that the concept \((\varnothing ,\mathfrak A)\) is the neutral (zero) element for the union, in the sense that \((O,A) {\bullet } \bigcup \,(\varnothing ,\mathfrak A) = (O,A)\) for every \((O,A)\in \mathfrak C\).

The definition of the union between two concepts is not only mathematically valid but also practical, as one can assign to the empty set any attribute. The neutral element for the intersection \( {\bullet } \bigcap \,\) is the element \((\mathfrak O,\varnothing )\). It can be proved that the set of all concepts with either operation \( {\bullet } \bigcup \,\) or \( {\bullet } \bigcap \,\) is a commutative (Abelian) semigroup with the appropriate neutral element. Moreover, the set of all concepts \(\mathfrak C\) can be proved, as far as the union and intersection of concepts are concerned, to be commutative and associative and, therefore, a lattice.
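A minimal sketch (Python; the object and attribute names are hypothetical) of the two concept operations and their neutral elements, following the definitions above.

```python
# Concepts as pairs (O, A) of frozensets: O a set of objects, A a set of attributes.
OBJECTS = frozenset({"o1", "o2", "o3", "o4"})
ATTRIBUTES = frozenset({"a1", "a2", "a3"})

def c_union(c1, c2):
    # Concept union: join the objects, keep only the common attributes.
    (O1, A1), (O2, A2) = c1, c2
    return (O1 | O2, A1 & A2)

def c_intersection(c1, c2):
    # Concept intersection: keep the common objects, join the attributes.
    (O1, A1), (O2, A2) = c1, c2
    return (O1 & O2, A1 | A2)

c1 = (frozenset({"o1", "o2"}), frozenset({"a1", "a2"}))
c2 = (frozenset({"o2", "o3"}), frozenset({"a2", "a3"}))

print(c_union(c1, c2))         # greater extension {o1,o2,o3}, smaller intension {a2}
print(c_intersection(c1, c2))  # smaller extension {o2}, greater intension {a1,a2,a3}

# Neutral elements: (empty set, all attributes) for the union,
# (all objects, empty set) for the intersection.
assert c_union(c1, (frozenset(), ATTRIBUTES)) == c1
assert c_intersection(c1, (OBJECTS, frozenset())) == c1
```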

Two concepts (O 1, A 1) and (O 2, A 2) belonging to \(\mathfrak C\) are equivalent if and only if A 1 = A 2. We shall write \((O_1,A_1)\cong (O_2,A_2)\overset {{\scriptscriptstyle \mathrm {def.}}}{ \Longleftrightarrow }A_1 = A_2\).

Proposition 1

The equivalence between two concepts is a genuine equivalence relation among concepts. Therefore, we can create a partition of the concepts (equivalent with each other) coming from the collected data \((\mathfrak C/\cong )\).

See Appendix for the proof.

Therefore, all the concepts of the form (O i, A) are equivalent to (O i ∪ O j, A), \(O_i,O_j\in \mathfrak O\), under the relation “≅,” and the whole equivalence class is formed by taking the “concept union” \( {\bullet } \bigcup \,\). Consequently, from the objects \(O_i\in \mathfrak O\), we create new objects, elements of the power set \(\mathcal P(\mathfrak O)\). This is a way to classify concepts depending on attributes.

To classify concepts depending on concepts themselves, we define another equivalence relation of the form (O 1, A 1) ≡ (O 2, A 2)⇔O 1 = O 2.

Relation “≡” is an equivalence relation as it can be proved similarly to Proposition 1.

We call the set O the extension of a concept, while the set A is called its intension.

Now, from the definition of the concept union \( {\bullet } \bigcup \,\), we realize that by taking the union of two concepts, we find the common attributes (similarities) of another, “greater” object. Correspondingly, thinking about the concept intersection, we find that “less extension implies greater intension.” Lattice means order, as we mentioned already; so for every two elements of it, there exist another “upper” or “preceding” element and another “lower” or “following” element. It is not a hierarchy (a tree), but a network, as in Figs. 2 and 3.

Fig. 2
figure 2

From fourth level to super-ordinated

Fig. 3
figure 3

The construction of concept \( \big (O_1, \bigcup _{i = 1}^3A_i \big )\) from \((O_1,\varnothing )\)

We now define the order relation “≼” of the lattice for the already existing operations \( {\bullet } \bigcup \,\) and \( {\bullet } \bigcap \,\). Indeed:

The concept (O 1, A 1) follows concept (O 2, A 2) or, equivalently, the concept (O 2, A 2) precedes concept (O 1, A 1), if and only if O 1 ⊆ O 2 and A 1 ⊇ A 2, i.e., \((O_1,A_1)\preceq (O_2,A_2)\overset {{\scriptscriptstyle \mathrm {def.}}}{ \Longleftrightarrow }O_1\subseteq O_2\) and A 1 ⊇ A 2.

Moreover, a ring structure can be proved for the set \(\mathfrak C\) of concepts. The complement (O, A)c of the concept \((O,A)\in \mathfrak C\) is the concept consisting of the usual set-theoretic complements of O in \(\mathfrak O\) and of A in \(\mathfrak A\), i.e., (O, A)c := (O c, A c); see [17].

The complement of a concept as above is well defined because:

  1. (a)

    \(O^{\mathrm {c}}\subseteq \mathfrak O\) and \(A^{\mathrm {c}}\subseteq \mathfrak A\) imply that \(\big (O^{\mathrm {c}},A^{\mathrm {c}}\big )\in \mathfrak C\).

  2. (b)

    There is only one complement O c of O and only one complement A c of A; hence, there is only one complement of the concept \((O,A)\in \mathfrak C\).

The symmetric difference or disjunctive union (O 1, A 1)  ⊙ (O 2, A 2) between two concepts (O 1, A 1) and (O 2, A 2) belonging to \(\mathfrak C\) is the concept \(\big (O_1\ominus O_2,(A_1\ominus A_2)^{\mathrm {c}}\big )\in \mathfrak C\) where O 1 ⊖ O 2 := (O 1 ∪ O 2) ∖ (O 1 ∩ O 2) and A 1 ⊖ A 2 := (A 1 ∪ A 2) ∖ (A 1 ∩ A 2) are the usual set-theoretic symmetric differences (or disjunctive unions) of \(O_1,O_2\in \mathfrak O\) and \(A_1,A_2\in \mathfrak A\), respectively. The following can be proved; see [17].

Theorem 6

The set \(\mathfrak C\) enriched with the operation  is a group.

Moreover, the set \(\mathfrak C\) enriched with the operations  ⊙ and \( {\bullet } \bigcap \,\) is a ring, where  ⊙ plays the role of “addition” and \( {\bullet } \bigcap \,\) of “multiplication.” It is commutative (due to the commutative property of the operation \( {\bullet } \bigcap \,\)), with unit (because of the neutral element \((\mathfrak O,\varnothing )\) of the operation \( {\bullet } \bigcap \,\)), and distributive on both sides.

Since “≅” is an equivalence relation, we can define equivalence classes of concepts through this similarity relation, and these classes are disjoint. Therefore, one can define the “orbit,” in the geometrical sense and not only, as we are moving from one class to another class; see [7] for a general affine geometric point of view of Statistics and [16] for an affine geometric approach to the logit problem. As we know, the classes are disjoint sets and their union makes up the set of reference, \(\mathfrak C\) in this paper. So, in this case, we have a partitioning of \(\mathfrak C\) according to the defined equivalence relation.

Discrete Distance Measures

It is known that the number of coordinates in which two vectors X and Y differ is a distance d(X, Y), known as the Hamming distance. If we let X := (0, 0, 1, 1, 1) and Y := (1, 1, 0, 0, 0), both in {0, 1}^5, then d(X, Y) = 5. Such a definition is used in binary codes, where a large minimum distance between codewords is desired. In particular, if we define the weight w := w(a) of a word a to be the number of ones in a, then w(a) = d(a, 0) and d(a, y) := w(a − y), with y being another word.
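For concreteness, a small sketch (Python) of the Hamming distance and of the weight of a word, reproducing the numbers above.

```python
def hamming(x, y):
    # Number of coordinates in which the words x and y differ.
    return sum(a != b for a, b in zip(x, y))

def weight(a):
    # Weight of a word: number of ones, i.e., its distance from the zero word.
    return hamming(a, (0,) * len(a))

X = (0, 0, 1, 1, 1)
Y = (1, 1, 0, 0, 0)
print(hamming(X, Y))   # 5, as in the text
print(weight(X))       # 3
```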

The above discussion provides us food for thought to work on the introduction of a discrete distance measure between two given concepts.

The object distance d(O 1, O 2), i.e., the distance between two finite objects \(O_1,O_2\in \mathfrak O\), is defined to be the nonnegative integer expressing the number of elements of their symmetric difference O 1 ⊖ O 2, i.e.,

$$\displaystyle \begin{aligned}d(O_1,O_2) := |O_1\ominus O_2|. \end{aligned} $$
(11)

Proposition 2

The object distance d(O 1, O 2) defined above is a genuine distance metric, i.e., it satisfies the three properties of a metric: positive definiteness, symmetry, and the triangle inequality.

Proof

Trivial.

Recall the symmetric difference between two concepts, i.e., C 1  ⊙ C 2 = (O 1 ⊖ O 2, (A 1 ⊖ A 2)c). The distance d between objects, as in (11), is not that informative, since it is a quantitative rather than a qualitative one: two sets of objects may have many different elements, coming from the same (we assume homogeneous) population, but then we are measuring the data differences quantitatively, not qualitatively. Besides, we are not working with objects or attributes alone, but with both of them, i.e., with concepts.

The symmetric difference O 1 ⊖ O 2 between two objects acts on the attributes to create a new attribute set of the form (A 1 ⊖ A 2)c. Thus, if we want a qualitative distance between \(O_1,O_2\in \mathfrak O\), we must examine A 1 ⊖ A 2. In such a case, we can then define

$$\displaystyle \begin{aligned}d(A_1,A_2) := |A_1\ominus A_2| = |\mathfrak A|-\big|(A_1\ominus A_2)^{\mathrm{c}}\big|, \quad A_1,A_2\in\mathfrak A.\end{aligned} $$
(12)

Note that if the distance between two attribute sets is increasing, i.e., there are many noncommon attributes, then (12) yields that the number of elements of (A 1 ⊖ A 2)c is decreasing.

Suppose now we have two objects, O 1 and O 2. As a measure of “comparison,” we introduce the normalized distance

$$\displaystyle \begin{aligned}d_{\mathrm{n}}(O_1,O_2) := \frac{|O_1\ominus O_2|}{|O_1|+|O_2|},\quad O_1,O_2\in\mathfrak O.\end{aligned} $$
(13)

For a discussion and a number of calculations for different cases of objects, see the Proof of Claim in Appendix.
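A sketch (Python, on hypothetical objects) of the object distance (11) and the normalized distance (13).

```python
def d(O1, O2):
    # Object distance (11): number of elements of the symmetric difference.
    return len(O1 ^ O2)

def d_n(O1, O2):
    # Normalized distance (13), ranging from 0 (equal objects) to 1 (disjoint objects).
    return len(O1 ^ O2) / (len(O1) + len(O2))

O1 = {"o1", "o2", "o3"}
O2 = {"o3", "o4"}
print(d(O1, O2), d_n(O1, O2))        # 3 and 0.6
print(d_n(O1, O1), d_n(O1, {"o9"}))  # 0.0 (equal) and 1.0 (disjoint)
```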

Claim

The normalized distance between objects is a function depending on the number of different elements between them, ranging from value 0 (no differences) to 1 (everything is different).

See Appendix for the proof.

Fuzzy Logic Approach

Fuzzy logic extends the classical binary response: a sentence is either true or false (not true), and hence its truth value belongs to the binary set {0, 1}. This binary set can be extended to the interval [0, 1]. In strictly mathematical terms, the characteristic function of a given set Q ⊆ Ω, i.e.,

$$\displaystyle \begin{aligned}I_Q:\ \Omega\longrightarrow\{0,1\},\quad \mbox{with}\ \ \Omega\ni x\longmapsto I_Q(x)\in\{0,1\}, \notag\end{aligned} $$

is now considered as the membership function of a fuzzy set Q ⊆ Ω, i.e.,

$$\displaystyle \begin{aligned}M_Q:\ \Omega\longrightarrow[0,\,1],\quad \mbox{with }\ \ \Omega\ni x\longmapsto M_Q(x)\in[0,\,1]. \notag\end{aligned} $$

The value of M Q(x) declares the degree of participation of the element x ∈ Ω in the fuzzy set Q ⊆ Ω. In particular,

$$\displaystyle \begin{aligned}M_Q(x) {:=} \begin{cases} 1, &\!\mbox{declares}\ \mbox{that}\ x\ \mbox{belongs}\ \mbox{to}\ Q, \\ 0, &\!\mbox{declares}\ \mbox{that}\ x\ \mbox{does not belong}\ \mbox{to}\ Q, \\ q\in(0,\,1), &\!\mbox{declares}\ \mbox{that}\ x\ \mbox{belongs}\ \mbox{``partially''}\ \mbox{(in}\ \mbox{some}\ \mbox{degree)}\ \mbox{to}\ Q.\end{cases} \notag\end{aligned} $$

The above introductory elements are useful for realizing the extensions achieved by fuzzy logic, i.e., adopting an interval of values rather than a single binary value to describe a phenomenon.

Interval mathematics, as described in [29], offers another approach to developing intervals, different from the fuzzy logic one; see [21] and [31] for the corresponding applications.

Recall that, under the fuzzy logic approach, the subset-hood between two sets A and B, subsets of the “universe” set Ω, is

$$\displaystyle \begin{aligned}S(A,B) = \frac{k(A\cap B)}{k(A)}, \notag\end{aligned} $$

where k(A) := card(A) = |A| is the generalized cardinal number; S(A, B) represents the degree to which A is a subset of B. If A ⊆ B, then S(A, B) = 1. Based on this, the fuzzy entropy of a set A, denoted by E F(A), can be defined as

$$\displaystyle \begin{aligned}E_{\mathrm{F}}(A) := \frac{k(A\cap A^{\mathrm{c}})}{k(A\cup A^{\mathrm{c}})}. \notag\end{aligned} $$

The fuzzy entropy of A measures “how much” the underlap A ∪ A c and the overlap A ∩ A c violate the classical laws \(A\cap A^{\mathrm {c}} = \varnothing \), A ∪ A c =  Ω. That is, the fuzzy entropy eventually measures “how much” of A ∪ A c is included in A ∩ A c.

The following theorem rules the fuzzy entropy theory:

Theorem 7 (of Fuzzy Entropy)

It holds that

$$\displaystyle \begin{aligned}E_{\mathrm{F}}(A) = S\big(A\cup A^{\mathrm{c}},A\cap A^{\mathrm{c}}\big). \notag\end{aligned} $$

Proof

From the definition of the subset-hood S, it holds that

$$\displaystyle \begin{aligned}S\big(A\cup A^{\mathrm{c}},A\cap A^{\mathrm{c}}\big) = \frac{k\Big(\big(A \cup A^{\mathrm{c}}\big)\cap\big(A\cap A^{\mathrm{c}}\big)\Big)}{k(A\cup A^{\mathrm{c}})} = \frac{k(A\cap A^{\mathrm{c}})}{k(A\cup A^{\mathrm{c}})} = E_{\mathrm{F}}(A). \notag\end{aligned} $$

Example 8

It holds

$$\displaystyle \begin{aligned}E_{\mathrm{F}}(\Omega) = S\big(\Omega\cup\Omega^{\mathrm{c}},\Omega\cap\Omega^{\mathrm{c}}\big) = S\big(\Omega\cup\varnothing,\Omega\cap\varnothing\big) = S(\Omega,\varnothing) = 0. \notag\end{aligned} $$

Therefore, the universe set Ω has fuzzy entropy 0, while for the middle point M (the fuzzy set with membership value 1/2 for every element, so that M = M c), it holds

$$\displaystyle \begin{aligned}E_{\mathrm{F}}(M) = S\big(M\cup M^{\mathrm{c}},M\cap M^{\mathrm{c}}\big) = 1. \notag\end{aligned} $$

Based on the above discussion, we can define the fuzzy entropy deviance, i.e.,

$$\displaystyle \begin{aligned}\delta(O_1,O_2) := E_{\mathrm{F}}(O_1)-E_{\mathrm{F}}(O_2) = \frac{k\big(O_1\cap O_1^{\mathrm{c}}\big)}{k\big(O_1\cup O_1^{\mathrm{c}}\big)}- \frac{k\big(O_2\cap O_2^{\mathrm{c}}\big)}{k\big(O_2\cup O_2^{\mathrm{c}}\big)}. \notag\end{aligned} $$
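For finite fuzzy sets given by membership vectors, the quantities above are elementary to compute; the sketch below (Python, assuming the usual min/max/1 − m operations for fuzzy intersection, union, and complement) evaluates the subset-hood S, the fuzzy entropy E F, checks Theorem 7, and forms the deviance δ.

```python
def k(m):
    # Generalized cardinal number: sum of the membership values.
    return sum(m)

def f_and(m1, m2):   # fuzzy intersection (pointwise min)
    return [min(a, b) for a, b in zip(m1, m2)]

def f_or(m1, m2):    # fuzzy union (pointwise max)
    return [max(a, b) for a, b in zip(m1, m2)]

def f_not(m):        # fuzzy complement (1 - membership)
    return [1 - a for a in m]

def S(mA, mB):
    # Subset-hood: degree to which A is a subset of B.
    return k(f_and(mA, mB)) / k(mA)

def E_F(mA):
    # Fuzzy entropy of A.
    return k(f_and(mA, f_not(mA))) / k(f_or(mA, f_not(mA)))

A = [0.2, 0.8, 0.5, 1.0]   # a hypothetical fuzzy set on four elements
M = [0.5, 0.5, 0.5, 0.5]   # the "middle point"

assert abs(E_F(A) - S(f_or(A, f_not(A)), f_and(A, f_not(A)))) < 1e-12   # Theorem 7
print(E_F(A), E_F(M))      # about 0.29, and exactly 1.0 for the middle point
print(E_F(A) - E_F(M))     # the fuzzy entropy deviance delta(A, M)
```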

It is always a problem to define a distance measure when information divergences are under consideration; see [19] for the continuous case of the Kullback–Leibler divergence.

We would like to emphasize that fuzziness and randomness are different ideas. They seem similar, but they are not identical. Randomness concerns problems where the event is well defined but it is uncertain whether it will take place or not. Fuzziness concerns situations which are not well defined and can only be described in a sufficient way when it is known how we shall move between different classes. That is, we are moving under a fuzzy event to the “probability of a fuzzy event,” which is still (even in such a probability-oriented procedure) closer to a measure-theoretic approach than to an algebraic one (via probability algebras). Moreover, in fuzzy logic the additivity of a “measure” does not exist, so no probability measure is defined, and the term “possibility” replaces the term “probability.”

Discussion

This paper studied the general influence of the Discrete Mathematics line of thought on Statistics and Probability. In particular, the Experimental Design Theory employs many discrete statistical concepts, which offer a solid framework, although real data analysis is mainly performed through the ANOVA approach. Furthermore, geometry plays an important role in all mathematical scenarios, as it does in the Experimental Design Theory in its finite formulation.

The foundations of probability theory are mainly based on measure-theoretic concepts. But still, there is also a set-theoretic approach. Lattice theory is applied to concepts, and Boolean algebra supports the fuzzy logic extension. Distance measures offer criteria to decide “how close” two estimates, two sets, or two concepts are. A brief discussion of the subject was also offered in this paper.

Appendix

Proof (of Proposition 1)

We shall prove that the introduced relation ≅ is reflexive, symmetric, and transitive. Indeed:

  1. i.

    Reflexivity of relation “≅”. Indeed, (O, A)≅(O, A) ⇔ A = A, which is true due to the reflexivity of the equality relation “= ” for sets.

  2. ii.

    Symmetry of relation “≅”. Indeed, (O 1, A 1)≅(O 2, A 2) ⇔ A 1 = A 2 ⇔ A 2 = A 1 (symmetry of the equality relation “= ” for sets) ⇔ (O 2, A 2)≅(O 1, A 1).

  3. iii.

    Transitivity of relation “≅”. Indeed, it holds that

    $$\displaystyle \begin{aligned}(O_1,A_1)\cong(O_2,A_2)\Leftrightarrow A_1 = A_2\ \ \mbox{and} \end{aligned} $$
    (14a)
    $$\displaystyle \begin{aligned}(O_2,A_2)\cong(O_3,A_3)\Leftrightarrow A_2 = A_3,\end{aligned} $$
    (14b)

    which are both equivalent to A 1 = A 3, due to the transitivity of the equality relation “= ” for sets, and hence (O 1, A 1)≅(O 3, A 3).

Proof (of Theorem 1)

Let us consider that the positions (i, j) and (i, j′), in the same row i, carry the same symbol. Then,

$$\displaystyle \begin{aligned}L(i,j) = L(i,j') \Longrightarrow i+j = i+j'.\notag\end{aligned} $$

As \(i\in\mathbb Z_v\), its additive inverse − i exists and thus

$$\displaystyle \begin{aligned}(-i)+i+j = (-i)+i+j' \Longrightarrow j = j'. \notag\end{aligned} $$

This means each symbol occurs at most once in row i. Since there are v symbols and v positions, each symbol occurs exactly once. The same line of thought is followed for the columns. Therefore, L(i, j) defines an LS.

Proof (of Theorem 2)

Following the same line of thought as in Theorem 1, we first prove that L a = L a(i, j) is an LS. Indeed, for L a(i, j) = L a(i, j′), it holds ai + j = ai + j′; since − ai exists in \(\mathbb Z_p\), it follows that j = j′. Similarly, L a(i, j) = L a(i′, j) yields ai = ai′ and, since a≠0, a −1 exists, so that i = i′. Consider now a position, say (i 1, j 1), and a different position, say (i 2, j 2), and suppose that L a and L b show the same pair of symbols k 1, k 2 at both positions. Then, for

$$\displaystyle \begin{gathered}L_a(i_1,j_1) = k_1,\quad L_b(i_1,j_1) = k_2,\ \ \mbox{it is} \notag\\ ai_1+j_1 = k_1,\quad bi_1+j_1 = k_2, \notag\end{gathered} $$

and for

$$\displaystyle \begin{gathered}L_a(i_2,j_2) = k_1,\quad L_b(i_2,j_2) = k_2,\ \ \mbox{it is} \notag\\ ai_2+j_2 = k_1,\quad bi_2+j_2 = k_2. \notag\vspace{-3pt}\end{gathered} $$

Thus,

$$\displaystyle \begin{aligned}a(i_1-i_2) = j_2-j_1\ \ \mbox{and}\ \ b(i_1-i_2) = j_2-j_1. \notag\end{aligned} $$

Assuming that i 1 ≠ i 2, then (i 1 − i 2)−1 exists in \(\mathbb Z_p\) and

$$\displaystyle \begin{aligned}a = b = (i_1-i_2)^{-1}(j_2-j_1). \notag\end{aligned} $$

However, we assumed that a≠b, which is a contradiction; hence i 1 = i 2 and, consequently, j 1 = j 2. Thus, the pair of symbols (k 1, k 2) occurs in only one position. Therefore, L a ⊥ L b.

Proof (of Theorem 4)

Since x, y are elements of GF(q), they can each take q different values. So, there are v = q 2 points. As far as the blocks are concerned, we must prove that every line has exactly q points and that any two points belong to exactly one line. Indeed:

Consider the line ax + by + c = 0, b≠0. Then,

$$\displaystyle \begin{aligned}y = -b^{-1}(c+ax), \notag\end{aligned} $$

so that, for each of the q values of x, the point (x, y) is on the line; hence the line has q points. If b = 0, a≠0, it holds

$$\displaystyle \begin{aligned}x = -a^{-1}c. \notag\end{aligned} $$

In such a case, there are q possible values of y in GF(q), and the q points of the form ( − a −1 c, y) lie on the line.

Now, suppose that (x 1, y 1) and (x 2, y 2) are two given distinct points, and hence x 2 − x 1 and y 1 − y 2 are not both zero. Consider

$$\displaystyle \begin{aligned}\ell:\ (y_1-y_2)x+(x_2-x_1)y = x_2y_1-x_1y_2, \notag\end{aligned} $$

which is the equation of a line. Moreover, it contains the two given points. If another line contains the two given points and is described by the analytic form

$$\displaystyle \begin{aligned}ax+by+c = 0, \notag\end{aligned} $$

then it holds

$$\displaystyle \begin{gathered}ax_1+by_1+c = 0\ \ \mbox{and}\ \ ax_2+by_2+c = 0,\ \ \mbox{i.e.,} \notag\\ a(x_2-x_1) = b(y_1-y_2). \notag\vspace{-3pt}\end{gathered} $$

The value (x 2 − x 1)−1 exists in GF(q), provided x 1 ≠ x 2, and hence

$$\displaystyle \begin{aligned}a = b(x_2-x_1)^{-1}(y_1-y_2) = \lambda(y_1-y_2). \notag\end{aligned} $$

So we have

$$\displaystyle \begin{aligned} \begin{array}{rcl}b &\displaystyle =&\displaystyle \lambda(x_2-x_1)\ \ \mbox{and} \notag\\ c &\displaystyle =&\displaystyle -ax_1-by_1 = -ax_1-\lambda(x_2-x_1)y_1 = -\lambda(y_1-y_2)x_1-\lambda(x_2-x_1)y_1 \\ &\displaystyle =&\displaystyle \lambda(x_1y_2-x_2y_1). \notag\end{array} \end{aligned} $$

Thus, the line ax + by + c = 0 is the same as the line ℓ defined above, since the lines ℓ and λℓ coincide in finite geometry. Therefore, only one line exists and “contains” the two points.

Proof (of Theorem 5)

In the affine plane over GF(q), the lines

$$\displaystyle \begin{aligned}ax+by+c = 0\ \ \mbox{and}\ \ a'x+b'y+c' = 0, \notag\end{aligned} $$

are said to be parallel if ab′ = a′b in GF(q). There are q + 1 equivalence classes of parallel lines: for each λ ∈ GF(q), the class of lines parallel to x + λy = 0, together with the class of lines parallel to y = 0. Any point of the affine plane belongs to just one line in each class.

We now introduce q + 1 new points X λ, λ ∈ GF(q), and X ∞, all belonging to a new line \(\ell_\infty\); see also Example 5 for the discussion on the line at infinity. The point X λ is added to each line parallel to x + λy = 0, while X ∞ is added to each line parallel to y = 0. We have to prove that the (extended) lines are the blocks of a design with the parameters stated above. There are q 2 + q + 1 points and each line contains q + 1 points. It remains to prove that any two distinct points, say H and I, belong to just one line. The following cases can occur:

  1. 1.

    The points H and I are both parts of the initial affine plane; let us call them “old” points. So, H and I belong to a unique line of this plane, which corresponds uniquely to a line on the extended plane (with infinity point).

  2. 2.

    If H is an “old” point and I is a “newly added” point, i.e., I := X λ, then H belongs to precisely one “old” line in the parallel class represented by X λ, and, therefore, the corresponding extended line ℓ′ is the unique line containing the points H and I. The same is certainly true if I := X ∞.

  3. 3.

    If H and I are both “new” points, then they belong to \(\ell_\infty\) by definition. Therefore, any two points belong to just one line (i.e., to one block). Now, from the above construction, any two lines have just one common point.

Proof (of Claim)

We distinguish the following cases:

  • Case \(O_1\cap O_2 = \varnothing \) (disjoint objects). In general, it holds

    $$\displaystyle \begin{aligned}O_1\ominus O_2 = (O_1\setminus O_2)\cup(O_2\setminus O_1) = (O_1\cup O_2)\setminus(O_1\cap O_2), \quad O_1,O_2\in\mathfrak O,\end{aligned} $$
    (15)

    and, hence, in this case, we obtain

    $$\displaystyle \begin{aligned}|O_1\ominus O_2| = |O_1\cup O_2| = |O_1|+|O_2|-|O_1\cap O_2| = |O_1|+|O_2|,\end{aligned} $$
    (16)

    which is—in principle—the maximum possible number of elements of the symmetric difference between two objects. Thus, the normalized distance d n between \(O_1,O_2\in \mathfrak O\) equals 1. Indeed, via (15),

    $$\displaystyle \begin{aligned}|O_1\ominus O_2| & = \big|(O_1\setminus O_2)\cup(O_2\setminus O_1)\big| \notag\\ & = |O_1\setminus O_2|+|O_2\setminus O_1|- \big|(O_1\setminus O_2)\cap(O_2\setminus O_1)\big| \notag\\ & = |O_1\setminus O_2|+|O_2\setminus O_1|\le|O_1|+|O_2|, {}\end{aligned} $$
    (17)

    where the equality holds iff O 1 ∖ O 2 = O 1 and O 2 ∖ O 1 = O 2, which is equivalent to \(O_1\cap O_2 = \varnothing \). Furthermore, the normalized distance between O 1 and O 2 is confirmed to be 1 since

    $$\displaystyle \begin{aligned}d_{\mathrm{n}}(O_1,O_2) := \frac{|O_1\ominus O_2|}{|O_1|+|O_2|} = \frac{|O_1|+|O_2|}{|O_1|+|O_2|} = 1.\end{aligned} $$
    (18)
  • Case O 1 ⊆ O 2 (included objects). If O 1 is a subset of O 2, then

    $$\displaystyle \begin{aligned}|O_1\ominus O_2| & = |O_1\setminus O_2|+|O_2\setminus O_1|- \big|(O_1\setminus O_2)\cap(O_2\setminus O_1)\big| \notag\\ & = |\varnothing|+|O_2\setminus O_1|-\big|\varnothing\cap(O_2\setminus O_1)\big| \notag\\ & = |O_2\setminus O_1|-|\varnothing|\le|O_2|, {}\end{aligned} $$
    (19)

    where the equality holds iff \(O_1 = \varnothing \). In principle, the normalized distance d n between objects is less than or equal to 1. Indeed, for \(O_1,O_2\in \mathfrak O\),

    $$\displaystyle \begin{aligned}d_{\mathrm{n}}(O_1,O_2) & = \frac{|O_1\setminus O_2|+|O_2\setminus O_1|- \big|(O_1\setminus O_2)\cap(O_2\setminus O_1)\big|}{|O_1|+|O_2|} \notag\\ & \le \frac{|O_1\setminus O_2|+|O_2\setminus O_1|}{|O_1|+|O_2|}\le \frac{|O_1|+|O_2|}{|O_1|+|O_2|} = 1. {}\end{aligned} $$
    (20)

    The above is also confirmed, via (19), for the specific case of O 1 ⊆ O 2, as

    $$\displaystyle \begin{aligned}d_{\mathrm{n}}(O_1,O_2) := \frac{|O_1\ominus O_2|}{|O_1|+|O_2|} = \frac{|O_2\setminus O_1|}{|O_1|+|O_2|}\le\frac{|O_2|}{|O_1|+|O_2|}\le1,\end{aligned} $$
    (21)

    where, again, the equality holds iff \(O_1 = \varnothing \).

  • Case O 2 ⊆ O 1 (included objects). If O 2 is a subset of O 1, then we obtain dually that

    $$\displaystyle \begin{aligned}d_{\mathrm{n}}(O_1,O_2) := \frac{|O_1\ominus O_2|}{|O_1|+|O_2|} = \frac{|O_1\setminus O_2|}{|O_1|+|O_2|}\le\frac{|O_1|}{|O_1|+|O_2|}\le1,\end{aligned} $$
    (22)

    where the equality holds iff \(O_2 = \varnothing \).

  • Case O 1 = O 2 (identical objects). If object O 1 coincides with object O 2, then

    $$\displaystyle \begin{aligned}d_{\mathrm{n}}(O_1,O_2) & = \frac{|O_1\setminus O_2|+|O_2\setminus O_1|- \big|(O_1\setminus O_2)\cap(O_2\setminus O_1)\big|}{|O_1|+|O_2|} \notag\\ & = \frac{|\varnothing|+|\varnothing|- |\varnothing\cap\varnothing|}{|O_1|+|O_2|} = 0. {}\end{aligned} $$
    (23)