
1 Introduction

Data Science has been the buzzword of the last decade. One of the goals of Data Science is to extract knowledge, insights and usable information from various types of data. Data Science courses typically cover statistical data analysis, machine learning methods and data mining. Correlation, association, similarity, relationship and interestingness coefficients or measures play an important role in data analysis, classification tasks and data mining. Dozens of such measures have been created for various types of data, and the articles introducing them are cited in hundreds and thousands of papers [3, 13, 14, 20, 23–25, 27]. Such measures are defined for dichotomous or real-valued variables measured on sampling units, for rankings, binary or real-valued vectors of attributes, time series, fuzzy sets of different types, probability distributions, images, rating profiles, etc. These measures are used in information retrieval, data classification, machine learning, analysis of relationships and decision making in ecology, computational linguistics, image and signal processing, financial data analysis, bioinformatics and the social sciences. Often, measures introduced for one type of data give misleading results when they are applied to data of another type [8]. There is therefore a need for a general theory of similarity, correlation and association measures, one able to analyze the general properties of these measures independently of the domain, or for large classes of data types. Such a theory would make it possible to transfer methods of constructing association measures from one domain to another and to propose such measures for new data types.

The first steps toward a general theory of similarity, dissimilarity and correlation measures were taken in [7, 9, 10], where fundamental results on the definition and construction of association measures (correlation coefficients) on sets with involution operations were proposed. It was shown that many known correlation coefficients can be considered as functions defined on an underlying set with an involution operation and satisfying a simple set of properties. General methods for constructing these functions from suitable similarity or dissimilarity functions were proposed. These results can be considered as an alternative solution to Kendall’s problem of constructing a general correlation coefficient [22]. Kendall proposed a formula that yields, as particular cases, Pearson’s product-moment correlation and Spearman’s and Kendall’s rank correlation coefficients, but it was not clear how to obtain other known association and correlation coefficients from this formula, or how to build correlation coefficients for other domains. The approach proposed in [7, 9, 10] makes it possible to construct the classical correlation coefficients obtainable from Kendall’s formula, as well as Yule’s Q and the Hamann coefficient together with other association coefficients for binary data, and gives a tool for constructing new correlation coefficients on new domains.

Similarity, dissimilarity and correlation measures or coefficients are considered in this lecture as functions defined on an underlying set \( \Omega \) and satisfying some reasonable sets of properties. Generally, these functions can be considered as association functions, measuring (possibly non-statistical) associations or relationships between objects. The methods of construction of such functions, their transformations and the relationships between them were studied in [7, 9, 10]. This paper presents a short and updated description of part of the author’s lecture at the RAAI Summer School 2019. Due to the limit on the size of the paper it was not possible to include all topics discussed in [9] and in this lecture, so the present paper can be considered as complementary to [9]. It includes new methods of construction of correlation functions presented recently at INES 2019 [10] and contains more examples of (dis)similarity and correlation functions illustrating these methods. The list of references includes only a few works; hundreds of papers have been published on related topics over more than one hundred years, and it was not possible to include most of them. Further useful references can be found in the papers cited here.

2 Examples of Similarity Measures and Correlation Coefficients for Different Domains

2.1 Binary n-Tuples

Consider an n-tuple \( x = \left( {x_{1} , \ldots ,x_{n} } \right) \), \( x_{i} \in \left\{ {0,1} \right\}, i = 1, \ldots ,n \). It can represent an object x described by n binary features, attributes or properties, such that \( x_{i} = 1 \) if the object x possesses the i-th attribute and \( x_{i} = 0 \) in the opposite case. Denote by X the set of attributes possessed by x. In statistics, a binary n-tuple can denote n measurements of a dichotomous variable or property x for n sampling units: \( x_{i} = 1 \) denotes that the property x is fulfilled for the unit i, and \( x_{i} = 0 \) otherwise. We use here the first interpretation of binary n-tuples. For two binary n-tuples \( x = \left( {x_{1} , \ldots ,x_{n} } \right), \) \( y = \left( {y_{1} , \ldots ,y_{n} } \right) \) denote:

  • a the number of attributes possessed by both objects x and y, when \( x_{i} = y_{i} = 1 \);

  • b the number of attributes possessed by x and not by y, when \( x_{i} = 1, y_{i} = 0 \);

  • c the number of attributes possessed by y and not by x, when \( x_{i} = 0, y_{i} = 1 \);

  • d the number of attributes not possessed by both objects x and y, when \( x_{i} = y_{i} = 0. \)

The numbers a and d are usually referred to as the numbers of positive and negative matches, respectively. Below are examples of similarity and association measures for binary n-tuples [13]:

Jaccard similarity measure:

$$ S_{J} = \frac{a}{a + b + c} = \frac{{\left| {X \cap Y} \right|}}{{\left| {X \cup Y} \right|}}, $$

Simple Matching similarity measure:

$$ S_{SM} = \frac{a + d}{a + b + c + d} = \frac{{\left| {X \cap Y} \right| + \left| {\bar{X} \cap \bar{Y}} \right|}}{n}, $$

Hamann coefficient:

$$ A_{H} = \frac{{\left( {a + d} \right) - \left( {b + c} \right)}}{a + b + c + d} = \frac{{\left| {X \cap Y} \right| + \left| {\bar{X} \cap \bar{Y}} \right| - \left| {X \cap \bar{Y}} \right| - \left| {\bar{X} \cap Y} \right|}}{n}, $$

Yule’s Q association coefficient:

$$ A_{Y - Q} = \frac{ad - bc}{ad + bc} = \frac{{\left| {X \cap Y} \right| \cdot \left| {\bar{X} \cap \bar{Y}} \right| - \left| {X \cap \bar{Y}} \right| \cdot \left| {\bar{X} \cap Y} \right|}}{{\left| {X \cap Y} \right| \cdot \left| {\bar{X} \cap \bar{Y}} \right| + \left| {X \cap \bar{Y}} \right| \cdot \left| {\bar{X} \cap Y} \right|}}. $$
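All four measures are computed from the counts a, b, c, d. A minimal plain-Python sketch (the function names are ours, chosen for illustration):

```python
def abcd(x, y):
    """Counts for two binary n-tuples: positive matches a, mismatches b and c,
    negative matches d (see the bullet list above)."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d

def jaccard(x, y):
    a, b, c, _ = abcd(x, y)
    return a / (a + b + c)

def simple_matching(x, y):
    a, b, c, d = abcd(x, y)
    return (a + d) / (a + b + c + d)

def hamann(x, y):
    a, b, c, d = abcd(x, y)
    return ((a + d) - (b + c)) / (a + b + c + d)

def yule_q(x, y):
    a, b, c, d = abcd(x, y)
    return (a * d - b * c) / (a * d + b * c)

x = (1, 1, 0, 1, 0, 0)
y = (1, 0, 0, 1, 0, 1)   # a=2, b=1, c=1, d=2
```

As with the formulas themselves, the sketch assumes the denominators are non-zero (e.g. Jaccard is undefined when a + b + c = 0).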

2.2 Real Valued n-Tuples

Consider n-tuples \( x = \left( {x_{1} , \ldots ,x_{n} } \right), \) \( y = \left( {y_{1} , \ldots ,y_{n} } \right) \) with real valued components.

Cosine similarity measure:

$$ cos\left( {x,y} \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n} x_{i} y_{i} }}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} x_{i}^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{n} y_{i}^{2} } }}, $$

where \( x_{i} , y_{i} \ge 0 \), \( i = 1, \ldots ,n \).

Pearson’s product-moment correlation coefficient:

$$ r = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)\left( {y_{i} - \bar{y}} \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \bar{y}} \right)^{2} } }}, $$

where \( \bar{x} = \frac{1}{n}\mathop \sum \nolimits_{i = 1}^{n} x_{i} \), \( \bar{y} = \frac{1}{n}\mathop \sum \nolimits_{i = 1}^{n} y_{i}. \)

For both measures it is supposed that the denominators are not equal to zero.
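Both coefficients translate directly into code. A plain-Python sketch (function names are ours); note that Pearson’s r is simply the cosine of the mean-centered tuples:

```python
from math import sqrt

def cosine(x, y):
    """Cosine similarity of two n-tuples; non-negative components are assumed
    when it is used as a similarity function with values in [0, 1]."""
    num = sum(xi * yi for xi, yi in zip(x, y))
    return num / (sqrt(sum(xi * xi for xi in x)) * sqrt(sum(yi * yi for yi in y)))

def pearson_r(x, y):
    """Pearson's product-moment correlation: cosine of the centered tuples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return cosine([xi - mx for xi in x], [yi - my for yi in y])
```

Both functions assume non-zero denominators, exactly as stated above.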

2.3 Rankings

Consider two rankings of n objects \( x = \left( {x_{1} , \ldots ,x_{n} } \right), \) \( y = \left( {y_{1} , \ldots ,y_{n} } \right) \) containing n different integer ranks, \( 1 \le x_{i} ,y_{i} \le n \), i.e. “without ties”.

Spearman’s rank correlation coefficient [12, 18, 22]:

$$ \rho = 1 - \frac{{6\mathop \sum \nolimits_{i = 1}^{n} d_{i}^{2} }}{{n\left( {n^{2} - 1} \right)}}, $$

where \( d_{i} = x_{i} - y_{i} , i = 1, \ldots ,n \).

Kendall’s rank correlation coefficient [12, 18, 22] for real valued n-tuples \( x = \left( {x_{1} , \ldots ,x_{n} } \right), \) \( y = \left( {y_{1} , \ldots ,y_{n} } \right) \) without ties, i.e. \( x_{i} \ne x_{j} \), \( y_{i} \ne y_{j} \) for all \( i,j \):

$$ \tau = \frac{NC - ND}{{n\left( {n - 1} \right)/2}}, $$

where NC is the number of concordant pairs \( \left( {i,j} \right) \) calculated as follows:

$$ NC = \mathop \sum \nolimits_{i = 1}^{n - 1} \mathop \sum \nolimits_{j = i + 1}^{n} s_{ij},\quad {\text{and}}\quad s_{ij} = \left\{ {\begin{array}{*{20}r} {1,} \hfill & {if \left( {x_{i} - x_{j} } \right)\left( {y_{i} - y_{j} } \right) > 0} \hfill \\ {0,} \hfill & {\quad \quad \quad otherwise} \hfill \\ \end{array} } \right., $$

and ND is the number of discordant pairs calculated as follows:

$$ ND = \mathop \sum \nolimits_{i = 1}^{n - 1} \mathop \sum \nolimits_{j = i + 1}^{n} d_{ij} ,\quad {\text{and}}\quad d_{ij} = \left\{ {\begin{array}{*{20}r} {1,} \hfill & {if \left( {x_{i} - x_{j} } \right)\left( {y_{i} - y_{j} } \right) < 0} \hfill \\ {0,} \hfill & {\quad \quad \quad otherwise} \hfill \\ \end{array} } \right.. $$

Note that only the ordering of the component values of the n-tuples is used.
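For rankings without ties, both rank coefficients can be computed directly from the formulas above; a plain-Python sketch (function names are ours):

```python
def spearman_rho(x, y):
    """Spearman's rho for two rankings of 1..n without ties."""
    n = len(x)
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(x, y):
    """Kendall's tau for n-tuples without ties: (NC - ND) / (n(n-1)/2)."""
    n = len(x)
    nc = sum(1 for i in range(n - 1) for j in range(i + 1, n)
             if (x[i] - x[j]) * (y[i] - y[j]) > 0)
    nd = sum(1 for i in range(n - 1) for j in range(i + 1, n)
             if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return (nc - nd) / (n * (n - 1) / 2)
```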

2.4 Finite Probabilistic Distributions

Suppose \( x = \left( {x_{1} , \ldots ,x_{n} } \right) \) and \( y = \left( {y_{1} , \ldots ,y_{n} } \right) \) are finite probability distributions, i.e. \( x_{i} , y_{i} \ge 0 \) for all \( i = 1, \ldots ,n \), and \( \mathop \sum \limits_{i = 1}^{n} x_{i} = \mathop \sum \limits_{i = 1}^{n} y_{i} = 1 \).

The Bhattacharyya coefficient of two such distributions is defined as follows [1]:

$$ S\left( {x,y} \right) = \mathop \sum \nolimits_{i = 1}^{n} \sqrt {x_{i} y_{i} }. $$
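A direct implementation follows; the coefficient lies in [0, 1] by the Cauchy-Schwarz inequality and equals 1 exactly when x = y, since each distribution sums to 1 (a sketch; the function name is ours):

```python
from math import sqrt

def bhattacharyya(x, y):
    """Bhattacharyya coefficient of two finite probability distributions."""
    return sum(sqrt(xi * yi) for xi, yi in zip(x, y))
```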

2.5 Kendall’s General Correlation Coefficient

Kendall [22] considered the problem of constructing a general correlation coefficient. For two n-tuples \( x = \left( {x_{1} , \ldots ,x_{n} } \right) \) and \( y = \left( {y_{1} , \ldots ,y_{n} } \right) \) he defined it as:

$$ r = \frac{{\mathop \sum \nolimits_{i,j = 1}^{n} a_{ij} b_{ij} }}{{\sqrt {\mathop \sum \nolimits_{i,j = 1}^{n} a_{ij}^{2} \mathop \sum \nolimits_{i,j = 1}^{n} b_{ij}^{2} } }}, $$

where the values \( a_{ij} \) are calculated from the values \( x_{i} \) and \( x_{j} \), and similarly the \( b_{ij} \) are calculated from \( y_{i} \) and \( y_{j} \), such that the following conditions are fulfilled: \( a_{ij} = - a_{ji} \), \( b_{ij} = - b_{ji} \), \( a_{ii} = b_{ii} = 0 \) for all \( i,j = 1, \ldots ,n \). As particular cases of this formula he obtained Spearman’s and Kendall’s rank correlation coefficients and Pearson’s product-moment correlation coefficient. It is not clear, however, how to obtain other correlation coefficients from this formula.
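For example, with the sign-based scores \( a_{ij} = {\text{sign}}\left( {x_{i} - x_{j} } \right) \), \( b_{ij} = {\text{sign}}\left( {y_{i} - y_{j} } \right) \), Kendall’s formula reduces to his rank correlation coefficient \( \tau \) for tuples without ties. A sketch of the general formula, with the score functions passed in as parameters (names are ours):

```python
from math import sqrt

def sign(v):
    return (v > 0) - (v < 0)

def general_r(x, y, a, b):
    """Kendall's general coefficient for score functions a(xi, xj), b(yi, yj)
    satisfying a(u, v) = -a(v, u) and a(u, u) = 0 (and likewise for b)."""
    n = len(x)
    num = sum(a(x[i], x[j]) * b(y[i], y[j]) for i in range(n) for j in range(n))
    sa = sum(a(x[i], x[j]) ** 2 for i in range(n) for j in range(n))
    sb = sum(b(y[i], y[j]) ** 2 for i in range(n) for j in range(n))
    return num / sqrt(sa * sb)

# Sign-based scores recover Kendall's tau for tuples without ties.
sgn = lambda u, v: sign(u - v)
```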

3 Similarity and Dissimilarity Functions

The examples of similarity measures and correlation coefficients considered in the previous section show that there is a variety of such indicators for different types of data. In total, dozens of such measures and coefficients have been introduced [13, 14, 27]. To analyze the general properties of these measures and the possible relationships between them, and to study the methods of their construction, it was proposed in [9] to consider these measures as functions defined on a universal domain \( \Omega \) and satisfying some simple sets of properties. Three main types of such functions have been considered: similarity, dissimilarity and correlation functions, which can be used as models of similarity, dissimilarity and correlation measures and coefficients, respectively. Below we introduce these functions and consider some of their properties. More results can be found in [9].

Further, for brevity, similarity and dissimilarity functions will be referred to as (dis)similarity functions or as resemblance functions.

3.1 Definition of Similarity, Dissimilarity and Correlation Functions

Let \( \Omega \) be a nonempty set that will be referred to as a universal domain or an underlying set in definition of similarity and correlation functions. As \( \Omega \) we can consider domain specific for a considered data type: the set of all binary n-tuples, the set of all real valued vectors of the length n, the set of all fuzzy sets defined on some domain X, the set of some images or objects considered in some problem, etc.

A function \( S:\Omega \times\Omega \to \left[ {0,1} \right] \) is called a similarity function on \( \Omega \) if for all x, y in \( \Omega \) it is symmetric:

$$ S\left( {x,y} \right) = S\left( {y,x} \right), $$

and reflexive:

$$ S\left( {x,x} \right) = 1. $$

A function \( D:\Omega \times\Omega \to \left[ {0,1} \right] \) is a dissimilarity function on \( \Omega \) if for all x, y in \( \Omega \) it is symmetric:

$$ D\left( {x,y} \right) = D\left( {y,x} \right), $$

and irreflexive:

$$ D\left( {x,x} \right) = 0. $$

These functions are called complementary if for all x, y in \( \Omega \) it is fulfilled:

$$ S\left( {x,y} \right) + D\left( {x,y} \right) = 1. $$

For complementary (dis)similarity functions we have:

$$ S\left( {x,y} \right) = 1 - D\left( {x,y} \right),D\left( {x,y} \right) = 1 - S\left( {x,y} \right). $$
(1)

A function \( A:\Omega \times\Omega \to \left[ { - 1,1} \right] \) is a correlation function (association measure) on \( \Omega \) if for all x, y in \( \Omega \) it is symmetric:

$$ A\left( {x,y} \right) = A\left( {y,x} \right), $$

reflexive:

$$ A\left( {x,x} \right) = 1, $$

and negative: \( A\left( {x,y} \right) < 0 \) for some x, y in \( \Omega \).

Such correlation functions will be referred to as weak correlation functions if they do not satisfy the inverse relationship property considered in Sect. 4. In Sect. 4 we define a strong (invertible) correlation function on a set \( \Omega \) with an involution operation.

It is easy to see that the indicators considered in the previous section belong to the following classes of functions:

  • Similarity functions: the Jaccard and Simple Matching similarity measures, the Cosine measure (if all \( x_{i} \) and \( y_{i} \) are non-negative) and the Bhattacharyya coefficient;

  • Correlation functions: the Hamann and Yule’s Q coefficients, Pearson’s product-moment correlation coefficient, Spearman’s and Kendall’s rank correlation coefficients, and the Cosine measure (if \( x_{i} \) and \( y_{i} \) can take both positive and negative values).

3.2 Examples of Complementary (Dis)Similarity Functions

Jaccard (see Sect. 2.1):

$$ S_{J} \left( {x,y} \right) = \frac{a}{a + b + c} = \frac{{\left| {X \cap Y} \right|}}{{\left| {X \cup Y} \right|}}, $$
$$ D_{J} \left( {x,y} \right) = \frac{b + c}{a + b + c} = \frac{{\left| {X \oplus Y} \right|}}{{\left| {X \cup Y} \right|}}. $$

Simple Matching (Sect. 2.1):

$$ S_{SM} \left( {x,y} \right) = \frac{a + d}{a + b + c + d} = \frac{{\left| {X \cap Y} \right| + \left| {\bar{X} \cap \bar{Y}} \right|}}{n}, $$
$$ D_{SM} \left( {x,y} \right) = \frac{b + c}{a + b + c + d} = \frac{{\left| {X \oplus Y} \right|}}{n} = \frac{1}{n}\mathop \sum \nolimits_{i = 1}^{n} \left| {x_{i} - y_{i} } \right|. $$

Cosine (Sect. 2.2):

$$ cos\left( {x,y} \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n} x_{i} y_{i} }}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} x_{i}^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{n} y_{i}^{2} } }},x_{i} ,y_{i} \ge 0,i = 1, \ldots ,n. $$
$$ D\left( {x,y} \right) = \frac{1}{2}\mathop \sum \nolimits_{i = 1}^{n} \left( {\frac{{x_{i} }}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} x_{i}^{2} } }} - \frac{{y_{i} }}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} y_{i}^{2} } }}} \right)^{2}. $$

Bhattacharyya (Sect. 2.4):

$$ S\left( {x,y} \right) = \mathop \sum \nolimits_{i = 1}^{n} \sqrt {x_{i} y_{i} }. $$
$$ D\left( {x,y} \right) = \frac{1}{2}\mathop \sum \nolimits_{i = 1}^{n} \left( {\sqrt {x_{i} } - \sqrt {y_{i} } } \right)^{2}. $$

The last dissimilarity function is known as the Hellinger discrimination, the Matusita measure or the Squared-Chord distance.
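Complementarity of the Bhattacharyya similarity and the Hellinger dissimilarity follows because each distribution sums to 1: expanding the square gives \( D\left( {x,y} \right) = 1 - S\left( {x,y} \right) \). A numerical check (function names are ours):

```python
from math import sqrt, isclose

def bhattacharyya_S(x, y):
    """Bhattacharyya similarity of two finite probability distributions."""
    return sum(sqrt(xi * yi) for xi, yi in zip(x, y))

def hellinger_D(x, y):
    """Hellinger discrimination, complementary to the Bhattacharyya coefficient."""
    return 0.5 * sum((sqrt(xi) - sqrt(yi)) ** 2 for xi, yi in zip(x, y))

x = (0.2, 0.3, 0.5)
y = (0.1, 0.6, 0.3)
```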

3.3 Fuzzy Relations and Kleene Algebra of Resemblance Functions

Here, similarity and dissimilarity functions will be also referred to as resemblance functions. A resemblance function is a symmetric function \( R:\Omega \times\Omega \to \left[ {0,1} \right] \), which is reflexive or irreflexive. We will say that two resemblance functions have the same type if both are reflexive or both are irreflexive.

A resemblance function R can be considered as a fuzzy relation [2, 17, 28, 29] given by a membership function \( \mu_{R} :\Omega \times\Omega \to \left[ {0,1} \right] \), where \( \mu_{R} \left( {x,y} \right) \) is the strength of the relation R between x and y. The properties of fuzzy relations can be extended to resemblance functions.

Denote by \( {\mathcal{P}}\left( S \right),{ \mathcal{P}}\left( D \right) \) and \( {\mathcal{P}}\left( R \right) \) the sets of all similarity, dissimilarity and resemblance functions, respectively, defined on the set \( \Omega \). We have \( {\mathcal{P}}\left( S \right),{ \mathcal{P}}\left( D \right) \subset {\mathcal{P}}\left( R \right) \) and \( {\mathcal{P}}\left( S \right) \cup { \mathcal{P}}\left( D \right) = {\mathcal{P}}\left( R \right) \). Define the operations of intersection \( \cap \) and union \( \cup \) of resemblance functions \( R_{1} \) and \( R_{2} \) on the set \( \Omega \) for all x, y in \( \Omega \) as follows:

$$ \left( {R_{1} \cap R_{2} } \right)\left( {x,y} \right) = min\left\{ {R_{1} \left( {x,y} \right),R_{2} \left( {x,y} \right)} \right\}, $$
$$ \left( {R_{1} \cup R_{2} } \right)\left( {x,y} \right) = max\left\{ {R_{1} \left( {x,y} \right),R_{2} \left( {x,y} \right)} \right\}. $$

The set \( {\mathcal{P}}\left( R \right) \) is a distributive lattice [11], partially ordered by the relation:

$$ R_{1} \subseteq R_{2} \,{\text{if}}\,R_{1} \left( {x,y} \right) \le R_{2} \left( {x,y} \right)\,{\text{for}}\,{\text{all}}\,x,y\,{\text{in}}\,\Omega \text{.} $$

The sets \( {\mathcal{P}}\left( S \right) \) and \( {\mathcal{P}}\left( D \right) \) are distributive sublattices of \( {\mathcal{P}}\left( R \right) \). For all similarity functions \( S_{1} ,S_{2} \in {\mathcal{P}}\left( S \right) \) and dissimilarity functions \( D_{1} ,D_{2} \in {\mathcal{P}}\left( D \right) \) it is fulfilled:

$$ \begin{array}{*{20}l} {S_{1} \cap S_{2} , S_{1} \cup S_{2} \in {\mathcal{P}}\left( S \right),} \hfill & {D_{1} \cap D_{2} , D_{1} \cup D_{2} \in {\mathcal{P}}\left( D \right),} \hfill \\ \end{array} $$
$$ S_{1} \cap D_{1} \in {\mathcal{P}}\left( D \right),\quad S_{1} \cup D_{1} \in {\mathcal{P}}\left( S \right). $$

Consider similarity and dissimilarity functions defined for all x, y in \( \Omega \) by:

$$ \begin{array}{*{20}l} {D_{0} \left( {x,y} \right) = 0,} \hfill & {D_{1} \left( {x,y} \right) = \left\{ {\begin{array}{*{20}l} {0,} \hfill & {if\; x = y} \hfill \\ {1,} \hfill & {otherwise} \hfill \\ \end{array} } \right.,} \hfill \\ \end{array} $$
$$ \begin{array}{*{20}l} {S_{0} \left( {x,y} \right) = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {if\; x = y} \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right.,} \hfill & {S_{1} \left( {x,y} \right) = 1.} \hfill \\ \end{array} $$

\( D_{0} \) is the least element of \( {\mathcal{P}}\left( D \right) \) and \( {\mathcal{P}}\left( R \right) \), \( S_{1} \) is the greatest element of \( {\mathcal{P}}\left( S \right) \) and \( {\mathcal{P}}\left( R \right) \), \( S_{0} \) is the least element of \( {\mathcal{P}}\left( S \right) \), and \( D_{1} \) is the greatest element of \( {\mathcal{P}}\left( D \right) \).

The complement \( N\left( R \right) \) of a resemblance function R is defined for all x, y in \( \Omega \) by: \( {N\left( R \right)} \left( {x,y} \right) = 1 - R\left( {x,y} \right) \). This operation is involutive, i.e. for all \( R \) in \( {\mathcal{P}}\left( R \right) \) we have: \( N\left( {N\left( R \right)} \right) = R \). The complement of the similarity and dissimilarity functions will be equal to their complementary dissimilarity and similarity functions (1), respectively:

$$ \begin{array}{*{20}l} {N\left( D \right) = S,} \hfill & {N\left( S \right) = D.} \hfill \\ \end{array} $$

The lattice \( {\mathcal{P}}\left( R \right) \) with the complement N is a normal De Morgan (Kleene) algebra [5], in which for any resemblance functions \( R_{1} \) and \( R_{2} \) the De Morgan laws:

$$ \begin{array}{*{20}l} {N\left( {R_{1} \cap R_{2} } \right) = N\left( {R_{1} } \right) \cup N\left( {R_{2} } \right),} \hfill & {N\left( {R_{1} \cup R_{2} } \right) = N\left( {R_{1} } \right) \cap N\left( {R_{2} } \right),} \hfill \\ \end{array} $$

and normality:

$$ R_{1} \cap N\left( {R_{1} } \right)\; \subseteq \;R_{2} \cup N\left( {R_{2} } \right), $$

are fulfilled.
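On a finite \( \Omega \) a resemblance function is just a symmetric matrix of values in [0, 1], and the lattice operations, the complement N, the De Morgan laws and the normality condition can all be checked pointwise. A small sketch (the matrix representation and the names are ours):

```python
def r_and(R1, R2):
    """Intersection of resemblance functions: pointwise min."""
    return [[min(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(R1, R2)]

def r_or(R1, R2):
    """Union of resemblance functions: pointwise max."""
    return [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(R1, R2)]

def r_not(R):
    """Complement: N(R)(x, y) = 1 - R(x, y)."""
    return [[1 - a for a in row] for row in R]

def leq(R1, R2):
    """Partial order R1 ⊆ R2: pointwise less-or-equal."""
    return all(a <= b for r1, r2 in zip(R1, R2) for a, b in zip(r1, r2))

R1 = [[1.0, 0.4], [0.4, 1.0]]   # a similarity function on a 2-element set
R2 = [[0.0, 0.7], [0.7, 0.0]]   # a dissimilarity function on the same set
```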

Entropy of Resemblance Functions.

On the Kleene algebra of resemblance functions one can introduce a measure of non-probabilistic entropy of these functions [5, 9, 15]. Similarity and dissimilarity functions such that \( S\left( {x,y} \right) = D\left( {x,y} \right) = 0.5 \) for all \( x \ne y \) in \( \Omega \) have the maximal entropy, and for such functions the uncertainty of the decision “x and y are similar” is maximal [9].

3.4 Min-Transitivity and Hierarchical Clustering

A symmetric and reflexive fuzzy relation \( S:\Omega \times\Omega \to \left[ {0,1} \right] \) is called a fuzzy similarity (fuzzy equivalence) relation [29] if for all x, y, z in \( \Omega \) it satisfies min-transitivity:

$$ S\left( {x,z} \right) \ge min\left\{ {S\left( {x,y} \right),S\left( {y,z} \right)} \right\}. $$

A dissimilarity function D complementary to a min-transitive similarity function is called an ultrametric; it satisfies for all x, y, z in \( \Omega \) the ultrametric inequality:

$$ D\left( {x,z} \right) \le max\left\{ {D\left( {x,y} \right),D\left( {y,z} \right)} \right\}. $$

For any \( \alpha \in \left[ {0,1} \right] \) the \( \alpha \)-cut of a fuzzy relation S defines a crisp relation \( S_{\alpha }\, \subseteq\,\Omega \times\Omega \) as follows: \( S_{\alpha } = \left\{ {\left( {x,y} \right) \in\Omega \times\Omega |S\left( {x,y} \right) \ge \alpha } \right\} \). The \( \alpha \)-cuts are nested: from \( \alpha > \beta \) it follows that \( S_{\alpha } \, \subseteq\, S_{\beta } \).

Optimal and Invariant Hierarchical Clustering.

All \( \alpha \)-cuts of a min-transitive similarity function (fuzzy equivalence relation) \( E:\Omega \times\Omega \to \left[ {0,1} \right] \) are non-fuzzy equivalence relations; hence they define nested partitions of the set \( \Omega \) into equivalence classes of these relations. These properties of fuzzy equivalence relations suggest treating hierarchical clustering [21] of the set \( \Omega \) with a similarity function S as a min-transitive transformation of S into a fuzzy equivalence relation \( E \). Tamura et al. [26] proposed to transform S into its transitive closure \( \hat{S} \), which is a fuzzy equivalence relation. It was shown [16, 19] that this method coincides with spanning tree clustering and with a version of the single linkage hierarchical clustering algorithm.

Batyrshin [4, 6] showed that the solution of the problem of optimal approximation of a similarity function by a fuzzy equivalence relation has the form \( E = \widehat{F\left( S \right)} \), i.e. it can be presented as the min-transitive closure of a corrected similarity function \( F\left( S \right) \), where F is a “correction” of S such that \( F\left( S \right) \, \subseteq\, S \). In addition, the problem of constructing hierarchical clustering algorithms invariant both under monotone transformations of similarity values and under the initial numbering (indexing) of objects was studied. The solution of this problem also has the form \( E = \widehat{F\left( S \right)} \), where \( F\left( S \right) \, \subseteq\, S \), and a parametric family of invariant corrections F was proposed [4, 6].
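On a finite set, the transitive closure \( \hat{S} \) used by Tamura et al. can be computed by iterating the max-min composition of the similarity matrix with itself until a fixpoint is reached; since S is reflexive, the iteration only increases values and converges. A sketch for small matrices (function names are ours):

```python
def maxmin_compose(S):
    """Max-min composition of a fuzzy relation given as a square matrix."""
    n = len(S)
    return [[max(min(S[i][k], S[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(S):
    """Min-transitive closure of a reflexive similarity matrix."""
    while True:
        T = maxmin_compose(S)
        if T == S:
            return T
        S = T
```

Every \( \alpha \)-cut of the result is a crisp equivalence relation, so thresholding the closure at decreasing \( \alpha \) yields the nested partitions of a single-linkage-style hierarchical clustering.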

3.5 Equivalent Resemblance Functions

Two resemblance functions \( R_{1} \) and \( R_{2} \) of the same type defined on the set \( \Omega \) are called equivalent (by ordering) [3, 24] if for all x, y, u, v in \( \Omega \) it is fulfilled:

$$ R_{1} \left( {x,y} \right) \le R_{1} \left( {u,v} \right)\quad {\text{if}}\,{\text{and}}\,{\text{only}}\,{\text{if}}\quad R_{2} \left( {x,y} \right) \le R_{2} \left( {u,v} \right). $$

It is clear that two equivalent resemblance functions should have the same type.

A continuous, strictly increasing function \( \varphi : \left[ {0,1} \right] \to \left[ {0,1} \right] \) such that \( \varphi \left( 0 \right) = 0 \) and \( \varphi \left( 1 \right) = 1 \) is called an automorphism of the interval \( \left[ {0,1} \right] \).

Proposition 1.

If R is a resemblance function on \( \Omega \) and \( \varphi \) is an automorphism of the interval [0,1] then the function \( R_{1} \) defined for all x, y in \( \Omega \) by:

$$ R_{1} \left( {x,y} \right) = \varphi \left( {R\left( {x,y} \right)} \right), $$

will be a resemblance function equivalent to R.

Below are examples of the simplest equivalent transformations of resemblance functions:

$$ \begin{array}{*{20}l} {R_{1} \left( {x,y} \right) = R^{2} \left( {x,y} \right),} \hfill & {R_{1} \left( {x,y} \right) = \sqrt {R\left( {x,y} \right)} .} \hfill \\ \end{array} $$

For example, instead of the dissimilarity function

$$ D\left( {x,y} \right) = \frac{1}{2}\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {\frac{{x_{i} - \bar{x}}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} } }} - \frac{{y_{i} - \bar{y}}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \bar{y}} \right)^{2} } }}} \right)^{2} } , $$

one can use the equivalent dissimilarity function

$$ D\left( {x,y} \right) = \frac{1}{4}\mathop \sum \nolimits_{i = 1}^{n} \left( {\frac{{x_{i} - \bar{x}}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} } }} - \frac{{y_{i} - \bar{y}}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \bar{y}} \right)^{2} } }}} \right)^{2}. $$
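The second function is obtained from the first by the automorphism \( \varphi \left( t \right) = t^{2} \), i.e. it is exactly the square of the first, so the two order pairs of tuples identically. A numerical check (helper names are ours):

```python
from math import sqrt

def _standardized(x):
    """Center x and scale it to unit Euclidean norm."""
    n = len(x)
    m = sum(x) / n
    norm = sqrt(sum((xi - m) ** 2 for xi in x))
    return [(xi - m) / norm for xi in x]

def D_sqrt(x, y):
    """The first dissimilarity function above (with the square root)."""
    u, v = _standardized(x), _standardized(y)
    return 0.5 * sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def D_sq(x, y):
    """The equivalent dissimilarity function: the square of D_sqrt."""
    u, v = _standardized(x), _standardized(y)
    return 0.25 * sum((ui - vi) ** 2 for ui, vi in zip(u, v))

x, y = [1.0, 2.0, 4.0], [3.0, 1.0, 2.0]
```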

Equivalent Resemblance Functions and Invariant Clustering.

Equivalence of (dis)similarity functions supposes that the use of such functions in some classification algorithm will give equivalent results. As such clustering algorithms one can use hierarchical clustering algorithms invariant under monotone transformations of similarity values discussed in previous section.

4 Correlation Functions

4.1 Correlation Functions and Correlation Triplets

In Sect. 3 we introduced the definition of correlation function as follows.

A function \( A:\Omega \times\Omega \to \left[ { - 1,1} \right] \) is a correlation function (association measure) on \( \Omega \) if for all x, y in \( \Omega \) it is symmetric:

$$ A\left( {x,y} \right) = A\left( {y,x} \right), $$

reflexive:

$$ A\left( {x,x} \right) = 1, $$

and negative: \( A\left( {x,y} \right) < 0 \), for some x, y in \( \Omega \).

A correlation function will be called a weak correlation function if it does not satisfy the inverse relationship property considered below. Here we consider and extend some results introduced in [10].

Proposition 2.

Suppose S and D are similarity and dissimilarity functions on \( \Omega \) such that \( S\left( {x,y} \right) < D\left( {x,y} \right) \) for some x, y in \( \Omega \). Then the function defined for all x, y in \( \Omega \) by:

$$ A\left( {x,y} \right) = S\left( {x,y} \right) - D\left( {x,y} \right), $$
(2)

is a correlation function. If S and D are complementary, then A is a correlation function if \( S\left( {x,y} \right) < 0.5 \) for some x, y in Ω.

The obtained formula for A has a reasonable interpretation: the correlation between x and y is positive if the similarity between them is greater than the dissimilarity, and negative in the opposite case.

If the similarity S and dissimilarity D functions are complementary then the correlation function A defined by (2) is called complementary to S and D. Complementary functions S, D and A will be denoted as (S, D, A) and called a correlation triplet. From the definition of the complementary (dis)similarity functions and from (2) it follows that the similarity, dissimilarity and correlation functions from the correlation triplet (S, D, A) can be obtained one from another for all x, y in Ω as follows:

$$ \begin{array}{*{20}l} {S\left( {x,y} \right) = 1 - D\left( {x,y} \right),} \hfill & {D\left( {x,y} \right) = 1 - S\left( {x,y} \right),} \hfill \\ \end{array} $$
(3)
$$ \begin{array}{*{20}l} {A\left( {x,y} \right) = 2S\left( {x,y} \right) - 1,} \hfill & {S\left( {x,y} \right) = \frac{1}{2}\left( {A\left( {x,y} \right) + 1} \right).} \hfill \\ \end{array} $$
(4)
$$ \begin{array}{*{20}l} {A\left( {x,y} \right) = 1 - 2D\left( {x,y} \right),} \hfill & {D\left( {x,y} \right) = \frac{1}{2}\left( {1 - A\left( {x,y} \right)} \right).} \hfill \\ \end{array} $$
(5)

4.2 Examples of Constructing Correlation Functions from (Dis)Similarity Functions

Hamann coefficient \( \varvec{A}_{\varvec{H}} \) (see Sect. 2.1). The Simple Matching similarity measure \( S_{SM} \left( {x,y} \right) = \frac{a + d}{a + b + c + d} \) has the complementary dissimilarity function \( D_{SM} \left( {x,y} \right) = \frac{b + c}{a + b + c + d} \). From (2) we obtain:

$$ A\left( {x,y} \right) = S\left( {x,y} \right) - D\left( {x,y} \right) = \frac{a + d}{a + b + c + d} - \frac{b + c}{a + b + c + d} = \frac{{\left( {a + d} \right) - \left( {b + c} \right)}}{a + b + c + d} = A_{H} . $$

Yule’s Q association coefficient \( \varvec{A}_{{\varvec{Y} - \varvec{Q}}} \) (Sect. 2.1). The function \( S_{Y} \left( {x,y} \right) = \frac{ad}{ad + bc} \) is a similarity function and the function \( D_{Y} \left( {x,y} \right) = \frac{bc}{ad + bc} \) is its complementary dissimilarity function. From (2) we obtain:

$$ A\left( {x,y} \right) = S\left( {x,y} \right) - D\left( {x,y} \right) = \frac{ad}{ad + bc} - \frac{bc}{ad + bc} = \frac{ad - bc}{ad + bc} = A_{Y - Q} . $$

Note that, similarly to the Yule’s Q and Hamann coefficients, most of the correlation functions considered for binary data [13] can easily be constructed in this way.

Pearson’s product-moment correlation coefficient r (Sect. 2.2). The function

$$ D\left( {x,y} \right) = \frac{1}{4}\mathop \sum \nolimits_{i = 1}^{n} \left( {\frac{{x_{i} - \bar{x}}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} } }} - \frac{{y_{i} - \bar{y}}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \bar{y}} \right)^{2} } }}} \right)^{2} , $$

is a dissimilarity function. From (5) we obtain Pearson’s product-moment correlation coefficient:

$$ A\left( {x,y} \right) = 1 - 2D\left( {x,y} \right) = 1 - \frac{1}{2}\mathop \sum \nolimits_{i = 1}^{n} \left( {\frac{{x_{i} - \bar{x}}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} } }} - \frac{{y_{i} - \bar{y}}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \bar{y}} \right)^{2} } }}} \right)^{2} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)\left( {y_{i} - \bar{y}} \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \bar{y}} \right)^{2} } }} = r. $$
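This identity is easy to verify numerically: for standardized components \( u_{i} \), \( v_{i} \) one has \( \sum \left( {u_{i} - v_{i} } \right)^{2} = 2 - 2r \), so \( 1 - 2D\left( {x,y} \right) = r \). A quick check (function names are ours):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = (sqrt(sum((xi - mx) ** 2 for xi in x))
           * sqrt(sum((yi - my) ** 2 for yi in y)))
    return num / den

def D(x, y):
    """The dissimilarity function from the Pearson example above."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    nx = sqrt(sum((xi - mx) ** 2 for xi in x))
    ny = sqrt(sum((yi - my) ** 2 for yi in y))
    return 0.25 * sum(((xi - mx) / nx - (yi - my) / ny) ** 2
                      for xi, yi in zip(x, y))

x, y = [1.0, 4.0, 2.0, 8.0], [2.0, 3.0, 1.0, 6.0]
```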

Spearman’s rank correlation coefficient \( \rho \) (see Sect. 2.3). Consider the function:

$$ D\left( {x,y} \right) = \frac{{3\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - y_{i} } \right)^{2} }}{{n\left( {n^{2} - 1} \right)}}. $$

It satisfies the properties of a dissimilarity function, and from (5) we obtain Spearman’s rank correlation coefficient: \( A\left( {x,y} \right) = 1 - 2D\left( {x,y} \right) = 1 - \frac{{6\mathop \sum \nolimits_{i = 1}^{n} d_{i}^{2} }}{{n\left( {n^{2} - 1} \right)}} = \rho . \)

Kendall’s rank correlation coefficient \( \varvec{\tau} \) (Sect. 2.3). Consider the functions:

$$ S\left( {x,y} \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n - 1} \mathop \sum \nolimits_{j = i + 1}^{n} s_{ij} }}{{n\left( {n - 1} \right)/2}} = \frac{NC}{{n\left( {n - 1} \right)/2}},\quad D\left( {x,y} \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n - 1} \mathop \sum \nolimits_{j = i + 1}^{n} d_{ij} }}{{n\left( {n - 1} \right)/2}} = \frac{ND}{{n\left( {n - 1} \right)/2}}. $$

They are the complementary similarity and dissimilarity functions, respectively, such that \( S\left( {x,y} \right) + D\left( {x,y} \right) = 1 \), and from (2) we obtain \( A\left( {x,y} \right) = \frac{NC - ND}{{n\left( {n - 1} \right)/2}} = \tau \).
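The pair-counting construction can be sketched as follows, assuming rankings without ties (here the concordance and discordance indicators \( s_{ij} \), \( d_{ij} \) are accumulated directly into the counts NC and ND):

```python
def kendall_tau(x, y):
    """Kendall's tau as (NC - ND) / (n*(n-1)/2), where NC and ND count
    concordant and discordant pairs (no ties assumed)."""
    n = len(x)
    nc = nd = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            if (x[i] - x[j]) * (y[i] - y[j]) > 0:
                nc += 1    # concordant pair: same order in x and y
            else:
                nd += 1    # discordant pair: opposite order
    return (nc - nd) / (n * (n - 1) / 2)
```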

4.3 Strong (Invertible) Correlation Functions on the Sets with Involution Operation

Initially, the correlation function (association measure) was defined on a set with an involution operation [7] as a function satisfying the inverse relationship property considered below. Such correlation functions will be called here strong or invertible correlation functions. Remarkably, all correlation functions considered above are invertible. For this reason, a correlation function that is not invertible will be called a weak correlation function.

A function \( N:\Omega \to\Omega \) is called a reflection or a negation on \( \Omega \) if it satisfies the involutivity property for all x in \( \Omega \):

$$ N\left( {N\left( x \right)} \right) = x, $$

and if it is not the identity function, i.e. \( N\left( x \right) \ne x \) for some x in \( \Omega \).

An element x in \( \Omega \) such that \( N\left( x \right) = x \) is called a fixed point, and the set of all fixed points of the reflection N on \( \Omega \) is denoted by \( FP\left( {N,\Omega } \right) \) or \( FP\left(\Omega \right) \).

Definition 1.

[7] Let N be a reflection on \( \Omega \) and let V be a subset of \( \Omega \setminus FP\left(\Omega \right) \) closed under N. A strong correlation function (association measure) on V is a function \( A:V \times V \to \left[ { - 1,1} \right] \) satisfying for all x, y in V the properties:

$$ \begin{aligned} & A\left( {x,y} \right) = A\left( {y,x} \right),\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;({\text{symmetry}}) \\ & A\left( {x,x} \right) = 1,\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;({\text{reflexivity}}) \\ & A\left( {x,N\left( y \right)} \right) = - A\left( {x,y} \right).\,\,\,\,\,({\text{inverse}}\,{\text{relationship}}) \\ \end{aligned} $$

A strong correlation function will also be referred to as an invertible correlation function.
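The three properties of Definition 1 are easy to check numerically for a concrete strong correlation function. As a sketch (our own example, not taken from [7]), take the Hamann coefficient on binary tuples with the bitwise complement as the negation N, which is an involution without fixed points on \( \{ 0,1\} ^{n} \):

```python
from itertools import product

def hamann(x, y):
    """Hamann coefficient: (matches - mismatches) / n for binary tuples."""
    n = len(x)
    m = sum(1 for a, b in zip(x, y) if a == b)
    return (2 * m - n) / n

def negation(x):
    """Bitwise complement: an involution without fixed points on {0,1}^n."""
    return tuple(1 - a for a in x)

# Verify the three properties of Definition 1 on all binary 4-tuples.
V = list(product((0, 1), repeat=4))
assert all(hamann(x, y) == hamann(y, x) for x in V for y in V)  # symmetry
assert all(hamann(x, x) == 1 for x in V)                        # reflexivity
assert all(hamann(x, negation(y)) == -hamann(x, y)              # inverse
           for x in V for y in V)                               # relationship
```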

Theorem 1.

The correlation function A from a correlation triplet (S, D, A) is invertible if and only if the complementary similarity and dissimilarity functions satisfy the following properties:

$$ \begin{array}{*{20}l} {S\left( {x,y} \right) + S\left( {x,N\left( y \right)} \right) = 1,} \hfill & {D\left( {x,y} \right) + D\left( {x,N\left( y \right)} \right) = 1.} \hfill \\ \end{array} $$

These properties will be called bipolarity properties, and the corresponding functions S and D will be called bipolar, see [8, 10]. The value 1 equals the sum of the pole values 0 and 1 of the interval [0, 1] of similarity and dissimilarity values. It is clear that S is bipolar if and only if its complementary dissimilarity function D is bipolar.

Similarly, the inverse relationship property of a correlation function can be written in the form of bipolarity:

$$ A\left( {x,y} \right) + A\left( {x,N\left( y \right)} \right) = 0, $$

taking into account that \( 0 = - 1 + 1 \), i.e. zero equals the sum of the pole values of the interval \( \left[ { - 1,1} \right] \) of correlation values. With this terminology, Theorem 1 can be formulated as follows: the correlation function A from a correlation triplet (S, D, A) is bipolar if and only if the similarity and dissimilarity functions S and D are bipolar.

The proof follows, for example, from (4): \( A\left( {x,y} \right) + A\left( {x,N\left( y \right)} \right) = 2S\left( {x,y} \right) - 1 + 2S\left( {x,N\left( y \right)} \right) - 1 = 2\left( {S\left( {x,y} \right) + S\left( {x,N\left( y \right)} \right) - 1} \right) \), and the first sum equals zero if and only if the last sum equals zero.

For complementary (dis)similarity functions we have \( S\left( {x,y} \right) + D\left( {x,y} \right) = 1 \), and the bipolarity of these functions is equivalent to the properties:

$$ \begin{array}{*{20}l} {D\left( {x,y} \right) = S\left( {x,N\left( y \right)} \right),} \hfill & {S\left( {x,y} \right) = D\left( {x,N\left( y \right)} \right).} \hfill \\ \end{array} $$
(6)

Hence, to prove the inverse relationship property of a correlation function A, it is sufficient to show that the (dis)similarity functions S or D used in the construction of A by means of (2), (4) or (5) satisfy the bipolarity properties or (6).
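For instance (our own sketch), the simple matching similarity on binary tuples, taken with the bitwise complement as N, satisfies the bipolarity condition of Theorem 1: every coordinate that matches y fails to match N(y), so S(x, y) + S(x, N(y)) = 1:

```python
from itertools import product

def simple_matching(x, y):
    """Simple matching similarity: proportion of coinciding coordinates."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)

def negation(x):
    """Bitwise complement on binary tuples."""
    return tuple(1 - a for a in x)

# Bipolarity: S(x, y) + S(x, N(y)) = 1 on all pairs of binary 3-tuples.
V = list(product((0, 1), repeat=3))
assert all(abs(simple_matching(x, y) + simple_matching(x, negation(y)) - 1) < 1e-12
           for x in V for y in V)
```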

One can show that all correlation functions considered in Sect. 4.2 are invertible with respect to suitable involutions; see [10] for some results.

4.4 Constructing Strong Correlation Functions from Co-symmetric (Dis)Similarity Functions

A similarity function S and a dissimilarity function D are consistent on a set \( \Omega \) with involution N if for all x in \( \Omega \) the following conditions, respectively, hold [9]:

$$ \begin{array}{*{20}l} {S\left( {x,N\left( x \right)} \right) = 0,} \hfill & {D\left( {x,N\left( x \right)} \right) = 1.} \hfill \\ \end{array} $$

A resemblance function R is co-symmetric on a set \( \Omega \) with involution N if for all x, y in \( \Omega \) it holds that [9]:

$$ R\left( {N\left( x \right),N\left( y \right)} \right) = R\left( {x,y} \right). $$

It was shown in [7, 9] that a resemblance function R is co-symmetric if and only if the following property holds for all x, y in \( \Omega \):

$$ R\left( {x,N\left( y \right)} \right) = R\left( {N\left( x \right),y} \right). $$

Proposition 3

[7]. An invertible correlation function is co-symmetric.

Proposition 4

[10]. A bipolar resemblance function is consistent and co-symmetric.

Theorem 1 says that to construct an invertible correlation function from its complementary (dis)similarity functions by (2), (4) or (5) we need bipolar (dis)similarity functions. Constructing such functions for a specific domain is not always easy. From Proposition 4, one can conclude that consistency and co-symmetry are less restrictive requirements than bipolarity, and it is usually easier to construct consistent and co-symmetric (dis)similarity functions than bipolar ones. The following theorem shows how to use them for constructing invertible correlation functions.

Theorem 2

[7]. Let N be a reflection on \( \Omega \) and let V be a nonempty subset of \( \Omega \backslash FP\left(\Omega \right) \) closed under N. If \( S:V \times V \to \left[ {0,1} \right] \) is a co-symmetric and consistent similarity function, then the function \( A:V \times V \to \left[ { - 1,1} \right] \) defined for all x, y in V by:

$$ A\left( {x,y} \right) = S\left( {x,y} \right) - S\left( {x,N\left( y \right)} \right), $$
(7)

is a strong correlation function on V.

Formula (7) has a simple interpretation: the correlation between x and y is positive if x is more similar to y than to its negation, and negative in the opposite case.
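As an illustration of Theorem 2 (our own sketch, not an example from [7]), the simple matching similarity on binary tuples is consistent and co-symmetric with respect to the bitwise complement negation, so formula (7) yields a strong correlation function; on this domain the construction recovers the Hamann coefficient (matches − mismatches) / n:

```python
def simple_matching(x, y):
    """Proportion of coinciding coordinates; consistent and co-symmetric
    with respect to the bitwise complement negation."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)

def negation(x):
    """Bitwise complement on binary tuples (a fixed-point-free involution)."""
    return tuple(1 - a for a in x)

def correlation(x, y):
    """Strong correlation function built by formula (7):
    A(x, y) = S(x, y) - S(x, N(y))."""
    return simple_matching(x, y) - simple_matching(x, negation(y))
```

For example, correlation(x, x) = 1 since S(x, x) = 1 and S(x, N(x)) = 0, and correlation(x, N(x)) = −1, matching the reflexivity and inverse relationship properties.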

More general methods of constructing invertible correlation functions (association measures) have been proposed in [7, 9, 10]. Instead of the difference operation in (7), these methods use pseudo-difference operations, and instead of consistent similarity functions they can use similarity functions satisfying weaker conditions.

5 Conclusion and Future Directions of Research

This work presents a short and updated version of a part of the author’s lecture at the RAAI Summer School 2019. It can be considered complementary to [9]. It includes new methods of constructing correlation functions recently presented at INES 2019 [10] and contains more examples of (dis)similarity and correlation functions illustrating these methods. As future work, we plan to extend the developed approach to other types of association and relationship measures and to other domains.