1 Introduction

A quantum object set (QOS), see for example reference [1], can be constructed by the Cartesian product: \(Q=M \times P\) of a well-defined submicroscopic object set: \(M=\{m_I | I=1,N\}\) and a tag set made by quantum density functions (DF): \(P=\{\rho _I |I=1,N\}\).

Then, a quantum polyhedron (QP) can be defined in some functional vector space as a geometrical construct whose vertices are the elements of the tag set \(P\) of some QOS. The cardinality \(N\) of the QOS coincides with the number of vertices of the QP.

Recently, collective Euclidian distances have been studied [2, 3] in connection with the theoretical foundations of quantum similarity. Also, recent studies of the similarity relations between the vertices of a QP make it possible to uniformize any such DF set, acting as QP vertices, via an Origin Shift (OS) with respect to any vertex or to any convex linear combination of them [4]. Several publications describe the general features and properties of such an operation [5, 6].

On the other hand, preliminary studies [8, 9] have been conducted towards the application of the QP OS, which has proved useful for setting up an efficient quantum QSPR (QQSPR) procedure [10]. Some interesting properties of the QP OS have been obtained and recently published [7].

Based on the above definitions and the previous literature, the aim of the present work is the development of another set of characteristic properties related to QP and their connection with straightforward statistical algorithms.

The main result of the present discussion can be summarized as a surprisingly simple definition of a global dissimilarity index, related to a collective distance feature existing within a QP. A similar question demanded attention and was initially discussed in several previous papers [11–13], but from a completely different perspective than the one proposed in the present study.

Owing to the above proposals, QP computational structures will first be studied over origin shifted DF vertices, then extended over shape functions, and finally extended to classical multimolecular polyhedra (MP) defined by N-dimensional descriptor vectors.

2 Quantum polyhedra characteristic functions

2.1 Centroid

The simplest convex linear combination of the vertices belonging to any QP is the centroid, which can be directly expressed as:

$$\begin{aligned} {\rho _{C}} =N^{-1}\sum _I {\rho _{I}}. \end{aligned}$$
(1)

Such a trivial definition acts as an arithmetic mean of the DF set forming the QP. As mentioned before, the centroid has been employed, among other applications, to origin-shift QP for QQSPR purposes [10].

One must note now that the primary mathematical characteristic of the DF tag set elements, acting as QP vertices, is essentially their positive definite nature. The centroid definition (1) can be taken as a convex linear combination of the QP vertices, where all the coefficients are equal to the inverse of the number of vertices. Therefore, the centroid function inherits in this way the character of being positive definite. Besides, the centroid function can be considered as the point in infinite dimensional space, for which the sum of squared Euclidian distances to all the vertices is minimal.

2.2 Variance

Now, from such an intuitive construction of the QP centroid, it is easy to design the function equivalent to the variance attached to the elements of the QP, by means of the expression:

$$\begin{aligned} \upsilon _C =N^{-1}\sum _I {\left( {\rho _I -\rho _C } \right) ^{2}} =\rho _C^{(2)} -\rho _C^2, \end{aligned}$$
(2)

where the function,

$$\begin{aligned} \rho _{C}^{\left( 2 \right) } =N^{-1}\sum _{I} {\rho _{I}^{2}} \end{aligned}$$

represents the DF set squared elements centroid.
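As a minimal numerical sketch of Eqs. (1) and (2), the DF can be replaced by toy one-dimensional Gaussian functions sampled on a grid. The Gaussian form, the grid, the parameter values and the helper names `gaussian`, `centroid` and `variance` are illustrative assumptions and do not belong to the original formalism.

```python
import math

def gaussian(x, center, particles):
    """Toy 1D 'density': a normalized Gaussian scaled by a particle number."""
    return particles * math.exp(-(x - center) ** 2) / math.sqrt(math.pi)

# Hypothetical grid and three toy density functions (assumptions).
grid = [-6.0 + 0.05 * k for k in range(241)]
params = [(-1.0, 2.0), (0.0, 4.0), (1.5, 6.0)]  # (center, particle number)
dfs = [[gaussian(x, c, n) for x in grid] for c, n in params]

def centroid(functions):
    """Pointwise arithmetic mean of the sampled functions, Eq. (1)."""
    n = len(functions)
    return [sum(values) / n for values in zip(*functions)]

def variance(functions):
    """Pointwise mean of the squared deviations from the centroid, Eq. (2)."""
    n = len(functions)
    rho_c = centroid(functions)
    return [sum((f[k] - rho_c[k]) ** 2 for f in functions) / n
            for k in range(len(rho_c))]

rho_c = centroid(dfs)
ups_c = variance(dfs)

# Right-hand side of Eq. (2): centroid of the squares minus squared centroid.
rho_c2 = centroid([[v * v for v in f] for f in dfs])
alt = [a - b * b for a, b in zip(rho_c2, rho_c)]
```

Within rounding error, both ways of evaluating the variance function agree at every grid point, and the sampled variance is nowhere negative.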

2.3 Some remarks

The expression (2) above coincides with the well-known algorithm attached to the variance of a set of discrete scalar values, except that in this QP case, the involved elements in the variance computation are themselves DF. Thus, defined in this way, the QP variance also appears as a positive definite function.

As will be discussed later on, the centroid and variance functions defined here ultimately produce numerical values that are obviously related to the usual mean and variance of a set of values of a random variable, as defined in statistics. They must not be confused with them, though, because in the present paper the source elements are functions.

While the centroid can be associated to a central point within the QP, thus to some kind of mean value, the numerical QP variance described below acquires the general definition of some squared Euclidian distance involving an indefinite number of QO.

Both the QP centroid and variance functions depend on the same number of variables as the DF forming the QP vertex set. They belong in this way to the same functional vector space subtended by the DF set.

3 Numerical QP arithmetic mean and variance

Having defined the pair of QP characteristic functions \(\{ \rho _C; \upsilon _C\}\) as above, it also seems interesting to know the positive scalars provided by the Minkowski norms of both.

3.1 Centroid Minkowski norm as an arithmetic mean

That is, on the one hand, the Minkowski norm of the centroid DF yields:

$$\begin{aligned} \left\langle {\rho _C } \right\rangle =N^{-1}\sum _I {\left\langle {\rho _I}\right\rangle } =N^{-1}\sum _I {\nu _I } =\nu _C, \end{aligned}$$
(3)

provided that the number of particles of every QO is described by the number set: \(\{\nu _I|I=1,N\}\). Thus, \(\nu _C\) is the average number of particles involved in the DF of the QO’s contained in the QOS, associated in turn with the QP vertices.

When the QO’s are molecules, then \(\nu _C\) corresponds to the average number of electrons of all the involved structures attached to the DF set of vertices of the QP.
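Equation (3) can be illustrated with a short sketch in which the Minkowski norm is approximated by a Riemann sum over a grid. The toy Gaussian densities, the particle numbers and the names `density` and `minkowski_norm` are hypothetical choices made only for this example.

```python
import math

def density(x, center, particles):
    """Toy 1D density integrating to the given particle number."""
    return particles * math.exp(-(x - center) ** 2) / math.sqrt(math.pi)

def minkowski_norm(values, step):
    """Riemann-sum approximation of the integral of a sampled function."""
    return step * sum(values)

step = 0.01
grid = [-10.0 + step * k for k in range(2001)]
particles = [2.0, 10.0, 18.0]   # hypothetical particle numbers nu_I
centers = [-1.0, 0.0, 2.0]      # hypothetical positions
dfs = [[density(x, c, n) for x in grid] for c, n in zip(centers, particles)]

# Centroid function and its Minkowski norm, Eq. (3).
rho_c = [sum(values) / len(dfs) for values in zip(*dfs)]
nu_c = minkowski_norm(rho_c, step)
nu_mean = sum(particles) / len(particles)
```

Up to the quadrature error, `nu_c` reproduces the arithmetic mean `nu_mean` of the particle numbers.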

3.2 Minkowski norm of the variance as a collective QP index

On the other hand, the Minkowski norm of the QP variance provides the sequence of equalities:

$$\begin{aligned} \left\langle {\upsilon _C}\right\rangle&= \left\langle {\rho _C^{\left( 2 \right) }}\right\rangle -\left\langle {\rho _C^2} \right\rangle \nonumber \\&= N^{-1}\sum _I {\left\langle {\rho _I^2 } \right\rangle } -N^{-2}\left\langle {\left( {\sum _I {\rho _I } } \right) ^{2}} \right\rangle \nonumber \\&= N^{-1}\sum _I {\left\langle {\rho _I \rho _I } \right\rangle } -N^{-2}\sum _I {\sum _J {\left\langle {\rho _I \rho _J } \right\rangle }}\nonumber \\&= N^{-1}\sum _I {Z_{II} -} N^{-2}\sum _I {\sum _J {Z_{IJ} } }\nonumber \\&= N^{-1}Tr\left( \mathbf{Z} \right) -N^{-2}\left\langle \mathbf{Z} \right\rangle \end{aligned}$$
(4)

where use has been made of the similarity matrix \(\mathbf{Z}\) definition, see for example [14–16], attached to the DF vertex set forming the QP:

$$\begin{aligned} \mathbf{Z}=\left\{ {Z_{IJ} \left| {I,J=1,N} \right. } \right\} \leftarrow \forall I,J:Z_{IJ} =\left\langle {\rho _I \rho _J } \right\rangle =\int _D {\rho _I\left( \mathbf{r} \right) \rho _J } \left( \mathbf{r} \right) d\mathbf{r}. \end{aligned}$$

Moreover, as is well known, one can write:

$$\begin{aligned} Tr\left( \mathbf{Z} \right) =\sum _I {Z_{II} =\left\langle {Diag\left( \mathbf{Z} \right) } \right\rangle } . \end{aligned}$$

Then, \(\langle {\upsilon _C}\rangle \), the numerical variance of the QP, corresponds to the arithmetic mean of the QOS self-similarities minus the arithmetic mean of all the elements of the similarity matrix. As the variance function (2) is a mean of squares, its integral cannot be negative and one can certainly write: \(\langle {\upsilon _C } \rangle \ge 0\).
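The final form of Eq. (4) translates into a few lines of code. The following is only a sketch: the function names and the numerical similarity matrix are hypothetical, the latter chosen symmetric and diagonally dominant so that it can act as a valid similarity matrix.

```python
def trace(matrix):
    """Sum of the diagonal elements, Tr(Z)."""
    return sum(matrix[i][i] for i in range(len(matrix)))

def total_sum(matrix):
    """Complete sum of the matrix elements, <Z>."""
    return sum(sum(row) for row in matrix)

def numerical_variance(z):
    """Eq. (4): mean self-similarity minus mean of all matrix elements."""
    n = len(z)
    return trace(z) / n - total_sum(z) / n ** 2

# Hypothetical symmetric similarity matrix for three quantum objects.
z = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.8],
     [0.5, 0.8, 5.0]]
variance_z = numerical_variance(z)

# Three identical objects: every similarity integral takes the same value.
identical = [[2.0, 2.0, 2.0] for _ in range(3)]
variance_identical = numerical_variance(identical)
```

For the identical-object matrix both means coincide, so the index vanishes, as expected for a dissimilarity measure.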

The magnitude of this QP numerical variance indicates the overall similarity or dissimilarity among the whole set of QP vertices. In fact, one can also define \(Offdiag(\mathbf{Z} )=\mathbf{Z}-Diag( \mathbf{Z})\), that is, the original similarity matrix provided with a zero diagonal. Also, owing to the symmetric nature of the similarity matrix, the following algorithm can be used:

$$\begin{aligned} \left\langle {Offdiag\left( \mathbf{Z} \right) } \right\rangle =2\sum _I {\sum _{J>I} {Z_{IJ}}}. \end{aligned}$$

Therefore, the QP numerical variance can be also written as:

$$\begin{aligned} \left\langle {\upsilon _C } \right\rangle&= N^{-1}\left\langle {Diag\left( \mathbf{Z}\right) } \right\rangle -N^{-2}\left\langle {Offdiag\left( \mathbf{Z} \right) +Diag\left( \mathbf{Z} \right) } \right\rangle \\&= N^{-1}\left( {1-N^{-1}} \right) \left\langle {Diag\left( \mathbf{Z} \right) } \right\rangle -N^{-2}\left\langle {Offdiag\left( \mathbf{Z} \right) } \right\rangle \end{aligned}$$

From this result one can argue that the numerical value of the QP variance measures the balance between self-similarities: \(\{Z_{II}\}\) and pair similarities: \(\{Z_{IJ} | I\ne J\}\), associated with the different vertices of the QP.

Besides, since the numerical QP variance must be non-negative, a relationship that must be fulfilled by any similarity matrix can be written as:

$$\begin{aligned} \left( {1-N^{-1}} \right) \left\langle {Diag\left( \mathbf{Z} \right) } \right\rangle&\ge N^{-1}\left\langle {Offdiag\left( \mathbf{Z} \right) } \right\rangle \\ \rightarrow \left( {N-1} \right) \left\langle {Diag\left( \mathbf{Z} \right) } \right\rangle&\ge \left\langle {Offdiag\left( \mathbf{Z} \right) } \right\rangle \end{aligned}$$

A value of \(\langle {\upsilon _C } \rangle \) near zero indicates a large similarity among all the QP vertex pairs, while the larger the variance, the greater the dissimilarity between the vertices, that is, between the QO DF forming the QP.
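The balance between diagonal and off-diagonal contributions, together with the resulting inequality, can be checked numerically. The sketch below assumes a hypothetical symmetric similarity matrix; the helper names are illustrative only.

```python
def diag_sum(z):
    """<Diag(Z)>: sum of the self-similarities."""
    return sum(z[i][i] for i in range(len(z)))

def offdiag_sum(z):
    """<Offdiag(Z)>: by symmetry, twice the strict upper triangle sum."""
    n = len(z)
    return 2.0 * sum(z[i][j] for i in range(n) for j in range(i + 1, n))

def variance_from_balance(z):
    """Variance written as a balance of self- and pair similarities."""
    n = len(z)
    return (1.0 - 1.0 / n) * diag_sum(z) / n - offdiag_sum(z) / n ** 2

def satisfies_bound(z):
    """Checks (N - 1) <Diag(Z)> >= <Offdiag(Z)>."""
    return (len(z) - 1) * diag_sum(z) >= offdiag_sum(z)

# Hypothetical symmetric similarity matrix for three quantum objects.
z = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.8],
     [0.5, 0.8, 5.0]]
balance = variance_from_balance(z)
bound_ok = satisfies_bound(z)
```

The value of `balance` coincides with the trace-based expression of Eq. (4) evaluated on the same matrix.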

4 Examples

It is worthwhile to discuss several simple examples in order to grasp the nature and interest of the QP numerical variance.

4.1 Two QO case

In a QP made of just two QO, the similarity matrix and the variance can be written with the general formalism:

$$\begin{aligned} \mathbf{Z}=\left( {{\begin{array}{ll} \alpha &{} \gamma \\ \gamma &{} \beta \\ \end{array}}} \right) \rightarrow \left\langle {\upsilon _C } \right\rangle =\frac{1}{2}\left[ {\frac{1}{2}\left( {\alpha +\beta } \right) -\gamma } \right] \leftarrow N=2 \end{aligned}$$

It is obvious that a null variance value occurs when both involved QO are exactly the same, because in this case: \(\alpha =\beta =\gamma \). More generally, in this simple case the variance vanishes whenever: \(\alpha +\beta =2\gamma \). Also, a positive value of the variance is associated with the fact that the arithmetic average of the self-similarities is greater than the similarity between both QO, that is, when the inequality: \(\frac{1}{2}(\alpha +\beta )>\gamma \), holds.

In general, this also suggests a new distance-like similarity index between any pair of QO: \(\{m_I ,m_J\}\), say, as one can write in this circumstance the following expression:

$$\begin{aligned} \left\langle {\upsilon _{IJ} } \right\rangle =\frac{1}{2}\left[ {\frac{1}{2}\left( {Z_{II} +Z_{JJ} } \right) -Z_{IJ} } \right] . \end{aligned}$$

Written as above, this index can be compared with the squared Euclidian Distance between a pair of QO, which could be written as:

$$\begin{aligned} D_{IJ}^2 =Z_{II} +Z_{JJ} -2Z_{IJ}. \end{aligned}$$

Therefore, comparing both expressions it might be also written:

$$\begin{aligned} \left\langle {\upsilon _{IJ} } \right\rangle =\frac{1}{4}D_{IJ}^2 . \end{aligned}$$

As a result, the generalized QP numerical variance becomes a trivially scaled squared Euclidian distance when only two QO are considered.
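The two-object relation can be verified with a small sketch in which two hypothetical discretized functions play the role of the DF and plain dot products replace the similarity integrals; all names and values are assumptions made for illustration.

```python
def dot(a, b):
    """Discrete stand-in for a similarity integral."""
    return sum(x * y for x, y in zip(a, b))

def pair_variance(z_ii, z_jj, z_ij):
    """Two-object numerical variance."""
    return 0.5 * (0.5 * (z_ii + z_jj) - z_ij)

def squared_distance(z_ii, z_jj, z_ij):
    """Squared Euclidian distance from similarity measures."""
    return z_ii + z_jj - 2.0 * z_ij

# Two hypothetical sampled functions.
rho_i = [0.0, 1.0, 2.0, 1.0]
rho_j = [1.0, 2.0, 1.0, 0.0]
z_ii, z_jj, z_ij = dot(rho_i, rho_i), dot(rho_j, rho_j), dot(rho_i, rho_j)

index = pair_variance(z_ii, z_jj, z_ij)
distance2 = squared_distance(z_ii, z_jj, z_ij)
```

Here `z_ii = z_jj = 6` and `z_ij = 4`, so `distance2` equals 4 and `index` equals 1, one quarter of the squared distance.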

4.2 Three QO case

After the previous result for two QO, one must take into account that the numerical variance of a QP structure can be extended to any number of QO. For instance, a set of three QO will have the following numerical variance:

$$\begin{aligned} \left\langle {\upsilon _{IJK}}\right\rangle =\frac{2}{9} \left[ \left( Z_{II} +Z_{JJ} +Z_{KK}\right) -\left( Z_{IJ} +Z_{IK} +Z_{JK} \right) \right] , \end{aligned}$$

owing to the symmetric structure of the similarity matrix Z.
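The factor \(\frac{2}{9}\) follows directly from Eq. (4) with \(N=3\). As a brief worked step, using only the definitions given so far, one can write:

$$\begin{aligned} \left\langle {\upsilon _{IJK}}\right\rangle&= \frac{1}{3}\left( Z_{II} +Z_{JJ} +Z_{KK}\right) -\frac{1}{9}\left[ \left( Z_{II} +Z_{JJ} +Z_{KK}\right) +2\left( Z_{IJ} +Z_{IK} +Z_{JK} \right) \right] \\&= \frac{2}{9}\left( Z_{II} +Z_{JJ} +Z_{KK}\right) -\frac{2}{9}\left( Z_{IJ} +Z_{IK} +Z_{JK} \right) , \end{aligned}$$

where the factor 2 multiplying the pair similarities arises because every off-diagonal element appears twice in the complete sum of the symmetric matrix.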

To have another particular point of view on how the multiple distance QP index will behave, just suppose that the three QO are the same, then all the similarity integrals will become equal and obviously \(\langle \upsilon _{IJK} \rangle =0\).

Moreover, imagine now that only two QO are the same, for instance:

$$\begin{aligned} m_I =m_J \rightarrow Z_{II} =Z_{JJ} =Z_{IJ} \wedge Z_{IK} =Z_{JK} \end{aligned}$$

then

$$\begin{aligned} \left\langle {\upsilon _{IIK} } \right\rangle&= \frac{2}{9}\left[ {\left( {2Z_{II} +Z_{KK} } \right) -\left( {Z_{II} +2Z_{IK} } \right) } \right] \\&= \frac{2}{9}\left[ {\left( {Z_{II} +Z_{KK} } \right) -2Z_{IK} } \right] =\frac{2}{9}D_{IK}^2 =\frac{2}{9}D_{JK}^2 \end{aligned}$$

providing a result coherent with the fact that only two different QO are present, and obtaining an expression associated to the previous two QO case, where the numerical variance corresponds to a scaled Euclidian distance.

4.3 Conclusion

In light of the previous simple examples, it must be stressed again that such an index, defined over a similarity matrix \(\mathbf{Z}\) attached to the tag set of a QOS forming a QP, by means of the difference of two mean values involving the similarity matrix elements:

$$\begin{aligned} \left\langle {\upsilon _C } \right\rangle =N^{-1}Tr\left( \mathbf{Z} \right) -N^{-2}\left\langle \mathbf{Z} \right\rangle , \end{aligned}$$
(5)

corresponds to a generalized squared Euclidian collective distance between any arbitrarily large number of QO.

5 Origin Shifted (OS) DF set and variance

The QP variance function can also be constructed in the following manner. In any QP, an OS DF set can be constructed in the following way [4, 5]:

$$\begin{aligned} \forall I:\xi _I=\rho _I -\rho _C \rightarrow S=\left\{ {\xi _I \left| {I=1,N} \right. } \right\} . \end{aligned}$$

Therefore, the QP variance can be now easily rewritten as the mean function of the squares of the OS DF:

$$\begin{aligned} \upsilon _C =N^{-1}\sum _I {\left| {\xi _I } \right| } ^{2} \end{aligned}$$

and the previously performed numerical analysis can be effortlessly described as:

$$\begin{aligned} \left\langle {\upsilon _C } \right\rangle =N^{-1}\sum _I {\left\langle {\left| {\xi _I } \right| ^{2}} \right\rangle } . \end{aligned}$$

Besides that, the Minkowski norms of the squared OS DF, which are averaged in the previous expression, can be interpreted as the squared Euclidian norms of the elements of the OS DF set \(S\), as one can write:

$$\begin{aligned} \forall I:\left\langle {\left| {\xi _I } \right| ^{2}} \right\rangle =\int _D {\left| {\xi _I \left( \mathbf{r} \right) } \right| ^{2}} d\mathbf{r}=\left\langle {\xi _I } | {\xi _I } \right\rangle . \end{aligned}$$

However, taking into account the definition of the OS DF set, the resultant final expression can be transformed into the already deduced one.
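The equivalence between the origin shifted formulation and the similarity matrix expression of Eq. (5) can be checked on discretized functions. In this sketch the three sampled DF are hypothetical and dot products stand in for the integrals.

```python
def dot(a, b):
    """Discrete stand-in for an overlap integral."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical discretized density functions, one list per QP vertex.
dfs = [[0.0, 1.0, 2.0, 1.0],
       [1.0, 2.0, 1.0, 0.0],
       [0.5, 0.5, 3.0, 0.0]]
n = len(dfs)

# Centroid and origin shifted set: xi_I = rho_I - rho_C.
rho_c = [sum(values) / n for values in zip(*dfs)]
shifted = [[v - c for v, c in zip(f, rho_c)] for f in dfs]

# Mean of the squared norms <xi_I|xi_I> of the shifted functions.
variance_os = sum(dot(xi, xi) for xi in shifted) / n

# Same quantity obtained from the similarity matrix, Eq. (5).
z = [[dot(a, b) for b in dfs] for a in dfs]
variance_z = sum(z[i][i] for i in range(n)) / n - sum(map(sum, z)) / n ** 2
```

Both routes give the same number within rounding error, and the shifted functions add up to the null function at every sampled point.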

Contrary to the non-negative definite structure of the DF set, the OS DF elements possess a non-definite structure, but obviously enough their squared moduli behave as a set of non-negative functions.

The role of the OS DF set \(S\) is important in the definition of an efficient algorithm for QQSPR purposes [10].

6 Positive definite weighted variance

When the variance is integrated in order to obtain a collective distance index involving the whole set of QP vertices, as previously commented, its definition can be generalized by just weighting the involved integrals with a positive definite operator, see for example [14–16]. This can be easily done, for instance, by using the generalized definition of the involved integrals with some positive definite operator \(\Omega (\mathbf{r}_1 ,\mathbf{r}_2 )\):

$$\begin{aligned} \forall I:\left\langle {\left| {\xi _I } \right| ^{2}} \right\rangle \rightarrow \left\langle {\Omega \left| {\xi _I } \right| ^{2}} \right\rangle \equiv \int _D {\int _D {\xi _I \left( {\mathbf{r}_1 } \right) } } \Omega \left( {\mathbf{r}_1 ,\mathbf{r}_2 } \right) \xi _I \left( {\mathbf{r}_2 } \right) d\mathbf{r}_1 d\mathbf{r}_2. \end{aligned}$$

Using this generalized definition, the variance of a QP, and thus the generalized distance between its vertices, is easily set up.
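A discrete analogue of the weighted definition can be sketched by replacing the operator with a positive definite matrix over grid points. The Gaussian kernel used below is only one possible assumption for \(\Omega\); the sampled functions and all names are hypothetical, and grid integration weights are ignored for simplicity.

```python
import math

def weighted_similarity(f, g, omega):
    """Discrete analogue of the double integral of f(r1) Omega(r1, r2) g(r2)."""
    return sum(f[a] * omega[a][b] * g[b]
               for a in range(len(f)) for b in range(len(g)))

def weighted_variance(functions, omega):
    """Numerical variance built from operator-weighted similarity measures."""
    n = len(functions)
    z = [[weighted_similarity(f, g, omega) for g in functions]
         for f in functions]
    return sum(z[i][i] for i in range(n)) / n - sum(map(sum, z)) / n ** 2

points = [0.0, 1.0, 2.0, 3.0]
dfs = [[0.0, 1.0, 2.0, 1.0],
       [1.0, 2.0, 1.0, 0.0],
       [0.5, 0.5, 3.0, 0.0]]

# Identity weight: recovers the unweighted overlap-like variance.
identity = [[1.0 if a == b else 0.0 for b in range(4)] for a in range(4)]
# Gaussian kernel on distinct points: a positive definite weight (assumption).
gauss = [[math.exp(-(p - q) ** 2) for q in points] for p in points]

plain = weighted_variance(dfs, identity)
smoothed = weighted_variance(dfs, gauss)
```

Because the weight is positive definite, the weighted variance stays non-negative, so it can still be read as a collective squared distance.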

7 Shape functions and QP variance

Another question might be the construction of a Shape QP (SQP). It can be easily made starting with a set of shape functions (ShF) as vertices, substituting the QP DF set of vertices.

The set of SQP vertices can now be described as the ShF set: \(\Sigma =\{\sigma _I | I=1,N\}\). The ShF set can be straightforwardly obtained taking into account the number of particles attached to every DF: \(\{\nu _I |I=1,N\}\) as defined in Eq. (3). Then, one can easily write the ShF set as the set of functions each having a unit Minkowski norm. Such a set can be obtained from the DF tag set: \(P=\{\rho _I | I=1,N \}\), defined at the beginning, in the following way:

$$\begin{aligned} \forall I:\sigma _I =\nu _I^{-1} \rho _I \rightarrow \left\langle {\sigma _I} \right\rangle =\nu _I^{-1} \left\langle {\rho _I } \right\rangle =\nu _I^{-1} \nu _I =1 \end{aligned}$$

Obviously enough, the centroid of the SQP will be defined in turn as the shape function:

$$\begin{aligned} \sigma _C =N^{-1}\sum _I {\sigma _I } \rightarrow \left\langle {\sigma _C } \right\rangle =N^{-1}\sum _I {\left\langle {\sigma _I } \right\rangle } =1. \end{aligned}$$

One may be interested in how different the QP and SQP centroid functions are. For instance, the cosine of the angle between them can be easily computed: provided that an \(N\)-dimensional (ND) column vector with the inverses of the number of particles, \(| \mathbf{v} \rangle =\{ {\nu _I^{-1} } \}\), is defined, the squared cosine between both centroids can be written as:

$$\begin{aligned} r^{2}=\frac{\left\langle {\rho _C \sigma _C } \right\rangle ^{2}}{\left\langle {\rho _C \rho _C } \right\rangle \left\langle {\sigma _C \sigma _C } \right\rangle }=\frac{\left( {{\sum \nolimits _I} {{\sum \nolimits _J} {\nu _J^{-1} Z_{IJ} } } } \right) ^{2}}{\left( {{\sum \nolimits _I} {{\sum \nolimits _J} {Z_{IJ} } } } \right) \left( {{\sum \nolimits _I} {{\sum \nolimits _J}{\nu _I^{-1} \nu _J^{-1} Z_{IJ} } } } \right) }=\frac{\left\langle {\mathbf{Z}\left| \mathbf{v} \right\rangle } \right\rangle ^{2}}{\left\langle \mathbf{Z} \right\rangle \left\langle \mathbf{v} \right| \mathbf{Z}\left| \mathbf{v} \right\rangle }. \end{aligned}$$

From this result it is interesting to note that, as: \(r^{2}\le 1\); then, the following inequality will always hold:

$$\begin{aligned} \left\langle {\mathbf{Z}\left| \mathbf{v} \right\rangle } \right\rangle ^{2}\le \left\langle \mathbf{Z} \right\rangle \left\langle \mathbf{v} \right| \mathbf{Z}\left| \mathbf{v} \right\rangle . \end{aligned}$$

A squared Euclidian distance between both centroids can be also written as:

$$\begin{aligned} D^{2}=\left\langle {\rho _C \rho _C } \right\rangle +\left\langle {\sigma _C \sigma _C } \right\rangle -2\left\langle {\rho _C \sigma _C } \right\rangle =N^{-2}\left( {\left\langle \mathbf{Z} \right\rangle +\left\langle \mathbf{v} \right| \mathbf{Z}\left| \mathbf{v} \right\rangle -2\left\langle {\mathbf{Z}\left| \mathbf{v} \right\rangle } \right\rangle } \right) \end{aligned}$$

which provides another relationship which holds for any similarity matrix:

$$\begin{aligned} \left\langle \mathbf{Z} \right\rangle +\left\langle \mathbf{v} \right| \mathbf{Z}\left| \mathbf{v} \right\rangle \ge 2\left\langle {\mathbf{Z}\left| \mathbf{v} \right\rangle } \right\rangle . \end{aligned}$$
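Both the squared cosine and the squared distance between the two centroids only need the similarity matrix and the particle numbers. The sketch below assumes hypothetical sampled DF, with dot products as similarity integrals and plain sums as Minkowski norms; the function name `centroid_comparison` is illustrative.

```python
def dot(a, b):
    """Discrete stand-in for a similarity integral."""
    return sum(x * y for x, y in zip(a, b))

def centroid_comparison(z, nu):
    """Squared cosine and squared distance between DF and shape centroids."""
    n = len(z)
    v = [1.0 / x for x in nu]                     # |v> = {1 / nu_I}
    sum_z = sum(map(sum, z))                      # <Z>
    z_v = sum(z[i][j] * v[j] for i in range(n) for j in range(n))
    v_z_v = sum(v[i] * z[i][j] * v[j] for i in range(n) for j in range(n))
    r2 = z_v ** 2 / (sum_z * v_z_v)
    d2 = (sum_z + v_z_v - 2.0 * z_v) / n ** 2
    return r2, d2

# Hypothetical sampled DF; nu_I is taken as the plain sum of each list.
dfs = [[0.0, 1.0, 2.0, 1.0],
       [2.0, 4.0, 2.0, 0.0],
       [1.0, 0.0, 3.0, 0.0]]
nu = [sum(f) for f in dfs]
z = [[dot(a, b) for b in dfs] for a in dfs]
r2, d2 = centroid_comparison(z, nu)
```

The returned `r2` never exceeds one and `d2` is never negative, in agreement with the two inequalities above.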

The associated variance function of any SQP can be written now in this context as:

$$\begin{aligned} \upsilon _S =N^{-1}\sum _I {\left( {\sigma _I -\sigma _C } \right) ^{2}} =N^{-1}\sum _I {\sigma _I^2 } -\sigma _C^2 =\sigma _C^{\left( 2 \right) } -\sigma _C^2 \end{aligned}$$
(6)

where: \(\sigma _C^{(2)} =N^{-1}{\sum \nolimits _I} {\sigma _I^2 }\).

The numerical SQP variance is easily written as the former DF one was, just obtaining the Minkowski norm of Eq. (6):

$$\begin{aligned} \left\langle {\upsilon _S } \right\rangle =N^{-1}\sum _I {\left\langle {\sigma _I^2 } \right\rangle } -\left\langle {\sigma _C^2 } \right\rangle =N^{-1}\sum _I {S_{II} } -N^{-2}\sum _I {\sum _J {S_{IJ}}} \end{aligned}$$
(7)

where use is now made of the definition of the shape similarity matrix \(\mathbf{S}=\{S_{IJ}\}\):

$$\begin{aligned} \forall I,J:S_{IJ} =\left\langle {\sigma _I \sigma _J } \right\rangle =\int _D {\sigma _I \left( \mathbf{r} \right) \sigma _J } \left( \mathbf{r} \right) d\mathbf{r}, \end{aligned}$$

which can also be written as the matrix inward product [16]:

$$\begin{aligned} \mathbf{S}=\left| \mathbf{v} \right\rangle \left\langle \mathbf{v} \right| {*}\mathbf{Z}. \end{aligned}$$

Within such a definition one constructs first the tensor product:

$$\begin{aligned} \left| \mathbf{v} \right\rangle \left\langle \mathbf{v} \right| =\left\{ {\left[ {\left| \mathbf{v} \right\rangle \left\langle \mathbf{v} \right| } \right] _{IJ} =\nu _I^{-1} \nu _J^{-1} } \right\} , \end{aligned}$$

then the inward product is simply performed as the products of the elements of the involved matrices, in such a way that:

$$\begin{aligned} \forall I,J:S_{IJ} =\left[ {\left| \mathbf{v} \right\rangle \left\langle \mathbf{v} \right| } \right] _{IJ} Z_{IJ} =\nu _I^{-1} \nu _J^{-1} Z_{IJ} . \end{aligned}$$

Therefore, the result of Eq. (7) can also be written compactly as:

$$\begin{aligned} \left\langle {\upsilon _S } \right\rangle =N^{-1}Tr\left( \mathbf{S} \right) -N^{-2}\left\langle \mathbf{S} \right\rangle \!, \end{aligned}$$
(8)

which constitutes a result equivalent to the one obtained in the QP case. This equivalence becomes self-evident when comparing Eq. (8) above with the formerly described Eq. (5).
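The construction of the shape similarity matrix by an inward product, followed by the numerical SQP variance, can be sketched as follows. The similarity matrix and the particle numbers are hypothetical values, chosen to be consistent with three small sampled functions.

```python
def inward_product(a, b):
    """Element-wise product of two matrices of equal dimensions."""
    return [[x * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def shape_similarity(z, nu):
    """S = |v><v| * Z, with |v> holding the inverse particle numbers."""
    v = [1.0 / x for x in nu]
    outer = [[vi * vj for vj in v] for vi in v]   # tensor product |v><v|
    return inward_product(outer, z)

def numerical_variance(m):
    """Mean of the diagonal minus mean of all the elements."""
    n = len(m)
    return sum(m[i][i] for i in range(n)) / n - sum(map(sum, m)) / n ** 2

# Hypothetical similarity matrix and particle numbers.
z = [[6.0, 8.0, 6.0],
     [8.0, 24.0, 8.0],
     [6.0, 8.0, 10.0]]
nu = [4.0, 8.0, 4.0]

s = shape_similarity(z, nu)
variance_shape = numerical_variance(s)
```

The same `numerical_variance` function serves for both \(\mathbf{Z}\) and \(\mathbf{S}\), which mirrors the formal equivalence of both expressions.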

The numerical squared Euclidian distance between the shape centroid and any vertex can be written as:

$$\begin{aligned} \forall I:D_{IC}^2&= \left\langle {\left( {\sigma _I -\sigma _C } \right) ^{2}} \right\rangle =\left\langle {\sigma _I^2 } \right\rangle +\left\langle {\sigma _C^2 } \right\rangle -2\left\langle {\sigma _I \sigma _C } \right\rangle \\&= S_{II} +N^{-2}\left\langle \mathbf{S} \right\rangle -2N^{-1}\left\langle {\left| {\mathbf{s}_I } \right\rangle } \right\rangle \end{aligned}$$

where \(\{|\mathbf{s}_I \rangle | I=1,N\}\) is the set of the columns of the shape similarity matrix \(\mathbf{S}\).
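The vertex to centroid distances can be evaluated from the shape similarity matrix alone. In the sketch below the cross term is obtained by expanding \(-2\langle \sigma _I \sigma _C \rangle =-2N^{-1}\sum _J S_{IJ}\), so the column sum carries a factor of two; the matrix values and the function name are hypothetical.

```python
def centroid_distances(s):
    """Squared distances from every vertex to the centroid."""
    n = len(s)
    mean_all = sum(map(sum, s)) / n ** 2          # <sigma_C sigma_C>
    return [s[i][i] + mean_all - 2.0 * sum(s[i]) / n for i in range(n)]

# Hypothetical shape similarity matrix for three vertices.
s = [[0.375, 0.25, 0.375],
     [0.25, 0.375, 0.25],
     [0.375, 0.25, 0.625]]

d2 = centroid_distances(s)
mean_d2 = sum(d2) / len(d2)
```

The arithmetic mean of these squared distances reproduces the numerical SQP variance of the same matrix, which is another way of reading the variance as a collective distance.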

8 Discrete multimolecular polyhedra or point clouds

Recently, some discussion has been devoted to Multimolecular Polyhedra (MP) [7]. They are geometrical constructs whose vertices are made not with quantum mechanical DF or the related ShF, but with discrete N-Dimensional (ND) vectors, constructed from N ordered parameters. In previous papers, see for example [1], MP were also named point clouds.

The term MP is preferable to point cloud, though, as it clearly distinguishes those polyhedra from the subject of the present paper, QP. Nevertheless, all of them can be considered point clouds, as both MP and QP are related mathematical objects.

The relationship between them can be obtained easily from the similarity matrices directly computed from the DF or ShF vertices of QP or SQP respectively. Indeed, the role of the columns of similarity matrices as projections associated with the QP functional vertex sets has been discussed earlier in several places [17, 18].

Even so, there is a large class of MP which do not need at all to have a quantum molecular origin. This is the case of MP whose vertices are just defined classically with ND vectors, whose elements are empirically and arbitrarily made with the so-called molecular descriptors, which are parameters extracted from diverse origins, even containing experimental values. In fact, any kind of MP, can be easily considered as the basis of QSPR or QSAR, see for example [8].

A classical MP can be defined by means of a set of ND vectors: \(X=\{|\mathbf{x}_I\rangle |I=1,N\}\) which act as vertices of the polyhedron. Then it is trivial to define the MP centroid vector as:

$$\begin{aligned} \left| {\mathbf{x}_C} \right\rangle =N^{-1}\sum _I {\left| {\mathbf{x}_I } \right\rangle } \end{aligned}$$

but the variance vector, which would be equivalent to Eqs. (2) and (6) in the functional QP or SQP spaces respectively, is not so trivial to define. To do so, the inward product of two vectors (for more details see for example reference [16]) shall first be defined, in a similar manner as was done before with two matrices when dealing with the SQP variance:

$$\begin{aligned} \left| \mathbf{p} \right\rangle =\left| \mathbf{a} \right\rangle {*}\left| \mathbf{b} \right\rangle \rightarrow \forall I:p_I =a_I b_I . \end{aligned}$$

It is easy to see that the inward product is symmetric. Also, when considering the inward product of vectors, an interesting characteristic is that the sum of the elements of the inward product vector coincides with the scalar product of the involved vectors:

$$\begin{aligned} \left\langle {\left| \mathbf{p} \right\rangle } \right\rangle =\sum _I {p_I } =\left\langle {\mathbf{a}{*}\mathbf{b}} \right\rangle =\sum _I {a_I b_I =\left\langle \mathbf{a} | \mathbf{b} \right\rangle } . \end{aligned}$$

With this in mind, one can write a variance vector of an MP, which will be equivalent to the variance functions formerly described in QP. It is as simple as writing the variance vector as a sum of inward products, by means of the expression:

$$\begin{aligned} \left| {\mathbf{v}_X } \right\rangle =N^{-1}\sum _I {\left( {\left| {\mathbf{x}_I } \right\rangle -\left| {\mathbf{x}_C } \right\rangle } \right) } {*}\left( {\left| {\mathbf{x}_I } \right\rangle -\left| {\mathbf{x}_C } \right\rangle } \right) =\left| {\mathbf{x}^{\left[ 2 \right] }} \right\rangle -\left| {\mathbf{x}_C } \right\rangle ^{\left[ 2 \right] }. \end{aligned}$$
(9)

Equation (9) is easily obtained taking into account that the inward product is also distributive with respect to vector addition. The following supplementary definition has also been employed:

$$\begin{aligned} \left| {\mathbf{x}^{\left[ 2 \right] }} \right\rangle = N^{-1}\sum _I {\left( {\left| {\mathbf{x}_I } \right\rangle {*}\left| {\mathbf{x}_I } \right\rangle } \right) =}\,\, N^{-1}\sum _I {\left| {\mathbf{x}_I } \right\rangle ^{\left[ 2 \right] }} \end{aligned}$$

and accordingly one can also write:

$$\begin{aligned} \left| {\mathbf{x}_C } \right\rangle ^{\left[ 2 \right] }=\left| {\mathbf{x}_C } \right\rangle {*}\left| {\mathbf{x}_C } \right\rangle . \end{aligned}$$

Moreover, within MP the numerical variance might be written as the complete sum of the elements of both terms appearing in the vector variance, as defined in Eq. (9), that is:

$$\begin{aligned} v_X&= \left\langle {\left| {\mathbf{v}_X } \right\rangle } \right\rangle =\left\langle {\left| {\mathbf{x}^{\left[ 2 \right] }} \right\rangle } \right\rangle -\left\langle {\left| {\mathbf{x}_C } \right\rangle ^{\left[ 2 \right] }} \right\rangle =N^{-1}\sum _I {\left\langle {\mathbf{x}_I } | {\mathbf{x}_I } \right\rangle } -\left\langle {\mathbf{x}_C } | {\mathbf{x}_C } \right\rangle \\&= N^{-1}\sum _I {\left\langle {\mathbf{x}_I } | {\mathbf{x}_I } \right\rangle } -N^{-2}\sum _I {\sum _J {\left\langle {\mathbf{x}_I } | {\mathbf{x}_J } \right\rangle } } . \end{aligned}$$

Defining now the Gram matrix of the ND vertex set of the MP, as the ordered scalar products of its vertices:

$$\begin{aligned} \mathbf{X}=\left\{ {X_{IJ} =\left\langle {\mathbf{x}_I } | {\mathbf{x}_J } \right\rangle } \right\} , \end{aligned}$$

then the numerical equivalent of the QP and SQP variances, as defined in Eqs. (4) and (8), which can be now constructed in classical MP structures, can be written without problems as:

$$\begin{aligned} v_X =N^{-1}Tr\left( \mathbf{X} \right) -N^{-2}\left\langle \mathbf{X} \right\rangle . \end{aligned}$$

Therefore, taking the Gram matrix of the MP vertices as a ND version of the similarity matrices in DF or ShF spaces, all the considerations developed in the two previous functional QP and SQP cases are applicable in discrete ND classical MP.
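For a classical MP the whole procedure reduces to elementary vector arithmetic. The following sketch uses four hypothetical three-dimensional descriptor vectors and compares the sum of the elements of the variance vector of Eq. (9) with the Gram matrix expression; the names are illustrative.

```python
def inward(a, b):
    """Inward product of two vectors: element-wise multiplication."""
    return [x * y for x, y in zip(a, b)]

# Hypothetical descriptor vectors acting as MP vertices.
vertices = [[1.0, 0.0, 2.0],
            [0.0, 3.0, 1.0],
            [2.0, 1.0, 0.0],
            [1.0, 2.0, 1.0]]
n = len(vertices)

# Centroid vector |x_C>.
x_c = [sum(column) / n for column in zip(*vertices)]

# Variance vector, Eq. (9): mean inward square of the centered vertices.
var_vec = [0.0] * len(x_c)
for x in vertices:
    centered = [a - c for a, c in zip(x, x_c)]
    squares = inward(centered, centered)
    var_vec = [acc + sq / n for acc, sq in zip(var_vec, squares)]

# Numerical variance from the Gram matrix X_IJ = <x_I|x_J>.
gram = [[sum(inward(a, b)) for b in vertices] for a in vertices]
v_x = sum(gram[i][i] for i in range(n)) / n - sum(map(sum, gram)) / n ** 2
```

For these vectors the variance vector is (0.5, 1.25, 0.5), whose elements add up to 2.25, the same value delivered by the Gram matrix formula.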

9 Discussion

A QP has been defined as a set of DF associated to a set of QO, taken for instance as molecules. A simple definition of the vertex variance function of a QP leads to the numerical evaluation of the QP numerical variance, which can be set up in terms of the elements associated to any similarity matrix computed using the DF set.

It has been shown that such a QP numerical variance measures a global degree of dissimilarity between the whole set of QP DF vertices. It can be considered in general as a collective squared Euclidian distance between these QP vertices.

The same concept is valid in SQP, that is: QP defined over ShF instead of DF. Finally, the coherence of the mathematical and computational procedures appears when one realizes that the developed definitions and the derived algorithms, previously described here for QP and SQP, are also easily extensible to MP, taking into account that MP are classical polyhedral structures made of ND discrete vectors, whose elements are molecular descriptors of any origin.

The whole set of results indicates that a collective distance can be in general easily described in metric vector spaces.

The QP nature of the MO components of the DF in LCAO MO theory has not been discussed here. It might be worthy of a separate discussion, which will be presented elsewhere.