1 Background

In the language of chemistry, molecules are built from atoms and functional groups [1–5]. Although the atoms and functional groups are deformed (or “promoted”) when they are combined, they nonetheless maintain their quiddity. This is why, for example, the periodic table, along with tables of atomic properties (like the electronegativity, hardness, and polarizability) [6–8], is essential to practicing chemists. This motivated the strategy first proposed by Nalewajski, Parr, and Ayers: define the electron density of an atom in a molecule to maximize its resemblance to the electron densities of the isolated reference atoms and atomic ions enshrined in the periodic table [9–12]. The electron density is chosen as the fundamental descriptor of atoms because of the Hohenberg–Kohn theorem [13, 14] and because of the pioneering work of Richard Bader [3–5, 15, 16]. The measure of “resemblance” between the atom-in-molecule’s density, \(\rho_{A} \left({\mathbf{r}} \right)\), and the isolated reference pro-atom’s density, \(\rho_{A}^{0} \left({\mathbf{r}} \right)\), was originally taken to be the Kullback–Leibler directed divergence [11],

$$I\left[{\rho_{A} \left| {\rho_{A}^{0}} \right.} \right] = \int {\rho_{A} \left({\mathbf{r}} \right)\ln \left({\frac{{\rho_{A} \left({\mathbf{r}} \right)}}{{\rho_{A}^{0} \left({\mathbf{r}} \right)}}} \right)\text{d}{\mathbf{r}}}.$$
(1)

The Kullback–Leibler divergence is just the Shannon information/entropy gain relative to a reference distribution and has been used in many chemical and biochemical contexts beyond the atomic partitioning problem [17–73]. If one minimizes the total divergence of all atoms, subject to the constraint that the sum of all the atom-in-molecule densities is equal to the total molecular density,

$$\underbrace {\min}_{{\left\{{\rho_{A} \left({\mathbf{r}} \right)\left| {\sum\nolimits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)} = \rho_{\text{mol}} \left({\mathbf{r}} \right)} \right.} \right\}}}\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {I\left[{\rho_{A} \left| {\rho_{A}^{0}} \right.} \right]},$$
(2)

then one obtains the “stockholder” partitioning that Hirshfeld proposed on heuristic grounds in 1977 [74], and upon which many state-of-the-art atomic population methods are based [75–89].
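The minimization in Eq. (2) can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical one-dimensional, two-atom model: the grid, the exponential pro-atom densities, and the "molecular" density are all invented for illustration and are not taken from any calculation discussed here. It compares the total Kullback–Leibler divergence of the stockholder partition with that of another partition satisfying the same sum constraint.

```python
import numpy as np

# Hypothetical 1D model: two "atoms" at x = -1 and x = +1 on a uniform grid.
x = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]

# Assumed pro-atom densities (exponentials chosen for illustration only).
rho0_A = np.exp(-2.0 * np.abs(x + 1.0))
rho0_B = np.exp(-2.0 * np.abs(x - 1.0))

# Assumed "molecular" density, deliberately different from rho0_A + rho0_B.
rho_mol = 0.8 * np.exp(-2.0 * np.abs(x + 1.0)) + 1.2 * np.exp(-1.8 * np.abs(x - 1.0))
n_mol = np.sum(rho_mol) * dx


def kl(rho, rho0):
    """Eq. (1), evaluated with a simple Riemann sum on the grid."""
    return np.sum(rho * np.log(rho / rho0)) * dx


def total_kl(w_A):
    """Objective of Eq. (2) for rho_A = w_A * rho_mol and rho_B = (1 - w_A) * rho_mol."""
    return kl(w_A * rho_mol, rho0_A) + kl((1.0 - w_A) * rho_mol, rho0_B)


# Stockholder weights: each atom receives its pro-molecular share at every point.
w_hirshfeld = rho0_A / (rho0_A + rho0_B)

# A competing partition that also sums to rho_mol (weights pulled toward 1/2).
w_other = 0.7 * w_hirshfeld + 0.3 * 0.5

kl_hirshfeld = total_kl(w_hirshfeld)
kl_other = total_kl(w_other)

# Atomic populations from the stockholder partition.
pop_A = np.sum(w_hirshfeld * rho_mol) * dx
pop_B = np.sum((1.0 - w_hirshfeld) * rho_mol) * dx

print(f"total KL, stockholder partition: {kl_hirshfeld:.6f}")
print(f"total KL, competing partition:   {kl_other:.6f}")
print(f"populations: A = {pop_A:.4f}, B = {pop_B:.4f}")
```

Writing each atomic density as a weight times the molecular density builds the sum constraint into the parametrization, so only the weights need to be compared.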

It was rapidly recognized that the Hirshfeld partitioning can also be obtained from more general directed and undirected divergences [12] including the non-extensive Tsallis entropy [90], generalized Hellinger-Bhattacharyya distances [91], and general f-divergences [92]. Reference [92] proves a strong result: any divergence measure that can be written as a local density functional,

$$I\left[{\rho_{A} \left| {\rho_{A}^{0}} \right.} \right] = \int {h\left({\rho_{A} \left({\mathbf{r}} \right),\rho_{A}^{0} \left({\mathbf{r}} \right)} \right)\text{d}{\mathbf{r}}}$$
(3)

and which furthermore satisfies

$$I\left[{\rho_{A} \left| {\rho_{A}^{0}} \right.} \right] > I\left[{\rho_{A}^{0} \left| {\rho_{A}^{0}} \right.} \right] = 0$$
(4)

whenever \(\rho_{A} \left({\mathbf{r}} \right) \ne \rho_{A}^{0} \left({\mathbf{r}} \right)\) for the electron densities with the same number of electrons,

$$\int {\rho_{A} \left({\mathbf{r}} \right)\text{d}{\mathbf{r}}} = N_{A} = N_{A}^{0} = \int {\rho_{A}^{0} \left({\mathbf{r}} \right)\text{d}{\mathbf{r}}},$$
(5)

and which, when minimized according to Eq. (2), gives the Hirshfeld partitioning, is necessarily an f-divergence.

The goal of this paper is to consider divergence measures that cannot be written as local functionals of the electron density. Recall that all local functionals of the electron density can be written as

$$F_{\text{local}} \left[\rho \right] = \int {f\left({\rho \left({\mathbf{r}} \right)} \right)\text{d}{\mathbf{r}}}$$
(6)

where \(f\left(x \right):{\mathbb{R}}^{+} \to {\mathbb{R}}\) is an ordinary function. Equivalently, to evaluate the functional derivative of a local functional at a point, one needs to know only the electron density at that point,

$$\frac{{\delta F_{\text{local}} \left[\rho \right]}}{{\delta \rho \left({\mathbf{r}} \right)}} = \left. {\frac{{\text{d}f\left(x \right)}}{{\text{d}x}}} \right|_{{x = \rho \left({\mathbf{r}} \right)}}$$
(7)

In this paper, we will consider non-local functionals which are functions of local functionals,

$$G_{\text{non-local}} \left[\rho \right] = g\left({F_{\text{local}}^{\left(1 \right)} \left[\rho \right],F_{\text{local}}^{\left(2 \right)} \left[\rho \right], \ldots} \right)$$
(8)

Of particular interest are divergence measures that are based on non-extensive functionals for the thermodynamic entropy. (Entropy functionals which are non-local are inherently non-extensive.) These are not obviously f-divergences, and for this reason, it is interesting to assess whether or not the Hirshfeld partitioning is recovered. Specifically, we consider the directed divergence measures associated with the Tsallis divergence [90, 93, 94],

$$I_{T}^{\alpha} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right] = \frac{1}{\alpha - 1}\left[{\int {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\left({\frac{{\rho_{A} \left({\mathbf{r}} \right)}}{{\rho_{A}^{0} \left({\mathbf{r}} \right)}}} \right)^{\alpha - 1} \text{d}{\mathbf{r}}}} - \int {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\text{d}{\mathbf{r}}}}} \right]$$
(9)

the Rényi divergence [94–96],

$$I_{R}^{\alpha} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right] = \frac{1}{\alpha - 1}\ln \left({\frac{{\int {\sum\nolimits_{A}^{{N_{\text{atoms}}}} {\left({\rho_{A} \left({\mathbf{r}} \right)\left({\frac{{\rho_{A} \left({\mathbf{r}} \right)}}{{\rho_{A}^{0} \left({\mathbf{r}} \right)}}} \right)^{\alpha - 1}} \right)\text{d}{\mathbf{r}}}}}}{{\int {\sum\nolimits_{A}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\text{d}{\mathbf{r}}}}}}} \right)$$
(10)

the Sharma–Mittal divergence [97–101],

$$I_{\text{SM}}^{\alpha,\beta} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right] = \frac{1}{\beta - 1}\left[{\left({\int {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\left({\frac{{\rho_{A} \left({\mathbf{r}} \right)}}{{\rho_{A}^{0} \left({\mathbf{r}} \right)}}} \right)^{\alpha - 1} \text{d}{\mathbf{r}}}}} \right)^{{\frac{\beta - 1}{\alpha - 1}}} - \left({\int {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\text{d}{\mathbf{r}}}}} \right)^{{\frac{\beta - 1}{\alpha - 1}}}} \right]$$
(11)

a recently proposed supraextensive divergence [102],

$$I_{\text{SE}}^{\alpha,\beta} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right] = \frac{1}{\alpha - 1}\left[\begin{aligned} \left(\begin{array}{l} \int {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\text{d}{\mathbf{r}}}} \hfill \\ + \frac{\beta - 1}{\alpha - 1}\ln \left({\frac{{\int {\sum\nolimits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\left({\frac{{\rho_{A} \left({\mathbf{r}} \right)}}{{\rho_{A}^{0} \left({\mathbf{r}} \right)}}} \right)^{\alpha - 1} \text{d}{\mathbf{r}}}}}}{{\int {\sum\nolimits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\text{d}{\mathbf{r}}}}}}} \right) \hfill \\ \end{array} \right)^{{\frac{\alpha - 1}{\beta - 1}}} \\ - \left({\int {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\text{d}{\mathbf{r}}}}} \right)^{{\frac{\alpha - 1}{\beta - 1}}} \\ \end{aligned} \right]$$
(12)

and the very general family of H-divergences [103],

$$H_{{h,\varphi_{1},\varphi_{2}}} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right] = h\left({\frac{{\int {\sum\nolimits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\varphi_{1} \left({\frac{{\rho_{A}^{0} \left({\mathbf{r}} \right)}}{{\rho_{A}^{{}} \left({\mathbf{r}} \right)}}} \right)\text{d}{\mathbf{r}}}}}}{{\int {\sum\nolimits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\varphi_{2} \left({\frac{{\rho_{A}^{0} \left({\mathbf{r}} \right)}}{{\rho_{A}^{{}} \left({\mathbf{r}} \right)}}} \right)\text{d}{\mathbf{r}}}}}}} \right)$$
(13)

These H-divergences are not valid divergence measures for every choice of the functions \(\varphi_{1}(x)\), \(\varphi_{2}(x)\), and \(h(x)\). It suffices, however, for \(\varphi_{1}(x)\) to be convex with \(\varphi_{1}(1) = 0\) (as for an f-divergence), \(\varphi_{2}(x) > 0\), and \(h(x)\) to be monotonically increasing (\(h^{\prime}(x) > 0\)) with \(h(0) = 0\).

There are further extensions (e.g., corresponding to position-dependent values, \(\alpha \left({\mathbf{r}} \right)\), for the parameter in the Tsallis divergence) [103, 104], but we will not explore those generalizations here. We also omit consideration of divergence measures that are invariant to coordinate rotations (e.g., the total Bregman divergence) [105–107]. Finally, we note that the divergence measures in Eqs. (9)–(13) are slightly different from the usual form of these divergence measures. This revision is needed because atomic electron densities are normalized to the number of electrons, while the traditional divergence measures only apply to probability distribution functions that are normalized to one.
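To make the effect of the normalization concrete, consider a single density \(\rho\) with \(N\) electrons and a single reference \(\rho^{0}\) with \(N^{0}\) electrons, and introduce the unit-normalized functions \(\sigma = \rho/N\) and \(\sigma^{0} = \rho^{0}/N^{0}\) (notation assumed here purely for illustration). Substituting into the one-density analogue of Eq. (10) gives the following sketch of how the electron-normalized form relates to the conventional Rényi divergence of probability distributions:

$$\frac{1}{\alpha - 1}\ln \left({\frac{{\int {\rho \left({\mathbf{r}} \right)\left({\frac{{\rho \left({\mathbf{r}} \right)}}{{\rho^{0} \left({\mathbf{r}} \right)}}} \right)^{\alpha - 1} \text{d}{\mathbf{r}}}}}{{\int {\rho \left({\mathbf{r}} \right)\text{d}{\mathbf{r}}}}}} \right) = \frac{1}{\alpha - 1}\ln \left({\int {\sigma \left({\mathbf{r}} \right)\left({\frac{{\sigma \left({\mathbf{r}} \right)}}{{\sigma^{0} \left({\mathbf{r}} \right)}}} \right)^{\alpha - 1} \text{d}{\mathbf{r}}}} \right) + \ln \left({\frac{N}{{N^{0}}}} \right)$$

The first term on the right-hand side is the usual Rényi divergence between distributions normalized to one, so the two forms coincide whenever \(N = N^{0}\).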

The Tsallis divergence is known to be an f-divergence and, in particular, is closely related to the special type of f-divergences called the α-divergences [92, 94, 108–111],

$$\begin{aligned} I_{f}^{\alpha} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right] = \int {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\left({\left({\frac{{\rho_{A} \left({\mathbf{r}} \right)}}{{\rho_{A}^{0} \left({\mathbf{r}} \right)}}} \right)^{\alpha - 1} - 1} \right)\text{d}{\mathbf{r}}}} \\ = \int {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\left({\frac{{\rho_{A} \left({\mathbf{r}} \right)}}{{\rho_{A}^{0} \left({\mathbf{r}} \right)}}} \right)^{\alpha - 1} \text{d}{\mathbf{r}}}} - N_{\text{mol}} \\ \end{aligned}$$
(14)

where

$$N_{\text{mol}} = \int {\rho_{\text{mol}} \left({\mathbf{r}} \right)\text{d}{\mathbf{r}}} = \int {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\text{d}{\mathbf{r}}}}$$
(15)

is the number of electrons in the molecule. For convenience, we have chosen a different normalization of the α-divergence from the usual form. While we regard Eq. (14) as merely a notational convenience, we note that in the absence of prefactors, \(I_{f}^{\alpha}\) is not a valid divergence measure for 0 ≤ α ≤ 1, because it is not convex.

Specifically, the Tsallis divergence is proportional to the α-divergence [94],

$$I_{T}^{\alpha} = \frac{{I_{f}^{\alpha}}}{\alpha - 1}.$$
(16)

Similarly, the Rényi divergence can be written as

$$I_{R}^{\alpha} = \frac{1}{\alpha - 1}\ln \left({1 + \frac{{I_{f}^{\alpha}}}{{N_{\text{mol}}}}} \right).$$
(17)

The α-divergence is also closely related to the Sharma–Mittal divergence,

$$I_{\text{SM}}^{\alpha,\beta} = \frac{{N_{\text{mol}}^{{\frac{\beta - 1}{\alpha - 1}}}}}{\beta - 1}\left[{\left({1 + \frac{{I_{f}^{\alpha}}}{{N_{\text{mol}}}}} \right)^{{\frac{\beta - 1}{\alpha - 1}}} - 1} \right],$$
(18)

and the supraextensive divergence,

$$I_{\text{SE}}^{\alpha,\beta} = \frac{{N_{\text{mol}}^{{\frac{\alpha - 1}{\beta - 1}}}}}{\alpha - 1}\left[{\left({1 + \frac{\beta - 1}{{N_{\text{mol}}^{{}} \left({\alpha - 1} \right)}}\ln \left({1 + \frac{{I_{f}^{\alpha}}}{{N_{\text{mol}}}}} \right)} \right)^{{\frac{\alpha - 1}{\beta - 1}}} - 1} \right].$$
(19)

Notice that the Rényi, Sharma–Mittal, and supraextensive divergences are functions of local functionals [cf. Eq. (8)]. They are therefore non-local density functionals, not f-divergences.
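The relations in Eqs. (16)–(19) are purely algebraic and can be checked numerically. The sketch below assumes a hypothetical one-dimensional two-atom model (grid, densities, partition weights, and the values of α and β are all arbitrary illustrative choices) and evaluates each divergence twice: once from its definition in Eqs. (9)–(12), and once through the α-divergence of Eq. (14).

```python
import numpy as np

# Hypothetical 1D two-atom model; every density below is an illustrative assumption.
x = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]
rho0 = [np.exp(-2.0 * np.abs(x + 1.0)), np.exp(-2.0 * np.abs(x - 1.0))]
rho_mol = 0.8 * np.exp(-2.0 * np.abs(x + 1.0)) + 1.2 * np.exp(-1.8 * np.abs(x - 1.0))

# Any partition obeying the sum constraint will do; the identities are algebraic.
w = 0.5 + 0.3 * np.tanh(-x)
rho = [w * rho_mol, (1.0 - w) * rho_mol]

alpha, beta = 2.0, 1.5  # arbitrary test values with alpha != 1 and beta != 1

# Building blocks: the power integral and the electron number, Eq. (15).
power = sum(np.sum(r * (r / r0) ** (alpha - 1.0)) for r, r0 in zip(rho, rho0)) * dx
n_mol = sum(np.sum(r) for r in rho) * dx
i_f = power - n_mol  # alpha-divergence, second line of Eq. (14)

k = (beta - 1.0) / (alpha - 1.0)
m = (alpha - 1.0) / (beta - 1.0)
c = (beta - 1.0) / (alpha - 1.0)

# Direct definitions, Eqs. (9)-(12).
tsallis = (power - n_mol) / (alpha - 1.0)
renyi = np.log(power / n_mol) / (alpha - 1.0)
sharma_mittal = (power ** k - n_mol ** k) / (beta - 1.0)
supraextensive = ((n_mol + c * np.log(power / n_mol)) ** m - n_mol ** m) / (alpha - 1.0)

# The same quantities expressed through I_f, Eqs. (16)-(19).
tsallis_f = i_f / (alpha - 1.0)
renyi_f = np.log1p(i_f / n_mol) / (alpha - 1.0)
sharma_mittal_f = n_mol ** k / (beta - 1.0) * ((1.0 + i_f / n_mol) ** k - 1.0)
supraextensive_f = n_mol ** m / (alpha - 1.0) * (
    (1.0 + c / n_mol * np.log1p(i_f / n_mol)) ** m - 1.0
)

for name, direct, via_f in [
    ("Tsallis", tsallis, tsallis_f),
    ("Renyi", renyi, renyi_f),
    ("Sharma-Mittal", sharma_mittal, sharma_mittal_f),
    ("supraextensive", supraextensive, supraextensive_f),
]:
    print(f"{name:15s} direct = {direct:.10f}   via I_f = {via_f:.10f}")
```

In each case the divergence depends on the atomic densities only through the two numbers `n_mol` and `i_f`, which is the structure exploited in the next section.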

The goal of this paper is to show that these more general families of divergences, which are closely related to the α-divergence, give rise to Hirshfeld atoms. This extends the results of Ref. [92] and is the first time that the Hirshfeld partitioning has been obtained from non-local divergence functionals. This gives a more general approach to Hirshfeld-inspired atomic partitioning. As Verstraelen et al. [75] have noted, this also has potential applications in computational algorithms for electronic structure theory, e.g., density fitting [112–116].

2 Non-extensive information models and Hirshfeld atoms

Suppose one is given an information loss function that has the general form,

$$I_{\text{gen}}^{\alpha} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right] = g\left({N_{\text{mol}},I_{f}^{\alpha}} \right)$$
(20)

This form clearly encompasses and generalizes the Tsallis, Rényi, Sharma–Mittal, and supraextensive divergence measures. We then determine the atoms in the molecule by the usual procedure,

$$\underbrace {\min}_{{\left\{{\rho_{A} \left({\mathbf{r}} \right)\left| {\sum\nolimits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)} = \rho_{\text{mol}} \left({\mathbf{r}} \right)} \right.} \right\}}}I_{\text{gen}}^{\alpha} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right]$$
(21)

Introducing the constraint with a Lagrange multiplier, the Lagrangian is:

$$\varLambda \left[{\left\{{\rho_{A}} \right\}} \right] = g\left({N_{\text{mol}},I_{f}^{\alpha}} \right) - \int {\lambda \left({\mathbf{r}} \right)\left({\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)} - \rho_{\text{mol}} \left({\mathbf{r}} \right)} \right)\text{d}{\mathbf{r}}}$$
(22)

and the stationary condition for the minimum is

$$0 = \frac{\delta \varLambda}{{\delta \rho_{B} \left({\mathbf{r}} \right)}} = \frac{\partial g}{{\partial N_{\text{mol}}}}\frac{{\delta N_{\text{mol}}}}{{\delta \rho_{B} \left({\mathbf{r}} \right)}} + \frac{\partial g}{{\partial I_{f}^{\alpha}}}\frac{{\delta I_{f}^{\alpha}}}{{\delta \rho_{B} \left({\mathbf{r}} \right)}} - \lambda \left({\mathbf{r}} \right)$$
(23)

where

$$\frac{{\delta N_{\text{mol}}}}{{\delta \rho_{B} \left({\mathbf{r}} \right)}} = 1$$
(24)
$$\frac{{\delta I_{f}^{\alpha}}}{{\delta \rho_{B} \left({\mathbf{r}} \right)}} = \alpha \left({\frac{{\rho_{B} \left({\mathbf{r}} \right)}}{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}} \right)^{\alpha - 1} - 1$$
(25)

The stationary condition can then be written

$$\lambda \left({\mathbf{r}} \right) - \frac{\partial g}{{\partial N_{\text{mol}}}} + \frac{\partial g}{{\partial I_{f}^{\alpha}}} = \alpha \frac{\partial g}{{\partial I_{f}^{\alpha}}}\left({\frac{{\rho_{B} \left({\mathbf{r}} \right)}}{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}} \right)^{\alpha - 1}$$
(26)

The left-hand side of this equation does not depend on the atom label B. As long as α ≠ 0 and \(\partial g/\partial I_{f}^{\alpha} \ne 0\) (with α ≠ 1, so that the divergences are defined), this identity gives the key relation from which the Hirshfeld atom is derived, namely that \(\rho_{B} \left({\mathbf{r}} \right)/\rho_{B}^{0} \left({\mathbf{r}} \right)\) is the same for all atoms. (For example, it is sufficient to have a strictly monotonic \(g\left({N_{\text{mol}},I_{f}^{\alpha}} \right)\) with respect to \(I_{f}^{\alpha} > 0\).) For the Tsallis, Rényi, and Sharma–Mittal divergences,

$$\frac{{\partial g_{T}}}{{\partial I_{f}^{\alpha}}} = \frac{1}{\alpha - 1} \ne 0$$
(27)
$$\frac{{\partial g_{R}}}{{\partial I_{f}^{\alpha}}} = \frac{1}{{\left({\alpha - 1} \right)\left({N_{\text{mol}} + I_{f}^{\alpha}} \right)}} \ne 0$$
(28)
$$\frac{{\partial g_{\text{SM}}}}{{\partial I_{f}^{\alpha}}} = \frac{{N_{\text{mol}}^{{\frac{\beta - \alpha}{\alpha - 1}}}}}{\alpha - 1}\left[{\left({1 + \frac{{I_{f}^{\alpha}}}{{N_{\text{mol}}}}} \right)^{{\frac{\beta - \alpha}{\alpha - 1}}}} \right] \ne 0$$
(29)

For the supraextensive divergence, differentiating Eq. (19) gives

$$\frac{{\partial g_{\text{SE}}}}{{\partial I_{f}^{\alpha}}} = \frac{{N_{\text{mol}}^{{\frac{\alpha - 1}{\beta - 1}}}}}{\beta - 1}\left[{\left({1 + \frac{\beta - 1}{{N_{\text{mol}} \left({\alpha - 1} \right)}}\ln \left({1 + \frac{{I_{f}^{\alpha}}}{{N_{\text{mol}}}}} \right)} \right)^{{\frac{\alpha - \beta}{\beta - 1}}}} \right]\left({\frac{\beta - 1}{{\left({\alpha - 1} \right)N_{\text{mol}} \left({N_{\text{mol}} + I_{f}^{\alpha}} \right)}}} \right)$$
(30)

Since this expression must be well defined and must not vanish, one must have β ≠ 1. In all these expressions, we have used the fact that \(I_{f}^{\alpha} \ge 0\), which presumes that the sum of the atomic densities and the sum of the reference pro-atomic densities have the same normalization.
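The conclusion of this section can be probed numerically without using the functional derivatives at all. The sketch below assumes a hypothetical one-dimensional two-atom model (all densities are illustrative choices) and scans a one-parameter family of partitions that interpolates between the Hirshfeld weights (t = 0) and an equal split (t = 1). For each partition it evaluates the Rényi divergence of Eq. (10) and records where the minimum occurs, for one value of α above 1 and one below.

```python
import numpy as np

# Hypothetical 1D two-atom model; every density below is an illustrative assumption.
x = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]
rho0_A = np.exp(-2.0 * np.abs(x + 1.0))
rho0_B = np.exp(-2.0 * np.abs(x - 1.0))
rho_mol = 0.8 * np.exp(-2.0 * np.abs(x + 1.0)) + 1.2 * np.exp(-1.8 * np.abs(x - 1.0))
n_mol = np.sum(rho_mol) * dx

w_hirshfeld = rho0_A / (rho0_A + rho0_B)


def renyi(w_A, alpha):
    """Eq. (10) for rho_A = w_A * rho_mol and rho_B = (1 - w_A) * rho_mol."""
    rho_A = w_A * rho_mol
    rho_B = (1.0 - w_A) * rho_mol
    power = (
        np.sum(rho_A * (rho_A / rho0_A) ** (alpha - 1.0))
        + np.sum(rho_B * (rho_B / rho0_B) ** (alpha - 1.0))
    ) * dx
    return np.log(power / n_mol) / (alpha - 1.0)


# Partitions interpolating between Hirshfeld (t = 0) and an equal split (t = 1).
ts = np.linspace(0.0, 1.0, 11)
best_t = {}
for alpha in (2.0, 0.5):
    values = [renyi((1.0 - t) * w_hirshfeld + t * 0.5, alpha) for t in ts]
    best_t[alpha] = ts[int(np.argmin(values))]
    print(f"alpha = {alpha}: smallest Renyi divergence at t = {best_t[alpha]:.1f}")
```

A scan along a single line of partitions is of course not a proof of optimality; it only illustrates that moving away from the Hirshfeld weights raises the divergence in this model.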

For local divergence functionals, one sometimes uses the fact that the densities of the reference pro-atoms, \(\left\{{\rho_{A}^{0} \left({\mathbf{r}} \right)} \right\}_{A = 1}^{{N_{\text{atoms}}}}\), can be optimized to make the density of the so-called pro-molecule,

$$\rho_{\text{mol}}^{0} \left({\mathbf{r}} \right) = \sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A}^{0} \left({\mathbf{r}} \right)}$$
(31)

as close as possible to the density of the molecule, \(\rho_{\text{mol}} \left({\mathbf{r}} \right)\) [75, 92]. (This can remove the ambiguity associated with picking the reference pro-atoms.) This can also be done for these measures. To see this, notice that the key Hirshfeld criterion,

$$\frac{{\rho_{B} \left({\mathbf{r}} \right)}}{{\rho_{B}^{0} \left({\mathbf{r}} \right)}} = h\left({\mathbf{r}} \right),$$
(32)

for some function \(h\left({\mathbf{r}} \right)\), can be written as

$$\sum\limits_{B = 1}^{{N_{\text{atoms}}}} {\rho_{B} \left({\mathbf{r}} \right)} = h\left({\mathbf{r}} \right)\sum\limits_{B = 1}^{{N_{\text{atoms}}}} {\rho_{B}^{0} \left({\mathbf{r}} \right)}$$
(33)

Therefore

$$h\left({\mathbf{r}} \right) = \frac{{\rho_{B}^{{}} \left({\mathbf{r}} \right)}}{{\rho_{B}^{0} \left({\mathbf{r}} \right)}} = \frac{{\rho_{\text{mol}} \left({\mathbf{r}} \right)}}{{\rho_{\text{mol}}^{0} \left({\mathbf{r}} \right)}}$$
(34)

and Eq. (14) can be rewritten as

$$I_{f}^{\alpha} \left[{\rho_{\text{mol}} \left| {\rho_{\text{mol}}^{0}} \right.} \right] = \int {\rho_{\text{mol}} \left({\mathbf{r}} \right)\left({\frac{{\rho_{\text{mol}} \left({\mathbf{r}} \right)}}{{\rho_{\text{mol}}^{0} \left({\mathbf{r}} \right)}}} \right)^{\alpha - 1} \text{d}{\mathbf{r}}} - N_{\text{mol}} = I_{f}^{\alpha} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right]$$
(35)

where in Eqs. (33) and (35) we have used the constraint that the atom-in-molecule densities add up to the total molecular density. The pro-molecule density can therefore be optimized by the two-step procedure,

$$\underbrace {\min}_{{\left\{{\rho_{A}^{0} \left({\mathbf{r}} \right)} \right\}}}\underbrace {\min}_{{\left\{{\rho_{A} \left({\mathbf{r}} \right)\left| {\sum\nolimits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)} = \rho_{\text{mol}} \left({\mathbf{r}} \right)} \right.} \right\}}}I_{f}^{\alpha} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right] = \underbrace {\min}_{{\left\{{\rho_{A}^{0} \left({\mathbf{r}} \right)} \right\}}}I_{f}^{\alpha} \left[{\rho_{\text{mol}} \left| {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A}^{0}}} \right.} \right]$$
(36)

Identity (35) and the strategy in Eq. (36) clearly extend to any of the generalized α-divergences in this paper. While these formulas generalize the f-divergences considered in Ref. [92] somewhat, they do not contradict the results in that paper because these divergences are not local functionals of the electron density. Their generalizations are also not very consequential, since one still obtains the Hirshfeld partitioning. However, while the Tsallis and Rényi divergences give the same pro-atoms (because both objective functions are minimized when \(I_{f}^{\alpha} \left[{\rho_{\text{mol}} \left| {\rho_{\text{mol}}^{0}} \right.} \right]\) is made as small as possible), this is not necessarily true for the Sharma–Mittal and supraextensive divergences.
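The two-step procedure of Eq. (36) can be sketched on a toy problem. The example below assumes a hypothetical one-dimensional model in which pro-atom B has a single adjustable exponent, `zeta`, and a fixed population; the grid, the densities, the scan range, and the helper `normalized` are all illustrative assumptions. It scans `zeta` using the right-hand side of Eq. (36), compares the optimum selected by the Tsallis and Rényi divergences, and then checks that the atomic and molecular power integrals agree for the resulting Hirshfeld atoms.

```python
import numpy as np

# Hypothetical 1D two-atom model; every density below is an illustrative assumption.
x = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]


def normalized(f, n):
    """Scale f so that its grid integral equals n."""
    return n * f / (np.sum(f) * dx)


rho_mol = normalized(
    0.8 * np.exp(-2.0 * np.abs(x + 1.0)) + 1.2 * np.exp(-1.8 * np.abs(x - 1.0)), 2.0
)
rho0_A = normalized(np.exp(-2.0 * np.abs(x + 1.0)), 1.0)  # pro-atom A is kept fixed
n_mol = np.sum(rho_mol) * dx
alpha = 2.0


def power_integral(rho0_mol):
    """Integral of rho_mol * (rho_mol / rho0_mol)**(alpha - 1) on the grid."""
    return np.sum(rho_mol * (rho_mol / rho0_mol) ** (alpha - 1.0)) * dx


# Scan the exponent of pro-atom B, keeping its population fixed at one electron.
zetas = np.linspace(1.2, 2.6, 15)
tsallis, renyi = [], []
for zeta in zetas:
    rho0_B = normalized(np.exp(-zeta * np.abs(x - 1.0)), 1.0)
    p = power_integral(rho0_A + rho0_B)
    tsallis.append((p - n_mol) / (alpha - 1.0))
    renyi.append(np.log(p / n_mol) / (alpha - 1.0))

best_tsallis = zetas[int(np.argmin(tsallis))]
best_renyi = zetas[int(np.argmin(renyi))]
print(f"optimal zeta: Tsallis {best_tsallis:.2f}, Renyi {best_renyi:.2f}")

# Hirshfeld atoms for the optimized pro-atoms, and the two sides of the identity.
rho0_B = normalized(np.exp(-best_tsallis * np.abs(x - 1.0)), 1.0)
rho0_mol = rho0_A + rho0_B
rho_A = rho0_A * rho_mol / rho0_mol
rho_B = rho0_B * rho_mol / rho0_mol
atomic_sum = (
    np.sum(rho_A * (rho_A / rho0_A) ** (alpha - 1.0))
    + np.sum(rho_B * (rho_B / rho0_B) ** (alpha - 1.0))
) * dx
molecular = power_integral(rho0_mol)
print(f"atomic power integral = {atomic_sum:.8f}, molecular = {molecular:.8f}")
```

Both objective functions are increasing functions of the same power integral when α > 1, which is why the scan selects the same exponent for both.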

3 Atoms in molecules from H-divergences

The divergence measures we considered in the previous section are all based on non-extensive entropy formulas. The H-divergence formula in Eq. (13) generalizes these equations and also the f-divergence. For example, the H-divergence is an f-divergence (up to a choice of normalization) if \(\varphi_{1}(1) = 0\), \(\varphi_{1}(x)\) is convex, \(\varphi_{2}(x) = 1\), and \(h(x) = x\).

As mentioned before, not every choice of functions in Eq. (13) is allowed. In this paper, we consider only H-divergences which satisfy the requirements:

  • \(h(x)\) is monotonically increasing, \(h^{\prime}(x) > 0\), and \(h(0) = 0\).

  • \(\varphi_{1}(x)\) is convex, \(\varphi^{\prime\prime}_{1}(x) > 0\), and \(\varphi_{1}(1) = 0\).

  • \(\varphi_{2}(x) > 0\).

This gives \(H_{{h,\varphi_{1},\varphi_{2}}} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right] \ge H_{{h,\varphi_{1},\varphi_{2}}} \left[{\left\{{\rho_{A}^{0}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right] = 0\) for densities with the same normalization, which is one of the essential properties of a divergence measure. The analogous H-divergence derivation of the Hirshfeld atoms-in-molecules partitioning is found by minimizing

$$\underbrace {\min}_{{\left\{{\rho_{A} \left({\mathbf{r}} \right)\left| {\sum\nolimits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)} = \rho_{\text{mol}} \left({\mathbf{r}} \right)} \right.} \right\}}}H_{{h,\varphi_{1},\varphi_{2}}} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right]$$
(37)

with the solution

$$\lambda \left({\mathbf{r}} \right) = \frac{1}{{G_{{\varphi_{2}}}^{2}}} \cdot h^{\prime}\left({\frac{{G_{{\varphi_{1}}}}}{{G_{{\varphi_{2}}}}}} \right) \cdot \left[{G_{{\varphi_{2}}} \frac{{\delta G_{{\varphi_{1}}}}}{{\delta \rho_{B} \left({\mathbf{r}} \right)}} - G_{{\varphi_{1}}} \frac{{\delta G_{{\varphi_{2}}}}}{{\delta \rho_{B} \left({\mathbf{r}} \right)}}} \right]$$
(38)

where we have defined the convenient notation,

$$\begin{aligned} G_{{\varphi_{1}}} = \int {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\varphi_{1} \left({\frac{{\rho_{A}^{0} \left({\mathbf{r}} \right)}}{{\rho_{A}^{{}} \left({\mathbf{r}} \right)}}} \right)\text{d}{\mathbf{r}}}} \\ G_{{\varphi_{2}}} = \int {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A} \left({\mathbf{r}} \right)\varphi_{2} \left({\frac{{\rho_{A}^{0} \left({\mathbf{r}} \right)}}{{\rho_{A}^{{}} \left({\mathbf{r}} \right)}}} \right)\text{d}{\mathbf{r}}}}, \\ \end{aligned}$$
(39)

Note that by requiring that \(\varphi_{1}(x)\) is a convex function with \(\varphi_{1}(1) = 0\), we ensure that \(G_{{\varphi_{1}}}\) is an f-divergence. \(G_{{\varphi_{2}}}\) is not an f-divergence, but a type of normalization factor. Possible choices include \(\varphi_{2}(x) = x^{\alpha}\) (0 ≤ α ≤ 1), \(\varphi_{2}(x) = x/(x + 1)\), \(\varphi_{2}(x) = \ln (x + 1)\), and \(\varphi_{2}(x) = \tanh (x)\). All of these functions are concave for x ≥ 0, \(\varphi^{\prime\prime}_{2}(x) \le 0\). This is not required for the H-divergence to be a valid divergence measure, but later it will turn out to be useful.

Inserting the functional derivatives,

$$\begin{aligned} \frac{{\delta G_{{\varphi_{1}}}}}{{\delta \rho_{B} \left({\mathbf{r}} \right)}} &= \varphi_{1} \left({\frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B}^{{}} \left({\mathbf{r}} \right)}}} \right) - \frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B}^{{}} \left({\mathbf{r}} \right)}}\varphi^{\prime}_{1} \left({\frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B}^{{}} \left({\mathbf{r}} \right)}}} \right) \\ \frac{{\delta G_{{\varphi_{2}}}}}{{\delta \rho_{B} \left({\mathbf{r}} \right)}} &= \varphi_{2} \left({\frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B}^{{}} \left({\mathbf{r}} \right)}}} \right) - \frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B}^{{}} \left({\mathbf{r}} \right)}}\varphi^{\prime}_{2} \left({\frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B}^{{}} \left({\mathbf{r}} \right)}}} \right) \\ \end{aligned}$$
(40)

into Eq. (38), we obtain the expression

$$\lambda \left({\mathbf{r}} \right) = h^{\prime}\left({\frac{{G_{{\varphi_{1}}}}}{{G_{{\varphi_{2}}}}}} \right)\left(\begin{aligned} \frac{1}{{G_{{\varphi_{2}}}}} \cdot \left({\varphi_{1} \left({\frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B} \left({\mathbf{r}} \right)}}} \right) - \frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B} \left({\mathbf{r}} \right)}}\varphi^{\prime}_{1} \left({\frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B} \left({\mathbf{r}} \right)}}} \right)} \right) \\ - \frac{{G_{{\varphi_{1}}}}}{{G_{{\varphi_{2}}}^{2}}} \cdot \left({\varphi_{2} \left({\frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B} \left({\mathbf{r}} \right)}}} \right) - \frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B} \left({\mathbf{r}} \right)}}\varphi^{\prime}_{2} \left({\frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B} \left({\mathbf{r}} \right)}}} \right)} \right) \\ \end{aligned} \right)$$
(41)

Equation (32), which leads to the Hirshfeld partitioning, is a solution to this equation. However, it may not be the only solution. In general, Eq. (41) gives an equation relating the densities of every atom pair in the molecule,

$$g\left({\frac{{\rho_{A}^{0} \left({\mathbf{r}} \right)}}{{\rho_{A} \left({\mathbf{r}} \right)}}} \right) = g\left({\frac{{\rho_{B}^{0} \left({\mathbf{r}} \right)}}{{\rho_{B} \left({\mathbf{r}} \right)}}} \right)$$
(42)

where

$$g\left(x \right) = \frac{1}{{G_{{\varphi_{2}}}^{{}}}}\left({\varphi_{1} \left(x \right) - x\varphi^{\prime}_{1} \left(x \right)} \right) - \frac{{G_{{\varphi_{1}}}}}{{G_{{\varphi_{2}}}^{2}}}\left({\varphi_{2} \left(x \right) - x\varphi^{\prime}_{2} \left(x \right)} \right) .$$
(43)

If g(x) is invertible for x ≥ 0, then the unique solution to Eq. (42) is

$$\frac{{\rho_{A} \left({\mathbf{r}} \right)}}{{\rho_{A}^{0} \left({\mathbf{r}} \right)}} = \frac{{\rho_{B} \left({\mathbf{r}} \right)}}{{\rho_{B}^{0} \left({\mathbf{r}} \right)}},$$
(44)

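which leads to the Hirshfeld partitioning. If we assume that all the functions are at least twice differentiable, it is sufficient that g(x) be strictly monotonic. Differentiating Eq. (43), and using \(\text{d}\left({\varphi \left(x \right) - x\varphi^{\prime}\left(x \right)} \right)/\text{d}x = - x\varphi^{\prime\prime}\left(x \right)\), shows that g(x) is strictly decreasing for x > 0 whenever

$$- g^{\prime}\left(x \right) = \frac{x}{{G_{{\varphi_{2}}}^{2}}}\left({G_{{\varphi_{2}}} \varphi^{\prime\prime}_{1} \left(x \right) - G_{{\varphi_{1}}} \varphi^{\prime\prime}_{2} \left(x \right)} \right) > 0$$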
(45)

The conditions stipulated at the beginning of this section are almost sufficient to satisfy this inequality because they ensure that \(G_{{\varphi_{2}}}\) is positive, that \(\varphi^{\prime\prime}_{1}(x)\) is positive, and that \(G_{{\varphi_{1}}}\) is nonnegative. If we further require \(\varphi^{\prime\prime}_{2}(x)\) to be non-positive, then the Hirshfeld partitioning is the unique solution to variational procedure (37). These conditions also suffice to derive the analogue of the identity in Eq. (35), namely that for the atom-in-molecule densities obtained from Eq. (37),

$$H_{{h,\varphi_{1},\varphi_{2}}} \left[{\left\{{\rho_{A}} \right\}\left| {\left\{{\rho_{A}^{0}} \right\}} \right.} \right] = H_{{h,\varphi_{1},\varphi_{2}}} \left[{\rho_{\text{mol}} \left| {\rho_{\text{mol}}^{0}} \right.} \right] = h\left({\frac{{\int {\rho_{\text{mol}} \left({\mathbf{r}} \right)\varphi_{1} \left({\frac{{\rho_{\text{mol}}^{0} \left({\mathbf{r}} \right)}}{{\rho_{\text{mol}} \left({\mathbf{r}} \right)}}} \right)\text{d}{\mathbf{r}}}}}{{\int {\rho_{\text{mol}} \left({\mathbf{r}} \right)\varphi_{2} \left({\frac{{\rho_{\text{mol}}^{0} \left({\mathbf{r}} \right)}}{{\rho_{\text{mol}} \left({\mathbf{r}} \right)}}} \right)\text{d}{\mathbf{r}}}}}} \right) .$$
(46)
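A concrete H-divergence makes these conditions easier to picture. The sketch below assumes one particular choice, \(\varphi_{1}(x) = -\ln x\) (convex, with \(\varphi_{1}(1) = 0\)), \(\varphi_{2}(x) = x/(x + 1)\) (positive and concave), and \(h(x) = x\), together with a hypothetical one-dimensional two-atom model in which the molecule and the pro-molecule are given the same normalization. These choices are illustrative assumptions, not ones singled out above. The code evaluates Eq. (13) along a line of partitions running from the Hirshfeld weights (t = 0) to an equal split (t = 1).

```python
import numpy as np

# Hypothetical 1D two-atom model; every density below is an illustrative assumption.
x = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]
rho0_A = np.exp(-2.0 * np.abs(x + 1.0))
rho0_B = np.exp(-2.0 * np.abs(x - 1.0))
rho0_mol = rho0_A + rho0_B

rho_mol = 0.8 * np.exp(-2.0 * np.abs(x + 1.0)) + 1.2 * np.exp(-1.8 * np.abs(x - 1.0))
# Give the molecule and the pro-molecule the same number of electrons.
rho_mol = rho_mol * np.sum(rho0_mol) / np.sum(rho_mol)


def phi1(y):
    return -np.log(y)  # convex, phi1(1) = 0


def phi2(y):
    return y / (y + 1.0)  # positive and concave for y > 0


def h(z):
    return z  # h(0) = 0 and h'(z) > 0


def h_divergence(w_A):
    """Eq. (13) for rho_A = w_A * rho_mol and rho_B = (1 - w_A) * rho_mol."""
    pairs = [(w_A * rho_mol, rho0_A), ((1.0 - w_A) * rho_mol, rho0_B)]
    g1 = sum(np.sum(r * phi1(r0 / r)) for r, r0 in pairs) * dx
    g2 = sum(np.sum(r * phi2(r0 / r)) for r, r0 in pairs) * dx
    return h(g1 / g2)


w_hirshfeld = rho0_A / rho0_mol
ts = np.linspace(0.0, 1.0, 11)
values = [h_divergence((1.0 - t) * w_hirshfeld + t * 0.5) for t in ts]
best_t = ts[int(np.argmin(values))]

for t, v in zip(ts, values):
    print(f"t = {t:.1f}   H = {v:.6f}")
print(f"smallest H-divergence along this line at t = {best_t:.1f}")
```

With this choice of \(\varphi_{1}\), the numerator is the Kullback–Leibler sum of Eq. (2), so the example differs from the stockholder derivation only through the density-dependent denominator.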

4 Summary

This work was initiated when our numerical investigations revealed that optimizing the pro-atoms,

$$\underbrace {\min}_{{\left\{{\rho_{A}^{0} \left({\mathbf{r}} \right)} \right\}}}I\left[{\rho_{\text{mol}} \left| {\sum\limits_{A = 1}^{{N_{\text{atoms}}}} {\rho_{A}^{0}}} \right.} \right],$$
(47)

gave the same results for the Tsallis and Rényi divergences. We were also surprised that the Rényi divergence, even though it is not an f-divergence, gave back the Hirshfeld partitioning. This led us to explore what other sorts of non-local divergence measures would recover the Hirshfeld partitioning. This paper reports the results of that exploration.

Reference [92] shows that the only local density functionals that lead to the popular Hirshfeld partitioning are f-divergences. This paper explores divergences that are non-local density functionals but which also give the Hirshfeld partitioning. In particular, we observe that the Rényi, Sharma–Mittal, and supraextensive divergence measures all give the Hirshfeld partitioning. Moreover, all of these functionals are very closely linked to the α-divergence. This is desirable insofar as it ensures that these measures are closely linked to a very popular and useful family of divergence measures, but it is undesirable insofar as it means that optimizing the pro-atom densities [using Eq. (47)] does not give significantly different results for these approaches.

The H-divergence in Eq. (13) is much more general. While it is difficult to find necessary conditions for an H-divergence to give the Hirshfeld atoms, it is sufficient to require the following properties for x > 0:

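  • \(h(x)\) is monotonically increasing, \(h^{\prime}(x) > 0\). Also \(h(0) = 0\).

  • \(\varphi_{1}(x)\) is convex, \(\varphi^{\prime\prime}_{1}(x) > 0\). Also \(\varphi_{1}(1) = 0\). This is the same as the requirements for an f-divergence.

  • \(\varphi_{2}(x) > 0\) and is concave, \(\varphi^{\prime\prime}_{2}(x) \le 0\).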

Note that this family of H-divergences is closely related to the f-divergences, but extends that set in a non-trivial way. We anticipate that the family of H-divergences could be used to define new, and more effective, alternatives to variational Hirshfeld-based methods like the minimal basis iterated stockholder (MBIS) partitioning [75].

While our primary interest in divergence measures is motivated by the problem of atomic partitioning and, more generally, fitting molecular densities [cf. Eq. (35)], the mathematical tools presented here are suitable for measuring the divergence between other probability distribution functions that arise in quantum chemistry. For example, there has been significant recent interest in approaches that use the shape function [90, 117–119], instead of the electron density, to describe chemical phenomena [70, 120–123].