Introduction

Electronic population analyses (PA) are a powerful means of connecting the outputs of quantum chemistry computations to chemical knowledge. Atomic charges may be useful to rationalize chemical reactivity, for example to investigate substituent effects on an energy profile. Atomic charges and higher order multipoles are also key quantities for molecular mechanics force fields. The possibility of extracting reliable multipoles from quantum chemistry computations is a topic of high interest for the development of second or third generation force fields and hybrid QM/MM schemes [1–3]. Various quantum chemistry methodologies also rely on atomic charges. As an illustration, let us mention constrained density functional theory (DFT), whereby atomic charges are imposed on some molecular fragments in order to define diabatic states at the DFT level [4, 5]. Such diabatic states have proved useful for modeling electron transfer processes or charge transfer within non-covalent complexes [6].

It is well known that there is no unique way to define an atom in a polyatomic molecule, and hence to define atomic charges. This is because atomic charges are not physical observables. Despite this ambiguity, atom centered multipole expansions are rather reliable if the problem of atomic multipole invariance is properly addressed. A convenient way to do so is the definition of cumulative atomic multipole moments [7, 8]. These moments are built up from atomic charges, which are invariant to coordinate transformation. Because the convergence of the atom centered multipole expansion depends on the quality of the underlying atomic charges [9], population analyses are important ingredients for the development of atom centered multipole expansions with only a limited number of expansion terms. A critical point of any population scheme is how to divide the real or function space in order to distribute the electron density over the atoms. It is customary to classify population schemes into various categories. The Mulliken [10, 11] and Löwdin [12] approaches, as well as the more elaborate natural population analysis [13] or natural bonding orbital (NBO) [14] analysis, define atomic charges by division of the function space spanned by the atomic or molecular orbitals. Another family encompasses the Becke [15], Hirshfeld [16], and Voronoi deformation density [17] approaches. These schemes are based on a real space partitioning of the electron density itself. Several groups are working to improve the quality of the charges produced by these approaches [18–20]. For example, iterative schemes have been proposed, such as the Hirshfeld-I [21], Hirshfeld-λI [22], fractional occupation Hirshfeld-I [23] or Stockholder-I methods [24]. We also mention the methods in which the charges are obtained through the integration of the electronic density over topological basins of well suited functions [25]. These functions may be the electronic density (Bader’s atoms-in-molecules theory [26]) or the electron localization function [25]. For the sake of completeness, we finally mention methods like the Merz-Singh-Kollman scheme, which consists in fitting the atomic charges so as to reproduce the electrostatic potential created by the molecule [27–29]. These methods have been widely employed to develop molecular mechanics force fields like AMBER [30] or CHARMM [31].

In this paper, we focus on methods based on the real space integration of the density in the context of DFT. DFT calculations have become the most widely used quantum chemistry approaches because they provide an excellent cost/quality ratio. Indeed, modeling systems containing up to a few hundred atoms is nowadays accessible with DFT. This performance has become possible thanks to the development of ingenious algorithms for solving the Kohn-Sham equations. In particular, methods relying on fitted densities in addition to the Kohn-Sham density (e.g., variational density fitting [32, 33] and Cholesky decomposition of the density matrix [34]) permit the elimination of the cumbersome evaluation of four-center electron repulsion integrals.

Performing population analyses on optimized electronic densities is usually not a CPU demanding task compared to the self-consistent-field (SCF) procedure. However, if iterative schemes are employed, this task can become a computational bottleneck. This limitation can be critical for systems comprising tens or hundreds of atoms, as commonly investigated with present-day DFT approaches. One possible strategy to overcome these limitations is to develop more efficient parallelization and/or grid techniques. An alternative, although not exclusive, approach is to perform analyses based on auxiliary function densities instead of Kohn-Sham densities. Even though this procedure seems rather straightforward, it must be kept in mind that auxiliary densities, as obtained in density fitting approaches, are not designed to mimic the Kohn-Sham orbital density, but to provide a density from which a mathematically simpler electron-electron repulsion energy term can be calculated, avoiding explicit four-center integrals. There is no guarantee that auxiliary densities can be used in lieu of Kohn-Sham densities in population analyses.

The objectives of this paper are twofold. First, we wish to assess whether auxiliary densities are suited for population analyses. By this, we mean extracting not only monopoles (atomic charges) but also atomic dipoles and quadrupoles. We have considered large sets of organic molecules in our tests. Second, we wish to investigate the capabilities of various population schemes, including some of the most advanced ones, to produce electrostatic multipoles that may be used for interaction energy calculations. Reliable schemes would then be valuable for second or third generation force fields or for accurate hybrid DFT/MM (molecular mechanics) approaches. As an example, relative energies of different structures of a complex formed by a tryptamine molecule, a water molecule and a sodium cation are computed with the AMOEBA [35, 36] force field using multipoles extracted from DFT population analyses.

The article is organized in three parts. We first present the new population schemes we have introduced in deMon2k. We then report extensive benchmark calculations on sets of organic molecules. We finally report the performance of the population schemes in producing atomic multipoles that can be used in the AMOEBA force field to reproduce structures and non-covalent energies.

Population schemes

The population schemes described hereafter have been implemented in a new version of the program deMon2k [37]. They can be classified into two categories. The first category refers to population analyses that define atomic charges from the number of electrons belonging to each atom and its nuclear charge. The charge on atom A reads:

$$ {Q}_A={Z}_A-{N}_A $$
(1)

where \( N_A \) represents the number of electrons on atom A and \( Z_A \) its nuclear charge. In the second category, the charge is defined from a deformation density between the converged SCF density and a reference density:

$$ {Q}_A={N}_A-{N}_A^{ref} $$
(2)

Typically, \( N_A \) and \( N_A^{ref} \) refer to the number of electrons of atom A obtained from the SCF electronic density and from the so-called promolecular density, which is the superposition of non-interacting atomic densities. Schemes of the latter kind will be called deformation density analyses. In both approaches, \( N_A \) is obtained by numerical integration of the electronic density over a grid of points:

$$ {N}_A={\displaystyle \sum_i}\rho \left({r}_i\right){\omega}_q\left({r}_i\right){\omega}_A\left({r}_i\right) $$
(3)

where the index i runs over grid points. We have chosen Lebedev grids for the angular integration in combination with an Euler-Maclaurin radial quadrature scheme. The grids used here are identical to the default fixed grids in deMon2k for coarse, medium, and fine integration accuracy. In the above formula, \( \omega_q \) collects all quadrature weights, angular and radial, whereas \( \omega_A \) is an atomic weight function for the real space partition into atomic cells. Five variants have been implemented in deMon2k. These are the Voronoi (V), Becke (B) [15], Hirshfeld (H) [16], iterative Hirshfeld (IH) [21] and, finally, the iterative Hirshfeld with fractional occupation numbers (IHFO) [23] partition schemes. The Voronoi cell of atom A is defined by all grid points that are closer to nucleus A than to any other nucleus. Therefore, the \( \omega_A^V \) function takes the form:

$$ {\omega}_A^V\left({r}_i\right)=1\kern0.75em \mathrm{if}\kern0.5em \left|{r}_i-{r}_A\right|<\left|{r}_i-{r}_X\right|\ \forall\ X\ne A $$
(4)
$$ {\omega}_A^V\left({r}_i\right)=0\kern0.75em \mathrm{otherwise} $$
(5)

\( r_i \), \( r_A \) and \( r_X \) are the positions of grid point i and of nuclei A and X, respectively. The Voronoi scheme divides space into non-overlapping polyhedra. The Becke atomic cells are defined from the Voronoi cells by making them slightly overlapping [15]. This is achieved by introducing a smoothing function that defines fuzzy cell borders:

$$ {\omega}_A^B\left({r}_i\right)=\frac{P_A\left({r}_i\right)}{{\displaystyle {\sum}_X}{P}_X\left({r}_i\right)} $$
(6)

The cell functions \( P_A\left(r_i\right) \) and \( P_X\left(r_i\right) \) are defined by:

$$ {P}_A\left({r}_i\right)={\displaystyle \prod_{B\ne A}} s\left({\mu}_{A B}\right) $$
(7)

The “soft” step function \( s\left({\mu}_{AB}\right) \) is obtained by a threefold iteration of the polynomial \( p\left({\mu}_{AB}\right)=\frac{3}{2}{\mu}_{AB}-\frac{1}{2}{\mu}_{AB}^3 \). The elliptic coordinate appearing here,

$$ {\mu}_{A B}=\frac{r_A-{r}_B}{R_{A B}}, $$

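is defined in the local coordinate system of the atom pair A and B, as depicted in Fig. 1. In this expression, \( r_A \) and \( r_B \) denote the distances from the grid point to nuclei A and B, and \( R_{AB} \) is the internuclear distance.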

Fig. 1
Elliptic coordinate definition for Becke weight calculation
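The Voronoi weights of Eqs. (4) and (5) and the Becke weights of Eqs. (6) and (7) can be illustrated with a short sketch. It is not the deMon2k implementation. The mapping \( s(\mu) = \frac{1}{2}[1 - p(p(p(\mu)))] \) follows Becke's original formulation and is an assumption insofar as the text above only states the threefold iteration; the function names and the diatomic test geometry are hypothetical.

```python
import math

def voronoi_weight(point, nuclei, a):
    """Eqs. (4)-(5): 1.0 if `point` is strictly closer to nucleus `a` than to all others."""
    d_a = math.dist(point, nuclei[a])
    others = (math.dist(point, x) for k, x in enumerate(nuclei) if k != a)
    return 1.0 if all(d_a < d for d in others) else 0.0

def soft_step(mu, iterations=3):
    """Iterate p(mu) = 3/2 mu - 1/2 mu^3, then map the result from [-1, 1] to [1, 0]."""
    for _ in range(iterations):
        mu = 1.5 * mu - 0.5 * mu ** 3
    return 0.5 * (1.0 - mu)

def becke_weights(point, nuclei):
    """Eqs. (6)-(7): cell functions P_A normalised over all atoms."""
    cells = []
    for a, pos_a in enumerate(nuclei):
        p_a = 1.0
        for b, pos_b in enumerate(nuclei):
            if b != a:
                # Elliptic coordinate: difference of distances over internuclear distance.
                mu = (math.dist(point, pos_a) - math.dist(point, pos_b)) / math.dist(pos_a, pos_b)
                p_a *= soft_step(mu)
        cells.append(p_a)
    total = sum(cells)
    return [p / total for p in cells]

# Hypothetical diatomic placed on the z axis (arbitrary length units).
nuclei = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
probe = (0.0, 0.0, 0.5)
print(voronoi_weight(probe, nuclei, 0), voronoi_weight(probe, nuclei, 1))
print(becke_weights(probe, nuclei))
```

With the strict inequality, a point lying exactly on a cell boundary receives zero Voronoi weight from every atom, whereas the Becke weights always sum to one.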

For atom A, \( \omega_A^B \) equals unity close to the nucleus but rapidly drops to zero when approaching the border of the Voronoi cell of the atom. Both the Voronoi and Becke schemes are based on geometrical considerations only. The chemical nature of the atoms composing the molecule of interest never enters into the definition of the atomic cells, and hence into the definition of the atomic charges. As a consequence, these population schemes may produce atomic charges that are not satisfactory from a chemical point of view. For example, charges on hydrogen atoms typically take values around -0.5, simply because the Voronoi/Becke cells of hydrogen atoms extend to half the length of the bonds in which they are engaged. The H, IH, and IHFO schemes constitute improvements in that regard. For these three schemes, the integration weights are functions of atomic reference densities, \( \rho_A^{ref} \):

$$ {\omega}_A^H\left({r}_i\right)=\frac{\rho_A^{ref}\left({r}_i\right)}{{\displaystyle {\sum}_X}{\rho}_X^{ref}\left({r}_i\right)} $$
(8)

The denominator in Eq. (8), \( {\displaystyle {\sum}_X}{\rho}_X^{ref} \), defines the so-called promolecular density. There is some freedom in defining the \( \rho_X^{ref} \) functions. In the standard Hirshfeld scheme, the \( \rho_X^{ref} \) are the densities of neutral atoms. Other choices are acceptable; for example, the \( \rho_X^{ref} \) may be the densities of isolated ions. This non-uniqueness of the reference density is a drawback of the standard Hirshfeld scheme. In deMon2k, the \( \rho_X^{ref} \) are obtained by performing SCF calculations on spherically averaged neutral atoms. The weight \( \omega_A^H \) is close to unity near atom A and progressively decays to zero when approaching other nuclei. Another drawback of the standard Hirshfeld partition is that atomic charges are generally close to zero.
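As a minimal sketch of Eq. (8), the Hirshfeld weights at a point can be computed from any set of spherical reference densities. The exponential model density below is purely an assumption for illustration (deMon2k uses SCF densities of spherically averaged atoms), and the atom list, exponents, and function names are hypothetical.

```python
import math

def model_atom_density(r, n_electrons, zeta):
    """Illustrative spherical density, normalised so that it integrates to n_electrons."""
    return n_electrons * zeta ** 3 / math.pi * math.exp(-2.0 * zeta * r)

def hirshfeld_weights(point, atoms):
    """Eq. (8): each reference density divided by the promolecular density at `point`.

    `atoms` is a list of (position, n_electrons, zeta) tuples.
    """
    densities = [model_atom_density(math.dist(point, pos), n, zeta)
                 for pos, n, zeta in atoms]
    promolecular = sum(densities)  # denominator of Eq. (8)
    return [d / promolecular for d in densities]

# Hypothetical heteronuclear pair: a compact 8-electron atom and a diffuse 1-electron atom.
atoms = [((0.0, 0.0, 0.0), 8.0, 2.0), ((0.0, 0.0, 1.8), 1.0, 1.0)]
print(hirshfeld_weights((0.0, 0.0, 0.9), atoms))
```

Unlike the Voronoi and Becke weights, the result depends on the chemical identity of the atoms through the reference densities, not only on the geometry.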

To alleviate these drawbacks of the standard scheme, iterative variants have been proposed [21]. In the IH scheme, one chooses \( \rho_X^{ref} \) to be the density of an isolated atom X holding the same number of electrons \( N_X \) as the atom in the molecule (hereafter denoted \( {\rho}_X^{ref,{N}_X} \)). In other words, the \( \omega_A^{IH} \) function, and hence the Hirshfeld cell of atom A, is adjusted iteratively so that both the reference atom and the corresponding atom in the molecule have the same number of electrons. This procedure has been shown to minimize the loss of information when defining an atom in a molecule according to Shannon's information theory [21]. In the original article on the IH scheme, the authors proposed to define \( {\rho}_X^{ref,{N}_X} \) by interpolation between the electronic densities of isolated ions whose electron numbers bracket \( N_X \):

$$ {\rho}_X^{ref,{N}_X}={\rho}_X^{\mathrm{fint}\left({N}_X\right)}\left[\mathrm{cint}\left({N}_X\right)-{N}_X\right]+{\rho}_X^{\mathrm{cint}\left({N}_X\right)}\left[{N}_X-\mathrm{fint}\left({N}_X\right)\right] $$
(9)

In this expression, \( \mathrm{fint}\left(N_X\right) \) (resp. \( \mathrm{cint}\left(N_X\right) \)) is the largest (resp. smallest) integer less (resp. greater) than or equal to \( N_X \). Alternatively, \( {\rho}_X^{ref,{N}_X} \) can be obtained by running an SCF calculation for an ion holding \( N_X \) electrons. Note that \( N_X \) is usually a non-integer number. Both variants have been tested in deMon2k and were shown to give very similar atomic charges. In the end, we kept only the second variant, based on atomic SCF calculations with non-integer electron numbers, because of its simple and straightforward definition.
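The interpolation of Eq. (9) can be sketched as follows. This illustrates the first variant only, not the SCF-based variant retained in deMon2k. The callable `ion_density` and the toy density are hypothetical, and taking the upper bracket as the lower integer plus one is an assumption made here so that an exactly integer \( N_X \) still receives a total weight of one.

```python
import math

def interpolation_weights(n_x):
    """Return (lower, upper, w_lower, w_upper) for the bracketing integer electron numbers."""
    lower = math.floor(n_x)
    upper = lower + 1  # assumption: keeps the two weights summing to one for integer n_x
    return lower, upper, upper - n_x, n_x - lower

def interpolated_reference(r, n_x, ion_density):
    """Eq. (9): linear mix of the densities of the two bracketing ions at distance r."""
    lower, upper, w_lower, w_upper = interpolation_weights(n_x)
    return w_lower * ion_density(lower, r) + w_upper * ion_density(upper, r)

def toy_ion_density(n_electrons, r):
    """Hypothetical ion density: one fixed exponential shape scaled to n_electrons."""
    return n_electrons * math.exp(-2.0 * r) / math.pi

print(interpolation_weights(7.3))
print(interpolated_reference(0.5, 7.3, toy_ion_density))
```

Because the two weights sum to one and are linear in \( N_X \), the interpolated density integrates to \( N_X \) electrons whenever each ionic density integrates to its own integer electron number.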

The IHFO scheme represents an extension of the IH scheme in which the alpha (\( \rho^{\alpha} \)) and beta (\( \rho^{\beta} \)) densities are integrated separately [23]. Accordingly, \( N_A^{\sigma} \), \( \omega_A^{\sigma} \) and \( {\rho}_A^{ref,{N}_A^{\sigma}} \) become spin-specific. The reference ionic densities are obtained as for the IH scheme by running SCF calculations in which the numbers of both alpha and beta electrons are imposed. The reference atom and the corresponding atom in the molecule then have the same charges and spin charges. For closed-shell molecules, the IH and IHFO schemes are obviously identical, but they should produce different charges for open-shell systems. Another alternative for defining \( \rho_A^{ref} \) in Eq. (8) is the IHDO-D scheme, where the atomic dipoles are additionally imposed in the iterative procedure. We leave the introduction of the IHDO-D scheme in deMon2k for future work. Finally, we mention that \( \rho_A^{ref} \) may also be built from the densities of reference molecular fragments, as shown for example in [17]. We have already described such an implementation in deMon2k [6].

Once atomic cells have been defined according to any of the partition schemes defined above, atomic charges are easily computed with Eqs. (1) or (2). Now, higher order moments can be defined based on the atomic cells. For example, the components of the "intrinsic" atom dipoles (μ) and quadrupoles (Θ) can be calculated by:

$$ {\mu}_{\alpha}^A={\displaystyle \sum_i}\left({r}_{i,\alpha}-{r}_{A,\alpha}\right)\rho \left({r}_i\right){\omega}_q\left({r}_i\right){\omega}_A\left({r}_i\right) $$
(10)
$$ {\Theta}_{\alpha \beta}^A={\displaystyle \sum_i}\left({r}_{i,\alpha}-{r}_{A,\alpha}\right)\left({r}_{i,\beta}-{r}_{A,\beta}\right)\rho \left({r}_i\right){\omega}_q\left({r}_i\right){\omega}_A\left({r}_i\right) $$
(11)

where \( r_{A,\alpha} \) are the components of the position vector of nucleus A.
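Equations (3), (10) and (11) share the same grid sum and can be accumulated in a single pass, as in the sketch below. The function name, the argument layout, and the two-point test grid are hypothetical; a real calculation would use the Lebedev and Euler-Maclaurin quadrature described earlier.

```python
def atomic_moments(points, densities, quad_weights, atom_weights, nucleus):
    """Accumulate N_A (Eq. 3), the dipole (Eq. 10) and the quadrupole (Eq. 11) of one atom."""
    n_a = 0.0
    dipole = [0.0, 0.0, 0.0]
    quadrupole = [[0.0] * 3 for _ in range(3)]
    for r, rho, w_q, w_a in zip(points, densities, quad_weights, atom_weights):
        f = rho * w_q * w_a                      # common factor rho * omega_q * omega_A
        d = [r[k] - nucleus[k] for k in range(3)]  # position relative to nucleus A
        n_a += f
        for a in range(3):
            dipole[a] += d[a] * f
            for b in range(3):
                quadrupole[a][b] += d[a] * d[b] * f
    return n_a, dipole, quadrupole

# Hypothetical two-point grid along z around a nucleus at the origin.
points = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
n_a, dipole, quadrupole = atomic_moments(points, [2.0, 1.0], [0.5, 0.5], [1.0, 1.0],
                                         (0.0, 0.0, 0.0))
print(n_a, dipole, quadrupole[2][2])
```

The moments are written exactly as in Eqs. (10) and (11), i.e., as moments of the electron density without the sign of the electronic charge and without removing the trace of the quadrupole.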

All the electronic densities that have been introduced above are obtained from the Kohn-Sham molecular orbitals (MOs). In deMon2k, these MOs are expanded within the LCGTO approximation (linear combination of Gaussian-type orbitals). The corresponding density is given as:

$$ \rho (r)={\displaystyle \sum_{\mu, \nu}}{P}_{\mu \nu}\mu (r)\nu (r) $$
(12)

where \( P_{\mu \nu} \) is an element of the density matrix and μ, ν represent GTOs. Greek letters are used both as indices and as labels for the GTOs. In deMon2k, auxiliary densities (denoted by \( \overset{\sim }{\rho} \)) are also introduced to reduce the scaling of the calculation of the Coulomb interaction. The auxiliary density, \( \overset{\sim }{\rho} \), is expanded as a linear combination of auxiliary functions \( \overline{k} \):

$$ \overset{\sim }{\rho}(r)={\displaystyle \sum_{\overline{k}}}{x}_{\overline{k}}\overline{k}(r) $$
(13)

where \( {x}_{\overline{k}} \) are the so-called Coulomb fitting coefficients. In deMon2k, the \( \overline{k} \) are primitive Hermite GTOs [38]. The coefficients \( {x}_{\overline{k}} \) are obtained from the variational fitting of the Coulomb potential as proposed by Dunlap [32, 33]. Because such auxiliary densities are at hand in DFT calculations with deMon2k, we may expect them to be valuable for performing population analyses in lieu of the Kohn-Sham density. The fitted density may be used to calculate the number of electrons for each atom by replacing ρ with \( \overset{\sim }{\rho} \) in Eq. (3), which then enters Eqs. (1) or (2). It can also be used to calculate the integration weights involved in the Hirshfeld schemes. In both cases, one should expect an important saving of computer time, since the Kohn-Sham density is expressed as a sum of products of atomic orbitals whereas the fitted density is a simple linear combination of auxiliary functions. Note that the number of atomic orbital products greatly exceeds the number of auxiliary functions, which is typically 3 to 5 times the number of basis functions. Approximating the Hirshfeld weights using the fitted density is certainly a less drastic approximation than integrating this density itself instead of the Kohn-Sham density. In our implementation, the Hirshfeld weights are always calculated with \( \overset{\sim }{\rho} \), while the user is free to integrate either the Kohn-Sham density, ρ, or the fitted density, \( \overset{\sim }{\rho} \). To conclude this section, we stress that for deformation density analyses, although the fitted reference densities, \( {\overset{\sim }{\rho}}_A^{ref} \), are used to calculate the integration weights, the Kohn-Sham reference densities, \( \rho_A^{ref} \), are used to calculate \( N_A^{ref} \).
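The size argument above can be made concrete with a rough count of the terms that must be evaluated per grid point for each representation of the density. This is a back-of-the-envelope sketch: the basis size of 1000 functions is a hypothetical example, and the count ignores the screening of negligible orbital products that practical codes apply.

```python
def orbital_product_count(n_basis):
    """Number of unique products mu(r) * nu(r) in Eq. (12), using P symmetric."""
    return n_basis * (n_basis + 1) // 2

def auxiliary_function_range(n_basis, low=3, high=5):
    """Number of terms in Eq. (13), taking 3 to 5 auxiliary functions per basis function."""
    return low * n_basis, high * n_basis

n_basis = 1000  # hypothetical number of basis functions
products = orbital_product_count(n_basis)
aux_low, aux_high = auxiliary_function_range(n_basis)
print(products, aux_low, aux_high)
print(products / aux_high, products / aux_low)  # unscreened ratio of term counts
```

The ratio grows linearly with the basis size, which is why the potential saving is largest for large systems; measured timings depend on screening and on the rest of the analysis and need not follow this ratio.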

Accuracy of population analyses

In this section, we assess the accuracy of population analyses performed on the auxiliary function density within the Becke, Hirshfeld (standard and iterative variants), and Voronoi deformation density (VDD) schemes. To this end, we consider two sets of molecules. The first one is an ensemble of 66 organic molecules relevant to biological structures (hereafter referred to as S66). It contains C, H, N, and O atoms. The S66 set has been reported recently by Řezáč et al. in the context of benchmarking computations of interaction energies by quantum chemistry methodologies [39, 40]. Although the present paper is not devoted to this topic, the S66 set still provides a valuable ensemble of organic molecules to test our population analysis implementation. The second set contains 40 halogenated organic molecules, also provided by Řezáč and Hobza [41]. In total 105 organic molecules are considered. These test sets encompass 987 H, 584 C, 70 N, 96 O, 2 S, and 74 halogen (X) atoms. We used the DZVP-GGA (double zeta with valence polarization functions, calibrated for generalized-gradient-approximation functionals) [42] basis set and the PBE exchange-correlation functional [43]. The XC energy and potential have been integrated numerically on an adaptive grid of medium accuracy [44]. The auxiliary density has been used to compute both the classical Coulomb and XC potentials, following the so-called auxiliary DFT (ADFT) framework [45]. Various auxiliary function sets have been considered. They are generated by an automatic procedure implemented in deMon2k that depends on the atomic orbital basis set. The GEN-An auxiliary function sets contain groups of auxiliary functions with s and spd angular momenta. The index n determines the number of auxiliary function sets, i.e., the number of these sets increases with increasing n [42]. We have considered the GEN-A1, GEN-A2, and GEN-A3 auxiliary function sets, as well as GEN-A2*, which is supplemented by f and g auxiliary functions. Numerical integrations involved in the population analyses have been carried out with fixed grids of medium accuracy. For the iterative schemes, the iterations were pursued until the root-mean-square error was below \( 10^{-5} \).

We first consider atomic charges obtained by four population schemes. We report the mean unsigned error (MUE) and the maximum error (MAXERR) between atomic charges obtained by analyzing the Kohn-Sham and the auxiliary density in Figs. 2 and 3, respectively. For simplicity, we will refer to them as the KS (BASIS) and auxiliary (AUXIS) charges. The calculations are repeated for four sets of auxiliary functions. With the Hirshfeld schemes, the differences between the KS and auxiliary charges decrease when going from GEN-A1 to GEN-A2. Passing to GEN-A3 does not guarantee better agreement. For the Becke and VDD schemes, none of the GEN-A1, -A2 or -A3 auxiliary function sets allows one to match the atomic charges obtained by integration of the KS density. Similar conclusions can be drawn for the maximum errors (Fig. 3). Note that the maximum errors with the GEN-A1 auxiliary function set can be quite large (0.3 e⁻). For any of the four population schemes investigated here, it is the addition of angular flexibility in the auxiliary function set (i.e., GEN-A2*) that enables a significant decrease of the MUE and of the maximum error. In conclusion, with the GEN-A2* auxiliary function set the fitted density analysis agrees with the KS density analysis to within about 0.01 e⁻. On the other hand, auxiliary function sets comprising only s and spd sets (GEN-An, n = 1, 2 or 3) should be used in population analyses of the fitted electronic densities only if qualitative results are sought.
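The two error measures used in this comparison can be written compactly as in the sketch below. The charge values are made up purely for illustration and are not taken from the benchmark sets; the function names are hypothetical.

```python
def mean_unsigned_error(reference, test):
    """MUE: average absolute deviation between two lists of atomic charges."""
    return sum(abs(a - b) for a, b in zip(reference, test)) / len(reference)

def maximum_error(reference, test):
    """MAXERR: largest absolute deviation between two lists of atomic charges."""
    return max(abs(a - b) for a, b in zip(reference, test))

# Hypothetical BASIS (Kohn-Sham) and AUXIS (fitted density) charges for three atoms.
basis_charges = [0.10, -0.20, 0.10]
auxis_charges = [0.12, -0.25, 0.13]
print(mean_unsigned_error(basis_charges, auxis_charges))
print(maximum_error(basis_charges, auxis_charges))
```

In the figures, these statistics are evaluated separately for each element, population scheme and auxiliary function set.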

Fig. 2
Mean unsigned error (MUE) between atomic charges (in atomic units) of the fitted density with respect to KS density analysis. The symbol X refers to F, Cl, Br, and I atoms. Four population schemes are investigated (Becke, VDD, standard Hirshfeld, and iterative Hirshfeld) in combination with four auxiliary function sets (GEN-A1, GEN-A2, GEN-A3, GEN-A2*). Please note that different scales are used for the graphs

Fig. 3
Maximum error between atomic charges obtained through the analyses of the auxiliary density and the Kohn-Sham density. The symbol X refers to F, Cl, Br, and I atoms. Four population schemes are investigated (Becke, VDD, standard Hirshfeld, and iterative Hirshfeld) in combination with four auxiliary function sets (GEN-A1, GEN-A2, GEN-A3, GEN-A2*). Please note that different scales are used for the graphs

We now turn to the analysis of the intrinsic dipole moments. In Fig. 4, we report the root-mean-square deviation (RMSD) between the norms of the intrinsic dipole moments obtained with the AUXIS and BASIS approaches. In Fig. 5, we report the angles between the dipoles obtained with the two approaches. The atomic dipoles obtained with GEN-A1 are clearly not reliable. In particular, the orientations of the dipoles obtained from the integration of ρ or of \( \overset{\sim }{\rho} \) can be extremely different (see Fig. 5). The situation is largely improved with GEN-A2 or GEN-A3, both in terms of the norms and of the orientations of the dipole moments. When using GEN-A2*, the comparison is, as for atomic charges, much more satisfactory. In most cases the RMSD between the dipoles obtained with both approaches is below 0.01 D, while the orientations of the AUXIS dipoles deviate by less than 1° from those of the BASIS dipoles.
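The two dipole metrics can be sketched as follows: the RMSD compares only the lengths of the vectors, while the angle captures their relative orientation. The function names are hypothetical, and the example vectors are arbitrary.

```python
import math

def norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))

def rmsd_of_norms(dipoles_a, dipoles_b):
    """Root-mean-square deviation between the norms of paired dipole vectors."""
    squared = [(norm(a) - norm(b)) ** 2 for a, b in zip(dipoles_a, dipoles_b)]
    return math.sqrt(sum(squared) / len(squared))

def angle_degrees(a, b):
    """Angle between two non-zero dipole vectors, in degrees."""
    cosine = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cosine))

# Two vectors of equal length but perpendicular orientation.
print(rmsd_of_norms([(3.0, 4.0, 0.0)], [(0.0, 0.0, 5.0)]))
print(angle_degrees((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```

The example shows why both measures are reported: two dipoles can have identical norms, and hence zero RMSD, while pointing in entirely different directions.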

Fig. 4
Root mean square deviation between the norms of the intrinsic dipole moments (in D) obtained with the AUXIS and BASIS approaches. The symbol X refers to F, Cl, Br and I atoms. Four population schemes are investigated (Becke, VDD, standard Hirshfeld and iterative Hirshfeld) in combination with four auxiliary function sets (GEN-A1, GEN-A2, GEN-A3, GEN-A2*). Please note that different scales are used for the graphs

Fig. 5
Average angle between the intrinsic dipole moments obtained with the AUXIS and BASIS approaches. The symbol X refers to F, Cl, Br, and I atoms. Four population schemes are investigated (Becke, VDD, standard Hirshfeld, and iterative Hirshfeld) in combination with four auxiliary function sets (GEN-A1, GEN-A2, GEN-A3, GEN-A2*). Please note that different scales are used for the graphs

Computational performances

In this section, we report the efficiency of our iterative Hirshfeld population analysis implementation, which is the most time-consuming partition scheme discussed here. To this end, we optimized the insulin molecule at the PBE/DZVP/GEN-A2 level of theory employing ADFT. This molecule contains 784 atoms (H, C, N, O, and S) and a total of 3078 electrons. The optimized geometry is depicted in Fig. 6.

Fig. 6
3D representation of the geometrically optimized insulin molecule. Color code: H in white, C in green, N in blue, O in red, and S in yellow

For the optimized insulin structure, we performed IH analyses of the KS (BASIS) and fitted (AUXIS) densities employing a varying number of compute cores. The same level of theory as for the structure optimization was used, i.e., PBE/DZVP/GEN-A2. The resulting timings are depicted in Fig. 7 as a function of the number of cores. All calculations were performed with Intel® Xeon™ E5-2650v2 (2.6 GHz) 8 core CPUs with 4 GB RAM per core. To guide the eye, the individual data points in Fig. 7 are connected. As expected, the analysis of the fitted density is always significantly faster (by a factor of around 2) than that of the KS density. The scaling with respect to the number of cores is rather satisfactory. As Fig. 7 shows, computational savings are still gained when passing from 96 to 128 cores.

Fig. 7
Computational time for an iterative Hirshfeld analysis of insulin employing the Kohn-Sham (BASIS) or fitted (AUXIS) densities as a function of the number of compute cores

To put the timings in Fig. 7 into perspective, we note that the structure optimization of insulin took around 1200 optimization steps. This optimization required 3 weeks on 32 of the Xeon™ cores specified above. Thus, the timings reported here for the iterative Hirshfeld analysis represent a small overhead relative to the structure optimization. Because they scale similarly to the SCF and geometry optimization, this relation also holds for larger numbers of cores, e.g., the 128 cores shown above. In fact, in the case of the insulin molecule, the CPU time needed for the iterative Hirshfeld analysis is only 2 to 3 times larger than for a single-point ADFT energy calculation. Therefore, the population analysis implementation discussed here can be used for larger systems that are of interest in biological chemistry or materials science.

Electrostatic interaction calculations with AMOEBA

In this section, we wish to determine whether the multipole distributions obtained from the population schemes described above are suitable for calculating electrostatic interaction energies within supramolecular complexes. Previous work showed that iterative approaches give atomic charges that reproduce the DFT electrostatic potential better than non-iterative approaches [46]. We use the Na⁺(Tryp)(H₂O) complex to examine the capability of the various population schemes to produce multipole sets for computing non-covalent interactions with the AMOEBA polarizable force field. The tryptamine molecule (Tryp) is derived from the tryptophan amino acid, the carboxylic acid group being replaced by a hydrogen atom, and is classified as a neurotransmitter. Four conformers have been selected from the previous work of Nicely and Lisy (see Fig. 1 in ref. [47] and Fig. 8). For structures A and B, the water molecule interacts both with the sodium cation and with the amino group, making a hydrogen bond with the amino nitrogen. They differ mainly in the orientation of the ethylamine side chain. In structure C, the sodium cation is “sandwiched” between the water and the amino group, whereas in structure D the water molecule accepts a hydrogen bond from the indole N-H.

Fig. 8
Calculated structures for the Na⁺(Tryp)(H₂O) complex and M06-2X relative energies (kJ mol⁻¹)

This complex provides interesting structural pictures with various electrostatic interactions such as charge-dipole, dipole-dipole and polarization effects. The AMOEBA force field was chosen for its high-level treatment of electrostatic interactions, which uses a multipolar expansion up to quadrupoles on each atom and an explicit iterative polarization term. The multipoles have been computed following both the IH and H schemes for the isolated tryptamine and then defined in their local atomic frame using the Orient program [48] to be used in the framework of AMOEBA. The relative energies, taking structure A as reference and computed at the M06-2X/TZVP [49] level, are compared with those obtained from the different atomic multipole sets, and the energetic errors are reported in Fig. 9. When extracting multipoles from the KS density (BASIS), the error is found to be small, in the range of the error of quantum chemistry calculations ("IH GEN-A2*/A3* (BASIS)" histograms). If the multipoles are extracted from the auxiliary density (AUXIS), the error is still very small as long as GEN-A2* is used ("IH GEN-A2* (AUXIS)" histograms). The errors associated with multipoles obtained from the auxiliary density may become large when using the GEN-A2 function set ("IH GEN-A2 (AUXIS)" histograms). Finally, we find that the standard Hirshfeld scheme reproduces such relative energies less accurately than the iterative version in both the BASIS and AUXIS approaches.

Fig. 9
Error (kJ mol⁻¹) on the relative energies of the four conformers between M06-2X and AMOEBA using the different multipole sets. Conformer A is taken as reference

Given the various interactions involved in the different structures, the errors in the relative energies should originate from errors in specific energetic contributions. The graphs in Fig. 10 represent the values of the main components of the electrostatic energy and the total electrostatic energy for structures B, C, and D as a function of the error on the total energy relative to structure A. For the three structures, the largest error comes from the underestimation of the Na⁺-N(amino) interaction, whereas the energy of the Na⁺-O interaction remains relatively constant. To a lesser extent, the water-N(amino) interaction also plays a role in structure B, and the water-N(indole) interaction in structure D. This effect can be readily correlated with the charge and the x-component of the dipole moment of the nitrogen atom of the amino group when they are reported as a function of the energetic error for C. The large error for C in the IH scheme using the GEN-A2 auxiliary function set can be explained by a cumulative error on the charge and dipole moment on N and by the small value of the polarization energy (Fig. 10).

Fig. 10
Main electrostatic contribution error as a function of relative energies between M06-2X and AMOEBA using the different multipole sets (kJ mol⁻¹) for structure B (top, left), structure C (top, right), and structure D (bottom, left). See text for details. Charge and largest dipole component on N atom in the NH₂ group as a function of relative energy for C (bottom, right) between M06-2X and AMOEBA using the different multipole sets (kJ mol⁻¹)

Consequently, the polarization energy of the structures can be compared for the different multipole sets (Fig. 11). Even if the effect is smaller than for the electrostatic component, this term contributes to the non-covalent interactions. Let us first focus on the IH scheme. When the GEN-A2 auxiliary function set is used, the AUXIS and BASIS approaches give polarization energies that may differ by several kJ mol⁻¹. This is especially noticeable for structure C. On the other hand, we obtain very similar polarization energies for each structure when the GEN-A2* auxiliary function set is used (compare the "IH GEN-A2* (BASIS)" vs. "IH GEN-A2* (AUXIS)" data points). When considering the standard Hirshfeld scheme, we find that the polarization energies are significantly larger. For example, for structure C with the BASIS approach (taking GEN-A2*), the polarization energy goes from 65 kJ mol⁻¹ to 85 kJ mol⁻¹.

Fig. 11
AMOEBA polarization energies (kJ mol⁻¹) of the four conformers using the different multipole sets

Conclusions

In this work, we have examined the accuracy of population analyses based on fitted densities in the context of DFT. The main conclusions of our study can be summarized as follows. We found that fitted densities can indeed be used instead of the Kohn-Sham density to extract electrostatic multipoles from DFT calculations. However, the quality of the auxiliary function set used to expand the fitted density has a great impact on the results and should thus be considered with care in applications. With standard sets comprising s and spd angular momentum functions, only qualitative agreement between the BASIS and AUXIS atomic charges is found. Conversely, when using GEN-A2* (or GEN-A3*), the agreement between both approaches is excellent for both atomic charges and higher order multipoles. As seen in the case of insulin, the AUXIS approach offers a significant reduction of computational cost compared to the BASIS one. We finally tested the capability of the multipoles extracted for the tryptamine system to provide sufficiently accurate interaction energies with the AMOEBA force field. In that regard, the iterative Hirshfeld scheme represents a clear improvement over the traditional Hirshfeld scheme. Good results have been obtained with the IH scheme and the GEN-A2* or GEN-A3* auxiliary function sets. Overall, these results encourage us to pursue our ongoing efforts on the implementation of advanced QM/MM schemes that include second and third generation force fields in deMon2k [50, 51].