Introduction

Among the vibrational modes of molecular assemblies such as dimers, clusters, and crystals, low-frequency (LF) modes are of considerable importance in terahertz absorption [1,2,3,4,5], Raman scattering [6,7,8,9], and resonant two-photon ionization [10,11,12] spectroscopy. These methods provide a wealth of information about the intermolecular forces encoded in the translational and librational motions of molecules. If the constituent molecules are sufficiently rigid, the LF vibrations can be represented as relative displacements of rigid bodies and are virtually separable from the high-frequency intramolecular vibrational modes [13, 14]. In molecules containing flexible moieties, however, some intramolecular vibrations may couple with intermolecular motions [15]. The relative contributions of inter- and intramolecular components to LF vibrations are also important in crystal structure prediction [16, 17]. Several methods of modal analysis that explicitly account for coupling between inter- and intramolecular vibrations have been proposed [3, 5, 18,19,20]. However, such fully atomistic treatments are often computationally expensive.

Previously, we proposed a universal formulation for coarse-graining LF vibrational modes that incorporates contributions from inter/intramolecular vibrational coupling [21,22,23]. In this formulation, an atomic displacement vector (3N dimensions for a molecule containing N atoms) is represented by a linear expansion in a lower-dimensional space (the coarse-grained space) spanned by a basis set corresponding to inter- and intramolecular motions, namely three translations, three rotations, and some (at most 3N − 6) intramolecular vibrations per nonlinear molecule. The expansion coefficients can be regarded as components in an internal coordinate system that has its own representation of the stiffness constants and inertial loads of the constituents. In principle, we can restore information about the original vibrations from this representation, namely the atomic displacements, modal stiffnesses, modal masses, and eigenfrequencies. However, if the basis set of the coarse-grained space is inadequately selected, essential details are lost and the calculated results may be erroneous [22]. Conversely, the quality of the basis set can be evaluated by measuring the similarity of the restored data to the original data. In this study, we propose four such indices for quantifying the reliability of coarse-grained representations of inter/intramolecular vibrations. We apply these indices to several hydrogen-bonded dimers, compare their characteristics, and discuss which index is best suited to each type of information to be restored.

Theory

The normal-mode motion of a molecular assembly composed of N atoms is represented as a displacement vector \( {\mathbf{c}}_i \) with 3N dimensions, where i identifies either the zero-frequency modes (three translations, i = 1, 2, 3; three rotations, i = 4, 5, 6) or the “true” vibrations (i = 7, 8, ..., 3N). The full displacement matrix C contains \( {\mathbf{c}}_i \) as its ith column. By definition, each \( {\mathbf{c}}_i \) is normalized but not necessarily orthogonal to the other columns. From C, we select a non-square (3N × n) matrix \( \overset{\sim }{\mathbf{C}} \) that contains n columns (the zero-frequency modes and the LF vibrational modes), where n is the number of selected mode vectors to be coarse-grained and equals the dimension of the coarse-grained space.

\( \overset{\sim }{\mathbf{C}} \) is converted to a mass-weighted displacement (MWD) matrix \( \overset{\sim }{\mathbf{W}} \) using the diagonal (3N × 3N) matrix M of atomic masses, in which each atomic mass appears three times (once per Cartesian component). In Eq. 1, \( {\overset{\sim }{\mathbf{w}}}_i \), the ith column of \( \overset{\sim }{\mathbf{W}} \), is normalized using the matrix L, the square root of the modal mass matrix \( {\mathbf{L}}^2 \) (Eq. 2).

$$ \overset{\sim }{\mathbf{W}}={\mathbf{M}}^{1/2}\overset{\sim }{\mathbf{C}}\;{\mathbf{L}}^{-1} $$
(1)
$$ {\mathbf{L}}^2={\overset{\sim }{\mathbf{C}}}^{\mathrm{T}}\mathbf{M}\overset{\sim }{\mathbf{C}}. $$
(2)

A coarse-grained displacement matrix Ξ is then defined by Eq. 3 using \( \overset{\sim }{\mathbf{B}} \), a non-square (3N × n) coarse-graining matrix detailed in previous papers [21, 22]. In the present work, some improvements have been made to the definition of \( \overset{\sim }{\mathbf{B}} \) (see the “Appendix”).

$$ \boldsymbol{\Xi} =\boldsymbol{\Gamma} {\overset{\sim }{\mathbf{B}}}^{\mathrm{T}}\mathbf{M}\overset{\sim }{\mathbf{C}}. $$
(3)

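Matrix Γ is the inverse of the inertial load matrix \( {\boldsymbol{\Gamma}}^{-1} \) (Eq. 4), which is named by analogy with the matrix \( {\mathbf{G}}^{-1} \) in the GF method [24]. With this definition, Eq. 3 is the mass-weighted least-squares solution of \( \overset{\sim }{\mathbf{C}}\approx \overset{\sim }{\mathbf{B}}\boldsymbol{\Xi} \); that is, \( \overset{\sim }{\mathbf{B}}\boldsymbol{\Xi} \) is the best approximation, in the M-weighted norm, to \( \overset{\sim }{\mathbf{C}} \) within the space spanned by the columns of \( \overset{\sim }{\mathbf{B}} \).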

$$ {\boldsymbol{\Gamma}}^{-1}={\overset{\sim }{\mathbf{B}}}^{\mathrm{T}}\mathbf{M}\overset{\sim }{\mathbf{B}}. $$
(4)

Using the inertial load matrix \( {\boldsymbol{\Gamma}}^{-1} \) of the given coarse-grained coordinate system, Ξ is converted to the coarse-grained MWD matrix U. In Eq. 5, each column of U is normalized using the matrix Λ, the square root of the modal mass matrix \( {\boldsymbol{\Lambda}}^2 \) (Eq. 6). These representations for the coarse-grained coordinate system are analogous to those for the Cartesian coordinate system (Eqs. 1 and 2):

$$ \mathbf{U}={\boldsymbol{\Gamma}}^{-1/2}{\boldsymbol{\Xi} \boldsymbol{\Lambda}}^{-1} $$
(5)
$$ {\boldsymbol{\Lambda}}^2={\boldsymbol{\Xi}}^{\mathrm{T}}{\boldsymbol{\Gamma}}^{-1}\boldsymbol{\Xi} . $$
(6)

The modal mass matrix \( {\boldsymbol{\Lambda}}^2 \) is nearly identical to \( {\mathbf{L}}^2 \) when the transformer \( \overset{\sim }{\mathbf{B}} \) is chosen appropriately. The MWD matrix in the original (Cartesian) coordinate system is then approximately restored from U by the following equation:

$$ \overline{\mathbf{W}}={\mathbf{M}}^{1/2}\overset{\sim }{\mathbf{B}}{\boldsymbol{\Gamma}}^{1/2}\mathbf{U}. $$
(7)
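The chain of Eqs. 1–7 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the in-house program used in this work: the function names `sym_power` and `coarse_grain` are hypothetical, every matrix square root is assumed to be the symmetric (principal) root, and the masses, \( \overset{\sim }{\mathbf{B}} \), and \( \overset{\sim }{\mathbf{C}} \) are random stand-ins. The example checks one property that follows directly from the equations: when every column of \( \overset{\sim }{\mathbf{C}} \) lies in the space spanned by the columns of \( \overset{\sim }{\mathbf{B}} \), the restored matrix \( \overline{\mathbf{W}} \) coincides with \( \overset{\sim }{\mathbf{W}} \).

```python
import numpy as np


def sym_power(a, p):
    """Real power of a symmetric positive-definite matrix (principal root assumed)."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * vals**p) @ vecs.T


def coarse_grain(masses3n, c_sel, b_sel):
    """Sketch of Eqs. 1-7; returns (W_tilde, W_bar).

    masses3n : (3N,) diagonal of M (each atomic mass repeated for x, y, z)
    c_sel    : (3N, n) selected displacement vectors, C~
    b_sel    : (3N, n) coarse-graining matrix, B~ (hypothetical input here)
    """
    m = np.diag(masses3n)
    m_half = np.diag(np.sqrt(masses3n))
    l2 = c_sel.T @ m @ c_sel                               # Eq. 2: modal mass matrix L^2
    w_tilde = m_half @ c_sel @ sym_power(l2, -0.5)         # Eq. 1: W~ = M^1/2 C~ L^-1
    gamma_inv = b_sel.T @ m @ b_sel                        # Eq. 4: inertial load matrix
    xi = np.linalg.solve(gamma_inv, b_sel.T @ m @ c_sel)   # Eq. 3: Xi = Gamma B~^T M C~
    lam2 = xi.T @ gamma_inv @ xi                           # Eq. 6: modal mass matrix Lambda^2
    # Eq. 5: U = Gamma^-1/2 Xi Lambda^-1, using Gamma^-1/2 = (Gamma^-1)^1/2
    u = sym_power(gamma_inv, 0.5) @ xi @ sym_power(lam2, -0.5)
    # Eq. 7: W_bar = M^1/2 B~ Gamma^1/2 U, using Gamma^1/2 = (Gamma^-1)^-1/2
    w_bar = m_half @ b_sel @ sym_power(gamma_inv, -0.5) @ u
    return w_tilde, w_bar


rng = np.random.default_rng(0)
masses3n = np.repeat(rng.uniform(1.0, 16.0, size=4), 3)   # 4 hypothetical atoms, 3N = 12
b_sel = rng.normal(size=(12, 5))                           # hypothetical B~, n = 5
c_sel = b_sel @ rng.normal(size=(5, 5))                    # C~ lying exactly in span(B~)
c_sel /= np.linalg.norm(c_sel, axis=0)                     # normalized columns, as in the text
w_tilde, w_bar = coarse_grain(masses3n, c_sel, b_sel)
print(np.allclose(w_tilde, w_bar))  # True: a complete basis restores W~ exactly
```

With symmetric roots, the columns of \( \overline{\mathbf{W}} \) come out orthonormal whatever \( \overset{\sim }{\mathbf{B}} \) is, because \( {\overline{\mathbf{W}}}^{\mathrm{T}}\overline{\mathbf{W}}={\mathbf{U}}^{\mathrm{T}}\mathbf{U} \); an incomplete basis therefore shows up as a rotation of the restored vectors rather than as a loss of norm.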

The restored MWD matrix \( \overline{\mathbf{W}} \) contains \( {\overline{\mathbf{w}}}_i \) as its ith column; each column is normalized but not necessarily orthogonal to the other columns.

We define a correlation matrix R as the product of \( {\overset{\sim }{\mathbf{W}}}^{\mathrm{T}} \) and \( \overline{\mathbf{W}} \); therefore, its component Rij is the inner product of \( {\overset{\sim }{\mathbf{w}}}_i \) and \( {\overline{\mathbf{w}}}_j \).

$$ \mathbf{R}={\overset{\sim }{\mathbf{W}}}^{\mathrm{T}}\overline{\mathbf{W}}. $$
(8)

By definition, R will be a unit matrix if \( \overline{\mathbf{W}} \) is identical to \( \overset{\sim }{\mathbf{W}} \). To quantify the similarity between \( \overset{\sim }{\mathbf{W}} \) and \( \overline{\mathbf{W}} \), we define four types of fidelity indices (F1 to F4) as follows:

$$ {F}_1=\frac{1}{n}\mathrm{tr}\left(\mathbf{R}\right) $$
(9)
$$ {F}_2={\left(\prod \limits_{i=1}^n{R}_{ii}\right)}^{\frac{1}{n}} $$
(10)
$$ {F}_3=\prod \limits_{i=1}^n{R}_{ii} $$
(11)
$$ {F}_4=\kern0.5em \det \left(\mathbf{R}\right) $$
(12)
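Given \( \overset{\sim }{\mathbf{W}} \) and \( \overline{\mathbf{W}} \), Eqs. 8–12 translate directly into code. The sketch below (NumPy; the function name `fidelity_indices` is hypothetical) computes the four indices. The toy vectors are constructed by hand so that a single restored column has an overlap of 0.9 with its original. Returning NaN for F2 when some Rii is negative is a choice made here, not something prescribed by Eq. 10.

```python
import numpy as np


def fidelity_indices(w_tilde, w_bar):
    """Fidelity indices F1-F4 (Eqs. 9-12) from the original and restored MWD matrices."""
    r = w_tilde.T @ w_bar                 # Eq. 8: correlation matrix R
    r_diag = np.diag(r)
    n = r.shape[0]
    f1 = np.trace(r) / n                  # Eq. 9: arithmetic mean of R_ii
    f3 = np.prod(r_diag)                  # Eq. 11: product of R_ii
    # Eq. 10: geometric mean of R_ii (real-valued only if every R_ii >= 0)
    f2 = f3 ** (1.0 / n) if np.all(r_diag >= 0) else float("nan")
    f4 = np.linalg.det(r)                 # Eq. 12: also sensitive to off-diagonal R_ij
    return float(f1), float(f2), float(f3), float(f4)


# Toy example: the first restored vector is tilted so that its overlap is 0.9
theta = np.arccos(0.9)
w_tilde = np.eye(3)[:, :2]
w_bar = np.array([[np.cos(theta), 0.0],
                  [0.0,           1.0],
                  [np.sin(theta), 0.0]])
print(fidelity_indices(w_tilde, w_bar))  # F1 = 0.95, F2 ~ 0.949, F3 = F4 = 0.9 (up to rounding)
```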

According to the original formulation, the back-transformation of the eigenfrequency matrix Ω using U and Γ gives the stiffness matrix Φ. The element Φij of the stiffness matrix is a force constant with respect to the given coarse-grained internal coordinate system:

$$ \boldsymbol{\Phi} ={\boldsymbol{\Gamma}}^{-1/2}\mathbf{U}{\boldsymbol{\Omega}}^2{\mathbf{U}}^{\mathrm{T}}{\boldsymbol{\Gamma}}^{-1/2} $$
(13)

Because \( {\boldsymbol{\Gamma}}^{-1} \) plays the role of a mass, Eq. 13 reduces, for a single coordinate with \( {\boldsymbol{\Gamma}}^{-1}=m \) (for which U = ±1), to the familiar harmonic relation \( \Phi =m{\omega}^2 \).

The matrix Φ also includes an error originating from the choice of the coarse-graining matrix \( \overset{\sim }{\mathbf{B}} \). Therefore, the diagonalization of \( {\boldsymbol{\Gamma}}^{1/2}\boldsymbol{\Phi}{\boldsymbol{\Gamma}}^{1/2} \) need not yield eigenfrequencies identical to the original ones. We define the reproduced eigenfrequency matrix \( \overline{\boldsymbol{\Omega}} \) as the matrix that satisfies the following eigenvalue equation:

$$ {\boldsymbol{\Gamma}}^{1/2}\boldsymbol{\Phi}{\boldsymbol{\Gamma}}^{1/2}\overline{\mathbf{U}}=\overline{\mathbf{U}}{\overline{\boldsymbol{\Omega}}}^2. $$
(14)
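Equation 14 is a standard symmetric eigenvalue problem. The sketch below (NumPy; the function name `reproduced_wavenumbers` is hypothetical) diagonalizes \( {\boldsymbol{\Gamma}}^{1/2}\boldsymbol{\Phi}{\boldsymbol{\Gamma}}^{1/2} \) and converts the eigenvalues to wavenumbers. It assumes that Φ and \( {\boldsymbol{\Gamma}}^{-1} \) are given in mutually consistent units such that the eigenvalues are in mdyn Å⁻¹ amu⁻¹ (the units in which Gaussian reports force constants and reduced masses); the conversion factor is computed from SI constants rather than taken from the authors' program.

```python
import numpy as np

# Conversion from sqrt(mdyn A^-1 amu^-1) to cm^-1, computed from SI constants
MDYN_PER_A = 100.0              # 1 mdyn/A = 100 N/m
AMU = 1.66053906660e-27         # kg
C_CM = 2.99792458e10            # speed of light in cm/s
CONV = np.sqrt(MDYN_PER_A / AMU) / (2.0 * np.pi * C_CM)   # about 1302.8


def sym_power(a, p):
    """Real power of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * vals**p) @ vecs.T


def reproduced_wavenumbers(phi, gamma_inv):
    """Solve Eq. 14 and return wavenumbers in cm^-1 (ascending).

    phi       : (n, n) stiffness matrix (assumed mdyn/A-compatible units)
    gamma_inv : (n, n) inertial load matrix (assumed amu-compatible units)
    """
    g_half = sym_power(gamma_inv, -0.5)        # Gamma^1/2 = (Gamma^-1)^-1/2
    h = g_half @ phi @ g_half                  # Gamma^1/2 Phi Gamma^1/2
    h = 0.5 * (h + h.T)                        # symmetrize against round-off
    omega2, _u_bar = np.linalg.eigh(h)         # Omega-bar^2 (ascending) and U-bar
    # Negative eigenvalues (e.g. numerical noise) are reported as negative wavenumbers
    return np.sign(omega2) * np.sqrt(np.abs(omega2)) * CONV


# One coordinate: Phi = 5 mdyn/A, inertial load 1 amu
print(reproduced_wavenumbers(np.array([[5.0]]), np.array([[1.0]])))  # about 2913 cm^-1
```

Using `numpy.linalg.eigh` exploits the symmetry of \( {\boldsymbol{\Gamma}}^{1/2}\boldsymbol{\Phi}{\boldsymbol{\Gamma}}^{1/2} \) and returns orthonormal eigenvectors, which play the role of \( \overline{\mathbf{U}} \).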

Computational details

We examined six hydrogen-bonded homodimers, those of formic acid, acetic acid, trichloroacetic acid, formamide, formamidine, and urea (Fig. 1). The three carboxylic acids were taken from our previous study [22], and three new compounds were added to check the applicability of our method to nitrogen-containing hydrogen-bonded systems. The geometries of the monomers and dimers were optimized at the Hartree–Fock level of theory with the 6-311G(d,p) basis set to maintain continuity with our previous studies [21, 22]. Although this level of theory is not sufficient for an accurate description of supramolecular systems, the present method itself does not depend on the level of theory. Normal mode analysis was performed on the optimized geometries at the same level of theory. These molecular orbital calculations were performed with the Gaussian 09W suite of programs [25]. The displacement vectors and frequencies of the normal-mode vibrations were extracted from the output files and passed to a coarse-graining program developed in-house, which determined the stiffness constants and the four fidelity indices. The displacement vectors were visualized using Jmol version 14.4.0 [26].

Fig. 1

Structures of the hydrogen-bonded dimers studied

Results and discussion

According to the definitions given in the “Theory” section, the indices F1 to F4 equal 1 if \( \overline{\mathbf{W}} \) and \( \overset{\sim }{\mathbf{W}} \) are identical and are smaller than 1 if these matrices differ. F1 and F2 are the arithmetic and geometric means of Rii, respectively, so F1 is never smaller than F2 (for non-negative Rii, by the inequality of arithmetic and geometric means); F2 is therefore a more severe criterion than F1. As n increases, however, both means dilute a mismatch confined to a few columns, so F2 underestimates local mismatches between \( \overline{\mathbf{W}} \) and \( \overset{\sim }{\mathbf{W}} \). F3, the nth power of F2, is not subject to this dilution and is therefore more severe still. In addition to the similarity between \( {\overset{\sim }{\mathbf{w}}}_i \) and \( {\overline{\mathbf{w}}}_i \), F4 takes into account the orthogonality between \( {\overset{\sim }{\mathbf{w}}}_i \) and \( {\overline{\mathbf{w}}}_j \) (i ≠ j). As a result, F4 is considered to be the most severe criterion of the four.
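The dilution effect can be illustrated with a hypothetical R whose diagonal contains one poorly restored column (Rii = 0.5) and n − 1 perfectly restored ones. This is a toy calculation, not data from the dimers studied here, and the function name is hypothetical.

```python
import numpy as np


def diagonal_indices(r_diag):
    """F1, F2, and F3 (Eqs. 9-11) from the diagonal elements of R."""
    n = len(r_diag)
    f1 = float(np.mean(r_diag))
    f3 = float(np.prod(r_diag))
    f2 = f3 ** (1.0 / n)
    return f1, f2, f3


for n in (6, 12, 24, 48):
    r_diag = np.ones(n)
    r_diag[-1] = 0.5          # one poorly restored column
    f1, f2, f3 = diagonal_indices(r_diag)
    print(f"n = {n:2d}: F1 = {f1:.3f}, F2 = {f2:.3f}, F3 = {f3:.3f}")
```

As n grows, F1 and F2 creep toward 1 while F3 stays at 0.5: the product never forgets the poorly restored column.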

Figure 2 compares the four fidelity indices F1 to F4 as functions of the coarse-graining dimension n. The stiffness constant \( {\Phi}_{\mathrm{T}x,\mathrm{T}x} \), corresponding to translation along the x-axis, is also included in each plot as a representative intermolecular stiffness constant. Except for a few data points, F1 and F2 varied within a narrow range between 0.95 and 1, suggesting that discrepancies between individual columns of \( \overset{\sim }{\mathbf{W}} \) and \( \overline{\mathbf{W}} \) were averaged out over the dimension n. Thus, F1 and F2 can be understood as indicators of the average similarity between the original and restored vectors. F3 and F4 behave similarly to each other and vary within a much wider range, between 0.6 and 1. This behavior follows from the fact that F3 is the product of the similarities between \( {\overset{\sim }{\mathbf{w}}}_i \) and \( {\overline{\mathbf{w}}}_i \) over all i, and F4 approximately equals this product when R is nearly diagonal; a single poorly restored column therefore lowers both indices markedly. Thus, F3 and F4 can be understood as indicators of the completeness of the restored vectors.

Fig. 2

a–f The four fidelity indices (left-hand y-axes) versus the coarse-graining dimension, calculated for the dimers of a formic acid, b acetic acid, c trichloroacetic acid, d formamide, e formamidine, and f urea. Stiffness constants corresponding to x-axis translation (right-hand y-axes) are superimposed

F3 and F4 both show an alternating pattern as a function of n: the values obtained at odd n are lower than those obtained at even n. This pattern arises from the symmetry of the molecular systems. In the case of the formic acid dimer, for example, going from n = 12 to n = 13 requires adding a new MWD vector, so that \( \overset{\sim }{\mathbf{W}} \) consists of six zero-frequency motions (translations and rotations) and seven vibrations. The 13th column (\( {\overset{\sim }{\mathbf{w}}}_{13} \)) corresponds to the seventh vibration, which is characterized as in-phase coupling of the intramolecular bending motions of the O–C=O moieties (Fig. 3a). To coarse-grain this motion, the coarse-graining matrix \( \overset{\sim }{\mathbf{B}} \) must contain displacement vectors that represent the symmetric combination of the O–C=O bending motions. In this study, however, each column of \( \overset{\sim }{\mathbf{B}} \) is defined to represent an intramolecular motion of a single molecule in the dimer (see the “Appendix”). The coarse-graining of \( {\overset{\sim }{\mathbf{w}}}_{13} \) therefore requires two vectors, one for the bending motion of each molecule; that is, \( \overset{\sim }{\mathbf{B}} \) needs at least 14 columns to represent \( {\overset{\sim }{\mathbf{w}}}_{13} \) adequately. This augmented \( \overset{\sim }{\mathbf{B}} \) matrix can also adequately represent \( {\overset{\sim }{\mathbf{w}}}_{14} \), the MWD vector of the eighth vibration, which is characterized as antiphase coupling of the O–C=O bending motions (Fig. 3b). These considerations explain the sharp increases in the fidelity indices at n = 14.

Similarly, successive pairs of vibrational columns correspond to in-phase and antiphase couplings of the same intramolecular vibration, which accounts for the alternating behavior of the fidelity indices in Fig. 2a. The alternating behavior of F3 and F4 for the other dimers can be explained in the same manner. On the basis of these results, we propose 0.8 as a threshold for these fidelity indices: below this value, the intramolecular vibrational motion(s) included in \( \overset{\sim }{\mathbf{B}} \) are likely insufficient to express the vibrations of the molecular system. In contrast, this type of error is hardly noticeable in F1 and F2.
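A toy calculation shows why a per-molecule basis needs both bending vectors before either coupled mode can be restored. Here two hypothetical local coordinates, e_a and e_b, stand for the O–C=O bending of the two monomers; masses are set to 1, and restoration is a plain least-squares projection followed by renormalization, a simplification of Eqs. 3–7. The function name is hypothetical.

```python
import numpy as np


def restored_overlap(target, basis):
    """Least-squares projection of a unit vector onto span(basis), renormalized;
    returns the overlap with the original (unit masses assumed)."""
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    restored = basis @ coeffs
    restored = restored / np.linalg.norm(restored)
    return float(target @ restored)


e_a = np.array([1.0, 0.0])                 # bending of monomer A (hypothetical)
e_b = np.array([0.0, 1.0])                 # bending of monomer B (hypothetical)
in_phase = (e_a + e_b) / np.sqrt(2.0)      # cf. the seventh vibration, Fig. 3a
anti_phase = (e_a - e_b) / np.sqrt(2.0)    # cf. the eighth vibration, Fig. 3b

only_a = e_a[:, None]                      # basis holding one monomer's bending only
both = np.column_stack([e_a, e_b])         # per-molecule pair

print(restored_overlap(in_phase, only_a))   # ~0.707 (= 1/sqrt(2)), below 0.8
print(restored_overlap(in_phase, both))     # ~1.0
print(restored_overlap(anti_phase, both))   # ~1.0
```

Adding the second vector restores the in-phase and antiphase combinations at once, mirroring the jump in the fidelity indices when n goes from an odd to the next even value.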

Fig. 3

a, b Atomic displacement vectors for a the seventh and b the eighth normal-mode vibrations of the formic acid dimer

In Fig. 2b, c, and f, this alternating behavior is disturbed in places, implying inconsistencies between the vibrational motions of a monomer within a dimer and those of an isolated monomer. For the acetic acid dimer (Fig. 2b) and the trichloroacetic acid dimer (Fig. 2c), the stiffness constants obtained with n = 12 and 13 are unnaturally low (0.0 and 7.9, respectively; data points not shown), implying that the coarse-graining of the molecular vibrations was inadequate. As shown previously [22], this error arises from the lack of displacement vectors corresponding to an intramolecular vibrational mode (libration of the methyl group) in the \( \overset{\sim }{\mathbf{B}} \) matrix under these computational conditions. Compared with those at higher n, the fidelity indices at these dimensions are rather low: F1 and F2 are below 0.95, and F3 and F4 are only 0.0–0.5. These results suggest that the absence of an intramolecular vibrational motion that contributes to the vibration of the whole system severely decreases the fidelity indices. In some cases, such severe decreases coincide with an anomaly in the trend of the stiffness constant. For most coarse-graining dimensions, however, the stiffness constant increases stepwise with increasing n. As described elsewhere [22, 23], the stiffness constants calculated under low-dimensional coarse-graining conditions are apparent stiffness constants (\( {\Phi}_{\mathrm{app}} \)), which are normally smaller than the actual ones (\( {\Phi}_{\mathrm{inter}} \)) because of modulation by coupling with the internal elasticity of the constituent molecules. On the basis of simple four-body models, we have formulated a relationship between the height of the stepwise increase and the intramolecular stiffness constant (\( {\Phi}_{\mathrm{intra}} \)). According to this formulation, an erroneous evaluation of the step height severely compromises the evaluation of the intramolecular stiffness constant.

The present study provides a rational criterion for choosing a coarse-graining dimension at which the calculation yields an adequate value of \( {\Phi}_{\mathrm{app}} \). If we adopt a threshold of 0.8 for F4, the minimum reliable dimension for coarse-graining is 12 for the formic acid, formamide, formamidine, and urea dimers and 14 for the acetic acid and trichloroacetic acid dimers. These values depend on the degrees of freedom of the LF molecular motion.
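The selection rule can be written as a small helper. The F4 values below are invented for illustration and are not the values plotted in Fig. 2; the function name is hypothetical, and “minimum reliable dimension” is read here, as an assumption, as the smallest n whose F4 reaches the threshold.

```python
def minimum_reliable_dimension(f4_by_n, threshold=0.8):
    """Smallest coarse-graining dimension n with F4(n) >= threshold, or None."""
    for n in sorted(f4_by_n):
        if f4_by_n[n] >= threshold:
            return n
    return None


# Invented F4 values that alternate with n (not data from this study)
f4_example = {10: 0.55, 11: 0.48, 12: 0.86, 13: 0.71, 14: 0.93, 15: 0.84, 16: 0.95}
print(minimum_reliable_dimension(f4_example))        # 12
print(minimum_reliable_dimension(f4_example, 0.9))   # 14
```

Because F4 alternates with n, passing the threshold at one n does not guarantee it at n + 1, as the invented value at n = 13 shows.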

We also examined the eigenfrequencies reproduced from the stiffness constants and inertial loads in the given coarse-grained internal coordinate system (Eq. 14). Figure 4 shows the wavenumber \( {\nu}_j \) (in \( {\mathrm{cm}}^{-1} \)) calculated from the element of \( \overline{\boldsymbol{\Omega}} \) corresponding to the intermolecular stretching vibration (the jth mode), in which antisymmetric translation of the two monomers predominates. The frequencies are nearly constant over a wide range of n, although some anomalies appear where the fidelity indices decrease significantly (n = 36 and 37 for acetic acid and n = 30 and 31 for trichloroacetic acid; see Fig. 2). At or above the minimum reliable dimension, the reproduced frequencies are nearly identical to those obtained from full-dimensional calculations (i.e., without coarse-graining). These results indicate that, in the given coordinate system, the ratio of the stiffness constant to the inertial load is adequately maintained irrespective of the coarse-graining dimension.

We also examined the modal masses calculated using Eq. 6. Figure 5 shows \( {\lambda}_j^2 \), the jth diagonal element of \( {\boldsymbol{\Lambda}}^2 \) calculated in the coarse-grained space, for various dimensions. Except for the trichloroacetic acid dimer, the values are almost constant over a wide range of n. The stepwise increase at n = 24 for trichloroacetic acid is probably connected to the stepwise increase in \( {\Phi}_{\mathrm{T}x,\mathrm{T}x} \) at the same dimension; errors in the apparent stiffness seem to be compensated by errors in the apparent inertial load when the frequencies are reproduced using Eq. 14. The ideal modal mass matrix is diagonal, but the actual \( {\boldsymbol{\Lambda}}^2 \) is not. In previous studies [21, 22], we approximated \( {\boldsymbol{\Lambda}}^2 \) as diagonal when obtaining the normalized U matrix (Eq. 5). However, the above considerations suggest that such an arbitrary modification of \( {\boldsymbol{\Lambda}}^2 \) may cause unpredictable errors in the subsequent computations. Therefore, all calculations in the present study used the full, nondiagonal \( {\boldsymbol{\Lambda}}^2 \). Consequently, and contrary to our previous prediction [21], the behavior of the modal masses is not a good measure of the completeness of the selected coarse-graining basis set.
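The difference between the earlier diagonal approximation of \( {\boldsymbol{\Lambda}}^2 \) and the full matrix used here can be checked numerically. In this sketch (NumPy; function names hypothetical; Ξ and \( {\boldsymbol{\Gamma}}^{-1} \) are random stand-ins; symmetric matrix roots assumed), the full \( {\boldsymbol{\Lambda}}^2 \) makes the columns of U orthonormal, whereas keeping only its diagonal normalizes each column but leaves the columns non-orthogonal.

```python
import numpy as np


def sym_power(a, p):
    """Real power of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * vals**p) @ vecs.T


def u_full(xi, gamma_inv):
    """Eq. 5 with the full, nondiagonal Lambda^2 of Eq. 6."""
    lam2 = xi.T @ gamma_inv @ xi
    return sym_power(gamma_inv, 0.5) @ xi @ sym_power(lam2, -0.5)


def u_diag(xi, gamma_inv):
    """Eq. 5 with Lambda^2 replaced by its diagonal (the earlier approximation)."""
    lam2_diag = np.diag(xi.T @ gamma_inv @ xi)
    return (sym_power(gamma_inv, 0.5) @ xi) / np.sqrt(lam2_diag)  # scales each column


rng = np.random.default_rng(1)
a = rng.normal(size=(4, 4))
gamma_inv = a @ a.T + 4.0 * np.eye(4)     # random symmetric positive-definite stand-in
xi = rng.normal(size=(4, 4))              # random stand-in for Xi

for name, u in (("full", u_full(xi, gamma_inv)), ("diagonal", u_diag(xi, gamma_inv))):
    gram = u.T @ u
    print(name, np.allclose(gram, np.eye(4)), np.allclose(np.diag(gram), 1.0))
# full True True
# diagonal False True
```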

Fig. 4

a–f Restored frequency corresponding to stretching of the intermolecular bond as a function of the coarse-graining dimension for the dimers of a formic acid, b acetic acid, c trichloroacetic acid, d formamide, e formamidine, and f urea

Fig. 5

a–f The modal mass corresponding to stretching of the intermolecular bond as a function of the coarse-graining dimension for the dimers of a formic acid, b acetic acid, c trichloroacetic acid, d formamide, e formamidine, and f urea

Conclusions

We verified the reliability of our method for coarse-graining intermolecular vibrations by quantifying the fidelity of the restored MWD vectors. We proposed four fidelity indices, F1 to F4, each of which measures the similarity between the original and restored MWD vectors in a different way. Of these four indices, F4, the determinant of the correlation matrix R between the original and restored MWD matrices, was the most severe criterion, and we tentatively proposed 0.8 as the lowest index value indicating a reliable coarse-graining calculation. With this criterion, the minimum reliable dimension for the molecular dimers was found to be 12 or 14, depending on the degrees of freedom of the LF molecular motion. Whereas the stiffness constants depended noticeably on the coarse-graining dimension n, the eigenfrequencies and modal masses were not very sensitive to n. These findings suggest that coarse-graining analyses using our method are useful for deriving the effective stiffnesses of intermolecular forces, facilitating efficient calculations of LF vibrations for large systems such as molecular crystals and biomacromolecules. A formulation for calculating LF phonons using coarse-grained force constants is in preparation and will be published elsewhere.