Introduction

X-ray and neutron studies have revealed abundant hydrogen bonds and regular network patterns in the crystal structures of cellulose and chitin (Langan et al. 1999; Nishiyama et al. 2002, 2003, 2011; Wada et al. 2004; Deringer et al. 2016; Ogawa et al. 2019; Sikorski et al. 2009; Naito et al. 2016; Chen et al. 2014). Some authors regarded hydrogen bonds as the key factor governing the assembly of polymer chains and used them to explain many physical properties of cellulose. Recent studies have gradually reshaped this view. The impact of hydrogen bonds on the peeling-off of cellotetraose (Bergenstråhle et al. 2010) and the proposal of the hydrophobicity of cellulose (Lindman et al. 2010, 2021; Medronho et al. 2014; Glasser et al. 2012) both indicated that the contribution of hydrogen bonding to the insolubility of cellulose had been overemphasized.

After several years of debate, it now seems established that other noncovalent interactions, such as electrostatic and London dispersion interactions, are also responsible for the tight stacking of chains (Jarvis 2023). A recent detailed review evaluated the “exaggerated” role attributed to hydrogen bonds in the properties of cellulose over the past decades (Wohlert et al. 2023). Based on the linear trends of the heat of evaporation of analog molecules with the number of hydroxy groups and the molecular weight, the inter-chain hydrogen bond energy was estimated to be 24 kJ/mol in crystalline cellulose Iβ (Nishiyama 2018), and the London dispersion interaction was estimated to be 67 kJ/mol per glucose based on empirical Lennard-Jones potential parameters from the GLYCAM06 force field. Still, first-principles quantification of the internal energy of cellulose and chitin is rare (Deringer et al. 2016). We previously quantified the partition of the noncovalent interactions in chitin and chitosan based on DFT-D2 calculations (Chen et al. 2021) using an energy decomposition analysis based on a low-dimensional fragment approach (Deringer et al. 2016). Here, we extend this method to the systematic analysis of cellulose, chitin, chitosan, and their allomorphs, as well as their monomers, using three generations of dispersion correction (D2 (Grimme 2006), D3 (Grimme et al. 2010), and D4 (Caldeweyher et al. 2017, 2019)), in which polarizability is either considered or not. In addition, the upper limit of hydrogen bond strength is estimated from the energy profile of hydroxy rotation, providing a basic, quantitative understanding of the noncovalent components of the crystalline polysaccharides.

Computational methods

Model construction

Eight crystalline polysaccharides whose atomic structures are available were considered: cellulose Iα (Nishiyama et al. 2003), Iβ (Nishiyama et al. 2002), II (Langan et al. 1999; Chen et al. 2015), IIII (Wada et al. 2004), α-chitin-A (Sikorski et al. 2009; Deringer et al. 2016), α-chitin-B (Sikorski et al. 2009; Deringer et al. 2016), β-chitin (Nishiyama et al. 2011), and chitosan (Naito et al. 2016). The crystals of β-d-glucose and β-d-cellobiose (Jeffrey 1968) were also included. The unit cells and the corresponding fragments of these crystals are all presented in Fig. 1. The computational setup is identical to that of our former study (Chen et al. 2021), and a detailed description is given in the following subsections.

Fig. 1
Unit cell representation of cellulose, chitin, chitosan, glucose, and cellobiose. (All crystal structures will be deposited in glyco3D after publication)

Energy minimization

Dispersion-corrected DFT calculations under periodic boundary conditions (PBC) were performed using Quantum Espresso (QE) (Giannozzi et al. 2009, 2017) and the VASP package (Kresse and Furthmüller 1996). The D2 (Grimme 2006) and D3 (Grimme et al. 2010) corrections are implemented in QE, and D4 (Caldeweyher et al. 2017, 2019) can be used with VASP (Hafner et al. 1997). The generalized gradient approximation (GGA) functional PBE (Perdew et al. 1996) was used for the geometry optimization of crystals. The total energy and force convergence thresholds for ionic minimization were set to 1.0e-6 Ry and 1.0e-5 Ry/Bohr, respectively. The kinetic energy cutoff for the wave functions was 160 Ry. The k-point mesh was set to (2, 2, 2, 0, 0, 0), that is, a 2 × 2 × 2 grid with no offset.

Energy decomposition analysis

Classical molecular mechanics (MM) represents the intermolecular noncovalent interaction as the sum of Coulomb interactions between point charges and Lennard-Jones interactions between each pair of atoms, the latter including the London dispersion and Pauli repulsion terms. In DFT calculations, the intermolecular energy comprises terms such as electrostatics, exchange, induction, and dispersion. In both cases, the interaction can be simplified into the dispersion energy plus everything else, which we group together as electrostatic interactions. The molecular interaction energy within a crystal can then be written as Eq. (1).

$$E_{int} = E_{elec} + E_{disp}$$
(1)

Further decomposition for intra-chain and inter-chain terms results in four energy terms, which are the interchain electrostatic energy (Einter_E), the intrachain electrostatic energy (Eintra_E), the interchain dispersion energy (Einter_D), and the intrachain dispersion energy (Eintra_D) as presented in Eq. (2).

$$E_{int} = E_{inter\_E} + E_{intra\_E} + E_{inter\_D} + E_{intra\_D}$$
(2)

We rely on the low-dimensional fragment approach widely used in DFT-based materials simulations (Deringer et al. 2016) to estimate the contribution of interchain interactions. In brief, a DFT calculation was performed for the three-dimensional crystal (3D: the raw unit cell under PBC) with relaxation of both the atomic coordinates and the crystal lattice, giving the optimized structure and energy (noted as E3D_disp, top left in Fig. 2). One isolated structural fragment (1D) was then constructed by computationally “cleaving” it from the fully relaxed 3D structure. This is achieved by keeping one chain within the supercell and enlarging the transverse lattice dimensions by a factor of two, which forcibly separates the chain from its periodic images, as shown at the top right of Fig. 2. A total-energy calculation was subsequently performed with both the box size and the coordinates frozen, and the resulting energy is noted as E1D_disp. This setup differs slightly from our previous work (Chen et al. 2022), in which both the enlarged box and the atoms were relaxed, and it makes Eintra_E truly constant. In the previous studies, Eintra_E was “assumed” to be unchanged, which was not the case, as seen by comparing Table 1 and Table S1. By switching off the dispersion correction while keeping the atoms frozen in both the 3D and the 1D systems, two further energy terms are obtained, noted as E3D_nodisp and E1D_nodisp, corresponding to the bottom left and bottom right states in Fig. 2, respectively. The decomposed terms then follow from Eqs. (3), (4), and (5).

Table 1 Decomposed energy (Einter_E, Einter_D, Eintra_D, ECohe_E) of crystals (Glu represents glucose and CB is cellobiose)

$$E_{intra\_D} = (E_{1D\_disp} - E_{1D\_nodisp})/N$$
(3)
$$E_{inter\_D} = (E_{3D\_disp} - E_{3D\_nodisp})/N - E_{intra\_D}$$
(4)
$$E_{inter\_E} = E_{3D\_nodisp}/N$$
(5)

where N stands for the number of residues per unit cell.

The cohesion energy of crystals per residue equals the sum of Einter_D and Einter_E (Eq. (6)).

$$E_{Cohe\_E} = E_{inter\_D} + E_{inter\_E}$$
(6)
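The bookkeeping of Eqs. (3), (4), and (6) can be sketched in a few lines of Python. This is only an illustration, not the script used in this work: the function and field names are hypothetical, the four total energies are assumed to refer to the same number of residues N and to share one energy unit, and Einter_E is passed in as an input rather than derived.

```python
from dataclasses import dataclass


@dataclass
class Decomposition:
    e_intra_d: float  # intrachain dispersion energy per residue, Eq. (3)
    e_inter_d: float  # interchain dispersion energy per residue, Eq. (4)
    e_cohe: float     # cohesion energy per residue, Eq. (6)


def decompose(e3d_disp, e3d_nodisp, e1d_disp, e1d_nodisp, e_inter_e, n):
    """Apply Eqs. (3), (4), and (6) to four total energies.

    Assumption: all energies use the same unit and the same residue count n.
    The sign of each term follows the sign of the input energies.
    """
    # Eq. (3): dispersion contribution within the isolated 1D fragment
    e_intra_d = (e1d_disp - e1d_nodisp) / n
    # Eq. (4): total dispersion in the 3D crystal minus the intrachain part
    e_inter_d = (e3d_disp - e3d_nodisp) / n - e_intra_d
    # Eq. (6): cohesion energy as the sum of the two interchain terms
    e_cohe = e_inter_d + e_inter_e
    return Decomposition(e_intra_d, e_inter_d, e_cohe)


# Made-up numbers, chosen only to exercise the arithmetic
result = decompose(-25.0, -22.0, -10.0, -9.0, e_inter_e=-0.7, n=2)
print(result)
```

Keeping the intrachain term separate makes it easy to see that Eq. (4) removes exactly the dispersion energy already present in the isolated chain.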

The energy decomposition is illustrated graphically in Fig. 2, taking cellulose Iβ as an example. In the graph, E1D_disp is the total energy of one isolated structural fragment with the dispersion correction switched on, whereas E1D_nodisp is the energy of the same fragment without the dispersion correction. For E3D_disp and E3D_nodisp, the 3D subscript denotes a standard unit cell under PBC, so that the crystal is three-dimensional and infinite. Disp. and Nodisp. indicate whether or not the dispersion correction is applied.

Fig. 2
Illustration of the condensed 3D crystal and the isolated 1D fragment with dispersion correction switched on/off, as used for the energy decomposition calculation

Fig. 3
Comparison of intermolecular London dispersion (Einter_D) and electrostatic energies (Einter_E) (ECohe_E = Einter_D + Einter_E) based on DFT-D2 (left), D3 (middle), and D4 (right)

Results and discussion

Estimation of London dispersion interaction

Figure 3 and Table 1 show the decomposed energies of all crystals for each generation of dispersion correction. For Iβ, Eintra_D, Einter_D, Einter_E, and ECohe_E are estimated as 79, 62, 47, and 109 kJ/mol per glucose, respectively, based on DFT-D2. This value of Einter_D is close to the empirical estimate of 67 kJ/mol (Nishiyama 2018). Comparing the four allomorphs of cellulose within the D2 framework, the Einter_D of II is the largest (64 kJ/mol per residue), that of IIII is the smallest (60 kJ/mol), and those of Iα and Iβ lie in between (62 kJ/mol).

Overall, based on the DFT-D2 calculations, the estimated Einter_D for the eight polysaccharide crystals ranges from 60 to 74 kJ/mol, Eintra_D from 76 to 105 kJ/mol, and Einter_E from 47 to 70 kJ/mol per residue. The larger Einter_D of chitin compared with cellulose is due to the higher molecular weight of the N-acetyl-glucosamine residue compared with the glucose residue. If one normalizes the energy by the volume of the residue relative to cellulose Iβ, one obtains Einter_D values of (62, 62, 64, 57, 51, 55, 55, 51, 64) kJ/mol for (Iα, Iβ, II, IIII, α-chitin-A, α-chitin-B, β-chitin, chitosan) (Table S2), showing a smaller normalized Einter_D for chitin than for cellulose. Both II and chitosan exhibit the maximal value (64 kJ/mol). Table 1 shows that in the crystalline monomer and dimer, Eintra_D (59, 68 kJ/mol) and Einter_D (98, 79 kJ/mol), although different, are already comparable to those of the polymer crystals. The slightly higher Einter_D and smaller Eintra_D in the small-molecule crystals than in the polymer crystals are ascribed partially to the increased molecular weight of the repeat units (162 for the glucose residue, 171 for cellobiose, and 180 for glucose) and partially to the chain polarity, which can be either parallel or antiparallel.

Different numbers of hydroxy groups per residue also lead to different numbers of intermolecular hydrogen bonds per residue, which are 5 for glucose, 4 for cellobiose, 2 for II, IIII, and chitin, and 1 for Iα/Iβ and chitosan. Because the hydrogen bond is mostly electrostatic in nature, the Einter_E of the small-molecule crystals is much larger than both their own Einter_D (Fig. S1) and the Einter_E of the polymer crystals. For a quick and rough calculation, we simply divide Einter_E by the number of hydrogen bonds per residue. The strength of a single hydrogen bond in the monomer and dimer crystals is thus estimated to be under 30 kJ/mol in glucose and 27 kJ/mol in cellobiose. Similarly, the Einter_E values of cellulose II, IIII, and chitin are larger than those of Iα, Iβ, and chitosan because of one additional inter-chain hydrogen bond per residue. Further estimates of hydrogen bond interactions are discussed in the last section.

Based on the DFT-D2 calculation, for the β-1,4-linked crystalline polysaccharides, the London dispersion interaction represents 48 ~ 58% of the total cohesion energy of the polymer crystal regardless of polymer category, as can be seen in Fig. 3 (left) and Tables 2, S4, and S5.
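As a quick check of this share, the DFT-D2 values quoted above for cellulose Iβ (Einter_D = 62 and Einter_E = 47 kJ/mol per glucose) can be inserted into Eq. (6). The helper below is a hypothetical illustration and not part of the original workflow.

```python
def dispersion_share(e_inter_d, e_inter_e):
    """Percentage of the cohesion energy (Eq. 6) carried by interchain dispersion."""
    e_cohe = e_inter_d + e_inter_e  # Eq. (6)
    return 100.0 * e_inter_d / e_cohe


# Cellulose I-beta, DFT-D2 values quoted in the text (kJ/mol per glucose)
share = dispersion_share(62.0, 47.0)
print(f"{share:.1f}%")  # about 56.9%, inside the 48 ~ 58% range
```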

Table 2 The type and maximum strength of intra- and inter-chain hydrogen bonds in crystals

The impact of three generations of dispersion correction

When the dispersion correction was changed from D2 to the other two generations (D3 and D4), the four energy terms (Eintra_D, Einter_D, Einter_E, ECohe_E) of cellulose Iβ changed to (52, 53, 51, 105 kJ/mol) for D3 and (79, 49, 52, 100 kJ/mol) for D4, showing a monotonic decrease of Einter_D and an increase of Einter_E, so that Einter_E becomes slightly larger than Einter_D. This opposite trend between Einter_D and Einter_E also applies to the other crystals, as presented in Fig. S2. In D2, the dispersion coefficients of each atomic species are constant, regardless of the chemical environment of the atom. In D3 and D4, the local electron polarizability effect is accounted for, and the dispersion coefficients are automatically adjusted according to the local chemical environment. In detail, the atomic partial charge used for scaling the polarizabilities relies on the Mulliken partial charge in D3, but on the electronegativity equilibration partial charge in D4. Such treatment results in more expensive calculations and different energy values. In a benchmark against available molecular dipole-dipole dispersion coefficients, D4 achieved slightly better agreement with experiment (Caldeweyher et al. 2019).

Changing the dispersion correction leads to slight differences in the predicted unit cell parameters and thus to minor differences in chain packing (Table S3). In particular, the unit cell parameters a and b expand slightly, reflecting the less tight packing of the pyranose rings in D3 and D4, which reduces Einter_D and increases Einter_E. Still, Einter_D accounts for 41 ~ 55% of the total intermolecular interactions for D3 and 35 ~ 48% for D4, as shown in Table S4.

No matter which type of dispersion correction is used, the Einter_D and Eintra_D of chitin are always higher than those of cellulose. This is simply ascribed to the larger molecular weight of the repeat unit (162 Da for cellulose and 203 Da for chitin). Normalization by volume reduces the Einter_D and Eintra_D of chitin, as previously discussed.

Estimations of hydrogen bond strength

Although more than one hundred years have passed since hydrogen bonds were first proposed (Huggins 1971; Derewenda et al. 2021), the estimation of their range of strength is still under development (Emamian et al. 2019). To computationally estimate a hydrogen bond interaction between small molecules such as water, one can separate the hydrogen-bonded pair of molecules and take the energy difference as the hydrogen bond strength, since the London dispersion interaction between paired small molecules is small and can be neglected. However, one cannot simply do so for hydrogen bonds in cellulose or chitin because of other electrostatic interactions and the increased London dispersion due to the increased molecular weight. In addition, dividing Einter_E by the number of interchain hydrogen bonds would overestimate the hydrogen bond contribution because other multipolar electrostatic interactions also contribute to Einter_E. In textbooks (Mark 2023), hydrogen bonds are often judged by an arbitrary geometric criterion: donor (H)-acceptor length < 0.27 nm and H-donor-acceptor angle < 30º. Interactions that do not fulfill these criteria fall into the category of Coulomb interaction. Based on this, we developed an approach in which one chain or sheet is extracted from the 3D crystal (as shown in Fig. 4) and the hydroxy group is rotated around the C-O bond. Single-point energy calculations were run at each rotation step with all other atoms frozen. Since only a proton moves, the London dispersion interaction can be regarded as nearly constant, and the energy difference can be regarded as an indicator of the hydrogen bond plus a partial contribution from electrostatic repulsion. This method is similar to that of Estácio et al. (2004), which shows that the hydrogen bond energy obtained in this way is overestimated and can be considered an upper limit of the hydrogen bond strength. Details for each hydrogen bond energy in β-chitin are also provided in Fig. S3.
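The geometric criterion can be written as a small test function. The sketch below is only illustrative: it assumes Cartesian coordinates in nm, reads the length criterion as the H…acceptor distance and the angle criterion as the H-donor-acceptor angle, and uses hypothetical function names.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in v))


def hda_angle_deg(donor, hydrogen, acceptor):
    """Angle H-D-A in degrees, measured at the donor atom."""
    dh = _sub(hydrogen, donor)
    da = _sub(acceptor, donor)
    cos_angle = sum(p * q for p, q in zip(dh, da)) / (_norm(dh) * _norm(da))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_dist_nm=0.27, max_angle_deg=30.0):
    """True if H...A < 0.27 nm and the H-D-A angle < 30 degrees."""
    dist = _norm(_sub(acceptor, hydrogen))
    return dist < max_dist_nm and hda_angle_deg(donor, hydrogen, acceptor) < max_angle_deg


# Idealized linear O-H...O arrangement (coordinates in nm, made up)
print(is_hydrogen_bond((0.0, 0.0, 0.0), (0.097, 0.0, 0.0), (0.28, 0.0, 0.0)))
```

A frame of the rotation scan that fails either condition would be classified as a Coulomb interaction rather than a hydrogen bond.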

Taking the intra-chain hydrogen bond of Iβ as an example, the H-O3 group was rotated around the O3-C3 bond in increments of 10º starting from the initial energy minimum (labeled A), and the single-point energy was calculated at each frame, as shown in Fig. 4. The variation of the total energy is ascribed purely to the movement of the hydrogen atom. The difference between the optimized energy and the energy at which the H-O…O angle reaches 30º is taken as the strength of the intra-chain O3-H…O5 hydrogen bond (labeled B in Fig. 4a). The other intra-chain hydrogen bond (O2-H…HO6) was estimated in a similar way by rotating O2-H around the C2-O2 bond, as shown in Fig. 4b. The estimation of the inter-chain hydrogen bond (O6-H…O3) requires simultaneous rotation of the C2-O2-H and C6-O6-H angles because rotating C6-O6-H alone would induce an unreasonably short HO6…HO2 contact (< 1 Å). To avoid this short proton-proton contact, the HO2 and HO6 hydroxy groups are rotated in opposite directions, as indicated by the red arrows in Fig. 4c. The rotation of HO2 follows the same direction in Fig. 4b and c. The energy profile of the sole HO2 rotation in Fig. 4b was subtracted from the total energy variation of the simultaneous rotation of HO2 and HO6, and the result is shown in Fig. 4c. The estimation for the other crystals follows the same procedure as for Iβ.
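The generation of the rotated frames can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually used: the hydrogen atom is rotated about the C-O bond axis with Rodrigues' rotation formula, the coordinates and function names are hypothetical, and each frame would still need its own single-point energy calculation.

```python
import math


def rotate_about_axis(point, axis_origin, axis_end, angle_deg):
    """Rotate `point` about the axis from axis_origin to axis_end (Rodrigues' formula)."""
    k = [e - o for o, e in zip(axis_origin, axis_end)]
    length = math.sqrt(sum(x * x for x in k))
    k = [x / length for x in k]                      # unit vector along the bond
    v = [p - o for p, o in zip(point, axis_origin)]  # point relative to the axis
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
    k_dot_v = sum(a * b for a, b in zip(k, v))
    rotated = [v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c)
               for i in range(3)]
    return tuple(o + r for o, r in zip(axis_origin, rotated))


def hydroxy_scan(carbon, oxygen, hydrogen, step_deg=10):
    """Hydrogen positions for a full turn about the C-O bond in step_deg increments."""
    return [(angle, rotate_about_axis(hydrogen, carbon, oxygen, angle))
            for angle in range(0, 360, step_deg)]


# Made-up C, O, H coordinates in angstrom, with the C-O bond along z
frames = hydroxy_scan((0.0, 0.0, 0.0), (0.0, 0.0, 1.43), (0.93, 0.0, 1.73))
print(len(frames))  # 36 frames for a 10-degree step
```

Because only the hydrogen position changes, the O-H bond length and the C-O-H angle are preserved in every frame, which is what allows the dispersion energy to be treated as nearly constant along the scan.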

Fig. 4
Total energy variation of the Iβ chain as a function of the rotation angle of H-O around the C-O bond. The labels A, B, C, D, and E in the energy profiles correspond to the same labels in the molecular snapshots, which show selected frames along the rotation trajectory of the hydroxy groups. The red arrows indicate the rotational direction of the hydroxy groups from 0 to 360 degrees

By applying the universal hydrogen bond criteria (H…acceptor < 0.27 nm and H-D-A < 30º), the three major hydrogen bonds (EHO3…O5, EHO2…O6, EHO6…O3) of Iβ can be estimated to have upper limits of 25 kJ/mol, 24 kJ/mol, and 31 kJ/mol with DFT-D2, as marked by the dashed lines in Fig. 4a, b, and c, respectively. The slightly higher value of EHO6…O3 compared with the other two can be ascribed to the additional electrostatic attraction from the O6H…O2 pair, since the rupture of O6H…O3 also alters the O6H…O2 distance during the rotation of the hydroxy group (Fig. 4 & S4). A summary of all hydrogen bond strengths is provided in Table 3, showing a range from 14 to 33 kJ/mol, which is similar to the estimated alcohol hydrogen bond strength (24 kJ/mol) of small analogs (Nishiyama 2018). This indicates that the hydrogen bonds in cellulose, chitin, and chitosan are not particularly strong but are similar to those of their smaller analogs (such as glucose, Table S6). For chitin and chitosan, the strength of a single NH…OC hydrogen bond is 26 and 34 kJ/mol, respectively, which is of the same magnitude as the OH…OH hydrogen bond of cellulose. The estimated hydrogen bond strengths depend very little on the generation of London dispersion correction, as shown in Tables S7, S8 (13 ~ 34 kJ/mol for D3), and S9 (14 ~ 32 kJ/mol for D4) for all crystals. These more precise estimates of the OH…OH strength are also consistent with the rough estimates for glucose and cellobiose in the previous section.

Each glucose residue in cellulose Iβ contains three free hydroxy groups that form, on average, one inter-chain hydrogen bond and two intra-chain hydrogen bonds, whose strengths are < 31 and < 50 kJ/mol per glucose, respectively. Both the inter- and intra-chain hydrogen bonds are therefore weaker than the corresponding inter- and intra-chain London dispersion interactions (62 and 79 kJ/mol per glucose, respectively) (Fig. 5). This picture holds regardless of the type of dispersion correction (52, 51, 53, 27 for Eintra_D, Eintra_HB, Einter_D, Einter_HB with D3, and 79, 53, 49, 31 with D4). The same applies to cellulose Iα, which exhibits structural features similar to those of Iβ, as shown in Table 3.

Table 3 Comparison between London dispersion interaction and hydrogen bond
Fig. 5
Comparison of (inter-chain and intra-chain) London dispersion interaction and hydrogen bond strength in crystals

For the other cellulose allomorphs (II and IIII), chitin, and chitosan, the intra-chain hydrogen bond (O3-H…O5) is retained. However, the planar hydrogen-bond network disappears due to the conformational variation of the exocyclic hydroxymethyl group, which is tg in native cellulose, gg/gt in α-chitin, and gt in the rest. The number of inter-chain hydrogen bonds increases from 1 to 2 per residue and the number of intra-chain hydrogen bonds decreases from 2 to 1, so the total strength of the inter-chain hydrogen bonds increases. However, the strength of the inter-chain hydrogen bonds is always below 50 kJ/mol with D2 and D3 and below 47 kJ/mol with D4, showing that none exceeds the corresponding inter-chain London dispersion interaction (see Table 2). Regarding the intra-chain interactions, hydrogen bonds contribute far less than the dispersion interaction (< 32 kJ/mol versus > 50 kJ/mol, respectively), as also shown in Table 2. The same relation between intra- and inter-chain dispersion interactions and hydrogen bonds also holds for the other polysaccharide analogs. As shown in Table S10, the share of hydrogen bonding in the total interchain interaction in the crystals varies between 23% and 40%. In comparison, the London dispersion energy accounts for between 35% and 58% (Table S4), and the overall electrostatic interactions account for 42 to 61% (Table S5).

When the hydroxy groups are rotated around their corresponding C-O bonds (Fig. 4), the energy barrier during the C-O-H rotation may reach 50 ~ 60 kJ/mol per residue. One might simply take this barrier as the hydrogen bond strength. This is improper because the energy variation induced by the rotation of the hydroxy group includes both the hydrogen bond and other repulsions or attractions between the hydrogen and nearby atoms. Such repulsion or attraction occurs in polymers, due to the steric effect of adjacent atoms in the polymer chain, but not in small molecules (such as water) in an isolated state. One clear piece of evidence is the bimodal shape of the energy profile, in which an energy minimum occurs between 200º and 250º due to the electrostatic attraction between the mobile hydrogen and its adjacent oxygen. The geometric parameters at this minimum are far beyond the standard hydrogen bond criteria. At this low-energy minimum, the nearby electrostatic repulsion is the smallest. The energy difference relative to the initial state is consistently no higher than 32 kJ/mol (20 kJ/mol in Fig. 4a, 32 kJ/mol in Fig. 4b, and 18 kJ/mol in Fig. 4c), which is much smaller than the energy barrier. Therefore, our estimated energy difference is already an upper limit of the hydrogen bond strength.

The hydrogen bond strength estimated here is slightly higher than the values reported for small molecules. This is because a hydroxy group in the polysaccharide crystal structures acts as both donor and acceptor within the hydrogen bonding network, which correlates the hydrogen bonds with one another. Perturbing one bond may also partially disrupt the others. The cooperativity in the hydrogen bond network enhances the strength of a single hydrogen bond, but the extent is limited (Qian 2008; Masella et al. 2000). The hydrogen bond strength is also context-dependent in proteins, being stronger in the inner hydrophobic core than at the surface, but it has never been found to dominate the structural stability (Deechongkit et al. 2004). According to our estimation of the hydrogen bond and London dispersion interactions, a similar principle can be applied to crystalline polysaccharides.

Conclusion

To summarize, using DFT calculations with different dispersion corrections and single-point energy calculations with rotated hydroxy groups, we have systematically quantified the London dispersion interaction and the strength of hydrogen bonds in cellulose, chitin, and chitosan and in their monomers and dimers. We confirm that the inter-chain London dispersion interaction exceeds the strength of the inter-chain hydrogen bonds within the lattice energy of cellulose Iβ, as Nishiyama reported. In addition, the intra-chain London dispersion interaction was also shown to be stronger than the intra-chain hydrogen bonds for Iβ crystals. Moreover, these findings apply not only to cellulose Iβ but also to the other cellulose allomorphs (Iα, II, IIII) and to other β-(1,4)-linked crystalline polysaccharides (chitin and chitosan). Changing the generation of dispersion correction slightly alters the absolute values of the intermolecular dispersion, electrostatic, and hydrogen bonding energies because of slightly different chain packing and unit cell parameters compared to experimental observation. Still, the picture that the London dispersion interaction exceeds the hydrogen bonding interaction always stands, for both inter- and intra-chain terms. Overall, the inter-chain hydrogen bonding interaction accounts for 23 ~ 40% of the total interchain molecular interaction, whereas the London dispersion energy accounts for between 35% and 58% and the electrostatic interactions for 42 to 61% among the eight crystals.

Our findings offer molecular insights into the driving force for the initial assembly of polymer chains during the biosynthesis of cellulose nanofibrils. Our quantification also provides direct evidence that refutes the hydrogen-bond-dominated dissolution mechanism of cellulose and chitin. One may argue that although a single hydrogen bond is not strong enough, the activation energy needed to peel off the abundant hydrogen bonds along a polymer chain would be. However, the peeling-off energy required against the London dispersion interaction in the crystal would also be higher than the total energy of the many hydrogen bonds. In the future, our approach can be extended to co-crystals between cellulose/chitin/chitosan and small molecules, such as the cellulose/ammonia and cellulose/EDA complexes, to understand their energy components and to provide insight for developing protocols for the deconstruction of these crystals.