Introduction

Cellulose is one of the most naturally abundant biopolymers, with approximately 700 billion tons produced every year.1 It is the main component of algae and plant cell walls, vegetable tissues such as wood and cotton, and, after processing, paper.2 In materials science and engineering, cellulose is of particular interest for its use as a biofuel,3,4,5,6 its mechanical7,8,9,10 and superhydrophobic11,12,13,14,15,16 properties, and its amphiphilic character and behavior with ionic liquids.1,17,18,19,20,21

Experimental investigation of cellulose and cellulose fibers at the molecular level was begun as early as 1912 by von Laue.22 The first complete crystallographic unit was proposed in 1928 by Meyer and Mark,23 refined in 1937 by Meyer and Misch,24 and confirmed in 1938 by Gross and Clark.22 A more detailed description of the cellulose crystal structure and its internal hydrogen bond network was established for the Iα and Iβ allomorphs by Nishiyama, Langan, and Chanzy using x-ray and neutron diffraction.25 For the cellulose Iβ allomorph, they reported a monoclinic unit cell containing two parallel cellobiose chains that form a layered structure, held together by covalent bonds in the c direction, hydrophobic interactions in a, and hydrogen bonds in b.

Using a force field approach to study cellulose, like other carbohydrates, remains challenging. Several all-atom force fields (AAFF)26,27,28,29,30,31,32 have been published, and their parameters have been validated mostly against the conformational energies of mono- or disaccharide model compounds. However, the predicted cellulose Iβ crystal lattice does not match the experimental values. More seriously, as pointed out by Matthews et al.,33 long (up to 0.8 μs) molecular dynamics (MD) simulations of cellulose Iβ using these AAFFs converge slowly, and the results show considerable deviations in conformations and lattice parameters. Other available force fields for cellulose are the united-atom34 and coarse-grained models7,35 that rely on constraining the cellobiose monomer to the configuration found experimentally.

The difficulty of developing an accurate AAFF for cellulose originates from the numerous hydroxyl substituents, which form intra- and intermolecular hydrogen bond networks. In addition, the glucosidic (-O-) bonds between two glucose rings are relatively flexible. As a result, the potential energy surface (PES) is complex, with multiple minima and conformers often in equilibrium with each other.36 It has been reported that the conformational changes in cellobiose (a disaccharide) take place on a microsecond time scale26 and result in significant fluctuations of the internal degrees of freedom and changes in the intra-ring hydrogen bonds.37 In the solid, the intermolecular hydrogen bonds and van der Waals interactions, coupled with the intramolecular interactions, control the form of packing. Therefore, a good force field must accurately represent both the intra- and intermolecular interactions in cellulose, and accurately describe conformational changes.

To solve this problem, in this work, we have conducted a full parameterization for cellulose. Using saccharide-related molecules as model compounds, we applied quantum mechanics (QM) density functional theory (DFT) to explore the PES for ring deformations and inter-ring rotations. From those QM calculations, an accurate force field representation of the intramolecular PES was obtained. The intermolecular interactions were initially optimized using liquid phase simulation data, and later refined with cellulose Iβ crystal data. The parameterization procedure was iterated multiple times so that the couplings between the intra- and intermolecular interactions are considered.

The force field is one of the specific force fields in the TEAM force field database (TEAMFF), which consists of multiple independently developed force field tables. These force fields are grouped by force field type (e.g., AMBER,38 CHARMM,31,39 CFF,40 and TEAM41). Each group has a base force field that provides generic coverage, allowing force fields developed for specific compounds to have high accuracy. On deployment, the force fields of a group are compatible and can be used together by combining the atom types and parameters.

In the following sections, we first explain the parameterization and simulation procedures, then present and discuss the parameterization and validation results in the gas and condensed phases, and finally summarize the main contributions of this work.

Methods

Functional Form and Atom Types

In this work, we choose the AMBER functional form, with a temperature-dependent dispersion term.42 The total energy is expressed as:

$${U}_{tot}= \sum_{b}{K}_{b}{\left(b-{b}_{0}\right)}^{2} + \sum_{\theta }{K}_{\theta }{\left(\theta -{\theta }_{0}\right)}^{2} + \sum_{\varphi }\frac{{K}_{\varphi }}{2}\left[1+\mathrm{cos}\left(n\varphi -{\varphi }_{0}\right)\right] +\sum_{\chi }\frac{{K}_{\chi }}{2}\left[1+\mathrm{cos}\left(n\chi -{\chi }_{0}\right)\right]+\sum_{r}\left[\frac{Cqq^{\prime}}{r}+\varepsilon \left(T\right)\left({\left(\frac{\sigma \left(T\right)}{r}\right)}^{12}-{\left(\frac{\sigma \left(T\right)}{r}\right)}^{6}\right)\right]$$

where \(b\), \(\theta\), \(\varphi\), \(\chi\), and \(r\) represent the bonds, angles, dihedral angles, improper-dihedral angles, and nonbonded atom–atom distances. The first four terms are called valence terms because they are described by the connectivity of the valence bonds. The last two terms, the coulombic and Lennard–Jones (LJ) 12-6 functions, are nonbond terms representing all intra- and intermolecular nonbonded interactions, including hydrogen bonds. The well depth and diameter parameters are scaled with a scaling factor \(({f}_{\mathrm{disp}})\)42 and expressed as functions of temperature:

$$\varepsilon \left(T\right)= {f}_{\mathrm{disp}}^{2}\left(T\right){\varepsilon }_{298}$$
$$\sigma \left(T\right)= {f}_{\mathrm{disp}}^{-1/ 6}(T){\sigma }_{298}$$
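The two scaling relations can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and because the functional form of \({f}_{\mathrm{disp}}(T)\) is not given in this section, the factor is passed in as a plain number. The pair energy follows the LJ term exactly as written in the total energy expression.

```python
def scale_lj(eps_298, sigma_298, f_disp):
    """Return (eps(T), sigma(T)) from the 298 K parameters.

    f_disp is the value of the dispersion scaling factor at the target
    temperature; how it depends on T is not modeled here (assumption).
    """
    eps_t = f_disp ** 2 * eps_298
    sigma_t = f_disp ** (-1.0 / 6.0) * sigma_298
    return eps_t, sigma_t


def lj_energy(r, eps, sigma):
    """LJ 12-6 pair energy, eps * ((sigma/r)**12 - (sigma/r)**6)."""
    x = (sigma / r) ** 6
    return eps * (x * x - x)
```

With `f_disp = 1.0` the 298 K parameters are returned unchanged, which is a convenient sanity check when wiring such a helper into a simulation setup script.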

The charge parameters, \(q,\) are expressed in terms of partial atomic charges and bond-charge increments:

$${q}_{i}= {q}_{i}^{0}+ {\sum }_{j}{\delta }_{ij}$$
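A minimal sketch of how atomic charges could be assembled from bond-charge increments is given below. The function name and the data layout are hypothetical, and the code assumes the usual antisymmetric convention \({\delta }_{ji}=-{\delta }_{ij}\), which is not stated explicitly above, so that each bond moves charge between its two atoms without changing the total.

```python
def atomic_charges(base_charges, bond_increments):
    """Compute q_i = q_i^0 + sum_j delta_ij.

    base_charges    : list of q_i^0 values, one per atom
    bond_increments : dict mapping a bonded pair (i, j) to delta_ij
                      (assumed antisymmetric: delta_ji = -delta_ij)
    """
    charges = list(base_charges)
    for (i, j), delta in bond_increments.items():
        charges[i] += delta   # atom i receives delta_ij
        charges[j] -= delta   # atom j receives delta_ji = -delta_ij
    return charges
```

Under this convention the sum of the atomic charges always equals the sum of the base charges, so a neutral fragment stays neutral regardless of the increment values.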

The pairwise LJ12-6 potential uses the Lorentz–Berthelot combination rule:

$${\varepsilon }_{ij}= \sqrt{{\varepsilon }_{i}{\varepsilon }_{j}}$$
$${r}_{ij}^{0}=\frac{{r}_{i}^{0}+{r}_{j}^{0}}{2}$$
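As an illustration of the combination rule, the following sketch builds a table of pair parameters from per-type values. The atom type labels and numbers used in any call are placeholders, not entries from the TEAMFF tables.

```python
import math
from itertools import combinations_with_replacement


def combine(eps_i, r0_i, eps_j, r0_j):
    """Lorentz-Berthelot rule: geometric mean for eps, arithmetic mean for r0."""
    return math.sqrt(eps_i * eps_j), 0.5 * (r0_i + r0_j)


def pair_table(params):
    """Build {(type_i, type_j): (eps_ij, r0_ij)} for all unordered type pairs.

    params maps an atom type label to its (eps, r0) tuple.
    """
    table = {}
    for ti, tj in combinations_with_replacement(sorted(params), 2):
        table[(ti, tj)] = combine(*params[ti], *params[tj])
    return table
```

For a like pair the rule returns the per-type parameters unchanged, so the table can be used uniformly for both like and unlike interactions.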

The atom types are defined following the TEAMFF Hierarchical Atom Definition scheme, which takes into account the immediate environment of each atom and the essential features of the atom, such as hybridization, coordination number, ring size, and aromaticity.41 A list of the atom types is given in the supporting information, Table S1.

Parameterization

The valence terms were parametrized from the QM-DFT data. Molecules that form the training set were selected using a fragment-based approach.43 The QM-DFT calculations were carried out with the Gaussian 09 software44 at the B3LYP/6-31G(d) level. The bond-charge increment parameters were determined from QM ESP charges, while the initial LJ parameters were taken from the default TEAMFF. With the charge and LJ parameters fixed, the valence parameters were optimized using a Levenberg–Marquardt procedure to fit the QM-DFT data, including the energies and the first and second energy derivatives for all the training set molecules.
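To make the Levenberg–Marquardt step concrete, the sketch below fits a single harmonic bond term, \(K{(b-{b}_{0})}^{2}\), to a set of reference energies. It is a toy illustration under strong assumptions: the actual parameterization fits energies together with first and second derivatives for all training molecules at once, whereas this example uses energies only, two parameters, and a hand-written damped Gauss–Newton loop. All names are hypothetical.

```python
def fit_harmonic_bond(lengths, energies, k_init, b_init, max_iter=200):
    """Fit E = K * (b - b0)**2 to reference energies; return (K, b0)."""

    def cost(k, b0):
        return sum((k * (b - b0) ** 2 - e) ** 2 for b, e in zip(lengths, energies))

    k, b0, lam = k_init, b_init, 1e-3
    for _ in range(max_iter):
        # Accumulate the normal equations: A = J^T J and g = J^T r.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for b, e in zip(lengths, energies):
            d = b - b0
            r = k * d * d - e      # residual for this data point
            j1 = d * d             # d(residual)/dK
            j2 = -2.0 * k * d      # d(residual)/db0
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        # Marquardt damping scales the diagonal of J^T J.
        m11 = a11 * (1.0 + lam)
        m22 = a22 * (1.0 + lam)
        det = m11 * m22 - a12 * a12
        if det == 0.0:
            break
        # Solve (A + lam * diag(A)) * step = -g with Cramer's rule.
        dk = (-g1 * m22 + a12 * g2) / det
        db = (-g2 * m11 + a12 * g1) / det
        if abs(dk) < 1e-12 and abs(db) < 1e-12:
            break
        if cost(k + dk, b0 + db) < cost(k, b0):
            k, b0 = k + dk, b0 + db   # accept the step and relax the damping
            lam *= 0.1
        else:
            lam *= 10.0               # reject the step and increase the damping
    return k, b0
```

The accept/reject logic is what distinguishes this from plain Gauss–Newton: a step that increases the cost is discarded and retried with stronger damping, which keeps the fit stable when the starting parameters are poor.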

With the valence and charge terms fixed, the LJ parameters were optimized using MD simulations to fit the liquid-state experimental data. The training set for this part of the parametrization includes cyclic ethers, cyclic alcohols, and linear molecules with similar atom sequences. For each compound of the training set, the experimental liquid density (ρ) and heat of vaporization (HoV) at different temperatures were obtained from the NIST standard reference database.45 To avoid complications due to couplings with intramolecular interactions and strong polarization, large polyolic chains and small alcohols were excluded.29

Generally, for each molecule, several temperature points have been selected to adequately cover its thermodynamic space (above the melting point, at the boiling point, and near the critical point).42 A series of MD simulations were carried out at each temperature point to calculate the liquid densities and HoV values to optimize the van der Waals parameters. While liquid densities can be directly determined from MD simulations, the HoV values are calculated by extracting the intermolecular energy \(({E}_{\mathrm{inter}})\) according to the ideal gas approximation:

$${H}_{\mathrm{vap}}=RT-{E}_{\mathrm{inter}}$$
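The relation above can be evaluated directly from sampled energies. The sketch below assumes that the intermolecular energy is available per frame for the whole simulation box in kcal/mol and is divided by the number of molecules; the function name and input layout are hypothetical.

```python
R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def heat_of_vaporization(box_inter_energies, n_molecules, temperature):
    """Return H_vap = R*T - E_inter in kcal/mol.

    box_inter_energies : intermolecular energy of the whole box for each
                         sampled frame (kcal/mol), normally negative
    n_molecules        : number of molecules in the box
    temperature        : simulation temperature in K
    """
    mean_box_energy = sum(box_inter_energies) / len(box_inter_energies)
    e_inter = mean_box_energy / n_molecules   # per-molecule average
    return R_KCAL * temperature - e_inter
```

Because \({E}_{\mathrm{inter}}\) is negative for a bound liquid, the result is positive, and the \(RT\) term contributes only about 0.6 kcal/mol at room temperature.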

The LJ parameters were adjusted so that the resulting MD data fit the available experimental data, using a least-squares procedure. At this stage, bulk liquid MD simulations, in which the liquid was represented by a simulation box with 3-D periodic boundary conditions, were carried out with the GROMACS simulation software46 in boxes constructed with Packmol47 that contained approximately 300 molecules. The simulation boxes underwent annealing, in which the temperature of the system was raised to 800 K to relax the internal strain and randomize the initial configuration, followed by a conjugate gradient energy minimization. They were then equilibrated at the target temperature and pressure in the isothermal-isobaric (NPT) ensemble with Langevin temperature control and Berendsen pressure control for 0.2 ns with a 1 fs time step. Once the system reached equilibrium, the density and HoV were sampled for 1 ns with a 2 fs time step using a Parrinello–Rahman barostat. During the simulations, long-range electrostatics were treated using particle-mesh Ewald (PME) summation and long-range van der Waals interactions using a dispersion correction, with a cutoff of 1.2 nm. Bonds involving hydrogen atoms were constrained with the LINCS algorithm.

The crystal structure of cellulose Iβ, whose repeat unit is cellobiose, was used to test and fine-tune the LJ parameters. The force field functions are commonly used and supported by different simulation software packages. While the simulations of liquids were carried out with GROMACS,46 mainly for its efficiency in simulating amorphous systems, the simulations of crystals were performed using LAMMPS,48 for its flexibility in handling lattice structures. The initial unit cell and atom positions were taken from the x-ray and neutron diffraction data of Nishiyama et al.,25 available in the Cambridge Structural Database.49 The simulations were carried out on a \(4\times 4\times 4\) supercell, which had a size of \(31.14\times 32.60\times 41.52\) Å\(^{3}\) and contained 5,376 atoms. Periodic boundary conditions were applied in the X, Y, and Z directions. The long-range electrostatics and van der Waals cutoffs were set to 1.2 nm. The experimental structure was relaxed by simulated annealing and conjugate gradient energy minimization. The minimization was then used to test and fine-tune the nonbond parameters to improve the fit of the lattice parameters.51 For a direct comparison with the experimental data, MD simulations at the experimental temperature (298 K) and pressure (1 atm) were carried out using the same supercell model. The equilibration was done initially by an isochoric-isothermal (NVT) simulation and then an NPT simulation, while the data collection was carried out by NPT simulation. In the NPT equilibration, the a, b, and c edges of the lattice were controlled independently with anisotropic pressure control. The data collection period was extended to 100 ns, with a time step of 2 fs; snapshots of this simulation are shown in Fig. 1 of the supporting information. As evident from the block-averaged cell parameters (Table III of the supporting information), the simulation converged under the simulation conditions.
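Convergence checks based on block averages, as mentioned for the cell parameters, can be sketched as follows. The helper names are hypothetical, and the block count and any acceptance threshold are choices left to the user rather than values taken from this work.

```python
def block_averages(series, n_blocks):
    """Split a time series into equal consecutive blocks and return the block means.

    Trailing samples that do not fill a complete block are ignored.
    """
    size = len(series) // n_blocks
    if size == 0:
        raise ValueError("series is shorter than the number of blocks")
    return [
        sum(series[i * size:(i + 1) * size]) / size
        for i in range(n_blocks)
    ]


def block_spread(series, n_blocks):
    """Return the difference between the largest and smallest block mean."""
    means = block_averages(series, n_blocks)
    return max(means) - min(means)
```

A small spread between block means, with no systematic drift from the first block to the last, is a common practical indication that a lattice parameter has stopped relaxing.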

Figure 1

Model disaccharides. For better visualization, only hydrogen atoms involved in hydrogen bonds are shown.

After the LJ parameters were optimized by condensed phase simulations, the QM data fit was repeated to account for the small perturbations to the intramolecular interactions caused by the revised LJ parameters. The entire process was repeated a couple of times to obtain the final force field parameters that yielded consistent results.

Results and Discussion

Fit QM Data

The training set for the QM calculations and the parameterization of the valence terms is shown in Table I. It includes small linear alcohols, substituted and unsubstituted heterocycles, and model disaccharides (Fig. 1) that describe the hydroxyl/oxygen and hydroxyl/hydroxyl interactions.

Table I Training set for QM calculation and parameterization of valence terms

Linear alcohols were scanned along the O-C-C-C and the H-O-C-C dihedrals at intervals of 10° over a range of 180°. Intra-ring hydrogen bonding between vicinal hydroxyls was explored using single-ring compounds such as cyclohexanediol and the heterocycle tetrahydro-2H-pyran-2,3-diol. Sampling started from two configurations: one with the hydroxyl groups on the same side, forming a 60° angle between them and an internal hydrogen bond, and the other with the hydroxyls on opposite sides at 150°, without any internal hydrogen bonding. Inter-ring hydroxyl interactions were sampled along the hydroxyl substituent and the hydroxymethyl group of model disaccharide 3 (Fig. 1).

More complex cycles, such as cyclohexanol and oxanol, were used to sample the rotations of the hydroxyl groups and their interactions with the heterocycle oxygen in a fixed chair configuration. Cyclohexanol and oxanol were scanned along the C-C-C-C/O-C-C-C and O-C-C-C/H-O-C-O directions, respectively. After fitting, carbon ring deformations and some possible hydroxyl positions were accurately captured.

The glucosidic bond was scanned over 360° along the O-C-O-C dihedral of model disaccharides 1–3,28,32 and refined with cellobiose, which underwent additional sampling along its conformational dihedrals and hydroxyl groups. The overall fit results of the QM training set are shown in Fig. 2 for the conformational energies and vibrational (normal mode) frequencies, and in Fig. 3 for structures in terms of bond lengths, bond angles, and dihedral angles.

Figure 2

Comparison of energies (left) and normal mode frequencies (right) between QM calculations and MM fitting. Energies are measured in kcal/mol and frequencies in wavenumbers.

Figure 3

Comparison of structural parameters in bond-length (a), bond-angle (b), and torsion dihedral angle (c), obtained between QM and MM calculations from the training set. Length is measured in Å and angles in degrees.

Cyclic molecules are complex, with several possible conformations accompanied by significant bond-length, bond-angle, and dihedral fluctuations. Here, we chose simple six-membered heterocycles with accessible deformations, such as tetrahydropyran and dioxane, to sample the ring deformations. Because of their limited independent internal degrees of freedom, a complete scan of their PES is possible using two dihedral angles.52 As shown in Fig. 4 and Table II, all distinct conformers were obtained for the unsubstituted heterocycles, tetrahydropyran and dioxane.

Table II Sampled conformers for simple heterocycles and their conformational energies
Figure 4

Comparison of the PES of tetrahydropyran (top) and dioxane (bottom) between QM and FF. Dihedrals are measured in degrees and the energy in kcal/mol. QM data are shown on the left and the MM fit on the right. Adjacent regions of different colors differ in relative energy by 2 kcal/mol, with zero set in the blue area (chair conformer) (Color figure online).

Fit Liquid Data

The LJ parameters were optimized from the bulk liquid densities and HoV of 24 molecules, including alcohols, ethers, and cyclic compounds (Table III). The liquid phase fitting results for the densities and HoVs (Fig. 5) show good overall agreement with the experimental values. Because of the limited availability of high-quality experimental HoV data, there are significantly more points for densities than for HoVs. More details about the fitting results of liquid properties can be found in the supporting information SI2.

Table III Training set of molecular liquids; molecule identifiers (IUPAC name, fitting reference name and SMILES strings) are shown in columns 1–3, and the number of different temperature points used for density and HoV are shown in the fourth column
Figure 5

Density (left) and HoV (right) fitting results. Experimental data is on the X-axis and simulation data on Y.

Condensed phase polarizability effects are apparent in both the density and the HoV fits. In the density fit, the largest percentage deviation of the set (−2.5%) occurs for 2,4,6-trimethyl-1,3,5-trioxane, commonly known as paraldehyde, near its melting point. In the HoV fit, the largest percentage deviation (−8.05%) occurs for ethylene glycol near its boiling point.

Validation on Cellobiose

A first-level validation concerns the monomer of cellulose, cellobiose, which is a disaccharide with significant internal hydrogen bonds and a flexible linkage (-O*-) between the two rings; here, O* indicates the linkage oxygen. The internal hydrogen bonding stabilizes the cellobiose conformers.37 We obtained 7 conformers by scanning the C-C-O*-C and C-O*-C-O dihedral angles. The structures and relative energies of the conformers are shown in Fig. 6. The molecular mechanics (MM) calculation using the present force field accurately predicts the minimum energy structure (#0) and the relative energies of all conformers. Low-energy conformers have more intramolecular hydrogen bonds. In high-energy conformers, the hydroxymethyl group rotates and the intramolecular hydrogen bonds are broken. However, these conformers may be stabilized in the condensed phase by intermolecular hydrogen bonds. Structure VII is a fixed structure cut from the crystal of cellulose; its MM energy agrees well with the QM data, both lying approximately 18–19 kcal/mol above the minimum. This energy cost is compensated in the crystal by intermolecular interactions.

Figure 6

Conformational energies and structures with hydrogen bonds (dashed lines) of cellobiose. The energies, in kcal/mol, are given relative to the QM energy (orange) of structure 0. Conformer VII is a fixed structure cut from the Iβ crystal of cellulose (Color figure online).

Figure 7 shows the normal mode frequencies of the QM and MM calculations. Over the entire frequency range, the MM data agree well with the QM data. It is interesting to see that small but systematic deviations are found in the middle frequency ranges. Close examination of the normal modes indicates that these modes are associated with couplings between bond stretching and angle distortions, which would require more complex functional forms to describe fully.

Figure 7

Normal mode frequencies (in \(\mathrm{cm}^{-1}\)) of cellobiose. Quantum data are shown in blue and MM data in orange (Color figure online).

A statistical analysis of QM and MM structural parameters in terms of bond-lengths, bond-angles and dihedral angles of cellobiose is given in Table IV. For each type of internal coordinate, the number of data points and the ranges of data measured by the maximum and minimum values, as well as the standard deviation of the data distribution, are listed for QM and MM, respectively. The first point to make is that the molecule, although it is relatively small, presents a significant fluctuation in all the internal coordinates. The second point to make is that the MM calculation based on the present force field agrees well with QM calculation in terms of the predicted values and fluctuations.
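The kind of summary described above can be sketched as follows. The function names are hypothetical, the use of the sample standard deviation is an assumption (the text does not say which estimator was used), and the root-mean-square difference is added here only as one convenient way to compare paired QM and MM values.

```python
import math
import statistics


def summarize(values):
    """Return the count, minimum, maximum, and sample standard deviation."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "std": statistics.stdev(values),
    }


def rms_difference(qm_values, mm_values):
    """Root-mean-square difference between paired QM and MM values."""
    if len(qm_values) != len(mm_values):
        raise ValueError("QM and MM lists must have the same length")
    total = sum((q - m) ** 2 for q, m in zip(qm_values, mm_values))
    return math.sqrt(total / len(qm_values))
```

Applying `summarize` separately to the QM and MM values of one coordinate type (for example all C-O bond lengths across the conformers) gives one row pair of such a table.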

Table IV Statistical analysis of bond lengths in Å, and bond angles and dihedral angles in degrees, of cellobiose conformers, calculated by QM and MM methods

Validation on Cellulose Iβ Crystal

The crystal structure of cellulose Iβ has been resolved and published,25 and several research groups have published their computational results with different force fields. In Table V, we compare the unit cell parameters of the cellulose Iβ crystal obtained by energy minimization (MM) and molecular dynamics simulation (MD) using the present force field with the experimental data and other simulation data. The MD simulation up to 100 ns appears well converged, as evident from the data given in supporting information Table SI3. The data show that the density obtained by energy minimization is about 1% higher than the experimental value, which is reasonable because energy minimization does not include thermal motion, while the MD simulation at ambient temperature and pressure yields excellent agreement with the experimental density, with a deviation of −0.1%. In terms of cell edge parameters, the a and c edges are slightly underestimated, whereas the b edge is slightly overestimated. Our MD predictions are overall closer to the experimental data than other predictions found in the literature.
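For readers who want to reproduce this type of comparison, the sketch below computes the density of a monoclinic cell from its lattice parameters and the percentage deviation between a simulated and an experimental value. It assumes four anhydroglucose residues (C6H10O5) per unit cell, corresponding to two chains with one cellobiose repeat each, and standard atomic masses; the function names are hypothetical and no values from Table V are used.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1
# Molar mass of one anhydroglucose residue, C6H10O5, in g/mol.
ANHYDROGLUCOSE_MASS = 6 * 12.011 + 10 * 1.008 + 5 * 15.999


def monoclinic_volume(a, b, c, gamma_deg):
    """Cell volume in cubic angstroms for a monoclinic cell with unique angle gamma."""
    return a * b * c * math.sin(math.radians(gamma_deg))


def crystal_density(a, b, c, gamma_deg, residues_per_cell=4):
    """Density in g/cm^3 from cell edges (angstrom) and gamma (degrees)."""
    volume_cm3 = monoclinic_volume(a, b, c, gamma_deg) * 1e-24
    return residues_per_cell * ANHYDROGLUCOSE_MASS / (AVOGADRO * volume_cm3)


def percent_deviation(simulated, experimental):
    """Signed deviation of a simulated value from experiment, in percent."""
    return 100.0 * (simulated - experimental) / experimental
```

A negative percentage means that the simulated quantity is below the experimental one, which matches the sign convention used for the deviations quoted in this section.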

Table V Comparison of experimental and computational results in unit cell parameters of cellulose Iβ crystal

As shown in Fig. 8, cellulose Iβ is monoclinic with the polymer backbone aligned with the c direction, and layers joined by hydrophobic interactions and hydrogen bonds in the a and b directions. Presumably, edge a is mostly affected by the hydrophobic interactions,35,54 edge b is mostly affected by the intralayer hydrogen bond network, and edge c is mostly influenced by the intramolecular structures. Since our force field is derived by using multiple datasets including liquid data of relevant molecules, the deviations indicate the limit of transferability of the force field parameters. We speculate that the problems might be lessened by including a polarizable function, since the polarization is significantly different between small molecular liquids and crystalline polymers.

Figure 8

Cellulose Iβ unit cell. Lattice is shown in dashed black lines, and the lattice directions are labelled with pink letters. Fragmented atoms are bonded to atoms located outside the unit cell. The C axis is placed in the Z direction, B in the Y and A in the XY plane. The unit cell and the atom coordinates are taken from the Nishiyama x-ray and neutron diffraction data25 deposited at the Cambridge Crystallographic Data Center (CCDC) (Color figure online).

Conclusion

We have presented a new AAFF for the simulation of cellulose. The force field is part of the TEAMFF force field database in the AMBER functional form, and can be used with popular simulation engines. The parameters are derived by fitting large amounts of QM energetic data generated from a training set of 12 molecules. The temperature-dependent LJ parameters have been optimized by simultaneously fitting the experimental density and the HoV of 23 relevant molecular liquids at various temperatures. Validation on cellobiose, the monomer of cellulose, shows excellent agreement in conformational energies, molecular structures, and vibrational frequencies between MM calculations using the present force field and QM calculations. Finally, the force field is tested on the prediction of the Iβ crystal structure of cellulose. The density agrees with the experimental data to within 0.1%, and all cell parameters deviate by less than 1% from the experimental data. This is a superior performance compared with previously published force fields.

Notes

The force field parameter set is available for download at https://github.com/sungroup-sjtu.