Abstract
Fast, empirical potentials are gaining increased popularity in the computational fields of materials science, physics and chemistry. With it, there is a rising demand for high-quality reference data for the training and validation of such models. In contrast to research that is mainly focused on small organic molecules, this work presents a data set of geometry-optimized bulk phase zeolite structures. Covering a majority of framework types from the Database of Zeolite Structures, this set includes over thirty thousand geometries. Calculated properties include system energies, nuclear gradients and stress tensors at each point, making the data suitable for model development, validation or referencing applications focused on periodic silica systems.
Measurement(s) | potential energy |
Technology Type(s) | Computational Chemistry |
Factor Type(s) | Crystal structure, composition and topology |
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.17313236
Background & Summary
Atomistic models are an essential tool for the prediction of thermodynamic, mechanical or biochemical properties of a substance. More recently, the use of pre-trained models has become increasingly popular due to their comparatively low complexity and high accuracy on modern hardware1,2,3,4,5,6. In order for such models to perform well, their empirical parameters require fitting to high-quality reference data. Depending on the application, reference data are either experimental, or come from computationally more expensive ab initio calculations. Although there are already a handful of large computational data sets covering small organic molecules7,8,9, such data are still scarce for larger periodic systems (cf. Materials Cloud Archive10,11 or the NOMAD database12,13). Motivated by this fact, we present a quantum-chemical data set for zeolites. Zeolites are porous materials composed of interconnected SiO4 or AlO4 tetrahedra. Their properties can be fine-tuned through synthesis of materials with specific pore size, or the inclusion of additional metal cation sites14,15,16,17. Because of their topology and synthetic flexibility, zeolites have various applications as adsorbents18,19,20 and catalysts17,21,22,23. To this day, a myriad of different zeolite framework types is available experimentally, and many more hypothetical structures can be derived24,25,26. The documentation of fundamental zeolite framework types and derived materials has led to the publication of the well-known Atlas of Zeolite Structures27 in several editions. The atlas lists each unique framework type by its three-letter code, as assigned by the Structure Commission of the International Zeolite Association (IZA). Today, its contents are available online at the Database of Zeolite Structures28, which we use as a source of initial structures for our data set.
In this first installment, we include properties for 204 out of the currently available 256 zeolite framework types in the database (a total of 226 unique geometries when also considering derived materials). Our descriptor provides the complete optimization trajectories for each system with atomic positions, lattice vectors, atomic gradients and stress tensors at each step. We envision future extensions of the data set to focus on derived geometries, covering structural defects and host-guest interactions.
Methods
Initial zeolite structures are collected from the public Database of Zeolite Structures28 in the Crystallographic Information File (CIF) format, before conversion to the XYZ format with the Atomic Simulation Environment29 (ASE) package. After selection of all systems with fewer than 301 atoms, each is manually filtered by removing redundant atom positions in case of fractional occupancies and adding missing hydrogen atoms where needed. Each structure’s coordinates and cell parameters are energy-minimized with the periodic density functional code BAND30, as implemented in the Amsterdam Modeling Suite31 (AMS). The calculations are performed with the revPBE functional32,33, a ‘Small’ frozen core and the double-ζ polarized (DZP) basis set. Grimme’s D3(BJ) dispersion correction34 is applied to all calculations. Previous research has shown that the selected level of theory can accurately reproduce zeolite geometries, albeit slightly overestimating the Si-O bond length (in the range of 2 pm) and yielding smaller Si-O-X angles (in the range of 5 degrees) when compared to experimental results35,36. At the same time, dispersion-corrected functionals are generally more accurate when describing adsorption processes37,38,39. For the optimization of the initial structures, geometry convergence criteria are left at their default values, namely 0.001 Hartree/Å, 0.00001 Hartree/atom and 0.1 Å for atomic gradients, energy and atomic displacements, respectively. We use a Quasi-Newton optimizer40 in the delocalized coordinates space for the initial optimizations. Cases of problematic convergence are restarted with the FIRE41 optimizer.
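The three convergence thresholds quoted above can be read as a simple per-step test. The following is a minimal sketch of such a test, not the actual logic of the AMS optimizer, which may apply further conditions. The function name and its arguments are hypothetical, and the energy threshold is assumed to scale with the number of atoms, as the unit Hartree/atom suggests.

```python
# Thresholds as quoted in the text (the defaults used for this data set).
GRADIENT_TOL = 1e-3  # Hartree/Å, largest nuclear gradient component
ENERGY_TOL = 1e-5    # Hartree per atom, energy change between two steps
STEP_TOL = 0.1       # Å, largest atomic displacement between two steps


def is_converged(max_gradient, energy_change, n_atoms, max_step):
    """Simplified check: all three criteria must hold at the same step.

    Assumption: the energy criterion is scaled by the number of atoms.
    """
    return (
        abs(max_gradient) < GRADIENT_TOL
        and abs(energy_change) < ENERGY_TOL * n_atoms
        and abs(max_step) < STEP_TOL
    )


# Illustrative numbers for a hypothetical 72-atom cell.
converged = is_converged(5e-4, 1e-4, 72, 0.01)
not_converged = is_converged(2e-3, 1e-4, 72, 0.01)
```

In this illustration the first call passes all three tests, while the second fails on the gradient alone.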
Data Records
The data are made available at the Materials Cloud Archive42. Each system’s trajectory is stored in an individual NumPy43 .npz file. Table 1 describes the data types held in each file, which stores the complete geometry optimization trajectory, including atomic coordinates, system energies, nuclear gradients, lattice vectors and stress tensors for each geometry optimization step. Entries at the first position correspond to the input structure; the last position holds the data for the final, optimized structure. Hirshfeld partial charges44 are provided for the final (optimized) geometries. Atomic coordinates and lattice vectors are stored in ångström; all other properties are stored in atomic units.
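A short sketch of how such a file might be read with NumPy is given below. The array names `energy`, `coords` and `lattice` are placeholders chosen for illustration; the actual keys are those listed in Table 1. To keep the example self-contained, a small stand-in archive is built in memory instead of opening a file from the repository.

```python
import io

import numpy as np

HARTREE_TO_EV = 27.211386245988  # CODATA 2018 conversion factor

# Stand-in trajectory with 3 steps and 2 atoms (synthetic values).
# The key names are hypothetical; consult Table 1 for the real ones.
n_steps, n_atoms = 3, 2
buffer = io.BytesIO()
np.savez(
    buffer,
    energy=np.array([-1.00, -1.02, -1.03]),              # Hartree
    coords=np.zeros((n_steps, n_atoms, 3)),              # ångström
    lattice=np.tile(np.eye(3) * 10.0, (n_steps, 1, 1)),  # ångström
)
buffer.seek(0)

# np.load accepts a path or a file-like object and returns an NpzFile.
with np.load(buffer) as data:
    keys = sorted(data.files)
    energy = data["energy"]
    lattice = data["lattice"]

# Index 0 is the input structure, index -1 the optimized structure.
relaxation_ev = (energy[-1] - energy[0]) * HARTREE_TO_EV
```

For a file from the repository, the in-memory buffer would be replaced by the path to the corresponding .npz file.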
Technical Validation
The complete data set includes geometry optimizations of 226 systems, resulting in a total of 32550 geometries. System sizes range between 15 and 334 atoms (mean: 126). We illustrate the convergence of all reference calculations in Fig. 1, showing that all optimized systems are well within the defined convergence criteria. Elemental occurrences in the data set are listed in Table 2. Si-O and Si-Si distances as well as Si-O-Si angles are presented in Fig. 2 as the most prominent geometrical descriptors. As most of the initial structures from the IZA database are idealized geometries45, a sharp mean for the Si-O bond distance can be observed at roughly 161 pm (Fig. 2a, blue histogram). Long tails in the distribution vanish and the mean is shifted towards approximately 164 pm when considering geometry-optimized structures (Fig. 2a, orange histogram). Considering the Si-O-Si angles, a slight shift towards smaller values is observed (mean of 149 vs. 142 degrees, Fig. 2c). Both effects have been previously reported by Fischer et al.35,36 and are inherent to the selected level of theory. Distributions of the Si-Si distances in the second coordination sphere do not shift significantly when comparing initial and optimized geometries (Fig. 2b). Relative changes in the cell volumes are presented in Fig. 3 as the ratio of each system’s optimized-to-initial volume. Values below 1 translate to a shrinking unit cell as the optimization progresses. Overall, the geometrical descriptors are in good agreement with experimental data46,47,48,49,50,51. Additional averages for bond distances and angles are summarized in Tables 3 and 4, respectively. Distributions of energies, atomic gradients, cell volumes and stress tensors are depicted in Fig. 4. As expected from geometry optimization trajectories, all properties have – with the exception of relative cell volumes – a distinct mean close to zero.
Structures close to the initial input geometries contribute to the relatively high standard deviations. Evaluation of the relative cell volumes shows a shifted distribution, with roughly 76% of all structures having a larger volume than their respective optimized geometry. A detailed overview of all calculated structures, sorted by their IZA three-letter code, the system size and number of iterations is provided in Online Table 1.
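The volume ratios discussed above follow directly from the stored lattice vectors, since the cell volume is the absolute value of the determinant of the 3×3 lattice matrix. The sketch below uses a synthetic, slowly shrinking cubic cell rather than data from the set. Whether the final structure itself is counted in the reported percentage is not stated, so excluding it here is an assumption.

```python
import numpy as np


def cell_volumes(lattices):
    """Volumes in Å^3 for lattice matrices of shape (n_steps, 3, 3)."""
    lattices = np.asarray(lattices, dtype=float)
    # np.linalg.det operates on the last two axes of a stacked array.
    return np.abs(np.linalg.det(lattices))


# Synthetic trajectory: a cubic cell relaxing from 10.0 Å to 9.9 Å.
edge_lengths = np.array([10.0, 9.95, 9.9])
lattices = np.array([np.eye(3) * a for a in edge_lengths])

volumes = cell_volumes(lattices)

# Ratio of optimized to initial volume (values below 1 mean shrinkage).
optimized_to_initial = volumes[-1] / volumes[0]

# Volume of every step relative to the optimized cell of the same system.
relative_to_final = volumes / volumes[-1]

# Share of non-final steps with a larger volume than the optimized cell.
fraction_larger = np.mean(relative_to_final[:-1] > 1.0)
```

The same functions apply unchanged to triclinic cells, because the determinant does not assume orthogonal lattice vectors.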
Usage Notes
No data points were filtered as outliers with regard to the distributions of chemical properties (see Fig. 4). Consecutive structures from the same optimization trajectory will be autocorrelated. The data repository provides an interactive plotting script, displaying the system energy, maximum absolute component of the nuclear gradients and the cell volume at every iteration step for each structure. This requires the Bokeh52 (v. 2.3.1) package for Python to be installed. SHA-1 hash sums are provided for each file to guarantee data integrity, as well as an example input script for a calculation with BAND. Naming conventions: Derived materials are referred to by their IZA three-letter code, e.g. H-EU-12 is tabulated as ETL_0. Leading non-alphabetical characters have been removed, e.g. *-ITN is tabulated as ITN.
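A downloaded file can be checked against its published SHA-1 sum with the Python standard library alone. The helper names below are hypothetical, and the format in which the repository lists the hash sums is not described here, so the expected value is assumed to be available as a plain hexadecimal string.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals the end of the file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's SHA-1 digest with the expected hex string."""
    return sha1_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use constant, which is convenient for the larger trajectory files.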
Code availability
The Atomic Simulation Environment29 (v. 3.21.1) and NumPy43 (v. 1.20.1) packages for Python are freely available for download. The Amsterdam Modeling Suite31 (v. 2020.203, r92091) is commercial software, for which a free trial may be requested at www.scm.com.
References
Smith, J. S. et al. Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nature Communications 10 (2019).
Bannwarth, C., Ehlert, S. & Grimme, S. GFN2-xTB—an accurate and broadly parametrized self-consistent tight-binding quantum chemical method with multipole electrostatics and density-dependent dispersion contributions. J. Chem. Theory Comput. 15, 1652–1671 (2019).
Shao, Y., Hellström, M., Mitev, P. D., Knijff, L. & Zhang, C. PiNN: A python library for building atomic neural networks of molecules and materials. Journal of Chemical Information and Modeling 60, 1184–1193 (2020).
Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. Preprint at https://arxiv.org/abs/2102.09844 (2021).
Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
Kondratyuk, N. et al. Performance and scalability of materials science and machine learning codes on the state-of-art hybrid supercomputer architecture. In Voevodin, V. & Sobolev, S. (eds.) Supercomputing, 597–609 (Springer International Publishing, Cham, 2019).
Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1, a data set of 20 million calculated off-equilibrium conformations for organic molecules. Scientific Data 4, 170193 (2017).
Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Scientific Data 7, 134 (2020).
Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data 1, 140022 (2014).
Materials Cloud Archive. https://archive.materialscloud.org/ (2021).
Talirz, L. et al. Materials cloud, a platform for open computational science. Scientific Data 7, 299 (2020).
NOMAD Laboratory. https://nomad-lab.eu/ (2021).
Draxl, C. & Scheffler, M. Nomad: The fair concept for big data-driven materials science. MRS Bulletin 43, 676–682 (2018).
Davis, M. E. & Lobo, R. F. Zeolite and molecular sieve synthesis. Chemistry of Materials 4, 756–768 (1992).
Cundy, C. S. Microwave techniques in the synthesis and modification of zeolite catalysts. a review. Collection of Czechoslovak Chemical Communications 63, 1699–1723 (1998).
Chen, L.-H. et al. Hierarchically structured zeolites: synthesis, mass transport properties and applications. Journal of Materials Chemistry 22, 17381 (2012).
Moliner, M., Martínez, C. & Corma, A. Multipore zeolites: Synthesis and catalytic applications. Angewandte Chemie International Edition 54, 3560–3579 (2015).
Ozekmekci, M., Salkic, G. & Fellah, M. F. Use of zeolites for the removal of H2S: a mini-review. Fuel Processing Technology 139, 49–60 (2015).
Papaioannou, D., Katsoulos, P., Panousis, N. & Karatzias, H. The role of natural and synthetic zeolites as feed additives on the prevention and/or the treatment of certain farm animal diseases: a review. Microporous and Mesoporous Materials 84, 161–170 (2005).
Dehghan, R. & Anbia, M. Zeolites for adsorptive desulfurization from fuels: a review. Fuel Processing Technology 167, 99–116 (2017).
Derouane, E. et al. The acidity of zeolites: concepts, measurements and relation to catalysis: A review on experimental and theoretical methods for the study of zeolite acidity. Catalysis Reviews 55, 454–515 (2013).
Weitkamp, J. Zeolites and catalysis. Solid State Ionics 131, 175–188 (2000).
Corma, A. State of the art and future challenges of zeolites as catalysts. Journal of Catalysis 216, 298–312 (2003).
Treacy, M. M. J., Randall, K. H., Rao, S., Perry, J. A. & Chadi, D. J. Enumeration of periodic tetrahedral frameworks. Zeitschrift für Kristallographie - Crystalline Materials 212, 768–791 (1997).
Treacy, M. M. J. & Foster, M. Atlas of Prospective Zeolite Structures. http://www.hypotheticalzeolites.net/ (2021).
Pophale, R., Cheeseman, P. A. & Deem, M. W. A database of new zeolite-like materials. Phys. Chem. Chem. Phys. 13, 12407–12412 (2011).
Baerlocher, C., McCusker, L. & Olson, D. Atlas of Zeolite Framework Types (Published on behalf of the Structure Commission of the International Zeolite Association by Elsevier, 2007).
Baerlocher, C. & McCusker, L. Database of Zeolite Structures. http://www.iza-structure.org/databases/.
Larsen, A. H. et al. The atomic simulation environment—a python library for working with atoms. Journal of Physics: Condensed Matter 29, 273002 (2017).
te Velde, G. & Baerends, E. J. Precise density-functional method for periodic structures. Phys. Rev. B 44, 7888–7903 (1991).
Rüger et al. Amsterdam Modeling Suite. https://scm.com (2019).
Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Physical Review Letters 77, 3865–3868 (1996).
Zhang, Y. & Yang, W. Comment on “generalized gradient approximation made simple”. Physical Review Letters 80, 890–890 (1998).
Grimme, S., Ehrlich, S. & Goerigk, L. Effect of the damping function in dispersion corrected density functional theory. Journal of Computational Chemistry 32, 1456–1465 (2011).
Fischer, M., Evers, F. O., Formalik, F. & Olejniczak, A. Benchmarking dft-gga calculations for the structure optimisation of neutral-framework zeotypes. Theoretical Chemistry Accounts 135 (2016).
Fischer, M. & Angel, R. J. Accurate structures and energetics of neutral-framework zeotypes from dispersion-corrected dft calculations. The Journal of Chemical Physics 146, 174111 (2017).
Göltl, F., Grüneis, A., Bučko, T. & Hafner, J. Van der waals interactions between hydrocarbon molecules and zeolites: periodic calculations at different levels of theory, from density functional theory to the random phase approximation and møller-plesset perturbation theory. The Journal of Chemical Physics 137, 114111 (2012).
Rehak, F. R., Piccini, G., Alessio, M. & Sauer, J. Including dispersion in density functional theory for adsorption on flat oxide surfaces, in metal—organic frameworks and in acidic zeolites. Physical Chemistry Chemical Physics 22, 7577–7585 (2020).
Stanciakova, K., Louwen, J. N., Weckhuysen, B. M., Bulo, R. E. & Göltl, F. Understanding water—zeolite interactions: on the accuracy of density functionals. The Journal of Physical Chemistry C 125, 20261–20274 (2021).
Swart, M. & Bickelhaupt, F. M. Optimization of strong and weak coordinates. International Journal of Quantum Chemistry 106, 2536–2544 (2006).
Bitzek, E., Koskinen, P., Gähler, F., Moseler, M. & Gumbsch, P. Structural relaxation made simple. Physical Review Letters 97 (2006).
Komissarov, L. & Verstraelen, T. Zeo-1: a computational data set of zeolite structures. Materials Cloud Archive https://doi.org/10.24435/materialscloud:cv-zd (2021).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Hirshfeld, F. L. Bonded-atom fragments for describing molecular charge densities. Theoret. Chim. Acta 44, 129–138 (1977).
Baerlocher, C., Hepp, A. & Meier, W. Dls-76, a fortran program for the simulation of crystal structures by geometric refinement. Institut fur Kristallographie und Petrographie, ETH, Zurich, Switzerland (1978).
Pettifer, R., Dupree, R., Farnan, I. & Sternberg, U. NMR determinations of Si–O–Si bond angle distributions in silica. Journal of Non-Crystalline Solids 106, 408–412 (1988).
Mauri, F., Pasquarello, A., Pfrommer, B. G., Yoon, Y.-G. & Louie, S. G. Si-O-Si bond-angle distribution in vitreous silica from first-principles 29 Si NMR analysis. Physical Review B 62, R4786 (2000).
Wragg, D. S., Morris, R. E. & Burton, A. W. Pure silica zeolite-type frameworks: A structural analysis. Chemistry of Materials 20, 1561–1570 (2008).
Ramdas, S. & Klinowski, J. A simple correlation between isotropic 29 si-nmr chemical shifts and t–o–t angles in zeolite frameworks. Nature 308, 521–523 (1984).
Antao, S. M. Quartz: structural and thermodynamic analyses across the α ↔ β transition with origin of negative thermal expansion (NTE) in β quartz and calcite. Acta Crystallographica Section B Structural Science, Crystal Engineering and Materials 72, 249–262 (2016).
O’Keeffe, M. & Hyde, B. G. On Si–O–Si configurations in silicates. Acta Crystallographica Section B 34, 27–32 (1978).
Bokeh Development Team. Bokeh: Python library for interactive visualization. https://bokeh.pydata.org/en/latest/ (2021).
Acknowledgements
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 814143. T.V. acknowledges funding of the research board of Ghent University. The computational resources (Stevin Supercomputer Infrastructure) and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by Ghent University, FWO and the Flemish Government–department EWI.
Contributions
L.K. designed and performed the study. Both authors wrote the manuscript. T.V. oversaw the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
About this article
Cite this article
Komissarov, L., Verstraelen, T. Zeo-1, a computational data set of zeolite structures. Sci Data 9, 61 (2022). https://doi.org/10.1038/s41597-022-01160-5