
1 Introduction

The recent evolution of Grid technologies has fostered the formation of Virtual Organizations (VO, http://www.egi.eu/community/vos/) and Virtual Research Communities (VRC, http://www.egi.eu/community/vos/vrcs/) within EGI (the European Grid Infrastructure, http://www.egi.eu/). In the molecular science environment this process has led to the formation of the COMPCHEM VO (https://www3.compchem.unipg.it/compchem/) [1] and of the Chemistry, Molecular & Materials Science and Technology (CMMST) VRC [2].

The specific goals of the CMMST VRC are to meet the user requirements for efficient access to and use of high-throughput (HTC) and high-performance (HPC) computing resources, to enable the composition of higher-complexity applications through the sharing of hardware and software [3], and to develop a new collaborative model for carrying out research, grounded on a quality evaluation of the work done for the community [4]. This is achieved by assembling a set of inter-linkable applications useful for building higher-complexity multi-scale computational procedures, by exploiting the tools offered by EGI to support the activities of a distributed computing community, and by further developing applications and tools specific to its partners. Such an articulation is meant to enable the selection of the provided resources (from personal systems to supercomputers) and services (from number crunching to massive data handling on heterogeneous platforms) [5, 6]. The quality-of-service (QoS) parameters of the products provided and of the resources used are also utilized to build a metric on which to ground the rewarding of the work done for the community through a credit acquisition/redemption system.

The most advanced community service offered by CMMST to its members, leveraging the computing platforms accessible through COMPCHEM, is the so-called Grid Empowered Molecular Simulator (GEMS) [7]. GEMS is a distributed workflow that gathers together the codes implemented by the members of the community bearing the complementary competences necessary to assemble the most accurate treatment of the tackled molecular problem. As recently reported on the EGI.eu website under the heading “What happens when molecules collide” (https://www.egi.eu/use-cases/research-stories/), GEMS is the simulator of choice when one needs to compute, accurately and effectively, the efficiency of processes in which molecules collide to react, dissociate, exchange energy and deform. Using GEMS, as will be listed in some detail later, applications to interstellar clouds, atmospheric entry, energy-transfer processes, materials’ properties, renewable-energy utilization, combustion, etc. have been assembled for distributed execution [8,9,10].

In the present paper we discuss recent upgrades of GEMS for its use on distributed and cloud computing infrastructures. In Sect. 2 the structure and workflow of GEMS are briefly reviewed. In Sect. 3 we discuss some issues related to a Cloud OpenStack implementation of GEMS. In Sect. 4 we detail a novel procedure for the automated generation of the PES within this workflow. In Sect. 5 we illustrate calculations on a prototypical (H + H\(_2\)) study case. In Sect. 6 we draw some conclusions and outline perspectives for future work.

2 Structure and Workflow of GEMS

The molecular simulator GEMS has been described in detail elsewhere [5, 7, 11, 12], especially with a focus on its use for the a priori modeling of elementary reactive processes [13,14,15,16]. We thus limit ourselves here to a brief review of its foundations.

GEMS leverages the Grid infrastructure of EGI, on which a synergistic use of CMMST information is made. In particular, GEMS manages a workflow articulated into four modules (Interaction, Fitting, Dynamics, and Statistics) featuring a high degree of interoperability thanks to the definition of the common data formats Q5Cost/D5Cost [17, 18] (see the scheme in Fig. 1). The four blocks are part of a workflow designed to enable the coordinated execution of in-house developed and commercial codes on the distributed platform of the European Grid Infrastructure (EGI) [19] by properly selecting compute resources among the available high-performance computing (HPC) and high-throughput computing (HTC) ones. In particular, the first module (Interaction) deals with the sampling, through high-level electronic-structure calculations, of the potential energy surface (PES) on which a chemical reaction takes place. In the second module (Fitting) an analytic representation of the PES is obtained through fitting or interpolation procedures using the data produced in Interaction. In the third module (Dynamics) dynamical calculations are performed on the PES generated by Fitting. In the fourth module (Statistics), the required statistical averaging is performed on the outcome of Dynamics to obtain estimates of experimental observables such as cross sections and reaction rate coefficients.
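To make this articulation concrete, the following minimal Python sketch chains the four modules in sequence; the function names and the idea of handing a single data object from one stage to the next are illustrative assumptions, not the actual GEMS interfaces or the Q5Cost/D5Cost APIs.

```python
# Minimal sketch of the four-module GEMS pipeline (illustrative only: the
# function names below are assumptions, not the actual GEMS interfaces).

def run_interaction(geometries):
    """Interaction: ab initio sampling of the PES at the given geometries."""
    ...

def run_fitting(sampled_energies):
    """Fitting: analytic representation (fit/interpolation) of the sampled PES."""
    ...

def run_dynamics(pes):
    """Dynamics: dynamical calculations (e.g. quantum scattering) on the PES."""
    ...

def run_statistics(dynamics_output):
    """Statistics: averaging into observables (cross sections, rate coefficients)."""
    ...

def gems_pipeline(geometries):
    energies = run_interaction(geometries)   # Interaction module
    pes = run_fitting(energies)              # Fitting module
    outcomes = run_dynamics(pes)             # Dynamics module
    return run_statistics(outcomes)          # Statistics module
```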

Fig. 1. Scheme of the workflow of the molecular simulator GEMS. Data formats fostering interoperability among the codes used in the various modules are also reported.

GEMS has been successfully used to study the dynamics and compute the reactive probabilities/cross sections and rate coefficients (through the J-shifting approximation [20]) of atom-diatom systems such as H + H\(_2\) [18], Li + FH [16], N + N\(_2\) [21, 22], O + O\(_2\) [23, 24] and OH + CO [25,26,27], for which PESs were already available. However, while electronic-structure and fitting programs are already incorporated into GEMS, only quite recently has an automated procedure for the full generation of the PES, including an optimal choice of the geometries at which ab initio calculations should be carried out, been devised [28] and applied to the study of the astrochemical process C + CH\(^+\) [29, 30].

3 A Cloud OpenStack Implementation of GEMS

GEMS and its blocks are typical distributed computing procedures that can exploit the typical facilities of a cloud environment [31], such as producing, transmitting, archiving and reusing data on pre-existing, configurable resources made available on the Internet by other users, centers and facilities. A characteristic of the cloud is that resources are not configured ad hoc by the providers for a specific user. Rather, they are quickly and conveniently made available to the user out of a pool of shared resources through automated procedures following the user's specific requests. At the end of the application the user releases the resources, which become available for further use. There are three main service typologies in cloud computing [32]:

  • SaaS (Software as a Service) - provision of computer programs installed on a remote server, most often accessed through a web server;

  • DaaS (Data as a Service) - provision of data with modalities typical of local disks;

  • HaaS (Hardware as a Service) - provision of computing services for the processing of data supplied by and returned to the user;

These are the service typologies that the new (cloud) version of GEMS targets: the various blocks of GEMS (and, within each block, its different components) are articulated in such a way that codes developed by different users can be inserted as SaaS and the data they produce can be offered as DaaS. Moreover, when the GEMS procedure includes an interaction with experiments (say, for input to or checks of computer simulations), HaaS is also activated. This differentiates the cloud approach of GEMS, currently being implemented, from its previous grid-computing-oriented one, targeted at the distributed execution of the different components of the simulator. The new cloud implementation of GEMS will also address two other important services:

  • PaaS (Platform as a Service) - execution is delegated to a remote software platform made of various services;

  • IaaS (Infrastructure as a Service) - both virtual and hardware (servers, networking, memory, storage, backup, etc.) resources are made available upon request when needed.

The cloud IaaS provides the user with resources as if they were implemented on “standard” systems (personal servers and peripherals). Within this perspective, an IaaS for GEMS is being implemented using OpenStack for use by the local Chemistry and Physics community; it will be shared among INFN, the University of Perugia and the University of Chieti, and will be used within the School on Open Science Cloud for training the students of ITN-EJD-642294_TCCM (4–10 June 2017).

The resulting cloud-enabled GEMS application is able to automatically select and instantiate a given set of resources (i.e. a number of Virtual Machines) as a function of the molecular system dimension via the OpenStack REST API [33]. Almost all the components of the workflow can be easily distributed in an embarrassingly parallel fashion [34], without the explicit need for high-throughput, very-low-latency network infrastructures such as InfiniBand, or for other kinds of specialized hardware such as GPGPUs. However, future releases may also explore support for Docker containers, which provide an easy and efficient way to exploit specialized hardware (see for instance [35]).
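As an illustration, the following minimal Python sketch boots a set of worker VMs through the standard Nova call for creating servers (POST /servers). The endpoint, token, image and flavor identifiers, and the heuristic mapping the molecular system dimension to the number of VMs are placeholders and assumptions, not the values or the policy of the actual GEMS deployment.

```python
import requests

# Placeholders: in a real deployment these come from Keystone authentication
# and from the image/flavor catalogues of the OpenStack installation.
NOVA_URL = "https://cloud.example.org:8774/v2.1"   # assumed compute endpoint
TOKEN = "KEYSTONE-TOKEN"                           # assumed authentication token
IMAGE_REF = "IMAGE-UUID"                           # assumed GEMS worker image
FLAVOR_REF = "FLAVOR-UUID"                         # assumed VM flavor

def n_workers(n_atoms):
    # Illustrative heuristic only: scale the number of VMs with the number of
    # atom pairs; the actual selection policy is deployment specific.
    return max(1, n_atoms * (n_atoms - 1) // 2)

def boot_workers(n_atoms):
    """Instantiate the worker VMs via the Nova 'create server' REST call."""
    headers = {"X-Auth-Token": TOKEN, "Content-Type": "application/json"}
    server_ids = []
    for i in range(n_workers(n_atoms)):
        body = {"server": {"name": f"gems-worker-{i}",
                           "imageRef": IMAGE_REF,
                           "flavorRef": FLAVOR_REF}}
        r = requests.post(f"{NOVA_URL}/servers", json=body, headers=headers)
        r.raise_for_status()
        server_ids.append(r.json()["server"]["id"])
    return server_ids
```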

The infrastructure where a first cloud-enabled version of GEMS has been deployed is made of three main hardware resources geographically distributed over three research centers, namely the Perugia Physics Department (PPD), the Perugia Chemistry Department (PCD) and the Chieti Pharmacy Department (CPD). The infrastructure is based on the OpenStack Mitaka release. The central controller (the so-called Cloud Master) is installed at the PPD (where all the main components of OpenStack have been deployed and configured), while the Nova Compute Nodes are distributed among the three centers.

4 Automated Generation of the PES

One of the most common approaches to generating a single-valued potential energy surface (PES) for use in dynamics calculations of atom-diatom exchange reactions of the type A + BC \(\rightarrow \) AB + C is to sample the potential energy (the energy of a selected electronic state), computed with high-level ab initio methods, for a finite number of nuclear configurations (geometries) and to fit or interpolate the obtained set of energies with an appropriate analytic formulation. Ideally, ab initio energies should be computed for a large number of geometries on a dense grid sampling the available configuration space and allowing all of the topological features of the PES, such as asymptotic atom-diatom channels, saddles and wells, to be reproduced. However, ab initio calculations are often computationally highly demanding, and some of them might require additional care as they might not have converged to the desired electronic state. Therefore, a wise choice of an as small as possible set of geometries optimally sampling configuration space often turns out to be crucial.

In the following, we shall focus on the two topics of configuration-space sampling (i.e., choosing the set of geometries) and of fitting/interpolating the obtained values.

A. Configuration-space sampling. A common way of sampling the three-atom configuration space for A + BC \(\rightarrow \) AB + C reactions is to use the following internal coordinates:

  • the interatomic distance of the reactant diatomic (\(r_1\))

  • the interatomic distance of the product diatomic (\(r_2\))

  • the angle formed by these two distances (\(\phi \))

and to adopt regular grids defined on these coordinates.

One of us has recently published a more efficient scheme, namely the space-reduced bond-order (SRBO) scheme [28], which introduces a diatom-tailored force-based metric to better sample the configuration space of the diatoms. In the SRBO scheme, use is made of properly defined diatomic bond-order (BO) variables \(n = \exp [-\beta (r - r_\mathrm {e})]\) (with r being the diatom internuclear distance and \(r_\mathrm {e}\) its equilibrium value) where \(\beta \) is relaxed so as to reach a desired ratio f between the sampled attractive (\(0< n < 1\)) and repulsive (\(1< n < \exp [\beta r_\mathrm {e}]\)) regions of the diatom configuration space. A proper tuning of f and the adoption of regular grids in SRBO variables allows for a wise, process-oriented selection of geometries providing a small, most informative set of electronic energies [28].

Whereas in principle one may define the SRBO grid on the whole diatom configuration space, ranging from \(n = 0\) (infinite distance) to \(n = \exp [\beta r_\mathrm {e}]\) (collapsed atoms), it turns out to be useful to introduce two boundary values \(n_\mathrm {min}\) and \(n_\mathrm {max}\) to avoid including points in the highly repulsive regions (not accessible to the dynamics) or in the long-range, weakly attractive regions of the potential. Diatom-tailored values of \(n_\mathrm {min}\) and \(n_\mathrm {max}\) can be obtained from the equilibrium properties (equilibrium distance \(r_\mathrm {e}\), dissociation energy \(D_\mathrm {e}\) and force constant \(k_\mathrm {e}\)) of the considered diatom through a Morse modeling of the potential by setting the parameters \(V_\mathrm {fact}\) and \(V_\mathrm {thrs}\). \(V_\mathrm {fact}\) sets a boundary where the Morse potential is exactly \(V_\mathrm {fact}\) times the dissociation energy. \(V_\mathrm {thrs}\) sets a boundary where the Morse potential has reached the dissociation energy to within a certain threshold. The reader is referred to Ref. [28] for further details on these aspects.
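A minimal numerical sketch of one possible reading of this prescription is given below (the exact definitions are those of Ref. [28] and may differ): the boundaries are obtained from a Morse model \(V(r) = D_\mathrm {e}\,[1 - e^{-a(r - r_\mathrm {e})}]^2\) with \(a = \sqrt{k_\mathrm {e}/2D_\mathrm {e}}\), and the corresponding \(n_\mathrm {min}\) and \(n_\mathrm {max}\) then follow from \(n = \exp [-\beta (r - r_\mathrm {e})]\).

```python
import math

def morse_boundaries(r_e, D_e, k_e, V_fact, V_thrs):
    """Boundary distances from a Morse model of the diatomic potential
    V(r) = D_e * (1 - exp(-a*(r - r_e)))**2,  a = sqrt(k_e / (2*D_e)).
    One possible reading of the V_fact/V_thrs prescription (see Ref. [28]
    for the actual scheme):
      - inner (repulsive) boundary: V(r_rep) = V_fact * D_e, with r_rep < r_e
      - outer (attractive) boundary: V(r_att) = (1 - V_thrs) * D_e, with r_att > r_e
    The bond-order bounds then follow as n = exp(-beta*(r - r_e))."""
    a = math.sqrt(k_e / (2.0 * D_e))
    r_rep = r_e - math.log(1.0 + math.sqrt(V_fact)) / a
    r_att = r_e - math.log(1.0 - math.sqrt(1.0 - V_thrs)) / a
    return r_rep, r_att

# Example with the H2 FCI/VTZ values quoted in Sect. 5 (atomic units):
# morse_boundaries(1.7576, 0.1727, 0.3707, V_fact=2.0, V_thrs=0.001)
```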

Accordingly, the workflow for setting up a one-dimensional SRBO grid for a given diatom is as follows:

  • compute \(r_\mathrm {e}\), \(D_\mathrm {e}\) and \(k_\mathrm {e}\) (this is done through a geometry-optimization calculation) for the diatom in the proper electronic state

  • choose suitable values for f, \(V_\mathrm {fact}\), and \(V_\mathrm {thrs}\) (space-ratio and bounding parameters)

  • get a regular grid in the considered segments of \(n = \exp [-\beta (r-r_\mathrm {e})]\)

The full three-dimensional SRBO grid (on which ab initio calculations should be carried out for studying the reaction A + BC \(\rightarrow \) AB + C) can then be constructed by combining the two one-dimensional SRBO grids for the reactant and product diatoms with a regular grid in the angular coordinate \(\phi \).
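A minimal sketch of the grid construction is given below; \(\beta \), \(n_\mathrm {min}\), \(n_\mathrm {max}\) and the number of points are taken as input (obtained as discussed above), and the function names are illustrative rather than part of GEMS. The mapping back to distances follows directly from the definition \(n = \exp [-\beta (r - r_\mathrm {e})]\), i.e. \(r = r_\mathrm {e} - \ln (n)/\beta \).

```python
import numpy as np

def srbo_grid_1d(r_e, beta, n_min, n_max, n_points):
    """Regular grid in the bond-order variable n = exp(-beta*(r - r_e)),
    mapped back to internuclear distances via r = r_e - ln(n)/beta."""
    n_vals = np.linspace(n_min, n_max, n_points)
    return r_e - np.log(n_vals) / beta

def srbo_grid_3d(r1_vals, r2_vals, phi_vals, symmetric=False):
    """Assemble the (r1, r2, phi) grid from the two one-dimensional SRBO grids
    and a regular angular grid; for A3-type systems the symmetric flag keeps
    only the symmetry-unique pairs with r2 <= r1 (cf. Sect. 5)."""
    return [(r1, r2, phi)
            for phi in phi_vals
            for r1 in r1_vals
            for r2 in r2_vals
            if not symmetric or r2 <= r1]

# Illustrative use (beta, n_min and n_max are assumed values, not those of Ref. [28]):
# r_grid = srbo_grid_1d(r_e=1.7576, beta=1.0, n_min=0.05, n_max=1.5, n_points=10)
# points = srbo_grid_3d(r_grid, r_grid, [180, 150, 120, 90, 60], symmetric=True)
```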

B. Fitting or interpolation. Once the three-dimensional SRBO grid has been set up, ab initio calculations have to be carried out, yielding the energy of the selected electronic state at each point of the SRBO grid. The obtained set of energies is then ready to be either fitted or interpolated with a suitable analytic functional form. Aguado and Paniagua have made available a code (GFIT3C) [36] for fitting a set of two-body and three-body ab initio energies with the Aguado-Paniagua (AP) functional form [37]. Another successful approach is the so-called modified Shepard (local) interpolation [38, 39], whereby the potential energy is written as a weighted sum of second-order Taylor expansions around a set of ab initio energies (note that this requires the second-order derivatives of the electronic energies, so a suitable ab initio method should be chosen).
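The essence of the modified Shepard approach can be conveyed by the following simplified sketch (unsymmetrized and with plain inverse-distance weights; production implementations add permutational symmetry and more refined weighting, see Refs. [38, 39]):

```python
import numpy as np

def modified_shepard(x, points, energies, gradients, hessians, p=2):
    """Simplified modified-Shepard interpolation: the potential at x (a vector
    of internal coordinates) is a normalized inverse-distance weighted sum of
    second-order Taylor expansions around the ab initio data points."""
    weights, taylor = [], []
    for xi, ei, gi, hi in zip(points, energies, gradients, hessians):
        dx = np.asarray(x, dtype=float) - np.asarray(xi, dtype=float)
        d = np.linalg.norm(dx)
        weights.append(1.0 / (d ** (2 * p) + 1e-300))  # guard against d = 0
        taylor.append(ei + gi @ dx + 0.5 * dx @ hi @ dx)
    weights = np.asarray(weights)
    return float(weights @ np.asarray(taylor) / weights.sum())
```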

Summary. The workflow of the procedure devised above can be summarized as follows. Once the desired electronic states have been chosen for the possible atoms, diatoms and the triatom of the A + BC reactive system, the following steps are performed:

  1. Compute the two-body equilibrium properties (\(r_\mathrm {e}\), \(D_\mathrm {e}\), \(k_\mathrm {e}\)) of the reactant and product diatoms

  2. Set f, \(V_\mathrm {fact}\), \(V_\mathrm {thrs}\) and set up two one-dimensional SRBO grids for these diatoms

  3. Set up a regular grid on the angular coordinate \(\phi \) and assemble the full three-dimensional SRBO grid

  4. Compute the two-body and three-body electronic energies at the points of the three-dimensional SRBO grid

  5. Fit or interpolate the computed energies to get an analytic representation of the PES

5 Calculations on the H + H\(_2\) Prototype

To illustrate how the above scheme works on a real case, we report in this section calculations on the H + H\(_2\) prototypical system.

Computational details. Configuration-space sampling was performed according to the SRBO scheme [28] (see later on for details). A computer program for generating one-dimensional SRBO grids starting from the input parameters \(r_\mathrm {e}\), \(D_\mathrm {e}\), \(k_\mathrm {e}\), \(V_\mathrm {fact}\), \(V_\mathrm {thrs}\), and f is made available.

Ab initio calculations were carried out at full configuration-interaction (FCI) level of theory with the MOLPRO package [40] version 2010.1 using Dunning’s correlation-consistent basis sets cc-pVTZ (hereafter VTZ) [41].

The AP fit was carried out using the GFIT3C program [36], employing a 6th-degree polynomial fit for the two-body term and a 9th-degree polynomial fit for the three-body term. The correct permutational symmetry was ensured by setting input parameter INDICE = 1 (specifying an A\(_3\) system as opposed to AB\(_2\) or ABC, with A, B and C labelling the three atoms).

Reactive scattering calculations on the H + H\(_2\) \(\rightarrow \) H\(_2\) + H atom-exchange reaction for total angular momentum \(J = 0\) and total-energy range 0.4–1.4 eV were carried out within a time-independent (TI) hyperspherical-coordinate formalism as implemented in the ABC program [42] by setting input parameters emax = 2.4 eV (maximum internal energy in any channel), jmax = 50 (maximum rotational quantum number in any channel), rmax = 12.0 \(a_0\) (maximum hyperradius) and mtr = 150 (number of log-derivative propagation sectors).

A typical set of input data adopted for the ABC code is given in Table 1.

Table 1. Typical input parameters adopted for the present TI calculations.

  emax = 2.4 eV (maximum internal energy in any channel)
  jmax = 50 (maximum rotational quantum number in any channel)
  rmax = 12.0 \(a_0\) (maximum hyperradius)
  mtr = 150 (number of log-derivative propagation sectors)

Results. For the three-body configuration-space sampling of H\(_3\), an SRBO three-dimensional grid defined on the internal coordinates \(r_1\) and \(r_2\) (two bond distances) and \(\phi \) (the angle formed by the related bonds) was adopted. For the purpose of constructing the one-dimensional SRBO grids on \(r_1\) and \(r_2\), the FCI/VTZ values \(r_e\) = 1.7576 \(a_0\), \(D_e\) = 0.1727 \(E_\mathrm {H}\) and \(k_e = 0.3707\) \(E_\mathrm {H} a_0^{-2}\) were used. The SRBO parameters were set to \(V_\mathrm {fact}\) = 2.0, \(V_\mathrm {thrs}\) = 0.001 and \(f=N_\mathrm {a}/N_\mathrm {r}\), with N\(_\mathrm {a}\) set to 6 and N\(_\mathrm {r}\) set to 3 (i.e. 6 points in the attractive region and 3 in the repulsive region, for a total of 10 points including the equilibrium geometry, which is neither attractive nor repulsive). A regular grid of five values ranging from 180\(^\circ \) (collinear configuration) to 60\(^\circ \) in steps of 30\(^\circ \) was set up for the angular coordinate \(\phi \).

Due to the symmetry of the system, for each angle only \(\frac{(N_\mathrm {r} + N_\mathrm {a} + 1) \times [(N_\mathrm {r} + N_\mathrm {a} + 1) + 1]}{2}\) rather than \((N_\mathrm {r} + N_\mathrm {a} + 1)^2\) grid points were computed (only the points for which \(r_2 \le r_1\)). The FCI/VTZ saddle point (\(r_1\) = \(r_2\) = 1.7576 \(a_0\), \(\phi =180^\circ \)) was also added to the ensemble of points. The corresponding final three-dimensional grid is made of \(N = \frac{(N_\mathrm {r} + N_\mathrm {a} + 1) \times [(N_\mathrm {r} + N_\mathrm {a} + 1) + 1]}{2} \times 5 + 1 =\) 276 points (where the plus one accounts for the just-mentioned H\(_3\) transition-state geometry). For illustrative purposes, the points of the fixed-angle (\(\phi =180^\circ \)) two-dimensional SRBO grid in \(r_1\) and \(r_2\) are depicted in Fig. 2. As apparent from the figure, the distribution of the computed ab initio points indicates a fair coverage of the molecular geometry space by the SRBO grid used for the calculations, with a denser concentration in the high-gradient regions of the potential. The figure also shows the smoothness of the fitted PES and the occurrence of a barrier in the strong-interaction region.
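The point count can be verified with a few lines of Python using the values above:

```python
N_r, N_a, n_phi = 3, 6, 5                          # repulsive/attractive points, angles
n_pairs = (N_r + N_a + 1) * (N_r + N_a + 2) // 2   # symmetry-unique (r1, r2) pairs, r2 <= r1
N_total = n_pairs * n_phi + 1                      # plus the added saddle-point geometry
print(n_pairs, N_total)                            # 55 276
```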

Fig. 2. Scheme of a fixed-angle (\(\phi =180^\circ \)) two-dimensional SRBO grid in \(r_1\) and \(r_2\) used for the three-dimensional H\(_3\) test case. Computed FCI/VTZ points together with the SRBO-AP fitted surface are also shown. The energy zero is set to the bottom of the reactant valley.

Fig. 3. \(J=0\) state-specific (\(v=0\), \(j=0\)) reactive probabilities plotted as a function of the total energy E.

To carry out the dynamical calculations we went through the next module of GEMS, Dynamics. The \(J=0\) state-specific (\(v=0,j=0\)) quantum reactive probabilities for the H + H\(_2\) exchange reaction on the generated PES were calculated on a grid of total-energy values ranging from 0.4 eV to 1.4 eV in steps of 0.01 eV and are compared in Fig. 3 with the results obtained using the popular LSTH PES [43,44,45,46]. A resolution of these state-specific quantities in the product vibrational state (for the vibrational channels \(v'=0\) and \(v'=1\) open in the considered energy range) is also shown in the figure. As apparent from the figure, the two PESs provide qualitatively similar reactive probabilities, with the SRBO-AP PES slightly lowering the threshold and enhancing reactivity in the near-threshold region, while being less reactive at higher energies.

6 Conclusions and Future Work

In this paper the innovative synergistic distributed computing model developed by the COMPCHEM VO and constituting the cornerstone of the CMMST VRC has been adopted for a full-dimensional quantum study of the H + H\(_2\) reaction. For this purpose, the exploitation of a quality-based combined access to HTC and HPC computing resources and of a workflow approach enabling the composition of complementary/competitive codes has allowed the implementation of an efficient computational machinery able to tackle higher-complexity applications, like the fully ab initio simulation of data produced in CMB experiments.

The calculations presented here focus on the benchmark H + H\(_2\) system in order to illustrate how a single-workflow implementation of GEMS can operate on a combined HTC-HPC platform. More complex applications can be assembled using the machinery illustrated in the paper when the single modules of GEMS are used to describe more complex molecular processes, to evaluate more statistically averaged properties, and to compose multiscale simulations of different atomistic granularity. In that case, metaworkflows will be used to combine simpler (atomistic) workflows à la carte. In doing this, particular care has to be taken in handling members’ data resources (ranging from experimental measurements recorded in an individual laboratory to those recorded at large infrastructure facilities) specific to given analysis and simulation applications. These are usually stored in data structures and formats specific to the particular laboratory and will need to be ported to a uniform, open metadata format accompanied by robust ontologies.

For this reason, the next evolution of GEMS towards two-way communication between Computational and Experimental Chemists in the cloud will have to be implemented, so as to facilitate the cooperative use of molecular simulations whose results can be checked against experiments, and whose outcomes in turn fuel the design and running of simulations that complement experiments to produce new research achievements. To this end, a data management support that facilitates re-usability, and consequently reproducibility, of the handled information using metadata will have to be developed. Experimental researchers produce a large volume of primary data in many different formats, which makes it difficult to share data among researchers. Furthermore, storing and sharing primary experimental data might not be meaningful per se, because such data do not contain information about how they were obtained and processed. Accordingly, a system of metadata attached to primary data (particularly provenance information) will have to be designed and implemented in GEMS in the future.