1 Introduction

Graphene is an atomically thin layer of carbon atoms that self-organize in a honeycomb lattice. For several years now, such layers have been routinely manufactured in laboratories worldwide. Electron transport in graphene, together with its other properties, is investigated intensively and forms one of the hottest fields in condensed matter and materials science. The driving force comes from prospective applications in information and nanotechnology, e.g., as field-effect transistors or spin quantum bits. In addition, chemical functionalization has triggered proposals for further technologies such as hydrogen storage or sensor applications.

In most applications, the electron transport properties of graphene-based films are crucial. Despite intensive research, these transport properties are far from being fully understood, especially concerning the influence of disorder as induced, e.g., by functionalization with adsorbates. In computational work, disorder effects have so far mostly been treated at the level of tight-binding calculations that incorporate aspects of π-electron physics; genuine ab initio treatments of transport in functionalized molecular films were not yet available. This is partially because electronic structure calculations based on density functional theory (DFT) for large, sufficiently representative flakes are computationally highly demanding. They can be treated successfully only by the most advanced ab initio packages. An additional difficulty is that such packages are often not capable of transport calculations in film geometries.

Recently, the FHI-aims code [1] has been complemented with a versatile transport module, AitransS [2–5], which implements the non-equilibrium Green’s function (NEGF) formalism and uses DFT results as input. It enables computing ab initio transport properties not only of single molecules but also of more extended structures such as molecular flakes, films, or nanotubes.

In this work, we study the effect of disorder (e.g., hydrogen adsorbates) and chemical functionalization on the conductance of graphene flakes. In contrast to the presently available tight-binding treatments, our DFT approach includes the effects of impurity charging and screening as well as of lattice distortion, i.e., strain. As a consequence, we can study the cross-talk between different impurities, which is not possible with present tight-binding implementations.

The sizes of computationally tractable graphene flakes reach 2500 carbon atoms. This is sufficient to observe the qualitative effects of quantum interference and mesoscopic fluctuations in such systems. Reaching such flake sizes was possible only because of the excellent DFT performance of the FHI-aims package on the HLRS systems. In addition, we improved the parallel scaling of the transport module AitransS.

The structure of the paper is as follows: in Sect. 2.1, we describe our transport method as implemented in the code, and we comment on the employed parallelization techniques (Sect. 2.2). In Sect. 3.1, the key findings are summarized. For reference, we provide the numerical parameters of a typical calculation in Sect. 3.2. Next, the performance achieved on the HLRS computing systems is presented, first for the DFT simulation (Sect. 3.3), then for the transport part (Sect. 3.4). In Sect. 3.5, we present two optimizations that were necessary to achieve this performance.

2 Methods

Our transport calculations are performed in a two-step procedure. First, we perform a DFT calculation with the all-electron code FHI-aims, including structural relaxation. Second, we perform a transport calculation using AitransS with the Kohn–Sham orbitals from the preceding DFT step as input. We summarize the transport calculation in the following and point out where parallelization is necessary. The implementations of FHI-aims and AitransS are described in [1] and [5], respectively.

2.1 Transport Calculations

We extract the Kohn–Sham (KS) Green’s function \(\mathbf{G}_{0}^{\text{KS}}(E) = \left (E\mathbb{1} -\mathbf{H}^{\text{KS}} +\mathrm{ i}0\right )^{-1}\) for a finite disordered graphene flake from the DFT calculations. To model the infinite extension of the system in current flow direction, we compute the self-energies \(\boldsymbol{\varSigma }_{\text{L/R}}\) using absorbing boundary conditions [5, 6]. The self-energies \(\boldsymbol{\varSigma }_{\text{L/R}}\) represent the influences of the leads. The resulting Green’s function

$$\displaystyle{ \mathbf{G}(E)^{-1} = \mathbf{G}_{0}^{\text{KS}}(E)^{-1} -\boldsymbol{\varSigma}_{\text{L}}(E) -\boldsymbol{\varSigma}_{\text{R}}(E) }$$
(1)

describes the propagation of KS particles in the device in the presence of the leads and is used to calculate the transmission \(\mathcal{T}(E) = \mathrm{Tr}\{\boldsymbol{\varGamma}_{\text{L}}\,\mathbf{G}\,\boldsymbol{\varGamma}_{\text{R}}\,\mathbf{G}^{\dagger}\}\). Here, \(\boldsymbol{\varGamma}_{\text{L/R}}\) denote the anti-Hermitian parts of the self-energies, i.e., \(\boldsymbol{\varGamma}_{\text{L/R}} = \mathrm{i}(\boldsymbol{\varSigma}_{\text{L/R}} -\boldsymbol{\varSigma}_{\text{L/R}}^{\dagger})\). They account for the level broadening in the finite graphene flake due to the coupling to the leads.

Using the retarded Green’s function, we calculate the non-equilibrium Keldysh Green’s function \(\mathbf{G}^{<} = \mathrm{i}\,\mathbf{G}\big[f_{\text{L}}\boldsymbol{\varGamma}_{\text{L}} + f_{\text{R}}\boldsymbol{\varGamma}_{\text{R}}\big]\mathbf{G}^{\dagger}\). Assuming that the scattering states originating from the left/right lead are occupied/unoccupied, the Keldysh Green’s function \(\mathbf{G}^{<}(E)\) simplifies to

$$\displaystyle{ \mathbf{G}^{<}(E) = \mathrm{i}\,\mathbf{G}(E)\,\boldsymbol{\varGamma}_{\text{L}}(E)\,\mathbf{G}^{\dagger}(E)\,. }$$
(2)

Here, we assumed zero temperature and an energy inside the voltage window, so that the lead occupations are \(f_{\text{L}} = 1\) and \(f_{\text{R}} = 0\).
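To make the structure of these formulas concrete, the following numpy sketch evaluates Eqs. (1) and (2) and the transmission for a single energy point. It is a minimal illustration under the assumption that the Hamiltonian and the lead self-energies are given as dense matrices in an orthonormal basis; it is not the AitransS implementation.

```python
import numpy as np

def negf_quantities(E, H, sigma_L, sigma_R, eta=1e-6):
    """Minimal NEGF sketch for one energy point (not the AitransS code).

    H        : (N, N) Kohn-Sham Hamiltonian in an orthonormal basis
    sigma_L/R: (N, N) lead self-energies, nonzero only in the contact regions
    Returns the transmission T(E) and the Keldysh Green's function G<(E)
    for f_L = 1, f_R = 0 (zero temperature, energy inside the bias window).
    """
    N = H.shape[0]
    # Retarded Green's function, Eq. (1); eta plays the role of the +i0 shift.
    G = np.linalg.inv((E + 1j * eta) * np.eye(N) - H - sigma_L - sigma_R)
    # Level broadenings: Gamma = i (Sigma - Sigma^dagger)
    gamma_L = 1j * (sigma_L - sigma_L.conj().T)
    gamma_R = 1j * (sigma_R - sigma_R.conj().T)
    # Transmission: T(E) = Tr{ Gamma_L G Gamma_R G^dagger }
    T = np.trace(gamma_L @ G @ gamma_R @ G.conj().T).real
    # Keldysh Green's function, Eq. (2): G< = i G Gamma_L G^dagger
    G_less = 1j * G @ gamma_L @ G.conj().T
    return T, G_less
```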

We use orthonormal basis functions \(\tilde{\varphi}_{i}(\mathbf{r})\) [constructed via Löwdin orthogonalization from the DFT basis functions \(\varphi_{i}(\mathbf{r})\)] to transform the Keldysh Green’s function into real space:

$$\displaystyle{ \mathbf{G}^{<}(\mathbf{r},\mathbf{r}',E) = \sum_{ij}\tilde{\varphi}_{i}(\mathbf{r})\,\mathbf{G}_{ij}^{<}(E)\,\tilde{\varphi}_{j}^{\ast}(\mathbf{r}')\,. }$$
(3)

The current density (per spin and energy) is then expressed as

$$\displaystyle{ \mathbf{j}(\mathbf{r},E) = \frac{1}{2\pi}\,\frac{\hbar}{2m}\lim_{\mathbf{r}'\rightarrow\mathbf{r}}\left(\boldsymbol{\nabla}_{\mathbf{r}'} -\boldsymbol{\nabla}_{\mathbf{r}}\right)\mathbf{G}^{<}(\mathbf{r},\mathbf{r}',E)\,. }$$
(4)
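Combining Eqs. (3) and (4), the gradient difference acts only on the basis functions, so the current density can be assembled from tabulated basis values and gradients. The following numpy sketch does this on a set of grid points; it is an illustration only, and the arrays phi and grad_phi holding the tabulated basis values and gradients are hypothetical inputs, not FHI-aims data structures.

```python
import numpy as np

def current_density(G_less, phi, grad_phi, hbar=1.0, m=1.0):
    """Current density j(r, E) from Eqs. (3)-(4); a sketch, not AitransS.

    G_less  : (N, N) Keldysh Green's function in the orthonormal basis
    phi     : (N, M) basis function values  phi~_i(r_k) on M grid points
    grad_phi: (N, M, 3) basis function gradients at the grid points
    """
    # lim_{r'->r} (grad_{r'} - grad_r) phi_i(r) G<_ij phi_j*(r')
    #   = sum_ij G<_ij [ phi_i(r) grad phi_j*(r) - phi_j*(r) grad phi_i(r) ]
    A = phi.T @ G_less                      # A_kj = sum_i phi_i(r_k) G<_ij
    term1 = np.einsum('kj,jkx->kx', A, grad_phi.conj())
    B = G_less @ phi.conj()                 # B_ik = sum_j G<_ij phi_j*(r_k)
    term2 = np.einsum('ikx,ik->kx', grad_phi, B)
    j = (1.0 / (2.0 * np.pi)) * (hbar / (2.0 * m)) * (term1 - term2)
    return j.real  # (M, 3) current density vectors (per spin and energy)
```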

2.2 Implementation: Parallelization Efforts

Because the computational demands of the DFT and transport steps scale very similarly with flake size (see Sect. 3.5), neither step represents the single bottleneck of the whole simulation. Thus, we also optimized and parallelized the transport module AitransS to enable the study of large flakes.

The AitransS code uses two types of parallelization (a minimal sketch of the energy-point distribution is given below):

Shared-memory parallelization, in which several threads within one process have access to the same data, i.e., the same energy point E. We use threaded LAPACK as implemented in the Intel MKL for matrix operations and OpenMP for the parallelization of loops over real space, etc.

Distributed-memory parallelization, in which each process works on separate data sets for possibly different energy points E. The Message Passing Interface (MPI) is used for work balancing, e.g., for distributing different energy points to different processes and for cutting up and distributing the real-space grid.
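As an illustration of the distributed-memory layer, the following mpi4py sketch distributes energy points round-robin over MPI ranks and collects the partial transmissions on the root process. This is a minimal Python illustration, not the actual implementation; transmission(E) stands for a hypothetical wrapper around a single-energy NEGF evaluation such as the sketch in Sect. 2.1.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Energy grid for the transmission scan (example values).
energies = np.linspace(-1.0, 1.0, 128)

# Round-robin distribution: rank r handles points r, r+size, r+2*size, ...
my_energies = energies[rank::size]
# transmission(E) is a hypothetical single-energy NEGF evaluation.
my_T = np.array([transmission(E) for E in my_energies])

# Gather the partial results on rank 0 and restore the original ordering.
all_T = comm.gather((rank, my_T), root=0)
if rank == 0:
    T = np.empty_like(energies)
    for r, T_r in all_T:
        T[r::size] = T_r
```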

A workflow diagram of the parallelized modules is shown in Fig. 1.

Fig. 1
figure 1

Workflow diagram of the transport module AitransS . The two most common module sequences are depicted in blue (transmission calculation) and red (current density and magnetic field calculation). Both sequences include the reconstruction of the Kohn–Sham Hamiltonian and the calculation of the self-energies (purple). Left: timing symbols used in the following for performance analysis, cf. Figs. 4 and 5. Right: overview of the parallelization techniques used in each module

3 Results

The results of this work are twofold. First, we summarize the physics results for the investigated graphene flakes in Sect. 3.1. Second, we present our development of computational techniques: (a) we demonstrate that the employed codes scale sufficiently well up to a few thousand cores, and (b) we discuss the improvements to our transport module that were necessary to achieve good scalability.

3.1 Overview of the Key Findings

In preparation for this work, we calculated the local current density response j(r, E) of pristine armchair graphene nanoribbons (AGNRs) of varying width [7]. We observe pronounced current patterns, which we call “streamlines”, with a threefold periodicity in the ribbon width. They arise as a consequence of quantum confinement in the direction transverse to the current flow. Neighboring streamlines are separated by stripes of almost vanishing flow. This effect can be explained in a tight-binding toy model. The response of the current density to adatoms is very sensitive to their placement: adatoms placed within the current filaments lead to strong backscattering, while in other regions adatoms have almost no impact.

Then, we switched to larger graphene flakes, calculating the local current density j(r, E) in the presence of hydrogen adsorbates [8]; an example with 5 % hydrogen is shown in Fig. 2. We discovered pronounced local current patterns, ring currents (current vortices), that go along with orbital magnetism. Importantly, the magnitude of the ring currents can exceed the average transport current by orders of magnitude. The associated magnetic fields exhibit drastic fluctuations with large field gradients reaching up to \(1\,\text{T}\,\text{nm}^{-1}\,\text{V}^{-1}\). These observations are relevant for spin relaxation in systems with very weak spin–orbit interaction, e.g., organic semiconductors. In such systems, spin relaxation induced by bias-driven orbital magnetism competes with the hyperfine interaction; both appear to be of similar strength. Based on our calculations, we propose an NMR-type experiment combined with a dc-transport setup to observe the spatial fluctuations of the induced magnetic fields. We studied several impurity concentrations and different graphene flake sizes. The described physics appears to be independent of both and should therefore also be present in larger mesoscopic samples, which are more common in experiments.

Fig. 2
figure 2

Local current density response (integrated over the out-of-plane direction) in an AGNR41 (24×41) with 5 % hydrogen adsorbates. The current density exhibits very strong mesoscopic fluctuations, exceeding the average current by more than two orders of magnitude on the logarithmic color scale. The current density is plotted relative to the average current density. The plot shows the current amplitude (color), the current direction (arrows), carbon atoms (grey crosses), and hydrogen atoms (red crosses). Some arbitrary current paths (black lines) are drawn into the plot for illustration

We also studied the statistical distribution of the current density in the graphene flakes [9]. The distribution function of the current density follows a log-normal distribution over a wide range. Its typical value is larger than the average current. Therefore, there are always significant contributions to the current density which do not contribute to the conductance, i.e., they form current rings. This work is still ongoing, but so far these features appear remarkably stable over a wide range of impurity concentrations (5–30 %) and system sizes (up to 2500 carbon atoms), and they even survive an averaging over several scattering states, e.g., when a finite bias voltage is applied to the system.

3.2 Typical Numerical Parameters

In Table 1, we list the numerical parameters of a typical calculation performed for hydrogenated graphene flakes. First, the graphene flake is structurally relaxed using FHI-aims until the remaining forces drop below \(10^{-2}\,\,\text{eV}/\text{\AA }\). This is, by far, the most expensive part of the calculation. Then, a final DFT run for the relaxed structure is performed and the output written to disk. This output is used by AitransS to perform a wide scan of the transmission function (the self-energies \(\boldsymbol{\varSigma }\) are pre-calculated since they depend only on the system size, not on the impurity configuration). Eventually, a few interesting energy points are selected from the transmission function, and the current density is calculated at those points.

Table 1 An overview of a typical calculation performed on Cray XC40 (Hornet) for a graphene flake (with 1312 carbon atoms) whose central 24 × 41 carbon atoms have been functionalized with hydrogen (compare with Fig. 2)

3.3 Achieved DFT Performance on Cray XE6 (Hermit) and Cray XC40 (Hornet)

Within a test project on Hermit, for which a budget of 40,000 core hours was granted, we carried out code porting, tuning, and performance measurements for DFT calculations of graphene to ensure that the FHI-aims code scales as required for the completion of the project goals and that the envisaged computing system Hermit was appropriate for the planned production simulations.

The scalability of the FHI-aims code was demonstrated by running the DFT module for pure planar graphene flakes with 170 (10×17), 345 (15×23), 735 (21×35), and 1500 (30×50) carbon atoms, in which the dangling bonds at the boundaries were saturated with hydrogen atoms. Each atom is represented by a tier-1 basis set, i.e., 14 basis functions per C atom and 5 per H atom. The total number of basis functions N is 2560, 5095, 10,675, and 21,550, respectively.

Figure 3 (upper plot) presents the scaling of the code, where the data of each set (color) are plotted as the reduced speedup \(T_{n}/T_{P}\). Here, \(T_{P}\) is the wall time of the calculation on P processor cores and \(T_{n}\) is the wall time on the minimum number of cores n on which the job finishes within the wall-time limit (24 h). Thus, n is 8, 16, 32, or 256, depending on the graphene flake size, i.e., each color in Fig. 3 corresponds to a different n. This normalization with n also allows an easy comparison of the speedups for the four graphene flake sizes. The data of each set have been fitted to Amdahl’s law (solid lines), showing that the serial fraction of the work α is always very small, around 1 ‰ ( = 0.001), and that it furthermore decreases with increasing graphene flake size.
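For reference, such an Amdahl fit takes only a few lines. The following sketch fits wall times to Amdahl’s law \(T_{P} = T_{1}\left[\alpha + (1-\alpha)/P\right]\); the timing values are placeholders, not the measured HLRS data.

```python
import numpy as np
from scipy.optimize import curve_fit

def amdahl(P, T1, alpha):
    """Amdahl's law: wall time on P cores with serial fraction alpha."""
    return T1 * (alpha + (1.0 - alpha) / P)

# Placeholder measurements (cores, wall time in s); not the actual HLRS data.
P = np.array([8, 16, 32, 64, 128, 256], dtype=float)
T = np.array([980.0, 495.0, 252.0, 130.0, 69.0, 39.0])

(T1, alpha), _ = curve_fit(amdahl, P, T, p0=(8000.0, 1e-3))
print(f"serial fraction alpha = {alpha:.2e}")
```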

Fig. 3
figure 3

Strong scaling of FHI-aims for pure graphene flakes on the CRAY XE6 (Hermit) cluster at HLRS (upper plot). Scaling of FHI-aims for pure graphene flakes of different sizes on the CRAY XE6 (Hermit) cluster at HLRS (weak scaling, lower plot)

Figure 3 (lower plot) shows the scaling for graphene using the same data as in Fig. 3 (upper plot), now represented by the speedup \(S(P) =\tilde{ T}_{1}/T_{P}\) achieved on P cores, where \(\tilde{T}_{1}\) is the hypothetical wall time the same calculation would take on a single core. Note that for a fixed number of cores, e.g., the set of data points for P = 256, the speedup improves as the problem size increases, approaching the ideal speedup (Gustafson’s law).

3.4 Parallelization of the Transport Module AITRANSS and Achieved Performance

In Fig. 4, we present detailed measurements of the performance of our transport code for realistic system sizes. In the tests, we distinguish between calculation of the transmission and of local observables. Transmission calculations (\(\mathcal{T} (E)\)) also include the density of states ρ(E). Local observables are the local current density j(r, E) but also its divergence \(\boldsymbol{\nabla }\boldsymbol{\cdot }\mathbf{j}(\mathbf{r},E)\), the non-equilibrium density n(r, E) and the local density of states ρ(r, E).

Fig. 4
figure 4

\(\mathcal{O}\!\left (N^{n}\right )\)-scaling: performance measurements with varying system size for a transmission calculation (upper plot) and a local observable calculation (lower plot) at a fixed number of CPU cores (P = 32). Symbols: number of basis functions N, number of energy points \(N_{E}\), number of CPU cores P

In the upper panel, the wall time for calculating \(N_{E} = 128\) transmission and density-of-states points for hydrogen-saturated AGNRs (the same as in Sect. 3.3) is plotted as a function of the number of basis functions N. The total time \(T_{\mathcal{T}}(N)\) is divided into four groups (\(t_{\text{H}}\), \(t_{\text{diag}}\), \(t_{\boldsymbol{\varSigma }}\), \(t_{\text{G}}\), cf. Fig. 1). Because the calculation of the self-energy representing the leads (via 200 iterations of the decimation technique [5]) depends directly on the number of lead basis functions \(N_{\text{lead}}\) (and only indirectly on the number of basis functions N of the device region), it is plotted separately. The main effort of a transmission calculation is the construction of these lead self-energies; it therefore makes sense to save them to hard disk if several impurity configurations sharing the same leads are processed. Then, the main effort is spent on constructing the Green’s function (\(t_{\text{G}}\)). The diagonalization (\(t_{\text{diag}}\), see Sect. 3.5.3) does not contribute significantly to the overall effort for the considered system sizes since it is performed only once and not for every energy point.

In the lower panel of Fig. 4, the same quantities are plotted for a single (\(N_{E} = 1\)) current density calculation. The number of grid points used is proportional to the system size (one grid point every 0.2 Å, 31 points in the z-direction). We first note that the total effort is dominated by the discretization of the local quantities (\(t_{(\mathbf{r})}\)). By exploiting the locality of the basis functions, we were able to optimize the calculation of local observables to scale below \(N^{2}\).Footnote 1

In Fig. 5, we discuss the parallelization efficiency of the transmission and current density calculations for a fixed system size (21×35 = 735 carbon atoms). The speedup S for many MPI processes relative to a single process is shown and compared to Amdahl’s law, \(T_{N_{ \text{MPI}}} = T_{1}\left [\alpha +(1-\alpha )/N_{\text{MPI}}\right ]\) with α ≈ 1 %. We see good scalability for the total wall time, the self-energy construction, and the Green’s function construction (\(T_{\mathcal{T}}\), \(t_{\boldsymbol{\varSigma }}\), \(t_{\text{G}}\)). The reconstruction of the KS Hamiltonian does not speed up since only the first MPI process is involved (cf. Fig. 1). For the diagonalization, we even observe that using two processes (with ScaLAPACK) is slower than using a single process (with LAPACK). Therefore, our code switches to ScaLAPACK only from 4 MPI processes on.

Fig. 5
figure 5

MPI parallelization: speedup for a fixed system size (735 (21×35) carbon atoms) and a fixed number of CPU cores per MPI process (p = 8). Left: speedup for transmission calculations. Right: speedup for local observable calculations. Symbols are the same as in Fig. 4. Here, p is the number of CPU cores per MPI process and \(N_{\text{MPI}}\) is the number of MPI processes (\(P = p\,N_{\text{MPI}}\))

3.5 Optimization of the Transport Module AitransS

3.5.1 Overview: Code Optimization

In the following sections, the two most important optimizations applied during our code development are presented. Their performance impact is summarized in Table 2. Note that both optimizations have a larger impact for larger systems.

Table 2 Increase in wall time when specific optimizations are removed from the code

The optimization “SpaceBlocks” is vital: without it, calculations for systems with more than 1000 atoms become unfeasible. The optimization “MatrixInverse” is also quite handy because it allows quick transmission scans before turning to the more expensive current density calculations.

3.5.2 SpaceBlocks: Dividing Space into Blocks

Here, we discuss how to evaluate the formulas for space-dependent local quantities such as the current density

$$\displaystyle{ \mathbf{j}(\mathbf{r},E) = \frac{1}{2\pi}\,\frac{\hbar}{2m}\sum_{ij}\mathbf{G}_{ij}^{<}(E)\left[\tilde{\varphi}_{i}(\mathbf{r})\,\boldsymbol{\nabla}\tilde{\varphi}_{j}^{\ast}(\mathbf{r}) -\tilde{\varphi}_{j}^{\ast}(\mathbf{r})\,\boldsymbol{\nabla}\tilde{\varphi}_{i}(\mathbf{r})\right], }$$

(5)

which follows from inserting Eq. (3) into Eq. (4).

In principle, the double sum runs over all basis functions of the underlying DFT simulation. FHI-aims uses numerically tabulated atom-centered orbitals (NAOs), i.e., localized basis functions. When restricting the spatial point r to a small region, most basis functions vanish inside this region. (These basis functions are, of course, nonzero elsewhere.) This locality of the basis set can be exploited in the following way.

First, we define \(r_{\text{max}}\) as the maximal radial extent of all basis functions (i.e., all basis functions are zero at points further away from their central atom than \(r_{\text{max}}\)). Second, the 3D space is divided into little cubes of edge length \(r_{\text{max}}/n\), with n being an integer (see Fig. 6 for an example). When performing the sum of Eq. (5) for any spatial point r inside the blue shaded area, the only basis functions that need to be taken into account are those centered on atoms in the green (and blue) shaded area. All other basis functions do not contribute in this area.

Fig. 6
figure 6

Dividing space into 156 (13×12×1) non-overlapping blocks, exemplified for a graphene flake with 398 atoms (2D model with n = 2, \(r_{\text{max}} = 5.05\) Å)

Hence, we divide space into cubes of edge length \(r_{\text{max}}/n\) and distribute them to separate MPI processes. The integer n is chosen such that every MPI process works on at least five blocks, to alleviate the load imbalance caused by differently populated blocks. Then, for each inner (blue) block, we restrict the Green’s function \(\mathbf{G}^{<}\) to the basis functions localized on atoms in the extended (green) block; a minimal sketch of this bookkeeping follows below.
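The index bookkeeping behind this blocking is simple. The following numpy sketch (an illustration under the assumptions stated here, not the AitransS implementation) assigns atoms to cubes and collects, for each inner block, the indices of all atoms whose basis functions can reach it:

```python
import numpy as np

def space_blocks(atom_pos, r_max, n):
    """Assign atoms to cubes of edge r_max/n; a sketch of the SpaceBlocks idea.

    atom_pos: (N_atoms, 3) atomic positions
    Returns a dict mapping each occupied block index to
    (atoms inside the block, atoms within r_max of the block),
    i.e., the 'blue' and 'green' index sets of Fig. 6.
    """
    edge = r_max / n
    origin = atom_pos.min(axis=0)
    cell = np.floor((atom_pos - origin) / edge).astype(int)  # block index per atom

    blocks = {}
    for key in {tuple(c) for c in cell}:
        inner = np.all(cell == key, axis=1)
        # 'Green' region: a basis function of radius r_max = n*edge can reach
        # the inner block only if its atom is at most n+1 block layers away.
        near = np.all(np.abs(cell - key) <= n + 1, axis=1)
        blocks[key] = (np.where(inner)[0], np.where(near)[0])
    return blocks
```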

3.5.3 MatrixInverse: Calculating the Green’s Function Inverse

As the self-energies can be read from hard disk, the most expensive part of a transmission calculation is the matrix inversion in the computation of the retarded Green’s function G, cf. Eq. (1). According to Fig. 4 (upper panel), G can be constructed in \(\mathcal{O}(N^{2})\). Without this optimization, the matrix inversion would scale as \(\mathcal{O}(N^{3})\) and would therefore dominate for large systems.

Partitioning of the Green’s function: The inverse of the Green’s function, see Eq. (1), can be calculated efficiently by transforming the Hamiltonian so that it is diagonal in the regions where the self-energies \(\boldsymbol{\varSigma }\) vanish. We partition the indices of the Green’s function inverse such that the self-energy contribution of the leads appears only in the subblock D, i.e.,

$$\displaystyle{ \begin{array}{ll} \mathbf{G}^{-1} & = E\mathbb{1} -\mathbf{H} -\boldsymbol{\varSigma }_{L}(E) -\boldsymbol{\varSigma }_{R}(E) \\ & = \left (\begin{array}{*{10}c} E\mathbb{1}_{\text{AA}} -\mathbf{H}_{\text{AA}}&& -\mathbf{H}_{\text{AD}} \\ -\mathbf{H}_{\text{DA}} &&E\mathbb{1}_{\text{DD}} -\mathbf{H}_{\text{DD}} -\boldsymbol{\varSigma }_{L}(E) -\boldsymbol{\varSigma }_{R}(E) \end{array} \right )^{-1} =: \left (\begin{array}{*{10}c} \mathbf{A}&\mathbf{B} \\ \mathbf{C}&\mathbf{D} \end{array} \right )^{-1}\,, \end{array} }$$
(6)

with the subscripts AA, AD, DA, DD denoting the restriction to the respective matrix subspace.

As advantage of this partitioning, the only non-trivial energy dependence appears in subblock \(\mathbf{D} = E\mathbb{1}_{\text{DD}} -\mathbf{H}_{\text{DD}} -\boldsymbol{\varSigma }_{L}(E) -\boldsymbol{\varSigma }_{R}(E)\). The block A can be diagonalized for all energies in a single eigenvalue problem: the eigenvalues are given by the diagonal matrix \(\mathbf{\tilde{A}} = E\mathbb{1} -\mathbf{\tilde{H}}_{\text{AA}}\) where \(\mathbf{\tilde{H}}_{\text{AA}}\) denotes the diagonal matrix with the eigenvalues of H AA. The transformation matrix V (\(\mathbf{\tilde{H}}_{\text{AA}}=\mathbf{V}^{-1}\mathbf{H}_{\text{AA}}\mathbf{V}\)) is constructed by filling its columns with the (right) eigenvectors of H AA. The off-diagonal blocks stay energy independent, i.e., \(\mathbf{\tilde{B}} = -\mathbf{V}^{-1}\mathbf{H}_{\text{AD}}\).

General matrix: For the matrix inversion, we first turn to a general matrix, which we divide into four blocks

$$\displaystyle{ \left (\begin{array}{*{10}c} \mathbf{A}&\mathbf{B}\\ \mathbf{C} &\mathbf{D} \end{array} \right )\,, }$$
(7)

so that the submatrices A and D are square matrices. The inverse is given by

$$\displaystyle{ \left (\begin{array}{*{10}c} \mathbf{A}&\mathbf{B}\\ \mathbf{C} &\mathbf{D} \end{array} \right )^{-1} =\ \left (\begin{array}{*{10}c} \mathbf{A}^{-1}(\mathbb{1} + \mathbf{B}\mathbf{E}^{-1}\mathbf{C}\mathbf{A}^{-1})&-\mathbf{A}^{-1}\mathbf{B}\mathbf{E}^{-1} \\ -\mathbf{E}^{-1}\mathbf{C}\mathbf{A}^{-1} & \mathbf{E}^{-1} \end{array} \right )\text{ with }\mathbf{E}:= \mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B} }$$
(8)

as is easily checked by direct matrix multiplication.
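Equation (8) is the standard block inverse built on the Schur complement E of A, and it can indeed be verified in a few lines; a minimal numpy check with random, generically invertible blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
nA, nD = 6, 3
A = rng.normal(size=(nA, nA)) + np.eye(nA)    # random square blocks
B = rng.normal(size=(nA, nD))
C = rng.normal(size=(nD, nA))
D = rng.normal(size=(nD, nD)) + np.eye(nD)

Ai = np.linalg.inv(A)
E = D - C @ Ai @ B                            # Schur complement of A
Ei = np.linalg.inv(E)

# Block-wise inverse according to Eq. (8)
inv_block = np.block([
    [Ai @ (np.eye(nA) + B @ Ei @ C @ Ai), -Ai @ B @ Ei],
    [-Ei @ C @ Ai,                         Ei],
])

M = np.block([[A, B], [C, D]])
assert np.allclose(inv_block @ M, np.eye(nA + nD))  # direct multiplication check
```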

Transforming A into diagonal form \(\mathbf{\tilde{A}}\), i.e., \(\mathbf{A} = \mathbf{V}\mathbf{\tilde{A}}\mathbf{V}^{-1}\), makes the calculation of the inverse \(\mathbf{\tilde{A}}^{-1}\) trivial and we get:

$$\displaystyle{ \left (\begin{array}{*{10}c} \mathbf{A}&\mathbf{B}\\ \mathbf{C} &\mathbf{D} \end{array} \right )^{-1}=\ \left (\begin{array}{*{10}c} \mathbf{V}\mathbf{\tilde{A}}^{-1}(\mathbb{1} + \mathbf{\tilde{B}}\mathbf{E}^{-1}\mathbf{\tilde{C}}\mathbf{\tilde{A}}^{-1})\mathbf{V}^{-1} & -\mathbf{V}\mathbf{\tilde{A}}^{-1}\mathbf{\tilde{B}}\mathbf{E}^{-1} \\ -\mathbf{E}^{-1}\mathbf{\tilde{C}}\mathbf{\tilde{A}}^{-1}\mathbf{V}^{-1} & \mathbf{E}^{-1} \end{array} \right )\text{ with }\mathbf{E}:= \mathbf{D}-\mathbf{\tilde{C}}\mathbf{\tilde{A}}^{-1}\mathbf{\tilde{B}} }$$
(9)

using the abbreviations \(\mathbf{\tilde{C}}:= \mathbf{C}\mathbf{V}\) and \(\mathbf{\tilde{B}}:= \mathbf{V}^{-1}\mathbf{B}\).

Exploiting symmetries of G: In general, the Hamiltonian H is Hermitian and the self-energies \(\boldsymbol{\varSigma }\) are non-Hermitian. In most cases, we can restrict ourselves to a real symmetric Hamiltonian and complex symmetric self-energies. In that case, the Green’s function G is also (complex) symmetric, \(\mathbf{\tilde{B}}\) and \(\mathbf{\tilde{C}}\) are related by transposition, and the eigenvalue problem simplifies to a real symmetric one,Footnote 2 which makes the transformation matrix V orthogonal, i.e., \(\mathbf{V}^{-1} = \mathbf{V}^{T}\).

Basis change for non-local quantities: If we are interested only in non-local quantities such as the transmission or the density of states, we can go one step further. Such quantities do not depend on the spatial basis, and we can transform the Green’s function so that the Hamiltonian is diagonal in the subblock A:

$$\displaystyle{ \mathbf{G} \rightarrow \mathbf{S}^{-1}\mathbf{G}\mathbf{S}\,,\qquad \mathbf{S} = \left (\begin{array}{*{10}c} \mathbf{V}& 0\\ 0 &\mathbb{1}_{ \text{DD}} \end{array} \right )\,. }$$
(10)

In practice, we perform this transformation implicitly by omitting the respective factors of V in Eq. (9). All in all, the inverse is given by:

$$\displaystyle{ \mathbf{G} = \left (\begin{array}{*{10}c} \mathbf{\tilde{A}}^{-1}(\mathbb{1} + \mathbf{\tilde{B}}\mathbf{E}^{-1}\mathbf{\tilde{B}}^{T}\mathbf{\tilde{A}}^{-1})&-\mathbf{\tilde{A}}^{-1}\mathbf{\tilde{B}}\mathbf{E}^{-1} \\ \left [-\mathbf{\tilde{A}}^{-1}\mathbf{\tilde{B}}\mathbf{E}^{-1}\right ]^{T} & \mathbf{E}^{-1} \end{array} \right )\text{ with }\mathbf{E}:= \mathbf{D}-\mathbf{\tilde{B}}^{T}\mathbf{\tilde{A}}^{-1}\mathbf{\tilde{B}} }$$
(11)

using the abbreviation \(\mathbf{\tilde{B}}:= \mathbf{V}^{T}\mathbf{B}\).

Optimization traits: In Eq. (11), no matrix operations on matrices of the size of \(\mathbf{H}_{\text{AA}}\) appear (except for the initial eigenvalue problem): the inverse \(\mathbf{\tilde{A}}^{-1}\) is trivial since \(\mathbf{\tilde{A}}\) is diagonal. Therefore, this optimization is extremely useful for large systems where the contact regions to the leads are only a small part of the overall system, i.e., \(N_{\text{A}} \gg N_{\text{D}}\), with \(N_{\text{A/D}}\) denoting the sizes of the square matrices A and D, respectively.

For a short complexity analysis, we assume that matrix multiplication and the eigenvalue problem for N×N matrices both have computational complexity \(\mathcal{O}(N^{3})\). Then, without the above optimization, the direct matrix inversion used to calculate the Green’s function has complexity \(\mathcal{O}((N_{\text{A}} + N_{\text{D}})^{3})\), which for \(N_{\text{A}} \gg N_{\text{D}}\) approaches \(\mathcal{O}(N_{\text{A}}^{3})\).

With the above optimization, the complexity of the preparation step, containing the eigenvalue problem and the calculation of \(\mathbf{\tilde{B}}\), is \(\mathcal{O}(N_{\text{A}}^{3} + N_{\text{A}}^{2}N_{\text{D}})\), i.e., \(\mathcal{O}(N_{\text{A}}^{3})\) for \(N_{\text{A}} \gg N_{\text{D}}\). All subsequent inversions using Eq. (11) are only of complexity \(\mathcal{O}(N_{\text{D}}^{3} + N_{\text{A}}N_{\text{D}}^{2} + N_{\text{A}})\), i.e., \(\mathcal{O}(N_{\text{A}}N_{\text{D}}^{2})\) for \(N_{\text{A}} \gg N_{\text{D}}\). The three summands correspond to the inversion of E, to products of \(N_{\text{A}} \times N_{\text{D}}\) matrices with \(N_{\text{D}} \times N_{\text{D}}\) matrices such as \(\mathbf{\tilde{B}}\mathbf{E}^{-1}\), and to the inversion of \(\mathbf{\tilde{A}}\), respectively.

Strictly speaking, the optimization still scales cubically in N A due to the initial eigenvalue problem. Nevertheless, for energy sweeps over the density of states or the transmission, the complexity of each inversion step dominates and this effort could be reduced to complexity \(\mathcal{O}(N_{\text{A}}N_{\text{D}}^{2})\) for large systems,Footnote 3 cf. Fig. 4 (lower panel).

As stated above, the optimization applies only to non-local quantities. For local quantities such as current densities, the transformation matrix V cannot be omitted from Eq. (9), and we are back to cubic complexity.
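To illustrate the resulting workflow, the following numpy sketch performs the one-time eigendecomposition and the cheap per-energy step of Eq. (11) under the symmetry assumptions above. It is an illustration, not the AitransS implementation; since the broadenings \(\boldsymbol{\varGamma }_{\text{L/R}}\) are supported entirely in the D subspace, the DD block of G returned here already suffices for the transmission.

```python
import numpy as np

def prepare_sweep(H, idx_D):
    """One-time preparation for Eq. (11); a sketch, not the AitransS code.

    H    : (N, N) real symmetric Hamiltonian
    idx_D: indices of the contact region D coupled to the leads
    """
    idx_A = np.setdiff1d(np.arange(H.shape[0]), idx_D)
    # H_AA = V diag(eps) V^T, solved once for all energies (O(N_A^3))
    eps, V = np.linalg.eigh(H[np.ix_(idx_A, idx_A)])
    B_tilde = -V.T @ H[np.ix_(idx_A, idx_D)]           # B~ = V^T B with B = -H_AD
    H_DD = H[np.ix_(idx_D, idx_D)]
    return eps, B_tilde, H_DD

def G_DD(E, eps, B_tilde, H_DD, sigma_LR, eta=1e-9):
    """Per-energy step of Eq. (11), costing only O(N_A N_D^2).

    sigma_LR = Sigma_L + Sigma_R restricted to the D subspace.
    Returns the DD block of G, i.e., E^{-1} in Eq. (11).
    """
    a_inv = 1.0 / (E + 1j * eta - eps)                  # diagonal A~^{-1}
    D = E * np.eye(H_DD.shape[0]) - H_DD - sigma_LR
    schur = D - B_tilde.T @ (a_inv[:, None] * B_tilde)  # E = D - B~^T A~^{-1} B~
    return np.linalg.inv(schur)
```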

4 Conclusions

In this work, we calculated the local current density in large hydrogenated graphene flakes. The current flow shows complicated patterns, a fact that is ignored in most studies, which focus only on the total conductance. These patterns show large local fluctuations. The idea behind the nontrivial local current patterns is very general: scattering states of mesoscopic samples have an inner, nontrivial structure. This structure is seen here in the electric current density, but the concept is easily generalized to other observables, e.g., heat. A mesoscopic device (like a hydrogenated graphene ribbon) connected to reservoirs at different temperatures will show fluctuations in the local temperature as a result of the nontrivial structure of the scattering states.

Along the way, we parallelized and optimized our transport module AitransS to benefit from a supercomputer and thus to enable ab initio transport calculations for large graphene flakes. We showed that our techniques feature good scalability and discussed the necessary optimizations. Future work will benefit from the fact that ab initio current density calculations for disordered systems are now available for large 2D film materials and for medium-sized 3D materials. Essentially, the approach is limited only by the available computing power.