1 Introduction

Science and engineering of materials is a vast field ultimately enabling the development of new technologies, with impact in energy, quantum information science, medicine and health, and national security. Almost 10 years ago, in the United States, President Obama launched the Materials Genome Initiative (MGI) (Obama 2011), with a speech at Carnegie Mellon (June 2011). MGI has been a game changer in the way the scientific community thinks of and approaches research in materials. The ambitious goal set by Obama, “To help businesses discover, develop, and deploy new materials twice as fast…,” catapulted research on materials to the forefront of science and engineering on the national scene and pointed out clearly and forcefully that business as usual was not an option in materials research. The MGI pushed toward innovation and the development of brand new techniques to make, study, and predict materials, and it recognized theory and computation not only as an integral part of the innovative process but as a driving force. Projects in developing predictive tools and databases for materials flourished, for example, the Materials Project (Jain et al. 2013a), initiated at MIT and then established at LBNL, and several related initiatives (Curtarolo et al. 2012; Jain et al. 2013a, 2016; Saal et al. 2013; Bhat et al. 2015; Kalidindi and De Graef 2015; Rajan 2015; Blaiszik et al. 2016; Thygesen and Jacobsen 2016; Chard et al. 2018).

In 2015 the US DOE established for the first time computational materials centers (CMS) to develop methods and software to predict materials properties and, importantly, to make that software open to the community, thus further enhancing the pace of research and innovation. Three centers were established, two at National Laboratories (BNL: https://www.bnl.gov/comscope/ and ANL: http://miccom-center.org/) and one on a university campus (USC: https://magics.usc.edu/). A year later two additional centers were created, at LBNL (http://c2sepem.lbl.gov/) and ORNL (https://cpsfm.ornl.gov/). These centers were born in the ecosystem of the energy hubs conceived by Steven Chu (JCESR (www.jcesr.org) and JCAP (https://solarfuelshub.org/)); the hubs have a clear mission toward a societal grand challenge, climate change (global warming), and within that ecosystem, many of the projects of DOE centers and other agencies, notably NSF (www.nsf.gov/pubs/2019/nsf19516/nsf19516.htm), focused their research on functional materials for energy.

A question that arises in many instances when discussing the impact of MGI and the US CMS, as well as similar centers established in Europe and Asia, is simply what is new relative to the deployment of software in the semiconductor and pharmaceutical industries: in these industries codes for materials and molecular systems have been used for decades. However, such codes have traditionally been used mostly as end-of-the-line engineering tools, e.g., to test molecules that would not be synthesized first in the laboratory or to help design chips that had already been planned, based on specific material choices. The theories, codes, and software developed and pushed by MGI-like ideas are meant to become (and in some cases are already becoming) beginning-of-the-line multi-scale tools to produce innovative ideas on materials that have not yet been made, have not yet been planned, and do not even come from conventional synthetic and fabrication sources.

These theories and codes are envisioned to be general enough to meet two upcoming scientific revolutions: artificial intelligence (AI) and big data (De Mauro et al. 2015) and quantum information science and technology (Bennett and DiVincenzo 2000). It is important to emphasize that general applicability is a key, distinctive feature relative to the codes used in the semiconductor industry, for example, which have traditionally been developed for specific tasks, targeting one specific class of materials and processes. General codes may be used to produce the data needed for AI technologies for vast classes of materials. Computer generated data may then be part of design strategies that include innovative feedback loops with experiments, as well as strategies to make data reproducible and available to the scientific community worldwide (see, e.g., Govoni et al. 2019).

Contemporary computational methods and database mining techniques have already made tremendous strides in the prediction of equilibrium properties of materials that exhibit simple morphologies. However, the functionality of modern materials depends critically on the integration of dissimilar components and on the interfaces that arise between them. Hence the atomic- and molecular-scale manipulation of these components and the heterogeneous structures that emerge from them are key to materials design. In particular, the controlled and driven assembly of building blocks into hierarchical systems, as well as the control of defects and complex morphologies, offers the opportunity to create artificial materials that do not exist in nature and that exhibit superior physical properties for, e.g., emerging energy and quantum information technologies.

The simulations of heterogeneous materials and of the assembly process of artificial materials are much less advanced than the study of equilibrium properties. In order to accelerate the discovery of innovative functional materials, it will be key to acquire the ability not only to compute the properties of the end product but also to simulate and validate the assembly processes that take place during synthesis and fabrication. In addition, in order to design materials relevant to many technologies, it is essential to predict functionalities of systems with complex defective structures and ultimately complex morphologies and to simulate and eventually engineer the basic mass, charge, and energy transport phenomena, as pictorially illustrated in Fig. 1. We emphasize that most transport phenomena, e.g., electron transport, and phenomena involved in the spectroscopic characterization of materials, which involve interaction with light, are inherently quantum mechanical and thus require a first-principles, quantum mechanical treatment of interatomic interactions, at the atomistic scale.

Fig. 1

Integrated predictions of multiple properties are key to define effective design strategies for materials with target characteristics. These properties encompass the atomistic structure of the material, possibly derived from the assembly of complex building blocks; the response to electromagnetic fields (light) used to probe and characterize the material; and transport properties, including mass, charge, and heat transport.

In the following, we focus on two examples (interfaces for energy conversion processes, in Sect. 2, and materials composed of complex building blocks, in Sect. 3), and we review recent progress in describing heterogeneous, defective materials with complicated morphologies using first-principles methods (Martin 2004; Martin et al. 2016). We aim at showing the importance of unraveling mechanisms and providing fundamental, physical insights, in order to pave the way to material design strategies. We close (Sect. 4) by describing open challenges in understanding and predicting synthetic pathways to obtain materials with target properties.

2 Energy Conversion at Interfaces from First Principles

In Figs. 2 and 3, we show some of the key processes and properties that one aims at understanding to establish a structure-function relationship and eventually predict optimal materials for solar-to-fuel and solar energy conversion, respectively. The figures illustrate the complexity of the predictive endeavor and the multitude of properties one should be able to compute, validate, and ultimately integrate with experiments. We concentrate here on materials for photo-electrochemical cells (PECs; Fig. 2).

Fig. 2

Pictorial representation of key processes and systems involved in water-splitting reactions occurring on a catalytic surface, which starts with harvesting light to form charge carriers and involves proton-coupled electron transfer (PCET) processes

Fig. 3

Pictorial representation of the key physical processes involved in the prediction and design of nanostructured semiconducting materials for solar energy conversion, including ensembles of nanoparticles (NPs), embedded NPs, and inorganic clathrates (icons on right hand side). Electronic (absorption, photoemission, and band offsets) and transport properties may be obtained from calculations based on density functional and many-body perturbation theory

The generation of hydrogen from water and sunlight through PECs is one of the promising approaches investigated by the scientific community in the last decades for producing sustainable carbon-free energy (Walter et al. 2010; McKone et al. 2013; Pham et al. 2017). A key aspect to building an efficient PEC is the availability of Earth-abundant semiconducting photoelectrode materials that can absorb sunlight and eventually drive water-splitting reactions when interfaced with the liquid. Despite steady efforts, no single material has yet been found that simultaneously satisfies the efficiency and stability required for the widespread commercialization of hydrogen technology, and efforts have been concentrated on architectures composed of different materials, notably absorber solids interfaced with catalysts. Hence, understanding the properties of the interfaces between the various components is key to predict novel systems and eventually to optimize the device performance.

In this regard, electronic and structural properties of absorbers/catalysts/water interfaces play a critical role, as rapid charge transfer between the photoelectrode, the catalysts, protective layers, and electrolytes is required for efficient fuel production. Interfacial structural and electronic properties of PECs are of course intertwined. For example, band edge positions of photoelectrode absorbers depend on the surface termination, the reconstruction, and the concentration of impurities and defects. In addition, the stability of the absorbers against oxidation (reduction) is determined by the relative energy between their valence band maximum (conduction band minimum) and intrinsic oxidation (reduction) potential. Such a complex interplay results in a multi-property optimization problem which, given recent advances in high-performance computing and sophisticated electronic structure theories and codes (Kresse and Hafner 1993; Soler et al. 2002; Gygi 2008; Blum et al. 2009; Giannozzi et al. 2009; Hutter et al. 2014; VASP, Kresse and Hafner 1994, Kresse and Furthmüller 1996a, b, www.vasp.at; SIESTA, www.icmab.es/siesta; Qbox, www.qbox-code.org; FHI-AIMS, http://aims.fhi-berlin.mpg.de/; Quantum Espresso, www.quantum-espresso.org; CP2K, www.cp2k.org; CPMD, www.cpmd.org), can now conceivably be tackled using first-principles simulations.
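
As a purely illustrative sketch of the alignment and stability criteria just described, the following Python fragment checks whether the band edges of a candidate absorber straddle the water redox levels and compares them with its intrinsic oxidation and reduction potentials. The class and function names, the numerical band edges of the toy candidate, and the sign convention adopted for stability (oxidation level below the valence band maximum and reduction level above the conduction band minimum, on an energy scale referred to vacuum) are assumptions made here for illustration and are not taken from the works cited above. In practice the band edges would be obtained from calculations of solvated surfaces at finite temperature.

```python
from dataclasses import dataclass

# Water redox levels on the absolute (vacuum) energy scale at pH 0, in eV.
# -4.44 eV is the commonly quoted position of the H+/H2 level; the O2/H2O
# level lies 1.23 eV lower in energy.
E_H2 = -4.44
E_O2 = E_H2 - 1.23


@dataclass
class Absorber:
    """Hypothetical container for the levels of a photoelectrode (eV vs vacuum)."""
    name: str
    vbm: float    # valence band maximum
    cbm: float    # conduction band minimum
    e_ox: float   # intrinsic oxidation potential (assumed known)
    e_red: float  # intrinsic reduction potential (assumed known)


def straddles_water(a: Absorber) -> bool:
    """True if CBM lies above H+/H2 and VBM lies below O2/H2O."""
    return a.cbm > E_H2 and a.vbm < E_O2


def stable_against_oxidation(a: Absorber) -> bool:
    """Assumed convention: holes at the VBM cannot oxidize the absorber
    if its oxidation level lies below the VBM in energy."""
    return a.e_ox < a.vbm


def stable_against_reduction(a: Absorber) -> bool:
    """Assumed convention: electrons at the CBM cannot reduce the absorber
    if its reduction level lies above the CBM in energy."""
    return a.e_red > a.cbm


# Toy candidate with made-up numbers, used only to exercise the checks.
candidate = Absorber("toy oxide", vbm=-7.0, cbm=-4.0, e_ox=-7.5, e_red=-3.5)
print(candidate.name,
      straddles_water(candidate),
      stable_against_oxidation(candidate),
      stable_against_reduction(candidate))
```

A screening workflow would evaluate such checks for many candidates, with the caveat emphasized in this section that band edges of bulk or vacuum-terminated models may differ substantially from those of defective, solvated surfaces.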

In recent years, it has been successfully demonstrated that first-principles calculations can be employed to scan thousands of combinations of elements across the entire periodic table to suggest new photoelectrode candidates (Greeley et al. 2006; Jain et al. 2013b; Castelli et al. 2015). However, computational screening schemes available thus far in the literature have mostly focused on bulk properties of candidate materials, and only recently have the structural and chemical properties of surfaces and interfaces with the electrolyte attracted the attention that they deserve in building successful design strategies. To paraphrase what Herbert Kroemer so elegantly pointed out in his Nobel lecture (Kroemer 2000) on semiconductor heterojunctions, the interface is still the device! As shown by us and others, the effective prediction of band offsets for water photocatalysis requires the simulation of the electronic structure of solvated surfaces at finite temperature and, importantly, in the case of oxide surfaces, of defective solvated surfaces (Gerosa et al. 2018).

For example, in a case we have recently studied, WO3 (Gerosa et al. 2018 and references therein), we have shown that the average potential energy difference at the interface of pristine and defective WO3 varies by ∼1 eV and that solvation is absolutely critical (see Fig. 4). In addition, we have shown the key importance of using a high level of theory, beyond the widely used density functional theory (DFT) (Hohenberg and Kohn 1964; Kohn and Sham 1965; Martin 2004) and hybrid DFT (Perdew et al. 1996; Heyd et al. 2003, 2006), to carry out predictive calculations. The latter have allowed us to understand that the excess charge present at defective WO3 surfaces due to oxygen vacancies forms large 2D polarons (∼10 Å radius) on the plane of the surface; the predicted charge localization properties hint at the possible formation of stable (OH) groups at the surface in contact with water and at the fact that holes transferred to water would then form a highly reactive (OH)*, a possible precursor of water-splitting reactions. Altogether our calculations have identified three major factors determining the chemical reactivity of oxide absorbers interfaced with water: the presence of surface defects, the dynamics of excess charge at the surface, and finite temperature fluctuations of the surface electronic orbitals. These general descriptors are essential for the understanding and prediction of optimal oxides for water oxidation.

Fig. 4

Energy levels (valence band maximum, blue; conduction band minimum, red; defect state due to oxygen vacancies, yellow) of a WO3 surface in vacuo, at T = 0 and at room temperature, and in the presence of water (solvated). The energy levels have been obtained using first-principles molecular dynamics simulations and calculations at the many-body perturbation theory level (GW), starting from electronic states computed with hybrid density functionals (from Gerosa et al. 2018). Note the striking difference of the positions of the levels on the right- and left-hand sides, relative to the redox levels of liquid water.

This was presented as an example of the importance of gaining fundamental physical insight into descriptors in order to define material design strategies and in particular into non-intrinsic properties of materials such as interfaces between complex components and defects present at finite temperature. We now turn to a second example of materials made of complex, nanostructured building blocks, where again interfaces – specifically buried interfaces – dominate the scene.

3 Building Blocks for Electronic Materials and Materials for Energy Conversion

In this section we consider materials made of nanostructured building blocks, in particular semiconducting colloidal nanocrystals (NCs) (Scalise et al. 2018; Greenwood et al. 2018; Talapin et al. 2010). Systems built from the assembly of these “artificial atoms” are emerging as tunable, earth-abundant, and potentially nontoxic materials for solar energy conversion, light emission, and electronic applications (Talapin 2012; Kovalenko 2013; Wippermann et al. 2013, 2014, 2016). The electronic and transport properties of NC-based solids depend on many factors that encompass the intrinsic characteristics of the individual NCs, for example, their shape, size, and composition, as well as their surface chemistry and mutual interactions. Organic ligands traditionally used in NC synthesis play a central role in controlling shape and size, as well as in driving self-assembly into superlattices. However, these ligands are often composed of long hydrocarbon chains, which create an insulating barrier that leads to low charge carrier mobilities. Significantly higher mobilities could be achieved by using inorganic ligands, and their use has enabled significant performance improvements of NC-based solar cells, transistors, and lasers.

For example, InAs and CdSe NCs capped with molecular metal chalcogenide complexes (MCC) were shown to exhibit high electron mobilities (Lee et al. 2011; Liu et al. 2013), with III–V-based nanomaterials preferable for commercial applications due to their lower toxicity. However, the atomistic structure of these materials is difficult to characterize, in particular that of the NC surfaces and interfaces, whose control is required to engineer systems with the desired properties. Recently we proposed (Scalise et al. 2018) a strategy to model a broad class of nanocrystal-in-glass systems that extends significantly beyond semiconductor quantum dots and MCC ligands.

Our strategy is summarized in Fig. 5. By combining first-principles molecular dynamics (MD) and ab initio stability diagram calculations (ab initio electronic structure calculations of surface energies and stability), main structural motifs were identified; in particular the structure of buried interfaces was determined. Before proceeding to derive a complete structural model, the motifs obtained computationally were experimentally validated, by carrying out XPS and Raman measurements, which both confirmed the results of the calculations. Using these validated structural motifs as a starting point of additional first-principles MD simulations, a structural model consistent with experiment was finally derived and used to analyze the electronic structure of the composite material. The predicted electronic states were used to interpret and understand the reasons for the measured negative photoconductivity, thus identifying specific reasons giving rise to properties that had remained unexplained and controversial for some time.

Fig. 5

Schematic representation of the integrated experimental and computational strategy adopted to obtain validated structural models and electronic properties of all inorganic semiconductor materials composed of colloidal nanocrystals, represented in the inset on the bottom right (see text)

Overall, by combining electronic structure calculations and first-principles molecular dynamics (MD) simulations with experiments, we showed that the ligands are not adsorbed as intact units but rather decompose on contact with the NC surface to form an amorphous matrix that encapsulates the nanoparticles (NPs). The intrinsic electronic properties of the isolated NCs are greatly modified in the matrix, whose atomistic structure plays a key role in enabling efficient electronic transport. The structural model derived in this way permitted an explanation of the origin of the measured negative photoconductivity of the nanocomposite. This was presented as an example of novel material properties emerging when assembling building blocks at the nanoscale and as an example of the importance of tightly integrating theory, computation, and experiments. The future challenge will be to achieve such integration automatically and to define general validation strategies appropriate for broad classes of systems.

4 The Synthetic Challenge

One of the open challenges in computational materials science is the understanding and prediction of how to synthesize materials with target properties (De Yoreo et al. 2016). Using experimental data and simulations, the challenge is to establish correlations between synthesis protocols (SP) and material structure (M) and between synthesis protocols and material properties (P). This endeavor requires the solution of both direct and so-called inverse problems (Kaipio and Somersalo 2005). The former include answering first the question: Given a synthesis protocol (SP), what material (M) does one obtain? This forward problem is a grand challenge for predictive, computational methods. We still lack well-defined physics models that can describe synthesis; in addition, “realistic” materials encompass complex descriptors including crystal structure, morphology, defects, and surface coverings. An even more complex forward problem concerns materials properties and the following question: Given a synthesis protocol, which properties (P) does one get? This forward problem is clearly related to the first one (SP → M); however, it poses additional experimental and theoretical challenges. In particular, on the theory side, the prediction of certain complex properties is still in the making, e.g., obtaining optoelectronic and vibrational spectroscopic data and transport data, which require the use of sophisticated methods that cannot yet be used efficiently to acquire large amounts of data for broad classes of systems.

The ultimate goal of the science of synthesis research is to solve the inverse problems associated with the forward problems mentioned above: Given a desired material, what synthesis protocol should be used to obtain it? Given a set of desired materials properties, what synthesis protocol should be used to obtain them? Solutions to inverse problems are found by solving many forward problems in a regression loop, which requires forward problems to be rapidly computable and the use of data analytic approaches, designed to mine experimental and theoretical data.
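
The structure of such a loop can be sketched with a deliberately simple toy example. In the Python fragment below, the forward model (a linear dependence of a property on a single synthesis parameter), its coefficients, and all function names are hypothetical and serve only to show how an inverse problem can be answered by evaluating many forward problems and retaining the protocol that best reproduces the target. A brute-force scan is used here in place of a more refined regression or optimization scheme.

```python
from typing import Callable, Tuple


def forward_model(temperature_k: float) -> float:
    """Hypothetical SP -> P map: a band gap (eV) as a function of synthesis
    temperature (K). The linear form and coefficients are made up."""
    return 3.0 - 0.002 * (temperature_k - 300.0)


def solve_inverse(target: float,
                  forward: Callable[[float], float],
                  lo: float,
                  hi: float,
                  n_grid: int = 1001) -> Tuple[float, float]:
    """Scan the synthesis parameter between lo and hi, solve one forward
    problem per grid point, and return the parameter value (and squared
    error) that best matches the target property."""
    best_sp, best_err = lo, float("inf")
    for i in range(n_grid):
        sp = lo + (hi - lo) * i / (n_grid - 1)
        err = (forward(sp) - target) ** 2
        if err < best_err:
            best_sp, best_err = sp, err
    return best_sp, best_err


# Which synthesis temperature yields a 2.4 eV gap in this toy model?
best_temperature, mismatch = solve_inverse(2.4, forward_model, 300.0, 900.0)
print(best_temperature, mismatch)
```

The cost of this loop is the number of grid points times the cost of one forward evaluation, which illustrates why rapidly computable forward problems are a prerequisite for solving inverse problems in practice.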

Ultimately, by seeking correlations between data and synthesis conditions, one will enable the discovery of materials and synthesis ontologies, i.e., a set of descriptors linking materials and synthetic pathways (e.g., crystal phase and synthesis temperature). Accurate ontologies are not yet known for synthesis, and their definition may come from the use of machine learning (ML) to search for the most relevant descriptors for a given outcome and to relate different descriptors used by different researchers. We expect that with the increasing population of experimental and computational databases in the materials science community, ML workflows (Jain et al. 2015; Pizzi et al. 2016; Meng and Thain 2017; Adorf et al. 2018; Freire and Chirigati 2018) may be used to train models of materials synthesizability and properties and hence to predict novel materials.
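
As a minimal illustration of what searching for the most relevant descriptors may look like, the sketch below ranks candidate descriptors by the absolute value of their Pearson correlation with a measured outcome. This simple statistical ranking is a stand-in for the ML approaches mentioned above, not an implementation of any of the cited workflows; the descriptor names and the numerical records are synthetic and chosen only to make the example run.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rank_descriptors(descriptors: Dict[str, List[float]],
                     outcome: List[float]) -> List[Tuple[str, float]]:
    """Sort descriptors by |correlation| with the outcome, strongest first."""
    scores = {name: abs(pearson(values, outcome))
              for name, values in descriptors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic records: five hypothetical syntheses and one measured outcome
# (e.g., the fraction of a target crystal phase).
descriptors = {
    "synthesis_temperature_K": [500.0, 600.0, 700.0, 800.0, 900.0],
    "precursor_concentration_M": [0.3, 0.1, 0.4, 0.2, 0.5],
}
outcome = [0.10, 0.22, 0.29, 0.41, 0.50]

for name, score in rank_descriptors(descriptors, outcome):
    print(name, round(score, 3))
```

A linear correlation captures only the simplest relationships; nonlinear models and larger, curated databases would be needed to identify descriptors for realistic synthesis data.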

In closing, we would like to comment on data and data availability. A key guiding principle for materials research is to make data findable, accessible, interoperable, and reusable. The reproducibility of experiments and computations and of the corresponding results is a critical part of the research process in all scientific disciplines, and in particular in materials prediction, which relies heavily on large amounts of data. Yet the data presented in most published scientific papers are not made available to the community, and the procedures followed to obtain or generate the data are often not articulated step by step or in any detail. Hence making all data available to the public (Govoni et al. 2019), on a paper-by-paper basis, so as to increase experimental and computational rigor in reporting results, together with transparency, should become an integral part of the research process. This endeavor will also greatly contribute to devising improved validation procedures for computational data as well as to establishing automatic feedback loops between experiments and computations.