Introduction: The materials discovery problem

Throughout history, society has depended on the discovery of new materials with a desired set of chemical or physical properties for utility. In the modern era, new materials have resulted in a swath of important structure–function relationships that have been used to advance applications in synthesis, health care, energy, and more. The basic principles behind traditional materials discovery processes are simple: there is a need for a material with a desired set of properties, a material is made and tested for this utility, and then optimized for performance. If the functionality is not sufficient, the process begins again and continues in a serial manner. Selecting materials to synthesize and explore in this way is a critical decision-making step, and although elemental characterization of the 118 elements across the periodic table has been established, there is much left to be explored when considering the enormous design space across all possible elemental compositions, structures, and size effects. The vastness of the materials design space can be understood in a simple thought experiment: if one considers the 61 metallic elements with stable isotopes in the periodic table, the combinations of bi-elemental materials alone would result in more than 1800 possible combinations, with this value reaching >400 million when considering hepta-elemental compositions. Still, this preliminary estimate only accounts for composition and the sheer volume becomes even more daunting when considering the added variables of size and structure effects and estimates quickly reach the trillions. This unfathomably large materials design space is the basis for the materials discovery problem (Figure 1a).
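The counts in this thought experiment follow directly from the binomial coefficient. The short sketch below reproduces them, assuming that a "combination" means an unordered set of distinct elements with ratios, size, and structure ignored; the function name is illustrative only.

```python
from math import comb

N_METALS = 61  # metallic elements with stable isotopes, as stated in the text


def count_compositions(n_elements: int, k: int) -> int:
    """Number of distinct k-element sets drawn from n_elements (ratios ignored)."""
    return comb(n_elements, k)


# Print the size of the composition-only design space for 2 to 7 elements.
for k in range(2, 8):
    print(f"{k}-element combinations: {count_compositions(N_METALS, k):,}")
```

For two elements this gives 1,830 sets, and for seven elements 436,270,780, consistent with the ">1800" and ">400 million" figures quoted above.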

Figure 1

(a) Representative design parameters required for nanomaterials synthesis demonstrating the enormous possible combinations that need to be explored for comprehensive materials discovery. (b) Summary of the megalibrary platform for massive materials synthesis, screening, and analysis.

For more efficient workflows, combinatorial approaches accelerate the materials design process through batch or parallelized synthesis and comparison of tens to thousands of samples.1,2,3 For example, the “Multi-Sample Concept” emerged in the late 1960s with advances in cosputtering binary and ternary compositional thin films.4 Beyond vapor deposition, various drop casting, inkjet printing, and microfluidic synthesis techniques have been explored in the context of combinatorial chemistry and materials science for discovery, but even these state-of-the-art high-throughput methods suffer a lack of the requisite materials diversity, synthetic fidelity, and screening resolution within each experiment for the pace of discovery to match modern needs.5 If one wants to fully interrogate the materials genome, serial and current combinatorial approaches toward materials discovery will not suffice.

In this article, we showcase the megalibrary platform as a strategy for accelerating exploration of the materials genome and assert that increasing combinatorial experimentation by multiple orders of magnitude makes megalibraries uniquely primed to revolutionize materials discovery. Megalibraries are centimeter-scale chips consisting of millions to billions of discrete and structurally characterizable materials, each of which is individually addressable. The advent of the megalibrary platform is the result of decades of research in scanning-probe lithography techniques, beginning with dip-pen nanolithography (DPN)6 and evolving to cantilever-free and massively parallel patterning7 with nanometer resolution on the centimeter scale.8 When coupled with high-throughput screening and computational approaches, such as artificial intelligence (AI), megalibraries are equipped to contend with the enormous materials design space.9 In summary, the megalibrary platform consisting of materials design, megalibrary synthesis, performance screening, and machine learning represents a major paradigm shift in high-throughput materials discovery that other traditional or semi-combinatorial methods cannot match (Figure 1b).

Megalibraries: Massively parallel nanomaterial synthesis

High-throughput materials discovery requires rapid synthesis and characterization tools that do not sacrifice precision or speed. Nanotechnology can solve these challenges by enabling precise materials assembly in confined spaces and providing probes with nano-to-micro-resolution analysis capabilities. Here, we review the development of nanomaterial megalibraries that contain millions of individual materials synthesized in parallel and discuss progress made in extracting and utilizing the lessons learned from massive library screening (Figure 2).

Figure 2

Timeline highlighting the key developments in the progression from cantilever-based DPN molecular patterning to cantilever-free massively parallel patterning via PPL and the development of nanoreactor confined materials synthesis using SPBCL, which together with commercialized lithography instrumentation enabled the synthesis of nanomaterial megalibraries with progressively increasing synthetic control, screening capabilities, and data analysis. Figures of SPBCL and Polyelemental Libraries from Reference 26. Reprinted with permission from AAAS. Perovskite Libraries from Reference 29.

Controlled patterning of molecules and materials with nanometer precision using DPN was introduced by the Mirkin Group in 1999.6 With DPN, scanning probes with nanoscale tips are coated with molecules and touched to surfaces to deposit those molecules at specific positions. DPN was a major discovery that transformed the measurement tool of atomic force microscopy (AFM) into a writing tool, enabling a broad range of new applications.10 The scale and capabilities of DPN continued to increase as more scanning probe cantilevers were introduced to pattern in parallel, culminating in 2008 in a cantilever-free patterning technique termed polymer pen lithography (PPL).7 Massive arrays of nanoscale pen tips can be fabricated as polymer stamps using templates defined by standard photolithography, and up to 11 million pen tips have been fabricated on a single centimeter-scale wafer for simultaneous patterning.11

The capabilities of DPN were expanded in 2010 with scanning probe block copolymer lithography (SPBCL), transforming the writing tool into a synthesis tool.12 In SPBCL, an aqueous ink containing poly(ethylene oxide)-b-poly(2-vinylpyridine) block copolymer with dissolved metal salts is patterned onto a surface using DPN to make a dome-shaped feature with <1-µm diameter of block copolymer and embedded metal salts. After thermal annealing under Ar or H2 (~150°C), these droplets behave as “nanoreactors” as metal cations diffuse throughout the polymer matrix, are reduced, and coalesce into single nanoparticles (Figure 3a). Importantly, the surface confined nanoreactors have attoliter volumes that favor the formation of single particles within each nanoreactor to minimize surface energy. Because the nanoscale feature guides the chemistry toward a single product, the final composition is dictated by the identity and ratios of metal precursors in the ink solution. This allows for predictive synthesis of multicomponent alloy or heterostructured nanoparticles characterized by high-angle annular dark-field scanning transmission electron microscopy (HAADF-STEM) and energy-dispersive x-ray spectroscopy (EDS), whereas such stoichiometry control is difficult in traditional multicomponent nanomaterials synthesis (Figure 3b).13 This confined nanoreactor synthetic technique has been successfully employed for creating arrays of metal nanoparticles, metal oxides, metal sulfides, and perovskites, among others.

Figure 3

(a) Schematic for combined PPL and SPBCL depicting parallel deposition of nanoreactors containing materials precursors, single particle formation upon annealing, and higher temperature nanoreactor removal. (b) Fluorescence microscope image of perovskite nanocrystals arranged in the “IIN” logo demonstrating position control, from Reference 29; high-angle annular dark-field scanning transmission electron microscopy (HAADF-STEM) and energy-dispersive x-ray spectroscopy (EDS) maps of AuAgNi nanoparticles with varied Ni content demonstrating composition control; HAADF-STEM images of Pt nanoparticles demonstrating size control, from Reference 26, reprinted with permission from AAAS; SEM image of a tetrahexahedral Pt nanoparticle demonstrating material shape control, from Reference 18, reprinted with permission from AAAS. (c) Optical microscope images of a 2.25-cm2 megalibrary of nanoreactors; a magnified region showing patterns defined by individual pen tips; SEM image of AuPdCu nanoparticles after annealing the megalibrary. Reprinted with permission from Reference 35. © 2023 American Chemical Society.

Beyond position and composition control (vide infra), SPBCL allows for material size and shape control (Figure 3b). Specifically, the nanoreactor volume and ink concentration dictate the precursor quantity and thus the size of the resulting single nanoparticle, first demonstrated with  ~5–25-nm Au nanoparticles.12 Extensive DPN and PPL studies have established feature size control by modulating the dwell time, force, and surface hydrophobicity, which readily transfer to SPBCL.14,15,16 Nanoparticle size can be further tuned by covalently merging the polymer nanoreactor with the metal precursors, such as poly(ethylene oxide) modified metalloporphyrins, which allowed deposition of extremely small nanoreactors that yielded sub-2-nm Pt nanoparticles.17 SPBCL typically yields spherical nanoparticles, but other shapes can be attained via post-patterning modifications. In one approach, high-index facet nanoparticles were made by alloying trace elements such as Sb, Bi, or Pb with patterned nanoparticles via solid state reaction followed by high-temperature de-alloying evaporation.18,19 This shape-regulating process yielded a variety of mono- or multicomponent tetrahexahedral nanoparticles with simultaneous size control. Similar post-patterning modification has yielded hollow metal sulfide particles20 or semiconductor nanowires.21

Combining these advances in SPBCL and PPL, a new modality for materials synthesis, termed megalibraries, has been established. Exceptional levels of nanomaterial synthetic control are merged with extremely high-throughput parallel synthesis to yield centimeter-scale chips with millions to billions of individual particles, each with addressable position, composition, size, and shape. In 2016, PPL was improved to achieve feature separations <200 nm with a total of 5.9 billion identical features on a 14.5-cm2 area.22 In a typical high-throughput megalibrary synthesis and screening experiment, 2.25-cm2 chips are patterned with ~50–225 million individual nanomaterials (Figure 3c). Fabrication of such large libraries is impractical at the macroscale, but the three-dimensional (3D) miniaturization of materials synthesis with attoliter volumes and production of materials at the nanoscale provides a standalone solution that moves beyond conventional approaches to exploring chemical reactions. To put this in perspective, each reactor is more than a quadrillion times smaller than the 250-mL flask in which a traditional reaction is run.
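The scale comparison above is a matter of unit conversion, sketched here. The 250-mL flask and the chip figures come from the text; the specific reactor volumes passed to the function are illustrative inputs, not measured values.

```python
FLASK_VOLUME_L = 250e-3  # 250-mL flask, from the text
ATTOLITER_L = 1e-18      # one attoliter expressed in liters
QUADRILLION = 1e15


def flask_to_reactor_ratio(reactor_volume_al: float) -> float:
    """How many times larger the flask is than a reactor of the given volume (in aL)."""
    return FLASK_VOLUME_L / (reactor_volume_al * ATTOLITER_L)


def features_per_cm2(n_features: float, area_cm2: float) -> float:
    """Areal density of patterned features on a chip."""
    return n_features / area_cm2


# Illustrative reactor volumes (assumed values spanning the attoliter regime).
for volume_al in (1.0, 10.0, 100.0):
    ratio = flask_to_reactor_ratio(volume_al)
    print(f"{volume_al:6.1f} aL reactor: flask is {ratio:.2e} times larger")

# Density range for a typical 2.25-cm2 chip carrying 50 to 225 million features.
print(f"{features_per_cm2(50e6, 2.25):.2e} to {features_per_cm2(225e6, 2.25):.2e} per cm2")
```

A 1-aL reactor is 2.5 × 10^17 times smaller than the flask, and the "more than a quadrillion" statement holds for any reactor volume below 250 aL.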

Next, we highlight strategies to expand control over design features, including serial composition inking, post-synthetic introduction of composition gradients, and multi-inking of pen arrays for simultaneous composition and size control. Significant strides have been made toward accessing the full capabilities of megalibrary synthesis, and current methods enable the synthesis and screening of >1 million unique materials per experiment.

Polyelemental nanoparticle megalibraries

The most critical aspect of SPBCL is the formation of a single nanoparticle in each nanoreactor, and it is not immediately obvious that this would hold for polyelemental nanoparticles. Indeed, solution-phase methods for polyelemental nanoparticle synthesis must contend with competing nucleation processes, resulting in significant compositional spread.23 However, the extreme confinement of the precursors inside a nanoreactor guides the formation of a single polyelemental nanoparticle with precise composition control in high yield.24 This was first demonstrated with a series of bimetallic and trimetallic nanoparticles,25 then a representative polyelemental library of nanoparticles composed of all possible combinations of Ag, Au, Cu, Co, and Ni was synthesized (Figure 4a).26 Using synthetic conditions that yield thermodynamically stable structures, the phases found in these nanoparticles generally matched expectations from bulk phase diagrams considering metal miscibility.26 However, prediction of thermodynamic structures in polyelemental nanoparticles can become intractable because the bulk phase diagrams are frequently unknown. Accordingly, polyelemental nanoparticle megalibraries promise (1) spatially encoded synthesis of designer nanostructures featuring specific phases and interfaces, and (2) systematic exploration of unknown materials spaces at scale to tackle the materials discovery problem.
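To make "all possible combinations" of five elements concrete, the sketch below enumerates them. It assumes that single-element particles are counted and that elemental ratios are ignored; the function name is illustrative.

```python
from itertools import combinations

ELEMENTS = ("Ag", "Au", "Cu", "Co", "Ni")  # the five metals named in the text


def all_combinations(elements):
    """Every non-empty subset of the elements, ordered from unary to the full set."""
    return [
        combo
        for k in range(1, len(elements) + 1)
        for combo in combinations(elements, k)
    ]


library = all_combinations(ELEMENTS)
for k in range(1, len(ELEMENTS) + 1):
    n_k = sum(1 for combo in library if len(combo) == k)
    print(f"{k}-element particles: {n_k}")
print(f"total: {len(library)}")
```

Under these assumptions the library contains 31 element sets: 5 unary, 10 binary, 10 ternary, 5 quaternary, and 1 quinary.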

Figure 4

(a) A library of polyelemental nanoparticles synthesized by SPBCL. (b) Possible interfacial arrangements in multiphasic nanoparticles. (c) Possible interfaces in the Au–Co–PdSn trimetallic system, with the calculated interfacial energies. (d) Four interfaces formed in the Ag–Cu–Co–PdSn tetraphasic system. From References 26 and 27. Reprinted with permission from AAAS.

Synthesizing nanomaterials with a specific set of structural motifs requires establishment of corresponding design rules, especially at increasing levels of complexity. To unveil such guiding principles, a systematic exploration of the Au–Ag–Cu–Co–Ni–Pd–Sn nanoparticle design space was performed, in which the elements have known but nontrivial miscibilities.27 In that work, encompassing more than 30 compositions, including the seven-component single particle, nanoparticles with low phase counts (≤3) were sufficiently simple to allow for experimental and theoretical study, revealing two possible configurations of triphasic nanoparticles: linear (three phases with two interfaces) or pie-shaped (three phases with three interfaces) (Figure 4b). The configuration formed could not be deduced from the corresponding biphasic systems, but could be predicted theoretically as a balance of the respective surface and interfacial energies. The complexity of nanoparticles with ≥4 phases presented challenges for simulations; however, it was possible to predict the interfaces that formed using lessons learned from the four constituent triphasic systems (Figure 4c). Depending on their respective interface counts (two or three), the resulting tetraphasic structure had between four and six interfaces (Figure 4d).27
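The counting behind this paragraph can be sketched in a few lines: a tetraphasic particle contains four triphasic subsystems, and its phases can share at most six pairwise interfaces. The sketch only enumerates the candidates; it does not predict which interfaces actually form, since that depends on the surface and interfacial energies discussed above. The function names are illustrative.

```python
from itertools import combinations


def triphasic_subsystems(phases):
    """All three-phase subsets contained in a multiphase particle."""
    return list(combinations(phases, 3))


def candidate_interfaces(phases):
    """All phase pairs that could, in principle, share an interface."""
    return list(combinations(phases, 2))


TRIPHASIC = ("Au", "Co", "PdSn")         # system shown in Figure 4c
TETRAPHASIC = ("Ag", "Cu", "Co", "PdSn")  # system shown in Figure 4d

# A pie-shaped triphasic particle realizes all three candidate interfaces;
# a linear one realizes only two of them.
print(len(candidate_interfaces(TRIPHASIC)), "candidate interfaces for three phases")

print(len(triphasic_subsystems(TETRAPHASIC)), "triphasic subsystems in four phases")
print(len(candidate_interfaces(TETRAPHASIC)), "candidate interfaces for four phases")
```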

The ability of SPBCL to reliably produce single nanoparticles of arbitrary complexity and the elucidation of the rules guiding formation of specific structural motifs enables the synthesis of designer nanostructures. Fabricated at scale, these polyelemental nanoparticle megalibraries allow for discovery of materials at an unprecedented pace.

Optoelectronics discovery using perovskite megalibraries

SPBCL has also been adapted to synthesize perovskite nanomaterials to study optical and optoelectronic properties of isolated crystals.28 Through an analogous strategy, perovskite precursors are dissolved in an ink and patterned as nanoreactors, which are composed of high boiling point solvents rather than polymer. The solvent evaporates after patterning, concentrating the precursors and inducing crystallization. Combined with PPL, this evaporation-crystallization patterning, termed EC-PPL, offers excellent control over halide perovskite crystallization processes, which allows one to systematically study size- and composition-dependent properties and ion mixing and segregation (Figure 5a).29 For example, size-dependent emission properties of single-halide perovskite nanocrystals were first studied, revealing the importance of defects at the nanoscale. In addition, perovskite compositional libraries were synthesized to make a range of heterostructures and solid solutions attainable by simply altering the precursor composition in the ink solution (Figure 5b).30 The confined perovskite synthesis was critical for inducing de-mixing of ions to access 3D heterostructures that are difficult to obtain conventionally.

Figure 5

(a) Schematic for EC-PPL synthesis of perovskite nanocrystals. (b) A representative library of halide perovskite heterostructures characterized by confocal microscopy. Scale bars = 5 µm. (c) Perovskite defect generation using laser irradiation followed by ion exchange. (d) Confocal photoluminescence mapping of a CsPb(Br1–xClx)3 megalibrary synthesized by laser defect generation and halide exchange. Scale bar = 100 µm. (e) Photoluminescence intensity of individual perovskites in the CsPb(Br1–xClx)3 megalibrary as a function of emission wavelength, identifying CsPb(Br0.6Cl0.4)3 as the highest intensity blue emitter. (f) Photoluminescence spectra of a higher resolution screening of perovskite composition centered around the discovered CsPb(Br0.6Cl0.4)3 material (particle 2). (a) From Reference 29. (b) From Reference 30. (c–f) Reprinted with permission from Reference 33. © 2022 American Chemical Society.

In addition to controlling perovskite composition via ink design, a post-synthetic modification strategy can transform large arrays of perovskites into megalibraries with halide composition gradients. Importantly, this enables one to address fundamental questions associated with halide perovskite structure–function relationships.28 For example, defect engineering is a promising route to improve energy conversion in perovskite materials, yet progress has been slowed by a lack of strategies to systematically vary defect concentration across large batches of samples.31,32 Perovskite megalibraries are a promising solution in which laser exposure of individual crystals synthesized by EC-PPL induces loss of anions that depends on laser intensity and irradiation time (Figure 5c).33 Subsequent exposure of the defective crystals to new halide solutions results in an overall halide exchange process. In a first demonstration, parallel patterning using 16 pen tips, each depositing a 17 × 17 array of ink, yielded 4624 CsPbBr3 perovskites with ~400-nm diameters. Each crystal was then individually irradiated with a 405-nm laser while systematically varying the laser power and exposure time. The extent of vacancy generation was monitored by the loss in photoluminescence intensity. Finally, the particle array was submerged in a cyclohexane solution containing PbCl2 to insert chloride into the vacant sites and yield the combinatorial CsPb(Br1–xClx)3 megalibrary (Figure 5d).
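The bookkeeping for such an experiment can be sketched as a per-crystal irradiation plan. The pen count and array size come from the text; the mapping of array rows to laser power levels and columns to exposure-time levels is an assumption made for illustration, and the levels are left as dimensionless indices because the actual settings are not given here.

```python
from itertools import product

N_PENS = 16      # pen tips used for patterning, from the text
ARRAY_SIDE = 17  # each pen deposits a 17 x 17 array, from the text


def build_irradiation_plan(n_pens=N_PENS, side=ARRAY_SIDE):
    """One record per crystal, addressed by (pen, row, col).

    Assumed mapping (not from the source): row index selects the laser
    power level and column index selects the exposure-time level.
    """
    return [
        {"pen": pen, "row": row, "col": col,
         "power_level": row, "exposure_level": col}
        for pen, row, col in product(range(n_pens), range(side), range(side))
    ]


plan = build_irradiation_plan()
conditions = {(c["power_level"], c["exposure_level"]) for c in plan}
print(f"{len(plan)} crystals, {len(conditions)} distinct laser conditions per pen")
```

With this layout each of the 289 laser conditions is repeated once per pen, so the 16 pens supply replicates of every condition.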

After perovskite megalibrary synthesis, the optical structure–function relationships were determined by serial confocal mapping, in which a photoluminescence spectrum was acquired for each particle to reveal the dependence of peak wavelength and intensity on the Cl:Br ratio. Within the vast design space synthesized, the composition with the highest blue photoluminescence intensity was identified as CsPb(Br0.6Cl0.4)3 (Figure 5e–f). The discovery of this high-performing blue emitter was further confirmed by translation to a bulk thin-film system where this CsPb(Br0.6Cl0.4)3 composition retained its superior blue photoluminescence properties. Perovskite megalibraries are promising platforms to further push the boundaries of heterostructure synthesis and defect design at accelerated rates to uncover improved optical and optoelectronic properties.

Expanding synthetic control to millions of materials

To dramatically increase synthetic control of megalibraries, it is necessary to transition away from serially altering the ink composition or serial defect generation. To accomplish this, pen arrays can be spray-coated with multiple inks containing different precursor identities or concentrations.34,35 Using pressurized spray dispensers, the ink is nebulized and distributed radially with a Gaussian intensity profile (Figure 6a). Overlaying multiple ink sprays onto a pen array creates composition gradients with each tip containing a unique precursor stoichiometry that is replicated in the synthesized nanomaterial after patterning and annealing. Spray inking was first demonstrated with dye nanoreactor megalibraries. Two block copolymer inks containing either the Cy5 or the rhodamine 6G fluorophores were separately sprayed at two corners of a pen array.34 As a result, a fluorescence gradient depicted as blue-to-red was created along one axis after nanoreactor patterning (Figure 6b). Because only two sprays were performed, an ink volume gradient along the opposite array axis was also created, which translated to a nanoreactor size gradient on the substrate after patterning because more heavily inked pens deposit larger features (Figure 6c). Overall, a 126 × 126 pen array sprayed with two inks patterned nanoreactors with 126 colors and 126 sizes, each with 900 replicates for a total of 14,288,400 nanoreactors with 15,876 unique composition/size combinations.
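A minimal model shows why two corner sprays produce a composition gradient along one axis and an ink-volume gradient along the other. The sketch assumes identical isotropic Gaussian sprays centered on two adjacent corners of a unit-square pen array; the spray width `SIGMA` and the corner positions are hypothetical, not values from the cited work.

```python
import math

SIGMA = 0.5  # hypothetical spray width, in units of the array edge length


def spray_intensity(x, y, cx, cy, sigma=SIGMA):
    """Gaussian ink intensity at (x, y) from a spray centered at (cx, cy)."""
    r2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.exp(-r2 / (2.0 * sigma ** 2))


def two_ink_pen(x, y):
    """Return (fraction of ink A, total ink) for a pen at normalized (x, y).

    Assumed geometry: ink A is sprayed at corner (0, 0) and ink B at
    corner (1, 0), so both sprays sit on the y = 0 edge.
    """
    a = spray_intensity(x, y, 0.0, 0.0)
    b = spray_intensity(x, y, 1.0, 0.0)
    return a / (a + b), a + b


# Library bookkeeping, using the numbers given in the text.
PENS_PER_SIDE = 126
REPLICATES = 900
unique_combinations = PENS_PER_SIDE ** 2
total_nanoreactors = unique_combinations * REPLICATES

for y in (0.0, 0.5, 1.0):
    frac, total = two_ink_pen(0.25, y)
    print(f"x=0.25, y={y:.1f}: fraction A = {frac:.3f}, total ink = {total:.3f}")
print(unique_combinations, total_nanoreactors)
```

In this idealized model the ink ratio depends only on x, because the shared y-dependence cancels when the two Gaussians are divided, while the total ink (and hence nanoreactor size) falls off with distance from the sprayed edge. Real spray profiles need not be this symmetric.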

Figure 6

(a) Contour plot showing the variation in ink intensity from a single spray position at the corner of a 1.5 × 1.5 cm2 pen array. (b) Fluorescence microscope image of a megalibrary of dye-containing nanoreactors with variation in fluorescence wavelength across the horizontal axis and size control along the vertical axis. (c) Atomic force microscopy (AFM) height images of the dye-containing nanoreactors at the top and bottom of the megalibrary. (d) Overlaid intensity profiles of four sprays at each corner of a pen array containing either Au, Pd, or Cu precursors showing the calculated composition design space. (e) AFM height image of a nanoreactor size gradient defined by a single pen tip. (f) The resulting size gradient of Au–Pd–Cu nanoparticles after annealing showing a 15–35-nm size range. (a, d–f) Reprinted with permission from Reference 35. © 2023 American Chemical Society. (b, c) From Reference 34. © 2019 National Academy of Sciences.

Spray inking was further enhanced by spraying the four corners of a pen array to define compositional gradients along both axes and coating every pen tip with a unique precursor composition (Figure 6d). Appropriate positioning of the spray dispensers can minimize ink volume fluctuations across the array to maintain uniform nanoreactor sizes. Size gradients can be created by each individual pen tip instead of along an entire array axis by varying the pen velocity, dwell time, and force when touching the surface. This superior level of nanoparticle synthetic control was demonstrated in a three-component megalibrary covering the Au–Pd–Cu design space.35 A 300 × 300 pen array was sprayed at the four corners, twice with Au precursors and once each with Pd or Cu precursors. Nanoreactors were deposited by each pen in 25 × 25 patterns where each row of nanoreactors was created using a different pen velocity and dwell time (Figure 6e). After annealing, this 2.25 cm2 megalibrary contained a total of 56,250,000 nanoparticles with 2,250,000 unique composition/size combinations, each with 25 replicates (Figure 6f). The scale of this megalibrary and level of synthetic control attaining 1,000,000 unique nanomaterials per cm2 far surpasses work on nano-combinatorial libraries synthesized by techniques such as drop casting, inkjet printing, or masked vapor deposition.1,2,3,36,37
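The four-corner scheme and the library counts can be sketched in the same spirit. The pen array size, pattern size, and chip area come from the text; the Gaussian width and the assignment of metals to specific corners are assumptions for illustration only.

```python
import math

SIGMA = 0.6  # hypothetical spray width, in units of the array edge length

# Assumed corner assignment: the text states only that Au was sprayed twice
# and Pd and Cu once each, not which corner received which ink.
SPRAYS = [
    ("Au", (0.0, 0.0)),
    ("Au", (1.0, 1.0)),
    ("Pd", (1.0, 0.0)),
    ("Cu", (0.0, 1.0)),
]


def composition(x, y, sprays=SPRAYS, sigma=SIGMA):
    """Normalized metal fractions for a pen at position (x, y) on a unit square."""
    totals = {}
    for metal, (cx, cy) in sprays:
        r2 = (x - cx) ** 2 + (y - cy) ** 2
        totals[metal] = totals.get(metal, 0.0) + math.exp(-r2 / (2.0 * sigma ** 2))
    norm = sum(totals.values())
    return {metal: value / norm for metal, value in totals.items()}


# Library bookkeeping, using the numbers given in the text.
PENS_PER_SIDE = 300
ROWS = COLS = 25      # each pen writes a 25 x 25 pattern
CHIP_AREA_CM2 = 2.25

n_compositions = PENS_PER_SIDE ** 2              # one composition per pen
total_nanoparticles = n_compositions * ROWS * COLS
unique_materials = n_compositions * ROWS         # one size per row
replicates = COLS                                # identical particles along a row
density_per_cm2 = unique_materials / CHIP_AREA_CM2

print(composition(0.5, 0.5))
print(total_nanoparticles, unique_materials, replicates, density_per_cm2)
```

The counts reproduce the 56,250,000 nanoparticles, 2,250,000 unique composition/size combinations, 25 replicates, and 1,000,000 unique materials per cm2 stated above. Under the assumed symmetric layout, the center pen receives equal ink from all four sprays, giving fractions of 0.5 Au, 0.25 Pd, and 0.25 Cu.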

Significant SPBCL and PPL research has focused on nanomaterial structural control with patterning occurring on silicon substrates or silicon nitride membranes for electron microscopy analysis. However, such substrates limit the functionality of megalibraries because many applications rely on specific nanomaterial–substrate interactions and substrate properties, such as conductivity, light absorption, surface area, and more. Successful megalibrary synthesis requires hydrophobic surfaces, subnanometer surface roughness over micron areas, and submicron height variations over centimeter areas. Few substrates fit these requirements, but it was discovered that substrates can be temporarily modified with polystyrene thin films to enable megalibrary synthesis.35 A ~200-nm polystyrene thin film can be applied to an arbitrary substrate to impart the surface characteristics required for patterning and a final >400°C treatment induces polystyrene degradation, fully removing it and depositing the nanomaterials on the desired substrate. This was demonstrated on a range of conductors, semiconductors, and insulators and is facilitating the expansion of megalibrary screening abilities as it opens routes to explore additional material properties such as energy conversion.

Catalyst discovery using nanoparticle megalibraries

Catalyst materials are of tremendous value because most chemical processes rely on catalysis;38 thus, discovering materials with improved activity, selectivity, durability, and cost will always be advantageous. The first application of megalibraries for functional materials discovery was metal nanoparticle catalysts for single-walled carbon nanotube (SWCNT) synthesis, demonstrated by the Mirkin Group in collaboration with the Air Force Research Laboratory.34 SWCNTs are prized materials in nanotechnology with wide-ranging applications, necessitating improved synthesis techniques.39 Spray inking defined a Au–Cu nanoparticle design space with uniform ~2.5-nm diameters. Site-specific laser-induced heating catalyzed the vapor deposition of SWCNTs from a C2H4/H2 atmosphere. To enable site-specific heating and catalysis, a specialized substrate was fabricated with 10-µm-diameter pillars spaced 50 µm apart (Figure 7a). PPL and thermal annealing were performed to synthesize the Au–Cu nanoparticle megalibrary on top of these pillars, and to simplify analysis, the composition gradient was divided into 36 sections. Individual pillars were heated between 700°C and 900°C and subsequently analyzed by Raman spectroscopy to quantify SWCNT formation (Figure 7b). In total, 360 SWCNT growth experiments were serially performed, and the top performing catalyst composition was determined to be Au3Cu, a nanoparticle composition not previously known to catalyze SWCNT growth (Figure 7c).

Figure 7

(a) Optical image of a micropillar array upon which nanoreactors are patterned for site-isolated heating. (b) Experimental setup for laser-induced chemical vapor deposition of single-walled carbon nanotubes (SWCNTs) on the megalibrary coupled with Raman spectroscopy. (c) Variation of the integrated SWCNT Raman G band as a function of Au–Cu catalyst composition identifying Au3Cu as the most active catalyst. (d) Graphic depicting a rhodamine B fluorophore thin film on a nanoparticle megalibrary synthesized on a TiO2 substrate for the visualization of relative reactive oxygen species generation upon visible light irradiation. (e) Fluorescence microscope image of an entire Au–Pd–Cu/TiO2 megalibrary after irradiation with magnified regions showing the varied extent of fluorescence degradation, identifying Au0.53Pd0.38Cu0.09–TiO2 as the most active photocatalyst. (f) Scanning transmission electron microscopy energy-dispersive x-ray spectroscopy maps of the discovered alloy nanoparticle catalyst for dye degradation. (a–c) From Reference 34. © 2019 National Academy of Sciences. (d–f) Reprinted with permission from Reference 35. © 2023 American Chemical Society.

Megalibrary synthesis using multi-spray inking and functional substrate coatings dramatically increased the number of unique materials per megalibrary. This creates challenges for screening because detection methods must have excellent spatial resolution and sensitivity to differentiate nanoparticles with micron separation. This was addressed during the discovery of photocatalytic metal nanoparticles interfaced with semiconductor materials for the visible light degradation of organic pollutants, an important strategy for environmental remediation.40 A Au–Pd–Cu nanoparticle megalibrary was synthesized on a nanoparticulate TiO2 substrate to endow light absorption and charge-separation properties.35 Upon visible light irradiation in the presence of O2 and H2O, reactive oxygen species (ROS) were generated, which can degrade organic molecules. Based on this mechanism, it was reasoned that a fluorescent thin film on top of the megalibrary would lead to ROS reaction with the organic fluorophore, causing a decrease in fluorescence intensity, the extent of which correlates with the amount of ROS generation. Importantly, embedding the fluorophore within a polymer film spatially confines fluorescence akin to the spatial confinement of each catalyst, thus enabling spatially selective ROS detection (Figure 7d). Photocatalyst screening was performed by irradiating the entire megalibrary for 30 min followed by fluorescence microscopy. The megalibrary contained 90,000 possible Au–Pd–Cu photocatalyst compositions, and each composition was produced in 25 × 25 arrays with 1-µm separation between each nanoparticle. Each array of 625 nanoparticles was separated from adjacent arrays by 25 µm to spatially distinguish the activity of each composition. Microscopy revealed the extent of fluorescence loss surrounding each composition, indicating the relative amounts of ROS generation and thus catalyst turnovers (Figure 7e). Overlaying the fluorescence with the spray profiles identified Au0.53Pd0.38Cu0.09–TiO2 as the highest performing photocatalyst (Figure 7f). Parallel catalyst screening using thin-film detection is a key advance that matches the parallel materials synthesis in megalibraries and dramatically increases the speed of characterization and the amount of data generated: this single experiment screened 90,000 compositions within 1 h. Ongoing work is expanding to the single nanoparticle screening level while also broadening the immobilized fluorescence detection technique to chemoselectively react with diverse products of interest, akin to intracellular small molecule sensing in chemical biology.41

Materials prediction with machine learning and AI

Given the scale of the materials discovery problem, finding new materials for specific applications generally requires a priori information about target design spaces and structural motifs to narrow down the otherwise intractable realm of possibilities. However, the enormous scale of this problem makes the manual compilation of this information unrealistic. The increasingly widespread adoption of AI in the physical sciences, and specifically its ability to rapidly process vast quantities of data and synthesize relevant information,42 suggests that the integration of AI into the megalibrary platform, and the data streams associated with it, could be an extremely powerful combination that could further accelerate materials discovery. For instance, the AlphaFold model, which predicts protein structures from their sequence, has had immense impact on the scientific community and popularized the use of predictive AI in chemistry and biology.43 In the context of functional materials discovery with megalibraries, finding compositions that exhibit specific structural features is of particular interest to accelerate synthesis and screening workflows.

Since the first use of SPBCL to make multielemental nanoparticles,25 it has been employed in the synthesis and structural characterization of thousands of unique compositions. These data sets serve as starting points for the guided exploration of previously unknown phase spaces. To that end, in collaboration with the Toyota Research Institute, a machine learning model was trained on the composition and interface structures of this compiled data set and used to predict specific interfacial features in an iterative feedback loop between the model and physical experiments.9 Specifically, the model is designed to predict elemental compositions likely to result in a desired structural feature, such as nanoparticles with a single interface. These compositions were then verified experimentally using SPBCL and STEM-EDS characterization, and the results were fed back into the model to inform the next set of experiments (Figure 8a). In the process, the model gradually discovered new composition spaces with the desired properties, predicting complex structures far beyond what would be realistic by human intuition alone. For instance, an exploration of biphasic nanoparticles produced the most chemically complex biphasic particle ever synthesized, containing six elements, Au10Ag10Cu10Co20Ni40Pd10 (Figure 8b). Such an iterative traversal of a given materials space occurred incrementally, venturing further away from known datapoints as more data became available (Figure 8c).9 Future iterative exploration of materials spaces at the megalibrary scale with predicted gradient spray profiles will enable evaluation of entire compositional spaces at once.
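The structure of such a closed loop can be sketched as follows. This is not the published model: the figure caption for the cited work refers to a Gaussian process, whereas this dependency-free sketch swaps in an inverse-distance-weighted nearest-neighbor surrogate. The function `toy_experiment` is a purely hypothetical stand-in for SPBCL synthesis and STEM-EDS characterization and has no physical meaning; all names and parameters are illustrative.

```python
import math
import random


def simplex_grid(n_elements, steps):
    """All compositions whose fractions are multiples of 1/steps and sum to 1."""
    def rec(remaining, slots):
        if slots == 1:
            yield (remaining,)
            return
        for k in range(remaining + 1):
            for rest in rec(remaining - k, slots - 1):
                yield (k,) + rest
    return [tuple(k / steps for k in combo) for combo in rec(steps, n_elements)]


def toy_experiment(comp):
    """Hypothetical oracle, NOT physical: treats each element present at
    >= 20% as a phase and returns (phases - 1) as the interface count."""
    phases = sum(1 for fraction in comp if fraction >= 0.2)
    return max(phases - 1, 0)


def predict(comp, data, k=3):
    """Surrogate: inverse-distance-weighted mean of the k nearest measurements."""
    nearest = sorted((math.dist(comp, c), y) for c, y in data.items())[:k]
    if nearest[0][0] == 0.0:
        return float(nearest[0][1])
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def suggest(candidates, data, target, batch, explore=0.5):
    """Rank unmeasured candidates by predicted closeness to the target
    interface count, plus a bonus for being far from measured points."""
    def score(comp):
        novelty = min(math.dist(comp, c) for c in data)
        return -abs(predict(comp, data) - target) + explore * novelty
    pool = [c for c in candidates if c not in data]
    return sorted(pool, key=score, reverse=True)[:batch]


def run_loop(candidates, experiment, target, rounds=5, batch=4, seed=0):
    """Seed with random experiments, then alternate suggestion and measurement."""
    rng = random.Random(seed)
    data = {c: experiment(c) for c in rng.sample(candidates, batch)}
    for _ in range(rounds):
        for comp in suggest(candidates, data, target, batch):
            data[comp] = experiment(comp)  # feed results back to the surrogate
    return data


candidates = simplex_grid(n_elements=3, steps=10)
results = run_loop(candidates, toy_experiment, target=1)
print(f"{len(results)} of {len(candidates)} candidate compositions measured")
```

The exploration bonus is one simple way to mimic the behavior described above, in which the search ventures further from known data points as measurements accumulate; a Gaussian process would instead use its predictive uncertainty for this purpose.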

Figure 8

(a) The artificial intelligence (AI)-integrated experimental feedback loop for nanoparticle synthesis. (b) Scanning transmission electron microscopy energy-dispersive X-ray spectroscopy (STEM-EDS) of a six-element, biphasic nanoparticle predicted by the machine learning model. (c) A 2D projection of the explored composition-embedded design space, generated using the t-distributed stochastic neighbor embedding (t-SNE) method. Gray circles represent the entire compositional space. Experimental data are color-coded by the measured interface counts. Acquisition suggestions from the Gaussian process (GP) algorithm (large blue circles) and the nanoparticle (NP) suggestions that were synthesized (large open circles) are overlaid on the same projection to visualize how the search space was traversed. Figures from Reference 9.
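The feedback loop in Figure 8a can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the method of Reference 9: `mock_experiment` is a toy rule standing in for SPBCL synthesis and STEM-EDS characterization, a simple kernel-weighted average with a distance-based exploration bonus stands in for the Gaussian process model, and all function names and parameter values (`length_scale`, `kappa`, the 25% phase threshold) are hypothetical.

```python
import math
import random

ELEMENTS = ("Au", "Ag", "Cu", "Co", "Ni", "Pd")  # the six elements named in the text


def random_composition(rng):
    """Draw random atomic fractions over ELEMENTS that sum to 1."""
    weights = [rng.random() for _ in ELEMENTS]
    total = sum(weights)
    return tuple(w / total for w in weights)


def distance(a, b):
    """Euclidean distance between two compositions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mock_experiment(comp):
    """HYPOTHETICAL stand-in for synthesis plus STEM-EDS; returns an interface count.

    Toy rule, not real chemistry: every element above 25% forms its own phase.
    """
    phases = sum(1 for frac in comp if frac > 0.25)
    return max(phases - 1, 0)


def predict(comp, observed, length_scale=0.2):
    """Surrogate model (assumed): kernel-weighted mean of measured interface
    counts, plus the distance to the nearest measured composition."""
    weights = [math.exp(-(distance(comp, c) / length_scale) ** 2) for c, _ in observed]
    mean = sum(w * y for w, (_, y) in zip(weights, observed)) / sum(weights)
    novelty = min(distance(comp, c) for c, _ in observed)
    return mean, novelty


def acquisition(comp, observed, target, kappa=0.5):
    """Prefer candidates predicted near the target count, with a bonus for
    compositions far from existing data (exploration)."""
    mean, novelty = predict(comp, observed)
    return -abs(mean - target) + kappa * novelty


def feedback_loop(target=1, rounds=10, pool_size=200, n_seed=5, seed=0):
    """Alternate between model suggestions and (mock) experiments."""
    rng = random.Random(seed)
    pool = [random_composition(rng) for _ in range(pool_size)]
    observed = [(c, mock_experiment(c)) for c in pool[:n_seed]]  # prior data set
    untested = pool[n_seed:]
    for _ in range(rounds):
        best = max(untested, key=lambda c: acquisition(c, observed, target))
        untested.remove(best)
        observed.append((best, mock_experiment(best)))  # feed the result back
    return observed


if __name__ == "__main__":
    for comp, count in feedback_loop():
        label = " ".join(f"{el}{100 * f:.0f}" for el, f in zip(ELEMENTS, comp))
        print(label, "-> interfaces:", count)
```

Because each new measurement is appended to `observed` before the next suggestion is made, later rounds can move further from the initial data, which mirrors the incremental traversal shown in Figure 8c. In practice the surrogate would be replaced by a trained model and `mock_experiment` by the physical synthesis and characterization steps.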

Beyond the predictive capabilities of AI for the targeted design and synthesis of polyelemental nanoparticles, its ability to rapidly extract information from large multimodal data streams is well suited to megalibrary screening. One can readily envision unsupervised models that cluster the data generated from megalibrary chips into regions of interest. Similarly, the megalibrary geometry, a regular array of discrete materials, facilitates integration with computer vision tools for automated screening or quality control experiments. Finally, because the capabilities of AI models are derived directly from their training data sets, screening experiments with millions of data points are prime sources of training data, with megalibraries serving as the largest available platform for continuously sourced materials data.
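As a rough illustration of how an unsupervised model might cluster chip data into regions of interest, the Python sketch below applies k-means to mock per-site screening signals. The data, the two-signal feature vector, the grid size, and all function names are hypothetical, and the farthest-point initialization is simply a choice made here so that the example is deterministic; none of this is taken from an existing megalibrary workflow.

```python
import random


def mock_chip(rows=20, cols=20, seed=0):
    """HYPOTHETICAL screening data: two signals per site on a regular array,
    both elevated inside one disc-shaped region of the chip."""
    rng = random.Random(seed)
    sites, features = [], []
    for i in range(rows):
        for j in range(cols):
            hot = (i - 14) ** 2 + (j - 6) ** 2 <= 16
            base = (1.0, 0.8) if hot else (0.1, 0.1)
            sites.append((i, j))
            features.append(tuple(b + rng.uniform(-0.05, 0.05) for b in base))
    return sites, features


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest
    from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm) on tuples of floats."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign(points, centers), centers


def region_of_interest(sites, features, k=2):
    """Return the sites in the cluster whose center has the strongest first signal."""
    labels, centers = kmeans(features, k)
    best = max(range(k), key=lambda j: centers[j][0])
    return [site for site, lab in zip(sites, labels) if lab == best]


if __name__ == "__main__":
    sites, features = mock_chip()
    roi = region_of_interest(sites, features)
    print(f"{len(roi)} of {len(sites)} sites flagged as region of interest")
```

Real screening data would be noisier and higher dimensional, so the number of clusters and the feature scaling would need to be chosen with care.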

Summary and outlook

Megalibraries represent a transformation of high-throughput experimentation made possible by nanotechnology. Transitioning away from iterative experimentation toward parallel synthesis and screening techniques makes it possible to rapidly survey unexplored materials design spaces. Experimental design no longer needs to be limited by prior empirical observations because enormous portions of the materials genome can be readily tested, thus removing barriers to pursuing understudied materials and applications. Importantly, it is the shrinking of experiments from the macroscale to the nanoscale, by performing synthetic chemistry within attoliter-volume nanoreactors, that makes this massive parallelization possible.

From synthesis to screening, the megalibrary platform is a highly interdisciplinary strategy. As such, this new generation of materials discovery and materials genome exploration will require contributions from materials synthesis, chemical catalysis, biological sensing, and computation, making it a highly collaborative endeavor. In particular, training machine learning algorithms on the vast quantities of high-quality megalibrary data will not only help guide materials discovery but also unveil structure–function relationships that provide the basis for fundamental materials design. The combination of megalibraries and AI stands to revolutionize how we pursue and optimize new materials and bring them into real-world applications.