Introduction

Current practices for developing tools and infrastructure used in multiscale materials design, development, and deployment are generally highly localized (sometimes even within a single organization), resulting in major inefficiencies: duplication of effort, lack of code review, and failure to engage the right talent for the right task, among others. Although it is well known that the pace of discovery and innovation increases significantly with effective collaboration [1–4], scaling such efforts to large heterogeneous communities such as those engaged in materials innovation has proven very difficult.

The advent of information technology has facilitated massive electronic collaborations (generally referred to as e-collaborations) that have led to significant advances in several domains including the discovery of the Higgs boson [5], the sequencing of the human genome [6], the Polymath project [7], the monitoring of species migration [8, 9], and numerous open-source software projects. E-collaborations allow experts from complementary domains to create highly productive collaborations that transcend geographical, temporal, cultural, and organizational distances. E-collaborations require a supporting cyber-infrastructure that allows team members to generate, analyze, disseminate, access, and consume information at dramatically increased pace and/or quantity [10]. A key element of this emerging cyber-infrastructure is open-source software, as it eliminates collaboration hurdles due to software licenses and can help foster truly massive e-collaborations. In other words, even in collaborations involving proprietary data, open-source cyber-infrastructure provides a common language that can facilitate e-collaborations with large numbers of team members.

Several recent national and international initiatives [11–13] have been launched with the premise that the adoption and utilization of modern data science and informatics toolsets offers a new opportunity to dramatically accelerate the design and deployment cycle of new advanced materials in commercial products. More specifically, it has been recognized that innovation cyber-ecosystems [14] are needed to allow experts from the materials science and engineering, design and manufacturing, and data science domains to collaborate effectively. The challenge in integrating these traditionally disconnected communities comes from the vast differences in how knowledge is captured, curated, and disseminated in each of them [15]. More specifically, knowledge systems in the materials field are rarely captured in a digital form. In order to create a modern materials innovation ecosystem, it is imperative that we design, develop, and launch novel collaboration platforms that allow automated distilling of materials knowledge from large amounts of heterogeneous data acquired through customized protocols that are necessarily diverse (elaborated next). It is also imperative that this curated materials knowledge is presented to the design and manufacturing experts in highly accessible (open) formats.

Customized materials design has great potential for impacting virtually all emerging technologies, with significant economic consequences [12, 13, 16–24]. However, materials design (including the design of a manufacturing process route) resulting in the combination of properties desired for a specific application is a highly challenging inverse problem due to the hierarchical nature of the internal structure of materials. Material properties are controlled by the material's hierarchical internal structure as well as by physical phenomena whose timescales vary at each of the hierarchical length scales (from the atomic to the macroscopic). Characterization of the structure at each of these length scales is often in the form of images from different experimental/computational techniques, resulting in highly heterogeneous data. As a result, tailoring the material hierarchical structure to yield desired combinations of properties or performance characteristics is enormously difficult. Figure 1 provides a collection of materials images depicting materials structures at different length scales, which are generally acquired using diverse protocols and are captured in equally diverse formats.

Fig. 1 Hierarchical materials structure at multiple length scales. a Simulated graphene crystalline structure. b Simulated fivefold icosahedral Al-Ag quasicrystals. c High-resolution electron microscopy image of delamination cracks in h-BN particles subjected to compressive stress in the (0001) planes (within a silicon nitride particulate-reinforced silicon carbide composite). d Electron diffraction pattern of an icosahedral Zn-Mg-Ho quasicrystal. e Cross-polarised light image of spherulites in poly-3-hydroxy butyrate (PHB). f Cast iron with magnesium-induced spheroidised graphite. g SEM micrograph of a taffeta textile fragment. h Optical microscopy image of a cross-section of an aluminum casting. i X-ray tomography image of open-cell polyurethane foam. Images courtesy of Core-Materials [25] (Color figure online)

While the generation (from experiments and computer simulations) and dissemination of datasets consisting of heterogeneous images are necessary elements in a modern materials innovation ecosystem, there is an equally critical need for customized analytics that account for the stochastic nature of these data at multiple length scales in order to extract high-value, transferable knowledge. Data-driven process-structure-property (PSP) linkages [26] provide a systemic, modular, and hierarchical framework for community engagement (i.e., several people making complementary or overlapping contributions to the overall curation of materials knowledge). Computationally cheap PSP linkages also effectively communicate the curated materials knowledge to design and manufacturing experts in highly accessible formats.

The Materials Knowledge Systems in Python project (PyMKS) is the first open-source materials data analytics toolkit that can be used to create high-value PSP linkages for hierarchical materials in large-scale efforts driven and directed by an entire community of users. In this regard, it could be a foundational element of the cyber-infrastructure needed to realize a modern materials innovation ecosystem.

Current Materials Innovation Ecosystem

Open-access materials databases and computational tools are critical components of the cyber-infrastructure needed to curate materials knowledge through effective e-collaborations [27]. Several materials science open-source computational toolsets and databases have emerged in recent years to help realize the vision outlined in the Materials Genome Initiative (MGI) and the Integrated Computational Materials Engineering (ICME) paradigm [12, 13, 16–24]. Yet, a standard materials taxonomy and database schema have not been established, owing to the unwieldy number of material descriptors and the heterogeneity of the data. Additionally, the coupled physical phenomena that govern material properties are too complex to model all aspects of a material simultaneously with a single computational tool. Consequently, current practices have resulted in the development of computational tools and databases with a narrow focus on specific length/structure scales, material classes, or properties.

The NIST Data Gateway contains over 100 free and paid queryable web-based materials databases. These databases contain atomic structure, thermodynamics, kinetics, fundamental physical constants, and x-ray spectroscopy data, among other features [28]. The NIST DSpace provides a curated set of links to several materials community databases [29]. The NIST Materials Data Curation System (MDCS) is a general online database that aims to facilitate the capturing, sharing, and transforming of materials data [30]. The Open Quantum Materials Database (OQMD) is an open-source data repository for phase diagrams and electronic ground states computed using density functional theory [31]. MatWeb is a database containing materials properties for over 100,000 materials [32]. Atomic FLOW of Materials Discovery (AFLOW) catalogs millions of materials and their properties and hosts computational tools that can be used for atomic simulations [33]. The Materials Project (and the tool pyMatgen) [34, 35] provides open web-based access to computed information on known and predicted materials as well as analysis tools for electronic band structures. The Knowledgebase of Interatomic Models (OpenKIM) hosts open-source tools for interatomic potentials used in molecular simulations of materials [36]. PRedictive Integrated Structural Materials Science (PRISMS) hosts a suite of ICME tools and data storage for the metals community focused on microstructure evolution and mechanical properties [37]. Granta Materials and Citrine Informatics represent two of the for-profit efforts in these domains. Granta Materials Intelligence provides enterprise-scale infrastructure for in-house materials data management, which can be integrated with design tools [38]. Citrine Informatics is a cloud-based platform that provides access to multisource material databases as well as machine learning tools [39, 40]. Citrine Informatics also maintains open-access databases as well as open-source software projects [41].

The SPPARKS Kinetic Monte Carlo Simulator is a parallel Monte Carlo code for on-lattice and off-lattice models [42]. MOOSE is a parallel computational framework for coupled systems of nonlinear equations [43]. Dream3D is a tool used for synthetic microstructure generation, image processing, and mesh creation for finite element analysis [44].

While there exists a sizable number of standard analytics tools [45–54], none of them are tailored to create PSP linkages from materials structure image data and their associated properties. PyMKS aims to seed and nurture an emergent user group in the materials data analytics field for establishing homogenization and localization (PSP) linkages by leveraging open-source signal processing and machine learning packages in Python. An overview of the PyMKS project accompanied by several examples is presented here. This paper is a call to others interested in participating in this open science activity.

Theoretical Foundations of Materials Knowledge Systems

Material properties are controlled by their internal structure and the diverse physical phenomena occurring at multiple time and length scales. Generalized composite theories [55, 56] have been developed for hierarchical materials exhibiting well-separated length scales in their internal structure. Generally speaking, these theories either address homogenization (i.e., communication of effective properties associated with the structure at a given length scale to a higher length scale) or localization (i.e., spatiotemporal distribution of the imposed macroscale loading conditions to the lower length scale). Consequently, homogenization and localization are the essential building blocks in communicating the salient information in both directions between hierarchical length/structure scales in multiscale materials modeling. It is also pointed out that localization is significantly more difficult to establish, and implicitly provides a solution to homogenization.

The most sophisticated composite theory available today that explicitly accounts for the full details of the material internal structure (also simply referred to as microstructure) comes from the use of perturbation theories and Green's functions [55, 57–68]. In this formalism, one usually arrives at a series expansion for both homogenization and localization, where the individual terms in the series involve convolution integrals with kernels based on Green's functions. This series expansion was refined and generalized by Adams and co-workers [67, 69, 70] through the introduction of the concept of a microstructure function, which conveniently separates each term in the series into a physics-dependent kernel (based on Green's functions) and a microstructure-dependent function (based on the formalism of n-point spatial correlations [61–66]).

The materials knowledge systems (MKS) framework [71–77] complements these sophisticated physics-based composite theories with a modern data science approach to create a versatile framework for extracting and curating multiscale PSP linkages. More specifically, MKS employs a discretized version of the composite theories mentioned earlier to gain major computational advantages. As a result, highly adaptable and templatable protocols have been created and used successfully to extract robust and versatile homogenization and localization metamodels with impressive accuracy and broad applicability over large microstructure spaces.

The MKS framework is based on the notion of a microstructure function. The microstructure function provides a framework to represent quantities that describe material structure, such as phase identifiers, lattice orientation, chemical composition, and defect types and densities, among others (typically referred to as local states). The microstructure function, $m_j(h; s)$, represents a probability distribution for the given local state, $h \in H$, at each position, $s \in S$, in a given microstructure, $j$ [78–81]. The introduction of the local state space $H$ (i.e., the complete set of all potential local states) provides a consolidated variable space for combining the diverse attributes (often a combination of scalar and tensor quantities) needed to describe the local states in the material structure. The MKS framework requires a discretized description of $m_j$, denoted here as $m_j[h; s]$, where $[\cdot\,;\cdot]$ represents the discretized space (in contrast to $(\cdot\,;\cdot)$, which defines the continuous space). The ";" symbol separates the indices in physical space (placed to its right) from the indices in local state space (placed to its left). In most applications, $S$ is simply tessellated into voxels on a regular (uniform) grid so that the position can be denoted by $s_{i,j,k}$ in three dimensions.

As noted earlier, the local state space in most advanced materials is likely to demand sophisticated representations. In prior work [73, 82, 83], it was found that spectral representations of functions on the local state space offer many advantages, both in compactness of representation and in reduced computational cost. In such cases, $h$ indexes the spectral basis functions employed. The selection of these functions depends on the nature of the local state descriptors. Examples include (i) the primitive basis (or indicator functions) used to represent simple tessellation schemes [71, 72, 74–79, 84], (ii) generalized spherical harmonics used to represent functions over the orientation space [73, 82], and (iii) Legendre polynomials used to represent functions over the concentration space [83].

Homogenization

Comparing different microstructures is quite difficult even after expressing them in convenient discretized descriptions, mainly due to the lack of a reference point or a natural origin for the index $s$ in the tessellation of the microstructure volume. Yet the relative spatial distributions of the local states provide a valuable representation of the microstructure that can be used effectively to quantify the microstructure and compare it with other microstructures in robust and meaningful ways [77–79, 81, 84]. The lowest order of spatial correlations carrying this relative spatial information comes in the form of 2-point statistics, computed as a correlation of the microstructure function such that

$$ f_{j}[h, h^{\prime}; r] = \frac{1}{{\Omega}_{j}\left[r\right]} \sum\limits_{s} m_{j}[h; s] m_{j}[h^{\prime}; s + r], $$
(1)

where $r$ is a discrete spatial vector within the voxelated domain specified by $s$, $f_j[h, h'; r]$ is one set of 2-point statistics for the local states $h$ and $h'$, and $\Omega_j[r]$ is a normalization factor that depends on $r$ [84]. The subscript $j$ refers to a sample microstructure used for analysis (i.e., each $j$ could refer to a microstructure image). The physical interpretation of the 2-point statistics is explained in Fig. 2 with a highly simplified two-phase microstructure (the two phases are colored white and gray). If the primitive basis is used to discretize both the spatial domain and the local state space, then $f_j[h, h'; r]$ can be interpreted as the probability of finding local states $h$ and $h'$ at the tail and head, respectively, of a randomly placed vector $r$.

Fig. 2 The discretization scheme for both the microstructure function and the vector space needed to define the spatial correlations, illustrated on a simple two-phase composite material. The discretized vectors $r$ describe the relative positions between different spatial locations
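For a periodic microstructure discretized with the primitive basis, the sum in Eq. 1 is a circular correlation and can be evaluated efficiently with fast Fourier transforms. The following NumPy sketch (illustrative code, not part of PyMKS) computes the autocorrelation $f_j[1, 1; r]$ for a synthetic two-phase microstructure:

```python
import numpy as np

# Two-phase microstructure on a 21 x 21 periodic grid (0 or 1 in each voxel).
rng = np.random.RandomState(0)
structure = (rng.rand(21, 21) < 0.4).astype(float)

# Primitive-basis microstructure function for local state h = 1:
# m[1; s] is 1 where phase 1 occupies voxel s, and 0 otherwise.
m1 = structure

# Periodic autocorrelation f[1, 1; r] via the convolution theorem,
# normalized by the total number of voxels (Omega[r] for the periodic case).
F = np.fft.fftn(m1)
f11 = np.fft.ifftn(F * np.conj(F)).real / m1.size

# At r = 0 the autocorrelation recovers the volume fraction of phase 1.
print(np.allclose(f11[0, 0], m1.mean()))
```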

Two-point statistics provide a meaningful representation of the microstructure, but create an extremely large feature space that often contains redundant information. Dimensionality reduction can be used to create low dimensional microstructure descriptors from the sets of spatial correlations (based on different selections of $h$ and $h'$) with principal component analysis (PCA). The PCA dimensionality reduction can be mathematically expressed as follows:

$$ f_{j}\left[l\right] \approx \sum\limits_{k\in K} \mu_{j}\left[k\right] \phi\left[k,l\right] + \overline{f[l]}. $$
(2)

In Eq. 2, $f_j[l]$ is a contracted representation of $f_j[h, h'; r]$ as a large vector (i.e., $l$ maps uniquely to every combination of $h$, $h'$, and $r$ deemed to be of interest in the analyses). The $\mu_j[k]$ are low dimensional microstructure descriptors (the transformed 2-point statistics), also called principal component scores (PC scores). The $\phi[k, l]$ are the calibrated principal components (PCs), and the $\overline{f[l]}$ are the mean values of $f_j[l]$ over the calibration ensemble for each $l$. The indices $k \in K$ refer to the $\mu_j[k]$ in decreasing order of significance and are independent of $l$ (and therefore of $h$, $h'$, and $r$). The main advantage of this approach is that the $f_j[l]$ can be reconstructed to sufficient fidelity with only a small subset of the $\mu_j[k]$ [85].
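As a concrete illustration of Eq. 2, the following sketch applies Scikit-learn's PCA directly to a stack of flattened 2-point statistics (random stand-in data here; PyMKS automates this step within its analysis objects):

```python
import numpy as np
from sklearn.decomposition import PCA

# Each row is f_j[l]: one microstructure's 2-point statistics flattened
# into a single long vector (random stand-ins for illustration).
rng = np.random.RandomState(0)
f = rng.rand(100, 21 * 21)

# Keep the first few principal components; mu holds the PC scores
# mu_j[k], and pca.components_ holds the calibrated PCs phi[k, l].
pca = PCA(n_components=5)
mu = pca.fit_transform(f)

# Reconstruction from the truncated basis (Eq. 2).
f_approx = pca.inverse_transform(mu)
print(mu.shape, f_approx.shape)  # (100, 5) (100, 441)
```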

After obtaining the needed dimensionality reduction in the representation of the material structure, machine learning models can be used to create homogenization PSP linkages of interest. As an example, a generic homogenization linkage can be expressed as follows:

$$ p_{j}^{\text{eff}} = \mathcal{F}(\mu_{j}[k]) $$
(3)

In Eq. 3, $p_j^{\text{eff}}$ is the effective materials response (reflecting an effective property in structure-property linkages or an evolved low dimensional microstructure descriptor in process-structure linkages), and $\mathcal{F}$ is a machine learning function that links $\mu_j[k]$ to $p_j^{\text{eff}}$.

Localization

MKS localization linkages are significantly more complex than the homogenization linkages. They are usually expressed in the same series forms that are derived in the general composite theories, while employing discretized kernels based on Green's functions [55, 57–68]. Mathematically, the MKS localization linkages are expressed as follows:

$$ p_{j}[s] = \sum\limits_{h; r} \alpha[h; r]\, m_{j}[h; s - r] + \sum\limits_{h, h^{\prime}; r, r^{\prime}} \alpha[h, h^{\prime}; r, r^{\prime}]\, m_{j}[h; s - r]\, m_{j}[h^{\prime}; s - r^{\prime}] + \cdots $$
(4)

In Eq. 4, $p_j[s]$ is the spatially resolved (localized) response field (e.g., a response variable such as stress or strain rate in a structure-property linkage, or an evolved microstructure function in a process-structure linkage), and $\alpha[h; r]$ are the Green's function-based discretized influence kernels. These digital kernels are calibrated using regression methods [71–74, 82, 83].

Figure 3 provides schematic overviews of the MKS homogenization and localization workflows. More detailed explanations of the MKS homogenization and localization linkages can be found in prior literature [71–79, 83, 84].

Fig. 3 The MKS homogenization workflow (left) consists of four steps: 1. Discretize the raw microstructure with the microstructure function. 2. Compute 2-point statistics of the local states (Eq. 1). 3. Create low dimensional microstructure descriptors using dimensionality reduction techniques (Eq. 2). 4. Establish a linkage with the low dimensional microstructure descriptors using machine learning (Eq. 3). The MKS localization workflow (right) consists of two steps: 1. Discretize the raw microstructure with the microstructure function. 2. Calibrate physics-based kernels using regression methods (Eq. 4) (Color figure online)

Materials Knowledge Systems in Python

PyMKS is an object-oriented numerical implementation of the MKS theory developed in the literature [72]. It provides a high-level, computationally efficient framework to implement data pipelines for classifying, cataloging, and quantifying materials structures for PSP relationships. PyMKS is written in Python, a natural choice for scientific computing due to its ubiquitous use in the data science community as well as many other favorable attributes [86]. PyMKS is licensed under the permissive MIT license [87], which allows for unrestricted distribution in commercial and non-commercial systems.

Core Functionality

PyMKS consists of four main components: tools to compute 2-point statistics, tools for homogenization linkages, tools for localization linkages, and tools for discretizing the microstructure. In addition, PyMKS has modules for generating datasets using conventional numerical simulations and a module for custom visualization of microstructures. PyMKS builds on Scikit-learn's pipelining methodology to create materials-specific machine learning models. This is a high-level system for combining multiple data and machine learning transformations into a single customizable pipeline with only minimal required code. This approach makes cross-validation and parameter searches simple to implement and avoids the complicated bookkeeping issues associated with training, testing, and validating data pipelines in machine learning.

The starting point for an MKS homogenization analysis is to use 2-point statistics as outlined in Eq. 1, provided in PyMKS by the MKSStructureAnalysis object, which calculates the objective low dimensional structure descriptors, $\mu_j[k]$. The default dimensionality reduction technique is PCA, but any model that provides the fit_transform method (a Scikit-learn "transformer" object) can be substituted. After calculating the descriptors, the MKSHomogenizationModel is used to create linkages between the $\mu_j[k]$ and the effective material response, $p_j^{\text{eff}}$, as indicated in Eq. 3. The default machine learning algorithm is a polynomial regression, but any estimator with the fit and predict methods can be substituted to create the linkages between $\mu_j[k]$ and $p_j^{\text{eff}}$.

The MKSLocalizationModel object provides the MKS localization functionality. It calibrates the first-order influence kernels $\alpha[h; r]$ used to predict local materials responses, $p_j[s]$, as outlined in Eq. 4. The calibration of the influence kernels is achieved using a variety of linear regression techniques described in numerous previous studies [71–73, 83]. The MKSLocalizationModel object uses fit and predict methods to follow the standard interface for a Scikit-learn estimator object.

To use either the homogenization or the localization models in PyMKS, the microstructure first needs to be represented by a microstructure function, $m_j[h; s]$. The bases module in PyMKS contains four transformer objects for generating the $m_j[h; s]$ using a variety of discretization methods [71–77, 83]. These four objects can be thought of as materials-specific extensions to the feature extraction module in Scikit-learn. A PrimitiveBasis object uses indicator (or hat) functions and is well suited for microstructures that have discrete local states (e.g., distinct thermodynamic phases). The LegendreBasis and FourierBasis objects create spectral representations of microstructure functions defined on nonperiodic and periodic continuous local state spaces, respectively. For example, functions over a range of chemical compositions can be described using LegendreBasis, while functions over orientations in two-dimensional space can be described using FourierBasis. Furthermore, GSHBasis creates compact spectral representations for functions over lattice orientation space (such as those needed to describe polycrystalline microstructures) [88–99].
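Instantiating these basis objects might look as follows; this is a minimal sketch assuming the constructor arguments described above (n_states and domain), with illustrative parameter values:

```python
from pymks.bases import PrimitiveBasis, LegendreBasis, FourierBasis

# Two discrete phases labeled 0 and 1.
prim_basis = PrimitiveBasis(n_states=2, domain=[0, 1])

# A continuous, nonperiodic local state (e.g., composition between 0 and 1).
leg_basis = LegendreBasis(n_states=3, domain=[0, 1])

# A continuous, periodic local state (e.g., a 2-D orientation angle).
four_basis = FourierBasis(n_states=3, domain=[0, 6.283])
```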

PyMKS contains modest data generation tools (in the datasets module) that are used in both the PyMKS examples and the PyMKS test suite. The MicrostructureGenerator object creates stochastic microstructures using digital filters. This assists users in creating PyMKS workflows even when data is unavailable. PyMKS has objects for generating sample data from both a spinodal decomposition simulation (using the CahnHilliardSimulation object) and a linear elasticity simulation (using the ElasticFESimulation object). PyMKS comes with custom functions for visualizing microstructures in elegant ways (in the tools module). These are used extensively in the PyMKS example notebooks to minimize incidental code associated with visualization.

Underlying Technologies

PyMKS is built upon the highly optimized Python packages NumPy [100], SciPy [101], and Scikit-learn [47]. NumPy arrays are the primary data structure used throughout PyMKS and provide the basic vector and matrix manipulation operations. SciPy's signal processing and numerical linear algebra functions are used to calibrate models and generate synthetic data. PyMKS is tightly integrated with Scikit-learn and mimics its simple API in order to leverage Scikit-learn's data pipelining methodology for machine learning and data transformations. In addition, PyMKS uses the Pytest framework to automate execution of the test suite [102].

Optional packages that can be used with PyMKS include Simple Finite Elements in Python (SfePy) [103], the Python wrapper for the FFTW library (pyFFTW) [104], and the plotting package Matplotlib [105]. SfePy is used to simulate linear elasticity to create sample response field data. pyFFTW wraps the highly optimized FFTW fast Fourier transform library, which enhances the efficiency of PyMKS and enables parallel computations. Matplotlib is used to generate custom microstructure visualizations.

Development Practices

PyMKS leverages existing tools, standards, and web resources wherever possible. In particular, the developers are an open community that uses GitHub for issue tracking and release management (see https://github.com/materialsinnovation/pymks). Additionally, a Google group is used as a public forum to discuss project development, support, and announcements (see pymks-general@googlegroups.com). The Travis CI continuous integration tool is used to automate running the test suite for branches of the code stored on GitHub. Code standards are maintained by following the Python PEP8 standards and by reviewing code using pull requests on GitHub. Detailed administrative guidelines are outlined in the ADMINISTRATA.md document, and potential developers are encouraged to follow them.

Examples of Homogenization and Localization with PyMKS

This section demonstrates, using PyMKS, the MKS homogenization and localization workflows shown in Fig. 3. Additional workflow examples can be found on the PyMKS website http://pymks.org.

Prediction of Effective Stiffness with Homogenization

Generation of Calibration Data

In this example, the MKSHomogenizationModel is used to create a structure-property linkage between the microstructure of a two-phase composite material and its effective stiffness $C_{xx}$.

Multiple classes of periodic microstructures and their effective elastic stiffness values can be generated by importing the make_elastic_stiffness function from pymks.datasets.

This function has several arguments. n_samples is a list indicating the number of microstructures for each class. grain_size and volume_fraction are also lists that specify the average grain features and mean volume fractions for each of the microstructure classes. Variance in the volume fractions for each class can be controlled using percent_variance, which specifies a range of volume fractions centered about the mean values (i.e., volume_fraction ± percent_variance). size indicates the dimensions of all the microstructures. elastic_modulus and poissons_ratio are used to indicate the material properties for each of the phases. Lastly, seed is used as the seed for the random number generator.

In this homogenization example, 50 samples from each of 16 different microstructure classes, with dimensions 21 × 21, and their effective stiffness values were created, for a total of 800 samples. Each of the 16 classes has different microstructure feature sizes and volume fractions. The make_elastic_stiffness function returns the microstructures X and their associated stiffness values y.

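A sketch of this call follows; the argument names match the description above, while the specific grain sizes and volume fractions for each class are illustrative placeholders:

```python
from pymks.datasets import make_elastic_stiffness

# 16 microstructure classes, 50 samples per class (800 samples total),
# each on a 21 x 21 grid. The grain_size and volume_fraction entries
# below are illustrative choices, one per class.
grain_size = [(2, 2), (2, 7), (2, 15), (7, 2), (7, 7), (7, 15), (15, 2),
              (15, 7), (15, 15), (3, 9), (9, 3), (9, 9), (5, 5), (5, 12),
              (12, 5), (12, 12)]
volume_fraction = [0.5] * 16

X, y = make_elastic_stiffness(n_samples=[50] * 16,
                              grain_size=grain_size,
                              volume_fraction=volume_fraction,
                              percent_variance=0.15,
                              size=(21, 21),
                              elastic_modulus=(120, 80),
                              poissons_ratio=(0.3, 0.3),
                              seed=0)
```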

An example microstructure from each of the 16 classes can be visualized by importing the draw_microstructures function from pymks.tools. The output from draw_microstructures can be found in Fig. 4.

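A minimal sketch of the visualization call (the slicing assumes samples are ordered by class, 50 per class):

```python
from pymks.tools import draw_microstructures

# One example from each class: every 50th sample starts a new class.
draw_microstructures(X[::50])
```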
Fig. 4 One sample from each of the 16 different microstructure classes used for calibration of the homogenization model

Calibration of Homogenization Model

Before an instance of the MKSHomogenizationModel can be made, an instance of a basis class is needed to specify the discretization method for the microstructure functions (see Fig. 3). For this particular example, there are only two discrete phases, enumerated by 0 and 1. It has been shown that the primitive basis provides the most compact representation of discrete phases [71, 74, 77–79, 81, 84]. In PyMKS, the class PrimitiveBasis from pymks.bases can be used with n_states equal to 2 and the domain equal to [0, 1].

The periodic axes as well as the set(s) of spatial correlations need to be specified in addition to the basis class for the MKSHomogenizationModel. This is done using the arguments periodic_axes and correlations, respectively. In practice, the set of spatial correlations is a hyperparameter of our model that could be optimized, but for this example, only the two autocorrelations will be used.

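A sketch of the model construction, using the basis parameters and arguments described above:

```python
from pymks import MKSHomogenizationModel
from pymks.bases import PrimitiveBasis

prim_basis = PrimitiveBasis(n_states=2, domain=[0, 1])

# Use only the two autocorrelations, (0, 0) and (1, 1); both axes of the
# 21 x 21 microstructures are treated as periodic.
model = MKSHomogenizationModel(basis=prim_basis,
                               periodic_axes=[0, 1],
                               correlations=[(0, 0), (1, 1)])
```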

The default pipeline used to create the homogenization linkage uses PCA and polynomial regression objects from Scikit-learn. Using GridSearchCV from Scikit-learn, cross-validation on the calibration data is used to find the optimal number of principal components and degree of polynomial (based on the R-squared values) within a defined subspace of the hyperparameters for our model. A dictionary params_to_tune defines the subspace. For this example, n_components will be varied from 1 to 13, and the degree of the polynomial regression will be varied from 1 to 3. StratifiedKFold is used to ensure that microstructures from each of the classes are used in each fold during cross-validation. The array labels is used to label each of the classes.

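A sketch of the grid search, written against the modern scikit-learn model-selection API (the original used an earlier release; params_to_tune and labels follow the description above):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# One class label per sample (16 classes x 50 samples) so that folds are
# stratified across the microstructure classes.
labels = np.repeat(np.arange(16), 50)

params_to_tune = {'n_components': np.arange(1, 14),
                  'degree': np.arange(1, 4)}

# Precompute stratified folds on the class labels and hand them to the search.
folds = StratifiedKFold(n_splits=5).split(X.reshape(len(X), -1), labels)
gs = GridSearchCV(model, params_to_tune, cv=list(folds)).fit(X, y)
```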

The results of our parameter grid search can be examined by either printing or creating visualizations. The parameters and score of the best estimator can be printed as shown below.

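A sketch of the printout (the degree and n_components attributes on the best estimator are assumed from the model description above):

```python
print('Order of polynomial:', gs.best_estimator_.degree)
print('Number of components:', gs.best_estimator_.n_components)
print('R-squared:', gs.best_score_)
```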

Two different visualizations of the results from GridSearchCV can be created using draw_gridscores_matrix and draw_gridscores from pymks.tools.

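A sketch of the matrix visualization; the exact signature of draw_gridscores_matrix is an assumption based on the figure it produces:

```python
from pymks.tools import draw_gridscores_matrix

draw_gridscores_matrix(gs, ['n_components', 'degree'],
                       score_label='R-Squared',
                       param_labels=['Number of Components',
                                     'Order of Polynomial'])
```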

draw_gridscores_matrix provides a visualization of two matrices showing the mean R-squared values and their standard deviations. The output from draw_gridscores_matrix can be found in Fig. 5.

Fig. 5 Mean R-squared values and standard deviations as a function of the order of the polynomial and the number of principal components

draw_gridscores provides another view of the same information, with the mean values indicated by the points and the standard deviations indicated by the shaded regions. The output from draw_gridscores can be found in Fig. 6.

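A sketch of this call; the grouping of grid-search results by polynomial order and the signature of draw_gridscores are assumptions (grid_scores_ belongs to older scikit-learn releases; recent releases expose cv_results_ instead):

```python
from pymks.tools import draw_gridscores

# Group the grid-search results by polynomial order.
scores = [[s for s in gs.grid_scores_ if s.parameters['degree'] == deg]
          for deg in (1, 2, 3)]

draw_gridscores(scores, 'n_components',
                data_labels=['1st Order', '2nd Order', '3rd Order'],
                param_label='Number of Components',
                score_label='R-Squared')
```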
Fig. 6 The mean R-squared values indicated by the points and the standard deviations indicated by the shaded regions, as a function of the number of principal components, for the first three orders of a polynomial function (Color figure online)

For the specified parameter range, the model with the highest R-squared value uses a 2nd-order polynomial with 11 principal components. This model is calibrated using the entire training dataset and is used for the rest of the example.

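A minimal sketch of recalibrating the best model on all 800 samples:

```python
# Best estimator from the grid search (2nd-order polynomial, 11 components),
# refit on the entire calibration dataset.
model = gs.best_estimator_
model.fit(X, y)
```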

Prediction of Effective Stiffness Values

In order to validate our model, additional data is generated using the make_elastic_stiffness function again with the same parameters, except for the number of samples and the seed used for the random number generator. The function returns the new microstructures X_new and their effective stiffness values y_new.

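A sketch of the validation-data call (same class parameters as before; the sample count per class is illustrative):

```python
# Same class parameters, but fewer samples per class and a new seed.
X_new, y_new = make_elastic_stiffness(n_samples=[10] * 16,
                                      grain_size=grain_size,
                                      volume_fraction=volume_fraction,
                                      percent_variance=0.15,
                                      size=(21, 21),
                                      elastic_modulus=(120, 80),
                                      poissons_ratio=(0.3, 0.3),
                                      seed=1)
```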

Effective stiffness values predicted by the model for the new data are generated using the predict method.

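A minimal sketch of the prediction step:

```python
# Predicted effective stiffness values for the validation microstructures.
y_predict = model.predict(X_new)
```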

A visualization of the PC scores for both the calibration and the validation data can be created using draw_components_scatter from pymks.tools. The output from draw_components_scatter can be found in Fig. 7.

Fig. 7 Low dimensional microstructure distributions ($\mu_j[k]$ from Eq. 2) for both the calibration and validation datasets (Color figure online)

Because both the validation and the calibration data were generated from the make_elastic_stiffness function with the same parameters, both sets of data are different samples from the same distribution. Similar visualizations can provide insights on differences between different data sources.

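A sketch of the scatter-plot call; the attribute names holding the PC scores for the calibration and validation data are assumptions based on the workflow described above:

```python
from pymks.tools import draw_components_scatter

# First two PC scores for the calibration and validation datasets
# (attribute names assumed for illustration).
draw_components_scatter([model.reduced_fit_data[:, :2],
                         model.reduced_predict_data[:, :2]],
                        ['Calibration', 'Validation'])
```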

To evaluate our model's predictions, a goodness-of-fit plot can be generated by importing draw_goodness_of_fit from pymks.tools. The results from draw_goodness_of_fit can be found in Fig. 8. Additionally, the R-squared value for our predicted data can be printed.

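A sketch of the two steps; the exact signature of draw_goodness_of_fit is an assumption, and the R-squared value is obtained through the estimator's standard score method:

```python
import numpy as np
from pymks.tools import draw_goodness_of_fit

# Pair actual and predicted values for the calibration and validation sets.
fit_data = np.array([y, model.predict(X)])
pred_data = np.array([y_new, y_predict])
draw_goodness_of_fit(fit_data, pred_data, ['Calibration', 'Validation'])

# R-squared on the held-out validation data.
print('R-squared:', model.score(X_new, y_new))
```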
Fig. 8 Goodness-of-fit plot for effective stiffness $C_{xx}$ for the homogenization model (Color figure online)

Prediction of Local Strain Field with Localization

Generation of Calibration Data

In this example, the MKSLocalizationModel is used to predict the local strain field for a three-phase microstructure with elastic moduli of 80, 100, and 120 MPa, Poisson's ratios all equal to 0.3, and an imposed macroscopic strain equal to 0.02. The model is calibrated using delta microstructures (analogous to using a unit impulse response to find the kernel of a system in signal processing) [71]. The material parameters specified above are used in a finite element simulation using the make_elasticFEstrain_delta function from pymks.datasets. The number of Poisson's ratio and elastic moduli values indicates the number of phases.

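A sketch of this call, using the material parameters specified above (the size and macro_strain argument names are assumptions consistent with the rest of the example):

```python
from pymks.datasets import make_elasticFEstrain_delta

# Three phases are implied by the three moduli / Poisson's ratio values.
elastic_modulus = (80, 100, 120)
poissons_ratio = (0.3, 0.3, 0.3)

X_delta, strains_delta = make_elasticFEstrain_delta(
    elastic_modulus=elastic_modulus,
    poissons_ratio=poissons_ratio,
    size=(21, 21),
    macro_strain=0.02)
```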

Delta microstructures are composed of only two phases, with the center of the microstructure being a different phase from the rest. All permutations of the delta microstructures and their associated strain fields $\varepsilon_{xx}$ are needed to calibrate the localization model. A delta microstructure and its strain field can be visualized using draw_microstructure_strain from pymks.tools. The output from draw_microstructure_strain can be found in Fig. 9.

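A minimal sketch of the visualization call:

```python
from pymks.tools import draw_microstructure_strain

# Show the first delta microstructure alongside its strain field.
draw_microstructure_strain(X_delta[0], strains_delta[0])
```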
Fig. 9 Delta microstructure (right) and its associated strain field (left). The delta microstructures and their local response fields are used to calibrate the localization model (Color figure online)

Calibration of the Localization Model

In order to make an instance of the MKSLocalizationModel, an instance of a basis class must first be created to specify the discretization method for the microstructure function (see Fig. 3). For this particular example, there are three discrete phases; therefore, the PrimitiveBasis from pymks.bases will be used. The phases are enumerated by 0, 1, and 2; therefore, we have three local states with a domain from 0 to 2. An instance of the PrimitiveBasis with these parameters can be used to create an instance of the MKSLocalizationModel as follows:

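A sketch of the model construction, using the three-state primitive basis described above:

```python
from pymks import MKSLocalizationModel
from pymks.bases import PrimitiveBasis

# Three local states (phases 0, 1, 2) on the domain [0, 2].
prim_basis = PrimitiveBasis(n_states=3, domain=[0, 2])
model = MKSLocalizationModel(basis=prim_basis)
```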

With the delta microstructures and their strain fields, the influence kernels can be calibrated using the fit method. A visualization of the influence kernels can be generated using the draw_coeff function from pymks.tools. The results from draw_coeff can be found in Fig. 10.

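A sketch of the calibration and kernel visualization; the attribute holding the calibrated kernels is an assumption:

```python
from pymks.tools import draw_coeff

# Calibrate the influence kernels from the delta responses (Eq. 4).
model.fit(X_delta, strains_delta)

# Visualize the calibrated kernels (attribute name assumed).
draw_coeff(model.coeff)
```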
Fig. 10 Calibrated influence kernels for the localization model (Color figure online)

Prediction of the Strain Field for a Random Microstructure

Model validation is done by comparing strain fields computed using a finite element simulation and our localization model for the same random microstructure. The make_elasticFEstrain_random function from pymks.datasets generates a random microstructure and the corresponding strain field from finite element analysis. The output from make_elasticFEstrain_random is visualized using draw_microstructure_strain and can be found in Fig. 11.

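A sketch of the validation-data generation and its visualization (argument names mirror the delta-microstructure call above and are assumptions where not stated in the text):

```python
from pymks.datasets import make_elasticFEstrain_random

X_rand, strain_rand = make_elasticFEstrain_random(
    n_samples=1,
    elastic_modulus=elastic_modulus,
    poissons_ratio=poissons_ratio,
    size=(21, 21),
    macro_strain=0.02)

draw_microstructure_strain(X_rand[0], strain_rand[0])
```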
Fig. 11 Random microstructure and its local strain field found using finite element analysis (Color figure online)

The localization model predicts the strain field by passing the random microstructure to the predict method. A visualization of the two strain fields from both the localization model and finite element analysis can be created using draw_strains_compare from pymks.tools. The output from draw_strains_compare can be found in Fig. 12.

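A minimal sketch of the prediction and comparison; the signature of draw_strains_compare is an assumption based on Fig. 12:

```python
from pymks.tools import draw_strains_compare

# Predict the local strain field for the random microstructure.
strain_pred = model.predict(X_rand)

# Compare the finite element result with the MKS prediction.
draw_strains_compare(strain_rand[0], strain_pred[0])
```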
Fig. 12 A comparison between the local strain field computed using finite element analysis (left) and the prediction from the localization model (right) (Color figure online)

These examples demonstrate the high-level code that creates accurate and computationally efficient homogenization structure-property linkages using MKSHomogenizationModel and localization linkages using MKSLocalizationModel with PyMKS.

Conclusion

The MKS framework offers a practical and computationally efficient approach for distilling and disseminating the core knowledge gained from physics-based simulations and experiments using emerging concepts in modern data science. PyMKS is an open-source project with a permissive license that provides simple, high-level APIs for the MKS framework by combining Scikit-learn pipelines with customized objects for data from hierarchical materials. PyMKS has been launched with the aim to nucleate and grow an emergent community focused on establishing data-driven homogenization and localization process-structure-property linkages for hierarchical materials.