Introduction

Nanomaterials generally refer to materials with dimensions ranging from 1 to 100 nm. They possess high specific surface area, magnetic characteristics, quantum effects, exceptional mechanical properties, elevated thermal and electrical conductivity, outstanding catalytic performance, and antimicrobial activity [1], which can be attributed to their size falling within the transitional region between atomic clusters and macroscopic substances. Nanomaterials find extensive applications across diverse fields including magnetism, electricity, optics, machinery, catalysis, and biomedicine, and they are widely integrated into daily life; graphene nanomaterials, for instance, are used in the electrode materials industry. The forefront of nanoscience research lies in the development of nanomaterials with tailored functionalities. A material's function is dictated by its structure; hence, the evolution of structure during synthesis or use plays a decisive role in determining its properties. Consequently, investigating the in situ structural evolution of nanomaterials represents a prominent research direction within this field [2]. In situ dynamic SAXS is an essential technique for characterizing nanomaterials, as it enables the elucidation of nucleation, growth, evolution, and structural transformation processes [3, 4].

SAXS is a characterization technique designed specifically for investigating the structure and morphology of nanomaterials. The scattering angles typically span up to about 5° around the incident beam, corresponding to nanomaterial sizes of 1 to 100 nm. When an X-ray beam passes through a sample, the electrons of the atoms oscillate around their equilibrium positions under the action of the incident field; SAXS is therefore particularly sensitive to inhomogeneities in electron density [5]. Samples for SAXS experiments can be solid, liquid, gaseous, crystalline, amorphous, mixtures thereof, or porous materials [6]. SAXS can provide information on nanoparticle shape, size, porosity, type, and size distribution. Commonly used software packages for processing SAXS data include FIT2D, RAW, SASfit, and SASview [7,8,9,10]. SAXS data are typically collected as two-dimensional images, which are generally converted into one-dimensional data and then preprocessed, for example by normalization and background correction, before analysis. In the absence of concentration effects, the scattering from a dilute particle system can be treated as coming from a single scatterer. The one-dimensional scattering curve is analyzed with software such as GNOM or GIFT [11, 12] to obtain the distance distribution function of the single scatterer; this function then serves as the input to Dammin software [13], from which shape and size information can be obtained. Software such as MCSAS [14] can be used to compute the size distribution of a polydisperse system. However, this software is labor-intensive to use, and the slow, subjective data processing is unsuitable for high-throughput SAXS data acquisition. In addition, obtaining different physical quantities often requires frequent switching between programs, which not only increases the time cost but also demands considerable theoretical knowledge and proficiency from data processing personnel.

Artificial intelligence methods have become increasingly popular in recent years and have been applied to many fields. They are especially effective at processing large-scale datasets in batches; vision algorithms, for instance, can process a single image in a matter of milliseconds [15]. "AI for Science" refers to the application of artificial intelligence methods in basic scientific research and has become a popular interdisciplinary field in recent years. For example, Zhongzheng Zhou et al. enhanced the signal-to-noise ratio (SNR) of SAXS/WAXS images by constructing an encoder–decoder convolutional neural network, which was then applied to synchrotron radiation experiments with short exposure times [16]. Minghui Sun et al. constructed FCNN, DenseNet, and PreActResNet34 algorithms to automatically predict 3D fiber orientation from synchrotron radiation X-ray microfocus diffraction data, so that characterizing the orientation of biological materials with synchrotron radiation no longer relies on iterative parameter fitting [17]. Janis Timoshenko et al. trained an artificial neural network (ANN) on a simulated dataset and, by analyzing in situ SAXS and X-ray absorption near-edge spectroscopy (XANES) experimental data, revealed the sub-nanometer substructure of nano-assemblies formed by clusters under a reactive atmosphere [18]. Ernest Y. Lee et al. developed an SVM-based classifier to study the functional commonality and sequence homology of α-helical antimicrobial peptides (AMPs); the SVM training results were calibrated against SAXS experiments, which showed that the SVM metric σ does not correlate with the minimal inhibitory concentration of a peptide but with its ability to generate negative Gaussian membrane curvature [19]. Shuai Liu et al. reported the use of convolutional neural networks to learn scattering patterns and classify the orientation of nanoparticles in thin films [20]. Most applications of machine learning in the field of small-angle scattering aim to assist researchers in selecting appropriate models for fitting SAXS or small-angle neutron scattering (SANS) data [21,22,23,24] and to accelerate the analysis of SAXS data [25,26,27,28,29,30]. These works primarily focus on enhancing experimental data and extracting structural information from nanomaterials, thereby significantly improving the efficiency of data processing.

Processing and parsing large amounts of SAXS data is a crucial step in the overall SAXS characterization workflow. This is especially important for in situ closed-loop SAXS experiments conducted at fourth-generation synchrotron radiation sources: during the experiment, data are collected and analyzed to obtain structural parameters, and the experimental conditions are then adjusted according to these parameters to launch a new experimental run [31, 32], which requires fast analysis of structural information. Currently, most data processing is performed offline after the experiment has been completed. This approach prevents prompt online processing of data and, consequently, timely feedback of results. Even when automatic feedback on the experimental process is unnecessary, fast and precise results make it possible to detect changes in the sample structure in real time during the experiment, helping researchers adjust the in situ environment promptly and thereby significantly improving experimental efficiency. Existing SAXS data processing software therefore needs improved processing functionality and an online processing mode. MCSAS, for example, requires the morphology of the nanoparticles as input before fitting their size and distribution. The morphology is usually obtained through scanning electron microscopy (SEM), which is not possible during an in situ SAXS experiment; at present, the shape of the nanoparticles can only be analyzed by offline SEM. However, SEM provides only very limited local information and is less statistically representative than SAXS. Furthermore, the available SAXS processing software cannot handle large quantities of image data within a short time. For example, at the High Energy Photon Source (HEPS) now under construction, sample exposure times are on the millisecond scale, and the data generated each year can reach the exabyte (EB) scale [33]. SAXS data processing software that relies on the experience of researchers cannot cope with high-throughput data processing tasks of this magnitude. Therefore, we combine deep learning algorithms with machine learning regression algorithms to meet the challenge of high-throughput in situ SAXS data processing.

One of the main challenges in applying machine learning or deep learning to synchrotron radiation is the lack of a unified, standard open-source dataset; researchers typically create their own datasets to address this issue. In this work, our goal is to use machine learning algorithms to identify one-dimensional SAXS data of nanoparticles, quickly provide the shape and size evolution of nanoparticles during the in situ growth process, and ultimately enable automatic closed-loop feedback in subsequent studies. To begin, a Python program is used to create a batch dataset for training and validation. The dataset contains simulated one-dimensional SAXS curves, and the labels hold the shape and size information corresponding to each curve. Different algorithms were trained for shape classification and size prediction. The shape classification algorithms are SAXSNET, ResNet-18, ResNet-34, and Vision Transformer; the size prediction algorithms are Random Forest and XGBoost regression. SAXSNET is a convolutional neural network that we developed using the PyTorch framework. In this study, the training results of SAXSNET were compared with those of three open-source algorithms, namely ResNet-18, ResNet-34, and Vision Transformer. Additionally, we collected a batch of SAXS experimental data to evaluate the accuracy of the deep learning models in predicting the shape and size of the scatterers.

Methods

Data generation

After decades of development, synchrotron radiation SAXS has generated a significant amount of experimental data. However, because these data have not been systematically organized, few open-source SAXS databases are available for training neural networks on nanomaterials. To address this issue, we developed Python programs that generate large batches of SAXS data based on the theoretical formulation of SAXS: according to SAXS theory, the electron density distribution of a nanomaterial can be Fourier transformed to derive the ideal SAXS scattering intensity curve, and in this way a large number of simulated curves for neural network training were generated. There are differences between the simulated data used for training and experimentally measured data, the main one being that the simulated data contain no experimental error; nevertheless, a neural network trained on simulated data can be generalized to make predictions on real experimental data. The generalization ability of the model can be further enhanced by fine-tuning the final fully connected layer of the neural network, which is described in detail later.

Existing software, however, lacks the ability to produce a large volume of simulated data in batch mode. Therefore, we wrote a Python program to generate simulated data in batches according to the theoretical SAXS formulas. The simulated data cover cylindrical, parallelepiped, lamellar, and spherical shapes. Because most real experimental data come from polydisperse systems, the geometric parameters of each shape were given polydisperse distributions, and different aspect ratios were assigned to the cylindrical and parallelepiped particles. Gaussian and uniform distributions are used to describe the polydispersity.

The SAXS simulated data for generating cylindrical nanoparticles are formulated as follows:

$$P(q) = \frac{{{\text{scale}}}}{V}\int_{0}^{\pi /2} {F^{2} \left( {q,\alpha } \right)\sin \alpha {\text{d}}\alpha } + {\text{background}}$$
$$F(q,\alpha ) = 2(\Delta \rho )V\frac{{\sin (\frac{1}{2}qL\cos \alpha )}}{{\frac{1}{2}qL\cos \alpha }}\frac{{J_{1} (qR\sin \alpha )}}{qR\sin \alpha }$$
$$q = \sqrt {q_{x}^{2} + q_{y}^{2} }$$

P(q) is the one-dimensional scattering intensity function of a randomly oriented cylinder, and F(q, α) is the scattering amplitude of a cylinder at inclination angle α, where α is the angle between the cylinder axis and q. The cylinder's volume is V = πR²L, where L is its length and R is its radius. Δρ is the scattering length density difference between the scatterer and the solvent, and J1 denotes the first-order Bessel function [34]. In the simulated data, the radius of the cylindrical nanoparticles ranges from 1 to 150 nm, and the aspect ratio ranges from 2 to 20.
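
As an illustration of how such a curve can be simulated, the following Python sketch numerically evaluates the orientation-averaged cylinder form factor defined above. The function name, integration grid, and default parameters are our own choices for illustration, not the exact script used in this work.

```python
import numpy as np
from scipy.special import j1

def cylinder_Pq(q, R, L, drho=1.0, scale=1.0, background=0.0, n_alpha=200):
    """Orientation-averaged cylinder form factor: integrates F^2(q, alpha) * sin(alpha)
    over alpha in (0, pi/2], following the formula above."""
    alpha = np.linspace(1e-4, np.pi / 2, n_alpha)    # avoid the alpha = 0 endpoint
    V = np.pi * R ** 2 * L                           # cylinder volume
    qq, aa = np.meshgrid(q, alpha, indexing="ij")
    u = 0.5 * qq * L * np.cos(aa)                    # longitudinal argument
    v = qq * R * np.sin(aa)                          # radial argument
    sinc_u = np.sinc(u / np.pi)                      # sin(u)/u, with the u -> 0 limit handled
    j1_v = np.where(v > 1e-8, j1(v) / np.where(v > 1e-8, v, 1.0), 0.5)  # J1(v)/v -> 1/2 as v -> 0
    F = 2.0 * drho * V * sinc_u * j1_v
    Pq = np.trapz(F ** 2 * np.sin(aa), alpha, axis=1)    # numerical integration over alpha
    return scale / V * Pq + background

q = np.linspace(0.01, 4.0, 500)          # q grid in nm^-1, as used for the training curves
I_cyl = cylinder_Pq(q, R=10.0, L=60.0)   # e.g. radius 10 nm, length 60 nm
```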

The SAXS simulated data for the generation of parallelepiped nanoparticles are formulated as follows:

$$I(q_{x} ,q_{y} ) = \frac{{{\text{scale}}}}{V}\left( {\Delta \rho \cdot V} \right)^{2} P(q_{x} ,q_{y} ) + {\text{background}}$$
$$P(q_{x} ,q_{y} ) = \left[ {\frac{{\sin \left( {\frac{1}{2}qA\cos \alpha } \right)}}{{\left( {\frac{1}{2}qA\cos \alpha } \right)}}} \right]^{2} \cdot \left[ {\frac{{\sin \left( {\frac{1}{2}qB\cos \beta } \right)}}{{\left( {\frac{1}{2}qB\cos \beta } \right)}}} \right]^{2} \cdot \left[ {\frac{{\sin \left( {\frac{1}{2}qC\cos \gamma } \right)}}{{\left( {\frac{1}{2}qC\cos \gamma } \right)}}} \right]^{2}$$
$$\begin{aligned} & \cos \alpha = \hat{A} \cdot \hat{q}, \\ & \cos \beta = \hat{B} \cdot \hat{q}, \\ & \cos \gamma = \hat{C} \cdot \hat{q} \\ \end{aligned}$$

Here the volume is V = ABC, and the contrast is defined as Δρ = ρp − ρsolvent [35, 36]. In the simulated data, the side lengths of the parallelepiped nanoparticles range from 1 to 500 nm.

The SAXS simulated data for generating lamellar nanoparticles are formulated as follows:

$$I(q) = {\text{scale}} \cdot \frac{2\pi P(q)}{{q^{2} \delta }} + {\text{background}}$$
$$P(q) = \frac{{2\Delta \rho^{2} }}{{q^{2} }}\left( {1 - \cos (q\delta )} \right) = \frac{{4\Delta \rho^{2} }}{{q^{2} }}\sin^{2} \left( {\frac{q\delta }{2}} \right)$$

δ is the total layer thickness and Δρ is the scattering length density difference [37, 38]. In the simulated data, the thickness range of the lamellar nanoparticles is 1–200 nm.

The SAXS simulated data for generating spherical nanoparticles are formulated as follows:

$$I(q) = \frac{{{\text{scale}}}}{V} \cdot \left[ {3V(\Delta \rho ) \cdot \frac{\sin (qr) - qr\cos (qr)}{{(qr)^{3} }}} \right]^{2} + {\text{background}}$$

Here "scale" is the volume fraction, V is the volume of the scatterer, r is the radius of the sphere, and "background" is the background level. SLD and sld_solvent are the scattering length densities (SLDs) of the scatterer and the solvent, respectively, whose difference is Δρ [39]. In the simulated data, the radius of the spherical nanoparticles ranges from 1 to 500 nm.
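
As a minimal illustration, the spherical form factor above can be evaluated directly in a few lines of Python; the names and defaults are ours, and the qr → 0 limit of the bracketed term is handled explicitly.

```python
import numpy as np

def sphere_Iq(q, r, drho=1.0, scale=1.0, background=0.0):
    """Monodisperse sphere: I(q) = scale/V * [3 V drho (sin(qr) - qr cos(qr)) / (qr)^3]^2 + background."""
    V = 4.0 / 3.0 * np.pi * r ** 3
    qr = q * r
    # (sin x - x cos x) / x^3 tends to 1/3 as x -> 0
    f = np.where(qr > 1e-6,
                 (np.sin(qr) - qr * np.cos(qr)) / np.maximum(qr, 1e-12) ** 3,
                 1.0 / 3.0)
    return scale / V * (3.0 * V * drho * f) ** 2 + background
```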

To obtain a more representative sample set, the distribution coefficient and background noise are randomized in the range 0.01 to 0.1, and the scale parameter is randomized between 0.01 and 1. More than 10,000 simulated curves were generated for each particle shape. The input to the convolutional neural network is a 2 × 500 matrix, with 500 sampling points each along the x and y axes: the scattering intensity on the y-axis and the scattering vector q (nm−1) on the x-axis. The q range is 0.01 to 4 nm−1 according to q = 4πsin(θ)/λ; for a fixed detector area, the accessible q range depends on the X-ray wavelength and the sample-to-detector distance.
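
The sketch below shows one way the randomized parameters described above could be combined into a 2 × 500 training example. The sampling scheme and array layout are illustrative assumptions, and `sphere_Iq` refers to the form-factor sketch given earlier.

```python
import numpy as np

rng = np.random.default_rng(0)

def polydisperse_curve(form_factor, q, mean_size, rel_sigma, n_draws=50, **kwargs):
    """Average a monodisperse form factor over a Gaussian size distribution."""
    sizes = rng.normal(mean_size, rel_sigma * mean_size, n_draws)
    sizes = sizes[sizes > 0]                         # discard unphysical draws
    return np.mean([form_factor(q, s, **kwargs) for s in sizes], axis=0)

def make_sample(form_factor, mean_size):
    """Build one 2 x 500 training example: row 0 = q, row 1 = I(q)."""
    q = np.linspace(0.01, 4.0, 500)                  # q grid in nm^-1
    rel_sigma = rng.uniform(0.01, 0.1)               # distribution coefficient
    background = rng.uniform(0.01, 0.1)              # flat background level
    scale = rng.uniform(0.01, 1.0)                   # volume-fraction scale
    I = polydisperse_curve(form_factor, q, mean_size, rel_sigma,
                           scale=scale, background=background)
    return np.vstack([q, I]).astype(np.float32)      # shape (2, 500)

# Example usage with the sphere_Iq sketch shown earlier:
# x = make_sample(sphere_Iq, mean_size=50.0)         # 50 nm mean radius
```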

Deep learning algorithms

The SAXSNET, ResNet-18, ResNet-34, and ViT classification algorithms used for nanoparticle morphology classification are introduced below. Among them, SAXSNET was developed by our team; it showed shape classification ability comparable to that of established open-source algorithms such as ResNet-18, ResNet-34, and ViT, but with much faster training and inference. Random Forest and XGBoost regression algorithms were used for nanoparticle size prediction.

SAXSNET

SAXSNET is a convolutional neural network that we developed using the PyTorch framework. A convolutional neural network is a supervised deep learning algorithm that maps inputs to outputs using labeled training data. In contrast to other neural networks, convolutional neural networks extract and combine features via convolutional layers, pooling layers, and fully connected layers [40]. SAXSNET combines a two-dimensional convolution layer with one-dimensional convolution layers. The first step extracts features from the 2 × 500 data matrix using the two-dimensional convolution layer, which yields a set of one-dimensional feature data. Each convolutional layer is followed by BatchNorm, LeakyReLU, and MaxPool layers. BatchNorm standardizes each batch of data by bringing the mean and variance of each feature close to 0 and 1, respectively; this stabilizes the input to the network, accelerates training, and can also act as a form of regularization. The LeakyReLU function adds nonlinearity to the network, allowing it to learn and express complex nonlinear relationships. Unlike ReLU, LeakyReLU avoids the output being exactly zero for negative inputs, which prevents some neurons from never being activated and helps accelerate convergence of the model [41]. MaxPool is a popular pooling operation used to reduce the spatial dimension of feature maps, which decreases the number of parameters and computations, speeds up model training, and makes the model more robust. The features extracted by the convolutional layers are then passed to a fully connected head for classification. The fully connected head has three layers and its structure is similar to that of an artificial neural network. To prevent overfitting, a Dropout layer was added to the fully connected head; it randomly turns off some of its neurons during each iteration. Recent studies by Zhuang Liu and colleagues have shown that Dropout can not only prevent overfitting but also alleviate underfitting [42].
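
The sketch below illustrates this type of architecture in PyTorch; the layer counts, channel widths, and kernel sizes are our assumptions for illustration and are not necessarily the exact SAXSNET configuration.

```python
import torch
import torch.nn as nn

class SAXSNetSketch(nn.Module):
    """Illustrative SAXSNET-like network: a 2D convolution collapses the 2 x 500
    input to a 1D feature map, 1D convolution blocks extract features, and a
    three-layer fully connected head with Dropout performs 4-class shape classification."""
    def __init__(self, n_classes=4):
        super().__init__()
        # Input: (batch, 1, 2, 500); the kernel spans both rows (q and I).
        self.conv2d = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(2, 5), padding=(0, 2)),
            nn.BatchNorm2d(16), nn.LeakyReLU(), nn.MaxPool2d((1, 2)))
        # After squeezing the height dimension: (batch, 16, 250).
        self.conv1d = nn.Sequential(
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.BatchNorm1d(32), nn.LeakyReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64), nn.LeakyReLU(), nn.MaxPool1d(2))
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 62, 256), nn.LeakyReLU(), nn.Dropout(0.5),
            nn.Linear(256, 64), nn.LeakyReLU(), nn.Dropout(0.5),
            nn.Linear(64, n_classes))

    def forward(self, x):
        x = self.conv2d(x)   # (batch, 16, 1, 250)
        x = x.squeeze(2)     # (batch, 16, 250)
        x = self.conv1d(x)   # (batch, 64, 62)
        return self.head(x)  # (batch, n_classes)
```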

Residual neural network

In this study, ResNet-18 and ResNet-34 were trained and validated on the same dataset for comparison. ResNet, short for Residual Neural Network, is a deep convolutional neural network introduced by Kaiming He et al. of Microsoft Research in 2015 [43]. It improves the convolutional neural network to address the difficulty of training deep networks: when a network has many layers, transmitting the gradient layer by layer during backpropagation causes it to decay gradually, making deep networks hard to train. The core idea of ResNet is the residual connection, which lets information skip directly across layers from one layer to a later one, thereby mitigating the vanishing and exploding gradient problems in deep neural networks. ResNet trains the network by minimizing the loss of the residual function; this lets the network learn the change in the residual and better adapt to complex data distributions. An important variant is ResNet-34, a 34-layer network consisting of multiple residual blocks, a global average pooling layer, and a final fully connected layer for classification. ResNet-34 has achieved good performance on the ImageNet dataset and has become one of the classical models in deep learning.
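
For reference, a basic residual block of the kind used in ResNet-18/34 can be written as follows; this is a generic textbook block, not code from this work.

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Basic residual block: the input is added back to the output of two
    convolutions (the skip connection), so the block only learns the residual."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection: add the input back
```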

Vision transformer

Vision Transformer (ViT) is a deep learning model based on the Transformer architecture for computer vision tasks [44]. In the past, convolutional neural networks (CNNs) were the dominant choice for computer vision, but ViT introduces a new approach by applying the Transformer framework to image processing. The ViT feature-extraction network comprises two primary components: patch embedding and the Transformer encoder. Patch embedding breaks the input image into fixed-size patches and converts each patch into a vector representation; the resulting vectors are fed into the Transformer encoder as input. The Transformer encoder is composed of several identical encoder layers, each containing multi-head self-attention and a feed-forward neural network, and its primary role is to encode and combine the input sequence so as to capture the global dependencies within it. During training, the loss function and backpropagation optimize the entire model, including the feature-extraction network and the Transformer encoder parameters, so that the model learns to represent and encode features more effectively, leading to improved performance.
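
The following sketch shows the essential ViT ingredients (patch embedding, a learnable class token, and a Transformer encoder) adapted to a 2 × 500 scattering curve; all dimensions are illustrative assumptions rather than the configuration used in this study.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Minimal ViT-style classifier: the input curve is split into fixed-size
    patches, each patch is linearly embedded, a learnable class token is
    prepended, and a Transformer encoder with multi-head self-attention
    aggregates global context for classification."""
    def __init__(self, seq_len=500, patch=20, dim=64, n_classes=4):
        super().__init__()
        self.patch = patch
        n_patches = seq_len // patch                       # 25 patches of length 20
        self.embed = nn.Linear(2 * patch, dim)             # patch of (q, I) values -> token
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))    # learnable class token
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):                                  # x: (batch, 2, 500)
        b = x.shape[0]
        patches = x.unfold(2, self.patch, self.patch)      # (batch, 2, 25, 20)
        patches = patches.permute(0, 2, 1, 3).reshape(b, -1, 2 * self.patch)
        tokens = self.embed(patches) + self.pos[:, 1:]
        cls = self.cls.expand(b, -1, -1) + self.pos[:, :1]
        out = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(out[:, 0])                        # classify from the class token
```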

Random forest regressor

Random Forest is a bagging method, one branch of ensemble learning; ensemble learning can be roughly divided into bagging, boosting, and stacking [45]. Ensemble learning trains several weak models and combines them into a strong model whose performance is generally superior to that of any single weak model; decision trees, SVMs, and other models can serve as the weak models. The Random Forest algorithm performs regression by integrating multiple CART decision trees. During training, sample points are repeatedly drawn at random from the training set S to obtain several new sub-training sets, and each CART decision tree is trained on one sub-training set. At each node, k features are randomly selected from all features, and the optimal cut point among these k features is chosen to split the left and right subtrees. The randomness of Random Forest is thus reflected in two aspects: (i) sample randomness, where a certain number of samples are randomly drawn from the training set as the root-node samples of each regression tree; and (ii) feature randomness, where a certain number of candidate features are randomly selected when building each regression tree and the most appropriate feature is chosen as the split node. As a result, the Random Forest regression algorithm has good robustness and generalization ability and performs well for regression problems with large-scale data and high-dimensional features.
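
A minimal usage sketch with scikit-learn, on placeholder arrays standing in for the PCA-reduced curves and size labels, might look like this:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Placeholder data: X would hold PCA-reduced 1D curves, y the particle sizes (nm).
X = np.random.rand(1000, 50)            # hypothetical 50 PCA components per curve
y = np.random.uniform(1, 500, 1000)

rf = RandomForestRegressor(
    n_estimators=200,                   # number of CART trees in the forest
    max_features="sqrt",                # k features drawn at random at each split
    random_state=42)
scores = cross_val_score(rf, X, y, cv=10, scoring="r2")   # 10-fold cross-validation
print("mean R2:", scores.mean())
```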

XGboost regressor

XGBoost is a machine learning algorithm trained with gradient boosting and a forward stagewise additive strategy; it was formally proposed by Tianqi Chen in 2016 [46]. The XGBoost objective function comprises a loss function and a regularization term, which efficiently controls model complexity and reduces the risk of overfitting. XGBoost is an ensemble learning technique that creates a strong learner by combining multiple weak learners. During training, gradient boosting is used to train decision trees iteratively, with each new tree fitted to the residuals of the previous round. XGBoost can also help users select features and interpret models by evaluating feature importance through indicators such as how often a feature is used or the gain at its split points, and it can automatically handle missing feature values without extra preprocessing. Owing to these advantages, XGBoost performs well on large-scale datasets and is widely used in data mining, machine learning, and data science, especially for classification and regression problems.
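
A corresponding XGBoost sketch on placeholder data, highlighting the regularization term and feature-importance inspection mentioned above:

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import cross_val_score

X = np.random.rand(1000, 50)            # hypothetical PCA-reduced curves
y = np.random.uniform(1, 500, 1000)     # hypothetical particle sizes (nm)

xgb = XGBRegressor(
    n_estimators=300,                   # boosting rounds: each tree fits the previous residuals
    learning_rate=0.1,
    max_depth=6,
    reg_lambda=1.0,                     # L2 regularization term in the objective
    random_state=42)
scores = cross_val_score(xgb, X, y, cv=10, scoring="neg_mean_absolute_error")
print("mean MAE:", -scores.mean())

xgb.fit(X, y)
print(xgb.feature_importances_[:5])     # feature importances for interpretation
```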

Training and fine-tuning

The input data have a dimension of 2 × 500, with 500 sampling points each along the x- and y-axes. To prepare the data for network training, the one-dimensional curve is first converted into an array, and the array is then transformed into a PyTorch tensor. The dataset is split into two parts, 80% for training and 20% for validation, using PyTorch's random_split module. The deep learning classification algorithms are trained from scratch, without pre-trained models and without freezing the backbone feature-extraction parameters. During training, the batch size is set to 10. The learning rate is gradually decreased with the cosine annealing method over the range 0.1 to 0.0005. Stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 0.0005 is used as the optimizer, and training runs for a total of 200 epochs. The regression algorithms, Random Forest and XGBoost, are trained with 10-fold cross-validation using a random state of 42. k-fold cross-validation lets each data point serve alternately in the training and validation sets, ensuring that all data participate in the model iteration and yielding a more comprehensive training of the model.
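
A hedged sketch of this classification training setup (batch size 10, SGD with momentum 0.9 and weight decay 0.0005, cosine-annealed learning rate, 200 epochs) is shown below; the placeholder tensors and the reuse of the earlier SAXSNetSketch class are illustrative assumptions.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset, random_split

# Placeholder tensors standing in for the simulated curves and shape labels.
X = torch.randn(1000, 1, 2, 500)
y = torch.randint(0, 4, (1000,))
dataset = TensorDataset(X, y)

n_train = int(0.8 * len(dataset))                         # 80/20 split via random_split
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=10, shuffle=True)

model = SAXSNetSketch()                                   # from the earlier sketch
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0005)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200, eta_min=0.0005)

for epoch in range(200):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()                                      # cosine-annealed lr, 0.1 -> 0.0005
```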

Fine-tuning a deep learning classification algorithm involves further optimizing a pre-trained model. We first load the weight parameters of the four deep learning algorithms trained on simulated SAXS data. We then freeze most of the parameters of the pre-trained backbone feature-extraction network, since it has already learned a vast amount of SAXS features from the dataset and these parameters should be retained. Finally, a small amount of experimental data is used to train the fully connected layers at the end of each network, whose parameters are updated to adapt to the classification of SAXS experimental data. During fine-tuning, the learning rate is usually set lower than when training from scratch; here we set it between 0.00001 and 0.01. With the backbone frozen, the networks were trained for 50 epochs, and the other hyperparameters remained the same as when training from scratch.
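
The fine-tuning step could look like the following sketch, where the checkpoint file name and the `head` attribute follow the earlier illustrative model rather than the actual code of this work.

```python
import torch
from torch import optim

# Freeze the feature-extraction layers of a model pre-trained on simulated data
# and retrain only the fully connected head on a small set of experimental curves.
model = SAXSNetSketch()                                        # from the earlier sketch
model.load_state_dict(torch.load("saxsnet_simulated.pt"))      # hypothetical checkpoint

for name, param in model.named_parameters():
    param.requires_grad = name.startswith("head")              # only the FC head stays trainable

optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()),
                      lr=0.01, momentum=0.9, weight_decay=0.0005)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=0.00001)
# ...then train for 50 epochs on the experimental dataset as in the previous sketch...
```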

Evaluation method

Methods for evaluating classification algorithms

The confusion matrix is a commonly used tool for evaluating classification algorithms. It visualizes the correspondence between predicted and true classes, making it possible to see how well the model performs on each category. The horizontal axis of the confusion matrix represents the predictions of the neural network, and the vertical axis represents the labels of the in situ data used to evaluate the algorithm. When a prediction matches the actual label it is counted as correct and is shown in the blue squares; the more the confusion matrix is concentrated along the dark diagonal, the better the training effect of the neural network. The confusion matrix can also be used to optimize models, since common evaluation metrics such as accuracy, precision, recall, and F1-score can be calculated from it as follows:

Accuracy is the proportion of samples correctly predicted by the model versus the total number of samples:

$${\text{accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}}$$

Precision is the ratio of the number of samples predicted by the model to be positive and actually being positive to the number of samples predicted by the model to be positive:

$${\text{precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}}$$

Recall is the ratio of the number of samples that the model predicted to be positive and were actually positive to the number of samples that were actually positive:

$${\text{recall}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$

The F1 score is the harmonic mean of precision and recall and is used to balance the two in a single metric:

$$F1{\text{ - score}} = 2 \times \frac{{{\text{precision}} \times {\text{recall}}}}{{{\text{precision}} + {\text{recall}}}}$$

TP (True Positive): the classifier predicts a positive sample and the sample is actually positive, i.e., the number of positive samples correctly identified. FP (False Positive): the classifier predicts a positive sample but the sample is actually negative, i.e., the number of negative samples wrongly reported as positive. TN (True Negative): the classifier predicts a negative sample and the sample is actually negative, i.e., the number of negative samples correctly identified. FN (False Negative): the classifier predicts a negative sample but the sample is actually positive, i.e., the number of positive samples missed.
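
In practice these metrics need not be computed by hand; a sketch using scikit-learn on hypothetical label vectors:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Hypothetical labels for the four shape classes
# (0 = cylinder, 1 = parallelepiped, 2 = lamellar, 3 = sphere).
y_true = [0, 0, 1, 2, 3, 3, 1, 2]
y_pred = [0, 1, 1, 2, 3, 3, 0, 2]

print(confusion_matrix(y_true, y_pred))                  # 4 x 4 confusion matrix
print("accuracy:", accuracy_score(y_true, y_pred))
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")                     # macro-averaged over the classes
print(precision, recall, f1)
```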

Methods for evaluating regression algorithms

A regression algorithm is typically evaluated based on the magnitude of its predictions' deviation from the actual outcomes. As a result, the most commonly used evaluation metrics in regression algorithms include mean squared error, mean absolute error, and coefficient of determination.

The mean squared error (MSE) measures the average squared difference between the predicted and true values and can be written as follows:

$${\text{MSE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {\hat{y}_{i} - y_{i} } \right)^{2} }$$

Mean absolute error (MAE), also called the average absolute deviation, is the mean of the absolute differences between the predicted and true values. The formula is as follows:

$${\text{MAE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {y_{i} - \hat{y}_{i} } \right|}$$

The coefficient of determination (R2) measures the goodness of fit of a regression model; it represents the fraction of the variance in the dependent variable that is explained by the independent variables. An R2 score of 1 indicates that the predicted values are identical to the true values, i.e., the regression explains the dependent variable perfectly. An R2 score of 0 means the numerator equals the denominator, which occurs when every prediction equals the mean of the true values. The formula is as follows:

$$R^{2} = 1 - \frac{{\sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } }}{{\sum\limits_{i = 1}^{n} {\left( {y_{i} - \bar{y}} \right)^{2} } }}$$
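
These three regression metrics can likewise be computed with scikit-learn; the numbers below are hypothetical illustrations, not results from this study.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([80.0, 110.0, 215.0, 380.0])   # e.g. nominal sizes in nm
y_pred = np.array([82.0, 108.0, 220.0, 371.0])   # hypothetical predictions

print("MSE:", mean_squared_error(y_true, y_pred))
print("MAE:", mean_absolute_error(y_true, y_pred))
print("R2 :", r2_score(y_true, y_pred))
```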

Experimental methods

SAXS data for nanoparticles with estimated sizes below 100 nm were collected at the 1W2A SAXS station of the Beijing Synchrotron Radiation Facility (BSRF), where the incident X-ray wavelength is 1.54 Å; a Pilatus 1M 2D area detector was used to record the SAXS intensity. SAXS data for nanoparticles with estimated sizes above 100 nm were collected at the BL10U1 time-resolved ultra-small-angle X-ray scattering (USAXS) station of the Shanghai Synchrotron Radiation Facility (SSRF), where the incident X-ray wavelength is 1.24 Å and the detector is an Eiger 4M.

Fifteen experimental samples were used to evaluate the machine learning models. Samples No. 1 to 7 are self-assembled strings of CoOOH nanosheets prepared by directional polymerization and CoOOH nanosheets prepared hydrothermally for different durations (Fig. 6a shows an SEM image of the sample named Co-lamellar-6h) [47]. Samples No. 11 to 14 are fluorescent polystyrene microspheres synthesized in our lab, with nominal sizes of 80 nm, 110 nm, 215 nm, and 380 nm (Fig. 6d shows an SEM image of the sample named FPM-sphere-380 nm). The remaining four samples were purchased from XFNANO: samples No. 8 to 10 are gold nanorods with aspect ratios of 2:1, 4:1, and 6:1 (Fig. 6c shows an SEM image of the gold nanorods with an aspect ratio of 6:1), and sample No. 15 is a Prussian blue parallelepiped nanoparticle sample (SEM results shown in Fig. 6b). Before the SAXS and USAXS experiments, the solution samples must be dispersed by ultrasound, taking care that ultrasonic dispersion does not alter the structure of the samples. During the experiment, the solution sample is placed in an appropriate sample holder, and the background scattering must also be collected for data processing. FIT2D and RAW were used to process the SAXS and USAXS experimental data and obtain the 1D scattering curves.

Result and discussion

There are some differences between theoretically simulated scattering data and experimental data. Therefore, the evaluation of the classification and regression algorithms used in this study is divided into two stages. In the first stage, theoretical simulated data are used to train the models, and a separate portion of simulated data is generated to test them. In the second stage, the models trained on simulated data are tested on experimental data.

Classification of nanoparticle shapes

Four classification algorithms are employed: SAXSNET, ResNet-18, ResNet-34, and ViT, with 14.4 M, 34.8 M, 464.7 M, and 44.0 M parameters, respectively. Figure 1 shows the training accuracy of the four algorithms over 200 epochs. The training accuracy of SAXSNET was 98.7%, with a validation accuracy of 96.2%; for ResNet-18, the training accuracy was 98.9% and the validation accuracy 97.6%; ResNet-34 had a training accuracy of 99.4% and a validation accuracy of 98.6%; and ViT achieved 97.8% training accuracy and 94.6% validation accuracy. The ResNet-18 and ResNet-34 algorithms also converge faster during training, reaching very high accuracy after 50 epochs, and both their training and validation accuracies stabilize above 90.0% after 75 epochs. By generating simulated theoretical training and validation sets for the four machine learning algorithms, relatively stable and repeatable training results were obtained. The training times of SAXSNET, ResNet-18, ResNet-34, and ViT were 16 h, 25 h, 56 h, and 151 h, respectively. Although ViT has fewer parameters than ResNet-34, it required the longest training time, because ViT contains a large number of attention heads and fully connected layers, which lead to many matrix multiplications and attention computations during training.

Fig. 1
figure 1

Training accuracy versus validation accuracy for four deep learning models, graphs are automatically generated during training

After training a machine learning model, it is usually necessary to verify the training effect on a test set of simulated data (Fig. 2). The four neural network models perform very well on the validation dataset randomly generated with Python code, and most of the data are correctly classified. The confusion matrix evaluates the training effect of a classification model more comprehensively: the accuracy in Fig. 1 reflects the overall performance of the model but may hide shortcomings in specific categories. For example, in Fig. 2 all four algorithms occasionally predict a cylinder when the true particle shape is a parallelepiped, and a parallelepiped when the true shape is a cylinder. This is because, in real experiments, a cylindrical particle with a small aspect ratio produces a SAXS signal relatively similar to that of a parallelepiped. In scanning electron microscopy (SEM) we likewise found that the two kinds of particles are not easy to distinguish; usually one can only judge subjectively which shape the sample is closer to, which depends on the experience of the researcher, and different researchers may reach different conclusions. Machine learning classification is therefore useful for excluding the interference of such subjective factors.

Fig. 2
figure 2

All four confusion matrices show that the machine learning algorithms confuse cylinders and parallelepipeds when classifying nanoparticles; this is determined by the physical characteristics of the nanoparticles, since the scattering signal of short cylinders is close to that of parallelepipeds

Metrics for evaluating machine learning models, such as accuracy, precision, recall, and F1-score, can be calculated from the confusion matrix (Table 1). Accuracy is the most direct indicator of classification performance and reflects the overall behavior of the model on the validation set. The higher the precision, the more reliable the model's positive-class predictions, i.e., the fewer negative samples it incorrectly predicts as positive. Recall measures the coverage of the positive class: the higher the recall, the better the model captures the actual positive samples, i.e., the fewer actual positives it mispredicts as negative. The F1-score is the harmonic mean of precision and recall and reflects the balance between them; it is high only when both precision and recall are high. Table 1 shows that all four metrics reach very high scores, so after training on the simulated SAXS dataset, the SAXSNET, ResNet-18, ResNet-34, and ViT algorithms all perform very well overall and can handle the task of classifying SAXS data from nanoparticles of different shapes.

Table 1 Accuracy, precision, recall, and F1-score of the four deep learning models; all achieved very high scores on the four measures

After training and validating the four neural networks, we evaluated them with experimental data covering the following in situ processes: (i) cylinders gradually growing into lamellae; (ii) cylinder-like particles gradually elongating along the radial direction; and (iii) spherical particles continuously growing. The machine learning models trained purely on simulated data achieved less than 30% accuracy on the experimental samples because of the large differences between simulated and experimental data, so the fully connected layers of the neural networks were fine-tuned with a small amount of experimental data. After fine-tuning, the accuracy of all four algorithms improved greatly and the shapes of the experimental samples could be recognized accurately. Within the series of samples from Co-lamellar-1 min to Co-lamellar-9 h, the sample Co-lamellar-2 min contains nanoparticles of both cylindrical and lamellar shape, because during the transformation from cylinder to lamellar the lamellae start to appear at 2 min and the sample is in a mixed state at that time. The ViT algorithm identified the mixed cylinder/lamellar samples as lamellar, while the other three algorithms identified them as cylinders. Inspection of the SEM image in Fig. 3b shows that the lamellae have only just appeared in this mixed state and the proportion of cylinders is larger. This reveals a limitation of our deep learning algorithms and dataset, which cover only single-shape classification and do not consider mixed-shape samples; the results for mixed-shape polydisperse systems are therefore not ideal and need to be improved in the future (Table 2).

Fig. 3
figure 3

a–d SEM images of samples after hydrothermal reaction for 1 min, 2 min, 5 min, and 9 h, respectively. At 2 min the cylindrical nanoparticles begin to degenerate into an approximately needle-like shape and lamellar nanoparticles begin to appear; the scale bar in the lower right corner of each SEM image is 100 nm

Table 2 The prediction results of the four deep learning models on the experimental samples

Prediction of physical parameters of nanoparticles

In the pipeline of deep learning algorithms applied to SAXS data processing, after the classification algorithm completes the shape classification, a regression algorithm is called to predict the nanoparticle size from the SAXS data. The regression step uses the Random Forest and XGBoost algorithms, and the data fed to them for training are the simulated one-dimensional curves preprocessed with principal component analysis (PCA). PCA dimensionality reduction retains the most important features of the high-dimensional data, removes noise, saves computing cost and time, and improves training efficiency. The labels used for supervised training are the nanoparticle sizes, and simulated data of the four particle shapes were used for training. In the Python script we developed, the corresponding regression model is called to predict the nanoparticle size after the shape has been predicted. The performance of the regression algorithms is evaluated with MSE, MAE, and R2, defined in the previous section; in general, smaller MSE and MAE values are better, and an R2 value closer to 1 is better. Comparing Figs. 4 and 5 shows that the prediction error for the cylinder and lamellar shapes is larger than that for the sphere and parallelepiped, and that the XGBoost regressor gives slightly better predictions than the Random Forest regressor.
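
A sketch of this shape-specific PCA-plus-regression step is given below; the number of PCA components, array shapes, and the use of a scikit-learn pipeline are illustrative assumptions rather than the exact implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from xgboost import XGBRegressor

# Placeholder training data for one shape class: 1D curves and their sizes.
curves = np.random.rand(5000, 500)           # simulated I(q) curves
sizes = np.random.uniform(1, 500, 5000)      # corresponding particle sizes (nm)

# PCA reduces each 500-point curve before the regressor is fitted.
size_model = make_pipeline(PCA(n_components=50), XGBRegressor(random_state=42))
size_model.fit(curves, sizes)

# At inference time the classifier picks the shape, then the matching
# regression pipeline predicts the size from the same curve.
new_curve = np.random.rand(1, 500)
predicted_size = size_model.predict(new_curve)
```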

Fig. 4
figure 4

Prediction results of XGboost regression algorithm

Fig. 5
figure 5

Prediction results of Random Forest regression algorithm

SEM images of the experimental samples were then collected, and ImageJ software was used to calculate the particle size distribution in the SEM images. The prediction results of the Random Forest and XGBoost regression algorithms are shown in Table 3 and are very close to the average nanoparticle sizes calculated from the SEM images. Combined with Fig. 3b, it can be seen that at 2 min the sample, which originally contained cylinder-like particles, begins to show lamellar particles stacked on top of each other. In the SEM images at 5 min the nanoparticles are all lamellar, and from 5 min to 9 h the shape remains lamellar while only the size increases; the Random Forest and XGBoost results also show the lamellar nanoparticle size increasing. Table 3 also shows that the SEM statistical sizes of the fluorescent polystyrene microspheres with nominal sizes of 80, 110, 215, and 380 nm are smaller than the nominal sizes, and the relative errors indicated in parentheses are correspondingly larger. In general, sizes obtained from SEM images are smaller than the real nanoparticle sizes because the SEM field of view is relatively narrow and not statistically representative. We therefore performed dynamic light scattering (DLS) tests on these four samples; the results, shown in Table 4, are averages of three consecutive measurements. DLS is a statistically representative technique that reflects the overall state of spherical nanoparticles, but it measures the hydrodynamic radius, so its results are slightly larger than the real sizes. Combining the SEM and DLS results shows that the actual sizes of the 80 nm and 110 nm samples are essentially consistent with their nominal sizes, and both regression algorithms predict values with smaller errors relative to the DLS results. For the nominal 215 nm sample, the predictions of the two regression algorithms also lie between the two experimental values, although its DLS value is 33 nm larger than the nominal size. For the nominal 380 nm sample, the SEM and DLS results are very close, both about 355 nm and far from the nominal 380 nm, and the predictions of the two regression algorithms are also around 355 nm.

Table 3 Comparison between SEM results and prediction results of Random Forest and XGboost algorithm
Table 4 Comparison between DLS results and prediction results of Random Forest and XGboost algorithm. The percentage values in parentheses represent the discrepancies between the regression algorithm's predictions and the DLS results

Discussion

In practice, the nanomaterial samples used in real research often comprise not only nanoparticles of a single morphology but also mixtures of nanoparticles with diverse shapes, or even no distinct shape at all, which is particularly common in polydisperse systems. For instance, in Fig. 6 we selected four typical nanoparticle samples that closely resemble monodisperse systems, yet they still have size distributions (as shown in Fig. 6e–f). The deep learning model shows good predictive performance for monodisperse systems and for polydisperse systems with a relatively uniform shape, but its accuracy for polydisperse systems containing a mixture of shapes is lower. Even when researchers analyze SEM images of such samples, it remains difficult to assign them a single, distinct shape. Experimental samples should therefore be chosen to be close to monodisperse, avoiding overly complex polydisperse samples that would hinder verification of the deep learning predictions. The experimental results show that the sample obtained after 2 min of hydrothermal reaction consists of both cylindrical and lamellar structures, which explains the differing predictions of the four algorithms. The algorithms correctly detected the change in nanoparticle shape for the 5 min sample, demonstrating the suitability of deep learning for in situ SAXS experiments. SEM is a widely employed technique for characterizing nanoparticle shape, but it cannot be used during in situ SAXS experiments, and performing SEM afterwards costs the researchers additional time. Existing software for processing SAXS experimental data integrates many functions, including shape calculation and fitting, which greatly improves researchers' efficiency; however, such software often cannot process in situ data online in real time, or is too slow to meet the demands of high-throughput in situ SAXS data processing. In this study, we developed a Python script that integrates deep learning classification and regression algorithms, establishing a streamlined pipeline for automated processing of high-throughput in situ SAXS experimental data.

Fig. 6
figure 6

Four typical nanoparticle shapes and their size distribution statistics. The scale bar in the lower right corner of each SEM image is 100 nm. a and e are the SEM results of the sample Co-lamellar-6h, b and f are the SEM results of the Prussian blue sample, c and g are the SEM results of the sample gold-cylinder-3, and d and h are the SEM results of the sample FPM-sphere-380 nm

The accuracy of models trained on simulated SAXS data is relatively low when they are applied directly to experimental data. This can be attributed to the absence of instrument noise in the simulated data; even when Gaussian distributions are incorporated, the polydispersity in the simulations remains incomparable to that of real nanoparticles in experimental data. The four deep learning algorithms were therefore fine-tuned using previously accumulated experimental data, enabling the fine-tuned models to generalize effectively when classifying SAXS data. During fine-tuning, the backbone feature-extraction network remains frozen and only the parameters of the final fully connected layer are updated; this preserves the feature-extraction performance of the backbone and improves only the classification capability of the fully connected head. The training and validation results above indicate that, with appropriate design and training, a lightweight convolutional neural network can match the performance of well-known open-source algorithms such as ResNet-18, ResNet-34, and ViT, while also conserving computational resources. This result, however, pertains to a specific task and does not imply that such a network can match ResNet-18, ResNet-34, or ViT on more intricate tasks. Researchers are also increasingly recognizing the pivotal role of data quality in model performance, particularly in light of recent advances in artificial intelligence. The deep learning algorithm allowed us to predict the shape of a sample in 50 to 100 ms, compared with roughly 6 to 8 min for a skilled researcher using conventional data processing software for shape calculation, an efficiency gain of several thousand times that makes machine learning particularly suitable for batch processing of high-throughput in situ characterization data. For instance, at a forthcoming high-energy photon source facility, the exposure time for a SAXS image is 100 ms, and such images can be processed almost instantaneously by machine learning algorithms, allowing researchers to obtain results as soon as the experiment is complete. Thanks to the high brightness of the high-energy photon source, the exposure time can be further reduced, enabling real-time dynamic characterization of samples; this generates a volume of experimental data that cannot be processed effectively with traditional methods.

Conclusion

This study was designed to address the challenges of high-throughput experimental data processing arising from the construction of high-energy synchrotron radiation facilities. As a first step toward this goal, we developed a pipeline for analyzing nanoparticle SAXS experimental data using deep learning classification and regression algorithms, focusing specifically on shape classification and size prediction. Generating batches of SAXS simulation data with Python code based on the theoretical SAXS formulas solves the lack of an open-source synchrotron radiation database for training deep learning algorithms. SAXSNET, a convolutional neural network for nanoparticle morphology classification, was constructed with the PyTorch framework and compared with mature open-source algorithms such as ResNet-18, ResNet-34, and ViT. The results show that SAXSNET is comparable to these open-source algorithms in resolving SAXS data from nanoparticles of different shapes, reaching an accuracy of more than 96%, while having fewer network parameters, which saves computing resources and improves training and prediction speed. By optimizing the architecture and reducing the number of parameters, SAXSNET greatly increases training and prediction speed without reducing accuracy, effectively addressing the problem of accurately classifying high-throughput nanoparticle SAXS data with convolutional neural networks. Comparison with SEM, DLS, and RAW software results verified the feasibility of using the Random Forest and XGBoost regression algorithms to predict nanoparticle size. The machine learning based SAXS data processing pipeline developed in this study not only automates shape classification and size prediction for high-throughput SAXS data but also eliminates the influence of subjective factors, and it removes the need to switch between different data processing programs in the existing workflow.

The present work still needs further improvement and extension. For example, SAXS data classification of nanoparticles is currently limited to four shapes: cylinders, parallelepipeds, lamellae, and spheres; future work will incorporate more categories to enhance shape prediction. After processing the experimental data, we found that most samples exhibit a certain particle size distribution, so our future research will focus on using machine learning algorithms to accurately predict the size distribution coefficients of these nanoparticles. We also hope to use multimodal machine learning algorithms to process various kinds of synchrotron radiation data simultaneously. The ultimate goal is to leverage deep learning algorithms to build an automated experimental control pipeline that seamlessly integrates data acquisition, processing, and feedback control, meeting the real-time batch processing requirements of high-energy photon sources for high-throughput experimental data.