1 Introduction

Advances in materials science shape not only our daily lives but also drive economic and technological growth; materials have become inseparable from progress. This makes the search for new materials a critical contemporary subject in materials science. To accelerate the discovery and design of new materials, various computational approaches have been introduced in tandem with experimental processes.

The mechanical and physical properties of materials largely depend on the distribution, shape and size of the grains and other microstructural constituents. Thus, identification, classification and quantification of the microstructural constituents are important for establishing the structure-property correlation of a specific material.

To accelerate manufacturing, many industries have recently taken an interest in automation through computer vision and image processing, which enables cost-effective design of materials to achieve targeted properties [1,2,3,4,5]. Microstructure modeling can be effective in introducing automation or feedback-based control of processes such as deformation processing and heat treatment [1]; such an approach predicts microstructural features, including the volume fraction of phases, during the heat treatment of steels. Earlier, optical microscopy was the only available technique for microstructure analysis, but in recent times various image-processing techniques have been developed for the same purpose [5]. Because microstructural analysis is repetitive and must be consistent, computer vision and image processing based approaches are attracting increasing attention for the interpretation and analysis of microstructural images [6]. Computational image processing techniques are becoming faster than traditional approaches while offering comparable reliability in interpreting the microstructure [7]. An automated image processing system operating on pre-processed images may therefore be immensely helpful in strengthening the microstructure-process-property correlation (Fig. 1).

Fig. 1 Block diagram of the classification of different phases of steel using a deep learning model

During the last two decades, computational approaches based on multi-scale physical principles have advanced rapidly. At the same time, constraints in understanding and applying physical principles in multi-component, multi-scale and multi-parameter scenarios have paved the way for data-driven techniques, such as machine learning, in the modeling, simulation and optimization of complex systems. In this context, machine and deep learning techniques have gained immense popularity in recent times across a wide range of application fields such as engineering, finance, business and transport. It is therefore worthwhile to examine the application of efficient machine and deep learning techniques to materials problems of scientific and industrial interest.

In an effort to predict the phases and crystal structures in multi-principal element alloys (MPEAs), correlations among five key features of the constituent elements have been studied.

Another exercise has demonstrated the development of an effective image analysis framework for the classification of steel microstructures using deep learning methods, without the need for separate segmentation and feature extraction mechanisms. Finally, a computational method has been developed to classify the different processing routes of steel from composition and properties. This study provides a model to predict the steel processing method based on experimental data on composition, processing route and properties.

In essence, the present effort is an attempt to establish an accelerated process for designing exotic materials and processes by exploiting the knowledge and information hidden in the huge databases available from earlier research and industrial practice.

2 Different Application Areas of Materials Engineering

2.1 Multi-Component Alloys

The concept of multi-principal element alloys (MPEAs) is helpful in optimizing a set of properties while retaining the characteristic properties of the principal elements, which makes such alloys particularly useful [8, 9]. Conventional metal alloys have the disadvantage of a trade-off between strength and toughness, whereas MPEAs can exhibit superior mechanical properties. The common phases in MPEAs are single-phase solid solution (SS), amorphous, intermetallic compounds (IM), and combined SS and IM phases [10]. Because MPEAs can exist in different phases, either strength or toughness can be targeted to yield excellent mechanical properties. A large and varied composition space is available for designing MPEAs that has not been explored exhaustively to date, so a rapid screening technique is needed to select compositions that yield the best balance of properties. Phase selection is therefore an important step in the pathway to designing new MPEAs for various purposes and applications. The challenge is the obscurity of the mechanisms by which different phases form in MPEAs, which makes phase selection difficult to predict [10,11,12,13,14,15]. The conventional approach to phase selection is mostly parametric and leads to empirical rules. For example, the mixing enthalpy (ΔHmix) and atomic radius difference (δ) are supposed to lie in certain ranges (−15 < ΔHmix < 5 kJ mol⁻¹; 1% < δ < 5%) for the formation of an SS phase. These parametric approaches extend the Hume-Rothery rules, which state that the formation of a binary alloy with an SS phase is affected by size, crystal structure, valency and electronegativity [15,16,17,18,19,20,21,22,23,24,25]. The fact that phase selection depends on more than three parameters limits the predictive ability of parametric approaches, because such selections cannot be visualized manually owing to their complexity and dimensionality (Table 1).

Table 1 MPEA elements and their corresponding feature values and phases

Earlier, Raghavan et al. [16] analyzed phase formation in multi-component alloys. An attempt was made to forecast phase formation using a CALPHAD-based approach for a wide range of compositions, taking the stable phase to be the initial phase that forms on cooling from the liquid state with the highest driving force.

Sheng Guo et al. [14] studied high mixing entropy and found that it is not the only factor controlling solid solution formation in equiatomic multi-component alloys; other determining factors include the mixing enthalpy (ΔHmix) and the atomic size difference (δ).

Lilensten et al. [17] presented a detailed analysis of the deformation mechanism of a quaternary BCC MPEA at room temperature. To achieve reproducible results, all the analyses were performed on a recrystallized microstructure.

Islam et al. [26] studied the correlations between five features, namely the valence electron concentration (VEC), the difference in Pauling electronegativities (Δχ), the atomic size difference (δ), the mixing enthalpy (ΔHmix) and the mixing entropy (ΔSmix), that lead to phase selection, using artificial neural networks on a dataset of 118 MPEAs.

Huang et al. [27] studied phase prediction in high entropy alloys (HEAs). Using machine learning, this group presented an alternative path to predicting the phases of new HEAs. However, reliable criteria for predicting the evolution of a particular phase in a given compositional space are still awaited.

2.2 Identification and Quantification of Phases in Microstructures of Steel

In the endeavor to correlate microstructure and properties of steels quantitatively, attempts have been made to employ image processing techniques to identify and quantify the phases in steel microstructures.

Kesireddy et al. [28] investigated the effectiveness of training a neural network to recognize phases such as pearlite, ferrite, martensite and cementite using digital image processing. Their model is useful for phase segmentation, but quantitative analysis of phases was not attempted in their study.

Banerjee et al. [29] proposed a novel scheme for automatic extraction of the phases from microscopic images of dual-phase steel. They used various image processing techniques, such as thresholding and edge detection, along with the Olysia software. However, segmentation of noisy images was not emphasized in their approach.

Gupta et al. [30] presented the processing and refinement of steel microstructure images for assisting computerized heat treatment of plain carbon steel. The proposed refinement of steel microstructure images is aimed at enabling computer-aided simulation of the heat treatment of plain carbon steel in a time- and cost-efficient manner, and is hence beneficial to the materials and metallurgical industry [31].

In one attempt at grain boundary detection, Alysson et al. proposed a model using image processing techniques to determine the average grain size [32]. More recently, a high-end computational approach, deep learning, has been introduced to overcome the shortcomings of traditional approaches in a faster and more accurate way [33].

A deep learning based grain boundary segmentation approach for steel microstructures has previously been discussed in the literature [34]. DeCost et al. [35] conducted a study using deep learning for high-throughput quantitative metallography of complex microstructures, based on twenty-four ultra-high carbon steel images.

However, the inherent limitation is that such approaches remain empirical and are not suitable for classifying the various phases in steels having two or more phases [36]. Any effort toward recognizing phases in multiphase steel relies on morphological or crystallographic properties [37,38,39,40,41,42,43]. A method provided by Pauly et al. employed data mining techniques; they proposed a morphological feature extraction step followed by feature classification using a support vector machine [44, 45]. This method was applied to a chemically etched micrographic dataset of steel collected by scanning electron and optical microscopy. Although the method yielded reasonably reliable results, it achieved only about 50% accuracy in classifying microstructures, which might be due to the high complexity of the substructures (Fig. 2).

Fig. 2 Flowchart of the computer vision based phase segmentation methodology [cf. [53]]

In an attempt to overcome the limitations of conventional image processing and analysis techniques, deep learning has gained significant attention for object classification and image segmentation in different applications using AlexNet [46, 47]. Other convolutional neural networks (CNNs), such as VGGNet and ResNet, have more layers than AlexNet and are capable of achieving better accuracy [48]. For the task of segmentation, a modified version of the CNN proposed by Long et al. [49], the fully convolutional neural network (FCNN), is used to perform classification through semantic segmentation. Currently, FCNNs are the popular trend, with consistent efforts to extend them toward higher benchmarks of image segmentation [50,51,52] (Table 2).

Table 2 Microstructure boundary identification using different image processing techniques [53]

2.3 Composition-Process-Property Correlation

The determination of the material processing method is of great significance with respect to the performance of steel, since variation in the processing route, even for a specific grade of steel, produces different microstructures, which in turn influence the properties [54]. During manufacturing, the material composition and process parameters are the variables to be controlled and hence are the natural input variables for determining the process. At present, a large number of research initiatives are being undertaken to predict the steel processing mechanism and the mechanical properties of steel using various computational methodologies, especially those based on artificial neural networks, pattern recognition, etc. [55,56,57,58,59,60,61,62,63,64].

In 2009, Brahme and Winning designed an artificial neural network based model for predicting cold rolling textures of steel, which predicts the fiber texture from texture intensities, carbon content, carbide content and the amount of rolling reduction [65].

Similarly, Simecek and Hajduk developed the MECHP tool to predict the mechanical properties of hot rolled steel products from measured process data, such as the water cooling and subsequent air cooling of hot rolled narrow plate and wire [66].

Zhi Xu et al. [67] presented a study in which a CNN model with an optimal structure is used to describe metallurgical phenomena in steel rolling processes.

3 Machine Learning Techniques

Metallurgical research and industrial activities generate a huge volume of data, which may be effectively utilized for extracting quantitative and qualitative knowledge. During the last few decades, metallurgical processes have increasingly made use of computational techniques. As the concept of multi-principal element alloys becomes popular, it is essential to develop trustworthy tools that can reliably predict the various phases that evolve in MPEAs as a function of parameters such as mixing entropy and mixing enthalpy.

Microstructural image analysis has also been found effective for delineating phases, which may help in determining volume fraction and area fraction and in detecting grain boundaries. In this endeavor, digital image processing has been widely used for metallographic images; more recently, deep learning has been widely adopted for faster analysis.

Similarly, the selection among various processing routes for manufacturing steels, i.e. hot rolling, cold drawing, annealing or spheroidizing, is also directly or indirectly decided by the composition (e.g. C, Mn, P, etc.) and the mechanical properties (e.g. yield strength, tensile strength, elongation, etc.). Hence, a reliable model based on the composition-process-property correlation may be useful for predicting the appropriate process schedule to achieve the target properties for a given compositional space.

3.1 Machine Learning Based Prediction of Phases and Crystal Structure in Multi-Component Alloys

3.1.1 Description of Computational Scheme

The current study deals with the selection of phases in MPEAs and the prediction of the crystal structure of solid solutions. The steps involved are as follows (a minimal code sketch of this workflow is given after the list):

  i. Collection of datasets from various literature sources.

  ii. Splitting of the dataset into two parts, for training and testing purposes.

  iii. Selection among different classifiers to identify the best option.

  iv. Training of the model with the training data to learn the trends of the data sample.

  v. Testing of the model performance based on the training.

  vi. Finally, calculation of the accuracy with the help of the confusion matrix.
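A minimal sketch of this workflow, assuming Python and scikit-learn, is given below. The file name mpea_dataset.csv and the feature and label column names are hypothetical placeholders, not the actual dataset used in the study.

```python
# Hypothetical sketch of steps (i)-(vi); the CSV file and column names are
# placeholders and do not correspond to the actual study dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# (i) dataset collected from the literature
data = pd.read_csv("mpea_dataset.csv")
X = data[["VEC", "delta", "dHmix", "dSmix", "dChi"]]   # five key features
y = data["phase"]                                      # e.g. SS / IM / amorphous / mixed

# (ii) split into training and testing parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# (iii)-(v) train several classifiers and compare their test performance
classifiers = {
    "SVM": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # (vi) accuracy and confusion matrix on the held-out test set
    print(name, accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))
```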

3.1.2 Description of Computational Tools (Machine Learning)

Machine learning (ML) finds its origin in the late 1980s as an important tool for optimization, having been derived from artificial intelligence (AI), which emerged around 1960 as an ally of expert systems. Machine learning has notable achievements in applications such as speech and word recognition systems [68], autonomous car driving [69] and backgammon playing. Recent studies indicate that machine learning is a major innovative driving force that will gain impetus in the technological revolution of the coming decades [70]. Supervised machine learning is the branch of machine learning used for classification and regression tasks on labeled data [71, 72]. Machine learning algorithms learn from examples in much the same way animals learn, with virtual rewards in place of treats: a reward is given when the machine makes a correct decision, and withheld otherwise. The machine learning program has to formulate a simple rule that explains the behaviour as well as possible and checks it against various candidate functions; rewards are given when the expectations are valid and concrete for the given data. Machine learning algorithms deal with input and output spaces in the form of derivations and conclusions, and the reliability of the induction increases with the number of inputs that can be formulated in a mathematical framework [73]. The algorithms learn through automatic parameter adjustment driven by the data for the given inputs. For high-dimensional inputs, machine learning is powerful and helps eliminate the inefficiency of manual programming [74]. Recently, the concept of machine learning has gained reasonable acceptance in the physical sciences and materials science; however, the application of machine and deep learning is still limited with respect to materials informatics [75].

For a machine learning algorithm, a training set of pairs \( (x_{i}, y_{i}) \), with i ranging from 1 to N, is given, where x is a d-dimensional variable supplied as input to a function (say g) that maps the input x to the output y. The algorithm is then tested to check whether the function correctly maps x to y: the learned function g should correctly reproduce the examples of a testing set different from the training set. Every machine learning algorithm has three components, namely representation, evaluation and optimization [76].

The purpose of using a machine learning algorithm is to analyze large databases. If the database is small, the model may suffer from overfitting, which manifests as high training accuracy but very low testing accuracy. Overfitting is analyzed in terms of the bias-variance tradeoff, as an overfitted model captures the data noise or "hallucinates patterns" [77]. A simple hypothesis set results in an algorithm with small variance but high bias [78]. This commonly followed principle guides machine learning and can be applied through quantitative methods called regularization [79]. Conversely, even if the hypothesis set contains the true function, the probability that the machine learning algorithm returns a poor hypothesis is high when the training dataset is small.

From Fig. 3 it is observed that traditional (simple) models show a higher error rate even for large datasets, which can be attributed to high bias; their training set error Ein converges quickly to the testing set error Eout, i.e. they have low variance. Complex models, on the other hand, have a small error for large datasets (small bias), but their error on the training set Ein is meaningless for too few data points (high variance) and only slowly converges toward the error on the testing set Eout as the dataset grows. In this study, several machine learning models, namely naïve Bayes, support vector machine, k-nearest neighbor, decision tree and random forest, have been employed, as presented in the following sections.

Fig. 3 Illustration of the bias-variance tradeoff [cf. [11]]

Naïve Bayes The naïve Bayes (NB) classifier is a straightforward probabilistic classifier based on applying Bayes' theorem [72, 80, 81]. It relies on the assumption of conditional independence between features; this assumption makes the Bayesian classification approach more efficient, although it also restricts its applicability. NB classifiers can be trained very efficiently with a relatively limited amount of data to estimate the parameters required for classification, since, with independent variables assumed, only the variances of the features for each class need to be determined rather than the whole covariance matrix. The advantage of the Bayes classifier is therefore that it requires only a limited quantity of training data to estimate the parameters for classification [82]. In practice, the classifier is robust enough to tolerate genuine shortcomings of its underlying probability model. A naïve Bayes model can be formulated as

$$ P (c_{i} |D) = \frac{{P\left( {c_{i} } \right)P(D|c_{i} )}}{P\left( D \right)} $$
(3.1)

where \( P (c_{i} |D) \) is the conditional or posterior probability, \( P(D|c_{i} ) \) is the likelihood, and \( P\left( {c_{i} } \right) \) and \( P\left( D \right) \) are the prior probabilities.
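As an illustration of Eq. (3.1), a minimal Gaussian naïve Bayes sketch using scikit-learn is shown below; the toy feature matrix and labels are invented for demonstration only.

```python
# Toy Gaussian naive Bayes example: priors P(c_i) and per-feature likelihoods
# are estimated from the training data under the independence assumption.
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.2, 0.4], [0.9, 0.7], [3.1, 2.8], [2.9, 3.2]])  # toy features
y = np.array([0, 0, 1, 1])                                      # class labels

nb = GaussianNB().fit(X, y)
print(nb.predict([[1.0, 0.5]]))        # class with the largest posterior P(c_i|D)
print(nb.predict_proba([[1.0, 0.5]]))  # posterior probabilities for each class
```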

Support Vector Machine Support vector machines (SVMs) are among the most widely used supervised machine learning methods [83,84,85]. The working principle of an SVM is to place the training samples of the two classes on either side of a separating hyperplane (a hyperplane in an n-dimensional Euclidean space divides the space into two disconnected parts) while maximizing the margin, i.e. the distance between the hyperplane and the nearest samples on either side; maximizing this margin has been shown to reduce an upper bound on the expected generalization error. If the training set is linearly separable, then for a weight vector W and bias b, the classifier (W, b) is expressed as [86]:

$$ \varvec{W}^{\varvec{t}} X_{i } + b \ge + 1, \forall X_{i} \in P $$
(3.2)
$$ \varvec{W}^{\varvec{t}} X_{i } + b \le - 1, \forall X_{i} \in N $$
(3.3)
$$ f_{w,b } \left( X \right) = sgn\left( {\varvec{W}^{\varvec{t}} X_{i } + b} \right) $$
(3.4)

For linearly separable points, once the optimal separating hyperplane is found, the data points that lie on its margin are known as support vectors, and the solution is a linear combination of these points only (see Fig. 4); the other data points are ignored.

Fig. 4 Support vector machine with neighboring hyperplanes

In the soft-margin formulation, slack variables ξi are introduced when the points are not linearly separable, and the objective for w and b becomes

$$ L(w) = \frac{{||\vec{w}||^{2} }}{2} + C\left( {\sum\limits_{i = 1}^{N} {\xi_{i}^{k} } } \right) $$
(3.5)
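A minimal soft-margin SVM sketch corresponding to Eqs. (3.2)-(3.5) is given below, assuming scikit-learn; the data are a toy two-class set and the value of C is illustrative.

```python
# Toy soft-margin SVM: the parameter C weights the slack-variable penalty term.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [1, 0], [0, 1],
              [2, 2], [2, 3], [3, 2], [3, 3]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

svm = SVC(kernel="linear", C=1.0)   # larger C penalizes margin violations more
svm.fit(X, y)
print(svm.support_vectors_)                 # points defining the margin
print(svm.decision_function([[1.5, 1.5]]))  # signed value of W^t x + b
```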

K Nearest Neighbor The k-nearest neighbor (k-NN) classifier is a nonparametric classifier that decides the class label of a query point x# based on the assumption that samples of the same class lie close to one another when a suitable proximity measure is used [72, 87,88,89]. This means that the class label of x# is taken to be the label shared by its nearest neighbors xi*. For a given distance metric, e.g. the Euclidean distance, d can be calculated as:

$$ d\left( {x_{i}^{*} ,x^{\# } } \right) = \left\| {x_{i}^{*} - x^{\# } } \right\|_{2} , \forall i = 1,2, \ldots ,p $$
(3.6)

where \( \left\| \cdot \right\|_{2} \) is the l2 norm and p is the number of instances. In general, the predicted class is obtained by taking the majority vote of the class labels among the k nearest neighbors, with each vote optionally weighted by the distance weight factor w = 1/d2.
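A minimal k-NN sketch with inverse-square distance weighting (w = 1/d²) is given below, assuming scikit-learn; the toy data and the small constant guarding against zero distance are illustrative.

```python
# Toy k-NN classification with w = 1/d^2 weighting supplied as a custom callable.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [1.0, 1.1]])
y = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(
    n_neighbors=3,
    weights=lambda d: 1.0 / (d ** 2 + 1e-12))  # w = 1/d^2, guarded against d = 0
knn.fit(X, y)
print(knn.predict([[0.15, 0.15]]))  # weighted majority vote of the 3 neighbors
```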

Decision Tree Decision trees (DTs) are based on the principle of building nonlinear decision boundaries from linear separators that can be expressed as hyperplanes [90, 91]. Consider a labelled dataset (xn, yn), with n ranging from 1 to N. Figure 5 shows an example of such hyperplanes for the case where x has only two coordinates, x1 and x2. The data labels, i.e. the associated values of y, are represented by colors. The objective of the algorithm is to separate the data points on the basis of their labels. The example considers a finite set of labels, red and blue; thus the classification task has just two labels against which the values of x are classified according to y.

Fig. 5 Example of splits made by a decision tree. The splits are parallel to the axes and try to separate the blue dots from the red dots as well as possible

The machine learning function is derived from the combination of all learned hyperplanes [92, 93]. Trees can be used to represent piecewise constant functions, with each node in the tree associated with a hyperplane (as shown in Fig. 5). Thus, for all xn ∈ ℝL, the function can be expressed as:

$$ f \left( x \right) = \mathop \sum \limits_{l = 1}^{L} a_{l} K_{l} \left( x \right) $$
(3.7)

where Kl is a subset bounded by hyperplanes orthogonal to the canonical basis (i.e. one of the "boxes"), K1, …, KL is a partition of ℝL, and al is the value of the label attributed to the "box" Kl.

For a classification problem, al is determined by a majority vote within Kl. In the previous example, if a given box contains more points with the blue label than with the red label, the value of al for that box will be blue, and the blue label will be assigned to any testing data point that falls within this box. With K denoting the total number of classes, al is expressed as:

$$ a_{l} = \arg max_{k = 1 \ldots K} \mathop \sum \limits_{{x_{n} \in K_{l} \left( x \right)}} 1_{{y_{n} = k}} $$
(3.8)

For a regression problem, al is determined by the empirical mean of the points in Kl:

$$ a_{l} \left( x \right) = \frac{1}{{\left| {K_{l} \left( x \right)} \right|}}\mathop \sum \limits_{{x_{n} \in K_{l} \left( x \right)}} y_{n} $$
(3.9)

Identifying the points at which to split the axes when building the hyperplanes is the most challenging task in building trees. The split identification is recursive, proceeding one node after another. At each node the best binary split has to be identified from the set of all possible splits ti,τ, where i corresponds to the ith axis of x and τ to the split point. Values of τ can be chosen in different ways, for example through histograms or regularly spaced points. A local loss function L is optimized to choose the best split within the given set. The loss function is calculated at each node, and its input parameters vary accordingly: at each node and for each candidate split, the current dataset S is split into two subsets, left and right, which are used to evaluate the local loss and select the split. The left or right subset selected at the previous node becomes the dataset S for the next node. The procedure is iterative and ends when a stopping criterion is reached, that is, either the maximum tree depth or the maximum number of leaves. For classification problems, the split choice involves minimizing a loss function derived from an impurity criterion G, which can be, for example, the Gini index. For a dataset S with K classes, G(S) is represented as:

$$ G\left( S \right) = \mathop \sum \limits_{k = 1, \ldots ,K} \rho_{k} \left( S \right)\left( {1 - \rho_{k} \left( S \right)} \right) $$
(3.10)
$$ \rho_{k} \left( S \right) = \frac{1}{N}\mathop \sum \limits_{n = 1}^{N} 1_{{y_{n} = k}} $$
(3.11)

The corresponding loss function for a candidate split is:

$$ L\left( {t_{i,\tau ,} S} \right) = \frac{{N_{left} }}{N} G\left( {left\left( {S,i,\tau } \right)} \right) + \frac{{N_{right } }}{N} G\left( {right\left( {S,i,\tau } \right)} \right) $$
(3.12)

Here left(S, i, τ) and right(S, i, τ) are the two subsets produced by the binary split ti,τ, and Nleft and Nright are their cardinalities. The axis i and split point τ chosen at a given node are \( (\hat{i},\hat{\tau }) = \arg \min_{i,\tau } L\left( {t_{i,\tau } ,S} \right) \).
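A minimal sketch of the split criterion in Eqs. (3.10)-(3.12) is given below; the toy data and the candidate splits are invented for illustration.

```python
# Gini impurity of a label set and weighted loss of a candidate binary split.
import numpy as np

def gini(labels):
    # G(S) = sum_k p_k (1 - p_k)
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

def split_loss(X, y, i, tau):
    # weighted Gini of the left/right subsets produced by the split x_i <= tau
    left, right = y[X[:, i] <= tau], y[X[:, i] > tau]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

X = np.array([[0.1, 1.0], [0.2, 0.8], [0.9, 0.3], [0.8, 0.2]])
y = np.array([0, 0, 1, 1])
print(split_loss(X, y, i=0, tau=0.5))  # 0.0: both children are pure (best split)
print(split_loss(X, y, i=1, tau=0.9))  # larger loss: a worse candidate split
```

In a tree builder, the split minimizing this loss would be selected at each node and the procedure repeated recursively on the resulting subsets.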

Random Forest The random forest (RF) aims to reduce the variance problem of decision trees by adding randomness to the tree construction, such that multiple trees are built at the same time [94]. The decisions of the randomized trees are then averaged. The randomness in the tree construction comes from two sources: bagging (or bootstrapping) and selecting only a subset of the axes (or hyperplanes) at each split.

Bagging grows each tree using only a bootstrap subset of the original training dataset. Secondly, selecting only a subset of the hyperplanes while growing the tree limits each split to a smaller set of candidate axes (denoted by F in some notations). The fact that the variance of the random forest estimator is smaller than that of a single decision tree is readily confirmed in practice [95, 96]. Generally, random forests remain faster than several other machine learning algorithms; in addition, random forest classifiers are more accurate and yield very good performance when their hyperparameters are tuned.
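The two randomization sources described above correspond directly to the bootstrap and max_features options of scikit-learn's random forest; the sketch below uses a synthetic dataset for illustration.

```python
# Random forest with bagging (bootstrap rows) and a restricted feature subset
# (max_features) considered at each split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # number of randomized trees whose votes are averaged
    bootstrap=True,       # bagging: each tree sees a bootstrap sample of rows
    max_features="sqrt",  # only a subset of axes is a candidate at each split
    random_state=0)
print(cross_val_score(rf, X, y, cv=5).mean())  # cross-validated accuracy
```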

3.2 Classification of Steel Micrographs using Deep Learning

3.2.1 Description of Dataset

In this work, 959 carbon steel microstructure images, taken from the literature [97], are used for training and testing purposes. These microstructures have different primary microconstituent phases, such as martensite (M), pearlite (P), spheroidite (S) (precipitate), pearlite and spheroidite (P + S), pearlite and Widmanstätten (P + W), and spheroidite and Widmanstätten (S + W). Table 3 indicates the variation in the dataset.

Table 3 Number of data samples in each phase for steel micrograph prediction

3.2.2 Description of Computational Scheme

The aim of this work is to propose a rational and effective strategy for classifying the constituent phases present in a steel microstructure through machine intelligence. The steps involved are as follows (a minimal code sketch of steps iii-v is given after the list):

  i. Collection of sample images: input images are collected from the previous literature.

  ii. Preprocessing of those images, such as denoising.

  iii. Selection of the source model: a pre-trained residual network is chosen from the available models.

  iv. Reuse of the model: the pre-trained model is then used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the modeling technique used.

  v. Tuning the model: optionally, the model may need to be adapted or refined on the input-output pair data available for the task of interest. In this work, one more layer is added to the model for classification purposes.
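A minimal sketch of steps (iii)-(v), assuming PyTorch/torchvision, is shown below: a pre-trained ResNet-18 is reused with its convolutional layers frozen and its final layer replaced to classify the six microstructure categories of Table 3. The batch of random tensors and the training settings are placeholders, not the actual experimental configuration.

```python
# Transfer learning sketch: freeze the pre-trained backbone, replace the head.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 6                              # M, P, S, P+S, P+W, S+W (Table 3)

model = models.resnet18(pretrained=True)     # (iii) pre-trained source model
for param in model.parameters():             # (iv) reuse: freeze learned features
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_classes)  # (v) new classifier layer

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# one illustrative training step on a dummy batch of 224x224 RGB micrographs
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(loss.item())
```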

3.2.3 Description of Computational Tools (Deep Learning)

Deep learning is the study of neural networks (NNs) and "end to end" learning mechanisms that are capable of extracting complex features from the input data and processing those data themselves to learn the model. Unlike traditional approaches, deep learning performs feature extraction and classification simultaneously.

Convolutional Neural Network Convolutional neural networks are deep neural networks that are primarily used to classify images, cluster them by similarity and perform identification through a self-learning mechanism [34, 98,99,100]. CNNs are widely used in various recognition applications such as face recognition, medical image identification and transportation. Unlike artificial neural networks, where the input is a vector, here the input is a multi-channel image. A typical CNN comprises stacked layers, namely convolution, pooling, flattening and fully connected layers, as shown in Fig. 6.

Fig. 6 Deep neural network based microstructure phase prediction [cf. [34]]

Transfer Learning Human beings naturally pass knowledge between activities: what is gained while learning one task is knowledge that helps to solve related tasks, and the more closely connected the activities, the better the knowledge can be transferred and reused. A simple example is using the skill of riding a motorbike to learn how to ride a similar vehicle. Convolutional neural networks require a relatively large amount of data to learn features from images and use them for classification. If this requirement is not fulfilled, the framework generalizes poorly, which leads to overfitting. Recently, CNNs have been used for a wide and varied range of applications, and a common challenge in such studies is the lack of a sufficiently large labeled dataset, which severely hampers the classification accuracy of the model. To combat this issue, data augmentation techniques such as generative adversarial networks (GANs) have been used to create new images from previously available images [101]. Generally, GANs require a large number of training examples to learn the complexities of the dataset and generate valid images, although some recent work has shown the effectiveness of GANs trained with small amounts of data [102]. Other techniques such as semi-supervised learning, which uses a combination of labeled and unlabeled data for the learning process, have also been used in this respect [103]. These methods, along with other data augmentation methods such as affine transformation, have several drawbacks: they carry the computational overhead of generating the images (in the case of GANs) or of performing the transformations, and they assume that the distribution and complexity of the labeled and unlabeled data are the same, which is not the case in most real-life applications. Transfer learning is the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks or new domains that share some commonality [104, 105]. Transfer learning is the most commonly used methodology for training such models; a pre-trained model can be fine-tuned on the dataset at hand to produce appropriate results, and hence it can easily be reused for the problem of classifying images. Transfer learning is thus an optimization technique for saving training time and achieving better performance.

Given a source domain \( D_{S} \) and learning task \( T_{S} \), and a target domain \( D_{T} \) and learning task \( T_{T} \), transfer learning aims to improve the learning of the target predictive function \( \varOmega_{T} \) in \( D_{T} \) using the knowledge in \( D_{S} \) and \( T_{S} \), where \( D_{S} \ne D_{T} \) or \( T_{S} \ne T_{T} \) [37]. Here, a domain \( D \) is defined as a set containing two components, the feature space \( X \) and the probability distribution of the features in this space, \( P\left( X \right) \), while a task \( T \) is a set containing two components, the label space \( Y \) and the predictive function \( \varOmega \). A typical pre-trained model is a ResNet trained on the ImageNet dataset, which contains a huge number of images belonging to various categories. Figure 7 shows a schematic representation of the working framework of a transfer learning model.

Fig. 7 Schematic diagram of a traditional machine learning model vs. a transfer learning model

Deep Transfer Learning Strategies Deep transfer learning strategies make it possible to handle complex problems and yield reliable outcomes. However, the training time and the amount of data required for such deep learning frameworks are substantially greater than for conventional ML frameworks. Given a target task, identifying the commonality between the new task and previous (source) tasks and transferring knowledge from the previous tasks to the target one are carried out using networks pre-trained on ImageNet [106], such as AlexNet [107] and VGG-16 [108]. Transfer learning, on the other hand, neither incurs the computational overhead of data augmentation nor assumes any similarity between the complexities and domains of the training and testing datasets. A schematic diagram of a pre-trained network is shown in Fig. 8. Transfer learning extends the analogy that neural network based classifiers make with the human mind. These networks have been trained on the ImageNet dataset.

Fig. 8 Schematic diagram of a pre-trained network with custom layers

The ImageNet dataset is composed of over 14 million images in 1000 classes. The lower layers of the network are responsible for extracting low-level features of an image, such as shapes, curves and lines, whereas the higher layers extract and learn more complex features. Most of the features learned in the lower layers are common across many computer vision problems; therefore, there is no need to learn them from scratch. Models such as VGG and ResNet have been trained on the vast ImageNet dataset. To learn the intricacies of the target dataset, only the upper layers of these models need to be trained; the lower layers are frozen and their weights are not varied. Finally, a fully connected dense layer is added to capture the specificity of the target dataset.

Residual Network (ResNet-18) One of the most popular pre-trained networks is the residual network ResNet-18. ResNet achieves better accuracy than a comparable plain network because it utilizes skip connections [109, 110]. The skip connections are used to skip a network layer in order to decrease the computational complexity and to increase the accuracy as the network gets deeper. Figure 9(b) shows how a layer is skipped by adding the output of the previous layer to the output of the next layer, thereby bypassing the current layer.

Fig. 9 Schematic diagram of a a plain block, b a residual block
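The skip connection of Fig. 9b can be sketched as a simplified residual block, assuming PyTorch; this is an illustrative basic block, not the exact ResNet-18 implementation.

```python
# Simplified residual block: the input is added back to the block's output,
# so the convolutions only need to learn a residual correction.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)   # skip connection: add the input back

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)   # shape preserved: torch.Size([1, 16, 32, 32])
```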

3.3 Computer Vision Approaches for Phase Segmentation from Steel Micrograph

3.3.1 Description of Computational Tools

Digital Image Digital image processing is a branch of computerized methods that is rapidly gaining popularity in the study of metallographic images [7, 111]. An image is a two-dimensional light-intensity function represented as f(x, y), where x and y are the two coordinates in the plane [112, 113]. The value of the function f varies with the image brightness or the gray level of the matrix element; such matrix elements are referred to as pixels. Digital images are usually processed through the following techniques.

Image Pre-Processing Preparing an image by improving the image data for further analysis is termed pre-processing. The original images are essentially color (red-green-blue, RGB) images that are converted to grayscale (e.g. 16 levels) or to a binary scale, whichever is applicable.

Noise Reduction Noise reduction is an important step of image pre-processing in computer vision, and several techniques have been explored for removing noise from images. The solutions for noise reduction can be divided into two families, namely linear methods and nonlinear methods. Among these, two of the nonlinear methods are the most popular because their behavior is well suited to the human visual system (HVS); they can adapt to certain impulsive and multiplicative noises that are difficult to remove by linear methods [114].

These approaches are defined by two principles: the non-local means technique and the sparseness of the data [115]. In the non-local means algorithm, a weighted mean of all pixels is calculated, with the weights determined by the similarity of those pixels to the target pixel. As a result, the filtered image is clearer and loses fewer details than with the local means algorithm. In an image, many pixels have similar neighborhoods, a property known as self-similarity [116]. In Fig. 10, q1, q2 and q3 are three neighborhood pixels with respect to p.

Fig. 10 Self-similarity pixels in the non-local means filter. Similar pixel neighborhoods give large weights, w(p,q1) and w(p,q2), while very different neighborhoods give a small weight, w(p,q3) [cf. [49]]

In Fig. 10, most of the pixels in the vicinity of p have neighborhoods similar to that of p. This self-similarity property can be used for de-noising the image: pixels with similar neighborhoods determine the de-noised value of a pixel, which is the working principle of the non-local means (NL-means) de-noising algorithm. The NL-means estimate is obtained from the following expression:

$$ NL\left( V \right)\left( p \right) = \mathop \sum \limits_{q \in V} w\left( {p,q} \right)V\left( q \right) $$
(3.13)

where V is the noisy image and w(p,q) are pixel weights that must satisfy the two conditions \( 0 \le w\left( {p,q} \right) \le 1 \) and \( \mathop \sum \limits_{q} w\left( {p,q} \right) = 1 \). The weight of each pixel is computed from the similarity between its neighborhood and that of the target pixel.
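In practice, Eq. (3.13) is available as OpenCV's non-local means filter; the sketch below assumes a grayscale micrograph file (the file name and the filter strength h are placeholders).

```python
# Non-local means denoising of a grayscale micrograph with OpenCV.
import cv2

img = cv2.imread("micrograph.png", cv2.IMREAD_GRAYSCALE)   # hypothetical image
denoised = cv2.fastNlMeansDenoising(img, None, h=10,
                                    templateWindowSize=7, searchWindowSize=21)
cv2.imwrite("micrograph_denoised.png", denoised)
```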

Edge Detections Along with noise removal, edge detection also attracts attention in image processing. Various methods are used for edge detection, involving statistical calculations, differentiation, machine learning, active contouring, multi-scaling and anisotropic diffusion [117]. Anisotropic methods use morphological edge detectors on sparse representations of the image data and are considered state-of-the-art for edge detection [118]. Various nature-inspired neural network models have been proposed [119], along with several machine learning approaches [120]. Multi-fractal methods [121] and Markov models [122] are also effective techniques for detecting edges in images. Different working principles are available for image segmentation; they can be broadly categorized into (a) traditional methods, such as normalized cut methods (NCM), efficient graph-based methods (EG), mean shift (MS), level set (LS) and ratio contour (RC), and (b) soft computing techniques. Traditional methods use thresholding, morphological methods and edge-based segmentation, while soft computing is based on techniques such as fuzzy theory, artificial neural networks (ANNs) and genetic algorithms (GAs). The adaptive accuracy and adaptability of soft computing techniques make them the most widely adopted methods. A few other methods are state transition algorithms, spanning tree based methods [123] and fuzzy logic based techniques [124].

Hybrids of machine learning and Markov models [125] are also available and have been implemented for image segmentation. The pixel-based method [126] is a segmentation technique that starts by initializing seed points representing the regions and then iteratively adds other pixels to the individual regions; the division follows grayscale properties for segmentation. The available literature provides numerous techniques for both edge detection and segmentation [127]. The Sobel edge detector follows a gradient-based method based on first-order derivatives: it calculates the first-order derivatives of the image separately for the x and y axes, and the second kernel is obtained by rotating the first by 90°. If X is an input image and Gx and Gy are the separate measurements of the gradient component in each orientation, the operator uses the two 3 × 3 convolution kernels shown below.

figure a: the two 3 × 3 Sobel convolution kernels, Gx and Gy

Here the X axis denotes increasing values toward the right and the Y axis denotes increasing values downward. The resultant gradient magnitude can be measured by:

$$ \left| G \right| = \sqrt {G_{x}^{2} + G_{y}^{2} } $$
(3.14)

In addition, the gradient direction is calculated as:

$$ \theta = { \arctan }\left( {\frac{{G_{y} }}{{G_{x} }}} \right) $$
(3.15)
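A minimal sketch of Eqs. (3.14)-(3.15) with OpenCV and NumPy is given below; the input file name is a placeholder for a pre-processed micrograph.

```python
# Sobel gradients, gradient magnitude |G| and direction theta.
import cv2
import numpy as np

img = cv2.imread("micrograph_denoised.png", cv2.IMREAD_GRAYSCALE)
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)   # horizontal 3x3 Sobel kernel
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)   # vertical (rotated) kernel

magnitude = np.sqrt(gx ** 2 + gy ** 2)           # |G| = sqrt(Gx^2 + Gy^2)
direction = np.arctan2(gy, gx)                   # theta = arctan(Gy / Gx)

edges = np.uint8(255 * magnitude / (magnitude.max() + 1e-12))
cv2.imwrite("edges_sobel.png", edges)
```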

It is observed that Sobel edge detection requires comparatively little computational time, and the Sobel operator has a further important feature: it is very simple, as it uses a first-derivative filter rather than the Laplacian of Gaussian (LoG) or the Canny edge detection algorithm. Hence, the chance of feature loss during edge detection is low. The step-wise image processing techniques applied to metallographic images are shown in Fig. 11.

Fig. 11 Image processing based grain boundary preservation: a original image, b after denoising using a bilateral filter, c gray-scale image, d fuzzy C-means clustering, e morphological erosion, f morphological dilation, g region-growing segmented image, h final refined image [cf. [30]]

4 Conclusion

In this study, some application areas in metallurgical practice that can be addressed using computer vision and machine learning techniques have been studied and explored. With the rapid evolution of integrated computational materials engineering (ICME), machine learning is being widely applied to the improved design of available alloys and the discovery of new alloys with better technological performance.

In the first part, multi-principal element alloys are explored on the basis of an understanding of the experimental results reported in the literature. For designing MPEAs, a large and varied composition space is available that is yet to be explored exhaustively, and thus a rapid screening technique is necessary to select the target combination of phases for a given compositional space. Hence, various classification techniques are discussed with a view to achieving better classification accuracy on the MPEA dataset.

The classification of steel microstructures based on their primary constituent phases has been discussed in order to bring out the salient features of each microstructure. This work demonstrates the feasibility of an effective steel microstructure classification scheme using deep learning methods, without the need for separate segmentation and feature extraction mechanisms, allowing complex microstructures with higher levels of noise to be handled. For this purpose, a pixel-wise microstructural image segmentation based on a pre-trained residual network and transfer learning has been developed. The findings of the present transfer learning based framework constitute a significant and effective step toward the qualitative as well as quantitative interpretation of steel microstructures. A data-driven model has been developed that is capable of conducting qualitative and quantitative analysis of the microstructures of plain carbon steel.

A computational method has been developed to classify the different processing routes of steel based on the composition-process-property correlation. The analysis eventually provides a correlation among the constituent elements and the process parameters. This study provides a model that predicts the steel processing method based on experimental results available in the literature.

The present work can be extended to benefit the metallurgical industries by helping in the quantitative and qualitative analysis of phase evolution and thereby the prediction of properties based on microstructure-property correlations. The accuracy of the proposed method can be improved by increasing the size of the dataset, and there is adequate scope for optimizing the model complexity needed to obtain accurate performance while reducing computational time.

This work will also be directly or indirectly helpful for estimating physical and mechanical properties based on the structure-property correlation. The study predicts the steel processing method based on experimental data, and the same approach can be extended to predict the properties of different materials from their composition and processing routes, provided sufficient datasets are available.

In summary, a simple attempt has been made in the present study to highlight the encouraging convergence among the principles of materials science, computing techniques and informatics, which has been identified as a driver of Industry 4.0. It is reasonable to expect that the approaches addressed in the present study will feature in future research, particularly in the domain of materials informatics.