Abstract
Artificial intelligence (AI) and machine learning (ML) train machines to achieve a high level of cognition and perform human-like analysis. Both AI and ML seemingly fit into our daily lives as well as complex and interdisciplinary fields. With the rise of commercial, open-source, and user-catered AI/ML tools, a key question often arises whenever AI/ML is applied to explore a phenomenon or a scenario: what constitutes a good AI/ML model? Keeping in mind that a proper answer to this question depends on various factors, this work presumes that a good model optimally performs and best describes the phenomenon on hand. From this perspective, identifying proper assessment metrics to evaluate the performance of AI/ML models is not only necessary but is also warranted. As such, this paper examines 78 of the most commonly used performance fitness and error metrics for regression and classification algorithms, with emphasis on engineering and sciences applications.
Introduction
Learning is the process of seeking knowledge [1]. We, as humans, can learn from our daily interactions and experiences because we have the ability to communicate, reason, and understand. With the rapid technological advancement in computer sciences, computational intelligence has led to the development of modern cognitive and evaluation tools [2, 3]. One such tool is machine learning (ML), which is often described as a set of methods that, when applied, can allow machines to learn/understand meaningful patterns from data repositories while maintaining minimal human interaction [4]. More specifically, a “computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” [5]. In other words, ML trains machines to understand real-world applications and to use this knowledge to carry out pre-identified tasks, with the goal of optimizing and improving the machines’ performance with time and new knowledge. A closer look at the definition of ML implies that computers do not learn by reasoning but rather by algorithms.
From the perspective of this work, traditional statistical regression techniques are often used to carry out behavioral modeling; however, such techniques may suffer from large uncertainties and from the need to idealize complex processes, approximate, and average widely varying prototype conditions. Furthermore, statistical analysis often assumes linear, or in some cases nonlinear, relationships between the output and the predictor variables, and these assumptions do not always hold true, especially in the context of engineering/real data. On the other hand, ML methods adaptively learn from experiences and extract various discriminators. One of the major advantages of ML approaches over the traditional statistical techniques is their ability to derive a relationship(s) between inputs and outputs without assuming prior forms or existing relationships. In other words, ML approaches are not confined to one particular space that requires the availability of a physical representation but rather go beyond that to explore hidden relations in data patterns [6,7,8,9,10,11].
While ML was initially developed for computer sciences, it is now an integral part of various fields including energy/mechanical engineering [6,7,8,9], social sciences [10, 11], and space applications [12, 13], among others [14,15,16,17,18,19]. Due to the availability of high-performance computing machines and ease of access to data (thanks in part to the rise of Internet-of-Things and data-driven applications), the utilization of ML in civil engineering in general, and in materials science and engineering in particular, has been duly noted in recent years [20,21,22,23,24,25].
The widespread integration of ML into new research areas is due in large part to the availability of user-friendly software packages that simplify the process of ML by utilizing pre-defined algorithms and training/validation procedures [26,27,28,29,30]. While such tools facilitate ML analysis and give researchers who are often unfamiliar with ML fundamentals the means to easily carry out such analysis, they can still be misused by providing a false sense of analysis interpretation [31]. Another concern of utilizing user-ready approaches to carry out ML analysis lies in the need for compiling proper observations (i.e. datapoints). In some classical fields (say materials science, earthquake or fire engineering) where there is a limited number of observations due to expensive tests or the need for specialized instrumentation/facilities [32], the use of ML may lead to a biased outcome, especially when combined with a lack of expertise in ML [33, 34].
An examination of the open literature raises a few questions: 1) are we developing accurate ML models? 2) are such models useful to our fields? 3) are we properly validating ML models? and 4) how can we confidently answer “yes” to the aforementioned questions?
A distinction should be drawn here: we often apply existing ML algorithms to our problems rather than develop new algorithms. This is similar to applying other numerical tools, such as the finite element method, to investigate the response of materials and structures (say concrete beams) under harsh environments (e.g. fire conditions) [35, 36]. From this perspective, we use an existing tool, say a finite element (FE) software (ANSYS [37], ABAQUS [38] etc.), to investigate how a failure mechanism occurs in a concrete beam under fire. The accuracy of this FE model is often established through a validation procedure in which predictions from the FE model (say temperature rise in steel rebars or mid-span deflection during a fire, or in some cases, the point in time when the beam fails) are plotted against those measured in an actual fire test. If the comparison is deemed satisfactory, then the FE model is said to be valid and hence can be used to explore the effect of key response parameters (e.g. magnitude of loading, strength of concrete, intensity of fire etc.). From this perspective, the validity of an FE model is established if the variation between predicted results and measured observations is between 5–15% [39].
Unlike the use of FE simulation, ML is often used in two domains: 1) to show the applicability of ML to understand a phenomenon [40, 41], and 2) to identify hidden patterns governing a phenomenon [33, 42]. In the first domain, ML is primarily used to show that an ML algorithm can replicate a phenomenon – or in other words, to validate the applicability of that particular ML algorithm to a material science problem (i.e. can deep learning be applied to predict the compressive strength of concrete given that information regarding the components in a concrete mix is available?). While works in this domain showcase the diversity of ML, these also provide an additional validation platform/case studies to already well-established algorithms. The contribution of such works to our knowledge base is to be thanked and acknowledged.
The second domain is where ML shines and can prove to be a powerful ally to researchers. This is because ML thrives on data and is designed to explore hidden features and patterns. The integration of these two items has not been thoroughly applied in our fields and, if applied properly, can not only open new opportunities but also revolutionize our perspective on our fields. Unfortunately, the open literature continues to lack works in this domain, and hence such works are to be encouraged.
Whether ML is used in the first or second domain, ML models need to be rigorously assessed [43, 44]. This is a critical key to ensure: 1) the validity of the developed ML model in understanding a complex phenomenon given a limited set of data points, and 2) proper extension of the same models towards new/future datasets. Traditionally, the adequacy of ML models is often established through performance fitness and error metrics (PFEMs). Performance and error measures are vital elements in the process of evaluating ML models/frameworks. These are defined as logical and/or mathematical constructs intended to measure the closeness of actual observations to that expected (or predicted). In other words, PFEMs are used to establish an understanding of how predictions from a model compare to real (or measured) observations. Such metrics often relate to the variation between predicted and measured observations in terms of errors [45,46,47].
Diverse sets of performance metrics have been noted in the open literature, e.g. correlation coefficient (R), root mean squared error (RMSE), etc. In practice, one metric, several metrics, or a combination of metrics is used to examine the adequacy of a particular ML model. However, there does not seem to be a systematic view of the scenarios in which specific metrics are preferable. In order to bridge this knowledge gap, this work compiles the commonly used PFEMs and highlights their use in evaluating the performance of regression and classification ML models.
Performance Fitness and Error Metrics
This section presents the most widely used PFEMs and highlights fundamentals, recommendations, and limitations associated with their use in assessing ML models. In this work, PFEMs are grouped under two categories: traditional and modern. In this section, the following recurring terms are used: A: actual measurements, P: predictions, n: number of data points.
Regression
Regression ML methods deal with predicting a target value using independent variables. Some of these methods include artificial neural networks, genetic programming, etc. PFEMs grouped herein are based on methods that calculate point distance, primarily using subtraction or division operations. These metrics contain fundamental operations, either A-P or P/A, and can be supplemented with absoluteness or squareness. These are the most widely used metrics in the literature. The simplest form of common PFEMs results from subtracting a predicted value from its corresponding actual/observed value. This is straightforward, easy to interpret, and most of all yields the magnitude of error (or difference) in the same units as those measured and predicted, and can indicate if the model overestimates or underestimates observations (by analyzing the sign of the remainder). One should remember that an issue could arise when errors of opposite sign are aggregated, i.e. positive and negative errors cancel each other out. In this scenario, a near-zero error could be calculated, falsely indicating high accuracy.
This can be avoided by using an absolute error (i.e. |A-P|), which only yields non-negative values. Analogous to traditional error, the absolute error also maintains the same units as the predictions (and observations), and hence is easily relatable. However, because the sign is discarded, the direction of bias (overestimation or underestimation) cannot be determined from absolute errors.
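The cancellation issue described above can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical and chosen only so that positive and negative errors cancel exactly; they do not come from any surveyed study.

```python
def signed_errors(actual, predicted):
    """Element-wise error A - P; a negative value means the model overestimates."""
    return [a - p for a, p in zip(actual, predicted)]


def mean_error(actual, predicted):
    """Average signed error; errors of opposite sign can cancel each other out."""
    errors = signed_errors(actual, predicted)
    return sum(errors) / len(errors)


def mean_absolute_error(actual, predicted):
    """Average of |A - P|; always non-negative, but the sign information is lost."""
    errors = signed_errors(actual, predicted)
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical values (illustration only).
A = [10.0, 20.0, 30.0, 40.0]
P = [12.0, 18.0, 33.0, 37.0]

me = mean_error(A, P)            # signed errors are -2, 2, -3, 3
mae = mean_absolute_error(A, P)  # absolute errors are 2, 2, 3, 3
print(me, mae)                   # prints: 0.0 2.5
```

Here the mean signed error is zero even though no single prediction is exact, while the mean absolute error reports an average miss of 2.5 units.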
Similar to the absolute error, the squared error also mitigates mutual cancellation of errors. This metric is continuously differentiable and thus facilitates optimization. However, unlike the absolute error, this metric emphasizes relatively large errors (as opposed to small errors) and could be susceptible to outliers. The fact that the units of the squared error are squared leads to unconventional units for error (e.g. squared days), which are not intuitive. Other metrics may also include the logarithmic quotient error (i.e. ln(P/A)) as well as the absolute logarithmic quotient error (i.e. |ln(P/A)|). Table 1 lists other commonly used metrics, together with some of their limitations and shortcomings as identified by surveyed studies.
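A short sketch may help show how the squared error emphasizes large errors and how the logarithmic quotient error behaves. The data are hypothetical, and the logarithmic quotient is assumed to be applied only to strictly positive values of A and P.

```python
import math


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def log_quotient_errors(actual, predicted):
    """ln(P/A) per point; assumes strictly positive A and P."""
    return [math.log(p / a) for a, p in zip(actual, predicted)]


# Hypothetical data: both prediction sets have the same total absolute error,
# but the second concentrates it in a single outlying point.
A = [10.0, 10.0, 10.0, 10.0]
P_spread = [11.0, 9.0, 11.0, 9.0]
P_outlier = [10.0, 10.0, 10.0, 14.0]

mae_spread = mean_absolute_error(A, P_spread)    # 1.0
mae_outlier = mean_absolute_error(A, P_outlier)  # 1.0
mse_spread = mean_squared_error(A, P_spread)     # 1.0
mse_outlier = mean_squared_error(A, P_outlier)   # 4.0

# Doubling and halving a prediction give log quotients of equal magnitude
# and opposite sign.
q_double, q_half = log_quotient_errors([10.0, 10.0], [20.0, 5.0])
```

The two prediction sets are indistinguishable by mean absolute error, yet the mean squared error of the set containing the outlier is four times larger.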
Most of the works conducted so far in engineering applications utilized only a few of the above PFEMs [20, 33, 61, 62, 72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92]. The bulk of the reviewed works continue to incorporate traditional metrics such as R, R², MAE, MAPE, and RMSE as primary indicators of the adequacy of regression-based ML models. This seems to stem from our familiarity with these indicators, as opposed to others, such as Golbraikh and Tropsha’s [58] criterion, the QSAR model by Roy and Roy [59], Frank and Todeschini [60], and specifically designed objective functions, often used in other fields and in data sciences. It should be noted that out of the reviewed studies, the works of Gandomi et al. [90], Golafshani and Behnood [40] as well as Cheng et al. [62] applied a multi-criteria verification process that incorporated the use of traditional as well as modern PFEMs. Utilizing multiple criteria is not only beneficial to ensure the validity of a particular ML model but is also recommended to overcome some of the identified limitations of traditional metrics in Table 1, and hence should be encouraged.
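As a rough illustration of the traditional indicators named above, the sketch below computes R, R², MAE, MAPE, and RMSE using their common textbook forms. Two assumptions are made here: R is taken as the Pearson correlation coefficient, and R² as the coefficient of determination, 1 − SS_res/SS_tot (some studies instead report the square of R). The data are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Common regression PFEMs; MAPE assumes no actual value equals zero."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    residuals = [a - p for a, p in zip(actual, predicted)]

    ss_res = sum(e * e for e in residuals)
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))

    return {
        "R": cov / math.sqrt(ss_a * ss_p),
        "R2": 1.0 - ss_res / ss_a,
        "MAE": sum(abs(e) for e in residuals) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(residuals, actual)) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Hypothetical data (illustration only).
A = [10.0, 20.0, 30.0, 40.0]
P_close = [12.0, 18.0, 33.0, 37.0]
P_biased = [2.0 * a for a in A]  # perfectly correlated but twice too large

close = regression_metrics(A, P_close)
biased = regression_metrics(A, P_biased)
```

In this sketch the biased predictions still return R = 1, while MAPE is 100% and R² is negative, which is one way to see why relying on a single indicator can be misleading.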
Classification
In ML, classification refers to categorizing data into distinct classes. This is a supervised learning approach where machines learn to classify observations into binary or multi-classes. Binary classes are those with two labels (e.g. positive vs. negative), and multi-classes are those having more than two labels (e.g. types of concrete such as normal strength, high strength, high performance etc.). Classification algorithms may include logistic regression, k-nearest neighbors, support vector machines, etc. [93, 94].
The performance of classifiers is often listed in a confusion matrix. This matrix contains statistics about actual and predicted classifications and lays the foundations necessary to understand accuracy measurements for a specific classifier. Each column in this matrix signifies predicted instances, while each row represents actual instances. This matrix was identified to be the “go-to” metric used in studies examining materials science and engineering problems [22, 95,96,97,98]. However, there are other PFEMs that can be used to evaluate classification models, and these are listed in Table 2. Similar to Table 1, Table 2 also lists some of the remarks and limitations pointed out by surveyed works. In this table, P denotes the number of real positives, N the number of real negatives, TP true positives, TN true negatives, FP false positives, and FN false negatives.
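The confusion matrix layout described above (rows for actual classes, columns for predicted classes) can be sketched for a binary problem as follows. The derived measures use their standard definitions and are shown only as examples; the labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FN, FP, TN) for a binary problem."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def confusion_matrix(actual, predicted, positive=1):
    """Rows are actual classes, columns are predicted classes."""
    tp, fn, fp, tn = confusion_counts(actual, predicted, positive)
    return [[tp, fn],
            [fp, tn]]


def basic_classification_metrics(actual, predicted, positive=1):
    tp, fn, fp, tn = confusion_counts(actual, predicted, positive)
    real_pos = tp + fn  # P: number of real positives
    real_neg = fp + tn  # N: number of real negatives
    return {
        "accuracy": (tp + tn) / (real_pos + real_neg),
        "sensitivity": tp / real_pos,
        "specificity": tn / real_neg,
        "precision": tp / (tp + fp),
    }


# Hypothetical labels (1 = positive, 0 = negative).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

matrix = confusion_matrix(actual, predicted)
scores = basic_classification_metrics(actual, predicted)
```

For these labels the matrix is [[3, 1], [1, 5]], giving an accuracy of 0.8 and a sensitivity and precision of 0.75 each.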
Closing Remarks
Our confidence in the accuracy of predictions obtained from ML algorithms heavily relies on the availability of actual observations and proper PFEMs. From this point of view, it is unfortunate that observations relating to the engineering discipline continue to be 1) limited in size, and 2) incomplete. The lack of such observations is often related to limitations in conducting full-scale tests, the need for specialized equipment, and a wide variety of tested samples. For instance, one can think of how normal strength concrete mixes can significantly vary from one study to another simply due to variation in raw materials, mix proportions, casting/curing procedures, etc.
Combining the above two points with the notion of simply “applying ML” to understand a given phenomenon (say flexural strength of beams) without a thorough validation is doomed to fail. In fact, in many instances, researchers established the validity of a specific ML model by reporting its performance against traditional PFEMs, only for it to be identified later that the model does not properly represent actual observations, despite having good fitness. This can be avoided by adopting a rigorous validation procedure [121, 122]. Unfortunately, many of the published studies in the area of ML application in engineering do not include multi-criteria/additional validation phases and simply rely on conventional performance metrics such as R or R² of the derived models. Furthermore, adopting a set of PFEMs does not rule out some common issues, most notably overfitting and bias. As such, an analysis that utilizes ML should also consider techniques such as the use of independent test datasets and varying degrees of cross-validation.
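As one possible illustration of the cross-validation mentioned above, the sketch below performs k-fold cross-validation around a simple least-squares line. The line fit is only a stand-in for an ML model, and the function names, fold scheme, and data are assumptions made for this example.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def k_fold_rmse(xs, ys, k=5, seed=0):
    """Return one held-out RMSE per fold; each point is held out exactly once."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], preds))
    return scores


# Hypothetical data: a linear trend with a small alternating disturbance.
xs = [float(i) for i in range(20)]
ys_clean = [2.0 * x + 1.0 for x in xs]
ys_noisy = [y + (0.5 if i % 2 else -0.5) for i, y in enumerate(ys_clean)]

clean_scores = k_fold_rmse(xs, ys_clean)
noisy_scores = k_fold_rmse(xs, ys_noisy)
```

Reporting the spread of the per-fold scores, rather than a single fitted value, gives a first indication of how the model behaves on data it has not seen.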
In order to ensure fruitful use of ML, it is our duty to seek proper application of ML. Besides, one of the major concerns about ML-based models is their robustness under a wide range of conditions [123]. A robust ML model should not only provide reasonable PFEMs but should also be capable of capturing the underlying physical mechanisms that govern the investigated system [124]. An essential approach to verify the robustness of ML models is to perform parametric and sensitivity analyses [123, 125]. These types of analyses ensure that the ML predictions are in sound agreement with the system’s real behavior and physical processes rather than being merely a combination of the variables with the best fit on the data. Another item to consider is the development of a user-friendly, phenomenon-specific recommendation system that suggests pre-identified PFEMs to novice users for evaluating model performance on a given problem (say using R² in a regression problem etc.).
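A parametric analysis of the kind described above can be sketched as a one-at-a-time sweep: one input is varied while the others are held at baseline values, and the resulting trend is compared against the expected physical behavior. The `toy_model` below is a hypothetical stand-in for a trained ML model and is not a real strength model; its inputs and coefficients are invented for illustration.

```python
def parametric_sweep(model, baseline, name, values):
    """Vary one input over `values` while holding the others at baseline."""
    results = []
    for value in values:
        inputs = dict(baseline)
        inputs[name] = value
        results.append((value, model(**inputs)))
    return results


def is_monotonic(results, increasing):
    """Check whether the swept outputs move consistently in one direction."""
    outputs = [y for _, y in results]
    pairs = list(zip(outputs, outputs[1:]))
    if increasing:
        return all(b >= a for a, b in pairs)
    return all(b <= a for a, b in pairs)


def toy_model(water_cement_ratio, age_days):
    """Hypothetical stand-in for a trained model (illustration only)."""
    return 60.0 / (1.0 + 2.0 * water_cement_ratio) * (age_days / (age_days + 4.0))


baseline = {"water_cement_ratio": 0.5, "age_days": 28.0}

ratio_sweep = parametric_sweep(toy_model, baseline, "water_cement_ratio",
                               [0.3, 0.4, 0.5, 0.6, 0.7])
age_sweep = parametric_sweep(toy_model, baseline, "age_days",
                             [3.0, 7.0, 14.0, 28.0, 56.0])

# Expected physical trends assumed here: strength falls as the water-cement
# ratio rises and grows with curing age.
ratio_trend_ok = is_monotonic(ratio_sweep, increasing=False)
age_trend_ok = is_monotonic(age_sweep, increasing=True)
```

A model that fits the data well but fails such trend checks would warrant closer scrutiny before being used for prediction.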
The reader should remember that adding a single example to showcase recommended or important PFEMs would defeat the purpose of this paper, which is to compile commonly used performance metrics and list their key characteristics in one document so that researchers interested in carrying out an ML analysis have a starting point for selecting proper performance metrics. Providing a comparison of all of the reviewed metrics would significantly extend this work beyond its scope and may not be feasible at the moment. We feel that such a comparison is best suited for a future series of more in-depth reviews wherein metrics for classification and regression problems are separately evaluated under well-designed problems and a variety of conditions to ensure fairness and unbiasedness.
It is our intention not to single out a specific measure (or a set of measures), given the wide range of problems (as well as the quality of data) that a scientist could face. Please note that other researchers (who are quoted herein) also followed a similar approach.
-
“Although some methods clearly perform better or worse than other methods on average, there is significant variability across the problems and metrics. Even the best models sometimes perform poorly, and models with poor average performance occasionally perform exceptionally well.” [126].
-
“It is clearly difficult to convincingly differentiate ML algorithms (and feature reduction techniques) on the basis of their achievable accuracy, recall and precision.” [127].
-
“Different performance metrics yield different tradeoffs that are appropriate in different settings. No one metric does it all, and the metric optimized to or used for model selection does matter.” [102].
Conclusions
Based on the information presented in this note, the following conclusions can be drawn.
-
ML is expected to become a key analysis tool in the coming few years, especially among materials scientists and structural engineers. As such, the integration of ML is to be thorough and proper; hence the need for a proper validation procedure.
-
A variety of performance and error metrics exists for regression and classification problems. This work recommends the utilization of multi-fitness criteria (where a series of metrics are checked on one problem) to ensure the validity of ML models, as these metrics may overcome some of the limitations of individual metrics. Such metrics can be independent of each other, such as R², RMSE, and the a20-index.
-
The performance of the existing metrics and future fitness functions can be further improved through systematic collaboration between researchers of interdisciplinary backgrounds. For example, efforts are invited to identify and recommend metrics suitable for specific problems and datasets.
-
Future works should be directed towards documenting and exploring performance metrics for other types of learning, such as unsupervised learning and reinforcement learning. This is an ongoing research need that is to be addressed in the coming years.
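The multi-fitness check recommended in the conclusions above can be sketched as follows. Several assumptions are made here: R² is taken as 1 − SS_res/SS_tot, the a20-index is taken as the fraction of points whose actual-to-predicted ratio falls within 0.80 to 1.20, and the acceptance limits are hypothetical values picked for illustration rather than recommended thresholds.

```python
import math


def r_squared(actual, predicted):
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def a20_index(actual, predicted):
    """Fraction of points with A/P in [0.80, 1.20] (assumed definition)."""
    within = sum(1 for a, p in zip(actual, predicted) if 0.8 <= a / p <= 1.2)
    return within / len(actual)


def multi_criteria_check(actual, predicted, min_r2, max_rmse, min_a20):
    """Return the metric values and whether every criterion is satisfied."""
    values = {
        "R2": r_squared(actual, predicted),
        "RMSE": rmse(actual, predicted),
        "a20": a20_index(actual, predicted),
    }
    passed = (values["R2"] >= min_r2
              and values["RMSE"] <= max_rmse
              and values["a20"] >= min_a20)
    return values, passed


# Hypothetical data and hypothetical acceptance limits (illustration only).
A = [10.0, 20.0, 30.0, 40.0, 50.0]
P_good = [12.0, 18.0, 33.0, 37.0, 52.0]
P_bad = [12.0, 18.0, 33.0, 37.0, 80.0]
limits = {"min_r2": 0.9, "max_rmse": 5.0, "min_a20": 0.9}

good_values, good_passed = multi_criteria_check(A, P_good, **limits)
bad_values, bad_passed = multi_criteria_check(A, P_bad, **limits)
```

With these limits the first set of predictions satisfies all three criteria, while the second fails because of a single poorly predicted point.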
Data Availability
No data, models, or code were generated or used during the study.
References
Mahdavi S, Rahnamayan S, Deb K (2018) Opposition based learning: A literature review. Swarm Evol Comput. https://doi.org/10.1016/j.swevo.2017.09.010
Botchkarev A (2019) A new typology design of performance metrics to measure errors in machine learning regression algorithms. Interdiscip J Information Knowledge Manag 14:045–076. https://doi.org/10.28945/4184
Bishop C (2007) Pattern Recognition and Machine Learning. Technometrics. https://doi.org/10.1198/tech.2007.s518
Fu G-S, Levin-Schwartz Y, Lin Q-H, Zhang D (2019) Machine Learning for Medical Imaging. J Healthc Eng. https://doi.org/10.1155/2019/9874591
Michalski RS, Carbonell JG, Mitchell TM (1983) Machine learning: An artificial intelligence approach
Majidifard H, Jahangiri B, Buttlar WG, Alavi AH (2019) New machine learning-based prediction models for fracture energy of asphalt mixtures. Meas J Int Meas Confed. https://doi.org/10.1016/j.measurement.2018.11.081
Hu X, Li SE, Yang Y (2016) Advanced Machine Learning Approach for Lithium-Ion Battery State Estimation in Electric Vehicles. IEEE Trans Transp Electrif. https://doi.org/10.1109/TTE.2015.2512237
Voyant C, Notton G, Kalogirou S, et al (2017) Machine learning methods for solar radiation forecasting: A review. Renew. Energy
Shukla R, Singh D (2017) Experimentation investigation of abrasive water jet machining parameters using Taguchi and Evolutionary optimization techniques. Swarm Evol Comput. https://doi.org/10.1016/j.swevo.2016.07.002
Hindman M (2015) Building Better Models: Prediction, Replication, and Machine Learning in the Social Sciences. Ann Am Acad Pol Soc Sci. https://doi.org/10.1177/0002716215570279
Grimmer J (2014) We are all social scientists now: How big data, machine learning, and causal inference work together. In: PS - Political Science and Politics
Naser M, Chehab A (2018) Materials and design concepts for space-resilient structures. Prog Aerosp Sci 98:74–90. https://doi.org/10.1016/j.paerosci.2018.03.004
Rashno A, Nazari B, Sadri S, Saraee M (2017) Effective pixel classification of Mars images based on ant colony optimization feature selection and extreme learning machine. Neurocomputing. https://doi.org/10.1016/j.neucom.2016.11.030
Jordan MI, Mitchell TM (2015) Machine learning: Trends, perspectives, and prospects. Science 349:255–260. https://doi.org/10.1126/science.aaa8415
Seitllari A (2014) Traffic Flow Simulation by Neuro-Fuzzy Approach. In: Second International Conference on Traffic. Belgrade, pp 97–102
Naser MZ (2019) AI-based cognitive framework for evaluating response of concrete structures in extreme conditions. Eng Appl Artif Intell 81:437–449. https://doi.org/10.1016/J.ENGAPPAI.2019.03.004
Li X, Qiao T, Pang Y et al (2018) A new machine vision real-time detection system for liquid impurities based on dynamic morphological characteristic analysis and machine learning. Meas J Int Meas Confed. https://doi.org/10.1016/j.measurement.2018.04.015
Oleaga I, Pardo C, Zulaika JJ, Bustillo A (2018) A machine-learning based solution for chatter prediction in heavy-duty milling machines. Meas J Int Meas Confed. https://doi.org/10.1016/j.measurement.2018.06.028
Shanmugamani R, Sadique M, Ramamoorthy B (2015) Detection and classification of surface defects of gun barrels using computer vision and machine learning. Meas J Int Meas Confed. https://doi.org/10.1016/j.measurement.2014.10.009
Naser MZ (2019) Properties and material models for common construction materials at elevated temperatures. Constr Build Mater 10:192–206. https://doi.org/10.1016/j.conbuildmat.2019.04.182
Raccuglia P, Elbert KC, Adler PDF et al (2016) Machine-learning-assisted materials discovery using failed experiments. Nature. https://doi.org/10.1038/nature17439
Alavi AH, Hasni H, Lajnef N et al (2016) Damage detection using self-powered wireless sensor data: An evolutionary approach. Meas J Int Meas Confed. https://doi.org/10.1016/j.measurement.2015.12.020
Farrar CR, Worden K (2012) Structural Health Monitoring: A Machine Learning Perspective
Mcfarlane C (2011) The city as a machine for learning. Trans Inst Br Geogr. https://doi.org/10.1111/j.1475-5661.2011.00430.x
Chan J, Chan K, Yeh A (2001) Detecting the nature of change in an urban environment: A comparison of machine learning algorithms. Photogramm. Eng. Remote Sensing
King DE (2009) Dlibml: A Machine Learning Toolkit. J Mach Learn Res
Collobert R, Kavukcuoglu K, Farabet C (2011) Torch7: A Matlab-like Environment for Machine Learning
Hall M, Frank E, Holmes G et al (2009) The WEKA data mining software. ACM SIGKDD Explor Newsl. https://doi.org/10.1145/1656274.1656278
Ramsundar B (2016) TensorFlow Tutorial. CS224d
Zaharia M, Franklin MJ, Ghodsi A et al (2016) Apache Spark. Commun ACM. https://doi.org/10.1145/2934664
Korolov M (2018) AI’s biggest risk factor: Data gone wrong | CIO. In: CIO. https://www.cio.com/article/3254693/ais-biggest-risk-factor-data-gone-wrong.html. Accessed 5 Jul 2019
Kodur VKR, Garlock M, Iwankiw N (2012) Structures in Fire: State-of-the-Art, Research and Training Needs. Fire Technol 48:825–839. https://doi.org/10.1007/s10694-011-0247-4
Naser MZ (2019) Fire Resistance Evaluation through Artificial Intelligence - A Case for Timber Structures. Fire Saf J 105:1–18. https://doi.org/10.1016/j.firesaf.2019.02.002
Domingos P (2012) A few useful things to know about machine learning. Commun ACM. https://doi.org/10.1145/2347736.2347755
Shakya AM, Kodur VKR (2015) Response of precast prestressed concrete hollowcore slabs under fire conditions. Eng Struct. https://doi.org/10.1016/j.engstruct.2015.01.018
Kodur VKR, Bhatt PP (2018) A numerical approach for modeling response of fiber reinforced polymer strengthened concrete slabs exposed to fire. Compos Struct 187:226–240. https://doi.org/10.1016/J.COMPSTRUCT.2017.12.051
Kohnke PC (2013) ANSYS. In: © ANSYS, Inc.
Abaqus 6.13 (2013) Abaqus 6.13. Anal User’s Guid Dassault Syst
Franssen JM, Gernay T (2017) Modeling structures in fire with SAFIR®: Theoretical background and capabilities. J Struct Fire Eng. https://doi.org/10.1108/JSFE-07-2016-0010
Golafshani EM, Behnood A (2018) Automatic regression methods for formulation of elastic modulus of recycled aggregate concrete. Appl Soft Comput J. https://doi.org/10.1016/j.asoc.2017.12.030
Sadowski Ł, Nikoo M, Nikoo M (2018) Concrete compressive strength prediction using the imperialist competitive algorithm. Comput Concr. https://doi.org/10.12989/cac.2018.22.4.355
Alavi AH, Gandomi AH, Sahab MG, Gandomi M (2010) Multi expression programming: A new approach to formulation of soil classification. Eng Comput 26:111–118. https://doi.org/10.1007/s00366-009-0140-7
Mirjalili S, Lewis A (2015) Novel performance metrics for robust multi-objective optimization algorithms. Swarm Evol Comput. https://doi.org/10.1016/j.swevo.2014.10.005
Mishra SK, Panda G, Majhi R (2014) A comparative performance assessment of a set of multiobjective algorithms for constrained portfolio assets selection. Swarm Evol Comput. https://doi.org/10.1016/j.swevo.2014.01.001
Schmidt MD, Lipson H (2010) Age-fitness pareto optimization
Cremonesi P, Koren Y, Turrin R (2010) Performance of Recommender Algorithms on Top-N Recommendation Tasks Categories and Subject Descriptors. RecSys
Laszczyk M, Myszkowski PB (2019) Survey of quality measures for multi-objective optimization: Construction of complementary set of multi-objective quality measures. Swarm Evol Comput 48:109–133. https://doi.org/10.1016/J.SWEVO.2019.04.001
Willmott CJ, Matsuura K (2005) Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim Res. https://doi.org/10.3354/cr030079
Makridakis S (1993) Accuracy measures: theoretical and practical concerns. Int J Forecast. https://doi.org/10.1016/0169-2070(93)90079-3
Ferreira C (2001) Gene Expression Programming: a New Adaptive Algorithm for Solving Problems. Complex Syst 13
(2016) Handbook of Time Series Analysis, Signal Processing, and Dynamics
Hyndman RJ, Koehler AB (2006) Another look at measures of forecast accuracy. Int J Forecast. https://doi.org/10.1016/j.ijforecast.2006.03.001
Shcherbakov MV, Brebels A, Shcherbakova NL et al (2013) A survey of forecast error measures. World Appl Sci J. https://doi.org/10.5829/idosi.wasj.2013.24.itmies.80032
Bain LJ (1967) Applied Regression Analysis. Technometrics. https://doi.org/10.1080/00401706.1967.10490452
Armstrong JS, Collopy F (1992) Error measures for generalizing about forecasting methods: Empirical comparisons. Int J Forecast. https://doi.org/10.1016/0169-2070(92)90008-W
Poli AA, Cirillo MC (1993) On the use of the normalized mean square error in evaluating dispersion model performance. Atmos Environ Part A, Gen Top. https://doi.org/10.1016/0960-1686(93)90410-Z
Smith G (1986) Probability and statistics in civil engineering. Collins, London
Golbraikh A, Shen M, Xiao Z et al (2003) Rational selection of training and test sets for the development of validated QSAR models. J Comput Aided Mol Des 17:241–253. https://doi.org/10.1023/A:1025386326946
Roy PP, Roy K (2008) On some aspects of variable selection for partial least squares regression models. QSAR Comb Sci 27:302–313. https://doi.org/10.1002/qsar.200710043
Frank I, Todeschini R (1994) The data analysis handbook
Gandomi AH, Yun GJ, Alavi AH (2013) An evolutionary approach for modeling of shear strength of RC deep beams. Mater Struct Constr. https://doi.org/10.1617/s11527-013-0039-z
Cheng MY, Firdausi PM, Prayogo D (2014) High-performance concrete compressive strength prediction using Genetic Weighted Pyramid Operation Tree (GWPOT). Eng Appl Artif Intell. https://doi.org/10.1016/j.engappai.2013.11.014
Alwanas AAH, Al-Musawi AA, Salih SQ et al (2019) Load-carrying capacity and mode failure simulation of beam-column joint connection: Application of self-tuning machine learning model. Eng Struct. https://doi.org/10.1016/j.engstruct.2019.05.048
Chou JS, Tsai CF, Pham AD, Lu YH (2014) Machine learning in concrete strength simulations: Multi-nation data analytics. Constr Build Mater. https://doi.org/10.1016/j.conbuildmat.2014.09.054
Sadat Hosseini A, Hajikarimi P, Gandomi M et al (2021) Genetic programming to formulate viscoelastic behavior of modified asphalt binder. Constr Build Mater. https://doi.org/10.1016/j.conbuildmat.2021.122954
Nguyen TT, Pham Duy H, Pham Thanh T, Vu HH (2020) Compressive Strength Evaluation of Fiber-Reinforced High-Strength Self-Compacting Concrete with Artificial Intelligence. Adv Civ Eng. https://doi.org/10.1155/2020/3012139
Sultana N, Zakir Hossain SM, Alam MS et al (2020) Soft computing approaches for comparative prediction of the mechanical properties of jute fiber reinforced concrete. Adv Eng Softw 149. https://doi.org/10.1016/j.advengsoft.2020.102887
Willmott CJ (1981) On the validation of models. Phys Geogr. https://doi.org/10.1080/02723646.1981.10642213
Nash JE, Sutcliffe JV (1970) River flow forecasting through conceptual models part I - A discussion of principles. J Hydrol. https://doi.org/10.1016/0022-1694(70)90255-6
Gupta HV, Kling H, Yilmaz KK, Martinez GF (2009) Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J Hydrol. https://doi.org/10.1016/j.jhydrol.2009.08.003
Knoben WJM, Freer JE, Woods RA (2019) Technical note: Inherent benchmark or not? Comparing Nash-Sutcliffe and Kling-Gupta efficiency scores. Hydrol Earth Syst Sci. https://doi.org/10.5194/hess-23-4323-2019
Cheng MY, Chou JS, Roy AFV, Wu YW (2012) High-performance Concrete Compressive Strength Prediction using Time-Weighted Evolutionary Fuzzy Support Vector Machines Inference Model. Autom Constr. https://doi.org/10.1016/j.autcon.2012.07.004
Yaseen ZM, Deo RC, Hilal A et al (2018) Predicting compressive strength of lightweight foamed concrete using extreme learning machine model. Adv Eng Softw. https://doi.org/10.1016/j.advengsoft.2017.09.004
Yang L, Qi C, Lin X et al (2019) Prediction of dynamic increase factor for steel fibre reinforced concrete using a hybrid artificial intelligence model. Eng Struct. https://doi.org/10.1016/j.engstruct.2019.03.105
Qi C, Fourie A, Chen Q (2018) Neural network and particle swarm optimization for predicting the unconfined compressive strength of cemented paste backfill. Constr Build Mater. https://doi.org/10.1016/j.conbuildmat.2017.11.006
Chou J-S, Chiu C-K, Farfoura M, Al-Taharwa I (2010) Optimizing the Prediction Accuracy of Concrete Compressive Strength Based on a Comparison of Data-Mining Techniques. J Comput Civ Eng. https://doi.org/10.1061/(asce)cp.1943-5487.0000088
Deepa C, SathiyaKumari K, Sudha VP (2010) Prediction of the Compressive Strength of High Performance Concrete Mix using Tree Based Modeling. Int J Comput Appl. https://doi.org/10.5120/1076-1406
Erdal HI (2013) Two-level and hybrid ensembles of decision trees for high performance concrete compressive strength prediction. Eng Appl Artif Intell. https://doi.org/10.1016/j.engappai.2013.03.014
Yan K, Shi C (2010) Prediction of elastic modulus of normal and high strength concrete by support vector machine. Constr Build Mater. https://doi.org/10.1016/j.conbuildmat.2010.01.006
Rafiei MH, Khushefati WH, Demirboga R, Adeli H (2017) Supervised Deep Restricted Boltzmann Machine for Estimation of Concrete. ACI Mater J 114. https://doi.org/10.14359/51689560
Yan K, Xu H, Shen G, Liu P (2013) Prediction of Splitting Tensile Strength from Cylinder Compressive Strength of Concrete by Support Vector Machine. Adv Mater Sci Eng. https://doi.org/10.1155/2013/597257
Anoop Krishnan NM, Mangalathu S, Smedskjaer MM et al (2018) Predicting the dissolution kinetics of silicate glasses using machine learning. J Non Cryst Solids. https://doi.org/10.1016/j.jnoncrysol.2018.02.023
Okuyucu H, Kurt A, Arcaklioglu E (2007) Artificial neural network application to the friction stir welding of aluminum plates. Mater Des. https://doi.org/10.1016/j.matdes.2005.06.003
Lim CH, Yoon YS, Kim JH (2004) Genetic algorithm in mix proportioning of high-performance concrete. Cem Concr Res. https://doi.org/10.1016/j.cemconres.2003.08.018
Haghdadi N, Zarei-Hanzaki A, Khalesian AR, Abedi HR (2013) Artificial neural network modeling to predict the hot deformation behavior of an A356 aluminum alloy. Mater Des. https://doi.org/10.1016/j.matdes.2012.12.082
Golafshani EM, Behnood A (2019) Estimating the optimal mix design of silica fume concrete using biogeography-based programming. Cem Concr Compos 96:95–105. https://doi.org/10.1016/J.CEMCONCOMP.2018.11.005
Naser MZ (2018) Deriving temperature-dependent material models for structural steel through artificial intelligence. Constr Build Mater 191:56–68. https://doi.org/10.1016/J.CONBUILDMAT.2018.09.186
Naser MZ (2019) Properties and material models for modern construction materials at elevated temperatures. Comput Mater Sci 160:16–29. https://doi.org/10.1016/J.COMMATSCI.2018.12.055
Mousavi SM, Aminian P, Gandomi AH et al (2012) A new predictive model for compressive strength of HPC using gene expression programming. Adv Eng Softw. https://doi.org/10.1016/j.advengsoft.2011.09.014
Gandomi AH, Alavi AH, Sahab MG (2010) New formulation for compressive strength of CFRP confined concrete cylinders using linear genetic programming. Mater Struct Constr. https://doi.org/10.1617/s11527-009-9559-y
Mollahasani A, Alavi AH, Gandomi AH (2011) Empirical modeling of plate load test moduli of soil via gene expression programming. Comput Geotech. https://doi.org/10.1016/j.compgeo.2010.11.008
Erdal HI, Karakurt O, Namli E (2013) High performance concrete compressive strength forecasting using ensemble models based on discrete wavelet transform. Eng Appl Artif Intell. https://doi.org/10.1016/j.engappai.2012.10.014
Pang B, Lee L, Vaithyanathan S (2002) Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 conference on Empirical methods in natural language processing - EMNLP ’02
Galdi P, Tagliaferri R (2017) Data Mining: Accuracy and Error Measures for Classification and Prediction. In: Encyclopedia of Bioinformatics and Computational Biology
Valença J, Gonçalves LMS, Júlio E (2013) Damage assessment on concrete surfaces using multi-spectral image analysis. Constr Build Mater. https://doi.org/10.1016/j.conbuildmat.2012.11.061
Huang H, Burton HV (2019) Classification of in-plane failure modes for reinforced concrete frames with infills using machine learning. J Build Eng. https://doi.org/10.1016/j.jobe.2019.100767
Azimi SM, Britz D, Engstler M et al (2018) Advanced steel microstructural classification by deep learning methods. Sci Rep. https://doi.org/10.1038/s41598-018-20037-5
Hore S, Chatterjee S, Sarkar S et al (2016) Neural-based prediction of structural failure of multistoried RC buildings. Struct Eng Mech. https://doi.org/10.12989/sem.2016.58.3.459
Bhowan U, Johnston M, Zhang M (2012) Developing new fitness functions in genetic programming for classification with unbalanced data. IEEE Trans Syst Man, Cybern Part B Cybern. https://doi.org/10.1109/TSMCB.2011.2167144
Boughorbel S, Jarray F, El-Anbari M (2017) Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PLoS ONE. https://doi.org/10.1371/journal.pone.0177678
Tharwat A (2018) Classification assessment methods. Appl Comput Informatics
Caruana R, Niculescu-Mizil A (2004) Data mining in metric space: an empirical analysis of supervised learning performance criteria. In: KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Jurman G, Riccadonna S, Furlanello C (2012) A comparison of MCC and CEN error measures in multi-class prediction. PLoS ONE. https://doi.org/10.1371/journal.pone.0041882
Powers DMW (2011) Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation. J Mach Learn Technol. 10.1.1.214.9232
Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. https://doi.org/10.1016/S0031-3203(96)00142-2
Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. https://doi.org/10.1148/radiology.143.1.7063747
Zhang Y, Burton HV, Sun H, Shokrabadi M (2018) A machine learning framework for assessing post-earthquake structural safety. Struct Saf. https://doi.org/10.1016/j.strusafe.2017.12.001
Davis J, Goadrich M (2006) The relationship between Precision-Recall and ROC curves. In: Proceedings of the 23rd international conference on Machine learning - ICML ’06
Bi J, Bennett KP (2003) Regression Error Characteristic Curves. Proc Twent Int Conf Mach Learn
Zhang M, Smart W (2006) Using Gaussian distribution to construct fitness functions in genetic programming for multiclass object classification. Pattern Recognit Lett. https://doi.org/10.1016/j.patrec.2005.07.024
Kocher M, Savoy J (2017) Distance measures in author profiling. Inf Process Manag. https://doi.org/10.1016/j.ipm.2017.04.004
Patel BV (2012) Content Based Video Retrieval Systems. Int J UbiComp. https://doi.org/10.5121/iju.2012.3202
Giusti R, Batista GEAPA (2013) An empirical comparison of dissimilarity measures for time series classification. In: Proceedings - 2013 Brazilian Conference on Intelligent Systems, BRACIS 2013
Vuk M, Curk T (2006) ROC Curve, Lift Chart and Calibration Plot. Metod Zv
Brodersen KH, Ong CS, Stephan KE, Buhmann JM (2010) The balanced accuracy and its posterior distribution. In: Proceedings - International Conference on Pattern Recognition
Cohen J (1960) A Coefficient of Agreement for Nominal Scales. Educ Psychol Meas. https://doi.org/10.1177/001316446002000104
Artstein R, Poesio M (2008) Inter-coder agreement for computational linguistics. Comput Linguist
Destercke S (2014) Multilabel Prediction with Probability Sets: The Hamming Loss Case. In: Communications in Computer and Information Science
Czajkowski M, Kretowski M (2019) Decision Tree Underfitting in Mining of Gene Expression Data. An Evolutionary Multi-Test Tree Approach. Expert Syst Appl. https://doi.org/10.1016/J.ESWA.2019.07.019
Devarriya D, Gulati C, Mansharamani V et al (2019) Unbalanced Breast Cancer Data Classification Using Novel Fitness Functions in Genetic Programming. Expert Syst Appl 112866. https://doi.org/10.1016/J.ESWA.2019.112866
Bhaskar H, Hoyle DC, Singh S (2006) Machine learning in bioinformatics: A brief survey and recommendations for practitioners. Comput Biol Med. https://doi.org/10.1016/j.compbiomed.2005.09.002
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. Proc 14th Int Jt Conf Artif Intell - Vol 2
Alavi AH, Gandomi AH (2011) Prediction of principal ground-motion parameters using a hybrid method coupling artificial neural networks and simulated annealing. Comput Struct. https://doi.org/10.1016/j.compstruc.2011.08.019
Kingston GB, Maier HR, Lambert MF (2005) Calibration and validation of neural networks to ensure physically plausible hydrological modeling. J Hydrol. https://doi.org/10.1016/j.jhydrol.2005.03.013
Kuo YL, Jaksa MB, Lyamin AV, Kaggwa WS (2009) ANN-based model for predicting the bearing capacity of strip footing on multi-layered cohesive soil. Comput Geotech. https://doi.org/10.1016/j.compgeo.2008.07.002
Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. In: ACM International Conference Proceeding Series. ACM Press, New York, USA, pp 161–168
Williams N, Zander S, Armitage G (2006) A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification. Comput Commun Rev. https://doi.org/10.1145/1163593.1163596
Ethics declarations
Conflict of interest
None.
Cite this article
Naser, M.Z., Alavi, A.H. Error Metrics and Performance Fitness Indicators for Artificial Intelligence and Machine Learning in Engineering and Sciences. Archit. Struct. Constr. 3, 499–517 (2023). https://doi.org/10.1007/s44150-021-00015-8