Abstract
Artificial intelligence (AI) has been an important topic in research as well as in industry since its birth in the 1950s; research on early approaches to machine learning has in fact been going on since the 1940s. Still, the question often arises as to what is actually meant by AI, especially in practice. In this chapter, we give a brief introduction to artificial intelligence and, more specifically, machine learning (ML). We briefly summarise the history of artificial intelligence and machine learning, introduce supervised learning, unsupervised learning and reinforcement learning as the three main types of ML algorithms, and discuss how to measure the quality of machine learning algorithms. For the interested reader, the appendix of the chapter includes a brief description of artificial neural networks and machine learning metrics.
References
Athalye, A., Engstrom, L., Ilyas, A., & Kwok, K. (2017). Fooling neural networks in the physical world with 3D adversarial objects. https://www.labsix.org/physical-objects-that-fool-neural-nets/
Bishop, C. (2006). Pattern recognition and machine learning. Springer-Verlag New York.
Brynjolfsson, E., & McAfee, A. (2017). The business of artificial intelligence. Harvard Business Review. https://hbr.org/cover-story/2017/07/the-business-of-artificial-intelligence
Buxmann, P., & Schmidt, H. (2019). Künstliche Intelligenz. Springer.
Carbonell, J. G., Boggs, W. M., Mauldin, M. L., & Anick, P. G. (1983). The XCALIBUR project: A natural language interface to expert systems. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI'83), Morgan Kaufmann Publishers Inc., 653–656.
Chaib-Draa, B., Moulin, B., Mandiau, R., & Millot, P. (1992). Trends in distributed artificial intelligence. Artificial Intelligence Review, 6(1), 35–66. https://doi.org/10.1007/BF00155579
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), 1, IEEE, 886-889. https://doi.org/10.1109/CVPR.2005.177.
Dartmouth College. (1956). Summer research project on artificial intelligence. Hanover, NH, USA.
Faraj, S., Pachidi, S., & Sayegh, K. (2018). Working and organizing in the age of the learning algorithm. Information and Organization, 28(1), 62–70. https://doi.org/10.1016/j.infoandorg.2018.02.005
Fawcett, T., & Provost, F. (1997). Adaptive fraud detection. Data Mining and Knowledge Discovery, 1(3), 291–316. https://doi.org/10.1023/A:1009700419189
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874. https://doi.org/10.1016/j.patrec.2005.10.010
Franklin, S., & Graesser, A. C. (1997). Is it an agent, or just a program? A taxonomy for autonomous agents. In Intelligent agents III. Lecture notes in artificial intelligence (pp. 21–35). Springer-Verlag.
Goertzel, B. (2010). Toward a formal characterization of real-world general intelligence. Proceedings of the 3rd Conference on Artificial General Intelligence (AGI). Atlantis Press, 74-79. https://doi.org/10.2991/agi.2010.17.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Gouriveau, R., Medjaher, K., & Zerhouni, N. (2016). From prognostics and health systems management to predictive maintenance 1: Monitoring and prognostics. Wiley.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer-Verlag New York.
Hess, T., Legner, C., Esswein, W., Maaß, W., Matt, C., Österle, H., Schlieter, H., Richter, P., & Zarnekow, R. (2014). Digital life as a topic of business and information systems engineering? Business & Information Systems Engineering, 6(4), 247–253. https://doi.org/10.1007/s12599-014-0332-6
Hyndman, R., & Koehler, A. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679–688. https://doi.org/10.1016/j.ijforecast.2006.03.001
Kononenko, I. (2001). Machine learning for medical diagnosis: History, state of the art and perspective. Artificial Intelligence in Medicine, 23(1), 89–109. https://doi.org/10.1016/S0933-3657(01)00077-X
Korf, R. E. (1997). Does deep blue use artificial intelligence? ICGA Journal, 20(4), 243–245.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12), Curran Associates Inc (pp. 1097–1105). https://doi.org/10.1145/3065386
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. https://doi.org/10.1109/5.726791
Litjens, G., Kooi, T., Ehteshami Bejnordi, B., Setio, A. A. A., Ciompi, F., Ghafoorian, M., Van Der Laak, J. A., Van Ginneken, B., & Sánchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42, 60–88. https://doi.org/10.1016/j.media.2017.07.005
Manhart, K. (2017). Eine kleine Geschichte der Künstlichen Intelligenz. http://www.cowo.de/a/3330537
Marsland, S. (2014). Machine learning: An algorithmic perspective. Taylor & Francis, Inc.
Mertens, P. (1985). Künstliche Intelligenz in der Betriebswirtschaft. In D. Ohse, A. C. Esprester, H.-U. Küpper, P. Stähly, & H. Steckhan (Eds.), DGOR. Operations research proceedings (pp. 285–292). Springer. https://doi.org/10.1007/978-3-642-70457-4_71
Mitchell, T. M. (1997). Machine learning. McGraw-Hill.
Murphy, K. P. (2012). Machine learning: A probabilistic perspective. The MIT Press.
Newell, A., & Simon, H. (1958). Heuristic problem solving: The next advance in operations research. Operations Research, 6(1).
Nilsson, N. J. (2014). Principles of artificial intelligence. Tioga Press.
Odagiri, H., Nakamura, Y., & Shibuya, M. (1997). Research consortia as a vehicle for basic research: The case of a fifth generation computer project in Japan. Research Policy, 26(2), 191–207.
Pennachin, C., & Goertzel, B. (2007). Contemporary approaches to artificial general intelligence. In B. Goertzel & C. Pennachin (Eds.), Artificial General Intelligence (pp. 1–30). Springer. https://doi.org/10.1007/978-3-540-68677-4_1
Perrault, R., Shoham, Y., Brynjolfsson, E., Clark, J., Etchemendy, J., Grosz, B., Lyons, T., Manyika, J., Mishra, S., & Niebles, J. C. (2019). The AI index 2019 annual report. AI Index Steering Committee, Human-Centered AI Institute, Stanford University.
Pfitzner, D., Leibbrandt, R., & Powers, D. (2009). Characterization and evaluation of similarity measures for pairs of clusterings. Knowledge and Information Systems, 19(3), 361–394. https://doi.org/10.1007/s10115-008-0150-6
Polanyi, M. (1966). The tacit dimension. Peter Smith.
Powers, D. (2011). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. Journal of Machine Learning Technologies, 2(1), 37–63. https://doi.org/10.48550/arXiv.2010.16061
Reinsel, D., Gantz, J., & Rydning, J. (2018). The digitization of the world: From edge to core. IDC White Paper. https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
Rey, G. D., & Wender, K. F. (2018). Neuronale Netze: Eine Einführung in die Grundlagen, Anwendungen und Datenauswertung. Hogrefe Verlag GmbH & Co. KG.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536. https://doi.org/10.1038/323533a0
Russell, S. J., & Norvig, P. (2010). Artificial intelligence: A modern approach. Prentice Hall International, Inc.
Saul, L. K., & Roweis, S. T. (2003). Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4, 119–155.
Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417–457. https://doi.org/10.1017/S0140525X00005756
Shortliffe, E. H., Davis, R., Axline, S. G., Buchanan, B. G., Green, C. C., & Cohen, S. N. (1975). Computer-based consultations in clinical therapeutics: Explanation and rule acquisition capabilities of the MYCIN system. Computers and Biomedical Research, 8, 303–320.
Zack, K. (2016). Sheepdog or mop? Karen Zack via Twitter. https://twitter.com/teenybiscuit/status/707670947830968320/photo/1
Appendix
1.1 Artificial Neural Networks
The basic idea behind the development of artificial neural networks (ANNs) is to simulate the (human) brain. In general, an ANN consists of nodes (neurons) and edges (synapses). As the following figure shows, three types of neurons, also called units, are distinguished (Goodfellow et al., 2016; Rey & Wender, 2018):
- Input units receive the input data, for example pixels in an image recognition algorithm or blood values when diagnosing diseases. Input units are denoted by x in Fig. 4.
- Hidden units are located between the input and output units and thus represent the inner layers of an ANN. They can be arranged in several consecutive layers and are denoted by h1…hn in Fig. 4.
- Output units contain the output data, for example the classification "dog" or "cat" in an algorithm for the recognition of animals. These are denoted by y in Fig. 4.
A simple neural network contains only one hidden layer, which is often already sufficient for many applications. Deep neural networks have multiple hidden layers; the necessary or suitable number of layers and neurons depends on the individual application.
As the figure shows, the neurons are connected by edges, drawn as arrows. If we denote two neurons by i and j respectively, w_ij expresses the weight along the edge between i and j (Fig. 5).
Ultimately, the acquired knowledge of an ANN is stored in these weights, which can easily be written as matrices (Fig. 6).
The input that one neuron receives from others depends on the output of the sending neuron(s) and the weights along the edges. If Output_i denotes the activity level of a sending neuron i, then the input that a neuron j receives can be expressed as the sum over the weighted outputs of the neurons feeding it, adjusted with a bias offset value b_j:

$$\mathrm{Input}_j = \sum_i w_{ij} \cdot \mathrm{Output}_i + b_j$$
The output of a neuron is based on its input and an activation function a. Various function types are conceivable for this activation function; in the simplest case it is linear:

$$\mathrm{Output}_j = a(\mathrm{Input}_j)$$
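The two equations above can be sketched in a few lines of NumPy. The layer sizes, weights and input values below are arbitrary assumptions chosen for illustration, not part of the chapter:

```python
import numpy as np

def layer_output(x, W, b, a=np.tanh):
    """Output of one layer: activation a applied to the weighted input W @ x + b."""
    return a(W @ x + b)

# Toy network: 3 input units, 2 hidden units, 1 output unit (sizes chosen arbitrarily)
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(2, 3))  # weight matrix w_ij between input and hidden layer
b_hidden = np.zeros(2)              # bias offset b_j of each hidden unit
W_out = rng.normal(size=(1, 2))     # weights between hidden and output layer
b_out = np.zeros(1)

x = np.array([0.5, -1.0, 2.0])                    # input units x
h = layer_output(x, W_hidden, b_hidden)           # hidden units h1, h2
y = layer_output(h, W_out, b_out, a=lambda z: z)  # linear activation at the output
```

Note how the weights of each layer form a matrix, as described above: each row collects the weights w_ij feeding one neuron of the next layer.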
The weights represent the knowledge of the ANN and are modified based on learning rules. For example, when applying a supervised learning algorithm, the weights are adjusted based on the training data. The most common procedure today is probably the so-called backpropagation method. Put simply, it works in such a way that errors at the output layer are proportionately attributed to the error contributions of the hidden units involved, and the weights are iteratively adjusted (Rumelhart et al., 1986).
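To make the idea of adjusting weights from training data concrete, here is a minimal sketch for the simplest possible case: a single linear neuron trained by full-batch gradient descent on the squared error. The data, learning rate and iteration count are arbitrary choices for illustration; full backpropagation additionally distributes the output error across the hidden layers:

```python
import numpy as np

# Training data: targets generated from a known linear rule t = 3*x1 - 2*x2 + 0.5,
# so we can check whether learning recovers these weights.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
t = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(200):
    y = X @ w + b                       # forward pass (linear activation)
    error = y - t                       # prediction error per training sample
    w -= lr * (X.T @ error) / len(X)    # gradient of the mean squared error w.r.t. w
    b -= lr * error.mean()              # gradient w.r.t. the bias offset

# After training, w is close to [3, -2] and b close to 0.5: the weights now
# encode the knowledge contained in the training data.
```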
1.2 Machine Learning Metrics
Different metrics exist to measure the quality of machine learning approaches, often depending on the type of approach that is used, e.g. classification, regression or deep learning. Note that a metric is different from a loss function. A loss function maps one or several variables to a real number and is often used as an objective function in mathematical optimisation. While metrics are usually used to measure the performance of a trained model, a loss function is used to train a machine learning approach.
1.2.1 Classification Metrics
For classification problems several metrics exist, including accuracy, precision, recall and the F1 score. They can all be computed based on the confusion matrix (CM; see Fig. 3).
1.2.1.1 Classification Accuracy
Classification accuracy is computed as the ratio of the number of correct predictions to the total number of input samples. While it is a simple metric, it is problematic when the costs of one type of misclassification are very high: if a patient is wrongly classified as non-cancerous, for example, this can have fatal consequences.
In general, accuracy can be computed as:

$$\mathrm{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}}$$
With respect to the CM, accuracy can be computed by summing the values on the main diagonal:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
1.2.1.2 Detection Rate
The detection rate gives the percentage of correctly predicted positives (1s) with respect to the total number of predictions:

$$\mathrm{Detection\ rate} = \frac{TP}{TP + TN + FP + FN}$$
1.2.1.3 Precision
The precision, or positive predictive value, gives the percentage of correctly predicted 1s with respect to all predicted 1s:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
1.2.1.4 Recall
The recall score measures the percentage of correctly predicted 1s with respect to all actual 1s. It is also called sensitivity or true positive rate:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
1.2.1.5 Specificity
The specificity is also called the true negative rate. It determines the percentage of all 0s that were correctly predicted:

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
1.2.1.6 Balanced Accuracy
The balanced accuracy is computed as the mean of recall and specificity and therefore balances the percentages of correctly predicted 1s and 0s:

$$\mathrm{Balanced\ accuracy} = \frac{\mathrm{Recall} + \mathrm{Specificity}}{2}$$
1.2.1.7 F1 Score
The F scores combine the precision and recall metrics. In general, the F score for a value β can be computed as:

$$F_\beta = (1 + \beta^2) \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}$$
In the special case β = 1, the F1 score is the harmonic mean of precision and recall. Its range is [0, 1]; the greater the F1 score, the better the performance of the model. It can be computed as:

$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
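All of the classification metrics above derive from the four confusion-matrix counts. The following sketch (plain Python; the example counts are invented for illustration) puts them side by side:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the classification metrics above from the confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity, true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    return {
        "accuracy": (tp + tn) / total,
        "detection_rate": tp / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Example confusion matrix: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives -> accuracy 0.85, precision 0.8.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```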
1.2.2 Regression Metrics
Typical regression metrics are the mean absolute error and the mean squared error.
1.2.2.1 Mean Absolute Error
The mean absolute error is the average of the absolute differences between the original values v_i and the predicted values w_i. It expresses how far the predictions were from the actual values; however, it does not give the direction of the error, i.e. whether the data was over- or under-predicted. With N denoting the number of values, it can be computed as:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |v_i - w_i|$$
1.2.2.2 Mean Squared Error
The mean squared error is similar to the mean absolute error; the only difference is that it averages the squares of the differences between the original and the predicted values. Because the errors are squared, larger errors become more dominant compared to smaller ones, so the mean squared error places the focus on larger errors:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (v_i - w_i)^2$$
The root mean squared error takes the square root of the average of the squared differences between the original and the predicted values and is therefore also sensitive to outliers:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (v_i - w_i)^2}$$
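The three regression metrics can be sketched in a few lines of plain Python; the sample values v and w below are invented for illustration:

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of the absolute differences |v_i - w_i|."""
    return sum(abs(v - w) for v, w in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error: average squared difference; penalises large errors."""
    return sum((v - w) ** 2 for v, w in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

v = [3.0, 5.0, 2.5, 7.0]  # original values v_i
w = [2.5, 5.0, 4.0, 8.0]  # predicted values w_i
# mae: (0.5 + 0 + 1.5 + 1) / 4 = 0.75; mse: (0.25 + 0 + 2.25 + 1) / 4 = 0.875
```

Note how the single error of 1.5 contributes 50% of the total absolute error but over 64% of the total squared error, illustrating the dominance of larger errors in the MSE.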
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Reuter-Oppermann, M., & Buxmann, P. (2022). Introduction into artificial intelligence and machine learning. In T. Reinhold & N. Schörnig (Eds.), Armament, arms control and artificial intelligence. Studies in peace and security. Springer, Cham. https://doi.org/10.1007/978-3-031-11043-6_2
Print ISBN: 978-3-031-11042-9
Online ISBN: 978-3-031-11043-6