
1 Introduction

In the summer of 2015 we will commemorate the 50th anniversary of the theory of Fuzzy Sets and Systems (FSS). This is a fuzzy jubilee because the exact time when Lotfi A. Zadeh (born 1921, Fig. 1 (a)) discovered the concept of fuzzy sets is unknown. The event is roughly assigned to the summer of 1965. This chapter reviews the genesis of this new mathematical theory following my original (historical) research work that encompasses inspections of scientific articles, newspapers, letters, and – most importantly – interviews with Lotfi A. Zadeh and other protagonists of the early years of the theory of FSS. For details see [1].

In Sects. 2 and 3 we briefly follow Zadeh’s development as an electrical engineer in the 1950s and 1960s from concrete network theory and filter theory to abstract system theory, and then to the theory of Fuzzy Sets and Systems as a generalized system theory. In Sect. 4, we present brief views of the historical paths of modern logic and the philosophy of mathematics in the 20th century, the logical analysis of vagueness, and the concepts of statistical metrics and ensembles flous, and we consider Wittgenstein’s late philosophy in relation to our subject. Section 5 reviews Zadeh’s work on fuzzy sets in language and meaning that appeared before the “Fuzzy Boom” of real-world application systems, which is the subject of Sect. 6. Section 7 gives an outlook on Zadeh’s later theories: Computing with Words and Computing with Perceptions. Finally, the bibliography includes comments on most of the references of this book contribution.

2 From Electrical Engineering to System Theory

Fuzzy Sets and Systems can look back upon an eventful story in the scientific environment of electrical engineering, including the initial system theory and computer sciences known during this time, which were part of Zadeh’s training as a student in Tehran, Iran. Following his immigration to the USA in 1942, Zadeh continued his studies at the Massachusetts Institute of Technology (MIT) in Cambridge, Massachusetts. He moved to New York in 1946, where he was awarded a Ph.D. by Columbia University in 1949. Since 1958, he has been a Professor of Electrical Engineering at the University of California at Berkeley. When he established the theory of fuzzy sets in the mid-1960s, he was already a well-known protagonist of the system theoretical approach in electrical engineering, which was a new scientific trend from the 1950s onward. Together with Charles A. Desoer (1926–2011, Fig. 1 (b)) he published in 1962 Linear System Theory: The State Space Approach [2], which became a standard textbook. In 1963, together with Desoer’s former Ph.D. student Elijah Polak (born 1931, Fig. 1 (c)) he edited the volume System Theory [3]. In his own contribution to this volume, which was entitled “The Concepts of System, Aggregate, and State in System Theory” [4], Zadeh presented his state space approach. Two years later, when he introduced fuzzy sets, he construed his new theory as a “general system theory”.

In 1954 – Zadeh was then an instructor at Columbia University in New York – he wrote an article entitled “System Theory” [5] for the Columbia Engineering Quarterly, which begins as follows: “If you never heard of system theory, you need not feel like an ignoramus. It is not one of the well-established branches of science. In fact, it has not yet been officially recognized as a scientific discipline. It does not appear on programmes of meetings of scientific societies nor in indices to scientific publications. It does not have well-defined boundaries, nor does it have settled objectives.” [5, p. 34]. Zadeh emphasized that all scientific disciplines are concerned with systems, but that the new branch, named system theory, considers systems as mathematical constructs rather than physical objects: “The distinguishing characteristic of system theory is its abstractness.” [5, p. 16] In this and later papers Zadeh quoted the definition of a “system” from Webster’s Dictionary: A system is “an aggregation or assemblage of objects united by some form of interaction or interdependence.” (Fig. 2)

System theorists deal with abstract systems, “that is, systems whose elements have no particular physical identity” [5, p. 16]; they deal with “black boxes”. Fig. 3 reproduces the illustration to this article – actually a “black box”!

Communication systems are a special type of system that has been of interest since the 1950s, when information and communication theory emerged as successful scientific and technological disciplines. Zadeh was deeply involved in the development of this new communication theory and its techniques when he delivered a lecture on “Some Basic Problems in Communication of Information” at the meeting of the Section of Mathematics and Engineering of the New York Academy of Sciences in March 1952 [6]. He represented signals as ordered pairs \((x(t), y(t))\) of points in a signal space \(\varSigma \), which is embedded in a function space with a delta-function basis. This analogy between projection in a function space and filtration by an ideal filter led Zadeh to postulate a functional symbolism of filters in the early 1950s [7]. Thus, \(N=N_1 + N_2\) represents a filter consisting of two filters connected by addition, \(N=N_1 N_2\) represents their tandem (sequential) combination and \(N=N_1 | N_2\) the separation process (Fig. 4).
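Zadeh’s functional symbolism becomes tangible if filters are read as operators acting on sampled signals, a reading consistent with Fig. 4. The following sketch is only an illustration of that reading: the two operators, the concrete smoothing and scaling operations, and the test signal are assumptions, and the separation operation \(N_1 | N_2\) is not modelled.

```python
def N1(signal):
    # Illustrative smoothing operator (3-point moving average).
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0 for i in range(len(signal))]

def N2(signal):
    # Illustrative scaling operator.
    return [0.5 * s for s in signal]

x = [0.0, 1.0, 4.0, 1.0, 0.0, -2.0]   # a short sample signal (invented values)

# N = N1 + N2: two filters connected "by addition" -- their outputs are summed.
y_additive = [a + b for a, b in zip(N1(x), N2(x))]

# N = N1 N2: tandem (sequential) combination -- composition of the two operators.
y_tandem = N1(N2(x))

print(y_additive)
print(y_tandem)
```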

Fig. 1

Lotfi A. Zadeh, Charles A. Desoer, and Elijah Polak: colleagues at the University of California, Berkeley

Fig. 2

Block diagram of interconnected objects, [5, p. 17]

Later (in [8]), Zadeh discussed the concept of optimal filters as opposed to ideal filters, following Norbert Wiener’s work. Ideal filters are defined as filters that achieve a perfect separation of signal and noise. However, in reality there are no ideal filters; e.g., an ideal low-pass filter should retain all frequency components below a certain cutoff frequency and fully suppress all components above it. In practice, we cannot have such a step-shaped separation. Zadeh knew that the transmission coefficient actually makes a smooth transition from \(1\) to \(0\) as the frequency increases. These transitions are similar to the well-known membership functions of fuzzy sets. However, in the 1950s the time was not ripe for this new mathematical theory. Zadeh defined optimal filters as those that give the “best approximation” of a signal, and he noted that “best approximations” depend on reasonable criteria. At this time he formulated these criteria in statistical terms.
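The resemblance noted here can be made concrete with a small numerical sketch: an ideal low-pass filter has a step-shaped transmission coefficient, while a realizable filter shows a gradual transition that looks very much like a membership function. The cutoff frequency and the Butterworth-type magnitude used below are illustrative assumptions, not taken from [8].

```python
import math

f_c = 1.0      # assumed cutoff frequency (arbitrary units)
order = 4      # assumed filter order

def ideal_lowpass(freq):
    # Ideal filter: perfect separation -- transmission 1 below the cutoff, 0 above it.
    return 1.0 if freq <= f_c else 0.0

def smooth_lowpass(freq):
    # Realizable filter (Butterworth-type magnitude): the transmission coefficient
    # falls gradually from 1 to 0 as the frequency increases -- a curve shaped
    # like a membership function rather than a step.
    return 1.0 / math.sqrt(1.0 + (freq / f_c) ** (2 * order))

for freq in (0.5, 1.0, 2.0, 4.0):
    print(freq, ideal_lowpass(freq), round(smooth_lowpass(freq), 3))
# approximately: 0.998, 0.707, 0.062, 0.004 -- a gradual fall-off,
# whereas the ideal filter jumps abruptly from 1 to 0 at the cutoff.
```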

Fig. 3

Illustration from Zadeh’s article [5]

Fig. 4

Functional symbolism of ideal filters, [7, p. 225]

At the start of the next decade he wrote the landmark article “From Circuit Theory to System Theory” for the anniversary edition of the Proceedings of the IRE, which appeared in May 1962 to mark the 50th year of the Institute of Radio Engineers (IRE) [9]. Here he outlined problems and applications of system theory and its relations to network theory, control theory, and information theory. Furthermore, he pointed out “that the same abstract ‘systems’ notions are operating in various guises in many unrelated fields of science is a relatively recent development. It has been brought about, largely within the past two decades, by the great progress in our understanding of the behaviour of both inanimate and animate systems—progress which resulted on the one hand from a vast expansion in the scientific and technological activities directed toward the development of highly complex systems for such purposes as automatic control, pattern recognition, data-processing, communication, and machine computation, and, on the other hand, by attempts at quantitative analyses of the extremely complex animate and man-machine systems which are encountered in biology, neurophysiology, econometrics, operations research and other fields” [9, p. 856f].

In this article Zadeh used the word “fuzzy” for the very first time, using it to characterize his vision of a new mathematics:

In fact, there is a fairly wide gap between what might be regarded as ‘animate’ system theorists and ‘inanimate’ system theorists at the present time, and it is not at all certain that this gap will be narrowed, much less closed, in the near future. There are some who feel that this gap reflects the fundamental inadequacy of the conventional mathematics – the mathematics of precisely-defined points, functions, sets, probability measures, etc. – for coping with the analysis of biological systems, and that to deal effectively with such systems, which are generally orders of magnitude more complex than man-made systems, we need a radically different kind of mathematics, the mathematics of fuzzy or cloudy quantities which are not describable in terms of probability distributions. [9, p. 857f].

However, when Zadeh published these notions, he did not know what this mathematics of fuzzy quantities would look like.

Another method to deal with imperfect or noisy signals in communication systems was introduced in the 1950s by Richard E. Bellman (1920–1984, Fig. 5 (a)), a young mathematician working at the RAND Corporation, United States Air Force Project in Santa Monica, California. Bellman was the founder of the method of Dynamic Programming [10], and tried to apply his “principle of optimality” in communication theory.

In the late 1950s, Bellman met Zadeh in New York, where Zadeh worked at Columbia University. Their friendship lasted until Bellman’s death in 1984. Even though they considered diverse mathematical aspects of electrical engineering, system theory and, later, computer science, they met each other very often and discussed several aspects of their scientific work. Bellman was the first and most important critic of Zadeh’s new theory of fuzzy sets in 1965.

Fig. 5

(a): Richard E. Bellman and (b): Robert E. Kalaba, (c): Lotfi A. Zadeh

3 New View on System Theory: Fuzzy Sets and Systems

Zadeh and Bellman planned to work together at RAND in Santa Monica, California, in the summer of 1964. Prior to the summer of 1964, Zadeh gave a talk on pattern recognition at the Wright-Patterson Air Force Base, Dayton, Ohio. It may have been on this occasion that he started thinking about the use of grades of membership for pattern classification and that he conceived the first example of fuzzy mathematics, which he wrote in one of his first papers on the subject: “For example, suppose that we are concerned with devising a test for differentiating between handwritten letters \(O\) and \(D\). One approach to this problem would be to give a set of handwritten letters and indicate their grades of membership in the fuzzy sets \(O\) and \(D\). On performing abstraction on these samples, one obtains the estimates \(\tilde{\mu }_O\) and \(\tilde{\mu }_D\) of \(\mu _O\) and \(\mu _D\), respectively. Then given a letter \(x\) which is not one of the given samples, one can calculate its grades of membership in \(O\) and \(D\); and, if \(O\) and \(D\) have no overlap, classify \(x\) in \(O\) or \(D\).” [11, p. 30]
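The classification step described in this quotation can be rendered as a minimal sketch: given estimated membership functions for the fuzzy sets \(O\) and \(D\), a new sample is assigned to the class in which its grade of membership is higher. The single “roundness” feature and the numerical grades below are purely hypothetical and serve only to illustrate the idea.

```python
# Hypothetical estimated membership functions over a single "roundness" feature
# in [0, 1]; in Zadeh's example such estimates are obtained by abstraction from
# labelled handwritten samples.
def mu_O(roundness):
    return roundness          # the rounder the stroke, the more "O"-like

def mu_D(roundness):
    return 1.0 - roundness    # the flatter one side, the more "D"-like

def classify(roundness):
    grade_O, grade_D = mu_O(roundness), mu_D(roundness)
    return "O" if grade_O > grade_D else "D", grade_O, grade_D

print(classify(0.8))   # classified as 'O' (grade 0.8 in O vs. about 0.2 in D)
```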

Within a few days he extended this concept to the theory of fuzzy sets, and a few weeks later he discussed this preliminary version of the theory of fuzzy sets with Bellman. Then he wrote his manuscript on “Fuzzy Sets” [12] and submitted it to the journal Information and Control in November 1964. “Fuzzy Sets” appeared in June 1965 and was the first article on fuzzy sets in a scientific journal. However, Lotfi Zadeh also wrote other papers at the time. According to common practice at the department of electrical engineering in Berkeley, the article “Fuzzy Sets” was preprinted as a report of the Electronics Research Laboratory in November 1964 [13]. As a result of his talk in Dayton, Ohio, he wrote a paper which he sent to Bellman, who was the editor of the Journal of Mathematical Analysis and Applications. Bellman agreed to publish the paper in that journal, but the publication appeared late, in 1966, under the title “Abstraction and Pattern Classification” [14]. The authors of the article were Bellman, his associate Robert E. Kalaba (1926–2004, Fig. 5 (b)) and Zadeh. The text and the authors’ names are identical to those of the RAND memorandum RM-4307-PR, which appeared as early as October 1964. This memo was written by Zadeh alone. Here he defined fuzzy sets for the first time in a scientific paper, establishing a general framework for the treatment of pattern recognition problems [15].

In April 1965 the Symposium on System Theory was held at the Polytechnic Institute of Brooklyn. At this meeting Zadeh presented “A New View on System Theory”: a view that deals with the concepts of fuzzy sets, “which provide a way of treating fuzziness in a quantitative manner.” In the subsequent publication of the proceedings of this symposium we find a shortened manuscript version of the talk; there his contribution was entitled “Fuzzy Sets and Systems” [11, p. 29]. In this lecture and in the paper, Zadeh first defined “fuzzy systems” as follows:

A system \(S\) is a fuzzy system if input \(u(t)\), output \(y(t)\), or state \(s(t)\) of \(S\), or any combination of them, ranges over fuzzy sets, [11, p. 33].

In “Fuzzy Sets and Systems” Zadeh explained that “these concepts relate to situations in which the source of imprecision is not a random variable or a stochastic process but rather a class or classes which do not possess sharply defined boundaries.” [11, p. 29] His “simple” examples in this brief summary of his new “way of dealing with classes in which there may be intermediate grades of membership” were the ‘class’ of real numbers which are much larger than, say, \(10\), the ‘class’ of ‘bald man’, and also the ‘class’ of adaptive systems [11, p. 29]. For further details on the roots of fuzzy systems in system theory, the reader is referred to the author’s book [1].

In “Fuzzy Sets” [12], Zadeh introduced the new mathematical entities “fuzzy sets”: “Such classes are not classes or sets in the usual sense of these terms, since they do not dichotomize all objects into those that belong to the class and those that do not.” He introduced “the concept of a fuzzy set, that is a class in which there may be a continuous infinity of grades of membership, with the grade of membership of an object \(x\) in a fuzzy set \(A\) represented by a number \(f_A (x)\) in the interval \([0,1]\).” Zadeh maintained that these new concepts provide a “convenient way of defining abstraction — a process which plays a basic role in human thinking and communication.” [11, p. 29] The question was how to generalize the familiar set-theoretic concepts – union of sets, intersection of sets, and so forth. Zadeh defined equality, containment, complementation, intersection and union relating to fuzzy sets \(A\), \(B\) in any universe of discourse \(X\) as follows (for all \(x \in X\); see Fig. 6 and the short computational sketch after the list):

  • \(A = B\) if and only if \(f_A (x) = f_B (x)\),

  • \(A \subseteq B\) if and only if \(f_A (x) \le f_B (x)\),

  • \(\lnot A\) is the complement of \(A\) if and only if \(f_{\lnot A} (x) = 1 -f_A (x)\),

  • \(C = A \cup B\) if and only if \(f_{C} (x) = \max (f_A (x), f_B (x))\),

  • \(C = A \cap B\) if and only if \(f_{C} (x) = \min (f_A (x), f_B (x))\).
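These definitions translate directly into a short computational sketch. The two triangular membership functions chosen below are arbitrary illustrative examples; only the max/min/complement operations follow the definitions above.

```python
# Membership functions of two illustrative fuzzy sets A and B on the real line.
def f_A(x):
    return max(0.0, 1.0 - abs(x - 2.0) / 2.0)   # triangle centred at 2

def f_B(x):
    return max(0.0, 1.0 - abs(x - 3.0) / 2.0)   # triangle centred at 3

def f_complement_A(x):            # f_notA(x) = 1 - f_A(x)
    return 1.0 - f_A(x)

def f_union(x):                   # f_{A u B}(x) = max(f_A(x), f_B(x))
    return max(f_A(x), f_B(x))

def f_intersection(x):            # f_{A n B}(x) = min(f_A(x), f_B(x))
    return min(f_A(x), f_B(x))

def A_contained_in_B(points):     # A is a subset of B iff f_A(x) <= f_B(x) everywhere
    return all(f_A(x) <= f_B(x) for x in points)

for x in (1.0, 2.5, 4.0):
    print(x, f_union(x), f_intersection(x), f_complement_A(x))

print(A_contained_in_B([i / 10 for i in range(51)]))   # False: A is not contained in B
```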

Fig. 6

Illustration of the union as maximum of membership functions \(f_A\) and \(f_B\) (\(1\), \(2\)) and the intersection as minimum of membership functions \(f_A\) and \(f_B\) (\(3\), \(4\)), [12]

For his interpretation of fuzzy unions and intersections he wrote a separate paragraph that draws a very important analogy to sieves: “Specifically, let \(f_i (x), i=1, \ldots , n\), denote the value of the membership function of \(A_i\) at \(x\). Associate with \(f_i (x)\) a sieve \(S_i (x)\) whose meshes are of size \(f_i (x)\). Then, \(f_i (x) \vee f_j (x)\) and \(f_i (x) \wedge f_j (x)\) correspond, respectively, to parallel and series combinations of \(S_i (x)\) and \(S_j (x)\),” as shown in Fig. 7.

More generally, a well-formed expression involving \(A_1, \ldots , A_n\), \(\cup \), and \(\cap \) corresponds to a network of sieves \(S_1 (x), \ldots , S_n (x)\), which can be found by the conventional synthesis techniques for switching circuits. As a very simple example,

$$\begin{aligned} C=\left[ (A_1 \cup A_2 )\cap A_3 \right] \cup A_4 \end{aligned}$$
(1)

corresponds to the network shown in Fig. 8.” [12, p. 344]

If the reader takes into account the fact that the term “sieve” denotes a filter, he will comprehend the analogy between fuzzy sets and the electrical filters outlined in Sect. 2.
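Expression (1) can be evaluated pointwise exactly as the sieve network of Fig. 8 suggests: parallel sieves correspond to the maximum, sieves in series to the minimum. The four sample grades used below for a fixed point \(x\) are invented for illustration.

```python
# Sample grades f_1(x), ..., f_4(x) at one fixed point x (illustrative values).
f1, f2, f3, f4 = 0.9, 0.4, 0.6, 0.2

# C = [(A1 u A2) n A3] u A4, evaluated with max for union and min for intersection,
# i.e. the parallel/series sieve network of Fig. 8.
f_C = max(min(max(f1, f2), f3), f4)
print(f_C)   # max(min(0.9, 0.6), 0.2) = 0.6
```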

In the decade that followed the first publications on Fuzzy Sets and Systems [11–15], Zadeh expected that they would have a role in the future of computer systems as well as in the humanities and social sciences. At that time nobody thought that this theory would be successful in the field of applied sciences and technology. Quite the contrary: he remarked that he did not expect the incorporation of FSS into the fields of science and engineering: “What we still lack, and lack rather acutely, are methods for dealing with systems which are too complex or too ill-defined to admit of precise analysis. Such systems pervade life sciences, social sciences, philosophy, economics, psychology and many other ‘soft’ fields.” [16]

Fig. 7

Parallel and serial combination of sieves illustrating the fuzzy union, \(\cup \), (maximum) and intersection, \(\cap \), (minimum), [12]

Fig. 8

A network of sieves simulating \(\left[ \left( f_1 (x) \vee f_2 (x) \right) \wedge f_3 (x) \right] \vee f_4 (x)\), [12]

After “Fuzzy Sets” had appeared in print, Zadeh received many requests for offprints. The philosopher Max Black (1909–1988), who had published a paper entitled “Vagueness – An Exercise in Logical Analysis” [17] back in 1937, had anticipated something of Zadeh’s idea, for he wrote:

The vagueness of the word chair is typical of all terms whose application involves the use of the senses. In all such cases, ‘borderline case’ and ‘doubtful objects’ are easily found to which we are unable to say either that the class name does or does not apply. [17, p. 434]

He now told Zadeh in a letter:

You were good enough to send me, some time ago, some of your recent papers on topics connected with ‘Fuzzy Sets’. If I have not written before, the reason has not been lack of interests, but an inescapable press of other duties. Now that I have had a chance, at least, to study your work, I want to express my admiration and interest. I believe that your ingenious construction promises to provide intellectual tools of great value. In case you have not come across it, I might draw your attention to an early article of mine ... [18]

He referred to his article [17], which had also been reprinted in his book Language and Philosophy [19], and to his “more recent article on similar topics ...”: “Reasoning with Loose Concepts” [20]. To understand the history of the logical analysis of vagueness, let us go back to the history of modern science and its philosophy!

4 Philosophy of Science, Vagueness and Fuzzy Sets

Beginning as early as the 17th century, a primary quality factor in scientific work has been a maximal level of exactness. Galileo Galilei (1564–1642) and René Descartes (1596–1650) started the process of giving modern science its preciseness through the use of the tools of logic and mathematics. The language of mathematics has served as a basis for the definition of theorems, axioms, and proofs. The works of Isaac Newton (1643–1727), Gottfried Wilhelm Leibniz (1646–1716), Pierre-Simon Laplace (1749–1827), and many others led to the ascendancy of modern science, fostering the impression that scientists were able to represent – completely and exactly – all the facts and processes that people observe in the world. But this optimism has gradually begun to seem somewhat naïve in view of the discrepancies between the exactness of theories and what scientists observe in the real world. From the empiricist point of view the source of our knowledge is sense experience. John Locke (1632–1704) used the analogy of the mind of a newborn baby as a “tabula rasa” that would be written on by the sensory perceptions the child has later. In Locke’s opinion these perceptions provide information about the physical world. Locke’s view is called “material empiricism”, whereas so-called idealistic empiricism was the position of George Berkeley (1685–1753) and David Hume (1711–1776): the material world does not exist; only perceptions are real. Immanuel Kant (1724–1804, Fig. 9 (a)) achieved a synthesis of rationalism and empiricism in his magnum opus Critique of Pure Reason, published in 1781 [21]. Kant argued that human experience of a world is only possible if the mind provides a systematic structuring of its representations that is logically prior to the mental representations that were analyzed by empiricists and rationalists. With these philosophical views alone, we would not be able to explain the nature of our experience because these views only considered the results of the interaction between our mind and the world, but not the contribution made by the mind. Kant concluded that it must be the mind’s structuring that makes experience possible.

4.1 Heinrich Hertz’ Philosophy of Science

In his book The Principles of Mechanics Presented in a New Form the German physicist Heinrich Hertz (1857–1894, Fig. 9 (b)) had established his theory of knowledge: he viewed physical theories as “pictures” of reality. He began his introduction with the following words:

The most direct, and in a sense the most important, problem which our conscious knowledge of nature should enable us to solve is the anticipation of future events, so that we may arrange our present affairs in accordance with such anticipation. As a basis for the solution of this problem we always make use of our knowledge of events which have already occurred, obtained by chance observation or by pre-arranged experiment. In endeavoring thus to draw inferences as to the future from the past, we always adopt the following process. We form for ourselves images or symbols of external objects; and the form which we give them is such that the necessary consequents of the images in thought are always the images of the necessary consequents in nature of the things pictured. [...]

The images which we here speak of are our conceptions of things. With the things themselves they are in conformity in one important respect, namely, in satisfying the above-mentioned requirement. For our purpose it is not necessary that they should be in conformity with the things in any other respect whatever. As a matter of fact, we do not know, nor have we any means of knowing, whether our conceptions of things are in conformity with them in any other than this one fundamental respect. [22, p. 1]

We know from experience that the conformity between nature and our mind which this requires does in fact exist. (Logically) inadmissible images are “all images which implicitly contradict the laws of our thought.” Even logically admissible images can be incorrect “if their essential relations contradict the relations of external things.”

For one external object there can exist more than one correct image, differing in respect to appropriateness:

Of two images of the same object that is the more appropriate which pictures more of the essential relations of the object, the one which we may call the more distinct. Of two images of equal distinctness the more appropriate is the one which contains, in addition to the essential characteristics, the smaller number of superfluous or empty relations, the simpler of the two. [22, p. 2]

Hertz’s epistemology and his view of scientific theories as mind-created “images”, based on the scientist’s experience, was contrary to the dominant view at his time. Most scientists during the years around the turn of the 20th century regarded empirical theories as objective, and in particular, most of them believed in the existence of one unique theory. On the other hand, Hertz knew from the experience he had gathered in the genesis of electrodynamics that various theories with different systems of concepts are possible, and that one theory may eventually become accepted. In his “Language of images”, he wrote:

What enters into the images for the sake of correctness is contained in the results of experience, from which the images are built up. What enters into the images, in order that they may be permissible, is given by the nature of our mind. To the question whether an image is permissible or not, we can without ambiguity answer yes or no; and our decision will hold good for all time. And equally without ambiguity we can decide whether an image is correct or not; but only according to the state of our present experience, and permitting an appeal to later and riper experience. But we cannot decide without ambiguity whether an image is appropriate or not; as to this differences of opinion may arise. One image may be more suitable for one purpose, another for another; only by gradually testing many images can we finally succeed in obtaining the most appropriate. [22, p. 3]

Hertz spoke about “images” or “symbols” of external objects, because they are replacements for concepts in physical theories (e.g., mechanics, electricity and magnetism, and electrodynamics) that are not accessible to our sensory perceptions.

4.2 Wittgenstein’s Tractatus logico-philosophicus

“A picture is a model of reality.”, “We picture facts to ourselves.”, “A picture is a fact.” These are three consecutive propositions in Ludwig Wittgenstein’s Tractatus logico-philosophicus [23, prop. 2.1, 2.12, 2.141]. They demonstrate the influence of Heinrich Hertz’s Principles of Mechanics on his thinking – a debt that Wittgenstein himself acknowledged when he referred to Hertz in his diary [24, p. 476] and explicitly in another part of the Tractatus: “In the proposition there must be exactly as many things distinguishable as there are in the state of affairs which it represents. They must both possess the same logical (mathematical) multiplicity (cf. Hertz’s Mechanics, on Dynamic Models).” [23, prop. 4.04]

The first two propositions in Wittgenstein’s Tractatus are:

  1. The world is everything that is the case.

  2. The world is the totality of facts, not of things. [23]

Fig. 9

(a): Immanuel Kant (b): Heinrich Hertz and (c): Ludwig Wittgenstein

Then, in the Tractatus, Wittgenstein (1889–1951, Fig. 9 (c)) wrote that the world consists of facts. Facts may or may not contain smaller parts. If a fact has no smaller parts, he calls it an “atomic fact.” If we know all atomic facts, we can describe the world completely by corresponding “atomic propositions.” – Propositions \(3\) and \(4\) in the Tractatus are:

  3. The logical picture of the facts is the thought.

  4. The thought is the significant proposition. [23]

“The totality of propositions is language.” [23, prop. 4.001] Wittgenstein argued that sentences in colloquial language are very complex. He conceded that there is a “silent adjustment to understand colloquial language” but it is “enormously complicated.” Therefore it is “humanly impossible to gather immediately the logic of language.” [23, prop. 4.002] This is the task of philosophy: “All philosophy is ‘Critique of language’.” [23, prop. 4.0031] Wittgenstein knew that common linguistic usage is vague, but at the time when he wrote the Tractatus, he tried to solve this problem by constructing a precise language – an exact logical language that gives a unique picture of the real world. Wittgenstein thought that the Tractatus solved all philosophical problems.

But the Tractatus left many problems open for the future! One of these philosophical problems concerns the vagueness in our language, a problem on which two founders of modern logic, Gottlob Frege (1848–1925) and Bertrand Russell (1872–1970), also focused their attention. A separate and isolated development took place at the Lvov-Warsaw School of logicians; one of them was Jan Łukasiewicz (1878–1956), who introduced a three-valued logic in 1920 and later other multi-valued logics. At about the same time the American mathematician Emil L. Post (1897–1954) also introduced a logic with \(n \ge 2\) truth values.

The important contributions of these Polish mathematicians and logicians to modern logic were recognized when Alfred Tarski (1902–1983) followed an invitation of the Viennese mathematician Karl Menger to give a lecture in Vienna. All these thinkers had been influenced by Frege’s studies. This was especially true of Tadeusz J. S. Kotarbiński (1866–1938), who argued that a concept for a property is vague (Polish: chwiejne) if the property may be the case by grades [25], and of Kazimierz Ajdukiewicz (1890–1963), who gave the definition that “a term is vague if and only if its use in a decidable context \(\ldots \) will make the context undecidable in virtue of those [language] rules” [26]. The Polish characterization of “vagueness” was therefore the existence of fluid boundaries.

4.3 Vagueness and Logic

The German philosopher and mathematician Gottlob Frege confronted the problem of vagueness when formalizing the mathematical principle of complete induction: he saw that some predicates are not inductive, viz. they are defined for all natural numbers but result in false conclusions; e.g., the predicate “heap” cannot be evaluated for all natural numbers [27]. When he revised the basics of his Begriffsschrift for a lecture to the Society of Medicine and Science in Jena, Germany, at the beginning of 1891, Frege reinterpreted concepts as functions, and he subsequently used this conception of concepts everywhere. He stated: If “\(x+1\)” is meaningless for all arguments \(x\), then the function \(x+1=10\) has no value and no truth value either. Thus, the concept “that which when increased by \(1\) yields \(10\)” would have no sharp boundaries. Accordingly, for functions the demand for sharp boundaries entails that they must have a value for every argument [28]. This is a mathematical verbalization of what is called the classical sorites paradox, which can be traced back to the old Greek word \(\sigma \omega \rho \acute{o} \varsigma \) (sorós, “heap”) used by Eubulides of Miletus (4th century BC). In his Grundgesetze der Arithmetik (Basic Laws of Arithmetic), which appeared in the years 1893–1903, Frege called for concepts with sharp boundaries, because otherwise we could break logical rules and, moreover, the conclusions we draw could be false [29]. Frege’s specification of vagueness as a particular phenomenon influenced other scholars, notably his British contemporary and counterpart, the philosopher and mathematician Bertrand Russell (1872–1970, Fig. 10 (a)), who published his article on “Vagueness” in 1923 [30].

Russell took up the sorites – in fact, he did not use mathematical language in this article but discussed, for example, colours and “bald men” (Greek: \(\varphi \alpha \lambda \alpha \kappa \rho o \varsigma \), falakros – the name of this fallacy, or false conclusion):

Let us consider the various ways in which common words are vague, and let us begin with such a word as red. It is perfectly obvious, since colours form a continuum, that there are shades of color concerning which we shall be in doubt whether to call them red or not, not because we are ignorant of the meaning of the word red, but because it is a word the extent of whose application is essentially doubtful. This, of course, is the answer to the old puzzle about the man who went bald. It is supposed that at first he was not bald, that he lost his hairs one-by-one, and that in the end he was bald; therefore, it is argued, there must have been one hair the loss of which converted him into a bald man. This, of course, is absurd. Baldness is a vague conception; some men are certainly bald, some are certainly not bald, while between them there are men of whom it is not true to say they must either be bald or not bald. [30, p. 85].

Russell also argued that a proper name – and here we can take as an example the name “Lotfi Zadeh” – cannot be considered to be an unambiguous symbol even if we believe that there is only one person with this name. Lotfi Zadeh “was born, and being born is a gradual process. It would seem natural to suppose that the name was not attributable before birth; if so, there was doubt, while birth was taking place, whether the name was attributable or not. If it be said that the name was attributable before birth, the ambiguity is even more obvious, since no one can decide how long before birth the name become attributable.” [30, p. 86]

Fig. 10

(a): Bertrand Russell, (b): Max Black and (c): Karl Menger

Russell reasoned “that all words are attributable without doubt over a certain area, but become questionable within a penumbra, outside which they are again certainly not attributable.” [30, p. 86f] Then he generalized that words of pure logic also have no precise meanings, e.g. in classical logic the composed proposition “\(p\) or \(q\)” is false only when \(p\) and \(q\) are false and true elsewhere. He went on to claim that the truth values “ ‘true’ and ‘false’ can only have a precise meaning when the symbols employed – words, perceptions, images \(\ldots \) – are themselves precise”. As we have seen above, this is not possible in practice, so he concludes “that every proposition that can be framed in practice has a certain degree of vagueness; that is to say, there is not one definite fact necessary and sufficient for its truth, but a certain region of possible facts, any one of which would make it true. And this region is itself ill-defined: we cannot assign to it a definite boundary.” Russell emphasized that there is a difference between what we can imagine in theory and what we can observe with our senses in reality: “All traditional logic habitually assumes that precise symbols are being employed. It is therefore not applicable to this terrestrial life, but only to an imagined celestial existence.” [30, p. 88f]. He proposed the following definition of accurate representations:

One system of terms related in various ways is an accurate representation of another system of terms related in various other ways if there is a one-one relation of the terms of the one to the terms of the other, and likewise a one-one relation of the relations of the one to the relations of the other, such that, when two or more terms in the one system have a relation belonging to that system, the corresponding terms of the other system have the corresponding relation belonging to the other system.” And in contrast to this, he stated that “a representation is vague when the relation of the representing system to the represented system is not one-one, but one-many. [30, p. 89]

He concluded that “Vagueness, clearly, is a matter of degree, depending upon the extent of the possible differences between different systems represented by the same representation. Accuracy, on the contrary, is an ideal limit.” [30, p. 90].

The Cambridge philosopher and mathematician Max Black (1909–1988, Fig. 10 (b)) responded to Russell’s article in his already mentioned article of 1937 [17]. He differentiated vagueness from ambiguity, generality, and indeterminacy. He emphasized

that the most highly developed and useful scientific theories are ostensibly expressed in terms of objects never encountered in experience. The line traced by a draughtsman, no matter how accurate, is seen beneath the microscope as a kind of corrugated trench, far removed from the ideal line of pure geometry. And the ‘point-planet’ of astronomy, the ‘perfect gas’ of thermodynamics, and the ‘pure species’ of genetics are equally remote from exact realization.” [17, p. 427]

Black proposed a new method to symbolize vagueness: “a quantitative differentiation, admitting of degrees, and correlated with the indeterminacy in the divisions made by a group of observers.” [17, p. 441] He assumed that the vagueness of a word involves variations in its application by different users of a language and that these variations fulfill systematic and statistical rules when one symbol has to be discriminated from another. He referred to situations in which a user of the language makes a decision whether to apply \(L\) or \(\lnot L\) to an object \(x\). Black exemplified: “Such a situation arises, for instance, when an engine driver on a foggy night is trying to decide whether the light in the signal box is really a red or a green light” [17, p. 442] He defined this discrimination of a symbol \(x\) with respect to a symbol \(L\) by \(DxL\). (We obtain \(DxL = Dx \lnot L\) by definition.) Most speakers of a language and the same observer in most situations will determine that either \(L\) or \( \lnot L\) is used. In both cases, among competent observers there is a certain unanimity, a preponderance of correct decisions. For all \(DxL\) with the same \(x\) but not necessarily the same observer, \(m\) is the number of \(L\) uses and \(n\) the number of \( \lnot L\) uses. On this basis, Black stated the following definition (see Fig. 11):

Fig. 11

Consistency of application of a typical vague symbol, [17, p. 443]

We define the consistency of application of \(L\) to \(x\) as the limit to which the ratio \(\frac{m}{n}\) tends when the number of \(DxL\) and the number of observers increase indefinitely. \([\ldots ]\) Since the consistency of the application, \(C\), is clearly a function of both \(L\) and \(x\), it can be written in the form \(C(L,x)\).” [17, p. 442]
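Black’s consistency of application can be approximated from a finite table of observers’ decisions, as in the following sketch; the colour chips, the decision counts, and the finite-ratio approximation of the limit are assumptions made only for illustration.

```python
# Hypothetical decision counts for the vague symbol L = "red" applied to a
# series of colour chips ordered from clearly red to clearly not-red.
# m = number of observers applying L, n = number applying not-L.
decisions = {
    "chip_1": (98, 2),
    "chip_2": (80, 20),
    "chip_3": (50, 50),
    "chip_4": (15, 85),
    "chip_5": (1, 99),
}

def consistency(m, n):
    # Finite-sample estimate of Black's C(L, x) = lim m/n.
    return m / n if n > 0 else float("inf")

for chip, (m, n) in decisions.items():
    print(chip, round(consistency(m, n), 2))
# High ratios where "red" clearly applies, low ratios where it clearly does not,
# and a gradual transition in between -- the curve sketched in Fig. 11.
```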

In 1963, Black labeled concepts without precise boundaries as “loose concepts” rather than “vague” ones, in order to avoid misleading and pejorative implications [20]. Once again he expressly rejected Russell’s assertion that traditional logic is “not applicable” as a method of conclusion for vague concepts: “Now, if all empirical concepts are loose, as I think they are, the policy becomes one of abstention from any reasoning from empirical premises. If this is a cure, it is one that kills the patient. If it is always wrong to reason with loose concepts, it will, of course, be wrong to derive any conclusion, paradoxical or not, from premises in which such concepts are used. A policy of prohibiting reasoning with loose concepts would destroy ordinary language – and, for that matter, any improvement upon ordinary language that we can imagine.” [20, p. 7]

4.4 Menger’s Ensembles Flous

In Vienna in the 1920s and 1930s, Karl Menger (1902–1985, Fig. 10 (c)) evolved into a specialist in topology and geometry, particularly with regard to the theories of curves, dimensions, and general metrics. After he immigrated to the USA, he continued his work on these subjects. In 1942, with the intention of generalizing the theory of metric spaces more in the direction of probabilistic concepts, he introduced the term “statistical metric”:

A statistical metric is “a set \(S\) such that with each two elements (‘points’) \(p\) and \(q\) of \(S\), a probability function \(\varPi (x; p,q)\) (The probability that the distance between \(p\) and \(q\) is \(x\)) is associated satisfying the following conditions:

  1. \(\varPi (0; p,p) = 1\) (The probability is \(1\) that the distance between \(p\) and \(p\) is \(0\).)

  2. If \(p \ne q\), then \(\varPi (0; p,q) < 1\).

  3. \(\varPi (x; p,q) = \varPi (x; q,p)\).

  4. \(T \left( \varPi (x; p,q) , \varPi (y; q,r) \right) \le \varPi (x+y; p,r)\).

where \(T(\alpha ,\beta )\) is a function defined for \(0 \le \alpha \le 1\) and \(0 \le \beta \le 1\), such that

  (a) \(0 \le T(\alpha ,\beta ) \le 1\).

  (b) \(T\) is non-decreasing in either variable.

  (c) \(T(\alpha ,\beta ) = T(\beta , \alpha )\).

  (d) \(T (1,1) = 1\).

  (e) If \(\alpha > 0\) then \(T (\alpha , 1) > 0\). [31, p. 535f]

Condition 4, the “triangular inequality” of the statistical metric \(S\), implies the following inequality for all points \(q\) and all numbers \(x\) between \(0\) and \(z\):

$$\begin{aligned} \varPi (z; p,r) \ge \mathrm{Max}\; T \left( \varPi (x; p,q), \varPi (z-x; q,r)\right) \end{aligned}$$
(2)

Here, Menger introduced the term triangular norm (t-norm) to indicate the function \(T\).
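Conditions (a)–(e) can be checked numerically for familiar candidate functions. The sketch below tests the minimum and the algebraic product on a coarse grid – an illustrative verification on sample points, not a proof.

```python
import itertools

def t_min(a, b):      # T(alpha, beta) = min(alpha, beta)
    return min(a, b)

def t_prod(a, b):     # T(alpha, beta) = alpha * beta
    return a * b

grid = [i / 10 for i in range(11)]   # sample points 0.0, 0.1, ..., 1.0

def satisfies_tnorm_conditions(T):
    pairs = list(itertools.product(grid, grid))
    ok = all(0.0 <= T(a, b) <= 1.0 for a, b in pairs)                 # (a)
    ok &= all(T(a1, b) <= T(a2, b)                                    # (b) non-decreasing
              for a1, a2, b in itertools.product(grid, grid, grid)    #     (symmetry covers
              if a1 <= a2)                                            #      the other variable)
    ok &= all(T(a, b) == T(b, a) for a, b in pairs)                   # (c) commutativity
    ok &= T(1.0, 1.0) == 1.0                                          # (d)
    ok &= all(T(a, 1.0) > 0.0 for a in grid if a > 0.0)               # (e)
    return ok

print(satisfies_tnorm_conditions(t_min), satisfies_tnorm_conditions(t_prod))  # True True
```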

In 1951 Menger introduced a new notation \(\varDelta _{ab}\) for the non-decreasing cumulative distribution function associated with every ordered pair \((a,b)\) of elements of a set \(S\), and he wrote: “The value \(\varDelta _{ab} (x)\) may be interpreted as the probability that the distance from \(a\) to \(b\) be \(< x\).” [32, p. 226] Much more interesting is the following text passage: “We call \(a\) and \(b\) certainly-indistinguishable if \(\varDelta _{ab} (x) = 1\) for each \(x> 0\). Uniting all elements which are certainly indistinguishable from each other into identity sets, we decompose the space into disjoint sets \(A, B, \ldots \). We may define \(\varDelta _{AB} = \varDelta _{ab}\) for any \(a\) belonging to \(A\) and \(b\) belonging to \(B\). (The number is independent of the choice of \(a\) and \(b\).) The identity sets form a perfect analog of an ordinary metric space since they satisfy the condition:

If \(A \ne B\), then there exists a positive \(x\) with \(\varDelta _{ab} (x) < 1\).”

In the same year Menger addressed the difference between the mathematical continuum and the physical continuum. Regarding \(A\), \(B\), and \(C\) as elements of a continuum, he referred to a claim of the French mathematician and philosopher Henri Poincaré, “that only in the mathematical continuum do the equalities \(A=B\) and \(B=C\) imply the equality \(A=C\). In the observable physical continuum, ‘equal’ means ‘indistinguishable’, and \(A=B\) and \(B=C\) by no means imply \(A=C\). ‘The raw result of experience may be expressed by the relation \(A=B\), \(B=C\), \(A<C\) which may be regarded as the formula for the physical continuum.’ According to Poincaré, physical equality is a non-transitive relation.” [33, p. 178]

Menger suggested a realistic description of the equality of elements in the physical continuum by associating with each pair \((A,B)\) of these elements the probability that \(A\) and \(B\) will be found to be indistinguishable. He argued:

For it is only very likely that \(A\) and \(B\) are equal, and very likely that \(B\) and \(C\) are equal – why should it not be less likely that \(A\) and \(C\) are equal? In fact, why should the equality of \(A\) and \(C\) not be less likely than the inequality of \(A\) and \(C\)?” [33, p. 178]

To solve “Poincaré’s paradox” Menger used his concept of probabilistic relations and geometry: For the probability \(E(a,b)\) that \(a\) and \(b\) would be equal he postulated:

  • \(E(a,a) = 1\), for every \(a\);

  • \(E(a,b) = E(b,a)\), for every \(a\) and \(b\);

  • \(E(a,b) \cdot E(b,c) \le E(a, c)\) for every \(a\), \(b\), \(c\).

If \(E(a,b) = 1\), then he called \(a\) and \(b\) certainly equal. (In this case we obtain the ordinary equality relation.) “All the elements which are certainly equal to \(a\) may be united to an ‘equality set’, \(A\). Any two such sets are disjoint unless they are identical.” [33, p. 179]
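A tiny numerical sketch, with freely invented probabilities, shows how these postulates dissolve Poincaré’s paradox: \(a\) and \(b\) may very likely be equal, and \(b\) and \(c\) may very likely be equal, while the equality of \(a\) and \(c\) is only guaranteed to the much weaker degree \(E(a,b) \cdot E(b,c)\).

```python
# Invented probabilities of equality for three neighbouring elements a, b, c
# of a physical continuum.
E_ab = 0.9
E_bc = 0.9

# Menger's transitivity postulate only forces a lower bound on E(a, c):
lower_bound_ac = E_ab * E_bc
print(lower_bound_ac)   # about 0.81 -- a and c may well turn out to be distinguishable
```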

In addition to studies of well-defined sets, he called for a theory to be developed in which the relationship between elements and sets is replaced by the probability that an element belongs to a set; in contrast to ordinary sets, he called these entities “ensembles flous” [34, p. 226]. Later, Menger used the English term “hazy set” and to elucidate the contrast he referred to conventional sets as “rigid sets.” [35].

Menger never envisaged a mathematical theory of loose concepts that differs from probability theory. At a symposium of the American Association for the Advancement of Science organized in 1966 to commemorate the 50th anniversary of Ernst Mach’s death, he spoke about “Positivistic Geometry”. When he compared his “micro geometry” with the theory of fuzzy sets, he wrote: “In a slightly different terminology, this idea was recently expressed by Bellman, Kalaba and Zadeh under the name fuzzy set.” [35, p. 232] He did not see that the “slight difference” between “degrees” (fuzziness) and “probabilities” is a difference not just in terminology but in the meaning of the concepts.

4.5 Wittgenstein’s Family Resemblances

In his later years (after 1947), Wittgenstein turned away from the epistemological system of the Tractatus with its ideal mapping between the objects of reality and a logically precise language. The philosophy of his later years is completely different from that of the Tractatus years; it seems as though the two philosophical systems were created by different men. The new view says: if we are not able to find such an exact logical language, then we have to accept the fact that there is vague linguistic usage in all languages. The images, models, and theories that we build with the words and propositions of our languages in order to communicate are and will remain vague as well.

His second main work, the Philosophical Investigations, epitomizes Wittgenstein’s late philosophy: “Instead of producing something common to all that we call language, I am saying that these phenomena have no one thing in common which makes us use the same word for all, but that they are related to one another in many different ways. And it is because of this relationship, or these relationships, that we call them all ‘language’. I will try to explain this.” [36, §65] We find the following explanation in the next paragraph of this book, in keeping with the concept of a game:

Consider for example the proceedings that we call “games”. I mean board-games, card-games, ball-games, Olympic games, and so on. What is common to them all? Don’t say: “There must be something common, or they would not be called ‘games’ ” but look and see whether there is anything common to all. For if you look at them you will not see something that is common to all, but similarities, relationships, and a whole series of them at that. To repeat: don’t think, but look! Look for example at board-games, with their multifarious relationships.

Now pass to card-games; here you find many correspondences with the first group, but many common features drop out, and others appear. When we pass next to ball-games, much that is common is retained, but much is lost. Are they all “amusing”? Compare chess with noughts and crosses. Or is there always winning and losing, or competition between players? Think of patience. In ball games there is winning and losing; but when a child throws his ball at the wall and catches it again, this feature has disappeared. Look at the parts played by skill and luck; and at the difference between skill in chess and skill in tennis. Think now of games like ring-a-ring-a-roses; here is the element of amusement, but how many other characteristic features have disappeared! And we can go through the many, many other groups of games in the same way; can see how similarities crop up and disappear. And the result of this examination is: we see a complicated network of similarities overlapping and crisscrossing: sometimes overall similarities, sometimes similarities of detail. [36, §66]

In the next paragraph Wittgenstein creates a new concept to describe this new epistemological system:

I can think of no better expression to characterize these similarities than “family resemblances”; for the various resemblances between members of a family: build, features, colour of eyes, gait, temperament, etc. etc. overlap and crisscross in the same way. And I shall say: “games” form a family. [36, §67]

Concepts and their families have no sharp boundaries, as he also wrote in [36, §71]:

One might say that the concept “game” is a concept with blurred edges. “But is a blurred concept a concept at all?” Is an indistinct photograph a picture of a person at all? Is it even always an advantage to replace an indistinct picture by a sharp one? Isn’t the indistinct one often exactly what we need? Frege compares a concept to an area and says that an area with vague boundaries cannot be called an area at all. This presumably means that we cannot do anything with it. But is it senseless to say: “Stand roughly there”? [36, §71]

And in a later paragraph Wittgenstein wrote: “The results of philosophy are the uncovering of one or another piece of plain nonsense and bumps that the understanding has got by running its head up against the limits of language.” [36, §119] In other words, our conceptions, images, and symbols of external things or objects are entities without sharp borders. They are fuzzy entities!

5 Before the “Fuzzy Boom”: Fuzziness in Language and Meaning

In the 1960s Zadeh was interested in applying fuzzy sets in linguistics. This idea led to interdisciplinary scientific exchange on the campus of the University of California at Berkeley between him and the mathematicians Joseph Goguen (1941–2006, Fig. 14 (a)) and Hans-Joachim Bremermann, the psychologist Eleanor Rosch (Heider) and the linguist George Lakoff. Goguen generalized fuzzy sets to so-called “\(L\)-sets” [37, 38]. An \(L\)-set is a function \(A: X \rightarrow L\) that maps the fuzzy set carrier \(X\) into a partially ordered set \(L\). Goguen called the partially ordered set \(L\) the “truth set” of \(A\). The elements of \(L\) can thus be interpreted as “truth values”; in this respect, Goguen then referred to a “Logic of Inexact Concepts” [39]. His work was laid out in terms of logical algebra and category theory, and his proof of a representation theorem for \(L\)-sets within category theory justified fuzzy set theory as an expansion of set theory.
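What distinguishes an \(L\)-set from an ordinary fuzzy set is that the truth set need only be partially ordered. The sketch below uses pairs from \([0,1] \times [0,1]\), ordered componentwise, as an assumed example of such a truth set; the carrier and the grades are invented.

```python
# An L-set A: X -> L where L = [0,1] x [0,1] is ordered componentwise.
# Each element of the carrier is mapped to a pair of grades
# (e.g. degrees of fulfilment of two independent criteria).
A = {
    "x1": (0.9, 0.4),
    "x2": (0.5, 0.5),
    "x3": (0.2, 0.8),
}

def leq(u, v):
    # Componentwise (partial) order on the truth set L.
    return u[0] <= v[0] and u[1] <= v[1]

# (0.9, 0.4) and (0.2, 0.8) are incomparable -- impossible for grades in [0,1]:
print(leq(A["x1"], A["x3"]), leq(A["x3"], A["x1"]))   # False False
```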

5.1 Fuzzy Languages and Fuzzy Algorithms

In 1970 Zadeh presented his paper “Fuzzy Languages and their Relations to Human and Machine Intelligence” at the conference Man and Computer in Bordeaux, France. He said: “As computers become more powerful and thus more influential in human affairs, the philosophical aspects of this question become increasingly overshadowed by the practical need to develop an operational understanding of the limitations of the machine judgment and decision making ability.” [40, p. 130]

He called it a paradox that the human brain is always solving problems by manipulating “fuzzy concepts” and “multidimensional fuzzy sensory inputs” whereas “the computing power of the most powerful, the most sophisticated digital computer in existence” is not able to do this. Therefore, he stated that “in many instances, the solution to a problem need not be exact, so that a considerable measure of fuzziness in its formulation and results may be tolerable. The human brain is designed to take advantage of this tolerance for imprecision whereas a digital computer, with its need for precise data and instructions, is not.” [40, p. 132] He intended to use his theory of fuzzy sets to model such imprecise concepts and directives: “Indeed, it may be argued that much, perhaps most, of human thinking and interaction with the outside world involves classes without sharp boundaries in which the transition from membership to non-membership is gradual rather than abrupt.” [40, p. 131] He stated:

Although present-day computers are not designed to accept fuzzy data or execute fuzzy instructions, they can be programmed to do so indirectly by treating a fuzzy set as a data-type which can be encoded as an array [\(\ldots \)]. Granted that this is not a fully satisfactory approach to the endowment of a computer with an ability to manipulate fuzzy concepts, it is at least a step in the direction of enhancing the ability of machines to emulate human thought processes. It is quite possible, however, that truly significant advances in artificial intelligence will have to await the development of machines that can reason in fuzzy and non-quantitative terms in much the same manner as a human being. [40, p. 132]

In August 1967, the Filipino electrical engineer William Go Wee at Purdue University in Indiana had submitted his dissertation “On Generalizations of Adaptive Algorithms and Application of the Fuzzy Sets Concept to Pattern Classification” [41], which he had written under King-Sun Fu, one of the pioneers in the field of pattern recognition. Wee had applied fuzzy sets to iterative learning procedures for pattern classification and had defined a finite automaton based on Zadeh’s concept of the fuzzy relation as a model for nonsupervised learning systems: “The decision maker operates deterministically. The learning section is a fuzzy automaton. The performance evaluator serves as an unreliable ‘teacher’ who tries to teach the ‘student’ to make right decisions.” [40, p. 101]

The fuzzy automaton representing the learning section implemented a “non-supervised” learning fuzzy algorithm and converged monotonically. Wee showed that this fuzzy algorithm could not only be used in the area of pattern classification but could also be translated to control and regulation problems. Working with his doctoral advisor, Wee presented his findings in the article “A Formulation of Fuzzy Automata and its Applications as a Model of Learning Systems” [42].

In 1968 Zadeh presented “fuzzy algorithms” [43]. Ordinary algorithms depend upon precision. An algorithm must be completely unambiguous and error-free in order to result in a solution. The path to a solution amounts to a series of commands which must be executed in succession. Algorithms formulated mathematically or in a programming language are based on set theory. Each constant and variable is precisely defined; every function and procedure has a definition set and a value set. Each command builds upon them. Successfully running a series of commands requires that each result (output) of the execution of a command lie in the definition range of the following command, that is, that it be an element of the input set of that command. Not even the smallest inaccuracies may occur when defining these coordinated definition and value ranges. He now saw “that in real life situations people think certain things. They think like algorithms but not precisely defined algorithms” [40]. Inspired by this idea, he wrote:

Essentially, its purpose is to introduce a basic concept which, though fuzzy rather than precise in nature, may eventually prove to be of use in a wide variety of problems relating to information processing, control, pattern recognition, system identification, artificial intelligence and, more generally, decision processes involving incomplete or uncertain data. The concept in question will be called fuzzy algorithm because it may be viewed as a generalization, through the process of fuzzification, of the conventional (nonfuzzy) conception of an algorithm. [40, p. 94]

To illustrate, fuzzy algorithms may contain fuzzy instructions such as:

  (a) “Set \(y\) approximately equal to \(10\) if \(x\) is approximately equal to \(5\),” or

  (b) “If \(x\) is large, increase \(y\) by several units,” or

  (c) “If \(x\) is large, increase \(y\) by several units; if \(x\) is small, decrease \(y\) by several units; otherwise keep \(y\) unchanged.”

The sources of fuzziness in these instructions are fuzzy sets which are identified by their underlined names. [40, p. 94f]
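Instruction (c) can be given a rough executable reading once the underlined names are bound to membership functions. In the sketch below, the membership functions for “large” and “small”, the numeric reading of “several units”, and the weighting of the change by the degree of fulfilment are all assumptions of this illustration, not Zadeh’s formal semantics of fuzzy algorithms.

```python
# Assumed membership functions for the fuzzy sets "large" and "small" on x.
def large(x):
    return min(1.0, max(0.0, (x - 5.0) / 5.0))    # fully large from x >= 10

def small(x):
    return min(1.0, max(0.0, (5.0 - x) / 5.0))    # fully small for x <= 0

SEVERAL_UNITS = 3.0   # one assumed numeric reading of "several units"

def fuzzy_instruction_c(x, y):
    # "If x is large, increase y by several units; if x is small, decrease y
    #  by several units; otherwise keep y unchanged."
    # Here the change is weighted by the degree to which each condition holds.
    return y + SEVERAL_UNITS * large(x) - SEVERAL_UNITS * small(x)

print(fuzzy_instruction_c(9.0, 0.0))   # x fairly large -> y increased by about 2.4
print(fuzzy_instruction_c(1.0, 0.0))   # x fairly small -> y decreased by about 2.4
print(fuzzy_instruction_c(5.0, 0.0))   # neither        -> y unchanged (0.0)
```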

All people function according to fuzzy algorithms in their daily life, Zadeh wrote – they use recipes for cooking, consult the instruction manual to fix a TV, follow prescriptions to treat illnesses or heed the appropriate guidance to park a car. Even though activities like these are not normally called algorithms: “From our point of view, however, they may be regarded as very crude forms of fuzzy algorithms.” [40, p. 95]

Already in 1969 Zadeh contributed to a NATO summer school on “Architecture and Design of Digital Computers” in Grenoble with a lecture entitled “Toward Fuzziness in Computer Systems. Fuzzy Algorithms and Languages” [44]. The association of fuzziness and computers must have sounded surprising in the late 1960s, and referring to this Zadeh said in its introduction: “At first glance, it may appear highly incongruous to mention computers and fuzziness in the same breath, since fuzziness connotes imprecision whereas precision is a major desideratum in computer design.” [44, p. 9]

In the following paragraphs Zadeh justified this by arguing that future computer systems would have to perform many more complex information processing tasks than the computers that he and his contemporaries in the 1960s knew. He expected that future computers would have to process more and more imprecise information! “Fuzziness, then, is a concomitant of complexity. This implies that as the complexity of a task or a system for performing that task exceeds a certain threshold, the system must necessarily become fuzzy in nature. Thus, with the rapid increase in the complexity of the information processing tasks which the computers are called upon to perform, a point is likely to be reached perhaps within the next decade when the computers will have to be designed for processing of information in fuzzy form. In fact, it is this capability – a capability which present-day computers do not possess – that distinguishes human intelligence from machine intelligence. Without such capability we cannot build computers that can summarize written text, translate well from one natural language to another, or perform many other tasks that humans can do with ease because of their ability to manipulate fuzzy concepts.” [44, p. 10] For that purpose, Zadeh asserted, fuzzy algorithms and fuzzy languages offer “intriguing possibilities for computer systems”!

To be executed by computers, fuzzy algorithms have to be expressed in fuzzy programming languages. Consequently the next step for Zadeh was to define fuzzy languages. “All languages, whether natural or artificial, tend to evolve and rise in level through the addition of new words to their vocabulary. These new words are, in effect, names for ordered subsets of names in the vocabulary to which they are added.” [44, p. 16]

Real-world phenomena are very complex. To characterize or picture these phenomena in terms of our natural languages we use our vocabulary, and because this set of words is restricted, Zadeh argued, the process leads to fuzziness:

Consequently, when we are presented with a class of very high cardinality, we tend to group its elements together into subclasses in such a way as to reduce the complexity of the information processing task involved. When a point is reached where the cardinality of the class of subclasses exceeds the information handling capacity of the human brain, the boundaries of the subclasses are forced to become imprecise and fuzziness becomes a manifestation of this imprecision. This is the reason why the limited vocabulary we have for the description of colors makes it necessary that the names of colors such as red, green, bleu [sic.], purple, etc. be, in effect, names of fuzzy rather than non-fuzzy sets. This is why natural languages, which are much higher in level than programming languages, are fuzzy whereas programming languages are not. [44, p. 10]

Here, Zadeh argued explicitly for programming languages that – lacking rigidity and precision, and being fuzzy – are more like natural languages. He mentioned the concept of stochastic languages, published by the Finnish mathematician Paavo Turakainen in Information and Control in the previous year [45], as such an approximation to human languages that uses randomizations in the productions, but Zadeh preferred fuzzy productions to achieve a formal fuzzy language. He then presented a short sketch of his program to extend non-fuzzy formal languages to fuzzy languages, which he published in elaborated form with his co-author Edward T.-Z. Lee in “Note on Fuzzy Languages” [46]. His definition in these early papers followed the terminology of the American computer scientists John Edward Hopcroft and Jeffrey David Ullman, published in the same year [47].

\(L\) is a fuzzy language if it is a fuzzy set in \(V_T^{*}\), the so-called Kleene closure of \(V_T\), i.e. the set of all finite strings composed of elements of the finite set of terminals \(V_T\), e.g. \(V_T = \left\{ a, b, c, \ldots , z\right\} \). The membership function \(\mu _L (x) : V_T^{*} \rightarrow [0,1]\) associates with each finite string \(x\), composed of elements in \(V_T\), its grade of membership in \(L\). Here is one of the simple examples that he gave in this article [44]:

Assume that \(V_T = \left\{ 0, 1\right\} \), and take \(L\) to be the fuzzy set \(L = \left\{ (0, 0.9), (1, 0.2), (00, 0.8), (01, 0.3), (10, 0.7), (11, 0.3)\right\} \) with the understanding that all the other strings in \(V_T^{*}\) do not belong to \(L\) (i.e., have grade of membership equal to zero). [44, p. 16]
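
A minimal sketch of this example: a fuzzy language over a finite terminal alphabet can simply be represented as a mapping from strings to membership grades, with unlisted strings receiving grade zero.

```python
# Zadeh's example fuzzy language L over V_T = {0, 1}, represented as a
# dictionary from strings to membership grades; strings not listed have grade 0.

L = {"0": 0.9, "1": 0.2, "00": 0.8, "01": 0.3, "10": 0.7, "11": 0.3}

def mu_L(x):
    """Grade of membership of the string x (an element of V_T*) in L."""
    return L.get(x, 0.0)

print(mu_L("00"))    # 0.8
print(mu_L("0110"))  # 0.0 -- not listed, hence not in L
```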

In general the language \(L\) has high cardinality, so it is usually defined not by listing its elements but by a finite set of generating rules. Thus, in analogy to the case of non-fuzzy languages, Zadeh defined a fuzzy grammar as

a quadruple \(G = (V_N, V_T, P, S)\), where \(V_N\) is a set of variables (non-terminals) disjoint from \(V_T\), \(P\) is a set of [fuzzy] productions and \(S\) is an element of \(V_N\). The elements of \(V_N\) are called [fuzzy] syntactic categories, and \(S\) is an abbreviation for the syntactic category ‘sentence’. The elements of \(P\) define conditioned fuzzy sets in \((V_T \cup V_N)\). [44, p. 16]
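
In the elaborated treatment, the grade of a generated string is obtained from the grades of the productions used in its derivations: the grade of a derivation chain is the minimum of the grades of its productions, and the grade of a string is the maximum over all its derivation chains. The following sketch illustrates this max-min semantics with a made-up right-linear grammar; the grammar and its grades are assumptions for illustration only, not an example from Zadeh’s papers.

```python
# Toy fuzzy grammar: productions map a nonterminal to (terminal, next
# nonterminal or None, grade).  Grade of a string = max over derivations of
# the min of the production grades used (illustrative assumption).

P = {
    "S": [("a", "S", 0.8),    # S -> aS with grade 0.8
          ("a", None, 0.3),   # S -> a  with grade 0.3
          ("b", None, 0.5)],  # S -> b  with grade 0.5
}

def grade(string, nonterminal="S"):
    """Membership grade of `string` in the fuzzy language generated from S."""
    best = 0.0
    for terminal, nxt, g in P[nonterminal]:
        if not string.startswith(terminal):
            continue
        rest = string[len(terminal):]
        if nxt is None:
            if rest == "":
                best = max(best, g)
        else:
            best = max(best, min(g, grade(rest, nxt)))
    return best

print(grade("ab"))   # min(0.8, 0.5) = 0.5
print(grade("aab"))  # min(0.8, 0.8, 0.5) = 0.5
print(grade("c"))    # 0.0
```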

5.2 Fuzzy Relations and Fuzzy Semantics

In 1971, Zadeh defined similarity relations and fuzzy orderings [48]. In doing so, he was proceeding from the concept of fuzzy relations as a fuzzification of the relation concept known in conventional set theory that he had already defined in his seminal article “Fuzzy Sets” [12]: If \(X\) and \(Y\) are conventional sets and if \(X \times Y\) is their Cartesian product, let: \(L(X)\) be the set of all fuzzy sets in \(X\), \(L(Y)\) be the set of all fuzzy sets in \(Y\), and \(L(X\times Y)\) be the set of all fuzzy sets in \(X \times Y\).

Relations between \(X\) and \(Y\) are subsets of their Cartesian product \(X \times Y\), and the composition \(t = q * r\) of the relation \(q \subseteq X \times Y\) with the relation \(r \subseteq Y \times Z\) into the new relation \(t \subseteq X \times Z\) is given by the following definition: \(t = q * r = \left\{ (x,z) \mid \exists y: (x,y) \in q \wedge (y,z)\in r\right\} \).

Fuzzy relations between sets \(X\) and \(Y\) are fuzzy subsets of \(X \times Y\), i.e. elements of \(L(X \times Y)\). For three conventional sets \(X\), \(Y\) and \(Z\), the fuzzy relation \(Q\) between \(X\) and \(Y\) and the fuzzy relation \(R\) between \(Y\) and \(Z\) are given by \(Q \in L(X \times Y)\) and \(R \in L(Y \times Z)\). Their composition into a new fuzzy relation \(T \in L(X \times Z)\) between \(X\) and \(Z\) can then be defined by replacing the logical connectives in the above definition with corresponding operations on the membership functions.

  • The above definition of the composition of conventional relations includes a logical AND (\(\wedge \)), which, for the “fuzzification”, is replaced by the minimum operator that is applied to the corresponding membership functions.

  • The above definition of the composition of conventional relations includes the expression “\(\exists y\)” (“there exists a \(y\)”): the existing \(y \in Y\) is the first or the second or the third one, and so on – logically a disjunction (\(\vee \)) over all \(y \in Y\), written as the supremum \(\sup_{y \in Y}\). In the “fuzzification”, this logical OR is replaced by the maximum operator applied to the corresponding membership functions.

The fuzzy relation \(T = Q * R\) is therefore defined via Zadeh’s “rule of max-min combination” for membership functions: \(\mu _T (x,z) = \max_{y \in Y} \min \left\{ \mu _Q (x,y), \mu _R (y,z)\right\} \). In infinite sets the max-min composition rule is replaced by the sup-min composition rule; however, it is adequate to assume here that all of the sets are finite.
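
A minimal sketch of the max-min composition for finite fuzzy relations; the sets \(X\), \(Y\), \(Z\) and the membership values below are illustrative assumptions.

```python
# Max-min composition of two finite fuzzy relations Q (on X x Y) and R (on Y x Z):
# mu_T(x, z) = max over y of min(mu_Q(x, y), mu_R(y, z)).

X, Y, Z = ["x1", "x2"], ["y1", "y2", "y3"], ["z1", "z2"]

Q = {("x1", "y1"): 0.7, ("x1", "y2"): 0.2, ("x1", "y3"): 0.0,
     ("x2", "y1"): 0.4, ("x2", "y2"): 0.9, ("x2", "y3"): 0.6}

R = {("y1", "z1"): 0.5, ("y1", "z2"): 1.0,
     ("y2", "z1"): 0.8, ("y2", "z2"): 0.1,
     ("y3", "z1"): 0.3, ("y3", "z2"): 0.9}

def max_min_composition(Q, R, X, Y, Z):
    return {(x, z): max(min(Q[(x, y)], R[(y, z)]) for y in Y)
            for x in X for z in Z}

T = max_min_composition(Q, R, X, Y, Z)
print(T[("x1", "z2")])  # max(min(0.7,1.0), min(0.2,0.1), min(0.0,0.9)) = 0.7
print(T[("x2", "z1")])  # max(min(0.4,0.5), min(0.9,0.8), min(0.6,0.3)) = 0.8
```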

As a generalization of the concept of the equivalence relation, Zadeh defined the concept of “similarity”: the similarity relation \(S\) he defined is reflexive, symmetric and (max, min)-transitive, i.e. for \(x, y, z \in X\) the membership function of \(S\) has the following properties (a small sketch checking these properties follows the list):

  • reflexivity: \(\mu _S (x,x) = 1\),

  • symmetry: \(\mu _S (x,y) = \mu _S (y,x)\),

  • transitivity: \(\mu _S (x,z) \ge \max_{y \in X} \min \left\{ \mu _S (x,y), \mu _S (y,z)\right\} \).
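
The sketch below checks these three properties for a finite fuzzy relation given as a matrix of membership values; the example matrix is an assumption chosen only for illustration.

```python
# Check whether a finite fuzzy relation S on X is a similarity relation in
# Zadeh's sense: reflexive, symmetric and max-min transitive.

X = ["a", "b", "c"]
S = {("a", "a"): 1.0, ("a", "b"): 0.8, ("a", "c"): 0.6,
     ("b", "a"): 0.8, ("b", "b"): 1.0, ("b", "c"): 0.6,
     ("c", "a"): 0.6, ("c", "b"): 0.6, ("c", "c"): 1.0}

def is_similarity(S, X):
    reflexive  = all(S[(x, x)] == 1.0 for x in X)
    symmetric  = all(S[(x, y)] == S[(y, x)] for x in X for y in X)
    transitive = all(S[(x, z)] >= max(min(S[(x, y)], S[(y, z)]) for y in X)
                     for x in X for z in X)
    return reflexive and symmetric and transitive

print(is_similarity(S, X))  # True for the matrix above
```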

Zadeh’s occupation with natural and artificial languages gave rise to his studies in semantics. This intensive work led him to the question “Can the fuzziness of meaning be treated quantitatively, at least in principle?” [49, p. 160]. His 1971 article “Quantitative Fuzzy Semantics” [49] starts with this observation:

Few concepts are as basic to human thinking and yet as elusive of precise definition as the concept of ‘meaning’. Innumerable papers and books in the fields of philosophy, psychology, and linguistics have dealt at length with the question of what is the meaning of ‘meaning’ without coming up with any definitive answers.Footnote 3 [49, p. 159]

Zadeh started a new field of research “to point to the possibility of treating the fuzziness of meaning in a quantitative way and suggest a basis for what might be called quantitative fuzzy semantics” combining his results on fuzzy languages and fuzzy relations. In the section “Meaning” of this paper he set up the basics:

Consider two spaces: (a) a universe of discourse, \(U\), and (b) a set of terms, \(T\), which play the roles of names of subsets of \(U\). Let the generic elements of \(T\) and \(U\) be denoted by \(x\) and \(y\), respectively. Zadeh then defined the meaning \(M(x)\) of a term \(x\) as a fuzzy subset of \(U\) characterized by a membership function \(\mu (y\left| x)\right. \) which is conditioned on \(x\). [49, p. 164f]

One of his examples was:

Let \(U\) be the universe of objects which we can see. Let \(T\) be the set of terms white, grey, green, blue, yellow, red, black. Then each of these terms, e.g., red, may be regarded as a name for a fuzzy subset of elements of \(U\) which are red in color. Thus, the meaning of red, \(M(\text{ red })\), is a specified fuzzy subset of \(U\). [49, p. 164f]

In the following section of this paper, named “Language”, Zadeh regarded a language \(L\) as a “fuzzy correspondence”, more explicitly, a fuzzy binary relation, from the term set \(T = \left\{ x\right\} \) to the universe of discourse \(U = \left\{ y\right\} \), characterized by the membership function \(\mu _L : T \times U \rightarrow [0,1]\). If a term \(x\) of \(T\) is given, then the membership function \(\mu _L (x,y)\) defines a set \(M(x)\) in \(U\) with the membership function \(\mu _{M(x)} (y) = \mu _L (x,y)\). Zadeh called the fuzzy set \(M(x)\) the meaning of the term \(x\); \(x\) is thus the name of \(M(x)\).

With this framework Zadeh continued in his 1970 article [40] to establish the basic aspects of a theory of fuzzy languages that is “much broader and more general than that of a formal language in its conventional sense.” [40, p. 134] In the following we quote his definitions of fuzzy language, structured fuzzy language and meaning:

Definition 1

A fuzzy language \(L\) is a quadruple \(L=(U, T, E, N)\), in which \(U\) is a non-fuzzy universe of discourse; \(T\) (called the term set) is a fuzzy set of terms which serve as names of fuzzy subsets of \(U\); \(E\) (called an embedding set for \(T\)) is a collection of symbols and their combinations from which the terms are drawn, i.e., \(T\) is a fuzzy subset of \(E\); and \(N\) is a fuzzy relation from \(E\) (or, more specifically, from the support of \(T\), \(\mathrm{supp}(T) = \left\{ x \mid \mu _T (x) > 0\right\} \), which is a non-fuzzy subset of \(E\)) to \(U\), which will be referred to as a naming relation.

In the case that \(U\) and \(T\) are infinitely large sets, there is no table of membership values for \(\mu _T (x)\) and \(\mu _N (x,y)\), and therefore the values of these membership functions have to be computed. To this end, the universe of discourse \(U\) and the term set \(T\) have to be endowed with a structure, and therefore Zadeh defined the concept of a structured fuzzy language.

Definition 2

A structured fuzzy language \(L\) is a quadruple \(L=(U, S_T, E, S_N)\), in which \(U\) is a universe of discourse; \(E\) is an embedding set for term set \(T\), \(S_T\) is a set of rules, called syntactic rules of \(L\), which collectively provide an algorithm for computing the membership function, \(\mu _T\), of the term set \(T\); and \(S_N\) is a set of rules, called the semantic rules of \(L\), which collectively provide an algorithm for computing the membership function, \(\mu _N\), of the fuzzy naming relation \(N\). The collection of syntactic and semantic rules of \(L\) constitute, respectively, the syntax and semantics of \(L\).

To define the concept of meaning, Zadeh characterized the membership function \(\mu _N : supp(T)\times U \rightarrow [0,1]\) representing the strength of the relation between a term \(x\) in \(T\) and an object \(y\) in \(U\). He clarified:

A language, whether structured or unstructured, will be said to be fuzzy if [term set] \(T\) or [naming relation] \(N\) or both are fuzzy. Consequently, a non-fuzzy language is one in which both \(T\) and \(N\) are non-fuzzy. In particular, a non-fuzzy structured language is a language with both non-fuzzy syntax and non-fuzzy semantics. [40, p. 138]

Thus, natural languages have fuzzy syntax and fuzzy semantics, whereas programming languages, as they were common in the early 1970s, were non-fuzzy structured languages. The membership functions \(\mu _T\) and \(\mu _N\) for term set and naming relation, respectively, were two-valued, and the compiler used the rules to compute these values \(0\) or \(1\). This means that the compiler decides deterministically by using the syntactic rules whether a string \(x\) is a term in \(T\) or not, and it also determines by using the semantic rules whether a term \(x\) names an object \(y\) or not. On the other hand we have natural languages, e.g. English, in which we may use sentences that are not completely correct but also not completely incorrect. These sentences have a degree of grammaticality between \(0\) and \(1\). Of course, native speakers usually use correct sentences. “In most cases, however, the degree of grammaticality of a sentence is either zero or one, so that the set of terms in a natural language has a fairly sharply defined boundary between grammatical and ungrammatical sentences”, Zadeh wrote [40, p. 138].

We find much more fuzziness in the semantics of natural languages: Zadeh gave the example “if the universe of discourse is identified with the set of ages from \(1\) to \(100\), then the atomic terms young and old do not correspond to sharply defined subsets of \(U\). The same applies to composite terms such as not very young, not very young and not very old, etc. In effect, most of the terms in a natural language correspond to fuzzy rather than non-fuzzy subsets of the universe of discourse.” [40, p. 139]

Zadeh now identified these fuzzy subsets of the universe of discourse that correspond to terms in natural languages with their “meaning”:

Definition 3

The meaning of a term \(x\) in \(T\) is a fuzzy subset \(M(x)\) of \(U\) in which the grade of membership of an element \(y\) of \(U\) is given by \(\mu _{M(x)} (y) = \mu _N (x,y)\).

Thus, \(M(x)\) is a fuzzy subset of \(U\) which is conditioned on \(x\) as a parameter and which is a section of \(N\) in the sense that its membership function, \(\mu _{M(x)} : U \rightarrow [0,1]\), is obtained by assigning a particular value, \(x\), to the first argument in the membership function of \(N\).
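
A minimal sketch of this definition, with a made-up naming relation in the spirit of Zadeh’s colour example; the universe, the terms and the membership grades below are assumptions used only for illustration.

```python
# The meaning M(x) of a term x is the section of the naming relation N at x:
# mu_M(x)(y) = mu_N(x, y).

U = ["object1", "object2", "object3"]      # universe of discourse
N = {("red",   "object1"): 1.0,            # naming relation mu_N(x, y)
     ("red",   "object2"): 0.6,
     ("red",   "object3"): 0.1,
     ("young", "object1"): 0.0,
     ("young", "object2"): 0.9,
     ("young", "object3"): 0.4}

def meaning(x, N, U):
    """M(x): fuzzy subset of U with mu_M(x)(y) = mu_N(x, y)."""
    return {y: N.get((x, y), 0.0) for y in U}

print(meaning("red", N, U))   # {'object1': 1.0, 'object2': 0.6, 'object3': 0.1}
```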

Fig. 12

The components of a fuzzy language: \(U\) = universe of discourse; \(T\) = term set; \(E\) = embedding set for \(T\); \(N\) = naming relation from \(E\) to \(U\); \(x\) = term; \(y\) = object in \(U\); \(\mu _N (x,y)\) = strength of the relation between \(x\) and \(y\); \(\mu _T (x)\) = grade of membership of \(x\) in \(T\). [40, p. 136]

Fig. 13

Membership functions of fuzzy sets \(M\) (young), \(M\) (middle-aged) and \(M\) (old), [40, p. 140]

Zadeh concluded this paper by noting that “the theory of fuzzy languages is in an embryonic stage”, but he expressed his hope that, based on this framework, better models for natural languages would be developed than those of the “restricted framework of the classical theory of formal languages.” [40, p. 163]

Later in the 1970s he published important papers summarizing and developing the concepts presented above: in 1973 “Outline of a new approach to the analysis of complex systems and decision processes” [50] appeared in the IEEE Transactions on Systems, Man, and Cybernetics; in 1975 the three-part article “The Concept of a Linguistic Variable and its Application to Approximate Reasoning” [51] appeared in the journal Information Sciences; in the same year Zadeh published “Fuzzy Logic and Approximate Reasoning” in the philosophical journal Synthese [52]; and in 1978 Zadeh published “PRUF – a meaning representation language for natural languages” in the International Journal of Man-Machine Studies [53].Footnote 4

During the 1970s the Berkeley psychologist Eleanor Rosch developed her prototype theory on the basis of empirical studies. This theory assumes that people perceive objects in the real world by comparing them to prototypes and then ordering them accordingly. In this way, according to Rosch, word meanings are formed from prototypical details and scenes and then incorporated into lexical contexts depending on the context or situation. Rosch hypothesized that different societies process perceptions differently depending on how they go about solving problems [54]. When the linguist George Lakoff (born 1941, Fig. 14 (b)) heard about Rosch’s experiments, he was working at the Center for Advanced Study in Behavioral Sciences at Stanford. During a discussion about prototype theory, someone there mentioned Zadeh’s name and his idea of linking English words to membership functions and establishing fuzzy categories in this way. Lakoff and Zadeh met in 1971/72 at Stanford to discuss this idea and also the idea of fuzzy logic, after which Lakoff wrote his paper “Hedges: A Study in Meaning Criteria and the Logic of Fuzzy Concepts” [55]. In this work, Lakoff employed “hedges” (meaning barriers) to categorize linguistic expressions, and he coined the term “fuzzy logic”, whereas Goguen had used “logic of inexact concepts”.

Fig. 14

(a): Joseph Goguen, (b): George Lakoff and (c): Ebrahim Mamdani

Based on his later research, however, Lakoff decided that fuzzy logic was not an appropriate logic for linguistics. But, “Inspired and influenced by many discussions with Professor G. Lakoff concerning the meaning of hedges and their interpretation in terms of fuzzy sets,” Zadeh had also written an article in 1972, “A Fuzzy Set-Theoretic Interpretation of Hedges”, in which he contemplated “linguistic operators”, which he called “hedges”. Here he wrote:

A basic idea suggested in this paper is that a linguistic hedge such as very, more, more or less, much, essentially, slightly etc. may be viewed as an operator which acts on the fuzzy set representing the meaning of its operand. [56]
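
The following minimal sketch illustrates the idea of hedges as operators on membership functions, in the spirit of Zadeh’s proposal; the concrete operators (squaring for “very”, square root for “more or less”) and the fuzzy set old are assumptions made for illustration.

```python
# Hedges as operators acting on a membership function (illustrative sketch).

def mu_old(age):
    # Assumed membership function of the fuzzy set "old" on ages 0..100.
    return min(1.0, max(0.0, (age - 50) / 30.0))

def very(mu):
    """Concentration: sharpens a fuzzy set."""
    return lambda y: mu(y) ** 2

def more_or_less(mu):
    """Dilation: broadens a fuzzy set."""
    return lambda y: mu(y) ** 0.5

mu_very_old = very(mu_old)
mu_more_or_less_old = more_or_less(mu_old)

print(mu_old(65), mu_very_old(65), mu_more_or_less_old(65))  # 0.5, 0.25, ~0.71
```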

In the 1970s Zadeh had expected that his theory of Fuzzy Sets “provides an approximate and yet effective means of describing the behavior of systems which are too complex or too ill-defined to admit of precise mathematical analysis.” [50, p. 28] He had expected that, “even at its present stage of development”, his new fuzzy method

can be applied rather effectively to the formulation and approximate solution of a wide variety of practical problems, particularly in such fields as economics, management science, psychology, linguistics, taxonomy, artificial intelligence, information retrieval, medicine and biology. This is particularly true of those problem areas in these fields in which fuzzy algorithms can be drawn upon to provide a means of description of ill-defined concepts, relations, and decision rules. [50, p. 44]

In an interview that Zadeh gave in 1994, he mentioned his surprise that Fuzzy Logic was “embraced by engineers” and “used in industrial process controls and in ‘smart’ consumer products such as hand-held camcorders that cancel out jittering and microwaves that cook your food perfectly at the touch of a single button.” In that interview he also said that he had “expected people in the social sciences – economics, psychology, philosophy, linguistics, politics, sociology, religion and numerous other areas to pick up on it [Fuzzy Logic]. It’s been somewhat of a mystery to me, why even to this day, so few social scientists have discovered how useful it could be.” [57]Footnote 5

However, it was the concept of fuzzy algorithms that fell on fertile ground first: Ebrahim H. Mamdani (1942–2010, Fig. 14 (c)),Footnote 6 a professor of electrical engineering at Queen Mary College in London, had read Zadeh’s article [50] shortly after it was published, and he directed his doctoral student Sedrak Assilian to attempt the realization of a fuzzy system under laboratory conditions. He also pointed to this paper in the article that he published together with Assilian after Assilian had finished his Ph.D. thesis:

The true antecedent of the work described here is an outstanding paper by Zadeh (1973) which lays the foundations of what we have termed linguistic synthesis \(\ldots \) and which had also been described by Zadeh as approximate reasoning (AR). In the 1973 paper Zadeh shows how vague logical statements can be used to derive inferences (also vague) from vague data. The paper suggests that this method is useful in the treatment of complex humanistic systems. However, it was realized that this method could equally be applied to ‘hard’ systems such as industrial plant controllers. [60, p. 325]

This was the kick-off for the “Fuzzy Boom”, while Zadeh’s primary intention receded into the background for decades.

6 A Real-World Fuzzy Application System

The potential of the new techniques of fuzzy sets and fuzzy systems had stimulated Mamdani to attempt the implementation of a real-world fuzzy system, and Sedrak Assilian designed a fuzzy algorithm to control a small steam engine (Fig. 15) within a few days. The concepts of linguistic variables and the max-min composition were suitable for establishing fuzzy control rules because input, output and state of the steam engine system range over fuzzy sets. Thus, Assilian and Mamdani designed the first real fuzzy application when they controlled the system, with the input variables heat and throttle and the output variables pressure and speed (Fig. 16), by a fuzzy rule-based system.

“It was an experimental study which became very popular. Immediately we had a steam engine, and the idea was to control the steam engine. We started working at Friday, and – I do not remember clearly – by Sunday it was working”, he said in an interview in 2008 [61, p. 75].

In 1974, Assilian completed his Ph.D. thesis on this first fuzzy control system. Unfortunately, no other facts about Assilian are available; he does not appear in later literature about Fuzzy Set Theory and its applications.

The entire system consisted of the combination of a steam engine and a boiler (see Fig. 17). The steam was supposed to reach a certain predetermined pressure within the boiler; this was achieved by regulating the temperature. The engine was to run as consistently as possible at a particular piston speed, for which purpose a throttle was installed. This was therefore a system with two inputs (heat supplied to the boiler, engine throttle) and two outputs (pressure in the boiler, engine speed) (see Fig. 16), all of which range over fuzzy sets and which Assilian and Mamdani controlled by a fuzzy rule-based system.

Fig. 15

Photograph of the “Fuzzy steam engine”, Queen Mary College, 1974, reprint courtesy of Brian Gaines, see also: [62, p. 18]

Sensors constantly monitored the boiler and indicated the current pressure. If the prevailing pressure corresponded to the set point value, then nothing needed be done. If it deviated from the set point, then some action had to be taken, and this task was to be assumed by an automatic fuzzy controller.

Fig. 16

The process variables of the fuzzy steam engine [p. 31]

Simple identification tests on the plant proved that it is highly nonlinear with both magnitude and polarity of the input variables. Therefore the plant possesses different characteristics at different operating points, so that the direct digital controller implemented for comparison purposes had to be retuned (by trial and error) to give the best performance each time the operating point was altered. [63, p. 2]

Fig. 17

The system consisting of a steam engine and a boiler, [62, p. 18]

Assilian and Mamdani defined six linguistic variables (four input and two output variables):

  1. PE (Pressure Error), defined as the difference between the actual value and the set point of the pressure in the boiler.

  2. SE (Speed Error), defined as the difference between the actual value and the set point of the piston speed.

  3. CPE (Change in Pressure Error), defined as the difference between the actual value of PE and its most recent value.

  4. CSE (Change in Speed Error), defined as the difference between the actual value of SE and its most recent value.

  5. HC (Heat Change), an action variable, as the result of which a command occurs.

  6. TC (Throttle Change), an action variable, as the result of which a command occurs.

They introduced linguistic terms for the variables: PB (Positive Big), PM (Positive Medium), PS (Positive Small), P0 (Positive Zero), N0 (Negative Zero), NS (Negative Small), NM (Negative Medium), and NB (Negative Big). The variables were distributed over a number of points in accordance with their universes of discourse.

  • For the variables PE and SE there were \(13\) points, which ranged from the maximum negative error through zero to the maximum positive error, with the zero being divided into a “negative zero error” N0 and a “positive zero error” P0 (“N0 – just below the set point \(\ldots \) P0 – just above the set point”) [63, p. 7f].

  • The variables CPE and CSE were similarly quantized.

  • The variable HC was ultimately quantized over \(15\) points.

  • Similarly, the variable TC was distributed over five points.

Mamdani and Assilian formed the fuzzy sets subjectively and then defined 24 IF-THEN rules. Table 1 gives three rules as examples, represented as in [63]. For the sake of simplicity, the authors of that work did not differentiate between “positive zero” and “negative zero”.

Table 1 Examples of Mamdani’s and Assilian’s IF-THEN rules in [63]

These rule relationships were implemented as fuzzy relations, for which Zadeh had already indicated the max-min composition rule in his first publication on Fuzzy Sets. From them, a PDP 8/S digital computer [62, p. 17] calculated a corresponding fuzzy set as a value for the output variable. This method can be represented graphically in the following way (see Fig. 18):

Fig. 18

Illustration of the application of the max-min rule, based on [64, p. 161]

The sensors indicate sharp values for the input variables pressure deviation and its change, whose membership values with respect to the corresponding fuzzy sets can be read on the triangular membership functions. In the illustrated example for rule 1, the membership value with respect to the fuzzy set pressure deviation PS is \(0.2\) and it is \(0.4\) with respect to the fuzzy set change in pressure deviation N. Today this part of the fuzzy control process is known as “fuzzification”.

The max-min rule prescribes that the minimum of these two values is computed first (in the example for rule 1 illustrated above, this value is \(0.2\)). Accordingly, after executing this rule alone, the output command was “Change heat supply NS” with a membership value of \(0.2\). Rule 1 thus yields a triangular membership function that is truncated at the value \(0.2\) – a trapezoidal membership function. However, rule 2 and rule 3 have also fired, and so they must be evaluated analogously and in parallel with rule 1. The final membership function for the fuzzy set as a value of the output variable heat change is ultimately composed of the trapezoidal membership functions of the individual rule results. This composition occurs according to the max-min rule by forming the maximum of the membership functions of all three output fuzzy sets.
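
The following minimal sketch reproduces this inference step: each rule truncates its output fuzzy set at the degree to which its antecedents are fulfilled (minimum), and the truncated sets are then combined pointwise by the maximum. The membership functions and the two rules below are illustrative assumptions, not Assilian’s original tables.

```python
# Sketch of Mamdani-style min/max inference for one loop (heat change).

def tri(a, b, c):
    """Triangular membership function with peak at b."""
    def mu(u):
        if u <= a or u >= c:
            return 0.0
        return (u - a) / (b - a) if u <= b else (c - u) / (c - b)
    return mu

# assumed fuzzy sets on the input universes
PE_PS  = tri(0.0, 1.0, 2.0)      # pressure error "positive small"
CPE_NS = tri(-2.0, -1.0, 0.0)    # change in pressure error "negative small"
CPE_NM = tri(-4.0, -2.5, -1.0)   # change in pressure error "negative medium"
# assumed fuzzy sets on the output universe (heat change)
HC_NS = tri(-2.0, -1.0, 0.0)
HC_NM = tri(-4.0, -2.5, -1.0)

rules = [
    ((PE_PS, CPE_NS), HC_NS),    # IF PE is PS and CPE is NS THEN HC is NS
    ((PE_PS, CPE_NM), HC_NM),    # IF PE is PS and CPE is NM THEN HC is NM
]

def infer(pe, cpe, universe):
    """Aggregated output fuzzy set over the given (discretized) output universe."""
    output = {}
    for (mu_pe, mu_cpe), mu_out in rules:
        firing = min(mu_pe(pe), mu_cpe(cpe))               # degree of fulfilment
        for u in universe:
            clipped = min(firing, mu_out(u))                # truncate consequent
            output[u] = max(output.get(u, 0.0), clipped)    # aggregate by max
    return output

universe = [x / 2.0 for x in range(-8, 1)]                  # -4.0 .. 0.0
print(infer(1.2, -0.6, universe))
```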

Just how was the output variable heat change supposed to be adjusted, though? For this a sharp (that is, crisp or non-fuzzy) value is required, and Mamdani and Assilian decided on a simple procedure:

Various considerations may influence the choice procedure depending on the particular application and in our case effectively that action is taken which has the largest membership grade. It is possible of course that more than one peak or a flat is obtained as illustrated below [see Fig. 19]:

Negative deviations signify a movement toward the set point, positive deviations signify a movement away from the set point [60, p. 627].

Fig. 19

Illustration of the selection of the centroid as a defuzzification method devised by Assilian and Mamdani [63, p. 627]

“The particular procedure in our case takes the action indicated by the arrow, which is midway between the two peaks or at the centre of the plateau.” [63, p. 5]
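A minimal sketch of this selection procedure, assuming the output fuzzy set is given over a finite universe of crisp actions; the numerical values are illustrative assumptions only.

```python
# Defuzzification as described above: take the action(s) with the largest
# membership grade and, if several share that maximum (two peaks or a
# plateau), take the value midway between them.

def defuzzify(output):
    """output: dict mapping crisp action values to membership grades."""
    peak = max(output.values())
    maximisers = sorted(u for u, g in output.items() if g == peak)
    return (maximisers[0] + maximisers[-1]) / 2.0   # midpoint of peaks/plateau

fuzzy_action = {-3.0: 0.1, -2.0: 0.6, -1.5: 0.6, -1.0: 0.6, 0.0: 0.2}
print(defuzzify(fuzzy_action))   # -1.5, the centre of the plateau
```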

In his doctoral thesis, entitled Artificial Intelligence in the Control of Real Dynamic Systems, which Assilian produced in response to this fuzzy control problem [62], he wrote that the control strategy they had realized was one that a human operator could use to control a steam engine.

These control policies were established first by imagining the entire state space (PE \(\times \) CPE \(\times \) SE \(\times \) CSE) to be divided into a number of areas, and second, writing down a control policy for each of these areas. Obviously, the first set of rules obtained in this manner does not necessarily produce the best quality of control possible \(\ldots \) [62, p. 135]

Fig. 20

FC commands for the steam engine designed by Assilian and Mamdani

Figure 20 shows the “Fuzzy control instructions for heat-pressure loop of steam engine” [60, p. 627]. This control algorithm was thus profoundly subjective. Not only the algorithm but also the membership functions had been designed subjectively. Yet as Assilian and Mamdani managed to demonstrate, this Fuzzy Control (FC) system exceeded the performance of conventional control systems in several ways (see Fig. 21):

  • Much less information is required for FC than for conventional control.

  • The verbal knowledge of human experts did not have to be mathematically exact in order to be processed by the automatic control.

  • Errors were reduced little by little until the set point could be reached; digital controllers “overshot” this target instead.

  • The FC system worked faster than a conventional control system; the possibility of processing the parallel firing of several rules at the same time shortened the required control time.

Fig. 21

The result of the Assilian-Mamdani FC (\(\circ \)) compared to a conventional controller (Direct Digital Control (DDC) algorithm, damped (\(\square \)) and undamped (\(x\))), [63, p. 6]

With this fuzzy control of a steam engine – or more precisely of a combination of a boiler and a steam engine – the essential principles for the construction of an entire class of fuzzy control systems were established, and Mamdani went ahead. Already in January 1976 he organized – together with Brian Gaines, then a professor of computer science at Essex University – a workshop on “Discrete Systems and Fuzzy Reasoning”, held at London’s Queen Mary College. At this workshop some similar projects to control technical systems using fuzzy algorithms were presented, e.g. a basic oxygen steel making process at the British Steel Corporation in Cambridge, England [65], a sinter making plant at the British Steel Corporation in Middlesbrough, England [66, 67], and a pilot scale batch chemical process at the Warren Spring Laboratory in Stevenage, England [68, 69]. Other FC investigations of this time were an FC system to control a warm water plant at the Delft Technical University in the Netherlands [70] and a heat exchanger at McMaster University in Canada.

The step from small laboratory systems to the first large-scale commercial fuzzy-controlled system was taken very soon. The first “big science” FC system was built in Denmark by Jens-Jørgen Østergaard and Lauritz Peter Holmblad, who had joined the company F. L. Smidth & Co. upon graduation from the Technical University of Copenhagen. It was a system for the automatic control of a cement kiln. Attempts to automate cement production had always failed in the past because the process of cement burning is highly complex, kilns do not behave linearly and only a few measurements can be taken during the process [71]. The fuzzy cement kiln developed by Holmblad and Østergaard, however, functioned very successfully and reliably. It was the starting point of the “Fuzzy Boom” that began in the 1980s in Japan and later pervaded the Western hemisphere. Many fuzzy applications, such as domestic appliances, cameras and other devices, appeared in the last two decades of the 20th century. Of greater significance, however, was the development of fuzzy process controllers and fuzzy expert systems that served as trailblazers for scientific and technological advancements of fuzzy sets and systems.

7 Fuzzy Sets in Humanities and Social Sciences

In 1969 Zadeh proposed his new theory of fuzzy sets to biologists: “The great complexity of biological systems may well prove to be an insuperable block to the achievement of a significant measure of success in the application of conventional mathematical techniques to the analysis of systems.” [72] “By ‘conventional mathematical techniques’ in this statement, we mean mathematical approaches for which we expect precise answers to well-chosen precise questions concerning a biological system that has a high degree of relevance to its observed behaviour. Indeed, the complexity of biological systems may force us to alter in radical ways our traditional approaches to the analysis of such systems. Thus, we may have to accept as unavoidable a substantial degree of fuzziness in the description of the behaviour of biological systems as well as in their characterization.” [72]

We find great complexity not only in biological systems but also in social sciences and humanities. At the end of the 1960s and for a greater audience two years later, Zadeh wrote more generally: “What we still lack, and lack rather acutely, are methods for dealing with systems which are too complex or too ill-defined to admit of precise analysis. Such systems pervade life sciences, social sciences, philosophy, economics, psychology and many other ‘soft’ fields.” [16]

Zadeh was inspired by the “remarkable human capability to perform a wide variety of physical and mental tasks without any measurements and any computations”, e.g. parking a car, playing golf, deciphering sloppy handwriting, and summarizing a story. He distinguished between mechanical (or inanimate or man-made) systems on the one hand and humanistic systems on the other hand, and he saw the following state of the art in computer technology:

Unquestionably, computers have proved to be highly effective in dealing with mechanistic systems, that is, with inanimate systems whose behavior is governed by the laws of mechanics, physics, chemistry and electromagnetism. Unfortunately, the same cannot be said about humanistic systems, which – so far at least – have proved to be rather impervious to mathematical analysis and computer simulation.

He defined a “humanistic system” to be

a system whose behaviour is strongly influenced by human judgment, perception or emotions. Examples of humanistic systems are: economic systems, political systems, legal systems, educational systems, etc. A single individual and his thought processes may also be viewed as a humanistic system. [51, Part I, p. 200]

Zadeh summarized “that the use of computers has not shed much light on the basic issues arising in philosophy, literature, law, politics, sociology and other human-oriented fields. Nor have computers added significantly to our understanding of human thought processes – excepting, perhaps, some examples to the contrary that can be drawn from artificial intelligence and related fields.” [51, Part I, p. 200]

Thus, hard computing has been very successful in the hard sciences but not nearly as successful with humanistic systems in the soft sciences. Therefore the field of applications of soft computing should be opened to the soft sciences. This is what Zadeh had in mind when he proposed the notion of soft computing:

I expected people in the social sciences-economics, psychology, philosophy, linguistics, politics, sociology, religion and numerous other areas to pick up on it. It’s been somewhat of a mystery to me why even to this day, so few social scientists have discovered how useful it could be. Instead, Fuzzy Logic was first embraced by engineers and used in industrial process controls and in ‘smart’ consumer products such as hand-held camcorders that cancel out jittering and microwaves that cook your food perfectly at the touch of a single button. I didn’t expect it to play out this way back in 1965.” [57].

The field of Soft Computing in Humanities and Social Sciences is at a turning point. Not very long ago, the very label seemed a little bit odd: Soft Computing is a technological field, while the Humanities and Social Sciences lie at the other pole of the academic spectrum. In recent years, however, this has changed. The strong distinction between “science” and “humanities” has been criticized from many fronts and, at the same time, increasing cooperation between the so-called “hard sciences” and “soft sciences” is taking place in a wide range of scientific projects dealing with very complex and interdisciplinary topics [73].

In the last fifteen years the area of Soft Computing has also experienced a gradual rapprochement with disciplines in the Humanities and Social Sciences [58, 74].

8 Outlook: Computing with Words and Perceptions

Artificial Intelligence (AI) was born in the 1950s in the USA and spread to many scientific and technological communities throughout the world. The history of AI is a story of several successes, but it has lagged behind expectations. AI became a field of research aiming to build computers and computer programs that act “intelligently” without being controlled by a human being. AI methods were mostly logic-based and designed to find exact solutions; however, not all problems can be resolved with these methods. Humans, on the other hand, are able to resolve such tasks very well, as Lotfi Zadeh pointed out in many speeches and articles. He concluded that “thinking machines” do not think as humans do. From the mid-1980s he focused on “Making Computers Think like People” [75]. For this purpose, the machine’s ability “to compute with numbers” was to be supplemented by an additional ability closer to human thinking. In the 1990s he established Computing with Words (CW) [76, 77], instead of exact computing with numbers, as a method for reasoning and computing with perceptions based on the theory of fuzzy sets. In his article “Fuzzy Logic = Computing with Words” of May 1996, he stated that “the main contribution of fuzzy logic is a methodology for computing with words. No other methodology serves this purpose.” [76, p. 103] Three years later he wrote “From Computing with Numbers to Computing with Words – From Manipulation of Measurements to Manipulation of Perceptions”, to show that a new Computational Theory of Perceptions (CTP) is based on the methodology of CW. In CTP, words play the role of labels of perceptions and, more generally, perceptions are expressed as propositions in natural language [77, p. 105].

Fig. 22

Perception-based system modeling, [78]

As we said already, he was inspired by the “remarkable human capability to perform a wide variety of physical and mental tasks without any measurements and any computations. [...] Underlying this capability is the brain’s crucial ability to reason with perceptions – perceptions of time, distance, speed, force, direction, shape, intent, likelihood, truth and other attributes of physical and mental objects.” [77, p. 105]. Zadeh intended to establish a new dimension of artificial intelligence [78, p. 73]. He received an opportunity to propose these considerations concerning “A New Direction in AI” to the AI community at the beginning of the new millennium, when his manuscript was accepted for the AI Magazine issue in the spring of 2001 [78].

In this article he presented a new view of system theory, namely perception-based system modeling, in which the input, the output and the states are assumed to be perceptions (Fig. 22).

The 50th anniversary of a scientific theory is a good opportunity to cast a retrospective look at its consequences and achievements. Many aspects of this history are a matter of course, such as definitions of the theory’s entities, theorems and protagonists of important developments. However, some facts were unknown to most interested persons and even to some specialists. The original research work on the history of the theory of fuzzy sets, as presented in this chapter, shows that this history cannot be comprehended without reflecting on the history of system theory; moreover, fuzzy set theory must be regarded as an inherent part of that history. This deep connection is evident from the very beginning of Zadeh’s scientific career all the way up to his recent lectures and articles. With his varying views on system theory, Computing with Words and the Computational Theory of Perceptions, he postulated new directions for science and technology in the fields of information science, computer science, and artificial intelligence. Perhaps the lesson to be learned from this history is that creating new views is one of the most effective means of keeping a scientific theory – such as fuzzy set theory – alive.