Abstract
The paper discusses the linear separability of data classes and the relationship between threshold neurons and class classifiers. We show theoretically that it is possible to construct a neuron that can solve two different problems without any intermediate change to its parameters or architecture. The idea is illustrated with a specific example of a neuron that simultaneously computes both the Boolean functions “AND” and “OR”. A conclusion is drawn about the existence of a neuron that can solve a class of infinitely many problems. A necessary condition for this is that the domains of the problems are linearly separable by a surface in the input data space and that parallel linear classifiers exist for each individual problem.
1 Introduction
Let a set of objects be given
Each object \(A_i\), i = 1…m, is characterized by n different attributes of arbitrary type. In the general case, when considering linear separability, it is not necessary to define the type of the individual attributes in advance; what matters is the possibility of grouping the objects into classes through these attributes. One attribute can be qualitative, for example “color” or “shape”, another quantitative, for example “weight” or “temperature”, and, if necessary, each qualitative characteristic can be given some quantitative expression.
Following this line of thought, if we introduce quantitative correspondences for the qualitative characteristics, then we can describe each object with quantitative characteristics only. For example, if for the colors we set green = 28, and for the shapes polygon = 123, then the object “green polygon” is:
and it can be represented as:
To simplify the reasoning, let us assume that all the characteristics of the considered objects are quantitative values. This in turn means that each object of the set A can be considered as a point in the n-dimensional space of the attributes (Fig. 1), i.e.,
Regarding the linear separability of A, the set can be represented as a union of two linearly separable classes [1, 2]:
if there exists a plane \(\alpha\) with representation:
such that for \(\forall~B_{i } \left( {x_{i1} , x_{i2} , \ldots x_{in} } \right) \in B,\) the inequality is fulfilled
and \(\forall~C_{j } \left( {x_{j1} , x_{j2} , \ldots x_{jn} } \right) \in C\)—respectively:
Let us consider a two-dimensional space of attributes, assuming that all objects of the set A are characterized by only two qualities. In this case (n = 2), the plane (1) reduces to its two-dimensional analogue, a straight line with the equation:
where \(x_1\) and \(x_2\) are the coordinates of an arbitrary point of \(\alpha\).
Transferred to \(E^2\) (Fig. 2), based on the two-dimensional variant of conditions (2) and (3), classes \(B\) and \(C\) are linearly separable if for \(\forall~B_{i } \left( {x_{i1} , x_{i2} } \right) \in B\) the inequality is fulfilled:
Since conditions (2′) and (3′), which classify the belonging of an object to one of the two classes, are based on equation (1′), the line \(\alpha\) is a linear classifier for B and C.
There are many optimization algorithms for finding a suitable classifier that can effectively separate classes [3, 4] and some testing methods for linear separability [5]. In our case, however, we are interested not so much in finding the most efficient separation, but in finding suitable parallel classifiers. Let us pay attention to the fact that if two classes are bounded and linearly separable, an infinite number of linear classifiers can be indicated (Fig. 3), which separate them in the way described by (1)–(3).
Let us look at the two-dimensional Boolean function “AND” (Table 1). Its domain consists of 4 points:
Let us introduce the class of points \(B\), containing the elements of \(D\) for which \(X_1 \wedge X_2 = 0\), and the class \(C\), containing the points of \(D\) for which \(X_1 \wedge X_2 = 1.\) Given the essence of the Boolean function “AND”, we look for a classifier of type (\(1^{\prime}\)) for which the following system is satisfied:
It is obvious that the free term b must meet the conditions:
Then, for any two specific positive numbers w1 and w2, we can find a corresponding value of b that defines a classifying straight line of type (1′), which will be only one of infinitely many members of the same family of classifiers. If, for example, w1 = 0.3 and w2 = 0.7, then according to the system (4) we can choose b arbitrarily:
One possible solution is b = −0.8. Thus, this particular representative of the whole possible class is given by the equation:
For the coordinates of each point \(\left( {x_1 , x_2 } \right) \in \alpha_1\), the following is satisfied
This is a straight line passing through the point (0, 1.143) with angular coefficient k = −(0.3/0.7) ≈ −0.429. Figure 4 shows an illustration of other possible solutions:
\(\alpha_2 : 0.3x_1 + 0.7x_2 - 0.9 = 0\),
which has the same angular coefficient as that of \(\alpha_1\), and
\(\alpha_3 :2x_1 + 9x_2 - 10 = 0,\)
with angular coefficient k = −0.222.
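The three classifiers above can be checked numerically. The short sketch below (our own illustration; the function and variable names are not from the paper) verifies that each line yields a positive value only at the point (1, 1), i.e., exactly where \(X_1 \wedge X_2 = 1\):

```python
# The four points of the "AND" domain, labeled with X1 AND X2.
AND_POINTS = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

def separates(w1, w2, b, labeled_points):
    """True if sign(w1*x1 + w2*x2 + b) matches the class label of every point."""
    return all((w1 * x1 + w2 * x2 + b > 0) == bool(label)
               for (x1, x2), label in labeled_points.items())

for name, (w1, w2, b) in {
    "alpha1": (0.3, 0.7, -0.8),
    "alpha2": (0.3, 0.7, -0.9),
    "alpha3": (2.0, 9.0, -10.0),
}.items():
    print(name, separates(w1, w2, b, AND_POINTS))  # all three print True
```

Any line from the family with a free term in the admissible range behaves the same way, which illustrates the infinite family of classifiers mentioned above.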
2 Threshold Neuron and Its Relationship with Linear Classifiers
Let us consider a threshold neuron with n inputs which are activated through the following step function:
If the weight vector \(\overrightarrow W\) has coordinates (\(w_1 ,w_2 , \ldots ,w_n )\), at input stimuli \(\left( {x_1 ,x_2 , \ldots x_n } \right)\) along the axon of the neuron, a signal will propagate with the following value:
where b is the threshold of the neuron [6, 7]. Given the peculiarities of the activating function, it is clear that this signal has a binary character: it will be “0” or “1”. The neuron thus defined is called a threshold logic unit, or TLU (Fig. 5), or simply a threshold neuron [8]. The term perceptron is often used as a synonym for threshold logic unit, although a perceptron is generally much more than a simple threshold logic unit [9].
Based on the way the TLU is constructed, it is clear that if the stimuli \(\left( {x_1 , x_2 , \ldots x_n } \right)\) appear in its dendritic tree, at fixed weights \(\left( {w_1 ,w_2 , \ldots ,w_n } \right)\), only the following results are possible:
Thus, unlike other neurons, TLU easily realizes the concept of “all or nothing,” which makes it especially suitable for solving logical problems [10].
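A TLU of this kind can be sketched in a few lines. Since the step function (5) was elided in this copy, the sketch assumes the unit fires “1” when the total signal is strictly positive, consistent with the inequalities in conditions (2) and (3):

```python
def tlu(weights, b, inputs):
    """Threshold logic unit: binary step applied to the weighted input sum.

    Assumption: the activation outputs 1 when sum(w_i * x_i) + b > 0
    and 0 otherwise (the paper's step function (5) is elided here).
    """
    s = sum(w * x for w, x in zip(weights, inputs)) + b
    return 1 if s > 0 else 0

# A 3-input example with arbitrary illustrative weights and threshold:
print(tlu((0.5, -0.2, 0.1), -0.3, (1, 0, 1)))
```

Whatever the stimuli, the output is always 0 or 1, which is exactly the “all or nothing” behavior described above.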
We will consider a special case of threshold neuron use, namely the one in which the set of input data is linearly separable with respect to a given attribute.
Thus, let the domain of the input variables be linearly separable with respect to the ordered n-tuples \(\left( {x_1 , x_2 , \ldots x_n } \right)\) leading to the appearance of “1” at the output of the neuron, and those \(\left( {x_1 , x_2 , \ldots x_n } \right)\) which generate the result “0”. From the assumed linear separability, it follows that there is a plane:
for \(\forall \left( {x_{01} ,x_{02} , \ldots x_{0n} } \right)\), for which at the output we have
the inequality is fulfilled
and for \(\forall \left( {x_{11} , x_{12} , \ldots x_{1n} } \right)\), for which at the output we have
The presented information about the linear classifiers outlines their connection with the threshold neurons. If we look at the total signal in the body of the artificial neuron, we will notice the obvious classification form:
while the classification itself is done through the activating function (5). Thus, the action of the trained threshold neuron could be considered as an act of classification of the linearly separable domain of its input variables. This essential relationship between TLU and linear classifiers allows us to draw two very important conclusions:
(1) Based on each mathematically constructed linear classifier, we can form a threshold neuron that is genetically prepared to solve the classifier's problem. Consider, for example, one of the classifiers for the Boolean function “AND” built above:

$$\alpha_1 : 0.3x_1 + 0.7x_2 - 0.8 = 0$$

It is clear that we could form the neuron as shown in Fig. 6. Along the axon, we have a signal which, given the weights and threshold, fully corresponds to the Boolean function “AND”.
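This correspondence can be checked directly. The sketch below (our own illustration, assuming a strictly positive firing condition for the elided activation (5)) builds a neuron from the coefficients of \(\alpha_1\) and reproduces the “AND” truth table:

```python
def step(s):
    """Assumed step activation: 1 for a strictly positive signal, else 0."""
    return 1 if s > 0 else 0

def and_neuron(x1, x2):
    """TLU built from classifier alpha_1: weights 0.3 and 0.7, threshold 0.8."""
    return step(0.3 * x1 + 0.7 * x2 - 0.8)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, and_neuron(x1, x2))  # matches the truth table of AND
```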
(2) We mentioned that if two classes are linearly separable, then there are infinitely many linear classifiers separating the objects in the two classes. And since a classifier is uniquely determined by the coefficients \(w_1 , w_2 , \ldots , w_n , b,\) from which we can subsequently form a corresponding threshold neuron, this means something very important to us, namely: for each specific problem with a linearly separable compact domain of the input variables, there are infinitely many threshold neurons that are able to solve it.
On the other hand, from the existence of infinitely many, but let us emphasize now, parallel linear classifiers, follows the existence of neurons:
with weights for which
and associated with planes
3 Solving Multiple Problems from a Single Threshold Neuron
An interesting question is whether the same neural network can solve a set of several tasks. There are various studies on this issue. For example, Kirkpatrick et al. apply a special type of neural network regularization, associated with sequential training of the network to solve two tasks, A and B [11]. When training on the second task B, the weights important for the first task A are retained while gradient descent continues. On the other hand, Yang et al. train single network models to perform 20 cognitive tasks that depend on working memory, decision making, categorization, and inhibitory control [12]. The authors find that after training, recurrent units can be grouped into clusters that are functionally specialized for different cognitive processes, and they introduce a simple but effective measure to quantify relationships between single-unit neural representations of tasks. In the present study, we focus on the possibility of a single threshold neuron solving two logical functions simultaneously, without the need for sequential training on both tasks, as in the networks of Kirkpatrick et al.
The connection of threshold neurons with linear classifiers creates preconditions for searching for ways to apply linear separation in neural networks. We have already mentioned that for each specific problem with a bounded, closed, and linearly separable domain of the input data, infinitely many neurons can be constructed that are genetically prepared to solve it. Let us look at things from a different perspective and ask: is it possible to construct a threshold neuron in a way that allows it to solve several different problems?
Let two problems be given, related to finding the binary values of different functions of n-tuples: \(f_1 = f_1 \left( {x_1 , x_2 , \ldots x_n } \right)\) and \(f_2 = f_2 \left( {x_1 , x_2 , \ldots x_n } \right)\). Let the domains D.O1 and D.O2 of the two functions be compact and linearly separable with respect to the n-tuples \(\left( {x_1 , x_2 , \ldots x_n } \right)\) for which \(f_1\) and \(f_2\) return “0” and “1”, respectively. Under these conditions, it follows that there are threshold neurons \(H_1\) and \(H_2\) which successfully solve the two problems (Fig. 7) in the following way:
However, is there a vector \(\left( {w_1 ,w_2 , \ldots ,w_n } \right)\) and a value of \(b\) with which both problems can be solved by a single neuron? What would such a threshold neuron look like? If we submit only the variables \(\left( {x_1 ,x_2 , \ldots ,x_n } \right)\) at the input, will the sought TLU know what to do with them? Should it use this data to calculate the function \(f_1 \left( {x_1 ,x_2 , \ldots ,x_n } \right)\), or the value of \(f_2 \left( {x_1 ,x_2 , \ldots ,x_n } \right)\)? Obviously, along with the input data, the neuron needs another input through which to receive the question “How much is \(f_1 = f_1 \left( {x_1 ,x_2 , \ldots ,x_n } \right)\)?” or the task “Calculate \(f_2 = f_2 \left( {x_1 ,x_2 , \ldots ,x_n } \right)\).”
The purpose at first glance is to find the weights \(\left\{ {w_i } \right\}_{i = 1}^n\) and the threshold \(b\), for which:
Now we have another input for the neuron, namely the “question” (Fig. 8). It is also a variable input value, fed to the body of the sought neuron through its dendritic tree along with the other variables. Thus, with the emergence of the need for a question, the input vector takes the shape \(\overrightarrow X \left( {x_1 ,x_2 , \ldots ,x_n ,x_{n + 1} = {\text{question}}} \right)\), and the weight vector becomes \(\overrightarrow W\) \((w_1 ,w_2 , \ldots ,w_n , w_{n + 1} )\). That is why we are actually looking for
We have already discussed that the qualitative characteristics of objects can be represented by quantitative values. And since at the input of the neuron there is no way to ask the question \(x_{n + 1}\) directly by using some linguistic construction, we need a quantitative interpretation that the neuron can process correctly. So, let
and
This way, we look for weights \(\left\{ {w_i } \right\}_{i = 1}^{n + 1}\) of a neuron with n + 1 inputs \(x_1 ,x_2 , \ldots ,x_n , x_{n + 1}\) and a threshold b, for which:
The ability of the neuron \(H_1\) to calculate correct values for the function \(f_1 \left( {x_1 ,x_2 , \ldots ,x_n } \right)\) means that there is a linear classifier of the domain D.O1 of the input variables of the first problem, represented by a plane
\(S_1 : \mathop \sum \limits_{i = 1}^n w_{1i} x_i + b_1 = 0,\)
where \(x_i\) are the coordinates of any point \(M \in S_1\).
Adding the question \(x_{n + 1}\) requires a transition to the (n + 1)-dimensional space of attributes and a linear separation of D.O1 through
In a similar way, for the plane \(S_2\), which linearly separates D.O2 in the (n + 1)-dimensional space of attributes, we have:
where \(x_i\) are the coordinates of any point \(M \in S_2\).
The domain for the threshold neuron we are looking for is:
The neuron H that solves the two problems exists only if this common D.O is linearly separable. However, given that D.O consists of points in the hyperplane \(x_{n + 1} = 0\) of the coordinate space \(Ox_1 x_2 \ldots x_n x_{n + 1}\) and points in its parallel hyperplane \(x_{n + 1} = 1\), from (6)–(8) it follows that a very important condition must be imposed on the neurons \(H_1\) and \(H_2\): the existence of parallel linear classifiers of the types (6) and (7), through which to build the sought plane dividing the common D.O. This condition means that it is necessary to have surfaces S1 and S2, with the representations (6) and (7), respectively, for which the condition of parallelism is fulfilled:
allowing us to construct, from the same family of parallel planes, new planes separating D.O
In other words, from the existence of parallel surfaces S1 and S2 with the representations (6) and (7) follows the possibility of constructing a family of parallel planes containing S1 and S2 \((\lambda = 0~{\text{or}}~1)\) and linearly separating D.O.
Let us look at a specific example: the Boolean functions “AND” and “OR” with the truth tables given in Table 1. The domains of the two functions are linearly separable with respect to the pairs (X1, X2) returning “0” or “1”. Let \(S_1\) be one of the infinitely many lines separating D.O1:
Then there is a threshold neuron H1 solving the function \(f_1 \left( {x_1 , x_2 } \right) = x_1 \wedge x_2\), with weights and threshold given by the coefficients of \(S_1\) (Fig. 9). Similarly, for the Boolean “OR” function we use the classifier:
Through it we can form a neuron H2, solving the function \(f_2 \left( {x_1 , x_2 } \right) = x_1 \vee x_2\).
So, we have two problems
which we can solve quite precisely through two different neurons. Our goal is to build a neuron H, which is capable of solving both problems (Fig. 10).
The classification lines \(S_1\) and \(S_2\) have angular coefficients \(k_1 = - \left( {\frac{{w_{11} }}{{w_{12} }}} \right) = - 1.5\) and \(k_2 = - \left( {\frac{{w_{21} }}{{w_{22} }}} \right) = - 0.8333\), respectively. Obviously, they are not parallel, so we cannot build through them a plane that divides the common domain in an appropriate way.
One possible solution is to keep looking for other linear classifiers for the two functions until we find parallel ones. Another solution is to use the condition of parallelism (9), which in our case has the following representation:
Therefore, we can choose the linear classifier S2, in such a way that
So with the correction (10) of \(w_{22} ,\) we have:
We can easily see that this line is still a linear classifier for the Boolean “OR” function, with an angular coefficient \(k_2 = - \left( {\frac{{w_{21} }}{{w_{22} }}} \right) = - 1.5\), i.e., \(S_1 \parallel S_2\).
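The paper's concrete coefficients for \(S_1\) and the corrected \(S_2\) were elided in this copy, so the check below uses an illustrative parallel pair with the common angular coefficient k = −1.5 stated in the text; the specific free terms are our own assumption:

```python
AND_TABLE = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR_TABLE  = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}

def is_classifier(w1, w2, b, table):
    """True if the line w1*x1 + w2*x2 + b = 0 separates the function's classes."""
    return all((w1 * x1 + w2 * x2 + b > 0) == bool(y)
               for (x1, x2), y in table.items())

# Illustrative parallel pair, both with angular coefficient -w1/w2 = -1.5:
S1 = (1.5, 1.0, -2.0)   # separates the "AND" domain
S2 = (1.5, 1.0, -0.5)   # separates the "OR" domain
print(is_classifier(*S1, AND_TABLE), is_classifier(*S2, OR_TABLE))
```

Both checks succeed, confirming that parallel classifiers for the two functions exist, which is exactly the condition needed to build the common separating plane.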
Let points \(M_1\) and \(M_2 \in S_1\), and points \(M_3\) and \(M_4 \in S_2\). The specific choice of these points is not important; it is enough that they are clearly specified and lie on the two lines, so that through them we can form the vectors with which we will construct a plane containing \(S_1\) and \(S_2\). Let:
Let us now form the vectors \(\vec{p}\) = \(\overrightarrow {M_3 M_1 }\) and \(\vec{q}\) = \(\overrightarrow {M_4 M_2 }\). We have
Then, if the plane S is defined by the vectors \(\vec{p}{\text{ and }}\vec{q}\) and the point \(M_4\), we have
as
For the free term D, we have:
where \(\left( {x_1 , x_2 , x_3 } \right)\) are the coordinates of point M4.
Given the specific values of the coordinates of the vectors \(\vec{p} , \vec{q}\) and the point \(M_4\), it follows that we can construct a linear classifier of the common domain,
which has the representation:
and the corresponding threshold neuron has the construction as shown in Fig. 11.
Let us recall that
where \(x_3\) = 0, if we want to ask a question to the neuron “How much is \(f_1 \left( {x_1 , x_2 } \right) = x_1 \wedge x_2 ?\)”, and \(x_3\) = 1, to demand calculation of the value of \(f_2 \left( {x_1 , x_2 } \right) = x_1 \vee x_2\).
We will do a check with two specific examples. Let us pass the pair of logical variables \((x_1 = 0 , x_2 = 1)\) at the input of the neuron and pose the question “How much is \(f_1 \left( {x_1 , x_2 } \right) = x_1 \wedge x_2 ?\)”. In the dendritic tree, we have input stimuli:
Then, according to (11), a signal propagates along the axon of the neuron
Now let us ask the question to calculate the value of \(f_2 \left( {x_1 , x_2 } \right) = x_1 \vee x_2\). At the same values of \(x_1\) and \(x_2\), at the input of the neuron, we have:
and on the axon—a signal:
Other cases can be checked in a similar way.
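All cases can be checked programmatically. Since the concrete weights of the combined neuron (Fig. 11) are elided in this copy, the sketch below derives a combined three-input neuron from the illustrative parallel pair \(1.5x_1 + x_2 - 2 = 0\) (AND) and \(1.5x_1 + x_2 - 0.5 = 0\) (OR); the third weight 1.5 is chosen so that the question input \(x_3 = 1\) shifts the first classifier onto the second:

```python
def step(s):
    """Assumed step activation: 1 for a strictly positive signal, else 0."""
    return 1 if s > 0 else 0

def h(x1, x2, question):
    """Combined TLU: question = 0 asks for AND, question = 1 asks for OR.

    Illustrative weights (1.5, 1.0, 1.5) and threshold 2.0, built from the
    assumed parallel classifiers S1 (AND) and S2 (OR): at question = 1 the
    third term turns the free term -2.0 into -0.5, i.e., S1 becomes S2.
    """
    return step(1.5 * x1 + 1.0 * x2 + 1.5 * question - 2.0)

for q, name in ((0, "AND"), (1, "OR")):
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(name, x1, x2, "->", h(x1, x2, q))
```

Running the loop reproduces both truth tables from a single set of weights, with no retraining between the two questions.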
4 Conclusion
Examining the linear separability of the data and the relation of threshold neurons to class classifiers, we showed that it is possible to construct a neuron that can solve two different problems without the need for an intermediate change in its weights. But how effective and useful is a neuron that calculates several functions? On the one hand, the use of such a neuron saves neural structures and therefore memory; on the other hand, it increases the number of operations in the body of the neuron by two (one extra multiplication and one extra addition for the additional input).
In conclusion, we should note that, summarizing the presented ideas and results, we can find a neuron that solves an entire class of infinitely many problems whose domains are linearly separated by the found surface S. The only condition is that each individual problem admits linear classifiers that are parallel to one another.
References
Gelig, A., Mateev, A.: Introduction to the Mathematical Theory of Learned Recognition Systems and Neural Networks. St. Petersburg State University (2014)
Bauckhage, C., Cremers, O.: Lecture Notes on Machine Learning: Linear Separability, University of Bonn, Bonn (2019)
Kowalczyk, A.: Support Vector Machines Succinctly, Syncfusion, Inc. (2017)
Picton, P.: Threshold logic: is there finally a solution? In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 45–51 (2016). https://doi.org/10.1109/IJCNN.2016.7727179
Elizondo, D.: The linear separability problem: some testing methods. IEEE Trans. Neural Netw. 17(2) (2006)
El-Shahat, A.: Advanced Applications for Artificial Neural Networks. Georgia Southern University (2018)
Gorzałczany, M.: Essentials of Artificial Neural Networks. Physica, Heidelberg Online ISBN 978-3-7908-1801-7 (2002)
Gurney, K.: An Introduction to Neural Networks. Taylor & Francis e-Library, London and New York (2004)
Kruse, R., Borgelt, C., Braune, C., Mostaghim, S., Steinbrecher, M.: Threshold logic units. In: Computational Intelligence. Texts in Computer Science. Springer, London (2016). https://doi.org/10.1007/978-1-4471-7296-3
Minsky, M.L., Papert, S.A.: Perceptrons. MIT Press, Cambridge (1969)
Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J.: Overcoming catastrophic forgetting in neural networks. DeepMind, London, United Kingdom (2017)
Yang, G., Joglekar, M., Song, H., Newsome, W., Wang, X.-J.: Task representations in neural networks trained to perform many cognitive tasks. Nat Neurosci 22, 297–306 (2019)
Acknowledgements
The work is partly funded by the MU21-FMI-004 project at the Research Fund of the University of Plovdiv “Paisii Hilendarski.”
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Yotov, K., Hadzhikolev, E., Hadzhikoleva, S. (2022). Linear Separability as a Condition for Solving Multiple Problems by a Single Threshold Neuron. In: Pandian, A.P., Fernando, X., Haoxiang, W. (eds) Computer Networks, Big Data and IoT. Lecture Notes on Data Engineering and Communications Technologies, vol 117. Springer, Singapore. https://doi.org/10.1007/978-981-19-0898-9_46
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0897-2
Online ISBN: 978-981-19-0898-9