1 Introduction

In computer vision systems, the interpretation of 3D motion [9], 3D transformations, and 3D face or object matching [6, 10] are important tasks. Many methodologies have been proposed to solve them; however, these methods tend to be time-consuming, computationally expensive, and sensitive to noise. Low complexity, robust performance, and quick convergence of an artificial neural network (ANN) are vital for its wide applicability. This study is devoted to investigating solutions using a neural network with 3D parameters. Any ANN designed for real-life applications must have low complexity and, at the same time, adequate generalization and functional mapping capabilities. The proposed learning machine is a multi-layer network of 3D vector-valued neurons. In a 3D vector-valued neural network, parameters such as threshold values and input–output signals are all 3D real-valued vectors, while the weights associated with connections are 3D orthogonal matrices. All operations in such a neural model are matrix and vector operations. The corresponding 3D vector-valued back-propagation training algorithm is a natural extension of the complex-valued back-propagation algorithm [1, 5] and has the natural ability to learn 3D motion, just as complex-BP learns 2D motion.

In this paper, we explore the characteristics of the 3D vector-valued neural network through a variety of computational experiments. Section 2 explains the architecture of a 3D vector-valued neuron. Section 3 and the Appendix define the 3D vector-valued version of the back-propagation algorithm for a multi-layer network. The learning and generalization ability of this network is investigated using diverse test patterns in Sect. 4. Section 5 works out the neural network and learning algorithm for 3D face matching. Section 6 presents concluding remarks and suggestions for future work.

2 3D Vector-valued neuron

The complexity of an ANN depends on the number of neurons and the learning algorithm. The higher the complexity of an ANN, the more computation and memory intensive it becomes. The number of neurons needed in an ANN is a function of the mapping or classifying power of the neuron itself [7, 8]. Therefore, in the case of high-dimensional problems, it is imperative to look for a higher-dimensional neuron model that can directly process high-dimensional information; such a neuron can serve as the building block of a powerful ANN with fewer neurons. Various researchers have independently proposed extensions of the real-valued neuron (one dimension) to higher dimensions [3, 4]. Most of them follow the natural extension of the number field — real numbers (one dimension), complex numbers (two dimensions), 3D real-valued vectors (three dimensions), and quaternions (four dimensions) — for the representation of higher-dimensional neurons. It is therefore worthwhile to explore the capabilities of 3D vector-valued neurons in function mapping and pattern classification problems in 3D space. The activation function of a 3D vector-valued neuron can be defined as the 3D extension of a real activation function. Let \(V = [V_x, V_y, V_z]^T\) be the net internal potential of a neuron; then its output is defined as:

$$ Y= f(V) = [f(V_x),\,f(V_y),\,f(V_z)]^T $$
(1)
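As a concrete illustration, Eq. (1) and a full 3D vector-valued neuron can be sketched as follows. This is a minimal sketch: the weighted-sum form of the net potential and all names used are assumptions, since this section defines only the activation.

```python
import numpy as np

def neuron_3d(xs, Ws, theta, f=np.tanh):
    """A 3D vector-valued neuron (sketch). Each input xs[l] is a
    3-vector, each weight Ws[l] is a 3x3 orthogonal matrix, and theta
    is a 3-vector threshold. The net potential V is passed through a
    real activation f component-wise, as in Eq. (1)."""
    V = sum(W @ x for W, x in zip(Ws, xs)) + theta
    return f(V)  # [f(Vx), f(Vy), f(Vz)]^T

# Example: one input, weight = rotation about the z-axis (orthogonal).
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
y = neuron_3d([np.array([1.0, 0.0, 0.0])], [R], np.zeros(3))
```

Any 3D orthogonal weight matrix (a rotation or reflection) preserves the length of the input vector before the nonlinearity is applied.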

3 3D Back-propagation network

The learning rule correlates the input and output values of the neurons by adjusting the interconnection weights. In the neural network literature, the popular back-propagation learning algorithm (for one dimension) has been extended to higher-dimensional neuron models [1, 2, 7]. The Appendix presents the 3D vector version of the back-propagation (3D-BP) algorithm for a three-layer network (L − M − N) of vector-valued neurons, based on scalar product operations among network parameters, where L is the number of inputs, M the number of neurons in the hidden layer, and N the number of neurons in the output layer. The synaptic weights are 3D orthogonal matrices. The learning rules are obtained by steepest descent on an error function, which enables the neuron or network to learn 3D vector patterns naturally by minimizing the mean square error:

$$ E=\frac{1} {2}|e|^{2} , $$
(2)

where \(|e| = \sqrt{e_{x}^2 + e_{y}^2 + e_{z}^2}\) and \(e = [e_x, e_y, e_z]^T\).
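A minimal sketch of the error of Eq. (2), assuming the usual convention that e is the difference between the desired and actual 3D outputs:

```python
import numpy as np

def error_3d(d, y):
    """Eq. (2): E = |e|^2 / 2 with e = [ex, ey, ez]^T and
    |e| = sqrt(ex^2 + ey^2 + ez^2). Here d is the desired 3D output
    and y the actual one (the convention e = d - y is assumed)."""
    e = d - y
    return 0.5 * float(np.dot(e, e))

E = error_3d(np.array([1.0, 2.0, 2.0]), np.zeros(3))  # |e| = 3, E = 4.5
```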

4 Generalized mapping in 3D

The three basic classes of transformations (scaling, rotation, and translation) convey the dominant geometric characteristics of a mapping [7]. They are used to solve many mathematical and practical engineering problems. This section presents mappings for different combinations of these three transformations. In our computational experiments, we consider a three-layer network (2-6-2) of vector-valued neurons. It transforms every input point (x, y, z) into another point (x′, y′, z′) in 3D space. All points in the training and testing patterns lie within a unit sphere centered at the origin.

For all experiments in this section, the training input–output points lie on a straight line in 3D space, with the midpoint of the line serving as a well-defined reference point. Similarly, the test patterns form a set of points lying on the surface of a 3D structure with a well-defined reference point, for example, the center of a sphere or the nose tip of a face. Thus, the first input of the considered network (2-6-2) takes a point from the surface of the input structure, and the second input takes its reference point. Likewise, the first output neuron gives the corresponding point on the transformed surface, and the second output gives its reference point. Across the experiments, it is observed that including the reference point of the structures yields better accuracy. In all transformation experiments, the training set consists of 21 equally spaced points lying on a line, and the generalization of the trained network is tested over different 3D geometric structures. The network is trained with the 3D-BP algorithm until the error E converges to 0.00001. The transformations of the different 3D structures are shown graphically in the figures of this section.
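The training-pattern setup above can be sketched as follows. The line endpoints are illustrative assumptions; the text specifies only 21 equally spaced points on a line inside the unit sphere, with the midpoint as reference.

```python
import numpy as np

# 21 equally spaced training points on a straight line inside the
# unit sphere, plus the midpoint of the line as reference point.
p0 = np.array([-0.5, -0.5, -0.5])    # assumed endpoints
p1 = np.array([ 0.5,  0.5,  0.5])
t = np.linspace(0.0, 1.0, 21)
line = p0 + t[:, None] * (p1 - p0)   # shape (21, 3)
ref = 0.5 * (p0 + p1)                # midpoint reference

# Desired outputs for the first experiment (scaling by 1/2):
targets, ref_out = 0.5 * line, 0.5 * ref
```

Each training pair then consists of a point on the line together with the reference point as inputs, and their transformed images as desired outputs.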

In our first experiment, the network is trained on an input–output mapping over a straight line in 3D for a similarity transformation (scaling by 1/2). The generalization ability of the trained network is tested over a sphere (1,681 data points) and a 3D face given as point cloud data (6,397 data points). The transformation results in Figs. 1 and 2 show excellent generalization with the proposed methodology.

Fig. 1
figure 1

Similarity transformation in 3D

Fig. 2
figure 2

Similarity transformation in 3D

In the second experiment, Fig. 3 presents the generalization performance of the network after learning the composition of scaling and translation. There are 451 test data points on a cylinder. All patterns in 3D are contracted by a factor of 1/2 and displaced by (0, 0.2, 0).

Fig. 3
figure 3

Scaling and translation in 3D

In another experiment, the input training points along the straight line are rotated clockwise through π/2 radians and displaced by (0, 0, 0.2); the network thus learns the composition of rotation and translation. Figure 4 presents the generalization capability of this network over a cylinder containing 451 data points.

Fig. 4
figure 4

Rotation and translation in 3D

In the last experiment, Fig. 5 presents the generalization performance of the network after learning the composition of all three transformations. The 451 data points on the surface of a cylinder are contracted by a factor of 1/2, rotated clockwise through π/2 radians, and displaced by (0, 0, 0.2).

Fig. 5
figure 5

Composition of all transformations in 3D
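The composed target transformation used in this last experiment can be sketched as follows. The rotation axis and the sign convention for "clockwise" are assumptions, since the text does not fix them.

```python
import numpy as np

def compose(points, s=0.5, angle=-np.pi / 2, t=(0.0, 0.0, 0.2)):
    """Contract by s, rotate about the z-axis by `angle` (a negative
    angle is taken as clockwise here), then translate by t."""
    c, si = np.cos(angle), np.sin(angle)
    R = np.array([[c, -si, 0.0],
                  [si,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return (s * np.atleast_2d(points)) @ R.T + np.asarray(t)

q = compose(np.array([1.0, 0.0, 0.0]))
```

The network is trained only on line points transformed this way, yet generalizes the composed mapping to the cylinder's surface points.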

5 3D Face matching

Face recognition technologies have come a long way in the last twenty years for automatic verification of identity in secure transactions, surveillance, access control, and security tasks [6, 10], but little attention has been given to the design of authentication systems that can process 3D patterns. Determining the identity of a human being is one of the main areas of current research. This paper focuses on 3D face recognition using a high-dimensional neural network; the proposed technique is found to be pose and expression invariant. In our experiments, we performed simulations over point cloud data of two sets of 3D faces. The given 3D faces were pre-sampled but not normalized, so they were normalized using standard 3D geometric transformations (translation, rotation, and finally scaling), which aligns each face to the same scale and orientation. For a standard alignment, the origin is translated to the nose tip. The scanned 3D face data cover the front part of the face and are almost straight on, and the data are translated accordingly. A 1-2-1 network of vector-valued neurons was used in the two experiments below.
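The normalization step described above can be sketched as follows. This is a simplified sketch: the rotation to a canonical orientation is omitted because the scans are described as nearly frontal, and scaling into the unit sphere is an assumed choice of scale.

```python
import numpy as np

def normalize_face(points, nose_tip_idx):
    """Align a 3D face point cloud: translate so the nose tip sits at
    the origin, then scale isotropically so the whole cloud fits
    inside the unit sphere."""
    p = points - points[nose_tip_idx]             # translation
    return p / np.max(np.linalg.norm(p, axis=1))  # scaling

face = np.array([[0.1, 0.2, 1.0],   # toy points; index 0 = nose tip
                 [0.1, 1.2, 1.0],
                 [0.6, 0.2, 1.0]])
aligned = normalize_face(face, nose_tip_idx=0)
```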

Example 1

The network was trained on one face, Fig. 6e, from the first set of face data shown in Fig. 6. This set contains five faces of different persons, each with 6,397 data points. All five faces were tested on the trained network. As shown in Table 1, the testing error for the four other faces is much higher than for the face used in training, demonstrating that these four faces differ from the training face.

Table 1 Comparison of testing error of first set (Fig. 6) of face data
Fig. 6
figure 6

The set of five different faces

Example 2

The network was trained on one face, Fig. 7a, from the second set of face data shown in Fig. 7. This set contains five faces of the same person with different orientations and poses, each with 4,663 data points. All five faces were tested on the trained network. As shown in Table 2, the testing error for the four other faces is small and comparable to that of the face used in training, demonstrating that these four faces belong to the same person as the training face.
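The error comparison carried out in both examples can be sketched as follows. This is illustrative: `forward` stands in for the trained 1-2-1 network, and the pairing of inputs with desired outputs follows the mean-square-error criterion of Eq. (2); the paper itself only compares the tabulated errors, without a fixed threshold.

```python
import numpy as np

def mean_test_error(forward, pairs):
    """Average 0.5*|d - forward(x)|^2 over (input, desired) pairs of
    3D points from a face. A small value suggests a match with the
    training face; a large value suggests a different face."""
    return float(np.mean([0.5 * np.sum((d - forward(x)) ** 2)
                          for x, d in pairs]))

identity = lambda x: x               # stand-in for a trained network
same = [(np.ones(3), np.ones(3))]    # test face matches the mapping
other = [(np.ones(3), np.zeros(3))]  # test face deviates from it
```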

Table 2 Comparison of testing error of second set (Fig. 7) of face data

6 Conclusion

The generalization of transformations in the different experiments brings out the fact that the 3D vector-valued network preserves the angle between oriented curves, and the phase of each point on a curve is also maintained during transformations. This enables the 3D vector-valued neural network to learn 3D motion of signals by virtue of its inherent structure, just as a complex-valued network learns 2D motion of signals [7]. In contrast, a real-valued neural network handles only 1D motion of signals, and such generalization is not possible [3]. Our experiments confirm that a 3D back-propagation network can learn compositions of transformations and generalize them with small error. The 3D back-propagation network has also shown very good performance in 3D face matching, especially across varied orientations and poses of faces.

Fig. 7
figure 7

The set of five faces of same person with different orientation and poses

The work presented in this paper is basic and fundamental, but the results are surprisingly encouraging. We expect such a learning machine to find application in the design of robust authentication systems for large data sets, an urgent need today. The 3D mapping and 3D face matching presented in this paper will be extended to 3D object motion interpretation and 3D face reconstruction in the near future.