1 Introduction

The creation of caricatures constitutes one of art’s most demanding and difficult fields. Making a caricature does not involve the mere deforming of facial features. It concerns the distortion of carefully selected features using deterministic methods that are derived from fuzzy rules. Selecting the wrong features or applying non-deterministic deformation rules will most likely fail to produce an accurate caricature. For this reason, the development of systems that can ensure the realization of accurate caricatures is a major challenge.

Over the past few decades various interesting approaches have been proposed in the field. One of the first and most important attempts is that of Brennan [2]. In her master’s thesis, Brennan developed an interactive system that helps the user select the dominant features of a given face and then exaggerates them by comparing with the average face. Another well-known system is PICASSO [5]. The PICASSO system computes an average face and then produces caricatures based on the difference between the input and the average face. In “Making Caricature with Morphing”, Akleman [1] designed a system that provides tools which help a user to manually create caricatures. A different approach has been followed by Chang et al. in [3], where a feature database has been used, from which the final output is synthesized, after classifying the input to certain patterns. Towards this direction lies also the work of Liang et al. in [8], who used an artist’s samples in order to train a system to recognize exaggeration patterns and apply them to new input. A noteworthy effort was made by Gooch et al. in [4]. Although their work focused on the generation of sketches from images, their program helped the user distort a face and produce funny sketches by providing a grid. P. Chiang et al. in [3] developed a method that processes the input, compares it with the average face and exaggerates features according to certain rules. The output is a face grid that is applied (warped) to an artist’s sketches of various faces. G. Z. Xu et al. in [14] performed Principal Component Analysis (PCA) on the input face, to synthesize the output for various exaggeration degrees. A useful contribution has been presented by Mo et al. in [11]. The authors suggested that the variance of a population must also be considered in the process of caricature generation. Another probabilistic approach has been proposed by J. Liu et al. in [9], who suggested a method that trains a system to map inputs to outputs, in order to generate caricatures by using PCA and Support Vector Regression (SVR) learning. In a similar way, Chien-Chung Tseng et al. in [13] performed PCA and statistical analysis of the input, in order to determine prominent features and exaggerate them using sketches. Finally, J. Liu et al. in [10] tried to optimize their previous effort by using a Manifold algorithm for dimension reduction, instead of SVR learning, during the training process.

As understood from the above, all efforts have focused on the development of systems that follow distortion rules, which are either extracted from the systems themselves –via learning– or are provided to them a priori, as the outcome of thought and experience of an expert.

In the first case, these rules can be expressed (e.g. as weighted vectors) but cannot be described. This kind of approach ultimately leads to an approximation of a space of caricatured faces, which cannot ensure accurate results for all inputs. Also, it requires training, which apart from being time and resource consuming, ties us to the quantity and quality of our samples, and limits us to the personal taste and style of the artists involved.

In the second case, the rules do not conform to proper scientific standards and are thus arbitrary. Whether it be tools provided to an artist, or a system that was programmed with the assistance of an artist, the dependence on the human factor has a significant impact on the output.

In a more general scope, all the aforementioned methods have a great reliance on the human touch, which intensely affects and shapes the end results. To avoid the development of intuitive systems that can lead to biased or inaccurate caricatures, we made an attempt to rule out the need for perspective. Separating subjective from objective in art is by itself a very ambitious task. Yet by logically analyzing the whole caricature generation process, we attempted to address this matter and propose a generic method that provides a set of well-defined rules, which can be put to use for the purpose of caricature generation.

2 Data preparation

To implement caricature generation systems, data preparation is required. Firstly, the way of representing facial information needs to be determined. Moreover, devising methods that are able to process and compare faces in a meaningful manner must also be addressed. In this direction, some kind of normalization of the input data and a metric are to be specified, in order to decide what facial features should be exaggerated. In this section, we present the conventions and techniques that were used for pre-processing face data in our study.

2.1 Face representation

In our approach, the input data (that is, the face) are represented as a set of points on the plane, in what ultimately constitutes a face graph. These points are selected in such a way as to optimally provide the information of the facial features that is required for analysis. This information is of geometrical nature (eye distance, face width, mouth height, etc.) and depends on the artistic guidelines followed by any given system. Thus, every face is expressed as a set of points \( \{ p_i \in \mathbb{Z}^2 \mid p_i \text{ specifies the position of a facial feature on a face} \} \), which are then subject to analysis and processing.

In the process of caricature generation with the method developed herein, a face is taken as input, compared with the average face, and an output face is then obtained as the final outcome. The described process can be seen in Fig. 1. In order to compare faces, facial features are represented as non-dimensional normalized numbers (namely scalars), say \( x_i \), where i is a feature index. They are non-dimensional in the sense that features are expressed as distance ratios. So, to be more precise, we do not perform comparisons between features, but rather between relations among features (ratios). They are normalized in the sense that these relations are determined according to a reference face, that is, the average face. Examples of feature relations are the ratios of the face height to face width, nose height to nose width, width of mouth to width of face, and so on. For instance, for the feature “length of nose” we denote:

$$ x_i = \text{Length of Nose Ratio} = \frac{\text{Length of Nose}}{\text{Length of Face}}, $$

to express the length of the nose and compare and alter it, later on. Obviously \( x_i \in [0, +\infty) \).

Fig. 1

Caricature generation process. The original face is compared with the average face in order to determine what features ought to be enhanced. The result of enhancing those features is the caricatured face

In every face we select N important feature relations, like the one described above. Thus, every face is mapped to an N-dimensional vector:

\( \boldsymbol{x} = [x_1, x_2, \dots, x_N] \), where \( x_i \) are the previously described features.
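As an illustration of this mapping, the following minimal sketch computes a three-dimensional feature vector of distance ratios from a set of facial points. The landmark names, their coordinates and the choice of the three ratios are assumptions made only for this example; they are not prescribed by the method.

```python
import math

# Hypothetical landmark names and pixel coordinates (assumptions for this sketch).
landmarks = {
    "forehead_top": (100, 100),
    "chin": (100, 300),
    "nose_top": (100, 170),
    "nose_tip": (100, 220),
    "left_cheek": (40, 200),
    "right_cheek": (160, 200),
    "mouth_left": (75, 250),
    "mouth_right": (125, 250),
}


def dist(a, b):
    """Euclidean distance between two points on the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def face_vector(p):
    """Map a set of facial points to a vector of non-dimensional ratios."""
    face_length = dist(p["forehead_top"], p["chin"])
    face_width = dist(p["left_cheek"], p["right_cheek"])
    return [
        dist(p["nose_top"], p["nose_tip"]) / face_length,     # length of nose ratio
        face_length / face_width,                             # face height to face width
        dist(p["mouth_left"], p["mouth_right"]) / face_width,  # mouth width to face width
    ]


x = face_vector(landmarks)
print(x)
```

Because every component is a ratio of two distances, the vector does not change when the whole face is uniformly scaled, which is what makes the features non-dimensional.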

2.2 The significance of the average face

The “Reference Face”, as it is usually called, is a very important concept for caricature generation systems. The reference face is the face selected by the designers, with which all input must be compared, in order to determine the appropriate distortions for the face during succeeding stages of the overall process.

For the purpose of the present work, we choose the average face as our reference face, so as to determine the deviation of the input from the average face, and then amplify or de-amplify the differences. By “deviation” we mean the distance between the input and reference faces, according to a specified norm which is defined with the help of the points of the various features of the input and reference faces.

The actual selection of an average face is in the hands of the designer. Undeniably this depends on geo-political and cultural factors. From a practical point of view, though, the geometrical information of the average face that is needed for caricature generation is approximately the same, regardless of these factors. As a result, the statistically precise selection of an average face does not play a significant role in the field of caricature generation.

2.3 Face normalization

In order to perform fair comparisons among faces, we first have to normalize our face vectors. To do so, we begin by dividing each feature relation \( x_i \) by the corresponding average feature relation. If \( \boldsymbol{\mu} = [\mu_1, \mu_2, \dots, \mu_N] \) is the average face, then our initial vector is:

$$ {\boldsymbol{x}}_n=\left[\frac{x_1}{\mu_1},\frac{x_2}{\mu_2},\cdots, \frac{x_N}{\mu_N}\right]=\left[{x}_{n1},{x}_{n2},\dots, {x}_{n N}\right],{x}_{n i}\in \left[0,+\infty \right), $$

and for the average face:

$$ {\boldsymbol{\mu}}_{\boldsymbol{n}}=\left[{\mu}_{n1},{\mu}_{n2},\dots, {\mu}_{n N}\right]=\left[1,1,\dots, 1\right] $$

If we subtract the average vector from the input vector, then we have the following “diversity” vectors:

$$ {\boldsymbol{x}}_{\boldsymbol{\delta}}={\boldsymbol{x}}_{\boldsymbol{n}}-{\boldsymbol{\mu}}_{\boldsymbol{n}}=\left[{x}_{n1}-1,{x}_{n2}-1,\dots, {x}_{n N}-1\right]=\left[{x}_{\delta 1},{x}_{\delta 2},\dots {x}_{\delta N}\right],{x}_{\delta i}\in \left[-1,+\infty \right) $$
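The normalization and subtraction steps above can be sketched in a few lines. The numerical values of the input face and of the average face below are hypothetical and serve only to show the arithmetic.

```python
def normalize(x, mu):
    """Divide each feature relation by the corresponding average relation."""
    return [xi / mi for xi, mi in zip(x, mu)]


def diversity(x, mu):
    """Diversity vector: normalized face minus the normalized average [1, ..., 1]."""
    return [xn - 1.0 for xn in normalize(x, mu)]


# Hypothetical values (assumptions for this sketch).
mu = [0.25, 1.5, 0.5]   # average face
x = [0.30, 1.5, 0.4]    # input face

x_delta = diversity(x, mu)
print(x_delta)  # positive: larger than average, zero: average, negative: smaller
```

In this example the first feature lies about 20% above the average, the second coincides with it, and the third lies about 20% below it.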

Assuming that every face can be described by a diversity vector, as described above, a face representation space occurs. Let’s name this space a “diversity” space. In this space, we define an operation between two of its elements, as the half addition of these two elements. If \( {\boldsymbol{x}}_{\delta}^{(1)} \) and \( {\boldsymbol{x}}_{\delta}^{(2)} \) are two elements (faces), then by combining \( {\boldsymbol{x}}_{\delta}^{(1)} \) and \( {\boldsymbol{x}}_{\delta}^{(2)} \) a third element, \( {\boldsymbol{x}}_{\delta}^{(3)} \), is produced:

$$ {\boldsymbol{x}}_{\delta}^{(3)}=\frac{{\boldsymbol{x}}_{\delta}^{(1)}+{\boldsymbol{x}}_{\delta}^{(2)}}{2}=\left[\frac{x_{\delta 1}^{(1)}+{x}_{\delta 1}^{(2)}}{2},\frac{x_{\delta 2}^{(1)}+{x}_{\delta 2}^{(2)}}{2},\dots, \frac{x_{\delta N}^{(1)}+{x}_{\delta N}^{(2)}}{2}\right]=\left[{x}_{\delta 1}^{(3)},{x}_{\delta 2}^{(3)},\dots, {x}_{\delta N}^{(3)}\right] $$

According to the above, the diversity space constitutes a group with its neutral element being that of the average face vector. Indeed, any elements of this space can be combined in order to produce a third element, while combining an element with the neutral element results in itself.

In this manner, for any given face x, such that \( x_{\delta i} \in [-1, 1] \), we are in a position to determine its opposite, x′, by simply finding a diversity vector \( \boldsymbol{x}_{\delta}^{\prime} \) with which \( (\boldsymbol{x}_{\delta} + \boldsymbol{x}_{\delta}^{\prime})/2 = \boldsymbol{\mu}_{\delta} = [0, 0, \dots, 0] \). As we will soon see, limiting ourselves to [−1, 1] does not pose a problem for our purpose of caricature generation. For the resulting vector \( \boldsymbol{x}_{\delta}^{\prime} \), it is quite easy to determine x′ and represent the final actual face. The corresponding face of this opposite vector is called “anti-face” or “counter-face”, and consists of a face with opposite features to the prior face; namely, if the initial face has a small nose then its anti-face has a big nose, and so on.

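Let’s name the pair of x and its anti-face, x′, an “opposite pair”. For an opposite pair we obviously have that \( (\boldsymbol{x}_{\delta}^{\prime})^{\prime} = \boldsymbol{x}_{\delta} \), that is, the anti-face of the anti-face is the original face. An instance of a pair of opposite faces is displayed in Fig. 2.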

Fig. 2

Face and anti-face. A pair in the face space with the exact opposite features. By taking the average of any pair of faces such as this, we have the average of the face space
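A minimal sketch of how an anti-face could be computed from the definitions of Section 2.3 follows. Since the diversity vector of the anti-face is the negative of that of the face, mapping back to feature relations gives \( x_i^{\prime} = \mu_i (1 - x_{\delta i}) = 2\mu_i - x_i \). The function name and the numerical values are assumptions made for this example.

```python
def anti_face(x, mu):
    """Return the anti-face of x with respect to the average face mu.

    The diversity vector is negated and mapped back to feature relations.
    Only faces whose diversity components lie in [-1, 1] are accepted.
    """
    result = []
    for xi, mi in zip(x, mu):
        d = xi / mi - 1.0               # diversity component of the face
        if abs(d) > 1.0:
            raise ValueError("diversity component outside [-1, 1]")
        result.append(mi * (1.0 - d))   # back from -d to a feature relation
    return result


# Hypothetical values (assumptions for this sketch).
mu = [0.25, 1.5, 0.5]
x = [0.30, 1.5, 0.4]

x_anti = anti_face(x, mu)
print(x_anti)                  # the long nose becomes short, the narrow mouth wide
print(anti_face(x_anti, mu))   # taking the anti-face twice recovers x
```

Averaging the face and its anti-face component by component returns the average face, in agreement with the caption of Fig. 2.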

2.4 The anti-face as a means of self-perception

Having presented the notion of anti-face, we would like to point out a realization regarding it, other than its key role in our model for caricature generation. As presented by Rhodes et al. in [12], we all hold an average face representation in our brain, based on our visual experiences. Since we are constantly exposed to images of ourselves and people very similar to us, it follows that this average is slightly shifted towards ourselves; hence our face seems more normal to us than others. To understand the degree to which our face seems peculiar to others, we can use the anti-face. By rendering our anti-face using a statistically derived average (rather than the biased average derived from our experience) we can generate a face that deviates from the average the same amount as our own face does, only that it deviates in the complete opposite direction. Thus, we get a very firm depiction of a seemingly irrelevant face, which causes the same reaction to others as our own face does. This gives us a valid sense of how our face is perceived by others.

3 Determining facial feature exaggeration rules

Having analyzed how faces are represented and compared in this study, we now move on to the subject of how facial features are to be exaggerated. Let us begin by defining the transformation of an element in the diversity space. If \( f_i(x_{\delta i}),\ x_{\delta i} \ge -1 \), is a transformation (deformation, to be exact) of a feature \( x_{\delta i} \), then:

$$ f\left({\boldsymbol{x}}_{\boldsymbol{\delta}}\right)=\left[{f}_1\left({x}_{\delta 1}\right),{f}_2\left({x}_{\delta 2}\right),\dots, {f}_N\left({x}_{\delta N}\right)\right] $$

which is a mapping of \( \boldsymbol{x}_{\delta} \), based on the functions \( f_i \). The \( f_i \) functions can be either the same or different. If they are identical for all features, then the face is uniformly distorted. If they are different, then some features are exaggerated more readily than others. In the latter case, the choice of functions is what constitutes an artist’s “style”, meaning his or her tendency to emphasize certain characteristics more than others.

Without loss of generality, from here on we will examine the case of using the same f i functions, that is, the transformation of facial features will be uniformly applied. Therefore, when we refer to a transformation f, we are referring to the same rule, which is evenly applied to a face x, and thus the expressions f(x δ ) and f(x δi ) are an isomorphism, as described by Lewis and Papadimitriou in [7]. The presentation that follows is easily extended to cover the case of different f i functions.

Before we move on to the proposed method, it is important to point out the significance of the average face and the anti-face in the process of caricature generation. As stated in [12], visual mechanisms code identity in relation to the average face. To perceive and understand a caricature, one’s brain compares what is seen with the average face, in order to distinguish what features have been exaggerated. If an artist, for example, enlarges all noses in his or her work, an observer would be confused, because all caricatures would imply that all the corresponding faces have big noses. This is where the notion of anti-face comes in handy. The use of the anti-face not only helps sustain the average, but also helps provide balanced or “fair” distortions for the input faces. For all the aforementioned reasons, the use of unbiased functions for face distortion is crucial.

Let \( \{ \boldsymbol{x}_{\delta}, \boldsymbol{x}_{\delta}^{\prime} \} \) be an opposite pair. The core of the method proposed herein relies on carrying out the following: in the process of selecting a transformation rule f for a caricature generation system, we impose the condition that the transformation of an opposite pair must be itself an opposite pair.

This condition can be stated with respect to the anti-face, x′, of a face, x, as follows:

The transformation of \( \boldsymbol{x}_{\delta} \), \( f(\boldsymbol{x}_{\delta}) \), should coincide with the anti-face, \( (f(\boldsymbol{x}_{\delta}^{\prime}))^{\prime} \), of the transformation of the anti-face of \( \boldsymbol{x}_{\delta} \), \( f(\boldsymbol{x}_{\delta}^{\prime}) \).

Therefore we demand that the transitions in Fig. 3 hold:

Fig. 3

State diagram showing the transitions between the space of original faces and distorted faces

From the condition described, we also have that a transformation rule f must not shift the average.

By accepting this condition, we have that:

$$ \frac{f\left({\boldsymbol{x}}_{\boldsymbol{\delta}}\right)+ f\left({\boldsymbol{x}}_{\boldsymbol{\delta}}^{\prime}\right)}{2}=0\to f\left({\boldsymbol{x}}_{\boldsymbol{\delta}}\right)=- f\left({\boldsymbol{x}}_{\boldsymbol{\delta}}^{\prime}\right) $$

But, for an opposite pair, \( \frac{{\boldsymbol{x}}_{\delta}+{\boldsymbol{x}}_{\delta}^{\prime}}{2}=0\to {\boldsymbol{x}}_{\delta}^{\prime }=-{\boldsymbol{x}}_{\delta} \), so:

$$ f\left({\boldsymbol{x}}_{\boldsymbol{\delta}}\right)=- f\left(-{\boldsymbol{x}}_{\boldsymbol{\delta}}\right) $$

From the last equation and knowing that x δi  ≥  − 1, we conclude that f must be odd and thus:

  • (α) f(0) = 0.

  • (β) x δi  ∈ [−1, 1], since x δi  ≥  − 1 and f is odd

  • (γ) f(x δi ) ∈ [−1, 1], because f(x δi ) + 1 = f(x i ), and f(x i ) is a distance ratio, which means that f(x i ) ∈ [0 ,  +  ∞ )

We come to the conclusion that all f rules must both be defined and take values in [−1, 1]. The values −1 and 1 of features are physical boundaries. A value of −1 means that a feature does not exist, since it has 0 length. A value of 1 means that a feature is twice as large as the average, which is unnatural. For these extreme cases the application of a rule f has no purpose. Thus, for the depiction of such cases, if a feature is larger than 1 we leave it the same (since it is exaggerated on its own), while if a feature is smaller than −1 it is cut off at −1.
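The conditions derived above lend themselves to a simple numerical check. The sketch below tests a candidate rule for oddness and for values in [−1, 1] on a grid of sample points, and applies the boundary treatment described in the previous paragraph. The helper names and the trigonometric candidate rule are hypothetical; they are not part of the proposed method.

```python
import math


def grid(samples=101):
    """Evenly spaced sample points covering [-1, 1]."""
    return [-1.0 + 2.0 * k / (samples - 1) for k in range(samples)]


def is_odd(f, tol=1e-9):
    """Check f(d) = -f(-d) on the sample grid."""
    return all(abs(f(d) + f(-d)) <= tol for d in grid())


def stays_in_range(f, tol=1e-9):
    """Check that f takes values in [-1, 1] on the sample grid."""
    return all(-1.0 - tol <= f(d) <= 1.0 + tol for d in grid())


def apply_rule(f, d):
    """Apply rule f to one diversity component with the boundary treatment."""
    if d > 1.0:
        return d       # already exaggerated on its own: left unchanged
    if d < -1.0:
        return -1.0    # cut off at the physical lower bound
    return f(d)


def trig_rule(d):
    """Hypothetical candidate rule: odd, and maps [-1, 1] onto [-1, 1]."""
    return math.sin(math.pi * d / 2.0)


print(is_odd(trig_rule), stays_in_range(trig_rule))   # admissible candidate
print(is_odd(lambda d: d * d))                        # an even rule is rejected
print([apply_rule(trig_rule, d) for d in (-1.3, 0.0, 0.5, 1.4)])
```

A grid check of this kind cannot prove that a rule is odd; it only serves to reject candidates that clearly violate the condition.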

Note that due to f being odd, it follows that the whole process of caricature generation proposed herein is reversible, that is, there is no loss of information and thus we can regain the original face of a caricatured one by applying f −1 of Fig. 3, thus forming a cyclic model of transformations.

If we visualize all the above, we conclude that the rule we should select must belong to the space shown in Fig. 4, and must be anti-symmetrical with respect to the bisector of the axes:

Fig. 4

Transformation rule space.

Using curves such as in Fig. 4, we can define different artistic styles by determining the easiness with which separate facial features will be emphasized.

Since f is odd, the dashed lines on the left and right of (1,1) are of equal length. For x δ  = 0 a feature remains unchanged, while for x δ  ≠ 0 a feature is amplified according to f(x δ ).

Ultimately, the essence of all the analytical methods above could be reduced to the expression: “In order to produce caricatures that are perceived correctly, one must distort a given face in the same manner as he or she would distort its anti-face”. Hence the need for symmetric transformations occurs, so as to maintain the average of both original and caricature faces equal and unchanged. Consequently, the enhanced features of a caricature can better indicate the deviation of the initial features of a face from those of the average.

4 Results

By implementing a system based on the proposed method and applying various exaggeration rates for the output, we can obtain the desired results. As mentioned in the previous section, any exaggeration rule conforming to the described conditions of Fig. 3 can be applied for the purpose of caricature generation. For the demonstration that follows we chose a sigmoid function as our exaggeration rule:

$$ f\left({\boldsymbol{x}}_{\delta}\right)=\frac{2}{1+{\mathrm{e}}^{-\alpha {\boldsymbol{x}}_{\boldsymbol{\delta}}}}-1 $$

where α is the exaggeration factor. The sigmoid function is adjusted to fit the desired transformation rule space of Fig. 4. An appropriate polynomial or trigonometric function could have been used as well. Due to the properties of the above sigmoid function, values of α near 2 result in an approximately linear curve which leaves the input nearly unchanged. For values of α greater than 2, exaggeration is achieved, while for values smaller than 2, features are driven toward the average.
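The role of α = 2 as the neutral value can be seen by noting that the sigmoid rule above is algebraically equal to \( \tanh(\alpha x_{\delta}/2) \), whose slope at the origin is α/2. The following minimal sketch evaluates the rule for one feature under three exaggeration factors; the diversity value 0.3 is a hypothetical input chosen for the example.

```python
import math


def sigmoid_rule(d, alpha):
    """Sigmoid exaggeration rule: 2 / (1 + exp(-alpha * d)) - 1."""
    return 2.0 / (1.0 + math.exp(-alpha * d)) - 1.0


d = 0.3  # hypothetical diversity value of a single feature
for alpha in (1.0, 2.0, 6.0):
    print(f"alpha = {alpha}: {d} -> {sigmoid_rule(d, alpha):.4f}")

# Oddness: a feature and its opposite are distorted symmetrically.
print(sigmoid_rule(d, 6.0) + sigmoid_rule(-d, 6.0))
```

For this input the outputs are roughly 0.15, 0.29 and 0.72, that is, attenuation for α = 1, a nearly unchanged feature for α = 2, and exaggeration for α = 6.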

Some examples of the resulting caricatures are depicted in column (b) of Fig. 5. Exaggeration factors ranging from 5 to 7 were applied, which produce an effective slope for the exaggeration rule that was used. As expected, the results not only conform to the proposed theoretical background, but also to our intuition. Indeed, we observe that the closer to the average a characteristic lies, the more it remains unchanged, while on the contrary, the further it deviates from it, the more it is enhanced towards the direction of that deviation.

Fig. 5

(a) Original faces, (b) Caricatures generated with the proposed method, (c) Beautified faces

In [6], Langlois and Roggman demonstrate that the average face is commonly considered to be attractive. In our case, we pursued the rendering of input faces into output ones with more attractive features, without the loss of the identity of any given face. Due to the properties of the distortion rules that were previously described, for curves similar to that of the reflection of f with respect to the bisector (f −1 of Fig. 4), de-amplification is obtained. This attenuation of features drives the input face towards the average, thus resulting in quasi beautification (see column (c) of Fig. 5).
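One possible reading of the de-amplification described above is to apply the inverse of the sigmoid rule of this section, which in closed form is \( f^{-1}(y) = \frac{1}{\alpha} \ln \frac{1+y}{1-y} \) for \( |y| < 1 \). The sketch below is written under that assumption; the function names and the sample values are hypothetical.

```python
import math


def sigmoid_rule(d, alpha):
    """Sigmoid exaggeration rule: 2 / (1 + exp(-alpha * d)) - 1."""
    return 2.0 / (1.0 + math.exp(-alpha * d)) - 1.0


def inverse_sigmoid_rule(y, alpha):
    """Inverse of sigmoid_rule in its first argument, defined for |y| < 1."""
    if not -1.0 < y < 1.0:
        raise ValueError("the inverse is only defined on the open interval (-1, 1)")
    return math.log((1.0 + y) / (1.0 - y)) / alpha


alpha = 6.0
d = 0.5  # hypothetical diversity value of a single feature

attenuated = inverse_sigmoid_rule(d, alpha)
print(attenuated)                       # closer to the average (0) than d
print(sigmoid_rule(attenuated, alpha))  # applying f again recovers d
```

The inverse grows without bound as its argument approaches ±1, so features lying exactly on the physical boundaries need separate handling in such a scheme.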

Resuming the discussion over the resulting caricatures of our model, it is important to point out that the output can be parameterized and set to follow various artistic styles. An artistic style is an artist’s tendency to highlight certain aspects of the face more than others. For instance, an artist may want to focus on and emphasize the deviation of people’s eyes in his or her work. This means that the deviation of a person’s eyes from the average is exaggerated more than the deviation of other characteristics. In terms of our method, this means that different exaggeration rules are applied to various characteristics (i.e. rules like that of Fig. 4 but with different slopes). The caricatures in Fig. 5 were produced by uniformly applying the same rule to all facial features. The caricatures in Fig. 6 demonstrate an example of three different styles. Columns (a), (b) and (c) focus on the deviation of eyes, noses and head shapes, respectively, by using more sensitive exaggeration rules for these features. With respect to the sigmoid function that was used to produce these caricatures, each of the focused features in columns (a), (b) and (c) is exaggerated by a factor of α = 4 (which is a moderate exaggeration), while the rest of the facial features are exaggerated by a factor of α = 2.5 (which is a very subtle exaggeration).
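A style of this kind can be sketched by assigning one exaggeration factor per feature. The factors 4 and 2.5 are those quoted above, while the three-feature layout (eyes, nose, head shape) and the diversity values are assumptions made only for this example.

```python
import math


def sigmoid_rule(d, alpha):
    """Sigmoid exaggeration rule: 2 / (1 + exp(-alpha * d)) - 1."""
    return 2.0 / (1.0 + math.exp(-alpha * d)) - 1.0


def caricature(x_delta, alphas):
    """Apply a per-feature exaggeration factor to a diversity vector."""
    return [sigmoid_rule(d, a) for d, a in zip(x_delta, alphas)]


# Hypothetical feature order: [eyes, nose, head shape].
x_delta = [0.2, 0.2, -0.2]

# One style per column of Fig. 6: focused feature at 4, the rest at 2.5.
styles = {
    "eyes": [4.0, 2.5, 2.5],
    "nose": [2.5, 4.0, 2.5],
    "head": [2.5, 2.5, 4.0],
}

for name, alphas in styles.items():
    print(name, [round(v, 3) for v in caricature(x_delta, alphas)])
```

Since each per-feature rule is odd, the exaggerated vector of an anti-face is still the negative of the exaggerated vector of the face, whichever style is selected.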

Fig. 6

Caricatures focusing on the deviation of the initial faces’ (a) eyes (b) noses and (c) head shapes

By observing the caricatures of Fig. 6, it is clear that the main feature of interest is more readily exaggerated in every case. What is important, according to the scheme of Fig. 3, is that for any given style, the output face space is consistent with the input face space.

5 Conclusions

Traditional methods for caricature generation are either highly dependent on human interaction and design or statistical analysis and machine learning. In some cases, this leads to transformation rules that are either biased or unaccented regarding feature exaggeration, as discussed in section 3. In other cases, this produces unpredictable or ambiguous results.

The purpose of this study was to set up a theoretical framework for selecting transformation rules that guarantees successful exaggerations of facial features. Towards this direction, we suggested that the consideration and employment of the average face in conjunction with the anti-face is effective and beneficial for the field of caricature generation. This consideration helps provide transformation rules that do not shift the average, and thus leads to the creation of non-arbitrary and non-heuristic caricatures. The rules that are defined by the proposed method are characterized by consistency and produce satisfying results. Furthermore, they allow for the easy handling of factors that determine the exaggeration rate, namely the achievement of various kinds of artistic styles of the output.

Efforts to build caricature systems, like the one made herein, do not imply that human caricaturists can or should be replaced by automatic caricature generators. Computer-generated artwork in all its forms can often be distinguished from that of humans and is generally considered less appealing. Hopefully, methods like this can form the basis for producing alternative styles of caricatures, or for assisting artists in their decisions regarding what features to select and how to exaggerate them.

In the framework of this study, we also indicated that the anti-face can be used as a means to understand the oddity with which our face is perceived by others. Lastly, we pointed out that a suitable variation of our method produces attractive faces by yielding output faces shifted towards the average.