Keywords

1 Introduction

Since Disney’s release of Snow White and the Seven Dwarfs in 1937, traditional hand-drawn animation has stood as one of the most entertaining and inspiring mediums of creativity and storytelling. Decades later, many animation studios have stopped producing hand-drawn films, with Disney having no plans to release any 2D animated films in the future [28]. The cost and skill required to draw out every frame, a practice originally done on cel sheets due to the lack of technology, proved too expensive compared to computer-generated animation, which requires no drawing experience. As easier 3D tools became more widely accessible, novice content creators increasingly avoided the art of hand-drawn animation in favor of 3D animation, citing that they lack the drawing skills needed to create traditional animations [1].

Fig. 1.

Our program computes the joint angles of a set of key body joints to pose any 3D model. The posed model is toon-shaded and sketched over to produce a hand-drawn animation. The bottom row of frames was sketched and colored using the middle-row reference keyframes.

A few notable qualities make hand-drawn animation more favorable compared to its 3D counterpart. Traditional animation has a human-like touch, where every stroke is carefully thought out rather than interpolated. Characters are not limited to human constraints and can bend and rotate limbs in any fashion. Yet in order to create such character animations, individuals need to know how to draw characters accurately (per frame) and consistently (across frames), putting forth a high artistic barrier for those interested in the field.

To alleviate the high requirements of 2D animation, we developed an animation pipeline that requires no artistic ability from the user when creating hand-drawn animations. While previous work explores 3D animation from 2D sketches, our pipeline is the first of its kind to use 3D models to bridge from stick figures to a 2D character animation. We can model realistic body turns and proper object perspectives that are otherwise difficult for 2D character animators to draw, giving beginner animators an easy way to generate and stylize characters in a toon-like fashion. We demonstrate the program’s ease of use by creating several different hand-drawn animation cycles applied to various character models and evaluate the system on a number of individuals with and without traditional artistic backgrounds. The final pipeline allows anyone, regardless of artistic ability, to create 2D hand-drawn animations, encouraging individuals without artistic backgrounds to take up the medium despite their lack of drawing experience.

2 Related Work

Making 2D animation easier has been a persistent challenge, pushing creators to spend more time dreaming and less time drawing. Disney Research’s Mender was developed as a vector/raster hybrid interface for computing in-betweens quickly [33], and Adobe’s Flash Professional software integrated a similar pen tool to vectorize brush strokes, but both approaches suffer from the same artifacts when adjusting to complex geometries like face and body turns. Researchers at Autodesk added “Motion Amplifiers” that apply a set of transformations to a vector to model the motions found in the 12 Principles of Animation [17, 30], although the resulting vector still cannot exhibit 3D rotations. 2D sketches can also be used to pose 2D vectored characters [27], but this is limited in viewpoint since 3D rotations cannot be considered.

We can edit 3D motions to obey the 12 Principles by applying filters or convolutions that exaggerate [18, 34] or squash and stretch [19] these 3D motions, making them look more 2D-like. While these filters are a step in the right direction, viewers can often tell that the results were generated by a filter rather than by hand animation, leading them to lose interest in the work. An alternative is to use inverse kinematics on the joint angles [20, 21] or bone segments [3] to pose characters in the exaggerated ways we see in 2D animation. One solver in particular uses 2D sketches in the model’s 3D environment as a basis for 3D character posing with inverse kinematics [24], turning 3D character posing and animating into a 2D-constrained problem, given that all target movements lie along the same plane. We use this mechanism of editing 3D data along the 2D camera plane as part of the backbone of our drawing interface, as it provides a way to pose 3D characters from 2D-constrained sketches.

State-of-the-art research in this area focuses on 2D-assisted 3D animation from rough 2D skeletons [7, 9, 12] and from detailed artist drawings [15]. Converting 2D sketches to 3D posing data is an under-constrained problem, leading to many possible poses per 2D sketch. Rather than querying the user to pick the best pose from a sketch or searching for similar 3D poses from a database [23, 29], we attempt to properly constrain the problem by drawing 2D stick figures in 3D space. These 3D interfaces for 2D sketches have been used previously for iterative character re-posing [24] and to define spatial trajectories of character motions [10, 11, 31]. These same 3D drawing environments can also be used for creating 3D human character models [25] or various other 3D models [5, 35, 36] from 2D sketches. They provide an interface for drawing within 3D space that leads to a fully-constrained conversion between sketch and 3D posing, and rotations in these 3D sketch spaces can be achieved by easily computing the quaternion rotations [16] between joints. While intuitive, these models are used for 3D animation rather than 2D. We combine these 3D spaces with 2D rendering techniques to provide the first accurate 3D animation interface for 2D rendering.

Toon shading is a non-photorealistic rendering style that helps give 3D animation a 2D, cartoon-like look [4, 13] by thresholding the shader’s color value, producing the high-contrast shading effects we see in 2D imaging and paintings. Hand-drawn shading is an alternative that integrates sketch-like strokes around the contours of an image to give it a more natural, hand-drawn feel [6, 22, 26, 32]. When rasterizing 3D-posed reference frames, we can apply these hand-drawn shading styles to the contours of our exported 3D character animation sequence to make it look more like a 2D sketch while also introducing the stochasticity and roughness of human sketches that we see between frames.

Fig. 2.

Block diagram of the interface. Blue regions refer to the sketch interface while orange regions refer to the posing interface. The joint rotations of the sketch in 3D space are used to pose a user-imported 3D model. These model frames are toon-shaded and cropped before being loaded behind the original stick figure, providing a live feedback-loop. (Color figure online)

3 Approach

Figure 2 shows a high-level description of the animation interface segmented into two parts: 2D and 3D. The 2D drawing interface is the user-facing side responsible for gathering the joint vectors that comprise the stick figure. This data is sent to the 3D interface, where the rotations are computed and applied to a rigged character model that is toon-shaded and rasterized before being sent back to the 2D interface. The user then has a chance to make corrections and re-pose any frames they are not happy with. Once done with their animation cycle, users can sketch their character animation using these frames as keyframe references, or use the hand-drawn program we discuss in this paper to algorithmically give the character a sketch-like feel.

Fig. 3.

3D posing interface with configurable parameters. The user specifies the joint mappings between the joints in the 2D setting with the joint names for the specific model.

2D Interface. We construct a 2D interface that can sketch multi-color strokes in 3D space. The different stroke colors refer to different body parts: the back, left-leg, left-ankle, right-leg, right-ankle, left-shoulder, left-arm, right-shoulder, and right-arm. The user can draw on the plane parallel to the current camera’s view and drag endpoints of existing strokes parallel to the same camera plane. We install an orbital camera to view the stick figure from different angles and to help redraw or drag existing strokes. This mechanism lets users draw in the familiar xy-plane, as they would with pen and paper, while still allowing them to modify stroke depths along the z-axis. If the user draws a stroke that already exists on the current frame, the pre-existing stroke is cleared, allowing only one instance of each body part per frame. The interface’s timeline, coupled with the onion-skinning tool, makes it easy to draw in-betweens after drawing the main keyframes.
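Drawing on the camera-parallel plane reduces to a ray-plane intersection: the picking ray under the cursor is intersected with the plane through the stroke’s anchor point whose normal is the camera’s forward direction. A minimal NumPy sketch of this idea (function and parameter names are ours, not the interface’s):

```python
import numpy as np

def intersect_camera_plane(ray_origin, ray_dir, plane_point, camera_forward):
    """Intersect a picking ray with the plane through `plane_point`
    that is parallel to the camera plane (normal = camera forward)."""
    n = camera_forward / np.linalg.norm(camera_forward)
    denom = np.dot(ray_dir, n)
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the drawing plane
    t = np.dot(plane_point - ray_origin, n) / denom
    return ray_origin + t * ray_dir
```

Dragging an endpoint uses the same intersection, with the plane anchored at the endpoint’s current position so its depth is preserved.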

The user can link joints together, where the program attaches strokes that share a common joint. Common joints include the shoulder, connecting the back and left/right-shoulder, or the knee, connecting the left/right-leg and left/right-ankle. This way, the user does not have to ensure strokes connect while drawing them, but can auto-connect the joints afterwards.
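The auto-connect step can be sketched as snapping the nearby endpoints of two paired strokes to their midpoint. The part names, pairing scheme, and tolerance below are illustrative assumptions, not the paper’s exact schema:

```python
import numpy as np

def auto_connect(strokes, pairs, tol=0.1):
    """Snap shared joints: for each (a, b) pair of body parts, move the
    end of stroke `a` and the start of stroke `b` to their midpoint if
    they lie within `tol` of each other. `strokes` maps a part name to
    a (start, end) tuple of 3D endpoint arrays."""
    for a, b in pairs:
        if a not in strokes or b not in strokes:
            continue
        end_a = strokes[a][1]
        start_b = strokes[b][0]
        if np.linalg.norm(end_a - start_b) <= tol:
            mid = (end_a + start_b) / 2.0
            strokes[a] = (strokes[a][0], mid)
            strokes[b] = (mid, strokes[b][1])
    return strokes
```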

The rotational data of the joints is used to pose and display a toon-shaded rasterization of the user-selected model behind the user’s stick figure for each frame. This lets the user see in full perspective how their posing looks and edit their stick figures in real time.


3D Interface. Each joint maps to a base vector and an end vector, and we seek the rotation transforming a joint along the base vector to the end vector. For each joint, we fetch the initial base vector of the model before any transformations are applied. This lets us compute the quaternion rotation from the base’s rest vector to the current base vector [8, 14]. We apply the inverse of this quaternion to the base and end vectors of the current joint, resetting the base vector to its rest position before computing the current joint’s quaternion rotation from base to end in its initial space. We do this because we will be applying the quaternion rotation to each joint from its rest position, so we want to compute quaternions with the base vector aligned to its rest position.
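The base-to-end rotation is the standard shortest-arc quaternion between two vectors [8, 14]. A minimal NumPy sketch using the half-angle construction (the helper name is ours):

```python
import numpy as np

def quat_between(u, v):
    """Shortest-arc unit quaternion (w, x, y, z) rotating vector u onto v."""
    u = np.asarray(u, float)
    v = np.asarray(v, float)
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    d = float(np.dot(u, v))
    if d < -1.0 + 1e-9:
        # Opposite vectors: rotate 180 degrees about any axis orthogonal to u.
        axis = np.cross(u, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-9:
            axis = np.cross(u, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return np.array([0.0, *axis])
    # Half-angle trick: (1 + dot, cross) normalized is the shortest-arc quaternion.
    q = np.array([1.0 + d, *np.cross(u, v)])
    return q / np.linalg.norm(q)
```

Resetting a base vector to its rest position then corresponds to applying the conjugate (inverse, for unit quaternions) of the rest-to-current rotation before computing the base-to-end quaternion.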

Each quaternion can be decomposed into a pitch, yaw, and roll. Because we do not define orientations for our joints, the roll is left undefined when computing the quaternion. This can lead to arbitrary rotations that affect the child joint, offsetting target rotations by some amount along the roll of the parent joint. To fix this, we remove the roll component from each joint’s parent quaternion. We take the current axis to be the child joint’s roll axis and the target axis to be the world-space end vector of the child joint, and attempt to minimize the angle between these two vectors. We compute the rotation angle between the current and target axes and apply an axis-aligned rotation about the parent joint’s roll axis by the computed angle. This effectively removes the roll component of the parent’s quaternion and correctly aligns the child joint to its target location.
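The roll correction amounts to a signed-angle computation about the parent bone’s roll axis. A simplified sketch, assuming both directions are first projected onto the plane perpendicular to that axis (the function name is ours):

```python
import numpy as np

def roll_correction_angle(roll_axis, current, target):
    """Signed angle (radians) about the parent's roll axis that swings the
    child's current direction onto the target direction. Both directions
    are projected onto the plane perpendicular to the axis first."""
    b = np.asarray(roll_axis, float)
    b = b / np.linalg.norm(b)
    c = np.asarray(current, float)
    t = np.asarray(target, float)
    c = c - np.dot(c, b) * b  # project out the roll-axis component
    t = t - np.dot(t, b) * b
    c = c / np.linalg.norm(c)
    t = t / np.linalg.norm(t)
    # atan2 of the (signed) sine and cosine gives a quadrant-aware angle.
    return float(np.arctan2(np.dot(np.cross(c, t), b), np.dot(c, t)))
```

Rotating the parent by this angle about its roll axis then brings the child’s roll axis onto the target direction without disturbing the parent’s own bone direction.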

The user can import any rigged 3D model to pose using their stick-figure joint rotations. Since the model joint names and rotations differ from model to model, our interface traverses the model’s scene graph and provides a list of all joint names to the user. The user can then configure which local joints map to which model joints. Because these are model-specific parameters, they only need to be computed once per model as shown in Fig. 3.

When rasterizing the frames, we use toon shading, extruding the geometry along back-facing normals and shading it black to give the model an outline effect. We interpolate the model’s vertex normals along the faces to create smooth normals that we can threshold in order to compute hard shadows for our figure. We combine this with a movable directional light to give the model a flat-shaded look. For each frame, we pose the model using the quaternions computed per joint, then rasterize the frames and save them to the device (Fig. 4).
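The hard-shadow thresholding can be illustrated per pixel as quantizing the Lambert diffuse term into a few flat tones. The band cutoffs and tone values below are illustrative choices, not the paper’s exact shader parameters:

```python
import numpy as np

def toon_shade(normal, light_dir, bands=(0.0, 0.5), tones=(0.35, 0.7, 1.0)):
    """Quantize the Lambert diffuse term into flat tones: the darkest tone
    for faces pointing away from the light, then progressively lighter
    bands as the surface turns toward it."""
    n = np.asarray(normal, float)
    l = np.asarray(light_dir, float)
    lambert = max(float(np.dot(n / np.linalg.norm(n), l / np.linalg.norm(l))), 0.0)
    band = sum(lambert > b for b in bands)  # number of thresholds exceeded
    return tones[band]
```

In a real shader this runs per fragment on the interpolated normals; moving the directional light shifts which faces cross each band threshold.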

Fig. 4.

Applying the same joint rotations to multiple toon-shaded models

We can search for the rasterized frames on the device and compute the bounding box of the toon-shaded image in order to crop the image and display it as a texture behind the bounding box of the stick figure. This lets the user see in real time how the frame posing aligns with their stick figure and make quick edits to body parts, re-rasterizing frames as needed. This iterative process helps users experiment with and refine their animation cycles.
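The crop step can be sketched as computing the bounding box of non-transparent pixels in the rasterized RGBA frame (assuming a NumPy image array; the function name is ours):

```python
import numpy as np

def crop_to_content(rgba):
    """Crop an H x W x 4 image array to the bounding box of pixels with
    non-zero alpha; a fully transparent frame is returned unchanged."""
    ys, xs = np.nonzero(rgba[..., 3] > 0)
    if xs.size == 0:
        return rgba
    return rgba[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```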

Fig. 5.

Computing the hand-drawn and paint details separately before compositing them. A displacement map sampled from Perlin noise is used on the paint details to give them additional temporal inconsistency.

Sketch Effect. We can generate a hand-drawn feel along the contours from the backface shading of each frame by applying rough sketch strokes to these contours. Each sketch stroke is a Bézier curve through several nearby points. We generate a random point along the contour of the frame and search for the next point within a given radius of the previous point using uniform rejection sampling, ensuring the next point also falls within the contour of the frame. To remove extraneous strokes that cut across non-outline regions, we also verify that the midpoint between every two points lies along the contours. We can generate N random strokes in the bounding box of the frame, or N/C strokes per cell in a \(C \times C\) uniform grid to promote a uniform distribution of strokes.
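The stroke-growing step can be sketched as rejection sampling over a boolean contour mask: each candidate offset is rejected unless it lies within the sampling radius, and both the candidate point and its midpoint with the previous point must fall on the contour. The names and retry budget here are illustrative assumptions:

```python
import numpy as np

def grow_stroke(contour_mask, start, radius, length, rng):
    """Grow one stroke of up to `length` control points along a boolean
    contour mask, using uniform rejection sampling within `radius` of the
    previous point. Candidate and midpoint must both lie on the contour."""
    h, w = contour_mask.shape

    def on_contour(p):
        y, x = int(round(p[0])), int(round(p[1]))
        return 0 <= y < h and 0 <= x < w and bool(contour_mask[y, x])

    pts = [np.asarray(start, dtype=float)]
    for _ in range(length - 1):
        for _attempt in range(50):
            off = rng.uniform(-radius, radius, size=2)
            if np.linalg.norm(off) > radius:
                continue  # reject: outside the sampling disk
            cand = pts[-1] + off
            if on_contour(cand) and on_contour((pts[-1] + cand) / 2.0):
                pts.append(cand)
                break
        else:
            break  # no valid continuation found within the retry budget
    return pts
```

The resulting control points would then be fit with a Bézier curve to form one rough stroke.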

We can separate the shading components from our toon-shaded models and run a thresholded convolution filter that bins each pixel into either a light or a shaded color, depending on which color the average of the convolution window is closer to. This provides blotchier, smoother details that replicate hand-painted shading. We can apply both light and dark shadows by duplicating the blotchy shading and applying a choke convolution that narrows the region, creating darker shadows in more concentrated areas. We finish by adding a displacement map to the shading, with displacements read from a Perlin noise texture, providing smooth, jittery distortions per frame that give the shading a slight temporal inconsistency similar to hand-painted frames. The results are composited with the stroke effect in Fig. 5 to provide both hand-drawn strokes and shading.
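The binning filter can be illustrated as a local-average threshold that snaps each pixel to whichever of two tones the neighborhood mean is closer to. This naive per-pixel loop (names and kernel size are ours) favors clarity over speed:

```python
import numpy as np

def bin_shading(gray, light, dark, kernel=3):
    """Average each kernel x kernel neighborhood (edge-padded), then snap
    every pixel to whichever of `light` or `dark` that average is closer
    to, producing flat, blotchy two-tone shading."""
    pad = kernel // 2
    padded = np.pad(gray, pad, mode="edge")
    h, w = gray.shape
    out = np.empty((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            avg = padded[y:y + kernel, x:x + kernel].mean()
            out[y, x] = light if abs(avg - light) <= abs(avg - dark) else dark
    return out
```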

Fig. 6.

Front-view run cycles. Bottom row colored in by a novice user.

4 Results

Of the 5 participants we selected to evaluate our system, 1 participant had traditional drawing abilities, 1 participant had vector-art abilities but no drawing abilities, and the remaining 3 participants had no prior art experience. We aim to measure the effects of the pipeline along a diverse set of backgrounds but focus primarily on individuals without art backgrounds.

We evaluated our system on several core animation cycles, such as jumping jacks and run cycles, which can be seen in Fig. 1, Fig. 6 and Fig. 7. Models were provided by Adobe Mixamo [2], allowing us to configure the joint rotations once in order for the rotations to work for most models. Our comprehensive demo of video animations can be found here.

Participants without drawing experience were asked to sketch over the resulting edge-detected run-cycle frames in Fig. 1 and Fig. 6 to give them a more hand-drawn feel. These participants found that sketching over the reference frames was simple and required no previous drawing experience, yet mentioned that drawing over every frame can take a long time. Because of this, these participants preferred using our sketching algorithm on the contour frames to generate quick automatic sketches of these animation cycles in Fig. 7, requiring no additional sketching and saving users a substantial amount of time.

Fig. 7.

Participant-generated stick figure animations and the resulting posed output. Top uses edge detection while bottom uses our sketch-painting algorithm with N=1,500 and C=5.

5 Discussion

The Run, Walk, and Kick cycles in Table 1 were generated by the 2 artist participants, while all other cycles were from non-artist participants. Participants without prior art experience were able to create fluid animation cycles, primarily because the animation interface only required them to draw rough stick figures. In some cycles in Table 1, the drawing time was substantially longer than the editing time; this occurred in cycles such as run and kick, whose poses were visually more challenging to conceive of and draw. Most cycles had lower editing times, mainly because it is easier for users to see and edit in-betweens of primitive, color-coded stick figures using onion skinning than of larger, more complex 3D character rigs, allowing users to quickly identify and edit their work whenever they noticed a temporal inconsistency in the motion.

Table 1. Animation cycle times (in mins)

Following the study, users were asked for their impressions of the pipeline. Non-artist users found it easy to think about their animations as stick figures rather than individual joint rotations when producing their character animation cycles, as stick-figure drawings were ubiquitous in their pre-school and elementary-school years. When asked about their thought process during the animation stage, these individuals without art or posing experience said they would move around in front of a mirror and copy down their motions as stick figures, easily creating animation cycles from their own movements. They found real life an easy reference, and converting real-life poses into stick figures came naturally to them.

The 2 participants with art experience said that drawing stick figures was a much simpler interface than traditional forward-kinematics posing, and that they could draft ideas much faster than with the conventional approach of drawing frames by hand. This demonstrates the pipeline’s effectiveness for individuals with and without art experience. Individuals without art experience can use the pipeline to create consistent, smooth hand-drawn character animations without needing to understand character anatomy, while individuals with art experience can use it to quickly prototype 2D hand-drawn animations.

Limitations. Our interface only allows drawing line strokes, prohibiting curved strokes for arc-like posing. While some models may have many pivot points along the back that let it bend and curve, other models may lack these pivot points. To generalize our interface to posing multiple models, we restrict it to line strokes, forgoing curved or arc-like body parts.

When computing joint angles, the stick figure from the 2D interface is only concerned with aligning body parts of the 3D model to match the stick figure. Yet this is still an under-constrained problem: our algorithm does not account for rotations of a joint about its own axis, which leave the character pose unchanged but alter body-part orientations. A common example is twisting an arm around itself. These body-part orientations are not specified in the original 2D interface. Future work could explore adding a normal to each stroke to visualize its orientation; moving this normal around would change the orientation. When computing joint angles, this changes our alignment strategy from aligning vectors to aligning planes, since each joint is then described by its unique direction and normal. In this case, there always exists a unique rotation between two planes.

Future work could also add support for head rotations. Most artists represent heads in basic sketches as a circle with a cross, where the cross intersection marks where the nose is oriented. Our program would analyze where this point lies relative to the center of the circle, as well as whether the cross is bent inwards or outwards to denote whether the nose faces towards or away from the camera. This information would be enough to construct a unit vector from the center of the face to the cross point and compute the quaternion rotation of that vector from its rest position.
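This future-work idea could be sketched as reconstructing a unit nose vector from the cross offset, with depth recovered from the circle radius. The following is a sketch of the proposal, not an implemented feature; all names are hypothetical:

```python
import numpy as np

def head_direction(center, cross_point, radius, facing_camera=True):
    """Unit nose vector from the circle-with-cross head convention: the
    cross offset from the circle center gives x/y, the leftover radius
    gives depth, signed by whether the cross bends toward (inwards) or
    away from (outwards) the camera."""
    offset = (np.asarray(cross_point, float) - np.asarray(center, float)) / radius
    z = np.sqrt(max(0.0, 1.0 - float(offset @ offset)))
    return np.array([offset[0], offset[1], z if facing_camera else -z])
```

The resulting vector could then be fed to the same base-to-end quaternion computation used for the other joints.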

6 Conclusion

Our implementation bridges a 2D drawing interface with a 3D posing and rendering interface in order to assist with the process of creating 2D hand-drawn animations. We also introduce a novel toon-based shading scheme that builds on classic cel-shading to create hand-drawn effects and shading. We believe that with our interface, if anyone can draw stick-figures, then anyone can animate.