1 Introduction

Our initial research began in 2019 with a primary question: Can accomplishments in cultural heritage, such as the creation of virtual environments of historic sites and advancements in game development, such as inhabiting virtual environments with actors and stories, be utilised for the benefit of creating virtual ‘cinematic’ heritage?

We knew the outcome of the investigation would be the creation of a virtual reality application, a walk-in movie scene of a film, which the viewer can freely explore (with a headset in a room-scale VR setup), while a narrative with actors unfolds around them. We had to identify the heritage content that would be the subject of the application.

As researchers based in Singapore, we wanted to address the film industry and history of Singapore and Malaya, which remain relatively little known internationally, most notably the late colonial period (1940s to 1960s), when there was a prolific output of Malay-language cinema produced by two vertically integrated film studios: Malay Film Productions (owned and managed by the Shaw Brothers) and Cathay-Keris Studio (owned by Loke Wan Tho). Rather than pick a frequently re-screened ‘classic’ film from this era, we decided to focus our research on the film Pontianak, made and released in 1957 by Cathay-Keris (See Fig. 1).

Fig. 1. Film stills from Pontianak (1957) and Dendam Pontianak (1957). © 1957, Cathay-Keris

The choice of this film as a case study was grounded in its significance as a heritage artefact. Firstly, it features representations of traditional kampongs (villages) of Malaya and Singapore as they appeared in that period. Our case study presents the first virtual kampong for audience exploration, which gives it a high degree of relevance to cultural and historical heritage preservation. Secondly, the key source of the film (and the series that followed) is traditional Malay mythology and folklore that was widely believed in, and to some extent still is, in contemporary Singapore and Malaysia. Thirdly, Pontianak is considered to be a ‘lost’ film, as there are no existing prints or copies of the film in any archive, and none have been seen since at least the early 1960s [1, p. 126].

Investigating a lost film brought a new level of complexity to our work, as we were required to ‘recreate’ the film from scant sources, rather than ‘restore’ an existing (and possibly damaged or incomplete) copy. Traditional heritage approaches to cinema were thus impossible, which to some extent justified our highly non-traditional use of VR to animate a work that was otherwise impossible to experience. Our idea was to use the immersive, experiential qualities of VR to create a new work, inspired by Pontianak, and rather than attempt to simulate the film in an accurate way, we hoped to create an experience that imaginatively reflected the film, and our research into it, which would inspire audiences to learn more about this ‘lost’ piece of film heritage.

Before we began, we needed to assess how similar projects were executed and what was feasible for our team to achieve. The Epic Games Digital Human project [2] created a digital incarnation of actor Andy Serkis, who played Gollum in The Lord of the Rings (2001), Kong in King Kong (2005), Caesar in Rise of the Planet of the Apes (2011), and many more [3]. It demonstrated that convincing realism was achievable not only in big-budget film productions but also in a real-time game engine environment. In 2021, Epic Games went even further and released MetaHuman Creator [4], a tool that enables artists to create realistic computer-generated (CG) characters for use with their game engine Unreal. The Digital Human project itself, in 2018, created characters capable of believable acting by capturing the performance and facial expression of real actors with a professional high-end motion capture system from Vicon [5]. Since then, motion capture alternatives have become available that promise comparable results for a fraction of the cost. These significant advancements have implications beyond entertainment and games, and they raise two questions: Are smaller academic research teams, non-commercial projects, and artists within reach of creating realistic digital humans? And how can virtual heritage applications benefit from these developments?

At the time of writing, our research project, creating the virtual cinematic heritage application for the film Pontianak, is still ongoing. The steps and processes involved in designing the virtual Malay village environment, experiments in reenacting a key scene of the film, and findings on the history and synopses of the films have been described in a previous publication [6]. In this paper, we provide some background on the source material and our historical research, then outline different strategies for capturing performances, evaluate a low-cost motion capture system, and detail further findings gained through several iterations of capturing performances with actors for the virtual reenactments of our application.

2 Pontianak (1957) and the Snake Bite Scene

The first Pontianak film was released in April 1957 in Singapore. It was the first film to make prominent use of supernatural figures that were part of a common Malay folklore, originally recorded in English by anthropologists at the turn of the century [7, p. 325–326]. The portrayal of the Pontianak in the film marked a significant re-invention of the mythos, credited to the film’s writer Abdul Razak in collaboration with the film’s lead actress Maria Menado [8]. The film’s box office success led to a second film, Dendam Pontianak (Revenge of the Pontianak), being rushed into production and released that September, followed by a third film, Sumpah Pontianak (Curse of the Pontianak), released in 1958. There would be other Pontianak films during that period (produced by the rival studio run by the Shaw Brothers), but these three films brought together the distinct talents of Abdul Razak (as writer), the director B. N. Rao (who had been ‘imported’ from Mumbai’s burgeoning film industry), the producer Ho Ah Loke (who represented the ‘Keris’ part of Cathay-Keris), and the actress Maria Menado, originally from Indonesia. Menado notably played three roles in the film (or rather three incarnations of the same character): the ‘deformed’ orphan Chomel, a ‘beautiful’ Chomel after she has used magic to change her appearance, and the monstrous Pontianak creature itself, a vampiric figure with elongated fangs and claw-like fingernails, wearing a white robe.

As mentioned, despite their reportedly wide distribution (throughout Malaya, Indonesia, and other Asian territories), Pontianak and Dendam Pontianak are believed to be ‘lost films’. According to sources [1, p. 126], when Ho Ah Loke broke off his production partnership with Cathay’s Loke Wan Tho, the two divided the film titles between them, including the remaining prints. Some years later, it is said, Ho disposed of the prints, including the two Pontianak films, in a quarry or a lake. This oft-repeated story is flawed, mainly because there would have been hundreds of prints of each title, although little is known about film distribution in Malaya during that period and how prints were stored or destroyed.

Since Pontianak has not been viewed since its first period of release in 1957 and 1958 there is a scarcity of information or images available. We were able to find contemporary articles and reviews from the time of the film’s release in local newspapers and magazines. One major source for stills and story information was the published synopsis of the film, which we were able to locate via private collectors of film memorabilia. Film synopses were a common form of promotional material and merchandise during this period for Malay-language films, and they would contain images from the film, behind-the-scenes photos, as well as a prose summary of the major events of the film’s storyline.

We have determined, through the synopses and other secondary sources, that the basic plot of Pontianak is an origin story of the titular creature, who is an abandoned child found by a bohmoh (Malay shaman) and raised as his daughter/servant. She is ironically given the name Chomel (meaning ‘pretty’ or ‘cute’ in Bahasa Melayu), even though she is coded as ‘ugly’ and ‘deformed’ in the narrative. When the bohmoh dies, Chomel is entrusted to burn his magic books, but instead she learns the spell to make herself beautiful. However, she is told that if she drinks human blood the spell will be broken and she will become a Pontianak. This story is in stark contrast to the commonly known myth of the Pontianak as the ghost of a woman who died in childbirth. In the film, the beautiful Chomel travels to a kampong where she meets and falls in love with the son of the village chief, Othman (played by M. Amin). They marry and have a daughter, Maria, and it is after this that we reach the crucial scene when Chomel finally transforms into the Pontianak.

Another key source was the written account of A R Mustafar, an independent historian of Malay film, who reports that he watched Pontianak upon its release in 1957. He described the transformation scene as a crucial moment for the audience, witnessing the cinematic rendering of this infamous supernatural figure for the first time. He writes:

Something that struck out to me was when M. Amin’s calf was bitten by a snake and when he was in so much pain, Maria Menado sucked out the poison. In that moment, the cinema went absolutely silent since they knew what was going to happen next. The change from Maria Menado’s beautiful face into that of the scary Pontianak shocked the audience, even causing a slight commotion for a while. When the shock died down, silence came again [9, p. 114].

In terms of narrative context, we know from the film’s synopsis that Chomel has been warned after she used magic to make herself beautiful that drinking snake poison will turn her into a monster, which is something the audience would have been aware of - hence the anticipation of that moment. The synopsis goes on to describe the scene in more detail:

(They) were having a relaxing chat alongside their daughter who was playing, Othman was suddenly bitten by a snake on his neck. Othman was moaning in pain, Chomil (sic) wanted to leave her husband to take medicine meant to fight a snake’s venom, but Othman couldn’t wait and asked his wife to suck out the venom that was causing him so much pain, from his neck. Othman moaned in pain again and asked his wife to suck out the snake’s venom from his neck. Due to her faithfulness to her husband, Chomil held on to her husband’s neck and began to suck the venom out of his neck…Tasting blood…it tastes good…and without realising Chomil ended up sucking up all the blood in her husband without stopping…Othman began to scream all of a sudden…Chomil started to change…her black hair became white, her skin started to wrinkle…her nails became sharp…Othman became weak due to the blood loss and…he died, falling onto the ground [10, p. 10].

Given how pivotal this scene was to the film, both in terms of the narrative and the audience response, we decided to make this the focus for our first iteration of using VR technology to recreate the sequence. The first step was to script a sequence between the two characters Othman and Chomel, in which they walked through the kampong which would build up to the moment of the snake bite and then the transformation.

We chose to have them walking partly because we were taking reference from a film still of the actors M. Amin and Maria Menado in character, which presents them standing, and partly because we wanted to create an experience in which the viewer can travel through the virtual kampong rather than remain in one static location. This decision would present technical challenges described later. We wrote dialogue for the characters which was an imaginative projection rather than an attempt at speculating what the ‘real’ dialogue would have been. In our dialogue, Othman is curious about the mysterious past of his wife, questioning her as to her origins and revealing tensions between them. This dramatic element was designed to function as exposition for a viewer unfamiliar with the story. It was also written in the spirit of Malay-language films of the era, which tended towards being dramatically direct and expositional. However, their conversation is interrupted when the snake falls from a tree (an assumption we made about the original film, given that we know the snake targets the neck) and bites Othman, leading him to implore Chomel to suck the venom from his wound. She does so reluctantly and then finally transforms into the Pontianak, which is where our sequence ends.

3 Performance Capture

To populate a virtual environment with digital humans, an artist or researcher has two basic options in regard to creating the animation of the characters. The most common approach is to use a library of actions such as idling, turning, walking, jumping etc. and then transition between these to create a flow of continuous motions. This approach is the foundation of real-time interactivity of computer games. The individual actions are created by manual key-frame animation or using motion capture performances which are then edited for short actions that can be looped. Advancements are being made regarding how seamless the transitions between actions are rendered. A second approach is to motion capture an actor’s performance for the entire scene in one continuous linear action. This filmic or theatrical approach forfeits interactivity for the benefit of realism of the performance. While only workable for non-interactive background characters in games, this second approach provides an opportunity for virtual heritage applications to improve the authenticity and believability of reenactments of historical events.
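The difference between the two approaches can be illustrated with a minimal sketch. It is not taken from our project files: the action names follow the text, while the transition table, the function names, and the frame rate used in the usage note are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Set

# Approach 1: a library of short, loopable actions with permitted transitions.
# The action names follow the text; the transition table is a hypothetical example.
TRANSITIONS: Dict[str, Set[str]] = {
    "idle": {"turn", "walk"},
    "turn": {"idle", "walk"},
    "walk": {"idle", "turn", "jump"},
    "jump": {"idle", "walk"},
}


def play_actions(requests: List[str], start: str = "idle") -> List[str]:
    """Return the action shown after each request; disallowed requests are ignored."""
    current = start
    shown = []
    for requested in requests:
        if requested in TRANSITIONS[current]:
            current = requested
        shown.append(current)
    return shown


# Approach 2: one continuous capture, addressed only by elapsed time.
def linear_frame(elapsed_s: float, duration_s: float, fps: int) -> int:
    """Return the frame index within a linear take, clamped to the take length."""
    last_frame = int(duration_s * fps) - 1
    return max(0, min(int(elapsed_s * fps), last_frame))
```

In the first approach the character reacts to requests at run time, for example `play_actions(["walk", "jump"])`, whereas in the second the only input is time, for example `linear_frame(1.0, 240.0, 30)` for a 4-min take at an assumed 30 frames per second. The sketch makes the trade-off visible: the linear take has no branching at all, which is why it forfeits interactivity.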

As laid out in the previous section, our main objective was to enact a scene from a film, whose structure is linear by design. We therefore focused on the second approach and captured the entire performance in one continuous linear action, concentrating on the 4-min-long Snake Bite scene described above. Additional shorter scenes were captured to further evaluate the two motion capture systems available to our project. The first is a camera-based system from Vicon, which is a permanent setup in our research facilities, and the second is the portable inertial sensor-based system from Rokoko, which is considered an entry-level, low-cost alternative. Skogstad and Nymoen [11] analyse both concepts and conclude: “If high positional precision is required, OptiTrack [a camera-based system] is preferable over Xsens [a sensor-based system], but […] Xsens provides less noisy data without occlusion problems”. The two systems compared by Skogstad and Nymoen, OptiTrack and Xsens, make for a fair comparison as both fall in a similar price range; in contrast, our two systems from Vicon and Rokoko do not. However, the lower-cost Rokoko system is promoted as an alternative to the more expensive camera-based systems, and its portability is an advantage that must be considered. As we were aiming to capture actors walking within a large area, the portable sensor-based system appeared more appropriate for our use case.

Fig. 2. Virtual kampong village, still images from VR experience, 2020, the authors

3.1 Capturing the Pontianak Snake Bite Scene

Our goal was to capture the entire 4-min-long scene with the two actors talking while walking through the virtual kampong village (See Fig. 2) in one continuous take. Following several tests, the outdoor sports field on our university campus was chosen as a capturing area as it provided the necessary open space, power outlets and, with the grass field, a relatively soft area for the actor performing as the husband to fall on. Capturing outdoors in a wide-open space (See Fig. 3) had another significant advantage: Rokoko’s sensor-based suits are sensitive to electromagnetic interference, which can be minimised in an open space such as the sports field. Although the recommended distance to metallic objects is only one metre, we could not establish a fully interference-free capturing area in our school’s building. However, we identified some areas in our building, such as the auditorium stage, with an acceptably low amount of interference.

Fig. 3. Actors in sensor-based suits at outdoor capture, 2020, the authors

To avoid the actors colliding with virtual buildings and trees, the walking path was translated from the virtual environment via a 2D coordinate system to the real-world capturing area, with checkpoints and boundaries marked on the sports field and spanning an area of 40 by 25 m for the actors to walk through. A more sophisticated approach to matching an actor’s position with the virtual environment during capturing is to stream the motion capture performance to the virtual environment in real time to create a live preview, a process known as ‘virtual production’. However, sensor-based systems do not provide a reliable absolute position in ‘world’ space. For our particular use case, with two actors walking side by side for minutes in a large area, the so-called ‘drifting’ caused the captured virtual positions of the two actors to move metres apart over time, while in reality they were still just centimetres apart from each other. Rokoko offers a smart solution to compensate for this shortcoming by supporting SteamVR, allowing HTC Vive trackers to be mounted on actors and props. As such a setup requires several Vive base stations surrounding the capture area, it evolves into a combination of a sensor-based and a camera-based system, negating some of the sensor-based system’s portability advantages. Furthermore, the capture volume is limited by the base station setup, which, according to HTC, supports an area of 10 by 10 m [12]. Thus, for our application with a 40 × 25 m outdoor area for the Snake Bite scene, adding base stations was not an option. As a result of capturing without ‘absolute’ position, the drift between our two characters, captured simultaneously with two suits, accumulated to several metres over the entire capture time and required us to scale and reposition the data extensively in post-production to fit the layout of the virtual village.
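The two positional steps described above, translating virtual checkpoints to field coordinates and repositioning drifted data, can be sketched as follows. This is a simplified illustration under stated assumptions rather than the workflow we actually used: the function names are hypothetical, the virtual ground plane is assumed to be axis-aligned with the field, and the drift is assumed to accumulate linearly over the take with the actor’s true end position known (for instance from a marked checkpoint).

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Size of the outdoor capture area in metres, as stated in the text.
FIELD_SIZE_M: Point = (40.0, 25.0)


def virtual_to_field(path: List[Point], origin: Point, scale: float = 1.0) -> List[Point]:
    """Translate virtual ground-plane checkpoints into field coordinates (metres).

    `origin` is the virtual point that should coincide with the field corner;
    `scale` converts virtual units to metres (assumed uniform).
    """
    field_path = [((x - origin[0]) * scale, (y - origin[1]) * scale) for x, y in path]
    for fx, fy in field_path:
        inside = 0.0 <= fx <= FIELD_SIZE_M[0] and 0.0 <= fy <= FIELD_SIZE_M[1]
        if not inside:
            raise ValueError(f"checkpoint ({fx:.1f}, {fy:.1f}) lies outside the field")
    return field_path


def remove_linear_drift(track: List[Point], true_end: Point) -> List[Point]:
    """Distribute the end-of-take position error evenly across all frames.

    Assumes the first frame is correct and the error grows linearly with time.
    """
    if len(track) < 2:
        return list(track)
    err_x = track[-1][0] - true_end[0]
    err_y = track[-1][1] - true_end[1]
    last = len(track) - 1
    return [
        (x - err_x * i / last, y - err_y * i / last)
        for i, (x, y) in enumerate(track)
    ]
```

Real drift from inertial sensors is rarely this regular, so a correction of this kind would only be a starting point for manual repositioning. The bounds check in `virtual_to_field` reflects the practical constraint that every checkpoint has to fit inside the 40 by 25 m field.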

3.2 Facial Capture

Once these positional corrections were done, a video render of the captured characters’ entire walk was prepared (See Fig. 4) to support the facial capture and voice-over acting at the sound recording studio. During the voice-over acting, the facial data was captured simultaneously using the iPhone FaceID depth map system and applied to our characters in the Reallusion iClone software.

Fig. 4. Witness camera (top) and retargeted characters for voice over, 2020, the authors.

3.3 Evaluation and Post-production

A basic post-production workflow for motion capture follows these simple steps: review the takes, identify the best one with the fewest issues, and clean up the data as necessary. The extent of the clean-up process depends on the precision of the captured data and the final required quality. This manual clean-up process can only be effective if witness cameras are used to produce video references from the capturing session, commonly shot from two angles simultaneously, allowing the animator to identify discrepancies between the actor’s actual movement and the captured data and to adjust accordingly. As our capturing area was so large and our main witness camera was a hand-held gimbal following the actors (See Fig. 4), our reference videos were not easily usable for the clean-up stage, exposing the size of the area and the lack of static cameras as a flaw in the planning of our venture.

Evaluating the captured data and estimating how much labour-intensive clean-up would be required presented another challenge. Issues in the data which appear minor and acceptable for an animated film, for instance, might be severe and unacceptable for a virtual reality project which provides depth perception through stereopsis in the HMD. We have all seen countless instances of humans walking in real life, to the degree that, except for individuals with stereoblindness, even a layperson is able to identify awkwardness in a simple walking performance of a virtual actor if the data is flawed and presented in stereoscopic 3D. Our project went through several authoring steps, such as basic clean-up, merging the facial and body data, cloth simulation, and hair grooming, before we eventually reviewed the assembled final character in VR, only to discover that the underlying captured data had more severe issues than had been visible on the 2D computer display. From this experience, we concluded that every authoring step, and in particular the quality control of the motion capture data, must be reviewed in stereoscopic 3D / virtual reality immediately.

3.4 Capturing a Second Scene with Two Systems

These findings were directly applied to the motion capture session of a second, much simpler and shorter reenactment scene, in which the Pontianak ravages a victim and, once discovered by the viewer, runs away, leaving the blood-drained victim plummeting to the ground. As the scene only required an area of 4 by 3 m, we were able to capture the performances with both systems available to us, the portable sensor-based suit and the studio camera-based system. To evaluate the captured data in VR, and to compare the two systems directly, we skipped previously applied intermediate steps and used simple grey-scale characters that distinctly contrasted with the background environment, allowing us to focus precisely on potential issues in the capture data. The evaluation confirmed that, as in the Snake Bite data review, issues that appeared minor on a 2D computer display were visually amplified in stereoscopic VR. In regard to the accuracy of the motion data and perceived realism, the camera-based studio system unsurprisingly outperformed the sensor-based suit in all three performance actions: stationary, falling and running. The data of the sensor-based suit, while still exhibiting issues, appeared most accurate for the stationary part of the performance; in contrast, the falling and running actions demonstrated severe levels of inaccuracy.

Fig. 5. Body, finger, and facial capture of the Pontianak, 2021, the authors.

3.5 Stationary Performance Capture

Based on these findings, we decided to further investigate whether the sensor-based suit would produce acceptable results if the actors were kept stationary. Twenty short performances including standing, sitting, and lying were captured with a single actor playing two characters of the VR experience, and in addition to the body capture, finger and facial data were captured simultaneously (See Fig. 5). We again skipped all non-essential post-production steps to review the data as early as possible in stereoscopic VR. The review confirmed a significant increase in accuracy in comparison to all previous attempts, allowing us to use the motion capture data with very little clean-up effort.

4 Results

At the current stage, the project has produced results beyond the historical findings about the films, in the form of two room-scale virtual reality applications compiled for SteamVR and viewed with an HTC Vive Pro setup.

The Pontianak Snake Bite VR Experience.

The audience is invited to explore the virtual environment freely and to examine the old kampong houses and the surrounding tropical vegetation. The story logic allows the user to follow Chomel and Othman on their 4-min-long stroll through the village to the jungle path location where the snake bite scene plays out and Chomel dramatically transforms into the Pontianak (See Fig. 6). As described earlier, our actors’ walking path spans an area of 40 by 25 m, thus requiring the audience to use the SteamVR teleportation feature to navigate through the larger environment. Although this navigation concept works as planned, the experience of constantly teleporting to follow our actors’ dialogue is overwhelming and can result in the user missing key moments. We therefore implemented an alternative approach which positions the user automatically at the snake bite location, allowing them to follow the approaching actors’ conversation without interruption. In cinematic terms (of shot sizes and camera framing), with the user representing the camera, the first approach translates to framing the actors in, for example, a medium shot by constantly repositioning the camera, while the second approach begins with a wide shot in which the actors are approaching and ends in a medium shot. Both approaches have their limitations and represent an experimental take on the walk-in movie idea. One lesson learned from these experiments is that continuously moving actors make the experience ‘complicated’ both for the production of the work and for the user. Regarding the performance capture, as mentioned earlier, the results of the sensor-based suit for moving characters were not aesthetically or dramatically convincing. On the other hand, the sheer size of the capture area could not have been covered, within our resources, without the portable system.

Fig. 6. Stills from The Pontianak Snake Bite VR Experience, 2020, the authors.

Fig. 7. Still from VR experience The Woman Who Fell to Earth and Met the Pontianak, 2021, the authors.

The findings described above led us to create a second VR experience designed around mostly stationary performances. This ‘spin-off’ project is an artistic experiment and “B-movie” homage that confronts a melancholic female Stranger (an Alien/Cyborg) with the unfamiliar environment of a historic early 20th-century Malayan kampong village, and with the Pontianak (See Fig. 7) [13]. The audience can explore the village freely, meander around and observe the Stranger and the Pontianak at their leisure. There is only a minimal narrative with no dialogue or dramatic action, and although the user is required to navigate (with the teleportation feature) to the changing locations of the various scenes at which the Stranger appears, the overall experience is that of a free observer maintaining their agency. The stationary performances produced immediately usable and convincing capture results from the low-cost system, with the finger and facial data adding significantly to the believability.

5 Discussion and Conclusion

This investigation had two main objectives: firstly, to imaginatively reenact scenes of the lost Pontianak film as a VR experience, and secondly, to evaluate whether a low-cost motion capture system could support the creation of realistically acting digital humans, potentially enabling virtual heritage applications to improve the authenticity and believability of virtual actors.

In terms of the cinematic heritage content, we have made some progress - transforming the research into narrative scenarios that could be used in VR, but we are still grappling with the technical limitations in order to determine exactly what can be achieved in bringing these elements together (See Fig. 6). The static environmental ‘set’ of the Malay village possesses a high level of detail and demonstrates the exciting potential of VR as a tool for heritage, but the goal of creating realistic virtual performers, who can enact the roles from the reconstructed lost film, is still very much a work-in-progress. The script we had created for the Snake Bite scene proved to be too long and too dynamic, with its ‘walking and talking’ progression, revealing the many limitations of a portable low-cost motion capturing system.

At the end of this phase of production, we had achieved usable results only from stationary actors. Movements in space, such as walking and falling, produced high levels of inaccuracy that require manual clean-up to an extent that is uneconomical for a small research team. The idea of using the portable sensor-based suits to capture in a wide-open area, simulating actors strolling through a large virtual environment, turned out to be overly ambitious given the capabilities of the low-cost system. It also revealed that we had underestimated the importance of witness cameras for producing essential video references.

In summary, the portable low-cost system is a viable alternative for non-commercial, artistic and smaller academic research teams working on VR experiences, but only if extensive resources for manual clean-up are available or stationary actions are sufficient. Furthermore, if the primary project outcome is a stereoscopic virtual reality experience, as is the case in our project, it is crucial to perform the review and quality control of the motion capture data directly in virtual reality. Although near photorealism for advanced tasks such as virtual environments and digital humans has become possible in practice and is within reach, achieving it still poses tremendous challenges for a small research team with limited resources.

This means that we had to re-evaluate what was actually possible in terms of our imaginative reenactment of scenes from Pontianak. While we might be able to produce key images and moments from the film (or at least our hypothesis of what happened in the film), the goal of producing a whole sequence with multiple characters is much more challenging to attain. At this stage, we are in the process of reconsidering what is possible as well as what can be stimulating and interesting to audiences interested in such a heritage project. We still believe that the VR approach to a ‘lost’ film is a multifaceted way to get closer to something that no longer exists, and the next stage for the project will be to present it to different audiences to gauge their reactions and feedback and to see if it is effective as a heritage or artistic experience.