
1 Introduction

This article documents ongoing experimentation with the form of non-linear film experiences in a manner which might be called ‘customised cinema’ rather than the more frequently used label of ‘interactive cinema’. In interactive films and artworks, the norm has traditionally been direct and frequent interaction via a technological interface which elicits a response in the film/artwork. Assuming that the interaction changes the representation of the film/artwork in some way, then navigational non-linearity, or variation of some kind, can be presupposed. The branching narrative, traversed through interlinked pre-filmed sequences, has been a standard format in continual use from Kinoautomat (1967) [1, 2] through the YouTube “annotations” craze of the 2010s, to Netflix’s Bandersnatch (2018) and beyond. Developments of this paradigm have tended mostly toward technical experimentation with delivery systems and physical interfaces, and although the attraction of this format (which has its outcomes predetermined) has not waned over the years, its lack of variability and agency has received frequent criticism: media artist Ken Feingold writes that “An interactive artwork playing images and sounds off a laser disc, CD-ROM, or hard drive is no more genuinely interactive than a machine selling canned soft drinks” [3]. Rezk and Haahr make it clear that alternatives should be explored: “Bandersnatch belongs to a particular class of Interactive Digital Narratives (IDNs): highly restrictive, nonlinear, branching structure films that offer limited agency and, in many ways, have less to offer than more sophisticated IDNs. While the simple format is appealing to new audiences, there is clearly scope for improvement” [4].

Let us now envisage the situation whereby the non-linearity in the interactive film does not arise through a branching process but through having scenes in the film modified according to personal data from its viewer. In this case there would be differences in the audiovisual representation at every single viewing which would be caused by personalised modifications upon a fixed template. The research questions discussed in this paper arise from attempts to develop and test such a ‘customised film’ prototype using personal resources, an average laptop computer, and zero budget, with the aim of offering insights into possible future directions for this type of approach. The film is designed for a single user as an art installation and would involve an active, voluntary data collection stage followed by a passive film-watching phase in which the acquired audiovisual (and other) data is represented in creative ways. Details of the resultant film installation entitled You•Who? follow below.

2 Context and Concept

Some precedents that inspired and influenced the work appeared on websites as short, small-dimensioned filmic advertising over a decade ago. The web project Tackfilm [5] invited viewers to upload a photograph of themselves, which then appeared in a few short live-action sequences of an advertisement promoting the payment of the Swedish television licence fee: the viewer’s likeness was shown on a giant poster on a busy street, for example, to suggest they were a national hero for paying the fee. Sing it Kitty (2015 website, no longer extant) manipulated an uploaded photograph to create a comical lip-synched face morph in a music video, also in an advertisement. Another method was to obtain the personal data by asking users to connect via their Facebook account: Roadtrip Forever (2013 website, no longer extant) promoted road safety by using Facebook friends’ names and photographs; Take This Lollipop [6] used the participant's personal data and location to emphasise in a frightening way the potential danger of not taking online privacy seriously. The Wilderness Downtown [7] used data and images related to a user's childhood (on-screen questions were asked before the film started), whereas Just a Reflektor [8] functioned by inserting live webcam imagery into strategic parts of a pre-recorded music video. All the above were, however, of short duration with small video dimensions: a challenge to be solved with You•Who? was how to make a much longer fiction film (ten minutes or more) with acceptable video dimensions (1280 × 720 pixels) that tells an engaging fictional story and incorporates a larger variety of data in numerous creative ways into the final rendered film. In terms of relevant current work, the marketing company Idomoo [9] creates ‘personalized video’ targeted at commercial customers, although the data utilised in the videos seems to be exclusively textual and/or numerical and, again, the videos are of short duration. Manipulating a photo of the user’s face has also been popularised by technological advances in face-swapping via AI methods such as deepfakes: the app Zao, for example, integrates an uploaded user photo into popular scenes from a variety of movies and TV shows, and ‘refacing’ using apps such as Doublicat has also become common practice.

The technique behind You•Who? originated in a short comedy film shown as part of a live interactive performance that I was involved with in the mid-2000s called Cause & Effect [10]. At the beginning of the show, the stage presenter casually interviewed four audience members as a warm-up, but the responses were recorded and these audio recordings were subsequently (and unexpectedly) played back at carefully timed moments during a purpose-made linear film, producing riotous laughter in the audience every time. Comedy therefore seemed the perfect genre for a fully-fledged attempt at a customised film, and an opportunity arose to make a prototype to be shown in a public-facing outdoor installation at the Vilnius Culture Night in 2017, for which a specially-made film featuring recognisable locations in the city was produced. The event was busy: crowds gathered behind each user and often gave spontaneous applause when the film was over, as if the participant had given a genuine acting performance. This event was followed by other location-based prototypes, leading to the design and production of the final film You•Who?, which was successfully shown in several gallery situations until possibilities were hampered by the travel restrictions of the global Coronavirus pandemic.

The fictional story portrayed in You•Who? was constructed so as to be self-referential to the capturing of personal data and identity theft, with a scenario designed to maximise the creative and humorous potential for embedding data into the film’s sequences. Location-specific scenes were dropped in favour of a more generic approach suitable to be shown anywhere. In You•Who? the protagonist, returning from a conference, is gradually ‘possessed’ by another conference attendee—portrayed by the data from the participating viewer of the film—and at the climax of the film the protagonist's identity is taken over completely by the participant in a final face-morphing sequence.

The challenges in the film production were practical and conceptual. Essentially the film has just two actors, the filmed protagonist and the user, even though the user was never filmed in the traditional sense and the protagonist acts opposite an imaginary co-actor. In order to exercise complete control over every shot, and because there was no budget, I filmed and acted the role of the protagonist myself. This meant it was easy enough to rethink ideas and re-film scenes that did not quite work as expected when rendered along with elements of the user data, leading to a highly iterative process and the possibility of bringing in new ideas along the way (Fig. 1).

Fig. 1. Scenes from the film’s finale, in which the protagonist’s face morphs into that of the participant.

3 Technical Development

The basic specification for the required experience comprises two distinct elements: a voluntary data collection phase which, when completed, is followed immediately by the playback of the linear film into which the data is rendered. To take the concept beyond a gimmick, the specification was for a fiction film of at least ten minutes’ duration and at least 720p resolution. Data would include, as a minimum, an image of the user’s face, a voice recording, and textual input; hence the installation interface setup would not be too demanding, requiring a webcam, microphone and keyboard connected to my own 2013 MacBook Pro laptop with a 2.4 GHz processor.

The critical technical element in the process was to generate the customisation efficiently such that participants experience no delay whilst their data (such as an AIFF audio file or a JPEG photograph) is rendered into existing videos and then shown. Pre-filmed sequences were edited into a QuickTime movie ‘template’ into which the rendering would take place, a difficulty being that rendering a video and viewing it are mutually exclusive processes. Initial ideas around the use of alpha layers proved unsatisfactory, and the solution found to avoid delays was simply to break the film template into smaller chunks, some of which do not include data and can be shown ‘as is’ whilst rendering continues through the series of movie template files, which are viewed almost as soon as their rendering is complete in a just-in-time rendering production line. Adobe After Effects was used for the rendering via ‘aerender’, a background rendering application bundled with the program installation, meaning that all the capabilities of After Effects (effect filters, opacity, soft edges, masking, etc.) would be available for use. Although it was ultimately decided to delete all user data immediately after the film was viewed (to prevent privacy problems or data storage legal issues), the potential would be there for a user to be given a viewable copy of ‘their’ film as a QuickTime movie file.
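
As an illustration only (the installation itself triggered rendering from Director rather than Python), the following sketch shows how one background render of a data-dependent segment might be launched by calling the bundled aerender command-line tool; the application path, project file, composition name and output path are hypothetical.

# render_segment.py - a hedged sketch of starting one background After Effects
# render via the aerender command-line tool; paths and names are illustrative.
import subprocess

AERENDER = "/Applications/Adobe After Effects 2020/aerender"  # assumed install path

def start_render(comp_name, output_path, project="YouWho_template.aep"):
    """Start rendering one data-dependent segment without blocking playback."""
    return subprocess.Popen([
        AERENDER,
        "-project", project,     # After Effects project containing the queued comps
        "-comp", comp_name,      # e.g. a composition that includes the user data folder
        "-output", output_path,  # movie file that the playback stage will wait for
    ])

# e.g. kick off the next data-dependent segment while a ready-made segment plays
proc = start_render("Segment_02_with_user_data", "renders/segment_02.mov")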

Basic proof-of-concept experimentation showed that timing issues were solved if the full template QuickTime movie was divided into twelve separate sequences, alternating between those not using audience data (played as readymades to buy time for the ongoing render and to provide exposition) and those that required rendering with the data. As an example, the opening credits of the film and its introductory sequence lasted two minutes and did not include user data, allowing enough time for the background rendering of the subsequent sequence to be completed. This could then be seamlessly displayed after the introduction scene was finished, and whilst it was being watched background rendering would proceed to the next sequence in the queue. The After Effects project file consisted of six renders queued up, one for each segment of the film that needed user data to be included, with all the variable data elements placed in a single folder that was emptied after each run-through and then built up again with the subsequent user’s responses.
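
The alternation between ready-made and data-dependent segments can be thought of as a simple just-in-time playback loop. The sketch below is a minimal illustration under assumed conditions, not the actual Director implementation: the segment filenames are hypothetical, playback is handed to the macOS open command as a placeholder, and in practice one would also confirm that the aerender process for a segment has exited rather than merely checking that its output file exists.

# playback_scheduler.py - illustrative sketch of the twelve-segment alternation:
# ready-made segments play immediately, data-dependent segments play only once
# their background render has produced the file. Filenames are hypothetical.
import os
import time
import subprocess

SEGMENTS = [
    ("segment_01_intro.mov", False),   # ready-made, buys time for the first render
    ("renders/segment_02.mov", True),  # rendered with user data
    ("segment_03_exposition.mov", False),
    ("renders/segment_04.mov", True),
    # ... remaining segments follow the same alternating pattern
]

def wait_until_rendered(path, poll=0.5):
    """Block until the background render has written the segment file.
    A robust version would also check the render process has finished."""
    while not os.path.exists(path):
        time.sleep(poll)

def play(path):
    # placeholder: hand the file to the default movie player on macOS;
    # a real installation needs a player that reports when playback ends
    subprocess.run(["open", path])

for path, needs_data in SEGMENTS:
    if needs_data:
        wait_until_rendered(path)
    play(path)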

The task was complicated somewhat by using a laptop which only just managed to render and play back videos simultaneously and became very hot; although the system worked satisfactorily, a faster processor would have been beneficial. It was not possible to be as creative as desired with the use of After Effects effect filters, for example, as this demanded too much of the processor. Nevertheless, developing a successful method using a basic laptop opens up the overall concept more than if expensive high-specification hardware were a necessity.

4 Software Issues

A non-linear branching film with direct interaction can be made without any knowledge of programming. Software such as Klynt Interactive or Stornaway.io can be used to make a web-ready interactive film for a browser without the need for any coding: all hyperlinking and interactive functions are accessible via graphical user interfaces. For You•Who?, deciding upon the software to control the overall experience (obtain the user data, instigate the rendering and play back the final video sequences) was dependent on my own technical capabilities and unwillingness to bring in a coding expert. Adobe Director, which had not quite reached its end-of-life at that time, was chosen to glue the entire project together, simply because I had experience in using it for earlier projects and because its timeline interface metaphor suited the carefully coordinated sequence of events necessary to make the project work. Director was always popular with non-coders because basic screen design, hyperlinking and interactivity can be implemented using ready-made functionality, yet at the same time ‘Lingo’ code can be inserted in various ways to shift towards a more algorithmic approach. Using these Lingo commands, external applications (in this case custom Automator and AppleScript apps) can be launched at precisely the moment in the Director timeline that they are needed. Automator (Mac-only), which facilitates setting up a sequential flow of system processes that can be packaged as an application, was used to compartmentalise particular tasks, including launching Python code via simple shell scripts. AppleScripts, which can also be saved as self-contained applications, were found useful for carrying out system-related tasks such as using the inbuilt Photo Booth software to take a photograph and copying the resultant image to the You•Who? data folder. These small self-contained applications, each dedicated to a particular task, could be tested independently, which greatly aided the debugging and testing process.

Setting up the user data capture was relatively straightforward, using either menus made with the inbuilt features of Director or the aforementioned custom-made apps for specific tasks. What data to obtain, and how to deploy it creatively, was perhaps the most imaginative aspect of the entire project, and this grew as the various prototypes were improved and augmented. In the first main iteration of the project, visual data in the form of a captured JPEG of the participant’s face and a short video capture in QuickTime format offered the greatest filmic potential. Additionally, an AIFF audio file was captured of the participant speaking their name, and two simple questions were answered by typing into on-screen text boxes (“What is your name?” and “What is your philosophy in life?”). As explained below, once the basic mechanics of the entire process were working smoothly and could be tested, additional possibilities were added to the data capture stage.
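
For completeness, the short voice recording stage could equally be scripted directly in Python; the sketch below is an assumed alternative using the sounddevice and soundfile libraries, not the Automator/AppleScript capture apps actually used in the installation, and the duration and output path are illustrative.

# record_name.py - a hedged sketch of recording the participant speaking their
# name to an AIFF file in the data folder; values below are illustrative.
import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 44100
DURATION = 4                        # seconds of recording
OUT_PATH = "data/user_name.aiff"    # hypothetical data-folder location

audio = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
sd.wait()  # block until the recording is complete
sf.write(OUT_PATH, audio, SAMPLE_RATE, format="AIFF", subtype="PCM_16")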

5 AI Enhancements

The biggest fault that occasionally arose in the first prototypes was when a participant’s face was not in the centre of the image capture and was cropped or did not appear at all, which would completely spoil the watching experience. It was therefore found necessary to apply basic machine learning to the photo grab to detect, align and crop the human face, and to retake the photograph if a face was not found. With some difficulty, a Python 3 environment was configured with the OpenCV and dlib libraries for image processing and machine learning operations. The original grabbed RGB JPEG file undergoes a series of processes implementing Haar cascade face detection together with the ‘68 facial landmarks’ predictor. This paper does not contribute new knowledge to any of these programming techniques, which were adapted from existing code samples and models in the public domain, with only a few tweaks necessary to fit the project requirements without the need to learn Python from scratch. Each significant operation was written as self-contained Python code, and Automator apps consisting of a simple line of shell script were used to run the code: these are invoked by Director at the exact moment required. This method helped with timing issues, as the expected duration of each operation could be easily determined and the user experience adjusted so that an operation was completed before the next was initiated.
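
A minimal sketch of such a detect/align/crop step is given below. It is not the production code: it assumes OpenCV, dlib and the publicly available shape_predictor_68_face_landmarks.dat model file are present locally, and the file paths, output size and crop logic are illustrative. A non-zero exit code signals the controlling application to retake the photograph.

# face_prep.py - hedged sketch of detecting, aligning and cropping the grabbed face
import sys
import cv2
import dlib
import numpy as np

PREDICTOR_PATH = "shape_predictor_68_face_landmarks.dat"  # assumed local model file

def detect_align_crop(src_path, dst_path, out_size=512):
    img = cv2.imread(src_path)
    if img is None:
        return False
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Haar cascade face detection, as mentioned above
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return False  # no face: the photograph should be retaken
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face

    # 68-point landmarks via dlib, used here to estimate eye centres for alignment
    predictor = dlib.shape_predictor(PREDICTOR_PATH)
    shape = predictor(gray, dlib.rectangle(int(x), int(y), int(x + w), int(y + h)))
    pts = np.array([[p.x, p.y] for p in shape.parts()])
    left_eye = pts[36:42].mean(axis=0)
    right_eye = pts[42:48].mean(axis=0)

    # rotate so the eyes are level, then crop around the detected face box
    angle = np.degrees(np.arctan2(right_eye[1] - left_eye[1],
                                  right_eye[0] - left_eye[0]))
    cx, cy = (left_eye + right_eye) / 2.0
    M = cv2.getRotationMatrix2D((float(cx), float(cy)), angle, 1.0)
    rotated = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
    crop = rotated[max(0, y):y + h, max(0, x):x + w]
    if crop.size == 0:
        return False
    cv2.imwrite(dst_path, cv2.resize(crop, (out_size, out_size)))
    return True

if __name__ == "__main__":
    ok = detect_align_crop(sys.argv[1], sys.argv[2])
    sys.exit(0 if ok else 1)  # non-zero exit tells the caller no usable face was found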

Fig. 2. Summary of the You•Who? timeline, processes and tasks (time flows from left to right).

Having developed the workflow (summarised in Fig. 2) and become bolder with implementing snippets of Python code, I found that new and imaginative ideas for the project emerged, for example generating an outline drawing of the user’s grabbed face using Canny edge detection [11]. The face-swapping code was extended so that, at the data capture stage, the user is asked to choose one of six artwork portraits (Mona Lisa, Girl with a Pearl Earring, etc.) and one of six photographs (the face-like crater on the Moon, the first ever selfie from 1839, etc.): face-swapped versions appear unexpectedly, and to comic effect, in certain scenes of the rendered final film. Similarly, the climax of the film reveals the protagonist’s horrified face gradually being morphed, frame by frame, into that of the user. In the latest version of You•Who?, age estimation and human emotion recognition are applied to the grabbed JPEG, and an extra short scene is inserted into the film (just after the introductory sequence) which has been rendered in seven variations depending upon the emotion detected by the model. Additional machine learning techniques will undoubtedly stimulate further refinements to the project.
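
The Canny-based ‘outline drawing’ idea can be sketched in a few lines of OpenCV code; the thresholds, blur kernel, white-on-black inversion and filenames below are illustrative choices, not the values used in the installation.

# face_outline.py - minimal sketch of generating an outline drawing of the face
import cv2

def face_outline(src_path, dst_path, low=100, high=200):
    gray = cv2.imread(src_path, cv2.IMREAD_GRAYSCALE)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)   # suppress noise before edge detection
    edges = cv2.Canny(blurred, low, high)         # binary edge map of the face
    outline = cv2.bitwise_not(edges)              # dark lines on a white background
    cv2.imwrite(dst_path, outline)

face_outline("data/user_face.jpg", "data/user_face_outline.png")  # hypothetical filenames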

6 Observations

This short paper aims to introduce the concept of customised cinema and to explain a workable project flow using an average laptop; the observations noted here are so far subjective and anecdotal. Because the installation proved somewhat delicate in terms of hardware and software configurations, it has not yet been possible to send the project to a gallery as ‘plug and play’, meaning that there were no realistic possibilities of exhibiting You•Who? during the travel restrictions of the Coronavirus pandemic.

Observations of You•Who? during the occasions it has been exhibited so far (most notably Cork’s Glucksman Gallery and the Science Gallery Dublin) suggest that the function of the installation was clear to participants and that the goal of entertainment was achieved. Despite the thought-provoking theme of identity theft, it was noticeable that the experience is comedic, causing laughter rather than uneasiness. Enjoyment was shown not only by participants but also by any audience standing behind them and watching the screen, which made the installation popular with families and groups of friends whilst complicating the face detection. Some of those who had seen the ‘trick’ (as watchers or participants) were observed to participate again, but deliberately entering data in a creative way so as to make the film funnier or more surreal.

Once up and running, the technical performance has been reliable, although rendering After Effects sequences whilst simultaneously playing QuickTime movie files demands much from the CPU and the computer becomes very hot. The very latest and fastest hardware would undoubtedly make a positive difference and would allow more ambitious, processor-intensive effects to be rendered in the After Effects projects, resulting in a final film with even greater visual impact.

7 Discussion and Future Directions

The concept could easily be adapted to permit multiple participants to enter their data into the same film if the narrative were constructed so as to make this an engaging experience. Opportunities abound for varying the type of voluntary data captured from a user and, although the current prototype deliberately avoids the necessity of an internet connection, various web services might be used to pull in relevant data, such as the current weather, time and location, to be inserted into the film. AI-based techniques could be extended to further enhance the visual representation of the video imagery. More might also be made of alternative variant segments of certain ready-made video sequences depending on the data obtained, although this goes somewhat contrary to the aim of avoiding a branching-type structure.

Although You•Who? is created to be humorous and entertaining (whilst foregrounding the potential horror of digital identity theft), the overall concept would work equally well for educational or other purposes. The presentation format has some flexibility: when shown as an installation, the setup consists of a table with screen, laptop (concealed), keyboard, external microphone and webcam. At a couple of venues the project was presented as a live event, projected onto a cinema screen with a willing participant entering data ‘live’ whilst the cinema audience watched; in this sense the entire audience watched the performance of the participant, in a format reminiscent of karaoke/movieoke. With a roster of customised films, a full-length live show can be envisaged with different participants for each film.

In terms of hardware and software, the visual creativity of the rendered video could undoubtedly be improved with faster hardware that could render more sophisticated effects in After Effects quickly enough to be ready for presentation. In terms of software, it is clear that in the hands of a skilled programmer the entire project could be realised in a single piece of code. At the time of writing the project is being remade in a more future-proof way, since the use of the now-discontinued Director software is no longer tenable: the latest version will utilise the ‘visual development environment’ of LiveCode [12], which uses a coding language reminiscent of Lingo but has a card- and event-based metaphor rather than a timeline.

In terms of the overall concept and terminology, the film is envisaged as a generic type of experience: a subset of interactive film, a variant of non-linear visual narrative that is not based on the classic model of interaction plus branching. The chosen term ‘customised film’ is somewhat vague and open: ‘contributory film’ is perhaps more accurate but gives the impression of crowd-funding; ‘audience participation film’ was already used to describe Kinoautomat when shown at HemisFair ’68; ‘personalised film’ is another alternative, although it seems to be used by the marketing community in a broader manner. ‘Latent film’ could be suitable in the sense that a film template exists to make unlimited different alternative renderings, although ‘latent video’ and ‘latent cinema’ [13] are already in use to describe image-based GAN interpolation videos. A definitive and more appropriate term remains to be found.