This content was created by Paul Merchant, Jr.
Figure 11. AR street model placed on table during conference.
Silent Films and Augmented Reality
2024-11-14T07:44:08+00:00
by Matthew Lewis
Ohio State University
Abstract
Introduction
>Augmented Reality
>Data and AR
>Locating Data
>Mirrorworlds
Problem/Opportunity Space
>Related work
Approach and Implementation
>Overview
>Fort Lee research
>Technology access
>AR hardware and software choices
>Web-based XR software
>Environment modeling
>Hidden surfaces
>Actors
>Accuracy versus experience
Next Steps
Conclusion
Acknowledgements
Introduction
Emerging technologies are beginning to enable the exploration of the two-dimensional worlds of silent films as three-dimensional (3D) virtual environments. Considerations for this translation from film to spatial data include choices of representation, accessibility, data sources and processes, and the layered contexts of viewer experience. This work is emerging not from a film history perspective but rather at the intersection of computer science (computer graphics), visual design, and data visualization. Specific technologies in these fields are becoming more accessible to interdisciplinary researchers, allowing digital humanities research to move in directions previously not considered.
Figure 1 shows an example of transforming a single silent film image into a collection of 3D objects that can be viewed using augmented reality, or AR. A video clip shows that this film frame of the storefront from The Cord of Life (Griffith, 1909) has been made into a 3D object that can be entered while looking through a smartphone.
Additional silent film frames that show views of a set of buildings and storefronts were collected and assembled to create a larger 3D model that can also be explored using AR. A virtual block such as this can be placed and resized anywhere, appearing like a hologram that can be seen by looking through a smartphone screen, like a window into the past. Using web technology to do this means that by sharing a link, a URL, or a QR code, people with smartphones can experience recreated historical locations such as this one.
Figure 2 shows a view of the virtual block of storefronts assembled from silent movie frames. The buildings existed on the north side of Main Street in Fort Lee, New Jersey, around 1910. Here, the virtual model has been placed in a park in Missoula, Montana. A linked video was recorded from the screen of a smartphone.
The process for someone to view this model in their environment requires just a few simple steps. An internet-connected smartphone or tablet manufactured in the past few years is needed. The user needs to visit a website in their device’s web browser (e.g., by entering a URL like https://street-test.glitch.me for the example in Figure 2). After the model quickly downloads, an on-screen button is pressed to activate the camera. The device’s camera should be aimed at the ground to place the model. The 3D model can then be resized, rotated, and moved to be placed as desired in the physical space.1
The following sections discuss the design choices made, processes used, and challenges and impacts to be considered when creating an experience such as this. First, a few of the primary concepts required to understand terminology and limitations will be introduced.
Augmented Reality
AR is a technology that has emerged from the combination of computer graphics, computer vision, and real-time mobile display hardware. It is frequently portrayed in science fiction films as futuristic holograms: virtual people, places, or interface screens become data, floating in the air everywhere in the world around us.
In recent years, companies such as Apple and Google have been primarily focused on supporting common handheld mobile devices for their AR software. A few companies such as Microsoft and Magic Leap have been at the forefront of wearable AR hardware, creating relatively expensive head-mounted displays (HMDs) such as the HoloLens and the Magic Leap One (Figure 3). While none of the existing AR HMDs have been very financially successful as commercial products, many valuable lessons have been learned by the research community.
The research described in this paper has relied on more commonly available smartphones and tablets for displaying virtual 3D objects located in the physical world instead of hardware that is usually only available in well-funded research labs. The techniques and technologies used, however, will be just as applicable for HMDs when they become more readily available.
There are a few paths to choose from for AR software, with different pros and cons. Many AR projects make use of powerful commercial game-authoring software like Unity to create software that can be distributed through commercial app stores. Other AR researchers use web-based, open-source software frameworks. As with all web-based distribution, this can make new work a bit easier to share with a more diverse group of people. The work described in this paper uses this web-based approach. A third, “walled garden,” approach is to rely on an individual company’s AR product or platform, such as Adobe’s Aero, Snap’s Lens Studio, or Meta’s Spark AR. These alternatives will be discussed in greater detail in later sections.
Regardless of the approach chosen for their modeling and delivery, virtual objects presented and viewed using AR technologies are data that must be created or collected and appropriately formatted to be displayed at interactive speeds. The next sections will talk about data and its location.
Data and AR
We historically think of data as charts, spreadsheets, and even maps. But today data can be just about anything that can be digitized, analyzed, communicated, or displayed. Smart objects sense and communicate with us, advertising companies sift through everything we say and do, and data scientists analyze everything from dancers to dramas. With the explosion of high-speed internet and streaming video, movies of all types have become one of the most analyzed and valuable forms of data. I have been involved with very diverse types of data including the visualization of birds and insects, buildings and vehicles, and plants and performers. All these new sources of data in turn require new forms of data collection, analysis, and visualization.
Data created to be viewed with AR technology is primarily three-dimensional. When 3D objects are created, AR can be used to place and view them in the physical world, as shown in Figure 4. This virtual tree and bar graph in a hallway are examples of recent research projects involving the use of AR to place different types of data in a variety of spatial contexts. In this case, tree-planting possibilities can be considered in disadvantaged communities and humanities data can be located in educational spaces.
Locating data
Placing media data at specific locations in the world has long been one of many challenges for implementing AR. Location-based AR games like Pokémon GO and Minecraft Earth have been leading the way toward interacting with 3D content and experiences at specific outdoor locations around the world.
This field has been referred to in the past as locative media.2 The concept of experiencing different media physically at a given location has been the subject of advertising, a great deal of artwork, and more generally speaking, a great deal of “surveillance capitalism.”3 Pokémon GO is one of the more familiar examples, but even video billboards can be thought of as location-based data displays. Some other examples people might be familiar with include geofencing rented scooters or proximity-based notifications involving Tiles or AirTags attached to devices to prevent loss. Many companies are now promoting software that locates data in space in different, contextually complex ways.
Technologies for determining the location of objects and people mostly fall into a few categories analogous to strategies we might use. Devices may “listen” for wireless signals (e.g., GPS, Bluetooth, ultra-wideband, or Wi-Fi) that allow them to guess where they are when they “hear” the direction and distance of familiar emitters. Alternatively, devices might use cameras to look at an environment and try to recognize known landmarks. Finally, a device might be able to interpret direction and height and how far it’s traveling (e.g., twenty steps north, up one flight of stairs, then fifteen steps east). Many devices use a combination of these strategies to try to keep track of their location in the world.
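The third strategy, keeping a running tally of direction, height, and distance traveled, can be illustrated with a minimal sketch. It is not taken from any particular device or library: the function name, the move format, and the 0.75-meter step length are all assumptions made for illustration, and real devices combine this kind of estimate with the signal-based and camera-based strategies because errors accumulate with every step.

```javascript
// Unit vectors for the four compass headings (x = east, y = north).
const HEADINGS = {
  north: { x: 0, y: 1 },
  east: { x: 1, y: 0 },
  south: { x: 0, y: -1 },
  west: { x: -1, y: 0 },
};

// Assumed average step length in meters (illustrative value only).
const STEP_LENGTH_M = 0.75;

// Hypothetical helper: accumulate a list of moves into an estimated
// position relative to the starting point.
function deadReckon(moves, start = { x: 0, y: 0, floor: 0 }) {
  const position = { ...start };
  for (const move of moves) {
    if (move.floors !== undefined) {
      // A change in height, counted in whole floors.
      position.floor += move.floors;
    } else {
      const dir = HEADINGS[move.heading];
      if (!dir) throw new Error(`Unknown heading: ${move.heading}`);
      position.x += dir.x * move.steps * STEP_LENGTH_M;
      position.y += dir.y * move.steps * STEP_LENGTH_M;
    }
  }
  return position;
}

// The example from the text: twenty steps north, up one flight of stairs,
// then fifteen steps east.
const estimate = deadReckon([
  { heading: "north", steps: 20 },
  { floors: 1 },
  { heading: "east", steps: 15 },
]);
console.log(estimate);
```

With the assumed step length, the estimate places the device 15 meters north and 11.25 meters east of its starting point, one floor up.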
Mirrorworlds
Kevin Kelly has written at length about a Mirrorworld containing additional layers of data overlaid and embedded in our physical environments.4 Locating metadata (i.e., data about data) about places, objects, and people creates what is known as a digital twin of the real world. Such data then allows the physical environment to be searched, even in real time. Googling the world becomes an extremely impactful possibility with significant implications.
Like most things that we create in the world, located data experiences and interactions can be first designed and simulated virtually. Instead of needing to do all AR software and hardware research at an actual physical site to evaluate alternative choices, the locations themselves can be digitized into data and then simulated in virtual reality (VR). Unlike AR, which usually involves placing virtual objects within some real, physical environment, VR uses an HMD that completely blocks out the real world and totally immerses users in a virtual environment. For example, one might walk around within a virtual version of a real-world museum without needing to travel to its location.
When referring to the technologies of VR and AR collectively, the term “XR” will often be used. The X can mean “eXtended,” or it can be thought of as the variable x that can stand for both (V)irtual and (A)ugmented reality. Given this brief introduction to a few concepts around XR and located data, the next section will describe the motivations, trajectory, and intersection with the humanities research topic that resulted in this project.
Problem/Opportunity Space
This section presents the context and goals, both technical and scholarly, that led to the design solutions implemented. The first couple of years of the COVID pandemic led to a shift toward academic conferences being attended entirely online via videoconferencing software. At one such online conference about this subject, I was introduced to film scholars with a strong interest in virtual technologies. This resulted in the collaboration that generated the work described in this paper.
From a technological standpoint, the genesis of this photographic AR work emerged from the new accessibility of 360-degree “spherical” panoramic photography (Figure 5). That this work began with emerging camera technology is notable. While these new photographs allowed the capture and sharing of specific locations from a single viewpoint, it was the potential for using such photographs of spaces to create photoreal 3D environments that was particularly intriguing. After struggling with new beta software being developed for this purpose, a project was initiated to create a new workflow that could use well-established software that was already familiar. The use of existing tools made the process available to students and researchers alike who were interested in XR research.
During the pandemic, research funding made it possible to hire a graduate student to assist with creating virtual spaces from photographs. Modeling a preschool classroom to test the feasibility of prototyping virtual technology systems was the first such funded project (Figure 6). But beyond creating realistic 3D models of spaces, the primary challenge was being able to walk around in them using VR. The eventual goal was to create a prototype and evaluate designs for candidate emerging technology systems within VR without needing to be in the physical spaces (e.g., connected “smart” light and speaker systems for the preschool classroom).
As full-time work from home began, conversations were initiated with other faculty about potential opportunities for doing collaborative data visualization in VR. Figure 7 shows initial work with Mike Rayo from the Industrial Systems Engineering Department at the Ohio State University. Possible 3D graph representations were discussed, and we looked at the process of sharing designs using VR in recreations of familiar spaces that were no longer accessible to us during the pandemic. We also explored methods for placing 3D data models in our homes using AR (e.g., in the backyard).
Importantly, both experiences were shared using web addresses (i.e., URLs) and mobile web browsers in smartphones and stand-alone VR HMDs. It should be emphasized that this process was chosen specifically for the ease of sharing via VR and AR while developing prototypes. From a technology perspective, this demonstrated that resources and processes were available to create photorealistic, web-based XR spaces and objects using processes that foregrounded the ability to collaborate with people from other disciplines.
This XR research work was presented at the Beyond Zoom conferences held online at Dartmouth. This was where I met Mark Williams and John Bell, directors of the Media Ecology Project (MEP), a research initiative funded by a National Endowment for the Humanities Digital Humanities Advancement Grant.5 Discussions of their silent film data analysis work led to this project of applying XR workflows to the problem of immersively visualizing silent movie environments. The initial data in this case was a set of newly restored film frames, discussed later.
Related work
A few examples of other work that uses films as data are briefly surveyed here. While not intended to be a comprehensive review, it should provide the reader with a broader sense of the diverse permutations of source material and presentation approaches for combining physical and virtual aspects of films in different media and contexts.
Jeff Desom created an “optical theatre” called HOLORAMA that “brings several iconic scenes from the history of cinema back to life” by combining virtual actors and physical sets.6 In this case, physical sets from specific movie shots were meticulously modeled at dollhouse scales. The Pepper’s ghost illusion was used to project actors holographically into memorable scenes in the space, reflected by a semitransparent screen.
In an earlier work, Desom edited together a modified twenty-minute version of Hitchcock’s film Rear Window (1954).7 A single accelerated shot of the buildings seen through the titular window was assembled into a large panoramic view. Image- and video-editing software was used to reposition the original shots to the appropriate positions within the panoramic frame.
In the theatrical play Gin & “It,” the characters in a different Hitchcock film, Rope, were projected onto the theater stage while live actors performed the roles of stagehands involved in backstage intrigue during a fictional recreation of the movie’s single-take filming.8 The live actors were carefully choreographed carrying physical screens that “caught” the actors’ projected performances from the film, making it seem as if the film’s actors were walking through their cinematic roles on the physical stage.
Similarly, amusement park rides sometimes project video of characters and events from films into mixed physical-virtual rides. One example is the Ratatouille ride at Disney Studios Paris, in which trackless cars drive guests through a series of very large rooms with enormous physical set designs inspired by the Pixar film. Scenes and characters are projected so that the riders feel like they are immersed in the action of the movie at the scale of mice.9
For the Shining360 project, Claire Hentschker used Kubrick’s film The Shining (1980) as source material.10 In particular, a number of shots that feature a smoothly moving camera were selected. Such footage, with its significant overlap between frames and gradual changes in perspective, is ideal for photogrammetry software, which generates 3D environments and objects when provided with a large set of sequential photographs. The 3D environments thus created were rendered into a 360-degree video that can be viewed in VR using a 360-video smartphone app and an inexpensive cardboard VR HMD.
The set from a famous scene in the film Citizen Kane (Welles, 1941) was reconstructed by the Emerging Technologies Group at the University of Michigan to provide film students with an opportunity to gain experience making cinematography decisions. A professional actor was motion-captured for this Citizen Kane VR experience. Students make choices about how to reshoot the virtual recorded actor using a simulated film camera.11
In the previous example, motion capture technology was used to apply an actor’s movement to a modeled 3D computer graphics character. Alternatively, live actors can be filmed from many angles to generate a volumetric, photoreal 3D model. These virtual actors can then be positioned inside 3D-modeled VR environments. This actor-centric, volumetric approach was used in a VR experience based on The Cabinet of Dr. Caligari (Wiene, 1920) called Cesare’s Dream.12
A middle ground between the motion capture and volumetric approaches is to use a single camera shot of actors in a film and then add depth information generated via machine learning to extract them from the frame. This allows the images to be used with AR techniques specifically. This technique was demonstrated by Fleisher and Anlen to place dancing Pulp Fiction (Tarantino, 1994) actors in the room with the viewer.13
These examples provide a few alternatives for representing film sets, shots, and actors, as well as presenting them in different physical and virtual combinations. The next section explains the sources of film data used for the silent film AR project, then discusses the processes used to create the 3D representation necessary for AR, and finishes with discussion of the interaction experience.
Approach and Implementation
Overview
This work started with a single image from the Library of Congress Paper Print collection—specifically, the frame of a storefront from The Cord of Life, which had been restored through the work of the Film Preservation Society and shared with Dartmouth’s MEP.14 The story behind the location at which this and many other films were shot was extremely interesting and led to the idea of collecting and assembling multiple shots and locations. Once assembled, the resulting environment could then be explored using increasingly accessible XR technologies.
Fort Lee research
Early in the collaborative discussions between computer graphics technologists and film historians, it was mentioned that many silent films were shot in the town of Fort Lee, New Jersey. The film we started with, The Cord of Life, was located on a specific block using a storefront on Main Street that appeared in many films. Other frequently used locations are beginning to receive recognition, such as an alley in Hollywood that was the site of many silent films around 1920.15 But in the earliest days of silent film production in the US, Fort Lee was “the centre of the cinematic universe.”16
Figure 8 shows the single restored film frame used to create the 3D AR example shown in Figure 1. In order to expand on that work, views of this and neighboring buildings shot from additional angles and showing adjoining wall surfaces were needed. Film historian Chris Milewski created a web page that identified silent films shot throughout different neighborhoods in Fort Lee.17 Shot locations were estimated on maps of the time period that show the outlines of streets and buildings. This proved to be an extremely helpful resource.
Using these lists of films and locations, Dartmouth student Lauren Spencer reviewed the named films, seeking those specifically shot around the storefront, and recorded time codes and descriptions of candidate moments that show building architecture while minimizing the number of people in the shots. After reviewing all the films and findings identified by Spencer, I selected frames from four films showing parts of this adjoining set of Main Street buildings. Additional films and angles that either replicated the same viewing angles or were filled with many people, obscuring the building surfaces, were not used.
Two frames were used from The Cord of Life, the storefront image and a second frame showing one and a half store entrances on the far left side of the block. The shot is taken from across the street, filling only the top portion of the frame, and is partially obstructed by a pole in the foreground (Figure 9).
Two frames were also used from the film The Curtain Pole (Griffith, 1909). The first is one of the few shots showing the upper stories of the building above the primary storefront. The top of the building is cropped in the middle of the third story. It is shot from a significant distance away, barely filling the middle third of the frame. This building is also partially obstructed by poles in the foreground. The second frame used from The Curtain Pole shows the tall advertisement signs that stand to the right of the primary storefront. As the road curves, they are not visible from the initial storefront shot, but they can be seen in the previous Curtain Pole shot behind the horse and buggy in the road. It is notable that the signs are shot from a sharp angle with strong perspective foreshortening (Figure 9).
The two remaining frames in Figure 9 are from the films Billy’s Séance (1911) and The Salvation Army Lass (Griffith, 1909). The first shows a clear shot of the entrance doors to the left of the primary storefront. These are rarely shown unobstructed in other films. The second frame shows the front of the building to the right of the advertisement wall, again from a very challenging oblique angle. The building front is only partly visible. These last two frames were not yet available as high-quality restored images, and thus screen grabs from low-quality online copies of the film were used. Very high-quality frames from The Cord of Life and The Curtain Pole were provided by Tracey Goessel (via Dennis Doros, Mark Williams, and John Bell), restored from Library of Congress paper prints by the Film Preservation Society’s Biograph Project.
A film could not be found that showed the full front of the auditorium to the left of these buildings, but a great historic photo was provided by Richard Koszarski, who has written quite a lot about the film history of Fort Lee.18 This photo was also added to the 3D model. The presence and state of the auditorium on the far left end and the different storefronts varied over the years that these films were shot (Figure 10). The accuracy and “truth” of this portrayal of this block of Main Street will be discussed in a later section. The next section describes the technology workflow design.
Technology access
There is a wide range of accessibility in the hardware and software that could possibly be used by researchers to accomplish the goals discussed so far. Most of these processes could be conducted using expensive technology from research labs that are only available to relatively few people. The term “accessibility” can be used in several ways. It is commonly used when discussing differing sensory and motor abilities. To read more about current efforts toward making XR technologies accessible to people with disabilities, investigate the XR Access Initiative Research Consortium.19 The term can also refer to the availability, affordability, or approachability of different technologies. This project has focused on the use of hardware and software that are more available to academic researchers and their students rather than requiring teams of elite programmers or expensive technology. Specifically, this work relies primarily on free, web-based libraries and smartphones or tablets.
The interactive software used when creating 3D models from these film frames is not the most commonly used for computer graphics. The primary use of SideFX’s Houdini tends to be for creating advanced visual effects for the commercial film and gaming industries. The basic capabilities required for 3D modeling and texturing are generally available in nearly all the other commonly used 3D animation packages, such as Blender, C4D, and Maya. There are also several alternatives available for sharing and viewing these models using AR. Several of the pros and cons of these will be discussed. The approachability of technologies for AR has been a significant historical barrier to access. In just the past couple of years, AR has finally begun to shift from something only programmers and 3D computer graphics experts can create to something a student can learn in a few hours using free software.
AR hardware and software choices
From a hardware perspective, AR technology has been sharply divided into two categories: handheld AR using tablets and smartphones or wearable AR using HMD “glasses,” as shown in Figure 3. While AR HMDs mostly cost a few thousand dollars, handheld AR is now supported on most recent, midrange smartphones.
From a software perspective, there has been a similar schism between two primary authoring paths. Most developers rely on commercial game development software such as Unity, then export a mobile app. They then must either install the app on a physical device in their possession or distribute it through a commercial app store such as Apple’s or Google’s. An alternative route has been to use the web, frequently referred to as WebXR: uploading AR experiences to a web server, then sharing URLs via links or QR codes like any other website.
In recent years, an additional solution between these two extremes has emerged. A few companies have released AR platform apps to encourage AR sharing. Meta, Snap, and Adobe have each created AR-authoring software that allows creators to design AR experiences that can be shared with anyone who has downloaded and installed the company’s app. Experiences can then be distributed via a link, as with the web-based approach, but an app store download is also required. However, this app only needs to be installed once to use that platform’s experiences, rather than a new app for each AR experience, as is common with the game development process.
Using a social media company’s authoring tools also requires both creators and viewers to create social media accounts, install the company’s apps, and participate in the respective tracking ecosystems. The use of open-source frameworks can make work a bit easier to share with a more diverse group of people. Most young people are not using Facebook, and few older people use Snapchat. While their AR-authoring tools have easier-to-use interfaces, researchers may be hesitant to require viewers to install free software from advertising companies. Individual companies also have a history of stopping support for emerging technologies that researchers might be relying on (e.g., Google’s Eddystone support in Chrome) or purchasing smaller companies’ products and then discontinuing their AR platform (e.g., Apple and Metaio’s Junaio AR browser). While open-source solutions are usually less refined, they have a much stronger record for availability.
Using a web-based approach meant that during a remote presentation of this work at a silent film conference, an attendee was able to point their phone at an on-screen QR code on a presentation slide, touch the web link that appeared on their phone screen to open the 3D model in their phone’s web browser, then use drag and pinch gestures on their touch screen to place and resize the virtual street model on a table in front of them in the auditorium (Figure 11). If an app-based approach had been used, the attendee might have needed to visit their phone’s app store, remember and enter their store username and password, download the new app, locate the downloaded app icon, navigate the new app’s required startup screens, and so on.
Web-based XR software
The New York Times Research and Development Department has written about their efforts to explore what they refer to as spatial journalism.20 They have been exploring how different emerging technologies can be used to share 3D content, including places, people, and objects, with their readers. Several of their recent articles have been about their use of web-based, open-source XR software. They recently shared a new software library they’ve developed to assist creators with this process for complex scenes.21
But at the heart of open-source, web-based distribution are HTML and JavaScript. For nearly three decades, it has been possible to open a text editor, type a few lines of text, then save it to a networked web server. This enables anyone on the internet to view this file in a web browser by clicking a link or typing a URL. While formatted text, images, and links can be created and shared just using HTML, adding capabilities is usually accomplished with some code written in the JavaScript programming language. This does not mean the creator must become a programmer. Other people’s “libraries” of code can be added to a page to enable new features, like AR.
There are several options for using libraries to add AR capabilities to web pages. For this silent movie AR project, a free JavaScript library from Google called model-viewer was used to handle the AR portion.22 A free website called Glitch that enables anyone to share websites was also used.
The two dozen lines of HTML shown in Figure 12 can simply be copied and pasted to instruct the app to “show this uploaded 3D model using AR.” Students have been able to learn this very quickly. Learning to work with 3D modeling and texturing software to create the 3D model takes quite a bit longer, however. Making changes to a website such as this only requires editing the text in any web browser. Compare this workflow to a typical app-based distribution process. A developer who makes a change must submit it to the app store for approval, then wait for evaluation and approval, possibly multiple times if any issues are found by the evaluators.
There are a few alternatives to the model-viewer JavaScript library. It was chosen because of its compatibility with both Android and Apple devices. It also provides a very straightforward interface for allowing the viewer to position, orient, and resize the virtual object using AR. While it’s great for displaying a single, minimally animated virtual object, it doesn’t allow for much in the way of customizing experiences, interactions, or virtual scenes.
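To give a sense of how little markup a model-viewer page needs, here is a minimal sketch. It is not the project's actual file from Figure 12. The markup is wrapped in a small, hypothetical JavaScript helper (buildArPage) so that the model filename and title can be swapped in; the text between the backticks is what would be saved as an HTML file. The filename street.glb, the button label, and the pinned library version in the script URL are assumptions for illustration, while the attributes src, alt, ar, ar-modes, ar-scale, and camera-controls and the ar-button slot are part of model-viewer itself.

```javascript
// Escape characters that have special meaning in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: returns a complete HTML page that displays one 3D
// model and offers an AR button on devices that support it.
function buildArPage(modelUrl, title) {
  const safeUrl = escapeHtml(modelUrl);
  const safeTitle = escapeHtml(title);
  return `<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>${safeTitle}</title>
    <!-- Library version is an assumption; use whichever release is current. -->
    <script type="module"
      src="https://ajax.googleapis.com/ajax/libs/model-viewer/3.4.0/model-viewer.min.js"></script>
    <style>
      model-viewer { width: 100vw; height: 100vh; }
    </style>
  </head>
  <body>
    <!-- "ar" enables the AR button; "ar-scale" of "auto" lets viewers resize
         the model; "camera-controls" allows drag-to-rotate before entering AR. -->
    <model-viewer src="${safeUrl}" alt="${safeTitle}"
      ar ar-modes="webxr scene-viewer quick-look" ar-scale="auto"
      camera-controls>
      <button slot="ar-button">View in your space</button>
    </model-viewer>
  </body>
</html>`;
}

// Example: a page for a hypothetical model file uploaded next to the page.
const page = buildArPage("street.glb", "Main Street, Fort Lee, around 1910");
console.log(page);
```

The three values in ar-modes ask the library to try WebXR first, then Android's Scene Viewer, then Apple's Quick Look, which is how a single page can serve both Android and Apple devices.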
Another relatively easy-to-use JavaScript library called A-Frame allows much greater customization of complex scenes and interactions, but it’s not as well supported for AR on Apple devices in the standard mobile web browsers like Safari or Chrome. An alternative browser from Mozilla Labs called WebXR Viewer offers AR support on iOS, but it requires downloading an app, and it’s no longer being developed or supported. There have been recent signs that Apple may begin formally supporting WebXR in its browser in the coming year.23 Further options exist, such as Three.js and the WebXR Device API, but they require programming expertise and much more effort to be implemented from scratch.
Environment modeling
The construction process used for this historic virtual block of Main Street relies on 3D modeling knowledge that is frequently learned by students studying video game design or 3D animation. The specific approach can be described as virtual projection mapping, analogous to the real-world use of light to project film onto surfaces. Traditional projection mapping is a technique in which an image is projected into the real world through the lens of a video projector pointed at buildings, objects, or surfaces in a physical environment. The image is then adjusted—“mapped”—to match the shapes of some physical surfaces. The adjustment process usually involves distorting the image by squashing and stretching its edges (e.g., “corner pinning”) so that the virtual image aligns with the edges of the physical structures.
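The corner pinning adjustment mentioned above can be sketched as a small calculation. The function below is a generic textbook construction (a projective mapping from a unit square onto four corner points), not the method of any particular projection mapping product; the function name and the example coordinates are assumptions for illustration.

```javascript
// Hypothetical helper: build a function that maps unit-square coordinates
// (u, v) onto a quadrilateral. Corners are given in the order that matches
// (0,0), (1,0), (1,1), (0,1) of the source image.
function cornerPin(corners) {
  const [[x0, y0], [x1, y1], [x2, y2], [x3, y3]] = corners;
  const dx1 = x1 - x2, dy1 = y1 - y2;
  const dx2 = x3 - x2, dy2 = y3 - y2;
  const sx = x0 - x1 + x2 - x3, sy = y0 - y1 + y2 - y3;
  const det = dx1 * dy2 - dy1 * dx2;
  if (det === 0) throw new Error("Degenerate quadrilateral");

  // g and h are the perspective terms; both are zero for a parallelogram.
  const g = (sx * dy2 - sy * dx2) / det;
  const h = (dx1 * sy - dy1 * sx) / det;
  const a = x1 - x0 + g * x1, b = x3 - x0 + h * x3, c = x0;
  const d = y1 - y0 + g * y1, e = y3 - y0 + h * y3, f = y0;

  return (u, v) => {
    const w = g * u + h * v + 1;
    return [(a * u + b * v + c) / w, (d * u + e * v + f) / w];
  };
}

// Example: pin an image onto a trapezoid that is narrower along one edge,
// as when a flat surface recedes from the projector.
const pin = cornerPin([[0, 0], [4, 0], [3, 2], [1, 2]]);
console.log(pin(1, 1));     // the (1,1) image corner lands on the third pinned corner
console.log(pin(0.5, 0.5)); // the image center lands where the diagonals cross
```

Because the mapping is projective rather than a simple stretch, the center of the image follows the crossing point of the quadrilateral's diagonals, which is what makes the pinned image look correctly foreshortened.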
The procedure used for this project is somewhat the opposite: the movie frame showing a physical environment is “projected” into a virtual environment using a simulated lens on a virtual slide projector placed in space at a location that roughly corresponds with where the photo was taken in the real world. Virtual surfaces and geometric objects (e.g., boxes) are then manually repositioned, rotated, stretched, and resized until they align with the surfaces shown in the projected image. For example, a virtual wall would be created, placed, resized, and aligned to “catch” a section of brick wall visible in a projected photograph of a building. For the primary storefront mapping, full 3D objects were arranged in space to align with the surfaces in the film frame. Some examples include the doorway frames and opening, the protruding window box, furniture inside the room, the stairs, and the structures under the window (Figure 13).
This full 3D mapping allows the user to enter the virtual room and view it from different perspectives using XR. The remaining and adjoining film frames were arranged and overlapped in space on simpler, flat surfaces. This “backdrop” approach produces a sense of scale and presence and looks acceptable from a distance and in the periphery, but it is significantly less immersive when approached up close, much like physical stage backdrops.
Nearly all 3D animation software has the capabilities required to project images onto virtual 3D surfaces through a process called texture mapping. The technique is commonly used to wrap virtual objects with complicated surface materials, like bark on a tree or the cover on a book. Using photographs that show entire environments as textures is much less common, and it’s particularly tricky when not much is known about the camera. In this work many assumptions and estimations needed to be made about the distance, height, and lens properties for the seven camera shots used.
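To illustrate why the camera estimates matter, the virtual projector can be sketched as a simple pinhole model that assigns each 3D point a texture coordinate in the film frame. This is a minimal Python sketch assuming NumPy. The function names are hypothetical, and the projector position, field of view, and aspect ratio are placeholder guesses, not the values estimated for the seven shots in this project.

```python
import numpy as np


def look_at_basis(position, target, up=(0.0, 1.0, 0.0)):
    """Return right, up, and forward unit vectors for a projector aimed at target."""
    forward = target - position
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=float))
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    return right, true_up, forward


def project_to_uv(vertex, position, target, vertical_fov_deg, aspect):
    """Map a 3D point to (u, v) in the projected frame, or None if it is behind the projector."""
    vertex = np.asarray(vertex, dtype=float)
    position = np.asarray(position, dtype=float)
    target = np.asarray(target, dtype=float)
    right, true_up, forward = look_at_basis(position, target)
    offset = vertex - position
    depth = offset @ forward
    if depth <= 0:
        return None
    half_height = np.tan(np.radians(vertical_fov_deg) / 2.0)
    half_width = half_height * aspect
    # Perspective divide, then remap from [-1, 1] to [0, 1] texture space.
    u = 0.5 + 0.5 * (offset @ right) / (depth * half_width)
    v = 0.5 + 0.5 * (offset @ true_up) / (depth * half_height)
    return (u, v)


# Assumed, illustrative setup: projector at eye height, 5 units from a wall.
PROJECTOR = (0.0, 1.5, 5.0)
AIM_POINT = (0.0, 1.5, 0.0)
uv_center = project_to_uv(AIM_POINT, PROJECTOR, AIM_POINT, 40.0, 4.0 / 3.0)
```

Coordinates outside the range 0 to 1 fall outside the frame. Changing the assumed distance, height, or field of view shifts every coordinate, which is why a poor estimate makes the projected image slide off the surfaces meant to catch it.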
Hidden surfaces
When a given photo has elements in the foreground that obstruct the view from the camera, this creates hidden surfaces for which there is likely no photographic source. These hidden regions will be immediately visible when an XR user begins to walk around. Either the viewer will simply see a gap (a hole in virtual surfaces), or they can be filled in by an artist, much as damage might be repaired when doing art restoration. A simple example is the areas of the ground or building hidden behind the poles in Figure 9. Additionally, for a single frame, there is no way to know what’s around the corner or just out of frame. Simply taking one step to either side in XR can reveal spaces around corners or edges where surfaces in the world must be invented if not available in other images.
This problem can be somewhat addressed by creating layered versions of the images used for the texture-mapping process. While the original photo can be used to texture all foreground surfaces (closest to the camera), a copy of the image can be made and then modified for use on the background elements. The background image copy will be altered to remove the obstructing foreground objects. The empty spaces behind them can then be filled in with something plausible, based on the surrounding image.
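The idea of a background layer with the foreground removed can be sketched in a few lines. The following Python example, assuming NumPy and a grayscale frame stored as a 2D array, fills a masked region by repeatedly averaging the known neighboring pixels, working inward from the edge of the hole. This is a deliberately naive stand-in for manual painting or more capable fill tools. The function name, the tiny test frame, and the mask are all hypothetical.

```python
import numpy as np


def fill_hidden_region(image, hidden, max_passes=500):
    """Return a copy of a 2D image with masked pixels filled from their known neighbors."""
    filled = np.asarray(image, dtype=float).copy()
    known = ~np.asarray(hidden, dtype=bool)
    for _ in range(max_passes):
        if known.all():
            break
        # Pad by one pixel so border pixels have neighbors; padding counts as unknown.
        values = np.pad(np.where(known, filled, 0.0), 1)
        counts = np.pad(known.astype(float), 1)
        neighbor_sum = (values[:-2, 1:-1] + values[2:, 1:-1]
                        + values[1:-1, :-2] + values[1:-1, 2:])
        neighbor_count = (counts[:-2, 1:-1] + counts[2:, 1:-1]
                          + counts[1:-1, :-2] + counts[1:-1, 2:])
        fillable = ~known & (neighbor_count > 0)
        if not fillable.any():
            break  # nothing known to grow from
        filled[fillable] = neighbor_sum[fillable] / neighbor_count[fillable]
        known |= fillable
    return filled


# Hypothetical frame: a uniform gray wall with a dark vertical pole in front of it.
frame = np.full((5, 5), 0.5)
frame[1:4, 2] = 0.0
pole_mask = np.zeros((5, 5), dtype=bool)
pole_mask[1:4, 2] = True

background_layer = fill_hidden_region(frame, pole_mask)
```

The original `frame` is left untouched for texturing the foreground pole, while `background_layer` is the altered copy used on the wall behind it.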
For the initial storefront, Dartmouth student Noah Hensley assisted with image layer painting. In Figure 14, the display window corners and glass reflections visible in the original film frame (Figure 8) obscure the view of the interior walls. Small sections of the floor, interior walls, spaces around corners, and the ground behind foreground objects on the right all need to be filled in with either basic color or details to enable changing the viewpoint in XR. These changes can be painted in Photoshop manually, or “cloning” can be used to copy similar neighboring regions and fill in the empty spaces. Alternatively, newer machine learning–based in-painting algorithms look at the surrounding regions and then try to fill in the selected area with a pattern that looks reasonable. This new, “invented” visual information will not be historically accurate but is experientially important to a viewer. The question of accuracy versus experience is discussed later.
Several related challenges occur when working with film frames. When creating virtual objects, an accurate size is unknown. Context clues are frequently useful, like the expected height of door handles or stairs. While there are no colors to be adjusted or matched, the lighting and film exposure are usually very different between shots and films. It’s also quite challenging to align overlapping regions that were shot from completely different perspectives, particularly from opposite directions. Buildings change over time; two overlapping photos may have been shot years apart. Finally, shots of buildings frequently contain people, obstructing the view of the surfaces.
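The use of context clues for scale amounts to simple proportional arithmetic. The sketch below, in plain Python, assumes the reference object and the object being measured sit in roughly the same plane facing the camera, so that perspective foreshortening can be ignored. The function names, the pixel measurements, and the assumed door handle height of 0.95 m are illustrative guesses only, not measurements from the films.

```python
import math


def scale_from_reference(reference_pixels, assumed_real_height_m):
    """Metres per pixel implied by a reference object of assumed real-world height."""
    if reference_pixels <= 0:
        raise ValueError("reference_pixels must be positive")
    return assumed_real_height_m / reference_pixels


def estimate_size_m(measured_pixels, metres_per_pixel):
    """Approximate real-world size of something measured in the same image plane."""
    return measured_pixels * metres_per_pixel


# Hypothetical measurements: a door handle sits 190 px above the threshold,
# and the doorway in the same wall is 400 px tall.
ASSUMED_HANDLE_HEIGHT_M = 0.95
metres_per_pixel = scale_from_reference(190, ASSUMED_HANDLE_HEIGHT_M)
doorway_height_m = estimate_size_m(400, metres_per_pixel)
```

Because the result depends entirely on the assumed reference height, comparing estimates from two or more clues (a handle and a stair riser, for example) is a reasonable sanity check.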
Actors
A remaining interesting challenge is bringing the actors into these XR spaces. A simple initial approach involves cutting them out of frames and placing them in space like a cardboard standup photo, as shown in Figure 2. One primary benefit is the ability to evaluate their depth in space and their scale in the XR environment.
The available approaches for removing actors from the background frame for inclusion in 3D are rapidly evolving, mostly from advances in machine learning. What was recently a time-consuming manual process of tracing a figure’s edges is shifting from “computer assisted” to “computer automated.” It’s now possible, using recent versions of Adobe software, to extract people from video sequences by roughly suggesting where in the frame the person is (e.g., just circling them), then allowing the software to do its best to track their outlines through a shot. If it gets confused and makes an error, quickly pointing out the mistaken region is enough to get the software back on track. Hensley assisted with extracting an actor from a shot from The Cord of Life (Figure 15).
Accuracy versus experience
An AR Mirrorworld brings several concerns. Aside from the possible onslaught of advertising and surveillance commonly envisioned in popular fiction, there are much more mundane questions of truth that seem particularly relevant now. If seeing is believing, then the ability to create and control what people see should be quite concerning. As with social media news feeds, there is likely to be a combination of corporate AI recommendation algorithms and self-selected curation that determines what one encounters. What challenges will emerge regarding accuracy of information once what people see when they look around can literally be adjusted in both big and small ways? We are rightly concerned now about groups seeing completely different news and media. AR enables one to see—or not see—different versions of the world.
But even sticking to historic recreations, when we assemble a 3D reconstruction and consider working with AI-generated (or AI-assisted) imagery, we find ourselves adjacent to conservators and archivists, needing to be concerned about restoration versus invention. While virtual restoration carries no risk of damaging physical artifacts, photorealistic spatial presentations can be very misleading. Everyone who works with digital images learns very early that one cannot zoom in to low-resolution images to see greater detail. One only finds blurry pixels. In the past couple of years, this has changed. New AI algorithms that have been trained on millions of images can now improve resolution, sharpen edges, and even add new details.
This upscaling software (e.g., Gigapixel AI from Topaz Labs) is great for improving XR immersion and viewer experience but can cause issues for historical image research. One might reasonably expect that an algorithm that could zoom in to a blurry image to create a higher-resolution picture with sharper details of brick walls and rocky cliffs might be very useful. But specific patterns of individual bricks and precise rock features are examples of methods researchers have relied on to identify locations of film shots.24 Creating details that were not present could cause significant problems for historians.
Next Steps
We are just beginning to consider what emerging technologies such as AR might mean for studying the past. It will certainly enable us to investigate, as Williams once suggested, new research questions we could not have asked before. In the 3D environment, a significant addition would be to add the dirt, sidewalks, and roads in front of the buildings to better ground them in the real world. There is much more 3D modeling that can be done to the rest of the buildings on the block, such as their entrances, awnings, and sides. Building interiors could also be constructed. The imagined interior layouts vary based on the films—the same room might be a bakery, hat shop, or saloon. Being able to enter a building provides an opportunity to look out the front window to see the street and beyond. While good images of the road itself are not readily available, there are some shots in films that show views of buildings on the other side of the road.
Several questions regarding early film technology could be investigated more experientially. The planes of action, distances, and framing that early directors relied on and started to deviate from could be examined “on set.” Much as when early painters began exploring perspective, so too could early experimentation with camera placement be visualized in film shot setups. When presenting early film scenes using AR, integrated into our real physical environment, the designer needs to decide how all aspects should look. For example, should the lighting in the virtual scene be adjusted to better match the lighting in the physical environment? This is commonly done with AR to increase the illusion that the virtual objects are actually in the physical space by adjusting shadows and reflections and by matching light direction and color. Or should the virtual silent movie elements maintain our common expectations for what early film should look like: high contrast, jumpy with scratches, a bit too fast? When anything can be changed, a significant production design decision becomes which, if any, of the elements should be “improved.”
Conclusion
Exploring new research questions in this space will be driven neither by humanities scholarship nor new technological capabilities alone but will emerge from a web of collaborations and knowledge. As with the very earliest films, this project began with technologists exploring the possibilities enabled by camera innovations. The rapidly changing, unpredictable nature of emerging technologies for collecting, analyzing, and visualizing data requires a “systems thinking mindset to generate collaborations in new spaces.”25
In their book Data Feminism, D’Ignazio and Klein write, “The most complete knowledge comes from synthesizing multiple perspectives, with priority given to local, Indigenous, and experiential ways of knowing.”26 While combining film frames that show multiple perspectives was definitely not what the authors were referring to, working collaboratively to construct new knowledge and ways of experiencing it will enable us to consider what emerging technologies such as AR may mean for studying the past in the future.
Acknowledgments
This research has been conducted as part of the Media Ecology Project, a digital humanities research initiative directed by Mark Williams and John Bell at Dartmouth College, specifically in relation to a National Endowment for the Humanities Digital Humanities Advancement Grant (Level III) titled “Understanding Visual Culture through Silent Film Collections.” For this grant, Williams has brought together for collaborative research purposes a wide spectrum of early cinema collections, publications, and historical metadata for scholars, archivists, technicians, librarians, and artists to create a virtuous cycle of new interdisciplinary scholarship about archival media that adds value back to participating archives.
The virtual AR tree research in Figure 4 and the VR preschool in Figure 6 were created with funding from Battelle Engineering, Technology, and Human Affairs Endowment grants. The data visualization research in Figure 4 received funding support from the Translational Data Analytics Institute (TDAI), the Global Arts and Humanities Discovery Theme (GAHDT), and the Advanced Computing Center for the Arts and Design (ACCAD) at the Ohio State University.
A list of external links featured in this essay can be found here.27
Matthew Lewis is an Assistant Professor in the Department of Design at The Ohio State University. He holds a joint appointment with the Advanced Computing Center for the Arts and Design (ACCAD) in OSU’s College of Arts and Sciences. He is additionally one of the core faculty at the Translational Data Analytics Institute (TDAI). Dr. Lewis works at the intersection of emerging technologies, computer graphics, and generative design, creating visualizations, art works, and digital tools. He has taught creative coding, interactive performance and installation technologies, virtual environments, 3D animation, digital lighting, and procedural animation. His recent research makes use of virtual and augmented reality in the digital humanities.
Title Image provided by Matthew Lewis
Silent Films and Augmented Reality © 2024 by Matthew Lewis is licensed under CC BY 4.0.
The Journal of e-Media Studies is published by the Dartmouth Library.