All posts by Ian Nacke

Photogrammetry II: Experimenting with Full Body

Photographing and Masking

For our second attempt at photogrammetry, we decided to expand the scale of the model and improve the image quality. To this end, we started with a DSLR camera, setting the ISO to a fixed value and leaving the other settings on auto. We discovered that zooming and mixing photo orientations are sub-optimal for photogrammetry, likely because zooming changes the camera’s field of view and distorts the apparent distances between pixels. Incorrectly oriented photos could be rotated after the fact, however, and FOV did not seem to hugely impact the outcome.

This shot gives detail to the face
This shot prevents a hole forming at the top of the head
This shot helps resolve the shoes

Moving from a sitting position to a standing, full-body pose required much more time and many more photographs. A seated person can be captured in full with a single shot, but a full body needs to be divided into three or four groups. In addition to full-body shots, we also did focused shooting around the face (making sure to get photographs from above the head to fill in the hole in the hair) and the shoes. Shoes were somewhat problematic: sneakers tend to cast shadows and curve inwards at the bottom, confusing not just the photogrammetry algorithm but also any object selection tools in Photoshop, leading to poor results. In the future, we would like to try a separate scan of the sneakers on their own and connect the bottoms of the shoes to the final model, creating a cleaner base. Because of the greater workload involved in touching up and producing a full-body mesh, we did not have time for this step.

As a result of the high quantity of images, masking took a long time. I developed a Photoshop workflow for masking that involves creating a custom action to select the subject, expand the selection by a few pixels, and write the mask to the alpha channel. Actions can be run on a batch of images, automatically generating masks for all photos. These masks can then be touched up in Metashape in the following step…
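If you don’t have Photoshop, the batching part of this idea can be scripted. Below is a minimal, hypothetical Python sketch using Pillow that assumes rough black-and-white subject masks already exist (from any automatic selection tool): it grows each mask by a few pixels, like the expanded selection in the action, and writes it into the PNG’s alpha channel. The folder names are placeholders, not part of our actual workflow.

```python
# Hypothetical batch-masking sketch: expand an existing subject mask by a few
# pixels and store it as the alpha channel of a PNG. Folder names are placeholders.
from pathlib import Path
from PIL import Image, ImageFilter

SRC = Path("photos")        # original JPEGs
MASKS = Path("masks")       # rough black/white subject masks
OUT = Path("masked_pngs")   # transparent PNGs for Metashape
OUT.mkdir(exist_ok=True)

GROW_PX = 4  # expand the selection by a few pixels, like the Photoshop action

for photo_path in sorted(SRC.glob("*.jpg")):
    photo = Image.open(photo_path).convert("RGB")
    mask = Image.open(MASKS / (photo_path.stem + ".png")).convert("L")

    # MaxFilter dilates the white (subject) region; the kernel size must be odd.
    grown = mask.filter(ImageFilter.MaxFilter(2 * GROW_PX + 1))

    rgba = photo.copy()
    rgba.putalpha(grown)  # the grown mask becomes the alpha channel
    rgba.save(OUT / (photo_path.stem + ".png"))
```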

Photoshop can output the results of a batch to a folder

Exporting PNGs from Photoshop at largest file size leaves large blocks of buffer room

The outline represents the precise border of the mask. The gray represents parts of the image data discarded by Photoshop.

Photoshop presumably does this to save processing power when saving transparency. We use it to our advantage: Metashape displays these overflow chunks, and we can add to or subtract from the mask in Metashape’s masking tool to fix inconsistencies.

Extremely detailed masking is absolutely essential. Because Photoshop’s object selection tends to err on the conservative side, these touch-ups dramatically affect the outcome. In my final result, a noticeable chunk of the lower left forearm is missing. This is likely because that section of the input images was not touched up properly, and masking subtracted too much from the subject. It could also be the result of mistakes made while cleaning the dense point cloud. Because I generated at ultra quality, the number of points was almost unmanageable. My partner and I concluded that it is better to generate at medium to high quality and then refine the edges to perfection.

Metashape Workflow

There is not much to say about aligning photos. We made sure to identify bad alignments and reset/realign them. Aside from this, the process is very straightforward.
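For reference, this is roughly what that step looks like scripted through Metashape Pro’s Python API. Treat it as a sketch only: parameter names shift between versions, and cameras that aligned in the wrong place still need to be spotted by eye and reset in the GUI.

```python
# Sketch of the photo-alignment step using Metashape Pro's Python scripting
# interface (Professional edition only; parameter names vary by version).
import Metashape

doc = Metashape.app.document
chunk = doc.chunk

# First pass: detect features and align every camera.
chunk.matchPhotos(downscale=1)   # downscale=1 roughly corresponds to "High" accuracy
chunk.alignCameras()

# Cameras that failed to align come back with no transform; try just those again.
failed = [cam for cam in chunk.cameras if cam.transform is None]
if failed:
    chunk.alignCameras(cameras=failed, reset_alignment=True)
```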

A close up of the dense point cloud reveals a bumpy surface

For dense point cloud generation, my model came out with very bumpy surfaces. This could be because the large number of photos contributed too many points, leading to greater inaccuracy, or because the ultra-high quality setting produced more points than the image data could support. My partner was able to get a comparable, arguably higher-quality result with about two thirds the images, so the time spent waiting for ultra quality to render is better spent touching up masks and cleaning point clouds.

Our overall takeaway is that too much cleaning at this stage will actually mutilate the mesh, while not enough will leave bumpy surfaces on your model.

Moving areas obviously pose the greatest challenge. This is why the chest, which expands and contracts with breathing, and the arms, which move slightly, rendered much worse than the pants, which are loose and unaffected by minor adjustments of the legs. A look at the point data gathered from images including the arms shows that the smooth, uniform coloring of the arms also posed a challenge.

Stomach and arms both noticeably lack tie points.

My partner experienced issues with my arms moving too much. In the end we tried masking just the arms and rendering them as a separate chunk before joining the pieces together. We discovered that because the arms came from the same photo set, letting Metashape do automatic alignment worked better than marking common points. In the end, we still chose to retake the photos.

Arms are generated from a subset of the original images with least motion.

Holes are generally better filled in Metashape than in Meshmixer: Meshmixer is unable to fill large holes, while Metashape can. Holes filled in Metashape are not re-textured well, though, and need to have their correct textures projected onto them in another piece of software.
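If you script Metashape, the same hole-closing step appears to be exposed in Python as well. This is a sketch based on my reading of the Pro API, and the exact method name and arguments may differ by version:

```python
# Sketch of closing mesh holes from a script, assuming Metashape Pro's Python API
# (method and argument names may differ between releases).
import Metashape

chunk = Metashape.app.document.chunk

# 'level' is a percentage; higher values close progressively larger holes.
chunk.model.closeHoles(level=100)

# Note: as described above, the filled regions carry no real texture data,
# so the correct textures still need to be projected on in other software.
```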

One issue I had was Metashape closing the gaps between the arms and the sides, as well as between the two legs. It’s clear in the dense cloud that there is a gap; in the final model, however, the gap is closed. So while the arms should not be too far from the sides, they should ideally keep some distance to prevent this gap filling. Either that, or have the subject wear something with sleeves that touch at the sides, so the effect is not noticeable.

Mini Project: Photogrammetry Part One

Tie-points (loose point cloud) rendered and animated in Blender

This blog post follows the basic process of generating a dense point cloud in Agisoft Metashape. The process of reconstructing three-dimensional data from a set of photographs is fascinating. In some ways, the point clouds generated along the way are even prettier to look at than the geometry the process can ultimately produce.

Masking

Masking isn’t strictly required for generating the point cloud, but for a higher quality alignment of the photographs in 3D, it is absolutely necessary. Masking keeps background pixels from confusing the triangulation algorithm.

Metashape’s automatic masking is challenging to work with and basically esoteric compared to other modern masking tools. I used Photoshop’s subject selection to mask all 170 imported photographs in about 20 minutes. If you don’t have Photoshop, nearly any image editing software capable of masking an image will work better than Metashape. These masked images, saved as transparent PNGs, can then be imported into Metashape, and masks can be generated from their alpha channels.
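As a sketch of what this looks like scripted (assuming Metashape Pro’s Python API; enum and parameter names vary between releases, so treat it as approximate rather than exact):

```python
# Sketch of importing transparent PNGs and building masks from their alpha
# channels through Metashape Pro's Python API (names vary by version).
import glob
import Metashape

doc = Metashape.app.document
chunk = doc.addChunk()

# Add the masked PNGs exported from the image editor.
chunk.addPhotos(sorted(glob.glob("masked_pngs/*.png")))

# Build a mask for every camera directly from the image's alpha channel,
# mirroring "Import Masks > From Alpha" in the GUI.
chunk.generateMasks(masking_mode=Metashape.MaskingModeAlpha)
```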

PNGs imported into Metashape with alpha-masks generated

Camera Alignment

This step involves the click of a single button, so here are some photographs detailing the process and result.

Each image’s position is triangulated based on pixels it shares with the other images. Here we can see all images positioned in reconstructed 3D space.
Another angle
The “tie-points,” which represent shared pixels detected in multiple cameras, loosely resemble the 3D geometry of the model

Dense Point Cloud Generation

The dense point cloud position alongside photos used for construction

The final step is to use the aligned cameras to construct a dense point cloud. This process is especially long and tedious. If your ultimate goal is to produce a traditional mesh model, it is best to generate the point cloud at high quality: afterwards, the model can be generated at different levels of quality and optimization, but the point cloud will retain maximum detail. If this point cloud is your end product, however, you could opt for a lower level of detail. Be aware that processing time grows roughly exponentially with the level of detail (and with the number of input photos).
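For readers scripting this step, here is a rough sketch of dense cloud generation with an explicit quality choice, again assuming Metashape Pro’s Python API (in newer releases buildDenseCloud is renamed buildPointCloud, so the names are approximate):

```python
# Sketch of dense point cloud generation with an explicit quality setting,
# using Metashape Pro's Python API (names approximate; see lead-in above).
import Metashape

doc = Metashape.app.document
chunk = doc.chunk

# downscale controls quality: 1 ~ Ultra, 2 ~ High, 4 ~ Medium, 8 ~ Low.
# Each higher-quality step works on roughly 4x as many pixels per image,
# which is why processing time climbs so steeply toward Ultra.
chunk.buildDepthMaps(downscale=2, filter_mode=Metashape.MildFiltering)
chunk.buildDenseCloud()
doc.save()
```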

Race to the Dorm and Experimenting With 360 Video

An Overview of 360 Video

Shown below is 360 video recorded on a RICOH THETA 360-degree camera. Our video can be, and was, edited with basic video editing techniques, such as adding music, sound, and graphical touch-ups. What concerns this project and blog post more, however, are the unique edits possible with 3D or 360 footage. The most basic modification that can be made to 360 video is to its “center,” or the starting orientation of the viewer, but that alone is not very interesting. What I ultimately want for this project is to immerse the viewer both graphically and aurally. There are methods to do this, though in some we were successful and in others not.

Audio

Despite our best efforts to extract 3D audio from the THETA camera, all we were able to get was a stereo track. Whether this is due to the recording capabilities of the camera or to user error is a question for more knowledgeable readers (comparisons between YouTube tutorial footage and the in-class slides on editing ambiX audio in Premiere suggest that the audio demoed in class was stereo rather than ambisonic). It would make sense that stereo audio could be easily converted into ambisonic audio: simply place the sound source in front of the observer in ambisonic space, then pan/pitch/tilt the resulting ambisonic sphere. With this in mind, and no actual ambisonic audio forthcoming, I attempted to simulate the three-dimensional spherical effect using other tools.
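To make that idea concrete, here is a minimal sketch of the conversion in Python with numpy and soundfile. This is an illustration of the math, not what we actually used in Premiere, and the ±30° placement of the left and right channels (as well as the filenames) is an assumption.

```python
# Minimal sketch: encode a stereo track as a first-order ambisonic (AmbiX)
# sound field with the two channels placed in front of the listener.
import numpy as np
import soundfile as sf

stereo, sr = sf.read("voiceover_stereo.wav")   # shape: (samples, 2)
left, right = stereo[:, 0], stereo[:, 1]

az = np.deg2rad(30.0)  # assume L at +30 degrees, R at -30 degrees, ear height

# First-order AmbiX: ACN channel order W, Y, Z, X with SN3D normalization.
W = 0.5 * (left + right)                 # omni component (scaled to avoid clipping)
Y = 0.5 * (left - right) * np.sin(az)    # left/right axis
Z = np.zeros_like(W)                     # no elevation
X = 0.5 * (left + right) * np.cos(az)    # front/back axis

ambix = np.stack([W, Y, Z, X], axis=1)
sf.write("voiceover_ambix.wav", ambix, sr)
```

The resulting four-channel file is a static frame in front of the listener; panning and tilting the whole ambisonic sphere can then rotate it around the viewer.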

Using Premiere to Convert Stereo into Ambisonic

It would initially seem quite easy to binauralize short audio shots. Built into Premiere Pro is an audio binauralizer which can demo the effects of tilt, pitch, and pan on audio in 3D. This effect is only for demoing purposes, though, and does not render well to video. Another option was to apply the ambisonic panner effect to the audio. This effect allows for keyframing the pan of an audio source, which would let us pan a sound from in front of the camera to behind it. Inserted sound effects such as voices or passing vehicles could be made more effective by creating the illusion that the sound passes from in front of the viewer to behind them. Unfortunately, the ambisonic panner is only available for 5.1 audio clips, and despite attempting to convert our stereo audio into 5.1 surround sound, we were ultimately not successful.

A Programmatic Solution

Native solutions within Premiere proved unsuccessful, so, as a computer science student, I looked for other approaches. One such approach came in the form of Miki Lombardi’s “Ambisonic Audio Generator,” a short piece of code intended to convert stereo audio into 4-channel ambisonic audio. While it technically recreates the binaural effect, Lombardi’s code does so by simulating it in stereo rather than outputting a 4-channel audio file.

Conclusion

While the possibilities of editing 360 video are exciting, they were not realized in this project. Ultimately, to be creative with 360 video one needs to be capable of more than adjusting the native video and audio recorded on a 360 camera. Having manual, keyframable control over audio sources (not just ambisonic ones) would allow for the illusion of three-dimensional sounds not found in the original footage. Similarly, the ability to composite other footage on top of 360 video, as well as to cut and transition between 360 videos, are important skills that allow an artist to go beyond simply touching up recorded footage.

Footnote: Here is a version of the above 360 video with an anaglyph filter applied. See the previous blog post for more details about anaglyph images.

Transitory Places: A Loud but Silent Anaglyph Image

An “anaglyph” image contains three-dimensional information in two overlaid images, one meant to be viewed by the left eye and one by the right, while wearing a pair of red/blue 3D glasses.
The image data used to construct the image meant to be viewed by the observer’s left eye.
And the image meant to be viewed by the right eye.
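For the curious, the red/cyan combination itself is simple to reproduce programmatically. This is a hypothetical Python sketch (not how the image above was actually made): the red channel is taken from the left-eye image and the green and blue channels from the right-eye image. Filenames are placeholders, and both photos are assumed to be the same size.

```python
# Hypothetical sketch of assembling a red/cyan anaglyph from two eye views.
import numpy as np
from PIL import Image

left = np.array(Image.open("left_eye.jpg").convert("RGB"))
right = np.array(Image.open("right_eye.jpg").convert("RGB"))

anaglyph = right.copy()
anaglyph[:, :, 0] = left[:, :, 0]   # red channel comes from the left-eye image
# green and blue stay from the right eye, which the cyan lens passes through

Image.fromarray(anaglyph).save("anaglyph.jpg")
```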
THEME

A “Loud but silent moment” initially reminded me of the phrase “deafening silence,” meaning an overpowering silence, so quiet that it feels like it can be heard. My first thoughts were of my neighborhood and daily commute, so I chose a photograph that captures the freeway running past my apartment alongside the windows of the neighboring complex.

Considering the stereoscopic element of this project, which aims to capture an element of three dimensionality, I wanted to interpret the concept of noise in terms of physical space. There is a clear divide in the photograph between what one may consider “silent” and “loud.” The street is full of noise and motion, while the apartment blocks are isolated, unmoving, and quiet. However, I also feel the opposite way: that the freeway is a kind of white noise which we ignore – no different than silence – while each lit up window represents a pocket of meaningful noise, containing conversations and interactions between humans.

INTERPRETATION

Near my apartment block, everything is quiet and nothing is loud. The streets are empty, save the hours between five and ten at night, when a long rush hour and constant stream of cars travel out of Shanghai. Our neighborhood in Pudong is a sort of transitory place; few people besides street cleaners and construction workers walk on the sidewalks which separate undeveloped plots of land from empty four lane roads. The cars which travel down these roads draw your attention to just how silent the surroundings are, and at the same time are a kind of silence themselves.

Perhaps because of the government’s “15-minute city” policy, which promises outer-city residents that all of their needs can be met within a 15-minute walk of their residence, pockets of human activity are commonly found in Pudong. These pockets are surrounded by stretches of developing and yet-to-be-developed land, with only long roads to cut them apart. For people who live at the edges of these empty spaces, the noise can be overpowering, but at the same time it lacks quality, is empty, and is not so different from silence.

 

NAMOO: Reviewing VR Experience

“Namoo” is a hand-painted and animated 3D diorama that tells the story of a painter’s life from beginning to end. This VR short film, made for the Oculus Quest and directed by Erick Oh, tells a story based on Oh’s grandfather. Over the ten-minute runtime, a tree grows and changes to represent the grandfather’s life and experiences. For example, as a child, when the painter first discovers art, he leaves a handprint on the bark which remains there until the end of the film, signifying the importance of this moment for him. Similarly, when he finds love, the shape of the leaves forms a heart, and throughout the film all sorts of mementos get caught in the branches, filling up the tree like ornaments. Though the story is stripped down and simple, what shines are these mementos, which are beautifully animated, morphing and changing to reflect their changing places in the painter’s life.

The tree stands next to the painter’s easel. The tree’s “leaves” are all sorts of objects.

It was because of “Namoo’s” meticulous and intentional animation that VR really shone as a medium for experiencing this film. The animation is done frame by frame with no computer interpolation, meaning the movie runs at a low framerate but the quality of the animation is very high. Every scene is so detailed, and must have taken so much painstaking effort to construct, that after the five-minute mark I couldn’t believe the film was still playing. Different details reveal themselves depending on the angle and position one views the film from, much like a miniature physical diorama. It felt unique and different, more dynamic and engaging, to be able to explore a film in this way. I was constantly moving around the scene to see things from different angles, just the way one explores a diorama in real life. Unlike a physical diorama, though, in “Namoo” everything is always changing, and I was forced to rush around the tree absorbing as many details as possible, seeing all the little animations before they played out and one scene transitioned into the next.

The painter stands on the tree as it floats into the sky.

The question of artistic value is often present when discussing video games, but because “Namoo” is not a game at all, rather than lingering on this question I’d like to discuss the unique aspects of VR as a medium for creating art. Obviously, “Namoo” was created by artists, which in my mind should make it a piece of artwork; each frame of the experience is painted and animated by hand. With an aesthetic reminiscent of VR painting tools like Tilt Brush, I was confident that the animation must have also been done in VR. This is what set “Namoo’s” visual style apart and made it work. For “Namoo,” VR is not just a medium in the sense of consumption, but also in the sense of the means used to create the artwork, much like acrylic or oil are media for two-dimensional paintings.

The beauty of VR as an artistic medium struck me most when I was surrounded by the artwork, rather than circling it. When the sky changed, filling with clouds, I suddenly felt much more present in the space. The act of looking up and around myself, placing myself into “Namoo’s” world, realized the space more comprehensively and effectively than any interaction I had had with the film up to that point. While I appreciated being able to inspect the little details by getting up close, being dwarfed by my surroundings was the sensation that pushed me to feel immersed in the environment. At one point in “Namoo” the tree rises into the sky and, despite being suspended a short distance away, I was eager to imagine myself sitting on the tree as it rose into space. This capability of VR to completely surround observers such that they exist within the image world is one of its most powerful tools, and that moment of trying to place myself within the image world remains the most memorable part of the experience.

This scene of the painter as a baby stood out because of the beautiful glow and tiny particles of light floating everywhere.

Immersive Arts: Reading Assignment 1

In Virtual Art: From Illusion to Immersion, Oliver Grau uses the phrase “psychological distance” to refer to the abstract distance between the observer and the “image space,” aka the physical space that is suggested and created with images. The “distance” he describes is more than the literal perceived distance between the observer and the subject matter; it represents the effectiveness of a medium at creating a convincing depiction of a physical space or world. Grau uses the concept of the interface, the way a medium uses the senses to connect an observer to an image world, to analyze concepts such as psychological distance, immersion, and critical reflection. The rest of this post will focus on the different ways that media represent reality by “interfacing” with our senses.

We can understand the factors that affect psychological distance in terms of each of our senses and how they interface with the world. Different media are more effective at replicating some senses than others. For example, sculpture and theater possess actual three-dimensionality; while technically only being images of something, such media bring the image world into the observer’s world. Virtual reality is the complete opposite, sealing the observer off from any visual elements that are not part of the image world: VR essentially brings the observer into the image world. I would initially consider sculpture more psychologically distancing than VR, but the two approaches really sit at opposite ends of an axis, where the former exists within the real world and the latter completely replaces it, with no clear verdict on which is better. Another axis to consider is participation, which is the focus of games. Games make the observer a participant within the image world, drawing attention to the role of interaction in psychological distance. Specifically, the difference between games and other media is not simply the ability to interact with the image space, but rather that games can place an observer within the image space itself to reduce psychological distance. It is perhaps because virtual reality exceeds sculpture in interactivity that it feels less psychologically distant.

We can use this understanding of psychological distance to understand immersion in media. Grau describes immersion as a process: the mentally absorbing transition from the observer’s world to the image world, which psychological distance plays a role in facilitating. Grau believes that to achieve maximum immersion, the medium must convince observers that they exist within the world the images represent, which can only be done by exactly replicating how humans, through all our senses, perceive the real world. It is because Grau believes the goal of immersion is to come as close as possible to reality that he questions whether critical reflection is still possible. He explains that immersion techniques such as the interface make certain media easier to reflect on than others: the interface is easier to reflect on when it is visible, and yet virtual reality in some ways obscures that visibility. Virtual reality uses the “natural interface,” which, while only introduced and not fully explained, seems to encompass as many natural senses as possible. Because of the complexity of this kind of interface, Grau believes virtual reality poses a challenge to critical reflection that other forms of media do not.

Perfection: Visual Metaphor Final Project by Ian and Valeryn

My project is a relaxing pottery video. The footage is accompanied by a voiceover reading a short love poem about being perfect and being wanted. The video was inspired by the genre of ASMR and tutorial videos on YouTube, in which the focus is on the process of creating something, with emphasis on its sounds and visuals. When Val and I initially started brainstorming, these kinds of videos were what we both associated with the “visual metaphor” theme. However, we both lacked the skill to make something worthy of teaching to others, so we chose to focus on the process of making art itself rather than the final product. The constant challenge of wanting to create something good in art is something I have been thinking about a lot. Val, meanwhile, brought up the idea of fitting in in society and of perfection as an individual in relation to others. The concept of fitting in is very relevant for both of us, as we are international students living in Shanghai, each exploring how to fit into this society as Asian Americans. The connection between finding perfection in art and in oneself is what we based our visual metaphor project on.

We edited our storyboard for this project many times until the deadline, but below is a general outline of what we wanted to create, accompanied by the quotes from the poem that would play over these sections:

  • sculpt (I wish I was skinny, I wish I was tall.)
  • painting (I wish I was perfect and prettier than them all.)
  • smash it (I want to be popular)
  • sculpt again (I want to be adored)
    + the clips of smushing the clay
  • Smash second time (but most of all, I want to be yours)

We chose to shoot our footage at a pottery studio that Val found. However, since neither of us had been there before, our shoot was done without much planning. When we arrived, we were given time to simply work on pottery, and we shot over 30 minutes’ worth of footage during the hour we spent there. One of the biggest challenges we faced was using this footage to create a coherent story. Our storyboard required us to mix and match the footage to find the right scenes for each line of the poem, but the pottery went through many phases, so doing this would disrupt the continuity of the footage: if we mixed and matched clips, the pottery would look different from shot to shot, and the story-like process of making the pottery would be lost. In the end, we used Premiere’s Warp Stabilizer effect extensively to stabilize the handheld footage, as we did not use a physical stabilizer. One shot is also reversed: we mirrored the footage to preserve my hand positions, but once mirrored, the pottery wheel spun the wrong way, so we reversed the footage to match the rest of the video.

In this project, my main role consisted of video editing and acting. I sculpted and painted the pottery, while Val did all of the shooting. So when it came time to edit, I assembled most of the video, while Val edited in the sound. I think the best part of working with Valeryn was coming up with the initial idea though. Her footage is shot really well and the sound editing is equally clean, but I am most proud of how our visual metaphor took shape, and I could not have come up with such an interesting metaphor without brainstorming with her.

We took the most inspiration from silent tutorial videos on YouTube. Creators such as Alvin Zhou were very relevant when shooting footage and doing sound design. It was hard to fit our entire narrative into such a short timeframe, but we tried to preserve the relaxing zen-like qualities of work by creators like Zhou. However, camera language and color correction were some of the weaker parts of our project. We learned a lot from the critique section on how to improve this aspect. Our footage could have been more diverse in terms of camera angle and shot length. Though much of the footage shot was the same size and angle, we could’ve done more in editing to zoom and diversify shots.

Reflection: Memory Soundscape

Concept

My mother is a teacher and for most of my childhood, I went to the school that she worked at. My mother regularly had meetings and would have to stay at school late, keeping me there as well until we could go home, so I spent a lot of time doing work at school after my friends had gone home.

In high school, there was a study space near the cafeteria where I stayed until late most days, which is where my memory soundscape takes place. Right after school ended, the area was full of students and noise, but by the time I would go home, the large building would be empty and it would be dark outside. The soundscape takes place on a winter night; in my hometown, winter means frequent rain and early darkness. I put on my jacket, walk through the empty building, and step out the door into the rain.

My memory soundscape is about the somewhat melancholy, somewhat stifling feeling of being in an empty space, and of leaving that space to walk through the rain at night.

Process

My soundscape can be broken into three distinct parts. It begins with the noise of people walking around and talking. Then the room empties. Finally, I put on my coat and walk outside.

Sounds

The ambience of school buildings at night was very important for me to capture accurately. I stayed in the AB until late at night to capture the sounds of AC units, students talking, and plumbing. For the other aspects of the scene, I recorded my footsteps in large hallways and the opening and closing of fire-escape doors. To create the rain and wind at the end, I recorded my shower and bathroom faucet.

Editing

I started my editing process by creating the ambience of the room. This was the most important part of the memory, because when there are no people in a room, the sounds it makes are its character. I panned the different tracks relative to where I remember them being in my memory. Making the footsteps was somewhat complicated: with no effects they sounded wrong, but with reverb added they didn’t sound close enough to be my own footsteps. The solution was layering one version with reverb and one without on top of each other. Finally, making the rain start and stop with the opening and closing of the doors was simply a matter of careful fading.

In the end, however, I realized the mix was missing something. It didn’t feel personal enough, as if it wasn’t reflecting my own experience of the memory. An empty room, especially at night after a day of being full, feels muted and isolating, so the solution was to add an FFT filter over the entire mix that kicked in gradually. I carefully adjusted the levels until the result matched how I felt about the memory.
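As a rough programmatic analogue of that effect (the real version was done with an FFT filter in the editing software, not code), the sketch below low-passes the whole mix and fades the filtered version in gradually; the filename, cutoff frequency, and fade length are assumptions.

```python
# Sketch: gradually fade in a low-pass-filtered copy of the mix so the room
# grows more muted over time. Illustration only; parameters are assumed.
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfilt

mix, sr = sf.read("soundscape_mix.wav")
if mix.ndim == 1:
    mix = mix[:, None]  # treat mono as a single-column array

# Gentle low-pass around 2 kHz applied to the entire mix.
sos = butter(4, 2000, btype="low", fs=sr, output="sos")
muted = sosfilt(sos, mix, axis=0)

# Crossfade from dry to filtered over the first 20 seconds, then stay filtered.
fade = np.clip(np.arange(len(mix)) / (20 * sr), 0.0, 1.0)[:, None]
out = (1.0 - fade) * mix + fade * muted

sf.write("soundscape_muted.wav", out, sr)
```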

Conclusion

I received an extension for this project and did not end up getting feedback from my classmates, but there are many things I would change with more time. More time would let me add complexity and depth to the mix. For example, I only use two recordings for the main ambient noise, which leaves out other details. I would like to record more samples of individual sounds that I could place at different “positions” in the sound space to add depth. I would also layer more sounds, such as adding more detail to the sound of getting up and walking outside; these actions are expressed very minimally and could use more sound to make them clearer.

Gamification Design Proposal and Reading Response

Proposal

A video game experience to raise awareness about ocean pollution, designed for players of all ages, children and adults alike.

Ocean pollution is an incredibly important issue with a significant impact on everyone’s lives. There is a lack of accessible information on the topic, not to mention significant misinformation surrounding ocean garbage.

My game will use scientifically accurate data to represent the Pacific Garbage Patch in a way that shows the true extent of ocean pollution. The player will pilot a boat through a video game representation of the Pacific Garbage Patch that is accurately scaled according to scientific data.

Currently, the main way humans deal with ocean garbage is simply gathering it up in large nets, which, aside from being grossly underfunded, ignores issues such as microplastics, which are too small to be caught in nets. Accordingly, the player’s goal in the game will be to collect the garbage they come across in the ocean. They will need to do this in order to navigate to the center of the Pacific Garbage Patch: the closer the player gets to the center, the denser the garbage will be, blocking their path and forcing them to collect it to open a way through.

Any search on ocean pollution and its prevention will likely focus on what you as an individual can do, such as using less disposable plastic and recycling more. However, individual use makes up a minuscule part of ocean pollution compared to the practices of plastic and oil companies. Plastic produced by these corporations is specifically designed to resist decomposition, and it is the fault of these companies that so much plastic ends up in the oceans. These companies invest huge amounts of money annually into campaigns promoting recycling and eco-friendly practices in order to redirect blame from themselves onto average people, when the true issue lies with the production of plastic in the first place. To highlight this, in the center of the game’s representation of the Pacific Garbage Patch, a massive factory will stand atop an island of trash, spewing garbage out of large openings. The rate at which garbage pours out will reflect the scientifically measured growth rate of the Pacific Garbage Patch.

Reading

Kapp defines games as having a number of features. He states that games must have a Quantifiable Outcome, in which the effects of a player’s actions on the game world, and the state required to “win” the game, are all clear and visible. With my game, however, I’d like to bend this rule a little by minimizing this feeling: I want the player to feel that the impact they’ve made on the game world by collecting trash is minuscule, in order to understand the significance and size of the problem being presented.