Week 11 – High-End Tools for VR and Their Significance – Jonghyun Jee

Viewing and stitching the footage I took with the 360 camera, I felt excited and intimidated at the same time. The prospect of dealing with all this 3D material made me somewhat nervous, which is why I was so surprised when I watched these three high-end tools (Mettle’s Mantra VR, Foundry’s Cara VR, and Boris FX Mocha) in action. Although they differ in the details, they seem to share one characteristic: an intuitive interface.

Mantra VR lets users easily stylize their 360 footage with a myriad of presets and filters. This music video by Piers Baron showcases some of Mantra’s interesting visual effects; in particular, the way the video is seamlessly mirrored is simply amazing. I assume similar effects could be achieved in After Effects, but that would be far more time-consuming than using Mantra, which is specialized for exactly this kind of work. Compared with the other tools, Mantra’s price ($229) sounds fairly cost-effective.

Cara VR with Nuke feels, in short, like the 3D canvas I had envisioned and wanted. I watched the whole tutorial on how to place objects in 360 VR, and surprisingly, many of its functions are quite straightforward. A lot of VFX chores such as color correction and stabilization are handled automatically, to the point where significantly less manual labor seems to be required.

And yet, my favorite was Boris FX Mocha. It looks almost magical. The Area Brush—one of the many powerful tools available in Mocha—makes layer masking simple and effective. Its unique “Mega Plate” module visualizes the processes of object removal and sky replacement in a remarkably intuitive way.

Their highest priority, I think, is to give users finer control in simpler ways. For example, their tracking functions make traditional keyframing seem rather outdated. With these tools, people will create VR video in their own remarkable ways, something previously possible only for a limited number of VFX professionals. Until now, the development of VR content has depended heavily on media and game companies; these high-end tools will lower the entry barrier for amateur producers and consequently diversify VR/AR content (much as Photoshop and Illustrator did for graphic design).

Week 10 Assignment (Style Transfer) by Jonghyun Jee

For this week’s assignment, I chose “Taenghwa” as the style input. Taenghwa, Korean Buddhist paintings mostly displayed inside temples, are characterized by rich symbolism and intense colors. Below is the image I used:

Using Intel DevCloud, the entire training process took roughly 20 hours. Having tried and failed several times to use DevCloud for my midterm project, this time I properly learned how to submit my shell file and export the trained data. Once training was complete, it generated a folder containing a JSON file and a number of files labelled “variable_#.” I loaded this output into Aven’s Inference Style Transfer model.

Despite running slowly, it stylized my webcam input in real time. We can clearly see how the color scheme shifted toward red and green, the main colors of Taenghwa. Most lines are, as expected, somewhat vague and lumpy. And yet, I was excited to spot some characteristics of Taenghwa in the output, so I tried it on other images using another of Aven’s sketches: Style Transfer for Images.

The first image I tried was The Birth of Venus by Sandro Botticelli. Its output feels more surreal than evocative of Taenghwa. Another notable thing I found is that most of the stylized images have glitched parts. As you can see from the image above, the clam shell and Venus’s leg are partially glitched, like the RGB subpixels of an analog TV screen. The same artifacts appear in the image in which I have my hands together; the curtain behind me also has a glitchy patch. I wonder what caused this effect, because the original Taenghwa input has no such characteristics.

The other image I tried was the 12th-century mosaic of Christ Pantocrator. Since there is a clear distinction between the figure and the background, the generated image looks sharper than the previous results. This one also displays more characteristics of Taenghwa, given its composition and solemn atmosphere.

Overall, the whole process went smoothly and I learned a lot from training the algorithm on data of my own choosing. I’m excited to learn more about GANs in the upcoming weeks, so I can realize the Hanafuda GAN for my final project.

Midterm Project Proposal and Research – Jonghyun Jee

Overview

I changed the topic of my project from ancient Chinese characters to Hanafuda cards. There are a couple of reasons for the change: first, I found that the idea I proposed before relies more on computer vision than on artificial intelligence (identifying the resemblance between an input image and a given Chinese character does not necessarily require deep learning); second, I wanted to actually train my own algorithm instead of relying on pre-trained models.

[Image: a set of hanafuda cards]

Hanafuda cards, or Hwatu in Korean, are playing cards of Japanese origin that are commonly played in South Korea, Japan, and Hawaii. A single set has twelve suits, one for each month; each suit is designated by a flower and contains four cards, for 48 cards in total. Hanafuda can be viewed as an equivalent of Western playing cards or Chinese mahjong, as they are mostly used for gambling.

I got the inspiration from Siraj Raval’s “Generating Pokémon with a Generative Adversarial Network,” which trains its algorithm on images of the 150 original Pokémon and uses a WGAN to generate new Pokémon-like images from that data-set. I replaced the dataset with images of hanafuda cards I found on the web, and modified the code to work with the newly updated data-set.

Methodology

Collecting a proper image set was the first thing I did. I scraped 192 images in total, including a scanned set of nineteenth-century hanafuda cards from the digital library of the Bibliothèque Nationale de France, a set of modern hanafuda cards from a Japanese board game community, vector images of modern Korean hwatu cards from Urban Brush, and a set of Hawaiian-style hanafuda cards. I had to gather those images manually and trim off their borders, which could otherwise affect the generated results.

Although they have slightly different styles, they share the same themes, symbols, and compositions—as we can see from the image above, there are distinct hanafuda characteristics: elements of nature, a Japanese style of painting, a simple color scheme, and so on. I hope artificial intelligence can detect such features and generate new, interesting hanafuda cards that have never existed before.

The code consists of three parts: Resize, RGBA to RGB, and the main GAN. First, since all source images must have the same dimensions, “Resize” automatically scales them to a uniform size (256×404). Second, “RGBA2RGB” converts RGBA images to RGB, that is, from PNG to JPG. These first two steps standardize the images so they can be fed to the third part, the main GAN.
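As a rough illustration only (not the actual scripts from the project), here is a minimal sketch of what these two preprocessing steps might look like, assuming Pillow; the folder names are hypothetical and the target size follows the description above.

```python
import os
from PIL import Image  # pip install pillow

SRC_DIR = "raw_cards"      # hypothetical folder of scraped card images
OUT_DIR = "resized_cards"  # hypothetical folder fed to the GAN
TARGET_SIZE = (256, 404)   # width x height, as described above

os.makedirs(OUT_DIR, exist_ok=True)

for name in os.listdir(SRC_DIR):
    if not name.lower().endswith((".png", ".jpg", ".jpeg")):
        continue
    img = Image.open(os.path.join(SRC_DIR, name))

    # "Resize": scale every card to the same dimensions.
    img = img.resize(TARGET_SIZE, Image.LANCZOS)

    # "RGBA2RGB": drop the alpha channel so PNGs can be saved as JPGs.
    if img.mode == "RGBA":
        background = Image.new("RGB", img.size, (255, 255, 255))
        background.paste(img, mask=img.split()[3])  # use the alpha band as the paste mask
        img = background
    else:
        img = img.convert("RGB")

    img.save(os.path.join(OUT_DIR, os.path.splitext(name)[0] + ".jpg"), quality=95)
```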

The GAN part has two key functions: the Discriminator and the Generator. The Discriminator keeps trying to distinguish real data from the images created by the Generator. It has four layers in total, and each layer performs a convolution, adds a bias, and applies an activation (ReLU). At the end of the Discriminator, a sigmoid function decides whether an image is real or fake. The Generator has six layers that likewise repeat convolution, bias, and activation; after the six layers, it applies a tanh function to squash the output and returns a generated image. Remarkably, the Generator begins by producing images at random, regardless of the training data-set. As training goes on, these random outputs gradually morph into something that looks like hanafuda cards, driven by the optimization scheme, back-propagation.
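The actual implementation in the repo is written in low-level TensorFlow; purely to illustrate the layer structure described above (the 64×64 image size and the layer widths are invented for brevity, and the real code works on 256×404 cards), a Keras sketch might look like this:

```python
import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = 64    # simplified square size for the sketch; the project uses 256x404 cards
NOISE_DIM = 100  # length of the random vector fed to the Generator

def build_discriminator():
    """Four conv layers (convolution + bias + activation), then a sigmoid real/fake score."""
    model = tf.keras.Sequential(name="discriminator")
    model.add(layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3)))
    for filters in (64, 128, 256, 512):  # four layers, as described above
        model.add(layers.Conv2D(filters, 5, strides=2, padding="same",
                                use_bias=True, activation="relu"))
    model.add(layers.Flatten())
    model.add(layers.Dense(1, activation="sigmoid"))  # probability that the image is real
    return model

def build_generator():
    """Six layers of (transposed) convolution + bias + activation, then a tanh output."""
    model = tf.keras.Sequential(name="generator")
    model.add(layers.Input(shape=(NOISE_DIM,)))
    model.add(layers.Dense(4 * 4 * 512))
    model.add(layers.Reshape((4, 4, 512)))
    # Six upsampling layers: 4 -> 4 -> 8 -> 16 -> 32 -> 64 -> 64 pixels.
    for filters, stride in ((512, 1), (256, 2), (128, 2), (64, 2), (32, 2), (16, 1)):
        model.add(layers.Conv2DTranspose(filters, 5, strides=stride, padding="same",
                                         use_bias=True, activation="relu"))
    # tanh squashes pixel values into [-1, 1], matching how training images are scaled.
    model.add(layers.Conv2DTranspose(3, 5, strides=1, padding="same", activation="tanh"))
    return model

# Quick smoke test: generate a batch of fake "cards" and score them.
generator = build_generator()
discriminator = build_discriminator()
fake_cards = generator(tf.random.normal([8, NOISE_DIM]))  # 8 random 64x64x3 images
scores = discriminator(fake_cards)                        # 8 values in (0, 1)
```

Note that a WGAN critic normally omits the final sigmoid and uses a different loss; this sketch only mirrors the four-layer/six-layer structure described above.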

Experiments

As I’m using a Windows laptop, there are some difficulties in submitting work to Intel DevCloud. Once I figure out how to install the needed dependencies (cv2, scipy, numpy, pillow) in the Intel environment, I’ll train the algorithm and see how I might improve the results. Here are a number of potential issues and areas for improvement:

[Image: sample outputs from the Pokémon GAN]

Above is a set of images generated by the Pokémon GAN. Here we can clearly see some Pokémon-like features; and yet, they are still somewhat abstract. Aven pointed out that my hanafuda results are very likely to be just as abstract as these Pokémon. I need to do more research on how to make the outputs less abstract, or I may select a few interesting ones and retouch them a little bit.

Since the data-set comprises cards from three different countries, I’m excited to see what will come out of this project. I’d like to find some “pairable” cards that share similar characteristics, so I can possibly group them into a 13th or 14th month.

Assignment #1&2 by Jonghyun Jee

Examples of image-based works that use more than one image taken from the exact same spot

Photos: What does it look like to stand in the same spot for 40 years?

Camilo José Vergara has spent more than forty years photographing and rephotographing the same forgotten corners of American cities. From crumbling housing blocks in the Bronx and an abandoned Detroit mansion, to dwindling row houses in Camden and the many lives of a Los Angeles baptist church. In all cases Vergara eschews the monumental to focus on a city’s discreet pockets. Returning year after year to the same positions, he regenerates images even as the structures in front of his lens decompose and are reborn in a cycle of photographic renewal. Architecture given shape by time and neglect takes on an organic quality—a reminder that edifices are as temporary as the lives they shelter. Vergara’s urban generation loss depicts fluid cities as a mirror of the present aging into obsolescence. Ultimately his images force a reckoning with death, confronting our inability to grasp the undercurrents relegating urban space and time. (Timeline)

The photos and quote above are a great example of a project that tracks how a specific spot has changed over the years.

Interesting Spots in Shanghai

1. The Bund

[Image: the transformation of the Bund]

(Source: the Telegraph)

Symbolic of concession-era Shanghai, the Bund was the city’s Wall Street, a place of feverish trading and fortunes made and lost. Originally a towpath for dragging barges of rice, the Bund (an Anglo-Indian term for the embankment of a muddy waterfront) was gradually transformed into a grandiose sweep of the most powerful banks and trading houses in Shanghai. The optimal activity here is to simply stroll, contrasting the bones of the past with the futuristic geometry of Pudong’s skyline (Lonely Planet).

The Bund is undoubtedly the icon of Shanghai and its rapid development. Since we can find a myriad of photos and video footage of how the Bund has changed, it is definitely a great spot to begin our project. The history of the Huangpu River goes back further than the Warring States era, so we cannot talk about the transformation of Shanghai without talking about the Huangpu River and its surroundings.

2. Zhujiajiao

[Image: Zhujiajiao]

(Source:  Ginger Around the World)

Arguably a Chinese version of Venice, Zhujiajiao is 30 km away from Shanghai and reachable by subway. This water town was established 1,700 years ago and still has a number of Qing dynasty buildings; and yet, recent over-development has become a huge issue, with a flood of shopping and entertainment complexes being built around the old town.

3. Longhua Temple & Pagoda

[Image: Longhua Temple & Pagoda]

(Source: Culture Trip)

Shanghai’s oldest and largest monastery is named after the pipal tree (lónghuá) under which Buddha achieved enlightenment. Trees are decorated with red lanterns, incense smoke fills the front of the grounds and monks can regularly be heard chanting, making this one of the city’s most atmospheric sites. The much-renovated temple is said to date from the 10th century.

Although most of the present day buildings date from later reconstructions, the temple preserves the architectural design of a Song dynasty (960–1279) monastery of the Chan School. During the Second Sino-Japanese War, the Japanese operated their largest civilian internment camp in the area of Longhua temple. J.G. Ballard’s novel Empire of the Sun details this time in history, also claiming that the pagoda was used by the Japanese as a flak cannon tower.

Week 6 Midterm Project Proposal by Jonghyun Jee

Background

It’s interesting to think about the way the earliest Chinese characters were created, as they are visual representations of real objects rather than abstract symbols. 山 looks like a mountain, 木 like a tree; we can clearly see the resemblance. Although modern Chinese characters have developed into a complex writing system whose characters combine phonetic and semantic components, early scripts such as Zhuànshū (篆书) and Jiǎgǔwén (甲骨文) are far more pictographic.

Project

I’d like to create a project based on these ancient Chinese characters: when a user uploads an image file as input, the trained algorithm will return the ancient Chinese character that most resembles it. Alternatively, it could trace the contours of an image and create a new Zhuànshū character (though I don’t think such a task requires artificial intelligence). If unsupervised learning turns out to be too hard, I may try supervised learning by labelling a few ancient Chinese characters.

Methodology

For the training data, I need to figure out where to get a data set that includes as many characters as possible. I’m thinking of extracting the data from a Chinese font file, as fonts are standardized and already in digital form; 64×64 or perhaps even 32×32 bitmaps should be sufficient to represent each character. It may require image classification, but I still need a lot more research on how to turn this idea into something concrete.
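To make the font idea a bit more concrete, here is a rough sketch (assuming Pillow and NumPy; the font path and the sample character list are hypothetical placeholders) of how glyphs could be rasterized from a font file into 64×64 bitmaps, plus a naive pixel-distance match that shows why this task may not need deep learning at all:

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont  # pip install pillow numpy

FONT_PATH = "seal_script.ttf"  # hypothetical font file containing Zhuanshu glyphs
CANVAS = 64                    # 64x64 bitmaps, as proposed above
CHARACTERS = ["山", "木", "水", "火"]  # a few sample characters to rasterize

font = ImageFont.truetype(FONT_PATH, size=CANVAS - 8)

def render(ch):
    """Rasterize one character onto a white 64x64 canvas, returned as floats in [0, 1]."""
    img = Image.new("L", (CANVAS, CANVAS), color=255)
    draw = ImageDraw.Draw(img)
    # Roughly center the glyph; bounding boxes differ from character to character.
    left, top, right, bottom = draw.textbbox((0, 0), ch, font=font)
    x = (CANVAS - (right - left)) // 2 - left
    y = (CANVAS - (bottom - top)) // 2 - top
    draw.text((x, y), ch, font=font, fill=0)
    return np.asarray(img, dtype=np.float32) / 255.0

glyphs = {ch: render(ch) for ch in CHARACTERS}

def closest_character(image_64x64):
    """Return the glyph with the smallest mean squared pixel distance to the input."""
    return min(glyphs, key=lambda ch: np.mean((glyphs[ch] - image_64x64) ** 2))
```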

Examples

Below are the images that I found particularly similar. It’d be interesting to see how the algorithm might pair a given image with a resembling traditional character.