Week 13: Final Project —— Yunhao Ye (Edmund)

Background and inspiration:

When I did my case study of image-to-image translation, I found a very interesting project called "person-to-person video transfer". The designer, Dorsey, first records a speech by Kurzweil, then records himself mimicking Kurzweil's pose frame by frame, and then uses these two videos to train a model that transfers himself into Kurzweil with the same pose. This project attracted me because it reminded me of my tiny PoseNet project. In that project, I used the model's output to portray a puppet controlled by strings, so you could control the pose of the "puppet" with your body. In this final project, I want to improve that project with a more creative conception.

Conception:

Basically, what I want to build is a pose-control model. I use DensePose as an intermediate model to generate a pose image of the user, and I train my own CycleGAN model to generate a human-like image based on that pose image. So the model can generate human-like images that follow the pose of the person in the input, and the input can be an image, a video, or pictures captured from a camera.

Progress:

The model I use is CycleGAN, and the code I use is the Google Colab version provided on this GitHub website. Aven also helped me a lot by modifying the original code to make it much more convenient to use.

In the training process, I need two datasets: one contains images of people, and the other contains the pose images generated by DensePose. The training builds two generators, one for each direction between the two categories, and the one I need is the pose-to-people direction.
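For reference, the training setup looks roughly like the cell below. This is only a sketch assuming the widely used junyanz/pytorch-CycleGAN-and-pix2pix scripts, which may differ from the exact Colab notebook I use; the dataset folder name is just an example.

```python
# A sketch of a Colab training cell, assuming the pytorch-CycleGAN-and-pix2pix
# scripts (the exact notebook I use may differ). CycleGAN expects two unpaired
# image folders:
#
#   datasets/pose2person/trainA   <- DensePose pose images
#   datasets/pose2person/trainB   <- photos of people
#
# Training learns both generators (A->B and B->A); the A->B one,
# pose to person, is the one I keep.
!python train.py --dataroot ./datasets/pose2person --name pose2person_cyclegan --model cycle_gan
```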

In the first stage, the people images I used were random, since my assumption was that if I use images from a lot of different people to train the model, it can generate human-like images with random styles.

So I collected full-body photos of single people from the internet. I also found some solo dance videos and collected frames from them. Since all the dataset images for CycleGAN must be square, I wrote a very simple Processing program to resize the photos without changing their aspect ratio, adding white padding to fill the empty parts.
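My actual tool is a Processing sketch, but the same logic is easy to show in a few lines of Python with Pillow; the 512-pixel output size here is just an example.

```python
# Not my original Processing sketch, but the same idea in Python with Pillow:
# scale so the longer side fits the target size, keep the aspect ratio, and
# fill the rest with white padding.
from PIL import Image

def square_pad(path_in, path_out, size=512):
    img = Image.open(path_in).convert('RGB')
    scale = size / max(img.width, img.height)
    resized = img.resize((round(img.width * scale), round(img.height * scale)))
    canvas = Image.new('RGB', (size, size), (255, 255, 255))   # white background
    canvas.paste(resized, ((size - resized.width) // 2, (size - resized.height) // 2))
    canvas.save(path_out)

square_pad('dancer_001.jpg', 'dancer_001_square.jpg')
```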

After I got all these resized people images, I sent them to the DensePose model in RunwayML to generate the pose images.

After getting all the images I need, I started to train the model on Google Colab. I have trained 8 models, about 1000 epochs in total. During this process, I changed my dataset many times, deleting images that I thought were not good for training and adding new ones. But the results were not very close to what I expected. Firstly, the features and details of a human are not shown clearly in the generated images. Most importantly, one model cannot generate images in different color styles; each model has its own color style. I could still make the output random by using many models, but that kind of randomness ("a random choice among the styles I picked") is different from what I want to achieve ("elements generated randomly from what the machine has learned during training").

(below are testing results of models from different epochs in the same training process)

 

So I gave up the original plan and stepped into the second stage: I need to use images of one specific person, and both Aven and I think that using images of myself to train the model is a good idea. In this case, the model will be trained to generate Edmund-like (me-like) images based on the pose of the person in the input. So I took some videos of myself and used the frames as the new training dataset.

This time I also trained 8 different models, but with only 600 epochs in total. I noticed that the model reaches its best performance really quickly (around the 20th to 40th epoch), and after that the images it generates become really abstract. After roughly another 100 epochs, the models return to the standard of the checkpoints from the 20th to 40th epoch. So, in general, I only need to train about 50 epochs for each model. The quality of the generated images is much higher than in the first stage, and the human features are easier to recognize.
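To compare checkpoints from different epochs, I can point the test script at a specific saved epoch. Again this is only a sketch assuming the pytorch-CycleGAN-and-pix2pix scripts, with a hypothetical dataset and experiment name.

```python
# Test the generator saved at epoch 40 (hypothetical names; --epoch selects
# which checkpoint to load, "latest" by default).
!python test.py --dataroot ./datasets/pose2edmund --name pose2edmund_cyclegan --model cycle_gan --epoch 40
```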

Below is the testing result of a well-performing model.

This result is one of the best among all the models: you can tell the black hair on my head, and you can see that half of my arm is not covered by the shirt, which is portrayed with a warmer color. If you look more carefully, you can see that my shirt is dark green and my jeans are blue. But the shapes of "me" in the generated images are too regular; by too regular, I mean they are not creative and stick strictly to the pose images.

And here is my final choice. I like it because its style is kind of abstract: the shape of "me" is not 100% the shape of a human, but curves or rounds in certain ways. I like this kind of abstract style for recreating myself, and meanwhile my features can still be seen clearly.

After I got that model, I run it locally and connect RunwayML to that local server, and then I chain it with the DensePose model. So now it can take the image of a person and generate an image of myself, and the user can control my pose with their own body.
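As an illustration of the chain, the round trip looks roughly like the sketch below. This is not the Processing code I actually use in RunwayML, and the ports, the /query route, and the "image" field names are assumptions on my part; the real routes and keys are listed in each model's Network panel in RunwayML.

```python
# A rough sketch of the two-step query over HTTP (all endpoint details are
# assumptions; check RunwayML's Network panel for the real ones).
import base64
import requests

def to_data_uri(path):
    with open(path, 'rb') as f:
        return 'data:image/jpeg;base64,' + base64.b64encode(f.read()).decode()

# Step 1: photo of a person -> DensePose pose image
pose = requests.post('http://localhost:8001/query',
                     json={'image': to_data_uri('friend.jpg')}).json()

# Step 2: pose image -> "Edmund-like" image from my CycleGAN model
result = requests.post('http://localhost:8002/query',
                       json={'image': pose['image']}).json()

# Assuming the output comes back as a base64 data URI
with open('generated_edmund.jpg', 'wb') as f:
    f.write(base64.b64decode(result['image'].split(',', 1)[1]))
```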

Here is a screenshot of the workspace. You can use Processing to send data to and get data from this model with code very similar to the first example shown before.

Unfortunately, the performance of the model in RunwayML is much worse than its performance during testing in Google Colab. The results should be the same, but something seems to be wrong. I have tried many things but still cannot figure out how to solve it.

Also, since I can only run it locally on a CPU, the inference time of this model is really long, so it cannot support a live camera as its input (the live camera is actually the most interesting part, what a pity).

Here are the images of my friend and the images generated by the model (in Google Colab) based on those images.

Future Improvement:

Firstly, of course, I need to fix the problem with RunwayML, or find another convenient way to run my model.

Secondly, I want to upload this model to an online server so that I can use a GPU to compute the results; then the model could process live camera video.

Thirdly, maybe I can train it with pose images from more than one person; then the model may learn to change anyone in the image into me. It's cool to imagine a world filled with myself.

Week 9_2: GAN Case Study —— Yunhao Ye (Edmund)

I found two projects on the internet. The first one is an interesting project that tries to teach a model to generate random icons in the style of Homer Simpson.

And here is a gif showing its result from epoch 0 to epoch 300.

Here is a gallery of the final images it gets.

Another project is a useful and practical one; its title is 'Quick and Easy Time Series Generation with Established Image-based GANs'. I briefly searched the internet for what a time series is: Wikipedia describes it as a series of data points indexed (or graphed) in time order.

Examples:

So the purpose of this project is to generate this kind of graph, which also means the data does not have a concrete source and is forged by the model.

But why do we need to do that? The article says that many studies in scientific or financial areas need lots of data, while that data is protected for privacy reasons, so researchers cannot get abundant datasets. They may then need this technique to get more useful data for their studies.

Here is its basic structure

Here is its final result and comparison with real time series

I have also tried the BigGAN Colab code on my own and generated these videos.
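The core of that Colab is sampling from a BigGAN generator hosted on TF Hub. Below is a minimal sketch of that sampling step as I understand it, written in TF 1.x style as in the Colab runtime at the time; treat the module URL, input names, and class index as assumptions taken from the TF Hub demo.

```python
import numpy as np
import tensorflow as tf          # TF 1.x, as in the Colab runtime at the time
import tensorflow_hub as hub

# Module path as listed on TF Hub; check the page for the exact version.
module = hub.Module('https://tfhub.dev/deepmind/biggan-256/2')
info = module.get_input_info_dict()
dim_z = info['z'].get_shape().as_list()[1]        # latent size, read from the module
num_classes = info['y'].get_shape().as_list()[1]  # ImageNet classes

truncation = 0.5
z = truncation * np.random.randn(1, dim_z).astype(np.float32)  # random latent vector
y = np.zeros((1, num_classes), dtype=np.float32)
y[0, 207] = 1.0                                   # pick one ImageNet class (207 = golden retriever)

samples = module(dict(y=y, z=z, truncation=truncation))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
    images = sess.run(samples)                    # shape (1, 256, 256, 3), values roughly in [-1, 1]
```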

 

Week 9_1: Neural Style Transfer Case Study —— Yunhao Ye (Edmund)

When I did my research on the internet, I found that most images created by Neural Style Transfer are generated from the content of a photo and the style of an artistic painting. So I wondered why nobody generates images the reverse way, that is, with the content of a painting and the style of a photo, and I began to search for similar projects.

Unfortunately, I could not find a project that tries to generate this kind of image, but I did find a project that generates images with the content of a photo and the style of another photo. The model is called Deep Photo Style Transfer. Basically, it is just another Neural Style Transfer model, but the designer makes it fit this special pattern where the two inputs are both photos. Here are some images I got from its paper.

I think the effect is pretty good; the result is as satisfying as the Fast Style Transfer in RunwayML. I think this model is also worth trying, since we do not always fetch a style from a painting.
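For context, this kind of content-plus-style generation can also be run in a few lines of code. Below is a minimal sketch using TF Hub's arbitrary image stylization module; this is the generic fast style transfer model, not Deep Photo Style Transfer, and the module URL and call signature follow the TF Hub example, so treat the details as assumptions.

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from PIL import Image

def load(path):
    # float32 array, shape (1, H, W, 3), values in [0, 1]
    img = np.array(Image.open(path).convert('RGB'), dtype=np.float32) / 255.0
    return tf.constant(img[np.newaxis, ...])

# Arbitrary image stylization module from TF Hub (TF 2.x)
hub_module = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')
stylized = hub_module(load('content.jpg'), load('style.jpg'))[0]

out = np.clip(stylized[0].numpy(), 0.0, 1.0) * 255
Image.fromarray(np.uint8(out)).save('result.jpg')
```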

I also found a website where you can generate your own Neural Style Transfer images from an input content image and style image, so I tried to continue my idea. I first used the famous painting 'The Starry Night', which is the most popular style image people use, as my content image, and I chose a photo of a starry sky as my style image.

(content)

(style)

(result)

I also tried this with another famous painting, 'The Great Wave off Kanagawa', and an image of a huge wave.

(content)

(style)

(result)

When I tried to generate my own images with the Fast Style Transfer model in RunwayML, I tried to provide content images with unusual content. Since most of the content images people provide are typical photos composed of a background and individual objects, I wanted to choose images that do not make much sense to us. By doing this, I want to see how the model identifies the content of those images when we cannot describe their content clearly.

1.

(content)

(result)

For this image, the model does not change the style of the lines; instead, it adds more details to the areas surrounded by the lines.

2.

(content)

(result)

For this image, the model kind of recognizes each character as a block and adds some features to it, but it cannot capture the clear structure of the characters. It also stylizes the whole background.

3.

(content)

(result)

If we provide an image containing only text with larger characters, the model can then recognize the structure of each character and create another style of font.

4.

(content)

(result)

For this image, the model does not seem to pay much attention to the QR code. I feel like it just paints the original style image on top of or underneath this content image.

Week 8: Deep Dream Case Study —— Yunhao Ye (Edmund)

When we discussed Deep Dream during this week's lecture, it reminded me of the Uncanny Valley, since this technique can easily modify human features and replace them with others'.

The Uncanny Valley is a hypothesized relationship between the degree of an object's resemblance to a human being and the emotional response to such an object. The concept suggests that humanoid objects which imperfectly resemble actual human beings provoke uncanny or strangely familiar feelings of eeriness and revulsion in observers. Technically it is not a strict scientific rule, but it is commonly referred to in daily life. (Below is a famous graph explaining it.)

So I tried to collect Deep Dream images that can cause horror or disgust because of the Uncanny Valley, and I categorize them into three types.

The first type is the most well-known form of both the Uncanny Valley and Deep Dream: humans with animal features.

Personally, I do not feel uncomfortable when I look at these images. I think it may be because the model can only use dog faces, and they are so familiar to us that we do not feel strange seeing them.

The second type is animals with human features. Actually, I could not find Deep Dream images belonging to this type, probably because there is no human layer in the technique. But I suspect that if we could produce those images, they would be much scarier than the first type. Here is an example (not a Deep Dream image); the puppy with a human face gives me more of a shock than the man with a dog's face.

The third type is animals with irregular body structures. This is not strictly the territory of the Uncanny Valley, since it is not related to humans, but in my opinion it is an animal extension of the Uncanny Valley that can cause a similar effect.

It can be seen that the animals' natural physical structures have been changed greatly, which gives a feeling of stitched-together monsters.

Additionally, I found a website that produces Deep Dream images based on food images. It is very creative and really impressed me.

This kind of image can make people feel very disgusted. I think it is because we have high demands and expectations for food, since it goes deep into our bodies; so if our food looks strange, we do not feel physically comfortable.

For my own exploration, I tried to make different font styles with the help of Deep Dream, but the result is not very satisfying: the font is too thin, so it is hard to add many details to the text. Also, it causes an error when using a png file, so I cannot discard the white background.
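I have not solved this, but my guess is that the error comes from the png's extra alpha channel. A possible workaround, just a sketch I have not verified, would be to flatten the image before Deep Dream and reuse the original alpha afterwards to cut away the white background.

```python
# Unverified workaround: flatten the RGBA text image to RGB before Deep Dream,
# then put the original alpha channel back onto the dreamed result (this
# assumes the dreamed image keeps the original size).
from PIL import Image

original = Image.open('text.png').convert('RGBA')

# 1) flatten onto white so the model receives 3 channels instead of 4
flat = Image.new('RGB', original.size, (255, 255, 255))
flat.paste(original, mask=original.split()[-1])
flat.save('text_rgb.jpg', quality=95)

# 2) ...run Deep Dream on text_rgb.jpg and save the output as dreamed.jpg...

# 3) reuse the original alpha channel as the transparency of the result
dreamed = Image.open('dreamed.jpg').convert('RGBA')
dreamed.putalpha(original.split()[-1])
dreamed.save('dreamed_transparent.png')
```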

Here is the original image I used

Here are the Deep Dream images I got

Week 6: Midterm Project —— Yunhao Ye (Edmund)

Basically, my midterm project is going to be a visual one, and I will explore the object detection technique. Currently I want to use "Yolo V3", but it is not decided yet. What I want to do is to portray the world from a machine's view.

The object detection model just tries to discover the objects it can recognize and provides each one's label, position, and size. It does not care about the details inside those boxes, or whether there is any difference between objects that share the same label. So, in my mind, the world in a machine's view may be a world filled with repetitive rectangles in different colors. I want to demonstrate that kind of view in Processing with the help of RunwayML.

The first image is one of the famous example images of "Yolo V3", and the second one is a simulation of the visual effect of my project. I will cut off the label text and make it purely visual, and I may also change the opacity of each box based on the confidence of its prediction.
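To test the look before building the real Processing sketch fed by RunwayML, a quick mock-up with hypothetical detection data could be something like this (the labels, boxes, and colors here are made up):

```python
# A quick Pillow mock-up of the intended visual: one color per label,
# opacity proportional to detection confidence. The detections are
# hypothetical placeholders, not real model output.
from PIL import Image, ImageDraw

detections = [                       # (label, confidence, box (x, y, w, h))
    ('person', 0.92, (40, 30, 180, 380)),
    ('dog',    0.71, (260, 220, 200, 160)),
    ('chair',  0.55, (480, 180, 120, 200)),
]
palette = {'person': (231, 76, 60), 'dog': (52, 152, 219), 'chair': (46, 204, 113)}

canvas = Image.new('RGB', (640, 480), (255, 255, 255))
overlay = Image.new('RGBA', canvas.size, (0, 0, 0, 0))
draw = ImageDraw.Draw(overlay)
for label, conf, (x, y, w, h) in detections:
    r, g, b = palette[label]
    alpha = int(conf * 255)          # more confident -> more opaque
    draw.rectangle([x, y, x + w, y + h], fill=(r, g, b, alpha))

composed = Image.alpha_composite(canvas.convert('RGBA'), overlay)
composed.convert('RGB').save('machine_view.png')
```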

This idea is inspired by some pictures made only of simple shapes. I love that simple style and want to relate it to machine vision.

The combination of simple shapes also reminds me of oil paintings portraying still objects. When artists want to evaluate a painting, they find the borders of each object in the painting in their minds, just like the object detection model does with input images. The artists do this to observe whether the objects in the painting achieve a balanced beauty between them. My project may also help to discover the beauty behind realistic images.