Final Documentation for CartoonGAN – Eric & Casey

Website: https://cartoon.steins.live/

Recap

For the midterm, we implemented CartoonGAN in the browser, allowing users to upload or take a photo and transform it into a cartoon-like style. We trained two models: one for Miyazaki's style and one for Aku no Hana.

Current Solution

Having done some experiments with generative art, we decided to continue working on the CartoonGAN project by adding several features: GIF transformation, foreground/background-only transformation, and exporting CartoonGAN as an ml5 function/model.


Final Project Concept Eric & Casey

For this final project, we have two ideas in mind. I will introduce them one by one:

Refine the CartoonGAN project

This idea is a continuation of our midterm project, adding two features: GIF conversion and foreground/background-only conversion.

In detail, the first feature requires unpacking a GIF into frames and feeding those frames to the model. The model produces the styled output, and our app packs the frames back into a new GIF. Throughout this process, we need to pay extra attention to memory so the browser does not run out of it.
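A rough sketch of this pipeline, assuming gifuct-js for decoding, gif.js for re-encoding, and a loaded tf.js graph model for CartoonGAN (library choices and the [-1, 1] input range are assumptions, not our final code):

```typescript
import * as tf from '@tensorflow/tfjs';
import { parseGIF, decompressFrames } from 'gifuct-js';
import GIF from 'gif.js';

async function cartoonizeGif(buffer: ArrayBuffer, model: tf.GraphModel): Promise<Blob> {
  // Assumes each decoded frame patch covers the full canvas (no partial frames).
  const frames = decompressFrames(parseGIF(buffer), true);
  const encoder = new GIF({ workers: 2, quality: 10 });

  for (const frame of frames) {
    const { width, height } = frame.dims;
    // Process one frame at a time inside tf.tidy so intermediate tensors
    // are released and we do not run out of GPU memory.
    const styled = tf.tidy(() => {
      const input = tf
        .tensor3d(new Uint8Array(frame.patch), [height, width, 4])
        .slice([0, 0, 0], [height, width, 3])   // drop the alpha channel
        .toFloat()
        .div(127.5)
        .sub(1)                                  // assume a [-1, 1] input range
        .expandDims(0);
      return (model.predict(input) as tf.Tensor4D).squeeze().add(1).div(2) as tf.Tensor3D;
    });

    // Draw the styled frame onto a canvas and hand it to the encoder.
    const canvas = document.createElement('canvas');
    canvas.width = width;
    canvas.height = height;
    await tf.browser.toPixels(styled, canvas);
    styled.dispose();
    encoder.addFrame(canvas, { delay: frame.delay });
  }

  return new Promise((resolve) => {
    encoder.on('finished', (blob: Blob) => resolve(blob));
    encoder.render();
  });
}
```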

As for the second feature, I am planning to combine the BodyPix model with our CartoonGAN model. To maximize computational efficiency, I will apply some WebGL tricks to composite the BodyPix mask with the CartoonGAN output, so that we can deliver the best user experience.
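A minimal sketch of the foreground-only idea: BodyPix produces a per-pixel person mask, which is used to blend the CartoonGAN output with the original frame. The function names and the assumption that the generator preserves the input resolution are illustrative; the real version would do this blend inside a WebGL shader rather than with tensor ops.

```typescript
import * as tf from '@tensorflow/tfjs';
import * as bodyPix from '@tensorflow-models/body-pix';

async function stylizeForegroundOnly(
  image: HTMLImageElement,
  cartoonModel: tf.GraphModel
): Promise<HTMLCanvasElement> {
  const net = await bodyPix.load();
  const segmentation = await net.segmentPerson(image);

  const output = tf.tidy(() => {
    const original = tf.browser.fromPixels(image).toFloat().div(255);
    const input = original.mul(2).sub(1).expandDims(0);          // assume [-1, 1] input
    const styled = (cartoonModel.predict(input) as tf.Tensor4D)
      .squeeze()
      .add(1)
      .div(2);                                                   // back to [0, 1]

    // segmentation.data has one byte per pixel: 1 = person, 0 = background.
    const mask = tf
      .tensor(segmentation.data, [image.height, image.width, 1])
      .toFloat();

    // Blend: cartoon pixels where the mask is 1, original pixels elsewhere.
    return styled.mul(mask).add(original.mul(tf.scalar(1).sub(mask))) as tf.Tensor3D;
  });

  const canvas = document.createElement('canvas');
  canvas.width = image.width;
  canvas.height = image.height;
  await tf.browser.toPixels(output, canvas);
  output.dispose();
  return canvas;
}
```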

Generative Art with Sentiment Analysis on Both Audio and Text

The idea is that text can carry certain emotions, and so can audio/speech. What’s more interesting is that, even for the same content, the text and the audio can carry different emotions. We can use this kind of conflict to make generative art.

However, some problems remain. First, there is no readily available real-time sentiment analysis model; as an alternative, we could use facial expression detection instead.
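A hedged sketch of that alternative: face-api.js (one possible library, not a decision) can estimate per-frame emotion probabilities from the webcam, which could then be contrasted with the sentiment of the spoken or typed text.

```typescript
import * as faceapi from 'face-api.js';

async function trackExpressions(video: HTMLVideoElement): Promise<void> {
  // Model files served from a local /models folder (placeholder path).
  await faceapi.nets.tinyFaceDetector.loadFromUri('/models');
  await faceapi.nets.faceExpressionNet.loadFromUri('/models');

  setInterval(async () => {
    const result = await faceapi
      .detectSingleFace(video, new faceapi.TinyFaceDetectorOptions())
      .withFaceExpressions();
    if (!result) return;

    // `expressions` maps emotion names (happy, sad, angry, ...) to probabilities;
    // the dominant one could drive the visuals or be compared with text sentiment.
    const [emotion, confidence] = Object.entries(result.expressions)
      .sort((a, b) => b[1] - a[1])[0];
    console.log(emotion, confidence);
  }, 200);
}
```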

The other question is how to visualize, or rather generate, the emotions in a way that will attract the audience. This seems to be a harder and more critical question than the previous one.

Week 11 Assignment: Explore BigGAN Eric Li

For this assignment, I played around with BigGAN for video and image generation. In particular, I experimented with the truncation and noise_seed parameters for image generation. From my understanding, truncation limits the latent space and resamples from that refined space, so the generated images look more vivid because the resampled latent space is denser. The noise_seed, on the other hand, changes the noise used in the generation process, so we can see different images sampled from the latent space.
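A small sketch of the truncation trick as I understand it (this mirrors what the demo's sliders control, not the demo's actual code): latent components are drawn from a truncated normal distribution and then scaled by the truncation value, so a low truncation keeps samples near the dense center of the latent space.

```typescript
import * as tf from '@tensorflow/tfjs';

function truncatedLatent(dim: number, truncation: number, noiseSeed: number): tf.Tensor1D {
  // tf.truncatedNormal resamples values more than 2 stddev from the mean;
  // scaling by `truncation` then shrinks the whole vector toward the origin.
  return tf.tidy(
    () => tf.truncatedNormal([dim], 0, 1, 'float32', noiseSeed).mul(truncation) as tf.Tensor1D
  );
}

// Lower truncation => samples closer to the mean of the latent space (more
// plausible, less varied); changing noiseSeed picks a different point.
const z = truncatedLatent(128, 0.4, 42);
```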


Week 10 - StyleTransfer Training

Training

The training script for StyleTransfer depends on a quite old TensorFlow version, which adds extra difficulty to setting up the environment. Moreover, the training process uses the COCO dataset, which contains 14 GB of images, so it also requires a lot of training time.

In my case, I used an RTX 2080 Ti for training. With two epochs and a batch size of 8, it takes about 3 hours to finish the training phase. In general, that is quite fast considering how many images COCO contains.

Convert

Since this is an old script, I could not directly convert the checkpoint into a browser-compatible model. Instead, I used the ml5 conversion script to do the work.

Testing

Below are some images I converted:

Midterm Project Casey & Eric

Social Impact

By bringing the CartoonGAN model into the browser, we make it possible to transfer a real image into the target style. That lets users bring the images they love into the cartoons they like. Even more, if the input is a GIF or video, we can convert it directly into the cartoon style, so users can get a styled GIF or video right away. In a word, by running the model in the browser, we are trying to blur the boundary between the cartoon world and the physical world.

Future Development

In terms of future plans, we have three primary aspects in mind: realtime performance, more input formats, and ml5 function wrapping.

Realtime Performance:

There are some potential solutions for realtime performance. One is to deploy the model and an inference service on the server side: with a powerful machine, the generation finishes within seconds. However, if we are trying to generate a video or a long GIF in real time, this solution is not the best fit, as users would have to wait for the video to upload and download.

Another solution is to use native hardware acceleration to speed up inference on the edge, for example with tfjs-node or TensorFlow Lite. However, we still need to verify whether such native acceleration can handle real-time inference.
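As an illustration of the server-side direction, here is a minimal sketch of an inference endpoint using tfjs-node's native backend, with the converted CartoonGAN graph model loaded from disk. The paths, route, and [-1, 1] input range are placeholders, not our deployed setup.

```typescript
import * as tf from '@tensorflow/tfjs-node';
import express from 'express';

const app = express();
app.use(express.raw({ type: 'image/*', limit: '10mb' }));

let model: tf.GraphModel;

app.post('/cartoonize', async (req, res) => {
  const output = tf.tidy(() => {
    // Decode the uploaded image, normalize to [-1, 1], run the generator.
    const input = tf.node
      .decodeImage(req.body as Uint8Array, 3)
      .toFloat()
      .div(127.5)
      .sub(1)
      .expandDims(0);
    return (model.predict(input) as tf.Tensor4D)
      .squeeze()
      .add(1)
      .mul(127.5)
      .toInt() as tf.Tensor3D;                 // back to [0, 255] for encoding
  });
  const png = await tf.node.encodePng(output);
  output.dispose();
  res.type('image/png').send(Buffer.from(png));
});

// Placeholder path to the converted model files.
tf.loadGraphModel('file://./cartoongan/model.json').then((m) => {
  model = m;
  app.listen(3000);
});
```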

More input formats:

As proposed in the previous section, it would make more sense if more input formats were supported, such as GIF or video. For GIFs, it is possible to unpack a GIF into a sequence of images and run inference over them; after that, we can pack the styled frames back into a GIF.

For video, however, we may not want to support such functionality in the browser, as the large number of frames a video can contain would easily crash the page. We need to seek solutions on the server side and the native side.

ML5 wrapping:

As CartoonGAN itself is interesting, we can also export the model into the ml5 structure. In this way, it will benefit the community by bringing it a more diverse set of models.
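A very rough sketch of what such a wrapper could look like: a small class that loads the converted model and exposes a single generate() call, in the spirit of how ml5 wraps other tf.js models. The class and method names are placeholders, not ml5's actual API.

```typescript
import * as tf from '@tensorflow/tfjs';

class CartoonGan {
  private model!: tf.GraphModel;

  async load(modelUrl: string): Promise<CartoonGan> {
    this.model = await tf.loadGraphModel(modelUrl);
    return this;
  }

  async generate(img: HTMLImageElement | HTMLCanvasElement): Promise<HTMLCanvasElement> {
    const output = tf.tidy(() => {
      // Assume the generator takes [-1, 1] input and preserves resolution.
      const input = tf.browser.fromPixels(img).toFloat().div(127.5).sub(1).expandDims(0);
      return (this.model.predict(input) as tf.Tensor4D).squeeze().add(1).div(2) as tf.Tensor3D;
    });
    const canvas = document.createElement('canvas');
    [canvas.height, canvas.width] = [output.shape[0], output.shape[1]];
    await tf.browser.toPixels(output, canvas);
    output.dispose();
    return canvas;
  }
}

// Usage, in the spirit of ml5's promise style:
// const cartoon = await new CartoonGan().load('path/to/model.json');
// const styledCanvas = await cartoon.generate(document.querySelector('img'));
```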