Week 07: CartoonGAN (Midterm Progress) – Casey & Eric

Website: cartoon.steins.live

Github: https://github.com/WenheLI/cartoonGAN-Application

Methodology

    1. Model Structure
      1. To get the best cartoon-like style, we use CartoonGAN [1], proposed by students from Tsinghua University (THU). As a typical GAN, the model consists of two separate networks: the generator, which produces the target (cartoonized) images, and the discriminator, which tells generated images apart from real cartoon images.
      2. This two-network structure demonstrates the complexity of the model. We build the model in TensorFlow (Python) and export it to the HDF5 (.h5) format for the next step.
      3. In addition, the model relies on some high-level and custom layers. To make the model run in a browser, we need to replace those layers with plain Python and basic Keras abstractions. In this way, we get a plain model that can run directly in the browser (see the generator sketch after this list).
    2. Model Converting
      1. After the previous step, we have a workable model that can be converted with the TensorFlow.js converter.
      2. If the model involves custom layers, we need to implement them on either the JavaScript side or the Python side. To make life easier at this stage, I chose to implement them on the Python side (see the conversion sketch after this list).
    3. Web Procedure
      1. After the model is converted, we serve it in the browser with the help of TensorFlow.js.
      2. We want to offer multiple models for users to choose from; how to design the interaction logic remains an open problem.
      3. We also want to bring the application to mobile, either as a PWA or as a WeChat Mini Program.
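To make step 1 concrete, here is a minimal sketch of a generator assembled only from basic Keras layers. This is an illustration, not our actual network: the real CartoonGAN generator is much deeper (it stacks eight residual blocks), and custom layers such as instance normalization are omitted so that the model stays convertible.

```python
# Minimal sketch of a browser-friendly generator built from basic Keras
# layers only. Layer sizes are illustrative; the real CartoonGAN generator
# is much deeper and uses normalization layers omitted here.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_generator(size=256):
    inp = layers.Input(shape=(size, size, 3))
    # Flat convolution, then one down-convolution stage
    x = layers.Conv2D(64, 7, padding="same", activation="relu")(inp)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
    # A single residual block (CartoonGAN stacks eight of these)
    r = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    r = layers.Conv2D(128, 3, padding="same")(r)
    x = layers.Add()([x, r])
    # Up-convolution back to the input resolution
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same",
                               activation="relu")(x)
    out = layers.Conv2D(3, 7, padding="same", activation="tanh")(x)
    return Model(inp, out)

generator = build_generator()
generator.save("generator.h5")  # HDF5 export for the conversion step
```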
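For step 2, the conversion can be driven from Python via the tensorflowjs package; a short sketch (file and directory names are illustrative):

```python
# Convert the plain Keras model to the TensorFlow.js Layers format.
# Requires: pip install tensorflowjs
import tensorflowjs as tfjs
from tensorflow.keras.models import load_model

model = load_model("generator.h5")
tfjs.converters.save_keras_model(model, "web_model")
# web_model/ now holds model.json plus binary weight shards, which the
# browser can load with tf.loadLayersModel('web_model/model.json').
```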

Experiments

    1. Model Training
      1. Because the model is complex, training takes many hours, and GAN training is notoriously hard to stabilize. It took us a couple of days to get everything on track.
      2. Previously, we used a batch size of 128 on four RTX 2080 Ti GPUs. However, this made it harder for the generator to converge, due to the variance introduced by large batches. Below is the loss curve for the 128-batch run after one day of training.
      3. After finding that the generator was trapped in a local optimum, we switched the batch size to 8 for a better generator (a sketch of the training step follows below). The generator has now been training for 12 hours and the loss curve looks good; we still need a few more days to see whether it continues to improve. Currently, we can see the generated images starting to show cartoon-like edges.
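For reference, a stripped-down sketch of a GAN training step in TensorFlow 2. This shows only the plain adversarial losses; it omits CartoonGAN's VGG content loss and edge-promoting term, and the optimizer settings are assumptions, so treat it as an illustration rather than our exact code.

```python
# Stripped-down GAN training step (TF2). Omits CartoonGAN's content loss
# and edge-promoting adversarial term; optimizer settings are illustrative.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
d_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

@tf.function
def train_step(photos, cartoons, G, D):
    with tf.GradientTape() as gt, tf.GradientTape() as dt:
        fake = G(photos, training=True)
        d_real = D(cartoons, training=True)
        d_fake = D(fake, training=True)
        # Discriminator: real cartoons -> 1, generated images -> 0
        d_loss = bce(tf.ones_like(d_real), d_real) + \
                 bce(tf.zeros_like(d_fake), d_fake)
        # Generator: fool the discriminator
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    g_opt.apply_gradients(zip(gt.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))
    d_opt.apply_gradients(zip(dt.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))
    return g_loss, d_loss
```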

    2. Model On Web

Since I already had some pretrained CartoonGAN models, we could implement the web part in parallel with the model training process. We are facing two major problems right now. The first is the large amount of memory the model consumes, which we cannot reduce further in this case due to the complexity of the model.

The second problem is that, during inference, the model takes up a large amount of CPU/GPU time, which blocks UI rendering. To mitigate this, I moved inference into a Web Worker, which reduces the rendering delay.

The high memory consumption also makes it hard to run inference in a mobile browser, as the model takes more than 1.5 GB of VRAM.

References

[1] Chen, Y., Lai, Y.-K., and Liu, Y.-J. CartoonGAN: Generative Adversarial Networks for Photo Cartoonization. CVPR 2018.

Midterm Project Proposal – Casey & Eric

CartoonGAN on the Web

Background

    Generative adversarial networks (GANs) have been widely used for tasks like image/audio generation, image-to-image translation, and 3D model reconstruction. GANs have become a common tool for generation, and they are also a great help for style transfer on high-resolution images. The trade-off is that such models require a large amount of computational resources. Researchers have done a lot of work to bridge the gap between cartoon and real images. In this project, we want to bring CartoonGAN [1] to the browser, optimize it, and build applications on top of it.

Motivation

TensorFlow.js makes it possible to run deep learning projects within the browser, and the work of CartoonGAN [1] allows us to transfer any cartoon style we want. Combining the web and CartoonGAN therefore lets users apply the styles they want directly on the web.

References

[1] Chen, Y., Lai, Y.-K., and Liu, Y.-J. CartoonGAN: Generative Adversarial Networks for Photo Cartoonization. CVPR 2018.

Week 5 – Open the Black Box – Wenhe

Brief

For this assignment, I trained a set of CNN models on CIFAR-10. Throughout the training process, I tested the effects of different network architectures, dropout layers, batch sizes, and epoch counts. I also played with data augmentation.

Hardware

I am using Google Cloud Platform, with an instance equipped with an NVIDIA Tesla T4 GPU and a 4-core CPU.

Architecture

Above is a VGG-like architecture, which contains blocks of two successive convolutional layers interleaved with dropout layers, followed by fully-connected layers (a code sketch appears at the end of this section). Below is a table showing how the experiments went.

Epoch            Batch Size    Time     Accuracy    Test Accuracy
20               64            40s      69%         68%
100              64            109s     75%         72%
20               5             2200s    80%         75%
20 (augmented)   64            40s      70%         71%

As the table shows, the smaller the batch size, the higher the accuracy, and more epochs help as well. However, because training eventually plateaus, increasing the epoch count does not always increase accuracy.

I have also tried other architectures, such as one with only two convolutional layers, which gives a similar result to the VGG-like one; this is likely because CIFAR-10 involves a relatively small set of features.

In addition, the large gap between training and test accuracy in the third run is mainly due to the small batch size, which makes it harder for the model to generalize.
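For reference, a minimal Keras sketch of the VGG-like structure described above. The filter counts and dropout rates are guesses for illustration, not the exact configuration behind the table.

```python
# Minimal sketch of the VGG-like CIFAR-10 model: blocks of two successive
# conv layers with dropout, then fully-connected layers. Filter counts and
# dropout rates are illustrative, not the exact configuration.
import tensorflow as tf
from tensorflow.keras import layers, models

def vgg_like():
    model = models.Sequential([layers.Input(shape=(32, 32, 3))])
    for filters in (32, 64):
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D())
        model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))
    return model

model = vgg_like()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```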

The Neural Network vs. the Brain/Neuron – Eric Li

As the name "neural network" suggests, the idea is partly borrowed from the human body and brain: a neural network tries to simulate the structure of the brain and how neurons are connected. The concept was previously also called the multilayer perceptron, a name which suggests that every node in the network represents or captures a feature, and that by combining those features a detailed result is produced. This is basically how the multilayer perceptron works. The human brain, however, works by a similar but different mechanism.

Neural Network

Essentially, a neural network performs feature extraction and feature combination, and uses these features to produce the output. CNNs, RNNs, and other architectures serve as extraction mechanisms for different kinds of tasks. Furthermore, in practice, neural networks do not offer a generic solution to all tasks; specific tasks call for customized models.

Human Brain

On the other hand, the brain works similarly, performing feature extraction and feature combination through the hands, eyes, and ears. However, the brain is capable of continuous learning, which keeps yielding solutions to new tasks.

Week 3 – Eric Li

Brief

https://www.youtube.com/watch?v=ZD2yjnwMOO

The above is a quick demo of what I have done for this week's assignment. Basically, I made a Pong game controlled by your nose rather than the keyboard. Below is the GitHub link:

https://github.com/WenheLI/AIArts/tree/master/week3

Tech

The core problem here is how to get the user's nose position from the webcam. I use PoseNet to do the detection: every frame, PoseNet outputs keypoints (nose, eyes, ears, and so on). In our case, we only need to know where the nose is, and we map the nose position into our game's coordinate space (a sketch of the mapping follows below).
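The game itself runs in JavaScript, but the mapping is just a linear rescale; here is a hypothetical Python sketch (the dimensions are only examples):

```python
# Hypothetical sketch of mapping a PoseNet keypoint from webcam
# coordinates into game coordinates (the project itself does this in
# JavaScript; the numbers below are only examples).
def map_range(v, in_min, in_max, out_min, out_max):
    """Linearly rescale v from [in_min, in_max] to [out_min, out_max]."""
    return (v - in_min) / (in_max - in_min) * (out_max - out_min) + out_min

# A nose at y = 240 in a 480-px-tall webcam feed lands in the middle of a
# 600-px-tall game area:
paddle_y = map_range(240, 0, 480, 0, 600)
print(paddle_y)  # 300.0
```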

Apart from that, I was originally planning to use BodyPix rather than PoseNet. However, it turns out that BodyPix in ml5 cannot output the exact position of each body part, only a mask, which makes it impossible to use in my case. There might be a way to get more detailed output using TensorFlow.js directly, but time for this assignment was limited.