Introduction:
For the midterm project, I wanted to explore GANs (generative adversarial networks) and their power to produce realistic-looking generated images. I researched some existing projects and looked into different variations and interesting uses of GANs. My project idea has two parts.
1) Create a web application that allows users to upload an image of themselves and generate the celebrity version of themselves using machine learning.
2) Create an interface that allows users to generate random tattoo images. The stretch goals would be to generate tattoos based on their input sketches and to allow users to choose tattoo styles.
My objectives follow the same two parts. In the first part, I will use an established, well-organized dataset of celebrity faces to understand how GANs work and how to use them. In the second part, I will apply GANs to my own interest in tattoo generation, which means collecting my own tattoo dataset. Since I expect this to be a long-term and time-consuming project, I aim to make substantial progress for the midterm and complete my goals for the final project.
Process:
To get started, I talked to Aven and did my own research to find existing GitHub repositories for projects that work with GANs, specifically deep convolutional GANs (DCGAN).
1) Working model:
Even though I encountered some dependency issues, it was not too hard to find a working model. The ones I looked at are:
https://github.com/robbiebarrat/art-DCGAN
https://github.com/soumith/dcgan.torch
2) Training Celebrity Face Dataset:
When I tried to train my own model on the celebrity dataset, I hit a major roadblock: GPU vs. CPU. I did not realize that most training scripts are written for GPU only. GANs need a large dataset and many epochs to achieve good results, so training on a CPU is extremely slow, and most projects assume a GPU. In the end, I managed to find a model that supports CPU training and trained it for 5 epochs. I could not train for more epochs because of time constraints, so the outputs are not ideal.
Faces Generated with Current Model
The generated faces already show features of normal human faces, but they do not look natural or realistic. This is definitely something I can work on for the final project. I believe that with a more robust model (trained for additional epochs) the results will be very promising.
3) Collecting Tattoo Image Dataset:
For my second goal, I had to collect a tattoo image dataset manually, since I could not find a readily available one online. To do so, I used a Python script called "Google Images Download" together with chromedriver. It scrapes Google Images based on search keywords or key phrases and downloads the results onto my computer. However, the scraping tool has a limit, so I was only able to download 500 photos at a time. I repeated the process several times.
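The repeated runs described above could be scripted roughly as follows. This is only a sketch, not the exact commands I ran: the keywords, the `tattoo_raw` output folder and the chromedriver path are placeholders, and `build_jobs` and `run_jobs` are helper names made up for this example. It assumes the `google_images_download` package and chromedriver are installed.

```python
def build_jobs(keywords, per_run_limit=500, output_dir="tattoo_raw",
               chromedriver="/path/to/chromedriver"):
    """Build one argument dict per search keyword.

    per_run_limit mirrors the 500-image cap I ran into; the other
    defaults are placeholders, not the values from my actual runs.
    """
    jobs = []
    for keyword in keywords:
        jobs.append({
            "keywords": keyword,
            "limit": per_run_limit,
            "output_directory": output_dir,
            "chromedriver": chromedriver,
        })
    return jobs


def run_jobs(jobs):
    """Run every job in turn (needs network access and chromedriver)."""
    # Imported here so the job list can be built and inspected
    # even on a machine where the scraper is not installed.
    from google_images_download import google_images_download

    downloader = google_images_download.googleimagesdownload()
    for job in jobs:
        downloader.download(job)


# Hypothetical search phrases, one download run per phrase.
jobs = build_jobs(["tattoo design", "tattoo flash art"])
print(len(jobs), "download runs planned")
```

Calling `run_jobs(jobs)` would then start the actual downloads, one run per keyword.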
Here is an example of the resulting images:
I was then concerned that these tattoo images contain many distracting features, such as skin color, skin texture and messy backgrounds, that could negatively affect the outcome of the model. To solve this issue, I switched to tattoo stock images with a clean white background. I also renamed the files and went through the dataset manually to double-check it and filter out unwanted data.
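Part of this cleanup can be automated. The sketch below is not the script I used; it shows one way to drop exact duplicate downloads and give the remaining files consistent sequential names before the manual review. The function name, the folder names and the `tattoo` prefix are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def dedupe_and_rename(src_dir, dst_dir, prefix="tattoo"):
    """Copy unique images from src_dir into dst_dir with sequential names.

    Two files count as duplicates only if their bytes are identical,
    so resized or re-encoded copies still need a manual check.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    seen = set()
    kept = []
    # sorted() lists the source files up front, before anything is copied.
    for path in sorted(src.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip folders and non-image files
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an image already kept
        seen.add(digest)
        target = dst / f"{prefix}_{len(kept):05d}{path.suffix.lower()}"
        shutil.copyfile(path, target)
        kept.append(target)
    return kept
```

For example, `dedupe_and_rename("tattoo_raw", "tattoo_clean")` would leave the raw downloads untouched and write the renamed copies into a separate folder.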
4) Training the Model with the Tattoo Image Dataset:
Next, I trained the model with the new tattoo dataset. I started with 1000 images and ran the script for 25 epochs. A checkpoint is created for each epoch, and the resulting models for the discriminator and generator are here:
The output produced is below:
As we can see, the outputs really do not look like anything. I suspect there are two main reasons: 1) The dataset of 1000 images is too small for training a GAN. There is simply not enough data for the model to learn from. 2) The model was not trained for enough iterations and epochs. The results should improve as the number of iterations and epochs increases.
I attempted to fix these issues by increasing the dataset size and training the model for more epochs. However, I ran into technical difficulties during the later training runs. Despite changing the dataset itself and tuning different parameters, I was still not able to train another model that produces better results.
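To make the second reason more concrete, here is a rough count of how many weight updates the training run performed. The batch size of 64 is an assumption (I did not record the value the script used), and the comparison figure assumes the full CelebA dataset of 202,599 images, which may be larger than the face dataset I actually trained on.

```python
def training_updates(num_images, batch_size, epochs, drop_last=True):
    """Number of generator updates for a run of the given size.

    drop_last=True assumes the script skips an incomplete final batch;
    with False, the leftover images form one extra, smaller batch.
    """
    if drop_last:
        batches_per_epoch = num_images // batch_size
    else:
        batches_per_epoch = -(-num_images // batch_size)  # ceiling division
    return batches_per_epoch * epochs


# Tattoo run: 1000 images, 25 epochs, assumed batch size of 64.
print(training_updates(1000, 64, 25))    # 375

# For scale: 5 epochs over the full CelebA set at the same batch size.
print(training_updates(202599, 64, 5))   # 15825
```

Under these assumptions, the 25-epoch tattoo run amounts to only a few hundred updates, so the number of epochs alone says little about how much training the model actually received.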
Next Steps:
As documented in the previous sections, I have made progress in many areas but definitely not enough to achieve my original goals. There are several things I need to work on to complete this as my final project.
For the first goal, I need to train a better model that produces realistic generated human faces. Then I need to connect the ML model to a front end built with ml5.js that captures the user's image in some way. This way, users can provide an input face and the model can output a generated version of it. I need to look into more technologies to achieve this. It would look something like this:
For the second goal, there are more steps and experiments ahead. First, I need to collect a bigger dataset of good images with more careful filtering. This should not be technically challenging, but it requires more time. Second, I need to train the model. The last error message displayed has to be debugged before I can train additional models and generate results. Lastly, presentation is as important here as it is for the first goal, so a nice user interface should be implemented.
Takeaways:
- You never know if something on GitHub works until you try it yourself.
- Training machine learning models takes more time than you think.
- Successfully trained models do not guarantee good results.
- It is important to think beyond the model itself and also consider user experience and interactivity.