Midterm Proposal

The idea

Humans have a long history of using logos. These graphic elements have served as a good way for people to express their identities and feelings. Some logos that we are familiar with today actually date back to a very early age. Some have kept their original shape, while others have changed a lot over the course of their evolution. For instance, the Apple logo has changed many times, and each new version reflects a shift in the identity of the company. From those long-lasting logos we can see that a good logo should be simple and elegant while carrying multiple layers of meaning.

However, not every logo is well designed. We can still see plenty of logos that are confusing or even off-putting. Such logos may fail to leave a good impression of the company or the product they represent. Unfortunately, designing a decent logo is not an easy job for ordinary people, since it involves not only many design principles but also proficiency with software such as Adobe Illustrator and Photoshop. It would be wonderful if we could build a product that generates decent logos for users at the press of a button.

Technology 

Automatically generating content is exactly what a GAN, or generative adversarial network, is designed to do. A GAN consists of two neural networks that compete with each other. The generator first produces fake samples from random noise; the discriminator then tries to tell whether a given sample is fake or real. The discriminator's feedback is passed back to the generator so it can produce higher-quality samples. Once the discriminator can no longer tell whether a sample is fake, the samples produced by the generator are good enough. Using a GAN, I hope to train a model that can automatically generate decent logo images and help ordinary people design logos that are clean and impressive.
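To make this training loop concrete, here is a minimal sketch of the alternating generator/discriminator updates described above. It assumes PyTorch, and the tiny fully connected Generator and Discriminator here are placeholders, not the architectures actually used in the project.

import torch
import torch.nn as nn

# Placeholder networks; a real logo GAN would use convolutional layers.
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 64 * 64 * 3), nn.Tanh())
D = nn.Sequential(nn.Linear(64 * 64 * 3, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images):
    """real_images: (batch, 64*64*3) tensor scaled to [-1, 1]."""
    batch = real_images.size(0)
    noise = torch.randn(batch, 100)

    # 1) Train the discriminator: real samples -> label 1, fake samples -> label 0.
    fake_images = G(noise).detach()
    loss_d = bce(D(real_images), torch.ones(batch, 1)) + \
             bce(D(fake_images), torch.zeros(batch, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Train the generator: try to make the discriminator call its fakes "real".
    loss_g = bce(D(G(noise)), torch.ones(batch, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()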

Dataset 

The dataset I found is the LLD (Large Logo Dataset), which contains over 600,000 logo images collected from the Internet.

What I expect 

By using this dataset to train a GAN, I hope to obtain a model that can generate high-quality logo images that are well designed and impressive. I believe this will save ordinary people a lot of effort in getting their own logos.

References

https://en.wikipedia.org/wiki/Neural_network

https://99designs.hk/blog/design-history-movements/the-history-of-logos/

https://data.vision.ee.ethz.ch/sagea/lld/

Midterm Documentation | Logo Generator by Kefan Xu

Project Name

Logo Generator 

Intro

In this project, I propose a new way of designing logos using a Generative Adversarial Network (GAN).

The traditional way of designing a logo involves a lot of human effort. One has to sketch a draft and then use professional software such as Photoshop and Illustrator to carry it out. This can be difficult for amateurs and requires a lot of training. As a result, there are many logos on the market that make little sense or even look terrible.

By adopting the power of the Generative Adversarial Network (GAN), I hope to help amateurs design decent logos with a single click. A GAN is a combination of two neural networks that keep competing with each other: the generator keeps producing fake content, and the discriminator tries to distinguish whether that content is real. Once the discriminator can no longer tell which samples are real and which are fake, the generator is producing high-quality content. To achieve my goal, I found a large image set called LLD, which contains 66k logos in PNG format. From this dataset, I selected a smaller subset of 10,000 logos to train my model.

What I did 

First, I resized the images to 64×64 pixels and used NumPy to turn them into arrays. Then I tested a model I found online on this dataset. The detailed structure of its generator and discriminator is shown here:
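As a quick aside on the data side, the resize-and-convert step above looks roughly like the following. This is a minimal sketch: the folder name lld_logos/ and the use of Pillow are my own assumptions, not details from the original pipeline.

import glob
import numpy as np
from PIL import Image

# Take the first 10,000 logo files, resize each to 64x64, and stack into one array.
paths = sorted(glob.glob("lld_logos/*.png"))[:10000]
images = []
for p in paths:
    img = Image.open(p).convert("RGB").resize((64, 64))
    images.append(np.asarray(img, dtype=np.float32))

data = np.stack(images)        # shape: (10000, 64, 64, 3)
data = data / 127.5 - 1.0      # scale pixels to [-1, 1] for a tanh generator
print(data.shape, data.min(), data.max())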

However, the result after 1,000 iterations did not make any sense. Although the documentation said this network produced quite good results when generating frog images, it seemed to fail at generating logos. One possible reason is that the generator had beaten the discriminator: even though the images produced by the generator were of very low quality, they were good enough to fool the discriminator, so no useful feedback was flowing back. I therefore switched to other GAN models.

Next I used DCGAN, a model built on convolutional neural networks (CNNs). The results below were obtained with a batch size of 32, a learning rate of 0.0001, 1,280 epochs, and 20,000 iterations.

They showed that the model was really doing something: in the output from the last few iterations, there are some very vague logo-like shapes. The generator and the discriminator of this model are shown here.
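Since the architecture screenshots do not reproduce well here, the following is a minimal sketch of what a typical 64×64 DCGAN generator and discriminator look like, assuming PyTorch; the exact model I tested may differ in filter counts and layer order.

import torch.nn as nn

# Generator: project a 100-dim noise vector (shape: batch x 100 x 1 x 1) up to a
# 64x64 RGB image with transposed convolutions, following the standard DCGAN recipe.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(),  # 4x4
    nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(),  # 8x8
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),  # 16x16
    nn.ConvTranspose2d(128, 64, 4, 2, 1),  nn.BatchNorm2d(64),  nn.ReLU(),  # 32x32
    nn.ConvTranspose2d(64, 3, 4, 2, 1),    nn.Tanh(),                       # 64x64
)

# Discriminator: the mirror image, strided convolutions down to a single score.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1),    nn.LeakyReLU(0.2),                        # 32x32
    nn.Conv2d(64, 128, 4, 2, 1),  nn.BatchNorm2d(128), nn.LeakyReLU(0.2),   # 16x16
    nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),   # 8x8
    nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2),   # 4x4
    nn.Conv2d(512, 1, 4, 1, 0),                                             # 1x1 score
)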

Further manipulation of this model's parameters didn't bring any obvious improvement. Then one of my friends, who majors in CS, suggested that I try WGAN, since its training is more stable and, in this case, might reach a good result faster than DCGAN alone. As he suggested, I switched the training procedure to WGAN. Here is the result:

We can see that it reached a good result much faster than using DCGAN alone. After 3,500 iterations there were already some quite good samples, and some of the text on the logos was being picked up and reproduced.
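For reference, the main change when switching from standard DCGAN training to WGAN is the loss function and the weight clipping on the critic. Below is a minimal sketch of one WGAN training step, assuming PyTorch and the generator/discriminator modules sketched above; a real training script would also update the critic several times per generator step, which I omit here.

import torch

opt_d = torch.optim.RMSprop(discriminator.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)

def wgan_step(real_images):
    """real_images: (batch, 3, 64, 64) tensor scaled to [-1, 1]."""
    batch = real_images.size(0)
    noise = torch.randn(batch, 100, 1, 1)

    # Critic step: maximize D(real) - D(fake), i.e. minimize the negative.
    fake = generator(noise).detach()
    loss_d = discriminator(fake).mean() - discriminator(real_images).mean()
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Weight clipping keeps the critic roughly Lipschitz (the original WGAN trick).
    for p in discriminator.parameters():
        p.data.clamp_(-0.01, 0.01)

    # Generator step: make the critic score the fakes as highly as possible.
    loss_g = -discriminator(generator(noise)).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()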

Conclusion 

From this midterm logo-generator project, we can see that GANs have the potential to generate logos quickly. By combining the WGAN training scheme with the DCGAN architecture, the model converges faster and produces more accurate results. However, compared with other subjects, logos are harder to generate because they are often a combination of image and text. A GAN can be confused by this combination, and training can take longer.

Future Work

To generate high-quality logos, more iterations are certainly needed. Beyond that, changing the image size and the learning rate might also influence the result. The paper Logo Synthesis and Manipulation with Clustered Generative Adversarial Networks proposes a new approach to generating logos using a model called iWGAN, and I will try to implement this model in my future work.

My code can be found here.

Week 5: Training a CNN Using CIFAR10

When training the CNN on the CIFAR10 dataset this week, I focused on two parameters, batch_size and epochs, and I found something very interesting.

This screenshot shows the original result, using the parameters from class. To get this result, the batch_size was set to 2048 with 3 epochs. Every epoch took roughly 4-5 minutes. The test loss was 1.90 with an accuracy of 0.33.
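For context, the experiments below vary the batch_size and epochs arguments of a training call roughly like the one sketched here. This is a minimal Keras example of my own; the actual network and settings from class were likely different.

import tensorflow as tf
from tensorflow.keras import layers, models

# Load CIFAR10 and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small CNN stand-in; the architecture used in class may differ.
model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# These are the two parameters being varied in the experiments below.
model.fit(x_train, y_train, batch_size=2048, epochs=3,
          validation_data=(x_test, y_test))
print(model.evaluate(x_test, y_test))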

Then I changed the number of epochs to 6. The time each epoch took didn't change, but the result shows a slight increase in accuracy, from 0.33 to 0.38.

Then I doubled the batch_size and kept 3 epochs. Again, the time per epoch didn't change much, but the loss increased and the accuracy dropped to 0.29, even lower than the first result.

Since increasing the batch_size didn't seem to help the accuracy, I reduced it to 32, the value used in the given example, and ran it for 3 epochs.

 

The result showed that each epoch took less time. Surprisingly, the accuracy increased a lot, to 0.54. It seemed that a smaller batch_size led to higher accuracy, so in the fourth run I reduced the batch_size to 16, which yielded an accuracy of 0.58. Training with a batch_size of 8 then yielded an accuracy of 0.62.

 

Since I did all the training on my own computer, the number of epochs I can run is limited. But given enough time, I am confident the accuracy could be pushed above 75%, which is the accuracy reported in the example using a batch_size of 32 with 50 epochs.

But why does a smaller batch_size lead to higher accuracy? The batch_size is the number of samples the model trains on in each step, so the result suggests that training on fewer samples per step gives a better outcome, which conflicts with my intuition. I took the Machine Learning class last semester and learned the concept of overfitting there, so I wonder if that is what is going on: with a small batch_size, the model may still be far from perfect yet happen to match the test set well. To check this assumption, I might test the model on other datasets later.
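One more factor worth noting, as my own back-of-the-envelope check rather than anything from the class material: with the number of epochs fixed, a smaller batch_size also means many more weight updates per epoch, which by itself could explain part of the gap.

# CIFAR10 has 50,000 training images; compare gradient updates per epoch.
train_size = 50000
for batch_size in [2048, 32, 16, 8]:
    updates = train_size // batch_size
    print(f"batch_size={batch_size:>4}: ~{updates} updates per epoch")
# batch_size=2048: ~24 updates per epoch
# batch_size=  32: ~1562 updates per epoch
# batch_size=  16: ~3125 updates per epoch
# batch_size=   8: ~6250 updates per epoch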

iML Week 3: Combining pix2pix() and SketchRNN()

I presented SketchRNN() last week. This model was trained on the world's largest doodling dataset, Quick Draw, which has over 15 million drawings collected while users play the game Quick, Draw! The model can help users complete their drawings based on a chosen category model. It has been integrated into ml5.js and works together with p5.js. This example shows how to build an interactive project using ml5.js and p5.js: p5.js provides a canvas for the user to sketch on, and SketchRNN() finishes the rest of the drawing.

While going through the different models that ml5.js offers, I found another interesting one called pix2pix(). It is an image-to-image translation model. From its website, we know it is able to “generate the corresponding output image from any input images you give it”. In other words, if you sketch the outline of a cat, the model can generate a corresponding realistic cat image based on the outline you drew; the user experience feels like the model automatically filling in the colors of your sketch. The site also gives examples of using this model to generate building facades, shoes, etc. Although those examples show very good results, when I tried drawing things myself the outcomes were far from perfect: most of them were simply filled with yellow, and the model couldn't seem to recognize what I drew. Besides my poor drawing skill, I suspect this is because the model was trained on sketches with fairly clean outlines. The result might be different if it were trained on the Quick Draw dataset, and I want to explore that in the future. The example is also built on p5.js, which made me wonder whether I could combine pix2pix with SketchRNN.

My output
Official output

The idea is that SketchRNN will help the user finish the drawing and the pix2pix model will then help color it. This video shows what I got. When the user clicks on the sketch board, it automatically generates a figure. The figure is sketchy, and the user can refine it using the drawing functions provided by p5. Once the user is satisfied with the work, they can click the transfer button to let pix2pix() color it.

The sample sketch works quite well

Though the results were not as good as I expected, they showed that the two models were at least doing something. I came up with two ways to improve the outcome: 1. retrain pix2pix() on a low-quality sketch dataset such as Quick Draw to improve its ability to recognize rough drawings; 2. retrain SketchRNN() on high-quality doodles so it can generate sketches clean enough to be recognized by pix2pix(). Due to time limits, the only SketchRNN model used was the cat model. Given more time, I might add more models for the user to choose from and figure out a way for the machine to predict what the user wants to draw and pick the model automatically. Besides retraining the models, future work also includes improving the UI and adding more instructions.

Case Study – Magenta

Magenta is a research project developed by Google AI that aims to use machine learning as a tool in the creative industry.

It’s powered by TensorFlow and has a JavaScript API called Magenta.js. Its website, https://magenta.tensorflow.org, shows how they used Magenta to generate music and create sketches. I found one project that is very interesting: Draw Together with a Neural Network.

It allows you to create sketches together with an AI. It has several models. Once you choose one, you only need to draw part of the figure and the AI will help you finish the rest of it. 

If you are wondering how they achieved this, you can check out this website, which describes how they taught the machine to draw using a neural network called sketch-rnn. The dataset used to train this network came from Quick Draw: they collected this huge dataset by asking users to play their game and recording their sketches.

 

By combining human intelligence and machine learning techniques, this project may point toward a new way of creating artworks in the future.