Week07 – Midterm Proposal

Originally, I wanted to do something with GANs. I saw how good GANs are at creating novel content just by training on a dataset. In particular, I was interested in the histogram-of-gradients visualizations and latent space interpolations. I was initially drawn to this because I saw some GIFs of these online, and the effects are rather stunning, so I wanted to create my own. I know how generative adversarial networks work, and I believe that through latent interpolation, people would better understand how they work internally. Because the generator network is essentially a deconvolutional network that takes random noise as input, we can perturb that latent vector slightly in order to move in some direction (which direction, I am not sure), and by moving around we will eventually see the range of effects the model can generate. I got somewhat confused at this stage, because a lot of the GAN models had unusual inputs, so I was not sure how to manipulate the latent variable. If I had more time, I would explore the model code more closely to see how I could interpolate the models the way I desired. It was also suggested that I use TL-GAN; however, I could not figure out how to make it work for what I wanted (style transfer).
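The interpolation itself is simple once you can feed the generator a latent vector: blend two noise vectors and decode each blend. A minimal sketch, where `generator` is a hypothetical stand-in for a trained GAN's generator network:

```python
import numpy as np

def generator(z):
    # Stand-in for G(z); a real GAN generator would decode z into an image.
    return np.tanh(z)

def interpolate(z_start, z_end, n_steps=8):
    """Linearly interpolate between two latent vectors, decoding each blend."""
    frames = []
    for alpha in np.linspace(0.0, 1.0, n_steps):
        z = (1 - alpha) * z_start + alpha * z_end
        frames.append(generator(z))
    return frames

rng = np.random.default_rng(0)
z_a = rng.standard_normal(100)  # latent dimension is model-specific
z_b = rng.standard_normal(100)
frames = interpolate(z_a, z_b)
print(len(frames))  # 8 decoded steps from z_a to z_b
```

Stitching the decoded frames together is what produces those interpolation GIFs; the hard part (which stopped me) is knowing what the model expects as its latent input.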

In hindsight: Ultimately, I abandoned the project because I didn’t know how to interpolate in latent space for the models that I downloaded (CycleGAN), so I instead switched focus to something closer to the domain of style transfer. One that looked particularly interesting to me was CycleGAN, which allows for style transformations between domains without an explicit one-to-one matching in the training set.

Note – I wrote this after completing the midterm

Author: Andrew Huang

Week08 – Midterm Documentation – Andrew Huang

Title: Picture-to-Picture Style Transfer Using CycleGAN

For my midterm I wanted to build and reproduce the results of CycleGAN. Originally I wanted to do a kind of histogram-of-gradients effect, but I saw that it wasn’t trivial to choose a direction to warp toward when picking a feature, so I decided to shift my idea closer to style transfer. The style transfer effects of CycleGAN were really impressive to me, especially the horse animation I saw in the repo. The particular model I decided to train had not been done before; however, a suitable dataset already existed for it. The dataset I trained on was vangogh2photo, a collection of 200 Van Gogh paintings and 200 vacation-style photos. Note that the training set does not have corresponding images for each painting; that is, we do not need a specific 1:1 matching in order to train on the data. In addition to style transferring into that domain, CycleGAN has a unique property in the way it trains, allowing one to see a transferred photo being turned back into an approximation of the original image.
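The "no 1:1 matching required" property comes from CycleGAN's cycle-consistency loss: one generator G maps photos toward the painting domain, a second generator F maps back, and F(G(x)) is penalized for differing from the original x. A toy sketch of that loss with simple invertible stand-in functions (the real generators are trained convolutional networks):

```python
import numpy as np

# Toy stand-ins for CycleGAN's two generators.
def G(x):   # photo -> painting domain
    return 2.0 * x + 1.0

def F(y):   # painting -> photo domain (chosen here to invert G)
    return (y - 1.0) / 2.0

def cycle_consistency_loss(x):
    """Mean L1 distance between x and its round trip F(G(x))."""
    return np.abs(F(G(x)) - x).mean()

x = np.random.rand(4, 8)          # a toy "batch of photos"
print(cycle_consistency_loss(x))  # ~0, since F inverts G here
```

In the real model this reconstruction term is added to the usual adversarial losses, which is what lets training work without paired images.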

Training

After training for about a week (I used checkpointing to make sure the model could continue training incrementally after the Intel server’s walltime expired), I think I got decent results. The Van Gogh artwork was a little too different from the vacation photos, so the results weren’t perfect; nonetheless, I cherry-picked some good examples which can be demoed to show that the model did indeed train well on the data.
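The checkpointing pattern is just "save state every epoch, load the latest state on startup," so a killed job loses at most one epoch. A minimal sketch with a toy training loop and a hypothetical checkpoint file (the real project saved CycleGAN weights instead):

```python
import os
import pickle

CKPT = "model_ckpt.pkl"  # hypothetical checkpoint path

def train(total_epochs, epochs_this_session):
    """Resume from the latest checkpoint, train a few epochs, save after each."""
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            state = pickle.load(f)      # pick up where the last session ended
    else:
        state = {"epoch": 0, "weights": [0.0]}  # toy 'weights'

    end = min(state["epoch"] + epochs_this_session, total_epochs)
    while state["epoch"] < end:
        state["weights"][0] += 0.1      # stand-in for one epoch of training
        state["epoch"] += 1
        with open(CKPT, "wb") as f:     # checkpoint after every epoch
            pickle.dump(state, f)
    return state["epoch"]

# Two "sessions", as if the job was killed by walltime and restarted:
print(train(total_epochs=10, epochs_this_session=4))  # 4
print(train(total_epochs=10, epochs_this_session=4))  # 8
```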

Example Output

More photos: https://drive.google.com/open?id=1ErFCdAxX70y83rcGo3_gaQUIiFe_5cVp

Future work and Conclusion

Originally I wanted to build a site that serves the model to users, so that they can upload their own photo and see it style transferred. Overall, I think this project taught me a lot about style transfer and GANs in general. In the future, I will make a note to collect more and better data to make sure that the generated results are more realistic and interactive.

Week05 – Trained CIFAR-10 Model

For this week’s assignment, I trained the CIFAR-10 CNN model for 100 epochs. It took about 30 minutes because I ran it on the Intel server. I actually increased the batch size to 2048, because the compute on the server is fast enough that moving small batches in and out of memory too often would actually slow training down. Other than that, I didn’t tune any other parameters. The training speed on the cluster is extremely fast compared to my MacBook, taking only 17 seconds per epoch compared to the hour it would take on my laptop. I thought about changing the optimizer to Adam, but I figured RMSprop would be better for this situation. In addition, I added model checkpointing so that I can stop the program and continue training another time. Overall, this was a good learning exercise in setting up a Jupyter notebook and checkpointing so that I can reuse models in the future. The Intel machine learning server was also very powerful and great to use.
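For context on the batch size change: with CIFAR-10's 50,000 training images, going from Keras' default batch size of 32 up to 2048 cuts the number of weight updates per epoch from 1,563 to just 25, which is where much of the per-epoch speedup on fast hardware comes from:

```python
import math

CIFAR10_TRAIN = 50_000  # CIFAR-10 training set size

def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates (batches) in one pass over the data."""
    return math.ceil(n_samples / batch_size)

print(updates_per_epoch(CIFAR10_TRAIN, 32))    # 1563 (Keras' default batch size)
print(updates_per_epoch(CIFAR10_TRAIN, 2048))  # 25   (the larger batch used here)
```

The trade-off is that fewer, larger updates can change convergence behavior, so the same epoch count is not guaranteed to reach the same accuracy.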

Overall I got a test accuracy of 65.4%, so I am pretty happy with the results for such a shallow CNN. 

Epoch 00100: val_acc did not improve from 0.65740
10000/10000 [==============================] - 1s 136us/step
Test loss: 0.9971296675682068
Test accuracy: 0.654

Notebook and weights: https://drive.google.com/open?id=1iml5bOarTovsm0hlL71hlhOKdAOMMSn_

Week03 – Video Classifier with TTS – Andrew Huang

For this week’s assignment, I tried to get something more ambitious working with JavaScript, but certain roadblocks (namely OAuth API keys) prevented me from continuing. I first implemented the video classifier from the example code, and then added text-to-speech from the examples in the GitHub documentation. Seeing that the model’s predictions were sometimes suboptimal got me thinking: what if I could play a game of telephone with the model? Given some random image A, it would classify A; I would pick the class with the second-largest probability, search for images of that class online, and repeat the process. I wanted to see how far I would drift from the original image. It would be very interesting to see how far I could get, though I shouldn’t end up anywhere outside of ImageNet’s 1000 classes. However, I couldn’t get Imgur’s API keys to play nice, so I decided to stick with what I had for time’s sake. Creativity-wise, I’d say I didn’t get very far, but I think the real-time video classifier is still a pretty useful tool to have.
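The "telephone" loop is easy to sketch: classify, take the runner-up label, fetch a new image of that label, and repeat. Here is the runner-up selection in Python (the actual project ran MobileNet in the browser; the class names and probabilities below are made up for illustration):

```python
import numpy as np

# Made-up classifier output; in the real project these came from MobileNet.
CLASSES = ["phone", "remote", "calculator", "mouse"]

def second_choice(probs, classes):
    """Return the class with the second-highest probability."""
    order = np.argsort(probs)[::-1]  # indices sorted high -> low
    return classes[order[1]]

probs = np.array([0.60, 0.25, 0.10, 0.05])
print(second_choice(probs, CLASSES))  # remote

# The telephone game: repeatedly take the runner-up, fetch an image of
# that class via an image-search API, classify it, and watch the labels drift.
```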

MobileNet classification of my phone

Code: https://drive.google.com/open?id=1UMI0viojNqtMdsvSC0coh5Zez88s1GmK

Week 02 Assignment: Case Study Research – Andrew Huang

Resnet vs ODENet

Neural Ordinary Differential Equations

NeurIPS, one of the most prestigious AI conferences, published several new papers last year detailing the bleeding edge of modern machine learning. One of the “top papers” of that research was Neural Ordinary Differential Equations. At first the idea seems hard to understand, but I will try to explain it clearly.

A neural network is a universal function approximator; however, it computes in a discrete manner, so as you add more layers, the amount of computation and memory needed to train it increases linearly with depth. This is very important because in memory-constrained environments (phones, IoT devices) it becomes more and more important to keep memory usage and power consumption low. Big models like VGG are not feasible in these constrained environments, and the disadvantages of large, deep models become apparent. Additionally, deeper models are harder to train because of “vanishing gradients”: as the error signal is backpropagated through many layers, the gradients shrink toward zero and the early layers stop learning. A couple of years ago, Microsoft researchers came up with ResNet, which adds a simple principle to feed-forward neural networks to rectify this problem. Think of each layer of a neural network as a discrete matrix operation: a forward pass through one layer can be represented as $latex y_1 = W_1 x + b_1$. ResNet introduces a small change: instead of each layer’s output being only a function of the previous layer’s output, the previous output is also added back in, so the next layer becomes $latex y_2 = W_2 y_1 + b_2 + y_1$ in a residual block. This small change has big implications. Each residual update looks exactly like one step of Euler’s Method for solving an ordinary differential equation, so in the limit of many layers the network can be treated as a continuous function defined by an ODE and evaluated with an ODE solver, which can vastly improve its approximations. In the future, if this were widely adopted, neural networks could improve their performance and training time, and also be more efficient in memory-constrained environments.
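To make the ResNet–Euler connection concrete: stacking residual blocks that compute $latex y \leftarrow y + h f(y)$ is literally Euler's method for $latex dy/dt = f(y)$. A small numerical sketch, using a fixed toy function in place of a learned layer:

```python
import numpy as np

def f(y):
    # A fixed "layer" function; in a real ResNet this would be a learned block.
    return np.tanh(y)

def resnet_forward(y0, n_layers, h=0.1):
    """A stack of residual blocks: y_{t+1} = y_t + h * f(y_t)."""
    y = y0
    for _ in range(n_layers):
        y = y + h * f(y)   # one residual block == one Euler step
    return y

def euler_solve(y0, t_end, h=0.1):
    """Euler's method for dy/dt = f(y) from t=0 to t=t_end."""
    y, t = y0, 0.0
    while t < t_end - 1e-9:
        y = y + h * f(y)
        t += h
    return y

y0 = np.array([1.0, -0.5])
# 10 residual blocks with step 0.1 == Euler integration to t = 1.0
print(np.allclose(resnet_forward(y0, 10), euler_solve(y0, 1.0)))  # True
```

The ODE-Net paper takes this to the continuous limit, replacing the fixed stack of blocks with an adaptive ODE solver.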

Sources:

https://arxiv.org/pdf/1806.07366.pdf

https://github.com/llSourcell/Neural_Differential_Equations/blob/master/Neural_Ordinary_Differential_Equations.ipynb