Week 5: Training CIFAR-10

For this week’s assignment I trained the CIFAR-10 CNN, modifying the number of epochs and the dropout rates. I trained on the CPU, as that worked best with my computer’s specs.

For the experiment I was interested in how dropout rates affect accuracy. An article linked from the class slides describes accuracy as a curve over the dropout rate: accuracy is initially higher, as the remaining neurons are pushed to be utilized to a higher potential, but it drops off steeply as the dropout rate nears a ratio of 0.2.

To test this, I first modified the number of epochs, changing it from 100 to 9 so the experiments would run quickly. With the epochs at 9 and no other changes, training ran in 8 minutes and 48 seconds with an accuracy of 0.4236. The original dropout rates of 0.25, 0.25, and 0.5 sit in the range associated with the highest accuracy.

Next, I changed the dropout rates to test whether the accuracy would be lowered. I raised them from 0.25 to 0.8, from 0.25 to 0.7, and from 0.5 to 0.9; with these much higher rates, training ran in 8 minutes and 33 seconds with a new accuracy of 0.2215.
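For reference, here is a minimal sketch of a CIFAR-10 model with those three dropout rates exposed as parameters. It follows the layout of the standard Keras CIFAR-10 example; the layer sizes are illustrative and not necessarily identical to the assignment script.

    # Sketch only: layer sizes follow the standard Keras CIFAR-10 example and are
    # assumptions, not necessarily the exact assignment script.
    from tensorflow.keras import layers, models

    def build_model(rate1=0.25, rate2=0.25, rate3=0.5):
        # Defaults are the original rates; the second run used 0.8, 0.7, and 0.9.
        return models.Sequential([
            layers.Conv2D(32, (3, 3), padding='same', activation='relu',
                          input_shape=(32, 32, 3)),
            layers.Conv2D(32, (3, 3), activation='relu'),
            layers.MaxPooling2D((2, 2)),
            layers.Dropout(rate1),      # first rate:  0.25 -> 0.8

            layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
            layers.Conv2D(64, (3, 3), activation='relu'),
            layers.MaxPooling2D((2, 2)),
            layers.Dropout(rate2),      # second rate: 0.25 -> 0.7

            layers.Flatten(),
            layers.Dense(512, activation='relu'),
            layers.Dropout(rate3),      # third rate:  0.5 -> 0.9
            layers.Dense(10, activation='softmax'),
        ])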

This illustrates that training with a lower epoch count and higher dropout rates (which increase the number of neurons ignored during training) does indeed lower the accuracy, in this case by roughly half: the accuracy of 0.4236 at 9 epochs was not especially high to begin with, and it fell to 0.2215 once the dropout rates were raised.

Week 5: Train CIFAR-10 CNN by Jonghyun Jee

Introduction

This week’s assignment is to train a CIFAR-10 CNN on our own, based on what we learned in the last class. By trying different values for the batch size and the number of epochs, I was able to observe some noticeable characteristics of deep learning.

Machine Specs

CPU: Intel® Core™ i5-6200U CPU @ 2.30GHz

GPU: Intel® HD Graphics 520

RAM: 8.00 GB (7.08 GB usable)

Experiments

First, I want to address why terms such as batch size, epochs, and iterations were introduced in the first place. We use these concepts when the data we are dealing with is too large to pass to the computer all at once. To work around this, we divide the data into smaller batches and feed them to the computer one by one, updating the weights of the neural network at the end of each step so the model fits the given data.
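As a concrete example of how these quantities relate, here is a small sketch using CIFAR-10’s 50,000 training images (the batch size and epoch count are just illustrative values):

    import math

    train_samples = 50_000          # CIFAR-10 training set size
    batch_size = 64                 # illustrative value
    epochs = 20                     # illustrative value

    iterations_per_epoch = math.ceil(train_samples / batch_size)  # weight updates per epoch
    total_iterations = iterations_per_epoch * epochs              # weight updates overall

    print(iterations_per_epoch)     # 782
    print(total_iterations)         # 15640

Each iteration processes one batch and performs one weight update, so a larger batch size means fewer updates per epoch.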

Batch size and the number of epochs will, of course, depend on a variety of factors, such as the size and type of the training data, hardware performance, and so on. For this week’s assignment, I ran three experiments, playing around with some seemingly extreme values.

I began with standard starting points (batch size: 64, epochs: 20).

This training took 68 minutes to complete and yielded an accuracy of 0.7189. Given more time (or a better computer), I would increase the number of epochs until the model converges. The result was not unfavorable, but it still fell short of expectations, as our in-class Fashion-MNIST example yielded far better accuracy in a shorter amount of time.

Then I tried to see how this works (batch size: 1024, epochs: 1).

It ended in three minutes and yielded a poor result (0.287). A batch size of 1024 is a bit much for my computer.

And I also tried this one (batch size: 1, epochs: 10).

It took more than 20 minutes for a single epoch, yielding 0.3589; I had to cancel right after the first epoch.

After going through a number of training runs, a question came up in my mind: is there a better way of finding the most efficient batch size and epoch count, without changing the values and re-training everything over and over?
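One common partial answer for the epoch count is an early-stopping callback: set the number of epochs high and let training stop once the validation loss stops improving. The sketch below is illustrative rather than part of the assignment code; model, x_train, and the other names are assumed to come from the existing training script.

    from tensorflow.keras.callbacks import EarlyStopping

    # Stop once validation loss has not improved for 3 epochs in a row,
    # keeping the weights from the best epoch seen so far.
    early_stop = EarlyStopping(monitor='val_loss', patience=3,
                               restore_best_weights=True)

    model.fit(x_train, y_train,
              batch_size=64,
              epochs=100,                        # upper bound; training usually stops earlier
              validation_data=(x_test, y_test),
              callbacks=[early_stop])

The batch size, unfortunately, still mostly comes down to trying a few values, as the conclusion below notes.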

Conclusion

Arguably, it’s hard to generalize from such a small number of results; and yet, combined with a bit of research, I could observe some interesting features:

  1. Larger batch sizes result in faster progress through the training data, but that doesn’t mean we should simply maximize the batch size. We have to consider our machine’s performance, and on top of that, we should keep in mind that larger batch sizes don’t always converge as fast. Smaller batch sizes train more slowly but can converge faster. So we can probably start with the largest batch size that doesn’t exceed memory, and then lower it if a batch takes longer than about a minute.
  2. As the number of epochs increases, the weights of the neural network are updated more times, and the model moves along the curve from under-fitting to optimal to over-fitting. So for epochs we can start with a number between 5 and 10, and if the loss stops decreasing, stop training and use that epoch count.
  3. Training took longer than I had expected. If we fail to set good values for the batch size and number of epochs, training is definitely going to be time-consuming and inefficient.
  4. There are some scripts that help users calculate an optimal batch size; but still, since data sizes, data types, and the objectives of the code all vary, we have to plug in values and train multiple times anyway to see what the most suitable batch size is for our own project.

Week 05 Assignment: Train CIFAR-10 CNN (Erdembileg)

Background: For this week’s assignment, we were introduced to epochs and other inner workings of training. We were assigned to tweak the different variables and observe the effects on end results such as loss and accuracy. I first tweaked the number of epochs (the number of passes over the training data), then the batch size, and later the pool size, to test performance.

Machine Specs:

Variations of the Epoch Count: I first changed the number of epochs from 100 to 10 to see results faster.

10 Epochs

It took about 25 minutes to complete the whole run. Immediately we can see that 10 epochs result in an accuracy of only 0.4077 and a loss of 1.6772. We can’t really conclude anything from a single experiment, so we need to refer to the next photo, of 8 epochs.

8 Epochs

This time it took 20 minutes to complete the whole run. The test ended with a loss of 1.7213 and an accuracy of 0.401. The loss increased quite a bit over a difference of only 2 epochs, while the accuracy difference was only about 0.007.

5 Epochs

Let’s check the 5-epoch run. It was actually pretty fast compared to the previous two tests, at 12 minutes. However, the accuracy is substantially lower at 0.3636, with a loss of 1.8298.

It seems that the effectiveness of running more epochs follows a curve: you get substantial improvements only up to a certain point, after which the improvements come more and more slowly while requiring more and more processing.

Batch Size Experiment:

Looking through sites like Quora, I came across an article stating that different batch sizes can affect the accuracy of the model. After reading through people’s answers and posts, a batch size of 64 seemed like a good starting point for an experiment. I ran tests with batch sizes of 64, 128, and 256, each at 5 epochs.
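A sketch of how such a comparison could be scripted is shown below; it is illustrative only, and build_model() plus the CIFAR-10 arrays (x_train, y_train, x_test, y_test) are assumed to come from the existing training script.

    # Illustrative sketch: build_model() and the data arrays are assumed helpers
    # from the existing CIFAR-10 training script.
    results = {}
    for batch_size in (64, 128, 256):
        model = build_model()
        model.compile(optimizer='rmsprop',
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])
        model.fit(x_train, y_train, batch_size=batch_size, epochs=5, verbose=2)
        results[batch_size] = model.evaluate(x_test, y_test, verbose=0)

    for batch_size, (loss, acc) in results.items():
        print(f"batch {batch_size}: loss={loss:.4f}, accuracy={acc:.4f}")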

Batch 64

Overall, the test took about 12 minutes to finish 5 epochs at a batch size of 64, and immediately we can see a huge difference in accuracy compared to the 2048-batch-size test. The accuracy is a whopping 0.5594, with a loss of 1.2511.

I wasn’t expecting such a strong result from a small batch size.

Batch 128

This time we doubled the batch size from 64 to 128. This made the run slightly faster, at 11 minutes and 30 seconds. However, we are starting to see a drop in accuracy and an increase in loss: accuracy is 0.5223 and loss is 1.3298.

It would seem that the higher the batch size, the less effective it becomes at 5 epochs.

Batch 256

Test loss was 1.4259 and accuracy was 0.4896, with a run time of about 10 minutes.

Once again we see that increasing the batch size does not help when we keep the epochs at 5.

After running three tests varying the batch size, I realized that increasing the batch size past 64 is not a good idea if I want high accuracy and low loss. I’m now interested in finding out whether lowering the batch size below 64 will improve the results, so I decided to test this idea by running with a batch size of 32.

Batch 32

I was surprised that this test produced a much better result: 0.6237 accuracy and a loss of 1.08, with a run time of 13 minutes.

What I now understand from the two kinds of tests I conducted is that gains from additional epochs start to stagnate after a certain number of runs: results improve significantly up to a point, and each run after that produces a smaller improvement than the one before it.

Batch size also plays a large role in the accuracy and loss of the test. It seems to me that 32–64 is a solid batch-size range for this test, and I’m sure the results would have been better if we had increased the number of epochs.

It would seem that there must be an ideal combination of these two elements for training a model successfully.

Week 05: Train CIFAR-10 CNN – Katie

Introduction

For this week’s assignment, we are training a CIFAR-10 CNN. Before doing anything involving actual training, I first wanted to understand the CIFAR-10 dataset, because I’ve never worked with it before. I read that the dataset contains 60,000 images in total (the training set is 50,000 of them), with each image at 32×32 pixels. These are broken down into 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. Each class contains 6,000 images.

cifar-10 dataset classes
https://medium.com/@jannik.zuern/training-a-cifar-10-classifier-in-the-cloud-using-tensorflow-and-google-colab-f3a5fbdfe24d
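A quick way to confirm these numbers is the Keras built-in loader; this is just a minimal check, separate from the training script.

    from tensorflow.keras.datasets import cifar10

    (x_train, y_train), (x_test, y_test) = cifar10.load_data()

    print(x_train.shape)                # (50000, 32, 32, 3): 50,000 training images, 32x32, RGB
    print(x_test.shape)                 # (10000, 32, 32, 3): 10,000 test images
    print(len(set(y_train.flatten())))  # 10 classes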

Machine Specs

machine_specs

Tests

I ran three different tests to examine the relationship between epochs and batch size and their effect on accuracy. Test 1 is high epochs, high batch size; Test 2 is low epochs, high batch size; Test 3 is low epochs, low batch size.
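For reference, the wall-clock time of a single run can be recorded with something like the sketch below (illustrative only; model and the data arrays are assumed to come from the assignment script, and the settings shown match Test 1).

    import time

    # Time one training configuration end to end (settings shown match Test 1).
    start = time.time()
    model.fit(x_train, y_train,
              batch_size=2048,
              epochs=100,
              validation_data=(x_test, y_test))
    loss, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"accuracy: {acc:.4f}, elapsed: {(time.time() - start) / 60:.1f} minutes")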

1: (Epochs: 100, batch size: 2048, 20 hours)

The first test I did (not sensibly) was to just run the program as-is. It was really painful, but I wanted to see it through to find out what it could do. I knew it was going to take forever to complete, so I tried running it while I was asleep, forgetting that my computer also goes to sleep very quickly when I’m not using it. I had to keep coming back to wake the computer up, so this ended up taking 20 hours. It resulted in an accuracy of 0.6326. I was pretty surprised by this, considering the testing we did in class reached a much higher accuracy, much faster.

test

2: (epochs: 10, batch size: 2048, 25 minutes)

Thinking that it was the high number of epochs that made the first test take so long (even if the run had been continuous), I assumed that lowering them to 10 would go very fast. It definitely cut down the total amount of time, but it seemed to process at about the same rate per epoch as the first test. The final result was an accuracy of 0.4093.

test2

3: (Epochs: 10, Batch size: 256, 20 minutes)

Finally, I decreased the batch size as well, testing low epochs and a low batch size. It surprised me that, despite using a much smaller batch size than in Test 2, it took about the same amount of time. Even more surprising was that it resulted in an accuracy of 0.5453, meaning the lower batch size produced a more accurate result.

test3

Conclusion

I’m still a bit confused about the different aspects that go into CNN training, and especially how their relationships with each other affect the outcomes. But at the very least I learned how much time it takes to train on the CIFAR-10 dataset. I’m still not totally sure why it took so much more time for a less accurate result than in class, especially since the Fashion-MNIST dataset has an almost equal number of images to CIFAR-10. On quick evaluation, I see that each image in Fashion-MNIST is only 28×28, whereas CIFAR-10 is 32×32. That difference alone seems pretty small for such a big gap in the results, but CIFAR-10 images also have three color channels while Fashion-MNIST is grayscale, so each input actually carries roughly four times as many values (32×32×3 = 3,072 versus 28×28 = 784), and the natural photos themselves are more varied than clothing items on a plain background.

Week 05: Train CIFAR-10 CNN – Jinzhong

INTRODUCTION

This week’s assignment is to train a CIFAR-10 CNN ourselves using TensorFlow (Keras) and its built-in CIFAR-10 dataset downloader. In this experiment, I mainly tested different batch-size settings, as well as different optimizers, to explore how these factors relate to changes in the training result.

MACHINE

The machine I used is Azure Cloud Computing Cluster (Ubuntu 18.04):

  • Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz x2
  • 8GB Memory
  • 4.15.0-1055 Kernel 

NETWORK LAYOUTS

The network has the following architecture:

2DConv -> 2DConv -> 2DMaxPooling -> Dropout ->

2DConv -> 2DConv -> 2DMaxPooling -> Dropout ->

Flatten -> FullyConnected -> Dropout -> FullyConnected -> Dropout 

The architecture looks good to me: it has four convolutional layers to extract features from the source images, plus dropout between the layers, which helps the network home in on the distinguishing features of each type of picture rather than relying on any particular neurons.

EXPERIMENTS

Firstly, I tried to modify the batch size to 64 (the default 1024 is so scary…), and I got the following training outcomes:

The loss stayed mostly above 1, and the final accuracy was not good enough, so I narrowed the batch size down further to 32; the result below was better than the first trial:

2

Now the accuracy is 4% better than in the previous run. That raises a question: is a batch size of 32 small enough to get a good result? Next, I divided the batch size by 2 again, down to 16, to test whether it is true that a smaller batch size does a better job in this scenario:

The outcome was positive: the accuracy was above 70% this time. So we can assume that a smaller batch size improves accuracy up to this point. (But it cannot be made arbitrarily small; we can imagine that a batch size of 0 contributes nothing…)

The second experiment explores whether RMSprop is the right optimizer for this scenario. I used a batch size of 32 and 10 epochs of training with two different optimizers: RMSprop and Adam(lr=0.0002, beta_1=0.5).
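A sketch of how the two runs differ in code is below; only the optimizer changes between them. This is illustrative: build_model() and the data arrays are assumed to come from the existing training script, and newer Keras versions spell the learning-rate argument learning_rate rather than lr.

    from tensorflow.keras.optimizers import RMSprop, Adam

    # Same model, same data, same batch size and epochs; only the optimizer differs.
    for name, optimizer in [('RMSprop', RMSprop()),
                            ('Adam', Adam(learning_rate=0.0002, beta_1=0.5))]:
        model = build_model()
        model.compile(optimizer=optimizer,
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])
        model.fit(x_train, y_train, batch_size=32, epochs=10,
                  validation_data=(x_test, y_test), verbose=2)
        loss, acc = model.evaluate(x_test, y_test, verbose=0)
        print(f"{name}: test accuracy {acc:.4f}")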

During testing, we surprisingly found that the Adam optimizer is much better than RMSprop in this network architecture. It reaches 70% accuracy as early as the 5th epoch.

And the overall accuracy after 10 epochs of training was 0.7626.