Train CIFAR-10 CNN | aiarts.week05

This assignment is an exercise in training a CIFAR-10 CNN on our own machines. Here is a complete log of my attempt.

Machine Specs


I am training on a MacBook Air (early 2015) running macOS Mojave 10.14.6, with a 1.6 GHz Intel Core i5 CPU.

*This machine is equipped with an Intel HD Graphics 6000 GPU, but I believe it doesn't meet the bar for our training tasks, so I chose to run the training on the CPU.

Training Setup

batch_size = 512
num_classes = 10
epochs = 30
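
For reference, here is a minimal sketch of what this setup looks like in Keras (TensorFlow 2.x assumed; the layer stack follows the standard Keras CIFAR-10 CNN example and is not necessarily the exact script used here):

from tensorflow import keras
from tensorflow.keras import layers

batch_size = 512
num_classes = 10
epochs = 30

# Load and normalize CIFAR-10 (the dataset download happens here).
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# A small conv net: two conv blocks, then a dense classifier.
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                  input_shape=x_train.shape[1:]),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(loss="categorical_crossentropy",
              optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
              metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
          validation_data=(x_test, y_test), shuffle=True)

score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])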

Total Time Spent

Data download: 30 min
Training: 1 h 50 min in total; ~230 s/epoch, ~4 ms/sample on average

Runtime Performance & Outcome

Full Training Process

Final Outcome:

Test loss: 1.0597462123870849
Test accuracy: 0.6278
 

Reflection

In this training task, I adopted a relatively large epoch count and batch size, because I was really curious what the training performance would be like with a medium-weight task on my machine.

Surprisingly, downloading the data took a very long time, probably because of the dataset's size and my network conditions.

Training also took a considerably long time, but it was actually a lot better than I'd expected, given the number of epochs and the batch size I chose. Training ran at approximately 4~5 ms/sample, which works out to around 230 s/epoch, since each epoch passes over the 50,000 training samples.
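
As a quick sanity check on those numbers (assuming the full 50,000-image CIFAR-10 training set):

# 50,000 samples per epoch at ~4.6 ms per sample is roughly 230 s per epoch
print(round(50_000 * 4.6 / 1000))   # 230 (seconds)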

As the training went through more and more epochs, it was easy to see the loss value steadily decrease, from 2.1606 in the first epoch to around 1.2 in the 30th, with the per-epoch drop ranging from 0.01 to 0.12. Concurrently, the accuracy increased steadily with each epoch, going from 0.1990 to 0.6278 over the 30 epochs. It is clear that the more epochs the training goes through, the better the outcome. It's really fun to watch how these statistics change over time, and I'm really curious to see where the limit lies for this type of training in terms of loss and accuracy.

Week 5 Assignment: Training CIFAR-10 CNN – Cassie

I thought it was interesting how, in the experiment we did in class, the accuracy didn't necessarily increase every time the epoch number went up, even though the general trend was upward. I wondered whether this would still be true if I changed the epoch numbers on my own laptop.

Machine specs:

When I opened the week05-02-trainCNN Python code I was surprised to see such large numbers for the epochs, especially because in class we kept the epochs to double and single digits. I decided to run the code just as it was, as a control/starting point (100 epochs + 2048 batch size). However, as soon as I ran it, I instantly regretted it – I was 8 minutes in and only on the 4th epoch when I decided to terminate the program.

Instead, I consulted this Stack Overflow thread about what a good starting point for the batch size would be. One person mentioned that a batch size of 32 is pretty standard, so I decided to use that batch size as the control for testing out different epoch numbers.
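
For reference, these were the only lines I changed in week05-02-trainCNN for these tests (shown with the Test 1 values; the variable names are assumed from the standard Keras CIFAR-10 example, not copied from the file):

batch_size = 32   # the original file had 2048
num_classes = 10
epochs = 1        # the original file had 100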

Test 1

  • Epochs: 1
  • Batch size: 32
  • Time spent: 4 minutes
  • Accuracy: 0.4506

That was honestly a lot more accurate than I thought it was going to be. For the next test, I increased the number of epochs to 5 under the assumption that the accuracy would increase.

Test 2

  • Epochs: 5
  • Batch size: 32
  • Time spent: 22 minutes
  • Accuracy: 0.6278

This was a significantly higher accuracy than I was expecting, since in my mind 5 seems like a pretty low epoch number compared to the original 100 written in the code. Although the overall accuracy increased, it was interesting to note that after the first epoch the accuracy was only 0.4413, which was lower than the accuracy in Test 1. I assumed it would be the same or at least higher, considering I am using the same computer and the same numbers except for the number of epochs.

Now I was curious about how changing the batch size would affect the accuracy. I was also curious how it would affect the time, because when I initially ran the 100-epoch + 2048-batch-size code it was moving through epochs at a faster rate than my first two tests (even though I was still too impatient to sit through it). I decided to keep the number of epochs at 5 for these tests as a control, so I could compare the results to Test 2.

Test 3

  • Epochs: 5
  • Batch size: 100
  • Time spent: 18 minutes
  • Accuracy: 0.546

As suspected, this test took less time. However, what surprised me was that the accuracy was lower in comparison to Test 2. For some reason I had assumed that if the batch size was higher, then the accuracy would also be higher.

The biggest takeaway from this experiment is that training takes a lot of time! At least in these three tests, the one that took the most time gave the highest accuracy, which made the time seem worth it. I also didn't experiment nearly enough to find the ideal factors for an optimal accuracy rate – it definitely seems that it takes a very specific combination of factors and a lot of testing to get the desired results.

Week 5 – Train CIFAR-10 CNN – Eszter Vigh

ev821

Machine Specs

  • Shuts down all the time for no apparent reason… 
  • Quickly running out of space
  • Won’t open Atom half the time
  • Atom Live Server won’t always work 
  • Let’s see how training goes!

Optimization

  • I learned optimization is a little complicated for me to tackle right now with no machine learning background, so I did some background research on alternative optimizers to the one we use, Root Mean Square Propagation (RMSProp).
    • So first of all, what is RMSProp?
      • It’s a gradient based optimizer.
      • It uses a moving average of squared gradients to normalize the gradient itself.
      • Source
    • So what are the alternatives? (A sketch of how to swap each of these into the training script follows this list.)
      • AdaGrad (Adaptive Gradient Algorithm)
        • increases the learning rate for sparser parameters and decreases the learning rate for ones that are less sparse
        • Source
      • Adam (not my best friend… but the optimizer)
        • It’s both RMSProp and AdaGrad combined! Wow!
        • Source
      • SGD (The competitor to Adam)
        • SGD only computes the gradient on a small subset or random selection of the data examples. SGD produces the same performance as regular gradient descent when the learning rate is low.
        • Source
      • ICLR 2019
        • The combination of SGD and Adam.
        • Source 
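
To make the comparison concrete, here is a minimal sketch of swapping optimizers in the Keras script (the model variable and compile settings are assumptions based on the sample code, not the exact file):

from tensorflow import keras

# Pick one optimizer; everything else in the script stays the same.
opt = keras.optimizers.RMSprop(learning_rate=1e-4)    # what the sample uses
# opt = keras.optimizers.Adagrad(learning_rate=0.01)  # AdaGrad
# opt = keras.optimizers.Adam(learning_rate=1e-3)     # Adam
# opt = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)  # SGD

model.compile(loss="categorical_crossentropy",
              optimizer=opt,
              metrics=["accuracy"])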


Test Epoch Test 1/3 || Test Accuracy: 0.1853

  • Time Spent: 3 minutes
  • Code completely the same, just changing the Epochs to 1. 
  • Findings: It's worth running the model longer than just the one Epoch. (So, yes, running just one Epoch, while convenient, sucks in terms of accuracy… 18% is horrific.)
  • Thoughts: I wish I was patient enough to sit through more than one Epoch. 


Test Numb_Class Test 1/2 || Test Result: ERROR

  • Time Spent: 5 minutes
  • Code changed to 10 epochs (for the sake of my sanity, more on the 100 epoch test later) and the Numb_Class was changed to 5.
  • Findings: An IndexError! There is a note about index 6 being out of bounds with size 5… (see the quick reproduction after this list)
  • Thoughts: I guess this means 1 won’t work! 
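
The error makes sense: CIFAR-10 labels run from 0 to 9, so one-hot encoding them with fewer classes fails as soon as a label outside the range shows up. A tiny reproduction, assuming the sample script one-hot encodes labels with keras.utils.to_categorical the way the standard example does:

import numpy as np
from tensorflow import keras

y = np.array([[6]])                                   # a CIFAR-10 label for class 6
print(keras.utils.to_categorical(y, num_classes=10))  # fine: [[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
keras.utils.to_categorical(y, num_classes=5)
# IndexError: index 6 is out of bounds for axis 1 with size 5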


Test Epoch Test 2/3 || Test Accuracy: 0.4245

  • Time Spent: 15-20 minutes (sometime between the two batch tests)
  • Code completely the same except for the epochs being changed to 10.
  • Findings: I was hoping for a more dramatic increase, into the 80% accuracy range, just because of our class activity. If anything, this just showed me that I was going to have to test more, and commit a good chunk of time (at least an hour) to testing.
  • Thoughts: It's funny because at the time I thought 100 epochs would take around 1 hour… just wait… because it didn't. In fact… it took such an excruciating amount of time… I almost lost faith in this homework assignment. 


Test Batch Test 1/2 || Test Accuracy: 0.4185

  • Time Spent: 20 minutes
  • Changed epochs to ten (for the sake of my sanity) and cut the batch size in half (1024). 
  • Findings: The accuracy wasn't cut in half. Sure, it went down in comparison to the 10 epochs with the larger batch, but it was realistically only about 1%, which isn't that much considering the batch size dropped by over 1000. 
  • Thoughts: I wonder what would happen with more batches. Like at what point does the amount of batches not matter? (I’m thinking about this in terms of like, how one would look at water soluble vitamins like Vitamin C… it’s weird because you cannot store the excess, so the extra Vitamin C just straight up doesn’t do you any good… is that what batches do too?)


Test Batch Test 2/2 || Test Accuracy: 0.718

  • Time Spent: 15 minutes
  • Changed epochs to ten (for the sake of my sanity) and cut the batch size to 10. 
  • Findings: The accuracy was higher, significantly higher, like 72%, which is the highest accuracy I have seen at this point. 
  • Thoughts: So the data doesn't make sense. I decrease the batch size and at first the accuracy goes down, then I bring it way down, by another thousand, and the accuracy goes up? That would mean the accuracy curve isn't linear and is curved somewhere, so that within a certain range the accuracy decreases before going up again. (That's really complicated math I don't care for… but I'm thinking of something like the graph below, from Wikipedia.)

[graph from Wikipedia]
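
One way to make sense of this is that the batch size controls how many gradient updates happen in each epoch, so shrinking it means far more learning per epoch. A quick back-of-the-envelope calculation, assuming the standard 50,000-image CIFAR-10 training set:

import math

train_samples = 50_000  # CIFAR-10 training set size
for batch_size in (2048, 1024, 32, 10):
    updates = math.ceil(train_samples / batch_size)
    print(f"batch size {batch_size:>4}: {updates:>4} gradient updates per epoch")

# batch size 2048:   25 gradient updates per epoch
# batch size 1024:   49 gradient updates per epoch
# batch size   32: 1563 gradient updates per epoch
# batch size   10: 5000 gradient updates per epoch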


Test Numb_Class Test 2/2 || Test Result: ERROR

  • Time Spent: 5 minutes
  • Code changed to 10 epochs (for the sake of my sanity, more on the 100 epoch test later) and the Numb_Class was changed to 1.
  • Findings: An IndexError! There is a note about index 6 being out of bounds with size 1…
  • Thoughts: Well my previous guess was right! Yay me!


Test Epoch Test 3/3 || Test Result: 0.637

  • Time Spent: 7 hours 25 minutes
  • Code unchanged from the sample.
  • Findings: Running it all school day doesn’t improve the results that dramatically. 
  • Thoughts: This took far longer than I thought. I am really tired actually. I didn’t even change the code! The goal with this was a baseline… it took me all day… I mean sure I knew machine learning took time. But ALL DAY? It wasn’t worth it! The accuracy is still only 64% (if you round generously). 

Week 5 – Open the Black Box – Wenhe

Brief

For this assignment, I trained a set of CIFAR-10 models based on a CNN architecture. Throughout the training process, I tested the effect of different network architectures, dropout layers, batch sizes, and epoch counts. I also played with data augmentation.

Hardware

I am using Google Cloud Platform, with an instance set up with a Tesla V4 GPU and a 4-core CPU.

Architecture

Above is a VGG-like architecture, which contains blocks of two successive conv layers, interleaved with dropout layers, and ends with fully-connected layers. Below is a table showing how it performed.

Epochs           Batch Size   Time     Accuracy   Test Accuracy
20               64           40s      69%        68%
100              64           109s     75%        72%
20               5            2200s    80%        75%
20 (augmented)   64           40s      70%        71%

As we can see, the larger the batch size, the more accurate the model; the same goes for the number of epochs. However, because training eventually reaches a stable point, increasing the epoch count does not always make the accuracy increase.

I also tried some other architectures, such as one with just two conv layers, which gives a result similar to the VGG-like one; this is likely related to the relatively small set of features involved.

In addition, the larger gap between training and test accuracy in the third test is mainly due to the small batch size, which makes it harder to generalize.
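
For readers who want to see the shape of such a network, here is a rough sketch of a VGG-style block in Keras; the filter counts, dropout rates, and augmentation settings are illustrative assumptions, not Wenhe's actual configuration.

from tensorflow import keras
from tensorflow.keras import layers

def vgg_block(x, filters, dropout=0.25):
    # Two successive conv layers, then pooling and dropout, as described above.
    x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    return layers.Dropout(dropout)(x)

inputs = keras.Input(shape=(32, 32, 3))
x = vgg_block(inputs, 32)
x = vgg_block(x, 64)
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs, outputs)

# Data augmentation along the lines of the last table row (example settings):
datagen = keras.preprocessing.image.ImageDataGenerator(
    width_shift_range=0.1, height_shift_range=0.1, horizontal_flip=True)
# model.fit(datagen.flow(x_train, y_train, batch_size=64), epochs=20,
#           validation_data=(x_test, y_test))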

Week 05 Open the Black Box: Convolution & Convolutional Neural Networks – Ziying Wang

This week, I got to train the CIFAR-10 CNN model for the first time. I trained it three times, modifying a few parameters each time. The data I collected includes acc, loss, val_acc, and val_loss for the training set; loss, acc, test loss, and test accuracy for the test set; and the time and batch size for each training run. I generated graphs with TensorBoard to show the trends in my data more clearly.

The machine specs of my laptop are as follows:

In my first training run, I set the number of epochs to 5, hoping to get data and results more quickly. With a batch size of 2048, my laptop took 6m16s to finish training and testing. The acc value on the training set continually increases and the loss value continually decreases. The val_acc and val_loss values behave similarly, with the former continually increasing and the latter continually decreasing. The testing uses the model as it was last trained, so its loss and acc values are the same as the last pair of val_loss and val_acc values. Test loss and test accuracy are as follows; 0.3625 isn't a high accuracy for my model.

For the second run, I changed the number of epochs from 5 to 10, keeping the batch size at 2048. Accordingly, the total time for training and testing increased to 14m48s. This time, the acc and loss values for my training set followed the same pattern as with 5 epochs; however, there were two glitches in val_acc and one in val_loss. The former should keep growing, but it decreased after the 8th and the 10th epochs; the latter should keep decreasing, but it bounced back a little after the 8th epoch. With the model from the last epoch, I received a lower test loss and a higher test accuracy of 0.4114, indicating that the performance didn't improve much.

For my third training run, I kept the number of epochs at 10 and decreased the batch size from 2048 to 32. This time, training and testing took 17m12s in total. There was one glitch each for val_acc and val_loss: after the 8th epoch, their values didn't follow the previous pattern. Nevertheless, after applying the final version of the trained model to the test set, I received the best pair of test loss and test accuracy figures of my three runs: the test accuracy increased a lot compared to the second result.

Conclusions from TensorBoard:

The following is the graph generated in TensorBoard; the shortest thread is the first training run, the blue one is the second, and the red one is the third. It can be concluded that when only the number of epochs changes, the values remain similar (see the shortest thread and the blue thread), whereas when I change the batch size, the data vary widely and the results differ by a large amount (see the blue and the red threads). This may result from the fact that the change I made between the first and second runs in the number of epochs wasn't very large (from 5 to 10).
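
For anyone reproducing these graphs, logging from Keras to TensorBoard is just a callback; the log directory name below is a made-up example:

from tensorflow import keras

# Write training/validation metrics to a per-run log directory,
# so each run shows up as its own curve in TensorBoard.
tb = keras.callbacks.TensorBoard(log_dir="logs/run3_epochs10_batch32")
# model.fit(x_train, y_train, epochs=10, batch_size=32,
#           validation_data=(x_test, y_test), callbacks=[tb])
# Then view with:  tensorboard --logdir logs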

My explanations:

The results can't be summarized as "the higher the number of epochs, the more accurate the model" or "the smaller the batch, the more accurate the model." But theoretically, one epoch can be interpreted as one complete pass over all the training data; the model refines itself a bit after every epoch, and it should be at its best state when val_acc stops changing. As for the batch size, the smaller the batch, the more iterations there are per epoch, and the more training updates the model goes through.
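
A practical way to act on "the model should be at its best state when val_acc stops changing", assuming the same Keras setup, is an EarlyStopping callback that halts training once validation accuracy stops improving:

from tensorflow import keras

# Stop training when validation accuracy has not improved for 3 epochs,
# and keep the weights from the best epoch seen so far.
# (The metric is named "val_accuracy" in TF 2.x Keras, "val_acc" in older Keras.)
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=3, restore_best_weights=True)
# model.fit(x_train, y_train, epochs=100, batch_size=32,
#           validation_data=(x_test, y_test), callbacks=[early_stop])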