Week 5 – Train CIFAR-10 CNN – Eszter Vigh

ev821

Machine Specs

  • Shuts down all the time for no apparent reason… 
  • Quickly running out of space
  • Won’t open Atom half the time
  • Atom Live Server won’t always work 
  • Let’s see how training goes!

Optimization

  • I learned that optimization is a little too complicated for me to tackle right now with no machine-learning background, so I did some background research on alternatives to the optimizer we have, Root Mean Square Propagation (RMSProp). (A sketch of how these could be swapped into a Keras-style training script follows this list.)
    • So first of all, what is RMSProp?
      • It’s a gradient-based optimizer.
      • It uses a moving average of squared gradients to normalize the gradient itself.
      • Source
    • So what are the alternatives?
      • AdaGrad (Adaptive Gradient Algorithm)
        • Increases the learning rate for sparser parameters and decreases it for less sparse ones
        • Source
      • Adam (not my best friend… but the optimizer)
        • It’s both RMSProp and AdaGrad combined! Wow!
        • Source
      • SGD (The competitor to Adam)
        • SGD computes the gradient on only a small, random subset of the data examples at each step. It produces roughly the same performance as regular gradient descent when the learning rate is low.
        • Source
      • ICLR 2019
        • A combination of SGD and Adam.
        • Source 
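
To make this concrete, here is a minimal sketch of how these optimizers could be swapped into a Keras-style CIFAR-10 script. The data loading and the tiny `make_model()` CNN are my own stand-ins (the assignment's actual model isn't shown here), so treat this as an illustration rather than the homework code:

```python
from tensorflow import keras
from tensorflow.keras import layers, optimizers

# Load CIFAR-10: 32x32 colour images across 10 classes (labels 0-9).
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

def make_model():
    # A tiny stand-in CNN; the assignment's actual model is presumably bigger.
    return keras.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])

# The optimizers discussed above; any one of them can be passed to compile().
candidates = {
    "rmsprop": optimizers.RMSprop(),  # moving average of squared gradients
    "adagrad": optimizers.Adagrad(),  # per-parameter adaptive learning rates
    "adam":    optimizers.Adam(),     # blends the adaptive ideas of the two above
    "sgd":     optimizers.SGD(),      # plain stochastic gradient descent
}

model = make_model()
model.compile(loss="categorical_crossentropy",
              optimizer=candidates["rmsprop"],
              metrics=["accuracy"])
```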


Test Epoch Test 1/3 || Test Accuracy: 0.1853

  • Time Spent: 3 minutes
  • Code completely the same, just changing the epochs to 1 (sketched after this list).
  • Findings: It’s worth running the model longer than just one epoch. (So yes, running just one epoch, while convenient, sucks in terms of accuracy… 18% is horrific.)
  • Thoughts: I wish I were patient enough to sit through more than one epoch.
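
Continuing the sketch above, this test would just change the `epochs` argument to `model.fit()`. The `batch_size=2048` is an assumption on my part, inferred from the later test that halves the batch size to 1024:

```python
# Continuing the earlier sketch; epochs=1 is the only change for this test.
model.fit(x_train, y_train, batch_size=2048, epochs=1,
          validation_data=(x_test, y_test))
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy after 1 epoch: {acc:.4f}")
```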


Test Numb_Class Test 1/2 || Test Result: ERROR

  • Time Spent: 5 minutes
  • Code changed to 10 epochs (for the sake of my sanity; more on the 100-epoch test later) and Numb_Class was changed to 5.
  • Findings: An IndexError! The message says index 6 is out of bounds for size 5… which makes sense: CIFAR-10 labels run 0 through 9, so the first label of 5 or higher breaks the encoding. (A minimal reproduction is sketched after this list.)
  • Thoughts: I guess this means 1 won’t work either!
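
Assuming the sample script one-hot encodes the labels with something like `keras.utils.to_categorical`, this NumPy-only sketch reproduces the error: CIFAR-10 labels run 0 through 9, so a 5-wide one-hot table has no column for a label of 6:

```python
import numpy as np

num_classes = 5               # the broken setting; CIFAR-10 really has 10 classes
labels = np.array([3, 6, 1])  # real labels run 0-9, so a 6 does occur

# This is essentially what keras.utils.to_categorical does internally:
one_hot = np.zeros((len(labels), num_classes))
one_hot[np.arange(len(labels)), labels] = 1
# -> IndexError: index 6 is out of bounds for axis 1 with size 5
```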


Test Epoch Test 2/3 || Test Accuracy: 0.4245

  • Time Spent: 15-20 minutes (sometime between the two batch tests)
  • Code completely the same except for the epochs being changed to 10.
  • Findings: I was hoping for a more dramatic increase, into the 80% accuracy range, just because of our class activity. If anything, this showed me that I was going to have to test more and commit a good chunk of time (at least an hour) to testing.
  • Thoughts: It’s funny because at the time I thought 100 epochs would take around 1 hour… just wait… because it didn’t. In fact, it took such an excruciating amount of time that I almost lost faith in this homework assignment.


Test Batch Test 1/2 || Test Accuracy: 0.4185

  • Time Spent: 20 minutes
  • Changed epochs to ten (for the sake of my sanity) and cut the batch size in half (1024).
  • Findings: The accuracy wasn’t cut in half. Sure, it went down compared to the 10 epochs with the larger batch size, but only by about 1 percentage point, which isn’t much considering the batch size dropped by over 1000.
  • Thoughts: I wonder what would happen with other batch sizes. Like, at what point does the batch size stop mattering? (I’m thinking about this in terms of how one would look at water-soluble vitamins like Vitamin C… you cannot store the excess, so the extra Vitamin C just straight up doesn’t do you any good… is that what extra batch size does too?) (A sweep over the batch sizes I tried is sketched after this list.)
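
Here is what that sweep could look like, continuing the earlier sketch (`make_model()` and the data are defined there). The three batch sizes are the ones from these tests, and retraining from scratch at each size keeps the comparison fair:

```python
# Retrain from scratch at each batch size tried in these tests.
for batch_size in [2048, 1024, 10]:
    model = make_model()
    model.compile(loss="categorical_crossentropy",
                  optimizer="rmsprop",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=batch_size, epochs=10, verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"batch_size={batch_size}: test accuracy {acc:.4f}")
```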


Test Batch Test 2/2 || Test Accuracy: 0.718

  • Time Spent: 15 minutes
  • Changed epochs to ten (for the sake of my sanity) and cut the batch size to 10.
  • Findings: The accuracy was significantly higher, around 72%, which is the highest accuracy I have seen at this point.
  • Thoughts: So the data doesn’t make sense. I decrease the batch size and at first the accuracy goes down; then I bring it way down, by another thousand, and the accuracy goes up? That would mean accuracy versus batch size isn’t linear, and the curve must dip somewhere before rising again. (That’s really complicated math I don’t care for… but I’m picturing something like the image below, from Wikipedia:)

[Graph: a non-linear curve that dips before rising again, image from Wikipedia]


Test Numb_Class Test 2/2 || Test Result: ERROR

  • Time Spent: 5 minutes
  • Code changed to 10 epochs (for the sake of my sanity; more on the 100-epoch test later) and Numb_Class was changed to 1.
  • Findings: An IndexError again! This time index 6 is out of bounds for size 1…
  • Thoughts: Well, my previous guess was right! Yay me!


Test Epoch Test 3/3 || Test Accuracy: 0.637

  • Time Spent: 7 hours 25 minutes
  • Code unchanged from the sample.
  • Findings: Running it all school day doesn’t improve the results that dramatically. 
  • Thoughts: This took far longer than I thought, and I am really tired. I didn’t even change the code! The goal was a baseline… and it took me all day. Sure, I knew machine learning took time, but ALL DAY? It wasn’t worth it: the accuracy is still only 64% (if you round generously).
