Week 5: Train CIFAR-10 CNN by Jonghyun Jee

Introduction

This week’s assignment is to train a CIFAR-10 CNN on our own, based on what we learned in the last class. By trying different values for the batch size and the number of epochs, I was able to observe some noticeable characteristics of the training process.

Machine Specs

CPU: Intel® Core™ i5-6200U CPU @ 2.30GHz

GPU: Intel® HD Graphics 520

RAM: 8.00 GB (7.08 GB usable)

Experiments

First, I want to address why terms such as batch size, epochs, and iterations were introduced in the first place. We use these concepts when the data set is too large to pass to the network all at once. To get around this, we divide the data into smaller batches and feed them to the network one batch at a time, updating the weights at the end of each step; each such update is an iteration, and one full pass over the training set is an epoch.
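
For example, CIFAR-10 has 50,000 training images, so with a batch size of 64 one epoch consists of ceil(50,000 / 64) = 782 weight updates (iterations). A quick sanity check in Python (the numbers are just the standard CIFAR-10 split):

    import math

    train_samples = 50_000   # standard CIFAR-10 training split
    batch_size = 64          # samples processed per weight update
    epochs = 20              # full passes over the training set

    iterations_per_epoch = math.ceil(train_samples / batch_size)  # 782
    total_weight_updates = iterations_per_epoch * epochs          # 15,640

    print(iterations_per_epoch, total_weight_updates)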

The best batch size and number of epochs will, of course, depend on a variety of factors such as the size and type of the training data, hardware performance, and so on. For this week’s assignment, I ran three experiments, playing around with some seemingly extreme values.

I began with standard starting points (batch size: 64, epochs: 20).
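
As a rough sketch of how such a run can be launched in Keras (the architecture below is only illustrative; it is not necessarily the exact model we built in class):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # Load CIFAR-10 and scale pixel values to [0, 1]
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # A small illustrative CNN; the in-class architecture may differ
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(10, activation='softmax'),
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # The standard starting point used here: batch size 64, 20 epochs
    model.fit(x_train, y_train,
              batch_size=64,
              epochs=20,
              validation_data=(x_test, y_test))

    model.evaluate(x_test, y_test)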

This training took 68 minutes to complete and yielded an accuracy of 0.7189. Given more time (or a better computer), I would increase the number of epochs until the accuracy converges. The result was not unfavorable, but it still fell short of expectations, since our in-class Fashion-MNIST example reached much better accuracy in a shorter amount of time.

Then I tried to see how an extreme in the other direction would work (batch size: 1024, epochs: 1).

The run finished in three minutes but yielded a poor accuracy (0.287). A batch size of 1024 is a bit much for my computer, and a single epoch gives the network far too few weight updates to learn properly.

And I also tried the opposite extreme (batch size: 1, epochs: 10).

A single epoch took more than 20 minutes and reached an accuracy of 0.3589, so I had to cancel the run right after the first epoch.

After going through a number of training runs, a question came to mind: is there a better way to find an efficient batch size and number of epochs than changing the values and re-training everything over and over?
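
One partial answer I can offer for now is to do short trial runs instead of full trainings: train a fresh copy of the model for a single epoch at each candidate batch size, compare validation accuracy and wall-clock time, and only then commit to a long run. A rough sketch, reusing x_train and y_train from the earlier example, where build_model() is just a placeholder for whatever function returns a freshly compiled copy of the CNN:

    import time

    # build_model() is a placeholder: it should return a freshly compiled copy
    # of the CNN (compiled with metrics=['accuracy'], as in the earlier sketch).
    for batch_size in [32, 64, 128, 256]:
        model = build_model()
        start = time.time()
        history = model.fit(x_train, y_train,
                            batch_size=batch_size,
                            epochs=1,            # short trial run only
                            validation_split=0.1,
                            verbose=0)
        elapsed = time.time() - start
        val_acc = history.history['val_accuracy'][-1]
        print(f"batch_size={batch_size}: val_accuracy={val_acc:.3f}, "
              f"time={elapsed:.0f}s")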

Conclusion

Admittedly, it is hard to generalize from such a small number of results; still, combined with a bit of research, I could observe some interesting points:

  1. Larger batch sizes result in faster progress through each epoch, but that doesn’t mean we should simply maximize them. We have to consider the machine’s memory, and on top of that, keep in mind that larger batches don’t always converge as fast; smaller batches train more slowly per epoch but can converge in fewer epochs. A reasonable approach is to start with the largest batch size that doesn’t exceed memory, and then lower it if a single batch takes longer than about a minute.
  2. As the number of epochs increases, the weights are updated more times, and the model moves from under-fitting to a good fit and eventually to over-fitting. For epochs, we can start with a number between 5 and 10 and stop training once the loss stops decreasing, then use that as the epoch count (see the early-stopping sketch after this list).
  3. Training took longer than I had expected. If we fail to set good values for the batch size and the number of epochs, the process is definitely going to be time-consuming and inefficient.
  4. There are scripts that help users estimate an optimal batch size; but still, since data sizes, data types, and objectives all vary, we end up having to plug in values and train multiple times to see which batch size works best for our own project.
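
Regarding point 2, Keras ships an EarlyStopping callback that implements exactly this “stop once the loss no longer improves” idea. A minimal sketch, reusing the model and data from the earlier example:

    from tensorflow.keras.callbacks import EarlyStopping

    # Stop when the validation loss has not improved for 3 consecutive epochs,
    # and restore the weights from the best epoch seen so far.
    early_stop = EarlyStopping(monitor='val_loss',
                               patience=3,
                               restore_best_weights=True)

    model.fit(x_train, y_train,
              batch_size=64,
              epochs=50,                # generous upper bound; may stop earlier
              validation_split=0.1,
              callbacks=[early_stop])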
