Week 05 – CIFAR-10 Training – Abdullah Zameek

For this assignment, it came down to a question of which hyperparameters I was going to tweak and experiment with. Since I’m still uncertain about what num_classes does (and still reading up on it), I decided to just tweak batch_size and epochs. I set the number of epochs to 3 and adjusted the batch size from 16 all the way up to 2048, doubling it at each step.
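For what it’s worth, num_classes just sets the number of output categories, and CIFAR-10 always has ten. A rough sketch of the sweep I ran, assuming the standard Keras CIFAR-10 setup (build_model here is a hypothetical helper standing in for the CNN definition from the example, compiled with an accuracy metric):

```python
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

num_classes = 10  # CIFAR-10 always has 10 categories
epochs = 3

# Load and normalize the data once
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

for batch_size in [16, 32, 64, 128, 256, 512, 1024, 2048]:
    model = build_model(num_classes)  # hypothetical helper: fresh compiled CNN
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
              validation_data=(x_test, y_test))
    loss, acc = model.evaluate(x_test, y_test)
    print(f"batch {batch_size}: loss={loss:.4f}, acc={acc:.4f}")
```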

I chose three epochs because it let me obtain results in a relatively short period of time. Ideally, three passes over the data are nowhere near sufficient to obtain a good result, but since I’m measuring how the loss changes relative to the batch size rather than chasing the best possible accuracy, I don’t think it matters as much.

The results have been summarised below.

[Training plots for batch sizes 16, 32, 64, 128, 256, 512, 1024, and 2048]

Batch Size   Test Accuracy   Test Loss
16           0.5755          1.1882
32           0.5591          1.2426
64           0.5399          1.3063
128          0.4559          1.4981
256          0.4462          1.5408
512          0.4149          1.6366
1024         0.3734          1.7963
2048         0.3408          1.8702

As can be seen from the table above, smaller batch sizes clearly perform better here, with 16 and 32 at the top of the table.
I’m still not entirely sure about the technical rationale for why the test accuracy drops as the batch size increases, but my best guess is the following. The training set always contains the same 50,000 images, but a larger batch size means each epoch is split into fewer, larger batches, so the model receives far fewer weight updates over the same three epochs. With so few updates, the large-batch runs simply don’t have enough time to train properly, whereas a smaller batch size like 32 gives the model many more updates over the same data, which leads to a better result.
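To make that concrete, the number of weight updates per epoch is roughly 50000 / batch_size, so over three epochs a batch size of 16 gets about 9,375 updates while 2048 gets only about 72. A quick sketch of the arithmetic:

```python
# Weight updates the model receives in a 3-epoch run, per batch size
train_size, epochs = 50000, 3
for batch_size in [16, 32, 64, 128, 256, 512, 1024, 2048]:
    updates = (train_size // batch_size) * epochs
    print(f"batch {batch_size:>4}: ~{updates} updates")
```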

I decided to increase the number of epochs for a batch size of 32, so I set it to 50 and let it run overnight. I got the following result.

[Training plot for batch size 32, 50 epochs]

The test loss reported was 0.67667, with an accuracy of 0.7751. Great!
This is a big improvement over the 3-epoch run, and I expect it might converge to around 0.95+ after 300 epochs or so. Maybe that’s something I’d like to test once I know how to use the Intel Server.

When attempting the Data Augmentation part, I got the following error right off the bat:

[Screenshot of the error message]
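For reference, here is a minimal sketch of what I was attempting, assuming the standard Keras ImageDataGenerator approach (x_train, y_train, x_test, y_test, and model are the same objects as before; the specific augmentation parameters are just illustrative):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,        # random rotations up to 15 degrees
    width_shift_range=0.1,    # random horizontal shifts
    height_shift_range=0.1,   # random vertical shifts
    horizontal_flip=True,     # mirror images left-right
)

# Train on batches drawn from the generator instead of the raw arrays
# (older Keras versions use model.fit_generator(...) instead of fit)
model.fit(datagen.flow(x_train, y_train, batch_size=32),
          epochs=50, validation_data=(x_test, y_test))
```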
