Week 02 Assignment: Case Study Research – Andrew Huang

Resnet vs ODENet

Neural Ordinary Differential Equations

One of the most prestigious AI conferences last year, NeurIPS, featured several new papers detailing the bleeding edge of modern-day machine learning. One of the “top papers” from that conference was Neural Ordinary Differential Equations. At first that seems hard to understand, but I will try to explain it clearly.

A neural network is a universal function approximator, but it trains in a discrete manner: it is a stack of layers, and as you add more layers, the computation and memory needed to train it grow roughly linearly with depth. This matters because in memory-constrained environments (phones, IoT devices) low memory usage and low power consumption are essential. Big models like VGG are simply not feasible on such devices, so their disadvantages become apparent. Additionally, bigger models are harder to train because of “vanishing gradients”: as the error signal is propagated back through many layers, it shrinks toward zero, so the earliest layers barely learn. A few years ago, Microsoft researchers came up with ResNet, which adds one simple idea to feed-forward neural networks to fix this problem.

Think of a neural network layer by layer as discrete matrix operations. A plain forward pass through one layer can be written as $y_1 = m_1 x + b_1$. ResNet introduces a small change: instead of each layer taking only the output of the previous layer, the previous output is also added back onto the result. With a residual block, the next layer becomes $y_2 = m_2 y_1 + b_2 + y_1$. This small change has big implications. The update “new value = old value + a learned correction” is exactly one step of Euler’s Method, so the whole network can be viewed as a discretization of a differential equation. That lets us treat neural networks as continuous functions and vastly improve their approximations. In the future, if this were widely adopted, neural networks could greatly improve their performance and training time, and also be more efficient in memory-constrained environments.
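To make the residual-block/Euler connection concrete, here is a minimal NumPy sketch (the function and variable names are my own, not from the paper). It shows that a ResNet-style update $y_{t+1} = y_t + f(y_t)$ is the same computation as one Euler step with step size 1 for the ODE $dy/dt = f(y)$:

```python
import numpy as np

def f(y, W, b):
    """One hidden transformation, f(y) = tanh(W y + b) (an illustrative choice)."""
    return np.tanh(W @ y + b)

rng = np.random.default_rng(0)
dim = 4
W = rng.normal(scale=0.1, size=(dim, dim))  # toy weights
b = np.zeros(dim)
y0 = rng.normal(size=dim)                   # input activation

# ResNet-style residual update: next activation = previous activation + block output.
y_resnet = y0 + f(y0, W, b)

# Euler step of dy/dt = f(y) with step size h = 1 gives the exact same update.
h = 1.0
y_euler = y0 + h * f(y0, W, b)

print(np.allclose(y_resnet, y_euler))  # True: a residual block is one Euler step
```

An ODENet takes this observation further: instead of stacking many discrete blocks, it hands $f$ to a numerical ODE solver that chooses its own step sizes, which is where the memory and accuracy benefits come from.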

Sources:

https://arxiv.org/pdf/1806.07366.pdf

https://github.com/llSourcell/Neural_Differential_Equations/blob/master/Neural_Ordinary_Differential_Equations.ipynb
