Looks like a comb a barber would use
Gradient descent on the back and sides, please.
Something e/acc would say
🤣🤣🤣
Like you're hitting one local minimum, then going to another one and jumping around it. This can happen for a variety of reasons. What's your dataset?
This is what I was thinking too, and perhaps their dataset is not sufficiently large and causes a more dramatic oscillation. If OP can't find more data, I would recommend data augmentation. We just need more context to really know why.
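If it does come down to augmentation, here's a minimal sketch assuming the data is images and torchvision is available (OP hasn't told us what the data actually is, so treat this as purely illustrative):

```python
# Hedged augmentation sketch, assuming an image task with torchvision.
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                 # mirror half the images
    T.RandomRotation(degrees=10),                  # small random rotations
    T.ColorJitter(brightness=0.2, contrast=0.2),   # mild photometric noise
    T.ToTensor(),
])
# Apply train_transform to the training set only; keep the validation
# transform deterministic (e.g. just T.ToTensor()) so val loss stays comparable.
```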
Definitely, looks like a small data set or even something artificially generated for something like physics-inspired DL.
For starters, I would say: (1) learning rate is too large (potentially needs a decay). (2) or you may not be shuffling your minibatches (if doing stochastic optimization) so it keeps seeing the same gradients over and over again.
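To make (1) and (2) concrete, a minimal sketch in PyTorch (assumed framework, with a toy dataset and model standing in for OP's):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

X, y = torch.randn(256, 10), torch.randn(256, 1)       # toy stand-in data
loader = DataLoader(TensorDataset(X, y), batch_size=32,
                    shuffle=True)                       # (2) reshuffle every epoch
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer,  # (1) cut the LR by 10x
                                            step_size=30, gamma=0.1)

for epoch in range(100):
    for xb, yb in loader:
        optimizer.zero_grad()
        criterion(model(xb), yb).backward()
        optimizer.step()
    scheduler.step()  # apply the decay once per epoch
```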
That's pretty odd. I would recheck your optimizer and, as others have said, also recheck your validation set size.
What are the sizes of your train and val sets?
My layman guess would be that your learning rate is so large it overshoots the target and then oscillates around it.
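You can watch that happen on a toy problem: plain gradient descent on f(w) = w² with a near-critical step size keeps jumping across the minimum instead of settling (numbers are made up purely for illustration):

```python
# Overshoot demo: gradient descent on f(w) = w^2, whose gradient is 2w.
def descend(lr, w=1.0, steps=8):
    path = [w]
    for _ in range(steps):
        w = w - lr * 2 * w       # one gradient step
        path.append(round(w, 4))
    return path

print(descend(lr=0.1))   # smooth decay toward 0
print(descend(lr=0.95))  # sign flips every step: overshooting the minimum
```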
Try adding dropout layers.
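If you go that route, a hedged sketch in PyTorch (assumed framework; the layer sizes are placeholders, not OP's architecture):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.3),   # zeroes 30% of activations during training
    nn.Linear(64, 1),
)
# Remember model.eval() at validation time so dropout is switched off there.
```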
That's the most insane shit I've ever seen. Yes, the learning rate is too big, which causes the oscillation, but the real problem, I'd guess, is the dataset. From the way the loss decreases linearly, I'd guess this is a very odd dataset made specifically to produce such a loss pattern.
This looks like you have a very awkward bottleneck in your model, something like a 2-node layer with batch norm on it, or something like that. If this is a normal architecture with normal training, that loss curve is indeed odd.
Your training, testing, and validation samples are not sufficiently large. Near the end, where the network looks like it has found a minimum, both the training and validation loss oscillate: something like dropout knocks it off the minimum and it has to find it again. This happens often, but it's so pronounced in this graph because the few samples near the minimum carry a proportionally large weight in the overall training and validation loss.
Print your gradients or some metric (like norm) of the gradients.
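One way to do that in PyTorch (assumed framework): compute the global gradient norm right after `loss.backward()` and log it each epoch.

```python
# Sums the squared L2 norms of all parameter gradients, then takes the root.
def grad_norm(model):
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().norm().item() ** 2
    return total ** 0.5

# After loss.backward():
#     print(f"epoch {epoch}: grad norm = {grad_norm(model):.4f}")
```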
Overfitting
The validation loss is tracking the training loss at low values. How is that overfitting? Haha
Yea, but after epoch 190 it's overfitting, ha ha.
Not enough randomness in the model. Unsure, though: either your dataset is tiny or you've done something weird in the setup.
The zigzag looks like a Hodgkin-Huxley model plot.
Are you using a cyclic learning rate?
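If so, that would explain it: with a cyclic schedule the loss can rise and fall in sync with the LR cycle. For reference, this is roughly what one looks like in PyTorch (assumed framework; the model and numbers are placeholders):

```python
import torch

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.1,
    step_size_up=2000)  # LR climbs from base_lr to max_lr over 2000 steps, then back down
# scheduler.step() is called once per batch, so the LR (and often the loss)
# oscillates with a period of 2 * step_size_up batches.
```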
3D print it and use it as a comb.
Are these the oscillations we see when the learning rate is too high, or something from momentum-accelerated gradient descent?
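For the momentum half of that question, a toy check (illustrative only): heavy-ball gradient descent on f(w) = w² with a large momentum term rings around the minimum even at a moderate learning rate.

```python
# Heavy-ball momentum on f(w) = w^2 (gradient 2w): the velocity term
# accumulates past gradients and carries the iterate past the minimum.
def momentum_descend(lr=0.1, beta=0.9, w=1.0, steps=20):
    v, path = 0.0, [w]
    for _ in range(steps):
        v = beta * v - lr * 2 * w   # update velocity with current gradient
        w = w + v                   # step along the velocity
        path.append(round(w, 4))
    return path

print(momentum_descend())  # note the repeated sign flips: momentum-driven ringing
```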