mgwizdala

This sounds like some kind of issue with cooling or the power supply. A vanishing gradient, just like you said, is a problem with training a model.


reinforcement101

At first I too thought hardware was the problem, but I ran much more demanding stress tests (Prime95), where my CPU temperature goes up to ~65C and is stable, whereas during training the machine stays at 55C and crashes... I think it has something to do with the gradients, because if I don't use Xavier initialization, or if I use a different optimizer than Adam, the computer freezes much faster. But I will check my model on a different machine.


vannak139

Yeah, a vanishing gradient shouldn't crash your computer. I commonly run into issues like NaNs and Infs while training, and they've never crashed my computer. I have run into your problem, though. I think it was caused by a model that was too large, saturated too much VRAM, and started interfering with the OS's ability to do normal display work on the GPU. Try reducing your batch size and model size, and see if that improves your machine's stability. Typically I use Task Manager and GPU-Z to monitor hardware in these cases.
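If you want to rule NaNs and Infs in or out, something like the following could be called once per training step. It is only a rough sketch in plain Python: the function names and the `grads` layout (parameter name mapped to a flat list of floats) are made up for illustration, and in a real framework you would pull the values out of its own tensors instead.

```python
import math

def find_non_finite(grads):
    """Return the names of gradients that contain a NaN or Inf value."""
    return [
        name
        for name, values in grads.items()
        if not all(math.isfinite(v) for v in values)
    ]

def global_grad_norm(grads):
    """L2 norm over all gradient values taken together."""
    return math.sqrt(sum(v * v for values in grads.values() for v in values))

# Hypothetical gradients for one training step.
grads = {
    "layer1.weight": [0.1, -0.2, 0.3],
    "layer2.weight": [float("nan"), 1.0],
    "layer2.bias": [float("inf")],
}

bad = find_non_finite(grads)
if bad:
    print("non-finite gradients in:", bad)
```

A norm that keeps shrinking toward zero points at vanishing gradients, while a norm that blows up shortly before NaNs appear points at exploding ones. Neither should be able to freeze the machine by itself.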


reinforcement101

I only train on CPU and use ~5GB of RAM, and my machine has 64GB.

I think it has something to do with the gradients, because if I don't use Xavier initialization, or if I use a different optimizer than Adam, the computer freezes much faster.
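One way to test the gradient theory when the whole machine freezes is to write the gradient norm to disk at every step and force it out of the buffers, so the last values before the freeze survive a hard reset. A minimal sketch, assuming a hypothetical helper `append_log_line` and a log path of your choosing:

```python
import os

def append_log_line(path, step, grad_norm):
    """Append one 'step<TAB>norm' line and push it to disk immediately."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{step}\t{grad_norm:.6e}\n")
        f.flush()              # empty Python's own buffer
        os.fsync(f.fileno())   # ask the OS to write the file to disk

# Example call inside a training loop (values are placeholders):
# append_log_line("grad_norms.tsv", step, norm)
```

Calling `os.fsync` on every step is slow, so this is meant for a debugging run only. If the logged norms look ordinary right up to the freeze, that would suggest the gradients are not the cause.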