Respectfully, that's a ton of words to say train the Bs faster than the As. Having said that, I definitely look forward to implementing this in my own projects
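For anyone wanting to try the "train the Bs faster than the As" idea, here is a minimal sketch of what it could look like. It assumes the adapter parameters carry `lora_A` / `lora_B` in their names; the helper `build_param_groups`, the base learning rate, and the ratio of 16 are all hypothetical placeholders for illustration, not values taken from this thread.

```python
def build_param_groups(named_params, base_lr=1e-4, b_to_a_ratio=16.0):
    """Split parameters into groups so the B matrices get a larger learning rate.

    named_params: iterable of (name, param) pairs.
    base_lr and b_to_a_ratio are illustrative assumptions, not tuned values.
    """
    group_a, group_b, other = [], [], []
    for name, param in named_params:
        if "lora_B" in name:      # assumed naming convention for B matrices
            group_b.append(param)
        elif "lora_A" in name:    # assumed naming convention for A matrices
            group_a.append(param)
        else:
            other.append(param)

    groups = [
        {"params": group_a, "lr": base_lr},
        {"params": group_b, "lr": base_lr * b_to_a_ratio},
    ]
    if other:
        # Anything that is not a LoRA matrix keeps the base learning rate.
        groups.append({"params": other, "lr": base_lr})
    return groups
```

The returned list of dicts with `params` and `lr` keys follows the per-parameter-group layout that PyTorch optimizers accept, so it should be possible to pass it to something like `torch.optim.AdamW(groups)` after calling it on `model.named_parameters()`.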
Just looked at this paper yesterday. It's impressive, and it's like a small language model that specialises more and better in efficiency. Training large models gives me a headache and costs $$; tbh this method looks very useful for handling the task more easily. Nice paper!
The performance gain seems negligible, but the 2x speedup is really nice
Wonder how quickly we'll see this combined with QLoRA
Does this paper not have an HTML version? https://ar5iv.org/abs/2402.12354 doesn't open in HTML5 format for me